# Lab 7 - Network Visualization

In this lab, we will go through basic graph concepts and **cover visualization of networks and trees**. We will see how to use ggplot, plotly, igraph, and D3 to visualize data in the practice notebooks.


Network visualization is used in many domains, both to present data visually and to explore complex data. Looking for clusters and paths, and following connections among thousands or millions of data items through an interactive visual interface, makes it a powerful tool for exploratory data analysis.

---

## Basic Graph Concepts 

Graph theory is the study of **graphs**, mathematical structures that model relations between objects or entities. The **networks and trees** we will use for visualization are represented mathematically as graphs, so we start with some basic graph theory concepts.


Many problems in graph theory are known to be computationally hard: no efficient exact algorithm is known for them, and finding an exact solution can require brute-force search in a combinatorially explosive space (i.e. the number of candidates to check grows exponentially with the size of the problem).

### Definition: 

A graph is defined as **a set of vertices (or nodes) and edges (or links) between them**. 


$G=(V,E)$ defines a graph with vertex set $V$ and edge set $E$, where each edge is said to be incident with two vertices:

$$e_k=(v_i,v_j)$$

For example, here: 

$$e_1=(v_1,v_2)$$
$$e_6=(v_4,v_5)$$
$$etc.$$

<img src="../images/graph1.png">
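The definition above can be sketched in a few lines of Python. This is only an illustration: the edges $e_1$ and $e_6$ are taken from the text, while the remaining edges and the helper `incident_edges` are assumptions made up for the example and do not necessarily match the figure.

```python
# A small undirected graph G = (V, E) stored as a vertex set and named edges.
# Only e1 and e6 come from the text; e2 and e3 are illustrative assumptions.
V = {"v1", "v2", "v3", "v4", "v5"}
E = {
    "e1": ("v1", "v2"),
    "e2": ("v2", "v3"),  # assumed
    "e3": ("v3", "v4"),  # assumed
    "e6": ("v4", "v5"),
}


def incident_edges(vertex, edges):
    """Return the sorted names of all edges incident with `vertex`."""
    return sorted(name for name, (a, b) in edges.items() if vertex in (a, b))


# Every edge endpoint must be a vertex of the graph.
assert all(a in V and b in V for a, b in E.values())

print(incident_edges("v4", E))  # ['e3', 'e6']
```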

---

### Degree and Directedness: 

The **degree** *d(v)* of a vertex *v* is the number of edges incident with it. In the example above, the vertex $v_5$ has degree three.


A **directed graph** or *digraph* is a graph where edges have **directions**. In that case, we can talk about *in-degrees* and *out-degrees* of a vertex. 

In the example below, the vertex $v_5$ has an in-degree of three and an out-degree of one.
<img src="../images/graph2.png">
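Degrees are easy to compute from an edge list. The sketch below assumes a hypothetical edge list (not the graph in the figures) and reads each pair `(u, v)` as an edge from `u` to `v` when the graph is treated as directed.

```python
from collections import Counter


def degrees(edges):
    """Degree of each vertex in an undirected edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def in_out_degrees(edges):
    """In-degree and out-degree of each vertex, reading each pair as u -> v."""
    indeg, outdeg = Counter(), Counter()
    for u, v in edges:
        outdeg[u] += 1
        indeg[v] += 1
    return indeg, outdeg


# Hypothetical edge list used only for illustration.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]

deg = degrees(edges)
indeg, outdeg = in_out_degrees(edges)
print(deg["c"], indeg["c"], outdeg["c"])  # 3 2 1
```

For any vertex of a directed graph, the in-degree and out-degree add up to the degree of the same vertex in the undirected version of the graph.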

---

### Acyclic Graphs and Trees

An **acyclic graph** is a graph that has **no cycles** in it. The two examples above do have cycles: starting from some vertex, one can follow a path that comes back to that same vertex.


A graph is **connected** if there is a path between every pair of vertices. A **tree** is an acyclic, connected graph.

<img src="../images/tree.gif">

A **leaf** of a tree is a vertex of degree 1. 
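These definitions translate directly into a small check. The sketch below assumes a simple undirected graph (no self-loops or repeated edges) and uses the standard fact that a connected graph on *n* vertices is acyclic exactly when it has *n* - 1 edges; the vertex names are hypothetical.

```python
def adjacency(vertices, edges):
    """Build an undirected adjacency map from an edge list."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def is_connected(vertices, edges):
    """True if every vertex is reachable from an arbitrary start vertex."""
    if not vertices:
        return True
    adj = adjacency(vertices, edges)
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:
        for nbr in adj[stack.pop()]:
            if nbr not in seen:
                seen.add(nbr)
                stack.append(nbr)
    return len(seen) == len(vertices)


def is_tree(vertices, edges):
    """Connected and acyclic, assuming a simple graph."""
    return is_connected(vertices, edges) and len(edges) == len(vertices) - 1


def leaves(vertices, edges):
    """Vertices of degree 1, in sorted order."""
    adj = adjacency(vertices, edges)
    return sorted(v for v in vertices if len(adj[v]) == 1)


# Hypothetical example: a root with two children and one grandchild.
vertices = ["r", "a", "b", "c"]
tree_edges = [("r", "a"), ("r", "b"), ("a", "c")]

print(is_tree(vertices, tree_edges))                 # True
print(leaves(vertices, tree_edges))                  # ['b', 'c']
print(is_tree(vertices, tree_edges + [("b", "c")]))  # False: the extra edge closes a cycle
```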

---

### Isomorphism



Two graphs are said to be *isomorphic* if they have the same structure: they have the same number of vertices and edges, and the vertices of one can be relabeled to match the other so that exactly the same pairs of vertices are connected.

<img src="../images/iso.png">
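For very small graphs, isomorphism can be tested by trying every relabeling. The sketch below is a brute-force illustration for simple undirected graphs, not an efficient algorithm: the number of relabelings grows as *n*!, which is an instance of the combinatorial explosion mentioned earlier. The function name and example graphs are assumptions for illustration.

```python
from itertools import permutations


def are_isomorphic(vertices_a, edges_a, vertices_b, edges_b):
    """Brute-force isomorphism test for small, simple undirected graphs."""
    va, vb = list(vertices_a), list(vertices_b)
    if len(va) != len(vb) or len(edges_a) != len(edges_b):
        return False
    target = {frozenset(e) for e in edges_b}
    for perm in permutations(vb):
        mapping = dict(zip(va, perm))  # candidate relabeling of graph A
        mapped = {frozenset((mapping[u], mapping[v])) for u, v in edges_a}
        if mapped == target:
            return True
    return False


# A 4-cycle, the same cycle with shuffled labels, and a triangle with a tail.
square = ([1, 2, 3, 4], [(1, 2), (2, 3), (3, 4), (4, 1)])
relabeled = (["a", "b", "c", "d"], [("a", "c"), ("c", "b"), ("b", "d"), ("d", "a")])
triangle_tail = (["a", "b", "c", "d"], [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")])

print(are_isomorphic(*square, *relabeled))      # True
print(are_isomorphic(*square, *triangle_tail))  # False, despite equal vertex and edge counts
```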

---

### Representation

Graphs can be represented as an edge list between nodes, as a graphical representation, or as an *adjacency matrix*. 

<img src="../images/adj.jpg">
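Converting between these representations is mechanical. The following sketch builds an adjacency matrix (as a list of lists) from an edge list; the vertex names and the `directed` flag are assumptions for illustration.

```python
def adjacency_matrix(vertices, edges, directed=False):
    """Return a 0/1 matrix where entry [i][j] is 1 if there is an edge i -> j."""
    index = {v: i for i, v in enumerate(vertices)}
    n = len(vertices)
    matrix = [[0] * n for _ in range(n)]
    for u, v in edges:
        matrix[index[u]][index[v]] = 1
        if not directed:
            # An undirected edge appears in both directions.
            matrix[index[v]][index[u]] = 1
    return matrix


# Hypothetical graph: rows and columns follow the order of `vertices`.
vertices = ["a", "b", "c", "d"]
edges = [("a", "b"), ("a", "c"), ("c", "d")]

for row in adjacency_matrix(vertices, edges):
    print(row)
# [0, 1, 1, 0]
# [1, 0, 0, 0]
# [1, 0, 0, 1]
# [0, 0, 1, 0]
```

The matrix of an undirected graph is symmetric, and each row sum equals the degree of the corresponding vertex.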

---

### Planar Graph


A graph is **planar** if its vertices can be placed, and its edges drawn, so that **no edges cross each other**.

The following graphs are the same; one is drawn so that edges do not cross each other. 

<img src="../images/planar.jpg">

We can assign "weights" to the edges of a graph; a directed graph whose edge weights represent capacities is also known as a **flow network**.

<img src="../images/nflow.png">

---

### Map Coloring Problem 

A political map can be colored with four colors so that no two neighboring countries have the same color. This is known as the map coloring problem, and it is an application of the *k-coloring* of a graph, in which each vertex is assigned one of *k* colors (numbers) so that no two vertices joined by an edge receive the same one.

Every planar graph has a 4-coloring. **When countries on a map are represented by vertices, and their neighboring relations are represented by edges between them, this creates a planar graph**, hence four colors are always enough to color each country under the above constraint (some maps need fewer).

<img src="../images/4map.png">
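A *k*-coloring can be searched for with simple backtracking. The sketch below is a minimal illustration for small, simple graphs (its running time can grow exponentially), and the "map" of four mutually bordering regions is a hypothetical example, not the map in the figure.

```python
def k_coloring(vertices, edges, k):
    """Return a dict vertex -> color in range(k), or None if no k-coloring exists."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    order = list(vertices)
    colors = {}

    def assign(i):
        if i == len(order):
            return True  # every vertex has a color
        v = order[i]
        for c in range(k):
            # A color is allowed if no already-colored neighbor uses it.
            if all(colors.get(n) != c for n in adj[v]):
                colors[v] = c
                if assign(i + 1):
                    return True
                del colors[v]  # undo and try the next color
        return False

    return dict(colors) if assign(0) else None


# Hypothetical map: four regions that all border each other.
regions = ["A", "B", "C", "D"]
borders = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "D")]

print(k_coloring(regions, borders, 3))  # None: three colors are not enough here
print(k_coloring(regions, borders, 4))  # one color per region
```

This example needs all four colors, while a map whose regions form a simple chain would need only two.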

---

## Graph Drawing Problem 

As we have seen, the same graph **can be drawn in many different ways**; some of them are visually more useful than others. Use of space is one of the main issues in graph drawing.

As the number of nodes grows (e.g. we can have millions of nodes), efficient graph drawing, or **layout**, becomes an algorithmic problem.

We want to draw graphs with **as few crossing edges as possible** (less clutter), **grouping as many vertices as possible** that form natural clusters through the many edges between them, and using an efficient algorithm so that computing the layout of the graph takes the **least time** possible.

A **layout algorithm** needs to find the *right* locations of the nodes, and draw the edges by avoiding overlapping, making them visually appealing, bundling closer or similar edges, minimizing edge lengths or bends, and minimizing the space utilized.  

The **layout** of a graph can be computed by using one of the following approaches:

**Force-directed layouts:** the graph is drawn according to a system of forces based on physical metaphors such as springs or molecular mechanics. Nodes repel each other whereas edges pull their endpoints closer, so that the produced layout tends to have short edges, well-separated vertices, and few edge crossings. This is usually achieved by minimizing an energy function, for example with gradient descent.

<img src="../images/force1.png">
<img src="../images/force2.jpg">
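To make the force metaphor concrete, here is a toy spring embedder in plain Python. It is a simplified sketch in the spirit of the Fruchterman-Reingold approach, not the implementation used by any particular library: the force formulas, the preferred edge length `k`, the step size, and the cooling factor are all assumptions chosen for illustration.

```python
import math
import random


def force_directed_layout(vertices, edges, iterations=200, seed=0):
    """Toy force-directed layout: all pairs repel, edges pull endpoints together."""
    rng = random.Random(seed)
    pos = {v: [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for v in vertices}
    nodes = list(pos)
    k = 1.0 / math.sqrt(max(len(nodes), 1))  # preferred edge length (assumed heuristic)
    step = 0.1                               # maximum move per iteration

    for _ in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}

        def apply(u, v, magnitude):
            """Positive magnitude pushes u and v apart, negative pulls them together."""
            dx = pos[u][0] - pos[v][0]
            dy = pos[u][1] - pos[v][1]
            dist = math.hypot(dx, dy) or 1e-9
            f = magnitude(dist)
            disp[u][0] += dx / dist * f
            disp[u][1] += dy / dist * f
            disp[v][0] -= dx / dist * f
            disp[v][1] -= dy / dist * f

        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                apply(u, v, lambda d: k * k / d)  # repulsion between all pairs
        for u, v in edges:
            apply(u, v, lambda d: -d * d / k)     # attraction along edges

        for v in nodes:
            length = math.hypot(disp[v][0], disp[v][1])
            if length > 0:
                scale = min(length, step) / length  # cap the move at `step`
                pos[v][0] += disp[v][0] * scale
                pos[v][1] += disp[v][1] * scale
        step *= 0.98  # cool down so the layout settles

    return {v: (p[0], p[1]) for v, p in pos.items()}


# Hypothetical graph: two triangles joined by a single bridge edge.
vertices = ["a", "b", "c", "x", "y", "z"]
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("x", "y"), ("y", "z"), ("z", "x"), ("c", "x")]
layout = force_directed_layout(vertices, edges)
for v in vertices:
    print(v, round(layout[v][0], 2), round(layout[v][1], 2))
```

The all-pairs repulsion makes each iteration quadratic in the number of vertices, which is one reason large graphs need more elaborate layout algorithms.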

**Layered layouts:** the nodes of the graph are arranged into horizontal (or vertical) layers so that most edges go downwards from one layer to the next, and the nodes within each layer are ordered to minimize crossings. These layouts are better suited for acyclic (e.g. trees) or almost acyclic graphs.

<img src="../images/layered.png">
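The first step of a layered layout, assigning each node to a layer, can be sketched as follows. This assumes the graph is a directed acyclic graph and uses longest-path layering, one common choice among several; ordering the nodes within each layer to reduce crossings is a separate step not shown here. The function name and example graph are hypothetical.

```python
def assign_layers(vertices, edges):
    """Longest-path layering of a DAG: each vertex sits one layer below its deepest parent."""
    parents = {v: [] for v in vertices}
    for u, v in edges:
        parents[v].append(u)
    layer = {}

    def depth(v):
        if v not in layer:
            # Vertices without parents land on layer 0.
            layer[v] = 1 + max((depth(p) for p in parents[v]), default=-1)
        return layer[v]

    for v in vertices:
        depth(v)
    return layer


# Hypothetical DAG; the input must not contain cycles.
vertices = ["a", "b", "c", "d"]
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("a", "d")]

print(assign_layers(vertices, edges))  # {'a': 0, 'b': 1, 'c': 1, 'd': 2}
```

With this assignment every edge points from a lower-numbered layer to a higher-numbered one, so all edges can be drawn going downwards.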

**Circular layouts:** the nodes of the graph are placed on a circle, choosing carefully the ordering of the nodes around the circle to reduce crossings and place adjacent nodes close to each other. Edges may be drawn either as chords of the circle or as arcs inside or outside of the circle. Circular layouts are especially useful for **space utilization** when drawing trees. 

<img src="../images/circ1a.png">
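A circular layout is simple to compute, and the effect of the node ordering on crossings can be counted directly. In the sketch below, edges are assumed to be drawn as straight chords, so two edges cross exactly when their endpoints alternate around the circle; the function names and the example cycle are assumptions for illustration.

```python
import math
from itertools import combinations


def circular_layout(order, radius=1.0):
    """Place vertices evenly on a circle, following the given order."""
    n = len(order)
    return {
        v: (radius * math.cos(2 * math.pi * i / n),
            radius * math.sin(2 * math.pi * i / n))
        for i, v in enumerate(order)
    }


def count_crossings(order, edges):
    """Count pairs of chords that cross for a given circular ordering."""
    idx = {v: i for i, v in enumerate(order)}
    crossings = 0
    for (a, b), (c, d) in combinations(edges, 2):
        if len({a, b, c, d}) < 4:
            continue  # chords sharing an endpoint do not cross
        lo, hi = sorted((idx[a], idx[b]))
        # The chords cross when exactly one endpoint of (c, d) lies between a and b.
        if (lo < idx[c] < hi) != (lo < idx[d] < hi):
            crossings += 1
    return crossings


# Hypothetical 4-cycle drawn with two different orderings.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
print(count_crossings(["a", "b", "c", "d"], edges))  # 0
print(count_crossings(["a", "c", "b", "d"], edges))  # 1
positions = circular_layout(["a", "b", "c", "d"])
```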


---

**Edges can be highlighted with colors, more attributes can be visualized at the perimeter of vertices**, and similar edges can be **bundled** together to create a more effective visualization.

<img src="../images/circ2.jpg">
<img src="../images/circ3.png">
<img src="../images/circ4.png">


**Arc diagrams:** vertices are placed on a line; edges may be drawn as semicircles above or below the line, or as smooth curves linked together from multiple semicircles.

<img src="../images/arc1.png">
<img src="../images/arc2.png">
<img src="../images/arc3.png">
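The geometry of a basic arc diagram is straightforward: once the vertices have positions on a line, each edge becomes a semicircle centered midway between its endpoints. The sketch below only computes that geometry (no drawing); the function name, the dictionary keys, and the example graph are hypothetical.

```python
def arc_diagram(order, edges, spacing=1.0):
    """Place vertices on the x-axis and describe each edge as a semicircle."""
    x = {v: i * spacing for i, v in enumerate(order)}
    arcs = []
    for u, v in edges:
        arcs.append({
            "edge": (u, v),
            "center": (x[u] + x[v]) / 2,      # midpoint between the endpoints
            "radius": abs(x[u] - x[v]) / 2,   # half the distance along the line
        })
    return x, arcs


# Hypothetical graph with a fixed vertex order along the line.
order = ["a", "b", "c", "d"]
edges = [("a", "b"), ("a", "d"), ("b", "c")]

x, arcs = arc_diagram(order, edges)
for arc in arcs:
    print(arc["edge"], arc["center"], arc["radius"])
# ('a', 'b') 0.5 0.5
# ('a', 'd') 1.5 1.5
# ('b', 'c') 1.5 0.5
```

As with circular layouts, the vertex order determines how long the arcs are and how often they cross.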

---

**Adjacency matrix:** sometimes the grouping patterns in an adjacency matrix are also useful as a visualization of clusters, if the rows and columns are carefully ordered to make these groupings visible.

<img src="../images/adj1a.png">
<img src="../images/adj1b.png">

If the data set contains natural locations for the objects (e.g. airport coordinates), those locations can be chosen as natural vertex locations; but sometimes **freeing the data set from these coordinates and using force-directed layouts can lead to interesting discoveries.**

---

## Network Visualization

We can use networks to visualize data sets that contain **relationships between entities**. Some examples are: 

---

**Biological networks:** protein–protein interaction networks, metabolic networks, signaling pathways, etc.

<img src="../images/bio1.png">
<img src="../images/bio2.jpg">

---

**Social networks:** social media networks (Facebook friends),  friendship and acquaintance networks, collaboration graphs, disease transmission, etc. 

<img src="../images/soc1.png">
<img src="../images/soc2.png">

---

**Other networks:** traffic networks (airline traffic, internet traffic, etc.), the electrical grid, etc.

By employing a variety of layouts and network types, we can explore the relations between large numbers of nodes and **discover patterns such as clusters or communities** (densely connected subsets), **critical paths**, shortest paths, central nodes, etc.

<img src="../images/net1.png">
<img src="../images/net2.png">

---
### Social Network Analysis

Social networks are **connections between people**. Networks can also exist between people and work products, for example in a software version control system or a record of Wikipedia page edits.

In a social network, nodes are people. In a large network, we need to decide what criteria a person must meet to exist "as a node". For example, if a social network is constructed from observations about people being in the same place at the same time, that presence is the criterion. If a social network is constructed from communication records, as in a software version control system or discussion forum, the criteria are different.

Edges are the connections between nodes. The number, type, and frequency of edges between nodes underlie most of the measures of social networks, and have a strong influence on how these networks are visualized.

---

#### Key Social Network Measures

**Social network analysis** is a substantial field of inquiry and statistical development. As with other families of statistical methods, the number of distinct statistics one can calculate is large, but a few measures are conceptually central to much of social network analysis.

1. **Degree Centrality**: The number of connections (edges) between a node and other nodes. The higher the degree centrality, the more important, or central, a node is in a network. In a directed network there are two kinds: in-degree and out-degree. In-degree centrality is often a proxy for popularity: if people frequently interact with you, you are likely "popular". The reasons for that popularity depend on the type of network (e.g. a famous actor is likely popular because people find them interesting). Out-degree centrality can be thought of as an indicator of gregariousness: if you message a lot of people, you might be considered outgoing.

2. **Betweenness Centrality**: Measures how often a node lies on the shortest paths between other nodes. A person with high betweenness centrality acts as a broker between two networks; such people are often called "connectors". For example, somebody with a strong network in the defense contractor industry and a strong set of connections inside the Pentagon will have a high betweenness centrality between those two networks, compared to a person without strong connections in both. The same idea illustrates, for example, why high-betweenness nodes are important in health-related social networks.

The following shows the **betweenness** as hue from red (zero) to blue (maximum).

<img src="../images/between.png">
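Both measures can be computed on a small example. The sketch below uses Brandes' algorithm for betweenness on an unweighted, undirected graph and reports unnormalized values (each pair of nodes counted once); the two-group network with a single "broker" is a hypothetical example, not the network in the figure.

```python
from collections import deque


def betweenness_centrality(vertices, edges):
    """Unnormalized betweenness for an unweighted, undirected graph (Brandes' algorithm)."""
    adj = {v: [] for v in vertices}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    score = {v: 0.0 for v in vertices}
    for s in vertices:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in vertices}
        sigma = {v: 0 for v in vertices}
        dist = {v: -1 for v in vertices}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate each node's share of shortest paths, farthest nodes first.
        delta = {v: 0.0 for v in vertices}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted once from each endpoint.
    return {v: b / 2 for v, b in score.items()}


# Hypothetical network: two tight groups connected only through "broker".
vertices = ["a1", "a2", "broker", "b1", "b2"]
edges = [("a1", "a2"), ("a1", "broker"), ("a2", "broker"),
         ("broker", "b1"), ("broker", "b2"), ("b1", "b2")]

degree = {v: sum(v in e for e in edges) for v in vertices}
between = betweenness_centrality(vertices, edges)
print(degree["broker"], between["broker"])  # 4 4.0
print(degree["a1"], between["a1"])          # 2 0.0
```

The broker lies on the only shortest path for each of the four cross-group pairs, which is why its betweenness is 4 while every other node scores 0.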

---

## Tree Visualization

Trees are **acyclic connected graphs**, as we have seen above. Their layouts are similar to the network layouts; we usually draw them using layered or circular approaches.

**Space utilization** is an important issue for tree drawing, because as the hierarchy goes down, there are **more and more** nodes to be drawn. Circular layouts can be very effective for trees. 


**The example below shows how nodes higher in the hierarchy are sparser, while nodes lower in the hierarchy get denser.**

<img src="../images/tree1.gif">

Also, we can visualize trees in 2D using **containment as the visual channel** that encodes hierarchy; those trees are called **treemaps**.

<img src="../images/tree2.png">

Here we can see different layouts; the **radial (circular) layout** helps to utilize the space more efficiently.
<img src="../images/tree3.png">

This is a good example of utilizing space in a layered approach:

<img src="../images/tree4.png">

The following shows a treemap example. 

<img src="../images/treemap.png">
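The containment idea behind treemaps can be sketched with the "slice-and-dice" scheme, one of the simplest treemap layouts: each node's rectangle is split among its children in proportion to their sizes, alternating the split direction at each level. The nested-dictionary tree format (`name`, `size`, `children`) is an assumption made for this example, and the source does not state which treemap algorithm the figure uses.

```python
def total_size(node):
    """Size of a leaf, or the summed size of all leaves below an inner node."""
    if "children" not in node:
        return node["size"]
    return sum(total_size(child) for child in node["children"])


def slice_and_dice(node, x, y, width, height, depth=0):
    """Return {leaf name: (x, y, width, height)} with area proportional to size."""
    if "children" not in node:
        return {node["name"]: (x, y, width, height)}
    rects = {}
    total = total_size(node)
    offset = 0.0  # fraction of the parent rectangle already used
    for child in node["children"]:
        share = total_size(child) / total
        if depth % 2 == 0:
            # Even depth: split left to right.
            rects.update(slice_and_dice(
                child, x + offset * width, y, share * width, height, depth + 1))
        else:
            # Odd depth: split top to bottom.
            rects.update(slice_and_dice(
                child, x, y + offset * height, width, share * height, depth + 1))
        offset += share
    return rects


# Hypothetical hierarchy with leaf sizes 2, 1, and 1.
tree = {"name": "root", "children": [
    {"name": "A", "size": 2},
    {"name": "B", "children": [
        {"name": "B1", "size": 1},
        {"name": "B2", "size": 1},
    ]},
]}

rects = slice_and_dice(tree, 0, 0, 4, 4)
for name, rect in rects.items():
    print(name, rect)
```

In this example the leaf `A` gets half of the 4 x 4 square, and `B1` and `B2` split the other half, so the areas (8, 4, 4) are proportional to the sizes (2, 1, 1).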

And this shows a **Reingold-Tilford** tree in radial layout. 
<img src="../images/rt.png">



## Deeper Diving Links 

The following links show many examples of static and dynamic network visualizations in many domains.

[Dynamic Graph Visualization](http://dynamicgraphs.fbeck.com/)

[D3 Gallery](https://github.com/d3/d3/wiki/Gallery)


[Visual Complexity](http://www.visualcomplexity.com/vc/)



**Standalone Software for Graph Visualization and/or Analysis:**

The following software can be downloaded and used to explore and visualize large, complex network data sets:

 - [Gephi](https://gephi.org/features/)

 - [GraphViz](http://www.graphviz.org/Gallery.php)

 - [Gruff](http://franz.com/agraph/gruff/)

 - [Cytoscape](http://www.cytoscape.org/)



**References:**

A good summary of graph theory concepts: 

 - [Graph Theory](http://cs.bme.hu/fcs/graphtheory.pdf)

