**Data Visualization course - winter semester 23/24 - FU Berlin**

*Tutorials adapted from the [Information Visualization](https://infovis.fh-potsdam.de/tutorials/) course at the FH Potsdam*


# Tutorial 3: Network analysis

When we are interested in the interactions among entities and the structures that emerge from their relations, we can model the entities and their relations as a graph, which is fully defined by its nodes and edges. Treating a network as a graph allows us to study the overall connectivity of its nodes, identify the formation of clusters, or position particular nodes within their neighborhood. In this tutorial, we will get started with network analysis, covering some key concepts and techniques. But please note: this is just an introduction. Network analysis has grown into its own extensive field, known as network science.

## 🛒 1. Prepare 

First, we will need to assemble our tools: Apart from Altair, we will use [NetworkX](https://networkx.github.io), a powerful network-analysis library. As a bridge between Altair and NetworkX, we are using [nx_altair](https://github.com/Zsailer/nx_altair). 

You will have to install `nx_altair` (and possibly `networkx`) via `poetry install`, or `poetry add nx_altair` if it is not yet a dependency of your project. So let's get this out of the way first, and then import all the libraries we will use in this notebook:

In [None]:
import altair as alt
import networkx as nx
import nx_altair as nxa

### Generate a random graph

To get started quickly, we can create a random graph. The `fast_gnp_random_graph` function uses the Erdős-Rényi model to generate a graph from two main parameters: the number of nodes **`n`** and the probability **`p`** that any given pair of nodes is connected:

In [None]:
G = nx.fast_gnp_random_graph(n=100, p=.1)

nxa.draw_networkx(G)
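Because the graph is random, every run produces a different network. A minimal sketch: NetworkX's random generators accept a `seed` argument that makes the graph reproducible, and since each of the n(n-1)/2 possible pairs is connected with probability `p`, the number of edges should land near p · n(n-1)/2. The names `G1`, `G2` and `expected_edges` are just for illustration.

```python
import networkx as nx

n, p = 100, 0.1

# the same seed yields the same random graph on every run
G1 = nx.fast_gnp_random_graph(n=n, p=p, seed=42)
G2 = nx.fast_gnp_random_graph(n=n, p=p, seed=42)

# n(n-1)/2 possible pairs, each connected with probability p
expected_edges = p * n * (n - 1) / 2  # 495.0 for n=100, p=0.1

print(G1.number_of_edges(), "edges, expected about", expected_edges)
print("identical:", sorted(G1.edges()) == sorted(G2.edges()))
```

Try a few different seeds: the edge count fluctuates around the expected value but rarely strays far from it.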

✏️ *Play with the parameters of `fast_gnp_random_graph`, but go easy on the `n` …*

### Create a network from scratch

You can also create a graph by manually adding nodes and edges with the methods `add_node` and `add_edge`, respectively:

In [None]:
G = nx.Graph()

G.add_node("Ada")
G.add_node("Bob")
G.add_node("Cai")
G.add_node("Don")
G.add_node("Eva")

G.add_edge("Ada", "Bob")
G.add_edge("Ada", "Cai")
G.add_edge("Ada", "Eva")
G.add_edge("Bob", "Cai")
G.add_edge("Bob", "Don")
G.add_edge("Cai", "Don")

nxa.draw_networkx(G)


Note that you do not actually need to add a node explicitly if it is part of an edge: `add_edge` creates any missing endpoint nodes automatically.

✏️ *Comment out or remove the `add_node()` statements above (lines 3-7) and check the result!*
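A minimal standalone check of this behaviour: the graph (here called `G_demo`, a name chosen only for this example) starts empty, and a single `add_edge` call creates both endpoint nodes.

```python
import networkx as nx

G_demo = nx.Graph()
G_demo.add_edge("Ada", "Bob")  # neither node exists yet; both are created here

print(sorted(G_demo.nodes()))  # ['Ada', 'Bob']
print(G_demo.number_of_edges())  # 1
```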

An even more compact way to create a graph with its nodes and edges is to pass a list of edge tuples directly to the constructor:

In [None]:
G = nx.Graph([("Ada", "Bob"),
              ("Ada", "Cai"),
              ("Ada", "Eva"),
              ("Bob", "Cai"),
              ("Bob", "Don"),
              ("Cai", "Don")])
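For bulk additions to an existing graph, NetworkX also offers `add_nodes_from` and `add_edges_from`, which take any iterable of nodes or edge tuples. A short sketch using `add_edges_from` to show that it builds the same five-person network as the constructor call:

```python
import networkx as nx

edges = [("Ada", "Bob"), ("Ada", "Cai"), ("Ada", "Eva"),
         ("Bob", "Cai"), ("Bob", "Don"), ("Cai", "Don")]

# build once via the constructor ...
G_ctor = nx.Graph(edges)

# ... and once incrementally with the bulk method
G_bulk = nx.Graph()
G_bulk.add_edges_from(edges)

print(G_bulk.number_of_nodes(), G_bulk.number_of_edges())  # 5 6
# undirected edges may be reported in either orientation, so compare them as sets
same_edges = {frozenset(e) for e in G_ctor.edges()} == {frozenset(e) for e in G_bulk.edges()}
print(set(G_ctor.nodes()) == set(G_bulk.nodes()) and same_edges)  # True
```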

### Add attributes to nodes and edges

You can attach attributes to nodes and edges, either when adding them to the graph or later:


In [None]:
G = nx.Graph()

G.add_node(1, time='3pm')
G.nodes[1]['time']

In [None]:
G.nodes[1]['room'] = 5842
G.nodes.data()
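To collect one attribute for all nodes at once, `nx.get_node_attributes` returns a dictionary keyed by node that contains only the nodes which actually have that attribute. A small standalone sketch; the graph name `G_demo` and the attribute-less node `2` are hypothetical, added for illustration:

```python
import networkx as nx

G_demo = nx.Graph()
G_demo.add_node(1, time='3pm')
G_demo.nodes[1]['room'] = 5842
G_demo.add_node(2)  # hypothetical second node without any attributes

rooms = nx.get_node_attributes(G_demo, 'room')
print(rooms)  # {1: 5842}

# nodes(data=...) yields (node, value) pairs, using `default` for missing values
print(list(G_demo.nodes(data='room', default=None)))  # [(1, 5842), (2, None)]
```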

Here the nodes are identified by numbers, but as we have seen above, NetworkX can also use strings as ids (in fact, any hashable Python object can serve as a node).

In addition, you can add attributes to edges. A common way to distinguish between different strengths of connections is to assign weights to edges:

In [None]:
G.add_edge(1,2, weight=4.7)

You can also add or edit edge attributes later:

In [None]:
G.edges[1,2]['weight'] = 3.2
G.edges.data()
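Edge attributes can be read back in a similar way, and for many weighted edges at once `add_weighted_edges_from` takes `(u, v, weight)` triples. A sketch on a separate graph `G_demo`, where the nodes `3` and `4` and their weights are hypothetical:

```python
import networkx as nx

G_demo = nx.Graph()
G_demo.add_edge(1, 2, weight=4.7)
G_demo.edges[1, 2]['weight'] = 3.2

# hypothetical extra edges, given as (u, v, weight) triples
G_demo.add_weighted_edges_from([(2, 3, 1.5), (3, 4, 0.8)])

print(G_demo.get_edge_data(1, 2))  # {'weight': 3.2}
print(nx.get_edge_attributes(G_demo, 'weight'))  # {(1, 2): 3.2, (2, 3): 1.5, (3, 4): 0.8}

# sum of all edge weights
print(G_demo.size(weight='weight'))
```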

### Load a network dataset 

Network data can come in many formats, and thankfully NetworkX can read and write many of them, including [GEXF](https://networkx.github.io/documentation/stable/reference/readwrite/gexf.html), [GML](https://networkx.github.io/documentation/stable/reference/readwrite/gml.html), [GraphML](https://networkx.github.io/documentation/stable/reference/readwrite/graphml.html) and [JSON](https://networkx.github.io/documentation/stable/reference/readwrite/json_graph.html) (as used by D3.js). 

The co-occurrence network of characters in the novel *Les Misérables* (1862) by Victor Hugo is a common example dataset for network visualization. Let's load it. NetworkX cannot (yet) fetch and parse a JSON file as transparently as Pandas does so elegantly. Therefore, we download and parse the file with `requests` and then convert the resulting dictionary into a graph with `node_link_graph`:

In [None]:
import requests

url = "http://bost.ocks.org/mike/miserables/miserables.json"

lesmis = requests.get(url).json()

G = nx.readwrite.json_graph.node_link_graph(lesmis, multigraph=False)

nxa.draw_networkx(G, node_tooltip='name')
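It helps to see what `node_link_graph` expects. Assuming, as the lookups further below suggest, that the nodes in this file carry a `name` but no `id` field, NetworkX numbers them by their position in the `nodes` list, and each link's `source` and `target` refer to those positions. A minimal sketch with a hand-written, illustrative excerpt in that shape (the entries and values are not copied from the file). As far as we know, NetworkX 3.4 added an `edges` keyword naming the link key, which older versions reject, hence the fallback:

```python
from networkx.readwrite import json_graph

# illustrative excerpt in node-link format: nodes without an "id" field,
# links referring to nodes by their index in the "nodes" list
data = {
    "nodes": [{"name": "Myriel", "group": 1},
              {"name": "Napoleon", "group": 1},
              {"name": "Mlle.Baptistine", "group": 1}],
    "links": [{"source": 1, "target": 0, "value": 1},
              {"source": 2, "target": 0, "value": 8}],
}

try:
    # newer NetworkX: name the key that holds the links explicitly
    G_small = json_graph.node_link_graph(data, multigraph=False, edges="links")
except TypeError:
    # older NetworkX: no `edges` keyword, "links" is the default key
    G_small = json_graph.node_link_graph(data, multigraph=False)

print(list(G_small.nodes(data="name")))  # [(0, 'Myriel'), (1, 'Napoleon'), (2, 'Mlle.Baptistine')]
print(G_small.edges[0, 1])  # {'value': 1}
```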



## 🕸 2. Process

Once we have a graph representation of a network, we can carry out a range of processing steps, for example, to count its elements and generate some graph-theoretical metrics.


### Counting nodes and edges

For a start, we can get the number of nodes and edges:

In [None]:
G.number_of_nodes()

In [None]:
G.number_of_edges()

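There used to be a convenient **`info()`** function (akin to the Pandas function of the same name) that printed some basic stats, including the average degree, i.e., the number of connections an average node in this graph has. `nx.info()` was deprecated and then removed in NetworkX 3.0, so with current versions we print the graph for a short summary and compute the average degree ourselves:

In [None]:
print(G)
2 * G.number_of_edges() / G.number_of_nodes()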
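The average degree can be computed by hand: in an undirected graph, every edge adds 1 to the degree of each of its two endpoints (the handshake lemma), so the degrees sum to 2m, and the average degree of a graph with n nodes and m edges is 2m/n. A quick check on the five-person graph from earlier, rebuilt as `G_small` so the cell does not overwrite the Les Misérables graph `G`:

```python
import networkx as nx

G_small = nx.Graph([("Ada", "Bob"), ("Ada", "Cai"), ("Ada", "Eva"),
                    ("Bob", "Cai"), ("Bob", "Don"), ("Cai", "Don")])

degrees = dict(G_small.degree())
print(degrees)  # {'Ada': 3, 'Bob': 3, 'Cai': 3, 'Eva': 1, 'Don': 2}

# the degrees sum to twice the number of edges
print(sum(degrees.values()) == 2 * G_small.number_of_edges())  # True

avg_degree = 2 * G_small.number_of_edges() / G_small.number_of_nodes()
print(avg_degree)  # 2.4
```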

### Graph metrics

Networks can vary a lot in their number of edges relative to their number of nodes, which is captured by the **`density()`** of a network. The density ranges between 0 and 1: from no connections whatsoever to every node being connected to every other node. It is thus closely related to the probability `p` of two random nodes being connected, which we used further above: for a graph generated with `fast_gnp_random_graph`, the expected density is exactly `p`.

In [None]:
density = nx.density(G)
density
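For an undirected graph without self-loops, density is the number of edges m divided by the number of possible node pairs n(n-1)/2, i.e., d = 2m / (n(n-1)). A quick sketch comparing this formula with `nx.density` on the five-person graph from earlier, rebuilt as `G_small`:

```python
import networkx as nx

G_small = nx.Graph([("Ada", "Bob"), ("Ada", "Cai"), ("Ada", "Eva"),
                    ("Bob", "Cai"), ("Bob", "Don"), ("Cai", "Don")])

n = G_small.number_of_nodes()  # 5
m = G_small.number_of_edges()  # 6

manual_density = 2 * m / (n * (n - 1))  # 12 / 20
print(manual_density, nx.density(G_small))  # 0.6 0.6
```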

Another metric that interests network scientists is the shortest path between a given pair of nodes, e.g., we might want to know the shortest connection between two characters in the *Les Misérables* network. First, we look up the node ids of the two characters via their `name` attribute:

In [None]:
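names = ("Napoleon", "Jondrette")

# map each character's name to its node id, then look up both names in order
name_to_id = {data["name"]: node for node, data in G.nodes(data=True)}
ids = [name_to_id[name] for name in names]
ids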

In [None]:
path = nx.shortest_path(G, source=ids[0], target=ids[1])
path
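`nx.shortest_path` raises `nx.NetworkXNoPath` when no path exists, which can happen when a network consists of several disconnected components. A small sketch guarding against that with `nx.has_path`; the isolated node `'Zoe'` is hypothetical, added just for this demonstration:

```python
import networkx as nx

G_small = nx.Graph([("Ada", "Bob"), ("Ada", "Cai"), ("Ada", "Eva"),
                    ("Bob", "Cai"), ("Bob", "Don"), ("Cai", "Don")])
G_small.add_node("Zoe")  # hypothetical node without any connections

for source, target in [("Eva", "Don"), ("Eva", "Zoe")]:
    if nx.has_path(G_small, source, target):
        print(source, "->", target, nx.shortest_path(G_small, source=source, target=target))
    else:
        print(source, "->", target, "no path")
```

Alternatively, you can call `nx.shortest_path` directly and catch `nx.NetworkXNoPath`.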

✏️ *Of course, these are just their nondescript ids. What would it take to know their names?*

In [None]:
[G.nodes[node]["name"] for node in path]

The length of the path above is the number of edges between these nodes, which equals the number of elements in the list minus 1:

In [None]:
len(path)-1
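NetworkX can also compute this number directly with `nx.shortest_path_length`, without building the path list. A quick check on the five-person graph (Ada reaches Don via Bob or Cai, so the length is 2):

```python
import networkx as nx

G_small = nx.Graph([("Ada", "Bob"), ("Ada", "Cai"), ("Ada", "Eva"),
                    ("Bob", "Cai"), ("Bob", "Don"), ("Cai", "Don")])

small_path = nx.shortest_path(G_small, source="Ada", target="Don")
length = nx.shortest_path_length(G_small, source="Ada", target="Don")

print(len(small_path) - 1, length)  # 2 2
```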

## 🥗 3. Present

After computing a few graph metrics, we now have fertile ground for generating insightful network visualizations that go beyond the default encoding you have already seen above. For the following steps, we will continue with the *Les Misérables* network.

### Force-directed layouts

Let's generate a network visualization and add a few quick customizations that might help to make sense of the network. First, let's give the chart a bit more breathing room via the `properties()` call and add tooltips to the nodes via the `node_tooltip` parameter:

In [None]:
nxa.draw_networkx(G, node_tooltip="name").properties(width=500, height=500)

`spring_layout` is the default layout; it implements the Fruchterman-Reingold force-directed algorithm and takes several parameters. To adjust it, compute the node positions (`pos`) yourself with `nx.spring_layout` and pass them to `draw_networkx`.

✏️ *Have a look into the [documentation](https://networkx.github.io/documentation/stable/reference/generated/networkx.drawing.layout.spring_layout.html?highlight=spring_layout#networkx.drawing.layout.spring_layout) and try out other parameters, e.g. number of iterations:*

In [None]:
pos = nx.spring_layout(G, iterations=100)

nxa.draw_networkx(G, pos, node_tooltip="name").properties(width=500, height=500)
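Force-directed layouts start from random positions, so each run can look a bit different. A sketch for when you want a reproducible picture: pass a fixed `seed` to `nx.spring_layout`; its `k` parameter sets the optimal distance between nodes (larger values push nodes further apart). The cell uses NetworkX's built-in karate club graph (as `K`) so it runs on its own; the values `k=0.3` and `seed=7` are arbitrary choices.

```python
import networkx as nx
import numpy as np

K = nx.karate_club_graph()

# the same seed gives the same positions on every run
pos_a = nx.spring_layout(K, k=0.3, iterations=100, seed=7)
pos_b = nx.spring_layout(K, k=0.3, iterations=100, seed=7)

same = all(np.allclose(pos_a[node], pos_b[node]) for node in K)
print(same)  # True
```

With the notebook's graph `G`, the resulting `pos` dictionary is passed to `nxa.draw_networkx(G, pos, ...)` exactly as in the cell above.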

### Custom graph layouts

Since nx_altair generates network visualizations as Altair charts, we can decide much more about the visual encoding. The first choice is how to use the visual variables x- and y-position, in other words, how the layout of the network should be generated. `spring_layout` is the default graph layout that nx_altair uses, but NetworkX provides several other [graph layouts](https://networkx.github.io/documentation/stable/reference/drawing.html#module-networkx.drawing.layout).

✏️ *Replace `spring_layout` with another layout that you deem more useful:*
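As one possible starting point (not the only answer), here is a sketch computing two alternative layouts that need only NetworkX and NumPy: `circular_layout` places all nodes evenly on a circle, and `shell_layout` arranges them in concentric rings. It again uses the built-in karate club graph; splitting off the five highest-degree nodes as the inner ring is an arbitrary, illustrative choice.

```python
import networkx as nx
import numpy as np

K = nx.karate_club_graph()

pos_circle = nx.circular_layout(K)

# shell_layout takes a list of node lists, one per ring;
# here the five highest-degree nodes form the inner ring
by_degree = sorted(K.nodes(), key=K.degree, reverse=True)
pos_shell = nx.shell_layout(K, nlist=[by_degree[:5], by_degree[5:]])

# in the circular layout, every node lies (up to rounding) at distance 1 from the origin
radii = [np.linalg.norm(xy) for xy in pos_circle.values()]
print(round(min(radii), 3), round(max(radii), 3))
```

In the notebook, pass either dictionary as `pos` to `nxa.draw_networkx(G, pos, ...)` after computing it for the *Les Misérables* graph `G`.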


### Challenge visualization

So it is time for another challenge! This time, try to visualize a graph where each node is a country and an edge connects two countries if they share a common border. Visualize the graph using a spring layout and colour the nodes according to the total cases of the corresponding country. (A quick tip: the range of total cases is pretty wide, so I recommend applying a log transform to it first.) Below this cell, I have already started the task for you by importing everything you need and loading the necessary data: `covid_data` is our usual dataframe from Our World in Data, and `adjacency_list` is a dictionary mapping each node to a list of its neighbours.

In [None]:
# load the COVID-19 case data and the country adjacency list
import numpy as np
import pandas as pd
import requests

covid_data = pd.read_csv("https://covid.ourworldindata.org/data/owid-covid-data.csv")
adjacency_list = requests.get('https://raw.githubusercontent.com/P1sec/country_adjacency/master/country_adj.json').json()

In [None]:
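# one row per country, keeping the maximum of the cumulative total_cases
case_data = covid_data[['iso_code', 'total_cases', 'continent', 'location']].groupby('iso_code').max()
# log-transform, since case counts span several orders of magnitude; log1p(x) = log(1 + x)
case_data['total_cases (log scale)'] = np.log1p(case_data['total_cases'])
# drop rows with missing values (e.g., aggregates without a continent), then convert to {iso_code: {column: value}}
case_data = case_data.dropna().to_dict(orient='index')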

In [None]:
G = nx.Graph(adjacency_list)
pos = nx.spring_layout(G, iterations=30)

nx.set_node_attributes(G, case_data)

nxa.draw_networkx(G, pos,
                 linewidths=0,
                 node_color='total_cases (log scale)',
                 node_size=100,
                 node_tooltip=['location', 'total_cases'])
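The challenge code relies on two NetworkX behaviours that are easy to miss: `nx.Graph` accepts a dictionary of neighbour lists directly, and `nx.set_node_attributes` with a dictionary of dictionaries skips keys that are not nodes, so countries without case data simply end up without attributes. A toy sketch; the country codes `AAA` to `DDD`, the graph name `G_toy` and the case numbers are all made up:

```python
import networkx as nx
import numpy as np

# hypothetical adjacency list: node -> list of neighbours
adjacency = {"AAA": ["BBB", "CCC"], "BBB": ["AAA"], "CCC": ["AAA"]}

# made-up case numbers; "DDD" is not a node and "CCC" has no entry
cases = {"AAA": {"total_cases": 999}, "BBB": {"total_cases": 9}, "DDD": {"total_cases": 5}}
for attrs in cases.values():
    attrs["total_cases (log scale)"] = np.log1p(attrs["total_cases"])  # log(1 + x)

G_toy = nx.Graph(adjacency)  # a dict of lists is read as an adjacency list
nx.set_node_attributes(G_toy, cases)

print(sorted(G_toy.nodes()))  # ['AAA', 'BBB', 'CCC']
print(G_toy.nodes["CCC"])  # {}
print(round(G_toy.nodes["AAA"]["total_cases (log scale)"], 3))  # 6.908
```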

## Sources


Tutorials & Documentation
- [Tutorial — NetworkX 2.4 documentation](https://networkx.github.io/documentation/stable/tutorial.html)
- [Exploring and Analyzing Network Data with Python](https://programminghistorian.org/en/lessons/exploring-and-analyzing-network-data-with-python)
- [nx_altair on GitHub](https://github.com/Zsailer/nx_altair)
