# CAI Lab Session 8: Network analysis

In this session you will:

- learn about the `igraph` package for analyzing networks
- compute several descriptive measures of networks
- work on several network models seen in the theory class

## 1. Introduction

In this session we introduce the `igraph` software package for network analysis. The accompanying notebook `igraph.ipynb` contains examples of how to generate and plot graphs and how to compute several descriptive measures over them.
Please read it and make sure you understand what is going on. Once you are familiar with igraph's functionality, you can go on to solve the following tasks.

## 2. Analyzing network models

In class you have seen three main random network models:

**Erdős-Rényi model (ER model).**
The ER model takes two parameters:
$n$, the number of vertices in the resulting network, and
$p$, the probability of having an edge between any pair of nodes.
A graph following this model is generated by connecting pairs of vertices with probability $p$, independently for each pair of vertices.
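
The generation rule can be sketched in a few lines of plain Python. This is only an illustration of the definition, not how `igraph` implements it; the helper name `er_edges` and the `seed` argument are made up for this example. In practice you would call `Graph.Erdos_Renyi(n=..., p=...)`.

```python
import itertools
import random


def er_edges(n, p, seed=None):
    """Return the edge list of one G(n, p) sample (hypothetical helper)."""
    rng = random.Random(seed)
    # Visit every unordered pair of vertices once and keep it with probability p,
    # independently of all other pairs.
    return [(u, v) for u, v in itertools.combinations(range(n), 2)
            if rng.random() < p]


edges = er_edges(n=50, p=0.1, seed=1)
print(len(edges), "edges out of", 50 * 49 // 2, "possible pairs")
```

On average the sample contains $p \cdot n(n-1)/2$ edges, so the edge count printed above should be in the vicinity of 122.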

**Watts-Strogatz model (WS model).**
The WS model takes two parameters as well:
$n$, the number of vertices in the resulting network, and
$p$, the probability of rewiring the edges in the initial network.
A graph following this model is generated by initially laying all nodes out in a circle, and connecting each node to its four closest nodes. After that, we randomly reconnect each edge with probability $p$.
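
A minimal plain-Python sketch of this procedure is shown below, under some assumptions the description above leaves open: the new endpoint of a rewired edge is drawn uniformly at random, and self-loops and duplicate edges are avoided. The helper name `ws_adjacency` and its parameters `k` and `seed` are invented for the example; `igraph` offers `Graph.Watts_Strogatz` for real use.

```python
import random


def ws_adjacency(n, p, k=2, seed=None):
    """Ring lattice with k neighbours per side, then rewiring (hypothetical helper).

    Requires n > 2 * k so that the lattice has no duplicate edges.
    """
    rng = random.Random(seed)
    # Ring lattice: node u is linked to u+1, ..., u+k (mod n); k=2 gives 4 neighbours.
    lattice = sorted((u, (u + j) % n) for u in range(n) for j in range(1, k + 1))
    adj = {u: set() for u in range(n)}
    for u, v in lattice:
        adj[u].add(v)
        adj[v].add(u)
    for u, v in lattice:
        if rng.random() < p:
            # Assumption: keep endpoint u, pick a new endpoint uniformly among
            # nodes that would create neither a self-loop nor a duplicate edge.
            candidates = [w for w in range(n) if w != u and w not in adj[u]]
            if not candidates:
                continue
            w = rng.choice(candidates)
            adj[u].discard(v)
            adj[v].discard(u)
            adj[u].add(w)
            adj[w].add(u)
    return adj


adj = ws_adjacency(n=20, p=0.1, seed=3)
print(sorted(len(nb) for nb in adj.values()))
```

Rewiring moves edges but never creates or destroys them, so the total number of edges stays at $2n$ for every value of $p$.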

**Barabási-Albert model (BA model).**
The BA model takes two parameters:
$n$, the number of vertices in the resulting network, and
$m$, the number of edges a _new_ vertex brings to attach itself to existing nodes.
A graph in this model is generated by adding new nodes according to the _preferential attachment principle_ until the
resulting graph has the desired size.
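
Preferential attachment means that a new node picks its neighbours with probability proportional to their current degree. The sketch below illustrates one common way to do that in plain Python. The seed graph (a clique on $m+1$ nodes) is an assumption, since the description above does not fix the initial graph, and the helper name `ba_edges` is invented here; in `igraph` the corresponding generator is `Graph.Barabasi`.

```python
import random


def ba_edges(n, m, seed=None):
    """Grow a graph by preferential attachment (hypothetical helper)."""
    rng = random.Random(seed)
    # Assumption: start from a clique on m + 1 nodes.
    edges = [(u, v) for u in range(m + 1) for v in range(u + 1, m + 1)]
    # Every node appears in `stubs` once per incident edge, so a uniform draw
    # from this list selects a node with probability proportional to its degree.
    stubs = [x for e in edges for x in e]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for t in sorted(targets):
            edges.append((t, new))
            stubs.extend((t, new))
    return edges


edges = ba_edges(n=100, m=2, seed=7)
print(len(edges), "edges")
```

Each new node contributes exactly $m$ edges, so the result has $m(m+1)/2 + m(n-m-1)$ edges in total.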


Your task is to generate the following plots using `igraph`: 

1. Plot the clustering coefficient and the average shortest-path length as a function of the parameter $p$ of the WS model.
2. Plot the average shortest-path length as a function of the network size of the ER model.
3. Plot a histogram of the degree distribution of a BA network. What distribution does this follow? Can you describe it?

For plot (1), note that in order to show both measures in the same figure, the clustering coefficient and the average shortest-path length are scaled to lie within the range $[0,1]$. This is achieved by dividing each value by the value obtained at the left-most point, that is, at $p=0$.

For plot (2), you will have to experiment with appropriate values of $p$, which may depend on the parameter $n$. For large values of $n$ your code may take too long, so compute values only for sizes $n$ that are reasonable for you. Also, make sure that you choose values of $p$ that result (with high probability) in connected graphs. To achieve this, you can use a result from [this famous paper](https://snap.stanford.edu/class/cs224w-readings/erdos60random.pdf) stating (in the following, think of $\epsilon$ as a small positive real number):

- If $p < \frac{(1-\epsilon)\ln n}{n}$ then a graph in $G(n, p)$ will almost surely contain isolated vertices, and thus be disconnected
- If $p > \frac{(1+\epsilon)\ln n}{n}$ then a graph in $G(n, p)$ will almost surely be connected
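
As a quick illustration of the second bound, the snippet below evaluates $(1+\epsilon)\ln n / n$ for a few sizes. The function name `connectivity_threshold` and the default `eps=0.1` are arbitrary choices for this sketch; keep in mind that the result is asymptotic, so for small $n$ a larger margin may be needed.

```python
import math


def connectivity_threshold(n, eps=0.1):
    """Edge probability slightly above ln(n)/n (eps is an assumed margin)."""
    return (1 + eps) * math.log(n) / n


for n in (100, 1000, 10000):
    print(n, round(connectivity_threshold(n), 5))
```

The threshold shrinks as $n$ grows, which is why a single fixed $p$ is not a good choice across all network sizes.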

For plot (3), choose a network that is large enough for the degree distribution to show the shape expected from this model.

In [19]:
import pandas as pd
import altair as alt
from igraph import Graph

### 1. Plot the clustering coefficient and the average shortest-path length as a function of the parameter $p$ of the WS model.

In [26]:
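# p = 0 comes first: it is the regular ring lattice used for normalization
ps = [0, 0.0001, 0.0003, 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1]
Ls = []  # average shortest-path length for each p
Cs = []  # clustering coefficient for each p

for p in ps:
    # 1-D ring of 1000 nodes, each linked to its 4 closest nodes (nei=2 per side)
    g = Graph.Watts_Strogatz(dim=1, size=1000, nei=2, p=p)
    Ls.append(g.average_path_length())
    Cs.append(g.transitivity_undirected())

# scale both measures by their value at p = 0
Ls = [L/Ls[0] for L in Ls]
Cs = [C/Cs[0] for C in Cs]

In [28]:
# p = 0 cannot be drawn on a log axis, so it is dropped after normalization
df = pd.DataFrame({'path_length': Ls, 'clustering': Cs, 'p': ps})
df = df[df['p'] > 0]

# blue line: normalized average shortest-path length
c1 = alt.Chart(df).mark_line(
    point=True
).encode(
    x=alt.X('p:Q', scale=alt.Scale(type='log')),
    y=alt.Y('path_length:Q', title='value relative to p = 0')
)

# red line: normalized clustering coefficient
c2 = alt.Chart(df).mark_line(
    point=True,
    color='red'
).encode(
    x=alt.X('p:Q', scale=alt.Scale(type='log')),
    y=alt.Y('clustering:Q', title='value relative to p = 0')
)

c1 + c2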


### 2. Plot the average shortest-path length as a function of the network size of the ER model.

### 3. Plot a histogram of the degree distribution of a BA network. What distribution does this follow? Can you describe it?

## 4. Rules of delivery

Nothing to deliver.