## Random Network Models

* Assess the probability distribution of network data
* Simulate networks that resemble real networks in some properties but are otherwise random

### Stochastic Block Model
<table>
    <tr>
        <td><img src='Images/sbm_matrix.png' width='400'></td>
        <td><img src='Images/sbm_network.png'></td>
    </tr>
</table>
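
In a stochastic block model, nodes are partitioned into blocks and the probability of an edge depends only on the blocks of its two endpoints. The following is a minimal sketch using igraph's `sample_sbm()`; the block sizes and connection probabilities are made-up values for illustration and are not taken from the figures above.

```r
library(igraph)
set.seed(1)

block_sizes <- c(30, 20)                          # hypothetical block sizes
pref <- matrix(c(0.50, 0.05,
                 0.05, 0.40), nrow = 2, byrow = TRUE)  # block-to-block edge probabilities

# Sample an undirected network with 50 nodes from the block model
g_sbm <- sample_sbm(sum(block_sizes), pref.matrix = pref, block.sizes = block_sizes)

# Label each node with its block and check where the edges fall
block <- rep(seq_along(block_sizes), block_sizes)
ends_mat <- ends(g_sbm, E(g_sbm), names = FALSE)
within <- block[ends_mat[, 1]] == block[ends_mat[, 2]]
table(within)                                     # most edges should be within blocks
```

With these probabilities, edges inside a block are much denser than edges between blocks, which is what produces the block pattern in the adjacency matrix.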

### Exponential Random Graph Model (ERGM)

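$$
P(A=a)=\frac{\exp(c \cdot x(a))}{k(c)},
$$

where $a$ is a possible network, $x(a)$ is a vector of network statistics (e.g., the numbers of edges and triangles), $c$ is the vector of coefficients, and $k(c)$ is the normalizing constant that makes the probabilities sum to one.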
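
As a worked special case (a sketch; the choice of statistic is an assumption made for illustration), take $x(a)$ to be the number of edges of an undirected network on $n$ nodes, $x(a)=\sum_{i<j} a_{ij}$. The probability then factorizes over pairs of nodes:

$$
P(A=a)=\frac{\exp\left(c\sum_{i<j}a_{ij}\right)}{k(c)}=\prod_{i<j}\frac{e^{c\,a_{ij}}}{1+e^{c}},\qquad k(c)=\left(1+e^{c}\right)^{n(n-1)/2}
$$

so each pair is connected independently with probability $p=e^{c}/(1+e^{c})$, which is the $G(n,p)$ model described in the next section.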

### Erdos-Renyi Random Networks

* A uniform distribution on networks with the same numbers of nodes and edges.
* A baseline model to assess whether observed network properties differ from random.

Analogous to the uniform distribution on numbers, the Erdos-Renyi random network model specifies a uniform distribution on networks with the same numbers of nodes and edges. Therefore, it is commonly used as a baseline model to assess whether observed network properties differ from random. One example is to examine if the observed network is more clustered than random. To do this, one can compute the clustering coefficient - how likely people who share common friends are themselves friends - of the observed network, simulate Erdos-Renyi random networks and compute their clustering coefficients, and compare the observed values with the random simulations.

There are two variants of the model, $G(n,m)$ and $G(n,p)$:
* The $G(n,m)$ model generates a random network with $n$ nodes and $m$ edges as follows:
    * Start from an empty network with $n$ nodes and no edges
    * Choose $m$ distinct pairs of nodes at random
    * Connect each pair with an edge
* Equivalent to sampling from a uniform distribution on all networks with $n$ nodes and $m$ edges.

* The $G(n,p)$ model generates a random network with $n$ nodes as follows:
    * Start from an empty network with $n$ nodes and no edges
    * For each pair of nodes, connect the pair with probability $p$, independently of all other pairs
    * The expected number of edges is $\frac{n(n-1)}{2} p$.
* Equivalent to drawing $\frac{n(n-1)}{2}$ independent Bernoulli variables with mean $p$, one per pair of nodes.

* The two variants are equivalent when $m=\frac{n(n-1)}{2}p$ in the large network limit, i.e., as $n\rightarrow\infty$ and $pn^2\rightarrow\infty$.
* $G(n,p)$ is usually used to prove theoretical properties because every pair of nodes is connected independently, so many quantities follow a simple binomial distribution.
    * The degree distribution is $Binomial(n-1,p)$, which is approximately $Poisson((n-1)p)$ when $n$ is large and $p$ is small.
* $G(n,m)$ is often used as a baseline model for comparison with real networks since its number of edges is pre-specified.
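
The two variants can be compared side by side. This is a minimal sketch using igraph's `sample_gnp()` and `sample_gnm()`; the values of `n` and `p` are arbitrary choices for illustration.

```r
library(igraph)
set.seed(42)

n <- 1000
p <- 0.01
m <- round(n * (n - 1) / 2 * p)   # expected number of edges under G(n, p)

g_np <- sample_gnp(n, p)          # each pair connected independently with probability p
g_nm <- sample_gnm(n, m)          # exactly m edges chosen uniformly at random

ecount(g_nm)                      # always equal to m
ecount(g_np)                      # random, but close to m
mean(degree(g_np))                # close to (n - 1) * p
```

The edge count of `g_nm` is fixed by construction, while that of `g_np` fluctuates around its expectation; this is the practical difference between the two variants.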

#### Caveat

* Both models assume that each pair of nodes behaves independently.
* A good theoretical model or a baseline, but not a good model for real networks. 

That is to say, if A and B are friends and B and C are friends, whether or not A and C are friends is completely random and does not depend on their relationships with B. We know that this is not true in real social networks, where friends' friends also tend to be friends. That is why the Erdos-Renyi random network is used as a theoretical model or a baseline, but not a model for real networks. 

In [None]:
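library(igraph)  # the network functions used in this notebook come from the igraph package
network=erdos.renyi.game(N, M, type='gnm')  # G(n, m): N nodes and M edges, both defined beforehand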

### Configuration Model

* Generate random networks with any given degree distribution.
* A good model for real networks with heterogeneous degree distribution.
* A uniform distribution on networks with the same degree sequence.

Unlike the Erdos-Renyi random network model, the configuration model preserves the degree distribution observed in a real network, which makes it one of the most commonly used null models in network analysis. Analogous to the uniform distribution on numbers, the configuration model specifies a uniform distribution on networks with the same degree sequence (or degree distribution). Therefore, it is a more realistic model than Erdos-Renyi.

Since it is mostly used as a null model for analyzing real networks, the configuration model is defined as a randomization process of a given network. We discuss the details below.

Given a simple network with $N$ nodes and $M$ edges, we can construct a configuration random network as follows:

<img src='Images/config.png'>

* For each edge in the network, cut it into two halves - stubs.
* Pick two from all the available stubs at random and join them as an edge.
* Repeat the step above until no stubs are available. 
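
The stub-matching steps above can be sketched in base R. This is an illustrative sketch, not igraph's implementation; the degree sequence `d` is a made-up example. Shuffling all stubs and joining consecutive pairs is one way to pick stubs at random until none are left.

```r
set.seed(7)

d <- c(3, 2, 2, 2, 1)                   # hypothetical degree sequence; its sum must be even

# Node i contributes d[i] stubs (half-edges)
stubs <- rep(seq_along(d), times = d)

# Shuffle the stubs, then join them two at a time into edges
stubs <- sample(stubs)
edges <- matrix(stubs, ncol = 2, byrow = TRUE)
edges                                   # one row per edge; rows may be self-loops or repeated pairs

# Every node keeps its original degree (a self-loop counts twice)
tabulate(edges, nbins = length(d))
```

Because stubs are joined blindly, the result can contain self-loops and multiple edges between the same pair of nodes, which is why a simplification step is often applied afterwards.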

In [None]:
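network=sample_degseq(d)  # d: degree sequence, e.g. degree(G) of an observed network G; stubs are matched at random
network=simplify(network)  # remove self-loops and multiple edges created by the matching (this can lower some degrees)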

### Small World Model

* Small average distance
* High clustering coefficient

#### Clustering Coefficient (Transitivity)

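* How many of your friends are themselves friends?
* The fraction of connected triplets that are closed triangles
    * A connected triplet is three nodes where at least two pairs of them are connected
    * A triangle means that every pair of the three nodes is connected

$$
C=\frac{3 N_\Delta}{N_{\wedge}}
$$

where $N_\Delta$ is the number of triangles and $N_{\wedge}$ is the number of connected triplets; the factor 3 accounts for each triangle containing three connected triplets, one centered on each of its nodes.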
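
The formula can be checked by counting triangles and connected triplets directly. This is a minimal sketch that uses igraph's built-in Zachary karate club network as a stand-in example (an assumption; any simple undirected network would do).

```r
library(igraph)

g <- make_graph("Zachary")                    # stand-in example network

k <- degree(g)
n_triplets <- sum(choose(k, 2))               # connected triplets, counted by their center node
n_triangles <- sum(count_triangles(g)) / 3    # each triangle is counted once at each of its 3 nodes

C <- 3 * n_triangles / n_triplets
C
all.equal(C, transitivity(g, type = "global"))  # agrees with igraph's global transitivity
```

Counting triplets by center node means a node of degree $k$ contributes $\binom{k}{2}$ triplets, so a triangle contributes three of them, which matches the factor 3 in the numerator.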

#### Small World Network
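* $d\sim \log N$: the average distance grows only logarithmically with the number of nodes $N$
* $C \sim O(1)$: the clustering coefficient stays bounded away from zero as $N$ grows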
<img src='Images/smallworld.png'>


<img src='Images/distance_clustering.png'>
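
The trade-off between distance and clustering can be explored by varying the rewiring probability. This is a minimal sketch using igraph's `sample_smallworld(dim, size, nei, p)`; the network size, neighborhood, and probabilities are illustrative choices.

```r
library(igraph)
set.seed(5)

p_values <- c(0, 0.01, 0.1, 1)   # rewiring probabilities, from ring lattice to fully rewired

stats <- sapply(p_values, function(p) {
  # 1-dimensional lattice with 200 nodes, each linked to neighbors within 5 steps
  g <- simplify(sample_smallworld(1, 200, 5, p))
  c(distance = mean_distance(g), clustering = transitivity(g, type = "global"))
})
colnames(stats) <- p_values
round(stats, 3)
```

A small amount of rewiring is typically enough to shorten the mean distance sharply while the clustering coefficient stays close to that of the lattice.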

In [2]:
library(igraph)  # sample_smallworld() is provided by the igraph package
network=sample_smallworld(1, 100, 5, 0.05)  # dim=1, size=100 nodes, nei=5, rewiring probability p=0.05
network=simplify(network)  # remove self-loops and multiple edges created by rewiring


### Scale-Free Networks

* Generate networks with power-law degree distributions
* The rich get richer

#### Preferential Attachment

* Start with a single node
* At each step, introduce a new node
    * Pick an existing node with probability proportional to its degree
    * Connect the new node to this existing node
* In the long run, the degree distribution follows a power law
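
The steps above can be sketched in base R. This is an illustrative sketch rather than igraph's implementation: the second node necessarily attaches to the first, since a single isolated node has degree zero, and the number of nodes `n` is an arbitrary choice.

```r
set.seed(3)

n <- 2000
deg <- integer(n)       # current degree of each node
target <- integer(n)    # target[i]: the existing node that new node i attaches to

# The second node can only attach to the first one
deg[1:2] <- 1L
target[2] <- 1L

for (i in 3:n) {
  # Pick an existing node with probability proportional to its degree
  j <- sample.int(i - 1, 1, prob = deg[1:(i - 1)])
  target[i] <- j
  deg[i] <- 1L           # the new node arrives with one edge
  deg[j] <- deg[j] + 1L  # the chosen node gains a neighbor
}

head(sort(deg, decreasing = TRUE))   # a few hubs with very high degree
table(deg)[1:5]                      # most nodes keep a very low degree
```

Early nodes have more chances to be picked and every pick makes them more likely to be picked again, which is the rich-get-richer effect behind the heavy-tailed degree distribution.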

In [None]:
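g = sample_pa(10000)  # preferential attachment network with 10000 nodes
dd = degree_distribution(g)  # dd[k+1] is the fraction of nodes with degree k
k = which(dd > 0) - 1  # keep degrees that actually occur (zeros cannot be drawn on a log scale)
plot(k, dd[k + 1], log='xy', xlab='degree', ylab='fraction of nodes')  # roughly a straight line for a power law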

### Further Reading
* Relational event model (https://sites.google.com/site/linkscenterworkshopsna/modules/mini-modules1/relational-event-modeling)
    * Similar to ERGM but focuses on high-frequency interactions
* Stochastic actor-oriented model (https://www.annualreviews.org/doi/abs/10.1146/annurev-statistics-060116-054035)
* Dynamic network actor models 
    * Similar to relational event model but focuses on nodes (actors)

### Exercise

* Simulate 1000 networks from one of the random network models
* Compare the mean distance and the clustering coefficient of the Florentine Family Network to those from the random networks

In [None]:
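c=numeric(1000)  # clustering coefficient of each random network
r=numeric(1000)  # mean distance of each random network
for (i in 1:1000) {
    # G is the observed network (the Florentine Family Network), loaded beforehand
    random_network=erdos.renyi.game(vcount(G), ecount(G), type='gnm')
    r[i]=mean_distance(random_network)  # mean shortest-path distance
    c[i]=transitivity(random_network, type='global')
}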
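
One way to finish the comparison is to locate the observed value within the simulated distribution. This is a minimal sketch under stated assumptions: the built-in Zachary karate club network stands in for the observed network (the Florentine Family Network is not loaded here), and `n_sim`, `sim_c`, and `obs_c` are hypothetical names.

```r
library(igraph)
set.seed(1)

G_obs <- make_graph("Zachary")   # stand-in for the observed network
n_sim <- 200                     # fewer simulations than in the exercise, to keep the sketch fast

# Clustering coefficients of random networks with the same numbers of nodes and edges
sim_c <- replicate(n_sim,
                   transitivity(sample_gnm(vcount(G_obs), ecount(G_obs)), type = "global"))
obs_c <- transitivity(G_obs, type = "global")

# Fraction of random networks at least as clustered as the observed one
mean(sim_c >= obs_c)

hist(sim_c, main = "Clustering in random networks", xlab = "global transitivity")
abline(v = obs_c, lty = 2)       # observed value
```

A small fraction suggests that the observed network is more clustered than expected under the Erdos-Renyi baseline; the same comparison can be repeated for the mean distance.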