### CS4423 - Networks
Angela Carnevale  
School of Mathematical and Statistical Sciences  
University of Galway

#### 4. Random Networks

# Week 8, lecture 1: 

# Properties of Random Graphs. Towards Phase Transitions

In [1]:
import networkx as nx
import random
opts = { "with_labels": True, "node_color": 'y'}
opts1 = { "with_labels": True, "node_color": 'lightblue'}

### Recall:

* **ER model $A$:** constructs a graph $G(n,m)$ with $n$ vertices and (exactly) $m$ edges. 

* The edge set is selected uniformly at random from all the $m$-subsets of $\binom{X}{2}$, the set of pairs of nodes, where $X$ is the vertex set.

* In `networkx` we can sample a graph from the $G(n,m)$ model by using the random graph constructor `gnm_random_graph` (takes as input a number of nodes $n$ and a number of edges $m$).

* **ER model $B$:** constructs a graph $G(n,p)$ with $n$ vertices, where each edge from the set $\binom{X}{2}$ is included independently with probability $p$. 

* The **expected size** of such a graph is $Np$ where $N=\binom{n}{2}$. 

* In `networkx` we can sample a graph from the $G(n,p)$ model by using the random graph constructor `gnp_random_graph` (takes input a number of nodes $n$ and a probability $p\in[0,1]$).
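
To make the difference between the two models concrete, here is a minimal sketch that samples edge lists using only the standard library. It is not how `networkx` implements its constructors; the function names `sample_gnm` and `sample_gnp` are made up for this illustration.

```python
import random
from itertools import combinations

def sample_gnm(n, m, rng=random):
    """Model A sketch: choose exactly m of the N = n(n-1)/2 possible edges."""
    pairs = list(combinations(range(n), 2))   # all N candidate edges
    return rng.sample(pairs, m)               # a uniformly random m-subset

def sample_gnp(n, p, rng=random):
    """Model B sketch: keep each possible edge independently with probability p."""
    return [e for e in combinations(range(n), 2) if rng.random() < p]

rng = random.Random(8)                    # fixed seed, only for reproducibility
edges_a = sample_gnm(10, 15, rng)         # always exactly 15 edges
edges_b = sample_gnp(10, 15 / 45, rng)    # p = m/N with N = 45; size varies
print(len(edges_a), len(edges_b))
```

The model $A$ sample always has exactly $m$ edges, while the size of the model $B$ sample only equals $m$ on average.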

**Next:** What are the typical features of a random network with $n$ nodes and (_approx._) $m$ edges?

* When does it contain a **tree** that's not a straight line?
* When does it contain a **triangle**, or indeed a **cycle** of any length? If so, how many?
* When does it contain a small **complete** graph of a given size?

* When does it contain a **large component**, larger than all other components?
* When does the network form a single **connected component**?
* How do these properties depend on $n$ and $m$ (or $p$)?

## Probability Distributions

* Denote by $G_n$ the set of *all* graphs on the $n$ points $X = \{0, \dots, n{-}1\}$.

* Set $N = \binom{n}{2}$, the maximal number of edges of a graph $G \in G_n$.

* Regard the ER models $A$ and $B$ as **probability distributions** $P \colon G_n \to \mathbb{R}$.

* Denote by $m(G)$ the number of edges of a graph $G$.

As we have seen:

* The probability of a specific graph $G$ to be sampled from the model $G(n,m)$ is:
$$
P(G) = \begin{cases}
\binom{N}{m}^{-1}, & \text{if } m(G) = m, \\
0, & \text{else.}
\end{cases}
$$

* The probability of a specific graph $G$ to be sampled from the model $G(n, p)$ is:
$$
P(G) = p^m (1-p)^{N-m},
$$
where $m = m(G)$.
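
As a sanity check, both formulas should define probability distributions on $G_n$, that is, the probabilities of all graphs should sum to $1$. The sketch below verifies this for a small $n$ by grouping graphs by size (there are $\binom{N}{s}$ graphs with $s$ edges). The helper names `prob_gnm` and `prob_gnp` are chosen for this example only.

```python
from math import comb

def prob_gnm(n, m, size):
    """P(G) in G(n, m) for a graph G with `size` edges."""
    N = comb(n, 2)
    return 1 / comb(N, m) if size == m else 0.0

def prob_gnp(n, p, size):
    """P(G) in G(n, p) for a graph G with `size` edges."""
    N = comb(n, 2)
    return p**size * (1 - p)**(N - size)

n, m, p = 4, 3, 0.3
N = comb(n, 2)
# Sum P(G) over all graphs, grouped by their number of edges s.
total_a = sum(comb(N, s) * prob_gnm(n, m, s) for s in range(N + 1))
total_b = sum(comb(N, s) * prob_gnp(n, p, s) for s in range(N + 1))
print(total_a, total_b)
```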

## Expected Size and Average Degree

**Note.** We will use the following notation: 

* $\bar a$ indicates the expected value of property $a$ (that is, as the graphs vary across the ensemble produced by the model).

* $\langle a \rangle$ indicates the average of property $a$ over all the nodes of a graph.

In $G(n, m)$:

* the expected **size** is
$$
\bar{m} = m,
$$
as every graph $G$ in $G(n, m)$ has exactly $m$ edges.

* the expected **average degree** is 
$$
\langle k \rangle = \frac{2m}{n},
$$
as every graph has average degree $2m/n$.

* Other properties of $G(n, m)$ are less straightforward, and it is easier to work with 
the $G(n, p)$ model.

In $G(n, p)$, with $N = \binom{n}{2}$:

* the **expected size** is
$$
\bar{m} = pN
$$

* with **variance**
$$
\sigma_m^2 = N p (1-p);
$$

* the expected **average degree** is (we'll see why soon):
$$
\langle k \rangle = p (n-1).
$$

* with **standard deviation**
$$
\sigma_k = \sqrt{p(1-p)(n-1)}
$$

* In particular, the **relative standard deviation** (or the **coefficient of variation**) of the size of
a random model $B$ graph is
$$
\frac{\sigma_m}{\bar{m}} = \sqrt{\frac{1-p}{pN}} 
= \sqrt{\frac{2(1-p)}{p n (n-1)}}
= \sqrt{\frac{2}{n \langle k \rangle} - \frac{2}{n (n-1)}}
,
$$
a quantity that converges to $0$ as $n \to \infty$ if $p (n-1) = \langle k \rangle$, the average node degree, is kept constant.

* In this sense, for large graphs, the fluctuations in the size of random graphs in model $B$ can be neglected.

* Model $B$ random graphs are used for theoretical purposes.

* Model $A$ random graphs are preferred for sampling random graphs on a computer. 

* A model $A$ graph with $n$ vertices and $m$ edges corresponds to a model $B$ graph with $n$ vertices and
$p = m/N$, where $N = \binom{n}{2}$.
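
The formulas for the expected size and its variance in model $B$ can be checked empirically. The following is a rough simulation sketch using only the standard library; the parameters ($n = 40$, $p = 0.1$, $1000$ samples) and the seed are arbitrary choices, and `gnp_size` is a name introduced here. It also confirms numerically that the three expressions for the coefficient of variation agree.

```python
import random
from math import comb, sqrt

def gnp_size(n, p, rng):
    """Number of edges of one G(n, p) sample: N independent coin flips."""
    return sum(rng.random() < p for _ in range(comb(n, 2)))

n, p = 40, 0.1
N = comb(n, 2)
rng = random.Random(2024)                 # fixed seed, for reproducibility
sizes = [gnp_size(n, p, rng) for _ in range(1000)]

mean = sum(sizes) / len(sizes)
std = sqrt(sum((s - mean) ** 2 for s in sizes) / (len(sizes) - 1))
print("mean size:", mean, "expected:", p * N)
print("std dev:  ", std, "expected:", sqrt(N * p * (1 - p)))

# The three equivalent forms of the coefficient of variation.
k_avg = p * (n - 1)
cv1 = sqrt((1 - p) / (p * N))
cv2 = sqrt(2 * (1 - p) / (p * n * (n - 1)))
cv3 = sqrt(2 / (n * k_avg) - 2 / (n * (n - 1)))
print(cv1, cv2, cv3)
```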

## Degree distribution

**Definition.**
The **degree distribution** $p\colon \mathbb{N}_0 \to \mathbb{R},\, k \mapsto p_k$ of a graph $G$
is defined as
$$
p_k = \frac{n_k}{n},
$$
where, for $k \geq 0$, $n_k$ is the number of nodes of degree $k$ in $G$.


This definition can be extended to ensembles of graphs with $n$ nodes (like the random graphs $G(n, m)$ and
$G(n, p)$), by setting
$$
p_k = \bar{n}_k/n,
$$
where $\bar{n}_k$ denotes the expected value of the random variable $n_k$ over the ensemble of graphs.

* The degree distribution in a random graph $G(n, p)$ is a **binomial distribution**:
$$
p_k = \binom{n-1}{k}p^k (1-p)^{n-1-k} = \mathrm{Bin}(n-1, p, k)
$$
 
That is, in the $G(n,p)$ model, the **probability that a node has degree $k$** is $p_k$.

* Thus the **average degree** of a randomly chosen node is
$$
\langle k \rangle = \sum_{k=0}^{n-1} k p_k = p (n-1)
$$
with standard deviation
$$
\sigma_k = \sqrt{p(1-p)(n-1)}
$$

* When $n$ is large, the typical $p_k$ is a product of a binomial coefficient $\binom{n-1}{k}$
which can be very large, and the probabilities $p^k (1-p)^{n-1-k}$ which can be very small.
This can lead to errors in the computation. 

* Mathematically, we're happy to just compute this value formally. (For ways to overcome computational errors when dealing with binomial coefficients, see the Appendix below.)

* In the limit $n \to \infty$, with $\langle k \rangle  = p (n-1)$ kept constant,
the binomial distribution $\mathrm{Bin}(n-1, p, k)$ is well approximated by the **[Poisson distribution](https://en.wikipedia.org/wiki/Poisson_distribution)**:
$$
p_k = e^{-\lambda} \frac{\lambda^k}{k!} = \mathrm{Pois}(\lambda, k),
$$
where $\lambda = p (n-1)$.
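
To see how close the two distributions are, one can tabulate both for a moderately large $n$. This is a small sketch with arbitrarily chosen values ($n = 1000$, $\lambda = 4$); it only looks at degrees $k < 30$, where essentially all of the probability mass lies for these parameters.

```python
from math import comb, exp, factorial

def binom_pmf(n, p, k):
    """Probability that a node of G(n, p) has degree k."""
    return comb(n - 1, k) * p**k * (1 - p)**(n - 1 - k)

def poisson_pmf(lam, k):
    """Poisson probability of the value k with parameter lam."""
    return exp(-lam) * lam**k / factorial(k)

n, lam = 1000, 4.0
p = lam / (n - 1)                         # keeps p(n-1) equal to lam
ks = range(30)
binom = [binom_pmf(n, p, k) for k in ks]
pois = [poisson_pmf(lam, k) for k in ks]

max_gap = max(abs(b - q) for b, q in zip(binom, pois))
mean_deg = sum(k * b for k, b in zip(ks, binom))
print("largest pointwise difference:", max_gap)
print("average degree from the binomial pmf:", mean_deg)
```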

## Phase Transitions

* POV: for the random graph $G(n, p)$, suppose that $p = p(n)$ is a function of $n$, the number of nodes,
and study the ensemble of graphs $G(n, p(n))$, as $n \to \infty$.
* Typically, $p(n) = c n^e$, for real numbers $c, e$.

* Then, to say that **almost every graph has a property $Q$** means that the
probability of a graph in the ensemble to have property $Q$ tends to $1$,
as $n \to \infty$.

* For many natural properties $Q$ (such as containing a given subgraph), there is a **critical probability function** $p_Q(n)$
such that a.e. graph in $G(n, p(n))$ has property $Q$ if $p(n)$ **decays slower** than $p_Q(n)$,
and a.e. graph in $G(n, p(n))$ fails to have property $Q$ if $p(n)$ **decays faster** than $p_Q(n)$.

* If $Q$ is concerned with specific subgraphs, and if $p(n) \propto p_Q(n)$, we can determine the distribution of the number of
appearances of such a subgraph in the network.

* This is made precise by the following theorem.  (For a proof see Bollobás, *Random Graphs*, CUP 2001.)

**Theorem (Appearance of Subgraphs).**
Let $F$ be a connected graph with $a$ nodes and $b$ edges.

* If $p(n)/n^{-a/b} \to 0$ then almost every graph in the ensemble $G(n, p(n))$ **does not** contain a copy of $F$.

* If $p(n)/n^{-a/b} \to \infty$ then almost every graph in the ensemble $G(n, p(n))$ **does** contain a copy of $F$.

* If $p(n) = c n^{-a/b}$ then, as $n \to \infty$, the number $n_F$ of $F$-subgraphs in $G$ follows a distribution
$$\mathrm{Pois}(\lambda, r),$$ where $\lambda = c^b/|\mathrm{Aut}(F)|$,
with $|\mathrm{Aut}(F)|$ being the number of *automorphisms* (read: symmetries) of $F$. 

That is, the probability of $n_F=r$ is $\mathrm{Pois}(\lambda, r)$.

* So the critical probability function $p_Q(n)$ for $Q$ being the appearance of a subgraph $F$ with $a$ nodes and $b$ edges is $p_Q(n) = c n^{-a/b}$.

For example:

* **Trees**  with $a$ nodes ($b = a - 1$) appear when $p(n) = c n^{-a/(a-1)}$.
* **Cycles** of order $a$ (where $b = a$) appear when $p(n) = c n^{-1}$.
* **Complete** subgraphs of order $a$ (where $b = \binom{a}{2} = \frac12 a(a-1)$) appear when $p(n) = c n^{-2/(a-1)}$.
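
The exponents in these three examples all come from the same ratio $-a/b$. A short sketch with exact fractions makes this explicit; the helper name `threshold_exponent` is introduced here for illustration.

```python
from fractions import Fraction
from math import comb

def threshold_exponent(a, b):
    """Exponent e in p(n) = c * n**e for a subgraph with a nodes and b edges."""
    return Fraction(-a, b)

tree4 = threshold_exponent(4, 3)              # tree on 4 nodes: b = a - 1
cycle5 = threshold_exponent(5, 5)             # cycle on 5 nodes: b = a
k4 = threshold_exponent(4, comb(4, 2))        # complete graph on 4 nodes
print(tree4, cycle5, k4)
```

Note that $-a/(a-1) < -1 < -2/(a-1)$ for $a \geq 4$, so as $p(n)$ grows, trees appear first, then cycles, then complete subgraphs.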

In practice, we can estimate $n_F$ as follows [we'll see this tomorrow].
* there are $\binom{n}{a}$ ways to pick $a$ nodes from a graph $G$ on $n$ nodes;
* between them, $b$ links are present with probability $p^b$;
* permuting the $a$ nodes of $F$ yields $a!/|\mathrm{Aut}(F)|$ copies of this configuration.
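
Multiplying the three ingredients above gives the expected number of copies of $F$, namely $\binom{n}{a} \cdot \frac{a!}{|\mathrm{Aut}(F)|} \cdot p^b$. The sketch below evaluates this for triangles ($a = b = 3$, $|\mathrm{Aut}(F)| = 6$) with $p = c/n$ and compares it with $\lambda = c^b/|\mathrm{Aut}(F)|$ from the theorem. The function name `expected_copies` and the value $c = 2$ are choices made for this example.

```python
from math import comb, factorial

def expected_copies(n, p, a, b, aut):
    """Expected number of copies of F (a nodes, b edges, aut automorphisms)."""
    placements = factorial(a) // aut      # distinct copies of F on a fixed node set
    return comb(n, a) * placements * p**b

c = 2.0
for n in (100, 1000, 10000):
    # triangles: a = 3, b = 3, |Aut| = 6, at the critical scaling p = c / n
    print(n, expected_copies(n, c / n, a=3, b=3, aut=6))

triangles_limit = c**3 / 6                # lambda = c^b / |Aut(F)|
print("limit:", triangles_limit)
```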

### Appendix
### Computing binomial coefficients

In [None]:
import numpy as np
import pandas as pd

In [None]:
from math import factorial
factorial(16)

In [None]:
# python 3.8 has:
# from math import comb
# which we've seen computes n choose k taking n and k as input

Using the following observation, one can compute a binomial coefficient as a product of *reasonable* fractions, without having to compute the quotients of large factorials.

$$\binom{n}{k} = \frac{n \cdot (n-1) \dotsm (n-k+1)}{1 \cdot 2 \dotsm k}$$

In [None]:
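def binomial(n, k):
    """Compute n choose k exactly, using integer arithmetic only."""
    prod, top, bot = 1, n, 1
    for i in range(k):
        # after this step prod equals n choose (i + 1), so // is exact
        prod = (prod * top) // bot
        top, bot = top - 1, bot + 1
    return prod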

In [None]:
l = [binomial(16, k) for k in range(17)]
print(l)

As expected, the 16th row of Pascal's triangle is as follows:

In [None]:
df = pd.DataFrame(l)
df.plot.bar()

Alternatively, for $n$ much larger than $k$, [Stirling's formula](https://en.wikipedia.org/wiki/Stirling%27s_approximation)
$$
n! \sim \sqrt{2 \pi n} \left(\tfrac{n}{e}\right)^n
$$
can be used to approximate a binomial coefficient as follows:

$$
\binom{n}{k}  = \frac{n \cdot (n-1) \dots (n-k+1)}{k!}
\approx \frac{(n-k/2)^k}{k^k e^{-k} \sqrt{2 \pi k}}
= \frac{(n/k - 0.5)^k e^k}{\sqrt{2 \pi k}}
$$

In [None]:
from math import exp, sqrt, pi
def binom_approx(n, k):
    return (n/k - 0.5)**k * exp(k) / sqrt(2 * pi * k)

In [None]:
n = binomial(100, 2)
k = 50
print(binomial(n, k))

In [None]:
print(binom_approx(n, k))
print(binomial(n, k) / 10**120)

## Code Corner

### `random`

* `random`: [[doc]](https://docs.python.org/2/library/random.html#random.random)


* `sample`: [[doc]](https://docs.python.org/2/library/random.html#random.sample)

### `numpy`

* `random.choice`: [[doc]](https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.random.choice.html)

### `networkx`

* `gnm_random_graph`: [[doc]](https://networkx.github.io/documentation/stable/reference/generated/networkx.generators.random_graphs.gnm_random_graph.html)


* `gnp_random_graph`: [[doc]](https://networkx.github.io/documentation/stable/reference/generated/networkx.generators.random_graphs.gnp_random_graph.html)


* `triangles`: [[doc]](https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.cluster.triangles.html)

##  Exercises

1. Show that a node of degree $k$ is the central node of exactly $\binom{k}{2}$ triads.

1.  Show that the number $n_{\Delta}$ of triangles in a graph $G$ with adjacency matrix $A$ is
$n_{\Delta} = \frac16 \mathrm{tr}(A^3)$, where $\mathrm{tr}(B) = \sum_i b_{ii}$ denotes the **trace**
of a square matrix $B$, i.e., the sum of its diagonal entries.
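
Before attempting a proof of the trace formula, it can help to check it numerically on a small example. The sketch below does this with plain nested lists; the example graph and the helper names are made up for this purpose, and a check like this is of course no substitute for the proof.

```python
from itertools import combinations

def matmul(A, B):
    """Product of two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def triangles_by_trace(A):
    """Number of triangles computed as tr(A^3) / 6."""
    A3 = matmul(matmul(A, A), A)
    return sum(A3[i][i] for i in range(len(A))) // 6

def triangles_brute(A):
    """Number of triangles found by testing all triples of nodes."""
    return sum(1 for i, j, k in combinations(range(len(A)), 3)
               if A[i][j] and A[j][k] and A[i][k])

# Example graph on 5 nodes with triangles {0, 1, 2} and {1, 2, 3}.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4)]
A = [[0] * 5 for _ in range(5)]
for u, v in edges:
    A[u][v] = A[v][u] = 1

print(triangles_by_trace(A), triangles_brute(A))
```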
