### CS4423 - Networks
Angela Carnevale   
School of Mathematical and Statistical Sciences  
University of Galway

# Week 8, lecture 2: 

# Phase Transitions: Subgraphs and Giant Component. 


In [None]:
import networkx as nx
import numpy as np
opts = { "with_labels" : True, "node_color": 'y' }

#### 4. Random Graphs.

## Phase Transitions

* For the random graph $G(n, p)$, we now suppose that $p = p(n)$ is a function of the number of nodes.

* We study the ensemble of graphs $G(n, p(n))$, as $n \to \infty$.

* Typically, $p(n) = c n^{-e}=\frac{c}{n^e}$, for real numbers $c, e$.

* To say that **almost every graph has a property $Q$** means that the
probability that a graph in the ensemble has property $Q$ tends to $1$
as $n \to \infty$.

* For many properties $Q$ (including the ones considered below), there is a **critical probability function** $p_Q(n)$
such that

    * a.e. graph in $G(n, p(n))$ has property $Q$ if $p(n)$ **decays more slowly** than $p_Q(n)$, i.e., $p(n)/p_Q(n) \to \infty$,
and

    * a.e. graph in $G(n, p(n))$ fails to have property $Q$ if $p(n)$ **decays faster** than $p_Q(n)$, i.e., $p(n)/p_Q(n) \to 0$.

* If $Q$ concerns the presence of a specific subgraph and $p(n) \propto p_Q(n)$, we can even determine the distribution of the
number of copies of that subgraph in the network.
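To see the critical-function picture numerically, here is an illustrative Monte Carlo sketch (not part of the original notes). It assumes $Q$ = "contains a triangle", whose critical function is $n^{-1}$ by the theorem below, and compares $p(n) = n^{-1/2}$ (decays more slowly) with $p(n) = n^{-3/2}$ (decays faster). The helper names `has_triangle` and `fraction_with_property`, as well as $n = 200$ and 50 trials, are arbitrary choices.

```python
# Illustrative sketch (not from the notes): hypothetical helpers,
# arbitrary n, exponents and number of trials.
import networkx as nx


def has_triangle(G):
    """Property Q: G contains at least one triangle."""
    return any(t > 0 for t in nx.triangles(G).values())


def fraction_with_property(n, p, has_property, trials=50, seed=0):
    """Monte Carlo estimate of the probability that G(n, p) has the property."""
    hits = 0
    for i in range(trials):
        G = nx.gnp_random_graph(n, p, seed=seed + i)
        hits += has_property(G)
    return hits / trials


n = 200
for e in (0.5, 1.5):  # p(n) = n^{-e}: slower / faster decay than p_Q(n) = 1/n
    p = n ** (-e)
    frac = fraction_with_property(n, p, has_triangle)
    print(f"p(n) = n^(-{e}): fraction of graphs with a triangle = {frac:.2f}")
```

With $p(n) = n^{-1/2}$ we would expect essentially every sampled graph to contain a triangle, and with $p(n) = n^{-3/2}$ essentially none.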

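**Theorem (Appearance of Subgraphs).**
Let $F$ be a connected graph with $a$ nodes and $b$ edges (so that $b\geq a-1$), which is moreover *strictly balanced*: every proper subgraph of $F$ has a smaller ratio of edges to nodes than $F$ itself. (Trees, cycles and complete graphs all have this property.)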

* If $p(n)/n^{-a/b} \to 0$ then almost every graph in the ensemble $G(n, p(n))$ **does not** contain a copy of $F$.

* If $p(n)/n^{-a/b} \to \infty$ then almost every graph in the ensemble $G(n, p(n))$ **does** contain a copy of $F$.

* If $p(n) = c n^{-a/b}$ then, as $n \to \infty$, the number $n_F$ of $F$-subgraphs in $G$ follows a Poisson distribution with mean
$\lambda = c^b/|\mathrm{Aut}(F)|$,
where $|\mathrm{Aut}(F)|$ is the number of *automorphisms* (read: symmetries) of $F$.

That is, the probability that $n_F=r$ is $$\mathrm{Pois}(\lambda, r) = \frac{e^{-\lambda}\lambda^r}{r!}.$$

* So the critical probability function $p_Q(n)$ for $Q$ being the appearance of a subgraph $F$ with $a$ nodes and $b$ edges is $p_Q(n) = c n^{-a/b}$.
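A rough simulation check of the Poisson statement, assuming $F$ is a triangle ($a = b = 3$, $|\mathrm{Aut}(F)| = 6$) and $c = 2$, so that $\lambda = 8/6$. For finite $n$ the agreement can only be approximate; $n = 300$ and 400 samples are arbitrary choices, and `triangle_count` and `poisson_pmf` are helper names introduced here for illustration.

```python
# Rough check of the Poisson limit for triangles (a = b = 3, |Aut(F)| = 6).
# Illustrative sketch: n, c and the number of samples are arbitrary choices.
import math
import networkx as nx


def triangle_count(G):
    """Number of triangles: nx.triangles counts each one at all three nodes."""
    return sum(nx.triangles(G).values()) // 3


def poisson_pmf(lam, r):
    """Pois(lam, r) = exp(-lam) * lam**r / r!"""
    return math.exp(-lam) * lam ** r / math.factorial(r)


n, c, trials = 300, 2.0, 400
p = c / n            # p(n) = c * n^{-a/b} with a/b = 1
lam = c ** 3 / 6     # lambda = c^b / |Aut(F)|

counts = [triangle_count(nx.fast_gnp_random_graph(n, p, seed=s)) for s in range(trials)]
print(f"mean number of triangles: {sum(counts) / trials:.3f} (lambda = {lam:.3f})")
for r in range(5):
    simulated = counts.count(r) / trials
    print(f"r = {r}: simulated {simulated:.3f}, Poisson {poisson_pmf(lam, r):.3f}")
```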

For example:

* **Trees**  with $a$ nodes ($b = a - 1$) appear when $p(n) = c n^{-a/(a-1)}$.
* **Cycles** of order $a$ (where $b = a$) appear when $p(n) = c n^{-1}$.
* **Complete** subgraphs of order $a$ (where $b = \binom{a}{2} = \frac12 a(a-1)$) appear when $p(n) = c n^{-2/(a-1)}$.
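The three exponents can be double-checked with exact fractions; `threshold_exponent` is a hypothetical helper that simply returns $a/b$.

```python
from fractions import Fraction
from math import comb


def threshold_exponent(a, b):
    """Hypothetical helper: e such that F (a nodes, b edges) appears at p(n) = c n^{-e}."""
    return Fraction(a, b)


for a in range(3, 7):
    tree = threshold_exponent(a, a - 1)           # b = a - 1
    cycle = threshold_exponent(a, a)              # b = a
    complete = threshold_exponent(a, comb(a, 2))  # b = a(a-1)/2
    print(f"a = {a}: trees n^-({tree}), cycles n^-({cycle}), complete n^-({complete})")
```

For $a = 3$ the cycle and complete exponents coincide (both equal $1$), since a triangle is both a cycle and a complete graph.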

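In practice, we can estimate the expected number of copies of $F$ as follows:
* there are $\binom{n}{a}$ ways to pick $a$ nodes from a graph $G$ on $n$ nodes;
* a given set of $b$ links between them is present with probability $p^b$;
* the $a!$ ways of placing the nodes of $F$ onto the chosen nodes yield $a!/|\mathrm{Aut}(F)|$ distinct copies of $F$, since placements that differ by an automorphism of $F$ give the same copy.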

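That is, the expected number of copies of $F$ in $G$ is
$$
\mathbb{E}[n_F] = \frac{n!}{a!\,(n-a)!}\,p^b \cdot \frac{a!}{|\mathrm{Aut}(F)|} = \frac{n!\, p^b}{(n-a)!\, |\mathrm{Aut}(F)|} \approx \frac{n^a p^b}{|\mathrm{Aut}(F)|}.
$$
* Note how the order of magnitude $n^a p^b$ of this number depends only on $a$ and $b$. With $p(n) = c n^{-a/b}$ we get $n^a p^b = c^b$, which recovers the mean $\lambda = c^b/|\mathrm{Aut}(F)|$ from the theorem above.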
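A sketch of this estimate in Python, as our own illustration: `num_automorphisms` counts $|\mathrm{Aut}(F)|$ by enumerating the isomorphisms of $F$ onto itself with the `GraphMatcher` class from `networkx`, which is fine for the small patterns considered here but expensive in general; `expected_copies` is a hypothetical helper name, and $n = 100$, $p = 0.05$ are arbitrary.

```python
# Illustrative sketch: helper names are our own; counting automorphisms by
# enumerating self-isomorphisms is only practical for small patterns F.
from math import comb, factorial

import networkx as nx
from networkx.algorithms.isomorphism import GraphMatcher


def num_automorphisms(F):
    """|Aut(F)|: the number of isomorphisms from F onto itself."""
    return sum(1 for _ in GraphMatcher(F, F).isomorphisms_iter())


def expected_copies(n, p, F):
    """E[n_F] = C(n, a) * p^b * a! / |Aut(F)| in G(n, p)."""
    a, b = F.number_of_nodes(), F.number_of_edges()
    return comb(n, a) * p ** b * factorial(a) / num_automorphisms(F)


patterns = {
    "triad": nx.path_graph(3),
    "triangle": nx.cycle_graph(3),
    "star K_(1,3)": nx.star_graph(3),   # star_graph(3): central node 0 plus 3 leaves
}
n, p = 100, 0.05
for name, F in patterns.items():
    print(f"{name}: |Aut| = {num_automorphisms(F)}, "
          f"expected copies = {expected_copies(n, p, F):.1f}")
```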

**Example.** Expected numbers of
* triads (trees on 3 nodes), with $a = 3$, $b = 2$ and $|\mathrm{Aut}(F)| = 2$: $$3 \binom{n}{3} p^2 = \tfrac12 n(n-1)(n-2)p^2 \approx \tfrac12 n^3 p^2;$$

* triangles, with $a = b = 3$ and $|\mathrm{Aut}(F)| = 6$: $$\binom{n}{3} p^3 = \tfrac16 n(n-1)(n-2)p^3 \approx \tfrac16 n^3 p^3;$$

* stars $K_{1,3}$ (one central node joined to three others), with $a = 4$, $b = 3$ and $|\mathrm{Aut}(F)| = 6$: $$4 \binom{n}{4} p^3 = \tfrac16 n(n-1)(n-2)(n-3) p^3 \approx \tfrac16 n^4 p^3.$$

* The actual **number of triads** in a graph $G$ can be determined from its **degree distribution**, as
each node of degree $k$ is the central node of exactly $\binom{k}{2}$ triads (why?).
* `networkx` does not seem to have a dedicated function for this, but the node degrees returned by `G.degree()` are all we need.

In [None]:
n, m = 100, 100 # for a graph in the Gnm model
p = 2*m/n/(n-1) # for an analogous graph in the Gnp model
print(f"Expect {n*(n-1)*(n-2) * p**2/2} triads")

In [None]:
G = nx.gnm_random_graph(n, m)
sum(k*(k-1)//2 for k in dict(G.degree()).values())

In [None]:
G2 = nx.gnp_random_graph(n, p)
sum(k*(k-1)//2 for k in dict(G2.degree()).values())
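The same degree argument counts copies of the star $K_{1,3}$ from the example above: a node of degree $k$ is the central node of exactly $\binom{k}{3}$ of them (and with $\binom{k}{2}$ it recovers the triad count). A sketch comparing this with the estimate $\tfrac16 n(n-1)(n-2)(n-3)p^3$; `star_count` is a hypothetical helper and the seed is arbitrary.

```python
# Illustrative sketch: `star_count` is a hypothetical helper; the seed is arbitrary.
from math import comb
import networkx as nx


def star_count(G, leaves=3):
    """Copies (not necessarily induced) of the star K_{1,leaves}, for leaves >= 2:
    a node of degree k is the central node of comb(k, leaves) of them."""
    return sum(comb(k, leaves) for _, k in G.degree())


n, m = 100, 100
p = 2 * m / n / (n - 1)   # as in the cells above
print(f"Expect {n*(n-1)*(n-2)*(n-3) * p**3 / 6:.1f} stars K_(1,3)")
G3 = nx.gnp_random_graph(n, p, seed=1)
print("Found", star_count(G3), "stars and", star_count(G3, 2), "triads")
```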

* The actual **number of triangles** in a graph $G$ with adjacency matrix $A$ is $\frac16$ of the trace of $A^3$ (why?)
* Also, the `networkx` function `triangles` returns a dictionary, reporting for each node the number of triangles it is involved in.

In [None]:
n, m = 100, 300
p = 2*m/n/(n-1)  ## for a graph in the Gnp model with n nodes  
                 ## and approx m edges
print(f"Expect {n*(n-1)*(n-2) * p**3/6} triangles")

In [None]:
G = nx.gnp_random_graph(n, p)
sum(nx.triangles(G).values())//3 ## each triangle is accounted for three times, so we divide by 3

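In [None]:
A = nx.adjacency_matrix(G)  # sparse adjacency matrix
# use @ for the matrix product: on sparse arrays (networkx >= 3.0), ** is elementwise
(A @ A @ A).diagonal().sum() // 6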
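Why $\frac16$? The entry $(A^3)_{ii}$ counts the closed walks of length 3 that start and end at node $i$, and each triangle yields 6 of them (3 choices of starting node, 2 directions). A small sketch checking this against `nx.triangles` on graphs with known triangle counts: $K_4$ has $\binom43 = 4$ triangles, while the 5-cycle and the Petersen graph have none. The helper names are introduced here for illustration.

```python
import networkx as nx
import numpy as np


def triangles_via_trace(G):
    """(A^3)_{ii} counts closed walks i -> j -> k -> i; each triangle gives 6 of
    them (3 starting nodes x 2 directions), hence trace(A^3) / 6."""
    A = nx.to_numpy_array(G, dtype=int)   # dense adjacency matrix
    return int(np.trace(A @ A @ A)) // 6


def triangles_via_networkx(G):
    """nx.triangles reports each triangle once for each of its 3 nodes."""
    return sum(nx.triangles(G).values()) // 3


for G in (nx.complete_graph(4), nx.cycle_graph(5), nx.petersen_graph()):
    print(triangles_via_trace(G), triangles_via_networkx(G))   # 4 4, then 0 0 twice
```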

* Moreover, $p(n) = \frac1n \ln n$ is the threshold probability for $G$ to be connected.
(This corresponds to $m = \frac12 n \ln n$ in model $A$.)
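An illustrative simulation of this threshold, assuming $p(n) = c \ln n / n$ with $c = 0.5, 1, 1.5$ and $n = 500$ (all arbitrary choices). For $c < 1$ we would expect sampled graphs to be essentially never connected, and for $c > 1$ almost always. `fraction_connected` is a hypothetical helper.

```python
# Illustrative sketch: n, the constants c and the number of trials are arbitrary.
import math
import networkx as nx


def fraction_connected(n, p, trials=30, seed=0):
    """Share of sampled G(n, p) graphs that are connected."""
    connected = sum(nx.is_connected(nx.fast_gnp_random_graph(n, p, seed=seed + i))
                    for i in range(trials))
    return connected / trials


n = 500
for c in (0.5, 1.0, 1.5):
    p = c * math.log(n) / n   # p(n) = c * ln(n) / n
    print(f"c = {c}: fraction connected = {fraction_connected(n, p):.2f}")
```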

##  The Giant Connected Component

**Definition (Giant Component).**
A connected component of a graph $G$ is called a **giant component**
if its number of nodes increases with the order $n$ of $G$ as
some positive power of $n$.
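To illustrate growth "as some positive power of $n$", the sketch below (our own, with $c = 2$ and seeds chosen arbitrarily) samples $G(n, c/n)$ for increasing $n$. In line with the theorem that follows, we would expect the largest component to grow roughly in proportion to $n$, i.e., like $n^1$, so the printed fraction should stay roughly constant.

```python
# Illustrative sketch: c = 2 and the seeds are arbitrary choices.
import networkx as nx


def largest_component_order(G):
    """Number of nodes in the largest connected component of G."""
    return max(len(comp) for comp in nx.connected_components(G))


c = 2.0
for n in (1_000, 2_000, 4_000, 8_000):
    G = nx.fast_gnp_random_graph(n, c / n, seed=n)
    size = largest_component_order(G)
    print(f"n = {n}: largest component has {size} nodes ({size / n:.3f} of n)")
```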

Suppose $p(n) = cn^{-1}$ for some positive constant $c$.  (Then the average degree $\langle k \rangle = p(n-1) \approx c$ remains fixed as $n \to \infty$.)

**Theorem (Erdős-Rényi)**.
* If $c < 1$, the graph consists of many small components, each of order $O(\ln n)$.
* If $c = 1$, the largest components have order of magnitude $n^{2/3}$.
* If $c > 1$, there is a unique **giant component**, whose order grows proportionally to $n$; all other components have order $O(\ln n)$.

* In practice, in a given graph $G$ of order $n$, a giant component appears when the average degree is $1$, i.e., when $m = \frac12 n$, and at that point it has order about $n^{2/3}$ (e.g., $100^{2/3} \approx 21.5$).
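A simulation sketch of the theorem, as our own illustration: first the largest component of $G(n, c/n)$ for $c$ below, at and above $1$ with $n = 10\,000$ fixed, then the average largest component at $c = 1$ for growing $n$ next to $n^{2/3}$. All parameters and seeds are arbitrary; at $c = 1$ we would only expect the two columns to be of comparable size, not equal. `largest_component_in_gnp` is a hypothetical helper.

```python
# Illustrative sketch: n, the values of c, the seeds and trial counts are arbitrary.
import networkx as nx


def largest_component_in_gnp(n, c, seed=0):
    """Order of the largest connected component of one sample of G(n, c/n)."""
    G = nx.fast_gnp_random_graph(n, c / n, seed=seed)
    return max(len(comp) for comp in nx.connected_components(G))


# Below, at and above the critical value c = 1 (n fixed)
n = 10_000
for c in (0.5, 1.0, 1.5, 2.0):
    size = largest_component_in_gnp(n, c)
    print(f"c = {c}: largest component has {size} nodes ({size / n:.1%} of n)")

# At c = 1: compare the average largest component with n^(2/3)
for n in (100, 1_000, 10_000):
    sizes = [largest_component_in_gnp(n, 1.0, seed=s) for s in range(20)]
    print(f"n = {n}: mean largest component {sum(sizes) / len(sizes):.1f}, "
          f"n^(2/3) = {n ** (2 / 3):.1f}")
```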

#### 5. Small Worlds

Many real world networks are **small world networks**,
where most pairs of nodes are only a few steps away from each other,
and where nodes tend to form cliques, i.e., subgraphs having
all nodes connected to each other.

* For example, [MathSciNet](https://mathscinet-ams-org.nuigalway.idm.oclc.org/mathscinet/index.html)
allows its users to explore distances between authors in the collaboration network. The distance of an author to Erdős is
known as this author's [Erdős number](https://en.wikipedia.org/wiki/Erd%C5%91s_number).

* Or, for a cinematographic version of this phenomenon, have a look at the [six degrees of Kevin Bacon](https://en.wikipedia.org/wiki/Six_Degrees_of_Kevin_Bacon).

We introduce **three network attributes** that measure these small-world
effects:

* the **characteristic path length** $L$, defined as the
  _average length of all shortest paths in the network_;
  
* the **transitivity** $T$, defined as the _proportion of
  triads that form triangles_;
  
* the **clustering coefficient** $C$, defined as the
  _average node clustering coefficient_.
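The three attributes correspond to the `networkx` functions `average_shortest_path_length`, `transitivity` and `average_clustering` (`transitivity` computes $3 \times$ #triangles / #triads, which is the proportion of triads closed into triangles since each triangle contains three triads). A small sketch applying them to Zachary's karate club network, which ships with `networkx` as `karate_club_graph()`; the helper `small_world_attributes` is our own name, and $L$ is only defined for connected graphs.

```python
import networkx as nx


def small_world_attributes(G):
    """Return (L, T, C) for a connected graph G (hypothetical helper)."""
    L = nx.average_shortest_path_length(G)  # characteristic path length
    T = nx.transitivity(G)                  # 3 * #triangles / #triads
    C = nx.average_clustering(G)            # mean of the node clustering coefficients
    return L, T, C


G = nx.karate_club_graph()   # small social network bundled with networkx
L, T, C = small_world_attributes(G)
print(f"L = {L:.3f}, T = {T:.3f}, C = {C:.3f}")
```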

In terms of these attributes, a network is called a **small world network** if it has 

1. a small **average shortest path length** $L$
(scaling with $\log n$, where $n$ is the number of nodes), and
2. a high **clustering coefficient** $C$.

It turns out that ER random networks do have a small average shortest path length,
but not a high clustering coefficient.
This observation justifies the need for a different model of
random networks, if they are to be used to model the 
clustering behavior of real world networks.
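A sketch of this comparison for one sample of $G(n, p)$, assuming $n = 1000$ and average degree $\langle k \rangle = 10$ (arbitrary choices). For ER graphs the expected clustering coefficient is $p$, which is tiny for large sparse graphs, while $L$ is commonly approximated by $\ln n / \ln \langle k \rangle$. Since the sampled graph need not be connected, $L$ is computed on its largest component only.

```python
# Illustrative sketch: n, <k> and the seed are arbitrary choices.
import math
import networkx as nx

n, k_avg = 1_000, 10
p = k_avg / (n - 1)                      # so that <k> = p (n - 1) = 10
G = nx.fast_gnp_random_graph(n, p, seed=42)

# L is only defined on a connected graph: use the largest component
giant = G.subgraph(max(nx.connected_components(G), key=len))
L = nx.average_shortest_path_length(giant)
C = nx.average_clustering(G)

print(f"L = {L:.2f}  vs  ln n / ln <k> = {math.log(n) / math.log(k_avg):.2f}")
print(f"C = {C:.4f}  vs  p = {p:.4f}")
```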

##  Code Corner

### `networkx`

* `triangles`: [[doc]](https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.cluster.triangles.html)
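A minimal usage example for `triangles` (the choice of graph is ours): called on the whole graph it returns a dictionary, and called with a single node as its second argument (`nodes`) it returns just that node's count.

```python
import networkx as nx

G = nx.complete_graph(4)            # K_4 contains 4 triangles
per_node = nx.triangles(G)          # dict: node -> number of triangles through it
print(per_node)                     # every node lies on 3 triangles
print(nx.triangles(G, 0))           # a single node as `nodes` returns an int: 3
print(sum(per_node.values()) // 3)  # each triangle is counted at 3 nodes: 4
```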

