### CS4423 - Networks
Angela Carnevale  
School of Mathematical and Statistical Sciences  
University of Galway

#### 5. Small Worlds

# Week 9, lecture 2: 

# Transitivity and Clustering.

In [None]:
import networkx as nx
opts = { "with_labels" : True, "node_color": 'y' }

Conclusion from yesterday:

**Definition (Small-world behaviour).**
A network $G = (X, E)$ is said to exhibit a **small world behaviour** if 
its characteristic path length $L$ grows proportionally to the
logarithm of the number $n$ of nodes of $G$:
$$
L \sim \ln n.
$$



In this sense, the ensembles $G(n, m)$ and $G(n, p)$ of random graphs do exhibit small
world behaviour (as $n \to \infty$). However, when it comes to **clustering** (coming up next!), these graphs do not behave like small world networks. For that, we'll need to look at different models of random networks.

## Transitivity

**Definition (Graph transitivity).**
Recall that a **triad** is a tree on $3$ nodes or, equivalently, a graph consisting of $2$
adjacent edges (and the nodes they connect).  The **transitivity** $T$ of a graph $G = (X, E)$
is the proportion of **transitive** triads, i.e., triads which are subgraphs of **triangles**. This proportion can be computed as follows:
$$
T = \frac{3 n_{\Delta}}{n_{\wedge}},
$$
where $n_{\Delta}$ is the number of triangles in $G$, and $n_{\wedge}$ is the number of triads.


By definition, $0 \leq T \leq 1$.

**Example.**

In [None]:
G = nx.Graph(((1,2), (2,3), (3,1), (3,4), (3,5)))
nx.draw(G, **opts)

The function `nx.triangles(G)` returns a `python` dictionary reporting for each node
of the graph `G` the number of triangles it is contained in.

In [None]:
print(nx.triangles(G))

Overall, each triangle in `G` is thus counted $3$ times, once for each of its
vertices.  Hence, summing the values of this dictionary gives the number $3 n_{\Delta}$.

In [None]:
triple_nr_triangles = sum(nx.triangles(G).values())
print(triple_nr_triangles)

As we've seen, the number $n_{\wedge}$ of triads in `G` can be determined from the graph's degree
sequence, as each node of degree $k$ is the central node of exactly
$\binom{k}{2}$ triads.  

In [None]:
print(G.degree())
print({k : v * (v-1) // 2 for k, v in dict(G.degree()).items()})
nr_triads = sum([v * (v-1) // 2 for v in dict(G.degree()).values()])
nr_triads

The transitivity $T$ of `G` is the quotient of these two quantities, $T = 3 n_{\Delta}/n_{\wedge}$,
which `networkx` computes with a function `transitivity`.

In [None]:
print(triple_nr_triangles / nr_triads )
print(nx.transitivity(G))
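
The same value can also be obtained by reading the definition literally: list every triad and check whether a third edge closes it. The following is only a sketch of that idea; the function name `transitivity_by_enumeration` is made up for this illustration and is not part of `networkx`.

```python
from itertools import combinations

import networkx as nx


def transitivity_by_enumeration(G):
    """Return the fraction of triads of G that are closed by a third edge."""
    closed = 0
    total = 0
    for centre in G:
        # every unordered pair of neighbours of `centre` forms one triad
        for u, v in combinations(G[centre], 2):
            total += 1
            if G.has_edge(u, v):
                closed += 1
    # convention assumed here: a graph without triads gets transitivity 0
    return closed / total if total else 0.0


G = nx.Graph(((1, 2), (2, 3), (3, 1), (3, 4), (3, 5)))
T_enum = transitivity_by_enumeration(G)
print(T_enum, nx.transitivity(G))
```

Each triangle is met three times in this loop, once per choice of central node, which is where the factor $3$ in $3 n_{\Delta}$ comes from.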

* The transitivity of a $G(n, p)$ **random graph** is expected to be close to
$$
T = p,
$$
since $p$ is the probability that the third edge, which would close any given triad into a triangle, is present.
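
A quick numerical check of this claim might look as follows. The values of `n` and `p` and the seed are arbitrary choices for this sketch; a single sample will only be close to $p$, not equal to it.

```python
import networkx as nx

n, p = 500, 0.1
# the seed is fixed only to make the run reproducible
G = nx.gnp_random_graph(n, p, seed=4423)
T_sample = nx.transitivity(G)
print(p, T_sample)
```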



## Clustering

The concept of **clustering** measures transitivity in a different way, either for a single node or for an entire graph.

For that, we'll need to recall the concept of **induced subgraph**. 

Given $G=(X,E)$ and $Y\subset X$, the induced subgraph of $G$ on $Y$ is the graph $H=\left(Y,E\cap \binom{Y}{2}\right)$.
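
As a small illustration, the edge set $E\cap \binom{Y}{2}$ can be computed by hand and compared with the `subgraph` method of `networkx`. The graph is the one from the transitivity example; the node set `Y` is an arbitrary choice for this sketch.

```python
from itertools import combinations

import networkx as nx

G = nx.Graph(((1, 2), (2, 3), (3, 1), (3, 4), (3, 5)))
Y = {1, 2, 4}

# all 2-element subsets of Y, i.e. the set "Y choose 2"
pairs_in_Y = {frozenset(pair) for pair in combinations(Y, 2)}
# the edges of G that have both endpoints in Y
induced_edges = {frozenset(e) for e in G.edges()} & pairs_in_Y

H = G.subgraph(Y)
print(sorted(tuple(sorted(e)) for e in induced_edges))
print(sorted(tuple(sorted(e)) for e in H.edges()))
```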

**Definition (Clustering coefficient).**
For a node $i \in X$ of a graph $G = (X, E)$, denote by
$G_i$ the subgraph induced on the neighbours of $i$ in $G$,
and by $m(G_i)$ its number of edges.

The **node clustering coefficient** $c_i$ of node $i$ is defined
as
$$
c_i = \begin{cases}
\binom{k_i}{2}^{-1} m(G_i), & k_i \geq 2, \\
0, & \text{else.}
\end{cases}
$$
That is, the node clustering coefficient measures the proportion of edges that are actually present in the node's **social graph** $G_i$, among all possible edges between its neighbours.

The **graph clustering coefficient** $C$ of $G$ is the 
average node clustering coefficient,
$$
C = \langle c\rangle = \frac1n \sum_{i=1}^n c_i.
$$

By definition, $0 \leq c_i \leq 1$ for all nodes $i \in X$, and $0 \leq C \leq 1$.

**Example.**

In [None]:
G = nx.Graph([(0,1), (0,2), (0,3), (0,4), (1,2), (1,3), (2,3), (3,4)])
nx.draw(G, **opts)

In [None]:
N = nx.neighbors(G, 0)
S = G.subgraph(list(N))
nx.draw(S, **opts)

In [None]:
nS = S.number_of_nodes()
nS_choose_2 = nS * (nS - 1) // 2
mS = S.number_of_edges()
print(nS, mS, mS / nS_choose_2 )

In [None]:
nx.clustering(G)

In [None]:
nx.average_clustering(G)
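
The computation done above for node `0` can be wrapped into a function and applied to every node. This is a sketch that follows the definition step by step; the helper name `node_clustering` is hypothetical, and in practice `nx.clustering` does the same job.

```python
import networkx as nx


def node_clustering(G, i):
    """Clustering coefficient of node i, following the definition literally."""
    neighbours = list(G[i])
    k = len(neighbours)
    if k < 2:
        return 0.0
    G_i = G.subgraph(neighbours)  # subgraph induced on the neighbours of i
    return G_i.number_of_edges() / (k * (k - 1) // 2)


G = nx.Graph([(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (1, 3), (2, 3), (3, 4)])
by_hand = {i: node_clustering(G, i) for i in G}
C_by_hand = sum(by_hand.values()) / G.number_of_nodes()
print(by_hand)
print(C_by_hand, nx.average_clustering(G))
```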

* The **node clustering coefficient** of any node $i$ in a $G(n, p)$ **random graph** is
$c_i = p$. (In any selection of potential edges, by construction a proportion $p$ of them is
present in the random graph; this is true in particular for the $\binom{k}{2}$ potential
edges between the $k$ neighbors of a node of degree $k$.)

* Thus the **graph clustering coefficient** of a $G(n, p)$ **random graph** is
$$
C = p.
$$

* Note that when $p(n) = \langle k \rangle n^{-1}$ for a fixed expected average degree $\langle k \rangle$,
then $C = \langle k \rangle / n \to 0$ as $n \to \infty$: in large ER random graphs
the number of triangles is negligible.

* In real world networks, one often observes that $C / \langle k \rangle$ does not depend
on $n$ (as $n \to \infty$).
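
The decay of $C$ in sparse ER random graphs can be observed numerically. In this sketch the average degree `mean_degree`, the values of `n` and the seed are arbitrary choices; the sampled values should only be expected to lie near $\langle k \rangle / n$.

```python
import networkx as nx

mean_degree = 10  # the fixed expected average degree <k>
results = {}
for n in (250, 1000, 4000):
    p = mean_degree / n
    # the seed is fixed only to make the run reproducible
    G = nx.fast_gnp_random_graph(n, p, seed=4423)
    results[n] = nx.average_clustering(G)
    print(n, p, results[n])
```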

### Clustering vs Transitivity

For a node $i \in X$, denote by $n_i^{\wedge} = \binom{k_i}{2}$ the number of
triads containing $i$ as their central node, and by $n_i^{\Delta}$ the actual
number of triangles containing $i$.

Then the node clustering coefficient is $c_i = n_i^{\Delta}/n_i^{\wedge}$,
or $n_i^{\Delta} = n_i^{\wedge} c_i$.

Moreover $3 n_{\Delta} = \sum_i n_i^{\Delta}$ and $n_{\wedge} = \sum_i n_i^{\wedge}$.

It follows that
$$
T = \frac{3 n_{\Delta}}{n_{\wedge}} = \frac1{n_{\wedge}} \sum_i n_i^{\wedge} c_i
$$
in contrast to
$$
C = \frac1n \sum_i c_i.
$$

That is, $C$ is the (plain) **average** of the node clustering coefficients, whereas $T$ is a
**weighted average** of node clustering coefficients, giving higher weight to
high degree nodes.
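
The two formulas can be compared on the example graph from the clustering section. This is a minimal sketch; the variable names are chosen for this illustration only.

```python
import networkx as nx

G = nx.Graph([(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (1, 3), (2, 3), (3, 4)])

c = nx.clustering(G)  # node clustering coefficients c_i
# number of triads centred at each node: k_i choose 2
triads = {i: k * (k - 1) // 2 for i, k in G.degree()}
n_triads = sum(triads.values())

# T as a weighted average of the c_i, C as their plain average
T_weighted = sum(triads[i] * c[i] for i in G) / n_triads
C_plain = sum(c.values()) / G.number_of_nodes()
print(T_weighted, nx.transitivity(G))
print(C_plain, nx.average_clustering(G))
```

Here the weights are $6, 3, 3, 6, 1$ for the nodes $0, \dots, 4$, so the two nodes of degree $4$ dominate $T$ while all five nodes count equally in $C$.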

The following example illustrates how $C$ and $T$ are different measures: as $n \to \infty$ here, $T \to 0$ while $C \to 1$.

In [None]:
n = 100
G = nx.Graph(["AB"])
G.add_edges_from([(x, k) for x in "AB" for k in range(n)])
    
nx.draw(G, **opts)

In [None]:
nx.average_clustering(G), nx.transitivity(G)
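
For this family of graphs the limits can be made explicit. Each of the $n$ outer nodes has the two adjacent neighbours `A` and `B`, hence $c_i = 1$, while `A` and `B` have degree $n+1$ and $n$ edges among their neighbours, hence $c_i = 2/(n+1)$. There are $n$ triangles and $n + 2\binom{n+1}{2} = n(n+2)$ triads, which gives $C = \bigl(n + \tfrac{4}{n+1}\bigr)/(n+2)$ and $T = 3/(n+2)$. The sketch below compares these formulas with `networkx`; the helper name `two_hub_graph` is made up for this illustration.

```python
import networkx as nx


def two_hub_graph(n):
    """Edge A-B plus n further nodes, each joined to both A and B."""
    G = nx.Graph([("A", "B")])
    G.add_edges_from((x, k) for x in "AB" for k in range(n))
    return G


for n in (10, 100, 1000):
    G = two_hub_graph(n)
    C_formula = (n + 4 / (n + 1)) / (n + 2)
    T_formula = 3 / (n + 2)
    print(n, nx.average_clustering(G), C_formula, nx.transitivity(G), T_formula)
```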

* The fact that ER random networks tend to have low transitivity and clustering shows the need for a new kind of (random) network construction
that is better at modelling real world networks.

* One idea, developed by Watts and Strogatz in 1998, is to start with some **regular network** that
naturally has a **high clustering**, and then to randomly distort its edges, to introduce some **short paths**.

Let's first have a look at some families of networks and their transitivity and clustering.

##  Lattices

* A **triangular lattice** is (a finite portion of) a regular tiling of the Euclidean plane by triangles.
Here, each (inner) vertex has $6$ neighbors, which are linked in a cycle, giving
a node clustering coefficient of $6/\binom{6}{2} = 2/5 = 0.4$.

* A rectangular finite region of a triangular lattice with $m$ strips of $n$ triangles of constant height
can be generated with the command `nx.triangular_lattice_graph(m, n)`.

In [None]:
G = nx.triangular_lattice_graph(5, 7)
nx.draw(G, **opts)

In [None]:
print(nx.clustering(G))

In [None]:
nx.average_clustering(G)
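
The value $0.4$ for inner vertices can be checked directly. The sketch assumes that, in this finite lattice, the inner vertices are exactly the nodes of degree $6$; boundary nodes have fewer neighbours and different coefficients.

```python
import networkx as nx

G = nx.triangular_lattice_graph(5, 7)
c = nx.clustering(G)

# assumption: inner vertices are the nodes with all 6 lattice neighbours present
inner = [v for v, k in G.degree() if k == 6]
inner_values = {round(c[v], 12) for v in inner}
print(len(inner), inner_values)
```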

* However, other kinds of lattice graphs contain no triangles at all!

In [None]:
G = nx.grid_2d_graph(4, 4)
nx.draw(G, with_labels=True, node_color='y')

In [None]:
nx.average_clustering(G)

## Path Graphs

In [None]:
n = 10
G = nx.path_graph(n)
nx.draw(G, **opts)

In [None]:
nx.average_clustering(G)

* **Idea:** to produce some triangles in a path graph with $n$ vertices, additionally connect each node to all nodes not further than $d$
steps away ...

In [None]:
for v in range(n-2):
    G.add_edge(v, v+2)
nx.draw(G, **opts)
nx.average_clustering(G)

In [None]:
for v in range(n-3):
    G.add_edge(v, v+3)
nx.draw(G, **opts)
nx.average_clustering(G)
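
The two steps above follow the same pattern, so they can be collected into one function with the distance $d$ as a parameter. This is only a sketch; the name `thickened_path` is hypothetical and not a `networkx` generator.

```python
import networkx as nx


def thickened_path(n, d):
    """Path on n nodes with every pair at most d steps apart joined by an edge."""
    G = nx.empty_graph(n)
    for step in range(1, d + 1):
        # join v to the node `step` positions further along the path
        G.add_edges_from((v, v + step) for v in range(n - step))
    return G


n = 10
for d in (1, 2, 3):
    G = thickened_path(n, d)
    print(d, G.number_of_edges(), nx.average_clustering(G))
```

For $d = 1$ this is the plain path graph, which has no triangles.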

## Cycle Graphs

* Or, same thing on a cycle ...

In [None]:
n = 10
G = nx.cycle_graph(n)
nx.draw(G, **opts)
print(nx.average_clustering(G))
print(nx.average_shortest_path_length(G))

In [None]:
# join each node to the node two steps further around the cycle
for v in G:
    G.add_edge(v, (v+2) % n)
nx.draw_circular(G, **opts)
print(nx.average_clustering(G))
print(nx.average_shortest_path_length(G))

Looks like we're going in the right direction: $L$ is getting smaller while $C$ is increasing.

In [None]:
# join each node to the node three steps further around the cycle
for v in G:
    G.add_edge(v, (v+3) % n)
nx.draw_circular(G, **opts)
print(nx.average_clustering(G))
print(nx.average_shortest_path_length(G))
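
The graph built in these steps is a circulant graph, so it can presumably also be produced in one call with `nx.circulant_graph`. The sketch below rebuilds both versions for `n = 10` and compares their edge sets.

```python
import networkx as nx

n = 10
# built step by step: a cycle, then shortcuts of length 2 and 3
G = nx.cycle_graph(n)
for d in (2, 3):
    G.add_edges_from((v, (v + d) % n) for v in range(n))

# built in one call: node v is joined to v+1, v-1, v+2, v-2, v+3, v-3 (mod n)
H = nx.circulant_graph(n, [1, 2, 3])

same_edges = {frozenset(e) for e in G.edges()} == {frozenset(e) for e in H.edges()}
print(same_edges)
print(nx.average_clustering(H), nx.average_shortest_path_length(H))
```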

##  Code Corner

### `networkx`

* `triangles`: [[doc]](https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.cluster.triangles.html)


* `transitivity`: [[doc]](https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.cluster.transitivity.html)


* `clustering`: [[doc]](https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.cluster.clustering.html)


* `average_clustering`: [[doc]](https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.cluster.average_clustering.html)

* `triangular_lattice_graph`: [[doc]](https://networkx.github.io/documentation/stable/reference/generated/networkx.generators.lattice.triangular_lattice_graph.html)


* `grid_2d_graph`: [[doc]](https://networkx.github.io/documentation/stable/reference/generated/networkx.generators.lattice.grid_2d_graph.html)


* `grid_graph`: [[doc]](https://networkx.github.io/documentation/stable/reference/generated/networkx.generators.lattice.grid_graph.html)


* `path_graph`: [[doc]](https://networkx.github.io/documentation/networkx-1.9/reference/generated/networkx.generators.classic.path_graph.html)


* `cycle_graph`: [[doc]](https://networkx.github.io/documentation/networkx-1.9/reference/generated/networkx.generators.classic.cycle_graph.html)


* `circulant_graph`: [[doc]](https://networkx.github.io/documentation/stable/reference/generated/networkx.generators.classic.circulant_graph.html)


* `draw_circular`: [[doc]](https://networkx.github.io/documentation/stable/reference/generated/networkx.drawing.nx_pylab.draw_circular.html)

##  Exercises

1. What are the characteristic path length $L$, the transitivity $T$, and the clustering coefficient $C$
of the Petersen graph?

2. What are the characteristic path length $L$, the transitivity $T$, and the clustering coefficient $C$
of the Florentine families marital graph?

3. What is the transitivity and what is the clustering coefficient
of a complete graph on $n$ nodes?

4. What is the transitivity and what is the clustering coefficient
of a tree on $n$ nodes?

