### CS4423 - Networks
Angela Carnevale <br />
School of Mathematical and Statistical Sciences<br />
NUI Galway

#### 5. Small Worlds

# Week 9, lecture 2: 

# Transitivity and Clustering.

In [None]:
import networkx as nx
opts = { "with_labels" : True, "node_color": 'y' }

## Clustering

Recall:

**Definition (Graph transitivity).**
A **triad** is a tree on $3$ nodes or, equivalently, a graph consisting of $2$
adjacent edges (and the nodes they connect).  The **transitivity** $T$ of a graph $G = (X, E)$
is the proportion of **transitive** triads, i.e., triads which are subgraphs of **triangles**. Since each triangle contains $3$ triads, this proportion can be computed as follows:
$$
T = \frac{3 n_{\Delta}}{n_{\wedge}},
$$
where $n_{\Delta}$ is the number of triangles in $G$, and $n_{\wedge}$ is the number of triads.


In `networkx`, the function `transitivity` computes this quotient $T = 3 n_{\Delta}/n_{\wedge}$ for a graph `G`.
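As a sanity check, the quotient can also be assembled by hand. The sketch below is only an illustration: the graph is an arbitrary small example, and the variable names `n_triangles`, `n_triads` and `T_by_hand` are made up for this cell. It relies on `nx.triangles`, which reports for each node the number of triangles through it, and on the fact that a node of degree $k$ is the central node of $\binom{k}{2}$ triads.

```python
import networkx as nx

# A small example graph (an arbitrary choice for illustration).
G = nx.Graph([(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (1, 3), (2, 3), (3, 4)])

# nx.triangles gives, per node, the number of triangles through that node,
# so each triangle is counted three times in the sum.
n_triangles = sum(nx.triangles(G).values()) // 3

# A node of degree k is the central node of k * (k - 1) / 2 triads.
n_triads = sum(k * (k - 1) // 2 for _, k in G.degree())

T_by_hand = 3 * n_triangles / n_triads
print(n_triangles, n_triads, T_by_hand, nx.transitivity(G))
```

The last two printed values should agree up to floating point rounding.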

**Clustering** measures transitivity in a different way, either for a single node or for an entire graph.

For that, we'll need the concept of **induced subgraph**. Given $G=(X,E)$ and $Y\subset X$, the induced subgraph of $G$ on $Y$ is the graph $H=\left(Y,E\cap \binom{Y}{2}\right)$.

**Definition (Clustering coefficient).**
For a node $i \in X$ of a graph $G = (X, E)$, denote by
$G_i$ the subgraph induced on the neighbours of $i$ in $G$,
and by $m(G_i)$ its number of edges.

The **node clustering coefficient** $c_i$ of node $i$ is defined
as
$$
c_i = \begin{cases}
\binom{k_i}{2}^{-1} m(G_i), & k_i \geq 2, \\
0, & \text{else.}
\end{cases}
$$
That is, the node clustering coefficient measures the proportion of existing edges in the node's **social graph** $G_i$ among the possible edges.

The **graph clustering coefficient** $C$ of $G$ is the 
average node clustering coefficient,
$$
C = \langle c\rangle = \frac1n \sum_{i=1}^n c_i.
$$

By definition, $0 \leq c_i \leq 1$ for all nodes $i \in X$, and $0 \leq C \leq 1$.

**Example.**

In [None]:
G = nx.Graph([(0,1), (0,2), (0,3), (0,4), (1,2), (1,3), (2,3), (3,4)])
nx.draw(G, **opts)

In [None]:
# induced subgraph G_0 on the neighbours of node 0
N = list(G.neighbors(0))
S = G.subgraph(N)
nx.draw(S, **opts)

In [None]:
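nS = S.number_of_nodes()          # k_0, the degree of node 0
nS_choose_2 = nS * (nS - 1) // 2  # possible edges among the neighbours
mS = S.number_of_edges()          # m(G_0), edges actually present
print(nS, mS, mS / nS_choose_2)   # the last value is c_0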

In [None]:
nx.clustering(G)

In [None]:
nx.average_clustering(G)
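The same computation can be repeated for every node. The following is a minimal sketch that follows the definition literally; the helper name `node_clustering` is hypothetical (it is not a `networkx` function), and the graph is redefined so that the cell runs on its own.

```python
import networkx as nx

def node_clustering(G, i):
    """Clustering coefficient of node i, computed from the definition."""
    k = G.degree(i)
    if k < 2:
        return 0.0
    # induced subgraph G_i on the neighbours of i
    S = G.subgraph(list(G.neighbors(i)))
    return S.number_of_edges() / (k * (k - 1) / 2)

G = nx.Graph([(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (1, 3), (2, 3), (3, 4)])

by_hand = {i: node_clustering(G, i) for i in G}
builtin = nx.clustering(G)

# graph clustering coefficient: plain average of the node values
C = sum(by_hand.values()) / G.number_of_nodes()

print(by_hand)
print(C, nx.average_clustering(G))
```

Both dictionaries should contain the same values, and `C` should match `nx.average_clustering(G)` up to rounding.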

* The **node clustering coefficient** of any node $i$ of degree $k_i \geq 2$ in a $G(n, p)$ **random graph** is, in expectation,
$c_i = p$. (Each potential edge is present independently with probability $p$, so in any selection of potential edges a proportion of about $p$ is
present in the random graph; this is true in particular for the $\binom{k}{2}$ potential
edges between the $k$ neighbors of a node of degree $k$.)

* Thus the **graph clustering coefficient** of a $G(n, p)$ **random graph** is approximately
$$
C \approx p.
$$

* Note that when $p(n) = \langle k \rangle n^{-1}$ for a fixed expected average degree $\langle k \rangle$
then $C \approx \langle k \rangle / n \to 0$ for $n \to \infty$: in large random graphs
the number of triangles is negligible compared to the number of triads.

* In real-world networks, one often observes that $C / \langle k \rangle$ does not depend
on $n$ (as $n \to \infty$).
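The behaviour of random graphs described above can be explored numerically. The sketch below is an illustration only: the target average degree `k_avg = 10`, the sizes, and the seed are arbitrary choices, and a single sample per size is drawn with `nx.gnp_random_graph`, so the values fluctuate around $p$ rather than matching it exactly.

```python
import networkx as nx

k_avg = 10          # assumed target average degree (arbitrary choice)
results = {}

for n in (100, 200, 400, 800):
    p = k_avg / n
    # one sample from G(n, p); the seed only makes the run reproducible
    R = nx.gnp_random_graph(n, p, seed=4423)
    results[n] = nx.average_clustering(R)
    print(n, p, results[n])
```

In each row the last value should be close to $p = \langle k \rangle / n$, and it should shrink as $n$ grows.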

### Clustering vs Transitivity

For a node $i \in X$, denote by $n_i^{\wedge} = \binom{k_i}{2}$ the number of
triads containing $i$ as their central node, and by $n_i^{\Delta}$ the actual
number of triangles containing $i$.

Then, for $k_i \geq 2$, the node clustering coefficient is $c_i = n_i^{\Delta}/n_i^{\wedge}$,
and in every case $n_i^{\Delta} = n_i^{\wedge} c_i$ (both sides are $0$ when $k_i < 2$).

Moreover $3 n_{\Delta} = \sum_i n_i^{\Delta}$, since each triangle contains $3$ nodes, and $n_{\wedge} = \sum_i n_i^{\wedge}$, since each triad has exactly one central node.

It follows that
$$
T = \frac{3 n_{\Delta}}{n_{\wedge}} = \frac1{n_{\wedge}} \sum_i n_i^{\wedge} c_i
$$
in contrast to
$$
C = \frac1n \sum_i c_i.
$$

That is, $C$ is the (plain) **average** of the node clustering coefficients, whereas $T$ is a
**weighted average** of node clustering coefficients with weights $n_i^{\wedge}/n_{\wedge}$, giving higher weight to
high-degree nodes.
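The two averages can be compared directly in code. This is a small sketch on an arbitrary example graph; the names `w`, `T_weighted` and `C_plain` are chosen here for illustration. The weights are the triad counts $n_i^{\wedge} = \binom{k_i}{2}$.

```python
import networkx as nx

G = nx.Graph([(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (1, 3), (2, 3), (3, 4)])

c = nx.clustering(G)                                 # node coefficients c_i
w = {i: k * (k - 1) // 2 for i, k in G.degree()}     # triad counts n_i^wedge

# weighted average of the c_i, with weights n_i^wedge / n_wedge
T_weighted = sum(w[i] * c[i] for i in G) / sum(w.values())

# plain average of the c_i
C_plain = sum(c.values()) / len(c)

print(T_weighted, nx.transitivity(G))
print(C_plain, nx.average_clustering(G))
```

Each printed pair should agree up to rounding, while the two rows differ from each other.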

The following example illustrates how $C$ and $T$ are different measures: as $n \to \infty$ here, $T \to 0$ while $C \to 1$.

In [None]:
n = 4
G = nx.Graph([("A", "B")])   # two adjacent hubs A and B
# join both hubs to each of the nodes 0, ..., n-1
G.add_edges_from([(x, k) for x in "AB" for k in range(n)])

nx.draw(G, **opts)

In [None]:
nx.average_clustering(G), nx.transitivity(G)
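To see the limits emerge, the same construction can be repeated for growing $n$. Counting by hand suggests $n$ triangles and $n(n+2)$ triads, hence $T = 3/(n+2)$, while each of the $n$ low-degree nodes has $c_i = 1$ and each hub has $c_i = 2/(n+1)$. The sketch below compares these closed forms with the `networkx` values; the helper name `double_hub` is hypothetical and only wraps the construction used above.

```python
import networkx as nx

def double_hub(n):
    """Two adjacent hubs A and B, both joined to the nodes 0, ..., n-1."""
    G = nx.Graph([("A", "B")])
    G.add_edges_from((x, k) for x in "AB" for k in range(n))
    return G

for n in (4, 10, 100, 1000):
    G = double_hub(n)
    C, T = nx.average_clustering(G), nx.transitivity(G)
    # closed forms obtained by counting triangles and triads by hand
    C_formula = (n + 4 / (n + 1)) / (n + 2)
    T_formula = 3 / (n + 2)
    print(n, round(C, 4), round(C_formula, 4), round(T, 4), round(T_formula, 4))
```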

##  Code Corner

### `networkx`

* `shortest_path_length` : [[doc]](https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.shortest_paths.generic.shortest_path_length.html)


* `eccentricity`: [[doc]](https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.distance_measures.eccentricity.html)


* `triangles`: [[doc]](https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.cluster.triangles.html)


* `transitivity`: [[doc]](https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.cluster.transitivity.html)


* `clustering`: [[doc]](https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.cluster.clustering.html)


* `average_clustering`: [[doc]](https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.cluster.average_clustering.html)

##  Exercises

1. What are the characteristic path length $L$, the transitivity $T$, and the clustering coefficient $C$
of the Petersen graph?

2. What are the characteristic path length $L$, the transitivity $T$, and the clustering coefficient $C$
of the Florentine families marital graph?

3. What is the transitivity and what is the clustering coefficient
of a complete graph on $n$ nodes?

4. What is the transitivity and what is the clustering coefficient
of a tree on $n$ nodes?

