### CS4423 - Networks
Angela Carnevale  
School of Mathematical and Statistical Sciences  
University of Galway

#### 3. Centrality Measures

# Week 6, lecture 2: Betweenness Centrality.  Examples. A note on graph isomorphism

In [1]:
import networkx as nx
import pandas as pd
import yaml
from queue import Queue
opts = { "with_labels": True, "node_color": 'y'}

Next, recover the graph `G` of marital ties between Florentine families, together with the node attributes we have already determined.

In [2]:
with open('data/florentine.yml', 'r') as f:
    G=yaml.load(f,Loader=yaml.Loader)
    

## Betweenness Centrality

Recall the definitions of **betweenness centrality** of a node and its normalised analogue.

**Definition (Betweenness Centrality).**
In a simple, connected graph $G$, the **betweenness centrality** $c_i^B$ of node $i$
is defined as
$$
c_i^B = \sum_{j \neq i} \sum_{k \neq i} \frac{n_{jk}(i)}{n_{jk}},
$$
where $n_{jk}$ denotes the **number** of shortest paths from
node $j$ to node $k$, and where $n_{jk}(i)$ denotes the
number of those shortest paths **passing through** node $i$.

The **normalised betweenness centrality** of node $i$, defined as
$$
C_i^B = \frac{c_i^B}{(n-1)(n-2)}
$$
takes values in the interval $[0, 1]$.

* Note that $(n-1)(n-2)$ is the largest number of shortest paths between ordered pairs of distinct nodes that a given node could possibly sit on.

* How to compute $C_i^B$?

**BFS once more.**  This time as follows:

* A Python function that, for a fixed start node $x$, returns a **dictionary** containing, for each node $y$, a list of the **immediate predecessors** of $y$ on shortest paths from $x$ to $y$.  

* We have already seen that this is another piece of information that BFS can determine
on the fly: when constructing a **spanning tree** while traversing a graph, we need to remember **one**
immediate predecessor for each newly discovered node.  

* Here we determine and remember **all** immediate
predecessors, requiring little if any extra work.

From this list of predecessors, one can then recursively reconstruct **all shortest paths** from $x$ to $y$.
We still need to keep track of the shortest path lengths in order to decide which neighbor $z$ of $y$
actually is a predecessor of $y$.

In [14]:
def predecessors(G, x):
    
    # 1. init: set up the two dictionaries and queue
    dists = { y: None for y in G } # distances
    preds = { y: [] for y in G } 
    Q = Queue()
    dists[x] = 0
    Q.put(x)
    
    # 2. loop
    while not Q.empty():
        y = Q.get()
        for z in G.neighbors(y):
            if dists[z] is None:
                dists[z] = dists[y] + 1
                preds[z].append(y)
                Q.put(z)
            elif dists[z] > dists[y]:
                preds[z].append(y)
    
    # 3. stop here
    return preds

In [None]:
p = predecessors(G, 'Medici')
p

In [None]:
nx.draw(G, **opts)

Using the **predecessor lists** with respect to $x$, the **shortest paths** from $x$ to $y$ can be enumerated recursively:
* if $y = x$: the shortest path from $x$ to itself is the trivial path (of length $0$) starting and ending at $x$.
* else, if $y \neq x$ then each shortest path from $x$ to $y$ travels through
  exactly one of $y$'s predecessors ... and ends in $y$.

So, in formulas, $S_x(x) = \{(x)\}$ and
$$
S_x(y) = \{ p + (y) : p \in S_x(z),\, z \in \mathrm{pre}_x(y)\}
$$

In [None]:
def shortest_paths(x, y, pre):
    if x == y:
        return [[x]]
    paths = []
    for z in pre[y]:
        for path in shortest_paths(x, z, pre):
            paths.append(path + [y])
    return paths

In [None]:
def spaths(x, y, pre):
    if x == y:
        return [[x]]
    # the loop over z must come first, so that z is bound before spaths(x, z, pre) is evaluated
    return [ p + [y] for z in pre[y] for p in spaths(x, z, pre) ]

In [None]:
shortest_paths('Medici', 'Bischeri', p)
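As a sanity check of the recursion, here is a minimal, self-contained sketch on a 4-cycle, where two shortest paths join opposite corners. The adjacency dictionary `cycle` and the helper names `bfs_predecessors` and `enumerate_shortest_paths` are illustrative assumptions, not part of the lecture code, and the BFS uses `collections.deque` in place of `Queue`.

```python
from collections import deque

def bfs_predecessors(adj, x):
    """For every node y, list its immediate predecessors on shortest
    paths from x. `adj` is a dict: node -> list of neighbours."""
    dist = {x: 0}
    preds = {y: [] for y in adj}
    queue = deque([x])
    while queue:
        y = queue.popleft()
        for z in adj[y]:
            if z not in dist:            # z is discovered for the first time
                dist[z] = dist[y] + 1
                queue.append(z)
            if dist[z] == dist[y] + 1:   # y lies exactly one step before z
                preds[z].append(y)
    return preds

def enumerate_shortest_paths(x, y, preds):
    """Rebuild all shortest paths from x to y from the predecessor lists."""
    if x == y:
        return [[x]]
    return [p + [y]
            for z in preds[y]
            for p in enumerate_shortest_paths(x, z, preds)]

# 4-cycle 0 - 1 - 2 - 3 - 0 (hypothetical toy graph)
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
preds = bfs_predecessors(cycle, 0)
paths = enumerate_shortest_paths(0, 2, preds)
print(preds)   # node 2 has two predecessors, 1 and 3
print(paths)   # [[0, 1, 2], [0, 3, 2]]
```

Node `2` is reached at distance 2 from both `1` and `3`, so it has two predecessors and the recursion returns both shortest paths.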

* Now compute betweenness:

In [None]:
betweenness = { x : 0.0 for x in G }
n = G.order()

In [None]:
for x in G: 
    pre = predecessors(G, x)
    for y in G:
        paths = shortest_paths(x, y, pre)
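        njk = len(paths)*(n-1)*(n-2)  # number of shortest paths times the normalising factor
        for path in paths:  # named `path` so that the dictionary `p` from above is not overwritten
            for z in path[1:-1]:  # exclude endpoints
                betweenness[z] += 1/njk # add 1 (normalised) to betweenness of z
                                        # every time a shortest path passes through z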

In [None]:
betweenness

* Compare the results to the `networkx` version of betweenness:

In [None]:
nx.betweenness_centrality(G)
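The comparison can also be made numerically rather than by eye. The sketch below is self-contained and rests on one assumption: that the built-in `nx.florentine_families_graph()` can stand in for the graph stored in `data/florentine.yml`. It evaluates the definition directly with `nx.all_shortest_paths` and measures the largest deviation from `nx.betweenness_centrality`.

```python
import networkx as nx

# Assumption: networkx's built-in Florentine families graph stands in
# for the course's data/florentine.yml file.
G = nx.florentine_families_graph()
n = G.order()

# Evaluate the definition directly: sum over ordered pairs (j, k), j != k.
by_definition = {v: 0.0 for v in G}
for j in G:
    for k in G:
        if j == k:
            continue
        paths = list(nx.all_shortest_paths(G, j, k))
        for path in paths:
            for i in path[1:-1]:  # interior nodes only
                by_definition[i] += 1 / (len(paths) * (n - 1) * (n - 2))

reference = nx.betweenness_centrality(G)
max_gap = max(abs(by_definition[v] - reference[v]) for v in G)
print(max_gap)                            # should be at rounding-error level
print(max(reference, key=reference.get))  # the most central family
```

For undirected graphs `networkx` sums over unordered pairs and rescales by $2/((n-1)(n-2))$, which gives the same value as summing over ordered pairs and dividing by $(n-1)(n-2)$.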

In [None]:
nx.draw(G, **opts)

* Finally, let's add the normalized betweenness centralities as attributes to the
nodes of the graph, and display the resulting table.

In [None]:
nx.set_node_attributes(G, betweenness, '$C_i^B$')

In [None]:
pd.DataFrame.from_dict(
    dict(G.nodes(data=True)), 
    orient='index'
).sort_values('degree', ascending=False)

In [None]:
with open('data/florentine.yml', 'w') as f:
    yaml.dump(G,f)

##  Summary and examples

There are many different ways to be important.  As a node in a network, you are important if
* you have **many friends** (degree centrality)
* you have **important friends** (eigenvector centrality)
* you are **close** to many (closeness centrality)
* many interactions **pass through** you (betweenness centrality).

Recall that $C_i^C$ is the normalized closeness centrality of node $i$.  Why
   is $0 \leq C_i^C \leq 1$?  When is $C_i^C = 1$?  Is $C_i^C$ ever $0$?



In a graph of order $n$, the **normalised closeness centrality** of node $i$, defined as
$$
C_i^C = (n-1)\, c_i^C = \frac{n-1}{\sum_{j=1}^n d_{ij}}
$$
takes values in the interval $[0, 1]$.

In a connected network of order $n$, a node $i$ of degree $n-1$ is at distance $1$ from every other node, so $\sum_j d_{ij} = n-1$ and its normalised closeness centrality is $1$.

Recall that $C_i^B$ is the normalized betweenness centrality of node $i$.
   Why is $0 \leq C_i^B \leq 1$?  When is $C_i^B = 1$?  Is $C_i^B$ ever $0$?

A leaf (node of degree 1) will have betweenness centrality 0, since no shortest path between two other nodes passes through it.
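Both extreme cases show up in a star graph, where the hub is adjacent to every other node and all remaining nodes are leaves. The following sketch uses `nx.star_graph(4)` as an illustrative example chosen here (it is not one of the lecture's data sets).

```python
import networkx as nx

S = nx.star_graph(4)   # hub 0 joined to the leaves 1, 2, 3, 4, so n = 5

cc = nx.closeness_centrality(S)     # normalised closeness
bc = nx.betweenness_centrality(S)   # normalised betweenness

print(cc[0], bc[0])   # hub: both centralities are 1
print(cc[1], bc[1])   # leaf: closeness 4/7, betweenness 0
```

The hub sits on every shortest path between two leaves, so it attains the maximal betweenness $1$. A leaf has distance sum $1 + 2 + 2 + 2 = 7$, so its closeness is $4/7$, which is positive even though its betweenness is $0$.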

#### Back to...

## A note on Graph Isomorphism and Symmetries

* Two graphs $G = (X, E)$ and $H = (Y, F)$ are said to be **isomorphic** if there
is an edge-preserving bijection between their vertex sets $X$ and $Y$.

* [Deciding graph isomorphism](https://en.wikipedia.org/wiki/Graph_isomorphism_problem)
is computationally hard: no polynomial-time algorithm is known for general graphs.

* An isomorphism of a graph $G$ with itself is called an **automorphism**,
or a **symmetry** of $G$.

* Symmetries, or the lack thereof, are interesting properties of networks.

* For instance, in random selections, like the random trees on $n$ vertices, it turns out that **more symmetric species are less frequently picked**.

Morally, this is because graph symmetries are properties of an isomorphism class, rather than a specific graph or network. Let's gather an intuition for this by looking at (small) trees.

## Trees on 4 vertices


![4-trees](images/t4.png)

According to Cayley's formula, there are indeed $4^{4-2} = 16$ **labelled** trees on $n = 4$ vertices.  But overall, we only see $2$
distinct structures: 
* a **path graph** of length $3$, and 
* a **star graph** with $3$ spikes.

These structures are known as **unlabelled trees**
(as opposed to a **labelled tree**, where each node corresponds to
a specific element of $\{0, \dots, n{-}1\}$).

In [None]:
n = 4
T = nx.random_tree(n)
nx.draw(T)

When a tree is sampled at random in this way, the path graph occurs far more often than
the star graph.  Is there something wrong with the assumption of uniform
distribution?

No, there isn't.  It's just that the line **shape** appears more often than the
star **shape** in the full list of all **labelled** trees on 4 vertices. How often a **shape** appears is a function of how many symmetries (that is, how many automorphisms) it has: a shape with $a$ automorphisms appears exactly $n!/a$ times among the labelled trees on $n$ vertices.

Morally, fewer symmetries means that a **shape** will occur more often.
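The counts can be checked by brute force for $n = 4$. The sketch below is a self-contained illustration with hypothetical helper names (`is_connected`, `count_automorphisms`, `shape`). It lists all labelled trees as connected edge sets of size $n - 1$, sorts them by shape, and compares the result with $n!$ divided by the number of automorphisms.

```python
from itertools import combinations, permutations
from math import factorial

n = 4
vertices = range(n)
all_edges = list(combinations(vertices, 2))   # the 6 possible edges

def is_connected(edges):
    """Depth-first search from vertex 0 over the given edge list."""
    adj = {v: set() for v in vertices}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, stack = {0}, [0]
    while stack:
        v = stack.pop()
        for w in adj[v] - seen:
            seen.add(w)
            stack.append(w)
    return len(seen) == n

def count_automorphisms(edges):
    """Count vertex permutations that map the edge set onto itself."""
    edge_set = {frozenset(e) for e in edges}
    return sum(
        {frozenset((perm[a], perm[b])) for a, b in edges} == edge_set
        for perm in permutations(vertices)
    )

def shape(edges):
    """For n = 4 only: a tree is a star if some vertex has degree 3, else a path."""
    degrees = [sum(v in e for e in edges) for v in vertices]
    return "star" if max(degrees) == n - 1 else "path"

# A graph on n vertices with n - 1 edges is a tree exactly when it is connected.
trees = [es for es in combinations(all_edges, n - 1) if is_connected(es)]

counts = {"path": 0, "star": 0}
for es in trees:
    counts[shape(es)] += 1

path_aut = count_automorphisms([(0, 1), (1, 2), (2, 3)])
star_aut = count_automorphisms([(0, 1), (0, 2), (0, 3)])

print(len(trees), counts)                                   # 16 {'path': 12, 'star': 4}
print(factorial(n) // path_aut, factorial(n) // star_aut)   # 12 4
```

The path has only $2$ automorphisms (the identity and the reversal), whereas the star has $3! = 6$ (any permutation of its leaves), which accounts for the $12$ labelled paths against $4$ labelled stars.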

From next week and for a while, we will be mostly interested in average/global properties of graphs sampled from random models. In that case, we won't be thinking of specific graphs but rather about structural features of graphs that can occur when sampling.

##  Code Corner

### `networkx`

* `closeness_centrality`: [[doc]](https://networkx.github.io/documentation/stable/reference/algorithms/generated/networkx.algorithms.centrality.closeness_centrality.html#networkx.algorithms.centrality.closeness_centrality)
   
    
* `betweenness_centrality`: [[doc]](https://networkx.github.io/documentation/stable/reference/algorithms/generated/networkx.algorithms.centrality.betweenness_centrality.html#networkx.algorithms.centrality.betweenness_centrality)