### CS4423 - Networks
Prof. Götz Pfeiffer<br />
School of Mathematics, Statistics and Applied Mathematics<br />
NUI Galway

#### 2. Centrality Measures

# Lecture 10: Closeness and Betweenness Centrality

We study two more centrality measures
* closeness centrality and
* betweenness centrality

and compare these to the centralities studied earlier, using the
example of the marital ties graph of the Florentine families.

Import the packages and set standard drawing options:

In [None]:
import networkx as nx
import pandas as pd
from queue import Queue
opts = { "with_labels": True, "node_color": 'y'}

Next, recover the graph `G` of marital ties between Florentine families, together with the node attributes we have already determined.

In [None]:
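# Note: `read_yaml` is no longer available in recent networkx releases;
# this cell assumes an older networkx version that still provides it.
G = nx.read_yaml("data/florentine.yml")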
G.nodes['Medici']

In [None]:
G.number_of_nodes()

## Closeness Centrality

A node $x$ in a network can be regarded as central if it is **close** to (many) other nodes,
as it can then quickly interact with them.  A simple way to measure closeness in this sense
is based on the sum of the distances to all other nodes, as follows.

<div class="alert alert-danger">

**Definition (Closeness Centrality).**
In a simple, connected graph $G$, the **closeness centrality** $c_i^C$ of node $i$
is defined as
$$
c_i^C = \Bigl(\sum_j d_{ij}\Bigr)^{-1}.
$$

The **normalized closeness centrality** of node $i$, defined as
$$
C_i^C = (n-1) c_i^C
$$
takes values in the interval $[0, 1]$.
</div>
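
As a small worked example of this definition, the following sketch computes the normalized closeness centrality of each node of a path on four nodes. The toy graph `path`, given as a plain adjacency dictionary, and the helper names `bfs_distances` and `normalized_closeness` are illustrative assumptions, not part of the lecture code.

```python
from collections import deque

# Hypothetical toy graph: the path a - b - c - d as an adjacency dictionary.
path = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c"],
}

def bfs_distances(adj, source):
    """Map each node reachable from source to its distance from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist

def normalized_closeness(adj):
    """Normalized closeness (n-1) / sum_j d_ij; assumes a connected graph."""
    n = len(adj)
    return {
        i: (n - 1) / sum(bfs_distances(adj, i).values())
        for i in adj
    }

print(normalized_closeness(path))
```

The end nodes have distance sum $1 + 2 + 3 = 6$ and the inner nodes have distance sum $1 + 1 + 2 = 4$, so the inner nodes come out as more central.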

**BFS again.**

* The following Python function, taken from a previous lecture, uses BFS to compute
shortest distances.
* It takes a graph $G = (X, E)$ and a vertex $x \in X$
as its arguments.
* It returns a **dictionary** whose keys are the nodes reachable from $x$, and
whose values are their distances from $x$.

In [None]:
def distances(G, node):
    
    # 1. init: set up the dictionary and a queue
    d = { node: 0 }
    q = Queue()
    q.put(node)
    
    # 2. loop
    while not q.empty():
        x = q.get()
        for y in G.neighbors(x):
            if y not in d:
                d[y] = d[x] + 1
                q.put(y)
    
    # 3. stop here
    return d

In [None]:
d = distances(G, 'Medici')
print(d)

In [None]:
distances(G, 'Pucci')

* If the sum of the distances is $0$ (why?), computing the closeness will most likely
cause a division-by-zero error.

* From now on, we will work only with the large connected component of `G`, and
call it `GG`.

In [None]:
# pick the component with the most nodes; the order of components is not guaranteed
cc = max(nx.connected_components(G), key=len)
GG = G.subgraph(cc)
n = GG.number_of_nodes()

In [None]:
nx.draw(GG, **opts)

In [None]:
d = distances(GG, 'Medici')
sum(d.values())

* Use `distances` to compute the normalized closeness centrality according to the above
definition.

In [None]:
close_cen = { x : (n-1)/sum(distances(GG, x).values()) for x in GG }
close_cen
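
The computation above is safe because every node of `GG` has a positive distance sum. On a graph with isolated nodes, one possible way around the division-by-zero problem noted earlier is to measure closeness within the set of reachable nodes and to assign $0$ to a node that reaches nothing. The following sketch does this on a plain adjacency dictionary; the convention chosen, the function name `safe_closeness` and the toy graph `toy` are assumptions made for illustration.

```python
from collections import deque

def safe_closeness(adj):
    """Closeness relative to the reachable nodes; 0.0 for isolated nodes.

    This convention is an assumption for illustration, not the lecture's definition.
    """
    result = {}
    for source in adj:
        # BFS distances from source
        dist = {source: 0}
        queue = deque([source])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        total = sum(dist.values())
        reachable = len(dist) - 1          # nodes other than source
        result[source] = reachable / total if total > 0 else 0.0
    return result

# Hypothetical toy graph: one edge u - v and an isolated node w.
toy = {"u": ["v"], "v": ["u"], "w": []}
print(safe_closeness(toy))
```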

* Compare the results to the `networkx` version of closeness:

In [None]:
nx.closeness_centrality(GG)

* Let's add those measurements to the table.

In [None]:
nx.set_node_attributes(G, close_cen, '$C_i^C$')

In [None]:
pd.DataFrame.from_dict(
    dict(G.nodes(data=True)), 
    orient='index'
).sort_values('degree', ascending=False)

## Betweenness Centrality

When interactions between non-adjacent agents in a network depend
on middle men (on shortest paths between these agents), power comes
to those in the middle.  Betweenness centrality measures centrality
in terms of the number of shortest paths a node lies on.

<div class="alert alert-warning">
    
**Definition (Betweenness Centrality).**
In a simple, connected graph $G$, the **betweenness centrality** $c_i^B$ of node $i$
is defined as
$$
c_i^B = \sum_{j \neq i} \sum_{k \neq i} \frac{n_{jk}(i)}{n_{jk}},
$$
where $n_{jk}$ denotes the **number** of shortest paths from
node $j$ to node $k$, and where $n_{jk}(i)$ denotes the
number of those shortest paths **passing through** node $i$.

The **normalized betweenness centrality** of node $i$, defined as
$$
C_i^B = \frac{c_i^B}{(n-1)(n-2)}
$$
takes values in the interval $[0, 1]$.
</div>
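
To see how the fractions $n_{jk}(i)/n_{jk}$ arise, consider a cycle on four nodes, where opposite nodes are joined by two shortest paths. The following brute-force sketch applies the definition literally by listing all simple paths, which is only feasible for tiny connected graphs. The toy graph `cycle` and the helper names `simple_paths` and `brute_force_betweenness` are illustrative assumptions.

```python
from fractions import Fraction

# Hypothetical toy graph: the cycle a - b - c - d - a.
cycle = {
    "a": ["b", "d"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "a"],
}

def simple_paths(adj, start, goal, seen=None):
    """Yield every simple path from start to goal (tiny graphs only)."""
    seen = (seen or []) + [start]
    if start == goal:
        yield seen
        return
    for nxt in adj[start]:
        if nxt not in seen:
            yield from simple_paths(adj, nxt, goal, seen)

def brute_force_betweenness(adj):
    """Normalized betweenness by the definition; assumes a connected graph."""
    n = len(adj)
    score = {i: Fraction(0) for i in adj}
    for j in adj:
        for k in adj:
            if j == k:
                continue
            paths = list(simple_paths(adj, j, k))
            shortest = min(len(p) for p in paths)
            geodesics = [p for p in paths if len(p) == shortest]
            n_jk = len(geodesics)
            for p in geodesics:
                for i in p[1:-1]:          # interior nodes only
                    score[i] += Fraction(1, n_jk)
    return {i: s / ((n - 1) * (n - 2)) for i, s in score.items()}

print(brute_force_betweenness(cycle))
```

Each node lies on one of the two shortest paths between its two neighbors, in both directions, so $c_i^B = 2 \cdot \frac12 = 1$ and $C_i^B = \frac16$ for every node.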

**BFS once more.**  This time as a Python function which returns a **dictionary** that contains, for each node $y$, the list of **immediate predecessors** of $y$
on shortest paths from $x$ to $y$.  Yes, that's another piece of information that BFS can determine
on the fly.  From this, recursively, one can reconstruct **all shortest paths** from $x$ to $y$.
We still need to compute the shortest path lengths in order to decide whether a neighbor $z$ of $y$
actually is a predecessor of $y$, that is, whether the distance from $x$ to $y$ is one more than the distance from $x$ to $z$.

In [None]:
def predecessors(G, node):
    
    # 1. init: set up the two dictionaries and queue
    dists = { node: 0 }
    preds = { x : [] for x in G }
    q = Queue()
    q.put(node)
    
    # 2. loop
    while not q.empty():
        x = q.get()
        for y in G.neighbors(x):
            if y not in dists:
                dists[y] = dists[x] + 1
                q.put(y)
            if dists[y] == dists[x] + 1:
                preds[y].append(x)
    
    # 3. stop here
    return preds

In [None]:
p = predecessors(GG, 'Medici')
p

Using the predecessor lists with respect to $x$, the shortest paths from $x$ to $y$ can be enumerated recursively:
the only shortest path from $x$ to itself is the trivial path starting and ending at $x$.
Otherwise, if $y \neq x$, then each shortest path from $x$ to $y$ travels through
exactly one of $y$'s predecessors ... and ends in $y$.

In [None]:
def shortest_paths(x, y, pre):
    if x == y:
        return [[x]]
    paths = []
    for p in pre[y]:
        for path in shortest_paths(x, p, pre):
            paths.append(path + [y])
    return paths

In [None]:
shortest_paths('Medici', 'Bischeri', p)
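
When only the number $n_{jk}$ of shortest paths is needed, the paths do not have to be listed explicitly: the count for a node $y \neq x$ is the sum of the counts of its predecessors. The following is a minimal sketch of this idea, assuming predecessor lists of the shape returned by `predecessors`. The helper name `count_shortest_paths` and the toy dictionary `toy_pre` (a 4-cycle a - b - c - d - a seen from `a`) are illustrative and not part of the lecture code.

```python
def count_shortest_paths(x, pre):
    """Map each node to the number of shortest paths from x to it.

    `pre` maps each node to its list of immediate predecessors w.r.t. x.
    """
    counts = {x: 1}                        # one trivial path from x to x

    def count(y):
        if y not in counts:
            # every shortest path to y passes through exactly one predecessor
            counts[y] = sum(count(p) for p in pre[y])
        return counts[y]

    for y in pre:
        count(y)
    return counts

# Hypothetical predecessor lists for the cycle a - b - c - d - a, source "a".
toy_pre = {"a": [], "b": ["a"], "d": ["a"], "c": ["b", "d"]}
print(count_shortest_paths("a", toy_pre))
```

Nodes that cannot be reached from $x$ have an empty predecessor list and therefore receive the count $0$.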

* Now compute betweenness:

In [None]:
between = { x : 0.0 for x in GG }

In [None]:
for x in GG:
    pre = predecessors(GG, x)
    for y in GG:
        paths = shortest_paths(x, y, pre)
        # number of shortest x-y paths, times the normalization factor (n-1)(n-2)
        njk = len(paths)*(n-1)*(n-2)
        for path in paths:
            for z in path[1:-1]:  # exclude endpoints
                between[z] += 1/njk

In [None]:
between

In [None]:
nx.betweenness_centrality(GG)

In [None]:
nx.draw(GG, with_labels=True)

* Finally, let's add the normalized betweenness centralities as attributes to the
nodes of the graph, and display the resulting table.

In [None]:
nx.set_node_attributes(G, between, '$C_i^B$')

In [None]:
pd.DataFrame.from_dict(
    dict(G.nodes(data=True)), 
    orient='index'
).sort_values('degree', ascending=False)

## Code Corner

### `networkx`

* `read_yaml`: [[doc]](https://networkx.github.io/documentation/stable/reference/readwrite/generated/networkx.readwrite.nx_yaml.read_yaml.html#networkx.readwrite.nx_yaml.read_yaml)


* `closeness_centrality`: [[doc]](https://networkx.github.io/documentation/stable/reference/algorithms/generated/networkx.algorithms.centrality.closeness_centrality.html#networkx.algorithms.centrality.closeness_centrality)
   
    
* `betweenness_centrality`: [[doc]](https://networkx.github.io/documentation/stable/reference/algorithms/generated/networkx.algorithms.centrality.betweenness_centrality.html#networkx.algorithms.centrality.betweenness_centrality)

## Exercises

1. Recall that $C_i^C$ is the normalized closeness centrality of node $i$.  Why
   is $0 \leq C_i^C \leq 1$?  When is $C_i^C = 1$?  Is $C_i^C$ ever $0$?

2. Recall that $C_i^B$ is the normalized betweenness centrality of node $i$.
   Why is $0 \leq C_i^B \leq 1$?  When is $C_i^B = 1$?  Is $C_i^B$ ever $0$?
   
3. Determine the closeness centrality and the betweenness centrality of the nodes in some
   random trees.  What do you observe?
   
4. Compute the closeness centrality and the betweenness centrality of the nodes of the Petersen graph.
   What do you observe?