# Graph Centrality Measures
Finding important nodes in a network
<hr>

**Centrality Measures**<br>
Centrality measures capture the importance of a node in a network. Examples include degree (in-degree and out-degree), propagated degree (a node has large centrality if its neighbours have large centrality, e.g. if the friends of my friends are also important), closeness, and betweenness (nodes whose removal may break the network apart).

1. **Degree centrality**

    For undirected graphs, the degree $k_i$ of node $i$ is the number of edges incident to it.
    
    - Undirected graphs: $k_i = \sum_{j} A_{ij}$
    - Directed graphs: in-degree ($k_{i}^{in} = \sum_{j} A_{ji}$) and out-degree ($k_{i}^{out} = \sum_{j} A_{ij}$)
    
    Simple and intuitive: individuals with more connections have more *influence* and access to information. However, it does not capture the importance of those connections.
    
    
2. **Closeness centrality**
    
    Tracks how close a node is to every other node by taking the inverse of its average distance to all other nodes, i.e. a small average distance is equivalent to high closeness.
    
    $C_i = (\frac{1}{n-1} \sum_{j \neq i} d_{ij})^{-1}$
    
    where $d_{ij}$ is the distance between nodes $i$ and $j$, i.e. the number of edges on a shortest path from $i$ to $j$.
    
    Harmonic centrality helps with disconnected networks, where $d_{ij}$ is infinite for pairs of nodes that cannot reach each other.
    
    $H_i = \frac{1}{n-1} \sum_{j \neq i} \frac{1}{d_{ij}}$, where each unreachable node contributes $\frac{1}{d_{ij}} = 0$ to the sum and small distances get more weight.
    
    

3. **Betweenness centrality**
    
    Measures the extent to which a node lies on paths between other nodes:
    
    $B_i = \frac{1}{n-2} \sum_{s, t} \frac{n_{st}^i}{g_{st}}$
    
    where $n_{st}^i$ is the number of shortest paths between $s$ and $t$ that pass through $i$, and $g_{st}$ is the total number of shortest paths between $s$ and $t$.
    

4. **Eigenvector centrality**

    *Node $i$ is important if the nodes connected to it are important.*
    
    The eigenvector centrality of a node is the weighted importance of the nodes pointing to it (left eigenvector, $x^T A$) or of the nodes that it points to (right eigenvector, $Ax$). The eigenvector centralities of a directed graph are given by the eigenvector $v$ associated with the largest eigenvalue, $\lambda_{max}$.
    
    Then, the eigenvector centrality of node $i$ is the $i^{th}$ entry of $v$, denoted $v_i$.
        
    Recap: For an $n \times n$ matrix $A$, a value $\lambda$ is an eigenvalue with corresponding eigenvector $x \neq 0$ if and only if $Ax = \lambda x$.
    
    The interpretation of eigenvector centrality is that the ranking of a particular node $i$ satisfies:
    
    $\sum_{j} v_j A_{ji} = \lambda_{max} v_i$
    
    and this implies
    
    $v_i = \frac{1}{\lambda_{max}} \sum_{j} v_j A_{ji}$
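
As a rough illustration of the first three measures, the sketch below computes degree, closeness, harmonic and betweenness centrality for a small undirected graph using breadth-first search. The helper names (`bfs`, `centralities`) and both example graphs are assumptions made for illustration, and betweenness is returned as a raw sum over unordered pairs $(s, t)$, without any normalisation constant.

```python
from collections import deque


def bfs(adj, source):
    """Return (dist, sigma) from `source`: shortest-path lengths and the
    number of distinct shortest paths to every reachable node."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:            # first time w is reached
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:   # u precedes w on a shortest path
                sigma[w] += sigma[u]
    return dist, sigma


def centralities(adj):
    """Degree, closeness, harmonic and unnormalised betweenness centrality
    for an undirected graph given as {node: [neighbours]}."""
    nodes = sorted(adj)
    n = len(nodes)
    paths = {s: bfs(adj, s) for s in nodes}
    degree = {i: len(adj[i]) for i in nodes}
    closeness, harmonic, betweenness = {}, {}, {}
    for i in nodes:
        dist_i, sigma_i = paths[i]
        others = [j for j in nodes if j != i]
        # Closeness: inverse of the average distance (0 if some node is unreachable)
        if all(j in dist_i for j in others):
            closeness[i] = (n - 1) / sum(dist_i[j] for j in others)
        else:
            closeness[i] = 0.0
        # Harmonic: unreachable nodes simply contribute nothing to the sum
        harmonic[i] = sum(1 / dist_i[j] for j in others if j in dist_i) / (n - 1)
        # Betweenness: fraction of shortest s-t paths that pass through i
        total = 0.0
        for s in others:
            dist_s, sigma_s = paths[s]
            for t in others:
                if s < t and t in dist_s and i in dist_s and t in dist_i:
                    if dist_s[i] + dist_i[t] == dist_s[t]:
                        total += sigma_s[i] * sigma_i[t] / sigma_s[t]
        betweenness[i] = total
    return degree, closeness, harmonic, betweenness


# Path graph 0 - 1 - 2 - 3
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
degree, closeness, harmonic, betweenness = centralities(path)
print(degree, closeness, betweenness)

# Same graph plus an isolated node 4: closeness collapses, harmonic does not
with_isolated = {**path, 4: []}
_, closeness_iso, harmonic_iso, _ = centralities(with_isolated)
print(closeness_iso, harmonic_iso)
```

On the path graph the two inner nodes come out ahead on every measure, and adding the isolated node drives every closeness score to zero while the harmonic scores stay informative.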
    
<hr>

**Katz Centrality**

Eigenvector centrality cannot be measured properly in directed graphs that contain *source* nodes (no node points to them) or *sink* nodes (they point to no node), because such nodes are assigned zero eigenvector centrality.

**Solution**: Give every node some fixed (but small) centrality

$x_{i}^{(k+1)} = \alpha \sum_{j = 1}^{n} A_{ji} x_{j}^{(k)} + \beta_i$

or equivalently,

$x^{(k+1)} = \alpha \cdot A^T x^{(k)} + \beta$

Drawback of Katz centrality: A node of high centrality pointing to many nodes gives them all high centrality.

**Solution**: The PageRank remedy. Scale the centrality that a node passes on by its out-degree. For example, if an important (high-centrality) website points to your website but also to 999 other websites, it has out-degree 1000, so each of those websites receives only 1/1000 of its centrality.

$x_{j}^{(k+1)} = \alpha \sum_{i = 1}^{n} A_{ij} \frac{x_{i}^{(k)}}{k_i^{out}} + \beta_j$ where $k_i^{out}$ is the out-degree of node $i$

or equivalently,

$x^{(k+1)} = \alpha \cdot A^T D^{-1} x^{(k)} + \beta$ where $D = diag(k_1^{out}, \dots, k_n^{out})$

<hr>

**Hubs and Authorities**

An important **hub** is a node that points to many important **authorities**. An important authority is one that has many hubs pointing to it.

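We begin with an initial assignment of hub scores $x^0$ and authority scores $(y^0)^T$ for every node, and then update them alternately:

$x^{k+1} = \alpha \cdot A y^k$ and $(y^{k+1})^T = \beta \cdot (x^{k+1})^T A$

where $\alpha$ and $\beta$ are scaling constants that keep the scores normalised.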

# Basic code
A `minimal, reproducible example`

## Eigenvector centrality

In [42]:
# Adjacency matrix
import numpy as np
np.set_printoptions(precision = 4, suppress = True)
A = np.array(
    [
        [1,0,0,0], 
        [1,0,0,0], 
        [1,0,0,0], 
        [1,0,0,0]
    ]
)

In [43]:
# Find the left eigenvector centrality of all nodes
import networkx as nx
# from_numpy_array replaces from_numpy_matrix, which was removed in NetworkX 3.0
G = nx.from_numpy_array(A, create_using = nx.DiGraph)
left_eigenvector = nx.eigenvector_centrality(G)

sorted((v, f"{c:0.5f}") for v, c in left_eigenvector.items())

[(0, '1.00000'), (1, '0.00000'), (2, '0.00000'), (3, '0.00000')]

In [44]:
# Which node is the most important for the transposed matrix A^T (all edges reversed)?
G = nx.from_numpy_array(A.T, create_using = nx.DiGraph)
left_eigenvector = nx.eigenvector_centrality(G)

sorted((v, f"{c:0.5f}") for v, c in left_eigenvector.items())

[(0, '0.50000'), (1, '0.50000'), (2, '0.50000'), (3, '0.50000')]

In [45]:
# Find eigenvector centrality
A = np.array(
    [
        [1,1,1,1], 
        [1,0,0,0], 
        [1,0,0,0], 
        [1,0,0,0]
    ]
)

G = nx.from_numpy_array(A, create_using = nx.DiGraph)
left_eigenvector = nx.eigenvector_centrality(G)

sorted((v, f"{c:0.5f}") for v, c in left_eigenvector.items())

[(0, '0.79917'), (1, '0.34705'), (2, '0.34705'), (3, '0.34705')]
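
The values above can be cross-checked against the definition $v^T A = \lambda_{max} v^T$ with plain NumPy. This is only a sketch: it assumes the reported scores are scaled to unit Euclidean length (which is consistent with the numbers shown), and the variable names are chosen here for illustration.

```python
import numpy as np

A = np.array(
    [
        [1, 1, 1, 1],
        [1, 0, 0, 0],
        [1, 0, 0, 0],
        [1, 0, 0, 0],
    ],
    dtype=float,
)

# Left eigenvectors of A are the (right) eigenvectors of A.T
eigenvalues, eigenvectors = np.linalg.eig(A.T)
top = np.argmax(eigenvalues.real)
lam_max = eigenvalues[top].real

# Take the eigenvector of the largest eigenvalue, scale it to unit length
# and flip the sign if needed so that all entries are non-negative
v = eigenvectors[:, top].real
v = v / np.linalg.norm(v)
if v.sum() < 0:
    v = -v

# Each entry satisfies v_i = (1 / lam_max) * sum_j v_j A_ji
assert np.allclose(v @ A / lam_max, v)

print(lam_max)
print(v)
```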

## Katz centrality

In [78]:
# Compute the Katz centrality of a directed, acyclic graph (DAG)
A = np.array(
    [
        [0, 1, 1, 1], 
        [0, 0, 0, 0], 
        [0, 0, 0, 0], 
        [0, 0, 0, 0]
    ]
)

alpha = 0.1
beta  = 1

G = nx.from_numpy_array(A, create_using = nx.DiGraph)
# Pass alpha and beta explicitly (the values match the NetworkX defaults)
katz = nx.katz_centrality(G, alpha = alpha, beta = beta)
vals = np.array(list(katz.values()))
vals

array([0.4647, 0.5112, 0.5112, 0.5112])
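
The result above can be cross-checked by iterating the Katz update by hand. The sketch below assumes that `A[i, j] = 1` means node `i` points to node `j` and that a node collects centrality from the nodes pointing to it (hence the transpose). The final rescaling to unit Euclidean length is likewise an assumption, made to match the reported values.

```python
import numpy as np

A = np.array(
    [
        [0, 1, 1, 1],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
    ],
    dtype=float,
)

alpha = 0.1
beta = 1.0

# Katz iteration: x <- alpha * A^T x + beta, starting from zero
x = np.zeros(A.shape[0])
for _ in range(50):
    x = alpha * (A.T @ x) + beta

# The fixed point also solves the linear system (I - alpha * A^T) x = beta
x_exact = np.linalg.solve(np.eye(A.shape[0]) - alpha * A.T, np.full(A.shape[0], beta))

# Rescale to unit Euclidean length for comparison with the output above
x_unit = x / np.linalg.norm(x)

print(x)
print(x_unit)
```

Node 0 has no incoming edges, so it keeps only the baseline $\beta$, while the three nodes it points to each receive an extra $\alpha \cdot 1$.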

## PageRank centrality

In [25]:
# Compute the Katz centrality of the nodes
import networkx as nx
import numpy as np

A = np.array([
    [0, 1, 1, 1, 1, 1, 1],
    [1, 0, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 1, 0, 0],
    [1, 0, 0, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 1, 0]
])

alpha = 0.1
beta  = 1

G = nx.from_numpy_array(A, create_using = nx.DiGraph)
katz = nx.katz_centrality(G, alpha = alpha, beta = beta)
katz

{0: 0.4902924835393532,
 1: 0.36220399817757487,
 2: 0.36220399817757487,
 3: 0.36220399817757487,
 4: 0.36220399817757487,
 5: 0.358581961957795,
 6: 0.3259836068884889}

In [26]:
# Now compute the PageRank centrality for the same graph
alpha = 0.85
ranks = nx.pagerank(G, alpha = alpha)
ranks

{0: 0.34304516922104655,
 1: 0.12178517428468952,
 2: 0.12178517428468952,
 3: 0.12178517428468952,
 4: 0.12178517428468952,
 5: 0.09978782025635273,
 6: 0.07002631338384244}

Note: The first node (row 1, node 0) has high centrality because every other node points to it. Under Katz centrality, the last node (row 7, node 6) benefits from being one of the nodes that the first node points to: its score is about 66% of the first node's (0.326 vs 0.490). This effect is diminished under PageRank, because the first node points to every other node and the last node is just one of many: its score drops to about 20% of the first node's (0.070 vs 0.343).
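
The PageRank values above can be approximately reproduced by iterating the degree-scaled update directly. This is a sketch under stated assumptions rather than a description of the library internals: it takes $\beta_j = (1 - \alpha) / n$ for every node, starts from a uniform vector that sums to one, and relies on every node having a non-zero out-degree, which holds for this matrix.

```python
import numpy as np

A = np.array([
    [0, 1, 1, 1, 1, 1, 1],
    [1, 0, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 1, 0, 0],
    [1, 0, 0, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 1, 0],
], dtype=float)

n = A.shape[0]
alpha = 0.85
beta = (1 - alpha) / n      # assumed uniform baseline centrality
k_out = A.sum(axis=1)       # out-degree of each node (row sums)

# x <- alpha * A^T D^{-1} x + beta, where D = diag(k_out)
x = np.full(n, 1 / n)
for _ in range(200):
    x = alpha * (A.T @ (x / k_out)) + beta

print(x)
```

Node 0 spreads its score over six outgoing links, so node 6 receives only a sixth of it, whereas under Katz centrality it would receive the full amount scaled by $\alpha$.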

## Hubs and authorities

In [30]:
# Compute the hub and authority scores of each node
# A hub is a node that points to many important authorities
# An authority is one that has many hubs pointing to it

from networkx.algorithms.link_analysis.hits_alg import hits

A = np.array([
    [0, 1, 1, 1, 1, 1, 1],
    [1, 0, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 1, 0, 0],
    [1, 0, 0, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 1, 0]
]
)

G = nx.from_numpy_array(A, create_using = nx.DiGraph)
hubs, authorities = hits(G)

hubs, authorities

({0: 0.25440103615406334,
  1: 0.13175928388048821,
  2: 0.13175928388048824,
  3: 0.13175928388048821,
  4: 0.13175928388048821,
  5: 0.08680254444349544,
  6: 0.13175928388048824},
 {0: 0.2544010361540634,
  1: 0.13175928388048824,
  2: 0.13175928388048821,
  3: 0.13175928388048821,
  4: 0.13175928388048821,
  5: 0.13175928388048824,
  6: 0.08680254444349543})
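
The scores above can be approximately reproduced by alternating the hub and authority updates by hand. In this sketch the scaling constants $\alpha$ and $\beta$ are replaced by rescaling each vector to sum to one after every step. That rescaling is an assumption, chosen because the reported scores sum to one, and the uniform starting vector is also an illustrative choice.

```python
import numpy as np

A = np.array([
    [0, 1, 1, 1, 1, 1, 1],
    [1, 0, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 1, 0, 0],
    [1, 0, 0, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 1, 0],
], dtype=float)

n = A.shape[0]
authorities = np.full(n, 1 / n)   # initial authority scores y^0

for _ in range(100):
    # Hub score: sum of the authority scores of the nodes it points to
    hubs = A @ authorities
    hubs /= hubs.sum()
    # Authority score: sum of the hub scores of the nodes pointing to it
    authorities = A.T @ hubs
    authorities /= authorities.sum()

print(hubs)
print(authorities)
```

Node 5 has the lowest hub score because it points to a single node, and node 6 has the lowest authority score because only node 0 points to it.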