# Module 3: Influence Measures and Network Centralization

## Degree and Closeness Centrality

### Network Centrality
Centrality measures identify the most important nodes in a network:
- Influential nodes in a social network
- Nodes that disseminate information to many nodes or prevent epidemics
- Hubs in a transportation network
- Important pages on the Web
- Nodes that prevent the network from breaking up
    
### Centrality Measures
- Degree centrality
- Closeness centrality
- Betweenness centrality
- Load centrality
- Page Rank
- Katz centrality
- Percolation centrality

### Degree Centrality
- Assumption: important nodes have many connections.
- Basic measurement: the number of neighbors
    - undirected networks: use degree
        - $C_{deg}(v) = \frac{d_v}{|N|-1}$, where $N$ is the set of nodes in the network and $d_v$ is the degree of node $v$.
        - <img src="https://img.ceclinux.org/bc/d32596f244726222d22581014be742bd102dda.png">
        - <img src="https://img.ceclinux.org/63/e775787eacbd6b2ddd202de25576385672a09d.png">
    - directed networks: use in-degree or out-degree
        - $C_{indeg}(v)=\frac{d_v^{in}}{|N|-1}$, where N=set of nodes in the network, $d_v^{in}$ = the in-degree of node v.
        - <img src="https://img.ceclinux.org/8c/7aaa71c5fd67bcac62f322fb954dcf62f98e1a.png">
        - <img src="https://img.ceclinux.org/92/eda44dc2985b456e117b41f5c6dbcb15ffe00d.png">
        - $C_{outdeg}(v)=\frac{d_v^{out}}{|N|-1}$, where N=set of nodes in the network, $d_v^{out}$ = the out-degree of node v.
        - <img src="https://img.ceclinux.org/2b/b0e4483df8f94b1ea07a7c60560da031cf5f72.png">
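
The formulas above can be checked directly against NetworkX. This is a minimal sketch on small hypothetical graphs (not the ones in the figures); the helper names `manual_degree_centrality` and `manual_in_out_degree_centrality` are illustrative only, while `nx.degree_centrality`, `nx.in_degree_centrality` and `nx.out_degree_centrality` are NetworkX's own functions.

```python
import networkx as nx

# Hypothetical example graphs (not the ones in the figures above)
G = nx.Graph([("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")])
D = nx.DiGraph([("A", "B"), ("A", "C"), ("B", "C"), ("C", "A")])

def manual_degree_centrality(G):
    """C_deg(v) = d_v / (|N| - 1) for every node of an undirected graph."""
    n = G.number_of_nodes()
    return {v: G.degree(v) / (n - 1) for v in G}

def manual_in_out_degree_centrality(D):
    """C_indeg(v) = d_v^in / (|N| - 1) and C_outdeg(v) = d_v^out / (|N| - 1)."""
    n = D.number_of_nodes()
    indeg = {v: D.in_degree(v) / (n - 1) for v in D}
    outdeg = {v: D.out_degree(v) / (n - 1) for v in D}
    return indeg, outdeg

deg = manual_degree_centrality(G)
indeg, outdeg = manual_in_out_degree_centrality(D)

# NetworkX's built-in versions of the same normalized measures
nx_deg = nx.degree_centrality(G)
nx_indeg = nx.in_degree_centrality(D)
nx_outdeg = nx.out_degree_centrality(D)

print(deg)  # A is adjacent to all 3 other nodes, so its centrality is 1.0
```

Node A in `G` touches every other node, so it gets the maximum value of 1; node D, with one neighbour out of three possible, gets 1/3.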
        
### Closeness Centrality
- Assumption: important nodes are close to other nodes
<img src="https://img.ceclinux.org/e9/6f598d98a49c409b649622802d4ed253e06df8.png">
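
The standard closeness definition is $C_{close}(v) = \frac{|N|-1}{\sum_{u \in N \setminus \{v\}} d(v,u)}$, where $d(v,u)$ is the shortest-path distance; we assume this is the formula in the figure above. A minimal sketch on a hypothetical connected path graph, with `manual_closeness` as an illustrative helper compared against `nx.closeness_centrality`:

```python
import networkx as nx

def manual_closeness(G, v):
    """(|N| - 1) / sum of shortest-path distances from v (G assumed connected)."""
    dist = nx.single_source_shortest_path_length(G, v)  # includes v itself at distance 0
    return (G.number_of_nodes() - 1) / sum(dist.values())

# Hypothetical path graph A - B - C - D
G = nx.Graph([("A", "B"), ("B", "C"), ("C", "D")])
manual = {v: manual_closeness(G, v) for v in G}
nx_close = nx.closeness_centrality(G)
print(manual)
```

The end node A has distances 1, 2 and 3 (sum 6), giving 3/6 = 0.5, while the inner node B has distances 1, 1 and 2 (sum 4), giving 0.75: B is "closer" to everyone.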

### Measuring Disconnected Nodes
<img src="https://img.ceclinux.org/f5/11294253f7df977288d6f9f105afb776317cc4.png">
Consider only nodes that L can reach and normalize by the fraction of nodes L can reach:
<img src="https://img.ceclinux.org/99/7edc8d40668a577477afba158ec5c108ec2ef0.png">
```python
import networkx as nx
closeCent = nx.closeness_centrality(G, wf_improved=True)  # this option was named normalized=True in NetworkX 1.x
```
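
The reachable-nodes correction can be written out by hand: compute closeness only over the $r$ nodes that $v$ can reach, then multiply by the fraction $\frac{r}{|N|-1}$ of other nodes it reaches. This sketch assumes NetworkX 2.x or later (where the scaling is the `wf_improved` option) and uses a hypothetical two-component graph; `manual_closeness_disconnected` is an illustrative helper.

```python
import networkx as nx

def manual_closeness_disconnected(G, v):
    """Closeness over the nodes v can reach, scaled by the fraction of nodes it reaches."""
    dist = nx.single_source_shortest_path_length(G, v)
    reachable = len(dist) - 1                  # nodes v can reach, excluding v itself
    if reachable == 0:
        return 0.0                             # an isolated node gets 0
    raw = reachable / sum(dist.values())       # closeness within v's component
    fraction = reachable / (G.number_of_nodes() - 1)  # share of other nodes v can reach
    return raw * fraction

# Hypothetical graph with two components: A - B - C and D - E
G = nx.Graph([("A", "B"), ("B", "C"), ("D", "E")])
manual = {v: manual_closeness_disconnected(G, v) for v in G}
nx_close = nx.closeness_centrality(G, wf_improved=True)
print(manual)
```

Without the scaling, D (whose only neighbour is E) would get a perfect raw closeness of 1; the correction reduces it to 1/4 because D reaches only one of the four other nodes.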

## Betweenness Centrality
Assumption: important nodes connect other nodes

$C_{btw}(v) = \sum_{s,t \in N}\frac{\sigma_{s,t}(v)}{\sigma_{s,t}}$, where

$\sigma_{s,t}$ = the number of shortest paths between nodes s and t

$\sigma_{s,t}(v)$ = the number of shortest paths between nodes s and t that pass through node v

#### Example
<img src="https://img.ceclinux.org/08/979ef88c7876d4574c5240fbd68885cfd18424.png">
<img src="https://img.ceclinux.org/ec/c0efbd2d5e630710290edbbd336ab89711e74b.png">

### Betweenness Centrality - Normalization
Betweenness centrality values will be larger in graphs with many nodes. To control for this, we divide centrality values by the number of pairs of nodes in the graph (excluding $v$):
1. in undirected graphs: $\frac{1}{2}(|N|-1)(|N|-2)$
2. in directed graphs: $(|N|-1)(|N|-2)$

<img src="https://img.ceclinux.org/12/14e1e9f8fc3c011f7695b46a747f397cf4e2ba.png">
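
The definition and its undirected normalization can be computed by brute force: enumerate every unordered pair $\{s,t\}$, list all shortest paths, and credit each interior node with $\frac{1}{\sigma_{s,t}}$ per path. This sketch assumes an undirected, connected graph (a hypothetical square with a tail), is only practical for tiny graphs, and uses `brute_force_betweenness` as an illustrative name; it is compared with `nx.betweenness_centrality(G, normalized=True)`.

```python
import itertools
import networkx as nx

def brute_force_betweenness(G, normalized=True):
    """Betweenness straight from the definition (undirected, connected G)."""
    bc = {v: 0.0 for v in G}
    for s, t in itertools.combinations(G.nodes(), 2):    # unordered pairs {s, t}
        paths = list(nx.all_shortest_paths(G, s, t))     # sigma_st = len(paths)
        for path in paths:
            for v in path[1:-1]:                          # endpoints s, t are excluded
                bc[v] += 1 / len(paths)
    if normalized:
        n = G.number_of_nodes()
        pairs = (n - 1) * (n - 2) / 2                     # pairs of nodes not including v
        bc = {v: c / pairs for v, c in bc.items()}
    return bc

# Hypothetical graph: square A-B-D-C-A with a tail D-E
G = nx.Graph([("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("D", "E")])
manual = brute_force_betweenness(G)
nx_btw = nx.betweenness_centrality(G, normalized=True)
print(manual)
```

Node D lies on every path to E and on one of the two shortest B-C paths, giving a raw score of 3.5 and a normalized score of 3.5 / 6 = 7/12.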

### Betweenness Centrality - Complexity and Approximation
1. Computing the betweenness centrality of all nodes can be very computationally expensive: it can take up to $O(|N|^3)$ time.
2. Approximation: rather than computing betweenness centrality based on all pairs of nodes $s,t$, we can approximate it based on a sample of nodes.
<img src="https://img.ceclinux.org/a3/09f0875b08d248c0e555188732dc3e4512a768.png">
<img src="https://img.ceclinux.org/e9/6343a738f4acd9ddab62c58c8c97d66dacbeaf.png">
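
In NetworkX the sampling approximation is exposed through the `k` argument of `nx.betweenness_centrality`, which uses `k` sampled nodes instead of all of them; `seed` makes the sample reproducible. A minimal sketch on Zachary's karate club graph, which ships with NetworkX; the choice `k=10` is arbitrary.

```python
import networkx as nx

G = nx.karate_club_graph()  # 34-node karate club network bundled with NetworkX

exact = nx.betweenness_centrality(G, normalized=True)
# Estimate from a sample of k = 10 nodes; seed fixes the random sample
approx = nx.betweenness_centrality(G, k=10, normalized=True, seed=42)

def top5(scores):
    """The five nodes with the highest scores."""
    return sorted(scores, key=scores.get, reverse=True)[:5]

print("exact :", top5(exact))
print("approx:", top5(approx))
```

The approximate scores differ from the exact ones, but the highest-ranked nodes tend to overlap; larger `k` trades speed for accuracy.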

### Betweenness Centrality - Edges
Can use betweenness centrality to find important edges instead of nodes:

$C_{btw}(e) = \sum_{s,t \in N}\frac{\sigma_{s,t}(e)}{\sigma_{s,t}}$, where

$\sigma_{s,t}$ = the number of shortest paths between nodes s and t

$\sigma_{s,t}(e)$ = the number of shortest paths between nodes s and t that pass through edge e

<img src="https://img.ceclinux.org/c0/6bf480a9545a41ddea3935757c338ca8c828e4.png">
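
Edge betweenness can be checked the same brute-force way, crediting each edge on a shortest path instead of each interior node. This sketch assumes an undirected, connected graph; normalizing by all $\frac{1}{2}|N|(|N|-1)$ pairs is an assumption chosen to match `nx.edge_betweenness_centrality(G, normalized=True)` for undirected graphs, and `brute_force_edge_betweenness` is an illustrative name. Edges are keyed by `frozenset` so orientation does not matter.

```python
import itertools
import networkx as nx

def brute_force_edge_betweenness(G, normalized=True):
    """Edge betweenness from the definition (undirected, connected G)."""
    eb = {frozenset(e): 0.0 for e in G.edges()}
    for s, t in itertools.combinations(G.nodes(), 2):
        paths = list(nx.all_shortest_paths(G, s, t))
        for path in paths:
            for u, v in zip(path, path[1:]):        # consecutive nodes form an edge
                eb[frozenset((u, v))] += 1 / len(paths)
    if normalized:
        n = G.number_of_nodes()
        pairs = n * (n - 1) / 2                     # all unordered pairs {s, t}
        eb = {e: c / pairs for e, c in eb.items()}
    return eb

# Hypothetical graph: square A-B-D-C-A with a tail D-E
G = nx.Graph([("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("D", "E")])
manual = brute_force_edge_betweenness(G)
nx_eb = {frozenset(e): c for e, c in nx.edge_betweenness_centrality(G, normalized=True).items()}
print(manual)
```

Every shortest path to E uses edge D-E, so it lies on 4 of the 10 pairs' paths and scores 0.4, the highest of any edge here.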


## Basic Page Rank
Developed by Google's founders to measure the importance of webpages from the hyperlink network structure.

PageRank assigns a score of importance to each node. Important nodes are those with **many in-links** from **important pages**. 

PageRank can be used for any type of network, but it is mainly useful for directed networks.

**Basic PageRank Update Rule**: Each node gives an equal share of its current PageRank to all the nodes it links to. The new PageRank of each node is the sum of all the PageRank it received from other nodes.
<img src="https://img.ceclinux.org/fc/2d4709a09bfd59f4311b1115b1e3baadbf4d1a.png">

#### Example: calculate the PageRank of each node after 2 steps of the procedure (k=2)
<img src="https://img.ceclinux.org/0e/adbdbfd4f37c563ce070850dc69c7e7fe7a418.png">
<img src="https://img.ceclinux.org/20/a70a57045c2c740e961e2c9e0adc154a4f0be1.png">

For most networks, PageRank values converge as $k$ gets larger ($k \to \infty$).
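
The update rule can be run for $k$ steps in a few lines of plain Python. This is a sketch under two assumptions: every node starts with PageRank $1/|N|$, and every node has at least one out-link (dangling nodes are not handled). The graph and the name `basic_pagerank` are hypothetical.

```python
def basic_pagerank(out_links, k):
    """Apply the Basic PageRank update rule k times.

    out_links maps each node to the list of nodes it links to; every node is
    assumed to have at least one out-link.
    """
    nodes = list(out_links)
    pr = {v: 1 / len(nodes) for v in nodes}       # start: each node gets 1/|N|
    for _ in range(k):
        new_pr = {v: 0.0 for v in nodes}
        for u, targets in out_links.items():
            share = pr[u] / len(targets)          # equal share to each linked node
            for v in targets:
                new_pr[v] += share
        pr = new_pr
    return pr

# Hypothetical directed graph: A->B, A->C, B->C, C->A
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
after_two = basic_pagerank(links, k=2)
after_many = basic_pagerank(links, k=100)
print(after_two)
print(after_many)
```

After two steps the scores are A = 1/2, B = 1/6, C = 1/3; with many steps they settle near A = 0.4, B = 0.2, C = 0.4, illustrating the convergence described above.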

## Scaled Page Rank

1. The Basic PageRank of a node can be interpreted as the probability that a random walk lands on the node after $\textit{k}$ random steps.
2. Basic PageRank has the problem that, in some networks, a few nodes can "suck up" all the PageRank from the network. 
<img src="https://img.ceclinux.org/82/feff56666419c7a25f979c9013628a04a54c51.png">
3. To fix this problem, Scaled PageRank introduces a parameter $\alpha$: at each step the random walker follows a random outgoing edge with probability $\alpha$ and jumps to a uniformly random node with probability 1-$\alpha$.
4. In practice, we typically set $\alpha$ to be 0.8 or 0.9. The PageRank result is dependent on the $\alpha$ we set.
5. NetworkX function: `nx.pagerank(G, alpha=0.8)`
6. Scaled PageRank works much better for **very large networks**.
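
The random-jump version changes one line of the basic update: every node first receives $\frac{1-\alpha}{|N|}$, and link shares are multiplied by $\alpha$. This sketch runs a fixed number of iterations rather than a convergence test and, like the basic version, assumes no dangling nodes; on such graphs it should agree closely with `nx.pagerank(G, alpha=0.8)`. The graph and the name `scaled_pagerank` are hypothetical.

```python
def scaled_pagerank(out_links, alpha=0.8, k=200):
    """Scaled PageRank by k iterations; alpha = 1.0 reduces to Basic PageRank."""
    nodes = list(out_links)
    n = len(nodes)
    pr = {v: 1 / n for v in nodes}
    for _ in range(k):
        # With probability 1 - alpha the walker jumps to a uniformly random node
        new_pr = {v: (1 - alpha) / n for v in nodes}
        for u, targets in out_links.items():
            # With probability alpha it follows one of u's out-links
            share = alpha * pr[u] / len(targets)
            for v in targets:
                new_pr[v] += share
        pr = new_pr
    return pr

# Hypothetical graph where C and D "suck up" all PageRank under the basic rule
links = {"A": ["B"], "B": ["C"], "C": ["D"], "D": ["C"]}
basic = scaled_pagerank(links, alpha=1.0, k=50)
scaled = scaled_pagerank(links, alpha=0.8)
print(basic)
print(scaled)
```

With $\alpha = 1$, A and B end with PageRank 0 while C and D hold everything; with $\alpha = 0.8$, A keeps $0.2/4 = 0.05$ from random jumps and B gets $0.05 + 0.8 \cdot 0.05 = 0.09$.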


## Hubs and Authorities
Given a query to a search engine:
- **Root**: set of highly relevant web pages (e.g. pages that contain the query string) - potential $\textit{authorities}$
- Find all pages that link to a page in root - potential $\textit{hubs}$
- **Base**: root nodes and any node that links to a node in root.
- Consider all edges connecting nodes in the base set.

<img src="https://img.ceclinux.org/15/2014d0836afac5da3600ece39dbf07c99a1444.png">

### HITS Algorithm
Compute $\textit{k}$ iterations of the HITS algorithm to assign an $\textit{authority score}$ and a $\textit{hub score}$ to each node.
1. Assign each node an authority and hub score of 1
2. Apply the ***Authority Update Rule***: each node's authority score is the sum of hub scores of each node that points to it.
3. Apply the ***Hub Update Rule***: each node's hub score is the sum of authority scores of each node that it points to.
4. **Normalize** Authority and Hub scores: $auth(j)=\frac{auth(j)}{\sum_{i\in N}auth(i)}$ and $hub(j)=\frac{hub(j)}{\sum_{i\in N}hub(i)}$

#### Example:
<img src="https://img.ceclinux.org/39/cb2127e363cb34148759e3a441b4e2a06a3d6b.png">
<img src="https://img.ceclinux.org/ec/db8769fc423de0a1a6ac485a0729bd57291235.png">
<img src="https://img.ceclinux.org/89/0ab45bd5b910e255fe21dce25df9e5d27d90b6.png">
<img src="https://img.ceclinux.org/e9/8eb572f3ec6c8eb95a17fb417f48f2454424cd.png">
<img src="https://img.ceclinux.org/a5/031aab06b1104603cf6a167aa20d67b40781a1.png">

For most networks, as $\textit{k}$ gets larger, authority and hub scores converge to a unique value. 
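
The four steps can be sketched in plain Python. One assumption is labelled in the code: within an iteration, both update rules read the previous iteration's scores, and each score vector is then normalized to sum to 1. The graph must have at least one edge (otherwise the normalization divides by zero). The return order `(hub, auth)` mirrors `nx.hits`; the graph and the name `hits_iterations` are hypothetical.

```python
def hits_iterations(out_links, k):
    """k iterations of HITS; out_links maps each node to the nodes it points to.

    Assumption: both updates use the previous iteration's scores.
    """
    nodes = set(out_links) | {v for ts in out_links.values() for v in ts}
    auth = {v: 1.0 for v in nodes}   # step 1: all scores start at 1
    hub = {v: 1.0 for v in nodes}
    for _ in range(k):
        # Authority update: sum of hub scores of the nodes pointing to v
        new_auth = {v: 0.0 for v in nodes}
        for u, targets in out_links.items():
            for v in targets:
                new_auth[v] += hub[u]
        # Hub update: sum of authority scores of the nodes u points to
        new_hub = {u: sum(auth[v] for v in out_links.get(u, [])) for u in nodes}
        # Normalize each score vector so it sums to 1
        a_total, h_total = sum(new_auth.values()), sum(new_hub.values())
        auth = {v: s / a_total for v, s in new_auth.items()}
        hub = {v: s / h_total for v, s in new_hub.items()}
    return hub, auth

# Hypothetical directed graph: A->B, A->C, B->C, C->A
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
hub1, auth1 = hits_iterations(links, k=1)
hub2, auth2 = hits_iterations(links, k=2)
print(hub2, auth2)
```

After one iteration C is the best authority (two in-links) and A the best hub (two out-links); after two, C's authority score is 1/2 and A's hub score is 1/2.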

### HITS Algorithm NetworkX
1. `nx.hits(G)` computes the hub and authority scores of network G.
2. Output: two dictionaries, keyed by node, with the hub and authority scores of the nodes.


## Centrality Example
<img src="https://img.ceclinux.org/3f/45edfb53de1ad32806d77111cabd524f3d466e.png">

### Summary
1. No pair of centrality measures produces the exact same ranking of nodes, but they have some commonalities.
2. Centrality measures make different assumptions about what it means to be a "central" node. Thus, they produce different rankings.
3. The best centrality measure depends on the context of the network one is analyzing.
4. When identifying central nodes, it is usually best to use multiple centrality measures instead of relying on a single one. 
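
As a starting point for combining measures, this sketch ranks the karate club nodes by three of the measures from this module and prints each top five side by side. The helper `top_k` is illustrative, and PageRank (`nx.pagerank`) or HITS (`nx.hits`) scores could be added to the `measures` dictionary in the same spirit.

```python
import networkx as nx

G = nx.karate_club_graph()  # 34-node karate club network bundled with NetworkX

measures = {
    "degree": nx.degree_centrality(G),
    "closeness": nx.closeness_centrality(G),
    "betweenness": nx.betweenness_centrality(G, normalized=True),
}

def top_k(scores, k=5):
    """The k nodes with the highest scores, ties broken by node label."""
    return sorted(scores, key=lambda v: (-scores[v], v))[:k]

for name, scores in measures.items():
    print(f"{name:12s}", top_k(scores))
```

Comparing the printed lists shows which nodes are central under every assumption and which rank highly under only one.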
