### CS4423 - Networks
Angela Carnevale  
School of Mathematical and Statistical Sciences  
University of Galway

#### 6. Directed Networks

# PageRank

## History: chess tournaments

In his first mathematical paper in 1895, Edmund Landau suggested a new way to determine the winner in a chess tournament. Rather than just counting wins and losses, one should also take into account the quality of the opponents a player wins (or loses!) against.

If this sounds familiar, it is because this is the first recorded use of **eigenvector centrality**, a concept we have encountered before when dealing with undirected networks. The same idea is used, for instance, to determine the "SJR" index of scientific publications (based on the quality of journals).

Famously, it is also at the core of the main model of web page endorsement used by search engines.

## PageRank 

Based on the above discussion, a simpler model of endorsement for web pages involves only
one numerical value $r(p)$ per page $p$, built on the principle that
**a page is as important as the pages linking to it**.
As before, these importance values can be obtained by
repeatedly applying a suitable update rule to a set of current values.

Specifically, PageRank is computed as follows.

* If the network has $n$ nodes, each page $p$ receives an initial PageRank
of $r(p) = 1/n$.

* Choose a number of steps, $k$.

* Perform the following update rule $k$ times.

**Basic PageRank Update Rule:**
Each page divides its current PageRank by the number of
pages it links to, and passes this value on to those pages.
In this way, each page updates its PageRank to be the sum of
all the shares it receives from the pages linking to it.



Since the total PageRank of all nodes is preserved in each step
(each node splits its PageRank into equal parts and passes these on,
so nothing is lost or gained overall), there is no need for normalization.
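
To spell out this bookkeeping, write $l(x)$ for the number of pages that $x$ links to and $r'(y)$ for the updated value of page $y$ (both symbols are notation introduced here for illustration only), and assume that every page has at least one outgoing link. Then
$$
\sum_{y} r'(y)
= \sum_{y} \sum_{x \to y} \frac{r(x)}{l(x)}
= \sum_{x} l(x) \cdot \frac{r(x)}{l(x)}
= \sum_{x} r(x),
$$
since each page $x$ contributes its share $r(x)/l(x)$ exactly $l(x)$ times, once for each page it links to. Starting from $r(p) = 1/n$ for all $p$, the total therefore stays equal to $1$ after every step.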

After a number of steps, the PageRank values of the individual nodes 
stabilize.  These values form an equilibrium in the sense that
another application of the update rule will produce exactly the same
values.

**Example.**  The following graph represents
a network of $8$ web pages with hyperlinks.

![pagerank](images/pagerank.png)

The following table shows how the initial PageRank
of $\frac18$ of each page is updated under six iterations
of the basic PageRank update rule
and, in the bottom row, the limit values.

![pagerank-p](images/pagerank-p.png)

For larger examples, one can implement this algorithm in Python as follows.

In [None]:
import networkx as nx
n, m = 80, 120
G = nx.gnm_random_graph(n, m, directed=True)

In [None]:
G.out_degree()

The simple algorithm doesn't work if there is a node $x$ with no successors in $G$. (Why?)
So for now, let's add some random edges to make sure each node $x$ has out-degree at least $1$.

In [None]:
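import random

# Add an edge from every node to a random node other than itself
# (a no-op if that edge already exists), so that each node has
# out-degree at least 1.
for x in G:
    y = x
    while y == x:
        y = random.randrange(n)
    G.add_edge(x, y)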

In [None]:
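def PageRank(G, k):
    """Apply the basic PageRank update rule k times; return a dict node -> value."""
    n = G.order()
    r = {x: 1/n for x in G}          # initial PageRank 1/n for every node
    for _ in range(k):
        s = {x: 0 for x in G}        # values after this update step
        for x in G:
            # x splits its current PageRank equally among its successors
            # (requires out-degree >= 1, otherwise this divides by zero)
            share = r[x] / G.out_degree(x)
            for y in G.successors(x):
                s[y] += share
        r = s
    return r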

In [None]:
k = 50
pr = PageRank(G, k)
# the 10 nodes with the lowest PageRank
[(x, pr[x]) for x in sorted(pr, key=pr.get)][:10]

In [None]:
# the 10 nodes with the highest PageRank
[(x, pr[x]) for x in sorted(pr, key=pr.get, reverse=True)][:10]
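
On a random graph it is hard to tell whether the output is right, so it can help to run the update rule on a network small enough to check by hand. The following is a minimal sketch under stated assumptions: the three-page web `web` and the function name `basic_pagerank` are made up for illustration, the graph is stored as a plain dictionary of successor lists rather than a `networkx` graph, and exact fractions are used so that no rounding obscures the totals. For this web the equilibrium equations $r_a = r_c$, $r_b = r_a/2$, $r_c = r_a/2 + r_b$ with total $1$ give $(r_a, r_b, r_c) = (2/5, 1/5, 2/5)$, and the iterates should approach these values.

```python
from fractions import Fraction


def basic_pagerank(successors, k):
    """Apply the basic PageRank update rule k times, using exact fractions.

    `successors` maps each page to the list of pages it links to;
    every page is assumed to have at least one outgoing link.
    """
    n = len(successors)
    r = {x: Fraction(1, n) for x in successors}
    for _ in range(k):
        s = {x: Fraction(0) for x in successors}
        for x, targets in successors.items():
            share = r[x] / len(targets)  # equal share for each linked page
            for y in targets:
                s[y] += share
        r = s
    return r


# Hypothetical 3-page web: a -> b, a -> c, b -> c, c -> a.
web = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}

for k in (0, 1, 2, 50):
    r = basic_pagerank(web, k)
    values = {x: round(float(v), 4) for x, v in r.items()}
    print(k, values, "total:", sum(r.values()))
```

The printed total stays at exactly $1$ in every row, which is the conservation property used to justify skipping normalization.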

In terms of matrix algebra this process can be described as follows.

###  Spectral Analysis of PageRank

Here, we use a **variant of the adjacency matrix** $A$ of the directed graph $G = (X, E)$.

Let $N$ be the $n \times n$ matrix with entries $N_{ij} = 0$
if there is no link from node $x_j$ to node $x_i$ (a zero entry, as for the adjacency matrix $A$),
and $N_{ij} = 1/l_j$ if $x_j \to x_i$,
where $l_j$ is the number of links out of $x_j$.

Write $r = (r_1, \dots, r_n)$ for the vector of PageRank values of the nodes
$X = \{x_1, \dots, x_n\}$, regarded as a column vector.  Then the **basic PageRank update rule**
can be expressed as **matrix multiplication**:
$$
r \gets N \,r.
$$
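
The construction of $N$ can be sketched in a few lines. This is an illustration under assumptions, not the course's reference implementation: the three-page edge list and the helper name `mat_vec` are hypothetical, and plain lists with exact fractions stand in for a numerical library so that every entry can be checked by hand.

```python
from fractions import Fraction

# Hypothetical 3-page web: a -> b, a -> c, b -> c, c -> a.
nodes = ["a", "b", "c"]
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a")]

index = {x: i for i, x in enumerate(nodes)}
n = len(nodes)

# l_j: the number of links out of node x_j.
out_degree = {x: 0 for x in nodes}
for x, _ in edges:
    out_degree[x] += 1

# N[i][j] = 1/l_j if x_j -> x_i, and 0 otherwise.
N = [[Fraction(0)] * n for _ in range(n)]
for x, y in edges:
    N[index[y]][index[x]] = Fraction(1, out_degree[x])


def mat_vec(N, r):
    """Return the matrix-vector product N r as a list."""
    return [sum(N_ij * r_j for N_ij, r_j in zip(row, r)) for row in N]


r0 = [Fraction(1, n)] * n          # initial PageRank 1/n for every node
r1 = mat_vec(N, r0)                # one application of the update rule
column_sums = [sum(N[i][j] for i in range(n)) for j in range(n)]

print("N r0        =", r1)
print("column sums =", column_sums)
```

Each column of `N` describes how one page distributes its PageRank, which is why every column sums to $1$ as long as every page has at least one outgoing link.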

It can be shown that repeated application of the basic PageRank update rule
lets the PageRank values converge towards a vector $r^{\ast}$ with the property
$$
N\, r^{\ast} = r^{\ast},
$$
that is, $r^{\ast}$ is an **eigenvector** of $N$ for the eigenvalue $1$.
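
The eigenvector property can be verified exactly on a small case. The sketch below assumes a hypothetical three-page web (a -> b, a -> c, b -> c, c -> a) with its matrix $N$ typed in by hand; the candidate vector `r_star` $= (2/5, 1/5, 2/5)$ comes from solving $N r = r$ together with $r_a + r_b + r_c = 1$, and the helper name `mat_vec` is made up for illustration.

```python
from fractions import Fraction

half = Fraction(1, 2)

# Matrix N of a hypothetical 3-page web, rows and columns ordered a, b, c.
N = [
    [0,    0, 1],   # a receives everything c has
    [half, 0, 0],   # b receives half of a
    [half, 1, 0],   # c receives half of a and all of b
]


def mat_vec(N, r):
    """Return the matrix-vector product N r as a list."""
    return [sum(N_ij * r_j for N_ij, r_j in zip(row, r)) for row in N]


r_star = [Fraction(2, 5), Fraction(1, 5), Fraction(2, 5)]

# r_star is unchanged by the update rule, so it is an eigenvector
# of N for the eigenvalue 1.
is_fixed = mat_vec(N, r_star) == r_star

# Any multiple of r_star is also fixed; the condition that the
# entries sum to 1 singles out r_star itself.
scaled = [3 * v for v in r_star]
scaled_is_fixed = mat_vec(N, scaled) == scaled

print(is_fixed, scaled_is_fixed, sum(r_star))
```

Exact fractions are a deliberate choice here: with floating-point numbers the comparison `mat_vec(N, r_star) == r_star` could fail because of rounding even for a true fixed point.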

The argument uses the [Perron-Frobenius Theorem](https://en.wikipedia.org/wiki/Perron%E2%80%93Frobenius_theorem), which we have seen before (in week 6).  Recall that, for a matrix in which all entries are non-negative (such as the matrix $N$), the theorem guarantees the existence of a **real eigenvalue**
with a corresponding **eigenvector having non-negative entries**.
(Not every matrix with real entries has this property.)
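
One ingredient of this argument can be checked directly, without the theorem. Assume that every node has at least one outgoing link, and write $\mathbf{1} = (1, \dots, 1)$ for the all-ones column vector (notation introduced here for illustration). Each column of $N$ then sums to $1$, which says precisely that
$$
\mathbf{1}^T N = \mathbf{1}^T,
\qquad \text{or equivalently} \qquad
N^T\, \mathbf{1} = \mathbf{1}.
$$
Hence $1$ is an eigenvalue of $N^T$, with eigenvector $\mathbf{1}$. Since a matrix and its transpose have the same characteristic polynomial, $1$ is also an eigenvalue of $N$, which is the eigenvalue appearing in $N\, r^{\ast} = r^{\ast}$.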