# PageRank

In [2]:
import numpy as np
import networkx as nx
import matplotlib.pyplot as plt

Let $G = (V,E)$ be a directed graph on the vertices $1,2,\dots,n$. We add a self-loop to every vertex that has no outgoing edge, so that the out-degree satisfies
$$
\delta^+(v) = \#\{u\in V: (v,u) \in E\}\ge 1.
$$

Recall that the adjacency matrix is
$$
A = (a_{ij})_{i,j\in[n]} \qquad a_{ij} = \begin{cases}
1&: (i,j)\in E\\
0&:\text{else}
\end{cases},
$$
and the diagonal out-degree matrix $D$ is
$$
D = \text{diag}(A\boldsymbol{1}) = \begin{bmatrix}
\delta^+(1)&0&\dotsm&0\\
0&\delta^+(2) &\dotsm&0\\
\vdots&&\ddots&\vdots\\
0&\dotsm&0&\delta^+(n)
\end{bmatrix}.
$$
Note that $D$ is invertible, since every diagonal entry satisfies $\delta^+(j)\ge 1$ and is therefore nonzero.

Then the matrix
$$
P = D^{-1} A
$$
is the transition matrix of a random walk on the graph $G$.
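
As a rough illustration of the construction above, the following sketch builds $P = D^{-1}A$ with NumPy. The helper name `transition_matrix` and the 4-vertex example graph are hypothetical and chosen only for illustration; vertices are indexed from 0 rather than 1, as is usual in Python.

```python
import numpy as np

def transition_matrix(A):
    """Return P = D^{-1} A, adding a self-loop to every vertex with no outgoing edge."""
    A = np.array(A, dtype=float)             # copy, so the caller's matrix is untouched
    out_degree = A.sum(axis=1)               # delta^+(v) is the v-th row sum of A
    sinks = np.flatnonzero(out_degree == 0)  # vertices without an outgoing edge
    A[sinks, sinks] = 1.0                    # add the self-loop (v, v) for each such v
    out_degree = A.sum(axis=1)               # now every out-degree is at least 1
    return A / out_degree[:, None]           # divide row i by delta^+(i)

# Hypothetical example: edges 0->1, 0->2, 1->2, 2->0; vertex 3 has no outgoing edge.
A_example = np.array([
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
])
P_example = transition_matrix(A_example)
print(P_example)
print(P_example.sum(axis=1))  # each row of a stochastic matrix sums to 1
```

Dividing by `out_degree[:, None]` scales each row separately, which is the same as left-multiplying by $D^{-1}$ without forming the inverse explicitly.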

PageRank will be a vector $\mathbf{PR} = (\mathbf{PR}_1,\dots,\mathbf{PR}_n)$ whose entry $\mathbf{PR}_j$ measures, in some sense, the "centrality" of the webpage $j\in[n]$.

A first guess may be that a good choice is a stationary distribution $\boldsymbol{\pi}$ of $P$:
$$
\pi_i = \sum_{j=1}^n \pi_j p_{ji} = \sum_{j: (j,i)\in E} \pi_j \frac{1}{\delta^+(j)}.
$$
Unfortunately, $P$ need not be irreducible, and when it is not there can be infinitely many such vectors.
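
To see how uniqueness can fail, here is a small sketch with a hypothetical graph made of two disjoint 2-cycles (an example chosen for illustration, with vertices indexed from 0). Each cycle carries its own stationary distribution, and every convex combination of the two is again stationary.

```python
import numpy as np

# Hypothetical reducible chain: two disjoint 2-cycles, 0 <-> 1 and 2 <-> 3.
P_reducible = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
])

pi_left = np.array([0.5, 0.5, 0.0, 0.0])   # stationary, supported on {0, 1}
pi_right = np.array([0.0, 0.0, 0.5, 0.5])  # stationary, supported on {2, 3}

# Every mixture t * pi_left + (1 - t) * pi_right satisfies pi P = pi.
for t in (0.0, 0.25, 0.5, 1.0):
    pi_mix = t * pi_left + (1 - t) * pi_right
    print(t, np.allclose(pi_mix @ P_reducible, pi_mix))
```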

One way to overcome this is to **force** the transition matrix $P$ to become irreducible.

We know that if $P$ and $P'$ are two stochastic matrices, then
$$
\alpha P + (1-\alpha)P'\qquad \alpha\in[0,1]
$$
is stochastic (HW problem), and so maybe we can average $P$ with a *nice* matrix $P'$ to make a new matrix, $Q$, that is irreducible.

That is, we define
$$
Q = \alpha P + (1-\alpha) \frac{1}{n} \boldsymbol{1}\boldsymbol{1}^T.
$$

Note that
$$
 \boldsymbol{1}\boldsymbol{1}^T = \begin{bmatrix}
 1&1&1&\dotsm&1\\
 1&1&1&\dotsm&1\\
 \vdots&&\ddots&&\vdots\\
 1&1&\dotsm&1&1
 \end{bmatrix}
$$
which has row sums of $n$. So $\frac{1}{n} \boldsymbol{1}\boldsymbol{1}^T$ is stochastic!

If $\alpha\in(0,1)$, then $Q_{i,j}\ge \frac{1-\alpha}{n}>0$ for all $i,j\in[n]$, no matter what $P$ is, and so
$$
Q\qquad\text{is weakly lazy and irreducible}.
$$
Hence it is also aperiodic.
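
The entrywise bound can be checked numerically. The sketch below assumes a hypothetical helper `damped_matrix` and the value $\alpha = 0.85$, a common choice that is not fixed anywhere in these notes, and applies them to a reducible chain made of two disjoint 2-cycles.

```python
import numpy as np

def damped_matrix(P, alpha=0.85):
    """Return Q = alpha * P + (1 - alpha) * (1/n) * 1 1^T for a stochastic matrix P."""
    n = P.shape[0]
    uniform = np.ones((n, n)) / n            # the stochastic matrix (1/n) 1 1^T
    return alpha * P + (1 - alpha) * uniform

# Hypothetical reducible chain: two disjoint 2-cycles, 0 <-> 1 and 2 <-> 3.
P_reducible = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
])

alpha = 0.85  # assumed value, for illustration only
Q_example = damped_matrix(P_reducible, alpha)
n = Q_example.shape[0]

print(Q_example.min() >= (1 - alpha) / n - 1e-12)  # every entry is at least (1 - alpha)/n
print(np.allclose(Q_example.sum(axis=1), 1.0))     # Q is still stochastic
print(np.all(np.diag(Q_example) > 0))              # positive diagonal entries
```

Even though `P_reducible` has many zero entries and is not irreducible, every entry of `Q_example` is strictly positive.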

Finally, $\mathbf{PR}$ is defined as the unique stationary distribution of $Q$; uniqueness holds because $Q$ is irreducible.
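
One way to approximate $\mathbf{PR}$ is power iteration, which repeatedly applies $Q$ to a starting distribution; this is a minimal sketch under stated assumptions rather than a definitive implementation. The function name `pagerank`, the default $\alpha = 0.85$, the tolerance, and the 3-vertex example are all hypothetical choices. Because the current vector always sums to 1, the product with $\frac{1}{n}\boldsymbol{1}\boldsymbol{1}^T$ reduces to adding the constant $\frac{1-\alpha}{n}$ to each entry, so $Q$ never needs to be formed inside the loop.

```python
import numpy as np

def pagerank(P, alpha=0.85, tol=1e-12, max_iter=10_000):
    """Approximate the stationary distribution of Q = alpha*P + (1-alpha)/n * 1 1^T."""
    n = P.shape[0]
    pr = np.full(n, 1.0 / n)                          # start from the uniform distribution
    for _ in range(max_iter):
        # pr @ Q = alpha * (pr @ P) + (1 - alpha)/n * 1^T, since pr sums to 1
        new_pr = alpha * (pr @ P) + (1 - alpha) / n
        if np.abs(new_pr - pr).sum() < tol:           # stop once the update is negligible
            return new_pr
        pr = new_pr
    return pr                                         # best estimate after max_iter steps

# Hypothetical example: edges 0->1, 0->2, 1->2, 2->0 (vertices indexed from 0).
P_small = np.array([
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
])
alpha = 0.85  # assumed value, for illustration only
pr_example = pagerank(P_small, alpha)

# Check stationarity against the explicit matrix Q.
Q_small = alpha * P_small + (1 - alpha) * np.ones((3, 3)) / 3
print(pr_example)
print(np.allclose(pr_example @ Q_small, pr_example))
```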