# PageRank Revisited

A graph $G=(V,E)$ is a collection of vertices and edges.


<figure class="image" style="width:50%">
  <img src="https://upload.wikimedia.org/wikipedia/commons/thumb/5/5b/6n-graf.svg/1200px-6n-graf.svg.png" alt="Six circles, representing nodes, labeled A through F. There are directed arrows between certain pairs of nodes." width="300px">
  <figcaption><i>Example of a graph.  </i></figcaption>
</figure>

Graphs can model many things: social networks, molecules, air traffic, musicals, etc. In those contexts, the vertices might have names like "Hamilton" or "LAX", but for today we will just number them $0,1,\ldots,N-1$. Depending on context, edges may or may not have a natural direction. (Facebook friendships go both ways; Twitter follows do not.) Today, to keep things simple, we will assume that edges have no direction.

To model a graph mathematically, we construct its `adjacency matrix` $A$ by the rule $A[i,j]=1$ if user $i$ is friends with user $j$ and $A[i,j]=0$ otherwise. Example:

In [1]:
import numpy as np 

A=np.array([[0,1,1,1],[1,0,1,0],[1,1,0,0],[1,0,0,0]])
A

There are four people in our social network. User 0 is friends with everybody. Users 1 and 2 are also friends. User 3 has no friends other than user 0 (Sad!).

Who is the most popular person? To answer this question formally, we construct the degree vector.

\begin{equation*}
d[i]=\text{number of friends user } i \text{ has}=\text{number of ones in row } i=\sum_{j}A[i,j]
\end{equation*}

How can we get this via numpy?
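Summing each row of $A$ gives the degree vector directly. A minimal sketch (the name `most_popular` is just illustrative):

```python
import numpy as np

A = np.array([[0, 1, 1, 1], [1, 0, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0]])

# d[i] = sum_j A[i, j]: summing along axis=1 adds up the entries of each row
d = A.sum(axis=1)
print(d)  # [3 2 2 1]

# Index of the user with the most friends (argmax returns the first index on ties)
most_popular = np.argmax(d)
print(most_popular)  # 0
```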

Now, let's build a random walk matrix $P$. Suppose you are doing a random walk on the graph, like the one in the homework. We want
\begin{equation*}
P[i,j]=\text{probability that your next location is } j \text{ given that your current location is } i
\end{equation*}

Looking at $A$ below, what should the top row of $P$ be?

In [2]:
A
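
Row 0 of $A$ is $(0,1,1,1)$, so user 0 has $d[0]=3$ friends. If the walker picks one of the current user's friends uniformly at random (the rule encoded by the formula below), each friend is reached with probability $1/3$:

\begin{equation*}
P[0,:]=\left(\frac{0}{3},\frac{1}{3},\frac{1}{3},\frac{1}{3}\right)=\left(0,\frac{1}{3},\frac{1}{3},\frac{1}{3}\right)
\end{equation*}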

The formula should be 
\begin{equation*}
P[i,j]=\frac{A[i,j]}{d[i]}
\end{equation*}

How can we compute this in Python?

In [3]:
#Bad solution


In [4]:
#better solution attempt 1

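One possible way to fill in the two cells above, as a sketch: an explicit double loop that applies $P[i,j]=A[i,j]/d[i]$ entry by entry (easy to read, but slow for large graphs), and a vectorized version that relies on NumPy broadcasting.

```python
import numpy as np

A = np.array([[0, 1, 1, 1], [1, 0, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0]])
d = A.sum(axis=1)  # degree vector: number of friends of each user
N = A.shape[0]

# Loop version: one Python-level division per entry (O(N^2) interpreted steps)
P_loop = np.zeros((N, N))
for i in range(N):
    for j in range(N):
        P_loop[i, j] = A[i, j] / d[i]

# Vectorized version: d[:, None] has shape (N, 1), so broadcasting
# divides every entry of row i by d[i]
P = A / d[:, None]

print(P)
print(P.sum(axis=1))  # each row is a probability distribution, so it sums to 1
```

Both versions assume every user has at least one friend; a row with $d[i]=0$ would produce `nan` entries from dividing $0$ by $0$.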


Let's assume that the starting location $X_0$ is node zero. Then,

\begin{equation*}
\mathbb{P}(X_1=1)=\text{probability that you moved from node 0 to node 1}=P[0,1]=1/3
\end{equation*}

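This is one step of a random walk. How can we compute the two-step transition matrix? Start with a single entry. Conditioning on the intermediate location $X_1$ (the law of total probability) gives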

\begin{align*}
&\mathbb{P}(X_2=1)\\
=&\mathbb{P}(X_2=1, X_1=0) + \mathbb{P}(X_2=1, X_1=1) + \mathbb{P}(X_2=1, X_1=2) + \mathbb{P}(X_2=1, X_1=3)\\
\end{align*}

How can we compute $\mathbb{P}(X_2=1,X_1=3)$? If both events occur, we first took a step from 0 to 3 and then a step from 3 to 1. The second step depends only on the current location, so the probabilities of the two steps multiply. Therefore,

\begin{equation*}
\mathbb{P}(X_2=1,X_1=3)=P[0,3]P[3,1].
\end{equation*}

Applying this to every term, we have
\begin{align*}
&\mathbb{P}(X_2=1)\\
=&\mathbb{P}(X_2=1, X_1=0) + \mathbb{P}(X_2=1, X_1=1) + \mathbb{P}(X_2=1, X_1=2) + \mathbb{P}(X_2=1, X_1=3)\\
=&P[0,0]P[0,1] + P[0,1]P[1,1] + P[0,2]P[2,1] + P[0,3]P[3,1] \\
=&\sum_k P[0,k]P[k,1]\\
\end{align*}

Recall that if $C=AB$ then 
\begin{equation*}
C[i,j]=\sum_k A[i,k]B[k,j]
\end{equation*}

Thus, 
\begin{equation*}
\mathbb{P}(X_2=1)
=P^2[0,1]
\end{equation*}

where $P^2$ is the square of $P$. 
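
The identity $\mathbb{P}(X_2=1)=P^2[0,1]$ can be checked numerically for our four-person network. A minimal sketch: it computes the sum over the intermediate location explicitly, reads the same number off `P @ P`, and adds a small Monte Carlo simulation (the seed `0` and the number of walks are arbitrary choices).

```python
import numpy as np

A = np.array([[0, 1, 1, 1], [1, 0, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0]])
P = A / A.sum(axis=1)[:, None]  # P[i, j] = A[i, j] / d[i]

# Explicit sum over the intermediate location k: sum_k P[0, k] * P[k, 1]
two_step_sum = sum(P[0, k] * P[k, 1] for k in range(P.shape[0]))

# The same quantity read off the matrix square P^2 = P @ P
P2 = P @ P
print(two_step_sum, P2[0, 1])  # both are 1/6

# Monte Carlo check: simulate many two-step walks starting from node 0
rng = np.random.default_rng(0)
num_walks = 10_000
hits = 0
for _ in range(num_walks):
    x1 = rng.choice(4, p=P[0])   # first step, starting from node 0
    x2 = rng.choice(4, p=P[x1])  # second step, starting from X_1
    hits += int(x2 == 1)
print(hits / num_walks)  # should be close to 1/6
```

Only the path $0 \to 2 \to 1$ contributes here, since $P[0,0]=P[1,1]=P[3,1]=0$; that is why the answer is $\frac{1}{3}\cdot\frac{1}{2}=\frac{1}{6}$.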



Above, we always started at node 0. We can record this information with the vector $\mathbf{x}=(1,0,\ldots,0)$. 

What if we wanted to start at a node chosen uniformly at random? In that case we would use $\mathbf{x}=(1/N,\ldots,1/N)$.

Then 
\begin{equation*}
\mathbb{P}(X_1=2) = \sum_k \mathbb{P}(X_1=2, X_0 = k) = \sum_k P[k,2]\mathbf{x}[k] = (P^T\mathbf{x})[2]
\end{equation*}


In general, if $\mathbf{x}$ is the starting distribution, then

\begin{equation*}
\left((P^T)^t\mathbf{x}\right)[j] = \mathbb{P}(X_t=j)
\end{equation*}
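
As a sketch of this formula in NumPy, `np.linalg.matrix_power` raises $P^T$ to the $t$-th power. The helper name `distribution_after` is hypothetical, introduced only for this illustration.

```python
import numpy as np

A = np.array([[0, 1, 1, 1], [1, 0, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0]])
P = A / A.sum(axis=1)[:, None]
N = P.shape[0]

def distribution_after(P, x, t):
    """Hypothetical helper: entry j of the result is P(X_t = j) when X_0 ~ x."""
    return np.linalg.matrix_power(P.T, t) @ x

x_start_at_0 = np.zeros(N)
x_start_at_0[0] = 1.0            # always start at node 0
x_uniform = np.full(N, 1.0 / N)  # start at a uniformly random node

print(distribution_after(P, x_start_at_0, 2))  # entry 1 equals P^2[0, 1] = 1/6
print(distribution_after(P, x_uniform, 1))     # equals (1/2, 5/24, 5/24, 1/12)
```

Starting from node 0 recovers the row $P^2[0,:]$ computed earlier, while the uniform start already puts half of the probability on user 0 after a single step.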

__What does this have to do with PageRank?__

If we want to find the long-run distribution of the random walker, we can multiply $\mathbf{x}$ by a large power of $P^T$. Recall that $A$ is given by

In [5]:
A
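
A minimal sketch of "multiply by a large power" for this $A$, where $t=100$ is an arbitrary choice of "large". It starts from two different distributions and compares the results with the degree vector normalized to sum to 1.

```python
import numpy as np

A = np.array([[0, 1, 1, 1], [1, 0, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0]])
d = A.sum(axis=1)
P = A / d[:, None]
N = P.shape[0]

x_uniform = np.full(N, 1.0 / N)  # uniformly random start
x_start_at_0 = np.eye(N)[0]      # always start at node 0

# Apply a large power of P^T to each starting distribution
t = 100
Pt = np.linalg.matrix_power(P.T, t)
long_run_uniform = Pt @ x_uniform
long_run_from_0 = Pt @ x_start_at_0
print(long_run_uniform)
print(long_run_from_0)

# Degree vector normalized to sum to 1, for comparison: (3/8, 1/4, 1/4, 1/8)
print(d / d.sum())
```

For this graph both starting distributions end up at $(3/8, 1/4, 1/4, 1/8)$, which is exactly $d/\sum_i d[i]$. The graph is connected and contains the triangle $0,1,2$, so the walk does not oscillate and the limit exists.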

__Something is wrong__

__What about teleporting?__

PageRank gets basically the same results, but is slightly more `democratic`.
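
A minimal sketch of a teleporting walk, assuming the common formulation in which, at each step, the walker follows a random edge with probability $\alpha$ and otherwise jumps to a uniformly random node. The value $\alpha=0.85$ is a conventional choice and is not taken from these notes. The function name `pagerank` and its parameters are hypothetical and do not refer to any library API.

```python
import numpy as np

A = np.array([[0, 1, 1, 1], [1, 0, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0]])
d = A.sum(axis=1)
P = A / d[:, None]
N = P.shape[0]

def pagerank(P, alpha=0.85, num_iters=200):
    """Power iteration for a random walk with teleporting (sketch).

    With probability alpha the walker follows a random edge (matrix P);
    otherwise it jumps to a node chosen uniformly at random.
    """
    n = P.shape[0]
    G = alpha * P + (1 - alpha) / n * np.ones((n, n))  # teleporting walk matrix
    x = np.full(n, 1.0 / n)                             # uniform start
    for _ in range(num_iters):
        x = G.T @ x                                     # one step: x <- G^T x
    return x

r = pagerank(P)
print(r)            # scores with teleporting
print(d / d.sum())  # long-run distribution of the plain walk, for comparison
```

On this graph the sketch gives roughly $(0.367, 0.246, 0.246, 0.141)$, compared with $(0.375, 0.25, 0.25, 0.125)$ without teleporting. The ranking is the same, but the gap between the most and least popular users shrinks, because each step sends a share $1-\alpha$ of the probability to every node equally.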