## PageRank: Step-by-step
This is a barebones PageRank implementation that builds the full n-by-n link matrix (where n is the number of nodes in your graph). In practice, you will not want to do this: the matrix has n² entries, so for a large graph it quickly becomes too big to store.

In [None]:
# the random walker follows links with probability d
# (and teleports to a random node with probability 1 - d)
d = 0.75

In [None]:
import numpy as np

# represent your graph as a link matrix
# each row represents a node; entry (i, j) is the probability that a
# walker at node i follows a link to node j, so each row sums to 1
LinkMatrix = np.matrix(( (0.0, 0.0, 1.0, 0.0, 0.0), 
                         (0.5, 0.0, 0.5, 0.0, 0.0), 
                         (0.0, 0.0, 0.0, 0.5, 0.5),
                         (0.0, 0.0, 0.0, 0.0, 1.0), 
                         (0.0, 1.0, 0.0, 0.0, 0.0) ))  
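
The rows above can also be derived from an adjacency list rather than typed by hand. The following is a minimal sketch, assuming the walker picks uniformly among a node's out-links and that every node has at least one out-link; the names `out_links` and `link_from_list` are illustrative and not part of the original notebook.

```python
import numpy as np

# Hypothetical adjacency list for the same 5-node graph:
# node index -> list of nodes it links to
out_links = {0: [2], 1: [0, 2], 2: [3, 4], 3: [4], 4: [1]}

n = len(out_links)
link_from_list = np.zeros((n, n))
for node, targets in out_links.items():
    # a walker at `node` chooses one of its out-links uniformly at random
    for target in targets:
        link_from_list[node, target] = 1.0 / len(targets)

print(link_from_list)
```

This sketch uses a plain `np.array` under a separate name, so it does not overwrite `LinkMatrix`.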

In [None]:
# each row represents a node, the entries represent the probability
# of randomly teleporting to another node
Teleport = np.matrix(( (1./5, 1./5, 1./5, 1./5, 1./5),
                       (1./5, 1./5, 1./5, 1./5, 1./5),
                       (1./5, 1./5, 1./5, 1./5, 1./5),
                       (1./5, 1./5, 1./5, 1./5, 1./5),
                       (1./5, 1./5, 1./5, 1./5, 1./5) ))

In [None]:
# combine the two matrices to arrive at the probability that a
# random walker will move from one node to another (either by 
# following links or teleporting)
T = np.add(d * LinkMatrix, (1 - d) * Teleport)
print(T)
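
A quick sanity check on the combined matrix: since both the link matrix and the teleport matrix have rows that sum to 1, every row of their weighted combination should also sum to 1. The sketch below rebuilds the matrices as plain arrays under separate, illustrative names (`link_check`, `teleport_check`, `T_check`) so it can run on its own without touching the notebook's variables.

```python
import numpy as np

d_check = 0.75
link_check = np.array([[0.0, 0.0, 1.0, 0.0, 0.0],
                       [0.5, 0.0, 0.5, 0.0, 0.0],
                       [0.0, 0.0, 0.0, 0.5, 0.5],
                       [0.0, 0.0, 0.0, 0.0, 1.0],
                       [0.0, 1.0, 0.0, 0.0, 0.0]])
teleport_check = np.full((5, 5), 1.0 / 5)

# same combination as T, written with ordinary arithmetic operators
T_check = d_check * link_check + (1 - d_check) * teleport_check

# each row is a probability distribution over the next node
row_sums = T_check.sum(axis=1)
print(row_sums)
```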

### Let's find PageRank
The matrix T describes the Markov chain followed by the random walker: entry (i, j) is the probability of moving from node i to node j in a single step. On its own, it does not tell us the PageRank scores. For that, we need to find where the random walker will be after many steps, i.e. the long-term steady-state distribution of the walker.

In [None]:
# let's define an initial distribution for our random walker
# all that matters is it sums to 1 (since it's a probability distribution)
x = [0.1, 0.1, 0.2, 0.3, 0.3]

In [None]:
# now we can update each step of the random walk
# the new distribution (x_new) is where we arrive after 
# taking one step according to T
x_new = x*T
print(x_new)
x = x_new

To find the long-term steady-state distribution (the PageRank scores), we need to keep stepping through the random walk until the distribution stops changing. In this notebook, we can do that by repeatedly executing the cell above.
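
Instead of re-running the cell by hand, the same update can be put in a loop. This is a minimal sketch of power iteration, assuming convergence is judged by the total absolute change between successive distributions; the function name `power_iterate`, the tolerance `tol`, and the step cap `max_steps` are illustrative choices, not part of the original notebook. It uses plain arrays and the `@` operator rather than `np.matrix`.

```python
import numpy as np

d_iter = 0.75
link_iter = np.array([[0.0, 0.0, 1.0, 0.0, 0.0],
                      [0.5, 0.0, 0.5, 0.0, 0.0],
                      [0.0, 0.0, 0.0, 0.5, 0.5],
                      [0.0, 0.0, 0.0, 0.0, 1.0],
                      [0.0, 1.0, 0.0, 0.0, 0.0]])
n_iter = link_iter.shape[0]
transition = d_iter * link_iter + (1 - d_iter) * np.full((n_iter, n_iter), 1.0 / n_iter)


def power_iterate(transition, start, tol=1e-10, max_steps=1000):
    """Step the walker until the distribution changes by less than tol."""
    dist = np.asarray(start, dtype=float)
    for step in range(1, max_steps + 1):
        new_dist = dist @ transition  # one step of the random walk
        if np.abs(new_dist - dist).sum() < tol:
            return new_dist, step
        dist = new_dist
    return dist, max_steps


pagerank, steps = power_iterate(transition, [0.1, 0.1, 0.2, 0.3, 0.3])
print(pagerank)
print("steps taken:", steps)
```

Starting from a different initial distribution should lead to the same scores, which is an easy way to check that the loop has converged.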

## Topic-Sensitive PageRank

In [None]:
# For topic-sensitive PageRank we can replace the vanilla
# teleportation matrix with a "biased" one, where the bias
# is towards pages of a particular topic.
# In this matrix every teleport lands on node 0, i.e. node 0
# is treated as the only sports page.
SportsTeleport = np.matrix(( (1., 0., 0., 0., 0.),
                             (1., 0., 0., 0., 0.),
                             (1., 0., 0., 0., 0.),
                             (1., 0., 0., 0., 0.),
                             (1., 0., 0., 0., 0.) ))

# Here, we find a version of T that is biased towards sports pages
T_sports = np.add(d * LinkMatrix, (1 - d) * SportsTeleport)
print(T_sports)

In [None]:
# just as before, we can define an initial distribution
# (it must sum to 1)
x_sports = [0.2, 0.2, 0.2, 0.2, 0.2]

In [None]:
# then iterate to find where the sports-walker ends up
# (re-run this cell until the distribution stops changing)
x_sports_new = x_sports*T_sports
print(x_sports_new)
x_sports = x_sports_new
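
The biased teleport matrix can also be generated from a list of topic nodes rather than written out by hand. The sketch below assumes teleports are spread uniformly over the topic nodes; the helper `topic_teleport`, the list `sports_nodes`, and the fixed count of 200 steps are illustrative assumptions, not part of the original notebook.

```python
import numpy as np


def topic_teleport(n, topic_nodes):
    """Return an n-by-n teleport matrix that only lands on topic_nodes."""
    row = np.zeros(n)
    row[list(topic_nodes)] = 1.0 / len(topic_nodes)
    # every node teleports according to the same biased row
    return np.tile(row, (n, 1))


d_topic = 0.75
link_topic = np.array([[0.0, 0.0, 1.0, 0.0, 0.0],
                       [0.5, 0.0, 0.5, 0.0, 0.0],
                       [0.0, 0.0, 0.0, 0.5, 0.5],
                       [0.0, 0.0, 0.0, 0.0, 1.0],
                       [0.0, 1.0, 0.0, 0.0, 0.0]])

sports_nodes = [0]  # assumption: node 0 is the only sports page
transition_sports = d_topic * link_topic + (1 - d_topic) * topic_teleport(5, sports_nodes)

# step the sports-walker a fixed number of times from a uniform start
dist_sports = np.full(5, 1.0 / 5)
for _ in range(200):
    dist_sports = dist_sports @ transition_sports

print(dist_sports)
```

Passing several node indices, for example `topic_teleport(5, [0, 3])`, splits the teleport probability evenly among them.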