# Page Rank

PageRank (PR) is an algorithm used by Google Search to rank web pages in their search engine results. PageRank was named after Larry Page, one of the founders of Google. PageRank is a way of measuring the importance of website pages.

## Algorithm

The PageRank algorithm outputs a probability distribution used to represent the likelihood that a person randomly clicking on links will arrive at any particular page.
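The worked example in this article uses a simplified version of that model. At each step, every page receives the sum, over all pages that link to it, of the linking page's current probability divided by the linking page's number of outgoing links. The full PageRank formulation also adds a damping factor (commonly 0.85) for a surfer who occasionally jumps to a random page; that factor is left out here, as in the example. The sketch below shows one such step. It assumes the graph is a dict that maps each page to the pages it links to, and the function name `pagerank_step` is hypothetical:

```python
from fractions import Fraction


def pagerank_step(links, pr):
    """One step of the simplified random-surfer update (no damping factor).

    links: dict mapping every page to the list of pages it links to.
    pr:    dict mapping every page to its current probability.
    Assumes every page appears as a key and has at least one outgoing link.
    """
    new_pr = {page: 0 for page in links}
    for source, targets in links.items():
        # The surfer picks one of the source's links uniformly at random.
        share = pr[source] / len(targets)
        for target in targets:
            new_pr[target] += share
    return new_pr


# Tiny illustrative graph: X links to Y, and Y links to both X and itself.
links = {"X": ["Y"], "Y": ["X", "Y"]}
pr = {"X": Fraction(1, 2), "Y": Fraction(1, 2)}
print(pagerank_step(links, pr))  # {'X': Fraction(1, 4), 'Y': Fraction(3, 4)}
```

Exact fractions are used only so the result is easy to read. Plain floats work the same way.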

## Example

This article is based on the following YouTube video:

[PageRank Algorithm - Example (YouTube)](https://www.youtube.com/watch?v=P8Kt6Abq_rM)

## Explore

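Suppose we have the following graph, where each node represents a page and each arrow represents a link: A links to B and C, B links to D, C links to A, B and D, and D links to C.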

![image](page_rank/start.png)
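The same graph can be written as a Python dict, with the links read off the adjacency matrix in the code at the end of this article. The surfer chooses one of a page's outgoing links uniformly at random, so the probability of following any particular link is 1 divided by the number of links on that page. The sketch below computes these link probabilities exactly, together with the uniform Iteration 0 distribution. The names `links`, `link_prob` and `pr0` are illustrative:

```python
from fractions import Fraction

# Outgoing links of each page.
links = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["A", "B", "D"],
    "D": ["C"],
}

# P(target|source): probability of following the link source -> target.
link_prob = {
    (source, target): Fraction(1, len(targets))
    for source, targets in links.items()
    for target in targets
}

# Iteration 0: the surfer is equally likely to be on any page.
pr0 = {page: Fraction(1, len(links)) for page in links}

print(link_prob[("C", "A")])  # 1/3
print(link_prob[("A", "B")])  # 1/2
print(pr0["A"])               # 1/4
```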

### Iteration 0

At the very beginning (Iteration 0), the user is equally likely to be on any of the four pages, so the probability of each page is 1/4.

### Iteration 1

At Iteration 1, the user clicks one of the links on the current page at random and moves to the page it points to.

**For Page A**

![image](page_rank/iteration_1_a.jpg)

The only link into A comes from Page C. Since C links to A, B and D, the probability that the user goes from C to A is 1/3. Taking into account the probability that the user starts at C (Iteration 0, 1/4), the overall probability is 1/3 * 1/4 = 1/12 = 0.08333333333333333.

**For Page B**, let's look at the following picture.

![image](page_rank/iteration_1_b.jpg)

The visit may come from A or C. If it comes from A, the probability of going from A to B is 1/2, because A also links to C. In other words:

P(B|A) = 1/2

Because P(A) = 1/4 in Iteration 0, the probability of arriving at B through A is P(B|A) * P(A) = 1/2 * 1/4 = 1/8.

If it comes from C, the equation is P(B|C) * P(C) = 1/3 * 1/4 = 1/12.

We add them up, so in Iteration 1, P(B|I1) = P(B|A) * P(A) + P(B|C) * P(C) = 1/8 + 1/12 = 2.5/12 = 0.20833333333333334.
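The same sum can be written as a loop over the pages that link to B. This is a small sketch with Python's `fractions` module. `in_links_b` is an illustrative name that maps each linking page to its link probability, P(B|A) and P(B|C):

```python
from fractions import Fraction

# Probability of being on each page at Iteration 0.
pr0 = {"A": Fraction(1, 4), "B": Fraction(1, 4), "C": Fraction(1, 4), "D": Fraction(1, 4)}

# Pages that link to B, with the probability of following that link to B.
in_links_b = {"A": Fraction(1, 2), "C": Fraction(1, 3)}

# P(B|I1) = P(B|A) * P(A) + P(B|C) * P(C)
pr_b = sum(p_link * pr0[source] for source, p_link in in_links_b.items())
print(pr_b, float(pr_b))  # 5/24 0.20833333333333334
```

5/24 is the same number as the 2.5/12 above.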

Similarly, we have the following equation for **C**:

![image](page_rank/iteration_1_c.jpg)

P(C|I1) = P(C|A) * P(A) + P(C|D) * P(D) = 1/2 * 1/4 + 1 * 1/4 = 1/8+1/4=4.5/12=0.375

For **D**:

P(D|I1) = P(D|B) * P(B) + P(D|C) * P(C) = 1 * 1/4 + 1/3 * 1/4 = 1/4+1/12=4/12=0.3333333333333333

So the result of Iteration 1 is:

[0.08333333333333333, 0.20833333333333334, 0.375, 0.3333333333333333]
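The four equations can also be written with Python's `fractions.Fraction`. This gives the exact values behind the decimals and confirms that the probabilities still add up to 1, as a probability distribution should. A minimal sketch, where `q` is an illustrative name for the Iteration 0 probability 1/4:

```python
from fractions import Fraction

q = Fraction(1, 4)  # every page has probability 1/4 at Iteration 0

# One entry per page, mirroring the equations above.
iteration_1 = [
    Fraction(1, 3) * q,                       # A: from C
    Fraction(1, 2) * q + Fraction(1, 3) * q,  # B: from A and C
    Fraction(1, 2) * q + 1 * q,               # C: from A and D
    1 * q + Fraction(1, 3) * q,               # D: from B and C
]

print([str(p) for p in iteration_1])  # ['1/12', '5/24', '3/8', '1/3']
print(sum(iteration_1))               # 1
```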

### Iteration 2



For Iteration 2 (I2), we calculate the probabilities from the Iteration 1 (I1) results, using the same link probabilities as before.

**Visiting A**

![image](page_rank/iteration_2_a.jpg)

P(A|I2) = P(A|C) * P(C|I1) = 1/3 * 4.5/12 = 1.5/12

**Visiting B**

P(B|I2) = P(B|A) * P(A|I1) + P(B|C) * P(C|I1) = 1/2 * 1/12 + 1/3 * 4.5/12 = 0.5/12 + 1.5/12 = 2/12

**Visiting C**

P(C|I2) = P(C|A) * P(A|I1) + P(C|D) * P(D|I1) = 1/2 * 1/12 + 1 * 4/12 = 0.5/12 + 4/12 = 4.5/12

**Visiting D**

P(D|I2) = P(D|B) * P(B|I1) + P(D|C) * P(C|I1) = 1 * 2.5/12 + 1/3 * 4.5/12 = 2.5/12 + 1.5/12 = 4/12

So the result of Iteration 2 is:

[0.125, 0.16666666666666666, 0.375, 0.3333333333333333]
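The Iteration 2 equations can be checked in exact fractions, starting from the Iteration 1 values 1/12, 5/24, 3/8 and 1/3. In this sketch, `a1` to `d1` are illustrative names for P(A|I1) to P(D|I1):

```python
from fractions import Fraction

# Iteration 1 result for A, B, C, D.
a1, b1, c1, d1 = Fraction(1, 12), Fraction(5, 24), Fraction(3, 8), Fraction(1, 3)

iteration_2 = [
    Fraction(1, 3) * c1,                        # A: from C
    Fraction(1, 2) * a1 + Fraction(1, 3) * c1,  # B: from A and C
    Fraction(1, 2) * a1 + 1 * d1,               # C: from A and D
    1 * b1 + Fraction(1, 3) * c1,               # D: from B and C
]

print([str(p) for p in iteration_2])  # ['1/8', '1/6', '3/8', '1/3']
print(sum(iteration_2))               # 1
```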

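Ranking the pages by their Iteration 2 probabilities, with 1 for the lowest and 4 for the highest, gives the final page rank [1, 2, 4, 3] for A, B, C and D. Page C is the most important page, followed by D, B and A.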

![image](page_rank/final.jpg)
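The ranking can be derived from the scores by sorting. Here is a NumPy sketch that assumes, as in the list above, that rank 1 goes to the lowest score. `scores`, `ranks` and `pages` are illustrative names:

```python
import numpy as np

# Iteration 2 scores for pages A, B, C, D.
scores = np.array([0.125, 1 / 6, 0.375, 1 / 3])

# argsort gives the page indices from lowest to highest score;
# a second argsort turns that order into each page's rank position.
ranks = scores.argsort().argsort() + 1
print(ranks)  # [1 2 4 3]

# Pages listed from most to least important.
pages = np.array(["A", "B", "C", "D"])
print(pages[scores.argsort()[::-1]])  # ['C' 'D' 'B' 'A']
```

The double `argsort` is a common NumPy idiom for converting scores into rank positions. It assumes there are no ties, which holds here.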

```python
import numpy as np

n_iterations = 3  # Iteration 0, 1 and 2
n_nodes = 4       # pages A, B, C, D are nodes 0, 1, 2, 3
```

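```python
# Adjacency matrix: graph[start_node, end_node] = 1 if start_node links to end_node
graph = np.zeros((n_nodes, n_nodes))
graph[0, 1] = 1  # A -> B
graph[0, 2] = 1  # A -> C
graph[1, 3] = 1  # B -> D
graph[2, 0] = 1  # C -> A
graph[2, 1] = 1  # C -> B
graph[2, 3] = 1  # C -> D
graph[3, 2] = 1  # D -> C
graph
```

```
array([[0., 1., 1., 0.],
       [0., 0., 0., 1.],
       [1., 1., 0., 1.],
       [0., 0., 1., 0.]])
```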
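Dividing each row of `graph` by its row sum (the page's number of outgoing links) turns the adjacency matrix into a transition matrix. Entry `[i, j]` of that matrix is the probability of following the link from page i to page j, which is the same P(j|i) used in the hand calculation. One iteration then becomes a single vector-matrix product. This is a sketch, with `transition`, `pr0` and `pr1` as illustrative names:

```python
import numpy as np

# Adjacency matrix: graph[i, j] = 1 if page i links to page j.
graph = np.array([[0., 1., 1., 0.],
                  [0., 0., 0., 1.],
                  [1., 1., 0., 1.],
                  [0., 0., 1., 0.]])

# Divide each row by its out-degree so that row i holds P(j|i) for every page j.
# (Assumes every page has at least one outgoing link, as here.)
transition = graph / graph.sum(axis=1, keepdims=True)
print(transition.sum(axis=1))  # [1. 1. 1. 1.]

# Iteration 1 from the uniform Iteration 0 vector.
pr0 = np.full(4, 1 / 4)
pr1 = pr0 @ transition
print(pr1)  # [0.08333333 0.20833333 0.375      0.33333333]
```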

```python
# pr_matrix[i] holds the PageRank of every node after iteration i
pr_matrix = np.zeros((n_iterations, n_nodes))

# Iteration 0: every page is equally likely
pr_matrix[0] = [1 / n_nodes] * n_nodes
print('Page rank in Iteration 0')
print(pr_matrix[0])
```

```
Page rank in Iteration 0
[0.25 0.25 0.25 0.25]
```

```python
# Iterations 1 and 2: each node collects, from every node that links to it,
# that node's previous PageRank divided by its number of outgoing links
for i_iteration in range(1, n_iterations):
    print(f'Page rank in Iteration {i_iteration}')
    for node in range(n_nodes):
        pr = 0
        for previous_node in range(n_nodes):
            if graph[previous_node, node] == 1:
                pr += pr_matrix[i_iteration - 1, previous_node] / graph[previous_node, :].sum()
        pr_matrix[i_iteration, node] = pr
    print(pr_matrix[i_iteration])
```

```
Page rank in Iteration 1
[0.08333333 0.20833333 0.375      0.33333333]
Page rank in Iteration 2
[0.125      0.16666667 0.375      0.33333333]
```
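The inner loops above compute a vector-matrix product, so the same iterations can be expressed with a row-normalised transition matrix. The example stops at Iteration 2. The sketch below shows what happens if the update keeps repeating, still without the damping factor of the full PageRank formulation. `power_iteration`, `max_iter` and `tol` are illustrative names and values, not part of the original code:

```python
import numpy as np

# Same graph as above, row-normalised into link-following probabilities.
graph = np.array([[0., 1., 1., 0.],
                  [0., 0., 0., 1.],
                  [1., 1., 0., 1.],
                  [0., 0., 1., 0.]])
transition = graph / graph.sum(axis=1, keepdims=True)


def power_iteration(transition, max_iter=1000, tol=1e-12):
    """Repeat pr <- pr @ transition from a uniform start until the change is below tol.

    Simplified random-surfer model: no damping factor.
    """
    n = transition.shape[0]
    pr = np.full(n, 1 / n)
    for _ in range(max_iter):
        new_pr = pr @ transition
        if np.abs(new_pr - pr).sum() < tol:
            return new_pr
        pr = new_pr
    return pr


# Two steps reproduce the Iteration 2 result (1/8, 1/6, 3/8, 1/3).
print(np.full(4, 1 / 4) @ transition @ transition)

# Keep iterating until the scores stop changing.
pr = power_iteration(transition)
print(pr.round(4))  # approximately [0.125, 0.1875, 0.375, 0.3125]
```

For this graph, the scores settle at 1/8, 3/16, 3/8 and 5/16 for A, B, C and D. B moves from 1/6 to 3/16 and D moves from 1/3 to 5/16. The ordering C, D, B, A, and therefore the ranking [1, 2, 4, 3], stays the same as after Iteration 2.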
