# Page rank

## Introduction

Not all web pages have the same importance.

For example, does my portfolio <https://im-rises.github.io> have more importance than <https://www.microsoft.com/> when it comes to referencing the word `informatic`?

Answer: no.

Pages simply do not all carry the same weight.

## The importance of a web page

A web page's importance depends on the pages that reference it.

Each page has a number of successors, the pages it links to. If page A has importance x and n successors, each page referenced by A receives x/n of referencing importance from A.

![images/diagram_page_rank.png](images/diagram_page_rank.png)

So the importance of a page is the sum of the contributions it receives from all its predecessors (the pages that link to it).
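The rule above can be sketched in Python with a plain dictionary of outgoing links. This is a minimal illustration, not the author's code: `links` and `propagate` are hypothetical names, and the graph is the one implied by the example equations that follow (A → A, B; B → A, C; C → B). One call applies the rule once: each page gives x/n to each of its n successors, and each page sums what it receives.

```python
def propagate(links, importance):
    """Apply the rule once: each page gives x/n to each of its n successors."""
    received = {page: 0.0 for page in links}
    for page, successors in links.items():
        share = importance[page] / len(successors)  # x / n
        for successor in successors:
            received[successor] += share
    return received


# Hypothetical graph implied by the example equations: A -> A, B ; B -> A, C ; C -> B
links = {"A": ["A", "B"], "B": ["A", "C"], "C": ["B"]}

importance = {"A": 0.4, "B": 0.4, "C": 0.2}
result = propagate(links, importance)
print(result)  # {'A': 0.4, 'B': 0.4, 'C': 0.2}
```

Starting from A = B = 2/5 and C = 1/5, one step returns the same values, so these importances satisfy the example equations.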

Example:

$$A = \frac{B}{2} + \frac{A}{2}$$
$$B = \frac{A}{2} + C$$
$$C = \frac{B}{2}$$
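These three equations only fix the importances up to a common factor; adding the condition A + B + C = 1 (the same normalisation used in the manual resolution) makes the solution unique. A minimal check with NumPy, assuming the pages are ordered A, B, C: rewrite the system as (M - I) r = 0, replace one redundant equation by the normalisation, and solve.

```python
import numpy as np

# M[i, j] is the share of page j's importance passed to page i (order A, B, C):
# A = B/2 + A/2, B = A/2 + C, C = B/2
M = np.array([
    [1/2, 1/2, 0],
    [1/2, 0,   1],
    [0,   1/2, 0],
])
system = M - np.eye(3)  # (M - I) r = 0

# The three equations are linearly dependent: swap the last one for A + B + C = 1
system[-1] = [1, 1, 1]
rhs = np.array([0.0, 0.0, 1.0])

r = np.linalg.solve(system, rhs)
print(r)  # approximately [0.4, 0.4, 0.2], i.e. A = B = 2/5 and C = 1/5
```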

## Manual resolution

Because a website's importance is defined by its predecessors, the equation of each website is the sum of its predecessors' contributions.

First, we look at the graph and the links between the websites:

![images/diagram_page_rank_2.png](images/diagram_page_rank_2.png)

Then we write down the value of each reference from one website to another:

![images/diagram_page_rank_3.png](images/diagram_page_rank_3.png)

Equations:  
$$A = \frac{E}{3} + D$$  
$$B = A + \frac{E}{3}$$  
$$C = B$$  
$$D = \frac{C}{2} + \frac{E}{3}$$  
$$E = \frac{C}{2}$$  

Solving this system gives the value of each website. Since $E = \frac{C}{2}$, we have $C = 2E$, so every value can be expressed in terms of $E$:

$$D = \frac{C}{2} + \frac{E}{3} = E + \frac{E}{3} = \frac{4E}{3}$$
$$A = \frac{E}{3} + D = \frac{E}{3} + \frac{4E}{3} = \frac{5E}{3}$$
$$B = A + \frac{E}{3} = \frac{5E}{3} + \frac{E}{3} = 2E$$
$$C = 2E = B$$

$$A + B + C + D + E = 1$$
$$\Leftrightarrow \frac{5E}{3} + 2E + 2E + \frac{4E}{3} + E = 1$$
$$\Leftrightarrow \frac{24E}{3} = 1$$
$$\Leftrightarrow E = \frac{3}{24}$$

$$A = \frac{5}{24}$$
$$B = \frac{3}{12}$$
$$C = \frac{3}{12}$$
$$D = \frac{2}{12}$$
$$E = \frac{3}{24}$$
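The hand computation can be double-checked with exact rational arithmetic. A minimal sketch using Python's standard `fractions` module: plug the values found above into the five equations and into the normalisation.

```python
from fractions import Fraction as F

# Values from the manual resolution
A, B, C, D, E = F(5, 24), F(3, 12), F(3, 12), F(2, 12), F(3, 24)

# Each equation of the system must hold exactly
assert A == E / 3 + D
assert B == A + E / 3
assert C == B
assert D == C / 2 + E / 3
assert E == C / 2

# ...and the importances must sum to 1
assert A + B + C + D + E == 1
print("All equations hold; E =", E)  # Fraction reduces 3/24, so this prints E = 1/8
```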

The values of all the websites sum to 1. Solving by hand works for a graph this small, but it quickly becomes impractical with many pages, so we now use the matrix method.


## Matrix resolution

The matrix is built by writing down every link between the websites. To compute the solution, we use a stochastic matrix M and a vector r.

The matrix M represents all links between the websites: each column corresponds to a website and each row to one of its possible successors, so M[i, j] is the share of website j's importance passed to website i. Every column therefore sums to 1.

For example, the entry M[0,3] = 1 is the link D → A.
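Writing M by hand is error-prone on larger graphs. A minimal sketch that builds it from a list of links instead, assuming the hypothetical `links` list below holds (source, target) pairs read off the diagram: each link sets a 1 in row `target`, column `source`, then each column is divided by its number of successors.

```python
import numpy as np

pages = ["A", "B", "C", "D", "E"]
# Hypothetical link list read off the diagram: (source, target)
links = [("A", "B"), ("B", "C"), ("C", "D"), ("C", "E"),
         ("D", "A"), ("E", "A"), ("E", "B"), ("E", "D")]

index = {page: i for i, page in enumerate(pages)}
M = np.zeros((len(pages), len(pages)))
for source, target in links:
    M[index[target], index[source]] = 1.0  # row = successor, column = source

# Divide each column by its number of successors so every column sums to 1.
# Assumes every page has at least one outgoing link; a zero column (dead end)
# would produce NaN here.
M /= M.sum(axis=0)
print(M)
```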

Stochastic matrix M:

$$M = \left[\begin{array}{ccccc}
0 & 0 & 0 & 1 & 1/3 \\
1 & 0 & 0 & 0 & 1/3 \\
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1/2 & 0 & 1/3 \\
0 & 0 & 1/2 & 0 & 0
\end{array} \right]$$


The vector r has size N, the number of websites. It starts with the same value 1/N in every entry, so 1/5 in our case because we have 5 websites.

$$r_0 = \left[\begin{array}{c}
1/5 \\
1/5 \\
1/5 \\
1/5 \\
1/5
\end{array} \right]$$

To get the solution with the matrix method, we use the following algorithm (power iteration):

- Initialisation: $r_0 = [1/N, \dots, 1/N]^T$;
- Iteration: $r_{k+1} = M r_k$;
- Stop when $\lVert r_{k+1} - r_k \rVert_1 < \varepsilon$.


```python
import numpy as np

# Stochastic matrix M: M[i, j] is the share of website j's importance
# passed to website i (websites ordered A, B, C, D, E)
M = np.array(
    [
        [0, 0, 0, 1, 1/3],
        [1, 0, 0, 0, 1/3],
        [0, 1, 0, 0, 0],
        [0, 0, 1/2, 0, 1/3],
        [0, 0, 1/2, 0, 0],
    ]
)

# Initialisation: r0 = [1/N, ..., 1/N]
N = M.shape[0]
r0 = np.full(N, 1 / N)

epsilon = 0.1  # stop when the L1 norm of r_{k+1} - r_k drops below epsilon


def show(k, r):
    """Print iterate r_k rounded to two decimals."""
    print("Iteration r{} = {}".format(k, np.array2string(r, precision=2, separator=',', suppress_small=True)))


numIteration = 1
rk1 = M @ r0  # r1 = M r0
show(numIteration, rk1)

doLoop = True
while doLoop:
    numIteration += 1
    rk0 = rk1
    rk1 = M @ rk0  # r_{k+1} = M r_k
    show(numIteration, rk1)
    doLoop = np.linalg.norm(rk1 - rk0, ord=1) >= epsilon
```

```
Iteration r1 = [0.27,0.27,0.2 ,0.17,0.1 ]
Iteration r2 = [0.2 ,0.3 ,0.27,0.13,0.1 ]
Iteration r3 = [0.17,0.23,0.3 ,0.17,0.13]
Iteration r4 = [0.21,0.21,0.23,0.19,0.15]
Iteration r5 = [0.24,0.26,0.21,0.17,0.12]
Iteration r6 = [0.21,0.28,0.26,0.14,0.11]
Iteration r7 = [0.18,0.24,0.28,0.17,0.13]
Iteration r8 = [0.21,0.22,0.24,0.19,0.14]
Iteration r9 = [0.23,0.26,0.22,0.17,0.12]
Iteration r10 = [0.21,0.27,0.26,0.15,0.11]
```

The loop stops at r10, the first iterate whose L1 distance to the previous one is below ε = 0.1. Its values [0.21, 0.27, 0.26, 0.15, 0.11] are close to, but not equal to, the results of the `Manual resolution`: with such a loose threshold the iteration stops while the vector is still oscillating around the exact solution. A smaller ε brings r_k closer to the exact values:

$$A = \frac{5}{24} \approx 0.21$$

$$B = \frac{3}{12} = 0.25$$

$$C = \frac{3}{12} = 0.25$$

$$D = \frac{2}{12} \approx 0.17$$

$$E = \frac{3}{24} = 0.125$$
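To check that the iteration really does approach the values of the `Manual resolution`, the same algorithm can be rerun with a much smaller threshold. A minimal sketch: the function name `power_iteration` and the parameters `epsilon=1e-12` and `max_iter=10_000` are arbitrary choices for illustration, not part of the original notebook.

```python
import numpy as np

M = np.array(
    [
        [0, 0, 0, 1, 1/3],
        [1, 0, 0, 0, 1/3],
        [0, 1, 0, 0, 0],
        [0, 0, 1/2, 0, 1/3],
        [0, 0, 1/2, 0, 0],
    ]
)


def power_iteration(M, epsilon=1e-12, max_iter=10_000):
    """Iterate r_{k+1} = M r_k from the uniform vector until the L1 change is below epsilon."""
    n = M.shape[0]
    r = np.full(n, 1 / n)
    for k in range(1, max_iter + 1):
        r_next = M @ r
        if np.linalg.norm(r_next - r, ord=1) < epsilon:
            return r_next, k
        r = r_next
    raise RuntimeError("no convergence within max_iter iterations")


r, iterations = power_iteration(M)
exact = np.array([5/24, 3/12, 3/12, 2/12, 3/24])  # manual resolution
print(iterations, np.round(r, 4))
print(np.allclose(r, exact))  # True
```

The `max_iter` guard is there because the stopping test alone does not guarantee termination for every matrix; on this graph the oscillation dies out slowly, so many more iterations are needed than with ε = 0.1.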



## Go further

An issue can happen in some cases: the calculation can get caught in a spider trap.

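A page that is referenced by others but does not reference anything at all is called a dead end: its importance leaks out of the system at each iteration. A spider trap is the related case of a group of pages (possibly a single page) whose links only point to each other: over the iterations, they absorb all the importance.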
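Two tiny hypothetical graphs make these problems concrete. In the first, page Z is referenced but references nothing (a dead end, i.e. a zero column in M), and the total importance leaks away after a few iterations. In the second, Z only references itself (a spider trap) and ends up absorbing all the importance. The pages X, Y, Z and the helper `iterate` are illustrative names, not part of the original notebook.

```python
import numpy as np


def iterate(M, steps):
    """Apply r_{k+1} = M r_k `steps` times, starting from the uniform vector."""
    r = np.full(M.shape[0], 1 / M.shape[0])
    for _ in range(steps):
        r = M @ r
    return r


# Hypothetical dead end: X -> Y, Z ; Y -> Z ; Z has no outgoing link (zero column)
dead_end = np.array([
    [0,   0, 0],
    [1/2, 0, 0],
    [1/2, 1, 0],
])

# Hypothetical spider trap: X -> Y, Z ; Y -> X ; Z only links to itself
spider_trap = np.array([
    [0,   1, 0],
    [1/2, 0, 0],
    [1/2, 0, 1],
])

print(iterate(dead_end, 3).sum())  # 0.0: all the importance has leaked away
print(np.round(iterate(spider_trap, 100), 6))  # Z ends up with almost everything
```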

This can break our previous calculation. To prevent it, we add teleportation.

To see how to implement it, follow the guide in the `page_rank_teleport.ipynb` file.
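As a preview, here is a minimal sketch of the usual teleportation formulation, which may differ from the one in the companion notebook mentioned above: at each step the surfer follows a link with probability β and jumps to a uniformly random page with probability 1 - β, i.e. $r_{k+1} = \beta M r_k + \frac{1 - \beta}{N}$. The value β = 0.85 is an assumption (a commonly quoted choice), `page_rank_teleport` is a hypothetical name, and the example uses a hypothetical spider-trap graph X → Y, Z; Y → X; Z → Z.

```python
import numpy as np


def page_rank_teleport(M, beta=0.85, epsilon=1e-12, max_iter=10_000):
    """Iterate r_{k+1} = beta * M r_k + (1 - beta) / N until the L1 change is below epsilon."""
    n = M.shape[0]
    r = np.full(n, 1 / n)
    for _ in range(max_iter):
        r_next = beta * (M @ r) + (1 - beta) / n  # follow a link or teleport
        if np.linalg.norm(r_next - r, ord=1) < epsilon:
            return r_next
        r = r_next
    raise RuntimeError("no convergence within max_iter iterations")


# Hypothetical spider trap: X -> Y, Z ; Y -> X ; Z only links to itself
spider_trap = np.array([
    [0,   1, 0],
    [1/2, 0, 0],
    [1/2, 0, 1],
])

r = page_rank_teleport(spider_trap)
print(np.round(r, 3))  # Z still ranks first, but X and Y keep a share
```

This sketch only handles spider traps: with a dead end (zero column) the importance would still leak, and such pages need extra handling, for example always teleporting from them.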
