### Google page rank
__MATH 420__ <br>
_Spring 2021_ <br>


We would like a way to assign a numerical value, let's call it the _rank,_ that corresponds to the _popularity_ of a web page. We'll describe a method that is the basis of the Google page rank.

To start our thinking about this, let's imagine that a popular page, say the _Wall Street Journal_ (WSJ), has a link to the _Kearney Hub._  The editor of the _Hub_ will be _thrilled_ with the traffic that might result from being linked from such a popular web page. But if the _Kearney Hub_ links to the _Wall Street Journal,_ I'd guess that the editors of the WSJ would barely notice. So our first insight is that the rank of a page depends on the ranks of the pages that link to it. If a highly ranked page links to the _Kearney Hub,_ for example, it raises the rank of the _Hub._  But if a lowly ranked page links to the _Hub,_ it doesn't affect the rank of the _Hub_ all that much. 

Our second insight is that if a page links to many pages, that diminishes the influence of each link. A visitor to a page that links to a million other pages, for example, might click on any one of a million links, but a visitor to a page that links to just ten pages has a good chance of visiting any particular one of these ten pages. So the more links a page has, the less influence it has on the ranks of the pages it links to. We can think of each link from a web page as a vote, with the weight of each vote as $1/n$, where $n$ is the number of links from a web page. Thus, the sum of all the votes from each web page is one. 

Given these insights, let's define the _rank_ of a page to equal the weighted sum of the ranks of the pages that link to it. Again, the weight of each link is the reciprocal of the number of pages that the linking page links to. So if a page links to $10$ other pages, the weight of each of its links is $1/10$.

Let's take an example. Let's suppose we have four web pages labeled $A,B,C$, and $D$. 

Suppose pages $B$ and $C$ link to page $A$, and suppose page $B$ has a weight of $1/2$ (that is, it links to a total of two pages), and page $C$ has a weight of $1/3$ (thus page $C$ links to three pages). The rank of $A$ satisfies
$$
  \text{rank}(A) = \frac{1}{2}  \text{rank} (B) + \frac{1}{3}  \text{rank}(C).
$$

For the rank of $B$, suppose that pages $A$ and $C$ link to page $B$. And suppose the weight of page $A$ is $1/2$. Then
$$
  \text{rank}(B) = \frac{1}{2}  \text{rank} (A) + \frac{1}{3} \text{rank} (C).
$$
For the other two pages, let's suppose that 
$$
  \text{rank}(C) = \frac{1}{2}  \text{rank} (B) + \text{rank} (D) ,
$$
$$
\text{rank}(D) = \frac{1}{2}  \text{rank} (A) +  \frac{1}{3} \text{rank} (C). 
$$

In matrix notation, our equations are
$$
  \begin{bmatrix} A \\ B \\ C \\ D \end{bmatrix} = \begin{bmatrix} 0 & 1/2 & 1/3 & 0 \\ 1/2 & 0 & 1/3 & 0 \\ 0 & 1/2 & 0 & 1 \\ 1/2 & 0 & 1/3 & 0 \end{bmatrix} \begin{bmatrix} A \\ B \\ C \\ D \end{bmatrix}. 
$$
Here I tired of writing $\text{rank}(A)$, so I wrote $A$ instead; and similarly for the other variables.  We've expressed these linear equations in the form of a fixed point problem. Later we'll use this fact as a way to find the ranks.
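
The matrix can also be assembled from lists of links rather than typed by hand. The following is a minimal sketch: the link lists are read off from the four equations above (page $A$ links to $B$ and $D$, and so on), and the names `links` and `link_matrix` are hypothetical, chosen only for this illustration.

```julia
# Link lists read off from the equations above (1 = A, 2 = B, 3 = C, 4 = D).
# Each page maps to the pages it links to.
links = Dict(1 => [2, 4], 2 => [1, 3], 3 => [1, 2, 4], 4 => [3])

# Hypothetical helper: build the matrix column by column. A page with n
# outgoing links gives a vote of weight 1/n to each page it links to.
function link_matrix(links, N)
    M = zeros(N, N)
    for (from, targets) in links
        for to in targets
            M[to, from] = 1 / length(targets)
        end
    end
    return M
end

M_links = link_matrix(links, 4)
```

Column $j$ of the result holds the votes cast by page $j$, which is why each column sums to one.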

The coefficient matrix has several nice properties: (a) every column sum is one and (b) all entries are in the interval $[0,1]$. Such a matrix is called a _Markov_ matrix. (See, for example, https://en.wikipedia.org/wiki/Stochastic_matrix). 
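
Both properties are easy to check numerically. Here is a small sketch; the helper name `is_markov` and the tolerance `atol` are assumptions made for this example, not part of any library.

```julia
# Hypothetical helper: true when M satisfies properties (a) and (b).
function is_markov(M; atol=1.0e-12)
    # (a) every column sums to one, up to rounding
    column_sums_ok = all(s -> isapprox(s, 1; atol=atol), sum(M, dims=1))
    # (b) every entry lies in the interval [0, 1]
    entries_ok = all(x -> 0 <= x <= 1, M)
    return column_sums_ok && entries_ok
end

M_check = [0 1/2 1/3 0; 1/2 0 1/3 0; 0 1/2 0 1; 1/2 0 1/3 0]
is_markov(M_check)
```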

Alternatively, subtracting the left side from both sides of this equation results in
$$
  \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix} = 
  \begin{bmatrix} -1 & 1/2 & 1/3 & 0 \\ 
                   1/2 & -1 & 1/3 & 0 \\ 
                   0 & 1/2 & -1 & 1 \\ 
                   1/2 & 0 & 1/3 & -1 \end{bmatrix} \begin{bmatrix} A \\ B \\ C \\ D \end{bmatrix}. 
$$
The equations for the unknowns $A, B, C$, and $D$ are homogeneous and linear. Accordingly, any multiple of a solution provides another solution. This freedom allows us to require that the sum of the ranks have a specific value. Requiring that the sum of the ranks be one,
our linear equations are
$$
  \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 1\end{bmatrix} = 
  \begin{bmatrix} -1 & 1/2 & 1/3 & 0 \\ 
                   1/2 & -1 & 1/3 & 0 \\ 
                   0 & 1/2 & -1 & 1 \\ 
                   1/2 & 0 & 1/3 & -1 \\
                   1 & 1 & 1 & 1  \\
                   \end{bmatrix} \begin{bmatrix} A \\ B \\ C \\ D \end{bmatrix}. 
$$
These equations are _not_ homogeneous and they are _overdetermined_ (the number of equations is greater than the number of unknowns).  It is _not_ certain that the equations have a solution, and if they do have a solution, it's possible that some of the ranks will be negative. Following the logic of how these equations were determined, a negative rank doesn't make much sense.

The Perron-Frobenius theorem comes to the rescue. This theorem tells us that for any Markov matrix $M$, there is a vector $\mathbf{x}$ such that $M \mathbf{x} = \mathbf{x}$
and $1 = \sum x_k$, where each $x_k$ is nonnegative.  More generally, if $\mathbf{x}$ 
is a nonzero vector and $\lambda$ is a number such that $M \mathbf{x} = \lambda \mathbf{x}$, we say that $\mathbf{x}$ is an eigenvector of the matrix $M$ with eigenvalue $\lambda$.
The Perron-Frobenius theorem tells us that if $M$ is a Markov matrix, then $M$ has at least one eigenvector with eigenvalue one.


Let's have Julia solve the eigenvalue problem for us. We'll need the package `LinearAlgebra`. 

In [45]:
using LinearAlgebra

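We define the matrix `M` and the right-hand side `b` by hand. After that, we can 
use Julia's `\` operator to solve the overdetermined linear system. For a non-square matrix, `\` returns a least-squares solution; because our system is consistent, that solution satisfies all five equations (up to rounding).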

In [46]:
M = [-1 1/2 1/3 0; 1/2 -1 1/3 0; 0 1/2 -1 1; 1/2 0 1/3 -1; 1 1 1 1];

In [47]:
b = [0 ; 0 ; 0 ; 0 ; 1];

In [48]:
M \ b

4-element Vector{Float64}:
 0.22222222222222204
 0.22222222222222224
 0.3333333333333334
 0.22222222222222227

Indeed, each component of the solution is nonnegative, as the theory requires. For the theory, see https://en.wikipedia.org/wiki/Perron%E2%80%93Frobenius_theorem .
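
As a cross-check, the same ranks can be recovered from the eigenvector with eigenvalue one. The sketch below uses `eigen` from `LinearAlgebra` on the 4×4 Markov matrix; the names `P`, `F`, `k`, `v`, and `ranks` are chosen here for illustration. Since `eigen` scales its eigenvectors to have two-norm one, we rescale so the components sum to one.

```julia
using LinearAlgebra

# The 4×4 Markov matrix from the fixed point form of the equations.
P = [0 1/2 1/3 0; 1/2 0 1/3 0; 0 1/2 0 1; 1/2 0 1/3 0]

F = eigen(P)
# Index of the eigenvalue closest to one.
k = argmin(abs.(F.values .- 1))
# Take the real part of that eigenvector (a no-op if it is already real)
# and rescale so the components sum to one; this also fixes the sign.
v = real.(F.vectors[:, k])
ranks = v / sum(v)
```

Up to rounding, `ranks` is $[2/9, 2/9, 1/3, 2/9]$, in agreement with the result from the `\` operator.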

As an alternative to solving the linear equations, let's use fixed point iteration to solve the equations. Here is a quickly written recursive method:

In [49]:
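function fixed_point(M, x0, tol, iter=0)
    # Give up after 100 iterations.
    if iter < 100
        x1 = M * x0
        # Stop when successive iterates agree to within tol (infinity norm);
        # otherwise recurse with the new iterate.
        if norm(x1 - x0, Inf) < tol
            x1
        else
            fixed_point(M, x1, tol, iter + 1)
        end
    else
        error("Fixed point sequence doesn't seem to converge")
    end
end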

fixed_point (generic function with 2 methods)

Let's try it--we'll try an initial point of $[1,0,0,0]$.

In [65]:
M = [0 1/2 1/3 0; 1/2 0 1/3 0; 0 1/2 0 1; 1/2 0 1/3 0];

In [66]:
fixed_point(M, [1; 0 ; 0; 0], 1.0e-6)

4-element Vector{Float64}:
 0.22222228348255157
 0.22222229093313217
 0.3333331346511841
 0.22222229093313217

And let us try a different starting value. We get the same fixed point.

In [67]:
fixed_point(M, [1/4; 1/4 ; 1/4; 1/4], 1.0e-6)

4-element Vector{Float64}:
 0.22222232818603516
 0.22222232818603516
 0.33333301544189453
 0.22222232818603516

These numbers are familiar! To within the tolerance, this is the result we got by solving the
linear equations using the `\` operator.  For our matrix, we can show that if the sum of the members of `x` is one, the sum of the members of $Mx$ is also one. Since we started with a vector whose sum of components was one, the method returns a vector that also has a sum of components of one.
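
The claim about sums follows directly from the fact that every column of $M$ sums to one. Writing $M_{ij}$ for the entries of $M$ (notation introduced only for this calculation) and assuming $\sum_j x_j = 1$,
$$
  \sum_i (M \mathbf{x})_i = \sum_i \sum_j M_{ij} x_j = \sum_j \Big( \sum_i M_{ij} \Big) x_j = \sum_j x_j = 1.
$$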

We've described what Wikipedia (https://en.wikipedia.org/wiki/PageRank#Simplified_algorithm) refers to as the _simplified version_.  

The Patent for the Google Page Rank (https://patentimages.storage.googleapis.com/db/8f/cb/dad63e985797ec/US7058628.pdf) replaces the Markov matrix $M$ for the simplified version by 
$$
  \frac{\alpha}{N} I  + (1 - \alpha) M,
$$
where $N$ is the number of nodes, $I$ is the identity matrix, and $\alpha \in [0,1]$. In general, this is _not_ a Markov matrix, and its largest eigenvalue (called the _dominant eigenvalue_) is strictly less than one. Actually, all eigenvalues are inside the unit circle; consequently, it can be shown that _every_ fixed point sequence converges to the zero vector. And that would make the page rank of every page equal to zero. Since every fixed point sequence converges to the zero vector when $\alpha > 0$, $\alpha$ is generally called a _damping factor._ 
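
To see where the eigenvalues of this matrix come from, suppose $M \mathbf{x} = \lambda \mathbf{x}$. Then
$$
  \left( \frac{\alpha}{N} I + (1 - \alpha) M \right) \mathbf{x} = \left( \frac{\alpha}{N} + (1 - \alpha) \lambda \right) \mathbf{x},
$$
so the eigenvectors are unchanged and each eigenvalue $\lambda$ of $M$ moves to $\frac{\alpha}{N} + (1 - \alpha) \lambda$. Every eigenvalue of a Markov matrix satisfies $|\lambda| \le 1$, so the new eigenvalues have magnitude at most
$$
  \frac{\alpha}{N} + (1 - \alpha) = 1 - \alpha \left( 1 - \frac{1}{N} \right),
$$
which is strictly less than one whenever $\alpha > 0$ and $N > 1$.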

But buried in the Patent application is 

"_Note that in order to ensure convergence, the norm of p, must be made equal to 1  after each iteration_"

And this means that the original method modifies the fixed point sequence by dividing each term of the sequence by its norm (which norm, the one, two, or infinity, doesn't matter). This is known as the power method for finding the dominant eigenvalue and its eigenvector (see https://en.wikipedia.org/wiki/Power_iteration).
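
Here is a minimal sketch of the power method with normalization after each step. It is an illustration, not code from the patent: the function name `power_method`, the keyword arguments, and the choice of the one-norm (so that the ranks sum to one) are all assumptions made here. It is applied to the damped version of the four-page Markov matrix with $\alpha = 0.15$.

```julia
using LinearAlgebra

# Hypothetical helper: power iteration with one-norm normalization.
function power_method(A, x0; tol=1.0e-10, maxiter=1000)
    x = x0 / norm(x0, 1)
    for _ in 1:maxiter
        y = A * x
        y = y / norm(y, 1)      # renormalize so the iterates do not decay to zero
        if norm(y - x, Inf) < tol
            return y
        end
        x = y
    end
    error("Power iteration did not converge")
end

M_pages = [0 1/2 1/3 0; 1/2 0 1/3 0; 0 1/2 0 1; 1/2 0 1/3 0]
damping, n_pages = 0.15, 4
A_damped = damping / n_pages * I + (1 - damping) * M_pages
ranks_damped = power_method(A_damped, [1.0, 0, 0, 0])
```

Without the normalization step the iterates would shrink toward the zero vector; with it, the sequence settles on the same ranks as before, approximately $[2/9, 2/9, 1/3, 2/9]$, because adding a multiple of the identity does not change the eigenvectors.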

With or without a damping factor, the matrix used can have two or more linearly independent eigenvectors corresponding to the eigenvalue with the greatest magnitude. This happens, for example, when there are two or more nonempty disjoint sets of web pages (call them clusters) whose pages link to other members of the same cluster, but not to pages in other clusters. Here is an example:

In [68]:
M = [0 1 0 0; 1 0 0 0; 0 0 0 1; 0 0 1 0];

When the starting value is $[1,0,0,0]$, we get an error because the fixed point sequence doesn't converge.

In [69]:
fixed_point(M, [1; 0 ; 0; 0], 1.0e-2)

ErrorException: Fixed point sequence doesn't seem to converge
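
The reason is that the iterates simply alternate between two vectors. A short check makes this visible; it uses its own copy of the matrix, and the names `M_two`, `y0`, `y1`, and `y2` are chosen just for this sketch.

```julia
M_two = [0 1 0 0; 1 0 0 0; 0 0 0 1; 0 0 1 0]
y0 = [1, 0, 0, 0]
y1 = M_two * y0     # all of the first page's rank moves to the second page
y2 = M_two * y1     # and then moves straight back
```

Since `y2` equals `y0`, the sequence has period two and the difference between successive iterates never shrinks.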

But changing to a starting value of $[1/4,1/4,1/4, 1/4]$, the fixed point sequence converges.

In [70]:
fixed_point(M, [0.25; 0.25 ; 0.25; 0.25], 1.0e-2)

4-element Vector{Float64}:
 0.25
 0.25
 0.25
 0.25

Calling these pages $A$ through $D$, we see that $A$ and $B$ are linked and $C$ and $D$ are linked, but these two sets of nodes aren't linked together. What about the eigenvalues?

In [71]:
x = eigen(M)

Eigen{Float64, Float64, Matrix{Float64}, Vector{Float64}}
values:
4-element Vector{Float64}:
 -0.9999999999999989
 -0.9999999999999989
  1.0
  1.0
vectors:
4×4 Matrix{Float64}:
  0.707107   0.0       0.707107  0.0
 -0.707107   0.0       0.707107  0.0
  0.0        0.707107  0.0       0.707107
  0.0       -0.707107  0.0       0.707107

Ha! There are two eigenvectors with eigenvalue 1. Using one eigenvector, the ranks of $A$ and $B$ tie, but the ranks of $C$ and $D$ are zero.  And the other eigenvector swaps this. 

In [72]:
x.values

4-element Vector{Float64}:
 -0.9999999999999989
 -0.9999999999999989
  1.0
  1.0

In [73]:
x.vectors

4×4 Matrix{Float64}:
  0.707107   0.0       0.707107  0.0
 -0.707107   0.0       0.707107  0.0
  0.0        0.707107  0.0       0.707107
  0.0       -0.707107  0.0       0.707107

Including a damping factor still gives two linearly independent eigenvectors corresponding to the dominant eigenvalue.  

One way to fix this is to have a fictitious "super node" that links to every page and that every page links to. Effectively, the super node idea then includes the possibility that a user will visit a page by entering a URL instead of randomly clicking.
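
As a rough illustration of the super node idea, here is a sketch that adds a fifth node to the two-cluster example. The construction is an assumption made for this example: each page splits its vote equally between its partner and the super node, and the super node splits its vote equally among the four pages. The names `G`, `iterate_ranks`, and `ranks_super` are hypothetical.

```julia
# Pages A, B, C, D (indices 1 to 4) plus a hypothetical super node (index 5).
# Each page links to its partner and to the super node; the super node
# links to every page. Every column sums to one.
G = [0   1/2 0   0   1/4;
     1/2 0   0   0   1/4;
     0   0   0   1/2 1/4;
     0   0   1/2 0   1/4;
     1/2 1/2 1/2 1/2 0]

# Plain fixed point iteration for a fixed number of steps.
function iterate_ranks(G, x; steps=200)
    for _ in 1:steps
        x = G * x
    end
    return x
end

ranks_super = iterate_ranks(G, [1.0, 0, 0, 0, 0])
```

Starting from the vector that previously failed to converge, the iteration now settles on a rank of $1/6$ for each of the four pages and $1/3$ for the super node. In this example the eigenvalue one has a single eigenvector (up to scaling), so other starting vectors whose components sum to one lead to the same limit.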

Google ranks, I suppose, tens of billions (maybe trillions?) of pages. Finding _all_ the eigenvalues of such a huge matrix isn't, I think, possible. And it isn't needed either. The iterative process can be done quickly to find the page rank. Reasonable estimates are that it takes Google a few weeks to construct the graph of links and a few days to compute the page ranks.

The Google Page Rank is named partially in honor of Larry _Page,_ one of the co-founders of Google, not, as you might guess, after web _page._ But the idea of using eigenvalues to rank options was popularized by the mathematician Thomas Saaty _decades_ before Google used it to rank pages. Stigler's law of eponymy states that "no scientific discovery is named after its original discoverer" (https://en.wikipedia.org/wiki/Stigler%27s_law_of_eponymy). And so it is with the Google Page Rank.

The history of the concept of an eigenvalue goes back to at least Euler (1707 – 1783). About 230 years after Euler used eigenvectors and eigenvalues to describe the motion of rigid bodies, Larry Page used the same concept to launch one of the largest companies of all time.

For more information, see https://en.wikipedia.org/wiki/PageRank ; and see https://en.wikipedia.org/wiki/Thomas_L._Saaty

Here is an example that shows that including a damping factor does not alter the fact that the two-cluster matrix $M$ from above leads to two linearly independent eigenvectors corresponding to the dominant eigenvalue:

In [58]:
alpha = 0.15

0.15

In [59]:
N = 4

4

In [60]:
xx = alpha/ N * I + (1-alpha) * M

4×4 Matrix{Float64}:
 0.0375  0.85    0.0     0.0
 0.85    0.0375  0.0     0.0
 0.0     0.0     0.0375  0.85
 0.0     0.0     0.85    0.0375

In [61]:
eigen(xx)

Eigen{Float64, Float64, Matrix{Float64}, Vector{Float64}}
values:
4-element Vector{Float64}:
 -0.8124999999999989
 -0.8124999999999989
  0.8875
  0.8875
vectors:
4×4 Matrix{Float64}:
  0.707107   0.0       0.707107  0.0
 -0.707107   0.0       0.707107  0.0
  0.0        0.707107  0.0       0.707107
  0.0       -0.707107  0.0       0.707107