## Assumption

For each node, take the difference in PR score between the current iteration and the previous one; if this error falls below a given threshold, the graph is considered to have converged.

Starting from arbitrary values assigned to each node in the graph, the computation iterates until convergence below a given threshold is achieved.

[6] Convergence is achieved when the error rate for any vertex in the graph falls below a given threshold value. The error rate of a vertex Vi is defined as the difference between the “real” score of the vertex, PR(Vi), and the score computed at iteration I, PR^(I)(Vi). Since the real score is not known in advance, the error rate is approximated by the difference between the scores of two successive iterations: PR^(I+1)(Vi) - PR^(I)(Vi).
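The stopping rule described above can be sketched in plain Python. This is a minimal illustration rather than the notebook's implementation: `max_error` and `has_converged` are hypothetical helper names, the scores are made-up values, and requiring every vertex to fall below the threshold is an assumption about how the rule is applied.

```python
def max_error(pr_prev, pr_next):
    """Largest per-vertex change between two successive iterations."""
    return max(abs(pr_next[v] - pr_prev[v]) for v in pr_next)


def has_converged(pr_prev, pr_next, threshold=1e-6):
    """True when every vertex changed by less than `threshold` (assumed rule)."""
    return max_error(pr_prev, pr_next) < threshold


# Two successive score vectors for a hypothetical 3-vertex graph.
pr_i = {"a": 0.40, "b": 0.35, "c": 0.25}
pr_i_plus_1 = {"a": 0.41, "b": 0.34, "c": 0.25}

# The largest change is about 0.01, so the outcome depends on the threshold.
print(has_converged(pr_i, pr_i_plus_1, threshold=0.05))
print(has_converged(pr_i, pr_i_plus_1, threshold=0.001))
```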


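The computation of PR itself poses no difficulty if the scale of the graph is disregarded. The damping factor controls the rate of convergence: when d is the probability of following a link (0.85 in this study), the error of the power iteration shrinks by roughly a factor of d per iteration, so a larger damping factor slows convergence and a smaller one speeds it up.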
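The effect of the damping factor on convergence can be checked empirically. The sketch below is a minimal plain-Python power iteration, not the notebook's implementation: `count_iterations` and the 4-node graph are hypothetical, and d is assumed to be the probability of following a link. It reports how many iterations each value of d needs to reach the same tolerance.

```python
def count_iterations(out_links, d, tol=1e-8, max_iter=10_000):
    """Run power iteration and return the iterations needed until the L1 change < tol.

    `d` is the probability of following a link (assumed convention).
    Every node in `out_links` must have at least one out-link.
    """
    nodes = list(out_links)
    n = len(nodes)
    pr = {v: 1.0 / n for v in nodes}
    for k in range(1, max_iter + 1):
        # Teleport term, shared uniformly by all nodes.
        new = {v: (1.0 - d) / n for v in nodes}
        # Link term: each node splits its damped rank among its out-links.
        for v in nodes:
            share = d * pr[v] / len(out_links[v])
            for w in out_links[v]:
                new[w] += share
        err = sum(abs(new[v] - pr[v]) for v in nodes)
        pr = new
        if err < tol:
            return k
    return max_iter


# Hypothetical graph: a 3-cycle (0 -> 1 -> 2 -> 0) plus node 3 linking into it.
graph = {0: [1], 1: [2], 2: [0], 3: [0]}

iterations = {d: count_iterations(graph, d) for d in (0.5, 0.85, 0.95)}
for d, k in iterations.items():
    print(f"d = {d}: {k} iterations")
```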

The PageRank algorithm was designed for directed graphs. For this study, we use only directed graphs generated with the NetworkX library, with a damping factor of 0.85 and 100 iterations.


The output (a NumPy matrix) represents the transition matrix that describes the Markov chain used in PageRank. For PageRank to converge to a unique solution, there must exist a path between every pair of nodes in the graph. Otherwise, the resulting PR ranks may be invalid.
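Both ideas can be illustrated on a small graph. The following is a minimal plain-Python sketch under stated assumptions: the graphs are given as hypothetical adjacency dicts, and `transition_matrix`, `reachable`, and `is_strongly_connected` are illustrative helpers written here, not NetworkX calls (NetworkX offers its own `nx.is_strongly_connected` for graph objects).

```python
from collections import deque


def transition_matrix(out_links):
    """Row-stochastic matrix: entry [i][j] is the probability of moving i -> j."""
    nodes = sorted(out_links)
    index = {v: k for k, v in enumerate(nodes)}
    matrix = [[0.0] * len(nodes) for _ in nodes]
    for v, targets in out_links.items():
        for w in targets:
            matrix[index[v]][index[w]] = 1.0 / len(targets)
    return matrix


def reachable(out_links, start):
    """Set of nodes reachable from `start`, found by breadth-first search."""
    seen, queue = {start}, deque([start])
    while queue:
        v = queue.popleft()
        for w in out_links[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen


def is_strongly_connected(out_links):
    """True when a directed path exists between every ordered pair of nodes."""
    return all(reachable(out_links, v) == set(out_links) for v in out_links)


cycle = {0: [1], 1: [2], 2: [0]}  # every node reaches every other node
chain = {0: [1], 1: [2], 2: []}   # node 2 is a dead end

print(transition_matrix(cycle))
print(is_strongly_connected(cycle), is_strongly_connected(chain))
```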

In [None]:
import numpy as np


def pageRank(n, d, I):
    # Step 1: Initialize the PageRank of every node with a value of 1/n | O(n)
    PR = np.ones(n) / n

    # Step 2: For each iteration, update the PageRank of every node in the graph.
    for i_t in range(I):  # O(k) where k is the number of iterations

        # 2-1: The first page is reached only through the random walk.
        rand = 1 / n  # assign value: O(1)
        PR[0] = d * rand  # assign value & computation: O(1)

        # 2-2: Other pages can be reached through the random walk or inter-page links.
        for i in range(1, n):  # O(n) where n is the number of pages

            # 2-3: Sum up the proportional rank from all in-neighbors.
            # Here page i has a single in-neighbor, page i-1, with out-degree 1.
            i_prop = PR[i - 1] / 1  # assign value & computation: O(1)
            # 2-4: Update the PageRank with the weighted sum of proportional rank and random walk.
            # Note: in this function d weights the random-walk term and (1 - d) the link term.
            PR[i] = d * rand + (1 - d) * i_prop  # assign value & computation: O(1)

    # Step 3: Normalize PR, since the chain ends in a terminal point.
    PR /= PR.sum()  # sum & element-wise division: O(n)
    return PR  # returning value: O(1)


# print(pageRank(10, 0.15, 50010))
'''
References
----------
[1]“Networkx.algorithms.link_analysis.pagerank_alg — NetworkX 2.8.5 Documentation.” 
Networkx.org, 2022, networkx.org/documentation/stable/_modules/networkx/algorithms/link_analysis/pagerank_alg.html#pagerank. 
Accessed 24 July 2022.
'''

    """Returns the PageRank of the nodes in the graph.

    PageRank computes a ranking of the nodes in the graph G based on
    the structure of the incoming links. It was originally designed as
    an algorithm to rank web pages.

    Parameters
    ----------
    G : graph
      A NetworkX graph.  Undirected graphs will be converted to a directed
      graph with two directed edges for each undirected edge.

    d : float, optional
      Damping factor for PageRank, default=0.85.

    personalization : dict, optional
      Nodes without a personalization value are assigned zero.
      By default, a uniform distribution is used.

    I : integer, optional
      Maximum number of iterations in power method eigenvalue solver.

    tol : float, optional
      Error tolerance used to check convergence in power method solver.

    weight : not configurable here; all edge weights are set to 1.

    dangling: dict, optional
      The outedges to be assigned to any "dangling" nodes, i.e., nodes without
      any outedges. 
      The dict key is the node the outedge points to and the dict
      value is the weight of that outedge. By default, dangling nodes are given
      outedges according to the personalization vector (uniform if not
      specified). This must be selected to result in an irreducible transition
      matrix. It may be common to have the
      dangling dict to be the same as the personalization dict.


    Returns
    -------
    pagerank : dictionary
       Dictionary of nodes with PageRank as value
    """
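The handling of dangling nodes described above can be illustrated with a single update step. This is a minimal plain-Python sketch, not the NetworkX implementation: `pagerank_step_with_dangling` and the 3-node graph are hypothetical, and both the personalization and dangling distributions are assumed to be uniform.

```python
def pagerank_step_with_dangling(pr, out_links, d=0.85):
    """One PageRank update in which dangling nodes spread their rank uniformly."""
    n = len(pr)
    # Total rank currently held by nodes without any out-edges.
    dangling_sum = sum(pr[v] for v in pr if not out_links[v])
    # Teleport term plus the dangling mass, both shared uniformly (assumption).
    new = {v: (1.0 - d) / n + d * dangling_sum / n for v in pr}
    # Link term: every non-dangling node splits its damped rank among its out-edges.
    for v, targets in out_links.items():
        for w in targets:
            new[w] += d * pr[v] / len(targets)
    return new


# Hypothetical graph in which node 2 has no out-edges.
links = {0: [1, 2], 1: [2], 2: []}
pr = {v: 1.0 / 3 for v in links}

step = pagerank_step_with_dangling(pr, links)
print(step)
print(sum(step.values()))  # stays at 1 (up to rounding) because no rank is lost
```

Without the `dangling_sum` term, the rank held by node 2 would be dropped at every step and the scores would no longer sum to 1.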


# PageRank Implementation on other graphs

In [15]:
import networkx as nx


In [20]:
def pageRank_graph(G, d=0.85, I=100, tol=1.0e-6):
    if len(G) == 0:
        return {}

    D = G.to_directed()

    # Create a copy in (right) stochastic form; the normalized edge
    # weights are stored under the "weight" attribute
    W = nx.stochastic_graph(D, weight="weight")
    # Get the total number of nodes in the graph
    N = W.number_of_nodes()

    # Initialize the PageRank of every node with a value of 1/N | O(N)
    # (PR corresponds to x in the NetworkX reference implementation)
    PR = dict.fromkeys(W, 1.0 / N)

    # Assign a uniform personalization vector
    p = dict.fromkeys(W, 1.0 / N)

    # Set dangling_weights to the personalization vector
    dangling_weights = p
    dangling_nodes = [n for n in W if W.out_degree(n, weight="weight") == 0.0]

    # Power iteration: make up to I iterations
    for _ in range(I):
        PRlast = PR
        PR = dict.fromkeys(PRlast.keys(), 0)
        danglesum = d * sum(PRlast[n] for n in dangling_nodes)
        for n in PR:
            # This matrix multiply looks odd because it is
            # doing a left multiply PR^T = PRlast^T * W
            for _, nbr, wt in W.edges(n, data="weight"):
                PR[nbr] += d * PRlast[n] * wt
            PR[n] += danglesum * dangling_weights.get(n, 0) + (1.0 - d) * p.get(n, 0)
        # Check convergence using the L1 norm
        err = sum(abs(PR[n] - PRlast[n]) for n in PR)
        if err < N * tol:
            return PR
    raise nx.PowerIterationFailedConvergence(I)

    
    
    

In [22]:
G = nx.DiGraph(nx.path_graph(4))
pr = pageRank_graph(G, d=0.9)
pr

{0: 0.17241401247723942,
 1: 0.32758598752276064,
 2: 0.32758598752276064,
 3: 0.17241401247723942}

## References
[1] A. Langville and C. Meyer,
    "A survey of eigenvector methods of web information retrieval."
    http://citeseer.ist.psu.edu/713792.html
[2] Page, Lawrence; Brin, Sergey; Motwani, Rajeev and Winograd, Terry,
    The PageRank citation ranking: Bringing order to the Web. 1999
    http://dbpubs.stanford.edu:8090/pub/showDoc.Fulltext?lang=en&doc=1999-66&format=pdf
