Skip to content

pagerank

kent edited this page Dec 19, 2019 · 2 revisions

Introduction

PageRank is an algorithm used to estimate how important a vertex is in the graph. It was named after Larry Page, the inventor of the algorithm, who is also the one of the founders of Google. First, it was invented to rank web pages which are linked to others on world wide web. The basic idea of PageRank is that it estimates the importance of the page by counting the number and quality of hyperlinks to this page. The motivation of this idea is that more important web page are likely to receive more links from other pages. The algorithm intrinsically abstract a giant directed graph from the world wide web. While the edges of the graph are converted by a defined Markov process to a transition probability matrix. The result of algorithm is the solution of stationary distribution of the Markov process. More details in latest version of algorithm includes the random surfer and smallest value setting on Wikipedia page. (https://en.wikipedia.org/wiki/PageRank ) We can find the important node in the graph by this algorithm. The algorithm works by estimating probability on each graph node and approximating to the stationary distribution of Markov process in an iteration procedure. The estimated presicion of result are strongly determinated by the round of algorithm iteration. The stopping time of algorithm can also be controlled by the convergence threshold parameter. Undirected graph edge will be treated as two-way directed edge in algorithm implementation. (Paper: The PageRank Citation Ranking: Bringing Order to the Web - http://ilpubs.stanford.edu:8090/422/1/1999-66.pdf )

Parameters

use --help param to view detailed help information.

Input Format

Input files should be formatted as follows:

<src>,<dst>

where <src> and <dst> are integers of type uint32_t, representing the end nodes of an edge. Note that Plato treats every input graph as undirected by default. For a directed graph, please ensure both <A, B> and <B, A> appear in the input file if they exist. Edges that appear more than once will be considered as multiple edges between the same pair of nodes.

Input example (Following numbers are synthetic and are for demonstration purpose only.):

4564,823192
823192,973033
1713,823192

Output Format

Output files are formatted as follows:

<vertex_id>,<pagerank_value>

where <vertex_id> represents a node and <pagerank_value> gives the PageRank value of node.

Output example (Following numbers are synthetic and are for demonstration purpose only.):

4564,0.2552393450530435
823192,0.5215760529282608

Source code

https://github.com/Tencent/plato/blob/master/plato/algo/pagerank

Algorithms to open source:

  • Network Embedding
    • LINE
    • Word2Vec
    • GraphVite
  • GNN
    • GCN
    • GraphSage
Clone this wiki locally