RaphaelMasset/google-pagerank-algorithm

Google Page Ranking Algorithm

A Python implementation of the PageRank algorithm to evaluate the relative importance of pages in a corpus of HTML files.

Overview

  • Parse a directory of HTML pages to build a link graph (corpus).
  • Compute PageRank using:
    • Sampling (random surfer simulation).
    • Iterative update until convergence.
  • Handle dangling pages (pages without outgoing links).
  • Output normalized PageRank scores for each page.
  • Configurable damping factor and sample size.
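The iterative update listed above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the function name `iterate_pagerank`, the `tolerance` value, and the choice to spread a dangling page's rank evenly over every page are all assumptions made for the example.

```python
def iterate_pagerank(corpus, damping_factor=0.85, tolerance=0.001):
    """Hypothetical sketch. corpus maps each page name to the set of pages it links to."""
    pages = list(corpus)
    n = len(pages)
    ranks = {page: 1 / n for page in pages}  # start from a uniform distribution

    while True:
        # Assumption: rank held by dangling pages is shared equally by all pages.
        dangling = sum(ranks[p] for p in pages if not corpus[p])
        new_ranks = {}
        for page in pages:
            # Each linking page passes on an equal share of its current rank.
            incoming = sum(
                ranks[src] / len(corpus[src])
                for src in pages
                if page in corpus[src]
            )
            new_ranks[page] = (1 - damping_factor) / n + damping_factor * (
                incoming + dangling / n
            )
        change = max(abs(new_ranks[p] - ranks[p]) for p in pages)
        ranks = new_ranks
        if change < tolerance:  # assumed convergence threshold
            break

    # Normalize so the scores sum to 1.
    total = sum(ranks.values())
    return {page: rank / total for page, rank in ranks.items()}
```

Calling `iterate_pagerank({"a": {"b"}, "b": set()})` returns a dictionary of scores that sums to 1, with the dangling page `b` handled by the even-spread rule above.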

Usage

Clone the repository, then run pagerank.py, passing the path to a directory of HTML files as the argument.

Windows (PowerShell / CMD)

py pagerank.py corpus

macOS / Linux (Terminal)

python3 pagerank.py corpus
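The `corpus` argument in the commands above is the directory to parse. As a rough illustration of the link-graph step from the Overview, the sketch below reads every `.html` file in a directory and collects its `href` targets. The function name `crawl`, the regular expression, and the decision to drop self-links and links to pages outside the directory are assumptions for the example; the repository may parse pages differently.

```python
import os
import re


def crawl(directory):
    """Hypothetical sketch: map each .html filename to the set of corpus pages it links to."""
    pages = {}
    for filename in sorted(os.listdir(directory)):
        if not filename.endswith(".html"):
            continue
        with open(os.path.join(directory, filename), encoding="utf-8") as f:
            contents = f.read()
        # Capture the href value of each <a ...> tag (double-quoted values only).
        links = re.findall(r'<a\s+(?:[^>]*?\s)?href="([^"]*)"', contents)
        pages[filename] = set(links) - {filename}  # assumption: ignore self-links

    # Assumption: keep only links that point at other pages in the same directory.
    return {
        name: {link for link in links if link in pages}
        for name, links in pages.items()
    }
```

A regular expression is enough for simple, well-formed pages; for arbitrary HTML, the standard library's `html.parser` module would be a more robust choice.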

Example Session

$ python3 pagerank.py corpus0
PageRank Results from Sampling (n = 10000)
  1.html: 0.2200
  2.html: 0.3900
  3.html: 0.3900
PageRank Results from Iteration
  1.html: 0.2187
  2.html: 0.3906
  3.html: 0.3906
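The sampling figures in the session above are estimates from a random walk, so they differ slightly from the iterative values and can change from run to run. A minimal sketch of such a random-surfer simulation is shown below. The names `transition_model` and `sample_pagerank`, the `seed` parameter, and the rule that a dangling page is treated as linking to every page are assumptions for illustration, not the repository's confirmed implementation.

```python
import random


def transition_model(corpus, page, damping_factor):
    """Hypothetical helper: probability of visiting each page next, given the current page."""
    n = len(corpus)
    # Assumption: a page with no outgoing links is treated as linking to every page.
    links = corpus[page] or set(corpus)
    probabilities = {p: (1 - damping_factor) / n for p in corpus}
    for target in links:
        probabilities[target] += damping_factor / len(links)
    return probabilities


def sample_pagerank(corpus, damping_factor=0.85, n=10000, seed=None):
    """Hypothetical sketch: estimate PageRank as the fraction of n samples spent on each page."""
    rng = random.Random(seed)
    pages = list(corpus)
    counts = dict.fromkeys(pages, 0)
    page = rng.choice(pages)  # the first sample is chosen uniformly at random
    for _ in range(n):
        counts[page] += 1
        probabilities = transition_model(corpus, page, damping_factor)
        weights = [probabilities[p] for p in pages]
        page = rng.choices(pages, weights=weights)[0]
    return {p: counts[p] / n for p in pages}
```

Passing a fixed `seed` makes a run reproducible; increasing `n` generally brings the estimates closer to the iterative result.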

Notes

  • Default damping factor: 0.85
  • Default sample size: 10,000
  • Pure Python implementation; no external dependencies required
