Large-Scale Search Engine

This is an implementation of the Google research paper "The Anatomy of a Large-Scale Hypertextual Web Search Engine" on a Wikipedia dataset, with extensive optimizations and data parallelism.
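As a rough illustration of the paper's core ranking idea, here is a minimal PageRank power-iteration sketch in Python. The function name, graph representation, and damping factor d = 0.85 are assumptions for illustration; the repository's actual implementation may differ.

```python
# Minimal PageRank sketch (hypothetical; not this repository's actual code).
# `links` maps each page id to the list of page ids it links to.

def pagerank(links, damping=0.85, iterations=50):
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if not outlinks:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
            else:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
        rank = new_rank
    return rank

if __name__ == "__main__":
    demo = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
    print(pagerank(demo))
```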

It can index a 60 GB Wikipedia data dump containing millions of articles in about 3 hours on an average laptop, by keeping all system resources busy and eliminating pipeline bottlenecks.
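One common way to achieve this kind of data parallelism is to shard documents across worker processes that each build a partial inverted index, then merge the partials. The sketch below shows that pattern in Python using `multiprocessing`; all names here are hypothetical and this is not necessarily how this repository parallelizes its pipeline.

```python
# Hypothetical sketch of data-parallel inverted-index construction.
import re
from collections import defaultdict
from multiprocessing import Pool

TOKEN = re.compile(r"[a-z0-9]+")

def index_batch(batch):
    """Build a partial inverted index for one batch of (doc_id, text) pairs."""
    partial = defaultdict(set)
    for doc_id, text in batch:
        for token in TOKEN.findall(text.lower()):
            partial[token].add(doc_id)
    return partial

def build_index(batches, workers=4):
    """Index batches in parallel worker processes, then merge the results."""
    index = defaultdict(set)
    with Pool(workers) as pool:
        for partial in pool.imap_unordered(index_batch, batches):
            for token, doc_ids in partial.items():
                index[token] |= doc_ids
    return index

if __name__ == "__main__":
    docs = [(1, "PageRank ranks web pages"), (2, "Web search engines index pages")]
    print(dict(build_index([docs])))
```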

Dataset

I have used the Wikipedia dump (60 GB) in XML format. It can be found at this link: https://en.wikipedia.org/wiki/Wikipedia:Database_download#XML_schema
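A dump of this size cannot be loaded into memory at once, so a streaming parser is the natural approach. Below is a minimal sketch using Python's `xml.etree.ElementTree.iterparse`; the element names follow the MediaWiki export schema, but the namespace version and file name are assumptions and may differ for your dump.

```python
# Streaming parse of a Wikipedia XML dump (sketch; schema version may vary).
import xml.etree.ElementTree as ET

NS = "{http://www.mediawiki.org/xml/export-0.10/}"  # assumption: export schema 0.10

def iter_articles(path):
    """Yield (title, wikitext) pairs without loading the whole dump into memory."""
    for event, elem in ET.iterparse(path, events=("end",)):
        if elem.tag == NS + "page":
            title = elem.findtext(NS + "title")
            text = elem.findtext(f"{NS}revision/{NS}text")
            yield title, text or ""
            elem.clear()  # free the parsed subtree to keep memory usage flat

# Hypothetical usage: feed each article into the indexing pipeline.
for title, text in iter_articles("enwiki-latest-pages-articles.xml"):
    pass
```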
