This is an implementation of the Google research paper "The Anatomy of a Large-Scale Hypertextual Web Search Engine" on the Wikipedia dataset, with extensive optimizations and data parallelism.
It can index the 60 GB Wikipedia data dump, containing millions of articles, in about 3 hours on an average laptop by keeping all system resources busy and eliminating processing bottlenecks.
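
The data-parallel part can be pictured as workers building partial inverted indexes that are later merged. The sketch below is a minimal, illustrative version of that idea, not the project's actual code: names such as `index_batch` and the hard-coded sample batches are assumptions, and in practice the batches would come from streaming the XML dump (a streaming sketch follows the download link below).

```python
# Minimal sketch: workers build partial inverted indexes in parallel, then merge.
import multiprocessing as mp
from collections import defaultdict

def index_batch(batch):
    """Build a partial inverted index {term: {doc_id: tf}} for one batch of pages."""
    partial = defaultdict(dict)
    for doc_id, text in batch:
        for term in text.lower().split():
            partial[term][doc_id] = partial[term].get(doc_id, 0) + 1
    return partial

def merge(partials):
    """Merge per-worker partial indexes into a single index."""
    merged = defaultdict(dict)
    for partial in partials:
        for term, postings in partial.items():
            merged[term].update(postings)
    return merged

if __name__ == "__main__":
    # Tiny hard-coded sample standing in for batches of parsed articles.
    batches = [
        [(1, "anarchism is a political philosophy"),
         (2, "albedo is a measure of reflectance")],
        [(3, "autism is a neurodevelopmental condition")],
    ]
    with mp.Pool(processes=mp.cpu_count()) as pool:
        partial_indexes = pool.map(index_batch, batches)
    index = merge(partial_indexes)
    print(index["is"])  # postings list for the term "is"
```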
I have used the Wikipedia dump (60 GB) in XML format. It can be found at this link: https://en.wikipedia.org/wiki/Wikipedia:Database_download#XML_schema
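
Because the dump is far too large to load into memory, it has to be parsed as a stream. A rough sketch of that, assuming the standard MediaWiki export schema namespace (the version string may differ between dump releases) and a hypothetical local file name:

```python
# Sketch: stream <page> elements from a Wikipedia XML dump without loading it all.
import xml.etree.ElementTree as ET

NS = "{http://www.mediawiki.org/xml/export-0.10/}"  # assumed export schema version

def iter_pages(dump_path):
    """Yield (title, wikitext) pairs from a Wikipedia XML dump, streaming."""
    for event, elem in ET.iterparse(dump_path, events=("end",)):
        if elem.tag == NS + "page":
            title = elem.findtext(NS + "title", default="")
            text = elem.findtext(f"{NS}revision/{NS}text", default="")
            yield title, text
            elem.clear()  # free memory held by the processed subtree

if __name__ == "__main__":
    # "enwiki-latest-pages-articles.xml" is a placeholder path.
    for i, (title, text) in enumerate(iter_pages("enwiki-latest-pages-articles.xml")):
        print(title, len(text))
        if i >= 4:
            break
```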