WikiHopper

A program to find the shortest path between two Wikipedia articles.


Usage

  1. Clone the repository.
  2. Run make.
    On the first run, this downloads and parses the Wikipedia dump, which takes a long time.
  3. Run ./WikiHopper
    The program first loads the parsed file produced in step 2, which also takes a long time.

Because every step is slow, it is recommended to run them all unattended with the following one-line install command:
git clone https://github.com/MRegirouard/WikiHopper.git && cd WikiHopper && make -j3 && ./WikiHopper
Three make jobs (-j3) are enough, as there are only a few files to build.

The program prompts for two article names, then finds the shortest path from the start article to the end article by following page links.
Note that some paths may no longer exist on the live site, as Wikipedia changes often and the dataset is from 2018.

Fun Paths to Try:

"Water" to "Computer"

  1. Water
  2. Aquarium
  3. Computer

"Green" to "1483"

  1. Green
  2. Ancient Egypt
  3. Levant
  4. 15th century
  5. 1483

"Carpet" to "Linux"

  1. Carpet
  2. Afghanistan
  3. Voice of America
  4. Linux

Method

The program uses an iterative breadth-first search to traverse the article graph, and a hashmap for fast retrieval. Articles are first loaded from the text file into Article objects, each of which contains the article name and a vector of pointers to the other Article objects it links to within Wikipedia. These objects are then put into an unordered_map keyed by article name, so checking whether an article exists takes constant time on average. Because every link counts as one step, the breadth-first search reaches the end article by a path with the fewest possible links, and it visits each article at most once.
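
The approach described above can be sketched in a few dozen lines of C++. This is a minimal illustration, not the repository's actual source: the `Article` layout, the `ArticleMap` alias, and the helper names `addLink` and `shortestPath` are assumptions made for the example, and the graph is built by hand instead of being loaded from the parsed dump.

```cpp
#include <algorithm>
#include <queue>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the Article object described above.
struct Article {
    std::string name;
    std::vector<Article*> links;  // articles this one links to
};

// Name -> Article lookup. Pointers to unordered_map elements stay
// valid when the map grows, so they are safe to store in `links`.
using ArticleMap = std::unordered_map<std::string, Article>;

// Hypothetical helper: adds a one-way link, creating articles as needed.
void addLink(ArticleMap& articles, const std::string& from, const std::string& to) {
    Article& src = articles[from];
    src.name = from;
    Article& dst = articles[to];
    dst.name = to;
    src.links.push_back(&dst);
}

// Iterative breadth-first search. Returns the article names from start
// to goal, or an empty vector if either name is unknown or no path exists.
std::vector<std::string> shortestPath(const ArticleMap& articles,
                                      const std::string& start,
                                      const std::string& goal) {
    auto s = articles.find(start);
    auto g = articles.find(goal);
    if (s == articles.end() || g == articles.end()) return {};
    const Article* first = &s->second;
    const Article* last = &g->second;

    // parent doubles as the visited set and as the record of how
    // each article was first reached.
    std::unordered_map<const Article*, const Article*> parent;
    parent[first] = nullptr;
    std::queue<const Article*> frontier;
    frontier.push(first);

    while (!frontier.empty()) {
        const Article* current = frontier.front();
        frontier.pop();
        if (current == last) {
            // Walk the parent chain back to the start, then reverse it.
            std::vector<std::string> path;
            for (const Article* a = current; a != nullptr; a = parent.at(a))
                path.push_back(a->name);
            std::reverse(path.begin(), path.end());
            return path;
        }
        for (const Article* next : current->links) {
            // emplace only succeeds the first time an article is seen.
            if (parent.emplace(next, current).second) frontier.push(next);
        }
    }
    return {};
}
```

As a usage example, after calling `addLink(articles, "Water", "Aquarium")` and `addLink(articles, "Aquarium", "Computer")` on an empty `ArticleMap`, `shortestPath(articles, "Water", "Computer")` returns the three names Water, Aquarium, Computer. Links are one-way, so the reverse query returns an empty vector.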
