Python web crawler to test the theory that repeatedly clicking the first link on ~97% of Wikipedia pages eventually leads to the Wikipedia page for Knowledge 📑

adidottxt/wikipedia-crawler

Wikipedia Web Crawler

A simple Python web crawler, built with BeautifulSoup, that tests the theory that repeatedly clicking the first link (excluding translations and italicized links) on about 97% of Wikipedia pages eventually leads to the page for Knowledge. The chain used to end at the page for Philosophy, but it has recently changed to end at the page for Knowledge.
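As a rough illustration of the "first link that is not italicized" rule, here is a minimal sketch. It is not the code from this repository: it swaps BeautifulSoup for the standard library's `html.parser` so that it runs without extra dependencies, and the names `FirstLinkParser` and `find_first_link` are hypothetical. It also assumes that article links start with `/wiki/` and contain no colon, and it handles only the italics rule; how the program detects translation links is not described here, so that check is left out.

```python
from html.parser import HTMLParser


class FirstLinkParser(HTMLParser):
    """Record the first /wiki/ link inside a <p> that is not italicized."""

    def __init__(self):
        super().__init__()
        self.paragraph_depth = 0  # > 0 while inside a <p>
        self.italic_depth = 0     # > 0 while inside <i> or <em>
        self.first_link = None

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.paragraph_depth += 1
        elif tag in ("i", "em"):
            self.italic_depth += 1
        elif tag == "a" and self.first_link is None:
            href = dict(attrs).get("href") or ""
            in_body_text = self.paragraph_depth > 0 and self.italic_depth == 0
            # Assumption: article links start with /wiki/ and have no colon
            # (the colon test skips namespaces such as Help: or File:).
            if in_body_text and href.startswith("/wiki/") and ":" not in href:
                self.first_link = href

    def handle_endtag(self, tag):
        if tag == "p" and self.paragraph_depth > 0:
            self.paragraph_depth -= 1
        elif tag in ("i", "em") and self.italic_depth > 0:
            self.italic_depth -= 1


def find_first_link(html):
    """Return the href of the first qualifying link, or None if there is none."""
    parser = FirstLinkParser()
    parser.feed(html)
    return parser.first_link
```

Links outside paragraphs (navigation, sidebars) are ignored because the sketch only looks at anchors nested inside a `<p>` element.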

The program starts with a random Wikipedia article, opens the first Wikipedia link in the body of that article, then opens the first Wikipedia link in the body of the resulting page, and so on, until one of three things happens:

  1. The predetermined "target URL" is reached. Here, the target URL is https://en.wikipedia.org/wiki/Knowledge.
  2. The predetermined "maximum links" limit is reached. Here, the limit is 25 links.
  3. The last link opened has already been opened during this run, meaning the program has hit a cycle.
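The three stopping conditions above can be sketched as a single loop. This is an illustrative sketch rather than the repository's actual code: the function name `crawl`, its parameters, and the returned reason strings are all hypothetical. The step that finds the next link is passed in as `next_link`, so the loop can be tried on a toy link graph without network access. It also assumes the limit counts links followed after the start page, and it adds a fourth outcome, `"no_link"`, for pages where no link is found.

```python
def crawl(start_url, next_link, target_url, max_links=25):
    """Follow first links from start_url; return (visited, reason).

    next_link is any callable mapping a URL to the next URL (or None).
    """
    visited = [start_url]
    while True:
        current = visited[-1]
        if current == target_url:
            return visited, "target"        # condition 1: target reached
        if len(visited) - 1 >= max_links:
            return visited, "max_links"     # condition 2: link limit reached
        following = next_link(current)
        if following is None:
            return visited, "no_link"       # extra case: nothing to follow
        if following in visited:
            return visited, "cycle"         # condition 3: page seen before
        visited.append(following)


# Toy link graph standing in for Wikipedia (hypothetical page names).
toy_links = {"Cat": "Animal", "Animal": "Knowledge"}
path, reason = crawl("Cat", toy_links.get, "Knowledge")
print(path, reason)  # ['Cat', 'Animal', 'Knowledge'] target
```

Keeping the visited pages in a list preserves the order of the chain for printing; for long chains a set alongside the list would make the cycle check faster, though with a 25-link limit the difference is negligible.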

To run the program, download the wiki-web-crawler.py file to your home folder, then run it from Terminal:

python3 wiki-web-crawler.py

This program was built as the final project for the Introduction to Python course on Udacity.
