Wikipedia Web Crawler

A simple Python web crawler, built with BeautifulSoup, that tests the theory that repeatedly clicking the first link (one that is neither a translation nor italicized) on 97% of Wikipedia pages eventually leads to the page for Knowledge. The chain used to end at the page for Philosophy, but it has recently changed to end at the page for Knowledge.
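
The first-link rule can be sketched roughly as follows. This is only an illustration, not the code in wiki-crawler.py: it uses the standard library's html.parser instead of BeautifulSoup so that it runs without extra installs. The names FirstLinkParser and first_wiki_link are hypothetical. Treating parenthesized text as the place where translations appear, and skipping hrefs that contain a colon (such as File: or Help: pages), are both assumptions.

```python
from html.parser import HTMLParser


class FirstLinkParser(HTMLParser):
    """Records the first /wiki/ link in a paragraph that is not italicized
    and not inside parentheses (assumed here to hold translations)."""

    def __init__(self):
        super().__init__()
        self.in_paragraph = False
        self.italic_depth = 0
        self.paren_depth = 0
        self.first_link = None

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            # Start of body text; reset parenthesis tracking per paragraph.
            self.in_paragraph = True
            self.paren_depth = 0
        elif tag in ("i", "em"):
            self.italic_depth += 1
        elif tag == "a" and self.first_link is None:
            href = dict(attrs).get("href") or ""
            is_article = href.startswith("/wiki/") and ":" not in href  # assumption
            if (self.in_paragraph and self.italic_depth == 0
                    and self.paren_depth == 0 and is_article):
                self.first_link = href

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_paragraph = False
        elif tag in ("i", "em") and self.italic_depth > 0:
            self.italic_depth -= 1

    def handle_data(self, data):
        if self.in_paragraph:
            # Track how deep we are inside parentheses in the running text.
            depth = self.paren_depth + data.count("(") - data.count(")")
            self.paren_depth = max(depth, 0)


def first_wiki_link(html):
    """Return the href of the first qualifying link, or None if there is none."""
    parser = FirstLinkParser()
    parser.feed(html)
    return parser.first_link


sample = (
    '<p><i><a href="/wiki/Italic">x</a></i> Foo (from '
    '<a href="/wiki/Greek">Greek</a>) is a '
    '<a href="/wiki/Concept">concept</a> and '
    '<a href="/wiki/Other">other</a>.</p>'
)
print(first_wiki_link(sample))
```

In the sample, the italicized link and the link inside the parentheses are both skipped, so the first qualifying link is the one to /wiki/Concept.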

The program starts with a random Wikipedia article, opens the first Wikipedia link in the body of that article, then opens the first Wikipedia link in the body of the resulting page, and so on, until one of three things happens:

1. The predetermined "target URL" is reached. In this case, the target URL is https://en.wikipedia.org/wiki/Knowledge.
2. The predetermined "maximum links" limit is reached. In this case, the limit is 25 links.
3. The last link opened has already been opened during this run, meaning the program has hit a cycle.
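
The three stopping conditions above can be sketched as a small loop. This is a minimal illustration under stated assumptions, not the actual contents of wiki-crawler.py. The function name crawl and the get_first_link callable (which stands in for downloading a page and extracting its first link) are hypothetical. Whether the starting article counts toward the 25-link limit is also an assumption; here it does not.

```python
TARGET_URL = "https://en.wikipedia.org/wiki/Knowledge"
MAX_LINKS = 25


def crawl(start_url, get_first_link, target_url=TARGET_URL, max_links=MAX_LINKS):
    """Follow first links from start_url.

    get_first_link is a hypothetical callable that maps a URL to the URL of
    its first link (or None). Returns (visited_urls, reason_for_stopping).
    """
    visited = [start_url]
    while True:
        current = visited[-1]
        if current == target_url:
            return visited, "target"        # condition 1: target URL reached
        if len(visited) - 1 >= max_links:
            return visited, "max_links"     # condition 2: link limit reached
        next_url = get_first_link(current)
        if next_url is None:
            # Not one of the three documented cases; added defensively.
            return visited, "dead_end"
        if next_url in visited:
            return visited, "cycle"         # condition 3: link already opened
        visited.append(next_url)


# Usage with a tiny in-memory "web" instead of real network requests.
pages = {"A": "B", "B": TARGET_URL}
path, reason = crawl("A", pages.get)
print(path, reason)
```

Passing the link lookup in as a callable keeps the loop testable without network access. In a real run it would fetch the page and parse out the first qualifying link.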

To run the program, download the wiki-crawler.py file to your home folder, then run it from Terminal:

python3 wiki-crawler.py

This program was built as the final project for the Introduction to Python course on Udacity.


adidottxt/wikipedia-crawler
