colinmorris/wiki-pageview-floor

Trying to find and analyse the least viewed articles on English Wikipedia. See my blog post, "In search of the least viewed article on Wikipedia", for a writeup of this investigation.

Data pipeline

In the course of this investigation, I looked at a few different sets of articles. In each case, the steps for processing them were basically the same.

The first step is to use Quarry, Wikimedia's web tool for running SQL queries against the Wikipedia database replicas, to run a query that generates a CSV file of page metadata. The main datasets and corresponding queries were:

The next step is to run get_views.py, passing in the filename of the CSV downloaded from Quarry. This creates a CSV with one column for the article name, 12 columns holding that article's monthly page views for 2021, and a final convenience column with the total for the year.
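
The README doesn't say where get_views.py gets its numbers. As a minimal sketch, assuming the counts come from the public Wikimedia REST pageviews API (its `per-article` endpoint with monthly granularity), one row of that output could be built as below. The function names, the `user` agent filter, and the placeholder User-Agent string are illustrative assumptions, not the script's actual code.

```python
import json
import urllib.parse
import urllib.request

# Assumption: monthly counts come from the Wikimedia REST pageviews API.
API = ("https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/"
       "en.wikipedia/all-access/user/{title}/monthly/2021010100/2021123100")


def pageviews_url(title):
    """Build the per-article monthly pageviews URL for the 2021 window."""
    # Titles use underscores instead of spaces; '/' and friends must be percent-encoded.
    encoded = urllib.parse.quote(title.replace(" ", "_"), safe="")
    return API.format(title=encoded)


def fetch_pageviews(title, user_agent="pageview-floor-sketch/0.1 (you@example.org)"):
    """Fetch the raw JSON response (hypothetical helper; needs network access).

    Wikimedia asks API clients to send a descriptive User-Agent header.
    """
    req = urllib.request.Request(pageviews_url(title), headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def to_row(title, response):
    """Turn an API response into [title, jan, ..., dec, total]."""
    monthly = [0] * 12
    for item in response.get("items", []):
        # Timestamps look like "2021030100"; characters 4:6 hold the month.
        month = int(item["timestamp"][4:6])
        monthly[month - 1] = item["views"]
    # Months missing from the response are left at 0.
    return [title] + monthly + [sum(monthly)]
```

Defaulting every month to 0 and only overwriting the months present in the response keeps each output row at exactly 14 columns, which makes the later CSV merge straightforward.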

merge.py merges the CSVs from steps 1 and 2 (the Quarry page metadata and the monthly page views).
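
The internals of merge.py aren't described here. A minimal standard-library sketch of the idea is an inner join keyed on article title. It assumes hypothetical column names (`page_title` in the metadata CSV, `title` in the pageview CSV) and normalizes spaces to underscores, since MediaWiki stores page titles with underscores.

```python
import csv


def _norm(title):
    """MediaWiki stores titles with underscores; normalize so both files match."""
    return title.replace(" ", "_")


def merge_csvs(meta_file, views_file, meta_key="page_title", views_key="title"):
    """Join page-metadata rows (step 1) with pageview rows (step 2) on article title.

    Column names are assumptions; adjust them to match the real files.
    Metadata rows with no matching pageview row are dropped (inner join).
    """
    # Index the pageview rows by normalized title for constant-time lookups.
    views = {_norm(row[views_key]): row for row in csv.DictReader(views_file)}
    merged = []
    for meta in csv.DictReader(meta_file):
        match = views.get(_norm(meta[meta_key]))
        if match is None:
            continue
        row = dict(meta)
        # Copy every pageview column except the now-redundant title column.
        row.update({k: v for k, v in match.items() if k != views_key})
        merged.append(row)
    return merged
```

Both arguments are open file objects (or anything `csv.DictReader` accepts), so the same function works on files opened with `open(path, newline="")` or on in-memory buffers.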

The subsequent analysis and visualization of the merged data are done in the included IPython notebooks.
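
The notebooks themselves aren't reproduced here. As a hypothetical example of a first analysis step, listing the articles with the lowest 2021 totals from the merged rows takes only a few lines, assuming the yearly total lives in a column named `total`:

```python
import heapq


def least_viewed(rows, n=10, total_key="total"):
    """Return the n rows with the smallest yearly total.

    heapq.nsmallest is equivalent to sorted(rows, key=...)[:n], so ties keep
    their input order. CSV values are strings, hence the int() conversion.
    """
    return heapq.nsmallest(n, rows, key=lambda row: int(row[total_key]))
```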
