Trying to find and analyse the least viewed articles on English Wikipedia. See my blog post *In search of the least viewed article on Wikipedia* for a writeup of this investigation.

## Data pipeline

In the course of this investigation, I looked at a few different sets of articles. In each case, the steps for processing them were essentially the same.

The first step is to use Quarry to run a SQL query that generates a CSV file with page metadata. The main datasets and corresponding queries were:
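The queries themselves live on Quarry rather than in this repo and are not reproduced here. As an illustration only, a minimal sketch of this step might look like the following, where the selected columns and the output filename are assumptions, not the queries actually used:

```python
# Hypothetical sketch of step 1: the kind of query one might run on Quarry
# against the enwiki replica, and loading its downloaded CSV locally.
import pandas as pd

# Example Quarry query: all non-redirect pages in the main (article) namespace.
EXAMPLE_QUERY = """
SELECT page_id, page_title, page_len
FROM page
WHERE page_namespace = 0      -- main namespace only
  AND page_is_redirect = 0;   -- skip redirects
"""

# Quarry runs the query in the browser; its results are downloaded as a CSV
# and processed locally (this filename is hypothetical).
pages = pd.read_csv("quarry-results.csv")
print(pages.head())
```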

The next step is to run `get_views.py`, passing it the filename of the CSV downloaded from Quarry. This produces a CSV with a column for the article name, 12 columns of monthly page views for 2021, and a final convenience column with the total for the year.
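`get_views.py` itself isn't reproduced in this README, so the following is only a plausible sketch of the step, built on the public Wikimedia Pageviews REST API. The input column name (`page_title`), the output filename, and the User-Agent string are assumptions rather than the script's actual interface:

```python
# Sketch: read article titles from the Quarry CSV, fetch monthly 2021 view
# counts from the Wikimedia Pageviews REST API, and write them out as a CSV
# with one row per article plus a yearly total column.
import csv
import sys
from urllib.parse import quote

import requests

API = ("https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/"
       "en.wikipedia/all-access/user/{title}/monthly/2021010100/2021123100")

def monthly_views(title: str) -> list[int]:
    """Return 12 monthly view counts for 2021. Months missing from the
    response, or a 404 (no recorded views at all), count as zero."""
    views = [0] * 12
    url = API.format(title=quote(title, safe=""))  # encode '/', '?', etc.
    resp = requests.get(url, headers={"User-Agent": "least-viewed-sketch"})
    if resp.status_code == 404:  # the API 404s when a page has no view data
        return views
    resp.raise_for_status()
    for item in resp.json()["items"]:
        month = int(item["timestamp"][4:6])  # timestamps look like "2021010100"
        views[month - 1] = item["views"]
    return views

with open(sys.argv[1], newline="") as f:
    titles = [row["page_title"] for row in csv.DictReader(f)]

with open("views.csv", "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(["article"] + [f"2021-{m:02d}" for m in range(1, 13)] + ["total"])
    for title in titles:
        views = monthly_views(title)
        writer.writerow([title] + views + [sum(views)])
```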

`merge.py` merges the CSVs from steps 1 and 2.
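As a sketch of this step — assuming the pandas library and the hypothetical filenames and column names from the previous sketches — the merge could be as simple as:

```python
# Minimal sketch of the merge step; merge.py may differ in detail.
import pandas as pd

meta = pd.read_csv("quarry-results.csv")   # step 1 output: page metadata
views = pd.read_csv("views.csv")           # step 2 output: monthly view counts

# Join metadata to view counts on the article title.
merged = meta.merge(views, left_on="page_title", right_on="article")
merged.to_csv("merged.csv", index=False)
```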

The subsequent analysis and visualization of the merged data are done in the included IPython notebooks.
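The notebooks aren't reproduced here either, but to give a flavour of the analysis: once the data is merged, finding the least viewed articles amounts to sorting by the yearly total (column names again follow the hypothetical sketches above):

```python
# Sort the merged data by total 2021 views to surface the least viewed articles.
import pandas as pd

merged = pd.read_csv("merged.csv")
least_viewed = merged.sort_values("total").head(20)
print(least_viewed[["page_title", "total"]])
```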