# Article Page Views API Example
This example illustrates how to access page view data using the [Wikimedia REST API](https://www.mediawiki.org/wiki/Wikimedia_REST_API). This example shows how to request monthly counts of page views for one specific article. The API documentation, [pageviews/per-article](https://wikimedia.org/api/rest_v1/#/Pageviews%20data), covers additional details that may be helpful when trying to use or understand this example.

## License
This code example was developed by Dr. David W. McDonald for use in DATA 512, a course in the UW MS Data Science degree program. This code is provided under the [Creative Commons](https://creativecommons.org) [CC-BY license](https://creativecommons.org/licenses/by/4.0/). Revision 1.1 - May 5, 2022



In [2]:
# 
# These are standard python modules
import json, time, urllib.parse
#
# The 'requests' module is not a standard Python module. You will need to install this with pip/pip3 if you do not already have it
import requests

import pandas as pd

In [3]:
#Get the list of dinosaur names from the csv

df = pd.read_csv("dinosaur_genera.cleaned.SEPT.2022 - dinosaur_genera.cleaned.SEPT.2022.csv")
# ' '.join(df['name'].tolist())

In [9]:
#########
#
#    CONSTANTS
#

# The REST API 'pageviews' URL - this is the common URL/endpoint for all 'pageviews' API requests
API_REQUEST_PAGEVIEWS_ENDPOINT = 'https://wikimedia.org/api/rest_v1/metrics/pageviews/'

# This is a parameterized string that specifies what kind of pageviews request we are going to make
# In this case it will be a 'per-article' based request. The string is a format string so that we can
# replace each parameter with an appropriate value before making the request
API_REQUEST_PER_ARTICLE_PARAMS = 'per-article/{project}/{access}/{agent}/{article}/{granularity}/{start}/{end}'

# The Pageviews API asks that we not exceed 100 requests per second, we add a small delay to each request
API_LATENCY_ASSUMED = 0.002       # Assuming roughly 2ms latency on the API and network
API_THROTTLE_WAIT = (1.0/100.0)-API_LATENCY_ASSUMED

# When making a request to the Wikimedia API they ask that you include a "unique ID" that will allow them to
# contact you if something happens - such as - your code exceeding request limits - or some other error happens
REQUEST_HEADERS = {
    'User-Agent': '<yangj98@uw.edu>, University of Washington, MSDS DATA 512 - AUTUMN 2022',
}

df = pd.read_csv("dinosaur_genera.cleaned.SEPT.2022 - dinosaur_genera.cleaned.SEPT.2022.csv")
ARTICLE_TITLES = df['name']

# This is just a list of English Wikipedia article titles that we can use for example requests
# ARTICLE_TITLES = [ 'Bison', 'Northern flicker', 'Red squirrel', 'Chinook salmon', 'Horseshoe bat' ]

# This template is used to map parameter values into the API_REQUST_PER_ARTICLE_PARAMS portion of an API request. The dictionary has a
# field/key for each of the required parameters. In the example, below, we only vary the article name, so the majority of the fields
# can stay constant for each request. Of course, these values *could* be changed if necessary.
ARTICLE_PAGEVIEWS_PARAMS_TEMPLATE = {
    "project":     "en.wikipedia.org",
    "access":      "mobile",      # this should be changed for the different access types
    "agent":       "user",
    "article":     "",             # this value will be set/changed before each request
    "granularity": "monthly",
    "start":       "2015010100",
    "end":         "2022093000"    # this is likely the wrong end date
}


In [10]:
print(ARTICLE_PAGEVIEWS_PARAMS_TEMPLATE)

{'project': 'en.wikipedia.org', 'access': 'mobile', 'agent': 'user', 'article': '', 'granularity': 'monthly', 'start': '2015010100', 'end': '2022093000'}


The example relies on some constants that help make the code a bit more readable.

The API request will be made using one procedure. The idea is to make this reusable. The procedure is parameterized, but relies on the constants above for the important parameters. The underlying assumption is that this will be used to request data for a set of article pages. Therefore the parameter most likely to change is the article_title.

In [11]:
#########
#
#    PROCEDURES/FUNCTIONS
#

def request_pageviews_per_article(article_title = None, 
                                  endpoint_url = API_REQUEST_PAGEVIEWS_ENDPOINT, 
                                  endpoint_params = API_REQUEST_PER_ARTICLE_PARAMS, 
                                  request_template = ARTICLE_PAGEVIEWS_PARAMS_TEMPLATE,
                                  headers = REQUEST_HEADERS):
    # Make sure we have an article title
    if not article_title: return None
    
    # Titles are supposed to have spaces replaced with "_" and be URL encoded
    article_title_encoded = urllib.parse.quote(article_title.replace(' ','_'))
    request_template['article'] = article_title_encoded
    
    # now, create a request URL by combining the endpoint_url with the parameters for the request
    request_url = endpoint_url+endpoint_params.format(**request_template)
    
    # make the request
    try:
        # we'll wait first, to make sure we don't exceed the limit in the situation where an exception
        # occurs during the request processing - throttling is always a good practice with a free
        # data source like Wikipedia - or other community sources
        if API_THROTTLE_WAIT > 0.0:
            time.sleep(API_THROTTLE_WAIT)
        response = requests.get(request_url, headers=headers)
        json_response = response.json()
    except Exception as e:
        print(e)
        json_response = None
    return json_response


In [12]:
# print("Getting pageview data for: ",ARTICLE_TITLES)
views = {}
for article in ARTICLE_TITLES:
    views[article] = request_pageviews_per_article(article)
    print(article, " done!")


“Coelosaurus” antiquus  done!
Aachenosaurus  done!
Aardonyx  done!
Abdarainurus  done!
Abditosaurus  done!
Abelisaurus  done!
Abrictosaurus  done!
Abrosaurus  done!
Abydosaurus  done!
Acantholipan  done!
Acanthopholis  done!
Achelousaurus  done!
Acheroraptor  done!
Achillesaurus  done!
Achillobator  done!
Acristavus  done!
Acrocanthosaurus  done!
Acrotholus  done!
Actiosaurus  done!
Adamantisaurus  done!
Adasaurus  done!
Adelolophus  done!
Adeopapposaurus  done!
Adratiklit  done!
Adynomosaurus  done!
Aegyptosaurus  done!
Aeolosaurus  done!
Aepisaurus  done!
Aepyornithomimus  done!
Aerosteon  done!
Afromimus  done!
Afrovenator  done!
Agathaumas  done!
Aggiosaurus  done!
Agilisaurus  done!
Agnosphitys  done!
Agrosaurus  done!
Agujaceratops  done!
Agustinia  done!
Ahshislepelta  done!
Ajkaceratops  done!
Ajnabia  done!
Akainacephalus  done!
Alamosaurus  done!
Alaskacephale  done!
Albalophosaurus  done!
Albertaceratops  done!
Albertadromeus  done!
Albertavenator  done!
Albertonykus  done!


Elaltitan  done!
Elaphrosaurus  done!
Elemgasem  done!
Elmisaurus  done!
Elopteryx  done!
Elrhazosaurus  done!
Emausaurus  done!
Embasaurus  done!
Enigmosaurus  done!
Eoabelisaurus  done!
Eocarcharia  done!
Eocursor  done!
Eodromaeus  done!
Eolambia  done!
Eomamenchisaurus  done!
Eoraptor  done!
Eosinopteryx  done!
Eotrachodon  done!
Eotriceratops  done!
Eotyrannus  done!
Eousdryosaurus  done!
Epachthosaurus  done!
Epanterias  done!
Epichirostenotes  done!
Epidexipteryx  done!
Equijubus  done!
Erectopus  done!
Erketu  done!
Erliansaurus  done!
Erlikosaurus  done!
Erythrovenator  done!
Eshanosaurus  done!
Eucamerotus  done!
Eucercosaurus  done!
Eucnemesaurus  done!
Eucoelophysis  done!
Euhelopus  done!
Euoplocephalus  done!
Eurolimnornis  done!
Euronychodon  done!
Europasaurus  done!
Europatitan  done!
Europelta  done!
Euskelosaurus  done!
Eustreptospondylus  done!
Fabrosaurus  done!
Falcarius  done!
Ferganasaurus  done!
Ferganocephale  done!
Ferrisaurus  done!
Foraminacephale  done!
Fo

Monolophosaurus  done!
Mononykus  done!
Montanoceratops  done!
Morelladon  done!
Morinosaurus  done!
Moros intrepidus  done!
Morrosaurus  done!
Mosaiceratops  done!
Mosasaur  done!
Murusraptor  done!
Mussaurus  done!
Muttaburrasaurus  done!
Muyelensaurus  done!
Mymoorapelta  done!
Naashoibitosaurus  done!
Nambalia  done!
Nankangia  done!
Nanningosaurus  done!
Nanosaurus  done!
Nanshiungosaurus  done!
Nanuqsaurus  done!
Nanyangosaurus  done!
Napaisaurus  done!
Narambuenatitan  done!
Narindasaurus  done!
Nasutoceratops  done!
Navajoceratops  done!
Nebulasaurus  done!
Nedcolbertia  done!
Nedoceratops  done!
Neimongosaurus  done!
Nemegtomaia  done!
Nemegtonykus  done!
Nemegtosaurus  done!
Neosodon  done!
Neovenator  done!
Neuquenraptor  done!
Neuquensaurus  done!
Ngwevu  done!
Nhandumirim  done!
Niebla antiqua  done!
Nigersaurus  done!
Ningyuansaurus  done!
Ninjatitan  done!
Niobrarasaurus  done!
Nipponosaurus  done!
Noasaurus  done!
Nodocephalosaurus  done!
Nodosaurus  done!
Nomingia  don

Tataouinea  done!
Tatisaurus  done!
Taurovenator  done!
Taveirosaurus  done!
Tazoudasaurus  done!
Technosaurus  done!
Tecovasaurus  done!
Tehuelchesaurus  done!
Teinurosaurus  done!
Teleocrater  done!
Telmatosaurus  done!
Tendaguria  done!
Tengrisaurus  done!
Tenontosaurus  done!
Teratophoneus  done!
Teratosaurus  done!
Termatosaurus  done!
Terminocavus  done!
Tethyshadros  done!
Texacephale  done!
Texasetes  done!
Thanatotheristes  done!
Thanos simonattoi  done!
Thecocoelurus  done!
Thecodontosaurus  done!
Thecospondylus  done!
Theiophytalia  done!
Therapsid  done!
Therizinosaurus  done!
Theropoda  done!
Thescelosaurus  done!
Thespesius  done!
Tianchisaurus  done!
Tianyulong  done!
Tianyuraptor  done!
Tianzhenosaurus  done!
Tichosteus  done!
Tienshanosaurus  done!
Timimus  done!
Timurlengia  done!
Titanoceratops  done!
Titanosaurus  done!
Tlatolophus  done!
Tochisaurus  done!
Tonganosaurus  done!
Tongtianlong  done!
Tornieria  done!
Torosaurus  done!
Torvosaurus  done!
Tototlmimus  do

In [16]:
#print(json.dumps(views,indent=4))
# print("Have %d months of pageview data"%(len(views['items'])))
print(views)
# for month in views:
#     print(json.dumps(month,indent=4))
    

{'“Coelosaurus” antiquus': {'type': 'https://mediawiki.org/wiki/HyperSwitch/errors/bad_request', 'title': 'Invalid parameters', 'method': 'get', 'detail': 'data.params.access should be equal to one of the allowed values: [all-access, desktop, mobile-app, mobile-web]', 'uri': '/wikimedia.org/v1/metrics/pageviews/per-article/en.wikipedia.org/mobile/user/%E2%80%9CCoelosaurus%E2%80%9D_antiquus/monthly/2015010100/2022093000'}, 'Aachenosaurus': {'type': 'https://mediawiki.org/wiki/HyperSwitch/errors/bad_request', 'title': 'Invalid parameters', 'method': 'get', 'detail': 'data.params.access should be equal to one of the allowed values: [all-access, desktop, mobile-app, mobile-web]', 'uri': '/wikimedia.org/v1/metrics/pageviews/per-article/en.wikipedia.org/mobile/user/Aachenosaurus/monthly/2015010100/2022093000'}, 'Aardonyx': {'type': 'https://mediawiki.org/wiki/HyperSwitch/errors/bad_request', 'title': 'Invalid parameters', 'method': 'get', 'detail': 'data.params.access should be equal to one 

Above output should show dictionaries with views per month

In [48]:
#Create the monthly desktop users

with open("dino_monthly_mobile_<start201501>-<end202209>.json", "w") as outfile:
    json.dump(views['items'], outfile)

None
