# English Wikipedia page views, 2008 - 2017

For this assignment, your job is to analyze traffic on English Wikipedia over time, and then document your process and the resulting dataset and visualization according to best practices for open research that were outlined for you in class.

### Example API request
You can use this example API request as a starting point for building your API queries. Note that the [Legacy Pagecounts API](https://wikitech.wikimedia.org/wiki/Analytics/AQS/Legacy_Pagecounts) has a slightly different schema than the [pageview API](https://wikitech.wikimedia.org/wiki/Analytics/AQS/Pageviews) shown here.
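As a rough illustration of that schema difference, the sketch below builds one request URL for each API without sending it. The two endpoint templates are the ones used in the cells further down; the `build_url` helper and the constant names are hypothetical and exist only for this example. The Pageview API path includes an `agent` segment, while the Legacy Pagecounts path does not, and the two APIs use different `access` values.

```python
# Endpoint templates as used later in this notebook.
PAGEVIEWS_ENDPOINT = (
    'https://wikimedia.org/api/rest_v1/metrics/pageviews/aggregate/'
    '{project}/{access}/{agent}/{granularity}/{start}/{end}'
)
LEGACY_ENDPOINT = (
    'https://wikimedia.org/api/rest_v1/metrics/legacy/pagecounts/aggregate/'
    '{project}/{access}/{granularity}/{start}/{end}'
)


def build_url(endpoint, **params):
    """Fill an endpoint template with request parameters (hypothetical helper)."""
    return endpoint.format(**params)


# Pageview API: needs an 'agent' and uses access values such as 'mobile-web'.
pageviews_url = build_url(
    PAGEVIEWS_ENDPOINT,
    project='en.wikipedia.org', access='mobile-web', agent='user',
    granularity='monthly', start='2015070100', end='2017091000',
)

# Legacy Pagecounts API: no 'agent', and access values such as 'mobile-site'.
legacy_url = build_url(
    LEGACY_ENDPOINT,
    project='en.wikipedia.org', access='mobile-site',
    granularity='monthly', start='2008010100', end='2016071000',
)

print(pageviews_url)
print(legacy_url)
```

Keeping the templates in one place makes it harder to send a Pageview-style `access` value to the legacy endpoint by accident.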

This sample API request returns monthly pageviews by users (not web crawlers) on the mobile website for English Wikipedia from July 2015 through August 2017.

Scrape Pageview Mobile Site traffic (current api)

In [99]:
#current
import requests

endpoint = 'https://wikimedia.org/api/rest_v1/metrics/pageviews/aggregate/{project}/{access}/{agent}/{granularity}/{start}/{end}'

headers={'User-Agent' : 'https://github.com/your_github_username', 'From' : 'abhiv@uw.edu'}

params = {'project' : 'en.wikipedia.org',
          'access' : 'mobile-web',
          'agent' : 'user',
          'granularity' : 'monthly',
          'start' : '2015070100',
          # YYYYMMDDHH (10 Sep 2017); use the first day of the following month
          # to get a full final month, so this range ends with August 2017
          'end' : '2017091000'
          }

# pass the headers so the request identifies its sender
api_call = requests.get(endpoint.format(**params), headers=headers)
pageview_mobile_site = api_call.json()
print(pageview_mobile_site)


{'items': [{'project': 'en.wikipedia', 'access': 'mobile-web', 'agent': 'user', 'granularity': 'monthly', 'timestamp': '2015070100', 'views': 3179131148}, {'project': 'en.wikipedia', 'access': 'mobile-web', 'agent': 'user', 'granularity': 'monthly', 'timestamp': '2015080100', 'views': 3192663889}, {'project': 'en.wikipedia', 'access': 'mobile-web', 'agent': 'user', 'granularity': 'monthly', 'timestamp': '2015090100', 'views': 3073981649}, {'project': 'en.wikipedia', 'access': 'mobile-web', 'agent': 'user', 'granularity': 'monthly', 'timestamp': '2015100100', 'views': 3173975355}, {'project': 'en.wikipedia', 'access': 'mobile-web', 'agent': 'user', 'granularity': 'monthly', 'timestamp': '2015110100', 'views': 3142247145}, {'project': 'en.wikipedia', 'access': 'mobile-web', 'agent': 'user', 'granularity': 'monthly', 'timestamp': '2015120100', 'views': 3276836351}, {'project': 'en.wikipedia', 'access': 'mobile-web', 'agent': 'user', 'granularity': 'monthly', 'timestamp': '2016010100', 'vi

Scrape Pageview Mobile App traffic (current api)

In [98]:
#current
import requests

endpoint = 'https://wikimedia.org/api/rest_v1/metrics/pageviews/aggregate/{project}/{access}/{agent}/{granularity}/{start}/{end}'

headers={'User-Agent' : 'https://github.com/your_github_username', 'From' : 'abhiv@uw.edu'}

params = {'project' : 'en.wikipedia.org',
          'access' : 'mobile-app',
          'agent' : 'user',
          'granularity' : 'monthly',
          'start' : '2015070100',
          # YYYYMMDDHH (10 Sep 2017); use the first day of the following month
          # to get a full final month, so this range ends with August 2017
          'end' : '2017091000'
          }

# pass the headers so the request identifies its sender
api_call = requests.get(endpoint.format(**params), headers=headers)
pageview_mobile_app = api_call.json()
print(pageview_mobile_app)

{'items': [{'project': 'en.wikipedia', 'access': 'mobile-app', 'agent': 'user', 'granularity': 'monthly', 'timestamp': '2015070100', 'views': 109624146}, {'project': 'en.wikipedia', 'access': 'mobile-app', 'agent': 'user', 'granularity': 'monthly', 'timestamp': '2015080100', 'views': 109669149}, {'project': 'en.wikipedia', 'access': 'mobile-app', 'agent': 'user', 'granularity': 'monthly', 'timestamp': '2015090100', 'views': 96221684}, {'project': 'en.wikipedia', 'access': 'mobile-app', 'agent': 'user', 'granularity': 'monthly', 'timestamp': '2015100100', 'views': 94523777}, {'project': 'en.wikipedia', 'access': 'mobile-app', 'agent': 'user', 'granularity': 'monthly', 'timestamp': '2015110100', 'views': 94353925}, {'project': 'en.wikipedia', 'access': 'mobile-app', 'agent': 'user', 'granularity': 'monthly', 'timestamp': '2015120100', 'views': 99438956}, {'project': 'en.wikipedia', 'access': 'mobile-app', 'agent': 'user', 'granularity': 'monthly', 'timestamp': '2016010100', 'views': 1064

Scrape Pageview Desktop Site traffic (current api)

In [82]:
#current
import requests

endpoint = 'https://wikimedia.org/api/rest_v1/metrics/pageviews/aggregate/{project}/{access}/{agent}/{granularity}/{start}/{end}'

headers={'User-Agent' : 'https://github.com/your_github_username', 'From' : 'abhiv@uw.edu'}

params = {'project' : 'en.wikipedia.org',
          'access' : 'desktop',
          'agent' : 'user',
          'granularity' : 'monthly',
          'start' : '2015070100',
          # YYYYMMDDHH (10 Sep 2017); use the first day of the following month
          # to get a full final month, so this range ends with August 2017
          'end' : '2017091000'
          }

# pass the headers so the request identifies its sender
api_call = requests.get(endpoint.format(**params), headers=headers)
pageview_desktop_site = api_call.json()
print(pageview_desktop_site)

{'items': [{'project': 'en.wikipedia', 'access': 'desktop', 'agent': 'user', 'granularity': 'monthly', 'timestamp': '2015070100', 'views': 4376666686}, {'project': 'en.wikipedia', 'access': 'desktop', 'agent': 'user', 'granularity': 'monthly', 'timestamp': '2015080100', 'views': 4332482183}, {'project': 'en.wikipedia', 'access': 'desktop', 'agent': 'user', 'granularity': 'monthly', 'timestamp': '2015090100', 'views': 4485491704}, {'project': 'en.wikipedia', 'access': 'desktop', 'agent': 'user', 'granularity': 'monthly', 'timestamp': '2015100100', 'views': 4477532755}, {'project': 'en.wikipedia', 'access': 'desktop', 'agent': 'user', 'granularity': 'monthly', 'timestamp': '2015110100', 'views': 4287720220}, {'project': 'en.wikipedia', 'access': 'desktop', 'agent': 'user', 'granularity': 'monthly', 'timestamp': '2015120100', 'views': 4100012037}, {'project': 'en.wikipedia', 'access': 'desktop', 'agent': 'user', 'granularity': 'monthly', 'timestamp': '2016010100', 'views': 4436179457}, {'

Scrape Pageview All Site traffic (current api)

In [81]:
#current
import requests

endpoint = 'https://wikimedia.org/api/rest_v1/metrics/pageviews/aggregate/{project}/{access}/{agent}/{granularity}/{start}/{end}'

headers={'User-Agent' : 'https://github.com/your_github_username', 'From' : 'abhiv@uw.edu'}

params = {'project' : 'en.wikipedia.org',
          'access' : 'all-access',
          'agent' : 'user',
          'granularity' : 'monthly',
          'start' : '2015070100',
          # YYYYMMDDHH (10 Sep 2017); use the first day of the following month
          # to get a full final month, so this range ends with August 2017
          'end' : '2017091000'
          }

# pass the headers so the request identifies its sender
api_call = requests.get(endpoint.format(**params), headers=headers)
pageview_all_site = api_call.json()
print(pageview_all_site)

{'items': [{'project': 'en.wikipedia', 'access': 'all-access', 'agent': 'user', 'granularity': 'monthly', 'timestamp': '2015070100', 'views': 7665421980}, {'project': 'en.wikipedia', 'access': 'all-access', 'agent': 'user', 'granularity': 'monthly', 'timestamp': '2015080100', 'views': 7634815221}, {'project': 'en.wikipedia', 'access': 'all-access', 'agent': 'user', 'granularity': 'monthly', 'timestamp': '2015090100', 'views': 7655695037}, {'project': 'en.wikipedia', 'access': 'all-access', 'agent': 'user', 'granularity': 'monthly', 'timestamp': '2015100100', 'views': 7746031887}, {'project': 'en.wikipedia', 'access': 'all-access', 'agent': 'user', 'granularity': 'monthly', 'timestamp': '2015110100', 'views': 7524321290}, {'project': 'en.wikipedia', 'access': 'all-access', 'agent': 'user', 'granularity': 'monthly', 'timestamp': '2015120100', 'views': 7476287344}, {'project': 'en.wikipedia', 'access': 'all-access', 'agent': 'user', 'granularity': 'monthly', 'timestamp': '2016010100', 'vi

Scrape Pagecount desktop Site traffic (legacy api)

In [62]:
#Legacy

import requests

endpoint = 'https://wikimedia.org/api/rest_v1/metrics/legacy/pagecounts/aggregate/{project}/{access}/{granularity}/{start}/{end}'
headers={'User-Agent' : 'https://github.com/your_github_username', 'From' : 'abhiv@uw.edu'}

params = {'project' : 'en.wikipedia.org',
          'access' : 'desktop-site',
          'granularity' : 'monthly',
          'start' : '2008010100',
          # YYYYMMDDHH (10 Jul 2016); use the first day of the following month
          # to get a full final month
          'end' : '2016071000'
          }

# pass the headers so the request identifies its sender
api_call = requests.get(endpoint.format(**params), headers=headers)
pagecounts_desktop_site = api_call.json()
print(pagecounts_desktop_site)


{'items': [{'project': 'en.wikipedia', 'access-site': 'desktop-site', 'granularity': 'monthly', 'timestamp': '2008010100', 'count': 4930902570}, {'project': 'en.wikipedia', 'access-site': 'desktop-site', 'granularity': 'monthly', 'timestamp': '2008020100', 'count': 4818393763}, {'project': 'en.wikipedia', 'access-site': 'desktop-site', 'granularity': 'monthly', 'timestamp': '2008030100', 'count': 4955405809}, {'project': 'en.wikipedia', 'access-site': 'desktop-site', 'granularity': 'monthly', 'timestamp': '2008040100', 'count': 5159162183}, {'project': 'en.wikipedia', 'access-site': 'desktop-site', 'granularity': 'monthly', 'timestamp': '2008050100', 'count': 5584691092}, {'project': 'en.wikipedia', 'access-site': 'desktop-site', 'granularity': 'monthly', 'timestamp': '2008060100', 'count': 5712104279}, {'project': 'en.wikipedia', 'access-site': 'desktop-site', 'granularity': 'monthly', 'timestamp': '2008070100', 'count': 5306302874}, {'project': 'en.wikipedia', 'access-site': 'desktop

Scrape Pagecount all Site traffic (legacy api)

In [89]:
#Legacy

import requests

endpoint = 'https://wikimedia.org/api/rest_v1/metrics/legacy/pagecounts/aggregate/{project}/{access}/{granularity}/{start}/{end}'
headers={'User-Agent' : 'https://github.com/your_github_username', 'From' : 'abhiv@uw.edu'}

params = {'project' : 'en.wikipedia.org',
          'access' : 'all-sites',
          'granularity' : 'monthly',
          'start' : '2008010100',
          # YYYYMMDDHH (10 Jul 2016); use the first day of the following month
          # to get a full final month
          'end' : '2016071000'
          }

# pass the headers so the request identifies its sender
api_call = requests.get(endpoint.format(**params), headers=headers)
pagecounts_all_sites = api_call.json()
print(pagecounts_all_sites)

{'items': [{'project': 'en.wikipedia', 'access-site': 'all-sites', 'granularity': 'monthly', 'timestamp': '2008010100', 'count': 4930902570}, {'project': 'en.wikipedia', 'access-site': 'all-sites', 'granularity': 'monthly', 'timestamp': '2008020100', 'count': 4818393763}, {'project': 'en.wikipedia', 'access-site': 'all-sites', 'granularity': 'monthly', 'timestamp': '2008030100', 'count': 4955405809}, {'project': 'en.wikipedia', 'access-site': 'all-sites', 'granularity': 'monthly', 'timestamp': '2008040100', 'count': 5159162183}, {'project': 'en.wikipedia', 'access-site': 'all-sites', 'granularity': 'monthly', 'timestamp': '2008050100', 'count': 5584691092}, {'project': 'en.wikipedia', 'access-site': 'all-sites', 'granularity': 'monthly', 'timestamp': '2008060100', 'count': 5712104279}, {'project': 'en.wikipedia', 'access-site': 'all-sites', 'granularity': 'monthly', 'timestamp': '2008070100', 'count': 5306302874}, {'project': 'en.wikipedia', 'access-site': 'all-sites', 'granularity': '

Scrape Pagecount mobile Site traffic (legacy api)

In [63]:
#Legacy

import requests

endpoint = 'https://wikimedia.org/api/rest_v1/metrics/legacy/pagecounts/aggregate/{project}/{access}/{granularity}/{start}/{end}'
headers={'User-Agent' : 'https://github.com/your_github_username', 'From' : 'abhiv@uw.edu'}

params = {'project' : 'en.wikipedia.org',
          'access' : 'mobile-site',
          'granularity' : 'monthly',
          'start' : '2008010100',
          # YYYYMMDDHH (10 Jul 2016); use the first day of the following month
          # to get a full final month
          'end' : '2016071000'
          }

# pass the headers so the request identifies its sender
api_call = requests.get(endpoint.format(**params), headers=headers)
pagecounts_mobile_site = api_call.json()
print(pagecounts_mobile_site)

{'items': [{'project': 'en.wikipedia', 'access-site': 'mobile-site', 'granularity': 'monthly', 'timestamp': '2014100100', 'count': 3091546685}, {'project': 'en.wikipedia', 'access-site': 'mobile-site', 'granularity': 'monthly', 'timestamp': '2014110100', 'count': 3027489668}, {'project': 'en.wikipedia', 'access-site': 'mobile-site', 'granularity': 'monthly', 'timestamp': '2014120100', 'count': 3278950021}, {'project': 'en.wikipedia', 'access-site': 'mobile-site', 'granularity': 'monthly', 'timestamp': '2015010100', 'count': 3485302091}, {'project': 'en.wikipedia', 'access-site': 'mobile-site', 'granularity': 'monthly', 'timestamp': '2015020100', 'count': 3091534479}, {'project': 'en.wikipedia', 'access-site': 'mobile-site', 'granularity': 'monthly', 'timestamp': '2015030100', 'count': 3330832588}, {'project': 'en.wikipedia', 'access-site': 'mobile-site', 'granularity': 'monthly', 'timestamp': '2015040100', 'count': 3222089917}, {'project': 'en.wikipedia', 'access-site': 'mobile-site', 

In [29]:
countresponse['items']

[{'access-site': 'all-sites',
  'count': 7546488744,
  'granularity': 'monthly',
  'project': 'en.wikipedia',
  'timestamp': '2010090100'}]

In [37]:
countresponse['items'][0]['count']

7546488744

In [47]:
viewresponse['items'][3]['views']

3416989181

In [70]:
pageview_mobile_site['items'][3]['timestamp']

'2015100100'

Build a dictionary for each response, with the timestamp as the key and the count (or views) as the value. Repeat for every API response.
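One way to avoid repeating the same conversion for every response is a small helper that accepts either schema. This is a sketch, not the notebook's original approach: `timestamp_totals` is a hypothetical name, and the sample items are copied from the outputs printed above. It relies only on the field names visible in those outputs (`views` for the Pageview API, `count` for the Legacy Pagecounts API).

```python
def timestamp_totals(response):
    """Return {timestamp: traffic} for a Pageview or Legacy Pagecounts response."""
    totals = {}
    for item in response['items']:
        # Pageview items store traffic under 'views'; legacy items use 'count'.
        key = 'views' if 'views' in item else 'count'
        totals[item['timestamp']] = item[key]
    return totals


# Small samples copied from the printed responses above.
sample_pageviews = {'items': [
    {'project': 'en.wikipedia', 'access': 'mobile-web', 'agent': 'user',
     'granularity': 'monthly', 'timestamp': '2015070100', 'views': 3179131148},
    {'project': 'en.wikipedia', 'access': 'mobile-web', 'agent': 'user',
     'granularity': 'monthly', 'timestamp': '2015080100', 'views': 3192663889},
]}
sample_pagecounts = {'items': [
    {'project': 'en.wikipedia', 'access-site': 'desktop-site',
     'granularity': 'monthly', 'timestamp': '2008010100', 'count': 4930902570},
]}

print(timestamp_totals(sample_pageviews))
print(timestamp_totals(sample_pagecounts))
```

The cells below do the same thing one response at a time, which is easier to follow step by step but repeats the same logic seven times.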

In [75]:
# pageview_mobile_site: map each month's timestamp to its view count
dict_pageview_mobile_site = {
    item['timestamp']: item['views']
    for item in pageview_mobile_site['items']
}


In [76]:
dict_pageview_mobile_site

{'2015070100': 3179131148,
 '2015080100': 3192663889,
 '2015090100': 3073981649,
 '2015100100': 3173975355,
 '2015110100': 3142247145,
 '2015120100': 3276836351,
 '2016010100': 3611404079,
 '2016020100': 3242448142,
 '2016030100': 3288785117,
 '2016040100': 3177044999,
 '2016050100': 3296294723,
 '2016060100': 3257882479,
 '2016070100': 3395175122,
 '2016080100': 3418646794,
 '2016090100': 3310247842,
 '2016100100': 3442109005,
 '2016110100': 3507421156,
 '2016120100': 3647567822,
 '2017010100': 4020148351,
 '2017020100': 3522702265,
 '2017030100': 3719395296,
 '2017040100': 3524571150,
 '2017050100': 3567882051,
 '2017060100': 3404097346,
 '2017070100': 3600941034,
 '2017080100': 3502234506}

In [77]:
# pageview_desktop_site: map each month's timestamp to its view count
dict_pageview_desktop_site = {
    item['timestamp']: item['views']
    for item in pageview_desktop_site['items']
}

In [78]:
dict_pageview_desktop_site

{'2015070100': 4376666686,
 '2015080100': 4332482183,
 '2015090100': 4485491704,
 '2015100100': 4477532755,
 '2015110100': 4287720220,
 '2015120100': 4100012037,
 '2016010100': 4436179457,
 '2016020100': 4250997185,
 '2016030100': 4286590426,
 '2016040100': 4149383857,
 '2016050100': 4191778094,
 '2016060100': 3888839711,
 '2016070100': 4337865827,
 '2016080100': 4695046216,
 '2016090100': 4135006498,
 '2016100100': 4361737690,
 '2016110100': 4392068236,
 '2016120100': 4209608578,
 '2017010100': 4521980398,
 '2017020100': 4026702163,
 '2017030100': 4319971902,
 '2017040100': 3951456992,
 '2017050100': 4187870579,
 '2017060100': 3604550997,
 '2017070100': 3565444544,
 '2017080100': 3575572313}

In [83]:
# pageview_all_site: map each month's timestamp to its view count
dict_pageview_all_site = {
    item['timestamp']: item['views']
    for item in pageview_all_site['items']
}

In [85]:
dict_pageview_all_site

{'2015070100': 7665421980,
 '2015080100': 7634815221,
 '2015090100': 7655695037,
 '2015100100': 7746031887,
 '2015110100': 7524321290,
 '2015120100': 7476287344,
 '2016010100': 8154016303,
 '2016020100': 7585859457,
 '2016030100': 7673274617,
 '2016040100': 7408147859,
 '2016050100': 7586811330,
 '2016060100': 7243630656,
 '2016070100': 7834439589,
 '2016080100': 8210865519,
 '2016090100': 7528292279,
 '2016100100': 7871021581,
 '2016110100': 7983113161,
 '2016120100': 7986152433,
 '2017010100': 8753941940,
 '2017020100': 7738463562,
 '2017030100': 8223465891,
 '2017040100': 7591080111,
 '2017050100': 7874558299,
 '2017060100': 7123934190,
 '2017070100': 7290503797,
 '2017080100': 7196978615}

In [92]:
# pagecounts_all_sites: map each month's timestamp to its legacy count
dict_pagecounts_all_sites = {
    item['timestamp']: item['count']
    for item in pagecounts_all_sites['items']
}

In [93]:
dict_pagecounts_all_sites

{'2008010100': 4930902570,
 '2008020100': 4818393763,
 '2008030100': 4955405809,
 '2008040100': 5159162183,
 '2008050100': 5584691092,
 '2008060100': 5712104279,
 '2008070100': 5306302874,
 '2008080100': 5140155519,
 '2008090100': 5479533823,
 '2008100100': 5679440782,
 '2008110100': 5415832071,
 '2008120100': 5211708451,
 '2009010100': 5802681551,
 '2009020100': 5547320860,
 '2009030100': 6295159057,
 '2009040100': 5988817321,
 '2009050100': 6267516733,
 '2009060100': 5818924182,
 '2009070100': 5801646978,
 '2009080100': 5790850384,
 '2009090100': 4057515768,
 '2009100100': 6016107147,
 '2009110100': 5768486910,
 '2009120100': 5426505977,
 '2010010100': 5703465285,
 '2010020100': 5762451418,
 '2010030100': 6661347946,
 '2010040100': 6618552152,
 '2010050100': 6410578775,
 '2010060100': 4898035014,
 '2010070100': 5296177638,
 '2010080100': 7381346660,
 '2010090100': 7546488744,
 '2010100100': 10172844562,
 '2010110100': 6948678354,
 '2010120100': 7001952100,
 '2011010100': 7568511227,


In [95]:
# pagecounts_desktop_site: map each month's timestamp to its legacy count
dict_pagecounts_desktop_site = {
    item['timestamp']: item['count']
    for item in pagecounts_desktop_site['items']
}

In [96]:
# pagecounts_mobile_site: map each month's timestamp to its legacy count
dict_pagecounts_mobile_site = {
    item['timestamp']: item['count']
    for item in pagecounts_mobile_site['items']
}

In [101]:
# pageview_mobile_app: map each month's timestamp to its view count
dict_pageview_mobile_app = {
    item['timestamp']: item['views']
    for item in pageview_mobile_app['items']
}

In [105]:
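# Total mobile pageviews = mobile app + mobile site, matched by timestamp
# rather than by list position, so a missing month cannot misalign the sum.
site_views = {item['timestamp']: item['views']
              for item in pageview_mobile_site['items']}
dict_pageview_mobile = {
    item['timestamp']: item['views'] + site_views.get(item['timestamp'], 0)
    for item in pageview_mobile_app['items']
}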
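
To move from these per-source dictionaries toward a single table, a possible next step is to merge them on timestamp and split each timestamp into year and month. The sketch below assumes that layout; the function name, the column names, and the choice to fill missing months with 0 are all assumptions for illustration, and the two small input dictionaries reuse values shown earlier in this notebook.

```python
def merge_by_timestamp(series):
    """Merge {column_name: {timestamp: value}} into a list of row dicts.

    Timestamps are YYYYMMDDHH strings; months missing from a series get 0.
    """
    # Fixed-width timestamp strings sort chronologically.
    timestamps = sorted({ts for values in series.values() for ts in values})
    rows = []
    for ts in timestamps:
        row = {'year': ts[:4], 'month': ts[4:6]}
        for name, values in series.items():
            row[name] = values.get(ts, 0)
        rows.append(row)
    return rows


# Hypothetical column names; values copied from the dictionaries above.
rows = merge_by_timestamp({
    'pagecount_all_views': {'2008010100': 4930902570},
    'pageview_all_views': {'2015070100': 7665421980,
                           '2015080100': 7634815221},
})
for row in rows:
    print(row)
```

Each row could then be written out with `csv.DictWriter` or loaded into a plotting library, depending on how the final dataset is meant to be shared.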
