# A1: Data Curation Assignment
### Data 512: Human Centered Data Science
#### Aaliyah Hänni
#### 10/7/2021

## Project Description
The goal of this assignment is to construct, analyze, and publish a dataset of monthly traffic on English Wikipedia from January 1, 2008 through August 30, 2021.

## Data Documentation


### Wikimedia REST API

To measure Wikipedia traffic from 2008 through 2021, you will need to collect data from two different API endpoints: the Legacy Pagecounts API and the Pageviews API. The Legacy Pagecounts API provides access to desktop and mobile traffic data from December 2007 through July 2016. The Pageviews API provides access to desktop, mobile web, and mobile app traffic data from July 2015 through the previous month.

Legacy Pagecounts API Documentation: https://wikitech.wikimedia.org/wiki/Analytics/AQS/Legacy_Pagecounts

Pageviews API Documentation: https://wikitech.wikimedia.org/wiki/Analytics/AQS/Pageviews

Wikimedia REST API Endpoint: https://wikimedia.org/api/rest_v1/#/Pagecounts_data_(legacy)/get_metrics_legacy_pagecounts_aggregate_project_access_site_granularity_start_end

For each API, you will need to collect data for all months where data is available and then save the raw results into 5 separate JSON source data files (one file per API query type) before continuing to Step 2.

## Data Acquisition

In [1]:
import json
import requests

In [2]:
endpoint_legacy = 'https://wikimedia.org/api/rest_v1/metrics/legacy/pagecounts/aggregate/{project}/{access-site}/{granularity}/{start}/{end}'

endpoint_pageviews = 'https://wikimedia.org/api/rest_v1/metrics/pageviews/aggregate/{project}/{access}/{agent}/{granularity}/{start}/{end}'
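The placeholders in these templates are filled in later with `str.format(**params)`. As a minimal sketch of that substitution, the cell below expands the legacy template without sending a request. The names `template`, `example_params`, and `url` exist only for this illustration.

```python
# Illustrative only: expand the legacy endpoint template without calling the API.
template = ('https://wikimedia.org/api/rest_v1/metrics/legacy/pagecounts/aggregate/'
            '{project}/{access-site}/{granularity}/{start}/{end}')

example_params = {
    "project": "en.wikipedia.org",
    "access-site": "desktop-site",
    "granularity": "monthly",
    "start": "2008010100",
    "end": "2016080100",
}

# "access-site" is not a valid Python identifier, so the values are passed
# by unpacking the dictionary rather than as literal keyword arguments.
url = template.format(**example_params)
print(url)
```

Printing the expanded URL before making a request is a quick way to confirm that every placeholder has a matching key in the parameter dictionary.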

In [19]:
# Parameters for getting aggregated legacy view data 
# see: https://wikimedia.org/api/rest_v1/#!/Legacy_data/get_metrics_legacy_pagecounts_aggregate_project_access_site_granularity_start_end
params_legacy_mobile = {"project" : "en.wikipedia.org",
                        "access-site" : "mobile-site",
                        "granularity" : "monthly",
                        "start" : "2008010100",
                        # for end use 1st day of month following final month of data
                        "end" : "2016080100"
                        }

params_legacy_desktop = {"project" : "en.wikipedia.org",
                         "access-site" : "desktop-site",
                         "granularity" : "monthly",
                         "start" : "2008010100",
                         # for end use 1st day of month following final month of data
                         "end" : "2016080100"
                         }

# Parameters for getting aggregated current standard pageview data
# see: https://wikimedia.org/api/rest_v1/#!/Pageviews_data/get_metrics_pageviews_aggregate_project_access_agent_granularity_start_end
params_pageviews_mobile_app = {"project" : "en.wikipedia.org",
                               "access" : "mobile-app",
                               "agent" : "user",
                               "granularity" : "monthly",
                               "start" : "2015070100",
                               # for end use 1st day of month following final month of data
                               "end" : "2021090100"
                               }

params_pageviews_mobile_web = {"project" : "en.wikipedia.org",
                               "access" : "mobile-web",
                               "agent" : "user",
                               "granularity" : "monthly",
                               "start" : "2015070100",
                               # for end use 1st day of month following final month of data
                               "end" : "2021090100"
                               }

params_pageviews_desktop = {"project" : "en.wikipedia.org",
                            "access" : "desktop",
                            "agent" : "user",
                            "granularity" : "monthly",
                            "start" : "2015070100",
                            # for end use 1st day of month following final month of data
                            "end" : "2021090100"
                            }

# Customize these with your own information
headers = {
    'User-Agent': 'https://github.com/aaliyahfiala42',
    'From': 'fialaa@uw.edu'
}

In [20]:
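def api_call(endpoint, parameters):
    # Fill the URL template with the query parameters and send the GET request
    call = requests.get(endpoint.format(**parameters), headers=headers)
    # Parse the JSON body of the response into a Python dictionary
    response = call.json()

    return response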

In [21]:
#Get monthly page counts for mobile from Legacy API
monthly_legacy_mobile = api_call(endpoint_legacy, params_legacy_mobile)

#Get monthly page counts for desktop from Legacy API
monthly_legacy_desktop = api_call(endpoint_legacy, params_legacy_desktop)

In [22]:
#print(monthly_legacy_mobile)
#print(monthly_legacy_desktop)

In [23]:
#Get monthly page views for mobile apps from Pageviews API
monthly_pageviews_mobile_app = api_call(endpoint_pageviews, params_pageviews_mobile_app)

#Get monthly page views for mobile web from Pageviews API
monthly_pageviews_mobile_web = api_call(endpoint_pageviews, params_pageviews_mobile_web)

#Get monthly page views for desktop from Pageviews API
monthly_pageviews_desktop = api_call(endpoint_pageviews, params_pageviews_desktop)
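Before saving anything, it can help to confirm that each response actually contains data. The sketch below assumes that a successful response is a JSON object with an `items` list, as described in the API documentation linked above, and that anything else signals a problem. The helper name `extract_items` and the sample record are hypothetical and not part of the assignment code.

```python
def extract_items(response):
    """Return the list of monthly records, or raise if the response has none."""
    items = response.get("items") if isinstance(response, dict) else None
    if not isinstance(items, list):
        # Assumption: a response without an "items" list is an error payload
        raise ValueError(f"Unexpected API response: {response!r}")
    return items

# Made-up sample shaped like one legacy pagecounts record (not real traffic data)
sample = {"items": [{"timestamp": "2008010100", "count": 123}]}
print(len(extract_items(sample)))
```

Calling `extract_items(monthly_legacy_mobile)` and the equivalent for the other four responses would surface a failed query here rather than in a later processing step.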


In [24]:
#print(monthly_pageviews_mobile_app)
#print(monthly_pageviews_mobile_web)
#print(monthly_pageviews_desktop)

In [25]:
#Saving data from APIs to JSON files in the format apiname_accesstype_firstmonth-lastmonth.json

with open('legacy_mobile-site_200801-201607.json', 'w') as outfile:
    json.dump(monthly_legacy_mobile, outfile)
    
with open('legacy_desktop-site_200801-201607.json', 'w') as outfile:
    json.dump(monthly_legacy_desktop, outfile)

with open('pageviews_mobile-app_201507-202108.json', 'w') as outfile:
    json.dump(monthly_pageviews_mobile_app, outfile)

with open('pageviews_mobile-web_201507-202108.json', 'w') as outfile:
    json.dump(monthly_pageviews_mobile_web, outfile)
    
with open('pageviews_desktop_201507-202108.json', 'w') as outfile:
    json.dump(monthly_pageviews_desktop, outfile)
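The file names above are typed by hand, which makes it easy for a name and its query parameters to drift apart. As one possible alternative, the sketch below derives a name in the `apiname_accesstype_firstmonth-lastmonth.json` format from the `start` and `end` values. The helper `source_filename` is hypothetical and assumes timestamps in the `YYYYMMDDHH` form used in the parameter dictionaries.

```python
from datetime import datetime, timedelta

def source_filename(api_name, access, start, end):
    """Build 'apiname_accesstype_firstmonth-lastmonth.json' from query parameters."""
    first_month = datetime.strptime(start, "%Y%m%d%H")
    # 'end' is the 1st day of the month following the final month of data,
    # so stepping back one day lands inside the final month.
    last_month = datetime.strptime(end, "%Y%m%d%H") - timedelta(days=1)
    return f"{api_name}_{access}_{first_month:%Y%m}-{last_month:%Y%m}.json"

print(source_filename("pageviews", "desktop", "2015070100", "2021090100"))
```

For the desktop Pageviews query this yields `pageviews_desktop_201507-202108.json`, matching the name used in the cell above.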

## Data Processing


## Data Analysis
