## **Session 5: Working with web APIs**

- Using an API service that doesn't require any authentication
- Using an API service that requires authentication

**Using an API service that doesn't require authentication**

Web APIs are a very popular way of exposing data, so learning to fetch data from them lets you pull that data directly into your own programs.

API access can be `open` or sit behind some kind of `authorisation`. We will first look at an open web API and how to access its data.

We will use the [COVID Tracking Project API, version 2](https://covidtracking.com/data/api/version-2) as an example of fetching data from an open web API.

**API Endpoints**

An API endpoint is just a URL that one needs to `hit` (send a request to) to get the data. For example, the [API documentation](https://covidtracking.com/data/api/version-2) shows that to access a single day of data we need to hit the following endpoint:

- URL path: /v2/us/daily/[date-iso-format].json
- Example: https://api.covidtracking.com/v2/us/daily/2021-01-02.json
- Simplified data: /v2/us/daily/[date-iso-format]/simple.json
- Simplified data example: https://api.covidtracking.com/v2/us/daily/2021-01-02/simple.json
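
As the two example URLs above show, the full and simplified endpoints differ only in the `/simple` part, so one small helper can build both. This is a sketch for illustration: `daily_url` and `BASE_URL` are hypothetical names, not part of the API, and the helper only builds the string without sending a request.

```python
from datetime import date

BASE_URL = "https://api.covidtracking.com"


def daily_url(day, simple=False):
    """Build the US daily endpoint URL for `day` (hypothetical helper)."""
    # date.isoformat() gives YYYY-MM-DD, the [date-iso-format] part of the path
    path = f"/v2/us/daily/{day.isoformat()}"
    suffix = "/simple.json" if simple else ".json"
    return BASE_URL + path + suffix


full_url = daily_url(date(2021, 1, 2))
simple_url = daily_url(date(2021, 1, 2), simple=True)
```

Passing a `date` object rather than a hand-typed string means a malformed date fails when the object is created, before any request is made.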

Let's hit this endpoint.

In [1]:
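import requests

date = "2021-01-02"
base_url = "https://api.covidtracking.com"
end_point = f"/v2/us/daily/{date}.json"
url = base_url + end_point

# Fail loudly on HTTP errors instead of trying to parse an error page as JSON
response = requests.get(url, timeout=30)
response.raise_for_status()
resp = response.json()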

In [2]:
resp

{'meta': {'build_time': '2021-06-01T07:03:25.055Z',
  'license': 'CC-BY-4.0',
  'version': '2.0-beta',
  'field_definitions': [{'name': 'Total test results',
    'field': 'tests.pcr.total',
    'deprecated': False,
    'prior_names': ['totalTestResults']},
   {'name': 'Hospital discharges', 'deprecated': False, 'prior_names': []},
   {'name': 'Confirmed Cases',
    'field': 'cases.confirmed',
    'deprecated': False,
    'prior_names': ['positiveCasesViral']},
   {'name': 'Cumulative hospitalized/Ever hospitalized',
    'field': 'outcomes.hospitalized.total',
    'deprecated': False,
    'prior_names': ['hospitalizedCumulative']},
   {'name': 'Cumulative in ICU/Ever in ICU',
    'field': 'outcomes.hospitalized.in_icu',
    'deprecated': False,
    'prior_names': ['inIcuCumulative']},
   {'name': 'Cumulative on ventilator/Ever on ventilator',
    'field': 'hospitalization.on_ventilator.cumulative',
    'deprecated': False,
    'prior_names': ['onVentilatorCumulative']},
   {'name': 'Cur

As you can see, the response is in JSON format, and it can easily be parsed to create a tabular view.
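
One way to get from nested JSON to table columns is to flatten each record into dotted keys, similar to the `field` names such as `tests.pcr.total` in the output above. The following is a minimal sketch, not the method used later in this session: `flatten` is a hypothetical helper, and `sample` is a made-up record that only imitates the nesting of the response.

```python
def flatten(record, prefix=""):
    """Turn nested dicts into one flat dict with dotted keys."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the path so far
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


# Made-up record that only imitates the nesting of the API response
sample = {
    "date": "2021-01-02",
    "cases": {"total": 100, "confirmed": None},
    "tests": {"pcr": {"total": 2500}},
}

row = flatten(sample)
```

A list of such flat rows can be passed straight to `pd.DataFrame`. pandas also provides `pd.json_normalize`, which does a similar flattening for you.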

**Class Exercise**:
1. Use the all-states metadata to create a tabular view of each state and its code.
2. Now write a function that takes a state code and returns the data for that state in the following format:


| date | total_cases | confirmed_cases | tests_pcr_total | tests_antibody_total | hospitalized_currently | on_ventilator_currently | death_confirmed |
| --- | --- | --- | --- | --- | --- | --- | --- |

3. Using the function defined above and the state codes, collect the data for all the states and store it in a CSV file.

In [3]:
url = "https://api.covidtracking.com/v2/states.json"
resp = requests.get(url).json()

In [4]:
import pandas as pd

# One row per state: full name and two-letter code
codes = pd.DataFrame({
    'state': [state['name'] for state in resp['data']],
    'code': [state['state_code'] for state in resp['data']],
})

In [5]:
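def get_state_data(state_code):
    """Fetch the simplified daily history for one state (e.g. 'AK') as parsed JSON."""
    url = f'https://api.covidtracking.com/v2/states/{state_code.lower()}/daily/simple.json'
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # stop on HTTP errors rather than parsing an error body
    return response.json()


def parse_raw_data(raw_data):
    """Collect the fields of interest from each daily record into column lists."""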
    dates = []
    tot_cases = []
    confirmed_cases = []
    tests_pcr_total = []
    tests_antibody_total = []
    hosp_currently = []
    on_ventilator_currently = []
    deaths = []
    rel_data = raw_data['data']
    for d in rel_data:
        dates.append(d['date'])
        tot_cases.append(d['cases']['total'])
        confirmed_cases.append(d['cases']['confirmed'])
        tests_pcr_total.append(d['tests']['pcr']['total'])
        tests_antibody_total.append(d['tests']['antibody']['encounters']['total'])
        hosp_currently.append(d['outcomes']['hospitalized']['currently'])
        on_ventilator_currently.append(d['outcomes']['hospitalized']['on_ventilator']['currently'])
        deaths.append(d['outcomes']['death']['confirmed'])
    parsed_data = {'dates':dates,
                  'total_cases':tot_cases,
                  'confirmed_cases':confirmed_cases,
                  'tests_pcr_total':tests_pcr_total,
                  'tests_antibody_total':tests_antibody_total,
                  'hospitalized_currently':hosp_currently,
                  'on_ventilator_currently':on_ventilator_currently,
                  'deaths':deaths}
    return parsed_data
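
The chained lookups in `parse_raw_data` (for example `d['tests']['antibody']['encounters']['total']`) raise an error if any intermediate level is missing or null. Whether the real API ever returns such a record is not confirmed here, so treat the following as a precautionary sketch: `dig` is a hypothetical helper, and `day` is a made-up record.

```python
def dig(record, *keys):
    """Follow `keys` through nested dicts; return None if the path breaks."""
    current = record
    for key in keys:
        if not isinstance(current, dict):
            # An intermediate level was None (or not a dict), so stop early
            return None
        current = current.get(key)
    return current


# Made-up record: 'antibody' is null, so a chained [] lookup would raise TypeError
day = {"tests": {"pcr": {"total": 8}, "antibody": None}}

pcr_total = dig(day, "tests", "pcr", "total")
antibody_total = dig(day, "tests", "antibody", "encounters", "total")
```

With this helper, a line such as `tests_antibody_total.append(dig(d, 'tests', 'antibody', 'encounters', 'total'))` would record a missing value as `None` instead of stopping the whole loop.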

In [6]:
codes.head(2)

| | state | code |
| --- | --- | --- |
| 0 | Alaska | AK |
| 1 | Alabama | AL |


In [7]:
raw_data = get_state_data('AK')

In [8]:
parsed_data = parse_raw_data(raw_data)

In [9]:
from tqdm import tqdm
tables = []
for state in tqdm(codes['code']):
    raw_data = get_state_data(state)
    parsed_data = parse_raw_data(raw_data)
    table = pd.DataFrame(parsed_data)
    table['state']=state
    tables.append(table)   

100%|███████████████████████████████████████████| 56/56 [00:27<00:00,  2.01it/s]


In [11]:
pd.concat(tables).reset_index()

| | index | dates | total_cases | confirmed_cases | tests_pcr_total | tests_antibody_total | hospitalized_currently | on_ventilator_currently | deaths | state |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 0 | 2021-03-07 | 56886.0 | | 1731628.0 | | 33.0 | 2.0 | | AK |
| 1 | 1 | 2021-03-06 | 56886.0 | | 1731628.0 | | 33.0 | 2.0 | | AK |
| 2 | 2 | 2021-03-05 | 56886.0 | | 1731628.0 | | 33.0 | 2.0 | | AK |
| 3 | 3 | 2021-03-04 | 56745.0 | | 1724484.0 | | 32.0 | 2.0 | | AK |
| 4 | 4 | 2021-03-03 | 56605.0 | | 1711018.0 | | 26.0 | 2.0 | | AK |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 20775 | 367 | 2020-03-05 | | | 8.0 | | | | | WY |
| 20776 | 368 | 2020-03-04 | | | 4.0 | | | | | WY |
| 20777 | 369 | 2020-03-03 | | | 1.0 | | | | | WY |
| 20778 | 370 | 2020-03-02 | | | 1.0 | | | | | WY |
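
Step 3 of the exercise asks for the combined data to be stored in a CSV file, which the cells above stop short of. Here is a minimal sketch, assuming the per-state tables look like the ones built in the loop. The small `tables` list holds made-up values, and the in-memory buffer stands in for a file path of your choice.

```python
import io

import pandas as pd

# Stand-ins for the per-state tables built in the loop (values are made up)
tables = [
    pd.DataFrame({"dates": ["2021-03-07", "2021-03-06"], "total_cases": [10, 9], "state": "AK"}),
    pd.DataFrame({"dates": ["2021-03-07"], "total_cases": [5], "state": "AL"}),
]

# ignore_index=True renumbers rows 0..n-1, so no per-state index column is left over
all_states = pd.concat(tables, ignore_index=True)

# index=False keeps the row numbers out of the file
buffer = io.StringIO()  # replace with a path such as "covid_states.csv" (hypothetical name)
all_states.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
```

Writing with `index=False` also avoids the extra unnamed column that appears when a CSV saved with its index is read back in.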


In [None]:
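# API key for the authenticated example.
# Replace the placeholder with your own key and keep real keys out of shared notebooks.
nyt_key = "YOUR_API_KEY"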