# Public Health England, Covid-19 API

We are going to work with the UK Covid-19 API provided by Public Health England. The API has a Python wrapper, so we do not have to write the requests ourselves. To get used to the process of accessing an API, however, we first query it directly without the wrapper and then use the package afterwards.

API documentation: https://coronavirus.data.gov.uk/details/developers-guide

API endpoint: https://api.coronavirus.data.gov.uk/v1/data

uk-covid19 package documentation: https://pypi.org/project/uk-covid19/


## Load packages

In [2]:
from urllib.request import urlopen
import json
import gzip
import pandas as pd

## Load example URL

In [3]:
example_url = "https://api.coronavirus.data.gov.uk/v1/data?filters=areaType=nation;areaName=england&structure={%22date%22:%22date%22,%22areaName%22:%22areaName%22,%22areaCode%22:%22areaCode%22,%22newCasesByPublishDate%22:%22newCasesByPublishDate%22,%22cumCasesByPublishDate%22:%22cumCasesByPublishDate%22,%22newDeathsByDeathDate%22:%22newDeathsByDeathDate%22,%22cumDeathsByDeathDate%22:%22cumDeathsByDeathDate%22}"
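The `structure` parameter in the URL above is a JSON object whose double quotes have been percent-encoded as `%22` by hand. As a minimal sketch, the same URL could be assembled from a list of metric names with the standard library. `BASE_URL` and `build_url` are names introduced here for illustration only; they are not part of the API or the wrapper.

```python
import json
from urllib.parse import quote

# Endpoint taken from the links at the top of this notebook
BASE_URL = "https://api.coronavirus.data.gov.uk/v1/data"

def build_url(filter_str, metrics):
    """Assemble a request URL from a filter string and a list of metric names."""
    # Map each metric to itself, as in the hand-written example URL
    structure = {name: name for name in metrics}
    # Compact JSON (no spaces), then percent-encode only the double quotes
    structure_json = json.dumps(structure, separators=(",", ":"))
    encoded = quote(structure_json, safe="{}:,")
    return f"{BASE_URL}?filters={filter_str}&structure={encoded}"

metrics = [
    "date", "areaName", "areaCode",
    "newCasesByPublishDate", "cumCasesByPublishDate",
    "newDeathsByDeathDate", "cumDeathsByDeathDate",
]
rebuilt_url = build_url("areaType=nation;areaName=england", metrics)
```

With these inputs, `rebuilt_url` should be character-for-character the same string as `example_url`.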

In [4]:
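response = urlopen(example_url)
content_gz = response.read()  # raw bytes; the body arrives gzip-compressed
content_text = gzip.decompress(content_gz)  # decompress to UTF-8 encoded JSON bytes
dic_nat = json.loads(content_text.decode('utf-8'))  # parse the JSON into a dictionary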
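The cell above assumes that the body is always gzip-compressed. A slightly more defensive sketch checks for the gzip magic bytes first, so that an uncompressed body is parsed as well. `decode_body` is a hypothetical helper name, and the sample payload below is made up for the demonstration rather than fetched from the API.

```python
import gzip
import json

def decode_body(raw):
    """Parse a JSON payload from raw response bytes, gzip-compressed or not."""
    # gzip streams always start with the two bytes 0x1f 0x8b
    if raw[:2] == b"\x1f\x8b":
        raw = gzip.decompress(raw)
    return json.loads(raw.decode("utf-8"))

# Made-up payload, used only to exercise both branches offline
sample = {"data": [{"date": "2020-11-23", "areaName": "England"}]}
plain_bytes = json.dumps(sample).encode("utf-8")
gzip_bytes = gzip.compress(plain_bytes)

from_plain = decode_body(plain_bytes)
from_gzip = decode_body(gzip_bytes)
```

Both calls should return the same dictionary as `sample`.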

In [20]:
item = dic_nat['data'][0]
pd.DataFrame(item, index=[0])

Unnamed: 0,date,areaName,areaCode,newCasesByPublishDate,cumCasesByPublishDate,newDeathsByDeathDate,cumDeathsByDeathDate
0,2020-11-23,England,E92000001,13329,1314888,,


In [23]:
df_national = pd.concat([pd.DataFrame(item, index=[0]) for item in dic_nat['data']], ignore_index=True)
df_national

Unnamed: 0,date,areaName,areaCode,newCasesByPublishDate,cumCasesByPublishDate,newDeathsByDeathDate,cumDeathsByDeathDate
0,2020-11-23,England,E92000001,13329,1314888,,
1,2020-11-22,England,E92000001,16668,1301559,66,58428
2,2020-11-21,England,E92000001,17615,1284891,132,58362
3,2020-11-20,England,E92000001,17845,1267276,262,58230
4,2020-11-19,England,E92000001,20291,1249431,349,57968
...,...,...,...,...,...,...,...
321,2020-01-07,England,E92000001,0,,,
322,2020-01-06,England,E92000001,0,,,
323,2020-01-05,England,E92000001,0,,,
324,2020-01-04,England,E92000001,0,,,
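Building one single-row DataFrame per record and concatenating them works, but pandas can also build the table from a list of dictionaries in a single call. A minimal sketch, using two records copied by hand from the output above (the column dtypes may differ slightly from the `pd.concat` version):

```python
import pandas as pd

# Two records copied from the output above; None marks a missing value
records = [
    {"date": "2020-11-23", "areaName": "England", "areaCode": "E92000001",
     "newCasesByPublishDate": 13329, "cumCasesByPublishDate": 1314888,
     "newDeathsByDeathDate": None, "cumDeathsByDeathDate": None},
    {"date": "2020-11-22", "areaName": "England", "areaCode": "E92000001",
     "newCasesByPublishDate": 16668, "cumCasesByPublishDate": 1301559,
     "newDeathsByDeathDate": 66, "cumDeathsByDeathDate": 58428},
]

# One row per dictionary, one column per key
df_sketch = pd.DataFrame(records)
```

With the real response, the equivalent call would be `pd.DataFrame(dic_nat['data'])`.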


In [None]:
#print(json.dumps(dic_nat, indent=4))

In [27]:
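def getData(url):
    # Download the gzip-compressed response and decode it into a dictionary
    response = urlopen(url)
    content_gz = response.read()
    content_text = gzip.decompress(content_gz)
    dic_nat = json.loads(content_text.decode('utf-8'))
    # Build one single-row DataFrame per record and stack them into one table
    df_national = pd.concat([pd.DataFrame(item, index=[0]) for item in dic_nat['data']], ignore_index=True)
    return df_national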

In [28]:
getData(example_url)

Unnamed: 0,date,areaName,areaCode,newCasesByPublishDate,cumCasesByPublishDate,newDeathsByDeathDate,cumDeathsByDeathDate
0,2020-11-23,England,E92000001,13329,1314888,,
1,2020-11-22,England,E92000001,16668,1301559,66,58428
2,2020-11-21,England,E92000001,17615,1284891,132,58362
3,2020-11-20,England,E92000001,17845,1267276,262,58230
4,2020-11-19,England,E92000001,20291,1249431,349,57968
...,...,...,...,...,...,...,...
321,2020-01-07,England,E92000001,0,,,
322,2020-01-06,England,E92000001,0,,,
323,2020-01-05,England,E92000001,0,,,
324,2020-01-04,England,E92000001,0,,,


## Make functions to get the data

### Function to construct the filter string

- We will write a function that builds a string such as `areaType=nation;areaName=england` from a dictionary such as `{'areaType':'nation', 'areaName': 'england'}`.

In [35]:
def construct_filter(filter_dic):
    # Join each key=value pair with ';', the separator used in the filters parameter
    return ';'.join([key + '=' + value for key, value in filter_dic.items()])

In [45]:
construct_filter({'areaType':'nation', 'areaName': 'england'})

'areaType=nation;areaName=england'
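`construct_filter` joins the values exactly as given, which is fine for names such as `england` or `Colchester`. An area name containing a space, such as `Wyre Forest`, would leave a space in the URL, which recent versions of `urlopen` reject. One possible variant, assuming the API accepts percent-encoded values, is sketched below. `construct_filter_quoted` is a hypothetical name.

```python
from urllib.parse import quote

def construct_filter_quoted(filter_dic):
    """Like construct_filter, but percent-encodes each value."""
    # Only the values are encoded; the '=' and ';' separators are kept as-is
    return ';'.join(
        key + '=' + quote(value, safe='') for key, value in filter_dic.items()
    )

quoted_filter = construct_filter_quoted({'areaType': 'ltla', 'areaName': 'Wyre Forest'})
# quoted_filter is 'areaType=ltla;areaName=Wyre%20Forest'
```

Values without special characters pass through unchanged, so the earlier examples give the same string as before.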

### Function to construct the URL

In [42]:
def construct_url(filters):
    filter_str = construct_filter(filters)
    url = "https://api.coronavirus.data.gov.uk/v1/data?filters="+filter_str+"&structure={%22date%22:%22date%22,%22areaName%22:%22areaName%22,%22areaCode%22:%22areaCode%22,%22newCasesByPublishDate%22:%22newCasesByPublishDate%22,%22cumCasesByPublishDate%22:%22cumCasesByPublishDate%22,%22newDeathsByDeathDate%22:%22newDeathsByDeathDate%22,%22cumDeathsByDeathDate%22:%22cumDeathsByDeathDate%22}"
    return(url)

In [49]:
current_url = construct_url({'areaType':'nation',
                            'areaName': 'england'})

In [50]:
getData(current_url)

Unnamed: 0,date,areaName,areaCode,newCasesByPublishDate,cumCasesByPublishDate,newDeathsByDeathDate,cumDeathsByDeathDate
0,2020-11-23,England,E92000001,13329,1314888,,
1,2020-11-22,England,E92000001,16668,1301559,66,58428
2,2020-11-21,England,E92000001,17615,1284891,132,58362
3,2020-11-20,England,E92000001,17845,1267276,262,58230
4,2020-11-19,England,E92000001,20291,1249431,349,57968
...,...,...,...,...,...,...,...
321,2020-01-07,England,E92000001,0,,,
322,2020-01-06,England,E92000001,0,,,
323,2020-01-05,England,E92000001,0,,,
324,2020-01-04,England,E92000001,0,,,


### Using the URL function, get the data again

### Function to get the data 

In [53]:
def get_data(filter_dic):
    new_url = construct_url(filter_dic)
    return(getData(new_url))

In [54]:
get_data({'areaType': 'ltla',
         'areaName': 'Colchester'})

Unnamed: 0,date,areaName,areaCode,newCasesByPublishDate,cumCasesByPublishDate,newDeathsByDeathDate,cumDeathsByDeathDate
0,2020-11-23,Colchester,E07000071,11,2086,,
1,2020-11-22,Colchester,E07000071,12,,0,165
2,2020-11-21,Colchester,E07000071,23,,0,165
3,2020-11-20,Colchester,E07000071,37,,1,165
4,2020-11-19,Colchester,E07000071,22,,0,164
...,...,...,...,...,...,...,...
321,2020-01-07,Colchester,E07000071,0,,,
322,2020-01-06,Colchester,E07000071,0,,,
323,2020-01-05,Colchester,E07000071,0,,,
324,2020-01-04,Colchester,E07000071,0,,,
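Because the filter string is built with `key + '=' + value`, every value in the dictionary has to be a string. A small check before the request can catch a bad dictionary early. The sketch below also assumes that the API expects an `areaType` entry in every query, which is worth confirming against the developer's guide. `check_filters` is a hypothetical helper.

```python
def check_filters(filter_dic):
    """Return filter_dic unchanged, or raise if it looks unusable."""
    # Assumption: the API expects an areaType in every query
    if 'areaType' not in filter_dic:
        raise ValueError("filters must include 'areaType'")
    # String concatenation in the filter builder fails for non-string values
    for key, value in filter_dic.items():
        if not isinstance(value, str):
            raise TypeError(f"value for {key!r} must be a string")
    return filter_dic

checked = check_filters({'areaType': 'ltla', 'areaName': 'Colchester'})
```

A call such as `get_data(check_filters(filter_dic))` would then fail with a readable message before any request is sent.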


## Use the package

In [8]:
# The PyPI package name is uk-covid19 (no hyphen before "19")
#!pip install uk-covid19


In [9]:
from uk_covid19 import Cov19API

In [10]:
c_structure = {
    "date": "date",
    "areaName": "areaName",
    "areaCode": "areaCode",
    "newCasesByPublishDate": "newCasesByPublishDate",
    "cumCasesByPublishDate": "cumCasesByPublishDate",
    "newDeathsByDeathDate": "newDeathsByDeathDate",
    "cumDeathsByDeathDate": "cumDeathsByDeathDate"
}

In [11]:
c_filters = ['areaType=ltla',
             'date=2020-11-01']
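The wrapper takes the filters as a list of `key=value` strings rather than as a single `;`-joined string. As a small sketch, the dictionary format used in the first half of this notebook can be converted to that list. `filters_to_list` is a hypothetical helper and not part of `uk_covid19`.

```python
def filters_to_list(filter_dic):
    """Turn {'areaType': 'ltla', ...} into ['areaType=ltla', ...]."""
    return [f"{key}={value}" for key, value in filter_dic.items()]

filter_list = filters_to_list({'areaType': 'ltla', 'date': '2020-11-01'})
# filter_list is ['areaType=ltla', 'date=2020-11-01']
```

The result has the same content as `c_filters` above, so it could be passed as the `filters` argument in the same way.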

In [12]:
api = Cov19API(filters=c_filters, structure=c_structure)
df_covid_2 = api.get_dataframe()

In [13]:
df_covid_2

Unnamed: 0,date,areaName,areaCode,newCasesByPublishDate,cumCasesByPublishDate,newDeathsByDeathDate,cumDeathsByDeathDate
0,2020-11-01,Aberdeen City,S12000033,19,2054.0,,
1,2020-11-01,Aberdeenshire,S12000034,8,972.0,,
2,2020-11-01,Adur,E07000223,12,,0.0,34.0
3,2020-11-01,Allerdale,E07000026,38,,0.0,93.0
4,2020-11-01,Amber Valley,E07000032,73,,1.0,116.0
...,...,...,...,...,...,...,...
384,2020-11-01,Wychavon,E07000238,27,,0.0,109.0
385,2020-11-01,Wycombe,E07000007,52,,0.0,93.0
386,2020-11-01,Wyre,E07000128,54,,3.0,166.0
387,2020-11-01,Wyre Forest,E07000239,55,,0.0,120.0
