## Comparison between the number of tests and positive cases in London
## and comparison between positive cases and the number of hospitalised patients in London

### Importing the data

The first steps consist of *installing the library* from the local terminal by running:

    pip install uk-covid19
    
and then *importing* the `Cov19API` class from the library:

In [117]:
from uk_covid19 import Cov19API

We create *filters* to define the area we are interested in. Note that `areaType=overview` selects UK-wide figures, despite the variable name `London_only`.

In [118]:
London_only = [
    'areaType=overview'
]
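
Filters are plain `key=value` strings collected in a list. As a hedged illustration, the sketch below builds such lists with a hypothetical helper, `make_filters`, which is not part of `uk_covid19`. The `region` and `London` values are assumptions about how a London-only query would be expressed; not every metric is necessarily published at every area type, so availability should be checked before swapping the filter in.

```python
def make_filters(area_type, area_name=None):
    """Hypothetical helper: build the list of 'key=value' filter strings."""
    filters = [f"areaType={area_type}"]
    if area_name is not None:
        filters.append(f"areaName={area_name}")
    return filters


# Same filter as the one defined in the cell above (UK-wide overview)
uk_wide = make_filters("overview")

# Assumed values for a query restricted to the London region
london_region = make_filters("region", "London")

print(uk_wide)
print(london_region)
```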

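We define a *structure*, i.e. a dictionary that maps our own field names to the data fields we want to request.

Open question: `hospitalCases` or `newAdmissions`? As far as the API's metric documentation indicates, `newAdmissions` is the daily number of new hospital admissions, whereas `hospitalCases` is the number of patients in hospital on a given day. This should still be confirmed; `newAdmissions` is used below.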

In [119]:
structure_tests = {
    "date": "date",
    "newTests": 'newTestsByPublishDate',
    "newCases": "newCasesByPublishDate",   
}

structure_patients = {
    "date": "date",
    "newCases": "newCasesByPublishDate",
    'newPatients': 'newAdmissions'
}
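
To make the role of the structure concrete: each key is the name we want in the response, and each value is the metric name used by the API. The sketch below only mimics that renaming locally with a hypothetical function, `apply_structure`, and a made-up record; the real renaming is done by the API, not by this code.

```python
def apply_structure(raw_record, structure):
    """Hypothetical illustration: rename API metric names to our own keys."""
    return {
        our_name: raw_record.get(api_name)
        for our_name, api_name in structure.items()
    }


structure_tests = {
    "date": "date",
    "newTests": "newTestsByPublishDate",
    "newCases": "newCasesByPublishDate",
}

# Made-up record keyed by the API's metric names (values are invented)
raw_record = {
    "date": "2020-10-20",
    "newTestsByPublishDate": 1000,
    "newCasesByPublishDate": 50,
}

renamed = apply_structure(raw_record, structure_tests)
print(renamed)
```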

We access the API and retrieve the response in JSON format, which we then save to a file

In [120]:
# Accessing data of new tests and new positive cases
api_tests = Cov19API(filters=London_only, structure=structure_tests)
data_tests = api_tests.get_json()
print(type(data_tests))

import json
with open("data_tests.json", "wt") as OUTF:
    json.dump(data_tests, OUTF)
    

# Accessing data of new positive cases and new hospitalised patients  
api_patients = Cov19API(filters=London_only, structure=structure_patients)
data_patients = api_patients.get_json()
print(type(data_patients))

with open("data_patients.json", 'wt') as OUTF:
    json.dump(data_patients, OUTF)

<class 'dict'>
<class 'dict'>
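
The rest of this notebook only relies on the `'data'` key of the returned dictionary, which holds a list of one record per date. The sketch below uses a made-up stand-in for that dictionary (the values are invented, and any other keys the real response may contain are left out) to show the save and reload round trip, and how missing values appear as `None`.

```python
import json
import os
import tempfile

# Made-up stand-in for the dictionary returned by get_json()
fake_response = {
    "data": [
        {"date": "2020-10-20", "newTests": None, "newCases": 50},
        {"date": "2020-10-19", "newTests": 1000, "newCases": 40},
    ]
}

# Write to a temporary file so no real data file is overwritten
path = os.path.join(tempfile.mkdtemp(), "fake_tests.json")
with open(path, "wt") as outfile:
    json.dump(fake_response, outfile)

with open(path, "rt") as infile:
    reloaded = json.load(infile)

# JSON null comes back as Python None, so missing values survive the round trip
records = reloaded["data"]
missing_tests = [r["date"] for r in records if r["newTests"] is None]
print(len(records), missing_tests)
```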


### Visualising the data

To visualise the data, we will use `pandas` and `matplotlib`

In [121]:
# first we need to import the libraries
import pandas as pd
import matplotlib.pyplot as plt

%matplotlib inline
plt.rcParams['figure.dpi'] = 100

# load tests data
with open("data_tests.json", "rt") as INFILE1:
    data1=json.load(INFILE1)
    
# load patients' data
with open("data_patients.json", "rt") as INFILE2:
    data2=json.load(INFILE2)

The data needs to be extracted in such a way that it can be loaded into a `DataFrame` (i.e. the pandas equivalent of a spreadsheet) and plotted.
This process consists of several steps:

- extract the values for the x axis of our plot, which will be the index of the `DataFrame`: extract the dates and sort them
- create an index as a `date_range` and fill in the values
- plot the values in a graph
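
The steps above can be sketched on a tiny made-up sample. Note that this sketch uses `reindex` rather than the cell-by-cell loop used in this notebook, so it is an alternative formulation and not the notebook's own code; the sample values are invented.

```python
import pandas as pd

# Made-up records in the same shape as the entries of data['data']
sample = [
    {"date": "2020-10-03", "newCases": 12, "newPatients": None},
    {"date": "2020-10-01", "newCases": 5, "newPatients": 1},
]

df = pd.DataFrame(sample)

# Parse the dates and use them as a sorted index
df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d")
df = df.set_index("date").sort_index()

# Build a continuous daily index so that missing dates become rows
full_index = pd.date_range(df.index.min(), df.index.max(), freq="D")

# Missing dates and None values become 0.0; astype(float) guarantees numeric columns
df = df.reindex(full_index).fillna(0.0).astype(float)
print(df)
```

Treating missing values as zero follows the choice made in this notebook; it also means a gap in reporting is indistinguishable from a genuine zero in the plot.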

In [122]:
# data1 = tests
# data2 = patients
datalist1=data1['data'] 
datalist2=data2['data']

In [123]:
# extract the dates and sort them
dates1=[dictionary['date'] for dictionary in datalist1 ]
dates1.sort()

dates2=[dictionary['date'] for dictionary in datalist2 ]
dates2.sort()

# Convert a date string into a pandas datetime object
def parse_date(datestring):
    return pd.to_datetime(datestring, format="%Y-%m-%d")

startdate1=parse_date(dates1[0])
enddate1=parse_date(dates1[-1])
print (startdate1, ' to ', enddate1)

startdate2=parse_date(dates2[0])
enddate2=parse_date(dates2[-1])
print (startdate2, ' to ', enddate2)

2020-01-03 00:00:00  to  2020-10-21 00:00:00
2020-01-03 00:00:00  to  2020-10-21 00:00:00


In [136]:
# create the index
index1=pd.date_range(startdate1, enddate1, freq='D')
timeseriesdf1=pd.DataFrame(index=index1, columns=["newTests", "newCases"])
timeseriesdf1

index2=pd.date_range(startdate2, enddate2, freq='D')
timeseriesdf2=pd.DataFrame(index=index2, columns=["newCases", "newPatients"])
timeseriesdf2

# fill in values for the `DataFrame`
for entry in datalist1:
    date1=parse_date(entry['date'])
    for column in ['newTests', 'newCases']:
        if pd.isna(timeseriesdf1.loc[date1, column]): 
            # replace None with 0 in our data; values must be stored as floats,
            # because storing strings here leaves nothing numeric to plot
            value= float(entry[column]) if entry[column] is not None else 0.0
            timeseriesdf1.loc[date1, column]=value
            
            
for entry in datalist2: 
    date2=parse_date(entry['date'])
    for column in ['newCases', 'newPatients']:
        if pd.isna(timeseriesdf2.loc[date2, column]): 
            # replace None with 0 in our data
            value= float(entry[column]) if entry[column] is not None else 0.0
            timeseriesdf2.loc[date2, column]=value
            

# fill in any remaining "holes" due to missing dates
timeseriesdf1.fillna(0.0, inplace=True)
timeseriesdf2.fillna(0.0, inplace=True)
            
timeseriesdf1
timeseriesdf2

# this has created the tables with the data sorted into rows and columns
# the tables can now be used to plot a graph

| | newCases | newPatients |
|---|---|---|
| 2020-01-03 | 0.0 | 0.0 |
| 2020-01-04 | 0.0 | 0.0 |
| 2020-01-05 | 0.0 | 0.0 |
| 2020-01-06 | 0.0 | 0.0 |
| 2020-01-07 | 0.0 | 0.0 |
| ... | ... | ... |
| 2020-10-17 | 16171.0 | 996.0 |
| 2020-10-18 | 16982.0 | 0.0 |
| 2020-10-19 | 18804.0 | 0.0 |
| 2020-10-20 | 21331.0 | 0.0 |


In [141]:
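# plot graphs with the data

# make sure every column is numeric: plotting object-dtype columns
# (e.g. values stored as strings) raises "TypeError: no numeric data to plot"
timeseriesdf1 = timeseriesdf1.astype(float)
timeseriesdf2 = timeseriesdf2.astype(float)

# basic linear graph
timeseriesdf1.plot()
timeseriesdf2.plot()

# logarithmic graph
timeseriesdf1.plot(logy=True)
timeseriesdf2.plot(logy=True)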

### Saving data in a pickle file

We want to reuse the wrangled data when experimenting with the interactive controls, so we save both `DataFrame`s to pickle files

In [142]:
timeseriesdf1.to_pickle("timeseriesdf1.pkl")
timeseriesdf2.to_pickle("timeseriesdf2.pkl")
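
As a quick sanity check of this step, the sketch below writes a small made-up `DataFrame` to a temporary pickle file and reads it back. The sample values and the file name `sample.pkl` are invented for illustration; the point is that the index and the column types survive the round trip, which a plain CSV export would not guarantee without re-parsing the dates.

```python
import os
import tempfile

import pandas as pd

# Small made-up time series with a daily date index
df = pd.DataFrame(
    {"newCases": [5.0, 12.0]},
    index=pd.date_range("2020-10-01", periods=2, freq="D"),
)

# Save to a temporary location so no real file is overwritten
path = os.path.join(tempfile.mkdtemp(), "sample.pkl")
df.to_pickle(path)

# Reload and compare with the original
restored = pd.read_pickle(path)
print(restored.equals(df))
print(restored.dtypes)
```

Pickle files should only be loaded from sources you trust, since unpickling can execute arbitrary code.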

### Adding interactive controls

We will add interactive controls to our graphs using the `ipywidgets` [library](https://ipywidgets.readthedocs.io/en/stable/index.html), a Graphical User Interface ([GUI](https://en.wikipedia.org/wiki/Graphical_user_interface)) library that runs inside a notebook. We import it here and then load our saved data.

In [145]:
import ipywidgets as wdg
import pandas as pd
import matplotlib.pyplot as plt

%matplotlib inline
plt.rcParams['figure.dpi'] = 100

In [150]:
# we create a button intended to refresh the data from the API;
# for now the callback is a placeholder that only prints messages
def access_api(button):
    print("I'm downloading data from the API...")
    print("...all done.")
    
apibutton=wdg.Button(
    description='Refresh data',
    disabled=False,
    button_style='info',
    tooltip='Click to download current Public Health England data',
    icon='download' 
)
    
apibutton.on_click(access_api)

display(apibutton)

Button(button_style='info', description='Refresh data', icon='download', style=ButtonStyle(), tooltip='Click t…

I'm downloading data from the API...
...all done.
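
The callback above only prints messages. A minimal sketch of how it might actually refresh the data is given below, under stated assumptions: `make_refresh_callback`, `fake_fetch` and the `state` dictionary are hypothetical names introduced for illustration, and `fake_fetch` stands in for a real call such as `api_tests.get_json()` so that the example runs without network access or `ipywidgets`.

```python
def make_refresh_callback(fetch, store):
    """Hypothetical helper: build a function suitable for Button.on_click.

    fetch: zero-argument function returning the API dictionary
    store: dict that keeps the latest result and a status message
    """
    def on_click(button):
        try:
            store["latest"] = fetch()
            store["status"] = "ok"
        except Exception as error:
            # keep the previous data and record the failure instead of crashing
            store["status"] = f"failed: {error}"
    return on_click


def fake_fetch():
    # Made-up stand-in for the real API call
    return {"data": [{"date": "2020-10-20", "newCases": 50}]}


state = {}
callback = make_refresh_callback(fake_fetch, state)

# ipywidgets passes the clicked button to the callback; None stands in for it here
callback(None)
print(state["status"], len(state["latest"]["data"]))
```

In the notebook, the returned function would be registered with `apibutton.on_click(callback)` in place of `access_api`.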


### Single control graphs

In [None]:
timeseriesdf1=pd.read_pickle("timeseriesdf1.pkl")
timeseriesdf2=pd.read_pickle("timeseriesdf2.pkl")
