# uk-covid19 API and data

## Available data

https://coronavirus.data.gov.uk/ is the main data source that government departments currently point to for up-to-date COVID-19 data. It has a dashboard, an API, and some data that seems to be available only as a download.

Several flavours of the API are offered; the Python one is described and used below.

One other useful dataset seems to require a separate download: weekly cases by local area (MSOA) in England, from https://coronavirus.data.gov.uk/downloads/msoa_data/MSOAs_latest.csv

## Python API

See https://coronavirus.data.gov.uk/developers-guide for more information on the available data, syntax, allowed parameters, etc. The UK and England levels have more variables available (including deaths and details on hospitalisations). LTLA (lower tier local authority) is the highest spatial resolution available through the API, but it only includes the number of cases (newCasesBySpecimenDate).

The code snippets below retrieve national (England) data and LTLA data, respectively, as pandas DataFrames.

```python
# Install once, e.g. in Jupyter: %pip install uk_covid19
from uk_covid19 import Cov19API

# filters: national-level data for England
england_data = [
    'areaType=nation',
    'areaName=England'
]
# structure: maps output column names to API metric names
cases_and_deaths = {
    "date": "date",
    "areaName": "areaName",
    "areaCode": "areaCode",
    "newCasesBySpecimenDate": "newCasesBySpecimenDate",
    "newDeaths28DaysByDeathDate": "newDeaths28DaysByDeathDate",
    "newAdmissions": "newAdmissions",
    "covidOccupiedMVBeds": "covidOccupiedMVBeds",
    "hospitalCases": "hospitalCases"
}
api = Cov19API(filters=england_data, structure=cases_and_deaths)
df_England = api.get_dataframe()
print(df_England)
```

```
           date areaName   areaCode  newCasesBySpecimenDate  \
0    2020-09-21  England  E92000001                     NaN   
1    2020-09-20  England  E92000001                    32.0   
2    2020-09-19  England  E92000001                  1268.0   
3    2020-09-18  England  E92000001                  3013.0   
4    2020-09-17  England  E92000001                  3455.0   
..          ...      ...        ...                     ...   
231  2020-02-03  England  E92000001                     0.0   
232  2020-02-02  England  E92000001                     0.0   
233  2020-02-01  England  E92000001                     0.0   
234  2020-01-31  England  E92000001                     0.0   
235  2020-01-30  England  E92000001                     2.0   

     newDeaths28DaysByDeathDate  newAdmissions  covidOccupiedMVBeds  \
0                           NaN            NaN                154.0   
1                           3.0    
```

```python
from uk_covid19 import Cov19API

# filters: all lower tier local authorities
LTLA = [
    'areaType=ltla'
]
# structure: only case counts are available at LTLA level
cases_structure = {
    "date": "date",
    "areaName": "areaName",
    "areaCode": "areaCode",
    "newCasesBySpecimenDate": "newCasesBySpecimenDate"
}
api = Cov19API(filters=LTLA, structure=cases_structure)
df_LTLA = api.get_dataframe()
print(df_LTLA)
```

```
             date       areaName   areaCode  newCasesBySpecimenDate
0      2020-09-21  Aberdeen City  S12000033                       4
1      2020-09-20  Aberdeen City  S12000033                      13
2      2020-09-19  Aberdeen City  S12000033                      12
3      2020-09-18  Aberdeen City  S12000033                       6
4      2020-09-17  Aberdeen City  S12000033                       7
...           ...            ...        ...                     ...
71007  2020-02-03           York  E06000014                       0
71008  2020-02-02           York  E06000014                       0
71009  2020-02-01           York  E06000014                       0
71010  2020-01-31           York  E06000014                       0
71011  2020-01-30           York  E06000014                       1

[71012 rows x 4 columns]
```
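
For the daily LTLA comparison it may be convenient to reshape `df_LTLA` from long format (one row per area and date) into a wide table with one row per `areaCode` and one column per date, similar to the LTLA-by-day layout of `output1` built further down. This is a minimal sketch on a small made-up frame with the same column names; `to_wide` is a hypothetical helper, not part of the uk_covid19 package.

```python
import pandas as pd


def to_wide(df_long: pd.DataFrame) -> pd.DataFrame:
    """Reshape API output (one row per area and date) into one row per
    areaCode and one column per date, with dates in ascending order."""
    wide = df_long.pivot_table(
        index="areaCode",
        columns="date",
        values="newCasesBySpecimenDate",
        aggfunc="sum",  # tolerates accidental duplicate rows
    )
    return wide.sort_index(axis=1)


# made-up example rows with the same columns as df_LTLA
example = pd.DataFrame({
    "date": ["2020-09-21", "2020-09-20", "2020-09-21", "2020-09-20"],
    "areaName": ["York", "York", "Aberdeen City", "Aberdeen City"],
    "areaCode": ["E06000014", "E06000014", "S12000033", "S12000033"],
    "newCasesBySpecimenDate": [5, 3, 4, 13],
})
wide = to_wide(example)
print(wide)
```

Dates stay as strings here; `pd.to_datetime` can convert the column labels if date arithmetic is needed.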


Some other data can be downloaded as CSV from https://coronavirus.data.gov.uk/, including LSOA and MSOA weekly counts (with values 0-2 suppressed).

## MSOA counts per week

The weekly number of cases for each MSOA is available from https://coronavirus.data.gov.uk/downloads/msoa_data/MSOAs_latest.csv. This file also contains UTLA (utla19_cd), LTLA (lad19_cd, the local authority district code) and MSOA (msoa11_cd) columns, which can be used as a lookup table between the different area definitions.
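
A minimal sketch of turning those columns into lookup dictionaries, assuming (as the microsim code below also does) that each MSOA lies in exactly one LTLA. The rows are made up and only use the column names of MSOAs_latest.csv; with the real file, `MSOA_data` would take the place of `msoa_rows`.

```python
import pandas as pd

# made-up rows using the column names of MSOAs_latest.csv
msoa_rows = pd.DataFrame({
    "msoa11_cd": ["E02000001", "E02000002", "E02000003"],
    "lad19_cd": ["E09000001", "E09000002", "E09000002"],
    "utla19_cd": ["E09000001", "E09000002", "E09000002"],
})

# drop incomplete rows (e.g. an all-NaN trailing row) before building lookups
lut = msoa_rows.dropna(subset=["msoa11_cd", "lad19_cd", "utla19_cd"])

# MSOA -> LTLA (assumes one LTLA per MSOA)
msoa_to_ltla = lut.set_index("msoa11_cd")["lad19_cd"].to_dict()

# LTLA -> UTLA: many MSOA rows share an LTLA, so de-duplicate first
ltla_to_utla = (
    lut[["lad19_cd", "utla19_cd"]]
    .drop_duplicates()
    .set_index("lad19_cd")["utla19_cd"]
    .to_dict()
)

print(msoa_to_ltla["E02000003"])  # E09000002
print(ltla_to_utla)
```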

Two issues with this dataset:
- There are no actual dates, just columns called wk_05, wk_06 etc., presumably week 5 onwards. It is not clear on which day a week starts, or whether wk_05 is the fifth week of the calendar year (if week 1 starts on Wednesday 1 January 2020, week 5 starts on Wednesday 29 January).
- Counts of 0-2 are reported as '-99', so there is no way to distinguish between COVID-free areas and areas with low counts.
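
Under the calendar-year assumption in the first point, a week column name can be converted to the date its week starts. This is a sketch only: `WEEK_1_START` and `week_label_to_start` are hypothetical names, and the assumed start date is a guess, not something the data source confirms.

```python
import datetime
import re

# ASSUMPTION: week 1 starts on Wednesday 1 January 2020, so wk_NN starts
# (NN - 1) weeks later (wk_05 -> Wednesday 29 January 2020).
WEEK_1_START = datetime.date(2020, 1, 1)


def week_label_to_start(label: str,
                        week_1_start: datetime.date = WEEK_1_START) -> datetime.date:
    """Convert a column name such as 'wk_05' to the assumed first day of that week."""
    match = re.fullmatch(r"wk_(\d+)", label)
    if match is None:
        raise ValueError(f"not a week column: {label!r}")
    week = int(match.group(1))
    return week_1_start + datetime.timedelta(weeks=week - 1)


print(week_label_to_start("wk_05"))  # 2020-01-29
```

Keeping the guess in one constant means only `WEEK_1_START` has to change if the weeks turn out to start on a different day.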

To import the latest version in Python, use the code below (calling pd.read_csv directly on the URL resulted in an HTTP error).

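```python
import io

import pandas as pd
import requests

MSOA_URL = 'https://coronavirus.data.gov.uk/downloads/msoa_data/MSOAs_latest.csv'

# download the file first (following redirects), then parse the text with pandas
response = requests.get(MSOA_URL, allow_redirects=True)
response.raise_for_status()  # fail loudly instead of parsing an error page
MSOA_data = pd.read_csv(io.StringIO(response.text))

print(MSOA_data)
```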

```
       rgn19_cd    rgn19_nm  utla19_cd             utla19_nm   lad19_cd  \
0     E12000007      London  E09000001        City of London  E09000001   
1     E12000007      London  E09000002  Barking and Dagenham  E09000002   
2     E12000007      London  E09000002  Barking and Dagenham  E09000002   
3     E12000007      London  E09000002  Barking and Dagenham  E09000002   
4     E12000007      London  E09000002  Barking and Dagenham  E09000002   
...         ...         ...        ...                   ...        ...   
6787  E12000007      London  E09000011             Greenwich  E09000011   
6788  E12000002  North West  E08000012             Liverpool  E08000012   
6789  E12000002  North West  E08000012             Liverpool  E08000012   
6790  E12000002  North West  E08000012             Liverpool  E08000012   
6791        NaN         NaN        NaN                   NaN        NaN   

                  lad19_nm  msoa11_cd                msoa11_hclnm  wk_05  \
0           City of Lon
```
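
Because suppressed counts are coded as -99, summing or plotting the week columns as they are would treat them as large negative numbers. One option, sketched here on made-up rows, is to turn -99 into a missing value and, where a number is needed, bracket each suppressed cell between 0 and 2. The names `sample`, `cleaned`, `lower` and `upper` are illustrative only.

```python
import numpy as np
import pandas as pd

# made-up rows; the real MSOA_data has one wk_NN column per week
sample = pd.DataFrame({
    "msoa11_cd": ["E02000001", "E02000002"],
    "wk_05": [-99, 4],
    "wk_06": [3, -99],
})

week_cols = [c for c in sample.columns if c.startswith("wk_")]

# -99 marks a suppressed count of 0-2: treat it as missing, not as a number
cleaned = sample.copy()
cleaned[week_cols] = cleaned[week_cols].replace(-99, np.nan)

# bounds for the suppressed cells: lower assumes 0, upper assumes 2
lower = cleaned[week_cols].fillna(0)
upper = cleaned[week_cols].fillna(2)
print(cleaned)
```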

## Comparing to microsim output

dashboard.py creates the variable msoacounts_dict, which can be compared with the 'real' data described above. We would expect mainly symptomatic cases to be tested, plus possibly a growing number of asymptomatic carriers as the track and trace programme expands. The reported (tested) numbers should therefore always be lower than the microsim predictions: unless everyone is tested all the time, cases will remain under-reported.

There are two possible approaches:
- Sum msoacounts_dict over 7-day periods and compare with MSOA_data (weekly cases, MSOA level)
- Sum msoacounts_dict over the MSOAs in each LTLA to create LTLA counts, and compare with the LTLA data from the API (daily cases, LTLA level)
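
For the first approach, the daily microsim columns have to be summed into 7-day blocks. A minimal sketch, assuming the columns of `msoacounts_dict['symptomatic']` are microsim days in chronological order; `weekly_totals` and its `skip_days` parameter are hypothetical. The transpose-then-groupby form avoids `groupby(axis=1)`, which recent pandas versions deprecate.

```python
import numpy as np
import pandas as pd


def weekly_totals(daily: pd.DataFrame, days_per_week: int = 7,
                  skip_days: int = 0) -> pd.DataFrame:
    """Sum consecutive blocks of `days_per_week` day columns.

    ASSUMPTION: columns are microsim days in chronological order.
    `skip_days` drops leading days so that block 0 starts on the first day
    of a MSOA_data week. A final block shorter than 7 days is kept, so
    check it before comparing.
    """
    trimmed = daily.iloc[:, skip_days:]
    block = np.arange(trimmed.shape[1]) // days_per_week
    return trimmed.T.groupby(block).sum().T


# made-up daily counts for two MSOAs over 10 microsim days
daily = pd.DataFrame(
    np.ones((2, 10), dtype=int),
    index=["E02000001", "E02000002"],
    columns=range(10),
)
print(weekly_totals(daily))  # block 0 sums 7 days, block 1 the remaining 3
```

For example, with the microsim starting on Monday 23 March 2020 and weeks assumed to start on Wednesdays, `skip_days=2` makes block 0 start on Wednesday 25 March.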

In either case, you need to specify how the days in the microsim correspond to calendar dates and (for MSOA_data) to weeks.

You can do this by setting microsim_start_day to an actual calendar date. You can also set MSOA_data_start_wk_05 to your best guess of the date on which week wk_05 starts.

```python
import datetime

microsim_start_day = datetime.datetime(2020, 3, 23)  # example: 23 March 2020, start of lockdown
MSOA_data_start_wk_05 = datetime.datetime(2020, 1, 29)  # guess: Wednesday 29 January 2020
```
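
With those two dates set, a microsim day index can be mapped to a calendar date and then to the matching `wk_NN` column of MSOA_data. This sketch assumes microsim day 0 is `microsim_start_day` and that weeks are 7 days long starting on `MSOA_data_start_wk_05`; the two helper functions are hypothetical, and the dates are repeated so the block runs on its own.

```python
import datetime

# values from the cell above (both are assumptions to adjust)
microsim_start_day = datetime.datetime(2020, 3, 23)
MSOA_data_start_wk_05 = datetime.datetime(2020, 1, 29)


def microsim_day_to_date(day: int) -> datetime.datetime:
    """Calendar date of microsim day `day`, assuming day 0 is microsim_start_day."""
    return microsim_start_day + datetime.timedelta(days=day)


def date_to_week_label(date: datetime.datetime) -> str:
    """Name of the MSOA_data column ('wk_NN') whose week contains `date`,
    assuming wk_05 starts on MSOA_data_start_wk_05 and weeks last 7 days."""
    weeks_after_wk_05 = (date - MSOA_data_start_wk_05).days // 7
    return f"wk_{5 + weeks_after_wk_05:02d}"


print(microsim_day_to_date(0).date())  # 2020-03-23
print(date_to_week_label(microsim_day_to_date(0)))  # wk_12
```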

To compare the LTLA data with the microsim data, the microsim output first needs to be aggregated from MSOAs to LTLAs (each LTLA is made up of one or more MSOAs).

```python
import numpy as np
import pandas as pd

# microsim output (from dashboard.py): rows are MSOA codes, columns are microsim days
symptomatic = msoacounts_dict['symptomatic']
asymptomatic = msoacounts_dict['asymptomatic']
MSOAs = symptomatic.index.tolist()  # MSOAs in microsim
Days = symptomatic.columns.tolist()

# MSOA -> LTLA lookup table, restricted to the MSOAs present in the microsim
MSOA2LTLA_lut = MSOA_data[['msoa11_cd', 'lad19_cd']].copy()
MSOA2LTLA_lut = MSOA2LTLA_lut[MSOA2LTLA_lut['msoa11_cd'].isin(MSOAs)]
# corresponding list of LTLAs
LTLAs = MSOA2LTLA_lut['lad19_cd'].unique()

# LTLA x day table of zeros to accumulate into
output1 = pd.DataFrame(index=LTLAs, columns=Days, data=np.zeros((len(LTLAs), len(Days))))

# add each MSOA's symptomatic + asymptomatic counts (all days at once) to its LTLA
for m in MSOAs:
    l = MSOA2LTLA_lut.loc[MSOA2LTLA_lut['msoa11_cd'] == m, 'lad19_cd'].iloc[0]  # exact match
    output1.loc[l, Days] += symptomatic.loc[m, Days] + asymptomatic.loc[m, Days]
```
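
The same MSOA-to-LTLA aggregation can also be written without an explicit loop, by mapping each MSOA row to its LTLA code and grouping. A minimal sketch on made-up data; `msoa_to_ltla_totals` is a hypothetical helper. Unlike the loop above, it silently drops MSOAs that are missing from the lookup table instead of raising an error, and it keeps integer counts.

```python
import pandas as pd


def msoa_to_ltla_totals(counts: pd.DataFrame, lut: pd.DataFrame) -> pd.DataFrame:
    """Sum MSOA rows into LTLA rows.

    counts: rows indexed by MSOA code, one column per microsim day.
    lut: columns 'msoa11_cd' and 'lad19_cd' (e.g. taken from MSOA_data).
    MSOAs missing from lut map to NaN and are dropped by groupby.
    """
    msoa_to_ltla = (
        lut.dropna()
        .drop_duplicates("msoa11_cd")
        .set_index("msoa11_cd")["lad19_cd"]
    )
    ltla_of_row = counts.index.map(msoa_to_ltla)
    return counts.groupby(ltla_of_row).sum()


# made-up counts for three MSOAs over two microsim days
counts = pd.DataFrame(
    {0: [1, 2, 4], 1: [0, 1, 1]},
    index=["E02000001", "E02000002", "E02000003"],
)
lut = pd.DataFrame({
    "msoa11_cd": ["E02000001", "E02000002", "E02000003"],
    "lad19_cd": ["E09000001", "E09000002", "E09000002"],
})
print(msoa_to_ltla_totals(counts, lut))
```

With the real data this would be called as `msoa_to_ltla_totals(msoacounts_dict['symptomatic'] + msoacounts_dict['asymptomatic'], MSOA_data[['msoa11_cd', 'lad19_cd']])`.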