# UK COVID-19 Data Visualization

Using the new UK Govt. COVID-19 API, we can view interactive plots for COVID-19 stats across the UK, using both local and national level data.

**Graphs produced for the following...**

- Daily and cumulative case data for ltla's (lower tier local authorities) within the UK
- Daily and cumulative case and death data at national level within the UK
- Total cases and case rates for males and females in England.
- Total deaths and death rates for males and females in England.
- Daily and cumulative hospital admissions at national level within the UK
- Daily and cumulative stats for testing at national level within the UK
- Daily amount of occupied mechanical-ventilator beds vs hospital cases for COVID-19 patients at national level within the UK

<div class="alert-success">

- To get the best view of the graphs, hover over the top right corner of each graph, and select 'Compare data on hover'.
- Zoom into the graph by selecting a section within the graph using your mouse.
- Double click within the graph to zoom back out again.

In [1]:
import pandas as pd
import os
from uk_covid19 import Cov19API
import json

from plotly import __version__
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import cufflinks as cf
from plotly.offline import download_plotlyjs,init_notebook_mode,plot,iplot
init_notebook_mode(connected=True)
cf.go_offline()

import warnings
warnings.filterwarnings("ignore")

In [2]:
areas = ['areaType=ltla']
data = {'areaName':'areaName',
        'date':'date',
        'newCasesBySpecimenDate':'newCasesBySpecimenDate',
        'cumCasesBySpecimenDate':'cumCasesBySpecimenDate',
            }

In [3]:
api = Cov19API(filters=areas, structure=data)
file_path = os.path.join("datasets")
os.makedirs(file_path, exist_ok=True)
csv_path = os.path.join(file_path, 'ltla_case_data.csv')
api.get_csv(save_as=csv_path)
print("")




In [4]:
df = pd.read_csv(csv_path)

In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 51347 entries, 0 to 51346
Data columns (total 4 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   areaName                51347 non-null  object
 1   date                    51347 non-null  object
 2   newCasesBySpecimenDate  51347 non-null  int64 
 3   cumCasesBySpecimenDate  51347 non-null  int64 
dtypes: int64(2), object(2)
memory usage: 1.6+ MB


In [6]:
print(f"There is data for a total of {df.areaName.nunique()} areas within the UK")

There is data for a total of 337 areas within the UK


<div class="alert-info">
Double click on any area name to remove all the lines on the graph, then click one by one on the area names you'd like to see data for.

In [7]:
df.iplot(mode='lines', x="date", y='newCasesBySpecimenDate',
        categories='areaName', xTitle='Date',yTitle='New Cases',
        title='New COVID-19 Cases per Day within UK ltla')

In [8]:
df.iplot(mode='lines', x="date", y='cumCasesBySpecimenDate',
        categories='areaName', xTitle='Date',yTitle='Total Cases',
        title='Total COVID-19 Cases within UK ltla')

In [9]:
areas = ['areaType=nation']
data = {'areaName':'areaName',
         'date':'date',
         'newCasesByPublishDate':'newCasesByPublishDate',
         'cumCasesByPublishDate':'cumCasesByPublishDate',
         'maleCases':'maleCases',
         'femaleCases':'femaleCases',
         'maleDeaths':'maleDeaths',
         'femaleDeaths':'femaleDeaths',
         'newDeathsByPublishDate':'newDeathsByPublishDate',
         'cumDeathsByPublishDate':'cumDeathsByPublishDate'
        }

In [10]:
api = Cov19API(filters=areas, structure=data)
file_path = os.path.join("datasets")
os.makedirs(file_path, exist_ok=True)
csv_path = os.path.join(file_path, 'national_data.csv')
api.get_csv(save_as=csv_path)
print("")




In [11]:
df2 = pd.read_csv(csv_path)

In [12]:
df2.head()

Unnamed: 0,areaName,date,newCasesByPublishDate,cumCasesByPublishDate,maleCases,femaleCases,maleDeaths,femaleDeaths,newDeathsByPublishDate,cumDeathsByPublishDate
0,England,2020-08-06,826,265849.0,,,,,46,41795.0
1,England,2020-08-05,804,265023.0,"[{'age': '65_to_69', 'value': 6293, 'rate': 46...","[{'age': '35_to_39', 'value': 9845, 'rate': 51...","[{'age': '0_to_4', 'value': 4, 'rate': 0.2}, {...","[{'age': '0_to_4', 'value': 5, 'rate': 0.30000...",63,41749.0
2,England,2020-08-04,617,264219.0,,,,,88,41686.0
3,England,2020-08-03,856,263602.0,,,,,9,41598.0
4,England,2020-08-02,676,262746.0,,,,,5,41589.0


In [13]:
df2.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 860 entries, 0 to 859
Data columns (total 10 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   areaName                860 non-null    object 
 1   date                    860 non-null    object 
 2   newCasesByPublishDate   860 non-null    int64  
 3   cumCasesByPublishDate   662 non-null    float64
 4   maleCases               2 non-null      object 
 5   femaleCases             2 non-null      object 
 6   maleDeaths              1 non-null      object 
 7   femaleDeaths            1 non-null      object 
 8   newDeathsByPublishDate  860 non-null    int64  
 9   cumDeathsByPublishDate  582 non-null    float64
dtypes: float64(2), int64(2), object(6)
memory usage: 67.3+ KB


In [14]:
df2.iplot(mode='lines', x="date", y='newCasesByPublishDate',
        categories='areaName', xTitle='Date',yTitle='New Cases',
        title='New COVID-19 Cases per Day within UK Nations')

In [15]:
df2.iplot(mode='lines', x="date", y='cumCasesByPublishDate',
        categories='areaName', xTitle='Date',yTitle='Total Cases',
        title='Total COVID-19 Cases within UK Nations')

<div class='alert-info'>
There's a lot of missing data from this section of the dataset which will explain the sudden spikes.

In [16]:
df2.iplot(mode='lines', x="date", y='newDeathsByPublishDate',
        categories='areaName', xTitle='Date',yTitle='New Deaths',
        title='New COVID-19 Deaths per Day within UK Nations')

In [17]:
df2.iplot(mode='lines', x="date", y='cumDeathsByPublishDate',
        categories='areaName', xTitle='Date',yTitle='Total Deaths',
        title='Total COVID-19 Deaths within UK Nations')

In [18]:
def df_maker(list1):
    df = [x + "}" for x in list1[0][1:-1].split('}, ') if x[-1]!='}']
    df = [json.loads(x.replace("'", "\"")) for x in df]
    df = pd.DataFrame(df)
    return df

In [19]:
male_cases = df_maker(df2.maleCases.dropna().values)

In [20]:
male_cases['age'] = sorted(male_cases['age'])

In [21]:
male_cases.tail()

Unnamed: 0,age,value,rate
13,65_to_69,7962,468.3
14,70_to_74,992,54.8
15,75_to_79,8758,454.5
16,80_to_84,5779,3476.4
17,90+,7805,866.5


In [22]:
female_cases = df_maker(df2.femaleCases.dropna().values)

In [23]:
female_cases['age'] = sorted(female_cases['age'])

In [24]:
female_cases.tail()

Unnamed: 0,age,value,rate
13,65_to_69,827,51.5
14,70_to_74,10534,1991.6
15,80_to_84,1089,63.1
16,85_to_89,4689,324.7
17,90+,7847,495.2


In [25]:
male_deaths = df_maker(df2.maleDeaths.dropna().values)

In [26]:
male_deaths.head()

Unnamed: 0,age,value,rate
0,0_to_4,4,0.2
1,5_to_9,0,0.0
2,10_to_14,7,0.4
3,15_to_19,9,0.5
4,20_to_24,9,0.6


In [27]:
female_deaths = df_maker(df2.femaleDeaths.dropna().values)

In [28]:
female_deaths.head()

Unnamed: 0,age,value,rate
0,0_to_4,5,0.3
1,5_to_9,2,0.1
2,10_to_14,3,0.2
3,15_to_19,3,0.2
4,20_to_24,10,0.7


In [29]:
fig = go.Figure(data=[go.Bar(name="Male", x=male_cases["age"], y=male_cases["value"]),
                     go.Bar(name="Female", x=female_cases["age"], y=female_cases["value"])])
fig.update_layout(title_text='Total COVID-19 Cases in England by Gender',
                  xaxis_title="Age",
                yaxis_title="Cases",
                legend_title="Gender",)
fig.show()

In [30]:
fig = go.Figure(data=[go.Bar(name="Male", x=male_cases["age"], y=male_cases["rate"]),
                     go.Bar(name="Female", x=female_cases["age"], y=female_cases["rate"])])
fig.update_layout(title_text='COVID-19 Case Rates in England by Gender',
                  xaxis_title="Age",
                yaxis_title="Case Rate",
                legend_title="Gender",)
fig.show()

<div class="alert-info">
There's no male case data for the 85-89 age-group, which explains why there appears to be no cases at all in that age range.

In [31]:
fig = go.Figure(data=[go.Bar(name="Male", x=male_deaths["age"], y=male_deaths["value"]),
                     go.Bar(name="Female", x=female_deaths["age"], y=female_deaths["value"])])
fig.update_layout(title_text='Total COVID-19 Deaths in England by Gender',
                  xaxis_title="Age",
                yaxis_title="Deaths",
                legend_title="Gender",)
fig.show()

In [32]:
fig = go.Figure(data=[go.Bar(name="Male", x=male_deaths["age"], y=male_deaths["rate"]),
                     go.Bar(name="Female", x=female_deaths["age"], y=female_deaths["rate"])])
fig.update_layout(title_text='COVID-19 Death Rates in England by Gender',
                  xaxis_title="Age",
                yaxis_title="Death Rate",
                legend_title="Gender",)
fig.show()

In [33]:
areas = ['areaType=nation']
data = {'areaName':'areaName',
         'date':'date',
         'newAdmissions':'newAdmissions',
         'cumAdmissions':'cumAdmissions',
         'cumTestsByPublishDate':'cumTestsByPublishDate',
         'newTestsByPublishDate':'newTestsByPublishDate',
         'covidOccupiedMVBeds':'covidOccupiedMVBeds',
         'hospitalCases':'hospitalCases',
        }

In [34]:
api = Cov19API(filters=areas, structure=data)
file_path = os.path.join("datasets")
os.makedirs(file_path, exist_ok=True)
csv_path = os.path.join(file_path, 'hospital_data.csv')
api.get_csv(save_as=csv_path)
print("")




In [35]:
df3 = pd.read_csv(csv_path)

In [36]:
df3.head()

Unnamed: 0,areaName,date,newAdmissions,cumAdmissions,cumTestsByPublishDate,newTestsByPublishDate,covidOccupiedMVBeds,hospitalCases
0,England,2020-08-06,,,8858310.0,128998.0,61.0,694.0
1,England,2020-08-05,,,8728320.0,122633.0,63.0,727.0
2,England,2020-08-04,20.0,112773.0,8604990.0,110730.0,65.0,737.0
3,England,2020-08-03,58.0,112753.0,8494260.0,105564.0,67.0,767.0
4,England,2020-08-02,74.0,112695.0,8387470.0,115939.0,68.0,769.0


In [37]:
df3.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 598 entries, 0 to 597
Data columns (total 8 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   areaName               598 non-null    object 
 1   date                   598 non-null    object 
 2   newAdmissions          584 non-null    float64
 3   cumAdmissions          584 non-null    float64
 4   cumTestsByPublishDate  211 non-null    float64
 5   newTestsByPublishDate  195 non-null    float64
 6   covidOccupiedMVBeds    540 non-null    float64
 7   hospitalCases          565 non-null    float64
dtypes: float64(6), object(2)
memory usage: 37.5+ KB


In [38]:
df3.iplot(mode="lines", x="date", y="newAdmissions",
         xTitle="Date",yTitle="New Admissions", categories="areaName",
         title="Daily COVID-19 Hospital Admissions within UK Nations")

In [39]:
df3.iplot(mode="lines", x="date", y="cumAdmissions",
         xTitle="Date",yTitle="Total Admissions", categories="areaName",
         title="Total COVID-19 Hospital Admissions within UK Nations")

In [40]:
df3.iplot(mode="lines", x="date", y="newTestsByPublishDate",
         xTitle="Date",yTitle="New Tests", categories="areaName",
         title="Daily Amount of COVID-19 Tests within UK Nations")

In [41]:
df3.iplot(mode="lines", x="date", y="cumTestsByPublishDate",
         xTitle="Date",yTitle="Total Tests", categories="areaName",
         title="Total Amount of COVID-19 Tests within UK Nations")

<div class="alert-info">
There's a lot of missing data within both of the testing datasets, which explains the massive gap before July for most countries.

In [42]:
df3.areaName.unique()

array(['England', 'Northern Ireland', 'Scotland', 'Wales'], dtype=object)

In [43]:
england = df3[df3['areaName']=="England"]
wales = df3[df3['areaName']=="Wales"]
scotland = df3[df3['areaName']=="Scotland"]
northern_ireland = df3[df3['areaName']=="Northern Ireland"]

In [44]:
fig = make_subplots(
    rows=2, cols=2, subplot_titles=("England", "Wales", "Scotland", "Northern Ireland")
)

# Add traces
fig.add_trace(go.Line(name="MV Beds", x=england['date'], y=england['covidOccupiedMVBeds']), row=1, col=1)
fig.add_trace(go.Line(name="Hospital Cases", x=england['date'], y=england['hospitalCases']), row=1, col=1)

fig.add_trace(go.Line(name="MV Beds", x=wales['date'], y=wales['covidOccupiedMVBeds']), row=1, col=2)
fig.add_trace(go.Line(name="Hospital Cases", x=wales['date'], y=wales['hospitalCases']), row=1, col=2)

fig.add_trace(go.Line(name="MV Beds", x=scotland['date'], y=scotland['covidOccupiedMVBeds']), row=2, col=1)
fig.add_trace(go.Line(name="Hospital Cases", x=scotland['date'], y=scotland['hospitalCases']), row=2, col=1)

fig.add_trace(go.Line(name="MV Beds", x=northern_ireland['date'], y=northern_ireland['covidOccupiedMVBeds']), row=2, col=2)
fig.add_trace(go.Line(name="Hospital Cases", x=northern_ireland['date'], y=northern_ireland['hospitalCases']), row=2, col=2)

# Update xaxis properties
fig.update_xaxes(title_text="Date", row=1, col=1)
fig.update_xaxes(title_text="Date", row=1, col=2)
fig.update_xaxes(title_text="Date", row=2, col=1)
fig.update_xaxes(title_text="Date", row=2, col=2)

# Update yaxis properties
fig.update_yaxes(title_text="Daily Amount", row=1, col=1)
fig.update_yaxes(title_text="Daily Amount", row=1, col=2)
fig.update_yaxes(title_text="Daily Amount", row=2, col=1)
fig.update_yaxes(title_text="Daily Amount", row=2, col=2)

# Update title and height
fig.update_layout(title_text="Daily Amount of Occupied MV Beds vs Hospital Cases for COVID-19 Patients",
                  showlegend=False, height=700)

fig.show()