# UK COVID-19 Data Visualization

Using the new UK Govt. COVID-19 API, we can view interactive plots for COVID-19 stats across the UK, using both local and national level data.

**Graphs produced for the following...**

- Daily and cumulative case data for ltla's (lower tier local authorities) within the UK
- Daily and cumulative case and death data at national level within the UK
- Cumulative case rates and death rates at national level within the UK
- Total cases and case rates for males and females in England
- Daily and cumulative hospital admissions at national level within the UK
- Cumulative hospital admissions and admission rate by age group in England
- Daily and cumulative stats for testing at national level within the UK
- Daily amount of occupied mechanical-ventilator beds vs hospital cases for COVID-19 patients at national level within the UK

<div class="alert-success">

- To get the best view of the graphs, hover over the top right corner of each graph, and select 'Compare data on hover'.
- Zoom into the graph by selecting a section within the graph using your mouse.
- Double click within the graph to zoom back out again.

In [1]:
import pandas as pd
import os
from uk_covid19 import Cov19API
import json
import re

from plotly import __version__
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import cufflinks as cf
from plotly.offline import download_plotlyjs,init_notebook_mode,plot,iplot
init_notebook_mode(connected=True)
cf.go_offline()

import warnings
warnings.filterwarnings("ignore")

In [2]:
areas = ['areaType=ltla']
data = {'areaName':'areaName',
        'date':'date',
        'newCasesBySpecimenDate':'newCasesBySpecimenDate',
        'cumCasesBySpecimenDate':'cumCasesBySpecimenDate',
            }

In [3]:
api = Cov19API(filters=areas, structure=data)
file_path = os.path.join("datasets")
os.makedirs(file_path, exist_ok=True)
csv_path = os.path.join(file_path, 'ltla_case_data.csv')
try:
    api.get_csv(save_as=csv_path)
except:
    print("Couldn't obtain new dataset from API")
print("")




In [4]:
df = pd.read_csv(csv_path)

In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 59538 entries, 0 to 59537
Data columns (total 4 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   areaName                59538 non-null  object
 1   date                    59538 non-null  object
 2   newCasesBySpecimenDate  59538 non-null  int64 
 3   cumCasesBySpecimenDate  59538 non-null  int64 
dtypes: int64(2), object(2)
memory usage: 1.8+ MB


In [6]:
print(f"There is data for a total of {df.areaName.nunique()} areas within the UK")

There is data for a total of 380 areas within the UK


<div class="alert-info">
Double click on any area name to remove all the lines on the graph, then click one by one on the area names you'd like to see data for.

In [7]:
df.iplot(mode='lines', x="date", y='newCasesBySpecimenDate',
        categories='areaName', xTitle='Date',yTitle='New Cases',
        title='New COVID-19 Cases per Day within UK ltla')

In [8]:
df.iplot(mode='lines', x="date", y='cumCasesBySpecimenDate',
        categories='areaName', xTitle='Date',yTitle='Total Cases',
        title='Total COVID-19 Cases within UK ltla')

In [9]:
areas = ['areaType=nation']
data = {'areaName':'areaName',
         'date':'date',
         'newCasesByPublishDate':'newCasesByPublishDate',
         'cumCasesByPublishDate':'cumCasesByPublishDate',
         'maleCases':'maleCases',
         'femaleCases':'femaleCases',
         'cumCasesByPublishDateRate':'cumCasesByPublishDateRate',
         'cumDeaths28DaysByPublishDateRate':'cumDeaths28DaysByPublishDateRate',
         'newDeaths28DaysByPublishDate':'newDeaths28DaysByPublishDate',
         'cumDeaths28DaysByPublishDate':'cumDeaths28DaysByPublishDate'
        }

In [10]:
api = Cov19API(filters=areas, structure=data)
file_path = os.path.join("datasets")
os.makedirs(file_path, exist_ok=True)
csv_path = os.path.join(file_path, 'national_data.csv')
try:
    api.get_csv(save_as=csv_path)
except:
    print("Couldn't obtain new dataset from API")
print("")




In [11]:
df2 = pd.read_csv(csv_path)

In [12]:
df2.head()

Unnamed: 0,areaName,date,newCasesByPublishDate,cumCasesByPublishDate,maleCases,femaleCases,cumCasesByPublishDateRate,cumDeaths28DaysByPublishDateRate,newDeaths28DaysByPublishDate,cumDeaths28DaysByPublishDate
0,England,2020-08-23,938,281457.0,"[{'age': '20_to_24', 'value': 5248, 'rate': 33...","[{'age': '50_to_54', 'value': 13173, 'rate': 7...",500.0,65.4,4.0,36786.0
1,England,2020-08-22,1060,280519.0,,,498.4,65.3,17.0,36782.0
2,England,2020-08-21,908,279459.0,,,496.5,65.3,2.0,36765.0
3,England,2020-08-20,1035,278551.0,,,494.9,65.3,6.0,36763.0
4,England,2020-08-19,707,277516.0,,,493.0,65.3,15.0,36757.0


In [13]:
df2.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 936 entries, 0 to 935
Data columns (total 10 columns):
 #   Column                            Non-Null Count  Dtype  
---  ------                            --------------  -----  
 0   areaName                          936 non-null    object 
 1   date                              936 non-null    object 
 2   newCasesByPublishDate             936 non-null    int64  
 3   cumCasesByPublishDate             730 non-null    float64
 4   maleCases                         1 non-null      object 
 5   femaleCases                       1 non-null      object 
 6   cumCasesByPublishDateRate         730 non-null    float64
 7   cumDeaths28DaysByPublishDateRate  652 non-null    float64
 8   newDeaths28DaysByPublishDate      647 non-null    float64
 9   cumDeaths28DaysByPublishDate      652 non-null    float64
dtypes: float64(5), int64(1), object(4)
memory usage: 73.2+ KB


In [14]:
df2.iplot(mode='lines', x="date", y='cumCasesByPublishDate',
        categories='areaName', xTitle='Date',yTitle='Total Cases',
        title='Total COVID-19 Cases within UK Nations')

<div class='alert-info'>
There's a lot of missing data from this section of the dataset which will explain the sudden spikes.

In [15]:
df2.iplot(mode='lines', x="date", y='cumDeaths28DaysByPublishDate',
        categories='areaName', xTitle='Date',yTitle='Total Deaths',
        title='Total COVID-19 Deaths within 28 Days per UK Nation')

In [16]:
england = df2[df2['areaName']=="England"]
wales = df2[df2['areaName']=="Wales"]
scotland = df2[df2['areaName']=="Scotland"]
northern_ireland = df2[df2['areaName']=="Northern Ireland"]

In [17]:
fig = make_subplots(
    rows=2, cols=2, subplot_titles=("England", "Wales", "Scotland", "Northern Ireland")
)

# Add traces
fig.add_trace(go.Line(name="Cases", x=england['date'], y=england['newCasesByPublishDate']), row=1, col=1)
fig.add_trace(go.Line(name="Deaths", x=england['date'], y=england['newDeaths28DaysByPublishDate']), row=1, col=1)

fig.add_trace(go.Line(name="Cases", x=wales['date'], y=wales['newCasesByPublishDate']), row=1, col=2)
fig.add_trace(go.Line(name="Deaths", x=wales['date'], y=wales['newDeaths28DaysByPublishDate']), row=1, col=2)

fig.add_trace(go.Line(name="Cases", x=scotland['date'], y=scotland['newCasesByPublishDate']), row=2, col=1)
fig.add_trace(go.Line(name="Deaths", x=scotland['date'], y=scotland['newDeaths28DaysByPublishDate']), row=2, col=1)

fig.add_trace(go.Line(name="Cases", x=northern_ireland['date'], y=northern_ireland['newCasesByPublishDate']), row=2, col=2)
fig.add_trace(go.Line(name="Deaths", x=northern_ireland['date'], y=northern_ireland['newDeaths28DaysByPublishDate']), row=2, col=2)

# Update xaxis properties
fig.update_xaxes(title_text="Date", row=1, col=1)
fig.update_xaxes(title_text="Date", row=1, col=2)
fig.update_xaxes(title_text="Date", row=2, col=1)
fig.update_xaxes(title_text="Date", row=2, col=2)

# Update yaxis properties
fig.update_yaxes(title_text="Daily Amount", row=1, col=1)
fig.update_yaxes(title_text="Daily Amount", row=1, col=2)
fig.update_yaxes(title_text="Daily Amount", row=2, col=1)
fig.update_yaxes(title_text="Daily Amount", row=2, col=2)

# Update title and height
fig.update_layout(title_text="Daily Amount of COVID-19 Cases vs Deaths within 28 Days per UK Nation",
                  showlegend=False, height=700, width=950)

fig.show()

In [18]:
fig = make_subplots(
    rows=2, cols=2, subplot_titles=("England", "Wales", "Scotland", "Northern Ireland")
)

# Add traces
fig.add_trace(go.Line(name="Case Rate", x=england['date'], y=england['cumCasesByPublishDateRate']), row=1, col=1)
fig.add_trace(go.Line(name="Death Rate", x=england['date'], y=england['cumDeaths28DaysByPublishDateRate']), row=1, col=1)

fig.add_trace(go.Line(name="Case Rate", x=wales['date'], y=wales['cumCasesByPublishDateRate']), row=1, col=2)
fig.add_trace(go.Line(name="Deaths", x=wales['date'], y=wales['cumDeaths28DaysByPublishDateRate']), row=1, col=2)

fig.add_trace(go.Line(name="Case Rate", x=scotland['date'], y=scotland['cumCasesByPublishDateRate']), row=2, col=1)
fig.add_trace(go.Line(name="Death Rate", x=scotland['date'], y=scotland['cumDeaths28DaysByPublishDateRate']), row=2, col=1)

fig.add_trace(go.Line(name="Case Rate", x=northern_ireland['date'], y=northern_ireland['cumCasesByPublishDateRate']), row=2, col=2)
fig.add_trace(go.Line(name="Death Rate", x=northern_ireland['date'], y=northern_ireland['cumDeaths28DaysByPublishDateRate']), row=2, col=2)

# Update xaxis properties
fig.update_xaxes(title_text="Date", row=1, col=1)
fig.update_xaxes(title_text="Date", row=1, col=2)
fig.update_xaxes(title_text="Date", row=2, col=1)
fig.update_xaxes(title_text="Date", row=2, col=2)

# Update yaxis properties
fig.update_yaxes(title_text="Rate per 100k population", row=1, col=1)
fig.update_yaxes(title_text="Rate per 100k population", row=1, col=2)
fig.update_yaxes(title_text="Rate per 100k population", row=2, col=1)
fig.update_yaxes(title_text="Rate per 100k population", row=2, col=2)

# Update title and height
fig.update_layout(title_text="Cumulative COVID-19 Case Rate vs Death Rate within 28 Days per UK Nation",
                  showlegend=False, height=700, width=950)

fig.show()

In [19]:
def df_maker(list1):
    df = [x + "}" for x in list1[0][1:-1].split('}, ') if x[-1]!='}']
    df = [json.loads(x.replace("'", "\"")) for x in df]
    df = pd.DataFrame(df)
    return df

In [20]:
def sorter(x):
    pattern = re.compile(r'\d\d?')
    for x in x.split("_"):
        return int(pattern.search(x).group())

In [21]:
male_cases = df_maker(df2.maleCases.dropna().values)

In [22]:
male_cases['age'] = sorted(male_cases['age'], key=lambda x: sorter(x))

In [23]:
male_cases.head()

Unnamed: 0,age,value,rate
0,5_to_9,5248,330.5
1,10_to_14,5836,3510.7
2,15_to_19,7154,537.8
3,20_to_24,7499,394.6
4,25_to_29,6487,479.5


In [24]:
female_cases = df_maker(df2.femaleCases.dropna().values)

In [25]:
female_cases['age'] = sorted(female_cases['age'], key=lambda x: sorter(x))

In [26]:
female_cases.head()

Unnamed: 0,age,value,rate
0,0_to_4,13173,702.0
1,5_to_9,10778,571.9
2,10_to_14,6613,635.9
3,15_to_19,12878,759.7
4,20_to_24,11820,596.8


In [27]:
fig = go.Figure(data=[go.Bar(name="Male", x=male_cases["age"], y=male_cases["value"]),
                     go.Bar(name="Female", x=female_cases["age"], y=female_cases["value"])])
fig.update_layout(title_text='Total COVID-19 Cases in England by Gender',
                  xaxis_title="Age",
                yaxis_title="Cases",
                legend_title="Gender",)
fig.show()

In [28]:
fig = go.Figure(data=[go.Bar(name="Male", x=male_cases["age"], y=male_cases["rate"]),
                     go.Bar(name="Female", x=female_cases["age"], y=female_cases["rate"])])
fig.update_layout(title_text='COVID-19 Case Rates in England by Gender',
                  xaxis_title="Age",
                yaxis_title="Case Rate",
                legend_title="Gender",)
fig.show()

<div class="alert-info">
There's some missing case data in certain age groups, which explains why there appears to be no cases at all in those age ranges.

In [29]:
areas = ['areaType=nation']
data = {'areaName':'areaName',
         'date':'date',
         'newAdmissions':'newAdmissions',
         'cumAdmissions':'cumAdmissions',
         'cumAdmissionsByAge':'cumAdmissionsByAge',
         'cumTestsByPublishDate':'cumTestsByPublishDate',
         'newTestsByPublishDate':'newTestsByPublishDate',
         'covidOccupiedMVBeds':'covidOccupiedMVBeds',
         'hospitalCases':'hospitalCases',
        }

In [30]:
api = Cov19API(filters=areas, structure=data)
file_path = os.path.join("datasets")
os.makedirs(file_path, exist_ok=True)
csv_path = os.path.join(file_path, 'hospital_data.csv')
try:
    api.get_csv(save_as=csv_path)
except:
    print("Couldn't obtain new dataset from API")
print("")




In [31]:
df3 = pd.read_csv(csv_path)

In [32]:
df3.head()

Unnamed: 0,areaName,date,newAdmissions,cumAdmissions,cumAdmissionsByAge,cumTestsByPublishDate,newTestsByPublishDate,covidOccupiedMVBeds,hospitalCases
0,England,2020-08-21,,,,,,64.0,480.0
1,England,2020-08-20,17.0,113198.0,"[{'age': '0_to_5', 'value': 686, 'rate': 16.9}...",10762100.0,149876.0,64.0,516.0
2,England,2020-08-19,58.0,113181.0,"[{'age': '0_to_5', 'value': 685, 'rate': 16.9}...",10609700.0,133540.0,63.0,523.0
3,England,2020-08-18,56.0,113123.0,"[{'age': '0_to_5', 'value': 685, 'rate': 16.9}...",10476200.0,119749.0,64.0,545.0
4,England,2020-08-17,46.0,113067.0,"[{'age': '0_to_5', 'value': 685, 'rate': 16.9}...",10356400.0,141677.0,67.0,572.0


In [33]:
df3.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 661 entries, 0 to 660
Data columns (total 9 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   areaName               661 non-null    object 
 1   date                   661 non-null    object 
 2   newAdmissions          644 non-null    float64
 3   cumAdmissions          644 non-null    float64
 4   cumAdmissionsByAge     155 non-null    object 
 5   cumTestsByPublishDate  275 non-null    float64
 6   newTestsByPublishDate  256 non-null    float64
 7   covidOccupiedMVBeds    603 non-null    float64
 8   hospitalCases          628 non-null    float64
dtypes: float64(6), object(3)
memory usage: 46.6+ KB


In [34]:
df3.iplot(mode="lines", x="date", y="newAdmissions",
         xTitle="Date",yTitle="New Admissions", categories="areaName",
         title="Daily COVID-19 Hospital Admissions within UK Nations")

In [35]:
df3.iplot(mode="lines", x="date", y="cumAdmissions",
         xTitle="Date",yTitle="Total Admissions", categories="areaName",
         title="Total COVID-19 Hospital Admissions within UK Nations")

In [36]:
age_admissions = df_maker(df3.cumAdmissionsByAge.dropna().values)

In [37]:
age_admissions['age'] = sorted(age_admissions['age'], key=lambda x: sorter(x))

In [38]:
age_admissions.head()

Unnamed: 0,age,value,rate
0,0_to_5,686,16.9
1,6_to_17,47340,537.1
2,18_to_64,618,7.8
3,65_to_84,36134,106.8


In [39]:
fig = go.Figure(data=[go.Bar(name='Total Admissions',x=age_admissions["age"], y=age_admissions["value"]),
                     go.Bar(name='Admission Rate',x=age_admissions["age"], y=age_admissions["rate"])])
fig.update_layout(title_text='Total Hospital Admissions vs Admission Rate per Age Group',
                  xaxis_title="Age",
                yaxis_title="Admissions",
                  yaxis_type="log"
                )
fig.show()

In [40]:
df3.iplot(mode="lines", x="date", y="newTestsByPublishDate",
         xTitle="Date",yTitle="New Tests", categories="areaName",
         title="Daily Amount of COVID-19 Tests within UK Nations")

In [41]:
df3.iplot(mode="lines", x="date", y="cumTestsByPublishDate",
         xTitle="Date",yTitle="Total Tests", categories="areaName",
         title="Total Amount of COVID-19 Tests within UK Nations")

<div class="alert-info">
There's a lot of missing data within both of the testing datasets, which explains the massive gap before July for most countries.

In [42]:
df3.areaName.unique()

array(['England', 'Northern Ireland', 'Scotland', 'Wales'], dtype=object)

In [43]:
england = df3[df3['areaName']=="England"]
wales = df3[df3['areaName']=="Wales"]
scotland = df3[df3['areaName']=="Scotland"]
northern_ireland = df3[df3['areaName']=="Northern Ireland"]

In [44]:
fig = make_subplots(
    rows=2, cols=2, subplot_titles=("England", "Wales", "Scotland", "Northern Ireland")
)

# Add traces
fig.add_trace(go.Line(name="MV Beds", x=england['date'], y=england['covidOccupiedMVBeds']), row=1, col=1)
fig.add_trace(go.Line(name="Hospital Cases", x=england['date'], y=england['hospitalCases']), row=1, col=1)

fig.add_trace(go.Line(name="MV Beds", x=wales['date'], y=wales['covidOccupiedMVBeds']), row=1, col=2)
fig.add_trace(go.Line(name="Hospital Cases", x=wales['date'], y=wales['hospitalCases']), row=1, col=2)

fig.add_trace(go.Line(name="MV Beds", x=scotland['date'], y=scotland['covidOccupiedMVBeds']), row=2, col=1)
fig.add_trace(go.Line(name="Hospital Cases", x=scotland['date'], y=scotland['hospitalCases']), row=2, col=1)

fig.add_trace(go.Line(name="MV Beds", x=northern_ireland['date'], y=northern_ireland['covidOccupiedMVBeds']), row=2, col=2)
fig.add_trace(go.Line(name="Hospital Cases", x=northern_ireland['date'], y=northern_ireland['hospitalCases']), row=2, col=2)

# Update xaxis properties
fig.update_xaxes(title_text="Date", row=1, col=1)
fig.update_xaxes(title_text="Date", row=1, col=2)
fig.update_xaxes(title_text="Date", row=2, col=1)
fig.update_xaxes(title_text="Date", row=2, col=2)

# Update yaxis properties
fig.update_yaxes(title_text="Daily Amount", row=1, col=1)
fig.update_yaxes(title_text="Daily Amount", row=1, col=2)
fig.update_yaxes(title_text="Daily Amount", row=2, col=1)
fig.update_yaxes(title_text="Daily Amount", row=2, col=2)

# Update title and height
fig.update_layout(title_text="Daily Amount of Occupied MV Beds vs Hospital Cases for COVID-19 Patients",
                  showlegend=False, height=700, width=950)

fig.show()