# DIY Covid-19 Dashboard

Student Number: 221067606

This COVID-19 dashboard shows three plots:
- A plot of the maximum and minimum transmission rates in the nations of Britain,
- A plot of the percentage of people vaccinated in the nations of Britain over time,
- A plot of the cumulative death rates in Upper Tier Local Authority regions (hereafter: 'UTLA').

The data used in these charts comes from the Public Health England (hereafter: 'PHE') API.
Source: https://coronavirus.data.gov.uk/details/developers-guide/main-api
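
As a rough sketch of what a request to this API involves: the `uk_covid19` client takes a list of filters and a 'structure' dictionary that maps the column names you want to the API's metric names. The helper `build_nation_filters` and the sample response below are illustrative assumptions rather than part of the dashboard; the sample values are made up, and the network call is left commented out.

```python
def build_nation_filters(nation_name):
    """Hypothetical helper: build the filter list for one nation."""
    return ["areaType=nation", f"areaName={nation_name}"]


# Maps our own column names (left) to PHE metric names (right).
transmission_structure = {
    "date": "date",
    "transmission_min_growth_rate": "transmissionRateGrowthRateMin",
    "transmission_max_growth_rate": "transmissionRateGrowthRateMax",
}

# With network access, the request itself would look like this:
# from uk_covid19 import Cov19API
# api = Cov19API(filters=build_nation_filters("England"), structure=transmission_structure)
# response = api.get_json()

# Illustrative response with made-up values, in the shape dict{list[dict{}]}.
sample_response = {
    "data": [
        {"date": "2021-01-08", "transmission_min_growth_rate": 0.0, "transmission_max_growth_rate": 6.0},
        {"date": "2021-01-01", "transmission_min_growth_rate": 1.0, "transmission_max_growth_rate": 6.0},
    ]
}

# The API does not have to return rows in date order, so sort before plotting.
dates = sorted(entry["date"] for entry in sample_response["data"])
print(build_nation_filters("England"))
print(dates)
```

The same pattern applies to every plot in this dashboard: only the filters and the structure dictionary change.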

In [3]:
from uk_covid19 import Cov19API
from IPython.display import clear_output
from os import listdir as os_listdir
import ipywidgets as wdg
import json
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import time
import random
import datetime

%matplotlib inline
plt.rcParams["figure.dpi"] = 100

In [4]:
# --------------------------------------------------------------------
# Structures and filters used for sending requests to the PHE API
# --------------------------------------------------------------------

# Regional filters
filter_region_nation_england = ["areaType=nation", "areaName=England"]

filter_region_nation_wales = ["areaType=nation", "areaName=Wales"]

filter_region_nation_scotland = ["areaType=nation", "areaName=Scotland"]

"""
Returns a list of strings, each of which is the name of a UTLA area in the United Kingdom.
Source: https://geoportal.statistics.gov.uk/datasets/ons::upper-tier-local-authorities-dec-2022-names-and-codes-in-the-united-kingdom/explore
"""
def get_utla_region_list():
    return [
        "Aberdeen City",
        "Aberdeenshire",
        "Angus",
        "Antrim and Newtownabbey",
        "Ards and North Down",
        "Argyll and Bute",
        "Armagh City, Banbridge and Craigavon",
        "Barking and Dagenham",
        "Barnet",
        "Barnsley",
        "Bath and North East Somerset",
        "Bedford",
        "Belfast",
        "Bexley",
        "Birmingham",
        "Blackburn with Darwen",
        "Blackpool",
        "Blaenau Gwent",
        "Bolton",
        "Bournemouth, Christchurch and Poole",
        "Bracknell Forest",
        "Bradford",
        "Brent",
        "Bridgend",
        "Brighton and Hove",
        "Bristol, City of",
        "Bromley",
        "Buckinghamshire",
        "Bury",
        "Caerphilly",
        "Calderdale",
        "Cambridgeshire",
        "Camden",
        "Cardiff",
        "Carmarthenshire",
        "Causeway Coast and Glens",
        "Central Bedfordshire",
        "Ceredigion",
        "Cheshire East",
        "Cheshire West and Chester",
        "City of Edinburgh",
        "City of London",
        "Clackmannanshire",
        "Conwy",
        "Cornwall",
        "County Durham",
        "Coventry",
        "Croydon",
        "Cumbria",
        "Darlington",
        "Denbighshire",
        "Derby",
        "Derbyshire",
        "Derry City and Strabane",
        "Devon",
        "Doncaster",
        "Dorset",
        "Dudley",
        "Dumfries and Galloway",
        "Dundee City",
        "Ealing",
        "East Ayrshire",
        "East Dunbartonshire",
        "East Lothian",
        "East Renfrewshire",
        "East Riding of Yorkshire",
        "East Sussex",
        "Enfield",
        "Essex",
        "Falkirk",
        "Fermanagh and Omagh",
        "Fife",
        "Flintshire",
        "Gateshead",
        "Glasgow City",
        "Gloucestershire",
        "Greenwich",
        "Gwynedd",
        "Hackney",
        "Halton",
        "Hammersmith and Fulham",
        "Hampshire",
        "Haringey",
        "Harrow",
        "Hartlepool",
        "Havering",
        "Herefordshire, County of",
        "Hertfordshire",
        "Highland",
        "Hillingdon",
        "Hounslow",
        "Inverclyde",
        "Isle of Anglesey",
        "Isle of Wight",
        "Isles of Scilly",
        "Islington",
        "Kensington and Chelsea",
        "Kent",
        "Kingston upon Hull, City of",
        "Kingston upon Thames",
        "Kirklees",
        "Knowsley",
        "Lambeth",
        "Lancashire",
        "Leeds",
        "Leicester",
        "Leicestershire",
        "Lewisham",
        "Lincolnshire",
        "Lisburn and Castlereagh",
        "Liverpool",
        "Luton",
        "Manchester",
        "Medway",
        "Merthyr Tydfil",
        "Merton",
        "Mid and East Antrim",
        "Mid Ulster",
        "Middlesbrough",
        "Midlothian",
        "Milton Keynes",
        "Monmouthshire",
        "Moray",
        "Na h-Eileanan Siar",
        "Neath Port Talbot",
        "Newcastle upon Tyne",
        "Newham",
        "Newport",
        "Newry, Mourne and Down",
        "Norfolk",
        "North Ayrshire",
        "North East Lincolnshire",
        "North Lanarkshire",
        "North Lincolnshire",
        "North Northamptonshire",
        "North Somerset",
        "North Tyneside",
        "North Yorkshire",
        "Northumberland",
        "Nottingham",
        "Nottinghamshire",
        "Oldham",
        "Orkney Islands",
        "Oxfordshire",
        "Pembrokeshire",
        "Perth and Kinross",
        "Peterborough",
        "Plymouth",
        "Portsmouth",
        "Powys",
        "Reading",
        "Redbridge",
        "Redcar and Cleveland",
        "Renfrewshire",
        "Rhondda Cynon Taf",
        "Richmond upon Thames",
        "Rochdale",
        "Rotherham",
        "Rutland",
        "Salford",
        "Sandwell",
        "Scottish Borders",
        "Sefton",
        "Sheffield",
        "Shetland Islands",
        "Shropshire",
        "Slough",
        "Solihull",
        "Somerset",
        "South Ayrshire",
        "South Gloucestershire",
        "South Lanarkshire",
        "South Tyneside",
        "Southampton",
        "Southend-on-Sea",
        "Southwark",
        "St. Helens",
        "Staffordshire",
        "Stirling",
        "Stockport",
        "Stockton-on-Tees",
        "Stoke-on-Trent",
        "Suffolk",
        "Sunderland",
        "Surrey",
        "Sutton",
        "Swansea",
        "Swindon",
        "Tameside",
        "Telford and Wrekin",
        "Thurrock",
        "Torbay",
        "Torfaen",
        "Tower Hamlets",
        "Trafford",
        "Vale of Glamorgan",
        "Wakefield",
        "Walsall",
        "Waltham Forest",
        "Wandsworth",
        "Warrington",
        "Warwickshire",
        "West Berkshire",
        "West Dunbartonshire",
        "West Lothian",
        "West Northamptonshire",
        "West Sussex",
        "Westminster",
        "Wigan",
        "Wiltshire",
        "Windsor and Maidenhead",
        "Wirral",
        "Wokingham",
        "Wolverhampton",
        "Worcestershire",
        "Wrexham",
        "York",
    ]


In [5]:
"""
Included below is a list of helper functions used to query PHE for new data, save that
data to json files or pickles (in the case of UTLA data), convert the data into pandas
dataframes, and plot the data.
"""


# --------------------------------------------------------------------
# Data fetching, saving, and loading functions
# --------------------------------------------------------------------

"""
Sends a query to the PHE API and returns the JSON data.

Input:
    The filters used to query for data from PHE,
    The structure of the data to request,
    Please see the documentation: 
        https://coronavirus.data.gov.uk/details/developers-guide/main-api#query-parameters
Output:
    The JSON data from PHE, provided from the API in the python structures:
        dictionary { list [ dictionary{} ] }
"""
def query_public_health_england_API_for_data(
    filters_to_query_with, structure_to_query_with
):
    time.sleep(1)
    api = Cov19API(filters=filters_to_query_with, structure=structure_to_query_with)
    data_from_api = api.get_json()
    return data_from_api


def save_dictionary_data_to_JSON_file(
    dictionary_to_save, file_name_with_json_extension
):
    with open(file_name_with_json_extension, "wt") as OUTF:
        json.dump(dictionary_to_save, OUTF)


def load_dictionary_data_from_JSON_file(file_name_with_json_extension):
    variable_to_return = None
    with open(file_name_with_json_extension, "rt") as INFILE:
        variable_to_return = json.load(INFILE)

    return variable_to_return


def save_dataframe_to_pickle(dataframe_to_save, name_of_file_to_save_with_file_extension):
    dataframe_to_save.to_pickle(name_of_file_to_save_with_file_extension)


def load_dataframe_from_pickle(file_name_with_extension):
    return pd.read_pickle(file_name_with_extension)


def query_PHE_and_save_to_JSON(
    filters_to_query_with, structure_to_query_with, file_name_with_json_extension
):
    data_from_api = query_public_health_england_API_for_data(
        filters_to_query_with, structure_to_query_with
    )
    save_dictionary_data_to_JSON_file(data_from_api, file_name_with_json_extension)

    return data_from_api

"""
This function is used to query PHE for new data.
It responds to a button being pressed, and then runs multiple functions to query PHE.
Data from PHE is then saved to json files, overwriting the current files.
The function sleeps for 1 second between each function call, to avoid sending too many requests to PHE too quickly.
"""
def updated_PHE_datasets(button):
    # Vaccination percentages structure:
    vaccination_percentages_structure = {
        "date": "date",
        "percentage_first_vaccinations": "cumVaccinationFirstDoseUptakeByPublishDatePercentage",
        "percentage_second_vaccinations": "cumVaccinationSecondDoseUptakeByPublishDatePercentage",
        "percentage_third_vaccinations": "cumVaccinationThirdInjectionUptakeByPublishDatePercentage",
    }

    structure_for_transmission_rates = {
        "date": "date",
        "transmission_min_growth_rate": "transmissionRateGrowthRateMin",
        "transmission_max_growth_rate": "transmissionRateGrowthRateMax",
    }
    
    # Total vaccination percentages England
    query_PHE_and_save_to_JSON(
        filter_region_nation_england,
        vaccination_percentages_structure,
        "england_vaccination_percentages.json",
    )
    time.sleep(1)

    # Total vaccination percentages Wales
    query_PHE_and_save_to_JSON(
        filter_region_nation_wales,
        vaccination_percentages_structure,
        "wales_vaccination_percentages.json",
    )
    time.sleep(1)

    # Total vaccination percentages Scotland
    query_PHE_and_save_to_JSON(
        filter_region_nation_scotland,
        vaccination_percentages_structure,
        "scotland_vaccination_percentages.json",
    )
    time.sleep(1)

    # Transmission rates England
    query_PHE_and_save_to_JSON(
        filter_region_nation_england,
        structure_for_transmission_rates,
        "england_transmission_rates.json",
    )
    time.sleep(1)

    # Transmission rates Wales
    query_PHE_and_save_to_JSON(
        filter_region_nation_wales,
        structure_for_transmission_rates,
        "wales_transmission_rates.json",
    )
    time.sleep(1)

    # Transmission rates Scotland
    query_PHE_and_save_to_JSON(
        filter_region_nation_scotland,
        structure_for_transmission_rates,
        "scotland_transmission_rates.json",
    )
    time.sleep(1)


# --------------------------------------------------------------------
# Data wrangling functions
# --------------------------------------------------------------------

"""
Input:
    A set of data in the structure:
    outer_dict{ list[ inner_dicts_of_data{ name_of_data : data } ] }
Output:
    A list of the values stored under the given key, one per inner dictionary.
Arguments:
    Required:
        The data structure to extract data from,
        The name of the data / field of data of interest 
            (e.g. 'date', 'totalNumberOfPeopleWhoReceivedFirstVaccination').
    Optional:
        The key of the outer_dict under which the list of inner dictionaries is stored.
            By default, this value is set to 'data', to match the data collected from the Public 
            Health England API.
"""
def get_values_of_field_from_list_of_dictionaries(
    list_of_dictionaries, key_of_inner_dict, outer_dictionary_name="data"
):
    return [
        list_of_dictionaries[outer_dictionary_name][i][key_of_inner_dict]
        for i in range(len(list_of_dictionaries[outer_dictionary_name]))
    ]


"""
Input:
    A date value, or a list of date values, as strings in the format YYYY-MM-DD.
Output:
    The corresponding pandas datetime object (or a DatetimeIndex, for a list).
"""
def convert_to_pandas_datetime(date_values):
    return pd.to_datetime(date_values, format="%Y-%m-%d")

"""
Input:
    Preferred start date as a pandas datetime object,
    Preferred end date as a pandas datetime object,
    Desired frequency ('D' for day, 'W' for week, 'M' for month).
Output:
    A series of pandas datetime objects from the start to the end date at the preferred frequency.
"""
def generate_pandas_dataframe_with_start_date_and_end_date(
    start_date, end_date, frequency_to_generate="D"
):
    return pd.date_range(start_date, end_date, freq=frequency_to_generate)


"""
Input:
    The values of the independent variable (e.g. a pandas DatetimeIndex),
    A plain python list of column names (as strings).
Output:
    An empty pandas dataframe, with the independent variable used as the index and the list of columns as the dataframe column headers.
"""
def create_pandas_dataframe_with_independent_variable_and_column_names(
    pandas_dataframe_with_data_for_independent_variable, list_of_columns_to_use_in_frame
):
    return pd.DataFrame(
        index=pandas_dataframe_with_data_for_independent_variable,
        columns=list_of_columns_to_use_in_frame,
    )


"""
Input:
    A pandas dataframe which one wants to take a subset of.
        The index of this data frame must be a set of pandas dataframe date values.
    A start date.
    An end date.
Output:
    A pandas dataframe with only the data from the start to the end date.
Note:
    If no start date value is provided, this will default to the earliest date in the original
    dataframe.
    If no end date value is provided, this will default to the latest date in the original
    dataframe.
"""
def get_subset_of_data_frame_based_on_dates(
    data_frame_to_subset, starting_date, end_date
):
    earliest_date = data_frame_to_subset.index[0]
    latest_date = data_frame_to_subset.index[-1]

    # A missing date (None) defaults to the earliest / latest date in the dataframe.
    time_stamp_of_starting_date = (
        pd.Timestamp(starting_date) if starting_date is not None else earliest_date
    )
    time_stamp_of_ending_date = (
        pd.Timestamp(end_date) if end_date is not None else latest_date
    )

    # Clamp both dates to the range covered by the dataframe.
    time_stamp_of_starting_date = min(
        max(time_stamp_of_starting_date, earliest_date), latest_date
    )
    time_stamp_of_ending_date = min(
        max(time_stamp_of_ending_date, earliest_date), latest_date
    )

    # An end date before the start date collapses the range to the start date.
    if time_stamp_of_ending_date < time_stamp_of_starting_date:
        time_stamp_of_ending_date = time_stamp_of_starting_date

    return data_frame_to_subset[
        (data_frame_to_subset.index >= time_stamp_of_starting_date)
        & (data_frame_to_subset.index <= time_stamp_of_ending_date)
    ]


def fill_in_dataframe(raw_data_from_phe, list_of_column_names, dataframe_to_return):
    for entry in raw_data_from_phe["data"]:
        date = convert_to_pandas_datetime(entry["date"])

        for column in list_of_column_names:
            if pd.isna(dataframe_to_return.loc[date, column]):
                # Missing values from PHE (None) are recorded as 0.
                value = float(entry[column]) if entry[column] is not None else 0
                dataframe_to_return.loc[date, column] = value

    return dataframe_to_return


def wrangle_data_from_phe_to_dataframe(raw_data_from_phe, list_of_column_names):
    # Extract the date values from the raw data as a list, and sort them.
    pd_of_dates = get_values_of_field_from_list_of_dictionaries(
        raw_data_from_phe, "date"
    )
    pd_of_dates.sort()

    # Convert the sorted date strings into pandas datetime values
    date_values = convert_to_pandas_datetime(pd_of_dates)

    # Make dataframe from list of columns names
    dataframe_to_return = (
        create_pandas_dataframe_with_independent_variable_and_column_names(
            date_values, list_of_column_names
        )
    )

    fill_in_dataframe(raw_data_from_phe, list_of_column_names, dataframe_to_return)

    return dataframe_to_return

"""
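Purpose:
    This is a helper function for the vaccination percentages plots, and is exclusive to these plots.
Input:
    The raw data loaded from a PHE JSON file, in the structure dict{list[dict{}]}.
Output:
    A pandas dataframe of the time series data, in which the cumulative first and second dose
    percentages have been reduced to the share of people with exactly one and exactly two doses,
    so that the three columns can be stacked.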
"""
def wrangle_vaccination_percentage_data_from_raw_data(data_from_file):
    time_series_of_data = wrangle_data_from_phe_to_dataframe(
        data_from_file,
        [
            "percentage_first_vaccinations",
            "percentage_second_vaccinations",
            "percentage_third_vaccinations",
        ],
    )

    time_series_of_data["percentage_first_vaccinations"] = (
        time_series_of_data["percentage_first_vaccinations"]
        - time_series_of_data["percentage_second_vaccinations"]
    )
    time_series_of_data["percentage_second_vaccinations"] = (
        time_series_of_data["percentage_second_vaccinations"]
        - time_series_of_data["percentage_third_vaccinations"]
    )

    return time_series_of_data

def produce_data_for_vaccination_percentage_plots_from_json(name_of_json_file):
    data_from_file = load_dictionary_data_from_JSON_file(name_of_json_file)

    time_series_of_data = wrangle_vaccination_percentage_data_from_raw_data(
        data_from_file)

    return data_from_file, time_series_of_data

# --------------------------------------------------------------------
# Drawing functions
# --------------------------------------------------------------------
def draw_stacked_bar_plot(
    pandas_dataframe_to_draw_with,
    labels_for_data,
    title,
    x_label,
    y_label,
    horizontal_line_value=None,
    horizontal_line_text=None,
):
    _, ax = plt.subplots()

    stacked_bar_plot_of_vaccination_percentages = pandas_dataframe_to_draw_with.plot(
        kind="bar", stacked=True, ax=ax
    )

    stacked_bar_plot_of_vaccination_percentages.locator_params(axis="x", nbins=20)
    stacked_bar_plot_of_vaccination_percentages.locator_params(axis="y", nbins=10)
    ax.autoscale(enable=True, axis="x")

    ax.legend(labels=labels_for_data, bbox_to_anchor=(1.02, 1), loc="upper left")

    stacked_bar_plot_of_vaccination_percentages.set_title(title)
    stacked_bar_plot_of_vaccination_percentages.set_xlabel(x_label)
    stacked_bar_plot_of_vaccination_percentages.set_ylabel(y_label)

    if horizontal_line_value is not None:
        stacked_bar_plot_of_vaccination_percentages.axhline(
            y=horizontal_line_value, linestyle="-."
        )
        stacked_bar_plot_of_vaccination_percentages.text(
            x=0, y=horizontal_line_value + 5, s=horizontal_line_text
        )

def draw_line_chart_with_pandas(
    pandas_dataframe_to_draw_with, labels_for_data, title, x_label, y_label
):
    _, ax = plt.subplots()

    line_bar_plot_of_transmission_rates = pandas_dataframe_to_draw_with.plot(
        kind="line", stacked=False, ax=ax
    )

    # line_bar_plot_of_transmission_rates.locator_params(axis="x", nbins=20)
    # line_bar_plot_of_transmission_rates.locator_params(axis="y", nbins=10)

    ax.autoscale(enable=True, axis="x")

    num_legend_cols = 1
    bbox_anchor_loc = (1.02, 1)
    location_string = "upper left"
    if len(pandas_dataframe_to_draw_with.index) > 40:
        num_legend_cols = 4
        bbox_anchor_loc = (0.5, -0.15)
        location_string = "upper center"

    ax.legend(labels=labels_for_data, bbox_to_anchor=bbox_anchor_loc, loc=location_string, ncol=num_legend_cols)

    line_bar_plot_of_transmission_rates.set_title(title)
    line_bar_plot_of_transmission_rates.set_xlabel(x_label)
    line_bar_plot_of_transmission_rates.set_ylabel(y_label)

### Transmission Rates

The first positive tests for COVID-19 in Britain were recorded in late January 2020 ([Source](https://www.bbc.co.uk/news/health-51325192)). Over the following years, the nations of England, Scotland and Wales introduced multiple legal restrictions on movement ('lockdowns'), with the purpose of limiting the spread of the virus. The first lockdown was formally declared on 23 March 2020 ([Source](https://www.instituteforgovernment.org.uk/charts/uk-government-coronavirus-lockdowns)).

The transmission of the virus was recorded with a reproduction number ('R value'), which summarised how many additional infections were expected for each infection that occurred. For example, with an R value of 2, if one person is infected, two additional people would be expected to become infected too (resulting in three total infections, before considering additional infections beyond that) ([Source](https://www.bbc.co.uk/news/health-52473523)).
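
The arithmetic in the example above can be sketched in a few lines. This is a simplified illustration that assumes every infection causes exactly R further infections in each generation; the function name `cumulative_infections` is hypothetical and is not used elsewhere in the dashboard.

```python
def cumulative_infections(r_value, generations, initial_infections=1):
    """Total infections after a number of generations, assuming a constant R value."""
    total = initial_infections
    new_infections = initial_infections
    for _ in range(generations):
        # Each infection in the previous generation causes r_value new infections.
        new_infections = new_infections * r_value
        total += new_infections
    return total


# One initial infection with R = 2: two further infections, three in total.
print(cumulative_infections(2, 1))  # 3
# A second generation adds four more infections: 1 + 2 + 4 = 7.
print(cumulative_infections(2, 2))  # 7
```

With an R value below 1, each generation is smaller than the last, which is why bringing R below 1 was the stated aim of the restrictions.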

The chart below shows the estimated minimum and maximum growth rate, that is, the percentage change in the number of infections from one day to the next.

For more information on the statistics used, please refer to:

[Public Health England documentation on the maximum transmission rate data.](https://coronavirus.data.gov.uk/metrics/doc/transmissionRateGrowthRateMax)

[Public Health England documentation on the minimum transmission rate data.](https://coronavirus.data.gov.uk/metrics/doc/transmissionRateGrowthRateMin)
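
To give a feel for what a daily growth rate means, the sketch below compounds a constant percentage change over a number of days and estimates the corresponding doubling time. It assumes the growth rate stays constant, which real data does not, and the function names are illustrative rather than part of the dashboard.

```python
import math


def project_infections(current_infections, growth_rate_percent, days):
    """Compound a constant daily percentage change over a number of days."""
    return current_infections * (1 + growth_rate_percent / 100) ** days


def doubling_time_in_days(growth_rate_percent):
    """Days needed for infections to double at a constant positive daily growth rate."""
    if growth_rate_percent <= 0:
        raise ValueError("Infections only double when the growth rate is positive.")
    return math.log(2) / math.log(1 + growth_rate_percent / 100)


# 1,000 infections growing by 5% per day for one week.
print(round(project_infections(1000, 5, 7)))  # 1407
# At 5% per day, infections double roughly every two weeks.
print(round(doubling_time_in_days(5), 1))  # 14.2
```

A negative growth rate works the same way in `project_infections`: the number of infections shrinks by that percentage each day.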

In [22]:
"""
This is a class to represent the transmission rate plots.

Contained within are functions specific to loading, wrangling, plotting, and updating this information.
"""
class transmission_rate_plots():

    def __init__(self):
        (self.england_transmissions_rates_from_json_file, self.england_transmission_rates_time_series) = self.produce_data_for_transmission_rates_plots("england_transmission_rates.json")
        (self.wales_transmissions_rates_from_json_file, self.wales_transmission_rates_time_series) = self.produce_data_for_transmission_rates_plots("wales_transmission_rates.json")
        (self.scotland_transmissions_rates_from_json_file, self.scotland_transmission_rates_time_series) = self.produce_data_for_transmission_rates_plots("scotland_transmission_rates.json")
        self.nation_transmission_rate_selector = wdg.SelectMultiple(
            options = ['England Max', 'England Min', 'Wales Max', 'Wales Min', 'Scotland Max', 'Scotland Min'],
            value=['England Max', 'England Min', 'Wales Max', 'Wales Min', 'Scotland Max', 'Scotland Min'],
            rows=6,
            description="Country rates",
            disabled=False
        )
        
        self.tranmission_rate_output=wdg.interactive_output(self.draw_britain_transmission_rate_plot, {'countries_list': self.nation_transmission_rate_selector})
        
        self.update_data_widget = wdg.Button(
            description="Update datasets from Public Health England",
            disabled=False,
            button_style="",
            tooltip="Click me",
            icon="fa-bar-chart",
        )
        self.update_data_widget.on_click(self.update_tranmission_data)
        
        display(self.nation_transmission_rate_selector, self.tranmission_rate_output, self.update_data_widget)
        

    # Transmission rates plot
    def produce_data_for_transmission_rates_plots(self, name_of_json_file):
        data_from_file = load_dictionary_data_from_JSON_file(name_of_json_file)

        time_series_of_data = wrangle_data_from_phe_to_dataframe(
            data_from_file,
            ["transmission_min_growth_rate", "transmission_max_growth_rate"],
        )

        return data_from_file, time_series_of_data

    # Function to draw line charts, with the maximum and minimum transmission rate data from PHE.
    # Dates of events from: https://www.instituteforgovernment.org.uk/charts/uk-government-coronavirus-lockdowns
    def draw_britain_transmission_rate_plot(self, countries_list):
        _, ax = plt.subplots()

        # Plot a zero-valued placeholder first, so that the axes, title and labels exist even
        # when no series is selected. The placeholder lines are removed again below.
        base_df = self.england_transmission_rates_time_series.copy()
        base_df['transmission_min_growth_rate'] = 0.0
        base_df['transmission_max_growth_rate'] = 0.0
        base_df.plot(ax=ax, stacked=False, grid=True, title="Transmission growth rate in the nations of Britain", xlabel="Date", ylabel="Growth rate (% change in infections per day)")

        # (selector option, time series, column, line style) for each selectable series.
        selectable_series = [
            ('England Max', self.england_transmission_rates_time_series, 'transmission_max_growth_rate', '-'),
            ('England Min', self.england_transmission_rates_time_series, 'transmission_min_growth_rate', '-'),
            ('Wales Max', self.wales_transmission_rates_time_series, 'transmission_max_growth_rate', '-'),
            ('Wales Min', self.wales_transmission_rates_time_series, 'transmission_min_growth_rate', '-'),
            ('Scotland Max', self.scotland_transmission_rates_time_series, 'transmission_max_growth_rate', '-.'),
            ('Scotland Min', self.scotland_transmission_rates_time_series, 'transmission_min_growth_rate', '-.'),
        ]

        # Label each line as it is drawn, so the legend stays correct for any selection.
        for option, time_series, column, line_style in selectable_series:
            if option in countries_list:
                time_series[column].plot(kind="line", ax=ax, grid=True, linestyle=line_style, label=option + ' Transmission Rate')

        # Remove the two placeholder lines. Artist.remove() is used because
        # ax.lines can no longer be modified directly in recent matplotlib versions.
        ax.lines[0].remove()
        ax.lines[0].remove()

        lockdown_events = [
            ('2020-03-23', 'First lockdown declared'),
            ('2020-06-23', 'Relaxing of restrictions'),
            ('2020-09-22', 'New restrictions in England'),
            ('2020-10-31', 'Second lockdown declared'),
            ('2020-12-02', 'Second lockdown ended'),
            ('2021-01-06', 'Third lockdown declared'),
            ('2021-07-19', 'Most legal restrictions lifted'),
        ]
        for event_date, event_description in lockdown_events:
            ax.axvline(x=pd.to_datetime(event_date, format="%Y-%m-%d"), ls='--', label=event_date + ': ' + event_description)

        ax.legend(bbox_to_anchor=(1.02, 1), loc="upper left")

        ax.autoscale(enable=True, axis="x")
        
    def update_tranmission_data(self, button):
        
        self.england_transmissions_rates_from_json_file = query_public_health_england_API_for_data(
            ["areaType=nation", "areaName=England"], 
            {
                "date": "date",
                "transmission_min_growth_rate": "transmissionRateGrowthRateMin",
                "transmission_max_growth_rate": "transmissionRateGrowthRateMax",
            }
        )

        self.england_transmission_rates_time_series = wrangle_data_from_phe_to_dataframe(
            self.england_transmissions_rates_from_json_file,
            ["transmission_min_growth_rate", "transmission_max_growth_rate"],
        )

        self.wales_transmissions_rates_from_json_file = query_public_health_england_API_for_data(
            ["areaType=nation", "areaName=Wales"], 
            {
                "date": "date",
                "transmission_min_growth_rate": "transmissionRateGrowthRateMin",
                "transmission_max_growth_rate": "transmissionRateGrowthRateMax",
            }
        )

        self.wales_transmission_rates_time_series = wrangle_data_from_phe_to_dataframe(
            self.wales_transmissions_rates_from_json_file,
            ["transmission_min_growth_rate", "transmission_max_growth_rate"],
        )

        self.scotland_transmissions_rates_from_json_file = query_public_health_england_API_for_data(
            ["areaType=nation", "areaName=Scotland"], 
            {
                "date": "date",
                "transmission_min_growth_rate": "transmissionRateGrowthRateMin",
                "transmission_max_growth_rate": "transmissionRateGrowthRateMax",
            }
        )

        self.scotland_transmission_rates_time_series = wrangle_data_from_phe_to_dataframe(
            self.scotland_transmissions_rates_from_json_file,
            ["transmission_min_growth_rate", "transmission_max_growth_rate"],
        )
        

tranmission_rate_plots_instance = transmission_rate_plots()

### Vaccination rates

The first vaccination in Britain was given on 8 December 2020 to Margaret Keenan, an NHS patient ([Source](https://www.england.nhs.uk/2020/12/landmark-moment-as-first-nhs-patient-receives-covid-19-vaccination/)). This was the Pfizer-BioNTech vaccine and marked the start of Britain's vaccination campaign. Vaccinations were initially provided to the most vulnerable (the elderly and those with weakened immune systems), starting with persons aged over 80 years old ([Source](https://www.england.nhs.uk/2020/12/nhs-vaccine-programme-turning-point-in-battle-against-the-pandemic/)).

The types of vaccinations included viral vector vaccines produced by Oxford-AstraZeneca (which achieved an efficacy rate of 72% according to the WHO; [Source](https://www.who.int/news-room/feature-stories/detail/the-oxford-astrazeneca-covid-19-vaccine-what-you-need-to-know)) and Johnson and Johnson ([Source](https://vk.ovg.ox.ac.uk/vk/covid-19-vaccines)). The COVID-19 pandemic also saw the first mass application of mRNA technology to vaccine production, with the introduction of the Moderna and Pfizer-BioNTech vaccines.

[More information on the Moderna Vaccine](https://www.who.int/news-room/feature-stories/detail/the-moderna-covid-19-mrna-1273-vaccine-what-you-need-to-know)

[More information on the Pfizer-BioNTech Vaccine](https://www.who.int/news-room/feature-stories/detail/who-can-take-the-pfizer-biontech-covid-19--vaccine-what-you-need-to-know)

Over eighty percent of the populations of England, Wales and Scotland had received at least one dose of a coronavirus vaccine by mid-2021. As new COVID-19 variants emerged, further booster vaccinations were recommended. By late 2022, approximately 70% of the populations of each of these nations had received three vaccinations. As of December 2022, 100% vaccination had not been achieved, and some five percent of the population in each nation had received only the first vaccination.

The below data only accounts for persons over 12 years old.

After clicking 'update datasets', please change one of the date values to show the latest data.


For more information on the statistics used, please refer to:

https://coronavirus.data.gov.uk/metrics/doc/cumVaccinationFirstDoseUptakeByPublishDatePercentage

https://coronavirus.data.gov.uk/metrics/doc/cumVaccinationSecondDoseUptakeByPublishDatePercentage

https://coronavirus.data.gov.uk/metrics/doc/cumVaccinationThirdInjectionUptakeByPublishDatePercentage
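
Before plotting, the records returned for these metrics need to be turned into a date-indexed table. The sketch below shows one way this could be done with pandas. It is not the notebook's own wrangling function, which is defined elsewhere; the function name, the short column names and the values are assumptions for illustration, and treating missing values as 0 is also an assumption.

```python
import pandas as pd

# Hypothetical response shape: a "data" list of records, newest first.
# The values are made up for illustration.
raw_response = {
    "data": [
        {"date": "2021-06-02", "first": 75.2, "second": 50.1, "third": None},
        {"date": "2021-06-01", "first": 75.0, "second": 49.8, "third": None},
    ]
}


def records_to_time_series(response: dict) -> pd.DataFrame:
    """Build a date-indexed frame, oldest row first, with missing values as 0."""
    frame = pd.DataFrame(response["data"])
    frame["date"] = pd.to_datetime(frame["date"])
    frame = frame.set_index("date").sort_index()
    # Convert every column to numbers; None becomes NaN, then 0.0 (assumption).
    frame = frame.apply(pd.to_numeric, errors="coerce")
    return frame.fillna(0.0)


time_series = records_to_time_series(raw_response)
print(time_series)
```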

In [None]:
"""
This class displays three stacked bar plots showing the percentage of the populations of England, Wales and Scotland that have been vaccinated, broken down by the number of vaccinations received.

Please note that the data from PHE only covers persons over 12 years of age.

"""
class vaccination_percentages_plots():
    def __init__(self):
        (self.england_vaccination_rates_from_json_file, self.england_vaccination_rates_time_series) = produce_data_for_vaccination_percentage_plots_from_json("england_vaccination_percentages.json")
        (self.wales_vaccination_rates_from_json_file, self.wales_vaccination_rates_time_series) = produce_data_for_vaccination_percentage_plots_from_json("wales_vaccination_percentages.json")
        (self.scotland_vaccination_rates_from_json_file, self.scotland_vaccination_rates_time_series) = produce_data_for_vaccination_percentage_plots_from_json("scotland_vaccination_percentages.json")

        self.vaccination_percentages_start_date = wdg.DatePicker(
            description="Start date",
            disabled=False,
            value=self.england_vaccination_rates_time_series.index[0],
        )

        self.vaccination_percentages_end_date = wdg.DatePicker(
            description="End date",
            disabled=False,
            value=self.england_vaccination_rates_time_series.index[-1],
        )

        self.vaccination_percentages_recalculate_button = wdg.Button(
            description="Redraw bar chart",
            disabled=False,
            button_style="",
            tooltip="Click me",
            icon="fa-bar-chart",
        )

        self.vaccination_percentages_output = wdg.interactive_output(
            self.redraw_vaccination_percentages_chart_in_response_to_event,
            {
                "start_date": self.vaccination_percentages_start_date,
                "end_date": self.vaccination_percentages_end_date,
            },
        )

        self.update_data_widget = wdg.Button(
            description="Update datasets from Public Health England",
            disabled=False,
            button_style="",
            tooltip="Click me",
            icon="fa-bar-chart",
        )
        self.update_data_widget.on_click(self.update_tranmission_data)

        display(
            self.vaccination_percentages_start_date,
            self.vaccination_percentages_end_date,
            self.vaccination_percentages_output,
            self.update_data_widget
        )


    def redraw_vaccination_percentages_chart_in_response_to_event(self, start_date, end_date):
        draw_stacked_bar_plot(
            get_subset_of_data_frame_based_on_dates(
                self.england_vaccination_rates_time_series,
                start_date,
                end_date,
            ),
            labels_for_data=["One vaccination", "Two vaccinations", "Three vaccinations"],
            title="Percentage of people vaccinated in England over time",
            x_label="Time",
            y_label="Percentage of population",
        )

        draw_stacked_bar_plot(
            get_subset_of_data_frame_based_on_dates(
                self.wales_vaccination_rates_time_series,
                start_date,
                end_date,
            ),
            labels_for_data=["One vaccination", "Two vaccinations", "Three vaccinations"],
            title="Percentage of people vaccinated in Wales over time",
            x_label="Time",
            y_label="Percentage of population",
        )

        draw_stacked_bar_plot(
            get_subset_of_data_frame_based_on_dates(
                self.scotland_vaccination_rates_time_series,
                start_date,
                end_date,
            ),
            labels_for_data=["One vaccination", "Two vaccinations", "Three vaccinations"],
            title="Percentage of people vaccinated in Scotland over time",
            x_label="Time",
            y_label="Percentage of population",
        )


    def update_tranmission_data(self, button):

        vaccination_percentages_structure = {
            "date": "date",
            "percentage_first_vaccinations": "cumVaccinationFirstDoseUptakeByPublishDatePercentage",
            "percentage_second_vaccinations": "cumVaccinationSecondDoseUptakeByPublishDatePercentage",
            "percentage_third_vaccinations": "cumVaccinationThirdInjectionUptakeByPublishDatePercentage",
        }

        self.england_vaccination_rates_from_json_file = query_public_health_england_API_for_data(
            ["areaType=nation", "areaName=England"], 
            vaccination_percentages_structure
        )
        
        self.england_vaccination_rates_time_series = wrangle_vaccination_percentage_data_from_raw_data(
            self.england_vaccination_rates_from_json_file
        )

        self.wales_vaccination_rates_from_json_file = query_public_health_england_API_for_data(
            ["areaType=nation", "areaName=Wales"], 
            vaccination_percentages_structure
        )
        
        self.wales_vaccination_rates_time_series = wrangle_vaccination_percentage_data_from_raw_data(
            self.wales_vaccination_rates_from_json_file
        )

        self.scotland_vaccination_rates_from_json_file = query_public_health_england_API_for_data(
            ["areaType=nation", "areaName=Scotland"], 
            vaccination_percentages_structure
        )
        
        self.scotland_vaccination_rates_time_series = wrangle_vaccination_percentage_data_from_raw_data(
            self.scotland_vaccination_rates_from_json_file
        )
        

vaccination_percentage_plots_instance = vaccination_percentages_plots()
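
The date pickers in the class above hand `datetime.date` values to the redraw function, while the time series are indexed by pandas timestamps. The sketch below shows one way a date-range filter could handle that conversion. It is a hypothetical stand-in, not the notebook's `get_subset_of_data_frame_based_on_dates` helper (defined elsewhere), and swapping the bounds when they arrive in the wrong order is an added assumption.

```python
import datetime

import pandas as pd


def subset_by_dates(frame, start_date, end_date):
    """Return rows whose index lies within [start_date, end_date], inclusive.

    Accepts datetime.date values, as produced by a date picker widget,
    and swaps the bounds if they were supplied in the wrong order.
    """
    start = pd.Timestamp(start_date)
    end = pd.Timestamp(end_date)
    if start > end:
        start, end = end, start
    return frame.loc[(frame.index >= start) & (frame.index <= end)]


# Illustrative data only.
frame = pd.DataFrame(
    {"value": range(5)},
    index=pd.date_range("2021-01-01", periods=5, freq="D"),
)

# The bounds are deliberately reversed to show the swap.
subset = subset_by_dates(frame, datetime.date(2021, 1, 4), datetime.date(2021, 1, 2))
print(subset)
```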

### Cumulative deaths

There were two major peaks of COVID-19 deaths in the United Kingdom, measured on a seven-day moving average. The first was on 9 April 2020, when approximately 975 people were dying each day. The second was on 19 January 2021, when approximately 1,291 people were dying each day ([Source](https://coronavirus.data.gov.uk/details/deaths)). These numbers may undercount the total number of deaths, as they only include those who tested positive for COVID-19 within 28 days of their deaths.

Across most UTLA regions, there was a large spike in the cumulative number of deaths at the beginning of 2021, corresponding with the second major spike in the death rate.
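
The chart below shows cumulative counts, whereas the peaks quoted above are daily figures smoothed over seven days. A minimal sketch of how the two are related, assuming pandas and made-up counts; the official figures may be calculated differently (for example with a differently aligned window):

```python
import pandas as pd

# Illustrative cumulative counts only; not real data.
cumulative = pd.Series(
    [0, 1, 3, 6, 10, 15, 21, 28, 36],
    index=pd.date_range("2020-03-01", periods=9, freq="D"),
    dtype=float,
)

# Daily deaths are the day-on-day change in the cumulative count.
daily = cumulative.diff().fillna(cumulative.iloc[0])

# Trailing seven-day mean; the first six days have no full window.
seven_day_average = daily.rolling(window=7).mean()
print(seven_day_average)
```

A steep section of a cumulative line therefore corresponds to a high daily average over the same period.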

Please check and uncheck boxes below to see the trends of different UTLA regions in the UK, and then click 'redraw chart' to display this data.

WARNING: The button to update the UTLA datasets takes a significant amount of time to run. Please only use this if needed.

For more information on the statistics used, please refer to:

https://coronavirus.data.gov.uk/metrics/doc/cumDeaths28DaysByDeathDate
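
The update is slow because each UTLA is queried separately, with a pause of 1 to 5 seconds between queries. With over 100 regions and an average pause of 3 seconds, the waiting alone comes to at least five minutes. The sketch below illustrates that pattern with a stand-in fetch function. All names here are hypothetical, and unlike the notebook's loop it skips the pause before the first query and records failures instead of stopping.

```python
import random
import time


def fetch_all_regions(regions, fetch_one, sleep=time.sleep, min_wait=1, max_wait=5):
    """Call fetch_one for each region, pausing between consecutive calls."""
    results = {}
    for position, region in enumerate(regions):
        if position > 0:
            # Random pause so that queries are not sent in quick succession.
            sleep(random.randint(min_wait, max_wait))
        try:
            results[region] = fetch_one(region)
        except Exception as error:
            # Keep going; remember which region failed and why.
            results[region] = error
    return results


# Stand-in fetcher and a recording "sleep" so the example runs instantly.
def fake_fetch(region):
    if region == "Nowhere":
        raise ValueError("no data for " + region)
    return {"area_name": region, "data": []}


recorded_waits = []
results = fetch_all_regions(
    ["Barnet", "Nowhere", "Bexley"], fake_fetch, sleep=recorded_waits.append
)
print(recorded_waits)
```

Passing the sleep function as a parameter is a design choice that makes the loop easy to test without actually waiting.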

In [None]:
"""
This is a class to load, wrangle, and display cumulative deaths across different Upper Tier Local Authorities (UTLA).

PHE requires that queries for each UTLA are sent individually. To reduce the initial load time of an instance of this class,
it loads from a pickle file rather than opening up each individual JSON file.

A button to update the entire dataset and save the new dataset to a pickle file is provided. To avoid sending too many queries 
to PHE too quickly, the program will sleep for 1 to 5 seconds between each query. It is preferable not to use this unless 
necessary, as it takes a significant amount of time to run.

Features not currently supported, but would be good to add:
- Overlapping the legend with the checkboxes,
- Buttons to check and uncheck all checkboxes,
- The ability to click on a line to hide it,
- Date selection options,
- A hovering feature to show the cumulative death count (available with mplcursors).
"""
class utla_plot_class():

    def __init__(self):
        # UTLA cumulative deaths plot
        self.utla_deaths_df = load_dataframe_from_pickle("ulta_deaths_pickle.pkl")

        # Widgets to select utla region, and display in four columns
        self.list_of_checkboxes = [
            wdg.Checkbox(value=True, description=utla_name)
            for utla_name in sorted(self.utla_deaths_df.columns)
        ]

        self.utla_checkboxes = wdg.HBox(
            [
                wdg.VBox(self.list_of_checkboxes[0 : (len(self.list_of_checkboxes) // 4)]),
                wdg.VBox(self.list_of_checkboxes[(len(self.list_of_checkboxes) // 4) : (len(self.list_of_checkboxes) * 2 // 4)]),
                wdg.VBox(self.list_of_checkboxes[(len(self.list_of_checkboxes) * 2 // 4) : (len(self.list_of_checkboxes) * 3 // 4)]),
                wdg.VBox(self.list_of_checkboxes[(len(self.list_of_checkboxes) * 3 // 4) :]),
            ]
        )

        self.redraw_utla_data_button = wdg.ToggleButton(
            value=False,
            description='Redraw chart',
            disabled=False,
        )

        self.utla_output = wdg.interactive_output(
            self.redraw_utla_death_rates_line_chart,
            {
                "redraw_prompt": self.redraw_utla_data_button
            },
        )

        self.utla_death_data_update_button = wdg.Button(
            description="Update data",
            disabled=False,
            tooltip="Click me",
            icon="fa-bar-chart",
            button_style="danger"
        )
        self.utla_death_data_update_button.on_click(self.update_utla_death_data)

        display(self.utla_output, self.utla_checkboxes, self.redraw_utla_data_button, self.utla_death_data_update_button)

    # Create date index for dataframe:
    def create_pandas_date_index_for_utla_data(self):
        list_of_pandas_dates_for_utla_deaths = (
            generate_pandas_dataframe_with_start_date_and_end_date(
                start_date=pd.to_datetime("2020-02-01", format="%Y-%m-%d"),
                end_date=pd.to_datetime("today").normalize(),
                frequency_to_generate="D",
            )
        )

        return list_of_pandas_dates_for_utla_deaths


    # Get list of area names for dataframe
    def create_list_of_utla_names(self):
        list_of_utla_names = []
        list_of_files_in_directory = os_listdir()
        for file_name in list_of_files_in_directory:
            if file_name[0:4] == "utla" and file_name.find("death_rates") != -1:
                with open(file_name, "rt") as INFILE:
                    utla_file_data = json.load(INFILE)
                    try:
                        current_utla_area_name = utla_file_data["data"][0]["area_name"]
                        list_of_utla_names.append(current_utla_area_name)
                    except (KeyError, IndexError, TypeError):
                        # If a utla json file was created due to the query succeeding, but PHE had no data, then skip.
                        continue

        return list_of_utla_names

    # Check each file and fill in dataframe
    def load_data_from_utla_files_and_store_in_pandas_dataframe(self):
        list_of_pandas_dates_for_utla_deaths = self.create_pandas_date_index_for_utla_data()
        list_of_utla_names = self.create_list_of_utla_names()

        utla_deaths_df_to_return = pd.DataFrame(
            columns=list_of_utla_names, index=list_of_pandas_dates_for_utla_deaths
        )

        list_of_files_in_directory = os_listdir()

        for file_name in list_of_files_in_directory:
            if file_name[0:4] == "utla" and file_name.find("death_rates") != -1:
                with open(file_name, "rt") as INFILE:
                    utla_file_data = json.load(INFILE)
                    try:
                        current_utla_area_name = utla_file_data["data"][0]["area_name"]

                        # Fill in the utla cumulative deaths dataframe
                        for entry in utla_file_data["data"]:
                            date = convert_to_pandas_datetime(entry["date"])

                            if pd.isna(
                                utla_deaths_df_to_return.loc[date, current_utla_area_name]
                            ):
                                # Assume that any instances of None were before any deaths were recorded in that area
                                value = (
                                    float(entry["cumulative_deaths_28_days"])
                                    if entry["cumulative_deaths_28_days"] is not None
                                    else 0
                                )
                                utla_deaths_df_to_return.loc[
                                    date, current_utla_area_name
                                ] = value

                    except:
                        continue

        return utla_deaths_df_to_return

    """
    Removes all NaN values from the UTLA pandas dataframe.

    Assumptions:
        All NaN values from before deaths were first recorded are set to 0.0.
        All NaN values afterwards simply carry over the cumulative death count.

    Input:
        The UTLA dataframe to be changed.

    Output:
        A new pandas dataframe with all NaN values in the original dataframe replaced in line with the assumptions above.
    """
    def remove_nan_values_from_ulta_death_frame(self, utla_deaths_df_in):
        for utla_name in utla_deaths_df_in:
            for date_value in utla_deaths_df_in.index:

                try:
                    if pd.isna(utla_deaths_df_in.loc[date_value, utla_name]):
                        if (
                            utla_deaths_df_in.loc[
                                date_value - datetime.timedelta(days=1), utla_name
                            ]
                            > 0
                        ):
                            utla_deaths_df_in.loc[
                                date_value, utla_name
                            ] = utla_deaths_df_in.loc[
                                date_value - datetime.timedelta(days=1), utla_name
                            ]
                        else:
                            utla_deaths_df_in.loc[date_value, utla_name] = 0.0

                except:
                    utla_deaths_df_in.loc[date_value, utla_name] = 0.0

        return utla_deaths_df_in

    # Function to redraw utla line chart
    def check_current_selected_utla(self):
        return sorted([
            checked_box_name.description
            for checked_box_name in self.list_of_checkboxes
            if checked_box_name.value
        ])

    def redraw_utla_death_rates_line_chart(self, redraw_prompt):
        
        draw_line_chart_with_pandas(
            self.utla_deaths_df[sorted(self.check_current_selected_utla())],
            labels_for_data=sorted(self.check_current_selected_utla()),
            title="Cumulative deaths across utla areas",
            x_label="Time (Days)",
            y_label="Cumulative number of deaths",
        )

    # Query PHE for new UTLA death data
    """
    WARNING:
        These functions query PHE once for each UTLA area.
        To avoid sending too many queries to PHE too quickly, the function will wait for 1 to 5 seconds between making one query and then making another query.
    """
    def fetch_updated_utla_death_rate_data_from_PHE(self):
        structure_for_utla_death_rates = {
            "date": "date",
            "area_name": "areaName",
            "cumulative_deaths_28_days": "cumDeaths28DaysByDeathDate",
        }

        utla_regions = get_utla_region_list()

        for utla_region in utla_regions:
            # Wait for a period of 1 to 5 seconds between queries, to avoid overloading PHE
            time.sleep(random.randint(1, 5))

            query_PHE_and_save_to_JSON(
                ["areaType=utla", "areaName=" + utla_region],
                structure_for_utla_death_rates,
                "utla_" + str(utla_region).lower() + "_death_rates.json",
            )

    """
    WARNING: 
        This operation takes a significant amount of time, and should not be used unless necessary.

    After data is received from PHE and saved to JSON files, it is reloaded into memory, wrangled into a pandas dataframe and saved to a pickle.
    The purpose of saving this to a pickle is to avoid having to open over 100 json files and wrangle the data within them into a pandas dataframe each time the dashboard is loaded.
    """
    def update_utla_death_data(self, button):
        user_willing_to_proceed = input(
            "Warning: this operation will take a significant amount of time. If you are willing to proceed, please type in 'yes'."
        )
        if user_willing_to_proceed != "yes":
            return

        self.fetch_updated_utla_death_rate_data_from_PHE()

        utla_deaths_df_updated_with_json = (
            self.load_data_from_utla_files_and_store_in_pandas_dataframe()
        )

        utla_deaths_df_nans_removed = self.remove_nan_values_from_ulta_death_frame(
            utla_deaths_df_updated_with_json
        )

        save_dataframe_to_pickle(utla_deaths_df_nans_removed, "ulta_deaths_pickle.pkl")

        self.utla_deaths_df = load_dataframe_from_pickle("ulta_deaths_pickle.pkl")
    
utla_chart_instance = utla_plot_class()
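
The gap-filling rule used in the class above (carry the last known cumulative count forward, and treat gaps before the first recorded value as 0.0) can also be expressed with pandas' built-in fill methods. This is a sketch of an alternative, not the notebook's implementation. The function name and data are hypothetical, and it should give the same result only under the stated assumption that counts are never negative.

```python
import numpy as np
import pandas as pd


def fill_cumulative_gaps(frame: pd.DataFrame) -> pd.DataFrame:
    """Carry the last known count forward; leading gaps become 0.0."""
    return frame.astype(float).ffill().fillna(0.0)


# Illustrative data only.
frame = pd.DataFrame(
    {
        "Area A": [np.nan, 2.0, np.nan, 5.0],
        "Area B": [np.nan, np.nan, 1.0, np.nan],
    },
    index=pd.date_range("2020-03-01", periods=4, freq="D"),
)

filled = fill_cumulative_gaps(frame)
print(filled)
```

Working on whole columns at once avoids looping over every date for every region, which matters when the frame covers several years of daily data.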

##### Updating JSON files

The button below will update all JSON files used to create the transmission and vaccination percentage charts above. It will not update UTLA data.

Please note that this will not automatically update the charts above. The buttons above should be used for that instead.
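
The reason the charts do not change is that data already loaded into memory is separate from the files on disk. The sketch below illustrates this with the standard library only; the file name, helper names and values are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def save_json(data, path):
    with open(path, "wt") as outfile:
        json.dump(data, outfile)


def load_json(path):
    with open(path, "rt") as infile:
        return json.load(infile)


with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / "example_percentages.json"  # hypothetical file name

    save_json({"data": [{"date": "2021-06-01", "first": 75.0}]}, path)
    in_memory = load_json(path)  # what a chart would be holding

    # A later update rewrites the file on disk...
    save_json({"data": [{"date": "2021-06-02", "first": 75.2}]}, path)
    stale_date = in_memory["data"][0]["date"]  # ...but memory is unchanged

    in_memory = load_json(path)  # an explicit reload picks up the new file
    fresh_date = in_memory["data"][0]["date"]

print(stale_date, fresh_date)
```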


In [None]:
# Button to query PHE for new data
update_datasets = wdg.Button(
    description="Update datasets from Public Health England",
    disabled=False,
    button_style="",
    tooltip="Click me",
    icon="fa-bar-chart",
)

update_datasets.on_click(updated_PHE_datasets)

display(update_datasets)

This dashboard is based on: [DIY Covid-19 Dashboard Kit](https://github.com/fsmeraldi/diy-covid19dash) (C) Fabrizio Smeraldi, 2020 ([f.smeraldi@qmul.ac.uk](mailto:f.smeraldi@qmul.ac.uk) - [web](http://www.eecs.qmul.ac.uk/~fabri/)). All rights reserved.