# DIY Disease Tracking Dashboard

The topic chosen to initialise the dashboard is COVID-19.

COVID-19 emerged in late 2019 and quickly spread across the globe, causing a pandemic that upended daily life. The virus spread mainly through respiratory droplets, and symptoms ranged from mild, flu-like illness to severe disease, with some individuals developing long-lasting conditions. Governments implemented widespread measures such as lockdowns, social distancing, and mask mandates to curb transmission, while scientists worked tirelessly to develop vaccines. The pandemic disrupted global economies, placed heavy strain on healthcare systems, and changed how people work, learn, and socialise. Although vaccines and treatments have significantly reduced its impact, COVID-19 continues to pose a challenge, with new variants emerging and ongoing efforts to manage the virus.

***

In [1]:
from IPython.display import clear_output, display
import ipywidgets as wdg
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import requests
import time
import json

In [2]:
%matplotlib inline
# make figures larger
plt.rcParams['figure.dpi'] = 100

In [3]:
# Standard query parameters for the UKHSA API
parameters = {"theme": "infectious_disease", 
              "sub_theme": "respiratory",
              "topic": "COVID-19",
              "geography_type": "Nation", 
              "geography": "England",
             }
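The `APIwrapper` class further down combines these parameters with a metric name into an endpoint path. As a quick sanity check, the same path can be assembled on its own; `build_metric_url` is a hypothetical helper that simply mirrors the f-strings in `APIwrapper.__init__`, and is not part of the UKHSA API itself.

```python
def build_metric_url(theme, sub_theme, topic, geography_type, geography, metric,
                     access_point="https://api.ukhsa-dashboard.data.gov.uk"):
    """Hypothetical helper: assemble a UKHSA metric endpoint using the same
    path pattern as APIwrapper.__init__."""
    path = (f"/themes/{theme}/sub_themes/{sub_theme}/topics/{topic}"
            f"/geography_types/{geography_type}/geographies/{geography}"
            f"/metrics/{metric}")
    return access_point + path

# Same values as the parameters dict above, copied so the sketch is self-contained
example_params = {"theme": "infectious_disease",
                  "sub_theme": "respiratory",
                  "topic": "COVID-19",
                  "geography_type": "Nation",
                  "geography": "England"}
url = build_metric_url(**example_params, metric="COVID-19_cases_casesByDay")
print(url)
```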

In [4]:
metrics = {'cases': 'COVID-19_cases_casesByDay',
          'healthcare': 'COVID-19_healthcare_admissionByDay',
          'deaths': 'COVID-19_deaths_ONSByDay'}

In [5]:
filters = {"stratum": None,  # Smallest subgroup a metric can be broken down into, e.g. ethnicity, testing pillar
           "age": None,  # Smallest subgroup a metric can be broken down into, e.g. 15_44 for the age group of 15-44 years
           "sex": None,  # Patient gender, e.g. 'm' for Male, 'f' for Female or 'all' for all genders
           "year": 2024,  # Epi year of the metric value (important for annual metrics), e.g. 2020
           "month": None,  # Epi month of the metric value (important for monthly metrics), e.g. 12
           "epiweek": None,  # Epi week of the metric value (important for weekly metrics), e.g. 30
           "date": None,  # The date on which this metric value was recorded, in the format YYYY-MM-DD, e.g. 2020-07-20
           "in_reporting_delay_period": None  # Boolean indicating whether the data point is considered to be subject to retrospective updates
          }
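Only filters that have a value are sent to the API: `get_page` drops every `None` entry and then adds `page_size`. A minimal sketch of that step, using a hypothetical `build_query` helper that mirrors the dict comprehension in `APIwrapper.get_page`:

```python
def build_query(filters, page_size=365):
    """Hypothetical helper: keep only the filters that have a value and add
    the page size, as APIwrapper.get_page does before each request."""
    if page_size > 365:
        raise ValueError("Max supported page size is 365")
    # 'is not None' keeps falsy but meaningful values such as False or 0
    query = {key: value for key, value in filters.items() if value is not None}
    query["page_size"] = page_size
    return query

example_filters = {"stratum": None, "age": None, "sex": None, "year": 2024,
                   "month": None, "epiweek": None, "date": None,
                   "in_reporting_delay_period": None}
print(build_query(example_filters))  # {'year': 2024, 'page_size': 365}
```

Comparing against `None` rather than testing truthiness matters for `in_reporting_delay_period`: a value of `False` is still sent as a filter.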

In [6]:
class APIwrapper:
    _access_point = "https://api.ukhsa-dashboard.data.gov.uk"
    _last_access=0.0
    def __init__(self, theme, sub_theme, topic, geography_type, geography, metric):
        url_path=(f"/themes/{theme}/sub_themes/{sub_theme}/topics/{topic}/geography_types/" +
                  f"{geography_type}/geographies/{geography}/metrics/{metric}")
        self._start_url=APIwrapper._access_point+url_path
        self._filters=None
        self._page_size=-1
        self.count=None

    def get_page(self, filters={}, page_size=5):
        if page_size>365:
            raise ValueError("Max supported page size is 365")
        if filters!=self._filters or page_size!=self._page_size:
            self._filters=filters
            self._page_size=page_size
            self._next_url=self._start_url
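        if self._next_url is None:
            return []
        # Throttle: keep at least 0.33 s between consecutive API calls
        curr_time=time.time()
        deltat=curr_time-APIwrapper._last_access
        if deltat<0.33:
            time.sleep(0.33-deltat)
        APIwrapper._last_access=time.time()  # time of this request, measured after any sleep
        parameters={x: y for x, y in filters.items() if y is not None}
        parameters['page_size']=page_size
        response = requests.get(self._next_url, params=parameters)
        response.raise_for_status()  # raise requests.exceptions.HTTPError on 4xx/5xx responses
        response = response.json()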
        self._next_url=response['next']
        self.count=response['count']
        return response['results'] 

    def get_all_pages(self, filters={}, page_size=365):
        data=[]
        while True:
            next_page=self.get_page(filters, page_size)
            if next_page==[]:
                break
            data.extend(next_page)
        return data
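`get_all_pages` keeps calling `get_page`, which follows the `next` link of each JSON response until it is `None`. The offline sketch below reproduces that loop against a fake, in-memory paginated source so it runs without network access; `fake_fetch`, `fetch_all` and the page contents are hypothetical, and the response layout (`next`, `count`, `results`) is assumed from the keys `get_page` reads.

```python
def fake_fetch(url):
    """Hypothetical stand-in for requests.get(url).json(): three pages of records."""
    pages = {
        "page1": {"next": "page2", "count": 5,
                  "results": [{"metric_value": 1}, {"metric_value": 2}]},
        "page2": {"next": "page3", "count": 5,
                  "results": [{"metric_value": 3}, {"metric_value": 4}]},
        "page3": {"next": None, "count": 5,
                  "results": [{"metric_value": 5}]},
    }
    return pages[url]

def fetch_all(start_url, fetch=fake_fetch):
    """Follow 'next' links until the source reports no further page."""
    data = []
    next_url = start_url
    while next_url is not None:
        response = fetch(next_url)
        data.extend(response["results"])
        next_url = response["next"]
    return data

records = fetch_all("page1")
print(len(records), [r["metric_value"] for r in records])  # 5 [1, 2, 3, 4, 5]
```

`get_page` returns an empty list once `next` is `None` (without making a request), so `get_all_pages` stops at the same point as this loop.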

In [7]:
# Load the cached JSON files and store the raw data for each metric.
# Note to self: all 'metric' values are stored under 'metric_value' regardless of metric group/type.

"""
To do list (OOP):
- Function to manipulate data (pandas, numpy) as dataframe. Done
- Function to visualise data (matplotlib). Done
- Functional interactive buttons (ipywidget). Full marks: create a 'refresh' button that uses the current datetime (from time lib). Done
- Additional functionalities (optional, not necessary): Add different graph (pie chart) using the same database. Done
"""

with open('cases.json', "r") as f_cases:
    cases = json.load(f_cases)
with open("deaths.json", "r") as f_deaths:
    deaths = json.load(f_deaths)
with open("admissions.json", "r") as f_admissions:
    admissions = json.load(f_admissions)
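These JSON files are cached downloads of the API data. Assuming the list returned by `get_all_pages` is what should be stored, one way to produce such a file is to dump it with `json.dump`. In this sketch `save_records` is a hypothetical helper, a temporary file stands in for `cases.json`, and the records are placeholders (not real data) carrying the keys that `wrangle_data` reads.

```python
import json
import os
import tempfile

def save_records(records, path):
    """Hypothetical helper: write a list of API records to a JSON file."""
    with open(path, "w") as f:
        json.dump(records, f)

# Placeholder records with the keys wrangle_data uses (values are not real data)
records = [
    {"date": "2024-01-01", "metric": "COVID-19_cases_casesByDay",
     "metric_group": "cases", "metric_value": 10.0},
    {"date": "2024-01-02", "metric": "COVID-19_cases_casesByDay",
     "metric_group": "cases", "metric_value": 12.0},
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cases.json")
    save_records(records, path)
    with open(path, "r") as f:
        reloaded = json.load(f)

print(reloaded == records)  # True
```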

In [None]:
def parse_date(datestring):
    return pd.to_datetime(datestring, format="%Y-%m-%d")

def wrangle_data(rawdata):
    """Turn raw API/JSON records into a daily time-series DataFrame.

    `rawdata` is either a list of datasets (each a list of record dicts)
    or a single dataset (a flat list of record dicts).
    """
    data = {}          # date string -> {metric name: value}
    column_names = []  # metric groups, e.g. 'cases', 'healthcare', 'deaths'

    global curr_metrics
    curr_metrics = {}  # metric group -> metric name

    ### A single dataset is a flat list of dicts: wrap it so that both input shapes
    ### go through the same loop (this replaces the earlier TypeError workaround).
    if rawdata and isinstance(rawdata[0], dict):
        rawdata = [rawdata]

    for dataset in rawdata:
        for entry in dataset:
            date = entry['date']
            metric = entry['metric']
            # for the data dictionary
            data.setdefault(date, {})[metric] = entry['metric_value']
            # for column_names
            group = entry['metric_group']
            if group not in column_names:
                column_names.append(group)
                curr_metrics[group] = metric

    ### Setting dates as the index: one row per day between the first and last date
    dates = sorted(data.keys())
    index = pd.date_range(parse_date(dates[0]), parse_date(dates[-1]), freq='D')
    timeseriesdf = pd.DataFrame(index=index, columns=column_names)

    for date, entry in data.items():
        pd_date = parse_date(date)
        for column in column_names:
            metric_name = curr_metrics[column]
            timeseriesdf.loc[pd_date, column] = entry.get(metric_name, 0.0)
    # Days with no record at all become 0.0; force a numeric dtype for plotting
    timeseriesdf = timeseriesdf.astype(float).fillna(0.0)
    return timeseriesdf

df=wrangle_data([cases, deaths, admissions]) #df is the dataframe for plotting
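`wrangle_data` builds the daily table with explicit loops. For comparison, a roughly equivalent reshaping can be sketched with pandas' `pivot_table` and `asfreq`; this is an alternative technique, not the method used above, it assumes one metric per metric group, and the records are placeholders. As in the loop, duplicate (date, group) entries keep the last value seen, and missing days become 0.0.

```python
import pandas as pd

# Placeholder records in the shape of the API output (values are not real data)
records = [
    {"date": "2024-01-01", "metric_group": "cases", "metric_value": 10.0},
    {"date": "2024-01-04", "metric_group": "cases", "metric_value": 14.0},
    {"date": "2024-01-01", "metric_group": "deaths", "metric_value": 1.0},
    {"date": "2024-01-02", "metric_group": "deaths", "metric_value": 2.0},
]

long_df = pd.DataFrame(records)
long_df["date"] = pd.to_datetime(long_df["date"], format="%Y-%m-%d")

# One column per metric group, one row per recorded date
wide_df = long_df.pivot_table(index="date", columns="metric_group",
                              values="metric_value", aggfunc="last")
# Insert missing calendar days (here 2024-01-03) and treat absent values as 0.0
wide_df = wide_df.asfreq("D").fillna(0.0)
wide_df.columns.name = None
print(wide_df)
```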

In [None]:
### Years present in the data (as Python ints), used for the 'Year' selector
date_year = sorted(df.index.year.unique().tolist())

## Graphs and Analysis

### Line Graph

Data are pulled from the UK Health Security Agency (UKHSA) API.
This line graph shows the number of cases, hospital admissions, and/or deaths, depending on the options chosen.

For example, the largest increase occurred in 2022 compared with any other year (multi-select 2022 and any other year to compare).
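To overlay different years, the graph cell re-indexes each year's rows by day of the year, so that 1 January is day 1 in every year. A small sketch of that alignment on a synthetic series (placeholder values, not real data):

```python
import pandas as pd

# Synthetic daily series spanning a year boundary (placeholder values)
index = pd.date_range("2021-12-30", "2022-01-03", freq="D")
daily_values = pd.Series([5.0, 6.0, 7.0, 8.0, 9.0], index=index)

aligned = {}
for year_value in sorted(set(daily_values.index.year)):
    yearly = daily_values[daily_values.index.year == year_value]
    # Day of year (1 = 1 January) becomes the x-axis so different years line up
    aligned[year_value] = pd.Series(yearly.values, index=yearly.index.day_of_year)

print(aligned[2021])  # days 364, 365
print(aligned[2022])  # days 1, 2, 3
```

In leap years, dates after 28 February shift by one day relative to other years, which is usually negligible once the curves are smoothed.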

Instructions: 
- To view multiple metrics, select them in the 'Statistic' box. To see the full 2020-2024 range, select every option under 'Year' (selecting none has the same effect).
- Hold 'Ctrl'/'Cmd' and left-click to select multiple options!
- If the graph is spiky, increase the Bandwidth slider (the rolling-average window, in days) to smooth the curve for a clearer picture!
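The Bandwidth slider sets the window, in days, of a rolling mean: the graph uses `df[col].rolling(window=int(bw), min_periods=1).mean()`. With `min_periods=1`, the first few days are averaged over whatever values are available instead of being left empty. A small illustration on a spiky placeholder series:

```python
import pandas as pd

daily = pd.Series([0.0, 10.0, 0.0, 10.0, 0.0, 10.0])  # spiky placeholder series

for window in (1, 3):
    smoothed = daily.rolling(window=window, min_periods=1).mean()
    print(window, smoothed.round(2).tolist())
```

A window of 1 leaves the data unchanged, while a window of 3 damps the alternating spikes. The window is trailing, so the smoothed curve lags the raw data by roughly half the bandwidth; passing `center=True` to `rolling` would centre it instead.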

In [None]:
#defining widgets
keys_metric = [key for key in curr_metrics.keys()]
series = wdg.SelectMultiple(
    options = keys_metric,
    value = [keys_metric[0]],
    rows = len(keys_metric),
    description = 'Statistic: ',
    disabled = False
)

scale = wdg.RadioButtons(
    options = ['linear', 'log'],
    description = 'Scale: ',
    disabled = False
)

year_metric = [x for x in date_year]
year = wdg.SelectMultiple(
    options = year_metric,
    description='Year: ',
    disabled=False
)

bandwidth = wdg.FloatSlider(
    value=5,
    min=1,
    max=10,
    step=1,
    description='Bandwidth:',
    continuous_update=False,  # Update only when the slider stops moving
)

In [None]:
### graph and widget initialisation

controls=wdg.HBox([series, year, scale, bandwidth])

output_widget = wdg.Output()

def timeseries_graph(gcols, gscale, gyear, bw):
    with output_widget:
        plt.clf()

        ###set scale to linear or logarithmic
        if gscale=='linear':
            logscale=False
        else:
            logscale=True

        ### Map each selected series to its metric name, using the global curr_metrics
        global placeholder_cm
        placeholder_cm = {}                
        global curr_metrics
        for series in gcols:
            if series in curr_metrics.keys():
                placeholder_cm[f'{series}'] = curr_metrics[series]
        
        ### Selecting the year. An error occurred when only one year was chosen; 'isinstance' wraps a single year in a list to counteract this.
        if isinstance(gyear, int):
            gyear = [gyear]
        
        ### if all years or no years are selected, then show from 2020 - 2024
        if len(gyear) == len(year_metric) or len(gyear) == 0:
            for col in gcols:
                smoothed_data = df[col].rolling(window=int(bw), min_periods=1).mean()
                plt.plot(smoothed_data, label=f"{col} (Smoothed)")
                plt.xlabel("Date")
        
        ### else show selected years overlapping to compare
        else:
            for year in gyear:
                if isinstance(year, int):
                    filtered_df = df[df.index.year == year][list(gcols)]
                    normalised_index = filtered_df.index.map(lambda x: x.day_of_year)
                    for col in gcols:
                        smoothed_data = filtered_df[col].rolling(window=int(bw), min_periods=1).mean()
                        plt.plot(normalised_index, smoothed_data, label=f"{col} ({year})")
                    plt.xlabel("Day of Year")
    plt.legend(loc='best')
    plt.yscale('log' if logscale else 'linear')
    plt.show()

graph = wdg.interactive_output(timeseries_graph, {'gcols': series, 'gscale': scale, 'gyear': year, 'bw': bandwidth})

### initialise graph with interactive widgets
display(controls, graph, output_widget)

In [None]:
def refresh_graph():
    ### Force the graph to redraw: the interactive output only re-runs when a widget
    ### value changes, so switch the selection to a different option and back again.
    ### Working with tuples keeps multi-series selections intact.
    current = tuple(series.value)
    if len(current) > 0 and current[0] == series.options[0] and len(series.options) > 1:
        other = (series.options[1],)
    else:
        other = (series.options[0],)
    series.value = other
    series.value = current

def access_api():
    list_temp = []
    ### Kept global for future use of other metrics (the parameters record which 'stage' the graph is in)
    global parameters
    # Copy the dict so that adding 'metric' does not modify the shared parameters
    parameters_temp = dict(parameters)
    ### Same as above
    global curr_metrics
    for metric in curr_metrics.values():
        parameters_temp['metric'] = metric
        api = APIwrapper(**parameters_temp)
        data = api.get_all_pages()
        list_temp.append(data)
    return list_temp

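def api_button_callback(button):
    global df
    try:
        apidata = access_api()
        df = wrangle_data(apidata)
        apibutton.icon = "check"
        refresh_graph()
    except requests.exceptions.RequestException as err:
        # Keep the previously loaded data and flag the failure instead of exiting
        apibutton.icon = "times"
        print(f"Data refresh failed: {err}")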

apibutton = wdg.Button(
    description = 'Refresh data',
    disabled = False,
    button_style = 'info',
    tooltip = "Click to refresh the data and graph",
    icon = 'refresh'
)

apibutton.on_click(api_button_callback)
### initialise button
display(apibutton)
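`refresh_graph` switches the selection away and back because ipywidgets (through traitlets) only notifies observers when a value actually changes, so re-assigning the same tuple would not redraw the graph. The pure-Python sketch below imitates that behaviour with a hypothetical `Observable` class, so it runs without ipywidgets.

```python
class Observable:
    """Hypothetical stand-in for a widget value: observers fire only on change."""

    def __init__(self, value):
        self._value = value
        self._observers = []

    def observe(self, callback):
        self._observers.append(callback)

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new):
        if new == self._value:
            return  # unchanged value: no notification, so no redraw
        self._value = new
        for callback in self._observers:
            callback(new)

redraws = []
selection = Observable(("cases",))
selection.observe(redraws.append)

selection.value = ("cases",)    # same value: nothing happens
selection.value = ("deaths",)   # temporary switch triggers a redraw...
selection.value = ("cases",)    # ...and switching back triggers another
print(redraws)  # [('deaths',), ('cases',)]
```

The toggle therefore costs one extra redraw, with the final one showing the original selection on the refreshed data.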

***
## Author and License

Eric Yuzon


Special Acknowledgements: 
Based on UK Government [data](https://ukhsa-dashboard.data.gov.uk/) published by the [UK Health Security Agency](https://www.gov.uk/government/organisations/uk-health-security-agency) and on the [DIY Disease Tracking Dashboard Kit](https://github.com/fsmeraldi/diy-covid19dash) by Fabrizio Smeraldi. Released under the [GNU GPLv3.0 or later](https://www.gnu.org/licenses/).