[DIY Disease Tracking Dashboard Kit](https://github.com/fsmeraldi/diy-covid19dash) (C) Fabrizio Smeraldi, 2020,2024 ([f.smeraldi@qmul.ac.uk](mailto:f.smeraldi@qmul.ac.uk) - [web](http://www.eecs.qmul.ac.uk/~fabri/)). This notebook is released under the [GNU GPLv3.0 or later](https://www.gnu.org/licenses/).

# DIY Disease Tracking Dashboard

This is a template for your DIY Disease Tracking Dashboard, to which you can add the code you developed in the previous notebooks. The dashboard will be displayed using [voila](https://voila.readthedocs.io/en/stable/index.html), a Python dashboarding tool that converts notebooks to standalone dashboards. Unlike the other libraries we have seen, the ```voila``` package must be installed using *pip* or *conda* but does not need to be imported; instead, it acts at the level of the notebook server. The ```voila``` package is already installed on the QMUL JupyterHub as well as in the Binder. To install it locally, follow the online [instructions](https://voila.readthedocs.io/en/stable/install.html).

Broadly speaking, Voila acts by **running all the cells in your notebook** when the dashboard is first loaded; it then hides all code cells and displays all markdown cells and any outputs, including widgets. However, the code is still there in the background and handles any interaction with the widgets. To view this dashboard template rendered in Voila click [here](https://mybinder.org/v2/gh/fsmeraldi/diy-covid19dash/main?urlpath=%2Fvoila%2Frender%2FDashboard.ipynb).

In [1]:
from IPython.display import clear_output, display
import ipywidgets as wdg
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import requests
import time
import json
import os

In [2]:
%matplotlib inline
# make figures larger
plt.rcParams['figure.dpi'] = 100

## Load initial data from disk

You should include "canned" data in ```.json``` files along with your dashboard. When the dashboard starts, it should load that data and assign it as a dictionary to the ```jsondata``` variable (the code below will be hidden when the dashboard is rendered by Voila).

In [3]:
# Load JSON files and store the raw data in some variable. Edit as appropriate
jsondata = {}
# List of JSON files to load
json_files = ["vaccinations_autumn22.json"]

# Load each JSON file into the dictionary
for file_name in json_files:
    if os.path.exists(file_name):  # Check if the file exists
        try:
            with open(file_name, 'r') as file:
                jsondata[file_name] = json.load(file)
            print(f"{file_name} loaded successfully.")
        except json.JSONDecodeError:
            print(f"Error: {file_name} contains invalid JSON.")
    else:
        print(f"Error: {file_name} not found.")
        

vaccinations_autumn22.json loaded successfully.


In [4]:
with open("vaccinations_autumn22.json", "r") as f:
    rawdata = json.load(f)
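
The canned file itself has to be written once, from data you have already downloaded. The following is a minimal sketch of that round trip, assuming the raw data is a list of record dictionaries; `save_canned` and `load_canned` are hypothetical helper names, not part of the kit, and the file name is a placeholder.

```python
import json
import os
import tempfile


def save_canned(records, path):
    """Write raw records to a JSON file so they can ship with the dashboard."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f)


def load_canned(path):
    """Return the records stored in `path`, or an empty list if the file
    is missing or does not contain valid JSON."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return []


# Round trip in a temporary directory, so nothing is left behind
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "canned_example.json")  # placeholder file name
    save_canned([{"date": "2022-09-01", "metric_value": 109.0}], path)
    restored = load_canned(path)
    missing = load_canned(os.path.join(tmp, "absent.json"))

print(restored, missing)
```

Returning an empty list on failure, rather than raising, lets the dashboard start even when a file is absent; whether that is the right trade-off depends on how you want to report the problem to the user.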

## Wrangle the data

The dashboard should contain the logic to wrangle the raw data into a ```DataFrame``` (or more than one, as required) that will be used for plotting. The wrangling code should be put into a function and called on the data from the JSON file (we'll need to call it again on any data downloaded from the API). In the cell below, ```wrangle_data``` converts a list of records into a ```DataFrame```, parses the dates and keeps only the rows between 1 September and 30 November 2022; the same cell also defines the ```APIwrapper``` class and the ```fetch_data``` and ```save_data``` helpers.

In [5]:

# Step 1: Define the APIwrapper class
class APIwrapper:
    _access_point = "https://api.ukhsa-dashboard.data.gov.uk"
    _last_access = 0.0

    def __init__(self, theme, sub_theme, topic, geography_type, geography, metric):
        """ Initialize the APIwrapper object. """
        url_path = (
            f"/themes/{theme}/sub_themes/{sub_theme}/topics/{topic}/geography_types/"
            f"{geography_type}/geographies/{geography}/metrics/{metric}"
        )
        self._start_url = APIwrapper._access_point + url_path
        self._filters = None
        self._page_size = -1
        self.count = None

    def get_page(self, filters={}, page_size=5):
        """ Fetch a single page of data from the API. """
        if page_size > 365:
            raise ValueError("Max supported page size is 365")

        if filters != self._filters or page_size != self._page_size:
            self._filters = filters
            self._page_size = page_size
            self._next_url = self._start_url

        if self._next_url is None:
            return []

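        # Throttle requests: wait if the previous access was less than 0.33 s ago
        curr_time = time.time()
        deltat = curr_time - APIwrapper._last_access
        if deltat < 0.33:
            time.sleep(0.33 - deltat)
        # Record the time after any sleep, i.e. when the request is actually sent
        APIwrapper._last_access = time.time()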

        parameters = {x: y for x, y in filters.items() if y is not None}
        parameters["page_size"] = page_size
        response = requests.get(self._next_url, params=parameters).json()
        self._next_url = response.get("next")
        self.count = response.get("count")
        return response.get("results", [])

    def get_all_pages(self, filters={}, page_size=365):
        """ Fetch all pages of data from the API. """
        data = []
        while True:
            next_page = self.get_page(filters, page_size)
            if not next_page:
                break
            data.extend(next_page)
        return data

# Step 2: Fetch data
def fetch_data():
    structure = {
        "theme": "infectious_disease",
        "sub_theme": "respiratory",
        "topic": "COVID-19",
        "geography_type": "Nation",
        "geography": "England",
        "metric": "COVID-19_vaccinations_autumn22_dosesByDay",
    }
    api = APIwrapper(**structure)
    data = api.get_all_pages()
    print(f"Fetched {len(data)} records from API.")
    print("Raw API Data Sample:", data[:5])  # Print the first 5 records
    return data

# Step 3: Clean data
def wrangle_data(rawdata):
    """
    Processes the raw API data into a cleaned Pandas DataFrame.
    """
    # Directly process if rawdata is a list
    if isinstance(rawdata, list) and len(rawdata) > 0:
        print(f"Number of records in raw data: {len(rawdata)}")
        df = pd.DataFrame(rawdata)

        # Ensure essential columns are present
        required_columns = ["date", "sex", "age", "metric_value"]
        for col in required_columns:
            if col not in df.columns:
                print(f"Column '{col}' missing, adding as empty.")
                df[col] = None

        # Convert date column to datetime and sort by date
        df["date"] = pd.to_datetime(df["date"], errors="coerce")
        df.dropna(subset=["date"], inplace=True)
        df["metric_value"] = pd.to_numeric(df["metric_value"], errors="coerce").fillna(0)

        # Filter for autumn 2022
        df = df[(df["date"] >= "2022-09-01") & (df["date"] <= "2022-11-30")]

        print(f"Filtered DataFrame contains {len(df)} records.")
        return df
    else:
        print("Invalid or empty raw data provided.")
        return pd.DataFrame()


# Step 4: Save and analyze data
def save_data(df, filename="vaccination_data.csv"):
    """Save the cleaned data to a CSV file."""
    df.to_csv(filename, index=False)
    print(f"Data saved to {filename}")

# Entry point for the script
if __name__ == "__main__":
    rawdata = fetch_data()
    df = wrangle_data(rawdata)
    print("Cleaned DataFrame:")
    print(df.head())

    # Save to CSV for further analysis
    save_data(df)


Fetched 5076 records from API.
Raw API Data Sample: [{'theme': 'infectious_disease', 'sub_theme': 'respiratory', 'topic': 'COVID-19', 'geography_type': 'Nation', 'geography': 'England', 'geography_code': 'E92000001', 'metric': 'COVID-19_vaccinations_autumn22_dosesByDay', 'metric_group': 'vaccinations', 'stratum': 'default', 'sex': 'f', 'age': '50-54', 'year': 2022, 'month': 9, 'epiweek': 35, 'date': '2022-09-01', 'metric_value': 109.0, 'in_reporting_delay_period': False}, {'theme': 'infectious_disease', 'sub_theme': 'respiratory', 'topic': 'COVID-19', 'geography_type': 'Nation', 'geography': 'England', 'geography_code': 'E92000001', 'metric': 'COVID-19_vaccinations_autumn22_dosesByDay', 'metric_group': 'vaccinations', 'stratum': 'default', 'sex': 'm', 'age': '75-79', 'year': 2022, 'month': 9, 'epiweek': 35, 'date': '2022-09-01', 'metric_value': 115.0, 'in_reporting_delay_period': False}, {'theme': 'infectious_disease', 'sub_theme': 'respiratory', 'topic': 'COVID-19', 'geography_type': 
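
To see what shape a wrangled table can take before plotting, the two complete records visible in the sample output above can be grouped by date and by sex or age. This is only a sketch: `daily_totals` is a hypothetical helper, not part of the kit, and the records are trimmed to the four fields the dashboard uses.

```python
import pandas as pd

# The two complete records from the API sample above, trimmed to the
# fields that the dashboard actually uses.
sample_records = [
    {"date": "2022-09-01", "sex": "f", "age": "50-54", "metric_value": 109.0},
    {"date": "2022-09-01", "sex": "m", "age": "75-79", "metric_value": 115.0},
]


def daily_totals(records, group):
    """Return a table with one row per date and one column per value of `group`."""
    frame = pd.DataFrame(records)
    frame["date"] = pd.to_datetime(frame["date"])
    # Sum the doses of all records that share the same date and group value
    return frame.groupby(["date", group])["metric_value"].sum().unstack(fill_value=0)


by_sex = daily_totals(sample_records, "sex")
by_age = daily_totals(sample_records, "age")
print(by_sex)
```

A table laid out this way can be passed straight to `DataFrame.plot`, which draws one line per column.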

## Download current data

Give your users an option to refresh the dataset - a "refresh" button will do. The button callback should
* call the code that accesses the API and downloads some fresh raw data;
* wrangle that data into a dataframe and update the corresponding (global) variable for plotting (here, ```df```);
* optionally: force a redraw of the graph and give the user some feedback.

Once you get it to work, you may want to wrap your API call inside an exception handler, so that the user is informed, the "canned" data are not overwritten and nothing crashes if for any reason the server cannot be reached or data are not available.
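
One way to arrange such an exception handler is sketched below. It assumes the download and the wrangling are available as two callables; `refresh_or_keep` and `unreachable_server` are hypothetical names introduced for illustration, and the stand-in functions replace the real API call so that the example runs without a network connection.

```python
def refresh_or_keep(current, fetch, wrangle):
    """Return (data, message).

    On success `data` is the freshly wrangled download; on any failure it is
    `current`, untouched, so the dashboard keeps showing the existing data.
    """
    try:
        raw = fetch()
        if not raw:
            return current, "No data available; keeping the existing data."
        return wrangle(raw), "Data refreshed."
    except Exception as exc:  # deliberately broad: nothing should crash the dashboard
        return current, f"Refresh failed ({exc}); keeping the existing data."


# Stand-in for a download that fails (hypothetical)
def unreachable_server():
    raise ConnectionError("server cannot be reached")


canned = ["canned record"]
kept, message = refresh_or_keep(canned, unreachable_server, list)
print(message)
```

In a button callback this could be used along the lines of `df, message = refresh_or_keep(df, fetch_data, wrangle_data)`, with `message` shown to the user.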

After you refresh the data, graphs will not update until the user interacts with a widget. You can trick ```ipywidgets``` into redrawing the graph by simulating interaction, as in the ```refresh_graph``` function we define in the Graphs and Analysis section below.

In this example, clicking on the button below downloads the data from the UKHSA API again, wrangles it and refreshes the graph. The button should read *Fetch Data*. If you see anything else, take a deep breath :)

In [7]:
# Function to access the API
def access_api():
    """
    Accesses the UKHSA API. Returns data as a like-for-like replacement for the "canned" data loaded from the JSON file.
    """
    # Define the API URL
    api_url = "https://api.ukhsa-dashboard.data.gov.uk/themes/infectious_disease/sub_themes/respiratory/topics/COVID-19/geography_types/Nation/geographies/England/metrics/COVID-19_vaccinations_autumn22_dosesByDay"

    try:
        response = requests.get(api_url)
        response.raise_for_status()  # Raise an HTTP error if one occurred
        data = response.json()  # Parse JSON response
        print("API data fetched successfully.")
        return data  # Return the raw API data
    except requests.exceptions.RequestException as e:
        print(f"Error accessing API: {e}")
        return {}  # Return empty data in case of an error





# Callback for the API button
def api_button_callback(button):
    """
    Button callback - fetches data from the API, processes it using APIwrapper, and updates the global dataframe.
    """
    global df  # Use the global dataframe for updates

    # Provide feedback during API access
    button.icon = "spinner"
    button.button_style = "warning"
    button.description = "Fetching..."
    button.disabled = True

    # Fetch raw data from the API
    apidata = access_api()

    # Process raw data through APIwrapper for additional handling
    if apidata and "results" in apidata:
        print("Processing data through APIwrapper...")
        
        # Define a structure for APIwrapper
        structure = {
            "theme": "infectious_disease",
            "sub_theme": "respiratory",
            "topic": "COVID-19",
            "geography_type": "Nation",
            "geography": "England",
            "metric": "COVID-19_vaccinations_autumn22_dosesByDay",
        }
        api_wrapper_instance = APIwrapper(**structure)

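        # Download every page through APIwrapper. This queries the API again;
        # the request above only confirms that the server is reachable.
        try:
            processed_data = api_wrapper_instance.get_all_pages(filters={}, page_size=365)
        except (requests.exceptions.RequestException, ValueError) as e:
            print(f"Error downloading data: {e}")
            processed_data = []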

        # Wrangle the processed data into a DataFrame
        if processed_data:
            df = wrangle_data(processed_data)
            print(f"Data wrangled successfully. {len(df)} records available.")
            refresh_graph()  # Refresh the graph
            button.icon = "check"
            button.button_style = "success"
            button.description = "Updated"
        else:
            print("Processing resulted in no data.")
            button.icon = "times"
            button.button_style = "danger"
            button.description = "Empty Data"
    else:
        print("Failed to fetch valid data from the API.")
        button.icon = "times"
        button.button_style = "danger"
        button.description = "API Error"

    # Re-enable the button for future clicks
    button.disabled = False



# Refresh the graph function
def refresh_graph():
    """
    Force a graph redraw based on the latest data in the global DataFrame.
    """
    global group_selector
    if not df.empty:
        current = group_selector.value
        if current == "sex":
            group_selector.value = "age"
        else:
            group_selector.value = "sex"
        group_selector.value = current  # Reset to original to force re-draw
    else:
        print("The DataFrame is empty. No graph to refresh.")


# Button widget to trigger API calls
apibutton = wdg.Button(
    description="Fetch Data",  # Change the description to something informative
    disabled=False,
    button_style="info",  # Options: 'success', 'info', 'warning', 'danger', or ''
    tooltip="Click to fetch the latest data from the API",
    icon="download",  # FontAwesome icon name
)

# Register the callback function with the button
apibutton.on_click(api_button_callback)

# Display the button in the dashboard
display(apibutton)

# Keep the DataFrame wrangled above; fall back to an empty one only if it does not exist yet
if "df" not in globals():
    df = pd.DataFrame()


Button(button_style='info', description='Fetch Data', icon='download', style=ButtonStyle(), tooltip='Click to …

## Graphs and Analysis

Include at least one graph with interactive controls, as well as some instructions for the user and/or comments on what the graph represents and how it should be explored. This example plots the daily number of vaccinations as one line per group; use the drop-down menu to group the data by sex or by age.

In [8]:
from IPython.display import display

def plot_vaccination_data(group):
    """
    Plot the vaccination data based on the user-selected group (e.g., sex or age).
    """
    global df  # Use the global DataFrame

    if group in df.columns:
        # Group data by the selected column and date, then sum the values
        grouped_data = df.groupby(["date", group])["metric_value"].sum().unstack()

        # Plotting
        grouped_data.plot(kind="line", figsize=(12, 6))
        plt.title(f"COVID-19 Vaccinations by {group.capitalize()}", fontsize=16)
        plt.xlabel("Date", fontsize=12)
        plt.ylabel("Number of Vaccinations", fontsize=12)
        plt.legend(title=group.capitalize(), fontsize=10)
        plt.grid(True)
        plt.tight_layout()
        plt.show()
    else:
        print(f"Group '{group}' not found in data! Available groups: {df.columns.tolist()}")


# Creating drop-down menus
group_selector = wdg.Dropdown(
    options=['sex', 'age'],  
    value='sex',  
    description='Group by:',
    disabled=False,
)


def refresh_graph():
    """
    Force a graph redraw based on the selected group.
    This definition replaces the refresh_graph defined with the button callback.
    """
    current = group_selector.value
    if current == 'sex':
        group_selector.value = 'age'
    else:
        group_selector.value = 'sex'
    group_selector.value = current  

interactive_plot = wdg.interactive_output(plot_vaccination_data, {'group': group_selector})

display(group_selector, interactive_plot)


Dropdown(description='Group by:', options=('sex', 'age'), value='sex')

Output()

## Deploying the dashboard

Once your code is ready and you are satisfied with the appearance of the graphs, replace all the text boxes above with the explanations you would like a dashboard user to see. The next step is deploying the dashboard online. There are several [options](https://voila.readthedocs.io/en/stable/deploy.html) for this; we suggest deploying as a [Binder](https://mybinder.org/). This is basically the same technique that has been used to package this tutorial and to deploy this template dashboard. The instructions may seem a bit involved, but the actual steps are surprisingly easy - we will be going through them together during a live session. You will need an account on [GitHub](https://github.com/) for this - if you don't have one already, now is the time to create one.

**Author and License** Remember that if you deploy your dashboard as a Binder it will be publicly accessible. Change the copyright notice and take credit for your work! Also acknowledge your sources and the conditions of the license by including this notice: "Based on UK Government [data](https://ukhsa-dashboard.data.gov.uk/) published by the [UK Health Security Agency](https://www.gov.uk/government/organisations/uk-health-security-agency) and on the [DIY Disease Tracking Dashboard Kit](https://github.com/fsmeraldi/diy-covid19dash) by Fabrizio Smeraldi. Released under the [GNU GPLv3.0 or later](https://www.gnu.org/licenses/)."