[DIY Disease Tracking Dashboard Kit](https://github.com/fsmeraldi/diy-covid19dash) (C) Fabrizio Smeraldi, 2020,2024 ([f.smeraldi@qmul.ac.uk](mailto:f.smeraldi@qmul.ac.uk) - [web](http://www.eecs.qmul.ac.uk/~fabri/)). This notebook is released under the [GNU GPLv3.0 or later](https://www.gnu.org/licenses/).

# Disease Tracking Dashboard
### The MRSA data shown here were acquired from the UKHSA website.
### Nicole Smitheman

In [1]:
# Import libraries for API access, data manipulation, plotting, and buttons/widgets.
from IPython.display import clear_output
import ipywidgets as wdg
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import requests
import time
import json

In [2]:
#inline display of plots
%matplotlib inline
# make figures larger
plt.rcParams['figure.dpi'] = 100

In [3]:
#Defining APIwrapper class: manages tasks such as retrieving pages, rate limiting and filtering.
class APIwrapper:
    # class variables shared among all instances
    _access_point="https://api.ukhsa-dashboard.data.gov.uk"
    _last_access=0.0 # time of last api access
    
    def __init__(self, theme, sub_theme, topic, geography_type, geography, metric):
        """ Init the APIwrapper object, constructing the endpoint from the structure
        parameters """
        # build the path with all the required structure parameters.
        url_path=(f"/themes/{theme}/sub_themes/{sub_theme}/topics/{topic}/geography_types/" +
                  f"{geography_type}/geographies/{geography}/metrics/{metric}")
        # our starting API endpoint
        self._start_url=APIwrapper._access_point+url_path
        self._filters=None
        self._page_size=-1
        # will contain the number of items
        self.count=None

    def get_page(self, filters={}, page_size=5):
        """ Access the API and download the next page of data. Sets the count
        attribute to the total number of items available for this query. Changing
        filters or page_size will cause get_page to restart from page 1. Rate
        limited to three requests per second. The page_size parameter sets the number
        of data points in one response page (maximum 365); use the default value
        for debugging your structure and filters. """
        # Check page size is within range
        if page_size>365:
            raise ValueError("Max supported page size is 365")
        # restart from first page if page size or filters have changed
        if filters!=self._filters or page_size!=self._page_size:
            self._filters=filters
            self._page_size=page_size
            self._next_url=self._start_url
        # signal the end of data condition
        if self._next_url is None:
            return [] # we already fetched the last page
        # simple rate limiting to avoid bans
        curr_time=time.time() # Unix time: number of seconds since the Epoch
        deltat=curr_time-APIwrapper._last_access
        if deltat<0.33: # max 3 requests/second
            time.sleep(0.33-deltat)
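        # record the access time after any sleep, so that the interval is
        # measured from the moment the request is actually sent
        APIwrapper._last_access=time.time()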
        # build parameter dictionary by removing all the None
        # values from filters and adding page_size
        parameters={x: y for x, y in filters.items() if y is not None}
        parameters['page_size']=page_size
        # the page parameter is already included in _next_url.
        # This is the API access. The response is a dictionary with various keys.
        # raise_for_status() turns HTTP errors into exceptions; the .json() method
        # decodes the response into Python objects (dictionaries, lists; 'null'
        # values are translated as None).
        http_response = requests.get(self._next_url, params=parameters)
        http_response.raise_for_status()
        response = http_response.json()
        # update url so we'll fetch the next page
        self._next_url=response['next']
        self.count=response['count']
        # data are in the nested 'results' list
        return response['results'] 

    def get_all_pages(self, filters={}, page_size=365):
        """ Access the API and download all available pages of data. Sets the count
        attribute to the total number of items available for this query. API access is
        rate limited to three requests per second. The page_size parameter sets the number
        of data points in one response page (maximum 365), and controls the trade-off
        between time to load a page and number of pages; the default should work well
        in most cases. The number of items returned should in any case be equal to
        the count attribute. """
        data=[] # build up all data here
        while True:
            # use get_page to do the job, including the pacing
            next_page=self.get_page(filters, page_size)
            if next_page==[]:
                break # we are done
            data.extend(next_page)
        return data
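
The pagination logic in `get_page` and `get_all_pages` relies on three keys in each response: `results`, `count`, and `next`. The following is a minimal offline sketch of that loop, not the notebook's implementation. `FAKE_PAGES`, `fetch`, and `fetch_all` are hypothetical names, and the canned responses are made up to stand in for `requests.get(...).json()`.

```python
# Hypothetical canned responses standing in for the API; each one has the
# 'count', 'next' and 'results' keys that get_page reads.
FAKE_PAGES = {
    "page1": {"count": 3, "next": "page2",
              "results": [{"metric_value": 1}, {"metric_value": 2}]},
    "page2": {"count": 3, "next": None,
              "results": [{"metric_value": 3}]},
}

def fetch(url):
    """Stand-in for requests.get(url).json() (assumption: no network access)."""
    return FAKE_PAGES[url]

def fetch_all(start_url):
    """Follow the 'next' links until they run out, collecting all results."""
    data = []
    count = None
    next_url = start_url
    while next_url is not None:
        response = fetch(next_url)
        count = response["count"]         # total number of items for the query
        data.extend(response["results"])  # items on this page
        next_url = response["next"]       # None on the last page
    return data, count

data, count = fetch_all("page1")
print(len(data), count)
```

When the loop finishes, the number of collected items should match `count`, which is the same check the `get_all_pages` docstring suggests.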

In [4]:
# Open the saved JSON file and load its contents into the jsondata variable
with open("mrsa_js.json", "r") as json_file:
    jsondata = json.load(json_file)
#jsondata

In [5]:
# Function to wrangle the data into a dataframe
def wrangle_data(rawdata):
    """ Parameters: rawdata - data from the JSON file or an API call.
    Returns a dataframe built from rawdata. """
    # use the argument, not the global jsondata, so that refreshed API data are used
    return pd.json_normalize(rawdata)

# putting the wrangling code into a function allows you to call it again after refreshing the data through 
# the API. You should call the function directly on the JSON data when the dashboard starts, by including 
# the call in this cell as below:
df=wrangle_data(jsondata) # df is the dataframe for plotting
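
As a rough illustration of what the wrangling step produces, the sketch below runs `pd.json_normalize` on a few hand-written records. The field names (`date`, `year`, `stratum`, `metric_value`) are the columns used later in this notebook; the values are made up, and the real API records may carry additional fields.

```python
import pandas as pd

# Hypothetical records: field names follow the columns used in this notebook,
# the values are invented for illustration only.
sample = [
    {"date": "2023-01-01", "year": 2023, "stratum": "Total cases", "metric_value": 10.0},
    {"date": "2023-02-01", "year": 2023, "stratum": "Total cases", "metric_value": 12.0},
    {"date": "2024-01-01", "year": 2024, "stratum": "Total cases", "metric_value": 15.0},
]

# One row per record, one column per key
sample_df = pd.json_normalize(sample)

# The same kind of filter the plotting function applies
totals_2023 = sample_df.query("year == 2023 and stratum == 'Total cases'")
print(totals_2023[["date", "metric_value"]])
```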

In [6]:
#Adjusting display with pandas
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)

In [7]:
#df

In [8]:
#Function to access API
def access_api():
    """ Accesses the UKHSA API. Return data as a like-for-like replacement for the "canned" data loaded from the JSON file. """
    structure={"theme": "infectious_disease",
               "sub_theme": "bloodstream_infection",
               "topic": "MRSA",
               "geography_type": "Nation",
               "geography": "England",
               "metric": "MRSA_cases_countsByOnsetType"}
    
    return APIwrapper(**structure).get_all_pages()
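
To make the role of the `structure` dictionary concrete, this sketch rebuilds the endpoint URL the same way `APIwrapper.__init__` does, without contacting the API. `build_endpoint` and `ACCESS_POINT` are hypothetical names introduced here for illustration.

```python
ACCESS_POINT = "https://api.ukhsa-dashboard.data.gov.uk"

structure = {"theme": "infectious_disease",
             "sub_theme": "bloodstream_infection",
             "topic": "MRSA",
             "geography_type": "Nation",
             "geography": "England",
             "metric": "MRSA_cases_countsByOnsetType"}

def build_endpoint(theme, sub_theme, topic, geography_type, geography, metric):
    """Hypothetical helper mirroring the path construction in APIwrapper.__init__."""
    return (f"{ACCESS_POINT}/themes/{theme}/sub_themes/{sub_theme}/topics/{topic}"
            f"/geography_types/{geography_type}/geographies/{geography}/metrics/{metric}")

# The dictionary keys match the parameter names, so ** unpacking works directly
endpoint = build_endpoint(**structure)
print(endpoint)
```

Printing the endpoint is a quick way to check the structure parameters before making any requests.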

In [9]:
#access_api()

In [10]:
# Creating a refresh button
def api_button_callback(button):
    """ Button callback - it must take the button as its parameter (unused in this case).
    Accesses API, wrangles data, updates global variable df used for plotting. """
    try:
        # Get fresh data from the API. 
        apidata=access_api()
        # wrangle the data and overwrite the dataframe for plotting
        global df
        df=wrangle_data(apidata)
        # refresh graph with new data
        refresh_graph()
        # switch the icon on the button to a "check" sign
        apibutton.icon="check"
    except Exception as e:
        apibutton.icon="exclamation-triangle"
        print(f"Refresh error: {e}")

    
apibutton=wdg.Button(
    description='Refresh',
    disabled=False,
    button_style='info', 
    tooltip="Refresh data",
)

# register your button callback function with the button
apibutton.on_click(api_button_callback) # the name of function

display(apibutton)


## MRSA (Methicillin-resistant Staphylococcus aureus)

In [11]:
# Creating a graph
# Function to plot total cases per month for the selected year
def plot(walk):
    """ Plot the monthly total MRSA cases for the year passed in by the widget. """
    # use the value supplied by interactive_output rather than reading the widget directly
    r = df.query("year==@walk")
    r = r.query("stratum=='Total cases'")
    plt.plot(r.date, r.metric_value, "-o")
    plt.grid()
    plt.xticks(rotation=70)
    plt.title("MRSA Total Cases by Month (England)")
    plt.show()

# Dropdown widget listing the years available in the data
years = df.year.unique()

whichwalk=wdg.Dropdown(
    options=years,
    value=years[0],
    description='Year: ',
    disabled=False,
)

# Function that will refresh the graph
def refresh_graph():
    """ Change the value of the widget and then change it back in order to force
    a redraw of the graph; this is useful when the data have been updated.
    Assumes the dropdown has at least two options. """
    current=whichwalk.value
    if current==whichwalk.options[0]:
        other=whichwalk.options[1]
    else:
        other=whichwalk.options[0]
    whichwalk.value=other # forces the redraw
    whichwalk.value=current # now we can change it back
    
# connect the plotting function and the widget    
graph=wdg.interactive_output(plot, {'walk': whichwalk})

#displays the graph
display(whichwalk, graph)


The graph above shows the total number of MRSA cases in England for each month of 2023 and 2024. In 2023, cases rise sharply from September to October and then drop in November. In 2024, the numbers are higher overall in each month, peaking in September. The 2024 series also fluctuates more than the 2023 series, with noticeably sharper peaks: a sharp decline in August brings cases to their lowest level of the year, and is followed by a sharp increase to the year's highest level in September.
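
Observations like these can be checked numerically rather than read off the plot. The sketch below is one possible way to do so, assuming the dataframe has the `date`, `year`, `stratum`, and `metric_value` columns used above. `summarise_year` is a hypothetical helper, and the `demo` values are invented, not the UKHSA figures.

```python
import pandas as pd

# Made-up numbers for illustration only (not the real MRSA counts)
demo = pd.DataFrame({
    "date": ["2023-09-01", "2023-10-01", "2023-11-01", "2024-08-01", "2024-09-01"],
    "year": [2023, 2023, 2023, 2024, 2024],
    "stratum": ["Total cases"] * 5,
    "metric_value": [50.0, 70.0, 55.0, 40.0, 90.0],
})

def summarise_year(frame, year):
    """Hypothetical helper: peak month, lowest month and largest
    month-on-month rise in total cases for one year."""
    totals = frame[(frame["year"] == year) & (frame["stratum"] == "Total cases")]
    totals = totals.sort_values("date")
    peak = totals.loc[totals["metric_value"].idxmax()]
    low = totals.loc[totals["metric_value"].idxmin()]
    change = totals["metric_value"].diff()  # difference from the previous month
    return {
        "peak_month": peak["date"],
        "low_month": low["date"],
        "largest_rise": float(change.max()),
    }

summary_2023 = summarise_year(demo, 2023)
summary_2024 = summarise_year(demo, 2024)
print(summary_2023)
print(summary_2024)
```

Calling `summarise_year(df, 2024)` on the dashboard's own dataframe would give the corresponding figures for the real data.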