# Measles and MMR Vaccine Dashboard

This dashboard visualises weekly measles cases across UKHSA regions and childhood MMR vaccination coverage in England. Measles outbreaks are closely linked to gaps in population immunity, so viewing coverage alongside case trends helps illustrate how changes in vaccination uptake can influence measles activity.

To view this dashboard template rendered in Voila, click [here](https://hub.comp-teach.qmul.ac.uk/user/ec25256/voila/render/diy-covid19dash/Dashboard.ipynb?).

In [1]:
from IPython.display import clear_output, Markdown, display # Markdown and display show descriptive text above each graph; standard markdown cells were not rendering correctly in Voila
import ipywidgets as wdg
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import requests
import time
import json
import warnings # FutureWarnings related to downcasting in pandas .fillna/.ffill/.bfill kept appearing, so I suppressed these for a cleaner dashboard on the code side.

In [2]:
%matplotlib inline
# make figures larger
plt.rcParams['figure.dpi'] = 100

In [3]:
warnings.simplefilter(action = "ignore", category = FutureWarning) # This is how I suppressed the warnings, as they were interrupting the sequence of cells.

In [4]:
class APIwrapper:
    _access_point="https://api.ukhsa-dashboard.data.gov.uk"
    _last_access=0.0
    
    def __init__(self, theme, sub_theme, topic, geography_type, geography, metric):
        """ Init the APIwrapper object, constructing the endpoint from the structure
        parameters """
        url_path=(f"/themes/{theme}/sub_themes/{sub_theme}/topics/{topic}/geography_types/" +
                  f"{geography_type}/geographies/{geography}/metrics/{metric}")
        self._start_url=APIwrapper._access_point+url_path
        self._filters=None
        self._page_size=-1
        self.count=None

    def get_page(self, filters={}, page_size=5):
        """ Access the API and download the next page of data. Sets the count
        attribute to the total number of items available for this query. Changing
        filters or page_size will cause get_page to restart from page 1. Rate
        limited to three requests per second. The page_size parameter sets the number
        of data points in one response page (maximum 365); use the default value 
        for debugging your structure and filters. """
        if page_size>365:
            raise ValueError("Max supported page size is 365")
        if filters!=self._filters or page_size!=self._page_size:
            self._filters=filters
            self._page_size=page_size
            self._next_url=self._start_url
        if self._next_url is None:
            return []
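        # Rate limiting: wait if the previous request was made less than 0.33 s ago
        deltat=time.time()-APIwrapper._last_access
        if deltat<0.33:
            time.sleep(0.33-deltat)
        APIwrapper._last_access=time.time() # record the access time after any sleep, not before it
        parameters={x: y for x, y in filters.items() if y is not None}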
        parameters['page_size']=page_size
        response = requests.get(self._next_url, params=parameters).json()
        self._next_url=response['next']
        self.count=response['count']
        return response['results'] 

    def get_all_pages(self, filters={}, page_size=365):
        """ Access the API and download all available pages of data. Sets the count
        attribute to the total number of items available for this query. API access rate
        limited to three requests per second. The page_size parameter sets the number
        of data points in one response page (maximum 365), and controls the trade-off
        between time to load a page and number of pages; the default should work well 
        in most cases. The number of items returned should in any case be equal to 
        the count attribute. """
        data=[]
        while True:
            next_page=self.get_page(filters, page_size)
            if next_page==[]:
                break
            data.extend(next_page)
        return data
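
The pagination loop in `get_all_pages` can be illustrated without touching the network. The sketch below is only an illustration, not part of the dashboard: `FAKE_PAGES` and `fetch_all` are hypothetical names, and the fake responses simply mimic the `next`, `count` and `results` keys that `get_page` reads from the real API response.

```python
# Hypothetical in-memory stand-in for two paginated API responses.
FAKE_PAGES = {
    "page1": {"count": 3, "next": "page2",
              "results": [{"metric_value": 1}, {"metric_value": 2}]},
    "page2": {"count": 3, "next": None,
              "results": [{"metric_value": 3}]},
}

def fetch_all(start_url):
    """Follow 'next' links until none is left, collecting every result."""
    data, url = [], start_url
    while url is not None:
        response = FAKE_PAGES[url]        # the real wrapper calls requests.get here
        data.extend(response["results"])  # accumulate this page's items
        url = response["next"]            # None on the last page ends the loop
    return data

items = fetch_all("page1")
print(len(items), "items downloaded")
```

As in the wrapper's docstring, the number of items collected should match the `count` field reported by the (here simulated) API.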

In [5]:
# The data structures for the dashboard; each key corresponds to a dataset.
jsondata = {
    "measles_weekly_cases": [],
    "mmr1_coverage": [],
    "mmr2_coverage": []
}

# Helper functions to load and save the datasets as JSON files
def load_json_file(filename):
    try:
        with open(filename, "rt") as f:
            return json.load(f)
    except FileNotFoundError:
        return []

def save_json_file(data, filename):
    try:
        with open(filename, "wt") as f:
            json.dump(data, f)
    except Exception as e:
        print(f"Error saving {filename}: {e}")
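
A quick way to check the caching helpers is a round trip through a temporary directory. The sketch below repeats simplified versions of the two helpers so that it runs on its own; the file name `example_cache.json` and the single record are made up for illustration and are not real UKHSA data.

```python
import json
import os
import tempfile

def load_json_file(filename):
    """Return the parsed JSON content, or an empty list if the file is missing."""
    try:
        with open(filename, "rt") as f:
            return json.load(f)
    except FileNotFoundError:
        return []

def save_json_file(data, filename):
    """Write data to filename as JSON (simplified: no error handling)."""
    with open(filename, "wt") as f:
        json.dump(data, f)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "example_cache.json")  # hypothetical file name
    missing = load_json_file(path)                     # file does not exist yet
    save_json_file([{"date": "2024-01-01", "metric_value": 5.0}], path)
    cached = load_json_file(path)                      # same data read back

print(missing, cached)
```

Returning an empty list for a missing file means the wrangling functions can still run on a first launch with no cache, producing an empty DataFrame rather than an exception.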

In [6]:
# Measles regions in the UKHSA data
measles_regions = ["East Midlands", "East of England", "London", "North East", 
                   "North West", "South East", "South West", "West Midlands", "Yorkshire and Humber"]

# API request templates without metric values - these will be filled as we wrangle the data
measles_structure_no_metric = {
    "theme": "infectious_disease",
    "sub_theme": "vaccine_preventable",
    "topic": "Measles",
    "geography_type": "UKHSA Region",
    "geography": None,
} # we are pulling regional data for this one, whereas we are not for the below - note the difference in geography_type.

mmr1_structure_no_metric = {
    "theme": "immunisation",
    "sub_theme": "childhood-vaccines",
    "topic": "MMR1",
    "geography_type": "Nation",
    "geography": "England",
}

mmr2_structure_no_metric = {
    "theme": "immunisation",
    "sub_theme": "childhood-vaccines",
    "topic": "MMR2",
    "geography_type": "Nation",
    "geography": "England",
}
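
To see how one of these templates becomes a request, the sketch below rebuilds the endpoint string the same way `APIwrapper.__init__` does. `build_url` is a hypothetical helper written only for this illustration; the metric name is the one used for MMR1 later in the notebook.

```python
ACCESS_POINT = "https://api.ukhsa-dashboard.data.gov.uk"

def build_url(theme, sub_theme, topic, geography_type, geography, metric):
    """Assemble the endpoint path from the structure parameters."""
    return (f"{ACCESS_POINT}/themes/{theme}/sub_themes/{sub_theme}/topics/{topic}"
            f"/geography_types/{geography_type}/geographies/{geography}/metrics/{metric}")

# Copy of the MMR1 template with the metric filled in
structure = {
    "theme": "immunisation",
    "sub_theme": "childhood-vaccines",
    "topic": "MMR1",
    "geography_type": "Nation",
    "geography": "England",
    "metric": "MMR1_coverage_coverageByYear",
}

url = build_url(**structure)
print(url)
```

Because the dictionary keys match the parameter names, `**structure` unpacks the template directly into the call, which is why the templates only need the `metric` key added before use.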

In [7]:
def parse_date(datestring):
    return pd.to_datetime(datestring, format="%Y-%m-%d") # As in the example notebook, I have included this to convert string dates into pandas datetime objects

# For measles, I am working with regional rather than national data, so below I loop through each entry and
# build a nested dictionary with dates as the top-level keys and one column per region (named from the
# combination of geography_type and geography). This ensures that all regions are aligned by date in the DF.

def wrangle_measles(data):
    measles_dict = {}
    for entry in data:
        date = entry["date"]
        col = f"{entry['geography_type']}_{entry['geography']}"
        val = float(entry["metric_value"]) if entry["metric_value"] else 0.0
        measles_dict.setdefault(date, {})[col] = val # Organise the data by date, with inner dict for each region

    # Sorting dates and extracting all unique columns across regions
    dates = sorted(measles_dict.keys())
    columns = sorted({k for v in measles_dict.values() for k in v.keys()})

    # Create DF with sorted dates as index and all regions as columns
    df = pd.DataFrame(index=pd.to_datetime(dates), columns=columns)
    df.index.name = "Date"

    # Fill DF with values from nested dictionary
    # parse_date ensures the index is in pandas datetime format, like in example notebook
    for date, row in measles_dict.items():
        for col, val in row.items():
            df.loc[parse_date(date), col] = val
    
    # Filling holes left by missing dates/regions, and adjusting the types of the filled-in columns
    return df.fillna(0.0).infer_objects(copy = False)

def wrangle_mmr(mmr1, mmr2):
    # For MMR coverage, I have two datasets, both MMR1 and MMR2. As such, the function takes two arguments
    # The nested dictionary structure is the same as it is for measles, with dates as top-level keys and columns for each stratum
    mmr_dict = {}

    # processing MMR1
    for entry in mmr1:
        date = entry["date"]
        col = f"mmr1_coverage_{entry['stratum']}" # column name reflects the vaccine and stratum (age milestone)
        val = float(entry["metric_value"]) if entry["metric_value"] else 0.0
        mmr_dict.setdefault(date, {})[col] = val
        
    # processing MMR2
    for entry in mmr2:
        date = entry["date"]
        col = f"mmr2_coverage_{entry['stratum']}"
        val = float(entry["metric_value"]) if entry["metric_value"] else 0.0
        mmr_dict.setdefault(date, {})[col] = val

    # Sorting dates and collecting all unique columns, this time for both MMR1 and MMR2
    dates = sorted(mmr_dict.keys())
    columns = sorted({k for v in mmr_dict.values() for k in v.keys()})
    
    # Create DF with sorted dates as index and coverage columns for each stratum
    df = pd.DataFrame(index=pd.to_datetime(dates), columns=columns)
    df.index.name = "Date"

    # Again, filling DF with values from nested dictionary
    for date, row in mmr_dict.items():
        for col, val in row.items():
            df.loc[parse_date(date), col] = val

    # and again, filling any holes
    return df.fillna(0.0).infer_objects(copy = False)
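
The same long-to-wide reshaping can also be expressed with a pandas pivot. This is an alternative sketch, not the method used in the functions above, and the three sample records are invented for illustration; only their keys follow the shape of the entries the wrangling code reads.

```python
import pandas as pd

# Made-up records with the same keys as the API entries used above
sample = [
    {"date": "2024-01-01", "geography_type": "UKHSA Region", "geography": "London", "metric_value": 4},
    {"date": "2024-01-01", "geography_type": "UKHSA Region", "geography": "North West", "metric_value": 2},
    {"date": "2024-01-08", "geography_type": "UKHSA Region", "geography": "London", "metric_value": 7},
]

records = pd.DataFrame(sample)
records["Date"] = pd.to_datetime(records["date"], format="%Y-%m-%d")
records["column"] = records["geography_type"] + "_" + records["geography"]

# One row per date, one column per region; aggfunc="last" keeps the last
# value if a date/region pair appears twice, like the dictionary approach.
wide = (records.pivot_table(index="Date", columns="column",
                            values="metric_value", aggfunc="last")
               .fillna(0.0)      # missing date/region combinations become 0.0
               .astype(float))

print(wide)
```

North West has no record for 2024-01-08 in the sample, so that cell is filled with 0.0, mirroring the `fillna(0.0)` step in `wrangle_measles`.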

In [8]:
def access_api():
    """ Accesses the UKHSA API. Return data as a like-for-like replacement for the "canned"
    data loaded from the JSON file. If it fails, cached JSON files are loaded instead."""
    try:
        # for measles data, it loops through each region to get weekly cases
        measles_weekly_cases = []
        for region in measles_regions:
            structure = measles_structure_no_metric.copy()
            structure["geography"] = region
            structure["metric"] = "measles_cases_casesByOnsetWeek"
            api = APIwrapper(**structure) # creates an APIwrapper (defined above) for this region and metric
            data = api.get_all_pages()
            if data:
                measles_weekly_cases.extend(data)
        save_json_file(measles_weekly_cases, "measles_weekly_cases.json")
        # using the save_json_file helper functions we created earlier avoids the need
        # to have a separate "saving" section at the bottom of the cell. It keeps the code cleaner

        # get mmr1 national data
        mmr1_coverage = []
        structure = mmr1_structure_no_metric.copy()
        structure["metric"] = "MMR1_coverage_coverageByYear"
        api = APIwrapper(**structure)
        data = api.get_all_pages()
        if data:
            mmr1_coverage.extend(data)
        save_json_file(mmr1_coverage, "mmr1_coverage.json")

        # get mmr2 national data
        mmr2_coverage = []
        structure = mmr2_structure_no_metric.copy()
        structure["metric"] = "MMR2_coverage_coverageByYear"
        api = APIwrapper(**structure)
        data = api.get_all_pages()
        if data:
            mmr2_coverage.extend(data)
        save_json_file(mmr2_coverage, "mmr2_coverage.json")

        # The below returns the freshly fetched data if the API call succeeds
        # If it fails, the exception is caught and we load the cached JSON files instead
        # if that happens, we use the other JSON helper function.
        return measles_weekly_cases, mmr1_coverage, mmr2_coverage
    except Exception as e:
        print(f"API failure. Loading cached data: {e}")
        return (load_json_file("measles_weekly_cases.json"),
                load_json_file("mmr1_coverage.json"),
                load_json_file("mmr2_coverage.json"))

measles_weekly_cases, mmr1_coverage, mmr2_coverage = access_api() # Uses the above function to get data from API or load cached JSON
measlesdf = wrangle_measles(measles_weekly_cases) # converts measles data into a measles DF
mmrdf = wrangle_mmr(mmr1_coverage, mmr2_coverage) # converts mmr data into a mmr DF

In [9]:
def api_button_callback(button):
    """ Button callback. Accesses API, wrangles data, updates global variable dfs used for plotting. """
    clear_output(wait = True) # clears previous output, avoid stacking duplicate outputs
    global measlesdf, mmrdf # both dataframes here, as opposed to just one
    
    try:
        measles_weekly_cases, mmr1_coverage, mmr2_coverage = access_api() #gets fresh data from the api for all three metrics
        measlesdf = wrangle_measles(measles_weekly_cases) # wrangles the measles data into a df
        mmrdf = wrangle_mmr(mmr1_coverage, mmr2_coverage) # wrangles the mmr data into a df

        # the below turns the button green with a tick to show that the refresh was successful
        button.button_style = "success"
        button.icon = "check"
        
    except Exception as e:
        # the below turns the button red with an X to indicate failure
        button.button_style = "danger"
        button.icon = "times"
        print(f"Failed to refresh data: {e}")
        
    display(apibutton, measles_controls, measles_output, mmr_controls, mmr_output) # Redisplays widgets and outputs so graphs refresh

apibutton = wdg.Button(
    description = "Refresh Data",
    button_style = "info", # starts off turquoise - neutral colour
    tooltip = "Refresh the data for both graphs by clicking here"
)

# Display instructions and the button
apibutton.on_click(api_button_callback) # triggers the callback function
display(Markdown("Click the button below to refresh the data from the UKHSA API. This updates both graphs.")) # provides some info on what the button does
display(apibutton)

Click the button below to refresh the data from the UKHSA API. This updates both graphs.

Button(button_style='info', description='Refresh Data', style=ButtonStyle(), tooltip='Refresh the data for bot…

In [10]:
# Measles graph
def plot_measles(selected_regions, selected_scale):
    safe_regions = [r for r in selected_regions if r in measlesdf.columns] # only plot regions actually present in the DF; this prevents errors if some regions have no data after an API refresh or wrangling
    
    # determines whether to plot on a log scale. True if log scale, False if linear
    logy = selected_scale == "log"
    
    
    ax = measlesdf[safe_regions].plot(figsize = (12,6), logy = logy) # plots DF filtered to selected regions. Sets a wide plot and applies log scale if requested
    ax.set_title("Measles Weekly Cases by Region") # adds title to plot
    ax.set_ylabel("Cases" if not logy else "Cases (Log)") # labels the y axis, changes depending on the type of graph (log scale or not)
    ax.legend(loc = "center left", bbox_to_anchor=(1,0.5)) # places the legend outside the axes so it doesn't cover the data
    plt.tight_layout() #adjusts spacing
    plt.show() # displays plot

# Widgets for measles
default_regions = [r for r in ['UKHSA Region_London','UKHSA Region_North West','UKHSA Region_South East'] if r in measlesdf.columns] # defines initial selections
region_selector = wdg.SelectMultiple( # creates multi-select widget
    options = sorted(measlesdf.columns), # contains all available regions, organised alphabetically
    value = default_regions, # sets default selections
    rows = 9, # all 9 regions are visible in the selector by default
    description = "Regions:" # label to the left of the widget
)

scale_selector = wdg.RadioButtons( # creates radio button widget
    options = ['linear','log'],
    value = 'linear', # sets linear as the default option
    description = "Scale:"
)

measles_controls = wdg.HBox([region_selector, scale_selector]) # arranges the two widgets side by side
measles_output = wdg.interactive_output(plot_measles, {'selected_regions': region_selector, 'selected_scale': scale_selector}) # connects the widgets to the plotting function. Whenever a user changes a selection, plot_measles is called.

# The below provides explanation about graph to the user
display(Markdown("""### Graph 1: Measles Weekly Cases
This graph shows weekly reported measles cases by UKHSA region. This helps highlight when and where measles activity is increasing, allowing trends and regional differences to be compared. You can use the controls to select one or more regions and choose a linear or log scale. On Mac, hold Command (⌘) to select multiple regions. On Windows or Linux, hold Ctrl. The Y axis will adjust based on your selected scale."""))
display(measles_controls, measles_output)

# MMR graph
def plot_mmr(selected_year, selected_milestone):
    milestone_map = {"24m":["mmr1_coverage_24m"], "5y":["mmr1_coverage_5y","mmr2_coverage_5y"]} # Maps milestones to the relevant columns in mmrdf. "24m" corresponds to MMR1 coverage at 24 months, "5y" includes both MMR1 and MMR2 coverage at 5 years.

    cols = [c for c in milestone_map[selected_milestone] if c in mmrdf.columns] #Filters the columns from the milestone mapping to only include those actually present in mmrdf

    date_key = pd.to_datetime(f"{selected_year}-03-31") #Converts the user-selected year into a pandas datetime corresponding to the 31st of March of that year. The MMR coverage data in mmrdf is indexed by a single date per year, representing the end of the reporting period. This is the 31st of March.

    row = mmrdf.loc[date_key] # Extracts the data for the selected year
    data = {col.replace("mmr","MMR").replace("_coverage_"," ").upper(): row[col] for col in cols} # Cleans up column names for plotting to make the labels more readable
    df_plot = pd.DataFrame({"Coverage (%)": data.values()}, index=data.keys()) # Creates a new DF specifically for plotting the bar chart
    ax = df_plot.plot(kind = "bar", figsize = (7,5), legend = False) # plots a vertical bar chart, legend is false because the labels are already on the x axis
    ax.set_title(f"MMR Coverage ({selected_milestone}, {selected_year})") # selects a dynamic title based on the selected milestone and year
    ax.set_ylim(0,100) # sets the y-axis range from 0 to 100%
    plt.xticks(rotation = 0) # Keeps x-axis labels horizontal
    plt.tight_layout() # adjusts spacing
    plt.show() # displays plot

years = sorted(mmrdf.index.year.unique()) # Extracts all unique years from mmrdf and sorts them
milestones = sorted({c.split("_")[-1] for c in mmrdf.columns}) # Extracts unique age milestones from the column names

year_selector = wdg.Select( # List box to select year
    options = years,
    value = years[-1], # selects most recent year
    description = "Year:"
)

milestone_selector = wdg.Select( # List box to select age milestone
    options = milestones,
    value = '5y' if '5y' in milestones else milestones[0], # default 5y if it exists, otherwise the first available milestone
    description = "Age:")

mmr_controls = wdg.HBox([year_selector, milestone_selector]) # arranges the two widgets side by side
mmr_output = wdg.interactive_output(plot_mmr, {'selected_year': year_selector, 'selected_milestone': milestone_selector}) # connects the widgets to the plotting function. Whenever a user changes a selection, plot_mmr is called.

# same as above
display(Markdown("""### Graph 2: MMR Vaccine Coverage
This graph shows MMR1 and MMR2 vaccine coverage for each age milestone and year. These metrics indicate levels of population immunity in young children, which is closely linked to the likelihood of measles outbreaks. Use the controls to select a year and milestone. The bar chart displays coverage percentages for the selected milestone and year."""))
display(mmr_controls, mmr_output)

### Graph 1: Measles Weekly Cases
This graph shows weekly reported measles cases by UKHSA region. This helps highlight when and where measles activity is increasing, allowing trends and regional differences to be compared. You can use the controls to select one or more regions and choose a linear or log scale. On Mac, hold Command (⌘) to select multiple regions. On Windows or Linux, hold Ctrl. The Y axis will adjust based on your selected scale.

HBox(children=(SelectMultiple(description='Regions:', index=(2, 4, 5), options=('UKHSA Region_East Midlands', …

Output()

### Graph 2: MMR Vaccine Coverage
This graph shows MMR1 and MMR2 vaccine coverage for each age milestone and year. These metrics indicate levels of population immunity in young children, which is closely linked to the likelihood of measles outbreaks. Use the controls to select a year and milestone. The bar chart displays coverage percentages for the selected milestone and year.

HBox(children=(Select(description='Year:', index=15, options=(2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, …

Output()
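
`plot_mmr` looks up the selected year by building the date 31 March of that year. If the index dates ever differed from that assumption, the lookup would raise a `KeyError`. A more defensive variant is sketched below, assuming a hypothetical helper `row_for_year` and made-up coverage values; it is not part of the dashboard code.

```python
import pandas as pd

# Made-up coverage figures, indexed by one date per year
coverage = pd.DataFrame(
    {"mmr1_coverage_24m": [90.0, 89.0]},
    index=pd.to_datetime(["2022-03-31", "2023-03-31"]),
)

def row_for_year(df, year):
    """Return the last row whose index falls in the given year, or None if there is none."""
    matches = df[df.index.year == year]
    return None if matches.empty else matches.iloc[-1]

found = row_for_year(coverage, 2023)   # a Series with that year's values
absent = row_for_year(coverage, 2030)  # None: no data for that year

print(found)
print(absent)
```

Matching on the year rather than on an exact date keeps the lookup working whatever day of the year the reporting period ends on, at the cost of having to handle the `None` case in the plotting function.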

Based on UK Government [data](https://ukhsa-dashboard.data.gov.uk/) published by the [UK Health Security Agency](https://www.gov.uk/government/organisations/uk-health-security-agency) and on the [DIY Disease Tracking Dashboard Kit](https://github.com/fsmeraldi/diy-covid19dash) by Fabrizio Smeraldi. Released under the [GNU GPLv3.0 or later](https://www.gnu.org/licenses/).