<h1>Exploring infections through data: Mapping Measles, Mumps, and Rubella in the USA</h1>

In this notebook I explore how to build an interactive map of the historical incidence rates (per 100,000 people) of measles, mumps, and rubella in the USA. The data comes from Project Tycho, a collection of National Notifiable Disease Surveillance System reports, and is freely available from <a href="https://www.kaggle.com/pitt/contagious-diseases/home">Kaggle</a>. Several Python libraries can produce interactive visualisations of data, but in this notebook I will focus on Bokeh.
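For reference, an incidence rate per 100,000 people is the case count divided by the population, scaled by 100,000. The sketch below is only an illustration: the helper name `incidence_per_100k` and the population figure are made up, since the population values behind the dataset are not part of this notebook.

```python
def incidence_per_100k(cases, population):
    """Return the number of cases per 100,000 people."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases / population * 100_000

# Hypothetical figures: 50 cases in a population of 2,000,000
example_rate = incidence_per_100k(50, 2_000_000)
print(round(example_rate, 2))
```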

In [1]:
#Dependencies
import pandas as pd
import numpy as np
from bokeh.io import show, output_notebook, push_notebook
from bokeh.plotting import figure
from bokeh.models import CategoricalColorMapper, HoverTool, ColumnDataSource, Panel
from bokeh.models.widgets import CheckboxGroup, Slider, RangeSlider, Tabs
from bokeh.layouts import column, row, WidgetBox
from bokeh.palettes import Category20_16
from bokeh.application.handlers import FunctionHandler
from bokeh.application import Application

In [2]:
#Import data
measles = pd.read_csv("measles.csv")
mumps = pd.read_csv("mumps.csv")
rubella = pd.read_csv("rubella.csv")

In [3]:
measles.head()

     week  state  state_name  disease  cases  incidence_per_capita
0  192801     AL     ALABAMA  MEASLES     97                  3.67
1  192801     AR    ARKANSAS  MEASLES     76                  4.11
2  192801     AZ     ARIZONA  MEASLES      8                   1.9
3  192801     CA  CALIFORNIA  MEASLES     74                  1.38
4  192801     CO    COLORADO  MEASLES     85                  8.38


In [4]:
mumps.head()

     week  state            state_name  disease  cases  incidence_per_capita
0  196801     AK                ALASKA    MUMPS      7                  2.46
1  196801     AL               ALABAMA    MUMPS     39                  1.13
2  196801     AZ               ARIZONA    MUMPS     19                  1.13
3  196801     CA            CALIFORNIA    MUMPS    247                  1.27
4  196801     DC  DISTRICT OF COLUMBIA    MUMPS      1                  0.13


In [5]:
rubella.head()

     week  state   state_name  disease  cases  incidence_per_capita
0  196601     AL      ALABAMA  RUBELLA      7                   0.2
1  196601     AZ      ARIZONA  RUBELLA     29                   1.8
2  196601     CA   CALIFORNIA  RUBELLA      7                  0.04
3  196601     CT  CONNECTICUT  RUBELLA     11                  0.38
4  196601     HI       HAWAII  RUBELLA      1                  0.14


In [6]:
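# Combine the datasets. With no `on` argument, merge joins on all shared columns;
# the disease column differs between files, so the outer join simply stacks the rows.
mmr = pd.merge(measles, mumps, how='outer')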

In [7]:
# Add the rubella rows in the same way
mmr = pd.merge(mmr, rubella, how='outer')
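Because the three files share the same columns and no row can appear in more than one file (each has a different `disease` value), the outer merges here should behave like stacking rows. The sketch below checks that on toy frames and shows `pd.concat` as a possible alternative; the toy values and the reduced column set are illustrative, not the real data.

```python
import pandas as pd

# Toy frames standing in for the real CSVs (values are illustrative only)
cols = ["week", "state", "disease", "cases"]
a = pd.DataFrame([[192801, "AL", "MEASLES", 97]], columns=cols)
b = pd.DataFrame([[196801, "AK", "MUMPS", 7]], columns=cols)
c = pd.DataFrame([[196601, "AL", "RUBELLA", 7]], columns=cols)

# Outer merge on all shared columns, as in the notebook cells
merged = pd.merge(pd.merge(a, b, how="outer"), c, how="outer")

# Stacking the rows directly
stacked = pd.concat([a, b, c], ignore_index=True)

# Row order may differ between the two, so compare sorted row tuples
merged_rows = sorted(merged[cols].itertuples(index=False, name=None))
stacked_rows = sorted(stacked[cols].itertuples(index=False, name=None))
same = merged_rows == stacked_rows
print(same)
```

If the same row could appear in two inputs, the two approaches would diverge: the merge would match the rows into one, while `pd.concat` would keep both.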

In [10]:
mmr.describe()

                week       cases  incidence_per_capita
count       268126.0    268126.0              268126.0
mean   197008.487208   74.360838              2.541002
std      1893.399835  295.665568              8.426552
min         192801.0         0.0                   0.0
25%         195716.0         0.0                   0.0
50%         197332.0         3.0                  0.11
75%         198406.0        27.0                  1.07
max         200252.0     10402.0                683.06
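The summary above pools all three diseases, although the `head()` outputs show that each dataset starts in a different week. A per-disease breakdown may be more informative; the sketch below runs on a toy frame with made-up values, and on the real data the same `groupby` could be applied to `mmr`.

```python
import pandas as pd

# Toy stand-in for the combined frame (values are illustrative only)
toy = pd.DataFrame({
    "week": [192801, 192802, 196801, 196601],
    "disease": ["MEASLES", "MEASLES", "MUMPS", "RUBELLA"],
    "cases": [97, 80, 7, 7],
})

# First and last reporting week, and number of rows, per disease
coverage = toy.groupby("disease")["week"].agg(["min", "max", "count"])
print(coverage)
```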


In [11]:
mmr.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 268126 entries, 0 to 268125
Data columns (total 6 columns):
week                    268126 non-null int64
state                   268126 non-null object
state_name              268126 non-null object
disease                 268126 non-null object
cases                   268126 non-null int64
incidence_per_capita    268126 non-null float64
dtypes: float64(1), int64(2), object(3)
memory usage: 14.3+ MB


In [16]:
# week is encoded as YYYYWW; the first four digits are the year
mmr["year"] = mmr["week"].apply(lambda x: int(str(x)[0:4]))

In [28]:
# The last two digits of YYYYWW are the week number within the year
mmr["week_num"] = mmr["week"].apply(lambda x: int(str(x)[4:6]))
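The same split can be done without converting to strings row by row. The sketch below assumes every `week` value is a six-digit YYYYWW integer, which is consistent with the minimum (192801) and maximum (200252) reported by `describe()`; it uses a toy frame rather than `mmr`.

```python
import pandas as pd

# Toy frame with week encoded as YYYYWW (assumed six-digit integers)
toy = pd.DataFrame({"week": [192801, 196852, 200252]})

# Integer division drops the last two digits; the remainder keeps them
toy["year"] = toy["week"] // 100
toy["week_num"] = toy["week"] % 100

# Cross-check against the string-slicing approach
as_text = toy["week"].astype(str)
matches = bool(
    (toy["year"] == as_text.str[:4].astype(int)).all()
    and (toy["week_num"] == as_text.str[4:6].astype(int)).all()
)
print(toy)
```

Integer arithmetic works on the whole column at once, which tends to matter on a frame with a few hundred thousand rows.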