# How to make a choropleth map using flu data

I thought it'd be interesting to look at the reported flu cases in a map view. We're going to look at total numbers, so I'm going to combine the fluA and fluB dataframes.


---
# Viewing data on a US Map

In [105]:
# imports libraries for a choropleth map

import pandas as pd
import plotly.graph_objs as go
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
init_notebook_mode(connected=True)

In [106]:
df_fluA = pd.read_csv('fluA_strains.tsv', sep='\t')
df_fluB = pd.read_csv('fluB_strains.tsv', sep='\t')

In [107]:
# check the dataframe 

df_fluA.head(3)

Unnamed: 0,Strain Name,Complete Genome,Subtype,Collection Date,Host,Country,State/Province,Geographic Grouping,Flu Season,Submission Date,...,RERRRKKR,Sensitive Drug,Resistant Drug,Submission Date.1,NCBI Taxon ID,pH1N1-like,US Swine H1 Clade,Global Swine H1 Clade test,H5 Clade,Unnamed: 52
0,A/Alabama/01/2018,Yes,H1N1,01/02/2018,Human,USA,Alabama,North America,17-18,2018-03-24,...,No,-N/A-,-N/A-,03/24/2018,11320,Mixed Positive and Negative Segments,npdm,1A.3.3.2,-N/A-,
1,A/Alabama/02/2018,Yes,H1N1,01/03/2018,Human,USA,Alabama,North America,17-18,2018-03-24,...,No,"Oseltamivir,Zanamivir",-N/A-,03/24/2018,11320,Mixed Positive and Negative Segments,npdm,1A.3.3.2,-N/A-,
2,A/Alabama/03/2018,Yes,H3N2,01/03/2018,Human,USA,Alabama,North America,17-18,2018-03-24,...,No,-N/A-,-N/A-,03/24/2018,11320,Negative,-N/A-,-N/A-,-N/A-,


The first thing we have to do is convert the state names to two-letter abbreviations, because the map's `USA-states` location mode matches states by abbreviation rather than by full name.

Here's a dictionary of state names to abbreviations, thanks to [rogerallen](https://gist.github.com/rogerallen/1583593).


In [40]:
us_state_abbrev = {
    'Alabama': 'AL',
    'Alaska': 'AK',
    'Arizona': 'AZ',
    'Arkansas': 'AR',
    'California': 'CA',
    'Colorado': 'CO',
    'Connecticut': 'CT',
    'Delaware': 'DE',
    'Florida': 'FL',
    'Georgia': 'GA',
    'Hawaii': 'HI',
    'Idaho': 'ID',
    'Illinois': 'IL',
    'Indiana': 'IN',
    'Iowa': 'IA',
    'Kansas': 'KS',
    'Kentucky': 'KY',
    'Louisiana': 'LA',
    'Maine': 'ME',
    'Maryland': 'MD',
    'Massachusetts': 'MA',
    'Michigan': 'MI',
    'Minnesota': 'MN',
    'Mississippi': 'MS',
    'Missouri': 'MO',
    'Montana': 'MT',
    'Nebraska': 'NE',
    'Nevada': 'NV',
    'New Hampshire': 'NH',
    'New Jersey': 'NJ',
    'New Mexico': 'NM',
    'New York': 'NY',
    'North Carolina': 'NC',
    'North Dakota': 'ND',
    'Ohio': 'OH',
    'Oklahoma': 'OK',
    'Oregon': 'OR',
    'Pennsylvania': 'PA',
    'Rhode Island': 'RI',
    'South Carolina': 'SC',
    'South Dakota': 'SD',
    'Tennessee': 'TN',
    'Texas': 'TX',
    'Utah': 'UT',
    'Vermont': 'VT',
    'Virginia': 'VA',
    'Washington': 'WA',
    'West Virginia': 'WV',
    'Wisconsin': 'WI',
    'Wyoming': 'WY',
}

In [108]:
# Using pandas.Series.map to create a new column 'abbrev' with the proper two-letter state abbreviation
# for both the fluA and fluB dataframes (names missing from the dictionary become NaN)

df_fluA['abbrev'] = df_fluA['State/Province'].map(us_state_abbrev)
df_fluB['abbrev'] = df_fluB['State/Province'].map(us_state_abbrev)
df_fluA.head(3)

Unnamed: 0,Strain Name,Complete Genome,Subtype,Collection Date,Host,Country,State/Province,Geographic Grouping,Flu Season,Submission Date,...,Sensitive Drug,Resistant Drug,Submission Date.1,NCBI Taxon ID,pH1N1-like,US Swine H1 Clade,Global Swine H1 Clade test,H5 Clade,Unnamed: 52,abbrev
0,A/Alabama/01/2018,Yes,H1N1,01/02/2018,Human,USA,Alabama,North America,17-18,2018-03-24,...,-N/A-,-N/A-,03/24/2018,11320,Mixed Positive and Negative Segments,npdm,1A.3.3.2,-N/A-,,AL
1,A/Alabama/02/2018,Yes,H1N1,01/03/2018,Human,USA,Alabama,North America,17-18,2018-03-24,...,"Oseltamivir,Zanamivir",-N/A-,03/24/2018,11320,Mixed Positive and Negative Segments,npdm,1A.3.3.2,-N/A-,,AL
2,A/Alabama/03/2018,Yes,H3N2,01/03/2018,Human,USA,Alabama,North America,17-18,2018-03-24,...,-N/A-,-N/A-,03/24/2018,11320,Negative,-N/A-,-N/A-,-N/A-,,AL
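
The dictionary above covers the 50 states only, so any other value in `State/Province` (for example a district or territory) maps to `NaN`, and `value_counts()` skips `NaN` by default. A minimal sketch of how to list such rows before plotting; the toy Series and the shortened `lookup` dictionary are made up for illustration and are not taken from the dataset:

```python
import pandas as pd

# Toy stand-in for the 'State/Province' column (hypothetical values)
states = pd.Series(['Alabama', 'District of Columbia', 'Texas', 'Puerto Rico', 'Texas'])

# Shortened lookup for illustration; the real dictionary holds all 50 states
lookup = {'Alabama': 'AL', 'Texas': 'TX'}

abbrev = states.map(lookup)

# Names absent from the lookup come back as NaN; count them by original name
unmapped = states[abbrev.isna()].value_counts()
print(unmapped)
```

If the result is non-empty, those names can either be added to the dictionary or dropped deliberately.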


In [109]:
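# combine the two dataframes; with no `on` argument, the merge joins on every column
# the two frames share, and how='outer' keeps the rows from both sides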

df_all = pd.merge(df_fluA, df_fluB, how = 'outer')
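
Since the goal here is just to stack the fluA and fluB rows, `pd.concat` is an alternative that states the intent more directly. A minimal sketch on toy frames, assuming both files share the same column layout; `flu_a`, `flu_b`, and their rows are hypothetical stand-ins, not the real data:

```python
import pandas as pd

# Toy frames standing in for df_fluA and df_fluB (hypothetical rows)
flu_a = pd.DataFrame({
    'Strain Name': ['A/Alabama/01/2018', 'A/Texas/02/2018'],
    'abbrev': ['AL', 'TX'],
})
flu_b = pd.DataFrame({
    'Strain Name': ['B/Texas/01/2018'],
    'abbrev': ['TX'],
})

# Stack the rows of both frames and rebuild a clean 0..n-1 index
stacked = pd.concat([flu_a, flu_b], ignore_index=True)
print(stacked)
```

Unlike a merge, this never tries to match rows between the two frames, so the result always has exactly as many rows as the inputs combined.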

In [111]:
# count the rows per state abbreviation (a Series), then turn it into a two-column dataframe to load into the choropleth map

all_cts = df_all['abbrev'].value_counts()

df_all_cts = pd.DataFrame(all_cts)
df_all_cts = df_all_cts.reset_index()
df_all_cts.columns = ['abbrev', 'counts']
df_all_cts.sample(3)

Unnamed: 0,abbrev,counts
10,AZ,120
42,LA,52
20,HI,92
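
The same counts table can be built in one chained expression. This is only a sketch on a toy Series (the values are made up); `rename_axis` and `reset_index(name=...)` are standard pandas methods:

```python
import pandas as pd

# Toy stand-in for df_all['abbrev'], including one unmapped (missing) entry
abbrev = pd.Series(['TX', 'AL', 'TX', None, 'TX', 'AL'], name='abbrev')

counts = (
    abbrev.value_counts()             # rows per abbreviation, missing values skipped
          .rename_axis('abbrev')      # name the index so it becomes a proper column
          .reset_index(name='counts') # move the index into a column, name the values
)
print(counts)
```

Naming both columns explicitly keeps the result the same regardless of how a given pandas version labels the output of `value_counts()`.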


In [125]:
data_all = dict(
        type = 'choropleth',
        colorscale = 'Greens',
        reversescale = True,
        locations = df_all_cts['abbrev'],
        z = df_all_cts['counts'],
        locationmode = 'USA-states',
        text = df_all_cts['abbrev'],  # hover text needs one entry per location, not a single literal string
        marker = dict(line = dict(color = 'rgb(255,255,255)',width = 1)),
        colorbar = {'title':'Reported cases'}
            ) 
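
The `text` entry takes one label per location. For fuller hover labels, one option is to invert the name-to-abbreviation dictionary and map the abbreviations back to state names. A minimal sketch with toy data; `abbrev_to_name`, `hover_text`, the shortened `lookup`, and the sample counts are illustrative only:

```python
import pandas as pd

# Shortened version of the state lookup, for illustration
lookup = {'Alabama': 'AL', 'Texas': 'TX'}

# Invert it so abbreviations map back to full names
abbrev_to_name = {abbr: name for name, abbr in lookup.items()}

# Toy counts table in the same shape as df_all_cts (made-up numbers)
counts = pd.DataFrame({'abbrev': ['TX', 'AL'], 'counts': [3, 2]})

# One hover label per row, e.g. "Texas: 3 cases"
counts['state'] = counts['abbrev'].map(abbrev_to_name)
hover_text = counts['state'] + ': ' + counts['counts'].astype(str) + ' cases'
print(hover_text.tolist())
```

A Series like `hover_text` could then be passed as the `text` value of the trace.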

In [128]:
layout = dict(title = 'Influenza Research Database Reported Cases 2017-18',
              geo = dict(scope='usa',
                         showlakes = True,
                         lakecolor = 'rgb(85,173,240)')
             )
             

In [129]:
choromap = go.Figure(data = [data_all],layout = layout)
iplot(choromap,validate=False)

This is a fairly simplified example that reports the number of cases by state. Further detail could include a breakdown of cases by subtype within each state, or a bivariate map that adds a second variable from the dataframe, such as known drug sensitivity.

The simple approach should be fine here, since the dataset is relatively small and comes from a single data source, but it could be expanded to accommodate a larger dataset with more interesting variable combinations.
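
As a starting point for the subtype breakdown mentioned above, `pd.crosstab` can tabulate subtype counts per state. A minimal sketch on made-up rows; the column names `abbrev` and `Subtype` follow the dataframe used in this post, while the values are hypothetical:

```python
import pandas as pd

# Toy rows in the shape of df_all (hypothetical values)
df = pd.DataFrame({
    'abbrev':  ['AL', 'AL', 'AL', 'TX', 'TX'],
    'Subtype': ['H1N1', 'H1N1', 'H3N2', 'H3N2', 'H3N2'],
})

# Rows are states, columns are subtypes, cells are record counts
by_subtype = pd.crosstab(df['abbrev'], df['Subtype'])
print(by_subtype)
```

Each subtype column could then feed its own choropleth trace, or be combined into the hover text of a single map.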