### CUNY MSDA DATA 608 
#### FINAL PROJECT: OPIOID EPIDEMIC IN THE UNITED STATES
**By Dmitriy Vecheruk**  
Fall Semester 2017 

This notebook reads and prepares the data for the interactive Dash visualization built in this final project.
  
The following documents how to reproduce the charts from the downloaded datasets contained in the repository. The Dash visualization is self-contained and does not require running this notebook.

In [389]:
import pandas as pd
pd.options.mode.chained_assignment = None
import numpy as np

import os
import string
import json
from bisect import bisect_left

import plotly.offline as py
from plotly.graph_objs import *
from plotly import tools
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot

init_notebook_mode(connected=True)
%matplotlib inline

### Load and process national counts of deaths from drug overdose and other causes, 1999-2015

Total deaths related to drug overdose are taken from [CDC WONDER](http://wonder.cdc.gov/cmf-icd10.html). They are selected by the ICD-10 codes that the National Vital Statistics System uses in its analysis of drugs involved in drug overdose deaths: https://www.cdc.gov/nchs/data/nvsr/nvsr65/nvsr65_10.pdf

In [396]:
deaths_drugs_year = pd.read_csv("data/Compressed_Mortality_1999-2015_Drug-related_overall.txt",sep="\t")
deaths_drugs_year = deaths_drugs_year[deaths_drugs_year.Year.notnull()]
deaths_drugs_year.head()

|   | Notes | ICD Chapter | ICD Chapter Code | Year | Year Code | Deaths | Population | Crude Rate | Age Adjusted Rate |
|---|---|---|---|---|---|---|---|---|---|
| 0 |  | External causes of morbidity and mortality | V01-Y89 | 1999.0 | 1999.0 | 8867.0 | 279040168.0 | 3.2 | 3.2 |
| 1 |  | External causes of morbidity and mortality | V01-Y89 | 2000.0 | 2000.0 | 9358.0 | 281421906.0 | 3.3 | 3.3 |
| 2 |  | External causes of morbidity and mortality | V01-Y89 | 2001.0 | 2001.0 | 10599.0 | 284968955.0 | 3.7 | 3.7 |
| 3 |  | External causes of morbidity and mortality | V01-Y89 | 2002.0 | 2002.0 | 12659.0 | 287625193.0 | 4.4 | 4.4 |
| 4 |  | External causes of morbidity and mortality | V01-Y89 | 2003.0 | 2003.0 | 13817.0 | 290107933.0 | 4.8 | 4.8 |

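The `Year.notnull()` filter above implies that the CDC WONDER export contains rows with an empty `Year`. These are the notes appended after the data rows. The minimal sketch below reproduces that filter on a hypothetical, shortened export; the notes text is made up for illustration. The empty notes rows also explain why `Year` shows up as a float (e.g. `1999.0`) in the outputs: a column with missing values cannot keep an integer dtype.

```python
import io

import pandas as pd

# Hypothetical, shortened CDC WONDER export: two data rows followed by
# notes rows in which only the "Notes" column is filled in.
RAW = (
    "Notes\tYear\tYear Code\tDeaths\n"
    "\t1999\t1999\t8867\n"
    "\t2000\t2000\t9358\n"
    "---\t\t\t\n"
    "Dataset: Compressed Mortality (hypothetical note)\t\t\t\n"
)

df = pd.read_csv(io.StringIO(RAW), sep="\t")

# Keep only the data rows: the notes rows have no Year value.
df = df[df["Year"].notnull()]
print(df["Year"].tolist())  # [1999.0, 2000.0]
```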

In [398]:
deaths_other_year = pd.read_csv("data/Compressed_Mortality_1999-2015_cause_of_death.txt",sep="\t")
deaths_other_year = deaths_other_year[deaths_other_year.Year.notnull()]
deaths_other_year.head()

|   | Notes | Injury Mechanism & All Other Leading Causes | Injury Mechanism & All Other Leading Causes Code | Year | Year Code | Deaths | Population | Crude Rate | Age Adjusted Rate |
|---|---|---|---|---|---|---|---|---|---|
| 0 |  | Cut/Pierce | GRINJ-001 | 1999.0 | 1999.0 | 2369.0 | 279040168.0 | 0.8 | 0.8 |
| 1 |  | Cut/Pierce | GRINJ-001 | 2000.0 | 2000.0 | 2288.0 | 281421906.0 | 0.8 | 0.8 |
| 2 |  | Cut/Pierce | GRINJ-001 | 2001.0 | 2001.0 | 2532.0 | 284968955.0 | 0.9 | 0.9 |
| 3 |  | Cut/Pierce | GRINJ-001 | 2002.0 | 2002.0 | 2762.0 | 287625193.0 | 1.0 | 1.0 |
| 4 |  | Cut/Pierce | GRINJ-001 | 2003.0 | 2003.0 | 2742.0 | 290107933.0 | 0.9 | 0.9 |


In [400]:
# Identify the top 5 causes (by total deaths, 1999-2015) in the dataset
deaths_other_year.groupby(["Injury Mechanism & All Other Leading Causes","Injury Mechanism & All Other Leading Causes Code"])["Deaths"].sum()\
.sort_values(ascending=False).reset_index().head(5)

|   | Injury Mechanism & All Other Leading Causes | Injury Mechanism & All Other Leading Causes Code | Deaths |
|---|---|---|---|
| 0 | Non-Injury: Diseases of Heart | GR113 055-068 | 10939923.0 |
| 1 | Non-Injury: Malignant neoplasms (Cancers) | GR113 020-043 | 9646498.0 |
| 2 | Non-Injury: Cerebrovascular diseases, includin... | GR113-070 | 2437998.0 |
| 3 | Non-Injury: Chronic lower respiratory diseases | GR113 082 | 2280130.0 |
| 4 | Non-Injury: Alzheimers disease | GR113-052 | 1257309.0 |


In [401]:
cause_filter = ["GR113 055-068","GR113 020-043","GR113-070","GR113 082","GR113-052"]
deaths_other_year_filt = deaths_other_year[deaths_other_year[
    "Injury Mechanism & All Other Leading Causes Code"].isin(cause_filter)]

deaths_other_year_filt = deaths_other_year_filt.rename(columns={"Injury Mechanism & All Other Leading Causes":"Cause"}) 
deaths_drugs_year = deaths_drugs_year.rename(columns={"ICD Chapter":"Cause"})
deaths_drugs_year["Cause"] = "Drug overdose related causes"

deaths_year = pd.concat([deaths_other_year_filt[["Cause", "Year","Deaths","Population","Age Adjusted Rate"]],
                         deaths_drugs_year[["Cause", "Year","Deaths","Population","Age Adjusted Rate"]]
                        ], axis = 0,ignore_index=True)
deaths_year["Age Adjusted Rate"] = deaths_year["Age Adjusted Rate"].astype("float")
deaths_year = deaths_year.sort_values(["Cause","Year"])

### Compare with other causes of death: 2015 vs. 1999

In [402]:
deaths_year.to_csv("processed_data/deaths_year.csv",index=False)

In [403]:
deaths_year_comp = deaths_year.loc[deaths_year["Year"].isin([1999.0,2015.0])]
# pct_change compares consecutive rows, so sort by Year within each Cause first
deaths_year_comp = deaths_year_comp.sort_values(["Cause","Year"])
deaths_year_comp["Growth since 1999"] = deaths_year_comp.groupby("Cause")["Age Adjusted Rate"].pct_change()*100
deaths_year_comp["Cause"] = deaths_year_comp["Cause"].str.replace("Non-Injury: ","")
deaths_year_comp

|   | Cause | Year | Deaths | Population | Age Adjusted Rate | Growth since 1999 |
|---|---|---|---|---|---|---|
| 85 | Drug overdose related causes | 1999.0 | 8867.0 | 279040168.0 | 3.2 |  |
| 101 | Drug overdose related causes | 2015.0 | 30411.0 | 321418820.0 | 9.4 | 193.75 |
| 17 | Alzheimers disease | 1999.0 | 44536.0 | 279040168.0 | 16.5 |  |
| 33 | Alzheimers disease | 2015.0 | 110561.0 | 321418820.0 | 29.4 | 78.181818 |
| 51 | Cerebrovascular diseases, including stroke | 1999.0 | 167366.0 | 279040168.0 | 61.6 |  |
| 67 | Cerebrovascular diseases, including stroke | 2015.0 | 140323.0 | 321418820.0 | 37.6 | -38.961039 |
| 68 | Chronic lower respiratory diseases | 1999.0 | 124181.0 | 279040168.0 | 45.4 |  |
| 84 | Chronic lower respiratory diseases | 2015.0 | 155041.0 | 321418820.0 | 41.6 | -8.370044 |
| 34 | Diseases of Heart | 1999.0 | 725192.0 | 279040168.0 | 266.4 |  |
| 50 | Diseases of Heart | 2015.0 | 633842.0 | 321418820.0 | 168.5 | -36.749249 |

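The growth column relies on `pct_change` comparing each row with the previous row in the same group. The rows therefore have to be sorted by `Year` within each `Cause`. The sketch below uses four age-adjusted rates from the table above. Its extra `Ratio to 1999` column is added here only for illustration. It shows that a +193.75% change means the 2015 rate is about 2.94 times the 1999 rate.

```python
import pandas as pd

# Age-adjusted rates per 100k population, copied from the table above.
rates = pd.DataFrame({
    "Cause": ["Drug overdose related causes"] * 2 + ["Diseases of Heart"] * 2,
    "Year": [1999.0, 2015.0, 1999.0, 2015.0],
    "Age Adjusted Rate": [3.2, 9.4, 266.4, 168.5],
})

# pct_change compares each row with the previous row *within its group*,
# so the rows must be ordered by Year inside each Cause.
rates = rates.sort_values(["Cause", "Year"])
rates["Growth since 1999"] = rates.groupby("Cause")["Age Adjusted Rate"].pct_change() * 100

# The same information as a ratio: 2015 rate / 1999 rate = 1 + growth / 100.
rates["Ratio to 1999"] = 1 + rates["Growth since 1999"] / 100
print(rates)
```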

In [404]:
deaths_year_comp.to_csv("processed_data/deaths_year_comp.csv",index=False)

In [406]:
# Create chart with change over time

df = deaths_year_comp[deaths_year_comp["Year"]==2015.0].sort_values("Growth since 1999")
x = "Growth since 1999"
y = "Cause"
category = "Cause"

scl = {
    'Drug overdose related causes': 'rgb(222,45,38)',
    'Alzheimers disease': 'rgb(251,106,74)',
    'Cerebrovascular diseases, including stroke': 'rgb(153,216,201)',
    'Chronic lower respiratory diseases': 'rgb(153,216,201)',
    'Diseases of Heart': 'rgb(153,216,201)',
    'Malignant neoplasms (Cancers)': 'rgb(153,216,201)'
}

groups = df[category].unique()
data = []

for item in groups: 
    data.append(
        Bar(
            x = df[df[category]==item][x],
            y = df[df[category]==item][y],
            name = item,
            text = df[df[category]==item][x].round(0).astype("str")+"%",
            textposition = 'auto',
            orientation = 'h',
            marker=dict(color = scl[item])
            )
        )


layout = dict(
    title = 'Mortality rate change from selected causes: 2015 vs. 1999',
    showlegend = False,
    margin=Margin(
        l=260
    ),
)

# data = [trace]

fig = dict(data=data, layout=layout)
py.iplot(fig, filename='deaths_years')

The age-adjusted death rates for four of the five leading non-injury causes of death declined between 1999 and 2015. Over the same period, the rate for drug overdose related causes rose by about 194% (from 3.2 to 9.4 deaths per 100k population), meaning it nearly tripled.

### Create a chart with deaths over time

In [517]:
# Create chart with deaths over time

def plotly_figure_line(df, x, y, group=None, group_colors=None, mode="lines+markers"):
    """Return a list of Scatter traces, one per level of the `group` column."""
    df = df.copy()  # avoid adding columns to the caller's DataFrame

    if group is None:
        df["trace_id"] = "1"  # dummy grouping variable -> a single trace
        group = "trace_id"

    if group_colors is None:
        group_colors = {}  # unmapped groups fall back to black below

    data = []

    for item in df[group].unique():
        subset = df[df[group] == item]
        data.append(
            Scatter(
                x = subset[x],
                y = subset[y],
                name = item,
                mode = mode,
                line = dict(
                        width = 2,
                        color = group_colors.get(item, "rgb(0,0,0)"),
                        shape='spline'
                    )
                )
            )

    return data

df = deaths_year[deaths_year["Cause"]=="Drug overdose related causes"]
x = "Year"
y = "Deaths"
category = "Cause"
title='Deaths from drug overdose related causes: 1999 - 2015'
xaxis_title=""
yaxis_title="Deaths per year"


data = plotly_figure_line(df=df,x=x,y=y,group=None,group_colors = None,mode="lines+markers")

layout = dict(title = title,xaxis = dict(title=xaxis_title),yaxis = dict(title=yaxis_title))

fig = dict(data=data, layout=layout)

py.iplot(fig, filename='deaths_years')

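The `plotly_figure_line` helper above filters the frame once per group. The same one-trace-per-group pattern can be sketched with `DataFrame.groupby`, returning each trace as a plain dict with a `type` key. The choropleth cell in this notebook passes its trace to plotly in the same way. `line_traces` and the sample data are hypothetical. One difference from `.unique()`: by default `groupby` skips rows whose group value is missing.

```python
import pandas as pd


def line_traces(df, x, y, group, group_colors=None):
    """Build one scatter trace (as a plain dict) per level of `group`."""
    group_colors = group_colors or {}
    traces = []
    # sort=False keeps groups in order of first appearance, like .unique().
    for name, subset in df.groupby(group, sort=False):
        traces.append(dict(
            type="scatter",
            mode="lines+markers",
            name=name,
            x=subset[x].tolist(),
            y=subset[y].tolist(),
            line=dict(width=2, color=group_colors.get(name, "rgb(0,0,0)"), shape="spline"),
        ))
    return traces


# Hypothetical example data in the shape of the drug-type table.
sample = pd.DataFrame({
    "Indicator": ["Heroin", "Heroin", "Cocaine"],
    "year_month": ["2015 January", "2015 July", "2015 January"],
    "Data Value": [100, 120, 80],
})
traces = line_traces(sample, "year_month", "Data Value", "Indicator",
                     group_colors={"Heroin": "rgb(178,24,43)"})
```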
### Load and process deaths by demographics

In [409]:
state_codes = pd.read_csv("processed_data/us_state_codes.csv", sep=";")

In [410]:
deaths_dem = pd.read_csv("data/Compressed Mortality_1999-2015_age_gender.txt",sep="\t")

In [411]:
deaths_dem.head()

|   | Notes | State | State Code | Year | Year Code | Age Group | Age Group Code | Gender | Gender Code | Deaths | Population | Crude Rate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 |  | Alabama | 1.0 | 1999.0 | 1999.0 | 25-34 years | 25-34 | Female | F | 15.0 | 308250.0 | 4.9 (Unreliable) |
| 1 |  | Alabama | 1.0 | 1999.0 | 1999.0 | 25-34 years | 25-34 | Male | M | 27.0 | 300073.0 | 9.0 |
| 2 |  | Alabama | 1.0 | 1999.0 | 1999.0 | 35-44 years | 35-44 | Female | F | 22.0 | 351301.0 | 6.3 |
| 3 |  | Alabama | 1.0 | 1999.0 | 1999.0 | 35-44 years | 35-44 | Male | M | 29.0 | 335631.0 | 8.6 |
| 4 |  | Alabama | 1.0 | 1999.0 | 1999.0 | 45-54 years | 45-54 | Female | F | 19.0 | 301328.0 | 6.3 (Unreliable) |


In [412]:
# Clean the data
deaths_dem = deaths_dem[deaths_dem.Year.notnull()]

# The death rate is treated as unreliable if the numerator (death count) is below 20
deaths_dem["rate_unreliable"] = 0
deaths_dem.loc[deaths_dem["Deaths"]<20,"rate_unreliable"] = 1
# Strip the "(Unreliable)" annotation; regex=True is needed on pandas >= 2.0
deaths_dem["Crude Rate"] = deaths_dem["Crude Rate"].str.replace(r"[a-zA-Z\(\)]", "", regex=True)
deaths_dem["Crude Rate"] = deaths_dem["Crude Rate"].astype("float")

deaths_dem.drop(["Notes","Age Group Code","Year Code","Gender Code"],axis=1,inplace=True)

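The cleaning steps above can be tried on two rows taken from the output shown earlier. Note that from pandas 2.0 on, `Series.str.replace` treats the pattern as a literal string unless `regex=True` is passed. Without that flag the character class would match nothing, and the float conversion would fail on `"4.9 (Unreliable)"`.

```python
import pandas as pd

# Two rows from the Alabama 1999 output above.
df = pd.DataFrame({
    "Deaths": [15.0, 27.0],
    "Crude Rate": ["4.9 (Unreliable)", "9.0"],
})

# Flag rates computed from fewer than 20 deaths (same threshold as above).
df["rate_unreliable"] = (df["Deaths"] < 20).astype(int)

# Remove letters and parentheses, then surrounding whitespace, then convert.
df["Crude Rate"] = df["Crude Rate"].str.replace(r"[a-zA-Z()]", "", regex=True)
df["Crude Rate"] = df["Crude Rate"].str.strip().astype(float)
print(df)
```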
In [413]:
# Add state names

deaths_dem = deaths_dem.merge(state_codes, how='left', left_on="State", right_on="state_territory")
deaths_dem.drop(["State Code","state_territory"],axis=1,inplace=True)

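The state lookup is a left merge. A state name in the CDC file that is spelled differently from `state_territory` in `us_state_codes.csv` therefore keeps its row but gets no `code`, and would be missing from the choropleth. The sketch below shows one way to catch such rows with `merge(..., indicator=True)`. The lookup and death rows are hypothetical, and "Example State" stands in for an unmatched name.

```python
import pandas as pd

# Hypothetical lookup table in the shape of us_state_codes.csv.
state_codes = pd.DataFrame({"state_territory": ["Alabama", "Alaska"],
                            "code": ["AL", "AK"]})

# Hypothetical death counts, including one name absent from the lookup.
deaths = pd.DataFrame({"State": ["Alabama", "Alaska", "Example State"],
                       "Deaths": [15.0, 27.0, 30.0]})

merged = deaths.merge(state_codes, how="left", left_on="State",
                      right_on="state_territory", indicator=True)

# "_merge" is "left_only" where no matching state code was found.
unmatched = merged.loc[merged["_merge"] == "left_only", "State"].tolist()
print(unmatched)  # ['Example State']
```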
In [414]:
deaths_dem.head()

|   | State | Year | Age Group | Gender | Deaths | Population | Crude Rate | rate_unreliable | code |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Alabama | 1999.0 | 25-34 years | Female | 15.0 | 308250.0 | 4.9 | 1 | AL |
| 1 | Alabama | 1999.0 | 25-34 years | Male | 27.0 | 300073.0 | 9.0 | 0 | AL |
| 2 | Alabama | 1999.0 | 35-44 years | Female | 22.0 | 351301.0 | 6.3 | 0 | AL |
| 3 | Alabama | 1999.0 | 35-44 years | Male | 29.0 | 335631.0 | 8.6 | 0 | AL |
| 4 | Alabama | 1999.0 | 45-54 years | Female | 19.0 | 301328.0 | 6.3 | 1 | AL |


In [415]:
deaths_dem.to_csv("processed_data/deaths_dem.csv",index=False)

### Load and clean the data on drug types 

In [558]:
deaths_drugs = pd.read_csv("data/VSRR_Provisional_Drug_Overdose_Death_Counts.csv")

In [559]:
deaths_drugs.head()

|   | State | State Name | Year | Month | Period | Indicator | Data Value | Percent Complete | Percent Pending Investigation | Footnote |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 |  | Alaska | 2015 | January | 12 month-ending | Number of Deaths | 4034 | 100 | 0.0 |  |
| 1 |  | Alaska | 2015 | February | 12 month-ending | Number of Deaths | 4084 | 100 | 0.0 |  |
| 2 |  | Alaska | 2015 | March | 12 month-ending | Number of Deaths | 4101 | 100 | 0.0 |  |
| 3 |  | Alaska | 2015 | April | 12 month-ending | Number of Deaths | 4133 | 100 | 0.0 |  |
| 4 |  | Alaska | 2015 | May | 12 month-ending | Number of Deaths | 4196 | 100 | 0.0 |  |


In [560]:
# deaths_drugs["Indicator"].unique()
exclude = ['Number of Deaths', 'Number of Drug Overdose Deaths','Percent with drugs specified']


# Filter to the drug types only
deaths_drugs = deaths_drugs[~deaths_drugs["Indicator"].isin(exclude)]
# Drop the ICD-10 T-code in parentheses; regex=True is needed on pandas >= 2.0
deaths_drugs["Indicator"] = deaths_drugs["Indicator"].str.replace(r"\(T.+\)", "", regex=True).str.strip()

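The pattern `\(T.+\)` used above removes a parenthesized ICD-10 T-code from each indicator label. The sketch below assumes labels of the form `"Heroin (T40.1)"`, since the exact VSRR labels are not printed in this notebook. Because `.+` is greedy, a label containing two such groups would lose everything from the first `(T` to the last `)`.

```python
import re

# Assumed label format: drug name followed by its ICD-10 T-code in parentheses.
labels = ["Heroin (T40.1)", "Cocaine (T40.5)", "Methadone (T40.3)"]

# Same pattern as in the cell above: "(T", then anything, then ")".
PATTERN = re.compile(r"\(T.+\)")

cleaned = [PATTERN.sub("", label).strip() for label in labels]
print(cleaned)  # ['Heroin', 'Cocaine', 'Methadone']
```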
In [561]:
deaths_drugs.to_csv("processed_data/deaths_drugs.csv", index=False)

In [562]:
deaths_drugs = pd.read_csv("processed_data/deaths_drugs.csv")

# Combined year-month label used as the (categorical) x axis
deaths_drugs["year_month"] = deaths_drugs["Year"].astype("str")+" "+deaths_drugs["Month"].astype("str")

df = deaths_drugs[(deaths_drugs["State Name"]=="United States") & 
                  (deaths_drugs["Month"].isin(["January","July"]))]

# Setup the colorscale
levels = deaths_drugs.Indicator.unique()
colors = ['rgb(178,24,43)','rgb(239,138,98)','rgb(223,189,179)','rgb(150,150,150)','rgb(93,93,93)','rgb(77,77,177)']
scl = dict([item for item in zip(levels,reversed(colors))])


data = plotly_figure_line(df=df,x="year_month",y="Data Value",group="Indicator",
                          group_colors = scl,mode="lines+markers")

title = "12 Month-ending Provisional Counts of Drug Overdose Deaths by Drug"
xaxis_title = ""
yaxis_title = "Provisional count of deaths"


layout = dict(title = title,
              xaxis = dict(title=xaxis_title),
              yaxis = dict(title=yaxis_title))

fig = dict(data=data, layout=layout)

py.iplot(fig, filename='deaths_drugs')

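`year_month` above is a plain string such as `"2015 January"`. The x axis is therefore categorical and follows the row order of the data. The sketch below parses it into real timestamps instead, assuming `Month` holds full English month names as in the output above (`%B` matches them in the default English/C locale). The sample rows are hypothetical. Passing such a column as `x` would give plotly a date axis.

```python
import pandas as pd

# Hypothetical rows in the shape of the VSRR file, deliberately out of order.
df = pd.DataFrame({"Year": [2016, 2015, 2015],
                   "Month": ["January", "July", "January"]})

# Parse "2015 January" etc. into timestamps; %B matches full month names.
df["period"] = pd.to_datetime(df["Year"].astype(str) + " " + df["Month"],
                              format="%Y %B")

# Sorting by the parsed date gives chronological order.
df = df.sort_values("period")
print(df["period"].dt.strftime("%Y-%m").tolist())  # ['2015-01', '2015-07', '2016-01']
```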
### Create a choropleth of states

In [416]:
state_deaths = pd.read_csv("data/Compressed Mortality_by_state_overdose_related_1999-2015.txt",sep="\t")

In [417]:
state_deaths.head()

|   | Notes | State | State Code | Year | Year Code | Deaths | Population | Crude Rate | Age Adjusted Rate |
|---|---|---|---|---|---|---|---|---|---|
| 0 |  | Alabama | 1.0 | 1999.0 | 1999.0 | 169.0 | 4430141.0 | 3.8 | 3.8 |
| 1 |  | Alabama | 1.0 | 2000.0 | 2000.0 | 197.0 | 4447100.0 | 4.4 | 4.5 |
| 2 |  | Alabama | 1.0 | 2001.0 | 2001.0 | 216.0 | 4467634.0 | 4.8 | 4.9 |
| 3 |  | Alabama | 1.0 | 2002.0 | 2002.0 | 211.0 | 4480089.0 | 4.7 | 4.8 |
| 4 |  | Alabama | 1.0 | 2003.0 | 2003.0 | 197.0 | 4503491.0 | 4.4 | 4.4 |


In [418]:
# Clean the data
state_deaths = state_deaths[state_deaths.Year.notnull()]

# The death rate is treated as unreliable if the numerator (death count) is below 20
state_deaths["aar_unreliable"] = 0
state_deaths.loc[state_deaths["Deaths"]<20,"aar_unreliable"] = 1
# Strip the "(Unreliable)" annotation; regex=True is needed on pandas >= 2.0
state_deaths["Age Adjusted Rate"] = state_deaths["Age Adjusted Rate"].str.replace(r"[a-zA-Z\(\)]", "", regex=True)
state_deaths["Age Adjusted Rate"] = state_deaths["Age Adjusted Rate"].astype("float")

state_deaths.drop(["Notes","Crude Rate","Year Code"],axis=1,inplace=True)

In [419]:
# Add state names

state_deaths = state_deaths.merge(state_codes, how='left', left_on="State", right_on="state_territory")
state_deaths.drop(["State Code","state_territory"],axis=1,inplace=True)

In [420]:
state_deaths.loc[state_deaths["Year"]==2015, "Age Adjusted Rate"].describe()

count    51.000000
mean     17.813725
std       6.904231
min       6.900000
25%      13.150000
50%      16.200000
75%      21.050000
max      41.500000
Name: Age Adjusted Rate, dtype: float64

In [421]:
state_deaths.to_csv("processed_data/state_deaths.csv",index=False)

In [422]:
#  Make choropleth map

df = state_deaths.loc[state_deaths["Year"]==2015]

# Set up the colorscale

colors = ['rgb(254,240,217)','rgb(253,204,138)','rgb(252,91,89)','rgb(255,38,11)']
levels = [0.0,0.25,0.5,1.0]
scl = [list(item) for item in zip(levels,colors)]


df['text'] = df['State']+ '<br>' +\
    'Deaths per 100k population: '+df['Age Adjusted Rate'].astype(str)

data = [ dict(
        type='choropleth',
        colorscale = scl,
        autocolorscale = False,
        reversescale = False,
        locations = df['code'],
        z = df['Age Adjusted Rate'].astype("float"),
        locationmode = 'USA-states',
        text = df['text'],
        marker = dict(
            line = dict (
                color = 'rgb(50,50,50)',
                width = 1
            ) ),
        colorbar = dict(
            title = "Deaths per 100k population")
        ) ]

layout = dict(
        title = '2015: Death rates due to drug overdose',
        geo = dict(
            scope='usa',
            projection=dict( type='albers usa' ),
            showlakes = True,
            lakecolor = 'rgb(255, 255, 255)'),
             )
    
fig = dict( data=data, layout=layout )
py.iplot( fig, filename='d3-cloropleth-map' )

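In plotly, a colorscale is a list of `[position, color]` pairs on a 0-1 scale. By default, the `z` values are mapped linearly onto that scale between their minimum and maximum unless `zmin`/`zmax` are set. The small sketch below uses the 2015 minimum (6.9) and maximum (41.5) from the `describe()` output above. It shows which death rates fall on each stop of the four-stop scale used here, and where the median state (16.2) lands. It assumes plotly's default linear mapping.

```python
# 2015 extremes of the age-adjusted rate, from the describe() output above.
Z_MIN, Z_MAX = 6.9, 41.5

# Colorscale stop positions as defined in the choropleth cell above.
STOPS = [0.0, 0.25, 0.5, 1.0]


def normalized(z, z_min=Z_MIN, z_max=Z_MAX):
    """Position of rate z on the 0-1 colorscale (linear mapping assumed)."""
    return (z - z_min) / (z_max - z_min)


def rate_at(stop, z_min=Z_MIN, z_max=Z_MAX):
    """Death rate that lands exactly on a given colorscale stop."""
    return z_min + stop * (z_max - z_min)


stop_rates = [round(rate_at(s), 2) for s in STOPS]
print(stop_rates)                    # [6.9, 15.55, 24.2, 41.5]
print(round(normalized(16.2), 3))    # 0.269 (the median state)
```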
### Investigate the data per state

In [423]:
state_filter = "West Virginia"
df = state_deaths[state_deaths["State"] == state_filter]
x = "Year"
y = "Age Adjusted Rate"
title = state_filter + " : Deaths per 100k population from drug overdose related causes: 1999 - 2015"
xaxis_title=""
yaxis_title="Deaths per 100k population"

# Add reliability indicator
df["marker"] = "circle"
df.loc[df["aar_unreliable"]==1,"marker"] = "circle-open"


data = []

data.append(
            Scatter(
                x = df[x],
                y = df[y],
                name = state_filter,
                mode = "lines+markers",
                line = dict(
                        width = 2,
                        color = "rgb(0,0,0)",
                        shape='spline'
                    ),
                marker = dict(symbol=df["marker"])
                )
            )

# Add a trace with the national median rate
state_deaths_median = state_deaths.groupby(["Year"])["Age Adjusted Rate"].median().reset_index()

trace_median = Scatter(
                x = state_deaths_median[x],
                y = state_deaths_median[y],
                name = "National Median",
                mode = "lines",
                line = dict(
                        width = 2,
                        color = "rgb(50,50,50)",
                        dash = 'dot',
                        shape='spline'
                    )
                )

data.append(trace_median)

layout = dict(title = title,xaxis = dict(title=xaxis_title),yaxis = dict(title=yaxis_title),
              legend=dict(xanchor='left',x=0.05))
fig = dict(data=data, layout=layout)

py.iplot(fig, filename='state_deaths_years')

In [424]:
deaths_dem.head(1)

|   | State | Year | Age Group | Gender | Deaths | Population | Crude Rate | rate_unreliable | code |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Alabama | 1999.0 | 25-34 years | Female | 15.0 | 308250.0 | 4.9 | 1 | AL |


In [425]:
# state_filter = "Nebraska"
deaths_dem_comp = deaths_dem.loc[(deaths_dem["State"] == state_filter) 
                                 & (deaths_dem["Year"].isin([1999.0,2015.0]))]
deaths_dem_comp = deaths_dem_comp.sort_values(["Gender","Age Group","Year"])
deaths_dem_comp["Growth since 1999"] = deaths_dem_comp.groupby(["Gender","Age Group"])["Crude Rate"].pct_change()*100
deaths_dem_comp["Growth since 1999"] = deaths_dem_comp["Growth since 1999"].round(0).fillna(0).astype("int")

# Add rate reliability indicator if either 1999 or 2015 rates are unreliable
deaths_dem_comp["base_rate_unrel"] = deaths_dem_comp.groupby(["Gender","Age Group"])["rate_unreliable"].shift()
deaths_dem_comp["base_rate_unrel"] = deaths_dem_comp["base_rate_unrel"].fillna(0)
deaths_dem_comp["rate_unreliable"] = deaths_dem_comp["rate_unreliable"]+deaths_dem_comp["base_rate_unrel"]

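In the cell above, `groupby(...).shift()` copies each age/gender cell's 1999 reliability flag onto its 2015 row, and the two flags are then added, so the result can be 0, 1 or 2. The sketch below uses hypothetical flags to show why a check for unreliable rates should test `>= 1` rather than `== 1`: a cell that is unreliable in both years ends up with 2.

```python
import pandas as pd

# Hypothetical flags for one state: two gender/age cells, 1999 and 2015 each.
df = pd.DataFrame({
    "Gender": ["Female", "Female", "Male", "Male"],
    "Age Group": ["25-34 years"] * 4,
    "Year": [1999.0, 2015.0, 1999.0, 2015.0],
    "rate_unreliable": [1, 1, 0, 1],
})

df = df.sort_values(["Gender", "Age Group", "Year"])

# shift() moves each group's 1999 flag onto its 2015 row (1999 rows get NaN -> 0).
df["base_rate_unrel"] = df.groupby(["Gender", "Age Group"])["rate_unreliable"].shift().fillna(0)
df["rate_unreliable"] = df["rate_unreliable"] + df["base_rate_unrel"]

# Female 2015 sums to 2, so only ">= 1" flags both 2015 rows.
flagged_2015 = df.loc[(df["Year"] == 2015.0) & (df["rate_unreliable"] >= 1), "Gender"].tolist()
print(flagged_2015)  # ['Female', 'Male']
```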
In [428]:
# Create chart with change over time by age+gender

df = deaths_dem_comp[deaths_dem_comp["Year"]==2015.0].sort_values("Age Group")
y = "Growth since 1999"
x = "Age Group"
title="Drug overdose mortality rate change: 2015 vs. 1999"
xaxis_title="Age Group"
yaxis_title="Rate change vs. 1999"

genders = list(df.Gender.unique())
df["warning"] = ""
df.loc[df.rate_unreliable>=1.0,"warning"] = "*"  # the flag sum above can be 1 or 2

scl = {"Male":'rgb(241,163,64)', "Female":'rgb(153,142,195)'}

data = []

for gender in genders:
    data.append(
        Bar(
            x = df[(df["Gender"]==gender)][x],
            y = df[(df["Gender"]==gender)][y],
            name = gender,
            text = df[(df["Gender"]==gender)][y].round(1).astype("str")+"%"+df[(df["Gender"]==gender)]["warning"],
            textposition = 'outside',
            marker=dict(color = scl[gender])
        )
    )

layout = dict(
    title = title, xaxis=dict(title=xaxis_title),
    yaxis=dict(title=yaxis_title),
    showlegend = True,
    barmode='group'
)

fig = dict(data=data, layout=layout)
py.iplot(fig, filename='deaths_dem_comp')

### Understand the difference between urban / rural regions

In [429]:
urb_deaths = pd.read_csv("data/Compressed_Mortality_1999-2015_year_state_urb.txt",sep="\t")

# Clean the data
urb_deaths = urb_deaths[urb_deaths.Year.notnull()]

# The death rate is treated as unreliable if the numerator (death count) is below 20
urb_deaths["aar_unreliable"] = 0
urb_deaths.loc[urb_deaths["Deaths"]<20,"aar_unreliable"] = 1
# Strip the "(Unreliable)" annotation; regex=True is needed on pandas >= 2.0
urb_deaths["Age Adjusted Rate"] = urb_deaths["Age Adjusted Rate"].str.replace(r"[a-zA-Z\(\)]", "", regex=True)
urb_deaths["Age Adjusted Rate"] = urb_deaths["Age Adjusted Rate"].astype("float")
urb_deaths["urb_code"]= urb_deaths.apply(lambda x: [x["2013 Urbanization Code"],x["2013 Urbanization"]],axis=1)
urb_deaths.drop(["Notes","Crude Rate","Year Code", "State Code"],axis=1,inplace=True)

urb_deaths.head(5)

urb_deaths.to_csv("processed_data/urb_deaths.csv",index=False)

In [431]:
state_filter = "West Virginia"
df = urb_deaths[(urb_deaths["State"] == state_filter) & (urb_deaths["Year"].isin([1999,2015]))]\
.sort_values(["Year","2013 Urbanization Code"])

x = "Year"
y = "Age Adjusted Rate"
group = "2013 Urbanization Code"
title = state_filter + " : Deaths per 100k population from drug overdose related causes: 1999 - 2015"
xaxis_title=""
yaxis_title="Deaths per 100k population"

# Add reliability indicator
df["marker"] = "circle"
df.loc[df["aar_unreliable"]==1,"marker"] = "x"

# Add colorscale

ids = [1,2,3,4,5,6]
colors = ['rgb(140,81,10)','rgb(216,179,101)','rgb(146,132,95)','rgb(159,204,199)','rgb(90,180,172)','rgb(1,102,94)']
levels = ["Large Central Metro","Large Fringe Metro","Medium Metro"
          ,"Small Metro","Micropolitan (non-metro)","NonCore (non-metro)"]
col_levels = [item for item in zip(colors[::-1],levels)]
scl = dict(zip(ids,col_levels))

groups = sorted(list(df[group].unique()))

data = []

for item in groups:

    data.append(
            Scatter(
                x = df[df[group]==item][x],
                y = df[df[group]==item][y],
                name = scl[item][1],
                mode = "lines+markers",
                line = dict(
                        width = 2,
                        color = scl[item][0],
                        shape='spline'
                    ),
                marker = dict(symbol=df[df[group]==item]["marker"],size=8)  # markers for this group's rows only
                )
            )


layout = dict(title = title
              ,xaxis = dict(title=xaxis_title,type="category")
              ,yaxis = dict(title=yaxis_title)
              ,legend = dict(traceorder="normal"))
fig = dict(data=data, layout=layout)

py.iplot(fig, filename='state_deaths_urb')

### Technical notes

The data presented in these charts was queried using the web query tools of [CDC WONDER](http://wonder.cdc.gov/cmf-icd10.html) and the [NVSS Vital Statistics Rapid Release](https://www.cdc.gov/nchs/nvss/vsrr/drug-overdose-data.htm).
Drug overdose related deaths are those with an underlying cause-of-death code from the Tenth Revision of ICD (ICD-10) in one of the following ranges: X40–X44 (unintentional), X60–X64 (suicide), X85 (homicide), and Y10–Y14 (undetermined). Drug overdose deaths involving selected drug categories are identified by specific multiple cause-of-death codes.

Further documentation is available [here](https://wonder.cdc.gov/wonder/help/cmf.html#).

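The underlying-cause selection described above amounts to a small range lookup. The minimal sketch below uses a hypothetical function, `overdose_intent`. It checks only the letter and first two digits of a code and ignores any fourth character. It is not how CDC WONDER implements its queries.

```python
import re

# Underlying-cause ranges for drug overdose deaths, as listed above:
# (letter, first number, last number, intent).
OVERDOSE_RANGES = [
    ("X", 40, 44, "unintentional"),
    ("X", 60, 64, "suicide"),
    ("X", 85, 85, "homicide"),
    ("Y", 10, 14, "undetermined"),
]

# An ICD-10 code starts with a letter and two digits, e.g. "X42" or "X42.1".
CODE_PATTERN = re.compile(r"^([A-Z])(\d{2})")


def overdose_intent(icd10_code):
    """Return the intent category of a drug overdose code, or None otherwise."""
    match = CODE_PATTERN.match(icd10_code.strip().upper())
    if match is None:
        return None
    letter, number = match.group(1), int(match.group(2))
    for range_letter, first, last, intent in OVERDOSE_RANGES:
        if letter == range_letter and first <= number <= last:
            return intent
    return None


print(overdose_intent("X42"))    # unintentional
print(overdose_intent("I21.9"))  # None (not an overdose code)
```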
Where rates are shown, they are age-adjusted. The exception is the demographic chart, which compares crude rates within each age group and gender. Age-adjusted rates are used for the following methodological reason cited on the [CDC website](https://wonder.cdc.gov/wonder/help/cmf.html#Age-Adjusted%20Rates):
>The rates of almost all causes of death vary by age. Age adjustment is a technique for "removing" the effects of age from crude rates, so as to allow meaningful comparisons across populations with different underlying age structures. For example, comparing the crude rate of heart disease in Florida to that of California is misleading, because the relatively older population in Florida will lead to a higher crude death rate, even if the age-specific rates of heart disease in Florida and California are the same. For such a comparison, age-adjusted rates are preferable. Age-adjusted rates should be viewed as relative indexes rather than as direct or actual measures of mortality risk.
  
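The quoted Florida/California example can be reproduced with direct age standardization. The age-adjusted rate is a weighted sum of age-specific rates, with weights taken from a fixed standard population (CDC WONDER uses the 2000 U.S. standard population). The sketch below uses three hypothetical age groups with hypothetical rates, population counts and weights, not CDC's actual standard. Its two populations have identical age-specific rates but different age structures: they differ in crude rate yet share one age-adjusted rate.

```python
# Hypothetical age-specific death rates per 100,000 (identical in both populations).
AGE_SPECIFIC_RATES = {"0-44": 5.0, "45-64": 40.0, "65+": 300.0}

# Hypothetical population counts; "older" has a larger share aged 65+.
POPULATIONS = {
    "younger": {"0-44": 700_000, "45-64": 200_000, "65+": 100_000},
    "older": {"0-44": 500_000, "45-64": 250_000, "65+": 250_000},
}

# Hypothetical standard-population weights (shares of each age group; sum to 1).
STANDARD_WEIGHTS = {"0-44": 0.60, "45-64": 0.25, "65+": 0.15}


def crude_rate(population, rates):
    """Total deaths divided by total population, per 100,000."""
    deaths = sum(population[age] * rates[age] / 100_000 for age in population)
    return deaths / sum(population.values()) * 100_000


def age_adjusted_rate(rates, weights):
    """Direct standardization: weighted sum of age-specific rates."""
    return sum(weights[age] * rates[age] for age in weights)


for name, population in POPULATIONS.items():
    print(name,
          round(crude_rate(population, AGE_SPECIFIC_RATES), 1),
          round(age_adjusted_rate(AGE_SPECIFIC_RATES, STANDARD_WEIGHTS), 1))
# younger 41.5 58.0
# older 87.5 58.0
```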

### Reference

1) Centers for Disease Control and Prevention, National Center for Health Statistics. Compressed Mortality File 1999-2015 on CDC WONDER Online Database, released December 2016. Data are from the Compressed Mortality File 1999-2015 Series 20 No. 2U, 2016, as compiled from data provided by the 57 vital statistics jurisdictions through the Vital Statistics Cooperative Program. Accessed at http://wonder.cdc.gov/cmf-icd10.html on Dec 3, 2017 9:27:17 AM
  
2) Ahmad FB, Rossen LM, Spencer MR, Warner M, Sutton P. Provisional drug overdose death counts. National Center for Health Statistics. 2017. Accessed at https://www.cdc.gov/nchs/nvss/vsrr/drug-overdose-data.htm on Dec 10, 2017 9:00:00 AM
 
3) US state codes: https://www.50states.com/abbreviations.htm  
4) Pandas documentation: https://pandas.pydata.org/pandas-docs/stable/  
5) Google Trends on "Opioid Epidemic": https://trends.google.com/trends/explore?geo=US&q=opioid%20epidemic  
6) CDC definition of the epidemic: https://www.cdc.gov/ophss/csels/dsepd/ss1978/lesson1/section11.html  
