# Dataset exploration

* [Annual mortality and causes by county, 1979-1988](https://www.cdc.gov/nchs/data_access/cmf.htm)
* [Compressed mortality info, 1968-2016](https://wonder.cdc.gov/controller/datarequest/D140)

[Rainfall and turnout](https://faculty.ucmerced.edu/thansford/Articles/The%20Republicans%20Should%20Pray%20for%20Rain%20-%20Weather,%20Turnour,%20and%20Voting%20in%20U.S.%20Presidential%20Elections.pdf)

[Higher temperatures increase suicide rates in the
United States and Mexico](https://web.stanford.edu/~mburke/papers/BurkeEtAl_NCC_2018.pdf)

As temperatures go up, suicide rates go up.
In countries or regions that went through heat waves, suicide rates differed significantly during those periods.

Is there monthly data?

[WHO mortality data](https://www.who.int/healthinfo/statistics/mortality_rawdata/en/)

[CDC Multiple Mortality Cause files](https://www.cdc.gov/nchs/data_access/vitalstatsonline.htm#Mortality_Multiple)

[Suicides and gun ownership](https://mason.gmu.edu/~atabarro/BriggsTabarrokFirearmsSuicide.pdf)

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Suicides by Month, Year, State

`suicides.txt` from https://wonder.cdc.gov/wonder/help/mcd.html, restricted to suicides
* https://wonder.cdc.gov/mcd-icd10.html
* Click Agree
* https://wonder.cdc.gov/controller/datarequest/D77
* Group results by State, Year, Month
* Ages exclude "Not stated"; Hispanic Origin exclude "Not stated"
* Underlying cause of death: X60-X84 (Intentional self-harm)

In [None]:
suicides = pd.read_csv('data/suicides_month_year_state_1999_2017.txt', sep='\t', na_values='Not Applicable')
suicides = suicides.dropna(subset=['State'])
suicides = suicides.drop(columns='Notes')

In [None]:
suicides.head()

In [None]:
suicides['Month'] = suicides['Month Code'].str.slice(-2).astype(np.int64)
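
The month is taken from the last two characters of `Month Code`. A minimal sketch of the same idea on toy rows, assuming the codes look like `YYYY/MM` (the layout implied by the slicing used in this notebook); the `toy` frame and its values are made up for illustration:

```python
import pandas as pd

# Toy rows in the assumed 'YYYY/MM' layout of the WONDER 'Month Code' column.
toy = pd.DataFrame({'Month Code': ['1999/01', '1999/12', '2000/07']})

# Split once on '/' instead of slicing fixed character positions.
parts = toy['Month Code'].str.split('/', expand=True)
toy['Year'] = parts[0].astype('int64')
toy['Month'] = parts[1].astype('int64')

# A month outside 1-12 would point to a malformed code.
assert toy['Month'].between(1, 12).all()
```

Splitting on the separator gives both year and month in one pass, and the range check fails early if the export format ever differs from the assumed one.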

In [None]:
suicides.info()

# Need to get population data to calculate rate
CDC only provides population data for annual-level searches.
* https://wonder.cdc.gov/mcd-icd10.html
* Click Agree
* https://wonder.cdc.gov/controller/datarequest/D77
* Group results by State, Year
* Ages exclude "Not stated"; Hispanic Origin exclude "Not stated"
* Underlying cause of death: X60-X84 (Intentional self-harm)

In [None]:
suicides_pop = pd.read_csv('data/suicides_year_1999_2017.txt', sep='\t', na_values='Not Applicable')
suicides_pop = suicides_pop.dropna(subset=['State'])
suicides_pop = suicides_pop.drop(columns='Notes')
suicides_pop

## Add population numbers to our suicides dataframe

In [None]:
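# Merge on keys so rows cannot misalign; the annual Population replaces the monthly export's own column.
suicides = suicides.drop(columns='Population', errors='ignore').merge(suicides_pop[['State', 'Year', 'Population']], on=['State', 'Year'], how='left')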

In [None]:
suicides['Crude Rate'] = suicides.Deaths / suicides.Population * 100_000
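
To make the rate calculation concrete, here is a small sketch on invented numbers: annual population is attached to monthly death counts by `State` and `Year`, then deaths are scaled to a rate per 100,000. The frames `monthly` and `annual` and all values in them are illustrative, not CDC data.

```python
import pandas as pd

# Hypothetical monthly death counts.
monthly = pd.DataFrame({
    'State': ['A', 'A', 'B'],
    'Year': [1999, 1999, 1999],
    'Month': [1, 2, 1],
    'Deaths': [10, 20, 5],
})
# Hypothetical annual population, one row per state and year.
annual = pd.DataFrame({
    'State': ['A', 'B'],
    'Year': [1999, 1999],
    'Population': [1_000_000, 500_000],
})

# validate='many_to_one' raises if a (State, Year) pair is duplicated in `annual`.
merged = monthly.merge(annual, on=['State', 'Year'], how='left', validate='many_to_one')
merged['Crude Rate'] = merged['Deaths'] / merged['Population'] * 100_000
```

Because `Deaths` is a monthly count and `Population` is annual, the result is a monthly rate per 100,000 people, not an annualized one.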

In [None]:
suicides

# Download heat-wave data
* Database: [North America Land Data Assimilation System (NLDAS) Daily Air Temperatures and Heat Index (1979-2011)](https://wonder.cdc.gov/nasa-nldas.html)
* Group results by State, Year, Month
* The downloaded extract covers 1999 to 2011

In [None]:
heat = pd.read_csv('data/temps_by_state_month_1999_2011.txt', sep='\t', na_values='Missing')
heat = heat.dropna(subset=['State'])
heat['Month'] = heat['Month, Year Code'].str.slice(-2).astype(np.int64)
heat['Year'] = heat['Month, Year Code'].str.slice(0, 4).astype(np.int64)
heat = heat.drop(columns='Notes')
heat

In [None]:
# Monthly means for Alabama in 1999 (numeric columns only, so text columns do not raise)
heat[heat.Year == 1999].groupby(['State', 'Month']).mean(numeric_only=True).query('State == "Alabama"')

In [None]:
# Sanity check: the heat extract ends in 2011, so this should return no rows
heat.query('Year == 2017')

In [None]:
heat.groupby('State').mean(numeric_only=True)

# Merge suicide and heat data

In [None]:
heat = heat.rename(columns={'Avg Daily Max Air Temperature (F)':'avg_max_t',
                    'Avg Daily Min Air Temperature (F)':'avg_min_t',
                    'Avg Daily Max Heat Index (F)':'avg_max_heat_index',
                    'Month, Year Code' : 'Month Code'})

In [None]:
suicides = suicides.merge(heat[['State', 'Month Code', 'avg_max_t', 'avg_min_t', 'avg_max_heat_index']], 
               on=['State', 'Month Code'])
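
Since the suicide data runs to 2017 and the heat extract stops in 2011, an inner merge silently drops the later months. A sketch of how one might check which rows have no temperature match, using toy frames (`left`, `right`, and their values are hypothetical):

```python
import pandas as pd

# Toy suicide rows; the last one falls after the end of the toy heat data.
left = pd.DataFrame({
    'State': ['A', 'A', 'A'],
    'Month Code': ['2011/11', '2011/12', '2012/01'],
    'Deaths': [3, 4, 5],
})
# Toy heat rows ending in 2011.
right = pd.DataFrame({
    'State': ['A', 'A'],
    'Month Code': ['2011/11', '2011/12'],
    'avg_max_t': [60.0, 50.0],
})

# A left merge with indicator=True keeps every suicide row and records
# in the '_merge' column whether a heat row was found for it.
checked = left.merge(right, on=['State', 'Month Code'], how='left', indicator=True)
unmatched = checked[checked['_merge'] == 'left_only']
```

Inspecting `unmatched` before switching to an inner merge shows how many state-months are lost and whether the loss is limited to the years without heat data.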

In [None]:
suicides

In [None]:
suicides['min_t_diff'] = suicides.avg_min_t - suicides.groupby(['State', 'Month']).avg_min_t.transform('mean')
suicides['max_t_diff'] = suicides.avg_max_t - suicides.groupby(['State', 'Month']).avg_max_t.transform('mean')
suicides['heat_index_diff'] = suicides.avg_max_heat_index - suicides.groupby(['State', 'Month']).avg_max_heat_index.transform('mean')
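
The `*_diff` columns are anomalies: each observation minus the long-run mean for the same state and calendar month. A small sketch with invented temperatures shows what `transform('mean')` does here:

```python
import pandas as pd

# Invented temperatures: two years of July and August for one state.
toy = pd.DataFrame({
    'State': ['A', 'A', 'A', 'A'],
    'Year': [1999, 2000, 1999, 2000],
    'Month': [7, 7, 8, 8],
    'avg_max_t': [90.0, 94.0, 88.0, 92.0],
})

# transform('mean') returns one value per original row (the group mean),
# so it can be subtracted directly without a separate merge step.
baseline = toy.groupby(['State', 'Month'])['avg_max_t'].transform('mean')
toy['max_t_diff'] = toy['avg_max_t'] - baseline
```

Here July averages 92 and August averages 90, so each year sits 2 degrees below or above its own month's baseline. Seasonal differences between months are removed, leaving only unusually hot or cool periods.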

In [None]:
suicides

## Write cleaned dataset to file

In [None]:
suicides.to_csv('data/suicides_heat.csv')

# Exploratory Data Analysis

In [None]:
sns.barplot(x='State', y='Crude Rate', data=suicides)

In [None]:
sns.scatterplot(x='avg_max_t', y='Crude Rate', data=suicides.query('4 < Month < 10'), hue='Month', legend='full')

In [None]:
sns.scatterplot(x='avg_max_heat_index', y='Crude Rate', data=suicides.query('4 < Month < 10'), hue='Month', legend='full')

In [None]:
sns.scatterplot(x='heat_index_diff', y='Crude Rate', data=suicides.query('4 < Month < 10'), hue='Month')

In [None]:
sns.scatterplot(x='max_t_diff', y='Crude Rate', data=suicides.query('4 < Month < 10'), hue='Month')

In [None]:
sns.scatterplot(x='min_t_diff', y='Crude Rate', data=suicides.query('4 < Month < 10'), hue='Month')
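
The scatterplots can be complemented with a number. One possible sketch, assuming a per-month Pearson correlation is a useful first summary: the `toy` frame below is invented and only mirrors the column names used in this notebook.

```python
import pandas as pd

# Invented values with the same column names as the merged dataframe.
toy = pd.DataFrame({
    'Month': [6, 6, 6, 7, 7, 7],
    'heat_index_diff': [-1.0, 0.0, 1.0, -1.0, 0.0, 1.0],
    'Crude Rate': [1.0, 1.1, 1.2, 1.3, 1.2, 1.1],
})

# Pearson correlation between anomaly and rate, computed within each month.
corr_by_month = {
    month: group['heat_index_diff'].corr(group['Crude Rate'])
    for month, group in toy.groupby('Month')
}
```

On the real data the same dictionary comprehension could be run on `suicides.query('4 < Month < 10')`. A correlation is only a rough summary and does not account for differences between states.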

# Download suicide data - county & month level
* Database: Multiple Cause of Death, 1999-2017 (D77)
* Group results by State, County, Year, Month
* Underlying Cause of Death: UCD-ICD10 Code X60-X84


In [None]:
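# Keep county-level tables in a dict keyed by year, under a new name so the
# state-level `suicides` DataFrame built earlier is not overwritten.
suicides_by_year = {}
suicides_by_year[1999] = pd.read_csv('data/suicides_by_month/suicides_1999.txt', sep='\t')
suicides_by_year[1999].head()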
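
Only the 1999 file is loaded so far. If the remaining years are exported the same way, they could be read in a loop. This is a sketch under the assumption that every file follows the `suicides_<year>.txt` naming pattern; `load_yearly` is a hypothetical helper, and the demo writes tiny placeholder files to a temporary directory rather than touching real data.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_yearly(data_dir, years):
    """Read one tab-separated export per year and stack them into one frame."""
    frames = []
    for year in years:
        path = Path(data_dir) / f'suicides_{year}.txt'
        if not path.exists():
            continue  # skip years that have not been downloaded yet
        frames.append(pd.read_csv(path, sep='\t'))
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)


# Demo on placeholder files: 1999 and 2000 exist, 2001 is missing.
with tempfile.TemporaryDirectory() as tmp:
    for year in (1999, 2000):
        Path(tmp, f'suicides_{year}.txt').write_text('County\tDeaths\nX\t1\n')
    combined = load_yearly(tmp, range(1999, 2002))
```

For the real files the call would be something like `load_yearly('data/suicides_by_month', range(1999, 2018))`.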


# Heat wave days

In [None]:
df_hw = pd.read_csv('data/heat_wave_days_1981.txt', sep='\t')

In [None]:
df_hw.head()

# CDC API (too hard to use)

In [None]:
# "https://wonder.cdc.gov/controller/datarequest/[database ID]"
base_url = 'https://wonder.cdc.gov/controller/datarequest/'
# D60 = North America Land Data Assimilation System (NLDAS) Daily Air Temperatures and Heat Index (1979-2011)
db_id = 'D60'
params = { 'accept_datause_restrictions' : 'true' }
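
For reference, a sketch of how a request to this endpoint might be assembled without sending it. It assumes the WONDER API expects a form-encoded POST with the query in a `request_xml` field next to `accept_datause_restrictions`; that field name and the empty `<request-parameters>` placeholder are assumptions to check against the CDC WONDER API documentation. `build_wonder_request` is a hypothetical helper.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = 'https://wonder.cdc.gov/controller/datarequest/'


def build_wonder_request(db_id, request_xml):
    """Build (but do not send) a POST request for one WONDER database."""
    body = urlencode({
        'request_xml': request_xml,  # assumed field name for the XML query
        'accept_datause_restrictions': 'true',
    }).encode('utf-8')
    return Request(BASE_URL + db_id, data=body, method='POST')


# Placeholder XML only; a real query needs the full parameter list for the database.
req = build_wonder_request('D60', '<request-parameters></request-parameters>')
```

Most of the difficulty lies in filling in the XML parameter list, which this sketch leaves out.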