# U.S. Gun Deaths: Exploratory Data Analysis

### Summary
* About 1 gun death per year for every 10,000 people in the U.S.
* 63% are suicides and 35% are homicides
* 86% of the victims are male. Women attempt suicide 3x more often than men, but men die by suicide 3.5x more often than women.
* The homicide gun death rate for black people is about 10 times higher than that of white people

### Source
[FiveThirtyEight](http://fivethirtyeight.com/) gathered data from the [Centers for Disease Control and Prevention](https://www.cdc.gov/) and the [Federal Bureau of Investigation](https://www.fbi.gov/) on gun deaths from 2012 through 2014. They made their dataset available to the public [here](https://github.com/fivethirtyeight/guns-data). Thanks FiveThirtyEight!

In [1]:
# read the data and explore it
from csv import reader
with open('guns.csv', 'r') as file_object: # close the file once it is read
    data = list(reader(file_object))
header = data[0]
data = data[1:] # remove header
print(header) # first column contains the row number
print(data[0])

['', 'year', 'month', 'intent', 'police', 'sex', 'age', 'race', 'hispanic', 'place', 'education']
['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4']


In [2]:
# this function will help us explore our data

def data_counter(dataset, i):
    # dataset = list of lists of integers or strings
    # i = integer indexing the column we want to count values for
    # returns a dictionary of counts for each value
    dict_out = {}
    for row in dataset:
        try:
            # start at 1 for a new value, otherwise add 1 to the running count
            dict_out[row[i]] = dict_out.get(row[i], 0) + 1
        except TypeError as err:
            # unhashable values (e.g. lists) cannot be dictionary keys
            print('the dataset input must be a list of lists of integers or strings')
            print(err)
    return dict_out
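
The standard library's `collections.Counter` can produce the same kind of tally. A minimal sketch, where `data_counter_alt` and the sample rows are made up for illustration and are not part of the original notebook:

```python
from collections import Counter

def data_counter_alt(dataset, i):
    # hypothetical alternative to data_counter: tally the values in column i
    return dict(Counter(row[i] for row in dataset))

# tiny made-up sample, not rows from guns.csv
sample = [['1', '2012'], ['2', '2012'], ['3', '2013']]
print(data_counter_alt(sample, 1))
```

The hand-written loop is kept in the rest of the notebook; this version only shows that the counting step itself is a one-liner.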

#### How many gun deaths happen each year in the U.S.?

In [3]:
# count the 2nd column year values
deaths_by_year = data_counter(data, 1)
print(deaths_by_year)

{'2014': 33599, '2013': 33636, '2012': 33563}


In [4]:
# alternatively, we could convert the year strings to integers
deaths_by_year = {} # initialize dictionary
for death in data:
    year = int(death[1]) # string to integer
    if year in deaths_by_year:
        deaths_by_year[year] += 1
    else:
        deaths_by_year[year] = 1
print(deaths_by_year) # ordered nicely

{2012: 33563, 2013: 33636, 2014: 33599}


#### Answer: about 34,000! This is roughly 1 gun death per year for every 10,000 people living in the U.S.
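
As a quick check of that rate, here is the arithmetic, assuming a U.S. population of about 316 million. That population figure is a rough value for 2013 and is not part of the dataset.

```python
# yearly counts printed above
deaths_by_year = {2012: 33563, 2013: 33636, 2014: 33599}

# assumption: approximate U.S. population around 2013, not taken from guns.csv
us_population = 316_000_000

average_deaths = sum(deaths_by_year.values()) / len(deaths_by_year)
rate_per_10k = average_deaths / us_population * 10_000
print(round(average_deaths), round(rate_per_10k, 2))
```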

#### Do gun deaths vary a lot from month to month?

In [5]:
# count the 3rd column month values
deaths_by_month = data_counter(data, 2)
print(deaths_by_month)

{'10': 8406, '09': 8508, '05': 8669, '04': 8455, '06': 8677, '03': 8289, '01': 8273, '02': 7093, '12': 8413, '07': 8989, '11': 8243, '08': 8783}


In [6]:
# alternatively, we could use the datetime module
# create a list of datetime objects for each death
# use 1 as the day of the month as it's unknown
from datetime import datetime
dates = [datetime(year=int(death[1]), month=int(death[2]), day=1) for death in data] 

# count the deaths by month
deaths_by_month = {} # initialize dictionary
for date in dates:
    if date.month in deaths_by_month:
        deaths_by_month[date.month] += 1
    else:
        deaths_by_month[date.month] = 1
print(deaths_by_month) # ordered nicely

{1: 8273, 2: 7093, 3: 8289, 4: 8455, 5: 8669, 6: 8677, 7: 8989, 8: 8783, 9: 8508, 10: 8406, 11: 8243, 12: 8413}


#### Answer: February seems to have an abnormally low number of gun deaths, even accounting for the fact that it has fewer days
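
One way to check this is to divide each month's total by the number of calendar days it had across 2012 through 2014. A minimal sketch using the counts printed above and the standard `calendar` module:

```python
from calendar import monthrange

# monthly totals printed above (2012 through 2014 combined)
deaths_by_month = {1: 8273, 2: 7093, 3: 8289, 4: 8455, 5: 8669, 6: 8677,
                   7: 8989, 8: 8783, 9: 8508, 10: 8406, 11: 8243, 12: 8413}

deaths_per_day = {}
for month, deaths in deaths_by_month.items():
    # monthrange returns (weekday of the 1st, number of days in the month)
    days = sum(monthrange(year, month)[1] for year in (2012, 2013, 2014))
    deaths_per_day[month] = round(deaths / days, 1)
print(deaths_per_day)
```

With these counts February still has the lowest daily average, at about 83 deaths per day. January is next at about 89, and July is the highest at about 97.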

#### How do gun deaths vary by intent?

In [7]:
intent_dict = {}
for death in data:
    if death[3] in intent_dict:
        intent_dict[death[3]] += 1
    else:
        intent_dict[death[3]] = 1

total = len(data)
normalized_intent_dict = {}
for intent in intent_dict:
    normalized_intent_dict['%'+intent] = round(intent_dict[intent]/total *100)
print(normalized_intent_dict)

{'%Suicide': 63, '%Accidental': 2, '%NA': 0, '%Undetermined': 1, '%Homicide': 35}


#### Answer: 63% suicide, 35% homicide, 2% other

#### How do gun deaths vary by sex?

In [8]:
# count deaths by sex
deaths_by_sex = data_counter(data, 5) # 6th column is sex
print('Gun deaths by sex:')
print(deaths_by_sex)

Gun deaths by sex:
{'F': 14449, 'M': 86349}


#### Answer: 14% of gun deaths are of women, even though women represent about 50% of the U.S. population. That's a big difference!

#### A [2014 study by the American Foundation For Suicide Prevention](https://en.wikipedia.org/wiki/Suicide_in_the_United_States) shows that women attempt suicide 3x more often than men, but men die by suicide 3.5x more often than women. This suggests that male suicide attempts are more lethal than female ones.

#### How do gun deaths vary by race? We'll normalize by the population of each race using Census Bureau data from 2010 (available [here](https://www.census.gov/2010census/)). Gun deaths are often shown per 100,000 people, so we'll show that statistic by race.

In [9]:
from csv import reader
with open('census.csv', 'r') as file_object: # close the file once it is read
    census_data = list(reader(file_object))
#print(census_data[0])
#print(census_data[1])
#list(enumerate(census_data[0])) #shows the indexes of each column

In [10]:
# manually match the corresponding populations to the races in our gun deaths data
race_populations = {}
race_populations['Asian/Pacific Islander'] = int(census_data[1][14])+int(census_data[1][15])
race_populations['Black'] = int(census_data[1][12])
race_populations['Native American/Native Alaskan'] = int(census_data[1][13])
race_populations['Hispanic'] = int(census_data[1][11])
race_populations['White'] = int(census_data[1][10])
#print(race_populations)

In [13]:
deaths_by_race = data_counter(data, 7)
normalized_deaths_by_race = {}
for race in race_populations:
    normalized_deaths_by_race[race] = round(deaths_by_race[race]/race_populations[race] * 100000)
print(normalized_deaths_by_race)

{'Black': 58, 'Native American/Native Alaskan': 25, 'Hispanic': 20, 'Asian/Pacific Islander': 8, 'White': 34}


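#### Answer: the gun death rate for black people is 1.7 times higher than that of white people, 3 times higher than that of Hispanics, and 7 times higher than that of Asians. Note that the counts cover all three years of data (2012 through 2014), so these are three-year totals per 100,000 people rather than annual rates; the ratios are unaffected.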
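
Because `deaths_by_race` counts all three years of data (2012 through 2014), the figures printed above are cumulative. A rough sketch of the per-year equivalent, dividing the rounded rates by three (so the results inherit that rounding):

```python
# rounded three-year rates per 100,000 printed above
normalized_deaths_by_race = {'Black': 58, 'Native American/Native Alaskan': 25,
                             'Hispanic': 20, 'Asian/Pacific Islander': 8, 'White': 34}

years_of_data = 3  # the dataset covers 2012, 2013 and 2014
annual_rates = {race: round(rate / years_of_data, 1)
                for race, rate in normalized_deaths_by_race.items()}
print(annual_rates)
```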

#### What if we only consider homicide gun deaths? What are the gun homicide rates per 100,000 people by race?

In [14]:
homicide_data = []
for death in data:
    intent = death[3]
    if intent == 'Homicide':
        homicide_data.append(death)

homicides_by_race = data_counter(dataset=homicide_data, i=7)

normalized_homicides_by_race = {}
for race in race_populations:
    normalized_homicides_by_race[race] = round(homicides_by_race[race]/race_populations[race] * 100000)
print(normalized_homicides_by_race)

{'Black': 48, 'Native American/Native Alaskan': 9, 'Hispanic': 13, 'Asian/Pacific Islander': 4, 'White': 5}


#### Answer: the homicide gun death rate for black people is about 10 times higher than that of white people, 4 times higher than that of Hispanics, and 12 times higher than that of Asians.
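
The ratios quoted above follow from the rounded rates. A small sketch of the arithmetic, where the helper name `rate_ratio` is made up for illustration:

```python
# rounded homicide rates per 100,000 printed above
homicide_rates = {'Black': 48, 'Native American/Native Alaskan': 9,
                  'Hispanic': 13, 'Asian/Pacific Islander': 4, 'White': 5}

def rate_ratio(rates, numerator, denominator):
    # hypothetical helper: how many times higher one group's rate is than another's
    return round(rates[numerator] / rates[denominator], 1)

for other in ('White', 'Hispanic', 'Asian/Pacific Islander'):
    print(other, rate_ratio(homicide_rates, 'Black', other))
```

Since the input rates are already rounded to whole numbers, these ratios are approximate.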