# Dataquest Curriculum Project 2: Exploring Gun Deaths in the US

This [dataset](https://github.com/fivethirtyeight/guns-data) comes from [FiveThirtyEight](https://github.com/fivethirtyeight) and contains information on each gun death that occurred in the United States from 2012 to 2014. A row represents a single death, and each column represents demographic or other information about the victim:

- (unnamed first column) -- row number.
- `year` -- the year in which the fatality occurred.
- `month` -- the month in which the fatality occurred.
- `intent` -- the intent of the perpetrator of the crime. This can be `Suicide`, `Accidental`, `NA`, `Homicide`, or `Undetermined`.
- `police` -- whether a police officer was involved with the shooting. Either `0` (false) or `1` (true).
- `sex` -- the sex of the victim. Either `M` or `F`.
- `age` -- the age of the victim.
- `race` -- the race of the victim. Either `Asian/Pacific Islander`, `Native American/Native Alaskan`, `Black`, `Hispanic`, or `White`.
- `hispanic` -- a code indicating the Hispanic origin of the victim.
- `place` -- where the shooting occurred.
- `education` -- educational status of the victim. Can be one of the following:
    - `1` -- Less than High School
    - `2` -- Graduated from High School or equivalent
    - `3` -- Some College
    - `4` -- At least graduated from College
    - `5` -- Not available

```python
# open the csv file and read every row into a list
import csv

with open("guns.csv", "r") as f:
    data = list(csv.reader(f))

for row in data[:5]:
    print(row)
```

```
['', 'year', 'month', 'intent', 'police', 'sex', 'age', 'race', 'hispanic', 'place', 'education']
['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4']
['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3']
['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4']
['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4']
```
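As an alternative to indexing rows by position, the standard library's `csv.DictReader` maps each row to the header names. The following is a minimal sketch. To keep it self-contained, it reads two sample rows from an in-memory string (`io.StringIO`) instead of `guns.csv`. In practice you would pass `open("guns.csv", newline="")`.

```python
import csv
import io

# Two sample rows copied from the output above; in practice, pass
# open("guns.csv", newline="") instead of this in-memory buffer.
sample = io.StringIO(
    ",year,month,intent,police,sex,age,race,hispanic,place,education\n"
    "1,2012,01,Suicide,0,M,34,Asian/Pacific Islander,100,Home,4\n"
    "2,2012,01,Suicide,0,F,21,White,100,Street,3\n"
)

# DictReader uses the first row as field names, so columns can be
# accessed by name (row["intent"]) instead of by position (row[3]).
# The unnamed first column gets the empty string "" as its key.
rows = list(csv.DictReader(sample))

for row in rows:
    print(row["year"], row["intent"], row["race"])
```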


```python
# extract the header
header = data[0]
data = data[1:]

for row in data[:5]:
    print(row)
```

```
['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4']
['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3']
['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4']
['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4']
['5', '2012', '02', 'Suicide', '0', 'M', '31', 'White', '100', 'Other specified', '2']
```


# Count total deaths for each year

```python
# extract the year column
years = [row[1] for row in data]

# keep counts
year_counts = {}
for element in years:
    if element in year_counts:
        year_counts[element] += 1
    else:
        year_counts[element] = 1

year_counts
```

```
{'2012': 33563, '2013': 33636, '2014': 33599}
```

*Totals are consistent from year to year.*
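The counting loop above is repeated for nearly every column in this project. It can be written more compactly with `collections.Counter` from the standard library, which builds the same value-to-count mapping. Here is a minimal sketch that uses a small hypothetical list of years:

```python
from collections import Counter

# Hypothetical sample standing in for the full `years` list.
years = ["2012", "2013", "2012", "2014", "2013", "2012"]

# Counter tallies each distinct value, just like the manual if/else loop.
year_counts = Counter(years)
print(year_counts)                 # Counter({'2012': 3, '2013': 2, '2014': 1})
print(year_counts.most_common(1))  # [('2012', 3)]
```

Because `Counter` is a `dict` subclass, it can be used anywhere the hand-built dictionaries are used later in the notebook.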

# Examine total deaths by month and year

```python
import datetime

# build a datetime for each death from its year and month;
# the day is fixed at 1 because the dataset has no day column
dates = [datetime.datetime(year=int(row[1]), month=int(row[2]), day=1)
         for row in data]

dates[:5]
```

```
[datetime.datetime(2012, 1, 1, 0, 0),
 datetime.datetime(2012, 1, 1, 0, 0),
 datetime.datetime(2012, 1, 1, 0, 0),
 datetime.datetime(2012, 2, 1, 0, 0),
 datetime.datetime(2012, 2, 1, 0, 0)]
```
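Since the dataset has no day column, a plain `(year, month)` tuple is an alternative to a `datetime` with a placeholder day. Tuples compare element by element, so sorting them gives chronological order. Here is a minimal sketch that uses a few hypothetical rows in the same layout as `data`:

```python
# Hypothetical rows in the same column layout as `data` (year at index 1,
# month at index 2); only the first three columns are shown.
rows = [
    ["1", "2012", "02"],
    ["2", "2012", "01"],
    ["3", "2013", "01"],
]

# Convert to integers so that month 10 sorts after month 9.
year_months = [(int(row[1]), int(row[2])) for row in rows]

print(sorted(year_months))  # [(2012, 1), (2012, 2), (2013, 1)]
```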

```python
date_counts = {}
for element in dates:
    if element in date_counts:
        date_counts[element] += 1
    else:
        date_counts[element] = 1

date_counts
```

```
{datetime.datetime(2012, 1, 1, 0, 0): 2758,
 datetime.datetime(2012, 2, 1, 0, 0): 2357,
 datetime.datetime(2012, 3, 1, 0, 0): 2743,
 datetime.datetime(2012, 4, 1, 0, 0): 2795,
 datetime.datetime(2012, 5, 1, 0, 0): 2999,
 datetime.datetime(2012, 6, 1, 0, 0): 2826,
 datetime.datetime(2012, 7, 1, 0, 0): 3026,
 datetime.datetime(2012, 8, 1, 0, 0): 2954,
 datetime.datetime(2012, 9, 1, 0, 0): 2852,
 datetime.datetime(2012, 10, 1, 0, 0): 2733,
 datetime.datetime(2012, 11, 1, 0, 0): 2729,
 datetime.datetime(2012, 12, 1, 0, 0): 2791,
 datetime.datetime(2013, 1, 1, 0, 0): 2864,
 datetime.datetime(2013, 2, 1, 0, 0): 2375,
 datetime.datetime(2013, 3, 1, 0, 0): 2862,
 datetime.datetime(2013, 4, 1, 0, 0): 2798,
 datetime.datetime(2013, 5, 1, 0, 0): 2806,
 datetime.datetime(2013, 6, 1, 0, 0): 2920,
 datetime.datetime(2013, 7, 1, 0, 0): 3079,
 datetime.datetime(2013, 8, 1, 0, 0): 2859,
 datetime.datetime(2013, 9, 1, 0, 0): 2742,
 datetime.datetime(2013, 10, 1, 0, 0): 2808,
 datetime.datetime(2013, 11,
```

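*Monthly totals are fairly consistent. February is the lowest month in both years shown, which is at least partly because it is the shortest month. A graph would better reveal any seasonal trends.*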
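Plotting libraries are not used in this project, but even a rough text bar chart makes month-to-month differences easier to spot. The following is a minimal sketch that uses the first three 2012 monthly totals from the output above. The bar width (one `#` per 100 deaths) is an arbitrary choice.

```python
import datetime

# First three monthly totals for 2012, copied from date_counts above.
date_counts = {
    datetime.datetime(2012, 3, 1): 2743,
    datetime.datetime(2012, 1, 1): 2758,
    datetime.datetime(2012, 2, 1): 2357,
}

SCALE = 100  # one '#' per 100 deaths (arbitrary)

lines = []
# Sort the datetime keys so the bars appear in chronological order.
for date in sorted(date_counts):
    count = date_counts[date]
    bar = "#" * (count // SCALE)
    lines.append(f"{date:%Y-%m} {bar} {count}")

print("\n".join(lines))
```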

# Examine total deaths by sex

```python
# extract the sex column
sexes = [row[5] for row in data]

sex_counts = {}
for element in sexes:
    if element in sex_counts:
        sex_counts[element] += 1
    else:
        sex_counts[element] = 1

sex_counts
```

```
{'F': 14449, 'M': 86349}
```

*Male deaths far outnumber female deaths, by roughly six to one.*

# Examine total deaths by intent

```python
# extract the intent column
intents = [row[3] for row in data]

intents[:5]
```

```
['Suicide', 'Suicide', 'Suicide', 'Suicide', 'Suicide']
```

```python
intent_counts = {}
for intent in intents:
    if intent in intent_counts:
        intent_counts[intent] += 1
    else:
        intent_counts[intent] = 1

intent_counts
```

```
{'Accidental': 1639,
 'Homicide': 35176,
 'NA': 1,
 'Suicide': 63175,
 'Undetermined': 807}
```

*Most gun deaths are suicides (63,175 of 100,798, or about 63%).*
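The proportion logic applied to race later in this notebook can be wrapped in a small reusable function and used with any of these count dictionaries. The following is a minimal sketch. `to_percentages` is a hypothetical helper name, and the counts are copied from the intent output above.

```python
def to_percentages(counts):
    """Return each key's share of the total, formatted as a percentage string."""
    total = sum(counts.values())
    return {key: "{:.2%}".format(value / total) for key, value in counts.items()}


intent_counts = {
    "Accidental": 1639,
    "Homicide": 35176,
    "NA": 1,
    "Suicide": 63175,
    "Undetermined": 807,
}

intent_percentages = to_percentages(intent_counts)
print(intent_percentages)
```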

# Examine total deaths by location

```python
# extract the place column
locations = [row[9] for row in data]

location_counts = {}
for location in locations:
    if location in location_counts:
        location_counts[location] += 1
    else:
        location_counts[location] = 1

location_counts
```

```
{'Farm': 470,
 'Home': 60486,
 'Industrial/construction': 248,
 'NA': 1384,
 'Other specified': 13751,
 'Other unspecified': 8867,
 'Residential institution': 203,
 'School/instiution': 671,
 'Sports': 128,
 'Street': 11151,
 'Trade/service area': 3439}
```

*A majority of gun deaths (about 60%) occur at home, far more than at any other location. This is plausibly related to the fact that most gun deaths are suicides.*
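To read a count dictionary from largest to smallest, sort its items by value. Here is a minimal sketch that uses a subset of the location counts above:

```python
location_counts = {
    "Farm": 470,
    "Home": 60486,
    "Other specified": 13751,
    "Street": 11151,
}

# Sort (place, count) pairs by count, largest first.
ranked = sorted(location_counts.items(), key=lambda item: item[1], reverse=True)

for place, count in ranked:
    print(f"{place:<16} {count:>6}")
```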

# Examine total deaths by education

```python
# extract the education column
educations = [row[10] for row in data]

education_counts = {}
for education in educations:
    if education in education_counts:
        education_counts[education] += 1
    else:
        education_counts[education] = 1

education_counts
```

```
{'1': 21823, '2': 42927, '3': 21680, '4': 12946, '5': 1369, 'NA': 53}
```

*Among the categories with a known education level, college graduates (category 4) experienced the fewest gun deaths and high school graduates (category 2) experienced the most.*

*Regarding college graduates (category 4): additional data from other sources would be needed to understand why this group is the lowest, though one might speculate that many factors are involved, a major one being socioeconomic status.*

*Regarding high school graduates (category 2): [US Census Bureau data](https://www.census.gov/newsroom/press-releases/2017/educational-attainment-2017.html) shows that over 80% of Americans over age 25 have a high school diploma, and that this share has stayed above that threshold from 2000 through 2017. So the fact that more gun death victims are high school graduates than non-graduates (category 1) may be partly because more Americans are high school graduates than not.*
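The education codes are easier to read with their labels attached. The comparison above can also be made explicit by dropping code `5` (not available) and `NA`. The following is a minimal sketch with the counts copied from the output above. `EDUCATION_LABELS` is a hypothetical name, built from the column description at the top of this notebook.

```python
EDUCATION_LABELS = {
    "1": "Less than High School",
    "2": "Graduated from High School or equivalent",
    "3": "Some College",
    "4": "At least graduated from College",
}

education_counts = {"1": 21823, "2": 42927, "3": 21680, "4": 12946, "5": 1369, "NA": 53}

# Keep only codes with a known education level (drops "5" and "NA").
known = {code: n for code, n in education_counts.items() if code in EDUCATION_LABELS}

fewest = min(known, key=known.get)
most = max(known, key=known.get)
print("Fewest:", EDUCATION_LABELS[fewest], known[fewest])
print("Most:", EDUCATION_LABELS[most], known[most])
```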

# Examine total deaths by race

```python
# extract the race column
races = [row[7] for row in data]

race_counts = {}
for element in races:
    if element in race_counts:
        race_counts[element] += 1
    else:
        race_counts[element] = 1

race_counts
```

```
{'Asian/Pacific Islander': 1326,
 'Black': 23296,
 'Hispanic': 9022,
 'Native American/Native Alaskan': 917,
 'White': 66237}
```

*We would want to compare these numbers, expressed as proportions, with the racial proportions of the general US population.*

```python
gun_death_race_proportions = {}
gun_death_race_percentages = {}

# total number of deaths across all races
total_deaths = sum(race_counts.values())

for race, deaths in race_counts.items():
    proportion = deaths / total_deaths
    gun_death_race_proportions[race] = proportion
    gun_death_race_percentages[race] = "{:.2%}".format(proportion)

gun_death_race_proportions
```

```
{'Asian/Pacific Islander': 0.01315502291712137,
 'Black': 0.23111569674001467,
 'Hispanic': 0.0895057441615905,
 'Native American/Native Alaskan': 0.009097402726244569,
 'White': 0.6571261334550289}
```

```python
gun_death_race_percentages
```

```
{'Asian/Pacific Islander': '1.32%',
 'Black': '23.11%',
 'Hispanic': '8.95%',
 'Native American/Native Alaskan': '0.91%',
 'White': '65.71%'}
```

## Compare with Racial Proportions from the US Census

```python
# open the US Census data
with open("census.csv", "r") as census_file:
    census = list(csv.reader(census_file))

for row in census:
    print(row)
```

```
['Id', 'Year', 'Id', 'Sex', 'Id', 'Hispanic Origin', 'Id', 'Id2', 'Geography', 'Total', 'Race Alone - White', 'Race Alone - Hispanic', 'Race Alone - Black or African American', 'Race Alone - American Indian and Alaska Native', 'Race Alone - Asian', 'Race Alone - Native Hawaiian and Other Pacific Islander', 'Two or More Races']
['cen42010', 'April 1, 2010 Census', 'totsex', 'Both Sexes', 'tothisp', 'Total', '0100000US', '', 'United States', '308745538', '197318956', '44618105', '40250635', '3739506', '15159516', '674625', '6984195']
```


```python
populations = census[1]

# map the census totals to our categories
mapping = {
    # combine "Race Alone - Asian" and
    # "Race Alone - Native Hawaiian and Other Pacific Islander"
    "Asian/Pacific Islander": int(populations[14]) + int(populations[15]),
    "Black": int(populations[12]),
    "Native American/Native Alaskan": int(populations[13]),
    "Hispanic": int(populations[11]),
    "White": int(populations[10])
}

mapping
```

```
{'Asian/Pacific Islander': 15834141,
 'Black': 40250635,
 'Hispanic': 44618105,
 'Native American/Native Alaskan': 3739506,
 'White': 197318956}
```
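Indexing the census row by position (`populations[14]`, and so on) breaks silently if the file's column order changes. The following is a minimal sketch that looks the columns up by header name instead, using the two rows printed above. The header repeats `Id`, so `dict(zip(...))` keeps only the last `Id` value. That is harmless here because the race columns have unique names.

```python
header = ['Id', 'Year', 'Id', 'Sex', 'Id', 'Hispanic Origin', 'Id', 'Id2', 'Geography',
          'Total', 'Race Alone - White', 'Race Alone - Hispanic',
          'Race Alone - Black or African American',
          'Race Alone - American Indian and Alaska Native', 'Race Alone - Asian',
          'Race Alone - Native Hawaiian and Other Pacific Islander', 'Two or More Races']
values = ['cen42010', 'April 1, 2010 Census', 'totsex', 'Both Sexes', 'tothisp', 'Total',
          '0100000US', '', 'United States', '308745538', '197318956', '44618105',
          '40250635', '3739506', '15159516', '674625', '6984195']

# Pair each header with its value; the duplicate 'Id' keys collapse to one entry.
row = dict(zip(header, values))

mapping = {
    "Asian/Pacific Islander": int(row["Race Alone - Asian"])
    + int(row["Race Alone - Native Hawaiian and Other Pacific Islander"]),
    "Black": int(row["Race Alone - Black or African American"]),
    "Native American/Native Alaskan": int(row["Race Alone - American Indian and Alaska Native"]),
    "Hispanic": int(row["Race Alone - Hispanic"]),
    "White": int(row["Race Alone - White"]),
}
print(mapping)
```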

*Total populations for each race*

```python
total_population = 0
race_population_proportions = {}
race_population_percentages = {}

# calculate total
for race, population in mapping.items():
    total_population += population

# create proportions and percentages by dividing over the total
for race, population in mapping.items():
    proportion = population / total_population
    race_population_proportions[race] = proportion
    race_population_percentages[race] = "{:.2%}".format(proportion)

race_population_proportions
```

```
{'Asian/Pacific Islander': 0.05247239703595832,
 'Black': 0.1333856570223443,
 'Hispanic': 0.1478589157790168,
 'Native American/Native Alaskan': 0.012392263246256828,
 'White': 0.6538907669164238}
```

*Population proportions for each race when compared to the total population*

### Side-by-side comparison by race: percentage of gun deaths vs. percentage of the general population

```python
gun_death_race_percentages
```

```
{'Asian/Pacific Islander': '1.32%',
 'Black': '23.11%',
 'Hispanic': '8.95%',
 'Native American/Native Alaskan': '0.91%',
 'White': '65.71%'}
```

```python
race_population_percentages
```

```
{'Asian/Pacific Islander': '5.25%',
 'Black': '13.34%',
 'Hispanic': '14.79%',
 'Native American/Native Alaskan': '1.24%',
 'White': '65.39%'}
```
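The two dictionaries above are easier to compare when printed as one table. The following is a minimal sketch that uses the rounded percentages shown above. The last column is an added illustration: it divides each group's share of gun deaths by its share of the population, so values above 1 mean the group is over-represented among gun deaths relative to its population share.

```python
gun_death_pct = {
    "Asian/Pacific Islander": 1.32,
    "Black": 23.11,
    "Hispanic": 8.95,
    "Native American/Native Alaskan": 0.91,
    "White": 65.71,
}
population_pct = {
    "Asian/Pacific Islander": 5.25,
    "Black": 13.34,
    "Hispanic": 14.79,
    "Native American/Native Alaskan": 1.24,
    "White": 65.39,
}

# Share of gun deaths divided by share of population.
ratios = {race: gun_death_pct[race] / population_pct[race] for race in gun_death_pct}

print(f"{'Race':<32}{'Deaths %':>10}{'Pop. %':>10}{'Ratio':>8}")
for race in gun_death_pct:
    print(f"{race:<32}{gun_death_pct[race]:>10.2f}"
          f"{population_pct[race]:>10.2f}{ratios[race]:>8.2f}")
```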

### Gun deaths in each race per 100,000 people in that race (2012-2014 combined)

```python
race_per_hundredk = {}
for race, death_count in race_counts.items():
    race_population = mapping[race]
    race_per_hundredk[race] = death_count / race_population * 100000

race_per_hundredk
```

```
{'Asian/Pacific Islander': 8.374309664161762,
 'Black': 57.8773477735196,
 'Hispanic': 20.220491210910907,
 'Native American/Native Alaskan': 24.521955573811088,
 'White': 33.56849303419181}
```
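The rate in the cell above follows the formula below, shown with the Black counts as a worked example. The death counts cover three years (2012-2014), while the population is a single 2010 Census count. These are therefore three-year rates, and dividing by 3 gives only a rough annual figure, since it assumes the population stayed constant.

$$
\text{rate per 100,000} = \frac{\text{deaths in group}}{\text{population of group}} \times 100{,}000
$$

$$
\text{Black: } \frac{23{,}296}{40{,}250{,}635} \times 100{,}000 \approx 57.88 \ \text{(2012-2014)}, \qquad \frac{57.88}{3} \approx 19.29 \ \text{per year}
$$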

# Examine gun deaths caused by homicide, for each race

```python
# extract the intent column
intents = [row[3] for row in data]

intents[:5]
```

```
['Suicide', 'Suicide', 'Suicide', 'Suicide', 'Suicide']
```

```python
# extract the race column
races = [row[7] for row in data]

races[:5]
```

```
['Asian/Pacific Islander', 'White', 'White', 'White', 'White']
```

```python
homicide_race_counts = {}

for i, race in enumerate(races):
    # only add to counts if the Intent was Homicide
    if intents[i] == 'Homicide':
        if race in homicide_race_counts:
            homicide_race_counts[race] += 1
        else:
            homicide_race_counts[race] = 1

homicide_race_counts
```

```
{'Asian/Pacific Islander': 559,
 'Black': 19510,
 'Hispanic': 5634,
 'Native American/Native Alaskan': 326,
 'White': 9147}
```
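The same filter-then-count pattern is used again in this notebook for sex and for accidental deaths. Pairing the two columns with `zip` removes the index bookkeeping, and a small helper can cover every case. The following is a minimal sketch. `count_where` is a hypothetical helper name, and the lists are small made-up samples.

```python
from collections import Counter


def count_where(values, intents, wanted_intent):
    """Count each value among the rows whose intent equals wanted_intent."""
    return Counter(value for value, intent in zip(values, intents) if intent == wanted_intent)


# Hypothetical sample columns (same order as rows in `data`).
races = ["White", "Black", "Black", "Hispanic", "White"]
intents = ["Suicide", "Homicide", "Homicide", "Homicide", "Accidental"]

print(count_where(races, intents, "Homicide"))  # Counter({'Black': 2, 'Hispanic': 1})
```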

## Rates per 100,000

```python
# these are the population totals from the US Census
mapping
```

```
{'Asian/Pacific Islander': 15834141,
 'Black': 40250635,
 'Hispanic': 44618105,
 'Native American/Native Alaskan': 3739506,
 'White': 197318956}
```

```python
# calculate per 100,000 in each race population
homicide_race_per_hundredk = {}
for race, homicide_death_count in homicide_race_counts.items():
    race_population = mapping[race]
    homicide_race_per_hundredk[race] = homicide_death_count / race_population * 100000

homicide_race_per_hundredk
```

```
{'Asian/Pacific Islander': 3.530346230970155,
 'Black': 48.471284987180944,
 'Hispanic': 12.627161104219914,
 'Native American/Native Alaskan': 8.717729026240365,
 'White': 4.6356417981453335}
```

```python
# round each rate to 3 significant figures
homicide_race_per_hundredk_rounded = {}

for race, number in homicide_race_per_hundredk.items():
    homicide_race_per_hundredk_rounded[race] = float("{:.3}".format(number))

homicide_race_per_hundredk_rounded
```

```
{'Asian/Pacific Islander': 3.53,
 'Black': 48.5,
 'Hispanic': 12.6,
 'Native American/Native Alaskan': 8.72,
 'White': 4.64}
```
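Formatting with `"{:.3}"` rounds to three *significant figures*. That is why `48.47...` becomes `48.5` but `3.53...` keeps two decimals. If a fixed number of decimal places is wanted instead, the built-in `round` does that directly. Here is a short comparison that uses two of the rates above:

```python
black_rate = 48.471284987180944
asian_pi_rate = 3.530346230970155

# Three significant figures (what the cell above does).
print(float("{:.3}".format(black_rate)))     # 48.5
print(float("{:.3}".format(asian_pi_rate)))  # 3.53

# Two decimal places, regardless of magnitude.
print(round(black_rate, 2))     # 48.47
print(round(asian_pi_rate, 2))  # 3.53
```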

# Is there a link between month and homicide rate?

```python
# Recall that we stored the month, year pair for each death
dates[:5]
```

```
[datetime.datetime(2012, 1, 1, 0, 0),
 datetime.datetime(2012, 1, 1, 0, 0),
 datetime.datetime(2012, 1, 1, 0, 0),
 datetime.datetime(2012, 2, 1, 0, 0),
 datetime.datetime(2012, 2, 1, 0, 0)]
```

```python
homicide_month_counts = {}
for i, date in enumerate(dates):
    if intents[i] == "Homicide":
        if date.month in homicide_month_counts:
            homicide_month_counts[date.month] += 1
        else:
            homicide_month_counts[date.month] = 1

homicide_month_counts
```

```
{1: 2829,
 2: 2178,
 3: 2780,
 4: 2845,
 5: 2976,
 6: 3130,
 7: 3269,
 8: 3125,
 9: 2966,
 10: 2968,
 11: 2919,
 12: 3191}
```

*Month totals pooled over 2012-2014 won't reveal much on their own, so instead we can look at the top 3 months in each year.*

```python
homicide_month_year_counts = {}
for i, date in enumerate(dates):
    if intents[i] == "Homicide":
        if date in homicide_month_year_counts:
            homicide_month_year_counts[date] += 1
        else:
            homicide_month_year_counts[date] = 1

homicide_month_year_counts
```

```
{datetime.datetime(2012, 1, 1, 0, 0): 972,
 datetime.datetime(2012, 2, 1, 0, 0): 749,
 datetime.datetime(2012, 3, 1, 0, 0): 966,
 datetime.datetime(2012, 4, 1, 0, 0): 999,
 datetime.datetime(2012, 5, 1, 0, 0): 1003,
 datetime.datetime(2012, 6, 1, 0, 0): 1044,
 datetime.datetime(2012, 7, 1, 0, 0): 1160,
 datetime.datetime(2012, 8, 1, 0, 0): 1090,
 datetime.datetime(2012, 9, 1, 0, 0): 1070,
 datetime.datetime(2012, 10, 1, 0, 0): 979,
 datetime.datetime(2012, 11, 1, 0, 0): 978,
 datetime.datetime(2012, 12, 1, 0, 0): 1083,
 datetime.datetime(2013, 1, 1, 0, 0): 986,
 datetime.datetime(2013, 2, 1, 0, 0): 721,
 datetime.datetime(2013, 3, 1, 0, 0): 923,
 datetime.datetime(2013, 4, 1, 0, 0): 916,
 datetime.datetime(2013, 5, 1, 0, 0): 955,
 datetime.datetime(2013, 6, 1, 0, 0): 1066,
 datetime.datetime(2013, 7, 1, 0, 0): 1137,
 datetime.datetime(2013, 8, 1, 0, 0): 1000,
 datetime.datetime(2013, 9, 1, 0, 0): 954,
 datetime.datetime(2013, 10, 1, 0, 0): 1009,
 datetime.datetime(2013, 11, 1, 0, 0): 9
```

**Top 3 months for 2012**
1. `July` : `1160`
2. `August` : `1090`
3. `December` : `1083`

**Top 3 months for 2013**
1. `July` : `1137`
2. `June` : `1066`
3. `December` : `1028`

**Top 3 months for 2014**
1. `December` : `1080`
2. `August` : `1035`
3. `June` : `1020`

*December shows up among the top 3 months for homicide-caused gun deaths in every year, and July, June, and August each appear in two of the three years. Plotting a graph would be the best way to look for a relationship; however, plotting is not covered in the curriculum yet.*
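The top-3 lists can also be computed instead of read off by eye, which avoids accidentally skipping a month. The following is a minimal sketch that applies `heapq.nlargest` to the 2012 homicide counts copied from the output above. For brevity it uses `(year, month)` tuple keys. With the notebook's `datetime` keys, you would read `date.year` and `date.month` instead.

```python
import heapq

# 2012 homicide counts by (year, month), copied from homicide_month_year_counts.
counts = {
    (2012, 1): 972, (2012, 2): 749, (2012, 3): 966, (2012, 4): 999,
    (2012, 5): 1003, (2012, 6): 1044, (2012, 7): 1160, (2012, 8): 1090,
    (2012, 9): 1070, (2012, 10): 979, (2012, 11): 978, (2012, 12): 1083,
}


def top_months(counts, year, n=3):
    """Return the n (month, count) pairs with the highest counts in the given year."""
    in_year = [(month, count) for (y, month), count in counts.items() if y == year]
    return heapq.nlargest(n, in_year, key=lambda pair: pair[1])


print(top_months(counts, 2012))  # [(7, 1160), (8, 1090), (12, 1083)]
```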

# Gun deaths by homicide, for each sex

```python
# Recall that we stored the sex of the victim for each death
sexes[:5]
```

```
['M', 'F', 'M', 'M', 'M']
```

```python
homicide_sex_counts = {}

for i, sex in enumerate(sexes):
    if intents[i] == "Homicide":
        if sex in homicide_sex_counts:
            homicide_sex_counts[sex] += 1
        else:
            homicide_sex_counts[sex] = 1

homicide_sex_counts
```

```
{'F': 5373, 'M': 29803}
```

# Gun deaths by accident, for each sex

```python
accidental_sex_counts = {}

for i, sex in enumerate(sexes):
    if intents[i] == "Accidental":
        if sex in accidental_sex_counts:
            accidental_sex_counts[sex] += 1
        else:
            accidental_sex_counts[sex] = 1

accidental_sex_counts
```

```
{'F': 218, 'M': 1421}
```
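Instead of writing one loop per intent, a single pass can count every (intent, sex) combination at once by using tuples as `Counter` keys. Here is a minimal sketch on hypothetical sample columns:

```python
from collections import Counter

# Hypothetical sample columns (same order as rows in `data`).
intents = ["Homicide", "Accidental", "Homicide", "Suicide", "Homicide"]
sexes = ["M", "M", "F", "M", "M"]

# Each key is an (intent, sex) pair; missing pairs read as 0.
crosstab = Counter(zip(intents, sexes))

for intent in sorted({i for i, _ in crosstab}):
    print(intent, "F:", crosstab[(intent, "F")], "M:", crosstab[(intent, "M")])
```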

# Gun deaths by accident, for each race

```python
# Recall that we stored the race of each victim for each death
races[:5]
```

```
['Asian/Pacific Islander', 'White', 'White', 'White', 'White']
```

```python
accidental_race_counts = {}

for i, race in enumerate(races):
    if intents[i] == "Accidental":
        if race in accidental_race_counts:
            accidental_race_counts[race] += 1
        else:
            accidental_race_counts[race] = 1

accidental_race_counts
```

```
{'Asian/Pacific Islander': 12,
 'Black': 328,
 'Hispanic': 145,
 'Native American/Native Alaskan': 22,
 'White': 1132}
```
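The per-100,000 calculation used earlier for all gun deaths and for homicides could be applied to the accidental counts as well. The following is a minimal sketch. `per_100k` is a hypothetical helper name, and the population figures are the 2010 Census totals from `mapping` above.

```python
def per_100k(counts, populations):
    """Divide each group's count by its population and scale to 100,000 people."""
    return {group: count / populations[group] * 100000 for group, count in counts.items()}


accidental_race_counts = {
    "Asian/Pacific Islander": 12,
    "Black": 328,
    "Hispanic": 145,
    "Native American/Native Alaskan": 22,
    "White": 1132,
}
mapping = {
    "Asian/Pacific Islander": 15834141,
    "Black": 40250635,
    "Hispanic": 44618105,
    "Native American/Native Alaskan": 3739506,
    "White": 197318956,
}

rates = per_100k(accidental_race_counts, mapping)
for race, rate in rates.items():
    print(f"{race:<32}{rate:.2f}")
```

As with the earlier rates, these cover 2012-2014 combined rather than a single year.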