# Exploring US Gun Deaths

The following data set contains information on gun deaths in the US from 2012 to 2014. Each row represents a single fatality. 

The columns contain demographic and other information about the victim. 

## Introduction to the Data Set

We will open the raw data set with Python, convert it to a list of lists (one inner list per row), and view the first few rows.

In [15]:
# import the 'csv' library to automate the reading in and parsing of the data set
import csv

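# newline="" lets the csv module handle line endings itself, as its documentation recommends
with open(r"C:\projectdatasets\guns.csv", "r", newline="") as f: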
    reader = csv.reader(f)
    data = list(reader)

In [16]:
# show the first five rows
print(data[0:5])

[['', 'year', 'month', 'intent', 'police', 'sex', 'age', 'race', 'hispanic', 'place', 'education'], ['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', 'BA+'], ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', 'Some college'], ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', 'BA+'], ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', 'BA+']]


## Removing Headers

We will separate the header row from the data.

In [17]:
headers = data[:1]

# show the header row
print(headers)

[['', 'year', 'month', 'intent', 'police', 'sex', 'age', 'race', 'hispanic', 'place', 'education']]


In [18]:
data = data[1:]

# show the first five rows of the data set
print(data[:5])

[['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', 'BA+'], ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', 'Some college'], ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', 'BA+'], ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', 'BA+'], ['5', '2012', '02', 'Suicide', '0', 'M', '31', 'White', '100', 'Other specified', 'HS/GED']]
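As a hedged alternative to slicing, the header can be pulled off the reader with `next()` before the remaining rows are collected. The sketch below assumes a small in-memory sample (two rows copied from the output above, wrapped in `io.StringIO`) in place of the real file, and it yields the header as a flat list rather than the nested list produced by `data[:1]`.

```python
import csv
import io

# In-memory stand-in for guns.csv: the header plus two rows copied from the output above.
sample_file = io.StringIO(
    ",year,month,intent,police,sex,age,race,hispanic,place,education\n"
    "1,2012,01,Suicide,0,M,34,Asian/Pacific Islander,100,Home,BA+\n"
    "2,2012,01,Suicide,0,F,21,White,100,Street,Some college\n"
)

reader = csv.reader(sample_file)
sample_headers = next(reader)  # consume the first row: a flat list of column names
sample_rows = list(reader)     # everything after the header

print(sample_headers)
print(sample_rows)
```

With the real file, the same two lines (`next(reader)` followed by `list(reader)`) would sit inside the `with open(...)` block.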


## Number Of Deaths by Year

We will count the number of deaths that occurred in each year, and store the results in a dictionary (a collection of key-value pairs).

As each row represents one fatality in a particular year, counting how many times each year appears in the data set gives the number of deaths for that year.

In [22]:
# create list of years, across all rows
years = [row[1] for row in data]

# show the first hundred values, which just represents the year for each row
print(years[:100])

['2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012', '2012']


In [23]:
# create a dictionary to hold the results
year_counts = {}
for year in years:
    if year not in year_counts:
        year_counts[year] = 1
    else:  
        year_counts[year] += 1

# show the dictionary
print(year_counts)

{'2012': 33563, '2013': 33636, '2014': 33599}
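The same tally can be sketched with `collections.Counter` from the standard library, which wraps the "add the key if missing, then increment" pattern. The list below is a small illustrative sample, not the real `years` list.

```python
from collections import Counter

# Illustrative sample; in the notebook this would be the full `years` list.
sample_years = ["2012", "2012", "2013", "2014", "2013", "2012"]

# Counter counts how many times each distinct value occurs.
sample_year_counts = Counter(sample_years)

print(dict(sample_year_counts))
```

A `Counter` is a `dict` subclass, so the later code that reads the counts would work on it unchanged.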


## Counting Gun Deaths by Month and Year

We will count the number of deaths that occurred in each month of each year, and store the result in a dictionary.

To do this, we will import the 'datetime' module, which lets us convert the year and month columns into a 'datetime' value, making date calculations easier.

In [26]:
import datetime

# for each row, build a datetime from the year and month columns (converted to integers)
# the data set has no 'day' column, so use a default value of 1
dates = [datetime.datetime(year=int(row[1]), month=int(row[2]), day=1) for row in data]

# show the first ten rows
dates[:10]

[datetime.datetime(2012, 1, 1, 0, 0),
 datetime.datetime(2012, 1, 1, 0, 0),
 datetime.datetime(2012, 1, 1, 0, 0),
 datetime.datetime(2012, 2, 1, 0, 0),
 datetime.datetime(2012, 2, 1, 0, 0),
 datetime.datetime(2012, 2, 1, 0, 0),
 datetime.datetime(2012, 2, 1, 0, 0),
 datetime.datetime(2012, 3, 1, 0, 0),
 datetime.datetime(2012, 2, 1, 0, 0),
 datetime.datetime(2012, 2, 1, 0, 0)]

In [27]:
# count the number of rows (i.e. deaths) for each year-and-month value created above
date_counts = {}

for date in dates:
    if date not in date_counts:
        date_counts[date] = 0
    date_counts[date] += 1

# show the dictionary
date_counts

{datetime.datetime(2012, 1, 1, 0, 0): 2758,
 datetime.datetime(2012, 2, 1, 0, 0): 2357,
 datetime.datetime(2012, 3, 1, 0, 0): 2743,
 datetime.datetime(2012, 4, 1, 0, 0): 2795,
 datetime.datetime(2012, 5, 1, 0, 0): 2999,
 datetime.datetime(2012, 6, 1, 0, 0): 2826,
 datetime.datetime(2012, 7, 1, 0, 0): 3026,
 datetime.datetime(2012, 8, 1, 0, 0): 2954,
 datetime.datetime(2012, 9, 1, 0, 0): 2852,
 datetime.datetime(2012, 10, 1, 0, 0): 2733,
 datetime.datetime(2012, 11, 1, 0, 0): 2729,
 datetime.datetime(2012, 12, 1, 0, 0): 2791,
 datetime.datetime(2013, 1, 1, 0, 0): 2864,
 datetime.datetime(2013, 2, 1, 0, 0): 2375,
 datetime.datetime(2013, 3, 1, 0, 0): 2862,
 datetime.datetime(2013, 4, 1, 0, 0): 2798,
 datetime.datetime(2013, 5, 1, 0, 0): 2806,
 datetime.datetime(2013, 6, 1, 0, 0): 2920,
 datetime.datetime(2013, 7, 1, 0, 0): 3079,
 datetime.datetime(2013, 8, 1, 0, 0): 2859,
 datetime.datetime(2013, 9, 1, 0, 0): 2742,
 datetime.datetime(2013, 10, 1, 0, 0): 2808,
 ... (output truncated)

We can see that the highest monthly total occurred in July 2013 (3,079 deaths).
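Rather than scanning the output by eye, the peak month can be looked up with `max()` and a `key` function. This is a minimal sketch that assumes a four-entry sample copied from the output above; in the notebook the full `date_counts` dictionary would take its place.

```python
import datetime

# A few entries copied from the monthly counts shown above.
sample_date_counts = {
    datetime.datetime(2012, 7, 1): 3026,
    datetime.datetime(2013, 1, 1): 2864,
    datetime.datetime(2013, 2, 1): 2375,
    datetime.datetime(2013, 7, 1): 3079,
}

# max() iterates over the keys and compares them by their associated count.
peak_date = max(sample_date_counts, key=sample_date_counts.get)
peak_count = sample_date_counts[peak_date]

print(f"{peak_date.year}-{peak_date.month:02d}: {peak_count} deaths")
```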

## Counting Gun Deaths By Race And Sex

We will count the number of deaths for each sex and for each race, and store the results in two separate dictionaries.

In [28]:
# create list of sexes, across all rows
sexes = [row[5] for row in data]

# show the first hundred values, which just represents the sex of the person who died, for each row
print(sexes[:100])

['M', 'F', 'M', 'M', 'M', 'M', 'M', 'M', 'M', 'M', 'M', 'M', 'M', 'M', 'M', 'M', 'M', 'M', 'M', 'M', 'M', 'F', 'F', 'M', 'M', 'M', 'M', 'M', 'M', 'M', 'M', 'M', 'M', 'M', 'F', 'M', 'M', 'M', 'M', 'M', 'M', 'M', 'M', 'M', 'M', 'M', 'M', 'M', 'M', 'M', 'M', 'M', 'M', 'F', 'M', 'M', 'M', 'M', 'M', 'M', 'M', 'M', 'M', 'F', 'M', 'M', 'F', 'M', 'M', 'M', 'F', 'M', 'F', 'M', 'M', 'M', 'M', 'M', 'M', 'F', 'F', 'M', 'F', 'M', 'M', 'M', 'M', 'M', 'M', 'M', 'F', 'M', 'M', 'F', 'M', 'M', 'M', 'M', 'M', 'M']


In [29]:
# create a dictionary to hold the results
sex_counts = {}
for sex in sexes:
    if sex not in sex_counts:
        sex_counts[sex] = 0
    sex_counts[sex] += 1

# show the dictionary
sex_counts

{'M': 86349, 'F': 14449}
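Raw counts are easier to compare as percentages of the total. The sketch below reuses the two counts printed above; the name `sex_shares` is introduced here for illustration only.

```python
# Counts copied from the output above.
sex_counts = {"M": 86349, "F": 14449}

total_deaths = sum(sex_counts.values())

# Express each count as a percentage of all recorded deaths.
sex_shares = {sex: 100 * count / total_deaths for sex, count in sex_counts.items()}

for sex, share in sex_shares.items():
    print(f"{sex}: {share:.1f}%")
```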

In [31]:
# create list of races, across all rows
races = [row[7] for row in data]

# show the first twenty values, which just represents the race of the person who died, for each row
print(races[:20])

['Asian/Pacific Islander', 'White', 'White', 'White', 'White', 'Native American/Native Alaskan', 'White', 'Native American/Native Alaskan', 'White', 'Black', 'White', 'Native American/Native Alaskan', 'White', 'Native American/Native Alaskan', 'White', 'Native American/Native Alaskan', 'White', 'Native American/Native Alaskan', 'Asian/Pacific Islander', 'White']


In [32]:
# create a dictionary to hold the results
race_counts = {}
for race in races:
    if race not in race_counts:
        race_counts[race] = 0
    race_counts[race] += 1

# show the dictionary
race_counts

{'Asian/Pacific Islander': 1326,
 'White': 66237,
 'Native American/Native Alaskan': 917,
 'Black': 23296,
 'Hispanic': 9022}

## Analysis

Gun deaths in the US seem to disproportionately affect men: 86,349 of the victims were male and 14,449 were female. They also seem to disproportionately affect minorities, although to confirm this we need to compare the counts against the size of each race's population in the US.

There appears to be a seasonal pattern, with gun deaths peaking in the summer and declining in the winter.

To follow up, it might be useful to filter by the 'intent' column, to see whether different intents show different patterns by season, race, or sex.
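One way to look at the seasonal pattern more directly is to add up the counts for each calendar month across years. The sketch below assumes a small sample (January and July of 2012 and 2013, copied from the monthly counts earlier); the name `month_totals` is hypothetical.

```python
import datetime
from collections import defaultdict

# January and July counts copied from the monthly output earlier in the notebook.
sample_date_counts = {
    datetime.datetime(2012, 1, 1): 2758,
    datetime.datetime(2012, 7, 1): 3026,
    datetime.datetime(2013, 1, 1): 2864,
    datetime.datetime(2013, 7, 1): 3079,
}

# Sum the counts for each calendar month (1-12), ignoring the year.
month_totals = defaultdict(int)
for date, count in sample_date_counts.items():
    month_totals[date.month] += count

for month in sorted(month_totals):
    print(month, month_totals[month])
```

Running the same loop over the full `date_counts` dictionary would give one total per calendar month, which makes summer and winter easier to compare.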

## Reading In Census Data

To properly analyse the proportions of gun deaths per race, we will import another data set containing census data. This data will provide a count of the number of people in the US belonging to each race.

We will open the raw data set with Python, convert it to a list of lists (one inner list per row), and view it.

In [33]:
# import the 'csv' library to automate the reading in and parsing of the data set
import csv

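# newline="" lets the csv module handle line endings itself, as its documentation recommends
with open(r"C:\projectdatasets\census.csv", "r", newline="") as f: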
    reader = csv.reader(f)
    census = list(reader)

In [36]:
# show the full data set (which is only two rows)
census

[['Id',
  'Year',
  'Id',
  'Sex',
  'Id',
  'Hispanic Origin',
  'Id',
  'Id2',
  'Geography',
  'Total',
  'Race Alone - White',
  'Race Alone - Hispanic',
  'Race Alone - Black or African American',
  'Race Alone - American Indian and Alaska Native',
  'Race Alone - Asian',
  'Race Alone - Native Hawaiian and Other Pacific Islander',
  'Two or More Races'],
 ['cen42010',
  'April 1, 2010 Census',
  'totsex',
  'Both Sexes',
  'tothisp',
  'Total',
  '0100000US',
  '',
  'United States',
  '308745538',
  '197318956',
  '44618105',
  '40250635',
  '3739506',
  '15159516',
  '674625',
  '6984195']]
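As a sketch, the two rows can be zipped into a lookup keyed by column name, which avoids retyping population figures by hand. The lists below are copied from the output above; `race_populations` is a name introduced here for illustration. The header repeats the name 'Id' several times, so only the uniquely named 'Race Alone' columns are kept.

```python
# The header and value rows copied from the census output above.
census_header = [
    "Id", "Year", "Id", "Sex", "Id", "Hispanic Origin", "Id", "Id2", "Geography", "Total",
    "Race Alone - White", "Race Alone - Hispanic",
    "Race Alone - Black or African American",
    "Race Alone - American Indian and Alaska Native",
    "Race Alone - Asian",
    "Race Alone - Native Hawaiian and Other Pacific Islander",
    "Two or More Races",
]
census_values = [
    "cen42010", "April 1, 2010 Census", "totsex", "Both Sexes", "tothisp", "Total",
    "0100000US", "", "United States", "308745538",
    "197318956", "44618105", "40250635", "3739506", "15159516", "674625", "6984195",
]

# Pair each column name with its value and keep only the per-race population columns.
race_populations = {
    name: int(value)
    for name, value in zip(census_header, census_values)
    if name.startswith("Race Alone")
}

# Combine two census categories into one, as an example of grouping columns.
asian_pacific = (
    race_populations["Race Alone - Asian"]
    + race_populations["Race Alone - Native Hawaiian and Other Pacific Islander"]
)
print(asian_pacific)
```

In the notebook, `census[0]` and `census[1]` would replace the two hand-copied lists.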

## Proportion of Gun Deaths By Race

We will first use the census data to create a mapping table containing the population for each race.

Note: we will combine the 'Asian' and 'Native Hawaiian and Other Pacific Islander' values to match the 'Asian/Pacific Islander' category used in the gun deaths data.

In [37]:
mapping = {
    "Asian/Pacific Islander": 15159516 + 674625,        
    "Native American/Native Alaskan": 3739506,
    "Black": 40250635,
    "Hispanic": 44618105,
    "White": 197318956  }

In [44]:
# create a new dictionary
race_per_hundredk = {}

# loop through each key (race) and value (number of gun deaths) in the 'race_counts' dictionary created above
# populate the new dictionary with the same key and, as its value, the rate per 100,000 people:
# (number of gun deaths / population count from the 'mapping' dictionary) * 100000

for k,v in race_counts.items():
    race_per_hundredk[k] = (v / mapping[k]) * 100000

In [45]:
# show the dictionary
race_per_hundredk

{'Asian/Pacific Islander': 8.374309664161762,
 'White': 33.56849303419181,
 'Native American/Native Alaskan': 24.521955573811088,
 'Black': 57.8773477735196,
 'Hispanic': 20.220491210910907}
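These figures are deaths per 100,000 people accumulated over the whole data set, which the introduction describes as covering 2012 to 2014. A hedged sketch of an average annual rate follows; it assumes three full years of data and uses the 2010 census populations as the denominator for every year. The helper `rate_per_100k` and the constant `YEARS_COVERED` are hypothetical names.

```python
# Counts and populations copied from the cells above.
race_counts = {
    "Asian/Pacific Islander": 1326,
    "White": 66237,
    "Native American/Native Alaskan": 917,
    "Black": 23296,
    "Hispanic": 9022,
}
mapping = {
    "Asian/Pacific Islander": 15159516 + 674625,
    "Native American/Native Alaskan": 3739506,
    "Black": 40250635,
    "Hispanic": 44618105,
    "White": 197318956,
}

YEARS_COVERED = 3  # assumption: the data spans 2012, 2013 and 2014 in full


def rate_per_100k(deaths, population, years=1):
    """Deaths per 100,000 people, optionally averaged over a number of years."""
    return deaths / population * 100000 / years


# Rate over the whole period, matching the calculation above.
period_rates = {race: rate_per_100k(n, mapping[race]) for race, n in race_counts.items()}

# Average rate per year under the three-year assumption.
annual_rates = {
    race: rate_per_100k(n, mapping[race], years=YEARS_COVERED)
    for race, n in race_counts.items()
}

for race in race_counts:
    print(f"{race}: {period_rates[race]:.1f} over the period, {annual_rates[race]:.1f} per year")
```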

## Filtering by Intent

We will repeat the above analysis, but this time explore how the 'intent' column (filtered on the value 'Homicide') affects the rate of gun deaths by race.

In [48]:
# create list of intent values, across all rows
intents = [row[3] for row in data]

# show the first twenty values, which just represents the intent (or cause of death) of the person who died, for each row
print(intents[:20])

['Suicide', 'Suicide', 'Suicide', 'Suicide', 'Suicide', 'Suicide', 'Undetermined', 'Suicide', 'Accidental', 'Suicide', 'Suicide', 'Suicide', 'Suicide', 'Suicide', 'Homicide', 'Suicide', 'Suicide', 'Suicide', 'Homicide', 'Suicide']


In [49]:
# create a dictionary to hold the results
homicide_race_counts = {}

# loop through the 'races' list created earlier (one race per gun death), keeping track of each index
# if the race is not yet a key in the new dictionary, add it with a value of 0
# use the same index to look up the matching value in the 'intents' list
# if that value is 'Homicide', increment the count for the race by one
for i,race in enumerate(races):
    if race not in homicide_race_counts:
        homicide_race_counts[race] = 0
    if intents[i] == "Homicide":
        homicide_race_counts[race] += 1

In [50]:
# show the dictionary
homicide_race_counts

{'Asian/Pacific Islander': 559,
 'White': 9147,
 'Native American/Native Alaskan': 326,
 'Black': 19510,
 'Hispanic': 5634}
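The same filter can be sketched as a small reusable function, so that other intents (for example 'Suicide') can be counted without copying the loop. The function name and the sample lists are illustrative assumptions. Unlike the loop above, a race with no matching rows is not stored as a key, although a `Counter` still returns 0 when that key is looked up.

```python
from collections import Counter


def count_races_for_intent(races, intents, intent):
    """Count how often each race appears among rows with the given intent."""
    return Counter(
        race for race, row_intent in zip(races, intents) if row_intent == intent
    )


# Small illustrative sample; the notebook would pass the full `races` and `intents` lists.
sample_races = ["White", "Black", "White", "Hispanic", "Black"]
sample_intents = ["Suicide", "Homicide", "Homicide", "Accidental", "Homicide"]

sample_homicides = count_races_for_intent(sample_races, sample_intents, "Homicide")
print(dict(sample_homicides))
```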

As above, we will now create a new dictionary that shows the rate of gun deaths per 100,000 people for each race, this time counting homicides only.

In [51]:
# create a new dictionary
race_per_hundredk = {}

# loop through each key (race) and value (number of gun homicides) in the 'homicide_race_counts' dictionary created above
# populate the new dictionary with the same key and, as its value, the rate per 100,000 people:
# (number of gun homicides / population count from the 'mapping' dictionary) * 100000
for k,v in homicide_race_counts.items():
    race_per_hundredk[k] = (v / mapping[k]) * 100000

In [52]:
# show the dictionary
race_per_hundredk

{'Asian/Pacific Islander': 3.530346230970155,
 'White': 4.6356417981453335,
 'Native American/Native Alaskan': 8.717729026240365,
 'Black': 48.471284987180944,
 'Hispanic': 12.627161104219914}

## Analysis

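From the above, we can conclude that gun-related homicides in the US disproportionately affect people in the Black and Hispanic racial categories, with rates of about 48.5 and 12.6 per 100,000 people over the period, compared with about 4.6 for the White category.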
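One way to check this conclusion is to compare each race's homicide rate with the overall rate across the five categories. The sketch below reuses the counts and populations from the cells above. Its overall population is the sum of those five categories, so it excludes the census 'Two or More Races' column; this is an assumption of the sketch, not part of the original analysis.

```python
# Homicide counts and populations copied from the cells above.
homicide_race_counts = {
    "Asian/Pacific Islander": 559,
    "White": 9147,
    "Native American/Native Alaskan": 326,
    "Black": 19510,
    "Hispanic": 5634,
}
mapping = {
    "Asian/Pacific Islander": 15159516 + 674625,
    "Native American/Native Alaskan": 3739506,
    "Black": 40250635,
    "Hispanic": 44618105,
    "White": 197318956,
}

# Homicide rate per 100,000 people for each race.
homicide_rates = {
    race: count / mapping[race] * 100000
    for race, count in homicide_race_counts.items()
}

# Overall rate across the five categories combined.
overall_rate = sum(homicide_race_counts.values()) / sum(mapping.values()) * 100000

# Races whose rate is higher than the overall rate.
above_overall = sorted(
    race for race, rate in homicide_rates.items() if rate > overall_rate
)

print(f"overall: {overall_rate:.1f} per 100,000")
print(above_overall)
```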