In this project we will look at the total number of gun deaths in the US from 2012 to 2014 using the data set 'guns.csv', broken down by year, month, sex, race and intent. We will also use US census data, found in the data set 'census.csv', to convert the gun death counts by race into rates per 100,000 people, so that groups of very different sizes can be compared fairly.

In [1]:
# Open the csv file and convert it to a list of lists
import csv

# The with block closes the file once all rows have been read
with open('guns.csv', newline='') as f:
    csvreader = csv.reader(f)
    data = list(csvreader)

print(data[:5])

[['', 'year', 'month', 'intent', 'police', 'sex', 'age', 'race', 'hispanic', 'place', 'education'], ['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'], ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'], ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4'], ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4']]


In [2]:
# Remove the header from the data set using slicing techniques
header = data[0]
data = data[1:]
print(data[:5])

[['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'], ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'], ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4'], ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4'], ['5', '2012', '02', 'Suicide', '0', 'M', '31', 'White', '100', 'Other specified', '2']]
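Since the header row is kept in `header`, column positions could be looked up by name instead of being hard-coded as `line[1]`, `line[7]` and so on. The sketch below does this with `list.index`; the helper `column` and the names `sample_rows` and `sample_races` are hypothetical, and the header and rows are copied from the printed output above.

```python
# Header row as printed above (the first column has an empty name)
header = ['', 'year', 'month', 'intent', 'police', 'sex', 'age',
          'race', 'hispanic', 'place', 'education']


def column(name):
    """Return the position of the named column in each row (hypothetical helper)."""
    return header.index(name)


# Two rows copied from the output above
sample_rows = [
    ['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'],
    ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'],
]

race_col = column('race')
sample_races = [row[race_col] for row in sample_rows]
print(race_col, sample_races)  # 7 ['Asian/Pacific Islander', 'White']
```

If the column order in the csv ever changed, only the header would need to be re-read; the name-based lookups would still point at the right fields.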


Now that the data is in a more usable format, we can start to analyse it. The first column appears to be a row ID, since it increases by 1 for each row.

Suppose we work for the US government and want to gauge the impact of guns on the country by looking at the total number of gun deaths each year. Since each row represents one gun death, we can do this with a dictionary that counts how many times each year appears in the year column.

In [3]:
# Use a list comprehension instead of a for loop to condense the amount of code we need to write
years = [line[1] for line in data]

year_counts = {}
# Loop over the years, incrementing each year's counter in the dictionary
for year in years:
    if year in year_counts:
        year_counts[year] += 1
    else:
        year_counts[year] = 1

year_counts

{'2012': 33563, '2013': 33636, '2014': 33599}
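The "increment if present, otherwise start at 1" pattern above is common enough that the standard library provides `collections.Counter` for it. A minimal sketch on a handful of made-up year values (not the real data), alongside the equivalent plain-dictionary version using `dict.get`:

```python
from collections import Counter

# A few illustrative year values, not taken from guns.csv
sample_years = ['2012', '2012', '2013', '2014', '2013', '2012']

# Counter tallies each distinct value in one call
sample_counts = Counter(sample_years)
print(dict(sample_counts))  # {'2012': 3, '2013': 2, '2014': 1}

# Equivalent plain-dict version: dict.get returns 0 for a year not seen yet
manual_counts = {}
for year in sample_years:
    manual_counts[year] = manual_counts.get(year, 0) + 1

print(manual_counts == dict(sample_counts))  # True
```

The explicit loop in the cell above is arguably clearer for a first pass; `Counter` mainly saves typing and adds helpers such as `most_common()`.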

From this initial investigation we can see that the number of firearm-related deaths stayed roughly constant from 2012 to 2014, at around 33,600 per year.

We can investigate further to see the gun deaths for each month of each year.

In [4]:
# Import the datetime module
import datetime

# Use a list comprehension to shorten the code
# The datetime module contains a class, also named datetime, that represents points in time
# The day is not specified in the data, so it is set to 1
dates = [datetime.datetime(year=int(line[1]), month=int(line[2]), day=1) for line in data]

dates[:5]

[datetime.datetime(2012, 1, 1, 0, 0),
 datetime.datetime(2012, 1, 1, 0, 0),
 datetime.datetime(2012, 1, 1, 0, 0),
 datetime.datetime(2012, 2, 1, 0, 0),
 datetime.datetime(2012, 2, 1, 0, 0)]

In [5]:
date_counts = {}

for date in dates:
    if date in date_counts:
        date_counts[date] += 1
    else:
        date_counts[date] = 1

# Returns total gun related deaths for each month
date_counts

{datetime.datetime(2012, 1, 1, 0, 0): 2758,
 datetime.datetime(2012, 2, 1, 0, 0): 2357,
 datetime.datetime(2012, 3, 1, 0, 0): 2743,
 datetime.datetime(2012, 4, 1, 0, 0): 2795,
 datetime.datetime(2012, 5, 1, 0, 0): 2999,
 datetime.datetime(2012, 6, 1, 0, 0): 2826,
 datetime.datetime(2012, 7, 1, 0, 0): 3026,
 datetime.datetime(2012, 8, 1, 0, 0): 2954,
 datetime.datetime(2012, 9, 1, 0, 0): 2852,
 datetime.datetime(2012, 10, 1, 0, 0): 2733,
 datetime.datetime(2012, 11, 1, 0, 0): 2729,
 datetime.datetime(2012, 12, 1, 0, 0): 2791,
 datetime.datetime(2013, 1, 1, 0, 0): 2864,
 datetime.datetime(2013, 2, 1, 0, 0): 2375,
 datetime.datetime(2013, 3, 1, 0, 0): 2862,
 datetime.datetime(2013, 4, 1, 0, 0): 2798,
 datetime.datetime(2013, 5, 1, 0, 0): 2806,
 datetime.datetime(2013, 6, 1, 0, 0): 2920,
 datetime.datetime(2013, 7, 1, 0, 0): 3079,
 datetime.datetime(2013, 8, 1, 0, 0): 2859,
 datetime.datetime(2013, 9, 1, 0, 0): 2742,
 datetime.datetime(2013, 10, 1, 0, 0): 2808,
 datetime.datetime(2013, 11,

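Because `datetime` objects compare chronologically, the monthly totals can be sorted and printed in a more readable `YYYY-MM` form. A minimal sketch on a small stand-in dictionary (`sample_date_counts`, a hypothetical name) holding three entries copied from the output above, deliberately out of order:

```python
import datetime

# Three entries copied from the date_counts output above, listed out of order
sample_date_counts = {
    datetime.datetime(2013, 1, 1): 2864,
    datetime.datetime(2012, 2, 1): 2357,
    datetime.datetime(2012, 1, 1): 2758,
}

# sorted() on dict items compares keys first, and datetime keys sort chronologically
ordered = sorted(sample_date_counts.items())
for date, count in ordered:
    print(date.strftime('%Y-%m'), count)
# 2012-01 2758
# 2012-02 2357
# 2013-01 2864
```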
We can also investigate the firearm deaths by sex and race. The code would be similar to that used for the gun deaths per year, so instead of writing it separately for sex and race, we will write a generic function.

In [6]:
def counts(data, column):
    """Count how many times each value appears in the given column of data."""
    # coi = column of interest
    coi = [line[column] for line in data]
    
    num_counts = {}
    
    for item in coi:
        if item in num_counts:
            num_counts[item] += 1
        else:
            num_counts[item] = 1
    return num_counts

In [7]:
sex_counts = counts(data, 5)
race_counts = counts(data, 7)
print(sex_counts)
print(race_counts)

{'M': 86349, 'F': 14449}
{'Asian/Pacific Islander': 1326, 'White': 66237, 'Native American/Native Alaskan': 917, 'Black': 23296, 'Hispanic': 9022}
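To put a number on the difference between men and women, the counts can be converted into each sex's share of all gun deaths. A short worked example using the `sex_counts` values printed above (copied into a hypothetical `sex_death_counts` so the block runs on its own):

```python
# Totals copied from the sex_counts output above
sex_death_counts = {'M': 86349, 'F': 14449}

total = sum(sex_death_counts.values())
# Share of all gun deaths accounted for by each sex
shares = {sex: count / total for sex, count in sex_death_counts.items()}

for sex, share in shares.items():
    print(f'{sex}: {share:.1%}')
# M: 85.7%
# F: 14.3%
```

The same calculation on `race_counts` would give shares of deaths by race, but those only become meaningful once compared with each group's share of the population.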


From the investigation so far, gun deaths in the US appear to affect men disproportionately: about six times as many men as women died. By race, more white individuals died from firearms than individuals of any other race, but raw counts alone cannot show whether a group is disproportionately affected.

To judge that, we need to know the size of each racial group in the US. The 'census.csv' data set, from the 2010 US census, contains the total population of the US as well as the population of each racial group.


In [8]:
# Read the census data, closing the file once the rows are loaded
with open('census.csv', newline='') as f2:
    csvreader2 = csv.reader(f2)
    census = list(csvreader2)
print(census)

[['Id', 'Year', 'Id', 'Sex', 'Id', 'Hispanic Origin', 'Id', 'Id2', 'Geography', 'Total', 'Race Alone - White', 'Race Alone - Hispanic', 'Race Alone - Black or African American', 'Race Alone - American Indian and Alaska Native', 'Race Alone - Asian', 'Race Alone - Native Hawaiian and Other Pacific Islander', 'Two or More Races'], ['cen42010', 'April 1, 2010 Census', 'totsex', 'Both Sexes', 'tothisp', 'Total', '0100000US', '', 'United States', '308745538', '197318956', '44618105', '40250635', '3739506', '15159516', '674625', '6984195']]


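First, we will manually create a dictionary of the total population of each race, taken from the census data above. We will then create another dictionary with each race as the key and the rate of gun deaths per 100k people as the value. Since the death counts cover three years (2012 to 2014) while the population figures are from 2010, these are cumulative three-year rates rather than annual rates.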

In [9]:
mapping = {}
# Asian and Pacific Islander are added together because they are two separate races in the census data set
mapping['Asian/Pacific Islander'] = 15159516 + 674625
mapping['Black'] = 40250635
mapping['Native American/Native Alaskan'] = 3739506
mapping['Hispanic'] = 44618105 
mapping['White'] = 197318956

race_per_hundredk = {}

# We can iterate both the key and the value in a dictionary using .items()
for race, deaths in race_counts.items():
    race_per_hundredk[race] = (deaths / mapping[race]) * 100000
    
race_per_hundredk

{'Asian/Pacific Islander': 8.374309664161762,
 'White': 33.56849303419181,
 'Native American/Native Alaskan': 24.521955573811088,
 'Black': 57.8773477735196,
 'Hispanic': 20.220491210910907}
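The conversion in the cell above is rate = deaths / population × 100,000. Because the deaths are pooled over three years, a helper could optionally divide by the number of years to approximate an annual rate. A minimal sketch; `rate_per_100k` and its `years` parameter are hypothetical names, and the annual figure assumes the population stayed close to its 2010 level over 2012 to 2014.

```python
def rate_per_100k(deaths, population, years=1):
    """Deaths per 100,000 people, averaged over `years` years (hypothetical helper)."""
    return deaths / population * 100_000 / years


# Figures for Black individuals, taken from race_counts and the census data above
cumulative = rate_per_100k(23296, 40250635)
annual = rate_per_100k(23296, 40250635, years=3)
print(round(cumulative, 2), round(annual, 2))  # 57.88 19.29
```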

From this investigation we can see that Black individuals have the highest rate of gun-related deaths per 100k people. We can also restrict the counts to deaths whose intent is 'Homicide', which gives the gun homicide rate for each race per 100k people.


In [10]:
intent = [line[3] for line in data]
races = [line[7] for line in data]

race_counts_homicide = {}
# Use the enumerate() function to loop through each index and value in the race column
for i, race in enumerate(races):
    if race not in race_counts_homicide:
        race_counts_homicide[race] = 0
    if intent[i] == 'Homicide':
        race_counts_homicide[race] += 1
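An alternative to indexing `intent[i]` inside `enumerate` is to walk the two columns in step with `zip`, which pairs up the values directly. A sketch on a few made-up rows (the `sample_` names are hypothetical and the values are not from the data set):

```python
# Illustrative columns only, not taken from guns.csv
sample_intent = ['Homicide', 'Suicide', 'Homicide', 'Accidental', 'Suicide']
sample_races = ['Black', 'White', 'Hispanic', 'Black', 'White']

sample_homicides = {}
for race, cause in zip(sample_races, sample_intent):
    # Start every race at 0 so races with no homicides still appear
    sample_homicides.setdefault(race, 0)
    if cause == 'Homicide':
        sample_homicides[race] += 1

print(sample_homicides)  # {'Black': 1, 'White': 0, 'Hispanic': 1}
```

Initialising every race to 0 matters here: it keeps races with no homicides in the dictionary, so the later per-100k loop still produces an entry for them.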

In [11]:
homicide_race_per_hundredk = {}
# race = key(k) and deaths = value(v)
for k, v in race_counts_homicide.items():
    homicide_race_per_hundredk[k] = (v / mapping[k]) * 100000
    
homicide_race_per_hundredk

{'Asian/Pacific Islander': 3.530346230970155,
 'White': 4.6356417981453335,
 'Native American/Native Alaskan': 8.717729026240365,
 'Black': 48.471284987180944,
 'Hispanic': 12.627161104219914}

From this we can see that the gun homicide rate is highest among Black individuals, at roughly four times the rate of the next-highest group (Hispanic individuals).

Let's now complete a number of other investigations:

- **The link between the month and homicide rate.**

In [12]:
months = [line[2] for line in data]

homicides_per_month = {}

for i, month in enumerate(months):
    if month not in homicides_per_month:
        homicides_per_month[month] = 0
    if intent[i] == 'Homicide':
        homicides_per_month[month] += 1
        
print(homicides_per_month)

{'01': 2829, '02': 2178, '03': 2780, '04': 2845, '05': 2976, '06': 3130, '07': 3269, '08': 3125, '09': 2966, '10': 2968, '11': 2919, '12': 3191}


We can see that the number of homicides trends upwards from February until it peaks in July. June to August and December have the highest counts, while February has the lowest of all the months, about 33% lower than July. Note that these totals combine all three years, and part of February's low count reflects it having fewer days than other months.
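The 33% figure is 1 − 2178/3269. To check how much of that gap survives once month length is taken into account, each three-year total can be divided by the number of days that month had in 2012 to 2014, using `calendar.monthrange`. This per-day normalization is an illustrative extension, not part of the original analysis; `totals`, `study_years` and `per_day` are hypothetical names.

```python
import calendar

# Three-year homicide totals for February and July, from homicides_per_month above
totals = {2: 2178, 7: 3269}
study_years = [2012, 2013, 2014]

# Raw gap between the lowest and highest month
raw_gap = 1 - totals[2] / totals[7]

# calendar.monthrange(year, month) returns (weekday of the 1st, number of days)
per_day = {
    month: count / sum(calendar.monthrange(y, month)[1] for y in study_years)
    for month, count in totals.items()
}
per_day_gap = 1 - per_day[2] / per_day[7]

print(f'raw gap: {raw_gap:.1%}')          # raw gap: 33.4%
print(f'per-day gap: {per_day_gap:.1%}')  # per-day gap: 27.1%
```

February still comes out lower on a per-day basis, but the gap shrinks, since 2012 was a leap year and February spans only 85 days across the three years against 93 for July.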

- **Homicide rate by gender.**

In [13]:
sex = [line[5] for line in data]

homicide_by_gender = {}

for i, gender in enumerate(sex):
    if gender not in homicide_by_gender:
        homicide_by_gender[gender] = 0
    if intent[i] == 'Homicide':
        homicide_by_gender[gender] += 1

        
homicide_by_gender

{'M': 29803, 'F': 5373}

In [14]:
homicide_rates_by_gender = {}

for k, v in homicide_by_gender.items():
    homicide_rates_by_gender[k] = v / sex_counts[k]
    
homicide_rates_by_gender

{'M': 0.34514586156180155, 'F': 0.3718596442660392}

From this calculation we can see that homicides account for approximately 37% of gun deaths among females and 34% among males. Note that these are shares of each sex's gun deaths, not rates per 100k people.

- **The rates of other intents by gender and race.**

In [15]:
# unique set of intents in the data set
unique_intent = set(intent)
unique_intent

{'Accidental', 'Homicide', 'NA', 'Suicide', 'Undetermined'}

In [16]:
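# intent_req = string of required intent
def death_rates_by_race(data, intent_req):
    """Return gun deaths with the given intent per 100k people for each race.

    Relies on the global `mapping` of race to population defined earlier.
    """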
    intent = [line[3] for line in data]
    races = [line[7] for line in data]

    death_counts = {}

    for i, race in enumerate(races):
        if race not in death_counts:
            death_counts[race] = 0
        if intent[i] == intent_req:
            death_counts[race] += 1
            
    death_rates_per_hundredk = {}

    for k, v in death_counts.items():
        death_rates_per_hundredk[k] = (v / mapping[k]) * 100000
    return death_rates_per_hundredk

In [17]:
accidental_deaths_by_race_hundredk = death_rates_by_race(data, 'Accidental')
accidental_deaths_by_race_hundredk

{'Asian/Pacific Islander': 0.07578560782046845,
 'White': 0.5736904466492313,
 'Native American/Native Alaskan': 0.5883130017708221,
 'Black': 0.814893976206835,
 'Hispanic': 0.3249801846133985}

In [18]:
na_deaths_by_race_hundredk = death_rates_by_race(data, 'NA')
na_deaths_by_race_hundredk

{'Asian/Pacific Islander': 0.0,
 'White': 0.0005067936807855399,
 'Native American/Native Alaskan': 0.0,
 'Black': 0.0,
 'Hispanic': 0.0}

In [19]:
suicide_deaths_by_race_hundredk = death_rates_by_race(data, 'Suicide')
suicide_deaths_by_race_hundredk

{'Asian/Pacific Islander': 4.705023152187416,
 'White': 28.06217969245692,
 'Native American/Native Alaskan': 14.841532544673013,
 'Black': 8.278130270491385,
 'Hispanic': 7.106980451097149}

In [20]:
undetermined_deaths_by_race_hundredk = death_rates_by_race(data, 'Undetermined')
undetermined_deaths_by_race_hundredk

{'Asian/Pacific Islander': 0.0631546731837237,
 'White': 0.2964743032595409,
 'Native American/Native Alaskan': 0.3743810011268868,
 'Black': 0.3130385396404305,
 'Hispanic': 0.16136947098044616}

In [21]:
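# intent_req = string of required intent
def death_rates_by_gender(data, intent_req):
    """Return the fraction of each sex's gun deaths that have the given intent.

    Relies on the global `sex_counts`; these are shares, not rates per 100k.
    """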
    intent = [line[3] for line in data]
    sex = [line[5] for line in data]

    death_counts = {}

    for i, gender in enumerate(sex):
        if gender not in death_counts:
            death_counts[gender] = 0
        if intent[i] == intent_req:
            death_counts[gender] += 1 
    death_rates = {}

    for k, v in death_counts.items():
        death_rates[k] = (v / sex_counts[k])
    return death_rates
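`death_rates_by_race` and `death_rates_by_gender` differ only in the column they group by, the denominator dictionary and the scale factor, and both read globals (`mapping`, `sex_counts`). One possible refactor, sketched below, passes these in as parameters; `rates_by_group` is a hypothetical name, and the toy rows and denominators are made up for illustration.

```python
def rates_by_group(rows, group_col, intent_col, intent_req, denominators, scale=1):
    """Count rows with the required intent per group, then divide by that
    group's denominator and multiply by `scale` (hypothetical generalization)."""
    counts = {}
    for row in rows:
        group = row[group_col]
        counts.setdefault(group, 0)
        if row[intent_col] == intent_req:
            counts[group] += 1
    # Multiplying before dividing keeps whole-number results exact
    return {group: n * scale / denominators[group] for group, n in counts.items()}


# Toy rows of [intent, sex, race], made up for illustration
toy_rows = [
    ['Suicide', 'M', 'White'],
    ['Homicide', 'F', 'Black'],
    ['Suicide', 'F', 'White'],
    ['Homicide', 'M', 'Black'],
]

# Per-100k rates by race, with made-up population figures as denominators
by_race = rates_by_group(toy_rows, 2, 0, 'Suicide',
                         {'White': 200_000, 'Black': 100_000}, scale=100_000)

# Shares by sex, with each sex's total deaths as the denominator
toy_sex_totals = {'M': 2, 'F': 2}
by_sex = rates_by_group(toy_rows, 1, 0, 'Homicide', toy_sex_totals)

print(by_race)  # {'White': 1.0, 'Black': 0.0}
print(by_sex)   # {'M': 0.5, 'F': 0.5}
```

With the real data, a call like `rates_by_group(data, 7, 3, 'Suicide', mapping, scale=100000)` should reproduce the race rates above up to floating-point rounding, though that has not been checked against the original cells.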

In [22]:
accidental_deaths_by_gender = death_rates_by_gender(data, 'Accidental')
accidental_deaths_by_gender

{'M': 0.0164564731496601, 'F': 0.015087549311371028}

In [23]:
na_deaths_by_gender = death_rates_by_gender(data, 'NA')
na_deaths_by_gender

{'M': 1.1580910027909993e-05, 'F': 0.0}

In [24]:
suicide_deaths_by_gender = death_rates_by_gender(data, 'Suicide')
suicide_deaths_by_gender

{'M': 0.6309974637807039, 'F': 0.6013564952591874}

In [25]:
undetermined_deaths_by_gender = death_rates_by_gender(data, 'Undetermined')
undetermined_deaths_by_gender

{'M': 0.007388620597806575, 'F': 0.011696311163402312}
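The per-intent dictionaries are easier to compare side by side. A sketch that prints a nested dictionary as an aligned text table using f-string format specs; the values are the per-100k race rates from the outputs above, rounded to two decimals, and `rates`, `table_header` and `lines` are hypothetical names.

```python
# Per-100k rates by intent and race, rounded from the outputs above
rates = {
    'Homicide': {'Asian/Pacific Islander': 3.53, 'White': 4.64, 'Black': 48.47,
                 'Hispanic': 12.63, 'Native American/Native Alaskan': 8.72},
    'Suicide': {'Asian/Pacific Islander': 4.71, 'White': 28.06, 'Black': 8.28,
                'Hispanic': 7.11, 'Native American/Native Alaskan': 14.84},
    'Accidental': {'Asian/Pacific Islander': 0.08, 'White': 0.57, 'Black': 0.81,
                   'Hispanic': 0.32, 'Native American/Native Alaskan': 0.59},
    'Undetermined': {'Asian/Pacific Islander': 0.06, 'White': 0.30, 'Black': 0.31,
                     'Hispanic': 0.16, 'Native American/Native Alaskan': 0.37},
}

# Row labels come from the first intent's dictionary
groups = list(next(iter(rates.values())))
name_width = max(len(g) for g in groups)

# Left-align the race names and right-align each intent column to 14 characters
table_header = f"{'Race':<{name_width}}" + ''.join(f'{intent:>14}' for intent in rates)
lines = [table_header]
for group in groups:
    cells = ''.join(f'{rates[intent][group]:>14.2f}' for intent in rates)
    lines.append(f'{group:<{name_width}}{cells}')

print('\n'.join(lines))
```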