## Exploring gun deaths in the US

In this project, I'll be working with data on gun deaths in the US from 2012 to 2014. The dataset comes from FiveThirtyEight. I'll take a brief look at how the number of deaths changes from year to year, and at whether men and women die from gun-related causes in different numbers. My main focus, however, is gun deaths by race and the statistics behind them.

In [1]:
import csv

# Read every row into a list; the context manager closes the file afterwards.
with open("guns.csv", "r") as file:
    data = list(csv.reader(file))

In [2]:
headers = data[0]
data = data[1:]

In [3]:
years = [row[1] for row in data]
year_counts = {}
for year in years:
    if year not in year_counts:
        year_counts[year] = 0
    year_counts[year] += 1
year_counts

{'2012': 33563, '2013': 33636, '2014': 33599}
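The counting loop above recurs throughout this notebook. As a minimal sketch, the same tally could be produced with `collections.Counter` from the standard library. The `sample_rows` below are made-up stand-ins for rows of `guns.csv`, assuming only that the year sits at index 1 as in the cell above.

```python
from collections import Counter

# Hypothetical stand-in rows; only index 1 (the year) matters here.
sample_rows = [
    ["1", "2012", "01"],
    ["2", "2012", "02"],
    ["3", "2013", "01"],
    ["4", "2014", "03"],
]

# Counter tallies each distinct value in a single pass.
sample_year_counts = Counter(row[1] for row in sample_rows)
print(dict(sample_year_counts))
```

A `Counter` behaves like a dictionary, so the rest of the analysis would not need to change if it were swapped in.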

In [4]:
import datetime

# One datetime per row from the year and month columns; the day is fixed to 1.
dates = [datetime.datetime(year=int(row[1]), month=int(row[2]), day=1) for row in data]

In [5]:
date_counts = {}
for date in dates:
    if date not in date_counts:
        date_counts[date] = 0
    date_counts[date] += 1
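`date_counts` is built above but never displayed. One way to inspect it, sketched here on the assumption that chronological order is the most useful view, is to sort the items by date. The `sample_dates` are invented for illustration and are not taken from the dataset.

```python
import datetime

# Hypothetical month-level dates, built like `dates` above (day fixed to 1).
sample_dates = [
    datetime.datetime(year=2013, month=2, day=1),
    datetime.datetime(year=2012, month=1, day=1),
    datetime.datetime(year=2012, month=1, day=1),
]

sample_date_counts = {}
for date in sample_dates:
    sample_date_counts[date] = sample_date_counts.get(date, 0) + 1

# datetime objects compare chronologically, so sorted() orders the months.
monthly = sorted(sample_date_counts.items())
for date, count in monthly:
    print(date.strftime("%Y-%m"), count)
```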

In [6]:
sexes = [row[5] for row in data]
sex_counts = {}
for sex in sexes:
    if sex not in sex_counts:
        sex_counts[sex] = 0
    sex_counts[sex] += 1
sex_counts

{'M': 86349, 'F': 14449}

In [7]:
races = [row[7] for row in data]
race_counts = {}
for race in races:
    if race not in race_counts:
        race_counts[race] = 0
    race_counts[race] += 1

race_counts

{'Asian/Pacific Islander': 1326,
 'White': 66237,
 'Native American/Native Alaskan': 917,
 'Black': 23296,
 'Hispanic': 9022}

## First Thoughts

The yearly totals barely change: each year from 2012 to 2014 saw roughly 33,600 gun deaths.

There are roughly six male victims for every female victim. Since the ratio of males to females in the US population is approximately 1:1, we can conclude that the victims are disproportionately male.
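To put a number on that imbalance, the male share of all gun deaths can be computed directly from the counts. This is a small worked example that copies the values from the `sex_counts` output above.

```python
# Counts copied from the sex_counts output above.
sex_counts = {"M": 86349, "F": 14449}

total = sum(sex_counts.values())
male_share = sex_counts["M"] / total * 100
print(round(male_share, 1))
```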

There are many more white victims than victims of any other race. However, raw counts do not tell us whether each race is over- or under-represented relative to its share of the population, which is the more noteworthy statistic. Gun death rates are typically presented per 100,000 people, so we need to combine the counts above with statistics on the racial make-up of the US population to calculate the number of gun deaths per 100,000 people of each race.

In [8]:
# Read the census file; the context manager closes the file afterwards.
with open("census.csv", "r") as file:
    census = list(csv.reader(file))

In [9]:
# US population partitioned by race, read from the first data row of the census file.
mapping = {
    "Asian/Pacific Islander": int(census[1][14]) + int(census[1][15]),
    "Black": int(census[1][12]),
    "Native American/Native Alaskan": int(census[1][13]),
    "Hispanic": int(census[1][11]),
    "White": int(census[1][10]),
}
mapping

{'Asian/Pacific Islander': 15834141,
 'Black': 40250635,
 'Native American/Native Alaskan': 3739506,
 'Hispanic': 44618105,
 'White': 197318956}
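The next step looks up `mapping[key]` for every key in `race_counts`, so a mismatch in spelling between the two dictionaries would raise a `KeyError`. A quick check like the following, with the keys copied from the two outputs above, confirms that they line up. The names `race_keys`, `mapping_keys`, `missing` and `extra` are introduced here purely for illustration.

```python
# Keys copied from the race_counts and mapping outputs above.
race_keys = {
    "Asian/Pacific Islander",
    "White",
    "Native American/Native Alaskan",
    "Black",
    "Hispanic",
}
mapping_keys = {
    "Asian/Pacific Islander",
    "Black",
    "Native American/Native Alaskan",
    "Hispanic",
    "White",
}

# Races with deaths but no population figure, and the reverse.
missing = race_keys - mapping_keys
extra = mapping_keys - race_keys
print(missing, extra)
```

Both sets come out empty, so every race in the gun death data has a population figure.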

In [10]:
race_per_hundredk = {}
for key in race_counts:
    race_per_hundredk[key] = (race_counts[key]/mapping[key])*100000
race_per_hundredk

{'Asian/Pacific Islander': 8.374309664161762,
 'White': 33.56849303419181,
 'Native American/Native Alaskan': 24.521955573811088,
 'Black': 57.8773477735196,
 'Hispanic': 20.220491210910907}

Per 100,000 people, the picture changes considerably. White people account for the largest absolute number of gun deaths, but Black people have the highest rate, at 57.88 deaths per 100,000 people of their race. Note that these rates cover all three years of data combined. The rate for white people is the second highest, at 33.57 deaths per 100,000.
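Because the death counts span 2012 to 2014, the figures above are three-year totals rather than annual rates. A rough annual average can be sketched by dividing by three, under the assumption (mine, not the dataset's) that the census population figures hold steady across all three years.

```python
# Three-year rates per 100,000, copied from the race_per_hundredk output above.
race_per_hundredk = {
    "Asian/Pacific Islander": 8.374309664161762,
    "White": 33.56849303419181,
    "Native American/Native Alaskan": 24.521955573811088,
    "Black": 57.8773477735196,
    "Hispanic": 20.220491210910907,
}

YEARS_COVERED = 3  # 2012, 2013 and 2014

# Assumes a constant population over the period, which is only an approximation.
annual_per_hundredk = {
    race: round(rate / YEARS_COVERED, 2)
    for race, rate in race_per_hundredk.items()
}
print(annual_per_hundredk)
```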

We can learn more about these figures by investigating how many of these deaths are due to homicides.

In [11]:
intents = [row[3] for row in data]

race_counts_for_homicides = {}
for index, race in enumerate(races):
    if race not in race_counts_for_homicides:
        race_counts_for_homicides[race] = 0
    if intents[index] == "Homicide":
        race_counts_for_homicides[race] += 1

race_sorted_homicides_per_hundredk = {}
for key, value in race_counts_for_homicides.items():
    race_sorted_homicides_per_hundredk[key] = (value/mapping[key])*100000
race_sorted_homicides_per_hundredk

{'Asian/Pacific Islander': 3.530346230970155,
 'White': 4.6356417981453335,
 'Native American/Native Alaskan': 8.717729026240365,
 'Black': 48.471284987180944,
 'Hispanic': 12.627161104219914}

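For most races, the homicide rate is far lower than the overall gun death rate. That is not the case for Black people, for whom 48.47 of the 57.88 gun deaths per 100,000 are homicides. Let's calculate the percentage of gun deaths due to homicide for each race.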

In [12]:
percentage_homicide_deaths = {}
for key, value in race_counts_for_homicides.items():
    percentage = (value/race_counts[key])*100
    percentage_homicide_deaths[key] = round(percentage, 2)
percentage_homicide_deaths

{'Asian/Pacific Islander': 42.16,
 'White': 13.81,
 'Native American/Native Alaskan': 35.55,
 'Black': 83.75,
 'Hispanic': 62.45}

We can now see that a large majority (83.75%) of gun deaths among Black people are homicides. Let's sort the percentages above to see how each race compares with the others. We'll also define helper functions that return just the highest or the lowest percentage.

In [13]:
def sorted_gun_deaths(dictionary):
    # Sort the (key, value) pairs by value, largest first.
    return sorted(dictionary.items(), key=lambda x: x[1], reverse=True)

In [14]:
sorted_gun_deaths(percentage_homicide_deaths)

[('Black', 83.75),
 ('Hispanic', 62.45),
 ('Asian/Pacific Islander', 42.16),
 ('Native American/Native Alaskan', 35.55),
 ('White', 13.81)]

In [15]:
def highest_gun_deaths(dictionary):
    # Track the (key, value) pair with the largest value seen so far.
    highest = None
    for key, value in dictionary.items():
        if highest is None or value > highest[1]:
            highest = (key, value)
    return highest

highest_gun_deaths(percentage_homicide_deaths)

('Black', 83.75)

In [16]:
def lowest_gun_deaths(dictionary):
    # Track the (key, value) pair with the smallest value seen so far.
    lowest = None
    for key, value in dictionary.items():
        if lowest is None or value < lowest[1]:
            lowest = (key, value)
    return lowest

lowest_gun_deaths(percentage_homicide_deaths)

('White', 13.81)
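As an aside, the two helper functions could also be expressed with the built-in `max()` and `min()` and a `key` argument. The sketch below copies the percentages from the `percentage_homicide_deaths` output above, and the names `highest` and `lowest` are illustrative.

```python
# Percentages copied from the percentage_homicide_deaths output above.
percentage_homicide_deaths = {
    "Asian/Pacific Islander": 42.16,
    "White": 13.81,
    "Native American/Native Alaskan": 35.55,
    "Black": 83.75,
    "Hispanic": 62.45,
}

# Compare the (key, value) pairs by their value (index 1).
highest = max(percentage_homicide_deaths.items(), key=lambda item: item[1])
lowest = min(percentage_homicide_deaths.items(), key=lambda item: item[1])
print(highest, lowest)
```

One difference to keep in mind: `max()` and `min()` raise a `ValueError` on an empty dictionary, whereas the functions above return `None`.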

White people have the smallest share of gun deaths attributed to homicide, so let's find out what accounts for the rest. First, we'll build a list of the intent column for rows where the race is white. Next, we'll list the unique intents and count how often each one occurs. Finally, we'll express each count as a percentage of all white gun deaths.

In [17]:
white_deaths = [row[3] for row in data if row[7] == "White"]

In [18]:
intents_white_deaths = list(set(white_deaths))
intents_white_deaths

['Suicide', 'NA', 'Homicide', 'Undetermined', 'Accidental']

In [19]:
intent_count = {}
for intent in white_deaths:
    if intent not in intent_count:
        intent_count[intent] = 0
    intent_count[intent] += 1
intent_count

{'Suicide': 55372,
 'Undetermined': 585,
 'Accidental': 1132,
 'Homicide': 9147,
 'NA': 1}

In [20]:
intents_percentage = {}
for key, value in intent_count.items():
    percentage = (value/(len(white_deaths)))*100
    intents_percentage[key] = round(percentage, 2)
intents_percentage

{'Suicide': 83.6,
 'Undetermined': 0.88,
 'Accidental': 1.71,
 'Homicide': 13.81,
 'NA': 0.0}
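A quick sanity check on these figures: the intent counts should add up to the total number of white gun deaths found earlier, and the rounded percentages should add up to roughly 100. The sketch below copies the counts from the outputs above. The 0.05 tolerance is an arbitrary allowance for rounding that I chose for illustration.

```python
# Counts copied from the intent_count output above.
intent_count = {
    "Suicide": 55372,
    "Undetermined": 585,
    "Accidental": 1132,
    "Homicide": 9147,
    "NA": 1,
}
white_total = 66237  # the 'White' entry in race_counts

# Every white gun death should fall under exactly one intent.
assert sum(intent_count.values()) == white_total

intents_percentage = {
    intent: round(count / white_total * 100, 2)
    for intent, count in intent_count.items()
}

# Rounded percentages can drift slightly from 100, so allow a small tolerance.
total_percentage = sum(intents_percentage.values())
assert abs(total_percentage - 100) < 0.05
print(round(total_percentage, 2))
```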

In [21]:
sorted_intents_percentage = sorted_gun_deaths(intents_percentage)
sorted_intents_percentage

[('Suicide', 83.6),
 ('Homicide', 13.81),
 ('Accidental', 1.71),
 ('Undetermined', 0.88),
 ('NA', 0.0)]

Thus, we see that suicide is by far the leading intent in white gun deaths, accounting for 83.6% of them.