This is a simple exploratory data analysis of gun deaths in the U.S. from 2012 to 2014.

The dataset came from FiveThirtyEight and can be found at https://github.com/fivethirtyeight/guns-data

The main purpose of this exercise is to practice basic Python syntax on this dataset.

In [32]:
import csv

# Read guns.csv into a list of rows; each row is a list of strings
with open("guns.csv") as f:
    data = list(csv.reader(f))

print(data[0:5])

[['', 'year', 'month', 'intent', 'police', 'sex', 'age', 'race', 'hispanic', 'place', 'education'], ['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'], ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'], ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4'], ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4']]


In [33]:
headers = data[0]

data = data[1:]

print(headers)
print(data[0:5])

['', 'year', 'month', 'intent', 'police', 'sex', 'age', 'race', 'hispanic', 'place', 'education']
[['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'], ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'], ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4'], ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4'], ['5', '2012', '02', 'Suicide', '0', 'M', '31', 'White', '100', 'Other specified', '2']]
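Looking up columns by position (row[7] for race, row[3] for intent) works, but a wrong index fails silently. Below is a minimal sketch of an alternative that uses the standard library's csv.DictReader, which keys each row by its header name. To keep it self-contained, it reads an inline copy of the header and the first two data rows printed above (through io.StringIO) instead of guns.csv:

```python
import csv
import io

# Inline sample: the header plus the first two data rows printed above.
# In the notebook this would be open("guns.csv") instead of io.StringIO.
sample = io.StringIO(
    ",year,month,intent,police,sex,age,race,hispanic,place,education\n"
    "1,2012,01,Suicide,0,M,34,Asian/Pacific Islander,100,Home,4\n"
    "2,2012,01,Suicide,0,F,21,White,100,Street,3\n"
)

# DictReader takes the first row as the keys for every following row
rows = list(csv.DictReader(sample))

for row in rows:
    print(row["year"], row["sex"], row["race"])
```

The unnamed first column ends up under the empty-string key "".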


In [34]:
years = [row[1] for row in data]

year_counts = {}
for year in years:
    if year in year_counts:
        year_counts[year] += 1
    else:
        year_counts[year] = 1
        
print(year_counts)

{'2013': 33636, '2014': 33599, '2012': 33563}
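One way to quantify the year-to-year variation is to express each year's total as a percentage change from 2012. Here is a small sketch that hard-codes the counts printed above:

```python
# Yearly totals printed above
year_counts = {"2012": 33563, "2013": 33636, "2014": 33599}

baseline = year_counts["2012"]
for year in sorted(year_counts):
    # Percentage change relative to the 2012 total
    change = (year_counts[year] - baseline) / baseline * 100
    print(year, year_counts[year], f"{change:+.2f}%")
```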


The yearly totals barely changed from 2012 to 2014 (33,563, 33,636, and 33,599 deaths). Let's see whether gun deaths in the US vary by month.

In [35]:
import datetime

# One datetime per death, pinned to day 1 so deaths group by year and month
dates = [datetime.datetime(year=int(row[1]), month=int(row[2]), day=1)
         for row in data]

date_counts = {}
for element in dates:
    if element in date_counts:
        date_counts[element] += 1
    else:
        date_counts[element] = 1

print(date_counts)

{datetime.datetime(2013, 8, 1, 0, 0): 2859, datetime.datetime(2012, 9, 1, 0, 0): 2852, datetime.datetime(2013, 11, 1, 0, 0): 2758, datetime.datetime(2012, 2, 1, 0, 0): 2357, datetime.datetime(2013, 12, 1, 0, 0): 2765, datetime.datetime(2014, 4, 1, 0, 0): 2862, datetime.datetime(2014, 12, 1, 0, 0): 2857, datetime.datetime(2014, 1, 1, 0, 0): 2651, datetime.datetime(2012, 6, 1, 0, 0): 2826, datetime.datetime(2012, 12, 1, 0, 0): 2791, datetime.datetime(2014, 3, 1, 0, 0): 2684, datetime.datetime(2014, 9, 1, 0, 0): 2914, datetime.datetime(2014, 5, 1, 0, 0): 2864, datetime.datetime(2013, 10, 1, 0, 0): 2808, datetime.datetime(2013, 4, 1, 0, 0): 2798, datetime.datetime(2012, 4, 1, 0, 0): 2795, datetime.datetime(2013, 6, 1, 0, 0): 2920, datetime.datetime(2013, 9, 1, 0, 0): 2742, datetime.datetime(2012, 1, 1, 0, 0): 2758, datetime.datetime(2014, 2, 1, 0, 0): 2361, datetime.datetime(2012, 7, 1, 0, 0): 3026, datetime.datetime(2013, 5, 1, 0, 0): 2806, datetime.datetime(2012, 5, 1, 0, 0): 2999, datet
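The printed dictionary is in hash order, which makes it hard to follow changes from one month to the next. This minimal sketch lists the counts chronologically instead. It uses collections.Counter in place of the manual if/else counting and sorts the datetime keys. The sample holds only the year and month of the first five data rows printed earlier, standing in for the full data:

```python
import datetime
from collections import Counter

# (year, month) fields of the first five data rows printed earlier
sample = [("2012", "01"), ("2012", "01"), ("2012", "01"), ("2012", "02"), ("2012", "02")]

# Counter does the same tallying as the if/else loop above
date_counts = Counter(
    datetime.datetime(year=int(y), month=int(m), day=1) for y, m in sample
)

# Sorting the datetime keys gives a chronological listing
for date in sorted(date_counts):
    print(date.strftime("%Y-%m"), date_counts[date])
```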

Now let's explore gun deaths by sex and race.

In [36]:
sex_counts = {}
race_counts = {}
for row in data:
    if row[5] in sex_counts:
        sex_counts[row[5]] += 1
    else:
        sex_counts[row[5]] = 1
        
    if row[7] in race_counts:
        race_counts[row[7]] += 1
    else:
        race_counts[row[7]] = 1
        
print(sex_counts)
print(race_counts)

{'M': 86349, 'F': 14449}
{'Native American/Native Alaskan': 917, 'Black': 23296, 'Hispanic': 9022, 'Asian/Pacific Islander': 1326, 'White': 66237}


These two dictionaries show that the large majority of gun deaths (86,349 of 100,798) were of males, and that White is the racial group with the most gun deaths in absolute numbers.
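The sex comparison can also be expressed as shares of all gun deaths. Here is a small sketch that uses the two totals printed above:

```python
# Totals printed by the sex/race cell above
sex_counts = {"M": 86349, "F": 14449}

total = sum(sex_counts.values())  # 100798, equal to the sum of the three yearly totals
shares = {sex: count / total * 100 for sex, count in sex_counts.items()}

print({sex: round(share, 1) for sex, share in shares.items()})  # {'M': 85.7, 'F': 14.3}
```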

We explored gun deaths by race above. However, that analysis only gives the total number of gun deaths per race. Unless we know how large each racial group is in the US population, we can't meaningfully compare those numbers. What we really want is the rate of gun deaths per 100,000 people of each race. To compute it, we need to know how many people in the US fall into each racial category. Luckily, we can import some census data to help us out.

The census data contains the total population of the US as well as the population of each racial group.
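The rate itself is simple arithmetic: divide a group's gun deaths by its population, then scale by 100,000. The minimal sketch below uses the Black group's 23,296 gun deaths from race_counts and its 40,250,635 people in the 2010 census data read in the next cell. The function name rate_per_100k is introduced here for illustration only:

```python
def rate_per_100k(count, population):
    """Deaths per 100,000 people: count / population * 100,000 (illustrative helper)."""
    return count / population * 100000


# Black: 23,296 gun deaths vs 40,250,635 people in the 2010 census
print(round(rate_per_100k(23296, 40250635), 2))  # 57.88
```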

In [37]:
# Read the census file the same way; the first row holds the column names
with open("census.csv") as f:
    census = list(csv.reader(f))

print(census)

[['Id', 'Year', 'Id', 'Sex', 'Id', 'Hispanic Origin', 'Id', 'Id2', 'Geography', 'Total', 'Race Alone - White', 'Race Alone - Hispanic', 'Race Alone - Black or African American', 'Race Alone - American Indian and Alaska Native', 'Race Alone - Asian', 'Race Alone - Native Hawaiian and Other Pacific Islander', 'Two or More Races'], ['cen42010', 'April 1, 2010 Census', 'totsex', 'Both Sexes', 'tothisp', 'Total', '0100000US', '', 'United States', '308745538', '197318956', '44618105', '40250635', '3739506', '15159516', '674625', '6984195']]


Now that we have the census data, we can compute the rate of gun deaths per 100,000 people for each racial group.

However, the race categories in the census data differ from the ones in the gun data. Hence, we create a dictionary keyed by the race names used in data and manually map each one to the matching census population. The gun data's Asian/Pacific Islander category spans two census columns, Race Alone - Asian and Race Alone - Native Hawaiian and Other Pacific Islander, so those two populations are added together.
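One detail matters when combining census columns. csv.reader yields strings, so adding two population fields with + concatenates them instead of summing them. The minimal sketch below demonstrates this with the two Asian/Pacific Islander populations from the census row:

```python
# csv.reader returns every field as a str, including the census populations
asian = "15159516"           # Race Alone - Asian
pacific_islander = "674625"  # Race Alone - Native Hawaiian and Other Pacific Islander

print(asian + pacific_islander)            # '15159516674625': string concatenation
print(int(asian) + int(pacific_islander))  # 15834141: the actual combined population
```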

In [38]:
census_val = census[1]

# Map each race label in the gun data to its census population.
# csv.reader yields strings, so convert to int before any arithmetic;
# Asian/Pacific Islander is the sum of two census columns.
mapping = {}
mapping["Asian/Pacific Islander"] = int(census_val[14]) + int(census_val[15])
mapping["Black"] = int(census_val[12])
mapping["Native American/Native Alaskan"] = int(census_val[13])
mapping["Hispanic"] = int(census_val[11])
mapping["White"] = int(census_val[10])

# Gun deaths per 100,000 people of each race
race_per_hundredk = {}
for key, value in race_counts.items():
    race_per_hundredk[key] = value / mapping[key] * 100000

print({key: round(value, 2) for key, value in race_per_hundredk.items()})

{'White': 33.57, 'Black': 57.88, 'Hispanic': 20.22, 'Native American/Native Alaskan': 24.52, 'Asian/Pacific Islander': 8.37}


With this normalization, the Black racial group has by far the highest rate, about 58 gun deaths per 100,000 people.

We can also restrict the results to the Homicide intent, which gives the gun-related murder rate per 100,000 people in each racial category. To do this, we redo the race counts, but only count rows whose intent is Homicide.

In [39]:
intents = [row[3] for row in data]
races = [row[7] for row in data]

# Count deaths per race, keeping only rows whose intent is Homicide
homicide_race_counts = {}
for intent, race in zip(intents, races):
    if intent == "Homicide":
        if race in homicide_race_counts:
            homicide_race_counts[race] += 1
        else:
            homicide_race_counts[race] = 1

# Convert the counts to homicides per 100,000 people of each race
homicide_race_per_hundredk = {}
for key, value in homicide_race_counts.items():
    homicide_race_per_hundredk[key] = value / mapping[key] * 100000

print({key: round(value, 2) for key, value in homicide_race_per_hundredk.items()})

{'Native American/Native Alaskan': 8.72, 'Black': 48.47, 'Hispanic': 12.63, 'White': 4.64, 'Asian/Pacific Islander': 3.53}


After filtering for intent, the Black racial group again has by far the highest rate of gun homicides, more than ten times the rate for the White group.
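The overall and homicide-only calculations differ only in the intent filter, so both can be folded into one function. Below is a minimal sketch. rates_by_race is a hypothetical helper introduced here. It assumes the guns.csv column layout (intent at index 3, race at index 7) and populations that have already been converted to int. The sample rows are the first five data rows printed earlier:

```python
from collections import Counter


def rates_by_race(rows, populations, intent=None):
    """Return gun deaths per 100,000 people for each race (hypothetical helper).

    rows: lists in the guns.csv layout (intent at index 3, race at index 7).
    populations: race name -> population as an int.
    intent: if given, only rows with this intent are counted.
    """
    counts = Counter(row[7] for row in rows if intent is None or row[3] == intent)
    return {race: count / populations[race] * 100000 for race, count in counts.items()}


# 2010 census populations from census.csv, already converted to int
populations = {
    "White": 197318956,
    "Black": 40250635,
    "Hispanic": 44618105,
    "Native American/Native Alaskan": 3739506,
    "Asian/Pacific Islander": 15159516 + 674625,
}

# First five data rows of guns.csv, as printed earlier (all are suicides)
sample_rows = [
    ['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'],
    ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'],
    ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4'],
    ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4'],
    ['5', '2012', '02', 'Suicide', '0', 'M', '31', 'White', '100', 'Other specified', '2'],
]

suicide_rates = rates_by_race(sample_rows, populations, intent="Suicide")
homicide_rates = rates_by_race(sample_rows, populations, intent="Homicide")
print(suicide_rates)
print(homicide_rates)  # {} because none of the sample rows is a homicide
```

Passing intent=None counts every row. Passing another intent, such as "Accidental", would give that intent's rate instead.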

Thanks for reading! This is just a simple exploratory data analysis.

I hope to do more with this dataset in the future, such as:

1) Figure out the link, if any, between month and homicide rate.

2) Explore homicide rate by gender.

3) Explore the rates of other intents, like Accidental, by gender and race.

4) Find out whether gun death rates correlate with location and education.