# Analyzing Gun Deaths in the United States (2012-2014)

This is the beginning of a project from Dataquest, in which I will be analyzing gun deaths in the United States. The dataset is from FiveThirtyEight.

The data contains information on gun deaths in the United States from 2012 to 2014.

The gist of the project is looking at demographic patterns of the victims within this dataset.

In [1]:
import csv # package to open a csv in python

# This is just bringing in the data
# The with block closes the file once we are done reading it
with open("guns.csv", "r") as f:
    # Turn it into a list of lists
    gun_data = list(csv.reader(f))

# Printing out the first five rows to ensure they are there
print(gun_data[:5])

[['', 'year', 'month', 'intent', 'police', 'sex', 'age', 'race', 'hispanic', 'place', 'education'], ['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'], ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'], ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4'], ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4']]


## Removing the header from the data

This is stuff I would do in the background in the future, but I'm just playing around with it now so it's all out in the open.

Given that the first row is a header row, we want to pull it out on its own so it doesn't get factored into any of our calculations.

In [2]:
# Extracting that row and calling it "headers"
headers = gun_data[0]
# Removing that row from the guns dataset
gun_data = gun_data[1:]

# Again, just displaying the header and the first five rows
# of gun data to make sure everything is right
print(headers)
print(gun_data[:5])

['', 'year', 'month', 'intent', 'police', 'sex', 'age', 'race', 'hispanic', 'place', 'education']
[['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'], ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'], ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4'], ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4'], ['5', '2012', '02', 'Suicide', '0', 'M', '31', 'White', '100', 'Other specified', '2']]


## Step 1: Number of Gun Deaths per Year

The first step in this analysis is just exploring how many gun deaths happened in each year. So I will get those counts, and display gun deaths by year.

In [3]:
# Extracting the year column from gun_data
# I made an empty list called years, and then for each
# row in the gun data set, append the year into years
years = []
for row in gun_data:
    years.append(row[1])
    
# Now we will turn this into a dictionary, where year is
# the key and the value is the number of deaths
year_counts = {}
# Populate the dictionary!
for year in years:
    if year not in year_counts:
        year_counts[year] = 1
    else:
        year_counts[year] += 1
        
# Now just printing year counts to see the number of deaths
# per year
print(year_counts)

{'2014': 33599, '2013': 33636, '2012': 33563}
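
The counting loop above can also be written with `collections.Counter` from the standard library. Here is a minimal sketch of that idea. It uses a tiny made-up `sample_rows` list as a stand-in for `gun_data`, so the names and rows are hypothetical and only there for illustration:

```python
from collections import Counter

# Hypothetical stand-in for gun_data: index 1 is the year column,
# just like in the real rows
sample_rows = [
    ['1', '2012', '01'],
    ['2', '2012', '01'],
    ['3', '2013', '02'],
    ['4', '2014', '03'],
    ['5', '2014', '03'],
]

# Counter tallies how often each distinct year appears
sample_year_counts = Counter(row[1] for row in sample_rows)
print(dict(sample_year_counts))
```

A `Counter` behaves like a dictionary, so the rest of the analysis could index into it the same way as `year_counts`.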


## Step 2: Do Gun Deaths Change by Month?

Here I'm using some of the techniques I recently learned for working with dates, creating datetime.datetime objects to analyze the gun deaths in a more granular way.

For instance, I can explore the number of gun deaths by specific month, using the month column from the dataset.

In [4]:

# First step is to create the datetime objects
# This should convert the years and months into integers,
# and then datetime objects
import datetime

dates = []
for row in gun_data:
    dates.append(datetime.datetime(year=int(row[1]), month=int(row[2]), day=1))

print(dates[:10])

[datetime.datetime(2012, 1, 1, 0, 0), datetime.datetime(2012, 1, 1, 0, 0), datetime.datetime(2012, 1, 1, 0, 0), datetime.datetime(2012, 2, 1, 0, 0), datetime.datetime(2012, 2, 1, 0, 0), datetime.datetime(2012, 2, 1, 0, 0), datetime.datetime(2012, 2, 1, 0, 0), datetime.datetime(2012, 3, 1, 0, 0), datetime.datetime(2012, 2, 1, 0, 0), datetime.datetime(2012, 2, 1, 0, 0)]


### How many gun deaths by month?

Exactly as the title implies: How many gun deaths occurred in the United States by month and year?

In [5]:
# Counting how many times each unique date occurs
# Same structure as my year counts from above

date_counts = {}
for date in dates:
    if date not in date_counts:
        date_counts[date] = 1
    else:
        date_counts[date] += 1
    
print(date_counts)

{datetime.datetime(2013, 8, 1, 0, 0): 2859, datetime.datetime(2014, 3, 1, 0, 0): 2684, datetime.datetime(2013, 4, 1, 0, 0): 2798, datetime.datetime(2014, 10, 1, 0, 0): 2865, datetime.datetime(2013, 10, 1, 0, 0): 2808, datetime.datetime(2013, 3, 1, 0, 0): 2862, datetime.datetime(2014, 12, 1, 0, 0): 2857, datetime.datetime(2014, 2, 1, 0, 0): 2361, datetime.datetime(2013, 9, 1, 0, 0): 2742, datetime.datetime(2013, 6, 1, 0, 0): 2920, datetime.datetime(2012, 3, 1, 0, 0): 2743, datetime.datetime(2013, 5, 1, 0, 0): 2806, datetime.datetime(2012, 12, 1, 0, 0): 2791, datetime.datetime(2014, 5, 1, 0, 0): 2864, datetime.datetime(2012, 2, 1, 0, 0): 2357, datetime.datetime(2014, 7, 1, 0, 0): 2884, datetime.datetime(2013, 11, 1, 0, 0): 2758, datetime.datetime(2014, 4, 1, 0, 0): 2862, datetime.datetime(2013, 7, 1, 0, 0): 3079, datetime.datetime(2012, 1, 1, 0, 0): 2758, datetime.datetime(2013, 12, 1, 0, 0): 2765, datetime.datetime(2012, 11, 1, 0, 0): 2729, datetime.datetime(2012, 7, 1, 0, 0): 3026, dat
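
The dictionary prints in no particular order, which makes it hard to read as a timeline. Since datetime objects compare chronologically, one option is to sort the keys before printing. A small sketch, using four of the monthly counts copied from the output above (`some_date_counts` is a name I made up for this example):

```python
import datetime

# Four of the monthly counts from the printed date_counts output
some_date_counts = {
    datetime.datetime(2013, 8, 1): 2859,
    datetime.datetime(2012, 2, 1): 2357,
    datetime.datetime(2014, 3, 1): 2684,
    datetime.datetime(2012, 1, 1): 2758,
}

# sorted() on a dict iterates its keys; datetimes sort oldest to newest
ordered_months = sorted(some_date_counts)
for month in ordered_months:
    print(month.strftime('%Y-%m'), some_date_counts[month])
```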

## Sex and Race

Now I am going to analyze the gun death data by sex and race. Specifically, I want to know how many victims there were of each reported sex and each reported race.

In [12]:
# Just counting the number of times each item appears in:
# 'sex'
sexes = [row[5] for row in gun_data]
    
sex_counts = {}
for sex in sexes:
    if sex not in sex_counts:
        sex_counts[sex] = 1
    else:
        sex_counts[sex] += 1
print(sex_counts) # Printing again just to check

# 'race'
# Calling the list 'races' so the loop variable doesn't overwrite it
races = [row[7] for row in gun_data]

race_counts = {}
for race in races:
    if race not in race_counts:
        race_counts[race] = 1
    else:
        race_counts[race] += 1
print(race_counts)

{'M': 86349, 'F': 14449}
{'Native American/Native Alaskan': 917, 'Black': 23296, 'White': 66237, 'Asian/Pacific Islander': 1326, 'Hispanic': 9022}


That first bit shows us the number of male and female victims, as well as the number of victims by race.

### Sex

What I can see in these data is that approximately 6 times as many men as women were killed by gun.

### Race

Race differences also present a pretty striking pattern. Most of the victims were white (66,237), followed by black (23,296) and then hispanic (9,022). That is a pretty stark contrast.

One other interesting thing to explore would be the interaction between the two: how many of these gun deaths were black men versus white men? Black women versus white women?

In [7]:
# Let's get us some census data
with open("census.csv", "r") as f:
    # Turn it into a list of lists
    census = list(csv.reader(f))

# Printing the whole thing: it's just a header row and one data row
print(census)

[['Id', 'Year', 'Id', 'Sex', 'Id', 'Hispanic Origin', 'Id', 'Id2', 'Geography', 'Total', 'Race Alone - White', 'Race Alone - Hispanic', 'Race Alone - Black or African American', 'Race Alone - American Indian and Alaska Native', 'Race Alone - Asian', 'Race Alone - Native Hawaiian and Other Pacific Islander', 'Two or More Races'], ['cen42010', 'April 1, 2010 Census', 'totsex', 'Both Sexes', 'tothisp', 'Total', '0100000US', '', 'United States', '308745538', '197318956', '44618105', '40250635', '3739506', '15159516', '674625', '6984195']]


## Gun Deaths by race, per 100,000 People

The census data that I read in above gives me the number of people who belong to each racial group in the United States.

In [8]:
# Manually create a mapping dictionary to allow the two 
# datasets to talk
mapping = {
    'Asian/Pacific Islander': 15159516 + 674625,
    'Native American/Native Alaskan': 3739506,
    'Black': 40250635,
    'Hispanic': 44618105,
    'White': 197318956
}

print(mapping)
print(race_counts)

# Empty dictionary for our per 100,000 counts
race_per_hundredk = {}
for race in race_counts:
    race_per_hundredk[race] = (race_counts[race] / float(mapping[race])) * 100000
print(race_per_hundredk)

{'Native American/Native Alaskan': 3739506, 'Black': 40250635, 'White': 197318956, 'Asian/Pacific Islander': 15834141, 'Hispanic': 44618105}
{'Native American/Native Alaskan': 917, 'Black': 23296, 'White': 66237, 'Asian/Pacific Islander': 1326, 'Hispanic': 9022}
{'Native American/Native Alaskan': 24.521955573811088, 'Asian/Pacific Islander': 8.374309664161762, 'White': 33.56849303419181, 'Black': 57.8773477735196, 'Hispanic': 20.220491210910907}
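
I typed the population numbers into `mapping` by hand. As a sketch of an alternative, the same numbers could be pulled out of the census rows by column name. The header and values below are copied from the census output above (only the columns used here); `columns_for_race` and `mapping_from_census` are names I made up for this example. With the full data, `census[0]` and `census[1]` would take the place of the two lists.

```python
# Subset of the census header row and data row shown in the output above
census_header = [
    'Total',
    'Race Alone - White',
    'Race Alone - Hispanic',
    'Race Alone - Black or African American',
    'Race Alone - American Indian and Alaska Native',
    'Race Alone - Asian',
    'Race Alone - Native Hawaiian and Other Pacific Islander',
]
census_row = ['308745538', '197318956', '44618105', '40250635',
              '3739506', '15159516', '674625']

# Which census columns add up to each race label used in the gun data
columns_for_race = {
    'Asian/Pacific Islander': ['Race Alone - Asian',
                               'Race Alone - Native Hawaiian and Other Pacific Islander'],
    'Native American/Native Alaskan': ['Race Alone - American Indian and Alaska Native'],
    'Black': ['Race Alone - Black or African American'],
    'Hispanic': ['Race Alone - Hispanic'],
    'White': ['Race Alone - White'],
}

# Look up each column by name and convert the strings to integers
population = {name: int(value) for name, value in zip(census_header, census_row)}
mapping_from_census = {
    race: sum(population[col] for col in cols)
    for race, cols in columns_for_race.items()
}
print(mapping_from_census)
```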


### Summary of the Race by 100,000 people data

Now, we've learned a lot more by looking per capita. In absolute numbers, far more white people are killed by gun than black or hispanic people. However, there are nearly 200 million white people in the United States, compared with approximately 40 million black people.

When we look at gun deaths per 100,000, black people have far and away the highest rate, with approximately 58 deaths per 100,000, compared with approximately 34 per 100,000 for white people.
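
One caveat when reading these rates: the death counts cover all three years (2012 to 2014), so the figures are deaths per 100,000 over the whole period, not per year. Below is a rough sketch of the arithmetic, with an approximate per-year figure obtained by dividing by three. That last step assumes the 2010 census population is a reasonable denominator for each of the three years, which is a simplification. The helper `rate_per_100k` is a name made up for this example.

```python
def rate_per_100k(deaths, population):
    """Deaths per 100,000 people for the given population."""
    return deaths / population * 100000

# Counts and populations copied from race_counts and mapping above
black_rate = rate_per_100k(23296, 40250635)
white_rate = rate_per_100k(66237, 197318956)
print(round(black_rate, 1), round(white_rate, 1))          # 57.9 33.6

# Approximate yearly rates: the counts span three years of data
print(round(black_rate / 3, 1), round(white_rate / 3, 1))  # 19.3 11.2
```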

Lastly, let's take a look at intent.

## Gun deaths by Homicide intent

Now, I am going to filter out all of the results that are not death by homicide. So, we are only going to look at homicide deaths.

In [9]:
# First we pull out the column of intents so we can work 
# with it
intents = []
for row in gun_data:
    intents.append(row[3])
    
# Now the same with races
races = []
for row in gun_data:
    races.append(row[7])
    
# Empty dictionary for the homicide by races
homicide_race_counts = {}
for idx, race in enumerate(races):
    if intents[idx] == 'Homicide' and race not in homicide_race_counts:
        homicide_race_counts[race] = 0
    if intents[idx] == 'Homicide' and race in homicide_race_counts:
        homicide_race_counts[race] += 1

# Just printing again to make sure that it worked
print(homicide_race_counts)

{'Native American/Native Alaskan': 326, 'Black': 19510, 'White': 9147, 'Asian/Pacific Islander': 559, 'Hispanic': 5634}
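
The loop above keeps the two lists in step by index. A possible alternative is to walk through both lists together with `zip` and use `dict.get` to handle the first time a race shows up. This is only a sketch on two tiny made-up lists (`sample_intents` and `sample_races` are hypothetical, not the real data):

```python
# Hypothetical mini versions of the intents and races lists
sample_intents = ['Suicide', 'Homicide', 'Homicide', 'Suicide', 'Homicide']
sample_races = ['White', 'Black', 'White', 'White', 'Black']

sample_homicide_counts = {}
# zip pairs up the intent and race that belong to the same row
for intent, race in zip(sample_intents, sample_races):
    if intent == 'Homicide':
        # get() returns 0 the first time we see a race
        sample_homicide_counts[race] = sample_homicide_counts.get(race, 0) + 1

print(sample_homicide_counts)
```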


Again, these counts are not the most meaningful without scaling them to be per 100,000, so I'm doing that now.

In [10]:
# Empty dictionary to store the results
homicides_per_hundredk = {}

for race in homicide_race_counts:
    homicides_per_hundredk[race] = (homicide_race_counts[race]/float(mapping[race]))*100000
print(homicides_per_hundredk)

{'Native American/Native Alaskan': 8.717729026240365, 'Asian/Pacific Islander': 3.530346230970155, 'White': 4.6356417981453335, 'Black': 48.471284987180944, 'Hispanic': 12.627161104219914}


# Summary and Next Steps

I went through the gun data from FiveThirtyEight, analyzing the number of (a) total and (b) homicide gun deaths by race (per 100,000 people).

In both cases, Black people had far and away the highest gun death rate, at approximately 58 deaths per 100,000 in total and 48 homicides per 100,000.

Next I would like to look at the interaction between race and gender, suicides by gender, gun deaths at home by gender, gun deaths by education level, and gun deaths by age.

# Gun Deaths by Race and Gender

Ideally, I want to get the rates for men and women of each race.

In [13]:
# Compute an interaction between the two
# I already have lists of sexes and races from previous analyses
# (called 'sexes' and 'races'), so I'm going to walk through them
# together with enumerate and store the counts in a new dict

# Manually created dictionary for the counts by sex and race
# I want to specify the keys here
sex_race_counts = {
    'Asian/Pacific Islander - Men': 0,
    'Asian/Pacific Islander - Women': 0,
    'Native American/Native Alaskan - Men': 0,
    'Native American/Native Alaskan - Women': 0,
    'Black - Men': 0,
    'Black - Women': 0,
    'Hispanic - Men': 0,
    'Hispanic - Women': 0,
    'White - Men': 0,
    'White - Women': 0
}

# Populating the dictionary is going to have to be a whole series of if statements.
# For example, if white and men, up that count by 1. If hispanic and women, += 1
for idx, sex in enumerate(sexes):
    if sexes[idx] == 'M' and races[idx] == 'Asian/Pacific Islander':
        sex_race_counts['Asian/Pacific Islander - Men'] += 1
    if sexes[idx] == 'F' and races[idx] == 'Asian/Pacific Islander':
        sex_race_counts['Asian/Pacific Islander - Women'] += 1
    if sexes[idx] == 'M' and races[idx] == 'Native American/Native Alaskan':
        sex_race_counts['Native American/Native Alaskan - Men'] += 1
    if sexes[idx] == 'F' and races[idx] == 'Native American/Native Alaskan':
        sex_race_counts['Native American/Native Alaskan - Women'] += 1
    if sexes[idx] == 'M' and races[idx] == 'Black':
        sex_race_counts['Black - Men'] += 1
    if sexes[idx] == 'F' and races[idx] == 'Black':
        sex_race_counts['Black - Women'] += 1
    if sexes[idx] == 'M' and races[idx] == 'Hispanic':
        sex_race_counts['Hispanic - Men'] += 1
    if sexes[idx] == 'F' and races[idx] == 'Hispanic':
        sex_race_counts['Hispanic - Women'] += 1
    if sexes[idx] == 'M' and races[idx] == 'White':
        sex_race_counts['White - Men'] += 1
    if sexes[idx] == 'F' and races[idx] == 'White':
        sex_race_counts['White - Women'] += 1

# Just printing again to make sure that it worked
print(sex_race_counts)

{'Hispanic - Men': 7949, 'Native American/Native Alaskan - Women': 127, 'Native American/Native Alaskan - Men': 790, 'White - Men': 55550, 'Asian/Pacific Islander - Women': 244, 'Black - Women': 2318, 'Asian/Pacific Islander - Men': 1082, 'White - Women': 10687, 'Hispanic - Women': 1073, 'Black - Men': 20978}


Okay, the code above works to calculate what I want! It's just not the most elegant. The for loop has to have about a million different statements. Maybe I will try to clean it up in the future. But for now I'll stick with it.
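
One way the clean-up might look is to count (race, sex) pairs directly instead of writing one if statement per combination. This is just a sketch on made-up lists (`sample_sexes` and `sample_races` are hypothetical stand-ins for `sexes` and `races`), using `collections.Counter` from the standard library:

```python
from collections import Counter

# Hypothetical mini versions of the sexes and races lists
sample_sexes = ['M', 'F', 'M', 'M']
sample_races = ['White', 'White', 'Black', 'White']

# Count each (race, sex) pair in a single pass
pair_counts = Counter(zip(sample_races, sample_sexes))

# Rebuild the 'Race - Men' / 'Race - Women' style keys used above
labels = {'M': 'Men', 'F': 'Women'}
sample_sex_race_counts = {
    '{} - {}'.format(race, labels[sex]): count
    for (race, sex), count in pair_counts.items()
}
print(sample_sex_race_counts)
```

One difference from the hand-built dictionary: combinations that never occur simply don't appear here, rather than showing up with a count of 0.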

Without calculating per 100,000 people (yet), it's pretty clear that men are victims of gun violence drastically more frequently than women, regardless of race.

# Suicide by Gun in Men and Women

You often hear talk about men committing suicide far more frequently than women. We can't completely answer this question with these data, because they only cover gun deaths, but we can answer it in the context of guns.

Do men commit more suicide by gun than women?

In [14]:
# For this we can reuse the list we already created up above: intents.
# But we need a list populated with genders
sexes = []
for row in gun_data:
    sexes.append(row[5])
    
# Print a couple rows just to verify we have sexes
print(sexes[:10])
    
suicide_sex_counts = {}
for idx, sex in enumerate(sexes):
    if intents[idx] == 'Suicide' and sex not in suicide_sex_counts:
        suicide_sex_counts[sex] = 0
    if intents[idx] == 'Suicide' and sex in suicide_sex_counts:
        suicide_sex_counts[sex] += 1

# Just printing again to make sure that it worked
print(suicide_sex_counts)


['M', 'F', 'M', 'M', 'M', 'M', 'M', 'M', 'M', 'M']
{'M': 54486, 'F': 8689}


Okay, so it seems there are far more men who commit suicide by gun, in total. Here I see no need to do any sort of weighting by population: given the most basic of probabilities, the United States should be roughly 50/50 men and women.

So these data show that men commit suicide by gun approximately 6.3 times more often than women.
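
For reference, the arithmetic behind that figure is just the ratio of the two counts. A quick sketch using the numbers printed above (`suicide_ratio` and `male_share` are names I made up here):

```python
# Counts copied from the printed suicide_sex_counts output
suicide_sex_counts = {'M': 54486, 'F': 8689}

# How many male gun suicides there are for every female gun suicide
suicide_ratio = suicide_sex_counts['M'] / suicide_sex_counts['F']
print(round(suicide_ratio, 1))  # 6.3

# The same comparison expressed as the male share of all gun suicides
male_share = suicide_sex_counts['M'] / sum(suicide_sex_counts.values())
print(round(male_share, 3))     # 0.862
```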

# Gun Deaths by Education Level

Within this data set, there is a variable that notes the education level of the victim. This variable has five levels:

1 - Less than high school
2 - Graduated from high school or equivalent
3 - Some college
4 - At least graduated from college
5 - Not Available

Here, I want to remove the people we don't have education data for, because they just muddy the waters. Let's remove them and just look at counts again: gun deaths by education level.

In [33]:
# Make a quick list of educations, getting rid of the NAs and the 5s
education = []

for row in gun_data:
    if row[10] != '5' and row[10] != 'NA':
        education.append(row[10])

# Make our empty dictionary in which we'll store this information
educ_counts = {}

# Populate the dictionary!
for level in education:
    if level not in educ_counts:
        educ_counts[level] = 1
    else:
        educ_counts[level] += 1
        
print(educ_counts)

{'1': 21823, '3': 21680, '2': 42927, '4': 12946}


As I suspected, as education goes up, the number of deaths by gun goes down (kind of). Really, it just seems like there are fewer gun deaths for college grads. Interestingly, the most deaths by gun were in the 'high school grad' category.

I suspect this is because the proportion of the population that hasn't graduated high school is a lot lower than the proportion of the population that has. So I suspect that if we had data to make a set of anchors here (e.g., to calculate deaths per 100,000), we would see a linear(ish) decrease.

As such, these numbers are generally kind of impossible to interpret because I don't know anything about the population (e.g., it's likely that college grads are a small population, just like people who haven't graduated high school).
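
One thing that can be done with the counts alone is to turn them into shares of the gun deaths with known education. To be clear, this does not fix the missing population data: these are shares of deaths, not rates per 100,000. A small sketch using the counts printed above (`educ_labels` and `educ_shares` are names I made up for this example):

```python
# Counts copied from the printed educ_counts output
educ_counts = {'1': 21823, '3': 21680, '2': 42927, '4': 12946}

# Labels taken from the list of education levels above
educ_labels = {
    '1': 'Less than high school',
    '2': 'Graduated from high school or equivalent',
    '3': 'Some college',
    '4': 'At least graduated from college',
}

total_known = sum(educ_counts.values())

# Percentage of the deaths (with known education) at each level
educ_shares = {
    level: educ_counts[level] / total_known * 100
    for level in educ_counts
}

for level in sorted(educ_shares):
    print(educ_labels[level], round(educ_shares[level], 1))
```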

# Summary!

That's all for now! 

I've analyzed the demographic data in a handful of different ways. We've seen that African Americans are the most likely to be killed by gun, both generally and via homicide. We've also seen that men commit suicide by gun approximately 6 times more often than women do. Lastly, we also gleaned a little bit of info about gun deaths by education level, but it's hard to make any sense of those data without general population rates.

Maybe I will revisit this data set once I get a little bit more comfortable with data visualizations. Then I can do something a little bit more exciting with this notebook. Right now, the insights are interesting but there are no useful and engaging visualizations. Stay tuned!