# American Gun Deaths

In [1]:
import csv

# Read the whole file, then split the header row from the data rows
with open("full_data.csv", "r") as f:
    data = list(csv.reader(f))
headers = data[0]
data = data[1:]

First I wanted to get a general sense of the data as a whole.  In the code below, I am determining the number of gun deaths for each year between 2012 and 2014:

In [2]:
years = []
for row in data:
    years.append(row[1])
    
year_counts = {}
for year in years:
    if year not in year_counts:
        year_counts[year] = 1
    else:
        year_counts[year] += 1

year_counts

{'2012': 33563, '2013': 33636, '2014': 33599}

Gun deaths barely changed from year to year in this range.  The highest and lowest annual totals differ by only 73 deaths, or about 0.2%, so at this level of granularity the deaths are spread almost evenly across the three years.
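The tallying pattern used for the yearly counts recurs throughout this notebook, so here is a minimal sketch of the same idea using `collections.Counter`. The helper name `count_column` and the three sample rows are made up for illustration; only the column positions (year in column 1, sex in column 5) follow the cells above.

```python
from collections import Counter

def count_column(rows, index):
    """Tally how often each value appears in one column of a list of rows."""
    return Counter(row[index] for row in rows)

# Hypothetical three-row sample laid out like the first columns of full_data.csv
sample_rows = [
    ["1", "2012", "01", "Suicide", "0", "M"],
    ["2", "2012", "01", "Homicide", "0", "F"],
    ["3", "2013", "02", "Suicide", "0", "M"],
]

year_counts_sample = count_column(sample_rows, 1)
print(dict(year_counts_sample))
```

On the real data, `count_column(data, 1)` should give the same tallies as the loop in the cell above.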

The next item I chose to investigate was biological sex:

In [3]:
sex_counts = {}
for row in data:
    sex = row[5]
    if sex not in sex_counts:
        sex_counts[sex] = 1
    else:
        sex_counts[sex] += 1
sex_counts

{'F': 14449, 'M': 86349}

Only about 17% as many women and girls as men and boys were killed by guns during this timeframe.



In [4]:
race_counts = {}
for row in data:
    race = row[7]
    if race not in race_counts:
        race_counts[race] = 1
    else:
        race_counts[race] += 1
race_counts

{'Asian/Pacific Islander': 1326,
 'Black': 23296,
 'Hispanic': 9022,
 'Native American/Native Alaskan': 917,
 'White': 66237}

The largest racial group in the gun deaths dataset is White (66,237), followed by Black (23,296).  Hispanic is a distant third, at 9,022 deaths.  This may in part reflect the overall ethnic makeup of the United States, so next I calculated deaths per 100,000 people in each group, which makes groups of very different sizes comparable.  Note that the rates below cover all three years combined, not a single year.

To do this, I added total US census data from the 2010 census.

In [5]:
with open("census.csv", "r") as f:
    census = list(csv.reader(f))

In [6]:
total_pop_by_race = {
    # csv.reader returns strings, so convert to int before adding;
    # otherwise + joins the digits of the two columns.
    "Asian/Pacific Islander": int(census[1][14]) + int(census[1][15]),
    "Black": int(census[1][12]),
    "Native American/Native Alaskan": int(census[1][13]),
    "Hispanic": int(census[1][11]),
    "White": int(census[1][10])
}
total_pop_by_race

{'Asian/Pacific Islander': 15834141,
 'Black': 40250635,
 'Hispanic': 44618105,
 'Native American/Native Alaskan': 3739506,
 'White': 197318956}
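Every field that `csv.reader` returns is a string, so combining two population columns needs an explicit conversion; with strings, `+` joins the digits instead of summing them. A quick sketch of the difference follows. The two input values are an assumption: they are my reading of how the Asian and Pacific Islander figures split, not values copied from census.csv.

```python
# Assumed example values, in the string form csv.reader would return
asian = "15159516"
pacific_islander = "674625"

concatenated = asian + pacific_islander        # string concatenation
summed = int(asian) + int(pacific_islander)    # numeric addition

print(concatenated)
print(summed)
```

A population in the trillions is a good sign that the first form was used by mistake.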

In [34]:
deaths_by_race_per_hundredk = {}
for race, count in (race_counts.items()):    
    deaths_by_race_per_hundredk[race] = float(count)*100000/float(total_pop_by_race[race])

print("Gun Deaths by Race Per 100,000:")
sorted(deaths_by_race_per_hundredk.items(), key=lambda x: x[1], reverse=True)
                            

Gun Deaths by Race Per 100,000:


[('Black', 57.877347773519595),
 ('White', 33.56849303419181),
 ('Native American/Native Alaskan', 24.521955573811088),
 ('Hispanic', 20.220491210910907),
 ('Asian/Pacific Islander', 8.374309664161762)]
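The same per-100,000 calculation is repeated for every breakdown that follows, so it may help to see it as one small function. This is only a sketch: the name `per_hundred_thousand` is hypothetical, and the two-group example reuses counts and populations printed in the cells above.

```python
def per_hundred_thousand(counts, populations):
    """Return {group: deaths per 100,000 people} for every group in counts."""
    rates = {}
    for group, count in counts.items():
        rates[group] = float(count) * 100000 / float(populations[group])
    return rates

# Counts and populations copied from the cells above (two groups only)
example_counts = {"Black": 23296, "White": 66237}
example_pops = {"Black": 40250635, "White": 197318956}

example_rates = per_hundred_thousand(example_counts, example_pops)
for group, rate in sorted(example_rates.items(), key=lambda x: x[1], reverse=True):
    print("{} - {:.2f} per 100,000".format(group, rate))
```

Because the deaths span 2012 through 2014, these are three-year rates; dividing by 3 would give an approximate annual figure.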

At this point I became curious as to how many of these deaths were ruled Suicide, Homicide, and Accidental:

In [35]:
intents = []
for row in data:
    intents.append(row[3])

In [36]:
races = []
for row in data:
    races.append(row[7])

In [37]:
homicide_race_counts = {}
for i, race in enumerate(races):
    if intents[i] == 'Homicide':
        if race not in homicide_race_counts:
            homicide_race_counts[race] = 0
        homicide_race_counts[race] += 1
sorted(homicide_race_counts.items(), key=lambda x: x[1], reverse=True)

[('Black', 19510),
 ('White', 9147),
 ('Hispanic', 5634),
 ('Asian/Pacific Islander', 559),
 ('Native American/Native Alaskan', 326)]

## Some Juicy Findings Regarding Total Gun Deaths vs Homicides by Race

In [38]:
homicide_race_counts_per_hundredk = {}
for key, value in (homicide_race_counts.items()):    
    homicide_race_counts_per_hundredk[key] = float(value)*100000/float(total_pop_by_race[key])
sorted_homicides = sorted(homicide_race_counts_per_hundredk.items(), key=lambda x: x[1], reverse=True)

print("Homicides per 100,000 By Race:")
print('')
for race, count in sorted_homicides:
    print("{} - {} homicides per 100,000".format(race, count))
    print('')

Homicides per 100,000 By Race:

Black - 48.4712849872 homicides per 100,000

Hispanic - 12.6271611042 homicides per 100,000

Native American/Native Alaskan - 8.71772902624 homicides per 100,000

White - 4.63564179815 homicides per 100,000

Asian/Pacific Islander - 3.53034623097 homicides per 100,000



I was super interested to find that an overwhelming majority of Black gun deaths were homicides during this period (84%), and that a majority of Hispanic gun deaths were, as well (62%).  Conversely, only 14% of White gun deaths were homicides.  For Native American/Native Alaskan and Asian/Pacific Islander gun deaths, the homicide shares were 36% and 42%, respectively.
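The homicide shares can be checked directly from the two sets of counts printed earlier, without going through the per-100,000 rates. A minimal sketch, with both dictionaries copied from the outputs above and `homicide_share` as a name introduced here:

```python
# Total gun deaths and homicides by race, copied from the outputs above
total_deaths = {
    "Asian/Pacific Islander": 1326,
    "Black": 23296,
    "Hispanic": 9022,
    "Native American/Native Alaskan": 917,
    "White": 66237,
}
homicides = {
    "Asian/Pacific Islander": 559,
    "Black": 19510,
    "Hispanic": 5634,
    "Native American/Native Alaskan": 326,
    "White": 9147,
}

# Percentage of each group's gun deaths that were ruled homicides
homicide_share = {
    race: 100.0 * homicides[race] / total_deaths[race] for race in total_deaths
}

for race, share in sorted(homicide_share.items(), key=lambda x: x[1], reverse=True):
    print("{} - {:.1f}% of gun deaths were homicides".format(race, share))
```

Working from counts avoids any dependence on the population denominators.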

This surprised me a lot, so I wanted to see how the data for suicides matched up:

In [30]:
suicide_race_counts = {}
for i, race in enumerate(races):
    if intents[i] == 'Suicide':
        if race not in suicide_race_counts:
            suicide_race_counts[race] = 0
        suicide_race_counts[race] += 1

## Share of White Gun Deaths That Are Suicides Matches Share of Black Gun Deaths That Are Homicides

In [32]:
suicide_race_counts_per_hundredk = {}
for key, value in (suicide_race_counts.items()):    
    suicide_race_counts_per_hundredk[key] = float(value)*100000/float(total_pop_by_race[key])
sorted_suicides = sorted(suicide_race_counts_per_hundredk.items(), key=lambda x: x[1], reverse=True)

print("Suicides per 100,000 By Race:")
print('')
for race, count in sorted_suicides:
    print("{} - {} Suicides per 100,000".format(race, count))
    print('')

Suicides per 100,000 By Race:

White - 28.0621796925 Suicides per 100,000

Native American/Native Alaskan - 14.8415325447 Suicides per 100,000

Black - 8.27813027049 Suicides per 100,000

Hispanic - 7.1069804511 Suicides per 100,000

Asian/Pacific Islander - 4.70502315219 Suicides per 100,000



Interestingly, 84% of White gun deaths were suicides - the same proportion of Black gun deaths that were homicides.  60% of Native American/Native Alaskan gun deaths were also ruled suicide.  Only 14% of Black gun deaths were suicides.  Basically, the homicide:suicide ratio for Black and White Americans is a mirror image.  That's a pretty stark finding.
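The homicide, suicide, and accident cells all filter on one intent and then tally by race, so the pattern can be captured once. Below is a sketch under that assumption; `count_by_race` and the sample rows are hypothetical, and only the column positions (intent in column 3, race in column 7) come from the notebook.

```python
def count_by_race(rows, intent, intent_col=3, race_col=7):
    """Count rows per race whose intent column equals `intent`."""
    counts = {}
    for row in rows:
        if row[intent_col] == intent:
            race = row[race_col]
            counts[race] = counts.get(race, 0) + 1
    return counts

# Hypothetical rows; only columns 3 (intent) and 7 (race) matter here
sample_rows = [
    ["1", "2012", "01", "Suicide", "0", "M", "34", "White"],
    ["2", "2012", "01", "Homicide", "0", "M", "21", "Black"],
    ["3", "2013", "02", "Suicide", "0", "F", "60", "White"],
    ["4", "2013", "03", "Accidental", "0", "M", "45", "Hispanic"],
]

suicides_sample = count_by_race(sample_rows, "Suicide")
homicides_sample = count_by_race(sample_rows, "Homicide")
print(suicides_sample)
print(homicides_sample)
```

Reading the rows directly also removes the need for the parallel `intents` and `races` lists.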

It would appear that Accidental deaths are going to be a small part of this overall data, but I wanted to follow through and check up on this:

## Accidental Gun Deaths Roughly Twice as Common for Black, White, and Native Americans as Hispanic Americans

In [43]:
accidental_race_counts = {}
for i, race in enumerate(races):
    if intents[i] == 'Accidental':
        if race not in accidental_race_counts:
            accidental_race_counts[race] = 0
        accidental_race_counts[race] += 1
sorted(accidental_race_counts.items(), key=lambda x: x[1], reverse=True)

[('White', 1132),
 ('Black', 328),
 ('Hispanic', 145),
 ('Native American/Native Alaskan', 22),
 ('Asian/Pacific Islander', 12)]

In [44]:
accidental_race_counts_per_hundredk = {}
for key, value in (accidental_race_counts.items()):    
    accidental_race_counts_per_hundredk[key] = float(value)*100000/float(total_pop_by_race[key])
sorted_accidents = sorted(accidental_race_counts_per_hundredk.items(), key=lambda x: x[1], reverse=True)

print("Accidents per 100,000 By Race:")
print('')
for race, count in sorted_accidents:
    print("{} - {} Accidents per 100,000".format(race, count))
    print('')

Accidents per 100,000 By Race:

Black - 0.814893976207 Accidents per 100,000

Native American/Native Alaskan - 0.588313001771 Accidents per 100,000

White - 0.573690446649 Accidents per 100,000

Hispanic - 0.324980184613 Accidents per 100,000

Asian/Pacific Islander - 0.0757856078205 Accidents per 100,000



As it turns out, Black Americans are also more than twice as likely to die from a gun accident as Hispanic Americans - even though Hispanic Americans are second most likely to be homicide victims.  I have no guesses currently as to why that might be, but maybe other factors like level of education or geography are coming into play.  

Native American/Native Alaskan and White American populations are also nearly twice as likely (about 1.8 times) as Hispanic Americans to die as a result of a firearms-related accident.  The Asian/Pacific Islander population has by far the lowest rate of accidental gun death of any group.

## Education and Gun Deaths in the United States

In [45]:
education = []
for row in data:
    education.append(row[10])

In [50]:
education_counts = {}
for edu_level in education:
    if edu_level not in education_counts:
        education_counts[edu_level] = 1
    else:
        education_counts[edu_level] += 1
print("Gun Deaths by Education Level:")
sorted(education_counts.items(), key=lambda x: x[1], reverse=True)

Gun Deaths by Education Level:


[('HS/GED', 42927),
 ('Less than HS', 21823),
 ('Some college', 21680),
 ('BA+', 12946),
 ('NA', 1422)]

In [53]:
hs_deaths_by_race = {}
for row in data:
    ed = row[10]
    race = row[7]
    if ed == 'HS/GED':
        if race not in hs_deaths_by_race:
            hs_deaths_by_race[race] = 0
        hs_deaths_by_race[race] += 1
print("High School Educated Gun Deaths by Race:")
sorted(hs_deaths_by_race.items(), key=lambda x: x[1], reverse=True)

High School Educated Gun Deaths by Race:


[('White', 28266),
 ('Black', 10686),
 ('Hispanic', 3206),
 ('Native American/Native Alaskan', 392),
 ('Asian/Pacific Islander', 377)]

In [56]:
hs_counts_per_hundredk = {}
for key, value in (hs_deaths_by_race.items()):    
    hs_counts_per_hundredk[key] = float(value)*100000/float(total_pop_by_race[key])
sorted_hs = sorted(hs_counts_per_hundredk.items(), key=lambda x: x[1], reverse=True)

print("High School Educated Gun Deaths per 100,000 By Race:")
print('')
for race, count in sorted_hs:
    print("{} - {} Gun Deaths per 100,000".format(race, count))
    print('')

High School Educated Gun Deaths per 100,000 By Race:

Black - 26.5486494809 Gun Deaths per 100,000

White - 14.3250301811 Gun Deaths per 100,000

Native American/Native Alaskan - 10.4826680316 Gun Deaths per 100,000

Hispanic - 7.18542394393 Gun Deaths per 100,000

Asian/Pacific Islander - 2.38093117903 Gun Deaths per 100,000



In [57]:
no_hs_deaths_by_race = {}
for row in data:
    ed = row[10]
    race = row[7]
    if ed == 'Less than HS':
        if race not in no_hs_deaths_by_race:
            no_hs_deaths_by_race[race] = 0
        no_hs_deaths_by_race[race] += 1
print("Gun Deaths by Race - Individuals Without High School or GED:")
sorted(no_hs_deaths_by_race.items(), key=lambda x: x[1], reverse=True)

Gun Deaths by Race - Individuals Without High School or GED:


[('White', 9892),
 ('Black', 7572),
 ('Hispanic', 3897),
 ('Native American/Native Alaskan', 264),
 ('Asian/Pacific Islander', 198)]

In [68]:
no_hs_counts_per_hundredk = {}
for key, value in (no_hs_deaths_by_race.items()):    
    no_hs_counts_per_hundredk[key] = float(value)*100000/float(total_pop_by_race[key])
sorted_no_hs = sorted(no_hs_counts_per_hundredk.items(), key=lambda x: x[1], reverse=True)

print("Below High School Educated Gun Deaths per 100,000 By Race:")
print('')
for race, count in sorted_no_hs:
    print("{} - {} Gun Deaths per 100,000".format(race, count))
    print('')

Below High School Educated Gun Deaths per 100,000 By Race:

Black - 18.8121255727 Gun Deaths per 100,000

Hispanic - 8.73412261682 Gun Deaths per 100,000

Native American/Native Alaskan - 7.05975602125 Gun Deaths per 100,000

White - 5.01320309033 Gun Deaths per 100,000

Asian/Pacific Islander - 1.25046252904 Gun Deaths per 100,000



In [64]:
ba_deaths_by_race = {}
for row in data:
    ed = row[10]
    race = row[7]
    if ed == 'BA+':
        if race not in ba_deaths_by_race:
            ba_deaths_by_race[race] = 0
        ba_deaths_by_race[race] += 1
print("College Educated Gun Deaths by Race:")
sorted(ba_deaths_by_race.items(), key=lambda x: x[1], reverse=True)

College Educated Gun Deaths by Race:


[('White', 11317),
 ('Black', 820),
 ('Hispanic', 400),
 ('Asian/Pacific Islander', 369),
 ('Native American/Native Alaskan', 40)]

In [69]:
ba_counts_per_hundredk = {}
for key, value in (ba_deaths_by_race.items()):    
    ba_counts_per_hundredk[key] = float(value)*100000/float(total_pop_by_race[key])
sorted_ba = sorted(ba_counts_per_hundredk.items(), key=lambda x: x[1], reverse=True)

print("College Educated Gun Deaths per 100,000 By Race:")
print('')
for race, count in sorted_ba:
    print("{} - {} Gun Deaths per 100,000".format(race, count))
    print('')

College Educated Gun Deaths per 100,000 By Race:

White - 5.73538408545 Gun Deaths per 100,000

Asian/Pacific Islander - 2.33040744048 Gun Deaths per 100,000

Black - 2.03723494052 Gun Deaths per 100,000

Native American/Native Alaskan - 1.06966000322 Gun Deaths per 100,000

Hispanic - 0.896497061002 Gun Deaths per 100,000



In [66]:
some_college_deaths_by_race = {}
for row in data:
    ed = row[10]
    race = row[7]
    if ed == 'Some college':
        if race not in some_college_deaths_by_race:
            some_college_deaths_by_race[race] = 0
        some_college_deaths_by_race[race] += 1
print("Gun Deaths by Race - Individuals with Some College:")
sorted(some_college_deaths_by_race.items(), key=lambda x: x[1], reverse=True)

Gun Deaths by Race - Individuals with Some College:


[('White', 15803),
 ('Black', 3947),
 ('Hispanic', 1377),
 ('Asian/Pacific Islander', 346),
 ('Native American/Native Alaskan', 207)]

In [67]:
some_college_counts_per_hundredk = {}
for key, value in (some_college_deaths_by_race.items()):
    some_college_counts_per_hundredk[key] = float(value)*100000/float(total_pop_by_race[key])
sorted_some_college = sorted(some_college_counts_per_hundredk.items(), key=lambda x: x[1], reverse=True)

print("Gun Deaths per 100,000 by Race - Individuals with Some College:")
print('')
for race, count in sorted_some_college:
        print("{} - {} Gun Deaths per 100,000".format(race, count))
        print('')

Gun Deaths per 100,000 by Race - Individuals with Some College:

Black - 9.80605647588 Gun Deaths per 100,000

White - 8.00886053745 Gun Deaths per 100,000

Native American/Native Alaskan - 5.53549051666 Gun Deaths per 100,000

Hispanic - 3.0861911325 Gun Deaths per 100,000

Asian/Pacific Islander - 2.18515169216 Gun Deaths per 100,000



Across all educational levels except BA+, Black Americans have the highest gun death rate, including Black Americans with some college.  Among those with BA+, the rate for White Americans is almost 3x the rate for Black Americans.  Keep in mind that these rates divide by each group's total population, not by the number of people at that education level.  This led me to further questions about the prevalence of various education levels in the United States, both on the whole and broken down by race.  An obvious explanation for the preponderance of White, college-educated gun deaths is that more White Americans are likely to have completed college.

To test this, I obtained some additional data from the <a href="https://www.census.gov/data/tables/2016/demo/education-attainment/cps-detailed-tables.html">US Census Bureau</a> on educational attainment, taken from its 2016 Current Population Survey tables.  Please note that I counted Associate's degrees, both occupational and academic, as a completed BA, since the original gun violence data set doesn't include as many subcategories, and it seemed inaccurate to call those "Some college".
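Folding the detailed attainment categories into the four buckets used by the gun death data amounts to a lookup table plus a sum. The sketch below shows the idea only: the source labels in `RECODE` are hypothetical (the real table headings may differ), and the counts are placeholders rather than census figures.

```python
# Hypothetical source labels mapped to the buckets used in the gun death data
RECODE = {
    "None - 8th grade": "Less than HS",
    "9th - 11th grade": "Less than HS",
    "High school graduate": "HS/GED",
    "Some college, no degree": "Some college",
    "Associate's degree, occupational": "BA+",
    "Associate's degree, academic": "BA+",
    "Bachelor's degree": "BA+",
    "Master's degree": "BA+",
}

def collapse(detailed_counts, recode):
    """Sum detailed category counts into their recoded buckets."""
    collapsed = {}
    for label, count in detailed_counts.items():
        bucket = recode[label]
        collapsed[bucket] = collapsed.get(bucket, 0) + count
    return collapsed

# Placeholder numbers for illustration only, not real population figures
placeholder = {
    "High school graduate": 10,
    "Associate's degree, academic": 2,
    "Bachelor's degree": 5,
}
collapsed = collapse(placeholder, RECODE)
print(collapsed)
```

Keeping the mapping in one dictionary makes the Associate's degree decision easy to revisit later.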

## Adding Racial Data to the Education and Gun Deaths Equation

In [76]:
educational_attainment = {
    "BA+": 98521000,
    "HS/GED": 70882000,
    "Less than HS": 27837000,
    "Some college": 46750000,
    "NA": 815000
}
education_counts

{'BA+': 12946,
 'HS/GED': 42927,
 'Less than HS': 21823,
 'NA': 1422,
 'Some college': 21680}

In [92]:
ed_deaths_per_hundredk = {}
for key, value in (education_counts.items()):    
    ed_deaths_per_hundredk[key] = float(value)/float(educational_attainment[key])*100000
sorted_ed = sorted(ed_deaths_per_hundredk.items(), key=lambda x: x[1], reverse=True)
print("Gun Deaths per 100,000:")
print('')
for ed, count in sorted_ed:
        print("{} - {} Gun Deaths per 100,000".format(ed, count))
        print('')



Gun Deaths per 100,000:

NA - 174.478527607 Gun Deaths per 100,000

Less than HS - 78.3956604519 Gun Deaths per 100,000

HS/GED - 60.5612144127 Gun Deaths per 100,000

Some college - 46.3743315508 Gun Deaths per 100,000

BA+ - 13.1403457131 Gun Deaths per 100,000



For the total American population, the data suggest that the less education an individual has, the higher the likelihood of death by gun.  The NA rate is inflated because NAs appear in the gun death education data at a much greater rate than in the census data: the gun death records left education blank far more often than Census respondents did.  However, the NAs under Education are only about 1.4% of the gun death data, so I've chosen not to do anything about them.
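The share of NA records can be checked from the education tallies printed above. A small sketch, with the dictionary copied from that output and `na_share` as a name introduced here:

```python
# Gun deaths by education level, copied from the output above
education_counts = {
    "BA+": 12946,
    "HS/GED": 42927,
    "Less than HS": 21823,
    "NA": 1422,
    "Some college": 21680,
}

total = sum(education_counts.values())
na_share = 100.0 * education_counts["NA"] / total
print("{} of {} records have no education level ({:.1f}%)".format(
    education_counts["NA"], total, na_share))
```

The total also matches the sum of the male and female counts from earlier, which is a useful consistency check on the tallies.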

In [89]:
white_education = {
    "Less than HS": 21132000,
    "HS/GED": 55625000,
    "Some college": 35880000,
    "BA+": 78825000,
    "NA": 526000
}

In [90]:
white_american_deaths = {}
for row in data:
    ed = row[10]
    race = row[7]
    if race == 'White':
        if ed not in white_american_deaths:
            white_american_deaths[ed] = 0
        white_american_deaths[ed] += 1
print("White American Gun Deaths by Education:")
sorted(white_american_deaths.items(), key=lambda x: x[1], reverse=True)

White American Gun Deaths by Education:


[('HS/GED', 28266),
 ('Some college', 15803),
 ('BA+', 11317),
 ('Less than HS', 9892),
 ('NA', 959)]

In [95]:
white_ed_deaths_per_hundredk = {}
for key, value in (white_american_deaths.items()):    
    white_ed_deaths_per_hundredk[key] = float(value)/float(white_education[key])*100000
sorted_ed_white = sorted(white_ed_deaths_per_hundredk.items(), key=lambda x: x[1], reverse=True)
print("White Gun Deaths per 100,000:")
print('')
for ed, count in sorted_ed_white:
        print("{} - {} Gun Deaths per 100,000".format(ed, count))
        print('')

White Gun Deaths per 100,000:

NA - 182.319391635 Gun Deaths per 100,000

HS/GED - 50.8152808989 Gun Deaths per 100,000

Less than HS - 46.8105243233 Gun Deaths per 100,000

Some college - 44.0440356745 Gun Deaths per 100,000

BA+ - 14.357120203 Gun Deaths per 100,000



For white Americans, the number of gun deaths per 100,000 for those with 1) less than a high school education, or 2) high school or GED completed, is substantially lower than for the overall population: about 40% lower for white Americans without high school, and 16% lower for white Americans with high school or GED.  The rates for white Americans with less than a BA are also much closer together than in the total population, with a range of only about 7 deaths per 100,000 across all 3 categories.
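The percentage comparisons with the overall population can be reproduced from the two rate tables. In this sketch `percent_lower` is a hypothetical helper, and the rates are copied from the printed outputs above.

```python
def percent_lower(rate, baseline):
    """How far `rate` sits below `baseline`, as a percentage of the baseline."""
    return 100.0 * (baseline - rate) / baseline

# Rates per 100,000 copied from the overall and White tables above
overall_rates = {"Less than HS": 78.3956604519, "HS/GED": 60.5612144127}
white_rates = {"Less than HS": 46.8105243233, "HS/GED": 50.8152808989}

white_vs_overall = {
    level: percent_lower(white_rates[level], overall_rates[level])
    for level in overall_rates
}
for level, pct in sorted(white_vs_overall.items()):
    print("{}: {:.1f}% lower than the overall rate".format(level, pct))
```

Using the overall rate as the baseline matters here, since the same gap expressed against the White rate gives a larger percentage.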

In [79]:
black_education = {
    "Less than HS": 4200000,
    "HS/GED": 10319000,
    "Some college": 6942000,
    "BA+": 8994000,
    "NA": 80000
}

In [97]:
black_american_deaths = {}
for row in data:
    ed = row[10]
    race = row[7]
    if race == 'Black':
        if ed not in black_american_deaths:
            black_american_deaths[ed] = 0
        black_american_deaths[ed] += 1
print("Black American Gun Deaths by Education:")
sorted(black_american_deaths.items(), key=lambda x: x[1], reverse=True)

Black American Gun Deaths by Education:


[('HS/GED', 10686),
 ('Less than HS', 7572),
 ('Some college', 3947),
 ('BA+', 820),
 ('NA', 271)]

In [100]:
black_ed_deaths_per_hundredk = {}
for key, value in (black_american_deaths.items()):    
    black_ed_deaths_per_hundredk[key] = float(value)/float(black_education[key])*100000
sorted_ed_black = sorted(black_ed_deaths_per_hundredk.items(), key=lambda x: x[1], reverse=True)
print("Black Gun Deaths per 100,000:")
print('')
for ed, count in sorted_ed_black:
        print("{} - {} Gun Deaths per 100,000".format(ed, count))
        print('')

Black Gun Deaths per 100,000:

NA - 338.75 Gun Deaths per 100,000

Less than HS - 180.285714286 Gun Deaths per 100,000

HS/GED - 103.556546177 Gun Deaths per 100,000

Some college - 56.8568135984 Gun Deaths per 100,000

BA+ - 9.11718923727 Gun Deaths per 100,000



I blinked hard when I first read this.  I also rechecked my code twice to make sure I hadn't misnamed a variable somewhere.  If someone finds a bug let me know.

It looks like <i>the gun death rate for Black Americans without high school is almost 4x the rate for White Americans without high school, and twice as high for those with HS/GED.  For those with some college, the Black American rate is about 29% higher than the White American rate.</i>  Interestingly, the rate is actually about 36% <b>lower</b> for Black Americans in the BA+ category.
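One way to double-check these comparisons is to divide the Black rate by the White rate at each education level. The sketch below copies the rates from the two printed tables; `black_to_white` is a name introduced here.

```python
# Gun deaths per 100,000 by education level, copied from the tables above
white_rates = {
    "Less than HS": 46.8105243233,
    "HS/GED": 50.8152808989,
    "Some college": 44.0440356745,
    "BA+": 14.357120203,
}
black_rates = {
    "Less than HS": 180.285714286,
    "HS/GED": 103.556546177,
    "Some college": 56.8568135984,
    "BA+": 9.11718923727,
}

# A ratio above 1 means the Black rate is higher; below 1 means it is lower
black_to_white = {
    level: black_rates[level] / white_rates[level] for level in white_rates
}
for level in ["Less than HS", "HS/GED", "Some college", "BA+"]:
    print("{}: Black rate is {:.2f}x the White rate".format(level, black_to_white[level]))
```

A ratio of 1.29 reads as 29% higher, and a ratio of 0.64 as 36% lower.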

# Conclusions

Analysis of this dataset has yielded the following insights:
<ol>
    <li>Almost 6x as many males died a gun-related death as females in the US between 2012 and 2014.</li>
    <li>Overall, Black Americans died as a result of a gun at a per-capita rate about 72% higher than White Americans.</li>
    <li>White Americans were about 37% more likely than Native American/Native Alaskans to die as a result of a gun, and about 66% more likely than Hispanic Americans.</li>
    <li>Black and Hispanic American gun deaths had a high likelihood of being ruled a homicide in this time period (84% and 62% respectively).</li>
    <li>When a white American died as a result of a gun, it was ruled a suicide 84% of the time.  When it was a Native American/Native Alaskan who died, it was ruled a suicide 60% of the time.</li>
    <li>For Americans on the whole, each additional level of educational attainment is associated with a lower rate of gun-related death.  The drop is steepest for Black Americans who have completed college.</li>
    <li>White Americans whose highest educational attainment is high school or below have a significantly lower rate of gun death than the national average - 40% lower if they have not completed high school, and 16% lower if they have.</li>
    <li>Black Americans are almost 4x as likely to die a gun death as White Americans if both do not have high school, and twice as likely if both have completed high school or a GED.</li>
    <li>Black Americans with a BA or higher are about 36% less likely to die a gun-related death than White Americans with a BA or higher.  This may be a result of the preponderance of White American gun suicides.</li>
</ol>
Thanks for reading!
    