# Exploring Gun Deaths in the US

This project uses the guns.csv data file, which contains information on gun deaths in the US from 2012 through 2014.  Each row in the data represents one death.

The following information is included in the file:
 - `row` - row number in the file (unique identifier)
 - `year` - year of the death
 - `month` - month of the death
 - `intent` - intent of the perpetrator (Suicide, Accidental, NA, Homicide, Undetermined)
 - `police` - whether a police officer was involved in the shooting (1=True)
 - `sex` - gender of the victim (M, F)
 - `age` - age of the victim in years
 - `race` - race of the victim (Asian/Pacific Islander, Native American/Native Alaskan, Black, Hispanic, White)
 - `hispanic` - code indicating Hispanic origin of the victim
 - `place` - where the shooting occurred
 - `education` - education level of the victim (1=less than high school, 2=graduated from high school or equivalent, 3=some college, 4=at least graduated from college, 5=not available)

In [1]:
# read in the dataset
import csv

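# a with-block closes the file automatically; newline='' is what the csv module expects
with open('guns.csv', 'r', newline='') as f:
    data = list(csv.reader(f))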

# preview the data
print(data[:5])

[['', 'year', 'month', 'intent', 'police', 'sex', 'age', 'race', 'hispanic', 'place', 'education'], ['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'], ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'], ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4'], ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4']]


In [2]:
# extract headers from our data
header = data[0]
data = data[1:]

print('Header: ', header, '\n')
print('Data: ', data[:5])

Header:  ['', 'year', 'month', 'intent', 'police', 'sex', 'age', 'race', 'hispanic', 'place', 'education'] 

Data:  [['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'], ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'], ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4'], ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4'], ['5', '2012', '02', 'Suicide', '0', 'M', '31', 'White', '100', 'Other specified', '2']]
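
Since the rest of the notebook refers to columns by position (`row[1]`, `row[7]`, ...), a small alternative worth knowing is `csv.DictReader`, which keys each row by its header name. The sketch below is illustrative only: it uses two rows copied from the preview above, with `io.StringIO` standing in for the real `guns.csv` file. Note that the first column's header is an empty string, so the row number is stored under the key `''`.

```python
import csv
import io

# Two rows copied from the preview above; io.StringIO stands in for guns.csv
# so that this example runs on its own.
sample = io.StringIO(
    ",year,month,intent,police,sex,age,race,hispanic,place,education\n"
    "1,2012,01,Suicide,0,M,34,Asian/Pacific Islander,100,Home,4\n"
    "2,2012,01,Suicide,0,F,21,White,100,Street,3\n"
)

# DictReader takes its field names from the first line, so each row becomes
# a dict keyed by column name instead of by position.
rows = list(csv.DictReader(sample))

for row in rows:
    print(row["year"], row["sex"], row["race"])
```

Named access makes cells like `races = [row['race'] for row in rows]` self-documenting, at the cost of a little more memory per row.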


## Initial data review

In [3]:
# create dictionary of number of deaths by year
years = [row[1] for row in data]
year_counts = {}
for year in years:
    if year in year_counts:
        year_counts[year] += 1
    else:
        year_counts[year] = 1
# display the counts
year_counts

{'2012': 33563, '2013': 33636, '2014': 33599}
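
The if/else tally above is a common pattern, and the standard library's `collections.Counter` does the same job in one call. A minimal sketch, using a short made-up list in place of the real `years` list:

```python
from collections import Counter

# Toy list standing in for `years` (made up for illustration).
years = ["2012", "2012", "2013", "2014", "2013", "2012"]

# Manual dictionary tally, as in the cell above.
year_counts = {}
for year in years:
    if year in year_counts:
        year_counts[year] += 1
    else:
        year_counts[year] = 1

# Counter produces the same counts in a single call.
year_counter = Counter(years)
print(year_counter)
```

The same one-liner works for every tally in this notebook (by sex, race, education, and so on).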

Since there is little variation by year, let's instead look at the variation by month and year.

In [4]:
# create a year/month date object
# default day to 1, since we don't have this in the data
import datetime
dates = [datetime.datetime(year=int(row[1]), month=int(row[2]), day=1) for row in data]
# preview the list created
dates[:5]

[datetime.datetime(2012, 1, 1, 0, 0),
 datetime.datetime(2012, 1, 1, 0, 0),
 datetime.datetime(2012, 1, 1, 0, 0),
 datetime.datetime(2012, 2, 1, 0, 0),
 datetime.datetime(2012, 2, 1, 0, 0)]

In [5]:
# count number of times each unique date occurs
date_counts = {}
for date in dates:
    if date in date_counts:
        date_counts[date] += 1
    else:
        date_counts[date] = 1
# display the counts
date_counts

{datetime.datetime(2012, 1, 1, 0, 0): 2758,
 datetime.datetime(2012, 2, 1, 0, 0): 2357,
 datetime.datetime(2012, 3, 1, 0, 0): 2743,
 datetime.datetime(2012, 4, 1, 0, 0): 2795,
 datetime.datetime(2012, 5, 1, 0, 0): 2999,
 datetime.datetime(2012, 6, 1, 0, 0): 2826,
 datetime.datetime(2012, 7, 1, 0, 0): 3026,
 datetime.datetime(2012, 8, 1, 0, 0): 2954,
 datetime.datetime(2012, 9, 1, 0, 0): 2852,
 datetime.datetime(2012, 10, 1, 0, 0): 2733,
 datetime.datetime(2012, 11, 1, 0, 0): 2729,
 datetime.datetime(2012, 12, 1, 0, 0): 2791,
 datetime.datetime(2013, 1, 1, 0, 0): 2864,
 datetime.datetime(2013, 2, 1, 0, 0): 2375,
 datetime.datetime(2013, 3, 1, 0, 0): 2862,
 datetime.datetime(2013, 4, 1, 0, 0): 2798,
 datetime.datetime(2013, 5, 1, 0, 0): 2806,
 datetime.datetime(2013, 6, 1, 0, 0): 2920,
 datetime.datetime(2013, 7, 1, 0, 0): 3079,
 datetime.datetime(2013, 8, 1, 0, 0): 2859,
 datetime.datetime(2013, 9, 1, 0, 0): 2742,
 datetime.datetime(2013, 10, 1, 0, 0): 2808,
 datetime.datetime(2013, 11,

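From the above, gun deaths appear to peak in the summer: July has the highest count in both 2012 (3026) and 2013 (3079), while February has the lowest (2357 and 2375). Part of the February dip reflects the fact that it is a shorter month.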
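
Because months differ in length, raw monthly counts can exaggerate seasonal swings. One way to check is to divide each count by the number of days in the month. The sketch below uses the 2012 counts copied from the output above and `calendar.monthrange`, which returns the weekday of the first day and the number of days in the month.

```python
import calendar

# Monthly gun death counts for 2012, copied from the output above.
counts_2012 = {
    1: 2758, 2: 2357, 3: 2743, 4: 2795, 5: 2999, 6: 2826,
    7: 3026, 8: 2954, 9: 2852, 10: 2733, 11: 2729, 12: 2791,
}

# monthrange(year, month)[1] is the number of days in that month;
# 2012 is a leap year, so February has 29 days.
per_day_2012 = {
    month: count / calendar.monthrange(2012, month)[1]
    for month, count in counts_2012.items()
}

peak = max(per_day_2012, key=per_day_2012.get)
low = min(per_day_2012, key=per_day_2012.get)
print(calendar.month_name[peak], round(per_day_2012[peak], 1))
print(calendar.month_name[low], round(per_day_2012[low], 1))
```

For 2012, July is still the highest and February still the lowest after adjusting for month length, so the summer peak is not just a calendar effect.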

## Data by sex and race

In addition to date information, we also have information on both sex and race of the victim.  We can perform similar functions on these variables.

In [6]:
# analyze counts by sex
sexes = [row[5] for row in data]
sex_counts = {}
for sex in sexes:
    if sex in sex_counts:
        sex_counts[sex] += 1
    else:
        sex_counts[sex] = 1
sex_counts

{'F': 14449, 'M': 86349}

From the above, victims are overwhelmingly male (about 86% of all gun deaths). Since the US population is roughly half male, we would expect about a 50/50 split if sex were unrelated to gun deaths.
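
The share of male victims can be computed directly from the counts above:

```python
# Counts of gun deaths by sex, copied from the output above.
sex_counts = {"F": 14449, "M": 86349}

# Convert counts to shares of the total.
total = sum(sex_counts.values())
shares = {sex: count / total for sex, count in sex_counts.items()}
print(total, round(shares["M"], 3))
```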

In [7]:
# analyze counts by race
races = [row[7] for row in data]
race_counts = {}
for race in races:
    if race in race_counts:
        race_counts[race] += 1
    else:
        race_counts[race] = 1
race_counts

{'Asian/Pacific Islander': 1326,
 'Black': 23296,
 'Hispanic': 9022,
 'Native American/Native Alaskan': 917,
 'White': 66237}

From the above, some minority groups appear to be overrepresented among victims.  For example, while about 12% of the US population is Black, about 23% (23296 / 100798) of the victims are Black.

## Additional analysis on race

It would be better to analyze the rate of gun deaths per 100,000 people in each racial category.  To do so, we need census population data rather than the rough approximation from a Google search that we used above.

We will import census data containing population counts from the 2010 US census.  Since the gun death data covers 2012 through 2014, these 2010 figures are a close but not exact denominator.

In [8]:
with open('census.csv', 'r', newline='') as f:
    census = list(csv.reader(f))
census

[['Id',
  'Year',
  'Id',
  'Sex',
  'Id',
  'Hispanic Origin',
  'Id',
  'Id2',
  'Geography',
  'Total',
  'Race Alone - White',
  'Race Alone - Hispanic',
  'Race Alone - Black or African American',
  'Race Alone - American Indian and Alaska Native',
  'Race Alone - Asian',
  'Race Alone - Native Hawaiian and Other Pacific Islander',
  'Two or More Races'],
 ['cen42010',
  'April 1, 2010 Census',
  'totsex',
  'Both Sexes',
  'tothisp',
  'Total',
  '0100000US',
  '',
  'United States',
  '308745538',
  '197318956',
  '44618105',
  '40250635',
  '3739506',
  '15159516',
  '674625',
  '6984195']]

In [9]:
# create a dictionary of population counts from the census for each race
mapping = {}
mapping['Asian/Pacific Islander'] = int(census[1][14]) + int(census[1][15])
mapping['Black'] = int(census[1][12])
mapping['Native American/Native Alaskan'] = int(census[1][13])
mapping['Hispanic'] = int(census[1][11])
mapping['White'] = int(census[1][10])
mapping

{'Asian/Pacific Islander': 15834141,
 'Black': 40250635,
 'Hispanic': 44618105,
 'Native American/Native Alaskan': 3739506,
 'White': 197318956}
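
The cell above hard-codes column positions such as `census[1][14]`, which would silently pick the wrong value if the column order changed. A sturdier sketch looks each column up by its header name with `list.index`. The header and data row below are copied from the census output; the helper name `column` is our own. (The header repeats `'Id'` several times, so building a plain `dict(zip(header, row))` would lose those columns, but the race columns used here are unique.)

```python
# Header and data row copied from the census output above.
census_header = [
    'Id', 'Year', 'Id', 'Sex', 'Id', 'Hispanic Origin', 'Id', 'Id2',
    'Geography', 'Total', 'Race Alone - White', 'Race Alone - Hispanic',
    'Race Alone - Black or African American',
    'Race Alone - American Indian and Alaska Native', 'Race Alone - Asian',
    'Race Alone - Native Hawaiian and Other Pacific Islander',
    'Two or More Races',
]
census_row = [
    'cen42010', 'April 1, 2010 Census', 'totsex', 'Both Sexes', 'tothisp',
    'Total', '0100000US', '', 'United States', '308745538', '197318956',
    '44618105', '40250635', '3739506', '15159516', '674625', '6984195',
]

def column(name):
    """Return the value in the column with this (unique) header, as an int."""
    return int(census_row[census_header.index(name)])

mapping = {
    'Asian/Pacific Islander': column('Race Alone - Asian')
    + column('Race Alone - Native Hawaiian and Other Pacific Islander'),
    'Black': column('Race Alone - Black or African American'),
    'Native American/Native Alaskan': column('Race Alone - American Indian and Alaska Native'),
    'Hispanic': column('Race Alone - Hispanic'),
    'White': column('Race Alone - White'),
}
print(mapping)
```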

In [10]:
# create another dictionary to store counts per 100,000 by race
race_per_hundredk = {}
for race in race_counts:
    race_per_hundredk[race] = race_counts[race] / mapping[race] * 100000
race_per_hundredk

{'Asian/Pacific Islander': 8.374309664161762,
 'Black': 57.8773477735196,
 'Hispanic': 20.220491210910907,
 'Native American/Native Alaskan': 24.521955573811088,
 'White': 33.56849303419181}

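As we roughly estimated earlier, the rate of gun deaths per 100,000 people is much higher for the `Black` racial category (about 58) than for any other group.  Because the death counts span three years (2012 through 2014), these rates are three-year totals; the average annual rate is about one third of each value.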
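
The per-100,000 calculation is repeated several times in this notebook, so it can be wrapped in a small helper. The sketch below is our own (the function name `rate_per_100k` and its `years` parameter are not part of the original analysis); it uses the counts and census populations from the outputs above and, as an assumption, treats the 2010 population as constant across 2012 through 2014 when averaging.

```python
# Victim counts (2012-2014) and 2010 census populations, copied from above.
race_counts = {
    'Asian/Pacific Islander': 1326, 'Black': 23296, 'Hispanic': 9022,
    'Native American/Native Alaskan': 917, 'White': 66237,
}
mapping = {
    'Asian/Pacific Islander': 15834141, 'Black': 40250635, 'Hispanic': 44618105,
    'Native American/Native Alaskan': 3739506, 'White': 197318956,
}

def rate_per_100k(counts, population, years=1):
    """Deaths per 100,000 people, averaged over `years` years.

    Assumes the population stays at its census value for every year.
    """
    return {group: counts[group] / population[group] * 100000 / years
            for group in counts}

total_rate = rate_per_100k(race_counts, mapping)            # three-year total
annual_rate = rate_per_100k(race_counts, mapping, years=3)  # yearly average
print(round(total_rate['Black'], 2), round(annual_rate['Black'], 2))
```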

## Analysis using intent

We can look further at this data using the information on intent provided.  We can filter our results to look at only `Homicide` intent.

In [11]:
# create dictionary of homicide counts by race
intents = [row[3] for row in data]
homicide_race_counts = {}
for i, race in enumerate(races):
    if intents[i] == 'Homicide':
        if race in homicide_race_counts:
            homicide_race_counts[race] += 1
        else:
            homicide_race_counts[race] = 1
homicide_race_counts

{'Asian/Pacific Islander': 559,
 'Black': 19510,
 'Hispanic': 5634,
 'Native American/Native Alaskan': 326,
 'White': 9147}
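
The loop above filters on one list (`intents`) while counting another (`races`). The same idea can be written with `zip` and `Counter`; the helper `count_where` below is our own, and the short lists are made up for illustration.

```python
from collections import Counter

# Toy parallel lists standing in for `races` and `intents` (made up).
races = ["Black", "White", "Hispanic", "Black", "White"]
intents = ["Homicide", "Suicide", "Homicide", "Homicide", "Accidental"]

def count_where(groups, labels, target):
    """Count each group, keeping only rows whose label equals target."""
    return Counter(g for g, label in zip(groups, labels) if label == target)

homicide_race_counts = count_where(races, intents, "Homicide")
print(homicide_race_counts)
```

Because the filter value is a parameter, the same helper would also cover the accidental and police cells later on, e.g. `count_where(races, police, '1')`.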

In [12]:
# create another dictionary to store homicide counts per 100,000 by race
race_per_hundredk_homicide = {}
for race in homicide_race_counts:
    race_per_hundredk_homicide[race] = homicide_race_counts[race] / mapping[race] * 100000
race_per_hundredk_homicide

{'Asian/Pacific Islander': 3.530346230970155,
 'Black': 48.471284987180944,
 'Hispanic': 12.627161104219914,
 'Native American/Native Alaskan': 8.717729026240365,
 'White': 4.6356417981453335}

From the above, gun-related homicides disproportionately affect people in the `Black` racial category: about 48 per 100,000 over the three years, compared with about 13 for `Hispanic` and under 5 for `White`.
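
Disparities are often easier to read as ratios against a reference group. The sketch below divides each homicide rate (copied from the output above) by the `White` rate; choosing `White` as the baseline is our own illustrative choice, and any group could serve.

```python
# Homicide rates per 100,000 (2012-2014 total), copied from the output above.
race_per_hundredk_homicide = {
    'Asian/Pacific Islander': 3.530346230970155,
    'Black': 48.471284987180944,
    'Hispanic': 12.627161104219914,
    'Native American/Native Alaskan': 8.717729026240365,
    'White': 4.6356417981453335,
}

# Express each rate as a multiple of the baseline group's rate.
baseline = race_per_hundredk_homicide['White']
relative = {race: rate / baseline
            for race, rate in race_per_hundredk_homicide.items()}

for race, ratio in relative.items():
    print(f"{race}: {ratio:.1f}x")
```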

## Additional steps

We can also investigate some additional items to take this analysis further.  Let's look at the following steps:
- Homicide rate by gender
- Rate of intent `Accidental` by race
- Gun death rates by education
- Gun death rates involving police by race

We can look at each of the above steps in turn.

## Homicide rate by gender

In the above analysis, we looked at the homicide rate by race.  How does this look by gender instead?

In [13]:
# create dictionary of homicide counts by gender
homicide_sex_counts = {}
for i, sex in enumerate(sexes):
    if intents[i] == 'Homicide':
        if sex in homicide_sex_counts:
            homicide_sex_counts[sex] += 1
        else:
            homicide_sex_counts[sex] = 1
homicide_sex_counts

{'F': 5373, 'M': 29803}

Above, we found that the counts of gun deaths by gender were `{'F': 14449, 'M': 86349}`, or about 6 males for each female.  For homicides alone, the ratio drops slightly to about 5.5 males for each female.  (These are counts rather than rates; computing true rates by gender would require population counts by sex.)
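
The ratios quoted above can be checked with a few lines of arithmetic on the two sets of counts. The same numbers also give the share of each sex's gun deaths that were homicides.

```python
# Counts copied from the outputs above.
all_deaths = {"F": 14449, "M": 86349}
homicides = {"F": 5373, "M": 29803}

# Males per female, for all gun deaths and for homicides only.
ratio_all = all_deaths["M"] / all_deaths["F"]
ratio_homicide = homicides["M"] / homicides["F"]

# Fraction of each sex's gun deaths that were homicides.
homicide_share = {sex: homicides[sex] / all_deaths[sex] for sex in all_deaths}

print(round(ratio_all, 2), round(ratio_homicide, 2))
print({sex: round(share, 2) for sex, share in homicide_share.items()})
```

The slightly lower male-to-female ratio for homicides corresponds to a somewhat larger share of female gun deaths being homicides (about 37% versus about 35% for males).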

## Accidental gun death by race

Above we looked at homicides by race, but we can also look at other intents, such as accidental deaths.  Does that show the same pattern by race as we saw with homicide?

In [14]:
# create dictionary of accidental counts by race
accidental_race_counts = {}
for i, race in enumerate(races):
    if intents[i] == 'Accidental':
        if race in accidental_race_counts:
            accidental_race_counts[race] += 1
        else:
            accidental_race_counts[race] = 1
# create another dictionary to store accidental counts per 100,000 by race
race_per_hundredk_accident = {}
for race in accidental_race_counts:
    race_per_hundredk_accident[race] = accidental_race_counts[race] / mapping[race] * 100000
race_per_hundredk_accident

{'Asian/Pacific Islander': 0.07578560782046845,
 'Black': 0.814893976206835,
 'Hispanic': 0.3249801846133985,
 'Native American/Native Alaskan': 0.5883130017708221,
 'White': 0.5736904466492313}

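The first thing to notice is that the rate of accidental gun death is far lower than the homicide rate: for example, about 0.81 versus 48 per 100,000 for `Black`, and about 0.57 versus 4.6 for `White`.

Second, the `Black` category again has the highest rate, but the gap is much narrower than for homicide: `Native American/Native Alaskan` (0.59) and `White` (0.57) are not far behind.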
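
Rather than writing a separate filtering loop for each intent, one option is to tally every `(race, intent)` pair in a single pass and then read off whichever intent is of interest. A minimal sketch with made-up parallel lists standing in for `races` and `intents`:

```python
from collections import Counter

# Toy parallel lists (made up for illustration).
races = ["Black", "White", "White", "Hispanic", "Black", "White"]
intents = ["Homicide", "Accidental", "Suicide", "Homicide", "Accidental", "Accidental"]

# One pass over the data counts every (race, intent) combination.
crosstab = Counter(zip(races, intents))

# Pull out the per-race counts for a single intent.
accidental = {race: n for (race, intent), n in crosstab.items()
              if intent == "Accidental"}
print(accidental)
```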

## Gun deaths by education level

Another interesting element to the data is the education level.  How does the education level correlate with gun deaths?

In [15]:
# analyze counts by education
educ = [row[10] for row in data]
educ_counts = {}
for ed in educ:
    if ed in educ_counts:
        educ_counts[ed] += 1
    else:
        educ_counts[ed] = 1
educ_counts

{'1': 21823, '2': 42927, '3': 21680, '4': 12946, '5': 1369, 'NA': 53}

From the above, the most common education level among victims is graduated from high school (or the equivalent).  Of the available values, the least common is at least graduated from college.  These are raw counts, so it would be interesting to compare them with education levels in the overall population.  (We would need to bring in additional data, similar to what we did with the census data.)
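
The education codes are easier to interpret with their labels attached. The sketch below maps the codes to the descriptions given at the top of this notebook, drops code `5` (not available) and the `NA` values, and prints each level's share of the known values.

```python
# Counts by education code, copied from the output above.
educ_counts = {'1': 21823, '2': 42927, '3': 21680, '4': 12946, '5': 1369, 'NA': 53}

# Labels from the column description at the top of the notebook.
labels = {
    '1': 'less than high school',
    '2': 'graduated from high school or equivalent',
    '3': 'some college',
    '4': 'at least graduated from college',
}

# Keep only codes with a known education level ('5' and 'NA' are dropped).
known = {labels[code]: n for code, n in educ_counts.items() if code in labels}
total = sum(known.values())

for label, n in sorted(known.items(), key=lambda item: item[1], reverse=True):
    print(f"{label}: {n} ({n / total:.1%})")
```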

## Gun deaths involving police by race

We were also provided with an indicator for whether police were involved in the shooting.  Since police shootings receive considerable news coverage, it is worth looking at what the data shows for these years.

In [16]:
# create dictionary of police counts by race
police = [row[4] for row in data]
police_race_counts = {}
for i, race in enumerate(races):
    if police[i] == '1':
        if race in police_race_counts:
            police_race_counts[race] += 1
        else:
            police_race_counts[race] = 1
# create another dictionary to store police counts per 100,000 by race
race_per_hundredk_police = {}
for race in police_race_counts:
    race_per_hundredk_police[race] = police_race_counts[race] / mapping[race] * 100000
race_per_hundredk_police

{'Asian/Pacific Islander': 0.18946401955117112,
 'Black': 0.8844580961269306,
 'Hispanic': 0.6320304280067475,
 'Native American/Native Alaskan': 0.6685375020122979,
 'White': 0.3593167196769478}

The rates of gun deaths involving police are much lower than the homicide rates for every race (e.g. under 1 per 100,000 versus about 48 per 100,000 for `Black`).

However, the `Black` category still has the highest rate (about 0.88 per 100,000), roughly 2.5 times the `White` rate (about 0.36).
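
To see where each group falls, the police-involved rates (copied from the output above) can be sorted from highest to lowest:

```python
# Police-involved gun death rates per 100,000 (2012-2014), copied from above.
race_per_hundredk_police = {
    'Asian/Pacific Islander': 0.18946401955117112,
    'Black': 0.8844580961269306,
    'Hispanic': 0.6320304280067475,
    'Native American/Native Alaskan': 0.6685375020122979,
    'White': 0.3593167196769478,
}

# Sort the race names by their rate, highest first.
ranking = sorted(race_per_hundredk_police,
                 key=race_per_hundredk_police.get, reverse=True)

for position, race in enumerate(ranking, start=1):
    print(position, race, round(race_per_hundredk_police[race], 2))
```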