US Gun Deaths 2012 to 2014

- This project is based on gun-related fatalities from 2012 to 2014. The data set can be found here: https://github.com/fivethirtyeight/guns-data.

- Here, I'm reading the data set into a list of lists using the csv module and
viewing the first 4 rows (the header plus three records).

In [1]:
import csv

# Read every row into a list of lists; the with block closes the file when done.
with open('guns.csv', 'r', newline='') as f:
    data = list(csv.reader(f))
print(data[0:4])

[['', 'year', 'month', 'intent', 'police', 'sex', 'age', 'race', 'hispanic', 'place', 'education'], ['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'], ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'], ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4']]


- The next step is to remove the header from the rest of the data set. This is done to prevent the header row from being read as a data point.

In [2]:
headers = data[0]
data = data[1:]
print(headers)
print(data[0:4])

['', 'year', 'month', 'intent', 'police', 'sex', 'age', 'race', 'hispanic', 'place', 'education']
[['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'], ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'], ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4'], ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4']]


- Now, I'm determining how many gun-related fatalities happened each year.

In [3]:
years = (row[1] for row in data)

year_counts = {}
for r in years:
    if r in year_counts:
        year_counts[r] = year_counts[r] + 1
    else:
        year_counts[r] = 1

print(year_counts)



{'2014': 33599, '2012': 33563, '2013': 33636}
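- The same tally can be written more compactly with collections.Counter from the standard library. This is a minimal sketch using three sample rows in the guns.csv column order; the first two are copied from the output above and the third is made up for illustration.

```python
from collections import Counter

# Sample rows in the guns.csv column order (index 1 holds the year as a string).
# The third row is hypothetical and only here to give a second year.
sample_rows = [
    ['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'],
    ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'],
    ['3', '2013', '05', 'Homicide', '0', 'M', '29', 'Black', '100', 'Street', '2'],
]

# Counter does the "start at zero, then add one" bookkeeping itself.
year_counts = Counter(row[1] for row in sample_rows)
print(dict(year_counts))
```

With the real data, replacing sample_rows with data should give the same totals as the loop above.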


- Using the datetime class and a list comprehension to save room, I'm able to see how many fatalities happened in each unique month of the data set. The data has no day column, so every date is assigned day 1. I first have to convert the year and month values to integers before the datetime constructor will accept them.

In [4]:
import datetime
dates = [datetime.datetime(year = int(row[1]), month = int(row[2]), day = 1) for row in data]
print(dates[:5])

date_counts = {}
for date in dates:
    if date in date_counts:
        date_counts[date] += 1
    else:
        date_counts[date] = 1

print(date_counts)


[datetime.datetime(2012, 1, 1, 0, 0), datetime.datetime(2012, 1, 1, 0, 0), datetime.datetime(2012, 1, 1, 0, 0), datetime.datetime(2012, 2, 1, 0, 0), datetime.datetime(2012, 2, 1, 0, 0)]
{datetime.datetime(2012, 2, 1, 0, 0): 2357, datetime.datetime(2013, 8, 1, 0, 0): 2859, datetime.datetime(2013, 5, 1, 0, 0): 2806, datetime.datetime(2012, 6, 1, 0, 0): 2826, datetime.datetime(2012, 11, 1, 0, 0): 2729, datetime.datetime(2014, 12, 1, 0, 0): 2857, datetime.datetime(2012, 12, 1, 0, 0): 2791, datetime.datetime(2013, 10, 1, 0, 0): 2808, datetime.datetime(2014, 8, 1, 0, 0): 2970, datetime.datetime(2012, 8, 1, 0, 0): 2954, datetime.datetime(2012, 3, 1, 0, 0): 2743, datetime.datetime(2013, 7, 1, 0, 0): 3079, datetime.datetime(2013, 1, 1, 0, 0): 2864, datetime.datetime(2012, 5, 1, 0, 0): 2999, datetime.datetime(2013, 4, 1, 0, 0): 2798, datetime.datetime(2014, 7, 1, 0, 0): 2884, datetime.datetime(2014, 11, 1, 0, 0): 2756, datetime.datetime(2014, 2, 1, 0, 0): 2361, datetime.datetime(2012, 9, 1, 0, 0
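- The dictionary above prints in no particular order, so the monthly totals are easier to scan when sorted by date. This is a minimal sketch using four of the totals shown in the output above, not the full data set.

```python
import datetime

# Four month totals copied from the output above; the real dictionary has more entries.
date_counts = {
    datetime.datetime(2012, 2, 1): 2357,
    datetime.datetime(2013, 8, 1): 2859,
    datetime.datetime(2013, 5, 1): 2806,
    datetime.datetime(2012, 6, 1): 2826,
}

# Sorting the (date, count) pairs orders them chronologically because the dates are unique.
ordered = sorted(date_counts.items())
labels = [month_start.strftime('%Y-%m') for month_start, _ in ordered]

for month_start, count in ordered:
    print(month_start.strftime('%Y-%m'), count)
```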

- The next step is to find the fatalities listed per gender, and per race.

In [5]:
sexes = [row[5] for row in data]
races = [row[7] for row in data]
sex_counts = {}
race_counts = {}

for s in sexes:
    if s not in sex_counts:
        sex_counts[s] = 0
    sex_counts[s] += 1
        
for r in races:
    if r not in race_counts:
        race_counts[r] = 0
    race_counts[r] += 1

print(sex_counts)
print(race_counts)


{'M': 86349, 'F': 14449}
{'Native American/Native Alaskan': 917, 'Hispanic': 9022, 'White': 66237, 'Asian/Pacific Islander': 1326, 'Black': 23296}
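- Raw counts are easier to compare as shares of the total. This is a minimal sketch that converts the sex_counts values printed above into percentages; the shares dictionary is a name introduced here for illustration.

```python
# Totals copied from the sex_counts output above.
sex_counts = {'M': 86349, 'F': 14449}

total = sum(sex_counts.values())

# Percentage of all recorded deaths for each sex.
shares = {sex: count / total * 100 for sex, count in sex_counts.items()}

for sex, share in shares.items():
    print(sex, round(share, 1))
```

The same three lines would work on race_counts.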


What we've found so far:

- Gun deaths in the US seem to affect men to a much higher degree than women. Some racial minorities also appear to be disproportionately affected, although having data on each race's share of the overall US population would give us more insight.

- There appears to be a minor seasonal correlation, with gun deaths peaking in the summer and declining in the winter. It might be useful to filter by intent, to see if different categories of intent have different correlations with season, race, or gender.
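- One way to look at the seasonal pattern is to average the monthly totals by calendar month across years. This is a minimal sketch using only the February, July, and August totals that are visible in the truncated output above, so it is an illustration and not a full seasonal analysis.

```python
import datetime
from collections import defaultdict

# Monthly totals copied from the visible part of the date_counts output.
date_counts = {
    datetime.datetime(2012, 2, 1): 2357,
    datetime.datetime(2014, 2, 1): 2361,
    datetime.datetime(2013, 7, 1): 3079,
    datetime.datetime(2014, 7, 1): 2884,
    datetime.datetime(2012, 8, 1): 2954,
    datetime.datetime(2013, 8, 1): 2859,
    datetime.datetime(2014, 8, 1): 2970,
}

# Group the totals by calendar month (1 = January, ..., 12 = December).
totals_by_month = defaultdict(list)
for month_start, count in date_counts.items():
    totals_by_month[month_start.month].append(count)

# Average over however many years are available for each calendar month.
average_by_month = {m: sum(v) / len(v) for m, v in totals_by_month.items()}

for month in sorted(average_by_month):
    print(month, round(average_by_month[month], 1))
```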


Reading in a census data set:

In [6]:
# Read the census file into a list of rows; the with block closes the file when done.
with open('census.csv', 'r', newline='') as c:
    census = list(csv.reader(c))
census

[['Id',
  'Year',
  'Id',
  'Sex',
  'Id',
  'Hispanic Origin',
  'Id',
  'Id2',
  'Geography',
  'Total',
  'Race Alone - White',
  'Race Alone - Hispanic',
  'Race Alone - Black or African American',
  'Race Alone - American Indian and Alaska Native',
  'Race Alone - Asian',
  'Race Alone - Native Hawaiian and Other Pacific Islander',
  'Two or More Races'],
 ['cen42010',
  'April 1, 2010 Census',
  'totsex',
  'Both Sexes',
  'tothisp',
  'Total',
  '0100000US',
  '',
  'United States',
  '308745538',
  '197318956',
  '44618105',
  '40250635',
  '3739506',
  '15159516',
  '674625',
  '6984195']]

- While having the total census numbers per race is helpful for the analysis, we would be better off calculating the rate of gun fatalities per 100,000 people of each race.

- In the first block below, I'm mapping the races in the census data set to the races determined in race_counts, which were provided in the gun fatality data set. 

- The second block returns the rate of gun fatalities per 100,000 people in each race by dividing the total number of fatalities by the population of each race, then multiplying it by 100,000.

In [7]:
mapping = {
    "Asian/Pacific Islander": 15159516 + 674625,
    "Native American/Native Alaskan": 3739506,
    "Black": 40250635,
    "Hispanic": 44618105,
    "White": 197318956
}

race_per_hundredk = {}
for k,v in race_counts.items():
    race_per_hundredk[k] = (v/mapping[k]) * 100000
    
print(race_per_hundredk)
    

{'Native American/Native Alaskan': 24.521955573811088, 'Hispanic': 20.220491210910907, 'Asian/Pacific Islander': 8.374309664161762, 'White': 33.56849303419181, 'Black': 57.8773477735196}
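- The populations typed into the mapping above could also be looked up from the census rows by column name, which avoids copying the numbers by hand. This is a minimal sketch using the header and data row exactly as printed in the census output; the population dictionary is a name introduced here for illustration.

```python
# Header and data row copied from the census output above.
header = ['Id', 'Year', 'Id', 'Sex', 'Id', 'Hispanic Origin', 'Id', 'Id2',
          'Geography', 'Total',
          'Race Alone - White',
          'Race Alone - Hispanic',
          'Race Alone - Black or African American',
          'Race Alone - American Indian and Alaska Native',
          'Race Alone - Asian',
          'Race Alone - Native Hawaiian and Other Pacific Islander',
          'Two or More Races']
row = ['cen42010', 'April 1, 2010 Census', 'totsex', 'Both Sexes', 'tothisp',
       'Total', '0100000US', '', 'United States', '308745538',
       '197318956', '44618105', '40250635', '3739506', '15159516',
       '674625', '6984195']

# Keep only the race columns; their names are unique, unlike the repeated 'Id' columns.
population = {name: int(value) for name, value in zip(header, row)
              if name.startswith('Race Alone')}

# Map the census categories onto the race labels used in the gun deaths data.
mapping = {
    "Asian/Pacific Islander": population['Race Alone - Asian']
    + population['Race Alone - Native Hawaiian and Other Pacific Islander'],
    "Native American/Native Alaskan": population['Race Alone - American Indian and Alaska Native'],
    "Black": population['Race Alone - Black or African American'],
    "Hispanic": population['Race Alone - Hispanic'],
    "White": population['Race Alone - White'],
}
print(mapping)
```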


- My next step is to take the results above and filter by intent. The intent in this step will specifically be the "Homicide" value, located in the fourth column of the gun fatality data set.

In [8]:
intent = [row[3] for row in data]
races = [row[7] for row in data]
homicide_race_counts = {}

for i,race in enumerate(races):
    if race not in homicide_race_counts:
        homicide_race_counts[race] = 0
    if intent[i] == "Homicide":
        homicide_race_counts[race] += 1
    
race_per_hundredk = {}
for k,v in homicide_race_counts.items():
    race_per_hundredk[k] = (v/mapping[k]) * 100000
    
print(race_per_hundredk)
    

{'Native American/Native Alaskan': 8.717729026240365, 'Hispanic': 12.627161104219914, 'Asian/Pacific Islander': 3.530346230970155, 'White': 4.6356417981453335, 'Black': 48.471284987180944}
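- The count-then-divide steps can be wrapped in one function so that any intent can be checked without repeating the loop. This is a minimal sketch: rate_per_100k is a hypothetical helper name, the three sample rows are made up, and the populations are the ones from the mapping above. Unlike the cell above, races with no matching deaths are left out of the result instead of being listed with a rate of 0.

```python
def rate_per_100k(rows, populations, intent=None):
    """Deaths per 100,000 people by race, optionally restricted to one intent."""
    counts = {}
    for row in rows:
        # Column 3 is intent and column 7 is race in the guns.csv layout.
        if intent is not None and row[3] != intent:
            continue
        counts[row[7]] = counts.get(row[7], 0) + 1
    return {race: n / populations[race] * 100000 for race, n in counts.items()}


# Populations copied from the mapping above.
mapping = {
    "Asian/Pacific Islander": 15159516 + 674625,
    "Native American/Native Alaskan": 3739506,
    "Black": 40250635,
    "Hispanic": 44618105,
    "White": 197318956,
}

# Hypothetical rows in the guns.csv column order, for illustration only.
sample_rows = [
    ['1', '2012', '01', 'Homicide', '0', 'M', '25', 'Black', '100', 'Street', '2'],
    ['2', '2012', '03', 'Homicide', '0', 'M', '31', 'Black', '100', 'Home', '3'],
    ['3', '2013', '07', 'Suicide', '0', 'M', '60', 'White', '100', 'Home', '4'],
]

homicide_rates = rate_per_100k(sample_rows, mapping, intent='Homicide')
all_rates = rate_per_100k(sample_rows, mapping)
print(homicide_rates)
```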


- The analysis shows that gun-related homicides in the US disproportionately affect people in the Black and Hispanic racial categories. Next, I'll find the number of gun-related suicides per month.


In [11]:
intent = [row[3] for row in data]

suicides_per_month = {}

for i,date in enumerate(dates):
    if date not in suicides_per_month:
        suicides_per_month[date] = 0
    if intent[i] == "Suicide":
        suicides_per_month[date] += 1
    
suicides_per_month

{datetime.datetime(2012, 1, 1, 0, 0): 1707,
 datetime.datetime(2012, 2, 1, 0, 0): 1549,
 datetime.datetime(2012, 3, 1, 0, 0): 1707,
 datetime.datetime(2012, 4, 1, 0, 0): 1745,
 datetime.datetime(2012, 5, 1, 0, 0): 1932,
 datetime.datetime(2012, 6, 1, 0, 0): 1724,
 datetime.datetime(2012, 7, 1, 0, 0): 1789,
 datetime.datetime(2012, 8, 1, 0, 0): 1796,
 datetime.datetime(2012, 9, 1, 0, 0): 1717,
 datetime.datetime(2012, 10, 1, 0, 0): 1696,
 datetime.datetime(2012, 11, 1, 0, 0): 1680,
 datetime.datetime(2012, 12, 1, 0, 0): 1624,
 datetime.datetime(2013, 1, 1, 0, 0): 1797,
 datetime.datetime(2013, 2, 1, 0, 0): 1592,
 datetime.datetime(2013, 3, 1, 0, 0): 1870,
 datetime.datetime(2013, 4, 1, 0, 0): 1819,
 datetime.datetime(2013, 5, 1, 0, 0): 1791,
 datetime.datetime(2013, 6, 1, 0, 0): 1797,
 datetime.datetime(2013, 7, 1, 0, 0): 1893,
 datetime.datetime(2013, 8, 1, 0, 0): 1786,
 datetime.datetime(2013, 9, 1, 0, 0): 1728,
 datetime.datetime(2013, 10, 1, 0, 0): 1741,
 datetime.datetime(2013, 11,

- Above, we can see that February has the lowest number of gun-related suicides of any month from 2012 to 2014, which may partly reflect it being the shortest month.
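- Months differ in length, so a per-day figure makes the monthly totals more comparable. This is a minimal sketch using calendar.monthrange from the standard library and the first three 2012 totals from the output above; the per_day dictionary is a name introduced here for illustration.

```python
import calendar
import datetime

# First three monthly totals copied from the suicides_per_month output above.
suicides_per_month = {
    datetime.datetime(2012, 1, 1): 1707,
    datetime.datetime(2012, 2, 1): 1549,
    datetime.datetime(2012, 3, 1): 1707,
}

# calendar.monthrange returns (weekday of the first day, number of days in the month).
per_day = {
    month_start: count / calendar.monthrange(month_start.year, month_start.month)[1]
    for month_start, count in suicides_per_month.items()
}

for month_start in sorted(per_day):
    print(month_start.strftime('%Y-%m'), round(per_day[month_start], 2))
```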