# Exploring Gun Deaths in the US

The dataset comes from [FiveThirtyEight](https://www.fivethirtyeight.com/), and can be found [here](https://github.com/fivethirtyeight/guns-data).

Here is an explanation of each column:
- `''` (empty header) -- an identifier column, which contains the row number.
- `year` -- the year in which the fatality occurred.
- `month` -- the month in which the fatality occurred.
- `intent` -- the intent behind the shooting. This can be Suicide, Accidental, NA, Homicide, or Undetermined.
- `police` -- whether a police officer was involved with the shooting. Either 0 (false) or 1 (true).
- `sex` -- the gender of the victim. Either M or F.
- `age` -- the age of the victim.
- `race` -- the race of the victim. Either Asian/Pacific Islander, Native American/Native Alaskan, Black, Hispanic, or White.
- `hispanic` -- a code indicating the Hispanic origin of the victim.
- `place` -- where the shooting occurred.
- `education` -- educational status of the victim. Can be one of the following:
  - 1 -- Less than High School
  - 2 -- Graduated from High School or equivalent
  - 3 -- Some College
  - 4 -- At least graduated from College
  - 5 -- Not available
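
Because `csv.reader` returns every field as a string, the education codes arrive as `'1'` to `'5'` rather than integers. A minimal sketch of a lookup for them follows; `EDUCATION_LABELS` and `education_label` are names introduced here for illustration, and the sample row simply mirrors the shape of the rows in the file.

```python
# Hypothetical lookup table built from the code list above.
# Keys are strings because csv.reader does not convert types.
EDUCATION_LABELS = {
    "1": "Less than High School",
    "2": "Graduated from High School or equivalent",
    "3": "Some College",
    "4": "At least graduated from College",
    "5": "Not available",
}


def education_label(code):
    """Return the label for an education code, or 'Unknown' for any other value."""
    return EDUCATION_LABELS.get(code, "Unknown")


# A row in the same shape as the dataset; education is the last column (index 10).
sample_row = ['1', '2012', '01', 'Suicide', '0', 'M', '34',
              'Asian/Pacific Islander', '100', 'Home', '4']
print(education_label(sample_row[10]))
```

Falling back to `'Unknown'` keeps the lookup from raising if a row holds a value outside the five listed codes.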

## Importing the data

In [4]:
import csv
# Open the file in a context manager so it is closed after reading
with open('guns.csv', 'r', newline='') as f:
    data = list(csv.reader(f))
print(data[:5], '\n\n')

# Save the header
headers = data[0]

# Remove the header from the dataset
data = data[1:]

print(headers, '\n\n')
print(data[:5])

[['', 'year', 'month', 'intent', 'police', 'sex', 'age', 'race', 'hispanic', 'place', 'education'], ['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'], ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'], ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4'], ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4']] 


['', 'year', 'month', 'intent', 'police', 'sex', 'age', 'race', 'hispanic', 'place', 'education'] 


[['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'], ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'], ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4'], ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4'], ['5', '2012', '02', 'Suicide', '0', 'M', '31', 'White', '100', 'Other specified', '2']]


## Counting number of deaths per year

In [5]:
years = [item[1] for item in data]

year_counts = {}
for year in years:
    if year in year_counts:
        year_counts[year] += 1
    else:
        year_counts[year] = 1
        
year_counts

{'2012': 33563, '2013': 33636, '2014': 33599}
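
The same tally can be written more compactly with `collections.Counter` from the standard library. The sketch below is an alternative to the loop above, not a replacement for it; `count_by_column` is a helper name introduced here, and `sample` holds three rows copied from the printed preview rather than the full dataset.

```python
from collections import Counter

# Three rows copied from the preview output; the real `data` list is much longer.
sample = [
    ['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'],
    ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'],
    ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4'],
]


def count_by_column(rows, index):
    """Count how often each distinct value appears in column `index`."""
    return Counter(row[index] for row in rows)


print(count_by_column(sample, 1))  # tally by year
print(count_by_column(sample, 5))  # tally by sex
```

Passing a different column index (for example 7 for `race`) reuses the helper for any of the categorical columns.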

The annual totals differ by fewer than 100 deaths (about 0.2%), so there is no meaningful change from 2012 to 2014. Let's look at how the counts evolve per month.

## Counting number of deaths per month

In [10]:
import datetime
dates = [datetime.datetime(year=int(item[1]), month=int(item[2]), day=1) for item in data]
dates[:5]

[datetime.datetime(2012, 1, 1, 0, 0),
 datetime.datetime(2012, 1, 1, 0, 0),
 datetime.datetime(2012, 1, 1, 0, 0),
 datetime.datetime(2012, 2, 1, 0, 0),
 datetime.datetime(2012, 2, 1, 0, 0)]

In [44]:
date_counts = {}

for date in dates:
    if date in date_counts:
        date_counts[date] += 1
    else:
        date_counts[date] = 1

date_counts

{datetime.datetime(2012, 1, 1, 0, 0): 2758,
 datetime.datetime(2012, 2, 1, 0, 0): 2357,
 datetime.datetime(2012, 3, 1, 0, 0): 2743,
 datetime.datetime(2012, 4, 1, 0, 0): 2795,
 datetime.datetime(2012, 5, 1, 0, 0): 2999,
 datetime.datetime(2012, 6, 1, 0, 0): 2826,
 datetime.datetime(2012, 7, 1, 0, 0): 3026,
 datetime.datetime(2012, 8, 1, 0, 0): 2954,
 datetime.datetime(2012, 9, 1, 0, 0): 2852,
 datetime.datetime(2012, 10, 1, 0, 0): 2733,
 datetime.datetime(2012, 11, 1, 0, 0): 2729,
 datetime.datetime(2012, 12, 1, 0, 0): 2791,
 datetime.datetime(2013, 1, 1, 0, 0): 2864,
 datetime.datetime(2013, 2, 1, 0, 0): 2375,
 datetime.datetime(2013, 3, 1, 0, 0): 2862,
 datetime.datetime(2013, 4, 1, 0, 0): 2798,
 datetime.datetime(2013, 5, 1, 0, 0): 2806,
 datetime.datetime(2013, 6, 1, 0, 0): 2920,
 datetime.datetime(2013, 7, 1, 0, 0): 3079,
 datetime.datetime(2013, 8, 1, 0, 0): 2859,
 datetime.datetime(2013, 9, 1, 0, 0): 2742,
 datetime.datetime(2013, 10, 1, 0, 0): 2808,
 datetime.datetime(2013, 11,

## Impacts of sex and race 

In [13]:
sex_counts = {}

sex = [item[5] for item in data]

for each in sex:
    if each in sex_counts:
        sex_counts[each] += 1
    else:
        sex_counts[each] = 1

sex_counts

{'F': 14449, 'M': 86349}

In [14]:
race_counts = {}

race = [item[7] for item in data]
for each in race:
    if each in race_counts:
        race_counts[each] += 1
    else:
        race_counts[each] = 1

race_counts

{'Asian/Pacific Islander': 1326,
 'Black': 23296,
 'Hispanic': 9022,
 'Native American/Native Alaskan': 917,
 'White': 66237}

Without knowing the size of each racial group in the US population, these raw counts can't be compared meaningfully.

## Importing the race proportion in the US

In [45]:
# Open the file in a context manager so it is closed after reading
with open('census.csv', 'r', newline='') as f:
    census = list(csv.reader(f))
census

[['Id',
  'Year',
  'Id',
  'Sex',
  'Id',
  'Hispanic Origin',
  'Id',
  'Id2',
  'Geography',
  'Total',
  'Race Alone - White',
  'Race Alone - Hispanic',
  'Race Alone - Black or African American',
  'Race Alone - American Indian and Alaska Native',
  'Race Alone - Asian',
  'Race Alone - Native Hawaiian and Other Pacific Islander',
  'Two or More Races'],
 ['cen42010',
  'April 1, 2010 Census',
  'totsex',
  'Both Sexes',
  'tothisp',
  'Total',
  '0100000US',
  '',
  'United States',
  '308745538',
  '197318956',
  '44618105',
  '40250635',
  '3739506',
  '15159516',
  '674625',
  '6984195']]

We need to map the race labels used in `data` to the corresponding column headings in `census`.

In [52]:
census_dict = {}

for i, header in enumerate(census[0]):
    try:
        census_dict[header] = int(census[1][i])
    except ValueError:
        census_dict[header] = census[1][i]

census_dict

{'Geography': 'United States',
 'Hispanic Origin': 'Total',
 'Id': '0100000US',
 'Id2': '',
 'Race Alone - American Indian and Alaska Native': 3739506,
 'Race Alone - Asian': 15159516,
 'Race Alone - Black or African American': 40250635,
 'Race Alone - Hispanic': 44618105,
 'Race Alone - Native Hawaiian and Other Pacific Islander': 674625,
 'Race Alone - White': 197318956,
 'Sex': 'Both Sexes',
 'Total': 308745538,
 'Two or More Races': 6984195,
 'Year': 'April 1, 2010 Census'}
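
One detail worth noting in the output above: the census header contains `'Id'` four times, and a dictionary keeps only the last value stored under a repeated key. The sketch below reproduces this with the first ten columns of the census file; `to_number` and `census_lookup` are names introduced here for illustration.

```python
# First ten header names and values, copied from the census output above.
header = ['Id', 'Year', 'Id', 'Sex', 'Id', 'Hispanic Origin',
          'Id', 'Id2', 'Geography', 'Total']
row = ['cen42010', 'April 1, 2010 Census', 'totsex', 'Both Sexes', 'tothisp',
       'Total', '0100000US', '', 'United States', '308745538']


def to_number(value):
    """Convert a string to int when possible, otherwise return it unchanged."""
    try:
        return int(value)
    except ValueError:
        return value


# Later occurrences of a repeated key overwrite earlier ones.
census_lookup = {name: to_number(value) for name, value in zip(header, row)}

print(census_lookup['Id'])              # value from the last 'Id' column
print(len(header), len(census_lookup))  # fewer keys than columns
```

This is harmless for the analysis here, since only the uniquely named population columns are used, but it would matter if any of the `Id` values were needed.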

In [53]:
mapping = {}
mapping['Asian/Pacific Islander'] = census_dict['Race Alone - Asian'] + census_dict['Race Alone - Native Hawaiian and Other Pacific Islander']
mapping['Black'] = census_dict['Race Alone - Black or African American']
mapping['Hispanic'] = census_dict['Race Alone - Hispanic']
mapping['Native American/Native Alaskan'] = census_dict['Race Alone - American Indian and Alaska Native']
mapping['White'] = census_dict['Race Alone - White']

mapping

{'Asian/Pacific Islander': 15834141,
 'Black': 40250635,
 'Hispanic': 44618105,
 'Native American/Native Alaskan': 3739506,
 'White': 197318956}

In [54]:
race_per_hundredk = {}
for key in race_counts:
    race_per_hundredk[key] = race_counts[key] / mapping[key] * 100000

race_per_hundredk

{'Asian/Pacific Islander': 8.374309664161762,
 'Black': 57.8773477735196,
 'Hispanic': 20.220491210910907,
 'Native American/Native Alaskan': 24.521955573811088,
 'White': 33.56849303419181}
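
These rates are cumulative over the three years in the dataset (2012 to 2014), measured against the 2010 census population. A rough sketch of an annualized figure follows, assuming each group's population stayed at its 2010 level across those years; that is a simplifying assumption, not something the data confirms. The names `n_years` and `annual_per_hundredk` are introduced here.

```python
# Cumulative 2012-2014 rates per 100,000, copied from the output above.
race_per_hundredk = {
    'Asian/Pacific Islander': 8.374309664161762,
    'Black': 57.8773477735196,
    'Hispanic': 20.220491210910907,
    'Native American/Native Alaskan': 24.521955573811088,
    'White': 33.56849303419181,
}

# The dataset spans 2012, 2013 and 2014 (see year_counts).
n_years = 3

# Assumption: population is constant at the 2010 census value for all three years.
annual_per_hundredk = {
    race: rate / n_years for race, rate in race_per_hundredk.items()
}

# Print from highest to lowest annualized rate.
for race, rate in sorted(annual_per_hundredk.items(),
                         key=lambda item: item[1], reverse=True):
    print(f"{race}: {rate:.1f}")
```

Dividing by the number of years does not change the ranking between groups; it only puts the figures on a per-year scale.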

## Filtering by 'Homicide' intent