# Gun Deaths in the US (2012 - 2014)
In this project, you'll work in a Jupyter notebook and analyze data on gun deaths in the US. The variables are as follows:

- year -- the year in which the fatality occurred.
- month -- the month in which the fatality occurred.
- intent -- the intent of the perpetrator of the crime. This can be Suicide, Accidental, NA, Homicide, or Undetermined.
- police -- whether a police officer was involved with the shooting. Either 0 (false) or 1 (true).
- sex -- the gender of the victim. Either M or F.
- age -- the age of the victim.
- race -- the race of the victim. Either Asian/Pacific Islander, Native American/Native Alaskan, Black, Hispanic, or White.
- hispanic -- a code indicating the Hispanic origin of the victim.
- place -- where the shooting occurred. Has several categories, which you're encouraged to explore on your own.
- education -- educational status of the victim. Can be one of the following:
  - 1 -- Less than High School
  - 2 -- Graduated from High School or equivalent
  - 3 -- Some College
  - 4 -- At least graduated from College
  - 5 -- Not available

In [13]:
import csv
import datetime

# Read the whole CSV into a list of rows; the with-block closes the file afterwards.
with open("guns.csv", 'r') as f:
    data_1 = list(csv.reader(f))
print(data_1[0:5])

[['', 'year', 'month', 'intent', 'police', 'sex', 'age', 'race', 'hispanic', 'place', 'education'], ['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'], ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'], ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4'], ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4']]


In [14]:
headers = data_1[0]
data_1 = data_1[1:]
print(headers)
print(data_1[0:5])

['', 'year', 'month', 'intent', 'police', 'sex', 'age', 'race', 'hispanic', 'place', 'education']
[['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'], ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'], ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4'], ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4'], ['5', '2012', '02', 'Suicide', '0', 'M', '31', 'White', '100', 'Other specified', '2']]


Below, we count deaths for each year (2012 - 2014). To do this, we build a dictionary with the year as the key and the number of deaths in that year as the value.

In [15]:
years = []
for row in data_1:
    years.append(row[1])

year_counts = {}
for year in years:
    if year in year_counts:
        year_counts[year] = year_counts[year] + 1
    else:
        year_counts[year] = 1
print(year_counts)

{'2012': 33563, '2014': 33599, '2013': 33636}
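The counting pattern above could also be written with `collections.Counter` from the standard library. The snippet below is only a sketch: `sample_rows` is a hypothetical stand-in for `data_1` with the same column layout, so the totals it prints are not the real ones.

```python
from collections import Counter

# Hypothetical sample rows in the same column layout as guns.csv
# (index 1 holds the year); the notebook itself would use data_1.
sample_rows = [
    ['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'],
    ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'],
    ['3', '2013', '02', 'Homicide', '0', 'M', '60', 'White', '100', 'Other specified', '4'],
]

# Counter tallies each distinct value, replacing the manual if/else bookkeeping.
year_counts = Counter(row[1] for row in sample_rows)
print(dict(year_counts))
```

A `Counter` behaves like a dictionary, and looking up a year that never occurred returns 0 instead of raising a `KeyError`.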


Below, we combine the year and month columns into datetime objects. The dataset has no day column, so we set the day to 1 for every row. Then we loop over this list of datetime objects to count the incidents for each unique month.

In [16]:
dates = []
for row in data_1:
    year = int(row[1])
    month = int(row[2])
    dt = datetime.datetime(year=year, month=month, day=1)
    dates.append(dt)
    

date_counts = {}
for date in dates:
    if date in date_counts:
        date_counts[date] = date_counts[date] + 1
    else:
        date_counts[date] = 1

print(date_counts)


{datetime.datetime(2014, 10, 1, 0, 0): 2865, datetime.datetime(2014, 8, 1, 0, 0): 2970, datetime.datetime(2014, 5, 1, 0, 0): 2864, datetime.datetime(2012, 1, 1, 0, 0): 2758, datetime.datetime(2012, 5, 1, 0, 0): 2999, datetime.datetime(2013, 7, 1, 0, 0): 3079, datetime.datetime(2013, 11, 1, 0, 0): 2758, datetime.datetime(2014, 9, 1, 0, 0): 2914, datetime.datetime(2013, 6, 1, 0, 0): 2920, datetime.datetime(2012, 6, 1, 0, 0): 2826, datetime.datetime(2013, 8, 1, 0, 0): 2859, datetime.datetime(2013, 5, 1, 0, 0): 2806, datetime.datetime(2014, 4, 1, 0, 0): 2862, datetime.datetime(2014, 2, 1, 0, 0): 2361, datetime.datetime(2013, 2, 1, 0, 0): 2375, datetime.datetime(2014, 6, 1, 0, 0): 2931, datetime.datetime(2012, 3, 1, 0, 0): 2743, datetime.datetime(2014, 12, 1, 0, 0): 2857, datetime.datetime(2012, 4, 1, 0, 0): 2795, datetime.datetime(2012, 10, 1, 0, 0): 2733, datetime.datetime(2013, 10, 1, 0, 0): 2808, datetime.datetime(2012, 9, 1, 0, 0): 2852, datetime.datetime(2012, 8, 1, 0, 0): 2954, datet
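The dictionary printed above is not in chronological order, which makes it hard to spot trends over time. One possible way to get a readable monthly listing is to sort the items by date before printing. This is a minimal sketch: `sample_year_months` is a hypothetical stand-in for the year and month columns of `data_1`.

```python
import datetime
from collections import Counter

# Hypothetical (year, month) strings standing in for row[1] and row[2].
sample_year_months = [('2013', '02'), ('2012', '01'), ('2012', '01'), ('2014', '10')]

# The data has no day-of-month, so every date is pinned to day 1.
date_counts = Counter(
    datetime.datetime(year=int(y), month=int(m), day=1)
    for y, m in sample_year_months
)

# Sort the (date, count) pairs by date for a chronological view.
sorted_counts = sorted(date_counts.items())
for date, count in sorted_counts:
    print(date.strftime('%Y-%m'), count)
```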

Below, we count total incidents by sex and then by race, using the same approach.

In [17]:
sex_counts = {}
race_counts = {}

for row in data_1:
    if row[5] in sex_counts:
        sex_counts[row[5]] = sex_counts[row[5]] + 1
    else:
        sex_counts[row[5]] = 1
        
for row in data_1:
    if row[7] in race_counts:
        race_counts[row[7]] = race_counts[row[7]] + 1
    else:
        race_counts[row[7]] = 1

print(sex_counts)
print(race_counts)
        

{'F': 14449, 'M': 86349}
{'Black': 23296, 'Asian/Pacific Islander': 1326, 'Hispanic': 9022, 'Native American/Native Alaskan': 917, 'White': 66237}
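The cells so far repeat the same counting loop with hard-coded column indexes (1, 5, 7). As a sketch of one alternative, a small helper could look the column up by its header name instead. The function name `count_column` and the `sample_rows` data are illustrative assumptions, not part of the original notebook.

```python
def count_column(rows, headers, column_name):
    """Count how often each value appears in the named column."""
    index = headers.index(column_name)  # raises ValueError if the name is missing
    counts = {}
    for row in rows:
        value = row[index]
        counts[value] = counts.get(value, 0) + 1
    return counts


headers = ['', 'year', 'month', 'intent', 'police', 'sex', 'age', 'race',
           'hispanic', 'place', 'education']

# Hypothetical sample rows; the notebook itself would pass data_1.
sample_rows = [
    ['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'],
    ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'],
    ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4'],
]

sex_counts = count_column(sample_rows, headers, 'sex')
race_counts = count_column(sample_rows, headers, 'race')
print(sex_counts)
print(race_counts)
```

Looking columns up by name keeps the code readable and avoids silent errors if the column order ever changes.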


While it's interesting to see these counts, they can be misleading without knowing each race's share of the population (that's why crime rates are usually expressed as a rate per 100,000 people). For example, incidents among Native Americans/Native Alaskans are low, but this may simply reflect their small population. We look at this in more detail below by combining our existing data with a new dataset, 'census.csv'.

In [18]:
# Read the census data; the with-block closes the file afterwards.
with open("census.csv", 'r') as f:
    data_2 = list(csv.reader(f))
print(data_2)

[['Id', 'Year', 'Id', 'Sex', 'Id', 'Hispanic Origin', 'Id', 'Id2', 'Geography', 'Total', 'Race Alone - White', 'Race Alone - Hispanic', 'Race Alone - Black or African American', 'Race Alone - American Indian and Alaska Native', 'Race Alone - Asian', 'Race Alone - Native Hawaiian and Other Pacific Islander', 'Two or More Races'], ['cen42010', 'April 1, 2010 Census', 'totsex', 'Both Sexes', 'tothisp', 'Total', '0100000US', '', 'United States', '308745538', '197318956', '44618105', '40250635', '3739506', '15159516', '674625', '6984195']]


In [8]:
mapping = {
    'Asian/Pacific Islander':15834141,
    'Black':40250635,
    'Native American/Native Alaskan':3739506,
    'Hispanic':44618105,
    'White':197318956
    }

race_per_hundredk = {}

for key,val in race_counts.items():
    if key in mapping:
        race_per_hundredk[key] = (race_counts[key]/mapping[key])*100000
    
print(race_per_hundredk)    


{'White': 33.56849303419181, 'Black': 57.8773477735196, 'Hispanic': 20.220491210910907, 'Native American/Native Alaskan': 24.521955573811088, 'Asian/Pacific Islander': 8.374309664161762}
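The population figures in `mapping` were typed in by hand from the census output. As a sketch of how they could be derived instead, the census header and data row can be zipped into a dictionary. Note that the 'Asian/Pacific Islander' figure is the sum of two census columns. The names `census_header`, `census_row`, and `census` are introduced here for illustration only; the values are copied from the census.csv output shown earlier.

```python
census_header = ['Id', 'Year', 'Id', 'Sex', 'Id', 'Hispanic Origin', 'Id', 'Id2',
                 'Geography', 'Total', 'Race Alone - White', 'Race Alone - Hispanic',
                 'Race Alone - Black or African American',
                 'Race Alone - American Indian and Alaska Native',
                 'Race Alone - Asian',
                 'Race Alone - Native Hawaiian and Other Pacific Islander',
                 'Two or More Races']
census_row = ['cen42010', 'April 1, 2010 Census', 'totsex', 'Both Sexes', 'tothisp',
              'Total', '0100000US', '', 'United States', '308745538', '197318956',
              '44618105', '40250635', '3739506', '15159516', '674625', '6984195']

# The repeated 'Id' headers collapse into one key here, which is harmless
# because only the uniquely named race columns are used below.
census = dict(zip(census_header, census_row))

mapping = {
    # guns.csv groups Asian and Pacific Islander victims into one category,
    # so the two census columns are added together.
    'Asian/Pacific Islander': int(census['Race Alone - Asian'])
    + int(census['Race Alone - Native Hawaiian and Other Pacific Islander']),
    'Black': int(census['Race Alone - Black or African American']),
    'Native American/Native Alaskan': int(census['Race Alone - American Indian and Alaska Native']),
    'Hispanic': int(census['Race Alone - Hispanic']),
    'White': int(census['Race Alone - White']),
}
print(mapping)
```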


From the output above, we can tell, for example, that for Whites, about 33.57 out of every 100,000 people were killed by guns between 2012 and 2014. However, because these gun-related deaths include suicides, accidents, and other intents, it would be interesting to look at homicides only.

In [33]:
intents = []

for row in data_1:
    intents.append(row[3])

races = []
for row in data_1:
    races.append(row[7])

homicide_race_counts = {}

for i, race in enumerate(races):
    if intents[i] == "Homicide":
        if race in homicide_race_counts:
            homicide_race_counts[race] = homicide_race_counts[race] + 1
        else:
            homicide_race_counts[race] = 1

for key,val in homicide_race_counts.items():
    if key in mapping:
        homicide_race_counts[key] = (homicide_race_counts[key]/mapping[key])*100000
    
print(homicide_race_counts)
    

{'Black': 48.471284987180944, 'White': 4.6356417981453335, 'Hispanic': 12.627161104219914, 'Native American/Native Alaskan': 8.717729026240365, 'Asian/Pacific Islander': 3.530346230970155}


As we can see, the gun homicide rate is highest among Blacks (about 48.47 per 100,000) and lowest among Asian/Pacific Islanders (about 3.53 per 100,000). Additional calculations could be done on homicide rates among different educational backgrounds; it would be interesting to see whether a high number of those with high levels of education die from homicide.
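A first step toward the education analysis suggested above might look like the sketch below, which counts homicides per education level. The rows in `sample_rows` are hypothetical, and the `'Unknown code'` fallback is an assumption for any education value outside the five documented codes.

```python
# Labels taken from the variable list at the top of this notebook.
EDUCATION_LABELS = {
    '1': 'Less than High School',
    '2': 'Graduated from High School or equivalent',
    '3': 'Some College',
    '4': 'At least graduated from College',
    '5': 'Not available',
}

# Hypothetical sample rows in the guns.csv layout
# (index 3 is intent, index 10 is education).
sample_rows = [
    ['1', '2012', '01', 'Homicide', '0', 'M', '34', 'White', '100', 'Home', '4'],
    ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'],
    ['3', '2013', '02', 'Homicide', '0', 'M', '60', 'Black', '100', 'Street', '2'],
    ['4', '2014', '03', 'Homicide', '0', 'M', '25', 'Black', '100', 'Home', '2'],
]

homicide_education_counts = {}
for row in sample_rows:
    if row[3] == 'Homicide':
        # Assumption: codes outside 1-5 are grouped under a fallback label.
        label = EDUCATION_LABELS.get(row[10], 'Unknown code')
        homicide_education_counts[label] = homicide_education_counts.get(label, 0) + 1

print(homicide_education_counts)
```

These are raw counts. Turning them into rates would need population totals by education level, which the census.csv row shown earlier does not contain.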