# Guided Project: Exploring Gun Deaths in the US

Goal: find something interesting about gun deaths in the US.

## Initializing Dataset

The initial dataset is [guns.csv](https://github.com/fivethirtyeight/guns-data).

In [1]:
import csv

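# A with-block closes the file automatically; newline="" is what the csv docs recommend
with open("guns.csv", "r", newline="") as file:
    data = list(csv.reader(file))
print(data[:5])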

[['', 'year', 'month', 'intent', 'police', 'sex', 'age', 'race', 'hispanic', 'place', 'education'], ['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'], ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'], ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4'], ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4']]


In [2]:
headers = data[0]
data = data[1:]

print("Header: ", headers)
for row in data[:5]:
    print("Row: ", row)

Header:  ['', 'year', 'month', 'intent', 'police', 'sex', 'age', 'race', 'hispanic', 'place', 'education']
Row:  ['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4']
Row:  ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3']
Row:  ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4']
Row:  ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4']
Row:  ['5', '2012', '02', 'Suicide', '0', 'M', '31', 'White', '100', 'Other specified', '2']

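The cells in this notebook address columns by position (`row[1]` for year, `row[5]` for sex, and so on). As an alternative, `csv.DictReader` from the standard library keys each row by its header name. The sketch below assumes the header layout printed above and inlines two sample rows so that it runs without `guns.csv`. It is an illustration, not the method used in the rest of the notebook.

```python
import csv
import io

# Hypothetical sample: the header and first two data rows shown above,
# inlined so the sketch runs without guns.csv.
SAMPLE = """\
,year,month,intent,police,sex,age,race,hispanic,place,education
1,2012,01,Suicide,0,M,34,Asian/Pacific Islander,100,Home,4
2,2012,01,Suicide,0,F,21,White,100,Street,3
"""

# DictReader uses the first line as keys, so columns are read by name
# instead of by index. The unnamed first column gets the key "".
rows = list(csv.DictReader(io.StringIO(SAMPLE)))

for row in rows:
    print(row["year"], row["sex"], row["race"])
```

Reading by name costs a little memory per row, but it protects against silently reading the wrong column if the file's column order ever changes.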

## Exploring Time Frequency

We first check how many gun deaths occur in each year.

In [3]:
years = [row[1] for row in data]

year_counts = {}
for y in years:
    if y in year_counts:
        year_counts[y] += 1
    else:
        year_counts[y] = 1
print(sorted(year_counts.items(), key = lambda x: x[1]))

[('2012', 33563), ('2014', 33599), ('2013', 33636)]


The data covers only 2012, 2013, and 2014, with no drastic difference between the yearly counts.
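The manual "increment, or initialise to 1" loop above is exactly what `collections.Counter` from the standard library does. Here is a minimal sketch on a few hypothetical rows that follow the same layout as `data` (year in column 1):

```python
from collections import Counter

# Hypothetical sample rows in the same layout as `data` (year at index 1).
sample_rows = [
    ["1", "2012", "01", "Suicide"],
    ["2", "2012", "01", "Suicide"],
    ["3", "2013", "02", "Homicide"],
    ["4", "2014", "02", "Suicide"],
]

# Counter builds the same {value: count} mapping as the hand-written loop.
year_counts = Counter(row[1] for row in sample_rows)

# Sorted by count, ascending, as in the cell above.
print(sorted(year_counts.items(), key=lambda x: x[1]))
```

`Counter` is a `dict` subclass, so it could replace the body of the `column_type_counts()` function defined in the next cell without changing how its result is used.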

Next, we check the number of gun deaths for each year and month. For each row of `data`, we build a `datetime` object from that row's year and month (with the day fixed to 1), and append it to the row as a new last column.

To make this easier, we define a `column_type_counts()` function. It is built the same way as the `year_counts` loop, except that it lets us specify which column (index) to count.

In [4]:
import datetime

for row in data:
    yr = int(row[1]) # The row entry is of type string - convert to int
    mo = int(row[2])
    row.append(datetime.datetime(year = yr, month = mo, day = 1))

# Create a function column_type_counts(), with code similar to the year_counts loop
def column_type_counts(dataset, index):
    column = [row[index] for row in dataset]
    result = {}
    for entry in column:
        if entry in result:
            result[entry] += 1
        else:
            result[entry] = 1
    return result
            
# Now, back to date_counts
date_counts = column_type_counts(data, -1)
for key, val in sorted(date_counts.items()):
    print("Year ", key.year, " month ", key.month, ": ", val)

Year  2012  month  1 :  2758
Year  2012  month  2 :  2357
Year  2012  month  3 :  2743
Year  2012  month  4 :  2795
Year  2012  month  5 :  2999
Year  2012  month  6 :  2826
Year  2012  month  7 :  3026
Year  2012  month  8 :  2954
Year  2012  month  9 :  2852
Year  2012  month  10 :  2733
Year  2012  month  11 :  2729
Year  2012  month  12 :  2791
Year  2013  month  1 :  2864
Year  2013  month  2 :  2375
Year  2013  month  3 :  2862
Year  2013  month  4 :  2798
Year  2013  month  5 :  2806
Year  2013  month  6 :  2920
Year  2013  month  7 :  3079
Year  2013  month  8 :  2859
Year  2013  month  9 :  2742
Year  2013  month  10 :  2808
Year  2013  month  11 :  2758
Year  2013  month  12 :  2765
Year  2014  month  1 :  2651
Year  2014  month  2 :  2361
Year  2014  month  3 :  2684
Year  2014  month  4 :  2862
Year  2014  month  5 :  2864
Year  2014  month  6 :  2931
Year  2014  month  7 :  2884
Year  2014  month  8 :  2970
Year  2014  month  9 :  2914
Year  2014  month  10 :  2865
Year  2

These numbers also look quite stable. Interestingly, February has noticeably fewer gun deaths than the other months, partly because it has only 28 or 29 days. In addition, the summer months (6 to 8) appear to have somewhat higher counts.
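Because months differ in length, one way to separate a calendar effect from a real difference is to divide each monthly total by the number of days in that month. The sketch below does this for January and February only, using the counts printed above. It relies on `calendar.monthrange`, which already accounts for leap years (2012).

```python
import calendar

# Monthly totals copied from the output above (January and February only).
monthly = {
    (2012, 1): 2758, (2012, 2): 2357,
    (2013, 1): 2864, (2013, 2): 2375,
    (2014, 1): 2651, (2014, 2): 2361,
}

per_day = {}
for (year, month), deaths in monthly.items():
    # monthrange returns (weekday of the 1st, number of days in the month).
    days = calendar.monthrange(year, month)[1]
    per_day[(year, month)] = deaths / days

for (year, month), rate in sorted(per_day.items()):
    print(year, month, round(rate, 1))
```

On these numbers, February remains below January per day in all three years. However, the gap shrinks once month length is accounted for, most visibly in 2014 (about 85.5 versus 84.3 deaths per day).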

## Exploring Sex and Race Frequencies

We now look at the counts for each sex and race category. We reuse `column_type_counts()`, noting that the `sex` and `race` columns have indices 5 and 7, respectively.

In [5]:
sex_counts = column_type_counts(data, 5)
race_counts = column_type_counts(data, 7)
print(sorted(sex_counts.items()))
print(sorted(race_counts.items()))

[('F', 14449), ('M', 86349)]
[('Asian/Pacific Islander', 1326), ('Black', 23296), ('Hispanic', 9022), ('Native American/Native Alaskan', 917), ('White', 66237)]


A few observations:
1. Far more males than females are gun death victims: males account for about 86% of all deaths.
2. White victims account for the most gun deaths (66,237), followed by Black (23,296) and Hispanic (9,022) victims. The remaining two categories together total 2,243.
3. Further examination could look at the share of suicides, the age and education distributions, and the places of death.
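The percentages in the list can be checked directly from the printed counts. This sketch turns each count into a share of its column total. The `shares()` helper is hypothetical and is not defined elsewhere in the notebook.

```python
# Counts copied from the outputs above.
sex_counts = {"F": 14449, "M": 86349}
race_counts = {
    "Asian/Pacific Islander": 1326,
    "Black": 23296,
    "Hispanic": 9022,
    "Native American/Native Alaskan": 917,
    "White": 66237,
}

def shares(counts):
    """Hypothetical helper: each category's fraction of the column total."""
    total = sum(counts.values())
    return {key: count / total for key, count in counts.items()}

for name, counts in (("sex", sex_counts), ("race", race_counts)):
    print(name, "total:", sum(counts.values()))
    for key, share in sorted(shares(counts).items(), key=lambda x: x[1], reverse=True):
        print(f"  {key}: {share:.1%}")
```

Both totals come to 100,798, the same as the sum of the three yearly counts. Every row therefore falls into one of these sex and race categories. Males make up about 85.7% of the deaths, and White victims about 65.7%.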

## Gun Deaths per 100,000 People by Race

Raw counts per race are less informative than rates per 100,000 people of that race. A rate is closer to the chance that a randomly chosen person of a given race in the US is a gun death victim.

To compute rates, we need the total US population of each race. We take these figures from `census.csv` (no source link is given).
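Written out, the rate for a race $r$ is its death count divided by its population, scaled to 100,000 people:

$$
\text{rate}_r = \frac{\text{deaths}_r}{\text{population}_r} \times 100{,}000
$$

For example, the 66,237 White victims counted above and the census White population of 197,318,956 give $66{,}237 / 197{,}318{,}956 \times 100{,}000 \approx 33.57$.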

In [6]:
import csv

with open("census.csv", "r", newline="") as census_file:
    census = list(csv.reader(census_file))
for row in census:
    print("Row: ", row)

Row:  ['Id', 'Year', 'Id', 'Sex', 'Id', 'Hispanic Origin', 'Id', 'Id2', 'Geography', 'Total', 'Race Alone - White', 'Race Alone - Hispanic', 'Race Alone - Black or African American', 'Race Alone - American Indian and Alaska Native', 'Race Alone - Asian', 'Race Alone - Native Hawaiian and Other Pacific Islander', 'Two or More Races']
Row:  ['cen42010', 'April 1, 2010 Census', 'totsex', 'Both Sexes', 'tothisp', 'Total', '0100000US', '', 'United States', '308745538', '197318956', '44618105', '40250635', '3739506', '15159516', '674625', '6984195']


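We manually enter the census populations into a dictionary `mapping`. The Asian and Native Hawaiian/Other Pacific Islander columns are added together to match the single `Asian/Pacific Islander` category in `guns.csv`. A new dictionary, `race_per_hundredk`, then stores the number of gun deaths per 100,000 people of each race. The populations come from the April 1, 2010 Census, while the deaths span 2012 to 2014. Each value is therefore a rate over the three years combined, and it treats the population as fixed.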

In [7]:
mapping = {"White" : 197318956,
           "Hispanic" : 44618105,
           "Black" : 40250635,
           "Native American/Native Alaskan" : 3739506 ,
           "Asian/Pacific Islander" : 15159516 + 674625}
race_per_hundredk = {}
for key in mapping:
    race_per_hundredk[key] = race_counts[key]/mapping[key]*100000

# Print race_per_hundredk by decreasing values
print(sorted(race_per_hundredk.items(), key = lambda x: x[1], reverse = True))

[('Black', 57.8773477735196), ('White', 33.56849303419181), ('Native American/Native Alaskan', 24.521955573811088), ('Hispanic', 20.220491210910907), ('Asian/Pacific Islander', 8.374309664161762)]
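The rates above sum deaths over 2012 to 2014 against a single 2010 population. A rough per-year figure can be obtained by dividing by the number of years covered. The sketch below assumes the population stays roughly constant over the period, which is an approximation and is not checked here. `YEARS_COVERED` is a hypothetical name.

```python
# Rates per 100,000 copied from the output above; each covers 2012-2014 combined.
race_per_hundredk = {
    "Black": 57.8773477735196,
    "White": 33.56849303419181,
    "Native American/Native Alaskan": 24.521955573811088,
    "Hispanic": 20.220491210910907,
    "Asian/Pacific Islander": 8.374309664161762,
}

# Hypothetical constant: 2012, 2013 and 2014, per the year counts above.
# Assumes the 2010 census population is a fair denominator for every year.
YEARS_COVERED = 3

annual = {race: rate / YEARS_COVERED for race, rate in race_per_hundredk.items()}
for race, rate in sorted(annual.items(), key=lambda x: x[1], reverse=True):
    print(f"{race}: {rate:.1f} per 100,000 per year")
```

Dividing by a constant leaves the ranking unchanged. It only affects how the magnitudes should be read.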


## Homicide Gun Deaths per 100,000 People by Race

We select the rows of `data` whose `intent` is "Homicide" and store them as `homicide_data`. We then process `homicide_data` into `homicide_race_per_hundredk` in the same way as before.
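The overall and homicide calculations repeat the same filter, count, and divide steps. Below is a sketch of how these could be folded into one function. `rates_per_hundredk` is a hypothetical helper, and the toy rows and populations are made up purely to exercise it. The function assumes the `guns.csv` layout, with intent at index 3 and race at index 7.

```python
def rates_per_hundredk(rows, populations, intent=None, race_index=7, intent_index=3):
    """Hypothetical helper: deaths per 100,000 by race, optionally for one intent."""
    counts = {}
    for row in rows:
        if intent is None or row[intent_index] == intent:
            counts[row[race_index]] = counts.get(row[race_index], 0) + 1
    # .get(race, 0) yields a rate of 0 for a race with no matching rows,
    # where indexing the counts directly would raise a KeyError.
    return {race: counts.get(race, 0) / pop * 100000 for race, pop in populations.items()}

# Made-up toy rows in the guns.csv layout, and made-up populations.
toy_rows = [
    ["1", "2012", "01", "Homicide", "0", "M", "25", "Black", "100", "Street", "2"],
    ["2", "2012", "01", "Suicide", "0", "M", "50", "White", "100", "Home", "3"],
    ["3", "2012", "02", "Homicide", "0", "F", "30", "White", "100", "Home", "4"],
]
toy_populations = {"Black": 200000, "White": 400000, "Hispanic": 100000}

all_rates = rates_per_hundredk(toy_rows, toy_populations)
homicide_rates = rates_per_hundredk(toy_rows, toy_populations, intent="Homicide")
print(all_rates)
print(homicide_rates)
```

With the real data, `rates_per_hundredk(data, mapping)` and `rates_per_hundredk(data, mapping, intent="Homicide")` would play the roles of `race_per_hundredk` and `homicide_race_per_hundredk`.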

In [8]:
homicide_data = [row for row in data if row[3] == "Homicide"]
homicide_race_counts = column_type_counts(homicide_data, 7)

# Not needed at the moment:
#print("Homicide gun death counts based on race:\n")
#print(sorted(homicide_race_counts.items()), "\n")

homicide_race_per_hundredk = {}
for key in mapping:
    homicide_race_per_hundredk[key] = homicide_race_counts[key]/mapping[key]*100000

print("Average homicide gun death counts per 100,000 based on race:\n")
print(sorted(homicide_race_per_hundredk.items(), key = lambda x: x[1], reverse = True))

Average homicide gun death counts per 100,000 based on race:

[('Black', 48.471284987180944), ('Hispanic', 12.627161104219914), ('Native American/Native Alaskan', 8.717729026240365), ('White', 4.6356417981453335), ('Asian/Pacific Islander', 3.530346230970155)]


The output above shows that, for homicides, the race with the highest gun death rate per 100,000 people in the US is Black, followed by Hispanic and Native American/Native Alaskan.

Comparing the rates for all gun deaths with the homicide rates reveals a sharp contrast between Black and White victims. These two races have the two highest overall gun death rates (about 57.9 and 33.6 per 100,000). Yet 83.75% of Black victims' gun deaths are homicides, compared with only 13.81% of White victims' gun deaths. Because both rates share the same population denominator, each percentage is simply the homicide rate divided by the overall rate.
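The two percentages can be reproduced from the printed rates alone:

```python
# Per-100,000 rates copied from the two outputs above.
total_rate = {"Black": 57.8773477735196, "White": 33.56849303419181}
homicide_rate = {"Black": 48.471284987180944, "White": 4.6356417981453335}

# Same population denominator, so this ratio equals
# homicide gun deaths / all gun deaths for each race.
for race in total_rate:
    share = homicide_rate[race] / total_rate[race]
    print(f"{race}: {share:.2%} of gun deaths are homicides")
```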

## Conclusion

