# Exploring Gun Deaths in the US

The dataset is from [FiveThirtyEight](https://fivethirtyeight.com/), and can be found [here](https://raw.githubusercontent.com/fivethirtyeight/guns-data/master/full_data.csv).
It contains information on gun deaths in the US from 2012 to 2014. Each row in the dataset represents a single death by gunshot. The columns contain demographic and other information about the victims.

**year** -- the year in which the fatality occurred.

**month** -- the month in which the fatality occurred.

**intent** -- the intent of the perpetrator of the crime. This can be Suicide, Accidental, NA, Homicide, or Undetermined.

**police** -- whether a police officer was involved with the shooting. Either 0 (false) or 1 (true).

**sex** -- the gender of the victim. Either M or F.

**age** -- the age of the victim.

**race** -- the race of the victim. Either Asian/Pacific Islander, Native American/Native Alaskan, Black, Hispanic, or White.

**hispanic** -- a code indicating the Hispanic origin of the victim.

**place** -- where the shooting occurred. Has several categories, which you're encouraged to explore on your own.

**education** -- educational status of the victim. Can be one of the following:

1. Less than High School
2. Graduated from High School or equivalent
3. Some College
4. At least graduated from College
5. Not available

### In this project, I'll explore the dataset to find the total number of gun deaths by gender and race in the US. US Census data will also be used to determine the rate of gun deaths per 100,000 people for each race.



# Read CSV 

In [8]:
import csv

# open the file in a with block so it is closed automatically
with open('guns.csv', newline='') as f:
    read = csv.reader(f)

    # convert the file into a list of lists
    data = list(read)
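As a side note, `csv.DictReader` from the same standard-library module yields one dictionary per row, keyed by the header names, which avoids having to remember column positions. The sketch below is only an illustration: it reads a small in-memory sample copied from the first rows of the file rather than `guns.csv` itself, and the names `sample` and `rows` are hypothetical.

```python
import csv
import io

# Small in-memory sample copied from the first rows of the dataset
# (an illustration only; the real analysis reads guns.csv).
sample = io.StringIO(
    ',year,month,intent,police,sex,age,race,hispanic,place,education\n'
    '1,2012,01,Suicide,0,M,34,Asian/Pacific Islander,100,Home,BA+\n'
    '2,2012,01,Suicide,0,F,21,White,100,Street,Some college\n'
)

# DictReader treats the first line as the field names
rows = list(csv.DictReader(sample))

print(rows[0]['intent'], rows[0]['race'])
print(rows[1]['sex'], rows[1]['age'])
```

The rest of this notebook keeps the list-of-lists form, so columns are still addressed by index.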

In [9]:
# view the first 5 rows (the header plus 4 records)
data[:5]

[['',
  'year',
  'month',
  'intent',
  'police',
  'sex',
  'age',
  'race',
  'hispanic',
  'place',
  'education'],
 ['1',
  '2012',
  '01',
  'Suicide',
  '0',
  'M',
  '34',
  'Asian/Pacific Islander',
  '100',
  'Home',
  'BA+'],
 ['2',
  '2012',
  '01',
  'Suicide',
  '0',
  'F',
  '21',
  'White',
  '100',
  'Street',
  'Some college'],
 ['3',
  '2012',
  '01',
  'Suicide',
  '0',
  'M',
  '60',
  'White',
  '100',
  'Other specified',
  'BA+'],
 ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', 'BA+']]

# Remove header

Before analyzing the data, the header row needs to be separated from the records so that every remaining row describes a single death.

In [11]:
headers = data[0]

data = data[1:]

print(data[:5])

[['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', 'Some college'], ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', 'BA+'], ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', 'BA+'], ['5', '2012', '02', 'Suicide', '0', 'M', '31', 'White', '100', 'Other specified', 'HS/GED'], ['6', '2012', '02', 'Suicide', '0', 'M', '17', 'Native American/Native Alaskan', '100', 'Home', 'Less than HS']]
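Note that `data = data[1:]` drops one more row every time the cell is re-run, and the output above starts at row 2, which suggests that happened here. A minimal sketch of a guard, assuming the header can be recognised by the `'year'` label in column 1 (the helper name `split_header` and the toy `table` are hypothetical):

```python
def split_header(rows):
    """Return (headers, body); leave rows untouched if the header is already gone."""
    if rows and rows[0][1] == 'year':
        return rows[0], rows[1:]
    return None, rows


# Toy table with the same layout as the first three columns of the dataset
table = [['', 'year', 'month'], ['1', '2012', '01'], ['2', '2012', '01']]

headers, body = split_header(table)

# Calling it again on the result is harmless: nothing else is removed
_, body_again = split_header(body)

print(headers, len(body), len(body_again))
```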


# Total number of Gun deaths per year


In [4]:
#Extract the 'year' column from the data
years = [i[1] for i in data]

# Map each unique year to its total number of gun deaths
year_counts = {}
for year in years:
    if year in year_counts:
        year_counts[year] += 1
    else:
        year_counts[year] = 1
print(year_counts)



{'2012': 33563, '2014': 33599, '2013': 33636}


The yearly totals for 2012, 2013, and 2014 are very close to one another.
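The counting loop above can also be written with `collections.Counter` from the standard library. The sketch below runs on a toy list, not the real `year` column, and the names `toy_years` and `toy_counts` are only for illustration.

```python
from collections import Counter

# Toy stand-in for the 'year' column (not the real data)
toy_years = ['2012', '2013', '2012', '2014', '2013', '2012']

# Counter builds the same value -> count mapping as the manual loop
toy_counts = Counter(toy_years)

print(toy_counts)
print(toy_counts.most_common(1))
```

A `Counter` is a `dict` subclass, so it can be used anywhere the hand-built dictionary is used.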



# Total number of Gun deaths per month

Explore further by calculating the total number of gun deaths for each month in each year.




In [5]:
import datetime

# Represent each death by the first day of its month so that deaths can be grouped per month
dates = [datetime.datetime(year=int(i[1]), month=int(i[2]), day=1) for i in data]

date_count = {}

for date in dates:
    if date in date_count:
        date_count[date] += 1
    else:
        date_count[date] = 1
        
date_count

{datetime.datetime(2012, 1, 1, 0, 0): 2758,
 datetime.datetime(2012, 2, 1, 0, 0): 2357,
 datetime.datetime(2012, 3, 1, 0, 0): 2743,
 datetime.datetime(2012, 4, 1, 0, 0): 2795,
 datetime.datetime(2012, 5, 1, 0, 0): 2999,
 datetime.datetime(2012, 6, 1, 0, 0): 2826,
 datetime.datetime(2012, 7, 1, 0, 0): 3026,
 datetime.datetime(2012, 8, 1, 0, 0): 2954,
 datetime.datetime(2012, 9, 1, 0, 0): 2852,
 datetime.datetime(2012, 10, 1, 0, 0): 2733,
 datetime.datetime(2012, 11, 1, 0, 0): 2729,
 datetime.datetime(2012, 12, 1, 0, 0): 2791,
 datetime.datetime(2013, 1, 1, 0, 0): 2864,
 datetime.datetime(2013, 2, 1, 0, 0): 2375,
 datetime.datetime(2013, 3, 1, 0, 0): 2862,
 datetime.datetime(2013, 4, 1, 0, 0): 2798,
 datetime.datetime(2013, 5, 1, 0, 0): 2806,
 datetime.datetime(2013, 6, 1, 0, 0): 2920,
 datetime.datetime(2013, 7, 1, 0, 0): 3079,
 datetime.datetime(2013, 8, 1, 0, 0): 2859,
 datetime.datetime(2013, 9, 1, 0, 0): 2742,
 datetime.datetime(2013, 10, 1, 0, 0): 2808,
 datetime.datetime(2013, 11,

There appears to be a seasonal trend: gun deaths increase in the warmer months, peak in July, and decline in winter.
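Months differ in length, so part of the dip in February could simply come from having fewer days. One way to check is to divide each monthly total by the number of days in that month. The sketch below does this for 2012 only, using the totals copied from the output above; the names `deaths_2012` and `per_day` are hypothetical.

```python
import calendar

# Monthly totals for 2012, copied from the output above
deaths_2012 = {1: 2758, 2: 2357, 3: 2743, 4: 2795, 5: 2999, 6: 2826,
               7: 3026, 8: 2954, 9: 2852, 10: 2733, 11: 2729, 12: 2791}

# monthrange returns (weekday of the first day, number of days in the month)
per_day = {m: n / calendar.monthrange(2012, m)[1] for m, n in deaths_2012.items()}

# List the months from highest to lowest average deaths per day
for month in sorted(per_day, key=per_day.get, reverse=True):
    print(calendar.month_abbr[month], round(per_day[month], 1))
```

For 2012, July still ranks highest and February lowest after this adjustment, so the seasonal pattern is not just an effect of month length.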

# Analyze the relationship of gender and race with gun deaths in the US

In [14]:
gender_counts = {}

genders = [i[5] for i in data]
for gender in genders:
    if gender in gender_counts:
        gender_counts[gender] += 1
    else:
        gender_counts[gender] = 1
            
gender_counts 

{'F': 14449, 'M': 86348}

In [15]:
race_counts = {}

races = [i[7] for i in data]
for race in races:
    if race in race_counts:
        race_counts[race] += 1
    else:
        race_counts[race] = 1
            
race_counts 

{'Asian/Pacific Islander': 1325,
 'Black': 23296,
 'Hispanic': 9022,
 'Native American/Native Alaskan': 917,
 'White': 66237}

## Findings so far

Gun deaths in the US disproportionately affect men: according to the data, about 86% of the victims are male. The deaths are also unevenly distributed across races; the majority of victims are either Black or White.

It would be helpful to have data on each race's share of the overall US population.

There also seems to be a seasonal trend: gun deaths rise in the warmer months (May to September), peak in the summer, and decline in the winter.
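The percentages behind these findings can be derived directly from the count dictionaries. The sketch below re-enters the counts from the outputs above so that it runs on its own; the helper name `shares` is hypothetical.

```python
# Counts copied from the outputs above
gender_counts = {'F': 14449, 'M': 86348}
race_counts = {'Asian/Pacific Islander': 1325,
               'Black': 23296,
               'Hispanic': 9022,
               'Native American/Native Alaskan': 917,
               'White': 66237}


def shares(counts):
    """Convert raw counts to percentages of the total, rounded to one decimal."""
    total = sum(counts.values())
    return {k: round(100 * v / total, 1) for k, v in counts.items()}


print(shares(gender_counts))
print(shares(race_counts))
```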



# Comparing US Gun Deaths by Race Using the US Census Data


The total number of deaths per race was explored above. To compare the numbers meaningfully, we need to compute the rate of gun deaths per **100,000** people of each race.

Import census data that contains the total population of the US, as well as the total population of each racial group.


In [18]:
# read the census file and close it automatically
with open('census.csv', newline='') as f:
    census = list(csv.reader(f))
census

[['Id',
  'Year',
  'Id',
  'Sex',
  'Id',
  'Hispanic Origin',
  'Id',
  'Id2',
  'Geography',
  'Total',
  'Race Alone - White',
  'Race Alone - Hispanic',
  'Race Alone - Black or African American',
  'Race Alone - American Indian and Alaska Native',
  'Race Alone - Asian',
  'Race Alone - Native Hawaiian and Other Pacific Islander',
  'Two or More Races'],
 ['cen42010',
  'April 1, 2010 Census',
  'totsex',
  'Both Sexes',
  'tothisp',
  'Total',
  '0100000US',
  '',
  'United States',
  '308745538',
  '197318956',
  '44618105',
  '40250635',
  '3739506',
  '15159516',
  '674625',
  '6984195']]

In order to get from the raw counts of gun deaths by race to a rate of gun deaths per **100,000** people in each race, we'll need to divide the number of gun deaths for each race by that race's population and multiply by 100,000.

In [19]:
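# Population of each racial group, keyed by the race labels used in the gun data.
# The census reports Asian and Pacific Islander separately, so the two are summed.

mapping = {'Asian/Pacific Islander': 15159516 + 674625,
           'Black': 40250635,
           'Hispanic': 44618105,
           'Native American/Native Alaskan': 3739506,
           'White': 197318956
}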

race_per_hundredk = {}
for k,v in race_counts.items():
    race_per_hundredk[k] = (v / mapping[k])*100000
race_per_hundredk    

{'Asian/Pacific Islander': 8.36799419684339,
 'Black': 57.8773477735196,
 'Hispanic': 20.220491210910907,
 'Native American/Native Alaskan': 24.521955573811088,
 'White': 33.56849303419181}

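Black Americans have the highest rate of gun deaths, at about 57.9 per 100,000 people. Because three years of deaths (2012 to 2014) are divided by a single 2010 Census population, these figures are rates over the whole three-year period, not annual rates.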
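Since the counts span 2012 to 2014, a rough annual figure can be obtained by dividing each rate by three. This is a minimal sketch, assuming each group's population stayed at its 2010 Census level for all three years; the names `YEARS_COVERED` and `annual_rate` are hypothetical.

```python
# Rates per 100,000 over the full period, copied from the output above
race_per_hundredk = {'Asian/Pacific Islander': 8.36799419684339,
                     'Black': 57.8773477735196,
                     'Hispanic': 20.220491210910907,
                     'Native American/Native Alaskan': 24.521955573811088,
                     'White': 33.56849303419181}

YEARS_COVERED = 3  # 2012, 2013 and 2014

# Average yearly rate, assuming a constant population (an approximation)
annual_rate = {race: rate / YEARS_COVERED for race, rate in race_per_hundredk.items()}

for race, rate in sorted(annual_rate.items(), key=lambda kv: kv[1], reverse=True):
    print(f'{race}: {rate:.1f}')
```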

# Homicide Rate by Gun


In [15]:
intents = [i[3] for i in data]
races = [i[7] for i in data]

homicide_race_counts = {}

for i,v in enumerate(races):
    if intents[i] == 'Homicide':
        if v in homicide_race_counts:
            homicide_race_counts[v] += 1
        else:
            homicide_race_counts[v] = 1
            
homicide_race_counts    

race_per_hundredk_homicides ={}
for k,v in homicide_race_counts.items():
    race_per_hundredk_homicides[k] = (v / mapping[k]) * 100000
    
race_per_hundredk_homicides    

{'Asian/Pacific Islander': 3.530346230970155,
 'Black': 48.471284987180944,
 'Hispanic': 12.627161104219914,
 'Native American/Native Alaskan': 8.717729026240365,
 'White': 4.6356417981453335}

It appears that gun-related homicides disproportionately affect Black and Hispanic people.
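Dividing each homicide rate by the corresponding overall rate gives the share of a group's gun deaths that were homicides; the population cancels out, so this equals homicide count over total count. The sketch below uses the two rate dictionaries copied from the outputs above; the name `homicide_share` is hypothetical.

```python
# Overall and homicide rates per 100,000, copied from the outputs above
race_per_hundredk = {'Asian/Pacific Islander': 8.36799419684339,
                     'Black': 57.8773477735196,
                     'Hispanic': 20.220491210910907,
                     'Native American/Native Alaskan': 24.521955573811088,
                     'White': 33.56849303419181}
race_per_hundredk_homicides = {'Asian/Pacific Islander': 3.530346230970155,
                               'Black': 48.471284987180944,
                               'Hispanic': 12.627161104219914,
                               'Native American/Native Alaskan': 8.717729026240365,
                               'White': 4.6356417981453335}

# Fraction of each group's gun deaths that were homicides
homicide_share = {race: race_per_hundredk_homicides[race] / race_per_hundredk[race]
                  for race in race_per_hundredk}

for race, share in sorted(homicide_share.items(), key=lambda kv: kv[1], reverse=True):
    print(f'{race}: {share:.1%}')
```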

Some areas to explore further:

1. The link between month and homicide rate.
2. Homicide rate by gender.
3. The rates of other intents by gender and race.
4. Gun death rates by location and education.


# Create count_category() Function

Since I am repeating the same operations, I will create a function that takes a column index as its parameter and calculates the total number of gun deaths for each value in that column.

In [23]:
def count_category(column):
    """Return a dictionary mapping each value in the given column of data to its row count."""
    col_list = [i[column] for i in data]
    return_dict = {}
    
    for v in col_list:
        if v in return_dict:
            return_dict[v] += 1
        else:
            return_dict[v] = 1
            
    return return_dict         
            

In [24]:
gun_death_by_intent = count_category(3)
gun_death_by_intent

{'Accidental': 1639,
 'Homicide': 35176,
 'NA': 1,
 'Suicide': 63174,
 'Undetermined': 807}

In [26]:
gun_death_by_age = count_category(6)
gun_death_by_age 

{'0': 33,
 '1': 38,
 '10': 53,
 '100': 1,
 '101': 2,
 '102': 2,
 '107': 1,
 '11': 61,
 '12': 117,
 '13': 229,
 '14': 364,
 '15': 561,
 '16': 864,
 '17': 1185,
 '18': 1753,
 '19': 2065,
 '2': 50,
 '20': 2219,
 '21': 2504,
 '22': 2712,
 '23': 2472,
 '24': 2437,
 '25': 2230,
 '26': 2231,
 '27': 2070,
 '28': 1986,
 '29': 1955,
 '3': 66,
 '30': 1869,
 '31': 1833,
 '32': 1824,
 '33': 1700,
 '34': 1698,
 '35': 1631,
 '36': 1512,
 '37': 1500,
 '38': 1491,
 '39': 1389,
 '4': 54,
 '40': 1414,
 '41': 1485,
 '42': 1492,
 '43': 1527,
 '44': 1449,
 '45': 1372,
 '46': 1437,
 '47': 1532,
 '48': 1621,
 '49': 1669,
 '5': 43,
 '50': 1674,
 '51': 1755,
 '52': 1715,
 '53': 1708,
 '54': 1684,
 '55': 1596,
 '56': 1625,
 '57': 1472,
 '58': 1510,
 '59': 1430,
 '6': 50,
 '60': 1361,
 '61': 1306,
 '62': 1099,
 '63': 1041,
 '64': 1126,
 '65': 1039,
 '66': 998,
 '67': 865,
 '68': 868,
 '69': 879,
 '7': 43,
 '70': 883,
 '71': 791,
 '72': 736,
 '73': 737,
 '74': 671,
 '75': 676,
 '76': 582,
 '77': 575,
 '78': 598,
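The age keys above are strings, so they are listed in alphabetical order (`'10'` comes before `'2'`). Sorting with `key=int` restores numeric order, and integer division groups the ages into ten-year bands. The sketch below uses only a few entries copied from the output above. It assumes every key is numeric; any non-numeric value (for example an `NA` marker, if the column contains one) would need to be filtered out first.

```python
from collections import Counter

# A few entries copied from the output above; the keys are strings
age_counts = {'0': 33, '1': 38, '10': 53, '100': 1, '2': 50, '20': 2219, '21': 2504}

# Sort by numeric age rather than alphabetically
for age in sorted(age_counts, key=int):
    print(age, age_counts[age])

# Group into ten-year bands, e.g. ages 20-29 fall under key 20
by_decade = Counter()
for age, n in age_counts.items():
    by_decade[int(age) // 10 * 10] += n

print(dict(by_decade))
```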
 

In [28]:
gun_death_by_police = count_category(4)
gun_death_by_police

{'0': 99395, '1': 1402}

In [29]:
gun_death_by_education = count_category(10)
gun_death_by_education

{'BA+': 12945,
 'HS/GED': 42927,
 'Less than HS': 21823,
 'NA': 1422,
 'Some college': 21680}

In [30]:
gun_death_by_place = count_category(9)
gun_death_by_place

{'Farm': 470,
 'Home': 60485,
 'Industrial/construction': 248,
 'NA': 1384,
 'Other specified': 13751,
 'Other unspecified': 8867,
 'Residential institution': 203,
 'School/instiution': 671,
 'Sports': 128,
 'Street': 11151,
 'Trade/service area': 3439}

A large share of gun deaths happen at home. We can explore the intents behind these deaths further.


In [32]:
def group_by(column1, column2, value):
    """Count rows whose column2 equals value, grouped by the values in column1.

    When grouping by race (column index 7), return rates per 100,000 people instead of counts.
    """
    list_column_1 = [row[column1] for row in data]
    list_column_2 = [row[column2] for row in data]
    return_dictionary = {}
    for i, k in enumerate(list_column_1):
        if k not in return_dictionary:
            return_dictionary[k] = 0
        if list_column_2[i] == value:
            return_dictionary[k] += 1

    if column1 == 7:
        # Loop over the unique races rather than over every row
        race_per_hundredk = {}
        for race, count in return_dictionary.items():
            race_per_hundredk[race] = count / mapping[race] * 100000
        return race_per_hundredk

    return return_dictionary
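`group_by` counts one value of `column2` at a time. A possible generalisation, sketched below, counts every combination of two columns in a single pass. The function name `cross_count` and the toy `rows` are hypothetical; the rows are copied from the first records shown earlier and keep the dataset's column order (index 3 is intent, index 9 is place).

```python
from collections import Counter, defaultdict


def cross_count(rows, col_a, col_b):
    """Count rows for every (col_a value, col_b value) combination."""
    table = defaultdict(Counter)
    for row in rows:
        table[row[col_a]][row[col_b]] += 1
    return table


# Toy rows in the same column order as the dataset
rows = [
    ['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', 'BA+'],
    ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', 'Some college'],
    ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', 'BA+'],
    ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', 'BA+'],
]

by_place = cross_count(rows, 9, 3)

print(by_place['Home']['Suicide'])
print(by_place['Street']['Homicide'])  # a missing combination counts as 0
```

Passing the rows in as an argument, instead of reading the global `data`, also makes the function easy to try on a small sample like this one.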

In [33]:
suicide_by_race = group_by(7,3,"Suicide")
suicide_by_race

{'Asian/Pacific Islander': 4.698707684869043,
 'Black': 8.278130270491385,
 'Hispanic': 7.106980451097149,
 'Native American/Native Alaskan': 14.841532544673013,
 'White': 28.06217969245692}

In [34]:
homicide_place = group_by(9,3, 'Homicide')
homicide_place

{'Farm': 76,
 'Home': 13613,
 'Industrial/construction': 85,
 'NA': 1383,
 'Other specified': 5429,
 'Other unspecified': 3604,
 'Residential institution': 73,
 'School/instiution': 361,
 'Sports': 22,
 'Street': 8867,
 'Trade/service area': 1663}

In [35]:
suicide_place = group_by(9,3,'Suicide')
suicide_place

{'Farm': 378,
 'Home': 45414,
 'Industrial/construction': 155,
 'NA': 0,
 'Other specified': 8031,
 'Other unspecified': 4774,
 'Residential institution': 125,
 'School/instiution': 295,
 'Sports': 98,
 'Street': 2181,
 'Trade/service area': 1723}

The majority of gun deaths in homes, on farms, and in residential institutions were suicides.
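This can be checked by dividing the suicide count for each place by the total count for that place. The sketch below re-enters both dictionaries from the outputs above so that it runs on its own; the name `suicide_share` is hypothetical.

```python
# Totals and suicide counts per place, copied from the outputs above
gun_death_by_place = {'Farm': 470, 'Home': 60485, 'Industrial/construction': 248,
                      'NA': 1384, 'Other specified': 13751, 'Other unspecified': 8867,
                      'Residential institution': 203, 'School/instiution': 671,
                      'Sports': 128, 'Street': 11151, 'Trade/service area': 3439}
suicide_place = {'Farm': 378, 'Home': 45414, 'Industrial/construction': 155,
                 'NA': 0, 'Other specified': 8031, 'Other unspecified': 4774,
                 'Residential institution': 125, 'School/instiution': 295,
                 'Sports': 98, 'Street': 2181, 'Trade/service area': 1723}

# Fraction of the gun deaths at each place that were suicides
suicide_share = {place: suicide_place[place] / total
                 for place, total in gun_death_by_place.items()}

for place, share in sorted(suicide_share.items(), key=lambda kv: kv[1], reverse=True):
    print(f'{place}: {share:.0%}')
```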


In [36]:
homicide_race = group_by(7,3,'Homicide')
homicide_race

{'Asian/Pacific Islander': 3.530346230970155,
 'Black': 48.471284987180944,
 'Hispanic': 12.627161104219914,
 'Native American/Native Alaskan': 8.717729026240365,
 'White': 4.6356417981453335}