## US Gun Deaths Data Set

[Original article by FiveThirtyEight about Guns](http://fivethirtyeight.com/features/gun-deaths/)

The data set contains cleaned gun-death data from the CDC for 2012-2014.

### Assignment

- Import the csv
- Read it into a list
- Preview the first 5 entries

In [1]:
import csv

# newline='' lets the csv module handle line endings itself
with open('guns.csv', newline='') as csvfile:
    f = list(csv.reader(csvfile))

print(f[:5])

[['', 'year', 'month', 'intent', 'police', 'sex', 'age', 'race', 'hispanic', 'place', 'education'], ['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'], ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'], ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4'], ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4']]


### Assignment

- Remove the header row from the list of lists
- Save it to a separate list

In [2]:
header = f[0]  # column names
data = f[1:]   # every record after the header row
print(header)

['', 'year', 'month', 'intent', 'police', 'sex', 'age', 'race', 'hispanic', 'place', 'education']
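
As a side note, the same split can be written with star-unpacking. This is only a sketch: the `sample` text below is a hypothetical stand-in for `guns.csv`, built from two rows of the preview above, and the name `rows` is not part of the notebook.

```python
import csv
import io

# Two records copied from the preview above, standing in for guns.csv
sample = io.StringIO(
    ",year,month,intent,police,sex,age,race,hispanic,place,education\n"
    "1,2012,01,Suicide,0,M,34,Asian/Pacific Islander,100,Home,4\n"
    "2,2012,01,Suicide,0,F,21,White,100,Street,3\n"
)
rows = list(csv.reader(sample))

# The first row goes to header, everything else is collected into data
header, *data = rows
print(header)
print(len(data))
```

Star-unpacking raises a `ValueError` on an empty file, whereas slicing with `f[0]` raises an `IndexError`, so either form fails loudly when the header is missing.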


### Assignment

- Count the number of gun deaths by year
    - It may help to do a list comprehension to get the years
    - Iterate over the years with a dictionary to keep count
    
    

In [3]:
dc = {}  # maps each year to its number of deaths
for line in data:
    key = line[1]  # the year column
    dc[key] = dc.get(key, 0) + 1
print(dc)

{'2012': 33563, '2013': 33636, '2014': 33599}
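
The hint about a list comprehension can be followed with `collections.Counter`, which does the dictionary bookkeeping itself. A minimal sketch, assuming rows shaped like `data`; the three `sample_data` rows are made up for illustration.

```python
from collections import Counter

# Hypothetical rows with the same leading columns as `data`
sample_data = [
    ['1', '2012', '01', 'Suicide'],
    ['2', '2012', '01', 'Suicide'],
    ['3', '2013', '02', 'Homicide'],
]

# Pull out the year column, then let Counter tally it
years = [row[1] for row in sample_data]
year_counts = Counter(years)
print(dict(year_counts))
```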


### Assignment

- Import the datetime library
- Create a new list called "dates" with values from the data (set all the day values to 1)
- Count the number of gun deaths by month and year



In [5]:
import datetime as dt

# One date per record, with the day fixed to 1 so records group by month
dates = [dt.date(int(line[1]), int(line[2]), 1) for line in data]

counts = {}  # maps each month (as a date) to its death count
for dat in dates:
    counts[dat] = counts.get(dat, 0) + 1

for key in sorted(counts):
    print("%s : %s" % (key, counts[key]))

2012-01-01 : 2758
2012-02-01 : 2357
2012-03-01 : 2743
2012-04-01 : 2795
2012-05-01 : 2999
2012-06-01 : 2826
2012-07-01 : 3026
2012-08-01 : 2954
2012-09-01 : 2852
2012-10-01 : 2733
2012-11-01 : 2729
2012-12-01 : 2791
2013-01-01 : 2864
2013-02-01 : 2375
2013-03-01 : 2862
2013-04-01 : 2798
2013-05-01 : 2806
2013-06-01 : 2920
2013-07-01 : 3079
2013-08-01 : 2859
2013-09-01 : 2742
2013-10-01 : 2808
2013-11-01 : 2758
2013-12-01 : 2765
2014-01-01 : 2651
2014-02-01 : 2361
2014-03-01 : 2684
2014-04-01 : 2862
2014-05-01 : 2864
2014-06-01 : 2931
2014-07-01 : 2884
2014-08-01 : 2970
2014-09-01 : 2914
2014-10-01 : 2865
2014-11-01 : 2756
2014-12-01 : 2857
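
February has the lowest count in each year, but it is also the shortest month. One way to check how much of the dip is due to month length is to divide each count by the number of days in that month. The sketch below uses only the first three counts from the output above; `monthly` and `per_day` are names introduced here for illustration.

```python
import calendar
import datetime as dt

# Monthly counts copied from the output above (first quarter of 2012 only)
monthly = {
    dt.date(2012, 1, 1): 2758,
    dt.date(2012, 2, 1): 2357,
    dt.date(2012, 3, 1): 2743,
}

per_day = {}
for month, deaths in sorted(monthly.items()):
    # monthrange returns (weekday of the first day, number of days in the month)
    days = calendar.monthrange(month.year, month.month)[1]
    per_day[month] = deaths / days
    print("%s : %.1f per day" % (month, per_day[month]))
```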


### Assignment

- Find the number of gun deaths by Sex
- Find the number of gun deaths by Race
- How does this compare to the overall population in the US?

In [44]:
import pandas as pd

df = pd.DataFrame(data, columns=header)
sex = df.groupby('sex').count()
print(sex.iloc[:, 0])  # number of gun deaths by sex (first column of the counts)

sex
F    14449
M    86349
Name: , dtype: int64
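
To put the two counts in proportion, each can be divided by the total. A minimal sketch in plain Python using the counts printed above; `sex_counts` and `share` are illustrative names, not part of the notebook.

```python
# Counts copied from the output above
sex_counts = {"F": 14449, "M": 86349}

total_deaths = sum(sex_counts.values())
share = {sex: count / total_deaths for sex, count in sex_counts.items()}

for sex, fraction in share.items():
    print("%s : %.1f%%" % (sex, fraction * 100))
```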


In [89]:
race = df.groupby('race').size()
total = len(df)  # total number of records
print(race)  # number of gun deaths by race

race
Asian/Pacific Islander             1326
Black                             23296
Hispanic                           9022
Native American/Native Alaskan      917
White                             66237
dtype: int64


In [120]:
df_race = pd.DataFrame(race, columns=["death"])
df_race["ratio_death"] = df_race["death"] / total  # share of all gun deaths
print(df_race)

                                death  ratio_death
race                                              
Asian/Pacific Islander           1326     0.013155
Black                           23296     0.231116
Hispanic                         9022     0.089506
Native American/Native Alaskan    917     0.009097
White                           66237     0.657126


In [95]:
mapping = {
    "Asian/Pacific Islander": 15159516 + 674625,
    "Native American/Native Alaskan": 3739506,
    "Black": 40250635,
    "Hispanic": 44618105,
    "White": 197318956
}

In [118]:
population = pd.DataFrame()
# Convert the dict views to lists before assigning them as columns
population['race'] = list(mapping.keys())
population['population'] = list(mapping.values())
population['ratio_population'] = population['population'] / population['population'].sum()
population.set_index("race", inplace=True)
print(population)


                                population  ratio_population
race                                                        
Asian/Pacific Islander            15834141          0.052472
Native American/Native Alaskan     3739506          0.012392
Hispanic                          44618105          0.147859
Black                             40250635          0.133386
White                            197318956          0.653891


In [121]:
# Join the death counts and the population figures on the shared race index
result = pd.concat([df_race, population], axis=1, join='inner')
print(result)

                                death  ratio_death  population  \
race                                                             
Asian/Pacific Islander           1326     0.013155    15834141   
Native American/Native Alaskan    917     0.009097     3739506   
Hispanic                         9022     0.089506    44618105   
Black                           23296     0.231116    40250635   
White                           66237     0.657126   197318956   

                                ratio_population  
race                                              
Asian/Pacific Islander                  0.052472  
Native American/Native Alaskan          0.012392  
Hispanic                                0.147859  
Black                                   0.133386  
White                                   0.653891  
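
One way to read the two ratio columns together is to divide the share of deaths by the share of population: a value above 1 means a group accounts for more gun deaths than its population share would suggest, and a value below 1 means fewer. The sketch below copies the ratios from the table above; the name `representation` is introduced here for illustration only.

```python
# (ratio_death, ratio_population) pairs copied from the table above
shares = {
    "Asian/Pacific Islander": (0.013155, 0.052472),
    "Native American/Native Alaskan": (0.009097, 0.012392),
    "Hispanic": (0.089506, 0.147859),
    "Black": (0.231116, 0.133386),
    "White": (0.657126, 0.653891),
}

# Share of deaths relative to share of population, per group
representation = {
    race: ratio_death / ratio_population
    for race, (ratio_death, ratio_population) in shares.items()
}

for race, value in representation.items():
    print("%-32s %.2f" % (race, value))
```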


### Assignment

- Reuse the data structure counting deaths by race
- Use the dictionary below that has the actual population of each race
- Compute the rates of gun deaths per race per 100,000 people

mapping = {
    "Asian/Pacific Islander": 15159516 + 674625,
    "Native American/Native Alaskan": 3739506,
    "Black": 40250635,
    "Hispanic": 44618105,
    "White": 197318956
}

In [122]:
# Deaths per 100,000 people in each group
result["rates"] = result["death"] / result["population"] * 100000
print(result)

                                death  ratio_death  population  \
race                                                             
Asian/Pacific Islander           1326     0.013155    15834141   
Native American/Native Alaskan    917     0.009097     3739506   
Hispanic                         9022     0.089506    44618105   
Black                           23296     0.231116    40250635   
White                           66237     0.657126   197318956   

                                ratio_population      rates  
race                                                         
Asian/Pacific Islander                  0.052472   8.374310  
Native American/Native Alaskan          0.012392  24.521956  
Hispanic                                0.147859  20.220491  
Black                                   0.133386  57.877348  
White                                   0.653891  33.568493  
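
Note that the death counts cover three years (2012-2014), so the rates above are three-year totals per 100,000 people. A rough annual rate can be sketched by dividing by the number of years. This assumes the population figures describe a single year and stayed roughly constant over the period, which the notebook does not state; `YEARS` and `annual_rates` are names introduced here.

```python
# Death counts and population figures copied from the tables above
deaths = {
    "Asian/Pacific Islander": 1326,
    "Native American/Native Alaskan": 917,
    "Hispanic": 9022,
    "Black": 23296,
    "White": 66237,
}
population = {
    "Asian/Pacific Islander": 15159516 + 674625,
    "Native American/Native Alaskan": 3739506,
    "Hispanic": 44618105,
    "Black": 40250635,
    "White": 197318956,
}

YEARS = 3  # 2012, 2013 and 2014 (assumes a constant population over the period)

# Average deaths per year, per 100,000 people
annual_rates = {
    race: deaths[race] / YEARS / population[race] * 100000
    for race in deaths
}

for race, rate in annual_rates.items():
    print("%-32s %.2f" % (race, rate))
```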


### Assignment

You may not know this, but over half of all gun deaths are suicides.

- Redo the computation of rates of gun deaths per race per 100,000 people
- This time, only count deaths whose intent is "Homicide"
- How do these rates differ from the previous calculation?


In [143]:
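# Keep only homicides, then count them per race
race_h = df[df.intent == "Homicide"].groupby('race').size()
result_h = pd.concat([race_h, population], axis=1, join='inner')
# The unnamed size() Series becomes column 0 after the concat
result_h["rates"] = result_h[0] / result_h["population"] * 100000
print(result_h)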


                                    0  population  ratio_population      rates
race                                                                          
Asian/Pacific Islander            559    15834141          0.052472   3.530346
Native American/Native Alaskan    326     3739506          0.012392   8.717729
Hispanic                         5634    44618105          0.147859  12.627161
Black                           19510    40250635          0.133386  48.471285
White                            9147   197318956          0.653891   4.635642
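
To compare this with the previous calculation, it may help to look at what fraction of each group's gun deaths were homicides. The sketch below uses only the counts printed in the two tables above; `homicide_share` is a name introduced here for illustration.

```python
# Counts copied from the outputs above: all gun deaths and homicides per race
all_deaths = {
    "Asian/Pacific Islander": 1326,
    "Native American/Native Alaskan": 917,
    "Hispanic": 9022,
    "Black": 23296,
    "White": 66237,
}
homicides = {
    "Asian/Pacific Islander": 559,
    "Native American/Native Alaskan": 326,
    "Hispanic": 5634,
    "Black": 19510,
    "White": 9147,
}

# Fraction of each group's gun deaths that were homicides
homicide_share = {race: homicides[race] / all_deaths[race] for race in all_deaths}

# Print the groups from the highest share to the lowest
for race, share in sorted(homicide_share.items(), key=lambda item: item[1], reverse=True):
    print("%-32s %.1f%%" % (race, share * 100))
```

Because both rates use the same population denominator, this share equals the homicide rate divided by the overall rate for each group.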
