
**In this project, we'll explore the dataset, and try to find patterns in the demographics of the victims. Our first step is to read the data in and take a look at it.**

In [2]:
import csv

# Read the whole file into a list of rows; the context manager closes the file.
with open("guns.csv", "r") as f:
    data = list(csv.reader(f))
print(data[:5])

[['', 'year', 'month', 'intent', 'police', 'sex', 'age', 'race', 'hispanic', 'place', 'education'], ['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'], ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'], ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4'], ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4']]


**In order to analyze the data properly, we'll have to remove the header row, which contains the name of each column. We can remove it using list slicing.**


In [3]:
headers = data[0]
data = data[1:]
print(headers)
print("\n")
print(data[:5])

['', 'year', 'month', 'intent', 'police', 'sex', 'age', 'race', 'hispanic', 'place', 'education']


[['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'], ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'], ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4'], ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4'], ['5', '2012', '02', 'Suicide', '0', 'M', '31', 'White', '100', 'Other specified', '2']]


**The year column contains information on the year in which gun deaths occurred. We can use this column to calculate how many gun deaths happened in each year.**


In [5]:
years = [item[1] for item in data]
year_counts={}
for k in years:
    if k not in year_counts:
        year_counts[k] = 1
    else:
        year_counts[k] += 1
print(year_counts)        

{'2013': 33636, '2012': 33563, '2014': 33599}
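The counting loop above can also be written with `collections.Counter` from the standard library. The following is a minimal sketch, assuming a short hypothetical list of year values in place of the real `years` list, so the numbers are illustrative only.

```python
from collections import Counter

# Hypothetical sample of year values; in the notebook this would be
# [item[1] for item in data].
sample_years = ["2012", "2012", "2013", "2014", "2013", "2012"]

# Counter tallies how many times each distinct value appears.
year_counts = Counter(sample_years)
print(dict(year_counts))
```

`Counter` is a `dict` subclass, so the result can be indexed and iterated the same way as the hand-built dictionary.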


**It looks like gun deaths didn't change much by year from 2012 to 2014. Let's see if gun deaths in the US change by month and year. To do this, we'll create a datetime.datetime object for each row from the year and month columns, setting the day to 1 because the data has no day column. We can then count gun deaths by date, just as we counted them by year above.**

In [10]:
import datetime
dates = [datetime.datetime(year = int(item[1]),month = int(item[2]),day=1) for item in data]
print(dates[:5])

[datetime.datetime(2012, 1, 1, 0, 0), datetime.datetime(2012, 1, 1, 0, 0), datetime.datetime(2012, 1, 1, 0, 0), datetime.datetime(2012, 2, 1, 0, 0), datetime.datetime(2012, 2, 1, 0, 0)]
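The cell above builds the dates but stops before counting them. One way the count by date might look is sketched below, assuming the first three columns of the first five data rows printed earlier as a stand-in for `data`; the name `date_counts` is introduced here for illustration.

```python
import datetime

# First five data rows as printed earlier, trimmed to index, year and month.
sample_rows = [
    ["1", "2012", "01"],
    ["2", "2012", "01"],
    ["3", "2012", "01"],
    ["4", "2012", "02"],
    ["5", "2012", "02"],
]

# One datetime per row, pinned to the first day of the month.
dates = [
    datetime.datetime(year=int(row[1]), month=int(row[2]), day=1)
    for row in sample_rows
]

# Same dictionary-counting idea as for years, keyed by datetime objects.
date_counts = {}
for date in dates:
    date_counts[date] = date_counts.get(date, 0) + 1

# Print the counts in chronological order.
for date in sorted(date_counts):
    print(date.strftime("%Y-%m"), date_counts[date])
```

Because `datetime.datetime` objects are hashable and sortable, they work as dictionary keys and the monthly counts can be listed in order.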


**The sex and race columns contain potentially interesting information on how gun deaths in the US vary by gender and race. We can explore both columns with the same dictionary-counting technique we used earlier.**

In [12]:
sex_lst = [item[5] for item in data]
sex_counts={}
for k in sex_lst:
    if k not in sex_counts:
        sex_counts[k] = 1
    else:
        sex_counts[k] += 1
        
race_lst = [item[7] for item in data]
race_counts={}
for k in race_lst:
    if k not in race_counts:
        race_counts[k] = 1
    else:
        race_counts[k] += 1 
        
print(sex_counts)
print("\n")
print(race_counts)
        

{'M': 86349, 'F': 14449}


{'White': 66237, 'Native American/Native Alaskan': 917, 'Asian/Pacific Islander': 1326, 'Hispanic': 9022, 'Black': 23296}
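Since the same counting loop is repeated for each column, it could be factored into a small helper. This is a sketch only: `count_column` is a hypothetical name, and the sample rows are the first five data rows printed earlier rather than the full dataset.

```python
def count_column(rows, index):
    """Return a dict mapping each distinct value in column `index` to its row count."""
    counts = {}
    for row in rows:
        value = row[index]
        counts[value] = counts.get(value, 0) + 1
    return counts


# First five data rows from the notebook output above.
sample_rows = [
    ['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'],
    ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'],
    ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4'],
    ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4'],
    ['5', '2012', '02', 'Suicide', '0', 'M', '31', 'White', '100', 'Other specified', '2'],
]

sex_counts = count_column(sample_rows, 5)   # sex is column 5
race_counts = count_column(sample_rows, 7)  # race is column 7
print(sex_counts)
print(race_counts)
```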


**However, our analysis only gives us the total number of gun deaths by race in the US. Unless we know the proportion of each race in the US, we won't be able to meaningfully compare those numbers. What we really want to get is a rate of gun deaths per 100000 people of each race. In order to do this, we'll need to read in data about what percentage of the US population falls into each racial category. Luckily, we can import some census data to help us out.**

The data contains information on the total population of the US, as well as the total population of each racial group in the US. The data is stored in the `census.csv` file and consists of only two rows: a header row and one row of values.

In [13]:
# Read the census file; the context manager closes it when the block ends.
with open("census.csv", "r") as f:
    census = list(csv.reader(f))
print(census)

[['Id', 'Year', 'Id', 'Sex', 'Id', 'Hispanic Origin', 'Id', 'Id2', 'Geography', 'Total', 'Race Alone - White', 'Race Alone - Hispanic', 'Race Alone - Black or African American', 'Race Alone - American Indian and Alaska Native', 'Race Alone - Asian', 'Race Alone - Native Hawaiian and Other Pacific Islander', 'Two or More Races'], ['cen42010', 'April 1, 2010 Census', 'totsex', 'Both Sexes', 'tothisp', 'Total', '0100000US', '', 'United States', '308745538', '197318956', '44618105', '40250635', '3739506', '15159516', '674625', '6984195']]




**In order to get from the raw counts of gun deaths by race to a rate of gun deaths per 100000 people in each race, we'll need to divide the total number of gun deaths by the population of each race. From the census dataset, we know that the number of people in the White racial category is 197318956. We'd divide 66237 by 197318956:**

white_gun_death_rate = 66237 / 197318956

This gives us the fraction of people in the White census race category who were killed by a gun in the US from 2012 to 2014 (a proportion, not a percentage). If you do this computation, you'll see that the rate is a very small number, 0.0003356849303419181. For this reason, it's typical to express crime statistics as a "rate per 100000", which tells you how many people out of every 100000 in a given group were killed by guns in the US. To get this, we just multiply by 100000:

rate_per_hundredk = 0.0003356849303419181 * 100000

This gives us about 33.57, which we can interpret as "roughly 33.57 out of every 100000 people in the White census race category in the US were killed by guns between 2012 and 2014".
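The two steps above (divide, then scale) can be wrapped in one small function. This is a minimal sketch; `rate_per_100k` is a hypothetical helper name, and the inputs are the White death count and census population quoted above.

```python
def rate_per_100k(deaths, population):
    """Deaths per 100000 people: the raw fraction scaled by 100000."""
    return deaths / population * 100000


# Counts taken from the notebook: 66237 deaths, 197318956 people.
white_rate = rate_per_100k(66237, 197318956)
print(round(white_rate, 2))  # 33.57
```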

We'll need to calculate these same rates for each racial category. The only stumbling block is that the racial categories are named slightly differently in census and in data. We'll need to manually construct a dictionary that allows us to map between them, and perform the division.

Here's each race name in data, followed by the corresponding race name(s) in census:
```
- Asian/Pacific Islander -- Race Alone - Asian plus Race Alone - Native Hawaiian and Other Pacific Islander
- Black -- Race Alone - Black or African American
- Hispanic -- Race Alone - Hispanic
- Native American/Native Alaskan -- Race Alone - American Indian and Alaska Native
- White -- Race Alone - White
```

We'll need to create a dictionary that has each race name from data as a key, and has the population count for the races from census as the values.
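Rather than typing the population figures by hand, the dictionary could be derived from the census rows themselves. The sketch below assumes only the population columns of the two census rows printed earlier; `census_names` is a hypothetical helper dictionary encoding the name correspondence listed above.

```python
# Population columns of the two census rows shown in the notebook output.
census_header = [
    'Total', 'Race Alone - White', 'Race Alone - Hispanic',
    'Race Alone - Black or African American',
    'Race Alone - American Indian and Alaska Native',
    'Race Alone - Asian',
    'Race Alone - Native Hawaiian and Other Pacific Islander',
    'Two or More Races',
]
census_row = [
    '308745538', '197318956', '44618105', '40250635',
    '3739506', '15159516', '674625', '6984195',
]

# Look up each census column by name; values are converted from strings to ints.
population = {name: int(value) for name, value in zip(census_header, census_row)}

# Race name in data -> census column(s) whose populations are summed.
census_names = {
    "Asian/Pacific Islander": [
        "Race Alone - Asian",
        "Race Alone - Native Hawaiian and Other Pacific Islander",
    ],
    "Black": ["Race Alone - Black or African American"],
    "Hispanic": ["Race Alone - Hispanic"],
    "Native American/Native Alaskan": ["Race Alone - American Indian and Alaska Native"],
    "White": ["Race Alone - White"],
}

mapping = {
    race: sum(population[column] for column in columns)
    for race, columns in census_names.items()
}
print(mapping)
```

Deriving the values this way avoids transcription mistakes, at the cost of a slightly longer cell.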

In [21]:
mapping = {
    "Asian/Pacific Islander": 15159516 + 674625,
    "Native American/Native Alaskan": 3739506,
    "Black": 40250635,
    "Hispanic": 44618105,
    "White": 197318956
}

race_per_hundredk = {}
for k,v in race_counts.items():
    race_per_hundredk[k] = (v / mapping[k]) * 100000

race_per_hundredk

{'Asian/Pacific Islander': 8.374309664161762,
 'Black': 57.8773477735196,
 'Hispanic': 20.220491210910907,
 'Native American/Native Alaskan': 24.521955573811088,
 'White': 33.56849303419181}

**We can filter our results, and restrict them to the Homicide intent. This will tell us what the gun-related murder rate per 100000 people in each racial category is. In order to do this, we'll need to redo our work in generating race_counts, but only count rows where the intent was Homicide.**

We can do this by first extracting the intent column, then using the enumerate() function to loop through each index and value in the race column. If the value in the same position in intents is Homicide, we'll count the value in the race column.

Finally, we'll use the mapping dictionary to convert from raw counts to rates.

In [22]:
intents = [item[3] for item in data]
races = [item[7] for item in data]
homicide_race_counts={}
for i,race in enumerate(races):
    if intents[i] == 'Homicide':
        if race not in homicide_race_counts:
            homicide_race_counts[race] = 1
        else:
            homicide_race_counts[race] += 1 
            
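# Use a separate name so the overall rates computed earlier are not overwritten.
homicide_race_per_hundredk = {}
for k, v in homicide_race_counts.items():
    homicide_race_per_hundredk[k] = (v / mapping[k]) * 100000

homicide_race_per_hundredk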

{'Asian/Pacific Islander': 3.530346230970155,
 'Black': 48.471284987180944,
 'Hispanic': 12.627161104219914,
 'Native American/Native Alaskan': 8.717729026240365,
 'White': 4.6356417981453335}
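The index-based lookup with `enumerate()` can also be expressed by pairing the two columns with `zip()`, which removes the need for `intents[i]`. The sketch below assumes two short hypothetical lists standing in for the real `intents` and `races` columns, so the resulting counts are illustrative only.

```python
# Hypothetical stand-ins for [item[3] for item in data] and [item[7] for item in data].
intents = ["Homicide", "Suicide", "Homicide", "Homicide", "Suicide"]
races = ["Black", "White", "White", "Black", "Hispanic"]

# zip() walks both lists in step, yielding one (intent, race) pair per row.
homicide_race_counts = {}
for intent, race in zip(intents, races):
    if intent == "Homicide":
        homicide_race_counts[race] = homicide_race_counts.get(race, 0) + 1

print(homicide_race_counts)
```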

# Hence we find the gun death rates per 100000 people for each racial category in the US, both overall and for homicides alone