## Introducing US Gun Deaths Data

In [5]:
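import csv
# Read every row of guns.csv into a list of lists; the with block closes the file.
with open("guns.csv", newline="") as f:
    data = list(csv.reader(f))
data[:5]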

[['',
  'year',
  'month',
  'intent',
  'police',
  'sex',
  'age',
  'race',
  'hispanic',
  'place',
  'education'],
 ['1',
  '2012',
  '01',
  'Suicide',
  '0',
  'M',
  '34',
  'Asian/Pacific Islander',
  '100',
  'Home',
  '4'],
 ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'],
 ['3',
  '2012',
  '01',
  'Suicide',
  '0',
  'M',
  '60',
  'White',
  '100',
  'Other specified',
  '4'],
 ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4']]

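I read the data into a list of lists named data. Each inner list represents a single row of the file, and each item in an inner list is the value of one column for that row. Because csv.reader does no type conversion, every value is a string, including numbers such as the year and age.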
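
To make the list-of-lists layout concrete, here is a minimal sketch that parses the header and the first data row shown above from an in-memory string rather than from guns.csv. The `sample` text and the use of `io.StringIO` are assumptions made for illustration only.

```python
import csv
import io

# Header and first data row copied from the output above; an in-memory
# stand-in for guns.csv so the sketch runs without the file.
sample = (
    ",year,month,intent,police,sex,age,race,hispanic,place,education\n"
    "1,2012,01,Suicide,0,M,34,Asian/Pacific Islander,100,Home,4\n"
)

rows = list(csv.reader(io.StringIO(sample)))

header_row = rows[0]       # first inner list: the column names
first_record = rows[1]     # second inner list: one record

year_value = first_record[1]   # column 1 is 'year'
age_value = first_record[6]    # column 6 is 'age'

print(header_row[1], year_value)
print(type(age_value).__name__)   # values stay strings until converted
```

Any numeric work on a column such as age would need an explicit `int()` conversion first.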



## Removing Headers From A List Of Lists

The first item in the data list is a header row. In order to analyze the data properly, I'll have to remove the header row, which contains the names of each column. I can remove this using list slicing.

In [6]:
headers = data[0]
data = data[1:]
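
Keeping the header row in headers also makes it possible to look up a column position by name instead of hard-coding it. The following is a minimal sketch on a hypothetical miniature of the dataset, with the rows cut down to four columns; the name `year_idx` is an assumption for illustration.

```python
# Hypothetical miniature of data before the header row is removed.
data = [
    ["", "year", "month", "intent"],
    ["1", "2012", "01", "Suicide"],
    ["2", "2012", "01", "Suicide"],
]

headers = data[0]   # keep the column names for later lookups
data = data[1:]     # everything after the first row is a record

year_idx = headers.index("year")          # position of the 'year' column
years = [row[year_idx] for row in data]   # extract that column by position

print(year_idx, years)
```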



## Counting Gun Deaths By Year

The year column contains information on the year in which gun deaths occurred. I can use this column to calculate how many gun deaths happened in each year.

I can perform this operation by creating a dictionary, then keeping count in the dictionary of how many times each element occurs in the year column.
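
As a small illustration of the counting idea, here is a sketch on a handful of made-up year values. It tallies them once with a plain dictionary and once with `collections.Counter` from the standard library, which is an alternative I'm suggesting here rather than what the analysis itself uses.

```python
from collections import Counter

# Made-up values standing in for the 'year' column.
sample_years = ["2012", "2013", "2012", "2014", "2013", "2012"]

# Manual tally: start each key at 0 and add 1 per occurrence.
manual_counts = {}
for item in sample_years:
    manual_counts[item] = manual_counts.get(item, 0) + 1

# Counter builds the same key -> count mapping in one call.
counter_counts = Counter(sample_years)

print(manual_counts)
print(dict(counter_counts) == manual_counts)
```

Both approaches produce the same mapping; the explicit loop just makes each step visible.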

In [8]:
years = [row[1] for row in data]
years_counts = {}
for item in years:
    if item in years_counts:
        years_counts[item] += 1
    else:
        years_counts[item] = 1
years_counts

{'2012': 33563, '2013': 33636, '2014': 33599}

It looks like gun deaths didn't change much by year from 2012 to 2014. Let's see whether gun deaths in the US vary by month and year. To do this, I'll create a datetime.datetime object for each row from the year and month columns, setting the day to 1 because the dataset has no day column. I'll then be able to count up gun deaths by date.

## Exploring Gun Deaths By Month And Year

In [9]:
import datetime
dates = [datetime.datetime(year = int(row[1]), month = int(row[2]), day = 1) for row in data]
dates[:5]

[datetime.datetime(2012, 1, 1, 0, 0),
 datetime.datetime(2012, 1, 1, 0, 0),
 datetime.datetime(2012, 1, 1, 0, 0),
 datetime.datetime(2012, 2, 1, 0, 0),
 datetime.datetime(2012, 2, 1, 0, 0)]

In [10]:
date_counts = {}
for date in dates:
    if date in date_counts:
        date_counts[date] += 1
    else:
        date_counts[date] = 1
date_counts

{datetime.datetime(2012, 1, 1, 0, 0): 2758,
 datetime.datetime(2012, 2, 1, 0, 0): 2357,
 datetime.datetime(2012, 3, 1, 0, 0): 2743,
 datetime.datetime(2012, 4, 1, 0, 0): 2795,
 datetime.datetime(2012, 5, 1, 0, 0): 2999,
 datetime.datetime(2012, 6, 1, 0, 0): 2826,
 datetime.datetime(2012, 7, 1, 0, 0): 3026,
 datetime.datetime(2012, 8, 1, 0, 0): 2954,
 datetime.datetime(2012, 9, 1, 0, 0): 2852,
 datetime.datetime(2012, 10, 1, 0, 0): 2733,
 datetime.datetime(2012, 11, 1, 0, 0): 2729,
 datetime.datetime(2012, 12, 1, 0, 0): 2791,
 datetime.datetime(2013, 1, 1, 0, 0): 2864,
 datetime.datetime(2013, 2, 1, 0, 0): 2375,
 datetime.datetime(2013, 3, 1, 0, 0): 2862,
 datetime.datetime(2013, 4, 1, 0, 0): 2798,
 datetime.datetime(2013, 5, 1, 0, 0): 2806,
 datetime.datetime(2013, 6, 1, 0, 0): 2920,
 datetime.datetime(2013, 7, 1, 0, 0): 3079,
 datetime.datetime(2013, 8, 1, 0, 0): 2859,
 datetime.datetime(2013, 9, 1, 0, 0): 2742,
 datetime.datetime(2013, 10, 1, 0, 0): 2808,
 datetime.datetime(2013, 11,

## Exploring Gun Deaths By Race And Sex

The sex and race columns contain potentially interesting information on how gun deaths in the US vary by gender and race. 

In [11]:
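sex_counts = {}
race_counts = {}

# Column 5 holds sex and column 7 holds race; tally both in one pass over the rows.
for row in data:
    if row[5] in sex_counts:
        sex_counts[row[5]] += 1
    else:
        sex_counts[row[5]] = 1

    if row[7] in race_counts:
        race_counts[row[7]] += 1
    else:
        race_counts[row[7]] = 1

race_counts
# sex_counts  (uncomment to display the counts by sex instead)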

{'Asian/Pacific Islander': 1326,
 'Black': 23296,
 'Hispanic': 9022,
 'Native American/Native Alaskan': 917,
 'White': 66237}

By raw count, I found that most victims of gun deaths in the US are male, and that the White category has the largest number of deaths (66237).

There is a minor seasonal pattern, with gun deaths peaking in the summer and declining in the winter. It might be useful to filter by intent, to see if different categories of intent have different correlations with season, race, or gender.
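
One rough way to check the seasonal pattern is to add up the monthly totals for a summer and a winter window. The sketch below uses the twelve 2012 values from the date_counts output above. The season definitions (June to August versus December, January, and February of the same calendar year) and the variable names are my own assumptions.

```python
import datetime

# Monthly totals for 2012, copied from the date_counts output above.
counts_2012 = [2758, 2357, 2743, 2795, 2999, 2826,
               3026, 2954, 2852, 2733, 2729, 2791]
date_counts_2012 = {
    datetime.datetime(2012, month, 1): count
    for month, count in enumerate(counts_2012, start=1)
}

# Assumed season definitions, using calendar months within 2012 only.
summer_months = {6, 7, 8}
winter_months = {12, 1, 2}

summer_total = sum(c for d, c in date_counts_2012.items() if d.month in summer_months)
winter_total = sum(c for d, c in date_counts_2012.items() if d.month in winter_months)

print(summer_total, winter_total)
```

For 2012 this gives 8806 deaths in the summer months against 7906 in the winter months. Part of that gap comes from February simply having fewer days, so a per-day comparison would be a fairer test.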

So far, my analysis only gives the total number of gun deaths by race in the US. Unless I know how many people belong to each race in the US, I won't be able to meaningfully compare those numbers. What I really want is a rate of gun deaths per 100000 people of each race. To compute it, I read in census data that gives the US population in each racial category.

## Reading The Census Dataset

In [12]:
# Read census.csv into a list of lists (one header row, one data row).
with open("census.csv", newline="") as f:
    census = list(csv.reader(f))
census

[['Id',
  'Year',
  'Id',
  'Sex',
  'Id',
  'Hispanic Origin',
  'Id',
  'Id2',
  'Geography',
  'Total',
  'Race Alone - White',
  'Race Alone - Hispanic',
  'Race Alone - Black or African American',
  'Race Alone - American Indian and Alaska Native',
  'Race Alone - Asian',
  'Race Alone - Native Hawaiian and Other Pacific Islander',
  'Two or More Races'],
 ['cen42010',
  'April 1, 2010 Census',
  'totsex',
  'Both Sexes',
  'tothisp',
  'Total',
  '0100000US',
  '',
  'United States',
  '308745538',
  '197318956',
  '44618105',
  '40250635',
  '3739506',
  '15159516',
  '674625',
  '6984195']]

## Computing Rates Of Gun Deaths Per Race

Earlier, I computed the number of gun deaths per race and stored it in a dictionary, race_counts.

To get from the raw counts of gun deaths by race to a rate of gun deaths per 100000 people in each race, I'll need to divide the number of gun deaths in each race by the population of that race, taken from the 2010 census. For the White census race category, this gives 66237 / 197318956 = 0.0003356849303419181, which is the proportion of people in that category who were killed by a gun in the US from 2012 to 2014.

Because this is such a small number, it's typical to express crime statistics as the "rate per 100000". This tells us the number of people in a given group out of every 100000 that were killed by guns in the US. To get this, I just multiply by 100000. This gives us 33.57, which I can interpret as "33.57 out of every 100000 people in the White census race category in the US were killed by guns between 2012 and 2014".
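
The arithmetic for the White category can be written out as a short sketch. The two input figures come from race_counts and the census row shown earlier; the variable names are assumptions for illustration.

```python
# Figures taken from race_counts and the census row shown earlier.
white_deaths = 66237
white_population = 197318956

proportion = white_deaths / white_population   # share of the population killed
rate_per_100k = proportion * 100000            # deaths per 100000 people

print(round(rate_per_100k, 2))
```

Rounded to two decimals, the rate comes out at 33.57 per 100000.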

I'll need to calculate these same rates for each racial category. The only stumbling block is that the racial categories are named slightly differently in census and in data. I'll need to manually construct a dictionary that allows me to map between them, and perform the division.

Here's a list of the race name in data, and the corresponding race name in census:

- Asian/Pacific Islander -- Race Alone - Asian plus Race Alone - Native Hawaiian and Other Pacific Islander
- Black -- Race Alone - Black or African American
- Hispanic -- Race Alone - Hispanic
- Native American/Native Alaskan -- Race Alone - American Indian and Alaska Native
- White -- Race Alone - White

I'll need to create a dictionary that has each race name from data as a key, and has the population count for the races from census as the values.

In [13]:
# Populations from the 2010 census row; Asian/Pacific Islander = 15159516 + 674625.
mapping = {"Asian/Pacific Islander": 15834141, "Black": 40250635,
           "Native American/Native Alaskan": 3739506, "Hispanic": 44618105,
           "White": 197318956}
race_per_hundredk = {}
for key in race_counts:
    race_per_hundredk[key] = race_counts[key]*100000/mapping[key]
race_per_hundredk

{'Asian/Pacific Islander': 8.374309664161762,
 'Black': 57.877347773519595,
 'Hispanic': 20.220491210910907,
 'Native American/Native Alaskan': 24.521955573811088,
 'White': 33.56849303419181}
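
The population figures in mapping were typed in by hand. As a sketch of an alternative, they could be derived from the census rows so that the Asian/Pacific Islander sum is computed rather than entered manually. The lists below copy only the six race columns from the census output above, and the names `population`, `census_columns`, and `derived_mapping` are hypothetical.

```python
# Only the census columns needed here, copied from the census output above.
census_header = [
    "Race Alone - White",
    "Race Alone - Hispanic",
    "Race Alone - Black or African American",
    "Race Alone - American Indian and Alaska Native",
    "Race Alone - Asian",
    "Race Alone - Native Hawaiian and Other Pacific Islander",
]
census_values = ["197318956", "44618105", "40250635", "3739506", "15159516", "674625"]

# Census column name -> population as an integer.
population = {name: int(value) for name, value in zip(census_header, census_values)}

# Hypothetical lookup: race name used in data -> census columns to add together.
census_columns = {
    "Asian/Pacific Islander": [
        "Race Alone - Asian",
        "Race Alone - Native Hawaiian and Other Pacific Islander",
    ],
    "Black": ["Race Alone - Black or African American"],
    "Hispanic": ["Race Alone - Hispanic"],
    "Native American/Native Alaskan": ["Race Alone - American Indian and Alaska Native"],
    "White": ["Race Alone - White"],
}

derived_mapping = {
    race: sum(population[column] for column in columns)
    for race, columns in census_columns.items()
}

print(derived_mapping)
```

With the real census list loaded, the two input lists would correspond to `census[0][10:16]` and `census[1][10:16]`.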

## Filtering By Intent

I can filter our results, and restrict them to the Homicide intent. This will tell us what the gun-related murder rate per 100000 people in each racial category is. In order to do this, I'll need to redo our work in generating race_counts, but only count rows where the intent was Homicide.

I can do this by first extracting the intent column, then using the enumerate() function to loop through each index and value in the race column. If the value in the same position in intents is Homicide, I'll count the value in the race column.

Finally, I'll use the mapping dictionary to convert from raw counts to rates.

In [14]:
intents = [row[3] for row in data]
races = [row[7] for row in data]
homicide_race_counts = {}
for i, race in enumerate(races):
    if intents[i] == "Homicide":
        if race not in homicide_race_counts:
            homicide_race_counts[race] = 1
        else:
            homicide_race_counts[race] += 1

# Convert the raw homicide counts to rates per 100000 people, in place.
for race in homicide_race_counts:
    homicide_race_counts[race] = homicide_race_counts[race]*100000/mapping[race]

homicide_race_counts

{'Asian/Pacific Islander': 3.5303462309701548,
 'Black': 48.47128498718095,
 'Hispanic': 12.627161104219912,
 'Native American/Native Alaskan': 8.717729026240365,
 'White': 4.6356417981453335}
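
The filter, count, and rate steps could also be folded into one reusable function, so that other intents can be examined without repeating the loop. This is only a sketch: the function `rates_by_race`, the sample rows, and the round population figures are all made up for illustration and are not part of the dataset.

```python
def rates_by_race(rows, populations, intent=None):
    """Count rows per race (column 7), optionally keeping only rows whose
    intent (column 3) matches, and scale each count to a rate per 100000."""
    counts = {}
    for row in rows:
        if intent is not None and row[3] != intent:
            continue  # skip rows with a different intent
        counts[row[7]] = counts.get(row[7], 0) + 1
    return {race: n * 100000 / populations[race] for race, n in counts.items()}


# Made-up rows in the same column order as guns.csv.
sample_rows = [
    ["1", "2012", "01", "Suicide", "0", "M", "34", "White", "100", "Home", "4"],
    ["2", "2012", "01", "Homicide", "0", "M", "25", "Black", "100", "Street", "3"],
    ["3", "2012", "02", "Homicide", "0", "F", "41", "White", "100", "Home", "4"],
    ["4", "2012", "02", "Homicide", "0", "M", "19", "Black", "100", "Street", "2"],
]
# Made-up round populations, chosen to keep the arithmetic easy to follow.
sample_populations = {"White": 200000, "Black": 100000}

homicide_rates = rates_by_race(sample_rows, sample_populations, intent="Homicide")
all_rates = rates_by_race(sample_rows, sample_populations)

print(homicide_rates)
print(all_rates)
```

Called with the real data and mapping, passing `intent="Suicide"` instead would give the corresponding suicide rates.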

Once the population of each race in the US is taken into account, I found that the Black category has the highest rate of gun deaths, at about 57.88 per 100000 people.

I then restricted the rates to the Homicide intent in each racial category. In this case again, the highest rate is in the Black category, at about 48.47 per 100000 people.