# Analyzing Gun Deaths In America

## Objective

Use list comprehensions, modules, and the datetime module to find patterns in data on gun deaths in America.

## Data Set

The data set contains information on gun deaths in the US from 2012 to 2014.
The data set came from [FiveThirtyEight](http://fivethirtyeight.com/) and can be found [here](https://github.com/fivethirtyeight/guns-data).

Each row in the data set represents a single fatality.
The columns contain demographic and other information about the victim.
Here are the first few rows of the data set:

|   | year | month | intent  | police | sex | age  | race                   | hispanic | place           | education |
|---|------|-------|---------|--------|-----|------|------------------------|----------|-----------------|-----------|
| 1 | 2012 | 1     | Suicide | 0      | M   | 34.0 | Asian/Pacific Islander | 100      | Home            | 4.0       |
| 2 | 2012 | 1     | Suicide | 0      | F   | 21.0 | White                  | 100      | Street          | 3.0       |
| 3 | 2012 | 1     | Suicide | 0      | M   | 60.0 | White                  | 100      | Other specified | 4.0       |
| 4 | 2012 | 2     | Suicide | 0      | M   | 64.0 | White                  | 100      | Home            | 4.0       |
| 5 | 2012 | 2     | Suicide | 0      | M   | 31.0 | White                  | 100      | Other specified | 2.0       |

Here's an explanation of each column:

| Column    | Description                                                                                                                                                                                                           |
|-----------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| --        | this is an identifier column, which contains the row number. It's common in CSV files to include a unique identifier for each row.                                                                                    |
| year      | the year in which the fatality occurred.                                                                                                                                                                              |
| month     | the month in which the fatality occurred.                                                                                                                                                                             |
| intent    | the intent of the perpetrator of the crime. This can be Suicide, Accidental, NA, Homicide, or Undetermined.                                                                                                           |
| police    | whether a police officer was involved with the shooting. Either 0 (false) or 1 (true).                                                                                                                                |
| sex       | the gender of the victim. Either M or F.                                                                                                                                                                              |
| age       | the age of the victim.                                                                                                                                                                                                |
| race      | the race of the victim. Either Asian/Pacific Islander, Native American/Native Alaskan, Black, Hispanic, or White.                                                                                                     |
| hispanic  | a code indicating the Hispanic origin of the victim.                                                                                                                                                                  |
| place     | where the shooting occurred. Has several categories, which you're encouraged to explore on your own.                                                                                                                  |
| education | educational status of the victim. Can be one of the following: 1 -- Less than High School, 2 -- Graduated from High School or equivalent, 3 -- Some College, 4 -- At least graduated from College, 5 -- Not available |


## Explore the Data

In [4]:
import csv
with open("csv/guns.csv", "r") as f:
    reader = csv.reader(f)
    data = list(reader)
print(data[:5])

[['', 'year', 'month', 'intent', 'police', 'sex', 'age', 'race', 'hispanic', 'place', 'education'], ['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'], ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'], ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4'], ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4']]


To analyze the data properly, remove the header row, which contains the column names.

In [5]:
headers = data[:1]
data = data[1:]
print(headers)
print(data[:5])

[['', 'year', 'month', 'intent', 'police', 'sex', 'age', 'race', 'hispanic', 'place', 'education']]
[['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'], ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'], ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4'], ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4'], ['5', '2012', '02', 'Suicide', '0', 'M', '31', 'White', '100', 'Other specified', '2']]
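The later cells pick columns by position (for example `row[1]` for the year). A minimal sketch of looking positions up by name instead, assuming the header row printed above; `column_index` is a hypothetical name that does not appear in the original notebook:

```python
# Header row as printed above (the first, unnamed column is the row id).
header = ['', 'year', 'month', 'intent', 'police', 'sex', 'age',
          'race', 'hispanic', 'place', 'education']

# Map each column name to its position within a row.
column_index = {name: position for position, name in enumerate(header)}

# Sample row copied from the output above.
row = ['1', '2012', '01', 'Suicide', '0', 'M', '34',
       'Asian/Pacific Islander', '100', 'Home', '4']

year = row[column_index['year']]
intent = row[column_index['intent']]
print(year, intent)
```

Looking columns up by name keeps the analysis readable and avoids silent errors if the column order ever changes.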


## Computing Gun Deaths By Year

Calculate how many gun deaths happened in each year.

In [6]:
years = [row[1] for row in data]

year_counts = {}
for each in years:
    if each not in year_counts:
        year_counts[each] = 0
    year_counts[each] += 1

year_counts

{'2012': 33563, '2013': 33636, '2014': 33599}
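The counting loop above can also be written with `collections.Counter` from the standard library. This is only a sketch of an alternative, run on a few hypothetical sample rows (`sample_rows`) rather than the full data set:

```python
from collections import Counter

# A few sample rows laid out like `data` (the year is column 1).
sample_rows = [
    ['1', '2012', '01', 'Suicide'],
    ['2', '2012', '01', 'Suicide'],
    ['3', '2013', '02', 'Homicide'],
]

# Counter tallies each distinct value in a single pass.
sample_year_counts = Counter(row[1] for row in sample_rows)
print(dict(sample_year_counts))
```

A `Counter` returns 0 for keys it has never seen, so no explicit initialization step is needed.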

## Exploring Gun Deaths By Month And Year

**Observations:**
Gun deaths didn't change much from year to year between 2012 and 2014.

Let's check whether gun deaths in the US vary by month and year.

In [8]:
import datetime

dates = [datetime.datetime(year=int(each[1]), month=int(each[2]), day=1) for each in data]
dates

[datetime.datetime(2012, 1, 1, 0, 0),
 datetime.datetime(2012, 1, 1, 0, 0),
 datetime.datetime(2012, 1, 1, 0, 0),
 datetime.datetime(2012, 2, 1, 0, 0),
 datetime.datetime(2012, 2, 1, 0, 0),
 datetime.datetime(2012, 2, 1, 0, 0),
 datetime.datetime(2012, 2, 1, 0, 0),
 datetime.datetime(2012, 3, 1, 0, 0),
 datetime.datetime(2012, 2, 1, 0, 0),
 datetime.datetime(2012, 2, 1, 0, 0),
 datetime.datetime(2012, 2, 1, 0, 0),
 datetime.datetime(2012, 1, 1, 0, 0),
 datetime.datetime(2012, 3, 1, 0, 0),
 datetime.datetime(2012, 3, 1, 0, 0),
 datetime.datetime(2012, 3, 1, 0, 0),
 datetime.datetime(2012, 3, 1, 0, 0),
 datetime.datetime(2012, 3, 1, 0, 0),
 datetime.datetime(2012, 3, 1, 0, 0),
 datetime.datetime(2012, 3, 1, 0, 0),
 datetime.datetime(2012, 4, 1, 0, 0),
 datetime.datetime(2012, 4, 1, 0, 0),
 datetime.datetime(2012, 4, 1, 0, 0),
 datetime.datetime(2012, 4, 1, 0, 0),
 datetime.datetime(2012, 4, 1, 0, 0),
 datetime.datetime(2012, 4, 1, 0, 0),
 datetime.datetime(2012, 4, 1, 0, 0),
 ...]

In [11]:
date_counts = {}

for each in dates:
    if each not in date_counts:
        date_counts[each] = 0
    date_counts[each] += 1

date_counts

{datetime.datetime(2012, 1, 1, 0, 0): 2758,
 datetime.datetime(2012, 2, 1, 0, 0): 2357,
 datetime.datetime(2012, 3, 1, 0, 0): 2743,
 datetime.datetime(2012, 4, 1, 0, 0): 2795,
 datetime.datetime(2012, 5, 1, 0, 0): 2999,
 datetime.datetime(2012, 6, 1, 0, 0): 2826,
 datetime.datetime(2012, 7, 1, 0, 0): 3026,
 datetime.datetime(2012, 8, 1, 0, 0): 2954,
 datetime.datetime(2012, 9, 1, 0, 0): 2852,
 datetime.datetime(2012, 10, 1, 0, 0): 2733,
 datetime.datetime(2012, 11, 1, 0, 0): 2729,
 datetime.datetime(2012, 12, 1, 0, 0): 2791,
 datetime.datetime(2013, 1, 1, 0, 0): 2864,
 datetime.datetime(2013, 2, 1, 0, 0): 2375,
 datetime.datetime(2013, 3, 1, 0, 0): 2862,
 datetime.datetime(2013, 4, 1, 0, 0): 2798,
 datetime.datetime(2013, 5, 1, 0, 0): 2806,
 datetime.datetime(2013, 6, 1, 0, 0): 2920,
 datetime.datetime(2013, 7, 1, 0, 0): 3079,
 datetime.datetime(2013, 8, 1, 0, 0): 2859,
 datetime.datetime(2013, 9, 1, 0, 0): 2742,
 datetime.datetime(2013, 10, 1, 0, 0): 2808,
 ...}

## Exploring Gun Deaths By Race And Sex

In [12]:
sexes = [each[5] for each in data]
sex_counts = {}
for sex in sexes:
    if sex not in sex_counts:
        sex_counts[sex] = 0
    sex_counts[sex] += 1
sex_counts

{'F': 14449, 'M': 86349}

In [14]:
races = [each[7] for each in data]
race_counts = {}
for race in races:
    if race not in race_counts:
        race_counts[race] = 0
    race_counts[race] += 1
race_counts

{'Asian/Pacific Islander': 1326,
 'Black': 23296,
 'Hispanic': 9022,
 'Native American/Native Alaskan': 917,
 'White': 66237}
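Raw counts are easier to compare as shares of the total. A small sketch, using the sex counts printed above; `percentage_shares` is a hypothetical helper, not part of the original notebook:

```python
# Counts copied from the output above.
sex_counts = {'F': 14449, 'M': 86349}

def percentage_shares(counts):
    """Return each category's share of the total, in percent."""
    total = sum(counts.values())
    return {key: 100 * value / total for key, value in counts.items()}

sex_shares = percentage_shares(sex_counts)
for key, share in sex_shares.items():
    print(f"{key}: {share:.1f}%")
```

The same helper could be applied to `race_counts`, though shares of deaths alone say nothing about each group's share of the population.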

### Findings so far

Gun deaths in the US seem to affect men far more than women. They also seem to disproportionately affect minorities, although confirming this requires data on each race's share of the overall US population.

There appears to be a minor seasonal correlation, with gun deaths peaking in the summer and declining in the winter. It might be useful to filter by intent, to see if different categories of intent have different correlations with season, race, or gender.
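One way to examine the seasonal pattern is to pool the counts by calendar month across all years. The sketch below assumes a list of `datetime` objects like `dates`; the function name and the sample values are hypothetical:

```python
import datetime
from collections import Counter

def counts_by_calendar_month(dates):
    """Tally dates by month number (1-12), ignoring the year."""
    return Counter(date.month for date in dates)

# Hypothetical sample; in the notebook this would be the `dates` list.
sample_dates = [
    datetime.datetime(2012, 1, 1),
    datetime.datetime(2013, 1, 1),
    datetime.datetime(2012, 7, 1),
    datetime.datetime(2013, 7, 1),
    datetime.datetime(2014, 7, 1),
]

month_totals = counts_by_calendar_month(sample_dates)
for month in sorted(month_totals):
    print(month, month_totals[month])
```

Because months differ in length, dividing each total by the number of days in that month would give a fairer comparison.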

## Reading In A Second Dataset

In [17]:
import csv

with open("csv/census.csv", "r") as f:
    reader = csv.reader(f)
    census = list(reader)
census

[['Id',
  'Year',
  'Id',
  'Sex',
  'Id',
  'Hispanic Origin',
  'Id',
  'Id2',
  'Geography',
  'Total',
  'Race Alone - White',
  'Race Alone - Hispanic',
  'Race Alone - Black or African American',
  'Race Alone - American Indian and Alaska Native',
  'Race Alone - Asian',
  'Race Alone - Native Hawaiian and Other Pacific Islander',
  'Two or More Races'],
 ['cen42010',
  'April 1, 2010 Census',
  'totsex',
  'Both Sexes',
  'tothisp',
  'Total',
  '0100000US',
  '',
  'United States',
  '308745538',
  '197318956',
  '44618105',
  '40250635',
  '3739506',
  '15159516',
  '674625',
  '6984195']]

## Calculating Rates of Gun Deaths per Race

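Create a dictionary that maps each key from race_counts to the population count of that race from census. The census counts Asian and Native Hawaiian and Other Pacific Islander separately, so the two are summed to match the Asian/Pacific Islander category.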

In [18]:
mapping = {
    "Asian/Pacific Islander": 15159516 + 674625,
    "Native American/Native Alaskan": 3739506,
    "Black": 40250635,
    "Hispanic": 44618105,
    "White": 197318956
}
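The hard-coded numbers above can also be derived from the census rows themselves. A sketch under the assumption that the column names are exactly those printed earlier; only the needed columns are reproduced, and `population` and `derived_mapping` are hypothetical names:

```python
# Column names and values copied from the census output above
# (only the columns needed for the mapping are reproduced here).
census_header = [
    'Race Alone - White',
    'Race Alone - Hispanic',
    'Race Alone - Black or African American',
    'Race Alone - American Indian and Alaska Native',
    'Race Alone - Asian',
    'Race Alone - Native Hawaiian and Other Pacific Islander',
]
census_values = ['197318956', '44618105', '40250635',
                 '3739506', '15159516', '674625']

# Pair each column name with its value, converted to an integer.
population = {name: int(value)
              for name, value in zip(census_header, census_values)}

# Combine census columns to match the race labels in the gun deaths data.
derived_mapping = {
    'Asian/Pacific Islander': (
        population['Race Alone - Asian']
        + population['Race Alone - Native Hawaiian and Other Pacific Islander']
    ),
    'Native American/Native Alaskan':
        population['Race Alone - American Indian and Alaska Native'],
    'Black': population['Race Alone - Black or African American'],
    'Hispanic': population['Race Alone - Hispanic'],
    'White': population['Race Alone - White'],
}
print(derived_mapping)
```

With the full `census` rows, the non-numeric columns (such as `Geography`) would have to be skipped before calling `int`.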

Express the counts as a "rate per 100000 people"
(the number of people in a given group, out of every 100000, who were killed by guns in America over the period the data covers)

In [19]:
race_per_hundredk = {}
for k,v in race_counts.items():
    race_per_hundredk[k] = (v / mapping[k]) * 100000

race_per_hundredk

{'Asian/Pacific Islander': 8.374309664161762,
 'Black': 57.8773477735196,
 'Hispanic': 20.220491210910907,
 'Native American/Native Alaskan': 24.521955573811088,
 'White': 33.56849303419181}

## Filtering By Intent

Filter the results and restrict them to the 'Homicide' intent.

In [21]:
intents = [each[3] for each in data]
homicide_race_counts = {}
for i,race in enumerate(races):
    if race not in homicide_race_counts:
        homicide_race_counts[race] = 0
    if intents[i] == "Homicide":
        homicide_race_counts[race] += 1

race_per_hundredk = {}
for k,v in homicide_race_counts.items():
    race_per_hundredk[k] = (v / mapping[k]) * 100000

race_per_hundredk

{'Asian/Pacific Islander': 3.530346230970155,
 'Black': 48.471284987180944,
 'Hispanic': 12.627161104219914,
 'Native American/Native Alaskan': 8.717729026240365,
 'White': 4.6356417981453335}
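The filtering step above can be generalized to any intent. A minimal sketch, assuming rows laid out like `data` (intent in column 3, race in column 7). The function name is hypothetical, and the last two sample rows are made up for illustration:

```python
from collections import Counter

def race_counts_for_intent(rows, intent, intent_col=3, race_col=7):
    """Count rows per race, keeping only rows with the given intent."""
    return Counter(row[race_col] for row in rows if row[intent_col] == intent)

sample_rows = [
    # First two rows copied from the output near the top of the notebook.
    ['1', '2012', '01', 'Suicide', '0', 'M', '34',
     'Asian/Pacific Islander', '100', 'Home', '4'],
    ['2', '2012', '01', 'Suicide', '0', 'F', '21',
     'White', '100', 'Street', '3'],
    # Hypothetical rows, included only to exercise the filter.
    ['3', '2012', '01', 'Homicide', '0', 'M', '25',
     'Black', '100', 'Street', '2'],
    ['4', '2012', '02', 'Homicide', '0', 'M', '40',
     'White', '100', 'Home', '3'],
]

homicide_counts = race_counts_for_intent(sample_rows, 'Homicide')
suicide_counts = race_counts_for_intent(sample_rows, 'Suicide')
print(dict(homicide_counts))
print(dict(suicide_counts))
```

Unlike the loop above, races with no matching rows are simply absent from the result, although a `Counter` still returns 0 when they are looked up.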

### Findings

It appears that gun-related homicides in the US disproportionately affect people in the Black and Hispanic racial categories.