# 1. Introducing US Gun Deaths Data

This notebook analyzes data on gun deaths in the US. The first few rows of the data look like this -

| |year|month|intent|police|sex|age|race|hispanic|place|education
:---:|:---:|:---:|:----:|:----:|:---:|:---:|:---:|:---:|:---:|:----:
1|2012|1|Suicide|0|M|34.0|Asian/Pacific Islander|100|Home|4.0
2|2012|1|Suicide|0|F|21.0|White|100|Street|3.0
3|2012|1|Suicide|0|M|60.0|White|100|Other specified|4.0
4|2012|2|Suicide|0|M|64.0|White|100|Home|4.0
5|2012|2|Suicide|0|M|31.0|White|100|Other specified|2.0

Fields in the data - 
* `intent` -- the intent of the perpetrator of the crime. This can be `Suicide`, `Accidental`, `NA`, `Homicide`, or `Undetermined`.
* `police` -- whether a police officer was involved with the shooting. Either `0` (false) or `1` (true).
* `race` -- Either `Asian/Pacific Islander`, `Native American/Native Alaskan`, `Black`, `Hispanic`, or `White`.
* `hispanic` -- a code indicating the Hispanic origin of the victim.
* `education` -- educational status of the victim. Ranges from `1` (encoded for 'Less than high school') to `4` (for 'at least graduated from college'). `5` is 'Not Available'.

**Instructions**
* Read the dataset in as a list using the [`csv`](https://docs.python.org/3/library/csv.html) module.
* Call [`list()`](https://docs.python.org/3/library/functions.html#func-list) on the result to get a list of all the data in the file.
* Display the first `5` rows of `data` to verify everything.

In [16]:
import csv
csv_file = open("guns.csv", "r")
csv_raw_data = csv.reader(csv_file)

data = list(csv_raw_data)
csv_file.close()

data[:5]

[['',
  'year',
  'month',
  'intent',
  'police',
  'sex',
  'age',
  'race',
  'hispanic',
  'place',
  'education'],
 ['1',
  '2012',
  '01',
  'Suicide',
  '0',
  'M',
  '34',
  'Asian/Pacific Islander',
  '100',
  'Home',
  '4'],
 ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'],
 ['3',
  '2012',
  '01',
  'Suicide',
  '0',
  'M',
  '60',
  'White',
  '100',
  'Other specified',
  '4'],
 ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4']]
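
As a minimal sketch, the same read can be written with a `with` block so the file is closed automatically. To stay self-contained it reads from an in-memory `io.StringIO` holding the header and first two rows shown above instead of `guns.csv`; `sample_text`, `sample_data` and `read_rows` are illustrative names, not part of the original notebook.

```python
import csv
import io

# Header plus two rows copied from the output above; stands in for guns.csv.
sample_text = (
    ",year,month,intent,police,sex,age,race,hispanic,place,education\n"
    "1,2012,01,Suicide,0,M,34,Asian/Pacific Islander,100,Home,4\n"
    "2,2012,01,Suicide,0,F,21,White,100,Street,3\n"
)


def read_rows(file_obj):
    """Return every row of an open CSV file object as a list of lists."""
    return list(csv.reader(file_obj))


# With the real file this would be: with open("guns.csv", newline="") as f: ...
with io.StringIO(sample_text) as f:
    sample_data = read_rows(f)

print(sample_data[:2])
```

Every value comes back as a string, including `year` and `age`, which is why later cells call `int()` before doing arithmetic.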

# 2. Removing Headers From A List Of Lists

**Instructions**

* Extract the first row of `data`, and assign it to the variable `headers`.
* Remove the first row from `data`.
* Display `headers`.
* Display the first `5` rows of `data` to verify that you removed the header row properly.

_P.S. `deepcopy()` is not strictly required here. `data[1:]` builds a new list and `data = ...` only rebinds the name, so a plain `headers = data[0]` would still refer to the header row afterwards. Copying only guards against a later in-place change to that row._

In [17]:
from copy import deepcopy
headers = deepcopy(data[0])
data = data[1:]
print(headers)
print(data[:5])

['', 'year', 'month', 'intent', 'police', 'sex', 'age', 'race', 'hispanic', 'place', 'education']
[['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'], ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'], ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4'], ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4'], ['5', '2012', '02', 'Suicide', '0', 'M', '31', 'White', '100', 'Other specified', '2']]
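
A small check of how name binding and slicing behave in this situation, using a tiny stand-in for `data`. The names `rows`, `header_ref` and `header_copy` are made up for the illustration.

```python
# Tiny stand-in for `data`: a header row followed by two data rows.
rows = [["", "year"], ["1", "2012"], ["2", "2012"]]

header_ref = rows[0]  # binds a second name to the same inner list
rows = rows[1:]       # slicing builds a new outer list; the header list is untouched

print(header_ref)     # still the header row
print(rows[0])        # first data row

# A copy only matters if the row is later changed in place.
header_copy = list(header_ref)
header_ref[1] = "YEAR"
print(header_copy[1], header_ref[1])
```

The last line shows the one case where a copy makes a difference: the in-place assignment changes `header_ref` but not `header_copy`.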


# 3. Counting Gun Deaths By Year

**Instructions**

* Use a list comprehension to extract the `year` column from `data`.
* Create an empty dictionary called `year_counts`.
* Loop through each element in `years` and increment the count of the corresponding year in the dictionary.

In [18]:
years = [row[1] for row in data]
year_counts = {}
for year in years:
    if year in year_counts:
        year_counts[year] += 1
    else:
        year_counts[year] = 1
print(year_counts)

{'2014': 33599, '2012': 33563, '2013': 33636}
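
The same tally can be sketched with `collections.Counter` from the standard library, which handles the "first time seen" case itself. The rows below are hypothetical and only mimic the layout of `data` (year in column `1`).

```python
from collections import Counter

# Hypothetical sample rows in the same layout as `data` (year is column 1).
sample_rows = [
    ["1", "2012", "01"],
    ["2", "2012", "01"],
    ["3", "2013", "02"],
]

# Counter consumes any iterable and counts how often each value occurs.
sample_year_counts = Counter(row[1] for row in sample_rows)
print(dict(sample_year_counts))
```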


# 4. Exploring Gun Deaths By Month And Year

It looks like gun deaths didn't change much by year from 2012 to 2014. Let's see if gun deaths in the US change by month and year. In order to do this, we'll have to create a [`datetime.datetime`](https://docs.python.org/3/library/datetime.html#datetime-objects) object using the year and month columns.

**Instructions**

* Use a list comprehension to create a `datetime.datetime` object for each row. Assign the result to `dates`.
* Display the first 5 rows in `dates` to verify everything worked.
* Count up how many times each unique `date` occurs in `dates`. Assign the result to `date_counts`.
* Display `date_counts`.

In [19]:
import datetime
dates = [datetime.datetime(year=int(row[1]), month=int(row[2]), day=1) for row in data]
print(dates[:5])
date_counts = {}
for date in dates:
    if date in date_counts:
        date_counts[date] += 1
    else:
        date_counts[date] = 1
print("date_counts contents:")
print(date_counts)

[datetime.datetime(2012, 1, 1, 0, 0), datetime.datetime(2012, 1, 1, 0, 0), datetime.datetime(2012, 1, 1, 0, 0), datetime.datetime(2012, 2, 1, 0, 0), datetime.datetime(2012, 2, 1, 0, 0)]
date_counts contents:
{datetime.datetime(2014, 8, 1, 0, 0): 2970, datetime.datetime(2012, 2, 1, 0, 0): 2357, datetime.datetime(2012, 5, 1, 0, 0): 2999, datetime.datetime(2013, 2, 1, 0, 0): 2375, datetime.datetime(2013, 5, 1, 0, 0): 2806, datetime.datetime(2012, 10, 1, 0, 0): 2733, datetime.datetime(2013, 4, 1, 0, 0): 2798, datetime.datetime(2012, 3, 1, 0, 0): 2743, datetime.datetime(2014, 10, 1, 0, 0): 2865, datetime.datetime(2014, 6, 1, 0, 0): 2931, datetime.datetime(2012, 6, 1, 0, 0): 2826, datetime.datetime(2013, 12, 1, 0, 0): 2765, datetime.datetime(2012, 1, 1, 0, 0): 2758, datetime.datetime(2014, 4, 1, 0, 0): 2862, datetime.datetime(2014, 2, 1, 0, 0): 2361, datetime.datetime(2013, 11, 1, 0, 0): 2758, datetime.datetime(2012, 4, 1, 0, 0): 2795, datetime.datetime(2014, 3, 1, 0, 0): 2684, datetime.date
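
Printing the whole dictionary is hard to read, so here is a sketch of listing the counts in chronological order instead. It uses three entries copied from the output above; `sample_date_counts` and `busiest` are illustrative names.

```python
import datetime

# Three entries copied from the date_counts output above.
sample_date_counts = {
    datetime.datetime(2012, 3, 1): 2743,
    datetime.datetime(2012, 1, 1): 2758,
    datetime.datetime(2012, 2, 1): 2357,
}

# datetime objects compare chronologically, so sorted() orders the keys by date.
for date in sorted(sample_date_counts):
    print(date.strftime("%Y-%m"), sample_date_counts[date])

# The key with the largest count among these three entries.
busiest = max(sample_date_counts, key=sample_date_counts.get)
print(busiest.strftime("%Y-%m"))
```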

# 5. Exploring Gun Deaths By Race And Sex

**Instructions**

* Count up how many times each item in the `sex` column occurs.
* Count up how many times each item in the `race` column occurs.
* Display `race_counts` and `sex_counts` to verify your work, and see if you can spot any patterns.
* Write a markdown cell detailing what you've learned so far, and what you think might need further examination.

In [20]:
def column_counts(col_num):
    col_values = [row[col_num] for row in data]
    col_counts = {}
    for col_value in col_values:
        if col_value in col_counts:
            col_counts[col_value] += 1
        else:
            col_counts[col_value] = 1
    return col_counts

race_counts = column_counts(7)
sex_counts = column_counts(5)

print(race_counts)
print(sex_counts)

{'White': 66237, 'Black': 23296, 'Native American/Native Alaskan': 917, 'Asian/Pacific Islander': 1326, 'Hispanic': 9022}
{'F': 14449, 'M': 86349}


Notes:
1. Death counts vary widely across races. `White` has by far the highest count and `Native American/Native Alaskan` the lowest. This might reflect the racial distribution of the US population (i.e. most of the population being white and Native American/Native Alaskan being a small minority).
2. Female deaths are far lower than male deaths, which might be because women are less often involved in the situations where guns are used (e.g. gang violence, though that risks being stereotypical).
3. It might be worthwhile to look at the cross-section of these two features and see whether they follow the same pattern (e.g. whether the white male to white female ratio matches the overall male to female ratio, or whether the female share in each race matches that of the general population).
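
One possible way to start on the cross-section mentioned in note 3 is to count `(race, sex)` pairs. This is only a sketch on the four sample rows shown at the top of the notebook, not a result for the full dataset; `sample_rows` and `race_sex_counts` are illustrative names.

```python
from collections import Counter

# Rows copied from the sample output at the top of the notebook
# (sex is column 5, race is column 7).
sample_rows = [
    ['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'],
    ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'],
    ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4'],
    ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4'],
]

# Count each (race, sex) combination in a single pass.
race_sex_counts = Counter((row[7], row[5]) for row in sample_rows)

for (race, sex), count in sorted(race_sex_counts.items()):
    print(race, sex, count)
```

Run against the full `data` list, the same counter would give the male and female counts within each race, which can then be compared with the overall `sex_counts`.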

# 6. Reading In A Second Dataset

We explored gun deaths by race on the previous screen. However, that analysis only gives us the total number of gun deaths by race in the US. Unless we know how large each race's share of the US population is, we can't meaningfully compare those numbers. What we really want is a rate of gun deaths per `100000` people of each race. To get it, we'll read in `census.csv`, which holds the number of people in each racial category in the US.

Data -

| |Id|Year|Id.1|Sex|Id.2|Hispanic Origin|Id.3|Id2|Geography|Total|Race Alone - White|Race Alone - Hispanic|Race Alone - Black or African American|Race Alone - American Indian and Alaska Native|Race Alone - Asian|Race Alone - Native Hawaiian and Other Pacific Islander|Two or More Races
:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:
0|cen42010|April 1, 2010 Census|totsex|Both Sexes|tothisp|Total|0100000US|NaN|United States|308745538|197318956|44618105|40250635|3739506|15159516|674625|6984195

**Instructions**

* Read in `census.csv`, and convert to a list of lists. Assign the result to the `census` variable.
* Display `census` to verify your work.

In [21]:
with open("census.csv") as f:
    census = list(csv.reader(f))
print(census)

[['Id', 'Year', 'Id', 'Sex', 'Id', 'Hispanic Origin', 'Id', 'Id2', 'Geography', 'Total', 'Race Alone - White', 'Race Alone - Hispanic', 'Race Alone - Black or African American', 'Race Alone - American Indian and Alaska Native', 'Race Alone - Asian', 'Race Alone - Native Hawaiian and Other Pacific Islander', 'Two or More Races'], ['cen42010', 'April 1, 2010 Census', 'totsex', 'Both Sexes', 'tothisp', 'Total', '0100000US', '', 'United States', '308745538', '197318956', '44618105', '40250635', '3739506', '15159516', '674625', '6984195']]
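
Because the census columns are later addressed by position, it can help to list each index next to its header and value. The sketch below copies the header and data row from the output above into two illustrative variables, `census_header` and `census_row`.

```python
# Header and data row copied from the census output above.
census_header = [
    'Id', 'Year', 'Id', 'Sex', 'Id', 'Hispanic Origin', 'Id', 'Id2',
    'Geography', 'Total', 'Race Alone - White', 'Race Alone - Hispanic',
    'Race Alone - Black or African American',
    'Race Alone - American Indian and Alaska Native',
    'Race Alone - Asian',
    'Race Alone - Native Hawaiian and Other Pacific Islander',
    'Two or More Races',
]
census_row = [
    'cen42010', 'April 1, 2010 Census', 'totsex', 'Both Sexes', 'tothisp',
    'Total', '0100000US', '', 'United States', '308745538', '197318956',
    '44618105', '40250635', '3739506', '15159516', '674625', '6984195',
]

# Print every column position with its header and value.
for index, (name, value) in enumerate(zip(census_header, census_row)):
    print(index, name, value)

# Look a column up by name instead of counting by hand.
white_index = census_header.index('Race Alone - White')
print(white_index, census_row[white_index])
```

Note that `list.index()` returns the first match only, so it is not suitable for the repeated `Id` header, but the race columns all have unique names.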


# 7. Computing Rates Of Gun Deaths Per Race

In order to get from the raw counts of gun deaths by race to a rate of gun deaths per `100000` people in each race, we'll need to divide the total number of gun deaths by the population of each race and then multiply the result by `100000`. For example, carrying out this calculation for `White` gives about `33.57`, which we can interpret as "`33.57` out of every `100000` people in the `White` census race category in the US were killed by guns between `2012` and `2014`".
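
The calculation can be sketched as a small helper. `rate_per_100k` is a hypothetical name introduced here; the two numbers are the `White` count from `race_counts` and the `Race Alone - White` population from `census`.

```python
def rate_per_100k(deaths, population):
    """Return the number of deaths per 100,000 people."""
    return deaths / population * 100000


# White gun deaths (from race_counts) and White census population (from census).
white_rate = rate_per_100k(66237, 197318956)
print(round(white_rate, 2))
```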

**Instructions**

* Manually create a dictionary, `mapping` that maps each key from `race_counts` to the population count of the race from `census`.
* Create an empty dictionary, `race_per_hundredk`.
* Loop through each key in `race_counts` and populate it with the rate of gun deaths per 100k for each race.
* Print `race_per_hundredk`.

In [24]:
mapping = {}
# Column positions follow the census header: 10 = White, 11 = Hispanic,
# 12 = Black or African American, 13 = American Indian and Alaska Native,
# 14 = Asian, 15 = Native Hawaiian and Other Pacific Islander.
mapping['Asian/Pacific Islander'] = int(census[1][14]) + int(census[1][15])
mapping['Black'] = int(census[1][12])
mapping['Native American/Native Alaskan'] = int(census[1][13])
mapping['Hispanic'] = int(census[1][11])
mapping['White'] = int(census[1][10])

race_per_hundredk = {}
for race in race_counts:
    race_per_hundredk[race] = race_counts[race] / mapping[race] * 100000

# Rounded to two decimals for display.
print({race: round(rate, 2) for race, rate in race_per_hundredk.items()})

{'White': 33.57, 'Black': 57.88, 'Native American/Native Alaskan': 24.52, 'Asian/Pacific Islander': 8.37, 'Hispanic': 20.22}


# 8. Filtering By Intent

We can filter our results, and restrict them to the `Homicide` intent. This will tell us what the gun-related murder rate per `100000` people in each racial category is. In order to do this, we'll need to redo our work in generating `race_counts`, but only count rows where the `intent` was `Homicide`.

**Instructions**

* Extract the `intent` column using a list comprehension. The `intent` column is the fourth column in `data`.
* Extract the `race` column using a list comprehension. The `race` column is the eighth column in `data`.
* Create an empty dictionary called `homicide_race_counts`.
* Use the [`enumerate()`](https://docs.python.org/3/library/functions.html#enumerate) function to loop through each item in `races`. The position should be assigned to the loop variable `i`, and the value to the loop variable `race`.
* When you're done, `homicide_race_counts` should have one key for each of the racial categories in `data`. The associated value should be the number of gun deaths by homicide for that race.

In [25]:
intents = [row[3] for row in data]
races = [row[7] for row in data]

homicide_race_counts = {}

for i, race in enumerate(races):
    if intents[i] == 'Homicide':
        if race in homicide_race_counts:
            homicide_race_counts[race] += 1
        else:
            homicide_race_counts[race] = 1

print(homicide_race_counts)

{'White': 9147, 'Black': 19510, 'Native American/Native Alaskan': 326, 'Asian/Pacific Islander': 559, 'Hispanic': 5634}
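
The cell above stops at raw counts. A sketch of the remaining step, turning them into homicide rates per `100000` people, is given below. The counts are copied from the output above and the populations from the census row. Pairing each race label with a census column by name (and adding the Asian and Pacific Islander columns together) is an assumption of this sketch, and `homicide_counts`, `populations` and `homicide_per_100k` are illustrative names.

```python
# Homicide counts copied from the output above.
homicide_counts = {
    'White': 9147,
    'Black': 19510,
    'Native American/Native Alaskan': 326,
    'Asian/Pacific Islander': 559,
    'Hispanic': 5634,
}

# Populations taken from the census row, matched to race labels by column name
# (an assumption of this sketch).
populations = {
    'White': 197318956,
    'Black': 40250635,
    'Native American/Native Alaskan': 3739506,
    'Asian/Pacific Islander': 15159516 + 674625,
    'Hispanic': 44618105,
}

homicide_per_100k = {
    race: count / populations[race] * 100000
    for race, count in homicide_counts.items()
}

# Show the highest rates first.
for race, rate in sorted(homicide_per_100k.items(), key=lambda item: item[1], reverse=True):
    print(f"{race}: {rate:.2f}")
```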


Notes:
todo...

# 9. Next Steps

Here are some potential next steps:

* Figure out the link, if any, between month and homicide rate.
* Explore the homicide rate by gender.
* Explore the rates of other intents, like `Accidental`, by gender and race.
* Find out whether gun death rates correlate with location and education.