# Analyzing Gun Deaths
## Compiled By: Dana Hagist

In this project, I'll be working in a Jupyter notebook and analyzing data on gun deaths in the US.
The dataset comes from FiveThirtyEight and can be found at https://github.com/fivethirtyeight/guns-data. It is stored in the guns.csv file and contains information on gun deaths in the US from 2012 to 2014. Each row in the dataset represents a single fatality, and the columns contain demographic and other information about the victim.

The first row of the data is a header row, which tells you what kind of data is in each column of the CSV file. Each subsequent row contains information about one fatality and its victim. Here's an explanation of each column:
- First is an identifier column, which contains the row number. It's common in CSV files to include a unique identifier for each row, but we can ignore it in this analysis.
- year -- the year in which the fatality occurred.
- month -- the month in which the fatality occurred.
- intent -- the intent of the perpetrator of the crime. This can be Suicide, Accidental, NA, Homicide, or Undetermined.
- police -- whether a police officer was involved with the shooting. Either 0 (false) or 1 (true).
- sex -- the gender of the victim. Either M or F.
- age -- the age of the victim.
- race -- the race of the victim. One of Asian/Pacific Islander, Native American/Native Alaskan, Black, Hispanic, or White.
- hispanic -- a code indicating the Hispanic origin of the victim.
- place -- where the shooting occurred. Has several categories, which you're encouraged to explore on your own.
- education -- educational status of the victim. Can be one of the following:
    - 1 -- Less than High School
    - 2 -- Graduated from High School or equivalent
    - 3 -- Some College
    - 4 -- At least graduated from College
    - 5 -- Not available
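The education column stores numeric codes rather than labels. A minimal sketch of decoding them, assuming the codes arrive as strings (as csv.reader returns them) and that any value outside 1 to 5, such as an NA entry, should be reported as unknown. EDUCATION_LABELS and describe_education are hypothetical names, not part of the original project:

```python
# Hypothetical lookup table built from the column description above.
EDUCATION_LABELS = {
    "1": "Less than High School",
    "2": "Graduated from High School or equivalent",
    "3": "Some College",
    "4": "At least graduated from College",
    "5": "Not available",
}

def describe_education(code):
    """Return the label for an education code string, or 'Unknown'."""
    # csv.reader yields strings, so the keys are strings too;
    # strip() guards against stray whitespace around the value.
    return EDUCATION_LABELS.get(code.strip(), "Unknown")

print(describe_education("4"))   # At least graduated from College
print(describe_education("NA"))  # Unknown
```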

In this project, we'll explore the dataset, and try to find patterns in the demographics of the victims. Our first step is to read the data in and take a look at it.

Task: 
Read the dataset in as a list using the csv module.
- Import the csv module.
- Open the file using the open() function.
- Use the csv.reader() function to load the opened file.

Call list() on the result to get a list of all the data in the file.
- Assign the result to the variable data.

Display the first 5 rows of data to verify everything.

In [1]:
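```python
# Solution:
import csv

# Open guns.csv and read every row into a list of lists.
# The with block closes the file once reading is done.
with open('guns.csv', 'r') as f:
    data = list(csv.reader(f))
print(data[:5])
```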

[['', 'year', 'month', 'intent', 'police', 'sex', 'age', 'race', 'hispanic', 'place', 'education'], ['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'], ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'], ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4'], ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4']]


### Removing the Header Row
In the cell above, we read our data into the list of lists data. Each inner list in data represents a single row. Each item in the inner lists represents a single column for that row.

You will notice above that the first item in data is a header row containing the name of each column. To analyze the data properly, we'll need to separate this header row from the rest of the data, which we can do with list slicing.
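A minimal sketch of the slicing step on a small stand-in for data (the rows are trimmed copies of the output above). Slicing returns a new list and leaves the original untouched, which is why the solution below can keep data and store the remaining rows separately. The starred assignment is shown only as an equivalent alternative:

```python
# A small stand-in for data: a header row followed by two data rows.
rows = [
    ['', 'year', 'month', 'intent'],
    ['1', '2012', '01', 'Suicide'],
    ['2', '2012', '01', 'Suicide'],
]

# Slicing: index 0 is the header, everything from index 1 on is data.
headers = rows[0]
body = rows[1:]

# Starred unpacking performs the same split in one statement.
first, *rest = rows
assert first == headers and rest == body

print(headers)    # ['', 'year', 'month', 'intent']
print(len(body))  # 2
```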

Task:
- Extract the first row of data, and assign it to the variable headers.
- Remove the first row from data.
- Display headers.
- Display the first 5 rows of data to verify that you removed the header row properly.

In [2]:
```python
# Solution:
headers = data[0]
data_noheader = data[1:]
print(headers)
print(data_noheader[:5])
```

['', 'year', 'month', 'intent', 'police', 'sex', 'age', 'race', 'hispanic', 'place', 'education']
[['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'], ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'], ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4'], ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4'], ['5', '2012', '02', 'Suicide', '0', 'M', '31', 'White', '100', 'Other specified', '2']]


### Calculating Gun Deaths Per Year

The year column contains information on the year in which gun deaths occurred. We can use this column to calculate how many gun deaths happened in each year.

We can perform this operation by creating a dictionary, then keeping count in the dictionary of how many times each element occurs in the year column.

Task:
- Use a list comprehension to extract the year column from data.
    - Because the year column is the second column in the data, you'll need to get the element at index 1 in each row.
    - Assign the result to the variable years.
- Create an empty dictionary called year_counts.
- Loop through each element in years.
    - If the element isn't a key in year_counts, create it, and set the value to 1.
    - If the element is a key in year_counts, increment the value by one.
- Display year_counts to see how many gun deaths occurred in each year.

In [3]:
```python
# Solution:
years = [row[1] for row in data_noheader]
year_counts = {}
for year in years:
    if year in year_counts:
        year_counts[year] += 1
    else:
        year_counts[year] = 1
print(year_counts)
```

{'2013': 33636, '2014': 33599, '2012': 33563}
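The manual dictionary loop above can also be written with collections.Counter from the standard library, which performs the same tally. A minimal sketch on a few hand-made rows (not the real dataset), with the year at index 1 as in data_noheader:

```python
from collections import Counter

# Hand-made rows in the same layout as data_noheader (year at index 1).
sample_rows = [
    ['1', '2012', '01', 'Suicide'],
    ['2', '2013', '01', 'Suicide'],
    ['3', '2012', '02', 'Homicide'],
]

# Counter tallies how many times each year string appears.
year_counts = Counter(row[1] for row in sample_rows)

print(year_counts['2012'])  # 2
print(dict(year_counts))    # {'2012': 2, '2013': 1}
```

Counter is a dict subclass, so it can be used wherever year_counts is used above; one difference is that looking up a missing key returns 0 instead of raising KeyError.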


### Analyzing Gun Deaths by Month and Year

It looks like gun deaths didn't change much by year from 2012 to 2014. Let's see whether gun deaths in the US change by month and year. To do this, we'll create a datetime.datetime object from the year and month columns. We'll then be able to count gun deaths by date, as we did by year in the previous step.

Recall that you can create a datetime object by specifying the year, month, and day keyword arguments:

```python
from datetime import datetime
date = datetime(year=2016, month=12, day=1)
```

We can use the month and year columns of data to create a datetime. Because the data has no day column, we'll use a fixed day (the 1st of each month).

If we create a datetime.datetime object for each row, we can then count how many gun deaths occurred in each month of each year, using the same procedure as in the previous step.
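A minimal sketch of the conversion for a single row, using one row copied from the dataset output above. The month column is a zero-padded string such as '01', and int() handles the leading zero:

```python
import datetime

# One row copied from the dataset output above.
row = ['1', '2012', '01', 'Suicide', '0', 'M', '34',
       'Asian/Pacific Islander', '100', 'Home', '4']

# Year is at index 1 and month at index 2; both arrive as strings.
# day=1 is a placeholder because the data has no day column.
date = datetime.datetime(year=int(row[1]), month=int(row[2]), day=1)

print(date)  # 2012-01-01 00:00:00
```

A (year, month) tuple would work just as well as a dictionary key for counting; the datetime object has the advantage of sorting and formatting naturally as a date.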

Task: Use a list comprehension to create a datetime.datetime object for each row. Assign the result to dates.
- The year column is the second element in each row.
- The month column is the third element in each row.
- Make sure to convert year and month to integers using int().
- Pass year, month, and day=1 into the datetime.datetime() function.

Display the first 5 rows in dates to verify everything worked.

In [4]:
```python
# Solution:
import datetime
dates = [datetime.datetime(year=int(row[1]), month=int(row[2]), day=1) for row in data_noheader]
dates[:5]
```

[datetime.datetime(2012, 1, 1, 0, 0),
 datetime.datetime(2012, 1, 1, 0, 0),
 datetime.datetime(2012, 1, 1, 0, 0),
 datetime.datetime(2012, 2, 1, 0, 0),
 datetime.datetime(2012, 2, 1, 0, 0)]

Task: 
Count up how many times each unique date occurs in dates. Assign the result to date_counts.
- This follows the same procedure we used earlier with year_counts.

Display date_counts.
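The same counting loop recurs throughout this notebook (years, dates, and later sex and race), so it could be factored into a small helper. count_values below is a hypothetical name, not part of the original solution; it works for any hashable items, including datetime objects:

```python
import datetime

def count_values(items):
    """Return a dict mapping each distinct item to how often it occurs."""
    counts = {}
    for item in items:
        # dict.get supplies 0 the first time an item is seen.
        counts[item] = counts.get(item, 0) + 1
    return counts

sample_dates = [
    datetime.datetime(2012, 1, 1),
    datetime.datetime(2012, 1, 1),
    datetime.datetime(2012, 2, 1),
]
print(count_values(sample_dates))
```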

In [5]:
```python
# Solution:
date_counts = {}
for date in dates:
    if date in date_counts:
        date_counts[date] += 1
    else:
        date_counts[date] = 1
date_counts
```

{datetime.datetime(2012, 1, 1, 0, 0): 2758,
 datetime.datetime(2012, 2, 1, 0, 0): 2357,
 datetime.datetime(2012, 3, 1, 0, 0): 2743,
 datetime.datetime(2012, 4, 1, 0, 0): 2795,
 datetime.datetime(2012, 5, 1, 0, 0): 2999,
 datetime.datetime(2012, 6, 1, 0, 0): 2826,
 datetime.datetime(2012, 7, 1, 0, 0): 3026,
 datetime.datetime(2012, 8, 1, 0, 0): 2954,
 datetime.datetime(2012, 9, 1, 0, 0): 2852,
 datetime.datetime(2012, 10, 1, 0, 0): 2733,
 datetime.datetime(2012, 11, 1, 0, 0): 2729,
 datetime.datetime(2012, 12, 1, 0, 0): 2791,
 datetime.datetime(2013, 1, 1, 0, 0): 2864,
 datetime.datetime(2013, 2, 1, 0, 0): 2375,
 datetime.datetime(2013, 3, 1, 0, 0): 2862,
 datetime.datetime(2013, 4, 1, 0, 0): 2798,
 datetime.datetime(2013, 5, 1, 0, 0): 2806,
 datetime.datetime(2013, 6, 1, 0, 0): 2920,
 datetime.datetime(2013, 7, 1, 0, 0): 3079,
 datetime.datetime(2013, 8, 1, 0, 0): 2859,
 datetime.datetime(2013, 9, 1, 0, 0): 2742,
 datetime.datetime(2013, 10, 1, 0, 0): 2808,
 datetime.datetime(2013, 11,

### Exploring Gun Deaths by Gender and Race

The sex and race columns contain potentially interesting information on how gun deaths in the US vary by gender and race. We can explore both columns with the same dictionary-counting technique we used earlier.

Task: Count up how many times each item in the sex column occurs.
- Assign the result to sex_counts.

Count up how many times each item in the race column occurs.
- Assign the result to race_counts.

Display race_counts and sex_counts to verify your work, and see if you can spot any patterns.
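The solutions below hard-code column positions such as row[5] and row[7]. A minimal sketch of looking the positions up by name with list.index() instead, using the header row and two data rows copied from the earlier output; column is a hypothetical helper name:

```python
# Header row copied from the output of the second cell.
headers = ['', 'year', 'month', 'intent', 'police', 'sex',
           'age', 'race', 'hispanic', 'place', 'education']

# Two rows copied from the dataset output above.
rows = [
    ['1', '2012', '01', 'Suicide', '0', 'M', '34',
     'Asian/Pacific Islander', '100', 'Home', '4'],
    ['2', '2012', '01', 'Suicide', '0', 'F', '21',
     'White', '100', 'Street', '3'],
]

def column(rows, headers, name):
    """Return the values of the column called name from every row."""
    idx = headers.index(name)  # raises ValueError if name is missing
    return [row[idx] for row in rows]

print(column(rows, headers, 'sex'))   # ['M', 'F']
print(column(rows, headers, 'race'))  # ['Asian/Pacific Islander', 'White']
```

Looking positions up by name keeps the code correct if the column order in the CSV ever changes.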

In [7]:
```python
# Solution:
sex = [row[5] for row in data_noheader]
sex_counts = {}
for gender in sex:
    if gender in sex_counts:
        sex_counts[gender] += 1
    else:
        sex_counts[gender] = 1
print(sex_counts)
```

{'M': 86349, 'F': 14449}


In [8]:
```python
# Solution, cont:
races = [row[7] for row in data_noheader]
race_counts = {}
for race in races:
    if race in race_counts:
        race_counts[race] += 1
    else:
        race_counts[race] = 1
print(race_counts)
```

{'Native American/Native Alaskan': 917, 'Hispanic': 9022, 'Asian/Pacific Islander': 1326, 'White': 66237, 'Black': 23296}


Task: Write a markdown cell detailing what you've learned so far, and what you think might need further examination.

#### Solution: 
Some things that seem apparent in the data:
- There appear to be more gun deaths during the summer months; for example, July had the highest count in 2012.
- White, Black, and Hispanic victims account for most gun deaths, in that order. However, it's unclear whether this is disproportionate relative to each group's share of the US population.
- Males account for a much larger share of victims than females (86,349 vs. 14,449).
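One caveat for the month comparison: months differ in length, so February's lower count may partly reflect its 28 or 29 days. A minimal sketch of converting monthly counts into deaths per day with calendar.monthrange(), applied to three of the 2012 counts from the date_counts output above. This only adjusts for month length; it does not test whether the summer difference is meaningful:

```python
import calendar
import datetime

# Three monthly totals copied from the date_counts output above (2012).
monthly = {
    datetime.datetime(2012, 1, 1): 2758,
    datetime.datetime(2012, 2, 1): 2357,
    datetime.datetime(2012, 7, 1): 3026,
}

per_day = {}
for date, count in monthly.items():
    # monthrange returns (weekday of the 1st, number of days in the month).
    days = calendar.monthrange(date.year, date.month)[1]
    per_day[date] = count / days

for date, rate in per_day.items():
    print(date.strftime('%Y-%m'), round(rate, 1))
```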

### Exploring Country Demographics

We explored gun deaths by race in the previous step. However, that analysis only gives us the total number of gun deaths by race in the US. Unless we know the proportion of each race in the US, we won't be able to meaningfully compare those numbers. What we really want is a rate of gun deaths per 100000 people of each race. To compute it, we'll need data on how many people in the US fall into each racial category. Luckily, we can import some census data to help us out.

The data contains information on the total population of the US, as well as the total population of each racial group in the US. The data is stored in the census.csv file and consists of only two rows.

The first row is a header row, and the second row consists of population counts. We'll need to read this file in using the csv.reader() function.
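Because the file has one header row and one value row, zip() can pair each column name with its value. A minimal sketch, using a subset of the columns copied from the census output in this notebook and converting the counts with int(). Note that the real header repeats 'Id', so a dict built from the full row would keep only the last 'Id' value; the race columns are unaffected:

```python
# Header and value rows: a subset of the columns in census.csv,
# copied from the census output in this notebook.
census = [
    ['Geography', 'Total', 'Race Alone - White', 'Race Alone - Asian'],
    ['United States', '308745538', '197318956', '15159516'],
]

# zip pairs each header with the value in the same position.
census_row = dict(zip(census[0], census[1]))

# The population columns are strings, so convert them before doing math.
white_pop = int(census_row['Race Alone - White'])
print(white_pop)  # 197318956
```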

Task: 

Read in census.csv, and convert to a list of lists. Assign the result to the census variable.

Display census to verify your work.

In [12]:
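```python
# Solution:
# Read census.csv into a list of lists: a header row and a row of counts.
# (csv was imported in the first cell.)
with open('census.csv', 'r') as f:
    census = list(csv.reader(f))
census
```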

[['Id',
  'Year',
  'Id',
  'Sex',
  'Id',
  'Hispanic Origin',
  'Id',
  'Id2',
  'Geography',
  'Total',
  'Race Alone - White',
  'Race Alone - Hispanic',
  'Race Alone - Black or African American',
  'Race Alone - American Indian and Alaska Native',
  'Race Alone - Asian',
  'Race Alone - Native Hawaiian and Other Pacific Islander',
  'Two or More Races'],
 ['cen42010',
  'April 1, 2010 Census',
  'totsex',
  'Both Sexes',
  'tothisp',
  'Total',
  '0100000US',
  '',
  'United States',
  '308745538',
  '197318956',
  '44618105',
  '40250635',
  '3739506',
  '15159516',
  '674625',
  '6984195']]

### Exploring Gun Deaths Per Capita by Race

Earlier, we computed the number of gun deaths per race, and created a dictionary, race_counts, that looked like this:


```python
{
     'Asian/Pacific Islander': 1326,
     'Black': 23296,
     'Hispanic': 9022,
     'Native American/Native Alaskan': 917,
     'White': 66237
}
```

To get from the raw counts of gun deaths by race to a rate of gun deaths per 100000 people in each race, we'll divide the number of gun deaths in each race by that race's population. From the census dataset, we know that the White racial category has 197318956 people, so we'd divide 66237 by 197318956:


```python
white_gun_death_rate = 66237 / 197318956
```

This gives us the proportion of people in the White census race category who were killed by a gun in the US from 2012 to 2014. If you do this computation, you'll see that it's a very small number, 0.0003356849303419181. This is why crime statistics are typically expressed as a "rate per 100000": the number of people out of every 100000 in a given group who were killed by guns. To get this, we just multiply by 100000:


```python
rate_per_hundredk = 0.0003356849303419181 * 100000
```

This gives us about 33.57, which we can interpret as "about 33.57 out of every 100000 people in the White census race category in the US were killed by guns between 2012 and 2014".
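The two steps above can be combined into one small function. A minimal sketch; rate_per_100k is a hypothetical name, and the inputs are the White figures quoted above:

```python
def rate_per_100k(deaths, population):
    """Deaths per 100,000 people over whatever period deaths covers."""
    return deaths / population * 100000

white_rate = rate_per_100k(66237, 197318956)
print(round(white_rate, 2))  # 33.57
```

Note that the deaths span three years (2012 to 2014) while the population figures come from the April 1, 2010 Census, so this is a cumulative three-year rate; dividing by 3 would give a rough annual figure.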

We'll need to calculate these same rates for each racial category. The only stumbling block is that the racial categories are named slightly differently in census and in data. We'll need to manually construct a dictionary that allows us to map between them, and perform the division. 

Here's a list of the race name in data, and the corresponding race name in census:
- Asian/Pacific Islander -- Race Alone - Asian plus Race Alone - Native Hawaiian and Other Pacific Islander
- Black -- Race Alone - Black or African American
- Hispanic -- Race Alone - Hispanic
- Native American/Native Alaskan -- Race Alone - American Indian and Alaska Native
- White -- Race Alone - White

We'll need to create a dictionary that has each race name from data as a key, and has the population count for the races from census as the values.

Task: Manually create a dictionary called mapping that maps each key from race_counts to the population count of that race from census.
- The keys in the dictionary should be Asian/Pacific Islander, Black, Native American/Native Alaskan, Hispanic, and White.
- In the case of Asian/Pacific Islander, you'll need to add the counts from census for Race Alone - Asian, and Race Alone - Native Hawaiian and Other Pacific Islander.
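Instead of typing the population figures by hand, the mapping could be built from the census column names listed above. A minimal sketch, assuming census_row is a dict from census header to value (for example dict(zip(census[0], census[1]))); here it is filled in by hand with values copied from the census output. CENSUS_COLUMNS is a hypothetical name:

```python
# Census values copied from the census output (strings, as csv.reader returns them).
census_row = {
    'Race Alone - White': '197318956',
    'Race Alone - Hispanic': '44618105',
    'Race Alone - Black or African American': '40250635',
    'Race Alone - American Indian and Alaska Native': '3739506',
    'Race Alone - Asian': '15159516',
    'Race Alone - Native Hawaiian and Other Pacific Islander': '674625',
}

# Each race name in data maps to one or more census columns to sum.
CENSUS_COLUMNS = {
    'Asian/Pacific Islander': ['Race Alone - Asian',
                               'Race Alone - Native Hawaiian and Other Pacific Islander'],
    'Black': ['Race Alone - Black or African American'],
    'Hispanic': ['Race Alone - Hispanic'],
    'Native American/Native Alaskan': ['Race Alone - American Indian and Alaska Native'],
    'White': ['Race Alone - White'],
}

mapping = {
    race: sum(int(census_row[col]) for col in cols)
    for race, cols in CENSUS_COLUMNS.items()
}
print(mapping['Asian/Pacific Islander'])  # 15834141
```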

In [13]:
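```python
# Solution:
# Census population counts for each race category used in data.
# Asian/Pacific Islander combines two census columns:
# Native Hawaiian and Other Pacific Islander (674625) + Asian (15159516).
CAPI = 674625 + 15159516
CB = 40250635     # Black or African American
CNA = 3739506     # American Indian and Alaska Native
CH = 44618105     # Hispanic
CW = 197318956    # White
mapping = {'Asian/Pacific Islander': CAPI, 'Black': CB, 'Native American/Native Alaskan': CNA, 'Hispanic': CH, 'White': CW}
mapping
```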

{'Asian/Pacific Islander': 15834141,
 'Black': 40250635,
 'Hispanic': 44618105,
 'Native American/Native Alaskan': 3739506,
 'White': 197318956}

Task:
- Create an empty dictionary, race_per_hundredk.

- Loop through each key in race_counts.
    - Divide the value associated with the key in race_counts by the value associated with the key in mapping.
    - Multiply by 100000.
    - Assign the result to the same key in race_per_hundredk.
    
- When you're done, race_per_hundredk should contain the rate of gun deaths per 100000 people for each racial category.
- Print race_per_hundredk to verify your work.   
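The loop described above can also be written as a dictionary comprehension. A minimal sketch using two entries each from race_counts and mapping as they appear in this notebook:

```python
race_counts = {'White': 66237, 'Black': 23296}
mapping = {'White': 197318956, 'Black': 40250635}

# One entry per race: deaths divided by population, scaled to 100,000.
race_per_hundredk = {
    race: count / mapping[race] * 100000
    for race, count in race_counts.items()
}
print(race_per_hundredk)
```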

In [15]:
```python
# Solution:
race_per_hundredk = {}
for k, v in race_counts.items():
    race_per_hundredk[k] = (v / mapping[k]) * 100000

print(race_per_hundredk)
```

{'Native American/Native Alaskan': 24.521955573811088, 'Hispanic': 20.220491210910907, 'Asian/Pacific Islander': 8.374309664161762, 'White': 33.56849303419181, 'Black': 57.8773477735196}


### Exploring Gun Homicides

We can filter our results and restrict them to the Homicide intent. This will tell us the gun homicide rate per 100000 people in each racial category. To do this, we'll need to redo our work in generating race_counts, but count only rows where the intent was Homicide.

We can do this by first extracting the intent column, then using the enumerate() function to loop through each index and value in the race column. If the value at the same position in intents is Homicide, we'll count the value in the race column.

Finally, we'll use the mapping dictionary to convert from raw counts to rates.

Task: 
- Extract the intent column using a list comprehension. The intent column is the fourth column in data.
    - Assign the result to intents.
- Extract the race column using a list comprehension. The race column is the eighth column in data.
    - Assign the result to races.
- Create an empty dictionary called homicide_race_counts
- Use the enumerate() function to loop through each item in races. The position should be assigned to the loop variable i, and the value to the loop variable race.
    - Check the value at position i in intents.
    - If the value at position i in intents is Homicide:
        - If the key race doesn't exist in homicide_race_counts, create it.
        - Add 1 to the value associated with race in homicide_race_counts.
- When you're done, homicide_race_counts should have one key for each of the racial categories in data. The associated value should be the number of gun deaths by homicide for that race.
- Perform the same procedure we did in the previous step, using mapping on homicide_race_counts to get from raw numbers to rates per 100000.
- Display homicide_race_counts to verify your work.
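enumerate() is one way to keep intents and races aligned. An alternative sketch pairs the two lists with zip() and counts with collections.Counter; the short lists below are hand-made stand-ins, not the real columns:

```python
from collections import Counter

# Hand-made stand-ins for the intents and races columns.
intents = ['Homicide', 'Suicide', 'Homicide', 'Homicide']
races = ['Black', 'White', 'Hispanic', 'Black']

# zip walks both lists in lockstep, so each race stays paired with its intent.
homicide_race_counts = Counter(
    race for race, intent in zip(races, intents) if intent == 'Homicide'
)
print(homicide_race_counts['Black'])  # 2
```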

In [17]:
```python
# Solution:
intents = [row[3] for row in data_noheader]
races = [row[7] for row in data_noheader]
homicide_race_counts = {}
for i, race in enumerate(races):
    if intents[i] == 'Homicide':
        if race in homicide_race_counts:
            homicide_race_counts[race] += 1
        else:
            homicide_race_counts[race] = 1
homicide_race_counts
```

{'Asian/Pacific Islander': 559,
 'Black': 19510,
 'Hispanic': 5634,
 'Native American/Native Alaskan': 326,
 'White': 9147}

In [50]:
```python
# Additional analysis step looking at the gun homicide rate per 100000.
# A separate name keeps the all-intent rates in race_per_hundredk intact.
homicide_race_per_hundredk = {}
for k, v in homicide_race_counts.items():
    homicide_race_per_hundredk[k] = (v / mapping[k]) * 100000

homicide_race_per_hundredk
```

{'Asian/Pacific Islander': 3.530346230970155,
 'Black': 48.471284987180944,
 'Hispanic': 12.627161104219914,
 'Native American/Native Alaskan': 8.717729026240365,
 'White': 4.6356417981453335}

Task:
- Write up your findings in a markdown cell.
- Write up any next steps you want to pursue with the data in a markdown cell.

#### Solution: 
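It appears that gun homicides disproportionately affect Black and Hispanic individuals: the gun homicide rate is about 48.5 per 100000 for Black individuals and 12.6 for Hispanic individuals, compared with about 4.6 for White individuals. Across all intents, the gun death rate is highest for Black (about 57.9) and White (about 33.6) individuals.

Gun deaths appear to be more frequent during the summer months.

It would be interesting to know more about the geographic breakdown of where these deaths are occurring, and to understand other factors affecting them.

Other intents, including suicide, would also be worth examining more closely.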

### Potential Next Steps:

That's it for the guided steps! We recommend exploring the data more on your own.

Here are some potential next steps:
- Figure out the link, if any, between month and homicide rate.
- Explore the homicide rate by gender.
- Explore the rates of other intents, like Accidental, by gender and race.
- Find out if gun death rates correlate with location and education.
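For the gender-related next steps, one starting point is a cross-tabulation of intent by sex. A minimal sketch using a Counter keyed by (intent, sex) tuples; the rows are hand-made stand-ins in the dataset's column layout (intent at index 3, sex at index 5). Turning these counts into per-capita rates would also need population figures by sex, which the census extract used here does not provide (it covers Both Sexes only):

```python
from collections import Counter

# Hand-made rows in the dataset's layout: intent at index 3, sex at index 5.
rows = [
    ['1', '2012', '01', 'Homicide', '0', 'M'],
    ['2', '2012', '01', 'Suicide', '0', 'F'],
    ['3', '2012', '02', 'Homicide', '0', 'M'],
    ['4', '2012', '02', 'Accidental', '0', 'M'],
]

# Count each (intent, sex) combination.
intent_by_sex = Counter((row[3], row[5]) for row in rows)

for (intent, sex), n in sorted(intent_by_sex.items()):
    print(intent, sex, n)
```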