For this project, I'm analyzing gun deaths in the United States using data from https://github.com/fivethirtyeight/guns-data.

In [1]:
import csv
import requests

csv_url = 'https://raw.githubusercontent.com/fivethirtyeight/guns-data/master/full_data.csv'

with requests.Session() as s:
    download = s.get(csv_url)
    decoded_content = download.content.decode('utf-8')
    data = list(csv.reader(decoded_content.splitlines(), delimiter=','))
    

Headers comprise the first row of data. I'll separate them below:

In [2]:
headers = data[0]
data = data[1:]
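
The later cells pick columns by position (row[1], row[5], row[7]). As a minimal sketch of an alternative, the headers can be turned into a name-to-position lookup. The sample rows and header names below are assumptions made up for illustration, not output from the notebook:

```python
# Hypothetical sample shaped like the dataset; the header names are assumed.
sample = [
    ["", "year", "month", "intent", "police", "sex", "age", "race"],
    ["1", "2012", "01", "Suicide", "0", "M", "34", "Asian/Pacific Islander"],
    ["2", "2012", "01", "Suicide", "0", "F", "21", "White"],
]

sample_headers = sample[0]
sample_rows = sample[1:]

# Map each header name to its column position.
column_index = {name: i for i, name in enumerate(sample_headers)}

# Look the column up by name instead of hardcoding the number.
year_col = column_index["year"]
years_in_sample = [row[year_col] for row in sample_rows]
print(years_in_sample)
```

Looking columns up by name keeps the code readable if the column order ever changes.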

I want to know how many gun deaths occurred per year.

I'll start by using list comprehension to create a list of all of the years in the dataset:

In [3]:
years = [row[1] for row in data]

Next, I'll create a dictionary that maps each year to its number of gun deaths. Each row in the dataset represents one death, so counting how often a year appears in the list gives the total for that year:

In [4]:
year_counts = {}
for year in years:
    if year not in year_counts:
        year_counts[year] = 1
    else:
        year_counts[year] += 1

In [5]:
year_counts

{'2012': 33563, '2013': 33636, '2014': 33599}

The number of gun deaths hovers around 33,600 annually for the years 2012, 2013, and 2014. If there are differences between the years, we won't see them at this level.
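
The counting loop above can also be written with collections.Counter from the standard library. This is a sketch on a made-up list (sample_years is hypothetical), not the notebook's actual data:

```python
from collections import Counter

# Hypothetical stand-in for the notebook's `years` list.
sample_years = ["2012", "2013", "2012", "2014", "2013", "2012"]

# Counter tallies how many times each value appears.
sample_year_counts = Counter(sample_years)
print(dict(sample_year_counts))
```

Counter is a dict subclass, so the result can be indexed and iterated the same way as the hand-built dictionary.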

The next step is to consider the data on a monthly basis. We'll use the same method as we did above, but for months instead of years.

In [6]:
import datetime
dates = [datetime.datetime(year=int(row[1]), month=int(row[2]), day=1) for row in data]

In [7]:
date_counts = {}
for date in dates:
    if date not in date_counts:
        date_counts[date] = 1
    else:
        date_counts[date] += 1

In [8]:
date_counts

{datetime.datetime(2012, 1, 1, 0, 0): 2758,
 datetime.datetime(2012, 2, 1, 0, 0): 2357,
 datetime.datetime(2012, 3, 1, 0, 0): 2743,
 datetime.datetime(2012, 4, 1, 0, 0): 2795,
 datetime.datetime(2012, 5, 1, 0, 0): 2999,
 datetime.datetime(2012, 6, 1, 0, 0): 2826,
 datetime.datetime(2012, 7, 1, 0, 0): 3026,
 datetime.datetime(2012, 8, 1, 0, 0): 2954,
 datetime.datetime(2012, 9, 1, 0, 0): 2852,
 datetime.datetime(2012, 10, 1, 0, 0): 2733,
 datetime.datetime(2012, 11, 1, 0, 0): 2729,
 datetime.datetime(2012, 12, 1, 0, 0): 2791,
 datetime.datetime(2013, 1, 1, 0, 0): 2864,
 datetime.datetime(2013, 2, 1, 0, 0): 2375,
 datetime.datetime(2013, 3, 1, 0, 0): 2862,
 datetime.datetime(2013, 4, 1, 0, 0): 2798,
 datetime.datetime(2013, 5, 1, 0, 0): 2806,
 datetime.datetime(2013, 6, 1, 0, 0): 2920,
 datetime.datetime(2013, 7, 1, 0, 0): 3079,
 datetime.datetime(2013, 8, 1, 0, 0): 2859,
 datetime.datetime(2013, 9, 1, 0, 0): 2742,
 datetime.datetime(2013, 10, 1, 0, 0): 2808,
 datetime.datetime(2013, 11,
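
A dictionary keyed by datetime objects is easier to read when printed in chronological order. The sketch below shows one way to do that on a few made-up (year, month) pairs; sample_year_months is hypothetical and stands in for row[1] and row[2]:

```python
import datetime
from collections import Counter

# Hypothetical (year, month) pairs standing in for row[1] and row[2].
sample_year_months = [("2012", "02"), ("2012", "01"), ("2012", "02"), ("2013", "01")]

# Pin every date to the first of the month so each month shares one key.
sample_dates = [
    datetime.datetime(year=int(y), month=int(m), day=1)
    for y, m in sample_year_months
]
sample_date_counts = Counter(sample_dates)

# datetime objects compare chronologically, so sorting the items
# lists the months in calendar order.
for month_start, count in sorted(sample_date_counts.items()):
    print(month_start.strftime("%Y-%m"), count)
```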

I'll do the same thing for sex and race:

In [9]:
sexes = [row[5] for row in data]
sex_counts = {}
for sex in sexes:
    if sex not in sex_counts:
        sex_counts[sex] = 1
    else:
        sex_counts[sex] += 1
        
races = [row[7] for row in data]
race_counts = {}
for race in races:
    if race not in race_counts:
        race_counts[race] = 1
    else:
        race_counts[race] += 1

In [10]:
print(sex_counts, race_counts)

{'M': 86349, 'F': 14449} {'Black': 23296, 'Native American/Native Alaskan': 917, 'Asian/Pacific Islander': 1326, 'Hispanic': 9022, 'White': 66237}


Now that we have the raw number of gun deaths for each racial and ethnic group, we can learn more by expressing each count relative to the size of that group's population.

In other words, we want the number of gun deaths per 100,000 people in each group. Because the counts cover 2012 through 2014, the resulting figures are three-year totals rather than annual rates.

To do this, I've hardcoded some census data into a dictionary:

In [11]:
mapping = {"Asian/Pacific Islander":15834141, "Black":40250635, "Native American/Native Alaskan":3739506, "Hispanic":44618105, "White":197318956}

race_per_hundredK = {}
for race in race_counts:
    race_per_hundredK[race] = (race_counts[race]/mapping[race])*100000
    
print(race_per_hundredK)

{'Black': 57.8773477735196, 'White': 33.56849303419181, 'Hispanic': 20.220491210910907, 'Asian/Pacific Islander': 8.374309664161762, 'Native American/Native Alaskan': 24.521955573811088}
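
To make the arithmetic explicit, here is the same calculation as a small function, checked against one of the figures above. The function name rate_per_100k is hypothetical. The division by three assumes the counts span the three years 2012 through 2014 and that the population figure is a fair estimate for each of them:

```python
def rate_per_100k(deaths, population):
    """Return the number of deaths per 100,000 people."""
    return deaths / population * 100_000

# Figures from the notebook: 23,296 deaths in a population of 40,250,635.
black_rate = rate_per_100k(23296, 40250635)
print(round(black_rate, 2))  # 57.88

# Assumption: three years of data, so a rough average annual rate
# is the three-year total divided by 3.
black_rate_annual = black_rate / 3
print(round(black_rate_annual, 2))  # 19.29
```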


Running the same analysis on only the observations whose intent is homicide yields the gun homicide rate.

I'll use the enumerate() function, which yields each item in a list together with its index. Looping over enumerate(races) gives an index and a race. I'll use the index to look up the intent for the same observation. If the intent is homicide, I'll increase the count for that race by 1.

In [16]:
intents = [row[3] for row in data]

homicide_race_count = {}

for i, race in enumerate(races):
    if intents[i] == "Homicide":
        if race not in homicide_race_count:
            homicide_race_count[race]=1
        else:
            homicide_race_count[race]+=1
            
hom_per_race_per_hundredK = {}
for race in homicide_race_count:
    hom_per_race_per_hundredK[race] = (homicide_race_count[race]/mapping[race])*100000
    
print(hom_per_race_per_hundredK)


{'Black': 48.471284987180944, 'White': 4.6356417981453335, 'Hispanic': 12.627161104219914, 'Asian/Pacific Islander': 3.530346230970155, 'Native American/Native Alaskan': 8.717729026240365}
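
As a small illustration of how enumerate() pairs indexes with values, and of zip() as an alternative for walking two parallel lists, here is a sketch on made-up data. The sample lists are hypothetical and are not drawn from the dataset:

```python
# Hypothetical parallel lists, standing in for `races` and `intents`.
sample_races = ["White", "Black", "White", "Hispanic"]
sample_intents = ["Suicide", "Homicide", "Homicide", "Accidental"]

# enumerate() yields (index, value) pairs.
print(list(enumerate(sample_races))[:2])  # [(0, 'White'), (1, 'Black')]

# zip() pairs the two lists directly, so no index lookup is needed.
homicides_by_race = {}
for race, intent in zip(sample_races, sample_intents):
    if intent == "Homicide":
        homicides_by_race[race] = homicides_by_race.get(race, 0) + 1

print(homicides_by_race)  # {'Black': 1, 'White': 1}
```

Both approaches give the same counts; zip() simply avoids indexing into the second list by hand.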
