**ANALYZING US GUN DEATHS 2012-2014**

**Introducing the Data**

In [1]:
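import csv

# Read the whole CSV into a list of rows (each row is a list of strings).
# The with-block closes the file once the rows have been read.
with open('guns.csv', 'r', newline='') as file:
    reader = csv.reader(file)
    data = list(reader)

data[0:5]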

[['',
  'year',
  'month',
  'intent',
  'police',
  'sex',
  'age',
  'race',
  'hispanic',
  'place',
  'education'],
 ['1',
  '2012',
  '01',
  'Suicide',
  '0',
  'M',
  '34',
  'Asian/Pacific Islander',
  '100',
  'Home',
  '4'],
 ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'],
 ['3',
  '2012',
  '01',
  'Suicide',
  '0',
  'M',
  '60',
  'White',
  '100',
  'Other specified',
  '4'],
 ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4']]

**Removing Headers from a List of Lists**

In [2]:
# Keep the header row (as a one-row list of lists) and drop it from the data.
headers = data[0:1]
data = data[1:]
headers

[['',
  'year',
  'month',
  'intent',
  'police',
  'sex',
  'age',
  'race',
  'hispanic',
  'place',
  'education']]

In [3]:
data[:5]

[['1',
  '2012',
  '01',
  'Suicide',
  '0',
  'M',
  '34',
  'Asian/Pacific Islander',
  '100',
  'Home',
  '4'],
 ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'],
 ['3',
  '2012',
  '01',
  'Suicide',
  '0',
  'M',
  '60',
  'White',
  '100',
  'Other specified',
  '4'],
 ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4'],
 ['5',
  '2012',
  '02',
  'Suicide',
  '0',
  'M',
  '31',
  'White',
  '100',
  'Other specified',
  '2']]

**Counting Gun Deaths By Year**

In [4]:
years = [ row[1] for row in data ]

year_counts = {}
for year in years:
    if year not in year_counts:
        year_counts[year] = 1
    else:  
        year_counts[year] += 1

year_counts

{'2012': 33563, '2013': 33636, '2014': 33599}
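
The same tally can also be written with `collections.Counter` from the standard library. The sketch below is only an illustration: `demo_year_rows` is a hypothetical handful of rows in the `guns.csv` column layout (not real records), so the cell runs without the file.

```python
from collections import Counter

# Hypothetical sample rows in the guns.csv column layout (NOT real data):
# [id, year, month, intent, police, sex, age, race, hispanic, place, education]
demo_year_rows = [
    ['1', '2012', '01', 'Suicide', '0', 'M', '34', 'White', '100', 'Home', '4'],
    ['2', '2012', '02', 'Homicide', '0', 'F', '21', 'White', '100', 'Street', '3'],
    ['3', '2013', '01', 'Suicide', '0', 'M', '60', 'Black', '100', 'Home', '4'],
]

# Counter tallies each distinct value found in column 1 (the year).
demo_year_counts = Counter(row[1] for row in demo_year_rows)
print(dict(demo_year_counts))
```

With the real data, `Counter(row[1] for row in data)` should give the same counts as the loop above.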

**Exploring Gun Deaths By Month And Year**

In [5]:
import datetime
dates = [ datetime.datetime(year = int(row[1]), month = int(row[2]), day = 1) for row in data ]
dates[0:5]

[datetime.datetime(2012, 1, 1, 0, 0),
 datetime.datetime(2012, 1, 1, 0, 0),
 datetime.datetime(2012, 1, 1, 0, 0),
 datetime.datetime(2012, 2, 1, 0, 0),
 datetime.datetime(2012, 2, 1, 0, 0)]

In [6]:
date_counts = {}
for date in dates:
    if date in date_counts:
        date_counts[date] += 1
    else:
        date_counts[date] = 1

date_counts

{datetime.datetime(2012, 1, 1, 0, 0): 2758,
 datetime.datetime(2012, 2, 1, 0, 0): 2357,
 datetime.datetime(2012, 3, 1, 0, 0): 2743,
 datetime.datetime(2012, 4, 1, 0, 0): 2795,
 datetime.datetime(2012, 5, 1, 0, 0): 2999,
 datetime.datetime(2012, 6, 1, 0, 0): 2826,
 datetime.datetime(2012, 7, 1, 0, 0): 3026,
 datetime.datetime(2012, 8, 1, 0, 0): 2954,
 datetime.datetime(2012, 9, 1, 0, 0): 2852,
 datetime.datetime(2012, 10, 1, 0, 0): 2733,
 datetime.datetime(2012, 11, 1, 0, 0): 2729,
 datetime.datetime(2012, 12, 1, 0, 0): 2791,
 datetime.datetime(2013, 1, 1, 0, 0): 2864,
 datetime.datetime(2013, 2, 1, 0, 0): 2375,
 datetime.datetime(2013, 3, 1, 0, 0): 2862,
 datetime.datetime(2013, 4, 1, 0, 0): 2798,
 datetime.datetime(2013, 5, 1, 0, 0): 2806,
 datetime.datetime(2013, 6, 1, 0, 0): 2920,
 datetime.datetime(2013, 7, 1, 0, 0): 3079,
 datetime.datetime(2013, 8, 1, 0, 0): 2859,
 datetime.datetime(2013, 9, 1, 0, 0): 2742,
 datetime.datetime(2013, 10, 1, 0, 0): 2808,
 datetime.datetime(2013, 11,

**Counting Gun Deaths by Sex and Race**

In [7]:
sexes = [ row[5] for row in data ]
sex_counts = {}
for sex in sexes:
    if sex in sex_counts:
        sex_counts[sex] += 1
    else: 
        sex_counts[sex] = 1
        
sex_counts

{'F': 14449, 'M': 86349}

In [8]:
races = [ row[7] for row in data ]
race_counts = {}
for race in races:
    if race in race_counts:
        race_counts[race] += 1
    else: 
        race_counts[race] = 1
        
race_counts

{'Asian/Pacific Islander': 1326,
 'Black': 23296,
 'Hispanic': 9022,
 'Native American/Native Alaskan': 917,
 'White': 66237}

**Findings so Far**

Gun deaths in the US disproportionately affect men: males account for about 86% of the deaths in this dataset (86,349 of 100,798). They may also disproportionately affect some minority groups, but raw counts cannot show that on their own; they need to be compared against each race's share of the overall US population.
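
As a rough check on the claim about men and women, the counts can be turned into percentage shares. This is a minimal sketch; `observed_sex_counts` is a name introduced here and simply copies the values printed for `sex_counts` above.

```python
# Counts copied from the sex_counts output above.
observed_sex_counts = {'F': 14449, 'M': 86349}

total_deaths = sum(observed_sex_counts.values())

# Share of all recorded gun deaths for each sex, in percent.
sex_shares = {
    sex: 100 * count / total_deaths
    for sex, count in observed_sex_counts.items()
}

for sex, share in sex_shares.items():
    print(f"{sex}: {share:.1f}%")
```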

There appears to be a minor seasonal pattern, with gun deaths peaking in the summer and dipping in the winter. February has the lowest count in both years shown above, though it is also the shortest month. It might be useful to filter by intent, to see whether different categories of intent vary differently with season, race, or gender.
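
Because months differ in length, one way to look at the seasonal pattern is deaths per day rather than per month. The sketch below does this for 2012 only, using the monthly totals printed in the `date_counts` output above; the variable names are introduced here for illustration.

```python
import calendar

# Monthly totals for 2012, copied from the date_counts output above.
deaths_2012 = {
    1: 2758, 2: 2357, 3: 2743, 4: 2795, 5: 2999, 6: 2826,
    7: 3026, 8: 2954, 9: 2852, 10: 2733, 11: 2729, 12: 2791,
}

# calendar.monthrange(year, month) returns (weekday of day 1, days in month).
deaths_per_day = {
    month: count / calendar.monthrange(2012, month)[1]
    for month, count in deaths_2012.items()
}

busiest_month = max(deaths_per_day, key=deaths_per_day.get)
quietest_month = min(deaths_per_day, key=deaths_per_day.get)
print(busiest_month, quietest_month)
```

This covers a single year, so it is a hint at the pattern rather than evidence of it.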

**Adding a Second Dataset: Census**

In [9]:
import csv

with open('census.csv', 'r') as file:
    reader = csv.reader(file)
    census = list(reader)
    
census

[['Id',
  'Year',
  'Id',
  'Sex',
  'Id',
  'Hispanic Origin',
  'Id',
  'Id2',
  'Geography',
  'Total',
  'Race Alone - White',
  'Race Alone - Hispanic',
  'Race Alone - Black or African American',
  'Race Alone - American Indian and Alaska Native',
  'Race Alone - Asian',
  'Race Alone - Native Hawaiian and Other Pacific Islander',
  'Two or More Races'],
 ['cen42010',
  'April 1, 2010 Census',
  'totsex',
  'Both Sexes',
  'tothisp',
  'Total',
  '0100000US',
  '',
  'United States',
  '308745538',
  '197318956',
  '44618105',
  '40250635',
  '3739506',
  '15159516',
  '674625',
  '6984195']]

**Gun Deaths By Race Per 100K People** 

In [10]:
mapping = {
    "Asian/Pacific Islander": 15159516 + 674625,
    "Native American/Native Alaskan": 3739506,
    "Black": 40250635,
    "Hispanic": 44618105,
    "White": 197318956
}

race_per_hundredk = {}
for key, value in race_counts.items():
    race_per_hundredk[key] = (value / mapping[key]) * 100000

race_per_hundredk

{'Asian/Pacific Islander': 8.374309664161762,
 'Black': 57.8773477735196,
 'Hispanic': 20.220491210910907,
 'Native American/Native Alaskan': 24.521955573811088,
 'White': 33.56849303419181}
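
The population figures in `mapping` were typed in by hand. As a sketch of an alternative, they could be looked up by column name from the census rows. The header and data row below are copied from the `census` output above; `race_to_columns` is a hypothetical helper that restates the same race-to-column correspondence the hand-written `mapping` assumes.

```python
# Header and data row copied from the census output above.
census_header = [
    'Id', 'Year', 'Id', 'Sex', 'Id', 'Hispanic Origin', 'Id', 'Id2',
    'Geography', 'Total', 'Race Alone - White', 'Race Alone - Hispanic',
    'Race Alone - Black or African American',
    'Race Alone - American Indian and Alaska Native', 'Race Alone - Asian',
    'Race Alone - Native Hawaiian and Other Pacific Islander',
    'Two or More Races',
]
census_row = [
    'cen42010', 'April 1, 2010 Census', 'totsex', 'Both Sexes', 'tothisp',
    'Total', '0100000US', '', 'United States', '308745538', '197318956',
    '44618105', '40250635', '3739506', '15159516', '674625', '6984195',
]

# Column name -> value. The repeated 'Id' columns overwrite one another,
# which is harmless here because only the race columns are used.
census_by_name = dict(zip(census_header, census_row))

# Assumed correspondence between guns.csv race labels and census columns
# (the same one the hand-written mapping uses).
race_to_columns = {
    "Asian/Pacific Islander": [
        'Race Alone - Asian',
        'Race Alone - Native Hawaiian and Other Pacific Islander',
    ],
    "Native American/Native Alaskan": [
        'Race Alone - American Indian and Alaska Native',
    ],
    "Black": ['Race Alone - Black or African American'],
    "Hispanic": ['Race Alone - Hispanic'],
    "White": ['Race Alone - White'],
}

mapping_from_census = {
    race: sum(int(census_by_name[column]) for column in columns)
    for race, columns in race_to_columns.items()
}
print(mapping_from_census)
```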

**Filter the Data by Intent for Homicides**

In [11]:
intents = [ row[3] for row in data ]
races = [ row[7] for row in data ]
homicide_race_counts = {}
for i, race in enumerate(races):
    if race not in homicide_race_counts:
        homicide_race_counts[race] = 0
    if intents[i] == "Homicide":
        homicide_race_counts[race] += 1

race_per_hundredk = {}
for key, value in homicide_race_counts.items():
    race_per_hundredk[key] = (value / mapping[key]) * 100000

race_per_hundredk

{'Asian/Pacific Islander': 3.530346230970155,
 'Black': 48.471284987180944,
 'Hispanic': 12.627161104219914,
 'Native American/Native Alaskan': 8.717729026240365,
 'White': 4.6356417981453335}

**Findings**

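It appears that gun-related homicides in the US disproportionately affect people in the Black and Hispanic racial categories: their rates (about 48.5 and 12.6 per 100K) are well above those of the White (about 4.6) and Asian/Pacific Islander (about 3.5) categories.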

Some areas to investigate further:

The link between month and homicide rate.
Homicide rate by gender.
The rates of other intents by gender and race.
Gun death rates by location and education.

**Additional Code**

**Create Function to Search for Any Criteria with Any Intent**

In [12]:
def explore_deaths(criteria_column, intent_string):
    """Count deaths with the given intent, grouped by one column, per 100K people."""
    intent_criteria_counts = {}
    for row in data:
        if row[3] == intent_string:
            key = row[criteria_column]
            intent_criteria_counts[key] = intent_criteria_counts.get(key, 0) + 1

    # Note: every group is divided by the TOTAL US population (census[1][9]),
    # not by that group's own population.
    total_population = int(census[1][9])
    criteria_per_hundredk = {}
    for key, value in intent_criteria_counts.items():
        criteria_per_hundredk[key] = (value / total_population) * 100000

    return criteria_per_hundredk

**Link Between Months and Homicide Rate Per 100K**

In [13]:
months_and_homicides = explore_deaths(2, "Homicide")
months_and_homicides

{'01': 0.9162885456825615,
 '02': 0.7054352960398087,
 '03': 0.9004178709782682,
 '04': 0.9214708068104939,
 '05': 0.963900569795441,
 '06': 1.013779833151791,
 '07': 1.0588007267007045,
 '08': 1.0121603765493121,
 '09': 0.9606616565904832,
 '10': 0.9613094392314747,
 '11': 0.9454387645271817,
 '12': 1.0335372037020336}

**Link Between Gender and Homicide Rate Per 100K**

In [14]:
gender_and_homicides = explore_deaths(5, "Homicide")
gender_and_homicides

{'F': 1.7402680650238256, 'M': 9.652933024735729}

**Link Between Race, Month, Gender and Accidental Death Rate Per 100K**

In [15]:
race_and_accidentals = explore_deaths(7, "Accidental")
race_and_accidentals

{'Asian/Pacific Islander': 0.003886695845949359,
 'Black': 0.10623635312261581,
 'Hispanic': 0.04696424147188809,
 'Native American/Native Alaskan': 0.0071256090509071585,
 'White': 0.3666449748012229}

In [16]:
month_and_accidentals = explore_deaths(2, "Accidental")
month_and_accidentals

{'01': 0.049231480715358544,
 '02': 0.04113419770296405,
 '03': 0.04340143694643451,
 '04': 0.03174134940858643,
 '05': 0.03724750185701469,
 '06': 0.03627582789552736,
 '07': 0.04825980675387121,
 '08': 0.053118176561307906,
 '09': 0.03821917581850203,
 '10': 0.04242976298494717,
 '11': 0.05182261127932479,
 '12': 0.0579765463687446}

In [17]:
gender_and_accidentals = explore_deaths(5, "Accidental")
gender_and_accidentals

{'F': 0.07060830786808002, 'M': 0.4602495664245033}

**Link Between Gun Deaths and Location, Education**

**Create Function that Doesn't Filter by Intent**

In [18]:
def explore_all_deaths(criteria_column):
    """Count all deaths, grouped by one column, per 100K people."""
    intent_criteria_counts = {}
    for row in data:
        # Same as explore_deaths, but without the intent filter.
        key = row[criteria_column]
        intent_criteria_counts[key] = intent_criteria_counts.get(key, 0) + 1

    # Note: every group is divided by the TOTAL US population (census[1][9]),
    # not by that group's own population.
    total_population = int(census[1][9])
    criteria_per_hundredk = {}
    for key, value in intent_criteria_counts.items():
        criteria_per_hundredk[key] = (value / total_population) * 100000

    return criteria_per_hundredk
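
Since the two functions differ only in the intent check, they could be folded into one with an optional filter. The following is a sketch under assumptions, not the notebook's own code: `death_rates`, `demo_rows`, and the population of 100000 are all hypothetical, and the rows are made-up examples in the `guns.csv` column layout.

```python
def death_rates(rows, column, population, intent=None):
    """Deaths per 100K of `population`, grouped by `column`.

    If `intent` is given, only rows whose intent (column 3) matches are counted.
    """
    counts = {}
    for row in rows:
        if intent is not None and row[3] != intent:
            continue
        counts[row[column]] = counts.get(row[column], 0) + 1
    return {key: count / population * 100000 for key, count in counts.items()}


# Hypothetical sample rows (NOT real data):
# [id, year, month, intent, police, sex, age, race, hispanic, place, education]
demo_rows = [
    ['1', '2012', '01', 'Suicide', '0', 'M', '34', 'White', '100', 'Home', '4'],
    ['2', '2012', '01', 'Homicide', '0', 'F', '21', 'White', '100', 'Street', '3'],
    ['3', '2012', '02', 'Homicide', '0', 'M', '60', 'Black', '100', 'Street', '2'],
    ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4'],
]

# Group by sex (column 5) with a made-up population of 100000.
rates_all = death_rates(demo_rows, 5, population=100000)
rates_homicide = death_rates(demo_rows, 5, population=100000, intent='Homicide')
print(rates_all)
print(rates_homicide)
```

Passing the rows and the population as arguments, rather than reading the globals `data` and `census`, also makes it possible to supply a group-specific denominator when one is available.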

**Gun Death Rates for Location**

In [19]:
locations = [ row[9] for row in data ]

location_counts = {}
for location in locations:
    if location not in location_counts:
        location_counts[location] = 1
    else:  
        location_counts[location] += 1

# COUNTS (not displayed: a notebook cell only echoes its last expression)
location_counts
# RATES
location_and_deaths = explore_all_deaths(9)
location_and_deaths

{'Farm': 0.15222892063301657,
 'Home': 19.590890411507743,
 'Industrial/construction': 0.08032504748295342,
 'NA': 0.4482655875661594,
 'Other specified': 4.4538295481374695,
 'Other unspecified': 2.8719443388360806,
 'Residential institution': 0.06574993806064333,
 'School/instiution': 0.21733107605266835,
 'Sports': 0.041458089023459835,
 'Street': 3.611712114848442,
 'Trade/service area': 1.1138622511849872}

**Gun Death Rates for Education**

In [20]:
educations = [ row[10] for row in data ]  # education is column 10 (column 9 is place)

education_counts = {}
for education in educations:
    if education not in education_counts:
        education_counts[education] = 1
    else:  
        education_counts[education] += 1

# COUNTS (not displayed: a notebook cell only echoes its last expression)
education_counts
# RATES
education_and_deaths = explore_all_deaths(10)
education_and_deaths

{'1': 7.068280287179406,
 '2': 13.903682714922345,
 '3': 7.021963828348509,
 '4': 4.193097035138367,
 '5': 0.4434072177587227,
 'NA': 0.017166239986276338}