### Where US and Third World Countries' Rates Meet

I came up with the idea for this project after watching the hashtag __#ThisIsOurLane__ grow when an American doctor’s response to the NRA’s “stay in their lane” tweet went viral.

The US has a very high rate of gunshot deaths. For decades there has been conflict between gun rights groups and the medical and scientific community, which has called the rise in gun violence a public health crisis.

Doctors read the tweet from the National Rifle Association (NRA, America’s most influential gun rights group) as offensive, and it was the trigger for Doctor Judy Melinek’s post, which quickly gained support from other doctors and forensic pathologists.

I decided to use this topic for my first Python project, analyzing firearm-related death rates in the US between 2012 and 2014. My aim was to find patterns in the data that show the reasons behind the medical community’s reaction to the NRA’s tweet.


# Opening files 

Opening the __guns.csv__ file and reading the dataset into a list using the csv module.

In [1]:
# Opening and reading file
import csv

# The with block closes the file once it has been read
with open('guns.csv', 'r') as f:
    data = list(csv.reader(f))

# Displaying the first 5 rows of the data to verify
print(data[:5])

[['', 'year', 'month', 'intent', 'police', 'sex', 'age', 'race', 'hispanic', 'place', 'education'], ['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'], ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'], ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4'], ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4']]


###### I did this so I could analyse the data properly and still see the header after excluding it, which is very useful for keeping track of ideas.

In [2]:
# Extracting the header and assigning it to a variable
# Excluding the first row from __data__

headers = data[:1]
data = data[1:]

# Checking that it worked
print(headers)
print(data[:3])

[['', 'year', 'month', 'intent', 'police', 'sex', 'age', 'race', 'hispanic', 'place', 'education']]
[['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'], ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'], ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4']]
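
The column positions used in the following cells (for example index 1 for the year and 3 for the intent) can be looked up from the header row rather than counted by hand. A minimal sketch, assuming the header printed above; `column_index` is a hypothetical name that does not appear in the original notebook:

```python
# Header row as printed above (headers is a list that contains one list)
headers = [['', 'year', 'month', 'intent', 'police', 'sex', 'age',
            'race', 'hispanic', 'place', 'education']]

# Map each column name to its position within a row
column_index = {name: i for i, name in enumerate(headers[0])}

print(column_index['year'], column_index['intent'], column_index['place'])
```

With such a mapping, a call like `row[column_index['intent']]` reads more clearly than `row[3]`.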


# Extracting information from columns 

At this point we will be using four columns to analyse the data:

- Year
- Intent
- Place
- Sex
    

In [3]:
# This function counts the values in any column, so we can assign each result to a variable and avoid repeating code

def count_columns(column_index):

    column_name = [row[column_index] for row in data]
    column_counts = {}

    for x in column_name:
        if x in column_counts:
            column_counts[x] = column_counts[x] + 1
        else:
            column_counts[x] = 1

    return column_counts

year_column = count_columns(1) # Counts numbers of fatalities for each year
place_column = count_columns(9) # Counts place of occurrence
intent_column = count_columns(3) # Counts the intent 
gender_column = count_columns(5) # Counts fatalities filtered by gender
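
The same tally can also be written with `collections.Counter` from the standard library. This is only an alternative sketch, not the approach used in this notebook; `sample_rows` and `count_column` are hypothetical names, and the three rows are copied from the output shown earlier:

```python
from collections import Counter

# Three sample rows in the same layout as guns.csv (copied from the output above)
sample_rows = [
    ['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'],
    ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'],
    ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4'],
]

def count_column(rows, column_index):
    """Count how often each value appears in one column."""
    return Counter(row[column_index] for row in rows)

sex_counts = count_column(sample_rows, 5)
print(sex_counts)
```

Passing the rows in as an argument, instead of reading the global `data`, makes the function easy to try on a small sample like this one.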

###### The first findings are striking.

- Below we can see the number of __gunshot-related deaths__ per year, including both male and female fatalities.
- Men account for more than __80%__ of firearm deaths.


In [4]:
year_column 

{'2012': 33563, '2013': 33636, '2014': 33599}

In [5]:
gender_column

{'F': 14449, 'M': 86349}
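
To check the share of deaths by sex, the counts can be divided by their total. A short sketch using the counts printed above; `gender_share` is a hypothetical name:

```python
# Counts copied from the gender_column output above
gender_column = {'F': 14449, 'M': 86349}

total = sum(gender_column.values())

# Percentage of all deaths for each sex, rounded to one decimal
gender_share = {sex: round(count / total * 100, 1)
                for sex, count in gender_column.items()}
print(gender_share)
```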

# US gun-related death rates

- To analyse the data better, I converted the counts to a rate of gunshot deaths per 100,000 people.
- For this I had to manually create a dictionary with the US population for each of those years.

The US gun-related death rate is __very high__, especially compared to other first world countries. A quick Google search is enough to see that these numbers put the USA in __10th__ position in a list of countries with the highest rates of firearm-related death.

Finland is the next first world country to appear, in __23rd__ position, with a rate of __3.49__ deaths per 100,000 people. The difference is daunting!

In [6]:
population = {'2012':314000000, '2013':316200000, '2014':318600000}

rate_gunrelated_death = {}

for k,v in year_column.items():
    rate_gunrelated_death[k] = (v / population[k]) * 100000 #dividing keys' values in dicts and multiplying by 100K to find the rate

rate_gunrelated_death

{'2012': 10.688853503184713,
 '2013': 10.637571157495257,
 '2014': 10.545825486503453}
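
The arithmetic behind each rate is the number of deaths divided by the population, multiplied by 100,000. A small sketch of that calculation for 2012; `rate_per_100k` is a hypothetical helper, and the figures are the ones used in the cells above:

```python
def rate_per_100k(deaths, population):
    """Return the number of deaths per 100,000 people."""
    return (deaths / population) * 100000

# Death count and population estimate for 2012, copied from the cells above
rate_2012 = rate_per_100k(33563, 314000000)
print(round(rate_2012, 2))
```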

# Gun-related death spikes in summer period

The data shows a pattern: the number of fatalities spikes during the summer.

To look more closely, I used the code below, which combines the __month__ and __year__ columns into a __datetime__ object for each row.

__Months__ are stored at index 2 of each row, where 1 = January, 2 = February, 3 = March, and so on.



In [7]:
# Creating a datetime.datetime object for each row
import datetime

dates = [datetime.datetime(year=int(row[1]), month=int(row[2]), day=1) for row in data]
dates[:5]

[datetime.datetime(2012, 1, 1, 0, 0),
 datetime.datetime(2012, 1, 1, 0, 0),
 datetime.datetime(2012, 1, 1, 0, 0),
 datetime.datetime(2012, 2, 1, 0, 0),
 datetime.datetime(2012, 2, 1, 0, 0)]

Next I counted how many gun deaths occurred in each month of each year.

###### Now we can clearly see the spike in gunshot deaths during the spring and summer months.

In [8]:
date_counts = {}

for x in dates:
    if x in date_counts:
        date_counts[x] = date_counts[x] + 1
    else:
        date_counts[x]  = 1
        
date_counts

{datetime.datetime(2012, 1, 1, 0, 0): 2758,
 datetime.datetime(2012, 2, 1, 0, 0): 2357,
 datetime.datetime(2012, 3, 1, 0, 0): 2743,
 datetime.datetime(2012, 4, 1, 0, 0): 2795,
 datetime.datetime(2012, 5, 1, 0, 0): 2999,
 datetime.datetime(2012, 6, 1, 0, 0): 2826,
 datetime.datetime(2012, 7, 1, 0, 0): 3026,
 datetime.datetime(2012, 8, 1, 0, 0): 2954,
 datetime.datetime(2012, 9, 1, 0, 0): 2852,
 datetime.datetime(2012, 10, 1, 0, 0): 2733,
 datetime.datetime(2012, 11, 1, 0, 0): 2729,
 datetime.datetime(2012, 12, 1, 0, 0): 2791,
 datetime.datetime(2013, 1, 1, 0, 0): 2864,
 datetime.datetime(2013, 2, 1, 0, 0): 2375,
 datetime.datetime(2013, 3, 1, 0, 0): 2862,
 datetime.datetime(2013, 4, 1, 0, 0): 2798,
 datetime.datetime(2013, 5, 1, 0, 0): 2806,
 datetime.datetime(2013, 6, 1, 0, 0): 2920,
 datetime.datetime(2013, 7, 1, 0, 0): 3079,
 datetime.datetime(2013, 8, 1, 0, 0): 2859,
 datetime.datetime(2013, 9, 1, 0, 0): 2742,
 datetime.datetime(2013, 10, 1, 0, 0): 2808,
 datetime.datetime(2013, 11,
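
Months differ in length, so raw monthly totals slightly understate short months such as February. One way to check that the summer spike is not an artifact of month length is to divide each total by the number of days in the month. A sketch using only the 2012 counts printed above; `date_counts_2012` and `per_day` are hypothetical names:

```python
import calendar
import datetime

# 2012 monthly counts copied from the date_counts output above
date_counts_2012 = {
    datetime.datetime(2012, 1, 1): 2758,
    datetime.datetime(2012, 2, 1): 2357,
    datetime.datetime(2012, 3, 1): 2743,
    datetime.datetime(2012, 4, 1): 2795,
    datetime.datetime(2012, 5, 1): 2999,
    datetime.datetime(2012, 6, 1): 2826,
    datetime.datetime(2012, 7, 1): 3026,
    datetime.datetime(2012, 8, 1): 2954,
    datetime.datetime(2012, 9, 1): 2852,
    datetime.datetime(2012, 10, 1): 2733,
    datetime.datetime(2012, 11, 1): 2729,
    datetime.datetime(2012, 12, 1): 2791,
}

# Deaths per day; calendar.monthrange returns (first weekday, days in month)
per_day = {d.month: count / calendar.monthrange(d.year, d.month)[1]
           for d, count in date_counts_2012.items()}

peak_month = max(per_day, key=per_day.get)
low_month = min(per_day, key=per_day.get)
print(peak_month, low_month)
```

For 2012 the daily rate still peaks in July and is lowest in February, which is consistent with the pattern seen in the raw totals.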

# Suicides account for 63% of gunshot deaths

The next piece of code is what really grabbed my attention.
- More than half of gun-related fatalities between 2012 and 2014 were __suicides__.
- The firearm-related suicide rate was about 6.6 per 100,000 people between 2012 and 2014, which puts the __USA__ at the top of the list, followed by __Montenegro__ (6.49), __Uruguay__ (4.78) and __Finland__ (2.94).


In [9]:
# Share of all deaths that were suicides; len(data) is the total number of rows (100798)
suicide_percentage = intent_column['Suicide'] / len(data)

print(intent_column)
round(suicide_percentage, 2)  # rounding the share to two decimals

{'NA': 1, 'Homicide': 35176, 'Undetermined': 807, 'Accidental': 1639, 'Suicide': 63175}


0.63
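
The same division can be applied to every intent at once, which avoids hard-coding the total. A sketch based on the counts printed above; `intent_share` is a hypothetical name:

```python
# Counts copied from the intent_column output above
intent_column = {'NA': 1, 'Homicide': 35176, 'Undetermined': 807,
                 'Accidental': 1639, 'Suicide': 63175}

total_deaths = sum(intent_column.values())

# Fraction of all deaths for each intent, rounded to two decimals
intent_share = {intent: round(count / total_deaths, 2)
                for intent, count in intent_column.items()}
print(intent_share)
```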

In [10]:
population = {'2012':314000000, '2013':316200000, '2014':318600000}

# Creating a dict with number of suicides filtered by year
intent = [row[3] for row in data]
years = [row[1] for row in data]

intent_counts = {}

for i, year in enumerate(years):  # i is the row position, year is the value at that position
    if year not in intent_counts:
        intent_counts[year] = 0
    if intent[i] == 'Suicide':
        intent_counts[year] += 1

# Finding rates
for k,v in intent_counts.items():
    intent_counts[k] = (v / population[k]) * 100000


intent_counts 

{'2012': 6.581528662420381,
 '2013': 6.696710942441492,
 '2014': 6.696170747018204}

# 80% of __Farm__ and 75% of __Home__ gunshot-related deaths were suicides

With this code my aim was to find the __Suicide__ percentage for each __place__ where a death occurred. The percentage of suicides remains very high across the places analysed, except for __Street__, where 79.5% of deaths were homicides.

In [11]:
# A function that works for every value within the columns we want to compare

def comparation_reverse(place_index, place_name):
    
    place = [row[place_index] for row in data]
    intents = [row[3] for row in data]

    column_comparation_intent_counts = {}

    for i,intent in enumerate(intents):  
        if intent not in column_comparation_intent_counts:
            column_comparation_intent_counts[intent] = 0
        if place[i] == place_name:
            column_comparation_intent_counts[intent] += 1

    # Finding percentages: how the total deaths at this place are divided by intent
    for k,v in column_comparation_intent_counts.items():
        column_comparation_intent_counts[k] = (v / place_column[place_name]) * 100
    return column_comparation_intent_counts

farm_percentage = comparation_reverse(9, 'Farm')
home_percentage = comparation_reverse(9, 'Home')
school_inst_percentage = comparation_reverse(9, 'School/instiution')
sports_percentage = comparation_reverse(9, 'Sports')
street_percentage = comparation_reverse(9, 'Street')


In [12]:
farm_percentage

{'Accidental': 2.7659574468085104,
 'Homicide': 16.170212765957448,
 'NA': 0.0,
 'Suicide': 80.42553191489363,
 'Undetermined': 0.6382978723404255}

In [13]:
home_percentage

{'Accidental': 1.5259729524187415,
 'Homicide': 22.506034454253875,
 'NA': 0.0,
 'Suicide': 75.08349039447145,
 'Undetermined': 0.8845021988559336}

In [14]:
school_inst_percentage

{'Accidental': 1.639344262295082,
 'Homicide': 53.80029806259314,
 'NA': 0.0,
 'Suicide': 43.96423248882265,
 'Undetermined': 0.5961251862891207}

In [15]:
sports_percentage

{'Accidental': 3.90625,
 'Homicide': 17.1875,
 'NA': 0.0,
 'Suicide': 76.5625,
 'Undetermined': 2.34375}

In [16]:
street_percentage 

{'Accidental': 0.6994888350820554,
 'Homicide': 79.51753205990494,
 'NA': 0.0,
 'Suicide': 19.558783965563624,
 'Undetermined': 0.22419513944937675}
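
The function above scans the whole dataset once per place. As an alternative sketch, a single pass can build a table of intent counts for every place at the same time. The sample pairs below are made up for illustration and are not taken from the dataset; `by_place` and `intent_percentages` are hypothetical names:

```python
from collections import Counter, defaultdict

# Hypothetical (place, intent) pairs; with the real data these would be
# (row[9], row[3]) for each row
sample = [
    ('Home', 'Suicide'),
    ('Home', 'Suicide'),
    ('Home', 'Homicide'),
    ('Street', 'Homicide'),
]

# One Counter of intents per place, filled in a single pass
by_place = defaultdict(Counter)
for place, intent in sample:
    by_place[place][intent] += 1

def intent_percentages(place):
    """Return the percentage of deaths at a place for each intent."""
    counts = by_place[place]
    total = sum(counts.values())
    return {intent: n / total * 100 for intent, n in counts.items()}

print(intent_percentages('Home'))
```

Unlike the original function, this version lists only the intents that actually occur at a place, so intents with a count of zero are omitted from the result.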