# Project Description

This project analyzes gun deaths in the US for the years 2012 to 2014.

The dataset used in this project is provided by [FiveThirtyEight](https://www.fivethirtyeight.com/). The documentation and the dataset can be accessed [here](https://github.com/fivethirtyeight/guns-data).  
Census data is also used to evaluate gun deaths in relation to race categories. The Census dataset is not available online, so it is provided within the repository.

# Import Libraries

In [1]:
import io
import os
import urllib.request

from IPython.display import display

import pandas as pd

from datetime import datetime

# Set Global Variables

In [2]:
URL = 'https://raw.githubusercontent.com/fivethirtyeight/guns-data/master/full_data.csv'
PATH_DATASET = 'data/' + URL.split('/')[-1]

### Link to the census dataset is not available
### Therefore the dataset is provided inside the repository
#URL_CDC = 'https://raw.githubusercontent.com/nprapps/katrina-maps/master/data/populations-estimates/PEP_2014_PEPSR6H/PEP_2014_PEPSR6H_with_ann.csv'
#PATH_DATASET_CDC = 'data/' + URL_CDC.split('/')[-1]
PATH_DATASET_CDC = 'data/census.csv'

# Project Preparation

## Download the data

In [3]:
def download_github_csv_data(url):
    """Download a CSV file and store it in the data folder of the project repository.

    Args:
        url: URL of the CSV file

    Returns:
        None
    """
    ### Create the data dir if it does not exist
    if not os.path.exists('data/'):
        os.makedirs('data/')

    file = urllib.request.urlopen(url)

    df = pd.read_csv(io.TextIOWrapper(file))

    filename = url.split('/')[-1]
    path = 'data/' + filename
    df.to_csv(path, header=True, index=False, sep=',')

"""
Downloads the data to the data folder of the local repository. After you have run it once, you can comment out
the lines below to prevent the code from downloading the data every time you run it.
"""            
download_github_csv_data(URL)
### Not available at the moment
#download_github_csv_data(URL_CDC)
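
One way to avoid re-downloading without editing the code is to check whether the file already exists. The following is a minimal sketch, not the approach used in this notebook; `download_if_missing` and its `fetch` parameter are hypothetical names introduced for illustration.

```python
import os
import urllib.request


def download_if_missing(url, data_dir='data', fetch=urllib.request.urlretrieve):
    """Download url into data_dir unless the file is already there.

    Hypothetical helper: `fetch` is any callable taking (url, path). It
    defaults to urllib.request.urlretrieve and can be swapped out in tests.
    """
    path = os.path.join(data_dir, url.split('/')[-1])
    if not os.path.exists(path):
        # Only create the folder and hit the network when the file is missing
        os.makedirs(data_dir, exist_ok=True)
        fetch(url, path)
    return path
```

Calling `download_if_missing(URL)` in place of `download_github_csv_data(URL)` would then only access the network on the first run. Unlike the function above, this sketch stores the file as downloaded rather than round-tripping it through pandas.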

## Load the guns dataset

In [4]:
### Load the data as df in the var data
data = pd.read_csv(PATH_DATASET)
### Inspect the first 5 rows of the dataset
display(data.head())

Unnamed: 0.1,Unnamed: 0,year,month,intent,police,sex,age,race,hispanic,place,education
0,1,2012,1,Suicide,0,M,34.0,Asian/Pacific Islander,100,Home,4.0
1,2,2012,1,Suicide,0,F,21.0,White,100,Street,3.0
2,3,2012,1,Suicide,0,M,60.0,White,100,Other specified,4.0
3,4,2012,2,Suicide,0,M,64.0,White,100,Home,4.0
4,5,2012,2,Suicide,0,M,31.0,White,100,Other specified,2.0
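
The `Unnamed: 0` columns in the preview above are what pandas produces when a CSV column has an empty header, which typically happens when a frame's index was written to the file. A small sketch with made-up inline data (not the project's file) shows how `index_col=0` reads such a column back as the index instead:

```python
import io

import pandas as pd

# Made-up CSV text whose first column has no header, like a saved index
csv_text = ',year,month\n1,2012,1\n2,2012,2\n'

# Default read: the headerless column becomes a data column 'Unnamed: 0'
with_unnamed = pd.read_csv(io.StringIO(csv_text))

# With index_col=0 the same column is used as the row index
with_index = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(with_unnamed.columns))
print(list(with_index.columns))
```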


## Load the census dataset

In [5]:
### Load the census data as df in the var data_census
data_census = pd.read_csv(PATH_DATASET_CDC)
### Inspect the first 5 rows of the dataset
display(data_census.head())

Unnamed: 0,Id,Year,Id.1,Sex,Id.2,Hispanic Origin,Id.3,Id2,Geography,Total,Race Alone - White,Race Alone - Hispanic,Race Alone - Black or African American,Race Alone - American Indian and Alaska Native,Race Alone - Asian,Race Alone - Native Hawaiian and Other Pacific Islander,Two or More Races
0,cen42010,"April 1, 2010 Census",totsex,Both Sexes,tothisp,Total,0100000US,,United States,308745538,197318956,44618105,40250635,3739506,15159516,674625,6984195


# Analysis

## Analysis of the guns dataset

### Inspect deaths per year

In [6]:
print(data['year'].value_counts())

2013    33636
2014    33599
2012    33563
Name: year, dtype: int64


### Generate a column year_month that contains a reformatted date

In [7]:
def prepare_date(row):
    """Build a datetime object from the year and month fields of a row

    Args:
        row: Row of the dataframe with 'year' and 'month' fields

    Returns:
        datetime set to the first day of that month
    """
    ### Select scalars with single brackets; int() on a one-element Series is deprecated
    year = int(row['year'])
    month = int(row['month'])
    return datetime(year=year, month=month, day=1)


data['year_month'] = data.apply(prepare_date, axis=1)

### Inspect the first 5 rows of the dataset
display(data.head())

Unnamed: 0.1,Unnamed: 0,year,month,intent,police,sex,age,race,hispanic,place,education,year_month
0,1,2012,1,Suicide,0,M,34.0,Asian/Pacific Islander,100,Home,4.0,2012-01-01
1,2,2012,1,Suicide,0,F,21.0,White,100,Street,3.0,2012-01-01
2,3,2012,1,Suicide,0,M,60.0,White,100,Other specified,4.0,2012-01-01
3,4,2012,2,Suicide,0,M,64.0,White,100,Home,4.0,2012-02-01
4,5,2012,2,Suicide,0,M,31.0,White,100,Other specified,2.0,2012-02-01
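
Applying a Python function row by row can be slow on roughly 100,000 rows. As a hedged alternative, `pd.to_datetime` can assemble dates directly from columns named `year`, `month` and `day`. The sketch below uses a tiny made-up frame (`sample` is not part of the notebook) rather than the real data:

```python
import pandas as pd

# Tiny made-up frame with the same 'year' and 'month' columns as the guns data
sample = pd.DataFrame({'year': [2012, 2012, 2013], 'month': [1, 2, 7]})

# pd.to_datetime assembles dates from columns named year, month and day;
# the day column is added on the fly and fixed to the first of the month
sample['year_month'] = pd.to_datetime(sample[['year', 'month']].assign(day=1))

print(sample)
```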


### Count the deaths by month of the year

In [8]:
date_counts = data['year_month'].value_counts()

### Print deaths per month of year 
print(date_counts)

2013-07-01    3079
2012-07-01    3026
2012-05-01    2999
2014-08-01    2970
2012-08-01    2954
2014-06-01    2931
2013-06-01    2920
2014-09-01    2914
2014-07-01    2884
2014-10-01    2865
2013-01-01    2864
2014-05-01    2864
2013-03-01    2862
2014-04-01    2862
2013-08-01    2859
2014-12-01    2857
2012-09-01    2852
2012-06-01    2826
2013-10-01    2808
2013-05-01    2806
2013-04-01    2798
2012-04-01    2795
2012-12-01    2791
2013-12-01    2765
2013-11-01    2758
2012-01-01    2758
2014-11-01    2756
2012-03-01    2743
2013-09-01    2742
2012-10-01    2733
2012-11-01    2729
2014-03-01    2684
2014-01-01    2651
2013-02-01    2375
2014-02-01    2361
2012-02-01    2357
Name: year_month, dtype: int64
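
`value_counts` orders its result by count, which makes seasonal patterns hard to read. A short sketch of sorting by date instead, using made-up month stamps in place of `data['year_month']`:

```python
import pandas as pd

# Made-up month stamps standing in for data['year_month']
months = pd.Series(pd.to_datetime(
    ['2012-02-01', '2012-01-01', '2012-02-01', '2012-03-01', '2012-02-01']
))

# value_counts orders by count; sort_index reorders the result by date
counts_by_date = months.value_counts().sort_index()

print(counts_by_date)
```

On the real data the equivalent call would be `data['year_month'].value_counts().sort_index()`.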


### Count the unique items in the sex and race columns

In [9]:
### Sum up the deaths per gender
sex_counts = data['sex'].value_counts()
### Sum up the deaths per race
race_counts = data['race'].value_counts()

print('Deaths per sex:')
print(sex_counts)
print('Deaths per race:')
print(race_counts)

Deaths per sex:
M    86349
F    14449
Name: sex, dtype: int64
Deaths per race:
White                             66237
Black                             23296
Hispanic                           9022
Asian/Pacific Islander             1326
Native American/Native Alaskan      917
Name: race, dtype: int64


### Results

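The analysis of the data shows that men are far more likely than women to be killed by guns: 86,349 of the 100,798 recorded deaths are men. By raw count most victims are White, so the counts alone do not show whether minorities are more likely to be killed by guns; that requires relating the counts to population size, which the next section does.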
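
To put the raw counts in proportion, they can be converted to shares of all recorded deaths. This is a rough sketch that re-enters the counts printed above by hand; the names `sex_share` and `race_share` are introduced here for illustration only. Shares still ignore how large each group is in the population.

```python
import pandas as pd

# Counts copied from the value_counts output above
sex_counts = pd.Series({'M': 86349, 'F': 14449})
race_counts = pd.Series({
    'White': 66237,
    'Black': 23296,
    'Hispanic': 9022,
    'Asian/Pacific Islander': 1326,
    'Native American/Native Alaskan': 917,
})

# Share of all recorded deaths, in percent, rounded to one decimal
sex_share = (sex_counts / sex_counts.sum() * 100).round(1)
race_share = (race_counts / race_counts.sum() * 100).round(1)

print(sex_share)
print(race_share)
```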

## Analysis of the guns dataset combined with the census dataset

### Map the race categories from the census data to the race categories in the guns data

In [10]:
mapping = {}
mapping['Asian/Pacific Islander'] = data_census.loc[0]['Race Alone - Asian'] + data_census.loc[0]['Race Alone - Native Hawaiian and Other Pacific Islander']
mapping['Black'] = data_census.loc[0]['Race Alone - Black or African American']
mapping['Native American/Native Alaskan'] = data_census.loc[0]['Race Alone - American Indian and Alaska Native']
mapping['Hispanic'] = data_census.loc[0]['Race Alone - Hispanic']
mapping['White'] = data_census.loc[0]['Race Alone - White']

### Compute the deaths per 100,000 per race category

In [11]:
race_per_hundredk = {}

for key, count in race_counts.items():
    if key in mapping:
        race_per_hundredk[key] = count / mapping[key] * 100000

display(race_per_hundredk)

{'Asian/Pacific Islander': 8.3743096641617623,
 'Black': 57.877347773519602,
 'Hispanic': 20.220491210910907,
 'Native American/Native Alaskan': 24.521955573811088,
 'White': 33.56849303419181}
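
The same rates can be computed without an explicit loop, because dividing two pandas Series aligns them on their index labels. The sketch below re-enters the counts and census populations shown earlier by hand; `population` and `rate_per_100k` are names introduced here for illustration.

```python
import pandas as pd

# Death counts per race, copied from the value_counts output earlier
race_counts = pd.Series({
    'White': 66237, 'Black': 23296, 'Hispanic': 9022,
    'Asian/Pacific Islander': 1326, 'Native American/Native Alaskan': 917,
})

# Census populations from the row shown earlier
# (Asian and Pacific Islander are combined, as in the mapping)
population = pd.Series({
    'White': 197318956, 'Black': 40250635, 'Hispanic': 44618105,
    'Asian/Pacific Islander': 15159516 + 674625,
    'Native American/Native Alaskan': 3739506,
})

# Division aligns on the index labels, so no explicit loop is needed
rate_per_100k = race_counts / population * 100000

print(rate_per_100k.sort_values(ascending=False))
```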

### Compute the homicides per 100,000 per race category

In [12]:
intents = data['intent']
races = data['race']

homicide_race_per_hundredk = {}

for i, race in enumerate(races):
    if intents[i] == 'Homicide':
        ### Count every homicide, including the first one seen for each race
        homicide_race_per_hundredk[race] = homicide_race_per_hundredk.get(race, 0) + 1

### Convert the counts to homicides per 100,000 people
for key, item in homicide_race_per_hundredk.items():
    if key in mapping:
        homicide_race_per_hundredk[key] = item / mapping[key] * 100000

### Print the rates rounded to two decimals
print({race: round(float(rate), 2) for race, rate in homicide_race_per_hundredk.items()})

{'White': 4.64, 'Asian/Pacific Islander': 3.53, 'Black': 48.47, 'Native American/Native Alaskan': 8.72, 'Hispanic': 12.63}
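
Hand-written counting loops are easy to get subtly wrong, so a boolean mask combined with `value_counts` is a common alternative. The sketch below uses a few made-up rows (`sample` is not the real dataset) to show the pattern:

```python
import pandas as pd

# Made-up rows with the same 'intent' and 'race' columns as the guns data
sample = pd.DataFrame({
    'intent': ['Homicide', 'Suicide', 'Homicide', 'Homicide', 'Accidental'],
    'race': ['Black', 'White', 'White', 'Black', 'White'],
})

# Keep only the homicide rows, then count them per race
homicide_counts = sample.loc[sample['intent'] == 'Homicide', 'race'].value_counts()

print(homicide_counts)
```

On the real data the same expression with `data` in place of `sample` would give the homicide counts per race, which can then be divided by the census populations.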


### Results

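The second analysis shows that Black people are killed in gun homicides far more often than any other group: about 48 per 100,000, nearly four times the Hispanic rate and roughly ten times the White rate. These rates relate deaths from all three years (2012 to 2014) to the 2010 Census population, so they are not annual rates.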