### Analyzing US Gun Deaths Data from 2012-2014
This article is a storytelling exploration of US gun deaths from 2012 to 2014, written in Python in a Jupyter notebook. The analysis uses two datasets: guns.csv, which records gun deaths in the US from 2012 to 2014, and census.csv, introduced much later, which holds the US census data of April 1, 2010. The guns.csv dataset came from FiveThirtyEight. I start by exploring guns.csv.

In [2]:
import csv

# Open the file in a with-block so it is closed once the rows are read
with open("guns.csv", "r") as file_handle:
    data = list(csv.reader(file_handle))
data[:5]

[['',
  'year',
  'month',
  'intent',
  'police',
  'sex',
  'age',
  'race',
  'hispanic',
  'place',
  'education'],
 ['1',
  '2012',
  '01',
  'Suicide',
  '0',
  'M',
  '34',
  'Asian/Pacific Islander',
  '100',
  'Home',
  '4'],
 ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'],
 ['3',
  '2012',
  '01',
  'Suicide',
  '0',
  'M',
  '60',
  'White',
  '100',
  'Other specified',
  '4'],
 ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4']]

The output is a list of lists; only the first five are displayed here. The first inner list is the header row, which I can now remove.

### Removing Header from A List of Lists

In [3]:
header = data[0]
data = data[1:]
print(header)
data[:5]

['', 'year', 'month', 'intent', 'police', 'sex', 'age', 'race', 'hispanic', 'place', 'education']


[['1',
  '2012',
  '01',
  'Suicide',
  '0',
  'M',
  '34',
  'Asian/Pacific Islander',
  '100',
  'Home',
  '4'],
 ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'],
 ['3',
  '2012',
  '01',
  'Suicide',
  '0',
  'M',
  '60',
  'White',
  '100',
  'Other specified',
  '4'],
 ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4'],
 ['5',
  '2012',
  '02',
  'Suicide',
  '0',
  'M',
  '31',
  'White',
  '100',
  'Other specified',
  '2']]

The first five rows of guns.csv are displayed above after removing the header.
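
As an alternative sketch, `csv.DictReader` from the standard library consumes the header itself and yields each row as a dictionary keyed by column name, which avoids positional indexes such as `row[1]`. The example is illustrative only: it reads a two-row excerpt of the output above from an in-memory string (`sample_text`, a stand-in for guns.csv) rather than the real file.

```python
import csv
import io

# Stand-in for guns.csv: the header plus the first two rows shown above.
sample_text = (
    ",year,month,intent,police,sex,age,race,hispanic,place,education\n"
    "1,2012,01,Suicide,0,M,34,Asian/Pacific Islander,100,Home,4\n"
    "2,2012,01,Suicide,0,F,21,White,100,Street,3\n"
)

# DictReader treats the first line as the header and leaves it out of the rows.
records = list(csv.DictReader(io.StringIO(sample_text)))

print(len(records))          # number of data rows, header excluded
print(records[0]["intent"])  # field looked up by column name
print(records[1]["race"])
```

The rest of this article keeps the list-of-lists form, so the positional indexes used in later cells still apply.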

### Counting Gun Deaths By Year
I now compute the number of gun deaths per year.

In [5]:
# Column 1 holds the year of each death
years = [row[1] for row in data]

# Tally the number of rows (deaths) per year
year_counts = {}
for year in years:
    if year in year_counts:
        year_counts[year] += 1
    else:
        year_counts[year] = 1
year_counts

{'2012': 33563, '2013': 33636, '2014': 33599}

The result is the year_counts dictionary, which shows the number of gun deaths in 2012, 2013 and 2014.
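
The same tally can be sketched with `collections.Counter` from the standard library, which handles the "is the key already present" bookkeeping internally. The list `sample_years` is a made-up stand-in for the `years` list, so the counts printed here are not the real totals.

```python
from collections import Counter

# Hypothetical stand-in for the `years` list built from guns.csv.
sample_years = ["2012", "2012", "2013", "2014", "2013", "2012"]

# Counter counts how many times each distinct value occurs.
sample_year_counts = Counter(sample_years)

print(dict(sample_year_counts))
```

A Counter behaves like a dictionary, so it could be used wherever year_counts is used.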

### Exploring Gun Deaths By Month and Year
From the output above, it looks like gun deaths did not change much from 2012 to 2014. Next I explore how gun deaths in the US vary by month and year. To do this I create a datetime.datetime object from the year and month columns, then count the number of deaths by date, just as I did by year in the previous code.

A datetime object can be constructed with the year, month and day keyword arguments. The data has year and month columns but no day column. This is not a problem, because I can specify a fixed day; I have chosen day=1.

In [7]:
import datetime
dates = [datetime.datetime(year=int(row[1]), month=int(row[2]), day=1) for row in data]
dates[:5]

[datetime.datetime(2012, 1, 1, 0, 0),
 datetime.datetime(2012, 1, 1, 0, 0),
 datetime.datetime(2012, 1, 1, 0, 0),
 datetime.datetime(2012, 2, 1, 0, 0),
 datetime.datetime(2012, 2, 1, 0, 0)]

The output above shows one datetime per row, carrying the year and month of each death. I can go further and count the gun deaths for each month of each year, storing the result in a date_counts dictionary.

In [8]:
# Tally deaths per (year, month) using the datetime objects built above
date_counts = {}
for date in dates:
    if date in date_counts:
        date_counts[date] += 1
    else:
        date_counts[date] = 1

date_counts

{datetime.datetime(2012, 1, 1, 0, 0): 2758,
 datetime.datetime(2012, 2, 1, 0, 0): 2357,
 datetime.datetime(2012, 3, 1, 0, 0): 2743,
 datetime.datetime(2012, 4, 1, 0, 0): 2795,
 datetime.datetime(2012, 5, 1, 0, 0): 2999,
 datetime.datetime(2012, 6, 1, 0, 0): 2826,
 datetime.datetime(2012, 7, 1, 0, 0): 3026,
 datetime.datetime(2012, 8, 1, 0, 0): 2954,
 datetime.datetime(2012, 9, 1, 0, 0): 2852,
 datetime.datetime(2012, 10, 1, 0, 0): 2733,
 datetime.datetime(2012, 11, 1, 0, 0): 2729,
 datetime.datetime(2012, 12, 1, 0, 0): 2791,
 datetime.datetime(2013, 1, 1, 0, 0): 2864,
 datetime.datetime(2013, 2, 1, 0, 0): 2375,
 datetime.datetime(2013, 3, 1, 0, 0): 2862,
 datetime.datetime(2013, 4, 1, 0, 0): 2798,
 datetime.datetime(2013, 5, 1, 0, 0): 2806,
 datetime.datetime(2013, 6, 1, 0, 0): 2920,
 datetime.datetime(2013, 7, 1, 0, 0): 3079,
 datetime.datetime(2013, 8, 1, 0, 0): 2859,
 datetime.datetime(2013, 9, 1, 0, 0): 2742,
 datetime.datetime(2013, 10, 1, 0, 0): 2808,
 datetime.datetime(2013, 11,

The results above show the gun deaths for each month of 2012, 2013 and 2014. Again, the output reveals that there was not much change across the months over the three years: in the portion of the output shown, the monthly totals stay between roughly 2,350 and 3,080.
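
To look at the month-to-month spread more closely, the sketch below finds the lowest and highest months of 2012 and checks that the twelve monthly totals add up to the yearly total found earlier. The counts are copied from the date_counts output above; the names `counts_2012` and `date_counts_2012` are introduced here only for illustration.

```python
import datetime

# Monthly totals for 2012, copied from the date_counts output above.
counts_2012 = [2758, 2357, 2743, 2795, 2999, 2826,
               3026, 2954, 2852, 2733, 2729, 2791]
date_counts_2012 = {
    datetime.datetime(year=2012, month=m, day=1): n
    for m, n in enumerate(counts_2012, start=1)
}

# Pick the dictionary keys with the smallest and largest counts.
lowest = min(date_counts_2012, key=date_counts_2012.get)
highest = max(date_counts_2012, key=date_counts_2012.get)

print(lowest.strftime("%Y-%m"), date_counts_2012[lowest])
print(highest.strftime("%Y-%m"), date_counts_2012[highest])

# The twelve months should add up to the 2012 yearly total found earlier.
print(sum(date_counts_2012.values()))
```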

I now explore gun deaths by the race and sex of the victims, to understand how each of the five racial categories in the dataset is affected.

### Exploring Gun Deaths By Race and Sex

In [9]:
# Column 5 holds sex and column 7 holds race
sex = [row[5] for row in data]
race = [row[7] for row in data]

sex_counts = {}
race_counts = {}
for item in sex:
    if item not in sex_counts:
        sex_counts[item] = 0
    sex_counts[item] += 1
for item in race:
    if item not in race_counts:
        race_counts[item] = 0
    race_counts[item] += 1
print(sex_counts)
print(race_counts)

{'F': 14449, 'M': 86349}
{'White': 66237, 'Native American/Native Alaskan': 917, 'Hispanic': 9022, 'Black': 23296, 'Asian/Pacific Islander': 1326}


The first output, sex_counts, reveals that far more males (M: 86349) die from guns than females (F: 14449). In the second output, more Whites die from guns than any other race. However, raw counts alone do not give a clear picture of which race is most affected, because the racial groups differ greatly in population. An analysis that is proportional to each race's population would support a more convincing conclusion about which race has the highest rate of gun deaths.
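
As a small intermediate step, the raw counts can be expressed as percentages of all recorded deaths. The sketch below uses the totals printed above; `shares` is a hypothetical helper name, not part of the original notebook. Shares of deaths still ignore how large each group's population is, so they do not replace a per-capita rate.

```python
# Totals copied from the sex_counts and race_counts output above.
sex_counts = {"F": 14449, "M": 86349}
race_counts = {
    "White": 66237,
    "Native American/Native Alaskan": 917,
    "Hispanic": 9022,
    "Black": 23296,
    "Asian/Pacific Islander": 1326,
}

def shares(counts):
    """Return each category's percentage of the total (hypothetical helper)."""
    total = sum(counts.values())
    return {key: 100 * value / total for key, value in counts.items()}

for name, pct in shares(sex_counts).items():
    print(name, round(pct, 1))
for name, pct in shares(race_counts).items():
    print(name, round(pct, 1))
```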

For this analysis to be effective and convincing, I need a dataset that captures the population of each race.

### Reading In a Second Dataset on Census
So far I have explored gun deaths by race, but that analysis only provides the total number of US gun deaths in each racial category. To compare the figures fairly, I need to know how large each racial group is within the US population. What I actually need is the rate of gun deaths per 100,000 people in each of the five racial categories.

To do this, I need data on how many people in the US fall into each racial category. Fortunately, I can import census data for this purpose: the April 1, 2010 census data (census.csv).

The census.csv data contains the total population of the US as well as the total population of each racial group.

In [11]:
import csv

# Open the file in a with-block so it is closed once the rows are read
with open("census.csv", "r") as file_handle:
    census = list(csv.reader(file_handle))
census

[['Id',
  'Year',
  'Id',
  'Sex',
  'Id',
  'Hispanic Origin',
  'Id',
  'Id2',
  'Geography',
  'Total',
  'Race Alone - White',
  'Race Alone - Hispanic',
  'Race Alone - Black or African American',
  'Race Alone - American Indian and Alaska Native',
  'Race Alone - Asian',
  'Race Alone - Native Hawaiian and Other Pacific Islander',
  'Two or More Races'],
 ['cen42010',
  'April 1, 2010 Census',
  'totsex',
  'Both Sexes',
  'tothisp',
  'Total',
  '0100000US',
  '',
  'United States',
  '308745538',
  '197318956',
  '44618105',
  '40250635',
  '3739506',
  '15159516',
  '674625',
  '6984195']]

There are only two rows, the first of which is the header. There is a dilemma here, however. The guns.csv data I have been using classifies race into five categories, while the census.csv data shown above splits the US population into more than five. I have to map between the two by merging census categories, so that the number of races stays at five, consistent with guns.csv, and the comparison is accurate.

I notice that the Asian/Pacific Islander category in guns.csv corresponds to "Race Alone - Asian" plus "Race Alone - Native Hawaiian and Other Pacific Islander" in census.csv. I simply add the two to create a single category named Asian/Pacific Islander, as in guns.csv. The census column "Two or More Races" has no counterpart in guns.csv and is left out of the mapping.

I manually create a dictionary, mapping, with the races as keys and their populations as values. I then take the gun deaths per race from the previous results, divide each count by the race's population and multiply by 100,000, which is a standard way of expressing such rates.
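
Instead of retyping the populations by hand, the same dictionary could be derived from the census rows. The sketch below is one possible approach, not the method used in this notebook: it hard-codes an excerpt of the census.csv header and data row shown above (population columns only), and the names `population`, `census_columns` and `mapping_from_census` are introduced here for illustration.

```python
# Excerpt of the census.csv header and data row shown above (population columns only).
census_header = [
    "Total",
    "Race Alone - White",
    "Race Alone - Hispanic",
    "Race Alone - Black or African American",
    "Race Alone - American Indian and Alaska Native",
    "Race Alone - Asian",
    "Race Alone - Native Hawaiian and Other Pacific Islander",
    "Two or More Races",
]
census_row = ["308745538", "197318956", "44618105", "40250635",
              "3739506", "15159516", "674625", "6984195"]

# Pair each column name with its value and convert the strings to integers.
population = {name: int(value) for name, value in zip(census_header, census_row)}

# Which census columns feed each guns.csv race label (Two or More Races is unused).
census_columns = {
    "Asian/Pacific Islander": ["Race Alone - Asian",
                               "Race Alone - Native Hawaiian and Other Pacific Islander"],
    "Black": ["Race Alone - Black or African American"],
    "Native American/Native Alaskan": ["Race Alone - American Indian and Alaska Native"],
    "Hispanic": ["Race Alone - Hispanic"],
    "White": ["Race Alone - White"],
}
mapping_from_census = {
    race: sum(population[column] for column in columns)
    for race, columns in census_columns.items()
}
print(mapping_from_census)
```

Keeping the column names in one place makes the merge of the Asian and Pacific Islander columns explicit and avoids transcription errors in the population figures.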

### Computing Rates of Gun Deaths for every 100,000 in each Race

Now I have the population of each race and the gun deaths per race, so I can find out how many people per 100,000 of each race's population died from guns. The code for this computation is:

In [13]:
# 2010 census population per race category (Asian and Pacific Islander columns merged)
mapping = {'Asian/Pacific Islander': 15159516 + 674625, 'Black': 40250635,
           'Native American/Native Alaskan': 3739506, 'Hispanic': 44618105,
           'White': 197318956}
# Deaths over 2012-2014 per 100,000 people of each race
race_per_hundredk = {}
for key in race_counts:
    race_per_hundredk[key] = (race_counts[key] / mapping[key]) * 100000
race_per_hundredk

{'Asian/Pacific Islander': 8.374309664161762,
 'Black': 57.8773477735196,
 'Hispanic': 20.220491210910907,
 'Native American/Native Alaskan': 24.521955573811088,
 'White': 33.56849303419181}

The output gives a clearer picture of gun deaths in America. For every 100,000 Black people in the US, approximately 58 died from guns over the three years 2012 to 2014. The lowest rate is for Asian/Pacific Islanders, at about 8 gun deaths per 100,000 people. This is consistent with studies and media reports that Black people are disproportionately affected by gun deaths in the US.
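
The arithmetic behind a single rate can be checked by hand. The sketch below reproduces the figure for the Black category from the counts and census population given earlier; `rate_per_100k` is a hypothetical helper name. Because the death counts cover three years while the population is a single 2010 snapshot, the last step divides by 3 to get a rough yearly average, under the assumption that the population stayed at its 2010 level.

```python
def rate_per_100k(deaths, population):
    """Deaths per 100,000 people (hypothetical helper name)."""
    return deaths / population * 100000

# Figures from the article: Black gun deaths 2012-2014 and 2010 census population.
black_deaths = 23296
black_population = 40250635

three_year_rate = rate_per_100k(black_deaths, black_population)

# Assumption: dividing by 3 gives a rough yearly average, treating the
# population as constant at its 2010 level.
yearly_rate = three_year_rate / 3

print(round(three_year_rate, 1))
print(round(yearly_rate, 1))
```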

I am now going to filter the results further and restrict the deaths to those whose intent was homicide. This will tell me the number of homicide gun deaths per 100,000 people in each racial category.

In order to achieve this, I'll need to redo my work in generating race_counts, but only count rows where the intent is homicide. This process is shown below.

### Filtering By Intent

In [10]:
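# Column 3 holds the intent. Note: the header was already removed from data,
# so the [1:] slice here skips the first record; the full list is rebuilt in In [16].
intents = [row[3] for row in data[1:]]
print(set(intents))
intents[:10]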

{'Suicide', 'Homicide', 'NA', 'Undetermined', 'Accidental'}


['Suicide',
 'Suicide',
 'Suicide',
 'Suicide',
 'Suicide',
 'Undetermined',
 'Suicide',
 'Accidental',
 'Suicide',
 'Suicide']

The results show that the intent of each gun death falls into one of five categories: Accidental, Homicide, NA, Undetermined and Suicide. I have shown only the first ten intents to save space. My objective is to understand the gun deaths whose intent was homicide for each of the five races. I start by extracting the race of each victim from guns.csv, displaying only the first 10 values.


In [15]:
races = [row[7] for row in data]
races[:10]

['Asian/Pacific Islander',
 'White',
 'White',
 'White',
 'White',
 'Native American/Native Alaskan',
 'White',
 'Native American/Native Alaskan',
 'White',
 'Black']

With the intents and races lists in hand, I count the gun deaths per race whose intent was homicide. I then use the population of each race to compute the homicide gun deaths per 100,000 people of that race.

In [16]:
intents = [row[3] for row in data]

# Count homicide deaths per race; every race gets an entry, even if its count is 0
homicide_race_counts = {}
for i, race in enumerate(races):
    if race not in homicide_race_counts:
        homicide_race_counts[race] = 0
    if intents[i] == "Homicide":
        homicide_race_counts[race] += 1

# Homicide deaths over 2012-2014 per 100,000 people of each race
race_per_hundredk = {}
for k, v in homicide_race_counts.items():
    race_per_hundredk[k] = (v / mapping[k]) * 100000
race_per_hundredk

{'Asian/Pacific Islander': 3.530346230970155,
 'Black': 48.471284987180944,
 'Hispanic': 12.627161104219914,
 'Native American/Native Alaskan': 8.717729026240365,
 'White': 4.6356417981453335}
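
To read the ordering off the output more easily, the rates can be sorted from highest to lowest. The sketch below copies the values from the output above into a dictionary named `homicide_rates` (a name introduced here for illustration) and ranks them.

```python
# Homicide rates per 100,000 copied from the output above.
homicide_rates = {
    "Asian/Pacific Islander": 3.530346230970155,
    "Black": 48.471284987180944,
    "Hispanic": 12.627161104219914,
    "Native American/Native Alaskan": 8.717729026240365,
    "White": 4.6356417981453335,
}

# Sort the (race, rate) pairs from highest rate to lowest.
ranked = sorted(homicide_rates.items(), key=lambda pair: pair[1], reverse=True)

for race, rate in ranked:
    print(f"{race}: {rate:.1f}")
```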

### Findings from the above analysis
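It is evident that Black people are by far the most affected by homicide gun deaths (about 48 per 100,000), followed at a distance by Hispanics (about 13 per 100,000). Asian/Pacific Islanders (about 3.5 per 100,000) and Whites (about 4.6 per 100,000) are the least affected.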
### Further Analysis
Further analysis could examine how gun deaths involving police vary across racial groups. It may also be of interest to understand how many people at each education level in each race die from guns.