## Gun violence in America—looking at data
Task 1: Measuring frequency distributions of variables in data sets.

Note: The cells in this IPython notebook are scrollable if their contents extend beyond the frame.

In [63]:
# Clears all user-defined variables so stale, re-defined names do not linger.
# Must come first. '-s' is a soft reset; '-f' skips the confirmation prompt.
from IPython import get_ipython
get_ipython().run_line_magic('reset', '-sf')

In [64]:
# Imports
import pandas as pd
import numpy as np
import os # for filepath
import re # for regular expressions

### US Mass Shootings, 1982-2018: Data From Mother Jones’ Investigation
https://www.motherjones.com/politics/2012/12/mass-shootings-mother-jones-full-data/

In [68]:
# Load the data from the 'datasets' directory
datapath = os.path.join('datasets', 'mojo_us_mass_shoot_82_18.csv')
mojo_df = pd.read_csv(datapath)

In [69]:
# A look at the columns and their data types
mojo_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 97 entries, 0 to 96
Data columns (total 22 columns):
Case                                   97 non-null object
Location                               97 non-null object
Date                                   97 non-null object
Year                                   97 non-null int64
Summary                                97 non-null object
Fatalities                             97 non-null int64
Injured                                97 non-null int64
Total victims                          97 non-null object
Venue                                  97 non-null object
Prior signs of mental health issues    97 non-null object
Mental health - details                86 non-null object
Weapons obtained legally               97 non-null object
Where obtained                         85 non-null object
Type of weapons                        97 non-null object
Weapon details                         91 non-null object
Race                          

In [70]:
# A look at the first few rows of data
mojo_df.head()

Unnamed: 0,Case,Location,Date,Year,Summary,Fatalities,Injured,Total victims,Venue,Prior signs of mental health issues,...,Where obtained,Type of weapons,Weapon details,Race,Gender,Sources,Mental Health Sources,latitude,longitude,Type
0,Stoneman Douglas High School shooting,"Parkland, Florida",2/14/18,2018,"Nikolas J. Cruz, 19, heavily armed with an AR-...",17,14,31,School,Yes,...,A Florida pawn shop,semiautomatic rifle,AR-15,White,M,https://www.nytimes.com/2018/02/14/us/parkland...,https://www.nytimes.com/2018/02/15/us/nikolas-...,,,Mass
1,Pennsylvania carwash shooting,"Melcroft, PA",1/28/18,2018,"Timothy O'Brien Smith, 28, wearing body armor ...",4,1,5,Other,TBD,...,TBD,semiautomatic rifle and semiautomatic handgun,,White,M,http://www.wpxi.com/news/top-stories/family-me...,,,,Mass
2,Rancho Tehama shooting spree,"Rancho Tehama, CA",11/14/17,2017,"Kevin Janson Neal, 44, went on an approximatel...",5,10,15,Other,TBD,...,TBD,semiautomatic rifles,Two illegally modified rifles,White,M,https://www.nbcnews.com/news/us-news/californi...,,,,Spree
3,Texas First Baptist Church massacre,"Sutherland Springs, TX",11/5/17,2017,"Devin Patrick Kelley, a 26-year-old ex-US Air ...",26,20,46+,Religious,Yes,...,Purchased in April 2016 from an Academy Sports...,semiautomatic rifle,Ruger AR-556; Kelley also possessed semiautoma...,White,M,https://www.washingtonpost.com/news/morning-mi...,http://www.expressnews.com/news/local/article/...,32.780105,-96.800008,Mass
4,Walmart shooting in suburban Denver,"Thornton, CO",11/1/17,2017,"Scott Allen Ostrem, 47, walked into a Walmart ...",3,0,3,Other,Unclear,...,,semiautomatic handgun,,White,M,https://www.nytimes.com/2017/11/01/us/thornton...,,43.060567,-88.106479,Mass


To do a frequency count on state data, we need to isolate the state from the location and give it a consistent representation (either the full name or the abbreviation). The 'Location' column mixes both forms, so we create two new columns: 'State_code' with the two-letter abbreviation and 'State' with the full name.


In [82]:
# Create two maps: (1) state to abbreviation, (2) abbreviation to state
state_abrv = {'Alabama': 'AL', 'Alaska': 'AK', 'Arizona': 'AZ', 'Arkansas': 'AR', 'California': 'CA', 'Colorado': 'CO', 'Connecticut': 'CT', 'Delaware': 'DE', 'Florida': 'FL', 'Georgia': 'GA', 'Hawaii': 'HI', 'Idaho': 'ID', 'Illinois': 'IL', 'Indiana': 'IN', 'Iowa': 'IA', 'Kansas': 'KS', 'Kentucky': 'KY', 'Louisiana': 'LA', 'Maine': 'ME', 'Maryland': 'MD', 'Massachusetts': 'MA', 'Michigan': 'MI', 'Minnesota': 'MN', 'Mississippi': 'MS', 'Missouri': 'MO', 'Montana': 'MT', 'Nebraska': 'NE', 'Nevada': 'NV', 'New Hampshire': 'NH', 'New Jersey': 'NJ', 'New Mexico': 'NM', 'New York': 'NY', 'North Carolina': 'NC', 'North Dakota': 'ND', 'Ohio': 'OH', 'Oklahoma': 'OK', 'Oregon': 'OR', 'Pennsylvania': 'PA', 'Rhode Island': 'RI', 'South Carolina': 'SC', 'South Dakota': 'SD', 'Tennessee': 'TN', 'Texas': 'TX', 'Utah': 'UT', 'Vermont': 'VT', 'Virginia': 'VA', 'Washington': 'WA', 'West Virginia': 'WV', 'Wisconsin': 'WI', 'Wyoming': 'WY'}
abrv_state = {v: k for k, v in state_abrv.items()}

In [87]:
# Transform 'Location' data into two new columns: 'State' and 'State_code'
# Take the text after the first comma, e.g. 'Parkland, Florida' -> 'Florida'
state_raw = mojo_df['Location'].str.split(',').str.get(1).str.strip()
# Full names become two-letter codes; values that are already codes are unchanged
mojo_df['State_code'] = state_raw.replace(state_abrv)
# Two-letter codes become full names; values that are already full names are unchanged
mojo_df['State'] = state_raw.replace(abrv_state)

Scroll right to examine the new columns for state names and state abbreviations: State, State_code.

In [88]:
mojo_df.head()

Unnamed: 0,Case,Location,Date,Year,Summary,Fatalities,Injured,Total victims,Venue,Prior signs of mental health issues,...,Weapon details,Race,Gender,Sources,Mental Health Sources,latitude,longitude,Type,State_code,State
0,Stoneman Douglas High School shooting,"Parkland, Florida",2/14/18,2018,"Nikolas J. Cruz, 19, heavily armed with an AR-...",17,14,31,School,Yes,...,AR-15,White,M,https://www.nytimes.com/2018/02/14/us/parkland...,https://www.nytimes.com/2018/02/15/us/nikolas-...,,,Mass,FL,Florida
1,Pennsylvania carwash shooting,"Melcroft, PA",1/28/18,2018,"Timothy O'Brien Smith, 28, wearing body armor ...",4,1,5,Other,TBD,...,,White,M,http://www.wpxi.com/news/top-stories/family-me...,,,,Mass,PA,Pennsylvania
2,Rancho Tehama shooting spree,"Rancho Tehama, CA",11/14/17,2017,"Kevin Janson Neal, 44, went on an approximatel...",5,10,15,Other,TBD,...,Two illegally modified rifles,White,M,https://www.nbcnews.com/news/us-news/californi...,,,,Spree,CA,California
3,Texas First Baptist Church massacre,"Sutherland Springs, TX",11/5/17,2017,"Devin Patrick Kelley, a 26-year-old ex-US Air ...",26,20,46+,Religious,Yes,...,Ruger AR-556; Kelley also possessed semiautoma...,White,M,https://www.washingtonpost.com/news/morning-mi...,http://www.expressnews.com/news/local/article/...,32.780105,-96.800008,Mass,TX,Texas
4,Walmart shooting in suburban Denver,"Thornton, CO",11/1/17,2017,"Scott Allen Ostrem, 47, walked into a Walmart ...",3,0,3,Other,Unclear,...,,White,M,https://www.nytimes.com/2017/11/01/us/thornton...,,43.060567,-88.106479,Mass,CO,Colorado
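
The mapping covers only the 50 states, so any value extracted from 'Location' that is neither a full state name nor a two-letter code passes through both replacements unchanged (the counts further down include a `D.C.` entry, for example). The following is a minimal sketch of a check for such values. It uses a small stand-in Series and a shortened mapping rather than the notebook's data, and `find_unmapped` is a hypothetical helper name, not part of the original notebook.

```python
import pandas as pd

# Shortened stand-in for the notebook's state_abrv mapping (illustration only)
state_abrv = {'Florida': 'FL', 'Pennsylvania': 'PA', 'California': 'CA'}

def find_unmapped(values, mapping):
    """Return the distinct values that are neither a key nor a value of `mapping`."""
    known = set(mapping) | set(mapping.values())
    cleaned = values.dropna()  # missing locations are ignored, not reported
    return sorted(set(cleaned[~cleaned.isin(known)]))

# Toy values mimicking the text after the comma in 'Location'
extracted = pd.Series(['Florida', 'PA', 'CA', 'D.C.', 'California'])
unmapped = find_unmapped(extracted, state_abrv)
print(unmapped)
```

Running a check like this before counting makes it easier to spot entries whose 'State' and 'State_code' values would otherwise end up identical.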


**Let's now look at the frequency counts and relative frequencies (proportions of all cases) of mass shootings per state from 1982 to 2018.**

In [89]:
mojo_df['State'].value_counts()

California        16
Florida           10
Texas              8
Washington         7
Colorado           6
New York           4
Wisconsin          4
Pennsylvania       3
Connecticut        3
South Carolina     2
Nevada             2
Oregon             2
Michigan           2
North Carolina     2
Minnesota          2
Illinois           2
Kentucky           2
Ohio               2
Georgia            2
Maryland           1
Louisiana          1
Virginia           1
Mississippi        1
Kansas             1
Iowa               1
Oklahoma           1
Arizona            1
Utah               1
Massachusetts      1
Arkansas           1
Hawaii             1
D.C.               1
Nebraska           1
Missouri           1
Tennessee          1
Name: State, dtype: int64

In [91]:
mojo_df['State'].value_counts(normalize=True)

California        0.164948
Florida           0.103093
Texas             0.082474
Washington        0.072165
Colorado          0.061856
New York          0.041237
Wisconsin         0.041237
Pennsylvania      0.030928
Connecticut       0.030928
South Carolina    0.020619
Nevada            0.020619
Oregon            0.020619
Michigan          0.020619
North Carolina    0.020619
Minnesota         0.020619
Illinois          0.020619
Kentucky          0.020619
Ohio              0.020619
Georgia           0.020619
Maryland          0.010309
Louisiana         0.010309
Virginia          0.010309
Mississippi       0.010309
Kansas            0.010309
Iowa              0.010309
Oklahoma          0.010309
Arizona           0.010309
Utah              0.010309
Massachusetts     0.010309
Arkansas          0.010309
Hawaii            0.010309
D.C.              0.010309
Nebraska          0.010309
Missouri          0.010309
Tennessee         0.010309
Name: State, dtype: float64
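
`value_counts(normalize=True)` returns proportions between 0 and 1, so multiplying by 100 gives percentages. The sketch below places counts and percentages side by side. It uses a toy Series in place of `mojo_df['State']`, and the `frequency_table` helper and its column names are assumptions made for illustration.

```python
import pandas as pd

def frequency_table(series):
    """Build a table of absolute counts and percentages for each distinct value."""
    counts = series.value_counts()
    percent = series.value_counts(normalize=True).mul(100).round(1)
    # Both Series are indexed by the distinct values, so they align on that index
    return pd.DataFrame({'count': counts, 'percent': percent})

# Toy data standing in for mojo_df['State']
states = pd.Series(['California', 'California', 'Florida', 'Texas'])
table = frequency_table(states)
print(table)
```

Applied to the notebook's data, the same helper could be called as `frequency_table(mojo_df['State'])`.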