# Gun Violence, January 2013 through March 2018
Gun violence seems to be on the rise.  It is a rare month that goes by without news of some form of gun violence.  Surprisingly, there is no central repository for this data, even though gun violence affects communities across the United States.

This notebook chronicles the exploration of one publicly available dataset.  It looks at how many gun deaths occur in the United States of America and whether any areas show a higher probability of gun violence.  Using the dates of the incidents, we will also evaluate whether gun violence has a season.

The data we are looking at comes from Kaggle at https://www.kaggle.com/jameslko/gun-violence-data.  It is pulled from http://gunviolencearchive.org, a group that collects this data from about 7500 sources.  It is just one of several organizations working to provide a centralized location for near-real-time data.

In [377]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

In [378]:
df = pd.read_csv('gun-violence-data_01-2013_03-2018.csv')

## Exploratory Data Analysis

In [303]:
df.head()

Unnamed: 0,incident_id,date,state,city_or_county,address,n_killed,n_injured,incident_url,source_url,incident_url_fields_missing,...,participant_age,participant_age_group,participant_gender,participant_name,participant_relationship,participant_status,participant_type,sources,state_house_district,state_senate_district
0,461105,2013-01-01,Pennsylvania,Mckeesport,1506 Versailles Avenue and Coursin Street,0,4,http://www.gunviolencearchive.org/incident/461105,http://www.post-gazette.com/local/south/2013/0...,False,...,0::20,0::Adult 18+||1::Adult 18+||2::Adult 18+||3::A...,0::Male||1::Male||3::Male||4::Female,0::Julian Sims,,0::Arrested||1::Injured||2::Injured||3::Injure...,0::Victim||1::Victim||2::Victim||3::Victim||4:...,http://pittsburgh.cbslocal.com/2013/01/01/4-pe...,,
1,460726,2013-01-01,California,Hawthorne,13500 block of Cerise Avenue,1,3,http://www.gunviolencearchive.org/incident/460726,http://www.dailybulletin.com/article/zz/201301...,False,...,0::20,0::Adult 18+||1::Adult 18+||2::Adult 18+||3::A...,0::Male,0::Bernard Gillis,,0::Killed||1::Injured||2::Injured||3::Injured,0::Victim||1::Victim||2::Victim||3::Victim||4:...,http://losangeles.cbslocal.com/2013/01/01/man-...,62.0,35.0
2,478855,2013-01-01,Ohio,Lorain,1776 East 28th Street,1,3,http://www.gunviolencearchive.org/incident/478855,http://chronicle.northcoastnow.com/2013/02/14/...,False,...,0::25||1::31||2::33||3::34||4::33,0::Adult 18+||1::Adult 18+||2::Adult 18+||3::A...,0::Male||1::Male||2::Male||3::Male||4::Male,0::Damien Bell||1::Desmen Noble||2::Herman Sea...,,"0::Injured, Unharmed, Arrested||1::Unharmed, A...",0::Subject-Suspect||1::Subject-Suspect||2::Vic...,http://www.morningjournal.com/general-news/201...,56.0,13.0
3,478925,2013-01-05,Colorado,Aurora,16000 block of East Ithaca Place,4,0,http://www.gunviolencearchive.org/incident/478925,http://www.dailydemocrat.com/20130106/aurora-s...,False,...,0::29||1::33||2::56||3::33,0::Adult 18+||1::Adult 18+||2::Adult 18+||3::A...,0::Female||1::Male||2::Male||3::Male,0::Stacie Philbrook||1::Christopher Ratliffe||...,,0::Killed||1::Killed||2::Killed||3::Killed,0::Victim||1::Victim||2::Victim||3::Subject-Su...,http://denver.cbslocal.com/2013/01/06/officer-...,40.0,28.0
4,478959,2013-01-07,North Carolina,Greensboro,307 Mourning Dove Terrace,2,2,http://www.gunviolencearchive.org/incident/478959,http://www.journalnow.com/news/local/article_d...,False,...,0::18||1::46||2::14||3::47,0::Adult 18+||1::Adult 18+||2::Teen 12-17||3::...,0::Female||1::Male||2::Male||3::Female,0::Danielle Imani Jameison||1::Maurice Eugene ...,3::Family,0::Injured||1::Injured||2::Killed||3::Killed,0::Victim||1::Victim||2::Victim||3::Subject-Su...,http://myfox8.com/2013/01/08/update-mother-sho...,62.0,27.0


In [304]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 239677 entries, 0 to 239676
Data columns (total 29 columns):
incident_id                    239677 non-null int64
date                           239677 non-null object
state                          239677 non-null object
city_or_county                 239677 non-null object
address                        223180 non-null object
n_killed                       239677 non-null int64
n_injured                      239677 non-null int64
incident_url                   239677 non-null object
source_url                     239209 non-null object
incident_url_fields_missing    239677 non-null bool
congressional_district         227733 non-null float64
gun_stolen                     140179 non-null object
gun_type                       140226 non-null object
incident_characteristics       239351 non-null object
latitude                       231754 non-null float64
location_description           42089 non-null object
longitude                    
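
The `df.info()` output above lists `date` as `object`, meaning the dates are stored as strings. To look for a season, the column would first need to be parsed into datetimes. The following is a minimal sketch on a small invented frame (the `toy` rows and the `by_month` name are illustrative assumptions, not part of the dataset); the same calls should carry over to `df`.

```python
import pandas as pd

# Toy stand-in for the real frame; the rows are invented for illustration.
toy = pd.DataFrame({
    'date': ['2013-01-01', '2013-01-05', '2013-07-04', '2014-07-10', '2014-12-25'],
    'n_killed': [0, 4, 1, 2, 1],
})

# Parse the string dates so the month can be extracted.
toy['date'] = pd.to_datetime(toy['date'])

# Count incidents and sum deaths for each calendar month (1 = January),
# pooling the same month across years.
by_month = toy.groupby(toy['date'].dt.month).agg(
    incidents=('n_killed', 'size'),
    killed=('n_killed', 'sum'),
)
print(by_month)
```

Pooling months across years is one possible choice; because the data ends in March 2018, the first three months cover one more year than the rest, which a real comparison would need to account for.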

The data has a feature for the participants' ages, and each row stores all parties involved in the incident.  The main feature that I am interested in is n_killed.  I am going to see whether there is a correlation between deaths from gun violence and location.  There are also features that indicate the state House and Senate districts.
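
As a first look at deaths by location, the counts can be summed per state. This is only a sketch on an invented frame (the `toy` rows and the `by_state` name are assumptions for illustration); with the real data the same `groupby` would be applied to `df`.

```python
import pandas as pd

# Toy stand-in for the real frame; the rows are invented for illustration.
toy = pd.DataFrame({
    'state': ['Ohio', 'Ohio', 'Colorado', 'California', 'California'],
    'n_killed': [1, 0, 4, 1, 2],
    'n_injured': [3, 1, 0, 3, 0],
})

# Total killed and injured per state, with the highest death count first.
by_state = (
    toy.groupby('state')[['n_killed', 'n_injured']]
       .sum()
       .sort_values('n_killed', ascending=False)
)
print(by_state)
```

Raw totals will tend to favor populous states. Comparing rates would require population figures, which are not part of this dataset.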

A few of the features are not going to be useful for this analysis.  While they are useful for analyzing and viewing each incident individually, I am going to drop them from the dataframe.

In [305]:
drop = ['incident_id', 'incident_url', 'source_url', 'incident_url_fields_missing', 'notes', 'sources']
df = df.drop(columns=drop)

There is still a lot of missing data.  Most of it is just unknown or not applicable, so I am setting the missing values in these features to 'unknown'.

In [306]:
unknown = ['address', 'gun_stolen', 'gun_type', 'n_guns_involved', 'participant_age', 'participant_age_group',
          'participant_gender', 'participant_name', 'participant_status', 'participant_type']
df[unknown] = df[unknown].fillna('unknown')
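
To see how much of each feature is missing before choosing a fill value, the share of nulls per column can be computed directly. The sketch below uses an invented frame (the `toy` values and the `missing_share` and `text_cols` names are assumptions for illustration). It also fills only the text columns, which is one possible variation: filling a numeric column such as `n_guns_involved` with the string 'unknown' turns it into an `object` column.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real frame; the values are invented for illustration.
toy = pd.DataFrame({
    'address': ['1776 East 28th Street', np.nan, '307 Mourning Dove Terrace', np.nan],
    'gun_type': [np.nan, np.nan, 'Handgun', np.nan],
    'n_killed': [1, 0, 2, 0],
})

# Fraction of rows that are null in each column, largest first.
missing_share = toy.isna().mean().sort_values(ascending=False)
print(missing_share)

# Fill only the text columns, leaving numeric columns untouched.
text_cols = ['address', 'gun_type']
toy[text_cols] = toy[text_cols].fillna('unknown')
```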

A few of the features that have nulls I am going to set to N/A.

In [307]:
na = ['incident_characteristics', 'location_description']
df[na] = df[na].fillna('N/A')

Finally, I am filling in the nulls for participant relationship to reflect no relationship.

In [308]:
df['participant_relationship'] = df['participant_relationship'].fillna('No Relationship')

In [309]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 239677 entries, 0 to 239676
Data columns (total 23 columns):
date                        239677 non-null object
state                       239677 non-null object
city_or_county              239677 non-null object
address                     239677 non-null object
n_killed                    239677 non-null int64
n_injured                   239677 non-null int64
congressional_district      227733 non-null float64
gun_stolen                  239677 non-null object
gun_type                    239677 non-null object
incident_characteristics    239677 non-null object
latitude                    231754 non-null float64
location_description        239677 non-null object
longitude                   231754 non-null float64
n_guns_involved             239677 non-null object
participant_age             239677 non-null object
participant_age_group       239677 non-null object
participant_gender          239677 non-null object
participant_name     

Filling the nulls of the congressional, state House, and state Senate districts with the mode for the city or county the incident is in.

In [310]:
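# Cities/counties with at least one missing congressional district (kept for inspection).
missing_cong_dist = df['city_or_county'][df['congressional_district'].isna()].unique()

def missing_mode(what, where):
    """Fill nulls in column `what` with the mode of `what` within each `where` group."""
    def fill_group(values):
        mode = values.mode()  # mode() skips NaN
        # A group with no known values has no mode, so its nulls stay as NaN.
        return values.fillna(mode.iloc[0]) if not mode.empty else values
    # Assign the result back; a global fillna inside a loop would fill every
    # null with the first group's mode.
    df[what] = df.groupby(where)[what].transform(fill_group)

missing_mode('congressional_district', 'city_or_county')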

In [311]:
missing_mode('state_house_district', 'city_or_county')

In [312]:
missing_mode('state_senate_district', 'city_or_county')
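
City names repeat across states, so a mode taken over `city_or_county` alone can mix districts from different places. A possible variation, sketched here on an invented frame, groups on both `state` and `city_or_county`. The `fill_with_group_mode` helper and the district numbers are hypothetical and only illustrate the idea.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real frame; district numbers are invented for illustration.
toy = pd.DataFrame({
    'state': ['Illinois', 'Illinois', 'Illinois', 'Ohio', 'Ohio', 'Pennsylvania'],
    'city_or_county': ['Springfield', 'Springfield', 'Springfield',
                       'Springfield', 'Springfield', 'Derry'],
    'congressional_district': [13.0, 13.0, np.nan, 10.0, np.nan, np.nan],
})

def fill_with_group_mode(frame, what, where):
    """Return column `what` with nulls replaced by the mode of `what` in each `where` group."""
    def fill_group(values):
        mode = values.mode()  # mode() skips NaN
        # A group with no known values has no mode; its nulls stay as NaN.
        return values.fillna(mode.iloc[0]) if not mode.empty else values
    return frame.groupby(where)[what].transform(fill_group)

toy['congressional_district'] = fill_with_group_mode(
    toy, 'congressional_district', ['state', 'city_or_county']
)
print(toy)
```

Grouping on the city name alone would give the Ohio row the Illinois district here. The Derry row stays null because its group has no known district to borrow from.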

In [364]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 239677 entries, 0 to 239676
Data columns (total 29 columns):
incident_id                    239677 non-null int64
date                           239677 non-null object
state                          239677 non-null object
city_or_county                 239677 non-null object
address                        223180 non-null object
n_killed                       239677 non-null int64
n_injured                      239677 non-null int64
incident_url                   239677 non-null object
source_url                     239209 non-null object
incident_url_fields_missing    239677 non-null bool
congressional_district         227733 non-null float64
gun_stolen                     140179 non-null object
gun_type                       140226 non-null object
incident_characteristics       239351 non-null object
latitude                       231754 non-null float64
location_description           42089 non-null object
longitude                    

In [None]:
#df['address_full'] = df['latitude'].mask(df['latitude'].isna(), other=(df['address'].notnull(df['address']).map(str) + ', ' + df['city_or_county'].map(str) + ', ' + df['state']), axis=0)

In [388]:
df['address_full'] = df['latitude'].where(~df['latitude'].isna(), other=(df['address'].map(str) + ', ' + df['city_or_county'].map(str) + ', ' + df['state']), axis=0)

In [446]:
add_nan = df['address_full'][df['latitude'].isna()]
add_nan.to_frame('address')

Unnamed: 0,address
257,"nan, Derry, Pennsylvania"
277,"104th Ave and Walnut St, Oakland, California"
1926,"3700 block of Coconino Dr., San Antonio, Texas"
1933,"2100 block of London Court, Henrico County, Vi..."
2184,"Harmons Hill Rd, Millsboro, Delaware"
...,...
239666,"3100 block of California St, Saint Louis, Miss..."
239668,"I-96, Detroit, Michigan"
239669,"Hayes Rd, Madison, Wisconsin"
239670,"1 block of N Paulina St, Chicago, Illinois"


In [447]:
from geopy.geocoders import Nominatim
from geopy.extra.rate_limiter import RateLimiter

geolocator = Nominatim(user_agent='Thinkful Learning')
geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)
#location = geolocator.geocode('nan, Derry, Pennsylvania')

In [421]:
location.raw

{'place_id': 235298755,
 'licence': 'Data © OpenStreetMap contributors, ODbL 1.0. https://osm.org/copyright',
 'osm_type': 'relation',
 'osm_id': 188358,
 'boundingbox': ['40.322898', '40.341487', '-79.312053', '-79.289583'],
 'lat': '40.3339589',
 'lon': '-79.2997573',
 'display_name': 'Derry, Westmoreland County, Pennsylvania, 15627, United States of America',
 'class': 'boundary',
 'type': 'administrative',
 'importance': 0.5465480119372579,
 'icon': 'https://nominatim.openstreetmap.org/images/mapicons/poi_boundary_administrative.p.20.png'}

In [None]:
location = geolocator.geocode('nan, Derry, Pennsylvania')

#for add in add_nan.values():
    #print(add)
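
The commented-out loop above could be completed along the following lines. This is a sketch, not the notebook's method: `geocode_missing` is a hypothetical helper, and it accepts any callable that returns an object with `.latitude` and `.longitude` (as geopy's location objects do) or `None` when nothing is found. A stub `fake_geocode` stands in for the network call so the example runs offline; its one known coordinate is taken from the `location.raw` output above.

```python
from collections import namedtuple

import pandas as pd

def geocode_missing(addresses, geocode):
    """Look up each address; return latitude/longitude keyed by the original index."""
    rows = {}
    for idx, address in addresses.items():
        # Drop the literal 'nan, ' prefix produced when the street address was missing.
        query = address[5:] if address.startswith('nan, ') else address
        location = geocode(query)
        if location is not None:  # skip addresses the geocoder could not resolve
            rows[idx] = (location.latitude, location.longitude)
    return pd.DataFrame.from_dict(rows, orient='index', columns=['latitude', 'longitude'])

# Offline stand-in for a real geocoder (hypothetical, for illustration only).
FakeLocation = namedtuple('FakeLocation', ['latitude', 'longitude'])

def fake_geocode(query):
    known = {'Derry, Pennsylvania': FakeLocation(40.3339589, -79.2997573)}
    return known.get(query)

sample = pd.Series({257: 'nan, Derry, Pennsylvania', 239668: 'I-96, Detroit, Michigan'})
coords = geocode_missing(sample, fake_geocode)
print(coords)
```

With the real data, the call would presumably be `geocode_missing(add_nan, geocode)`, using the `RateLimiter`-wrapped `geocode` defined earlier. According to `df.info()`, roughly 7,900 rows lack a latitude, so at one request per second a full pass would take over two hours.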

In [5]:
from pandas_profiling import ProfileReport

profile = ProfileReport(df, title='Gun Violence Profiling Report', html={'style':{'full_width':True}})
profile
