# Gun Violence in Philadelphia 2015 - 2020 - Exploratory Data Analysis

In this notebook, I explore this [Dataset](https://phl.carto.com/api/v2/sql?q=SELECT+*,+ST_Y(the_geom)+AS+lat,+ST_X(the_geom)+AS+lng+FROM+shootings&filename=shootings&format=csv&skipfields=cartodb_id) of gun violence incidents reported in Philadelphia since 2015. Gun violence in the city has surged since 2020.

As the first step, I imported the required libraries, loaded the dataset, and created some additional features.

In [34]:
import plotly.express as px
import plotly.offline as pyo 
from plotly.offline import download_plotlyjs, init_notebook_mode, iplot
init_notebook_mode(connected=True)
import pandas as pd
import numpy as np
import folium
import calendar

In [35]:
shootings_df = pd.read_csv("shootings.csv")
shootings_df['date_'] = pd.to_datetime(shootings_df['date_'])
shootings_df['year'] = shootings_df['date_'].dt.year
shootings_df['month'] = shootings_df['date_'].dt.month
shootings_df['monthday'] = shootings_df['date_'].dt.day
shootings_df['weekday'] = shootings_df['date_'].dt.weekday

## The dataset comprises 10,465 rows and 28 columns

In [36]:
shootings_df.shape

(10465, 28)

In [37]:
shootings_df['date_'].max()

Timestamp('2021-08-23 00:00:00')

In [38]:
shootings_df.isna().sum()

the_geom                192
the_geom_webmercator    192
objectid                  0
year                      0
dc_key                    0
code                     88
date_                     0
time                     88
race                     88
sex                       0
age                     148
wound                   108
officer_involved          0
offender_injured          0
offender_deceased         0
location                  0
latino                   88
point_x                 192
point_y                 192
dist                      2
inside                   88
outside                  88
fatal                    88
lat                     192
lng                     192
month                     0
monthday                  0
weekday                   0
dtype: int64

In [39]:
shootings_df.describe()

|  | objectid | year | dc_key | code | age | latino | point_x | point_y | dist | inside | outside | fatal | lat | lng | month | monthday | weekday |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 10465.0 | 10465.0 | 10465.0 | 10377.0 | 10317.0 | 10377.0 | 10273.0 | 10273.0 | 10463.0 | 10377.0 | 10377.0 | 10377.0 | 10273.0 | 10273.0 | 10465.0 | 10465.0 | 10465.0 |
| mean | 868581.882274 | 2018.275299 | 200063900000.0 | 429.971572 | 28.627023 | 0.113328 | -75.309777 | 39.723724 | 21.10389 | 0.052905 | 0.946709 | 0.1963 | 39.723721 | -75.309778 | 6.634591 | 15.751744 | 3.052843 |
| std | 3021.335493 | 1.978909 | 18865790000.0 | 485.238952 | 10.950109 | 0.317008 | 0.967128 | 1.741317 | 9.131393 | 0.223856 | 0.224624 | 0.397217 | 1.741316 | 0.967128 | 3.221628 | 8.674891 | 2.054908 |
| min | 863238.0 | 2015.0 | 1502.0 | 0.0 | 0.0 | 0.0 | -81.581379 | 28.419548 | 1.0 | 0.0 | 0.0 | 0.0 | 28.419548 | -81.581379 | 1.0 | 1.0 | 0.0 |
| 25% | 865966.0 | 2017.0 | 201701000000.0 | 411.0 | 21.0 | 0.0 | -75.200713 | 39.964978 | 15.0 | 0.0 | 1.0 | 0.0 | 39.964978 | -75.200713 | 4.0 | 8.0 | 1.0 |
| 50% | 868582.0 | 2018.0 | 201839000000.0 | 411.0 | 26.0 | 0.0 | -75.158299 | 39.993923 | 22.0 | 0.0 | 1.0 | 0.0 | 39.993923 | -75.158299 | 7.0 | 16.0 | 3.0 |
| 75% | 871198.0 | 2020.0 | 202022000000.0 | 411.0 | 33.0 | 0.0 | -75.131019 | 40.017076 | 25.0 | 0.0 | 1.0 | 0.0 | 40.017076 | -75.131019 | 9.0 | 23.0 | 5.0 |
| max | 873814.0 | 2021.0 | 202139000000.0 | 3412.0 | 117.0 | 1.0 | -74.959364 | 40.114857 | 39.0 | 1.0 | 1.0 | 1.0 | 40.114857 | -74.959364 | 12.0 | 31.0 | 6.0 |


In [40]:
total_shootings_per_year = shootings_df.groupby('year')['objectid'].count()
year_labels = total_shootings_per_year.index.tolist()
total_shootings = total_shootings_per_year.values.tolist()

In [41]:
fig = px.bar(total_shootings_per_year, x=year_labels, y=total_shootings, 
            text=total_shootings, opacity=0.75)
fig.update_traces(texttemplate='%{text:,}', textposition='outside')
fig.update_layout(title_text='Total Shooting Incidents Per Year in Philadelphia', title_x=0.5)
fig.update_yaxes(visible=False)
fig.update_xaxes(title='', visible=True, showticklabels=True)
fig.show()

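> From 2017 through 2020, shooting incidents increased each year in Philadelphia.  In 2020, total shooting incidents increased by 52.75% over 2019.  The 2021 count covers only part of the year, since the latest record in the dataset is dated 2021-08-23.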
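
The year-over-year figure can be rechecked by hand from the yearly counts this notebook reports. The following is only a minimal sketch: the counts are copied from the notebook's yearly output, and the `pct_change` helper is a hypothetical name introduced here for illustration.

```python
# Yearly shooting counts, copied from the notebook's yearly output (full years only).
counts = {2015: 1258, 2016: 1300, 2017: 1235, 2018: 1441, 2019: 1475, 2020: 2253}


def pct_change(previous, current):
    """Percent change from `previous` to `current` (hypothetical helper)."""
    return (current - previous) / previous * 100


years = sorted(counts)
# Compare each year with the one before it; the first year has no predecessor.
changes = {year: pct_change(counts[year - 1], counts[year]) for year in years[1:]}

for year, change in changes.items():
    print(f"{year}: {change:+.2f}%")
```

For 2020 this works out to (2253 - 1475) / 1475, or roughly 52.75%.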

In [42]:
total_shootings_per_year = total_shootings_per_year.reset_index()

In [43]:
total_shootings_per_year = total_shootings_per_year.rename(index=str, columns={'objectid': 'shooting incidents'})
total_shootings_per_year['shooting incidents % change'] = total_shootings_per_year['shooting incidents'].pct_change()
format_dict = {'shooting incidents':'{0:,}', 'shooting incidents % change': '{:.2%}'}
# Styler.hide_index() was removed in pandas 2.0; on pandas older than 1.4 use .hide_index() instead
total_shootings_per_year.style.format(format_dict).hide(axis='index')

| year | shooting incidents | shooting incidents % change |
|---|---|---|
| 2015 | 1258 | nan% |
| 2016 | 1300 | 3.34% |
| 2017 | 1235 | -5.00% |
| 2018 | 1441 | 16.68% |
| 2019 | 1475 | 2.36% |
| 2020 | 2253 | 52.75% |
| 2021 | 1503 | -33.29% |


In [44]:
fatal_shootings_per_year = shootings_df.groupby('year')['fatal'].sum()
year_labels1 = fatal_shootings_per_year.index.tolist()
fatal_shootings = fatal_shootings_per_year.values.tolist()

In [45]:
fig = px.bar(fatal_shootings_per_year, x=year_labels1, y=fatal_shootings, 
            text=fatal_shootings, opacity=0.75)
fig.update_traces(texttemplate='%{text:,}', textposition='outside')
fig.update_layout(title_text='Fatal Shooting Incidents Per Year in Philadelphia', title_x=0.5)
fig.update_yaxes(visible=False)
fig.update_xaxes(title='', visible=True, showticklabels=True)
fig.show()

> In 2020, fatal shooting incidents in Philadelphia rose sharply, a 36.18% increase over 2019.

In [46]:
month_df = shootings_df[shootings_df['year'].isin([2015, 2016, 2017, 2018, 2019, 2020])]
month_df1 = month_df.groupby(['year', 'month']).agg({'month' : 'count'}).rename(columns={'month': 'month_avg'}).reset_index()
grouped_month_avg = month_df1.groupby(['month']).agg({'month_avg': 'mean'})
grouped_month_x = grouped_month_avg.index.tolist()
grouped_month_y = grouped_month_avg.month_avg.tolist()
month_name = [calendar.month_abbr[i] for i in sorted(grouped_month_x)]


In [47]:
fig = px.bar(grouped_month_avg, x=month_name, y=grouped_month_y, 
            text=grouped_month_y, opacity=0.75)
fig.update_traces(texttemplate='%{text:.1f}', textposition='outside')
fig.update_layout(title_text='Average Number of Shooting Incidents Per Month in Philadelphia', title_x=0.5)
fig.update_yaxes(visible=False)
fig.update_xaxes(title='', visible=True, showticklabels=True)
fig.show()

> From 2015 through 2020, the months of August, July and October averaged the highest number of shooting incidents.  The winter months of January, February and March consistently averaged the lowest.

In [48]:
weekday_df = shootings_df[shootings_df['year'].isin([2015, 2016, 2017, 2018, 2019, 2020])]
weekday_df1 = weekday_df.groupby(['year','weekday']).agg({'weekday': 'count'}).rename(columns={'weekday' : 'weekday_count'}).reset_index()

grouped_weekday_df = weekday_df1.groupby(['weekday']).agg({'weekday_count' : 'mean'}).round(1)

weekday_labels = grouped_weekday_df.index.tolist()
weekday_values = grouped_weekday_df.weekday_count.tolist()

weekmap = {0:'Mon', 1:'Tue', 2:'Wed', 3:'Thu', 4:'Fri', 5:'Sat', 6:'Sun'}
weekday_labels = [weekmap[x] for x in weekday_labels]

In [49]:
fig = px.bar(grouped_weekday_df, x=weekday_labels, y=weekday_values, 
            text=weekday_values, opacity=0.75)
fig.update_traces(texttemplate='%{text:.1f}', textposition='outside')
fig.update_layout(title_text='Average Number of Shooting Incidents Per Weekday', title_x=0.5)
fig.update_yaxes(visible=False)
fig.update_xaxes(title='', visible=True, showticklabels=True)
fig.show()

> From 2015 through 2020, shooting incidents were more frequent on weekends, averaging approximately 229.5 incidents per year on Saturdays and 237.2 on Sundays.
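
The weekday averages come from a two-step aggregation: count incidents per (year, weekday), then average those yearly counts for each weekday. A minimal standard-library sketch of the same idea follows; the dates are hypothetical toy values, not records from the dataset.

```python
from collections import Counter, defaultdict
from datetime import date
from statistics import mean

# Hypothetical incident dates, purely for illustration (not from the dataset).
incident_dates = [
    date(2019, 1, 5),   # Saturday
    date(2019, 1, 12),  # Saturday
    date(2019, 1, 6),   # Sunday
    date(2020, 1, 4),   # Saturday
    date(2020, 1, 5),   # Sunday
    date(2020, 1, 12),  # Sunday
    date(2020, 1, 19),  # Sunday
]

# Step 1: count incidents per (year, weekday). Monday is 0 and Sunday is 6,
# the same convention as pandas' `dt.weekday`.
per_year_weekday = Counter((d.year, d.weekday()) for d in incident_dates)

# Step 2: collect the yearly counts for each weekday and average them.
yearly_counts = defaultdict(list)
for (year, weekday), n in per_year_weekday.items():
    yearly_counts[weekday].append(n)

weekday_avg = {weekday: mean(ns) for weekday, ns in sorted(yearly_counts.items())}
print(weekday_avg)
```

One caveat of this approach: a weekday with no incidents in a given year contributes no count for that year, so the year is skipped rather than averaged in as zero.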

In [50]:
# Total fatal shootings per police district; the column label says "avg" but the value is a sum
district_group = shootings_df.groupby('dist')['fatal'].sum().reset_index()
district_group = district_group.rename(columns={'dist': 'district', 'fatal': 'fatal shooting incident avg'})
district_group

|  | district | fatal shooting incident avg |
|---|---|---|
| 0 | 1.0 | 24.0 |
| 1 | 2.0 | 44.0 |
| 2 | 3.0 | 28.0 |
| 3 | 5.0 | 5.0 |
| 4 | 6.0 | 19.0 |
| 5 | 7.0 | 7.0 |
| 6 | 8.0 | 18.0 |
| 7 | 9.0 | 9.0 |
| 8 | 12.0 | 170.0 |
| 9 | 14.0 | 126.0 |


In [51]:
district_group['% of fatal shootings'] = ((district_group['fatal shooting incident avg']/district_group['fatal shooting incident avg'].sum())*100).round(2)
district_group.sort_values(by='fatal shooting incident avg', ascending=False).head()

|  | district | fatal shooting incident avg | % of fatal shootings |
|---|---|---|---|
| 17 | 25.0 | 241.0 | 11.83 |
| 15 | 22.0 | 226.0 | 11.09 |
| 16 | 24.0 | 196.0 | 9.62 |
| 8 | 12.0 | 170.0 | 8.35 |
| 20 | 39.0 | 168.0 | 8.25 |


> Five Philadelphia police districts account for 49.14% of all fatal shooting incidents.
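
The combined share of the top five districts can be checked by summing the `% of fatal shootings` values from the table above. This is a small sketch with the rounded shares copied from that output; the variable names are illustrative only.

```python
# Top-five districts and their rounded shares of fatal shootings,
# copied from the notebook's output table (district number -> percent).
top_five_share = {25: 11.83, 22: 11.09, 24: 9.62, 12: 8.35, 39: 8.25}

# Sum the shares; round to two decimals to absorb floating-point noise.
combined_share = round(sum(top_five_share.values()), 2)
print(f"Top five districts: {combined_share:.2f}% of fatal shootings")
```

Summing the rounded shares gives about 49.14%. Because each share is already rounded to two decimals, the result may differ slightly from a figure computed on the unrounded counts.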

In [52]:
district_df = shootings_df[shootings_df['dist'].isin([25.0, 22.0, 24.0, 12.0, 39.0]) & (shootings_df.fatal > 0.0)]

In [53]:
fatal_shootings_loc_dist = district_df[['lat', 'lng', 'dist', 'location']]
fatal_shootings_loc_dist = fatal_shootings_loc_dist.dropna()
fatal_shootings_loc_dist['dist'] = fatal_shootings_loc_dist['dist'].astype(str)

> The five Philadelphia police districts with the most fatal shootings are concentrated in North and West Philadelphia.

In [54]:
fatal_shootings_map = folium.Map(location=[39.9509, -75.1575], 
                                 zoom_start=12, 
                                 zoom_control=False,
                                 scrollWheelZoom=False,
                                 dragging=False
)


# Add one marker per fatal shooting, using the incident location as the popup text
for _, row_values in fatal_shootings_loc_dist.iterrows():
    location = [row_values['lat'], row_values['lng']]
    popup = '<strong>' + row_values['location'] + '</strong>'
    marker = folium.Marker(location=location, popup=popup)
    marker.add_to(fatal_shootings_map)

display(fatal_shootings_map)

> Tragically, gun homicide victims are disproportionately young Black men between the ages of 18 and 29.
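
One way to put a number on a claim like this is to compute the share of fatal victims who fall in a given age band and demographic group. The sketch below uses a hypothetical toy DataFrame, not the real data; the column names mirror the dataset, but the `'M'` sex code and the inclusive 18 to 29 band are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical toy records, NOT taken from the shootings dataset.
toy_df = pd.DataFrame({
    "age":   [19, 24, 27, 35, 52, 16, 22, None],
    "race":  ["B", "B", "W", "B", "W", "B", "B", "B"],
    "sex":   ["M", "M", "M", "F", "M", "M", "M", "M"],  # 'M' coding is an assumption
    "fatal": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0],
})

# Keep fatal incidents only, and drop rows where the age is missing.
fatal = toy_df[toy_df["fatal"] > 0]
known_age = fatal.dropna(subset=["age"])

# Flag victims aged 18 to 29 (inclusive) who are recorded as Black and male.
in_band = known_age["age"].between(18, 29)
mask = in_band & (known_age["race"] == "B") & (known_age["sex"] == "M")

# The mean of a boolean mask is the share of rows where it is True.
share = mask.mean()
print(f"Share of fatal victims in the group: {share:.1%}")
```

On the real data, the same mask could be applied to `fatal_shootings_df`, and the resulting share compared with that group's share of the city's population to support the word "disproportionately".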

In [55]:
fatal_shootings_df = shootings_df[(shootings_df.fatal > 0.0)]

In [56]:
fig = px.histogram(fatal_shootings_df, x="age", template='simple_white+presentation', opacity=0.75)
fig.update_layout(title = "Gun Violence Victims Age Distribution", xaxis_title= "Ages", yaxis_title= "Count" )

fig.show()

In [57]:
fatal_shootings_df = shootings_df[(shootings_df.fatal > 0.0)]

victim_race = fatal_shootings_df["race"].value_counts()
victim_race_groups = victim_race.index.tolist()
victim_race_counts = victim_race.values.tolist()

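# Assumption: in the source data's race coding, 'A' denotes Asian; unmapped codes fall back to 'Unknown'
victim_map = {'B':'Black', 'W':'White', 'A':'Asian'}
victim_race_labels = [victim_map.get(x, 'Unknown') for x in victim_race_groups]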



In [58]:
fig = px.bar(victim_race, x=victim_race_labels, y=victim_race_counts, 
            text=victim_race_counts, opacity=0.75)
fig.update_layout(title = "Gun Violence Victims Race Distribution", xaxis_title= "Race", title_x=0.5)
fig.update_traces(texttemplate='%{text:,}', textposition='outside')
fig.update_yaxes(visible=False)
fig.show()

## Thank you for reviewing this analysis of gun violence in Philadelphia. If you have any questions, please feel free to reach me at snellmatthewL@gmail.com.