##### Please follow this nbviewer link to view the project code and interactive charts: 
https://nbviewer.org/github/damilare-osunleke/NYC_Shooting/blob/main/main.ipynb

# Shootings: Gun Violence in New York City

#### Project Description

##### Goal:
This project seeks to understand gun violence in New York City and to obtain insights that could help reduce the number of shootings in the city. To achieve this, the project explores the following questions:

1) What has been the overall trend in shootings across the five boroughs of the city over the last 15 years?   
2) What proportion of shootings resulted in deaths?   
3) What are the dominant profiles of the perpetrators and victims of shootings: race, age group?   
4) During what time of the day do most shootings occur?   
5) Which boroughs of New York City have the worst shootings problem?   



#### Motivation   

In the last decade, the media has reported extensively on the epidemic of gun violence in the United States of America, particularly mass shootings. Although mass shootings receive extensive coverage, they account for only a small fraction of gun-related violence and death in the country.

In 2018, the Centers for Disease Control and Prevention's (CDC) National Center for Health Statistics reported 38,390 deaths by firearm, bringing the rate of firearm death to about 12 per 100,000. [[1]](https://www.cdc.gov/injury/wisqars/pdf/leading_causes_of_injury_deaths_highlighting_violence_2018-508.pdf), [[2]](https://www.kff.org/other/state-indicator/firearms-death-rate-per-100000/?currentTimeframe=0&sortModel=%7B%22colId%22:%22Location%22,%22sort%22:%22asc%22%7D)

New York City is the most populous and most densely populated city in the United States, and it has its share of gun violence. Many of the shootings reported in the media happened in New York City. Although the city has made progress in reducing crime in general, and gun violence in particular, gun violence remains a serious concern. For instance, NYC recorded about twice as many gun violence incidents in 2021 as London, a city with a similar population. [[3]](https://www.statista.com/statistics/865565/gun-crime-in-london/)

I was motivated to carry out this project in order to better understand the dynamics of gun violence in one of the world's most popular cities.


#### Data

Three datasets were used for this project. All are publicly available on the open data platform of the City of New York and can be accessed via the links below:

1) Historic records of all shootings in New York City (2006 to 2021): [NYPD Shooting Incident Data (Historic)](https://data.cityofnewyork.us/Public-Safety/NYPD-Shooting-Incident-Data-Historic-/833y-fsy8). Data provided by Police Department (NYPD)
2) Historic and projected population of New York City by Borough: [New York City Population by Borough, 1950 - 2040](https://data.cityofnewyork.us/City-Government/New-York-City-Population-by-Borough-1950-2040/xywu-7bv9). Data provided by Department of City Planning (DCP)
3) A map of the administrative boundaries of the 5 boroughs of New York City: [Borough Boundaries](https://data.cityofnewyork.us/browse?q=map%20borough&sortBy=relevance). Data provided by Department of City Planning (DCP)


Since the datasets are published by reputable government agencies, they are considered very reliable.


##### Data Description
Below are the features from each dataset that were used for this project:

**Shooting Dataset:**   
a) **incident_key** : Randomly generated persistent and unique ID for each incident   
b) **occur_date** : Exact date of the shooting incident   
c) **occur_time** : Exact time of the shooting incident   
d) **boro** : Borough where the shooting incident occurred   
e) **statistical_murder_flag** : Whether or not the shooting resulted in the victim’s death   
f) **perp_age_group** : Perpetrator’s age category   
g) **perp_sex** : Perpetrator’s sex description   
h) **perp_race** : Perpetrator’s race description   
i) **vic_age_group** : Victim’s age category   
j) **vic_sex** : Victim’s sex description   
k) **vic_race** : Victim’s race description   
l) **longitude** : Longitudinal coordinate of the shooting location   
m) **latitude** : Latitudinal coordinate of the shooting location   


**Borough population dataset:**   
a) **Borough** : New York City borough name   
b) **2020** : Population of the borough in 2020  


**Borough Boundaries dataset:**   
a) **boro_code** : Unique Borough code   
b) **boro_name** : Borough name   
c) **geometry** : Polygon representing the geometry of the borough 




Importing all necessary libraries

In [60]:
import geopandas as gpd
import pandas as pd
import altair as alt
import numpy as np
from altair import datum


In [61]:
import warnings
warnings.filterwarnings('ignore')

Reading all cleaned files (Please refer to the file 'data_preprocessing.ipynb' for detailed data cleaning and preprocessing)

In [62]:
shooting= pd.read_pickle('./Data/Cleaned/shooting.pkl')
boro_pop= pd.read_pickle('./Data/Cleaned/boro_pop.pkl')
boro_boundary= pd.read_pickle('./Data/Cleaned/boro_boundary.pkl')

In [63]:
# A quick view of the updated 'shooting' dataset
shooting.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 25596 entries, 0 to 25595
Data columns (total 15 columns):
 #   Column          Non-Null Count  Dtype         
---  ------          --------------  -----         
 0   incident_key    25596 non-null  int64         
 1   occur_date      25596 non-null  datetime64[ns]
 2   occur_time      25596 non-null  datetime64[ns]
 3   boro            25596 non-null  object        
 4   murder_flag     25596 non-null  bool          
 5   perp_age_group  16252 non-null  object        
 6   perp_sex        16286 non-null  object        
 7   perp_race       16286 non-null  object        
 8   vic_age_group   25596 non-null  object        
 9   vic_sex         25596 non-null  object        
 10  vic_race        25596 non-null  object        
 11  latitude        25596 non-null  float64       
 12  longitude       25596 non-null  float64       
 13  year            25596 non-null  int64         
 14  total           25596 non-null  object        
dtypes:

In [64]:
alt.data_transformers.enable('json')

DataTransformerRegistry.enable('json')

1) In the visualisation below, we explore the overall trend of shootings in New York City and across boroughs in the last 15 years.

In [65]:

borough= alt.Chart(shooting).mark_line(tooltip= True).encode(
    x= alt.X('year(occur_date):T', title= 'Year'),
    y= alt.Y('count(incident_key)', title= 'Number of Shootings'),
    color = alt.Color('boro', scale=alt.Scale(scheme= 'category10'), legend= alt.Legend(title= 'Borough')),
    )


total = alt.Chart(shooting).mark_line(strokeDash=[10,1], strokeWidth=3, tooltip= True).encode(
    x= alt.X('year(occur_date):T', axis=alt.Axis(title= 'Year', grid= False)),
    y= alt.Y('count(incident_key)', axis=alt.Axis(title= 'Number of Shootings', grid= False)),
    color= alt.Color('total', scale=alt.Scale(scheme='greys'), legend= alt.Legend(title= ''))
    
)

alt.layer( borough, total).resolve_scale(color='independent').properties(
    title= 'Shootings in New York (2006- 2021)').encode()



 **Observation(s)** : There has generally been a steady decline in shootings from 2006 until 2020 when the number of shootings doubled compared to the previous year. This same trend is seen across most of the boroughs.
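
The year-over-year change behind an observation like this can be checked numerically as well as read off the chart. The snippet below is a minimal sketch, not part of the original notebook: the `toy` frame and the `yearly_change` helper are hypothetical, and the rows are invented purely to show the calculation. Only the column names `incident_key` and `year` come from the cleaned `shooting` dataset.

```python
import pandas as pd

# Toy stand-in for the cleaned `shooting` frame; these rows are invented for illustration.
toy = pd.DataFrame({
    "incident_key": [1, 2, 3, 4, 5, 6, 7],
    "year": [2019, 2019, 2020, 2020, 2020, 2020, 2021],
})


def yearly_change(df: pd.DataFrame) -> pd.DataFrame:
    """Count shootings per year and compute the relative change from the previous year."""
    counts = df.groupby("year")["incident_key"].count().rename("shootings")
    out = counts.to_frame()
    # 1.0 means the count doubled; -0.5 means it halved.
    out["yoy_change"] = counts.pct_change()
    return out


trend = yearly_change(toy)
print(trend)
```

Applied to the real data, `yearly_change(shooting)` would give one row per year, which makes it easy to confirm whether a given year's count roughly doubled.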

 2) In the visualisation below, we look at the proportion of shootings that resulted in murder across the last 15 years. There is also a drop-down button for exploring this data by borough.

In [66]:
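# None means "no filter"; QUEENS is included so that all five boroughs can be selected
options= [None, 'BRONX','BROOKLYN','MANHATTAN','QUEENS','STATEN ISLAND']

drop_down = alt.binding_select(options= options, name= 'Boroughs ', labels= ['ALL','BRONX','BROOKLYN','MANHATTAN','QUEENS','STATEN ISLAND'])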
selection =alt.selection_single(fields=['boro'], bind = drop_down)



alt.Chart(shooting).mark_bar(tooltip= True).encode(
    x= alt.X("year:O"),
    y=alt.Y("count(incident_key)", stack='normalize',  axis= alt.Axis(labels= False, format='.2%')),
    color = alt.Color("murder_flag")
    
).add_selection(selection).transform_filter(selection)

**Observation(s)** : Overall, the percentage of shootings that resulted in murder has remained relatively stable, especially in the last 5 years (between 17% and 21%). Compared to other boroughs in New York City, Staten Island has a high murder rate from shootings. In fact, in 2021 Staten Island had almost twice the murder rate of Manhattan (30% vs 17%).
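
The murder share per borough can also be computed directly in pandas. The snippet below is a sketch on invented rows, not the notebook's own code. It relies on the fact that the mean of a boolean column is the fraction of `True` values, and it reports the number of shootings next to each share, since a percentage based on few incidents is less stable than one based on many.

```python
import pandas as pd

# Toy rows for illustration only; the real `shooting` frame has a boolean `murder_flag` column.
toy = pd.DataFrame({
    "year": [2021] * 6,
    "boro": ["BRONX", "BRONX", "BRONX", "BRONX", "STATEN ISLAND", "STATEN ISLAND"],
    "murder_flag": [True, False, False, False, True, False],
})

# For each (year, borough): how many shootings, and what fraction were murders.
summary = toy.groupby(["year", "boro"])["murder_flag"].agg(
    shootings="size",
    murder_share="mean",
)
print(summary)
```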

 3) In the visualisation below, using data for the most recent year (2021), we explore the races of the perpetrators and victims of shootings. This will show if there is a disproportionate representation of one or more races in shooting incidents.   
 Note: The data in this plot is limited to the instances where the race of the perpetrator and victim are known. Hence, the data has been filtered within the chart.

In [67]:
rect = alt.Chart(shooting[shooting['perp_race'].notnull()]).mark_rect( tooltip= True).encode(
    x= alt.X('perp_race', axis = alt.Axis(title= 'Perpetrator Race', labelAngle=-45)),
    y= alt.Y('vic_race', axis = alt.Axis(title= 'Victim Race'),),
    color = alt.Color('count()', scale= alt.Scale(scheme='reds'))
    ).transform_filter(
        datum.year == 2021
    )


circ = rect.mark_point().encode(
    alt.ColorValue('black'),
    alt.Size('count()',  legend=alt.Legend(title='Crime in Selection'))
    )
# add chart of total population

alt.layer(rect,circ).properties( title= 'Perpetrators and Victims of New York City Shootings by Race in 2021',
    width=400,
    height=400
)

**Observation(s)** : Blacks were overwhelmingly both the perpetrators and victims of shootings in New York City in 2021. Out of the 962 shootings in 2021 where the races of the perpetrators and victims are known, 643 were perpetrated by Blacks; Blacks were the victims in 617 cases and Black-on-Black crime totalled 496. Considering that Blacks represent only 23.4% of the total New York City population [[4]](https://www.census.gov/quickfacts/newyorkcitynewyork), this is disproportionately high.
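
The counts quoted in an observation like this are the cells and margins of a perpetrator-by-victim contingency table. The snippet below is a hedged sketch of how such a table could be built with `pd.crosstab`. The rows and the group labels `GROUP_A` and `GROUP_B` are placeholders invented for illustration; only the column names match the dataset.

```python
import pandas as pd

# Placeholder rows; GROUP_A / GROUP_B are invented labels, not values from the dataset.
toy = pd.DataFrame({
    "year": [2021, 2021, 2021, 2021, 2020, 2021],
    "perp_race": ["GROUP_A", "GROUP_A", "GROUP_B", None, "GROUP_B", "GROUP_A"],
    "vic_race": ["GROUP_A", "GROUP_B", "GROUP_B", "GROUP_A", "GROUP_B", "GROUP_A"],
})

# Keep 2021 incidents where the perpetrator's race is recorded.
known = toy[(toy["year"] == 2021) & toy["perp_race"].notnull()]

# Rows are victim groups, columns are perpetrator groups.
pairs = pd.crosstab(known["vic_race"], known["perp_race"])
print(pairs)

# Column sums give incidents per perpetrator group; row sums give incidents per victim group.
print(pairs.sum(axis=0))
print(pairs.sum(axis=1))
```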

 4) Similarly, in the visualisation below, using data for the most recent year (2021), we explore the age groups of the perpetrators and victims of shootings. This will show if there is a disproportionate representation of one or more age groups in shooting incidents.   
 Note: The data in this plot is limited to the instances where the age groups of the perpetrators and victims are known. Hence, the data has been filtered within the chart.

In [68]:
rect = alt.Chart(shooting[(shooting['perp_age_group'].notnull()) & (shooting['perp_age_group'] != 'UNKNOWN') & (shooting['vic_age_group'] != 'UNKNOWN')]).mark_rect(tooltip= True).encode(
    x= alt.X('perp_age_group', sort=['<18','18-24','25-44','45-64','65+'], axis = alt.Axis(title= 'Perpetrator Age')),
    y= alt.Y('vic_age_group', sort=['<18','18-24','25-44','45-64','65+'], axis = alt.Axis(title= 'Victim Age')),
    color = alt.Color('count()', scale= alt.Scale(scheme='reds'))
    ).transform_filter(
        datum.year == 2021
    )


circ = rect.mark_point().encode(
    alt.ColorValue('black'),
    alt.Size('count()', legend=alt.Legend(title='Crime in Selection'))
    )
# add chart of total population

alt.layer(rect,circ).properties( title= 'Perpetrators and Victims of New York City Shootings by Age in 2021',
    width=400,
    height=400
)

**Observation(s)** : Most perpetrators and victims of shootings in NYC are youths and middle aged individuals (age 18-44).   
The incidence rate is particularly high in the 25-44 age group.  Out of the 961 shootings in 2021 where the ages of the perpetrators and victims are known, 342 were perpetrated by members of this age group on members of the same group.

5) In the visualisation below, using data for the most recent year (2021), we explore the time of the day when shootings occur in NYC. To do this, a new column, _time_of_day_, is created by splitting the 24-hour clock into 4 equal parts (6 hours each) as follows:   
Midnight:   0:00 to 5:59   
Morning:    6:00 to 11:59   
Afternoon:  12:00 to 17:59   
Night:      18:00 to 23:59   

In [69]:
# Compare on the hour of day so the result does not depend on the date part of occur_time
hour = shooting['occur_time'].dt.hour
conditions= [
    hour < 6,
    (hour >= 6) & (hour < 12),
    (hour >= 12) & (hour < 18),
    hour >= 18,
]

choices = ['Midnight','Morning','Afternoon','Night']
# A string default keeps the result a plain string column; occur_time has no missing values
shooting['time_of_day'] = np.select(conditions, choices, default='Unknown')
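
The same four six-hour buckets, in the order given by the `choices` list, can also be produced with `pd.cut` on the hour of day. This is an alternative sketch rather than the notebook's method, and the timestamps below are invented for illustration.

```python
import pandas as pd

# Invented timestamps, one per bucket, including both edge cases (06:00 and 17:59).
times = pd.Series(pd.to_datetime([
    "2021-01-01 00:30",
    "2021-03-05 06:00",
    "2021-07-04 17:59",
    "2021-12-31 23:59",
]))

labels = ["Midnight", "Morning", "Afternoon", "Night"]

# right=False makes each bin closed on the left: [0, 6), [6, 12), [12, 18), [18, 24).
time_of_day = pd.cut(times.dt.hour, bins=[0, 6, 12, 18, 24], right=False, labels=labels)
print(time_of_day.tolist())
```

Because the bins are defined once, the boundaries cannot drift out of sync with the labels, which is the main reason to prefer this form over a list of hand-written conditions.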

In [70]:
base = alt.Chart(shooting[shooting.year == 2021]).transform_joinaggregate(
    Total1='count(incident_key)',
    groupby= ['time_of_day']
).transform_joinaggregate(
    Total2='count(incident_key)'    
).encode(
    theta = alt.Theta('percentage:Q', stack="normalize"),    
    order= alt.Order('count():Q', sort= 'ascending'),
    tooltip= ['time_of_day', alt.Tooltip('percentage:Q', format=".0%")]).properties(
        title= 'Shootings in New York City by Time of Day in 2021 (%)'            
).transform_calculate( 
            percentage = (datum.Total1/ datum.Total2)
)


pie = base.mark_arc(outerRadius=120).encode(color= 'time_of_day:N')

text = base.mark_text(radius=100).encode(text= alt.Text("percentage:Q", format=".0%"))

pie + text

**Observation(s)** : Most shootings occur at night and midnight.
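
The percentages in the pie chart come from two join-aggregates: a count per time of day (`Total1`) and an overall count (`Total2`). The snippet below is a rough pandas equivalent of that calculation, shown on invented rows. It is a sketch for checking the numbers and is not how the chart itself is built.

```python
import pandas as pd

# Invented rows for illustration; the column names follow the notebook.
toy = pd.DataFrame({
    "incident_key": [1, 2, 3, 4],
    "time_of_day": ["Night", "Night", "Midnight", "Morning"],
})

# Like transform_joinaggregate: attach the group count and the overall count to every row.
toy["Total1"] = toy.groupby("time_of_day")["incident_key"].transform("count")
toy["Total2"] = toy["incident_key"].count()
toy["percentage"] = toy["Total1"] / toy["Total2"]

# One share per time of day.
shares = toy.drop_duplicates("time_of_day").set_index("time_of_day")["percentage"]
print(shares)
```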

6) Finally, using data for the most recent year (2021), we explore shootings per borough in New York City, comparing total shootings per borough to shootings per capita. 

To do this, we go through several steps, culminating in two geoplots showing the total shootings and shootings per capita across the boroughs of NYC.

Each step is highlighted in the respective blocks of code below:

In [71]:
# group total number of shootings by borough, and keep only required columns 
boro_shooting = shooting[shooting['year']==2021][['boro','incident_key']].groupby('boro').count(
).reset_index().rename(columns= {'incident_key':'number_of_shootings'})
boro_shooting

| | boro | number_of_shootings |
|---|---|---|
| 0 | BRONX | 701 |
| 1 | BROOKLYN | 631 |
| 2 | MANHATTAN | 343 |
| 3 | QUEENS | 296 |
| 4 | STATEN ISLAND | 40 |


In [72]:
# merge the boro_shooting, boro_boundary and boro_pop datasets into a single dataset 'shooting_combined'
shooting_combined = pd.merge(left=boro_boundary, right= pd.merge(left= boro_shooting, right= boro_pop, on= 'boro', how= 'left'), on= 'boro', how= 'left')
shooting_combined['shootings_per_million'] = round((shooting_combined['number_of_shootings']/shooting_combined['population'])*1000000)
shooting_combined.head(5)

| | boro_code | boro | shape_area | shape_leng | geometry | number_of_shootings | population | shootings_per_million |
|---|---|---|---|---|---|---|---|---|
| 0 | 5 | STATEN ISLAND | 1623620725.06 | 325917.353702 | MULTIPOLYGON (((-74.05051 40.56642, -74.05047 ... | 40 | 487155 | 82.0 |
| 1 | 2 | BRONX | 1187182350.92 | 463176.004334 | MULTIPOLYGON (((-73.89681 40.79581, -73.89694 ... | 701 | 1446788 | 485.0 |
| 2 | 3 | BROOKLYN | 1934229471.99 | 728263.543413 | MULTIPOLYGON (((-73.86327 40.58388, -73.86381 ... | 631 | 2648452 | 238.0 |
| 3 | 1 | MANHATTAN | 636520830.696 | 357564.317228 | MULTIPOLYGON (((-74.01093 40.68449, -74.01193 ... | 343 | 1638281 | 209.0 |
| 4 | 4 | QUEENS | 3041418543.49 | 888199.780587 | MULTIPOLYGON (((-73.82645 40.59053, -73.82642 ... | 296 | 2330295 | 127.0 |
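
The per-capita figures can be reproduced from the shooting counts and populations shown in the output above. The sketch below rebuilds just those two columns by hand and leaves out the geometry. Passing `validate="one_to_one"` to the merge is an addition for illustration, not something the notebook does; it makes pandas raise an error if a borough name appears more than once on either side.

```python
import pandas as pd

# Counts and populations copied from the notebook output above.
boros = ["BRONX", "BROOKLYN", "MANHATTAN", "QUEENS", "STATEN ISLAND"]
boro_shooting = pd.DataFrame({
    "boro": boros,
    "number_of_shootings": [701, 631, 343, 296, 40],
})
boro_pop = pd.DataFrame({
    "boro": boros,
    "population": [1446788, 2648452, 1638281, 2330295, 487155],
})

# validate="one_to_one" guards against duplicated borough rows inflating the counts.
combined = boro_shooting.merge(boro_pop, on="boro", how="left", validate="one_to_one")

# Shootings per million residents, rounded to the nearest whole number.
combined["shootings_per_million"] = (
    combined["number_of_shootings"] / combined["population"] * 1_000_000
).round()
print(combined)
```

For example, the Bronx works out to 701 / 1,446,788 × 1,000,000 ≈ 484.5, which rounds to 485.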


In [73]:
# Two geomaps comparing the total shootings against shootings_per_million across the 5 boroughs of New York City

chart_a= alt.Chart(shooting_combined).mark_geoshape(stroke='black').encode(
    alt.Color("number_of_shootings", scale=alt.Scale(scheme='reds'), legend=alt.Legend(
        orient='none',
        legendX= 20, legendY=20,
        direction='horizontal',
        titleAnchor='middle')),
    alt.Tooltip(["boro_code","boro", 'number_of_shootings', 'population'])
).properties( title= 'Number of Shootings by New York City Borough in 2021').project("naturalEarth1")


chart_b= alt.Chart(shooting_combined).mark_geoshape(stroke='black').encode(
    alt.Color("shootings_per_million", scale=alt.Scale(scheme='reds'), legend=alt.Legend(
        orient='none',
        legendX= 20, legendY=20,
        direction='horizontal',
        titleAnchor='middle')),
    alt.Tooltip(["boro_code","boro", 'shootings_per_million', 'population'])
).properties( title= 'Shootings per Million by New York City Borough in 2021').project("naturalEarth1")



alt.hconcat(
   chart_a, chart_b 
).resolve_scale(
    color='independent'
).configure(background='#DDEEFF')

 **Observation(s)** : Although Brooklyn has the largest population, it has the second-highest number of shootings.
The Bronx, with about half the population of Brooklyn, has the highest number of shootings. The Bronx also has, by far, the highest shootings per capita among the 5 boroughs of New York City.   
Staten Island, the least populated borough, has the lowest number of shootings and the lowest shootings per capita.

#### Summary of all key Findings

1) After many years of steady decline, there has been a sharp uptick in shootings in New York City in recent years.
2) Overall, about 20% of shootings in New York City resulted in murder. However, this percentage is significantly higher in Staten Island, about 30%.
3) The Black community is the most affected by shootings in New York City, both as perpetrators and victims.
4) The youth and middle-aged groups are most affected by shootings in New York City, both as perpetrators and victims.
5) Most shootings occur at night and midnight.
6) The Bronx borough is a hotspot for shootings in New York City.

#### Conclusion

This project has explored the shooting epidemic in New York City, revealing insights about shooting trends, the profiles of perpetrators and victims, and the locations and times of day when shootings happen. It is important to note that several factors that affect crime rates in general were not explored in this project. These include socio-economic conditions, culture, family background, population density and mental health, among others. 


#### Project Limitations
i) In many instances, the identities of shooters are unknown. It is possible, in theory, that some communities are policed more strictly, and that perpetrators are more likely to be caught in these communities than in others.   
ii) Other important variables such as socio-economic factors have not been considered in this project. It is possible, for instance, that the high rate of shootings in the Black community and the Bronx borough is merely a reflection of high rates of unemployment and low educational attainment in those communities.
 

##### Recommendations

In order to reduce the incidence of shootings in New York City, it is recommended that further research be done to compare socio-economic, family background, gun ownership, and socio-cultural factors in high- and low-risk communities with regard to crime in NYC in general and shootings in particular. This will lead to a better understanding of the factors that make the Black community, the youth and middle-aged groups, and the Bronx borough high-risk groups for shootings.   
With these insights, city administrators will be better equipped to reduce the incidence of shootings in New York City.

##### Sources

Datasets

###### [1. shootings](https://data.cityofnewyork.us/Public-Safety/NYPD-Shooting-Incident-Data-Historic-/833y-fsy8)      
###### [2. population](https://data.cityofnewyork.us/City-Government/New-York-City-Population-by-Borough-1950-2040/xywu-7bv9)   
###### [3. boundary](https://data.cityofnewyork.us/browse?q=map%20borough&sortBy=relevance)


Other References

###### [1. Leading causes of injury deaths](https://www.cdc.gov/injury/wisqars/pdf/leading_causes_of_injury_deaths_highlighting_violence_2018-508.pdf)  
###### [2. Death rate by firearm](https://www.kff.org/other/state-indicator/firearms-death-rate-per-100000/?currentTimeframe=0&sortModel=%7B%22colId%22:%22Location%22,%22sort%22:%22asc%22%7D)   
###### [3. Shootings in London 2021/2022](https://www.statista.com/statistics/865565/gun-crime-in-london/)   
###### [4. Population in NYC by race description](https://www.census.gov/quickfacts/newyorkcitynewyork) 




