# Stop and Frisk - Portfolio - Analysis and Visualisation


This portfolio attempts to uncover insights into the NYPD's stop and frisk practices in 2018. Former New York City Mayor (and billionaire) Michael Bloomberg was a candidate in the Democratic primaries for the 2020 presidential election. During the Democratic debates in the 2019/2020 campaign, the candidates were repeatedly asked about a range of racial justice issues, and Bloomberg's stop and frisk policy was a contentious issue for him. The policy was responsible for a massive increase in the number of innocent people being stopped, had zero effect on crime in the city, and racially targeted Black and Latinx communities across New York City. This data set is from 2018, long after Bloomberg's mayoralty ended; however, the same issues of racial bias and targeting still apply to NYPD practices today.

The data for this report is taken from the NYPD database and crime statistics unit. Stop, Question and Frisk data dates back to 2003. Earlier data sets are less detailed and do not contain the complexity of the data sets from 2016 onwards.

It is available from this NYPD website:
https://www1.nyc.gov/site/nypd/stats/reports-analysis/stopfrisk.page

Disclaimers when analysing this data. 

When an individual is stopped on the street, the stop can be initiated in a number of ways. The STOP_WAS_INITIATED column records one of three values:
1. Based on Self Initiated
2. Based on Radio Run
3. Based on C/W on Scene

This policy is known as Stop, Question and Frisk. This data set only contains information on stops in which an individual was frisked or searched. The recorded lengths of both the police observation and the stop appear to be inaccurate and are therefore not used in this report.

A stop or even an arrest does not necessarily mean that the individual was guilty of a crime. The data set records the offense for which a person was arrested, but it does not record whether the legal process was followed through or whether the individual was ultimately found innocent or guilty.

The following questions will be answered in this portfolio. 

1. What trends can we uncover about the ages of the people that have been stopped and frisked? 
2. What percentage of people stopped and frisked were innocent?
3. Is there a clear racial bias in the targeting of Black and Latinx communities?
4. What was the main reason why people were stopped?
5. Where are you more likely to be stopped in this city if you are Black, white or Latinx?
6. Is there a correlation between the reason why someone was stopped and their innocence?
 

First, let's prep our datasets.

In [1]:
import pandas as pd
%matplotlib notebook
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

In [2]:
StopAndFriskdf = pd.read_csv("/Users/jordancreenaune/Desktop/Python_for_Data_Science/Jordan_Projects/StopAndFrisk/2018_sqf_database.csv")

#A snapshot of the data.
StopAndFriskdf.head(3)

Unnamed: 0,STOP_FRISK_ID,STOP_FRISK_DATE,Stop Frisk Time,YEAR2,MONTH2,DAY2,STOP_WAS_INITIATED,RECORD_STATUS_CODE,ISSUING_OFFICER_RANK,ISSUING_OFFICER_COMMAND_CODE,...,STOP_LOCATION_SECTOR_CODE,STOP_LOCATION_APARTMENT,STOP_LOCATION_FULL_ADDRESS,STOP_LOCATION_PREMISES_NAME,STOP_LOCATION_STREET_NAME,STOP_LOCATION_X,STOP_LOCATION_Y,STOP_LOCATION_ZIP_CODE,STOP_LOCATION_PATROL_BORO_NAME,STOP_LOCATION_BORO_NAME
0,1,1/1/18,19:04:00,2018,January,Monday,Based on C/W on Scene,APP,POM,1,...,G,(null),VARICK STREET && FRANKLIN STREET,(null),VARICK STREET,982327,201274,(null),PBMS,MANHATTAN
1,2,1/1/18,23:00:00,2018,January,Monday,Based on Radio Run,APP,POM,34,...,C,(null),DYCKMAN STREET && POST AVENUE,(null),DYCKMAN STREET,1004892,253548,(null),PBMN,MANHATTAN
2,3,1/1/18,23:55:00,2018,January,Monday,Based on Radio Run,APP,POM,808,...,B,4M,2245 RANDALL AVENUE,(null),RANDALL AVENUE,1026706,237776,(null),PBBX,BRONX


In [3]:
#Printing last three rows
StopAndFriskdf.tail(3)


Unnamed: 0,STOP_FRISK_ID,STOP_FRISK_DATE,Stop Frisk Time,YEAR2,MONTH2,DAY2,STOP_WAS_INITIATED,RECORD_STATUS_CODE,ISSUING_OFFICER_RANK,ISSUING_OFFICER_COMMAND_CODE,...,STOP_LOCATION_SECTOR_CODE,STOP_LOCATION_APARTMENT,STOP_LOCATION_FULL_ADDRESS,STOP_LOCATION_PREMISES_NAME,STOP_LOCATION_STREET_NAME,STOP_LOCATION_X,STOP_LOCATION_Y,STOP_LOCATION_ZIP_CODE,STOP_LOCATION_PATROL_BORO_NAME,STOP_LOCATION_BORO_NAME
11005,11006,12/31/18,23:20:00,2018,December,Monday,Based on Radio Run,APP,POM,32,...,D,(null),WEST 145 STREET && 7 AVENUE,(null),WEST 145 STREET,1001112,238628,(null),PBMN,MANHATTAN
11006,11007,12/31/18,23:20:00,2018,December,Monday,Based on Radio Run,APP,POM,32,...,D,(null),WEST 145 STREET && 7 AVENUE,(null),WEST 145 STREET,1001112,238628,(null),PBMN,MANHATTAN
11007,11008,12/31/18,22:43:00,2018,December,Monday,Based on Radio Run,APP,POM,803,...,B,(null),505 GATES AVENUE,(null),GATES AVENUE,999081,189462,(null),PBBN,BROOKLYN


In [4]:
#Calculating basic descriptive statistics
StopAndFriskdf.describe()

Unnamed: 0,STOP_FRISK_ID,YEAR2,ISSUING_OFFICER_COMMAND_CODE,SUPERVISING_OFFICER_COMMAND_CODE,OBSERVED_DURATION_MINUTES,STOP_DURATION_MINUTES,STOP_LOCATION_PRECINCT,STOP_LOCATION_X,STOP_LOCATION_Y
count,11008.0,11008.0,11008.0,11008.0,11008.0,11008.0,11008.0,11008.0,11008.0
mean,5504.5,2018.0,183.26835,184.168241,21.647075,11.583757,60.983739,1004989.0,207880.290425
std,3177.880216,0.0,268.829442,268.496744,989.025666,19.137491,32.838054,19183.11,29819.550356
min,1.0,2018.0,1.0,1.0,0.0,0.0,1.0,914803.0,122284.0
25%,2752.75,2018.0,43.0,43.0,1.0,5.0,34.0,994705.0,184335.0
50%,5504.5,2018.0,73.0,75.0,1.0,8.0,62.0,1003550.0,207011.0
75%,8256.25,2018.0,113.0,113.0,2.0,15.0,83.0,1014981.0,235551.0
max,11008.0,2018.0,879.0,881.0,99999.0,999.0,123.0,1065899.0,271349.0
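
The maximum values in the table above (99999 for OBSERVED_DURATION_MINUTES and 999 for STOP_DURATION_MINUTES) look like placeholder codes rather than real measurements, which may be part of why the duration columns are unreliable. Below is a minimal sketch of how such values could be counted. The small frame is made up, and treating 999 and 99999 as placeholders is an assumption, not something the NYPD documentation confirms.

```python
import pandas as pd

# Toy stand-in for the two duration columns; the values are made up.
durations = pd.DataFrame({
    "OBSERVED_DURATION_MINUTES": [1, 2, 99999, 1, 0],
    "STOP_DURATION_MINUTES": [5, 999, 8, 15, 999],
})

# Assumption: these values are placeholders, not genuine durations.
suspect_values = [999, 99999]

# Count, per column, how many rows hold a suspected placeholder.
placeholder_counts = durations.isin(suspect_values).sum()
print(placeholder_counts)

# Medians are far less sensitive to such extreme values than means.
median_minutes = durations.median()
print(median_minutes)
```

Run against the real columns, the same two calls would show how many records are affected before deciding whether to exclude the durations entirely.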


In total, the data set contains 11,008 stop and frisk records. In this next section, we will begin mining the data set for answers to the following questions:

1. Are there any trends regarding age of the people that were stopped and frisked? 
2. What was the reason why they were stopped? 


In [5]:
# Removing nulls - from suspect reported age
stopfrisk_cleaned = StopAndFriskdf[StopAndFriskdf.SUSPECT_REPORTED_AGE != '(null)']
total_rows = len(StopAndFriskdf)
print("Remaining rows of data: " + str(len(stopfrisk_cleaned)))
print('Percentage of data points remaining: ' + str((int(len(stopfrisk_cleaned))/(total_rows))*100))

Remaining rows of data: 10212
Percentage of data points remaining: 92.7688953488372
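
The filter above compares each age against the literal string '(null)'. An alternative sketch, assuming the ages are stored as strings as they appear to be in this file, is to convert the column with `pd.to_numeric` and `errors="coerce"`, which turns every non-numeric entry into NaN in one step. The sample values below are made up.

```python
import pandas as pd

# Toy ages stored as strings, mimicking SUSPECT_REPORTED_AGE.
raw_age = pd.Series(["19", "(null)", "34", "(null)", "27"])

# Non-numeric entries such as "(null)" become NaN instead of raising.
age = pd.to_numeric(raw_age, errors="coerce")

# Drop the missing ages and report how much data remains.
age_clean = age.dropna()
share_remaining = len(age_clean) / len(raw_age) * 100
print(f"Remaining rows: {len(age_clean)} ({share_remaining:.1f}%)")
```

This approach also catches any other unparseable entries, not only '(null)', so the remaining row count may differ slightly from the string comparison.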


In [6]:

sus_age_clean = pd.to_numeric(stopfrisk_cleaned['SUSPECT_REPORTED_AGE'])
#Plotting histogram
plt.style.use('fivethirtyeight')
plt.figure(figsize=(10,5))
sns.countplot(x=sus_age_clean.astype(int))
plt.title("Suspected Age of Individual", fontsize = 20)
plt.xlabel('Age', fontsize = 30)
#plt.xticks(rotation = 90)
plt.xticks(fontsize = 8, rotation = 75)

plt.ylabel('Number of individuals', fontsize = 15)



Text(0, 0.5, 'Number of individuals')

In [7]:
#Count the individuals aged 15 to 20 inclusive.
#Reported ages of 14 and under are left out of this count because that data is not reliable according to the practices of the NYPD.
#That is not to say that individuals aged 14 or younger were not stopped, questioned or frisked.
((14 < sus_age_clean) & (sus_age_clean < 21)).sum()

2633

After cleaning, the data above represents 93% (rounded) of the data set, which is a good retention rate for a data set of this size; the remaining 7% of records did not contain age values and have been removed. Individuals who were stopped and frisked in NYC in 2018 were most heavily concentrated between the ages of 15 and 25. The number of people stopped then declines steadily with age, and the oldest person stopped was 87. There were also some reported ages of 12 and below; there is no documentation from the NYPD accompanying this data set that explains this anomaly.

One possible explanation for the pattern in this graph is the targeting of young people between the ages of 17 and 27 for suspected crimes. People of this age in a population-dense city with minimal public space are often forced to socialise on the street, which makes them more visible to law enforcement. Later in this report we will investigate the racial disparity of the stop and frisk policy and pair it with age data. We can then assess whether there was racial and age bias in the stop and frisk practices of the NYPD.

Next we will sort these ages into bins and categorise them into the following age groups: <15, 15-20, 21-30, 31-40, 41-50, 51-60, 61-70, 71-80, 81-90 and 90+.

In [8]:
#Processing age data into ranges
#Create bins of ages that we will sort the data into 
#Create labels for those bins 
bins = [0, 14, 20, 30, 40, 50, 60, 70, 80, 90, np.inf]
range_names = ['<15', '15-20', '21-30', '31-40', '41-50', '51-60', '61-70', '71-80', '81-90','90+']

age_range = pd.cut(sus_age_clean, bins, labels = range_names)
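
By default `pd.cut` treats each bin as open on the left and closed on the right, so with the edges above an age of 14 lands in '<15' and an age of 20 lands in '15-20'. The short sketch below reuses the same bins and labels on a handful of boundary ages (chosen here purely for illustration) to make that explicit.

```python
import numpy as np
import pandas as pd

bins = [0, 14, 20, 30, 40, 50, 60, 70, 80, 90, np.inf]
range_names = ['<15', '15-20', '21-30', '31-40', '41-50',
               '51-60', '61-70', '71-80', '81-90', '90+']

# Boundary ages chosen to show where each one falls.
sample_ages = pd.Series([14, 15, 20, 21, 90, 91])
sample_ranges = pd.cut(sample_ages, bins, labels=range_names)
print(sample_ranges.tolist())
# → ['<15', '15-20', '15-20', '21-30', '81-90', '90+']
```

Note that an age of exactly 90 falls in '81-90', so the '90+' label effectively covers ages 91 and over, and a reported age of 0 falls outside the first bin and becomes NaN.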

In [9]:
plt.style.use('fivethirtyeight')
plt.figure(figsize=(10,5))
sns.countplot(x=age_range)
plt.title('Age distribution of individuals who were stopped and frisked', fontsize = 15)
plt.xlabel('Age ranges', fontsize = 25)
plt.ylabel('Number of individuals', fontsize = 10)


Text(0, 0.5, 'Number of individuals')


Final Observations

This distribution presents an immediately noticeable trend: the largest group of individuals who were stopped and frisked in New York City in 2018 were between the ages of 21 and 30. This is consistent with the profile of young males in the city being targeted through this practice, although age data alone cannot show racial targeting. Later in this report we will explore racial profiling alongside age as a result of such practices by the NYPD.
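
Whether an age group is a true majority (more than 50% of stops) or simply the largest group can be checked by converting the counts to shares. The sketch below does this on made-up labels; with the real data, `age_range` from the cell above would take the place of `toy_ranges`.

```python
import pandas as pd

# Toy age-range labels; the counts are made up for illustration.
toy_ranges = pd.Series(
    ["21-30"] * 4 + ["15-20"] * 3 + ["31-40"] * 2 + ["41-50"] * 1
)

# normalize=True returns proportions; multiply by 100 for percentages.
share = toy_ranges.value_counts(normalize=True) * 100
print(share.round(1))

# The largest group is a majority only if its share exceeds 50%.
largest_group = share.idxmax()
is_majority = share.max() > 50
print(largest_group, is_majority)
```

In this toy example the 21-30 group is the largest at 40% but is not a majority, which is the distinction the check is meant to surface.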



Part B - Examining innocence and guilt 

What was apparent about this NYPD practice was the lack of legal oversight, which jeopardised one of the main tenets of the United States justice system: the presumption that a person is innocent until proven guilty. Many citizens were stopped and frisked on the street without cause, without reason and without committing a crime. This section explores the innocence of people within the data set. It must also be kept in mind when reviewing this data that it may be incomplete: it is entirely possible that citizens were stopped and frisked without the officer recording the stop.

It must also be acknowledged that a suspect being arrested or detained does not necessarily mean they were guilty. In many cases citizens are arrested and later cleared, either through insufficient evidence or through court proceedings.

1. What percentage of people stopped and frisked were innocent?
2. Is there a correlation between why they were stopped and their innocence or arrest?


In [10]:
#Collect data from the data set - arrested flag. 
SUSPECT_ARRESTED_FLAG = StopAndFriskdf['SUSPECT_ARRESTED_FLAG']

SusArrestCount= SUSPECT_ARRESTED_FLAG.value_counts()
SusArrestValue = SUSPECT_ARRESTED_FLAG.unique()

In [11]:
#Pie chart - arrested vs not arrested

plt.figure(3)
#Take the labels from value_counts() so they line up with the sizes below
labels = SusArrestCount.index

sizes = (SusArrestCount / total_rows) * 100
colors = ['mediumseagreen', 'indianred']

def absolute_value(val):
    a  = np.round(val/100.*sizes.sum(), 2)
    return a

plt.pie(sizes, labels=labels, colors=colors,
        autopct=absolute_value, shadow=True)
plt.title('Arrest Percentages 2018 NYPD Stop and Frisk')

plt.axis('equal')
plt.show()


This pie chart shows that 71.7% of citizens stopped and frisked in NYC in 2018 were innocent, in the sense that they were not arrested: they were stopped, questioned and frisked, and then sent on their way. This means that 28.3% (rounded) of citizens who were stopped and frisked were arrested. Please note that an arrest does not necessarily mean that they were guilty of a crime; this data set does not track arrests through to conviction. A number of conflicting arguments and questions are raised at this point. Law enforcement argues that this practice deters crime in the city: being more visible and stopping citizens suspected of crimes reduces the opportunity for people to commit them. Using this data set alone, this is difficult to determine. Crime statistics depend on a range of other factors, including social, economic and demographic influences as well as changes in policy and law.
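
When a pie chart is built from `value_counts()`, the labels need to come from the same object as the slice sizes, because `value_counts()` orders categories by frequency while `unique()` orders them by first appearance. The sketch below, on made-up arrest flags, shows how the two orders can differ and how taking labels from the counts' index keeps them aligned.

```python
import pandas as pd

# Toy arrest flags; "Y" happens to appear first but "N" is more common.
flags = pd.Series(["Y", "N", "N", "N", "Y", "N"])

counts = flags.value_counts()   # ordered by frequency: N, then Y
first_seen = flags.unique()     # ordered by first appearance: Y, then N

# Percentages and labels taken from the same Series stay aligned.
percentages = counts / len(flags) * 100
labels = list(counts.index)
print(dict(zip(labels, percentages.round(1))))
print(list(first_seen))
```

If the labels were taken from `first_seen` here, the larger slice would be labelled "Y" even though it represents the "N" records.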

In [12]:
#This will contain male female comparison - percentage 
#Isolate values of sex count for the data set 
suspect_sex = StopAndFriskdf['SUSPECT_SEX']
SexCount = suspect_sex.value_counts()
SexValue = suspect_sex.unique()

#Pie Graph - Sex - Percentages
plt.figure(4)
#Take the labels from value_counts() so they line up with the sizes below
labels = SexCount.index
sizes = (SexCount / total_rows) * 100 #Average needs to reflect STOP_ID Count
colors = ['mediumseagreen', 'indianred', 'lightskyblue']

def absolute_value(val):
    a  = np.round(val/100.*sizes.sum(), 2)
    return a

plt.pie(sizes, labels=labels, colors=colors,
        autopct=absolute_value, shadow=True)
plt.title('Percentage Sex Comparison')
#plt.legend(loc=3)
plt.axis('equal')
plt.show()



This pie chart gives a percentage breakdown by sex. 90.16% of the people stopped were men, 9.21% were women, and for 0.63% the sex was not recorded. The reason for this is not given in the documentation provided by the NYPD. There is also no documentation covering individuals who were stopped and who identify as non-binary.

In [13]:

#This will contain racial breakdown 
SUSPECT_RACE_DESCRIPTION = StopAndFriskdf['SUSPECT_RACE_DESCRIPTION']
#Black - White - LatinX – American Indian Native Alaskan Comparison – Percentages – Pie Graph 
RaceCount = SUSPECT_RACE_DESCRIPTION.value_counts()
RaceCountValue = SUSPECT_RACE_DESCRIPTION.unique()


#Pie Graph - Racial Breakdown of Stop and Frisk
#plt.figure(5)
plt.figure(figsize=(9,10))
#labels = 'Black', 'White Hispanic', 'White','Black Hispanic','Asian/Pacific Islander','Null','American Indian/Native Alaskan'
sizes = (RaceCount / total_rows) * 100
colors = ['mediumseagreen', 'indianred', 'lightskyblue','orange','darkslategrey','magenta','crimson']


def absolute_value(val):
    a  = np.round(val/100.*sizes.sum(), 2)
    return a
explode = (0, 0.05, 0.1, 0.15, 0.2, 0.5,0.7)
plt.pie(sizes, colors=colors,
        autopct=absolute_value, shadow=True, explode=explode)
plt.title('Percentage Racial Comparison')
plt.axis('equal')


plt.legend(labels=RaceCount.index,loc=3, prop={'size': 7})


plt.show()

#the graph below reflects the legend from the previous graph 


One of the major reasons this NYPD practice was so troublesome was that it allowed police officers to patrol neighbourhoods, target underserved communities and profile young people of colour (predominantly males). This pie chart shows that a clear majority, 56.7%, of the individuals who were stopped were Black. There was also a high concentration of other people of colour, recorded as White Hispanic, Black Hispanic, Asian/Pacific Islander and American Indian/Native Alaskan. What makes this information even more stark is that it in no way corresponds to the racial breakdown of the city.


The following graph shows how often each suspected crime was recorded as the reason for a stop. It is a first step towards investigating the relationship between the suspected crime and the individual's subsequent release or arrest.

In [14]:
# need to find a comparison between these statistics and city demographic statistics 


In [15]:

SuspectCrime = StopAndFriskdf['SUSPECTED_CRIME_DESCRIPTION'].value_counts()
#plt.figure(6)
plt.figure(figsize=(9,10))
SuspectCrime.plot(kind= 'bar')
plt.title('Suspected Crime Description Count')
plt.gcf().subplots_adjust(bottom=0.25)
#plt.show()



In [16]:
SuspectCrime

CPW                                            2962
ROBBERY                                        1605
ASSAULT                                        1485
PETIT LARCENY                                  1010
BURGLARY                                        844
OTHER                                           524
CRIMINAL TRESPASS                               484
GRAND LARCENY                                   440
GRAND LARCENY AUTO                              347
CRIMINAL POSSESSION OF MARIHUANA                255
MENACING                                        221
CRIMINAL MISCHIEF                               192
CRIMINAL POSSESSION OF CONTROLLED SUBSTANCE      85
CRIMINAL SALE OF CONTROLLED SUBSTANCE            83
UNAUTHORIZED USE OF A VEHICLE                    83
RECKLESS ENDANGERMENT                            59
CPSP                                             58
AUTO STRIPPIG                                    49
MAKING GRAFFITI                                  42
FORCIBLE TOU
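
The counts above show how often each suspected crime was recorded, but not how often those stops ended in an arrest. One way to relate the two, sketched here on made-up rows that reuse the data set's column names, is a cross-tabulation normalised by row, which gives the percentage of stops for each suspected crime that did or did not lead to an arrest.

```python
import pandas as pd

# Toy rows; the column names match the data set, the values are made up.
toy = pd.DataFrame({
    "SUSPECTED_CRIME_DESCRIPTION": ["CPW", "CPW", "CPW", "ROBBERY",
                                    "ROBBERY", "ASSAULT", "ASSAULT", "ASSAULT"],
    "SUSPECT_ARRESTED_FLAG": ["N", "N", "Y", "Y", "N", "Y", "Y", "N"],
})

# normalize="index" turns each row into proportions that sum to 1.
arrest_share = pd.crosstab(
    toy["SUSPECTED_CRIME_DESCRIPTION"],
    toy["SUSPECT_ARRESTED_FLAG"],
    normalize="index",
) * 100
print(arrest_share.round(1))
```

Applied to `StopAndFriskdf`, the same call would give an arrest rate per suspected crime; categories with very few stops should be read with caution because their percentages are unstable.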

In [17]:
#Location STOP_LOCATION_BORO_NAME

STOP_LOCATION = StopAndFriskdf['STOP_LOCATION_BORO_NAME'].value_counts()

#plt.figure(6)
plt.figure(figsize=(9,10))
STOP_LOCATION.plot(kind= 'bar', color = 'mediumseagreen')

plt.title('Borough Stop Location')


plt.gcf().subplots_adjust(bottom=0.16)

#Need to make a comparative Graph of 


The graph below will demonstrate the likelihood of being stopped and frisked in NYC with regard to the borough where people live and its population.

![image.png](attachment:image.png)

This is from the NYC - Population - Current and projected populations July 2018. 
https://www1.nyc.gov/site/planning/planning-level/nyc-population/current-future-populations.page


In [21]:
#create data frame from the above information 

pop2018 = {'Borough': ['Brooklyn', 'Manhattan', 'Bronx', 'Queens', 'Staten Island'],
           'Population': [2582830, 1628701, 1432132, 2278906, 476179]}

#Build the data frame from the pop2018 dictionary defined above
pop2018df = pd.DataFrame(pop2018, columns = ['Borough','Population'])

print(pop2018df)

         Borough  Population
0       Brooklyn     2582830
1      Manhattan     1628701
2          Bronx     1432132
3         Queens     2278906
4  Staten Island      476179
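
With the populations in hand, the likelihood of being stopped can be expressed as stops per 100,000 residents. The sketch below uses the population figures listed above together with hypothetical stop records; the real counts would come from `StopAndFriskdf['STOP_LOCATION_BORO_NAME'].value_counts()`. It assumes borough names are upper case in the stop data, as in the rows shown earlier ('STATEN ISLAND' is a guess at how that borough is spelled there).

```python
import pandas as pd

# Borough populations as listed above (July 2018 figures).
population = pd.Series({
    "BROOKLYN": 2582830, "MANHATTAN": 1628701, "BRONX": 1432132,
    "QUEENS": 2278906, "STATEN ISLAND": 476179,
})

# Hypothetical stop records, one borough name per stop.
toy_stops = pd.Series(["BROOKLYN"] * 5 + ["MANHATTAN"] * 4 + ["BRONX"] * 4
                      + ["QUEENS"] * 2 + ["STATEN ISLAND"] * 1)
stop_counts = toy_stops.value_counts()

# Index alignment matches boroughs by name before dividing.
stops_per_100k = (stop_counts / population * 100_000).sort_values(ascending=False)
print(stops_per_100k)
```

In this toy example Brooklyn has the most stops in absolute terms, but the Bronx has the highest rate once population is taken into account, which is the kind of difference a per-capita comparison is meant to reveal.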
