# EDA of Stop and Frisk Data
The goal of this notebook is to analyze the stop and frisk data that the NYPD makes publicly available, with the aim of contributing to current events and the work already being done on the topic.

# Packages

In [1]:
# Set Up
import pandas as pd
import numpy as np

# These lines suppress FutureWarning messages to keep cell output readable
import warnings
warnings.simplefilter('ignore', FutureWarning)

# Graphing
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('fivethirtyeight')
plt.rcParams['figure.figsize'] = (10,10)
import seaborn as sns
import plotly.express as px

# Loading and Cleaning Data
Our data comes from the NYPD's publicly available records of stop, question, and frisk (SQF) incidents. For now we will only look at 2019.

Source: https://www1.nyc.gov/site/nypd/stats/reports-analysis/stopfrisk.page

In [2]:
#!pip install openpyxl  # pandas reads .xlsx through openpyxl; xlrd >= 2.0 only handles .xls
raw = pd.read_excel('../data/sqf-2019.xlsx')

Because there are 83 features and over 13,000 records, I'm going to break this dataset into smaller dataframes, clean them separately, then re-aggregate them via the unique identifier key 'STOP_ID_ANONY'.

* location: describes the location of the SQF
* officer: describes the title, appearance, and behaviors of the officer
* suspect: describes the physical characteristics of the suspect
* sentiment: contains descriptions of the suspect from the perspective of the officer, useful for sentiment analysis
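
The split-and-rejoin plan above can be sketched on a toy frame. This is only an illustration, assuming each 'STOP_ID_ANONY' appears once per table; the three-row `toy` frame and its values are made up for the example and do not come from the NYPD file.

```python
import pandas as pd

# Hypothetical stand-in for `raw`; values are invented for illustration.
toy = pd.DataFrame({
    'STOP_ID_ANONY': [1, 2, 3],
    'STOP_LOCATION_BORO_NAME': ['BRONX', 'QUEENS', 'BROOKLYN'],
    'ISSUING_OFFICER_RANK': ['POM', 'SGT', 'POM'],
})

# Split into topic-specific frames that all keep the key column.
toy_location = toy[['STOP_ID_ANONY', 'STOP_LOCATION_BORO_NAME']].copy()
toy_officer = toy[['STOP_ID_ANONY', 'ISSUING_OFFICER_RANK']].copy()

# Clean each piece independently (here: drop one officer row).
toy_officer = toy_officer[toy_officer['STOP_ID_ANONY'] != 2]

# Re-aggregate on the key. An inner join keeps only the stops that survive
# cleaning in every piece; validate= raises if the key is not unique.
rejoined = toy_location.merge(toy_officer, on='STOP_ID_ANONY',
                              how='inner', validate='one_to_one')
print(rejoined)
```

Because rows are dropped during cleaning, the choice of join matters: an inner join silently shrinks the result to the stops present in all pieces, while `how='left'` would keep every location row and leave the missing officer fields empty.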

In [13]:
location = raw[['STOP_ID_ANONY', 'STOP_FRISK_DATE', 'STOP_FRISK_TIME',
                'YEAR2', 'MONTH2', 'DAY2','LOCATION_IN_OUT_CODE', 'JURISDICTION_CODE', 
                'JURISDICTION_DESCRIPTION', 'STOP_LOCATION_PRECINCT', 'STOP_LOCATION_SECTOR_CODE'
                ,'STOP_LOCATION_APARTMENT','STOP_LOCATION_FULL_ADDRESS','STOP_LOCATION_STREET_NAME'
                ,'STOP_LOCATION_X','STOP_LOCATION_Y','STOP_LOCATION_ZIP_CODE',
                'STOP_LOCATION_PATROL_BORO_NAME','STOP_LOCATION_BORO_NAME']]
location.shape
#location.to_csv('../data/location.csv')

(13459, 19)

In [12]:
officer = raw[['STOP_ID_ANONY', 'ISSUING_OFFICER_RANK', 'ISSUING_OFFICER_COMMAND_CODE','SUPERVISING_OFFICER_RANK'
               ,'SUPERVISING_OFFICER_COMMAND_CODE','OFFICER_IN_UNIFORM_FLAG','ID_CARD_IDENTIFIES_OFFICER_FLAG'
               ,'SHIELD_IDENTIFIES_OFFICER_FLAG','VERBAL_IDENTIFIES_OFFICER_FLAG']]
officer.shape
#officer.to_csv('../data/officer.csv')

(13459, 9)

In [14]:
suspect = raw[['STOP_ID_ANONY','SUSPECTED_CRIME_DESCRIPTION', 'SUSPECT_ARRESTED_FLAG',
              'FRISKED_FLAG','SEARCHED_FLAG','ASK_FOR_CONSENT_FLG'
               ,'CONSENT_GIVEN_FLG','OTHER_CONTRABAND_FLAG','FIREARM_FLAG'
               ,'KNIFE_CUTTER_FLAG','OTHER_WEAPON_FLAG','WEAPON_FOUND_FLAG'
               ,'SUSPECTS_ACTIONS_CASING_FLAG','SUSPECTS_ACTIONS_CONCEALED_POSSESSION_WEAPON_FLAG'
               ,'SUSPECTS_ACTIONS_DECRIPTION_FLAG','SUSPECTS_ACTIONS_DRUG_TRANSACTIONS_FLAG'
               ,'SUSPECTS_ACTIONS_IDENTIFY_CRIME_PATTERN_FLAG','SUSPECTS_ACTIONS_LOOKOUT_FLAG'
               ,'SUSPECTS_ACTIONS_OTHER_FLAG','SUSPECTS_ACTIONS_PROXIMITY_TO_SCENE_FLAG'
               ,'SUSPECT_REPORTED_AGE','SUSPECT_SEX','SUSPECT_RACE_DESCRIPTION'
               ,'SUSPECT_HEIGHT','SUSPECT_WEIGHT','SUSPECT_BODY_BUILD_TYPE'
               ,'SUSPECT_EYE_COLOR','SUSPECT_HAIR_COLOR']]
suspect.shape
#suspect.to_csv('../data/suspect.csv')

In [10]:
sentiment = raw[['STOP_ID_ANONY', 'OFFICER_NOT_EXPLAINED_STOP_DESCRIPTION',
                 'DEMEANOR_OF_PERSON_STOPPED', 'SUSPECT_OTHER_DESCRIPTION']]
sentiment.shape
#sentiment.to_csv('../data/sentiment.csv')

(13459, 4)

The following code cell prints out the initial datatype that pandas' import function assigned to each column. Use at your own discretion.
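
A minimal sketch of such a cell, assuming the goal is simply to list `DataFrame.dtypes`. The small `demo` frame is a hypothetical stand-in for `raw` so the snippet runs without the Excel file.

```python
import pandas as pd

# Hypothetical stand-in for `raw`; in the notebook, use `raw` directly.
demo = pd.DataFrame({
    'STOP_ID_ANONY': [1, 2],
    'STOP_FRISK_TIME': ['14:05:00', '23:40:00'],
    'SUSPECT_REPORTED_AGE': ['25', '(null)'],
})

# dtypes returns a Series mapping each column name to its inferred dtype.
inferred = demo.dtypes
print(inferred)
```

A column that mixes numbers with a text placeholder such as (null) will not be inferred as numeric, so this listing is a quick way to spot columns that need cleaning.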

# location.csv Change and Description Log
The text below describes all the changes done via Tableau Prep:

* replaced (null) values with Null (interpretable by Python)
* stripped unnecessary information from STOP_FRISK_TIME via a RegEx split
* removed Year as unnecessary
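
For readers without Tableau Prep, the three steps above could be approximated in pandas. This is a sketch, not the flow that was actually run: the date prefix on the time field is an assumed format, and the `loc_demo` rows are invented.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; the date prefix on the time field is an assumed format.
loc_demo = pd.DataFrame({
    'STOP_FRISK_TIME': ['1899-12-30 14:05:00', '1899-12-30 23:40:00'],
    'STOP_LOCATION_ZIP_CODE': ['(null)', '10451'],
    'YEAR2': [2019, 2019],
})

# Turn the literal '(null)' placeholder into a real missing value.
loc_demo = loc_demo.replace('(null)', np.nan)

# Keep only the HH:MM:SS part of the time field.
loc_demo['STOP_FRISK_TIME'] = loc_demo['STOP_FRISK_TIME'].str.extract(
    r'(\d{1,2}:\d{2}:\d{2})', expand=False)

# Drop the year column, which is constant in a single-year extract.
loc_demo = loc_demo.drop(columns=['YEAR2'])
print(loc_demo)
```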

### TODO: populate missing 'STOP_LOCATION_ZIP_CODE' (>99% of the data is missing a zip code) with a reverse geocode
* 'STOP_LOCATION_FULL_ADDRESS' has two formats: a vague one, 'streetname1 && streetname2', representing an intersection, and another giving the specific location

* the remaining Null values in 'LOCATION_IN_OUT_CODE', 'JURISDICTION_CODE' and 'JURISDICTION_DESCRIPTION' are not worth the time investment to clean
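
As a first step toward that TODO, the two address formats can be told apart before any geocoding is attempted. The sketch below only separates intersections from specific addresses; the example addresses and the `street_1`/`street_2` column names are hypothetical, and the reverse geocoding itself is not shown.

```python
import pandas as pd

# Hypothetical addresses in the two formats described above.
addr = pd.Series(['E 149 ST && 3 AVE', '123 MAIN ST'],
                 name='STOP_LOCATION_FULL_ADDRESS')

# Flag rows written as an intersection of two streets.
is_intersection = addr.str.contains('&&', regex=False)

# Split intersections into their two street names; rows that are not
# intersections get a missing second street.
streets = addr.str.split('&&', n=1, expand=True)
streets = streets.apply(lambda col: col.str.strip())
streets.columns = ['street_1', 'street_2']
print(streets)
```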

# officer.csv Change and Description Log
The text below describes all the changes done via Tableau Prep:
* dropped 2 rows (STOP_ID 1164, 4549) due to missing rank
* converted Y/N columns to 1/0 dummy columns
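
A pandas approximation of these two steps, for anyone reproducing them outside Tableau Prep. The rows and rank values in `off_demo` are invented, and selecting flag columns by the '_FLAG' suffix is my assumption about how to find them.

```python
import pandas as pd

# Hypothetical rows; flag values and ranks are invented for illustration.
off_demo = pd.DataFrame({
    'STOP_ID_ANONY': [1, 2, 3],
    'ISSUING_OFFICER_RANK': ['POM', None, 'SGT'],
    'OFFICER_IN_UNIFORM_FLAG': ['Y', 'N', 'N'],
})

# Drop stops with no recorded rank.
off_demo = off_demo.dropna(subset=['ISSUING_OFFICER_RANK'])

# Map Y/N to 1/0 for every column whose name ends in '_FLAG'.
flag_cols = [c for c in off_demo.columns if c.endswith('_FLAG')]
off_demo[flag_cols] = off_demo[flag_cols].apply(
    lambda col: col.map({'Y': 1, 'N': 0}))
print(off_demo)
```

Using `map` with an explicit dictionary means any value other than Y or N becomes missing rather than being silently coded as 0.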

# suspect.csv Change and Description Log
The text below describes all the changes done via Tableau Prep:
* dropped 417 rows due to poor clerical entry for dummy variable 
* converted Y/N columns to 1/0 dummy columns
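
One way to express this filter in pandas, assuming "poor clerical entry" means a flag holding something other than Y or N. That reading is my assumption, and the `sus_demo` rows are invented.

```python
import pandas as pd

# Hypothetical rows; '(' stands in for a mis-keyed flag value.
sus_demo = pd.DataFrame({
    'STOP_ID_ANONY': [1, 2, 3],
    'FRISKED_FLAG': ['Y', '(', 'N'],
    'SEARCHED_FLAG': ['N', 'Y', 'N'],
})

flag_cols = ['FRISKED_FLAG', 'SEARCHED_FLAG']

# Keep only rows where every flag is exactly 'Y' or 'N'.
valid = sus_demo[flag_cols].isin(['Y', 'N']).all(axis=1)
dropped = int((~valid).sum())
sus_demo = sus_demo[valid].copy()

# Convert the remaining flags to 1/0.
sus_demo[flag_cols] = (sus_demo[flag_cols] == 'Y').astype(int)
print(dropped, 'row(s) dropped')
```

Counting the dropped rows before filtering gives a number to compare against the change log.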

# sentiment.csv Change and Description Log
The text below describes all the changes done via Tableau Prep:
* dropped 171 rows due to missing values