Racial Disparity and Other Predictive Features of Occurrences of Cannabis Arrests in NYC

Harcourt & Ludwig, 2006
Levine, 2017
Mueller, Gebeloff, & Chinoy, 2018
SAMHSA, 2016

As previously reported (Harcourt & Ludwig, 2006; Levine, 2017; Mueller, Gebeloff, & Chinoy, 2018), low-level cannabis possession arrests in New York City have fallen predominantly on young African-American and Latino men since at least 1987. Given the history of the drug war under President Nixon, and earlier under Harry Anslinger during his years as the first Commissioner of the U.S. Treasury Department's Federal Bureau of Narcotics, this racial disparity in low-level cannabis arrests has likely persisted since the passage of the Marihuana Tax Act of 1937, which effectively made the plant illegal. Yet data from the Substance Abuse and Mental Health Services Administration of the U.S. Department of Health and Human Services consistently show that people of different racial groups use cannabis at effectively the same rate (SAMHSA, 2016).

This disparity has continued through the transition from Mayor Bloomberg's stop-and-frisk policy era to Mayor de Blasio's administration, even as cannabis arrests have dropped from their peak around 2011 (Levine, 2017). Over the same period, overall crime has dropped in New York City (NYT source). The New York Police Department (NYPD) has been pressed to explain the disparity, and has responded that it receives more cannabis-related complaints from neighborhoods whose residents are predominantly African-American and Latino.

The New York Times analyzed this claim and found that, even among neighborhoods with the same level of cannabis-related complaints, more cannabis arrests occur in neighborhoods where the majority of residents are African-American and Latino (Mueller, Gebeloff, & Chinoy, 2018). One explanation offered for the racial disparity is that these neighborhoods are more heavily policed because of higher rates of violent crime. Another is that a cannabis arrest allows NYPD officers to check for open warrants, so these arrests serve as a way to cut down on other types of crime. But these explanations do not fully account for why the racial disparity in low-level cannabis arrests persists during an era of criminal justice reform.

By examining both low-level and more serious cannabis arrests, including felony sales, this report aims to provide a fuller picture of the factors that trigger cannabis arrests in New York City. To do so, machine learning classification methods will be applied to predict the following six target variables: misdemeanor cannabis possession, violation cannabis possession, felony cannabis possession, misdemeanor cannabis sales, felony cannabis sales, and cannabis crimes as a group. Violation sales were not used as a target variable because no cases were designated as such.
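As a rough illustration of how binary targets of this kind could be derived, the sketch below builds them from the law category ('LAW_CAT_CD') and penal code description ('PD_DESC') columns. It is a minimal sketch, not the code used in the data cleaning notebook: the toy rows, the helper name add_targets, and the assumption that sales charges can be recognized by the substring "SALE" in 'PD_DESC' are all illustrative.

```python
import pandas as pd

# Toy rows for illustration only. The two "SALE" descriptions are assumed;
# only the possession descriptions appear in the outputs of this notebook.
toy = pd.DataFrame({
    "LAW_CAT_CD": ["MISDEMEANOR", "VIOLATION", "FELONY", "MISDEMEANOR"],
    "PD_DESC": [
        "MARIJUANA, POSSESSION 4 & 5",
        "MARIJUANA, POSSESSION",
        "MARIJUANA, SALE 1, 2 & 3",
        "MARIJUANA, SALE 4 & 5",
    ],
})


def add_targets(df):
    """Return a copy of df with binary cannabis target columns added (hypothetical helper)."""
    out = df.copy()
    # Assumption: a sales charge is marked by "SALE" in the description.
    is_sale = out["PD_DESC"].str.contains("SALE")
    levels = {"misd": "MISDEMEANOR", "viol": "VIOLATION", "felony": "FELONY"}
    for short, level in levels.items():
        at_level = out["LAW_CAT_CD"] == level
        out[f"{short}_poss"] = (at_level & ~is_sale).astype(int)
        out[f"{short}_sales"] = (at_level & is_sale).astype(int)
    # Every row in a cannabis-only dataset belongs to the overall group.
    out["cann_crimes_overall"] = 1
    return out


targets = add_targets(toy)
print(targets.filter(regex="_poss|_sales|overall").sum())
```

Summing each column, as in the last line, is a quick way to confirm that a category such as violation sales has no cases before excluding it as a target.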

These methods will be applied to all cannabis crimes in New York City between January 1, 2006 and December 31, 2018, as reported in the NYPD Complaint Data Historic dataset. A set of features from the original dataset, together with a set of features derived from it, will be used to build a model that identifies salient predictors of cannabis arrests in New York City in recent years. The hope is that this project will present a fuller picture of cannabis arrests that can be used to improve drug policy in New York City and in the rest of the country.

The NYPD Complaint Data Historic dataset contains data on all valid misdemeanor, violation, and felony crimes reported to the New York Police Department between 2006 and 2018. It is openly supplied to the public through the NYC Open Data project at https://data.cityofnewyork.us/Public-Safety/NYPD-Complaint-Data-Historic/qgea-i56i. 

This dataset of all NYC crimes was filtered to include only cannabis arrests, and the six target features described above were then created from the penal code and law category features native to the original dataset. Missing data were filled in or dropped as described in the data cleaning report. Cases outside the stated year range of the dataset (that is, cases earlier than 2006) were dropped. Unclear values in the suspect and victim age group, race, and sex features were recoded to 'unknown'.

Several features were derived from the raw data. A datetime feature was created from the separate date and time features for when the crime started, and another for when the crime ended, where an end date and time were recorded; the duration of the crime was computed from the two. The crime start time was used to create a set of time-window features that may be predictive of cannabis crimes. The distance of each cannabis crime from prominent NYC landmarks was encoded as continuous features. Year, month, and day features were extracted from the crime start datetime; along with being useful on their own, these were used to define the holiday features.

The cleaned dataset was then exported for traditional exploratory data analysis. A second dataset was exported for machine learning after binary features were created from several categorical features with pandas' get_dummies() function. Several other features were dropped from the machine learning dataset because they would have interfered with machine learning functions. Further details are in the data cleaning report.
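The derivation steps described above can be sketched on a few toy rows. This is a minimal sketch under stated assumptions rather than the code from the data cleaning report: the hour boundaries of the 'evening' window are assumed for illustration, and only two of the holiday flags are shown.

```python
import pandas as pd

# Toy rows modeled on the raw columns; "00/00/0000" marks a missing end date.
toy = pd.DataFrame({
    "CMPLNT_FR_DT": ["12/31/2018", "01/01/2006", "01/01/2006"],
    "CMPLNT_FR_TM": ["21:55:00", "21:00:00", "01:16:00"],
    "CMPLNT_TO_DT": ["12/31/2018", "00/00/0000", "01/01/2006"],
    "CMPLNT_TO_TM": ["22:12:00", "00:00:00", "01:26:00"],
    "BORO_NM": ["BROOKLYN", "BRONX", "MANHATTAN"],
})

# Combine the separate date and time strings; unparseable values become NaT.
fmt = "%m/%d/%Y %H:%M:%S"
toy["date_time_start"] = pd.to_datetime(
    toy["CMPLNT_FR_DT"] + " " + toy["CMPLNT_FR_TM"], format=fmt, errors="coerce")
toy["date_time_end"] = pd.to_datetime(
    toy["CMPLNT_TO_DT"] + " " + toy["CMPLNT_TO_TM"], format=fmt, errors="coerce")

# Duration is missing (NaT) wherever no end was recorded.
toy["duration"] = toy["date_time_end"] - toy["date_time_start"]

# Time-window flag. Assumption: "evening" covers 20:00 through 23:59.
hour = toy["date_time_start"].dt.hour
toy["evening"] = hour.between(20, 23).astype(int)

# Calendar parts and two example holiday flags.
toy["start_year"] = toy["date_time_start"].dt.year
toy["start_month"] = toy["date_time_start"].dt.month
toy["start_day"] = toy["date_time_start"].dt.day
toy["new_years_eve"] = ((toy["start_month"] == 12) & (toy["start_day"] == 31)).astype(int)
toy["new_years_day"] = ((toy["start_month"] == 1) & (toy["start_day"] == 1)).astype(int)

# One binary column per category value for the machine learning dataset.
encoded = pd.get_dummies(toy, columns=["BORO_NM"])
print(encoded.columns.tolist())
```

Passing errors="coerce" keeps rows with a placeholder end date instead of raising, which is one way to preserve cases whose end time was never recorded.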

Notably, suspect sex, age category, and race data were available for only approximately 35,000 cannabis arrests (approximately 16% of the dataset). The NYC Open Data project was contacted and verified that this is correct: police officers are not required to record this demographic information.

In [21]:
import csv
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

In [2]:
pd.set_option('display.max_columns', 550)
pd.set_option('display.width', 1000)

In [3]:
nyc = pd.read_csv('nyc_cann_no_dummies_for_EDA.csv', index_col=0)

Basic exploratory functions help to identify the pool of candidate predictive features, both those built into the original dataset and those derived from it during the data cleaning phase.

In [9]:
nyc.shape

(220305, 133)

The dataset has 220,305 cannabis arrests and 133 features. In the machine learning dataset, the feature set expands to 822 after pandas' get_dummies() function is run on the categorical features. For this section of the report, those features are kept in categorical form for more traditional exploratory data analysis. Further basic exploratory methods are called below for reference.

In [5]:
nyc.head()

Unnamed: 0,CMPLNT_NUM,CMPLNT_FR_DT,CMPLNT_FR_TM,CMPLNT_TO_DT,CMPLNT_TO_TM,ADDR_PCT_CD,RPT_DT,KY_CD,OFNS_DESC,PD_CD,PD_DESC,CRM_ATPT_CPTD_CD,LAW_CAT_CD,BORO_NM,LOC_OF_OCCUR_DESC,PREM_TYP_DESC,JURIS_DESC,HADEVELOPT,X_COORD_CD,Y_COORD_CD,TRANSIT_DISTRICT,Latitude,Longitude,Lat_Lon,PATROL_BORO,STATION_NAME,possession,sales,misdemeanor,violation,felony,misd_poss,viol_poss,felony_poss,misd_sales,viol_sales,felony_sales,cann_crimes_overall,date_time_start,date_time_end,day_tw,night_tw,early_morn,morn_rush_hr,work_day,lunch_hr,eve_rush_hr,dinner,evening,late_night,wtc_taxi,wtc_crow,nyse_taxi,nyse_crow,bk_bridge_taxi,bk_bridge_crow,city_hall_taxi,city_hall_crow,manh_bridge_taxi,manh_bridge_crow,will_bridge_taxi,will_bridge_crow,wash_sq_park_taxi,wash_sq_park_crow,union_sq_taxi,union_sq_crow,penn_station_taxi,penn_station_crow,times_sq_taxi,times_sq_crow,rock_center_taxi,rock_center_crow,empire_st_bldg_taxi,empire_st_bldg_crow,lincoln_ctr_taxi,lincoln_ctr_crow,central_pk_taxi,central_pk_crow,apollo_th_taxi,apollo_th_crow,yankee_stad_taxi,yankee_stad_crow,mets_stad_taxi,mets_stad_crow,queens_taxi,queens_crow,prospect_pk_taxi,prospect_pk_crow,downtown_bk_taxi,downtown_bk_crow,si_ferry_taxi,si_ferry_crow,port_authority_taxi,port_authority_crow,nypd_hq_taxi,nypd_hq_crow,mdc_taxi,mdc_crow,rikers_taxi,rikers_crow,nysc_taxi,nysc_crow,duration,start_year,start_month,start_day,new_years_day,new_years_eve,christmas_eve,christmas,july_4th,valentines,halloween,st_patricks,mlk,pres,easter,diwali,pr_parade,yomkippur,rosh_hashanah,eid_al_fitr,eid_al_adha,hannukkah,memorial_day,labor_day,thanksgiving,SUSP_AGE_GROUP_cleaned,SUSP_RACE_cleaned,SUSP_SEX_cleaned,VIC_AGE_GROUP_cleaned,VIC_RACE_cleaned,VIC_SEX_cleaned
148,498164466,12/31/2018,21:55:00,12/31/2018,22:12:00,62.0,12/31/2018,235,DANGEROUS DRUGS,567.0,"MARIJUANA, POSSESSION 4 & 5",COMPLETED,MISDEMEANOR,BROOKLYN,unknown,STREET,N.Y. POLICE DEPT,not_housing_devpt_crime,979947.0,160366.0,not_transit_related,40.606851,-74.015498,"(40.60685112, -74.015498354)",PATROL BORO BKLYN SOUTH,not_transit_related,1,0,1,0,0,1,0,0,0,0,0,1,2018-12-31 21:55:00,2018-12-31 22:12:00,0,1,0,0,0,0,0,0,1,0,0.107947,0.10587,0.104259,0.100115,0.117847,0.100976,0.115347,0.106365,0.125347,0.103635,0.149847,0.115121,0.142147,0.125278,0.153447,0.131335,0.165747,0.145422,0.183147,0.155275,0.188647,0.156244,0.171347,0.144651,0.197647,0.168711,0.226147,0.183038,0.268547,0.213416,0.312047,0.239982,0.317647,0.225146,0.341947,0.251772,0.099847,0.070769,0.120247,0.094479,0.163867,0.138854,0.175047,0.152364,0.118055,0.105979,0.122647,0.110856,0.315747,0.226845,0.121047,0.108393,0 days 00:17:00.000000000,2018,12,31,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,25-44,BLACK HISPANIC,M,unknown,unknown,unknown
536,145023256,12/31/2018,17:00:00,12/31/2018,17:07:00,26.0,12/31/2018,235,DANGEROUS DRUGS,567.0,"MARIJUANA, POSSESSION 4 & 5",COMPLETED,MISDEMEANOR,MANHATTAN,FRONT OF,STREET,N.Y. POLICE DEPT,not_housing_devpt_crime,997349.0,235298.0,not_transit_related,40.812513,-73.952681,"(40.812512958, -73.952680664)",PATROL BORO MAN NORTH,not_transit_related,1,0,1,0,0,1,0,0,0,0,0,1,2018-12-31 17:00:00,2018-12-31 17:07:00,1,0,0,0,1,0,1,0,0,0,0.160532,0.116831,0.16422,0.120794,0.150632,0.115235,0.153132,0.113121,0.143132,0.111718,0.118632,0.100859,0.126332,0.093102,0.115032,0.085706,0.102732,0.074158,0.085332,0.062258,0.079832,0.059773,0.097132,0.072116,0.070832,0.050506,0.042332,0.032229,0.005094,0.003602,0.043568,0.031515,0.164794,0.121603,0.242094,0.178895,0.168632,0.153185,0.148232,0.120683,0.432346,0.306695,0.093432,0.067176,0.150425,0.112202,0.145832,0.107957,0.086094,0.069449,0.147432,0.109811,0 days 00:07:00.000000000,2018,12,31,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,18-24,BLACK,M,unknown,unknown,unknown
899,286264352,12/30/2018,17:25:00,12/30/2018,17:29:00,109.0,12/30/2018,678,MISCELLANEOUS PENAL LAW,566.0,"MARIJUANA, POSSESSION",COMPLETED,VIOLATION,QUEENS,unknown,STREET,N.Y. POLICE DEPT,not_housing_devpt_crime,1030153.0,215586.0,not_transit_related,40.758299,-73.834309,"(40.758299326, -73.834309457)",PATROL BORO QUEENS NORTH,not_transit_related,1,0,0,1,0,0,1,0,0,0,0,1,2018-12-30 17:25:00,2018-12-30 17:29:00,1,0,0,0,1,0,1,0,0,0,0.22469,0.184805,0.228378,0.184276,0.21479,0.170764,0.21729,0.177714,0.20729,0.164529,0.18279,0.145145,0.19049,0.165294,0.17919,0.158382,0.16689,0.159377,0.150891,0.150192,0.144791,0.144391,0.16129,0.151714,0.163391,0.149865,0.155691,0.133379,0.167491,0.126809,0.163191,0.116308,0.01499,0.011946,0.069509,0.049589,0.23279,0.166628,0.21239,0.162561,0.496504,0.36438,0.15759,0.156494,0.214583,0.174569,0.20999,0.173672,0.086491,0.062314,0.21159,0.173341,0 days 00:04:00.000000000,2018,12,30,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,25-44,BLACK,M,unknown,unknown,unknown
1114,606039781,12/30/2018,00:35:00,12/30/2018,00:35:00,6.0,12/30/2018,235,DANGEROUS DRUGS,567.0,"MARIJUANA, POSSESSION 4 & 5",COMPLETED,MISDEMEANOR,MANHATTAN,unknown,TRANSIT - NYC SUBWAY,N.Y. TRANSIT POLICE,not_housing_devpt_crime,983985.0,205857.0,2.0,40.731715,-74.000958,"(40.731714801, -74.000957613)",PATROL BORO MAN SOUTH,W. 4 STREET,1,0,1,0,0,1,0,0,0,0,0,1,2018-12-30 00:35:00,2018-12-30 00:35:00,0,1,0,0,0,0,0,0,0,1,0.031457,0.022724,0.035145,0.026892,0.029672,0.025934,0.024057,0.019601,0.034372,0.026259,0.046672,0.033818,0.004572,0.00377,0.014043,0.010709,0.026343,0.020304,0.043743,0.031864,0.049243,0.03498,0.031943,0.02261,0.058243,0.044364,0.086743,0.062324,0.129143,0.093355,0.172643,0.123167,0.178243,0.15695,0.209572,0.206088,0.103472,0.07833,0.052072,0.039233,0.303271,0.21445,0.035643,0.027435,0.02135,0.01974,0.016757,0.014856,0.176343,0.13032,0.018357,0.017346,0 days 00:00:00.000000000,2018,12,30,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,25-44,WHITE,M,unknown,unknown,unknown
1148,698392952,12/30/2018,17:40:00,12/30/2018,17:45:00,44.0,12/30/2018,235,DANGEROUS DRUGS,567.0,"MARIJUANA, POSSESSION 4 & 5",COMPLETED,MISDEMEANOR,BRONX,FRONT OF,STREET,N.Y. POLICE DEPT,not_housing_devpt_crime,1008924.0,245473.0,not_transit_related,40.840416,-73.910828,"(40.840415681, -73.910828006)",PATROL BORO BRONX,not_transit_related,1,0,1,0,0,1,0,0,0,0,0,1,2018-12-30 17:40:00,2018-12-30 17:45:00,1,0,0,0,1,0,1,0,0,0,0.230288,0.163806,0.233976,0.167093,0.220388,0.159528,0.222888,0.159256,0.212888,0.15512,0.188388,0.140973,0.196088,0.139617,0.184788,0.131784,0.172488,0.122072,0.155088,0.1098,0.149588,0.106226,0.166888,0.118628,0.140588,0.099467,0.112088,0.079285,0.069688,0.049673,0.026188,0.018796,0.150844,0.107651,0.228144,0.161343,0.238388,0.189372,0.217988,0.162033,0.502102,0.355341,0.163188,0.115414,0.22018,0.157822,0.215588,0.154036,0.072144,0.053434,0.217188,0.155539,0 days 00:05:00.000000000,2018,12,30,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,25-44,WHITE HISPANIC,M,unknown,unknown,unknown


In [6]:
nyc.tail()

Unnamed: 0,CMPLNT_NUM,CMPLNT_FR_DT,CMPLNT_FR_TM,CMPLNT_TO_DT,CMPLNT_TO_TM,ADDR_PCT_CD,RPT_DT,KY_CD,OFNS_DESC,PD_CD,PD_DESC,CRM_ATPT_CPTD_CD,LAW_CAT_CD,BORO_NM,LOC_OF_OCCUR_DESC,PREM_TYP_DESC,JURIS_DESC,HADEVELOPT,X_COORD_CD,Y_COORD_CD,TRANSIT_DISTRICT,Latitude,Longitude,Lat_Lon,PATROL_BORO,STATION_NAME,possession,sales,misdemeanor,violation,felony,misd_poss,viol_poss,felony_poss,misd_sales,viol_sales,felony_sales,cann_crimes_overall,date_time_start,date_time_end,day_tw,night_tw,early_morn,morn_rush_hr,work_day,lunch_hr,eve_rush_hr,dinner,evening,late_night,wtc_taxi,wtc_crow,nyse_taxi,nyse_crow,bk_bridge_taxi,bk_bridge_crow,city_hall_taxi,city_hall_crow,manh_bridge_taxi,manh_bridge_crow,will_bridge_taxi,will_bridge_crow,wash_sq_park_taxi,wash_sq_park_crow,union_sq_taxi,union_sq_crow,penn_station_taxi,penn_station_crow,times_sq_taxi,times_sq_crow,rock_center_taxi,rock_center_crow,empire_st_bldg_taxi,empire_st_bldg_crow,lincoln_ctr_taxi,lincoln_ctr_crow,central_pk_taxi,central_pk_crow,apollo_th_taxi,apollo_th_crow,yankee_stad_taxi,yankee_stad_crow,mets_stad_taxi,mets_stad_crow,queens_taxi,queens_crow,prospect_pk_taxi,prospect_pk_crow,downtown_bk_taxi,downtown_bk_crow,si_ferry_taxi,si_ferry_crow,port_authority_taxi,port_authority_crow,nypd_hq_taxi,nypd_hq_crow,mdc_taxi,mdc_crow,rikers_taxi,rikers_crow,nysc_taxi,nysc_crow,duration,start_year,start_month,start_day,new_years_day,new_years_eve,christmas_eve,christmas,july_4th,valentines,halloween,st_patricks,mlk,pres,easter,diwali,pr_parade,yomkippur,rosh_hashanah,eid_al_fitr,eid_al_adha,hannukkah,memorial_day,labor_day,thanksgiving,SUSP_AGE_GROUP_cleaned,SUSP_RACE_cleaned,SUSP_SEX_cleaned,VIC_AGE_GROUP_cleaned,VIC_RACE_cleaned,VIC_SEX_cleaned
6480505,143571007,01/01/2006,21:45:00,00/00/0000,00:00:00,73.0,01/01/2006,235,DANGEROUS DRUGS,567.0,"MARIJUANA, POSSESSION 4 & 5",COMPLETED,MISDEMEANOR,BROOKLYN,unknown,STREET,N.Y. POLICE DEPT,not_housing_devpt_crime,1010049.0,183477.0,not_transit_related,40.670249,-73.907,"(40.670249345, -73.907000055)",PATROL BORO BKLYN NORTH,not_transit_related,1,0,1,0,0,1,0,0,0,0,0,1,2006-01-01 21:45:00,,0,1,0,0,0,0,0,0,1,0,0.148851,0.114556,0.140893,0.110511,0.125751,0.096785,0.141651,0.107849,0.121051,0.091706,0.108751,0.078463,0.150851,0.108722,0.149751,0.10669,0.166851,0.118061,0.166251,0.117826,0.160151,0.113861,0.156851,0.110911,0.178751,0.127701,0.171051,0.126889,0.182851,0.146246,0.178551,0.160503,0.145751,0.104353,0.170051,0.126193,0.072049,0.062809,0.103351,0.081698,0.335763,0.260998,0.170751,0.120759,0.137437,0.104374,0.142751,0.106778,0.143851,0.124633,0.139151,0.104758,,2006,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,unknown,unknown,unknown,unknown,unknown,unknown
6480506,575819737,01/01/2006,01:10:00,00/00/0000,00:00:00,71.0,01/01/2006,235,DANGEROUS DRUGS,567.0,"MARIJUANA, POSSESSION 4 & 5",COMPLETED,MISDEMEANOR,BROOKLYN,FRONT OF,STREET,N.Y. POLICE DEPT,not_housing_devpt_crime,1000130.0,180995.0,not_transit_related,40.66346,-73.942762,"(40.663460155, -73.942762342)",PATROL BORO BKLYN SOUTH,not_transit_related,1,0,1,0,0,1,0,0,0,0,0,1,2006-01-01 01:10:00,,0,1,0,0,0,0,0,0,0,1,0.119878,0.086106,0.11192,0.081103,0.096778,0.068913,0.112678,0.080287,0.092078,0.06517,0.079778,0.058244,0.121878,0.086655,0.120778,0.087087,0.137878,0.100835,0.137278,0.104259,0.131178,0.101795,0.127878,0.095176,0.149778,0.116401,0.142078,0.121566,0.153878,0.146723,0.182702,0.166963,0.188302,0.133213,0.212602,0.161414,0.029498,0.026439,0.074378,0.052985,0.293212,0.225202,0.141778,0.105332,0.108464,0.07711,0.113778,0.080592,0.186402,0.141522,0.110178,0.078128,,2006,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,unknown,unknown,unknown,unknown,unknown,unknown
6480522,406166394,01/01/2006,01:16:00,01/01/2006,01:26:00,9.0,01/01/2006,235,DANGEROUS DRUGS,567.0,"MARIJUANA, POSSESSION 4 & 5",COMPLETED,MISDEMEANOR,MANHATTAN,INSIDE,unknown,N.Y. POLICE DEPT,not_housing_devpt_crime,987274.0,206096.0,not_transit_related,40.73237,-73.98909,"(40.732370284, -73.989090237)",PATROL BORO MAN SOUTH,not_transit_related,1,0,1,0,0,1,0,0,0,0,0,1,2006-01-01 01:16:00,2006-01-01 01:26:00,0,1,0,0,0,0,0,0,0,1,0.04398,0.031271,0.047668,0.033788,0.03408,0.027407,0.03658,0.025929,0.02658,0.024929,0.035461,0.025117,0.00978,0.008359,0.005539,0.004062,0.022639,0.018755,0.03122,0.027022,0.03672,0.028306,0.01942,0.016384,0.04572,0.040517,0.07422,0.055808,0.11662,0.086871,0.16012,0.115796,0.16572,0.145118,0.198361,0.194235,0.092261,0.074914,0.040861,0.03656,0.315794,0.223393,0.026539,0.024889,0.033873,0.024426,0.02928,0.02073,0.16382,0.119648,0.03088,0.022127,0 days 00:10:00.000000000,2006,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,unknown,unknown,unknown,unknown,unknown,unknown
6480539,828372204,01/01/2006,19:40:00,00/00/0000,00:00:00,73.0,01/01/2006,235,DANGEROUS DRUGS,567.0,"MARIJUANA, POSSESSION 4 & 5",COMPLETED,MISDEMEANOR,BROOKLYN,unknown,STREET,N.Y. POLICE DEPT,not_housing_devpt_crime,1010138.0,184443.0,not_transit_related,40.672901,-73.906676,"(40.672900538, -73.906675514)",PATROL BORO BKLYN NORTH,not_transit_related,1,0,1,0,0,1,0,0,0,0,0,1,2006-01-01 19:40:00,,0,1,0,0,0,0,0,1,0,0,0.146524,0.113904,0.138566,0.10997,0.123424,0.096139,0.139324,0.107132,0.118724,0.090962,0.106424,0.077306,0.148524,0.107541,0.147424,0.10534,0.164524,0.116515,0.163924,0.116059,0.157824,0.112023,0.154524,0.109293,0.176424,0.125786,0.168724,0.124693,0.180524,0.143812,0.176224,0.157911,0.142775,0.102025,0.167075,0.124707,0.075025,0.063605,0.101024,0.081209,0.338739,0.262239,0.168424,0.119094,0.13511,0.103641,0.140424,0.105939,0.140875,0.121965,0.136824,0.103966,,2006,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,unknown,unknown,unknown,unknown,unknown,unknown
6480628,129199749,01/01/2006,21:00:00,00/00/0000,00:00:00,43.0,01/01/2006,235,DANGEROUS DRUGS,567.0,"MARIJUANA, POSSESSION 4 & 5",COMPLETED,MISDEMEANOR,BRONX,OPPOSITE OF,RESIDENCE - PUBLIC HOUSING,N.Y. HOUSING POLICE,CASTLE HILL,1026596.0,236961.0,not_transit_related,40.816986,-73.847014,"(40.816985648, -73.847014071)",PATROL BORO BRONX,not_transit_related,1,0,1,0,0,1,0,0,0,0,0,1,2006-01-01 21:00:00,,0,1,0,0,0,0,0,0,1,0,0.270672,0.196366,0.27436,0.197743,0.260772,0.186444,0.263272,0.190166,0.253272,0.180725,0.228772,0.162512,0.236472,0.173245,0.225172,0.165335,0.212872,0.160827,0.195472,0.149214,0.189972,0.144008,0.207272,0.154718,0.180972,0.143553,0.152472,0.123195,0.110072,0.103322,0.0918,0.080184,0.0636,0.0623,0.1409,0.10295,0.278772,0.198651,0.258372,0.183073,0.542486,0.386471,0.203572,0.15572,0.260564,0.1877,0.255972,0.18528,0.062872,0.045721,0.257572,0.185861,,2006,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,unknown,unknown,unknown,unknown,unknown,unknown


In [7]:
nyc.info(verbose=True)

<class 'pandas.core.frame.DataFrame'>
Int64Index: 220305 entries, 148 to 6480628
Data columns (total 133 columns):
CMPLNT_NUM                int64
CMPLNT_FR_DT              object
CMPLNT_FR_TM              object
CMPLNT_TO_DT              object
CMPLNT_TO_TM              object
ADDR_PCT_CD               float64
RPT_DT                    object
KY_CD                     int64
OFNS_DESC                 object
PD_CD                     float64
PD_DESC                   object
CRM_ATPT_CPTD_CD          object
LAW_CAT_CD                object
BORO_NM                   object
LOC_OF_OCCUR_DESC         object
PREM_TYP_DESC             object
JURIS_DESC                object
HADEVELOPT                object
X_COORD_CD                float64
Y_COORD_CD                float64
TRANSIT_DISTRICT          object
Latitude                  float64
Longitude                 float64
Lat_Lon                   object
PATROL_BORO               object
STATION_NAME              object
possession             

In [8]:
nyc.describe(include='all')

Unnamed: 0,CMPLNT_NUM,CMPLNT_FR_DT,CMPLNT_FR_TM,CMPLNT_TO_DT,CMPLNT_TO_TM,ADDR_PCT_CD,RPT_DT,KY_CD,OFNS_DESC,PD_CD,PD_DESC,CRM_ATPT_CPTD_CD,LAW_CAT_CD,BORO_NM,LOC_OF_OCCUR_DESC,PREM_TYP_DESC,JURIS_DESC,HADEVELOPT,X_COORD_CD,Y_COORD_CD,TRANSIT_DISTRICT,Latitude,Longitude,Lat_Lon,PATROL_BORO,STATION_NAME,possession,sales,misdemeanor,violation,felony,misd_poss,viol_poss,felony_poss,misd_sales,viol_sales,felony_sales,cann_crimes_overall,date_time_start,date_time_end,day_tw,night_tw,early_morn,morn_rush_hr,work_day,lunch_hr,eve_rush_hr,dinner,evening,late_night,wtc_taxi,wtc_crow,nyse_taxi,nyse_crow,bk_bridge_taxi,bk_bridge_crow,city_hall_taxi,city_hall_crow,manh_bridge_taxi,manh_bridge_crow,will_bridge_taxi,will_bridge_crow,wash_sq_park_taxi,wash_sq_park_crow,union_sq_taxi,union_sq_crow,penn_station_taxi,penn_station_crow,times_sq_taxi,times_sq_crow,rock_center_taxi,rock_center_crow,empire_st_bldg_taxi,empire_st_bldg_crow,lincoln_ctr_taxi,lincoln_ctr_crow,central_pk_taxi,central_pk_crow,apollo_th_taxi,apollo_th_crow,yankee_stad_taxi,yankee_stad_crow,mets_stad_taxi,mets_stad_crow,queens_taxi,queens_crow,prospect_pk_taxi,prospect_pk_crow,downtown_bk_taxi,downtown_bk_crow,si_ferry_taxi,si_ferry_crow,port_authority_taxi,port_authority_crow,nypd_hq_taxi,nypd_hq_crow,mdc_taxi,mdc_crow,rikers_taxi,rikers_crow,nysc_taxi,nysc_crow,duration,start_year,start_month,start_day,new_years_day,new_years_eve,christmas_eve,christmas,july_4th,valentines,halloween,st_patricks,mlk,pres,easter,diwali,pr_parade,yomkippur,rosh_hashanah,eid_al_fitr,eid_al_adha,hannukkah,memorial_day,labor_day,thanksgiving,SUSP_AGE_GROUP_cleaned,SUSP_RACE_cleaned,SUSP_SEX_cleaned,VIC_AGE_GROUP_cleaned,VIC_RACE_cleaned,VIC_SEX_cleaned
count,220305.0,220305,220305,220305,220305,220305.0,220305,220305.0,220305,220305.0,220305,220305,220305,220305,220305,220305,220305,220305,220305.0,220305.0,220305,220305.0,220305.0,220305,220305,220305,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305,153112,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,153112,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305.0,220305,220305,220305,220305,220305,220305
unique,,4728,1414,4721,1430,,4727,,2,,5,2,3,6,5,69,19,258,,,13,,,32981,9,309,,,,,,,,,,,,,193683,142780,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,582,,,,,,,,,,,,,,,,,,,,,,,,,6,7,3,6,7,3
top,,02/18/2011,21:00:00,00/00/0000,00:00:00,,02/18/2011,,DANGEROUS DRUGS,,"MARIJUANA, POSSESSION 4 & 5",COMPLETED,MISDEMEANOR,BRONX,unknown,STREET,N.Y. POLICE DEPT,not_housing_devpt_crime,,,not_transit_related,,,"(40.823101299, -73.869690461)",PATROL BORO BRONX,not_transit_related,,,,,,,,,,,,,2007-01-24 21:30:00,2006-04-05 20:20:00,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0 days 00:05:00.000000000,,,,,,,,,,,,,,,,,,,,,,,,,unknown,unknown,unknown,unknown,unknown,unknown
freq,,136,2012,67191,67285,,129,,213521,,197225,219024,208651,86847,91891,127453,172962,193180,,,216766,,,608,86916,216766,,,,,,,,,,,,,11,8,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,42544,,,,,,,,,,,,,,,,,,,,,,,,,185542,185468,185164,219902,219667,219752
mean,549777400.0,,,,,54.128486,,246.033122,,567.105163,,,,,,,,,1005691.0,216807.0,,40.761731,-73.922573,,,,0.943074,0.056926,0.947101,0.030794,0.022106,0.895236,0.030794,0.017045,0.051864,0.0,0.005061,1.0,,,0.389002,0.610998,0.005529,0.009283,0.375257,0.039218,0.180749,0.174417,0.190268,0.262005,0.184746,0.135527,0.184362,0.135793,0.171491,0.127161,0.177926,0.130713,0.165846,0.123307,0.150102,0.112862,0.16562,0.12104,0.159256,0.116651,0.158665,0.116338,0.150087,0.110871,0.145713,0.107988,0.152591,0.112212,0.148123,0.110454,0.134308,0.101982,0.124694,0.099077,0.121728,0.098886,0.158716,0.120388,0.212735,0.160842,0.169997,0.134894,0.164149,0.124076,0.412224,0.300273,0.155387,0.114393,0.17496,0.128757,0.173973,0.127618,0.125094,0.098098,0.173749,0.127698,,2011.118754,6.353492,15.263208,0.001725,0.000808,0.000458,0.000268,0.002383,0.00241,0.002043,0.00246,0.001829,0.001838,0.001194,0.002978,0.001979,0.003209,0.003073,0.003014,0.002469,0.001956,0.001021,0.001775,0.000522,,,,,,
std,260243400.0,,,,,25.717604,,78.926079,,0.53832,,,,,,,,,14925.33,32340.0,,0.088743,0.053874,,,,0.231701,0.231701,0.223833,0.172759,0.147028,0.30625,0.172759,0.129438,0.221754,0.0,0.070962,0.0,,,0.487525,0.487525,0.07415,0.095898,0.48419,0.194115,0.384811,0.379469,0.392513,0.439726,0.076906,0.05535,0.079313,0.056685,0.079111,0.056806,0.076802,0.055341,0.077966,0.056292,0.072211,0.053284,0.070358,0.05165,0.068499,0.050334,0.065927,0.048967,0.064716,0.048246,0.064056,0.047704,0.065547,0.048466,0.064946,0.049048,0.066661,0.051572,0.079662,0.063633,0.092544,0.074958,0.0634,0.047365,0.059993,0.048295,0.093644,0.071029,0.082135,0.059372,0.120253,0.080126,0.065314,0.048655,0.077009,0.055483,0.075149,0.054413,0.078874,0.05896,0.076077,0.054953,,3.295674,3.342459,8.635357,0.041496,0.028413,0.021407,0.016363,0.048759,0.049036,0.045149,0.04954,0.042731,0.042837,0.034531,0.054487,0.044443,0.056559,0.05535,0.054817,0.049631,0.044188,0.031942,0.042091,0.022841,,,,,,
min,100003000.0,,,,,1.0,,117.0,,566.0,,,,,,,,,913463.0,121219.0,,40.499143,-74.25456,,,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.001396,0.001082,0.001327,0.00099,0.003934,0.003134,0.001115,0.000851,0.003293,0.002516,0.005002,0.004407,0.00091,0.000699,7.9e-05,5.7e-05,0.000968,0.000897,0.000355,0.000257,7.7e-05,6.2e-05,0.001183,0.000941,0.001228,0.001133,0.000926,0.000655,0.000718,0.000511,0.001578,0.001159,0.001555,0.001298,0.003967,0.002951,0.005935,0.004438,7.7e-05,6.1e-05,0.004253,0.003946,0.000916,0.00076,0.00201,0.001459,0.000237,0.000176,0.003565,0.002532,0.001571,0.001544,,2006.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,,,,,
25%,324688800.0,,,,,40.0,,235.0,,567.0,,,,,,,,,998431.0,185121.0,,40.674767,-73.948801,,,,1.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.136242,0.101189,0.130326,0.0987,0.116158,0.088247,0.12937,0.096206,0.112066,0.08449,0.099153,0.075952,0.125186,0.092071,0.119614,0.088588,0.121316,0.089712,0.11108,0.0824,0.106117,0.079016,0.114051,0.085504,0.106489,0.079388,0.088872,0.065479,0.065948,0.049712,0.044002,0.033993,0.130652,0.097975,0.182153,0.13643,0.082853,0.069855,0.09785,0.07627,0.322306,0.24301,0.117604,0.086975,0.125365,0.093659,0.128215,0.094468,0.06885,0.054698,0.126228,0.093558,,2008.0,3.0,8.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,,,,,
50%,549274500.0,,,,,47.0,,235.0,,567.0,,,,,,,,,1007174.0,229965.0,,40.797872,-73.91721,,,,1.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,,,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.180503,0.136238,0.178622,0.135337,0.165822,0.127427,0.173824,0.131454,0.160061,0.123721,0.143535,0.111612,0.166401,0.123018,0.160908,0.118014,0.162019,0.118083,0.152545,0.11231,0.147045,0.10895,0.155248,0.11325,0.152572,0.114131,0.134383,0.102728,0.116063,0.091773,0.108457,0.081545,0.151807,0.114813,0.213741,0.161414,0.174992,0.149028,0.15908,0.12455,0.435187,0.309461,0.158637,0.116113,0.170661,0.12924,0.170735,0.12846,0.101577,0.083307,0.170127,0.128337,,2011.0,6.0,15.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,,,,,
75%,775968200.0,,,,,73.0,,235.0,,567.0,,,,,,,,,1014902.0,242790.0,,40.833029,-73.88934,,,,1.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,,,1.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.242653,0.174605,0.246128,0.177366,0.23352,0.16968,0.23602,0.170352,0.226755,0.164865,0.205457,0.149834,0.213531,0.154101,0.204325,0.148447,0.19836,0.144155,0.188716,0.137374,0.183659,0.133395,0.191957,0.139453,0.185664,0.138306,0.172422,0.132127,0.171934,0.143954,0.182737,0.159625,0.171889,0.132781,0.235062,0.177283,0.248617,0.191236,0.231095,0.169735,0.510781,0.363957,0.194424,0.141904,0.233668,0.168696,0.229568,0.165465,0.160597,0.128577,0.231087,0.16665,,2014.0,9.0,23.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,,,,,


The features that make up the analytical frame, both for the initial EDA and for the later machine learning phase, are listed below. Each entry gives a user-friendly feature name followed by the dataset's variable name; an asterisk marks features that were converted into binary format for the machine learning phase.

Police Precinct Number* ('ADDR_PCT_CD')

Crime Completed/Attempted Flag* ('CRM_ATPT_CPTD_CD')

NYC Borough* ('BORO_NM')

Location of Crime Occurrence* ('LOC_OF_OCCUR_DESC')

Premises Type of Crime Occurrence* ('PREM_TYP_DESC')

Jurisdiction of Crime* ('JURIS_DESC')

Housing Development of Crime Occurrence* ('HADEVELOPT')

Geographic NYC X and Y Coordinates of Crime ('X_COORD_CD', 'Y_COORD_CD')

Transit District of Crime Occurrence* ('TRANSIT_DISTRICT')

Geographic NYC Latitude, Longitude, and Latitude/Longitude Coordinates of Crime ('Latitude', 'Longitude', 'Lat_Lon')

Police Patrol Borough of Crime Occurrence* ('PATROL_BORO')

Transit Station Name of Crime Occurrence* ('STATION_NAME')

Datetime of Crime Start and End ('date_time_start' and 'date_time_end')

Time-windows of Crime Occurrences ('day_tw', 'night_tw', 'early_morn', 'morn_rush_hr', 'work_day', 'lunch_hr', 'eve_rush_hr', 'dinner', 'evening', 'late_night')

Distance of Crime From NYC Landmarks ('landmark_taxi' and 'landmark_crow' series)

Duration of Crime ('duration')

Year in Which Crime Occurred/Started ('start_year')

Month in Which Crime Occurred/Started ('start_month')

Day on Which Crime Occurred/Started ('start_day')

Holidays on Which Crime Occurred/Started ('new_years_day', 'new_years_eve', 'christmas_eve', 'christmas', 'july_4th', 'valentines', 'halloween', 'st_patricks', 'mlk', 'pres', 'easter', 'diwali', 'pr_parade', 'yomkippur', 'rosh_hashanah', 'eid_al_fitr', 'eid_al_adha', 'hannukkah', 'memorial_day', 'labor_day', 'thanksgiving')

Suspect Age Group* ('SUSP_AGE_GROUP_cleaned')

Suspect Race* ('SUSP_RACE_cleaned')

Suspect Sex* ('SUSP_SEX_cleaned')

Victim Age Group* ('VIC_AGE_GROUP_cleaned')

Victim Race* ('VIC_RACE_cleaned')

Victim Sex* ('VIC_SEX_cleaned')
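
The asterisked features above are categorical, so their conversion to binary format could look something like the following. This is a minimal sketch, not the project's actual preprocessing code: the toy rows are invented for illustration, and `pd.get_dummies` is an assumed choice of encoder.

```python
import pandas as pd

# Toy rows standing in for the real dataset (values are illustrative only).
toy = pd.DataFrame({
    'BORO_NM': ['BRONX', 'BROOKLYN', 'BRONX', 'unknown'],
    'SUSP_SEX_cleaned': ['M', 'F', 'unknown', 'M'],
    'duration': [0.0, 1.5, 0.0, 2.0],
})

# Columns marked with an asterisk in the list above would go here.
categorical_cols = ['BORO_NM', 'SUSP_SEX_cleaned']

# One 0/1 indicator column per category; other columns pass through unchanged.
encoded = pd.get_dummies(toy, columns=categorical_cols, dtype=int)
print(encoded.columns.tolist())
```

Keeping 'unknown' as its own category, as the cleaned columns already do, means missing values get an indicator column rather than being dropped.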

In this exploratory data analysis (EDA) phase, the first step is to check whether this NYPD dataset corroborates the racial disparity in cannabis arrests reported elsewhere. As mentioned earlier, only 34,837 cannabis cases (15.8%) have the crime suspect's race reported, which is unfortunate and raises the question of how often the suspect's race is reported in non-cannabis crimes. As reported in the data cleaning notebook for this capstone project, 2,392,029 non-cannabis crimes (38.1%) have the suspect's race reported. This is a large difference, and it will be the subject of a hypothesis test in the Statistical Methods section of this project to see whether it could be due to random chance.
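
One common way to test a difference like this is a two-proportion z-test. The sketch below is only an illustration of that technique, not the test used in the Statistical Methods section. The race-reported counts come from the text, but the two totals are assumptions back-calculated from the rounded percentages.

```python
import math

def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for the difference between two proportions."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled proportion under the null hypothesis of equal reporting rates.
    p_pool = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Counts with race reported come from the text; the totals are ASSUMED,
# back-calculated from the rounded percentages (15.8% and 38.1%).
cannabis_reported = 34837
cannabis_total = round(cannabis_reported / 0.158)
other_reported = 2392029
other_total = round(other_reported / 0.381)

z, p_value = two_proportion_z_test(cannabis_reported, cannabis_total,
                                   other_reported, other_total)
print(z, p_value)
```

With samples this large, almost any difference will be statistically significant, so the size of the gap matters more than the p-value.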

In [11]:
nyc['SUSP_RACE_cleaned'].value_counts(normalize=True)

unknown                           0.841869
BLACK                             0.080842
WHITE HISPANIC                    0.042972
BLACK HISPANIC                    0.017585
WHITE                             0.012660
ASIAN / PACIFIC ISLANDER          0.003690
AMERICAN INDIAN/ALASKAN NATIVE    0.000381
Name: SUSP_RACE_cleaned, dtype: float64

Although one can see in the above cell that Black, white Hispanic, and Black Hispanic suspects account for the majority of cannabis cases in which the suspect's race was reported, I'd like to create a DataFrame with just the cases with suspect race reported, and compute the racial, age, and sex proportions of that group.

In [14]:
nyc_susp_race_reported = nyc[nyc.SUSP_RACE_cleaned != 'unknown']

In [15]:
nyc_susp_race_reported['SUSP_RACE_cleaned'].value_counts(normalize=True)

BLACK                             0.511238
WHITE HISPANIC                    0.271751
BLACK HISPANIC                    0.111204
WHITE                             0.080059
ASIAN / PACIFIC ISLANDER          0.023337
AMERICAN INDIAN/ALASKAN NATIVE    0.002411
Name: SUSP_RACE_cleaned, dtype: float64

In [None]:
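#Bar chart of suspect race proportions among cases with race reported
nyc_susp_race_reported['SUSP_RACE_cleaned'].value_counts(normalize=True).plot(kind='bar')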

As the proportions above show, 51% of cannabis arrests with the suspect's race reported were of African-Americans, 27% of white Hispanics, and 11% of black Hispanics, together making up 89% of the cannabis crimes with the suspect's race reported. Only 8% of these arrests were of white people. This corroborates the racial disparity data reported elsewhere. Age group and sex of these arrests are reported below.

In [16]:
#For later use, I'm exporting a csv file of just these cases.
nyc_susp_race_reported.to_csv('nyc_cann_susp_race_reported.csv')

In [17]:
nyc_susp_race_reported['SUSP_AGE_GROUP_cleaned'].value_counts(normalize=True)

25-44      0.441083
18-24      0.402360
45-64      0.081637
<18        0.062032
unknown    0.010391
65+        0.002497
Name: SUSP_AGE_GROUP_cleaned, dtype: float64

In [18]:
nyc_susp_race_reported['SUSP_SEX_cleaned'].value_counts(normalize=True)

M          0.895284
F          0.103884
unknown    0.000832
Name: SUSP_SEX_cleaned, dtype: float64

In [51]:
pd.crosstab(nyc_susp_race_reported.SUSP_RACE_cleaned, [nyc_susp_race_reported.SUSP_AGE_GROUP_cleaned, nyc_susp_race_reported.SUSP_SEX_cleaned], normalize=True)

SUSP_AGE_GROUP_cleaned,18-24,18-24,18-24,25-44,25-44,25-44,45-64,45-64,45-64,65+,65+,<18,<18,unknown,unknown,unknown
SUSP_SEX_cleaned,F,M,unknown,F,M,unknown,F,M,unknown,F,M,F,M,F,M,unknown
SUSP_RACE_cleaned
AMERICAN INDIAN/ALASKAN NATIVE,8.6e-05,0.001033,0.0,2.9e-05,0.000832,0.0,0.0,8.6e-05,0.0,0.0,0.0,2.9e-05,0.000287,0.0,0.0,2.9e-05
ASIAN / PACIFIC ISLANDER,0.000746,0.011281,0.0,0.000459,0.008468,0.0,0.0,0.000603,0.0,0.0,0.0,0.000115,0.001579,0.0,8.6e-05,0.0
BLACK,0.024399,0.160289,2.9e-05,0.022591,0.217613,0.000115,0.002957,0.047909,2.9e-05,5.7e-05,0.001464,0.003875,0.023222,0.000344,0.006057,0.000287
BLACK HISPANIC,0.005081,0.046014,0.0,0.003646,0.040761,2.9e-05,0.000316,0.006028,0.0,0.0,0.000144,0.000804,0.007492,2.9e-05,0.000775,8.6e-05
WHITE,0.003014,0.030628,0.0,0.003473,0.030973,0.0,0.00066,0.004708,0.0,0.0,0.000258,0.000574,0.005425,0.0,0.000316,2.9e-05
WHITE HISPANIC,0.015415,0.104257,8.6e-05,0.011339,0.100755,0.0,0.001234,0.017108,0.0,0.0,0.000574,0.002526,0.016104,8.6e-05,0.002153,0.000115


As can be seen in the above crosstabulation, 40.1% of cannabis arrests with the suspect's race reported are of African-American men younger than 45, and 31.5% are of Hispanic men younger than 45, for a total of 71.6% of the race-reported cannabis arrests in New York City between 2006 and 2018.
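
Rather than adding crosstab cells by hand, shares like these can be computed with a boolean mask. The following is a sketch on invented toy rows; `share_of_young_men` is a hypothetical helper, not a function from the project.

```python
import pandas as pd

# Toy stand-in for nyc_susp_race_reported (rows are illustrative only).
toy = pd.DataFrame({
    'SUSP_RACE_cleaned': ['BLACK', 'BLACK', 'WHITE HISPANIC', 'WHITE', 'BLACK HISPANIC'],
    'SUSP_AGE_GROUP_cleaned': ['18-24', '45-64', '25-44', '<18', '18-24'],
    'SUSP_SEX_cleaned': ['M', 'M', 'M', 'F', 'M'],
})

def share_of_young_men(df, races):
    """Fraction of all rows that are male suspects under 45 of the given races."""
    mask = (
        df['SUSP_RACE_cleaned'].isin(races)
        & df['SUSP_AGE_GROUP_cleaned'].isin(['<18', '18-24', '25-44'])
        & (df['SUSP_SEX_cleaned'] == 'M')
    )
    # The mean of a boolean mask is the proportion of True values.
    return mask.mean()

black_share = share_of_young_men(toy, ['BLACK'])
hispanic_share = share_of_young_men(toy, ['WHITE HISPANIC', 'BLACK HISPANIC'])
print(black_share, hispanic_share)
```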

One of the striking things about cannabis arrests in New York City is that the vast majority of them are for misdemeanor and violation possession charges alone. This is seen in the following value count cells.

In [195]:
#Percentage of cannabis arrests that are for misdemeanor cannabis possession
nyc['misd_poss'].value_counts(normalize=True)

1    0.895236
0    0.104764
Name: misd_poss, dtype: float64

In [61]:
#Percentage of cannabis arrests that are for violation cannabis possession
nyc['viol_poss'].value_counts(normalize=True)

0    0.969206
1    0.030794
Name: viol_poss, dtype: float64

In [62]:
#Percentage of cannabis arrests that are for felony cannabis possession
nyc['felony_poss'].value_counts(normalize=True)

0    0.982955
1    0.017045
Name: felony_poss, dtype: float64

In [63]:
#Percentage of cannabis arrests that are for misdemeanor cannabis sales
nyc['misd_sales'].value_counts(normalize=True)

0    0.948136
1    0.051864
Name: misd_sales, dtype: float64

In [64]:
#Percentage of cannabis arrests that are for felony cannabis sales
nyc['felony_sales'].value_counts(normalize=True)

0    0.994939
1    0.005061
Name: felony_sales, dtype: float64

As a side note, violations are generally less serious than misdemeanor charges, as they typically involve fines and do not go on one's criminal record; violations have been the primary tool used in cannabis arrests after the recent decriminalization (New York State Penal Law).

It would be interesting to see whether the racial disparity differs between the five levels of cannabis crime explored in this project. Suspect race value counts are run for each of the five levels, using only those cases with suspect race reported. As can be seen below, the same racial disparity largely holds across all five levels. A larger share of violation possession arrests are of white suspects than of black Hispanic suspects, but the difference is only 3 percentage points. It should also be noted that violation possession is the lowest level of cannabis charge, and that the majority of violation possession charges are still of African-Americans and white Hispanics. Whites make up a slightly larger share of felony possession arrests than black Hispanics, though the difference is less than a percentage point, and the two groups account for the same share of felony sales arrests. It bears mentioning that, among cases where the suspect's race is reported, the sample sizes for every charge other than misdemeanor possession are very small.
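
The per-level comparison can also be collected into a single table, which makes the levels easier to read side by side. This is a sketch on invented toy rows with only two of the five flag columns; it is an alternative presentation, not the approach used in the cells that follow.

```python
import pandas as pd

# Toy stand-in for nyc_susp_race_reported (rows are illustrative only).
toy = pd.DataFrame({
    'SUSP_RACE_cleaned': ['BLACK', 'WHITE', 'BLACK', 'WHITE HISPANIC'],
    'misd_poss': [1, 1, 0, 1],
    'viol_poss': [0, 0, 1, 0],
})

charge_flags = ['misd_poss', 'viol_poss']  # the real data has five flags

# One column of race proportions per charge level, aligned on race;
# races absent from a level get a proportion of 0.
race_by_charge = pd.DataFrame({
    flag: toy.loc[toy[flag] == 1, 'SUSP_RACE_cleaned'].value_counts(normalize=True)
    for flag in charge_flags
}).fillna(0)
print(race_by_charge)
```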

In [191]:
df = nyc_susp_race_reported

In [192]:
race_reported_misd_poss = df[df.misd_poss == 1]

In [197]:
race_reported_misd_poss['SUSP_RACE_cleaned'].value_counts(normalize=True)

BLACK                             0.507868
WHITE HISPANIC                    0.275568
BLACK HISPANIC                    0.111426
WHITE                             0.081354
ASIAN / PACIFIC ISLANDER          0.021438
AMERICAN INDIAN/ALASKAN NATIVE    0.002346
Name: SUSP_RACE_cleaned, dtype: float64

In [198]:
race_reported_viol_poss = df[df.viol_poss == 1]

In [199]:
race_reported_viol_poss['SUSP_RACE_cleaned'].value_counts(normalize=True)

BLACK                             0.605285
WHITE HISPANIC                    0.202312
WHITE                             0.097440
BLACK HISPANIC                    0.066887
ASIAN / PACIFIC ISLANDER          0.023947
AMERICAN INDIAN/ALASKAN NATIVE    0.004129
Name: SUSP_RACE_cleaned, dtype: float64

In [200]:
race_reported_felony_poss = df[df.felony_poss == 1]

In [201]:
race_reported_felony_poss['SUSP_RACE_cleaned'].value_counts(normalize=True)

BLACK                             0.463158
WHITE HISPANIC                    0.261988
WHITE                             0.099415
BLACK HISPANIC                    0.093567
ASIAN / PACIFIC ISLANDER          0.079532
AMERICAN INDIAN/ALASKAN NATIVE    0.002339
Name: SUSP_RACE_cleaned, dtype: float64

In [202]:
race_reported_misd_sales = df[df.misd_sales == 1]

In [203]:
race_reported_misd_sales['SUSP_RACE_cleaned'].value_counts(normalize=True)

BLACK                             0.526652
WHITE HISPANIC                    0.257996
BLACK HISPANIC                    0.148188
WHITE                             0.039446
ASIAN / PACIFIC ISLANDER          0.025586
AMERICAN INDIAN/ALASKAN NATIVE    0.002132
Name: SUSP_RACE_cleaned, dtype: float64

In [204]:
race_reported_felony_sales = df[df.felony_sales == 1]

In [205]:
race_reported_felony_sales['SUSP_RACE_cleaned'].value_counts(normalize=True)

BLACK                             0.519802
WHITE HISPANIC                    0.277228
WHITE                             0.074257
BLACK HISPANIC                    0.074257
ASIAN / PACIFIC ISLANDER          0.049505
AMERICAN INDIAN/ALASKAN NATIVE    0.004950
Name: SUSP_RACE_cleaned, dtype: float64

To look at other indicators of bias in cannabis arrests in New York City, five DataFrames are first made, one for each of the cannabis crime types: misdemeanor possession, violation possession, felony possession, misdemeanor sales, and felony sales. Indicators of arrest bias will be explored for each crime type in comparison to the overall set of cannabis crimes. These DataFrames are subsetted from the overall cannabis crimes DataFrame rather than from the DataFrame restricted to cases with suspect race reported, since that restriction would keep only 15.8% of cases and could bias the findings.

In [58]:
nyc_misd_poss = nyc[nyc.misd_poss == 1]

In [68]:
nyc_viol_poss = nyc[nyc.viol_poss == 1]

In [69]:
nyc_felony_poss = nyc[nyc.felony_poss == 1]

In [70]:
nyc_misd_sales = nyc[nyc.misd_sales == 1]

In [72]:
nyc_felony_sales = nyc[nyc.felony_sales == 1]

The first geographic feature to examine is the borough. As the overall cannabis arrest counts show, the Bronx and Brooklyn together account for the majority of arrests (about 72%). This is interesting because of the racial demographics of these two boroughs. The Bronx's populace is 36% black, 48% Latino, and only 14.5% non-Latino white, and Brooklyn's populace is 36% black, 20% Latino, and 36% non-Latino white. By contrast, Manhattan's populace is 16% black, 25% Latino, and 48% non-Latino white. Queens is 19% black, 27% Latino, and 30% non-Latino white; and Staten Island is 11% black, 17% Latino, and 65% non-Latino white (U.S. Census Bureau).

In [74]:
nyc['BORO_NM'].value_counts(normalize=True)

BRONX            0.394213
BROOKLYN         0.324119
MANHATTAN        0.211470
QUEENS           0.043817
STATEN ISLAND    0.025542
unknown          0.000840
Name: BORO_NM, dtype: float64

In [76]:
nyc_misd_poss['BORO_NM'].value_counts(normalize=True)

BRONX            0.402824
BROOKLYN         0.329284
MANHATTAN        0.203985
QUEENS           0.037272
STATEN ISLAND    0.025803
unknown          0.000832
Name: BORO_NM, dtype: float64

Notice that misdemeanor and felony possession charges are concentrated in the Bronx and Brooklyn, while violation possession charges are most common in Manhattan, narrowly ahead of Brooklyn. This suggests that cannabis crimes are punished very differently in New York City depending on which part of the city the crime takes place in.
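
The value counts in this section give each borough's share of a charge type. A complementary view is the share of each borough's own arrests that were charged as violations, which speaks more directly to differing treatment by location. The sketch below uses invented toy rows and `pd.crosstab` with row normalization; it is an assumed extension, not part of the original analysis.

```python
import pandas as pd

# Toy stand-in for the overall nyc DataFrame (rows are illustrative only).
toy = pd.DataFrame({
    'BORO_NM': ['BRONX', 'BRONX', 'BRONX', 'MANHATTAN', 'MANHATTAN'],
    'viol_poss': [0, 0, 1, 1, 1],
})

# normalize='index' makes each borough's row sum to 1, so the viol_poss == 1
# column is the share of that borough's arrests charged as violations.
viol_share_by_boro = pd.crosstab(toy['BORO_NM'], toy['viol_poss'], normalize='index')
print(viol_share_by_boro[1])
```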

In [77]:
nyc_viol_poss['BORO_NM'].value_counts(normalize=True)

MANHATTAN        0.354805
BROOKLYN         0.340360
BRONX            0.204894
QUEENS           0.088296
STATEN ISLAND    0.010908
unknown          0.000737
Name: BORO_NM, dtype: float64

In [80]:
nyc_felony_poss['BORO_NM'].value_counts(normalize=True)

BRONX            0.341411
BROOKLYN         0.328096
MANHATTAN        0.160852
QUEENS           0.150200
STATEN ISLAND    0.018642
unknown          0.000799
Name: BORO_NM, dtype: float64

Interestingly, Manhattan is second to the Bronx for misdemeanor sales arrests. It would be interesting to see which neighborhoods of Manhattan are responsible for this.

In [81]:
nyc_misd_sales['BORO_NM'].value_counts(normalize=True)

BRONX            0.384562
MANHATTAN        0.273324
BROOKLYN         0.222388
QUEENS           0.087082
STATEN ISLAND    0.031682
unknown          0.000963
Name: BORO_NM, dtype: float64

Brooklyn and the Bronx predominate for felony sales; again, it would be interesting to see which neighborhoods are responsible for these arrests. Police precincts offer a route to explore these smaller geographic zones.

In [82]:
nyc_felony_sales['BORO_NM'].value_counts(normalize=True)

BROOKLYN         0.340807
BRONX            0.299552
MANHATTAN        0.200000
QUEENS           0.129148
STATEN ISLAND    0.028700
unknown          0.001794
Name: BORO_NM, dtype: float64

Unfortunately, the precinct data was unlabeled in the dataset downloaded from the NYC Open Data project. However, after consulting the NYPD's website (https://www1.nyc.gov/site/nypd/bureaus/patrol/precincts-landing.page), the 10 precincts with the most cannabis arrests can be identified as:

43rd Precinct - Southeastern Bronx

75th Precinct - Easternmost Brooklyn (East New York and Cypress Hills)

44th Precinct - Southwestern Bronx

73rd Precinct - Northeastern Brooklyn (Brownsville and Ocean Hill)

46th Precinct - Central West Bronx (Fordham, University Heights, Morris Heights and Mount Hope)

40th Precinct - Southernmost Bronx (Port Morris, Mott Haven, and Melrose)

47th Precinct - Northern Bronx (Woodlawn, Wakefield, Williamsbridge, Baychester, Edenwald, Olinville, Fishbay, and Woodlawn Cemetery)

52nd Precinct - Northern Bronx section (Bedford Park, Fordham, Kingsbridge, Norwood, Bronx Park, and University Heights)

42nd Precinct - Morrisania section of the Bronx (Claremont, Crotona Park East, and Crotona Park)

67th Precinct - Central Brooklyn (East Flatbush and Remsen Village)

As can be seen, the 10 police precincts with the highest numbers of cannabis arrests are all in the Bronx and Brooklyn. Unfortunately, the demographics of these neighborhoods reflect the racial disparity seen in cannabis arrests.
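
Since the precinct codes are unlabeled in the dataset, the labels listed above could be attached with a lookup dictionary. This is a minimal sketch: the dictionary covers only the ten precincts named above, the toy codes are invented, and the 'unlabeled' fallback is an assumed convention.

```python
import pandas as pd

# Labels transcribed from the list above; only the top 10 precincts are covered.
precinct_labels = {
    43.0: 'Southeastern Bronx',
    75.0: 'Easternmost Brooklyn',
    44.0: 'Southwestern Bronx',
    73.0: 'Northeastern Brooklyn',
    46.0: 'Central West Bronx',
    40.0: 'Southernmost Bronx',
    47.0: 'Northern Bronx',
    52.0: 'Northern Bronx',
    42.0: 'Morrisania section of the Bronx',
    67.0: 'Central Brooklyn',
}

# Toy precinct codes standing in for nyc['ADDR_PCT_CD'] (illustrative only).
codes = pd.Series([43.0, 75.0, 43.0, 120.0], name='ADDR_PCT_CD')

# Precincts without a transcribed label fall back to 'unlabeled'.
labeled = codes.map(precinct_labels).fillna('unlabeled')
print(labeled.value_counts())
```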

There is also a Patrol Borough feature, as the NYPD divides the boroughs into several patrol boroughs. Value counts for Patrol Borough are included after each precinct code value count call as supporting context.

In [83]:
nyc['ADDR_PCT_CD'].value_counts()

43.0     16415
75.0     15552
44.0     12422
73.0     10099
46.0      9708
40.0      9473
47.0      9422
52.0      8924
42.0      7645
67.0      6243
77.0      6222
23.0      5510
32.0      5452
30.0      5256
71.0      5036
120.0     4051
25.0      3996
81.0      3885
48.0      3717
70.0      3090
60.0      3046
41.0      2822
49.0      2771
115.0     2768
69.0      2687
18.0      2599
6.0       2567
28.0      2515
79.0      2444
9.0       2269
         ...  
83.0       854
62.0       825
103.0      811
122.0      748
66.0       712
13.0       704
110.0      610
1.0        605
5.0        594
72.0       590
22.0       568
108.0      484
104.0      478
100.0      469
113.0      460
20.0       459
121.0      432
78.0       407
123.0      399
19.0       376
17.0       364
106.0      299
101.0      274
94.0       265
105.0      252
111.0      228
107.0      226
109.0      221
102.0      183
112.0      122
Name: ADDR_PCT_CD, Length: 77, dtype: int64

In [97]:
nyc['PATROL_BORO'].value_counts(normalize=True)

PATROL BORO BRONX            0.394526
PATROL BORO BKLYN NORTH      0.198012
PATROL BORO MAN NORTH        0.140859
PATROL BORO BKLYN SOUTH      0.126384
PATROL BORO MAN SOUTH        0.070557
PATROL BORO QUEENS NORTH     0.030608
PATROL BORO STATEN ISLAND    0.025551
PATROL BORO QUEENS SOUTH     0.013499
unknown                      0.000005
Name: PATROL_BORO, dtype: float64

The only difference for the DataFrame of just misdemeanor possession charges is that the 77th precinct takes the 10th place spot. This precinct is found in the northern section of the Crown Heights neighborhood of Brooklyn, an area mostly occupied by African-Americans of Caribbean descent.

In [93]:
nyc_misd_poss['ADDR_PCT_CD'].value_counts(normalize=True)

43.0     0.079341
75.0     0.074184
44.0     0.056555
73.0     0.047246
46.0     0.044883
40.0     0.044487
47.0     0.042900
52.0     0.041090
42.0     0.035989
77.0     0.029449
67.0     0.028850
23.0     0.025585
32.0     0.024895
30.0     0.023562
71.0     0.022888
120.0    0.018568
81.0     0.017898
25.0     0.017650
48.0     0.016286
70.0     0.014151
60.0     0.014045
49.0     0.012874
69.0     0.012488
115.0    0.012483
41.0     0.012154
18.0     0.011606
28.0     0.011180
7.0      0.010546
9.0      0.010440
79.0     0.010318
           ...   
122.0    0.003417
62.0     0.003402
66.0     0.003235
83.0     0.003189
22.0     0.002723
5.0      0.002697
1.0      0.002647
110.0    0.002515
103.0    0.002490
72.0     0.002393
121.0    0.002048
13.0     0.001993
104.0    0.001972
108.0    0.001957
20.0     0.001810
123.0    0.001785
78.0     0.001785
100.0    0.001729
17.0     0.001592
19.0     0.001486
94.0     0.001060
113.0    0.001009
106.0    0.001004
111.0    0.000867
107.0    0

In [98]:
nyc_misd_poss['PATROL_BORO'].value_counts(normalize=True)

PATROL BORO BRONX            0.403144
PATROL BORO BKLYN NORTH      0.202510
PATROL BORO MAN NORTH        0.137675
PATROL BORO BKLYN SOUTH      0.127058
PATROL BORO MAN SOUTH        0.066239
PATROL BORO QUEENS NORTH     0.028881
PATROL BORO STATEN ISLAND    0.025813
PATROL BORO QUEENS SOUTH     0.008675
unknown                      0.000005
Name: PATROL_BORO, dtype: float64

As shown below, the precinct with the most violation possession charges (the 14th) is Midtown South in Manhattan, which encompasses the Port Authority Bus Terminal, Penn Station, and Times Square. Consistent with this, the station with the largest share of transit-related violation possession arrests is the Port Authority Bus Terminal. This is an interesting finding, as one might have assumed that the historically African-American neighborhood of Harlem would be most heavily policed for cannabis arrests. Two of the other Manhattan precincts in the top 10 for violation possession (the 13th and the 18th) are also in Midtown; the third, the 25th, covers East Harlem. The 71st precinct is in central Brooklyn, encompassing the southern portion of Crown Heights, Wingate, and Prospect Lefferts.

In [94]:
nyc_viol_poss['ADDR_PCT_CD'].value_counts(normalize=True)

14.0     0.068396
75.0     0.059110
73.0     0.043337
40.0     0.037441
43.0     0.035820
71.0     0.031545
13.0     0.031250
25.0     0.028744
18.0     0.028302
67.0     0.025796
32.0     0.025354
47.0     0.023732
77.0     0.020637
81.0     0.018573
84.0     0.017983
30.0     0.017836
23.0     0.017836
79.0     0.016509
41.0     0.016362
115.0    0.015920
52.0     0.015330
28.0     0.014888
44.0     0.014151
49.0     0.014004
48.0     0.013856
88.0     0.013856
24.0     0.013119
46.0     0.012972
100.0    0.012382
9.0      0.011940
           ...   
34.0     0.007518
1.0      0.006928
19.0     0.006486
62.0     0.006486
61.0     0.006044
120.0    0.005896
108.0    0.005896
78.0     0.005749
17.0     0.005159
68.0     0.005012
5.0      0.004864
72.0     0.004570
110.0    0.004570
101.0    0.003538
109.0    0.003538
94.0     0.003390
106.0    0.003096
50.0     0.002948
83.0     0.002948
122.0    0.002801
104.0    0.002653
22.0     0.002506
111.0    0.002358
107.0    0.002211
102.0    0

In [99]:
nyc_viol_poss['PATROL_BORO'].value_counts(normalize=True)

PATROL BORO BRONX            0.205189
PATROL BORO BKLYN NORTH      0.204452
PATROL BORO MAN SOUTH        0.188237
PATROL BORO MAN NORTH        0.165979
PATROL BORO BKLYN SOUTH      0.136203
PATROL BORO QUEENS NORTH     0.048054
PATROL BORO QUEENS SOUTH     0.040979
PATROL BORO STATEN ISLAND    0.010908
Name: PATROL_BORO, dtype: float64

In [87]:
nyc_viol_poss['STATION_NAME'].value_counts(normalize=True)

not_transit_related               0.674086
42 ST.-PORT AUTHORITY BUS TERM    0.034051
125 STREET                        0.014446
42 ST.-TIMES SQUARE               0.011645
59 ST.-COLUMBUS CIRCLE            0.009876
3 AVENUE-149 STREET               0.008992
14 STREET                         0.008697
116 STREET                        0.008255
SIMPSON STREET                    0.005601
EAST 180 STREET                   0.005159
UNION SQUARE                      0.005159
GUN HILL ROAD                     0.005012
34 ST.-PENN STATION               0.004717
EAST 174 STREET                   0.004570
PROSPECT AVENUE                   0.004127
42 ST.-GRAND CENTRAL              0.003980
PELHAM PKWY.                      0.003685
3 AVENUE-138 STREET               0.003685
241 ST.-WAKEFIELD                 0.003685
28 STREET                         0.003538
HUNTS POINT AVENUE                0.003390
1 AVENUE                          0.003390
86 STREET                         0.003390
14 ST.-UNIO

The only newcomers to the top 10 for felony possession are the 113th and the 34th. The 113th is in Jamaica, Queens, and the 34th covers Washington Heights and Inwood, two neighborhoods north of Harlem in Manhattan. Both precincts serve predominantly African-American and Latino populations.
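
Spotting newcomers by eye across long value count listings is error-prone, so the comparison could be automated. The sketch below uses invented toy codes and a hypothetical helper, `top_precincts`. Note that it compares a subset's top k against the overall top k only, so a precinct already introduced in an earlier subset would still be flagged.

```python
import pandas as pd

def top_precincts(codes, k=10):
    """Return the k most frequent precinct codes, most frequent first."""
    return codes.value_counts().head(k).index.tolist()

# Toy codes standing in for the overall and felony possession
# ADDR_PCT_CD columns (illustrative only); k=2 keeps the example small.
overall = pd.Series([43.0, 43.0, 43.0, 75.0, 75.0, 47.0])
felony_poss = pd.Series([47.0, 47.0, 47.0, 113.0, 113.0, 43.0])

overall_top = top_precincts(overall, k=2)
subset_top = top_precincts(felony_poss, k=2)

# Precincts in the subset's top k that are absent from the overall top k.
newcomers = [code for code in subset_top if code not in overall_top]
print(newcomers)
```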

In [88]:
nyc_felony_poss['ADDR_PCT_CD'].value_counts(normalize=True)

47.0     0.099601
75.0     0.045273
52.0     0.044208
67.0     0.041012
46.0     0.039947
44.0     0.033822
113.0    0.029827
73.0     0.028229
34.0     0.023968
77.0     0.023169
79.0     0.022104
71.0     0.022104
33.0     0.019441
48.0     0.019174
69.0     0.019174
32.0     0.018642
43.0     0.018375
45.0     0.017843
81.0     0.016511
105.0    0.016511
40.0     0.015979
114.0    0.015446
50.0     0.014115
23.0     0.013848
25.0     0.013582
70.0     0.013316
41.0     0.013049
68.0     0.013049
103.0    0.013049
42.0     0.012783
           ...   
9.0      0.005859
72.0     0.005593
61.0     0.005593
76.0     0.005593
108.0    0.005326
88.0     0.005060
14.0     0.005060
122.0    0.004794
110.0    0.004527
1.0      0.004261
13.0     0.003995
84.0     0.003728
100.0    0.003728
66.0     0.003462
94.0     0.003462
112.0    0.003196
26.0     0.002929
102.0    0.002929
24.0     0.002397
111.0    0.002130
19.0     0.002130
20.0     0.002130
123.0    0.002130
5.0      0.002130
6.0      0

In [101]:
nyc_felony_poss['PATROL_BORO'].value_counts(normalize=True)

PATROL BORO BRONX            0.341678
PATROL BORO BKLYN NORTH      0.169907
PATROL BORO BKLYN SOUTH      0.158722
PATROL BORO MAN NORTH        0.117443
PATROL BORO QUEENS SOUTH     0.093742
PATROL BORO QUEENS NORTH     0.056458
PATROL BORO MAN SOUTH        0.043409
PATROL BORO STATEN ISLAND    0.018642
Name: PATROL_BORO, dtype: float64

For misdemeanor sales, the newcomer precincts are the 6th, the 33rd, and the 30th. The 6th encompasses Greenwich Village and the West Village, the 33rd covers Washington Heights, and the 30th is Western Harlem.

In [89]:
nyc_misd_sales['ADDR_PCT_CD'].value_counts(normalize=True)

44.0     0.087257
46.0     0.049711
52.0     0.045510
6.0      0.039209
33.0     0.038509
30.0     0.037896
43.0     0.037633
42.0     0.035183
47.0     0.032995
73.0     0.031507
40.0     0.029407
48.0     0.028969
75.0     0.028269
120.0    0.025643
32.0     0.023980
23.0     0.023455
41.0     0.022318
25.0     0.021180
103.0    0.018292
71.0     0.016629
79.0     0.016366
67.0     0.016191
28.0     0.015053
77.0     0.014616
34.0     0.014003
70.0     0.013653
115.0    0.013391
83.0     0.013215
81.0     0.013128
60.0     0.011465
           ...   
62.0     0.004464
24.0     0.004376
68.0     0.004026
26.0     0.003938
66.0     0.003676
84.0     0.003238
106.0    0.003238
63.0     0.003151
88.0     0.003063
104.0    0.002976
45.0     0.002976
107.0    0.002888
108.0    0.002888
112.0    0.002888
122.0    0.002626
111.0    0.002538
102.0    0.002276
100.0    0.002188
20.0     0.002100
109.0    0.002013
19.0     0.001925
123.0    0.001838
76.0     0.001575
94.0     0.001575
121.0    0

In [102]:
nyc_misd_sales['PATROL_BORO'].value_counts(normalize=True)

PATROL BORO BRONX            0.384737
PATROL BORO MAN NORTH        0.187292
PATROL BORO BKLYN NORTH      0.131367
PATROL BORO BKLYN SOUTH      0.091020
PATROL BORO MAN SOUTH        0.086557
PATROL BORO QUEENS SOUTH     0.048573
PATROL BORO QUEENS NORTH     0.038771
PATROL BORO STATEN ISLAND    0.031682
Name: PATROL_BORO, dtype: float64

For felony sales, the 79th precinct is a newcomer to the top 10 list, and it is joined by the 25th, which otherwise appears only in the violation possession top 10. They cover the Bedford-Stuyvesant neighborhood of Brooklyn and East Harlem, respectively. Again, both of these neighborhoods have a predominantly African-American and Latino population.

In [95]:
nyc_felony_sales['ADDR_PCT_CD'].value_counts(normalize=True)

46.0     0.044843
47.0     0.043946
40.0     0.043946
44.0     0.043049
67.0     0.034978
71.0     0.031390
52.0     0.026906
34.0     0.025112
79.0     0.024215
25.0     0.024215
75.0     0.024215
62.0     0.023318
32.0     0.023318
42.0     0.023318
43.0     0.022422
60.0     0.022422
61.0     0.020628
23.0     0.020628
73.0     0.018834
45.0     0.018834
77.0     0.017937
102.0    0.016143
69.0     0.015247
81.0     0.015247
70.0     0.014350
120.0    0.013453
114.0    0.013453
106.0    0.013453
30.0     0.013453
90.0     0.013453
           ...   
9.0      0.008072
10.0     0.008072
84.0     0.007175
88.0     0.007175
63.0     0.007175
105.0    0.007175
109.0    0.007175
48.0     0.007175
123.0    0.007175
122.0    0.006278
26.0     0.006278
7.0      0.006278
20.0     0.005381
66.0     0.005381
24.0     0.005381
112.0    0.005381
49.0     0.005381
110.0    0.005381
100.0    0.004484
108.0    0.004484
111.0    0.003587
107.0    0.003587
13.0     0.003587
5.0      0.003587
18.0     0

In [103]:
nyc_felony_sales['PATROL_BORO'].value_counts(normalize=True)

PATROL BORO BRONX            0.300448
PATROL BORO BKLYN SOUTH      0.200897
PATROL BORO MAN NORTH        0.154260
PATROL BORO BKLYN NORTH      0.140807
PATROL BORO QUEENS SOUTH     0.069955
PATROL BORO QUEENS NORTH     0.059193
PATROL BORO MAN SOUTH        0.045740
PATROL BORO STATEN ISLAND    0.028700
Name: PATROL_BORO, dtype: float64

An intriguing part of the NYPD dataset is a feature that describes the type of premises where the arrest occurred. As can be seen below, the majority of cannabis arrests happen either on the street or in New York City public housing. Violation possession charges are the exception: nearly a third of them (32%) occur in the New York City subway system.

In [96]:
nyc['PREM_TYP_DESC'].value_counts(normalize=True)

STREET                          0.578530
RESIDENCE - PUBLIC HOUSING      0.191180
RESIDENCE - APT. HOUSE          0.078051
PARK/PLAYGROUND                 0.058614
OTHER                           0.023649
TRANSIT - NYC SUBWAY            0.015637
PUBLIC BUILDING                 0.010440
RESIDENCE-HOUSE                 0.010100
unknown                         0.007767
PARKING LOT/GARAGE (PUBLIC)     0.005020
GROCERY/BODEGA                  0.002873
PUBLIC SCHOOL                   0.002569
OPEN AREAS (OPEN LOTS)          0.002565
PARKING LOT/GARAGE (PRIVATE)    0.001412
BAR/NIGHT CLUB                  0.000831
COMMERCIAL BUILDING             0.000803
MARINA/PIER                     0.000785
HIGHWAY/PARKWAY                 0.000658
TAXI (LIVERY LICENSED)          0.000617
FAST FOOD                       0.000531
TUNNEL                          0.000517
RESTAURANT/DINER                0.000468
AIRPORT TERMINAL                0.000440
CANDY STORE                     0.000422
STORE UNCLASSIFI

In [104]:
nyc_misd_poss['PREM_TYP_DESC'].value_counts(normalize=True)

STREET                          0.578841
RESIDENCE - PUBLIC HOUSING      0.201673
RESIDENCE - APT. HOUSE          0.077571
PARK/PLAYGROUND                 0.061772
OTHER                           0.023795
PUBLIC BUILDING                 0.010572
RESIDENCE-HOUSE                 0.008843
unknown                         0.007712
TRANSIT - NYC SUBWAY            0.006059
PARKING LOT/GARAGE (PUBLIC)     0.005299
OPEN AREAS (OPEN LOTS)          0.002708
GROCERY/BODEGA                  0.002155
PUBLIC SCHOOL                   0.001962
PARKING LOT/GARAGE (PRIVATE)    0.001404
MARINA/PIER                     0.000862
BAR/NIGHT CLUB                  0.000796
COMMERCIAL BUILDING             0.000664
HIGHWAY/PARKWAY                 0.000598
TAXI (LIVERY LICENSED)          0.000492
TUNNEL                          0.000482
RESTAURANT/DINER                0.000426
FAST FOOD                       0.000365
BUS TERMINAL                    0.000345
CANDY STORE                     0.000330
STORE UNCLASSIFI

In [105]:
nyc_viol_poss['PREM_TYP_DESC'].value_counts(normalize=True)

STREET                          0.364239
TRANSIT - NYC SUBWAY            0.320312
RESIDENCE - PUBLIC HOUSING      0.113060
RESIDENCE - APT. HOUSE          0.056309
PARK/PLAYGROUND                 0.029776
OTHER                           0.029186
PUBLIC SCHOOL                   0.017836
PUBLIC BUILDING                 0.010761
RESIDENCE-HOUSE                 0.009581
unknown                         0.007075
AIRPORT TERMINAL                0.005896
BUS (NYC TRANSIT)               0.004570
HOSPITAL                        0.003243
TRANSIT FACILITY (OTHER)        0.003096
GROCERY/BODEGA                  0.002506
PARKING LOT/GARAGE (PUBLIC)     0.001916
COMMERCIAL BUILDING             0.001769
BAR/NIGHT CLUB                  0.001621
TUNNEL                          0.001621
PARKING LOT/GARAGE (PRIVATE)    0.001179
HIGHWAY/PARKWAY                 0.001179
BUS TERMINAL                    0.001179
BRIDGE                          0.001032
CANDY STORE                     0.000884
GAS STATION     

In [106]:
nyc_felony_poss['PREM_TYP_DESC'].value_counts(normalize=True)

STREET                          0.609854
RESIDENCE - APT. HOUSE          0.149933
RESIDENCE - PUBLIC HOUSING      0.072170
RESIDENCE-HOUSE                 0.066045
OTHER                           0.018908
unknown                         0.009055
TAXI (LIVERY LICENSED)          0.008788
AIRPORT TERMINAL                0.008256
PUBLIC BUILDING                 0.006125
COMMERCIAL BUILDING             0.004794
TRANSIT - NYC SUBWAY            0.004527
GROCERY/BODEGA                  0.004527
HIGHWAY/PARKWAY                 0.003995
PARKING LOT/GARAGE (PUBLIC)     0.003728
PARK/PLAYGROUND                 0.003462
STORE UNCLASSIFIED              0.003462
PARKING LOT/GARAGE (PRIVATE)    0.002929
HOTEL/MOTEL                     0.001332
BRIDGE                          0.001332
TUNNEL                          0.001065
RESTAURANT/DINER                0.001065
SMALL MERCHANT                  0.001065
OPEN AREAS (OPEN LOTS)          0.001065
PRIVATE/PAROCHIAL SCHOOL        0.001065
FAST FOOD       

In [107]:
nyc_misd_sales['PREM_TYP_DESC'].value_counts(normalize=True)

STREET                          0.685717
RESIDENCE - PUBLIC HOUSING      0.102573
RESIDENCE - APT. HOUSE          0.073517
PARK/PLAYGROUND                 0.043235
OTHER                           0.019604
GROCERY/BODEGA                  0.014528
RESIDENCE-HOUSE                 0.011378
PUBLIC BUILDING                 0.009540
unknown                         0.008314
TRANSIT - NYC SUBWAY            0.004639
PUBLIC SCHOOL                   0.003676
FAST FOOD                       0.002888
PARKING LOT/GARAGE (PUBLIC)     0.002451
OPEN AREAS (OPEN LOTS)          0.001838
CANDY STORE                     0.001838
PARKING LOT/GARAGE (PRIVATE)    0.001313
COMMERCIAL BUILDING             0.001313
STORE UNCLASSIFIED              0.001138
LIQUOR STORE                    0.000963
BAR/NIGHT CLUB                  0.000963
RESTAURANT/DINER                0.000788
BUS TERMINAL                    0.000788
GAS STATION                     0.000700
VARIETY STORE                   0.000700
BEAUTY & NAIL SA

In [108]:
nyc_felony_sales['PREM_TYP_DESC'].value_counts(normalize=True)

STREET                         0.623318
RESIDENCE - PUBLIC HOUSING     0.119283
RESIDENCE - APT. HOUSE         0.099552
RESIDENCE-HOUSE                0.034081
OTHER                          0.021525
PARK/PLAYGROUND                0.018834
PUBLIC SCHOOL                  0.013453
unknown                        0.011659
PUBLIC BUILDING                0.008969
GROCERY/BODEGA                 0.007175
TRANSIT - NYC SUBWAY           0.006278
PARKING LOT/GARAGE (PUBLIC)    0.005381
OPEN AREAS (OPEN LOTS)         0.004484
FAST FOOD                      0.002691
HIGHWAY/PARKWAY                0.002691
BEAUTY & NAIL SALON            0.001794
DRY CLEANER/LAUNDRY            0.001794
GAS STATION                    0.001794
BAR/NIGHT CLUB                 0.001794
CLOTHING/BOUTIQUE              0.001794
TELECOMM. STORE                0.001794
RESTAURANT/DINER               0.001794
TAXI (LIVERY LICENSED)         0.001794
CANDY STORE                    0.000897
CHAIN STORE                    0.000897


As can be expected, the jurisdictions responsible for the majority of cannabis arrests are the NYPD, the New York City Housing Authority (NYCHA), and, to a much lesser degree, the N.Y. Transit Police. That 19% of cannabis arrests fall under the jurisdiction of the NYCHA shows how heavily policed these public housing projects are.

In [110]:
nyc['JURIS_DESC'].value_counts(normalize=True)

N.Y. POLICE DEPT                0.785102
N.Y. HOUSING POLICE             0.194285
N.Y. TRANSIT POLICE             0.016064
OTHER                           0.001625
PORT AUTHORITY                  0.001244
POLICE DEPT NYC                 0.000713
TRI-BORO BRDG TUNNL             0.000291
N.Y. STATE POLICE               0.000172
DEPT OF CORRECTIONS             0.000172
HEALTH & HOSP CORP              0.000095
NYC PARKS                       0.000064
N.Y. STATE PARKS                0.000050
NEW YORK CITY SHERIFF OFFICE    0.000036
LONG ISLAND RAILRD              0.000023
METRO NORTH                     0.000018
U.S. PARK POLICE                0.000018
STATN IS RAPID TRANS            0.000014
NYS DEPT TAX AND FINANCE        0.000009
AMTRACK                         0.000005
Name: JURIS_DESC, dtype: float64

In [111]:
nyc_misd_poss['JURIS_DESC'].value_counts(normalize=True)

N.Y. POLICE DEPT                0.785124
N.Y. HOUSING POLICE             0.204847
N.Y. TRANSIT POLICE             0.006313
OTHER                           0.001430
PORT AUTHORITY                  0.000796
POLICE DEPT NYC                 0.000715
TRI-BORO BRDG TUNNL             0.000218
DEPT OF CORRECTIONS             0.000167
N.Y. STATE POLICE               0.000117
HEALTH & HOSP CORP              0.000061
NYC PARKS                       0.000056
N.Y. STATE PARKS                0.000046
NEW YORK CITY SHERIFF OFFICE    0.000041
LONG ISLAND RAILRD              0.000020
U.S. PARK POLICE                0.000020
METRO NORTH                     0.000010
NYS DEPT TAX AND FINANCE        0.000010
STATN IS RAPID TRANS            0.000010
Name: JURIS_DESC, dtype: float64

The N.Y. Transit Police take the NYCHA's place as the second most common jurisdiction for violation possession charges (33% of those arrests, versus 12% for the housing police). This shows an interesting difference in how the different levels of cannabis charges are enforced, and reflects the fact that the premises type for violation possession is frequently the N.Y. subway system.

In [112]:
nyc_viol_poss['JURIS_DESC'].value_counts(normalize=True)

N.Y. POLICE DEPT       0.539357
N.Y. TRANSIT POLICE    0.326651
N.Y. HOUSING POLICE    0.115419
PORT AUTHORITY         0.007665
OTHER                  0.005896
TRI-BORO BRDG TUNNL    0.001769
HEALTH & HOSP CORP     0.001032
POLICE DEPT NYC        0.000737
DEPT OF CORRECTIONS    0.000442
NYC PARKS              0.000295
METRO NORTH            0.000295
LONG ISLAND RAILRD     0.000147
N.Y. STATE PARKS       0.000147
AMTRACK                0.000147
Name: JURIS_DESC, dtype: float64

In [114]:
nyc_felony_poss['JURIS_DESC'].value_counts(normalize=True)

N.Y. POLICE DEPT       0.901198
N.Y. HOUSING POLICE    0.074301
PORT AUTHORITY         0.011185
N.Y. TRANSIT POLICE    0.004527
OTHER                  0.003196
N.Y. STATE POLICE      0.002929
TRI-BORO BRDG TUNNL    0.002397
HEALTH & HOSP CORP     0.000266
Name: JURIS_DESC, dtype: float64
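
The per-charge jurisdiction breakdowns above are easier to compare when placed side by side in one table. The following is a minimal sketch of one way to do that; the helper name `compare_shares` and the toy frames are hypothetical stand-ins, not part of the original notebook.

```python
import pandas as pd


def compare_shares(frames, column):
    """Return one column of normalized value counts per named DataFrame.

    `frames` maps a label (e.g. a charge level) to a DataFrame; categories
    missing from a given frame are reported as a share of 0.
    """
    shares = {
        name: df[column].value_counts(normalize=True)
        for name, df in frames.items()
    }
    return pd.concat(shares, axis=1).fillna(0.0)


# Toy stand-ins for the real subsets (hypothetical data, for illustration only).
toy_misd = pd.DataFrame(
    {"JURIS_DESC": ["N.Y. POLICE DEPT"] * 3 + ["N.Y. HOUSING POLICE"]}
)
toy_viol = pd.DataFrame(
    {"JURIS_DESC": ["N.Y. POLICE DEPT", "N.Y. TRANSIT POLICE"]}
)

table = compare_shares(
    {"misd_poss": toy_misd, "viol_poss": toy_viol}, "JURIS_DESC"
)
print(table)
```

With the real data, the same call could be made with the notebook's own subsets, for example `compare_shares({'misd_poss': nyc_misd_poss, 'viol_poss': nyc_viol_poss, 'felony_poss': nyc_felony_poss}, 'JURIS_DESC')`.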

Because nearly 20% of all cannabis arrests occur in N.Y. housing projects, it pays to look at the 'HADEVELOPT' feature, which identifies the housing project in which the cannabis arrest occurred. Because roughly 80% of cannabis arrests occurred outside of N.Y. housing projects, most values of this feature are the placeholder 'not_housing_devpt_crime', so it makes sense for reporting purposes to first re-base the feature by removing those rows.

In [115]:
nyc['HADEVELOPT'].value_counts()

not_housing_devpt_crime                  193180
CASTLE HILL                                 851
BUTLER                                      814
BRONXDALE                                   655
SOUNDVIEW                                   616
LINDEN                                      574
MARCY                                       528
THROGGS NECK                                436
WHITMAN                                     419
MONROE                                      419
LINCOLN                                     413
PINK                                        372
SAINT MARY'S PARK                           355
INGERSOLL                                   353
CYPRESS HILLS                               341
GRANT                                       340
MITCHEL                                     340
BOULEVARD                                   331
WILLIAMSBURG                                324
ADAMS                                       316
BRONX RIVER                             

In [118]:
nyc_hadevelopt_reported = nyc[nyc.HADEVELOPT != 'not_housing_devpt_crime']

In [120]:
nyc_hadevelopt_reported['HADEVELOPT'].value_counts(normalize=True)

CASTLE HILL                              0.031373
BUTLER                                   0.030009
BRONXDALE                                0.024147
SOUNDVIEW                                0.022710
LINDEN                                   0.021161
MARCY                                    0.019465
THROGGS NECK                             0.016074
MONROE                                   0.015447
WHITMAN                                  0.015447
LINCOLN                                  0.015226
PINK                                     0.013714
SAINT MARY'S PARK                        0.013088
INGERSOLL                                0.013014
CYPRESS HILLS                            0.012571
MITCHEL                                  0.012535
GRANT                                    0.012535
BOULEVARD                                0.012203
WILLIAMSBURG                             0.011945
ADAMS                                    0.011650
BRONX RIVER                              0.011244


The top 10 N.Y. housing developments with the highest proportion of cannabis arrests are all in the South Bronx or in economically disadvantaged areas of Brooklyn.

Cannabis arrests occur more frequently during certain times of the day. 39% occur during the daytime (6 am - 6 pm), and 61% occur during the nighttime (6 pm - 6 am).

In [122]:
nyc['day_tw'].value_counts(normalize=True)

0    0.610998
1    0.389002
Name: day_tw, dtype: float64
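
For readers who want to reproduce flags of this kind, the following is a minimal sketch of how a daytime indicator could be derived from an arrest timestamp. The construction of `day_tw` is not shown in this section, so the timestamps and the exact boundary handling (6 am inclusive, 6 pm exclusive) are assumptions.

```python
import pandas as pd

# Hypothetical arrest timestamps chosen to sit on the window boundaries.
times = pd.to_datetime(
    pd.Series(
        [
            "2010-05-01 05:59",
            "2010-05-01 06:00",
            "2010-05-01 17:59",
            "2010-05-01 18:00",
        ]
    )
)

hour = times.dt.hour

# Assumed convention: daytime is 6 am (inclusive) to 6 pm (exclusive).
day_tw = ((hour >= 6) & (hour < 18)).astype(int)

# Nighttime is the complement, so the two flags always sum to 1.
night_tw = 1 - day_tw

print(pd.DataFrame({"time": times, "day_tw": day_tw, "night_tw": night_tw}))
```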

Looking at just the daytime hours, one sees that the work day (9 am - 6 pm) accounts for nearly all of those arrests (37.5% of all arrests, against 38.9% for the whole daytime window). Early morning (6 am - 7:30 am) and the morning rush hour (7:30 am - 9 am) have very few arrests (0.6% and 0.9%, respectively), but the rate picks up during the lunch hour (12-1 pm), which alone accounts for 3.9%.

In [124]:
nyc['early_morn'].value_counts(normalize=True)

0    0.994471
1    0.005529
Name: early_morn, dtype: float64

In [125]:
nyc['morn_rush_hr'].value_counts(normalize=True)

0    0.990717
1    0.009283
Name: morn_rush_hr, dtype: float64

In [126]:
nyc['work_day'].value_counts(normalize=True)

0    0.624743
1    0.375257
Name: work_day, dtype: float64

In [127]:
nyc['lunch_hr'].value_counts(normalize=True)

0    0.960782
1    0.039218
Name: lunch_hr, dtype: float64

The New York metropolitan area's long evening rush hour (4:30 pm - 7 pm) straddles the daytime (6 am - 6 pm) and nighttime (6 pm - 6 am) windows, and one sees a fairly high concentration of arrests (18%) in this window.

In [128]:
nyc['eve_rush_hr'].value_counts(normalize=True)

0    0.819251
1    0.180749
Name: eve_rush_hr, dtype: float64

The nighttime sees the majority of cannabis arrests, at 61%.

In [123]:
nyc['night_tw'].value_counts(normalize=True)

1    0.610998
0    0.389002
Name: night_tw, dtype: float64

Overlapping with the evening rush hour, the dinner window of 6-8 pm has a high concentration of arrests (17%) for just a two-hour window, nearly as many as occur in the 2.5-hour evening rush hour.

In [129]:
nyc['dinner'].value_counts(normalize=True)

0    0.825583
1    0.174417
Name: dinner, dtype: float64

Evening (8-10 pm) has a similarly high concentration of arrests, at 19% for a two-hour window.

In [130]:
nyc['evening'].value_counts(normalize=True)

0    0.809732
1    0.190268
Name: evening, dtype: float64

Late night (10 pm - 6 am) has 26% of the arrests over an 8-hour window. More than half of the nighttime arrests therefore do not happen during the nightlife hours, but after work and before the working population would typically go to bed.

In [131]:
nyc['late_night'].value_counts(normalize=True)

0    0.737995
1    0.262005
Name: late_night, dtype: float64
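
Because the time windows above differ in length (from 1.5 to 9 hours), their raw shares are not directly comparable. One way to put them on a common footing is to divide each share by the window's length. The sketch below does this with the shares reported in the outputs above and the window lengths stated in the text; the column names `share`, `hours`, and `share_per_hour` are introduced here for illustration only.

```python
import pandas as pd

# Shares copied from the value_counts outputs above; window lengths in hours
# follow the time ranges given in the text.
windows = pd.DataFrame(
    {
        "share": [0.005529, 0.009283, 0.375257, 0.180749,
                  0.174417, 0.190268, 0.262005],
        "hours": [1.5, 1.5, 9.0, 2.5, 2.0, 2.0, 8.0],
    },
    index=["early_morn", "morn_rush_hr", "work_day", "eve_rush_hr",
           "dinner", "evening", "late_night"],
)

# Share of all arrests per hour of the window.
windows["share_per_hour"] = windows["share"] / windows["hours"]

ranked = windows.sort_values("share_per_hour", ascending=False)
print(ranked)
```

Note that some windows overlap (for example, the evening rush hour and dinner), so the shares are not meant to sum to 1.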

It has been well reported that cannabis arrests reached their peak during Mayor Bloomberg's time in office. In this data set, 2006 has 15,127 arrests, rising to 24,468 in 2010. The number holds fairly steady in 2011 (23,827), drops somewhat in 2012 (20,611) as criticism of Bloomberg's "stop and frisk" program mounts, and then drops significantly in 2013 (16,206), the year the program was ruled unconstitutional by Judge Scheindlin (Goldstein, 2013). Mayor DeBlasio, who vowed to reverse the program, took office in 2014, but cannabis arrests in that year (15,787) remained close to the 2013 level. In 2015 the number dropped significantly again (11,424), though it was still fairly high. It stayed at about that level through 2017 (11,458), and then dropped by half in 2018 (5,739) as discussions of cannabis legalization in New York intensified.

In [138]:
nyc['start_year'].value_counts()

2010    24468
2011    23827
2009    23612
2012    20611
2008    20571
2007    19686
2013    16206
2014    15787
2006    15127
2016    11789
2017    11458
2015    11424
2018     5739
Name: start_year, dtype: int64
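
Since `value_counts()` sorts by count, the output above is not in chronological order. A small sketch of how the trend could be read more directly: the counts below are copied from the output above, sorted by year, and followed by the year-over-year change. With the real data, `nyc['start_year'].value_counts().sort_index()` would give the same ordered series.

```python
import pandas as pd

# Yearly counts copied from the start_year output above.
yearly = pd.Series(
    {
        2010: 24468, 2011: 23827, 2009: 23612, 2012: 20611, 2008: 20571,
        2007: 19686, 2013: 16206, 2014: 15787, 2006: 15127, 2016: 11789,
        2017: 11458, 2015: 11424, 2018: 5739,
    }
)

# Put the years in chronological order.
yearly = yearly.sort_index()

# Fractional change relative to the previous year (NaN for the first year).
yoy = yearly.pct_change().round(3)

print(pd.DataFrame({"arrests": yearly, "yoy_change": yoy}))
```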

Each month of the year has about the same number of cannabis arrests, but August has the highest number (20,707), and the number drops in November (15,925) and December (13,575) during the holiday season.

In [142]:
nyc['start_month'].value_counts()

8     20707
3     19767
5     19534
10    19425
9     19374
4     18936
7     18844
1     18328
6     18157
2     17733
11    15925
12    13575
Name: start_month, dtype: int64

Each day of the month has a fairly consistent number of cannabis arrests. Summed over the whole period, the 1st through the 30th each account for between 5,660 and 7,900 arrests. The number drops somewhat in the last 10 days of the month. The 31st has roughly half as many arrests (3,382) as the other days, in large part because only seven months of the year have 31 days.

In [143]:
nyc['start_day'].value_counts()

12    7900
8     7887
11    7852
10    7774
20    7750
13    7687
9     7685
3     7607
5     7605
16    7560
2     7524
15    7519
14    7434
6     7427
17    7350
21    7345
7     7329
4     7258
19    7253
18    7251
1     7231
22    6955
23    6925
27    6752
24    6688
28    6654
25    6559
26    6498
29    6004
30    5660
31    3382
Name: start_day, dtype: int64
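
To separate the calendar effect from any real end-of-month drop, the totals can be divided by how often each day of the month actually occurs. The following is a rough sketch under the assumption that the data covers every day from 2006-01-01 through 2018-12-31; the exact coverage dates are not stated in this section, so the window below is hypothetical.

```python
import pandas as pd

# ASSUMPTION: full coverage of 2006-01-01 through 2018-12-31 (hypothetical).
days = pd.date_range("2006-01-01", "2018-12-31", freq="D")

# How many times each day-of-month (1..31) occurs in the assumed window.
occurrences = pd.Series(days.day).value_counts().sort_index()

# A few arrest totals copied from the start_day output above.
arrests = pd.Series({1: 7231, 12: 7900, 29: 6004, 30: 5660, 31: 3382})

# Average arrests per occurrence of that day of the month.
per_occurrence = (arrests / occurrences.reindex(arrests.index)).round(1)

print(pd.DataFrame({"arrests": arrests,
                    "occurrences": occurrences.reindex(arrests.index),
                    "per_occurrence": per_occurrence}))
```

If the per-occurrence figures for the 29th through the 31st remain below those for earlier days, the calendar alone would not fully explain the drop; that reading depends on the coverage assumption above.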

Because of the importance of holidays to various cultural groups, and because of the differences in how certain groups of people are arrested for cannabis, it makes sense to look at whether certain holidays have higher concentrations of cannabis arrests. Due to the cultural diversity of New York City, certain holidays are included that would not typically be celebrated in other parts of the United States. Intriguingly, the holidays with the highest number of cannabis arrests are Hindu, Jewish, and Muslim holidays: Diwali has 656 arrests, Yom Kippur has 707, Rosh Hashanah has 677, Eid al-Fitr has 664, and Eid al-Adha has 544. St. Patrick's Day also has a high number at 542, which may be due to cannabis use co-occurring with the large amount of public drunkenness on New York City streets on that day.

In [167]:
nyc['new_years_day'].value_counts()

0    219925
1       380
Name: new_years_day, dtype: int64

In [168]:
nyc['new_years_eve'].value_counts()

0    220127
1       178
Name: new_years_eve, dtype: int64

In [169]:
nyc['christmas_eve'].value_counts()

0    220204
1       101
Name: christmas_eve, dtype: int64

In [170]:
nyc['christmas'].value_counts()

0    220246
1        59
Name: christmas, dtype: int64

In [171]:
nyc['july_4th'].value_counts()

0    219780
1       525
Name: july_4th, dtype: int64

In [172]:
nyc['valentines'].value_counts()

0    219774
1       531
Name: valentines, dtype: int64

In [173]:
nyc['halloween'].value_counts()

0    219855
1       450
Name: halloween, dtype: int64

In [174]:
nyc['st_patricks'].value_counts()

0    219763
1       542
Name: st_patricks, dtype: int64

In [175]:
nyc['mlk'].value_counts()

0    219902
1       403
Name: mlk, dtype: int64

In [176]:
nyc['pres'].value_counts()

0    219900
1       405
Name: pres, dtype: int64

In [177]:
nyc['easter'].value_counts()

0    220042
1       263
Name: easter, dtype: int64

In [178]:
nyc['diwali'].value_counts()

0    219649
1       656
Name: diwali, dtype: int64

In [179]:
nyc['pr_parade'].value_counts()

0    219869
1       436
Name: pr_parade, dtype: int64

In [180]:
nyc['yomkippur'].value_counts()

0    219598
1       707
Name: yomkippur, dtype: int64

In [181]:
nyc['rosh_hashanah'].value_counts()

0    219628
1       677
Name: rosh_hashanah, dtype: int64

In [182]:
nyc['eid_al_fitr'].value_counts()

0    219641
1       664
Name: eid_al_fitr, dtype: int64

In [183]:
nyc['eid_al_adha'].value_counts()

0    219761
1       544
Name: eid_al_adha, dtype: int64

In [184]:
nyc['hannukkah'].value_counts()

0    219874
1       431
Name: hannukkah, dtype: int64

In [185]:
nyc['memorial_day'].value_counts()

0    220080
1       225
Name: memorial_day, dtype: int64

In [186]:
nyc['labor_day'].value_counts()

0    219914
1       391
Name: labor_day, dtype: int64

In [187]:
nyc['thanksgiving'].value_counts()

0    220190
1       115
Name: thanksgiving, dtype: int64
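
One way to judge whether a holiday count is high or low is to compare it with the number of arrests an average day would contribute over the same period. The sketch below is a rough baseline only. It assumes that the data covers every day from 2006-01-01 through 2018-12-31 and that each holiday flag marks exactly one calendar day per year; neither assumption is confirmed in this section, and a multi-day flag would raise the expected count accordingly.

```python
from datetime import date

# Total arrests: both rows of the new_years_day output above (219925 + 380).
total_arrests = 219925 + 380

# ASSUMPTION: full daily coverage of 2006-01-01 through 2018-12-31.
assumed_days = (date(2018, 12, 31) - date(2006, 1, 1)).days + 1
assumed_years = 13

# ASSUMPTION: each flag marks exactly one calendar day per year, so an
# "average" flag would collect assumed_years days' worth of arrests.
expected_per_flag = total_arrests / assumed_days * assumed_years

# Counts copied from the holiday outputs above.
holiday_counts = {
    "yomkippur": 707,
    "rosh_hashanah": 677,
    "eid_al_fitr": 664,
    "diwali": 656,
    "st_patricks": 542,
    "thanksgiving": 115,
    "christmas": 59,
}

# Ratio above 1 means more arrests than an average day would contribute.
ratios = {
    name: round(count / expected_per_flag, 2)
    for name, count in holiday_counts.items()
}

for name, ratio in ratios.items():
    print(f"{name:15s} {ratio:.2f}")
```

Under these assumptions the baseline works out to roughly 600 arrests per flag, which gives a reference point for reading the counts above.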

The picture that emerges from the descriptive statistics of cannabis arrests in New York City between 2006 and 2018 is one of racial bias against African-Americans and Hispanics at all five levels of cannabis crimes. This is further supported by the geography of these arrests: from every angle, the areas hit hardest are boroughs, precincts, neighborhoods, and housing projects that are predominantly occupied by African-American and Hispanic residents. The arrests largely happen during the evening and early nighttime hours, and there are no large spikes in holiday arrests except on holidays intrinsically linked with religious minorities.

Citations:

Harcourt, B.E. & Ludwig, J., "Reefer Madness: Broken Windows Policing and Misdemeanor Marijuana Arrests in New York", University of Chicago Law School: Chicago Unbound, Working Papers, 2006, https://chicagounbound.uchicago.edu/cgi/viewcontent.cgi?article=1250&context=public_law_and_legal_theory

Levine, H., Sociology Department, Queens College, "Unjust and Unconstitutional", Marijuana Arrest Research Project and the Drug Policy Alliance, July 2017, https://www.drugpolicy.org/sites/default/files/Marijuana-Arrests-NYC--Unjust-Unconstitutional--July2017_2.pdf

Mueller, B., Gebeloff, R., Chinoy, S., "Surest Way to Face Marijuana Charges in New York: Be Black or Hispanic", New York Times, May 13, 2018, https://www.nytimes.com/2018/05/13/nyregion/marijuana-arrests-nyc-race.html

Results from the 2016 National Survey on Drug Use and Health: Detailed Tables, SAMHSA, 2016, https://www.samhsa.gov/data/sites/default/files/NSDUH-DetTabs-2016/NSDUH-DetTabs-2016.pdf

"New York State Penal Law", Article 221, No. 221 of 2016. Retrieved November 13, 2016.

Goldstein, J., "Judge Rejects New York's Stop-and-Frisk Policy", New York Times, August 12, 2013, https://www.nytimes.com/2013/08/13/nyregion/stop-and-frisk-practice-violated-rights-judge-rules.html