# NY Drug Crime Analysis
#### Proposal
Drug-related crimes have been a persistent issue in urban areas, including New York City.
Understanding the dynamics and patterns of these crimes has significant implications for law
enforcement and policymakers, and research in this area can provide insight into the factors
underlying drug-related crime in NYC.

This project aims to explore trends and patterns in drug crimes in NYC using available data
that records the precinct where each crime occurred, the time it occurred, when it was reported,
the offense description and level, whether the crime was completed, latitude-longitude
coordinates, and more.

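Several of these fields arrive as text in the raw export: `Lat_Lon` is a string such as `"(40.8162058439227, -73.8960011932583)"` and `Time` is an `HH:MM:SS` string. The sketch below shows one way to derive numeric columns from them, assuming the column names in the `datasets['Drug_Crime']` output further down and assuming `Year` is the year the crime occurred. The helper `parse_raw_fields` and the new columns `Lat`, `Lon`, `Hour` and `Report Lag (yrs)` are hypothetical, not part of `preprocess_utils`.

```python
import pandas as pd


def parse_raw_fields(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: derive numeric Lat, Lon, Hour and report-lag columns."""
    out = df.copy()
    # 'Lat_Lon' holds text such as "(40.8162058439227, -73.8960011932583)"
    coords = out['Lat_Lon'].str.extract(r'\(\s*([^,\s]+)\s*,\s*([^)\s]+)\s*\)')
    out['Lat'] = pd.to_numeric(coords[0], errors='coerce')
    out['Lon'] = pd.to_numeric(coords[1], errors='coerce')
    # 'Time' is "HH:MM:SS"; keep only the hour of day (0-23)
    times = pd.to_datetime(out['Time'], format='%H:%M:%S', errors='coerce')
    out['Hour'] = times.dt.hour
    # Whole years between the crime ('Year', assumed) and its report ('Reported on:')
    out['Report Lag (yrs)'] = out['Reported on:'] - out['Year']
    return out
```

For example, the second row of that output (year 2000, reported 2006) would get a lag of 6 years.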
We will also investigate how these crimes relate to other factors, such as geography,
population, and race, and compare districts with one another. Using the lat-lon coordinates in
the data, we aim to identify geographic hotspots and coldspots of drug-related crime within NYC.
We will examine temporal patterns to learn at what times of day drug-related crimes are most
likely to occur, which can shed light on potential contributing factors. Finally, we intend to
categorize drug-related offenses by type and severity, offering a deeper understanding of the
nature of these crimes in the city.

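A minimal sketch of the analyses described above, assuming numeric `Lat`, `Lon` and `Hour` columns have already been derived from `Lat_Lon` and `Time`, plus the `BORO_NM` and `Crime Category` columns and a census table with `GeoType`, `Borough` and `All Pop_20` columns. All function names are hypothetical. The hotspot function is a plain grid count (0.01° cells, roughly 1.1 km north-south by 0.85 km east-west at NYC's latitude), not a formal spatial statistic, and the per-capita rate assumes borough populations from the 2020 census.

```python
import numpy as np
import pandas as pd


def grid_hotspots(df: pd.DataFrame, cell_deg: float = 0.01, top: int = 5) -> pd.Series:
    """Count crimes per lat-lon grid cell and return the `top` busiest cells."""
    cells = pd.DataFrame({
        # Snap each point to the south-west corner of its grid cell
        'lat_cell': (np.floor(df['Lat'] / cell_deg) * cell_deg).round(6),
        'lon_cell': (np.floor(df['Lon'] / cell_deg) * cell_deg).round(6),
    })
    return cells.value_counts().head(top)


def crimes_by_hour(df: pd.DataFrame) -> pd.Series:
    """Number of crimes in each hour of the day (0-23), with empty hours as 0."""
    return df['Hour'].value_counts().reindex(range(24), fill_value=0)


def severity_by_borough(df: pd.DataFrame) -> pd.DataFrame:
    """Cross-tabulate offense level (e.g. FELONY, MISDEMEANOR) against borough."""
    return pd.crosstab(df['BORO_NM'], df['Crime Category'])


def rate_per_100k(df: pd.DataFrame, census: pd.DataFrame) -> pd.Series:
    """Crimes per 100,000 residents in each borough, highest first."""
    counts = df['BORO_NM'].str.title().value_counts()  # 'BRONX' -> 'Bronx'
    boros = census[census['GeoType'] == 'Boro']
    pop = pd.Series(boros['All Pop_20'].to_numpy(), index=boros['Borough'].to_numpy())
    return (counts / pop.reindex(counts.index) * 100_000).sort_values(ascending=False)
```

Dividing by population keeps the borough comparison from simply tracking borough size.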
Our project will involve data cleaning and preprocessing to ensure data quality and
accuracy. Several columns in the dataset contain NaNs or ambiguous values (e.g., precinct
numbers), and we will handle these before analysis. We will then use spatial analysis, time
series analysis, and data visualization to achieve the project’s objectives.

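One way to handle the precinct column, sketched under assumptions: `Precinct` prints as a float (`41.0`) in the output below, which is what pandas does to an integer column that contains NaNs, and pandas' nullable `Int64` dtype can hold whole precinct numbers alongside missing values. Treating every non-numeric or fractional precinct as missing is an assumed definition of "ambiguous"; `missing_report` and `clean_precinct` are hypothetical helpers.

```python
import pandas as pd


def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """Count and share of missing values per column, worst first."""
    n_missing = df.isna().sum()
    return (pd.DataFrame({'n_missing': n_missing, 'share': n_missing / len(df)})
            .sort_values('n_missing', ascending=False))


def clean_precinct(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with Precinct as a nullable integer column."""
    out = df.copy()
    # Non-numeric codes become NaN
    precinct = pd.to_numeric(out['Precinct'], errors='coerce')
    # Fractional values are not valid precinct numbers either
    precinct = precinct.where(precinct % 1 == 0)
    out['Precinct'] = precinct.astype('Int64')
    return out
```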
#### Data Sources
* Drug Crime | NYC Open Data (cityofnewyork.us)
* BYTES of the BIG APPLE - DCP (nyc.gov)
* 2020 Census - DCP (nyc.gov)

In [1]:
# Imports
import preprocess_utils as pu
import data_utils as du
import plotly.express as px

In [2]:
raw_datasets = pu.import_data()
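# Mapping of name -> DataFrame; this notebook uses the 'Drug_Crime' and 'Census' keys
# (the census table's columns carry 'All', 'Hispanic', 'Asian', 'Black', 'White' prefixes)
datasets = pu.preprocess_datasets(raw_datasets)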

In [3]:
datasets['Drug_Crime']

| | Year | Time | Precinct | Reported on: | Description | NYC Penal Code | Crime | Completed? | Crime Category | BORO_NM | ... | PREM_TYP_DESC | JURIS_DESC | PARKS_NM | HADEVELOPT | X_COORD_CD | Y_COORD_CD | Lat_Lon | Time of Day | Phone | Address |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2011 | 02:00:00 | 41.0 | 2011 | DANGEROUS DRUGS | 503 | POSS. OF CONTROLLED SUBSTANCE W/ INTENT TO SELL | True | FELONY | BRONX | ... | STREET | N.Y. POLICE DEPT | Not at a park | Not at a HA dev | 1013037.0 | 236657.0 | (40.8162058439227, -73.8960011932583) | night | 718-542-4771 | 1035 Longwood Avenue |
| 1 | 2000 | 19:10:00 | 75.0 | 2006 | DANGEROUS DRUGS | 511 | 7 DEG POSS. OF CONTROLLED | True | MISDEMEANOR | BROOKLYN | ... | STREET | N.Y. POLICE DEPT | Not at a park | Not at a HA dev | 1017036.0 | 183890.0 | (40.6713598203364, -73.8818110231735) | night | 718-827-3511 | 1000 Sutter Avenue |
| 2 | 2005 | 21:25:00 | 113.0 | 2006 | DANGEROUS DRUGS | 503 | POSS. OF CONTROLLED SUBSTANCE W/ INTENT TO SELL | True | FELONY | QUEENS | ... | RESIDENCE-HOUSE | N.Y. POLICE DEPT | Not at a park | Not at a HA dev | 1046315.0 | 187088.0 | (40.6799807384666, -73.7762339071953) | night | 718-712-7733 | 167-02 Baisley Boulevard |
| 3 | 2005 | 21:50:00 | 42.0 | 2006 | DANGEROUS DRUGS | 503 | POSS. OF CONTROLLED SUBSTANCE W/ INTENT TO SELL | True | FELONY | BRONX | ... | STREET | N.Y. POLICE DEPT | Not at a park | Not at a HA dev | 1008690.0 | 238862.0 | (40.8222710411331, -73.911697780277) | night | 718-402-3887 | 830 Washington Avenue |
| 4 | 2005 | 19:18:00 | 48.0 | 2007 | DANGEROUS DRUGS | 511 | 7 DEG POSS. OF CONTROLLED | True | MISDEMEANOR | BRONX | ... | RESIDENCE - APT. HOUSE | N.Y. POLICE DEPT | Not at a park | Not at a HA dev | 1011751.0 | 246839.0 | (40.8441566000203, -73.9006054489734) | night | 718-299-3900 | 450 Cross Bronx Expressway |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 443485 | 2022 | 22:59:00 | 48.0 | 2022 | DANGEROUS DRUGS | 503 | POSS. OF CONTROLLED SUBSTANCE W/ INTENT TO SELL | True | FELONY | BRONX | ... | STREET | N.Y. POLICE DEPT | Not at a park | Not at a HA dev | 1014655.0 | 248327.0 | (40.848224, -73.890098) | night | 718-299-3900 | 450 Cross Bronx Expressway |
| 443486 | 2022 | 11:53:00 | 28.0 | 2022 | DANGEROUS DRUGS | 501 | 3, 4, 5 DEG POSS. OF CONTROLLED SUBSTANCE | True | FELONY | MANHATTAN | ... | STREET | N.Y. POLICE DEPT | Not at a park | Not at a HA dev | 997602.0 | 230430.0 | (40.799146, -73.951772) | morning | 212-678-1611 | 2271-89 8th Avenue |
| 443487 | 2022 | 21:05:00 | 52.0 | 2022 | DANGEROUS DRUGS | 503 | POSS. OF CONTROLLED SUBSTANCE W/ INTENT TO SELL | True | FELONY | BRONX | ... | STREET | N.Y. POLICE DEPT | Not at a park | Not at a HA dev | 1016545.0 | 255351.0 | (40.867497, -73.883234) | night | 718-220-5811 | 3016 Webster Avenue |
| 443488 | 2022 | 02:55:00 | 71.0 | 2022 | DANGEROUS DRUGS | 510 | POSS. OF CONTROLLED SUBSTANCE W/ INTENT TO SELL | True | FELONY | BROOKLYN | ... | TRANSIT - NYC SUBWAY | N.Y. TRANSIT POLICE | Not at a park | Not at a HA dev | 995908.0 | 183618.0 | (40.67065802, -73.95797447) | night | 718-735-0511 | 421 Empire Boulevard |


In [4]:
datasets['Census']

| GeoID | GeoType | Borough | Name | Hispanic Pop Change | Hispanic Natural Change | Hispanic Net Migration | Hispanic Pop_10 | Hispanic Pop_20 | All Pop Change | All Natural Change | ... | Asian Pop Change | Asian Natural Change | Asian Net Migration | Asian Pop_10 | Asian Pop_20 | White Pop Change | White Natural Change | White Net Migration | White Pop_10 | White Pop_20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | NYC | New York City | NYC (adjusted for citywide total population in... | 154274 | 242213 | -87939 | 2336076 | 2490350 | 561566 | 612638 | ... | 345383 | 148946 | 196437 | 1028119 | 1373502 | -3048 | 126719 | -129767 | 2722904 | 2719856 |
| 1 | Boro | Manhattan | Manhattan | -937 | 24328 | -25265 | 403577 | 402640 | 108378 | 81949 | ... | 42000 | 19511 | 22489 | 177624 | 219624 | 31801 | 38086 | -6285 | 761493 | 793294 |
| 2 | Boro | Bronx | Bronx | 65050 | 84961 | -19911 | 741413 | 806463 | 87546 | 114402 | ... | 20431 | 8012 | 12419 | 47335 | 67766 | -20413 | -8271 | -12142 | 151209 | 130796 |
| 3 | Boro | Brooklyn | Brooklyn | 20141 | 50358 | -30217 | 496285 | 516426 | 231374 | 246479 | ... | 110647 | 49274 | 61373 | 260129 | 370776 | 75121 | 103482 | -28361 | 893306 | 968427 |
| 4 | Boro | Queens | Queens | 54111 | 72532 | -18421 | 613750 | 667861 | 174742 | 152976 | ... | 148249 | 68698 | 79551 | 508334 | 656583 | -67369 | -5932 | -61437 | 616727 | 549358 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| SI0391 | NTA2020 | Staten Island | Freshkills Park (South) | -9 | 2 | -11 | 44 | 35 | -2 | 9 | ... | 4 | 0 | 4 | 15 | 19 | -7 | 6 | -13 | 38 | 31 |
| SI9561 | NTA2020 | Staten Island | Fort Wadsworth | -3 | 29 | -32 | 157 | 154 | -236 | 120 | ... | 9 | 5 | 4 | 6 | 15 | -225 | 72 | -297 | 463 | 238 |
| SI9591 | NTA2020 | Staten Island | Hoffman & Swinburne Islands | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| SI9592 | NTA2020 | Staten Island | Miller Field | -2 | 0 | -2 | 10 | 8 | 15 | -1 | ... | 7 | 0 | 7 | 0 | 7 | 7 | 0 | 7 | 12 | 19 |


In [5]:
# X = datasets['Census'][['Hispanic Pop_10', 'Asian Pop_10', 'White Pop_10', 'Black Pop_10']]
# X

In [6]:
import pandas as pd
import numpy as np

In [13]:
# The first six rows hold the citywide total followed by the five boroughs
total_census = datasets['Census'].iloc[0:6].set_index('Borough', drop=False)
total_census

| Borough | GeoType | Borough | Name | Hispanic Pop Change | Hispanic Natural Change | Hispanic Net Migration | Hispanic Pop_10 | Hispanic Pop_20 | All Pop Change | All Natural Change | ... | Asian Pop Change | Asian Natural Change | Asian Net Migration | Asian Pop_10 | Asian Pop_20 | White Pop Change | White Natural Change | White Net Migration | White Pop_10 | White Pop_20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| New York City | NYC | New York City | NYC (adjusted for citywide total population in... | 154274 | 242213 | -87939 | 2336076 | 2490350 | 561566 | 612638 | ... | 345383 | 148946 | 196437 | 1028119 | 1373502 | -3048 | 126719 | -129767 | 2722904 | 2719856 |
| Manhattan | Boro | Manhattan | Manhattan | -937 | 24328 | -25265 | 403577 | 402640 | 108378 | 81949 | ... | 42000 | 19511 | 22489 | 177624 | 219624 | 31801 | 38086 | -6285 | 761493 | 793294 |
| Bronx | Boro | Bronx | Bronx | 65050 | 84961 | -19911 | 741413 | 806463 | 87546 | 114402 | ... | 20431 | 8012 | 12419 | 47335 | 67766 | -20413 | -8271 | -12142 | 151209 | 130796 |
| Brooklyn | Boro | Brooklyn | Brooklyn | 20141 | 50358 | -30217 | 496285 | 516426 | 231374 | 246479 | ... | 110647 | 49274 | 61373 | 260129 | 370776 | 75121 | 103482 | -28361 | 893306 | 968427 |
| Queens | Boro | Queens | Queens | 54111 | 72532 | -18421 | 613750 | 667861 | 174742 | 152976 | ... | 148249 | 68698 | 79551 | 508334 | 656583 | -67369 | -5932 | -61437 | 616727 | 549358 |
| Staten Island | Boro | Staten Island | Staten Island | 15909 | 10168 | 5741 | 81051 | 96960 | 27017 | 17822 | ... | 24056 | 3490 | 20566 | 34697 | 58753 | -22188 | -62 | -22126 | 300169 | 277981 |

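`iloc[0:6]` depends on the citywide total and the five boroughs being the first six rows. A position-independent sketch that filters on the `GeoType` values visible in the census output (`NYC` and `Boro`) instead; `select_city_and_boroughs` is a hypothetical helper, and it assumes no other rows use those two `GeoType` values.

```python
import pandas as pd


def select_city_and_boroughs(census: pd.DataFrame) -> pd.DataFrame:
    """Keep the citywide and borough rows, indexed by borough name."""
    mask = census['GeoType'].isin(['NYC', 'Boro'])
    # drop=False keeps 'Borough' as a column too, matching total_census
    return census[mask].set_index('Borough', drop=False)
```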

In [14]:
filtered_census_10 = du.filter_by_boro_feature(total_census, feature='Pop_10').drop(columns=['All'])
filtered_census_10

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  filtered_dataset.rename(columns=rename_cols, inplace=True)


| Borough | Hispanic | Black | Asian | White |
|---|---|---|---|---|
| New York City | 2336076 | 1861295 | 1028119 | 2722904 |
| Manhattan | 403577 | 205340 | 177624 | 761493 |
| Bronx | 741413 | 416695 | 47335 | 151209 |
| Brooklyn | 496285 | 799066 | 260129 | 893306 |
| Queens | 613750 | 395881 | 508334 | 616727 |
| Staten Island | 81051 | 44313 | 34697 | 300169 |

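The `SettingWithCopyWarning` printed above comes from `filtered_dataset.rename(columns=rename_cols, inplace=True)` inside `du.filter_by_boro_feature`: pandas cannot tell whether the column subset being renamed in place is a view of `total_census` or a copy. The source of `data_utils` is not shown, so the version below is a guess at its behavior (keep the columns ending in `feature`, strip that suffix), which is consistent with the `Hispanic`, `Black`, `Asian`, `White` output. Taking an explicit `.copy()` and calling `rename` without `inplace` avoids the warning.

```python
import pandas as pd


def filter_by_boro_feature(df: pd.DataFrame, feature: str) -> pd.DataFrame:
    """Hypothetical re-implementation: keep '<Group> <feature>' columns, renamed to '<Group>'."""
    cols = [c for c in df.columns if isinstance(c, str) and c.endswith(' ' + feature)]
    # .copy() gives an independent frame, and rename() without inplace returns a
    # new object, so nothing is written into a slice of df
    subset = df[cols].copy()
    return subset.rename(columns={c: c[:-len(feature)].rstrip() for c in cols})
```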

In [15]:
filtered_census_20 = du.filter_by_boro_feature(total_census, feature='Pop_20').drop(columns=['All'])
filtered_census_20



| Borough | Hispanic | Black | Asian | White |
|---|---|---|---|---|
| New York City | 2490350 | 1776891 | 1373502 | 2719856 |
| Manhattan | 402640 | 199592 | 219624 | 793294 |
| Bronx | 806463 | 419393 | 67766 | 130796 |
| Brooklyn | 516426 | 729696 | 370776 | 968427 |
| Queens | 667861 | 381375 | 656583 | 549358 |
| Staten Island | 96960 | 46835 | 58753 | 277981 |


In [16]:
# Scale each row to unit Euclidean (L2) norm; since counts are non-negative, every value lands in [0, 1]
filtered_census_10 = du.normalize(filtered_census_10)
filtered_census_20 = du.normalize(filtered_census_20)
filtered_census_10

| Borough | Hispanic | Black | Asian | White |
|---|---|---|---|---|
| New York City | 0.560146 | 0.446302 | 0.246523 | 0.652899 |
| Manhattan | 0.446641 | 0.227251 | 0.196578 | 0.842749 |
| Bronx | 0.857006 | 0.481661 | 0.054715 | 0.174784 |
| Brooklyn | 0.375106 | 0.603956 | 0.196613 | 0.675186 |
| Queens | 0.566887 | 0.365654 | 0.46952 | 0.569637 |
| Staten Island | 0.256513 | 0.140244 | 0.10981 | 0.949986 |

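The output is consistent with `du.normalize` dividing each row by its Euclidean (L2) norm: squaring and summing any row above gives 1 (for New York City, 0.560146² + 0.446302² + 0.246523² + 0.652899² ≈ 1). Since populations are non-negative, every entry falls in [0, 1], so each row describes a borough's relative mix rather than its size. A minimal sketch of that operation; the body is an assumption, since `data_utils` is not shown.

```python
import numpy as np
import pandas as pd


def normalize(df: pd.DataFrame) -> pd.DataFrame:
    """Divide every row by its Euclidean (L2) norm."""
    norms = np.sqrt((df ** 2).sum(axis=1))
    # An all-zero row has norm 0, so 0 / 0 leaves NaN in that row instead of raising
    return df.div(norms, axis=0)


# Manhattan's 2010 counts from the filtered_census_10 table
manhattan_10 = pd.DataFrame(
    {'Hispanic': [403577], 'Black': [205340], 'Asian': [177624], 'White': [761493]},
    index=['Manhattan'],
)
print(normalize(manhattan_10).round(6))
```

An all-zero row, such as Hoffman & Swinburne Islands in the NTA-level census table, would come back as NaN.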

In [17]:
filtered_census_20

| Borough | Hispanic | Black | Asian | White |
|---|---|---|---|---|
| New York City | 0.576765 | 0.411528 | 0.318103 | 0.629919 |
| Manhattan | 0.429336 | 0.212826 | 0.234186 | 0.845892 |
| Bronx | 0.875777 | 0.455439 | 0.07359 | 0.142038 |
| Brooklyn | 0.377197 | 0.532969 | 0.270814 | 0.707338 |
| Queens | 0.580336 | 0.331395 | 0.570536 | 0.477363 |
| Staten Island | 0.319113 | 0.154142 | 0.193367 | 0.914886 |


In [94]:
import plotly.graph_objects as go
from plotly.subplots import make_subplots

boroughs = ['New York City', 'Manhattan', 'Bronx', 'Brooklyn', 'Queens', 'Staten Island']

# 2 x 3 grid of polar (radar) subplots; titles fill the grid left to right, top to bottom
fig = make_subplots(rows=2, cols=3,
                    start_cell="top-left",
                    specs=[[{'type': 'polar'}] * 3] * 2,
                    subplot_titles=boroughs
                   )

for i, borough in enumerate(boroughs):
    # Place traces in the same order as the titles: (1,1), (1,2), (1,3), (2,1), ...
    row, col = i // 3 + 1, i % 3 + 1
    first_plot = (i == 0)  # add each legend entry only once
    fig.add_trace(go.Scatterpolar(
                r=filtered_census_10.loc[borough],
                theta=list(filtered_census_10.columns),
                fill='toself',
                name='2010 Census',
                legendgroup='2010',
                showlegend=first_plot,
                fillcolor='rgba(255, 87, 51, 0.75)',
                line={'color': 'rgba(255, 87, 51, 0.2)'}),
            row=row, col=col)
    fig.add_trace(go.Scatterpolar(
                r=filtered_census_20.loc[borough],
                theta=list(filtered_census_20.columns),
                fill='toself',
                name='2020 Census',
                legendgroup='2020',
                showlegend=first_plot,
                fillcolor='rgba(207, 159, 255, 0.5)',
                line={'color': 'rgba(207, 159, 255, 0.5)'}),
            row=row, col=col)

    # Lift each subplot title clear of its radar chart
    fig.update_annotations(y=(1.05 if row == 1 else 0.45), selector={'text': borough})

# update_polars applies to every polar subplot (polar, polar2, ..., polar6);
# update_layout(polar=...) would only change the first one
fig.update_polars(radialaxis=dict(visible=True, range=[0, 1]))

fig.show()


In [None]:
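# Single radar chart of Manhattan's normalized 2010 mix, drawn with plotly express
manhattan_10 = filtered_census_10.loc['Manhattan']
plot_df = pd.DataFrame({'r': manhattan_10.to_numpy(), 'theta': list(manhattan_10.index)})
fig = px.line_polar(plot_df, r='r', theta='theta', line_close=True)
fig.update_traces(fill='toself')
fig.show()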