# 🔍 NYC Crime Hotspot Analysis with Temporal Event Impact

## Project Overview
This notebook analyzes crime hotspots in NYC and their temporal dynamics around major events.

**Final DataFrames:**
- `df_crimes`: Filtered arrest data (5 crime categories, valid coordinates)
- `df_events`: Major NYC events (Halloween, Independence Day, NYC Marathon)

**Analysis Objectives:**
1. Load and preprocess NYC arrest data (2024)
2. Load and filter major event data (2024)
3. Perform temporal hotspot analysis around events
4. Compare hotspot dynamics: baseline vs. event periods
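
Objectives 3 and 4 depend on splitting arrests into an event window and a comparison baseline. The sketch below shows one possible way to do that labeling. The helper name `label_event_windows`, the 3-day window on either side of the event, and the 14-day baseline are illustrative assumptions, not choices made in this notebook. It expects an `arrest_datetime` column like the one built during preprocessing.

```python
import pandas as pd

def label_event_windows(arrests, event_date, window_days=3, baseline_days=14):
    """Tag each arrest as 'event', 'baseline', or 'other' relative to one event date.

    'event'    : within window_days before or after the event date
    'baseline' : the baseline_days immediately preceding the event window
    (Both window sizes are hypothetical defaults.)
    """
    event_date = pd.Timestamp(event_date)
    # Signed distance in whole days between each arrest and the event
    offset = (arrests["arrest_datetime"].dt.normalize() - event_date).dt.days
    period = pd.Series("other", index=arrests.index)
    period[offset.abs() <= window_days] = "event"
    in_baseline = (offset < -window_days) & (offset >= -(window_days + baseline_days))
    period[in_baseline] = "baseline"
    return period

# Toy data: one arrest per day from 2024-10-01 through 2024-11-10
toy = pd.DataFrame({"arrest_datetime": pd.date_range("2024-10-01", "2024-11-10", freq="D")})
toy["period"] = label_event_windows(toy, "2024-10-31")
counts = toy["period"].value_counts()
print(counts)
```

Dividing each count by the number of days in its window gives a daily rate, which makes an event window and a longer baseline directly comparable.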

In [None]:
# ========================================
# 1. IMPORT LIBRARIES
# ========================================
import contextily as ctx
import pandas as pd
import numpy as np
from sklearn.cluster import DBSCAN
import matplotlib.pyplot as plt
from scipy.spatial import ConvexHull
import seaborn as sns
from sodapy import Socrata
import warnings
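warnings.filterwarnings('ignore')

print("✅ All libraries imported successfully")
print(f"NumPy version: {np.__version__}")
print(f"Pandas version: {pd.__version__}")

✅ All libraries imported successfully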
NumPy version: 2.3.2
Pandas version: 2.3.2


In [14]:
# ========================================
# 2. LOAD ARREST DATA
# ========================================

# Load 2024 arrest data from NYC Open Data API
print("📥 Loading NYC Arrest Data (2024)...")
client = Socrata("data.cityofnewyork.us", None)
results = client.get("8h9b-rp9u", 
                     where="arrest_date >= '2024-01-01T00:00:00' AND arrest_date < '2025-01-01T00:00:00.000'",
                     limit=350000)
df = pd.DataFrame.from_records(results)

print(f"✅ Loaded {len(df):,} total arrest records")
print(f"📅 Date range: {df['arrest_date'].min()} to {df['arrest_date'].max()}")



📥 Loading NYC Arrest Data (2024)...
✅ Loaded 260,503 total arrest records
📅 Date range: 2024-01-01T00:00:00.000 to 2024-12-31T00:00:00.000
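
The request above relies on a fixed `limit=350000`, which returned the full 260,503 rows here but would silently truncate a larger result. A paging loop avoids guessing the limit. The sketch below is a minimal, hypothetical helper (`fetch_all` and `fake_fetch` are names invented for illustration) that runs offline against a stand-in fetch function rather than the live API.

```python
def fetch_all(fetch_page, page_size=50_000):
    """Collect every record by requesting pages until a short page comes back.

    fetch_page(limit, offset) is a caller-supplied function that returns
    a list of records for that slice of the result set.
    """
    records = []
    offset = 0
    while True:
        page = fetch_page(limit=page_size, offset=offset)
        records.extend(page)
        if len(page) < page_size:  # a short (or empty) page means we reached the end
            return records
        offset += page_size

# Stand-in for the API so the sketch runs without network access
fake_rows = [{"row": i} for i in range(12)]

def fake_fetch(limit, offset):
    return fake_rows[offset:offset + limit]

all_rows = fetch_all(fake_fetch, page_size=5)
print(len(all_rows))  # 12
```

With sodapy, `fetch_page` could wrap `client.get("8h9b-rp9u", where=..., limit=limit, offset=offset, order=":id")`. The explicit `order` is there because paging is only reliable when the row order is stable between requests.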


In [15]:
# ========================================
# 3. PREPROCESSING - FILTER TO 5 CRIME CATEGORIES
# ========================================

# Define the 5 crime categories
relevant_crimes = [
    'ROBBERY',
    'ASSAULT 3 & RELATED OFFENSES',
    'DANGEROUS DRUGS',
    'PETIT LARCENY',
    'CRIMINAL TRESPASS'
]

# Filter to relevant crimes only
df_crimes = df[df['ofns_desc'].isin(relevant_crimes)].copy()
print(f"✅ Filtered to {len(df_crimes):,} arrests in 5 crime categories")

# Convert coordinates to numeric
df_crimes['latitude'] = pd.to_numeric(df_crimes['latitude'], errors='coerce')
df_crimes['longitude'] = pd.to_numeric(df_crimes['longitude'], errors='coerce')

# Remove missing coordinates
df_crimes = df_crimes.dropna(subset=['latitude', 'longitude'])
print(f"✅ After removing missing coordinates: {len(df_crimes):,} arrests")

# Validate NYC coordinates
df_crimes = df_crimes[
    (df_crimes['latitude'] >= 40.5) & 
    (df_crimes['latitude'] <= 41.0) &
    (df_crimes['longitude'] >= -74.3) & 
    (df_crimes['longitude'] <= -73.7)
]
print(f"✅ After coordinate validation: {len(df_crimes):,} arrests")

# Convert arrest_date to datetime and extract temporal features
df_crimes['arrest_datetime'] = pd.to_datetime(df_crimes['arrest_date'])
df_crimes['date'] = df_crimes['arrest_datetime'].dt.date
df_crimes['hour'] = df_crimes['arrest_datetime'].dt.hour
df_crimes['day_of_week'] = df_crimes['arrest_datetime'].dt.dayofweek

# Summary statistics
print("\n" + "="*60)
print("FINAL CRIME DATASET (df_crimes)")
print("="*60)
print(f"Total arrests: {len(df_crimes):,}")
print(f"Date range: {df_crimes['arrest_datetime'].min()} to {df_crimes['arrest_datetime'].max()}")
print(f"\nCrime type distribution:")
crime_dist = df_crimes['ofns_desc'].value_counts()
for crime, count in crime_dist.items():
    pct = (count / len(df_crimes)) * 100
    print(f"  • {crime}: {count:,} ({pct:.1f}%)")
print("="*60)

✅ Filtered to 98,653 arrests in 5 crime categories
✅ After removing missing coordinates: 98,652 arrests
✅ After coordinate validation: 98,650 arrests

FINAL CRIME DATASET (df_crimes)
Total arrests: 98,650
Date range: 2024-01-01 00:00:00 to 2024-12-31 00:00:00

Crime type distribution:
  • ASSAULT 3 & RELATED OFFENSES: 38,236 (38.8%)
  • PETIT LARCENY: 27,107 (27.5%)
  • DANGEROUS DRUGS: 18,518 (18.8%)
  • ROBBERY: 12,020 (12.2%)
  • CRIMINAL TRESPASS: 2,769 (2.8%)
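
The printed date range runs from midnight to midnight, which suggests `arrest_date` may carry no time of day. If that holds for every row, the `hour` feature is 0 throughout and adds nothing to the temporal analysis. The check below is a small sketch using a hypothetical helper, `has_time_of_day`, on toy values in the same timestamp format.

```python
import pandas as pd

def has_time_of_day(timestamps):
    """Return True if any timestamp differs from midnight of its own day."""
    ts = pd.to_datetime(pd.Series(timestamps))
    return bool((ts != ts.dt.normalize()).any())

# Toy values in the same format as the API output
date_only = ["2024-01-01T00:00:00.000", "2024-12-31T00:00:00.000"]
with_time = ["2024-01-01T00:00:00.000", "2024-06-01T13:45:00.000"]

print(has_time_of_day(date_only))  # False
print(has_time_of_day(with_time))  # True
```

In the notebook this would be called as `has_time_of_day(df_crimes['arrest_date'])`. A `False` result means `hour` should be dropped or replaced by a field that actually records time.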


In [30]:
# Initialize API client
client = Socrata("data.cityofnewyork.us", None, timeout=120)

# Load Events Data for 2024
print("📥 Loading Events Data (2024)...")

events_results = client.get(
    "bkfu-528j",  # NYC Events dataset
    where="start_date_time >= '2024-01-01T00:00:00.000' AND start_date_time < '2025-01-01T00:00:00.000'",
    limit=5000000
)



📥 Loading Events Data (2024)...


In [31]:
events_results_df = pd.DataFrame.from_records(events_results)
events_results_df

Unnamed: 0,event_id,event_name,start_date_time,end_date_time,event_agency,event_type,event_borough,event_location,street_closure_type,community_board,police_precinct,event_street_side
0,746721,New Years Eve Fireworks Display,2024-01-01T00:00:00.000,2024-01-01T00:10:00.000,Parks Department,Special Event,Brooklyn,Prospect Park: Long Meadow North,,55,78,
1,739793,Lawn closure - Cherry Hill,2024-01-01T00:00:00.000,2024-01-01T01:00:00.000,Parks Department,Special Event,Manhattan,"Central Park: Cherry Hill ,Central Park: Wagne...",,64,22,
2,739847,Lawn Closure - Mineral Springs,2024-01-01T00:00:00.000,2024-01-01T23:59:00.000,Parks Department,Special Event,Manhattan,Central Park: Mineral Springs,,64,22,
3,679798,Landscape closed for season,2024-01-01T00:00:00.000,2024-01-01T23:00:00.000,Parks Department,Special Event,Manhattan,Central Park: Dana Discovery Center Lawn,,64,22,
4,743992,Big Apple Circus,2024-01-01T00:00:00.000,2024-01-01T23:59:00.000,Parks Department,Special Event,Manhattan,"Damrosch Park: Bandshell ,Damrosch Park: Tent ...",,7,20,
...,...,...,...,...,...,...,...,...,...,...,...,...
4283309,814462,Football - Youth,2024-12-31T20:00:00.000,2024-12-31T23:00:00.000,Parks Department,Sport - Youth,Brooklyn,Leif Ericson Park: Dust Bowl - Soccer/Football-01,,10,68,
4283310,811441,Soccer - Non Regulation,2024-12-31T22:00:00.000,2024-12-31T23:00:00.000,Parks Department,Sport - Adult,Brooklyn,McCarren Park: Soccer-01,,01,94,
4283311,811441,Soccer - Non Regulation,2024-12-31T22:00:00.000,2024-12-31T23:00:00.000,Parks Department,Sport - Adult,Brooklyn,McCarren Park: Soccer-01,,01,94,
4283312,811441,Soccer - Non Regulation,2024-12-31T22:00:00.000,2024-12-31T23:00:00.000,Parks Department,Sport - Adult,Brooklyn,McCarren Park: Soccer-01,,01,94,
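
The last rows of this output repeat the same `event_id`, times, and location several times, so part of the 4.28 million row total may be repetition rather than distinct events. The sketch below shows one way to collapse such rows with `drop_duplicates`. The helper name `drop_repeated_events` and the choice of key columns are assumptions. The location strings shown above are truncated, so whether these rows are true duplicates in the full data is not confirmed.

```python
import pandas as pd

def drop_repeated_events(
    events,
    key_cols=("event_id", "start_date_time", "end_date_time", "event_location"),
):
    """Keep the first row for each combination of the key columns (assumed keys)."""
    return events.drop_duplicates(subset=list(key_cols)).reset_index(drop=True)

# Toy frame mirroring the last four rows displayed above
toy_events = pd.DataFrame({
    "event_id": ["814462", "811441", "811441", "811441"],
    "event_name": ["Football - Youth"] + ["Soccer - Non Regulation"] * 3,
    "start_date_time": ["2024-12-31T20:00:00.000"] + ["2024-12-31T22:00:00.000"] * 3,
    "end_date_time": ["2024-12-31T23:00:00.000"] * 4,
    "event_location": ["Leif Ericson Park: Dust Bowl - Soccer/Football-01"]
                      + ["McCarren Park: Soccer-01"] * 3,
})

unique_events = drop_repeated_events(toy_events)
print(len(toy_events), "->", len(unique_events))  # 4 -> 2
```

Deduplicating before the keyword search would also shrink the match counts reported later, such as the 2,262 Halloween matches.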


In [32]:
# ========================================
# 4. EXPLORE EVENTS DATA
# ========================================

print("🔍 Exploring Events Dataset...")
print(f"Total events loaded: {len(events_results_df):,}")
print(f"\nColumns available:")
print(events_results_df.columns.tolist())
print(f"\nFirst few rows:")
print(events_results_df.head())
print(f"\nData types:")
print(events_results_df.dtypes)

# Check for event names/categories
if 'event_name' in events_results_df.columns:
    print(f"\nSample event names:")
    print(events_results_df['event_name'].head(20))
elif 'event_type' in events_results_df.columns:
    print(f"\nSample event types:")
    print(events_results_df['event_type'].value_counts().head(20))

🔍 Exploring Events Dataset...
Total events loaded: 4,283,314

Columns available:
['event_id', 'event_name', 'start_date_time', 'end_date_time', 'event_agency', 'event_type', 'event_borough', 'event_location', 'street_closure_type', 'community_board', 'police_precinct', 'event_street_side']

First few rows:
  event_id                       event_name          start_date_time  \
0   746721  New Years Eve Fireworks Display  2024-01-01T00:00:00.000   
1   739793       Lawn closure - Cherry Hill  2024-01-01T00:00:00.000   
2   739847   Lawn Closure - Mineral Springs  2024-01-01T00:00:00.000   
3   679798      Landscape closed for season  2024-01-01T00:00:00.000   
4   743992                 Big Apple Circus  2024-01-01T00:00:00.000   

             end_date_time      event_agency     event_type event_borough  \
0  2024-01-01T00:10:00.000  Parks Department  Special Event      Brooklyn   
1  2024-01-01T01:00:00.000  Parks Department  Special Event     Manhattan   
2  2024-01-01T23:59:00.00

In [39]:
# ========================================
# 5. EXTRACT 5 MAJOR EVENTS FROM REAL DATA
# ========================================

# Convert start_date_time to datetime
events_results_df['start_datetime'] = pd.to_datetime(events_results_df['start_date_time'])
events_results_df['event_date'] = events_results_df['start_datetime'].dt.date

print("🔍 Searching for 5 major events in real data...")

# Define search patterns for each event
event_searches = {
    'Halloween': ['halloween', 'trick or treat'],
    'Thanksgiving': ['thanksgiving', 'macy'],
    'Independence_Day': ['independence', 'july 4', 'firework', 'fourth of july'],
    'New_Years_Eve': ['new year', 'nye'],
    'NYC_Marathon': ['marathon', 'tcs']
}

# Collect all matched events
all_matches = []

for event_category, search_terms in event_searches.items():
    # Search in event_name column (case insensitive)
    pattern = '|'.join(search_terms)
    mask = events_results_df['event_name'].str.lower().str.contains(pattern, na=False)
    matches = events_results_df[mask].copy()
    
    if len(matches) > 0:
        print(f"\n✅ Found {len(matches)} matches for {event_category}:")
        print(matches[['event_name', 'start_datetime', 'event_location']].head(5))
        
        # Add category label
        matches['event_category'] = event_category.replace('_', ' ')
        all_matches.append(matches)
    else:
        print(f"\n⚠️ No matches found for {event_category}")

# Combine all matches into single dataframe
if all_matches:
    df_events = pd.concat(all_matches, ignore_index=True)
    
    # Keep only relevant columns
    df_events = df_events[[
        'event_name', 'event_category', 'event_date', 'start_datetime',
        'event_location', 'event_agency', 'event_type', 'event_borough'
    ]]
    
    print("\n" + "="*60)
    print("FINAL EVENTS DATASET (df_events)")
    print("="*60)
    print(f"Total events extracted: {len(df_events)}")
    print(f"\nEvents by category:")
    print(df_events['event_category'].value_counts())
    print("="*60)
else:
    print("\n❌ No events found!")
    df_events = pd.DataFrame()

# Display dataframe
df_events.head(20)

🔍 Searching for 5 major events in real data...

✅ Found 2262 matches for Halloween:
                                    event_name      start_datetime  \
2898675  Brooklyn Scouts Youth Halloween Party 2024-09-22 14:00:00   
2898751  Brooklyn Scouts Youth Halloween Party 2024-09-22 14:00:00   
2898754  Brooklyn Scouts Youth Halloween Party 2024-09-22 14:00:00   
2898757  Brooklyn Scouts Youth Halloween Party 2024-09-22 14:00:00   
2898760  Brooklyn Scouts Youth Halloween Party 2024-09-22 14:00:00   

                                            event_location  
2898675  Prospect Park: Grecian Shelter/Peristyle Lawn ...  
2898751  Prospect Park: Grecian Shelter/Peristyle Lawn ...  
2898754  Prospect Park: Grecian Shelter/Peristyle Lawn ...  
2898757  Prospect Park: Grecian Shelter/Peristyle Lawn ...  
2898760  Prospect Park: Grecian Shelter/Peristyle Lawn ...  


Unnamed: 0,event_name,event_category,event_date,start_datetime,event_location,event_agency,event_type,event_borough
0,Brooklyn Scouts Youth Halloween Party,Halloween,2024-09-22,2024-09-22 14:00:00,Prospect Park: Grecian Shelter/Peristyle Lawn ...,Parks Department,Special Event,Brooklyn
1,Brooklyn Scouts Youth Halloween Party,Halloween,2024-09-22,2024-09-22 14:00:00,Prospect Park: Grecian Shelter/Peristyle Lawn ...,Parks Department,Special Event,Brooklyn
2,Brooklyn Scouts Youth Halloween Party,Halloween,2024-09-22,2024-09-22 14:00:00,Prospect Park: Grecian Shelter/Peristyle Lawn ...,Parks Department,Special Event,Brooklyn
3,Brooklyn Scouts Youth Halloween Party,Halloween,2024-09-22,2024-09-22 14:00:00,Prospect Park: Grecian Shelter/Peristyle Lawn ...,Parks Department,Special Event,Brooklyn
4,Brooklyn Scouts Youth Halloween Party,Halloween,2024-09-22,2024-09-22 14:00:00,Prospect Park: Grecian Shelter/Peristyle Lawn ...,Parks Department,Special Event,Brooklyn
5,Brooklyn Scouts Youth Halloween Party,Halloween,2024-09-22,2024-09-22 14:00:00,Prospect Park: Grecian Shelter/Peristyle Lawn ...,Parks Department,Special Event,Brooklyn
6,Brooklyn Scouts Youth Halloween Party,Halloween,2024-09-22,2024-09-22 14:00:00,Prospect Park: Grecian Shelter/Peristyle Lawn ...,Parks Department,Special Event,Brooklyn
7,Brooklyn Scouts Youth Halloween Party,Halloween,2024-09-22,2024-09-22 14:00:00,Prospect Park: Grecian Shelter/Peristyle Lawn ...,Parks Department,Special Event,Brooklyn
8,Brooklyn Scouts Youth Halloween Party,Halloween,2024-09-22,2024-09-22 14:00:00,Prospect Park: Grecian Shelter/Peristyle Lawn ...,Parks Department,Special Event,Brooklyn
9,Brooklyn Scouts Youth Halloween Party,Halloween,2024-09-22,2024-09-22 14:00:00,Prospect Park: Grecian Shelter/Peristyle Lawn ...,Parks Department,Special Event,Brooklyn


In [40]:
df_events

Unnamed: 0,event_name,event_category,event_date,start_datetime,event_location,event_agency,event_type,event_borough
0,Brooklyn Scouts Youth Halloween Party,Halloween,2024-09-22,2024-09-22 14:00:00,Prospect Park: Grecian Shelter/Peristyle Lawn ...,Parks Department,Special Event,Brooklyn
1,Brooklyn Scouts Youth Halloween Party,Halloween,2024-09-22,2024-09-22 14:00:00,Prospect Park: Grecian Shelter/Peristyle Lawn ...,Parks Department,Special Event,Brooklyn
2,Brooklyn Scouts Youth Halloween Party,Halloween,2024-09-22,2024-09-22 14:00:00,Prospect Park: Grecian Shelter/Peristyle Lawn ...,Parks Department,Special Event,Brooklyn
3,Brooklyn Scouts Youth Halloween Party,Halloween,2024-09-22,2024-09-22 14:00:00,Prospect Park: Grecian Shelter/Peristyle Lawn ...,Parks Department,Special Event,Brooklyn
4,Brooklyn Scouts Youth Halloween Party,Halloween,2024-09-22,2024-09-22 14:00:00,Prospect Park: Grecian Shelter/Peristyle Lawn ...,Parks Department,Special Event,Brooklyn
...,...,...,...,...,...,...,...,...
8638,CDP Marathon Protests,NYC Marathon,2024-12-03,2024-12-03 13:30:00,Dag Hammarskjold Plaza: First Avenue Plaza,Parks Department,Special Event,Manhattan
8639,Guadalupe Marathon 2024,NYC Marathon,2024-12-07,2024-12-07 14:00:00,"173 EAST 3 STREET,173 EAST 3 STREET,173 ...",Police Department,Parade,Manhattan
8640,Guadalupe Marathon 2024,NYC Marathon,2024-12-07,2024-12-07 14:00:00,"173 EAST 3 STREET,173 EAST 3 STREET,173 ...",Police Department,Parade,Manhattan
8641,Guadalupe Marathon 2024,NYC Marathon,2024-12-07,2024-12-07 14:00:00,"173 EAST 3 STREET,173 EAST 3 STREET,173 ...",Police Department,Parade,Manhattan
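
The rows above show two side effects of plain substring matching on `event_name`. A Halloween party dated 2024-09-22 is labeled as Halloween, and "CDP Marathon Protests" and "Guadalupe Marathon 2024" in December are labeled as NYC Marathon. A stricter variant, sketched below, anchors each term at a word start and keeps only matches near the event's calendar date. The helper `match_events`, the 3-day tolerance, the hand-entered 2024 dates, and the toy rows "Halloween Parade", "TCS New York City Marathon", and "Pharmacy Flu Shot Day" are all assumptions made for illustration.

```python
import re
import pandas as pd

# Hand-entered 2024 calendar dates; an assumption of this sketch, not dataset values
EVENT_DATES = {
    "Halloween": "2024-10-31",
    "Thanksgiving": "2024-11-28",
    "NYC Marathon": "2024-11-03",
}
SEARCH_TERMS = {
    "Halloween": ["halloween", "trick or treat"],
    "Thanksgiving": ["thanksgiving", "macy"],
    "NYC Marathon": ["marathon", "tcs"],
}

def match_events(events, max_days_from_event=3):
    """Keep rows whose name matches a term at a word start and whose date is near the event."""
    parts = []
    for category, terms in SEARCH_TERMS.items():
        # Leading \b stops 'macy' from matching inside 'pharmacy'
        pattern = r"\b(?:" + "|".join(re.escape(t) for t in terms) + ")"
        name_hit = events["event_name"].str.contains(pattern, case=False, na=False, regex=True)
        event_day = pd.Timestamp(EVENT_DATES[category])
        days_off = (events["start_datetime"].dt.normalize() - event_day).dt.days.abs()
        hits = events[name_hit & (days_off <= max_days_from_event)].copy()
        if not hits.empty:
            hits["event_category"] = category
            parts.append(hits)
    return pd.concat(parts, ignore_index=True) if parts else events.iloc[0:0]

toy = pd.DataFrame({
    "event_name": [
        "Brooklyn Scouts Youth Halloween Party",  # from the output above
        "CDP Marathon Protests",                  # from the output above
        "Halloween Parade",                       # hypothetical row
        "TCS New York City Marathon",             # hypothetical row
        "Pharmacy Flu Shot Day",                  # hypothetical row
    ],
    "start_datetime": pd.to_datetime([
        "2024-09-22 14:00", "2024-12-03 13:30", "2024-10-31 19:00",
        "2024-11-03 08:00", "2024-11-27 10:00",
    ]),
})

matched = match_events(toy)
print(matched[["event_name", "event_category"]])
```

The date filter trades recall for precision: events held well before or after the holiday are dropped, which is usually the desired behavior when the goal is to compare arrests around a specific date.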
