
## 1. Import Libraries & Load Data

In this initial cell, we set up our Python environment and load the core datasets needed for spatial analysis:

- **Library Imports**  
  - `pandas` & `geopandas`: for handling tabular and geospatial data structures  
  - `folium` & `HeatMap`: for creating interactive maps and density heatmaps  
  - `pathlib`: to build file paths in a cross-platform, maintainable way  
  - `numpy`: for supporting numerical operations  

- **Data Paths Configuration**  
  - `PROC_DIR`: points to the processed crime data directory  
  - `RAW_DIR`: points to the raw GeoJSON directory  

- **Data Loading**  
  1. Read **`crime_clean.csv`** into `df_crime`, parsing the `date` column as datetime  
  2. Read **`nyc_boroughs.geojson`** into `gdf_boroughs` for borough boundary geometries  

This cell ensures that both the cleaned crime incidents and borough geometries are ready in memory for aggregation and mapping in subsequent steps.  


In [None]:

# %%
# 1. Import libraries and load data
import pandas as pd
import geopandas as gpd
import folium
from folium.plugins import HeatMap
from pathlib import Path
import numpy as np

# Paths
PROC_DIR = Path('../data/processed')
RAW_DIR = Path('../data/raw')

# Load cleaned crime data, parsing the 'date' column as datetime
df_crime = pd.read_csv(PROC_DIR / 'crime_clean.csv', parse_dates=['date'])

# Load the NYC borough boundaries (GeoJSON) once
geojson_path = RAW_DIR / 'nyc_boroughs.geojson'
gdf_boroughs = gpd.read_file(geojson_path)
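
As a quick check of what `parse_dates` does, the sketch below reads a tiny in-memory CSV instead of `crime_clean.csv`. The sample rows and the names `sample_csv` and `df_sample` are made up for illustration; only the `date` and `boro_nm` column names come from the notebook.

```python
import io

import pandas as pd

# Hypothetical two-row stand-in for crime_clean.csv (values are illustrative)
sample_csv = io.StringIO(
    "date,boro_nm\n"
    "2023-01-15,BRONX\n"
    "2023-07-04,QUEENS\n"
)

df_sample = pd.read_csv(sample_csv, parse_dates=['date'])

# The column is parsed to a datetime dtype, so the .dt accessor is available
is_datetime = pd.api.types.is_datetime64_any_dtype(df_sample['date'])
months = df_sample['date'].dt.month.tolist()
```

Without `parse_dates`, the column would load as plain strings and `.dt.month` would raise an error.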


## 2. Aggregate Crime Counts & Merge with Borough Geometries

In this cell, we transform individual crime records into borough–level summaries and join them back to our geospatial data:

1. **Aggregate incidents**  
   Group by the raw borough name column (`boro_nm`) and count total crimes per borough.

2. **Clean & standardize names**  
   - Convert any `'(null)'` entries to `NaN` and drop them.  
   - Convert borough names from upper case to Title Case (e.g. `'BRONX'` → `'Bronx'`).

3. **Rename for consistency**  
   Change the column name from `boro_nm` to `BoroName` so it matches the GeoDataFrame’s property.

4. **Attribute join**  
   Merge the crime counts into `gdf_boroughs` on the shared `BoroName` column, using a left join so that every borough geometry is kept.

5. **Fill missing values**  
   Any boroughs with no recorded crimes are set to zero.

6. **Verify results**  
   Print the resulting DataFrame slice to confirm that each borough now has an integer `crime_count`.

In [None]:
# 1. Aggregate crimes by raw borough name
df_boro = (
    df_crime
    .groupby('boro_nm')
    .size()
    .reset_index(name='crime_count')
)

# 2. Drop "(null)" placeholder rows, then convert names to Title Case
df_boro = (
    df_boro
    .assign(boro_nm=df_boro['boro_nm'].replace({'(null)': np.nan}))  # "(null)" → NaN
    .dropna(subset=['boro_nm'])                                       # drop those rows from the frame
    .assign(boro_nm=lambda d: d['boro_nm'].str.title())               # e.g. "BRONX" → "Bronx"
)

# 3. Rename to match the GeoDataFrame’s column
df_boro = df_boro.rename(columns={'boro_nm': 'BoroName'})

# 4. Merge crime counts into the borough GeoDataFrame
gdf_merged = gdf_boroughs.merge(
    df_boro,
    on='BoroName',
    how='left'
)

# 5. Fill boroughs with no crimes as zero
gdf_merged['crime_count'] = gdf_merged['crime_count'].fillna(0).astype(int)

# 6. Confirm the merged result
print(gdf_merged[['BoroName', 'crime_count']])
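
One pitfall in the cleaning step is worth a toy illustration: calling `.dropna()` on a single column and assigning the result back does not remove rows from the DataFrame, because pandas realigns the shorter Series on the index. The sketch below uses made-up counts (`toy` and its values are hypothetical) to contrast this with `DataFrame.dropna(subset=...)`, and then checks for names that would fail to match in the merge. The `expected` set is an assumption about the GeoJSON; the actual values should be read from `gdf_boroughs['BoroName']`.

```python
import numpy as np
import pandas as pd

# Hypothetical aggregated counts, including a "(null)" placeholder row
toy = pd.DataFrame({
    'boro_nm': ['BRONX', '(null)', 'STATEN ISLAND'],
    'crime_count': [120, 3, 45],
})

# Column-level dropna: assignment realigns on the index, so the row survives
col_level = toy.copy()
col_level['boro_nm'] = (
    col_level['boro_nm'].replace({'(null)': np.nan}).dropna().str.title()
)

# Frame-level dropna: the placeholder row is actually removed
frame_level = (
    toy
    .assign(boro_nm=toy['boro_nm'].replace({'(null)': np.nan}))
    .dropna(subset=['boro_nm'])
    .assign(boro_nm=lambda d: d['boro_nm'].str.title())
)

# Assumed borough names in the GeoJSON (verify against gdf_boroughs)
expected = {'Bronx', 'Brooklyn', 'Manhattan', 'Queens', 'Staten Island'}
unmatched = set(frame_level['boro_nm']) - expected
```

An empty `unmatched` set suggests every cleaned name has a counterpart to join on; any leftover name would be silently dropped by the left join.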



## 3. Choropleth Map: Visualizing Crime Distribution by Borough

This cell builds an interactive choropleth to highlight geographic differences in total crime counts:

1. **Define map center**  
   We center on the coordinates commonly cited for New York City (`[40.7128, -74.0060]`, in Lower Manhattan) and set an initial zoom level.

2. **Create base map**  
   Initialize a Folium `Map` object as the canvas for our layers.

3. **Add choropleth layer**  
   - **`geo_data`**: pass the merged GeoDataFrame as a GeoJSON string  
   - **`data` & `columns`**: link `BoroName` (the geographic key) to `crime_count`  
   - **`key_on`**: instruct Folium where to find the matching property in the GeoJSON features  
   - **Style options**: adjust fill and line opacity for clarity, and include a descriptive legend.

4. **Optional layer control**  
   Add controls to toggle individual layers on/off, facilitating comparison if multiple layers are added later.

5. **Display map**  
   Render the choropleth inline, allowing interactive pan and zoom.

In [None]:
# 1. NYC center coordinates
nyc_center = [40.7128, -74.0060]

# 2. Initialize base map
m = folium.Map(location=nyc_center, zoom_start=10)

# 3. Add the crime choropleth layer
folium.Choropleth(
    geo_data=gdf_merged.to_json(),
    name='Crime Choropleth',
    data=gdf_merged,
    columns=['BoroName', 'crime_count'],
    key_on='feature.properties.BoroName',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='Crime Count by Borough'
).add_to(m)

# 4. Add layer control
folium.LayerControl().add_to(m)

# 5. Render the map
display(m)
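
To see how `key_on` links the two inputs, the sketch below resolves a `feature.properties.BoroName` path against a hand-written GeoJSON dictionary and compares the resulting keys with a small data table. The helper `resolve_key_on` and the toy features and counts are hypothetical; they mimic the lookup conceptually and are not Folium's internal code.

```python
# Minimal GeoJSON-like structure (geometry omitted for brevity)
toy_geojson = {
    'type': 'FeatureCollection',
    'features': [
        {'type': 'Feature', 'properties': {'BoroName': 'Bronx'}, 'geometry': None},
        {'type': 'Feature', 'properties': {'BoroName': 'Queens'}, 'geometry': None},
    ],
}

# Toy data table: key column value -> crime_count
toy_counts = {'Bronx': 120, 'Queens': 80, 'Brooklyn': 95}


def resolve_key_on(feature, key_on):
    """Follow a dotted path such as 'feature.properties.BoroName'."""
    value = feature
    for part in key_on.split('.')[1:]:  # skip the leading 'feature'
        value = value[part]
    return value


geo_keys = {
    resolve_key_on(f, 'feature.properties.BoroName')
    for f in toy_geojson['features']
}

# Keys present in the data but absent from the geometry have nothing to color
missing_in_geo = set(toy_counts) - geo_keys
```

Here `missing_in_geo` contains `'Brooklyn'`, the kind of mismatch that would leave a borough uncolored on the real map.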

## 4. Heatmap: Crime Incident Density

In this section, we map the point-level distribution of recorded incidents across NYC:

1. **Extract coordinates**  
   - Pull latitude and longitude for each crime report  
   - Drop any missing values to ensure clean input for the heatmap

2. **Initialize map**  
   - Center on the same coordinates (`nyc_center`)  
   - Choose a slightly higher zoom for neighborhood-level detail

3. **Render density layer**  
   - Use Folium’s `HeatMap` plugin to aggregate point densities  
   - Adjust `radius` to control the spread of each point’s influence

4. **Display result**  
   - The interactive heatmap reveals micro-hotspots that may not be visible in the borough choropleth 

In [None]:
# 1. Extract coordinates for all crime incidents, dropping rows with missing values
coords = df_crime[['latitude', 'longitude']].dropna().values.tolist()

# 2. Initialize a new folium map
m_heat = folium.Map(location=nyc_center, zoom_start=11)

# 3. Add the heatmap layer
HeatMap(coords, radius=8).add_to(m_heat)

# 4. Display the map inline
display(m_heat)
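
For large datasets, sending every point to the browser can make the map slow. One possible approach, not used in the notebook, is to round coordinates and pass `[lat, lon, weight]` triplets, which `HeatMap` accepts alongside plain `[lat, lon]` pairs. The function name `to_weighted_points`, the three-decimal rounding, and the scaling of weights to the 0 to 1 range are all assumptions chosen for illustration.

```python
import pandas as pd


def to_weighted_points(df, decimals=3):
    """Collapse nearby points into [lat, lon, weight] triplets (weight in 0..1)."""
    rounded = df[['latitude', 'longitude']].dropna().round(decimals)
    counts = (
        rounded
        .groupby(['latitude', 'longitude'])
        .size()
        .reset_index(name='weight')
    )
    # Scale counts so the busiest cell has weight 1.0
    counts['weight'] = counts['weight'] / counts['weight'].max()
    return counts[['latitude', 'longitude', 'weight']].values.tolist()


# Hypothetical coordinates: two reports at nearly the same spot, one elsewhere,
# and one row with a missing latitude that gets dropped
toy_points = pd.DataFrame({
    'latitude': [40.71281, 40.71279, 40.80000, None],
    'longitude': [-74.00601, -74.00599, -73.90000, -73.95000],
})

weighted = to_weighted_points(toy_points)
# The result could then be passed as HeatMap(weighted, radius=8)
```

Rounding trades spatial precision for a smaller payload, so the number of decimals should be chosen with the intended zoom level in mind.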


## 5. Combined Seasonal & Spatial Analysis

In this single cell, we execute the full pipeline for both borough‐level and seasonal mapping:

1. **Create temporal features**  
   - Extract the month from each crime’s `date`.  
   - Map months → meteorological seasons (`Winter`, `Spring`, `Summer`, `Fall`).

2. **Aggregate crime counts by borough**  
   - Group raw data by `boro_nm` and count incidents.  
   - Clean any `'(null)'` entries and convert to Title Case (e.g. `'BRONX'` → `'Bronx'`).  
   - Rename to `BoroName` and merge into the borough GeoDataFrame.  
   - Fill missing boroughs with zero crime counts.

3. **Choropleth map of total crime**  
   - Center a Folium map on NYC.  
   - Add a Choropleth layer using `crime_count` to color each borough.  
   - Include layer controls for interactive toggling.

4. **Heatmap of Summer hotspots**  
   - Filter crimes to `season == 'Summer'`.  
   - Extract latitude/longitude into a list of coordinate pairs.  
   - Render a Folium heatmap to reveal high‐density summer crime areas.

By consolidating these steps, we ensure that the `season` column exists before any seasonal filtering, and we produce two distinct visualizations in one place:  
- **Choropleth** for borough‐wide totals  
- **Heatmap** for summer‐specific clusters  

In [None]:

# --- 1) Extract month and create 'season' column ---
df_crime['month'] = df_crime['date'].dt.month

def month_to_season(m):
    if m in (12, 1, 2):
        return 'Winter'
    elif m in (3, 4, 5):
        return 'Spring'
    elif m in (6, 7, 8):
        return 'Summer'
    else:
        return 'Fall'

df_crime['season'] = df_crime['month'].apply(month_to_season)


# --- 2) Aggregate crimes by borough and prepare GeoDataFrame ---
# Count total crimes per borough (uses 'boro_nm' from raw data)
df_boro = (
    df_crime
    .groupby('boro_nm')
    .size()
    .reset_index(name='crime_count')
)

# Drop '(null)' placeholder rows, then convert names to Title Case
df_boro = (
    df_boro
    .assign(boro_nm=df_boro['boro_nm'].replace({'(null)': np.nan}))
    .dropna(subset=['boro_nm'])
    .assign(boro_nm=lambda d: d['boro_nm'].str.title())
)

# Rename to match the GeoDataFrame's column
df_boro = df_boro.rename(columns={'boro_nm': 'BoroName'})

# Merge crime counts into the borough GeoDataFrame
gdf_merged = gdf_boroughs.merge(df_boro, on='BoroName', how='left')
gdf_merged['crime_count'] = gdf_merged['crime_count'].fillna(0).astype(int)


# --- 3) Map #1: Choropleth of crime counts by borough ---
nyc_center = [40.7128, -74.0060]
m = folium.Map(location=nyc_center, zoom_start=10)

folium.Choropleth(
    geo_data=gdf_merged.to_json(),
    name='Crime Choropleth',
    data=gdf_merged,
    columns=['BoroName', 'crime_count'],
    key_on='feature.properties.BoroName',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='Crime Count by Borough'
).add_to(m)

folium.LayerControl().add_to(m)
display(m)


# --- 4) Map #2: Heatmap of Summer crime hotspots ---
summer_df = df_crime[df_crime['season'] == 'Summer']
summer_coords = summer_df[['latitude', 'longitude']].dropna().values.tolist()

m_summer = folium.Map(location=nyc_center, zoom_start=11)
HeatMap(summer_coords, radius=8).add_to(m_summer)
display(m_summer)
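
The month-to-season step can also be written as a dictionary lookup with `Series.map`, which keeps the whole mapping in one table. This is an alternative sketch, not the notebook's method; `SEASON_BY_MONTH` and the sample dates are hypothetical names and values.

```python
import pandas as pd

# Meteorological seasons keyed by month number
SEASON_BY_MONTH = {
    12: 'Winter', 1: 'Winter', 2: 'Winter',
    3: 'Spring', 4: 'Spring', 5: 'Spring',
    6: 'Summer', 7: 'Summer', 8: 'Summer',
    9: 'Fall', 10: 'Fall', 11: 'Fall',
}

# Hypothetical sample dates, one per season
dates = pd.Series(pd.to_datetime(
    ['2023-01-15', '2023-04-01', '2023-07-04', '2023-10-31']
))

seasons = dates.dt.month.map(SEASON_BY_MONTH)
summer_mask = seasons == 'Summer'
```

One behavioral difference is worth noting: a missing date maps to `NaN` here, whereas the `else` branch of `month_to_season` would label it `'Fall'`.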
