# NYC Crime & Climate Temporal Analysis

**Project Overview:**  
In this notebook, we explore how reported crime in New York City varies over time and how it correlates with daily temperature. We use clean, pre‑processed crime and weather datasets to:

- Build time‑dimension features (year, month, weekday, season)  
- Visualize daily, weekly, and monthly crime trends  
- Quantify seasonal crime distribution  
- Overlay crime volume with daily (maximum) temperature  

> **Key Skills Demonstrated:**  
> - Python & Pandas for ETL and date/time feature engineering  
> - Plotly Express interactive visualizations  
> - Time‑series aggregation & seasonality analysis  
> - Data storytelling & insight communication  
---


## 1. Import Libraries & Load Data

In this cell, we:

1. Import core libraries:
   - `pandas` for data manipulation
   - `plotly.express` for interactive plotting
   - `pathlib.Path` for filesystem‑agnostic paths

2. Define our `PROC_DIR` path and read in:
   - `crime_clean.csv` (with the `date` column parsed as datetime)
   - `weather_clean.csv` (with the `date` column parsed as datetime)

**Why?**  
Preparing our two cleaned datasets (crime and weather) with proper datetime parsing sets the foundation for all downstream time‑based analysis.
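A quick dtype check catches a silent parsing failure before any `.dt` call. The sketch below is a minimal illustration. It uses a tiny in-memory CSV in place of `crime_clean.csv`, and both the rows and the `offense` column are hypothetical, invented for the example:

```python
import io

import pandas as pd

# Hypothetical two-row CSV standing in for a cleaned file on disk
csv_text = "date,offense\n2024-01-05,THEFT\n2024-01-06,ASSAULT\n"

df = pd.read_csv(io.StringIO(csv_text), parse_dates=['date'])

# parse_dates should yield a datetime64 column; fail fast if it did not
assert pd.api.types.is_datetime64_any_dtype(df['date']), df.dtypes
print(df.dtypes)
```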

In [None]:
# 1. Import libraries and load cleaned data
import pandas as pd
import plotly.express as px
from pathlib import Path

# Define processed data directory
PROC_DIR = Path('../data/processed')

# Load cleaned datasets
# Ensure 'date' columns are parsed as datetime

df_crime = pd.read_csv(PROC_DIR / 'crime_clean.csv', parse_dates=['date'])
df_weather = pd.read_csv(PROC_DIR / 'weather_clean.csv', parse_dates=['date'])

## 2. Extract Year, Month, Weekday & Season

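In this cell, we:

1. Standardize the column names (strip whitespace, lowercase) and combine `cmplnt_fr_dt` and `cmplnt_fr_tm` into a single `complaint_datetime`. Values that cannot be parsed become `NaT`.
2. Rebuild the `date` column from `complaint_datetime`, normalized to midnight.
3. Derive new columns from `date`:
   - `year` (e.g., 2024)
   - `month` (1–12)
   - `weekday` (Monday, …, Sunday)
4. Map each `month` to a meteorological **season** (`Winter` = Dec–Feb, `Spring` = Mar–May, `Summer` = Jun–Aug, `Fall` = Sep–Nov) using a simple helper function.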

**Why?**  
These time dimensions allow us to group and slice the crime data by date, weekday, month, or season to reveal temporal patterns.
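The steps above can be sketched on a few hypothetical rows. This version makes two choices of its own. It assumes a `%m/%d/%Y %H:%M:%S` layout for the raw date and time strings, which may not match the real files. It also replaces the helper function with a dictionary lookup (`SEASON_BY_MONTH`, a name introduced here). With the dictionary, a row with an unparseable date gets a missing season instead of a default one:

```python
import pandas as pd

# Hypothetical raw rows; column names mirror the notebook's cmplnt_fr_dt / cmplnt_fr_tm
raw = pd.DataFrame({
    'cmplnt_fr_dt': ['01/15/2024', '07/04/2024', 'not a date'],
    'cmplnt_fr_tm': ['13:45:00', '22:10:00', '08:00:00'],
})

# Combine date and time strings (assumed format); unparseable values become NaT
raw['complaint_datetime'] = pd.to_datetime(
    raw['cmplnt_fr_dt'] + ' ' + raw['cmplnt_fr_tm'],
    format='%m/%d/%Y %H:%M:%S',
    errors='coerce',
)
raw['date'] = raw['complaint_datetime'].dt.normalize()
raw['month'] = raw['date'].dt.month  # float with NaN where the date is NaT

# Dictionary lookup: months not in the dict (including NaN) map to NaN
SEASON_BY_MONTH = {12: 'Winter', 1: 'Winter', 2: 'Winter',
                   3: 'Spring', 4: 'Spring', 5: 'Spring',
                   6: 'Summer', 7: 'Summer', 8: 'Summer',
                   9: 'Fall', 10: 'Fall', 11: 'Fall'}
raw['season'] = raw['month'].map(SEASON_BY_MONTH)
print(raw[['date', 'month', 'season']])
```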

In [None]:
# 1) Standardize and combine into a single datetime
df_crime.columns = df_crime.columns.str.strip().str.lower()
df_crime['complaint_datetime'] = pd.to_datetime(
    df_crime['cmplnt_fr_dt'].astype(str) + ' ' + df_crime['cmplnt_fr_tm'].astype(str),
    errors='coerce'
)

# 2) Create a date column (datetime64, time component set to midnight)
df_crime['date'] = df_crime['complaint_datetime'].dt.normalize()

# 3) Now the .dt accessor works on the datetime column:
df_crime['year']    = df_crime['date'].dt.year
df_crime['month']   = df_crime['date'].dt.month
df_crime['weekday'] = df_crime['date'].dt.day_name()

# 4) Map months to meteorological seasons.
#    Rows whose datetime could not be parsed (NaT, so month is NaN) get None
#    instead of silently falling into 'Fall'.
def get_season(month):
    if month in [12, 1, 2]:
        return 'Winter'
    elif month in [3, 4, 5]:
        return 'Spring'
    elif month in [6, 7, 8]:
        return 'Summer'
    elif month in [9, 10, 11]:
        return 'Fall'
    return None

df_crime['season'] = df_crime['month'].apply(get_season)



## 3. Aggregate & Plot Daily Crime Counts

This cell:

1. Aggregates the number of crime incidents per `date`.
2. Builds a Plotly line chart showing **crime_count** versus **date**.

**Why?**  
Visualizing day‑to‑day crime volume helps us spot sudden spikes or drops, which is useful for understanding short‑term fluctuations and as a first step toward anomaly detection.
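One simple way to turn "spot sudden spikes" into a rule is a rolling z-score, which compares each day with the mean and standard deviation of the preceding days. This technique is not part of the original notebook. The 14-day window and the threshold of 3 are arbitrary assumptions, and the data is synthetic:

```python
import numpy as np
import pandas as pd

# Synthetic daily counts: a small alternating wiggle plus one injected spike
dates = pd.date_range('2024-01-01', periods=30, freq='D')
counts = np.full(30, 100)
counts[::2] = 104           # day-to-day wiggle
counts[20] = 180            # injected spike on 2024-01-21
daily = pd.DataFrame({'date': dates, 'crime_count': counts})

# Baseline = previous 14 days (shift(1) keeps today out of its own baseline)
window = 14
baseline = daily['crime_count'].shift(1).rolling(window)
daily['z'] = (daily['crime_count'] - baseline.mean()) / baseline.std()

# Flag days more than 3 standard deviations from the baseline; NaN z -> False
daily['anomaly'] = daily['z'].abs() > 3

print(daily.loc[daily['anomaly'], ['date', 'crime_count', 'z']])
```

Excluding the current day via `shift(1)` matters because otherwise a spike inflates its own baseline and becomes harder to flag. Days near the start of the series have no full window, so they are never flagged.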


In [None]:
## 3. Daily Crime Count Time Series

# Aggregate daily crime counts
daily_counts = (
    df_crime.groupby('date')
           .size()
           .reset_index(name='crime_count')
)

# Plot interactive line chart of daily crime counts
fig = px.line(
    daily_counts,
    x='date',
    y='crime_count',
    title='Daily Crime Count Over Time',
    labels={'crime_count': 'Number of Crimes', 'date': 'Date'}
)
fig.show()


## 4. Compute & Visualize Monthly Crime Totals

Here we:

1. Group crime incidents by `(year, month)` to get monthly totals.
2. Create a `month_start` datetime as the first day of each month.
3. Render a Plotly bar chart of monthly crime counts.

**Why?**  
Monthly aggregation surfaces seasonal cycles and longer‑term trends that might be obscured in daily noise.
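The same monthly totals can also be obtained with `resample`, which works on a `DatetimeIndex` and produces the month-start timestamps directly. The following minimal sketch uses hypothetical incident dates and an invented `offense` column, and compares `resample` with the `(year, month)` grouping:

```python
import pandas as pd

# Hypothetical incident-level data: one row per crime
df = pd.DataFrame({
    'date': pd.to_datetime(['2024-01-03', '2024-01-17', '2024-01-30',
                            '2024-02-11', '2024-04-02']),
    'offense': ['THEFT', 'ASSAULT', 'THEFT', 'BURGLARY', 'THEFT'],
})

# Approach used in the notebook: group by (year, month)
by_ym = (df.assign(year=df['date'].dt.year, month=df['date'].dt.month)
           .groupby(['year', 'month'])
           .size())

# Alternative: resample at month-start frequency ('MS')
monthly = (df.set_index('date')
             .resample('MS')
             .size()
             .rename('crime_count')
             .rename_axis('date')
             .reset_index())
print(monthly)
```

The two approaches differ in one respect. The groupby only returns months that contain incidents, while `resample` also emits empty months (March here) with a count of 0, which keeps gaps visible in a bar chart.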


In [None]:
## 4. Monthly Crime Trend
# Aggregate by year and month
df_monthly = (
    df_crime.groupby(['year', 'month'])
           .size()
           .reset_index(name='crime_count')
)
# Create a datetime for the first day of each month for plotting
# (cast to int in case unparsed dates turned year/month into floats)
df_monthly['month_start'] = pd.to_datetime(
    df_monthly[['year', 'month']].astype(int).assign(day=1)
)

# Plot interactive bar chart for monthly counts
fig = px.bar(
    df_monthly,
    x='month_start',
    y='crime_count',
    title='Monthly Crime Count',
    labels={'crime_count': 'Number of Crimes', 'month_start': 'Month'}
)
fig.show()

## 5. Compare Crime Counts by Weekday

Here we:

1. Define the calendar order of weekdays (Monday through Sunday).
2. Count crime incidents per `weekday` and reindex the counts into that order.
3. Render a Plotly bar chart of crime counts by weekday.

**Why?**  
Comparing weekdays shows whether crime volume follows a weekly rhythm, for example whether weekends differ from weekdays.
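An alternative to `reindex(weekday_order)` is an ordered categorical, which stores the Monday-to-Sunday order with the data itself. The following minimal sketch uses hypothetical weekday labels, and the names `WEEKDAY_ORDER` and `weekday_cat` are introduced here:

```python
import pandas as pd

# Hypothetical weekday labels for six incidents (no Tuesday or Sunday rows)
weekday = pd.Series(['Friday', 'Monday', 'Friday', 'Saturday', 'Wednesday', 'Thursday'])

WEEKDAY_ORDER = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
                 'Friday', 'Saturday', 'Sunday']

# Ordered categorical: value_counts(sort=False) lists every category in
# calendar order and reports weekdays with no incidents as 0
weekday_cat = pd.Categorical(weekday, categories=WEEKDAY_ORDER, ordered=True)
df_weekday = (pd.Series(weekday_cat)
                .value_counts(sort=False)
                .rename_axis('weekday')
                .reset_index(name='crime_count'))
print(df_weekday)
```

With `reindex`, a weekday that never appears would show up as `NaN`, whereas the categorical approach reports it as 0. Another way to fix the axis order is to pass `category_orders={'weekday': WEEKDAY_ORDER}` to `px.bar`.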


In [None]:
## 5. Crime Count by Weekday

# Order weekdays
weekday_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']

# Count crimes by weekday
df_weekday = (
    df_crime['weekday']
            .value_counts()
            .reindex(weekday_order)
            .reset_index()
)
df_weekday.columns = ['weekday', 'crime_count']

# Plot bar chart for weekdays
fig = px.bar(
    df_weekday,
    x='weekday',
    y='crime_count',
    title='Crime Count by Weekday',
    labels={'crime_count': 'Number of Crimes'}
)
fig.show()

## 6. Show Crime Distribution by Season

This cell:

1. Tallies the number of crimes in each season (`Winter`, `Spring`, `Summer`, `Fall`).
2. Displays a pie chart illustrating relative share per season.

**Why?**  
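A seasonal breakdown highlights periods of elevated or reduced crime activity, valuable for strategic planning. Raw counts also depend on how many days of each season the data covers (Spring and Summer have 92 days, Fall 91, Winter 90 or 91). If the date range includes partial years, crimes per day is a fairer basis for comparison.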
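A per-day rate corrects for uneven season coverage. The sketch below uses synthetic data with a constant 10 crimes per day from 2024-01-01 to 2024-04-30, so any difference in the raw totals comes only from coverage. The column and variable names are illustrative:

```python
import pandas as pd

# Synthetic daily counts: constant 10 crimes/day, covering part of Winter and Spring
dates = pd.date_range('2024-01-01', '2024-04-30', freq='D')
daily = pd.DataFrame({'date': dates, 'crime_count': 10})

SEASON_BY_MONTH = {12: 'Winter', 1: 'Winter', 2: 'Winter',
                   3: 'Spring', 4: 'Spring', 5: 'Spring',
                   6: 'Summer', 7: 'Summer', 8: 'Summer',
                   9: 'Fall', 10: 'Fall', 11: 'Fall'}
daily['season'] = daily['date'].dt.month.map(SEASON_BY_MONTH)

# Total crimes and number of covered days per season, then crimes per day
per_season = daily.groupby('season').agg(
    crime_count=('crime_count', 'sum'),
    n_days=('date', 'nunique'),
)
per_season['crimes_per_day'] = per_season['crime_count'] / per_season['n_days']
print(per_season)
```

Winter covers 60 days in this range and Spring covers 61, so their totals differ (600 versus 610) even though the daily rate is identical.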

In [None]:
## 6. Crime Distribution by Season

# Count crimes per season
df_season = (
    df_crime['season']
            .value_counts()
            .reset_index()
)
df_season.columns = ['season', 'crime_count']

# Plot pie chart for seasonal distribution
fig = px.pie(
    df_season,
    names='season',
    values='crime_count',
    title='Crime Distribution by Season'
)
fig.show()


## 7. Compute Daily Crime Counts

In this cell, we:

1. **Group** the cleaned crime DataFrame by the `date` column, ensuring that every incident is tallied under its occurrence day.
2. **Count** the number of incidents per day using `.size()`.
3. **Reset the index** to produce a tidy DataFrame with columns:
   - `date` — the calendar date.
   - `crime_count` — the total number of crimes on that date.
4. **Print** the resulting shape to confirm how many days of data we have (rows × 2 columns).

This daily summary (`daily_counts`) is essential for later merging with weather data on the same `date` field to explore potential climate–crime correlations.
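`groupby('date').size()` only produces rows for dates with at least one incident. That is harmless for the inner merge that follows, but rolling windows and gap checks assume one row per calendar day. The following minimal sketch, on hypothetical dates, reindexes the counts onto a full calendar:

```python
import pandas as pd

# Hypothetical incident dates; 2024-03-03 has no incidents
df = pd.DataFrame({'date': pd.to_datetime(
    ['2024-03-01', '2024-03-01', '2024-03-02', '2024-03-04'])})

# groupby only yields dates that occur in the data: 3 rows here, not 4
daily_counts = df.groupby('date').size().reset_index(name='crime_count')

# Reindex onto a complete daily calendar so missing days appear as 0
full_range = pd.date_range(daily_counts['date'].min(),
                           daily_counts['date'].max(), freq='D')
daily_full = (daily_counts.set_index('date')
                          .reindex(full_range, fill_value=0)
                          .rename_axis('date')
                          .reset_index())
print(daily_counts.shape, daily_full.shape)
```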


In [None]:
# -- Compute daily crime counts so we can merge on 'date' --
daily_counts = (
    df_crime
      .groupby('date')         # make sure df_crime['date'] exists!
      .size()                  # count the number of incidents per day
      .reset_index(name='crime_count')
)
print("Daily counts calculated:", daily_counts.shape)
daily_counts.head()

## 8. Compare Daily Crime Counts with Temperature

Steps in this cell:

1. Compute a **daily temperature** series from the weather dataset by averaging `temp_max` (maximum temperature) for each `date`.
2. Merge the daily crime counts and daily temperature on `date` (an inner join, so only dates present in both datasets are kept).
3. Plot both time series together for comparison.

**Why?**  
Overlaying crime volume with temperature allows us to explore potential correlations (e.g., do hotter days see more incidents?).
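The overlay is visual, and a correlation coefficient puts a number on the relationship. The following minimal sketch uses synthetic series with values invented for illustration. Its column names mirror the notebook's `crime_count` and `temp_avg`. It computes Pearson's r, and computes Spearman's rank correlation as Pearson on ranks:

```python
import pandas as pd

# Synthetic daily series (not real NYC figures); date ranges overlap by 4 days
daily_counts = pd.DataFrame({
    'date': pd.date_range('2024-06-01', periods=5, freq='D'),
    'crime_count': [300, 320, 310, 350, 360],
})
df_temp_daily = pd.DataFrame({
    'date': pd.date_range('2024-06-02', periods=5, freq='D'),
    'temp_avg': [22.0, 21.0, 27.0, 29.0, 30.0],
})

# Inner merge keeps only dates present in both tables (2024-06-02 to 06-05)
df_combined = daily_counts.merge(df_temp_daily, on='date', how='inner')

# Pearson: linear association between the raw values
pearson = df_combined['crime_count'].corr(df_combined['temp_avg'])

# Spearman's rho = Pearson correlation of the ranks (robust to outliers and
# to monotonic but non-linear relationships)
spearman = df_combined['crime_count'].rank().corr(df_combined['temp_avg'].rank())

print(f"rows={len(df_combined)}  pearson={pearson:.2f}  spearman={spearman:.2f}")
```

Both crime and temperature follow seasonal cycles, so a positive correlation on its own does not show that temperature drives crime. Comparing days within a single season is a stricter test.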


In [None]:
## 8. Compare Daily Crime Counts with Temperature

# Daily temperature: mean of 'temp_max' per date
# (identical to temp_max when the weather data has one row per day)
df_temp_daily = (
    df_weather.groupby('date')
              .agg(temp_avg=('temp_max', 'mean'))
              .reset_index()
)

# Merge crime counts with temperature data

df_combined = pd.merge(
    daily_counts,
    df_temp_daily,
    on='date',
    how='inner'
)

# Plot both metrics as lines; they share one y-axis, so temperature
# will look nearly flat if daily crime counts are much larger
y = ['crime_count', 'temp_avg']
fig = px.line(
    df_combined,
    x='date',
    y=y,
    labels={'value': 'Value', 'variable': 'Metric', 'date': 'Date'},
    title='Daily Crime Count vs. Average Temperature'
)
fig.show()
