# Background

- **Author**: Ying-Dian, Lin
- **Created At**: 2025-11-23
- **Research Motivation and Context (why are we interested in the findings?):**
New York City introduced a congestion charge for its central business district (CBD) in early January 2025. This notebook analyzes whether vehicle speeds inside the CBD increased after the charge took effect and whether the policy had spillover effects outside the CBD.

- **Main Findings and Takeaways:**
1. Before January 2025, **in_CBD** speeds were consistently lower than **not_in_CBD** speeds, and this pattern was stable over time.
2. After the policy takes effect on **2025-01-08**, in_CBD speeds show a visible upward trend, suggesting reduced congestion.
3. Speeds outside the CBD fluctuate much more but do not show a trend clearly attributable to the policy.

- **Future Direction:**
1. Estimate a formal difference-in-differences model.
2. Add traffic volume data to interpret speed changes more accurately.
3. Extend the dataset beyond October 2025 to test long-run effects.

In [None]:
# Load packages here
import os
import pandas as pd
import matplotlib.pyplot as plt
import datetime as dt


In [None]:
# Load input data here and please finish all the data manipulation here.
# Finish this block by printing the first ten observations of the data.
# Note: We assume the processed file is located in /data/processed

df = pd.read_csv('/data/processed/speed-mht.csv', encoding_errors='ignore')

# Convert the timestamp to a calendar date (midnight), kept as datetime64
df['DATE'] = pd.to_datetime(df['DATA_AS_OF']).dt.normalize()

# Classify CBD
df['CBD_group'] = df['is_in_CBD'].apply(
    lambda x: 'in_CBD' if 'CBD' in str(x) and 'not' not in str(x).lower() else 'not_in_CBD'
)

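# Compute distance per record. This assumes SPEED is in m/s and TRAVEL_TIME is in seconds.
# Verify the units against the data dictionary: if SPEED is in mph, DISTANCE_M is not in
# meters and the m/s to km/h conversion applied later does not hold.
df['DISTANCE_M'] = df['SPEED'] * df['TRAVEL_TIME']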

# Create week variable (start of the Monday-to-Sunday week containing each date)
df['WEEK'] = df['DATE'].dt.to_period('W').dt.start_time

# Weighted weekly averages: total distance / total travel time per week and group
totals = df.groupby(['WEEK', 'CBD_group'])[['DISTANCE_M', 'TRAVEL_TIME']].sum()
weekly = (totals['DISTANCE_M'] / totals['TRAVEL_TIME']).unstack()

# Convert to km/h
weekly = weekly * 3.6

weekly.to_csv('weekly_weighted_avg_speed_by_CBD.csv')

df.head(10)

In [None]:
# Summary statistics
num_vars = ['SPEED','TRAVEL_TIME','DISTANCE_M']
summary_numeric = df[num_vars].describe(percentiles=[0.25,0.5,0.75]).T

cat_vars = ['CBD_group']
summary_categorical = df['CBD_group'].value_counts(normalize=True).head(10)

summary_numeric, summary_categorical

### The actual analysis starts below
The figure in this section shows how **weekly weighted average speeds** inside and outside the CBD change before and after NYC's congestion pricing policy takes effect (2025-01-08).
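
In terms of the columns built in the data-manipulation cell, the weekly weighted average for week $w$ and group $g$ is total distance divided by total travel time, which is the same as a travel-time-weighted mean of `SPEED`. The symbols below are notation introduced here for illustration: $v_i$ is `SPEED`, $t_i$ is `TRAVEL_TIME`, and $d_i$ is `DISTANCE_M` for record $i$.

```latex
\bar{v}_{w,g}
  = \frac{\sum_{i \in (w,g)} d_i}{\sum_{i \in (w,g)} t_i}
  = \frac{\sum_{i \in (w,g)} v_i \, t_i}{\sum_{i \in (w,g)} t_i},
  \qquad d_i = v_i \, t_i
```

Records with longer travel times therefore count more heavily than they would in a simple mean of `SPEED`.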

In [None]:
# Plot weekly weighted speed
plt.figure(figsize=(14,6))
plt.plot(weekly.index, weekly['in_CBD'], label='in_CBD')
plt.plot(weekly.index, weekly['not_in_CBD'], label='not_in_CBD')

# Policy line
policy_date = dt.datetime(2025,1,8)
plt.axvline(x=policy_date, color='red', linestyle='--', linewidth=2,
            label='CBD policy starts (2025-01-08)')

plt.xlabel('Week')
plt.ylabel('Weighted Avg Speed (km/h)')
plt.title('Weekly Weighted Average Speed: in_CBD vs not_in_CBD')
plt.legend()
plt.grid(True, linestyle='--', alpha=0.5)
plt.xticks(rotation=45)
plt.tight_layout()
plt.savefig('weekly_weighted_speed_plot.png')

weekly.head()

## Interpretation of Results

![Weekly Weighted Speed](../data/temp/img/weekly_weighted_speed_plot.png)

**1. Before the policy (before 2025-01-08)**
- in_CBD speeds remain consistently lower (35–55 km/h).
- not_in_CBD speeds are higher and more volatile (55–90 km/h).

**2. After the policy takes effect**
- in_CBD speeds show a **visible and persistent increase**, suggesting reduced congestion.
- not_in_CBD speeds do **not** show a consistent shift, so the plot gives no clear sign of spillover congestion.
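
One way to put a number on the shift described above is to compare mean weekly speeds before and after the policy date for each group. The sketch below assumes a table shaped like `weekly` (a weekly datetime index with `in_CBD` and `not_in_CBD` columns). `pre_post_means` is a hypothetical helper, and the values in `demo` are synthetic placeholders, not results from this notebook.

```python
import pandas as pd

POLICY_DATE = pd.Timestamp("2025-01-08")


def pre_post_means(weekly: pd.DataFrame, policy_date: pd.Timestamp) -> pd.DataFrame:
    """Mean speed per group before and after policy_date, plus the change."""
    pre = weekly[weekly.index < policy_date].mean()
    post = weekly[weekly.index >= policy_date].mean()
    return pd.DataFrame({"pre": pre, "post": post, "change": post - pre})


# Synthetic placeholder values (NOT taken from the data in this notebook).
demo = pd.DataFrame(
    {"in_CBD": [40.0, 42.0, 46.0, 48.0], "not_in_CBD": [70.0, 72.0, 71.0, 71.0]},
    index=pd.to_datetime(["2024-12-23", "2024-12-30", "2025-01-13", "2025-01-20"]),
)

summary = pre_post_means(demo, POLICY_DATE)
print(summary)
```

Because weeks are labeled by their Monday start, the week beginning 2025-01-06 contains the policy date but would be counted as pre-policy here. Dropping that week is a reasonable alternative.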

**3. Policy Evaluation**
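- The CBD toll appears **effective**: inside-CBD speeds increased after it took effect.
- This is consistent with the theoretical prediction that reduced traffic volume → smoother traffic, although traffic volume is not observed in this dataset.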

**4. Next steps**
- Run a formal **Difference-in-Differences (DiD)** model.
- Separate **weekday vs weekend** effects.
- Validate **parallel trends** using 2023–2024 data.
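
As a starting point for the DiD step, here is a minimal sketch of the point estimate, assuming a table shaped like `weekly`, with `in_CBD` as the treated group and `not_in_CBD` as the control. It fits `speed ~ 1 + treated + post + treated*post` by ordinary least squares with NumPy and returns the interaction coefficient. `did_estimate` is a hypothetical helper, the values in `demo` are synthetic placeholders rather than results from this notebook, and no standard errors are computed.

```python
import numpy as np
import pandas as pd


def did_estimate(weekly: pd.DataFrame, policy_date: pd.Timestamp) -> float:
    """Return the OLS coefficient on treated*post (the 2x2 DiD estimate)."""
    wide = weekly.copy()
    wide.index.name = "week"
    # Reshape to one row per (week, group) and drop missing speeds.
    long = wide.reset_index().melt(
        id_vars="week", var_name="group", value_name="speed"
    ).dropna(subset=["speed"])

    treated = (long["group"] == "in_CBD").to_numpy(dtype=float)
    post = (long["week"] >= policy_date).to_numpy(dtype=float)

    # Design matrix: intercept, treated, post, treated*post.
    X = np.column_stack([np.ones(len(long)), treated, post, treated * post])
    y = long["speed"].to_numpy(dtype=float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(beta[3])


# Synthetic placeholder values (NOT taken from the data in this notebook).
demo = pd.DataFrame(
    {"in_CBD": [40.0, 42.0, 46.0, 48.0], "not_in_CBD": [70.0, 72.0, 71.0, 71.0]},
    index=pd.to_datetime(["2024-12-23", "2024-12-30", "2025-01-13", "2025-01-20"]),
)

effect = did_estimate(demo, pd.Timestamp("2025-01-08"))
print(f"DiD estimate: {effect:.2f}")
```

The estimate can only be read as a policy effect if the parallel-trends assumption holds, which is why the pre-period check listed above matters. A full analysis would also need inference that accounts for serial correlation in the weekly series.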
