# MTA Ridership Analysis Project

**Team:** Haixin, Hanghai (bouncing-penguin)

**Dataset:** MTA Daily Ridership Data: Beginning 2020

**Research Questions:**
1. What's the difference between weekday and weekend travel patterns?
2. Do holidays and big events show up in the ridership numbers?
3. Which parts of the MTA system bounced back fastest after 2020?

## 1. Setup and Data Loading

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

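# Load data directly from the New York State Open Data portal (data.ny.gov).
# $limit raises the API's default row cap so the full daily history is returned.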
url = "https://data.ny.gov/resource/vxuj-8kew.csv?$limit=50000"
df = pd.read_csv(url)

print(f"Dataset shape: {df.shape}")
df.head()
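
The request URL above can also be assembled from named parameters, which makes the row limit and sort order explicit. The sketch below is only an illustration: `build_query_url` is a hypothetical helper, not part of the original notebook, and it assumes the endpoint accepts the standard SoQL `$limit` and `$order` parameters and that the date field is named `date`, as in the cleaning step.

```python
from urllib.parse import urlencode

BASE_URL = "https://data.ny.gov/resource/vxuj-8kew.csv"


def build_query_url(base_url, limit=50000, order_by="date"):
    """Hypothetical helper: build a query URL with a row limit and sort order."""
    # $limit and $order are SoQL query parameters; safe="$" keeps the
    # dollar sign readable instead of percent-encoding it.
    params = {"$limit": limit, "$order": order_by}
    return f"{base_url}?{urlencode(params, safe='$')}"


url = build_query_url(BASE_URL)
print(url)
```

The resulting `url` can be passed to `pd.read_csv` exactly as in the cell above. Sorting on the server side is optional here, since the cleaning step sorts by date anyway.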

## 2. Data Cleaning

In [None]:
# Convert date column to datetime and sort
df['date'] = pd.to_datetime(df['date'])
df = df.sort_values('date')

print(f"Date range: {df['date'].min()} to {df['date'].max()}")
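
As a possible starting point for research question 1, each day can be labeled as a weekday or a weekend once `date` is a datetime column. This is a minimal sketch, not the team's final method: `add_day_type` and the `day_type` column are hypothetical names, and the small `demo` frame is synthetic data used only to show the labeling.

```python
import pandas as pd


def add_day_type(frame, date_col="date"):
    """Hypothetical helper: return a copy with a 'day_type' column."""
    out = frame.copy()
    dates = pd.to_datetime(out[date_col])
    # dayofweek runs from Monday=0 to Sunday=6, so 5 and 6 are the weekend.
    is_weekend = dates.dt.dayofweek >= 5
    out["day_type"] = is_weekend.map({True: "Weekend", False: "Weekday"})
    return out


# Synthetic example: Friday through Monday, 5-8 January 2024.
demo = pd.DataFrame({"date": pd.date_range("2024-01-05", periods=4, freq="D")})
labeled = add_day_type(demo)
print(labeled)
```

Applied to the real data, something like `add_day_type(df).groupby("day_type")["subways_of_comparable_pre_pandemic_day"].mean()` would give a first weekday versus weekend comparison for the subway.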

## 3. Visualization: MTA Ridership Recovery by Transit Mode

In [None]:
plt.figure(figsize=(14, 7))

# Map each legend label to its recovery-ratio column and plot them in order
recovery_cols = {
    'Subway': 'subways_of_comparable_pre_pandemic_day',
    'Bus': 'buses_of_comparable_pre_pandemic_day',
    'LIRR': 'lirr_of_comparable_pre_pandemic_day',
    'Metro-North': 'metro_north_of_comparable_pre_pandemic_day',
}
for label, col in recovery_cols.items():
    plt.plot(df['date'], df[col], label=label, alpha=0.8, linewidth=1.2)

plt.axhline(y=1.0, color='gray', linestyle='--', linewidth=1.5, label='Pre-pandemic baseline (100%)')

plt.xlabel('Date', fontsize=12)
plt.ylabel('Share of Pre-Pandemic Ridership (1.0 = 100%)', fontsize=12)
plt.title('MTA Ridership Recovery: Subway vs Bus vs Commuter Rail (2020-Present)', fontsize=14, fontweight='bold')
plt.legend(loc='lower right', fontsize=10)
plt.grid(True, alpha=0.3)
plt.ylim(0, 1.5)
plt.tight_layout()
plt.show()
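
Daily values swing with the weekly cycle, which can make the lines in the chart above hard to compare. One common option is to smooth each series with a 7-day rolling mean before plotting. The sketch below assumes that approach: `rolling_recovery` is a hypothetical helper, and the `demo` frame with its `ratio` column is synthetic data for illustration only.

```python
import pandas as pd


def rolling_recovery(frame, columns, window=7, date_col="date"):
    """Hypothetical helper: rolling mean of the given columns, indexed by date."""
    ordered = frame.sort_values(date_col).set_index(date_col)
    # min_periods=window leaves the first window-1 rows as NaN instead of
    # averaging over an incomplete week.
    return ordered[columns].rolling(window=window, min_periods=window).mean()


# Synthetic example: eight consecutive days with a steadily rising ratio.
demo = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=8, freq="D"),
    "ratio": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
})
smoothed = rolling_recovery(demo, ["ratio"])
print(smoothed)
```

On the real data, passing the four `*_of_comparable_pre_pandemic_day` columns would return a date-indexed frame that can be plotted in place of the raw series, which may help with research question 3.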