# 01 - Worcester EDA (MVP)

This notebook is part of **MA EV ChargeMap**, a personal portfolio project.

Goals (v1):
- Load the engineered candidate-site dataset for Worcester
- Sanity-check feature distributions
- Explore the heuristic scores (demand/equity/overall)

> **Note**: v1 uses synthetic candidate sites, demographics, and traffic data so that the full stack runs end to end. In v2, replace the synthetic pipeline with real GIS and MAPC/MassDOT layers.


In [None]:
import json
from pathlib import Path

import pandas as pd

# Processed output of the data pipeline (path is relative to this notebook)
DATA = Path("../data/processed/sites_worcester.json")

if not DATA.exists():
    raise FileNotFoundError(
        f"Missing processed dataset: {DATA}\n"
        "Run the pipeline first:\n"
        "  python ../data/ingest_parcels.py\n"
        "  python ../data/ingest_demographics.py\n"
        "  python ../data/ingest_traffic.py\n"
        "  python ../data/build_scores.py"
    )

# Parse the JSON records into a DataFrame (one row per candidate site)
rows = json.loads(DATA.read_text(encoding="utf-8"))
df = pd.DataFrame(rows)

df.head()
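Fields loaded from JSON can arrive as strings, and `DataFrame.describe()` on a mixed-dtype frame summarizes only the numeric columns, so a string-typed feature would silently disappear from the sanity checks below. A minimal sketch of a schema check, assuming the pipeline should emit every feature as a number; `check_schema` and the toy `sample` frame (including its `site_id` column) are hypothetical and not part of the project code.

```python
import pandas as pd


def check_schema(df: pd.DataFrame, required: list[str]) -> pd.DataFrame:
    """Return a copy of df with the required columns coerced to numeric.

    Raises KeyError listing every missing column. Values that cannot be
    parsed as numbers become NaN so they surface in null counts.
    """
    missing = [c for c in required if c not in df.columns]
    if missing:
        raise KeyError(f"Missing expected columns: {missing}")
    out = df.copy()
    # pd.to_numeric runs column by column; errors="coerce" maps bad values to NaN
    out[required] = out[required].apply(pd.to_numeric, errors="coerce")
    return out


# Hypothetical toy frame (invented values): traffic_index arrived as strings,
# and one of them cannot be parsed
sample = pd.DataFrame({
    "site_id": ["a", "b"],
    "traffic_index": ["0.4", "bad"],
    "score_overall": [0.7, 0.2],
})
checked = check_schema(sample, ["traffic_index", "score_overall"])
print(checked.dtypes)
print(checked["traffic_index"].isna().sum())  # 1
```

On the notebook data, `df = check_schema(df, numeric_cols)` would run once `numeric_cols` is defined; failing loudly on a missing column is usually preferable to a summary table that quietly skips it.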

In [None]:
# Basic sanity checks: per-column summary statistics
# (non-null count, mean, std, min, quartiles, max), one row per feature
numeric_cols = [
    "traffic_index",
    "pop_density_index",
    "renters_share",
    "income_index",
    "poi_index",
    "score_overall",
    "score_demand",
    "score_equity",
    "daily_kwh_estimate",
]

df[numeric_cols].describe().T
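`describe()` reports min and max but does not flag values that are impossible for a feature. A minimal sketch of an explicit range check follows. `range_report` and `BOUNDS` are hypothetical: the `[0, 1]` bound assumes `renters_share` is a fraction rather than a percentage, and the non-negative bound on `daily_kwh_estimate` is likewise an assumption; bounds for the index columns depend on how the pipeline scales them and are left out.

```python
import pandas as pd

# Hypothetical bounds (assumptions, adjust to the real pipeline):
# renters_share is a fraction; daily_kwh_estimate cannot be negative
BOUNDS = {
    "renters_share": (0.0, 1.0),
    "daily_kwh_estimate": (0.0, float("inf")),
}


def range_report(df: pd.DataFrame, bounds: dict) -> pd.DataFrame:
    """Count nulls and non-null out-of-range values per bounded column."""
    records = []
    for col, (lo, hi) in bounds.items():
        s = df[col]
        # between() is inclusive on both ends; NaN is never "between",
        # so nulls are masked out here and counted separately
        out_of_range = (~s.between(lo, hi) & s.notna()).sum()
        records.append({
            "column": col,
            "nulls": int(s.isna().sum()),
            "out_of_range": int(out_of_range),
        })
    return pd.DataFrame(records).set_index("column")


# Hypothetical toy frame (invented values): one null, two impossible values
sample = pd.DataFrame({
    "renters_share": [0.3, 1.2, None],
    "daily_kwh_estimate": [50.0, -5.0, 10.0],
})
print(range_report(sample, BOUNDS))
```

On the notebook data, `range_report(df, BOUNDS)` returns one row per bounded column; any nonzero `out_of_range` count points at a pipeline bug rather than a distribution quirk.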


In [None]:
# Histograms of the main features and the overall score (requires matplotlib)
import matplotlib.pyplot as plt

plot_cols = [
    "traffic_index",
    "pop_density_index",
    "renters_share",
    "income_index",
    "poi_index",
    "score_overall",
]

fig, axes = plt.subplots(2, 3, figsize=(12, 7))

# One histogram per column, filling the 2x3 grid row by row
for ax, col in zip(axes.flat, plot_cols):
    df[col].hist(ax=ax, bins=30)
    ax.set_title(col)

plt.tight_layout()
plt.show()
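The histograms cover `score_overall` but not the demand and equity scores listed in the goals. One way to explore how the three heuristic scores relate is a rank (Spearman) correlation plus a short list of the top-ranked sites. This is a sketch: `score_summary` is a hypothetical helper, the toy data is invented, and it makes no assumption about the formula that combines demand and equity into `score_overall`.

```python
import pandas as pd

SCORE_COLS = ["score_overall", "score_demand", "score_equity"]


def score_summary(
    df: pd.DataFrame, n: int = 10
) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Return (Spearman correlation of the scores, top-n sites by score_overall)."""
    # Spearman correlation is the Pearson correlation of the ranks
    # (average ranks for ties), so ranking first needs no extra dependency
    spearman = df[SCORE_COLS].rank().corr()
    top = df.nlargest(n, "score_overall")[SCORE_COLS]
    return spearman, top


# Hypothetical toy frame (invented values)
toy = pd.DataFrame({
    "score_overall": [0.9, 0.5, 0.7, 0.1],
    "score_demand": [0.8, 0.4, 0.9, 0.2],
    "score_equity": [0.6, 0.7, 0.3, 0.1],
})
corr, top = score_summary(toy, n=2)
print(corr.round(2))
print(top)
```

On the notebook data, `score_summary(df)` would show whether the overall ranking tracks demand, equity, or both. Rank correlation is used because it only asks whether the scores order sites the same way, which suits heuristic scores whose scales may not be comparable.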
