# Error Analysis


## Notebook Guide

- **Purpose:** Analyze residuals and error patterns; highlight anomalies.
- **Inputs:** `data/processed/splits/*.parquet` (train, val, test) and the trained GBM model bundle at `artifacts/models/gbm_lightgbm_load_mw.pkl`.
- **Outputs:** residual charts and anomaly flags.
- **Run:** Execute cells top‑to‑bottom. If a file is missing, run the earlier pipeline notebook first.
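
The input check described in the list above can be sketched as a small preflight cell. This is only an illustration: the helper name `missing_inputs` is hypothetical, and the file list assumes the three split files and the model bundle that the cells below load.

```python
from pathlib import Path
from typing import List


def missing_inputs(repo_root: Path) -> List[Path]:
    """Return the expected input files that do not exist under repo_root."""
    # Assumed file list: taken from the paths this notebook reads.
    expected = [
        repo_root / 'data' / 'processed' / 'splits' / 'train.parquet',
        repo_root / 'data' / 'processed' / 'splits' / 'val.parquet',
        repo_root / 'data' / 'processed' / 'splits' / 'test.parquet',
        repo_root / 'artifacts' / 'models' / 'gbm_lightgbm_load_mw.pkl',
    ]
    return [p for p in expected if not p.exists()]


# Same repo-root convention the notebook cells use
repo_root = Path.cwd().parent if Path.cwd().name == 'notebooks' else Path.cwd()
missing = missing_inputs(repo_root)
if missing:
    print('Missing inputs; run the earlier pipeline notebook first:')
    for path in missing:
        print('  ', path)
else:
    print('All inputs found.')
```

Running this before the analysis cells turns a late `FileNotFoundError` into an explicit list of what still needs to be generated.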


This notebook computes residuals (actual minus predicted `load_mw`) for the GBM model on the test split and breaks them down by hour of day and day of week.

In [None]:
# Environment setup
import sys, subprocess
from pathlib import Path

print('Python:', sys.executable)
repo_root = Path.cwd().parent if Path.cwd().name == 'notebooks' else Path.cwd()
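# Install the repo in editable mode with this kernel's interpreter, so the
# package lands in the same environment the notebook runs in.
subprocess.run([sys.executable, '-m', 'pip', 'install', '-e', str(repo_root)], check=True)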

In [None]:
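import pickle
from pathlib import Path

import pandas as pd

# Resolve paths from the repo root so the cell works whether the kernel
# starts in notebooks/ or in the repo root.
repo_root = Path.cwd().parent if Path.cwd().name == 'notebooks' else Path.cwd()
splits_dir = repo_root / 'data' / 'processed' / 'splits'

train = pd.read_parquet(splits_dir / 'train.parquet')
val = pd.read_parquet(splits_dir / 'val.parquet')
test = pd.read_parquet(splits_dir / 'test.parquet')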

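# Load the GBM model bundle (a dict with the fitted model and its feature columns).
# Only unpickle files you trust: pickle can execute arbitrary code on load.
model_path = repo_root / 'artifacts' / 'models' / 'gbm_lightgbm_load_mw.pkl'
with open(model_path, 'rb') as f:
    bundle = pickle.load(f)
model = bundle['model']
feat_cols = bundle['feature_cols']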

# Predict on the held-out test split
X = test[feat_cols].to_numpy()
y = test['load_mw'].to_numpy()
pred = model.predict(X)
resid = y - pred  # positive residual: the model under-predicted load

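import matplotlib.pyplot as plt

# Attach calendar features to each residual
err = test[['timestamp']].copy()
err['timestamp'] = pd.to_datetime(err['timestamp'])
err['resid'] = resid
err['hour'] = err['timestamp'].dt.hour
err['dow'] = err['timestamp'].dt.dayofweek  # Monday=0 ... Sunday=6

# Give each grouping its own axes; two bare .plot() calls would overlay on one
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
err.groupby('hour')['resid'].mean().plot(ax=axes[0], title='Mean Residual by Hour')
err.groupby('dow')['resid'].mean().plot(kind='bar', ax=axes[1], title='Mean Residual by Day of Week')
plt.tight_layout()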
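
The guide lists anomaly flags as an output, but the notebook does not say how they are derived. The sketch below shows one common option, a robust (median/MAD) z-score on the residuals. The function name `flag_anomalies`, the 3.5 threshold, and the synthetic `demo` frame standing in for `err` are all assumptions made for illustration, not the project's method.

```python
import numpy as np
import pandas as pd


def flag_anomalies(resid: pd.Series, threshold: float = 3.5) -> pd.Series:
    """Flag residuals whose robust z-score exceeds `threshold` in absolute value."""
    median = resid.median()
    mad = (resid - median).abs().median()  # median absolute deviation
    if mad == 0:
        # No spread to scale by: flag nothing rather than divide by zero
        return pd.Series(False, index=resid.index)
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score
    robust_z = 0.6745 * (resid - median) / mad
    return robust_z.abs() > threshold


# Synthetic stand-in for the notebook's `err` frame (two weeks of hourly residuals)
rng = np.random.default_rng(0)
timestamps = pd.date_range('2024-01-01', periods=14 * 24, freq=pd.Timedelta(hours=1))
demo = pd.DataFrame({'timestamp': timestamps, 'resid': rng.normal(0, 50, len(timestamps))})
demo.loc[100, 'resid'] = 1000.0  # injected spike that should be flagged

demo['is_anomaly'] = flag_anomalies(demo['resid'])
print(demo.loc[demo['is_anomaly'], ['timestamp', 'resid']])
```

With the real data, the same call would be `err['is_anomaly'] = flag_anomalies(err['resid'])`. The median and MAD are used instead of the mean and standard deviation so that a few large errors do not inflate the scale and hide themselves.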


## Visual Sanity Checks

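These plots show the most recent 7 × 24 rows of the load, wind, and solar series from `features.parquet` (7 days if the data is hourly) as a quick visual check that recent values look plausible.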

In [None]:
from pathlib import Path
import pandas as pd
import matplotlib.pyplot as plt

repo_root = Path.cwd().parent if Path.cwd().name == 'notebooks' else Path.cwd()
features_path = repo_root / 'data' / 'processed' / 'features.parquet'
if not features_path.exists():
    print('features.parquet not found. Run the feature pipeline first.')
else:
    df_viz = pd.read_parquet(features_path).sort_values('timestamp')
    if {'load_mw','wind_mw','solar_mw'}.issubset(df_viz.columns):
        recent = df_viz.tail(7 * 24)  # last 7 days, assuming one row per hour
        fig, ax = plt.subplots(3, 1, figsize=(12, 7), sharex=True)
        recent.plot(x='timestamp', y='load_mw', ax=ax[0], color='#1f77b4', title='Load (last 7 days)')
        recent.plot(x='timestamp', y='wind_mw', ax=ax[1], color='#2ca02c', title='Wind (last 7 days)')
        recent.plot(x='timestamp', y='solar_mw', ax=ax[2], color='#ff7f0e', title='Solar (last 7 days)')
        plt.tight_layout()
    else:
        print('Expected columns not found in features.parquet')
