# Forecast Comparison & Gap-Fill Notebook 

**Goal:**  
Compare National Demand Forecast data from:

* `da_demand_forecast.parquet`  →  *NESO feed*  
* `demand_forecast_forecast.parquet` →  *Elexon BMRS feed*  

…and **fill any half-hourly periods that are missing** in the Elexon file with the corresponding values from the NESO file.

---

### Steps

1. Load both parquet files  
2. Ensure `datetime` column is timezone-aware UTC  
3. Identify missing periods in the Elexon series  
4. Fill them from NESO where available  
5. Output a merged parquet (`demand_forecast_filled.parquet`) and quick sanity checks
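
The gap-fill idea in steps 3 and 4 can be sketched on toy data before touching the real files. This is only an illustration: the column name `demand` and all values are made up, and `DataFrame.combine_first` is one possible way to do the fill, not necessarily the method used later in this notebook.

```python
import pandas as pd

# Toy half-hourly grid in UTC; "demand" is a placeholder column name
idx = pd.date_range("2024-01-01 00:00", periods=4, freq="30min", tz="UTC")

# NESO covers every period; Elexon is missing 00:30 and 01:00
neso_toy = pd.DataFrame({"demand": [100.0, 110.0, 120.0, 130.0]}, index=idx)
elexon_toy = pd.DataFrame({"demand": [101.0, 131.0]}, index=idx[[0, 3]])

# Elexon values take priority; NESO supplies rows that Elexon lacks
filled_toy = elexon_toy.combine_first(neso_toy)
print(filled_toy)
```

In the result, the 00:00 and 01:30 rows keep their Elexon values, while 00:30 and 01:00 are taken from NESO.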


In [14]:
import pandas as pd
from pathlib import Path
import numpy as np

# Project root: one level up from the current working directory
# (assumes the kernel is started inside the notebooks/ folder)
ROOT_DIR = Path.cwd().parent

# Define data files with absolute paths
FILES = {
    "ELEXON": str(ROOT_DIR / "data/processed/demand_forecast_forecast.parquet"),
    "NESO": str(ROOT_DIR / "data/processed/da_demand_forecast.parquet"),
    "OUTPUT": str(ROOT_DIR / "data/processed/demand_forecast_filled.parquet")
}

# Print the paths to verify
print("Project root:", ROOT_DIR)
for name, path in FILES.items():
    print(f"{name}: {path}")

pd.set_option("display.max_rows", 10)

Project root: c:\Users\alexa\OneDrive\Desktop\GB-Power-Price-Diver-Spread-Radar
ELEXON: c:\Users\alexa\OneDrive\Desktop\GB-Power-Price-Diver-Spread-Radar\data\processed\demand_forecast_forecast.parquet
NESO: c:\Users\alexa\OneDrive\Desktop\GB-Power-Price-Diver-Spread-Radar\data\processed\da_demand_forecast.parquet
OUTPUT: c:\Users\alexa\OneDrive\Desktop\GB-Power-Price-Diver-Spread-Radar\data\processed\demand_forecast_filled.parquet



Checking directory structure:
Project root exists?: True
Data directory exists?: True

Available parquet files:
- da_demand_forecast.parquet
- demand_filtered.parquet
- demand_forecast.parquet
- fcast_merged.parquet
- final_merged.parquet
- forecast_actual.parquet
- forecast_filtered.parquet
- imbalance_filtered.parquet
- imbalance_prices.parquet
- intraday_filtered.parquet
- intraday_prices.parquet
- intraday_trades_raw.parquet

c:\Users\alexa\OneDrive\Desktop\GB-Power-Price-Diver-Spread-Radar\data\processed\demand_forecast_forecast.parquet


FileNotFoundError: Required input file missing: c:\Users\alexa\OneDrive\Desktop\GB-Power-Price-Diver-Spread-Radar\data\processed\demand_forecast_forecast.parquet
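
The cell that loads the two files is not shown above, yet the later cells expect `elexon` and `neso` to exist. A minimal sketch of such a loader follows; the helper name `load_parquet_checked` is hypothetical, and the error text simply mirrors the message in the output above. Note that the directory listing contains `demand_forecast.parquet` but not `demand_forecast_forecast.parquet`, so it may be worth checking which file name the `ELEXON` entry in `FILES` should point to.

```python
from pathlib import Path

import pandas as pd


def load_parquet_checked(path):
    """Read a parquet file, failing early with a clear message if it is absent.

    Hypothetical helper, not part of the original notebook.
    """
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"Required input file missing: {path}")
    return pd.read_parquet(path)


# Usage in this notebook (FILES is defined in the first cell):
# elexon = load_parquet_checked(FILES["ELEXON"])
# neso = load_parquet_checked(FILES["NESO"])
```

Checking for the file before calling `pd.read_parquet` keeps the failure message short and points directly at the path that needs fixing.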

In [None]:
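def _ensure_dt(df):
    """Return a copy of df whose 'datetime' column is timezone-aware UTC."""
    if "datetime" not in df.columns:
        raise ValueError("Expected 'datetime' column missing!")
    # utc=True converts offsets to UTC and treats naive timestamps as UTC;
    # errors="coerce" turns unparseable values into NaT instead of raising
    dt = pd.to_datetime(df["datetime"], utc=True, errors="coerce")
    if dt.isna().any():
        print(f"⚠️  {dt.isna().sum()} NaT value(s) after datetime conversion – check source")
    return df.assign(datetime=dt)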

elexon = _ensure_dt(elexon)
neso   = _ensure_dt(neso)
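
It helps to be explicit about what `pd.to_datetime(..., utc=True, errors="coerce")` does to different inputs. The sketch below uses made-up timestamps: naive strings are interpreted as already being UTC, strings with an offset are converted to UTC, and unparseable strings become `NaT`. If either feed actually stores naive UK local time (an assumption worth verifying against the source), values during British Summer Time would end up one hour off.

```python
import pandas as pd

# Naive strings: treated as if they were already UTC
naive = pd.Series(["2024-06-01 00:00:00", "2024-06-01 00:30:00"])
naive_utc = pd.to_datetime(naive, utc=True)

# Strings with a +01:00 offset: shifted back one hour to UTC
aware = pd.Series(["2024-06-01 01:00:00+01:00", "2024-06-01 01:30:00+01:00"])
aware_utc = pd.to_datetime(aware, utc=True)

# Both series now describe the same instants
print((naive_utc == aware_utc).all())

# Unparseable entries are coerced to NaT rather than raising
bad = pd.to_datetime(
    pd.Series(["2024-06-01 00:00:00", "bad"]), utc=True, errors="coerce"
)
print(bad.isna().sum())
```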


In [None]:
elexon = elexon.set_index("datetime").sort_index()
neso   = neso.set_index("datetime").sort_index()

print("Elexon index range:", elexon.index.min(), "→", elexon.index.max())
print("NESO   index range:", neso.index.min(), "→", neso.index.max())
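
With both frames indexed by UTC timestamps, step 3 amounts to comparing the Elexon index with a complete half-hourly grid. The sketch below shows one way to do that; the function name `gap_report` is hypothetical, and it assumes the first timestamp sits on a half-hour boundary and that only gaps inside the target's own date range matter.

```python
import pandas as pd


def gap_report(target_index, source_index, freq="30min"):
    """Split the target's missing periods into fillable and unfilled.

    Hypothetical helper: builds the full grid between the target's first
    and last timestamp, then checks which absent periods the source has.
    """
    expected = pd.date_range(target_index.min(), target_index.max(), freq=freq)
    missing = expected.difference(target_index)
    fillable = missing.intersection(source_index)
    unfilled = missing.difference(source_index)
    return missing, fillable, unfilled


# Toy demonstration: target lacks 01:00 and 01:30, source lacks only 01:30
grid = pd.date_range("2024-01-01 00:00", periods=6, freq="30min", tz="UTC")
target_idx = grid.delete([2, 3])
source_idx = grid.delete([3])

missing, fillable, unfilled = gap_report(target_idx, source_idx)
print("missing :", list(missing))
print("fillable:", list(fillable))
print("unfilled:", list(unfilled))
```

In the notebook this would be called as `gap_report(elexon.index, neso.index)`. Reporting the unfilled periods separately makes it visible when the NESO feed cannot close a gap either.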
