# PoC v2: Minute-Level Peak Forecasting with D-PAD–Inspired Decomposition

**What’s new vs. PoC v1?**  
- **Finer resolution**: we start from minute-level data and resample it to 15-minute means, rather than working with hourly values.  
- **Richer decomposition**: instead of a single median split (base vs. peak), we adopt a **multi-frequency, deep-shallow disentangling** inspired by **Yuan & Chen (2024), “D-PAD”**:  
  1. **MCD block**: morphological envelopes → multiple IMFs (“shallow” disentanglement)  
  2. **D-R-D module**: each IMF branch is split and recombined (“deep” disentanglement)  
- **Better features**: we extract a suite of component-based features rather than only global median-based peaks.  
- **Papers it builds on**:  
  - **Z. Xia et al. (2024)**, _Day-Ahead Electricity Consumption Prediction_ (peak-load magnitude & timing)  
  - **X. Yuan & L. Chen (2024)**, _D-PAD: Deep-Shallow Multi-Frequency Patterns Disentangling_  
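
As a rough guide to what the decomposition steps in this notebook compute (a simplified reading of the code cells, not necessarily the exact formulation in the D-PAD paper), let $r_0 = x$ be one day of 15-minute load values and let $W_w(t)$ be a window of $w$ consecutive samples around $t$. The symbols $u_k$, $l_k$, $m_k$, $c_k$ and $r_k$ are introduced here for illustration only:

```latex
\begin{aligned}
u_k(t) &= \max_{s \in W_w(t)} r_{k-1}(s), \qquad
l_k(t) = \min_{s \in W_w(t)} r_{k-1}(s), \\
m_k(t) &= \tfrac{1}{2}\bigl(u_k(t) + l_k(t)\bigr), \\
c_k(t) &= r_{k-1}(t) - m_k(t), \qquad r_k(t) = m_k(t), \qquad k = 1, \dots, K, \\
x(t) &= \sum_{k=1}^{K} c_k(t) + r_K(t), \\
c_k^{+}(t) &= \max\bigl(c_k(t), 0\bigr), \qquad
c_k^{-}(t) = \max\bigl(-c_k(t), 0\bigr), \qquad
c_k(t) = c_k^{+}(t) - c_k^{-}(t).
\end{aligned}
```

The fourth line follows by telescoping the recurrence: the IMFs $c_k$ plus the final residual $r_K$ add back up to the original series. The last line is the positive/negative split applied to each IMF.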


## 1) Imports & Load Data

In [31]:
import pandas as pd
import numpy as np
from scipy.ndimage import maximum_filter1d, minimum_filter1d
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

# adjust path as needed
df = pd.read_csv("energy_and_temperature_minute_data.csv", 
                 sep=";", parse_dates=["timestamp_utc"])
df.set_index("timestamp_utc", inplace=True)

## 2) Resample to 15 min & Filter Complete Days

In [32]:
# Cell 2

# 1) Re-parse the timestamps defensively so the index is a UTC DatetimeIndex:
df = df.reset_index()
df["timestamp_utc"] = pd.to_datetime(df["timestamp_utc"], utc=True, errors="coerce")
df = df.set_index("timestamp_utc")

# 2) Resample to 15-minute means ("15min" replaces the deprecated "15T" alias):
data = df[["active_power_W", "air_temperature"]].resample("15min").mean()

# 3) Tag each row with its calendar day:
data["date"] = data.index.date   # array of Python date objects, stored as a column

# 4) Count how many 15-min slots each day has:
day_counts = data.groupby("date")["active_power_W"].count()

# 5) Keep only the “good” days that have 96 slots:
good_days = day_counts[day_counts == 96].index       # an Index of date objects
data = data[data["date"].isin(good_days)]            # filter via the Series

# 6) (Optional) drop the helper “date” column now that you’re cleaned up:
data = data.drop(columns="date")

print(f"Remaining days: {len(good_days)}, slots/day: {data.groupby(data.index.date).count().iloc[0,0]}")


Remaining days: 35, slots/day: 96
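
To see why the 96-slot check drops days with gaps, here is a minimal sketch on synthetic data. The frame `toy`, the two-hour outage and all values are made up for illustration; only the resample-count-filter pattern mirrors the cell above.

```python
import numpy as np
import pandas as pd

# Synthetic minute-level data: three days, with a 2-hour outage on day 2 (made-up values).
idx = pd.date_range("2024-01-01", periods=3 * 24 * 60, freq="min", tz="UTC")
toy = pd.DataFrame({"active_power_W": np.arange(len(idx), dtype=float)}, index=idx)
outage = (idx >= pd.Timestamp("2024-01-02 10:00", tz="UTC")) & (
    idx < pd.Timestamp("2024-01-02 12:00", tz="UTC")
)
toy = toy[~outage]

# Resample to 15-minute means; bins with no data become NaN.
toy15 = toy.resample("15min").mean()

# count() ignores NaN, so a day with empty bins ends up with fewer than 96 slots.
days = toy15.index.normalize()
slots = toy15["active_power_W"].groupby(days).count()
complete = slots[slots == 96].index
toy15 = toy15[days.isin(complete)]

n_days = toy15.index.normalize().nunique()
print(f"complete days: {n_days}, rows kept: {len(toy15)}")
```

Because `count()` skips NaN, a day with any empty 15-minute bin falls below 96 and is excluded as a whole.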



## 3) “Shallow” MCD Block (Morphological EMD)

In [33]:
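def MCD(series, n_imfs=3, win=8):
    """Morphological EMD-style decomposition: returns a (T×n_imfs) array of IMFs.

    The final residual (the last envelope mean) is not returned.
    """
    x = series.values.astype(float)
    res = x.copy()
    imfs = []
    for _ in range(n_imfs):
        # upper/lower envelopes via sliding max/min over `win` samples
        up = maximum_filter1d(res, size=win, mode="reflect")
        lo = minimum_filter1d(res, size=win, mode="reflect")
        m  = 0.5*(up+lo)        # envelope mean
        imf = res - m           # detail around the envelope mean
        imfs.append(imf)
        res = m                 # the next pass decomposes the smoother mean
    return np.stack(imfs, axis=1)  # shape (T,n_imfs)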
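
A quick sanity check for this kind of sifting is that the components add back up to the input. The sketch below uses a hypothetical variant, `mcd_with_residual`, which repeats the same envelope-mean loop but also returns the final residual. The test signal is synthetic and not taken from the dataset.

```python
import numpy as np
from scipy.ndimage import maximum_filter1d, minimum_filter1d

def mcd_with_residual(x, n_imfs=3, win=8):
    """Envelope-mean sifting as in MCD, additionally returning the final residual.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    res = np.asarray(x, dtype=float).copy()
    imfs = []
    for _ in range(n_imfs):
        up = maximum_filter1d(res, size=win, mode="reflect")  # upper envelope
        lo = minimum_filter1d(res, size=win, mode="reflect")  # lower envelope
        mean_env = 0.5 * (up + lo)
        imfs.append(res - mean_env)  # detail removed in this pass
        res = mean_env               # smoother signal for the next pass
    return np.stack(imfs, axis=1), res

# Synthetic "day": slow daily swing plus a faster 2-hour oscillation (96 slots).
t = np.arange(96)
x = 500 + 200 * np.sin(2 * np.pi * t / 96) + 50 * np.sin(2 * np.pi * t / 8)

imfs, residual = mcd_with_residual(x)
recon = imfs.sum(axis=1) + residual
max_err = np.max(np.abs(recon - x))
print("IMF array shape:", imfs.shape)
print("max reconstruction error:", max_err)
```

The reconstruction is exact up to floating-point rounding because each pass only moves signal from the running residual into an IMF.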


## 4) “Deep” D-R-D Module (Branch & Recombine)

In [34]:
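def DRD(imfs):
    """Split each IMF into positive and negative parts → (T×2n_imfs).

    Columns 0..n_imfs-1 hold the positive parts; columns n_imfs..2n_imfs-1
    hold the negative parts as non-negative magnitudes.
    """
    pos = np.maximum(imfs, 0)
    neg = np.maximum(-imfs, 0)
    return np.concatenate([pos, neg], axis=1)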
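
The column layout of this split matters for the feature step that follows, so here is a small worked example. The function name `drd_split` and the 3×2 input are illustrative only; the body repeats the split shown above.

```python
import numpy as np

def drd_split(imfs):
    """Positive parts first, then negative parts as non-negative magnitudes."""
    pos = np.maximum(imfs, 0)
    neg = np.maximum(-imfs, 0)
    return np.concatenate([pos, neg], axis=1)

# Toy input: 3 time steps, 2 IMFs (made-up numbers).
imfs = np.array([[1.5, -0.2],
                 [-3.0, 0.0],
                 [0.5, 0.7]])
comp = drd_split(imfs)
n = imfs.shape[1]

# Columns [0, n) are positive parts, columns [n, 2n) are negative parts,
# so subtracting the two halves recovers the original IMFs.
recovered = comp[:, :n] - comp[:, n:]
col_max = comp.max(axis=0)
print(comp.shape, col_max)
```

In this layout the last column, `comp[:, -1]`, is the negative part of the last IMF rather than a residual.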


## 5) Daily Feature Extraction

In [35]:
records = []
for day, grp in data.groupby(data.index.date):
    s = grp["active_power_W"]
    imfs = MCD(s, n_imfs=3, win=8)
    comp = DRD(imfs)            # shape (96,6): 3 positive parts, then 3 negative parts
    base_feat = comp[:,-1].sum()     # negative part of the last IMF (MCD does not return the residual)
    peak_feats = comp.max(axis=0)    # per-column maxima of the 6 branches
    t_peak = s.idxmax()              # timestamp of the day's highest 15-min mean
    peak_time = t_peak.hour + t_peak.minute/60
    temp = grp["air_temperature"].mean()
    rec = {"date":pd.Timestamp(day), "base":base_feat, "peak_hr":peak_time, "temp":temp}
    rec.update({f"c{i}":peak_feats[i] for i in range(comp.shape[1])})
    records.append(rec)

df_feat = pd.DataFrame(records).set_index("date").sort_index()


## 6) Lag Features & Train/Test Split

In [36]:
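# shift() lags by rows, i.e. by retained days; this equals calendar lags only if the days are consecutive
for lag in (1,2,7):
    df_feat[f"lag{lag}_c0"] = df_feat["c0"].shift(lag)
df_feat.dropna(inplace=True)

X = df_feat[[f"lag{lag}_c0" for lag in (1,2,7)] + ["temp"]]
y_mag  = df_feat["c0"]          # magnitude target: daily max of the first IMF's positive part
y_time = df_feat["peak_hr"]     # timing target: hour of the daily load maximum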
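
Since incomplete days were filtered out earlier, the daily table may have gaps, and a row-based `shift` then pairs a day with a non-adjacent one. One possible way to get calendar-aware lags is sketched below, assuming a daily `DatetimeIndex`. The frame `toy` and its values are made up.

```python
import pandas as pd

# Hypothetical daily feature table with one missing day (2024-01-04).
dates = pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-05"])
toy = pd.DataFrame({"c0": [10.0, 20.0, 30.0, 50.0]}, index=dates)

# Row-based lag: "previous row", whatever its date is.
row_lag = toy["c0"].shift(1)

# Calendar-based lag: put the series on a gap-free daily grid (NaN for missing
# days), shift by one day, then map back onto the days that actually exist.
cal_lag = toy["c0"].asfreq("D").shift(1).reindex(toy.index)

print(pd.DataFrame({"c0": toy["c0"], "row_lag1": row_lag, "cal_lag1": cal_lag}))
```

For 2024-01-05 the row-based lag silently uses the value from 2024-01-03, while the calendar-based lag is NaN and the row would be removed by `dropna`.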


## 7) Modeling & Evaluation



In [37]:
# One call keeps X, y_mag and y_time on the same split. Note: the split is shuffled, not chronological.
Xtr, Xte, ym_tr, ym_te, yt_tr, yt_te = train_test_split(
    X, y_mag, y_time, test_size=0.2, random_state=0)

m_mag = RandomForestRegressor(n_estimators=100, random_state=0).fit(Xtr, ym_tr)
m_tim = RandomForestRegressor(n_estimators=100, random_state=0).fit(Xtr, yt_tr)

pm = m_mag.predict(Xte)
pt = m_tim.predict(Xte)

print("MAE magnitude (W):",   mean_absolute_error(ym_te, pm))
print("MAE timing   (hrs):", mean_absolute_error(yt_te, pt))


MAE magnitude (W): 18.473226499684717
MAE timing   (hrs): 1.6008333333333333
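
With lagged features, a shuffled split lets the model train on days that come after some test days. A chronological hold-out plus a naive "same as yesterday" baseline gives a stricter reference point. The sketch below runs on synthetic data; the series `y`, the lag names and the 20% hold-out are assumptions for illustration, not results from this dataset.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

# Synthetic daily target with a weekly pattern plus noise (made-up data).
rng = np.random.default_rng(0)
dates = pd.date_range("2024-01-01", periods=40, freq="D")
y = pd.Series(
    100 + 10 * np.sin(2 * np.pi * np.arange(40) / 7) + rng.normal(0, 2, 40),
    index=dates,
)

# Lag features; the first 7 days have no lag-7 value and are dropped.
feats = pd.DataFrame({"lag1": y.shift(1), "lag7": y.shift(7)}).dropna()
target = y.loc[feats.index]

# Chronological split: the last 20% of days form the test set.
n_test = int(round(0.2 * len(feats)))
X_train, X_test = feats.iloc[:-n_test], feats.iloc[-n_test:]
y_train, y_test = target.iloc[:-n_test], target.iloc[-n_test:]

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)
mae_model = mean_absolute_error(y_test, model.predict(X_test))

# Persistence baseline: predict today's value with yesterday's value.
mae_naive = mean_absolute_error(y_test, X_test["lag1"])

print(f"MAE model: {mae_model:.2f}  MAE persistence: {mae_naive:.2f}")
```

If the forest does not beat the persistence baseline on a chronological hold-out, the reported MAE says little about real day-ahead skill.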
