# Demand Response / Flexibility Simulation (Household Energy)

This notebook builds a **toy demand response (DR) simulator** on top of
household energy data.

It requires the **Individual household electric power consumption** dataset.
Beyond that it is self-contained: if no precomputed daily clusters are
available, it recomputes the daily features and clusters itself.

## Goals

- Use **daily clusters** as behavioural segments (evening-heavy, day-heavy,
  low-use days, etc.).
- Identify **flexible hours** (evening peaks) and **shoulder hours**.
- Define a simple **price signal** (DR control events with high prices).
- Simulate **behaviour change**:
  - 10–25% reduction in evening consumption,
  - energy shifted towards shoulder hours.
- Evaluate:
  - Energy shifted out of peak window,
  - Peak reduction,
  - Impact on costs under a simple price curve.
- Visualise:
  - Before/after load shapes for example days,
  - Distribution of savings per day and per cluster.


## 0. Setup and data

We use the **Individual household electric power consumption** dataset.

Expected file:

```text
data/household_power.csv
```

With (at least) the usual columns:

- `Date`, `Time`
- `Global_active_power`
- `Global_reactive_power`, `Voltage`, `Global_intensity`
- `Sub_metering_1`, `Sub_metering_2`, `Sub_metering_3`

If you already ran the *daily pattern clustering* project and exported
`data/daily_profiles_with_clusters.csv`, this notebook will reuse it.
Otherwise, it will recompute clusters from scratch.


### 0.1 Imports and paths


In [ ]:
from __future__ import annotations

from pathlib import Path
from typing import Dict, List, Tuple

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

sns.set(style="whitegrid")
plt.rcParams["figure.figsize"] = (11, 5)

DATA_PATH = Path("data") / "household_power.csv"
CLUSTER_EXPORT_PATH = Path("data") / "daily_profiles_with_clusters.csv"
RANDOM_STATE = 42
np.random.seed(RANDOM_STATE)

if not DATA_PATH.exists():
    raise FileNotFoundError(
        f"Expected dataset at {DATA_PATH.resolve()}\n"
        "Download 'Individual household electric power consumption' and save as 'data/household_power.csv'."
    )

raw = pd.read_csv(DATA_PATH)
raw.head()

## 1. From raw data to hourly kWh

We:

- Combine `Date` and `Time` into a timestamp.
- Convert energy-related columns to numeric.
- Resample from 1-minute to **hourly** resolution.
- Compute hourly `kwh` as the hourly mean of `Global_active_power` (kW)
  multiplied by 1 hour.
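
As a minimal sketch of the last two steps, the snippet below uses a synthetic one-minute series rather than the real dataset (the names `minute_kw`, `hourly_kwh` and `hourly_kwh_check` are illustrative only). It shows why the hourly mean of a kW series can be read as kWh, assuming every minute of the hour is present.

```python
import numpy as np
import pandas as pd

# Synthetic 1-minute power readings in kW covering two full hours
# (illustrative values, not taken from the dataset).
idx = pd.date_range("2024-01-01 00:00", periods=120, freq="min")
minute_kw = pd.Series(np.r_[np.full(60, 1.5), np.full(60, 3.0)], index=idx)

# Mean power over the hour (kW) times 1 hour gives the energy in kWh.
hourly_kwh = minute_kw.resample("60min").mean() * 1.0

# Cross-check: each minute contributes kW * (1/60) h; summing gives kWh.
hourly_kwh_check = (minute_kw / 60.0).resample("60min").sum()

print(hourly_kwh.round(3).tolist())
print(hourly_kwh_check.round(3).tolist())
```

The two approaches agree only when an hour has all 60 readings. With gaps, the mean-based value extrapolates over the missing minutes while the sum-based value undercounts.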


In [ ]:
def clean_household_power(df: pd.DataFrame) -> pd.DataFrame:
    """Clean and resample the household power dataset to hourly kWh.

    Parameters
    ----------
    df : pd.DataFrame
        Raw dataframe with Date/Time, Global_active_power and related columns.

    Returns
    -------
    pd.DataFrame
        Hourly dataframe indexed by timestamp with at least:
        - kwh: energy in that hour (approx. mean kW * 1 hour)
        - global_active_power: mean kW in that hour
        - Sub_metering_1/2/3: hourly sums in Wh (if available)
    """
    df = df.copy()

    if not {"Date", "Time"}.issubset(df.columns):
        raise ValueError("Expected 'Date' and 'Time' columns in dataset.")

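    # The UCI file stores dates as dd/mm/yyyy, so parse day-first to avoid
    # silently swapping day and month.
    df["timestamp"] = pd.to_datetime(
        df["Date"].astype(str) + " " + df["Time"].astype(str),
        dayfirst=True,
        errors="coerce",
    )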
    df = df.dropna(subset=["timestamp"]).sort_values("timestamp")

    num_cols = [
        "Global_active_power",
        "Global_reactive_power",
        "Voltage",
        "Global_intensity",
        "Sub_metering_1",
        "Sub_metering_2",
        "Sub_metering_3",
    ]
    for col in num_cols:
        if col in df.columns:
            df[col] = pd.to_numeric(df[col], errors="coerce")

    df = df.set_index("timestamp").sort_index()

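    hourly = pd.DataFrame()
    if "Global_active_power" in df.columns:
        # Mean kW over each hour. Lowercase "h" avoids the deprecation warning
        # that newer pandas versions emit for "H".
        hourly["global_active_power"] = df["Global_active_power"].resample("h").mean()
        # Energy in kWh = mean power (kW) * 1 hour.
        hourly["kwh"] = hourly["global_active_power"]

    for col in ["Sub_metering_1", "Sub_metering_2", "Sub_metering_3"]:
        if col in df.columns:
            # Sub-metering readings are per-minute Wh, so the sum is hourly Wh.
            hourly[col] = df[col].resample("h").sum()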

    hourly = hourly.dropna(subset=["kwh"])
    return hourly


hourly = clean_household_power(raw)
hourly.head()

### 1.1 Quick view of hourly consumption


In [ ]:
hourly["kwh"].plot(alpha=0.7)
plt.title("Hourly energy consumption (kWh)")
plt.ylabel("kWh")
plt.show()

sample_start = hourly.index.min() + pd.Timedelta(days=7)
sample_end = sample_start + pd.Timedelta(days=7)
sample = hourly.loc[sample_start:sample_end]
sample["kwh"].plot()
plt.title("Sample week of hourly consumption")
plt.ylabel("kWh")
plt.show()

## 2. Daily features and clusters

We create one row per day with:

- Daily totals, max, mean.
- Day/night/evening fractions.
- 24h profile: kWh per hour (h_00..h_23).

If an exported `daily_profiles_with_clusters.csv` exists, we reuse the
`cluster` labels from there. Otherwise we run KMeans to create clusters.


In [ ]:
def build_daily_profile_frame(hourly_df: pd.DataFrame) -> Tuple[pd.DataFrame, pd.DataFrame]:
    """Create daily features and 24h profiles from hourly kWh.

    Returns
    -------
    daily_features : pd.DataFrame
        One row per date with aggregate features and fractions.
    daily_profiles : pd.DataFrame
        One row per date, columns `h_00`..`h_23` with kWh at each hour.
    """
    df = hourly_df.copy()
    df["date"] = df.index.date
    df["hour"] = df.index.hour

    profile = df.pivot_table(
        index="date",
        columns="hour",
        values="kwh",
        aggfunc="mean",
    )
    profile.columns = [f"h_{h:02d}" for h in profile.columns]

    daily = df.groupby("date").agg(
        total_kwh=("kwh", "sum"),
        max_kwh=("kwh", "max"),
        mean_kwh=("kwh", "mean"),
    )

    def _fraction_sum(mask: pd.Series) -> pd.Series:
        return df.loc[mask, :].groupby("date")["kwh"].sum()

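    # Note: these windows do not partition the day. Hour 22 counts as both
    # evening and night, and hours 6-7 fall in none of them, so the three
    # fractions below need not sum to 1.
    day_mask = (df["hour"] >= 8) & (df["hour"] < 18)
    night_mask = (df["hour"] < 6) | (df["hour"] >= 22)
    evening_mask = (df["hour"] >= 18) & (df["hour"] < 23)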

    day_kwh = _fraction_sum(day_mask)
    night_kwh = _fraction_sum(night_mask)
    eve_kwh = _fraction_sum(evening_mask)

    daily["day_kwh"] = day_kwh
    daily["night_kwh"] = night_kwh
    daily["evening_kwh"] = eve_kwh

    daily["day_frac"] = daily["day_kwh"] / daily["total_kwh"]
    daily["night_frac"] = daily["night_kwh"] / daily["total_kwh"]
    daily["evening_frac"] = daily["evening_kwh"] / daily["total_kwh"]

    features = daily.join(profile, how="inner")
    return features, profile


daily_features, daily_profiles = build_daily_profile_frame(hourly)
daily_features.index = pd.to_datetime(daily_features.index)
daily_features["weekday"] = daily_features.index.dayofweek
daily_features.head()

### 2.1 Attach or compute clusters


In [ ]:
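if CLUSTER_EXPORT_PATH.exists():
    existing = pd.read_csv(CLUSTER_EXPORT_PATH, index_col="date", parse_dates=["date"])
    if "cluster" in existing.columns:
        # reindex (rather than .loc) so days missing from the export become NaN
        # instead of raising a KeyError.
        reused = existing["cluster"].reindex(daily_features.index)
        if reused.notna().all():
            daily_features["cluster"] = reused.astype(int).to_numpy()
        else:
            print("Cluster file does not cover every day – will recompute.")
    else:
        print("Cluster file found but no 'cluster' column – will recompute.")
else:
    print("No precomputed clusters found – will compute KMeans clusters.")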

if "cluster" not in daily_features.columns:
    cluster_cols = [c for c in daily_features.columns if c not in ["weekday"]]
    X = daily_features[cluster_cols].fillna(0.0).to_numpy()
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X)

    sil_scores: Dict[int, float] = {}
    for k in range(2, 7):
        kmeans = KMeans(n_clusters=k, random_state=RANDOM_STATE, n_init=20)
        labels = kmeans.fit_predict(X_scaled)
        sil_scores[k] = silhouette_score(X_scaled, labels)

    best_k = max(sil_scores, key=sil_scores.get)
    print("Silhouette scores:", sil_scores)
    print("Using K=", best_k)

    kmeans_final = KMeans(n_clusters=best_k, random_state=RANDOM_STATE, n_init=50)
    daily_features["cluster"] = kmeans_final.fit_predict(X_scaled)

daily_features["cluster"].value_counts().sort_index()

## 3. DR events, flexible hours and price signal

We define a simple DR control policy:

- **Flexible hours** (peak window): 18:00–21:59 (hours 18–21).
- **Shoulder hours** where load can be shifted: 14–17 and 22–23.
- **DR event days**: days in the top 20% of total_kwh (heaviest days).
- **Price curve**:
  - Off-peak (night, hours 22–05): cheap (0.8 units).
  - Normal hours: base price (1.0).
  - DR peak hours (18–21) on event days: high price (3.0).
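
The price rule above can be restated for a single hour as a small function. This is only a sketch: `price_for_hour` and its arguments are hypothetical names used for illustration, and the thresholds are copied from the list above.

```python
def price_for_hour(hour: int, is_dr_day: bool) -> float:
    """Return the price (in arbitrary units) for one hour of the day.

    Hypothetical helper for illustration; thresholds follow the policy
    described in this section.
    """
    if is_dr_day and 18 <= hour <= 21:
        return 3.0  # DR peak window on an event day
    if hour < 6 or hour >= 22:
        return 0.8  # off-peak night hours
    return 1.0      # base price


# Price curves for a DR event day and for an ordinary day.
dr_day_prices = [price_for_hour(h, is_dr_day=True) for h in range(24)]
normal_day_prices = [price_for_hour(h, is_dr_day=False) for h in range(24)]
print(dr_day_prices)
print(normal_day_prices)
```

Note that the late shoulder hours (22 and 23) are priced at the off-peak rate, so energy moved there is cheaper than energy moved to hours 14–17.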


In [ ]:
FLEX_HOURS = list(range(18, 22))
SHOULDER_HOURS = list(range(14, 18)) + [22, 23]

def select_dr_event_days(daily: pd.DataFrame, quantile: float = 0.8) -> List[pd.Timestamp]:
    """Select DR event days as those with total_kwh above the given quantile."""
    threshold = daily["total_kwh"].quantile(quantile)
    mask = daily["total_kwh"] >= threshold
    return list(daily.index[mask])


dr_event_days = select_dr_event_days(daily_features, quantile=0.8)
len(dr_event_days), dr_event_days[:5]

In [ ]:
def build_price_series(hourly_df: pd.DataFrame, dr_days: List[pd.Timestamp]) -> pd.Series:
    """Construct a simple price time series.

    - Off-peak (hours < 6 or >= 22): 0.8
    - Normal: 1.0
    - On DR days, flexible hours (18–21): 3.0
    """
    idx = hourly_df.index
    hours = idx.hour
    dates = idx.date

    base_price = np.full(len(idx), 1.0)
    offpeak_mask = (hours < 6) | (hours >= 22)
    base_price[offpeak_mask] = 0.8

    dr_dates = {d.date() if isinstance(d, pd.Timestamp) else d for d in dr_days}
    peak_mask = np.isin(dates, list(dr_dates)) & np.isin(hours, FLEX_HOURS)
    base_price[peak_mask] = 3.0

    return pd.Series(base_price, index=idx, name="price")


price_series = build_price_series(hourly, dr_event_days)
price_series.head()

Quick check on one DR day.

In [ ]:
example_day = dr_event_days[0]
mask_day = hourly.index.date == example_day.date()
price_series[mask_day].plot(marker="o")
plt.title(f"Price curve for DR event day {example_day.date()}")
plt.ylabel("Price units")
plt.show()

## 4. Behaviour model: shifting flexible load

Each cluster gets an assumed **flexible fraction** of peak-window load.

On DR days we:

- Reduce flexible window load by that fraction.
- Redistribute the removed energy to shoulder hours, proportional to
  their baseline load.
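
To make the redistribution rule concrete, here is a small worked example on made-up numbers (all values and variable names are illustrative assumptions, not dataset output). It applies a 20% flexible fraction to four peak hours and spreads the removed energy over six shoulder hours in proportion to their baseline load.

```python
import numpy as np

# Illustrative baseline loads in kWh (not real data).
flex_load = np.array([2.0, 3.0, 3.0, 2.0])                 # hours 18-21
shoulder_load = np.array([1.0, 1.0, 2.0, 2.0, 1.0, 1.0])   # hours 14-17, 22-23
flex_frac = 0.20                                           # assumed flexible fraction

# Energy removed from the peak window.
energy_to_shift = flex_load.sum() * flex_frac

# Peak hours are scaled down uniformly.
new_flex = flex_load * (1.0 - flex_frac)

# Shoulder hours receive the removed energy in proportion to baseline load.
weights = shoulder_load / shoulder_load.sum()
new_shoulder = shoulder_load + energy_to_shift * weights

# Shifting should conserve total energy across the two windows.
total_before = flex_load.sum() + shoulder_load.sum()
total_after = new_flex.sum() + new_shoulder.sum()
print(energy_to_shift, new_flex, new_shoulder, total_before, total_after)
```

Proportional weighting means hours that are already busy absorb more of the shifted energy, which can create a new local peak in the shoulder window.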


In [ ]:
unique_clusters = sorted(daily_features["cluster"].unique())
flex_fraction_by_cluster: Dict[int, float] = {}
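for cid in unique_clusters:
    # Placeholder heuristic: KMeans labels are arbitrary, so "higher id -> more
    # flexible" has no behavioural meaning. Capped at 0.25 to stay within the
    # 10–25% reduction range stated in the goals.
    flex_fraction_by_cluster[cid] = float(min(0.25, 0.10 + 0.05 * cid))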

flex_fraction_by_cluster

In [ ]:
def simulate_dr(hourly_df: pd.DataFrame,
                daily_feat: pd.DataFrame,
                dr_days: List[pd.Timestamp],
                flex_frac_by_cluster: Dict[int, float]) -> pd.DataFrame:
    """Simulate a DR program by shifting flexible evening load.

    For each DR event day:
    - Determine that day's cluster and its flexible fraction.
    - Reduce evening load (flex window) by that fraction.
    - Redistribute the removed energy to shoulder hours.
    """
    dr_df = hourly_df.copy()
    dr_dates = {d.normalize() for d in dr_days}

    for day in dr_dates:
        day_mask = dr_df.index.normalize() == day
        if not day_mask.any():
            continue

        date_key = day.date()
        try:
            cluster = int(daily_feat.loc[pd.Timestamp(date_key), "cluster"])
        except KeyError:
            continue

        flex_frac = flex_frac_by_cluster.get(cluster, 0.10)

        # Boolean masks for this day's peak-window and shoulder hours.
        flex_hours_mask = day_mask & np.isin(dr_df.index.hour, FLEX_HOURS)
        shoulder_hours_mask = day_mask & np.isin(dr_df.index.hour, SHOULDER_HOURS)

        if not flex_hours_mask.any() or not shoulder_hours_mask.any():
            continue

        flex_load = dr_df.loc[flex_hours_mask, "kwh"]
        shoulder_load = dr_df.loc[shoulder_hours_mask, "kwh"]

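        shoulder_total = shoulder_load.sum()
        if shoulder_total <= 0:
            # Nowhere to move the energy: skip the day so that total energy
            # is conserved instead of silently dropped.
            continue

        energy_to_shift = flex_load.sum() * flex_frac

        # Scale down the peak window, then spread the removed energy over the
        # shoulder hours in proportion to their baseline load.
        dr_df.loc[flex_hours_mask, "kwh"] = flex_load * (1.0 - flex_frac)
        dr_df.loc[shoulder_hours_mask, "kwh"] = (
            shoulder_load + energy_to_shift * (shoulder_load / shoulder_total)
        )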

    return dr_df


hourly_dr = simulate_dr(hourly, daily_features, dr_event_days, flex_fraction_by_cluster)
hourly_dr.head()

### 4.1 Example DR day – baseline vs DR


In [ ]:
example_day = dr_event_days[0]
mask_base = hourly.index.date == example_day.date()
mask_dr = hourly_dr.index.date == example_day.date()

plt.plot(hourly.index[mask_base].hour, hourly.loc[mask_base, "kwh"], label="baseline")
plt.plot(hourly_dr.index[mask_dr].hour, hourly_dr.loc[mask_dr, "kwh"], label="DR", linestyle="--")
plt.xlabel("Hour of day")
plt.ylabel("kWh")
plt.title(f"Example DR day – baseline vs DR load ({example_day.date()})")
plt.legend()
plt.show()

## 5. Metrics: energy shifted, peak reduction, cost impact
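
In symbols, the per-day metrics computed in this section can be written as follows. The notation is introduced here for illustration only: $F$ is the set of flexible hours, $b_h$ and $x_h$ are the baseline and DR load in hour $h$, $p_h$ is the price, and sums over $h$ run over the hours available for that day.

$$
\begin{aligned}
\text{energy\_shifted\_peak} &= \sum_{h \in F} b_h - \sum_{h \in F} x_h \\
\text{peak\_reduction} &= \max_{h \in F} b_h - \max_{h \in F} x_h \\
\text{cost\_savings} &= \sum_{h} p_h\, b_h - \sum_{h} p_h\, x_h
\end{aligned}
$$

Under the behaviour model of section 4, $x_h = (1 - f)\, b_h$ for $h \in F$ with flexible fraction $f$, so on a simulated day these reduce to $\text{energy\_shifted\_peak} = f \sum_{h \in F} b_h$ and $\text{peak\_reduction} = f \max_{h \in F} b_h$.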


In [ ]:
def compute_daily_metrics(hourly_base: pd.DataFrame,
                          hourly_dr: pd.DataFrame,
                          price: pd.Series,
                          dr_days: List[pd.Timestamp]) -> pd.DataFrame:
    """Compute DR metrics per event day.

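    Returns one row per DR day with:
    - baseline_peak, dr_peak, peak_reduction: maximum hourly load inside the
      flexible window (not the whole-day peak) and the difference
    - baseline_peak_energy, dr_peak_energy, energy_shifted_peak: energy inside
      the flexible window and the amount moved out of it
    - baseline_cost, dr_cost, cost_savings: whole-day cost under `price`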
    """
    rows: List[Dict[str, float]] = []
    dr_dates = {d.normalize() for d in dr_days}

    for day in sorted(dr_dates):
        base_mask = hourly_base.index.normalize() == day
        dr_mask = hourly_dr.index.normalize() == day
        if not base_mask.any() or not dr_mask.any():
            continue

        hours = hourly_base.index[base_mask].hour
        flex_mask_day = np.isin(hours, FLEX_HOURS)

        base_load = hourly_base.loc[base_mask, "kwh"].to_numpy()
        dr_load = hourly_dr.loc[dr_mask, "kwh"].to_numpy()
        day_price = price.loc[base_mask].to_numpy()

        if flex_mask_day.any():
            base_flex = base_load[flex_mask_day]
            dr_flex = dr_load[flex_mask_day]
            baseline_peak = float(base_flex.max())
            dr_peak = float(dr_flex.max())
            baseline_peak_energy = float(base_flex.sum())
            dr_peak_energy = float(dr_flex.sum())
            energy_shifted_peak = baseline_peak_energy - dr_peak_energy
        else:
            baseline_peak = dr_peak = baseline_peak_energy = dr_peak_energy = energy_shifted_peak = 0.0

        baseline_cost = float((base_load * day_price).sum())
        dr_cost = float((dr_load * day_price).sum())

        rows.append(
            {
                "date": day.date(),
                "baseline_peak": baseline_peak,
                "dr_peak": dr_peak,
                "peak_reduction": baseline_peak - dr_peak,
                "baseline_peak_energy": baseline_peak_energy,
                "dr_peak_energy": dr_peak_energy,
                "energy_shifted_peak": energy_shifted_peak,
                "baseline_cost": baseline_cost,
                "dr_cost": dr_cost,
                "cost_savings": baseline_cost - dr_cost,
            }
        )

    return pd.DataFrame(rows).set_index("date")


# price_series was already built in section 3 from the same inputs and is reused here.
metrics_df = compute_daily_metrics(hourly, hourly_dr, price_series, dr_event_days)
metrics_df.head()

### 5.1 Aggregate view of DR impact


In [ ]:
metrics_df[["energy_shifted_peak", "peak_reduction", "cost_savings"]].describe()

In [ ]:
metrics_df["energy_shifted_peak"].hist(bins=20)
plt.xlabel("Energy shifted out of peak window (kWh)")
plt.title("Distribution of shifted energy per DR day")
plt.show()

metrics_df["cost_savings"].hist(bins=20)
plt.xlabel("Cost savings per DR day (price units)")
plt.title("Distribution of cost savings per DR day")
plt.show()

## 6. Savings by cluster


In [ ]:
day_to_cluster = daily_features["cluster"].to_dict()
metrics_df["cluster"] = [day_to_cluster.get(pd.Timestamp(d), np.nan) for d in metrics_df.index]

# Select several columns with a list; tuple-style selection fails in pandas 2.x.
metrics_df.groupby("cluster")[["energy_shifted_peak", "cost_savings"]].mean()

In [ ]:
sns.boxplot(data=metrics_df, x="cluster", y="cost_savings")
plt.title("Cost savings per DR day by cluster")
plt.ylabel("Savings (price units)")
plt.show()

sns.boxplot(data=metrics_df, x="cluster", y="energy_shifted_peak")
plt.title("Energy shifted out of peak by cluster")
plt.ylabel("kWh")
plt.show()

## 7. Wrap-up

We built a **simple DR / flexibility simulator** over household energy data:

- Clusters act as segments with different assumed flexibility.
- On high-load DR days, evening load is reduced and shifted to shoulders.
- We measured peak reduction, shifted energy, and cost savings.
- We analysed which clusters contribute most to savings.

Next steps could include learning flexibility parameters from real DR events,
using probabilistic forecasts, or scaling to many households.


## 8. Policy / utility perspective

The simulator we built is deliberately simple, but it mirrors how a utility or
aggregator might reason about a **DR program**:

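- **Targeting**: clusters with higher `energy_shifted_peak` and `cost_savings`
  are candidates for DR offers. In this simulator the flexible fraction is
  assumed rather than measured, so high values reflect a large baseline evening
  load and the assumed fraction, not observed responsiveness.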
- **Product design**:
  - Evening peaker clusters → time-of-use tariffs with strong evening signals.
  - Day-heavy clusters → incentives for midday shifting (solar or low wholesale prices).
  - Low-use clusters → simpler, low-fixed products; DR impact per customer is small.
- **System impact**: peak reduction and energy shifted can translate into
  lower peak generation and network stress and, in many markets, lower
  imbalance or capacity charges.
- **Customer impact**: cost savings per day and per cluster give a first-order
  view of how attractive the DR program might feel to different segments.

This kind of analysis is typically extended with:

- **Uplift modelling** to estimate the causal effect of DR signals on load.
- **Comfort constraints** (minimum evening use, maximum deferral time, etc.).
- **Portfolio scaling** from a single household to thousands of meters.
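
As one possible illustration of a comfort constraint, the sketch below caps the reduction in each peak hour so that the hour keeps at least a minimum load. The function `capped_flex_reduction` and the parameter `min_kwh_per_hour` are hypothetical and are not part of the simulator above; the numbers are made up.

```python
import numpy as np


def capped_flex_reduction(flex_load, flex_frac, min_kwh_per_hour):
    """Per-hour reduction (kWh) that never pushes load below a comfort floor.

    Hypothetical helper: the desired reduction is flex_frac * load, limited
    by the headroom above min_kwh_per_hour in each hour.
    """
    flex_load = np.asarray(flex_load, dtype=float)
    desired = flex_load * flex_frac
    # Headroom is zero for hours already at or below the floor.
    headroom = np.clip(flex_load - min_kwh_per_hour, 0.0, None)
    return np.minimum(desired, headroom)


# Illustrative peak-window loads in kWh for hours 18-21.
peak_load = [2.0, 0.6, 3.0, 0.4]
reduction = capped_flex_reduction(peak_load, flex_frac=0.25, min_kwh_per_hour=0.5)
remaining = np.asarray(peak_load) - reduction
print(reduction.round(3), remaining.round(3))
```

The sum of `reduction` would then replace the uniform `energy_to_shift` when redistributing to shoulder hours, so low-load hours contribute less than the nominal flexible fraction.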


## 9. DR campaign report

To make the simulator more report-friendly, we aggregate per-day KPIs into a
simple **campaign report**:

- Number of DR event days.
- Total and average energy shifted out of peak.
- Total and average cost savings.
- Breakdown by cluster.

We also export the detailed per-day metrics to CSV for downstream BI or
dashboarding.


In [ ]:
# Overall campaign KPIs
n_days = len(metrics_df)
total_shifted = metrics_df["energy_shifted_peak"].sum()
avg_shifted = metrics_df["energy_shifted_peak"].mean()
total_savings = metrics_df["cost_savings"].sum()
avg_savings = metrics_df["cost_savings"].mean()

campaign_summary = pd.DataFrame(
    {
        "n_dr_days": [n_days],
        "total_shifted_kwh": [total_shifted],
        "avg_shifted_kwh_per_day": [avg_shifted],
        "total_savings_units": [total_savings],
        "avg_savings_units_per_day": [avg_savings],
    }
)
campaign_summary

In [ ]:
# Cluster-level campaign KPIs
cluster_campaign = (
    metrics_df.groupby("cluster")[["energy_shifted_peak", "cost_savings"]]
    .agg(["count", "sum", "mean"])
)
cluster_campaign

In [ ]:
# Export detailed DR metrics for BI / dashboarding
dr_metrics_path = Path("data") / "dr_campaign_daily_metrics.csv"
dr_metrics_path.parent.mkdir(parents=True, exist_ok=True)
metrics_df.to_csv(dr_metrics_path, index_label="date")
print("Exported DR daily metrics to:", dr_metrics_path.resolve())