# **Chapter 79: Energy Demand Forecasting**

## **Learning Objectives**

By the end of this chapter, you will be able to:

- Understand the unique challenges of energy demand forecasting: multiple seasonalities, weather dependence, and regulatory influences.
- Identify and acquire relevant data sources, including historical load, weather, and calendar information.
- Engineer features that capture daily, weekly, and annual cycles, as well as holiday effects and temperature sensitivities.
- Apply both classical time‑series models (SARIMA, exponential smoothing) and machine learning models (tree‑based, neural networks) to forecast demand.
- Implement probabilistic forecasts to quantify uncertainty, which is essential for grid operations.
- Evaluate forecasts using energy‑specific metrics (e.g., pinball loss for quantiles).
- Deploy a forecasting system in a production environment, integrating with real‑time data feeds and alerting mechanisms.

---

## **79.1 Introduction to Energy Demand Forecasting**

Energy demand forecasting is critical for the reliable and economical operation of power grids. Utilities use forecasts to schedule generation, manage reserves, plan maintenance, and set prices. Forecast errors can lead to blackouts (if demand exceeds supply) or wasted fuel (if too much generation is scheduled).

Unlike the financial time series we studied with NEPSE, energy demand exhibits:

- **Strong multiple seasonalities**: daily (peak vs. off‑peak), weekly (weekday vs. weekend), and annual (seasonal heating/cooling).
- **Weather sensitivity**: temperature, humidity, and cloud cover directly affect heating and cooling loads.
- **Holiday effects**: demand patterns change dramatically on public holidays.
- **Special events**: major sports events and festivals can cause spikes.
- **Trends**: population growth, energy efficiency improvements, adoption of electric vehicles.

In this chapter, we will build an energy demand forecasting system for a hypothetical utility. Real load data is publicly available (e.g., from PJM in the US), but to keep the demonstration simple we generate synthetic data with similar characteristics. The pipeline will include data ingestion, feature engineering, model training (both point and probabilistic forecasts), evaluation, and deployment.

---

## **79.2 Energy Demand Data Characteristics**

A typical energy demand dataset includes:

- **Timestamp**: usually hourly or half‑hourly.
- **Load**: actual demand (MW).
- **Temperature**: dry bulb, dew point.
- **Other weather variables**: humidity, wind speed, cloud cover, precipitation.
- **Calendar indicators**: hour of day, day of week, month, holiday flags.

We'll generate synthetic data that mimics real hourly load with these features.

```python
import numpy as np
import pandas as pd
from datetime import datetime, timedelta
import matplotlib.pyplot as plt

def generate_load_data(years=3, start_date='2020-01-01', seed=42):
    """
    Generate synthetic hourly load data with weather.
    """
    np.random.seed(seed)
    hours = years * 365 * 24
    dates = pd.date_range(start=start_date, periods=hours, freq='h')  # lowercase 'h': the 'H' alias is deprecated in recent pandas
    
    # Base load: trend + annual seasonality
    t = np.arange(hours)
    trend = 0.0001 * t  # slight upward trend
    annual = 200 * np.sin(2 * np.pi * t / (365.25 * 24) + 0.5)  # annual cycle
    
    # Weekly seasonality
    weekly = 150 * np.sin(2 * np.pi * t / (7*24))
    
    # Daily seasonality
    daily = 300 * np.sin(2 * np.pi * t / 24 - np.pi/2)  # peaks at hour 12 (noon), trough at midnight
    
    # Temperature effect (heating and cooling)
    # Simulate temperature with annual cycle and noise
    temp = 15 + 15 * np.sin(2 * np.pi * t / (365.25 * 24) - np.pi/2) + np.random.normal(0, 2, hours)
    # Heating load (cold temperatures increase demand)
    heating_effect = 50 * np.maximum(0, 10 - temp)
    # Cooling load (hot temperatures increase demand)
    cooling_effect = 80 * np.maximum(0, temp - 20)
    
    # Holiday effect (simplified: Christmas, New Year)
    holiday_effect = np.zeros(hours)
    for date in pd.date_range(start=start_date, periods=years*365, freq='D'):
        if date.month == 12 and date.day == 25:  # Christmas
            holiday_effect[(date - pd.Timestamp(start_date)).days * 24 : (date - pd.Timestamp(start_date)).days * 24 + 24] = -150
        if date.month == 1 and date.day == 1:    # New Year
            holiday_effect[(date - pd.Timestamp(start_date)).days * 24 : (date - pd.Timestamp(start_date)).days * 24 + 24] = -200
    
    # Random noise
    noise = np.random.normal(0, 20, hours)
    
    load = 1000 + trend + annual + weekly + daily + heating_effect + cooling_effect + holiday_effect + noise
    load = np.maximum(load, 200)  # minimum load
    
    df = pd.DataFrame({
        'timestamp': dates,
        'load': load,
        'temperature': temp,
        'hour': dates.hour,
        'dayofweek': dates.dayofweek,
        'month': dates.month,
        'day': dates.day,
        'is_weekend': (dates.dayofweek >= 5).astype(int)
    })
    
    # Add holiday flags (vectorized: Christmas Day and New Year's Day)
    df['is_holiday'] = (
        ((df['month'] == 12) & (df['day'] == 25)) |
        ((df['month'] == 1) & (df['day'] == 1))
    ).astype(int)
    
    return df

# Generate 2 years of hourly data
df_load = generate_load_data(years=2)
print(df_load.head())

# Plot one week of load and temperature
week = df_load.iloc[:7*24]
fig, ax1 = plt.subplots(figsize=(12,5))
ax1.plot(week['timestamp'], week['load'], color='tab:blue', label='Load (MW)')
ax1.set_xlabel('Time')
ax1.set_ylabel('Load', color='tab:blue')
ax2 = ax1.twinx()
ax2.plot(week['timestamp'], week['temperature'], color='tab:red', label='Temperature')
ax2.set_ylabel('Temperature', color='tab:red')
plt.title('Load and Temperature Over One Week')
fig.tight_layout()
plt.show()
```

**Explanation:**

- This function generates hourly load data with realistic patterns: trend, annual, weekly, and daily seasonality.
- Temperature is simulated with an annual cycle; heating and cooling effects are modelled as piecewise linear functions of temperature.
- Holiday effects are added as drops on Christmas and New Year (demand typically lower on holidays).
- The resulting DataFrame is the basis for our forecasting experiments.
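To make the piecewise linear temperature response concrete, the sketch below evaluates the same heating and cooling terms used in `generate_load_data` at a few temperatures. The helper name `temperature_effect` is introduced here only for illustration.

```python
import numpy as np

def temperature_effect(temp_c):
    """Extra load (MW) from heating and cooling, with the thresholds and slopes of generate_load_data."""
    temp_c = np.asarray(temp_c, dtype=float)
    heating = 50 * np.maximum(0, 10 - temp_c)   # 50 MW per degree below 10 °C
    cooling = 80 * np.maximum(0, temp_c - 20)   # 80 MW per degree above 20 °C
    return heating + cooling

temps = np.array([0, 10, 15, 20, 30])
effects = temperature_effect(temps)
for temp, effect in zip(temps, effects):
    print(f"{temp:>3} °C -> {effect:6.0f} MW")
```

Between 10 °C and 20 °C neither term is active, so the response is flat there and rises on both sides. This U‑shaped relationship is what the degree‑based weather features in the next section are meant to capture.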

---

## **79.3 Feature Engineering for Energy Demand**

Feature engineering for load forecasting must capture:

- **Time features**: hour, day of week, month, holiday flags, interactions (e.g., hour × weekday).
- **Weather features**: current temperature, lagged temperature, heating/cooling degree days.
- **Lagged load**: autoregressive terms (load at same hour yesterday, last week).
- **Rolling statistics**: moving averages of load and temperature over recent hours/days.
- **Calendar events**: special days (e.g., Super Bowl) may need manual flags.

We'll build a feature engineering class similar to previous chapters.

```python
class EnergyFeatureEngineer:
    """
    Feature engineering for hourly load forecasting.
    """
    
    def __init__(self):
        self.feature_columns = []
    
    def add_time_features(self, df):
        """Add cyclical time features."""
        df = df.copy()
        df['hour_sin'] = np.sin(2 * np.pi * df['hour'] / 24)
        df['hour_cos'] = np.cos(2 * np.pi * df['hour'] / 24)
        df['dayofweek_sin'] = np.sin(2 * np.pi * df['dayofweek'] / 7)
        df['dayofweek_cos'] = np.cos(2 * np.pi * df['dayofweek'] / 7)
        df['month_sin'] = np.sin(2 * np.pi * df['month'] / 12)
        df['month_cos'] = np.cos(2 * np.pi * df['month'] / 12)
        return df
    
    def add_lag_features(self, df, target='load', lags=[1, 2, 24, 48, 168]):
        """Lag features for load."""
        df = df.copy()
        for lag in lags:
            df[f'{target}_lag_{lag}'] = df[target].shift(lag)
        return df
    
    def add_rolling_features(self, df, target='load', windows=[24, 168]):
        """Rolling statistics over past hours."""
        df = df.copy()
        for window in windows:
            df[f'{target}_rolling_mean_{window}'] = df[target].rolling(window, min_periods=1).mean().shift(1)
            df[f'{target}_rolling_std_{window}'] = df[target].rolling(window, min_periods=1).std().shift(1)
            df[f'{target}_rolling_min_{window}'] = df[target].rolling(window, min_periods=1).min().shift(1)
            df[f'{target}_rolling_max_{window}'] = df[target].rolling(window, min_periods=1).max().shift(1)
        return df
    
    def add_weather_features(self, df, temp_col='temperature'):
        """Add temperature-related features."""
        df = df.copy()
        # Heating/cooling degrees for each hour (base 65°F ≈ 18.3°C), an hourly analogue of degree days
        base_temp = 18.3
        df['hdd'] = np.maximum(0, base_temp - df[temp_col])
        df['cdd'] = np.maximum(0, df[temp_col] - base_temp)
        # Lagged temperature
        for lag in [1, 2, 24]:
            df[f'{temp_col}_lag_{lag}'] = df[temp_col].shift(lag)
        # Rolling temperature
        df[f'{temp_col}_rolling_mean_24'] = df[temp_col].rolling(24, min_periods=1).mean().shift(1)
        return df
    
    def add_interaction_features(self, df):
        """Interactions between hour and weekend/holiday."""
        df = df.copy()
        df['hour_weekend'] = df['hour'] * df['is_weekend']
        df['hour_holiday'] = df['hour'] * df['is_holiday']
        return df
    
    def compute_features(self, df, target='load', forecast_horizon=24):
        """
        Main entry point.
        Creates features and target (load shifted by horizon).
        """
        df = df.copy().sort_values('timestamp')
        
        # Add all feature groups
        df = self.add_time_features(df)
        df = self.add_lag_features(df, target, lags=[1, 2, 24, 48, 168])
        df = self.add_rolling_features(df, target, windows=[24, 168])
        df = self.add_weather_features(df)
        df = self.add_interaction_features(df)
        
        # Create target: load at horizon hours ahead
        df[f'target_{target}_{forecast_horizon}'] = df[target].shift(-forecast_horizon)
        
        # Drop rows with NaN created by shifts
        df = df.dropna().reset_index(drop=True)
        
        # Store feature columns (exclude identifiers and target)
        exclude = ['timestamp', target, f'target_{target}_{forecast_horizon}']
        self.feature_columns = [c for c in df.columns if c not in exclude]
        
        return df
```

**Explanation:**

- Time features are cyclically encoded to preserve circular continuity.
- Lag features include load from 1 hour ago, 2 hours ago, same hour yesterday (24), same hour two days ago (48), and same hour last week (168).
- Rolling statistics over 24 hours (daily) and 168 hours (weekly) provide recent trend and variability.
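- Weather features include heating and cooling degrees relative to an 18.3 °C base (an hourly analogue of the degree days common in energy modelling) and lagged/rolling temperature.
- Interactions like hour×weekend help capture different daily patterns on weekends.
- The target is load shifted forward by the forecast horizon (e.g., 24 hours ahead), so each row pairs features observed at time t with the load at t + horizon. The calendar and weather features therefore describe the forecast origin, not the target hour.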
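The effect of cyclical encoding can be checked numerically. The sketch below, with helper names `encode_hour` and `encoded_distance` chosen only for illustration, compares the gap between hour 23 and hour 0 before and after the sin/cos transform.

```python
import numpy as np

def encode_hour(hour):
    """Map an hour of day (0-23) to a point on the unit circle."""
    angle = 2 * np.pi * np.asarray(hour, dtype=float) / 24
    return np.stack([np.sin(angle), np.cos(angle)], axis=-1)

def encoded_distance(h1, h2):
    """Euclidean distance between two hours after sin/cos encoding."""
    return float(np.linalg.norm(encode_hour(h1) - encode_hour(h2)))

print(f"raw gap 23 -> 0:     {abs(23 - 0)}")
print(f"encoded gap 23 -> 0: {encoded_distance(23, 0):.3f}")
print(f"encoded gap 0 -> 1:  {encoded_distance(0, 1):.3f}")
print(f"encoded gap 0 -> 12: {encoded_distance(0, 12):.3f}")
```

After encoding, 23:00 and 00:00 are exactly as close as 00:00 and 01:00, while noon sits at the maximum distance from midnight. Tree‑based models can often work with the raw hour as well; the encoding tends to matter more for linear models and neural networks.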

---

## **79.4 Modeling Approaches for Load Forecasting**

Load forecasting can be tackled with:

- **Classical time‑series models**: SARIMA, exponential smoothing (Holt‑Winters). Good for capturing seasonality but may struggle with exogenous variables like weather.
- **Machine learning models**: Gradient boosting (XGBoost, LightGBM) often perform very well because they can handle many features and interactions.
- **Deep learning**: LSTM, Sequence‑to‑sequence, or Transformer models can capture long‑range dependencies.
- **Hybrid**: Combining a statistical baseline with ML corrections.

We'll demonstrate both a classical SARIMA model (for comparison) and a LightGBM model, which typically excels in load forecasting competitions.

### **79.4.1 SARIMA Baseline**

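SARIMA (Seasonal ARIMA) models the time series with seasonal components. Hourly load has both daily (24) and weekly (168) seasonality, but a single SARIMA model supports only one seasonal period, so we have to choose. We use 24 here, because a seasonal period of 168 makes fitting considerably slower. We'll use `statsmodels`.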

```python
from statsmodels.tsa.statespace.sarimax import SARIMAX
import warnings
warnings.filterwarnings("ignore")

def fit_sarima(train_series, order=(1,1,1), seasonal_order=(1,1,1,24)):
    """
    Fit a SARIMA model.
    """
    model = SARIMAX(train_series, order=order, seasonal_order=seasonal_order,
                    enforce_stationarity=False, enforce_invertibility=False)
    results = model.fit(disp=False)
    return results

# Example: forecast next 24 hours
# train = df_load['load'].iloc[:-24]
# test = df_load['load'].iloc[-24:]
# sarima_model = fit_sarima(train)
# forecast = sarima_model.forecast(steps=24)
# mae = np.mean(np.abs(forecast - test))
# print(f"SARIMA 24h MAE: {mae:.2f}")
```

**Explanation:**

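- SARIMA requires careful parameter selection (order, seasonal order). Here we use a simple configuration; in practice, you would compare candidate orders by AIC, for example with a grid search.
- As configured here, the model uses only past load. `SARIMAX` does accept exogenous regressors such as temperature through its `exog` argument, but we do not use it in this baseline, which limits its accuracy.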
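Before tuning SARIMA, it can help to have an even simpler reference point. The sketch below is a seasonal naive forecast (not SARIMA): it repeats the observations from one season earlier. The function name `seasonal_naive_forecast` and the toy series are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def seasonal_naive_forecast(series, steps=24, season=168):
    """Forecast each future hour with the value observed `season` hours before it."""
    values = np.asarray(series, dtype=float)
    if len(values) < season:
        raise ValueError("need at least one full season of history")
    last_season = values[-season:]
    # Tile in case steps > season, then cut to the requested length
    reps = int(np.ceil(steps / season))
    return np.tile(last_season, reps)[:steps]

# Toy hourly series: daily cycle plus noise (synthetic, for illustration only)
rng = np.random.default_rng(0)
t = np.arange(24 * 28)
load = pd.Series(1000 + 300 * np.sin(2 * np.pi * t / 24 - np.pi / 2) + rng.normal(0, 20, t.size))

train, test = load.iloc[:-24], load.iloc[-24:]
naive = seasonal_naive_forecast(train, steps=24, season=168)
naive_mae = np.mean(np.abs(naive - test.to_numpy()))
print(f"Seasonal naive 24h MAE: {naive_mae:.2f}")
```

A SARIMA or LightGBM model that cannot beat this baseline on held‑out data is probably not worth its extra complexity.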

### **79.4.2 LightGBM Model**

LightGBM can incorporate all our features, including weather forecasts (if available). We'll train a model to predict load 24 hours ahead.

```python
import lightgbm as lgb
from sklearn.model_selection import TimeSeriesSplit
from sklearn.metrics import mean_absolute_error

class LoadForecaster:
    def __init__(self, feature_columns, categorical_features=None):
        self.feature_columns = feature_columns
        self.categorical_features = categorical_features if categorical_features else []
        self.model = None
    
    def prepare_data(self, df, target_col):
        X = df[self.feature_columns].copy()  # copy so the dtype casts below do not touch df
        y = df[target_col]
        for col in self.categorical_features:
            if col in X.columns:
                X[col] = X[col].astype('category')
        return X, y
    
    def train_with_cv(self, X, y, n_splits=5, params=None):
        if params is None:
            params = {
                'objective': 'regression',
                'metric': 'mae',
                'boosting_type': 'gbdt',
                'num_leaves': 31,
                'learning_rate': 0.05,
                'feature_fraction': 0.8,
                'bagging_fraction': 0.8,
                'bagging_freq': 5,
                'verbose': -1
            }
        
        tscv = TimeSeriesSplit(n_splits=n_splits)
        cv_scores = []
        models = []
        
        for fold, (train_idx, val_idx) in enumerate(tscv.split(X)):
            X_train, X_val = X.iloc[train_idx], X.iloc[val_idx]
            y_train, y_val = y.iloc[train_idx], y.iloc[val_idx]
            
            train_data = lgb.Dataset(X_train, label=y_train, categorical_feature=self.categorical_features)
            val_data = lgb.Dataset(X_val, label=y_val, categorical_feature=self.categorical_features, reference=train_data)
            
            model = lgb.train(
                params,
                train_data,
                valid_sets=[val_data],
                num_boost_round=1000,
                callbacks=[lgb.early_stopping(10), lgb.log_evaluation(0)]
            )
            
            y_pred = model.predict(X_val, num_iteration=model.best_iteration)
            mae = mean_absolute_error(y_val, y_pred)
            cv_scores.append(mae)
            models.append(model)
            print(f"Fold {fold+1} MAE: {mae:.2f}")
        
        best_idx = np.argmin(cv_scores)
        self.model = models[best_idx]
        print(f"Best CV MAE: {cv_scores[best_idx]:.2f}")
        return cv_scores
    
    def predict(self, X):
        return self.model.predict(X)
```

**Explanation:**

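- The forecaster uses time‑series cross‑validation, so each fold validates on data that comes after its training window and look‑ahead is avoided.
- LightGBM can handle categorical features (e.g., hour, dayofweek) natively.
- The model is trained to predict load at the horizon (e.g., 24 hours ahead) using features known at forecast time: lagged load, calendar features, and observed weather. Weather forecasts for the target hour can be added as further features when they are available.
- `train_with_cv` keeps the model from the fold with the lowest validation MAE. This is simple, but that model may have been trained on only part of the data, and fold scores are measured on different periods. A common alternative is to refit on all training rows with the number of boosting rounds suggested by early stopping.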
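To see what time‑series cross‑validation does with the row order, the short sketch below prints the index sets that scikit‑learn's `TimeSeriesSplit` produces for twelve time‑ordered samples. The array `X_demo` is a stand‑in for the real feature matrix.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X_demo = np.arange(12).reshape(-1, 1)  # 12 time-ordered samples
tscv = TimeSeriesSplit(n_splits=3)

folds = []
for fold, (train_idx, val_idx) in enumerate(tscv.split(X_demo), start=1):
    folds.append((train_idx, val_idx))
    print(f"Fold {fold}: train={train_idx.tolist()} val={val_idx.tolist()}")
```

Every validation block lies strictly after its training block, and the training window grows from fold to fold. With a 24‑hour‑ahead target, the last training rows still have targets that fall inside the validation period; `TimeSeriesSplit` has a `gap` parameter that can be used to leave a buffer between the two if that overlap is a concern.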

---

## **79.5 Probabilistic Forecasting**

In energy, point forecasts are insufficient; operators need to know the uncertainty. Probabilistic forecasts provide prediction intervals or full quantiles. Methods include:

- **Quantile regression**: train models to predict specific quantiles (e.g., 10th, 50th, 90th).
- **Conformal prediction**: add intervals to any point forecast.
- **Bayesian methods**: e.g., Bayesian neural networks.

LightGBM can be used for quantile regression by specifying the `objective` as `'quantile'` and `alpha` for the quantile.

```python
def train_quantile_model(X, y, alpha=0.5):
    params = {
        'objective': 'quantile',
        'alpha': alpha,
        'metric': 'quantile',
        'boosting_type': 'gbdt',
        'num_leaves': 31,
        'learning_rate': 0.05,
        'feature_fraction': 0.8,
        'bagging_fraction': 0.8,
        'bagging_freq': 5,
        'verbose': -1
    }
    train_data = lgb.Dataset(X, label=y)
    model = lgb.train(params, train_data, num_boost_round=100)
    return model

# Train three models for 10th, 50th, 90th percentiles
# model_p10 = train_quantile_model(X_train, y_train, alpha=0.1)
# model_p50 = train_quantile_model(X_train, y_train, alpha=0.5)
# model_p90 = train_quantile_model(X_train, y_train, alpha=0.9)
# pred_p10 = model_p10.predict(X_test)
# pred_p50 = model_p50.predict(X_test)
# pred_p90 = model_p90.predict(X_test)
```

**Explanation:**

- Three models give the lower, median, and upper quantiles.
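- The interval [p10, p90] nominally contains 80% of the probability mass; how close the empirical coverage comes depends on how well the quantile models are calibrated. Because the three models are trained independently, their predictions can occasionally cross (e.g., p10 above p50).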
- This provides operators with a range of possible outcomes.
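Two quick checks are useful once the quantile predictions exist: that the quantiles are ordered, and that the interval covers roughly the intended share of observations. The sketch below uses small hand‑made arrays in place of the model outputs; the helper names `fix_quantile_crossing` and `interval_coverage` are hypothetical, and row‑wise sorting is only one simple way to remove crossings.

```python
import numpy as np

def fix_quantile_crossing(*quantile_preds):
    """Sort predictions row-wise so lower quantiles never exceed higher ones."""
    stacked = np.sort(np.column_stack(quantile_preds), axis=1)
    return [stacked[:, i] for i in range(stacked.shape[1])]

def interval_coverage(y_true, lower, upper):
    """Fraction of observations that fall inside [lower, upper]."""
    y_true = np.asarray(y_true)
    return float(np.mean((y_true >= lower) & (y_true <= upper)))

# Toy values standing in for the p10/p50/p90 model predictions (illustrative only)
y_obs = np.array([100.0, 120.0, 90.0, 150.0, 110.0])
p10 = np.array([90.0, 105.0, 95.0, 120.0, 100.0])
p50 = np.array([100.0, 115.0, 92.0, 135.0, 108.0])   # third value crosses below p10
p90 = np.array([110.0, 125.0, 99.0, 145.0, 118.0])

p10_s, p50_s, p90_s = fix_quantile_crossing(p10, p50, p90)
coverage = interval_coverage(y_obs, p10_s, p90_s)
print(f"Empirical coverage of [p10, p90]: {coverage:.0%}")
```

On a real test set, coverage well below 80% suggests intervals that are too narrow, and coverage well above it suggests intervals that are wider than necessary.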

---

## **79.6 Evaluation Metrics for Load Forecasts**

Standard metrics:

- **MAE**, **RMSE** for point forecasts.
- **Pinball loss** for quantile forecasts:  
  `L(y, q, α) = (y - q) * α  if y ≥ q else (q - y) * (1-α)`
- **Continuous Ranked Probability Score (CRPS)** for full distributions.

We'll implement pinball loss.

```python
def pinball_loss(y_true, y_pred, alpha):
    """Compute pinball loss for a single quantile."""
    error = y_true - y_pred
    loss = np.where(error >= 0, alpha * error, (alpha - 1) * error)
    return np.mean(loss)

# For the three quantile models, average pinball loss over alphas
# loss_p10 = pinball_loss(y_test, pred_p10, 0.1)
# loss_p50 = pinball_loss(y_test, pred_p50, 0.5)
# loss_p90 = pinball_loss(y_test, pred_p90, 0.9)
# avg_loss = (loss_p10 + loss_p50 + loss_p90) / 3
# print(f"Average pinball loss: {avg_loss:.2f}")
```

**Explanation:**

- Pinball loss is the standard metric for quantile forecasts; lower is better.
- Averaging over quantiles gives an overall measure of probabilistic forecast quality.
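A small worked example shows the asymmetry that makes pinball loss suitable for quantiles. The block restates the loss so that it runs on its own; the numbers are made up for illustration.

```python
import numpy as np

def pinball_loss(y_true, y_pred, alpha):
    """Mean pinball loss for a single quantile level alpha."""
    error = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.where(error >= 0, alpha * error, (alpha - 1) * error)))

y_true = np.array([100.0])
under = pinball_loss(y_true, np.array([90.0]), alpha=0.9)    # forecast 10 MW too low
over = pinball_loss(y_true, np.array([110.0]), alpha=0.9)    # forecast 10 MW too high
median = pinball_loss(y_true, np.array([90.0]), alpha=0.5)   # symmetric case

print(f"alpha=0.9, under-forecast: {under:.1f}")
print(f"alpha=0.9, over-forecast:  {over:.1f}")
print(f"alpha=0.5, either side:    {median:.1f}")
```

For the 90th percentile, an under‑forecast of 10 MW costs nine times as much as an over‑forecast of the same size, which pushes the fitted quantile upward. At α = 0.5 the loss is symmetric and equals half the absolute error.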

---

## **79.7 Case Study: 24‑Hour Ahead Load Forecasting**

We'll now run a complete example using our synthetic data.

```python
# Generate data
df = generate_load_data(years=3)

# Feature engineering
engineer = EnergyFeatureEngineer()
featured_df = engineer.compute_features(df, target='load', forecast_horizon=24)
print(f"Features shape: {featured_df.shape}")

# Split by time (last 30 days for test)
split_date = df['timestamp'].max() - pd.Timedelta(days=30)
train_df = featured_df[featured_df['timestamp'] < split_date]
test_df = featured_df[featured_df['timestamp'] >= split_date]

# Define categorical features
cat_features = ['hour', 'dayofweek', 'month', 'is_weekend', 'is_holiday']

# Prepare data
forecaster = LoadForecaster(engineer.feature_columns, categorical_features=cat_features)
X_train, y_train = forecaster.prepare_data(train_df, 'target_load_24')
X_test, y_test = forecaster.prepare_data(test_df, 'target_load_24')

# Train with cross‑validation
cv_scores = forecaster.train_with_cv(X_train, y_train, n_splits=3)

# Predict on test
y_pred = forecaster.predict(X_test)
test_mae = mean_absolute_error(y_test, y_pred)
print(f"Test MAE (24h ahead): {test_mae:.2f}")

# Plot actual vs predicted for a sample week
plot_idx = test_df['timestamp'].iloc[:7*24]
plt.figure(figsize=(12,5))
plt.plot(plot_idx, y_test.iloc[:7*24], label='Actual')  # .iloc: y_test keeps featured_df's row labels
plt.plot(plot_idx, y_pred[:7*24], label='Predicted')
plt.xlabel('Time')
plt.ylabel('Load (MW)')
plt.title('Actual vs Predicted Load (24h ahead)')
plt.legend()
plt.show()
```

**Explanation:**

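- We generate 3 years of data and hold out the last 30 days for testing; everything before that is used for training.
- Each row pairs features computed at time t with the load at t + 24 as the target, so every input, including the temperature, is an observation that is available when the forecast is issued. The model therefore does not see the weather at the target hour. To exploit it, add target‑time temperature as a feature: the actual temperature can serve as an optimistic stand‑in during backtests, and forecasted temperature must be used in production.
- The plot should show the forecast tracking the daily and weekly cycles of the synthetic load; check this visually together with the test MAE.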
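Because the target is the load at t + 24 while the weather features are computed at t, one natural extension is a feature that holds the temperature at the target hour. The sketch below shows the idea on a toy frame; the function name `add_target_time_temperature` and the column suffix `_at_target` are hypothetical.

```python
import numpy as np
import pandas as pd

def add_target_time_temperature(df, horizon=24, temp_col='temperature'):
    """Add the temperature at the target hour (t + horizon) as a feature."""
    out = df.copy()
    # In backtests this is the observed future temperature, which flatters accuracy;
    # in production the column would be filled from a weather forecast instead.
    out[f'{temp_col}_at_target'] = out[temp_col].shift(-horizon)
    return out

demo = pd.DataFrame({
    'timestamp': pd.date_range('2020-01-01', periods=48, freq='h'),
    'temperature': np.arange(48, dtype=float),
})
demo = add_target_time_temperature(demo, horizon=24)
print(demo[['timestamp', 'temperature', 'temperature_at_target']].head(3))
```

Applied to the load DataFrame before `compute_features`, the new column would be picked up as an ordinary feature, since `compute_features` treats every column other than the timestamp, the load, and the target as a feature.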

---

## **79.8 Deployment and Integration**

In production, an energy forecasting system typically runs daily, producing forecasts for the next day (or week) at hourly resolution. The pipeline might:

1. **Ingest latest load and weather observations** (e.g., from SCADA and weather services).
2. **Compute features** up to the current hour.
3. **Load the latest trained model** from a model registry (e.g., MLflow).
4. **Generate forecasts** for the next 24–168 hours.
5. **Save forecasts** to a database and expose via API or push to control room dashboards.
6. **Monitor forecast errors** and trigger retraining when performance degrades.

We can reuse the batch prediction pattern from Chapter 74. For real‑time updates (e.g., every hour), we would have a streaming version.

```python
class EnergyBatchPredictor:
    def __init__(self, model, feature_engineer, feature_columns, categorical_features):
        self.model = model
        self.feature_engineer = feature_engineer
        self.feature_columns = feature_columns
        self.categorical_features = categorical_features
    
    def predict_next_day(self, historical_df, weather_forecast_df=None, forecast_horizon=24):
        """
        historical_df: DataFrame with hourly load and observed weather up to now.
            Needs at least 168 + forecast_horizon rows so the longest lag is defined.
        weather_forecast_df: forecasted weather for the next hours. Not used by the
            model trained in 79.7, whose features are all taken at the forecast
            origin; kept as a hook for target-time weather features.
        Returns predictions for the next `forecast_horizon` hours.
        `forecast_horizon` must match the horizon the model was trained with.
        """
        # forecast_horizon=0 makes the helper target equal to the current load, so
        # dropna() removes only the warm-up rows at the start, not the latest rows.
        # (Concatenating future rows without load would not help: dropna() drops them.)
        featured = self.feature_engineer.compute_features(
            historical_df, target='load', forecast_horizon=0)
        # The model maps features at time t to load at t + forecast_horizon, so the
        # last `forecast_horizon` observed rows yield forecasts for the next hours.
        origin_features = featured.iloc[-forecast_horizon:][self.feature_columns].copy()
        for col in self.categorical_features:
            if col in origin_features.columns:
                origin_features[col] = origin_features[col].astype('category')
        preds = self.model.predict(origin_features)
        return preds
```

**Explanation:**

- The batch predictor builds features from the latest observations and applies the direct 24‑hour model: the rows for the last 24 observed hours produce the forecasts for the next 24 hours. Weather forecasts enter only once target‑time weather features are added to the feature set.
- In practice, you would have a separate process to fetch weather forecasts from an API.
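Step 6 of the pipeline above, monitoring forecast errors and triggering retraining, can be sketched as a comparison between recent MAE and a reference MAE. The function name `check_forecast_health`, the 168‑hour window, and the 1.5× tolerance are all assumptions for illustration; suitable values depend on the utility and the model.

```python
import numpy as np

def check_forecast_health(actual, forecast, baseline_mae, window=168, tolerance=1.5):
    """
    Compare the MAE over the most recent `window` hours with a reference MAE.
    Returns (retrain_flag, recent_mae).
    """
    errors = np.abs(np.asarray(actual, dtype=float) - np.asarray(forecast, dtype=float))
    recent_mae = float(errors[-window:].mean())
    return recent_mae > tolerance * baseline_mae, recent_mae

# Toy data: 200 hours of constant load with two hypothetical forecasts
actual = np.full(200, 1000.0)
ok_flag, ok_mae = check_forecast_health(actual, actual + 10.0, baseline_mae=20.0)
bad_flag, bad_mae = check_forecast_health(actual, actual + 50.0, baseline_mae=20.0)

print(f"healthy model:  MAE={ok_mae:.1f}, retrain={ok_flag}")
print(f"degraded model: MAE={bad_mae:.1f}, retrain={bad_flag}")
```

The reference MAE would typically come from the test set used when the model was accepted, so that the trigger reacts to a relative deterioration and not to an absolute error level.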

---

## **79.9 Lessons Learned from Energy Forecasting**

1. **Weather is the most important exogenous variable**: Accurate temperature forecasts are crucial.
2. **Holidays are difficult**: They occur infrequently, so models may not learn them well. Consider adding holiday dummies or using separate models for holidays.
3. **Multiple seasonalities require careful feature design**: Our use of lag features at 24 and 168 hours captures daily and weekly cycles.
4. **Probabilistic forecasts add value**: Operators need to know the range of possible outcomes for risk management.
5. **Model retraining is necessary**: Load patterns change over time (e.g., due to new appliances, energy efficiency). Monitor and retrain periodically.
6. **Regulatory and market implications**: In deregulated markets, forecasts directly affect bidding strategies; accuracy translates to profit.

---

## **79.10 Future Directions**

- **Incorporating renewables**: Solar and wind generation forecasts are becoming essential as they add variability.
- **Ensemble methods**: Combine multiple models (e.g., SARIMA, LightGBM, LSTM) for improved accuracy.
- **Spatio‑temporal models**: Forecast load across multiple zones simultaneously, capturing regional correlations.
- **Deep learning with attention**: Transformers can capture long‑range dependencies and multiple seasonalities.
- **Transfer learning**: Use models pretrained on one region to improve forecasts in another with limited data.

---

## **Chapter Summary**

In this chapter, we built an energy demand forecasting system. We generated synthetic hourly load data with realistic patterns, engineered features to capture seasonalities and weather effects, trained a LightGBM model for 24‑hour ahead point forecasts, extended to probabilistic forecasts using quantile regression, and discussed deployment considerations. The principles of feature engineering, time‑based validation, and model deployment mirror those from earlier chapters, but with domain‑specific adaptations for energy.

This chapter concludes our exploration of domain‑specific forecasting systems. The next chapter, **Supply Chain Optimization**, will apply similar techniques to demand forecasting, inventory optimization, and lead time prediction.

---

**End of Chapter 79**