# Energy Consumption Forecasting with Transformers

This notebook presents a workflow for forecasting household energy consumption with a Transformer-based model. It covers data acquisition, preprocessing, modeling, evaluation, and a concluding summary of the results.

## Table of Contents
1. Download dataset
2. Preprocessing
3. Modeling
4. Evaluation
5. Conclusion

## 1. Download dataset

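The dataset originates from the UCI Machine Learning Repository and is fetched here through its Kaggle mirror (`uciml/electric-power-consumption-data-set`) using a dedicated Python module. The file is stored locally if it does not already exist. The download call below is commented out; uncomment it on the first run.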

In [None]:
# from energy_forecast_transformer.download import download_kaggle_dataset
# download_kaggle_dataset('uciml/electric-power-consumption-data-set', './data')

In [None]:
from src.preprocessing import load_and_clean_data

df = load_and_clean_data('./data/household_power_consumption.txt')
df.head()

In [None]:
from summarytools import dfSummary # for summary of the data
dfSummary(df)

## 2. Preprocessing

In this section, we clean the data, handle missing values, resample the time series, and normalize the features.

### 2.1 Fill in missing values

Missing values must be filled before modeling, because the forecasting pipeline expects a regularly spaced series without gaps. Here, we use linear interpolation, which estimates each missing point from its nearest valid neighbors and keeps the series continuous.

In [None]:
from src.preprocessing import fill_missing_values

df = fill_missing_values(df)
df.isnull().sum()
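
As a minimal sketch of what linear interpolation does, the example below fills a two-sample gap in a toy series with pandas. The series and its values are hypothetical, and `fill_missing_values` may be implemented differently.

```python
import numpy as np
import pandas as pd

# Hypothetical minute-level readings with a two-sample gap.
index = pd.to_datetime([
    "2007-01-01 00:00", "2007-01-01 00:01", "2007-01-01 00:02",
    "2007-01-01 00:03", "2007-01-01 00:04",
])
power = pd.Series([1.0, np.nan, np.nan, 4.0, 5.0], index=index)

# Linear interpolation draws a straight line between the nearest valid neighbors.
filled = power.interpolate(method="linear")
print(filled.tolist())
```

With equally spaced samples, the two missing points are filled with 2.0 and 3.0.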

### 2.2 Resample dataset

Resampling aggregates the data to a different time frequency (e.g., daily). This can help reduce noise and align the data with the requirements of the forecasting model.

In [None]:
from src.preprocessing import resample_data

df_daily = resample_data(df, freq='D')
df_daily.shape

In [None]:
df_daily.head()
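
The sketch below illustrates daily resampling on a toy frame. It assumes the aggregation is a mean per day; `resample_data` may aggregate differently (for example with a sum), and the values are hypothetical.

```python
import pandas as pd

# Hypothetical sub-daily readings covering two calendar days.
index = pd.to_datetime([
    "2007-01-01 06:00", "2007-01-01 18:00",
    "2007-01-02 06:00", "2007-01-02 18:00",
])
readings = pd.DataFrame({"Global_active_power": [1.0, 3.0, 2.0, 6.0]}, index=index)

# Group the rows by calendar day and average each group.
daily = readings.resample("D").mean()
print(daily)
```

Each output row now represents one day, so the four readings collapse to two daily means (2.0 and 4.0).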

In [None]:
import matplotlib.pyplot as plt

_, axs = plt.subplots(2, 1, figsize=(12, 8), sharex=False)
tags = df.columns.tolist()
tag = tags[0]

# Plot for df (original, minute-level)
axs[0].plot(df.index, df[tag], label=tag)
axs[0].set_title(f'{tag} (Original Data)')
axs[0].set_xlabel('Datetime')
axs[0].set_ylabel('Value')
axs[0].legend()

# Plot for df_daily (resampled, daily)
axs[1].plot(df_daily.index, df_daily[tag], label=f'{tag} (Daily)')
axs[1].set_title(f'{tag} (Daily Resampled)')
axs[1].set_xlabel('Datetime')
axs[1].set_ylabel('Value')  # data is not normalized until Section 2.3
axs[1].legend()

plt.tight_layout()
plt.show()

### 2.3 Normalization

Normalization scales all features to a common range, typically [0, 1]. This step is important, as it helps the model converge faster and prevents features with larger scales from dominating the learning process.

In [None]:
from src.preprocessing import normalize_data

columns = ['Global_active_power', 'Global_reactive_power', 'Voltage', 'Global_intensity', 'Sub_metering_1', 'Sub_metering_2', 'Sub_metering_3']
df_daily, scaler = normalize_data(df_daily, columns=columns)

In [None]:
df_daily.head()
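
A minimal sketch of min-max scaling, assuming `normalize_data` maps each column to [0, 1] (its implementation is not shown here). The helper names `min_max_scale` and `min_max_inverse` are hypothetical, and the sample values are made up.

```python
import numpy as np

def min_max_scale(values):
    """Scale each column to [0, 1]; return the scaled data plus offset and span."""
    lo = values.min(axis=0)
    hi = values.max(axis=0)
    # Use a span of 1 for constant columns to avoid division by zero.
    span = np.where(hi > lo, hi - lo, 1.0)
    return (values - lo) / span, lo, span

def min_max_inverse(scaled, lo, span):
    """Map scaled values back to the original units."""
    return scaled * span + lo

# Hypothetical columns: voltage and active power.
data = np.array([[230.0, 0.5], [240.0, 1.5], [250.0, 2.5]])
scaled, lo, span = min_max_scale(data)
restored = min_max_inverse(scaled, lo, span)
print(scaled)
```

Keeping the offset and span, as the returned `scaler` presumably does, makes it possible to convert forecasts back to physical units.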

### 2.4 Set up dataset dictionary

We organize the data into dictionaries for training, validation, and testing. Each dictionary contains the time series data, static features, and item identifiers, making it compatible with the modeling framework.

In [None]:
freq = "1D"
prediction_length = 28 # 4 weeks

In [None]:
n_total = len(df_daily)
n_train = int(n_total - prediction_length*2)
n_val = int(n_total - prediction_length)
n_test = int(n_total)
print(f"Total samples: {n_total}, Train samples: {n_train}, Validation samples: {n_val}, Test samples: {n_test}")
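
The three lengths above describe an expanding split: each split extends the previous one by `prediction_length` steps. The sketch below shows this on a hypothetical 100-step series, assuming `create_dataset_dict` slices every split from the start of the series.

```python
prediction_length = 28
series = list(range(100))  # hypothetical stand-in for one daily column

n_total = len(series)
n_train = n_total - 2 * prediction_length
n_val = n_total - prediction_length

# Each split starts at the beginning and ends one forecast horizon later.
train_target = series[:n_train]
val_target = series[:n_val]
test_target = series[:n_total]

print(len(train_target), len(val_target), len(test_target))
```

The last `prediction_length` values of the validation and test targets are the ones the model has to forecast.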

In [None]:
from src.preprocessing import create_dataset_dict, transform_start_field
from datasets import Dataset, DatasetDict
from functools import partial

data_train, data_val, data_test = create_dataset_dict(df_daily, n_train, n_val, n_test, prediction_length)

dataset = DatasetDict({
    'train': Dataset.from_dict(data_train),
    'validation': Dataset.from_dict(data_val),
    'test': Dataset.from_dict(data_test)
})

train_dataset = dataset["train"]
test_dataset = dataset["test"]

train_dataset.set_transform(partial(transform_start_field, freq=freq))
test_dataset.set_transform(partial(transform_start_field, freq=freq))

In [None]:
dataset

In [None]:
# Checking the structure of the dataset
print(dataset["train"][0]['start'])
print(len(dataset["train"][0]['target']))
print(dataset["train"][6]['feat_static_cat'])
print(dataset["train"][0]['feat_dynamic_real'])
print(dataset["train"][6]['item_id'])

In [None]:
train_example = dataset["train"][0]
validation_example = dataset["validation"][0]

In [None]:
# Check if prediction_length is correctly set
assert len(train_example["target"]) + prediction_length == len(dataset["validation"][0]["target"])

In [None]:
num_of_samples = 6 * prediction_length

figure, axes = plt.subplots()
axes.plot(train_example["target"][-num_of_samples:], color="blue")
axes.plot(
    validation_example["target"][-num_of_samples - prediction_length :],
    color="red",
    alpha=0.5,
)

plt.show()

### 2.5 Splitting dataset

Splitting the dataset into training, validation, and test sets allows us to train the model, tune hyperparameters, and evaluate performance on unseen data.

In [None]:
train_dataset = dataset["train"]
test_dataset = dataset["test"]

In [None]:
# Converting the start field to pandas Period

from functools import lru_cache

import pandas as pd
import numpy as np


@lru_cache(10_000)
def convert_to_pandas_period(date, freq):
    return pd.Period(date, freq)


def transform_start_field(batch, freq):
    batch["start"] = [convert_to_pandas_period(date, freq) for date in batch["start"]]
    return batch

In [None]:
from functools import partial

train_dataset.set_transform(partial(transform_start_field, freq=freq))
test_dataset.set_transform(partial(transform_start_field, freq=freq))

In [None]:
train_dataset[0]["start"]

### 2.6 Create rolling windows

`MultivariateGrouper` from GluonTS stacks the per-variable series into a single multivariate series. For the test set, `num_test_dates` sets the number of rolling test windows, that is, the number of forecast origins on which the model is evaluated.

In [None]:
from gluonts.dataset.multivariate_grouper import MultivariateGrouper

num_of_variates = len(train_dataset)

train_grouper = MultivariateGrouper(max_target_dim=num_of_variates)
test_grouper = MultivariateGrouper(
    max_target_dim=num_of_variates,
    num_test_dates=len(test_dataset) // num_of_variates, # number of rolling test windows
)

multi_variate_train_dataset = train_grouper(train_dataset)
multi_variate_test_dataset = test_grouper(test_dataset)

In [None]:
multi_variate_train_example = multi_variate_train_dataset[0]
multi_variate_test_example = multi_variate_test_dataset[0]
print("multi_variate_train_example['target'].shape =", multi_variate_train_example["target"].shape)
print("multi_variate_test_example['target'].shape =", multi_variate_test_example["target"].shape)
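
Conceptually, the grouping turns a list of univariate entries into one array with a row per variable. The sketch below is a simplified stand-in with hypothetical values; the actual `MultivariateGrouper` also aligns the series to a common date range.

```python
import numpy as np

# Hypothetical univariate entries, one per variable, all of equal length.
univariate = [
    {"item_id": "Global_active_power", "target": [1.0, 2.0, 3.0]},
    {"item_id": "Voltage", "target": [240.0, 241.0, 239.0]},
]

# Stack the targets so that axis 0 indexes the variable and axis 1 the time step.
multivariate_target = np.stack([np.asarray(item["target"]) for item in univariate])
print(multivariate_target.shape)
```

The resulting shape is `(number of variables, number of time steps)`, which matches the layout printed by the cell above.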

## 3. Modeling

In this section, we define and configure the Transformer-based model for time series forecasting. The model is set up to handle multivariate time series data and is tailored to the characteristics of the energy consumption dataset.

### 3.1 Model config

The model configuration specifies the architecture and hyperparameters of the Transformer, such as input size, prediction length, context length, and number of layers.

In [None]:
from src.model import get_informer_config
from transformers import InformerForPrediction

num_of_variates = len(train_dataset)
lags_sequence = [1, 2, 7, 14, 28, 56, 112, 224]
context_length = prediction_length * 6

config = get_informer_config(
    num_of_variates=num_of_variates,
    prediction_length=prediction_length,
    context_length=context_length,
    lags_sequence=lags_sequence,
    time_features=["day_of_week", "day_of_month", "day_of_year"]
)

model = InformerForPrediction(config)
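
The Hugging Face time series models expect `past_values` to span `context_length + max(lags_sequence)` time steps, because the largest lag reaches that far back from the start of the context window. The arithmetic below checks this for the values used in the configuration cell; `required_past_length` is an illustrative name, not a library attribute.

```python
prediction_length = 28
context_length = prediction_length * 6
lags_sequence = [1, 2, 7, 14, 28, 56, 112, 224]

# History needed per training window: the context plus the largest lag.
required_past_length = context_length + max(lags_sequence)
print(required_past_length)
```

Each training window therefore needs 392 past days plus the 28-day forecast horizon.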

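### 3.2 Create data loaders

The grouped datasets are wrapped in data loaders for training and backtesting.

In [None]:
# We will iterate over the individual time series of our dataset and add/remove fields or features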
from src.transformer import create_train_dataloader, create_backtest_dataloader

train_dataloader = create_train_dataloader(
    config=config,
    freq=freq,
    data=multi_variate_train_dataset,
    batch_size=64,
    num_batches_per_epoch=100,
    num_workers=2,
)

test_dataloader = create_backtest_dataloader(
    config=config,
    freq=freq,
    data=multi_variate_test_dataset,
    batch_size=16,
)


In [None]:
batch = next(iter(train_dataloader))
for k, v in batch.items():
    print(k, v.shape, v.type())


### 3.3 Create model

Before training, we run a single forward pass on one batch to confirm that the data pipeline and the model work together. Because `future_values` is passed, the output includes the training loss.

In [None]:
# perform forward pass
outputs = model(
    past_values=batch["past_values"],
    past_time_features=batch["past_time_features"],
    past_observed_mask=batch["past_observed_mask"],
    static_categorical_features=batch["static_categorical_features"]
    if config.num_static_categorical_features > 0
    else None,
    static_real_features=batch["static_real_features"]
    if config.num_static_real_features > 0
    else None,
    future_values=batch["future_values"],
    future_time_features=batch["future_time_features"],
    future_observed_mask=batch["future_observed_mask"],
    output_hidden_states=True,
)


In [None]:
print("Loss:", outputs.loss.item())

### 3.4 Training

The training process involves optimizing the model parameters using the training data. We monitor the training and validation loss to prevent overfitting and apply early stopping if the validation loss does not improve.

<!-- 
The training process is crucial for the model to learn from the data. We use the training data to adjust the model parameters, aiming to minimize the loss function. The loss function measures how well the model's predictions match the actual data. By optimizing the model parameters, we improve the model's performance.

To ensure that the model generalizes well to unseen data, we split the data into training and validation sets. The training set is used to train the model, while the validation set is used to evaluate its performance. We monitor the loss on both sets during training.

Overfitting occurs when the model learns the training data too well, including its noise and outliers. This results in poor performance on new, unseen data. To prevent overfitting, we use techniques such as early stopping.

Early stopping is a form of regularization used to avoid overfitting. It involves monitoring the validation loss during training and stopping the training process if the validation loss does not improve for a certain number of epochs. This way, we can ensure that the model maintains the ability to generalize to new data.
-->

In [None]:
from src.model import train_model
from accelerate import Accelerator

epochs = 20
patience = 5  # epochs without validation improvement before early stopping

accelerator = Accelerator()
device = accelerator.device

model.to(device)

loss_history, val_loss_history = train_model(
    model,
    train_dataloader,
    test_dataloader,
    multi_variate_test_dataset,
    config,
    prediction_length,
    epochs=epochs,
    patience=patience,
    num_batches_per_epoch=100
)
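
The early-stopping rule described in this section can be sketched as follows. This is an illustration under stated assumptions, not the logic inside `train_model`; the `EarlyStopping` class and the loss values are hypothetical.

```python
class EarlyStopping:
    """Signal a stop once the validation loss has not improved for `patience` epochs."""

    def __init__(self, patience):
        self.patience = patience
        self.best = float("inf")
        self.counter = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.counter = 0
        else:
            self.counter += 1
        return self.counter >= self.patience


stopper = EarlyStopping(patience=2)
val_losses = [1.0, 0.8, 0.9, 0.85, 0.7]  # made-up validation losses per epoch

stopped_at = None
for epoch, loss in enumerate(val_losses, start=1):
    if stopper.step(loss):
        stopped_at = epoch
        break

print(stopped_at, stopper.best)
```

With a patience of 2, training stops after epoch 4, since epochs 3 and 4 both fail to improve on the best loss of 0.8.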

In [None]:
# view training
loss_history = np.array(loss_history).reshape(-1)
x = range(loss_history.shape[0])
plt.figure(figsize=(10, 5))
plt.plot(x, loss_history, label="train")
plt.title("Loss", fontsize=15)
plt.legend(loc="upper right")
plt.xlabel("iteration")
plt.ylabel("nll")
plt.show()

In [None]:

plt.figure(figsize=(10, 5))
plt.plot(range(1, len(val_loss_history) + 1), val_loss_history, label="validation")
plt.title("Validation Loss", fontsize=15)
plt.legend(loc="upper right")
plt.xlabel("epoch")
plt.ylabel("nll")
plt.show()

## 4. Evaluation

After training, we evaluate the model on the test set. We calculate performance metrics such as Mean Squared Error (MSE) and visualize the predictions against the actual values to assess the model's forecasting ability.

### 4.1 Generate predictions

We use the trained model to generate forecasts for the test set. The predictions are then compared to the actual values to evaluate model performance.

In [None]:
model.eval()

forecasts_ = []

for batch in test_dataloader:
    outputs = model.generate(
        static_categorical_features=batch["static_categorical_features"].to(device)
        if config.num_static_categorical_features > 0
        else None,
        static_real_features=batch["static_real_features"].to(device)
        if config.num_static_real_features > 0
        else None,
        past_time_features=batch["past_time_features"].to(device),
        past_values=batch["past_values"].to(device),
        future_time_features=batch["future_time_features"].to(device),
        past_observed_mask=batch["past_observed_mask"].to(device),
    )
    forecasts_.append(outputs.sequences.cpu().numpy())

forecasts = np.vstack(forecasts_)

In [None]:
from src.evaluation import compute_mean_mse

mse_values, mean_mse = compute_mean_mse(multi_variate_test_dataset, forecasts, prediction_length)
for i, mse in enumerate(mse_values):
    print(f"{tags[i]}: MSE = {mse:.4f}")
print(f"Mean MSE across all variables: {mean_mse:.4f}")
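
As a sketch of how a per-variable MSE can be derived from sampled forecasts, the example below reduces the samples to a point forecast with the median and compares it with the actual values. Whether `compute_mean_mse` uses the median or the mean is an assumption here, and `per_variable_mse` is a hypothetical helper operating on made-up numbers.

```python
import numpy as np

def per_variable_mse(forecast_samples, actual):
    """MSE per variable.

    forecast_samples: array of shape (num_samples, horizon, num_variates)
    actual: array of shape (horizon, num_variates)
    """
    # Collapse the sample dimension into a single point forecast.
    point_forecast = np.median(forecast_samples, axis=0)
    # Average the squared error over the forecast horizon.
    return np.mean((point_forecast - actual) ** 2, axis=0)

actual = np.array([[1.0, 10.0], [2.0, 20.0]])
# Three hypothetical sample paths: two overshoot by 1, one undershoots by 1.
samples = np.stack([actual + 1.0, actual + 1.0, actual - 1.0])

mse_per_variable = per_variable_mse(samples, actual)
print(mse_per_variable, mse_per_variable.mean())
```

Here the median forecast overshoots every value by 1, so the MSE is 1.0 for both variables.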

In [None]:
from src.evaluation import plot_forecasts
from gluonts.dataset.field_names import FieldName

plot_forecasts([0,1,2,3,4,5,6], tags, multi_variate_test_dataset, forecasts, prediction_length, FieldName)

## 5. Conclusion

The Informer-based model produced forecasts of daily household energy consumption that track the observed series closely. In the final prediction plots, the predicted means and confidence intervals generally capture the true trends and fluctuations for all variables. The Mean Squared Error (MSE) values are low across variables; note that they are computed on the normalized series from Section 2.3, so they are not expressed in physical units. The predictions for the three sub-metering variables are the weakest and leave room for improvement. Overall, the results suggest that this approach is effective for multivariate time series forecasting in energy consumption scenarios.