This notebook uses the GPyTorch package to apply Gaussian Process regression to the multi-step energy consumption forecasting problem.

## Setup

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import gpytorch
import torch

from gpytorch.kernels import ScaleKernel, LinearKernel, PeriodicKernel, AdditiveKernel, ProductKernel
from gpytorch.constraints import Interval
from gpytorch.priors import NormalPrior
from sklearn.preprocessing import MinMaxScaler

In [2]:
random_seed = 1923

With variational inference, the torch.float32 datatype seems to cause numerical issues with zero values in matrix algebra. Switching to float64 and disabling mixed precision fixes them, but slows down training considerably. The cell below keeps float32 with 'medium' matmul precision.

In [3]:
# Set Torch settings
torch.set_default_dtype(torch.float32)
torch.set_float32_matmul_precision('medium')
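One plausible mechanism behind the float32 issue (an assumption; the notebook does not confirm the cause) is that very small values, such as a jitter term added to a kernel matrix diagonal for numerical stability, fall below float32's resolution and are rounded away. The sketch below illustrates this with NumPy scalars. The jitter value of 1e-8 is chosen for illustration only.

```python
import numpy as np

# Illustrative jitter value (an assumption, not taken from GPyTorch's defaults)
jitter = 1e-8

# Machine epsilon: the gap between 1.0 and the next representable number
eps32 = np.finfo(np.float32).eps
eps64 = np.finfo(np.float64).eps

# Add the jitter to a unit diagonal entry in each precision
diag32 = np.float32(1.0) + np.float32(jitter)
diag64 = np.float64(1.0) + np.float64(jitter)

# In float32 the jitter is rounded away, in float64 it survives
jitter_lost_in_float32 = bool(diag32 == np.float32(1.0))
jitter_kept_in_float64 = bool(diag64 > np.float64(1.0))

print(f"float32 eps: {eps32:.2e}, jitter lost: {jitter_lost_in_float32}")
print(f"float64 eps: {eps64:.2e}, jitter kept: {jitter_kept_in_float64}")
```

If float64 turns out to be necessary, `torch.set_default_dtype(torch.float64)` switches the default, at the cost of the slowdown noted above.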

In [4]:
# Plot settings
plt.rcParams["figure.autolayout"] = True
plt.rcParams['figure.dpi'] = 100
sns.set_style("darkgrid")

In [5]:
output_dir = "./OutputData/"

In [6]:
df = pd.read_csv(output_dir + "full_data.csv")
df["time"] = pd.to_datetime(df["time"], format = "%d:%m:%Y:%H:%M")

In [7]:
# Drop generation columns
gen_cols = df.columns.values[2:].tolist()
df = df.drop(gen_cols, axis = 1)

In [8]:
df

Unnamed: 0,time,consumption_MWh
0,2018-01-01 00:00:00,27412.81
1,2018-01-01 01:00:00,26324.39
2,2018-01-01 02:00:00,24635.32
3,2018-01-01 03:00:00,23872.12
4,2018-01-01 04:00:00,23194.89
...,...,...
52579,2023-12-31 19:00:00,35090.93
52580,2023-12-31 20:00:00,33310.94
52581,2023-12-31 21:00:00,32083.96
52582,2023-12-31 22:00:00,30469.49


## Data prep

We do not need to cyclically encode the seasonal features, as we will apply periodic kernels to them.

In [9]:
# Add time columns

# Trend
df["trend"] = df.index.values

# Hour of day
df["hour"] = df.time.dt.hour + 1

# Day of week
df["dayofweek"] = df.time.dt.dayofweek + 1

# Month
df["month"] = df.time.dt.month
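To illustrate why a periodic kernel can stand in for cyclical encoding, the sketch below compares the two on an hour-of-day value. It uses the textbook periodic kernel form exp(-2 sin^2(pi |x - x'| / p) / l^2) in plain NumPy. GPyTorch's `PeriodicKernel` may parameterize the lengthscale differently, so treat `periodic_kernel` as an illustration rather than the library's implementation. Both function names are hypothetical.

```python
import numpy as np

def cyclical_encode(values, period):
    """Sine/cosine encoding commonly used for seasonal features."""
    angle = 2 * np.pi * np.asarray(values, dtype=float) / period
    return np.sin(angle), np.cos(angle)

def periodic_kernel(x1, x2, period, lengthscale=1.0):
    """Textbook periodic kernel (illustrative, not GPyTorch's exact form)."""
    dist = np.abs(np.asarray(x1, dtype=float) - np.asarray(x2, dtype=float))
    return np.exp(-2 * np.sin(np.pi * dist / period) ** 2 / lengthscale ** 2)

# Hours 1 and 25 are one full daily cycle apart, so their encodings coincide
sin_1, cos_1 = cyclical_encode(1, period=24)
sin_25, cos_25 = cyclical_encode(25, period=24)
same_encoding = bool(np.isclose(sin_1, sin_25) and np.isclose(cos_1, cos_25))

# The kernel is at its maximum for points a full period apart
k_full_cycle = periodic_kernel(1, 25, period=24)
# ...and at its minimum for points half a period apart
k_half_cycle = periodic_kernel(1, 13, period=24)

print(same_encoding, k_full_cycle, k_half_cycle)
```

In both cases, inputs that are a whole number of periods apart are treated as equivalent. The kernel builds that in directly, without adding extra feature columns.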

In [10]:
df

Unnamed: 0,time,consumption_MWh,trend,hour,dayofweek,month
0,2018-01-01 00:00:00,27412.81,0,1,1,1
1,2018-01-01 01:00:00,26324.39,1,2,1,1
2,2018-01-01 02:00:00,24635.32,2,3,1,1
3,2018-01-01 03:00:00,23872.12,3,4,1,1
4,2018-01-01 04:00:00,23194.89,4,5,1,1
...,...,...,...,...,...,...
52579,2023-12-31 19:00:00,35090.93,52579,20,7,12
52580,2023-12-31 20:00:00,33310.94,52580,21,7,12
52581,2023-12-31 21:00:00,32083.96,52581,22,7,12
52582,2023-12-31 22:00:00,30469.49,52582,23,7,12


In [11]:
# Evaluation parameters that match the sequence2sequence testing scheme
horizon = 32 # Forecast horizon
first_t = df[df["time"] == '2022-10-18 16:00:00'].index[0] # First prediction point
stride = 24 # Number of timesteps between each prediction point
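The evaluation parameters above can be read as a set of forecast windows: a prediction is made every `stride` timesteps starting at `first_t`, and each prediction covers the next `horizon` timesteps, so consecutive windows overlap whenever `horizon` exceeds `stride`. A minimal sketch follows. The helper name `forecast_windows` and the toy values are hypothetical, and dropping windows that run past the end of the data is an assumption about the intended scheme.

```python
def forecast_windows(n_obs, first_t, horizon, stride):
    """Return (start, end) index pairs, end exclusive, for each forecast window."""
    windows = []
    start = first_t
    # Keep only windows that fit entirely inside the data (an assumption)
    while start + horizon <= n_obs:
        windows.append((start, start + horizon))
        start += stride
    return windows

# Toy example with 100 observations
windows = forecast_windows(n_obs=100, first_t=10, horizon=32, stride=24)
print(windows)  # [(10, 42), (34, 66), (58, 90)]
```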

In [12]:
# Split features & target
X = df.drop(["time", "consumption_MWh",], axis = 1).values
y = df["consumption_MWh"].values

In [13]:
# Train-test split
X_train, X_test = X[:first_t, :], X[first_t:, :]
y_train, y_test = y[:first_t], y[first_t:]

In [14]:
# Feature & target scaling (0-1), tensor conversion
scaler = MinMaxScaler()
X_train = torch.tensor(scaler.fit_transform(X_train))
X_test = torch.tensor(scaler.transform(X_test))

scaler_target = MinMaxScaler()
y_train = torch.tensor(scaler_target.fit_transform(y_train.reshape(-1, 1))).squeeze(-1)
y_test = torch.tensor(scaler_target.transform(y_test.reshape(-1, 1))).squeeze(-1)
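Since both scalers are fit on the training data only, scaled test values can fall outside the 0-1 range, and predictions have to be mapped back to MWh before they are reported. The sketch below reproduces min-max scaling and its inverse in plain NumPy with made-up numbers. With the fitted scaler above, `scaler_target.inverse_transform` performs the same inverse mapping on a 2D array.

```python
import numpy as np

# Toy consumption values (made up for illustration)
y_train_raw = np.array([20000.0, 25000.0, 30000.0])
y_test_raw = np.array([27500.0, 32000.0])

# Fit the scaling parameters on the training data only
y_min, y_max = y_train_raw.min(), y_train_raw.max()

def minmax_scale(y):
    """Map raw values to the 0-1 range defined by the training data."""
    return (y - y_min) / (y_max - y_min)

def minmax_inverse(y_scaled):
    """Map scaled values back to the original units."""
    return y_scaled * (y_max - y_min) + y_min

# The second test value lies above the training maximum, so it scales above 1
y_test_scaled = minmax_scale(y_test_raw)
y_test_back = minmax_inverse(y_test_scaled)

print(y_test_scaled, y_test_back)
```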

In [15]:
# Subset training data to fit into memory
train_size = int(24 * 365 * 1)
X_train = X_train[-train_size:, :]
y_train = y_train[-train_size:]

In [16]:
X_train.shape

torch.Size([8760, 4])

## Model & wrapper definition

Summary of training strategies tried:
- ExactGP can only be trained with unbatched gradient descent, which limits it to roughly 10k observations. It does a good job predicting the first few days of the testing set, but its predictions decline to zero over time. This would likely be solved by online updates of the training data.
- VariationalGP can be trained with batched, stochastic & natural gradient descent. It only fits into GPU memory with few inducing points, because inducing points are unbatched, which largely defeats the purpose of using SGD. It does not fit & converge properly, likely due to the use of inducing points.
- VNNGP supports batching the inducing points, essentially using the entire data as both training & inducing points. The initial computation of the k-nearest neighbor structure (when the model is created, not trained) is slow, but can be made faster by installing the faiss package. The entire data can be used as inducing points, but the training loss is inexplicably high (going from 200k to 2k in 15-20 epochs), and predictions fluctuate around zero seemingly randomly. Warnings about negative variances are also raised constantly.
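The memory limits mentioned above follow from the size of the kernel matrix: an exact GP typically works with a dense n x n covariance matrix over the training points, so memory grows quadratically with n. Below is a rough back-of-the-envelope sketch that counts a single copy of the matrix. Real usage is higher, since training also needs intermediate results and gradients. The helper name is hypothetical.

```python
def kernel_matrix_gib(n_obs, bytes_per_value=4):
    """Memory in GiB for one dense n_obs x n_obs matrix (float32 by default)."""
    return n_obs ** 2 * bytes_per_value / 1024 ** 3

# One year of hourly data, the subset size used in this notebook
one_year = kernel_matrix_gib(24 * 365)

# Roughly six years of hourly data, close to the full dataset
six_years = kernel_matrix_gib(24 * 365 * 6)

print(f"One year: {one_year:.2f} GiB, six years: {six_years:.2f} GiB")
```

Using six times the data needs 36 times the memory, which is why the training set is cut down to the most recent year.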

In [17]:
# ExactGP model class
class ExactGPModel(gpytorch.models.ExactGP):

    def __init__(self, X_train, y_train, likelihood):
        super().__init__(X_train, y_train, likelihood)

        # Create mean module
        self.mean_module = gpytorch.means.ConstantMean()
        #self.mean_module = gpytorch.means.ZeroMean()

        # Create covariance module
        self.covar_module = AdditiveKernel(
            LinearKernel(active_dims = (0,)), # Trend
            #ScaleKernel(LinearKernel(active_dims = (0,))),
            ScaleKernel(PeriodicKernel(
                active_dims = (1,),
                period_length_prior = NormalPrior(24, 1) # Hourly seasonality
            )),
            ScaleKernel(PeriodicKernel(
                active_dims = (2,),
                period_length_prior = NormalPrior(7, 1) # Day of week seasonality
            )),
            ScaleKernel(PeriodicKernel(
                active_dims = (3,),
                period_length_prior = NormalPrior(12, 1) # Month seasonality
            )),
        )

    def forward(self, x):
        mean = self.mean_module(x)
        covar = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean, covar)
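One point that may be worth checking in the kernel definition: the features are min-max scaled to 0-1 before training, while the period length priors are centered on raw-unit periods (24, 7, 12). Assuming each kernel measures distances in the scaled units it receives, a raw period maps to a much smaller scaled period. The sketch below computes that mapping. The helper `scaled_period` is hypothetical, and the raw feature ranges are taken from the time columns created earlier, assuming the scaler was fit on data covering each full range.

```python
def scaled_period(raw_period, raw_min, raw_max):
    """Period length in min-max scaled units, given the raw feature range."""
    return raw_period / (raw_max - raw_min)

# Raw ranges from the time columns: hour 1-24, dayofweek 1-7, month 1-12
hour_period = scaled_period(24, raw_min=1, raw_max=24)      # 24 / 23
dayofweek_period = scaled_period(7, raw_min=1, raw_max=7)   # 7 / 6
month_period = scaled_period(12, raw_min=1, raw_max=12)     # 12 / 11

print(hour_period, dayofweek_period, month_period)
```

If this reading is right, priors centered near these values (for example `NormalPrior(24 / 23, ...)` for the hourly kernel) would match the scaled inputs better. This is a suggestion to verify, not a confirmed fix.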

In [18]:
# ExactGP wrapper class (unbatched gradient descent)
class ExactGP:
    
    def __init__(self, model, likelihood, cuda = True):
        self.model = model
        self.likelihood = likelihood
        self.cuda = cuda

    # Training method
    def train(self, X_train, y_train, max_epochs, learning_rate = 0.01, early_stop = 5, early_stop_tol = 1e-3):

        # Put tensors on GPU if cuda is enabled
        if self.cuda:
            X_train = X_train.cuda()
            y_train = y_train.cuda()
            self.model = self.model.cuda()
            self.likelihood = self.likelihood.cuda()

        # Put models into training mode
        self.model.train()
        self.likelihood.train()

        # Create Adam optimizer with model parameters
        optimizer = torch.optim.Adam(self.model.parameters(), lr = learning_rate)

        # Create marginal log likelihood loss
        mll = gpytorch.mlls.ExactMarginalLogLikelihood(self.likelihood, self.model)

        # Training loop
        for epoch in range(max_epochs):

            # Reset gradients
            optimizer.zero_grad()

            # Get outputs from model
            output = self.model(X_train)

            # Calculate loss and perform backpropagation
            loss = -mll(output, y_train)
            loss.backward()
            optimizer.step()

            # Get loss & noise values to be printed
            loss_scalar = loss.item()
            noise = self.model.likelihood.noise.item()
            
            # Initialize best loss & rounds with no improvement if first epoch
            if epoch == 0:
                self._best_epoch = epoch
                self._best_loss = loss_scalar
                self._epochs_no_improvement = 0
                # Clone the tensors: state_dict() returns references that later optimizer steps update in place
                self._best_state_dict = {k: v.detach().clone() for k, v in self.model.state_dict().items()}

            # Record an epoch with no improvement
            if self._best_loss < loss_scalar - early_stop_tol:
                self._epochs_no_improvement += 1

            # Record an improvement in the loss
            if self._best_loss > loss_scalar + early_stop_tol:
                self._best_epoch = epoch
                self._best_loss = loss_scalar
                self._epochs_no_improvement = 0
                # Clone again so the checkpoint is not overwritten by later epochs
                self._best_state_dict = {k: v.detach().clone() for k, v in self.model.state_dict().items()}

            # Print epoch summary
            print(f"Epoch: {epoch+1}/{max_epochs}, Loss: {loss_scalar}, Noise: {noise}, Best loss: {self._best_loss}")

            # Early stop if necessary
            if self._epochs_no_improvement >= early_stop:
                print(f"Early stopping at epoch {epoch+1}")
                break

        # Load best checkpoint after training 
        self.model.load_state_dict(self._best_state_dict)
            
    # Method to update model training data (kernel hyperparameters unchanged, no additional training)
    def update_train(self, X_update, y_update):
        
        # Put tensors on GPU if cuda is enabled
        if self.cuda:
            X_update = X_update.cuda()
            y_update = y_update.cuda()

        # Update model training data
        self.model = self.model.get_fantasy_model(X_update, y_update)

    # Predict method
    def predict(self, X_test, cpu = True, fast_preds = False):

        # Test data to GPU, if cuda enabled
        if self.cuda:
            X_test = X_test.cuda()

        # Activate eval mode
        self.model.eval()
        self.likelihood.eval()

        # Make predictions without gradient calculation
        with torch.no_grad(), gpytorch.settings.fast_pred_var(state = fast_preds):

            # Returns the model posterior distribution over functions p(f*|x*, X, y)
            # Noise is not yet added to the functions
            f_posterior = self.model(X_test)

            # Returns the predictive posterior distribution p(y*|x*, X, y)
            # Noise is added to the functions
            y_posterior = self.likelihood(f_posterior)

            # Get posterior predictive mean & prediction intervals
            # By default, 2 standard deviations around the mean
            y_mean = y_posterior.mean
            y_lower, y_upper = y_posterior.confidence_region()

        # Return data to CPU if desired
        if cpu:
            y_mean = y_mean.cpu()
            y_lower = y_lower.cpu()
            y_upper = y_upper.cpu()

        return y_mean, y_lower, y_upper

## Model training & testing without online updates

In [19]:
# Create likelihood, model, wrapper
likelihood = gpytorch.likelihoods.GaussianLikelihood()
model = ExactGPModel(X_train, y_train, likelihood)
trainer = ExactGP(model, likelihood)

In [None]:
# Perform training
trainer.train(
    X_train,
    y_train,
    max_epochs = 50,
    learning_rate = 0.1,
    early_stop = 5,
    early_stop_tol = 1e-3)

In [None]:
# Save model state
model_name = "ExactGP1.pth"
torch.save(trainer.model.state_dict(), output_dir + model_name)

In [20]:
# Load saved model state
model_name = "ExactGP1.pth"
state_dict = torch.load(output_dir + model_name)

In [21]:
# Update trainer with loaded model
trainer.model.load_state_dict(state_dict)

<All keys matched successfully>

In [23]:
# Get predictions
preds_mean, preds_lower, preds_upper = trainer.predict(X_test) # predict returns mean, lower, upper in this order

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument other in method wrapper_CUDA__equal)

In [None]:
# Plot predicted vs. actual, first N days of testing data
plot_hours = 24 * 5
plt.plot(X_test[:plot_hours, 0], y_test[:plot_hours])
plt.plot(X_test[:plot_hours, 0], preds_mean[:plot_hours])
plt.fill_between(X_test[:plot_hours, 0], preds_lower[:plot_hours], preds_upper[:plot_hours], alpha = 0.2)

In [None]:
# Plot predicted vs. actual, entire test set
plt.plot(X_test[:, 0], y_test)
plt.plot(X_test[:, 0], preds_mean)
plt.fill_between(X_test[:, 0], preds_lower, preds_upper, alpha = 0.2)

Issues to fix:
- Loading saved model causes device mismatch between training & prediction data
- GPU memory holdup after training
- Predicting on entire test set likely too GPU memory intensive
- Get rid of the wrapper and add its methods to model itself?
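For the memory-related items above, one option is to predict in chunks rather than on the entire test set at once. The sketch below is a generic helper, not part of the notebook: `predict_in_chunks` and `chunk_size` are hypothetical names, and `predict_fn` stands for any callable that returns a (mean, lower, upper) triple the way `trainer.predict` does. It concatenates with NumPy, which assumes the outputs are CPU arrays or CPU tensors. As for the device mismatch, a plausible cause is that `predict` moves `X_test` to the GPU while a model that was only loaded, not trained, still sits on the CPU. Calling `trainer.model.cuda()` before predicting is one thing to try.

```python
import numpy as np

def predict_in_chunks(predict_fn, X, chunk_size=1024):
    """Call predict_fn on consecutive row chunks of X and concatenate the results."""
    means, lowers, uppers = [], [], []
    for start in range(0, len(X), chunk_size):
        mean, lower, upper = predict_fn(X[start:start + chunk_size])
        means.append(np.asarray(mean))
        lowers.append(np.asarray(lower))
        uppers.append(np.asarray(upper))
    return np.concatenate(means), np.concatenate(lowers), np.concatenate(uppers)

# Toy stand-in for a model: the mean is the first feature, the interval is mean +/- 1
def toy_predict(X_chunk):
    mean = X_chunk[:, 0]
    return mean, mean - 1.0, mean + 1.0

X_toy = np.arange(10, dtype=float).reshape(-1, 1)
mean, lower, upper = predict_in_chunks(toy_predict, X_toy, chunk_size=4)
print(mean.shape, lower[:3], upper[:3])
```

Chunking leaves the per-point means and intervals unchanged, since each test point's marginal prediction does not depend on which other test points are in the same call.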

## Model testing with online updates

Use the get_fantasy_model method to update the trained model's training data with new input sequences. Hyperparameters are not updated, which roughly mirrors the use of input sequences in NN models.

In [None]:
# Evaluation pseudocode:
# Create preds list
# Perform feature scaling
# Train until [:first_t]
# Predict on first_t + horizon
# Save preds
# For pred points in [first_t:] // stride:
#     Perform feature scaling
#     Expand training set & online train
#     Predict on first_t + eval index * stride + horizon
#     Save preds
# Concat & return preds, actual targets[first_t:]
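The pseudocode above can be sketched as a generic loop. This is an outline under stated assumptions, not the notebook's final implementation: `rolling_forecast`, `predict_fn` and `update_fn` are hypothetical names, the inputs are assumed to be scaled already (the per-step rescaling from the pseudocode is left out), and a toy `LastValueModel` stands in for the GP so the loop can be run on its own.

```python
import numpy as np

def rolling_forecast(X, y, first_t, horizon, stride, predict_fn, update_fn):
    """Forecast `horizon` steps at each prediction point, then reveal `stride` new observations."""
    preds, actuals = [], []
    t = first_t
    while t + horizon <= len(y):
        preds.append(np.asarray(predict_fn(X[t:t + horizon])))
        actuals.append(y[t:t + horizon])
        # Online update: add the observations up to the next prediction point
        update_fn(X[t:t + stride], y[t:t + stride])
        t += stride
    return np.stack(preds), np.stack(actuals)

class LastValueModel:
    """Toy stand-in for the GP: always predicts the last observed target."""

    def __init__(self, last_value):
        self.last_value = last_value

    def predict(self, X_new):
        return np.full(len(X_new), self.last_value)

    def update(self, X_new, y_new):
        self.last_value = y_new[-1]

X_toy = np.arange(100, dtype=float).reshape(-1, 1)
y_toy = np.arange(100, dtype=float)
toy_model = LastValueModel(last_value=y_toy[9])
preds, actuals = rolling_forecast(
    X_toy, y_toy, first_t=10, horizon=32, stride=24,
    predict_fn=toy_model.predict, update_fn=toy_model.update)
print(preds.shape, actuals.shape)
```

With the wrapper defined earlier, `predict_fn` could be something like `lambda X: trainer.predict(X)[0]` and `update_fn` could be `trainer.update_train`. That wiring is untested here.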

In [None]:
# Plot predicted vs. actual, entire test set

In [None]:
# Plot predicted vs. actual, select sequences

In [None]:
# Calculate performance metrics
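The notebook does not say which metrics are intended, so the sketch below is only one possibility: MAE, RMSE and MAPE for the point forecasts, plus the share of actual values that fall inside the prediction interval. The function name `forecast_metrics` and the toy numbers are hypothetical.

```python
import numpy as np

def forecast_metrics(actual, pred_mean, pred_lower, pred_upper):
    """Point forecast errors and prediction interval coverage."""
    actual = np.asarray(actual, dtype=float)
    pred_mean = np.asarray(pred_mean, dtype=float)
    error = actual - pred_mean
    # True where the actual value lies within the interval bounds
    inside = (actual >= np.asarray(pred_lower)) & (actual <= np.asarray(pred_upper))
    return {
        "MAE": float(np.mean(np.abs(error))),
        "RMSE": float(np.sqrt(np.mean(error ** 2))),
        "MAPE": float(np.mean(np.abs(error / actual)) * 100),
        "coverage": float(np.mean(inside)),
    }

# Toy values (made up for illustration)
actual = np.array([100.0, 200.0, 400.0])
pred_mean = np.array([110.0, 190.0, 400.0])
metrics = forecast_metrics(actual, pred_mean, pred_mean - 5.0, pred_mean + 15.0)
print(metrics)
```

Metrics are easier to interpret in MWh, so predictions and targets would usually be inverse-scaled before being passed in.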