# Multivariate Forecasting with D-Linear

This notebook outlines the application of D-Linear, a recently proposed model for time series forecasting, to an electricity consumption dataset. The dataset contains the hourly electricity consumption of 321 customers from 2012 to 2014.

This demo uses an implementation of [D-Linear](https://arxiv.org/pdf/2205.13504.pdf) from the Darts forecasting package. [Darts](https://unit8co.github.io/darts/index.html) is a package that provides a variety of time series forecasting methods, ranging from statistical models like [ARIMA](https://unit8co.github.io/darts/generated_api/darts.models.forecasting.arima.html#darts.models.forecasting.arima.ARIMA), [Exponential Smoothing](https://unit8co.github.io/darts/generated_api/darts.models.forecasting.exponential_smoothing.html#darts.models.forecasting.exponential_smoothing.ExponentialSmoothing) and [Prophet](https://unit8co.github.io/darts/generated_api/darts.models.forecasting.prophet_model.html#darts.models.forecasting.prophet_model.Prophet) to deep learning models like [NBEATS](https://unit8co.github.io/darts/generated_api/darts.models.forecasting.nbeats.html#darts.models.forecasting.nbeats.NBEATSModel) and [Temporal Fusion Transformer](https://unit8co.github.io/darts/generated_api/darts.models.forecasting.tft_model.html#darts.models.forecasting.tft_model.TFTModel). On top of a diverse set of forecasting methods, Darts provides great utilities to process time series and evaluate forecasts.

## Package Imports and Global Variables



**Note for Colab users:** Run the following cell to install Darts. After installation completes, you will likely need to restart the Colab runtime. If this is the case, a button `RESTART RUNTIME` will appear at the bottom of the next cell's output.

In [1]:
if 'google.colab' in str(get_ipython()):
    !pip install darts
    !pip install wandb



In [2]:
import os
import warnings

import numpy as np
import pandas as pd

from darts import TimeSeries, concatenate
from darts.dataprocessing.transformers import Scaler
from darts.metrics import mae, mse
from darts.models.forecasting.dlinear import DLinearModel

import pytorch_lightning as pl
from pytorch_lightning.loggers import WandbLogger
import wandb

In [None]:
!wandb login

In [None]:
if 'google.colab' in str(get_ipython()):
    from google.colab import drive
    drive.mount('/content/drive')

In [5]:
DATA_PATH = "/content/drive/MyDrive/electricity.csv" # Update this path to the location of your copy of the dataset

EPOCHS = 1
VAL_PERC = .1
TEST_PERC = .005
BATCH_SIZE = 32

LAG_TIME = 30
LEAD_TIME = 30

## Load Data
We start by loading the data from the CSV file at `DATA_PATH` into a dataframe. Each column of the dataframe is a different time series that measures the hourly electricity consumption of one of the 320 households included in the dataset. Additionally, there is a column that encodes the date and time of the observations. The last column, `OT`, is dropped as it is not relevant for this demo.

In [6]:
df = pd.read_csv(DATA_PATH, index_col=0)
df = df.iloc[:, :-1]
df.index = pd.to_datetime(df.index)
df = df.reset_index().rename({'index':'date'}, axis=1)
df

|  | date | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | ... | 310 | 311 | 312 | 313 | 314 | 315 | 316 | 317 | 318 | 319 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 2016-07-01 02:00:00 | 14.0 | 69.0 | 234.0 | 415.0 | 215.0 | 1056.0 | 29.0 | 840.0 | 226.0 | ... | 199.0 | 676.0 | 372.0 | 80100.0 | 4719.0 | 5002.0 | 48.0 | 38.0 | 1558.0 | 182.0 |
| 1 | 2016-07-01 03:00:00 | 18.0 | 92.0 | 312.0 | 556.0 | 292.0 | 1363.0 | 29.0 | 1102.0 | 271.0 | ... | 265.0 | 805.0 | 452.0 | 95200.0 | 4643.0 | 6617.0 | 65.0 | 47.0 | 2177.0 | 253.0 |
| 2 | 2016-07-01 04:00:00 | 21.0 | 96.0 | 312.0 | 560.0 | 272.0 | 1240.0 | 29.0 | 1025.0 | 270.0 | ... | 278.0 | 817.0 | 430.0 | 96600.0 | 4285.0 | 6571.0 | 64.0 | 43.0 | 2193.0 | 218.0 |
| 3 | 2016-07-01 05:00:00 | 20.0 | 92.0 | 312.0 | 443.0 | 213.0 | 845.0 | 24.0 | 833.0 | 179.0 | ... | 271.0 | 801.0 | 291.0 | 94500.0 | 4222.0 | 6365.0 | 65.0 | 39.0 | 1315.0 | 195.0 |
| 4 | 2016-07-01 06:00:00 | 22.0 | 91.0 | 312.0 | 346.0 | 190.0 | 647.0 | 16.0 | 733.0 | 186.0 | ... | 267.0 | 807.0 | 279.0 | 91300.0 | 4116.0 | 6298.0 | 75.0 | 40.0 | 1378.0 | 191.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 26299 | 2019-07-01 21:00:00 | 11.0 | 116.0 | 8.0 | 844.0 | 384.0 | 1590.0 | 51.0 | 1412.0 | 407.0 | ... | 178.0 | 1897.0 | 1589.0 | 166500.0 | 9917.0 | 10412.0 | 324.0 | 21.0 | 1870.0 | 162.0 |
| 26300 | 2019-07-01 22:00:00 | 11.0 | 103.0 | 8.0 | 749.0 | 371.0 | 1366.0 | 47.0 | 1265.0 | 369.0 | ... | 241.0 | 1374.0 | 1336.0 | 158800.0 | 6812.0 | 8956.0 | 302.0 | 20.0 | 1506.0 | 438.0 |
| 26301 | 2019-07-01 23:00:00 | 12.0 | 93.0 | 8.0 | 650.0 | 346.0 | 1282.0 | 48.0 | 1079.0 | 308.0 | ... | 158.0 | 938.0 | 1311.0 | 154300.0 | 6602.0 | 5910.0 | 302.0 | 18.0 | 1864.0 | 621.0 |
| 26302 | 2019-07-02 00:00:00 | 10.0 | 92.0 | 8.0 | 646.0 | 349.0 | 1261.0 | 48.0 | 1009.0 | 288.0 | ... | 120.0 | 833.0 | 1227.0 | 141900.0 | 6546.0 | 5502.0 | 259.0 | 33.0 | 2623.0 | 783.0 |


### Data Splitting
The data is split sequentially into train, validation and test sets based on the `VAL_PERC` and `TEST_PERC` global variables. We withhold the last `TEST_PERC` of the data for testing and the `VAL_PERC` of the data immediately before it for validation. In the code below, we are very careful to ensure that the model does not have access to the withheld test data during training and validation.

In [7]:
n_samples = len(df)
n_val = int(n_samples * VAL_PERC)
n_test = int(n_samples * TEST_PERC)
n_train = n_samples - (n_val + n_test)

# Split data sequentially into train, validation and test
train_df = df.iloc[:n_train, :]
val_df = df.iloc[n_train:n_train+n_val]
test_df = df.iloc[n_train+n_val:]
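As a quick sanity check of the split arithmetic, the sketch below applies the same formulas to plain index arrays instead of the dataframe. `sequential_split` is a hypothetical helper introduced only for illustration, and the row count of 26303 is inferred from the last index (26302) in the table above.

```python
import numpy as np

VAL_PERC = 0.1
TEST_PERC = 0.005


def sequential_split(n_samples, val_perc, test_perc):
    """Return (train, val, test) index arrays that preserve time order."""
    n_val = int(n_samples * val_perc)
    n_test = int(n_samples * test_perc)
    n_train = n_samples - (n_val + n_test)
    idx = np.arange(n_samples)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


train_idx, val_idx, test_idx = sequential_split(26303, VAL_PERC, TEST_PERC)

# Sizes of the three splits: 23542 train, 2630 validation, 131 test rows
print(len(train_idx), len(val_idx), len(test_idx))
```

Because the three slices are contiguous and ordered, every training index precedes every validation index, which in turn precedes every test index, so no future observation leaks into an earlier split.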

## Data Formatting



## Dataset Definition
[`TimeSeries`](https://unit8co.github.io/darts/quickstart/00-quickstart.html#Building-and-manipulating-TimeSeries) is the main data class in Darts. A `TimeSeries` represents a univariate or multivariate time series, with a proper time index. The time index can either be of type `pandas.DatetimeIndex` (containing datetimes), or of type `pandas.RangeIndex` (containing integers; useful for representing sequential data without specific timestamps). In some cases, [`TimeSeries`](https://unit8co.github.io/darts/quickstart/00-quickstart.html#Building-and-manipulating-TimeSeries) can even represent probabilistic series, in order for instance to obtain confidence intervals.

TimeSeries can be built easily using a few factory methods. We will define the train, validation and test datasets from the respective time series using the method: [`TimeSeries.from_dataframe()`](https://unit8co.github.io/darts/generated_api/darts.timeseries.html#darts.timeseries.TimeSeries.from_dataframe). To help discern between columns representing the time index and the actual time series, we pass the name of the time column as a string to the `time_col` argument.

In [8]:
train_ts = TimeSeries.from_dataframe(train_df, time_col="date").astype(np.float32)
val_ts = TimeSeries.from_dataframe(val_df, time_col="date").astype(np.float32)
test_ts = TimeSeries.from_dataframe(test_df, time_col="date").astype(np.float32)

## Data Preprocessing

Darts provides the [`DataTransformer`](https://unit8co.github.io/darts/examples/02-data-processing.html) class abstraction to provide a unified way of dealing with transformations of TimeSeries:

- `transform()` is implemented by all transformers. This method takes in either a `TimeSeries` or a sequence of `TimeSeries`, applies the transformation and returns the result as a new `TimeSeries` or sequence of `TimeSeries`.

- `inverse_transform()` is implemented by transformers for which an inverse transformation function exists.

- `fit()` allows transformers to extract some information from the time series before `transform()` or `inverse_transform()` is called.

### Using a Transformer: Scaler
Some applications may require your datapoints to be between 0 and 1 (e.g. to feed a time series to a Neural Network based forecasting model). This is easily achieved using the default [`Scaler`](https://unit8co.github.io/darts/generated_api/darts.dataprocessing.transformers.scaler.html), which is a wrapper around [`sklearn.preprocessing.MinMaxScaler`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html).

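Here we fit the scaler on the train dataset only and subsequently use the fitted scaler to transform the train, validation and test datasets. Fitting on the train split alone keeps statistics of the withheld data out of the preprocessing step.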

In [9]:
scaler = Scaler()
train_ts = scaler.fit_transform(train_ts)
val_ts = scaler.transform(val_ts)
test_ts = scaler.transform(test_ts)
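To illustrate what fitting on the train split and transforming the other splits amounts to, here is a minimal NumPy sketch of per-column min-max scaling. It is not the Darts or scikit-learn implementation; the arrays are random stand-ins and the helper names `transform` and `inverse_transform` simply mirror the method names mentioned above.

```python
import numpy as np

rng = np.random.default_rng(0)
train = rng.uniform(10.0, 100.0, size=(200, 3))  # toy stand-in: 200 steps, 3 series
test = rng.uniform(5.0, 120.0, size=(20, 3))     # may fall outside the train range

# "fit": the statistics come from the training split only
col_min = train.min(axis=0)
col_max = train.max(axis=0)


def transform(x):
    """Map each column so that the training minimum is 0 and the training maximum is 1."""
    return (x - col_min) / (col_max - col_min)


def inverse_transform(x_scaled):
    """Undo transform() and return values on the original scale."""
    return x_scaled * (col_max - col_min) + col_min


train_scaled = transform(train)
test_scaled = transform(test)
```

Note that `test_scaled` can contain values below 0 or above 1 whenever the test data exceeds the range seen during training; this is expected when the scaler is fit on the train split alone.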

## Model

### Aside: Long Sequence Time Series Forecasting

Multivariate Long Sequence Time Series Forecasting (LSTF) is a branch of time series forecasting that is characterized by long input and output sequences. Being able to model the behaviour of a system over long time horizons is essential to plan effectively in a variety of domains including economics, electricity consumption and transportation.

### Transformers for Long Sequence Time Series Forecasting

Recently, numerous transformer-based approaches have been proposed for LSTF, including [Informer](https://arxiv.org/pdf/2012.07436.pdf), [Autoformer](https://arxiv.org/abs/2106.13008) and others. The transformer is a potentially desirable architecture for LSTF since it is known to perform well at dependency discovery over long sequences. D-Linear challenges this line of research, proposing a remarkably effective one-layer network that outperforms state-of-the-art transformer-based approaches on the LSTF task on some benchmarks.

> Specifically, the Transformer is arguably the most successful solution to extract the semantic correlations among
the elements in a long sequence. However, in time series
modeling, we are to extract the temporal relations in an
ordered set of continuous points. While employing positional encoding and using tokens to embed sub-series in
Transformers facilitate preserving some ordering information, the nature of the permutation-invariant self-attention
mechanism inevitably results in temporal information loss.

*Zeng, Ailing, et al. "Are transformers effective for time series forecasting?." Proceedings of the AAAI conference on artificial intelligence. Vol. 37. No. 9. 2023.*

### LTSF-Linear
The basic formulation of LTSF-Linear directly regresses historical time series for future prediction via a weighted sum operation:

\begin{equation}
\widehat{X}_i = W X_i
\end{equation}

where $W \in \mathbb{R}^{H \times T}$ is a weight matrix, $X_i \in \mathbb{R}^T$ is an arbitrary input sample of length $T$ from time series $i$, and $\widehat{X}_i \in \mathbb{R}^H$ is the corresponding forecast over a horizon of $H$ steps.
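The weighted-sum formulation can be sketched in a few lines of NumPy. The weights below are random stand-ins for learned parameters, and `W` is stored with shape `(H, T)` so that `W @ x` maps `T` past values to `H` future values; this is an illustration of the equation, not the Darts implementation.

```python
import numpy as np

T, H = 30, 30                    # lookback and horizon, matching LAG_TIME and LEAD_TIME
rng = np.random.default_rng(0)

W = rng.normal(size=(H, T)) / T  # random stand-in for the learned weight matrix
x = rng.normal(size=T)           # one input window X_i from a single series

# Each of the H forecast steps is a weighted sum of the T input values
x_hat = W @ x

# Special case: if every row of W is uniform, each forecast step is the window mean
W_mean = np.full((H, T), 1.0 / T)
x_hat_mean = W_mean @ x
```

The uniform-weight case shows that a simple "predict the recent average" baseline is one point in the space of models this layer can represent.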

### D-Linear
The authors also propose a variant of LTSF-Linear that obtains slightly better results by first decomposing an input sample from a time series into a seasonal component $S_i$ and a trend component $T_i$ (here $T_i$ denotes the trend of series $i$, not the input length $T$). The trend component is simply the moving average of the input time series $X_i$. The seasonal component is calculated as the difference between the original time series and the trend component:

\begin{equation}
\begin{aligned}
T_i &= \mathrm{AvgPool}(\mathrm{Padding}(X_i)) \\
S_i &= X_i - T_i
\end{aligned}
\end{equation}

Then, we define a weight matrix for the trend component $W_t \in \mathbb{R}^{H \times T}$ and one for the seasonal component $W_s \in \mathbb{R}^{H \times T}$. The final forecast is the sum of the two matrix-vector products:

\begin{equation}
\widehat{X}_i = W_t T_i + W_s S_i
\end{equation}
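The decomposition and the two-branch forecast can be sketched with NumPy as follows. This is a minimal illustration rather than the Darts code: `moving_average` is a hypothetical helper, the padding is assumed to repeat the first and last values of the window, the kernel size of 5 is arbitrary, and the weight matrices are random stand-ins for learned parameters.

```python
import numpy as np


def moving_average(x, kernel_size):
    """Trend via average pooling; both ends are padded by repeating the edge values."""
    front = (kernel_size - 1) // 2
    back = kernel_size - 1 - front
    padded = np.pad(x, (front, back), mode="edge")
    kernel = np.ones(kernel_size) / kernel_size
    # 'valid' convolution over the padded window returns exactly len(x) values
    return np.convolve(padded, kernel, mode="valid")


T, H, K = 30, 30, 5              # lookback, horizon, moving-average kernel size
rng = np.random.default_rng(0)

# Toy input window: an oscillation on top of a linear drift
x = np.sin(np.arange(T) / 3.0) + 0.05 * np.arange(T)

trend = moving_average(x, K)     # T_i
seasonal = x - trend             # S_i

W_t = rng.normal(size=(H, T)) / T  # stand-in for the trend weights
W_s = rng.normal(size=(H, T)) / T  # stand-in for the seasonal weights

x_hat = W_t @ trend + W_s @ seasonal
```

By construction `trend + seasonal` reproduces the input window exactly, so the decomposition loses no information; it only lets the model learn separate weights for the smooth and the oscillating parts.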


### DLinear Overview


## Model Definition
Using the Darts forecasting package, a D-Linear model can easily be initialized using the [`darts.models.forecasting.dlinear.DLinearModel`](https://unit8co.github.io/darts/generated_api/darts.models.forecasting.dlinear.html) class. This class constructs a D-Linear model based on the passed arguments. Some of the notable arguments include:
- `input_chunk_length`: (int) The length of the lookback window.
- `output_chunk_length`: (int) The length of the forecasting horizon.
- `shared_weights`: (bool) Whether or not to share the same weights across time series (the components of the multivariate series).

The model also accepts a variety of arguments that govern how the model is trained, including the loss function, batch size, number of epochs, learning rate and more. For additional details about the `DLinearModel` class, consult the [Darts Documentation](https://unit8co.github.io/darts/generated_api/darts.models.forecasting.dlinear.html).
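To make `input_chunk_length` and `output_chunk_length` concrete, the sketch below slices a toy multivariate array into (input, target) pairs of those lengths. `make_windows` is a hypothetical helper written for this illustration; Darts builds its training samples internally and may differ in the details.

```python
import numpy as np


def make_windows(series, input_chunk_length, output_chunk_length):
    """Slice a (time, components) array into (input, target) training pairs."""
    total = input_chunk_length + output_chunk_length
    inputs, targets = [], []
    for start in range(series.shape[0] - total + 1):
        inputs.append(series[start:start + input_chunk_length])
        targets.append(series[start + input_chunk_length:start + total])
    return np.stack(inputs), np.stack(targets)


# Toy series: 100 time steps, 2 components
series = np.arange(200, dtype=np.float32).reshape(100, 2)

X, Y = make_windows(series, input_chunk_length=30, output_chunk_length=30)
print(X.shape, Y.shape)
```

Each target window starts at the time step immediately after its input window ends, so a series of 100 steps yields 100 - (30 + 30) + 1 = 41 pairs.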

In [10]:
class _DLinearModel(DLinearModel):
    def training_step(self, training_batch, batch_idx):
        loss = super().training_step(training_batch, batch_idx)
        self.log("train_loss", loss)
        return loss

    def validation_step(self, val_batch, batch_idx):
        loss = super().validation_step(val_batch, batch_idx)
        self.log("val_loss", loss)
        return loss

wandb_logger = WandbLogger(log_model="all", project="ts-forecasting", name="testrun")

model = _DLinearModel(
    input_chunk_length=LAG_TIME,
    output_chunk_length=LEAD_TIME,
    use_static_covariates=False,
    shared_weights=True,
)

trainer = pl.Trainer(max_epochs=3, logger=wandb_logger)

INFO:pytorch_lightning.utilities.rank_zero:GPU available: True (cuda), used: True
INFO:pytorch_lightning.utilities.rank_zero:TPU available: False, using: 0 TPU cores
INFO:pytorch_lightning.utilities.rank_zero:IPU available: False, using: 0 IPUs
INFO:pytorch_lightning.utilities.rank_zero:HPU available: False, using: 0 HPUs


## Training
Once our model has been defined, we can simply call the fit method with the train and validation datasets that we created earlier. In the backend, Darts leverages [PyTorch Lightning](https://lightning.ai/docs/pytorch/stable/) to train the model.

In [None]:
model.fit(series=train_ts, val_series=val_ts, trainer=trainer)

## Testing

Now that we have fit the model, we may want to assess its out-of-sample performance on the test dataset we defined earlier. Darts offers a `historical_forecasts` method to easily produce forecasts on new data using a previously trained model. We simply pass the previously defined test dataset, along with some additional information on how to generate the forecasts. Setting `stride` equal to `forecast_horizon` produces consecutive, non-overlapping forecast windows, which we then join into a single series with `concatenate`. Lastly, before we evaluate the performance of our model on the test set, we will want to rescale the time series and corresponding forecasts to their original level. As mentioned before, the [`Scaler`](https://unit8co.github.io/darts/generated_api/darts.dataprocessing.transformers.scaler.html) class provides an `inverse_transform` method to accomplish this.

In [12]:
pred_ts = model.historical_forecasts(
    test_ts,
    start=test_ts.start_time(),
    forecast_horizon=LEAD_TIME,
    stride=LEAD_TIME,
    last_points_only=False,
    retrain=False,
    verbose=True
)
pred_ts = concatenate(pred_ts)

pred_ts = scaler.inverse_transform(pred_ts)
test_ts = scaler.inverse_transform(test_ts)

INFO:pytorch_lightning.utilities.rank_zero:GPU available: True (cuda), used: True
INFO:pytorch_lightning.utilities.rank_zero:TPU available: False, using: 0 TPU cores
INFO:pytorch_lightning.utilities.rank_zero:IPU available: False, using: 0 IPUs
INFO:pytorch_lightning.utilities.rank_zero:HPU available: False, using: 0 HPUs
INFO:pytorch_lightning.accelerators.cuda:LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Predicting: |          | 0/? [00:00<?, ?it/s]
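The effect of choosing `stride` equal to `forecast_horizon` can be illustrated with plain index arithmetic. The sketch below is not how Darts computes its forecast windows; `forecast_blocks` is a hypothetical helper, the 131 test points follow from `int(26303 * 0.005)`, and it is an assumption that the first forecast can only start once `LAG_TIME` input points are available.

```python
def forecast_blocks(n_points, first_start, horizon, stride):
    """Return (start, end) index pairs of each forecast block; end is exclusive."""
    blocks = []
    start = first_start
    while start + horizon <= n_points:
        blocks.append((start, start + horizon))
        start += stride
    return blocks


# Assumed setting: 131 test points, 30 input points needed before the first forecast
blocks = forecast_blocks(n_points=131, first_start=30, horizon=30, stride=30)
print(blocks)
```

Under these assumptions the blocks tile the test range back to back without gaps or overlap, which is why concatenating them yields one continuous forecast series. The forecast covers fewer time steps than the test series itself, since the first input window and any incomplete final block receive no prediction.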

### Quantitative Results
To assess the performance of D-Linear on the dataset, we calculate its performance on the test set using Mean Absolute Error (MAE) and Mean Squared Error (MSE). For more information about other metrics offered by Darts, please visit the [documentation](https://unit8co.github.io/darts/generated_api/darts.metrics.metrics.html).

In [13]:
mae_res = mae(test_ts, pred_ts)
mse_res = mse(test_ts, pred_ts)

print(f"MSE: {mse_res} MAE: {mae_res}")

MSE: 1554811.5 MAE: 201.51593017578125
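To help read these numbers, the following sketch computes both metrics by hand on a tiny made-up example with two series of very different magnitude. `mae_np` and `mse_np` are hypothetical helpers that average uniformly over all time steps and components; they are meant to illustrate the definitions, not to reproduce the Darts functions exactly.

```python
import numpy as np


def mae_np(actual, pred):
    """Mean absolute error over all time steps and components."""
    return float(np.mean(np.abs(actual - pred)))


def mse_np(actual, pred):
    """Mean squared error over all time steps and components."""
    return float(np.mean((actual - pred) ** 2))


# Two time steps, two series: one small-scale, one large-scale (made-up values)
actual = np.array([[100.0, 2000.0], [110.0, 2100.0]])
pred = np.array([[90.0, 2500.0], [120.0, 1900.0]])

mae_val = mae_np(actual, pred)
mse_val = mse_np(actual, pred)
print(f"MSE: {mse_val} MAE: {mae_val}")
```

Because errors are squared before averaging, the large-scale series dominates the MSE far more than the MAE. The same effect plausibly contributes to the gap between the two metrics reported above, since the columns in this dataset span several orders of magnitude.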
