# I/O Representations in GluonTS

GluonTS now includes a representation module that can be used to transform both the inputs and the outputs of a model. More information can be found here:
- Paper: https://arxiv.org/abs/2005.10111
- GluonTS representation documentation: https://gluon-ts.s3-accelerate.dualstack.amazonaws.com/master/api/gluonts/gluonts.representation.html
- GluonTS representation source code: https://github.com/awslabs/gluon-ts/tree/master/src/gluonts/mx/representation

## Preparation

### Standard imports

We first import a bunch of libraries ...

In [None]:
%matplotlib inline
import mxnet as mx
from mxnet import gluon
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import json

### Load a dataset

... and then we load the `m4_hourly` dataset.

In [None]:
from gluonts.dataset.repository.datasets import get_dataset
dataset = get_dataset("m4_hourly", regenerate=True)

## Using the representation module

Neural time series models (and in fact many other deep-learning-based models) are highly sensitive to the representation of their inputs and outputs. To address this, GluonTS lets the user flexibly define which input and output transformations the underlying model should use.

### Available I/O representations

GluonTS provides a few popular representation transformations out-of-the-box:
- `Representation`: This is the base class for the representation module. This representation will not change the data in any way, i.e. it corresponds to an identity transformation. 
    - Example: `Representation()`.
- `MeanScaling`: This representation scales all time series by their mean. 
    - Example: `MeanScaling()`.
- `GlobalRelativeBinning`: This representation first scales all time series by their mean and then bins all time series using a single, hence global, binning. Users can further specify the binning resolution as well as whether bins should be equally spaced or quantile-based.
    - Example: `GlobalRelativeBinning(num_bins=100, is_quantile=True)`.
- `LocalAbsoluteBinning`: This representation bins each time series individually, thereby implicitly scaling them. Again, users can specify the binning resolution as well as whether bins should be equally spaced or quantile-based.
    - Example: `LocalAbsoluteBinning(num_bins=50, is_quantile=False)`.
- `CustomBinning`: This representation bins each time series using the provided bin centers (in contrast to the previously presented binning strategies in which the bin centers are automatically computed). 
    - Example: `CustomBinning(bin_centers=np.linspace(10,1000,50))`.
- `DimExpansion`: This representation expands the incoming data tensor along a desired axis. It is typically used to expand a scaled representation (usually on axis=1) before it is passed to the model, since models usually expect 3-dimensional tensors in (N,C,T) format, where N = number of samples (batch size), C = number of features (dimensionality), and T = number of time steps.
    - Example: `DimExpansion()`.
- `Embedding`: This representation embeds the incoming data tensor. 
    - Example: `Embedding(num_bins=1024)`.
- `ConstNormalization`: This representation applies a normalization by a constant to an incoming data tensor. 
    - Example: `ConstNormalization(const=1024)`.
- `RepresentationChain`: This representation chains multiple representations, i.e. multiple representations can be applied on top of each other. This is useful for expanding a dimension on top of mean-scaled data, or for applying the discrete probability integral transform on top of a quantile-based binning. Representations are applied from left to right.
    - Examples: 
        - `RepresentationChain(chain=[MeanScaling(),DimExpansion()])`
        - `RepresentationChain(chain=[LocalAbsoluteBinning(num_bins=1024), Embedding(num_bins=1024)])`
        - `RepresentationChain(chain=[GlobalRelativeBinning(num_bins=1024), ConstNormalization(const=1024), DimExpansion()])`
- `HybridRepresentation`: This representation stacks multiple representations, i.e. multiple representations can be concatenated (on axis=1). This is especially useful for passing binnings at multiple scales. 
    - Examples: 
        - `HybridRepresentation(representations=[RepresentationChain(chain=[GlobalRelativeBinning(num_bins=16), Embedding(num_bins=16)]), RepresentationChain(chain=[GlobalRelativeBinning(num_bins=128), Embedding(num_bins=128)]), RepresentationChain(chain=[GlobalRelativeBinning(num_bins=1024), Embedding(num_bins=1024)])])`
        - `HybridRepresentation(representations=[RepresentationChain(chain=[GlobalRelativeBinning(num_bins=1024), Embedding(num_bins=1024)]),RepresentationChain(chain=[LocalAbsoluteBinning(num_bins=1024), Embedding(num_bins=1024)])])`

Please consult the documentation for a complete list of constructor parameters. See <a href="#Creating-your-own-representations">below</a> on how to create your own representations.
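
To make the difference between chaining and stacking concrete, here is a minimal NumPy sketch. It is not the GluonTS implementation: `apply_chain`, `apply_hybrid`, `mean_scale`, and `expand_dim` are hypothetical helpers written for this illustration, and they ignore the `scale` and `rep_params` values that real representations also return.

```python
import numpy as np

def apply_chain(data, steps):
    """Apply each step in order (left to right), feeding its output to the next."""
    for step in steps:
        data = step(data)
    return data

def apply_hybrid(data, branches):
    """Apply every branch to the same input and concatenate the results on axis=1."""
    return np.concatenate([branch(data) for branch in branches], axis=1)

# Hypothetical stand-ins for MeanScaling and DimExpansion.
def mean_scale(x):
    return x / np.abs(x).mean(axis=-1, keepdims=True)

def expand_dim(x):
    return np.expand_dims(x, axis=1)

data = np.array([[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]])  # shape (N=2, T=3)

# Chain: scale first, then add the feature axis -> shape (2, 1, 3).
chained = apply_chain(data, [mean_scale, expand_dim])

# Hybrid: a scaled branch and an unscaled branch side by side -> shape (2, 2, 3).
stacked = apply_hybrid(data, [
    lambda x: apply_chain(x, [mean_scale, expand_dim]),
    expand_dim,
])
```

In this sketch a chain changes what a single feature channel contains, whereas a hybrid adds further channels along the feature axis.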

In [None]:
from gluonts.mx.representation import (
    Representation, 
    MeanScaling, 
    GlobalRelativeBinning, 
    DimExpansion, 
    Embedding, 
    RepresentationChain,
    ConstNormalization
)
from gluonts.mx.distribution import StudentTOutput, CategoricalOutput

binning = False

if binning:
    input_repr = RepresentationChain(chain=[GlobalRelativeBinning(num_bins=1024), Embedding(num_bins=1024)])
    output_repr = GlobalRelativeBinning(num_bins=1024)
    distr_output = CategoricalOutput(num_cats=1024)
else:
    input_repr = RepresentationChain(chain=[MeanScaling(), DimExpansion()])
    output_repr = Representation()
    distr_output = StudentTOutput()

### Define and train an estimator

The following estimators currently support variable I/O representations ([PR on GitHub](https://github.com/awslabs/gluon-ts/pull/840)):
- `DeepAREstimator`
- `SimpleFeedForwardEstimator`
- `TransformerEstimator`
- `WaveNetEstimator`

For these estimators, users can now specify both the `input_repr` and the `output_repr`.

Please consult the documentation for a complete list of constructor parameters. See <a href="#Updating-a-model-to-support-the-representation-module">below</a> on how to adapt an estimator to support the representation module.

In [None]:
from gluonts.model.simple_feedforward import SimpleFeedForwardEstimator
from gluonts.mx.trainer import Trainer

estimator = SimpleFeedForwardEstimator(
    num_hidden_dimensions=[10],
    prediction_length=dataset.metadata.prediction_length,
    context_length=100,
    freq=dataset.metadata.freq,
    input_repr=input_repr,
    output_repr=output_repr,
    distr_output=distr_output,
    trainer=Trainer(
        ctx="gpu",
        epochs=10,
        learning_rate=1e-2,
        num_batches_per_epoch=100,
        hybridize=False
    )
)

predictor = estimator.train(dataset.train)

### Generate evaluation predictions & evaluate

Standard code for generating and evaluating predictions.

In [None]:
from gluonts.evaluation.backtest import make_evaluation_predictions
forecast_it, ts_it = make_evaluation_predictions(
    dataset=dataset.test,
    predictor=predictor,
    num_samples=100,
)

from gluonts.evaluation import Evaluator
forecasts = list(forecast_it)
tss = list(ts_it)
evaluator = Evaluator(quantiles=[0.1, 0.5, 0.9])
agg_metrics, item_metrics = evaluator(iter(tss), iter(forecasts), num_series=len(dataset.test))
print(json.dumps(agg_metrics, indent=4))

## Example transformation inspection

Let's take a brief look at some example transformations and see how the data is transformed along the way. We first set up the representations that we want and some sample data.

In [None]:
bins = 8

ms = MeanScaling()
grb = GlobalRelativeBinning(num_bins=bins)
const = ConstNormalization(const=bins)
emb = Embedding(num_bins=bins)

data = np.array([
    np.ones(10),
    np.linspace(0, 9, 10),
    np.sin(np.linspace(-np.pi, np.pi, 10)),
    4 * np.cos(np.linspace(-np.pi, np.pi, 10)),
    [4, 10, 8, 7, 6, 10, 8.5, 8 , 6, 10]
])

ms.initialize_from_array(data, mx.context.cpu())
grb.initialize_from_array(data, mx.context.cpu())
const.initialize_from_array(data, mx.context.cpu())

emb.collect_params().initialize()

Let's take a look at the sample data first.

In [None]:
data_mx = mx.nd.array(data)
print(data_mx)
plt.plot(data_mx.asnumpy().T)
plt.show()

### `MeanScaling`

In [None]:
ms_transf, _, _ = ms(data_mx, mx.nd.ones_like(data_mx), None, [])
print(ms_transf)
plt.plot(ms_transf.asnumpy().T)
plt.show()

### `GlobalRelativeBinning`

In [None]:
grb_transf, _, _ = grb(data_mx, mx.nd.ones_like(data_mx), None, [])
print(grb_transf)
plt.plot(grb_transf.asnumpy().T)
plt.show()

### `ConstNormalization` on top of `GlobalRelativeBinning`-transformed data

In [None]:
const_transf, _, _ = const(grb_transf, mx.nd.ones_like(data_mx), None, [])
print(const_transf)
plt.plot(const_transf.asnumpy().T)
plt.show()
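
The three inspection steps above (scale, bin, normalize by a constant) can be mirrored in plain NumPy to see what each one does to the numbers. This is only a conceptual sketch under simplifying assumptions: the scale is taken as the mean absolute value per series, the bins are equally spaced, and bin indices start at 0. The actual GluonTS classes may compute bin edges and indices differently.

```python
import numpy as np

num_bins = 8
data = np.array([[1.0, 2.0, 3.0, 6.0], [10.0, 20.0, 30.0, 60.0]])

# Step 1: mean scaling, one scale per series (row).
scale = np.abs(data).mean(axis=-1, keepdims=True)
scaled = data / scale

# Step 2: a single, equally spaced binning shared by all series.
edges = np.linspace(scaled.min(), scaled.max(), num_bins + 1)
bin_idx = np.digitize(scaled, edges[1:-1])  # integers in [0, num_bins - 1]

# Step 3: divide the bin indices by a constant to map them into [0, 1).
normalized = bin_idx / num_bins
```

Because both series have the same shape up to a factor of ten, they end up with identical bin indices after scaling, which is the point of binning relative to the scale.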

### `Embedding` on top of `GlobalRelativeBinning`-transformed data

In [None]:
emb_transf, _, _ = emb(grb_transf, mx.nd.ones_like(data_mx), None, [])
print(emb_transf)
for i in range(len(emb_transf)):
    plt.scatter(emb_transf[i,:,0].asnumpy(), emb_transf[i,:,1].asnumpy(), alpha=0.2)
plt.show()
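
Conceptually, an embedding on top of a binning is a table lookup: each bin index selects one vector. The NumPy sketch below illustrates only the shapes involved. The table is random rather than trained, and `embedding_dim = 2` is an arbitrary value chosen for this example, not a GluonTS default.

```python
import numpy as np

num_bins, embedding_dim = 8, 2  # embedding_dim is an arbitrary choice here
rng = np.random.default_rng(0)

# One vector per bin; randomly initialized in this sketch, learned in a real model.
table = rng.normal(size=(num_bins, embedding_dim))

bin_idx = np.array([[0, 1, 3, 7], [0, 1, 3, 7]])  # shape (N=2, T=4)

# Integer-array indexing looks up one row of the table per bin index.
embedded = table[bin_idx]  # shape (N, T, embedding_dim)
```

Time steps that fall into the same bin share the same vector, so the embedded data has at most `num_bins` distinct points.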

## Creating your own representations

If you want to implement your own representations, you must subclass `gluonts.mx.representation.Representation` and implement the methods it defines. Specifically, you must implement:
- `hybrid_forward`: Implement the forward transformation here, given `F` (the MXNet API in use, i.e. `mx.nd` or `mx.sym`), the `data`, the `observed_indicator`, the `scale`, and a list of `rep_params`. If you are implementing a scaler, for instance, this is where you scale the data. The method must return three values: the re-represented data; the scale of the data (always return it, even if you don't implement a scaler, because the network uses it as an additional feature); and a list of additional representation parameters that can be passed to `post_transform` when the representation is used on the output (most representations don't need this; see `LocalAbsoluteBinning` for an example of how this parameter can be used). Note that you can use NumPy in this call, since gradients for the transformed inputs are usually not required and no gradient computation is blocked. Converting to NumPy requires a non-hybridized network, as with the `hybridize=False` trainer setting used in this notebook.
- `post_transform`: Implement the backward (inverse) transformation here, given `F` (the MXNet API in use), `samples` from a distribution, the `scale`, and a list of `rep_params`.

Optionally, you can also implement the following methods:
- `initialize_from_dataset`: Here, you can specify a set of instructions to be computed on the complete dataset. For example, `GlobalRelativeBinning` uses `initialize_from_dataset` to compute a global binning on the entire training set.
- `initialize_from_array`: Same as above, but with a complete NumPy array instead of a GluonTS dataset.

We can now implement a simple median scaler `MedianScaling` as an alternative to the mean scaler. 

In [None]:
from gluonts.core.component import validated
from gluonts.model.common import Tensor
from typing import Tuple, Optional, List

class MedianScaling(Representation):
    
    @validated()
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
    
    def hybrid_forward(
        self,
        F,
        data: Tensor,
        observed_indicator: Tensor,
        scale: Optional[Tensor],
        rep_params: List[Tensor],
        **kwargs,
    ) -> Tuple[Tensor, Tensor, List[Tensor]]:
        if scale is None:
            data_np = data.asnumpy()
            observed_np = observed_indicator.asnumpy()
            observed_np[observed_np == 0] = np.nan
            data_np = data_np * observed_np
            scale = np.nanmedian(data_np, axis=-1)
            scale = np.expand_dims(scale, axis=-1)
            scale = F.array(scale)
        scaled_data = F.broadcast_div(data, scale)
        return scaled_data, scale, []
        
    def post_transform(
        self, F, samples: Tensor, scale: Tensor, rep_params: List[Tensor]
    ) -> Tensor:
        transf_samples = F.broadcast_mul(samples, scale)
        return transf_samples
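
The forward and backward steps of `MedianScaling` are meant to be inverses of each other. The NumPy-only sketch below checks that round trip without MXNet. `median_scale` and `median_unscale` are hypothetical helper names used for this illustration and only mirror the logic of the class above.

```python
import numpy as np

def median_scale(data, observed):
    """Forward step: divide each series by the median of its observed values."""
    masked = np.where(observed == 1, data, np.nan)  # hide missing values
    scale = np.nanmedian(masked, axis=-1, keepdims=True)
    return data / scale, scale

def median_unscale(samples, scale):
    """Backward step: multiply by the same scale to undo the forward step."""
    return samples * scale

data = np.array([[1.0, 2.0, 3.0, 100.0], [10.0, 20.0, 30.0, 40.0]])
observed = np.array([[1, 1, 1, 0], [1, 1, 1, 1]])  # 0 marks a missing value

scaled, scale = median_scale(data, observed)
restored = median_unscale(scaled, scale)
```

Note that a series whose median is zero would lead to a division by zero in both the sketch and the class above, so a more robust scaler would need a fallback value for that case.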

## Updating a model to support the representation module

If you want to adapt a GluonTS model to support the representation module, you need to make a few changes in both the `_estimator` and the `_network` files.

In `_network`:
- Add `input_repr` and `output_repr`, both of type `Representation`, as initialization parameters to both the training and the prediction network.
- In the training network: In `hybrid_forward`, make sure to call `self.input_repr` and `self.output_repr` on the `past_target` and `future_target` respectively. Pass `None` as scale and an empty list `[]` as `rep_params`. Pass the scale returned by the input representation to the distribution if an actual distribution is used for loss calculation (not used in this example). Then make sure that the network receives the input-transformed values as inputs and calculates the loss using the output-transformed values.
- In the prediction network: In `hybrid_forward`, first perform the same input and output transformations as in the training network. Then, using direct model predictions or samples from a distribution, `post_transform` the data and return the back-transformed predictions. Note that scaled distributions automatically sample from the correct scale, thereby removing the need for calling `post_transform` (see <a href="#Dealing-with-scales-in-distributions">below</a>), and that auto-regressive models need a little more attention (see <a href="#Dealing-with-auto-regressive-models">below</a>).

In [None]:
class MyTrainNetwork(gluon.HybridBlock):
    
    def __init__(self, prediction_length, input_repr, output_repr, **kwargs):
        super().__init__(**kwargs)
        self.prediction_length = prediction_length
        self.input_repr = input_repr
        self.output_repr = output_repr

        with self.name_scope():
            self.nn = mx.gluon.nn.HybridSequential()
            self.nn.add(mx.gluon.nn.Dense(units=50, activation='relu'))
            self.nn.add(mx.gluon.nn.Dense(units=50, activation='relu'))
            self.nn.add(mx.gluon.nn.Dense(units=self.prediction_length, activation='softrelu'))

    def hybrid_forward(self, F, past_target, future_target):
        input_tar_repr, scale, _ = self.input_repr(
            past_target, F.ones_like(past_target), None, []
        )
        output_tar_repr, _, _ = self.output_repr(
            future_target, F.ones_like(future_target), None, []
        )
        
        prediction = self.nn(input_tar_repr)
        
        return (prediction - output_tar_repr).abs().mean(axis=-1)


class MyPredNetwork(MyTrainNetwork):
    
    def hybrid_forward(self, F, past_target):
        input_tar_repr, scale, _ = self.input_repr(
            past_target, F.ones_like(past_target), None, []
        )
        _, _, rep_params = self.output_repr(
            past_target, F.ones_like(past_target), None, []
        )
        prediction = self.nn(input_tar_repr)
        
        prediction = self.output_repr.post_transform(
            F, prediction, scale, rep_params
        )
        return prediction.expand_dims(axis=1)

In `_estimator`:
- Add `input_repr` and `output_repr`, both of type `Representation`, as initialization parameters to your estimator class. Provide the desired default values for both.
- Override `train()` from the `GluonEstimator` base class to initialize both representations using the training data, i.e. call `initialize_from_dataset(training_data)` on both representations.
- Pass both representations to the training and prediction networks as initialization parameters.

In [None]:
from gluonts.model.estimator import GluonEstimator
from gluonts.model.predictor import Predictor, RepresentableBlockPredictor
from gluonts.core.component import validated
from gluonts.support.util import copy_parameters
from gluonts.transform import ExpectedNumInstanceSampler, Transformation, InstanceSplitter
from gluonts.dataset.field_names import FieldName
from mxnet.gluon import HybridBlock

class MyEstimator(GluonEstimator):
    
    @validated()
    def __init__(
        self,
        freq: str,
        context_length: int,
        prediction_length: int,
        input_repr: Representation = RepresentationChain(
            chain=[MedianScaling(), DimExpansion()]
        ),
        output_repr: Representation = MedianScaling(),
        trainer: Trainer = Trainer()
    ) -> None:
        super().__init__(trainer=trainer)
        self.context_length = context_length
        self.prediction_length = prediction_length
        self.freq = freq
        self.input_repr = input_repr
        self.output_repr = output_repr
        
    def train(
        self,
        training_data,
        **kwargs,
    ) -> Predictor:
        self.input_repr.initialize_from_dataset(training_data)
        self.output_repr.initialize_from_dataset(training_data)
        return super().train(training_data, **kwargs)

    def create_transformation(self):
        return InstanceSplitter(
            target_field=FieldName.TARGET,
            is_pad_field=FieldName.IS_PAD,
            start_field=FieldName.START,
            forecast_start_field=FieldName.FORECAST_START,
            train_sampler=ExpectedNumInstanceSampler(num_instances=1),
            past_length=self.context_length,
            future_length=self.prediction_length,
        )

    def create_training_network(self) -> MyTrainNetwork:
        return MyTrainNetwork(
            prediction_length=self.prediction_length,
            input_repr=self.input_repr,
            output_repr=self.output_repr
        )

    def create_predictor(
        self, transformation: Transformation, trained_network: HybridBlock
    ) -> Predictor:
        prediction_network = MyPredNetwork(
            prediction_length=self.prediction_length,
            input_repr=self.input_repr,
            output_repr=self.output_repr
        )

        copy_parameters(trained_network, prediction_network)

        return RepresentableBlockPredictor(
            input_transform=transformation,
            prediction_net=prediction_network,
            batch_size=self.trainer.batch_size,
            freq=self.freq,
            prediction_length=self.prediction_length,
            ctx=self.trainer.ctx,
        )

### Train `MyEstimator` with `MedianScaling`

We can now train our simple `MyEstimator`, which defaults to `MedianScaling` on both the input and the output.

In [None]:
estimator = MyEstimator(
    prediction_length=dataset.metadata.prediction_length,
    context_length=100,
    freq=dataset.metadata.freq,
    trainer=Trainer(
        ctx="gpu",
        epochs=10,
        learning_rate=1e-3,
        num_batches_per_epoch=100,
        hybridize=False
    )
)

predictor = estimator.train(dataset.train)

### Generate evaluation predictions & evaluate

Standard code for generating and evaluating predictions.

In [None]:
forecast_it, ts_it = make_evaluation_predictions(
    dataset=dataset.test,
    predictor=predictor,
    num_samples=100,
)

forecasts = list(forecast_it)
tss = list(ts_it)
evaluator = Evaluator(quantiles=[0.1, 0.5, 0.9])
agg_metrics, item_metrics = evaluator(iter(tss), iter(forecasts), num_series=len(dataset.test))
print(json.dumps(agg_metrics, indent=4))

## Additional notes

### Dealing with scales in distributions

In some output distributions, scaling by the mean scale is already incorporated via the `TransformedDistribution`. Therefore, if sampling from the chosen output distribution already returns samples on the original scale, you should supply `Representation()` as the model's `output_repr`. For example, `DeepAREstimator`, `SimpleFeedForwardEstimator`, and `TransformerEstimator` use explicit input mean scaling but implicit output scaling via `StudentTOutput()`. This is why their input representation is, as expected, `RepresentationChain(chain=[MeanScaling(), DimExpansion()])`, while their output representation is `Representation()`, with scaling still occurring in the output via the distribution. If you wish to suppress automatic scaling in the distribution output, return a `None` scale from the input representation.
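
The following NumPy sketch illustrates why an identity output representation is the right choice when the distribution already rescales its samples. It is a simplified stand-in, not GluonTS code: the affine transformation is modelled as a plain multiplication by a per-series scale, and the scale values are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
scale = np.array([[2.0], [50.0]])  # hypothetical per-series scale, shape (N, 1)

# Samples in the normalized space the network operates in.
base_samples = rng.standard_t(df=3, size=(2, 1000))

# A scaled distribution multiplies by the scale at sampling time,
# so these samples are already on the original data scale.
samples = base_samples * scale

# Applying a scaling post-transform on top would rescale a second time.
double_scaled = samples * scale
```

In this sketch `samples` is what the forecast should contain, while `double_scaled` is off by another factor of `scale`.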

### Dealing with auto-regressive models

Autoregressive models require the input and output representations to be computed in every auto-regressive step. To allow for this, I/O representations must be supplied with a fixed `scale` (needed by almost all representations) and a fixed `rep_params` list (needed, for example, by `LocalAbsoluteBinning`). Both of these parameters should be computed in the first forward pass at prediction time. It is important that these parameters stay fixed and are not recalculated in every step, as predictions might diverge if the parameters are allowed to vary. Consult the DeepAR `_network.py` implementation for a complete example. See a short conceptual example below:

```python
# Initial forward pass
input_tar_repr, scale, rep_params_in = self.input_repr(
    sequence, sequence_obs, None, []
)
_, _, rep_params_out = self.output_repr(
    past_target, past_observed_values, None, []
)

# Compute initial repeated_past_target and repeated_scale
...

# Autoregressive loop
for (...):
    input_tar_repr, _, _ = self.input_repr(
        repeated_past_target,
        F.ones_like(repeated_past_target),
        repeated_scale,
        rep_params_in,
    )
    _, _, rep_params = self.output_repr(
        repeated_past_target,
        F.ones_like(repeated_past_target),
        repeated_scale,
        rep_params_out,
    )
    
    # Propagate input through network
    ...
    
    # Obtain samples from distribution
    new_samples = distr.sample()

    # Post transform the data
    new_samples = self.output_repr.post_transform(
        F, new_samples, repeated_scale, rep_params
    )
    
    # Update repeated_past_target to include new samples
    ...
```
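
To see why the scale has to stay fixed, consider the toy rollout below. It is a deliberately artificial sketch in plain Python, not GluonTS code: the hypothetical "model" always predicts 1.1 in scaled space, and the scale is the mean of the current window.

```python
def rollout(past, steps, fix_scale):
    """Toy autoregressive loop; the 'model' always predicts 1.1 in scaled space."""
    window = list(past)
    scale = sum(window) / len(window)  # computed once, in the first pass
    predictions = []
    for _ in range(steps):
        if not fix_scale:
            # Re-estimate the scale from a window that contains own predictions.
            scale = sum(window) / len(window)
        prediction = 1.1 * scale  # map the scaled prediction back to data scale
        predictions.append(prediction)
        window.append(prediction)
    return predictions

fixed = rollout([1.0, 2.0, 3.0], steps=5, fix_scale=True)
recomputed = rollout([1.0, 2.0, 3.0], steps=5, fix_scale=False)
```

With a fixed scale, every prediction in this toy setup is the same value. When the scale is recomputed, each prediction raises the mean of the window, which raises the next prediction, so the forecasts drift upward step by step.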