# How to add new Models
> Tutorial on how to add new models to NeuralForecast

:::{.callout-warning}

## Prerequisites

This guide assumes advanced familiarity with NeuralForecast.

We highly recommend first reading the Getting Started and the NeuralForecast Map tutorials!

Additionally, refer to the [CONTRIBUTING guide](https://github.com/Nixtla/neuralforecast/blob/main/CONTRIBUTING.md) for the basics of how to contribute to NeuralForecast.

:::

This tutorial is aimed at contributors who would like to add a new model to the NeuralForecast library.

The existing modules of the library already take care of all common aspects of the optimization, training, selection, and evaluation of deep learning models. The `core` class simplifies building entire pipelines (for both industry and academia) on any dataset with user-friendly methods such as `fit` and `predict`.

**Adding a new model to NeuralForecast is simpler than building a new PyTorch model from scratch, with the following additional advantages:**

* Existing modules already implement most aspects of the training and evaluation of deep learning models.
* Integrated with PyTorch-Lightning and Tune libraries for efficient optimization and distributed computation.
* The `BaseModel` classes provide common optimization components such as early stopping and learning rate schedulers.
* Scheduled automatic performance tests on GitHub to ensure quality standards.
* Easy performance and computation comparison with the other models in the library.
* Exposure to a large community of users and contributors. 

We will present the steps following an example of a simplified `MLP` model (with no exogenous covariates).

## 1. Determine the model type (Base class)


The library contains **three** types of base models: `BaseWindows`, `BaseRecurrent`, and `BaseMultivariate`. The main difference between the three types is the sampling procedure and the input batch for the `forward` method, which in turn determines the type of model.

### a. Sampling process

During training, all base models receive a sample of time series of the dataset from the `TimeSeriesLoader` module. The main difference between the three types is the creation of the batch, which is then sent to the `forward` method.

The `BaseWindows` models sample `windows_batch_size` individual windows of size `input_size+h`. This class is designed for window-based models that predict the `h` future values from a fixed, short history of size `input_size`. This family of models includes `MLP`, `NBEATS`, `NHITS`, and most univariate Transformer-based models such as `TFT` and `PatchTST`.
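
To make the window sampling concrete, here is a minimal sketch in plain PyTorch. It is not the library's implementation: the helper `sample_windows` and its `seed` argument are hypothetical, and it assumes the loader delivers a dense tensor of shape (`batch_size`, `n_timestamps`) with no padding or masks.

```python
import torch

def sample_windows(series_batch, input_size, h, windows_batch_size, seed=0):
    """Cut a batch of series into windows of length input_size + h and sample some.

    series_batch: tensor of shape (batch_size, n_timestamps).
    Returns (insample_y, outsample_y) with shapes (w_bs, input_size) and (w_bs, h).
    """
    window_size = input_size + h
    # unfold -> (batch_size, n_windows_per_series, window_size)
    windows = series_batch.unfold(dimension=-1, size=window_size, step=1)
    # Pool the windows of all series together
    windows = windows.reshape(-1, window_size)
    # Sample windows without replacement (capped by the pool size)
    generator = torch.Generator().manual_seed(seed)
    n_sampled = min(windows_batch_size, windows.shape[0])
    idx = torch.randperm(windows.shape[0], generator=generator)[:n_sampled]
    sampled = windows[idx]
    # The first input_size values are the history, the last h are the targets
    return sampled[:, :input_size], sampled[:, input_size:]

# 4 toy series with 50 timestamps each
series_batch = torch.arange(4 * 50, dtype=torch.float32).reshape(4, 50)
insample_y, outsample_y = sample_windows(
    series_batch, input_size=12, h=6, windows_batch_size=16
)
print(insample_y.shape, outsample_y.shape)  # torch.Size([16, 12]) torch.Size([16, 6])
```

Note that the number of rows reaching `forward` is `windows_batch_size`, not `batch_size`: the latter only controls how many series feed the pool of candidate windows.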

The `BaseRecurrent` models pass the `batch_size` time series directly to the model, using `input_size` only to shorten the history for truncated backpropagation. They are designed for recurrent models that have infinite memory, such as `RNN` and `LSTM`.

Finally, the `BaseMultivariate` model receives the complete set of time series from the loader. It then samples `batch_size` timestamps and takes the windows of size `input_size+h`, containing all the time series, that start at those timestamps. This base class is designed for multivariate models, such as `StemGNN`, that model the interactions between time series for forecasting.
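
The multivariate sampling can be sketched in the same spirit. Again, this is an illustration under assumptions rather than the library code: `sample_multivariate_windows` is a hypothetical helper, and the dataset is assumed to be a dense tensor of shape (`n_series`, `n_timestamps`).

```python
import torch

def sample_multivariate_windows(dataset, input_size, h, batch_size, seed=0):
    """Sample start timestamps and cut windows that contain every series.

    dataset: tensor of shape (n_series, n_timestamps).
    Returns insample (bs, input_size, n_series) and outsample (bs, h, n_series).
    """
    n_series, n_timestamps = dataset.shape
    window_size = input_size + h
    n_starts = n_timestamps - window_size + 1
    # Sample batch_size distinct start timestamps
    generator = torch.Generator().manual_seed(seed)
    starts = torch.randperm(n_starts, generator=generator)[:batch_size]
    # Each window keeps all series: (n_series, window_size) -> (window_size, n_series)
    windows = torch.stack(
        [dataset[:, s:s + window_size].T for s in starts.tolist()]
    )
    return windows[:, :input_size, :], windows[:, input_size:, :]

# 3 toy series with 40 timestamps each
dataset = torch.arange(3 * 40, dtype=torch.float32).reshape(3, 40)
insample, outsample = sample_multivariate_windows(
    dataset, input_size=10, h=5, batch_size=8
)
print(insample.shape, outsample.shape)  # torch.Size([8, 10, 3]) torch.Size([8, 5, 3])
```

Unlike the window-based sketch, every sampled window here spans all series at the same timestamps, which is what lets a multivariate model learn cross-series interactions.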

### b. Hyperparameters

Get familiar with the hyperparameters specified in the base class corresponding to your model. All classes share common hyperparameters such as `h` (horizon) and `input_size`, and optimization hyperparameters such as `learning_rate` and `max_steps`. Additionally, each base class has particular hyperparameters to control the sampling process. The following list presents these hyperparameters for each base class:

`BaseWindows`:

 * `batch_size` (bs): number of time series sampled by the loader during training.
 * `valid_batch_size` (v_bs): number of time series sampled by the loader during inference (validation and test).
 * `windows_batch_size` (w_bs): number of individual windows sampled during training (from the previous time series) to form the batch.
 * `inference_windows_batch_size` (i_bs): number of individual windows sampled during inference to form each batch. Used to control the GPU memory.

`BaseRecurrent`:
    
 * `batch_size` (bs): number of time series sampled by the loader during training. Will be directly passed to the model.
 * `input_size` (L): usually defaulted to -1, can be used to shorten the history during training (sampling timestamps, similarly to `BaseWindows`) for truncated-backpropagation.
 * `inference_input_size` (i_L): length of historical data used during inference (starting from the forecast creation date and going backwards). Used to control the GPU memory.
 

`BaseMultivariate`:
    
 * `batch_size` (bs): number of windows sampled by the class during training (loader sends all the dataset).
 * `n_series` (n_ts): number of time series in the dataset.

### c. Input and output batch shapes

The `forward` method of the base classes receives a batch of data in a dictionary with the following keys:

- `insample_y`: historic values of the time series.
- `insample_mask`: mask indicating the available values of the time series (1 if available, 0 if missing).
- `futr_exog`: future exogenous covariates (if any).
- `hist_exog`: historic exogenous covariates (if any).
- `stat_exog`: static exogenous covariates (if any).

The shape of each of these tensors depends on the base class. The following table presents the shape for each one:

| `tensor`        | `BaseWindows`            | `BaseRecurrent`                 | `BaseMultivariate`             |
|-----------------|--------------------------|---------------------------------|--------------------------------|
| `insample_y`    | (`w_bs`, `L`)            | (`bs`, `seq_len`, 1)            | (`bs`,`L`, `n_ts`)             |
| `insample_mask` | (`w_bs`, `L`)            | (`bs`, `seq_len`, 1)            | (`bs`,`L`, `n_ts`)             |
| `futr_exog`     | (`w_bs`, `L`+`h`, `n_f`) | (`bs`, `n_f`, `seq_len`, 1+`h`) | (`bs`, `n_f`, `L`+`h`, `n_ts`) |
| `hist_exog`     | (`w_bs`, `L`, `n_h`)     | (`bs`, `n_h`, `seq_len`, 1)     | (`bs`, `n_h`, `L`, `n_ts`)     |
| `stat_exog`     | (`w_bs`,`n_s`)           | (`bs`,`n_s`)                    | (`n_ts`, `n_s`)                |
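
When developing a `forward` method in isolation, it can help to build a dummy batch by hand. The sketch below follows the `BaseWindows` column of the table; the sizes and the covariate counts `n_f`, `n_h`, and `n_s` are arbitrary values chosen for illustration.

```python
import torch

w_bs, L, h = 1024, 24, 12   # windows batch size, input_size, horizon
n_f, n_h, n_s = 2, 3, 1     # illustrative counts of future, historic, and static covariates

# Dummy batch with the BaseWindows shapes from the table
windows_batch = {
    "insample_y": torch.randn(w_bs, L),
    "insample_mask": torch.ones(w_bs, L),
    "futr_exog": torch.randn(w_bs, L + h, n_f),
    "hist_exog": torch.randn(w_bs, L, n_h),
    "stat_exog": torch.randn(w_bs, n_s),
}

for name, tensor in windows_batch.items():
    print(f"{name:>14}: {tuple(tensor.shape)}")
```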

The `forward` function should return a single tensor with the forecasts of the next `h` timestamps for each window (`w_bs`) or batch (`bs`). Use the attributes of the `loss` class to automatically parse the output to the correct shape (see the `MLP` example below).  
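
As a rough illustration of why the output size depends on the loss, the sketch below sizes an output layer for a loss that needs three values per horizon step (for example, three quantiles). The value `outputsize_multiplier = 3` is an assumption made for the example, not something read from a library loss.

```python
import torch
import torch.nn as nn

h, hidden_size, w_bs = 12, 64, 32
outputsize_multiplier = 3  # assumption: three outputs (e.g. quantiles) per horizon step

# The output layer produces h * outputsize_multiplier values per window
out = nn.Linear(hidden_size, h * outputsize_multiplier)

hidden = torch.randn(w_bs, hidden_size)  # stand-in for the last hidden activations
y_pred = out(hidden)                     # shape (w_bs, h * 3)
y_pred = y_pred.reshape(w_bs, h, outputsize_multiplier)  # shape (w_bs, h, 3)
print(y_pred.shape)  # torch.Size([32, 12, 3])
```

With a point loss the multiplier would be 1, so the same code yields one value per horizon step.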


:::{.callout-tip}

Since we are using `nbdev`, you can easily add prints to the code and see the shapes of the tensors during training.

:::

## 2. Create the model file and class

### a. Model class

The next step is creating the model class. The main steps are:

1. Create the file in the `nbs` folder (https://github.com/Nixtla/neuralforecast/tree/main/nbs).
2. Add the header of the `nbdev` file.
3. Import libraries in the file. 
4. Define the `__init__` method with the inherited and particular hyperparameters of the model, and instantiate the architecture in it.
5. Define the `forward` method, which receives the input batch dictionary and returns the forecast.

First, add the following two cells at the top of the `nbdev` file, followed by a cell that imports the dependencies of the model.

```python
#| default_exp models.mlp
```

```python
#| hide
%load_ext autoreload
%autoreload 2
```

```python
#| export
from typing import Optional

import torch
import torch.nn as nn

from neuralforecast.losses.pytorch import MAE
from neuralforecast.common._base_windows import BaseWindows
```

:::{.callout-tip}

Don't forget to add the `#| export` tag on this cell.

:::

Second, create the class and its `__init__` method. The following example shows the `__init__` method of the simplified `MLP` model class.

The `loss` class contains an `outputsize_multiplier` attribute, to automatically adjust the output size of the forecast. For example, for the Multi-quantile loss (`MQLoss`), the model needs to output each quantile for each horizon.

```python
#| export
class MLP(BaseWindows): # <<---- Inherit from the BaseWindows
    def __init__(self,
                 # Inherited hyperparameters with no defaults
                 h,
                 input_size,
                 # Model specific hyperparameters
                 num_layers = 2,
                 hidden_size = 1024,
                 # Inherited hyperparameters with defaults
                 exclude_insample_y = False,
                 loss = MAE(),
                 valid_loss = None,
                 max_steps: int = 1000,
                 learning_rate: float = 1e-3,
                 num_lr_decays: int = -1,
                 early_stop_patience_steps: int = -1,
                 val_check_steps: int = 100,
                 batch_size: int = 32,
                 valid_batch_size: Optional[int] = None,
                 windows_batch_size = 1024,
                 inference_windows_batch_size = -1,
                 step_size: int = 1,
                 scaler_type: str = 'identity',
                 random_seed: int = 1,
                 num_workers_loader: int = 0,
                 drop_last_loader: bool = False,
                 **trainer_kwargs):
        # Inherit BaseWindows class
        super(MLP, self).__init__(h=h,
                                  input_size=input_size,
                                  # ... (remaining inherited hyperparameters elided)
                                  random_seed=random_seed,
                                  **trainer_kwargs)

        # Architecture
        self.num_layers = num_layers
        self.hidden_size = hidden_size

        # MultiLayer Perceptron: input layer followed by num_layers - 1 hidden layers
        layers = [nn.Linear(in_features=input_size, out_features=hidden_size)]
        layers += [nn.ReLU()]
        for i in range(num_layers - 1):
            layers += [nn.Linear(in_features=hidden_size, out_features=hidden_size)]
            layers += [nn.ReLU()]
        self.mlp = nn.ModuleList(layers)

        # Adapter with Loss dependent dimensions
        self.out = nn.Linear(in_features=hidden_size,
                             out_features=h * self.loss.outputsize_multiplier) ## <<--- Use outputsize_multiplier to adjust output size
```

:::{.callout-tip}

Don't forget to add the `#| export` tag on each cell.

:::


Finally, define the `forward` method of your model. Note how `reshape` adjusts the output of the model to the expected forecast shape defined above. **Always include `y_pred = self.loss.domain_map(y_pred)` at the end of the method**.

```python
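    #| export
    def forward(self, windows_batch):
        # Parse windows_batch
        insample_y = windows_batch['insample_y'].clone()
        batch_size = insample_y.shape[0]
        # MLP: apply each layer of the ModuleList in order, then the output adapter
        y_pred = insample_y
        for layer in self.mlp:
            y_pred = layer(y_pred)
        y_pred = self.out(y_pred)
        # Reshape and map to loss domain
        y_pred = y_pred.reshape(batch_size, self.h, self.loss.outputsize_multiplier)
        y_pred = self.loss.domain_map(y_pred)
        return y_pred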
```

:::{.callout-tip}

Larger architectures, such as Transformers, might require splitting the `forward` by using intermediate functions.

:::
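
Before wiring the model into the library, the architecture and output shapes can be sanity-checked with a standalone version. The sketch below mirrors the simplified `MLP` using a plain `nn.Module` instead of `BaseWindows`. `ToyPointLoss` and `ToyMLP` are hypothetical stand-ins written for this example, and the assumption that a point loss drops the trailing singleton dimension in `domain_map` is ours.

```python
import torch
import torch.nn as nn

class ToyPointLoss:
    """Hypothetical stand-in for a loss object (not the library class)."""
    outputsize_multiplier = 1

    def domain_map(self, y_hat):
        # Assumption: a point loss drops the trailing singleton dimension
        return y_hat.squeeze(-1)

class ToyMLP(nn.Module):
    """Plain nn.Module mirroring the simplified MLP, without BaseWindows."""
    def __init__(self, h, input_size, num_layers=2, hidden_size=64, loss=None):
        super().__init__()
        self.h = h
        self.loss = loss if loss is not None else ToyPointLoss()
        layers = [nn.Linear(input_size, hidden_size), nn.ReLU()]
        for _ in range(num_layers - 1):
            layers += [nn.Linear(hidden_size, hidden_size), nn.ReLU()]
        self.mlp = nn.Sequential(*layers)
        # Output size depends on the loss
        self.out = nn.Linear(hidden_size, h * self.loss.outputsize_multiplier)

    def forward(self, windows_batch):
        insample_y = windows_batch["insample_y"]
        batch_size = insample_y.shape[0]
        y_pred = self.out(self.mlp(insample_y))
        y_pred = y_pred.reshape(batch_size, self.h, self.loss.outputsize_multiplier)
        return self.loss.domain_map(y_pred)

model = ToyMLP(h=12, input_size=24)
batch = {"insample_y": torch.randn(8, 24)}
print(model(batch).shape)  # torch.Size([8, 12])
```

Once the shapes look right, the same layers and `forward` logic can be moved into the class that inherits from `BaseWindows`.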


### b. Tests and documentation

`nbdev` allows for testing and documenting the model during the development process. It allows users to iterate on the development within the notebook, testing the code in the same environment. Refer to existing models, such as the real MLP model [here](https://github.com/Nixtla/neuralforecast/blob/main/nbs/models.mlp.ipynb). These files already contain the tests, documentation, and usage examples that were used during the development process.

## 3. Core class and additional files

Finally, add the model to the `core` class and additional files. This process should be done after exporting the model with the `nbdev_export` command.

1. Manually add the model in the following [init file](https://github.com/Nixtla/neuralforecast/blob/main/neuralforecast/models/__init__.py).
2. Add the model to the `core` class, using the `nbdev` file [here](https://github.com/Nixtla/neuralforecast/blob/main/nbs/core.ipynb):
    
    a. Add the model to the initial model list:
    ```python
    from neuralforecast.models import (
    GRU, LSTM, RNN, TCN, DilatedRNN,
    MLP, NHITS, NBEATS, NBEATSx,
    TFT, VanillaTransformer,
    Informer, Autoformer, FEDformer,
    StemGNN, PatchTST
    )
    ```
    b. Add the model to the `MODEL_FILENAME_DICT` dictionary (used for the `save` and `load` functions).
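
The role of such a dictionary can be sketched as follows. This is an assumption-laden illustration, not a copy of the library code: it assumes the dictionary maps lowercase model names to model classes, and `MyNewModel`, the placeholder `MLP`, and `resolve_model_class` are all hypothetical. Check the actual `MODEL_FILENAME_DICT` in the `core` notebook for the real key convention.

```python
class MLP:  # placeholder standing in for the real model class
    pass

class MyNewModel:  # hypothetical new model class
    pass

# Assumed layout: lowercase model name -> model class
MODEL_FILENAME_DICT = {
    "mlp": MLP,
    "mynewmodel": MyNewModel,  # <<--- new entry for the new model
}

def resolve_model_class(saved_name):
    """Hypothetical helper: look up the class for a saved model name."""
    return MODEL_FILENAME_DICT[saved_name.lower()]

print(resolve_model_class("MyNewModel").__name__)  # MyNewModel
```

Without an entry for the new model, a lookup like this would fail with a `KeyError` when a saved model is loaded back.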

## 4. Upload to GitHub

Congratulations! After following the steps above, the model is ready to be used in the library.

Follow the steps in our contributing guide to upload the model to GitHub: [here](https://github.com/Nixtla/neuralforecast/blob/main/CONTRIBUTING.md).

One of the maintainers will review the PR, request changes if necessary, and merge it to the library.