In [None]:
#| hide
%load_ext autoreload
%autoreload 2

# Lag transformations
> Compute features based on lags

mlforecast allows you to define transformations on the lags to use as features. These are provided through the `lag_transforms` argument, which is a dict where the keys are the lags and the values are a list of transformations to apply to that lag.
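Conceptually, a transformation on lag `k` is the transformation applied to the target shifted `k` steps back. The feature at time `t` therefore only uses values observed up to `t - k`. Below is a minimal single-series sketch with plain NumPy. The `shift` and `expanding_mean` helpers are hypothetical stand-ins, not mlforecast's implementation.

```python
import numpy as np


def shift(x: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical helper: move x forward k steps, NaN-filling the first k positions."""
    out = np.full(x.shape, np.nan)
    out[k:] = x[: x.size - k]
    return out


def expanding_mean(x: np.ndarray) -> np.ndarray:
    """Hypothetical helper: running mean over the non-missing values seen so far."""
    valid = ~np.isnan(x)
    counts = np.cumsum(valid)
    sums = np.cumsum(np.where(valid, x, 0.0))
    out = np.full(x.shape, np.nan)
    seen = counts > 0
    out[seen] = sums[seen] / counts[seen]
    return out


y = np.array([1.0, 2.0, 3.0, 4.0])
lag1 = shift(y, 1)                           # [nan, 1.0, 2.0, 3.0]
expanding_mean_lag1 = expanding_mean(lag1)   # [nan, 1.0, 1.5, 2.0]
```

Roughly speaking, this is the kind of feature that an entry like `lag_transforms={1: [expanding_mean]}` describes for a single series.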

## Data setup

In [None]:
from mlforecast.utils import generate_daily_series

In [None]:
data = generate_daily_series(10)

## window-ops

The [window-ops package](https://github.com/jmoralez/window_ops) provides transformations defined as [numba](https://numba.pydata.org/) [JIT compiled](https://en.wikipedia.org/wiki/Just-in-time_compilation) functions, which you can use directly and also compose very easily. We use numba because it makes them really fast and can also bypass [Python's GIL](https://wiki.python.org/moin/GlobalInterpreterLock), which allows running them concurrently with multithreading.

In [None]:
import numpy as np
from numba import njit
from window_ops.expanding import expanding_mean
from window_ops.shift import shift_array

from mlforecast import MLForecast

In [None]:
@njit
def ratio_over_previous(x, offset=1):
    """Computes the ratio between the current value and its `offset` lag"""
    return x / shift_array(x, offset=offset)

@njit
def diff_over_previous(x, offset=1):
    """Computes the difference between the current value and its `offset` lag"""
    return x - shift_array(x, offset=offset)

If your function takes more arguments than the input array, you can provide a tuple like `(func, arg1, arg2, ...)`. The extra items are passed as positional arguments after the array.
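One way to picture how a `(func, arg1, arg2, ...)` entry is evaluated: the lagged series is passed as the first argument and the remaining tuple items follow it, i.e. `func(lagged, arg1, arg2, ...)`. The sketch below assumes that behavior. `apply_lag_transform` is a hypothetical helper written with plain NumPy, not part of mlforecast.

```python
import numpy as np


def shift(x, k):
    # NaN-fill the first k positions and move the values forward
    out = np.full(x.shape, np.nan)
    out[k:] = x[: x.size - k]
    return out


def ratio_over_previous(x, offset=1):
    # NumPy-only stand-in for the numba version defined above
    return x / shift(x, offset)


def apply_lag_transform(y, lag, tfm):
    """Hypothetical helper: shift y by `lag`, then apply a callable or a (func, *args) tuple."""
    if isinstance(tfm, tuple):
        func, *args = tfm
    else:
        func, args = tfm, []
    return func(shift(y, lag), *args)


y = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
plain = apply_lag_transform(y, 1, ratio_over_previous)           # lag1 / lag2
with_arg = apply_lag_transform(y, 1, (ratio_over_previous, 2))   # lag1 / lag3
```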

In [None]:
fcst = MLForecast(
    models=[],
    freq='D',
    lags=[1, 2, 3],
    lag_transforms={
        1: [expanding_mean, ratio_over_previous, (ratio_over_previous, 2)],  # the second ratio sets offset=2
        2: [diff_over_previous],
    },
)
prep = fcst.preprocess(data)
prep.head(2)

| | unique_id | ds | y | lag1 | lag2 | lag3 | expanding_mean_lag1 | ratio_over_previous_lag1 | ratio_over_previous_lag1_offset2 | diff_over_previous_lag2 |
|---|---|---|---|---|---|---|---|---|---|---|
| 3 | id_0 | 2000-01-04 | 3.481831 | 2.445887 | 1.218794 | 0.322947 | 1.329209 | 2.006809 | 7.573645 | 0.895847 |
| 4 | id_0 | 2000-01-05 | 4.191721 | 3.481831 | 2.445887 | 1.218794 | 1.867365 | 1.423546 | 2.856785 | 1.227093 |


As you can see, the name of the function is used as the transformation name, followed by a `_lag{n}` suffix, where `n` is the lag. If the function has other arguments and they're not set to their default values, they're included as well, as is done with `offset=2` here (`ratio_over_previous_lag1_offset2`).
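The naming rule above can be reproduced with `inspect.signature`. Start from `{function name}_lag{lag}` and append `_{argument}{value}` for every extra argument that differs from its default. This is a hypothetical sketch of the pattern visible in the column names, not mlforecast's actual code.

```python
import inspect


def ratio_over_previous(x, offset=1):
    return x  # placeholder body: only the signature is inspected here


def feature_name(lag, tfm):
    """Hypothetical helper that mimics the column names produced by `preprocess`."""
    if isinstance(tfm, tuple):
        func, *args = tfm
    else:
        func, args = tfm, []
    name = f"{func.__name__}_lag{lag}"
    # Skip the first parameter (the input array) and pair the rest with the given args
    params = list(inspect.signature(func).parameters.values())[1:]
    for param, value in zip(params, args):
        if value != param.default:
            name += f"_{param.name}{value}"
    return name


print(feature_name(1, ratio_over_previous))       # ratio_over_previous_lag1
print(feature_name(1, (ratio_over_previous, 2)))  # ratio_over_previous_lag1_offset2
```

Passing an argument equal to its default (e.g. `(ratio_over_previous, 1)`) produces the same name as omitting it.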

In [None]:
np.testing.assert_allclose(prep['lag1'] / prep['lag2'], prep['ratio_over_previous_lag1'])
np.testing.assert_allclose(prep['lag1'] / prep['lag3'], prep['ratio_over_previous_lag1_offset2'])
np.testing.assert_allclose(prep['lag2'] - prep['lag3'], prep['diff_over_previous_lag2'])

## Built-in transformations (experimental)

The built-in lag transformations are in the `mlforecast.lag_transforms` module. This module is experimental and requires the `coreforecast` package, which you can install with `pip install coreforecast` or `pip install "mlforecast[lag_transforms]"`. If you're using conda, please install it with `conda install -c conda-forge coreforecast` instead.

The main benefit of these transformations is that, since they're defined as classes, they carry more information about the transformation being applied and can therefore apply it more efficiently. For example, updating a rolling mean only requires the last `window_size` values, whereas the window-ops functions have to re-apply the transformation on the full history. Another benefit is that the multithreading is done over the series instead of over the transformations, which can help when the transformations are very different from one another. Also, the multithreading is done in C++, so there's no risk of getting blocked by the GIL.
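To see why keeping the transformation as an object helps, here is a NumPy-only sketch of a rolling mean for a single series. Its `transform`/`update` methods loosely mirror the interface used later on this page. `update` only reads the last `window_size` values, while recomputing from scratch touches the full history. The class is hypothetical and much simpler than coreforecast's implementation: one series, no lag, no `min_samples`.

```python
import numpy as np


class RollingMeanSketch:
    """Hypothetical single-series rolling mean (no lag, full windows only)."""

    def __init__(self, window_size: int):
        self.window_size = window_size

    def transform(self, x: np.ndarray) -> np.ndarray:
        # Full pass over the history: NaN until the first complete window
        w = self.window_size
        out = np.full(x.shape, np.nan)
        csum = np.cumsum(np.insert(x, 0, 0.0))
        out[w - 1:] = (csum[w:] - csum[:-w]) / w
        return out

    def update(self, x: np.ndarray) -> float:
        # Only the last `window_size` values are needed for the newest value
        return float(x[-self.window_size:].mean())


history = np.arange(1.0, 11.0)           # 1, 2, ..., 10
tfm = RollingMeanSketch(window_size=3)
full = tfm.transform(history)            # last value is mean(8, 9, 10)
extended = np.append(history, 11.0)      # a new observation arrives
new_value = tfm.update(extended)         # reads 3 values instead of 11
```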

In [None]:
from mlforecast.lag_transforms import RollingMean, ExpandingStd

In [None]:
fcst = MLForecast(
    models=[],
    freq='D',
    lag_transforms={
        1: [ExpandingStd()],
        7: [RollingMean(window_size=7, min_samples=1), RollingMean(window_size=14)]
    },
)

Once you define your transformations you can see what they look like with `MLForecast.preprocess`.

In [None]:
fcst.preprocess(data).head(2)

| | unique_id | ds | y | expanding_std_lag1 | rolling_mean_lag7_window_size7_min_samples1 | rolling_mean_lag7_window_size14 |
|---|---|---|---|---|---|---|
| 20 | id_0 | 2000-01-21 | 6.319961 | 1.956363 | 3.234486 | 3.283064 |
| 21 | id_0 | 2000-01-22 | 0.071677 | 2.028545 | 3.256055 | 3.291068 |
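The output starts at row 20 because `preprocess` drops, by default, the rows where some feature is still missing. Suppose `min_samples` defaults to `window_size`, which is an assumption consistent with the row indices shown here. Then a rolling feature on lag `k` first becomes available at position `k + min_samples - 1` of each series, and the feature that needs the most history decides the first kept row. The expanding standard deviation on lag 1 becomes available within the first few rows, so it doesn't change the result. A quick check of that arithmetic for the first series (`id_0`), whose row labels coincide with positions, using a hypothetical helper:

```python
def first_valid_position(lag: int, min_samples: int = 1) -> int:
    """Hypothetical helper: first 0-based position in a series where a window
    over lag `lag` has seen at least `min_samples` observations."""
    # The lagged series starts at position `lag`; the window then needs
    # `min_samples - 1` more observations before it produces a value.
    return lag + min_samples - 1


# Rolling features from the example above (min_samples assumed to default to window_size)
positions = {
    "rolling_mean_lag7_window_size7_min_samples1": first_valid_position(7, min_samples=1),
    "rolling_mean_lag7_window_size14": first_valid_position(7, min_samples=14),
}
first_kept_row = max(positions.values())
print(first_kept_row)  # 20
```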


### Extending the built-in transformations

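You can compose these transformations by defining a new class that implements the `transform` and `update` methods, which operate on a coreforecast `GroupedArray`. Consider the following example, which computes the ratio between two rolling means: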

In [None]:
import coreforecast.lag_transforms as core_tfms
from coreforecast.grouped_array import GroupedArray

In [None]:
class RollingMeansRatioCore:
    """Ratio between two rolling means computed over the same lag."""

    def __init__(self, lag: int, window_one: int, window_two: int):
        self.lag = lag
        self.window_one = window_one
        self.window_two = window_two

    def transform(self, ga: GroupedArray) -> np.ndarray:
        # Build the core rolling means and keep them as attributes so `update` can reuse them
        self.tfm1 = core_tfms.RollingMean(self.lag, self.window_one)
        self.tfm2 = core_tfms.RollingMean(self.lag, self.window_two)
        return self.tfm1.transform(ga) / self.tfm2.transform(ga)

    def update(self, ga: GroupedArray) -> np.ndarray:
        # Compute the next value of each rolling mean and take their ratio
        return self.tfm1.update(ga) / self.tfm2.update(ga)

To keep the mlforecast API for lag transforms, where the lag is the key of the dict, we have to wrap this transformation in another one. We hope to deprecate this in the future so that you only need to define the previous class. The wrapper class needs to implement the `_set_core_tfm` method, which takes the lag, sets the `_core_tfm` attribute to a transformation like the one defined above, and returns `self`.
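The wrapper exists because the lag is only known once the transformation is placed under a key of `lag_transforms`. Below is a plain-Python sketch of how such a dict could be resolved into core transformations. The names `CoreSketch`, `WrapperSketch` and `resolve` are hypothetical, and mlforecast's actual internals may differ, e.g. in whether and how instances are copied.

```python
import copy


class CoreSketch:
    """Stands in for a core transformation that needs the lag at construction time."""

    def __init__(self, lag: int, window_size: int):
        self.lag = lag
        self.window_size = window_size


class WrapperSketch:
    """Stands in for a BaseLagTransform subclass: stores only lag-independent arguments."""

    def __init__(self, window_size: int):
        self.window_size = window_size

    def _set_core_tfm(self, lag: int):
        self._core_tfm = CoreSketch(lag, self.window_size)
        return self


def resolve(lag_transforms):
    """Hypothetical: pair every wrapper with the lag taken from its dict key."""
    cores = []
    for lag, tfms in lag_transforms.items():
        for tfm in tfms:
            # Copy first so the same wrapper instance can appear under several lags
            cores.append(copy.deepcopy(tfm)._set_core_tfm(lag)._core_tfm)
    return cores


shared = WrapperSketch(window_size=7)
cores = resolve({1: [shared], 7: [shared]})
```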

In [None]:
from mlforecast.lag_transforms import BaseLagTransform

In [None]:
class RollingMeansRatio(BaseLagTransform):
    def __init__(self, window_one: int, window_two: int):
        self.window_one = window_one
        self.window_two = window_two

    def _set_core_tfm(self, lag: int):
        self._core_tfm = RollingMeansRatioCore(lag, self.window_one, self.window_two)
        return self

In [None]:
fcst = MLForecast(
    models=[],
    freq='D',
    lag_transforms={
        1: [
            RollingMean(window_size=7),
            RollingMean(window_size=14),
            RollingMeansRatio(window_one=7, window_two=14)
        ],
    },
)
prep = fcst.preprocess(data)
prep.head(2)

| | unique_id | ds | y | rolling_mean_lag1_window_size7 | rolling_mean_lag1_window_size14 | rolling_means_ratio_lag1_window_one7_window_two14 |
|---|---|---|---|---|---|---|
| 14 | id_0 | 2000-01-15 | 0.435006 | 3.234486 | 3.283064 | 0.985204 |
| 15 | id_0 | 2000-01-16 | 1.489309 | 3.256055 | 3.291068 | 0.989361 |


In [None]:
np.testing.assert_allclose(
    prep['rolling_mean_lag1_window_size7'] / prep['rolling_mean_lag1_window_size14'],
    prep['rolling_means_ratio_lag1_window_one7_window_two14']
)

In [None]:
#| hide
from sklearn.linear_model import LinearRegression

In [None]:
#| hide
fcst = MLForecast(
    models=[LinearRegression()],
    freq='D',
    lag_transforms={
        1: [
            RollingMean(window_size=7),
            RollingMean(window_size=14),
            RollingMeansRatio(window_one=7, window_two=14)
        ],
    },
)
fcst.fit(data)
fcst.predict(2);