<a target="_blank" href="https://colab.research.google.com/github/bettercodepaul/nixtla_intro_workshop/blob/main/Introduction_to_MLForecast.ipynb">
    <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# Introduction to Forecasting with Nixtla's mlforecast

This notebook walks you through the very basics of forecasting time series with Nixtla's mlforecast.

## Install and import necessary libraries

We use [Polars](https://docs.pola.rs/) for data wrangling, [Plotly](https://plotly.com/python/plotly-express/) for visualizations and Nixtla's [mlforecast](https://nixtlaverse.nixtla.io/mlforecast/index.html) for time series forecasting with machine learning algorithms.

In [None]:
%pip install -q mlforecast lightgbm polars plotly scikit-learn==1.5.2

In [None]:
import lightgbm as lgb
import polars as pl
import plotly.express as px
from datetime import date
from mlforecast import MLForecast
from mlforecast.lag_transforms import ExpandingMean, RollingMean
from mlforecast.target_transforms import Differences
from utilsforecast.plotting import plot_series

pl.Config(tbl_rows=20)

In [None]:
import warnings
warnings.filterwarnings("ignore")

## Initial Exploration of the data

The data for this walkthrough is monthly sales data for various countries and products.

In [None]:
Y_df = pl.read_parquet("https://github.com/bettercodepaul/nixtla_intro_workshop/raw/refs/heads/main/retail_sales_product_level.parquet")
Y_df.sample(5)

We can visualize the time series with Plotly. Aggregated to the country level, the sales volumes are the same as in the introductory notebook.

In [None]:
# aggregate the product-level sales to one series per country
plot_series(
    Y_df.group_by(pl.col("country").alias("unique_id"), "ds").agg(pl.col("y").sum()).sort("ds"),
    ids=["Deutschland", "Frankreich", "Italien", "Grossbritannien", "Japan", "USA"],
    engine="plotly",
)

However, this dataset also contains sales on the product level. For example, we can visualize the sales of all products of type *elegant* and segment *big*, split by product version.

In [None]:
product_type = "elegant" # elegant or comfortable
product_segment = "big" # small, medium or big
df_plot = (
    Y_df
    .filter(pl.col("type").eq(product_type) & pl.col("segment").eq(product_segment))
    .group_by("ds", "version")
    .agg(pl.col("y").sum())
    .sort("ds")
)
px.line(df_plot, x="ds", y="y", color="version")

## Local Models vs Global Models

Classic time series models such as ARIMA, exponential smoothing and GARCH operate on a single time series. They learn from the past of that series to forecast its future.

![Local Models](https://github.com/bettercodepaul/nixtla_intro_workshop/blob/main/images/Local%20Models.png?raw=true)

Global models, on the other hand, learn from many time series at once. Relationships and patterns learned from one time series can be transferred to other time series.

Caution: Just because a model is global doesn't mean that the model treats all time series uniformly. It is therefore possible that the global model in our example handles Spain differently than Italy.

Global models often also have an advantage in addressing the so-called cold start problem. This is when a new time series needs to be predicted, but no prior data is available for it.

![Global Models](https://github.com/bettercodepaul/nixtla_intro_workshop/blob/main/images/Global%20Models.png?raw=true)

## Transforming the data to be suitable for Machine Learning

To feed the data into a machine learning algorithm, we first need to transform it, because these algorithms expect tabular input: each row consists of a set of features and a target value.

![Machine Learning Models](https://github.com/bettercodepaul/nixtla_intro_workshop/blob/main/images/Machine%20Learning%20Models.png?raw=true)

A classic feature for time series is a lag, i.e. the value the series had at an earlier time step. For example, the lag 1 feature is the value of the target one time step before.
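To make the idea concrete, here is a minimal pure-Python sketch of how lag features can be derived from a single monthly series. The helper `make_lag_features` is hypothetical (not part of mlforecast) and only mirrors what the `lag1` and `lag12` columns produced by `MLForecast(lags=[1, 12])` contain; positions without enough history are `None`, similar to the nulls you see with `dropna=False`.

```python
def make_lag_features(values, lags):
    """Return {"lag<k>": [...]} where entry t holds values[t - k], or None if t < k.

    Hypothetical helper for illustration only; it is not part of mlforecast.
    """
    features = {}
    for k in lags:
        features[f"lag{k}"] = [values[t - k] if t >= k else None for t in range(len(values))]
    return features


# 14 months of made-up sales numbers for one series
y = [10, 12, 13, 15, 14, 16, 18, 17, 19, 21, 20, 22, 25, 27]
feats = make_lag_features(y, lags=[1, 12])
print(feats["lag1"][:3])    # [None, 10, 12]
print(feats["lag12"][12:])  # [10, 12]
```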

In [None]:
models = [
    lgb.LGBMRegressor(random_state=0, verbosity=-1),
]
fcst = MLForecast(
    models=models,
    freq='1mo',
    lags=[1, 12], # create lags for previous month and same month last year
)

In [None]:
# check what the preprocessed data looks like
fcst.preprocess(Y_df.select("unique_id", "ds", "y"), dropna=False).filter(pl.col("unique_id").eq("Japan-big-elegant-3")).sort("ds").head(14)

In [None]:
# Another useful feature is the month so that the model is able to capture the seasonality
fcst = MLForecast(
    models=models,
    freq='1mo',
    lags=[1, 12], # create lags for previous month and same month last year
    date_features=['month'], # create a feature for the month
)
fcst.preprocess(Y_df.select("unique_id", "ds", "y"), dropna=False).filter(pl.col("unique_id").eq("Japan-big-elegant-3")).sort("ds").head(4)
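The `month` date feature is simply the calendar month of `ds`. The following plain-Python sketch builds one feature row per time step combining `lag1` and `month`, roughly like the rows shown above. The helper `feature_rows` is hypothetical and not mlforecast's implementation.

```python
from datetime import date


def feature_rows(dates, values):
    """Build one feature row per time step: lag1 and the calendar month of ds.

    Hypothetical helper mirroring the lag1/month columns of fcst.preprocess.
    """
    rows = []
    for t, (ds, y) in enumerate(zip(dates, values)):
        rows.append({
            "ds": ds,
            "y": y,
            "lag1": values[t - 1] if t >= 1 else None,  # no history for the first row
            "month": ds.month,  # 1 = January, ..., 12 = December
        })
    return rows


dates = [date(2023, 11, 1), date(2023, 12, 1), date(2024, 1, 1)]
rows = feature_rows(dates, [5, 9, 4])  # made-up values
print(rows[2])  # {'ds': datetime.date(2024, 1, 1), 'y': 4, 'lag1': 9, 'month': 1}
```

Because the month is known for any future date, this feature can be computed directly for every forecast step, unlike the lags.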

In [None]:
# rolling and expanding means help to capture the trend of a time series without the model having to reconstruct that from a lot of lags
fcst = MLForecast(
    models=models,
    freq='1mo',
    lags=[1, 12], # create lags for previous month and same month last year
    lag_transforms={
        1: [ExpandingMean(), RollingMean(window_size=4, min_samples=1)],
        12: [ExpandingMean()]
    },
    date_features=['month'], # create a feature for the month
)
features = fcst.preprocess(Y_df.select("unique_id", "ds", "y"), dropna=False).filter(pl.col("unique_id").eq("Japan-big-elegant-3")).sort("ds")
features.tail(4)
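The lag transforms are computed on the lagged series, so the current value of `y` never leaks into its own features. A minimal pure-Python sketch of `expanding_mean_lag1` and `rolling_mean_lag1_window_size4_min_samples1`, assuming these semantics; the helpers below are hypothetical and not mlforecast's code.

```python
def expanding_mean_lag(values, lag):
    """Expanding mean of the lagged series: mean of values[0..t-lag], None if t < lag."""
    out = []
    for t in range(len(values)):
        end = t - lag  # last index visible at time t
        out.append(sum(values[: end + 1]) / (end + 1) if end >= 0 else None)
    return out


def rolling_mean_lag(values, lag, window_size, min_samples):
    """Rolling mean over the window_size values ending at index t - lag.

    Returns None while fewer than min_samples values are available.
    """
    out = []
    for t in range(len(values)):
        end = t - lag
        window = values[max(0, end - window_size + 1) : end + 1] if end >= 0 else []
        out.append(sum(window) / len(window) if window and len(window) >= min_samples else None)
    return out


y = [2, 4, 6, 8, 10, 12]  # made-up monthly values
print(expanding_mean_lag(y, lag=1))                              # [None, 2.0, 3.0, 4.0, 5.0, 6.0]
print(rolling_mean_lag(y, lag=1, window_size=4, min_samples=1))  # [None, 2.0, 3.0, 4.0, 5.0, 7.0]
```

The two means only diverge once more than four values are available: the rolling mean forgets old values and follows the recent level, while the expanding mean keeps the whole history.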

In [None]:
# visualizing the different features helps a lot
px.line(features, x="ds", y=["y", "lag1", "lag12", "expanding_mean_lag1", "rolling_mean_lag1_window_size4_min_samples1", "expanding_mean_lag12"])

## Making a prediction with a recursive one-step-ahead forecaster

Before we can forecast, we fit the model by calling the fit method. This trains a one-step-ahead model, i.e. a model that predicts the value of the next time step from the features shown above.

In [None]:
# now that we have prepared the features, we can fit a model
fcst.fit(Y_df.select("unique_id", "ds", "y"), dropna=False)

The predict method then produces the actual forecasts. It works recursively: the forecast for the first step is used to compute the features (e.g. `lag1`) for the second step, and so on.
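The recursion can be sketched in a few lines of plain Python. `recursive_forecast` and `toy_model` are hypothetical stand-ins (the toy model plays the role of the fitted `LGBMRegressor` and only uses `lag1`); the point is that every prediction is fed back as the next step's lag.

```python
def recursive_forecast(model, history, h):
    """Forecast h steps by feeding each prediction back in as the next lag-1 feature.

    model: any callable mapping the lag-1 value to the next value (assumption).
    """
    last = history[-1]
    forecasts = []
    for _ in range(h):
        nxt = model(last)  # one-step-ahead prediction
        forecasts.append(nxt)
        last = nxt  # the prediction becomes the next step's lag1
    return forecasts


def toy_model(lag1):
    # made-up one-step model standing in for the fitted regressor
    return 0.5 * lag1 + 10


print(recursive_forecast(toy_model, history=[8, 12, 16], h=3))  # [18.0, 19.0, 19.5]
```

Any error in the first step propagates into all later steps, which is one of the downsides of the recursive approach.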

In [None]:
# and make a prediction for one year ahead
predictions = fcst.predict(12)
predictions.sample(5)

In [None]:
# note: the model forecasts every series, including those that ended in the past and are no longer of interest!
fig = plot_series(Y_df, predictions, max_ids=4, plot_random=False)
fig

In [None]:
# beware of cross-validation in such a case!
# the cutoffs depend on where each series ends, so series that ended early
# are validated on months that lie inside the training window of the other series
# THIS IS FUTURE LEAKAGE AT ITS FINEST!
cv_result = fcst.cross_validation(
    Y_df.select("unique_id", "ds", "y"),
    n_windows=4,  # number of models to train/splits to perform
    h=12,  # length of the validation set in each window
)
cv_result.group_by("unique_id").agg(pl.col("cutoff").unique(), pl.col("cutoff").n_unique().alias("n_cutoff")).sample(5)
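Why does this leak? The sketch below assumes, as the differing cutoffs in the output above suggest, that the cutoffs of each window are placed relative to each series' own last date, with a step size equal to `h`. The helpers and the two series are hypothetical; this is not mlforecast's code.

```python
from datetime import date


def shift_months(d, months):
    """Move a first-of-month date by a (possibly negative) number of months."""
    idx = d.year * 12 + (d.month - 1) + months
    return date(idx // 12, idx % 12 + 1, 1)


def per_series_cutoffs(last_dates, n_windows, h):
    """Cutoffs relative to each series' own last date (assumed behaviour, step size = h)."""
    return {
        uid: [shift_months(last, -(n_windows - i) * h) for i in range(n_windows)]
        for uid, last in last_dates.items()
    }


# hypothetical series: one still sold, one discontinued in mid-2021
last_dates = {"active": date(2024, 12, 1), "discontinued": date(2021, 6, 1)}
cutoffs = per_series_cutoffs(last_dates, n_windows=2, h=12)
print(cutoffs["discontinued"])  # [datetime.date(2019, 6, 1), datetime.date(2020, 6, 1)]
print(cutoffs["active"])        # [datetime.date(2022, 12, 1), datetime.date(2023, 12, 1)]
```

Under this assumption, in the first window the discontinued series is validated on July 2019 to June 2020, while the active series contributes training data up to December 2022, so the model has already seen data from after the validation period.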

To avoid this, we can pad the time series with zeros before they start and after they end, so that all series cover the same date range.

In [None]:
px.line(Y_df.filter(pl.col("unique_id").eq("Italien-small-elegant-2")), x="ds", y="y", title="Before filling with 0")

In [None]:
# adjust the time series to cover the complete range
from utilsforecast.preprocessing import fill_gaps
Y_df_filled = fill_gaps(Y_df, freq="1mo", start="global", end="global").with_columns(pl.col("y").fill_null(0))
px.line(Y_df_filled.filter(pl.col("unique_id").eq("Italien-small-elegant-2")), x="ds", y="y", title="After filling with 0")

In [None]:
# note: the other columns would have to be adjusted as well, because fill_gaps leaves them empty for the added rows:
# it knows neither the static features (country, segment, ...) nor how to calculate the non-static ones (e.g. months_till_eol, months_till_start)
Y_df_filled.filter(pl.col("unique_id").eq("Italien-small-elegant-2")).head(10)
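Static columns are the easy part: they can be copied from any original row of the same series. A minimal pure-Python sketch; `fill_static_columns` is a hypothetical helper and the rows are made up. Dynamic features such as `months_till_eol` would need an actual calculation and are not covered here.

```python
def fill_static_columns(rows, static_cols, id_col="unique_id"):
    """Copy static attributes (e.g. country, segment) into padded rows of the same series.

    rows: list of dicts; padded rows carry None in the static columns.
    """
    known = {}
    for r in rows:
        for c in static_cols:
            if r[c] is not None:
                known.setdefault((r[id_col], c), r[c])  # first non-null value per series
    return [{**r, **{c: known.get((r[id_col], c)) for c in static_cols}} for r in rows]


rows = [
    {"unique_id": "Italien-small-elegant-2", "ds": "2019-01", "y": 0, "country": None},  # padded row
    {"unique_id": "Italien-small-elegant-2", "ds": "2019-02", "y": 7, "country": "Italien"},
]
filled = fill_static_columns(rows, ["country"])
print(filled[0]["country"])  # Italien
```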

In [None]:
cv_result = fcst.cross_validation(
    Y_df_filled.select("unique_id", "ds", "y"),
    n_windows=4,  # number of models to train/splits to perform
    h=12,  # length of the validation set in each window
)
cv_result.group_by("unique_id").agg(pl.col("cutoff").unique(), pl.col("cutoff").n_unique().alias("n_cutoff")).sample(5)

## Hands-on

Filling with zeros avoids the future leakage, but it is problematic in its own right. What have we done to the distribution of the training and test data? Is it now easier or harder for the model to predict `y`?

In [None]:
# room for your thoughts or analysis

- Try some feature engineering to improve the model
- Play around with different lags and expanding/rolling means
- What is the lowest RMSE you can achieve?
- If you prefer, have a look at the take-home assignments instead

In [None]:
# copy and modify the code from above that creates the forecasting object fcst = MLForecast(...)

In [None]:
from utilsforecast.losses import rmse, mae, mape, bias

cv_result = fcst.cross_validation(
    Y_df.select("unique_id", "ds", "y"),
    n_windows=4,  # number of models to train/splits to perform
    h=12,  # length of the validation set in each window
)
rmse(cv_result, models=['LGBMRegressor'], id_col='cutoff').select(pl.col("LGBMRegressor").mean())
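The score above is the RMSE, sqrt(mean((y - ŷ)²)), computed separately for each cutoff (because `id_col='cutoff'`) and then averaged over the windows. A pure-Python sketch of that computation; `rmse_per_cutoff` is hypothetical and the rows with placeholder cutoffs `w1`/`w2` are made up.

```python
from collections import defaultdict
from math import sqrt


def rmse_per_cutoff(rows, model):
    """rows: dicts with 'cutoff', 'y' and a prediction column named `model`."""
    sq_errors = defaultdict(list)
    for r in rows:
        sq_errors[r["cutoff"]].append((r["y"] - r[model]) ** 2)
    return {c: sqrt(sum(e) / len(e)) for c, e in sq_errors.items()}


rows = [
    {"cutoff": "w1", "y": 10, "LGBMRegressor": 12},
    {"cutoff": "w1", "y": 5, "LGBMRegressor": 3},
    {"cutoff": "w2", "y": 8, "LGBMRegressor": 8},
    {"cutoff": "w2", "y": 4, "LGBMRegressor": 0},
]
scores = rmse_per_cutoff(rows, "LGBMRegressor")  # w1 -> 2.0, w2 -> sqrt(8), about 2.83
mean_rmse = sum(scores.values()) / len(scores)    # about 2.41
print(scores, mean_rmse)
```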

## Take-home assignments


You can explore various topics we could not cover today!

- Add the static features for segment and type.
- Add features that cover the lifecycle of the products (months since market introduction, months until end of lifecycle). See https://nixtlaverse.nixtla.io/mlforecast/docs/how-to-guides/exogenous_features.html
- Add features that calculate the trend of the segment and the type (difficult with Nixtla!)
- How could you give the model a hint regarding the level of a new time series (cold-start problem)?
- Add features that cover predecessors of the product to be forecasted (which also helps for the cold-start problem)
- The recursive approach has several downsides, for example errors from early steps propagate into later ones. Train one model per horizon, which often yields better forecasts, see https://nixtlaverse.nixtla.io/mlforecast/docs/how-to-guides/one_model_per_horizon.html and https://medium.com/data-science/the-perils-of-recursive-forecasting-82ebd218d147
- Implement a proper rolling cross-validation
    - pad the beginning of each time series by the maximum forecast horizon (e.g. 12 time steps for a 12-month horizon)
    - implement the cross-validation to only forecast time series that are included in the validation window