<a target="_blank" href="https://colab.research.google.com/github/thierrymoudiki/sktime/blob/main/examples/nnetsauce/2024-11-13-nnetsauce-MTS-example.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

**You can beat Forecasting LLMs with `nnetsauce.MTS` Pt.2**



In this post, I benchmark [`nnetsauce.MTS`](https://www.researchgate.net/publication/382589729_Probabilistic_Forecasting_with_nnetsauce_using_Density_Estimation_Bayesian_inference_Conformal_prediction_and_Vine_copulas)'s _armada_ of base models against foundation models ("LLMs": Amazon's [_Chronos_](https://openreview.net/pdf?id=gerNCVqqtR) and IBM's [_TinyTimeMixer_](https://arxiv.org/pdf/2401.03955)) and _statistical_ models. I used the LLMs out of the box (_plug and play_), so if you think I'm not using them properly, do not hesitate to reach out.  

The _armada_ is [now](https://thierrymoudiki.github.io/blog/2024/11/24/r/python/forecasting/nnetsauce/nnetsauce-sktime-LLM) made of Generic Gradient Boosters (see https://www.researchgate.net/publication/386212136_Scalable_Gradient_Boosting_using_Randomized_Neural_Networks).
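
To give an idea of what "generic" means here, below is a minimal sketch of least-squares boosting with an interchangeable base learner: each new learner is fit to the residuals left by the previous ones. It is an illustration under my own assumptions, not `mlsauce`'s implementation, and it leaves out whatever the linked paper adds on top. The names `RidgeLearner`, `boost` and `boost_predict` are hypothetical, and the data is simulated.

```python
import numpy as np


class RidgeLearner:
    """Tiny ridge regression with intercept; stands in for any fit/predict base learner."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, X, y):
        # Center the data so the intercept is simply the mean of y
        self.x_mean_, self.y_mean_ = X.mean(axis=0), y.mean()
        Xc = X - self.x_mean_
        A = Xc.T @ Xc + self.alpha * np.eye(X.shape[1])
        self.coef_ = np.linalg.solve(A, Xc.T @ (y - self.y_mean_))
        return self

    def predict(self, X):
        return self.y_mean_ + (X - self.x_mean_) @ self.coef_


def boost(make_learner, X, y, n_estimators=50, learning_rate=0.1):
    """Least-squares boosting: each new learner is fit to the current residuals."""
    baseline = y.mean()
    residuals = y - baseline
    learners = []
    for _ in range(n_estimators):
        learner = make_learner().fit(X, residuals)
        # Only a fraction of each learner's fit is removed from the residuals
        residuals = residuals - learning_rate * learner.predict(X)
        learners.append(learner)
    return baseline, learners


def boost_predict(baseline, learners, X, learning_rate=0.1):
    """Sum the shrunken contributions of all learners on top of the baseline."""
    return baseline + learning_rate * sum(learner.predict(X) for learner in learners)


# Simulated regression problem (illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=200)

baseline, learners = boost(RidgeLearner, X, y)
fitted = boost_predict(baseline, learners, X)
mse_baseline = np.mean((y - baseline) ** 2)
mse_boosted = np.mean((y - fitted) ** 2)
print(mse_baseline, mse_boosted)
```

Any object exposing `fit` and `predict` could be passed instead of `RidgeLearner`, which is the sense in which the booster is generic.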

**Contents**

- [0 - Install `nnetsauce` and `mlsauce`](#0---install-nnetsauce-and-mlsauce)
- [1 - Example 1: using `nnetsauce` with sktime](#1---example-1-using-nnetsauce-with-sktime)
    - [1 - 1 Point forecast with `nnetsauce`'s `sktime` interface](#1---1-point-forecast-with-nnetsauces-sktime-interface)
    - [1 - 2 Probabilistic forecasting with `nnetsauce`'s `sktime` interface](#1---2-probabilistic-forecasting-with-nnetsauces-sktime-interface)
- [2 - sktime foundation models and nnetsauce](#2---sktime-foundation-models-and-nnetsauce)
    - [2 - 1 - Example1 on macroeconomic data](#2---1---example1-on-macroeconomic-data)
    - [2 - 2 - Example2 on antidiabetic drug sales](#2---2---example2-on-antidiabetic-drug-sales)


# 0 - Install `nnetsauce` and `mlsauce`

In [1]:
!pip install git+https://github.com/Techtonique/mlsauce.git --verbose

Using pip 24.2 from /Users/t/Documents/Python_Packages/nnetsauce/venv/lib/python3.11/site-packages/pip (python 3.11)
Collecting git+https://github.com/Techtonique/mlsauce.git
  Cloning https://github.com/Techtonique/mlsauce.git to /private/var/folders/cp/q8d6040n3m38d22z3hkk1zc40000gn/T/pip-req-build-uwkmyubc
  Running command git version
  git version 2.39.3 (Apple Git-145)
  Running command git clone --filter=blob:none https://github.com/Techtonique/mlsauce.git /private/var/folders/cp/q8d6040n3m38d22z3hkk1zc40000gn/T/pip-req-build-uwkmyubc
  Cloning into '/private/var/folders/cp/q8d6040n3m38d22z3hkk1zc40000gn/T/pip-req-build-uwkmyubc'...

In [2]:
!pip install nnetsauce


[notice] A new release of pip is available: 24.2 -> 24.3.1
[notice] To update, run: pip install --upgrade pip


In [3]:
!pip install git+https://github.com/thierrymoudiki/sktime.git --upgrade --no-cache-dir


In [None]:
import numpy as np
import pandas as pd
import statsmodels.api as sm

from sklearn import linear_model
from statsmodels.tsa.base.datetools import dates_from_str
from sktime.forecasting.nnetsaucemts import NnetsauceMTS

Macroeconomic data

In [None]:
# some example data
mdata = sm.datasets.macrodata.load_pandas().data
# prepare the dates index
dates = mdata[["year", "quarter"]].astype(int).astype(str)
quarterly = dates["year"] + "Q" + dates["quarter"]
quarterly = dates_from_str(quarterly)
mdata = mdata[["realgovt", "tbilrate", "cpi"]]
mdata.index = pd.DatetimeIndex(quarterly)
data = np.log(mdata).diff().dropna()
data2 = mdata

n = data.shape[0]
max_idx_train = int(np.floor(n * 0.9))  # integer cut point for positional indexing
training_index = np.arange(0, max_idx_train)
testing_index = np.arange(max_idx_train, n)
df_train = data.iloc[training_index, :]
print(df_train.tail())
df_test = data.iloc[testing_index, :]
print(df_test.head())

            realgovt  tbilrate       cpi
2003-06-30  0.047086 -0.171850  0.002726
2003-09-30  0.000981 -0.021053  0.006511
2003-12-31  0.007267 -0.043485  0.007543
2004-03-31  0.012745  0.043485  0.005887
2004-06-30  0.005669  0.252496  0.009031
            realgovt  tbilrate       cpi
2004-09-30  0.017200  0.297960  0.008950
2004-12-31 -0.012387  0.299877  0.005227
2005-03-31  0.004160  0.201084  0.010374
2005-06-30  0.000966  0.112399  0.004633
2005-09-30  0.023120  0.156521  0.022849
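
The split above keeps the first 90% of the observations for training and the remaining 10% for testing, without shuffling. Here is a minimal sketch of the same idea as a reusable function. The name `chronological_split` and the toy data are mine (hypothetical), not part of `nnetsauce` or `sktime`.

```python
import numpy as np
import pandas as pd


def chronological_split(df, train_fraction=0.9):
    """Split a time-indexed frame into an early training part and a late test part."""
    # Cast to int so the cut can be used directly as a positional slice bound
    cut = int(np.floor(len(df) * train_fraction))
    return df.iloc[:cut], df.iloc[cut:]


# Toy daily series (made-up values, for illustration only)
toy = pd.DataFrame(
    {"x": np.arange(10.0)},
    index=pd.date_range("2000-01-01", periods=10, freq="D"),
)
toy_train, toy_test = chronological_split(toy)
print(len(toy_train), len(toy_test))
```

Slicing by position keeps every training date strictly before every test date, which is what a forecasting benchmark needs.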


In [None]:
n2 = data.shape[0]  # length taken from `data`, which is shorter than `data2`
max_idx_train2 = int(np.floor(n2 * 0.9))
training_index2 = np.arange(0, max_idx_train2)
testing_index2 = np.arange(max_idx_train2, n2)
df_train2 = data2.iloc[training_index2, :]
print(df_train2.tail())
df_test2 = data2.iloc[testing_index2, :]
print(df_test2.head())

            realgovt  tbilrate    cpi
2003-03-31   800.196      1.14  183.2
2003-06-30   838.775      0.96  183.7
2003-09-30   839.598      0.94  184.9
2003-12-31   845.722      0.90  186.3
2004-03-31   856.570      0.94  187.4


# 1 - sktime foundation models and nnetsauce

In [5]:
import numpy as np

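def rmse(predictions, targets):
    # Root mean squared error over all series and horizons
    errors = np.asarray(predictions) - np.asarray(targets)
    return np.sqrt(np.mean(errors ** 2))

def mae(predictions, targets):
    # Mean absolute error; values are compared by position, not by index label
    return np.mean(np.abs(np.asarray(predictions) - np.asarray(targets)))

def me(predictions, targets):
    # Mean error (bias); the sign depends on the argument order
    return np.mean(np.asarray(predictions) - np.asarray(targets))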
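
To make the three metrics above concrete, here is a small hand-checkable sketch on made-up numbers (not taken from the benchmark). It redefines the functions on plain NumPy arrays so that it runs on its own. Note that `rmse` and `mae` do not depend on the argument order, whereas the sign of `me` does.

```python
import numpy as np


def rmse(predictions, targets):
    errors = np.asarray(predictions) - np.asarray(targets)
    return float(np.sqrt(np.mean(errors ** 2)))


def mae(predictions, targets):
    return float(np.mean(np.abs(np.asarray(predictions) - np.asarray(targets))))


def me(predictions, targets):
    return float(np.mean(np.asarray(predictions) - np.asarray(targets)))


# Toy values: the errors (pred - obs) are -1, 0 and -2
pred = np.array([1.0, 2.0, 3.0])
obs = np.array([2.0, 2.0, 5.0])

print(rmse(pred, obs))  # sqrt(5/3)
print(mae(pred, obs))   # 1.0
print(me(pred, obs))    # -1.0
print(me(obs, pred))    # 1.0, swapping the arguments flips the sign
```

So when reading a table of mean errors, it helps to check which argument was passed first.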

### 1 - 1 - Example1 on macroeconomic data with generic booster

In [6]:
import nnetsauce as ns
import mlsauce as ms
from sktime.forecasting.ttm import TinyTimeMixerForecaster
from sktime.forecasting.chronos import ChronosForecaster

# Initialise models
chronos = ChronosForecaster("amazon/chronos-t5-tiny")
ttm = TinyTimeMixerForecaster()
regr = linear_model.RidgeCV()
obj_MTS = NnetsauceMTS(regr, lags = 20, n_hidden_features=7, n_clusters=2,
                       type_pi="scp2-block-bootstrap",
                       kernel="gaussian",
                       replications=250)
regr2 = ms.GenericBoostingRegressor(regr, verbose=0)
obj_MTS2 = ns.MTS(obj=regr2)

# Fit
h = df_test.shape[0] + 1
chronos.fit(y=df_train, fh=range(1, h))
ttm.fit(y=df_train, fh=range(1, h))
obj_MTS.fit(y=df_train, fh=range(1, h))
obj_MTS2.fit(df_train)

# Predict
pred_chronos = chronos.predict(fh=[i for i in range(1, h)])
pred_ttm = ttm.predict(fh=[i for i in range(1, h)])
pred_MTS = obj_MTS.predict(fh=[i for i in range(1, h)])
pred_MTS2 = obj_MTS2.predict(h=h-1)

ModuleNotFoundError: No module named 'jax'
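
The `ModuleNotFoundError` above means that `jax` could not be imported in the environment where the cell ran. One way to catch this before launching the benchmark is a quick check like the sketch below, which only uses the standard library. The helper name `missing_modules` is hypothetical, and the list of modules is simply taken from the imports and the error message in this post.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


# Module names taken from the imports and the error message in this post
required = ["jax", "nnetsauce", "mlsauce", "sktime"]
print("Missing:", missing_modules(required))
```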

In [None]:
from sklearn.utils import all_estimators
from sklearn.base import RegressorMixin
from tqdm import tqdm

results = []

results.append(["Chronos", rmse(df_test, pred_chronos), mae(df_test, pred_chronos), me(df_test, pred_chronos)])
results.append(["TinyTimeMixer", rmse(df_test, pred_ttm), mae(df_test, pred_ttm), me(df_test, pred_ttm)])
results.append(["NnetsauceMTS", rmse(df_test, pred_MTS), mae(df_test, pred_MTS), me(df_test, pred_MTS)])

# statistical models
for name in ["ARIMA", "ETS", "Theta", "VAR", "VECM"]:
  try:
    regr = ns.ClassicalMTS(model=name)
    regr.fit(df_train)
    X_pred = regr.predict(h=df_test.shape[0])
    results.append([name, rmse(df_test, X_pred.mean), mae(df_test, X_pred.mean), me(df_test, X_pred.mean)])
  except Exception:
    pass

for est in tqdm(all_estimators()):
  if (issubclass(est[1], RegressorMixin)):
    try:
      preds = ns.MTS(ms.GenericBoostingRegressor(est[1](), verbose=0), show_progress=False).\
      fit(df_train).\
      predict(h=df_test.shape[0])
      results.append([est[0], rmse(df_test, preds), mae(df_test, preds), me(df_test, preds)])
    except Exception:
      pass


results_df = pd.DataFrame(results, columns=["model", "rmse", "mae", "me"])

# Show five decimal places in the result tables
pd.options.display.float_format = '{:.5f}'.format

display(results_df.sort_values(by="rmse"))

display(results_df.sort_values(by="mae"))

display(results_df.sort_values(by="me"))


NameError: name 'rmse' is not defined
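
A note on the last table of the cell above: sorting by the signed `me` puts the most negative bias first, not the smallest one. If the goal is to rank models from least to most biased, sorting on the absolute value may be more useful. Here is a minimal sketch on made-up scores for three hypothetical models (these are not results from this benchmark).

```python
import pandas as pd

# Made-up scores for three hypothetical models (not results from this benchmark)
toy_results = pd.DataFrame(
    [
        ["model_a", 0.30, 0.25, -0.20],
        ["model_b", 0.20, 0.15, 0.01],
        ["model_c", 0.25, 0.20, 0.10],
    ],
    columns=["model", "rmse", "mae", "me"],
)

# Sorting on the signed mean error puts the most negative bias first
by_signed_me = toy_results.sort_values(by="me")

# Sorting on its absolute value puts the least biased model first
by_abs_me = toy_results.sort_values(by="me", key=lambda s: s.abs())

print(by_signed_me["model"].tolist())  # ['model_a', 'model_b', 'model_c']
print(by_abs_me["model"].tolist())     # ['model_b', 'model_c', 'model_a']
```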

### 1 - 2 - Example2 on antidiabetic drug sales with generic booster

In [None]:
from sklearn.utils import all_estimators
from sklearn.base import RegressorMixin
from tqdm import tqdm

results = []

# LLMs and sktime
results.append(["Chronos", rmse(df_test, pred_chronos), mae(df_test, pred_chronos), me(df_test, pred_chronos)])
results.append(["TinyTimeMixer", rmse(df_test, pred_ttm), mae(df_test, pred_ttm), me(df_test, pred_ttm)])
results.append(["NnetsauceMTS", rmse(df_test, pred_MTS), mae(df_test, pred_MTS), me(df_test, pred_MTS)])

# statistical models
for name in ["ARIMA", "ETS", "Theta", "VAR", "VECM"]:
  try:
    regr = ns.ClassicalMTS(model=name)
    regr.fit(df_train)
    X_pred = regr.predict(h=df_test.shape[0])
    results.append([name, rmse(df_test, X_pred.mean), mae(df_test, X_pred.mean), me(df_test, X_pred.mean)])
  except Exception:
    pass

for est in tqdm(all_estimators()):
  if (issubclass(est[1], RegressorMixin)):
    try:
      preds = ns.MTS(ms.GenericBoostingRegressor(est[1](), verbose=0), lags=20, verbose=0, show_progress=False).\
      fit(df_train).\
      predict(h=df_test.shape[0])
      results.append([est[0], rmse(df_test, preds), mae(df_test, preds), me(df_test, preds)])
    except Exception:
      pass

results_df = pd.DataFrame(results, columns=["model", "rmse", "mae", "me"])




In [None]:
# Show five decimal places in the result tables
pd.options.display.float_format = '{:.5f}'.format

display(results_df.sort_values(by="rmse"))

display(results_df.sort_values(by="mae"))

display(results_df.sort_values(by="me"))
