# Direct multi-step forecaster

## Description

`ForecasterAutoregDirect` predicts several steps into the future using the direct strategy: a separate model is trained for each step to be predicted.

https://skforecast.org/0.13.0/user_guides/direct-multi-step-forecasting

Until now, the forecaster's `steps` argument, which sets the maximum forecast horizon, could only be an `int`. With this development, `steps` will also accept an iterable, so the user can train only the models they need.
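One way to support both input types is to normalize `steps` into a sorted list of 1-based integers at initialization. The sketch below is only an illustration: `normalize_steps` is a hypothetical helper, not part of skforecast, and the validation rules (no duplicates kept, no steps below 1) are assumptions.

```python
from numbers import Integral


def normalize_steps(steps):
    """Return the sorted list of 1-based steps to train.

    Hypothetical helper (not skforecast API): an int n expands to 1..n,
    an iterable is validated, de-duplicated and sorted.
    """
    # Plain integer: keep the current behaviour, one model per step 1..n.
    if isinstance(steps, Integral) and not isinstance(steps, bool):
        if steps < 1:
            raise ValueError("`steps` must be greater than or equal to 1.")
        return list(range(1, steps + 1))

    # Anything else must be iterable (list, range, numpy array...).
    try:
        values = list(steps)
    except TypeError:
        raise TypeError("`steps` must be an int or an iterable of ints.") from None

    if not values:
        raise ValueError("`steps` must contain at least one step.")
    for s in values:
        if isinstance(s, bool) or not isinstance(s, Integral) or s < 1:
            raise ValueError(f"Invalid step {s!r}: steps must be integers >= 1.")

    return sorted({int(s) for s in values})


print(normalize_steps(5))            # [1, 2, 3, 4, 5]
print(normalize_steps(range(3, 6)))  # [3, 4, 5]
print(normalize_steps([5, 1]))       # [1, 5]
```

Storing the normalized list would let `fit` loop over exactly those steps when building `regressors_`.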

## TODOs and checks

- `steps` can be an iterable (list, range, numpy array...) in addition to an `int`. For example, `steps=range(3, 6)`, `steps=[1, 5]`...

- `regressors_` would contain only the models for those steps, e.g. `{1: Ridge(), 5: Ridge()}`. The remaining steps must not appear in this dictionary.

- `predict` already accepts `steps` as a non-contiguous list. If it is not specified, all steps for which a model exists are returned.

- If a prediction is requested for a step that has no model, an error is raised.

- Carry this over to all prediction methods of the forecaster.

- Create tests for `steps` as a non-contiguous list. Test the methods `init`, `fit`, `predict`, `set_params`, `get_feature_importances`...

## Next steps (once everything above is OK)

- Check how this integrates with the `model_selection` functions.

## Libraries

In [2]:
# Libraries
# ==============================================================================
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from skforecast.ForecasterAutoregDirect import ForecasterAutoregDirect
from skforecast.datasets import fetch_dataset
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

## Data

In [5]:
# Download data
# ==============================================================================
data = fetch_dataset(
    name="h2o", raw=True, kwargs_read_csv={"names": ["y", "datetime"], "header": 0}
)

# Data preprocessing
# ==============================================================================
data['datetime'] = pd.to_datetime(data['datetime'], format='%Y-%m-%d')
data = data.set_index('datetime')
data = data.asfreq('MS')
data = data.sort_index()
data.head(2)

h2o
---
Monthly expenditure ($AUD) on corticosteroid drugs that the Australian health
system had between 1991 and 2008.
Hyndman R (2023). fpp3: Data for Forecasting: Principles and Practice(3rd
Edition). http://pkg.robjhyndman.com/fpp3package/,https://github.com/robjhyndman
/fpp3package, http://OTexts.com/fpp3.
Shape of the dataset: (204, 2)


datetime,y
1991-07-01,0.429795
1991-08-01,0.400906


In [8]:
# Create and fit forecaster
# ==============================================================================
forecaster = ForecasterAutoregDirect(
                 regressor     = Ridge(),
                 steps         = 5,
                 lags          = 3,
                 transformer_y = None,
                 n_jobs        = 'auto'
             )

forecaster.fit(y=data['y'])
forecaster

ForecasterAutoregDirect 
Regressor: Ridge() 
Lags: [1 2 3] 
Transformer for y: None 
Transformer for exog: None 
Weight function included: False 
Window size: 3 
Maximum steps predicted: 5 
Exogenous included: False 
Exogenous variables names: None 
Training range: [Timestamp('1991-07-01 00:00:00'), Timestamp('2008-06-01 00:00:00')] 
Training index type: DatetimeIndex 
Training index frequency: MS 
Regressor parameters: {'alpha': 1.0, 'copy_X': True, 'fit_intercept': True, 'max_iter': None, 'positive': False, 'random_state': None, 'solver': 'auto', 'tol': 0.0001} 
fit_kwargs: {} 
Creation date: 2024-09-16 18:29:33 
Last fit date: 2024-09-16 18:29:33 
Skforecast version: 0.13.0 
Python version: 3.11.5 
Forecaster id: None 

The `regressor` attribute simply stores a copy of the base regressor, although it is not used.

In [10]:
forecaster.regressor

The `regressors_` dict contains the individual models for each step.

In [9]:
forecaster.regressors_

{1: Ridge(), 2: Ridge(), 3: Ridge(), 4: Ridge(), 5: Ridge()}

The idea is that the `steps` argument could be an iterable when initializing the forecaster. For example (this code currently fails):

In [None]:
# Create forecaster (this cell fails)
# ==============================================================================
forecaster = ForecasterAutoregDirect(
                 regressor     = Ridge(),
                 steps         = [1, 5],
                 lags          = 3,
                 transformer_y = None,
                 n_jobs        = 'auto'
             )

This would train only the models for steps 1 and 5, and `regressors_` would contain only those models: `{1: Ridge(), 5: Ridge()}`.
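To make the intended behaviour concrete, here is a minimal NumPy-only sketch of the direct strategy restricted to a subset of steps. It is not the skforecast implementation: `fit_direct_models` is a hypothetical helper, a closed-form ridge solution stands in for scikit-learn's `Ridge`, and the synthetic series is for illustration only.

```python
import numpy as np


def fit_direct_models(y, lags, steps, alpha=1.0):
    """Fit one linear ridge model per requested step (illustrative helper)."""
    y = np.asarray(y, dtype=float)
    steps = sorted(set(steps))
    # Every row needs `lags` past values and a target up to max(steps) ahead.
    n_rows = len(y) - lags - max(steps) + 1
    if n_rows < 1:
        raise ValueError("Series too short for the requested lags and steps.")

    # Column k-1 holds lag k: the value observed k periods before the first
    # forecast period of each row.
    X = np.column_stack(
        [y[lags - k : lags - k + n_rows] for k in range(1, lags + 1)]
    )
    x_mean = X.mean(axis=0)
    Xc = X - x_mean
    gram = Xc.T @ Xc + alpha * np.eye(lags)

    models = {}
    for h in steps:
        # Target for step h: the value h periods after the last observed lag.
        target = y[lags + h - 1 : lags + h - 1 + n_rows]
        coef = np.linalg.solve(gram, Xc.T @ (target - target.mean()))
        intercept = target.mean() - x_mean @ coef
        models[h] = (coef, intercept)
    return models


rng = np.random.default_rng(0)
series = np.sin(np.arange(60) / 3.0) + 0.1 * rng.standard_normal(60)
models = fit_direct_models(series, lags=3, steps=[1, 5])
print(sorted(models))  # [1, 5]
```

Only the requested steps appear as keys, which mirrors the proposed content of `regressors_`; steps 2, 3 and 4 are never fitted.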

## Prediction

When predicting, every requested step must be less than or equal to the `steps` value defined when initializing the forecaster. Steps are numbered starting at 1.

+ If `int`, only steps 1 through that value are predicted.

+ If a `list` of `int`, only the steps contained in the list are predicted.

+ If `None`, all the steps defined at initialization are predicted.
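The three cases can be sketched as a small resolution function that also raises an error when a requested step has no trained model. `resolve_predict_steps` is a hypothetical helper, not skforecast code. How an `int` should behave on a forecaster trained with non-contiguous steps is still an open design choice; this sketch assumes it expands to 1..`int` and fails if any of those steps is missing.

```python
def resolve_predict_steps(steps, fitted_steps):
    """Return the sorted steps to predict, checking each one has a model.

    Hypothetical helper: `fitted_steps` is any iterable of the steps with a
    trained model, for example the keys of `regressors_`.
    """
    fitted = sorted(fitted_steps)

    if steps is None:
        # No selection: predict every step that has a model.
        return fitted

    if isinstance(steps, int) and not isinstance(steps, bool):
        # Assumption: an int n means steps 1..n.
        requested = list(range(1, steps + 1))
    else:
        requested = sorted(set(steps))

    missing = [s for s in requested if s not in fitted]
    if missing:
        raise ValueError(
            f"No model was trained for steps {missing}. "
            f"Available steps: {fitted}."
        )
    return requested


regressors_ = {1: "model_1", 5: "model_5"}  # placeholder models
print(resolve_predict_steps(None, regressors_))    # [1, 5]
print(resolve_predict_steps([5, 1], regressors_))  # [1, 5]
print(resolve_predict_steps(3, range(1, 6)))       # [1, 2, 3]
```

With `regressors_ = {1: ..., 5: ...}`, a call such as `resolve_predict_steps([2], regressors_)` raises `ValueError`, which matches the check listed in the TODOs.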

In [4]:
# Predict
# ==============================================================================
# Predict only a subset of steps
predictions = forecaster.predict(steps=[1, 5])
display(predictions)

2005-07-01    0.952051
2005-11-01    1.179922
Name: pred, dtype: float64

In [5]:
# Predict all steps defined in the initialization.
predictions = forecaster.predict()
display(predictions.head(3))

2005-07-01    0.952051
2005-08-01    1.004145
2005-09-01    1.114590
Name: pred, dtype: float64