A [time series](https://en.wikipedia.org/wiki/Time_series) is a succession of chronologically ordered data spaced at equal or unequal intervals. The [forecasting](https://en.wikipedia.org/wiki/Forecasting#Time_series_methods) process consists of predicting the future value of a time series, either by modeling the series solely based on its past behavior (autoregressive) or by using other external variables.

This document describes how to use Scikit-learn regression models to perform forecasting on time series. Specifically, it introduces [Skforecast](https://joaquinamatrodrigo.github.io/skforecast/0.4.2/index.html), a simple library that contains the classes and functions necessary to adapt any Scikit-learn regression model to forecasting problems. 

### Multi-Step Time Series Forecasting

When working with time series, the goal is often to predict not only the next element in the series ($t+1$) but an entire future interval or a point far ahead in time ($t+n$). Each prediction jump is known as a step.

There are several strategies that allow generating this type of multiple prediction.


#### Recursive multi-step forecasting

Predicting the value at time $t_n$ requires the value at $t_{n-1}$, which is not yet known, so predictions must be made recursively: each new prediction is used as a predictor for the next one. This process is known as recursive forecasting or recursive multi-step forecasting.
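The recursion can be sketched in a few lines of plain Python. This is a minimal illustration, not Skforecast's implementation. `recursive_forecast` and `extrapolate` are hypothetical names, and `extrapolate` stands in for a fitted one-step model. With a trained Scikit-learn regressor, it could be replaced by something like `lambda w: model.predict([w])[0]`, provided the lags are ordered the same way as during training.

```python
from typing import Callable, List, Sequence


def recursive_forecast(
    history: Sequence[float],
    predict_next: Callable[[List[float]], float],
    n_lags: int,
    steps: int,
) -> List[float]:
    """Forecast `steps` values ahead, feeding each prediction back in as a lag."""
    window = list(history[-n_lags:])  # most recent observed values, oldest first
    predictions = []
    for _ in range(steps):
        y_hat = predict_next(window)    # one-step-ahead prediction
        predictions.append(y_hat)
        window = window[1:] + [y_hat]   # drop the oldest lag, append the prediction
    return predictions


# Hypothetical one-step "model": linear extrapolation from the last two lags.
def extrapolate(window: List[float]) -> float:
    return 2 * window[-1] - window[-2]


print(recursive_forecast([1.0, 2.0, 3.0], extrapolate, n_lags=2, steps=3))
# → [4.0, 5.0, 6.0]
```

Because every step after the first consumes earlier predictions, any error made at step 1 propagates to all later steps. This is the main trade-off of the recursive strategy.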

The main adaptation needed to apply Scikit-learn models to recursive multi-step forecasting problems is to transform the time series into a matrix in which each value is associated with the time window (lags) that precedes it. This forecasting strategy can be implemented easily with the `ForecasterAutoreg` and `ForecasterAutoregCustom` classes of the Skforecast library.

*Figure: Transformation of a time series into a matrix of 5 lags and a vector containing the value of the series that follows each row of the matrix.*
This type of transformation also allows exogenous variables to be included as additional predictors alongside the lags.
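Here is a minimal NumPy sketch of this transformation, using a hypothetical helper `create_lag_matrix`. Skforecast builds these matrices internally, and this is not its code. Column 0 holds lag 1 (the most recent value). If exogenous variables are passed, their values at the target's time step are appended as extra columns. That alignment is an assumption made for illustration.

```python
import numpy as np


def create_lag_matrix(y, n_lags, exog=None):
    """Return (X, target), where row i of X holds the n_lags values preceding target[i].

    Column 0 is lag 1 (most recent value); column n_lags - 1 is lag n_lags.
    If exog is given, its values at the target's time step are appended.
    """
    y = np.asarray(y, dtype=float)
    n_rows = len(y) - n_lags
    if n_rows <= 0:
        raise ValueError("The series must be longer than n_lags.")
    # Lag k for target y[t] is y[t - k]; build one column per lag.
    X = np.column_stack([y[n_lags - k : len(y) - k] for k in range(1, n_lags + 1)])
    target = y[n_lags:]
    if exog is not None:
        # One column per exogenous variable, aligned with the target time steps.
        exog = np.asarray(exog, dtype=float).reshape(len(y), -1)
        X = np.hstack([X, exog[n_lags:]])
    return X, target


series = np.arange(10)
X, target = create_lag_matrix(series, n_lags=5)
print(X.shape)            # → (5, 5)
print(X[0], target[0])    # → [4. 3. 2. 1. 0.] 5.0
```

Any Scikit-learn regressor can then be fitted with `fit(X, target)`, because the problem has been reduced to ordinary tabular regression.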

#### Direct multi-step forecasting

The direct multi-step forecasting method consists of training a different model for each step. For example, predicting the next 5 values of a time series requires training 5 different models, one per step. As a result, the predictions are independent of each other.

The main complexity of this approach is generating the correct training matrices for each model. The `ForecasterAutoregMultiOutput` class of the Skforecast library automates this process. Keep in mind that this strategy has a higher computational cost, since it requires training multiple models. The following diagram shows the process for a case in which the response variable and two exogenous variables are available.
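The direct strategy can be sketched with one `LinearRegression` per horizon. `direct_forecast` is a hypothetical helper written for illustration, not the way `ForecasterAutoregMultiOutput` is implemented. Windows are stored oldest value first, and `make_model` can be any callable that returns a fresh Scikit-learn regressor.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def direct_forecast(y, n_lags, steps, make_model=LinearRegression):
    """Train one model per horizon h = 1..steps and predict from the last window."""
    y = np.asarray(y, dtype=float)
    # Only windows that have all `steps` future values available can be used.
    n_rows = len(y) - n_lags - steps + 1
    if n_rows <= 0:
        raise ValueError("Series too short for the requested n_lags and steps.")
    # Row i holds y[i], ..., y[i + n_lags - 1] (oldest first).
    X = np.array([y[i : i + n_lags] for i in range(n_rows)])
    last_window = y[-n_lags:].reshape(1, -1)
    predictions = []
    for h in range(1, steps + 1):
        # Target for horizon h: the value h steps after the end of each window.
        target = y[n_lags + h - 1 : n_lags + h - 1 + n_rows]
        model = make_model().fit(X, target)  # a separate model per step
        predictions.append(model.predict(last_window)[0])
    return np.array(predictions)


print(direct_forecast(2 * np.arange(30) + 1.0, n_lags=3, steps=3))
# values very close to 61, 63 and 65 for this perfectly linear series
```

All the models share the same input matrix and differ only in their target. No prediction is ever fed back as an input, which is what makes the forecasts independent of each other.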

#### Multiple output forecasting

Certain models are capable of simultaneously predicting several values of a sequence (one-shot). An example of a model with this capability is the LSTM neural network.
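LSTM networks are outside Scikit-learn, but some Scikit-learn estimators accept a two-dimensional target and fit all outputs with a single model. The hypothetical helper below uses `RandomForestRegressor` as a stand-in for a one-shot model and predicts every horizon in a single call. For `LinearRegression`, a multi-column target is equivalent to fitting an independent regression per column. A random forest, by contrast, grows each tree on all outputs jointly.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def multi_output_forecast(y, n_lags, steps, random_state=0):
    """Predict the next `steps` values with one multi-output model (assumes steps >= 2)."""
    y = np.asarray(y, dtype=float)
    n_rows = len(y) - n_lags - steps + 1
    if n_rows <= 0:
        raise ValueError("Series too short for the requested n_lags and steps.")
    # Row i of X is a window of n_lags values (oldest first).
    X = np.array([y[i : i + n_lags] for i in range(n_rows)])
    # Row i of Y holds the `steps` values that follow window i, one column per horizon.
    Y = np.array([y[i + n_lags : i + n_lags + steps] for i in range(n_rows)])
    model = RandomForestRegressor(n_estimators=100, random_state=random_state)
    model.fit(X, Y)  # a single forest fitted on all horizons at once
    return model.predict(y[-n_lags:].reshape(1, -1))[0]


series = np.sin(np.arange(60) / 5.0)
print(multi_output_forecast(series, n_lags=6, steps=3))  # three values, one per step
```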


### Recursive autoregressive forecasting

The example uses a time series of the monthly expenditure (in millions of dollars) on corticosteroid drugs by the Australian health system between 1991 and 2008. The goal is to build an autoregressive model capable of predicting future monthly expenditure.

Load Modules:

```python
# Data manipulation
# ==============================================================================
import numpy as np
import pandas as pd

# Plots
# ==============================================================================
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
plt.rcParams['lines.linewidth'] = 1.5
%matplotlib inline

# Warnings configuration
# ==============================================================================
import warnings
warnings.filterwarnings('ignore')
```

In addition to these libraries, the analysis uses Skforecast, which contains the classes and functions needed to adapt any Scikit-learn regression model to forecasting problems. It can be installed in any of the following ways:

```bash
pip install skforecast
```

A specific version:

```bash
pip install skforecast==0.3
```

Latest development version (unstable):

```bash
pip install git+https://github.com/JoaquinAmatRodrigo/skforecast@master
```
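If an import of Skforecast fails with `ModuleNotFoundError`, check which version, if any, is installed. The documentation linked above is for version 0.4.2, and module and class names have changed in later releases. The sketch below uses only the standard library (`importlib.metadata`, Python 3.8+). `installed_version` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


v = installed_version("skforecast")
print(v if v is not None else "skforecast is not installed; run: pip install skforecast")
```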

```python
# Modeling and Forecasting
# ==============================================================================
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import Lasso
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Import paths as in the skforecast 0.4.x documentation linked above;
# later releases renamed or moved some of these classes.
from skforecast.ForecasterAutoreg import ForecasterAutoreg
from skforecast.ForecasterAutoregCustom import ForecasterAutoregCustom
from skforecast.ForecasterAutoregMultiOutput import ForecasterAutoregMultiOutput
from skforecast.model_selection import grid_search_forecaster
from skforecast.model_selection import backtesting_forecaster
```