<a href="https://colab.research.google.com/github/acedesci/scanalytics/blob/master/EN/S06_Time-Series_Analytics/S06_Simple_TS_using_darts.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# S06 Simple time-series forecast and evaluation using darts

### Introduction

Time series demand forecasting is crucial for effective supply chain management. With accurate demand forecasts, businesses can optimize inventory, improve demand planning, enhance sales forecasting, and mitigate supply chain risks.

With recent developments in the Python open-source ecosystem, there are many easy-to-use packages such as [statsmodels](https://www.statsmodels.org/stable/index.html) (mainly for statistical techniques), [Prophet (by Facebook)](https://facebook.github.io/prophet/), and [GluonTS (by Amazon)](https://ts.gluon.ai/stable/), among many others. Most libraries require a specific (but similar) input data format and workflow. Thus, the most important step in using such packages is preparing the data in the right format.

There are also open-source libraries that are built on top of many time-series forecasting packages and provide a common interface to many time-series algorithms, such as [sktime](https://www.sktime.net/en/latest/index.html) and [darts](https://unit8co.github.io/darts/). These interfaces greatly simplify forecasting through an intuitive API and a diverse range of models, from classical statistical methods to modern machine learning approaches, including deep-learning-based models.

In this notebook, we provide a simple walkthrough of time-series analysis using darts. There is also a demo that provides more comprehensive pipelines of the forecasting process.

First of all, `darts` must be installed since it is not included in Colab by default.

In [None]:
# This installs the full version of darts.
# If it causes an error mentioning "numpy", please click "Runtime -> Restart session" and then run the notebook again
!pip install darts
!pip install statsforecast

# Install missing dependencies
!pip install pytorch-lightning

# this one is another installation option that might work
# !pip install "u8darts[torch]"

### **Step 1**: Load data

`darts` works with its own object called `TimeSeries`. It is very similar to a pandas `DataFrame` holding time-series data, and we can simply pass a time series in `Series` or `DataFrame` format to create the `TimeSeries` object. **IMPORTANT**: The index of the Series or DataFrame must be in `datetime` format before the conversion (e.g., by using the function `pd.to_datetime(...)`).

In [None]:
# simple Python code for time-series using darts

import pandas as pd
import darts

data = pd.read_csv('https://bit.ly/m5simple', index_col='ds')
data.index = pd.to_datetime(data.index)

y_timeseries = darts.TimeSeries.from_series(data['y'])

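As a minimal sketch of the index requirement above, the following uses a small made-up DataFrame (the column names `ds` and `y` mirror the dataset, but the dates and values are invented for illustration) to show how an index of date strings becomes a `DatetimeIndex` with `pd.to_datetime(...)`.

```python
import pandas as pd

# Hypothetical toy data: weekly dates stored as strings, as they typically are after read_csv
toy = pd.DataFrame(
    {
        "ds": ["2020-01-05", "2020-01-12", "2020-01-19", "2020-01-26"],
        "y": [10.0, 12.0, 9.0, 11.0],
    }
).set_index("ds")

print(type(toy.index).__name__)  # index type before the conversion

# Convert the index so that it holds timestamps instead of strings
toy.index = pd.to_datetime(toy.index)

print(type(toy.index).__name__)  # index type after the conversion
print(pd.infer_freq(toy.index))  # spacing between observations inferred by pandas
```

Checking the inferred frequency is a quick way to spot missing or irregular dates before passing the data to a forecasting library.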
### **Step 2**: Train/test split

We then split the time-series data into a training set and a test set. The test set contains the last 52 data points, whereas the training set contains all the data from the beginning up to the period preceding the test set.

In [None]:
test_n_points = 52

start = len(data)-test_n_points
train, test = y_timeseries.split_before(start)

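To make the split logic concrete, here is a rough pandas-only sketch on an invented 10-point series. It assumes, as we understand from the `darts` documentation, that `split_before(start)` with an integer puts the observation at position `start` into the second part. The names `toy`, `toy_train`, and `toy_test` are hypothetical.

```python
import pandas as pd

# Hypothetical toy series with 10 weekly observations
idx = pd.date_range("2021-01-03", periods=10, freq="W")
toy = pd.Series(range(10), index=idx)

test_n_points = 3
start = len(toy) - test_n_points  # position of the first test observation

toy_train = toy.iloc[:start]  # everything before position `start`
toy_test = toy.iloc[start:]   # position `start` and everything after it

print(len(toy_train), len(toy_test))
# The training set must end strictly before the test set begins
print(toy_train.index.max() < toy_test.index.min())
```

Keeping the test set at the end of the series, rather than sampling it randomly, preserves the time order that forecasting relies on.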
### **Step 3**: Train the model

Similar to `sklearn`, we can simply create a model object and fit the data to the model. In the following block, we provide examples of different time-series models and you can choose one to try. Then the function `.fit(...)` is used to fit the data to the model (training).

The list of the models supported by `darts` is available here:
https://unit8co.github.io/darts/generated_api/darts.models.forecasting.html

In [None]:
from darts.models import ExponentialSmoothing, AutoARIMA, Theta, Prophet, Croston

model = ExponentialSmoothing()
# model = AutoARIMA()
# model = Theta()
# model = Prophet()
# model = Croston()

# Uncomment below if you want to try the deep learning N-BEATS model (there could be a dependency issue)
# from darts.models import NBEATSModel
# model = NBEATSModel(input_chunk_length=52, output_chunk_length=52, n_epochs=50)

# Uncomment below if you want to try LightGBMModel
# from darts.models import LightGBMModel
# model = LightGBMModel(lags=52)

# LightGBM is a gradient boosting framework that uses tree-based learning algorithms.
# When using LightGBM with darts, extra keyword arguments given to LightGBMModel
# (e.g., 'n_estimators', 'learning_rate', 'num_leaves') are passed on to the LightGBM regressor.
# For time series, it's crucial to also provide 'lags' to specify which past values of the series
# should be used as features.

model.fit(train)

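To give a feel for what fitting and predicting mean for a model like this, below is a deliberately simplified sketch of simple exponential smoothing in plain Python. It is not what `darts` runs: the `ExponentialSmoothing` model in `darts` is more elaborate (it can also model trend and seasonality), and the function names and the value of `alpha` here are assumptions for illustration only.

```python
def simple_exp_smoothing(values, alpha=0.3):
    """Return the smoothed level after one pass over `values`."""
    level = values[0]  # initialise the level with the first observation
    for y in values[1:]:
        # Blend the new observation with the previous level
        level = alpha * y + (1 - alpha) * level
    return level


def predict(level, n):
    """Forecasts from simple exponential smoothing are flat: repeat the level n times."""
    return [level] * n


history = [10.0, 12.0, 11.0, 13.0]  # invented values
level = simple_exp_smoothing(history, alpha=0.5)
print(predict(level, 3))
```

A larger `alpha` makes the level react faster to recent observations, while a smaller one averages over a longer history.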
### **Step 4**: Create predictions for the period of the test set.

We can then generate predictions for the following 52 periods, which correspond to the periods in the test set, in order to measure the quality of the forecasts on an out-of-sample basis (using test data that has not been included in the training set). We can also convert the output into a pandas Series (by using `.to_series()`) or DataFrame (by using `.to_dataframe()`).

In [None]:
forecast = model.predict(len(test))

forecast.to_series()

### **Step 5**: Measure the forecasting errors.

We measure the results using five different error measures, i.e., Mean Absolute Percentage Error (MAPE) using the function `mape()`, Weighted Mean Absolute Percentage Error (WMAPE) using the function `wmape()`, Root Mean Squared Scaled Error (RMSSE) using the function `rmsse()`, Root Mean Squared Error (RMSE) using the function `rmse()`, and Mean Error (ME) using the function `merr()`.

In [None]:
m_mape = darts.metrics.mape(test, forecast)
m_wmape = darts.metrics.wmape(test, forecast)
m_rmsse = darts.metrics.rmsse(test, forecast, insample = train)
m_rmse = darts.metrics.rmse(test, forecast)
m_merr = darts.metrics.merr(test, forecast)


print(f"The model obtains Mean Absolute Percentage Error: {m_mape:.2f}%")
print(f"The model obtains Weighted Mean Absolute Percentage Error: {m_wmape:.2f}%")
print(f"The model obtains Root Mean Squared Scaled Error: {m_rmsse:.2f}")
print(f"The model obtains Root Mean Squared Error: {m_rmse:.2f}")
print(f"The model obtains Mean Error: {m_merr:.2f}")

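The five measures above can be sketched in plain Python as follows. These are the common textbook definitions written as hypothetical helper functions, not the `darts` implementation; in particular, the sign convention of the mean error (actual minus forecast) and the use of a one-step naive forecast (`m=1`) for scaling are assumptions, and the numbers are invented.

```python
import math


def mape(actual, pred):
    # Mean of |error / actual| in percent (undefined if any actual value is 0)
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, pred)) / len(actual)


def wmape(actual, pred):
    # Total absolute error divided by total absolute actuals, in percent
    return 100 * sum(abs(a - p) for a, p in zip(actual, pred)) / sum(abs(a) for a in actual)


def rmse(actual, pred):
    # Square root of the mean squared error
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, pred)) / len(actual))


def merr(actual, pred):
    # Signed mean error: positive means the forecast is too low on average
    return sum(a - p for a, p in zip(actual, pred)) / len(actual)


def rmsse(actual, pred, insample, m=1):
    # RMSE scaled by the in-sample RMSE of a naive forecast repeating the value m steps back
    naive_sq = [(insample[t] - insample[t - m]) ** 2 for t in range(m, len(insample))]
    return rmse(actual, pred) / math.sqrt(sum(naive_sq) / len(naive_sq))


toy_train = [10.0, 12.0, 11.0, 13.0]
toy_test = [10.0, 20.0]
toy_forecast = [8.0, 22.0]

print(f"MAPE:  {mape(toy_test, toy_forecast):.2f}%")
print(f"WMAPE: {wmape(toy_test, toy_forecast):.2f}%")
print(f"RMSSE: {rmsse(toy_test, toy_forecast, toy_train):.2f}")
print(f"RMSE:  {rmse(toy_test, toy_forecast):.2f}")
print(f"ME:    {merr(toy_test, toy_forecast):.2f}")
```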
You can also plot the results using `seaborn`. The plot below shows the last 104 observations together with the forecast.

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

plt.figure(figsize=(18,6))

sns.scatterplot(x = data[-104:].index, y = data['y'][-104:].values, label = 'true')
sns.lineplot(data = forecast.to_series(), label = 'Forecast')

### Note: Understanding LightGBM's Tabular Input

When `LightGBMModel` in `darts` is used with a `lags` parameter, it implicitly converts the time series data into a tabular format where each row represents a time step and the columns contain the target variable and its lagged values (features). With `lags=52`, the model uses the previous 52 values of the series as features for each prediction point. Since this transformation happens inside the package, we cannot directly see the transformed data used in training.

To illustrate the concept, the following code manually creates a DataFrame with lagged features for our `train` dataset. This shows the kind of tabular data LightGBM works with.

In [None]:
import pandas as pd

# Convert the Darts TimeSeries 'train' to a pandas Series
train_series = train.to_series()

# Create a DataFrame to store the lagged features
lagged_df = pd.DataFrame(index=train_series.index)

# Add the target variable (y) to the DataFrame
lagged_df['y'] = train_series

# Define the number of lags (as used in LightGBMModel)
lags_to_generate = 52

# Generate lagged features
for i in range(1, lags_to_generate + 1):
    lagged_df['lag_'+str(i)] = train_series.shift(i)

# Display the first few rows of the generated lagged DataFrame
# Note: The first 'lags_to_generate' rows will have NaN values for lags,
# as there aren't enough preceding data points.
display(lagged_df.head(15))
display(lagged_df.tail(15))

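As a possible next step, the lagged table can be turned into a feature matrix and a target vector of the kind a tree-based regressor is trained on. The sketch below does this on an invented six-point series with two lags. The names `toy`, `toy_lagged`, `X`, and `y` are hypothetical, and what `darts` builds internally may differ in column naming and ordering.

```python
import pandas as pd

# Hypothetical toy series (values invented for illustration)
toy = pd.Series(
    [5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
    index=pd.date_range("2022-01-02", periods=6, freq="W"),
)

n_lags = 2
toy_lagged = pd.DataFrame({"y": toy})
for i in range(1, n_lags + 1):
    toy_lagged[f"lag_{i}"] = toy.shift(i)  # value observed i steps earlier

# Rows without a full set of lags cannot be used for training
toy_lagged = toy_lagged.dropna()

X = toy_lagged.drop(columns="y")  # features: the lagged values
y = toy_lagged["y"]               # target: the value to predict

print(X.shape, y.shape)
print(toy_lagged)
```

With `n_lags` lags, the first `n_lags` rows are dropped, so a series of length `n` yields `n - n_lags` training rows.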
