## Tree-based model forecasting a time series with trend

* Time series with trend
    * Tree-based models cannot extrapolate beyond the range of target values seen in training
        * Break the time series into its components
            * Forecast the trend separately and add it back
        * Use an advanced tree algorithm
            * Linear trees: fit a linear model in each leaf
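
A minimal sketch of the extrapolation problem, using a synthetic linear trend (the data and variable names are made up for illustration). A fully grown tree predicts the value of its right-most leaf for every future time step, so the forecast stays flat at the last training level:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic series with a pure linear trend: y = 2t + 5.
t_train = np.arange(100).reshape(-1, 1)
y_train = 2.0 * t_train.ravel() + 5.0

# Future time steps that lie outside the training range.
t_future = np.arange(100, 120).reshape(-1, 1)

tree = DecisionTreeRegressor(random_state=0).fit(t_train, y_train)
y_future = tree.predict(t_future)

# Every future point falls into the right-most leaf, so the forecast is flat
# and never exceeds the largest target seen in training.
print(np.unique(y_future), y_train.max())
```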

#### OPTION 1
* Estimate the trend: $T_t = \beta_0 + \beta_1 t + \beta_2 t^2 + \dots$
    * We can use `PolynomialTrendForecaster` from sktime to estimate the trend
* Detrend the target variable: $Y_{detrend} = Y_t - T_t$
* Build a forecaster on the detrended data: $Z_{forecast} = Tree(Y_{detrend})$
* Forecast the trend with any method that can extrapolate, for example the same polynomial: $T_{forecast} = \beta_0 + \beta_1 t + \beta_2 t^2 + \dots$
* Add the trend forecast to the forecast of the detrended data: $Y_{forecast} = Z_{forecast} + T_{forecast}$
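
The steps above can be sketched end to end on synthetic data. This is only an illustration under simple assumptions: the trend is fitted with `np.polyfit` rather than sktime, and a month-of-year feature stands in for real features.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic monthly series: linear trend plus a 12-step seasonal pattern.
t = np.arange(144)
y = 100.0 + 2.0 * t + 10.0 * np.sin(2 * np.pi * t / 12)

n_train = 120
t_train, t_test = t[:n_train], t[n_train:]
y_train, y_test = y[:n_train], y[n_train:]

# 1. Estimate the trend T_t = b0 + b1 * t on the training window only.
coefs = np.polyfit(t_train, y_train, deg=1)
trend_train = np.polyval(coefs, t_train)

# 2. Detrend the target.
y_detrend = y_train - trend_train

# 3. Fit a tree on the detrended target, using month-of-year as the feature.
month_train = (t_train % 12).reshape(-1, 1)
month_test = (t_test % 12).reshape(-1, 1)
tree = DecisionTreeRegressor(random_state=0).fit(month_train, y_detrend)
z_forecast = tree.predict(month_test)

# 4. Forecast the trend by evaluating the polynomial at future time steps.
trend_forecast = np.polyval(coefs, t_test)

# 5. Add the trend forecast back to the detrended forecast.
y_forecast = z_forecast + trend_forecast

mae = np.mean(np.abs(y_test - y_forecast))
```

Because the trend is extrapolated by the polynomial and not by the tree, `y_forecast` can rise above the largest value in `y_train`.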

#### OPTION 2
* Use an advanced tree algorithm such as `LGBMRegressor`, which can fit a linear model in each leaf node:
    `model = LGBMRegressor(linear_tree=True)`
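
To illustrate the idea of a linear model in each leaf, here is a hand-rolled sketch built from scikit-learn parts. It is not LightGBM's implementation: the helper `predict_linear_tree` and the synthetic data are hypothetical and only show why linear leaves can extrapolate where constant leaves cannot.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

# Piecewise-linear synthetic trend with a change in slope at t = 50.
t = np.arange(100, dtype=float)
y = np.where(t < 50, 1.0 * t, 50.0 + 3.0 * (t - 50))
X = t.reshape(-1, 1)

# Step 1: a shallow tree partitions the time axis into segments (leaves).
tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y)
leaf_ids = tree.apply(X)

# Step 2: fit one linear model per leaf instead of storing a constant.
leaf_models = {
    leaf: LinearRegression().fit(X[leaf_ids == leaf], y[leaf_ids == leaf])
    for leaf in np.unique(leaf_ids)
}

def predict_linear_tree(X_new):
    """Route each row to its leaf, then apply that leaf's linear model."""
    leaves = tree.apply(X_new)
    return np.array([
        leaf_models[leaf].predict(row.reshape(1, -1))[0]
        for leaf, row in zip(leaves, X_new)
    ])

X_future = np.arange(100, 110, dtype=float).reshape(-1, 1)
y_const = tree.predict(X_future)           # flat: constant leaf value
y_linear = predict_linear_tree(X_future)   # follows the slope of the last leaf
```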


In [212]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

from sktime.forecasting.trend import PolynomialTrendForecaster
from sktime.transformations.series.boxcox import BoxCoxTransformer, LogTransformer
from sktime.transformations.series.summarize import WindowSummarizer
from sktime.transformations.series.detrend import Detrender
from sktime.transformations.series.time_since import TimeSince

from sklearn.preprocessing import MinMaxScaler, PolynomialFeatures
from sklearn.pipeline import make_pipeline, make_union
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.ensemble import HistGradientBoostingRegressor, RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, mean_absolute_percentage_error


#### The air passengers dataset contains monthly totals of international airline passengers from 1949 to 1960, in thousands.

In [3]:
data = pd.read_csv("../../Datasets/example_air_passengers.csv", parse_dates=['ds'], index_col=['ds'])

data.plot(figsize=(15,4))

<img src='./plots/air-passengers-data.png'>

## How do we extract the trend from training data, and how do we forecast it into the future?

Let's use an `sktime` forecaster to model a polynomial trend:

$$T_t = \beta_0 + \beta_1t + \beta_2t^2 + ... + \beta_dt^d$$

where $d$ is the degree of the polynomial and $t$ is time.
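
As a rough sketch of what fitting such a trend involves (not sktime's internal code), the polynomial can be estimated by ordinary least squares on powers of an integer time index. The coefficients below are chosen only for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Time index t = 0, 1, ..., n-1 and a noiseless quadratic trend
# T_t = 3 + 0.5 t + 0.02 t^2 (illustrative coefficients).
t = np.arange(60, dtype=float).reshape(-1, 1)
y = 3.0 + 0.5 * t.ravel() + 0.02 * t.ravel() ** 2

degree = 2
# Design matrix with columns [t, t^2]; the intercept beta_0 is left to the regressor.
poly = PolynomialFeatures(degree=degree, include_bias=False)
T = poly.fit_transform(t)

reg = LinearRegression().fit(T, y)
beta_0, betas = reg.intercept_, reg.coef_

# Extrapolate the fitted polynomial 12 steps past the training window.
t_future = np.arange(60, 72, dtype=float).reshape(-1, 1)
trend_forecast = reg.predict(poly.transform(t_future))
```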

In [100]:
trend_transformer = PolynomialTrendForecaster(regressor=LinearRegression(), degree=1)
# PolynomialTrendForecaster requires the frequency of the dataframe index to be set
df = data.copy().asfreq('MS')

# fit 
trend_transformer.fit(df['y'])

# Forecast
fh = pd.date_range(start=df.index.min(), end=df.index.max() + pd.DateOffset(months=12), freq='MS')

y_pred = trend_transformer.predict(fh)

ax = df.plot(figsize=(15,4))
y_pred.plot(ax=ax)
ax.set(title='Trend forecast');

<img src='./plots/air-passengers-data-trend-forecast-sktime.png'>

#### Detrend the data

In [102]:
y_detrend = df['y'] - y_pred

y_detrend.plot(figsize=(15,4))
plt.title('De-trended data');

<img src='./plots/air-passengers-data-detrended.png'>

#### We can clearly see that the seasonal variation increases over time.
#### Tree-based models struggle to extrapolate when the time series is not stationary.
#### Let's stabilize the variance of the time series before fitting the trend and detrending.
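
A small synthetic check of why a log transform helps, assuming the seasonality is multiplicative (the series below is made up). The yearly peak-to-trough range grows on the raw scale but stays constant on the log scale:

```python
import numpy as np

# Multiplicative series: the seasonal swing scales with the trend level.
t = np.arange(144)
trend = 100.0 * np.exp(0.01 * t)
season = 1.0 + 0.2 * np.sin(2 * np.pi * t / 12)
y = trend * season

def yearly_range(x):
    """Peak-to-trough range within each block of 12 observations."""
    blocks = x.reshape(-1, 12)
    return blocks.max(axis=1) - blocks.min(axis=1)

raw_range = yearly_range(y)          # grows from year to year
log_range = yearly_range(np.log(y))  # the same in every year
```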

## Stabilize the timeseries using LogTransformer

In [104]:
# Transformer
log_transform = LogTransformer()
# data
df = data.copy().asfreq(freq='MS')
# stabilized timeseries
df_stable = log_transform.fit_transform(df)

fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(15,4))
df.plot(ax=ax[0], title='Timeseries : Air passengers data')
df_stable.plot(ax=ax[1], title='Variance Stabilized')

<img src='./plots/air-passengers-data-stationary-plot.png'>

In [106]:
# fit PolynomialTrendForecaster on the variance-stabilized timeseries
trend_transformer = PolynomialTrendForecaster(regressor=LinearRegression(), degree=1)
trend_transformer.fit(df_stable)

# forecast  | extract trend
fh = pd.date_range(start=df_stable.index.min(), end=df_stable.index.max() + pd.DateOffset(months=12), freq='MS')

y_pred = trend_transformer.predict(fh=fh)


ax  = df_stable.plot(figsize=(15,4))
y_pred.plot(ax=ax)
ax.legend(['stable-timeseries' , 'predicted-trend'])

<img src='./plots/air-passengers-data-stable-trend-forecast.png'>

#### Detrend the timeseries

In [108]:
y_detrend =  df_stable['y'] - y_pred['y']

y_detrend.plot(figsize=(15, 4))
plt.title('De-trended data');

<img src='./plots/air-passengers-data-stable-detrended.png'>

#### Can we automate these steps? How do we create a pipeline for this workflow?

sktime provides a convenient transformer called `Detrender` that removes the trend and adds it back on the inverse transform.
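
The fit / transform / inverse-transform contract of a detrender can be sketched with a small hypothetical class. `LinearDetrender` below is not part of sktime, and it takes the time index explicitly to keep the example simple:

```python
import numpy as np

class LinearDetrender:
    """Hypothetical minimal detrender: subtracts a fitted straight-line trend."""

    def fit(self, t, y):
        # Least-squares fit of T_t = beta_0 + beta_1 * t.
        self.beta_1_, self.beta_0_ = np.polyfit(t, y, deg=1)
        return self

    def _trend(self, t):
        return self.beta_0_ + self.beta_1_ * np.asarray(t, dtype=float)

    def transform(self, t, y):
        return np.asarray(y, dtype=float) - self._trend(t)

    def inverse_transform(self, t, z):
        return np.asarray(z, dtype=float) + self._trend(t)

t = np.arange(48)
y = 10.0 + 0.5 * t + np.sin(2 * np.pi * t / 12)

detrender = LinearDetrender().fit(t, y)
z = detrender.transform(t, y)
y_back = detrender.inverse_transform(t, z)

# The trend can also be added back at time steps that were never seen in fit.
t_future = np.arange(48, 60)
y_future = detrender.inverse_transform(t_future, np.zeros(len(t_future)))
```

The round trip recovers the original series, and the inverse transform at future time steps is what turns a forecast of the detrended series into a forecast on the original scale.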

In [84]:
# data (set the frequency, as required by PolynomialTrendForecaster)
df = data.copy().asfreq('MS')

# stabilize the variance
df = log_transform.fit_transform(df)


detrender = Detrender(
    forecaster=PolynomialTrendForecaster(
        regressor=LinearRegression(), degree=1))

# fit on the stabilized timeseries: this fits the PolynomialTrendForecaster
detrender.fit(df)

# transform: this calculates the trend and subtracts it
y_detrend = detrender.transform(df)


In [76]:
fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(15,4))
df.plot(ax=ax[0], title="Variance-stabilized timeseries")
y_detrend.plot(ax=ax[1], title='detrended data')

<img src='./plots/air-passengers-data-sktime-stabilize-and-detrend.png'>

In [97]:
inverse_transform = detrender.inverse_transform(y_detrend)
fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(15,4))
y_detrend.plot(ax=ax[0], title="detrended data")
inverse_transform.plot(ax=ax[1], title='inverse transform');

<img src='./plots/air-passengers-data-sktime-stabilize-and-detrend-inverse-transform.png'>

## Pipeline

In [109]:
# create a pipeline
# by default, Detrender uses PolynomialTrendForecaster as its forecaster
# by default, PolynomialTrendForecaster uses LinearRegression as its regressor
pipeline = make_pipeline(LogTransformer(), Detrender())

# data
df = data.copy().asfreq('MS')

pipeline.fit(df)

y_transformed = pipeline.transform(df)

y_inverse_transform = pipeline.inverse_transform(y_transformed)


In [98]:
fig,ax = plt.subplots(nrows=1, ncols=3, figsize=(18,4))
df.plot(ax=ax[0], title='data')
y_transformed.plot(ax=ax[1], title="pipeline.transform")
y_inverse_transform.plot(ax=ax[2], title='pipeline.inverse_transform');


<img src='./plots/air-passengers-data-stabilize-detrend-inverse-transform-pipeline.png'>

## Forecasting with tree based models

In [147]:
# Transformers

# stabilizes the variance (log) and detrends the timeseries
target_transformer = make_pipeline(LogTransformer(), Detrender())

# time-related features to capture the trend
time_since = TimeSince(keep_original_columns=False)

# Even though the target is detrended, a time feature still allows a
# tree-based model to segment over time and isolate changepoints,
# outliers, and other interesting periods during training.



# Lag features | Window summary | Rolling stats 
window_summary = WindowSummarizer(
    lag_feature={
    'lag':[1,2,3,12],
    'mean':[[1,12]]
    },
    truncate='bfill',
    target_cols=['y']
)


features = make_union(time_since, window_summary)

features_scaled = make_pipeline(features, MinMaxScaler())
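
For intuition, similar lag and rolling-window features can be built by hand with pandas. This sketch assumes that `'mean': [[1, 12]]` denotes a lag of 1 and a window length of 12; the column names and the toy series are made up, and the result is not guaranteed to match the `WindowSummarizer` output exactly.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2000-01-01", periods=24, freq="MS")
y = pd.Series(np.arange(1.0, 25.0), index=idx, name="y")

features = pd.DataFrame(index=idx)
# Lag features: the value observed k steps before each timestamp.
for k in [1, 2, 3, 12]:
    features[f"y_lag_{k}"] = y.shift(k)

# Rolling mean of the 12 observations ending one step before each timestamp.
# Shifting first keeps the current value out of its own feature.
features["y_mean_1_12"] = y.shift(1).rolling(window=12).mean()

# Early rows have no history yet; back-fill them (a simple choice for a sketch).
features = features.bfill()
```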

In [167]:
# Define time of first forecast, this determines our train / test split
forecast_start_time = pd.to_datetime("1958-01-01")

# Define number of steps to forecast.
num_of_forecast_steps = 36

forecast_horizon = pd.date_range(start=forecast_start_time, periods=num_of_forecast_steps, freq='MS')

# how far back in time we need to look back for feature engineering purpose
look_back_window = pd.DateOffset(months=12)
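
One detail to keep in mind when splitting at a timestamp: label-based slicing with `.loc` includes both endpoints, so slicing up to and from the same timestamp puts that row in both parts. A small sketch on a toy frame (the values are made up):

```python
import pandas as pd

idx = pd.date_range("1957-10-01", periods=6, freq="MS")
df = pd.DataFrame({"y": range(6)}, index=idx)
split = pd.Timestamp("1958-01-01")

# Label-based slicing includes both endpoints, so the split row lands in both parts.
train_inclusive = df.loc[:split]
test = df.loc[split:]
overlap = train_inclusive.index.intersection(test.index)

# A strict comparison keeps the split timestamp out of the training set.
train_strict = df.loc[df.index < split]
```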

In [155]:
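# TRAIN TEST SPLIT
df = data.copy().asfreq('MS')
# .loc slicing is inclusive on both ends, so use a strict mask to keep
# forecast_start_time out of the training set
df_train = df.loc[df.index < forecast_start_time].copy()
df_test = df.loc[forecast_start_time:].copy()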

# TARGET TRANSFORM
df_train_transformed = target_transformer.fit_transform(df_train)
df_test_transformed = target_transformer.transform(df_test)

# TRAINING
model = DecisionTreeRegressor()
X = features_scaled.fit_transform(df_train_transformed)
y = df_train_transformed['y']
model.fit(X, y)
y_pred = model.predict(X)
y_pred = pd.DataFrame(data=y_pred, index=df_train_transformed.index, columns=['y'])
y_pred_transformed = target_transformer.inverse_transform(y_pred)

In [166]:
ax = df_train.plot(figsize=(15, 4))
y_pred_transformed.plot(ax=ax, linewidth=8, alpha=0.5)
y_pred_transformed.plot(ax=ax, linestyle='', marker='.')
ax.set(title='In-sample prediction')
ax.legend(['training data', 'in-sample prediction']);

<img src='./plots/air-passengers-data-insample-prediction.png'>

In [193]:
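index = forecast_start_time - look_back_window

# history needed to build the lag and window features for each forecast step
df_lookback = data.loc[index:].copy().asfreq('MS')

df_forecast = pd.DataFrame(index=forecast_horizon)
for fh in forecast_horizon:
    # all observations up to the forecast timestamp; because the lags come
    # from observed values, each prediction is a one-step-ahead forecast
    X = df_lookback.loc[:fh, ['y']]
    X = target_transformer.transform(X)
    X = features_scaled.transform(X)

    # the last row holds the features for timestamp fh
    forecast = model.predict([X[-1]])
    df_forecast.loc[fh, 'y'] = forecast[0]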


# inverse transform
df_forecast = target_transformer.inverse_transform(df_forecast)
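
When future observations are not available, lag features have to be filled with the model's own predictions. A minimal sketch of such a recursive forecast on a synthetic seasonal series, shown as an alternative and not as the method used in this notebook; `make_row` and the lag choice are hypothetical:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Seasonal series without trend (as after detrending), period 12.
t = np.arange(120)
y = np.sin(2 * np.pi * t / 12)

lags = [1, 12]
max_lag = max(lags)

def make_row(history):
    """Lag features for the step that follows `history`."""
    return [history[-k] for k in lags]

X_train = np.array([make_row(y[:i]) for i in range(max_lag, len(y))])
y_train = y[max_lag:]
model = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)

# Recursive forecasting: each prediction is appended to the history and
# becomes a lag feature for the following steps.
history = list(y)
forecast = []
for _ in range(24):
    y_hat = model.predict([make_row(history)])[0]
    forecast.append(y_hat)
    history.append(y_hat)
forecast = np.array(forecast)
```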


In [220]:
ax = df_test.plot(marker='.', figsize=(15,4))
df_forecast.plot(ax=ax, marker='.')
ax.legend(['test-data','prediction'])

<img src='./plots/air-passengers-data-tree--based-model-forecast-test-set.png'>

In [217]:
mae = mean_absolute_error(y_true=df_test['y'], y_pred=df_forecast['y'])
# take the square root explicitly; the `squared` argument is deprecated in recent scikit-learn releases
rmse = np.sqrt(mean_squared_error(y_true=df_test['y'], y_pred=df_forecast['y']))
mape = mean_absolute_percentage_error(y_true=df_test['y'], y_pred=df_forecast['y'])
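
For reference, the three metrics can be computed by hand on a tiny made-up example. Note that MAPE is expressed as a fraction here, not multiplied by 100:

```python
import numpy as np

y_true = np.array([100.0, 200.0, 400.0])
y_pred = np.array([110.0, 180.0, 400.0])

errors = y_true - y_pred

mae = np.mean(np.abs(errors))            # mean absolute error
rmse = np.sqrt(np.mean(errors ** 2))     # root mean squared error
mape = np.mean(np.abs(errors / y_true))  # mean absolute percentage error (fraction)
```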

In [221]:
ax = df_train.plot(figsize=(15, 5))
y_pred_transformed.plot(ax=ax,  linestyle='', marker='.', color='r')
df_test.plot(marker='.', ax=ax)
df_forecast.plot(ax=ax, marker='.')

ax.set(title=f'Forecasting | MAE : {mae:0.2f} | RMSE : {rmse:0.2f} | MAPE : {mape:0.2f}')
ax.legend(['training data', 'in-sample prediction', 'test-data','prediction']);


<img src='./plots/air-passengers-data-forecast-tree-based-model-train-test.png'>