## Non-linear trend: using time as a feature

#### Add non-linear time terms to the model

$$y_t = \beta_0 + \beta_1 t + \beta_2 t^2 + \cdots$$
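The equation is ordinary least squares on a design matrix whose columns are powers of $t$. A minimal NumPy sketch, assuming a noiseless quadratic series with made-up coefficients (the names `t`, `X` and `beta` are illustrative only):

```python
import numpy as np

# Hypothetical noiseless series: y_t = 100 + 2 t + 0.05 t^2
t = np.arange(48, dtype=float)
y = 100 + 2 * t + 0.05 * t**2

# Design matrix with columns [1, t, t^2], matching beta_0, beta_1, beta_2
X = np.column_stack([np.ones_like(t), t, t**2])

# Ordinary least squares estimate of (beta_0, beta_1, beta_2)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(beta, 4))
```

Because the data are noiseless, the estimates should recover the generating coefficients up to floating-point error.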

#### Issues:
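* When extrapolating, the resulting forecasts are often unrealistic, because the highest-order term dominates once $t$ moves beyond the training range
* Risk of overfitting to the training data, which makes the extrapolation even worse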
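Both issues can be illustrated with a small synthetic experiment (the data and degrees below are our own assumptions, not from the air passengers set): the in-sample error can only fall as the polynomial degree rises, while nothing constrains the extra terms outside the training range. Compare the printed forecasts with the true trend values of 65 at $t=30$ and 85 at $t=40$.

```python
import numpy as np

# Hypothetical training data: a linear trend (slope 2) plus a small alternating wiggle
t_train = np.arange(20, dtype=float)
y_train = 5 + 2 * t_train + 0.5 * (-1.0) ** np.arange(20)

t_future = np.array([30.0, 40.0])  # points well beyond the training range t = 0..19

train_sse = {}
for degree in (1, 3, 6):
    coefs = np.polyfit(t_train, y_train, deg=degree)
    resid = y_train - np.polyval(coefs, t_train)
    train_sse[degree] = float(resid @ resid)
    # Nested least-squares fits: the in-sample SSE cannot increase with degree,
    # but the extrapolated values are free to drift away from the linear trend
    print(f"degree {degree}: train SSE = {train_sse[degree]:.3f}, "
          f"forecast at t=30, 40 -> {np.round(np.polyval(coefs, t_future), 1)}")
```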


#### Alternatives
* Use piecewise linear regression instead of adding non-linear terms
* Use regularization (e.g. Ridge or Lasso) to reduce overfitting
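A piecewise linear trend can be built from "hinge" features $\max(0, t - k)$ for a set of knots $k$. The sketch below assumes a single, known changepoint at $t = 50$ on a synthetic series; `piecewise_linear_features` is a hypothetical helper, and in practice the knot locations would have to be chosen or tuned.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def piecewise_linear_features(t, knots):
    """Hypothetical helper: columns [t, max(0, t - k1), max(0, t - k2), ...]."""
    t = np.asarray(t, dtype=float)
    hinges = [np.maximum(0.0, t - k) for k in knots]
    return np.column_stack([t, *hinges])

# Synthetic series whose slope changes from 1 to 3 at t = 50
t = np.arange(100)
y = np.where(t < 50, t, 50 + 3 * (t - 50)).astype(float)

knots = [50]  # assumed changepoint location
model = LinearRegression().fit(piecewise_linear_features(t, knots), y)

# Beyond the last knot the forecast simply continues as a straight line
t_future = np.arange(100, 112)
y_future = model.predict(piecewise_linear_features(t_future, knots))
print(model.coef_)  # slope before the knot, then the change in slope at the knot
```

Unlike a $t^2$ or $t^3$ term, the hinge basis cannot accelerate past the last knot, which is why it tends to extrapolate more conservatively.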



```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.ensemble import HistGradientBoostingRegressor, RandomForestRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, MinMaxScaler

from statsmodels.tsa.deterministic import DeterministicProcess

from sktime.transformations.series.time_since import TimeSince
```

#### The air passengers dataset contains the monthly totals of international airline passengers from 1949 to 1960, in thousands.

```python
data = pd.read_csv('../../Datasets/example_air_passengers.csv', parse_dates=['ds'], index_col=['ds'])
data.plot(figsize=(15, 4))
```

<img src='./plots/air-passengers-data.png'>

## Let's create non-linear time features

### OPTION : 1 | Use TimeSince from sktime and square it manually

```python
time_since = TimeSince(keep_original_columns=True, freq='MS')
df = time_since.fit_transform(data)

df.head()
```

| ds | y | time_since_1949-01-01 00:00:00 |
|---|---|---|
| 1949-01-01 | 112 | 0 |
| 1949-02-01 | 118 | 1 |
| 1949-03-01 | 132 | 2 |
| 1949-04-01 | 129 | 3 |
| 1949-05-01 | 121 | 4 |
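From the output, `TimeSince(freq='MS')` appears to count the whole months elapsed since the first timestamp. The same column can be reproduced with plain pandas arithmetic; this is a sketch of the assumed behaviour on a hypothetical index, not sktime's implementation.

```python
import pandas as pd

# Hypothetical monthly index with the same start as the air passengers data
idx = pd.date_range("1949-01-01", periods=5, freq="MS")

# Whole months elapsed since the first timestamp
start = idx[0]
months_since = (idx.year - start.year) * 12 + (idx.month - start.month)
print(months_since.tolist())  # [0, 1, 2, 3, 4]
```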


```python
df['quadratic-time-feature'] = df['time_since_1949-01-01 00:00:00']**2
df.head()
```

| ds | y | time_since_1949-01-01 00:00:00 | quadratic-time-feature |
|---|---|---|---|
| 1949-01-01 | 112 | 0 | 0 |
| 1949-02-01 | 118 | 1 | 1 |
| 1949-03-01 | 132 | 2 | 4 |
| 1949-04-01 | 129 | 3 | 9 |
| 1949-05-01 | 121 | 4 | 16 |


### OPTION : 2 | Use DeterministicProcess from statsmodels

```python
dp = DeterministicProcess(order=2, index=data.index, constant=False)

dp.in_sample()
```

| ds | trend | trend_squared |
|---|---|---|
| 1949-01-01 | 1.0 | 1.0 |
| 1949-02-01 | 2.0 | 4.0 |
| 1949-03-01 | 3.0 | 9.0 |
| 1949-04-01 | 4.0 | 16.0 |
| 1949-05-01 | 5.0 | 25.0 |
| ... | ... | ... |
| 1960-08-01 | 140.0 | 19600.0 |
| 1960-09-01 | 141.0 | 19881.0 |
| 1960-10-01 | 142.0 | 20164.0 |
| 1960-11-01 | 143.0 | 20449.0 |
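Note that `DeterministicProcess` starts its trend at 1 rather than 0. A practical advantage is that `out_of_sample` continues the same trend columns into the forecast horizon. The sketch below uses a hypothetical 144-month index with an explicit frequency; the CSV-loaded `data.index` may not carry a frequency, in which case something like `data.asfreq('MS')` would likely be needed first (an assumption, not tested on that file).

```python
import pandas as pd
from statsmodels.tsa.deterministic import DeterministicProcess

# Hypothetical monthly index with an explicit frequency, 144 months from 1949-01
idx = pd.date_range("1949-01-01", periods=144, freq="MS")
dp = DeterministicProcess(index=idx, order=2, constant=False)

in_sample = dp.in_sample()          # trend runs 1, 2, ..., 144
future = dp.out_of_sample(steps=3)  # trend continues 145, 146, 147
print(future)
```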


### OPTION : 3 | Use PolynomialFeatures from the sklearn preprocessing module

```python
time_since = TimeSince(freq='MS', keep_original_columns=False)
poly_feat = PolynomialFeatures(degree=2, include_bias=False)
pipe = make_pipeline(time_since, poly_feat)
df = pd.DataFrame(data=pipe.fit_transform(data), columns=['trend', 'trend squared'], index=data.index)
df.head()
```

| ds | trend | trend squared |
|---|---|---|
| 1949-01-01 | 0.0 | 0.0 |
| 1949-02-01 | 1.0 | 1.0 |
| 1949-03-01 | 2.0 | 4.0 |
| 1949-04-01 | 3.0 | 9.0 |
| 1949-05-01 | 4.0 | 16.0 |


## Let's build a forecast with just these non-linear time features.

```python
df = data.copy()

target = "y"
holdout_size = 12 * 6  # forecast horizon: 72 months

# Use all the time up until the start
# of the forecast horizon for training.
df_train = df.iloc[:-holdout_size]

# Use the end of the time series
# to test.
df_test = df.iloc[-holdout_size:]
```
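With a 72-month holdout, a 144-month series splits into six years of training data and six years of test data, in time order and without shuffling. A quick check on a hypothetical stand-in for `data` (144 monthly rows from 1949-01):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `data`: 144 monthly observations starting 1949-01
idx = pd.date_range("1949-01-01", periods=144, freq="MS")
df = pd.DataFrame({"y": np.arange(144)}, index=idx)

holdout_size = 12 * 6  # 72 months = 6 years
df_train, df_test = df.iloc[:-holdout_size], df.iloc[-holdout_size:]

# Every training timestamp precedes every test timestamp
assert df_train.index.max() < df_test.index.min()
print(df_train.index[-1].date(), df_test.index[0].date())  # 1954-12-01 1955-01-01
```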

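```python
def model_pipe(degree=2, model=None, figsize=(15, 4)):
    """Fit polynomial time features + `model` on df_train, then plot the train and test fits."""
    # Avoid a shared mutable default argument: build a fresh estimator per call
    if model is None:
        model = LinearRegression()

    time_since = TimeSince(freq='MS', keep_original_columns=False)  # months since the start of the training data; drops `y`
    poly_feat = PolynomialFeatures(degree=degree)  # [1, t, t^2, ..., t^degree]
    scaler = MinMaxScaler()  # rescale each power of t to [0, 1] on the training span

    pipe = make_pipeline(time_since, poly_feat, scaler, model)

    pipe.fit(df_train, df_train[target])
    y_pred_train = pd.DataFrame(data=pipe.predict(df_train), index=df_train.index, columns=['y_pred_train'])
    y_pred_test = pd.DataFrame(data=pipe.predict(df_test), index=df_test.index, columns=['y_pred_test'])

    ax = df_train.plot(figsize=figsize)
    df_test.plot(ax=ax)
    y_pred_train.plot(ax=ax)
    y_pred_test.plot(ax=ax)
    ax.legend(['train', 'test', 'train_prediction', 'test_prediction'])
```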
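If sktime is not available, the same pipeline can be sketched with scikit-learn alone by swapping `TimeSince` for a `FunctionTransformer` that computes elapsed months (a substitute technique, not the notebook's). The fixed `START` origin and the `months_since_start` helper are assumptions for illustration; fixing the origin is what lets the test months continue counting from the training months. On a noiseless quadratic, a degree-2 fit should extrapolate exactly.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, MinMaxScaler, PolynomialFeatures

START = pd.Timestamp("1949-01-01")  # assumed fixed origin shared by train and test

def months_since_start(X):
    """Hypothetical helper: map a DataFrame with a monthly DatetimeIndex to an (n, 1) array of elapsed months."""
    idx = X.index
    return ((idx.year - START.year) * 12 + (idx.month - START.month)).to_numpy().reshape(-1, 1)

# Hypothetical noiseless quadratic series, split like the notebook (72-month holdout)
idx = pd.date_range(START, periods=144, freq="MS")
t = np.arange(144)
df = pd.DataFrame({"y": 100 + 2.0 * t + 0.01 * t**2}, index=idx)
df_train, df_test = df.iloc[:-72], df.iloc[-72:]

pipe = make_pipeline(
    FunctionTransformer(months_since_start),  # ignores the `y` column, uses only the index
    PolynomialFeatures(degree=2),
    MinMaxScaler(),
    LinearRegression(),
)
pipe.fit(df_train, df_train["y"])
y_pred_test = pd.Series(pipe.predict(df_test), index=df_test.index, name="y_pred_test")
```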

### Linear Regression | Non-linear features | Degree 2

```python
model = LinearRegression()
model_pipe(model=model, degree=2)
```

<img src='./plots/non-linear-trend-air-passengers-linear-reg-deg-2.png'>

### Linear Regression | Non-linear features | Degree 3

```python
model = LinearRegression()
model_pipe(model=model, degree=3)
```

<img src='./plots/non-linear-trend-air-passengers-linear-reg-deg-3.png'>

### Regularization | Ridge | Non-linear trend | Degree 3

```python
model = Ridge(alpha=1)
model_pipe(model=model, degree=3)
```

<img src='./plots/Regularization-Ridge-Non-linear-trend-Degree-3.png'>
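Ridge adds a penalty on the squared size of the coefficients, so larger `alpha` values pull the $t$, $t^2$, $t^3$ coefficients towards zero. Because one penalty applies to all coefficients, putting the features on a comparable scale first (the `MinMaxScaler` step in `model_pipe`) matters. A sketch on a synthetic noisy linear trend (the data and alpha values are assumptions):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import MinMaxScaler, PolynomialFeatures

# Synthetic noisy trend on 72 monthly steps (fixed seed for repeatability)
rng = np.random.default_rng(0)
t = np.arange(72).reshape(-1, 1)
y = 100 + 2.0 * t.ravel() + rng.normal(scale=10.0, size=72)

# Same feature steps as model_pipe: cubic terms, then scale each column to [0, 1]
X = MinMaxScaler().fit_transform(PolynomialFeatures(degree=3).fit_transform(t))

coef_norms = {}
for alpha in (0.01, 1.0, 100.0):
    ridge = Ridge(alpha=alpha).fit(X, y)
    # A larger alpha penalizes the squared coefficients more, so their norm shrinks
    coef_norms[alpha] = float(np.linalg.norm(ridge.coef_))
    print(alpha, np.round(ridge.coef_, 2))
```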

### Regularization | Lasso | Non-linear trend | Degree 3

```python
model = Lasso(alpha=1)
model_pipe(model=model, degree=3)
```

<img src='./plots/Regularization-Lasso-Non-linear-trend-Degree-3.png'>
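Lasso penalizes the absolute size of the coefficients instead, which can set some of them exactly to zero and so effectively drops the higher powers of $t$. A sketch on the same kind of synthetic series (assumed data and alpha values); for a large enough `alpha` every coefficient becomes zero and the forecast collapses to a constant.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import MinMaxScaler, PolynomialFeatures

# Synthetic noisy trend on 72 monthly steps (fixed seed for repeatability)
rng = np.random.default_rng(0)
t = np.arange(72).reshape(-1, 1)
y = 100 + 2.0 * t.ravel() + rng.normal(scale=10.0, size=72)
X = MinMaxScaler().fit_transform(PolynomialFeatures(degree=3).fit_transform(t))

for alpha in (0.01, 1.0, 1e4):
    lasso = Lasso(alpha=alpha, max_iter=100_000).fit(X, y)
    # The L1 penalty can shrink coefficients exactly to zero
    print(alpha, np.round(lasso.coef_, 2), "non-zero:", int(np.sum(lasso.coef_ != 0)))
```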

#### In general, it is not recommended to use $t^2$ terms or higher, because their forecasts tend to diverge outside the training range. The plots above suggest that regularization can help. In practice, it's best to experiment and see which features produce the best forecast for your own use case.
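One way to run that experiment is to score every candidate on the same holdout with a common metric. A sketch using mean absolute error (our choice of metric) on a synthetic stand-in series, where `holdout_mae` is a hypothetical helper and the candidate list mirrors the models above:

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, PolynomialFeatures

def holdout_mae(model, degree, t_train, y_train, t_test, y_test):
    """Hypothetical helper: fit polynomial-trend features + model, return the holdout MAE."""
    pipe = make_pipeline(PolynomialFeatures(degree=degree), MinMaxScaler(), model)
    pipe.fit(t_train, y_train)
    return mean_absolute_error(y_test, pipe.predict(t_test))

# Synthetic series standing in for the passengers data (72-month holdout)
rng = np.random.default_rng(0)
t = np.arange(144).reshape(-1, 1)
y = 100 + 2.0 * t.ravel() + rng.normal(scale=10.0, size=144)
t_train, t_test, y_train, y_test = t[:-72], t[-72:], y[:-72], y[-72:]

candidates = {
    "linear, degree 1": (LinearRegression(), 1),
    "linear, degree 3": (LinearRegression(), 3),
    "ridge, degree 3": (Ridge(alpha=1.0), 3),
    "lasso, degree 3": (Lasso(alpha=1.0), 3),
}
scores = {name: holdout_mae(m, d, t_train, y_train, t_test, y_test) for name, (m, d) in candidates.items()}
for name, mae in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name}: MAE = {mae:.1f}")
```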