## Lag features are a way of using the past to predict the future.

* We can lag the target or other features.
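As a minimal sketch of the idea, the snippet below lags both a target and a second feature by one row. The `toy` frame, its column names (`sales`, `ad_spend`) and its values are hypothetical and are not part of the retail dataset used in this notebook.

```python
import pandas as pd

# Hypothetical toy data: monthly sales (the target) and ad spend (another feature).
toy = pd.DataFrame(
    {"sales": [10, 12, 15, 11], "ad_spend": [1.0, 1.5, 1.2, 0.8]},
    index=pd.date_range("2020-01-01", periods=4, freq="MS"),
)

# Each lag column holds the value observed one row (here, one month) earlier.
toy["sales_lag_1"] = toy["sales"].shift(1)
toy["ad_spend_lag_1"] = toy["ad_spend"].shift(1)

print(toy)
```

The first row of each lag column is `NaN` because there is no earlier observation to copy from.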


In [1]:
import pandas as pd

## Load Data

In [9]:
df = pd.read_csv('../../Datasets/example_retail_sales.csv', parse_dates=['ds'], index_col=['ds'])
df.index.name = 'date'
df.plot(y='y', marker='.', figsize=(15, 4))

<img src='./plots/retail-sales-plot.png'>

## Let's understand the pandas `shift` method

`lag = 2`

Because the time series is uniformly spaced by month, we can specify the lag as a number of periods. In this case one period is one month.

### I want to shift the target `y` two periods forward

In [10]:
df.shift(periods=2).head()

date,y
1992-01-01,
1992-02-01,
1992-03-01,146376.0
1992-04-01,147079.0
1992-05-01,159336.0


### I want to shift the target `y` two periods backwards

In [11]:
df.shift(periods=-2).head()

date,y
1992-01-01,159336.0
1992-02-01,163669.0
1992-03-01,170068.0
1992-04-01,168663.0
1992-05-01,169890.0


In [12]:
df.shift(periods=-2).tail()

date,y
2016-01-01,460093.0
2016-02-01,450935.0
2016-03-01,471421.0
2016-04-01,
2016-05-01,
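To compare the two directions side by side, here is a small sketch on a hypothetical five-row frame (the values are made up). A positive `periods` yields a lag, with `NaN` at the start; a negative `periods` yields a lead, with `NaN` at the end.

```python
import pandas as pd

# Hypothetical monthly series with easy-to-follow values.
toy = pd.DataFrame(
    {"y": [1, 2, 3, 4, 5]},
    index=pd.date_range("1992-01-01", periods=5, freq="MS"),
)

# Positive periods: each row receives the value from two rows earlier (a lag).
toy["y_lag_2"] = toy["y"].shift(2)

# Negative periods: each row receives the value from two rows later (a lead).
toy["y_lead_2"] = toy["y"].shift(-2)

print(toy)
```

Only the lagged column uses information that was already available at each date; the lead column places future values on the current row.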


## You can also specify the `freq` of the time index so that the correct time duration is lagged rather than simply the number of rows.

### I want to shift the date 2 months forward


`M` = month end

`MS` = month start
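The difference between the two aliases can be sketched with explicit offset objects: `pd.offsets.MonthBegin` is the month-start offset and `pd.offsets.MonthEnd` is the month-end offset. Offset objects are used here because, as far as I know, newer pandas releases warn on the `M` alias and prefer `ME`. The `toy` frame and its values are hypothetical.

```python
import pandas as pd

# Hypothetical frame indexed on the first day of each month.
toy = pd.DataFrame(
    {"y": [1, 2, 3]},
    index=pd.date_range("1992-01-01", periods=3, freq="MS"),
)

# Two month-start steps: 1992-01-01 -> 1992-03-01.
month_start = toy.shift(freq=pd.offsets.MonthBegin(2))

# Two month-end steps: 1992-01-01 first rolls to 1992-01-31, then to 1992-02-29.
month_end = toy.shift(freq=pd.offsets.MonthEnd(2))

print(month_start.index)
print(month_end.index)
```

With a month-end offset the dates leave the first-of-month grid, so the shifted index no longer lines up with the original one.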

In [15]:
df.shift(freq='2M').head()

date,y
1992-02-29,146376
1992-03-31,147079
1992-04-30,159336
1992-05-31,163669
1992-06-30,170068


In [14]:
df.shift(freq='2MS').head()

date,y
1992-03-01,146376
1992-04-01,147079
1992-05-01,159336
1992-06-01,163669
1992-07-01,170068


In [16]:
df.shift(freq='2MS').tail()

date,y
2016-03-01,400928
2016-04-01,413554
2016-05-01,460093
2016-06-01,450935
2016-07-01,471421


### I want to shift the date 2 months backward

In [18]:
df.shift(freq='-2MS').head()

date,y
1991-11-01,146376
1991-12-01,147079
1992-01-01,159336
1992-02-01,163669
1992-03-01,170068
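Shifting by `freq` matters most when the index has gaps. The sketch below uses a hypothetical series with March missing: a row-based lag silently picks up a value that is two months old, whereas a time-based lag leaves the row empty. The `reindex` step, which aligns the shifted series back onto the original dates, is my own addition for illustration.

```python
import pandas as pd

# Hypothetical monthly series with a missing month (no row for 2020-03-01).
idx = pd.to_datetime(["2020-01-01", "2020-02-01", "2020-04-01"])
gappy = pd.DataFrame({"y": [10, 20, 40]}, index=idx)

# Row-based lag: April receives February's value, which is two months old.
by_rows = gappy["y"].shift(1)

# Time-based lag: move every date forward one month, then align on the
# original dates. April has no March observation, so it stays NaN.
by_time = gappy["y"].shift(freq="MS").reindex(gappy.index)

print(pd.DataFrame({"y": gappy["y"], "by_rows": by_rows, "by_time": by_time}))
```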


## Using both `periods` and `freq`

In [19]:
df.shift(periods=1, freq='MS')

date,y
1992-02-01,146376
1992-03-01,147079
1992-04-01,159336
1992-05-01,163669
1992-06-01,170068
...,...
2016-02-01,400928
2016-03-01,413554
2016-04-01,460093
2016-05-01,450935
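When both arguments are given, `periods` acts as a multiplier on `freq`. A minimal check on a hypothetical three-row frame suggests that `periods=2, freq='MS'` gives the same result as `freq='2MS'`:

```python
import pandas as pd

# Hypothetical frame indexed on month starts.
toy = pd.DataFrame(
    {"y": [1, 2, 3]},
    index=pd.date_range("1992-01-01", periods=3, freq="MS"),
)

# Two steps of one month start each ...
two_steps = toy.shift(periods=2, freq="MS")

# ... versus a single step of two month starts.
one_big_step = toy.shift(freq="2MS")

print(two_steps.equals(one_big_step))
```

In both cases only the index moves; the values in `y` are untouched and no `NaN` is introduced.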


## Compute lag features using Feature-engine

In [20]:
from feature_engine.timeseries.forecasting import LagFeatures

In [27]:
lag_transformer = LagFeatures(variables=['y'], freq=['1MS', '2MS', '3MS', '4MS', '5MS'])

lag_transformer.fit_transform(df).head(10)

date,y,y_lag_1MS,y_lag_2MS,y_lag_3MS,y_lag_4MS,y_lag_5MS
1992-01-01,146376,,,,,
1992-02-01,147079,146376.0,,,,
1992-03-01,159336,147079.0,146376.0,,,
1992-04-01,163669,159336.0,147079.0,146376.0,,
1992-05-01,170068,163669.0,159336.0,147079.0,146376.0,
1992-06-01,168663,170068.0,163669.0,159336.0,147079.0,146376.0
1992-07-01,169890,168663.0,170068.0,163669.0,159336.0,147079.0
1992-08-01,170364,169890.0,168663.0,170068.0,163669.0,159336.0
1992-09-01,164617,170364.0,169890.0,168663.0,170068.0,163669.0
1992-10-01,173655,164617.0,170364.0,169890.0,168663.0,170068.0
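For intuition, a table of the same shape can be reproduced with plain pandas by shifting the index with `freq` and joining the result back onto the original dates. This is only a sketch of the idea and not a description of how Feature-engine works internally. The frame below reuses the first four values of `y` shown above; the loop and the names `lags` and `features` are my own.

```python
import pandas as pd

# First four observations of `y`, copied from the output above.
toy = pd.DataFrame(
    {"y": [146376, 147079, 159336, 163669]},
    index=pd.date_range("1992-01-01", periods=4, freq="MS"),
)
toy.index.name = "date"

lags = ["1MS", "2MS"]
features = toy.copy()
for lag in lags:
    # Move the dates forward by the lag, then name the column like the output above.
    shifted = toy["y"].shift(freq=lag).rename(f"y_lag_{lag}")
    # A left join keeps only the original dates and fills the unmatched ones with NaN.
    features = features.join(shifted, how="left")

print(features)
```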
