## Key Concepts in Time Series Feature Engineering

1. **Lag Features**: These are past values of the series used as predictors. For instance, when predicting today's sales, yesterday's sales (a lag of 1) is often informative because consecutive observations tend to be correlated.

2. **Window Features**: These summarize values over a fixed window of prior time steps. For example, the average sales over the last week smooths out day-to-day noise and highlights the underlying trend.

3. **Time-based Features**: These are components of the date and time, such as the day of the week, the month, or whether a date falls on a holiday. They let a model capture seasonality and other calendar-driven patterns.


In [2]:
# Lag Features

In [5]:
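import pandas as pd

# Ten days of toy daily sales data
data = {'date': pd.date_range(start='2022-01-01', periods=10, freq='D'),
        'sales': [100, 120, 130, 110, 150, 160, 170, 180, 190, 200]}
df = pd.DataFrame(data)

# Index by date so date attributes such as dayofweek are available on df.index
df.set_index('date', inplace=True)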
df

date,sales
2022-01-01,100
2022-01-02,120
2022-01-03,130
2022-01-04,110
2022-01-05,150
2022-01-06,160
2022-01-07,170
2022-01-08,180
2022-01-09,190
2022-01-10,200


In [6]:
# shift(k) moves values down k rows, so each row holds the sales from k days earlier
df['lag_1'] = df['sales'].shift(1)
df['lag_2'] = df['sales'].shift(2)

In [7]:
df

date,sales,lag_1,lag_2
2022-01-01,100,,
2022-01-02,120,100.0,
2022-01-03,130,120.0,100.0
2022-01-04,110,130.0,120.0
2022-01-05,150,110.0,130.0
2022-01-06,160,150.0,110.0
2022-01-07,170,160.0,150.0
2022-01-08,180,170.0,160.0
2022-01-09,190,180.0,170.0
2022-01-10,200,190.0,180.0
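
The first rows of `lag_1` and `lag_2` are empty because no earlier observations exist for them. The following is a minimal sketch of building several lags in a loop and dropping the incomplete rows before modelling. The helper name `add_lags` and the variable names are hypothetical, and dropping rows is only one option; imputing them may suit other datasets better.

```python
import pandas as pd


def add_lags(frame, column, lags):
    """Return a copy of `frame` with one `lag_k` column per k in `lags`.

    `add_lags` is a hypothetical helper written for this example.
    """
    out = frame.copy()
    for k in lags:
        out[f"lag_{k}"] = out[column].shift(k)
    return out


# Same toy sales series as above, rebuilt so the example runs on its own
df_demo = pd.DataFrame(
    {"sales": [100, 120, 130, 110, 150, 160, 170, 180, 190, 200]},
    index=pd.date_range(start="2022-01-01", periods=10, freq="D", name="date"),
)

lagged = add_lags(df_demo, "sales", lags=[1, 2])

# Rows without a full lag history contain NaN; drop them before fitting a model
model_ready = lagged.dropna()
print(model_ready.head())
```

With lags of 1 and 2, the first two rows are removed, so `model_ready` starts on 2022-01-03.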


In [8]:
# Rolling Window Features

In [10]:
# Each 3-day window ends at, and includes, the current row
df['rolling_mean_3'] = df['sales'].rolling(window=3).mean()
df['rolling_std_3'] = df['sales'].rolling(window=3).std()
print(df)

            sales  lag_1  lag_2  rolling_mean_3  rolling_std_3
date                                                          
2022-01-01    100    NaN    NaN             NaN            NaN
2022-01-02    120  100.0    NaN             NaN            NaN
2022-01-03    130  120.0  100.0      116.666667      15.275252
2022-01-04    110  130.0  120.0      120.000000      10.000000
2022-01-05    150  110.0  130.0      130.000000      20.000000
2022-01-06    160  150.0  110.0      140.000000      26.457513
2022-01-07    170  160.0  150.0      160.000000      10.000000
2022-01-08    180  170.0  160.0      170.000000      10.000000
2022-01-09    190  180.0  170.0      180.000000      10.000000
2022-01-10    200  190.0  180.0      190.000000      10.000000
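
Note that `rolling(window=3)` includes the current day's value in each window. If the target is the same day's sales, that leaks the answer into the feature. A minimal sketch of a window built only from prior time steps follows, assuming the target is the current row's `sales`; the name `prior_mean_3` is hypothetical.

```python
import pandas as pd

# Same toy sales series as above, rebuilt so the example runs on its own
sales = pd.Series(
    [100, 120, 130, 110, 150, 160, 170, 180, 190, 200],
    index=pd.date_range(start="2022-01-01", periods=10, freq="D"),
    name="sales",
)

# Shift by one step first so each window ends at yesterday, not today
prior_mean_3 = sales.shift(1).rolling(window=3).mean()

comparison = pd.DataFrame({
    "sales": sales,
    "rolling_mean_3": sales.rolling(window=3).mean(),  # includes the current day
    "prior_mean_3": prior_mean_3,                      # previous three days only
})
print(comparison)
```

The shifted version needs one more day of history, so its first three rows are empty instead of two.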


In [18]:
# Time-based Features
# dayofweek runs from 0 (Monday) to 6 (Sunday)
df['day_of_week'] = df.index.dayofweek
df['is_weekend'] = (df['day_of_week'] >= 5).astype(int)

In [19]:
df

date,sales,lag_1,lag_2,rolling_mean_3,rolling_std_3,day_of_week,is_weekend
2022-01-01,100,,,,,5,1
2022-01-02,120,100.0,,,,6,1
2022-01-03,130,120.0,100.0,116.666667,15.275252,0,0
2022-01-04,110,130.0,120.0,120.0,10.0,1,0
2022-01-05,150,110.0,130.0,130.0,20.0,2,0
2022-01-06,160,150.0,110.0,140.0,26.457513,3,0
2022-01-07,170,160.0,150.0,160.0,10.0,4,0
2022-01-08,180,170.0,160.0,170.0,10.0,5,1
2022-01-09,190,180.0,170.0,180.0,10.0,6,1
2022-01-10,200,190.0,180.0,190.0,10.0,0,0
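
The month and holiday features mentioned in the concepts list can be derived the same way. This is a minimal sketch only: the `holidays` list is a hypothetical one-entry calendar made up for illustration, and a real project would load holidays from a source appropriate to its region.

```python
import pandas as pd

# Same date range as the toy data above
dates = pd.date_range(start="2022-01-01", periods=10, freq="D", name="date")
features = pd.DataFrame(index=dates)

# Calendar components read directly from the DatetimeIndex
features["month"] = features.index.month
features["day_of_month"] = features.index.day

# Hypothetical holiday calendar, for illustration only
holidays = pd.to_datetime(["2022-01-01"])
features["is_holiday"] = features.index.isin(holidays).astype(int)

print(features)
```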
