# Feature Engineering for Time Series Forecasting

This notebook engineers features for our time series dataset: lag features and rolling statistics derived from the target variable, which give a model direct access to recent history.

In [None]:
import pandas as pd
import numpy as np

# Load the processed data
data = pd.read_csv('../data/processed/processed_data.csv', parse_dates=['date'], index_col='date')

# Display the first few rows of the dataset
data.head()

## Creating Lag Features

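Lag features are the target variable's values from previous time steps: `lag_k` holds the value observed `k` steps earlier. They expose temporal dependencies to the model as ordinary columns. Because `shift` moves values by row position, this assumes the index is sorted and regularly spaced.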

In [None]:
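def create_lag_features(data, lag=1):
    """Add columns lag_1..lag_<lag>: the target shifted by 1..lag steps."""
    data = data.copy()  # avoid mutating the caller's DataFrame
    for i in range(1, lag + 1):
        data[f'lag_{i}'] = data['target_variable'].shift(i)
    return data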

# Create lag features
data = create_lag_features(data, lag=5)
data.head()
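
To make the effect of `shift` concrete, here is a minimal sketch on a hypothetical four-row series (the values and dates are made up for illustration; only the column name `target_variable` comes from this notebook). Each `lag_k` column starts with `k` missing values, because no earlier observation exists for those rows.

```python
import pandas as pd

# Hypothetical toy data; values and dates are illustrative only.
toy = pd.DataFrame(
    {"target_variable": [10.0, 12.0, 11.0, 15.0]},
    index=pd.date_range("2024-01-01", periods=4, freq="D"),
)

# shift(k) moves each value k rows down, so row t sees the value from t-k.
toy["lag_1"] = toy["target_variable"].shift(1)
toy["lag_2"] = toy["target_variable"].shift(2)

# Count the leading NaNs each lag column introduces.
lag_nan_counts = toy[["lag_1", "lag_2"]].isna().sum().to_dict()
print(toy)
print(lag_nan_counts)
```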

## Creating Rolling Statistics

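Rolling statistics summarize the target over a fixed-size window: the rolling mean tracks the recent level and helps reveal local trends, while the rolling standard deviation tracks recent variability.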

In [None]:
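def create_rolling_statistics(data, window=3):
    """Add rolling mean and std of the target over the previous `window` steps."""
    data = data.copy()  # avoid mutating the caller's DataFrame
    # Shift by one step so each window ends at t-1 and excludes the value being predicted
    past = data['target_variable'].shift(1)
    data[f'rolling_mean_{window}'] = past.rolling(window=window).mean()
    data[f'rolling_std_{window}'] = past.rolling(window=window).std()
    return data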

# Create rolling statistics
data = create_rolling_statistics(data, window=3)
data.head()
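
One point worth checking with rolling features built from the target: a window that ends at the current row includes the very value a forecasting model is asked to predict. The sketch below, on a hypothetical five-value series, compares a window that includes the current row with one restricted to past values by shifting one step first. Whether the past-only form is appropriate depends on the forecasting setup, so treat this as an illustration rather than a prescription.

```python
import pandas as pd

# Hypothetical toy series; values are illustrative only.
toy = pd.Series([10.0, 12.0, 11.0, 15.0, 14.0], name="target_variable")

# Window ends at the current row: the mean at position 2 uses positions 0..2.
current_inclusive = toy.rolling(window=3).mean()

# Shift first: the mean at position 3 uses positions 0..2 (past values only).
past_only = toy.shift(1).rolling(window=3).mean()

comparison = pd.DataFrame(
    {"target": toy, "current_inclusive": current_inclusive, "past_only": past_only}
)
print(comparison)
```

The past-only column is the current-inclusive column moved down by one row, at the cost of one additional leading missing value.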

## Finalizing the Dataset

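The lag and rolling columns are NaN in the first rows, where not enough history exists yet. We finalize the dataset by dropping those rows and saving the result. Note that `dropna` also removes any other row containing a NaN, including rows with values missing from the original data.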

In [None]:
# Drop rows with NaN values (the leading rows that lack enough history)
data = data.dropna()

# Save the enhanced dataset
data.to_csv('../data/processed/enhanced_data.csv')
data.head()
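
As a quick sanity check on how much data the final `dropna` removes, the following sketch rebuilds the same kind of features on a hypothetical ten-row series and counts the dropped rows. It assumes five lags and a 3-step rolling mean over past values, matching the parameters used above; the data itself is made up. The number of rows lost equals the longest look-back, here the fifth lag.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: ten daily observations with no missing values.
toy = pd.DataFrame(
    {"target_variable": np.arange(10, dtype=float)},
    index=pd.date_range("2024-01-01", periods=10, freq="D"),
)

# Five lags: lag_5 is NaN for the first five rows.
for i in range(1, 6):
    toy[f"lag_{i}"] = toy["target_variable"].shift(i)

# 3-step rolling mean over past values: NaN for the first three rows.
toy["rolling_mean_3"] = toy["target_variable"].shift(1).rolling(window=3).mean()

rows_before = len(toy)
clean = toy.dropna()
rows_dropped = rows_before - len(clean)
print(f"dropped {rows_dropped} of {rows_before} rows; first kept date: {clean.index[0].date()}")
```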