# Feature Engineering

### Goal
Transform raw time-series data into meaningful features that:
- Capture temporal patterns
- Improve model learning
- Increase forecasting accuracy


In [None]:
import pandas as pd
import numpy as np


In [None]:
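# Load the raw sales data and normalize column names to snake_case.
df = pd.read_csv("../data/raw/sales.csv")
df.columns = df.columns.str.lower().str.replace(" ", "_")

# Parse order dates and sort chronologically, so that the lag and rolling
# features computed later follow time order.
df['order_date'] = pd.to_datetime(df['order_date'])
df = df.sort_values('order_date')
df.head()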


In [None]:
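# Calendar features derived from the order date.
df['year'] = df['order_date'].dt.year
df['month'] = df['order_date'].dt.month
df['week'] = df['order_date'].dt.isocalendar().week  # ISO 8601 week number
df['day'] = df['order_date'].dt.day                  # day of the month
df['day_of_week'] = df['order_date'].dt.dayofweek    # Monday=0 ... Sunday=6

df.head()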


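Time-based features tell the model where each observation sits in the calendar, which allows it to learn:
- Seasonal patterns (from `month` and `week`)
- Weekly and monthly behavior (from `day_of_week` and `day`)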
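
As a quick check of what these accessors return, the sketch below applies them to three hand-picked dates. The `dates` frame is illustrative only and is not taken from the sales data.

```python
import pandas as pd

# Illustrative dates only; not taken from sales.csv.
dates = pd.DataFrame({
    "order_date": pd.to_datetime(["2023-01-01", "2024-01-01", "2024-01-07"])
})

dates["month"] = dates["order_date"].dt.month
dates["week"] = dates["order_date"].dt.isocalendar().week  # ISO 8601 week number
dates["day_of_week"] = dates["order_date"].dt.dayofweek    # Monday=0 ... Sunday=6

print(dates)
```

2023-01-01 is a Sunday and belongs to ISO week 52 of 2022, so `year` and `week` read together can be misleading around year boundaries.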


In [None]:
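# Lag features: the sales value from `lag` rows earlier.
# shift() moves by rows, not by calendar days, so "lag 7" means "7 days ago"
# only if the data has exactly one row per day.
lags = [1, 7, 14, 30]

for lag in lags:
    df[f'sales_lag_{lag}'] = df['sales'].shift(lag)

df.head(10)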
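
Whether `sales.csv` holds one row per day is not stated here. If it holds one row per order instead, a possible approach is to aggregate to daily totals before shifting. The sketch below uses a hypothetical `toy` frame to show the idea; it is not the notebook's pipeline.

```python
import pandas as pd

# Hypothetical data: two orders on the first day, one on each later day.
toy = pd.DataFrame({
    "order_date": pd.to_datetime(
        ["2024-01-01", "2024-01-01", "2024-01-02", "2024-01-03"]
    ),
    "sales": [10.0, 5.0, 20.0, 30.0],
})

# Collapse to one row per calendar day so that shift(1) means "previous day".
daily = toy.groupby("order_date", as_index=False)["sales"].sum()
daily["sales_lag_1"] = daily["sales"].shift(1)

print(daily)
```

This still assumes there are no missing calendar days; gaps in the dates would need to be filled (for example by reindexing to a full daily range) before row-based lags match day-based lags.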


In [None]:
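# Rolling statistics over the last `w` rows. Each window includes the current
# row, and the first w-1 rows are NaN because the window is not yet full.
windows = [7, 14, 30]

for w in windows:
    df[f'sales_roll_mean_{w}'] = df['sales'].rolling(w).mean()  # local level
    df[f'sales_roll_std_{w}'] = df['sales'].rolling(w).std()    # local variability

df.head(15)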
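
Because `rolling(w)` includes the current row, a rolling mean of `sales` contains the very value a forecasting model is usually asked to predict. One common way to avoid this, shown here as a sketch on toy values and not as the notebook's method, is to shift by one row before rolling so each window ends at the previous observation.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], name="sales")  # toy values

# Window includes the current row.
roll_incl = s.rolling(3).mean()

# Window ends at the previous row: shift first, then roll.
roll_excl = s.shift(1).rolling(3).mean()

print(pd.DataFrame({"sales": s, "incl": roll_incl, "excl": roll_excl}))
```

The shifted variant produces one extra leading NaN per window, which matters when counting how many rows survive a later `dropna()`.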


In [None]:
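# Drop rows with missing values (the leading rows where lags and rolling
# windows are undefined). Note that dropna() removes a row if ANY column is NaN.
df_fe = df.dropna().reset_index(drop=True)
df_fe.head()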
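
Since a bare `dropna()` also removes rows whose raw columns contain gaps, it can be worth restricting the check to the engineered columns. The sketch below assumes the `sales_lag_` and `sales_roll_` prefixes used above; the `region` column is hypothetical and only serves to show the difference.

```python
import numpy as np
import pandas as pd

# Toy frame: 'region' is a hypothetical raw column with a gap of its own.
toy = pd.DataFrame({
    "sales": [10.0, 20.0, 30.0, 40.0],
    "region": ["east", np.nan, "west", "east"],
})
toy["sales_lag_1"] = toy["sales"].shift(1)

# Engineered columns are identified by their name prefix.
feature_cols = [
    c for c in toy.columns
    if c.startswith("sales_lag_") or c.startswith("sales_roll_")
]

dropped_any = toy.dropna()                        # drops rows 0 and 1
dropped_subset = toy.dropna(subset=feature_cols)  # drops row 0 only

print(len(dropped_any), len(dropped_subset))
```

Which variant is appropriate depends on whether the downstream model can handle missing values in the raw columns.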


In [None]:
# Save the feature table; index=False keeps the row index out of the file.
df_fe.to_csv("../data/processed/sales_features.csv", index=False)
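
CSV does not store column types, so `order_date` comes back as text unless it is parsed again on load. The sketch below demonstrates this with an in-memory buffer standing in for the file on disk; the `toy` frame is illustrative.

```python
import io
import pandas as pd

toy = pd.DataFrame({
    "order_date": pd.to_datetime(["2024-01-01", "2024-01-02"]),
    "sales": [10.0, 20.0],
})

buffer = io.StringIO()  # stands in for the CSV file on disk
toy.to_csv(buffer, index=False)

# Read the same content twice: without and with date parsing.
buffer.seek(0)
plain = pd.read_csv(buffer)
buffer.seek(0)
parsed = pd.read_csv(buffer, parse_dates=["order_date"])

print(plain["order_date"].dtype, parsed["order_date"].dtype)
```

When reloading the processed file in a later notebook, passing `parse_dates=["order_date"]` to `pd.read_csv` restores the datetime column.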


## Feature Engineering Summary

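- Created temporal features (year, month, week, day, day of week)
- Engineered lag features (1, 7, 14, and 30 rows back) to capture historical dependency
- Added rolling means and standard deviations (7-, 14-, and 30-row windows) to capture local level and variability
- Removed rows with missing values, including those introduced by shifting and by incomplete rolling windows
- Saved the result to `../data/processed/sales_features.csv`, ready for machine learning models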
