# Feature Engineering for Demand Forecasting

## Objective
- Create interpretable time-series features
- Capture demand patterns, trends, and volatility
- Prepare a model-ready dataset for forecasting and inventory decisions


In [1]:
import pandas as pd
import numpy as np

# Weekly sales per store-product pair; parse `week` as datetime so it sorts chronologically
weekly_df = pd.read_csv(
    "../data/processed/weekly_time_series.csv",
    parse_dates=["week"]
)

weekly_df.head()


|   | store_id | product_id | week | weekly_units_sold | weekly_units_ordered | avg_inventory_level | avg_price | avg_discount | holiday_promotion |
|---|---|---|---|---|---|---|---|---|---|
| 0 | S001 | P0001 | 2021-12-27 | 208 | 159 | 173.5 | 30.725 | 15.0 | 0 |
| 1 | S001 | P0001 | 2022-01-03 | 706 | 977 | 210.571429 | 60.3 | 12.857143 | 1 |
| 2 | S001 | P0001 | 2022-01-10 | 686 | 1031 | 182.285714 | 52.84 | 10.0 | 1 |
| 3 | S001 | P0001 | 2022-01-17 | 1142 | 1051 | 293.714286 | 59.672857 | 7.857143 | 1 |
| 4 | S001 | P0001 | 2022-01-24 | 685 | 740 | 269.0 | 65.421429 | 11.428571 | 1 |


In [2]:
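# Order rows chronologically within each store-product series;
# the shift- and rolling-based features below depend on this order
weekly_df = weekly_df.sort_values(
    by=["store_id", "product_id", "week"]
)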
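Because `shift` and `rolling` count rows rather than calendar weeks, `lag_1_units_sold` is truly last week's sales only if every store-product series has exactly one row per consecutive week. A minimal check, assuming rows are meant to be 7 days apart (as the `week` dates in the `head()` output are), could look like this; `find_week_gaps` and the demo data are hypothetical, not part of the original notebook.

```python
import pandas as pd

def find_week_gaps(df: pd.DataFrame) -> pd.DataFrame:
    """Return rows whose previous row in the same series is not exactly 7 days earlier."""
    ordered = df.sort_values(["store_id", "product_id", "week"])
    step = ordered.groupby(["store_id", "product_id"])["week"].diff()
    # The first row of each series has no predecessor (NaT), so it is not a gap.
    return ordered[step.notna() & (step != pd.Timedelta(weeks=1))]

# Hypothetical data: S002 skips the week of 2022-01-17.
demo = pd.DataFrame({
    "store_id": ["S001"] * 4 + ["S002"] * 3,
    "product_id": ["P0001"] * 7,
    "week": pd.to_datetime([
        "2022-01-03", "2022-01-10", "2022-01-17", "2022-01-24",
        "2022-01-03", "2022-01-10", "2022-01-24",
    ]),
})

gaps = find_week_gaps(demo)
print(gaps)
```

Calling `find_week_gaps(weekly_df)` and getting an empty frame would confirm that row-based lags line up with calendar lags; any rows returned mark missing weeks that would have to be reindexed in before the lag and rolling features are computed.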


In [3]:
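# Sales from 1, 2, and 4 weeks earlier, shifted within each store-product
# series so a lag never pulls values from a different series
for lag in [1, 2, 4]:
    weekly_df[f"lag_{lag}_units_sold"] = (
        weekly_df
        .groupby(["store_id", "product_id"])["weekly_units_sold"]
        .shift(lag)
    )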
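Grouping before `shift` is what keeps one series' history out of another's lags. The toy comparison below (hypothetical data with two stores) contrasts the grouped lag with a plain `Series.shift`, which ignores series boundaries.

```python
import pandas as pd

toy = pd.DataFrame({
    "store_id": ["S001"] * 3 + ["S002"] * 3,
    "product_id": ["P0001"] * 6,
    "weekly_units_sold": [10, 20, 30, 100, 200, 300],
})

grouped = toy.groupby(["store_id", "product_id"])["weekly_units_sold"]
toy["lag_1_units_sold"] = grouped.shift(1)              # restarts at each series
toy["naive_lag_1"] = toy["weekly_units_sold"].shift(1)  # leaks across series
print(toy)
```

In row 3 (the first S002 week) the grouped lag is NaN, while the naive lag carries over 30 units from S001.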


In [4]:
# Rolling means within each store-product series. The window ends at the
# current row, so week t's own sales are part of its average.
units_by_series = weekly_df.groupby(["store_id", "product_id"])["weekly_units_sold"]

weekly_df["rolling_4wk_avg"] = units_by_series.transform(lambda x: x.rolling(4).mean())
weekly_df["rolling_8wk_avg"] = units_by_series.transform(lambda x: x.rolling(8).mean())
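A `rolling(4)` window ending at the current row includes that week's `weekly_units_sold`. That is harmless if the model predicts a later week, but if the target is the same week's sales, the feature contains the answer. Assuming same-week prediction, a leakage-safe variant shifts by one week before rolling; `rolling_4wk_avg_prior` is a hypothetical column name used only for illustration.

```python
import pandas as pd

# Hypothetical single series with steadily rising sales.
toy = pd.DataFrame({
    "store_id": ["S001"] * 6,
    "product_id": ["P0001"] * 6,
    "weekly_units_sold": [10, 20, 30, 40, 50, 60],
})
grouped = toy.groupby(["store_id", "product_id"])["weekly_units_sold"]

# Window includes the current week (same construction as the notebook cell).
toy["rolling_4wk_avg"] = grouped.transform(lambda x: x.rolling(4).mean())
# Window covers only the four weeks before the current week.
toy["rolling_4wk_avg_prior"] = grouped.transform(lambda x: x.shift(1).rolling(4).mean())
print(toy)
```

At row 4 the original feature averages weeks 1 to 4 (35.0, including the current 50), while the shifted one averages the four preceding weeks (25.0). The shifted version costs one extra warm-up row per series.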


In [5]:
# Volatility: sample standard deviation of the last 4 weeks (current week included)
weekly_df["rolling_4wk_std"] = (
    weekly_df
    .groupby(["store_id", "product_id"])["weekly_units_sold"]
    .transform(lambda x: x.rolling(4).std())
)
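`Rolling.std()` defaults to the sample standard deviation (`ddof=1`), so with only four values a single spike moves it a lot. Worked by hand for the window [10, 10, 10, 20]: the mean is 12.5, the squared deviations sum to 3 × 6.25 + 56.25 = 75, the sample variance is 75 / 3 = 25 (std 5.0), and the population variance is 75 / 4 = 18.75 (std about 4.33). The sketch below reproduces this on a hypothetical series.

```python
import pandas as pd

# Hypothetical weekly sales: flat for four weeks, then a spike.
units = pd.Series([10, 10, 10, 10, 20])

sample_std = units.rolling(4).std()            # pandas default, ddof=1 (divide by n - 1)
population_std = units.rolling(4).std(ddof=0)  # divide by n instead
print(sample_std.round(3).tolist())
print(population_std.round(3).tolist())
```

A flat window gives 0, which is why this column reads as "how erratic has demand been lately" rather than as a level.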


In [6]:
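# Momentum: fractional change from the previous week in the same series.
# fill_method=None avoids pandas' deprecated forward-fill default. A previous
# week with zero sales yields inf (or NaN if the current week is also zero).
weekly_df["week_over_week_change"] = (
    weekly_df
    .groupby(["store_id", "product_id"])["weekly_units_sold"]
    .pct_change(fill_method=None)
)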
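`pct_change` computes `current / previous - 1`, so growth from a zero-sales week is infinite. `isnull()` does not count `inf`, so such values would pass the NaN check and reach a model unnoticed. One option, sketched here on hypothetical data, is to map infinities to NaN and handle them alongside the other missing values; whether that is appropriate depends on how zero-sales weeks should be treated, which the notebook does not specify.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "store_id": ["S001"] * 4,
    "product_id": ["P0001"] * 4,
    "weekly_units_sold": [50, 0, 25, 50],
})
grouped = toy.groupby(["store_id", "product_id"])["weekly_units_sold"]

change = grouped.pct_change(fill_method=None)
print(change.tolist())  # third value is inf: 25 units after a zero week

# Treat undefined growth as missing instead of infinite.
change_clean = change.replace([np.inf, -np.inf], np.nan)
print(change_clean.tolist())
```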


In [7]:
weekly_df.isnull().sum()


store_id                   0
product_id                 0
week                       0
weekly_units_sold          0
weekly_units_ordered       0
avg_inventory_level        0
avg_price                  0
avg_discount               0
holiday_promotion          0
lag_1_units_sold         100
lag_2_units_sold         200
lag_4_units_sold         400
rolling_4wk_avg          300
rolling_8wk_avg          700
rolling_4wk_std          300
week_over_week_change    100
dtype: int64
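The null counts follow directly from the window sizes. `shift(k)` leaves k NaNs at the start of each series; `rolling(w)` needs w observations by default (`min_periods` equals the window), so it leaves w - 1; `pct_change` leaves 1. With 100 series (inferred from the 100 NaNs in `lag_1_units_sold`, one per series), this reproduces every count above. `expected_null_counts` is a hypothetical helper that assumes no NaNs in the raw sales and at least 8 weeks per series.

```python
def expected_null_counts(n_series: int) -> dict:
    """Leading NaNs per engineered column, summed over all series."""
    per_series = {f"lag_{k}_units_sold": k for k in (1, 2, 4)}  # shift(k) -> k NaNs
    windows = {"rolling_4wk_avg": 4, "rolling_8wk_avg": 8, "rolling_4wk_std": 4}
    per_series.update({name: w - 1 for name, w in windows.items()})  # rolling(w) -> w - 1
    per_series["week_over_week_change"] = 1  # first week has no predecessor
    return {name: n_series * n for name, n in per_series.items()}

expected_nulls = expected_null_counts(100)
print(expected_nulls)
```

Comparing this dictionary with `weekly_df.isnull().sum()` is a quick sanity check: a mismatch would point to missing values in the raw data or to series shorter than 8 weeks.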

In [8]:
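# Drop each series' warm-up weeks, where lag or rolling features are undefined
# (the 8-week average needs the most history)
feature_cols = [
    "lag_1_units_sold",
    "lag_2_units_sold",
    "lag_4_units_sold",
    "rolling_4wk_avg",
    "rolling_8wk_avg",
    "rolling_4wk_std",
    "week_over_week_change"
]

weekly_df = weekly_df.dropna(subset=feature_cols)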
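The longest window decides how many rows survive: each series loses its first 7 weeks, and every shorter lag or window has its leading NaNs inside those same 7 rows. That matches the `info()` output, where 9,900 rows remain after the 700 rows flagged by `rolling_8wk_avg` are dropped (100 series × 7 warm-up weeks). The toy check below uses one hypothetical 10-week series.

```python
import pandas as pd

# One hypothetical store-product series of 10 consecutive weeks.
toy = pd.DataFrame({
    "store_id": ["S001"] * 10,
    "product_id": ["P0001"] * 10,
    "weekly_units_sold": [12, 15, 11, 18, 20, 17, 22, 25, 19, 23],
})
grouped = toy.groupby(["store_id", "product_id"])["weekly_units_sold"]
toy["lag_4_units_sold"] = grouped.shift(4)
toy["rolling_4wk_std"] = grouped.transform(lambda x: x.rolling(4).std())
toy["rolling_8wk_avg"] = grouped.transform(lambda x: x.rolling(8).mean())
toy["week_over_week_change"] = grouped.pct_change(fill_method=None)

# Dropping on the longest window alone also clears every shorter window's NaNs.
model_ready = toy.dropna(subset=["rolling_8wk_avg"])
print(len(model_ready), int(model_ready.isnull().sum().sum()))
```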


In [9]:
# Check column dtypes and confirm no NaNs remain after the drop
weekly_df.info()


<class 'pandas.core.frame.DataFrame'>
Index: 9900 entries, 7 to 10599
Data columns (total 16 columns):
 #   Column                 Non-Null Count  Dtype         
---  ------                 --------------  -----         
 0   store_id               9900 non-null   object        
 1   product_id             9900 non-null   object        
 2   week                   9900 non-null   datetime64[ns]
 3   weekly_units_sold      9900 non-null   int64         
 4   weekly_units_ordered   9900 non-null   int64         
 5   avg_inventory_level    9900 non-null   float64       
 6   avg_price              9900 non-null   float64       
 7   avg_discount           9900 non-null   float64       
 8   holiday_promotion      9900 non-null   int64         
 9   lag_1_units_sold       9900 non-null   float64       
 10  lag_2_units_sold       9900 non-null   float64       
 11  lag_4_units_sold       9900 non-null   float64       
 12  rolling_4wk_avg        9900 non-null   float64       
 13  rolling

In [10]:
# Save without the index, which is no longer 0..n-1 after dropna
weekly_df.to_csv(
    "../data/processed/feature_engineered_data.csv",
    index=False
)
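`index=False` matters here because the index after `dropna` starts at 7 and has gaps. With the default `index=True`, pandas writes the index as an unnamed first column, which comes back as `Unnamed: 0` on reload. The hypothetical frame below shows the difference using an in-memory buffer.

```python
import io
import pandas as pd

# Hypothetical frame with a non-default index, like weekly_df after dropna.
df = pd.DataFrame(
    {"store_id": ["S001", "S002"], "weekly_units_sold": [208, 706]},
    index=[7, 8],
)

buffer = io.StringIO()
df.to_csv(buffer)                # default index=True writes the index as an unnamed column
buffer.seek(0)
cols_with_index = pd.read_csv(buffer).columns.tolist()

buffer = io.StringIO()
df.to_csv(buffer, index=False)   # as in the cell above
buffer.seek(0)
cols_without_index = pd.read_csv(buffer).columns.tolist()

print(cols_with_index)     # ['Unnamed: 0', 'store_id', 'weekly_units_sold']
print(cols_without_index)  # ['store_id', 'weekly_units_sold']
```

CSV stores no types, so a downstream notebook would still need `parse_dates=["week"]` when reading the saved file, as the loading cell does.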


## Feature Engineering Summary

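- Lag features (1, 2, and 4 weeks) capture recent demand behavior
- Rolling 4- and 8-week averages smooth short- and medium-term demand
- The rolling 4-week standard deviation measures demand volatility, which drives stockout risk and safety-stock needs
- Week-over-week percentage change (momentum) shows whether demand is accelerating or slowing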

These features are simple, interpretable summaries of recent demand history, so forecasts built on them are easier to explain
and map directly onto the quantities inventory planners already use.
