# Weekly Aggregation & Time-Series Structuring

## Objective
- Aggregate daily retail data to weekly level
- Align data with inventory planning cadence
- Create a time-series-ready dataset for forecasting


In [1]:
import pandas as pd

df = pd.read_csv(
    "../data/processed/cleaned_data.csv",
    parse_dates=["date"]
)

df.head()


,date,store_id,product_id,category,region,inventory_level,units_sold,units_ordered,price,discount,weather_condition,holiday_promotion,competitor_pricing,seasonality
0,2022-01-01,S001,P0001,Groceries,North,231,127,55,33.5,20,Rainy,0,29.69,Autumn
1,2022-01-01,S001,P0002,Toys,South,204,150,66,63.01,20,Sunny,0,66.16,Autumn
2,2022-01-01,S001,P0003,Toys,West,102,65,51,27.99,10,Sunny,1,31.32,Summer
3,2022-01-01,S001,P0004,Toys,North,469,61,164,32.72,10,Cloudy,1,34.74,Autumn
4,2022-01-01,S001,P0005,Electronics,East,166,14,135,73.64,0,Sunny,0,68.95,Summer


In [3]:
# Label each row with the Monday that starts its week ("W" periods run Monday to Sunday)
df["week"] = df["date"].dt.to_period("W").dt.start_time

df[["date", "week"]].head()


,date,week
0,2022-01-01,2021-12-27
1,2022-01-01,2021-12-27
2,2022-01-01,2021-12-27
3,2022-01-01,2021-12-27
4,2022-01-01,2021-12-27
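
The output above maps Saturday 2022-01-01 to a week starting 2021-12-27. The sketch below, on a few toy dates that are not taken from the dataset, illustrates why: pandas treats the `"W"` frequency as `"W-SUN"`, so each period ends on Sunday and starts on the preceding Monday. The `week_start_alt` line is an assumed equivalent formulation included only for comparison.

```python
import pandas as pd

# Toy dates around the 2021/2022 year boundary (illustrative only)
dates = pd.Series(
    pd.to_datetime(["2022-01-01", "2022-01-02", "2022-01-03", "2022-01-09"])
)

# "W" is an alias for "W-SUN": periods end on Sunday, so they start on Monday
week_start = dates.dt.to_period("W").dt.start_time

# Equivalent arithmetic: step back by the weekday number (Monday = 0)
week_start_alt = dates.dt.normalize() - pd.to_timedelta(dates.dt.weekday, unit="D")

print(pd.DataFrame({"date": dates, "week": week_start, "week_alt": week_start_alt}))
```

Both columns should agree, with the Saturday and Sunday at the start of 2022 assigned to the week that began in late December 2021.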


## Weekly Aggregation Logic

- units_sold: SUM (weekly demand)
- units_ordered: SUM (weekly replenishment)
- inventory_level: MEAN (average stock position)
- price: MEAN (average selling price)
- discount: MEAN (average discount level over the week)
- holiday_promotion: MAX (1 if a promotion occurred on any day of the week, else 0)


In [4]:
weekly_df = (
    df.groupby(["store_id", "product_id", "week"])
      .agg(
          weekly_units_sold=("units_sold", "sum"),
          weekly_units_ordered=("units_ordered", "sum"),
          avg_inventory_level=("inventory_level", "mean"),
          avg_price=("price", "mean"),
          avg_discount=("discount", "mean"),
          holiday_promotion=("holiday_promotion", "max")
      )
      .reset_index()
)

weekly_df.head()


,store_id,product_id,week,weekly_units_sold,weekly_units_ordered,avg_inventory_level,avg_price,avg_discount,holiday_promotion
0,S001,P0001,2021-12-27,208,159,173.5,30.725,15.0,0
1,S001,P0001,2022-01-03,706,977,210.571429,60.3,12.857143,1
2,S001,P0001,2022-01-10,686,1031,182.285714,52.84,10.0,1
3,S001,P0001,2022-01-17,1142,1051,293.714286,59.672857,7.857143,1
4,S001,P0001,2022-01-24,685,740,269.0,65.421429,11.428571,1
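
The first row above (week of 2021-12-27) has much lower sums than the following weeks. Since the data appear to start on Saturday 2022-01-01, that week likely contains only two days. One way to make such partial weeks visible is to count the distinct days behind each weekly row. The sketch below does this on toy data; the `n_days` and `is_partial_week` columns are hypothetical additions, not part of the original notebook.

```python
import pandas as pd

# Toy daily data for one store-product pair: 9 consecutive days from a Saturday
daily = pd.DataFrame({
    "store_id": "S001",
    "product_id": "P0001",
    "date": pd.date_range("2022-01-01", periods=9, freq="D"),
    "units_sold": [10, 20, 30, 40, 50, 60, 70, 80, 90],
})
daily["week"] = daily["date"].dt.to_period("W").dt.start_time

weekly = (
    daily.groupby(["store_id", "product_id", "week"])
         .agg(
             weekly_units_sold=("units_sold", "sum"),
             n_days=("date", "nunique"),  # distinct days observed in the week
         )
         .reset_index()
)

# Flag weeks with fewer than 7 observed days (hypothetical helper column)
weekly["is_partial_week"] = weekly["n_days"] < 7
print(weekly)
```

Whether flagged weeks are dropped, scaled, or kept as they are is a modeling decision; the flag only makes the difference explicit.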


In [5]:
weekly_df.shape


(10600, 9)

In [6]:
df[["store_id", "product_id"]].drop_duplicates().shape


(100, 2)

In [7]:
weekly_df = weekly_df.sort_values(
    by=["store_id", "product_id", "week"]
)


In [8]:
weekly_df.duplicated(
    subset=["store_id", "product_id", "week"]
).sum()


np.int64(0)
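
The duplicate check confirms that keys are unique, but it does not show whether any series has missing weeks. A minimal sketch of a gap check follows, assuming weeks are labelled by their Monday start date as above. The toy table and the `expected` and `n_missing` columns are illustrative, not from the notebook.

```python
import pandas as pd

# Toy weekly table with two series; the second lacks the week of 2022-01-10
weekly = pd.DataFrame({
    "store_id": ["S001"] * 3 + ["S002"] * 2,
    "product_id": ["P0001"] * 5,
    "week": pd.to_datetime(
        ["2022-01-03", "2022-01-10", "2022-01-17", "2022-01-03", "2022-01-17"]
    ),
})

keys = ["store_id", "product_id"]

# Observed weeks per series versus the count implied by its first and last week
summary = weekly.groupby(keys)["week"].agg(["min", "max", "count"])
summary["expected"] = (summary["max"] - summary["min"]).dt.days // 7 + 1
summary["n_missing"] = summary["expected"] - summary["count"]
print(summary)
```

A series with `n_missing` greater than zero has gaps that may need reindexing or imputation before forecasting.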

In [9]:
weekly_df.to_csv(
    "../data/processed/weekly_time_series.csv",
    index=False
)
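
CSV files store dates as text, so the `week` column needs to be parsed again when the file is loaded downstream. The sketch below illustrates the round trip with a small toy frame and an in-memory buffer standing in for the file on disk, which is an assumption made to keep the example self-contained.

```python
import io
import pandas as pd

# Toy weekly frame (values are illustrative)
weekly = pd.DataFrame({
    "store_id": ["S001", "S001"],
    "product_id": ["P0001", "P0001"],
    "week": pd.to_datetime(["2022-01-03", "2022-01-10"]),
    "weekly_units_sold": [10, 20],
})

# Write to an in-memory buffer instead of a file
buffer = io.StringIO()
weekly.to_csv(buffer, index=False)
buffer.seek(0)

# parse_dates restores the datetime dtype that the CSV format does not preserve
reloaded = pd.read_csv(buffer, parse_dates=["week"])
print(reloaded.dtypes)
```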


## Weekly Aggregation Summary

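- Daily records aggregated to one row per store, product, and week (10,600 rows, no duplicate keys)
- 100 store–product time series created
- Each series spans about 106 weeks (10,600 rows / 100 series)
- The first week (starting 2021-12-27) appears to cover only two days of data, so its sums are not directly comparable with full weeks
- Dataset is suitable for demand forecasting and inventory optimization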
