
## Order Items–Products Mart – Implementation

To join order-level and product-level data without breaking the order grain, a dedicated **Order Items–Products Mart** was created.

* Identified that the **products table** operates at the **product_id grain**, making it the only core dimension not directly aligned with order-level datasets.

* Combined the **order_items** and **products** tables to bridge the grain mismatch between orders and products.

* Engineered **product-level features** (product volume, derived from length, height, and width) before the join, so each feature is computed once per product and carried onto every line item that references it.

* Aggregated product-level features to the **order_id grain** through the order items table, so the output has exactly one row per order and can be joined to other order-level tables without duplicating rows.

* Exposed enriched order–product features for downstream use in the final master table and modeling workflows.
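The grain mismatch can be illustrated with toy data. This is a minimal sketch, not the project's pipeline: the frames `orders`, `order_items`, and `products` and all their values are hypothetical. Attaching product attributes through line items and then joining to orders yields one row per line item, while aggregating to `order_id` first keeps one row per order.

```python
import pandas as pd

# Hypothetical toy tables, one per grain
orders = pd.DataFrame({"order_id": ["o1", "o2"], "status": ["delivered", "shipped"]})
order_items = pd.DataFrame({
    "order_id": ["o1", "o1", "o2"],
    "product_id": ["p1", "p2", "p1"],
})
products = pd.DataFrame({"product_id": ["p1", "p2"], "weight_g": [800.0, 1000.0]})

# Grain checks: products is unique per product_id, line items are not unique per order_id
assert products["product_id"].is_unique
assert not order_items["order_id"].is_unique

# Naive approach: attach products to line items, then join to orders -> one row per line item
naive = orders.merge(
    order_items.merge(products, on="product_id", how="left"), on="order_id", how="left"
)

# Mart approach: aggregate to the order_id grain first, then join -> one row per order
rollup = (
    order_items.merge(products, on="product_id", how="left", validate="many_to_one")
    .groupby("order_id", as_index=False)
    .agg(avg_weight=("weight_g", "mean"), n_products=("product_id", "nunique"))
)
mart = orders.merge(rollup, on="order_id", how="left", validate="one_to_one")

print(len(orders), len(naive), len(mart))  # 2 3 2
```

The `validate="one_to_one"` check on the final join fails loudly if the rollup ever stops being unique per `order_id`.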


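```python
import pandas as pd

# Line items at the order/seller/product combination grain (produced upstream)
int_order_items = pd.read_csv("../Processed Data/int_combo_level_totals.csv")
# Cleaned product dimension: one row per product_id
prd_products = pd.read_csv("../Processed Data/prd_products.csv")
```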

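```python
# Volume in cm³ from the three package dimensions
prd_products["volume"] = (
    prd_products["product_length_cm"]
    * prd_products["product_height_cm"]
    * prd_products["product_width_cm"]
)

# Drop the raw dimensions and the other category columns;
# product_category_name_english is kept as the category label
prd_products = prd_products.drop(
    columns=[
        "product_length_cm",
        "product_height_cm",
        "product_width_cm",
        "product_category_name",
        "final_product_category",
    ]
)

# Products with any missing dimension get volume 0 (this pulls order-level means toward 0)
prd_products["volume"] = prd_products["volume"].fillna(0)
```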
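Filling missing volumes with 0 keeps every product in the join, but that 0 then enters the order-level mean. A hypothetical two-product order (all values invented for illustration) shows the difference between filling with 0 and leaving the value as NaN, which pandas' `mean` skips by default:

```python
import numpy as np
import pandas as pd

# Hypothetical line items for a single order; the second product has no dimensions
items = pd.DataFrame({
    "order_id": ["o1", "o1"],
    "product_length_cm": [10.0, np.nan],
    "product_height_cm": [12.0, np.nan],
    "product_width_cm": [15.0, np.nan],
})

volume = items["product_length_cm"] * items["product_height_cm"] * items["product_width_cm"]

# Option 1: fill missing volume with 0 (as in the cell above)
filled = items.assign(volume=volume.fillna(0)).groupby("order_id")["volume"].mean()

# Option 2: keep NaN so the mean uses only products with known dimensions
kept = items.assign(volume=volume).groupby("order_id")["volume"].mean()

print(filled["o1"], kept["o1"])  # 900.0 1800.0
```

Which option fits depends on how downstream models should treat unknown dimensions; `max_volume` is unaffected here, but `avg_volume` halves under the fill-with-0 choice.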


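```python
# Left join keeps every line item; validate= raises if product_id is not unique in prd_products
int_order_items = int_order_items.merge(
    prd_products, on="product_id", how="left", validate="many_to_one"
)
```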
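The left join assumes the product table has exactly one row per `product_id`; a duplicate would silently multiply every line item that references it. pandas can enforce this with the `validate` argument of `DataFrame.merge`, which raises `pandas.errors.MergeError` when the check fails. A sketch with hypothetical data:

```python
import pandas as pd
from pandas.errors import MergeError

line_items = pd.DataFrame({"order_id": ["o1", "o1"], "product_id": ["p1", "p2"]})
# Hypothetical product table with an accidental duplicate row for p1
products_dup = pd.DataFrame({
    "product_id": ["p1", "p1", "p2"],
    "weight_g": [800.0, 810.0, 1000.0],
})

try:
    line_items.merge(products_dup, on="product_id", how="left", validate="many_to_one")
    caught = False
except MergeError as exc:
    caught = True
    print("join rejected:", exc)

# Without validation the duplicate fans out silently: 2 line items become 3 rows
unchecked = line_items.merge(products_dup, on="product_id", how="left")
print(len(unchecked))  # 3
```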


```python
def safe_mode(s):
    """Most frequent non-null value; 'unknown' if the group has no values."""
    s = s.dropna()
    return s.mode().iloc[0] if not s.empty else "unknown"


# One row per order_id
order_rollup = (
    int_order_items.groupby("order_id", as_index=False)
    .agg(
        # category/product features
        n_distinct_categories=("product_category_name_english", "nunique"),
        main_category=("product_category_name_english", safe_mode),
        avg_photos_per_product=("product_photos_qty", "mean"),
        avg_desc_length=("product_description_length", "mean"),
        avg_weight=("product_weight_g", "mean"),
        avg_volume=("volume", "mean"),
        max_weight=("product_weight_g", "max"),
        max_volume=("volume", "max"),

        # order-item metrics (min/max/mean are taken over product lines, unweighted by quantity)
        total_price=("total_price", "sum"),
        min_price=("avg_price", "min"),
        max_price=("avg_price", "max"),
        avg_price=("avg_price", "mean"),

        total_freight_value=("total_freight_value", "sum"),
        avg_freight_value=("avg_freight_value", "mean"),

        total_order_value=("total_order_value", "sum"),
        avg_order_value=("total_order_value", "mean"),
        min_order_value=("total_order_value", "min"),
        max_order_value=("total_order_value", "max"),

        total_items=("total_items", "sum"),
        min_items=("total_items", "min"),
        max_items=("total_items", "max"),
    )
)
```
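`safe_mode` relies on `Series.mode`, which returns all tied modes in sorted order, so `.iloc[0]` breaks ties alphabetically; an all-missing group falls back to `"unknown"`. A quick check on hypothetical category values:

```python
import numpy as np
import pandas as pd

def safe_mode(s):
    # Most frequent non-null value; ties resolve to the first value in sorted order
    s = s.dropna()
    return s.mode().iloc[0] if not s.empty else "unknown"

tie = pd.Series(["toys", "health_beauty"])              # one of each -> tie
majority = pd.Series(["toys", "toys", "bed_bath_table"])
all_missing = pd.Series([np.nan, np.nan], dtype="object")

print(safe_mode(tie), safe_mode(majority), safe_mode(all_missing))
# health_beauty toys unknown
```

The tie-break is deterministic but arbitrary: for an order with one line in each of two categories, `main_category` is simply the alphabetically first one.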


```python
order_rollup.to_csv("../Processed Data/prd_Products_OrderItems.csv", index=False)
```

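## Validation

The rollup is spot-checked on a single order with three product lines (`8272b63d03f5f79c56e9e4120aec44ef`): its joined line items, the matching rows in the product table, and the resulting order-level row.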

```python
# Joined line items for the sample order
int_order_items.loc[int_order_items["order_id"] == "8272b63d03f5f79c56e9e4120aec44ef"]
```

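| | order_id | seller_id | product_id | total_price | avg_price | total_freight_value | avg_freight_value | total_order_value | total_items | product_name_length | product_description_length | product_photos_qty | product_weight_g | product_category_name_english | volume |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 52027 | 8272b63d03f5f79c56e9e4120aec44ef | 2709af9587499e95e803a6498a5a56e9 | 05b515fdc76e888aada3c6d66c201dff | 12.0 | 1.2 | 78.9 | 7.89 | 90.9 | 10 | 45.0 | 231.0 | 3.0 | 800.0 | health_beauty | 1260.0 |
| 52028 | 8272b63d03f5f79c56e9e4120aec44ef | 2709af9587499e95e803a6498a5a56e9 | 270516a3f41dc035aa87d220228f844c | 12.0 | 1.2 | 78.9 | 7.89 | 90.9 | 10 | 45.0 | 232.0 | 3.0 | 800.0 | health_beauty | 1260.0 |
| 52029 | 8272b63d03f5f79c56e9e4120aec44ef | 2709af9587499e95e803a6498a5a56e9 | 79ce45dbc2ea29b22b5a261bbb7b7ee7 | 7.8 | 7.8 | 6.57 | 6.57 | 14.37 | 1 | 27.0 | 152.0 | 2.0 | 1000.0 | health_beauty | 1800.0 |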
52029,8272b63d03f5f79c56e9e4120aec44ef,2709af9587499e95e803a6498a5a56e9,79ce45dbc2ea29b22b5a261bbb7b7ee7,7.8,7.8,6.57,6.57,14.37,1,27.0,152.0,2.0,1000.0,health_beauty,1800.0


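```python
# Product IDs in the sample order, then their rows in the product table
sample_items = int_order_items.loc[
    int_order_items["order_id"] == "8272b63d03f5f79c56e9e4120aec44ef"
]
sample_product_ids = sample_items["product_id"].tolist()

prd_products.loc[prd_products["product_id"].isin(sample_product_ids)]
```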

| | product_id | product_name_length | product_description_length | product_photos_qty | product_weight_g | product_category_name_english | volume |
|---|---|---|---|---|---|---|---|
| 1962 | 79ce45dbc2ea29b22b5a261bbb7b7ee7 | 27.0 | 152.0 | 2.0 | 1000.0 | health_beauty | 1800.0 |
| 2742 | 270516a3f41dc035aa87d220228f844c | 45.0 | 232.0 | 3.0 | 800.0 | health_beauty | 1260.0 |
| 16218 | 05b515fdc76e888aada3c6d66c201dff | 45.0 | 231.0 | 3.0 | 800.0 | health_beauty | 1260.0 |


```python
# Resulting order-level row for the sample order
order_rollup.loc[order_rollup["order_id"] == "8272b63d03f5f79c56e9e4120aec44ef"]
```

| | order_id | n_distinct_categories | main_category | avg_photos_per_product | avg_desc_length | avg_weight | avg_volume | max_weight | max_volume | total_price | ... | avg_price | total_freight_value | avg_freight_value | total_order_value | avg_order_value | min_order_value | max_order_value | total_items | min_items | max_items |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 50137 | 8272b63d03f5f79c56e9e4120aec44ef | 1 | health_beauty | 2.666667 | 205.0 | 866.666667 | 1440.0 | 1000.0 | 1800.0 | 31.8 | ... | 3.4 | 164.37 | 7.45 | 196.17 | 65.39 | 14.37 | 90.9 | 21 | 1 | 10 |
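The visual spot check can also be written as assertions. This sketch copies the three line items of the sample order from the validation output above (only the columns needed), recomputes the order-level values with plain pandas, and compares them with the reported rollup row using a small tolerance for the rounding in the printed output. It also makes explicit that `avg_price` is an unweighted mean over product lines, which differs from the quantity-weighted unit price `total_price / total_items`.

```python
import numpy as np
import pandas as pd

# Line items for the sample order, copied from the validation output above
lines = pd.DataFrame({
    "total_price": [12.0, 12.0, 7.8],
    "avg_price": [1.2, 1.2, 7.8],
    "total_freight_value": [78.9, 78.9, 6.57],
    "total_order_value": [90.9, 90.9, 14.37],
    "total_items": [10, 10, 1],
    "product_weight_g": [800.0, 800.0, 1000.0],
    "volume": [1260.0, 1260.0, 1800.0],
})

# Values reported in the order_rollup row for the same order
expected = {
    "total_price": 31.8, "avg_price": 3.4, "total_freight_value": 164.37,
    "total_order_value": 196.17, "avg_order_value": 65.39, "total_items": 21,
    "avg_weight": 866.666667, "avg_volume": 1440.0,
}

recomputed = {
    "total_price": lines["total_price"].sum(),
    "avg_price": lines["avg_price"].mean(),
    "total_freight_value": lines["total_freight_value"].sum(),
    "total_order_value": lines["total_order_value"].sum(),
    "avg_order_value": lines["total_order_value"].mean(),
    "total_items": lines["total_items"].sum(),
    "avg_weight": lines["product_weight_g"].mean(),
    "avg_volume": lines["volume"].mean(),
}

for name, value in expected.items():
    # Absolute tolerance covers the rounding in the printed output
    assert np.isclose(recomputed[name], value, atol=0.01), name

# avg_price is an unweighted mean of per-line prices; the quantity-weighted unit price differs
weighted_unit_price = lines["total_price"].sum() / lines["total_items"].sum()
print(round(recomputed["avg_price"], 2), round(weighted_unit_price, 2))  # 3.4 1.51
```

Against the full DataFrames, the same comparison could be run for a random sample of `order_id` values instead of one hand-picked order.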
