
## Order Items Mart – Implementation


## Business Logic ##

<p align="center">
  <img src="../EDA/OrderItems.png" width="1000"/>
</p>

Based on insights from the EDA, the following transformations were implemented in the **Order Items Mart**:

* Identified the native grain of the `order_items` table as a combination of **order_id, product_id, and seller_id**.

* Aggregated records at this composite level to infer **item quantity**, since the dataset does not contain an explicit quantity field.

* Validated that **price** and **freight_value** remain constant within each `(order_id, product_id, seller_id)` combination, enabling reliable computation of item-level totals.

* Computed **item-level monetary metrics** by multiplying the inferred quantity by the sum of price and freight value, i.e. `Total_Order_Value = (price + freight_value) * Total_Items` for each combination.

* Rolled up all item-level metrics to the **order_id grain**, producing join-safe order-level features for downstream marts and modeling.
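The steps listed above can be sketched end to end on a small toy frame. This is a minimal illustration, not the notebook's actual code: the rows and the names `items`, `item_level`, and `order_totals_sketch` are made up for the example, and only the column names follow the `order_items` table.

```python
import pandas as pd

# Toy data in the shape of order_items (values are invented for illustration).
items = pd.DataFrame({
    "order_id":      ["o1", "o1", "o1", "o2"],
    "order_item_id": [1, 2, 3, 1],
    "product_id":    ["p1", "p1", "p2", "p3"],
    "seller_id":     ["s1", "s1", "s1", "s2"],
    "price":         [10.0, 10.0, 5.0, 20.0],
    "freight_value": [2.0, 2.0, 1.0, 4.0],
})

grain = ["order_id", "product_id", "seller_id"]

# Check that price and freight_value take a single value within each combination.
distinct = items.groupby(grain)[["price", "freight_value"]].nunique()
is_constant = bool((distinct == 1).all().all())

# Infer quantity as the number of rows per combination and keep the constant unit values.
item_level = items.groupby(grain, as_index=False).agg(
    quantity=("order_item_id", "count"),
    price=("price", "first"),
    freight_value=("freight_value", "first"),
)
item_level["item_total"] = (
    (item_level["price"] + item_level["freight_value"]) * item_level["quantity"]
)

# Roll up to one row per order.
order_totals_sketch = item_level.groupby("order_id", as_index=False)[
    ["quantity", "item_total"]
].sum()

print(is_constant)
print(order_totals_sketch)
```

If the constancy check returned `False` on real data, taking `"first"` for price and freight would no longer be safe, and the item total would need to be computed from row-level sums instead.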



In [35]:
import pandas as pd
import os
cart_items=pd.read_csv("../Source Data/olist_order_items_dataset.csv")

In [36]:
# Infer quantity: count rows per (order_id, product_id, seller_id) combination.
Total_Items = (
    cart_items.groupby(["order_id", "product_id", "seller_id"])["order_item_id"]
    .count()
    .reset_index(name="Total_Items")
)

cart_items = cart_items.merge(
    Total_Items[["order_id", "product_id", "seller_id", "Total_Items"]],
    on=["order_id","product_id","seller_id"],
    how="left"
)
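As an aside, the same per-row count can usually be obtained without a separate merge by using `groupby(...).transform`. The sketch below runs on a made-up frame (`toy` and its values are assumptions for illustration) and is meant as an equivalent alternative, not a description of what the notebook does.

```python
import pandas as pd

# Invented rows: two units of product p1 and one unit of p2 in the same order.
toy = pd.DataFrame({
    "order_id":      ["o1", "o1", "o1"],
    "order_item_id": [1, 2, 3],
    "product_id":    ["p1", "p1", "p2"],
    "seller_id":     ["s1", "s1", "s1"],
})

keys = ["order_id", "product_id", "seller_id"]

# transform("count") broadcasts each group's row count back onto its rows,
# which gives the same column as the groupby + merge pattern.
toy["Total_Items"] = toy.groupby(keys)["order_item_id"].transform("count")

print(toy)
```

The merge-based version keeps the intermediate `Total_Items` frame available for inspection, while `transform` avoids the join and preserves the original row order.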

In [37]:
cart_items.loc[cart_items["order_id"] =='8272b63d03f5f79c56e9e4120aec44ef']

Unnamed: 0,order_id,order_item_id,product_id,seller_id,shipping_limit_date,price,freight_value,Total_Items
57297,8272b63d03f5f79c56e9e4120aec44ef,1,270516a3f41dc035aa87d220228f844c,2709af9587499e95e803a6498a5a56e9,2017-07-21 18:25:23,1.2,7.89,10
57298,8272b63d03f5f79c56e9e4120aec44ef,2,05b515fdc76e888aada3c6d66c201dff,2709af9587499e95e803a6498a5a56e9,2017-07-21 18:25:23,1.2,7.89,10
57299,8272b63d03f5f79c56e9e4120aec44ef,3,05b515fdc76e888aada3c6d66c201dff,2709af9587499e95e803a6498a5a56e9,2017-07-21 18:25:23,1.2,7.89,10
57300,8272b63d03f5f79c56e9e4120aec44ef,4,05b515fdc76e888aada3c6d66c201dff,2709af9587499e95e803a6498a5a56e9,2017-07-21 18:25:23,1.2,7.89,10
57301,8272b63d03f5f79c56e9e4120aec44ef,5,05b515fdc76e888aada3c6d66c201dff,2709af9587499e95e803a6498a5a56e9,2017-07-21 18:25:23,1.2,7.89,10
57302,8272b63d03f5f79c56e9e4120aec44ef,6,05b515fdc76e888aada3c6d66c201dff,2709af9587499e95e803a6498a5a56e9,2017-07-21 18:25:23,1.2,7.89,10
57303,8272b63d03f5f79c56e9e4120aec44ef,7,05b515fdc76e888aada3c6d66c201dff,2709af9587499e95e803a6498a5a56e9,2017-07-21 18:25:23,1.2,7.89,10
57304,8272b63d03f5f79c56e9e4120aec44ef,8,05b515fdc76e888aada3c6d66c201dff,2709af9587499e95e803a6498a5a56e9,2017-07-21 18:25:23,1.2,7.89,10
57305,8272b63d03f5f79c56e9e4120aec44ef,9,05b515fdc76e888aada3c6d66c201dff,2709af9587499e95e803a6498a5a56e9,2017-07-21 18:25:23,1.2,7.89,10
57306,8272b63d03f5f79c56e9e4120aec44ef,10,05b515fdc76e888aada3c6d66c201dff,2709af9587499e95e803a6498a5a56e9,2017-07-21 18:25:23,1.2,7.89,10


In [38]:
df_cart_items=cart_items.copy()

In [39]:
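# order_item_id is only a line counter within the order, so it is dropped here.
df_card_items = df_cart_items.drop(columns=["order_item_id"])
# Combination-level value, repeated on every row of the combination:
# (price + freight_value) * inferred quantity.
df_card_items.loc[:, "Total_Order_Value"] = (
    (df_card_items["freight_value"] + df_card_items["price"]) * df_card_items["Total_Items"]
)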

In [40]:
df_card_items.loc[df_card_items["order_id"] =='8272b63d03f5f79c56e9e4120aec44ef']

Unnamed: 0,order_id,product_id,seller_id,shipping_limit_date,price,freight_value,Total_Items,Total_Order_Value
57297,8272b63d03f5f79c56e9e4120aec44ef,270516a3f41dc035aa87d220228f844c,2709af9587499e95e803a6498a5a56e9,2017-07-21 18:25:23,1.2,7.89,10,90.9
57298,8272b63d03f5f79c56e9e4120aec44ef,05b515fdc76e888aada3c6d66c201dff,2709af9587499e95e803a6498a5a56e9,2017-07-21 18:25:23,1.2,7.89,10,90.9
57299,8272b63d03f5f79c56e9e4120aec44ef,05b515fdc76e888aada3c6d66c201dff,2709af9587499e95e803a6498a5a56e9,2017-07-21 18:25:23,1.2,7.89,10,90.9
57300,8272b63d03f5f79c56e9e4120aec44ef,05b515fdc76e888aada3c6d66c201dff,2709af9587499e95e803a6498a5a56e9,2017-07-21 18:25:23,1.2,7.89,10,90.9
57301,8272b63d03f5f79c56e9e4120aec44ef,05b515fdc76e888aada3c6d66c201dff,2709af9587499e95e803a6498a5a56e9,2017-07-21 18:25:23,1.2,7.89,10,90.9
57302,8272b63d03f5f79c56e9e4120aec44ef,05b515fdc76e888aada3c6d66c201dff,2709af9587499e95e803a6498a5a56e9,2017-07-21 18:25:23,1.2,7.89,10,90.9
57303,8272b63d03f5f79c56e9e4120aec44ef,05b515fdc76e888aada3c6d66c201dff,2709af9587499e95e803a6498a5a56e9,2017-07-21 18:25:23,1.2,7.89,10,90.9
57304,8272b63d03f5f79c56e9e4120aec44ef,05b515fdc76e888aada3c6d66c201dff,2709af9587499e95e803a6498a5a56e9,2017-07-21 18:25:23,1.2,7.89,10,90.9
57305,8272b63d03f5f79c56e9e4120aec44ef,05b515fdc76e888aada3c6d66c201dff,2709af9587499e95e803a6498a5a56e9,2017-07-21 18:25:23,1.2,7.89,10,90.9
57306,8272b63d03f5f79c56e9e4120aec44ef,05b515fdc76e888aada3c6d66c201dff,2709af9587499e95e803a6498a5a56e9,2017-07-21 18:25:23,1.2,7.89,10,90.9


## Aggregating Metrics at the (order_id, seller_id, product_id) Level ##

In [None]:
combo_level = (
    df_card_items.groupby(["order_id", "seller_id", "product_id"], as_index=False)
    .agg(
        total_price=("price", "sum"),
        avg_price=("price", "mean"),
        total_freight_value=("freight_value", "sum"),
        avg_freight_value=("freight_value", "mean"),
        total_order_value=("Total_Order_Value", "first"),
        total_items=("Total_Items", "first")
    )
)
combo_level.head()


Unnamed: 0,order_id,seller_id,product_id,total_price,avg_price,total_freight_value,avg_freight_value,total_order_value,total_items
0,00010242fe8c5a6d1ba2dd792cb16214,48436dade18ac8b2bce089ec2a041202,4244733e06e7ecb4970a6e2683c13e61,58.9,58.9,13.29,13.29,72.19,1
1,00018f77f2f0320c557190d7a144bdd3,dd7ddc04e1b6c2c614352b383efe2d36,e5f2d52b802189ee658865ca93d83a8f,239.9,239.9,19.93,19.93,259.83,1
2,000229ec398224ef6ca0657da4fc703e,5b51032eddd242adc84c38acab88f23d,c777355d18b72b67abbeef9df44fd0fd,199.0,199.0,17.87,17.87,216.87,1
3,00024acbcdf0a6daa1e931b038114c75,9d7a1d34a5052409006425275ba1c2b4,7634da152a4610f1595efa32f14722fc,12.99,12.99,12.79,12.79,25.78,1
4,00042b26cf59d7ce69dfabb4e55b4fd9,df560393f3a51e74553ab94004ba5c87,ac6c3623068f30de03045865e4e10089,199.9,199.9,18.14,18.14,218.04,1


In [42]:
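# Roll up to one row per order. Every remaining column is summed, so avg_price
# and avg_freight_value at this level are sums of the per-combination averages,
# not order-level averages.
order_level = (
    combo_level
    .drop(columns=["seller_id", "product_id"])
    .groupby("order_id", as_index=False)
    .sum()
)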

order_level.head()


Unnamed: 0,order_id,total_price,avg_price,total_freight_value,avg_freight_value,total_order_value,total_items
0,00010242fe8c5a6d1ba2dd792cb16214,58.9,58.9,13.29,13.29,72.19,1
1,00018f77f2f0320c557190d7a144bdd3,239.9,239.9,19.93,19.93,259.83,1
2,000229ec398224ef6ca0657da4fc703e,199.0,199.0,17.87,17.87,216.87,1
3,00024acbcdf0a6daa1e931b038114c75,12.99,12.99,12.79,12.79,25.78,1
4,00042b26cf59d7ce69dfabb4e55b4fd9,199.9,199.9,18.14,18.14,218.04,1
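Because the roll-up sums every column, the order-level `avg_price` and `avg_freight_value` are only true averages for single-combination orders. One possible alternative, sketched below, sums just the additive columns and derives per-item averages from the totals. The input rows are the three combination-level rows for order `8272b63d03f5f79c56e9e4120aec44ef` as printed in this notebook. The names `order_sketch`, `avg_price_per_item`, and `avg_freight_per_item` are hypothetical.

```python
import pandas as pd

# Combination-level rows for one order, copied from the notebook's combo_level output.
combo = pd.DataFrame({
    "order_id": ["8272b63d03f5f79c56e9e4120aec44ef"] * 3,
    "total_price": [12.0, 12.0, 7.8],
    "total_freight_value": [78.9, 78.9, 6.57],
    "total_order_value": [90.9, 90.9, 14.37],
    "total_items": [10, 10, 1],
})

# Only columns that are meaningful to add across combinations are summed.
additive = ["total_price", "total_freight_value", "total_order_value", "total_items"]
order_sketch = combo.groupby("order_id", as_index=False)[additive].sum()

# Hypothetical columns: per-item averages derived from the summed totals.
order_sketch["avg_price_per_item"] = (
    order_sketch["total_price"] / order_sketch["total_items"]
)
order_sketch["avg_freight_per_item"] = (
    order_sketch["total_freight_value"] / order_sketch["total_items"]
)

print(order_sketch.round(2))
```

Whether a per-item average or a per-combination average is the right order-level feature depends on the downstream model. The point of the sketch is only that it should be computed from totals rather than by summing averages.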


In [43]:
order_level.to_csv("../Processed Data/prd_card_order_totals.csv", index=False)

In [54]:
combo_level.to_csv("../Processed Data/int_combo_level_totals.csv", index=False)

## Validation ##

In [44]:
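# Independent recomputation: once order_item_id is dropped, rows within a
# combination are expected to be identical, so drop_duplicates() should leave
# one row per combination before summing to the order level.
order_totals = (
    df_card_items.drop_duplicates()
    .groupby("order_id", as_index=False)["Total_Order_Value"]
    .sum()
    .rename(columns={"Total_Order_Value": "total_order"})
)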

order_totals.head()

Unnamed: 0,order_id,total_order
0,00010242fe8c5a6d1ba2dd792cb16214,72.19
1,00018f77f2f0320c557190d7a144bdd3,259.83
2,000229ec398224ef6ca0657da4fc703e,216.87
3,00024acbcdf0a6daa1e931b038114c75,25.78
4,00042b26cf59d7ce69dfabb4e55b4fd9,218.04


In [45]:
order_totals.loc[order_totals["order_id"] =='8272b63d03f5f79c56e9e4120aec44ef']

Unnamed: 0,order_id,total_order
50137,8272b63d03f5f79c56e9e4120aec44ef,196.17


In [47]:
order_level.loc[order_level["order_id"] =='8272b63d03f5f79c56e9e4120aec44ef']

Unnamed: 0,order_id,total_price,avg_price,total_freight_value,avg_freight_value,total_order_value,total_items
50137,8272b63d03f5f79c56e9e4120aec44ef,31.8,10.2,164.37,22.35,196.17,21


In [48]:
combo_level.loc[combo_level["order_id"] =='8272b63d03f5f79c56e9e4120aec44ef']

Unnamed: 0,order_id,seller_id,product_id,total_price,avg_price,total_freight_value,avg_freight_value,total_order_value,total_items
52027,8272b63d03f5f79c56e9e4120aec44ef,2709af9587499e95e803a6498a5a56e9,05b515fdc76e888aada3c6d66c201dff,12.0,1.2,78.9,7.89,90.9,10
52028,8272b63d03f5f79c56e9e4120aec44ef,2709af9587499e95e803a6498a5a56e9,270516a3f41dc035aa87d220228f844c,12.0,1.2,78.9,7.89,90.9,10
52029,8272b63d03f5f79c56e9e4120aec44ef,2709af9587499e95e803a6498a5a56e9,79ce45dbc2ea29b22b5a261bbb7b7ee7,7.8,7.8,6.57,6.57,14.37,1
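The spot checks above compare one order by eye. A programmatic reconciliation over all orders could look like the sketch below. The helper `reconcile_order_totals` and its `tol` parameter are hypothetical, and the demo frames hold only two orders whose totals are copied from the outputs shown in this notebook.

```python
import pandas as pd

def reconcile_order_totals(order_level, order_totals, tol=1e-6):
    """Return (unmatched, mismatched) rows between the two order-level frames."""
    merged = order_level.merge(
        order_totals, on="order_id", how="outer", indicator=True
    )
    # Orders present in only one of the two frames.
    unmatched = merged[merged["_merge"] != "both"]
    # Orders present in both frames whose totals differ by more than tol.
    diff = (merged["total_order_value"] - merged["total_order"]).abs()
    mismatched = merged[(merged["_merge"] == "both") & (diff > tol)]
    return unmatched, mismatched

# Demo frames with totals taken from the outputs above.
order_level_demo = pd.DataFrame({
    "order_id": ["00010242fe8c5a6d1ba2dd792cb16214", "8272b63d03f5f79c56e9e4120aec44ef"],
    "total_order_value": [72.19, 196.17],
})
order_totals_demo = pd.DataFrame({
    "order_id": ["00010242fe8c5a6d1ba2dd792cb16214", "8272b63d03f5f79c56e9e4120aec44ef"],
    "total_order": [72.19, 196.17],
})

unmatched, mismatched = reconcile_order_totals(order_level_demo, order_totals_demo)
print(len(unmatched), len(mismatched))
```

Using a tolerance instead of strict equality guards against floating-point noise from summing prices and freight values in a different order.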
