
## Products Mart – Implementation

Based on the EDA findings, the following steps were implemented in the **Products Mart**:

* Preserved the **product_id–level grain**, with each record representing a unique product.

* Joined the **product category translation table** to map category identifiers to **English category names**.

* Enabled category-level analysis by enriching products with readable category labels, supporting downstream evaluation of **category-wise customer spend and purchasing patterns**.

* Exposed the resulting product and category features for integration into the order- and customer-level marts.


In [1]:
import pandas as pd
import os
products = pd.read_csv("../Source Data/olist_products_dataset.csv")
product_cat_translation = pd.read_csv("../Source Data/product_category_name_translation.csv")

In [2]:
products = products.rename(columns={
    "product_name_lenght": "product_name_length",
    "product_description_lenght": "product_description_length"
})

Getting Translated Product Category Name

In [3]:
products["product_category_name"] = products["product_category_name"].str.strip()
product_cat_translation["product_category_name"] = product_cat_translation["product_category_name"].str.strip()

products = products.merge(
    product_cat_translation[["product_category_name", "product_category_name_english"]],
    on="product_category_name",
    how="left"
)
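
A left join keeps the product_id-level grain only if each category appears once in the translation table. The sketch below shows one way to guard that assumption and list categories with no translation, using the `validate` and `indicator` options of `DataFrame.merge`. The toy frames and the names `products_demo`, `translation_demo`, and `untranslated` are hypothetical stand-ins, not part of the notebook.

```python
import pandas as pd

# Hypothetical toy data standing in for the two source tables.
products_demo = pd.DataFrame({
    "product_id": ["p1", "p2", "p3", "p4"],
    "product_category_name": ["perfumaria", "artes", "pc_gamer", None],
})
translation_demo = pd.DataFrame({
    "product_category_name": ["perfumaria", "artes"],
    "product_category_name_english": ["perfumery", "art"],
})

merged_demo = products_demo.merge(
    translation_demo,
    on="product_category_name",
    how="left",
    validate="many_to_one",  # raises MergeError if a category repeats in the translation table
    indicator=True,          # adds a _merge column: "both" or "left_only"
)

# Categories present on a product but missing from the translation table.
unmatched = (merged_demo["_merge"] == "left_only") & merged_demo["product_category_name"].notna()
untranslated = merged_demo.loc[unmatched, "product_category_name"].unique().tolist()
print(untranslated)
```

Running the same check on the real tables would show which category names account for the gap between the two non-null counts reported by `products.info()`.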

In [4]:
products.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 32951 entries, 0 to 32950
Data columns (total 10 columns):
 #   Column                         Non-Null Count  Dtype  
---  ------                         --------------  -----  
 0   product_id                     32951 non-null  object 
 1   product_category_name          32341 non-null  object 
 2   product_name_length            32341 non-null  float64
 3   product_description_length     32341 non-null  float64
 4   product_photos_qty             32341 non-null  float64
 5   product_weight_g               32949 non-null  float64
 6   product_length_cm              32949 non-null  float64
 7   product_height_cm              32949 non-null  float64
 8   product_width_cm               32949 non-null  float64
 9   product_category_name_english  32328 non-null  object 
dtypes: float64(7), object(3)
memory usage: 2.5+ MB


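Replacing Null Names with Unknown

The `info()` output above shows 32,341 products with a category name but only 32,328 with an English translation, and 610 products (32,951 − 32,341) with no category at all. The final label therefore falls back to the original category name when no translation exists, and to "unknown" when the category itself is missing.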

In [5]:
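import numpy as np

# Prefer the English name, fall back to the original (untranslated) name,
# and label products with no category at all as "unknown".
products["final_product_category"] = np.where(
    products["product_category_name_english"].notna(),
    products["product_category_name_english"],
    np.where(
        products["product_category_name"].notna(),
        products["product_category_name"],
        "unknown"
    )
)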
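
The same three-tier fallback can also be written as a `fillna` chain, which some readers find easier to scan than nested `np.where` calls. This is a minimal sketch on hypothetical toy data (the `demo` frame and its `via_where` / `via_fillna` columns are illustrative only); it is an alternative formulation, not the code used in the notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical rows covering the three cases: translated, untranslated, missing.
demo = pd.DataFrame({
    "product_category_name": ["perfumaria", "pc_gamer", None],
    "product_category_name_english": ["perfumery", None, None],
})

# Nested np.where, mirroring the cell above.
demo["via_where"] = np.where(
    demo["product_category_name_english"].notna(),
    demo["product_category_name_english"],
    np.where(
        demo["product_category_name"].notna(),
        demo["product_category_name"],
        "unknown",
    ),
)

# Alternative: fill from the original name first, then from the constant.
demo["via_fillna"] = (
    demo["product_category_name_english"]
    .fillna(demo["product_category_name"])
    .fillna("unknown")
)

# Both formulations should yield the same labels row by row.
labels_match = demo["via_where"].tolist() == demo["via_fillna"].tolist()
print(labels_match)
```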


## Validation

In [7]:
# Confirm the left join did not duplicate any product_id (expected: 0).
products["product_id"].duplicated().sum()

0
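
The duplicate check above could be extended into a small reusable guard that also confirms the row count and the absence of missing labels before the file is written. The helper `validate_products_mart` and the `mart_demo` frame are hypothetical names introduced for this sketch; the expected row count would come from the raw products table (32,951 in the `info()` output).

```python
import pandas as pd


def validate_products_mart(df: pd.DataFrame, expected_rows: int) -> None:
    """Raise AssertionError if the mart breaks its product_id-level grain."""
    assert len(df) == expected_rows, "row count changed during the merge"
    assert df["product_id"].is_unique, "duplicate product_id values found"
    assert df["final_product_category"].notna().all(), "missing category labels"


# Hypothetical toy mart used only to exercise the helper.
mart_demo = pd.DataFrame({
    "product_id": ["p1", "p2", "p3"],
    "final_product_category": ["perfumery", "pc_gamer", "unknown"],
})
validate_products_mart(mart_demo, expected_rows=3)
```

In the notebook itself the call would look like `validate_products_mart(products, expected_rows=32951)`, assuming the helper is defined in an earlier cell.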

In [8]:
products.to_csv("../Processed Data/prd_products.csv", index=False)