Product Recommendation Engine

This project uses the Olist Brazilian e-commerce dataset to build a rule-based recommendation engine.  
We analyze user behavior, segment customers into personas, identify their top category, and recommend the 5 most popular products from that category.

✅ Built using Python & pandas  
✅ Visualized with Streamlit (external app)  
✅ Dataset from [Kaggle Olist E-Commerce](https://www.kaggle.com/datasets/olistbr/brazilian-ecommerce)

1. Load All Required CSVs

In [1]:
import pandas as pd
import numpy as np

customers = pd.read_csv('/content/olist_customers_dataset.csv')
orders = pd.read_csv('/content/olist_orders_dataset.csv')
order_items = pd.read_csv('/content/olist_order_items_dataset.csv')
products = pd.read_csv('/content/olist_products_dataset.csv')
payments = pd.read_csv('/content/olist_order_payments_dataset.csv')
sellers = pd.read_csv('/content/olist_sellers_dataset.csv')
category_translation = pd.read_csv('/content/product_category_name_translation.csv')


🟨 2. Data Merging & Cleaning

In [2]:

cust_orders = pd.merge(orders, customers, on='customer_id', how='inner')

cust_order_items = pd.merge(cust_orders, order_items, on='order_id', how='inner')

cust_order_products = pd.merge(cust_order_items, products, on='product_id', how='left')

cust_order_products = pd.merge(cust_order_products, category_translation,
                                on='product_category_name', how='left')

cust_order_products[['customer_unique_id', 'order_id', 'product_id', 'product_category_name_english']].head()


Unnamed: 0,customer_unique_id,order_id,product_id,product_category_name_english
0,7c396fd4830fd04220f754e42b4e5bff,e481f51cbdc54678b7cc49136f2d6af7,87285b34884572647811a353c7ac498a,housewares
1,af07308b275d755c9edb36a90c618231,53cdb2fc8bc7dce0b6741e2150273451,595fac2a385ac33a80bd5114aec74eb8,perfumery
2,3a653a41f6f9fc3d2a113cf8398680e8,47770eb9100c2d0c44946d9cf07ec65d,aa4383b373c6aca5d8797843e5594415,auto
3,7c142cf63193a1473d2e66489a9ae977,949d5b44dbf5de918fe9c16f97b45f8a,d0b61bfb1de832b15ba9d266ca96e5b0,pet_shop
4,72632f0f9dd73dfee390c9b22eb56dd6,ad21c59c0840e6cb83a9ceb5573f8159,65266b2da20d04dbe00c5c2d3bb7859e,stationery
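
The joins above can be sanity-checked with a sketch like the one below. It assumes the lookup tables should be unique on their join keys and counts items whose category has no English translation. The toy frames and their values are invented stand-ins for `products` and `category_translation`; `validate` and `indicator` are standard `pandas.merge` parameters.

```python
import pandas as pd

# Toy stand-ins for the real tables (all values invented for illustration).
items = pd.DataFrame({
    "order_id": ["o1", "o1", "o2"],
    "product_id": ["p1", "p2", "p3"],
})
products_toy = pd.DataFrame({
    "product_id": ["p1", "p2", "p3"],
    "product_category_name": ["perfumaria", "pet_shop", "categoria_nova"],
})
translation_toy = pd.DataFrame({
    "product_category_name": ["perfumaria", "pet_shop"],
    "product_category_name_english": ["perfumery", "pet_shop"],
})

# validate="many_to_one" raises MergeError if the right-hand key is duplicated,
# which would otherwise silently multiply rows.
merged = items.merge(products_toy, on="product_id", how="left", validate="many_to_one")

# indicator=True adds a "_merge" column recording whether each row found a match.
merged = merged.merge(
    translation_toy,
    on="product_category_name",
    how="left",
    validate="many_to_one",
    indicator=True,
)

untranslated = merged.loc[merged["_merge"] == "left_only", "product_category_name"]
print(f"{len(merged)} rows, {len(untranslated)} without an English category")
```

Rows flagged `left_only` end up with a missing `product_category_name_english`, and `groupby` skips missing keys by default, so those items drop out of the later category counts.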


👥 3. User Behavior Analysis and 🟥 4. Segmentation Logic

In [8]:

user_behavior = cust_order_products.groupby('customer_unique_id').agg({
    'order_id': 'nunique',
    'product_category_name_english': 'nunique'
}).reset_index()

user_behavior.columns = ['customer_unique_id', 'num_orders', 'unique_categories']

# In Olist, customer_id is unique per order; customer_unique_id identifies the person,
# so the purchase span has to be computed per customer_unique_id.
orders_time = cust_order_products[['customer_unique_id', 'order_purchase_timestamp']].copy()
orders_time['order_purchase_timestamp'] = pd.to_datetime(orders_time['order_purchase_timestamp'])
order_span = orders_time.groupby('customer_unique_id')['order_purchase_timestamp'].agg(['min', 'max']).reset_index()
order_span['days_between'] = (order_span['max'] - order_span['min']).dt.days + 1

user_behavior = pd.merge(user_behavior, order_span[['customer_unique_id', 'days_between']], on='customer_unique_id', how='left')
# Average days per order, expressed in minutes (1440 minutes = 1 day)
user_behavior['avg_decision_time_min'] = user_behavior['days_between'] / user_behavior['num_orders'] * 1440

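def segment_user(row):
    """Assign a persona from order count, category diversity, and average time per order."""
    # Rules are checked top to bottom; the first match wins.
    if row['num_orders'] >= 3 and row['unique_categories'] <= 3:
        return 'Loyal Buyer'
    elif row['unique_categories'] >= 4:
        return 'Explorer'
    elif row['num_orders'] <= 2 and row['avg_decision_time_min'] < 1440:
        return 'Impulse Buyer'
    else:
        # Default for customers who match none of the rules above
        return 'Loyal Buyer'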

user_behavior['user_segment'] = user_behavior.apply(segment_user, axis=1)

user_behavior[['customer_unique_id', 'num_orders', 'unique_categories', 'avg_decision_time_min', 'user_segment']].head()


Unnamed: 0,customer_unique_id,num_orders,unique_categories,avg_decision_time_min,user_segment
0,0000366f3b9a7992bf8c76cfdf3221e2,1,1,1440.0,Loyal Buyer
1,0000b849f77a49e4a4ce2b2a4ca5be3f,1,1,1440.0,Loyal Buyer
2,0000f46a3911fa3c0805444483337064,1,1,1440.0,Loyal Buyer
3,0000f6ccb0745a6a4b88665a16c9f078,1,1,1440.0,Loyal Buyer
4,0004aac84e0df4da2b147fca70cf8255,1,1,1440.0,Loyal Buyer
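
The segmentation rules can also be written without a row-wise `apply`. The sketch below assumes the same thresholds as `segment_user` and uses `numpy.select`, which picks the first matching condition for each row; the toy rows are invented for illustration.

```python
import numpy as np
import pandas as pd

# Invented rows covering each branch of the rule set.
toy = pd.DataFrame({
    "num_orders": [1, 5, 6, 2],
    "unique_categories": [1, 2, 4, 1],
    "avg_decision_time_min": [1440.0, 20000.0, 9000.0, 720.0],
})

# Conditions are listed in the same order as the if/elif chain;
# np.select uses the first condition that is True for each row.
conditions = [
    (toy["num_orders"] >= 3) & (toy["unique_categories"] <= 3),
    toy["unique_categories"] >= 4,
    (toy["num_orders"] <= 2) & (toy["avg_decision_time_min"] < 1440),
]
labels = ["Loyal Buyer", "Explorer", "Impulse Buyer"]

# Rows matching no condition fall back to the default, as in the else branch.
toy["user_segment"] = np.select(conditions, labels, default="Loyal Buyer")
print(toy["user_segment"].value_counts())
```

The first toy row shows why the sample output above is all `Loyal Buyer`: a customer with a single order has `days_between = 1`, so `avg_decision_time_min` is exactly 1440, which fails the strict `< 1440` test and falls through to the default.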


🏷️ 5. Top Category per User

In [10]:

user_category = cust_order_products.groupby(
    ['customer_unique_id', 'product_category_name_english']
)['order_id'].count().reset_index()


user_category = user_category.sort_values(['customer_unique_id', 'order_id'], ascending=[True, False])

user_top_category = user_category.groupby('customer_unique_id').first().reset_index()
user_top_category.columns = ['customer_unique_id', 'top_category', 'order_count']

behavior_df = pd.merge(user_behavior, user_top_category, on='customer_unique_id', how='left')


behavior_df[['customer_unique_id', 'user_segment', 'top_category']].head()


Unnamed: 0,customer_unique_id,user_segment,top_category
0,0000366f3b9a7992bf8c76cfdf3221e2,Loyal Buyer,bed_bath_table
1,0000b849f77a49e4a4ce2b2a4ca5be3f,Loyal Buyer,health_beauty
2,0000f46a3911fa3c0805444483337064,Loyal Buyer,stationery
3,0000f6ccb0745a6a4b88665a16c9f078,Loyal Buyer,telephony
4,0004aac84e0df4da2b147fca70cf8255,Loyal Buyer,telephony
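
When a customer has the same count in two categories, the sort above does not say which one wins. A minimal sketch of one way to make the tie-break explicit, assuming alphabetical order is an acceptable rule (the toy data and the alphabetical choice are illustrative assumptions, not part of the original notebook):

```python
import pandas as pd

# Invented per-user category counts; u1 has a tie between "toys" and "auto".
counts = pd.DataFrame({
    "customer_unique_id": ["u1", "u1", "u1", "u2"],
    "category": ["toys", "auto", "books", "pet_shop"],
    "order_count": [2, 2, 1, 3],
})

top = (
    counts.sort_values(
        ["customer_unique_id", "order_count", "category"],
        ascending=[True, False, True],  # highest count first, then A-Z on ties
    )
    .drop_duplicates("customer_unique_id", keep="first")
    .rename(columns={"category": "top_category"})
    .reset_index(drop=True)
)
print(top)
```

Adding the category name as a third sort key makes the result reproducible across runs regardless of the input row order.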


🟫 6. Top 5 Products Per Category

In [11]:

top_products = cust_order_products.groupby(
    ['product_category_name_english', 'product_id']
)['order_id'].count().reset_index()

top_products.columns = ['category', 'product_id', 'sales_count']

top_products = top_products.sort_values(['category', 'sales_count'], ascending=[True, False])

top_products['rank'] = top_products.groupby('category')['sales_count'].rank(method='first', ascending=False)
top_products_final = top_products[top_products['rank'] <= 5].reset_index(drop=True)
top_products_final.head()


Unnamed: 0,category,product_id,sales_count,rank
0,agro_industry_and_commerce,11250b0d4b709fee92441c5f34122aed,22,1.0
1,agro_industry_and_commerce,423a6644f0aa529e8828ff1f91003690,18,2.0
2,agro_industry_and_commerce,672e757f331900b9deea127a2a7b79fd,17,3.0
3,agro_industry_and_commerce,3bebad3cf2c8d1a8d3ce97174643e054,14,4.0
4,agro_industry_and_commerce,a0fe1efb855f3e786f0650268cd77f44,13,5.0
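
The same top-N selection can be sketched with `groupby(...).head(N)` on a sorted frame, which avoids the float-valued `rank` column. The toy data, the cutoff `N = 2`, and the `product_id` tie-break are assumptions made for illustration.

```python
import pandas as pd

# Invented sales counts; "a2" and "a3" tie within the "auto" category.
sales = pd.DataFrame({
    "category": ["auto"] * 4 + ["toys"] * 2,
    "product_id": ["a1", "a2", "a3", "a4", "t1", "t2"],
    "sales_count": [5, 9, 9, 1, 3, 7],
})

N = 2  # the notebook uses 5; 2 keeps the toy output short

top_n = (
    sales.sort_values(
        ["category", "sales_count", "product_id"],
        ascending=[True, False, True],  # best sellers first, product_id breaks ties
    )
    .groupby("category")
    .head(N)  # keeps the first N rows of each group in sorted order
    .reset_index(drop=True)
)

# Integer rank within each category, starting at 1.
top_n["rank"] = top_n.groupby("category").cumcount() + 1
print(top_n)
```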


🟨 7. User-Level Recommendations

In [14]:
user_recommendations = pd.merge(
    behavior_df[['customer_unique_id', 'user_segment', 'top_category']],
    top_products_final,
    left_on='top_category',
    right_on='category',
    how='left'
)

recommendations_final = user_recommendations.groupby(
    ['customer_unique_id', 'user_segment', 'top_category']
)['product_id'].apply(list).reset_index()

recommendations_final['recommended_products'] = recommendations_final['product_id'].apply(lambda x: x[:5])

recommendations_final.drop(columns='product_id', inplace=True)

recommendations_final.head()

Unnamed: 0,customer_unique_id,user_segment,top_category,recommended_products
0,0000366f3b9a7992bf8c76cfdf3221e2,Loyal Buyer,bed_bath_table,"[99a4788cb24856965c36a24e339b6058, f1c7f353075..."
1,0000b849f77a49e4a4ce2b2a4ca5be3f,Loyal Buyer,health_beauty,"[154e7e31ebfa092203795c972e5804a6, 2b4609f8948..."
2,0000f46a3911fa3c0805444483337064,Loyal Buyer,stationery,"[fb55982be901439613a95940feefd9ee, 5411e926950..."
3,0000f6ccb0745a6a4b88665a16c9f078,Loyal Buyer,telephony,"[e7cc48a9daff5436f63d3aad9426f28b, c9c6fde7115..."
4,0004aac84e0df4da2b147fca70cf8255,Loyal Buyer,telephony,"[e7cc48a9daff5436f63d3aad9426f28b, c9c6fde7115..."
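
Customers with a missing `top_category` (for example, those who only bought uncategorized products) are dropped by the `groupby` above, because missing keys are skipped by default. One possible fallback, sketched here under the assumption that global best-sellers are an acceptable substitute, is a small lookup function. The names `recommend`, `top_by_category`, and `global_top` are hypothetical and the data is invented.

```python
import pandas as pd

# Hypothetical lookup tables; in the notebook these would be derived
# from top_products_final.
top_by_category = {"auto": ["a2", "a3"], "toys": ["t2", "t1"]}
global_top = ["t2", "a2"]


def recommend(top_category, n=5):
    """Return up to n product ids for a category, or global best-sellers if unknown."""
    if pd.isna(top_category) or top_category not in top_by_category:
        return global_top[:n]
    return top_by_category[top_category][:n]


users = pd.DataFrame({
    "customer_unique_id": ["u1", "u2", "u3"],
    "top_category": ["auto", None, "garden"],  # u2 has no category, u3 an unseen one
})
users["recommended_products"] = users["top_category"].apply(recommend)
print(users)
```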


🟦 8. Final Export

In [18]:
recommendations_final.to_csv('user_recommendations.csv', index=False)
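
One caveat with this export is that `recommended_products` holds Python lists, which `to_csv` writes as their string representation. A consumer reading the file back gets strings unless it parses them. The sketch below shows one way to do that with `ast.literal_eval`; it uses an in-memory buffer and invented values rather than the real file.

```python
import ast
import io

import pandas as pd

# Invented stand-in for recommendations_final.
recs = pd.DataFrame({
    "customer_unique_id": ["u1"],
    "recommended_products": [["p1", "p2"]],
})

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
recs.to_csv(buffer, index=False)

# Plain read: the list column comes back as a string.
buffer.seek(0)
raw = pd.read_csv(buffer)

# Read with a converter: each cell is parsed back into a Python list.
buffer.seek(0)
parsed = pd.read_csv(buffer, converters={"recommended_products": ast.literal_eval})

print(type(raw.loc[0, "recommended_products"]).__name__)
print(type(parsed.loc[0, "recommended_products"]).__name__)
```

If the CSV is meant for non-Python consumers, joining the ids into a single delimited string before export may be simpler than relying on the list representation.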


## Conclusion

This project builds a rule-based product recommendation engine on real-world e-commerce data from Olist. It analyzes user behavior, assigns each customer to one of three personas, identifies the category each customer buys from most often, and recommends the five best-selling products in that category. The persona label is carried through to the output but does not change which products are recommended.

The solution showcases:

- 📊 Behavioral analysis using order frequency, diversity, and category affinity
- 🧠 User segmentation into **Loyal Buyers**, **Explorers**, and **Impulse Buyers**
- 🎯 Personalized recommendations based on top product categories and popularity
- 📁 A flat CSV output (`user_recommendations.csv`) for export or integration
- 🖥️ Bonus: an external Streamlit app for displaying recommendations

This project combines strong analytical thinking with clear code structure and presentation, making it portfolio-ready for roles in data analysis, product analytics, or applied data science.

---

Feel free to explore the full notebook, CSV output, or launch the Streamlit app to experience the recommendation engine in action.
