# Project: Product Recommendation Model

Goal: Predict the probability that a user will purchase a product, then rank products by that probability to produce personalized recommendations.

# Dataset

Dataset: Olist Brazilian E-Commerce (Kaggle).
Period: 2016–2018.
Rows: 99K orders, 113K items, 33K products, 96K users.
Merged tables: orders, items, products, customers, reviews, category translations.
Engineered features (11): price, product rating/reviews, user spend/rating, recency, clicked (1 if purchased), category (one-hot).
Target: purchased (1 = bought).
Negatives: 3× random non-purchased pairs.
Final size: ~450K rows, ~25% positive.
Real transactional data with RFM (recency, frequency, monetary) signals, well suited to purchase prediction and top-N recommendations.

Link to the dataset: https://www.kaggle.com/datasets/olistbr/brazilian-ecommerce
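The "Negatives: 3× random non-purchased pairs" step above can be sketched as follows. This is a minimal illustration on toy data, not necessarily the exact procedure used in this notebook: the function name `sample_negatives`, the column names `user_id` and `product_id`, and the rejection-sampling loop are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

def sample_negatives(positives, all_products, ratio=3, seed=42):
    """Hypothetical helper: draw `ratio` random non-purchased pairs per purchase."""
    rng = np.random.default_rng(seed)
    all_products = list(all_products)
    # Observed purchases, used to reject sampled pairs that are actually positives
    bought = set(zip(positives["user_id"], positives["product_id"]))
    negatives = []
    for user in positives["user_id"]:
        drawn = 0
        # Assumes no user has bought the whole catalog, otherwise this never ends
        while drawn < ratio:
            product = all_products[rng.integers(len(all_products))]
            if (user, product) not in bought:
                negatives.append((user, product))
                drawn += 1
    neg = pd.DataFrame(negatives, columns=["user_id", "product_id"])
    neg["purchased"] = 0
    pos = positives[["user_id", "product_id"]].copy()
    pos["purchased"] = 1
    return pd.concat([pos, neg], ignore_index=True)

# Toy data (illustrative only): 3 purchases over a 6-product catalog
positives = pd.DataFrame({
    "user_id": ["u1", "u1", "u2"],
    "product_id": ["p1", "p2", "p3"],
})
catalog = ["p1", "p2", "p3", "p4", "p5", "p6"]

pairs = sample_negatives(positives, catalog)
print(len(pairs), pairs["purchased"].mean())  # 12 rows, 0.25 positive
```

With three negatives per purchase, one row in four is positive, which matches the ~25% positive rate quoted above. The sketch allows the same negative pair to be drawn twice; deduplicating would slightly lower the negative count.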

In [6]:
!pip install kaggle



In [7]:
import os
print(os.listdir())


['.ipynb_checkpoints', 'brazilian-ecommerce.zip', 'kaggle.json', 'recommendation.ipynb']

In [8]:
import shutil

# Create the ~/.kaggle directory if it does not already exist
kaggle_dir = os.path.expanduser("~/.kaggle")
os.makedirs(kaggle_dir, exist_ok=True)

# Copy kaggle.json into ~/.kaggle (the original stays in the working directory)
kaggle_json = os.path.join(kaggle_dir, "kaggle.json")
shutil.copy("kaggle.json", kaggle_json)

# Restrict the credentials file to owner read/write
os.chmod(kaggle_json, 0o600)

In [16]:
!kaggle datasets download -d olistbr/brazilian-ecommerce

Dataset URL: https://www.kaggle.com/datasets/olistbr/brazilian-ecommerce
License(s): CC-BY-NC-SA-4.0
brazilian-ecommerce.zip: Skipping, found more recently modified local copy (use --force to force download)


In [17]:
import zipfile
import os

# Extract ZIP contents into 'olist_data' folder
with zipfile.ZipFile("brazilian-ecommerce.zip", "r") as zip_ref:
    zip_ref.extractall("olist_data")

# Confirm extracted files
print("Extracted files:", os.listdir("olist_data"))

Extracted files: ['olist_customers_dataset.csv', 'olist_geolocation_dataset.csv', 'olist_orders_dataset.csv', 'olist_order_items_dataset.csv', 'olist_order_payments_dataset.csv', 'olist_order_reviews_dataset.csv', 'olist_products_dataset.csv', 'olist_sellers_dataset.csv', 'product_category_name_translation.csv']
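Before loading, it can help to confirm that the six tables named under "Merged tables" were actually extracted. The check below is a minimal sketch: the helper name `missing_tables` and the `REQUIRED` list are introduced here for illustration, and the file names are copied from the listing above.

```python
import os

# The six CSVs used for the merge, as they appear in the extracted listing
REQUIRED = [
    "olist_orders_dataset.csv",
    "olist_order_items_dataset.csv",
    "olist_products_dataset.csv",
    "olist_customers_dataset.csv",
    "olist_order_reviews_dataset.csv",
    "product_category_name_translation.csv",
]

def missing_tables(folder, required=REQUIRED):
    """Hypothetical helper: return the required files not found in `folder`."""
    present = set(os.listdir(folder)) if os.path.isdir(folder) else set()
    return [name for name in required if name not in present]

missing = missing_tables("olist_data")
if missing:
    print("Missing files:", missing)
else:
    print("All required tables are present.")
```

An empty result means the load step can proceed; a non-empty one usually points to an incomplete download or a different extraction folder.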


# Load & Merge the Data

In [23]:
import pandas as pd
import numpy as np

# Load core tables from the extracted folder
orders        = pd.read_csv('olist_data/olist_orders_dataset.csv')
order_items   = pd.read_csv('olist_data/olist_order_items_dataset.csv')
products      = pd.read_csv('olist_data/olist_products_dataset.csv')
customers     = pd.read_csv('olist_data/olist_customers_dataset.csv')
reviews       = pd.read_csv('olist_data/olist_order_reviews_dataset.csv')
category_name = pd.read_csv('olist_data/product_category_name_translation.csv')