# E-Commerce Product Recommendation System
### Created by: Nafisa Begam

## Problem Statement:
Build a recommendation system for an e-commerce platform that suggests relevant products to users based on their purchase history.

## Dataset Description:
The dataset is sourced from the Retailrocket e-commerce dataset, containing:
* User transactions (transaction events)
* Product properties
* Product category hierarchy

## Tools and Libraries:
- Python 3
- Pandas
- NumPy
- scikit-learn (`cosine_similarity` from `sklearn.metrics.pairwise`)
- Jupyter Notebook


# Step 1: Importing Libraries

In [None]:
# Step 1: Importing required libraries
import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Step 2: Loading the DATA

In [26]:
# Loading events data
events = pd.read_csv('events.csv')

# Loading item properties data
item_properties = pd.read_csv('item_properties_part1.csv')

# Loading category tree data
category_tree = pd.read_csv('category_tree.csv')


# Step 3: Exploring the Data

In [27]:
# Show the first 5 rows of events
events.head()

| | timestamp | visitorid | event | itemid | transactionid |
|---|---|---|---|---|---|
| 0 | 1433221332117 | 257597 | view | 355908 | NaN |
| 1 | 1433224214164 | 992329 | view | 248676 | NaN |
| 2 | 1433221999827 | 111016 | view | 318965 | NaN |
| 3 | 1433221955914 | 483717 | view | 253185 | NaN |
| 4 | 1433221337106 | 951259 | view | 367447 | NaN |


In [28]:
# Show column dtypes and memory usage
events.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2756101 entries, 0 to 2756100
Data columns (total 5 columns):
 #   Column         Dtype  
---  ------         -----  
 0   timestamp      int64  
 1   visitorid      int64  
 2   event          object 
 3   itemid         int64  
 4   transactionid  float64
dtypes: float64(1), int64(3), object(1)
memory usage: 105.1+ MB


In [29]:
# Count unique users and items
print("Unique users:", events['visitorid'].nunique())
print("Unique items:", events['itemid'].nunique())

Unique users: 1407580
Unique items: 235061


In [30]:
# Count how often each event type occurs
events['event'].value_counts()

event
view           2664312
addtocart        69332
transaction      22457
Name: count, dtype: int64
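
The counts above are easier to compare as shares of all events. The sketch below recomputes them from the printed counts; the names `event_counts` and `event_share_pct` are illustrative and do not appear in the notebook. It shows that purchases make up under 1% of the logged events, which is why the purchase matrix built later is so sparse.

```python
import pandas as pd

# Event counts copied from the value_counts() output above.
event_counts = pd.Series(
    {"view": 2664312, "addtocart": 69332, "transaction": 22457},
    name="count",
)

# Share of each event type as a percentage of all events.
event_share_pct = (event_counts / event_counts.sum() * 100).round(2)
print(event_share_pct)
```

On the full data, `events['event'].value_counts(normalize=True)` should give the same proportions directly.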

In [31]:
# List the distinct event types
events['event'].unique()

array(['view', 'addtocart', 'transaction'], dtype=object)

# Step 4: Data Cleaning

In [32]:
# Checking for missing values
events.isnull().sum()

timestamp              0
visitorid              0
event                  0
itemid                 0
transactionid    2733644
dtype: int64
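
The 2,733,644 missing `transactionid` values equal the number of `view` plus `addtocart` events (2,664,312 + 69,332), which suggests the nulls are simply the non-purchase rows rather than damaged data. A minimal sketch of how that could be checked, using hypothetical toy rows (`toy_events` and its values are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows shaped like the events table.
toy_events = pd.DataFrame({
    "visitorid": [1, 1, 2, 3],
    "event": ["view", "addtocart", "transaction", "view"],
    "itemid": [10, 10, 20, 30],
    "transactionid": [np.nan, np.nan, 4000.0, np.nan],
})

missing_id = toy_events["transactionid"].isnull()
not_purchase = toy_events["event"] != "transaction"

# True when the rows lacking a transactionid are exactly the non-purchase rows.
nulls_match_non_purchases = bool((missing_id == not_purchase).all())
print(nulls_match_non_purchases)
```

If the same comparison holds on the real `events` frame, filtering to `transaction` events removes every missing value at once, so no imputation is needed.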

In [33]:
# Keep only 'transaction' events, since recommendations are based on purchases
events = events[events['event'] == 'transaction']

# Confirm that only 'transaction' events remain
events['event'].unique()

array(['transaction'], dtype=object)

# Step 5: Creating User-Item Interaction Matrix

In [34]:
# Pivot purchases into a user-item matrix: each cell counts a user's purchases of an item
user_item_matrix = events.pivot_table(
    index='visitorid', columns='itemid',
    values='transactionid', aggfunc='count', fill_value=0,
)

# Checking shape of matrix
user_item_matrix.shape

(11719, 12025)

In [35]:
# Show the first few rows
user_item_matrix.head()

| visitorid (rows) / itemid (columns) | 15 | 19 | 25 | 42 | 147 | 168 | 199 | 212 | 233 | 304 | ... | 466319 | 466321 | 466342 | 466443 | 466464 | 466526 | 466603 | 466614 | 466710 | 466861 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 172 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 186 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 264 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 419 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 539 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
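
The matrix has 11,719 × 12,025 (about 141 million) cells, but the data holds only 22,457 transaction rows, so nearly every cell is zero. One possible alternative to the dense pivot table is a SciPy sparse matrix that stores only the non-zero counts. The sketch below uses hypothetical toy purchases; `toy_purchases`, `user_codes`, `item_codes`, and `sparse_matrix` are illustrative names, not part of the notebook.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

# Hypothetical toy purchases: one row per purchased (user, item) pair.
toy_purchases = pd.DataFrame({
    "visitorid": [172, 172, 186, 264, 264],
    "itemid": [15, 19, 15, 19, 19],
})

# Map raw IDs to consecutive row/column positions.
user_codes, user_ids = pd.factorize(toy_purchases["visitorid"], sort=True)
item_codes, item_ids = pd.factorize(toy_purchases["itemid"], sort=True)

# Repeated (row, column) pairs are summed, so each cell holds a purchase count.
ones = np.ones(len(toy_purchases), dtype=np.int32)
sparse_matrix = csr_matrix(
    (ones, (user_codes, item_codes)),
    shape=(len(user_ids), len(item_ids)),
)
print(sparse_matrix.toarray())
```

scikit-learn's `cosine_similarity` accepts sparse input, so a matrix built this way could be passed to it (transposed) in place of the dense DataFrame. The `user_ids` and `item_ids` arrays would then be needed to translate positions back to the original IDs.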


# Step 6: Computing Item-Item Cosine Similarity

In [36]:
# Calculating cosine similarity between items (columns)
item_similarity = cosine_similarity(user_item_matrix.T)

# Converting similarity matrix to DataFrame
item_similarity_df = pd.DataFrame(item_similarity, 
                                  index=user_item_matrix.columns, 
                                  columns=user_item_matrix.columns)

# Show the first few rows of the similarity matrix
item_similarity_df.head()

| itemid | 15 | 19 | 25 | 42 | 147 | 168 | 199 | 212 | 233 | 304 | ... | 466319 | 466321 | 466342 | 466443 | 466464 | 466526 | 466603 | 466614 | 466710 | 466861 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 15 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 19 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 42 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 147 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
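
Cosine similarity between two items is the dot product of their purchase vectors divided by the product of the vectors' lengths. It is 1.0 on the diagonal (an item compared with itself) and 0.0 whenever no user bought both items, which explains the many zeros above. The sketch below works this out by hand on a hypothetical 3-user, 3-item matrix and compares the result with scikit-learn; `toy_matrix` and the other names are illustrative only.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical purchase counts: rows are users, columns are items A, B, C.
toy_matrix = np.array([
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 1],
])

# One row per item, as in cosine_similarity(user_item_matrix.T).
item_vectors = toy_matrix.T.astype(float)

# Dot products between all item pairs, divided by the product of vector lengths.
norms = np.linalg.norm(item_vectors, axis=1, keepdims=True)
manual_similarity = (item_vectors @ item_vectors.T) / (norms * norms.T)

sklearn_similarity = cosine_similarity(item_vectors)
print(np.round(manual_similarity, 3))
```

Items A and B share one buyer out of two each, giving a similarity of 0.5, while A and C share none and score 0.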


# Step 7: Recommendation Function

In [38]:
# Function to get the top N items most similar to a given item
def get_similar_items(item_id, top_n=5):
    """Return the top_n most similar items, or a message if item_id is unknown."""
    if item_id in item_similarity_df.columns:
        similar_scores = item_similarity_df[item_id].sort_values(ascending=False)
        similar_scores = similar_scores.drop(item_id)  # Remove the item itself
        return similar_scores.head(top_n)
    else:
        return "Item not found in data"
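
The function above works per item, whereas the problem statement asks for suggestions based on a user's purchase history. One possible extension, sketched here under assumptions, sums the similarity scores of everything a user has bought and ranks the items they have not bought yet. The function `recommend_for_user` and the scores in `toy_similarity` are hypothetical and not part of the notebook.

```python
import pandas as pd

# Hypothetical item-item similarity scores for four items.
toy_similarity = pd.DataFrame(
    [[1.0, 0.5, 0.0, 0.2],
     [0.5, 1.0, 0.7, 0.0],
     [0.0, 0.7, 1.0, 0.1],
     [0.2, 0.0, 0.1, 1.0]],
    index=[15, 19, 25, 42],
    columns=[15, 19, 25, 42],
)

def recommend_for_user(purchased_items, similarity_df, top_n=2):
    """Rank unpurchased items by their summed similarity to the purchased ones."""
    # Ignore purchases that have no row in the similarity matrix.
    known = [item for item in purchased_items if item in similarity_df.index]
    if not known:
        return pd.Series(dtype=float)
    scores = similarity_df.loc[known].sum(axis=0)
    scores = scores.drop(labels=known)  # do not recommend what was already bought
    return scores.sort_values(ascending=False).head(top_n)

user_recommendations = recommend_for_user([15, 19], toy_similarity)
print(user_recommendations)
```

With the real data, a user's purchased items could be read from the non-zero columns of their row in `user_item_matrix`. Summing scores is only one way to combine them; averaging or taking the maximum are equally plausible choices.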


# Step 8: Testing the Recommendation System

In [39]:
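# Look up similar items for item 355908; replace it with any itemid of interest
get_similar_items(355908)

'Item not found in data'

Item 355908 has no `transaction` events, so the Step 4 filter removed it and it has no column in the similarity matrix. Only an `itemid` that appears in `user_item_matrix.columns` returns recommendations.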
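
To see actual recommendations, the test ID has to be one of the column labels of the user-item matrix. A minimal sketch of one way to pick such an ID, shown on a hypothetical toy matrix (`toy_user_item` and its values are made up for illustration):

```python
import pandas as pd

# Hypothetical stand-in for the user-item purchase matrix
# (rows: users, columns: items, cells: purchase counts).
toy_user_item = pd.DataFrame(
    [[1, 0, 2], [0, 0, 1], [1, 1, 0]],
    index=[172, 186, 264],
    columns=[15, 19, 25],
)

# Only the matrix's column labels are valid inputs for the similarity lookup.
valid_item_ids = toy_user_item.columns

# One reasonable pick: the item with the most recorded purchases.
most_purchased_item = toy_user_item.sum(axis=0).idxmax()
print(most_purchased_item in valid_item_ids, most_purchased_item)
```

Applied to the notebook's data, the same idea would be `user_item_matrix.sum(axis=0).idxmax()`, whose result could then be passed to `get_similar_items`.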

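## Conclusion:
This notebook builds a product recommendation system using item-based collaborative filtering on the `transaction` events of the Retailrocket dataset.
It computes cosine similarity between the purchase vectors of 12,025 items across 11,719 users and returns the top-N most similar products for a given item. Recommendations are available only for items with at least one recorded purchase.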

