# Product Recommendation System Project:


In this project, you can access a CSV format dataset. Your objective is to create a system that can generate a personalized list of recommended products for each user. Additionally, you'll be tasked with establishing a new recommendation databasesers.




- Begin by importing and visualizing the dataset. Organize the data into a 2D matrix format, utilizing User ID (UID) and Product ID as the basis for recording user interactions.
- Compute the features for each user. In this context, features represent the count of distinct products purchased by each user.
- Employ a nearest neighbor algorithm to identify the five closest neighbors for each user. These neighbors will be chosen based on similarity in product purchase behavior.
- Combine the products purchased by the current user and their five nearest neighbors (including the user itself). This forms a union of products collectively bought by this group.
- Implement the process of systematically identifying nearest neighbors and aggregating purchased products for each user. This will result in the creation of a recommendation database tailored to individual users.



In [121]:
# start here
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

### Data Importing and EDA

In [122]:
df_purchasing = pd.read_csv("data/project_I.csv")

# import the dataset
df_purchasing = df_purchasing
display(df_purchasing.shape, df_purchasing.dtypes, df_purchasing[:3])

(5000, 11)

id               int64
product          int64
recommended       bool
shop            object
uid             object
api_key         object
email          float64
order_id       float64
created_at      object
device          object
price          float64
dtype: object

Unnamed: 0,id,product,recommended,shop,uid,api_key,email,order_id,created_at,device,price
0,295,8270579663139,True,shopcast-stage-1-0.myshopify.com,kM7u3GN9Qqme,015629919a40414db823561bddb1e8e3,,,2023-08-18 13:45:54,desktop,2490.0
1,228,8270579335459,True,shopcast-stage-1-0.myshopify.com,Ue66GQ3Hp5Ha,015629919a40414db823561bddb1e8e3,,,2023-08-16 12:59:52,desktop,960000.0
2,279,8270579335459,True,shopcast-stage-1-0.myshopify.com,kM7u3GN9Qqme,015629919a40414db823561bddb1e8e3,,,2023-08-18 07:29:15,desktop,960000.0


In [123]:
# Check for NaN Values
display(df_purchasing.shape, df_purchasing.isnull().sum())

(5000, 11)

id                0
product           0
recommended       0
shop              0
uid               0
api_key           0
email          5000
order_id       5000
created_at        0
device            0
price          4965
dtype: int64

In [124]:
# Treat the NaN Values
df_purchasing["price"].interpolate(method="linear", inplace=True)
display(df_purchasing.shape, df_purchasing.isnull().sum(), df_purchasing[:3])

(5000, 11)

id                0
product           0
recommended       0
shop              0
uid               0
api_key           0
email          5000
order_id       5000
created_at        0
device            0
price             0
dtype: int64

Unnamed: 0,id,product,recommended,shop,uid,api_key,email,order_id,created_at,device,price
0,295,8270579663139,True,shopcast-stage-1-0.myshopify.com,kM7u3GN9Qqme,015629919a40414db823561bddb1e8e3,,,2023-08-18 13:45:54,desktop,2490.0
1,228,8270579335459,True,shopcast-stage-1-0.myshopify.com,Ue66GQ3Hp5Ha,015629919a40414db823561bddb1e8e3,,,2023-08-16 12:59:52,desktop,960000.0
2,279,8270579335459,True,shopcast-stage-1-0.myshopify.com,kM7u3GN9Qqme,015629919a40414db823561bddb1e8e3,,,2023-08-18 07:29:15,desktop,960000.0


In [125]:
# Drop the Empty Columns
df_purchasing.dropna(axis=1, how="all", inplace=True)
display(df_purchasing.shape, df_purchasing.isnull().sum(), df_purchasing[:3])

(5000, 9)

id             0
product        0
recommended    0
shop           0
uid            0
api_key        0
created_at     0
device         0
price          0
dtype: int64

Unnamed: 0,id,product,recommended,shop,uid,api_key,created_at,device,price
0,295,8270579663139,True,shopcast-stage-1-0.myshopify.com,kM7u3GN9Qqme,015629919a40414db823561bddb1e8e3,2023-08-18 13:45:54,desktop,2490.0
1,228,8270579335459,True,shopcast-stage-1-0.myshopify.com,Ue66GQ3Hp5Ha,015629919a40414db823561bddb1e8e3,2023-08-16 12:59:52,desktop,960000.0
2,279,8270579335459,True,shopcast-stage-1-0.myshopify.com,kM7u3GN9Qqme,015629919a40414db823561bddb1e8e3,2023-08-18 07:29:15,desktop,960000.0


#### Approach 1: Collaborative Fitting

In [126]:
# Algorithm:
# 1. Record the Products Bought by the each user Separately Hint: (One Hot encod the products)
# 2. Form the clusters of the users based on the type of products they buy.
# 3. Recommend the product from the cluster's other buyers that a user haven't bought yet.

In [127]:
# Implementation:
# 1. Make separate DataFrame for each user OR Use Group By Hint: (One Hot encod the products)
# 2. Form the clusters of the users based on the type of products they buy.
# 3. Recommend the product from the cluster's other buyers that a user haven't bought yet.

##### Code:

In [128]:
# Check the unique Values in the Product Column
display(df_purchasing["product"].unique().shape)

(1109,)

In [129]:
# Make a product-focused copy of cleaned dataframe
df_purchasing_prodfocused = df_purchasing.copy()
display(df_purchasing_prodfocused.shape, df_purchasing_prodfocused.dtypes)

(5000, 9)

id               int64
product          int64
recommended       bool
shop            object
uid             object
api_key         object
created_at      object
device          object
price          float64
dtype: object

In [130]:
# One hot encode the Product Column
prod_encoding = pd.get_dummies(
    df_purchasing.loc[:, "product"], dtype=int, prefix="product", prefix_sep="_"
)
display(prod_encoding.shape, prod_encoding.dtypes, prod_encoding[:3])

(5000, 1109)

product_2617304088694    int32
product_3880595226742    int32
product_3880595259510    int32
product_4408323145846    int32
product_4408330223734    int32
                         ...  
product_8270579564835    int32
product_8270579597603    int32
product_8270579663139    int32
product_8270579695907    int32
product_8270580023587    int32
Length: 1109, dtype: object

Unnamed: 0,product_2617304088694,product_3880595226742,product_3880595259510,product_4408323145846,product_4408330223734,product_4408330289270,product_4408330748022,product_4408330813558,product_4416592281718,product_4418227699830,...,product_8270569537827,product_8270570750243,product_8270573175075,product_8270579269923,product_8270579335459,product_8270579564835,product_8270579597603,product_8270579663139,product_8270579695907,product_8270580023587
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,1,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,1,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,1,0,0,0,0,0


In [131]:
# Drop the product column from the product focused Dataframe
column_to_drop = "product"
if column_to_drop in df_purchasing_prodfocused.columns:
    df_purchasing_prodfocused.drop(columns=column_to_drop, inplace=True)
else:
    print(f"The column '{column_to_drop}' does not exist in the DataFrame.")

# Stitch the Encoded dataframe with the product-focused dataframe
df_purchasing_prod_encoded = pd.concat(
    [df_purchasing_prodfocused, prod_encoding], axis=1
)
display(
    df_purchasing_prod_encoded.shape,
    df_purchasing_prod_encoded.dtypes,
    df_purchasing_prod_encoded[:3],
)

(5000, 1117)

id                        int64
recommended                bool
shop                     object
uid                      object
api_key                  object
                          ...  
product_8270579564835     int32
product_8270579597603     int32
product_8270579663139     int32
product_8270579695907     int32
product_8270580023587     int32
Length: 1117, dtype: object

Unnamed: 0,id,recommended,shop,uid,api_key,created_at,device,price,product_2617304088694,product_3880595226742,...,product_8270569537827,product_8270570750243,product_8270573175075,product_8270579269923,product_8270579335459,product_8270579564835,product_8270579597603,product_8270579663139,product_8270579695907,product_8270580023587
0,295,True,shopcast-stage-1-0.myshopify.com,kM7u3GN9Qqme,015629919a40414db823561bddb1e8e3,2023-08-18 13:45:54,desktop,2490.0,0,0,...,0,0,0,0,0,0,0,1,0,0
1,228,True,shopcast-stage-1-0.myshopify.com,Ue66GQ3Hp5Ha,015629919a40414db823561bddb1e8e3,2023-08-16 12:59:52,desktop,960000.0,0,0,...,0,0,0,0,1,0,0,0,0,0
2,279,True,shopcast-stage-1-0.myshopify.com,kM7u3GN9Qqme,015629919a40414db823561bddb1e8e3,2023-08-18 07:29:15,desktop,960000.0,0,0,...,0,0,0,0,1,0,0,0,0,0


In [132]:
# Initialize an empty dictionary to store aggregation rules
agg_rules = {}
for col in df_purchasing_prod_encoded.columns:
    if df_purchasing_prod_encoded[col].isin([0, 1]).all():
        agg_rules[col] = "sum"
    else:
        agg_rules[col] = "first"

# Aggregate the Consumers
df_prod_grouped = df_purchasing_prod_encoded.groupby("uid").agg(agg_rules)

display(df_prod_grouped.shape)

(2699, 1117)

In [133]:
df_prod_classifier = df_prod_grouped.drop(
    columns={"id", "recommended", "shop", "api_key", "created_at", "device", "price"}
)
display(df_prod_classifier.shape, df_prod_classifier[:3])

(2699, 1110)

Unnamed: 0_level_0,uid,product_2617304088694,product_3880595226742,product_3880595259510,product_4408323145846,product_4408330223734,product_4408330289270,product_4408330748022,product_4408330813558,product_4416592281718,...,product_8270569537827,product_8270570750243,product_8270573175075,product_8270579269923,product_8270579335459,product_8270579564835,product_8270579597603,product_8270579663139,product_8270579695907,product_8270580023587
uid,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
--HmgGZN-V-C,--HmgGZN-V-C,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
-2he5U_l1TLY,-2he5U_l1TLY,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
-3pyOXldrSof,-3pyOXldrSof,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [134]:
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

x_train, x_test, y_train, y_test = train_test_split(
    df_prod_classifier.iloc[:, 1:], df_prod_classifier.iloc[:, 0]
)

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(x_train, y_train)
y_pred = knn.predict(x_test)

In [135]:
def recommend_product(uid):
    neighbors_indices = knn.kneighbors(
        df_prod_classifier.loc[df_prod_classifier["uid"] == uid, :].iloc[:, 1:],
        n_neighbors=10,
        return_distance=False,
    )[0]

    user_products = (
        df_prod_classifier.loc[df_prod_classifier["uid"] == uid, :]
        .drop(columns="uid")
        .to_numpy()
        .flatten()
    )

    neighbors = []
    for i in range(10):
        neighbors.append(df_prod_classifier.iloc[neighbors_indices[i], 1:])
    neighbors_products = np.array(neighbors)

    # neighbors_union = np.logical_or.reduce(neighbors_products[i] for i in range(neighbors_products.shape[0]))
    neighbors_union = np.logical_or.reduce(neighbors_products, axis=0)
    neighbors_products = neighbors_union

    # Filter out already purchased products
    recommended_products = neighbors_products - user_products
    recommendation_frame = pd.DataFrame(
        recommended_products.reshape(1, -1), columns=df_prod_classifier.columns[1:]
    )
    recommendation_frame = recommendation_frame.loc[
        :, (recommendation_frame != 0).any(axis=0)
    ]

    return recommendation_frame

In [138]:
# Expected output:
# execute=True
# while(execute):
display("Please enter the user Id to get the Product recommendations:")
# uId=input ('Enter Something')
uId = "-3pyOXldrSof"
display(
    f"User {uId} should also buy following products {recommend_product(uId).columns[:4]}"
)

'Please enter the user Id to get the Product recommendations:'

"User -3pyOXldrSof should also buy following products Index(['product_6549060223094', 'product_6613779677302',\n       'product_6618557448310', 'product_6682258342006'],\n      dtype='object')"