# Product Recommendation

Collaborative filtering rests on the idea that relationships exist between products and people's interests. Many recommendation systems use it to find these relationships and to recommend products that a user is likely to be interested in. There are two main approaches: user-based and item-based.

In [1]:
import pandas as pd

In [2]:
meta=pd.read_csv('meta.csv')
meta.head(5)

Unnamed: 0,productid,brand,category,subcategory,name
0,HBV00000AX6LR,Palette,Kişisel Bakım,Saç Bakımı,Palette Kalıcı Doğal Renkler 10-4 PAPATYA
1,HBV00000BSAQG,Best,Pet Shop,Kedi,Best Pet Jöle İçinde Parça Etli Somonlu Konser...
2,HBV00000JUHBA,Tarım Kredi,Temel Gıda,"Bakliyat, Pirinç, Makarna",Türkiye Tarım Kredi Koop.Yeşil Mercimek 1 kg
3,HBV00000NE0QI,Namet,"Et, Balık, Şarküteri",Şarküteri,Namet Fıstıklı Macar Salam 100 gr
4,HBV00000NE0UQ,Muratbey,Kahvaltılık ve Süt,Peynir,Muratbey Burgu Peyniri 250 gr


In [3]:
events=pd.read_csv('events.csv')
events.head(5)

Unnamed: 0,event,sessionid,eventtime,price,productid
0,cart,a0655eee-1267-4820-af21-ad8ac068ff7a,2020-06-01T08:59:16.406Z,14.48,HBV00000NVZE8
1,cart,d2ea7bd3-9235-4a9f-a9ea-d7f296e71318,2020-06-01T08:59:46.580Z,49.9,HBV00000U2B18
2,cart,5e594788-78a0-44dd-8e66-37022d48f691,2020-06-01T08:59:33.308Z,1.99,OFIS3101-080
3,cart,fdfeb652-22fa-4153-b9b5-4dfa0dcaffdf,2020-06-01T08:59:31.911Z,2.25,HBV00000NVZBW
4,cart,9e9d4f7e-898c-40fb-aae9-256c40779933,2020-06-01T08:59:33.888Z,9.95,HBV00000NE0T4


There are two datasets: one contains product details and the other contains session events. I used productid as the key to merge the two CSV files.

In [4]:
merged_data=meta.merge(events,on="productid")
merged_data.to_csv("merged_data.csv")
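The `Unnamed: 0` column that shows up after reloading the merged file comes from `to_csv` writing the row index as an unnamed first column. A minimal sketch of this behaviour, assuming a made-up two-row frame and an in-memory buffer in place of the real file:

```python
import io
import pandas as pd

# Toy frame standing in for merged_data (values are illustrative only)
toy = pd.DataFrame({"productid": ["P1", "P2"], "price": [1.99, 14.9]})

# Default: the row index is written as an unnamed first column
with_index = io.StringIO()
toy.to_csv(with_index)
with_index.seek(0)
reloaded_with_index = pd.read_csv(with_index)

# index=False: only the real columns are written
without_index = io.StringIO()
toy.to_csv(without_index, index=False)
without_index.seek(0)
reloaded_clean = pd.read_csv(without_index)

print(list(reloaded_with_index.columns))  # ['Unnamed: 0', 'productid', 'price']
print(list(reloaded_clean.columns))       # ['productid', 'price']
```

Writing with `index=False` would make the later column drop unnecessary; the notebook keeps the default and drops the column instead.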

In [5]:
data=pd.read_csv('merged_data.csv')
data.head(5)

Unnamed: 0.1,Unnamed: 0,productid,brand,category,subcategory,name,event,sessionid,eventtime,price
0,0,HBV00000AX6LR,Palette,Kişisel Bakım,Saç Bakımı,Palette Kalıcı Doğal Renkler 10-4 PAPATYA,cart,cd34b98c-1e65-4dbb-945c-ca4955a9ad3c,2020-06-02T07:41:35.600Z,14.9
1,1,HBV00000AX6LR,Palette,Kişisel Bakım,Saç Bakımı,Palette Kalıcı Doğal Renkler 10-4 PAPATYA,cart,cd34b98c-1e65-4dbb-945c-ca4955a9ad3c,2020-06-02T07:41:36.982Z,14.9
2,2,HBV00000BSAQG,Best,Pet Shop,Kedi,Best Pet Jöle İçinde Parça Etli Somonlu Konser...,cart,9b1bc61a-abd1-48b6-950d-e4b7e5fbdc44,2020-06-09T14:07:16.068Z,11.99
3,3,HBV00000BSAQG,Best,Pet Shop,Kedi,Best Pet Jöle İçinde Parça Etli Somonlu Konser...,cart,89236793-7661-4043-a33e-cbdd80b728ae,2020-06-14T11:12:10.737Z,11.99
4,4,HBV00000JUHBA,Tarım Kredi,Temel Gıda,"Bakliyat, Pirinç, Makarna",Türkiye Tarım Kredi Koop.Yeşil Mercimek 1 kg,cart,7648b59c-fae4-4486-afbd-25146fa154ff,2020-06-01T08:50:44.463Z,10.5


In [6]:
# Drop the unnecessary index column that to_csv wrote into the file
data=data.drop(['Unnamed: 0'], axis=1)

In [7]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 387656 entries, 0 to 387655
Data columns (total 9 columns):
 #   Column       Non-Null Count   Dtype  
---  ------       --------------   -----  
 0   productid    387650 non-null  object 
 1   brand        255805 non-null  object 
 2   category     387650 non-null  object 
 3   subcategory  387650 non-null  object 
 4   name         387650 non-null  object 
 5   event        387656 non-null  object 
 6   sessionid    387656 non-null  object 
 7   eventtime    387656 non-null  object 
 8   price        387650 non-null  float64
dtypes: float64(1), object(8)
memory usage: 26.6+ MB


In [8]:
# Count NaN values in sessionid, then drop the rows where it is missing
data['sessionid'].isna().sum()
data = data.dropna(subset=['sessionid'])

The event column holds the string 'cart', but I use it as a quantity. Every cart event counts as one purchase, so summing these values answers the question of how many products each customer bought. I replaced every 'cart' with 1 and converted the column to float for later use.

In [9]:
data['event'] = data['event'].replace(['cart'],'1')
data['event'] = data['event'].astype(float)
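The replace-then-cast step above works because every event in this dataset is 'cart'. As a hedged alternative, the sketch below maps event names to quantities with a dictionary, so any unexpected event type turns into NaN instead of raising during the cast. The "view" value is hypothetical and only there to show that behaviour.

```python
import pandas as pd

# Toy event column; "view" is a hypothetical second event type, not seen in the data above
toy_events = pd.Series(["cart", "cart", "view"], name="event")

# Map known event types to a quantity; anything unmapped becomes NaN
quantity = toy_events.map({"cart": 1.0})

print(quantity.tolist())
```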

In [10]:
data.head(5)

Unnamed: 0,productid,brand,category,subcategory,name,event,sessionid,eventtime,price
0,HBV00000AX6LR,Palette,Kişisel Bakım,Saç Bakımı,Palette Kalıcı Doğal Renkler 10-4 PAPATYA,1.0,cd34b98c-1e65-4dbb-945c-ca4955a9ad3c,2020-06-02T07:41:35.600Z,14.9
1,HBV00000AX6LR,Palette,Kişisel Bakım,Saç Bakımı,Palette Kalıcı Doğal Renkler 10-4 PAPATYA,1.0,cd34b98c-1e65-4dbb-945c-ca4955a9ad3c,2020-06-02T07:41:36.982Z,14.9
2,HBV00000BSAQG,Best,Pet Shop,Kedi,Best Pet Jöle İçinde Parça Etli Somonlu Konser...,1.0,9b1bc61a-abd1-48b6-950d-e4b7e5fbdc44,2020-06-09T14:07:16.068Z,11.99
3,HBV00000BSAQG,Best,Pet Shop,Kedi,Best Pet Jöle İçinde Parça Etli Somonlu Konser...,1.0,89236793-7661-4043-a33e-cbdd80b728ae,2020-06-14T11:12:10.737Z,11.99
4,HBV00000JUHBA,Tarım Kredi,Temel Gıda,"Bakliyat, Pirinç, Makarna",Türkiye Tarım Kredi Koop.Yeşil Mercimek 1 kg,1.0,7648b59c-fae4-4486-afbd-25146fa154ff,2020-06-01T08:50:44.463Z,10.5


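Creating a customer-item matrix: tabular data where each column represents a product and each row represents a session (treated here as a customer). Each cell holds the number of cart events for that product in that session (NaN if there are none); a later step reduces it to a 0/1 indicator of whether the customer purchased the product.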

In [11]:
customer_item_matrix = data.pivot_table(
    index='sessionid',
    columns='productid',
    values='event',
    aggfunc='sum'
)
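To make the shape of this pivot concrete, here is a small sketch on invented data (the session and product labels are placeholders, not values from the dataset). Session `s1` adds `P1` twice, so its cell sums to 2, while a product that a session never touched stays NaN.

```python
import pandas as pd

# Toy event log: one row per cart event (labels are made up)
toy = pd.DataFrame({
    "sessionid": ["s1", "s1", "s1", "s2"],
    "productid": ["P1", "P1", "P2", "P2"],
    "event": [1.0, 1.0, 1.0, 1.0],
})

# Rows become sessions, columns become products, cells sum the events
toy_matrix = toy.pivot_table(
    index="sessionid",
    columns="productid",
    values="event",
    aggfunc="sum",
)

print(toy_matrix)
```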

In [12]:
customer_item_matrix.head(50)

productid,AILEBIZIZSMTLDGY54,AILEBIZIZSMTLDHB18,AILEBS179526,AILEBSHSB22037,AILEDALIN275101,AILEDALIN275103,AILEDALIN275105,AILEDALIN275106,AILEDALIN275107,AILEDALIN275114,...,ZYUNIL798204,ZYUNIL798280,ZYUNMASEKM007,ZYUNMASEKM008,ZYUNMASEKM065,ZYUZAY1074,ZYUZAY1166,ZYUZAY1272,ZYVLEDASUN003,ZYWAX12117
sessionid,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
000280f4-62fc-4dcd-b51d-c66ac14d7d8c,,,,,,,,,,,...,,,,,,,,,,
0002e53b-1f60-4309-8380-31ca03de51f8,,,,,,,,,,,...,,,,,,,,,,
0002ef34-6bee-4953-874b-8298ec26b625,,,,,,,,,,,...,,,,,,,,,,
000618de-d415-408c-863e-6124db43f529,,,,,,,,,,,...,,,,,,,,,,
000770d6-c2d4-4ad2-bb2c-b35274bc5e7e,,,,,,,,,,,...,,,,,,,,,,
00078618-dc74-486b-bddb-3c567b4e94ee,,,,,,,,,,,...,,,,,,,,,,
00088987-64b3-490a-9a68-15a994e87a92,,,,,,,,,,,...,,,,,,,,,,
00096b72-5e31-4658-8f1c-a0882d25caa2,,,,,,,,,,,...,,,,,,,,,,
0009e150-07c2-47a3-bef8-224be46dd8e6,,,,,,,,,,,...,,,,,,,,,,
000a2af1-7040-4e15-9e49-f2d40c97019d,,,,,,,,,,,...,,,,,,,,,,


In [13]:
# Convert the matrix to 0/1 values (NaN becomes 0) so that cosine similarity can be applied
print(customer_item_matrix.shape)
customer_item_matrix = customer_item_matrix.applymap(lambda x: 1 if x > 0 else 0)

(54442, 10235)
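On a matrix of this size, an element-by-element lambda is slow, and `DataFrame.applymap` has been deprecated since pandas 2.1 in favour of `DataFrame.map`. A possible vectorized equivalent is sketched below on a toy matrix (made-up labels). It relies on the fact that the comparison `NaN > 0` evaluates to False, so missing cells become 0.

```python
import numpy as np
import pandas as pd

# Toy customer-item matrix with a summed count and a missing cell
toy_matrix = pd.DataFrame(
    {"P1": [2.0, np.nan], "P2": [1.0, 1.0]},
    index=["s1", "s2"],
)

# Compare the whole frame at once, then cast the booleans to integers
binary = (toy_matrix > 0).astype(int)

print(binary)
```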


We compute the cosine similarity between the rows of the customer-item matrix to measure how similar the users' purchase behaviour is.

In [14]:
from sklearn.metrics.pairwise import cosine_similarity
user_user_sim_matrix = pd.DataFrame(cosine_similarity(customer_item_matrix))
user_user_sim_matrix.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,54432,54433,54434,54435,54436,54437,54438,54439,54440,54441
0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.707107,...,0.0,0.0,0.0,0.707107,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.707107,0.0,0.0,...,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [15]:
# Label the rows and columns with session IDs so the matrix is easier to read
user_user_sim_matrix.columns = customer_item_matrix.index
user_user_sim_matrix['Session'] = customer_item_matrix.index
user_user_sim_matrix = user_user_sim_matrix.set_index('Session')
user_user_sim_matrix.head()

sessionid,000280f4-62fc-4dcd-b51d-c66ac14d7d8c,0002e53b-1f60-4309-8380-31ca03de51f8,0002ef34-6bee-4953-874b-8298ec26b625,000618de-d415-408c-863e-6124db43f529,000770d6-c2d4-4ad2-bb2c-b35274bc5e7e,00078618-dc74-486b-bddb-3c567b4e94ee,00088987-64b3-490a-9a68-15a994e87a92,00096b72-5e31-4658-8f1c-a0882d25caa2,0009e150-07c2-47a3-bef8-224be46dd8e6,000a2af1-7040-4e15-9e49-f2d40c97019d,...,fff8f26c-21f3-4180-b731-0c5c08fe19a4,fff9e5c4-5cd6-4130-a12a-c54347a6fe37,fffa4f06-3d93-4e17-a80d-ac7bbb898c9b,fffa7118-17b4-4567-be1b-4b366b46d015,fffb235e-4745-49fb-a452-9d18d54b186b,fffb5e6a-2676-4cd9-b4e4-ab8b6621e0fe,fffbae5f-8102-4a14-84d7-6c11c724cf8d,fffbba74-6999-460f-bd5f-70eaebe689cf,fffd3c61-2f71-4437-986c-e1c30ef5b5fe,ffffcd3c-da03-4667-9c75-9fcafb609c9e
Session,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
000280f4-62fc-4dcd-b51d-c66ac14d7d8c,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
0002e53b-1f60-4309-8380-31ca03de51f8,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.707107,...,0.0,0.0,0.0,0.707107,0.0,0.0,0.0,0.0,0.0,0.0
0002ef34-6bee-4953-874b-8298ec26b625,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
000618de-d415-408c-863e-6124db43f529,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.707107,0.0,0.0,...,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
000770d6-c2d4-4ad2-bb2c-b35274bc5e7e,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
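The labelling can also be done in one step by passing `index` and `columns` when the DataFrame is built. The sketch below shows this on a three-session toy matrix. To stay self-contained it computes cosine similarity with plain NumPy as a stand-in for scikit-learn's `cosine_similarity`, and it assumes no session has an all-zero row (which would cause a division by zero here).

```python
import numpy as np
import pandas as pd

# Toy binary customer-item matrix (labels are made up)
binary = pd.DataFrame(
    [[1, 1, 0], [1, 0, 0], [0, 0, 1]],
    index=["s1", "s2", "s3"],
    columns=["P1", "P2", "P3"],
)

# Scale every row to unit length; the dot product of unit rows is the cosine
values = binary.to_numpy(dtype=float)
unit_rows = values / np.linalg.norm(values, axis=1, keepdims=True)

# Attach session labels to both axes at construction time
sim = pd.DataFrame(unit_rows @ unit_rows.T, index=binary.index, columns=binary.index)

print(sim.round(6))
```

In the toy result, `s1` and `s2` share one product out of two and one, giving 1/sqrt(2), which is the 0.707107 value that also appears in the real matrix.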


The sessions most similar to session '5e594788-78a0-44dd-8e66-37022d48f691' are listed below.

In [17]:
user_user_sim_matrix.loc['5e594788-78a0-44dd-8e66-37022d48f691'].sort_values(ascending=False).head(10)

sessionid
5e594788-78a0-44dd-8e66-37022d48f691    1.000000
81c4d993-e691-49ad-b3f9-7625503ac33f    0.707107
04e024e2-bb07-44c6-94b5-7d15e30d3b85    0.577350
bbf630ff-a632-4818-bd10-556c446cf6fa    0.577350
b818e591-4583-4010-9421-ea0c90f9c25c    0.577350
3f61d162-6d04-4adf-9b19-9e6aab845733    0.577350
5afd4947-310c-4aba-a6ec-45f570afe014    0.577350
5e5486c0-a2f0-4378-bb14-356aa880bdb3    0.408248
3102c247-38f3-4b93-854c-43e7df9f0fca    0.408248
77d88759-66c6-4978-9648-559b8807175e    0.408248
Name: 5e594788-78a0-44dd-8e66-37022d48f691, dtype: float64

For reference, we take 5e594788-78a0-44dd-8e66-37022d48f691 as customer A and 81c4d993-e691-49ad-b3f9-7625503ac33f as customer B. Their similarity is high (0.707), so the products that customer B bought and customer A has not yet bought are also likely to interest customer A. We therefore list the items each customer bought, take the items that appear in B's purchases but not in A's, and recommend those remaining products to customer A.

In [19]:
# Select customer A's row once, then keep only the products that were bought
customerA_row = customer_item_matrix.loc['5e594788-78a0-44dd-8e66-37022d48f691']
items_bought_by_customerA = customerA_row[customerA_row > 0]
print("Items Bought by Customer: ")
print(items_bought_by_customerA)

Items Bought by Customer: 
productid
HBV000004HK0D    1
OFIS3101-080     1
OFISFAB5062      1
Name: 5e594788-78a0-44dd-8e66-37022d48f691, dtype: int64


In [21]:
# Select customer B's row once, then keep only the products that were bought
customerB_row = customer_item_matrix.loc['81c4d993-e691-49ad-b3f9-7625503ac33f']
items_bought_by_customerB = customerB_row[customerB_row > 0]
print("Items bought by other customer:")
print(items_bought_by_customerB)

Items bought by other customer:
productid
HBV000004HK0D    1
HBV00000GIVEX    1
HBV00000J5V8R    1
HBV00000NG8GQ    1
OFIS3101-080     1
OFISFAB5062      1
Name: 81c4d993-e691-49ad-b3f9-7625503ac33f, dtype: int64
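The similarity score of 0.707107 between these two sessions can be checked by hand from the two item lists. For 0/1 vectors, cosine similarity reduces to the number of shared items divided by the square root of the product of the two basket sizes. The helper name `binary_cosine` below is introduced only for this illustration.

```python
import math

# Item sets copied from the two outputs above
items_a = {"HBV000004HK0D", "OFIS3101-080", "OFISFAB5062"}
items_b = items_a | {"HBV00000GIVEX", "HBV00000J5V8R", "HBV00000NG8GQ"}

def binary_cosine(x, y):
    """Cosine similarity of two 0/1 vectors, given as sets of bought items."""
    return len(x & y) / math.sqrt(len(x) * len(y))

similarity = binary_cosine(items_a, items_b)
print(round(similarity, 6))  # 0.707107
```

Here the customers share 3 items, with basket sizes 3 and 6, so the score is 3 / sqrt(18).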


In [26]:
items_to_recommend_to_customerA= set(items_bought_by_customerB.index) - set(items_bought_by_customerA.index)
print("Items to recommend to customer A:")
print(items_to_recommend_to_customerA)
data.loc[data['productid'].isin(items_to_recommend_to_customerA),['productid', 'name']].drop_duplicates().set_index('productid')

Items to recommend to customer A:
{'HBV00000J5V8R', 'HBV00000NG8GQ', 'HBV00000GIVEX'}


productid,name
HBV00000J5V8R,L'Oréal Paris Elseve Mor Şampuan Turunculaşma ...
HBV00000GIVEX,Faber-Castell Renkli Tükenmez Kalem 5+1 Blister
HBV00000NG8GQ,Faber Castell Redline 12 Renk


I take the transpose of the customer-item matrix to build the item-item similarity matrix, then follow the same steps to recommend items that are similar to those customers have bought.



In [27]:
item_item_sim_matrix = pd.DataFrame(cosine_similarity(customer_item_matrix.T))
item_item_sim_matrix.columns = customer_item_matrix.T.index

item_item_sim_matrix['productid'] = customer_item_matrix.T.index
item_item_sim_matrix = item_item_sim_matrix.set_index('productid')

In [28]:
item_item_sim_matrix.head(5)

productid,AILEBIZIZSMTLDGY54,AILEBIZIZSMTLDHB18,AILEBS179526,AILEBSHSB22037,AILEDALIN275101,AILEDALIN275103,AILEDALIN275105,AILEDALIN275106,AILEDALIN275107,AILEDALIN275114,...,ZYUNIL798204,ZYUNIL798280,ZYUNMASEKM007,ZYUNMASEKM008,ZYUNMASEKM065,ZYUZAY1074,ZYUZAY1166,ZYUZAY1272,ZYVLEDASUN003,ZYWAX12117
productid,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
AILEBIZIZSMTLDGY54,1.0,0.384615,0.0,0.201802,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
AILEBIZIZSMTLDHB18,0.384615,1.0,0.0,0.269069,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
AILEBS179526,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
AILEBSHSB22037,0.201802,0.269069,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
AILEDALIN275101,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
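When reading the most similar items off a row of this matrix, skipping the first sorted entry assumes that the item itself always sorts first. That may not hold when another item also has a similarity of exactly 1.0, as can happen for products that are always bought together. A hedged alternative is to drop the item by label before ranking. The helper `top_similar` and the toy matrix below are illustrative assumptions.

```python
import pandas as pd

def top_similar(sim_matrix, item, k=10):
    """Return the k items most similar to `item`, excluding the item itself."""
    return sim_matrix.loc[item].drop(item).nlargest(k)

# Toy item-item similarity matrix; P1 and P2 are always bought together
labels = ["P1", "P2", "P3"]
toy_sim = pd.DataFrame(
    [[1.0, 1.0, 0.5], [1.0, 1.0, 0.5], [0.5, 0.5, 1.0]],
    index=labels,
    columns=labels,
)

similar_to_p1 = top_similar(toy_sim, "P1", k=2)
print(list(similar_to_p1.index))  # ['P2', 'P3']
```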


In [32]:
top_10_similar_items = list(item_item_sim_matrix.loc['AILEBIZIZSMTLDGY54'].sort_values(ascending=False).iloc[1:11].index)

print(top_10_similar_items)
print()
print("Similar items to AILEBIZIZSMTLDGY54: ")
print(data.loc[
    data['productid'].isin(top_10_similar_items),
    ['productid', 'name']
].drop_duplicates().set_index('productid').loc[top_10_similar_items])

['HBV00000PV7NK', 'HBV00000GYMOJ', 'HBV00000PV7NP', 'HBV00000PV731', 'HBV00000PV8C5', 'AILEBIZIZSMTLDHB18', 'HBV00000PV8BL', 'HBV00000NGXGG', 'HBV00000GTES9', 'AILEMTTDMT57']

Similar items to AILEBIZIZSMTLDGY54: 
                                                                name
productid                                                           
HBV00000PV7NK                           Fisher Price Poppity Araçlar
HBV00000GYMOJ                        Barbie Dreamtopia Peri Bebekler
HBV00000PV7NP                          Fisher Price Press&Go Araçlar
HBV00000PV731                                Elefun & Friends Junior
HBV00000PV8C5                           Harika Kanatlar Dönüşen Todd
AILEBIZIZSMTLDHB18                          Barbie Kariyer Bebekleri
HBV00000PV8BL                           Harika Kanatlar Dönüşen Flip
HBV00000NGXGG                         Barbie Bebekler Bebek Bakıcısı
HBV00000GTES9       Littlest Pet Shop Ahtapot Miniş Ve Yavrusu B9358
AILEMTTDMT57               