<a href="https://colab.research.google.com/github/kapilmulchandani/E-commerce-recommedation-system/blob/master/Collaborative-Filtering/Collaborative_Filtering.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Collaborative Filtering



Installing required packages:

In [0]:
!pip install scikit-surprise



Importing all the libraries:

In [0]:
from surprise.model_selection import train_test_split
from surprise import KNNWithMeans
from surprise import Dataset
from surprise import accuracy
from surprise import Reader
from collections import defaultdict
import pandas as pd
import numpy as np
import os

Loading Data:

In [2]:
data = pd.read_csv('reviews.csv')
data.head()

Unnamed: 0,reviewerID,asin,reviewerName,helpful,reviewText,overall,summary,unixReviewTime,reviewTime
0,A1YJEY40YUW4SE,7806397051,Andrea,"[3, 4]",Very oily and creamy. Not at all what I expect...,1.0,Don't waste your money,1391040000,"01 30, 2014"
1,A60XNB876KYML,7806397051,Jessica H.,"[1, 1]",This palette was a decent price and I was look...,3.0,OK Palette!,1397779200,"04 18, 2014"
2,A3G6XNM240RMWA,7806397051,Karen,"[0, 1]",The texture of this concealer pallet is fantas...,4.0,great quality,1378425600,"09 6, 2013"
3,A1PQFP6SAJ6D80,7806397051,Norah,"[2, 2]",I really can't tell what exactly this thing is...,2.0,Do not work on my face,1386460800,"12 8, 2013"
4,A38FVHZTNQ271F,7806397051,Nova Amor,"[0, 0]","It was a little smaller than I expected, but t...",3.0,It's okay.,1382140800,"10 19, 2013"


Reading the ratings into a Surprise Dataset, using a Reader with a 1 to 5 rating scale:

In [0]:
reader = Reader(rating_scale=(1, 5))
data = Dataset.load_from_df(data[['reviewerID', 'asin', 'overall']], reader)

Splitting the dataset into training (70%) and test (30%) sets:

In [0]:
trainset, testset = train_test_split(data, test_size=0.3, random_state=10)
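
As a rough illustration of what a seeded 70/30 split does, the sketch below shuffles a list of (user, item, rating) triples and cuts off a test fraction. It uses only the standard library; `split_ratings` and `toy_ratings` are hypothetical names, and Surprise's own `train_test_split` may shuffle and round differently.

```python
import math
import random

def split_ratings(ratings, test_size=0.3, seed=10):
    """Shuffle ratings with a fixed seed and split off a test fraction."""
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)
    n_test = math.ceil(test_size * len(shuffled))
    # The first n_test shuffled ratings become the test set, the rest the training set.
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical toy data: (user id, item id, rating) triples.
toy_ratings = [("u%d" % (i % 4), "i%d" % i, float(1 + i % 5)) for i in range(10)]
train, test = split_ratings(toy_ratings)
print(len(train), len(test))
```

Fixing the seed makes the split reproducible, so every model below is evaluated on the same test ratings.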

## User-based Collaborative Filtering

##### Using Cosine Similarity:
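
A minimal sketch of the cosine similarity between two users, assuming (as the Surprise documentation describes) that only items rated by both users are taken into account. The function name and the toy ratings are hypothetical, not part of the library.

```python
import math

def cosine_sim(ratings_u, ratings_v):
    """Cosine similarity between two users over the items both have rated."""
    common = set(ratings_u) & set(ratings_v)
    if not common:
        return 0.0
    dot = sum(ratings_u[i] * ratings_v[i] for i in common)
    norm_u = math.sqrt(sum(ratings_u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(ratings_v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

# Hypothetical toy ratings: item id -> rating on a 1 to 5 scale.
alice = {"A": 5.0, "B": 3.0, "C": 4.0}
bob = {"A": 4.0, "B": 2.0, "D": 1.0}
print(round(cosine_sim(alice, bob), 4))  # 0.9971
```

Because ratings are all positive, raw cosine similarity tends to be close to 1 for most pairs, which limits how well it separates neighbors.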


In [5]:
algo = KNNWithMeans(k=5, sim_options={'name': 'cosine', 'user_based': True})
algo.fit(trainset)

Computing the cosine similarity matrix...
Done computing similarity matrix.


<surprise.prediction_algorithms.knns.KNNWithMeans at 0x7f1ca5036390>


Run the trained model against the testset



In [0]:
user_cosine_test_pred = algo.test(testset)
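
For intuition, here is a sketch of the mean-centered k-NN estimate that `KNNWithMeans` is built around: the user's mean rating plus a similarity-weighted average of the neighbors' deviations from their own means. The helper name, the tuple layout, and the toy numbers are assumptions for illustration; the library's implementation differs in its details.

```python
def predict_with_means(user_mean, neighbors, k=5, rating_scale=(1.0, 5.0)):
    """Mean-centered k-NN estimate for one (user, item) pair.

    neighbors: list of (similarity, neighbor_rating, neighbor_mean) tuples
    for the users who rated the target item.
    """
    # Keep the k most similar neighbors and ignore non-positive similarities.
    top_k = sorted(neighbors, key=lambda t: t[0], reverse=True)[:k]
    num = sum(sim * (r - mean) for sim, r, mean in top_k if sim > 0)
    den = sum(sim for sim, _, _ in top_k if sim > 0)
    # Fall back to the user's mean when no usable neighbor exists.
    est = user_mean if den == 0 else user_mean + num / den
    low, high = rating_scale
    return min(high, max(low, est))

# Hypothetical neighbors: (similarity, their rating of the item, their mean rating).
neighbors = [(0.9, 5.0, 4.0), (0.5, 2.0, 3.0), (0.0, 1.0, 2.0)]
est = predict_with_means(3.5, neighbors)
print(round(est, 4))  # 3.7857
```

Centering on each user's mean compensates for users who rate systematically high or low.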

Get RMSE and MSE of User-based Model using Cosine Similarity

In [7]:
print("RMSE of User-based Model using Cosine Similarity: Test Set")
user_cosine_rmse = accuracy.rmse(user_cosine_test_pred)
print("MSE of User-based Model using Cosine Similarity: Test Set")
user_cosine_mse = accuracy.mse(user_cosine_test_pred)

RMSE of User-based Model using Cosine Similarity: Test Set
RMSE: 1.2051
MSE of User-based Model using Cosine Similarity: Test Set
MSE: 1.4523
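
The two metrics are directly related: MSE is the mean of the squared errors and RMSE is its square root. The sketch below computes both on a few hypothetical (true rating, estimate) pairs and checks that relation against the values printed above.

```python
import math

def mse(pairs):
    """Mean squared error over (true_rating, estimate) pairs."""
    return sum((true_r - est) ** 2 for true_r, est in pairs) / len(pairs)

def rmse(pairs):
    """Root mean squared error: the square root of the MSE."""
    return math.sqrt(mse(pairs))

# Hypothetical toy predictions, not taken from the dataset.
pairs = [(3.0, 4.0), (5.0, 4.5), (4.0, 4.0), (2.0, 3.0)]
print(mse(pairs), rmse(pairs))  # 0.5625 0.75

# The reported RMSE of 1.2051 squares to the reported MSE of 1.4523.
print(round(1.2051 ** 2, 4))  # 1.4523
```

RMSE is in the same units as the ratings, so an RMSE of about 1.2 means predictions are typically off by a little more than one star.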


##### Using Mean Squared Difference Similarity:
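
A minimal sketch of the MSD similarity, assuming the definition in the Surprise documentation: the mean squared difference of ratings over co-rated items, turned into a similarity as `1 / (msd + 1)`. The function name and toy ratings are hypothetical.

```python
def msd_sim(ratings_u, ratings_v):
    """Mean squared difference similarity over the items both users rated."""
    common = set(ratings_u) & set(ratings_v)
    if not common:
        return 0.0
    msd = sum((ratings_u[i] - ratings_v[i]) ** 2 for i in common) / len(common)
    # The +1 avoids division by zero when the two users agree exactly.
    return 1.0 / (msd + 1.0)

# Hypothetical toy ratings: item id -> rating.
alice = {"A": 5.0, "B": 3.0, "C": 4.0}
bob = {"A": 4.0, "B": 2.0, "D": 1.0}
print(msd_sim(alice, bob))  # 0.5
```

Identical ratings give a similarity of 1, and the value decays toward 0 as the ratings diverge.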

In [8]:
algo = KNNWithMeans(k=5, sim_options={'name': 'msd', 'user_based': True})
algo.fit(trainset)

Computing the msd similarity matrix...
Done computing similarity matrix.


<surprise.prediction_algorithms.knns.KNNWithMeans at 0x7f1c8e71dc50>

Run the trained model against the testset

In [0]:
user_msd_test_pred = algo.test(testset)

Get RMSE and MSE of User-based Model using Mean Squared Difference Similarity

In [10]:
print("RMSE of User-based Model using Mean Squared Difference: Test Set")
user_msd_rmse = accuracy.rmse(user_msd_test_pred)
print("MSE of User-based Model using Mean Squared Difference: Test Set")
user_msd_mse = accuracy.mse(user_msd_test_pred)

RMSE of User-based Model using Mean Squared Difference: Test Set
RMSE: 1.2080
MSE of User-based Model using Mean Squared Difference: Test Set
MSE: 1.4594


##### Using Pearson Baseline Correlation:
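
The sketch below illustrates the idea behind the `pearson_baseline` similarity as described in the Surprise documentation: a Pearson-style correlation that centers ratings on baseline estimates instead of means, then shrinks the result when few items are co-rated. The baselines are passed in as hypothetical inputs here; in the library they are estimated from the training data (with ALS by default).

```python
import math

def pearson_baseline_sim(ratings_u, ratings_v, baselines_u, baselines_v, shrinkage=100):
    """Shrunk correlation of baseline-centered ratings over co-rated items."""
    common = sorted(set(ratings_u) & set(ratings_v))
    if not common:
        return 0.0
    dev_u = [ratings_u[i] - baselines_u[i] for i in common]
    dev_v = [ratings_v[i] - baselines_v[i] for i in common]
    den = math.sqrt(sum(d * d for d in dev_u)) * math.sqrt(sum(d * d for d in dev_v))
    if den == 0:
        return 0.0
    corr = sum(a * b for a, b in zip(dev_u, dev_v)) / den
    n = len(common)
    if n - 1 + shrinkage == 0:
        return 0.0
    # Few co-rated items means a strong pull toward zero.
    return (n - 1) / (n - 1 + shrinkage) * corr

# Hypothetical ratings and baseline estimates: item id -> value.
ratings_u = {"A": 5.0, "B": 3.0, "C": 4.0}
ratings_v = {"A": 4.0, "B": 2.0, "C": 3.0}
baselines_u = {"A": 4.0, "B": 4.0, "C": 4.0}
baselines_v = {"A": 3.0, "B": 3.0, "C": 3.0}
sim = pearson_baseline_sim(ratings_u, ratings_v, baselines_u, baselines_v)
print(round(sim, 4))  # 0.0196
```

With only three co-rated items, a perfect correlation is shrunk to about 0.02, which keeps pairs with little shared evidence from dominating the neighborhood.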


In [5]:
algo = KNNWithMeans(k=5, sim_options={'name': 'pearson_baseline', 'user_based': True})
algo.fit(trainset)

Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.


<surprise.prediction_algorithms.knns.KNNWithMeans at 0x7f50236c1780>

Run the trained model against the testset

In [0]:
user_pb_test_pred = algo.test(testset)

Get RMSE and MSE of User-based Model using Pearson Baseline

In [8]:
print("RMSE of User-based Model using Pearson Baseline : Test Set")
user_pb_rmse = accuracy.rmse(user_pb_test_pred)
print("MSE of User-based Model using Pearson Baseline : Test Set")
user_pb_mse = accuracy.mse(user_pb_test_pred)

RMSE of User-based Model using Pearson Baseline : Test Set
RMSE: 1.1782
MSE of User-based Model using Pearson Baseline : Test Set
MSE: 1.3882


## Item-based Collaborative Filtering
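
Setting `user_based` to `False` computes similarities between items instead of users. Conceptually this means comparing item rating vectors, as in the sketch below, which inverts a user-to-items mapping into an item-to-users mapping. The helper name and toy data are hypothetical.

```python
from collections import defaultdict

def invert_ratings(user_ratings):
    """Turn {user: {item: rating}} into {item: {user: rating}}."""
    item_ratings = defaultdict(dict)
    for user, items in user_ratings.items():
        for item, rating in items.items():
            item_ratings[item][user] = rating
    return dict(item_ratings)

# Hypothetical toy ratings.
user_ratings = {"u1": {"A": 5.0, "B": 3.0}, "u2": {"A": 4.0}}
item_ratings = invert_ratings(user_ratings)
print(item_ratings)
```

Any pairwise similarity defined over co-rated items for users can then be applied to two items over the users who rated both.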

##### Using Cosine Similarity:


In [9]:
algo = KNNWithMeans(k=5, sim_options={'name': 'cosine', 'user_based': False})
algo.fit(trainset)

Computing the cosine similarity matrix...
Done computing similarity matrix.


<surprise.prediction_algorithms.knns.KNNWithMeans at 0x7f501b8a7160>


Run the trained model against the testset



In [0]:
item_cosine_test_pred = algo.test(testset)

Get RMSE and MSE of Item-based Model using Cosine Similarity

In [12]:
print("RMSE of Item-based Model using Cosine Similarity: Test Set")
item_cosine_rmse = accuracy.rmse(item_cosine_test_pred)
print("MSE of Item-based Model using Cosine Similarity: Test Set")
item_cosine_mse = accuracy.mse(item_cosine_test_pred)

RMSE of Item-based Model using Cosine Similarity: Test Set
RMSE: 1.2101
MSE of Item-based Model using Cosine Similarity: Test Set
MSE: 1.4644


##### Using Mean Squared Difference Similarity:

In [13]:
algo = KNNWithMeans(k=5, sim_options={'name': 'msd', 'user_based': False})
algo.fit(trainset)

Computing the msd similarity matrix...
Done computing similarity matrix.


<surprise.prediction_algorithms.knns.KNNWithMeans at 0x7f501b8c80b8>

Run the trained model against the testset

In [0]:
item_msd_test_pred = algo.test(testset)

Get RMSE and MSE of Item-based Model using Mean Squared Difference Similarity

In [17]:
print("RMSE of Item-based Model using Mean Squared Difference: Test Set")
item_msd_rmse = accuracy.rmse(item_msd_test_pred)
print("MSE of Item-based Model using Mean Squared Difference: Test Set")
item_msd_mse = accuracy.mse(item_msd_test_pred)

RMSE of Item-based Model using Mean Squared Difference: Test Set
RMSE: 1.2139
MSE of Item-based Model using Mean Squared Difference: Test Set
MSE: 1.4736


##### Using Pearson Baseline Correlation:


In [18]:
algo = KNNWithMeans(k=5, sim_options={'name': 'pearson_baseline', 'user_based': False})
algo.fit(trainset)

Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.


<surprise.prediction_algorithms.knns.KNNWithMeans at 0x7f501b8c8550>

Run the trained model against the testset

In [0]:
item_pb_test_pred = algo.test(testset)

Get RMSE and MSE of Item-based Model using Pearson Baseline

In [21]:
print("RMSE of Item-based Model using Pearson Baseline : Test Set")
item_pb_rmse = accuracy.rmse(item_pb_test_pred)
print("MSE of Item-based Model using Pearson Baseline : Test Set")
item_pb_mse = accuracy.mse(item_pb_test_pred)

RMSE of Item-based Model using Pearson Baseline : Test Set
RMSE: 1.1883
MSE of Item-based Model using Pearson Baseline : Test Set
MSE: 1.4121
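
To compare the six runs side by side, the sketch below collects the RMSE and MSE values printed in the cells above into a dictionary and ranks them by RMSE. The numbers are copied from those outputs; the dictionary layout is just one convenient choice.

```python
# (mode, similarity) -> (RMSE, MSE), copied from the cell outputs above.
results = {
    ("user", "cosine"): (1.2051, 1.4523),
    ("user", "msd"): (1.2080, 1.4594),
    ("user", "pearson_baseline"): (1.1782, 1.3882),
    ("item", "cosine"): (1.2101, 1.4644),
    ("item", "msd"): (1.2139, 1.4736),
    ("item", "pearson_baseline"): (1.1883, 1.4121),
}

# Print the runs from lowest to highest RMSE.
for (mode, sim), (rmse, mse) in sorted(results.items(), key=lambda kv: kv[1][0]):
    print(f"{mode:<5} {sim:<17} RMSE={rmse:.4f} MSE={mse:.4f}")

best = min(results, key=lambda key: results[key][0])
print("Lowest RMSE:", best)
```

On this split, the user-based model with Pearson baseline similarity has the lowest error, and each user-based model edges out its item-based counterpart.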


In [0]:
# Collect user ids, true ratings, and predicted ratings from the
# item-based cosine predictions into a DataFrame.
dataframe = pd.DataFrame(
    [(uid, true_r, est) for uid, iid, true_r, est, info in item_cosine_test_pred],
    columns=['user_id', 'true_ratings', 'predicted_ratings'],
)


In [0]:
print(dataframe)


              user_id  true_ratings  predicted_ratings
0      A3DXSO87O68MGJ           3.0           4.397436
1      A1Q8N5LKI9E85S           5.0           4.500000
2      A36KTGM5NBBSX2           5.0           4.300000
3      A2YPB8IOEU10XJ           4.0           3.575956
4       AF0E5TGS52NRD           5.0           4.555556
...               ...           ...                ...
59546  A1F7YU6O5RU432           4.0           3.500000
59547  A209OVPWRBXVQ6           4.0           1.625000
59548  A3EF6FNX2G4FD3           5.0           4.377778
59549  A3P7E4RXTT0YQR           2.0           2.750000
59550   AITPVQ9TKUIZT           1.0           5.000000

[59551 rows x 3 columns]


## Getting Top N Recommendations for the Users:

In [0]:
def get_top_n(predictions, n=10):
    '''Return the top-N recommendations for each user from a set of predictions.

    Only items present in `predictions` are ranked, so a user with fewer
    than n predictions gets a shorter list.
    '''

    # First map the predictions to each user.
    top_n = defaultdict(list)
    for uid, iid, true_r, est, _ in predictions:
        top_n[uid].append((iid, est))

    # Then sort each user's predictions and keep the n highest ones.
    for uid, user_ratings in top_n.items():
        user_ratings.sort(key=lambda x: x[1], reverse=True)
        top_n[uid] = user_ratings[:n]

    return top_n

    


In [0]:
top_n = get_top_n(item_msd_test_pred)

In [27]:
top_n['A1DQHS7MOVYYYA']

[('B000NKL3D0', 4.5),
 ('B002PBESOQ', 4.4),
 ('B000PLUZL8', 4.264677863560432),
 ('B00KTLBEEQ', 3.9039265299764736),
 ('B0039UTO3M', 3.761555242380717),
 ('B008FZ562E', 3.6931476254169824),
 ('B00BB9WBC4', 3.138624176964143),
 ('B0095BWH5Q', 2.911479996533888),
 ('B001A3ML3K', 2.7222222222222223)]
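
The list above has nine entries rather than ten because only items that appear in the predictions for that user can be ranked. The sketch below shows the same behavior on hypothetical toy predictions, using `heapq.nlargest` as an alternative to sorting each user's full list; `top_n_per_user` is an illustrative name, not a library function.

```python
import heapq
from collections import defaultdict

def top_n_per_user(predictions, n=10):
    """Group (uid, iid, true_r, est, details) tuples by user and keep the n best."""
    by_user = defaultdict(list)
    for uid, iid, true_r, est, _ in predictions:
        by_user[uid].append((iid, est))
    # nlargest returns the items in descending order of estimated rating.
    return {uid: heapq.nlargest(n, items, key=lambda x: x[1])
            for uid, items in by_user.items()}

# Hypothetical toy predictions in the same tuple layout as above.
toy_predictions = [
    ("u1", "A", 4.0, 4.5, {}),
    ("u1", "B", 3.0, 2.9, {}),
    ("u1", "C", 5.0, 4.4, {}),
    ("u2", "A", 2.0, 3.1, {}),
]
top = top_n_per_user(toy_predictions, n=2)
print(top)
```

Here `u1` gets two recommendations while `u2` gets only one, since a single prediction exists for that user.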