## Problem Statement

Build your own recommendation system for products on an e-commerce website like Amazon.com.

Dataset columns: the first three columns are userId, productId, and rating; the fourth column is the timestamp. The timestamp column can be discarded, since it is not needed for this analysis.

Source: Amazon Reviews data (http://jmcauley.ucsd.edu/data/amazon/). The repository has several datasets. For this case study, we are using the Electronics dataset.

 

Please do the analysis based on steps 1 to 8 below.

Steps:

1. Read and explore the given dataset (rename columns/add headers, plot histograms, find data characteristics).
2. Take a subset of the dataset to make it denser (for example, keep only the users who have given 50 or more ratings).
3. Split the data randomly into train and test datasets (for example, in a 70/30 ratio).
4. Build a popularity recommender model.
5. Build a collaborative filtering model.
6. Evaluate both models. (Once a model is trained on the training data, it can be used to compute the error (RMSE) of its predictions on the test data.)
7. Get the top-K (K = 5) recommendations. Since the goal is to recommend new products to each user based on his/her habits, we will recommend 5 new products.
8. Summarise your insights.
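
Step 2 is about density: the share of user-product pairs that actually carry a rating. A minimal sketch of how that could be measured before and after filtering, using a small made-up ratings table (the data, the `density` helper, and the threshold of 2 instead of 50 are illustrative assumptions, not part of the assignment):

```python
import pandas as pd

# Toy ratings table (made-up data, for illustration only)
toy = pd.DataFrame({
    'userId':    ['u1', 'u1', 'u1', 'u2', 'u2', 'u3', 'u4'],
    'productId': ['p1', 'p2', 'p3', 'p1', 'p2', 'p4', 'p5'],
    'ratings':   [5, 4, 3, 4, 5, 2, 1],
})

def density(frame):
    """Share of (user, product) pairs that have a rating."""
    n_users = frame['userId'].nunique()
    n_items = frame['productId'].nunique()
    return len(frame) / (n_users * n_items)

# Keep only users with at least `min_ratings` ratings (2 here, 50 in the assignment)
min_ratings = 2
counts = toy['userId'].value_counts()
dense = toy[toy['userId'].isin(counts[counts >= min_ratings].index)]

print(density(toy))    # density of the full toy table
print(density(dense))  # density after dropping low-activity users
```

Dropping low-activity users removes rows, but it also removes the products that only those users rated, which is why the remaining matrix tends to be denser.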

In [1]:
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')

In [2]:
r_cols = ['reviewerID', 'asin', 'rating', 'reviewtime']
df = pd.read_csv('ratings_Electronics.csv',names=r_cols, encoding='latin-1')
df.head(5)

Unnamed: 0,reviewerID,asin,rating,reviewtime
0,AKM1MP6P0OYPR,132793040,5.0,1365811200
1,A2CX7LUOHB2NDG,321732944,5.0,1341100800
2,A2NWSAGRHCP8N5,439886341,1.0,1367193600
3,A2WNBOD3WNDNKT,439886341,3.0,1374451200
4,A1GI0U4ZRJA8WN,439886341,1.0,1334707200


In [3]:
# shape
print(df.shape)

(7824482, 4)


In [4]:
# types of attributes
print(df.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7824482 entries, 0 to 7824481
Data columns (total 4 columns):
reviewerID    object
asin          object
rating        float64
reviewtime    int64
dtypes: float64(1), int64(1), object(2)
memory usage: 238.8+ MB
None


In [5]:
print(df.isnull().any())
print('\n ------------------ \n')

print(df.isna().any())
print('\n ------------------ \n')

reviewerID    False
asin          False
rating        False
reviewtime    False
dtype: bool

 ------------------ 

reviewerID    False
asin          False
rating        False
reviewtime    False
dtype: bool

 ------------------ 



In [8]:
# descriptions
print(df.describe().T)
print('\n ------------------ \n')

print (df.isna().sum())

                count          mean           std          min           25%  \
rating      7824482.0  4.012337e+00  1.380910e+00          1.0  3.000000e+00   
reviewtime  7824482.0  1.338178e+09  6.900426e+07  912729600.0  1.315354e+09   

                     50%           75%           max  
rating      5.000000e+00  5.000000e+00  5.000000e+00  
reviewtime  1.361059e+09  1.386115e+09  1.406074e+09  

 ------------------ 

reviewerID    0
asin          0
rating        0
reviewtime    0
dtype: int64
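
The summary above shows a median rating of 5, which suggests the ratings are skewed towards the top of the scale. A minimal sketch of how the rating distribution could be tabulated (and, optionally, plotted as a bar chart); the toy ratings below are made up, and on the real data `df['rating']` would take their place:

```python
import pandas as pd

# Made-up ratings standing in for df['rating']
ratings = pd.Series([5.0, 5.0, 4.0, 5.0, 1.0, 3.0, 5.0, 4.0, 2.0, 5.0])

# Count how often each rating value occurs, ordered by rating value
rating_counts = ratings.value_counts().sort_index()

# Convert counts to shares of all ratings
rating_share = rating_counts / rating_counts.sum()

print(rating_counts)
print(rating_share)

# To draw the distribution in a notebook (assumes matplotlib is available):
# rating_counts.plot(kind='bar')
```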


In [0]:
df_json = pd.read_json('/content/gdrive/My Drive/voiceai/Residency 6/Project/Project -- Recommendation System/Electronics_5.json', lines = True)
df_json.to_csv('/content/gdrive/My Drive/voiceai/Residency 6/Project/Project -- Recommendation System/Electronics.csv')

In [0]:
df_ele = pd.read_csv('/content/gdrive/My Drive/voiceai/Residency 6/Project/Project -- Recommendation System/Electronics.csv', index_col=0, encoding='latin-1')
df_ele.head(5)

Unnamed: 0,asin,helpful,overall,reviewText,reviewTime,reviewerID,reviewerName,summary,unixReviewTime
0,528881469,"[0, 0]",5,We got this GPS for my husband who is an (OTR)...,"06 2, 2013",AO94DHGC771SJ,amazdnu,Gotta have GPS!,1370131200
1,528881469,"[12, 15]",1,"I'm a professional OTR truck driver, and I bou...","11 25, 2010",AMO214LNFCEI4,Amazon Customer,Very Disappointed,1290643200
2,528881469,"[43, 45]",3,"Well, what can I say. I've had this unit in m...","09 9, 2010",A3N7T0DY83Y4IG,C. A. Freeman,1st impression,1283990400
3,528881469,"[9, 10]",2,"Not going to write a long review, even thought...","11 24, 2010",A1H8PY3QHMQQA0,"Dave M. Shaw ""mack dave""","Great grafics, POOR GPS",1290556800
4,528881469,"[0, 0]",1,I've had mine for a year and here's what we go...,"09 29, 2011",A24EV6RXELQZ63,Wayne Smith,"Major issues, only excuses for support",1317254400


In [0]:
# shape
print(df_ele.shape)
print('\n ------------------ \n')

# types of attributes
print(df_ele.info())
print('\n ------------------ \n')

print(df_ele.isnull().any())
print('\n ------------------ \n')

print(df_ele.isna().any())
print('\n ------------------ \n')

# descriptions
print(df_ele.describe().T)
print('\n ------------------ \n')

#NA values check
print (df_ele.isna().sum())

(1689188, 9)

 ------------------ 

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1689188 entries, 0 to 1689187
Data columns (total 9 columns):
asin              1689188 non-null object
helpful           1689188 non-null object
overall           1689188 non-null int64
reviewText        1688117 non-null object
reviewTime        1689188 non-null object
reviewerID        1689188 non-null object
reviewerName      1664309 non-null object
summary           1689173 non-null object
unixReviewTime    1689188 non-null int64
dtypes: int64(2), object(7)
memory usage: 128.9+ MB
None

 ------------------ 

asin              False
helpful           False
overall           False
reviewText         True
reviewTime        False
reviewerID        False
reviewerName       True
summary            True
unixReviewTime    False
dtype: bool

 ------------------ 

asin              False
helpful           False
overall           False
reviewText         True
reviewTime        False
reviewerID        False
r

In [0]:
counts_df = df['reviewerID'].value_counts()
df_ge50 = df[df['reviewerID'].isin(counts_df[counts_df >= 50].index)]
df_ge50.reviewerID.value_counts()

A5JLAU2ARJ0BO     520
ADLVFFE4VBT8      501
A3OXHLG6DIBRW8    498
A6FIAB28IS79      431
A680RUE1FDO8B     406
A1ODOGXEYECQQ8    380
A36K2N527TXXJN    314
A2AY4YUOX2N1BQ    311
AWPODHOB4GFWL     308
ARBKYIVNYWK3C     296
A25C2M3QF9G7OQ    296
A22CW0ZHY3NJH8    292
A3EXWV8FNSSFL6    282
A38RMU1Y5TDP9     282
A3LGT6UZL99IW1    279
A2NOW4U7W3F7RI    277
A23GFTVIETX7DS    270
A3PD8JD9L4WEII    266
A17BUUBOU0598B    261
A3AYSYSLHU26U9    257
A2XRMQA6PJ5ZJ8    253
A12DQZKRKTNF5E    252
A231WM2Z2JL0U3    252
A1UQBFCERIP7VJ    247
AGVWTYW0ULXHT     244
A203OCQQ12MAVT    240
AEJAGHLC675A7     239
A2NYK9KWFMJV4Y    238
A3A4ZAIBQWKOZS    236
A3CW0ZLUO5X2B1    227
                 ... 
A17RFKCYS69M3Y     50
A319Y83RT0MRVR     50
A341HCMGNZCBIT     50
A37PV5GMP2ILJC     50
A3BY5KCNQZXV5U     50
A2M9ME0N2S3R39     50
AXU8RH1DEV21H      50
A1LA4K5JF78BER     50
AMO1MLSIJSQOF      50
A2AFTRU43PY9P5     50
A1W4F91DH3XPB2     50
AY4EXFOO43C3S      50
A2RGA7UGAN3UL7     50
A1EOTB1WHLSW6G     50
AOQLV2LSI9

In [0]:
counts_df_ele = df_ele['reviewerID'].value_counts()
df_ele_ge50 = df_ele[df_ele['reviewerID'].isin(counts_df_ele[counts_df_ele >= 50].index)]
df_ele_ge50.reviewerID.value_counts()

ADLVFFE4VBT8      431
A3OXHLG6DIBRW8    407
A6FIAB28IS79      367
A680RUE1FDO8B     352
A5JLAU2ARJ0BO     351
A1ODOGXEYECQQ8    333
A36K2N527TXXJN    281
ARBKYIVNYWK3C     267
A25C2M3QF9G7OQ    261
AWPODHOB4GFWL     260
A22CW0ZHY3NJH8    255
A3EXWV8FNSSFL6    250
A3LGT6UZL99IW1    245
A38RMU1Y5TDP9     244
A23GFTVIETX7DS    241
A2NOW4U7W3F7RI    241
A3AYSYSLHU26U9    231
A17BUUBOU0598B    228
A1UQBFCERIP7VJ    228
A2AY4YUOX2N1BQ    228
A2XRMQA6PJ5ZJ8    227
AGVWTYW0ULXHT     224
A3A4ZAIBQWKOZS    216
A12DQZKRKTNF5E    216
A2UOHALGF2X77Q    211
AEJAGHLC675A7     210
A3PD8JD9L4WEII    204
A1T1YSCDW0PD25    204
A4WEZJOIZIV4U     202
AVPNQUVZWMDSX     201
                 ... 
A1Y051MQ2SVPFI     50
A2HBOG4LVIY15L     50
AATWFX0ZZSE6C      50
A3U029B8Z5WGI2     50
A26CPEEWB2WKRE     50
A15XI2BEGGFEOW     50
AYP0YPLSP9ISM      50
A33CNFK776MTWR     50
A3U6J0DLLDEWM2     50
A1MCH5RXDOH87H     50
A2D0CO1OA6DSWY     50
A2SNE4QQGVP13U     50
A1LYMYNURB9EWW     50
A6TBR6L2D4XKC      50
A3PPO2X5PJ

In [0]:
df_merge = pd.merge(df_ge50, df_ele_ge50, on=['reviewerID', 'asin'])
df_merge.sample(5)

Unnamed: 0,reviewerID,asin,rating,reviewtime,helpful,overall,reviewText,reviewTime,reviewerName,summary,unixReviewTime
15320,A1L64KDYO5BOJA,B000S772V0,5.0,1198800000,"[0, 0]",5,This is a quality product and well worth the m...,"12 28, 2007","Floyd Goodrich ""Jim G.""",Very nice...,1198800000
85994,A22S7D0LP8GRDH,B00GA55OGE,5.0,1399680000,"[0, 0]",5,I LOVE this case. It has now opened up whole n...,"05 10, 2014","Jacob Hantla ""hantla.com""",Have You Ever Read Your Kindle In The Shower? ...,1399680000
47756,A3F7USIDJBR8WU,B004HW7DY8,5.0,1373155200,"[1, 2]",5,Am not going to write a review because others ...,"07 7, 2013",nobody ya know,One of the many we bought so far.,1373155200
51842,A3NEAETOSXDBOM,B004YAYM06,2.0,1314316800,"[1, 1]",2,"I plugged this in, got all green lights, but u...","08 26, 2011",Stephen M. Charme,Did not work with our router,1314316800
35123,A2BX8DDQGCCG2J,B0030LO5CU,5.0,1264464000,"[393, 421]",5,October 2010 update:The review is now woefully...,"01 26, 2010",MWebb,"Excellent Computer, Make Sure Yours Will Accep...",1264464000


In [0]:
print("Shape: ", df_merge.shape)
print("-----------------------------")
print("Columns: ", df_merge.columns)

Shape:  (89407, 11)
-----------------------------
Columns:  Index(['reviewerID', 'asin', 'rating', 'reviewtime', 'helpful', 'overall',
       'reviewText', 'reviewTime', 'reviewerName', 'summary',
       'unixReviewTime'],
      dtype='object')
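
`pd.merge` performs an inner join by default, so only (reviewerID, asin) pairs present in both filtered tables survive the merge. A small sketch, on made-up rows, of how the `indicator` option can show where each pair came from (the frames `left` and `right` are stand-ins, not the real data):

```python
import pandas as pd

# Made-up stand-ins for df_ge50 and df_ele_ge50
left = pd.DataFrame({
    'reviewerID': ['A1', 'A1', 'A2'],
    'asin':       ['P1', 'P2', 'P1'],
    'rating':     [5.0, 3.0, 4.0],
})
right = pd.DataFrame({
    'reviewerID': ['A1', 'A2', 'A3'],
    'asin':       ['P1', 'P1', 'P9'],
    'overall':    [5, 4, 2],
})

# Default (inner) join: keeps only pairs found in both tables
inner = pd.merge(left, right, on=['reviewerID', 'asin'])

# Outer join with indicator=True adds a `_merge` column
# whose values are 'both', 'left_only' or 'right_only'
outer = pd.merge(left, right, on=['reviewerID', 'asin'], how='outer', indicator=True)

print(len(inner))
print(outer['_merge'].value_counts())
```

Counting the `left_only` and `right_only` rows gives a quick view of how many ratings are lost by joining the two sources.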


In [0]:
df_merge.columns = ['reviewerID', 'productID', 'rating', 'reviewtime', 'helpful', 'overall', 'reviewText', 
                    'reviewTime', 'reviewerName', 'summary', 'unixReviewTime']
print("Columns: ", df_merge.columns)
df_merge.sample(5)

Columns:  Index(['reviewerID', 'productID', 'rating', 'reviewtime', 'helpful', 'overall',
       'reviewText', 'reviewTime', 'reviewerName', 'summary',
       'unixReviewTime'],
      dtype='object')


Unnamed: 0,reviewerID,productID,rating,reviewtime,helpful,overall,reviewText,reviewTime,reviewerName,summary,unixReviewTime
38458,A1MQQEM7W77L62,B003ES5ZUU,5.0,1340928000,"[0, 0]",5,I had heard from numerous interviews and artic...,"06 29, 2012",Robert,This is the one to buy!,1340928000
47328,A2DG63DN704LOI,B004GKM9SG,1.0,1304467200,"[0, 0]",1,This really is a bad mouse.Here's why:The fing...,"05 4, 2011",Eric Slay,More like a prototype.,1304467200
58689,A3T7V207KRDE2O,B005WKKIGO,5.0,1330560000,"[35, 42]",5,"Update 9/14/12 - just got our new iPhones, use...","03 1, 2012",SMXSteve,Great backup power,1330560000
57064,A2B7BUH8834Y6M,B005L38VPC,5.0,1327795200,"[17, 19]",5,I ordered the wired keyboard with my iMac beca...,"01 29, 2012","Shelley Gammon ""Geek""",I heart this keyboard,1327795200
82086,AGJRUK27RBVYS,B00DSGLM50,5.0,1379980800,"[0, 1]",5,This review is for the Crucial Ballistix Sport...,"09 24, 2013",Ivy,Sport XT 16 GIG DIMM pack,1379980800


In [0]:
result = df_merge.drop(['reviewtime', 'helpful', 'overall', 'reviewText', 'reviewTime', 'reviewerName', 'summary', 'unixReviewTime'], axis = 1)
result.sample(5)

Unnamed: 0,reviewerID,productID,rating
15964,A23NSKTMSPPBTR,B000UODATY,3.0
42386,A2Q204DY2L7YRP,B003Y8DIRC,5.0
60747,A2MJ8OL2FYN7CW,B006MVX5B2,5.0
79632,ANTN61S4L7WG9,B00CL8F98W,4.0
55316,AZMY6E8B52L2T,B005FN5DJA,5.0


In [0]:
from sklearn.model_selection import train_test_split
train_data, test_data = train_test_split(result, test_size = 0.20, random_state=0)
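
With a purely random split, some test-set products may never appear in the training data, and a model that relies on per-product statistics cannot score them. A minimal sketch of how such cold-start rows could be counted, using a made-up ratings frame and a plain NumPy shuffle in place of `train_test_split` (the helper name `random_split` is hypothetical):

```python
import numpy as np
import pandas as pd

def random_split(frame, test_size, seed):
    """Shuffle row positions and cut them into train and test parts."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(frame))
    n_test = int(round(len(frame) * test_size))
    return frame.iloc[order[n_test:]], frame.iloc[order[:n_test]]

# Made-up ratings: 20 rows, 4 users, 10 products
toy = pd.DataFrame({
    'reviewerID': [f'u{i % 4}' for i in range(20)],
    'productID':  [f'p{i % 10}' for i in range(20)],
    'rating':     [float(1 + i % 5) for i in range(20)],
})

train_part, test_part = random_split(toy, test_size=0.30, seed=0)

# Test rows whose product was never seen in training
unseen = ~test_part['productID'].isin(train_part['productID'])
print(len(train_part), len(test_part), int(unseen.sum()))
```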

# **Popularity Recommender Model**

In [0]:
#Class for Popularity based Recommender System model
class popularity_recommender_py():
    def __init__(self):
        self.train_data = None
        self.reviewerID = None
        self.productid = None
        self.popularity_recommendations = None

    #Create the popularity based recommender system model
    def create(self, train_data, userid, productid):
        self.train_data = train_data
        self.reviewerID = userid
        self.productid = productid
        #Get a count of user_ids per product to use as the recommendation score
        train_data_grouped = train_data.groupby([self.productid]).agg({self.reviewerID: 'count'}).reset_index()
        #Rename using the configured user column rather than a hard-coded name
        train_data_grouped.rename(columns = {self.reviewerID: 'score'}, inplace=True)
        #Sort the products by score (descending), breaking ties by product id
        train_data_sort = train_data_grouped.sort_values(['score', self.productid], ascending = [False, True])
        #Generate a recommendation rank based upon score
        train_data_sort['Rank'] = train_data_sort['score'].rank(ascending=False, method='first')
        self.popularity_recommendations = train_data_sort.head(10) #Get the top 10 recommendations

    #Use the popularity based recommender system model to make recommendations
    def recommend(self, userid):
        #Work on a copy so repeated calls do not modify the stored top-10 table
        user_recommendations = self.popularity_recommendations.copy()
        user_recommendations[self.reviewerID] = userid #Add user_id column for which the recommendations are being generated
        cols = user_recommendations.columns.tolist() #Bring user_id column to the front
        cols = cols[-1:] + cols[:-1]
        user_recommendations = user_recommendations[cols]
        return user_recommendations

Create an instance of popularity based recommender class

In [0]:
pm = popularity_recommender_py()
pm.create(train_data, 'reviewerID', 'productID')

Use the popularity model to make some predictions

In [0]:
users = result['reviewerID'].unique()
print(len(users)) ## unique users

reviewerID = users[5]
pm.recommend(reviewerID)

1100


Unnamed: 0,reviewerID,productID,score,Rank
19827,A1K4G5YJDJQI6Q,B0088CJT4U,154,1.0
12257,A1K4G5YJDJQI6Q,B003ES5ZUU,129,2.0
5022,A1K4G5YJDJQI6Q,B000N99BBC,119,3.0
19995,A1K4G5YJDJQI6Q,B008DWCRQW,108,4.0
19652,A1K4G5YJDJQI6Q,B00829TIEK,105,5.0
10729,A1K4G5YJDJQI6Q,B002R5AM7C,102,6.0
19648,A1K4G5YJDJQI6Q,B00829THK0,99,7.0
19462,A1K4G5YJDJQI6Q,B007WTAJTO,92,8.0
15603,A1K4G5YJDJQI6Q,B004T9RR6I,88,9.0
14384,A1K4G5YJDJQI6Q,B004CLYEDC,86,10.0
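
The table above is the same for every user, and it may include products the user has already rated. A possible refinement, sketched on made-up data, filters those out before taking the top K (the function name `popular_unseen` is hypothetical and not part of the class above):

```python
import pandas as pd

def popular_unseen(train, user, k, user_col='reviewerID', item_col='productID'):
    """Top-k products by rating count, skipping those `user` already rated."""
    # value_counts sorts by count in descending order
    scores = train[item_col].value_counts()
    seen = set(train.loc[train[user_col] == user, item_col])
    return scores[~scores.index.isin(seen)].head(k)

# Made-up training data
toy_train = pd.DataFrame({
    'reviewerID': ['u1', 'u2', 'u3', 'u1', 'u2', 'u3', 'u4'],
    'productID':  ['p1', 'p1', 'p1', 'p2', 'p2', 'p3', 'p4'],
    'rating':     [5.0, 4.0, 5.0, 3.0, 4.0, 2.0, 5.0],
})

# u1 has rated p1 and p2, so neither is recommended
print(popular_unseen(toy_train, 'u1', k=2))
# u4 has rated only p4, so the two most-rated products come first
print(popular_unseen(toy_train, 'u4', k=2))
```

The ranking itself is still not personalized; only the filtering step depends on the user.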


In [0]:
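# Mean rating per product in the training data; the rating column is selected
# explicitly so that the string reviewerID column is not averaged
grouped_traindata = train_data.groupby('productID')['rating'].mean().reset_index()
sort_traindata = grouped_traindata.sort_values(['rating', 'productID'], ascending = [False, True])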

In [0]:
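# Copy the slice so the in-place rename does not act on a view of test_data
pred_df = test_data[['reviewerID', 'productID', 'rating']].copy()
pred_df.rename(columns = {'rating' : 'true_ratings'}, inplace = True)
# Inner merge: test rows whose product never appears in the training data are dropped
pred_df = pred_df.merge(sort_traindata, left_on='productID', right_on = 'productID')
pred_df.rename(columns = {'rating' : 'predicted_ratings'}, inplace = True)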

In [0]:
import sklearn.metrics as metrics
from math import sqrt
MSE = metrics.mean_squared_error(pred_df['true_ratings'], pred_df['predicted_ratings'])

In [0]:
print("Mean Squared error is: ", MSE)
print("------------------------------------------------------------------------------------")
print("Root Mean Squared error is:", sqrt(MSE))

Mean Squared error is:  1.158203611473322
------------------------------------------------------------------------------------
Root Mean Squared error is: 1.0761986858723263

# **Collaborative Filtering Model**

In [0]:
!pip install scikit-surprise

Collecting scikit-surprise
  Downloading https://files.pythonhosted.org/packages/4d/fc/cd4210b247d1dca421c25994740cbbf03c5e980e31881f10eaddf45fdab0/scikit-surprise-1.0.6.tar.gz (3.3MB)
Building wheels for collected packages: scikit-surprise
  Building wheel for scikit-surprise (setup.py) ... done
  Stored in directory: /root/.cache/pip/wheels/ec/c0/55/3a28eab06b53c220015063ebbdb81213cd3dcbb72c088251ec
Successfully built scikit-surprise
Installing collected packages: scikit-surprise
Successfully installed scikit-surprise-1.0.6


In [0]:
from surprise import Dataset, Reader

# Ratings in this dataset range from 1 to 5 (see the describe() output above)
reader = Reader(rating_scale=(1, 5))

In [0]:
data = Dataset.load_from_df(result[['reviewerID', 'productID', 'rating']], reader)
data

<surprise.dataset.DatasetAutoFolds at 0x7ffb22797a20>

In [0]:
from surprise.model_selection import train_test_split
from surprise import SVD, accuracy

traindata, testdata = train_test_split(data, test_size = .30, random_state = 69)

In [0]:
SVDModel = SVD(n_factors = 150, reg_all = 0.01, lr_all = 0.001)
SVDModel.fit(traindata)

<surprise.prediction_algorithms.matrix_factorization.SVD at 0x7ffb226f0c50>

In [0]:
Predtest = SVDModel.test(testdata)
# accuracy.rmse returns the RMSE (an error measure, lower is better), not an accuracy
print("Test RMSE: ", accuracy.rmse(Predtest, verbose=False))

Test RMSE:  0.9804051176349827


In [0]:
from surprise.model_selection import GridSearchCV

param_grid = {'n_factors' : [5,10,15], "reg_all":[0.01,0.02]}
gs = GridSearchCV(SVD, param_grid, measures=['rmse'], cv=3, refit = True)

In [0]:
gs.fit(data)

In [0]:
# get all parameter combinations
print("parameter combinations: ", gs.param_combinations)

# get best parameters
print("best parameters: ", gs.best_params)

# get best score
print("best score: ", gs.best_score['rmse'])

parameter combinations:  [{'n_factors': 5, 'reg_all': 0.01}, {'n_factors': 5, 'reg_all': 0.02}, {'n_factors': 10, 'reg_all': 0.01}, {'n_factors': 10, 'reg_all': 0.02}, {'n_factors': 15, 'reg_all': 0.01}, {'n_factors': 15, 'reg_all': 0.02}]
best parameters:  {'rmse': {'n_factors': 5, 'reg_all': 0.02}}
best score:  0.9470554964516785
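
The six combinations listed above are the Cartesian product of the values in `param_grid`. A minimal sketch of that enumeration with the standard library (this only illustrates how the grid expands; it does not reproduce the cross-validation itself):

```python
from itertools import product

param_grid = {'n_factors': [5, 10, 15], 'reg_all': [0.01, 0.02]}

# Build one dict per combination of parameter values
names = list(param_grid)
combinations = [dict(zip(names, values))
                for values in product(*(param_grid[name] for name in names))]

for combo in combinations:
    print(combo)

# With cv=3, each combination is fitted once per fold
n_fits = len(combinations) * 3
print(n_fits)
```

The number of fits grows multiplicatively with every parameter added to the grid, which is worth keeping in mind before widening the search.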


In [0]:
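# Use the "best model" for prediction.
# Note: with refit=True the best model is refit on the full dataset passed to
# gs.fit, which includes testdata, so these are not held-out predictions.
gs.test(testdata)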

[Prediction(uid='A680RUE1FDO8B', iid='B005IA844Q', r_ui=5.0, est=4.3059559064560515, details={'was_impossible': False}),
 Prediction(uid='A27B1U3OWCU14J', iid='B002WN30IM', r_ui=4.0, est=4.0702700282322235, details={'was_impossible': False}),
 Prediction(uid='A149RNR5RH19YY', iid='B0017KG70O', r_ui=4.0, est=4.043446099606564, details={'was_impossible': False}),
 Prediction(uid='AM8W6Y3HVXLZT', iid='B00009RU8K', r_ui=2.0, est=3.4625178317052256, details={'was_impossible': False}),
 Prediction(uid='A2HV76MYH7UL3S', iid='B0036WT1RW', r_ui=5.0, est=4.586390192436299, details={'was_impossible': False}),
 Prediction(uid='A296QED1MV1V0J', iid='B004RORMF6', r_ui=5.0, est=4.645852169991257, details={'was_impossible': False}),
 Prediction(uid='AW68KVDV7BBRS', iid='B00006HOAE', r_ui=5.0, est=4.431854626127761, details={'was_impossible': False}),
 Prediction(uid='A11EYMH9UV9XG7', iid='B000063K77', r_ui=5.0, est=4.574677529077987, details={'was_impossible': False}),
 Prediction(uid='A24RCBRDXRXR0Y'

In [0]:
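# Note: the grid search above reported n_factors=5, reg_all=0.02 as the best
# parameters; this cell uses n_factors=15
svd_model = SVD(n_factors= 15, reg_all= 0.02)
svd_model.fit(traindata)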

<surprise.prediction_algorithms.matrix_factorization.SVD at 0x7ffb226f0b70>

In [0]:
test_pred =  svd_model.test(testdata)
print("Updated test RMSE is:", accuracy.rmse(test_pred, verbose=False))

Updated test RMSE is: 0.9557477583845946


In [0]:
userfactors = svd_model.pu
itemfactors = svd_model.qi

In [0]:
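# Factor-only scores (rows: users, columns: items, in Surprise's inner-id order);
# the global mean and the user/item biases are not included here
pred = np.dot(userfactors,np.transpose(itemfactors))
df_pred = pd.DataFrame(test_pred)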
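
In Surprise's SVD, a predicted rating combines the global mean, a user bias, an item bias, and the dot product of the user and item factor vectors, clipped to the rating scale. A NumPy sketch of that formula on made-up values (on a fitted model these would correspond to `trainset.global_mean`, `bu`, `bi`, `pu` and `qi`; the numbers below are illustrative only):

```python
import numpy as np

# Made-up model parameters: 2 users, 3 items, 2 latent factors
global_mean = 4.0
user_bias = np.array([0.2, -0.3])
item_bias = np.array([0.1, 0.0, -0.5])
user_factors = np.array([[0.5, 0.1],
                         [-0.2, 0.4]])
item_factors = np.array([[1.0, 0.0],
                         [0.5, 0.5],
                         [0.0, -1.0]])

# mean + user bias + item bias + factor interaction, for every (user, item) pair
scores = (global_mean
          + user_bias[:, None]
          + item_bias[None, :]
          + user_factors @ item_factors.T)

# Clip to the rating scale, as is done when estimates are reported
scores = np.clip(scores, 1.0, 5.0)
print(scores)
```

A plain product of the factor matrices gives only the interaction term, so its values are not on the rating scale until the mean and biases are added.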

In [0]:
print(df_pred.shape)
print("------------------------------")
print(df_pred.columns)

(26823, 5)
------------------------------
Index(['uid', 'iid', 'r_ui', 'est', 'details'], dtype='object')


In [0]:
dfpred_sorted = df_pred.sort_values(by=['uid','est'],ascending=[True,False]).groupby('uid').head(5)
dfpred_sorted

Unnamed: 0,uid,iid,r_ui,est,details
9588,A100UD67AHFODS,B0054JJ0QW,5.0,4.854796,{'was_impossible': False}
6845,A100UD67AHFODS,B004YLCE2S,5.0,4.655262,{'was_impossible': False}
6698,A100UD67AHFODS,B00483WRZ6,5.0,4.630708,{'was_impossible': False}
1368,A100UD67AHFODS,B00746LVOM,4.0,4.621099,{'was_impossible': False}
16810,A100UD67AHFODS,B00A83I8G2,5.0,4.616274,{'was_impossible': False}
688,A100WO06OQR8BQ,B0002L5R78,1.0,4.086989,{'was_impossible': False}
7923,A100WO06OQR8BQ,B0002LEMWE,4.0,4.034358,{'was_impossible': False}
23036,A100WO06OQR8BQ,B001342KM8,5.0,4.017836,{'was_impossible': False}
9124,A100WO06OQR8BQ,B0090J652Y,2.0,3.945522,{'was_impossible': False}
6553,A100WO06OQR8BQ,B002XVBAKI,1.0,3.935255,{'was_impossible': False}
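
The rows above are drawn from the test set, so they are products each user has already rated. To recommend new products, one option is to score every product for a user, mask those already rated, and keep the K best. A sketch with made-up scores (the names `top_k_new` and `already_rated` are hypothetical):

```python
import numpy as np

def top_k_new(scores, already_rated, item_ids, k):
    """Return the k highest-scoring item ids that are not in `already_rated`."""
    scores = np.asarray(scores, dtype=float).copy()
    # Push already-rated items to the bottom so they are never selected
    rated_mask = np.array([item in already_rated for item in item_ids])
    scores[rated_mask] = -np.inf
    # Indices of the best scores, highest first
    order = np.argsort(-scores)[:k]
    return [item_ids[i] for i in order if np.isfinite(scores[i])]

item_ids = ['p1', 'p2', 'p3', 'p4', 'p5']
predicted = [4.8, 4.5, 3.6, 4.9, 2.0]   # made-up predicted ratings for one user
already_rated = {'p4', 'p2'}            # products this user has rated

print(top_k_new(predicted, already_rated, item_ids, k=2))
```

If fewer than K unrated products exist, the function simply returns a shorter list.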


In [0]:
from collections import defaultdict

def get_top_n(predictions, n):
    # First map the predictions to each user.
    top_n = defaultdict(list)
    for uid, iid, true_r, est, _ in predictions:
        top_n[uid].append((iid, est))

    # Then sort the predictions for each user and retrieve the n highest ones.
    for uid, user_ratings in top_n.items():
        user_ratings.sort(key=lambda x: x[1], reverse=True)
        top_n[uid] = user_ratings[:n]

    return top_n

get_top_n(test_pred, 5)

defaultdict(list,
            {'A680RUE1FDO8B': [('B0019EHU8G', 4.988857797776549),
              ('B001L1H0SC', 4.926544753395092),
              ('B00HVT27B8', 4.920802097486106),
              ('B000RZQZM0', 4.876787457752182),
              ('B000OLDG60', 4.868899073292262)],
             'A27B1U3OWCU14J': [('B002TLTE6O', 4.6829182514680685),
              ('B00007IFED', 4.582678020307667),
              ('B002UUTCNE', 4.403907431087444),
              ('B000NMKHW6', 4.365424513677464),
              ('B002MUYOLW', 4.294430800219809)],
             'A149RNR5RH19YY': [('B004HIN7SI', 4.437431853998978),
              ('B001OOZ1X2', 4.4234898595821095),
              ('B000089GN3', 4.3094555584892875),
              ('B00008I9K8', 4.295807293300478),
              ('B00007KDVJ', 4.2853865898048085)],
             'AM8W6Y3HVXLZT': [('B00D6XW62I', 4.512088572890403),
              ('B002YU83YO', 4.490156117397636),
              ('B000JMJWV2', 4.415593329288191),
              ('B000L47

The popularity recommendation model ranks products by how many ratings they have received, so it gives every user the same list and is not personalized.
Collaborative filtering, in contrast, gives personalized recommendations for products/items, which makes it more likely that the user will buy the recommended product.