Your task is to implement a matrix factorization method, such as singular value
decomposition (SVD) or Alternating Least Squares (ALS), in the context of a recommender
system.

This project uses data from Kaggle: https://www.kaggle.com/uciml/restaurant-data-with-consumer-ratings/data


In [0]:
!pip install surprise



I primarily used the Surprise library in Python, building on the work of this Medium post: https://medium.com/@m_n_malaeb/the-easy-guide-for-building-python-collaborative-filtering-recommendation-system-in-2017-d2736d2e92a8 and updating it by reading the Surprise docs: https://surprise.readthedocs.io/en/stable/

In [0]:
#load libraries
import pandas as pd
import surprise
import numpy as np
from surprise import SVD
from surprise import Dataset, Reader
from surprise.model_selection import cross_validate

ratings = pd.read_csv('rating_final.csv')

TRAIN_SIZE = 0.90
msk = np.random.rand(len(ratings)) < TRAIN_SIZE

train = ratings[msk]  
test = ratings[~msk]
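
The mask-based split above draws new random numbers on every run, so the train and test rows change each time. A minimal sketch of a repeatable variant is shown below, assuming NumPy's seeded generator is acceptable here; the helper name split_mask and the seed value are my own illustrative choices, not part of the original notebook.

```python
import numpy as np

def split_mask(n_rows, train_size=0.90, seed=42):
    """Return a boolean mask that is True for rows assigned to the training set."""
    rng = np.random.default_rng(seed)  # seeded generator, so the mask is repeatable
    return rng.random(n_rows) < train_size

# Toy usage with 1000 rows; with the DataFrame above this would be
# train, test = ratings[mask], ratings[~mask]
mask = split_mask(1000)
print(mask.sum(), (~mask).sum())
```

Calling split_mask again with the same arguments returns the same mask, which keeps the cross-validation numbers comparable between runs.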

In [0]:
# Using Surprise's built-in Reader, load the pandas DataFrame, keeping only the overall rating of the restaurant.
# rating_scale should match the range of values in the 'rating' column.
reader = Reader(rating_scale=(1, 5))
data = Dataset.load_from_df(train[['userID', 'placeID', 'rating']], reader)

In [0]:
# Use the SVD algorithm provided by Surprise
algo = SVD()

# Evaluate the performance of our algorithm on the dataset.
perf = cross_validate(algo, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)

print(perf)

Evaluating RMSE, MAE of algorithm SVD on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    0.7245  0.6869  0.7288  0.7044  0.7086  0.7107  0.0150  
MAE (testset)     0.6209  0.5671  0.6235  0.5878  0.5892  0.5977  0.0215  
Fit time          0.05    0.05    0.05    0.05    0.05    0.05    0.00    
Test time         0.00    0.00    0.00    0.00    0.00    0.00    0.00    
{'test_rmse': array([0.72454511, 0.68690955, 0.72877067, 0.70439725, 0.7086497 ]), 'test_mae': array([0.62090767, 0.56707068, 0.62348782, 0.58778534, 0.58918231]), 'fit_time': (0.054006099700927734, 0.04840350151062012, 0.05390620231628418, 0.05436849594116211, 0.045880794525146484), 'test_time': (0.0015211105346679688, 0.001653909683227539, 0.0015032291412353516, 0.0013756752014160156, 0.0013315677642822266)}


In [0]:
# Run a prediction: pass the user ID and the place ID (the third argument is an optional true rating) and read the estimated rating from .est
prediction = algo.predict('U1077', 135104, 4)
prediction.est

1.0406384701503086
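
For context, the Surprise docs describe the SVD estimate as the global mean plus a user bias, an item bias, and the dot product of the user and item factor vectors, and predict() clips the result to the rating scale by default. The sketch below reproduces that rule in NumPy. The function name svd_estimate and all the numbers are made up for illustration; they are not values from the fitted model above.

```python
import numpy as np

def svd_estimate(mu, b_u, b_i, p_u, q_i, rating_scale=(1, 5)):
    """Biased matrix-factorization estimate, clipped to the rating scale."""
    raw = mu + b_u + b_i + np.dot(q_i, p_u)  # mean + biases + factor interaction
    low, high = rating_scale
    return float(np.clip(raw, low, high))

# Hypothetical parameters for one (user, item) pair
mu = 1.2                        # global mean rating
b_u, b_i = -0.3, 0.1            # user and item biases
p_u = np.array([0.5, -0.2])     # user factors
q_i = np.array([0.4, 0.1])      # item factors

est = svd_estimate(mu, b_u, b_i, p_u, q_i)
clipped = svd_estimate(0.2, 0.0, 0.0, p_u, q_i)  # raw value falls below the scale
print(est, clipped)
```

Because of the clipping step, an estimate can never fall below the lower bound passed as rating_scale, which is one reason the scale given to Reader should match the data.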

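I also attempted to compute the SVD myself, using svds from scipy. A truncated SVD keeps only the k largest singular values, which approximates the ratings matrix with a lower-rank one and so reduces the number of dimensions. This post provided great guidance: https://beckernick.github.io/matrix-factorization-recommender/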
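
To make the dimension-reduction idea concrete, here is a small sketch using NumPy's dense SVD on a toy matrix rather than the restaurant data. The helper truncated_svd and the 6 x 5 matrix are illustrative assumptions; the notebook itself uses scipy's svds on the real ratings.

```python
import numpy as np

def truncated_svd(matrix, k):
    """Keep only the k largest singular values and their vectors."""
    U, s, Vt = np.linalg.svd(matrix, full_matrices=False)  # s is sorted descending
    return U[:, :k], s[:k], Vt[:k, :]

rng = np.random.default_rng(0)
# Toy "ratings" matrix of rank 2: 6 users x 5 items
A = rng.random((6, 2)) @ rng.random((2, 5))

U, s, Vt = truncated_svd(A, k=2)
A_hat = U @ np.diag(s) @ Vt          # rank-2 reconstruction

U1, s1, Vt1 = truncated_svd(A, k=1)
A_rank1 = U1 @ np.diag(s1) @ Vt1     # rank-1 approximation loses information

print(np.allclose(A, A_hat), np.linalg.norm(A - A_rank1))
```

With k equal to the true rank the reconstruction matches the original matrix up to floating-point error, while a smaller k gives a coarser approximation built from fewer factors.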

In [0]:
from scipy.sparse.linalg import svds
from scipy.sparse import csr_matrix

user_matrix = train.pivot(index='userID', columns='placeID', values='rating')
user_matrix = user_matrix.fillna(0)

# DataFrame.as_matrix() was removed in pandas 1.0; to_numpy() returns the same array
R = user_matrix.to_numpy()
user_ratings_mean = np.mean(R, axis = 1)
R_demeaned = R - user_ratings_mean.reshape(-1, 1)


In [0]:
U, s, Vt = svds(R_demeaned, k = 50)
sigma = np.diag(s)

In [0]:
all_user_predicted_ratings = np.dot(np.dot(U, sigma), Vt) + user_ratings_mean.reshape(-1, 1)
preds_df = pd.DataFrame(all_user_predicted_ratings, columns = user_matrix.columns, index = user_matrix.index)

In [0]:
preds_df.round(2)

placeID,132560,132561,132564,132572,132583,132584,132594,132608,132609,132613,132626,132630,132654,132660,132663,132665,132667,132668,132706,132715,132717,132723,132732,132733,132740,132754,132755,132766,132767,132768,132773,132825,132830,132834,132845,132846,132847,132851,132854,132856,...,135044,135045,135046,135047,135048,135049,135050,135051,135052,135053,135054,135055,135057,135058,135059,135060,135062,135063,135064,135065,135066,135069,135070,135071,135072,135073,135074,135075,135076,135079,135080,135081,135082,135085,135086,135088,135104,135106,135108,135109
userID
U1001,0.02,0.03,0.02,0.06,0.07,-0.00,0.02,0.00,0.02,0.02,0.03,0.02,0.02,0.01,0.02,0.03,0.02,0.02,0.03,0.02,0.02,-0.16,0.02,-0.02,0.02,0.17,0.16,0.04,-0.01,-0.02,0.07,1.72,0.78,-0.13,-0.01,-0.05,-0.04,0.10,-0.03,0.30,...,-0.09,1.14,-0.07,0.21,0.17,0.18,0.16,0.79,0.28,-0.02,-0.03,0.00,0.06,0.10,0.25,0.04,-0.12,-0.21,0.02,-0.17,-0.18,0.11,0.06,0.15,0.04,0.04,0.04,-0.01,0.17,-0.01,-0.07,0.11,0.08,0.17,-0.02,0.12,0.01,-0.37,-0.05,0.03
U1002,-0.00,-0.00,-0.01,-0.10,0.03,-0.01,-0.00,-0.01,-0.01,0.00,0.01,0.00,-0.01,-0.01,-0.00,0.00,0.00,-0.00,-0.00,-0.00,-0.01,-0.01,-0.00,-0.00,0.01,0.14,-0.11,0.01,-0.01,-0.02,0.04,1.76,-0.04,-0.01,-0.04,-0.04,-0.02,-0.09,-0.02,-0.01,...,0.02,-0.06,0.14,0.03,0.15,0.02,-0.05,-0.06,1.19,-0.15,0.08,-0.03,0.01,0.14,1.18,0.19,0.88,-0.12,0.01,0.02,-0.10,-0.06,-0.11,-0.03,0.04,-0.35,-0.18,-0.01,0.29,-0.19,0.25,0.10,0.01,1.02,-0.11,0.07,0.00,0.84,0.18,-0.00
U1003,0.01,-0.01,0.01,-0.06,-0.01,0.00,0.00,-0.01,0.02,-0.01,-0.04,-0.01,0.02,-0.00,0.01,-0.02,-0.02,-0.00,-0.01,-0.01,0.01,2.07,0.02,-0.00,-0.04,2.14,1.92,0.01,0.01,0.01,-0.05,2.00,0.09,-0.02,0.01,-0.01,0.00,0.01,0.01,0.04,...,0.12,0.01,-0.04,0.01,0.12,0.04,-0.07,-0.03,-0.04,0.01,-0.03,-0.13,-0.02,0.04,2.19,0.00,-0.01,0.04,0.05,-0.10,-0.03,0.01,-0.02,0.08,0.05,0.08,0.26,1.99,0.02,1.92,1.62,0.04,-0.01,-0.05,-0.11,-0.07,-0.02,0.06,-0.10,0.03
U1004,0.02,-0.00,0.01,-0.23,0.02,0.00,-0.00,-0.02,0.02,-0.01,-0.03,0.00,0.01,-0.00,0.01,-0.02,-0.01,-0.00,0.00,-0.00,0.02,-0.11,0.02,-0.02,-0.03,0.02,0.16,0.02,0.00,-0.00,-0.00,-0.15,-0.02,-0.09,0.02,-0.02,-0.01,-0.02,-0.09,0.07,...,0.14,-0.06,0.09,0.14,0.10,0.31,0.11,0.10,0.03,0.14,-0.01,0.05,-0.01,-0.00,0.04,1.04,1.83,-0.05,0.05,-0.27,0.16,-0.03,-0.17,0.08,-0.05,-0.03,0.11,0.15,-0.09,-0.20,-0.07,-0.08,-0.09,0.07,0.11,-0.00,-0.02,1.87,0.14,0.03
U1005,-0.02,0.00,-0.01,0.01,-0.03,-0.01,0.00,0.02,-0.02,0.01,0.04,-0.01,-0.02,0.00,-0.01,0.02,0.01,0.00,-0.00,0.00,-0.03,0.22,-0.02,0.02,0.03,0.01,-0.22,-0.03,-0.00,0.00,-0.00,0.02,0.73,0.17,0.13,0.07,0.10,0.08,-0.15,-0.11,...,-0.07,0.08,-0.09,0.12,-0.03,-0.22,0.69,-0.00,-0.11,0.14,-0.08,-0.01,1.13,0.07,-0.17,0.00,0.19,-0.18,-0.11,0.07,1.60,0.02,-0.04,0.14,-0.12,-0.10,0.10,0.08,1.52,0.07,0.05,0.38,0.05,0.15,0.13,-0.01,0.02,0.25,0.03,-0.03
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
U1134,0.02,0.00,0.01,-0.02,0.01,-0.00,0.01,-0.01,0.02,-0.00,-0.02,-0.00,0.02,0.00,0.01,-0.01,-0.01,0.01,0.00,-0.00,0.01,0.17,0.02,-0.00,-0.02,-0.02,-0.05,0.02,0.01,0.00,-0.02,1.02,-0.08,-0.04,-0.01,0.03,0.04,-0.04,-0.06,-0.01,...,0.92,-0.01,1.98,0.92,-0.06,-0.07,-0.02,0.15,-0.11,0.24,0.05,1.96,0.05,-0.04,1.94,0.01,-0.09,-0.19,1.94,1.96,0.02,-0.02,0.03,0.04,-0.11,0.06,0.19,-0.04,0.03,0.80,-0.15,-0.04,-0.01,2.07,0.08,-0.03,-0.01,0.07,0.02,0.03
U1135,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,...,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
U1136,0.01,0.00,0.01,0.20,0.01,-0.00,0.00,-0.01,0.01,-0.00,-0.02,-0.00,0.01,0.00,0.01,-0.01,-0.01,0.00,-0.00,-0.00,0.01,-0.14,0.01,-0.00,-0.01,-0.02,-0.10,0.02,0.01,-0.00,-0.01,0.02,0.04,0.17,0.01,-0.01,-0.00,0.02,-0.03,-0.01,...,-0.16,0.10,2.20,0.13,0.55,-0.16,-0.16,-0.16,-0.12,0.06,-0.04,0.14,-0.02,-0.14,-0.08,-0.06,0.03,-0.18,0.90,-0.13,-0.15,-0.06,0.07,0.09,0.80,0.06,0.79,-0.09,0.07,0.24,0.32,0.11,-0.04,-0.01,0.06,-0.01,-0.01,0.08,-0.07,0.02
U1137,0.00,0.01,0.00,0.06,0.02,-0.00,0.01,0.00,0.00,0.00,0.01,0.00,0.01,0.00,0.00,0.01,0.00,0.01,0.01,0.00,0.00,1.90,0.00,-0.00,0.01,-0.06,1.96,0.01,-0.00,-0.01,0.02,1.99,-0.10,1.97,0.00,0.02,0.03,-0.01,-0.07,-0.01,...,-0.13,0.13,0.01,0.11,-0.24,1.82,-0.02,-0.05,-0.01,0.11,0.07,-0.12,-0.02,-0.14,1.95,-0.05,2.08,-0.12,-0.09,0.01,-0.09,0.02,0.02,0.05,-0.17,-0.05,0.01,1.99,0.07,0.08,0.15,-0.06,0.16,2.16,0.23,0.03,0.00,-0.05,-0.04,0.01


In [28]:
# Prediction for the same user and place using the scipy-based SVD approach
a = preds_df[135104]
a.loc['U1077']

0.012382409828239196
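
The two approaches give very different estimates for the same user and place (about 1.04 versus about 0.01), and the held-out test rows from the split are never scored. One way to compare the approaches would be RMSE on held-out ratings. The sketch below shows the idea on toy data; the function name heldout_rmse, the index dictionaries, and the toy values are hypothetical and not results from this notebook.

```python
import numpy as np

def heldout_rmse(pred_matrix, user_index, item_index, test_triples):
    """RMSE over (user, item, rating) triples; skips users/items unseen in training."""
    sq_errors = []
    for user, item, rating in test_triples:
        if user in user_index and item in item_index:
            est = pred_matrix[user_index[user], item_index[item]]
            sq_errors.append((est - rating) ** 2)
    return float(np.sqrt(np.mean(sq_errors))) if sq_errors else float("nan")

# Toy predictions: 2 users x 2 places
pred = np.array([[1.0, 2.0],
                 [0.0, 1.5]])
users = {"U1": 0, "U2": 1}      # user ID -> row position
items = {10: 0, 20: 1}          # place ID -> column position
toy_test = [("U1", 20, 2.0), ("U2", 10, 1.0), ("U3", 10, 2.0)]  # U3 is unseen

score = heldout_rmse(pred, users, items, toy_test)
print(round(score, 4))
```

With the real data, pred would be preds_df.to_numpy(), the dictionaries would be built from preds_df.index and preds_df.columns, and the triples would come from the test DataFrame.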