# Model-Based Collaborative Filtering

In [1]:
import surprise

In [2]:
url = '/home/henri/Documents/Lighthouse-lab/Databases/w10-d2-db/filmtrust/'

In [8]:
import numpy as np
import pandas as pd

# ratings.txt is read as space-separated with no header row: user id, item id, rating
dataset = pd.read_table(url + 'ratings.txt', sep=' ', names=['uid', 'iid', 'rating'])

In [9]:
dataset.head()

   uid  iid  rating
0    1    1     2.0
1    1    2     4.0
2    1    3     3.5
3    1    4     3.0
4    1    5     4.0


In [12]:
lower_rating = dataset['rating'].min()
upper_rating = dataset['rating'].max()
print(lower_rating,upper_rating)

0.5 4.0


In [14]:
reader = surprise.Reader(rating_scale= (.5,4))
data = surprise.Dataset.load_from_df(dataset,reader)

We will use SVD++, a matrix factorization method that was one of the best performers in the Netflix Prize competition and has since become a popular choice for fitting recommender systems.
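For reference, the prediction rule that SVD++ fits can be written as follows. The notation follows the Surprise documentation; none of these symbols are defined elsewhere in this notebook, so treat this as a summary rather than a derivation.

$$
\hat{r}_{ui} = \mu + b_u + b_i + q_i^T \left( p_u + |I_u|^{-\frac{1}{2}} \sum_{j \in I_u} y_j \right)
$$

Here $\mu$ is the global mean rating, $b_u$ and $b_i$ are the user and item biases, $p_u$ and $q_i$ are the user and item factor vectors, $I_u$ is the set of items rated by user $u$, and the $y_j$ are extra item factors that capture the implicit signal that a user rated an item at all, regardless of the rating value.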

In [15]:
alg = surprise.SVDpp()
output = alg.fit(data.build_full_trainset())

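Now that we’ve fitted the model, we can check the predicted score of, for example, user 50 on movie 52 using the predict method. Note that the raw ids in the DataFrame are integers, while the call below passes the strings '50' and '52'. Surprise treats ids it has not seen in training as unknown, so this estimate falls back to the global mean rating; pass uid=50 and iid=52 to get a prediction specific to this user and movie.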

In [16]:
pred = alg.predict(uid='50',iid = '52')
score = pred.est
score

3.0028030537791928

## Making Recommendations

Let’s now recommend a movie to a particular user, uid 50. First we need to find the ids of the movies that user 50 didn’t rate, since we don’t want to recommend a movie they’ve already watched!

In [17]:
iids = dataset['iid'].unique()  # array of all unique movie ids
iids50 = dataset.loc[dataset['uid'] == 50, 'iid']  # movies already rated by user 50
iids_to_pred = np.setdiff1d(iids, iids50)  # movie ids that user 50 has not rated

Next we want to predict the score of each movie that user 50 didn’t rate and find the best one. For this we build a test set in the same (uid, iid, rating) format as before, containing the iids we want to predict. The true ratings are unknown and are not used to compute the estimates, so we arbitrarily set them all to 4.

In [18]:
testset = [[50,iid,4.] for iid in iids_to_pred]
predictions = alg.test(testset)
predictions[0]

Prediction(uid=50, iid=14, r_ui=4.0, est=3.1756350972746894, details={'was_impossible': False})

In [21]:
pred_ratings = np.array([pred.est for pred in predictions])
i_max = pred_ratings.argmax()  # position of the highest predicted rating

iid = iids_to_pred[i_max]  # movie id to recommend to user 50
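The argmax above yields a single recommendation. If several are wanted, the same idea extends to a top-n list. The following is a minimal sketch using only the standard library; `top_n_items` is a hypothetical helper, not part of Surprise, and the toy values are made up for illustration rather than taken from the fitted model.

```python
import heapq

def top_n_items(item_ids, estimates, n=5):
    """Return the n (item id, estimate) pairs with the highest estimates."""
    pairs = zip(item_ids, estimates)
    return heapq.nlargest(n, pairs, key=lambda pair: pair[1])

# Toy values for illustration only (not output from the fitted model).
toy_ids = [14, 15, 16, 17]
toy_estimates = [3.2, 3.9, 2.7, 3.5]

top_two = top_n_items(toy_ids, toy_estimates, n=2)
print(top_two)
```

With the variables from the cells above, the call would presumably be `top_n_items(iids_to_pred, pred_ratings, n=5)`, since both arrays are in the same order as the test set.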

## Tuning and Evaluating the Model

As you probably already know, it is bad practice to fit a model on the whole dataset without checking its performance or tuning the hyperparameters that affect the fit. For the remainder of the tutorial we’ll show how to tune the parameters of SVD++ and evaluate its performance.

In [22]:
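# Candidate learning rates (lr_all) and regularization strengths (reg_all)
param_grid = {'lr_all': [.001, 0.1], 'reg_all': [.1, .5]}
# Score every combination with 3-fold cross-validation
gs = surprise.model_selection.GridSearchCV(surprise.SVDpp, param_grid, measures=['rmse', 'mae'], cv=3)
gs.fit(data)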

In [24]:
gs.best_params['rmse']

{'lr_all': 0.1, 'reg_all': 0.1}
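To make clear what the grid search is iterating over, here is a small sketch that enumerates the candidate settings from the same `param_grid`. It uses only `itertools` and does not call Surprise; the variable names `combinations` and `n_folds` are introduced here for illustration.

```python
from itertools import product

param_grid = {'lr_all': [.001, 0.1], 'reg_all': [.1, .5]}

# Every combination of the listed values, as a list of keyword dicts.
names = list(param_grid)
combinations = [dict(zip(names, values)) for values in product(*param_grid.values())]

for params in combinations:
    print(params)

n_folds = 3  # matches cv=3 in the grid search above
print(len(combinations) * n_folds, "train/test fits in total")
```

Two values for each of two parameters give four candidates, and each is fitted once per fold. This is why adding values to the grid quickly becomes expensive for a model as slow to fit as SVD++.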

In [25]:
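# Note: this evaluates lr_all=0.001, not the lr_all=0.1 selected by the grid search above;
# use surprise.SVDpp(**gs.best_params['rmse']) to evaluate the selected configuration.
alg = surprise.SVDpp(lr_all=.001)
output = surprise.model_selection.cross_validate(alg, data, verbose=True)  # 5-fold by default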

Evaluating RMSE, MAE of algorithm SVDpp on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    0.8186  0.8118  0.8435  0.8290  0.8355  0.8277  0.0114  
MAE (testset)     0.6493  0.6439  0.6656  0.6542  0.6630  0.6552  0.0082  
Fit time          11.18   11.51   11.17   11.28   11.12   11.25   0.14    
Test time         0.39    0.38    0.40    0.77    0.38    0.46    0.15    
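The two error measures in the table can be computed by hand. The sketch below uses only plain Python; `rmse_score` and `mae_score` are hypothetical helper names (not Surprise's API), and the toy ratings are invented for illustration, not drawn from the FilmTrust data.

```python
def rmse_score(true_ratings, estimates):
    """Root mean squared error: penalizes large misses more heavily."""
    squared = [(r - e) ** 2 for r, e in zip(true_ratings, estimates)]
    return (sum(squared) / len(squared)) ** 0.5

def mae_score(true_ratings, estimates):
    """Mean absolute error: the average size of a miss, in rating units."""
    absolute = [abs(r - e) for r, e in zip(true_ratings, estimates)]
    return sum(absolute) / len(absolute)

# Toy values for illustration only.
toy_true = [4.0, 3.0, 2.0, 3.5]
toy_est = [3.5, 3.0, 3.0, 3.0]

print(rmse_score(toy_true, toy_est))
print(mae_score(toy_true, toy_est))
```

Both measures are in the same units as the ratings, so an MAE of about 0.66 on a 0.5 to 4 scale means the predictions are off by roughly two thirds of a rating point on average.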
