Matrix Factorization:
  https://github.com/sanjayssane/Machine-Learning/blob/master/Surprise_MF_Filmtrust.ipynb

In [1]:
import pandas as pd
import numpy as np
import surprise

In [5]:
ratings = pd.read_csv("ratings.txt",sep=' ',names = ['uid','iid','rating'])
ratings.head()

   uid  iid  rating
0    1    1     2.0
1    1    2     4.0
2    1    3     3.5
3    1    4     3.0
4    1    5     4.0


In [7]:
lowest_rating = ratings['rating'].min()
highest_rating = ratings['rating'].max()
print("Ratings range between {0} and {1}".format(lowest_rating,highest_rating))

Ratings range between 0.5 and 4.0
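The loading step assumes that ratings.txt holds one space-separated `uid iid rating` triple per line. pandas infers integer columns for `uid` and `iid`, and `load_from_df` keeps those values as the raw ids. The following is a minimal stdlib sketch of the same parsing and range check. It uses the five rows from the `head()` output above as sample input (rebuilt from that display, not read from the real file), and `parse_ratings` is a hypothetical helper.

```python
def parse_ratings(text):
    """Parse 'uid iid rating' lines into (int, int, float) tuples."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        uid, iid, rating = line.split()
        rows.append((int(uid), int(iid), float(rating)))
    return rows


# The first five rows shown by ratings.head() above
SAMPLE = """1 1 2.0
1 2 4.0
1 3 3.5
1 4 3.0
1 5 4.0"""

rows = parse_ratings(SAMPLE)
lowest = min(r for _, _, r in rows)
highest = max(r for _, _, r in rows)
print("Ratings range between {0} and {1}".format(lowest, highest))
```

On these five rows the range is 2.0 to 4.0. The 0.5 to 4.0 reported above comes from the full file. The ids come out as integers, which matters later when predictions are looked up by id.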


In [9]:
reader = surprise.Reader(rating_scale = (lowest_rating,highest_rating))
data = surprise.Dataset.load_from_df(ratings,reader)
type(data)

surprise.dataset.DatasetAutoFolds

In [68]:
# Cosine similarity between users; for item-based filtering set 'user_based': False
similarity_options = {'name': 'cosine', 'user_based': True}
# k (the maximum number of neighbours) defaults to 40
algo = surprise.KNNBasic(sim_options=similarity_options)
output = algo.fit(data.build_full_trainset())

Computing the cosine similarity matrix...
Done computing similarity matrix.
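With `'name': 'cosine'`, KNNBasic compares two users only on the items that both have rated. The similarity is cos(u, v) = Σ r_ui·r_vi / (√Σ r_ui² · √Σ r_vi²), where every sum runs over the co-rated items. Below is a minimal stdlib sketch of that measure. It assumes each user's ratings are held in a dict that maps item id to rating. The name `cosine_sim` and the dict layout are illustrative and are not Surprise's internals.

```python
import math


def cosine_sim(ratings_u, ratings_v, min_support=1):
    """Cosine similarity of two users, restricted to the items both have rated."""
    common = ratings_u.keys() & ratings_v.keys()
    if not common or len(common) < min_support:
        return 0.0  # too few co-rated items: treat the users as unrelated
    dot = sum(ratings_u[i] * ratings_v[i] for i in common)
    norm_u = math.sqrt(sum(ratings_u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(ratings_v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)


alice = {1: 1.0, 2: 2.0, 3: 4.0}
bob = {1: 2.0, 2: 1.0}
print(cosine_sim(alice, bob))  # computed over items 1 and 2 only
```

When two users share exactly one item, this similarity is always 1, whatever the two ratings are. Raising `min_support` in `sim_options` is one way to guard against that.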


List of User IDs

In [14]:
ratings['uid'].unique()

array([   1,    2,    3, ..., 1506, 1507, 1508], dtype=int64)

Expected rating for user 100 for item 900:

In [17]:
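# NOTE: the raw ids were loaded as integers, so the strings '100' and '900' are unknown
# to the trainset and predict() falls back to the trainset's global mean rating (see the
# details below). Pass uid=100, iid=900 so that the ids can be matched.
pred = algo.predict(uid='100',iid='900')
print(pred.est)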

3.0028030537791928


In [19]:
pred

Prediction(uid='100', iid='900', r_ui=None, est=3.0028030537791928, details={'was_impossible': True, 'reason': 'User and/or item is unknown.'})
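The `details` above show the fallback path. The string ids did not match the integer raw ids, so Surprise returned its default prediction, which is the global mean rating. When the ids are known, user-based KNNBasic estimates r̂_ui = Σ sim(u, v)·r_vi / Σ sim(u, v). Both sums run over the (at most) k users v who rated item i and are most similar to u, and only positive similarities count. Here is a minimal stdlib sketch of that estimate, fallback included. It assumes a user → {item: rating} dict, cosine similarity and toy data, and `predict_user_based` is a hypothetical helper, not Surprise's code.

```python
import heapq
import math


def cosine_sim(ru, rv):
    """Cosine similarity over co-rated items (0.0 if there are none)."""
    common = ru.keys() & rv.keys()
    if not common:
        return 0.0
    dot = sum(ru[i] * rv[i] for i in common)
    return dot / (math.sqrt(sum(ru[i] ** 2 for i in common))
                  * math.sqrt(sum(rv[i] ** 2 for i in common)))


def predict_user_based(ratings, u, i, k=40, min_k=1):
    """Return (estimate, details) for user u and item i; ratings maps user -> {item: rating}."""
    all_ratings = [r for items in ratings.values() for r in items.values()]
    global_mean = sum(all_ratings) / len(all_ratings)
    known_items = {j for items in ratings.values() for j in items}
    if u not in ratings or i not in known_items:
        # Unknown raw id (e.g. '100' instead of 100): fall back to the global mean.
        return global_mean, {"was_impossible": True, "reason": "User and/or item is unknown."}
    # Candidate neighbours: every user who rated item i, paired with their similarity to u.
    candidates = [(cosine_sim(ratings[u], ratings[v]), ratings[v][i])
                  for v in ratings if i in ratings[v]]
    top_k = heapq.nlargest(k, candidates, key=lambda t: t[0])
    positive = [(s, r) for s, r in top_k if s > 0]  # ignore non-positive similarities
    if len(positive) < min_k:
        return global_mean, {"was_impossible": True, "reason": "Not enough neighbors."}
    est = sum(s * r for s, r in positive) / sum(s for s, _ in positive)
    return est, {"was_impossible": False, "actual_k": len(positive)}


ratings = {
    1: {10: 4.0, 20: 2.0, 30: 3.0},
    2: {10: 4.0, 20: 2.0},
    3: {10: 1.0, 30: 4.0},
}
print(predict_user_based(ratings, 2, 30))    # integer id: neighbours 1 and 3 are used
print(predict_user_based(ratings, "2", 30))  # string id: unknown, so the global mean is returned
```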

Total Items:

In [22]:
iids = ratings['iid'].unique()
print(iids)

[   1    2    3 ... 2069 2070 2071]


The list of items rated by user 100:

In [25]:
u_iid = ratings[ratings['uid']==100]['iid'].unique()
print("List of items rated by user 100:", u_iid)
print("No. of items rated by user {0}: {1}".format(100, len(u_iid)))

List of items rated by user 100: [215]
No. of items rated by user 100: 1


List of the items not rated by user 100:

In [28]:
iids_to_predict = np.setdiff1d(iids, u_iid)
print("Items not rated by 100 or those items for which the expected ratings are to be predicted:",iids_to_predict )

Items not rated by 100 or those items for which the expected ratings are to be predicted: [   1    2    3 ... 2069 2070 2071]


In [30]:
len(iids_to_predict)

2070

Estimated ratings for every item in iids_to_predict, highest first:

In [33]:
# One (uid, iid, r_ui) triple per unrated item; r_ui = 0.0 is only a placeholder,
# since just the estimates are used
testset = [[100, iid, 0.] for iid in iids_to_predict]
predictions = algo.test(testset)
exp_ratings = pd.DataFrame([(p.iid, p.est) for p in predictions], columns=['iid', 'est_rating'])
exp_ratings.sort_values(by='est_rating', ascending=False).head()

       iid  est_rating
482    484         4.0
543    545         4.0
1763  1765         4.0
1339  1341         4.0
1764  1766         4.0
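The cell above ranks the unrated items by estimated rating. Below is a minimal stdlib sketch of the same top-N step. It assumes the estimates are available through a callable; in the notebook, `algo.test(...)` plays that role. `top_n_unrated` and the toy estimates are hypothetical. Ties are broken by item id so that the order is reproducible, because several items share the maximum estimate of 4.0 here.

```python
def top_n_unrated(all_items, rated_items, estimate, n=5):
    """Return the n highest (iid, estimate) pairs among items the user has not rated."""
    candidates = set(all_items) - set(rated_items)  # same role as np.setdiff1d above
    scored = [(iid, estimate(iid)) for iid in candidates]
    # Highest estimate first; equal estimates are ordered by ascending item id.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:n]


toy_estimates = {1: 3.0, 2: 4.0, 3: 4.0, 4: 2.5, 5: 3.5}
print(top_n_unrated(toy_estimates, rated_items=[2], estimate=toy_estimates.get, n=3))
# → [(3, 4.0), (5, 3.5), (1, 3.0)]
```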


Tuning for the best k

In [36]:
from surprise.model_selection import GridSearchCV, KFold

### User-Based Filtering

In [39]:
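# KNNBasic reads user_based only from sim_options, so this top-level key is ignored;
# the models are user-based anyway because that is the default
param_grid = {'k': np.arange(30,70,10), 'user_based':[True]}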
param_grid

{'k': array([30, 40, 50, 60]), 'user_based': [True]}
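Surprise's GridSearchCV tries every combination of the listed values. It also accepts `sim_options` as a dict of lists, which it expands the same way, and that is where KNNBasic reads `user_based` and the similarity `name`. Below is a stdlib sketch of that expansion. `expand_grid` is a hypothetical helper, not Surprise's code, and it is applied to a grid that nests `user_based` inside `sim_options`.

```python
from itertools import product


def cartesian(grid):
    """All dicts that pick one value from each list in grid."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


def expand_grid(param_grid):
    """Expand a grid whose values are lists, or dicts of lists (e.g. sim_options)."""
    flat = {}
    for key, value in param_grid.items():
        # A nested dict of lists first becomes a list of concrete dicts.
        flat[key] = cartesian(value) if isinstance(value, dict) else list(value)
    return cartesian(flat)


grid = {'k': [30, 40, 50, 60], 'sim_options': {'name': ['msd'], 'user_based': [False]}}
for params in expand_grid(grid):
    print(params)
# first line printed: {'k': 30, 'sim_options': {'name': 'msd', 'user_based': False}}
```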

In [41]:
kfold = KFold(n_splits=5, random_state=23, shuffle=True)
gs = GridSearchCV(surprise.KNNBasic, param_grid,measures=['rmse', 'mae'], cv=kfold)
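GridSearchCV scores each parameter combination by k-fold cross-validation and reports the mean of the per-fold RMSE and MAE as `best_score`. Below is a stdlib sketch of that loop for a trivial predictor that always returns the training mean. The helpers are hypothetical, and the ratings are the five sample values from `head()`. Python's `random.Random(23)` will not reproduce the folds that Surprise's KFold builds from `random_state=23`.

```python
import math
import random


def kfold_indices(n, n_splits=5, seed=23):
    """Yield (train, test) index lists after one seeded shuffle."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    start = 0
    for fold in range(n_splits):
        # The first n % n_splits folds get one extra element.
        size = n // n_splits + (1 if fold < n % n_splits else 0)
        yield idx[:start] + idx[start + size:], idx[start:start + size]
        start += size


def rmse(pairs):
    """Root mean squared error over (true, estimate) pairs."""
    return math.sqrt(sum((r - e) ** 2 for r, e in pairs) / len(pairs))


def mae(pairs):
    """Mean absolute error over (true, estimate) pairs."""
    return sum(abs(r - e) for r, e in pairs) / len(pairs)


def cross_validate_global_mean(ratings, n_splits=5, seed=23):
    """Mean RMSE/MAE over folds for a model that always predicts the training mean."""
    scores = {"rmse": [], "mae": []}
    for train, test in kfold_indices(len(ratings), n_splits, seed):
        mean = sum(ratings[j] for j in train) / len(train)
        pairs = [(ratings[j], mean) for j in test]
        scores["rmse"].append(rmse(pairs))
        scores["mae"].append(mae(pairs))
    return {name: sum(vals) / len(vals) for name, vals in scores.items()}


print(cross_validate_global_mean([2.0, 4.0, 3.5, 3.0, 4.0]))
```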

In [43]:
gs.fit(data)

Computing the msd similarity matrix...
Done computing similarity matrix.

Best score and parameters:

In [45]:
print(gs.best_score['rmse'])
print(gs.best_params['rmse'])

0.8641633357915124
{'k': 60, 'user_based': True}
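The similarity logs of the grid search mention `msd` rather than `cosine`. The grid does not pass `sim_options`, so KNNBasic uses its default similarity, the mean squared difference. Surprise defines it as msd(u, v) = (1/|I_uv|) Σ (r_ui − r_vi)² over the co-rated items I_uv, and uses 1 / (msd + 1) as the similarity. Below is a minimal stdlib sketch. `msd_sim` is a hypothetical name, and it uses the same dict layout as a user's item → rating map.

```python
def msd_sim(ratings_u, ratings_v, min_support=1):
    """MSD similarity 1 / (msd + 1), computed over the items both users rated."""
    common = ratings_u.keys() & ratings_v.keys()
    if not common or len(common) < min_support:
        return 0.0  # no (or too little) overlap: similarity 0
    msd = sum((ratings_u[i] - ratings_v[i]) ** 2 for i in common) / len(common)
    return 1.0 / (msd + 1.0)


u = {1: 4.0, 2: 2.0, 3: 3.0}
v = {1: 3.0, 2: 4.0}
print(msd_sim(u, v))  # squared differences 1 and 4 -> msd 2.5 -> 1 / 3.5
```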


We can now take the algorithm with the lowest RMSE and refit it on the full dataset:

In [47]:
algo = gs.best_estimator['rmse']
algo.fit(data.build_full_trainset())

Computing the msd similarity matrix...
Done computing similarity matrix.


<surprise.prediction_algorithms.knns.KNNBasic at 0x2d64bf2bc80>

The fitted **algo** can now generate recommendations for any user, in the same way as the predictions for user 100 above.

### Item-Based Filtering

In [53]:
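# KNNBasic reads user_based only from sim_options, so this top-level key is ignored and
# the search still fits user-based models. An item-based grid would nest the option:
# {'k': np.arange(30,70,10), 'sim_options': {'name': ['msd'], 'user_based': [False]}}
param_grid = {'k': np.arange(30,70,10), 'user_based':[False]}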
param_grid

{'k': array([30, 40, 50, 60]), 'user_based': [False]}
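In item-based filtering the roles of users and items swap. KNNBasic predicts r_ui from the ratings that user u gave to the (at most) k items most similar to i: r̂_ui = Σ sim(i, j)·r_uj / Σ sim(i, j). Below is a stdlib sketch that transposes a user → {item: rating} dict and applies MSD similarity between item rating vectors. The helper names and toy data are hypothetical, and the sketch raises an error where Surprise would fall back to the global mean.

```python
import heapq


def transpose(ratings):
    """Turn {user: {item: rating}} into {item: {user: rating}}."""
    by_item = {}
    for user, items in ratings.items():
        for item, r in items.items():
            by_item.setdefault(item, {})[user] = r
    return by_item


def msd_sim(a, b):
    """1 / (msd + 1) over common keys, 0.0 when there is no overlap."""
    common = a.keys() & b.keys()
    if not common:
        return 0.0
    return 1.0 / (sum((a[x] - b[x]) ** 2 for x in common) / len(common) + 1.0)


def predict_item_based(ratings, u, i, k=40):
    """Weighted average of u's ratings on the k items most similar to i."""
    by_item = transpose(ratings)
    neighbours = [(msd_sim(by_item[i], by_item[j]), r) for j, r in ratings[u].items()]
    top_k = [(s, r) for s, r in heapq.nlargest(k, neighbours, key=lambda t: t[0]) if s > 0]
    if not top_k:
        raise ValueError("not enough neighbours")  # Surprise would return the global mean
    return sum(s * r for s, r in top_k) / sum(s for s, _ in top_k)


ratings = {
    1: {10: 4.0, 20: 2.0},
    2: {10: 4.0, 20: 2.0, 30: 4.0},
    3: {20: 2.0, 30: 4.0},
}
print(predict_item_based(ratings, 1, 30))  # item 10 (sim 1.0) outweighs item 20 (sim 0.2)
```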

In [55]:
kfold = KFold(n_splits=5, random_state=23, shuffle=True)
gs = GridSearchCV(surprise.KNNBasic, param_grid,measures=['rmse', 'mae'], cv=kfold)

In [57]:
gs.fit(data)

Computing the msd similarity matrix...
Done computing similarity matrix.

Best Score:

In [59]:
print(gs.best_score['rmse'])

0.8641633357915124


Best parameters:

In [61]:
print(gs.best_params['rmse'])

{'k': 60, 'user_based': False}


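The best RMSE is identical to that of the user-based search. This is consistent with `user_based` being ignored outside `sim_options`, which means this estimator is still user-based. It can be refit on the full dataset in the same way: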

In [63]:
algo = gs.best_estimator['rmse']
algo.fit(data.build_full_trainset())

Computing the msd similarity matrix...
Done computing similarity matrix.


<surprise.prediction_algorithms.knns.KNNBasic at 0x2d64ae6c4d0>

The recommendations can be generated for any user with the object **algo**.