## Collaborative Filtering
#### Model-Based Approach

In [1]:
import pandas as pd

# Surprise: the SVD algorithm, data loading helpers, and accuracy metrics
from surprise import SVD, Dataset, Reader, accuracy

# Surprise: data splitting, hyperparameter search, and cross-validation utilities
from surprise.model_selection import GridSearchCV, cross_validate, train_test_split

We will be working with the [same data](https://drive.google.com/file/d/1WvTmAfO09TCX7xp7uu06__ziic7JnrL5/view?usp=sharing) we used in the previous exercise.

In [2]:
book_ratings = pd.read_csv('BX-CSV-Dump/BX-Book-Ratings.csv', sep=";", encoding="latin")

In [3]:
book_ratings.head()

| | User-ID | ISBN | Book-Rating |
|---|---|---|---|
| 0 | 276725 | 034545104X | 0 |
| 1 | 276726 | 0155061224 | 5 |
| 2 | 276727 | 0446520802 | 0 |
| 3 | 276729 | 052165615X | 3 |
| 4 | 276729 | 0521795028 | 6 |


* Create a Surprise dataset from `book_ratings`

In [4]:
reader = Reader(rating_scale=(0, 10))

# Load the pandas DataFrame; its columns must be in the order user, item, rating
data = Dataset.load_from_df(book_ratings, reader)

* Split the data into train and test sets, using a test size of 15%

In [5]:
trainset, testset = train_test_split(data, test_size=.15)

* Use SVD (with default settings) to create recommendations for each user
    - Print the default model's RMSE computed on the test set (using the `accuracy` object imported at the beginning)

In [6]:
model = SVD()
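For reference, the prediction rule documented for Surprise's `SVD` is r̂_ui = μ + b_u + b_i + q_i · p_u (global mean, user bias, item bias, and the dot product of the item and user factor vectors). The plain-Python sketch below only illustrates that formula. The helper `predict_rating` and all numbers in it are made up for illustration, and clipping to the `(0, 10)` rating scale is assumed to mirror what Surprise does by default.

```python
def predict_rating(mu, b_u, b_i, p_u, q_i, rating_scale=(0, 10)):
    """Estimate a rating as mu + b_u + b_i + dot(q_i, p_u), clipped to the scale."""
    dot = sum(q * p for q, p in zip(q_i, p_u))
    estimate = mu + b_u + b_i + dot
    low, high = rating_scale
    return min(high, max(low, estimate))


# Toy values (hypothetical, not learned from the book ratings)
example_estimate = predict_rating(
    mu=3.0, b_u=0.5, b_i=-0.2, p_u=[0.1, 0.2], q_i=[0.3, -0.4]
)
print(round(example_estimate, 2))
```

In the fitted model these quantities are learned from `trainset`; `n_factors` sets the length of `p_u` and `q_i`, and `reg_all` sets the regularization strength applied to all of them.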

In [7]:
output = model.fit(trainset)

In [8]:
base_pred = model.test(testset)
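`model.test` returns one prediction per (user, book) pair in the test set. One way to turn predictions into recommendations is to keep the N highest estimates per user. The sketch below assumes prediction objects with the fields `uid`, `iid`, `r_ui`, `est` and `details`, as in Surprise's `Prediction` tuples. It uses a stand-in namedtuple and toy values so it runs without the library, and the helper `top_n` is hypothetical.

```python
from collections import defaultdict, namedtuple

# Stand-in with the same field names as Surprise's Prediction tuples
Prediction = namedtuple("Prediction", ["uid", "iid", "r_ui", "est", "details"])


def top_n(predictions, n=3):
    """Group predictions by user and keep the n items with the highest estimate."""
    per_user = defaultdict(list)
    for pred in predictions:
        per_user[pred.uid].append((pred.iid, pred.est))
    return {
        uid: sorted(items, key=lambda pair: pair[1], reverse=True)[:n]
        for uid, items in per_user.items()
    }


# Toy predictions (hypothetical values)
toy_predictions = [
    Prediction("u1", "book_a", 5, 4.2, {}),
    Prediction("u1", "book_b", 0, 6.1, {}),
    Prediction("u1", "book_c", 3, 1.0, {}),
    Prediction("u2", "book_a", 7, 7.5, {}),
]
toy_top = top_n(toy_predictions, n=2)
print(toy_top)
```

With the real data, `top_n(base_pred)` would rank only books that already appear in the test set. To recommend unseen books, predictions would be needed for pairs the user has not rated, for example those produced by `trainset.build_anti_testset()`.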

In [10]:
accuracy.rmse(base_pred)

RMSE: 3.4991


3.499074335147904
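RMSE is the square root of the mean squared difference between true and estimated ratings. A minimal sketch of that computation, on made-up (true, estimate) pairs rather than the notebook's predictions:

```python
import math


def rmse(pairs):
    """Root mean squared error over (true_rating, estimated_rating) pairs."""
    squared_errors = [(true - est) ** 2 for true, est in pairs]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy pairs (hypothetical): errors of 3, 1 and 2 rating points
toy_pairs = [(0, 3.0), (5, 4.0), (10, 8.0)]
toy_rmse = rmse(toy_pairs)
print(toy_rmse)
```

Because the errors are squared before averaging, a few large misses weigh more than many small ones.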

* Create the parameter grid using these params:
    - 'n_factors': [110, 120, 140, 160]
    - 'reg_all': [0.08, 0.1, 0.15]

In [25]:
# Larger grid, left commented out:
# params = {'n_factors': [120, 140, 160], 'reg_all': [0.08, 0.1, 0.15]}
# Reduced grid used for the results below:
params = {'n_factors': [120, 160], 'reg_all': [0.08, 0.15]}
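A grid search tries every combination of the listed values. The sketch below expands the reduced grid with `itertools.product` to show how many models that implies. The fold count of 5 is an assumption, based on `GridSearchCV` using 5-fold cross-validation when `cv` is not set.

```python
from itertools import product

params = {'n_factors': [120, 160], 'reg_all': [0.08, 0.15]}

# Expand the grid into one dict per parameter combination
names = list(params)
combinations = [
    dict(zip(names, values))
    for values in product(*(params[name] for name in names))
]

n_folds = 5  # assumption: default number of cross-validation folds
total_fits = len(combinations) * n_folds

for combo in combinations:
    print(combo)
print(total_fits)
```

The full grid from the exercise (4 values of `n_factors`, 3 of `reg_all`) would give 12 combinations instead of 4, so the run time grows accordingly.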

* Instantiate `GridSearchCV` with SVD as the model, the parameter grid defined above, and RMSE and MAE as evaluation metrics

In [26]:
grid = GridSearchCV(SVD, param_grid=params, measures=['rmse','mae'], n_jobs=-1)

* Fit the grid search

In [None]:
# Open question: to fit the grid search on the training portion only, the data may need converting (see the GitHub questions).
# Untested idea:
# train_data = Dataset.load_from_df(reg_train[['user', 'movie', 'rating']], reader=reader)
# trainset = train_data.build_full_trainset()

In [45]:
# Note: this fits on the full dataset, which includes the rows later used as the test set
grid_res = grid.fit(data)
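Since the grid search is fitted on all of `data`, the test rows also take part in choosing the hyperparameters. One way to avoid that is to set a holdout portion aside before tuning. The sketch below shows the idea in plain Python on made-up (user, item, rating) tuples, with `split_holdout` as a hypothetical helper. The Surprise FAQ describes a comparable pattern based on `data.raw_ratings`, which is not reproduced here.

```python
import random


def split_holdout(ratings, holdout_fraction=0.15, seed=0):
    """Shuffle the ratings and split them into a tuning part and a holdout part."""
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * (1 - holdout_fraction)))
    return shuffled[:cut], shuffled[cut:]


# Toy ratings (hypothetical): 10 users x 10 items on a 0-10 scale
toy_ratings = [
    (user, item, (user + item) % 11)
    for user in range(10)
    for item in range(10)
]
tuning_part, holdout_part = split_holdout(toy_ratings)
print(len(tuning_part), len(holdout_part))
```

The grid search would then see only the tuning part, and the holdout part would be used once, at the end, to score the chosen model.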

* Print the best RMSE score from the grid search (the mean over the cross-validation folds)

In [46]:
print(grid.best_score['rmse'])

3.4333918345375514


In [47]:
print(grid.best_params['rmse'])  # parameter combination with the best RMSE score

{'n_factors': 160, 'reg_all': 0.15}


* Predict the test set with the optimal model based on `RMSE`

In [48]:
opt_model = grid.best_estimator['rmse']

In [54]:
# Same values as grid.best_params['rmse'] printed above
model2 = SVD(n_factors=160, reg_all=0.15)

In [55]:
model2.fit(trainset)

<surprise.prediction_algorithms.matrix_factorization.SVD at 0x179bacfd0>

In [57]:
m2_pred = model2.test(testset)

In [56]:
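# Likely cause of the failure: grid.best_estimator['rmse'] is returned unfitted
# unless GridSearchCV is created with refit=True, so it has to be fitted first:
# opt_model.fit(trainset)
# opt_preds = opt_model.test(testset)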

* Print the optimal model's RMSE computed on the test set
    - Is it better than with the default parameters?

In [58]:
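# accuracy.rmse prints "RMSE: ..." itself and also returns the value, so each result appears twice
print(accuracy.rmse(m2_pred))    # tuned model (n_factors=160, reg_all=0.15)
print(accuracy.rmse(base_pred))  # default model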

RMSE: 3.4271
3.4270749694971716
RMSE: 3.4991
3.499074335147904
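The tuned model's test RMSE is lower than the default model's. To put the gap in perspective, the short calculation below computes the absolute and relative difference, with the two values copied from the output above. The comparison is likely somewhat optimistic, because the grid search was fitted on the full dataset, test rows included.

```python
# Test-set RMSE values copied from the output above
default_rmse = 3.499074335147904
tuned_rmse = 3.4270749694971716

absolute_gain = default_rmse - tuned_rmse
relative_gain = absolute_gain / default_rmse

print(f"absolute: {absolute_gain:.4f}, relative: {relative_gain:.2%}")
```

On a 0 to 10 rating scale, both models still miss by roughly 3.4 to 3.5 points in RMSE terms, so the tuning gain is small next to the remaining error.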
