## Collaborative Filtering
#### Model-Based Approach

In [10]:
import pandas as pd

# Surprise: the SVD model, dataset helpers, and the accuracy metrics
from surprise import SVD
from surprise import Dataset
from surprise import Reader
from surprise import accuracy

# Model-selection utilities: hold-out split, grid search, cross-validation
from surprise.model_selection import train_test_split
from surprise.model_selection import GridSearchCV
from surprise.model_selection import cross_validate

We will be working with the [same data](https://drive.google.com/file/d/1WvTmAfO09TCX7xp7uu06__ziic7JnrL5/view?usp=sharing) we used in the previous exercise.

In [2]:
book_ratings = pd.read_csv('./res/data/bx-books/BX-Book-Ratings.csv',sep=";", encoding="latin")

In [4]:
book_ratings.head()

| | User-ID | ISBN | Book-Rating |
|---|---|---|---|
| 0 | 276725 | 034545104X | 0 |
| 1 | 276726 | 0155061224 | 5 |
| 2 | 276727 | 0446520802 | 0 |
| 3 | 276729 | 052165615X | 3 |
| 4 | 276729 | 0521795028 | 6 |


* Create a Surprise dataset from `book_ratings`

In [6]:
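# Ratings in this dataset range from 0 to 10
reader = Reader(rating_scale=(0, 10))

# load_from_df expects exactly three columns, in the order: user, item, rating
data = Dataset.load_from_df(book_ratings[['User-ID', 'ISBN', 'Book-Rating']], reader)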

* Split the data into a training set and a test set, using a test size of 15%

In [9]:
xtrain, xtest = train_test_split(data, test_size=0.15)

* Use SVD (with default settings) to create recommendations for each user
    - print the default model's RMSE computed on the test set (using the `accuracy` module we imported at the beginning)

In [13]:
svd = SVD()
output = svd.fit(xtrain)
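For orientation, here is a minimal sketch of the prediction rule that a biased matrix-factorization model such as Surprise's `SVD` learns during `fit`: the estimate is the global mean plus a user bias, an item bias, and the dot product of the user and item factor vectors, clipped to the rating scale. The function name `predict_rating` and every number below are hypothetical and chosen only for illustration; they are not values learned from this dataset.

```python
# Sketch of the prediction rule: r_hat = mu + b_u + b_i + q_i . p_u
# All inputs are made-up illustration values, not learned parameters.

def predict_rating(global_mean, user_bias, item_bias, user_factors, item_factors,
                   rating_scale=(0, 10)):
    """Return the estimated rating, clipped to the rating scale."""
    dot = sum(p * q for p, q in zip(user_factors, item_factors))
    estimate = global_mean + user_bias + item_bias + dot
    low, high = rating_scale
    return min(high, max(low, estimate))

estimate = predict_rating(
    global_mean=2.9,
    user_bias=0.5,
    item_bias=-0.2,
    user_factors=[0.1, -0.3, 0.2],   # length equals n_factors (3 here)
    item_factors=[0.4, 0.1, 0.5],
)
print(round(estimate, 2))
```

In this picture, `n_factors` is the length of the two factor vectors and `reg_all` is the regularization strength applied to all learned parameters. These are the two settings tuned later in this exercise.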

In [15]:
output  # fit() returns the fitted SVD object itself

<surprise.prediction_algorithms.matrix_factorization.SVD at 0x1301368e0>

In [23]:
pred = output.test(xtest)
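The list returned by `test` holds one prediction per (user, item) pair, each carrying the fields `uid`, `iid`, `r_ui`, `est`, and `details`. One way to turn such a list into per-user recommendations is to group by user and keep the highest estimates. The sketch below assumes only those field names; the `Prediction` tuple defined here is a stand-in so the example runs on its own, the helper `top_n_per_user` is hypothetical, and the sample values are made up.

```python
from collections import defaultdict, namedtuple

# Stand-in for Surprise's prediction tuple so the sketch is self-contained;
# in the notebook you would pass `pred` instead of `sample`.
Prediction = namedtuple("Prediction", ["uid", "iid", "r_ui", "est", "details"])

def top_n_per_user(predictions, n=3):
    """Group predictions by user and keep the n highest estimated ratings."""
    by_user = defaultdict(list)
    for p in predictions:
        by_user[p.uid].append((p.iid, p.est))
    return {
        uid: sorted(items, key=lambda pair: pair[1], reverse=True)[:n]
        for uid, items in by_user.items()
    }

# Made-up predictions for illustration only.
sample = [
    Prediction(276729, "052165615X", 3.0, 4.1, {}),
    Prediction(276729, "0521795028", 6.0, 5.7, {}),
    Prediction(276726, "0155061224", 5.0, 4.9, {}),
]
top = top_n_per_user(sample, n=1)
print(top)
```

Note that the test set only contains pairs the user has already rated. For recommendations of unseen books, the same grouping could be applied to predictions for the pairs returned by `xtrain.build_anti_testset()`, which can be very large for a dataset of this size.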

In [24]:
accuracy.rmse(pred)

RMSE: 3.5020


3.5020136423244748
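As a reminder of what this number measures, RMSE is the square root of the mean squared difference between true and estimated ratings. The following is a minimal sketch of that computation on made-up (true, estimated) pairs; the helper `rmse` is a hypothetical stand-in, not the library function.

```python
import math

def rmse(pairs):
    """Root mean squared error over (true_rating, estimated_rating) pairs."""
    squared_errors = [(true - est) ** 2 for true, est in pairs]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Made-up pairs for illustration: errors of 3, 1, and 4 rating points.
toy_pairs = [(0, 3), (5, 4), (6, 2)]
toy_rmse = rmse(toy_pairs)
print(round(toy_rmse, 4))
```

Because the errors are squared before averaging, a few large misses weigh more heavily than many small ones. On a 0 to 10 scale, an RMSE of about 3.5 means the typical error is several rating points.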

* Create a parameter grid with these values:
    - 'n_factors': [110, 120, 140, 160]
    - 'reg_all': [0.08, 0.1, 0.15]

In [25]:
param_grid = {
    'n_factors': [110, 120, 140, 160],
    'reg_all': [0.08, 0.1, 0.15]
}
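To get a feel for the cost of searching this grid, the sketch below enumerates every combination it contains. It assumes the grid search uses 5-fold cross-validation when no `cv` argument is passed; the names `combinations`, `n_folds`, and `total_fits` are introduced here for illustration.

```python
from itertools import product

param_grid = {
    'n_factors': [110, 120, 140, 160],
    'reg_all': [0.08, 0.1, 0.15],
}

# Every pairing of one n_factors value with one reg_all value.
names = list(param_grid)
combinations = [dict(zip(names, values)) for values in product(*param_grid.values())]

n_folds = 5  # assumption: the default number of folds when `cv` is not given
total_fits = len(combinations) * n_folds
print(len(combinations), total_fits)
```

Each of the 4 × 3 = 12 combinations is fitted once per fold, so under the 5-fold assumption the search trains 60 models, which can take a while on a dataset of this size.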

* Instantiate `GridSearchCV` with SVD as the model, our predefined parameter grid, and RMSE and MAE as evaluation metrics

In [29]:
cv = GridSearchCV(SVD, param_grid=param_grid, measures=['rmse', 'mae'])

* Fit the grid search on the data

In [30]:
cv.fit(data)

* Print the parameter combination that achieved the best RMSE during cross-validation

In [31]:
print(cv.best_params['rmse'])

{'n_factors': 160, 'reg_all': 0.15}


* Predict the test set with the model that is optimal in terms of RMSE

In [32]:
svd_optimized = SVD(n_factors=160, reg_all=0.15)

In [33]:
output_optimized = svd_optimized.fit(xtrain)

In [34]:
pred_optimized = output_optimized.test(xtest)

* Print the optimal model's RMSE computed on the test set
    - is it better than with the default parameters?

In [35]:
accuracy.rmse(pred_optimized)

RMSE: 3.4327


3.4326970406195456
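To answer the question above, the two test-set RMSE values printed in this notebook can be compared directly. The sketch below uses only those two numbers; the variable names are introduced here for illustration.

```python
# Test-set RMSE values copied from the two outputs above.
default_rmse = 3.5020136423244748
optimized_rmse = 3.4326970406195456

absolute_gain = default_rmse - optimized_rmse
relative_gain = absolute_gain / default_rmse
print(f"absolute: {absolute_gain:.4f}, relative: {relative_gain:.2%}")
```

The tuned model is better, but only by roughly 2% in relative terms. Two caveats apply. First, the grid search was fitted on the full `data`, which includes the test rows, so the comparison is not fully independent. Second, the selected values (160 and 0.15) are the largest in each list, which suggests that extending the grid upward might improve the score further.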