## Collaborative Filtering
#### Model-Based Approach

In [3]:
pip install scikit-surprise==1.1.1


In [4]:
import pandas as pd

# Surprise: algorithm, data loading, and accuracy metrics
from surprise import SVD, Dataset, Reader, accuracy

# Surprise: data splitting, hyperparameter search, and cross-validation
from surprise.model_selection import GridSearchCV, cross_validate, train_test_split

We will be working with the [same data](https://drive.google.com/file/d/1WvTmAfO09TCX7xp7uu06__ziic7JnrL5/view?usp=sharing) we used in the previous exercise.

In [5]:
book_ratings = pd.read_csv('BX-Book-Ratings.csv',sep=";", encoding="latin")

* Create a Surprise dataset from `book_ratings`

In [6]:
reader = Reader(rating_scale=(0, 10))

# Load the pandas DataFrame; columns must be in the order user, item, rating
data = Dataset.load_from_df(book_ratings, reader)

* Split the data into training and test sets, using a test size of 15%

In [7]:
trainset, testset = train_test_split(data, test_size=.15)

* Use SVD (with default settings) to create recommendations for each user
    - Print the default model's RMSE computed on the test set (using the `accuracy` module we imported at the beginning)

In [8]:
# We'll use the famous SVD algorithm.
algo = SVD()

# Train the algorithm on the trainset, and predict ratings for the testset
algo.fit(trainset)
predictions = algo.test(testset)

# Then compute RMSE
accuracy.rmse(predictions)

RMSE: 3.5095


3.509497978404905
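
For reference, `accuracy.rmse` reports the root-mean-square error between the true and estimated ratings. A minimal pure-Python sketch of the same quantity follows; the `rmse` helper and the sample pairs are illustrative and not part of Surprise.

```python
import math

def rmse(pairs):
    """Root-mean-square error over (true_rating, estimated_rating) pairs."""
    if not pairs:
        raise ValueError("need at least one (true, estimated) pair")
    squared_errors = [(true - est) ** 2 for true, est in pairs]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical (true, estimated) ratings on the 0-10 scale used above
sample = [(8, 6.5), (0, 2.0), (10, 7.5)]
print(round(rmse(sample), 4))
```

Because the errors are squared before averaging, a few large misses raise RMSE more than many small ones.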

* Create the parameter grid with these values:
    - 'n_factors': [110, 120, 140, 160]
    - 'reg_all': [0.08, 0.1, 0.15]
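
As background on what these two hyperparameters control: in Surprise's documented formulation of SVD, a rating is predicted from a global mean, user and item biases, and a dot product of latent factor vectors. `n_factors` sets the length of those vectors, and `reg_all` is the regularization weight (written as lambda below) applied to all parameters. The symbols follow common notation and do not appear elsewhere in this notebook.

```latex
\hat{r}_{ui} = \mu + b_u + b_i + q_i^{\top} p_u, \qquad p_u, q_i \in \mathbb{R}^{\text{n\_factors}}

\min_{b, p, q} \sum_{r_{ui} \in R_{\text{train}}} \left( r_{ui} - \hat{r}_{ui} \right)^2 + \lambda \left( b_u^2 + b_i^2 + \lVert q_i \rVert^2 + \lVert p_u \rVert^2 \right)
```

Larger `n_factors` gives the model more capacity, while larger `reg_all` shrinks the biases and factors toward zero, so the grid explores a trade-off between the two.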

In [9]:
param_grid = {'n_factors': [110, 120, 140, 160],
              'reg_all': [0.08, 0.1, 0.15]}

* Instantiate `GridSearchCV` with SVD as the model, our predefined parameter grid, and RMSE and MAE as evaluation metrics

In [10]:
gs = GridSearchCV(SVD, param_grid, measures=['rmse', 'mae'], cv=3)
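
To get a feel for the cost of this search, the grid can be expanded by hand. This is only a sketch of the bookkeeping using `itertools.product`, not how `GridSearchCV` is implemented internally; `cv_folds`, `combinations`, and `n_fits` are illustrative names.

```python
from itertools import product

# Same grid as in the notebook
param_grid = {'n_factors': [110, 120, 140, 160],
              'reg_all': [0.08, 0.1, 0.15]}
cv_folds = 3  # matches cv=3 above

# Expand the grid into one dict per parameter combination
names = list(param_grid)
combinations = [dict(zip(names, values))
                for values in product(*(param_grid[n] for n in names))]

# Each combination is trained once per cross-validation fold
n_fits = len(combinations) * cv_folds
print(len(combinations), "combinations,", n_fits, "model fits")
```

With 4 values of `n_factors` and 3 of `reg_all`, the search covers 12 combinations, or 36 model fits under 3-fold cross-validation.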

* Fit the grid search

In [11]:
gs.fit(data)

* Print the best RMSE score from cross-validation

In [12]:
print(gs.best_score['rmse'])

3.4425916334620577


* Print the parameters of the optimal model based on `RMSE`, which will be used to predict the test set

In [14]:
print(gs.best_params['rmse'])

{'n_factors': 160, 'reg_all': 0.15}


* Print the optimal model's RMSE computed on the test set
    - Is it better than with the default parameters?
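
One way to approach this last step is sketched below. The helper `fit_and_score` is a hypothetical name; it only assumes a Surprise-style algorithm whose `fit(trainset)` trains the model and whose `test(testset)` returns predictions with `r_ui` (true rating) and `est` (estimated rating) fields. The commented lines show how it might be used with `gs.best_estimator['rmse']`, which Surprise exposes as an algorithm instance configured with the best parameters.

```python
def fit_and_score(algo, trainset, testset):
    """Fit `algo` on `trainset` and return its RMSE on `testset`.

    Assumes a Surprise-style algorithm: `fit(trainset)` trains the model and
    `test(testset)` returns predictions exposing `r_ui` and `est`.
    """
    algo.fit(trainset)
    predictions = algo.test(testset)
    squared_errors = [(p.r_ui - p.est) ** 2 for p in predictions]
    return (sum(squared_errors) / len(squared_errors)) ** 0.5

# Intended use in this notebook (not run here):
# best_algo = gs.best_estimator['rmse']   # SVD built with gs.best_params['rmse']
# tuned_rmse = fit_and_score(best_algo, trainset, testset)
# default_rmse = fit_and_score(SVD(), trainset, testset)
# print(tuned_rmse, default_rmse, tuned_rmse < default_rmse)
```

Note that `gs.fit(data)` searched over the full dataset, which includes the ratings in `testset`, so the tuned model's test RMSE may be slightly optimistic. Running the grid search on the training portion only would give a cleaner comparison.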