### Using `Surprise`

This activity focuses on using the `Surprise` library to predict user ratings. You will use a dataset derived from the MovieLens data, a common benchmark for recommendation algorithms. Using `Surprise`, you will load the data, create a train set and a test set, make predictions for the test set, and cross-validate the model on the dataset.

### Index

- [Problem 1](#-Problem-1)
- [Problem 2](#-Problem-2)
- [Problem 3](#-Problem-3)
- [Problem 4](#-Problem-4)
- [Problem 5](#-Problem-5)

In [83]:
from surprise import Dataset, Reader, SVD
import pandas as pd
import numpy as np

### The Data

The data is derived from the MovieLens data [here](https://grouplens.org/datasets/movielens/). A smaller sample has been drawn so that processing is faster. Each row is one user's rating of one movie: we have the user, the movie, and the associated rating where one exists.

In [84]:
movie_ratings = pd.read_csv('data/movie_ratings.csv', index_col=0)

In [85]:
movie_ratings.head()

| | movieId | title | userId | rating |
|---|---|---|---|---|
| 0 | 1 | Toy Story (1995) | 1 | 4.0 |
| 1 | 1 | Toy Story (1995) | 5 | 4.0 |
| 2 | 1 | Toy Story (1995) | 7 | 4.5 |
| 3 | 1 | Toy Story (1995) | 15 | 2.5 |
| 4 | 1 | Toy Story (1995) | 17 | 4.5 |


[Back to top](#-Index)

### Problem 1

#### Loading a Dataset

**10 Points**

Use the `Reader` and `Dataset` objects to create a dataset object named `sf`. Then use the dataset to construct a train set named `train`.
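For orientation: `Dataset.load_from_df` expects the columns in user, item, rating order, and the resulting trainset refers to users and items by contiguous integer "inner" ids rather than by their raw ids. The pure-Python sketch below mimics that bookkeeping on three of the rows shown earlier. The helper name `build_inner_ids` is hypothetical, and this is an illustration of the idea rather than Surprise's own implementation.

```python
# Illustration only: a hypothetical helper, not part of Surprise.
def build_inner_ids(rows):
    """Map raw user and item ids to contiguous integers in order of first appearance."""
    user_ids, item_ids = {}, {}
    triples = []
    for user, item, rating in rows:
        # setdefault assigns the next free integer the first time an id is seen
        u = user_ids.setdefault(user, len(user_ids))
        i = item_ids.setdefault(item, len(item_ids))
        triples.append((u, i, rating))
    return user_ids, item_ids, triples


# (userId, title, rating) rows taken from the head of the data shown earlier
rows = [
    (1, "Toy Story (1995)", 4.0),
    (5, "Toy Story (1995)", 4.0),
    (7, "Toy Story (1995)", 4.5),
]
user_ids, item_ids, triples = build_inner_ids(rows)
print(user_ids)
print(triples)
```

On a real trainset, methods such as `to_inner_uid` and `to_raw_uid` convert between the two kinds of id.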

In [86]:
### GRADED
reader = ''
sf = ''
train = ''

    
### BEGIN SOLUTION
a = movie_ratings[['userId', 'title', 'rating']]
reader = Reader(rating_scale=(0, 5))
sf = Dataset.load_from_df(a, reader)
train = sf.build_full_trainset()
### END SOLUTION

### ANSWER CHECK
print(type(sf))
print(type(train))

<class 'surprise.dataset.DatasetAutoFolds'>
<class 'surprise.trainset.Trainset'>


In [87]:
### BEGIN HIDDEN TESTS
a_ = movie_ratings[['userId', 'title', 'rating']]
reader_ = Reader(rating_scale=(0, 5))
sf_ = Dataset.load_from_df(a_, reader_)
train_ = sf_.build_full_trainset()
#
#
#
assert train.all_items() == train_.all_items()
### END HIDDEN TESTS

[Back to top](#-Index)

### Problem 2

#### Instantiate the `SVD` model

Create an `SVD` object with 2 factors and assign it to `model` below.
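With `n_factors=2`, every user and every item is described by a vector of length 2. As a rough sketch of what that means, the Surprise documentation describes the `SVD` estimate as the global mean plus a user bias, an item bias, and the dot product of the two factor vectors, clipped to the rating scale. The function name `predict_rating` and all of the numbers below are made up for illustration.

```python
# Illustration only: hypothetical helper and made-up parameter values.
def predict_rating(mu, b_u, b_i, p_u, q_i, scale=(0, 5)):
    """Biased matrix-factorization estimate, clipped to the rating scale."""
    raw = mu + b_u + b_i + sum(p * q for p, q in zip(p_u, q_i))
    low, high = scale
    return min(high, max(low, raw))


# Two factors per user and per item, as with SVD(n_factors=2)
est = predict_rating(mu=3.5, b_u=0.2, b_i=0.4, p_u=[0.5, -0.1], q_i=[0.6, 0.3])

# An estimate above the scale is clipped to the upper bound
clipped = predict_rating(mu=3.5, b_u=1.0, b_i=1.0, p_u=[1.0, 1.0], q_i=[1.0, 1.0])
print(est, clipped)
```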

In [88]:
### GRADED
model = ''

    
### BEGIN SOLUTION
model = SVD(n_factors = 2)
### END SOLUTION

### ANSWER CHECK
print(model.n_factors)

2


In [89]:
### BEGIN HIDDEN TESTS
a_ = movie_ratings[['userId', 'title', 'rating']]
reader_ = Reader(rating_scale=(0, 5))
sf_ = Dataset.load_from_df(a_, reader_)
train_ = sf_.build_full_trainset()
model_ = SVD(n_factors = 2)
#
#
#
assert model.n_factors == model_.n_factors
### END HIDDEN TESTS

[Back to top](#-Index)

### Problem 3

#### Fitting the Model

Below, fit the model on the training data. 
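As background on what fitting involves: the Surprise documentation describes `SVD` as being trained by stochastic gradient descent, nudging the biases and factor vectors after each observed rating. The sketch below performs one such update on a single rating. The helper name `sgd_step`, the learning rate, the regularization strength, and the starting values are all illustrative assumptions, not values read from the fitted model.

```python
# Illustration only: one hypothetical SGD update for a biased factor model.
def sgd_step(r_ui, mu, b_u, b_i, p_u, q_i, lr=0.005, reg=0.02):
    """Return the prediction error and the updated parameters for one rating."""
    pred = mu + b_u + b_i + sum(p * q for p, q in zip(p_u, q_i))
    err = r_ui - pred
    # Each parameter moves along the error, shrunk slightly by regularization
    new_b_u = b_u + lr * (err - reg * b_u)
    new_b_i = b_i + lr * (err - reg * b_i)
    # Both factor updates use the old values of p_u and q_i
    new_p_u = [p + lr * (err * q - reg * p) for p, q in zip(p_u, q_i)]
    new_q_i = [q + lr * (err * p - reg * q) for p, q in zip(p_u, q_i)]
    return err, new_b_u, new_b_i, new_p_u, new_q_i


# One observed rating of 4.5 with a global mean of 3.5 and small starting factors
err_before, b_u, b_i, p_u, q_i = sgd_step(4.5, 3.5, 0.0, 0.0, [0.1, 0.1], [0.1, 0.1])
err_after, *_ = sgd_step(4.5, 3.5, b_u, b_i, p_u, q_i)
print(err_before, err_after)
```

Calling `model.fit(train)` repeats updates of this kind over every rating in the trainset for several epochs.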

In [90]:
### GRADED
# fit your model below

    
### BEGIN SOLUTION
model.fit(train)
### END SOLUTION

### ANSWER CHECK
print(model)

<surprise.prediction_algorithms.matrix_factorization.SVD object at 0x7fbdbd8ef890>


In [93]:
### BEGIN HIDDEN TESTS
a_ = movie_ratings[['userId', 'title', 'rating']]
reader_ = Reader(rating_scale=(0, 5))
sf_ = Dataset.load_from_df(a_, reader_)
train_ = sf_.build_full_trainset()
model_ = SVD(n_factors = 2)
model_.fit(train_)
#
#
#
assert type(model) == type(model_)
### END HIDDEN TESTS

[Back to top](#-Index)

### Problem 4

#### Making Predictions

Build a testset named `test` from `train`, then use it to create a list of predictions. Assign the list to `predictions_list` below. Because `test` is built from the training data, these are in-sample predictions.

In [74]:
### GRADED
test = ''
predictions_list = ''

    
### BEGIN SOLUTION
test = train.build_testset()
predictions_list = model.test(test)
### END SOLUTION

### ANSWER CHECK
print(predictions_list[:5])

[Prediction(uid=1, iid='Toy Story (1995)', r_ui=4.0, est=4.72221554408827, details={'was_impossible': False}), Prediction(uid=1, iid='Grumpier Old Men (1995)', r_ui=4.0, est=4.0275581894141315, details={'was_impossible': False}), Prediction(uid=1, iid='Heat (1995)', r_ui=4.0, est=4.751626561965673, details={'was_impossible': False}), Prediction(uid=1, iid='Seven (a.k.a. Se7en) (1995)', r_ui=5.0, est=4.868344141268149, details={'was_impossible': False}), Prediction(uid=1, iid='Usual Suspects, The (1995)', r_ui=5.0, est=5, details={'was_impossible': False})]
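Each `Prediction` printed above carries the true rating (`r_ui`) alongside the estimate (`est`), so an error metric can be computed directly from the list. The sketch below does this by hand for the five predictions shown, using a stand-in named tuple `Pred` (a hypothetical name, keeping only the fields needed) so that it runs without the fitted model.

```python
from collections import namedtuple
from math import sqrt

# Stand-in for Surprise's Prediction tuple, keeping only the fields used here
Pred = namedtuple("Pred", ["uid", "iid", "r_ui", "est"])

# Values copied from the printed output above
sample = [
    Pred(1, "Toy Story (1995)", 4.0, 4.72221554408827),
    Pred(1, "Grumpier Old Men (1995)", 4.0, 4.0275581894141315),
    Pred(1, "Heat (1995)", 4.0, 4.751626561965673),
    Pred(1, "Seven (a.k.a. Se7en) (1995)", 5.0, 4.868344141268149),
    Pred(1, "Usual Suspects, The (1995)", 5.0, 5),
]


def rmse(preds):
    """Root mean squared error between true ratings and estimates."""
    return sqrt(sum((p.r_ui - p.est) ** 2 for p in preds) / len(preds))


sample_rmse = rmse(sample)
print(round(sample_rmse, 4))
```

On the full `predictions_list`, `surprise.accuracy.rmse` computes the same quantity. Since the testset here comes from the training data, the result measures fit rather than generalization.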


In [94]:
### BEGIN HIDDEN TESTS
test_ = train_.build_testset()
predictions_list_ = model_.test(test_)
#
#
#
assert type(predictions_list[0]) == type(predictions_list_[0])
### END HIDDEN TESTS

[Back to top](#-Index)

### Problem 5

#### Cross Validate the Model

You may use the test data to evaluate the model, but you can also cross-validate the model using the data object `sf`. Use `RMSE` as the measure, cross-validate with 5 folds, and assign the results to `cross_val_results` below.
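To make the idea of folds concrete, the sketch below splits a set of row indices into 5 folds so that every row is held out exactly once. The helper name `k_fold_indices` is hypothetical, and the sketch leaves the rows in their original order, whereas library implementations typically shuffle first.

```python
# Illustration only: a hypothetical k-fold splitter without shuffling.
def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) pairs; each index is held out exactly once."""
    # Spread any remainder over the first folds so sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if f < n_samples % k else 0) for f in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


folds = list(k_fold_indices(12, k=5))
for train_idx, test_idx in folds:
    print(len(train_idx), test_idx)
```

In each round the model is fit on the training indices and scored on the held-out ones, which gives one RMSE value per fold.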

In [47]:
from surprise.model_selection import cross_validate

In [75]:
### GRADED
cross_val_results = ''

    
### BEGIN SOLUTION
cross_val_results = cross_validate(model, sf, measures=['RMSE'], cv=5)
### END SOLUTION

### ANSWER CHECK
print(cross_val_results)

{'test_rmse': array([0.87244772, 0.87385015, 0.86235689, 0.8717439 , 0.86558502]), 'fit_time': (0.8716928958892822, 0.8981771469116211, 0.9042961597442627, 0.8981382846832275, 0.9012491703033447), 'test_time': (0.27602505683898926, 0.08134293556213379, 0.08190298080444336, 0.0822598934173584, 0.0808858871459961)}
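The per-fold scores are usually summarized as a mean and a spread. The sketch below does this for the `test_rmse` values printed above, copied into a plain list so that it runs on its own.

```python
from statistics import mean, stdev

# Per-fold RMSE values copied from the printed output above
test_rmse = [0.87244772, 0.87385015, 0.86235689, 0.8717439, 0.86558502]

mean_rmse = mean(test_rmse)
spread = stdev(test_rmse)  # sample standard deviation across the folds
print(f"RMSE: {mean_rmse:.4f} +/- {spread:.4f}")
```

With the real results, the same summary can be taken from `cross_val_results['test_rmse']`.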


In [96]:
### BEGIN HIDDEN TESTS
cross_val_results_ = cross_validate(model, sf, measures=['RMSE'], cv=5)
#
#
#
assert cross_val_results.keys() == cross_val_results_.keys()
### END HIDDEN TESTS