### Required Codio Assignment 19.2: Using SURPRISE

**Expected Time = 60 minutes**

**Total Points = 50**

This activity focuses on using the `Surprise` library to predict user ratings. You will use a dataset derived from the MovieLens data, a common benchmark for recommendation algorithms. Using `Surprise`, you will load the data, create a train set and a test set, make predictions for the test set, and cross-validate the model on the dataset.

#### Index

- [Problem 1](#-Problem-1)
- [Problem 2](#-Problem-2)
- [Problem 3](#-Problem-3)
- [Problem 4](#-Problem-4)
- [Problem 5](#-Problem-5)

In [1]:
from surprise import Dataset, Reader, SVD
import pandas as pd
import numpy as np

### The Data

The data is derived from the MovieLens data [here](https://grouplens.org/datasets/movielens/). The original dataset has been sampled down so that processing is faster.

The dataframe contains information about the user, the movie, and the associated rating when one exists.

In [2]:
movie_ratings = pd.read_csv('data/movie_ratings.csv', index_col=0)

In [3]:
movie_ratings.head()

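|   | movieId | title | userId | rating |
|---|---------|-------|--------|--------|
| 0 | 1 | Toy Story (1995) | 1 | 4.0 |
| 1 | 1 | Toy Story (1995) | 5 | 4.0 |
| 2 | 1 | Toy Story (1995) | 7 | 4.5 |
| 3 | 1 | Toy Story (1995) | 15 | 2.5 |
| 4 | 1 | Toy Story (1995) | 17 | 4.5 |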


[Back to top](#-Index)

### Problem 1

#### Loading a Dataset

**10 Points**

Extract the columns `userId`, `title`, and `rating`, in that order, from the `movie_ratings` dataframe and assign them to the variable `a`. `Surprise` expects the columns in the order user, item, rating.

Initialize a `Reader` object, specifying that the ratings are on a scale from 0 to 5, and assign the result to `reader`. Next, use the `Dataset` class to convert the selected dataframe `a` into the format expected by `Surprise`, passing in the `reader` object. Assign this result to `sf`.

Finally, call the `build_full_trainset` method on `sf` to build the full training set from the dataset, making it ready for training a recommendation algorithm. Assign this result to `train`.


In [4]:
### GRADED
reader = ''
sf = ''
train = ''

    
### BEGIN SOLUTION
a = movie_ratings[['userId', 'title', 'rating']]
reader = Reader(rating_scale=(0, 5))
sf = Dataset.load_from_df(a, reader)
train = sf.build_full_trainset()
### END SOLUTION

### ANSWER CHECK
print(type(sf))
print(type(train))

<class 'surprise.dataset.DatasetAutoFolds'>
<class 'surprise.trainset.Trainset'>
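
To make the role of the training set more concrete, here is a small library-free sketch of the kind of bookkeeping a trainset performs: raw user and item ids are mapped to consecutive integer "inner" ids, and the global mean rating is computed. This is an illustration only, not Surprise's actual implementation; the helper `build_inner_ids` is a hypothetical name, and the rows are a handful of ratings taken from the outputs in this notebook.

```python
# Library-free sketch (NOT Surprise's code) of the bookkeeping behind a trainset.

def build_inner_ids(ratings):
    """Map raw user and item ids to inner ids in order of first appearance."""
    user_ids, item_ids = {}, {}
    for user, item, _rating in ratings:
        # setdefault only assigns a new inner id the first time an id is seen
        user_ids.setdefault(user, len(user_ids))
        item_ids.setdefault(item, len(item_ids))
    return user_ids, item_ids

# A few (userId, title, rating) rows in the order Surprise expects.
rows = [
    (1, "Toy Story (1995)", 4.0),
    (5, "Toy Story (1995)", 4.0),
    (7, "Toy Story (1995)", 4.5),
    (1, "Heat (1995)", 4.0),
]

user_ids, item_ids = build_inner_ids(rows)
global_mean = sum(rating for _, _, rating in rows) / len(rows)

print(user_ids)     # raw user id -> inner id
print(item_ids)     # raw title   -> inner id
print(global_mean)  # mean of all ratings in the toy sample
```

On the real `train` object, the corresponding information is available through `train.n_users`, `train.n_items`, `train.n_ratings`, `train.global_mean`, and `train.to_inner_uid(...)`.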


[Back to top](#-Index)

### Problem 2

#### Instantiate the `SVD` model

**10 Points**

Create an `SVD` object with 2 factors and assign it to `model` below.

In [6]:
### GRADED
model = ''

    
### BEGIN SOLUTION
model = SVD(n_factors=2)
### END SOLUTION

### ANSWER CHECK
print(model.n_factors)

2
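
For intuition about what `n_factors` controls, the sketch below shows how a biased matrix factorization model of this kind forms a single estimate: the global mean, plus a user bias, plus an item bias, plus the dot product of a user factor vector and an item factor vector, each of length `n_factors`. The function `predict_rating` and all numeric values are hypothetical and chosen only for illustration; the clipping step assumes the 0 to 5 rating scale used in this activity.

```python
# Sketch of how a biased matrix factorization model combines its parameters.
# All parameter values below are made up for illustration.

def predict_rating(global_mean, user_bias, item_bias, user_factors, item_factors,
                   rating_scale=(0, 5)):
    """Return mean + biases + dot(user_factors, item_factors), clipped to the scale."""
    interaction = sum(p * q for p, q in zip(user_factors, item_factors))
    estimate = global_mean + user_bias + item_bias + interaction
    low, high = rating_scale
    return min(high, max(low, estimate))

# With n_factors=2, each user and each item is described by two numbers.
est = predict_rating(
    global_mean=3.5,
    user_bias=0.25,            # this user rates slightly above average
    item_bias=-0.5,            # this item is rated slightly below average
    user_factors=[0.5, 1.0],
    item_factors=[1.0, 0.25],
)
print(est)

# An estimate that would exceed the scale is clipped to its upper bound.
clipped = predict_rating(4.5, 0.5, 0.5, [1.0, 1.0], [1.0, 1.0])
print(clipped)
```

A larger `n_factors` gives the model more capacity to capture user and item interactions, at the cost of more parameters to fit.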


[Back to top](#-Index)

### Problem 3

#### Fitting the Model

**10 Points**

Below, fit `model` on the training data `train`. 

In [8]:
### GRADED
# Fit your model below. No variable needs to be assigned.

    
### BEGIN SOLUTION
model.fit(train)
### END SOLUTION

### ANSWER CHECK
print(model)

<surprise.prediction_algorithms.matrix_factorization.SVD object at 0x73a02f1c8cc0>
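
Calling `fit` hides the optimization loop. As a rough picture of what happens, the sketch below trains a tiny biased factorization model with stochastic gradient descent on made-up ratings. It is a simplified illustration, not Surprise's implementation: `fit_svd_sgd` is a hypothetical helper, the toy ratings are invented, and the default learning rate, regularization, and epoch count are assumptions chosen to resemble commonly documented defaults.

```python
import math
import random

def fit_svd_sgd(ratings, n_factors=2, n_epochs=20, lr=0.005, reg=0.02, seed=0):
    """Fit biases and factors by SGD; return a predict function and RMSE history."""
    rng = random.Random(seed)
    users = {u for u, _, _ in ratings}
    items = {i for _, i, _ in ratings}
    mu = sum(r for _, _, r in ratings) / len(ratings)      # global mean
    bu = {u: 0.0 for u in users}                           # user biases
    bi = {i: 0.0 for i in items}                           # item biases
    pu = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in sorted(users)}
    qi = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in sorted(items)}

    def predict(u, i):
        return mu + bu[u] + bi[i] + sum(p * q for p, q in zip(pu[u], qi[i]))

    def rmse():
        total = sum((r - predict(u, i)) ** 2 for u, i, r in ratings)
        return math.sqrt(total / len(ratings))

    history = [rmse()]                                     # RMSE before training
    for _ in range(n_epochs):
        for u, i, r in ratings:
            err = r - predict(u, i)
            # Move each parameter against the gradient of the regularized error.
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            for f in range(n_factors):
                p, q = pu[u][f], qi[i][f]
                pu[u][f] += lr * (err * q - reg * p)
                qi[i][f] += lr * (err * p - reg * q)
        history.append(rmse())                             # RMSE after each epoch
    return predict, history

# Made-up (user, item, rating) triples for illustration only.
toy_ratings = [
    (1, "A", 5.0), (1, "B", 4.0),
    (2, "A", 3.0), (2, "B", 2.0), (2, "C", 1.5),
    (3, "A", 4.5), (3, "C", 3.0),
]

predict, history = fit_svd_sgd(toy_ratings)
print(f"training RMSE before: {history[0]:.4f}, after: {history[-1]:.4f}")
```

The training error shrinks slowly here because the learning rate is small and the data set is tiny; the point is the shape of the update loop, not the quality of the fit.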


[Back to top](#-Index)

### Problem 4

#### Making Predictions

**10 Points**

Use the `build_testset` method on `train` to build a test set named `test`. Next, use `model` to create a list of predictions for `test`, and assign the result to `predictions_list` below.

In [10]:
### GRADED
test = ''
predictions_list = ''

    
### BEGIN SOLUTION
test = train.build_testset()
predictions_list = model.test(test)
### END SOLUTION

### ANSWER CHECK
print(predictions_list[:5])

[Prediction(uid=1, iid='Toy Story (1995)', r_ui=4.0, est=4.6929849454515, details={'was_impossible': False}), Prediction(uid=1, iid='Grumpier Old Men (1995)', r_ui=4.0, est=4.025778279509925, details={'was_impossible': False}), Prediction(uid=1, iid='Heat (1995)', r_ui=4.0, est=4.7505863830043715, details={'was_impossible': False}), Prediction(uid=1, iid='Seven (a.k.a. Se7en) (1995)', r_ui=5.0, est=4.790983919359015, details={'was_impossible': False}), Prediction(uid=1, iid='Usual Suspects, The (1995)', r_ui=5.0, est=4.999876082459087, details={'was_impossible': False})]
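
Each `Prediction` pairs the true rating `r_ui` with the estimate `est`, so an error metric can be computed directly from the list. The sketch below recomputes RMSE by hand for the five predictions printed above, using a stand-in `Prediction` namedtuple so it runs without `Surprise`; the `rmse` helper is a hypothetical name. Note that because `test` was built from `train`, this measures how well the model fits ratings it has already seen, not how well it generalizes.

```python
import math
from collections import namedtuple

# Stand-in for Surprise's Prediction objects, with the same field names.
Prediction = namedtuple("Prediction", ["uid", "iid", "r_ui", "est", "details"])

# The five predictions copied from the printed output above.
sample = [
    Prediction(1, "Toy Story (1995)", 4.0, 4.6929849454515, {"was_impossible": False}),
    Prediction(1, "Grumpier Old Men (1995)", 4.0, 4.025778279509925, {"was_impossible": False}),
    Prediction(1, "Heat (1995)", 4.0, 4.7505863830043715, {"was_impossible": False}),
    Prediction(1, "Seven (a.k.a. Se7en) (1995)", 5.0, 4.790983919359015, {"was_impossible": False}),
    Prediction(1, "Usual Suspects, The (1995)", 5.0, 4.999876082459087, {"was_impossible": False}),
]

def rmse(predictions):
    """Root mean squared error between true ratings and estimates."""
    squared_errors = [(p.r_ui - p.est) ** 2 for p in predictions]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

sample_rmse = rmse(sample)
print(f"RMSE on the five sample predictions: {sample_rmse:.4f}")
```

With `Surprise` available, `surprise.accuracy.rmse(predictions_list)` performs the same calculation over the full list of predictions.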


[Back to top](#-Index)

### Problem 5

#### Cross Validate the Model

**10 Points**

You can evaluate the model on the test data, and you can also cross-validate it using the data object `sf`.

In the code cell below, use the `cross_validate` function to calculate the RMSE of the model on `sf`. Assign the result to `cross_val_results`.

In [12]:
from surprise.model_selection import cross_validate

In [13]:
### GRADED
cross_val_results = ''

    
### BEGIN SOLUTION
cross_val_results = cross_validate(model, sf, measures=['RMSE'])
### END SOLUTION

### ANSWER CHECK
print(cross_val_results)

{'test_rmse': array([0.8696269 , 0.87505139, 0.87133353, 0.86538568, 0.86614243]), 'fit_time': (1.3928747177124023, 1.4190704822540283, 1.4176099300384521, 1.3964147567749023, 1.3946666717529297), 'test_time': (0.17817378044128418, 0.17273783683776855, 0.17184090614318848, 0.17148160934448242, 0.1666548252105713)}
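
The returned dictionary holds one RMSE value per fold, so a single summary figure has to be computed from it. As a minimal sketch, the snippet below averages the five fold scores printed above; the values are copied into a plain list so the example runs without `Surprise` or NumPy, and the variable names are illustrative.

```python
from statistics import mean, stdev

# Per-fold RMSE values copied from the printed output above.
fold_rmse = [0.8696269, 0.87505139, 0.87133353, 0.86538568, 0.86614243]

mean_rmse = mean(fold_rmse)   # average error across folds
std_rmse = stdev(fold_rmse)   # spread of the error across folds

print(f"RMSE over {len(fold_rmse)} folds: {mean_rmse:.4f} +/- {std_rmse:.4f}")
```

On the real result, `cross_val_results['test_rmse']` is a NumPy array, so `cross_val_results['test_rmse'].mean()` gives the same average. Five scores appear because `cross_validate` uses 5 folds when `cv` is not specified.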
