### Codio Activity 19.6: Using SURPRISE

**Expected Time = 60 minutes**

**Total Points = 50**

This activity focuses on using the `Surprise` library to predict user ratings. You will use a dataset derived from the MovieLens data, a common benchmark for recommendation algorithms. Using `Surprise`, you will load the data, build a train set and a test set, make predictions on the test set, and cross-validate the model on the dataset.

#### Index

- [Problem 1](#-Problem-1)
- [Problem 2](#-Problem-2)
- [Problem 3](#-Problem-3)
- [Problem 4](#-Problem-4)
- [Problem 5](#-Problem-5)

In [1]:
from surprise import Dataset, Reader, SVD
import pandas as pd
import numpy as np

### The Data

The data is derived from the MovieLens data [here](https://grouplens.org/datasets/movielens/). It consists of user ratings of different movies; a smaller sample has been drawn so that processing is faster. We have information on the user, the movie, and the associated rating where one exists.

In [2]:
movie_ratings = pd.read_csv('data/movie_ratings.csv', index_col=0)

In [3]:
movie_ratings.head()

| Unnamed: 0 | movieId | title | userId | rating |
|---|---|---|---|---|
| 0 | 1 | Toy Story (1995) | 1 | 4.0 |
| 1 | 1 | Toy Story (1995) | 5 | 4.0 |
| 2 | 1 | Toy Story (1995) | 7 | 4.5 |
| 3 | 1 | Toy Story (1995) | 15 | 2.5 |
| 4 | 1 | Toy Story (1995) | 17 | 4.5 |


[Back to top](#-Index)

### Problem 1

#### Loading a Dataset

**10 Points**

Use the `Reader` and `Dataset` objects to create a dataset object named `sf`. Then use the dataset to construct a train set named `train`.

In [4]:
### GRADED
reader = ''
sf = ''
train = ''

    
# YOUR CODE HERE
a = movie_ratings[['userId', 'title', 'rating']]
reader = Reader(rating_scale=(0, 5))
sf = Dataset.load_from_df(a, reader)
train = sf.build_full_trainset()

### ANSWER CHECK
print(type(sf))
print(type(train))

<class 'surprise.dataset.DatasetAutoFolds'>
<class 'surprise.trainset.Trainset'>
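
`Dataset.load_from_df` reads the three columns positionally as user, item, rating, so the column order of the frame matters, and the `rating_scale` given to `Reader` should cover the ratings actually present. The sketch below rebuilds only the five rows shown by `movie_ratings.head()` (the name `sample` is illustrative, not part of the activity) and checks both points. The range it reports describes those five rows only; the same check would need to be run on the full `movie_ratings` frame to choose a scale for the real data.

```python
import pandas as pd

# The five rows displayed by movie_ratings.head(), rebuilt by hand.
# `sample` is a stand-in for illustration, not the full dataset.
sample = pd.DataFrame({
    "movieId": [1, 1, 1, 1, 1],
    "title": ["Toy Story (1995)"] * 5,
    "userId": [1, 5, 7, 15, 17],
    "rating": [4.0, 4.0, 4.5, 2.5, 4.5],
})

# Reorder to user, item, rating: the order load_from_df expects.
ratings = sample[["userId", "title", "rating"]]

# Smallest and largest rating in these rows, to sanity-check a rating_scale.
lowest = ratings["rating"].min()
highest = ratings["rating"].max()
print(list(ratings.columns), lowest, highest)
```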


[Back to top](#-Index)

### Problem 2

#### Instantiate the `SVD` model

**10 Points**

Create an `SVD` object with 2 factors and assign it to `model` below.

In [5]:
### GRADED
model = ''

    
# YOUR CODE HERE
model = SVD(n_factors=2)

### ANSWER CHECK
print(model.n_factors)

2
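
To make `n_factors` concrete: Surprise's `SVD` estimates a rating as the global mean plus a user bias, an item bias, and the dot product of a user factor vector and an item factor vector, each of length `n_factors`. The sketch below is a plain-Python illustration of that rule with 2 factors, not the library's implementation. The function name `predict_rating` and every number passed to it are hypothetical, chosen only to show the arithmetic and the clipping to the rating scale.

```python
def predict_rating(mu, b_u, b_i, p_u, q_i, rating_scale=(0, 5)):
    """Estimate a rating as mu + b_u + b_i + dot(q_i, p_u), clipped to the scale."""
    raw = mu + b_u + b_i + sum(p * q for p, q in zip(p_u, q_i))
    low, high = rating_scale
    return min(high, max(low, raw))

# Hypothetical values: global mean 3.5, small biases, two factors each.
inside = predict_rating(3.5, 0.2, 0.3, p_u=[0.5, -0.1], q_i=[0.4, 0.2])

# Larger hypothetical biases push the raw estimate past 5, so it is clipped.
clipped = predict_rating(3.5, 1.0, 0.8, p_u=[0.5, -0.1], q_i=[0.4, 0.2])

print(inside, clipped)
```

With only 2 factors, each user and each movie is described by two learned numbers plus a bias, which keeps the model small and fast to fit.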


[Back to top](#-Index)

### Problem 3

#### Fitting the Model

**10 Points**

Below, fit the model on the training data. 

In [6]:
### GRADED
# Fit your model below. No variable needs to be assigned.

    
# YOUR CODE HERE
model.fit(train)

### ANSWER CHECK
print(model)

<surprise.prediction_algorithms.matrix_factorization.SVD object at 0x0000024BBC3D3400>
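
Printing the model shows only the object, not what `fit` did. By default, Surprise's `SVD` learns its biases and factors with stochastic gradient descent, visiting each known rating and nudging the parameters to reduce the prediction error. The following is a minimal plain-Python sketch of one such update for a single rating, not the library's code. The function name `sgd_step` and the starting values are hypothetical; the learning rate and regularization defaults are assumed to match the library's documented `lr_all=0.005` and `reg_all=0.02`.

```python
def sgd_step(r_ui, mu, b_u, b_i, p_u, q_i, lr=0.005, reg=0.02):
    """Apply one regularized SGD update for a single observed rating r_ui."""
    pred = mu + b_u + b_i + sum(p * q for p, q in zip(p_u, q_i))
    err = r_ui - pred
    # Each parameter moves along the error gradient and shrinks slightly (reg).
    new_b_u = b_u + lr * (err - reg * b_u)
    new_b_i = b_i + lr * (err - reg * b_i)
    # Both factor updates use the old values of the other vector.
    new_p_u = [p + lr * (err * q - reg * p) for p, q in zip(p_u, q_i)]
    new_q_i = [q + lr * (err * p - reg * q) for p, q in zip(p_u, q_i)]
    return new_b_u, new_b_i, new_p_u, new_q_i, err

# Hypothetical start: zero biases, small factors, true rating 4.0, mean 3.5.
b_u, b_i, p_u, q_i, err_before = sgd_step(4.0, 3.5, 0.0, 0.0, [0.1, 0.1], [0.1, 0.1])

# Re-evaluate the error with the updated parameters (the update is discarded).
*_, err_after = sgd_step(4.0, 3.5, b_u, b_i, p_u, q_i)

print(err_before, err_after)
```

Repeating this step over every rating in `train` for several epochs is what shrinks the error on the training data.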


[Back to top](#-Index)

### Problem 4

#### Making Predictions

**10 Points**

Build a testset named `test` from `train`, then use the fitted model to create a list of predictions for that testset. Assign the list to `predictions_list` below.

In [7]:
### GRADED
test = ''
predictions_list = ''

    
# YOUR CODE HERE
test = train.build_testset()
predictions_list = model.test(test)

### ANSWER CHECK
print(predictions_list[:5])

[Prediction(uid=1, iid='Toy Story (1995)', r_ui=4.0, est=4.6674402087960045, details={'was_impossible': False}), Prediction(uid=1, iid='Grumpier Old Men (1995)', r_ui=4.0, est=4.033428283968486, details={'was_impossible': False}), Prediction(uid=1, iid='Heat (1995)', r_ui=4.0, est=4.756158740635537, details={'was_impossible': False}), Prediction(uid=1, iid='Seven (a.k.a. Se7en) (1995)', r_ui=5.0, est=4.794107695501601, details={'was_impossible': False}), Prediction(uid=1, iid='Usual Suspects, The (1995)', r_ui=5.0, est=5, details={'was_impossible': False})]
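
Each `Prediction` pairs the true rating `r_ui` with the estimate `est`, so an error measure can be computed directly from the list. As a rough illustration, the sketch below computes RMSE by hand from the five predictions printed above, with the `(r_ui, est)` pairs copied from that output. The helper name `rmse` is illustrative; `surprise.accuracy.rmse` performs the same calculation on a full list of `Prediction` objects.

```python
from math import sqrt

# (r_ui, est) pairs copied from the five predictions printed above.
pairs = [
    (4.0, 4.6674402087960045),
    (4.0, 4.033428283968486),
    (4.0, 4.756158740635537),
    (5.0, 4.794107695501601),
    (5.0, 5.0),
]

def rmse(pairs):
    """Root mean squared error over (true rating, estimate) pairs."""
    squared_errors = [(r_ui - est) ** 2 for r_ui, est in pairs]
    return sqrt(sum(squared_errors) / len(squared_errors))

five_row_rmse = rmse(pairs)
print(five_row_rmse)
```

Because `test` was built with `train.build_testset()`, these are ratings the model was fitted on, so an error computed this way is an in-sample figure and will tend to look better than the error on unseen ratings. A held-out split, for example from `surprise.model_selection.train_test_split`, gives a fairer estimate.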


[Back to top](#-Index)

### Problem 5

#### Cross Validate the Model

**10 Points**

You may use the test data to evaluate the model, but you can also cross-validate the model using the data object `sf`. Cross-validate using `RMSE` as the measure and assign the results to `cross_val_results` below.

In [8]:
from surprise.model_selection import cross_validate

In [9]:
### GRADED
cross_val_results = ''

    
# YOUR CODE HERE
cross_val_results = cross_validate(model, sf, measures=['RMSE'])

### ANSWER CHECK
print(cross_val_results)

{'test_rmse': array([0.86484957, 0.86803642, 0.86692636, 0.87752086, 0.86978847]), 'fit_time': (0.3362917900085449, 0.35906338691711426, 0.3570291996002197, 0.34102821350097656, 0.3528022766113281), 'test_time': (0.13353896141052246, 0.059000492095947266, 0.05594778060913086, 0.11398863792419434, 0.059426069259643555)}
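
The returned dictionary holds one RMSE per fold, and a single summary number is usually easier to compare across models. The sketch below averages the five fold scores, copied from the output above, and reports their spread. The list name `fold_rmse` is illustrative; in the notebook, the same values are available as `cross_val_results['test_rmse']`.

```python
from statistics import fmean, stdev

# Per-fold RMSE values copied from the cross_validate output above.
fold_rmse = [0.86484957, 0.86803642, 0.86692636, 0.87752086, 0.86978847]

mean_rmse = fmean(fold_rmse)
spread = stdev(fold_rmse)  # sample standard deviation across the 5 folds

print(f"RMSE: {mean_rmse:.4f} +/- {spread:.4f}")
```

Because every fold is scored on ratings held out from fitting, this average is a fairer estimate of error on unseen ratings than a score computed on the training data.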
