### Required Codio Assignment 19.3: Hybrid Recommendations with SURPRISE

**Expected Time = 90 minutes**

**Total Points = 50**

This activity introduces hybrid recommendations with the Surprise library. Below, you will combine the predictions of different algorithms to create hybrid recommendations. You will again use the `SVD` algorithm and combine it with the `SlopeOne` algorithm.

#### Index

- [Problem 1](#-Problem-1)
- [Problem 2](#-Problem-2)
- [Problem 3](#-Problem-3)
- [Problem 4](#-Problem-4)
- [Problem 5](#-Problem-5)


In [1]:
import pandas as pd
from surprise import Reader, SVD, Dataset, NormalPredictor, KNNBasic
from surprise.model_selection import cross_validate

#### The Data

For this activity, you will again use a sampled set of data from MovieLens. The data is loaded and displayed below.

In [2]:
df = pd.read_csv('data/movie_ratings.csv', index_col=0)

In [3]:
df.head()

|   | movieId | title | userId | rating |
|---|---|---|---|---|
| 0 | 1 | Toy Story (1995) | 1 | 4.0 |
| 1 | 1 | Toy Story (1995) | 5 | 4.0 |
| 2 | 1 | Toy Story (1995) | 7 | 4.5 |
| 3 | 1 | Toy Story (1995) | 15 | 2.5 |
| 4 | 1 | Toy Story (1995) | 17 | 4.5 |


[Back to top](#-Index)

### Problem 1

#### Loading the Data 

**10 Points**

Initialize a `Reader` object with the argument `line_format` equal to `item user rating` and assign the result to `reader`.

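Next, use the `load_from_df` function of the `Dataset` class to convert the columns `title`, `userId`, and `rating` of `df` (in that order), together with the `reader` object, to a format that `Surprise` can interpret. Assign this result to `data`. Note that `load_from_df` reads the three columns positionally as user, item, and rating, so with this column order `Surprise` stores `title` in the `uid` field and `userId` in the `iid` field of each prediction.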


Use the `build_full_trainset` function on `data` to build the full training set from the dataset, making it ready for training a recommendation algorithm. Assign this result to `train`.

Use the `build_testset` function on `train` to create a test set and assign this result to the variable `test`.



In [4]:
### GRADED
reader = ''
data = ''
train = ''
test = ''

    
### BEGIN SOLUTION
reader = Reader(line_format='item user rating')
data = Dataset.load_from_df(df[['title', 'userId', 'rating']], reader)
train = data.build_full_trainset()
test = train.build_testset()
### END SOLUTION

### ANSWER CHECK
print(type(train))
print(type(test))

<class 'surprise.trainset.Trainset'>
<class 'list'>


[Back to top](#-Index)

### Problem 2

#### SVD Model

**10 Points**

Now, create an `SVD` model with `random_state = 42` as `svd` below. Fit this model on the training data `train`. Then use the model to make predictions on the test set `test` and assign them to `svd_preds` below.


In [6]:
### GRADED
svd = ''
svd_preds = ''

    
### BEGIN SOLUTION
svd = SVD(random_state = 42)
svd.fit(train)
svd_preds = svd.test(test)
### END SOLUTION

### ANSWER CHECK
print(svd_preds[:5])

[Prediction(uid='Toy Story (1995)', iid=1, r_ui=4.0, est=4.402273922317781, details={'was_impossible': False}), Prediction(uid='Toy Story (1995)', iid=5, r_ui=4.0, est=4.03204694074919, details={'was_impossible': False}), Prediction(uid='Toy Story (1995)', iid=7, r_ui=4.5, est=4.1184783390134, details={'was_impossible': False}), Prediction(uid='Toy Story (1995)', iid=15, r_ui=2.5, est=3.298923822257094, details={'was_impossible': False}), Prediction(uid='Toy Story (1995)', iid=17, r_ui=4.5, est=4.191087116406786, details={'was_impossible': False})]


[Back to top](#-Index)

### Problem 3

#### SlopeOne Model

**10 Points**

Next, initialize a `SlopeOne` model below as `slope_one`. Fit this model on the training data `train`.

Finally, compute the test set predictions and assign them to the variable `slope_one_preds` below. 

In [8]:
from surprise import SlopeOne

In [9]:
### GRADED
slope_one = ''
slope_one_preds = ''

    
### BEGIN SOLUTION
slope_one = SlopeOne()
slope_one.fit(train)
slope_one_preds = slope_one.test(test)
### END SOLUTION

### ANSWER CHECK
print(slope_one_preds[:5])

[Prediction(uid='Toy Story (1995)', iid=1, r_ui=4.0, est=4.562773780247601, details={'was_impossible': False}), Prediction(uid='Toy Story (1995)', iid=5, r_ui=4.0, est=3.876436672186221, details={'was_impossible': False}), Prediction(uid='Toy Story (1995)', iid=7, r_ui=4.5, est=3.7666616617854367, details={'was_impossible': False}), Prediction(uid='Toy Story (1995)', iid=15, r_ui=2.5, est=3.5513821707689663, details={'was_impossible': False}), Prediction(uid='Toy Story (1995)', iid=17, r_ui=4.5, est=4.113053394469709, details={'was_impossible': False})]


[Back to top](#-Index)

### Problem 4

#### Hybrid Predictions

**10 Points**

Now, use both `slope_one_preds` and `svd_preds` to compute, for each user and movie pair in the test set, the average of the two predicted ratings. Assign your results to the list `hybrid_preds` below.

In [11]:
### GRADED
hybrid_preds = ''

    
### BEGIN SOLUTION
hybrid_preds = [0.5*i.est + 0.5*j.est for i,j in zip(slope_one_preds, svd_preds)]
### END SOLUTION

### ANSWER CHECK
hybrid_preds[:5]

[4.482523851282691,
 3.9542418064677056,
 3.942570000399418,
 3.42515299651303,
 4.152070255438248]
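
The simple average above is the equal-weight case of a more general weighted blend. The sketch below is a minimal illustration, not part of the graded solution: `Prediction` is a stand-in namedtuple with the same fields as the Surprise predictions printed earlier, and `blend` and `weight` are hypothetical names introduced here. It also checks that both lists refer to the same `(uid, iid)` pairs before combining them, which `zip` alone does not guarantee.

```python
from collections import namedtuple

# Stand-in with the same fields as the predictions printed earlier
# (assumption: used so the sketch runs without Surprise installed).
Prediction = namedtuple("Prediction", ["uid", "iid", "r_ui", "est", "details"])


def blend(preds_a, preds_b, weight=0.5):
    """Return weight * a.est + (1 - weight) * b.est for each aligned pair."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    if len(preds_a) != len(preds_b):
        raise ValueError("prediction lists differ in length")
    blended = []
    for a, b in zip(preds_a, preds_b):
        # Both models must be scoring the same (uid, iid) pair.
        if (a.uid, a.iid) != (b.uid, b.iid):
            raise ValueError(
                f"misaligned predictions: {(a.uid, a.iid)} vs {(b.uid, b.iid)}"
            )
        blended.append(weight * a.est + (1 - weight) * b.est)
    return blended


# First two rows of the outputs shown earlier in this notebook.
slope_one_sample = [
    Prediction("Toy Story (1995)", 1, 4.0, 4.562773780247601, {"was_impossible": False}),
    Prediction("Toy Story (1995)", 5, 4.0, 3.876436672186221, {"was_impossible": False}),
]
svd_sample = [
    Prediction("Toy Story (1995)", 1, 4.0, 4.402273922317781, {"was_impossible": False}),
    Prediction("Toy Story (1995)", 5, 4.0, 4.03204694074919, {"was_impossible": False}),
]

equal_blend = blend(slope_one_sample, svd_sample)             # the 0.5 / 0.5 average
svd_heavy = blend(slope_one_sample, svd_sample, weight=0.25)  # 25% SlopeOne, 75% SVD
print(equal_blend)
print(svd_heavy)
```

With the real lists, `blend(slope_one_preds, svd_preds)` should reproduce `hybrid_preds`, and other values of `weight` shift the blend toward one model or the other.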

[Back to top](#-Index)

### Problem 5

#### DataFrame of predictions

**10 Points**

Finally, create a DataFrame containing the movie title, the user id, the hybrid rating, and the individual `SVD` and `SlopeOne` ratings as `hybrid_df` below. The table should begin as:

<table border="1" class="dataframe">  <thead>    <tr style="text-align: right;">      <th></th>      <th>Title</th>      <th>user_id</th>      <th>hybrid_rating</th>      <th>svd_rating</th>      <th>slope_one_rating</th>    </tr>  </thead>  <tbody>    <tr>      <th>0</th>      <td>Toy Story (1995)</td>      <td>1</td>      <td>4.482524</td>      <td>4.402274</td>      <td>4.562774</td>    </tr>    <tr>      <th>1</th>      <td>Toy Story (1995)</td>      <td>5</td>      <td>3.954242</td>      <td>4.032047</td>      <td>3.876437</td>    </tr>    <tr>      <th>2</th>      <td>Toy Story (1995)</td>      <td>7</td>      <td>3.942570</td>      <td>4.118478</td>      <td>3.766662</td>    </tr>    <tr>      <th>3</th>      <td>Toy Story (1995)</td>      <td>15</td>      <td>3.425153</td>      <td>3.298924</td>      <td>3.551382</td>    </tr>    <tr>      <th>4</th>      <td>Toy Story (1995)</td>      <td>17</td>      <td>4.152070</td>      <td>4.191087</td>      <td>4.113053</td>    </tr>  </tbody></table>

In [13]:
### GRADED
hybrid_df = ''

    
### BEGIN SOLUTION
# `title` was the first column passed to `load_from_df`, so Surprise stores it
# in `uid`; the `userId` values are stored in `iid`.
data = {'Title': [i.uid for i in slope_one_preds],
       'user_id': [i.iid for i in slope_one_preds],
       'hybrid_rating': hybrid_preds,
       'svd_rating': [i.est for i in svd_preds],
       'slope_one_rating': [i.est for i in slope_one_preds]}

hybrid_df = pd.DataFrame(data)
### END SOLUTION

### ANSWER CHECK
hybrid_df.head()

|   | Title | user_id | hybrid_rating | svd_rating | slope_one_rating |
|---|---|---|---|---|---|
| 0 | Toy Story (1995) | 1 | 4.482524 | 4.402274 | 4.562774 |
| 1 | Toy Story (1995) | 5 | 3.954242 | 4.032047 | 3.876437 |
| 2 | Toy Story (1995) | 7 | 3.942570 | 4.118478 | 3.766662 |
| 3 | Toy Story (1995) | 15 | 3.425153 | 3.298924 | 3.551382 |
| 4 | Toy Story (1995) | 17 | 4.152070 | 4.191087 | 4.113053 |
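
One way to judge whether blending helps is to compare each model's error against the true ratings (`r_ui`). The sketch below is illustrative only: it hard-codes the five rows printed earlier rather than the full prediction lists, and `rmse` is a small helper written here, not a Surprise function (Surprise provides `surprise.accuracy.rmse` for lists of predictions). Because `test` was built from `train`, these are in-sample errors and will likely look more optimistic than errors on held-out ratings.

```python
from math import sqrt

# True ratings and estimates for the five rows shown in the earlier outputs.
true_ratings = [4.0, 4.0, 4.5, 2.5, 4.5]
svd_est = [4.402273922317781, 4.03204694074919, 4.1184783390134,
           3.298923822257094, 4.191087116406786]
slope_one_est = [4.562773780247601, 3.876436672186221, 3.7666616617854367,
                 3.5513821707689663, 4.113053394469709]
# Equal-weight blend, as in Problem 4.
hybrid_est = [0.5 * s + 0.5 * v for s, v in zip(slope_one_est, svd_est)]


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("sequences differ in length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared_errors) / len(squared_errors))


scores = {
    "svd": rmse(true_ratings, svd_est),
    "slope_one": rmse(true_ratings, slope_one_est),
    "hybrid": rmse(true_ratings, hybrid_est),
}
for name, score in scores.items():
    print(f"{name:>10}: {score:.4f}")
```

An equal-weight blend can never have a higher RMSE than the worse of its two components, but it is not guaranteed to beat the better one. Five rows are far too few to draw conclusions, so the comparison is only meaningful on the full prediction lists.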


### Conclusion

Hybrid recommendations can be taken much further, for example by writing a custom algorithm object with `Surprise` (a subclass of `AlgoBase`). Note that you can also incorporate the similarity of the objects, much as we did in our distance-based recommendations.