# Train/test split and accuracy metrics
This notebook shows how to split a dataset into a training set and a test set with Surprise, and how to measure the accuracy of the resulting predictions.

This notebook is based on the code from https://github.com/NicolasHug/Surprise/blob/master/examples/train_test_split.py


## Imports
The function we are interested in is `train_test_split` from the `model_selection` module. <br>
Because training and test data are now kept separate, we can start measuring the accuracy of the model; this is done with Surprise's `accuracy` module:

In [1]:
!pip install scikit-surprise

Collecting scikit-surprise
  Downloading https://files.pythonhosted.org/packages/97/37/5d334adaf5ddd65da99fc65f6507e0e4599d092ba048f4302fe8775619e8/scikit-surprise-1.1.1.tar.gz (11.8MB)
     |████████████████████████████████| 11.8MB 10.1MB/s 
Building wheels for collected packages: scikit-surprise
  Building wheel for scikit-surprise (setup.py) ... done
  Created wheel for scikit-surprise: filename=scikit_surprise-1.1.1-cp36-cp36m-linux_x86_64.whl size=1618270 sha256=4f4737325d074550004c59801089f91dc3ba9a49c0aa7d5a4417776f3731d3e8
  Stored in directory: /root/.cache/pip/wheels/78/9c/3d/41b419c9d2aff5b6e2b4c0fc8d25c538202834058f9ed110d0
Successfully built scikit-surprise
Installing collected packages: scikit-surprise
Successfully installed scikit-surprise-1.1.1


In [2]:

# Python 2 compatibility imports (no-ops on Python 3)
from __future__ import (absolute_import, division, print_function,
                        unicode_literals)

from surprise import KNNBasic
from surprise import Dataset
from surprise import accuracy
from surprise.model_selection import train_test_split


## Load the MovieLens 100K dataset


In [3]:
data = Dataset.load_builtin('ml-100k')

Dataset ml-100k could not be found. Do you want to download it? [Y/n] y
Trying to download dataset from http://files.grouplens.org/datasets/movielens/ml-100k.zip...
Done! Dataset ml-100k has been saved to /root/.surprise_data/ml-100k
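`load_builtin('ml-100k')` downloads MovieLens 100K (100,000 ratings from 943 users on 1,682 movies, on a 1 to 5 scale) and reads its `u.data` file, where each line holds a user id, an item id, a rating and a timestamp separated by tabs. The plain-Python sketch below parses lines in that layout into `(user, item, rating)` triples, just to show what the raw data looks like. The helper `parse_ml100k_lines` and the example lines are hypothetical (not copied from the file), and ids are kept as strings, in the spirit of Surprise's raw ids.

```python
def parse_ml100k_lines(lines):
    """Parse lines in the ml-100k u.data layout (user, item, rating, timestamp; tab-separated)."""
    ratings = []
    for line in lines:
        user, item, rating, _timestamp = line.rstrip("\n").split("\t")
        # Keep ids as strings; only the rating is converted to a number
        ratings.append((user, item, float(rating)))
    return ratings


# Hypothetical lines in the u.data layout (not real records from the file)
sample = ["1\t10\t4\t881250949\n", "2\t10\t3\t891717742\n"]
print(parse_ml100k_lines(sample))  # [('1', '10', 4.0), ('2', '10', 3.0)]
```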


## Split between train and test
To evaluate a model, we need to train it and test it on different data. Surprise provides the `train_test_split` function, which randomly samples a trainset and a testset from the dataset. The `test_size` parameter sets the size of the test set; here it holds 25% of the ratings.

In [4]:
trainset, testset = train_test_split(data, test_size=.25)
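The idea behind a random split can be sketched in a few lines of plain Python: shuffle the ratings and hold out a fraction of them as the test set. This is only an illustration of the concept, not Surprise's implementation; the helper `split_ratings` and the toy ratings are hypothetical, and rounding the test size up is a choice made for this sketch.

```python
import math
import random


def split_ratings(ratings, test_size=0.25, seed=None):
    """Randomly split (user, item, rating) tuples into (train, test) lists."""
    shuffled = list(ratings)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    n_test = math.ceil(test_size * len(shuffled))  # rounding up: a choice of this sketch
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical toy ratings: (user id, item id, rating)
ratings = [("u%d" % (i % 5), "i%d" % i, float(1 + i % 5)) for i in range(20)]
train, test = split_ratings(ratings, test_size=0.25, seed=0)
print(len(train), len(test))  # 15 5
```

Every rating ends up in exactly one of the two lists, which is what lets the test set act as unseen data. Surprise's `train_test_split` also accepts a `random_state` argument when a reproducible split is needed.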


## Train 
As in the previous notebook, we train the model, but this time only on the trainset. <br>

We use the KNNBasic (k-nearest neighbors) algorithm; the choice of algorithm does not matter for now. <br>


In [5]:
algo = KNNBasic()

# Train the algorithm on the trainset
algo.fit(trainset)

Computing the msd similarity matrix...
Done computing similarity matrix.


<surprise.prediction_algorithms.knns.KNNBasic at 0x7fecb31e9470>
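For intuition about the output above (the "msd similarity matrix") and the `actual_k` value in the predictions below, here is a simplified, user-based sketch of the similarity-weighted average that Surprise's documentation gives for KNNBasic, using the MSD similarity `1 / (msd + 1)`. It is a plain-Python illustration on hypothetical toy data, not the library's code: it computes similarities on demand instead of precomputing a matrix in `fit`, and the helper names `msd_sim` and `knn_basic_estimate` are hypothetical.

```python
import heapq


def msd_sim(ratings_u, ratings_v):
    """MSD similarity: 1 / (mean squared difference over co-rated items + 1)."""
    common = ratings_u.keys() & ratings_v.keys()
    if not common:
        return 0.0  # no co-rated items: treat the pair as unrelated
    msd = sum((ratings_u[i] - ratings_v[i]) ** 2 for i in common) / len(common)
    return 1.0 / (msd + 1.0)


def knn_basic_estimate(user_ratings, u, i, k=40):
    """Similarity-weighted mean of item i's ratings by u's k most similar users."""
    # Candidate neighbours: every other user who rated item i
    candidates = [
        (msd_sim(user_ratings[u], ratings_v), ratings_v[i])
        for v, ratings_v in user_ratings.items()
        if v != u and i in ratings_v
    ]
    top_k = heapq.nlargest(k, candidates, key=lambda t: t[0])
    neighbours = [(s, r) for s, r in top_k if s > 0]  # keep positive similarities only
    if not neighbours:
        return None, 0  # no usable neighbour: the prediction is impossible
    sim_total = sum(s for s, _ in neighbours)
    est = sum(s * r for s, r in neighbours) / sim_total
    return est, len(neighbours)  # len(neighbours) plays the role of actual_k


# Hypothetical toy ratings: user -> {item: rating}
user_ratings = {
    "alice": {"A": 5, "B": 3, "C": 4},
    "bob": {"A": 5, "B": 3, "D": 4},
    "carol": {"A": 1, "B": 5, "D": 2},
}
est, actual_k = knn_basic_estimate(user_ratings, "alice", "D")
print(round(est, 4), actual_k)  # 3.8333 2
```

Bob agrees perfectly with Alice on the items they share (similarity 1), while Carol disagrees (similarity 1/11), so the estimate lands close to Bob's rating of 4.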

## Predict
We can predict ratings for the whole testset with the `algo.test` method:


In [6]:
# One prediction per (user, item, rating) entry of the testset
predictions = algo.test(testset)


Let's look at a single prediction (the first one): <br>

Notice that each prediction holds both the true rating (`r_ui`) and the estimated rating (`est`).


In [7]:
print("sample prediction (predictions[0]):")
print(predictions[0])


sample prediction (predictions[0]):
user: 43         item: 153        r_ui = 5.00   est = 3.94   {'actual_k': 40, 'was_impossible': False}
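In Surprise, each prediction is a `Prediction` named tuple with the fields `uid`, `iid`, `r_ui`, `est` and `details`, so its parts can be read by name or unpacked. The sketch below recreates a named tuple with the same field names so that it runs without Surprise installed; the values are copied from the output above, where `est` is shown rounded. With Surprise, `predictions[0]` can be used the same way.

```python
from collections import namedtuple

# Stand-in with the same field names as Surprise's Prediction named tuple
Prediction = namedtuple("Prediction", ["uid", "iid", "r_ui", "est", "details"])

pred = Prediction("43", "153", 5.0, 3.94, {"actual_k": 40, "was_impossible": False})

# Fields can be unpacked like a tuple or read by name
uid, iid, true_r, est, details = pred
error = pred.r_ui - pred.est  # positive: the model under-estimated this rating
print(uid, iid, round(error, 2), details["actual_k"])  # 43 153 1.06 40
```

The per-prediction difference between `r_ui` and `est` is exactly what the error measures in the next section aggregate.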


## Accuracy 
Once we have the predictions, we can compute error measures on the test set; they compare each true rating (`r_ui`) with its estimate (`est`). For more information, see the documentation at https://surprise.readthedocs.io/en/stable/accuracy.html
<br>
In the last chapter of the course (evaluating recommender systems) we define several error measures for recommender systems; here we use the ones described in the link above:

In [8]:
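print("accuracy measures:")
# Each function prints its score (verbose=True by default) and returns it;
# the notebook only echoes the return value of the last line (FCP).
accuracy.rmse(predictions)
accuracy.mse(predictions)
accuracy.mae(predictions)
accuracy.fcp(predictions)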



accuracy measures:
RMSE: 0.9834
MSE: 0.9670
MAE:  0.7765
FCP:  0.7127


0.7127139433716007
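The four measures can be written out in plain Python, which makes explicit how each one uses `r_ui` and `est`. This is a sketch on hypothetical toy predictions, each reduced to a `(uid, iid, r_ui, est)` tuple. FCP follows the documented definition nc / (nc + nd), counting, for each user, pairs of predictions whose true ratings differ; Surprise's own implementation may aggregate over users and break ties differently, so its value need not match this sketch exactly.

```python
import math
from collections import defaultdict
from itertools import combinations


def mse(preds):
    """Mean squared error between true ratings and estimates."""
    return sum((r_ui - est) ** 2 for _, _, r_ui, est in preds) / len(preds)


def rmse(preds):
    """Root mean squared error: the square root of the MSE."""
    return math.sqrt(mse(preds))


def mae(preds):
    """Mean absolute error."""
    return sum(abs(r_ui - est) for _, _, r_ui, est in preds) / len(preds)


def fcp(preds):
    """Fraction of concordant pairs, counted per user and summed over users."""
    by_user = defaultdict(list)
    for uid, _, r_ui, est in preds:
        by_user[uid].append((r_ui, est))
    nc = nd = 0
    for pairs in by_user.values():
        for (r_a, est_a), (r_b, est_b) in combinations(pairs, 2):
            if r_a == r_b:
                continue  # equal true ratings say nothing about ordering
            if r_a < r_b:  # make 'a' the item with the higher true rating
                (r_a, est_a), (r_b, est_b) = (r_b, est_b), (r_a, est_a)
            if est_a > est_b:
                nc += 1  # the estimates rank the pair the same way
            else:
                nd += 1  # reversed (or tied) estimates count as discordant
    if nc + nd == 0:
        raise ValueError("no pair of predictions with different true ratings")
    return nc / (nc + nd)


# Hypothetical toy predictions: (uid, iid, r_ui, est)
preds = [
    ("u1", "a", 5.0, 4.0),
    ("u1", "b", 3.0, 3.5),
    ("u1", "c", 1.0, 2.0),
    ("u2", "a", 4.0, 2.5),
    ("u2", "b", 2.0, 3.0),
]
print(f"RMSE: {rmse(preds):.4f}  MSE: {mse(preds):.4f}  MAE: {mae(preds):.4f}  FCP: {fcp(preds):.4f}")
# RMSE: 1.0488  MSE: 1.1000  MAE: 1.0000  FCP: 0.7500
```

Since RMSE is the square root of MSE, the notebook's two values are consistent: √0.9670 ≈ 0.9834. Unlike the three error measures, FCP only looks at whether the estimates order each user's items correctly, so higher is better.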