
# Fit and predict

This notebook is based on the code from: https://github.com/NicolasHug/Surprise/blob/master/examples/predict_ratings.py


Once we know how to load a dataset, the next step is to train a model and use it to get recommendations. <br>
This notebook shows how to train on a full dataset (when no testset is built or specified) and how to use the predict() method.

We start by importing the Dataset class and KNNBasic, a basic nearest-neighbors algorithm: <br>
https://surprise.readthedocs.io/en/stable/knn_inspired.html

In [1]:
!pip install scikit-surprise

Collecting scikit-surprise
  Downloading https://files.pythonhosted.org/packages/97/37/5d334adaf5ddd65da99fc65f6507e0e4599d092ba048f4302fe8775619e8/scikit-surprise-1.1.1.tar.gz (11.8MB)
     |████████████████████████████████| 11.8MB 278kB/s
Building wheels for collected packages: scikit-surprise
  Building wheel for scikit-surprise (setup.py) ... done
  Created wheel for scikit-surprise: filename=scikit_surprise-1.1.1-cp36-cp36m-linux_x86_64.whl size=1618292 sha256=15a30e6dbe709df650cb07738ade13b13ad9fbf93a57e1839a30f60986acbe20
  Stored in directory: /root/.cache/pip/wheels/78/9c/3d/41b419c9d2aff5b6e2b4c0fc8d25c538202834058f9ed110d0
Successfully built scikit-surprise
Installing collected packages: scikit-surprise
Successfully installed scikit-surprise-1.1.1


In [2]:
from surprise import Dataset
from surprise import KNNBasic

## Loading the data
Load the MovieLens 100k dataset, as we learned in the previous notebook.


In [3]:
data = Dataset.load_builtin('ml-100k')

Dataset ml-100k could not be found. Do you want to download it? [Y/n] y
Trying to download dataset from http://files.grouplens.org/datasets/movielens/ml-100k.zip...
Done! Dataset ml-100k has been saved to /root/.surprise_data/ml-100k


With Surprise, we can build a trainset from the full dataset with the build_full_trainset() method:<br>
https://surprise.readthedocs.io/en/stable/dataset.html#surprise.dataset.DatasetAutoFolds.build_full_trainset


In [4]:
trainset = data.build_full_trainset()


## Training
Build an algorithm and train it.<br>
At this point we are interested in how the library works, not in the algorithms themselves. More advanced notebooks compare different algorithms.

In [5]:
algo = KNNBasic()
algo.fit(trainset)


Computing the msd similarity matrix...
Done computing similarity matrix.


<surprise.prediction_algorithms.knns.KNNBasic at 0x7f61096a94a8>
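
The log line above mentions the "msd" similarity. As a rough illustration only, the sketch below computes a mean squared difference similarity between two users in plain Python: the average squared gap over the items both users rated, turned into a similarity with 1 / (msd + 1). The function name, the user dictionaries, and the rating values are hypothetical, and the library's own implementation may differ in details such as minimum support.

```python
def msd_similarity(ratings_u, ratings_v):
    """Similarity between two users given their {item_id: rating} dicts."""
    common = set(ratings_u) & set(ratings_v)
    if not common:
        return 0.0  # assumption: no co-rated items means zero similarity
    # Mean squared difference over the co-rated items.
    msd = sum((ratings_u[i] - ratings_v[i]) ** 2 for i in common) / len(common)
    # Small differences give a similarity close to 1.
    return 1.0 / (msd + 1.0)


# Hypothetical ratings, keyed by raw item id.
alice = {"302": 4, "242": 3, "51": 5}
bob = {"302": 5, "242": 3, "346": 1}

sim = msd_similarity(alice, bob)
print(round(sim, 4))
```

Two users with identical ratings on their common items get a similarity of 1, and the value shrinks as their ratings drift apart.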

## Predict
Once the algorithm is trained, we can query it for specific predictions. <br>


To do so, we need to define the user_id and the item_id. <br>
We choose user 196 and item 302. This pair is in the trainset, so we know the true rating (4):

In [6]:
user_id = str(196)  # raw user id (as in the ratings file). They are **strings**!
item_id = str(302)  # raw item id (as in the ratings file). They are **strings**!
# the true rating is useful to check how accurate the system's prediction is:
true_ranking = 4
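
Why the str() calls matter: ids read from a ratings file are text, so a lookup with an integer will most likely not match any known user. The sketch below mimics this with a plain dictionary that maps raw ids to consecutive inner indices. The sample lines and the raw_to_inner name are hypothetical and only stand in for what the library does internally.

```python
# Hypothetical miniature of a tab-separated ratings file: user, item, rating.
lines = ["196\t242\t3", "196\t302\t4", "186\t302\t3"]

# Map each raw user id (a string, as read from the file) to an inner index.
raw_to_inner = {}
for line in lines:
    uid, iid, rating = line.split("\t")
    raw_to_inner.setdefault(uid, len(raw_to_inner))

print("196" in raw_to_inner)  # True: the string id is known
print(196 in raw_to_inner)    # False: the integer 196 is a different key
```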


Get a prediction for a specific user and item by calling the predict() method directly. <br>
We pass the user_id and item_id to make the prediction, and the true_ranking (the true rating, r_ui) to compare it with the model's estimate.<br>
The verbose flag prints the result to the console.<br>

In [7]:

pred = algo.predict(user_id, item_id, r_ui=true_ranking, verbose=True)


user: 196        item: 302        r_ui = 4.00   est = 4.06   {'actual_k': 40, 'was_impossible': False}
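
The printed line can also be read field by field. The sketch below uses a namedtuple stand-in with the field names documented for the Prediction object (uid, iid, r_ui, est, details), so it runs without the library. The values are copied from the output above, where est is shown rounded to two decimals, and the abs_error name is ours.

```python
from collections import namedtuple

# Stand-in that mirrors the documented field names; not the library class itself.
Prediction = namedtuple("Prediction", ["uid", "iid", "r_ui", "est", "details"])

pred = Prediction(
    uid="196",
    iid="302",
    r_ui=4.0,
    est=4.06,  # rounded value taken from the printed output
    details={"actual_k": 40, "was_impossible": False},
)

# Absolute error between the true rating and the estimate.
abs_error = abs(pred.r_ui - pred.est)
print(f"user {pred.uid}, item {pred.iid}: error = {abs_error:.2f}")
print("neighbors used:", pred.details["actual_k"])
```

With the real object returned by algo.predict(), the same attribute access (pred.est, pred.details, and so on) should apply.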


## Analyze the result

The call to algo.predict() printed output in the following format: <br>
user: 196        item: 302        r_ui = 4.00   est = 4.06   {'actual_k': 40, 'was_impossible': False}

We can interpret the output by reading the documentation of the predictions module: <br>
https://surprise.readthedocs.io/en/stable/predictions_module.html#surprise.prediction_algorithms.predictions.Prediction

The output contains the following fields:
* Values we input to the system:
    * user (uid) – The (raw) user id.
    * item (iid) – The (raw) item id.
    * True rating (r_ui) (float) – The true rating r_ui.
* Algorithm prediction:
    * Estimated rating (est) (float) – The estimated rating (the prediction of the trained algorithm).
* Details (dict) – Stores additional details about the prediction that might be useful for later analysis.
    * For KNNBasic we get the following ones:
        * actual_k: The number of neighbors actually used for this prediction. It can be lower than the configured number of neighbors when fewer suitable ones are available.
        * was_impossible: Whether the prediction was impossible. Internally the algorithm raises a PredictionImpossible exception; when that happens, the estimate is set to the global mean of all ratings 𝜇 and this flag is True.
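
To make actual_k and was_impossible more concrete, here is a simplified plain-Python sketch of a neighbor-based estimate: a similarity-weighted average over the k most similar neighbors, with a fallback to the global mean when no neighbor can be used. The knn_predict function, its arguments, and the numbers are hypothetical; the real algorithm has more options, and its details dictionary may contain more entries.

```python
def knn_predict(neighbors, k, global_mean):
    """neighbors: (similarity, rating) pairs from users who rated the item."""
    # Keep the k most similar neighbors.
    top = sorted(neighbors, key=lambda pair: pair[0], reverse=True)[:k]
    # Only neighbors with positive similarity contribute to the average.
    used = [(sim, rating) for sim, rating in top if sim > 0]
    if not used:
        # Nothing to average: fall back to the global mean and flag it.
        return global_mean, {"was_impossible": True}
    est = sum(sim * rating for sim, rating in used) / sum(sim for sim, _ in used)
    return est, {"actual_k": len(used), "was_impossible": False}


# Three candidate neighbors, one of them with zero similarity.
est, details = knn_predict([(0.5, 4), (0.25, 2), (0.0, 5)], k=3, global_mean=3.5)
print(round(est, 2), details)

# No neighbors at all: the estimate is the global mean.
fallback_est, fallback_details = knn_predict([], k=3, global_mean=3.5)
print(fallback_est, fallback_details)
```

In the first call k is 3 but only two neighbors have a positive similarity, so actual_k is 2. In the second call there is nothing to average, so the estimate falls back to the global mean.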