Surprise is an open-source Python library that makes it easy for developers to build recommender systems with explicit rating data. 

# Getting started

The `dataset` module defines the `Dataset` class and related subclasses for managing datasets.
Three built-in datasets are available:

- The movielens-100k dataset.
- The movielens-1m dataset.
- The Jester dataset 2.

In [3]:
from surprise import Dataset

| Method      | Description |
| :--- | :--- |
| `Dataset.load_builtin` | Load a built-in dataset |
| `Dataset.load_from_file` | Load a dataset from a (custom) file |
| `Dataset.load_from_folds` | Load a dataset where folds (for cross-validation) are predefined by some files |
| `Dataset.load_from_df` | Load a dataset from a pandas dataframe ([example](https://surprise.readthedocs.io/en/stable/getting_started.html#load-from-df-example)) |

Built-in datasets can all be loaded/downloaded using the `Dataset.load_builtin()` method.

In [4]:
# Load the movielens-100k dataset
data = Dataset.load_builtin('ml-100k')

In [5]:
# Examine the raw ratings: (user id, item id, rating, timestamp) tuples
data.raw_ratings

[('196', '242', 3.0, '881250949'),
 ('186', '302', 3.0, '891717742'),
 ('22', '377', 1.0, '878887116'),
 ('244', '51', 2.0, '880606923'),
 ('166', '346', 1.0, '886397596'),
 ('298', '474', 4.0, '884182806'),
 ('115', '265', 2.0, '881171488'),
 ('253', '465', 5.0, '891628467'),
 ('305', '451', 3.0, '886324817'),
 ('6', '86', 3.0, '883603013'),
 ('62', '257', 2.0, '879372434'),
 ('286', '1014', 5.0, '879781125'),
 ('200', '222', 5.0, '876042340'),
 ('210', '40', 3.0, '891035994'),
 ('224', '29', 3.0, '888104457'),
 ('303', '785', 3.0, '879485318'),
 ('122', '387', 5.0, '879270459'),
 ('194', '274', 2.0, '879539794'),
 ('291', '1042', 4.0, '874834944'),
 ('234', '1184', 2.0, '892079237'),
 ('119', '392', 4.0, '886176814'),
 ('167', '486', 4.0, '892738452'),
 ('299', '144', 4.0, '877881320'),
 ('291', '118', 2.0, '874833878'),
 ('308', '1', 4.0, '887736532'),
 ('95', '546', 2.0, '879196566'),
 ('38', '95', 5.0, '892430094'),
 ('102', '768', 2.0, '883748450'),
 ('63', '277', 4.0, '875747401

In [16]:
# import and instantiate the SVD recommender 
from surprise import SVD
rec = SVD()

In [17]:
# train/test split
from surprise.model_selection import train_test_split
train, test = train_test_split(data, test_size=.25)
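
To make the idea behind the split concrete, here is a minimal plain-Python sketch of a random hold-out split. It is not Surprise's implementation: `holdout_split` is a hypothetical helper written for this note, and it works on a plain list of rating tuples, whereas `train_test_split` in Surprise returns a trainset object plus a list of `(user, item, rating)` tuples.

```python
import random

def holdout_split(ratings, test_size=0.25, seed=0):
    """Shuffle a list of rating tuples and hold out a fraction for testing."""
    shuffled = list(ratings)               # copy so the input is left untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for reproducibility
    n_test = int(round(test_size * len(shuffled)))
    return shuffled[n_test:], shuffled[:n_test]

# The first eight raw ratings from the output earlier in this notebook:
# (user id, item id, rating, timestamp)
sample = [
    ('196', '242', 3.0, '881250949'),
    ('186', '302', 3.0, '891717742'),
    ('22', '377', 1.0, '878887116'),
    ('244', '51', 2.0, '880606923'),
    ('166', '346', 1.0, '886397596'),
    ('298', '474', 4.0, '884182806'),
    ('115', '265', 2.0, '881171488'),
    ('253', '465', 5.0, '891628467'),
]

train_part, test_part = holdout_split(sample)
print(len(train_part), len(test_part))  # 6 ratings to train on, 2 held out
```

With `test_size=.25`, a quarter of the ratings are held out and never seen during fitting, which is what makes the later error measurement meaningful.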

In [18]:
# Train the algorithm on the train set, then predict ratings for the test set
rec.fit(train)

<surprise.prediction_algorithms.matrix_factorization.SVD at 0x21a9f2716d0>

In [19]:
predictions = rec.test(test)

In [20]:
predictions

[Prediction(uid='329', iid='286', r_ui=4.0, est=3.2582809809813496, details={'was_impossible': False}),
 Prediction(uid='911', iid='186', r_ui=5.0, est=3.626139553204203, details={'was_impossible': False}),
 Prediction(uid='870', iid='401', r_ui=3.0, est=2.7637918662831265, details={'was_impossible': False}),
 Prediction(uid='279', iid='80', r_ui=4.0, est=3.1009204779353254, details={'was_impossible': False}),
 Prediction(uid='44', iid='183', r_ui=4.0, est=4.383225459842061, details={'was_impossible': False}),
 Prediction(uid='451', iid='334', r_ui=3.0, est=3.25912369191376, details={'was_impossible': False}),
 Prediction(uid='418', iid='895', r_ui=4.0, est=2.3058452347177947, details={'was_impossible': False}),
 Prediction(uid='291', iid='124', r_ui=5.0, est=4.640278163121002, details={'was_impossible': False}),
 Prediction(uid='517', iid='258', r_ui=5.0, est=3.9196888972859165, details={'was_impossible': False}),
 Prediction(uid='7', iid='523', r_ui=4.0, est=4.105612345174519, detail

In [21]:
# Then compute RMSE
from surprise import accuracy
accuracy.rmse(predictions)

RMSE: 0.9393


0.9393182874757205
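
For readers who want to see what the metric measures, the following is a small plain-Python sketch of root-mean-square error over pairs of true and estimated ratings. The `rmse` function here is a hypothetical helper written for illustration and is separate from `surprise.accuracy.rmse`; the three sample pairs are the `r_ui` and `est` values of the first three predictions listed earlier.

```python
import math

def rmse(pairs):
    """Root-mean-square error over (true_rating, estimated_rating) pairs."""
    if not pairs:
        raise ValueError("need at least one prediction")
    squared_errors = [(r_ui - est) ** 2 for r_ui, est in pairs]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# (r_ui, est) pairs copied from the first three predictions in the output
first_three = [
    (4.0, 3.2582809809813496),
    (5.0, 3.626139553204203),
    (3.0, 2.7637918662831265),
]
print(f"RMSE on three predictions: {rmse(first_three):.4f}")
```

Because errors are squared before averaging, a few large misses raise RMSE more than many small ones. The value on three predictions will differ from the figure for the full test set.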

We can now predict individual ratings by calling the `predict()` method directly.
Say you are interested in user 196 and item 302:

In [23]:
uid = str(196)  # raw user id 
iid = str(302)  # raw item id 

# Get a prediction for a specific user and item (r_ui is the known true rating).
pred = rec.predict(uid, iid, r_ui=4, verbose=True)

user: 196        item: 302        r_ui = 4.00   est = 4.21   {'was_impossible': False}
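
The returned prediction is a named tuple, so its fields can be read by name. The sketch below uses a stand-in tuple, `PredictionLike`, a hypothetical name chosen here, with the same field names as the output above and the rounded values from the printed line, so it runs without a trained model.

```python
from collections import namedtuple

# Stand-in with the same field names as the Prediction objects shown above.
PredictionLike = namedtuple("PredictionLike", ["uid", "iid", "r_ui", "est", "details"])

# Values taken from the verbose output above (est is rounded as printed).
example = PredictionLike(uid="196", iid="302", r_ui=4.0, est=4.21,
                         details={"was_impossible": False})

# Absolute error between the known rating and the estimate.
abs_error = abs(example.r_ui - example.est)
print(f"user {example.uid}, item {example.iid}: |error| = {abs_error:.2f}")
```

With a real model, the same attribute access (`pred.est`, `pred.r_ui`, `pred.details`) should apply to the object returned by `rec.predict(...)`.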
