# Introduction to [Surprise](http://surpriselib.com)

Before we explore what Surprise has to offer, here's a quick reminder:

Recommender systems have become ubiquitous in modern data science: companies like Google, Netflix, Pandora, and Facebook rely on them to provide targeted content recommendations and a more enjoyable user experience. In this lab, we'll focus on the Surprise package.

[Collaborative Filtering](https://en.wikipedia.org/wiki/Collaborative_filtering) relies on a ***ratings matrix*** of users by items, and infers similarities between users (or between items) from how similarly they were rated.
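
As a rough illustration of the idea, the sketch below computes a user-user cosine similarity from a tiny ratings matrix. The users, items, and ratings are made up for this example, and cosine similarity over co-rated items is only one of several similarity measures a collaborative filter might use.

```python
from math import sqrt

# Hypothetical toy ratings matrix: user -> {item: rating}.
# Items missing from a user's dict are simply unrated.
ratings = {
    "alice": {"item1": 5, "item2": 3, "item3": 4},
    "bob":   {"item1": 4, "item2": 2, "item3": 5, "item4": 4},
    "carol": {"item1": 1, "item2": 5, "item4": 2},
}

def cosine_similarity(u, v):
    """Cosine similarity computed over the items both users have rated."""
    common = set(ratings[u]) & set(ratings[v])
    if not common:
        return 0.0
    dot = sum(ratings[u][i] * ratings[v][i] for i in common)
    norm_u = sqrt(sum(ratings[u][i] ** 2 for i in common))
    norm_v = sqrt(sum(ratings[v][i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

sim_alice_bob = cosine_similarity("alice", "bob")
sim_alice_carol = cosine_similarity("alice", "carol")
print(f"alice~bob: {sim_alice_bob:.3f}, alice~carol: {sim_alice_carol:.3f}")
```

Here alice and bob rate their shared items in a similar pattern, so bob's rating of `item4` would carry more weight than carol's when guessing how alice might rate it.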

[Content-Based Filtering](https://en.wikipedia.org/wiki/Recommender_system#Content-based_filtering) explicitly maps items and/or users into a shared feature space based on explicit user/item characteristics. State-of-the-art recommenders often rely on hybrid approaches, so seek to understand the differences, strengths, and weaknesses of each approach.
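
To make the "shared feature space" concrete, here is a minimal sketch in which items are described by hand-written feature vectors and a user profile is built as a rating-weighted average of the items that user has rated. The feature names, items, and ratings are all hypothetical, and real systems typically use richer features and scoring functions.

```python
# Hypothetical item features: each item is a vector in a shared feature space
# (here the three dimensions stand for [comedy, action, romance]).
item_features = {
    "item1": [1.0, 0.0, 0.0],
    "item2": [0.0, 1.0, 0.0],
    "item3": [1.0, 0.0, 1.0],
    "item4": [0.0, 1.0, 1.0],
}

# Ratings one hypothetical user has already given.
user_ratings = {"item1": 5, "item2": 1}

def build_profile(rated, features):
    """Rating-weighted average of the feature vectors of the rated items."""
    n_features = len(next(iter(features.values())))
    total = sum(rated.values())
    profile = [0.0] * n_features
    for item, rating in rated.items():
        for k, value in enumerate(features[item]):
            profile[k] += rating * value / total
    return profile

def score(profile, feature_vector):
    """Dot product between the user profile and an item's features."""
    return sum(p * f for p, f in zip(profile, feature_vector))

profile = build_profile(user_ratings, item_features)
unseen = [item for item in item_features if item not in user_ratings]
ranked = sorted(unseen, key=lambda item: score(profile, item_features[item]), reverse=True)
print(ranked)
```

Note that no other users' ratings are needed here, which is the key contrast with collaborative filtering.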

In [None]:
# Install via conda:

# !conda install scikit-surprise -y

# alternatively try

# !conda install -c conda-forge scikit-surprise 

In [None]:
import pandas as pd
from surprise import SVD
from surprise import Dataset
from surprise.model_selection import cross_validate, train_test_split
from surprise import accuracy

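We'll be looking at a [jokes dataset called Jester](http://eigentaste.berkeley.edu/dataset/). Fortunately, it is built into Surprise, which downloads it for you the first time you load it. Because Jester takes much longer to train on, the cell below loads the smaller built-in MovieLens 100k dataset (`ml-100k`) by default; uncomment the Jester line to switch.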

In [None]:
# Load a built-in dataset (Surprise offers to download it on first use)
# data = Dataset.load_builtin('jester')  # Jester jokes: slower to train
data = Dataset.load_builtin('ml-100k')   # MovieLens 100k movie ratings

> Look for the prompt above asking to download the dataset. It is saved to a hidden folder (`~/.surprise_data` by default). Remember to delete it if you need the storage space!

In [None]:
# We'll use the famous SVD algorithm.
algo = SVD(verbose=True)

# you can also build KNNBasic and other types of models

# Run 5-fold cross-validation and print results
cross_validate(algo, data, measures=['RMSE', 'MAE'], cv=5, n_jobs=-1, verbose=True)

# ml-100k dataset: this takes about 30 seconds
# jester dataset: this takes about 10 minutes
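
The `cv=5` argument above requests 5-fold cross-validation: the ratings are divided into five folds, and each fold takes one turn as the test set while the other four are used for training. The sketch below shows that splitting logic on plain indices. The helper `k_fold_indices` is a hypothetical name written for this illustration and is not how Surprise implements it internally.

```python
import random

def k_fold_indices(n_ratings, k, seed=0):
    """Yield (train_idx, test_idx) pairs; each index lands in exactly one test fold."""
    indices = list(range(n_ratings))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of (nearly) equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx

splits = list(k_fold_indices(n_ratings=10, k=5))
for fold_number, (train_idx, test_idx) in enumerate(splits, start=1):
    print(f"fold {fold_number}: train on {len(train_idx)} ratings, test on {len(test_idx)}")
```

Averaging the error over all five folds gives a steadier estimate than a single train/test split, at the cost of training the model five times.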

In [None]:
# let's do train-test-split, where test set is 25% of the ratings
trainset, testset = train_test_split(data, test_size=.25)

# Train the algorithm on the trainset, and predict ratings for the testset
algo.fit(trainset)
predictions = algo.test(testset)

# you can also use this one-liner: `predictions = algo.fit(trainset).test(testset)`

In [None]:
# compute RMSE
accuracy.rmse(predictions)
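
For intuition about what the RMSE and MAE measures report, the sketch below computes both by hand. The `pairs` list is made-up data standing in for the true rating and estimated rating of each prediction (the `r_ui` and `est` fields of a Surprise prediction); it is not output from the model above.

```python
from math import sqrt

# Hypothetical (true rating, estimated rating) pairs.
pairs = [(4.0, 3.5), (2.0, 3.0), (5.0, 4.5), (3.0, 3.0)]

def rmse(pairs):
    """Root mean squared error: penalizes large misses more heavily."""
    return sqrt(sum((true - est) ** 2 for true, est in pairs) / len(pairs))

def mae(pairs):
    """Mean absolute error: the average size of a miss, in rating units."""
    return sum(abs(true - est) for true, est in pairs) / len(pairs)

rmse_value = rmse(pairs)
mae_value = mae(pairs)
print(f"RMSE: {rmse_value:.4f}  MAE: {mae_value:.4f}")
```

Both are expressed in rating units, so an RMSE of 1.0 on a 1-to-5 scale means predictions are typically off by about one star.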

In [None]:
# get a prediction for a specific user and item.
# Raw ids are read from the ratings file as strings, so pass strings here.
uid = str(3)
iid = str(15)

pred = algo.predict(uid, iid, verbose=True)

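The `est` value in the output is the model's predicted rating for user 3 on item 15 (a movie while `ml-100k` is loaded, or a joke if you switched to Jester)!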
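
To see roughly where such an estimate comes from, the sketch below spells out the biased matrix-factorization rule that SVD-style models use: the global mean rating, plus a user bias, plus an item bias, plus the dot product of the user and item factor vectors, clipped to the rating scale. The function name `svd_estimate` and every number here are hypothetical; a fitted model would learn these values from the training set.

```python
def svd_estimate(global_mean, user_bias, item_bias, user_factors, item_factors,
                 rating_scale=(1, 5)):
    """Combine the learned pieces into one rating, clipped to the rating scale."""
    interaction = sum(p * q for p, q in zip(user_factors, item_factors))
    raw = global_mean + user_bias + item_bias + interaction
    low, high = rating_scale
    return min(high, max(low, raw))

# Hypothetical learned values for one user/item pair.
est = svd_estimate(
    global_mean=3.5,
    user_bias=0.2,             # this user rates slightly above average
    item_bias=-0.1,            # this item is rated slightly below average
    user_factors=[0.3, -0.5],
    item_factors=[0.4, 0.2],
)
print(f"estimated rating: {est:.2f}")
```

When the user or item was never seen in training, the corresponding terms have nothing to contribute, which is why an unknown id tends to produce an estimate near the global mean.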