In [1]:
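# evaluate() and print_perf() belong to the legacy Surprise API (deprecated in
# Surprise 1.0.5 and dropped from later releases, which provide
# surprise.model_selection.cross_validate instead).
from surprise import SVD, BaselineOnly, KNNWithMeans
from surprise import Dataset
from surprise import evaluate, print_perf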

In [2]:
# Load the MovieLens 1M dataset (download it if needed),
# and split it into 3 folds for cross-validation.
data = Dataset.load_builtin('ml-1m')
data.split(n_folds=3)

* Ratings: 1,000,209
* Movies: 3,900
* Users: 6,040

## Item-based CF

$$\hat{r}_{ui} = \mu_i + \frac{ \sum\limits_{j \in N^k_u(i)}
\text{sim}(i, j) \cdot (r_{uj} - \mu_j)} {\sum\limits_{j \in
N^k_u(i)} \text{sim}(i, j)}$$
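
Here $\mu_i$ is the mean rating of item $i$, and $N^k_u(i)$ is the set of at most $k$ items rated by user $u$ that are most similar to $i$. The formula can be sketched in plain Python as follows. This is a minimal illustration, not the library's implementation: the helper names (`cosine_sim`, `item_mean`, `predict_item_knn`) and the toy ratings are hypothetical, similarity is computed over users who rated both items, neighbours with non-positive similarity are skipped, and the `min_k` fallback is reduced to "return the item mean when no neighbour is available".

```python
from math import sqrt


def cosine_sim(ratings, i, j):
    """Cosine similarity between items i and j over users who rated both."""
    common = [u for u in ratings if i in ratings[u] and j in ratings[u]]
    if not common:
        return 0.0
    dot = sum(ratings[u][i] * ratings[u][j] for u in common)
    norm_i = sqrt(sum(ratings[u][i] ** 2 for u in common))
    norm_j = sqrt(sum(ratings[u][j] ** 2 for u in common))
    return dot / (norm_i * norm_j)


def item_mean(ratings, i):
    """Mean rating mu_i of item i over all users who rated it."""
    values = [user_ratings[i] for user_ratings in ratings.values() if i in user_ratings]
    return sum(values) / len(values)


def predict_item_knn(ratings, u, i, k=30):
    """Mean-centred item-based KNN prediction for user u and item i."""
    # Candidate neighbours: items other than i that user u has rated.
    sims = [(cosine_sim(ratings, i, j), j) for j in ratings[u] if j != i]
    # Keep the k most similar items with positive similarity.
    neighbours = sorted((s, j) for s, j in sims if s > 0)[::-1][:k]
    mu_i = item_mean(ratings, i)
    denom = sum(s for s, _ in neighbours)
    if denom == 0:
        return mu_i  # simplified fallback (assumption)
    num = sum(s * (ratings[u][j] - item_mean(ratings, j)) for s, j in neighbours)
    return mu_i + num / denom


# Hypothetical toy data: user -> {item: rating}
toy_ratings = {
    "alice": {"A": 5, "B": 3, "C": 4},
    "bob": {"A": 4, "B": 2, "C": 5},
    "carol": {"B": 4, "C": 3},
}
prediction = predict_item_knn(toy_ratings, "carol", "A", k=2)
print(round(prediction, 4))
```

Centring each neighbour's rating on its own item mean $\mu_j$ keeps generally well-liked items from inflating the prediction.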

In [25]:
sim_options = {
    'name': 'cosine',
    'user_based': False  # compute similarities between items
}

algo = KNNWithMeans(k=30, min_k=2, sim_options=sim_options)
perf = evaluate(algo, data, measures=['RMSE'])
print_perf(perf)

Evaluating RMSE of algorithm KNNWithMeans.

------------
Fold 1
Computing the cosine similarity matrix...
Done computing similarity matrix.
RMSE: 0.8987
------------
Fold 2
Computing the cosine similarity matrix...
Done computing similarity matrix.
RMSE: 0.9044
------------
Fold 3
Computing the cosine similarity matrix...
Done computing similarity matrix.
RMSE: 0.9015
------------
------------
Mean RMSE: 0.9015
------------
------------
        Fold 1  Fold 2  Fold 3  Mean    
RMSE    0.8987  0.9044  0.9015  0.9015  


## Baseline predictor

$\hat{r}_{ui} = b_{ui} = \mu + b_u + b_i$
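
Here $\mu$ is the global mean rating, $b_u$ the bias of user $u$, and $b_i$ the bias of item $i$. The log below reports that the biases are estimated with ALS; one way this can look is sketched here in plain Python. The function names, the toy data, and the regularisation and epoch values are assumptions chosen for illustration, not values taken from this notebook.

```python
from collections import defaultdict


def fit_baseline(ratings, n_epochs=10, reg_u=15.0, reg_i=10.0):
    """Estimate mu, b_u and b_i by alternating least squares.

    ratings: list of (user, item, rating) triples.
    reg_u, reg_i, n_epochs: illustrative values (assumption).
    """
    mu = sum(r for _, _, r in ratings) / len(ratings)
    b_u = defaultdict(float)
    b_i = defaultdict(float)
    for _ in range(n_epochs):
        # Update item biases while user biases are held fixed.
        num, cnt = defaultdict(float), defaultdict(int)
        for u, i, r in ratings:
            num[i] += r - mu - b_u[u]
            cnt[i] += 1
        for i in num:
            b_i[i] = num[i] / (reg_i + cnt[i])
        # Update user biases while item biases are held fixed.
        num, cnt = defaultdict(float), defaultdict(int)
        for u, i, r in ratings:
            num[u] += r - mu - b_i[i]
            cnt[u] += 1
        for u in num:
            b_u[u] = num[u] / (reg_u + cnt[u])
    return mu, dict(b_u), dict(b_i)


def predict_baseline(mu, b_u, b_i, u, i):
    """r_hat = mu + b_u + b_i; unknown users or items contribute a zero bias."""
    return mu + b_u.get(u, 0.0) + b_i.get(i, 0.0)


# Hypothetical toy data
toy = [("alice", "A", 5), ("alice", "B", 3),
       ("bob", "A", 4), ("bob", "B", 2), ("carol", "B", 4)]
mu, b_u, b_i = fit_baseline(toy)
print(predict_baseline(mu, b_u, b_i, "carol", "A"))
```

The regularisation terms shrink the biases of users and items with few ratings towards zero, so their predictions stay close to the global mean.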

In [27]:
algo = BaselineOnly()
perf = evaluate(algo, data, measures=['RMSE'])
print_perf(perf)

Evaluating RMSE of algorithm BaselineOnly.

------------
Fold 1
Estimating biases using als...
RMSE: 0.9065
------------
Fold 2
Estimating biases using als...
RMSE: 0.9127
------------
Fold 3
Estimating biases using als...
RMSE: 0.9101
------------
------------
Mean RMSE: 0.9098
------------
------------
        Fold 1  Fold 2  Fold 3  Mean    
RMSE    0.9065  0.9127  0.9101  0.9098  


## SVD
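
The `SVD` algorithm extends the baseline predictor with latent factors: $\hat{r}_{ui} = \mu + b_u + b_i + q_i^T p_u$, where $p_u$ and $q_i$ are factor vectors learned by stochastic gradient descent. The sketch below illustrates the idea in plain Python. It is a simplified stand-in rather than the library's code: the function names, toy data, and hyperparameter values are assumptions, and `rmse` is included only to show the measure reported in the outputs.

```python
import random
from math import sqrt


def predict_svd(model, u, i):
    """r_hat = mu + b_u + b_i + q_i . p_u"""
    mu, b_u, b_i, p, q = model
    return mu + b_u[u] + b_i[i] + sum(pf * qf for pf, qf in zip(p[u], q[i]))


def train_svd(ratings, n_factors=2, n_epochs=20, lr=0.005, reg=0.02, seed=0):
    """Fit biases and latent factors with SGD (illustrative hyperparameters)."""
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    b_u = {u: 0.0 for u in users}
    b_i = {i: 0.0 for i in items}
    # Small random initial factors.
    p = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}
    q = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in items}
    model = (mu, b_u, b_i, p, q)
    for _ in range(n_epochs):
        for u, i, r in ratings:
            err = r - predict_svd(model, u, i)
            # Gradient steps on the regularised squared error.
            b_u[u] += lr * (err - reg * b_u[u])
            b_i[i] += lr * (err - reg * b_i[i])
            for f in range(n_factors):
                puf, qif = p[u][f], q[i][f]
                p[u][f] += lr * (err * qif - reg * puf)
                q[i][f] += lr * (err * puf - reg * qif)
    return model


def rmse(model, ratings):
    """Root mean squared error of the model on (user, item, rating) triples."""
    squared = [(r - predict_svd(model, u, i)) ** 2 for u, i, r in ratings]
    return sqrt(sum(squared) / len(squared))


# Hypothetical toy data
toy = [("alice", "A", 5), ("alice", "B", 3),
       ("bob", "A", 4), ("bob", "B", 2), ("carol", "B", 4)]
untrained = train_svd(toy, n_epochs=0)
trained = train_svd(toy, n_epochs=200)
print(rmse(untrained, toy), rmse(trained, toy))
```

The toy run only compares training error before and after fitting; the cross-validated RMSE in the cell below is measured on held-out folds.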

In [3]:
algo = SVD(n_factors=250)
perf = evaluate(algo, data, measures=['RMSE'])
print_perf(perf)

Evaluating RMSE of algorithm SVD.

------------
Fold 1
RMSE: 0.8937
------------
Fold 2
RMSE: 0.8928
------------
Fold 3
RMSE: 0.8936
------------
------------
Mean RMSE: 0.8933
------------
------------
        Fold 1  Fold 2  Fold 3  Mean    
RMSE    0.8937  0.8928  0.8936  0.8933  
