

# Recommendations with surprise

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Load-the-movielens-100k-dataset-from-disk" data-toc-modified-id="Load-the-movielens-100k-dataset-from-disk-1">Load the movielens-100k dataset from disk</a></span><ul class="toc-item"><li><span><a href="#Instantiate-the-algorithm" data-toc-modified-id="Instantiate-the-algorithm-1.1">Instantiate the algorithm</a></span></li><li><span><a href="#Extract-the-model-parameters" data-toc-modified-id="Extract-the-model-parameters-1.2">Extract the model parameters</a></span></li><li><span><a href="#Evaluate-the-model:" data-toc-modified-id="Evaluate-the-model:-1.3">Evaluate the model:</a></span></li><li><span><a href="#Put-the-predictions-in-a-dataframe" data-toc-modified-id="Put-the-predictions-in-a-dataframe-1.4">Put the predictions in a dataframe</a></span></li><li><span><a href="#Correlations-between-predicted-and-true-ratings" data-toc-modified-id="Correlations-between-predicted-and-true-ratings-1.5">Correlations between predicted and true ratings</a></span></li></ul></li><li><span><a href="#Cross-validation,-train-test-split-and-grid-search" data-toc-modified-id="Cross-validation,-train-test-split-and-grid-search-2">Cross validation, train-test split and grid search</a></span></li><li><span><a href="#Slope-One" data-toc-modified-id="Slope-One-3">Slope One</a></span></li><li><span><a href="#KNN-with-Means" data-toc-modified-id="KNN-with-Means-4">KNN with Means</a></span></li><li><span><a href="#Precision@k-and-Recall@k" data-toc-modified-id="Precision@k-and-Recall@k-5">Precision@k and Recall@k</a></span></li><li><span><a href="#Top-n-predictions" data-toc-modified-id="Top-n-predictions-6">Top-n predictions</a></span><ul class="toc-item"><li><span><a href="#Coverage" data-toc-modified-id="Coverage-6.1">Coverage</a></span></li></ul></li></ul></div>

In this lab we will make use of the [surprise package](https://surprise.readthedocs.io/en/stable/index.html), a package dedicated to recommendation systems.

`conda install -c conda-forge scikit-surprise`

First we will need some data. Download the MovieLens 100K dataset from [here](https://grouplens.org/datasets/movielens/).
It is a widely used benchmark dataset of movie ratings.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

plt.style.use('ggplot')
sns.set(font_scale=1.5)
%config InlineBackend.figure_format = 'retina'
%matplotlib inline

In [2]:
# load surprise
import surprise as sur

## Load the movielens-100k dataset from disk

Surprise can load this dataset directly in an already prepared form with `sur.Dataset.load_builtin('ml-100k')`. As reading and preparing other files is not that straightforward, we will instead load the file from disk.

In [3]:
df_data = pd.read_csv(
    './ml-100k/u.data', sep='\t', header=None)
df_data.columns = ['user_id', 'item_id', 'rating', 'timestamp']
df_data.head()

Unnamed: 0,user_id,item_id,rating,timestamp
0,196,242,3,881250949
1,186,302,3,891717742
2,22,377,1,878887116
3,244,51,2,880606923
4,166,346,1,886397596


In [4]:
df_data.rating.describe()

count    100000.000000
mean          3.529860
std           1.125674
min           1.000000
25%           3.000000
50%           4.000000
75%           4.000000
max           5.000000
Name: rating, dtype: float64

The `Reader` object specifies the rating scale. When loading from a DataFrame, the user, item and rating columns are identified by their order.

In [5]:
reader = sur.Reader(rating_scale=(1, 5))

In [6]:
# The columns must correspond to user id, item id and ratings (in that order).
data_1 = sur.Dataset.load_from_df(
    df_data[['user_id', 'item_id', 'rating']], reader)

### Instantiate the algorithm

In [7]:
algo = sur.SVD(random_state=1,
               biased=True,  # isolate biases
               reg_all=0.2,  # use regularisation (the same for all)
               n_epochs=20,  # number of epochs for stochastic gradient descent search
               n_factors=100  # number of factors to retain in SVD
               )

# build a training set from the data - there is no train-test split here, the full data set is used for training
trainset_full = data_1.build_full_trainset()
# fit the model
algo.fit(trainset_full)

# we prepare a test set from the training set
trainsetfull_build = trainset_full.build_testset()
# obtain the predictions
predictions_full = algo.test(trainsetfull_build)
# evaluate the predictions - you can call others from within sur.accuracy
print(sur.accuracy.rmse(predictions_full, verbose=False))

0.9167802882204997


In [8]:
predictions_full[:5]

[Prediction(uid=196, iid=242, r_ui=3.0, est=3.911793041493128, details={'was_impossible': False}),
 Prediction(uid=196, iid=393, r_ui=4.0, est=3.4204953983043573, details={'was_impossible': False}),
 Prediction(uid=196, iid=381, r_ui=4.0, est=3.5515672295024134, details={'was_impossible': False}),
 Prediction(uid=196, iid=251, r_ui=3.0, est=4.108294470660626, details={'was_impossible': False}),
 Prediction(uid=196, iid=655, r_ui=5.0, est=3.7920116316066803, details={'was_impossible': False})]

### Extract the model parameters

In [9]:
mu = algo.default_prediction()
bu = algo.bu
bi = algo.bi
pu = algo.pu
qi = algo.qi
puqi = pu.dot(qi.T)

> Note that internally surprise uses its own (inner) ids for users and items, which differ from the ids in the original data.
> The original ones are the raw ids. The trainset methods `to_inner_uid`, `to_inner_iid`, `to_raw_uid` and `to_raw_iid` translate between the two.
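
To make the raw/inner distinction concrete, here is a minimal pure-Python sketch of such a mapping. It assumes that inner ids are consecutive integers handed out in order of first appearance; that is an assumption about the library's internals, and `build_id_maps` is a hypothetical helper, not part of surprise.

```python
def build_id_maps(raw_ids):
    """Hypothetical helper: map raw ids to consecutive inner ids (0, 1, 2, ...)."""
    raw_to_inner = {}
    for raw in raw_ids:
        # assign the next free integer the first time a raw id is seen
        if raw not in raw_to_inner:
            raw_to_inner[raw] = len(raw_to_inner)
    inner_to_raw = {inner: raw for raw, inner in raw_to_inner.items()}
    return raw_to_inner, inner_to_raw


# raw user ids of the first rows of u.data, with one repeat added
raw_users = [196, 186, 22, 244, 166, 196]
raw_to_inner, inner_to_raw = build_id_maps(raw_users)

print(raw_to_inner[196])  # inner id of raw user 196
print(inner_to_raw[2])    # raw id behind inner id 2
```

Model parameters such as `bu` and `bi` are indexed by inner id, which is why the translation step is needed before looking anything up.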

In [10]:
# check that we can reconstruct the predictions using the parameters
i = 10
print(predictions_full[i])
print()
uid = predictions_full[i].uid
iid = predictions_full[i].iid
u_inner = trainset_full.to_inner_uid(uid)
i_inner = trainset_full.to_inner_iid(iid)

pred_calc = mu + bu[u_inner] + bi[i_inner] + puqi[u_inner, i_inner]
print('Results agree:', predictions_full[i].est - pred_calc)

user: 196        item: 580        r_ui = 2.00   est = 3.34   {'was_impossible': False}

Results agree: 0.0
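
The check above relies on the prediction rule of the biased SVD model: global mean plus user bias plus item bias plus the dot product of the user and item factors. A minimal sketch with made-up numbers (none of them come from the fitted model) shows the arithmetic, including the clipping to the rating scale that is applied to estimates by default. `svd_estimate` is a hypothetical helper.

```python
def svd_estimate(mu, b_u, b_i, p_u, q_i, rating_scale=(1, 5)):
    """Biased-SVD estimate: mu + b_u + b_i + <p_u, q_i>, clipped to the scale."""
    raw = mu + b_u + b_i + sum(p * q for p, q in zip(p_u, q_i))
    low, high = rating_scale
    return min(high, max(low, raw))


# illustrative values only, not taken from the fitted model
mu = 3.5              # global mean rating
b_u, b_i = 0.2, -0.4  # user and item biases
p_u = [0.1, -0.3]     # user factors
q_i = [0.5, 0.2]      # item factors

est = svd_estimate(mu, b_u, b_i, p_u, q_i)
print(round(est, 2))
```

Because of the clipping, a reconstruction from the raw parameters only matches the reported estimate when the unclipped value lies inside the rating scale.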


### Evaluate the model:

In [11]:
sur.accuracy.rmse(predictions_full)
sur.accuracy.mae(predictions_full);

RMSE: 0.9168
MAE:  0.7305


### Put the predictions in a dataframe

In [12]:
# true and estimated into a df:
df_pred = pd.DataFrame([(x.r_ui, x.est) for x in predictions_full],
                       columns=['Rating', 'Predicted'])

In [13]:
# reconstruct RMSE
np.sqrt(((df_pred['Rating'] - df_pred['Predicted'])**2).mean())

0.9167802882205027

In [14]:
# reconstruct MAE
(df_pred['Rating'] - df_pred['Predicted']).abs().mean()

0.7305196298517427

### Correlations between predicted and true ratings

In [15]:
df_pred.corr(method='pearson')

Unnamed: 0,Rating,Predicted
Rating,1.0,0.591973
Predicted,0.591973,1.0


In [16]:
df_pred.corr(method='spearman')
# often we only care about the monotonic relationship between true and predicted ratings,
# not about an exact linear relationship between the two

Unnamed: 0,Rating,Predicted
Rating,1.0,0.576486
Predicted,0.576486,1.0


In [17]:
df_pred.corr(method='kendall')
# this is a rank correlation too: it looks at all pairs of observations and checks whether both rankings
# order the pair the same way. It adjusts for ties, which matter here because the true ratings only take
# the values 1, 2, 3, 4 or 5, whereas the predictions are continuous

Unnamed: 0,Rating,Predicted
Rating,1.0,0.451749
Predicted,0.451749,1.0
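
To see why a rank correlation can be the more relevant measure, consider a small sketch in plain Python (toy numbers, helper names are ours): when the predictions increase monotonically but not linearly with the true values, Spearman's coefficient is 1 while Pearson's stays below 1.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # extend the block while the next value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


true = [1, 2, 3, 4, 5]
pred = [1, 4, 9, 16, 25]  # monotonic in `true`, but not linear

print(round(pearson(true, pred), 4), round(spearman(true, pred), 4))
```

For a recommender this matters because the ordering of items usually drives what gets recommended, not the exact predicted value.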


## Cross validation, train-test split and grid search

Example from https://surprise.readthedocs.io/en/stable/FAQ.html?highlight=raw_ratings

In [18]:
import random

raw_ratings = data_1.raw_ratings
random.seed(1)  # random.shuffle uses Python's random module, so seeding NumPy would have no effect
# shuffle ratings if you want
random.shuffle(raw_ratings)

# A = 90% of the data, B = 10% of the data
threshold = int(.9 * len(raw_ratings))
A_raw_ratings = raw_ratings[:threshold]
B_raw_ratings = raw_ratings[threshold:]

print(len(A_raw_ratings))
print(len(B_raw_ratings))

data_1.raw_ratings = A_raw_ratings  # data is now the set A

90000
10000
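
The split above is a shuffle followed by a slice. A minimal sketch on toy data, assuming a reproducible split is wanted: `random.shuffle` draws from Python's own `random` module, so that is the generator to seed.

```python
import random

ratings = list(range(20))  # stand-in for raw_ratings

random.seed(1)             # seeds the generator used by random.shuffle
first = ratings[:]
random.shuffle(first)

random.seed(1)             # same seed, so the same permutation
second = ratings[:]
random.shuffle(second)

threshold = int(0.9 * len(first))
part_a, part_b = first[:threshold], first[threshold:]

print(first == second, len(part_a), len(part_b))
```

The two parts are disjoint and together contain every rating exactly once, which is what keeps set B unseen during training and tuning.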


In [19]:
data_1

<surprise.dataset.DatasetAutoFolds at 0x1a1af59470>

In [20]:
len(data_1.raw_ratings)

90000

In [21]:
algo = sur.SVD(random_state=1)

In [22]:
cv_results = sur.model_selection.cross_validate(
    algo, data_1, measures=['RMSE', 'MAE'], cv=5)
pd.DataFrame(cv_results) # each row = each fold

Unnamed: 0,test_rmse,test_mae,fit_time,test_time
0,0.940265,0.744632,5.777606,0.296309
1,0.935984,0.73663,5.761301,0.157348
2,0.946569,0.747572,5.818404,0.16334
3,0.943829,0.745388,5.750789,0.160263
4,0.935082,0.73773,8.817828,0.178889
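
Each row in the table corresponds to one fold. As a rough sketch of what a 5-fold split does (plain Python with a hypothetical `k_fold_indices` helper; surprise's own `KFold` additionally shuffles the ratings by default), every rating lands in exactly one test fold and in the training part of the other four:

```python
def k_fold_indices(n_samples, n_splits=5):
    """Yield (train_idx, test_idx) pairs for contiguous, near-equal folds."""
    indices = list(range(n_samples))
    start = 0
    for fold in range(n_splits):
        # the first n_samples % n_splits folds get one extra sample
        size = n_samples // n_splits + (1 if fold < n_samples % n_splits else 0)
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


folds = list(k_fold_indices(90000, n_splits=5))
print([(len(train), len(test)) for train, test in folds])
```

With the 90000 ratings of set A, each model is therefore fitted on 72000 ratings and scored on the remaining 18000.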


In [23]:
# All well and good but we'd like to tune for our parameters:
# Select your best algo with grid search.
print('Grid Search...')
param_grid = {'n_epochs': [5, 10], 'lr_all': [0.002, 0.005]}
grid_search = sur.model_selection.GridSearchCV(sur.SVD,
                                               param_grid,
                                               measures=['rmse'],
                                               cv=3,
                                               refit=True)
grid_search.fit(data_1)

algo = grid_search.best_estimator['rmse']

# retrain on the whole set A
trainset = data_1.build_full_trainset()
algo.fit(trainset)

# Compute score on training set
trainset_build = trainset.build_testset()
predictions_train = algo.test(trainset_build)
print('Training score ', end='   ')
sur.accuracy.rmse(predictions_train)

# Compute score on rated test set
testset = data_1.construct_testset(B_raw_ratings)  # testset is now the set B, ie. the 10% you created
predictions_test = algo.test(testset)
print('Test score (rated items) ', end=' ')
sur.accuracy.rmse(predictions_test)

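# Compute score on unrated data
# The anti-test set contains the (user, item) pairs without a rating. As there is no true rating,
# r_ui is filled with the global mean of the training set, so this value measures how far the
# predictions deviate from that mean and is not an accuracy.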
no_ratings = trainset.build_anti_testset()
predictions_no_ratings = algo.test(no_ratings)
print('Test score (unrated items) ', end='   ')
sur.accuracy.rmse(predictions_no_ratings, verbose=False)

Grid Search...
Training score    RMSE: 0.8380
Test score (rated items)  RMSE: 0.9362
Test score (unrated items)    

0.5207896451467428

In [24]:
# note - this value is not an accuracy: for the unrated pairs r_ui is filled with the training set mean,
# so the rmse tells us how far the model's predictions deviate from that global-mean baseline

In [25]:
print(len(trainset_build), len(testset), len(no_ratings))

90000 10000 1481981
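
The size of the anti-testset follows from its definition: it contains every (user, item) pair of the training set for which no rating is known, with the missing rating filled by the global mean of the training ratings. A toy sketch of that construction in plain Python (the helper `anti_testset` is ours; the fill value mirrors the default of `build_anti_testset`):

```python
def anti_testset(ratings):
    """Hypothetical helper: all unrated (user, item) pairs, filled with the global mean."""
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    known = {(u, i) for u, i, _ in ratings}
    global_mean = sum(r for _, _, r in ratings) / len(ratings)
    return [(u, i, global_mean)
            for u in users for i in items
            if (u, i) not in known]


toy = [("u1", "A", 4), ("u1", "B", 2), ("u2", "A", 5), ("u3", "C", 1)]
pairs = anti_testset(toy)

# 3 users x 3 items = 9 possible pairs, 4 of them rated
print(len(pairs), pairs[0])
```

In the run above, 1481981 + 90000 = 1571981 = 943 × 1667, which is consistent with 943 users and 1667 distinct items in set A.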


In [26]:
print(predictions_train[0])
print(predictions_test[0])
print(predictions_no_ratings[0])

user: 449        item: 1073       r_ui = 5.00   est = 3.97   {'was_impossible': False}
user: 932        item: 967        r_ui = 4.00   est = 3.47   {'was_impossible': False}
user: 449        item: 275        r_ui = 3.53   est = 4.30   {'was_impossible': False}


In [27]:
# extract model parameters
mu = algo.default_prediction()
print(f'Training set mean: {mu:.6}')
bu = algo.bu
bi = algo.bi
pu = algo.pu
qi = algo.qi
puqi = pu.dot(qi.T)

Training set mean: 3.52856


In [28]:
bu[0]

0.11292889945173722

In [29]:
# reconstruct predictions
i = 10
print(predictions_train[i])
print()
uid = predictions_train[i].uid
iid = predictions_train[i].iid
u_inner = trainset.to_inner_uid(uid)
i_inner = trainset.to_inner_iid(iid)

pred_calc = mu + bu[u_inner] + bi[i_inner] + puqi[u_inner, i_inner]
print('Results agree:', predictions_train[i].est - pred_calc)

user: 449        item: 546        r_ui = 2.00   est = 3.06   {'was_impossible': False}

Results agree: 0.0


## Slope One

Repeat the same steps with the slope one model.
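
Before running it, a minimal sketch of the Slope One idea may help. The estimate is the user's mean rating plus the average deviation between the target item and each item the user has rated, where a deviation is the mean rating difference over users who rated both items. The helper and the toy ratings below are hypothetical, and the sketch ignores the optimisations of the library implementation.

```python
def slope_one_predict(ratings, user, item):
    """Slope One estimate of `user`'s rating for `item` on a dict of dicts."""
    user_ratings = ratings[user]
    user_mean = sum(user_ratings.values()) / len(user_ratings)

    deviations = []
    for other_item in user_ratings:
        # rating differences from users who rated both `item` and `other_item`
        diffs = [r[item] - r[other_item]
                 for r in ratings.values()
                 if item in r and other_item in r]
        if diffs:
            deviations.append(sum(diffs) / len(diffs))

    if not deviations:
        return user_mean  # no co-rated items: fall back to the user's mean
    return user_mean + sum(deviations) / len(deviations)


toy = {
    "alice": {"A": 5, "B": 3, "C": 2},
    "bob": {"A": 3, "B": 4},
    "carol": {"B": 2, "C": 5},
}
# dev(C, A) = -3 (alice), dev(C, B) = (-1 + 3) / 2 = 1, bob's mean = 3.5
print(slope_one_predict(toy, "bob", "C"))  # 3.5 + (-3 + 1) / 2 = 2.5
```

The model has no hyperparameters to tune, which is why there is no grid search in this section.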

In [30]:
algo = sur.SlopeOne()

In [31]:
cv_results = sur.model_selection.cross_validate(algo, data_1, measures=['RMSE', 'MAE'], cv=5)
pd.DataFrame(cv_results) # each row = each fold

Unnamed: 0,test_rmse,test_mae,fit_time,test_time
0,0.945544,0.741333,1.73795,6.742487
1,0.951233,0.750014,1.303606,4.458552
2,0.952977,0.75071,1.166876,5.111754
3,0.948424,0.745197,1.611681,4.552875
4,0.94227,0.741513,0.972226,6.681476


In [32]:
# retrain on the whole set A
trainset = data_1.build_full_trainset()
algo.fit(trainset)

# Compute score on training set
trainset_build = trainset.build_testset()
predictions_train = algo.test(trainset_build)
print('Training score ', end='   ')
sur.accuracy.rmse(predictions_train)

# Compute score on rated test set
testset = data_1.construct_testset(B_raw_ratings)  # testset is now the set B
predictions_test = algo.test(testset)
print('Test score (rated items) ', end=' ')
sur.accuracy.rmse(predictions_test)

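# Compute score on unrated data
# The anti-test set contains the (user, item) pairs without a rating; r_ui is filled with the
# global mean of the training set, so this value is a deviation from that mean, not an accuracy.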
no_ratings = trainset.build_anti_testset()
predictions_no_ratings = algo.test(no_ratings)
print('Test score (unrated items) ', end='   ')
sur.accuracy.rmse(predictions_no_ratings, verbose=False)

Training score    RMSE: 0.8473
Test score (rated items)  RMSE: 0.9339


KeyboardInterrupt: 

## KNN with Means

Repeat the same steps with the kNN with means model.
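
As a rough sketch of what this model computes in its user-based form with MSD similarity (the similarity named in the log output below): the estimate is the user's own mean plus a similarity-weighted average of how far the neighbours' ratings for the item lie above or below their own means. The helpers and toy ratings are hypothetical, and details of the library implementation such as minimum support are left out.

```python
def msd_similarity(ratings_u, ratings_v):
    """1 / (MSD + 1) over co-rated items; 0 if there are none."""
    common = set(ratings_u) & set(ratings_v)
    if not common:
        return 0.0
    msd = sum((ratings_u[i] - ratings_v[i]) ** 2 for i in common) / len(common)
    return 1.0 / (msd + 1.0)


def knn_with_means_predict(ratings, user, item, k=40):
    """User's mean plus a similarity-weighted average of the
    neighbours' mean-centred ratings for `item`."""
    means = {u: sum(r.values()) / len(r) for u, r in ratings.items()}
    # candidate neighbours: other users who rated the item
    neighbours = [(msd_similarity(ratings[user], r), v)
                  for v, r in ratings.items()
                  if v != user and item in r]
    top_k = sorted(neighbours, reverse=True)[:k]
    num = sum(sim * (ratings[v][item] - means[v]) for sim, v in top_k)
    den = sum(sim for sim, _ in top_k)
    return means[user] if den == 0 else means[user] + num / den


toy = {
    "u1": {"A": 4, "B": 2},
    "u2": {"A": 5, "B": 3, "C": 5},
    "u3": {"A": 1, "B": 5, "C": 2},
}
print(msd_similarity(toy["u1"], toy["u2"]))   # MSD = 1, similarity = 0.5
print(knn_with_means_predict(toy, "u1", "C"))
```

Centring on each user's mean compensates for users who rate systematically high or low, and `k` (tuned in the grid search) limits how many neighbours contribute.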

In [33]:
algo = sur.KNNWithMeans()

In [34]:
print('Grid Search...')
param_grid = {'k': [20, 40, 100]}
grid_search = sur.model_selection.GridSearchCV(sur.KNNWithMeans,
                                               param_grid,
                                               measures=['rmse'],
                                               cv=3,
                                               refit=True)
grid_search.fit(data_1)

algo = grid_search.best_estimator['rmse']

# retrain on the whole set A
trainset = data_1.build_full_trainset()
algo.fit(trainset)

# Compute score on training set
trainset_build = trainset.build_testset()
predictions_train = algo.test(trainset_build)
print('Training score ', end='   ')
sur.accuracy.rmse(predictions_train)

# Compute score on rated test set
testset = data_1.construct_testset(B_raw_ratings)  # testset is now the set B
predictions_test = algo.test(testset)
print('Test score (rated items) ', end=' ')
sur.accuracy.rmse(predictions_test)

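# Compute score on unrated data
# The anti-test set contains the (user, item) pairs without a rating; r_ui is filled with the
# global mean of the training set, so this value is a deviation from that mean, not an accuracy.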
no_ratings = trainset.build_anti_testset()
predictions_no_ratings = algo.test(no_ratings)
print('Test score (unrated items) ', end='   ')
sur.accuracy.rmse(predictions_no_ratings, verbose=False)

Grid Search...
Computing the msd similarity matrix...
Done computing similarity matrix.
Computing the msd similarity matrix...
Done computing similarity matrix.
Computing the msd similarity matrix...
Done computing similarity matrix.
Computing the msd similarity matrix...
Done computing similarity matrix.
Computing the msd similarity matrix...
Done computing similarity matrix.
Computing the msd similarity matrix...
Done computing similarity matrix.
Computing the msd similarity matrix...
Done computing similarity matrix.
Computing the msd similarity matrix...
Done computing similarity matrix.
Computing the msd similarity matrix...
Done computing similarity matrix.
Computing the msd similarity matrix...
Done computing similarity matrix.
Computing the msd similarity matrix...
Done computing similarity matrix.
Training score    RMSE: 0.7668
Test score (rated items)  RMSE: 0.9368
Test score (unrated items)    

0.7821219960658621

## Precision@k and Recall@k

Obtain precision@k and recall@k following the [example](https://surprise.readthedocs.io/en/stable/FAQ.html#how-to-compute-precision-k-and-recall-k).

In [35]:
from collections import defaultdict

def precision_recall_at_k(predictions, k=10, threshold=3.5):
    '''Return precision and recall at k metrics for each user.'''

    # First map the predictions to each user.
    # defaultdict(list) creates an empty list the first time a missing key is accessed,
    # so we can append without checking whether the user is already present
    user_est_true = defaultdict(list)
    for uid, _, true_r, est, _ in predictions:
        user_est_true[uid].append((est, true_r))

    precisions = dict()
    recalls = dict()
    for uid, user_ratings in user_est_true.items():

        # Sort user ratings by estimated value
        user_ratings.sort(key=lambda x: x[0], reverse=True)

        # Number of relevant items
        n_rel = sum((true_r >= threshold) for (_, true_r) in user_ratings)

        # Number of recommended items in top k
        n_rec_k = sum((est >= threshold) for (est, _) in user_ratings[:k])

        
        # Number of relevant and recommended items in top k
        n_rel_and_rec_k = sum(((true_r >= threshold) and (est >= threshold))
                              for (est, true_r) in user_ratings[:k])

        # Precision@K: Proportion of recommended items that are relevant
        precisions[uid] = n_rel_and_rec_k / n_rec_k if n_rec_k != 0 else 1

        # Recall@K: Proportion of relevant items that are recommended
        recalls[uid] = n_rel_and_rec_k / n_rel if n_rel != 0 else 1

        # ie these are in line with the usual definitions of precision and recall, restricted to the top k.
        # Recall can be lower than precision because its denominator is the number of all relevant
        # items of the user, which can exceed k.

    return precisions, recalls
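
A worked example for a single hypothetical user makes the three counts concrete. The numbers are made up, and the logic mirrors the function above with `k=3` and `threshold=3.5`:

```python
k, threshold = 3, 3.5

# (estimated rating, true rating) pairs for one hypothetical user
user_ratings = [(4.8, 5), (3.0, 5), (4.2, 3), (2.5, 2), (3.9, 4), (3.2, 4)]
user_ratings.sort(key=lambda x: x[0], reverse=True)
top_k = user_ratings[:k]

# relevant items overall (true rating at or above the threshold)
n_rel = sum(true_r >= threshold for _, true_r in user_ratings)
# recommended items in the top k (estimate at or above the threshold)
n_rec_k = sum(est >= threshold for est, _ in top_k)
# items in the top k that are both relevant and recommended
n_rel_and_rec_k = sum(true_r >= threshold and est >= threshold
                      for est, true_r in top_k)

precision = n_rel_and_rec_k / n_rec_k if n_rec_k else 1
recall = n_rel_and_rec_k / n_rel if n_rel else 1
print(n_rel, n_rec_k, n_rel_and_rec_k)  # 4 3 2
```

Here precision@3 is 2/3 and recall@3 is 2/4: two of the three recommended items are relevant, but the user has four relevant items in total.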

In [36]:
precision_recall_at_k(predictions_test)

({932: 1.0,
  137: 0.875,
  406: 0.6,
  448: 0.0,
  592: 1.0,
  181: 0.0,
  889: 0.8,
  263: 0.7,
  404: 0.5,
  259: 0.6666666666666666,
  606: 0.8,
  325: 0.8,
  305: 0.8,
  22: 1.0,
  747: 1.0,
  85: 1.0,
  429: 0.8,
  269: 1.0,
  836: 0.0,
  357: 0.8571428571428571,
  234: 0.3,
  675: 0.5,
  557: 0.5,
  637: 1,
  223: 1,
  913: 0.8,
  718: 0.6666666666666666,
  58: 0.7,
  507: 0.9,
  23: 0.7,
  460: 0.6666666666666666,
  756: 1.0,
  709: 1.0,
  338: 0.8,
  363: 0.6666666666666666,
  788: 0.0,
  13: 0.8571428571428571,
  59: 1.0,
  66: 1,
  229: 1,
  111: 1.0,
  707: 0.8,
  711: 0.7,
  250: 1.0,
  296: 1.0,
  694: 1.0,
  154: 0.6,
  864: 1.0,
  54: 1.0,
  624: 0.8571428571428571,
  474: 0.9,
  559: 0.75,
  598: 1.0,
  214: 0.7,
  887: 0.8,
  683: 1.0,
  321: 0.6,
  713: 1.0,
  328: 0.875,
  758: 1.0,
  579: 0.5714285714285714,
  255: 1,
  311: 0.6,
  117: 1.0,
  313: 0.7,
  211: 1.0,
  466: 0.5,
  524: 0.7,
  564: 0.5,
  815: 0.6,
  903: 0.9,
  567: 0.8,
  500: 0.6666666666666666,
  

## Top-n predictions

Obtain the n top-ranked predictions for each user following the [example](https://surprise.readthedocs.io/en/stable/FAQ.html#how-to-get-the-top-n-recommendations-for-each-user).

In [37]:
def get_top_n(predictions, n=10):
    '''Return the top-N recommendations for each user from a set of predictions.

    Args:
        predictions(list of Prediction objects): The list of predictions, as
            returned by the test method of an algorithm.
        n(int): The number of recommendations to output for each user. Default
            is 10.

    Returns:
    A dict where keys are user (raw) ids and values are lists of tuples:
        [(raw item id, rating estimation), ...] of size n.
    '''

    # First map the predictions to each user.
    top_n = defaultdict(list)
    for uid, iid, true_r, est, _ in predictions:
        top_n[uid].append((iid, est))

    # Then sort the predictions for each user and retrieve the n highest ones.
    for uid, user_ratings in top_n.items():
        user_ratings.sort(key=lambda x: x[1], reverse=True)
        top_n[uid] = user_ratings[:n]

    return top_n

In [38]:
# Print the recommended items for each user
for uid, user_ratings in get_top_n(predictions_test).items():
    print(uid, [iid for (iid, _) in user_ratings])

932 [1449, 515, 488, 189, 56, 506, 530, 519, 615, 208]
137 [15, 79, 257, 222, 300, 476, 261, 235, 260]
406 [89, 513, 488, 144, 813, 504, 521, 28, 919, 505]
448 [316, 874, 321]
592 [408, 56, 174, 64, 482, 180, 185, 187, 9, 87]
181 [1341, 285, 129, 919, 242, 19, 1068, 137, 1356, 1198]
889 [511, 98, 1142, 659, 657, 475, 651, 654, 655, 615]
263 [183, 23, 588, 510, 177, 661, 86, 204, 237, 622]
404 [750, 270, 331, 678, 892, 901]
259 [179, 117, 475, 762, 781, 772, 748]
606 [172, 178, 56, 211, 11, 1039, 684, 282, 55, 222]
325 [474, 114, 483, 484, 498, 183, 199, 529, 185, 86]
305 [251, 302, 199, 709, 52, 462, 921, 176, 201, 151]
22 [172, 174, 511, 648, 79, 430, 201, 403, 184, 4]
747 [483, 172, 408, 427, 132, 320, 498, 488, 302, 135]
85 [511, 483, 64, 199, 705, 528, 98, 647, 478, 481]
429 [173, 484, 192, 124, 177, 936, 1, 166, 144, 203]
269 [483, 127, 357, 192, 56, 517, 659, 482, 705, 632]
836 [896]
357 [275, 283, 291, 405, 472, 926, 1047, 760]
234 [119, 136, 170, 9, 647, 589, 211, 166, 1460, 14

209 [50, 242, 906, 321, 271]
122 [513, 661, 1268, 127, 197, 727, 673, 1113]
217 [176, 210, 117, 50, 550, 27, 2, 679, 825, 761]
348 [117, 151, 294, 827]
508 [474, 735, 210, 228]
933 [98, 12, 357, 64, 654, 467, 4, 117, 636, 561]
165 [318, 500]
63 [1137, 50, 14, 79, 508, 328, 1010, 952, 294]
361 [100, 276, 83, 97, 176, 204, 1041, 155]
918 [659, 179, 165, 645, 462, 1099, 485, 419, 995]
326 [515, 172, 96, 657, 196, 9, 443, 608, 211, 705]
488 [318, 515, 173, 87, 134, 429, 434, 357, 208, 200]
398 [64, 199, 654, 127, 185, 79, 87, 633, 153, 504]
26 [316, 127, 475, 257, 304, 328, 288, 148]
162 [55, 237, 11]
175 [172]
330 [357, 205, 293, 82, 1084, 185, 208, 238, 568, 151]
471 [1, 393, 768, 878]
939 [742, 220, 508, 121, 890]
833 [320, 56, 200, 203, 183, 1019, 434, 188, 50, 58]
312 [114, 483, 134, 96, 481, 276, 144, 494, 507, 638]
538 [174, 317, 144, 213, 215, 137, 273]
183 [228, 241, 356, 230, 77, 1217]
509 [181, 310, 690]
742 [1, 284]
288 [515, 197, 50, 69, 216, 214]
528 [178, 168, 56, 410, 751, 

### Coverage

The coverage is the fraction of all available items which appear at least once in any recommended list of items.
Calculate the coverage of the top-ranked recommendations.
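
A toy sketch of this definition with sets, using made-up top-n lists in the same shape as the output of `get_top_n`. Which items count as "available" is a choice; here it is simply a hand-written set.

```python
# hypothetical top-n lists: user id -> [(item id, estimated rating), ...]
top_n = {
    1: [(10, 4.5), (11, 4.0)],
    2: [(10, 4.8), (12, 3.9)],
    3: [(11, 4.1)],
}
all_items = {10, 11, 12, 13}  # every item that could have been recommended

# distinct items that appear in at least one recommendation list
recommended = {iid for user_ratings in top_n.values() for iid, _ in user_ratings}
coverage = len(recommended) / len(all_items)
print(coverage)  # 0.75
```

Item 13 is never recommended, so three of the four available items are covered.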

In [39]:
get_top_n(predictions_test).items()

dict_items([(932, [(1449, 5), (515, 4.686346690959688), (488, 4.515846557272105), (189, 4.479268769071656), (56, 4.45266888694004), (506, 4.3810405627177635), (530, 4.3736920291225285), (519, 4.329199652824656), (615, 4.320590702958526), (208, 4.3169586007199054)]), (137, [(15, 4.721292851871029), (79, 4.700616927156308), (257, 4.476351325136609), (222, 4.457442175770066), (300, 4.442912483755417), (476, 3.9848593154302865), (261, 3.580312888611908), (235, 3.5394705421805295), (260, 3.233213823888147)]), (406, [(89, 4.209697826886972), (513, 4.203858817379177), (488, 4.100068459814152), (144, 4.0971399540690205), (813, 3.9108258221179506), (504, 3.8204708810066172), (521, 3.8044241509534316), (28, 3.804334505217531), (919, 3.7757156471416935), (505, 3.768621200083772)]), (448, [(316, 4.209790385454004), (874, 3.1672630000222926), (321, 2.815610509995624)]), (592, [(408, 4.621234392802586), (56, 4.571575670664018), (174, 4.539087934521044), (64, 4.401524688825126), (482, 4.3532470743774

In [41]:
# collect the ids of all items that appear in any user's top-n list
coverage_list = []

for uid, user_ratings in get_top_n(predictions_test).items():
    for iid, _ in user_ratings:
        coverage_list.append(iid)

In [42]:
# collect the ids of all items that occur in the test set
test_items = []
for uid, iid, true_r, est, _ in predictions_test:
    test_items.append(iid)

In [43]:
# denominator: the distinct items in the test set, used here as the set of available items
coverage = len(set(coverage_list)) / len(set(test_items))
coverage

0.7300564061240935