# Recommender Systems 2018/19

### Practice session on BPR-MF


## Recap on BPR
S.Rendle et al. BPR: Bayesian Personalized Ranking from Implicit Feedback. UAI2009

The usual approach for item recommenders is to predict a personalized score $\hat{x}_{ui}$ for an item that reflects the preference of the user for the item. Then the items are ranked by sorting them according to that score.

Machine learning approaches are tipically fit by using observed items as a positive sample and missing ones for the negative class. A perfect model would thus be useless, as it would classify as negative (non-interesting) all the items that were non-observed at training time. The only reason why such methods work is regularization.

BPR use a different approach. The training dataset is composed by triplets $(u,i,j)$ representing that user u is assumed to prefer i over j. For an implicit dataset this means that u observed i but not j:
$$D_S := \{(u,i,j) \mid i \in I_u^+ \wedge j \in I \setminus I_u^+\}$$

### BPR-OPT
A machine learning model can be represented by a parameter vector $\Theta$ which is found at fitting time. BPR wants to find the parameter vector that is most probable given the desired, but latent, preference structure $>_u$:
$$p(\Theta \mid >_u) \propto p(>_u \mid \Theta)p(\Theta) $$
$$\prod_{u\in U} p(>_u \mid \Theta) = \dots = \prod_{(u,i,j) \in D_S} p(i >_u j \mid \Theta) $$

The probability that a user really prefers item $i$ to item $j$ is defined as:
$$ p(i >_u j \mid \Theta) := \sigma(\hat{x}_{uij}(\Theta)) $$
Where $\sigma$ represent the logistic sigmoid and $\hat{x}_{uij}(\Theta)$ is an arbitrary real-valued function of $\Theta$ (the output of your arbitrary model).


To complete the Bayesian setting, we define a prior density for the parameters:
$$p(\Theta) \sim N(0, \Sigma_\Theta)$$
And we can now formulate the maximum posterior estimator:
$$BPR-OPT := \log p(\Theta \mid >_u) $$
$$ = \log p(>_u \mid \Theta) p(\Theta) $$
$$ = \log \prod_{(u,i,j) \in D_S} \sigma(\hat{x}_{uij})p(\Theta) $$
$$ = \sum_{(u,i,j) \in D_S} \log \sigma(\hat{x}_{uij}) + \log p(\Theta) $$
$$ = \sum_{(u,i,j) \in D_S} \log \sigma(\hat{x}_{uij}) - \lambda_\Theta ||\Theta||^2 $$

Where $\lambda_\Theta$ are model specific regularization parameters.

### BPR learning algorithm
Once obtained the log-likelihood, we need to maximize it in order to find our obtimal $\Theta$. As the crierion is differentiable, gradient descent algorithms are an obvious choiche for maximization.

Gradient descent comes in many fashions, you can find an overview on my master thesis https://www.politesi.polimi.it/bitstream/10589/133864/3/tesi.pdf on pages 18-19-20 (I'm linking my thesis just because I'm sure of what it's written there, many posts you can find online contain some error). A nice post about momentum is available here https://distill.pub/2017/momentum/

The basic version of gradient descent consists in evaluating the gradient using all the available samples and then perform a single update. The problem with this is, in our case, that our training dataset is very skewed. Suppose an item i is very popular. Then we habe many terms of the form $\hat{x}_{uij}$ in the loss because for many users u the item i is compared against all negative items j.

The other popular approach is stochastic gradient descent, where for each training sample an update is performed. This is a better approach, but the order in which the samples are traversed is crucial. To solve this issue BPR uses a stochastic gradient descent algorithm that choses the triples randomly.

The gradient of BPR-OPT with respect to the model parameters is: 
$$\frac{\partial BPR-OPT}{\partial \Theta} = \sum_{(u,i,j) \in D_S} \frac{\partial}{\partial \Theta} \log \sigma (\hat{x}_{uij}) - \lambda_\Theta \frac{\partial}{\partial\Theta} || \Theta ||^2$$
$$ =  \sum_{(u,i,j) \in D_S} \frac{-e^{-\hat{x}_{uij}}}{1+e^{-\hat{x}_{uij}}} \frac{\partial}{\partial \Theta}\hat{x}_{uij} - \lambda_\Theta \Theta $$

### BPR-MF

In order to practically apply this learning schema to an existing algorithm, we first split the real valued preference term: $\hat{x}_{uij} := \hat{x}_{ui} − \hat{x}_{uj}$. And now we can apply any standard collaborative filtering model that predicts $\hat{x}_{ui}$.

The problem of predicting $\hat{x}_{ui}$ can be seen as the task of estimating a matrix $X:U×I$. With matrix factorization teh target matrix $X$ is approximated by the matrix product of two low-rank matrices $W:|U|\times k$ and $H:|I|\times k$:
$$X := WH^t$$
The prediction formula can also be written as:
$$\hat{x}_{ui} = \langle w_u,h_i \rangle = \sum_{f=1}^k w_{uf} \cdot h_{if}$$
Besides the dot product ⟨⋅,⋅⟩, in general any kernel can be used.

We can now specify the derivatives:
$$ \frac{\partial}{\partial \theta} \hat{x}_{uij} = \begin{cases}
(h_{if} - h_{jf}) \text{ if } \theta=w_{uf}, \\
w_{uf} \text{ if } \theta = h_{if}, \\
-w_{uf} \text{ if } \theta = h_{jf}, \\
0 \text{ else }
\end{cases} $$

Which basically means: user $u$ prefer $i$ over $j$, let's do the following:
- Increase the relevance (according to $u$) of features belonging to $i$ but not to $j$ and vice-versa
- Increase the relevance of features assigned to $i$
- Decrease the relevance of features assigned to $j$

We're now ready to look at some code!

In [1]:
from urllib.request import urlretrieve
import zipfile, os

# If file exists, skip the download
data_file_path = "data/Movielens_10M/"
data_file_name = data_file_path + "movielens_10m.zip"

# If directory does not exist, create
if not os.path.exists(data_file_path):
    os.makedirs(data_file_path)

if not os.path.exists(data_file_name):
    urlretrieve ("http://files.grouplens.org/datasets/movielens/ml-10m.zip", data_file_name)
    
dataFile = zipfile.ZipFile(data_file_name)
URM_path = dataFile.extract("ml-10M100K/ratings.dat", path="data/Movielens_10M")
URM_file = open(URM_path, 'r')


def rowSplit (rowString):
    
    split = rowString.split("::")
    split[3] = split[3].replace("\n","")
    
    split[0] = int(split[0])
    split[1] = int(split[1])
    split[2] = float(split[2])
    split[3] = int(split[3])
    
    result = tuple(split)
    
    return result


URM_file.seek(0)
URM_tuples = []

for line in URM_file:
   URM_tuples.append(rowSplit (line))

userList, itemList, ratingList, timestampList = zip(*URM_tuples)

userList = list(userList)
itemList = list(itemList)
ratingList = list(ratingList)
timestampList = list(timestampList)

import scipy.sparse as sps

URM_all = sps.coo_matrix((ratingList, (userList, itemList)))
URM_all = URM_all.tocsr()



from Notebooks_utils.data_splitter import train_test_holdout


URM_train, URM_test = train_test_holdout(URM_all, train_perc = 0.8)

### MF Computing prediction

### In a MF model you have two matrices, one with a row per user and the other with a column per item. The other dimension, columns for the first one and rows for the second one is called latent factors

In [2]:
num_factors = 10

n_users, n_items = URM_train.shape

In [3]:
import numpy as np

user_factors = np.random.random((n_users, num_factors))

item_factors = np.random.random((n_items, num_factors))

### To compute the prediction we have to muliply the user factors to the item factors

In [4]:
item_index = 15
user_index = 42

prediction = np.dot(user_factors[user_index,:], item_factors[item_index,:])

print("Prediction is {:.2f}".format(prediction))

Prediction is 1.74


# Train a MF MSE model

### Use SGD as we saw for SLIM

In [5]:
test_data = 5
learning_rate = 1e-2
regularization = 1e-3

gradient = test_data - prediction

print("Prediction error is {:.2f}".format(gradient))

Prediction error is 3.26


In [6]:
# Copy original value to avoid messing up the updates
H_i = item_factors[item_index,:]
W_u = user_factors[user_index,:]

user_factors[user_index,:] += learning_rate * (gradient * H_i - regularization * W_u)
item_factors[item_index,:] += learning_rate * (gradient * W_u - regularization * H_i)


In [7]:
prediction = np.dot(user_factors[user_index,:], item_factors[item_index,:])

print("Prediction after the update is {:.2f}".format(prediction))
print("Prediction error is {:.2f}".format(test_data - prediction))

Prediction after the update is 1.93
Prediction error is 3.07


### WARNING: Initialization must be done with random non-zero values ... otherwise

In [8]:
user_factors = np.zeros((n_users, num_factors))

item_factors = np.zeros((n_items, num_factors))

In [9]:
prediction = np.dot(user_factors[user_index,:], item_factors[item_index,:])

print("Prediction is {:.2f}".format(prediction))

gradient = test_data - prediction

print("Prediction error is {:.2f}".format(gradient))

Prediction is 0.00
Prediction error is 5.00


In [10]:
H_i = item_factors[item_index,:]
W_u = user_factors[user_index,:]

user_factors[user_index,:] += learning_rate * (gradient * H_i - regularization * W_u)
item_factors[item_index,:] += learning_rate * (gradient * W_u - regularization * H_i)


In [11]:
prediction = np.dot(user_factors[user_index,:], item_factors[item_index,:])

print("Prediction after the update is {:.2f}".format(prediction))
print("Prediction error is {:.2f}".format(test_data - prediction))

Prediction after the update is 0.00
Prediction error is 5.00


### Since the updates multiply the gradient and the latent factors, if those are zero the SGD will never be able to move from that point

# Train a MF BPR model

## The basics are the same, except for how we compute the gradient, we have to sample a triplet

In [12]:
URM_mask = URM_train.copy()
URM_mask.data[URM_mask.data <= 3] = 0

URM_mask.eliminate_zeros()

# Extract users having at least one interaction to choose from
eligibleUsers = []

for user_id in range(n_users):

    start_pos = URM_mask.indptr[user_id]
    end_pos = URM_mask.indptr[user_id+1]

    if len(URM_mask.indices[start_pos:end_pos]) > 0:
        eligibleUsers.append(user_id)
                
                

def sampleTriplet():
    
    # By randomly selecting a user in this way we could end up 
    # with a user with no interactions
    #user_id = np.random.randint(0, n_users)
    
    user_id = np.random.choice(eligibleUsers)
    
    # Get user seen items and choose one
    userSeenItems = URM_mask[user_id,:].indices
    pos_item_id = np.random.choice(userSeenItems)

    negItemSelected = False

    # It's faster to just try again then to build a mapping of the non-seen items
    while (not negItemSelected):
        neg_item_id = np.random.randint(0, n_items)

        if (neg_item_id not in userSeenItems):
            
            negItemSelected = True

    return user_id, pos_item_id, neg_item_id


In [13]:
for _ in range(10):
    print(sampleTriplet())

(70735, 1748, 21851)
(29107, 5995, 13801)
(947, 2406, 10918)
(19871, 457, 54234)
(45728, 1318, 52817)
(36838, 342, 33240)
(5071, 165, 5468)
(5779, 2280, 64413)
(17119, 11, 43205)
(48172, 2571, 10322)


In [14]:
user_factors = np.random.random((n_users, num_factors))
item_factors = np.random.random((n_items, num_factors))

In [15]:
user_id, positive_item, negative_item = sampleTriplet()

print(user_id, positive_item, negative_item)

21831 508 51822


In [16]:
x_uij = np.dot(user_factors[user_id, :], (item_factors[positive_item,:] - item_factors[negative_item,:]))

x_uij

1.7264682977647943

In [17]:
sigmoid_item = 1 / (1 + np.exp(x_uij))

sigmoid_item

0.15103988084996914

### When using BPR we have to update three components, the user factors and the item factors of both the positive and negative item

In [18]:

H_i = item_factors[positive_item,:]
H_j = item_factors[negative_item,:]
W_u = user_factors[user_id,:]


user_factors[user_index,:] += learning_rate * (sigmoid_item * ( H_i - H_j ) - regularization * W_u)
item_factors[positive_item,:] += learning_rate * (sigmoid_item * ( W_u ) - regularization * H_i)
item_factors[negative_item,:] += learning_rate * (sigmoid_item * (-W_u ) - regularization * H_j)


In [19]:
x_uij = np.dot(user_factors[user_id, :], (item_factors[positive_item,:] - item_factors[negative_item,:]))

x_uij

1.738394989175958

In [20]:
## How to rank items with MF ?

## Compute the prediction for all items and rank them

item_scores = np.dot(user_factors[user_index,:], item_factors.T)
item_scores

array([1.64238   , 1.33704803, 1.91010916, ..., 2.51305353, 2.08655125,
       2.68833044])

In [21]:
item_scores.shape

(65134,)

## Early stopping, how to used and when it is needed

### Problem, how many epochs? 5, 10, 150, 2487 ?

### We could try different values in increasing order: 5, 10, 15, 20, 25...
### However, in this way we would train up to a point, test and then discard the model, to re-train it again up to that same point and then some more... not a good idea.

### Early stopping! 
* Train the model up to a certain number of epochs, say 5
* Compute the recommendation quality on the validation set
* Train for other 5 epochs
* Compute the recommendation quality on the validation set AND compare it with the previous one. If better, then we have another best model, if not, go ahead...
* Repeat until you have either reached the max number of epoch you want to allow (e.g., 300) or a certain number of contiguous validation seps have not updated te best model

### Advantages:
* Easy to implement, we already have all that is required, a train function, a predictor function and an evaluator
* MUCH faster than retraining everything from the beginning
* Often allows to reach even better solutions

### Challenges:
* The evaluation step may be very slow compared to the time it takes to re-train the model

# Train a PureSVD model

### As opposed to the previous ones, PureSVD relies on the SVD decomposition of the URM, which is an easily available function

In [22]:
from sklearn.utils.extmath import randomized_svd

# Other SVDs are also available, like from sklearn.decomposition import TruncatedSVD

In [23]:
U, Sigma, VT = randomized_svd(URM_train,
              n_components=num_factors,
              #n_iter=5,
              random_state=None)

In [24]:
U

array([[-1.06361265e-22,  8.40935702e-17,  8.50010407e-16, ...,
         2.45254722e-15,  9.48277371e-16, -3.54568146e-15],
       [ 8.59745974e-04, -3.36843680e-03, -7.90044382e-04, ...,
        -8.81207236e-04, -2.16693248e-03, -1.19207403e-04],
       [ 5.70997490e-04, -1.39024014e-03, -2.56976795e-04, ...,
         2.64430812e-03,  1.96766510e-03,  8.26974282e-04],
       ...,
       [ 3.13594599e-03,  1.55684804e-03,  5.38902703e-03, ...,
        -1.42657566e-03,  3.00583309e-03,  2.23597019e-03],
       [ 1.22178385e-03, -5.26678849e-03,  1.19623250e-03, ...,
        -7.49036248e-05, -1.69857146e-03,  2.56664448e-03],
       [ 1.23230217e-03, -2.68338232e-04, -6.32760148e-04, ...,
         2.45695255e-03, -1.34372812e-03,  4.13393456e-03]])

In [25]:
U.shape

(71568, 10)

In [26]:
Sigma

array([4274.95500448, 1783.67004383, 1533.9859579 , 1227.92689579,
       1184.72071643, 1013.99745833,  960.27315676,  908.38172069,
        843.04010687,  745.66796854])

In [27]:
Sigma.shape

(10,)

In [28]:
VT

array([[ 1.91792537e-22,  8.02497822e-02,  3.44842715e-02, ...,
         0.00000000e+00,  0.00000000e+00,  4.35614245e-05],
       [-5.20176601e-16, -4.52061362e-02, -5.03095688e-02, ...,
        -0.00000000e+00, -0.00000000e+00,  7.09105665e-05],
       [-9.55039973e-16, -1.14378476e-02, -2.11044482e-02, ...,
         0.00000000e+00,  0.00000000e+00,  2.71196832e-06],
       ...,
       [ 2.78140196e-16,  1.44694433e-01,  2.48286022e-02, ...,
         0.00000000e+00,  0.00000000e+00,  1.38541899e-04],
       [ 1.57614670e-16, -3.80283247e-02, -2.96388908e-02, ...,
         0.00000000e+00,  0.00000000e+00,  7.10929324e-05],
       [ 1.27757327e-16, -9.59188228e-03, -1.39591606e-03, ...,
        -0.00000000e+00, -0.00000000e+00,  4.09184806e-05]])

In [29]:
VT.shape

(10, 65134)

### Truncating the number of singular values introduces an approximation which allows to fill the missing urm entries

### Computing a prediction

In [30]:
# Store an intermediate pre-multiplied matrix

s_Vt = sps.diags(Sigma)*VT

In [31]:
prediction = U[user_index, :].dot(s_Vt[:,item_index])

print("Prediction is {:.2f}".format(prediction))

Prediction is 0.03


In [32]:
item_scores = U[user_index, :].dot(s_Vt)
item_scores

array([-1.68597132e-16,  7.24899190e-01,  4.42185258e-01, ...,
        0.00000000e+00,  0.00000000e+00,  2.67442759e-04])

In [33]:
item_scores.shape

(65134,)

# Let's compare the three MF: BPR, FunkSVD and PureSVD

In [34]:
from MatrixFactorization.Cython.MatrixFactorization_Cython import MatrixFactorization_BPR_Cython, MatrixFactorization_FunkSVD_Cython
from MatrixFactorization.PureSVDRecommender import PureSVDRecommender

from Base.Evaluation.Evaluator import EvaluatorHoldout

evaluator_test = EvaluatorHoldout(URM_test, cutoff_list=[5])

evaluator_validation_early_stopping = EvaluatorHoldout(URM_train, cutoff_list=[5], exclude_seen = False)


In [35]:
recommender = MatrixFactorization_BPR_Cython(URM_train)
recommender.fit(num_factors = 50, 
                validation_every_n = 10, 
                stop_on_validation = True, 
                evaluator_object = evaluator_validation_early_stopping,
                lower_validations_allowed = 5, 
                validation_metric = "MAP")

result_dict, _ = evaluator_test.evaluateRecommender(recommender)

MatrixFactorization_BPR_Cython_Recommender: URM Detected 1690 (2.36 %) cold users.
MatrixFactorization_BPR_Cython_Recommender: URM Detected 54481 (83.64 %) cold items.
MF_BPR: Processed 71000 ( 98.61% ) in 0.53 seconds. BPR loss 1.00E-02. Sample per second: 134836
MF_BPR: Epoch 1 of 300. Elapsed time 0.40 sec
MF_BPR: Processed 71000 ( 98.61% ) in 0.83 seconds. BPR loss 1.00E-02. Sample per second: 85770
MF_BPR: Epoch 2 of 300. Elapsed time 0.71 sec
MF_BPR: Processed 71000 ( 98.61% ) in 1.13 seconds. BPR loss 1.01E-02. Sample per second: 62830
MF_BPR: Epoch 3 of 300. Elapsed time 1.00 sec
MF_BPR: Processed 71000 ( 98.61% ) in 0.40 seconds. BPR loss 9.99E-03. Sample per second: 176945
MF_BPR: Epoch 4 of 300. Elapsed time 1.27 sec
MF_BPR: Processed 71000 ( 98.61% ) in 0.67 seconds. BPR loss 1.00E-02. Sample per second: 105906
MF_BPR: Epoch 5 of 300. Elapsed time 1.54 sec
MF_BPR: Processed 71000 ( 98.61% ) in 0.92 seconds. BPR loss 1.01E-02. Sample per second: 76887
MF_BPR: Epoch 6 of 300.

MF_BPR: Epoch 39 of 300. Elapsed time 6.36 min
MF_BPR: Processed 71000 ( 98.61% ) in 0.95 seconds. BPR loss 1.00E-02. Sample per second: 74528
MF_BPR: Validation begins...
EvaluatorHoldout: Processed 17001 ( 24.33% ) in 31.07 sec. Users per second: 547
EvaluatorHoldout: Processed 33001 ( 47.23% ) in 1.02 min. Users per second: 540
EvaluatorHoldout: Processed 50001 ( 71.55% ) in 1.52 min. Users per second: 549
EvaluatorHoldout: Processed 68001 ( 97.31% ) in 2.04 min. Users per second: 556
EvaluatorHoldout: Processed 69878 ( 100.00% ) in 2.07 min. Users per second: 563
MF_BPR: CUTOFF: 5 - ROC_AUC: 0.0043767, PRECISION: 0.0017316, PRECISION_RECALL_MIN_DEN: 0.0017316, RECALL: 0.0000697, MAP: 0.0008087, MRR: 0.0040234, NDCG: 0.0001492, F1: 0.0001340, HIT_RATE: 0.0086579, ARHR: 0.0040335, RMSE: 3.7473011, NOVELTY: 0.0002032, AVERAGE_POPULARITY: 0.0041145, DIVERSITY_MEAN_INTER_LIST: 0.9996262, DIVERSITY_HERFINDAHL: 0.9999224, COVERAGE_ITEM: 0.6478951, COVERAGE_USER: 0.9763861, DIVERSITY_GINI:

In [36]:
result_dict

{5: {'ROC_AUC': 0.0010923286297543156,
  'PRECISION': 0.0004412291383138733,
  'PRECISION_RECALL_MIN_DEN': 0.0004457655850822519,
  'RECALL': 8.133819467011618e-05,
  'MAP': 0.00019785274852963388,
  'MRR': 0.0009834539073132309,
  'NDCG': 0.00010500043733765248,
  'F1': 0.00013735562589175856,
  'HIT_RATE': 0.0022061456915693717,
  'ARHR': 0.0009834539073132309,
  'RMSE': 3.7381009608792857,
  'NOVELTY': 0.00020195889984728468,
  'AVERAGE_POPULARITY': 0.003769269591917909,
  'DIVERSITY_MEAN_INTER_LIST': 0.9996264277562489,
  'DIVERSITY_HERFINDAHL': 0.9999224214973059,
  'COVERAGE_ITEM': 0.6478490496514877,
  'COVERAGE_USER': 0.9753660854012967,
  'DIVERSITY_GINI': 0.4047776492495892,
  'SHANNON_ENTROPY': 14.413630172798793}}

In [37]:
recommender = MatrixFactorization_FunkSVD_Cython(URM_train)
recommender.fit(num_factors = 50, 
                validation_every_n = 10, 
                stop_on_validation = True, 
                evaluator_object = evaluator_validation_early_stopping,
                lower_validations_allowed = 5, 
                validation_metric = "MAP")

result_dict, _ = evaluator_test.evaluateRecommender(recommender)

MatrixFactorization_FunkSVD_Cython_Recommender: URM Detected 1690 (2.36 %) cold users.
MatrixFactorization_FunkSVD_Cython_Recommender: URM Detected 54481 (83.64 %) cold items.
FUNK_SVD: Processed 8000000 ( 99.99% ) in 18.63 seconds. MSE loss 1.96E+00. Sample per second: 429422
FUNK_SVD: Epoch 1 of 300. Elapsed time 18.30 sec
FUNK_SVD: Processed 8000000 ( 99.99% ) in 17.39 seconds. MSE loss 1.14E+00. Sample per second: 460158
FUNK_SVD: Epoch 2 of 300. Elapsed time 35.05 sec
FUNK_SVD: Processed 8000000 ( 99.99% ) in 17.12 seconds. MSE loss 1.13E+00. Sample per second: 467176
FUNK_SVD: Epoch 3 of 300. Elapsed time 51.79 sec
FUNK_SVD: Processed 8000000 ( 99.99% ) in 16.98 seconds. MSE loss 1.13E+00. Sample per second: 471184
FUNK_SVD: Epoch 4 of 300. Elapsed time 1.14 min
FUNK_SVD: Processed 8000000 ( 99.99% ) in 17.68 seconds. MSE loss 1.13E+00. Sample per second: 452580
FUNK_SVD: Epoch 5 of 300. Elapsed time 1.42 min
FUNK_SVD: Processed 8000000 ( 99.99% ) in 17.37 seconds. MSE loss 1.12E

FUNK_SVD: Epoch 35 of 300. Elapsed time 16.92 min
FUNK_SVD: Processed 8000000 ( 99.99% ) in 17.18 seconds. MSE loss 1.07E+00. Sample per second: 465611
FUNK_SVD: Epoch 36 of 300. Elapsed time 17.20 min
FUNK_SVD: Processed 8000000 ( 99.99% ) in 16.75 seconds. MSE loss 1.07E+00. Sample per second: 477488
FUNK_SVD: Epoch 37 of 300. Elapsed time 17.47 min
FUNK_SVD: Processed 8000000 ( 99.99% ) in 17.38 seconds. MSE loss 1.07E+00. Sample per second: 460387
FUNK_SVD: Epoch 38 of 300. Elapsed time 17.75 min
FUNK_SVD: Processed 8000000 ( 99.99% ) in 16.92 seconds. MSE loss 1.06E+00. Sample per second: 472691
FUNK_SVD: Epoch 39 of 300. Elapsed time 18.03 min
FUNK_SVD: Processed 8000000 ( 99.99% ) in 17.60 seconds. MSE loss 1.06E+00. Sample per second: 454646
FUNK_SVD: Validation begins...
EvaluatorHoldout: Processed 14001 ( 20.04% ) in 30.24 sec. Users per second: 463
EvaluatorHoldout: Processed 29001 ( 41.50% ) in 1.01 min. Users per second: 479
EvaluatorHoldout: Processed 44001 ( 62.97% ) in 

EvaluatorHoldout: Processed 44001 ( 62.97% ) in 1.51 min. Users per second: 486
EvaluatorHoldout: Processed 59001 ( 84.43% ) in 2.01 min. Users per second: 488
EvaluatorHoldout: Processed 69878 ( 100.00% ) in 2.35 min. Users per second: 495
FUNK_SVD: CUTOFF: 5 - ROC_AUC: 0.3765849, PRECISION: 0.3058101, PRECISION_RECALL_MIN_DEN: 0.3058101, RECALL: 0.0224756, MAP: 0.2209573, MRR: 0.4764926, NDCG: 0.0672830, F1: 0.0418737, HIT_RATE: 1.5290506, ARHR: 0.7273047, RMSE: 0.9880807, NOVELTY: 0.0006580, AVERAGE_POPULARITY: 0.7635722, DIVERSITY_MEAN_INTER_LIST: 0.2304001, DIVERSITY_HERFINDAHL: 0.8460794, COVERAGE_ITEM: 0.0005374, COVERAGE_USER: 0.9763861, DIVERSITY_GINI: 0.1995526, SHANNON_ENTROPY: 2.9767800, 

FUNK_SVD: Convergence reached! Terminating at epoch 70. Best value for 'MAP' at epoch 20 is 0.2391. Elapsed time 36.13 min
FUNK_SVD: Epoch 70 of 300. Elapsed time 36.13 min
EvaluatorHoldout: Processed 14929 ( 21.39% ) in 30.00 sec. Users per second: 498
EvaluatorHoldout: Processed 30001 (

In [38]:
recommender = PureSVDRecommender(URM_train)
recommender.fit()

result_dict, _ = evaluator_test.evaluateRecommender(recommender)

PureSVDRecommender: URM Detected 1690 (2.36 %) cold users.
PureSVDRecommender: URM Detected 54481 (83.64 %) cold items.
PureSVDRecommender: Computing SVD decomposition...
PureSVDRecommender: Computing SVD decomposition... Done!
EvaluatorHoldout: Processed 23001 ( 32.95% ) in 30.32 sec. Users per second: 759
EvaluatorHoldout: Processed 48406 ( 69.34% ) in 1.01 min. Users per second: 802
EvaluatorHoldout: Processed 69805 ( 100.00% ) in 1.43 min. Users per second: 815
