# Recommender Systems 2018/19

### Practice session on BPR-MF


## Recap on BPR
S.Rendle et al. BPR: Bayesian Personalized Ranking from Implicit Feedback. UAI2009

The usual approach for item recommenders is to predict a personalized score $\hat{x}_{ui}$ for an item that reflects the preference of the user for the item. Then the items are ranked by sorting them according to that score.

Machine learning approaches are tipically fit by using observed items as a positive sample and missing ones for the negative class. A perfect model would thus be useless, as it would classify as negative (non-interesting) all the items that were non-observed at training time. The only reason why such methods work is regularization.

BPR use a different approach. The training dataset is composed by triplets $(u,i,j)$ representing that user u is assumed to prefer i over j. For an implicit dataset this means that u observed i but not j:
$$D_S := \{(u,i,j) \mid i \in I_u^+ \wedge j \in I \setminus I_u^+\}$$

### BPR-OPT
A machine learning model can be represented by a parameter vector $\Theta$ which is found at fitting time. BPR wants to find the parameter vector that is most probable given the desired, but latent, preference structure $>_u$:
$$p(\Theta \mid >_u) \propto p(>_u \mid \Theta)p(\Theta) $$
$$\prod_{u\in U} p(>_u \mid \Theta) = \dots = \prod_{(u,i,j) \in D_S} p(i >_u j \mid \Theta) $$

The probability that a user really prefers item $i$ to item $j$ is defined as:
$$ p(i >_u j \mid \Theta) := \sigma(\hat{x}_{uij}(\Theta)) $$
Where $\sigma$ represent the logistic sigmoid and $\hat{x}_{uij}(\Theta)$ is an arbitrary real-valued function of $\Theta$ (the output of your arbitrary model).


To complete the Bayesian setting, we define a prior density for the parameters:
$$p(\Theta) \sim N(0, \Sigma_\Theta)$$
And we can now formulate the maximum posterior estimator:
$$BPR-OPT := \log p(\Theta \mid >_u) $$
$$ = \log p(>_u \mid \Theta) p(\Theta) $$
$$ = \log \prod_{(u,i,j) \in D_S} \sigma(\hat{x}_{uij})p(\Theta) $$
$$ = \sum_{(u,i,j) \in D_S} \log \sigma(\hat{x}_{uij}) + \log p(\Theta) $$
$$ = \sum_{(u,i,j) \in D_S} \log \sigma(\hat{x}_{uij}) - \lambda_\Theta ||\Theta||^2 $$

Where $\lambda_\Theta$ are model specific regularization parameters.

### BPR learning algorithm
Once obtained the log-likelihood, we need to maximize it in order to find our obtimal $\Theta$. As the crierion is differentiable, gradient descent algorithms are an obvious choiche for maximization.

Gradient descent comes in many fashions, you can find an overview on my master thesis https://www.politesi.polimi.it/bitstream/10589/133864/3/tesi.pdf on pages 18-19-20 (I'm linking my thesis just because I'm sure of what it's written there, many posts you can find online contain some error). A nice post about momentum is available here https://distill.pub/2017/momentum/

The basic version of gradient descent consists in evaluating the gradient using all the available samples and then perform a single update. The problem with this is, in our case, that our training dataset is very skewed. Suppose an item i is very popular. Then we habe many terms of the form $\hat{x}_{uij}$ in the loss because for many users u the item i is compared against all negative items j.

The other popular approach is stochastic gradient descent, where for each training sample an update is performed. This is a better approach, but the order in which the samples are traversed is crucial. To solve this issue BPR uses a stochastic gradient descent algorithm that choses the triples randomly.

The gradient of BPR-OPT with respect to the model parameters is: 
$$\frac{\partial BPR-OPT}{\partial \Theta} = \sum_{(u,i,j) \in D_S} \frac{\partial}{\partial \Theta} \log \sigma (\hat{x}_{uij}) - \lambda_\Theta \frac{\partial}{\partial\Theta} || \Theta ||^2$$
$$ =  \sum_{(u,i,j) \in D_S} \frac{-e^{-\hat{x}_{uij}}}{1+e^{-\hat{x}_{uij}}} \frac{\partial}{\partial \Theta}\hat{x}_{uij} - \lambda_\Theta \Theta $$

### BPR-MF

In order to practically apply this learning schema to an existing algorithm, we first split the real valued preference term: $\hat{x}_{uij} := \hat{x}_{ui} − \hat{x}_{uj}$. And now we can apply any standard collaborative filtering model that predicts $\hat{x}_{ui}$.

The problem of predicting $\hat{x}_{ui}$ can be seen as the task of estimating a matrix $X:U×I$. With matrix factorization teh target matrix $X$ is approximated by the matrix product of two low-rank matrices $W:|U|\times k$ and $H:|I|\times k$:
$$X := WH^t$$
The prediction formula can also be written as:
$$\hat{x}_{ui} = \langle w_u,h_i \rangle = \sum_{f=1}^k w_{uf} \cdot h_{if}$$
Besides the dot product ⟨⋅,⋅⟩, in general any kernel can be used.

We can now specify the derivatives:
$$ \frac{\partial}{\partial \theta} \hat{x}_{uij} = \begin{cases}
(h_{if} - h_{jf}) \text{ if } \theta=w_{uf}, \\
w_{uf} \text{ if } \theta = h_{if}, \\
-w_{uf} \text{ if } \theta = h_{jf}, \\
0 \text{ else }
\end{cases} $$

Which basically means: user $u$ prefer $i$ over $j$, let's do the following:
- Increase the relevance (according to $u$) of features belonging to $i$ but not to $j$ and vice-versa
- Increase the relevance of features assigned to $i$
- Decrease the relevance of features assigned to $j$

We're now ready to look at some code!

In [1]:
from urllib.request import urlretrieve
import zipfile

# skip the download
#urlretrieve ("http://files.grouplens.org/datasets/movielens/ml-10m.zip", "data/Movielens_10M/movielens_10m.zip")
dataFile = zipfile.ZipFile("data/Movielens_10M/movielens_10m.zip")
URM_path = dataFile.extract("ml-10M100K/ratings.dat", path = "data/Movielens_10M")
URM_file = open(URM_path, 'r')


def rowSplit (rowString):
    
    split = rowString.split("::")
    split[3] = split[3].replace("\n","")
    
    split[0] = int(split[0])
    split[1] = int(split[1])
    split[2] = float(split[2])
    split[3] = int(split[3])
    
    result = tuple(split)
    
    return result


URM_file.seek(0)
URM_tuples = []

for line in URM_file:
   URM_tuples.append(rowSplit (line))

userList, itemList, ratingList, timestampList = zip(*URM_tuples)

userList = list(userList)
itemList = list(itemList)
ratingList = list(ratingList)
timestampList = list(timestampList)

import scipy.sparse as sps

URM_all = sps.coo_matrix((ratingList, (userList, itemList)))
URM_all = URM_all.tocsr()



from Notebooks_utils.data_splitter import train_test_holdout


URM_train, URM_test = train_test_holdout(URM_all, train_perc = 0.8)

### MF Computing prediction

### In a MF model you have two matrices, one with a row per user and the other with a column per item. The other dimension, columns for the first one and rows for the second one is called latent factors

In [2]:
num_factors = 10

n_users, n_items = URM_train.shape

In [3]:
import numpy as np

user_factors = np.random.random((n_users, num_factors))

item_factors = np.random.random((n_items, num_factors))

### To compute the prediction we have to muliply the user factors to the item factors

In [4]:
item_index = 15
user_index = 42

prediction = np.dot(user_factors[user_index,:], item_factors[item_index,:])

print("Prediction is {:.2f}".format(prediction))

Prediction is 2.48


# Train a MF MSE model

### Use SGD as we saw for SLIM

In [5]:
test_data = 5
learning_rate = 1e-2
regularization = 1e-3

gradient = test_data - prediction

print("Prediction error is {:.2f}".format(gradient))

Prediction error is 2.52


In [6]:
# Copy original value to avoid messing up the updates
H_i = item_factors[item_index,:]
W_u = user_factors[user_index,:]

user_factors[user_index,:] += learning_rate * (gradient * H_i - regularization * W_u)
item_factors[item_index,:] += learning_rate * (gradient * W_u - regularization * H_i)


In [7]:
prediction = np.dot(user_factors[user_index,:], item_factors[item_index,:])

print("Prediction after the update is {:.2f}".format(prediction))
print("Prediction error is {:.2f}".format(test_data - prediction))

Prediction after the update is 2.66
Prediction error is 2.34


### WARNING: Initialization must be done with random non-zero values ... otherwise

In [8]:
user_factors = np.zeros((n_users, num_factors))

item_factors = np.zeros((n_items, num_factors))

In [9]:
prediction = np.dot(user_factors[user_index,:], item_factors[item_index,:])

print("Prediction is {:.2f}".format(prediction))

gradient = test_data - prediction

print("Prediction error is {:.2f}".format(gradient))

Prediction is 0.00
Prediction error is 5.00


In [10]:
H_i = item_factors[item_index,:]
W_u = user_factors[user_index,:]

user_factors[user_index,:] += learning_rate * (gradient * H_i - regularization * W_u)
item_factors[item_index,:] += learning_rate * (gradient * W_u - regularization * H_i)


In [11]:
prediction = np.dot(user_factors[user_index,:], item_factors[item_index,:])

print("Prediction after the update is {:.2f}".format(prediction))
print("Prediction error is {:.2f}".format(test_data - prediction))

Prediction after the update is 0.00
Prediction error is 5.00


### Since the updates multiply the gradient and the latent factors, if those are zero the SGD will never be able to move from that point

# Train a MF BPR model

## The basics are the same, except for how we compute the gradient, we have to sample a triplet

In [12]:
URM_mask = URM_train.copy()
URM_mask.data[URM_mask.data <= 3] = 0

URM_mask.eliminate_zeros()

# Extract users having at least one interaction to choose from
eligibleUsers = []

for user_id in range(n_users):

    start_pos = URM_mask.indptr[user_id]
    end_pos = URM_mask.indptr[user_id+1]

    if len(URM_mask.indices[start_pos:end_pos]) > 0:
        eligibleUsers.append(user_id)
                
                

def sampleTriplet():
    
    # By randomly selecting a user in this way we could end up 
    # with a user with no interactions
    #user_id = np.random.randint(0, n_users)
    
    user_id = np.random.choice(eligibleUsers)
    
    # Get user seen items and choose one
    userSeenItems = URM_mask[user_id,:].indices
    pos_item_id = np.random.choice(userSeenItems)

    negItemSelected = False

    # It's faster to just try again then to build a mapping of the non-seen items
    while (not negItemSelected):
        neg_item_id = np.random.randint(0, n_items)

        if (neg_item_id not in userSeenItems):
            
            negItemSelected = True

    return user_id, pos_item_id, neg_item_id


In [13]:
for _ in range(10):
    print(sampleTriplet())

(66684, 1393, 45864)
(59643, 6502, 2062)
(62891, 47, 43248)
(37105, 349, 32445)
(47279, 3735, 45120)
(70494, 2328, 51696)
(11491, 780, 35812)
(13572, 1320, 42365)
(30479, 457, 44214)
(40397, 3247, 32260)


In [14]:
user_factors = np.random.random((n_users, num_factors))
item_factors = np.random.random((n_items, num_factors))

In [15]:
user_id, positive_item, negative_item = sampleTriplet()

print(user_id, positive_item, negative_item)

28598 2268 21308


In [16]:
x_uij = np.dot(user_factors[user_id, :], (item_factors[positive_item,:] - item_factors[negative_item,:]))

x_uij

-0.42732239823975204

In [17]:
sigmoid_item = 1 / (1 + np.exp(x_uij))

sigmoid_item

0.6052341009282085

### When using BPR we have to update three components, the user factors and the item factors of both the positive and negative item

In [18]:

H_i = item_factors[positive_item,:]
H_j = item_factors[negative_item,:]
W_u = user_factors[user_id,:]


user_factors[user_index,:] += learning_rate * (sigmoid_item * ( H_i - H_j ) - regularization * W_u)
item_factors[positive_item,:] += learning_rate * (sigmoid_item * ( W_u ) - regularization * H_i)
item_factors[negative_item,:] += learning_rate * (sigmoid_item * (-W_u ) - regularization * H_j)


In [19]:
x_uij = np.dot(user_factors[user_id, :], (item_factors[positive_item,:] - item_factors[negative_item,:]))

x_uij

-0.3850862149057356

In [20]:
## How to rank items with MF ?

## Compute the prediction for all items and rank them

item_scores = np.dot(user_factors[user_index,:], item_factors.T)
item_scores

array([2.59440076, 3.1408176 , 2.11425215, ..., 3.2081935 , 2.30431551,
       2.43617798])

In [21]:
item_scores.shape

(65134,)

## Early stopping, how to used and when it is needed

### Problem, how many epochs? 5, 10, 150, 2487 ?

### We could try different values in increasing order: 5, 10, 15, 20, 25...
### However, in this way we would train up to a point, test and then discard the model, to re-train it again up to that same point and then some more... not a good idea.

### Early stopping! 
* Train the model up to a certain number of epochs, say 5
* Compute the recommendation quality on the validation set
* Train for other 5 epochs
* Compute the recommendation quality on the validation set AND compare it with the previous one. If better, then we have another best model, if not, go ahead...
* Repeat until you have either reached the max number of epoch you want to allow (e.g., 300) or a certain number of contiguous validation seps have not updated te best model

### Advantages:
* Easy to implement, we already have all that is required, a train function, a predictor function and an evaluator
* MUCH faster than retraining everything from the beginning
* Often allows to reach even better solutions

### Challenges:
* The evaluation step may be very slow compared to the time it takes to re-train the model

# Train a PureSVD model

### As opposed to the previous ones, PureSVD relies on the SVD decomposition of the URM, which is an easily available function

In [22]:
from sklearn.utils.extmath import randomized_svd

# Other SVDs are also available, like from sklearn.decomposition import TruncatedSVD

In [23]:
U, Sigma, VT = randomized_svd(URM_train,
              n_components=num_factors,
              #n_iter=5,
              random_state=None)

In [24]:
U

array([[-3.62860248e-21, -2.86908742e-15, -1.70833465e-15, ...,
         6.05353976e-15,  8.17980280e-15, -1.27965256e-14],
       [ 7.73702654e-04, -3.70932439e-03, -6.30083625e-04, ...,
         5.37827209e-04,  3.79869107e-03, -5.85808191e-04],
       [ 2.20534047e-04, -1.49250322e-04,  7.76749694e-05, ...,
         1.80315491e-03, -1.02520131e-03,  7.04484079e-06],
       ...,
       [ 3.29725752e-03,  1.28308257e-03,  4.60025465e-03, ...,
        -3.67094234e-03, -2.49380474e-03,  1.05406244e-03],
       [ 1.58065632e-03, -5.67907694e-03,  1.54425631e-03, ...,
         2.75470454e-04,  2.58417748e-03,  2.47037630e-03],
       [ 1.28258378e-03, -1.69028440e-03, -1.27537791e-03, ...,
         1.56111840e-04, -7.94927193e-04,  6.24569521e-03]])

In [25]:
U.shape

(71568, 10)

In [26]:
Sigma

array([2679.32033857, 1135.51634312,  970.4556902 ,  792.87302934,
        758.07221025,  658.67485515,  625.66690723,  599.51880552,
        554.71522957,  496.27595451])

In [27]:
Sigma.shape

(10,)

In [28]:
VT

array([[ 1.44524003e-23,  8.11106148e-02,  3.46733422e-02, ...,
         0.00000000e+00,  0.00000000e+00,  5.31624403e-05],
       [ 1.75810964e-16, -4.62293928e-02, -4.78348455e-02, ...,
        -0.00000000e+00, -0.00000000e+00,  8.08586300e-05],
       [ 7.11726351e-16, -1.42087867e-02, -2.08641748e-02, ...,
        -0.00000000e+00, -0.00000000e+00,  6.70975470e-05],
       ...,
       [-5.45842487e-17,  1.67520428e-01,  3.05554064e-02, ...,
         0.00000000e+00,  0.00000000e+00,  1.22372329e-04],
       [ 8.52557073e-17,  6.38527730e-02,  2.77012749e-02, ...,
         0.00000000e+00,  0.00000000e+00, -2.12571017e-04],
       [-1.39226914e-16, -6.78881547e-02, -7.75174561e-03, ...,
        -0.00000000e+00, -0.00000000e+00,  6.04441228e-06]])

In [29]:
VT.shape

(10, 65134)

### Truncating the number of singular values introduces an approximation which allows to fill the missing urm entries

### Computing a prediction

In [30]:
# Store an intermediate pre-multiplied matrix

s_Vt = sps.diags(Sigma)*VT

In [31]:
prediction = U[user_index, :].dot(s_Vt[:,item_index])

print("Prediction is {:.2f}".format(prediction))

Prediction is 0.03


In [32]:
item_scores = U[user_index, :].dot(s_Vt)
item_scores

array([-6.24023584e-16,  5.68159760e-01,  3.21676239e-01, ...,
        0.00000000e+00,  0.00000000e+00,  2.68767896e-04])

In [33]:
item_scores.shape

(65134,)

# Let's compare the three MF: BPR, FunkSVD and PureSVD

In [34]:
from MatrixFactorization.Cython.MatrixFactorization_Cython import MatrixFactorization_BPR_Cython, MatrixFactorization_FunkSVD_Cython
from MatrixFactorization.PureSVD import PureSVDRecommender

from Base.Evaluation.Evaluator import SequentialEvaluator

evaluator_test = SequentialEvaluator(URM_test, cutoff_list=[5])

evaluator_validation_early_stopping = SequentialEvaluator(URM_train, cutoff_list=[5], exclude_seen = False)


In [35]:
recommender = MatrixFactorization_BPR_Cython(URM_train)
recommender.fit(num_factors = 50, 
                validation_every_n = 10, 
                stop_on_validation = True, 
                evaluator_object = evaluator_validation_early_stopping,
                lower_validatons_allowed = 5, 
                validation_metric = "MAP")

result_dict, _ = evaluator_test.evaluateRecommender(recommender)

Processed 71568 ( 100.00% ) in 0.79 seconds. BPR loss 7.70E-02. Sample per second: 90320
MF_BPR: Epoch 1 of 300. Elapsed time 0.00 min
Processed 71568 ( 100.00% ) in 0.94 seconds. BPR loss 1.89E-01. Sample per second: 76316
MF_BPR: Epoch 2 of 300. Elapsed time 0.01 min
Processed 71568 ( 100.00% ) in 1.06 seconds. BPR loss 3.46E-01. Sample per second: 67605
MF_BPR: Epoch 3 of 300. Elapsed time 0.01 min
Processed 71568 ( 100.00% ) in 0.20 seconds. BPR loss 5.28E-01. Sample per second: 362019
MF_BPR: Epoch 4 of 300. Elapsed time 0.01 min
Processed 71568 ( 100.00% ) in 0.32 seconds. BPR loss 7.27E-01. Sample per second: 226577
MF_BPR: Epoch 5 of 300. Elapsed time 0.01 min
Processed 71568 ( 100.00% ) in 0.43 seconds. BPR loss 9.35E-01. Sample per second: 166934
MF_BPR: Epoch 6 of 300. Elapsed time 0.01 min
Processed 71568 ( 100.00% ) in 0.54 seconds. BPR loss 1.13E+00. Sample per second: 131321
MF_BPR: Epoch 7 of 300. Elapsed time 0.02 min
Processed 71568 ( 100.00% ) in 0.69 seconds. BPR lo

MF_BPR: Epoch 35 of 300. Elapsed time 8.11 min
Processed 71568 ( 100.00% ) in 0.25 seconds. BPR loss 7.37E+00. Sample per second: 284475
MF_BPR: Epoch 36 of 300. Elapsed time 8.11 min
Processed 71568 ( 100.00% ) in 0.38 seconds. BPR loss 7.51E+00. Sample per second: 189790
MF_BPR: Epoch 37 of 300. Elapsed time 8.11 min
Processed 71568 ( 100.00% ) in 0.50 seconds. BPR loss 7.72E+00. Sample per second: 143831
MF_BPR: Epoch 38 of 300. Elapsed time 8.11 min
Processed 71568 ( 100.00% ) in 0.60 seconds. BPR loss 7.88E+00. Sample per second: 118896
MF_BPR: Epoch 39 of 300. Elapsed time 8.12 min
Processed 71568 ( 100.00% ) in 0.72 seconds. BPR loss 8.20E+00. Sample per second: 99923
MF_BPR: Validation begins...
SequentialEvaluator: Processed 13001 ( 18.61% ) in 31.63 seconds. Users per second: 411
SequentialEvaluator: Processed 27001 ( 38.64% ) in 62.68 seconds. Users per second: 431
SequentialEvaluator: Processed 41001 ( 58.68% ) in 93.62 seconds. Users per second: 438
SequentialEvaluator: Pr

SequentialEvaluator: Processed 13391 ( 19.16% ) in 30.00 seconds. Users per second: 446
SequentialEvaluator: Processed 27001 ( 38.64% ) in 60.12 seconds. Users per second: 449
SequentialEvaluator: Processed 41001 ( 58.68% ) in 90.20 seconds. Users per second: 455
SequentialEvaluator: Processed 55001 ( 78.71% ) in 120.31 seconds. Users per second: 457
SequentialEvaluator: Processed 69001 ( 98.74% ) in 150.71 seconds. Users per second: 458
SequentialEvaluator: Processed 69878 ( 100.00% ) in 151.10 seconds. Users per second: 462
MF_BPR: {'ROC_AUC': 0.32820296325977943, 'PRECISION': 0.2341802856407383, 'RECALL': 0.03029791462314629, 'RECALL_TEST_LEN': 0.2341802856407383, 'MAP': 0.15422941889200245, 'MRR': 0.3866016485875291, 'NDCG': 0.06731249751130026, 'F1': 0.05365413326079693, 'HIT_RATE': 1.1708978505395118, 'ARHR': 0.5498754972952541, 'NOVELTY': 0.000634022619976794, 'DIVERSITY_MEAN_INTER_LIST': 0.0, 'DIVERSITY_HERFINDAHL': 0.7999999999999999, 'COVERAGE_ITEM': 7.676482328737679e-05, 'C

MF_BPR: Epoch 101 of 300. Elapsed time 26.38 min
Processed 71568 ( 100.00% ) in 0.49 seconds. BPR loss 1.74E+01. Sample per second: 147198
MF_BPR: Epoch 102 of 300. Elapsed time 26.38 min
Processed 71568 ( 100.00% ) in 0.60 seconds. BPR loss 1.76E+01. Sample per second: 119963
MF_BPR: Epoch 103 of 300. Elapsed time 26.38 min
Processed 71568 ( 100.00% ) in 0.71 seconds. BPR loss 1.76E+01. Sample per second: 101374
MF_BPR: Epoch 104 of 300. Elapsed time 26.38 min
Processed 71568 ( 100.00% ) in 0.83 seconds. BPR loss 1.76E+01. Sample per second: 86446
MF_BPR: Epoch 105 of 300. Elapsed time 26.39 min
Processed 71568 ( 100.00% ) in 0.94 seconds. BPR loss 1.79E+01. Sample per second: 75961
MF_BPR: Epoch 106 of 300. Elapsed time 26.39 min
Processed 71568 ( 100.00% ) in 1.09 seconds. BPR loss 1.81E+01. Sample per second: 65411
MF_BPR: Epoch 107 of 300. Elapsed time 26.39 min
Processed 71568 ( 100.00% ) in 0.20 seconds. BPR loss 1.81E+01. Sample per second: 363168
MF_BPR: Epoch 108 of 300. Elap

In [36]:
result_dict

{5: {'ROC_AUC': 0.41810965778451625,
  'PRECISION': 0.33416192029923675,
  'RECALL': 0.04044970532023318,
  'RECALL_TEST_LEN': 0.33416192029923675,
  'MAP': 0.2668385853439912,
  'MRR': 0.49559828081321394,
  'NDCG': 0.09251542186105109,
  'F1': 0.07216407757231579,
  'HIT_RATE': 1.6707833652937978,
  'ARHR': 0.8151852752892721,
  'NOVELTY': 0.0006381631695154302,
  'DIVERSITY_MEAN_INTER_LIST': 0.3464988925806891,
  'DIVERSITY_HERFINDAHL': 0.8692987867908664,
  'COVERAGE_ITEM': 0.00035311818712193327,
  'COVERAGE_USER': 0.9763860943438408,
  'DIVERSITY_GINI': 0.3354024156386845,
  'SHANNON_ENTROPY': 3.124327469803162}}

In [37]:
recommender = MatrixFactorization_FunkSVD_Cython(URM_train)
recommender.fit(num_factors = 50, 
                validation_every_n = 10, 
                stop_on_validation = True, 
                evaluator_object = evaluator_validation_early_stopping,
                lower_validatons_allowed = 5, 
                validation_metric = "MAP")

result_dict, _ = evaluator_test.evaluateRecommender(recommender)

Processed 5002124 ( 100.00% ) in 5.08 seconds. MSE loss 1.73E+00. Sample per second: 984709
FUNK_SVD: Epoch 1 of 300. Elapsed time 0.08 min
Processed 5002124 ( 100.00% ) in 4.71 seconds. MSE loss 9.02E-01. Sample per second: 1061309
FUNK_SVD: Epoch 2 of 300. Elapsed time 0.16 min
Processed 5002124 ( 100.00% ) in 5.11 seconds. MSE loss 8.53E-01. Sample per second: 978075
FUNK_SVD: Epoch 3 of 300. Elapsed time 0.23 min
Processed 5002124 ( 100.00% ) in 4.85 seconds. MSE loss 8.35E-01. Sample per second: 1031556
FUNK_SVD: Epoch 4 of 300. Elapsed time 0.31 min
Processed 5002124 ( 100.00% ) in 5.56 seconds. MSE loss 8.25E-01. Sample per second: 899115
FUNK_SVD: Epoch 5 of 300. Elapsed time 0.39 min
Processed 5002124 ( 100.00% ) in 5.34 seconds. MSE loss 8.20E-01. Sample per second: 937096
FUNK_SVD: Epoch 6 of 300. Elapsed time 0.47 min
Processed 5002124 ( 100.00% ) in 5.17 seconds. MSE loss 8.17E-01. Sample per second: 968180
FUNK_SVD: Epoch 7 of 300. Elapsed time 0.55 min
Processed 5002124 

FUNK_SVD: Epoch 35 of 300. Elapsed time 9.58 min
Processed 5002124 ( 100.00% ) in 4.93 seconds. MSE loss 7.98E-01. Sample per second: 1014141
FUNK_SVD: Epoch 36 of 300. Elapsed time 9.65 min
Processed 5002124 ( 100.00% ) in 5.31 seconds. MSE loss 7.97E-01. Sample per second: 942355
FUNK_SVD: Epoch 37 of 300. Elapsed time 9.72 min
Processed 5002124 ( 100.00% ) in 4.26 seconds. MSE loss 7.97E-01. Sample per second: 1174405
FUNK_SVD: Epoch 38 of 300. Elapsed time 9.78 min
Processed 5002124 ( 100.00% ) in 4.28 seconds. MSE loss 7.97E-01. Sample per second: 1169776
FUNK_SVD: Epoch 39 of 300. Elapsed time 9.85 min
Processed 5002124 ( 100.00% ) in 5.75 seconds. MSE loss 7.97E-01. Sample per second: 870453
FUNK_SVD: Validation begins...
SequentialEvaluator: Processed 13001 ( 18.61% ) in 30.75 seconds. Users per second: 423
SequentialEvaluator: Processed 27001 ( 38.64% ) in 62.09 seconds. Users per second: 435
SequentialEvaluator: Processed 41246 ( 59.03% ) in 92.09 seconds. Users per second: 4

Processed 5002124 ( 100.00% ) in 4.70 seconds. MSE loss 7.96E-01. Sample per second: 1064986
FUNK_SVD: Validation begins...
SequentialEvaluator: Processed 14001 ( 20.04% ) in 30.29 seconds. Users per second: 462
SequentialEvaluator: Processed 29001 ( 41.50% ) in 60.80 seconds. Users per second: 477
SequentialEvaluator: Processed 43556 ( 62.33% ) in 90.80 seconds. Users per second: 480
SequentialEvaluator: Processed 58574 ( 83.82% ) in 120.81 seconds. Users per second: 485
SequentialEvaluator: Processed 69878 ( 100.00% ) in 143.79 seconds. Users per second: 486
FUNK_SVD: {'ROC_AUC': 0.308604520259119, 'PRECISION': 0.19620266750625867, 'RECALL': 0.02529725021994087, 'RECALL_TEST_LEN': 0.19620266750625867, 'MAP': 0.12983857580355646, 'MRR': 0.3583257009836068, 'NDCG': 0.05880995185181036, 'F1': 0.044816160878768635, 'HIT_RATE': 0.9810097598671971, 'ARHR': 0.48322600341544447, 'NOVELTY': 0.0006671221449425705, 'DIVERSITY_MEAN_INTER_LIST': 0.49745945608065006, 'DIVERSITY_HERFINDAHL': 0.8994

SequentialEvaluator: Processed 16001 ( 22.90% ) in 30.37 seconds. Users per second: 527
SequentialEvaluator: Processed 34001 ( 48.66% ) in 61.46 seconds. Users per second: 553
SequentialEvaluator: Processed 51001 ( 72.99% ) in 91.61 seconds. Users per second: 557
SequentialEvaluator: Processed 69001 ( 98.74% ) in 122.54 seconds. Users per second: 563
SequentialEvaluator: Processed 69878 ( 100.00% ) in 122.88 seconds. Users per second: 569


In [38]:
recommender = PureSVDRecommender(URM_train)
recommender.fit()

result_dict, _ = evaluator_test.evaluateRecommender(recommender)

PureSVDRecommender Computing SVD decomposition...
PureSVDRecommender Computing SVD decomposition... Done!
SequentialEvaluator: Processed 22001 ( 31.48% ) in 30.86 seconds. Users per second: 713
SequentialEvaluator: Processed 44883 ( 64.23% ) in 60.86 seconds. Users per second: 737
SequentialEvaluator: Processed 67001 ( 95.88% ) in 91.48 seconds. Users per second: 732
SequentialEvaluator: Processed 69878 ( 100.00% ) in 94.60 seconds. Users per second: 739
