# Recommender Systems 2022/23

### Practice - AsySVD implemented with Python

AsymmetricSVD is a model-based matrix factorization algorithm in which the user latent factors are represented as a function of their user profile and of a second item factor matrix.

In [1]:
import time
import numpy as np

In [2]:
from Data_manager.split_functions.split_train_validation_random_holdout import split_train_in_two_percentage_global_sample
from Data_manager.Movielens.Movielens10MReader import Movielens10MReader

data_reader = Movielens10MReader()
data_loaded = data_reader.load_data()

URM_all = data_loaded.get_URM_all()

URM_train, URM_test = split_train_in_two_percentage_global_sample(URM_all, train_percentage = 0.8)

Movielens10M: Verifying data consistency...
Movielens10M: Verifying data consistency... Passed!
DataReader: current dataset is: Movielens10M
	Number of items: 10681
	Number of users: 69878
	Number of interactions in URM_all: 10000054
	Value range in URM_all: 0.50-5.00
	Interaction density: 1.34E-02
	Interactions per user:
		 Min: 2.00E+01
		 Avg: 1.43E+02
		 Max: 7.36E+03
	Interactions per item:
		 Min: 0.00E+00
		 Avg: 9.36E+02
		 Max: 3.49E+04
	Gini Index: 0.57

	ICM name: ICM_all, Value range: 1.00 / 69.00, Num features: 10126, feature occurrences: 128384, density 1.19E-03
	ICM name: ICM_genres, Value range: 1.00 / 1.00, Num features: 20, feature occurrences: 21564, density 1.01E-01
	ICM name: ICM_tags, Value range: 1.00 / 69.00, Num features: 10106, feature occurrences: 106820, density 9.90E-04
	ICM name: ICM_year, Value range: 6.00E+00 / 2.01E+03, Num features: 1, feature occurrences: 10681, density 1.00E+00




In [3]:
URM_train

<69878x10681 sparse matrix of type '<class 'numpy.float64'>'
	with 8000043 stored elements in Compressed Sparse Row format>

### What do we need for AsySVD?

* Loss function
* User factor and Item factor matrices
* Computing prediction
* Update rule
* Training loop and some patience


In [4]:
n_users, n_items = URM_train.shape

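#### The model is based on two latent factor matrices, $W \in R^{I \times E}$ and $H \in R^{E \times I}$, with $I$ the number of items and $E$ the embedding size. Biases can be added as well, but they are left out in this notebook (the code below stores $H$ transposed, with one row per item)

#### How to compute the predictions
$ \hat{r}_{ui} = \sum_{k=1}^{E}\sum_{j=1}^{I} r_{uj}W_{jk}H_{ki}$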
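
The prediction formula can be checked on a toy example. The sketch below assumes a small dense rating matrix with made-up values and random factors; it only shows that the explicit double sum equals the matrix product $RWH$.

```python
import numpy as np

# Toy data: 3 users, 4 items, 2 latent factors (all values are illustrative)
R = np.array([[5.0, 0.0, 3.0, 0.0],
              [0.0, 4.0, 0.0, 1.0],
              [2.0, 0.0, 0.0, 5.0]])

rng = np.random.default_rng(0)
n_items, n_factors = R.shape[1], 2
W = rng.random((n_items, n_factors))   # I x E, maps a user profile to user factors
H = rng.random((n_factors, n_items))   # E x I, item factors

def predict(R, W, H, u, i):
    """r_hat[u, i] = sum_k sum_j R[u, j] * W[j, k] * H[k, i]"""
    user_factors = R[u] @ W            # shape (E,), built from the user profile
    return user_factors @ H[:, i]

# The explicit double sum and the matrix product give the same value
u, i = 0, 1
double_sum = sum(R[u, j] * W[j, k] * H[k, i]
                 for k in range(n_factors) for j in range(n_items))
all_predictions = R @ W @ H            # predictions for every user-item pair

print(predict(R, W, H, u, i), double_sum, all_predictions[u, i])
```

Note that the user never gets its own row of parameters: its factors are always recomputed from the profile `R[u]`.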


#### The loss function we are interested in minimizing is
$L = \|R - RWH\|_F^2 + \alpha\|W\|_F^2 + \beta\|H\|_F^2$

#### Gradients

$\frac{\partial L}{\partial W} = -2R^T(R - RWH)H^T + 2\alpha W $

$\frac{\partial L}{\partial H} = -2(RW)^T(R - RWH) + 2\beta H $


#### The update is going to be (we can absorb the constant coefficients into the learning rate)
$ W = W - l\frac{\partial L}{\partial W}$, or 

$ W = W + l(R^T(R - RWH)H^T - \alpha W)$, with $l$ the learning rate
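
A quick way to gain confidence in a gradient is to compare it with finite differences. The sketch below assumes the loss uses squared Frobenius norms, $L = \|R - RWH\|_F^2 + \alpha\|W\|_F^2 + \beta\|H\|_F^2$, and a tiny random dense $R$; all sizes and coefficients are illustrative and are not the ones used later in the notebook.

```python
import numpy as np

rng = np.random.default_rng(42)
n_users, n_items, n_factors = 5, 4, 2
R = rng.integers(0, 6, size=(n_users, n_items)).astype(float)
W = rng.random((n_items, n_factors)) * 0.1
H = rng.random((n_factors, n_items)) * 0.1
alpha, beta = 0.1, 0.1

def loss(W, H):
    E = R - R @ W @ H
    return np.sum(E**2) + alpha * np.sum(W**2) + beta * np.sum(H**2)

def gradients(W, H):
    E = R - R @ W @ H
    grad_W = -2 * R.T @ E @ H.T + 2 * alpha * W
    grad_H = -2 * (R @ W).T @ E + 2 * beta * H
    return grad_W, grad_H

def numerical_grad_W(W, H, eps=1e-6):
    """Central finite differences, one entry of W at a time"""
    grad = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        W_plus, W_minus = W.copy(), W.copy()
        W_plus[idx] += eps
        W_minus[idx] -= eps
        grad[idx] = (loss(W_plus, H) - loss(W_minus, H)) / (2 * eps)
    return grad

grad_W, grad_H = gradients(W, H)
max_diff = np.max(np.abs(grad_W - numerical_grad_W(W, H)))

# One full gradient descent step with a small learning rate
learning_rate = 1e-4
loss_before = loss(W, H)
loss_after = loss(W - learning_rate * grad_W, H - learning_rate * grad_H)

print("Max difference between analytic and numerical gradient:", max_diff)
print("Loss before {:.4f}, after {:.4f}".format(loss_before, loss_after))
```

The notebook does not use this full-batch step; the following cells update the factors one sampled interaction at a time.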


## Step 1: We create the dense latent factor matrices
### In a standard MF model you have two matrices, one with a row per user and the other with a column per item. In AsySVD both matrices have one row per item, because the user factors are computed from the user profile. The other dimension of each matrix is the number of latent factors

In [7]:
num_factors = 10

user_profile_factors = np.random.random((n_items, num_factors))
item_factors = np.random.random((n_items, num_factors))

In [9]:
user_profile_factors

array([[0.83268015, 0.84024123, 0.99456521, ..., 0.25918597, 0.09471632,
        0.86196292],
       [0.86063324, 0.01208739, 0.38109799, ..., 0.70819048, 0.88845714,
        0.53291314],
       [0.94926168, 0.71983227, 0.51506809, ..., 0.48813948, 0.57082261,
        0.94407572],
       ...,
       [0.10407294, 0.03068574, 0.83906296, ..., 0.82817777, 0.22792677,
        0.59978549],
       [0.08701777, 0.41126252, 0.32324311, ..., 0.85788997, 0.94494768,
        0.98576588],
       [0.63997934, 0.42204321, 0.95740229, ..., 0.76016072, 0.99073497,
        0.06241658]])

In [10]:
item_factors

array([[0.05861773, 0.7651285 , 0.70024038, ..., 0.481705  , 0.81016677,
        0.1152574 ],
       [0.40387444, 0.24274421, 0.74289589, ..., 0.5830678 , 0.94015999,
        0.77827218],
       [0.86464517, 0.60128236, 0.20057371, ..., 0.70083542, 0.45258294,
        0.02065533],
       ...,
       [0.80002861, 0.32511835, 0.54157556, ..., 0.04413489, 0.88323551,
        0.7875761 ],
       [0.80074544, 0.49147636, 0.26592538, ..., 0.93730028, 0.86763265,
        0.94898977],
       [0.86186482, 0.66649598, 0.48786971, ..., 0.59470717, 0.24553824,
        0.25834748]])

## Step 2: We sample an interaction and compute the prediction of the current model

In [13]:
URM_train_coo = URM_train.tocoo()

sample_index = np.random.randint(URM_train_coo.nnz)
sample_index

6862565

In [45]:
user_id = URM_train_coo.row[sample_index]
item_id = URM_train_coo.col[sample_index]
rating = URM_train_coo.data[sample_index]

user_profile = URM_train[user_id]

(user_id, item_id, rating)

(60134, 1011, 3.0)

In [46]:
# The estimated user factors may be divided by the square root of the profile length to improve learning
estimated_user_factors = user_profile.dot(user_profile_factors).ravel()/np.sqrt(user_profile.nnz)
estimated_user_factors

array([64.83785302, 62.92767598, 63.17999842, 63.20199527, 63.01659713,
       63.31245479, 63.11436768, 61.27201204, 63.23221939, 62.48529393])

In [47]:
predicted_rating = np.dot(estimated_user_factors, item_factors[item_id,:])
predicted_rating

343.7178484123175

#### The first predicted rating is a random prediction, essentially
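
The prediction is far outside the 0.5 to 5.0 rating range because every factor is positive, with mean 0.5, and each user factor sums over the whole profile. The sketch below uses a hypothetical profile of 100 ratings and shows that multiplying both factor matrices by a scale shrinks the initial prediction by the square of that scale. The value 0.01 is an arbitrary illustration, not a setting used in this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)
num_factors, profile_length = 10, 100

# Hypothetical profile: 100 ratings between 0.5 and 5.0 in steps of 0.5
ratings = rng.integers(1, 11, size=profile_length) * 0.5

# Uniform [0, 1) draws, the same distribution np.random.random produces
base_W = rng.random((profile_length, num_factors))   # rows of W for the profile items
base_h = rng.random(num_factors)                     # factors of the target item

def initial_prediction(scale):
    """Prediction for one item when both factor matrices are multiplied by `scale`"""
    user_factors = ratings @ (scale * base_W) / np.sqrt(profile_length)
    return user_factors @ (scale * base_h)

prediction_unit_scale = initial_prediction(1.0)
prediction_small_scale = initial_prediction(0.01)

print("Scale 1.00: {:.2f}".format(prediction_unit_scale))
print("Scale 0.01: {:.4f}".format(prediction_small_scale))
```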

## Step 3: We compute the prediction error and update the latent factor matrices

In [48]:
prediction_error = rating - predicted_rating
prediction_error

-340.7178484123175

### The error is negative, so we need to decrease the prediction our model computes. Meaning, we have to decrease the values of the latent factor matrices

### Which latent factors do we modify? All the factors of the item and user we used

In [49]:
# Copy original value to avoid messing up the updates
H_all = item_factors.copy()
W_u = user_profile_factors[user_profile.indices,:]

In [50]:
H_all

array([[0.05861773, 0.7651285 , 0.70024038, ..., 0.481705  , 0.81016677,
        0.1152574 ],
       [0.40387444, 0.24274421, 0.74289589, ..., 0.5830678 , 0.94015999,
        0.77827218],
       [0.86464517, 0.60128236, 0.20057371, ..., 0.70083542, 0.45258294,
        0.02065533],
       ...,
       [0.80002861, 0.32511835, 0.54157556, ..., 0.04413489, 0.88323551,
        0.7875761 ],
       [0.80074544, 0.49147636, 0.26592538, ..., 0.93730028, 0.86763265,
        0.94898977],
       [0.86186482, 0.66649598, 0.48786971, ..., 0.59470717, 0.24553824,
        0.25834748]])

In [51]:
W_u

array([[0.86063324, 0.01208739, 0.38109799, ..., 0.70819048, 0.88845714,
        0.53291314],
       [0.67079319, 0.85719395, 0.0350964 , ..., 0.28510831, 0.86279411,
        0.59299786],
       [0.88044625, 0.66813129, 0.60754717, ..., 0.92434201, 0.08687705,
        0.66332509],
       ...,
       [0.81937852, 0.20340655, 0.6920163 , ..., 0.0222942 , 0.09001285,
        0.57903703],
       [0.8099745 , 0.4414923 , 0.69282439, ..., 0.81132473, 0.48334043,
        0.00612573],
       [0.54257684, 0.18361609, 0.11396505, ..., 0.7562293 , 0.91916405,
        0.78030075]])

#### Apply the update rule

In [52]:
learning_rate = 1e-4
regularization = 1e-5

In [53]:
user_factors_update = prediction_error * user_profile.dot(H_all) - regularization * W_u
user_factors_update

array([[-769046.78113353, -793623.69665649, -796064.23861583, ...,
        -791028.89924351, -797009.41242191, -765061.1628353 ],
       [-769046.78113163, -793623.69666494, -796064.23861237, ...,
        -791028.89923928, -797009.41242166, -765061.1628359 ],
       [-769046.78113373, -793623.69666305, -796064.23861809, ...,
        -791028.89924567, -797009.4124139 , -765061.16283661],
       ...,
       [-769046.78113312, -793623.6966584 , -796064.23861894, ...,
        -791028.89923665, -797009.41241393, -765061.16283576],
       [-769046.78113302, -793623.69666078, -796064.23861895, ...,
        -791028.89924454, -797009.41241786, -765061.16283003],
       [-769046.78113035, -793623.6966582 , -796064.23861316, ...,
        -791028.89924399, -797009.41242222, -765061.16283778]])

In [54]:
item_factors_update = prediction_error * W_u - regularization * item_factors[user_profile.indices]
item_factors_update

array([[-293.23311105,   -4.11839155, -129.84689321, ..., -241.29314302,
        -302.7132139 , -181.57302735],
       [-228.55121174, -292.06128677,  -11.95797951, ...,  -97.14149736,
        -293.9693559 , -202.04495825],
       [-299.98375349, -227.64425954, -207.00217296, ..., -314.93982163,
         -29.60056545, -226.00670004],
       ...,
       [-279.17688881,  -69.30424544, -235.7823044 , ...,   -7.59603791,
         -30.66899336, -197.28825183],
       [-275.97277076, -150.42431682, -236.05764197, ..., -276.43281766,
        -164.68271683,   -2.08714724],
       [-184.86562407,  -62.56128159,  -38.82993362, ..., -257.66082309,
        -313.17560579, -265.8623991 ]])

In [55]:
user_profile_factors[user_profile.indices,:] += learning_rate * user_factors_update 
item_factors[user_profile.indices,:] += learning_rate * item_factors_update
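
For comparison, here is a sketch of the update obtained by differentiating the single-sample squared error directly from the prediction formula used in this notebook. It is not the update applied in the cells above: each profile row of `user_profile_factors` moves along the target item's factors, weighted by the rating $r_{uj}$, and only the target item's row of `item_factors` is changed. The toy URM, the `sgd_step` helper and the hyperparameter values are assumptions made for illustration.

```python
import numpy as np
import scipy.sparse as sps

rng = np.random.default_rng(0)
n_items, num_factors = 6, 3

# Hypothetical toy URM with 4 users and 6 items
URM = sps.csr_matrix(np.array([
    [5., 0., 3., 0., 1., 0.],
    [0., 4., 0., 2., 0., 0.],
    [1., 0., 0., 5., 0., 3.],
    [0., 2., 4., 0., 0., 5.]]))

user_profile_factors = rng.random((n_items, num_factors)) * 0.1
item_factors = rng.random((n_items, num_factors)) * 0.1

def predict(user_id, item_id):
    profile = URM[user_id]
    user_factors = profile.dot(user_profile_factors).ravel() / np.sqrt(profile.nnz)
    return np.dot(user_factors, item_factors[item_id])

def sgd_step(user_id, item_id, rating, learning_rate=1e-2, regularization=1e-5):
    profile = URM[user_id]
    norm = np.sqrt(profile.nnz)
    user_factors = profile.dot(user_profile_factors).ravel() / norm
    error = rating - np.dot(user_factors, item_factors[item_id])

    # Keep the old values so that both updates use the same starting point
    H_i = item_factors[item_id].copy()
    W_u = user_profile_factors[profile.indices]   # fancy indexing returns a copy

    # d(error^2)/dW_j is proportional to -error * r_uj * H_i / sqrt(|profile|)
    W_update = error * np.outer(profile.data, H_i) / norm - regularization * W_u
    # d(error^2)/dH_i is proportional to -error * user_factors
    H_update = error * user_factors - regularization * H_i

    user_profile_factors[profile.indices] += learning_rate * W_update
    item_factors[item_id] += learning_rate * H_update
    return error

user_id, item_id, rating = 0, 2, 3.0
error_before = rating - predict(user_id, item_id)
sgd_step(user_id, item_id, rating)
error_after = rating - predict(user_id, item_id)

print("Error before {:.4f}, after {:.4f}".format(error_before, error_after))
```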

### Let's check what the new prediction for the same user-item interaction would be

In [57]:
estimated_user_factors = user_profile.dot(user_profile_factors).ravel()/np.sqrt(user_profile.nnz)

predicted_rating = np.dot(estimated_user_factors, item_factors[item_id,:])
predicted_rating

335.8291625846479

### The value is lower than before and closer to the true rating, so we are moving in the right direction

### And now? Sample another interaction and repeat... a lot of times

### WARNING: Initialization must be done with random non-zero values ... otherwise

In [72]:
user_profile_factors = np.zeros((n_items, num_factors))
item_factors = np.zeros((n_items, num_factors))

user_profile = URM_train[user_id]
estimated_user_factors = user_profile.dot(user_profile_factors).ravel()/np.sqrt(user_profile.nnz)
predicted_rating = np.dot(estimated_user_factors, item_factors[item_id,:])

print("Prediction is {:.2f}".format(predicted_rating))

prediction_error = rating - predicted_rating

print("Prediction error is {:.2f}".format(prediction_error))


Prediction is 0.00
Prediction error is 3.00


In [73]:
H_all = item_factors.copy()
W_u = user_profile_factors[user_profile.indices,:]

user_profile_factors[user_profile.indices,:] += learning_rate * (prediction_error * user_profile.dot(H_all) - regularization * W_u)
item_factors[user_profile.indices,:] += learning_rate * (prediction_error * W_u - regularization * item_factors[user_profile.indices])

In [74]:
estimated_user_factors = user_profile.dot(user_profile_factors).ravel()/np.sqrt(user_profile.nnz)
predicted_rating = np.dot(estimated_user_factors, item_factors[item_id,:])

print("Prediction after the update is {:.2f}".format(predicted_rating))
print("Prediction error is {:.2f}".format(rating - predicted_rating))

Prediction after the update is 0.00
Prediction error is 3.00


### Since the matrices are multiplied, the update of each one is proportional to the values of the other. If we initialize both of them to zero, the updates will always be zero and the model will not be able to learn.
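
The same effect can be reproduced on a tiny dense example. The sketch below uses a hypothetical four-item profile and a plain single-sample gradient step without regularization, which is a simplification of the update used above. Starting from all zeros the prediction stays at zero, while small random values let it move towards the rating.

```python
import numpy as np

def one_update(W, H, profile, item_id, rating, learning_rate=1e-2):
    """Apply one gradient step on a single sample and return the new prediction"""
    W, H = W.copy(), H.copy()
    norm = np.sqrt(np.count_nonzero(profile))
    user_factors = profile @ W / norm
    error = rating - user_factors @ H[item_id]
    H_i = H[item_id].copy()
    # Rows of W for items outside the profile get a zero update
    W += learning_rate * error * np.outer(profile, H_i) / norm
    H[item_id] += learning_rate * error * user_factors
    return (profile @ W / norm) @ H[item_id]

profile = np.array([5.0, 0.0, 3.0, 1.0])   # hypothetical user profile
item_id, rating = 2, 3.0
n_items, num_factors = profile.shape[0], 3

# Zero initialization: both updates are multiplied by zero
zeros = np.zeros((n_items, num_factors))
prediction_zero_init = one_update(zeros, zeros, profile, item_id, rating)

# Small random initialization: the prediction moves towards the rating
rng = np.random.default_rng(0)
W0 = rng.random((n_items, num_factors)) * 0.1
H0 = rng.random((n_items, num_factors)) * 0.1
prediction_before = (profile @ W0 / np.sqrt(np.count_nonzero(profile))) @ H0[item_id]
prediction_random_init = one_update(W0, H0, profile, item_id, rating)

print("Zero init:   {:.4f}".format(prediction_zero_init))
print("Random init: {:.4f} -> {:.4f}".format(prediction_before, prediction_random_init))
```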

### Let's put all together in a training loop.

In [91]:
URM_train_coo = URM_train.tocoo()

num_factors = 10
learning_rate = 1e-9    # Notice the low learning rate
regularization = 1e-1   # Notice the high regularization

user_profile_factors = np.random.random((n_items, num_factors))
item_factors = np.random.random((n_items, num_factors))

loss = 0.0
start_time = time.time()

for sample_num in range(1000000):
    
    # Randomly pick sample
    sample_index = np.random.randint(URM_train_coo.nnz)

    user_id = URM_train_coo.row[sample_index]
    item_id = URM_train_coo.col[sample_index]
    rating = URM_train_coo.data[sample_index]
    
    # Compute prediction
    user_profile = URM_train[user_id]
    
    if user_profile.nnz == 0:
        continue 
        
    estimated_user_factors = user_profile.dot(user_profile_factors).ravel()/np.sqrt(user_profile.nnz)
    predicted_rating = np.dot(estimated_user_factors, item_factors[item_id,:])
        
    # Compute prediction error, or gradient
    prediction_error = rating - predicted_rating
    loss += prediction_error**2

    if np.isnan(loss):
        break 
        
    # Copy original value to avoid messing up the updates
    H_all = item_factors.copy()
    W_u = user_profile_factors[user_profile.indices,:]
    
    # Apply the updates
    user_factors_update = prediction_error * user_profile.dot(H_all) - regularization * W_u
    item_factors_update = prediction_error * W_u - regularization * item_factors[user_profile.indices]
    
    user_profile_factors[user_profile.indices,:] += learning_rate * user_factors_update 
    item_factors[user_profile.indices,:] += learning_rate * item_factors_update    
    
    # Print some stats
    if (sample_num +1)% 100000 == 0:
        elapsed_time = time.time() - start_time
        samples_per_second = sample_num/elapsed_time
        print("Iteration {} in {:.2f} seconds, loss is {:.2f}. Samples per second {:.2f}".format(sample_num+1, elapsed_time, loss/sample_num, samples_per_second))

Iteration 100000 in 34.51 seconds, loss is 1871.05. Samples per second 2897.45
Iteration 200000 in 68.44 seconds, loss is 1008.91. Samples per second 2922.15
Iteration 300000 in 102.65 seconds, loss is 687.28. Samples per second 2922.64
Iteration 400000 in 136.81 seconds, loss is 524.98. Samples per second 2923.71
Iteration 500000 in 172.00 seconds, loss is 427.02. Samples per second 2907.05
Iteration 600000 in 206.93 seconds, loss is 361.07. Samples per second 2899.48
Iteration 700000 in 241.92 seconds, loss is 313.46. Samples per second 2893.46
Iteration 800000 in 276.11 seconds, loss is 277.43. Samples per second 2897.43
Iteration 900000 in 310.14 seconds, loss is 249.20. Samples per second 2901.90
Iteration 1000000 in 344.65 seconds, loss is 226.47. Samples per second 2901.46


### What do we see? The loss generally goes down but may oscillate a bit.
### With higher learning rates or lower regularization you may see numerical instability (i.e., the loss suddenly explodes and then becomes nan, at which point some model parameters will also become nan and the model is ruined)
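
Why the learning rate matters can be seen on a one-parameter toy problem, which is an illustration and not the AsySVD model itself. When minimizing $(r - xw)^2$ with gradient descent, each step multiplies the error by $(1 - 2lx^2)$: if that factor is larger than 1 in absolute value, the error grows at every step instead of shrinking. The values of `x`, `r` and the two learning rates below are arbitrary.

```python
def run_gradient_descent(learning_rate, steps=20):
    """Minimize (r - x*w)^2 for a single scalar weight w, return the error history"""
    x, r, w = 10.0, 3.0, 0.0
    errors = []
    for _ in range(steps):
        error = r - x * w
        errors.append(error)
        # The gradient of the squared error with respect to w is -2 * x * error
        w += learning_rate * 2 * x * error
    return errors

# Error is multiplied by 0.8 at each step: it converges
errors_stable = run_gradient_descent(learning_rate=1e-3)
# Error is multiplied by -3 at each step: it oscillates and explodes
errors_unstable = run_gradient_descent(learning_rate=2e-2)

print("Stable:   first {:.2f}, last {:.2e}".format(errors_stable[0], errors_stable[-1]))
print("Unstable: first {:.2f}, last {:.2e}".format(errors_unstable[0], errors_unstable[-1]))
```

In AsySVD the role of $x$ is played by sums over the whole user profile, which can be large. This is consistent with the very low learning rate needed in the training loop above.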

### How long do we train such a model?

* An epoch: a complete loop over all the train data
* Usually you train for multiple epochs. Depending on the algorithm and data, tens or hundreds of epochs.

In [94]:
estimated_seconds = 8e6 * 10 / samples_per_second
print("Estimated time with the previous training speed is {:.2f} seconds ({:.2f} minutes, {:.2f} hours)".format(estimated_seconds, estimated_seconds/60, estimated_seconds/3600))

Estimated time with the previous training speed is 27572.30 seconds (459.54 minutes, 7.66 hours)


### AsySVD can be very slow. Each sample requires computing dozens (or hundreds) of dot products. Cython does not help in this case because most of the computational cost is already in operations vectorized by numpy. Tools such as PyTorch may become useful in this case because they allow these operations to be parallelized better.
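
Even without a new framework, part of the cost can be moved into larger matrix operations by processing several samples at once. The sketch below is a NumPy/SciPy illustration on a small synthetic URM. It only covers the prediction step, and the `predict_batch` helper is a hypothetical name, not part of the course framework. It checks that one sparse product for a whole batch gives the same predictions as the per-sample computation.

```python
import numpy as np
import scipy.sparse as sps

rng = np.random.default_rng(0)
n_users, n_items, num_factors = 50, 30, 4

# Synthetic sparse URM, only for illustration
URM = sps.random(n_users, n_items, density=0.2, format="csr", random_state=0)

user_profile_factors = rng.random((n_items, num_factors))
item_factors = rng.random((n_items, num_factors))

def predict_one(user_id, item_id):
    """Per-sample prediction, as in the training loop above"""
    profile = URM[user_id]
    if profile.nnz == 0:
        return 0.0
    user_factors = profile.dot(user_profile_factors).ravel() / np.sqrt(profile.nnz)
    return np.dot(user_factors, item_factors[item_id])

def predict_batch(user_ids, item_ids):
    """Predictions for many (user, item) pairs with a single sparse product"""
    profiles = URM[user_ids]                           # sparse, (batch, n_items)
    profile_length = np.diff(profiles.indptr)          # stored ratings per row
    scale = np.sqrt(np.maximum(profile_length, 1))     # avoid dividing by zero
    user_factors = profiles.dot(user_profile_factors) / scale[:, None]
    # Row-wise dot product between user factors and the selected item factors
    return np.einsum("bk,bk->b", user_factors, item_factors[item_ids])

user_ids = rng.integers(0, n_users, size=8)
item_ids = rng.integers(0, n_items, size=8)

batch_predictions = predict_batch(user_ids, item_ids)
loop_predictions = np.array([predict_one(u, i) for u, i in zip(user_ids, item_ids)])

print(np.allclose(batch_predictions, loop_predictions))
```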