# Recommender Systems 2021/22

### Practice - Implicit Alternating Least Squares

See:
Y. Hu, Y. Koren and C. Volinsky, Collaborative filtering for implicit feedback datasets, ICDM 2008.
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.167.5120&rep=rep1&type=pdf

R. Pan et al., One-class collaborative filtering, ICDM 2008.
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.306.4684&rep=rep1&type=pdf

Factorization model for binary feedback.
First, splits the feedback matrix R as the element-wise a Preference matrix P and a Confidence matrix C.
Then computes the decomposition of them into the dot product of two matrices X and Y of latent factors.
X represent the user latent factors, Y the item latent factors.

The model is learned by solving the following regularized Least-squares objective function with Stochastic Gradient Descent
    
$$\frac{1}{2}\sum_{i,j}{c_{ij}\left(p_{ij}-x_i^T y_j\right) + \lambda\left(\sum_{i}{||x_i||^2} + \sum_{j}{||y_j||^2}\right)}$$


In [1]:
import time
import numpy as np

In [2]:
from Data_manager.split_functions.split_train_validation_random_holdout import split_train_in_two_percentage_global_sample
from Data_manager.Movielens.Movielens10MReader import Movielens10MReader

data_reader = Movielens10MReader()
data_loaded = data_reader.load_data()

URM_all = data_loaded.get_URM_all()

URM_train, URM_test = split_train_in_two_percentage_global_sample(URM_all, train_percentage = 0.80)

Movielens10M: Verifying data consistency...
Movielens10M: Verifying data consistency... Passed!
DataReader: current dataset is: <class 'Data_manager.Dataset.Dataset'>
	Number of items: 10681
	Number of users: 69878
	Number of interactions in URM_all: 10000054
	Value range in URM_all: 0.50-5.00
	Interaction density: 1.34E-02
	Interactions per user:
		 Min: 2.00E+01
		 Avg: 1.43E+02
		 Max: 7.36E+03
	Interactions per item:
		 Min: 0.00E+00
		 Avg: 9.36E+02
		 Max: 3.49E+04
	Gini Index: 0.57

	ICM name: ICM_genres, Value range: 1.00 / 1.00, Num features: 20, feature occurrences: 21564, density 1.01E-01
	ICM name: ICM_all, Value range: 1.00 / 69.00, Num features: 10126, feature occurrences: 128384, density 1.19E-03
	ICM name: ICM_year, Value range: 6.00E+00 / 2.01E+03, Num features: 1, feature occurrences: 10681, density 1.00E+00
	ICM name: ICM_tags, Value range: 1.00 / 69.00, Num features: 10106, feature occurrences: 106820, density 9.90E-04




In [3]:
URM_train

<69878x10681 sparse matrix of type '<class 'numpy.float64'>'
	with 8000043 stored elements in Compressed Sparse Row format>

### What do we need for IALS?

* User factor and Item factor matrices
* Confidence function
* Update rule for items
* Update rule for users
* Training loop and some patience


In [4]:
n_users, n_items = URM_train.shape

## Step 1: We create the dense latent factor matrices
### In a MF model you have two matrices, one with a row per user and the other with a column per item. The other dimension, columns for the first one and rows for the second one is called latent factors

In [5]:
num_factors = 10

user_factors = np.random.random((n_users, num_factors))
item_factors = np.random.random((n_items, num_factors))

In [6]:
user_factors

array([[0.68170924, 0.45510017, 0.38803449, ..., 0.68549916, 0.46539247,
        0.06620504],
       [0.06305243, 0.95881947, 0.06663733, ..., 0.04360239, 0.07490854,
        0.22469506],
       [0.04097471, 0.39420552, 0.5667942 , ..., 0.79911491, 0.08627878,
        0.5937455 ],
       ...,
       [0.11816292, 0.4737209 , 0.7678413 , ..., 0.60987589, 0.29662157,
        0.18942834],
       [0.66924699, 0.03590675, 0.07888202, ..., 0.46117881, 0.72213628,
        0.23674881],
       [0.20848604, 0.04554099, 0.28709882, ..., 0.25089374, 0.89577473,
        0.03565192]])

In [7]:
item_factors

array([[0.15069107, 0.80681573, 0.30778298, ..., 0.14677423, 0.23341278,
        0.37704531],
       [0.92211302, 0.44460682, 0.7913981 , ..., 0.78896563, 0.1446843 ,
        0.92989713],
       [0.23538478, 0.99340528, 0.81193351, ..., 0.845437  , 0.74659047,
        0.17975963],
       ...,
       [0.12806035, 0.5234029 , 0.19163775, ..., 0.40652814, 0.60689866,
        0.15534618],
       [0.30970641, 0.62823476, 0.67904598, ..., 0.46228659, 0.09578545,
        0.60984898],
       [0.92799839, 0.37126583, 0.74574018, ..., 0.98925788, 0.67638146,
        0.51500969]])

## Step 2: We define a function to transform the interaction data in a "confidence" value. 
* If you have explicit data, the higher it is the higher the confidence (logarithmic, linear?)
* Other options include scaling the data lowering it if the item or use has very few interactions (lower support)

In [8]:
def linear_confidence_function(URM_train, alpha):
    
    URM_train.data = 1.0 + alpha*URM_train.data
    
    return URM_train

In [9]:
alpha = 0.5
C_URM_train = linear_confidence_function(URM_train, alpha)

C_URM_train.data[:10]

array([3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5])

The concept of confidence can be defined in different ways, for example in terms of the number of interactions an item or a user has, the more they have the more support your model will have for the respective latent factors.

In [10]:
def popularity_confidence(URM_train):
    
    item_popularity = np.ediff1d(URM_train.tocsc().indptr)
    item_confidence = np.zeros(len(item_popularity))
    item_confidence[item_popularity!=0] = np.log(item_popularity[item_popularity!=0])
    
    C_URM_train = URM_train.copy()
    C_URM_train = C_URM_train.tocsc()
    
    for item_id in range(C_URM_train.shape[1]):
        start_pos = C_URM_train.indices[item_id]
        end_pos = C_URM_train.indices[item_id+1]
        
        C_URM_train.data[start_pos:end_pos] = item_confidence[item_id]
    
    C_URM_train = C_URM_train.tocsr()
    
    return C_URM_train

In [11]:
C_URM_train = popularity_confidence(URM_train)

C_URM_train.data[:10]

array([4.57471098, 4.41884061, 5.70711026, 1.79175947, 7.47986413,
       3.5       , 3.5       , 3.5       , 3.5       , 3.5       ])

## Step 3: Define the update rules for the user factors


Update latent factors for a single user or item.

Y = |n_interactions|x|n_factors|

YtY =   |n_factors|x|n_factors|



Latent factors ony of item/users for which an interaction exists in the interaction profile
Y_interactions = Y[interaction_profile, :]

Following the notation of the original paper we report the update rule for the Item factors (User factors are identical):
* __Y__ are the item factors |n_items|x|n_factors|
* __Cu__ is a diagonal matrix |n_interactions|x|n_interactions| with the user confidence for the observed items
* __p(u)__ is a boolean vectors indexing only observed items. Here it will disappear as we already extract only the observed latent factors however, it will have an impact in the dimensions of the matrix, since it transforms Cu from a diagonal matrix to a row vector of 1 row and |n_interactions| columns

$$(Yt*Cu*Y + reg*I)^-1 * Yt*Cu*profile$$ which can be decomposed as $$(YtY + Yt*(Cu-I)*Y + reg*I)^-1 * Yt*Cu*p(u)$$ 

* __A__ = (|n_interactions|x|n_factors|) dot (|n_interactions|x|n_interactions| ) dot (|n_interactions|x|n_factors| )
  = |n_factors|x|n_factors|
  
We use an equivalent formulation (v * k.T).T which is much faster
* __A__ = Y_interactions.T.dot(((interaction_confidence - 1) * Y_interactions.T).T)
* __B__ = YtY + A + self.regularization_diagonal
* __new factors__ = np.dot(np.linalg.inv(B), Y_interactions.T.dot(interaction_confidence))


In [12]:
def _update_row(interaction_profile, interaction_confidence, Y, YtY, regularization_diagonal):

    Y_interactions = Y[interaction_profile, :]
    
    A = Y_interactions.T.dot(((interaction_confidence - 1) * Y_interactions.T).T)

    B = YtY + A + regularization_diagonal

    return np.dot(np.linalg.inv(B), Y_interactions.T.dot(interaction_confidence))


In [13]:
regularization_coefficient = 1e-4

regularization_diagonal = np.diag(regularization_coefficient * np.ones(num_factors))
regularization_diagonal

array([[0.0001, 0.    , 0.    , 0.    , 0.    , 0.    , 0.    , 0.    ,
        0.    , 0.    ],
       [0.    , 0.0001, 0.    , 0.    , 0.    , 0.    , 0.    , 0.    ,
        0.    , 0.    ],
       [0.    , 0.    , 0.0001, 0.    , 0.    , 0.    , 0.    , 0.    ,
        0.    , 0.    ],
       [0.    , 0.    , 0.    , 0.0001, 0.    , 0.    , 0.    , 0.    ,
        0.    , 0.    ],
       [0.    , 0.    , 0.    , 0.    , 0.0001, 0.    , 0.    , 0.    ,
        0.    , 0.    ],
       [0.    , 0.    , 0.    , 0.    , 0.    , 0.0001, 0.    , 0.    ,
        0.    , 0.    ],
       [0.    , 0.    , 0.    , 0.    , 0.    , 0.    , 0.0001, 0.    ,
        0.    , 0.    ],
       [0.    , 0.    , 0.    , 0.    , 0.    , 0.    , 0.    , 0.0001,
        0.    , 0.    ],
       [0.    , 0.    , 0.    , 0.    , 0.    , 0.    , 0.    , 0.    ,
        0.0001, 0.    ],
       [0.    , 0.    , 0.    , 0.    , 0.    , 0.    , 0.    , 0.    ,
        0.    , 0.0001]])

In [14]:
# VV = n_factors x n_factors
VV = item_factors.T.dot(item_factors)
VV.shape

(10, 10)

In [15]:
user_id = 154

In [16]:
C_URM_train = linear_confidence_function(URM_train, alpha)

start_pos = C_URM_train.indptr[user_id]
end_pos = C_URM_train.indptr[user_id + 1]

user_profile = C_URM_train.indices[start_pos:end_pos]
user_confidence = C_URM_train.data[start_pos:end_pos]

user_factors[user_id, :] = _update_row(user_profile, user_confidence, item_factors, VV, regularization_diagonal)

## Step 4: Apply updates on the user item factors as well

In [17]:
# UU = n_factors x n_factors
UU = user_factors.T.dot(user_factors)
UU.shape

(10, 10)

In [18]:
item_id = 154

In [19]:
C_URM_train_csc = C_URM_train.tocsc()

start_pos = C_URM_train_csc.indptr[item_id]
end_pos = C_URM_train_csc.indptr[item_id + 1]

item_profile = C_URM_train_csc.indices[start_pos:end_pos]
item_confidence = C_URM_train_csc.data[start_pos:end_pos]

item_factors[item_id, :] = _update_row(item_profile, item_confidence, user_factors, UU, regularization_diagonal)

### Let's put all together in a training loop.

In [20]:
C_URM_train_csc = C_URM_train.tocsc()

num_factors = 10

user_factors = np.random.random((n_users, num_factors))
item_factors = np.random.random((n_items, num_factors))


for n_epoch in range(10):
    
    start_time = time.time()

    VV = item_factors.T.dot(item_factors)
        
    for user_id in range(C_URM_train.shape[0]):

        start_pos = C_URM_train.indptr[user_id]
        end_pos = C_URM_train.indptr[user_id + 1]

        user_profile = C_URM_train.indices[start_pos:end_pos]
        user_confidence = C_URM_train.data[start_pos:end_pos]
        
        user_factors[user_id, :] = _update_row(user_profile, user_confidence, item_factors, VV, regularization_diagonal)   

        # Print some stats
        if (user_id +1)% 100000 == 0 or user_id == C_URM_train.shape[0]-1:
            elapsed_time = time.time() - start_time
            samples_per_second = user_id/elapsed_time
            print("Iteration {} in {:.2f} seconds. Users per second {:.2f}".format(user_id+1, elapsed_time, samples_per_second))
    
    UU = user_factors.T.dot(user_factors)

    for item_id in range(C_URM_train.shape[1]):

        start_pos = C_URM_train_csc.indptr[item_id]
        end_pos = C_URM_train_csc.indptr[item_id + 1]

        item_profile = C_URM_train_csc.indices[start_pos:end_pos]
        item_confidence = C_URM_train_csc.data[start_pos:end_pos]

        item_factors[item_id, :] = _update_row(item_profile, item_confidence, user_factors, UU, regularization_diagonal)    

        # Print some stats
        if (item_id +1)% 100000 == 0 or item_id == C_URM_train.shape[1]-1:
            elapsed_time = time.time() - start_time
            samples_per_second = item_id/elapsed_time
            print("Iteration {} in {:.2f} seconds. Items per second {:.2f}".format(item_id+1, elapsed_time, samples_per_second))

    total_epoch_time = time.time() - start_time  
    print("Epoch {} complete in in {:.2f} seconds".format(n_epoch+1, total_epoch_time))


Iteration 69878 in 56.18 seconds. Users per second 1243.74
Iteration 10681 in 75.47 seconds. Items per second 141.51
Epoch 1 complete in in 75.47 seconds
Iteration 69878 in 62.83 seconds. Users per second 1112.08
Iteration 10681 in 78.29 seconds. Items per second 136.41
Epoch 2 complete in in 78.29 seconds
Iteration 69878 in 50.60 seconds. Users per second 1380.85
Iteration 10681 in 68.83 seconds. Items per second 155.17
Epoch 3 complete in in 68.83 seconds
Iteration 69878 in 51.41 seconds. Users per second 1359.21
Iteration 10681 in 61.39 seconds. Items per second 173.97
Epoch 4 complete in in 61.39 seconds
Iteration 69878 in 24.50 seconds. Users per second 2852.00
Iteration 10681 in 30.12 seconds. Items per second 354.53
Epoch 5 complete in in 30.12 seconds
Iteration 69878 in 25.73 seconds. Users per second 2715.40
Iteration 10681 in 33.49 seconds. Items per second 318.94
Epoch 6 complete in in 33.49 seconds
Iteration 69878 in 15.25 seconds. Users per second 4580.85
Iteration 10681 i

### How long do we train such a model?

* An epoch: a complete loop over all the train data
* Usually you train for multiple epochs. Depending on the algorithm and data 10s or 100s of epochs.

In [21]:
estimated_seconds = total_epoch_time*10
print("Estimated time with the previous training speed is {:.2f} seconds, or {:.2f} minutes".format(estimated_seconds, estimated_seconds/60))

Estimated time with the previous training speed is 668.00 seconds, or 11.13 minutes


## Lastly: Computing a prediction for any given user or item

In [22]:
user_id = 17025
item_id = 468

In [23]:
predicted_rating = np.dot(user_factors[user_id,:], item_factors[item_id,:])
predicted_rating

1.0313089655029795