# Recommender Systems 2021/22

### Practice - Implicit Alternating Least Squares

See:
Y. Hu, Y. Koren and C. Volinsky, Collaborative filtering for implicit feedback datasets, ICDM 2008.
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.167.5120&rep=rep1&type=pdf

R. Pan et al., One-class collaborative filtering, ICDM 2008.
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.306.4684&rep=rep1&type=pdf

Factorization model for binary feedback.
First, it splits the feedback matrix R into the element-wise product of a preference matrix P and a confidence matrix C.
Then it computes their decomposition into the dot product of two latent factor matrices, X and Y.
X represents the user latent factors, Y the item latent factors.

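The model is learned by minimizing the following regularized least-squares objective function with Alternating Least Squares: the user factors are solved in closed form while the item factors are held fixed, and vice versa.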
    
$$\frac{1}{2}\sum_{i,j}{c_{ij}\left(p_{ij}-x_i^T y_j\right)^2} + \lambda\left(\sum_{i}{||x_i||^2} + \sum_{j}{||y_j||^2}\right)$$
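
As a minimal sketch of how this objective could be evaluated, the block below computes the confidence-weighted squared error plus the regularization term on a tiny dense matrix. It assumes the definitions from Hu et al. (preference 1 where an interaction exists, confidence 1 + alpha * r). The function `ials_loss` and the toy data are illustrative names, not part of the course code.

```python
import numpy as np

def ials_loss(R, X, Y, alpha, reg):
    """Evaluate the weighted least-squares objective on a small dense matrix R (hypothetical helper)."""
    P = (R > 0).astype(np.float64)   # preference: 1 where an interaction exists, 0 elsewhere
    C = 1.0 + alpha * R              # confidence: 1 for unobserved cells, larger for observed ones
    error = P - X.dot(Y.T)           # p_ij - x_i^T y_j for every user-item pair
    data_term = 0.5 * np.sum(C * error ** 2)
    reg_term = reg * (np.sum(X ** 2) + np.sum(Y ** 2))
    return data_term + reg_term

# Toy data: 2 users, 3 items, 2 latent factors (all values are made up)
R_toy = np.array([[1.0, 0.0, 2.0],
                  [0.0, 1.0, 0.0]])
X_toy = np.zeros((2, 2))
Y_toy = np.zeros((3, 2))

# With all factors at zero, only the observed cells contribute to the loss
loss_at_zero = ials_loss(R_toy, X_toy, Y_toy, alpha=0.5, reg=0.1)
print(loss_at_zero)
```

A dense evaluation like this only scales to toy data, since it touches every user-item pair; it is meant for checking intuition, not for the full URM.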


In [1]:
import time
import numpy as np

In [2]:
import scipy.sparse as sps

from Data_manager.split_functions.split_train_validation_random_holdout import \
    split_train_in_two_percentage_global_sample
from challenge.utils.functions import read_data, evaluate_algorithm, generate_submission_csv


data_file_path = '../challenge/input_files/data_train.csv'
users_file_path = '../challenge/input_files/data_target_users_test.csv'
URM_all_dataframe, users_list = read_data(data_file_path, users_file_path)

URM_all = sps.coo_matrix(
    (URM_all_dataframe['Data'].values, (URM_all_dataframe['UserID'].values, URM_all_dataframe['ItemID'].values)))
URM_all = URM_all.tocsr()

URM_train, URM_test = split_train_in_two_percentage_global_sample(URM_all, train_percentage=0.80)



In [3]:
URM_train

<13025x22348 sparse matrix of type '<class 'numpy.float64'>'
	with 382984 stored elements in Compressed Sparse Row format>

### What do we need for IALS?

* User factor and Item factor matrices
* Confidence function
* Update rule for items
* Update rule for users
* Training loop and some patience


In [4]:
n_users, n_items = URM_train.shape

## Step 1: We create the dense latent factor matrices
### In an MF model you have two matrices: one with a row per user and one with a row per item (the item matrix is transposed when computing the product). The other dimension, shared by both matrices, is the number of latent factors
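
To make the shapes concrete, here is a small sketch with made-up sizes. The `_toy` names and the seeded generator are assumptions for illustration only; the notebook itself uses `np.random.random`.

```python
import numpy as np

n_users_toy, n_items_toy, n_factors_toy = 4, 6, 3

# A seeded generator makes the toy example reproducible
rng = np.random.default_rng(42)
user_factors_toy = rng.random((n_users_toy, n_factors_toy))   # one row per user
item_factors_toy = rng.random((n_items_toy, n_factors_toy))   # one row per item

# The full score matrix is users x items; the latent dimension is summed away
scores_toy = user_factors_toy.dot(item_factors_toy.T)
print(scores_toy.shape)

# A single prediction is the dot product of one user row and one item row
single_score = user_factors_toy[1].dot(item_factors_toy[4])
```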

In [5]:
num_factors = 10

user_factors = np.random.random((n_users, num_factors))
item_factors = np.random.random((n_items, num_factors))

In [6]:
user_factors

array([[0.16407308, 0.00649691, 0.6579195 , ..., 0.88932852, 0.38336941,
        0.24233853],
       [0.25380496, 0.78139655, 0.47604554, ..., 0.79615149, 0.70535599,
        0.38206734],
       [0.76390603, 0.5622817 , 0.47190413, ..., 0.30212421, 0.88263303,
        0.37869519],
       ...,
       [0.4539675 , 0.93940015, 0.91809905, ..., 0.96307741, 0.70820284,
        0.60168815],
       [0.4321185 , 0.27190809, 0.82540688, ..., 0.90222375, 0.84766443,
        0.53892239],
       [0.89662591, 0.4515939 , 0.77216422, ..., 0.35550742, 0.35134464,
        0.93892266]])

In [7]:
item_factors

array([[0.46353391, 0.59425837, 0.09775851, ..., 0.58334571, 0.06496289,
        0.04005821],
       [0.08607168, 0.12711198, 0.24479113, ..., 0.90519102, 0.01483432,
        0.37950819],
       [0.28029474, 0.64324866, 0.98812874, ..., 0.44800259, 0.45167813,
        0.15554027],
       ...,
       [0.54313751, 0.62692431, 0.3285164 , ..., 0.41258404, 0.02190922,
        0.78755353],
       [0.21891638, 0.23277344, 0.18059221, ..., 0.79882938, 0.25701765,
        0.63449884],
       [0.30587698, 0.70257865, 0.38267958, ..., 0.80949176, 0.78436452,
        0.55857392]])

## Step 2: We define a function to transform the interaction data into a "confidence" value.
* If you have explicit data, the higher the rating, the higher the confidence (the mapping could be linear, logarithmic, ...)
* Other options include scaling the confidence down if the item or user has very few interactions (lower support)
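
As a sketch of the logarithmic option, the block below applies the scaling c = 1 + alpha * log(1 + r / epsilon) proposed by Hu et al. to the stored values of a sparse matrix. The function name `log_confidence_function` and the toy matrix are assumptions for illustration; the function returns a copy and leaves its input untouched.

```python
import numpy as np
import scipy.sparse as sps

def log_confidence_function(URM, alpha, epsilon):
    """Logarithmic confidence on the stored interactions (hypothetical helper)."""
    C_URM = URM.copy()
    # log1p(x) computes log(1 + x) accurately for small x
    C_URM.data = 1.0 + alpha * np.log1p(C_URM.data.astype(np.float64) / epsilon)
    return C_URM

# Toy interaction counts spanning several orders of magnitude
URM_toy = sps.csr_matrix(np.array([[1.0, 0.0, 10.0],
                                   [0.0, 100.0, 0.0]]))
C_toy = log_confidence_function(URM_toy, alpha=1.0, epsilon=1.0)
print(C_toy.data)
```

Compared with the linear function, the logarithm compresses large values, so a user with 100 interactions on an item does not get 100 times the confidence of a user with one.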

In [8]:
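def linear_confidence_function(URM_train, alpha):
    
    # Work on a copy so that repeated calls do not compound the transformation
    C_URM_train = URM_train.copy()
    C_URM_train.data = 1.0 + alpha*C_URM_train.data
    
    return C_URM_train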

In [9]:
alpha = 0.5
C_URM_train = linear_confidence_function(URM_train, alpha)

C_URM_train.data[:10]

array([1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5])

The concept of confidence can be defined in different ways, for example in terms of the number of interactions an item or a user has. The more interactions they have, the more support your model has for the corresponding latent factors.

In [10]:
def popularity_confidence(URM_train):
    
    item_popularity = np.ediff1d(URM_train.tocsc().indptr)
    item_confidence = np.zeros(len(item_popularity))
    item_confidence[item_popularity!=0] = np.log(item_popularity[item_popularity!=0])
    
    C_URM_train = URM_train.copy()
    C_URM_train = C_URM_train.tocsc()
    
    for item_id in range(C_URM_train.shape[1]):
        start_pos = C_URM_train.indptr[item_id]
        end_pos = C_URM_train.indptr[item_id+1]
        
        C_URM_train.data[start_pos:end_pos] = item_confidence[item_id]
    
    C_URM_train = C_URM_train.tocsr()
    
    return C_URM_train

In [11]:
C_URM_train = popularity_confidence(URM_train)

C_URM_train.data[:10]

array([7.01750614, 6.69703425, 5.78382518, 5.26269019, 5.33271879,
       5.39362755, 5.36129217, 5.50125821, 5.06259503, 4.93447393])
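
The per-item loop in `popularity_confidence` can also be written without a Python loop. The following is a sketch of one possible vectorized variant, assuming the same log-popularity definition; `popularity_confidence_vectorized` and the toy matrix are illustrative names.

```python
import numpy as np
import scipy.sparse as sps

def popularity_confidence_vectorized(URM):
    """Assign log(item popularity) to every stored interaction (hypothetical helper)."""
    C_URM = URM.tocsc(copy=True)

    # In CSC format, consecutive indptr entries delimit the interactions of each item
    item_popularity = np.diff(C_URM.indptr)
    item_confidence = np.zeros(len(item_popularity))
    has_interactions = item_popularity > 0
    item_confidence[has_interactions] = np.log(item_popularity[has_interactions])

    # Repeat each item's confidence once per stored interaction of that item
    C_URM.data = np.repeat(item_confidence, item_popularity)
    return C_URM.tocsr()

# Item 0 has 3 interactions, item 1 has 2, item 2 has none
URM_toy = sps.csr_matrix(np.array([[1.0, 1.0, 0.0],
                                   [1.0, 0.0, 0.0],
                                   [1.0, 1.0, 0.0]]))
C_toy = popularity_confidence_vectorized(URM_toy)
print(C_toy.toarray())
```

Note that with this definition an item with a single interaction gets confidence log(1) = 0, so adding 1 before taking the logarithm may be worth considering.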

## Step 3: Define the update rules for the user factors


Update the latent factors for a single user or item.

Y_interactions = |n_interactions|x|n_factors|

YtY =   |n_factors|x|n_factors|

Only the latent factors of the items/users for which an interaction exists in the interaction profile are needed:
Y_interactions = Y[interaction_profile, :]

Following the notation of the original paper, we report the update rule for the user factors (the rule for the item factors is identical, with the roles of users and items swapped):
* __Y__ are the item factors, |n_items|x|n_factors|
* __Cu__ is a diagonal matrix, |n_interactions|x|n_interactions|, with the user's confidence for the observed items
* __p(u)__ is a boolean vector that indexes only the observed items. It disappears here because we already extract only the latent factors of the observed items. It still affects the dimensions, since Cu*p(u) turns the diagonal matrix Cu into a vector with |n_interactions| entries, one confidence value per observed item

$$x_u = (Y^T C^u Y + \lambda I)^{-1} Y^T C^u p(u)$$ which can be decomposed as $$x_u = (Y^T Y + Y^T (C^u - I) Y + \lambda I)^{-1} Y^T C^u p(u)$$

* __A__ = Yt*(Cu-I)*Y = (|n_factors|x|n_interactions|) dot (|n_interactions|x|n_interactions|) dot (|n_interactions|x|n_factors|)
  = |n_factors|x|n_factors|
  
We use an equivalent formulation, (v * k.T).T, which avoids building the diagonal matrix and is much faster
* __A__ = Y_interactions.T.dot(((interaction_confidence - 1) * Y_interactions.T).T)
* __B__ = YtY + A + regularization_diagonal
* __new factors__ = np.dot(np.linalg.inv(B), Y_interactions.T.dot(interaction_confidence))
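
The equivalence between the explicit diagonal product and the faster broadcast formulation can be checked numerically. This is a small sketch on random toy data; the sizes and the seed are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_interactions_toy, n_factors_toy = 5, 3

Y_interactions = rng.random((n_interactions_toy, n_factors_toy))
interaction_confidence = 1.0 + rng.random(n_interactions_toy)

# Explicit version: build the diagonal matrix (Cu - I) and multiply
Cu_minus_I = np.diag(interaction_confidence - 1)
A_explicit = Y_interactions.T.dot(Cu_minus_I).dot(Y_interactions)

# Fast version: broadcasting scales each interaction's factors by (confidence - 1)
A_fast = Y_interactions.T.dot(((interaction_confidence - 1) * Y_interactions.T).T)

print(np.allclose(A_explicit, A_fast))
```

The explicit version allocates an |n_interactions|x|n_interactions| matrix that is almost entirely zeros, which is what the broadcast form avoids.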


In [12]:
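def _update_row(interaction_profile, interaction_confidence, Y, YtY, regularization_diagonal):

    # Latent factors of the items (or users) that appear in the interaction profile
    Y_interactions = Y[interaction_profile, :]
    
    # A = Yt*(Cu-I)*Y, computed without building the diagonal matrix Cu
    A = Y_interactions.T.dot(((interaction_confidence - 1) * Y_interactions.T).T)

    # B = Yt*Cu*Y + reg*I
    B = YtY + A + regularization_diagonal

    # New factors = B^-1 * Yt*Cu*p(u)
    return np.dot(np.linalg.inv(B), Y_interactions.T.dot(interaction_confidence))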
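
As a sanity check, the sparse-aware update can be compared with the dense closed-form solution over all items, where unobserved items get confidence 1 and preference 0. The block restates `_update_row` so that it is self-contained; the toy sizes, profile and confidence values are made up.

```python
import numpy as np

def _update_row(interaction_profile, interaction_confidence, Y, YtY, regularization_diagonal):
    # Restated from the notebook so that this example runs on its own
    Y_interactions = Y[interaction_profile, :]
    A = Y_interactions.T.dot(((interaction_confidence - 1) * Y_interactions.T).T)
    B = YtY + A + regularization_diagonal
    return np.dot(np.linalg.inv(B), Y_interactions.T.dot(interaction_confidence))

rng = np.random.default_rng(1)
n_items_toy, n_factors_toy, reg_toy = 6, 3, 0.1
Y_toy = rng.random((n_items_toy, n_factors_toy))

# A user who interacted with items 0, 2 and 5
profile_toy = np.array([0, 2, 5])
confidence_toy = np.array([1.5, 2.0, 3.0])

# Dense reference: full confidence diagonal and full preference vector
c_full = np.ones(n_items_toy)
c_full[profile_toy] = confidence_toy
p_full = np.zeros(n_items_toy)
p_full[profile_toy] = 1.0
Cu_full = np.diag(c_full)
reg_diag_toy = reg_toy * np.eye(n_factors_toy)
x_dense = np.linalg.solve(Y_toy.T.dot(Cu_full).dot(Y_toy) + reg_diag_toy,
                          Y_toy.T.dot(Cu_full).dot(p_full))

# Sparse-aware update using only the observed items plus the precomputed YtY
x_fast = _update_row(profile_toy, confidence_toy, Y_toy, Y_toy.T.dot(Y_toy), reg_diag_toy)

print(np.allclose(x_dense, x_fast))
```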


In [13]:
regularization_coefficient = 1e-4

regularization_diagonal = np.diag(regularization_coefficient * np.ones(num_factors))
regularization_diagonal

array([[0.0001, 0.    , 0.    , 0.    , 0.    , 0.    , 0.    , 0.    ,
        0.    , 0.    ],
       [0.    , 0.0001, 0.    , 0.    , 0.    , 0.    , 0.    , 0.    ,
        0.    , 0.    ],
       [0.    , 0.    , 0.0001, 0.    , 0.    , 0.    , 0.    , 0.    ,
        0.    , 0.    ],
       [0.    , 0.    , 0.    , 0.0001, 0.    , 0.    , 0.    , 0.    ,
        0.    , 0.    ],
       [0.    , 0.    , 0.    , 0.    , 0.0001, 0.    , 0.    , 0.    ,
        0.    , 0.    ],
       [0.    , 0.    , 0.    , 0.    , 0.    , 0.0001, 0.    , 0.    ,
        0.    , 0.    ],
       [0.    , 0.    , 0.    , 0.    , 0.    , 0.    , 0.0001, 0.    ,
        0.    , 0.    ],
       [0.    , 0.    , 0.    , 0.    , 0.    , 0.    , 0.    , 0.0001,
        0.    , 0.    ],
       [0.    , 0.    , 0.    , 0.    , 0.    , 0.    , 0.    , 0.    ,
        0.0001, 0.    ],
       [0.    , 0.    , 0.    , 0.    , 0.    , 0.    , 0.    , 0.    ,
        0.    , 0.0001]])

In [14]:
# VV = n_factors x n_factors
VV = item_factors.T.dot(item_factors)
VV.shape

(10, 10)

In [15]:
user_id = 154

In [16]:
C_URM_train = linear_confidence_function(URM_train, alpha)

start_pos = C_URM_train.indptr[user_id]
end_pos = C_URM_train.indptr[user_id + 1]

user_profile = C_URM_train.indices[start_pos:end_pos]
user_confidence = C_URM_train.data[start_pos:end_pos]

user_factors[user_id, :] = _update_row(user_profile, user_confidence, item_factors, VV, regularization_diagonal)

## Step 4: Apply the same update to the item factors

In [17]:
# UU = n_factors x n_factors
UU = user_factors.T.dot(user_factors)
UU.shape

(10, 10)

In [18]:
item_id = 154

In [19]:
C_URM_train_csc = C_URM_train.tocsc()

start_pos = C_URM_train_csc.indptr[item_id]
end_pos = C_URM_train_csc.indptr[item_id + 1]

item_profile = C_URM_train_csc.indices[start_pos:end_pos]
item_confidence = C_URM_train_csc.data[start_pos:end_pos]

item_factors[item_id, :] = _update_row(item_profile, item_confidence, user_factors, UU, regularization_diagonal)

### Let's put it all together in a training loop.

In [20]:
C_URM_train_csc = C_URM_train.tocsc()

num_factors = 10

user_factors = np.random.random((n_users, num_factors))
item_factors = np.random.random((n_items, num_factors))


for n_epoch in range(10):
    
    start_time = time.time()

    VV = item_factors.T.dot(item_factors)
        
    for user_id in range(C_URM_train.shape[0]):

        start_pos = C_URM_train.indptr[user_id]
        end_pos = C_URM_train.indptr[user_id + 1]

        user_profile = C_URM_train.indices[start_pos:end_pos]
        user_confidence = C_URM_train.data[start_pos:end_pos]
        
        user_factors[user_id, :] = _update_row(user_profile, user_confidence, item_factors, VV, regularization_diagonal)   

        # Print some stats
        if (user_id +1)% 100000 == 0 or user_id == C_URM_train.shape[0]-1:
            elapsed_time = time.time() - start_time
            samples_per_second = user_id/elapsed_time
            print("Iteration {} in {:.2f} seconds. Users per second {:.2f}".format(user_id+1, elapsed_time, samples_per_second))
    
    UU = user_factors.T.dot(user_factors)

    for item_id in range(C_URM_train.shape[1]):

        start_pos = C_URM_train_csc.indptr[item_id]
        end_pos = C_URM_train_csc.indptr[item_id + 1]

        item_profile = C_URM_train_csc.indices[start_pos:end_pos]
        item_confidence = C_URM_train_csc.data[start_pos:end_pos]

        item_factors[item_id, :] = _update_row(item_profile, item_confidence, user_factors, UU, regularization_diagonal)    

        # Print some stats (elapsed_time is measured from the start of the epoch, so it also includes the user updates)
        if (item_id +1)% 100000 == 0 or item_id == C_URM_train.shape[1]-1:
            elapsed_time = time.time() - start_time
            samples_per_second = item_id/elapsed_time
            print("Iteration {} in {:.2f} seconds. Items per second {:.2f}".format(item_id+1, elapsed_time, samples_per_second))

    total_epoch_time = time.time() - start_time  
    print("Epoch {} complete in {:.2f} seconds".format(n_epoch+1, total_epoch_time))


Iteration 13025 in 0.51 seconds. Users per second 25345.88
Iteration 22348 in 1.35 seconds. Items per second 16509.03
Epoch 1 complete in 1.35 seconds
Iteration 13025 in 0.52 seconds. Users per second 24857.77
Iteration 22348 in 1.38 seconds. Items per second 16155.71
Epoch 2 complete in 1.38 seconds
Iteration 13025 in 0.57 seconds. Users per second 22928.73
Iteration 22348 in 1.61 seconds. Items per second 13864.97
Epoch 3 complete in 1.61 seconds
Iteration 13025 in 0.64 seconds. Users per second 20352.45
Iteration 22348 in 1.69 seconds. Items per second 13219.90
Epoch 4 complete in 1.69 seconds
Iteration 13025 in 0.63 seconds. Users per second 20605.12
Iteration 22348 in 1.63 seconds. Items per second 13708.34
Epoch 5 complete in 1.63 seconds
Iteration 13025 in 0.59 seconds. Users per second 22076.34
Iteration 22348 in 1.56 seconds. Items per second 14345.86
Epoch 6 complete in 1.56 seconds
Iteration 13025 in 0.58 seconds. Users per second 22331.39
Iteration 22348 i

### How long do we train such a model?

* An epoch: a complete loop over all the training data
* Usually you train for multiple epochs: tens or hundreds, depending on the algorithm and the data.
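
One way to decide when to stop is to track the training objective after every epoch and stop once it no longer improves meaningfully. The sketch below does this on a small dense toy problem. The helpers `weighted_loss` and `als_half_step`, the tolerance and the data are all assumptions for illustration, and the loss is written in the scaling that the closed-form update minimizes exactly (no 1/2 factor).

```python
import numpy as np

def weighted_loss(P, C, X, Y, reg):
    # Confidence-weighted squared error plus L2 regularization (hypothetical helper)
    error = P - X.dot(Y.T)
    return np.sum(C * error ** 2) + reg * (np.sum(X ** 2) + np.sum(Y ** 2))

def als_half_step(P, C, fixed, reg):
    # Solve the weighted ridge regression of every row while `fixed` stays constant
    n_factors = fixed.shape[1]
    updated = np.zeros((P.shape[0], n_factors))
    for row in range(P.shape[0]):
        weighted = fixed * C[row][:, None]                    # Cu * Y
        B = fixed.T.dot(weighted) + reg * np.eye(n_factors)   # Yt*Cu*Y + reg*I
        updated[row] = np.linalg.solve(B, weighted.T.dot(P[row]))
    return updated

rng = np.random.default_rng(0)
R_toy = (rng.random((20, 30)) < 0.2).astype(np.float64)   # random binary interactions
P_toy = (R_toy > 0).astype(np.float64)
C_toy = 1.0 + 0.5 * R_toy
X_toy = rng.random((20, 4))
Y_toy = rng.random((30, 4))
reg_toy, max_epochs, tolerance = 1e-2, 50, 1e-4

losses = [weighted_loss(P_toy, C_toy, X_toy, Y_toy, reg_toy)]
for epoch in range(max_epochs):
    X_toy = als_half_step(P_toy, C_toy, Y_toy, reg_toy)       # update users, items fixed
    Y_toy = als_half_step(P_toy.T, C_toy.T, X_toy, reg_toy)   # update items, users fixed
    losses.append(weighted_loss(P_toy, C_toy, X_toy, Y_toy, reg_toy))
    # Stop when the relative improvement of the objective becomes negligible
    if (losses[-2] - losses[-1]) / losses[-2] < tolerance:
        break

print("Stopped after {} epochs, final loss {:.4f}".format(len(losses) - 1, losses[-1]))
```

Because each half step solves its subproblem exactly, the training objective should never increase. In practice, a validation metric on held-out data such as `URM_test` is usually a better stopping criterion than the training loss.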

In [21]:
estimated_seconds = total_epoch_time*10
print("Estimated time with the previous training speed is {:.2f} seconds, or {:.2f} minutes".format(estimated_seconds, estimated_seconds/60))

Estimated time with the previous training speed is 15.90 seconds, or 0.26 minutes


## Lastly: Computing a prediction for any given user-item pair

In [36]:
user_id = 13024
item_id = 101

In [37]:
predicted_rating = np.dot(user_factors[user_id,:], item_factors[item_id,:])
predicted_rating

0.11505826723711587
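
Extending the single prediction to a ranked recommendation list could look like the sketch below: score all items for a user, mask the items already seen, and keep the k best. The function `recommend_top_k` and the toy factors are illustrative assumptions, and k is assumed not to exceed the number of items.

```python
import numpy as np

def recommend_top_k(user_id, user_factors, item_factors, seen_items, k):
    """Return the k highest-scoring unseen items for one user (hypothetical helper)."""
    scores = item_factors.dot(user_factors[user_id])   # one score per item
    scores[seen_items] = -np.inf                       # never recommend seen items

    # argpartition finds the k best items without sorting the full array
    top_k = np.argpartition(-scores, k - 1)[:k]
    # Sort only those k items by decreasing score
    return top_k[np.argsort(-scores[top_k])]

user_factors_toy = np.array([[1.0, 0.0],
                             [0.0, 1.0]])
item_factors_toy = np.array([[0.9, 0.1],
                             [0.2, 0.8],
                             [0.5, 0.5],
                             [0.7, 0.0]])

# User 0 has already interacted with item 0
recommendations = recommend_top_k(0, user_factors_toy, item_factors_toy,
                                  seen_items=np.array([0]), k=2)
print(recommendations)
```

With the notebook's matrices, the seen items of a user would be `URM_train.indices[URM_train.indptr[user_id]:URM_train.indptr[user_id + 1]]`.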