
Feature/add_bias_terms_to_als #310

Closed
howtodowtle wants to merge 10 commits

Conversation

howtodowtle

Added optional user and item bias terms to als.py as suggested by ita9naiwa in issue #176 and already implemented in LMF. Set use_bias=True to train with bias terms (not implemented for GPU yet).
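For reference, the proposed usage would look roughly like the following. This is a sketch based on the description above: use_bias is the flag added in this PR (CPU only), not an option in the released library, and the variable names are illustrative.

from implicit.als import AlternatingLeastSquares

# use_bias is the new flag proposed in this PR; the other arguments are standard
model = AlternatingLeastSquares(factors=64, regularization=0.01, use_bias=True)
model.fit(interactions)  # interactions: scipy.sparse confidence matrix, in the orientation the installed version expects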

howtodowtle and others added 7 commits December 19, 2019 16:00
default value: False
... so that second-to-last element in self.item_factors becomes a learned item bias term
... so that last element in self.user_factors becomes a learned user bias term
Training with bias terms not implemented for GPU yet.
@dselivanov

This is far from being the same as LMF. Just setting part of the latent factors to 1 doesn't do anything useful.

@benfred
Owner

benfred commented Jan 2, 2020

I'm also not sure that this is doing the right thing here - just setting the value back to 1 after each iteration isn't quite right, afaict. We shouldn't have to set it back to 1 after each iteration, because the least squares regression at each iteration shouldn't have changed that value; the fact that it does change means that the coefficients on the other parameters aren't totally correct.

@ita9naiwa
Collaborator

ita9naiwa commented Jan 16, 2020

Sorry for the late response.
Setting the last element of the user/item factors to 1.0 will work together with setting the last / second-to-last element of the item/user vectors to 1.0 in the first place, as in

implicit/implicit/lmf.pyx

Lines 143 to 152 in fbed621

if self.item_factors is None:
    self.item_factors = np.random.normal(size=(items, self.factors + 2)).astype(np.float32)
    self.item_factors[:, -1] = 1.0

    # set factors to all zeros for items without any ratings
    self.item_factors[item_counts == 0] = np.zeros(self.factors + 2)

if self.user_factors is None:
    self.user_factors = np.random.normal(size=(users, self.factors + 2)).astype(np.float32)
    self.user_factors[:, -2] = 1.0

It's actually the same representation as ALS with bias (though I'm not completely sure).
Also, I don't think adding bias terms to the ALS model will give performance improvements.
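For reference, the analogous initialization on the ALS side (a sketch, not the PR's actual diff) would reserve two extra columns per factor matrix, with the learned bias in one and a constant 1.0 in the other, following the layout described in the commit messages above (item bias second-to-last, user bias last):

import numpy as np

factors, users, items = 64, 10_000, 5_000

# two extra columns: one for the learned bias, one pinned to 1.0
item_factors = np.random.rand(items, factors + 2).astype(np.float32) * 0.01
item_factors[:, -1] = 1.0   # constant column that multiplies the user bias

user_factors = np.random.rand(users, factors + 2).astype(np.float32) * 0.01
user_factors[:, -2] = 1.0   # constant column that multiplies the item bias

With this layout a user-item score is x_u · y_i + b_i + b_u, i.e. the dot product of the extended vectors.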


@SavanGowda left a comment


Hi @ita9naiwa

Thank you very much for this implementation of ALS with bias. I have a question:
in this change, shouldn't we add the bias dimensions to the factors (self.factors + 2), as done in LMF?

self.user_factors = np.random.rand(users, self.factors + 2).astype(self.dtype) * 0.01
self.item_factors = np.random.rand(items, self.factors + 2).astype(self.dtype) * 0.01

Thanks in advance,
Savan

@ita9naiwa
Collaborator

shouldn't we add the bias to the factors (self.factors+2) as done in LMF?
I think we need to. I missed that.

Could you please add some comparison between ALS with bias and plain ALS on datasets like MovieLens 20M or Last.fm?

@IanEisenberg

This functionality would be very helpful! In addition to the changes already proposed, wrapping similar_items and rank_items with options aimed at removing popularity bias would also be useful.

@lucatrovato

lucatrovato commented Mar 29, 2021

Hi all,
can this be reviewed? It is a very interesting feature that would add a lot of value to the model.

I had a look at the code, and in principle the approach implemented by @ita9naiwa is correct. However, having these additional ones in the last and second-to-last positions means the regularization terms become very large, on the order of the number of users and items.

In theory, one should exclude these elements from the regularization calculation, but I am not sure how that would fit into the overall implementation.
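To make the size of that extra term concrete (a back-of-the-envelope sketch, writing $\tilde{x}_u$ and $\tilde{y}_i$ for the factor vectors extended with a bias and a constant 1): each extended vector contributes an extra $\lambda \cdot 1^2$ to the penalty, so

$$\lambda\Big(\sum_u \|\tilde{x}_u\|^2 + \sum_i \|\tilde{y}_i\|^2\Big) = \lambda\Big(\sum_u \big(\|x_u\|^2 + b_u^2\big) + \sum_i \big(\|y_i\|^2 + b_i^2\big)\Big) + \lambda\,(\#\text{users} + \#\text{items}).$$

The trailing $\lambda(\#\text{users} + \#\text{items})$ piece is harmless only if the 1 entries are truly held fixed; if the solver treats them as free parameters it will shrink them toward zero, which is consistent with the drift away from 1 that benfred describes above.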

@dselivanov

dselivanov commented Mar 29, 2021 via email

@lucatrovato

lucatrovato commented Mar 29, 2021

@dselivanov,
thanks a lot for your answer.

Would you mind expanding on your comment? If we write the biased ALS equation:

$$\min_{x_*,\,y_*,\,b_*}\ \sum_{u,i} c_{ui}\left(p_{ui} - x_u^\top y_i - b_u - b_i\right)^2 + \lambda\Big(\sum_u \|x_u\|^2 + \sum_i \|y_i\|^2 + \sum_u b_u^2 + \sum_i b_i^2\Big)$$

and we define the extended vectors, which correspond to the implementation suggested by @ita9naiwa,

$$\tilde{x}_u = [\,x_u,\ 1,\ b_u\,], \qquad \tilde{y}_i = [\,y_i,\ b_i,\ 1\,]$$

We then have:

$$\tilde{x}_u^\top \tilde{y}_i = x_u^\top y_i + b_u + b_i, \qquad \text{so the objective becomes} \qquad \sum_{u,i} c_{ui}\left(p_{ui} - \tilde{x}_u^\top \tilde{y}_i\right)^2 + \lambda\Big(\sum_u \|\tilde{x}_u\|^2 + \sum_i \|\tilde{y}_i\|^2\Big)$$

which, apart from the issues with the regularization term, looks to me like it leads to the right implementation. There is a slight mismatch in the number of factors (which should be handled better), but in reality one of the slots in the factor array is occupied by the bias and the other by the constant one.

Am I missing something here?
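As a quick sanity check of the algebra above, a toy numpy snippet (not part of the PR), with the bias and the constant 1 placed as in @ita9naiwa's layout:

import numpy as np

factors = 8
x_u, y_i = np.random.rand(factors), np.random.rand(factors)
b_u, b_i = 0.3, -0.2

# user carries [x_u, 1, b_u]; item carries [y_i, b_i, 1]
x_ext = np.concatenate([x_u, [1.0, b_u]])
y_ext = np.concatenate([y_i, [b_i, 1.0]])

assert np.isclose(x_ext @ y_ext, x_u @ y_i + b_u + b_i)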

@dselivanov

dselivanov commented Mar 30, 2021

You can take a look here, which is mostly correct:

  • need to subtract the user bias (beta) from p_ui when you solve the system for item embeddings
  • need to subtract the item bias (gamma) from p_ui when you solve the system for user embeddings

What the article misses is that when you subtract these biases from p_ui (which is very sparse), you ruin the sparsity. Then you need to solve a much larger least squares problem for each user/item unless you use some smart cache/init.

Long story short - this PR requires quite some work on the C/CUDA side in order to solve the problem with biases correctly.
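Spelling this out (a sketch of the one-coordinate-per-side formulation being referred to, not code from this PR): write $C^u$ for the diagonal confidence matrix of user $u$, $p(u)$ for the binary preference vector, $\gamma$ for the vector of item biases, and let $\tilde{Y}$ stack the item vectors extended with a constant 1, $\tilde{y}_i = [\,y_i,\ 1\,]$, so that the user bias $\beta_u$ is learned jointly with $x_u$. The per-user solve becomes

$$\begin{bmatrix} x_u \\ \beta_u \end{bmatrix} = \left(\tilde{Y}^\top C^u \tilde{Y} + \lambda I\right)^{-1} \tilde{Y}^\top C^u \left(p(u) - \gamma\right).$$

Since $p(u)$ is sparse but $\gamma$ is dense, the right-hand side can no longer be accumulated from just the items the user interacted with, unless terms like $\tilde{Y}^\top \gamma$ are precomputed once per sweep - the "smart cache/init" mentioned above.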

@lucatrovato

Hi @dselivanov,
I already went through that page, and the approach there is different: the user and item factor vectors are not fully extended with two entries (a 1 and the bias, properly placed). The implementation from @ita9naiwa does exactly that, and it lets you avoid ruining the sparsity of p_ui. The only difference is in the regularization term, which will contain as many 1's as there are users and items, leading to a very different regularization term for equivalent problems.

@benfred
Owner

benfred commented Apr 2, 2021

@lucatrovato I agree with you that once these factors are calculated, the loss function calculations are basically correct (aside from regularization). However, the method for calculating these factors in this PR is wrong: if it were correct, we wouldn't have to set the user/item bias terms back to 1 every iteration after calculating the item/user factors - they could just be set once at the beginning. When learning the user factors, we only want to update Xu and the user bias (but not update the '1' value that corresponds to the item bias).

Also - does anyone have any proof that bias terms have any benefit for implicit feedback models? I realize that they have a benefit for ratings prediction tasks with datasets like MovieLens or the Netflix Prize (and other explicit feedback scenarios), but implicit feedback models are different in that the missing data indicates a true 0 (no interactions) rather than an unknown rating. I haven't seen any benefit myself from bias terms in limited testing with the BPR model - and I'm not sure that adding this feature is worth the effort, tbh.

@skrypkvi

skrypkvi commented Apr 2, 2021

I agree with @dselivanov that just adding two coordinates and re-assigning one of them to 1 is not sufficient.

Assume that for user u we extend the coordinates as

$$\tilde{x}_u = [\,x_u,\ b_u,\ 1\,]$$

and for item i as

$$\tilde{y}_i = [\,y_i,\ 1,\ b_i\,],$$

so that $\tilde{x}_u^\top \tilde{y}_i = x_u^\top y_i + b_u + b_i$.

Then, as @lucatrovato pointed out, we can write the objective function as

$$\sum_{u,i} c_{ui}\left(p_{ui} - \tilde{x}_u^\top \tilde{y}_i\right)^2 + \lambda\Big(\sum_u \|\tilde{x}_u\|^2 + \sum_i \|\tilde{y}_i\|^2\Big)$$

with the exception (*) that we do not want to regularize with lambda the coordinate corresponding to 1 (we will correct this later). For now we proceed as usual: we take the derivative of the objective function and write the solution in matrix form:

$$\tilde{x}_u = \left(\tilde{Y}^\top C^u \tilde{Y} + \lambda I\right)^{-1} \tilde{Y}^\top C^u\, p(u)$$

In order to take care of (*) and not penalize the last coordinate in $\tilde{x}_u$, we define

$$\tilde{I} = \operatorname{diag}(1, \ldots, 1, 0),$$

which is equal to $I$ but with the last element on the diagonal set to 0, and use $\lambda\tilde{I}$ in place of $\lambda I$ above.
So far this has been basic algebra with equalities, so one should not have to worry about "interaction of variables" or "side effects."
Now, in code, one has to disable the impact of the last coordinate corresponding to "1":

  • one has to remove the lambda regularization on the "1" coordinate, as described in the formula above
  • in the conjugate gradient we do not care about the residual (r) for this "1" coordinate, so we have to zero it out as well

Therefore, in my opinion the following additional code changes should suffice in als.py (CPU version):

def least_squares_cg(Cui, X, Y, regularization, num_threads=0, cg_steps=3):

a) In the line YtY = Y.T.dot(Y) + regularization * np.eye(factors, dtype=Y.dtype), instead of taking np.eye we need to set the last diagonal element (corresponding to the "1" in x_u) to 0, so that we disable the regularization of "1".

b) In the conjugate gradient loop we also need to disable the tracking of the fit for the "1" coordinate:
  • prior to line 378, p = r.copy(), we need to set the last coordinate of the residual vector r to 0 (corresponding to "1")
  • after line 395, r -= alpha * Ap, we again overwrite the last coordinate of the residual with 0

With these changes we ensure the conjugate gradient is not impacted by the dummy coordinate "1".
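A rough, self-contained sketch of what (a) and (b) could look like for the pure-Python CG path is below. It is an illustration under this thread's assumptions, not the library's actual code: least_squares_cg_bias and the local nonzeros helper are hypothetical names, the last column of X is taken to be the pinned "1" coordinate, and the caller is assumed to have initialized that column to 1.0. (With the PR's actual layout the pinned column is last for items but second-to-last for users, so in practice the pinned index would need to be a parameter.)

import numpy as np


def nonzeros(m, row):
    # iterate over (column index, value) of the nonzero entries of a CSR row
    for idx in range(m.indptr[row], m.indptr[row + 1]):
        yield m.indices[idx], m.data[idx]


def least_squares_cg_bias(Cui, X, Y, regularization, cg_steps=3):
    # Solve for X given Y with conjugate gradient, keeping the last column of X
    # pinned to 1 (the dummy coordinate): it is neither regularized nor updated.
    users, factors = X.shape

    # (a) regularize every coordinate except the pinned "1" coordinate
    reg = regularization * np.eye(factors, dtype=Y.dtype)
    reg[-1, -1] = 0.0
    YtY = Y.T.dot(Y) + reg

    for u in range(users):
        x = X[u]                      # view: updates write back into X

        # residual r = b - A x, accumulated over the user's nonzero items only
        r = -YtY.dot(x)
        for i, confidence in nonzeros(Cui, u):
            r += (confidence - (confidence - 1) * Y[i].dot(x)) * Y[i]

        # (b) the pinned coordinate is fixed, so drop it from the residual
        r[-1] = 0.0

        p = r.copy()
        rsold = r.dot(r)
        if rsold < 1e-20:
            continue

        for _ in range(cg_steps):
            Ap = YtY.dot(p)
            for i, confidence in nonzeros(Cui, u):
                Ap += (confidence - 1) * Y[i].dot(p) * Y[i]

            alpha = rsold / p.dot(Ap)
            x += alpha * p
            r -= alpha * Ap
            r[-1] = 0.0               # keep the solve away from the pinned coordinate

            rsnew = r.dot(r)
            if rsnew < 1e-20:
                break
            p = r + (rsnew / rsold) * p
            rsold = rsnew

Because the residual's last coordinate is zeroed before the loop and after every update, the search direction p never develops a component along the pinned coordinate, so X[:, -1] stays at 1.0 throughout the solve.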

@benfred: regarding the benefits of an intercept: I believe it is helpful to be able to say "I recommend you this movie because it is popular and 80% of people watched it" vs. "I recommend you this movie because of your affinity to it". That is what the bias helps to do; without it one cannot make such a separation.

@ita9naiwa
Collaborator

@dselivanov @skrypkvi
I made a wrong point earlier: it behaves awkwardly when we use conjugate gradient. I found that the 1 coordinate affects CG and makes convergence slower.

@ita9naiwa
Collaborator

ita9naiwa commented Apr 2, 2021

IMHO, I personally see no performance improvements from bias terms when we deal with implicit datasets, unlike explicit datasets, where a lot of the gain comes from them.

I'd love to see someone (perhaps @skrypkvi) run a benchmark on some publicly available dataset with and without bias terms. I think there's no serious problem if we use exact least squares for the model update.

@dselivanov

With the proposed reparametrization, biases need to be subtracted from p_ui on the RHS of the least squares equation, which destroys sparsity.
As for the usefulness: it naturally allows recommending popular items (item biases are usually strongly correlated with popularity) to users with few or no interactions. But the overall NDCG/MAP gain is tiny to moderate.

@julioasotodv

julioasotodv commented Apr 2, 2021

@benfred as for whether biases are useful for implicit datasets: sometimes they should be, especially if you have very heavy users (with tons of interactions on quite a few items compared to regular users) or very popular items. An example is the Steam V1 dataset (https://cseweb.ucsd.edu/~jmcauley/datasets.html#steam_data), where some players with tons of game hours (minutes in the dataset) completely dominate the dataset, alongside very popular games that are recommended to pretty much everyone just because they are popular (so the preferences almost fade away).

With that said, you can always preprocess the dataset to normalize this (for instance, subtracting the mean for each user row/column, or using IDF weighting similar to how it is done in the examples folder).
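A small sketch of that kind of preprocessing, using the weighting helpers that ship with the library (the K1/B values are only illustrative):

import numpy as np
from scipy.sparse import csr_matrix
from implicit.nearest_neighbours import bm25_weight  # tfidf_weight is also available

# toy play-count matrix standing in for something like the Steam data
plays = csr_matrix(np.array([[120.0, 0.0, 3.0],
                             [  0.0, 1.0, 0.0],
                             [500.0, 2.0, 0.0]], dtype=np.float32))

# dampen heavy users / very popular items before fitting ALS,
# as an alternative to learning explicit bias terms
weighted = bm25_weight(plays, K1=100, B=0.8).tocsr()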

@skrypkvi

skrypkvi commented Apr 6, 2021

@dselivanov :

With the proposed reparametrization biases need to be subtracted from p_u_i on the RHS of the least squares equation. Which destroys sparsity.

-> I am quite certain the solution I propose for the conjugate gradient (with 2 coordinates added) is one-to-one with the conjugate gradient derived for the problem defined in #176, where 1 coordinate is added (so the subtraction of the bias is taken care of). I will try to upload the derivation in the next few days. Who can check it?

@ita9naiwa :

I'd love to see(might be @skrypkvi) conducts benchmark on some publicly available dataset with/without bias term. I think there's no serious problem if we use exact least squares for model update.

-> I believe that once the equivalence above is proven, there is no need to conduct an empirical analysis.
-> I am afraid one cannot use the exact least squares solution (i.e., skip the conjugate gradient), since the matrices are large, so one needs a numerical method.

@dselivanov

dselivanov commented Apr 6, 2021

problem defined in #176 where 1 coordinate is added

This is applicable to the LHS, but not to the RHS of the least squares equation.

@skrypkvi

skrypkvi commented Apr 6, 2021

@skrypkvi

skrypkvi commented May 10, 2021

Dear all,

As promised, you can find a detailed explanation of the bias integration in the document https://github.com/skrypkvi/recommender/blob/main/Recommender_with_bias.pdf.
The structure is the following:
Sections 1-2: intro, motivation, and an overview of the original algorithm from "Collaborative Filtering for Implicit Feedback Datasets" to introduce notation
Section 3.1: description of the original proposal at http://activisiongamescience.github.io/2016/01/11/Implicit-Recommender-Systems-Biased-Matrix-Factorization/#Implict-ALS-with-Biases (1 coordinate added at a time), which has so far been supported by the community
Section 3.2: description of the alternative (preferred) proposal discussed in this thread, with 2 coordinates added, and an analytical proof of its equivalence to the original, including the conjugate gradient method
Section 4: final list of the required changes in the package

Since the changes are so small, it would be great if the community could endorse the proposal so that the bias terms can be added to the package. As stated in the document, they would be a great added value when it comes to the interpretation of the ratings.

@lucatrovato

Hi all,
I went through the document, and in my opinion it completely and correctly describes the idea that @ita9naiwa and I mentioned, including the fix needed in the conjugate gradient algorithm.

Any opinion on this @benfred or @dselivanov?

@benfred deleted the branch benfred:master on October 2, 2021
@benfred closed this on Oct 2, 2021