Feature/add_bias_terms_to_als #310
Conversation
default value: False
... so that the second-to-last element in self.item_factors becomes a learned item bias term
... so that the last element in self.user_factors becomes a learned user bias term
Training with bias terms is not implemented for GPU yet.
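A minimal sketch (mine, not the PR's diff) of the factor layout this description implies; sizes and names are illustrative:

```python
import numpy as np

factors, users, items = 32, 4, 5
user_factors = np.random.rand(users, factors + 2).astype(np.float32) * 0.01
item_factors = np.random.rand(items, factors + 2).astype(np.float32) * 0.01

# Pin the constant columns so one dot product yields x_u . y_i + b_i + b_u:
#   user row = [x_u, 1,   b_u]   (last element: learned user bias)
#   item row = [y_i, b_i, 1  ]   (second-to-last element: learned item bias)
user_factors[:, -2] = 1.0  # this 1 multiplies the item bias column
item_factors[:, -1] = 1.0  # this 1 multiplies the user bias column

score = user_factors[0] @ item_factors[0]  # x_u . y_i + b_i + b_u
```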
This is far from being the same as LMF. Just setting part of the latent factors to 1 doesn't do anything useful.
I'm also not sure this is doing the right thing here - just setting the value back to 1 after each iteration isn't quite right as far as I can tell (we shouldn't have to set it back to 1 after each iteration, because the least squares regression at each iteration shouldn't have changed that value - the fact that it's changing means that the coefficients on the other parameters aren't totally correct).
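To make that concrete (my gloss, not from the thread, using the standard implicit-ALS normal equations with $C_u$ the diagonal confidence matrix and $p_u$ the preference vector): each half-iteration solves the unconstrained system

$$\tilde{x}_u = \left( \tilde{Y}^\top C_u \tilde{Y} + \lambda I \right)^{-1} \tilde{Y}^\top C_u \, p_u$$

over all factors + 2 coordinates of $\tilde{x}_u$; nothing in this solve pins the constant coordinate to 1, so the solver generally moves it, and resetting it afterwards leaves the other coordinates optimal for a vector that is no longer the one actually stored.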
Sorry for the late response. (See lines 143 to 152 in fbed621.)
It's actually the same representation as ALS with bias. (I'm not entirely sure.)
Hi @ita9naiwa
Thank you very much for this implementation of ALS with bias. I have a question: in this change, shouldn't we add the bias dimensions to the factors (self.factors + 2), as is done in LMF?
self.user_factors = np.random.rand(users, self.factors + 2).astype(self.dtype) * 0.01
self.item_factors = np.random.rand(items, self.factors + 2).astype(self.dtype) * 0.01
Thanks in advance,
Savan
Could you please add a comparison between ALS with bias and plain ALS on datasets like MovieLens 20M or Last.fm?
This functionality would be very helpful! Along with the additions already proposed, wrapping …
Hi all, I had a look at the code, and in principle the approach implemented by @ita9naiwa is correct, although having these additional ones in the last and second-to-last positions has the consequence that the regularization terms become very large, on the order of magnitude of the number of users and items. In theory, one should suppress these elements in the calculation of the regularization part, but I am not sure how that would fit the overall implementation.
The proposed implementation is not correct.
@dselivanov, would you mind expanding on your comment? If we write the biased ALS prediction as

$$\hat{p}_{ui} = x_u^\top y_i + b_u + b_i$$

and define the extended vectors

$$\tilde{x}_u = (x_u,\, 1,\, b_u), \qquad \tilde{y}_i = (y_i,\, b_i,\, 1),$$

which corresponds to the implementation suggested by @ita9naiwa, we then have

$$\tilde{x}_u^\top \tilde{y}_i = x_u^\top y_i + b_i + b_u,$$

which, apart from the issues with the regularization term, to me looks like it actually leads to the right implementation. We have a bit of a mismatch in the number of factors (which should be handled better), but in reality one of the spots in the factor array is occupied by the bias, and the other is occupied by the one. Am I missing something here?
You can take a look here, which is mostly correct:
What the article misses is that when you subtract these biases from p_ui (which is very sparse), you ruin the sparsity. And you need to solve a much larger least squares problem for each user/item unless you use some smart caching/initialization. Long story short - this PR requires quite some work on the C/CUDA side in order to solve the problem with biases correctly.
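To spell out the sparsity issue (my paraphrase of the point above): moving the learned biases into the target gives

$$\tilde{p}_{ui} = p_{ui} - b_u - b_i,$$

and while $p_{ui} = 0$ for almost every $(u, i)$ pair, $b_u + b_i$ is generally nonzero, so $\tilde{p}_{ui}$ is dense and the usual implicit-feedback trick of iterating only over observed entries no longer applies.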
Hi @dselivanov,
@lucatrovato I agree with you that once these factors are calculated, the loss function calculations are basically correct (aside from regularization). However, the method for calculating these factors in this PR is wrong: if it were correct, we wouldn't have to set the user/item constant ('1') entries back to 1 every iteration after calculating the item/user factors - they could just be set once at the beginning. When learning the user factors, we only want to update x_u and the user bias (but not update the '1' value that corresponds to the item bias).

Also - does anyone have any proof that bias terms have any benefit for implicit feedback models? I realize that they have benefit for ratings prediction tasks with datasets like MovieLens / the Netflix Prize (and other explicit feedback scenarios), but implicit feedback models are different in that the missing data indicates a true 0 (no interactions) rather than an unknown rating. I haven't seen any benefit myself from bias terms in limited testing with the BPR model - and I'm not sure that adding this feature is worth the effort, tbh.
I agree with @dselivanov that adding two coordinates and re-assigning one of them to 1 is not sufficient. Assume for user $u$ we extend the coordinates to $\tilde{x}_u = (x_u, 1, b_u)$, and for item $i$ to $\tilde{y}_i = (y_i, b_i, 1)$. Then, as @lucatrovato pointed out, we can write the objective function as

$$\min \sum_{u,i} c_{ui}\,\bigl(p_{ui} - \tilde{x}_u^\top \tilde{y}_i\bigr)^2 + \lambda \Bigl( \sum_u \lVert \tilde{x}_u \rVert^2 + \sum_i \lVert \tilde{y}_i \rVert^2 \Bigr). \quad (*)$$

In order to take care of (*) and not penalize the coordinate that is fixed to 1 in $\tilde{x}_u$ (and the corresponding one in $\tilde{y}_i$), the regularization must skip those entries.
Therefore, in my opinion the following additional code changes should suffice in als.py (CPU version), inside `def least_squares_cg(Cui, X, Y, regularization, num_threads=0, cg_steps=3)`:

a) in the line `YtY = Y.T.dot(Y) + regularization * np.eye(factors, dtype=Y.dtype)`, the regularization should not be applied to the coordinate fixed to 1 (sketched below);
b) in the conjugate gradient we also need to disable the tracking of fit for coordinate "1".

@benfred: regarding the benefits of an intercept: I believe it is helpful to be able to say "I recommend you this movie because it is popular and 80% of people watched it" vs. "I recommend you this movie because of your affinity to it". That's what the bias helps to do; without it one cannot make such a separation.
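A minimal sketch of change (a), under the extended-vector layout above; `regularization_matrix` and `ones_index` are names I'm inventing for illustration, not from the PR:

```python
import numpy as np

def regularization_matrix(total_factors, regularization, ones_index):
    """Diagonal L2 penalty that skips the coordinate pinned to 1."""
    reg = np.full(total_factors, regularization)
    reg[ones_index] = 0.0  # don't penalize the constant coordinate
    return np.diag(reg)

# In least_squares_cg, YtY would then be built as, e.g.:
#   YtY = Y.T.dot(Y) + regularization_matrix(factors, regularization, ones_index)
# with ones_index = factors - 2 when solving user rows [x_u, 1, b_u],
# and  ones_index = factors - 1 when solving item rows [y_i, b_i, 1].
```

Change (b) would analogously zero the CG residual and search direction at `ones_index`, so the solver never moves that coordinate.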
@dselivanov @skrypkvi
IMHO, I personally see no performance improvements from bias terms when we deal with implicit datasets, unlike on explicit datasets, where a lot of the gain comes from them. I'd love to see someone (maybe @skrypkvi) conduct a benchmark on some publicly available dataset with/without bias terms. I think there's no serious problem if we use exact least squares for the model update.
With the proposed reparametrization, the biases need to be subtracted from p_ui on the RHS of the least squares equation, which destroys sparsity.
@benfred as for whether biases are useful for implicit datasets: sometimes they should be, especially if you have very heavy users (with tons of interactions on quite a few items vs. regular users) or very popular items. An example of this could be the Steam V1 dataset (https://cseweb.ucsd.edu/~jmcauley/datasets.html#steam_data), where there are some players with tons of game hours (minutes in the dataset) that completely dominate the dataset, alongside very popular games that are recommended to pretty much everyone just because they are very popular (so the preferences almost fade away). That said, you can always preprocess the dataset to normalize this (for instance, subtracting the mean for each user row/column, or using IDF similar to how it's done in the examples folder), as sketched below.
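A sketch of that preprocessing route; `bm25_weight` does exist in `implicit.nearest_neighbours` (it's what the repo's examples use), but the random matrix below is a stand-in dataset, and the matrix orientation `fit` expects depends on the implicit version:

```python
import scipy.sparse as sp
from implicit.als import AlternatingLeastSquares
from implicit.nearest_neighbours import bm25_weight

# Stand-in for an item-user play-count matrix such as the Steam data.
plays = sp.random(500, 1000, density=0.01, format="csr") * 100

# Damp the influence of heavy users and very popular items before fitting.
weighted = bm25_weight(plays, K1=100, B=0.8).tocsr()

model = AlternatingLeastSquares(factors=64, regularization=0.01)
model.fit(weighted)
```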
-> I am quite certain the solution I propose for the conjugate gradient (with 2 coordinates added) should be one-to-one with the conjugate gradient derived for the problem defined in #176, where 1 coordinate is added (so the subtraction of the bias is taken care of). I will try to upload the derivation in the next few days. Who can check it?
-> I believe that once the equivalence above is proven, there is no need to conduct an empirical analysis.
This is applicable to the LHS, but not to the RHS, of the least squares equation.
@dselivanov I mean equivalence to this formulation at http://activisiongamescience.github.io/2016/01/11/Implicit-Recommender-Systems-Biased-Matrix-Factorization/#Implict-ALS-with-Biases
Dear all, as promised, you can find a detailed explanation of the bias integration in the document https://github.com/skrypkvi/recommender/blob/main/Recommender_with_bias.pdf. Since the required changes are so small, it would be great if the community could endorse the proposal so that the bias terms can be added to the package. As said in the document, it would be a great added value when it comes to the interpretation of the ratings.
Hi all, any opinion on this, @benfred or @dselivanov?
Added optional user and item bias terms to als.py, as suggested by ita9naiwa in issue #176 and already implemented in LMF. Set use_bias=True to train with bias terms (not implemented for GPU yet).
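Hypothetical usage under this PR (`use_bias` is the flag added here, not part of released implicit; the random matrix is a stand-in):

```python
import scipy.sparse as sp
from implicit.als import AlternatingLeastSquares

interactions = sp.random(500, 1000, density=0.01, format="csr")
model = AlternatingLeastSquares(factors=64, use_bias=True)  # flag added by this PR
model.fit(interactions)
```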