
[SPARK-20790] [MLlib] Correctly handle negative values for implicit feedback in ALS #18022

Closed
wants to merge 2 commits

Conversation

davideis

What changes were proposed in this pull request?

Revert the handling of negative values in ALS with implicit feedback, so that the confidence is the absolute value of the rating and the preference is 0 for negative ratings. This was the original behavior.
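As a sanity check of the behavior described above, here is a standalone sketch of the confidence/preference mapping (illustrative only, not the Spark internals; `alpha` is the usual implicit-ALS confidence scaling parameter):

```scala
// Sketch of the implicit-feedback transform described in this PR:
// confidence is built from the absolute value of the rating, and the
// preference is 1 for positive ratings and 0 otherwise.
def confidenceAndPreference(rating: Double, alpha: Double): (Double, Double) = {
  val confidence = 1.0 + alpha * math.abs(rating)
  val preference = if (rating > 0.0) 1.0 else 0.0
  (confidence, preference)
}
```

So a rating of -2.0 still contributes with high confidence, but toward a preference of 0 rather than being ignored.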

How was this patch tested?

This patch was tested with the existing unit tests and an added unit test to ensure that negative ratings are not ignored.

@mengxr

Revert the handling of negative values in ALS with implicit feedback and test for regression.
@davideis davideis changed the title [SPARK-20790] [MLlib] Correctly handle negative values for implicit feedback [SPARK-20790] [MLlib] Correctly handle negative values for implicit feedback in ALS May 17, 2017

@srowen srowen left a comment


I think you are likely right about this change; it just may need some more explanation.

* @param numItemBlocks number of item blocks
* @return a trained ALSModel
*/
def trainALS(
Member


Why do you need a new overload?

Author


It is a helper function, because I call it twice in the test. I also wanted to use it in the testALS function, but that wasn't straightforward. I can't use testALS in my test, since it does more than just train the model, and it doesn't let me compare the two models the test generates: one trained with negative values and one with those negative values zeroed out.

@@ -78,7 +79,7 @@ class ALSSuite
     val k = 2
     val ne0 = new NormalEquation(k)
       .add(Array(1.0f, 2.0f), 3.0)
-      .add(Array(4.0f, 5.0f), 6.0, 2.0) // weighted
+      .add(Array(4.0f, 5.0f), 12.0, 2.0) // weighted
Member


Was this test change intentional?

Author


Yes, this test change was intentional, because I changed the semantic meaning of the arguments to add. Before, add would multiply the second and third arguments together internally, so to keep this test valid I pre-multiplied them. In the usage of this function in ALS.scala, the third argument is 1 for the non-implicit case, so there is no change there, and the implicit case is now handled correctly.
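The new semantics can be illustrated with a toy version of the class (a hypothetical simplification; the real NormalEquation accumulates into packed arrays via BLAS):

```scala
// Hypothetical simplified NormalEquation illustrating the new semantics of
// add(a, b, c): `c` weights only the a * a^T accumulation, while `b` is
// applied as-is (the caller pre-multiplies, e.g. 6.0 * 2.0 = 12.0 in the
// test above).
class ToyNormalEquation(k: Int) {
  val ata: Array[Array[Double]] = Array.fill(k, k)(0.0)
  val atb: Array[Double] = Array.fill(k)(0.0)

  def add(a: Array[Double], b: Double, c: Double = 1.0): this.type = {
    require(a.length == k)
    for (i <- 0 until k; j <- 0 until k) ata(i)(j) += c * a(i) * a(j)
    for (i <- 0 until k) atb(i) += b * a(i) // no longer c * b
    this
  }
}
```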

@@ -795,8 +799,8 @@ object ALS extends DefaultParamsReadable[ALS] with Logging {
     require(a.length == k)
     copyToDouble(a)
     blas.dspr(upper, k, c, da, 1, ata)
-    if (b != 0.0) {
-      blas.daxpy(k, c * b, da, 1, atb, 1)
+    if (Math.abs(b) > Double.MinPositiveValue) {
Member


How does this differ from != 0.0? I get that it differs for Double.MinPositiveValue but why is that important?

You're right that the condition was b > 0 before on purpose, though I think this was trying to handle the explicit/implicit cases.

Author


You are right; I should pick a looser threshold. It seems this check exists only to avoid extra work, since daxpy would just add a zero vector if b == 0.
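For reference, daxpy computes y += alpha * x; a plain-Scala sketch of its effect (not the actual netlib-java signature) shows why the zero check is purely an optimization:

```scala
// Plain-Scala sketch of daxpy's effect (y += alpha * x). With alpha == 0.0
// the loop adds nothing, so skipping the call when b is zero avoids wasted
// work without changing the result.
def daxpySketch(alpha: Double, x: Array[Double], y: Array[Double]): Unit = {
  for (i <- x.indices) y(i) += alpha * x(i)
}
```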

}
ls.add(srcFactor, if (rating > 0.0) 1.0 + c1 else 0.0, c1)
Member


Is this the substance of the change? I might need some help understanding why this is needed. Yes, even negative values should be recorded for implicit prefs, I agree. It adds 1 + c1 now instead of (1 + c1) / c1, so that's why the factor of c is taken out above?

Author


Correct, this is the crux of the change (moving the add call outside of the if condition). Changing the arguments was more about being less confusing and more direct: it was very confusing before where the (1 + c1) / c1 was coming from, and when it is actually used in add it gets multiplied by c1, which is a wasted operation and may not even yield exactly 1 + c1 in the end.
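A small numeric check of that point (illustrative values only): the old round trip divides by c1 and later multiplies by c1 again, which is mathematically the same as passing 1 + c1 directly but does needless floating-point work and need not be bit-identical.

```scala
// Illustrative only: compare the old round trip ((1 + c1) / c1, later
// multiplied back by c1 inside add) with passing 1 + c1 directly.
val c1 = 0.3 // hypothetical alpha * |rating| value
val oldStyle = ((1.0 + c1) / c1) * c1
val newStyle = 1.0 + c1
```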

-   * \sum,,i,, c,,i,, (a,,i,, a,,i,,^T^ x - b,,i,, a,,i,,) + lambda * x = 0.
+   * \sum,,i,, c,,i,, (a,,i,, a,,i,,^T^ x - d,,i,, a,,i,,) + lambda * x = 0.
    *
+   * Distributing and letting b,,i,, = d,,i,, * b,,i,,
Member


I'm not clear on this change. It defines $b_i$ in terms of itself? what is this correcting?

Author


Good point, I meant $b_i = c_i * d_i$. The function below accepts three arguments a, b and c; I wanted to name them something more meaningful, but in light of this comment, a, b, and c make sense to use.
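Restating the intended derivation with that corrected definition:

```latex
\sum_i c_i \left( a_i a_i^\top x - d_i a_i \right) + \lambda x = 0
\;\Longrightarrow\;
\left( \sum_i c_i a_i a_i^\top + \lambda I \right) x = \sum_i b_i a_i,
\qquad b_i = c_i d_i .
```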


@davideis davideis left a comment


I will make the changes and push again.

@SparkQA

SparkQA commented May 20, 2017

Test build #3745 has finished for PR 18022 at commit 21c0fd9.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 26, 2017

Test build #3763 has finished for PR 18022 at commit 21c0fd9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


@srowen srowen left a comment


I think this checks out. Maybe see if @MLnick or @sethah has some thoughts

@srowen

srowen commented May 31, 2017

Merged to master/2.2

asfgit pushed a commit that referenced this pull request May 31, 2017
…edback in ALS


Author: David Eis <deis@bloomberg.net>

Closes #18022 from davideis/bugfix/negative-rating.

(cherry picked from commit d52f636)
Signed-off-by: Sean Owen <sowen@cloudera.com>
@asfgit asfgit closed this in d52f636 May 31, 2017
val itemFactorsNeg = modelWithNeg.itemFactors
val userFactorsZero = modelWithZero.userFactors
val itemFactorsZero = modelWithZero.itemFactors
userFactorsNeg.collect().foreach(arr => logInfo(s"implicit test " + arr.mkString(" ")))
Contributor


Small nit here, but ideally we don't log info during this sort of test?

Author


Good point, I meant to remove it. Shall I open another PR?

@MLnick

MLnick commented Jun 2, 2017 via email

@davideis

davideis commented Jun 2, 2017 via email

@MLnick

MLnick commented Jun 2, 2017 via email
