[SPARK-20790] [MLlib] Correctly handle negative values for implicit feedback in ALS #18022
Conversation
Revert the handling of negative values in ALS with implicit feedback and test for regression.
I think you are likely right about this change; it just may need some more explanation.
 * @param numItemBlocks number of item blocks
 * @return a trained ALSModel
 */
def trainALS(
Why do you need a new overload?
It is a helper function; I call it twice in the test. I also wanted to use it in the testALS function, but that wasn't straightforward. I can't use testALS in my test because it does more than just train the model, and it doesn't let me compare the two models the test generates: one trained with negative values, and one with those negative values zeroed out.
@@ -78,7 +79,7 @@ class ALSSuite
     val k = 2
     val ne0 = new NormalEquation(k)
       .add(Array(1.0f, 2.0f), 3.0)
-      .add(Array(4.0f, 5.0f), 6.0, 2.0) // weighted
+      .add(Array(4.0f, 5.0f), 12.0, 2.0) // weighted
Was this test change intentional?
Yes, this test change was intentional, because I changed the semantic meaning of the arguments to add. Before, add would multiply the second and third arguments together internally, so to keep this test valid I premultiplied them. In the usage of this function in ALS.scala, the third argument is 1 for non-implicit feedback, so there is no change there, and the implicit case is now handled correctly.
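The semantics change described above can be sketched with a toy accumulator (the class and field names here are illustrative, not Spark's actual NormalEquation internals): under the new convention the caller passes the right-hand-side value already premultiplied by the weight, which is why the old test's 6.0 with weight 2.0 becomes 12.0.

```scala
// Toy stand-in for Spark's NormalEquation; illustrative only.
// Accumulates the weighted Gram matrix A^T A and the right-hand side A^T b.
class ToyNormalEquation(k: Int) {
  val ata = Array.ofDim[Double](k * k) // flattened k x k Gram matrix
  val atb = Array.ofDim[Double](k)     // right-hand side

  // New convention assumed here: b arrives already premultiplied by the
  // weight, so we add b * a directly (the old code added c * b * a).
  def add(a: Array[Double], b: Double, c: Double = 1.0): this.type = {
    require(a.length == k)
    for (i <- 0 until k; j <- 0 until k) ata(i * k + j) += c * a(i) * a(j)
    for (i <- 0 until k) atb(i) += b * a(i)
    this
  }
}

val ne = new ToyNormalEquation(2)
ne.add(Array(1.0, 2.0), 3.0)
ne.add(Array(4.0, 5.0), 12.0, 2.0) // 12.0 = 6.0 * 2.0, premultiplied
```

Under the old convention, `add(Array(4.0, 5.0), 6.0, 2.0)` would have accumulated the same right-hand side, which is exactly why the test's second argument had to be premultiplied to stay valid.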
@@ -795,8 +799,8 @@ object ALS extends DefaultParamsReadable[ALS] with Logging {
     require(a.length == k)
     copyToDouble(a)
     blas.dspr(upper, k, c, da, 1, ata)
-    if (b != 0.0) {
-      blas.daxpy(k, c * b, da, 1, atb, 1)
+    if (Math.abs(b) > Double.MinPositiveValue) {
How does this differ from != 0.0? I get that it differs for Double.MinPositiveValue, but why is that important? You're right that the condition was b > 0 before on purpose, though I think that was trying to handle the explicit/implicit cases.
You are right; I should pick a looser threshold. It seems this check exists only to avoid extra work, since daxpy would just be adding a zero vector when b == 0.
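To illustrate why the guard is only an optimization, here is a plain-Scala stand-in for the BLAS daxpy call (not the actual netlib binding): daxpy computes y += alpha * x, so with a zero coefficient it is a no-op.

```scala
// Plain-Scala stand-in for BLAS daxpy: y += alpha * x.
def axpy(alpha: Double, x: Array[Double], y: Array[Double]): Unit = {
  require(x.length == y.length)
  for (i <- x.indices) y(i) += alpha * x(i)
}

val atb = Array(1.0, 2.0, 3.0)
axpy(0.0, Array(4.0, 5.0, 6.0), atb) // alpha == 0: adds a zero vector, atb unchanged
```

Skipping the call when b == 0 therefore changes nothing numerically; it only saves the vector traversal.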
}
ls.add(srcFactor, if (rating > 0.0) 1.0 + c1 else 0.0, c1)
Is this the substance of the change? I might need some help understanding why it's needed. Yes, even negative values should be recorded for implicit prefs; I agree. It now adds 1 + c1 instead of (1 + c1) / c1, so is that why the factor of c is taken out above?
Correct, this is the crux of the change (moving the add outside of the if condition). Changing the arguments was more about being less confusing and more direct: before, it was very confusing where the (1 + c1) / c1 was coming from, and then when it was actually used in add it got multiplied by c1 again, which is a wasted operation and may not even exactly yield 1 + c1 in the end.
 * \sum,,i,, c,,i,, (a,,i,, a,,i,,^T^ x - b,,i,, a,,i,,) + lambda * x = 0.
 * \sum,,i,, c,,i,, (a,,i,, a,,i,,^T^ x - d,,i,, a,,i,,) + lambda * x = 0.
 *
 * Distributing and letting b,,i,, = d,,i,, * b,,i,,
I'm not clear on this change. It defines b,,i,, in terms of itself.
good point, I meant
I will make the changes and push again.
Test build #3745 has finished for PR 18022 at commit
Test build #3763 has finished for PR 18022 at commit
Merged to master/2.2
…edback in ALS

## What changes were proposed in this pull request?
Revert the handling of negative values in ALS with implicit feedback, so that the confidence is the absolute value of the rating and the preference is 0 for negative ratings. This was the original behavior.

## How was this patch tested?
This patch was tested with the existing unit tests and an added unit test to ensure that negative ratings are not ignored.

mengxr

Author: David Eis <deis@bloomberg.net>
Closes #18022 from davideis/bugfix/negative-rating.
(cherry picked from commit d52f636)
Signed-off-by: Sean Owen <sowen@cloudera.com>
val itemFactorsNeg = modelWithNeg.itemFactors
val userFactorsZero = modelWithZero.userFactors
val itemFactorsZero = modelWithZero.itemFactors
userFactorsNeg.collect().foreach(arr => logInfo(s"implicit test " + arr.mkString(" ")))
Small nit here, but ideally we don't log info during this sort of test?
Good point, I meant to remove it. Shall I open another PR?
Yeah you can just open a small follow up PR
Does it need another jira ticket?
You can link the same JIRA since it's a small follow up
What changes were proposed in this pull request?
Revert the handling of negative values in ALS with implicit feedback, so that the confidence is the absolute value of the rating and the preference is 0 for negative ratings. This was the original behavior.
How was this patch tested?
This patch was tested with the existing unit tests and an added unit test to ensure that negative ratings are not ignored.
@mengxr