
[SPARK-20790] [MLlib] Correctly handle negative values for implicit feedback in ALS #18022

Closed
wants to merge 2 commits

Conversation

davideis

What changes were proposed in this pull request?

Revert the handling of negative values in ALS with implicit feedback, so that the confidence is the absolute value of the rating and the preference is 0 for negative ratings. This was the original behavior.
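As a sanity check of the behavior described above, here is a standalone sketch of the confidence/preference mapping (illustrative only, not the Spark internals; `alpha` is the usual implicit-ALS confidence scaling parameter):

```scala
// Sketch of the implicit-feedback transform described in this PR:
// confidence is built from the absolute value of the rating, and the
// preference is 1 for positive ratings and 0 otherwise.
def confidenceAndPreference(rating: Double, alpha: Double): (Double, Double) = {
  val confidence = 1.0 + alpha * math.abs(rating)
  val preference = if (rating > 0.0) 1.0 else 0.0
  (confidence, preference)
}
```

So a rating of -2.0 still contributes with high confidence, but toward a preference of 0 rather than being ignored.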

How was this patch tested?

This patch was tested with the existing unit tests and an added unit test to ensure that negative ratings are not ignored.

@mengxr

Revert the handling of negative values in ALS with implicit feedback and test for regression.
@davideis davideis changed the title [SPARK-20790] [MLlib] Correctly handle negative values for implicit feedback [SPARK-20790] [MLlib] Correctly handle negative values for implicit feedback in ALS May 17, 2017

@srowen srowen left a comment


I think you are likely right about this change; it just may need some more explanation.

* @param numItemBlocks number of item blocks
* @return a trained ALSModel
*/
def trainALS(
Member


Why do you need a new overload?

Author


It is a helper function, because I call it twice in the test. I also wanted to use it in the testALS function, but that wasn't straightforward. I can't use testALS in my test, since it does more than just train the model, and it doesn't let me compare the two models the test generates: one trained with negative values and one with those negative values zeroed out.

@@ -78,7 +79,7 @@ class ALSSuite
     val k = 2
     val ne0 = new NormalEquation(k)
       .add(Array(1.0f, 2.0f), 3.0)
-      .add(Array(4.0f, 5.0f), 6.0, 2.0) // weighted
+      .add(Array(4.0f, 5.0f), 12.0, 2.0) // weighted
Member


Was this test change intentional?

Author


Yes, this test change was intentional, because I changed the semantic meaning of the arguments to add. Before, add would multiply the second and third arguments together internally, so to keep this test valid I pre-multiplied them. In the usage of this function in ALS.scala, the third argument is 1 for the non-implicit case, so there is no change there, and the implicit case is now handled correctly.
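The new semantics can be illustrated with a toy version of the class (a hypothetical simplification; the real NormalEquation accumulates into packed arrays via BLAS):

```scala
// Hypothetical simplified NormalEquation illustrating the new semantics of
// add(a, b, c): `c` weights only the a * a^T accumulation, while `b` is
// applied as-is (the caller pre-multiplies, e.g. 6.0 * 2.0 = 12.0 in the
// test above).
class ToyNormalEquation(k: Int) {
  val ata: Array[Array[Double]] = Array.fill(k, k)(0.0)
  val atb: Array[Double] = Array.fill(k)(0.0)

  def add(a: Array[Double], b: Double, c: Double = 1.0): this.type = {
    require(a.length == k)
    for (i <- 0 until k; j <- 0 until k) ata(i)(j) += c * a(i) * a(j)
    for (i <- 0 until k) atb(i) += b * a(i) // no longer c * b
    this
  }
}
```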

@@ -795,8 +799,8 @@ object ALS extends DefaultParamsReadable[ALS] with Logging {
     require(a.length == k)
     copyToDouble(a)
     blas.dspr(upper, k, c, da, 1, ata)
-    if (b != 0.0) {
-      blas.daxpy(k, c * b, da, 1, atb, 1)
+    if (Math.abs(b) > Double.MinPositiveValue) {
Member


How does this differ from != 0.0? I get that it differs for Double.MinPositiveValue but why is that important?

You're right that the condition was b > 0 before on purpose, though I think this was trying to handle the explicit/implicit cases.

Author


You are right; I should pick a looser threshold. It seems this check exists only to avoid extra work, since daxpy would just add a zero vector if b == 0.
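For reference, daxpy computes y += alpha * x; a plain-Scala sketch of its effect (not the actual netlib-java signature) shows why the zero check is purely an optimization:

```scala
// Plain-Scala sketch of daxpy's effect (y += alpha * x). With alpha == 0.0
// the loop adds nothing, so skipping the call when b is zero avoids wasted
// work without changing the result.
def daxpySketch(alpha: Double, x: Array[Double], y: Array[Double]): Unit = {
  for (i <- x.indices) y(i) += alpha * x(i)
}
```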

}
ls.add(srcFactor, if (rating > 0.0) 1.0 + c1 else 0.0, c1)
Member


Is this the substance of the change? I might need some help understanding why this is needed. Yes, even negative values should be recorded for implicit prefs, I agree. It adds 1 + c1 now instead of (1 + c1) / c1, so that's why the factor of c is taken out above?

Author


Correct, this is the crux of the change (moving the add call outside of the if condition). Changing the arguments was more about being less confusing and more direct: it was very confusing before where the (1 + c1) / c1 was coming from, and when it is actually used in add it gets multiplied by c1, which is a wasted operation and may not even yield exactly 1 + c1 in the end.
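A small numeric check of that point (illustrative values only): the old round trip divides by c1 and later multiplies by c1 again, which is mathematically the same as passing 1 + c1 directly but does needless floating-point work and need not be bit-identical.

```scala
// Illustrative only: compare the old round trip ((1 + c1) / c1, later
// multiplied back by c1 inside add) with passing 1 + c1 directly.
val c1 = 0.3 // hypothetical alpha * |rating| value
val oldStyle = ((1.0 + c1) / c1) * c1
val newStyle = 1.0 + c1
```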

-   * \sum,,i,, c,,i,, (a,,i,, a,,i,,^T^ x - b,,i,, a,,i,,) + lambda * x = 0.
+   * \sum,,i,, c,,i,, (a,,i,, a,,i,,^T^ x - d,,i,, a,,i,,) + lambda * x = 0.
    *
+   * Distributing and letting b,,i,, = d,,i,, * b,,i,,
Member


I'm not clear on this change. It defines $b_i$ in terms of itself? what is this correcting?

Author


Good point, I meant $b_i = c_i * d_i$. The function below accepts three arguments a, b and c; I wanted to name them something more meaningful, but in light of this comment, a, b, and c make sense to use.
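Restating the intended derivation with that corrected definition:

```latex
\sum_i c_i \left( a_i a_i^\top x - d_i a_i \right) + \lambda x = 0
\;\Longrightarrow\;
\left( \sum_i c_i a_i a_i^\top + \lambda I \right) x = \sum_i b_i a_i,
\qquad b_i = c_i d_i .
```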


@davideis davideis left a comment


I will make the changes and push again.

@SparkQA

SparkQA commented May 20, 2017

Test build #3745 has finished for PR 18022 at commit 21c0fd9.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 26, 2017

Test build #3763 has finished for PR 18022 at commit 21c0fd9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


@srowen srowen left a comment


I think this checks out. Maybe see if @MLnick or @sethah has some thoughts

@srowen

srowen commented May 31, 2017

Merged to master/2.2

asfgit pushed a commit that referenced this pull request May 31, 2017
…edback in ALS


Author: David Eis <deis@bloomberg.net>

Closes #18022 from davideis/bugfix/negative-rating.

(cherry picked from commit d52f636)
Signed-off-by: Sean Owen <sowen@cloudera.com>
@asfgit asfgit closed this in d52f636 May 31, 2017
val itemFactorsNeg = modelWithNeg.itemFactors
val userFactorsZero = modelWithZero.userFactors
val itemFactorsZero = modelWithZero.itemFactors
userFactorsNeg.collect().foreach(arr => logInfo(s"implicit test " + arr.mkString(" ")))
Contributor


Small nit here, but ideally we don't log info during this sort of test?

Author


Good point, I meant to remove it. Shall I open another PR?

@MLnick

MLnick commented Jun 2, 2017 via email

@davideis

davideis commented Jun 2, 2017 via email

@MLnick

MLnick commented Jun 2, 2017 via email
