[SPARK-20669][ML] LoR.family and LDA.optimizer should be case insensitive #17910

zhengruifeng · 2017-05-09T05:49:01Z

What changes were proposed in this pull request?

Make the param `family` in LoR and the param `optimizer` in LDA case insensitive.

How was this patch tested?

updated tests
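A minimal sketch of what "case insensitive" means for `family`. It assumes the value is lowercased with `Locale.ROOT` only at the point where it is matched, as in the diff under review. `FamilyResolution`, `isValid` and `isMultinomial` are hypothetical names, not Spark's API. The supported names come from `supportedFamilyNames` in the diff. The "auto picks multinomial only for more than two classes" rule is an assumption for illustration.

```scala
import java.util.Locale

// Hypothetical stand-in for the `family` handling discussed in this PR;
// not Spark's actual LogisticRegression code.
object FamilyResolution {
  val supportedFamilyNames: Array[String] = Array("auto", "binomial", "multinomial")

  // Validation that accepts any casing of a supported name.
  def isValid(family: String): Boolean =
    supportedFamilyNames.contains(family.toLowerCase(Locale.ROOT))

  // The user-supplied value is lowercased with Locale.ROOT only here,
  // so "Binomial", "BINOMIAL" and "binomial" all take the same branch.
  def isMultinomial(family: String, numClasses: Int): Boolean =
    family.toLowerCase(Locale.ROOT) match {
      case "binomial" =>
        require(numClasses == 1 || numClasses == 2,
          s"Binomial family supports only 1 or 2 outcome classes, found $numClasses.")
        false
      case "multinomial" => true
      case "auto" => numClasses > 2 // assumed selection rule, for illustration
      case other => throw new IllegalArgumentException(s"Unsupported family: $other")
    }
}
```

For example, `FamilyResolution.isMultinomial("BiNoMiaL", 2)` returns `false`, and `FamilyResolution.isMultinomial("AUTO", 3)` returns `true`.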

hhbyyh · 2017-05-09T06:33:37Z

mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala

@@ -526,7 +526,7 @@ class LogisticRegression @Since("1.2.0") (
      case None => histogram.length
    }

-    val isMultinomial = $(family) match {
+    val isMultinomial = $(family).toLowerCase(Locale.ROOT) match {


As a general practice, I would recommend moving the `.toLowerCase(Locale.ROOT)` call into the setter. Then we don't need to invoke `.toLowerCase(Locale.ROOT)` multiple times in the code (here it happens to be called only once), and we can always assume that `$(family)` holds predictable values.

I followed the style in GeneralizedLinearRegression.
Lowercasing the param in the setter would simplify the code, but it would also change the output of the corresponding getter. What is your opinion? @yanboliang

+1 @zhengruifeng. I'd like to keep the original value in the param, since users may compare the param value with the original input as follows:

val family = "Binomial"
val lr = new LogisticRegression().setFamily(family)
val model = lr.fit(dataset)
...
if (family == lr.getFamily) println("A") else println("B")

Good point.
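Yanbo's comparison can be made concrete without Spark. This sketch uses two hypothetical classes, `LowercasingSetter` and `PreservingSetter`. Neither is Spark's `LogisticRegression`. Normalizing in the setter makes `family == getFamily` fail for `"Binomial"`. Storing the original value keeps that check true and moves the lowercasing to the point of use.

```scala
import java.util.Locale

// hhbyyh's proposal (hypothetical stand-in): normalize once in the setter.
class LowercasingSetter {
  private var family: String = "auto"
  def setFamily(value: String): this.type = { family = value.toLowerCase(Locale.ROOT); this }
  def getFamily: String = family
}

// The alternative argued for here (hypothetical stand-in): store the user's
// value as given and lowercase only where it is interpreted.
class PreservingSetter {
  private var family: String = "auto"
  def setFamily(value: String): this.type = { family = value; this }
  def getFamily: String = family
  def normalizedFamily: String = family.toLowerCase(Locale.ROOT)
}

object GetterComparison {
  // Returns (setter-lowercasing round-trips?, preserving round-trips?).
  def roundTrips(family: String): (Boolean, Boolean) = {
    val a = new LowercasingSetter().setFamily(family)
    val b = new PreservingSetter().setFamily(family)
    (family == a.getFamily, family == b.getFamily)
  }
}
```

For `"Binomial"`, `GetterComparison.roundTrips` returns `(false, true)`, which corresponds to the snippet printing "B" in the first design and "A" in the second.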

hhbyyh · 2017-05-09T06:34:47Z

mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala

@@ -890,7 +890,7 @@ object LogisticRegression extends DefaultParamsReadable[LogisticRegression] {
  override def load(path: String): LogisticRegression = super.load(path)

  private[classification] val supportedFamilyNames =
-    Array("auto", "binomial", "multinomial").map(_.toLowerCase(Locale.ROOT))
+    Array("auto", "binomial", "multinomial")


We may need to be careful about removing the map, since Locale.ROOT can be a special case.

I am not sure about this. If we keep toLowerCase here, then we should also do the same in GeneralizedLinearRegression and the other algorithms.

+1 @hhbyyh Let's keep it to handle special locales.
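The Turkish locale is a commonly cited special case. Under Turkish rules, `I` lowercases to the dotless `ı` (U+0131). The object below is a hypothetical name. It shows that `toLowerCase(Locale.ROOT)` gives stable results, while locale-sensitive lowercasing does not. Both `BINOMIAL` and `MULTINOMIAL` contain `I`, so the difference matters for these param values.

```scala
import java.util.Locale

// Hypothetical helper comparing locale-independent and Turkish lowercasing.
object LocaleLowercasing {
  val turkish: Locale = Locale.forLanguageTag("tr")

  // Locale-independent: "I" always becomes "i".
  def rootLower(s: String): String = s.toLowerCase(Locale.ROOT)

  // Turkish rules: "I" (U+0049) becomes dotless "ı" (U+0131).
  def turkishLower(s: String): String = s.toLowerCase(turkish)
}
```

The no-argument `toLowerCase()` uses the JVM default locale. On a JVM whose default locale is Turkish, `"BINOMIAL".toLowerCase()` therefore does not equal `"binomial"`. Already-lowercase ASCII literals such as `"binomial"` are unchanged by `toLowerCase(Locale.ROOT)`.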

hhbyyh

Perhaps we should find the best pattern and resolve this and other similar issues in one PR.

SparkQA · 2017-05-09T06:53:46Z

Test build #76633 has finished for PR 17910 at commit 33c0f9e.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

zhengruifeng · 2017-05-09T07:45:12Z

@hhbyyh I found that the params in ALS and treeParams are lowercased in the getter, and the algorithms read the getter instead of the raw param.
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala#L125

https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala#L236

I think it may be better to follow this approach.
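This sketch approximates the getter-lowercasing pattern described for ALS and treeParams. The raw value is stored as given, the getter lowercases it, and the algorithm body reads the value only through the getter. `FamilyParam` and `ToyEstimator` are hypothetical and do not reproduce the linked Spark code.

```scala
import java.util.Locale

// Hypothetical param holder: stores the raw value, exposes a lowercased getter.
trait FamilyParam {
  private var rawFamily: String = "auto"
  def setFamily(value: String): this.type = { rawFamily = value; this }
  def getFamily: String = rawFamily.toLowerCase(Locale.ROOT)
}

// Hypothetical estimator: the algorithm only ever sees the normalized value.
class ToyEstimator extends FamilyParam {
  def resolvedFamily(numClasses: Int): String = getFamily match {
    case "binomial" => "binomial"
    case "multinomial" => "multinomial"
    case "auto" => if (numClasses > 2) "multinomial" else "binomial"
    case other => throw new IllegalArgumentException(s"Unsupported family: $other")
  }
}
```

One consequence of this design is that `getFamily` no longer echoes the user's casing, which is the round-trip concern Yanbo raised earlier in the thread.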

SparkQA · 2017-05-09T09:07:56Z

Test build #76660 has finished for PR 17910 at commit c2426e5.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-05-09T09:36:58Z

Test build #76661 has finished for PR 17910 at commit 69e99c6.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

hhbyyh · 2017-05-10T07:21:26Z

@zhengruifeng That may be the best solution I can see for now. The downside is that we need to remember the special treatment for string params.

sethah · 2017-05-10T18:11:46Z

mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala

@@ -2318,8 +2319,8 @@ class LogisticRegressionSuite
      assert(m1.interceptVector ~== m2.interceptVector absTol 0.05)
    }
    val testParams = Seq(
-      ("binomial", smallBinaryDataset, 2),
-      ("multinomial", smallMultinomialDataset, 3)
+      ("Binomial", smallBinaryDataset, 2),


What about "binomial" and "BiNoMiaL"? Also, what about doing:

lr.setFamily("BiNomial")
assert(lr.getFamily === "binomial")

I'm not a big fan of sprinkling these very subtle "tests" around the test suite. We should have dedicated tests of this functionality. Otherwise, how should future developers know that the capital "B" here is supposed to be there and is actually testing some desired functionality?

@sethah OK, I will add dedicated tests for this.

The changes you made don't address this comment at all, and there are no tests for Yanbo's suggestion either.
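The following is a sketch of a dedicated case-insensitivity check of the kind sethah asks for. It keeps the mixed casings in one explicit list, so their purpose is obvious, instead of scattering them through unrelated tests. `FamilyCaseInsensitivityCheck` and `resolveFamily` are hypothetical stand-ins for whatever the estimator does with the param value. A real test would exercise `LogisticRegression` inside the Spark test suite.

```scala
import java.util.Locale

// Hypothetical dedicated check for case-insensitive `family` handling.
object FamilyCaseInsensitivityCheck {
  private val supported = Set("auto", "binomial", "multinomial")

  // Lowercases with Locale.ROOT and rejects unsupported names.
  def resolveFamily(value: String): String = {
    val lowered = value.toLowerCase(Locale.ROOT)
    require(supported.contains(lowered), s"Unsupported family: $value")
    lowered
  }

  // Returns the casings that do NOT resolve to `expected` (empty means pass).
  def failures(expected: String, casings: Seq[String]): Seq[String] =
    casings.filterNot(c => resolveFamily(c) == expected)
}
```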

SparkQA · 2017-05-11T05:40:14Z

Test build #76777 has finished for PR 17910 at commit 82109ca.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-05-11T06:20:52Z

Test build #76779 has finished for PR 17910 at commit efda91d.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

yanboliang · 2017-05-11T09:19:34Z

@zhengruifeng I think the getter should return the same string as the original input. For example, if users set paramA to "Spark", they should get "Spark" back rather than "spark". See my comments at #17910 (comment). Thanks.

zhengruifeng · 2017-05-11T09:27:00Z

@yanboliang Updated. Thanks for reviewing

SparkQA · 2017-05-11T10:25:19Z

Test build #76797 has finished for PR 17910 at commit aedddc4.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

zhengruifeng · 2017-05-15T02:49:31Z

Ping @yanboliang

SparkQA · 2017-05-15T09:34:53Z

Test build #76939 has finished for PR 17910 at commit 6319f51.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-05-15T12:53:01Z

Test build #76940 has finished for PR 17910 at commit 7abbe36.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

yanboliang · 2017-05-15T15:21:43Z

LGTM, merged into master and branch-2.2. Thanks, all.
@zhengruifeng Could you send a follow-up PR to fix all other algorithms like this?

…tive ## What changes were proposed in this pull request? make param `family` in LoR and `optimizer` in LDA case insensitive ## How was this patch tested? updated tests yanboliang Author: Zheng RuiFeng <ruifengz@foxmail.com> Closes #17910 from zhengruifeng/lr_family_lowercase. (cherry picked from commit 9970aa0) Signed-off-by: Yanbo Liang <ybliang8@gmail.com>

sethah · 2017-05-15T17:22:01Z

@zhengruifeng In the follow up PR, would you mind changing the logistic regression tests to incorporate setMaxIter(1)?

zhengruifeng · 2017-05-16T01:25:44Z

@sethah Good point. Thanks


hhbyyh reviewed May 9, 2017


sethah reviewed May 10, 2017


zhengruifeng force-pushed the lr_family_lowercase branch from 69e99c6 to 82109ca May 11, 2017 05:09

zhengruifeng force-pushed the lr_family_lowercase branch from efda91d to aedddc4 May 11, 2017 09:25

zhengruifeng changed the title ~~[SPARK-20669][ML] LogisticRegression family should be case insensitive~~ [SPARK-20669][ML] LoR.family and LDA.optimizer should be case insensitive May 15, 2017

zhengruifeng added 3 commits May 15, 2017 19:25

recreate pr

e5a2711

add tests

43108fa

add model getter check

7abbe36

zhengruifeng force-pushed the lr_family_lowercase branch from 6319f51 to 7abbe36 May 15, 2017 11:26

asfgit closed this in 9970aa0 May 15, 2017

zhengruifeng deleted the lr_family_lowercase branch May 15, 2017 16:09
