Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-20669][ML] LoR.family and LDA.optimizer should be case insensitive #17910

Closed
wants to merge 3 commits into from

Conversation

zhengruifeng
Copy link
Contributor

@zhengruifeng zhengruifeng commented May 9, 2017

What changes were proposed in this pull request?

make param family in LoR and optimizer in LDA case insensitive

How was this patch tested?

updated tests

@yanboliang

@@ -526,7 +526,7 @@ class LogisticRegression @Since("1.2.0") (
case None => histogram.length
}

val isMultinomial = $(family) match {
val isMultinomial = $(family).toLowerCase(Locale.ROOT) match {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a general practice, I would recommend moving the .toLowerCase(Locale.ROOT) into the setter. Then we don't need to invoke the .toLowerCase(Locale.ROOT) multiple times in the code. (here it happens to be once). And we can always assume the $(family) has predictable values in the code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I follow the style in GeneralizedLinearRegression.
Lower the param in setter can simplify the codes, but it also change the output of coresponding getter. What is your opinion? @yanboliang

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 @zhengruifeng. I'd like to put the original value in param, since users may compare the param value with the original input as following:

val family = "Binomial"
val lr = new LogisticRegression().setFamily(family)
val model = lr.fit(dataset)
...
if (family == lr.getFamily) 
  println("A")
else
  println("B")

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point.

@@ -890,7 +890,7 @@ object LogisticRegression extends DefaultParamsReadable[LogisticRegression] {
override def load(path: String): LogisticRegression = super.load(path)

private[classification] val supportedFamilyNames =
Array("auto", "binomial", "multinomial").map(_.toLowerCase(Locale.ROOT))
Array("auto", "binomial", "multinomial")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may need to be careful to remove the map. Since Locale.Root can be some special case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure about this. If we should keep toLowerCase here, we may also do this in GeneralizedLinearRegression and others

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 @hhbyyh Let's keep it to handle some special locale.

Copy link
Contributor

@hhbyyh hhbyyh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we should find the best pattern and resolve this and other similar issues in one PR.

@SparkQA
Copy link

SparkQA commented May 9, 2017

Test build #76633 has finished for PR 17910 at commit 33c0f9e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zhengruifeng
Copy link
Contributor Author

@hhbyyh I find that param in ALS and treeParams is lowered in the getter, and the getter instead of the param is used in the algs.
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala#L125

https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala#L236

I think it maybe better to follow this way.

@SparkQA
Copy link

SparkQA commented May 9, 2017

Test build #76660 has finished for PR 17910 at commit c2426e5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 9, 2017

Test build #76661 has finished for PR 17910 at commit 69e99c6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@hhbyyh
Copy link
Contributor

hhbyyh commented May 10, 2017

@zhengruifeng That may be the best solution I see for now. The down side is that we need to remember the special treatment for string params.

@@ -2318,8 +2319,8 @@ class LogisticRegressionSuite
assert(m1.interceptVector ~== m2.interceptVector absTol 0.05)
}
val testParams = Seq(
("binomial", smallBinaryDataset, 2),
("multinomial", smallMultinomialDataset, 3)
("Binomial", smallBinaryDataset, 2),
Copy link
Contributor

@sethah sethah May 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about "binomial" and "BiNoMiaL"? Also, what about doing:

lr.setFamily("BiNomial")
assert(lr.getFamily === "binomial")

I'm not a big fan of sprinkling these very subtle "tests" around the test suite. We should have dedicated tests of this functionality. Otherwise, how should future developers know that the capital "B" here is supposed to be there and is actually testing some desired functionality?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sethah OK, I will add dedicated tests for this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes you made don't address this comment at all, and there are not tests for the suggestion from Yanbo either.

@SparkQA
Copy link

SparkQA commented May 11, 2017

Test build #76777 has finished for PR 17910 at commit 82109ca.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 11, 2017

Test build #76779 has finished for PR 17910 at commit efda91d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yanboliang
Copy link
Contributor

yanboliang commented May 11, 2017

@zhengruifeng I think we should return the same string value compared with the original input. For example, users set paramA with Spark, and they should get paramA with Spark rather than spark. See my comments at #17910 (comment) . Thanks.

@zhengruifeng
Copy link
Contributor Author

@yanboliang Updated. Thanks for reviewing

@SparkQA
Copy link

SparkQA commented May 11, 2017

Test build #76797 has finished for PR 17910 at commit aedddc4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zhengruifeng
Copy link
Contributor Author

Ping @yanboliang

@zhengruifeng zhengruifeng changed the title [SPARK-20669][ML] LogisticRegression family should be case insensitive [SPARK-20669][ML] LoR.family and LDA.optimizer should be case insensitive May 15, 2017
@SparkQA
Copy link

SparkQA commented May 15, 2017

Test build #76939 has finished for PR 17910 at commit 6319f51.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 15, 2017

Test build #76940 has finished for PR 17910 at commit 7abbe36.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yanboliang
Copy link
Contributor

LGTM, merged into master and branch-2.2. Thanks for all.
@zhengruifeng Could you send a follow-up PR to fix all other algorithms like this?

asfgit pushed a commit that referenced this pull request May 15, 2017
…tive

## What changes were proposed in this pull request?
make param `family` in LoR and `optimizer` in LDA case insensitive

## How was this patch tested?
updated tests

yanboliang

Author: Zheng RuiFeng <ruifengz@foxmail.com>

Closes #17910 from zhengruifeng/lr_family_lowercase.

(cherry picked from commit 9970aa0)
Signed-off-by: Yanbo Liang <ybliang8@gmail.com>
@asfgit asfgit closed this in 9970aa0 May 15, 2017
@zhengruifeng zhengruifeng deleted the lr_family_lowercase branch May 15, 2017 16:09
@sethah
Copy link
Contributor

sethah commented May 15, 2017

@zhengruifeng In the follow up PR, would you mind changing the logistic regression tests to incorporate setMaxIter(1)?

@zhengruifeng
Copy link
Contributor Author

@sethah Good point. Thanks

robert3005 pushed a commit to palantir/spark that referenced this pull request May 19, 2017
…tive

## What changes were proposed in this pull request?
make param `family` in LoR and `optimizer` in LDA case insensitive

## How was this patch tested?
updated tests

yanboliang

Author: Zheng RuiFeng <ruifengz@foxmail.com>

Closes apache#17910 from zhengruifeng/lr_family_lowercase.
liyichao pushed a commit to liyichao/spark that referenced this pull request May 24, 2017
…tive

## What changes were proposed in this pull request?
make param `family` in LoR and `optimizer` in LDA case insensitive

## How was this patch tested?
updated tests

yanboliang

Author: Zheng RuiFeng <ruifengz@foxmail.com>

Closes apache#17910 from zhengruifeng/lr_family_lowercase.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants