[SPARK-20669][ML] LoR.family and LDA.optimizer should be case insensitive #17910
@@ -526,7 +526,7 @@ class LogisticRegression @Since("1.2.0") (
       case None => histogram.length
     }

-    val isMultinomial = $(family) match {
+    val isMultinomial = $(family).toLowerCase(Locale.ROOT) match {
As a general practice, I would recommend moving the .toLowerCase(Locale.ROOT) call into the setter. Then we don't need to invoke .toLowerCase(Locale.ROOT) in multiple places in the code (here it happens to be only once), and we can always assume that $(family) has predictable values.
I followed the style in GeneralizedLinearRegression.
Lowercasing the param in the setter can simplify the code, but it also changes the output of the corresponding getter. What is your opinion? @yanboliang
+1 @zhengruifeng. I'd like to keep the original value in the param, since users may compare the param value with the original input, as follows:
val family = "Binomial"
val lr = new LogisticRegression().setFamily(family)
val model = lr.fit(dataset)
...
if (family == lr.getFamily)
println("A")
else
println("B")
good point.
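To make the trade-off in this thread concrete, here is a minimal sketch of the "normalize at the point of use" option. `FamilyHolder` is a hypothetical stand-in written for illustration only; it is not Spark's `Params` API and not the code from this PR. It assumes the same three family names that appear in the diff.

```scala
import java.util.Locale

// Hypothetical stand-in for an estimator with a string param.
// This is NOT Spark's Params machinery, only an illustration of the idea.
final class FamilyHolder {
  private val supported = Set("auto", "binomial", "multinomial")
  private var family: String = "auto"

  def setFamily(value: String): this.type = {
    // Validate case-insensitively, but do not rewrite the stored value.
    require(supported.contains(value.toLowerCase(Locale.ROOT)),
      s"Unsupported family: $value")
    family = value
    this
  }

  // The getter returns exactly what the caller passed in.
  def getFamily: String = family

  // Normalize only where the value is interpreted.
  def isMultinomial(numClasses: Int): Boolean =
    family.toLowerCase(Locale.ROOT) match {
      case "binomial" => false
      case "multinomial" => true
      case "auto" => numClasses > 2
      case other => throw new IllegalArgumentException(s"Unsupported family: $other")
    }
}
```

With this shape, `new FamilyHolder().setFamily("Binomial").getFamily` still equals the caller's `"Binomial"`, which is the comparison the snippet above relies on. Lowercasing inside the setter instead would break that equality.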
@@ -890,7 +890,7 @@ object LogisticRegression extends DefaultParamsReadable[LogisticRegression] {
   override def load(path: String): LogisticRegression = super.load(path)

   private[classification] val supportedFamilyNames =
-    Array("auto", "binomial", "multinomial").map(_.toLowerCase(Locale.ROOT))
+    Array("auto", "binomial", "multinomial")
We may need to be careful about removing the map, since lowercasing with Locale.ROOT can matter in some special cases.
I am not sure about this. If we should keep toLowerCase here, we may also need to do this in GeneralizedLinearRegression and others.
+1 @hhbyyh Let's keep it to handle some special locale.
Perhaps we should find the best pattern and resolve this and other similar issues in one PR.
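For readers wondering what a "special locale" looks like in practice, the sketch below shows the well-known Turkish case: lowercasing with a Turkish locale maps `I` to the dotless `ı` (U+0131), so a user-supplied `"BINOMIAL"` no longer matches `"binomial"`. `LocaleDemo` is a name made up for this illustration; only standard `java.util.Locale` and `String.toLowerCase(Locale)` are used.

```scala
import java.util.Locale

// Illustration only: compares locale-independent and Turkish lowercasing.
object LocaleDemo {
  val turkish: Locale = Locale.forLanguageTag("tr")
  val input: String = "BINOMIAL"

  // Locale.ROOT gives locale-independent results.
  val rootLower: String = input.toLowerCase(Locale.ROOT)

  // Under the Turkish locale, 'I' lowercases to dotless 'ı' (U+0131).
  val turkishLower: String = input.toLowerCase(turkish)

  def main(args: Array[String]): Unit = {
    println(s"ROOT:    $rootLower")
    println(s"Turkish: $turkishLower")
    println(s"matches 'binomial' under ROOT:    ${rootLower == "binomial"}")
    println(s"matches 'binomial' under Turkish: ${turkishLower == "binomial"}")
  }
}
```

This is why the comparisons in the diff pass `Locale.ROOT` explicitly rather than relying on the JVM's default locale.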
Test build #76633 has finished for PR 17910 at commit
@hhbyyh I find that param in I think it may be better to follow this way.
Test build #76660 has finished for PR 17910 at commit
Test build #76661 has finished for PR 17910 at commit
@zhengruifeng That may be the best solution I see for now. The downside is that we need to remember the special treatment for string params.
@@ -2318,8 +2319,8 @@ class LogisticRegressionSuite
     assert(m1.interceptVector ~== m2.interceptVector absTol 0.05)
   }
   val testParams = Seq(
-    ("binomial", smallBinaryDataset, 2),
-    ("multinomial", smallMultinomialDataset, 3)
+    ("Binomial", smallBinaryDataset, 2),
What about "binomial" and "BiNoMiaL"? Also, what about doing:
lr.setFamily("BiNomial")
assert(lr.getFamily === "binomial")
I'm not a big fan of sprinkling these very subtle "tests" around the test suite. We should have dedicated tests of this functionality. Otherwise, how should future developers know that the capital "B" here is supposed to be there and is actually testing some desired functionality?
@sethah OK, I will add dedicated tests for this.
The changes you made don't address this comment at all, and there are no tests for the suggestion from Yanbo either.
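As a rough idea of what a dedicated check could look like, here is a self-contained sketch that loops over several spellings instead of hiding a capital letter in an unrelated test. `FamilyCaseCheck` and `isSupported` are hypothetical names introduced for this example; a real test would live in `LogisticRegressionSuite` and go through the actual param validation, which is not reproduced here.

```scala
import java.util.Locale

// Hypothetical, standalone check; not the test that was added in this PR.
object FamilyCaseCheck {
  private val supportedFamilyNames = Array("auto", "binomial", "multinomial")

  // Assumed validation rule: compare the lowercased input to the supported names.
  def isSupported(family: String): Boolean =
    supportedFamilyNames.contains(family.toLowerCase(Locale.ROOT))

  def main(args: Array[String]): Unit = {
    // Every spelling of a supported family should be accepted.
    Seq("binomial", "Binomial", "BiNoMiaL", "MULTINOMIAL", "Auto").foreach { f =>
      assert(isSupported(f), s"expected '$f' to be accepted")
    }
    // An unsupported family should still be rejected.
    assert(!isSupported("poisson"), "expected 'poisson' to be rejected")
    println("all case-insensitivity checks passed")
  }
}
```

Naming the test after the behavior makes it clear to future developers that the mixed-case inputs are intentional.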
69e99c6 to 82109ca
Test build #76777 has finished for PR 17910 at commit
Test build #76779 has finished for PR 17910 at commit
@zhengruifeng I think we should return the same string value as the original input. For example, users set
efda91d to aedddc4
@yanboliang Updated. Thanks for reviewing
Test build #76797 has finished for PR 17910 at commit
Ping @yanboliang
Test build #76939 has finished for PR 17910 at commit
6319f51 to 7abbe36
Test build #76940 has finished for PR 17910 at commit
LGTM, merged into master and branch-2.2. Thanks, all.
…tive

## What changes were proposed in this pull request?
make param `family` in LoR and `optimizer` in LDA case insensitive

## How was this patch tested?
updated tests

yanboliang

Author: Zheng RuiFeng <ruifengz@foxmail.com>

Closes #17910 from zhengruifeng/lr_family_lowercase.

(cherry picked from commit 9970aa0)
Signed-off-by: Yanbo Liang <ybliang8@gmail.com>
@zhengruifeng In the follow-up PR, would you mind changing the logistic regression tests to incorporate
@sethah Good point. Thanks
What changes were proposed in this pull request?
make param `family` in LoR and `optimizer` in LDA case insensitive

How was this patch tested?
updated tests

@yanboliang