[SPARK-6705][MLLIB] Add fit intercept api to ml logisticregression #5301

oefirouz · 2015-03-31T18:30:59Z

I have fit intercept enabled by default for logistic regression; I
wonder what others think. I understand that this enables allocation
by default, which is undesirable, but one needs a very strong
reason not to include an intercept term, so it is the safer
default from a statistical standpoint.

Explicitly modeling the intercept by adding a column of all 1s does not
work. I believe the reason is that the API for
LogisticRegressionWithLBFGS forces column normalization, and a column of
all 1s has zero variance, so dividing by 0 kills it.
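
To illustrate the zero-variance problem described above, here is a minimal sketch in plain Scala. It is not Spark's scaler code; the object and method names (`StandardizeSketch`, `scaleNaive`, `scaleGuarded`) are hypothetical, and the guarded variant is only one plausible way a scaler might handle a constant column.

```scala
object StandardizeSketch {
  /** Population standard deviation of one feature column. */
  def stdDev(col: Array[Double]): Double = {
    val mean = col.sum / col.length
    math.sqrt(col.map(v => (v - mean) * (v - mean)).sum / col.length)
  }

  /** Scale a column to unit variance, with no guard against zero variance. */
  def scaleNaive(col: Array[Double]): Array[Double] = {
    val sd = stdDev(col)
    col.map(_ / sd) // sd == 0.0 for a constant column, so this divides by zero
  }

  /** Guarded variant (an assumption): zero out a column that has no variance. */
  def scaleGuarded(col: Array[Double]): Array[Double] = {
    val sd = stdDev(col)
    if (sd == 0.0) Array.fill(col.length)(0.0) else col.map(_ / sd)
  }
}
```

Either way, an all-ones column stops acting as an intercept feature after scaling: the naive version produces non-finite values, and the guarded version turns the column into zeros.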

AmplabJenkins · 2015-03-31T18:32:13Z

Can one of the admins verify this patch?

jkbradley · 2015-04-02T22:36:45Z

@oefirouz Thanks for the PR! I agree we need to support this option, and that it should be set to True by default.

Can you please make a JIRA and put it in the PR title? "[SPARK-####] [MLLIB] Add fit..."

I believe you're correct about normalization causing problems for an all-ones column; the other issue is that LR needs to know not to regularize the intercept term.

I'll look at the code now
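
The point above about not regularizing the intercept can be sketched as follows. This is a plain-Scala illustration under the assumption of an L2 penalty of the form 0.5 * regParam * ||w||^2; the names (`PenaltySketch`, `regParam`, and so on) are hypothetical and this is not the code under review.

```scala
object PenaltySketch {
  /** L2 penalty over the feature weights only; the intercept is left unpenalized. */
  def l2Penalty(weights: Array[Double], intercept: Double, regParam: Double): Double =
    0.5 * regParam * weights.map(w => w * w).sum

  /** Gradient of that penalty: regParam * w per weight, and 0 for the intercept. */
  def l2Gradient(
      weights: Array[Double],
      intercept: Double,
      regParam: Double): (Array[Double], Double) =
    (weights.map(_ * regParam), 0.0)

  /** What happens if the intercept is just another weight on an all-ones column. */
  def l2PenaltyInterceptAsWeight(
      weights: Array[Double],
      intercept: Double,
      regParam: Double): Double =
    0.5 * regParam * (weights.map(w => w * w).sum + intercept * intercept)
}
```

With weights (1.0, 2.0), intercept 10.0 and regParam 0.5, the first penalty ignores the intercept, while the second is dominated by it. That is why an appended all-ones column is not equivalent to a dedicated intercept term when regularization is on.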

jkbradley · 2015-04-02T22:38:34Z

mllib/src/main/scala/org/apache/spark/ml/param/sharedParams.scala

+   * param for fitting the intercept term
+   * @group param
+   */
+  val fitIntercept: BooleanParam = new BooleanParam(this, "fitIntercept", "fits the intercept term or not")


"fits the intercept term or not" --> "Indicates whether to fit an intercept term"

jkbradley · 2015-04-02T22:39:52Z

It looks fine to me other than that one comment. However, could you please add a unit test (or modify an existing one) to test this feature in org/apache/spark/ml/classification/LogisticRegressionSuite.scala? Thanks!

Added unit tests and changed docs in line with PR comments

oefirouz · 2015-04-03T20:45:03Z

@jkbradley Thanks for your comments! I've updated the PR.

jkbradley · 2015-04-03T21:15:24Z

mllib/src/main/scala/org/apache/spark/ml/param/sharedParams.scala

+   * param for fitting the intercept term
+   * @group param
+   */
+  val fitIntercept: BooleanParam =


Thinking more about this, can you please make this default to true? (Add "Some(true)" as a 4th argument, plus update doc.) Thanks!

jkbradley · 2015-04-03T21:18:07Z

@oefirouz Just a few more comments, and then that should be it. Thanks!

Made the trait default true. Changed float comparisons to === in unit tests.

Forgot to update doc

srowen · 2015-04-04T07:33:20Z

ok to test

AmplabJenkins · 2015-04-04T07:44:40Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29707/

Whoops, add this to the logisticRegression and not the optimizer

AmplabJenkins · 2015-04-04T23:54:47Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29718/

SparkQA · 2015-04-06T17:33:58Z

Test build #636 has started for PR 5301 at commit 9f1286b.

jkbradley · 2015-04-06T17:39:11Z

LGTM once the full tests pass

SparkQA · 2015-04-06T18:56:16Z

Test build #636 has finished for PR 5301 at commit 9f1286b.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.
This patch does not change any dependencies.

mengxr · 2015-04-06T20:10:54Z

One comment before we merge this: Is it useful to specify the bias constant? In LIBLINEAR, there is an option called -B bias. If bias == 0, we don't add intercept, otherwise we add intercept with the provided bias. If we take this approach, the param name should be bias.

jkbradley · 2015-04-06T20:28:27Z

I'm going to say no; that sounds like an extra option we could add later. More details on that API:

- Our current API: Either fit the intercept, or fix it at 0. Do not regularize it if fitting.
- "bias" API in liblinear: Fit the intercept, but adjust regularization by adjusting "bias."
  - My issues with this:
    - It's not intuitive (if I understand their API correctly). To avoid regularizing the intercept, you set the "bias" to be large.
    - R glmnet uses our current API (fit intercept or don't), and it seems like a more authoritative codebase to follow.

Later on, I could imagine us adding the option to regularize the intercept.
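
A rough sketch of why a larger "bias" constant weakens regularization of the intercept, assuming the constant is appended as an extra feature ([x; bias]) and its weight is L2-penalized like any other. The names (`BiasSketch`, `interceptPenalty`, `regParam`) are hypothetical and the penalty form 0.5 * regParam * w^2 is an assumption for illustration, not LIBLINEAR's source.

```scala
object BiasSketch {
  /** Append a constant bias feature to x, i.e. the [x; bias] augmentation. */
  def augment(x: Array[Double], bias: Double): Array[Double] = x :+ bias

  /** L2 penalty paid on the bias weight to realize a given effective intercept. */
  def interceptPenalty(intercept: Double, bias: Double, regParam: Double): Double = {
    require(bias != 0.0, "bias == 0 means there is no intercept to realize")
    // The effective intercept is biasWeight * bias, so a large bias
    // needs only a small (cheaply penalized) weight.
    val biasWeight = intercept / bias
    0.5 * regParam * biasWeight * biasWeight
  }
}
```

For the same effective intercept, the penalty shrinks by a factor of bias squared, which matches the observation above that one sets the bias to be large to avoid regularizing the intercept.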

jkbradley · 2015-04-08T03:35:35Z

I'll go ahead and merge this with master. Thanks @oefirouz !

jkbradley reviewed Apr 2, 2015

[MLLIB] Add fit intercept term

329c1e2

Added unit tests and changed docs in line with PR comments

jkbradley reviewed Apr 3, 2015

Omede Firouz added 3 commits April 3, 2015 14:41

[MLLIB] Add fitIntercept param to logistic regression

2257fca

Made the trait default true. Changed float comparisons to === in unit tests.

[MLLIB] Add fitIntercept to LogisticRegression

9963509

Forgot to update doc

[SPARK-6705][MLLIB] Add a fit intercept term to ML LogisticRegression

1d6bd6f

oefirouz changed the title ~~[MLLIB] Add fit intercept api to ml logisticregression~~ [SPARK-6705][MLLIB] Add fit intercept api to ml logisticregression Apr 3, 2015

[SPARK-6705][MLLIB] Add fitInterceptTerm to LogisticRegression

9f1286b

Whoops, add this to the logisticRegression and not the optimizer

asfgit closed this in d138aa8 Apr 8, 2015
