[SPARK-6705][MLLIB] Add fit intercept api to ml logisticregression #5301

Closed
wants to merge 6 commits into from

oefirouz
I have fit intercept enabled by default for logistic regression, and I
wonder what others think here. I understand that it enables allocation
by default, which is undesirable, but one needs a very strong reason
for not including an intercept term, so it is the safer default in a
statistical sense.

Explicitly modeling the intercept by adding a column of all 1s does not
work. I believe the reason is that the API for
LogisticRegressionWithLBFGS forces column normalization, and a column of
all 1s has zero variance, so dividing by 0 kills it.
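A minimal, self-contained sketch of the arithmetic behind this, in plain Scala rather than Spark's scaler (the object and method names are hypothetical). The standard deviation of a constant column is 0, so naive scaling divides by zero. A scaler that guards against this by using a scale factor of 0 for zero-variance columns (an assumed guard; check the scaler actually in use) turns the column of 1s into a column of 0s, which removes the intercept signal entirely.

```scala
object ConstantColumnScaling {
  // Sample standard deviation of one column (n - 1 denominator).
  def stdDev(col: Array[Double]): Double = {
    val mean = col.sum / col.length
    math.sqrt(col.map(v => (v - mean) * (v - mean)).sum / (col.length - 1))
  }

  // Naive scaling to unit variance: divides every entry by the column's std dev.
  def naiveScale(col: Array[Double]): Array[Double] = {
    val s = stdDev(col)
    col.map(_ / s)
  }

  // Assumed guard: zero-variance columns get a scale factor of 0 instead of 1 / 0.
  def guardedScale(col: Array[Double]): Array[Double] = {
    val s = stdDev(col)
    val factor = if (s != 0.0) 1.0 / s else 0.0
    col.map(_ * factor)
  }

  def main(args: Array[String]): Unit = {
    val ones = Array.fill(5)(1.0)
    println(stdDev(ones))                      // 0.0
    println(naiveScale(ones).mkString(", "))   // every entry is Infinity
    println(guardedScale(ones).mkString(", ")) // every entry is 0.0: the column is gone
  }
}
```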

@AmplabJenkins

Can one of the admins verify this patch?

@jkbradley
Member

@oefirouz Thanks for the PR! I agree we need to support this option, and that it should be set to True by default.

Can you please make a JIRA and put it in the PR title? "[SPARK-####] [MLLIB] Add fit..."

I believe you're correct about normalization causing problems for an all-ones column; the other issue is that LR needs to know not to regularize the intercept term.

I'll look at the code now
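The point about not regularizing the intercept can be written out as an objective. This is a minimal statement assuming L2 regularization with strength $\lambda$ and labels $y_i \in \{-1, +1\}$ (the notation is ours, not from this PR). The weights $w$ are penalized and the intercept $b$ is not; fitIntercept = false corresponds to fixing $b = 0$. If $b$ were instead carried as the weight on an all-ones column, it would sit inside $\lVert w \rVert_2^2$ and be shrunk like any other weight.

```latex
\min_{w \in \mathbb{R}^d,\; b \in \mathbb{R}} \;
\frac{1}{n} \sum_{i=1}^{n} \log\!\left(1 + \exp\!\left(-y_i \left(w^\top x_i + b\right)\right)\right)
+ \frac{\lambda}{2} \lVert w \rVert_2^2
```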

/**
 * param for fitting the intercept term
 * @group param
 */
val fitIntercept: BooleanParam = new BooleanParam(this, "fitIntercept", "fits the intercept term or not")
Member

"fits the intercept term or not" --> "Indicates whether to fit an intercept term"

@jkbradley
Member

It looks fine to me other than that one comment. However, could you please add a unit test (or modify an existing one) to test this feature in org/apache/spark/ml/classification/LogisticRegressionSuite.scala? Thanks!
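A standalone sketch of the kind of behavior such a test could check, written against a toy gradient-descent trainer rather than Spark's LogisticRegression (TinyLogReg and its parameters are hypothetical). With fitIntercept = false the intercept stays at 0. With fitIntercept = true, on data whose only signal is the base rate (3 of 4 labels are 1), the intercept approaches log(0.75 / 0.25) = log 3, and regParam does not pull it toward 0 because the intercept is left out of the penalty.

```scala
object TinyLogReg {
  final case class Model(weights: Array[Double], intercept: Double)

  private def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  // Batch gradient descent on L2-regularized log loss with labels in {0, 1}.
  // The intercept is updated only when fitIntercept is true and is never penalized.
  def fit(xs: Array[Array[Double]], ys: Array[Double], fitIntercept: Boolean,
          regParam: Double = 0.0, stepSize: Double = 0.5, iters: Int = 2000): Model = {
    val n = xs.length
    val d = xs.head.length
    val w = Array.fill(d)(0.0)
    var b = 0.0
    for (_ <- 0 until iters) {
      val gradW = Array.fill(d)(0.0)
      var gradB = 0.0
      for (i <- 0 until n) {
        var margin = b
        for (j <- 0 until d) margin += w(j) * xs(i)(j)
        val err = sigmoid(margin) - ys(i)
        for (j <- 0 until d) gradW(j) += err * xs(i)(j) / n
        gradB += err / n
      }
      // Weights get the L2 term; the intercept step uses the data gradient only.
      for (j <- 0 until d) w(j) -= stepSize * (gradW(j) + regParam * w(j))
      if (fitIntercept) b -= stepSize * gradB
    }
    Model(w, b)
  }

  def main(args: Array[String]): Unit = {
    val xs = Array.fill(4)(Array(0.0))
    val ys = Array(1.0, 1.0, 1.0, 0.0)
    println(fit(xs, ys, fitIntercept = false).intercept) // 0.0
    println(fit(xs, ys, fitIntercept = true).intercept)  // close to math.log(3.0)
  }
}
```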

Added unit tests and changed docs in line with PR comments
@oefirouz
Author

oefirouz commented Apr 3, 2015

@jkbradley Thanks for your comments! I've updated the PR.

/**
 * param for fitting the intercept term
 * @group param
 */
val fitIntercept: BooleanParam =
Member

Thinking more about this, can you please make this default to true? (Add "Some(true)" as a 4th argument, plus update doc.) Thanks!
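A minimal sketch of how an optional default like Some(true) behaves, using hypothetical stand-in classes (SimpleBooleanParam and SimpleParamMap are not Spark's Param/Params API). A value set explicitly wins; otherwise lookup falls back to the param's default, so fitIntercept reads as true unless the user turns it off.

```scala
import scala.collection.mutable

// Hypothetical stand-ins for illustration only; not Spark's Param/Params classes.
final class SimpleBooleanParam(val name: String, val doc: String,
                               val defaultValue: Option[Boolean] = None)

final class SimpleParamMap {
  private val values = mutable.Map.empty[String, Boolean]

  def put(p: SimpleBooleanParam, v: Boolean): this.type = { values(p.name) = v; this }

  // An explicitly set value wins; otherwise fall back to the param's default.
  def get(p: SimpleBooleanParam): Option[Boolean] = values.get(p.name).orElse(p.defaultValue)
}

object ParamDefaultDemo {
  val fitIntercept = new SimpleBooleanParam(
    "fitIntercept", "Indicates whether to fit an intercept term", Some(true))

  def main(args: Array[String]): Unit = {
    println(new SimpleParamMap().get(fitIntercept))                          // Some(true)
    println(new SimpleParamMap().put(fitIntercept, false).get(fitIntercept)) // Some(false)
  }
}
```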

@jkbradley
Member

@oefirouz Just a few more comments, and then that should be it. Thanks!

@oefirouz oefirouz changed the title [MLLIB] Add fit intercept api to ml logisticregression [SPARK-6705][MLLIB] Add fit intercept api to ml logisticregression Apr 3, 2015
@srowen
Member

srowen commented Apr 4, 2015

ok to test

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29707/

Whoops, add this to the logisticRegression and not the optimizer
@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29718/

@SparkQA

SparkQA commented Apr 6, 2015

Test build #636 has started for PR 5301 at commit 9f1286b.

@jkbradley
Member

LGTM once the full tests pass

@SparkQA

SparkQA commented Apr 6, 2015

Test build #636 has finished for PR 5301 at commit 9f1286b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@mengxr
Contributor

mengxr commented Apr 6, 2015

One comment before we merge this: Is it useful to specify the bias constant? In LIBLINEAR, there is an option called -B bias. If bias == 0, we don't add an intercept; otherwise, we add an intercept with the provided bias. If we take this approach, the param name should be bias.
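A minimal sketch of the augmentation described in this comment, in plain Scala (BiasAugmentation and its methods are hypothetical, not LIBLINEAR or Spark code). "No intercept" is modeled here as None rather than a sentinel bias value. When a bias is given, it is appended to each feature vector as a constant feature, and the weight learned for that feature, multiplied by the bias, acts as the intercept.

```scala
object BiasAugmentation {
  // Appends a constant feature equal to `bias`, or leaves x unchanged when bias is None.
  def augment(x: Array[Double], bias: Option[Double]): Array[Double] =
    bias.fold(x)(b => x :+ b)

  // Linear margin w . x for the (possibly augmented) feature vector.
  def margin(w: Array[Double], x: Array[Double]): Double = {
    require(w.length == x.length, "weight/feature length mismatch")
    w.zip(x).map { case (wi, xi) => wi * xi }.sum
  }

  def main(args: Array[String]): Unit = {
    val x = Array(2.0, -1.0)
    val w = Array(0.5, 0.25, 3.0) // the last weight multiplies the bias feature
    val ax = augment(x, Some(1.0))
    // Effective intercept is w.last * bias = 3.0, so the margin is 0.5*2 - 0.25 + 3.0.
    println(margin(w, ax)) // 3.75
  }
}
```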

@jkbradley
Member

I'm going to say no; that sounds like an extra option we could add later. More details on that API:

  • Our current API: Either fit the intercept, or fix it at 0. Do not regularize it if fitting.
  • "bias" API in liblinear: Fit the intercept, but adjust regularization by adjusting "bias."
    • My issues with this:
      • It's not intuitive (if I understand their API correctly). To avoid regularizing the intercept, you set the "bias" to be large.
      • R glmnet uses our current API (fit intercept or don't), and it seems like a more authoritative codebase to follow.

Later on, I could imagine us adding the option to regularize the intercept.
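The point about setting "bias" large can be checked with a little arithmetic. This is a minimal sketch assuming an L2 penalty (regParam / 2) * w^2 on every augmented weight, including the one attached to the constant bias feature (the names below are hypothetical). Because the intercept is b = bias * wBias, the penalty expressed in terms of b is (regParam / 2) * (b / bias)^2, which shrinks as bias grows, so a large bias approximates an unregularized intercept.

```scala
object BiasRegularization {
  // L2 penalty on the bias-feature weight, written in terms of the intercept it produces.
  def interceptPenalty(intercept: Double, bias: Double, regParam: Double): Double = {
    require(bias > 0.0, "bias must be positive")
    val wBias = intercept / bias // weight needed on the constant feature to yield `intercept`
    0.5 * regParam * wBias * wBias
  }

  def main(args: Array[String]): Unit = {
    val intercept = 2.0
    val regParam = 1.0
    for (bias <- Seq(1.0, 10.0, 100.0))
      println(f"bias=$bias%6.1f  penalty on intercept=${interceptPenalty(intercept, bias, regParam)}%.4f")
  }
}
```

For a fixed intercept of 2.0 and regParam = 1.0, the penalty falls by a factor of 100 for each tenfold increase in bias (2.0, then 0.02, then 0.0002), which is why the liblinear-style API only approaches "do not regularize the intercept" in the limit.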

@jkbradley
Member

I'll go ahead and merge this with master. Thanks @oefirouz !

@asfgit asfgit closed this in d138aa8 Apr 8, 2015
6 participants