Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-13676] Fix mismatched default values for regParam in LogisticRegression #11519

Closed
wants to merge 1 commit into from
Closed

Conversation

dongjoon-hyun
Copy link
Member

What changes were proposed in this pull request?

The default value of regularization parameter for LogisticRegression algorithm is different in Scala and Python. We should provide the same value.

Scala

scala> new org.apache.spark.ml.classification.LogisticRegression().getRegParam
res0: Double = 0.0

Python

>>> from pyspark.ml.classification import LogisticRegression
>>> LogisticRegression().getRegParam()
0.1

How was this patch tested?

manual. Check the following in pyspark.

>>> from pyspark.ml.classification import LogisticRegression
>>> LogisticRegression().getRegParam()
0.0

@srowen
Copy link
Member

srowen commented Mar 4, 2016

Jenkins test this please

@srowen
Copy link
Member

srowen commented Mar 4, 2016

Sounds good. If you have a moment would be good to look over other default regularization params to see if there are other mismatches.

@SparkQA
Copy link

SparkQA commented Mar 4, 2016

Test build #52464 has finished for PR 11519 at commit d5323af.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mengxr
Copy link
Contributor

mengxr commented Mar 4, 2016

Merged into master. Thanks!

@asfgit asfgit closed this in c8f2545 Mar 4, 2016
@dongjoon-hyun
Copy link
Member Author

I see, @srowen . I will check the others, too.

Thank you, @srowen and @mengxr .

@dongjoon-hyun
Copy link
Member Author

Also, thank you for Coverity, @srowen !
Today, I will check them, too. :)

@dongjoon-hyun
Copy link
Member Author

Hi, @srowen .
According to your advice, I reviewed the other algorithms and I found that all other algorithms use the same default values for reqParam in Scala/Python. It's a good new.

  • ALS: 0.1
  • LassoWithSGD: 0.01
  • LinearRegression: 0.0
  • LinearRegressionWithSGD: 0.0
  • LogisticRegression: 0.0
  • LogisticRegressionWithSGD: 0.01
  • LogisticRegressionWithLBFGS: 0.00
  • RidgeRegressionWithSGD: 0.01
  • SVMWithSGD: 0.01
  • StreamingLogisticRegressionWithSGD: 0.0

However, I found that LinearRegressionWithSGD and StreamingLinearRegressionWithSGD does not have regParam as constructor arguments. They just depends on GradientDescent's default reqParam values. So, I think we need to file this as a new JIRA issue.

Thank you, @srowen .

@dongjoon-hyun
Copy link
Member Author

Here is the JIRA Issue SPARK-13686 and PR.

roygao94 pushed a commit to roygao94/spark that referenced this pull request Mar 22, 2016
…egression

## What changes were proposed in this pull request?

The default value of regularization parameter for `LogisticRegression` algorithm is different in Scala and Python. We should provide the same value.

**Scala**
```
scala> new org.apache.spark.ml.classification.LogisticRegression().getRegParam
res0: Double = 0.0
```

**Python**
```
>>> from pyspark.ml.classification import LogisticRegression
>>> LogisticRegression().getRegParam()
0.1
```

## How was this patch tested?
manual. Check the following in `pyspark`.
```
>>> from pyspark.ml.classification import LogisticRegression
>>> LogisticRegression().getRegParam()
0.0
```

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes apache#11519 from dongjoon-hyun/SPARK-13676.
@dongjoon-hyun dongjoon-hyun deleted the SPARK-13676 branch March 29, 2016 05:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants