Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-10194] [MLlib] [PySpark] SGD algorithms need convergenceTol parameter in Python #8457

Closed
wants to merge 2 commits into from

Conversation

yanboliang
Copy link
Contributor

SPARK-3382 added a convergenceTol parameter for GradientDescent-based methods in Scala. We need that parameter in Python; otherwise, Python users will not be able to adjust that behavior (or even reproduce behavior from previous releases since the default changed).

@SparkQA
Copy link

SparkQA commented Aug 26, 2015

Test build #41619 has finished for PR 8457 at commit 92d6d8c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

:param: miniBatchFraction Fraction of data on which SGD is run for each
:param stepSize: Step size for each iteration of gradient descent.
:param numIterations: Total number of iterations run.
:param miniBatchFraction: Fraction of data on which SGD is run for each
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not your fault, but if you make any additional changes can you add a "." at the end of this sentence?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, the default values should be documented.

@feynmanliang
Copy link
Contributor

LGTM overall

The formal parameter changes to PythonMLLibAPI will break binary compatibility, but this may not be a big issue since the class is private. Perhaps others will have better insight.

@@ -212,7 +218,7 @@ private[python] class PythonMLLibAPI extends Serializable {
initialWeights: Vector,
regType: String,
intercept: Boolean,
validateData: Boolean): JList[Object] = {
validateData: Boolean, convergenceTol: Double): JList[Object] = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

chop down args

@yanboliang
Copy link
Contributor Author

I open SPARK-10560 to track the StreamingLogisticRegressionWithSGD issues included what @feynmanliang mentioned in this PR.
Let's this PR focus on the topic. @mengxr

@SparkQA
Copy link

SparkQA commented Sep 11, 2015

Test build #42317 has finished for PR 8457 at commit c2f3112.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@asfgit asfgit closed this in ce6f3f1 Sep 14, 2015
@mengxr
Copy link
Contributor

mengxr commented Sep 14, 2015

LGTM. Merged into master. Thanks!

@yanboliang yanboliang deleted the spark-10194 branch September 15, 2015 02:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants