[SPARK-6256] [MLlib] MLlib Python API parity check for regression #4997

Closed
wants to merge 3 commits into from

yanboliang
Contributor

MLlib Python API parity check for regression. The major disparities that need to be added to the Python API are listed below:

LinearRegressionWithSGD
    setValidateData
LassoWithSGD
    setIntercept
    setValidateData
RidgeRegressionWithSGD
    setIntercept
    setValidateData

setFeatureScaling is an MLlib-private function, so it does not need to be exposed in PySpark.

@SparkQA

SparkQA commented Mar 12, 2015

Test build #28509 has started for PR 4997 at commit 2dff3df.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Mar 12, 2015

Test build #28509 has finished for PR 4997 at commit 2dff3df.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28509/

@yanboliang
Contributor Author

@jkbradley @mengxr Can you review this patch?

@mengxr
Contributor

mengxr commented Mar 20, 2015

@yanboliang setFeatureScaling is not a public method, and we were a little hesitant to expose it. Shall we only add validateData in this PR?

lassoAlg.optimizer
  .setNumIterations(numIterations)
  .setRegParam(regParam)
  .setStepSize(stepSize)
  .setMiniBatchFraction(miniBatchFraction)
lassoAlg.optimizer.setUpdater(getUpdaterFromString(regType))
Contributor

Use builder pattern.
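
For readers unfamiliar with the term, here is a minimal Python sketch of what chaining setters in a builder style looks like. `SketchOptimizer` and its method names are hypothetical stand-ins invented for illustration; they are not the MLlib optimizer or any PySpark API. The only point is that each setter returns the object itself, so the separate `setUpdater` call in the snippet above could be folded into the same chain.

```python
class SketchOptimizer:
    """Hypothetical stand-in for an optimizer with chainable setters."""

    def __init__(self):
        self.params = {}

    def _set(self, name, value):
        self.params[name] = value
        return self  # returning self is what makes chaining possible

    def set_num_iterations(self, n):
        return self._set("numIterations", n)

    def set_reg_param(self, r):
        return self._set("regParam", r)

    def set_step_size(self, s):
        return self._set("stepSize", s)

    def set_mini_batch_fraction(self, f):
        return self._set("miniBatchFraction", f)

    def set_updater(self, u):
        return self._set("updater", u)


# One chain configures everything, including the updater.
optimizer = (
    SketchOptimizer()
    .set_num_iterations(100)
    .set_reg_param(0.01)
    .set_step_size(1.0)
    .set_mini_batch_fraction(1.0)
    .set_updater("l1")  # placeholder value, assumed for the sketch
)
```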

@jkbradley
Member

Let's also add setIntercept. Also, in addition to setFeatureScaling being private, we do not need to expose optimizer.setUpdater for the 2 algorithms you listed because they have fixed updaters they should use (corresponding to the regularization they use).
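
To illustrate why a fixed updater per algorithm makes sense, the sketch below pairs lasso with an L1 step and ridge with a squared-L2 step, in their textbook forms. This is an assumption-laden illustration, not MLlib's code: the function names and the `FIXED_PENALTY` table are hypothetical, and the real updaters may differ in detail (for example in how the step size is scheduled).

```python
def l1_step(weights, gradient, step, reg_param):
    """Gradient step followed by soft-thresholding (L1 / lasso penalty)."""
    shrink = step * reg_param
    updated = []
    for w, g in zip(weights, gradient):
        w_new = w - step * g
        # Soft-threshold: pull toward zero, clamping at zero instead of crossing it.
        magnitude = max(abs(w_new) - shrink, 0.0)
        updated.append(magnitude if w_new >= 0 else -magnitude)
    return updated


def l2_step(weights, gradient, step, reg_param):
    """Gradient step with multiplicative shrinkage (squared-L2 / ridge penalty)."""
    return [w * (1.0 - step * reg_param) - step * g
            for w, g in zip(weights, gradient)]


# Hypothetical lookup: each algorithm is tied to exactly one penalty,
# so there is nothing for a caller to choose.
FIXED_PENALTY = {
    "LassoWithSGD": l1_step,
    "RidgeRegressionWithSGD": l2_step,
}
```

Because the penalty is what defines each algorithm, letting a caller swap it would turn lasso into something that is no longer lasso, which is the reason given above for not exposing the setter.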

@SparkQA

SparkQA commented Mar 23, 2015

Test build #28978 has started for PR 4997 at commit de5ecbc.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Mar 23, 2015

Test build #28978 has finished for PR 4997 at commit de5ecbc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28978/

@@ -111,9 +111,11 @@ private[python] class PythonMLLibAPI extends Serializable {
       initialWeights: Vector,
       regParam: Double,
       regType: String,
-      intercept: Boolean): JList[Object] = {
+      intercept: Boolean,
Contributor Author

Yes, "addIntercept" would be clearer and more consistent.

@SparkQA

SparkQA commented Mar 24, 2015

Test build #29090 has started for PR 4997 at commit 1fb7b4f.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Mar 24, 2015

Test build #29090 has finished for PR 4997 at commit 1fb7b4f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29090/

@@ -142,7 +149,8 @@ class LinearRegressionWithSGD(object):

     @classmethod
     def train(cls, data, iterations=100, step=1.0, miniBatchFraction=1.0,
-              initialWeights=None, regParam=0.0, regType=None, intercept=False):
+              initialWeights=None, regParam=0.0, regType=None, addIntercept=False,
Member

I'm sorry! I got confused about "intercept," thinking it was being added to this class. We should stick with the original name ("intercept") since it's a public API change otherwise.
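
A small Python sketch of why the rename counts as a public API change. The three functions are simplified, hypothetical stand-ins for the `train` signature discussed here, not the actual PySpark code. Callers that already pass `intercept=` by keyword keep working when a new keyword with a default is appended, but break if the existing keyword is renamed.

```python
def train_original(data, iterations=100, intercept=False):
    """Simplified stand-in for the signature before this PR."""
    return {"intercept": intercept}


def train_renamed(data, iterations=100, addIntercept=False):
    """Stand-in with the keyword renamed: existing keyword callers break."""
    return {"intercept": addIntercept}


def train_extended(data, iterations=100, intercept=False, validateData=True):
    """Stand-in that keeps the old keyword and appends a new one with a default."""
    return {"intercept": intercept, "validateData": validateData}


def call_like_existing_user(train):
    """Call `train` the way pre-existing user code would, by keyword."""
    try:
        return train([], intercept=True)
    except TypeError:
        return None  # the keyword the caller relied on no longer exists


results = {
    f.__name__: call_like_existing_user(f)
    for f in (train_original, train_renamed, train_extended)
}
```

In this sketch only `train_renamed` fails for the existing caller, which matches the reasoning for keeping the original name.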

@jkbradley
Member

@yanboliang It looks fine to me, except for the intercept issue (sorry!) and the doc tests. Could you please add doc tests for LassoWithSGD and RidgeRegressionWithSGD that use setIntercept and setValidateData? Thanks!

@SparkQA

SparkQA commented Mar 25, 2015

Test build #29160 has started for PR 4997 at commit 102f498.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Mar 25, 2015

Test build #29160 has finished for PR 4997 at commit 102f498.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29160/

@jkbradley
Member

@yanboliang LGTM. Merging into master.
Thanks!

@asfgit asfgit closed this in 4353373 Mar 25, 2015
@yanboliang yanboliang deleted the spark-6256 branch April 24, 2015 10:03