New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-6256] [MLlib] MLlib Python API parity check for regression #4997
Conversation
Test build #28509 has started for PR 4997 at commit
|
Test build #28509 has finished for PR 4997 at commit
|
Test PASSed. |
@jkbradley @mengxr Can you review this patch? |
@yanboliang |
lassoAlg.optimizer | ||
.setNumIterations(numIterations) | ||
.setRegParam(regParam) | ||
.setStepSize(stepSize) | ||
.setMiniBatchFraction(miniBatchFraction) | ||
lassoAlg.optimizer.setUpdater(getUpdaterFromString(regType)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use builder pattern.
Let's also add setIntercept. Also, in addition to setFeatureScaling being private, we do not need to expose optimizer.setUpdater for the 2 algorithms you listed because they have fixed updaters they should use (corresponding to the regularization they use). |
Test build #28978 has started for PR 4997 at commit
|
Test build #28978 has finished for PR 4997 at commit
|
Test PASSed. |
@@ -111,9 +111,11 @@ private[python] class PythonMLLibAPI extends Serializable { | |||
initialWeights: Vector, | |||
regParam: Double, | |||
regType: String, | |||
intercept: Boolean): JList[Object] = { | |||
intercept: Boolean, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, "addIntercept" should be more clear and consistent.
Test build #29090 has started for PR 4997 at commit
|
Test build #29090 has finished for PR 4997 at commit
|
Test PASSed. |
@@ -142,7 +149,8 @@ class LinearRegressionWithSGD(object): | |||
|
|||
@classmethod | |||
def train(cls, data, iterations=100, step=1.0, miniBatchFraction=1.0, | |||
initialWeights=None, regParam=0.0, regType=None, intercept=False): | |||
initialWeights=None, regParam=0.0, regType=None, addIntercept=False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sorry! I got confused about "intercept," thinking it was being added to this class. We should stick with the original name ("intercept") since it's a public API change otherwise.
@yanboliang It looks fine to me, except for the intercept issue (sorry!) and for doc tests. Could you please add doc tests for LassoWithSGD, RidgeRegressionWithSGD using setIntercept, setValidateData? Thanks! |
Test build #29160 has started for PR 4997 at commit
|
Test build #29160 has finished for PR 4997 at commit
|
Test PASSed. |
@yanboliang LGTM merging into master |
MLlib Python API parity check for Regression, major disparities need to be added for Python list following:
setFeatureScaling is mllib private function which is not needed to expose in pyspark.