[SPARK-17138][ML][MLib] Add Python API for multinomial logistic regression #14852
WeichenXu123 wants to merge 3 commits into apache:master from
Test build #64543 has finished for PR 14852 at commit
Now that #14834 has been merged, we can update the Python API. There is no new interface to implement, but it would be great if this PR could update the Python side to reflect that LOR now supports multiclass.
2ddb948 to 0e9b423
Test build #65721 has finished for PR 14852 at commit
cc @sethah @yanboliang thanks!
@WeichenXu123 We should also expose
Done. Thanks! @yanboliang
Test build #65828 has finished for PR 14852 at commit
sethah left a comment
Looking good, only minor comments. Thanks!
python/pyspark/ml/classification.py
Outdated
| """ | ||
| Logistic regression. | ||
| Currently, this class only supports binary classification. | ||
| This class supports binary and multinomial classification. |
"This supports multinomial logistic (softmax) and binomial logistic regression."
| return self._call_java("intercept") | ||
|
|
||
| @property | ||
| @since("2.1.0") |
We should document the behavior of coefficients and intercept for the multinomial case, which is that they will throw an error. You can see the scaladoc for reference.
python/pyspark/ml/classification.py
Outdated
| DenseVector([5.5...]) | ||
| >>> model.intercept | ||
| -2.68... | ||
| >>> lrm = LogisticRegression(maxIter=5, regParam=0.01, weightCol="weight", |
Let's use mlor/mlorModel as the variable names for the multinomial case and blor/blorModel for the binomial case.
python/pyspark/ml/classification.py
Outdated
| -2.68... | ||
| >>> lrm = LogisticRegression(maxIter=5, regParam=0.01, weightCol="weight", | ||
| ... family="multinomial") | ||
| >>> modelm = lrm.fit(df) |
It might be nice to fit this to an actual multiclass dataset; that way we can show that it works and produces a model with numClasses > 2.
Done. Thanks for the careful review :) @sethah
Test build #65885 has finished for PR 14852 at commit
LGTM. @yanboliang what do you think?
LGTM2, merged into master. Thanks! @WeichenXu123 @sethah
What changes were proposed in this pull request?
Add Python API for multinomial logistic regression.
- family param in python api.
- coefficientMatrix and interceptVector for LogisticRegressionModel
How was this patch tested?
existing and added doc tests.
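For readers unfamiliar with the multinomial case, here is a minimal plain-Python sketch of how a coefficient matrix and an intercept vector combine into class probabilities via softmax. It is not code from this patch and does not call Spark: the function name `softmax_probabilities`, the toy values, and the layout (one row of coefficients per class, one intercept per class) are assumptions made for illustration.

```python
import math


def softmax_probabilities(coefficient_matrix, intercept_vector, features):
    """Return one probability per class for a single feature vector.

    Illustrative helper (hypothetical, not part of pyspark): assumes
    coefficient_matrix has one row per class and intercept_vector has
    one entry per class.
    """
    # Linear margin for each class: dot(row, features) + intercept.
    margins = [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(coefficient_matrix, intercept_vector)
    ]
    # Subtract the largest margin before exponentiating for numerical stability.
    max_margin = max(margins)
    exps = [math.exp(m - max_margin) for m in margins]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical 3-class, 2-feature model (values invented for the sketch).
coefficient_matrix = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
intercept_vector = [0.0, 0.0, 0.0]

probs = softmax_probabilities(coefficient_matrix, intercept_vector, [2.0, 0.5])
# The predicted class is the index with the largest probability.
predicted_class = max(range(len(probs)), key=probs.__getitem__)
```

With a single row and a sigmoid in place of softmax, the same structure reduces to the binomial case, which is presumably why a single coefficient vector and scalar intercept only make sense for binary models.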