-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-34356][ML] OVR transform fix potential column conflict #31472
[SPARK-34356][ML] OVR transform fix potential column conflict #31472
Conversation
@@ -223,6 +223,13 @@ class OneVsRestSuite extends MLTest with DefaultReadWriteTest { | |||
assert(oldCols === newCols) | |||
} | |||
|
|||
test("SPARK-SPARK-34356: OneVsRestModel.transform should avoid potential column conflict") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in 3.0.1 and master
|
Kubernetes integration test starting |
Kubernetes integration test status failure |
Test build #134869 has finished for PR 31472 at commit
|
mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala
Outdated
Show resolved
Hide resolved
tmpModel.setPredictionCol("") | ||
tmpModel match { | ||
case m: ProbabilisticClassificationModel[_, _] => m.setProbabilityCol("") | ||
case _ => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this case be silently ignored? if it's always ProbabilisticClassificationModel then just cast?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala
Outdated
Show resolved
Hide resolved
Kubernetes integration test starting |
Kubernetes integration test status success |
Test build #134918 has finished for PR 31472 at commit
|
thanks @srowen for reviewing! |
What changes were proposed in this pull request?
1, clear predictionCol & probabilityCol, use tmp rawPred col, to avoid potential column conflict;
2, use array instead of map, to keep in line with the python side;
3, simplify transform
Why are the changes needed?
if input dataset has a column whose name is
predictionCol
,probabilityCol
,RawPredictionCol
, transfrom will fail.Does this PR introduce any user-facing change?
No
How was this patch tested?
added testsuite