[SPARK-28514][ML] Remove the redundant transformImpl method in RF & GBT #25256
Test build #108167 has finished for PR 25256 at commit
|
Seems like a partial revert of #6300, cc @BryanCutler for a further review. |
Yes, from what I can remember the point of these methods was to broadcast the model. It's been a while since I looked at this and it has gotten a little confusing over time. I'm not sure if this is still needed or can be removed cc @mengxr @WeichenXu123 |
I think I'd leave this, as it's on purpose and probably for performance reasons. I wonder if we can just always broadcast the model here? What's the downside? the model is already by default serialized in the closure, so it should serialize. There's overhead to broadcasting a tiny model I guess, but maybe that's fine. |
This sounds reasonable to me and would make the code easier to follow |
@BryanCutler @srowen I am neutral on model broadcasting. I notice that there are three approaches for broadcastable/small models to perform the transformation: As to this PR, if it can improve performance, I am OK to leave |
Is it really not used in |
@srowen The |
Test build #108562 has finished for PR 25256 at commit
|
I see. I wonder if we should be extra safe and override I'm OK with the current approach or further restricting inheritance of |
Good idea, it is reasonable to restrict the |
Test build #108618 has finished for PR 25256 at commit
|
@@ -210,6 +210,9 @@ abstract class ClassificationModel[FeaturesType, M <: ClassificationModel[Featur
    outputData.toDF
  }

  final override private def transformImpl(dataset: Dataset[_]): DataFrame =
Oh, this just can't be private
Test build #108636 has finished for PR 25256 at commit
|
Merged to master |
What changes were proposed in this pull request?
Remove the redundant and confusing transformImpl method in RF & GBT.
In GBTClassifier & RandomForestClassifier, the real transform methods are inherited from ProbabilisticClassificationModel, which can deal with multiple output columns.
The transformImpl method, which deals with only one column (predictionCol), does nothing at all. This is quite confusing.
How was this patch tested?
existing suites
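
The redundancy described above can be illustrated with a minimal, Spark-free sketch. All class names here (PredictionModelSketch, ClassificationModelSketch, ForestSketch) and the Row type alias are hypothetical stand-ins, not the real Spark ML classes, and the exception body is an assumption about how the final override might be written. The sketch shows why a transformImpl defined in a classifier subclass would never run once transform is overridden higher up, and why the sealing override cannot be private.

```scala
// Hypothetical, simplified stand-ins for the Spark ML hierarchy (no Spark dependency).
object TransformImplSketch {
  // A "row" is just a map from column name to value in this sketch.
  type Row = Map[String, Double]

  abstract class PredictionModelSketch {
    def predict(features: Double): Double

    // Default path: transform delegates to transformImpl,
    // which fills only the single "prediction" column.
    def transform(rows: Seq[Row]): Seq[Row] = transformImpl(rows)

    protected def transformImpl(rows: Seq[Row]): Seq[Row] =
      rows.map(r => r + ("prediction" -> predict(r("features"))))
  }

  abstract class ClassificationModelSketch extends PredictionModelSketch {
    def predictRaw(features: Double): Double

    // transform is overridden to fill several output columns itself.
    // It never calls transformImpl, so any subclass override of
    // transformImpl would be dead code.
    override def transform(rows: Seq[Row]): Seq[Row] =
      rows.map { r =>
        val raw = predictRaw(r("features"))
        r + ("rawPrediction" -> raw) + ("prediction" -> (if (raw > 0) 1.0 else 0.0))
      }

    // Sealing the unused hook so subclasses cannot reintroduce it.
    // An override may not narrow visibility, so this must stay at least
    // protected; marking it private is rejected by the Scala compiler.
    final override protected def transformImpl(rows: Seq[Row]): Seq[Row] =
      throw new UnsupportedOperationException(
        s"transformImpl is not supported in $getClass")
  }

  final class ForestSketch extends ClassificationModelSketch {
    def predictRaw(features: Double): Double = features - 0.5
    def predict(features: Double): Double =
      if (predictRaw(features) > 0) 1.0 else 0.0
  }

  def main(args: Array[String]): Unit = {
    val out = new ForestSketch().transform(Seq(Map("features" -> 2.0)))
    println(out)
  }
}
```

In this sketch, calling transform on ForestSketch goes through the multi-column path only, and the final override never fires, which mirrors the argument for removing the per-model transformImpl.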