[SPARK-35203][SQL] Improve Repartition statistics estimation #32309
@@ -81,9 +81,9 @@ object BasicStatsPlanVisitor extends LogicalPlanVisitor[Statistics] {
     ProjectEstimation.estimate(p).getOrElse(fallback(p))
   }
 
-  override def visitRepartition(p: Repartition): Statistics = default(p)
+  override def visitRepartition(p: Repartition): Statistics = fallback(p)
Q: Do we need to fall back to the size-based visitor instead of just calling p.child.stats?
For easier maintenance: when BasicStatsPlanVisitor and SizeInBytesOnlyStatsPlanVisitor would have the same implementation, BasicStatsPlanVisitor simply falls back. For example, visitLocalLimit (line 76 at commit 5553429):
override def visitLocalLimit(p: LocalLimit): Statistics = fallback(p)
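The delegation pattern described in this comment can be sketched in plain Scala. This is a simplified model, not Spark's code: `Stats`, `Leaf`, `StatsVisitor`, `SizeOnlyVisitor`, and `BasicVisitor` are hypothetical stand-ins, and the estimates themselves are toy pass-throughs. It only shows how one visitor can reuse the other's implementation through a private `fallback` helper.

```scala
// Hypothetical stand-ins for the plan and statistics types (not Spark's classes).
final case class Stats(sizeInBytes: BigInt, rowCount: Option[BigInt] = None)

sealed trait Plan
final case class Leaf(stats: Stats) extends Plan
final case class LocalLimit(limit: Int, child: Plan) extends Plan
final case class Repartition(numPartitions: Int, child: Plan) extends Plan

// One visit method per operator, dispatched by pattern matching.
trait StatsVisitor {
  def visit(p: Plan): Stats = p match {
    case l: Leaf        => visitLeaf(l)
    case l: LocalLimit  => visitLocalLimit(l)
    case r: Repartition => visitRepartition(r)
  }
  def visitLeaf(p: Leaf): Stats
  def visitLocalLimit(p: LocalLimit): Stats
  def visitRepartition(p: Repartition): Stats
}

// The simpler visitor owns the shared logic (toy estimates: reuse the child's stats).
object SizeOnlyVisitor extends StatsVisitor {
  def visitLeaf(p: Leaf): Stats = p.stats
  def visitLocalLimit(p: LocalLimit): Stats = visit(p.child)
  def visitRepartition(p: Repartition): Stats = visit(p.child)
}

// The richer visitor delegates wherever its logic would be identical,
// so the shared estimate lives in exactly one place.
object BasicVisitor extends StatsVisitor {
  private def fallback(p: Plan): Stats = SizeOnlyVisitor.visit(p)
  def visitLeaf(p: Leaf): Stats = p.stats
  def visitLocalLimit(p: LocalLimit): Stats = fallback(p)
  def visitRepartition(p: Repartition): Stats = fallback(p)
}

object FallbackDemo {
  def main(args: Array[String]): Unit = {
    val child = Leaf(Stats(BigInt(100), Some(BigInt(10))))
    // Both visitors agree because BasicVisitor reuses SizeOnlyVisitor's logic.
    assert(BasicVisitor.visit(Repartition(4, child)) == SizeOnlyVisitor.visit(Repartition(4, child)))
    println(BasicVisitor.visit(Repartition(4, child)))
  }
}
```

The design choice is purely about maintenance: if the shared estimate ever changes, only the size-only visitor needs updating.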
Ah, I see. It's trivial and sgtm.
Looks fine otherwise.
Thank you, @wangyum. Merged to master.
What changes were proposed in this pull request?
This PR improves Repartition and RepartitionByExpr statistics estimation using child statistics.
Why are the changes needed?
The current implementation loses column statistics. For example:
Before this PR:
After this PR:
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Unit test.
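As a rough illustration of the property such a test might check, here is a minimal sketch in plain Scala. It does not use Spark's classes or the PR's actual test: `ColStat`, `PlanStats`, and `RepartitionEstimates` are hypothetical names, and the two estimators are toy models of the behavior described above (column statistics lost versus child statistics reused).

```scala
// Toy stand-ins for the statistics types (hypothetical, not Spark's classes).
final case class ColStat(distinctCount: Option[BigInt], nullCount: Option[BigInt])
final case class PlanStats(
    sizeInBytes: BigInt,
    rowCount: Option[BigInt] = None,
    columnStats: Map[String, ColStat] = Map.empty)

object RepartitionEstimates {
  // Models the problem described above: only size and row count are kept,
  // so per-column statistics are lost.
  def withoutColumnStats(child: PlanStats): PlanStats =
    PlanStats(child.sizeInBytes, child.rowCount)

  // Models reusing the child's statistics unchanged: repartitioning moves rows
  // between partitions but does not add, drop, or modify them.
  def fromChild(child: PlanStats): PlanStats = child
}

object RepartitionStatsCheck {
  def main(args: Array[String]): Unit = {
    val child = PlanStats(
      sizeInBytes = BigInt(800),
      rowCount = Some(BigInt(100)),
      columnStats = Map("id" -> ColStat(Some(BigInt(100)), Some(BigInt(0)))))

    // Column statistics vanish in the first model and survive in the second.
    assert(RepartitionEstimates.withoutColumnStats(child).columnStats.isEmpty)
    assert(RepartitionEstimates.fromChild(child) == child)
    println("column statistics preserved")
  }
}
```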