New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-36228][SQL] Skip splitting a skewed partition when some map outputs are removed #33445
Conversation
Test build #141345 has finished for PR 33445 at commit
|
Kubernetes integration test starting |
Kubernetes integration test status success |
retest this please |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pending CI
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
LGTM. +1 |
Kubernetes integration test starting |
Kubernetes integration test status success |
Test build #141373 has finished for PR 33445 at commit
|
thanks for the review, merging to master/3.2! |
…tputs are removed ### What changes were proposed in this pull request? Sometimes, AQE skew join optimization can fail with NPE. This is because AQE tries to get the shuffle block sizes, but some map outputs are missing due to the executor lost or something. This PR fixes this bug by skipping skew join handling if some map outputs are missing in the `MapOutputTracker`. ### Why are the changes needed? bug fix ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? a new UT Closes #33445 from cloud-fan/bug. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com> (cherry picked from commit 9c8a3d3) Signed-off-by: Wenchen Fan <wenchen@databricks.com>
What changes were proposed in this pull request?
Sometimes, AQE skew join optimization can fail with NPE. This is because AQE tries to get the shuffle block sizes, but some map outputs are missing due to the executor lost or something.
This PR fixes this bug by skipping skew join handling if some map outputs are missing in the
MapOutputTracker
.Why are the changes needed?
bug fix
Does this PR introduce any user-facing change?
no
How was this patch tested?
a new UT