[SPARK-33832][SQL] Add an option in AQE to mitigate skew even if it causes a new shuffle #30829
ekoifman wants to merge 4 commits into apache:master
Conversation
This failed with …

Jenkins, retest this please
Hi,

ok to test
It's hard to calculate the cost of a shuffle and compare it with the benefit of skew join handling. We need some way to tune it manually. But I don't understand why this patch is so complicated. Isn't it simply to skip counting the shuffles at the end of …
I agree that automatically computing the cost of the new shuffle is hard. That is why this first step uses an option to enable this behavior, based on previous runs of a query or a report, for example. I think the mechanics of adding a new shuffle have to be built first before it can become a cost-based decision. I'm not sure what you mean by "skip counting".
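As a rough illustration of what such an opt-in flag looks like in Spark's config machinery, here is a hedged sketch using the `SQLConf.buildConf` builder DSL; the doc string and placement are my assumptions, not copied from the actual patch:

```scala
// Illustrative sketch only: mirrors the spark.sql.adaptive.force.if.shuffle
// option discussed in this PR using Spark's SQLConf builder DSL. The doc text
// is an assumption, not taken from the real change.
val ADAPTIVE_FORCE_SKEW_JOIN_IF_SHUFFLE =
  buildConf("spark.sql.adaptive.force.if.shuffle")
    .doc("When true, OptimizeSkewedJoin mitigates join skew even if doing " +
      "so introduces an additional shuffle for downstream operators.")
    .booleanConf
    .createWithDefault(false)
```

Defaulting to `false` keeps the existing conservative behavior unless the user explicitly opts in.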
So if … this seems like the easiest way to preserve the flow of the existing algorithm. The … in … The processing of the final stage (before any of my changes) is slightly different.
@ekoifman ah, I get your point. By design, …
@cloud-fan I don't think you can move
OK, then can you briefly explain your idea about how to allow …
A Rule in …
This seems like a big change to the AQE framework. If it's rare to force-apply the skew join handling, how about we start with a sub-optimal solution: we just add the extra shuffle and do not create a query stage for this shuffle. Under the hood, …
@cloud-fan I think introducing an "unhandled" shuffle is conceptually a bigger change. Currently all the code assumes (and tests assert it) that there are no "unhandled" shuffles. That is the reason I made this PR such that when a …
I think the mechanics of adding a shuffle in the middle of stage creation would be just as complicated. Currently, …
I'm raising these questions to illustrate why I don't think adding an "unhandled" shuffle makes it easier. Please let me know if you think they have easy answers.
Query stage is quite self-contained. One problem is, …
After a second thought, I think adding "unhandled" shuffle is a bit risky, but allowing the stage optimization phase to add new shuffles is too complicated. I'd like to revisit the idea of putting the skew join optimization rule in the stage preparation phase. For the two points you gave:
RE 1, I see this comment on …
Also, I'd like to clarify one thing in the current PR. Stage optimization doesn't actually add the shuffle. So in fact, the new shuffle is added during query stage preparation as you suggest, but it looks different in the code because …
Ah, sorry for my bad memory, so the comment in …
I get your point that it's like backtracking, but it does make the control flow more complicated. And using exceptions for control flow is an anti-pattern (see here); we need more effort to refactor the code of the AQE loop. So I'd like to avoid changing the control flow if possible. I don't see any blockers to run …
ok, let me think about this - I'll be back
The main loop in …
Why does it check for new shuffles? Under what circumstances can …
If we move …
I think it's just a very conservative check. We can skip this check if the config to force-apply skew join optimization is turned on.
@ekoifman This check is to guard against re-planning that turns an already shuffle-materialized SMJ into a BHJ while causing extra shuffles downstream. One way around this might be to cost a skew join differently so that it counters the increased cost of the extra shuffle.
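The "cost a skew join differently" idea can be sketched as a shuffle-counting cost that simply exempts shuffles introduced by forced skew mitigation, so the re-planned query is not rejected for having more shuffles. This is a self-contained, hypothetical design sketch, not code from the PR or from Spark's actual cost evaluator; the node types and names here are invented for illustration:

```scala
// Hypothetical sketch: a shuffle-counting cost that does not penalize the
// extra shuffle introduced by forced skew-join mitigation. Simplified plan
// nodes stand in for Spark's physical operators.
sealed trait Node { def children: Seq[Node] }
case class Shuffle(fromSkewMitigation: Boolean, children: Seq[Node]) extends Node
case class Op(children: Seq[Node]) extends Node

case class ShuffleCount(n: Int) extends Ordered[ShuffleCount] {
  def compare(that: ShuffleCount): Int = n.compare(that.n)
}

object SkewAwareCost {
  // Count shuffles, skipping those added purely to handle a skewed join.
  def cost(plan: Node): ShuffleCount = {
    def count(node: Node): Int = {
      val self = node match {
        case Shuffle(fromSkew, _) => if (fromSkew) 0 else 1
        case _                    => 0
      }
      self + node.children.map(count).sum
    }
    ShuffleCount(count(plan))
  }
}
```

Under such a scheme, a plan whose only new shuffle came from skew mitigation would compare equal to the original plan, so the conservative "no new shuffles" check would no longer reject it.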
Hi @cloud-fan, @maryannxue
@cloud-fan do you have any more thoughts about this or #31653?
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
What changes were proposed in this pull request?
This PR adds a `spark.sql.adaptive.force.if.shuffle` config option which, if set to `true`, causes `OptimizeSkewedJoin` to perform skew mitigation even if it requires a new shuffle to be performed. For example, the current behavior is to not perform skew mitigation in the following query:
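(The original query snippet did not survive extraction. The following is a hypothetical example of the shape being discussed, where an aggregate reuses the join key's partitioning, so splitting the skewed join partitions would require re-shuffling before the aggregate; the table and column names are invented, and a `SparkSession` is assumed in scope.)

```scala
// Hypothetical illustration (not the PR's original example). The groupBy on
// "key" reuses the SMJ's hash partitioning on "key", so OptimizeSkewedJoin
// would normally refuse to split skewed partitions here: doing so would
// invalidate the aggregate's required distribution and force a new shuffle.
import org.apache.spark.sql.functions.sum

val joined = spark.table("t1").join(spark.table("t2"), "key") // skewed on "key"
val result = joined.groupBy("key").agg(sum("value"))
```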
With `spark.sql.adaptive.force.if.shuffle=true`, the SMJ will become `skew=true` and a new shuffle will be added for the Aggregate. The new shuffle itself will be processed by AQE and coalesced, etc. if needed.

Why are the changes needed?
It enables AQE to apply skew mitigation in cases where the cost of the extra shuffle is less than the cost of unmitigated skew.
Does this PR introduce any user-facing change?
No. With `spark.sql.adaptive.force.if.shuffle=false`, which is the default, behavior is unchanged.

How was this patch tested?
New UTs + enhancements of existing ones