[SPARK-49905][SQL][SS] Use different ShuffleOrigin for the shuffle required from stateful operators #48382
+38
−2
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
What changes were proposed in this pull request?
This PR proposes to use different ShuffleOrigin for the shuffle required from stateful operators.
Spark has been using ENSURE_REQUIREMENTS as ShuffleOrigin which is open for optimization e.g. AQE can adjust the shuffle spec. Quoting the code of ENSURE_REQUIREMENTS:
But the distribution requirement for stateful operators is lot more strict - it has to use the all expressions to calculate the hash (for partitioning) and the number of shuffle partitions must be the same with the spec. This is because stateful operator assumes that there is 1:1 mapping between the partition for the operator and the "physical" partition for checkpointed state. That said, it is fragile if we allow any optimization to be made against shuffle for stateful operator.
To prevent this, this PR introduces a new ShuffleOrigin with note that the shuffle is not expected to be "modified".
Why are the changes needed?
This exposes a possibility of broken state based on the contract. We introduced StatefulOpClusteredDistribution in similar reason.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
New UT added.
Was this patch authored or co-authored using generative AI tooling?
No.