[SPARK-56615][SQL] Clarify KeyedPartitioning satisfies0() and nonGroupedSatisfies()/groupedSatisfies() contract #55538
### What changes were proposed in this pull request?
`KeyedPartitioning.satisfies0` previously returned
`nonGroupedSatisfies(required) || groupedSatisfies(required)` without checking
`isGrouped`, so an ungrouped `KeyedPartitioning` (KP) would incorrectly claim to
satisfy `ClusteredDistribution` if `satisfies()` were called on it directly. The
correct form is:
`nonGroupedSatisfies(required) || (isGrouped && groupedSatisfies(required))`
`EnsureRequirements` never calls `satisfies()` on an ungrouped `KeyedPartitioning`
(it uses `splitKeyedPartitionings` and calls the two helpers directly), so the
missing guard has had no runtime effect. This PR adds the guard to make the
semantics correct for any future direct caller.
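To illustrate the difference the guard makes, here is a minimal, self-contained sketch. It does not use Spark's real classes: the `Distribution` types, the `KeyedPartitioning` fields, the bodies of `nonGroupedSatisfies` and `groupedSatisfies`, and the name `satisfies0Unguarded` are all simplified assumptions made for illustration. Only the two boolean expressions are taken from this PR.

```scala
// Toy model only: these are NOT Spark's classes. All types and helper bodies are assumptions.
object SatisfiesGuardSketch {
  sealed trait Distribution
  case object UnspecifiedDistribution extends Distribution
  final case class ClusteredDistribution(clustering: Seq[String]) extends Distribution

  final case class KeyedPartitioning(keys: Seq[String], isGrouped: Boolean) {
    // Assumed rule: requirements that hold regardless of grouping.
    def nonGroupedSatisfies(required: Distribution): Boolean = required match {
      case UnspecifiedDistribution => true
      case _                       => false
    }

    // Answers "would this KP satisfy `required` if it were grouped?".
    // It deliberately ignores isGrouped. The subset check is a toy rule.
    def groupedSatisfies(required: Distribution): Boolean = required match {
      case ClusteredDistribution(clustering) =>
        keys.nonEmpty && keys.forall(clustering.contains)
      case _ => false
    }

    // Expression used before this PR (hypothetical method name).
    def satisfies0Unguarded(required: Distribution): Boolean =
      nonGroupedSatisfies(required) || groupedSatisfies(required)

    // Expression used after this PR.
    def satisfies0(required: Distribution): Boolean =
      nonGroupedSatisfies(required) || (isGrouped && groupedSatisfies(required))
  }

  def main(args: Array[String]): Unit = {
    val required  = ClusteredDistribution(Seq("a", "b"))
    val ungrouped = KeyedPartitioning(Seq("a"), isGrouped = false)
    val grouped   = ungrouped.copy(isGrouped = true)

    println(ungrouped.satisfies0Unguarded(required)) // true: the latent issue
    println(ungrouped.satisfies0(required))          // false: the guard applies
    println(grouped.satisfies0(required))            // true: grouped KPs are unaffected
  }
}
```

In this model the two forms differ only for an ungrouped KP, which matches the claim above that the missing guard had no runtime effect as long as `satisfies()` was only reached with grouped partitionings.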
Also adds documentation to `KeyedPartitioning` and `EnsureRequirements` explaining
the intended call contract: `groupedSatisfies` answers "would this KP satisfy if it
were grouped?", and `EnsureRequirements` calls it directly on non-grouped KPs to
decide whether inserting `GroupPartitionsExec` would satisfy the distribution.
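The call contract described above can be sketched as a small decision function. This is a hedged illustration, not the actual `EnsureRequirements` code: the `Action` type, the `decide` function, the toy helper bodies, and the `FallBack` branch (standing in for whatever the planner does when grouping would not help) are assumptions introduced here, and `splitKeyedPartitionings` is intentionally not modelled.

```scala
// Toy model only: NOT Spark's EnsureRequirements. Names and rules below are assumptions.
object EnsureRequirementsSketch {
  sealed trait Distribution
  final case class ClusteredDistribution(clustering: Seq[String]) extends Distribution

  final case class KeyedPartitioning(keys: Seq[String], isGrouped: Boolean) {
    // Toy rule: nothing is satisfied without grouping in this model.
    def nonGroupedSatisfies(required: Distribution): Boolean = false

    // "Would this KP satisfy `required` if it were grouped?" (ignores isGrouped).
    def groupedSatisfies(required: Distribution): Boolean = required match {
      case ClusteredDistribution(clustering) =>
        keys.nonEmpty && keys.forall(clustering.contains)
    }
  }

  // Hypothetical outcomes of the planning decision.
  sealed trait Action
  case object KeepAsIs extends Action              // child already satisfies the requirement
  case object InsertGroupPartitions extends Action // stands in for adding GroupPartitionsExec
  case object FallBack extends Action              // assumed: some other remedy is needed

  // Calls the two helpers directly instead of going through satisfies().
  def decide(kp: KeyedPartitioning, required: Distribution): Action =
    if (kp.nonGroupedSatisfies(required)) KeepAsIs
    else if (kp.groupedSatisfies(required)) {
      // Grouping is sufficient; insert it only when the KP is not grouped yet.
      if (kp.isGrouped) KeepAsIs else InsertGroupPartitions
    } else FallBack

  def main(args: Array[String]): Unit = {
    val required = ClusteredDistribution(Seq("a", "b"))
    println(decide(KeyedPartitioning(Seq("a"), isGrouped = false), required)) // InsertGroupPartitions
    println(decide(KeyedPartitioning(Seq("a"), isGrouped = true), required))  // KeepAsIs
    println(decide(KeyedPartitioning(Seq("c"), isGrouped = false), required)) // FallBack
  }
}
```

The point of the sketch is that `groupedSatisfies` must stay independent of `isGrouped`: if it returned false for ungrouped KPs, the planner in this model could never conclude that inserting a grouping step is enough.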
### Why are the changes needed?
The missing `isGrouped &&` guard in `satisfies0` is a latent correctness issue.
The documentation clarifies a subtle and previously undocumented contract between
`KeyedPartitioning` and `EnsureRequirements`.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
No new tests needed -- the fix aligns the implementation with the contract that
`EnsureRequirements` already relies on. Existing `KeyGroupedPartitioningSuite`
and related suites cover the behaviour.
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Sonnet 4.6
    override def satisfies0(required: Distribution): Boolean = {
    -  nonGroupedSatisfies(required) || groupedSatisfies(required)
    +  nonGroupedSatisfies(required) || (isGrouped && groupedSatisfies(required))
Technically, the missing `isGrouped &&` was not an issue as we called this only on grouped `KeyedPartitioning`s.
I get your point, but it would be great to have a small unit test specifically to guard against future regressions (not from this PR). Of course, this is a nit.
Added a simple test in cb2c03e.
Thank you @dongjoon-hyun and @yaooqinn for the review! Merged to |
### What changes were proposed in this pull request?
`KeyedPartitioning.satisfies0` previously delegated unconditionally to both
`nonGroupedSatisfies` and `groupedSatisfies`, meaning an ungrouped KP would
incorrectly claim to satisfy `ClusteredDistribution` if called directly. The
correct form is:
`nonGroupedSatisfies(required) || (isGrouped && groupedSatisfies(required))`
`EnsureRequirements` never calls `satisfies()` on an ungrouped `KeyedPartitioning`
(it uses `splitKeyedPartitionings` and calls the two helpers directly), so the
missing guard has had no runtime effect. This PR adds the guard to make the
semantics correct for any future direct caller.
Also adds documentation to `KeyedPartitioning` and `EnsureRequirements` explaining
the intended call contract: `groupedSatisfies` answers "would this KP satisfy if it
were grouped?", and `EnsureRequirements` calls it directly on non-grouped KPs to
decide whether inserting `GroupPartitionsExec` would satisfy the distribution.
### Why are the changes needed?
The missing `isGrouped &&` guard in `satisfies0` is a latent correctness issue.
The documentation clarifies a subtle and previously undocumented contract between
`KeyedPartitioning` and `EnsureRequirements`.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
The fix aligns the implementation with the contract that `EnsureRequirements`
already relies on. Existing `KeyGroupedPartitioningSuite` and related suites cover
the behaviour, but added a new test as well.
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Sonnet 4.6