Skip to content

[SPARK-56615][SQL] Clarify KeyedPartitioning satisfies0() and nonGroupedSatisfies()/groupedSatisfies() contract#55538

Closed
peter-toth wants to merge 2 commits intoapache:masterfrom
peter-toth:SPARK-56615-keyedpartitioning-clarifications
Closed

[SPARK-56615][SQL] Clarify KeyedPartitioning satisfies0() and nonGroupedSatisfies()/groupedSatisfies() contract#55538
peter-toth wants to merge 2 commits intoapache:masterfrom
peter-toth:SPARK-56615-keyedpartitioning-clarifications

Conversation

@peter-toth
Copy link
Copy Markdown
Contributor

@peter-toth peter-toth commented Apr 24, 2026

What changes were proposed in this pull request?

KeyedPartitioning.satisfies0 previously delegated unconditionally to both nonGroupedSatisfies and groupedSatisfies, meaning an ungrouped KP would incorrectly claim to satisfy ClusteredDistribution if called directly. The correct form is:

nonGroupedSatisfies(required) || (isGrouped && groupedSatisfies(required))

EnsureRequirements never calls satisfies() on an ungrouped KeyedPartitioning (it uses splitKeyedPartitionings and calls the two helpers directly), so the missing guard has had no runtime effect. This PR adds the guard to make the semantics correct for any future direct caller.

Also adds documentation to KeyedPartitioning and EnsureRequirements explaining the intended call contract: groupedSatisfies answers "would this KP satisfy if it were grouped?", and EnsureRequirements calls it directly on non-grouped KPs to decide whether inserting GroupPartitionsExec would satisfy the distribution.

Why are the changes needed?

The missing isGrouped && guard in satisfies0 is a latent correctness issue. The documentation clarifies a subtle and previously undocumented contract between KeyedPartitioning and EnsureRequirements.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

The fix aligns the implementation with the contract that EnsureRequirements already relies on. Existing KeyGroupedPartitioningSuite and related suites cover the behaviour, but added a new test as well.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Sonnet 4.6

…dSatisfies/groupedSatisfies contract

### What changes were proposed in this pull request?

`KeyedPartitioning.satisfies0` previously delegated unconditionally to both
`nonGroupedSatisfies` and `groupedSatisfies`, meaning an ungrouped KP would
incorrectly claim to satisfy `ClusteredDistribution` if called directly. The
correct form is:

    nonGroupedSatisfies(required) || (isGrouped && groupedSatisfies(required))

`EnsureRequirements` never calls `satisfies()` on an ungrouped `KeyedPartitioning`
(it uses `splitKeyedPartitionings` and calls the two helpers directly), so the
missing guard has had no runtime effect. This PR adds the guard to make the
semantics correct for any future direct caller.

Also adds documentation to `KeyedPartitioning` and `EnsureRequirements` explaining
the intended call contract: `groupedSatisfies` answers "would this KP satisfy if it
were grouped?", and `EnsureRequirements` calls it directly on non-grouped KPs to
decide whether inserting `GroupPartitionsExec` would satisfy the distribution.

### Why are the changes needed?

The missing `isGrouped &&` guard in `satisfies0` is a latent correctness issue.
The documentation clarifies a subtle and previously undocumented contract between
`KeyedPartitioning` and `EnsureRequirements`.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

No new tests needed -- the fix aligns the implementation with the contract that
`EnsureRequirements` already relies on. Existing `KeyGroupedPartitioningSuite`
and related suites cover the behaviour.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Sonnet 4.6
@peter-toth peter-toth marked this pull request as ready for review April 24, 2026 16:19
@peter-toth
Copy link
Copy Markdown
Contributor Author

peter-toth commented Apr 24, 2026

cc @dongjoon-hyun , @cloud-fan, @szehon-ho, @yaooqinn


override def satisfies0(required: Distribution): Boolean = {
nonGroupedSatisfies(required) || groupedSatisfies(required)
nonGroupedSatisfies(required) || (isGrouped && groupedSatisfies(required))
Copy link
Copy Markdown
Contributor Author

@peter-toth peter-toth Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically, the missing isGrouped && was not an issue as we called this only on grouped KeyedPartitionings.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, @peter-toth .

@dongjoon-hyun
Copy link
Copy Markdown
Member

dongjoon-hyun commented Apr 24, 2026

I got your point, but it would be great if we can have a small unit test case specifically to prevent any regressions (not from this PR) for future proof. Of course, this is nit.

No new tests needed -- the fix aligns the implementation with the contract that EnsureRequirements already relies on. Existing KeyGroupedPartitioningSuite and related suites cover the behaviour.

@peter-toth
Copy link
Copy Markdown
Contributor Author

I got your point, but it would be great if we can have a small unit test case specifically to prevent any regressions (not from this PR) for future proof. Of course, this is nit.

Added a simple test in cb2c03e.

@peter-toth
Copy link
Copy Markdown
Contributor Author

Thank you @dongjoon-hyun and @yaooqinn for the review!

Merged to master (4.2.0).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants