[SPARK-48065][SQL] SPJ: allowJoinKeysSubsetOfPartitionKeys is too strict #46325

szehon-ho · 2024-05-01T20:51:45Z

What changes were proposed in this pull request?

If spark.sql.v2.bucketing.allowJoinKeysSubsetOfPartitionKeys.enabled is true, relax the KeyGroupedPartitioning.satisfies0(distribution) check. It currently requires all clustering keys (here, the join keys) to be among the partition keys; with this change it only requires the two sets to overlap.
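To make the difference concrete, here is a minimal sketch of the two checks, assuming keys can be modeled as plain column-name strings. It is a simplified model, not Spark's actual KeyGroupedPartitioning.satisfies0 code (which compares Catalyst expressions), and the object and function names are hypothetical.

```scala
// Simplified model of the flag-enabled key check described above.
// Keys are column-name strings here; Spark itself compares Catalyst
// expressions. All names in this object are hypothetical.
object SubsetJoinKeysCheck {

  // Old behavior: every join (clustering) key must be one of the partition keys.
  def allJoinKeysInPartitionKeys(joinKeys: Seq[String], partitionKeys: Seq[String]): Boolean =
    joinKeys.forall(k => partitionKeys.contains(k))

  // New behavior: it is enough that the join keys and partition keys overlap.
  def joinKeysOverlapPartitionKeys(joinKeys: Seq[String], partitionKeys: Seq[String]): Boolean =
    joinKeys.exists(k => partitionKeys.contains(k))

  def main(args: Array[String]): Unit = {
    val partitionKeys = Seq("id")
    val joinKeys = Seq("id", "name") // more join keys than partition keys

    println(allJoinKeysInPartitionKeys(joinKeys, partitionKeys))   // false
    println(joinKeysOverlapPartitionKeys(joinKeys, partitionKeys)) // true
  }
}
```

With partition key `id` and join keys `id, name`, the old all-keys check rejects the pair while the overlap check accepts it, which is the situation described under "Why are the changes needed?".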

Why are the changes needed?

If spark.sql.v2.bucketing.allowJoinKeysSubsetOfPartitionKeys.enabled is true, SPJ no longer triggers when there are more join keys than partition keys, even though SPJ is supported in this case when the flag is false.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added tests in KeyGroupedPartitioningSuite

Was this patch authored or co-authored using generative AI tooling?

No


szehon-ho · 2024-05-02T22:22:46Z

@sunchao I think it's a simple fix; can you take a look?

sunchao

LGTM

sunchao · 2024-05-02T23:16:32Z

Merged to master, thanks @szehon-ho ! Do you think we need to backport this to branch-3.4 and branch-3.5?

szehon-ho · 2024-05-02T23:42:16Z

Thanks for the fast review! Yeah, will do that.

szehon-ho · 2024-05-03T01:50:06Z

Actually, I just checked: it looks like the original PR #42306 was not backported because it is a new feature, not a bug fix. So I think there's no need.

Closes apache#46325 from szehon-ho/fix_spj_less_join_key.

Authored-by: Szehon Ho <szehon.apache@gmail.com>
Signed-off-by: Chao Sun <chao@openai.com>

github-actions bot added the SQL label May 1, 2024

szehon-ho force-pushed the fix_spj_less_join_key branch from 18aec40 to c2c2659 May 2, 2024 00:49

sunchao approved these changes May 2, 2024


sunchao closed this in 5ec62a7 May 2, 2024
