Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-45882][SQL] BroadcastHashJoinExec propagate partitioning should respect CoalescedHashPartitioning #43753

Closed
wants to merge 2 commits into from

Conversation

ulysses-you
Copy link
Contributor

What changes were proposed in this pull request?

Add HashPartitioningLike trait and make HashPartitioning and CoalescedHashPartitioning extend it. When we propagate output partiitoning, we should handle HashPartitioningLike instead of HashPartitioning. This pr also changes the BroadcastHashJoinExec to use HashPartitioningLike to avoid regression.

Why are the changes needed?

Avoid unnecessary shuffle exchange.

Does this PR introduce any user-facing change?

yes, avoid regression

How was this patch tested?

add test

Was this patch authored or co-authored using generative AI tooling?

no

@github-actions github-actions bot added the SQL label Nov 10, 2023
@cloud-fan
Copy link
Contributor

cc @dongjoon-hyun The AQE cache table fix has a perf regression. This PR fixes it


override def satisfies0(required: Distribution): Boolean = from.satisfies0(required)
override val expressions: Seq[Expression] = from.expressions
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
override val expressions: Seq[Expression] = from.expressions
override def expressions: Seq[Expression] = from.expressions

override protected def withNewChildrenInternal(
newChildren: IndexedSeq[Expression]): CoalescedHashPartitioning =
copy(from = from.copy(expressions = newChildren))

override val numPartitions: Int = partitions.length

override def toString: String = from.toString
override def sql: String = from.sql
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we include CoalescedHashPartitioning in the string format?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after second thought, we never hold the CoalescedHashPartitioning so just use the default behavior.

@cloud-fan cloud-fan closed this in 961dcdf Nov 13, 2023
@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan
Copy link
Contributor

@ulysses-you can you also open backport PRs? thanks!

ulysses-you added a commit to ulysses-you/spark that referenced this pull request Nov 14, 2023
…d respect CoalescedHashPartitioning

Add HashPartitioningLike trait and make HashPartitioning and CoalescedHashPartitioning extend it. When we propagate output partiitoning, we should handle HashPartitioningLike instead of HashPartitioning. This pr also changes the BroadcastHashJoinExec to use HashPartitioningLike to avoid regression.

Avoid unnecessary shuffle exchange.

yes, avoid regression

add test

no

Closes apache#43753 from ulysses-you/partitioning.

Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@ulysses-you ulysses-you deleted the partitioning branch November 14, 2023 01:45
ulysses-you added a commit to ulysses-you/spark that referenced this pull request Nov 14, 2023
…d respect CoalescedHashPartitioning

Add HashPartitioningLike trait and make HashPartitioning and CoalescedHashPartitioning extend it. When we propagate output partiitoning, we should handle HashPartitioningLike instead of HashPartitioning. This pr also changes the BroadcastHashJoinExec to use HashPartitioningLike to avoid regression.

Avoid unnecessary shuffle exchange.

yes, avoid regression

add test

no

Closes apache#43753 from ulysses-you/partitioning.

Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Copy link
Contributor

@beliefer beliefer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

late LGTM.

ulysses-you added a commit that referenced this pull request Nov 14, 2023
…should respect CoalescedHashPartitioning

This pr backport #43753 to branch-3.5

### What changes were proposed in this pull request?

Add HashPartitioningLike trait and make HashPartitioning and CoalescedHashPartitioning extend it. When we propagate output partiitoning, we should handle HashPartitioningLike instead of HashPartitioning. This pr also changes the BroadcastHashJoinExec to use HashPartitioningLike to avoid regression.

### Why are the changes needed?

Avoid unnecessary shuffle exchange.

### Does this PR introduce _any_ user-facing change?

yes, avoid regression

### How was this patch tested?

add test

### Was this patch authored or co-authored using generative AI tooling?

no

Closes #43792 from ulysses-you/partitioning-3.5.

Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: youxiduo <youxiduo@corp.netease.com>
ulysses-you added a commit that referenced this pull request Nov 14, 2023
…should respect CoalescedHashPartitioning

This pr backport #43753 to branch-3.4

### What changes were proposed in this pull request?

Add HashPartitioningLike trait and make HashPartitioning and CoalescedHashPartitioning extend it. When we propagate output partiitoning, we should handle HashPartitioningLike instead of HashPartitioning. This pr also changes the BroadcastHashJoinExec to use HashPartitioningLike to avoid regression.

### Why are the changes needed?

Avoid unnecessary shuffle exchange.

### Does this PR introduce _any_ user-facing change?

yes, avoid regression

### How was this patch tested?

add test

### Was this patch authored or co-authored using generative AI tooling?

no

Closes #43793 from ulysses-you/partitioning-3.4.

Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: youxiduo <youxiduo@corp.netease.com>
szehon-ho pushed a commit to szehon-ho/spark that referenced this pull request Feb 7, 2024
…should respect CoalescedHashPartitioning

This pr backport apache#43753 to branch-3.4

### What changes were proposed in this pull request?

Add HashPartitioningLike trait and make HashPartitioning and CoalescedHashPartitioning extend it. When we propagate output partiitoning, we should handle HashPartitioningLike instead of HashPartitioning. This pr also changes the BroadcastHashJoinExec to use HashPartitioningLike to avoid regression.

### Why are the changes needed?

Avoid unnecessary shuffle exchange.

### Does this PR introduce _any_ user-facing change?

yes, avoid regression

### How was this patch tested?

add test

### Was this patch authored or co-authored using generative AI tooling?

no

Closes apache#43793 from ulysses-you/partitioning-3.4.

Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: youxiduo <youxiduo@corp.netease.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
3 participants