-
Notifications
You must be signed in to change notification settings - Fork 28.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-45882][SQL] BroadcastHashJoinExec propagate partitioning should respect CoalescedHashPartitioning #43753
Conversation
7a7f465
to
155b86c
Compare
cc @dongjoon-hyun The AQE cache table fix has a perf regression. This PR fixes it |
|
||
override def satisfies0(required: Distribution): Boolean = from.satisfies0(required) | ||
override val expressions: Seq[Expression] = from.expressions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
override val expressions: Seq[Expression] = from.expressions | |
override def expressions: Seq[Expression] = from.expressions |
override protected def withNewChildrenInternal( | ||
newChildren: IndexedSeq[Expression]): CoalescedHashPartitioning = | ||
copy(from = from.copy(expressions = newChildren)) | ||
|
||
override val numPartitions: Int = partitions.length | ||
|
||
override def toString: String = from.toString | ||
override def sql: String = from.sql |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we include CoalescedHashPartitioning
in the string format?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
after second thought, we never hold the CoalescedHashPartitioning so just use the default behavior.
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala
Show resolved
Hide resolved
thanks, merging to master! |
@ulysses-you can you also open backport PRs? thanks! |
…d respect CoalescedHashPartitioning Add HashPartitioningLike trait and make HashPartitioning and CoalescedHashPartitioning extend it. When we propagate output partiitoning, we should handle HashPartitioningLike instead of HashPartitioning. This pr also changes the BroadcastHashJoinExec to use HashPartitioningLike to avoid regression. Avoid unnecessary shuffle exchange. yes, avoid regression add test no Closes apache#43753 from ulysses-you/partitioning. Authored-by: ulysses-you <ulyssesyou18@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
…d respect CoalescedHashPartitioning Add HashPartitioningLike trait and make HashPartitioning and CoalescedHashPartitioning extend it. When we propagate output partiitoning, we should handle HashPartitioningLike instead of HashPartitioning. This pr also changes the BroadcastHashJoinExec to use HashPartitioningLike to avoid regression. Avoid unnecessary shuffle exchange. yes, avoid regression add test no Closes apache#43753 from ulysses-you/partitioning. Authored-by: ulysses-you <ulyssesyou18@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
late LGTM.
…should respect CoalescedHashPartitioning This pr backport #43753 to branch-3.5 ### What changes were proposed in this pull request? Add HashPartitioningLike trait and make HashPartitioning and CoalescedHashPartitioning extend it. When we propagate output partiitoning, we should handle HashPartitioningLike instead of HashPartitioning. This pr also changes the BroadcastHashJoinExec to use HashPartitioningLike to avoid regression. ### Why are the changes needed? Avoid unnecessary shuffle exchange. ### Does this PR introduce _any_ user-facing change? yes, avoid regression ### How was this patch tested? add test ### Was this patch authored or co-authored using generative AI tooling? no Closes #43792 from ulysses-you/partitioning-3.5. Authored-by: ulysses-you <ulyssesyou18@gmail.com> Signed-off-by: youxiduo <youxiduo@corp.netease.com>
…should respect CoalescedHashPartitioning This pr backport #43753 to branch-3.4 ### What changes were proposed in this pull request? Add HashPartitioningLike trait and make HashPartitioning and CoalescedHashPartitioning extend it. When we propagate output partiitoning, we should handle HashPartitioningLike instead of HashPartitioning. This pr also changes the BroadcastHashJoinExec to use HashPartitioningLike to avoid regression. ### Why are the changes needed? Avoid unnecessary shuffle exchange. ### Does this PR introduce _any_ user-facing change? yes, avoid regression ### How was this patch tested? add test ### Was this patch authored or co-authored using generative AI tooling? no Closes #43793 from ulysses-you/partitioning-3.4. Authored-by: ulysses-you <ulyssesyou18@gmail.com> Signed-off-by: youxiduo <youxiduo@corp.netease.com>
…should respect CoalescedHashPartitioning This pr backport apache#43753 to branch-3.4 ### What changes were proposed in this pull request? Add HashPartitioningLike trait and make HashPartitioning and CoalescedHashPartitioning extend it. When we propagate output partiitoning, we should handle HashPartitioningLike instead of HashPartitioning. This pr also changes the BroadcastHashJoinExec to use HashPartitioningLike to avoid regression. ### Why are the changes needed? Avoid unnecessary shuffle exchange. ### Does this PR introduce _any_ user-facing change? yes, avoid regression ### How was this patch tested? add test ### Was this patch authored or co-authored using generative AI tooling? no Closes apache#43793 from ulysses-you/partitioning-3.4. Authored-by: ulysses-you <ulyssesyou18@gmail.com> Signed-off-by: youxiduo <youxiduo@corp.netease.com>
What changes were proposed in this pull request?
Add HashPartitioningLike trait and make HashPartitioning and CoalescedHashPartitioning extend it. When we propagate output partiitoning, we should handle HashPartitioningLike instead of HashPartitioning. This pr also changes the BroadcastHashJoinExec to use HashPartitioningLike to avoid regression.
Why are the changes needed?
Avoid unnecessary shuffle exchange.
Does this PR introduce any user-facing change?
yes, avoid regression
How was this patch tested?
add test
Was this patch authored or co-authored using generative AI tooling?
no