[SPARK-37316][SQL] Add code-gen for existence sort merge join #34601
@@ -675,6 +677,9 @@ case class SortMergeJoinExec(
bufferedVars ++ streamedVars
case LeftSemi | LeftAnti =>
streamedVars
case ExistenceJoin(_) =>
streamedVars ++ Seq(ExprCode.forNonNullValue( |
Just FYI: this uses the same logic as hash join, see https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashJoin.scala#L667-L668.
cc @cloud-fan could you help take a look when you have time? Thanks.
thanks, merging to master!
Thank you @cloud-fan for review!
What changes were proposed in this pull request?
This PR adds code-gen for existence sort merge join. It follows the same algorithm used in iterator mode from SortMergeJoinExec.scala and HashJoin.scala: for every left-side row, check whether there is a match on the right side, then output the left-side row together with a boolean flag indicating whether it has a match.
To help review: this PR also changes several TPCDS plan files. The file with the real code change is SortMergeJoinExec.scala.
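For readers unfamiliar with existence joins, the row-level semantics described above can be sketched in plain Scala. This is only an illustration and not the code generated by this PR: the object name `ExistenceMergeJoinSketch`, the function `existenceJoin`, and its parameters are hypothetical, both inputs are assumed to be already sorted by join key, and null keys and any extra join condition are ignored.

```scala
// Hypothetical sketch, not Spark code: existence join over two key-sorted inputs.
object ExistenceMergeJoinSketch {
  /** For each left row, emit (row, exists), where exists is true
    * when at least one right row has the same key. */
  def existenceJoin[L, R, K](
      left: Seq[L],
      right: IndexedSeq[R],
      leftKey: L => K,
      rightKey: R => K)(implicit ord: Ordering[K]): Seq[(L, Boolean)] = {
    var j = 0 // cursor into the sorted right side; it only moves forward
    left.map { l =>
      val k = leftKey(l)
      // Skip right rows whose key is smaller than the current left key.
      while (j < right.length && ord.lt(rightKey(right(j)), k)) j += 1
      // Stay on an equal key so that duplicate left keys still find the match.
      val exists = j < right.length && ord.equiv(rightKey(right(j)), k)
      (l, exists)
    }
  }
}
```

Every left row appears exactly once in the output regardless of how many right rows match, which is what distinguishes this from an inner join; unlike a left semi or left anti join, unmatched and matched rows are both kept and only the flag differs.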
Why are the changes needed?
To improve performance for existence sort merge join, and to complete code-gen support for all join types in sort merge join.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Existing unit tests in ExistenceJoinSuite.scala.