-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-35352][SQL] Add code-gen for full outer sort merge join #34581
Conversation
Kubernetes integration test starting |
Test build #145186 has finished for PR 34581 at commit
|
Kubernetes integration test status failure |
Test build #145194 has finished for PR 34581 at commit
|
Kubernetes integration test starting |
Kubernetes integration test status failure |
Kubernetes integration test starting |
Kubernetes integration test status failure |
Test build #145195 has finished for PR 34581 at commit
|
Kubernetes integration test starting |
Kubernetes integration test status failure |
Test build #145207 has finished for PR 34581 at commit
|
cc @cloud-fan could you help take a look when you have time? Thanks. |
Kubernetes integration test starting |
val rightIndex = ctx.freshName("rightIndex") | ||
|
||
// Generate code for join condition | ||
// val leftVars = genOneSideJoinVars(ctx, leftOutputRow, left, setDefaultValue = false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall we remove this line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cloud-fan - ah sorry, was added during debugging, removed.
val leftAnyNull = leftKeyVars.map(_.isNull).mkString(" || ") | ||
val rightKeyVars = createJoinKey(ctx, rightInputRow, rightKeys, right.output) | ||
val rightAnyNull = rightKeyVars.map(_.isNull).mkString(" || ") | ||
val matchedKeyVars = copyKeys(ctx, leftKeyVars) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why it's only related to left keys?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh i see, it's matched, so left and right keys are the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cloud-fan - yeah, it can be copying from either leftKeyVars
or rightKeyVars
.
Kubernetes integration test status failure |
Kubernetes integration test starting |
Kubernetes integration test status failure |
thanks, merging to master! |
Test build #145225 has finished for PR 34581 at commit
|
Thank you @cloud-fan for review! |
Test build #145226 has finished for PR 34581 at commit
|
What changes were proposed in this pull request?
This PR is to add code-gen for FULL OUTER sort merge join. The change is in
SortMergeJoinExec.scala:codegenFullOuter()
. Followed the same algorithm in iterator mode -SortMergeFullOuterJoinScanner
: maintain buffer for join left and right sides, and iterate over matched rows in buffers.Example query:
Example generated code: https://gist.github.com/c21/5cab9751f24ae448d77a259d28cb77d7
In addition, to help review as this PR triggers several TPCDS plan files change. The below files are having the real code change:
SortMergeJoinExec.scala
WholeStageCodegenSuite.scala
All other files are auto-generated golden file plan changes for TPCDS queries.
Why are the changes needed?
Improve the run-time/CPU performance of FULL OUTER sort merge join.
Micro benchmark (same query in
JoinBenchmark.scala
):Seeing 20-30% of run-time improvement:
Does this PR introduce any user-facing change?
No.
How was this patch tested?
WholeStageCodegenSuite.scala
.OuterJoinSuite.scala
.