[SPARK-37316][SQL] Add code-gen for existence sort merge join #34601

Closed
wants to merge 3 commits into from

Contributor

@c21 c21 commented Nov 15, 2021

What changes were proposed in this pull request?

This PR adds code-gen for Existence sort merge join. It follows the same algorithm used in iterator mode in SortMergeJoinExec.scala and HashJoin.scala: for every left-side row, check whether a matching row exists on the right side, then output the left-side row together with a boolean flag indicating whether it has a match.
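
To make the algorithm concrete, here is a minimal sketch in plain Scala, not the generated code from this PR. It assumes both inputs are already sorted ascending by a single integer key, that there is no extra non-equi join condition, and that keys are never null. The names `ExistenceSortMergeJoinSketch`, `Row`, and `existenceJoin` are hypothetical and exist only for illustration.

```scala
// Illustrative sketch only: not Spark code and not the code-gen added by this PR.
object ExistenceSortMergeJoinSketch {
  // Hypothetical row type: a join key plus a payload.
  final case class Row(key: Int, value: String)

  /** For every left row, emit (row, exists), where exists is true iff some
    * right row has the same key. Both inputs must be sorted ascending by key.
    */
  def existenceJoin(left: Seq[Row], right: Seq[Row]): List[(Row, Boolean)] = {
    val rightIter = right.iterator.buffered
    left.iterator.map { l =>
      // Skip right rows whose key is smaller than the current left key.
      while (rightIter.hasNext && rightIter.head.key < l.key) rightIter.next()
      // Peek without consuming, so later left rows with the same key still match.
      val exists = rightIter.hasNext && rightIter.head.key == l.key
      (l, exists)
    }.toList
  }
}
```

Every left row appears exactly once in the output, which is what distinguishes an existence join from a left semi join (matched rows only) or a left anti join (unmatched rows only).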

To help with review: this PR triggers changes in several TPCDS plan files, but the real code change is in the file below:

  • SortMergeJoinExec.scala

Why are the changes needed?

To improve performance for Existence sort merge join, and to complete code-gen support for all join types in sort merge join.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing unit tests in ExistenceJoinSuite.scala.

@github-actions github-actions bot added the SQL label Nov 15, 2021
@@ -675,6 +677,9 @@ case class SortMergeJoinExec(
        bufferedVars ++ streamedVars
      case LeftSemi | LeftAnti =>
        streamedVars
      case ExistenceJoin(_) =>
        streamedVars ++ Seq(ExprCode.forNonNullValue(

@SparkQA

SparkQA commented Nov 15, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49702/

@SparkQA

SparkQA commented Nov 15, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49707/

@SparkQA

SparkQA commented Nov 15, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49702/

@SparkQA

SparkQA commented Nov 15, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49707/

@SparkQA

SparkQA commented Nov 15, 2021

Test build #145232 has finished for PR 34601 at commit 95e6c0f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 15, 2021

Test build #145237 has finished for PR 34601 at commit 77a386c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 15, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49717/

@SparkQA

SparkQA commented Nov 15, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49717/

@c21
Contributor Author

c21 commented Nov 15, 2021

cc @cloud-fan could you help take a look when you have time? Thanks.

@SparkQA

SparkQA commented Nov 16, 2021

Test build #145247 has finished for PR 34601 at commit f7906be.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in dfca899 Nov 17, 2021
@c21
Contributor Author

c21 commented Nov 17, 2021

Thank you @cloud-fan for the review!

@c21 c21 deleted the existence-join-codegen branch November 17, 2021 23:09