Skip to content

Conversation

@Dandandan
Copy link
Contributor

@Dandandan Dandandan commented Jun 1, 2021

Which issue does this PR close?

Closes #461

Rationale for this change

Adds semi join support. This will enable us to rewrite WHERE id IN (..) and WHERE EXISTS to a (semi) join.

What changes are included in this PR?

Add semi join to hash join implementation.
What's not in this PR

  • Semi join sql (should be added to sqlparser if we want to) / dataframe syntax
  • Using the semi join in the planner for in/exists

Are there any user-facing changes?

No

@codecov-commenter
Copy link

codecov-commenter commented Jun 1, 2021

Codecov Report

Merging #470 (9cb007a) into master (c8ab5a4) will increase coverage by 0.50%.
The diff coverage is 77.55%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #470      +/-   ##
==========================================
+ Coverage   75.30%   75.80%   +0.50%     
==========================================
  Files         152      153       +1     
  Lines       25275    25924     +649     
==========================================
+ Hits        19033    19652     +619     
- Misses       6242     6272      +30     
Impacted Files Coverage Δ
...sta/rust/core/src/serde/logical_plan/from_proto.rs 36.17% <0.00%> (-0.05%) ⬇️
...lista/rust/core/src/serde/logical_plan/to_proto.rs 62.32% <0.00%> (-0.08%) ⬇️
...ta/rust/core/src/serde/physical_plan/from_proto.rs 39.63% <0.00%> (-0.15%) ⬇️
...ista/rust/core/src/serde/physical_plan/to_proto.rs 50.46% <0.00%> (-0.16%) ⬇️
datafusion/src/logical_plan/builder.rs 90.08% <0.00%> (-0.39%) ⬇️
datafusion/src/logical_plan/plan.rs 80.62% <ø> (+0.52%) ⬆️
datafusion/src/physical_plan/planner.rs 80.32% <0.00%> (-0.14%) ⬇️
datafusion/src/optimizer/hash_build_probe_order.rs 58.53% <20.00%> (-1.97%) ⬇️
datafusion/src/physical_plan/hash_join.rs 86.45% <97.29%> (+0.39%) ⬆️
datafusion/src/physical_plan/hash_utils.rs 98.50% <100.00%> (+0.02%) ⬆️
... and 18 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c8ab5a4...9cb007a. Read the comment docs.

@alamb
Copy link
Contributor

alamb commented Jun 1, 2021

I plan to review this tomorrow

@Dandandan Dandandan mentioned this pull request Jun 2, 2021
@Dandandan
Copy link
Contributor Author

Related PR for anti join here: #482

@houqp
Copy link
Member

houqp commented Jun 2, 2021

should the readme be updated to include semi join as well?

@Dandandan
Copy link
Contributor Author

Dandandan commented Jun 2, 2021

should the readme be updated to include semi join as well?

I think for now we don't - I would say that at least need to have in or exists to be mapped to a semi join, as currently there is not really a way for anyone to create it (other than creating a LogicalPlan explicitly).

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me @Dandandan ! Nice work

| LogicalPlan::CreateExternalTable { .. }
| LogicalPlan::Explain { .. }
| LogicalPlan::Union { .. }
| LogicalPlan::Join { .. }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should you also possibly conditionalize this on being a SemiJoin? like

Suggested change
| LogicalPlan::Join { .. }
| LogicalPlan::Join if *join_type == JoinType::Semi { .. }

So if we add a new join type in the future it doesn't accidentally get allowed / disallowed in some confusing way

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried that in earlier commit... But than the exhaustiveness checking gets in the way unfortunately.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the only way the compiler can do it now is by explicitly matching on all the join types, but as this feature is not added https://github.com/rust-lang/rfcs/blob/master/text/2535-or-patterns.md that would be quite verbose currently

Copy link
Contributor Author

@Dandandan Dandandan Jun 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rewrote it a bit to use a function in the body of the join match

}
}

fn supports_swap(join_type: JoinType) -> bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@alamb alamb merged commit 33ff660 into apache:master Jun 2, 2021
@houqp houqp added datafusion enhancement New feature or request labels Jul 30, 2021
unkloud pushed a commit to unkloud/datafusion that referenced this pull request Mar 23, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add support for semi (hash) join

4 participants