Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for right_semi and right_anti. PR 2 #9897

Merged
merged 61 commits into from Dec 12, 2023

Conversation

Tmonster
Copy link
Contributor

@Tmonster Tmonster commented Dec 5, 2023

Continuation of this PR. Moved here since the commit history was getting muddy. The original PR was based off of the feature branch.

#9406

Add right semi and right anti join support.

the operators for right anti and right semi utilize the same operator for right joins, but with one adjustment.

  1. Instead marking a match with true and propagating the result, we only mark the match in the hash table with true
  2. When scanning the hash table after the probe phase, we propagate all hash table entries with a true match.

This will not work for any other join operator type that is not a hash join. There is a check in the join order optimizer to make sure the (semi/anti) join conditions have an equality. If the conditions do not have this equality, we do not switch the children.

There is also better cardinality estimation for semi/anti joins.

Copy link
Collaborator

@Mytherin Mytherin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Looks great. Some minor comments, otherwise good to go from my side. @lnkuiper all good from yours as well?

@@ -49,7 +49,7 @@ class PhysicalPiecewiseMergeJoin : public PhysicalRangeJoin {
SourceResultType GetData(ExecutionContext &context, DataChunk &chunk, OperatorSourceInput &input) const override;

bool IsSource() const override {
return IsRightOuterJoin(join_type);
return PropogatesBuildSide(join_type);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: PropagatesBuildSide

size_t has_range = 0;
for (size_t c = 0; c < op.conditions.size(); ++c) {
auto &cond = op.conditions[c];
bool PhysicalPlanGenerator::HasEquality(vector<JoinCondition> &conds, idx_t &has_range) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can we either turn has_range into a bool or rename it to range_count

@github-actions github-actions bot marked this pull request as draft December 6, 2023 08:56
@Tmonster Tmonster marked this pull request as ready for review December 6, 2023 09:31
Copy link
Contributor

@lnkuiper lnkuiper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! I have left some very minor comments.

@@ -17,6 +17,7 @@ bool PhysicalJoin::EmptyResultIfRHSIsEmpty() const {
case JoinType::INNER:
case JoinType::RIGHT:
case JoinType::SEMI:
case JoinType::RIGHT_ANTI:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't RIGHT_SEMI also be in this switch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For some reason I remember taking this out because some tests with "IS NOT DISTINCT " conditions were failing. But those same issues would happen for right joins. so I'll add it back in

// (and NULL if no partner is found)
RIGHT_SEMI = 9,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to add a comment here explaining what the JoinTypes do like for the others

@github-actions github-actions bot marked this pull request as draft December 7, 2023 15:31
@Tmonster Tmonster marked this pull request as ready for review December 7, 2023 15:37
@github-actions github-actions bot marked this pull request as draft December 8, 2023 09:20
@Tmonster Tmonster marked this pull request as ready for review December 8, 2023 12:26
@Mytherin Mytherin merged commit 36b13ce into duckdb:main Dec 12, 2023
47 checks passed
@Mytherin
Copy link
Collaborator

Thanks!

krlmlr added a commit to duckdb/duckdb-r that referenced this pull request Dec 14, 2023
Merge pull request duckdb/duckdb#9985 from hawkfish/summarize-overflow
Merge pull request duckdb/duckdb#9972 from yiyuanliu/lyy/fix-9949
Merge pull request duckdb/duckdb#9962 from Tishj/statement_error_expected_non_optional
Merge pull request duckdb/duckdb#9961 from Tishj/python_supply_config_to_connect
Merge pull request duckdb/duckdb#9947 from taniabogatsch/list-intersect
Merge pull request duckdb/duckdb#9568 from Jens-H/append-BigDecimal
Merge pull request duckdb/duckdb#9959 from Tishj/python_udf_arrow_side_effects
Merge pull request duckdb/duckdb#9855 from StarveZhou/issue_9795_arg
Merge pull request duckdb/duckdb#9944 from Tishj/pyproject_toml
Merge pull request duckdb/duckdb#9946 from lnkuiper/issue9718
Merge pull request duckdb/duckdb#9953 from mlafeldt/fix-httpfs-null-ptr
Merge pull request duckdb/duckdb#9897 from Tmonster/left_semi_anti_feature_rebased
Merge pull request duckdb/duckdb#9936 from hawkfish/empty-frames
Merge pull request duckdb/duckdb#9715 from Tishj/dependency_set
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants