Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: propagate EmptyRelation for more join types #10963

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

tshauck
Copy link
Contributor

@tshauck tshauck commented Jun 17, 2024

Which issue does this PR close?

Doesn't fully close, but related to #10967

Rationale for this change

Currently the PropagateEmptyRelation relation only can optimize inner joins (probably the most important).

This PR updates the join handling to include simple cases for left, right, left-semi, right-semi, left-anti, and right-anti joins.

What changes are included in this PR?

Slight refactoring of the join match statement PropagateEmptyRelation::rewrite then add the aforementioned joins.

Are these changes tested?

Add tests for the added join types to confirm they're rewritten to an empty relation.

Are there any user-facing changes?

No

@github-actions github-actions bot added the optimizer Optimizer rules label Jun 17, 2024
@tshauck tshauck force-pushed the fill-out-empty-relation-opt-for-more-join-types branch from dda1845 to 5a82986 Compare June 17, 2024 19:32
@@ -400,6 +433,182 @@ mod tests {
assert_together_optimized_plan_eq(plan, expected)
}

#[test]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm realizing these could/should be slt tests?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think having some end to end slt would be good too, but for these rules I think having good unit tests are key as well

01)LeftSemi Join: t1.t1_id = __correlated_sq_1.t2_id
02)--TableScan: t1 projection=[t1_id, t1_name]
03)--EmptyRelation
logical_plan EmptyRelation
Copy link
Contributor Author

@tshauck tshauck Jun 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the EmptyRelation in the original test is causing the entire join to be optimized out with the additional join types now propagating emptyrelation.

Is there a way I can preserve this test's testing of just the subquery rewrite logic?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

causing the entire join to be optimized out with the additional join types now propagating emptyrelation.

I agree with your assesment. I think the limit 0 means this plan is better

In terms of preserving the test of the subquery rewrite I think it does (otherwise it wouldn't be rewritten to a SEMI JOIN and thus the join couldn't be removed)

@tshauck tshauck marked this pull request as ready for review June 17, 2024 20:57
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @tshauck -- I think this looks like a nice improvement to me

I agree it would be nice to add some slt tests for this feature as well as maybe reduce repetition in the unit tests, but I don't think they are required to merge the PR

I'll plan to leave this one open for another day or two to give others a chance to review it as well if they would like

@@ -400,6 +433,182 @@ mod tests {
assert_together_optimized_plan_eq(plan, expected)
}

#[test]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think having some end to end slt would be good too, but for these rules I think having good unit tests are key as well

01)LeftSemi Join: t1.t1_id = __correlated_sq_1.t2_id
02)--TableScan: t1 projection=[t1_id, t1_name]
03)--EmptyRelation
logical_plan EmptyRelation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

causing the entire join to be optimized out with the additional join types now propagating emptyrelation.

I agree with your assesment. I think the limit 0 means this plan is better

In terms of preserving the test of the subquery rewrite I think it does (otherwise it wouldn't be rewritten to a SEMI JOIN and thus the join couldn't be removed)

if left_empty || right_empty {
return Ok(Transformed::yes(LogicalPlan::EmptyRelation(
EmptyRelation {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I double checked that this code faithfully implements the semantics in the comments, and I did some mental review and verification to convince myself that these semantics are correct

i wonder if @jackwener has a moment to double check too?

}

#[test]
fn test_right_anti_join_empty_right_table() -> Result<()> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please also add the negative cases ? Like test right antijoin_empty_left_table and assert it wasn't rewritten?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see

fn test_join_empty_propagation_rules() -> Result<()> {
// test left join with empty left
empty_left_and_right_lp(true, false, JoinType::Left, true)?;
// test right join with empty right
empty_left_and_right_lp(false, true, JoinType::Right, true)?;
// test left semi join with empty left
empty_left_and_right_lp(true, false, JoinType::LeftSemi, true)?;
// test left semi join with empty right
empty_left_and_right_lp(false, true, JoinType::LeftSemi, true)?;
// test right semi join with empty left
empty_left_and_right_lp(true, false, JoinType::RightSemi, true)?;
// test right semi join with empty right
empty_left_and_right_lp(false, true, JoinType::RightSemi, true)?;
// test left anti join empty left
empty_left_and_right_lp(true, false, JoinType::LeftAnti, true)?;
// test right anti join empty right
empty_left_and_right_lp(false, true, JoinType::RightAnti, true)
}
#[test]
fn test_join_empty_propagation_rules_noop() -> Result<()> {
// these cases should not result in an empty relation
// test left join with empty right
empty_left_and_right_lp(false, true, JoinType::Left, false)?;
// test right join with empty left
empty_left_and_right_lp(true, false, JoinType::Right, false)?;
// test left semi with non-empty left and right
empty_left_and_right_lp(false, false, JoinType::LeftSemi, false)?;
// test right semi with non-empty left and right
empty_left_and_right_lp(false, false, JoinType::RightSemi, false)?;
// test left anti join with non-empty left and right
empty_left_and_right_lp(false, false, JoinType::LeftAnti, false)?;
// test left anti with non-empty left and empty right
empty_left_and_right_lp(false, true, JoinType::LeftAnti, false)?;
// test right anti join with non-empty left and right
empty_left_and_right_lp(false, false, JoinType::RightAnti, false)?;
// test right anti with empty left and non-empty right
empty_left_and_right_lp(true, false, JoinType::RightAnti, false)
}
(second half)

@tshauck
Copy link
Contributor Author

tshauck commented Jun 18, 2024

Thanks for the feedback @alamb, I took some steps in 66aa31a to improve the tests by reducing redundancy and adding negative cases

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants