Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: propagate EmptyRelation for more join types #10963

Merged
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
233 changes: 221 additions & 12 deletions datafusion/optimizer/src/propagate_empty_relation.rs
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,7 @@
use std::sync::Arc;

use datafusion_common::tree_node::Transformed;
use datafusion_common::JoinType::Inner;
use datafusion_common::JoinType;
use datafusion_common::{internal_err, plan_err, Result};
use datafusion_expr::logical_plan::tree_node::unwrap_arc;
use datafusion_expr::logical_plan::LogicalPlan;
Expand Down Expand Up @@ -94,29 +94,62 @@ impl OptimizerRule for PropagateEmptyRelation {
Ok(Transformed::no(LogicalPlan::CrossJoin(join.clone())))
}

LogicalPlan::Join(ref join) if join.join_type == Inner => {
LogicalPlan::Join(ref join) => {
// TODO: For Join, more join type need to be careful:
jackwener marked this conversation as resolved.
Show resolved Hide resolved
// For LeftOuter/LeftSemi/LeftAnti Join, only the left side is empty, the Join result is empty.
// For LeftSemi Join, if the right side is empty, the Join result is empty.
// For LeftAnti Join, if the right side is empty, the Join result is left side(should exclude null ??).
// For RightOuter/RightSemi/RightAnti Join, only the right side is empty, the Join result is empty.
// For RightSemi Join, if the left side is empty, the Join result is empty.
// For RightAnti Join, if the left side is empty, the Join result is right side(should exclude null ??).
// For Full Join, only both sides are empty, the Join result is empty.
// For LeftOut/Full Join, if the right side is empty, the Join can be eliminated with a Projection with left side
// columns + right side columns replaced with null values.
// For RightOut/Full Join, if the left side is empty, the Join can be eliminated with a Projection with right side
// columns + left side columns replaced with null values.
let (left_empty, right_empty) = binary_plan_children_is_empty(&plan)?;
if left_empty || right_empty {
return Ok(Transformed::yes(LogicalPlan::EmptyRelation(
EmptyRelation {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I double checked that this code faithfully implements the semantics in the comments, and I did some mental review and verification to convince myself that these semantics are correct

i wonder if @jackwener has a moment to double check too?

match join.join_type {
JoinType::Inner if left_empty || right_empty => Ok(Transformed::yes(
LogicalPlan::EmptyRelation(EmptyRelation {
produce_one_row: false,
schema: join.schema.clone(),
},
)));
}),
)),
JoinType::Left if left_empty => Ok(Transformed::yes(
LogicalPlan::EmptyRelation(EmptyRelation {
produce_one_row: false,
schema: join.schema.clone(),
}),
)),
JoinType::Right if right_empty => Ok(Transformed::yes(
LogicalPlan::EmptyRelation(EmptyRelation {
produce_one_row: false,
schema: join.schema.clone(),
}),
)),
JoinType::LeftSemi if left_empty || right_empty => Ok(
Transformed::yes(LogicalPlan::EmptyRelation(EmptyRelation {
produce_one_row: false,
schema: join.schema.clone(),
})),
),
JoinType::RightSemi if left_empty || right_empty => Ok(
Transformed::yes(LogicalPlan::EmptyRelation(EmptyRelation {
produce_one_row: false,
schema: join.schema.clone(),
})),
),
JoinType::LeftAnti if left_empty => Ok(Transformed::yes(
LogicalPlan::EmptyRelation(EmptyRelation {
produce_one_row: false,
schema: join.schema.clone(),
}),
)),
JoinType::RightAnti if right_empty => Ok(Transformed::yes(
LogicalPlan::EmptyRelation(EmptyRelation {
produce_one_row: false,
schema: join.schema.clone(),
}),
)),
jackwener marked this conversation as resolved.
Show resolved Hide resolved
_ => Ok(Transformed::no(LogicalPlan::Join(join.clone()))),
}
Ok(Transformed::no(LogicalPlan::Join(join.clone())))
}
LogicalPlan::Aggregate(ref agg) => {
if !agg.group_expr.is_empty() {
Expand Down Expand Up @@ -400,6 +433,182 @@ mod tests {
assert_together_optimized_plan_eq(plan, expected)
}

#[test]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm realizing these could/should be slt tests?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think having some end to end slt would be good too, but for these rules I think having good unit tests are key as well

fn test_left_join_empty_left_table() -> Result<()> {
let left_table_scan = test_table_scan()?;
let left = LogicalPlanBuilder::from(left_table_scan)
.filter(Expr::Literal(ScalarValue::Boolean(Some(false))))?
.build()?;

let right_table_scan = test_table_scan_with_name("test2")?;
let right = LogicalPlanBuilder::from(right_table_scan).build()?;

let plan = LogicalPlanBuilder::from(left)
.join_using(
right,
JoinType::Left,
vec![Column::from_name("a".to_string())],
)?
.build()?;

let expected = "EmptyRelation";
assert_together_optimized_plan_eq(plan, expected)
}

#[test]
fn test_right_join_empty_right_table() -> Result<()> {
let left_table_scan = test_table_scan()?;
let left = LogicalPlanBuilder::from(left_table_scan).build()?;

let right_table_scan = test_table_scan_with_name("test2")?;
let right = LogicalPlanBuilder::from(right_table_scan)
.filter(Expr::Literal(ScalarValue::Boolean(Some(false))))?
.build()?;

let plan = LogicalPlanBuilder::from(left)
.join_using(
right,
JoinType::Right,
vec![Column::from_name("a".to_string())],
)?
.build()?;

let expected = "EmptyRelation";
assert_together_optimized_plan_eq(plan, expected)
}

#[test]
fn test_left_semi_join_empty_left_table() -> Result<()> {
let left_table_scan = test_table_scan()?;
let left = LogicalPlanBuilder::from(left_table_scan)
.filter(Expr::Literal(ScalarValue::Boolean(Some(false))))?
.build()?;

let right_table_scan = test_table_scan_with_name("test2")?;
let right = LogicalPlanBuilder::from(right_table_scan).build()?;

let plan = LogicalPlanBuilder::from(left)
.join_using(
right,
JoinType::LeftSemi,
vec![Column::from_name("a".to_string())],
)?
.build()?;

let expected = "EmptyRelation";
assert_together_optimized_plan_eq(plan, expected)
}

#[test]
fn test_left_semi_join_empty_right_table() -> Result<()> {
let left_table_scan = test_table_scan()?;
let left = LogicalPlanBuilder::from(left_table_scan).build()?;

let right_table_scan = test_table_scan_with_name("test2")?;
let right = LogicalPlanBuilder::from(right_table_scan)
.filter(Expr::Literal(ScalarValue::Boolean(Some(false))))?
.build()?;

let plan = LogicalPlanBuilder::from(left)
.join_using(
right,
JoinType::LeftSemi,
vec![Column::from_name("a".to_string())],
)?
.build()?;

let expected = "EmptyRelation";
assert_together_optimized_plan_eq(plan, expected)
}

#[test]
fn test_right_semi_join_empty_right_table() -> Result<()> {
let left_table_scan = test_table_scan()?;
let left = LogicalPlanBuilder::from(left_table_scan).build()?;

let right_table_scan = test_table_scan_with_name("test2")?;
let right = LogicalPlanBuilder::from(right_table_scan)
.filter(Expr::Literal(ScalarValue::Boolean(Some(false))))?
.build()?;

let plan = LogicalPlanBuilder::from(left)
.join_using(
right,
JoinType::RightSemi,
vec![Column::from_name("a".to_string())],
)?
.build()?;

let expected = "EmptyRelation";
assert_together_optimized_plan_eq(plan, expected)
}

#[test]
fn test_right_semi_join_empty_left_table() -> Result<()> {
let left_table_scan = test_table_scan()?;
let left = LogicalPlanBuilder::from(left_table_scan)
.filter(Expr::Literal(ScalarValue::Boolean(Some(false))))?
.build()?;

let right_table_scan = test_table_scan_with_name("test2")?;
let right = LogicalPlanBuilder::from(right_table_scan).build()?;

let plan = LogicalPlanBuilder::from(left)
.join_using(
right,
JoinType::RightSemi,
vec![Column::from_name("a".to_string())],
)?
.build()?;

let expected = "EmptyRelation";
assert_together_optimized_plan_eq(plan, expected)
}

#[test]
fn test_right_anti_join_empty_right_table() -> Result<()> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please also add the negative cases ? Like test right antijoin_empty_left_table and assert it wasn't rewritten?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see

fn test_join_empty_propagation_rules() -> Result<()> {
// test left join with empty left
empty_left_and_right_lp(true, false, JoinType::Left, true)?;
// test right join with empty right
empty_left_and_right_lp(false, true, JoinType::Right, true)?;
// test left semi join with empty left
empty_left_and_right_lp(true, false, JoinType::LeftSemi, true)?;
// test left semi join with empty right
empty_left_and_right_lp(false, true, JoinType::LeftSemi, true)?;
// test right semi join with empty left
empty_left_and_right_lp(true, false, JoinType::RightSemi, true)?;
// test right semi join with empty right
empty_left_and_right_lp(false, true, JoinType::RightSemi, true)?;
// test left anti join empty left
empty_left_and_right_lp(true, false, JoinType::LeftAnti, true)?;
// test right anti join empty right
empty_left_and_right_lp(false, true, JoinType::RightAnti, true)
}
#[test]
fn test_join_empty_propagation_rules_noop() -> Result<()> {
// these cases should not result in an empty relation
// test left join with empty right
empty_left_and_right_lp(false, true, JoinType::Left, false)?;
// test right join with empty left
empty_left_and_right_lp(true, false, JoinType::Right, false)?;
// test left semi with non-empty left and right
empty_left_and_right_lp(false, false, JoinType::LeftSemi, false)?;
// test right semi with non-empty left and right
empty_left_and_right_lp(false, false, JoinType::RightSemi, false)?;
// test left anti join with non-empty left and right
empty_left_and_right_lp(false, false, JoinType::LeftAnti, false)?;
// test left anti with non-empty left and empty right
empty_left_and_right_lp(false, true, JoinType::LeftAnti, false)?;
// test right anti join with non-empty left and right
empty_left_and_right_lp(false, false, JoinType::RightAnti, false)?;
// test right anti with empty left and non-empty right
empty_left_and_right_lp(true, false, JoinType::RightAnti, false)
}
(second half)

let left_table_scan = test_table_scan()?;
let left = LogicalPlanBuilder::from(left_table_scan).build()?;

let right_table_scan = test_table_scan_with_name("test2")?;
let right = LogicalPlanBuilder::from(right_table_scan)
.filter(Expr::Literal(ScalarValue::Boolean(Some(false))))?
.build()?;

let plan = LogicalPlanBuilder::from(left)
.join_using(
right,
JoinType::RightAnti,
vec![Column::from_name("a".to_string())],
)?
.build()?;

let expected = "EmptyRelation";
assert_together_optimized_plan_eq(plan, expected)
}

#[test]
fn test_left_anti_join_empty_left_table() -> Result<()> {
let left_table_scan = test_table_scan()?;
let left = LogicalPlanBuilder::from(left_table_scan)
.filter(Expr::Literal(ScalarValue::Boolean(Some(false))))?
.build()?;

let right_table_scan = test_table_scan_with_name("test2")?;
let right = LogicalPlanBuilder::from(right_table_scan).build()?;

let plan = LogicalPlanBuilder::from(left)
.join_using(
right,
JoinType::LeftAnti,
vec![Column::from_name("a".to_string())],
)?
.build()?;

let expected = "EmptyRelation";
assert_together_optimized_plan_eq(plan, expected)
}

#[test]
fn test_empty_with_non_empty() -> Result<()> {
let table_scan = test_table_scan()?;
Expand Down
5 changes: 1 addition & 4 deletions datafusion/sqllogictest/test_files/subquery.slt
Original file line number Diff line number Diff line change
Expand Up @@ -613,10 +613,7 @@ SELECT t1_id, t1_name FROM t1 WHERE EXISTS (SELECT * FROM t2 WHERE t2_id = t1_id
query TT
explain SELECT t1_id, t1_name FROM t1 WHERE EXISTS (SELECT * FROM t2 WHERE t2_id = t1_id limit 0)
----
logical_plan
01)LeftSemi Join: t1.t1_id = __correlated_sq_1.t2_id
02)--TableScan: t1 projection=[t1_id, t1_name]
03)--EmptyRelation
logical_plan EmptyRelation
Copy link
Contributor Author

@tshauck tshauck Jun 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the EmptyRelation in the original test is causing the entire join to be optimized out with the additional join types now propagating emptyrelation.

Is there a way I can preserve this test's testing of just the subquery rewrite logic?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

causing the entire join to be optimized out with the additional join types now propagating emptyrelation.

I agree with your assesment. I think the limit 0 means this plan is better

In terms of preserving the test of the subquery rewrite I think it does (otherwise it wouldn't be rewritten to a SEMI JOIN and thus the join couldn't be removed)


query IT rowsort
SELECT t1_id, t1_name FROM t1 WHERE EXISTS (SELECT * FROM t2 WHERE t2_id = t1_id limit 0)
Expand Down