Skip to content

Conversation

@liukun4515
Copy link
Contributor

Which issue does this PR close?

add more join case which can be optimized for the limit push down rule.

Closes #.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the optimizer Optimizer rules label Nov 29, 2022
@liukun4515 liukun4515 requested a review from Dandandan November 29, 2022 03:27
@liukun4515
Copy link
Contributor Author

cc @jackwener

\n Limit: skip=0, fetch=1000\
\n TableScan: test2, fetch=1000";

assert_optimized_plan_eq(&plan, expected).unwrap();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert_optimized_plan_eq(&plan, expected).unwrap();
assert_optimized_plan_eq(&plan, expected)?;

Comment on lines 211 to 212
JoinType::Left => push_down_join(join, Some(limit), None),
JoinType::Right => push_down_join(join, None, Some(limit)),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove these

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think those are the lesser restrictive versions of push down (only to left/right side)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove these

JoinType::Left | JoinType::Right | JoinType::Full
                        if is_no_join_condition(join) =>

the on condition is empty, the rule can be applied, but if the on condition is not empty,

     JoinType::Left => push_down_join(join, Some(limit), None),
                    JoinType::Right => push_down_join(join, None, Some(limit)),

can be applied.

cc @jackwener

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understanded it

// push left and right
push_down_join(join, Some(limit), Some(limit))
}
JoinType::LeftSemi | JoinType::LeftAnti
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forget RightSemi/RightAnti?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 good catch

@jackwener
Copy link
Member

A good improvement👍Thanks @liukun4515

@jackwener
Copy link
Member

related issue to track problems like this #4413

// push left
push_down_join(join, Some(limit), None)
}
JoinType::RightSemi | JoinType::RightAnti
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add the right semi and right anti
cc @jackwener

@liukun4515 liukun4515 requested a review from jackwener November 29, 2022 11:50
Copy link
Member

@jackwener jackwener left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good improvement.
LGTM

@liukun4515
Copy link
Contributor Author

cc @Dandandan if it looks good to you, I will merge this.

let limit = fetch + skip;
let new_join = match join.join_type {
JoinType::Left | JoinType::Right | JoinType::Full
if is_no_join_condition(join) =>
Copy link
Contributor

@alamb alamb Nov 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder how often there will be no join conditions 🤔 that effectively then becomes a CROSS JOIN

To be clear I think the optimization is still correct, I just wonder how often it will appear in real queries

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder how often there will be no join conditions 🤔 that effectively then becomes a CROSS JOIN

To be clear I think the optimization is still correct, I just wonder how often it will appear in real queries

Good question for this, in the current framework of optimizer in the datafusion, the join query will be converted to the cross join by other rule.

The limit push down rule just do the best to optimize the plan.

Some SQL are generated by the program, and the join type and on condition are added by the program, it will generate sql like select * from left_table right join right_table.

@liukun4515 liukun4515 merged commit 48f0f3a into apache:master Dec 1, 2022
@liukun4515 liukun4515 deleted the optimize_limit_join branch December 1, 2022 02:35
@ursabot
Copy link

ursabot commented Dec 1, 2022

Benchmark runs are scheduled for baseline = 3fe542f and contender = 48f0f3a. 48f0f3a is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

optimizer Optimizer rules

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants