@Dandandan
Contributor

@Dandandan Dandandan commented Nov 27, 2025

Which issue does this PR close?

Rationale for this change

Improves the plan and its readability for queries with joins.

It also seems to improve performance slightly for the more challenging joins (TPC-H SF=10, in-memory):

│ QQuery 21    │              7948.54 ms │    6340.74 ms │ +1.25x faster │
│ QQuery 22    │                58.30 ms │      49.86 ms │ +1.17x faster │
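
To illustrate the idea behind coalescing inside the join, here is a minimal, self-contained sketch. It does not use DataFusion or Arrow types: `Batch`, `Coalescer`, and `coalesce_all` are hypothetical names invented for this example, and a batch is modelled as a plain `Vec<i64>`. The point is only that an operator producing many small outputs can buffer rows and emit batches of a target size itself, instead of relying on a separate coalescing operator placed above it.

```rust
use std::collections::VecDeque;

/// Toy stand-in for a record batch: a single column of row values.
/// (Hypothetical; the real operator works on Arrow record batches.)
type Batch = Vec<i64>;

/// Hypothetical coalescer that buffers rows until `target` rows are available.
struct Coalescer {
    target: usize,
    buffer: Vec<i64>,
    completed: VecDeque<Batch>,
}

impl Coalescer {
    fn new(target: usize) -> Self {
        assert!(target > 0, "target batch size must be positive");
        Self {
            target,
            buffer: Vec::with_capacity(target),
            completed: VecDeque::new(),
        }
    }

    /// Append the rows of a (possibly tiny) batch, emitting full batches as they fill up.
    fn push(&mut self, batch: &[i64]) {
        let mut rest = batch;
        while !rest.is_empty() {
            let room = self.target - self.buffer.len();
            let take = room.min(rest.len());
            self.buffer.extend_from_slice(&rest[..take]);
            rest = &rest[take..];
            if self.buffer.len() == self.target {
                let full = std::mem::replace(&mut self.buffer, Vec::with_capacity(self.target));
                self.completed.push_back(full);
            }
        }
    }

    /// Flush any remaining rows as a final, possibly short, batch.
    fn finish(&mut self) {
        if !self.buffer.is_empty() {
            self.completed.push_back(std::mem::take(&mut self.buffer));
        }
    }

    /// Pop the next completed batch, if any.
    fn next_completed(&mut self) -> Option<Batch> {
        self.completed.pop_front()
    }
}

/// Drive the coalescer over a sequence of small input batches.
fn coalesce_all(inputs: &[Vec<i64>], target: usize) -> Vec<Batch> {
    let mut coalescer = Coalescer::new(target);
    let mut out = Vec::new();
    for batch in inputs {
        coalescer.push(batch);
        while let Some(full) = coalescer.next_completed() {
            out.push(full);
        }
    }
    coalescer.finish();
    while let Some(last) = coalescer.next_completed() {
        out.push(last);
    }
    out
}
```

In this sketch only the final batch can be shorter than `target`; every batch emitted before `finish` has exactly `target` rows.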

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the optimizer (Optimizer rules), sqllogictest (SQL Logic Tests (.slt)), and physical-plan (Changes to the physical-plan crate) labels Nov 27, 2025
@Dandandan Dandandan changed the title from "Refactor join2" to "Coalesce batches inside hash join" Nov 27, 2025
@github-actions github-actions bot added the core (Core DataFusion crate) label Nov 27, 2025
@Dandandan Dandandan changed the title from "Coalesce batches inside hash join" to "Coalesce batches inside hash join, reuse indices buffer" Nov 28, 2025
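
The "reuse indices buffer" part of the title can be pictured with a small sketch. This is not the PR's code: `probe_batch` and its parameters are hypothetical, and the build side is modelled as a plain `HashMap` from key to build-row indices. It only shows the general pattern of clearing and refilling caller-owned index vectors for each probe batch, so their allocations are kept across batches rather than re-created.

```rust
use std::collections::HashMap;

/// Hypothetical probe step: for each probe key, record the matching
/// (build row, probe row) index pairs into buffers owned by the caller.
/// The buffers are cleared, not reallocated, so capacity carries over
/// from one probe batch to the next.
fn probe_batch(
    build: &HashMap<i64, Vec<u32>>,
    probe_keys: &[i64],
    build_idx: &mut Vec<u32>,
    probe_idx: &mut Vec<u32>,
) {
    build_idx.clear();
    probe_idx.clear();
    for (row, key) in probe_keys.iter().enumerate() {
        if let Some(matches) = build.get(key) {
            for &b in matches {
                build_idx.push(b);
                probe_idx.push(row as u32);
            }
        }
    }
}
```

A caller would create the two vectors once, outside the loop over probe batches, and pass them to every `probe_batch` call.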
@Dandandan
Contributor Author

@alamb let's fire some benchmarks?
I mostly see improvement on SF=10 in memory on the standard set of benchmarks

plan.transform_up(|plan| {
    let plan_any = plan.as_any();
    let wrap_in_coalesce = plan_any.downcast_ref::<HashJoinExec>().is_some()
    let wrap_in_coalesce = plan_any
Contributor Author

The list is getting smaller and smaller
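
For readers unfamiliar with the pattern in the snippet under review, the following self-contained sketch shows how such a downcast-based list works. It is not DataFusion code: `PlanNode`, `FilterNode`, `HashJoinNode`, and `ProjectionNode` are hypothetical stand-ins, and which operators actually remain in the real rule is not shown in this excerpt. The assumption illustrated is simply that once an operator coalesces its own output, its type check can be dropped from the list.

```rust
use std::any::Any;

/// Minimal stand-in for a physical plan node (hypothetical, not DataFusion's trait).
trait PlanNode {
    fn as_any(&self) -> &dyn Any;
}

// Hypothetical operator types used only for this illustration.
struct FilterNode;
struct HashJoinNode;
struct ProjectionNode;

impl PlanNode for FilterNode {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

impl PlanNode for HashJoinNode {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

impl PlanNode for ProjectionNode {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

/// Returns true when the rule should wrap `plan` in a separate coalescing operator.
/// Assumption for this sketch: the join coalesces internally, so it is not in the
/// list, and the filter is the only operator type still checked.
fn wrap_in_coalesce(plan: &dyn PlanNode) -> bool {
    let plan_any = plan.as_any();
    plan_any.downcast_ref::<FilterNode>().is_some()
}
```

Each operator that handles coalescing itself removes one `downcast_ref` check from a function like this, which is one way to read the remark that the list keeps shrinking.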

Labels

core (Core DataFusion crate), optimizer (Optimizer rules), physical-plan (Changes to the physical-plan crate), sqllogictest (SQL Logic Tests (.slt))

Development

Successfully merging this pull request may close these issues.

Integrate BatchCoalescer into HashJoinExec and remove from CoalesceBatches optimization rule