[branch-53] Cherry pick Skip probe-side consumption when hash join build side is empty (#21068) #101

Merged
gabotechs merged 1 commit into branch-53 from branch-53-cherry-pick-1 on Apr 16, 2026

@gabotechs

Cherry picks apache#21068

## Which issue does this PR close?

* Closes apache#20492.

## Rationale for this change

`HashJoinExec` currently continues polling and consuming the probe side
even after the build side has completed with zero rows.

For join types whose output is guaranteed to be empty when the build
side is empty, this work is unnecessary. In practice, it can trigger
large, avoidable scans and extra compute despite producing no output.
This affects INNER, LEFT, LEFT SEMI, LEFT ANTI, LEFT MARK, and RIGHT
SEMI joins.

This change makes the stream state machine aware of that condition so
execution can terminate as soon as the build side is known to be empty
and no probe rows are needed to determine the final result.

The change also preserves the existing behavior for join types that
still require probe-side rows even when the build side is empty, such as
RIGHT, FULL, RIGHT ANTI, and RIGHT MARK joins.
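
The split between the two groups of join types can be sketched as a single predicate. The snippet below is a minimal illustration only: `JoinKind` is a local stand-in enum, not DataFusion's `JoinType`, and the method body is inferred from the lists above rather than copied from the actual implementation.

```rust
/// Local stand-in for the join types named in this PR.
/// This is a hypothetical enum, not DataFusion's `JoinType`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum JoinKind {
    Inner,
    Left,
    Right,
    Full,
    LeftSemi,
    RightSemi,
    LeftAnti,
    RightAnti,
    LeftMark,
    RightMark,
}

impl JoinKind {
    /// Returns true when an empty build side guarantees an empty join output,
    /// so no probe rows are needed to determine the final result.
    /// The grouping follows the lists in the rationale above.
    fn empty_build_side_produces_empty_result(self) -> bool {
        matches!(
            self,
            JoinKind::Inner
                | JoinKind::Left
                | JoinKind::LeftSemi
                | JoinKind::LeftAnti
                | JoinKind::LeftMark
                | JoinKind::RightSemi
        )
    }
}
```

Keeping the decision in one predicate means the stream state machine and the empty-build-side batch construction can share the same rule instead of each repeating a `match` over join types.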

## What changes are included in this PR?

* Added `JoinType::empty_build_side_produces_empty_result` to centralize
logic determining when an empty build side guarantees empty output.
* Updated `HashJoinStream` state transitions to:
  * Skip transitioning to `FetchProbeBatch` when the build side is empty
    and output is deterministically empty.
  * Immediately complete the stream in such cases.
* Refactored logic in `build_batch_empty_build_side` to reuse the new
helper method and simplify match branches.
* Ensured probe-side consumption still occurs for join types that
require probe rows (e.g., RIGHT, FULL).
* Added a helper, `state_after_build_ready`, to unify post-build decision
logic.
* Introduced a reusable test helper for constructing hash joins with
dynamic filters.
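
The post-build decision described in the list above can be sketched roughly as follows. This is a simplified model under stated assumptions: the `StreamState` enum and the signature of `state_after_build_ready` are hypothetical and reduced to two booleans; the real helper works on `HashJoinStream` internals that are not shown in this description.

```rust
/// Hypothetical, reduced view of the stream states relevant to this change.
/// Only `FetchProbeBatch` is named in the PR text; `Completed` is an assumed name.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum StreamState {
    /// Start pulling batches from the probe side.
    FetchProbeBatch,
    /// Finish the stream without producing any further output.
    Completed,
}

/// Decide the next state once the build side has been fully collected.
///
/// `build_is_empty`: the build side finished with zero rows.
/// `empty_output_guaranteed`: the join type yields no rows for an empty build side.
fn state_after_build_ready(build_is_empty: bool, empty_output_guaranteed: bool) -> StreamState {
    if build_is_empty && empty_output_guaranteed {
        // No probe rows are needed to determine the result, so stop here.
        StreamState::Completed
    } else {
        // Either there are build rows to match against, or the join type
        // (e.g. RIGHT, FULL) must still emit probe-side rows.
        StreamState::FetchProbeBatch
    }
}
```

Both conditions have to hold for the short circuit: a non-empty build side always proceeds to probing, and so does an empty build side under a join type that still emits probe rows.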

## Are these changes tested?

Yes, comprehensive tests have been added:

* Verified that the probe side is **not consumed** when:

  * The build side is empty
  * The join type guarantees empty output
* Verified that the probe side **is still consumed** when required by join
semantics (e.g., RIGHT, FULL joins)
* Covered both filtered and non-filtered joins
* Added tests ensuring correct behavior with dynamic filters
* Added regression test ensuring correct behavior after partition bounds
reporting

These tests validate both correctness and the intended optimization
behavior.
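
As an illustration of how "probe side is not consumed" can be checked, here is a toy sketch that counts how often a probe input is polled. Everything in it is hypothetical (`CountingProbe`, `probe_rows_consumed`, the fixed three-row batches); it does not use DataFusion's test utilities and only models the shape of such a check.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Toy probe input that records how many batches were pulled from it.
struct CountingProbe {
    remaining: usize,
    polled: Rc<Cell<usize>>,
}

impl Iterator for CountingProbe {
    type Item = Vec<i32>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.remaining == 0 {
            return None;
        }
        self.remaining -= 1;
        self.polled.set(self.polled.get() + 1);
        Some(vec![1, 2, 3]) // each toy batch holds three rows
    }
}

/// Toy driver that returns how many probe rows were read.
/// It does not compute a join; it only models whether the probe side is touched.
/// `empty_output_guaranteed` stands in for the join-type check.
fn probe_rows_consumed(
    build_rows: usize,
    empty_output_guaranteed: bool,
    probe: CountingProbe,
) -> usize {
    if build_rows == 0 && empty_output_guaranteed {
        return 0; // finish without polling the probe side at all
    }
    probe.map(|batch| batch.len()).sum()
}
```

A test in this style would assert that the shared `polled` counter stays at zero for an empty build side with an INNER-like join, and that it equals the number of probe batches for a RIGHT-like join.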

## Are there any user-facing changes?

No API changes.

However, this introduces a performance optimization:

* Queries involving joins with empty build sides may complete
significantly faster
* Reduced unnecessary IO and compute

No behavioral changes in query results.

## LLM-generated code disclosure

This PR includes LLM-generated code and comments. All LLM-generated
content has been manually reviewed and tested.

(cherry picked from commit 6c5e241)
@gabotechs gabotechs merged commit 42ecad5 into branch-53 Apr 16, 2026
30 checks passed