Fix dynamic filter pushdown to wrong table with same-named columns#20303
Closed
adriangb wants to merge 1 commit intoapache:mainfrom
Closed
Fix dynamic filter pushdown to wrong table with same-named columns#20303adriangb wants to merge 1 commit intoapache:mainfrom
adriangb wants to merge 1 commit intoapache:mainfrom
Conversation
When both sides of a join have columns with the same name (e.g. `k`), the dynamic filter from an outer join was incorrectly pushed to both children instead of only the correct one. This happened because `FilterColumnChecker` matched columns by name only. Fix `FilterColumnChecker` to match on `(name, index)` pairs, and use `column_indices` in `HashJoinExec::gather_filters_for_pushdown` to build per-side allowed-column sets via `from_child_with_allowed_columns`. Also adds `lr_is_preserved` to correctly gate pushdown for non-inner join types (Left, Right, Semi, Anti, Mark). Closes apache#20213 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Contributor
Author
|
@jackkleeman it looks like we basically ended up with #20192. Maybe just copy over tests from here and confirm your PR fixes #20213 and we can discard this? |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Which issue does this PR close?
Closes #20213
Rationale for this change
When both sides of a join have columns with the same name (e.g.
k), the dynamic filter from an outer join was incorrectly pushed to both children instead of only the correct one. With small row groups this caused wrong results (0 rows instead of the expected result set).The root cause was that
FilterColumnCheckerinfilter_pushdown.rsmatched columns by name only. When the parent pushed a filter referencing columnkat index 2 (the right child'sk), the name-based check foundkin the left child's schema too, and incorrectly pushed the filter to both sides.What changes are included in this PR?
Approach adopted from #20192:
FilterColumnCheckernow matches on(name, index)pairs instead of just names, preventing incorrect cross-side pushdown when columns share namesChildFilterDescription::from_child_with_allowed_columns— new method that restricts pushdown to an explicit set of allowed(name, index)pairsChildFilterDescription::all_unsupported— helper to mark all filters unsupported for a childHashJoinExec::gather_filters_for_pushdown— builds per-side allowed-column sets fromcolumn_indices(+ optionalprojection), useslr_is_preservedto gate pushdown eligibility per join typelr_is_preserved— mirrors the logical optimizer's preserved-side logic, enabling parent filter pushdown for non-inner join types (Left, Right, Semi, Anti, Mark)Are these changes tested?
Yes:
lr_is_preservedcovering all join typesCOUNT(*)andSELECT *correctnessAre there any user-facing changes?
🤖 Generated with Claude Code