[opt](nereids) rewrite PreferPushDownProject to slots before filter-join pushdown#61635
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run buildall |
TPC-H: Total hot run time: 26542 ms |
TPC-DS: Total hot run time: 167594 ms |
|
/review |
There was a problem hiding this comment.
Code Review Summary: PushDownFilterThroughJoin PreferPushDownProject Rewrite
Overview
This PR adds handling for PreferPushDownProject expressions in filter predicates above joins. When a filter predicate (especially an OR predicate) contains a PreferPushDownProject expression (e.g., MATCH_ANY, element_at, MapKeys), the expression is replaced with an alias slot and pushed into a LogicalProject under the appropriate join child. This enables storage-level pruning acceleration for complex-type and inverted-index operations.
The implementation follows the same well-established pattern used by PushDownProject.java (rewriteDownShortCircuit + alias replacement + project insertion).
Critical Checkpoint Conclusions
-
Goal and correctness: The code accomplishes the stated goal. The
rewritePreferPushDownProjects()method correctly identifiesPreferPushDownProjectexpressions, replaces them with slot references, and insertsLogicalProjectnodes under the matching join child. The short-circuit rewrite correctly handles nestedPreferPushDownProjectby replacing the outermost one. Expressions spanning both sides gracefully fall through unchanged. -
Modification scope: The change is focused and minimal -- only
PushDownFilterThroughJoin.javalogic is modified, plus one test. -
Concurrency: No concurrency concerns. This is a single-threaded rewrite rule operating on immutable plan trees.
-
Lifecycle / static init: No lifecycle or static initialization concerns. The
INSTANCEsingleton is already established. -
Configuration: No new config items added. None needed.
-
Incompatible changes: None. This is an optimizer-internal rewrite change with no storage/protocol impact.
-
Parallel code paths:
PushDownProject.javahandles the samePreferPushDownProjectmarker for join conjuncts and project-above-join patterns. This PR covers the filter-above-join pattern specifically. The two rules sit in the samePUSH_DOWN_FILTERSbatch (positions 6 and 17 respectively) and operate on different plan shapes, so no conflict. -
Rewrite-loop risk: None.
PreferPushDownProjectexpressions are consumed (replaced with plainSlotReferences), so the rule will not re-match on the same expressions. -
Mark join semantics: Preserved. The new
LogicalJoinis constructed withjoin.getMarkJoinConjuncts()andjoin.getMarkJoinSlotReference()intact. -
Extra output slots: The inserted
LogicalProjectadds extra alias slots to the join child's output, which propagate through the join. This is safe -- the rewritten predicate references these slots, andColumnPruning(in a subsequent rewrite topic) will prune any unreferenced extras. This matches the existing pattern inPushDownProject. -
Test coverage: Insufficient. See inline comments for details. Only one scenario (left-side MatchAny in OR, INNER_JOIN) is tested. The assertion is weak (see inline comment). Missing: right-side pushdown, both-sides simultaneous, non-inner joins, dedup path, cross-side expression (negative case), regression test.
-
Observability: No new observability needed for this optimizer-internal change.
-
Performance: The early-exit check at line 172 (
noneMatch(containsType)) ensures zero overhead for the common case where noPreferPushDownProjectexists. ThecontainsTypecheck is O(1) BitSet lookup. No concerns.
Issues Found
- [Medium] Test coverage is thin -- only 1 of ~9 meaningful scenarios covered. See inline comments.
- [Low] Test assertion weakness --
anyMatch(SlotReference.class::isInstance)is trivially satisfied by existing right-side slots. See inline comment. - [Suggestion] A regression test with actual SQL (e.g.,
MATCH_ANYorelement_atin a join filter) would provide end-to-end validation through the full optimizer pipeline.
| Expression rewrittenPredicate = ImmutableList.copyOf(filter.getConjuncts()).get(0); | ||
| return rewrittenPredicate instanceof Or | ||
| && rewrittenPredicate.anyMatch(SlotReference.class::isInstance) | ||
| && !rewrittenPredicate.anyMatch(PreferPushDownProject.class::isInstance); |
There was a problem hiding this comment.
[Low] Weak assertion: rewrittenPredicate.anyMatch(SlotReference.class::isInstance) is trivially satisfied by rScore.getOutput().get(2) (the SlotReference inside rightSidePredicate). This assertion would pass even if the PreferPushDownProject expression simply disappeared rather than being correctly replaced with a new slot.
Consider a stronger assertion, e.g., verifying that the Or's first child is a SlotReference (the replacement slot), or extracting the alias slot from the project and checking it appears in the predicate:
// e.g., check the Or's left child is a SlotReference that didn't exist before
return rewrittenPredicate instanceof Or
&& ((Or) rewrittenPredicate).left() instanceof SlotReference
&& !rewrittenPredicate.anyMatch(PreferPushDownProject.class::isInstance);|
|
||
| @Test | ||
| public void shouldRewritePreferPushDownProjectInOrFilterToSlot() { | ||
| Expression preferPushDownProjectExpr = new MatchAny( |
There was a problem hiding this comment.
[Medium] Insufficient test coverage: This single test only covers one scenario: a MatchAny on the left side of an INNER_JOIN inside an OR predicate. Consider adding tests for:
- Right-side pushdown --
PreferPushDownProjectexpression referencing only right-child slots (exerciseschildIndexToPushedAlias.get(1)path, currently untested). - Both sides simultaneously -- two
PreferPushDownProjectexpressions, one per side, verifying two separateLogicalProjectnodes are inserted. - Dedup path -- the same
PreferPushDownProjectexpression appearing in multiple predicates, verifyingoldExprToNewSlotcache works correctly. - Cross-side expression (negative case) -- a
PreferPushDownProjectwhose input slots span both sides, verifying it remains unrewritten. - Non-inner join types -- verify project insertion works correctly with
LEFT_OUTER_JOINetc., where the filter remains above the join. - Standalone PreferPushDownProject predicate -- not wrapped in OR, verifying it gets rewritten and can then be pushed to one side.
airborne12
left a comment
There was a problem hiding this comment.
LGTM. The approach of rewriting PreferPushDownProject expressions to slot aliases before filter-join pushdown is sound and correct. It cleanly enables storage-level pruning for expressions embedded in OR predicates that couldn't be pushed down individually.
Highlights:
- Deduplication via
oldExprToNewSlotis a nice touch - Early return when no PreferPushDownProject exists avoids unnecessary work
rewriteDownShortCircuitusage is precise
Minor suggestions (non-blocking):
- Consider adding tests for OUTER JOIN, cross-side expressions (should not rewrite), and dedup verification
- The rewrite currently applies to all predicates containing PreferPushDownProject, including standalone pushable conjuncts — these could be skipped for slightly cleaner plan trees
|
PR approved by at least one committer and no changes requested. |
|
PR approved by anyone and no changes requested. |
fe/fe-core/src/main/java/org/apache/doris/nereids/rules/rewrite/PushDownFilterThroughJoin.java
Outdated
Show resolved
Hide resolved
Add filter(join) handling in PushDownProject to push PreferPushDownProject expressions to matching join children as aliases, and replace them with slots inside filter predicates.
This keeps OR filter semantics while exposing nested-field access below join for storage pruning.
Plan tree demo:
Before:
LogicalFilter((match_any(info['context'], 'abc') OR r.k2 > 60))
LogicalJoin(INNER)
left: LogicalOlapScan(student)
right: LogicalOlapScan(score)
After:
LogicalFilter((slot#x OR r.k2 > 60))
LogicalJoin(INNER)
left: LogicalProject(student.*, match_any(info['context'], 'abc') AS slot#x)
LogicalOlapScan(student)
right: LogicalOlapScan(score)
e8eb0bc to
aab1255
Compare
|
run buildall |
|
run buildall |
TPC-H: Total hot run time: 26535 ms |
TPC-DS: Total hot run time: 169309 ms |
FE UT Coverage ReportIncrement line coverage |
|
PR approved by at least one committer and no changes requested. |
…oin pushdown (#61635) Push PreferPushDownProject expressions to the matching join child as aliases and replace them with slots in filter predicates. This keeps OR predicates above join while exposing nested-field access below join for storage-level pruning acceleration. Plan tree demo: Before: ``` LogicalFilter((match_any(info['context'], 'abc') OR r.k2 > 60)) LogicalJoin(INNER) left: LogicalOlapScan(student) right: LogicalOlapScan(score) ``` After: ``` LogicalFilter((slot#x OR r.k2 > 60)) LogicalJoin(INNER) left: LogicalProject(student.*, match_any(info['context'], 'abc') AS slot#x) LogicalOlapScan(student) right: LogicalOlapScan(score) ```
…oin pushdown (#61635) Push PreferPushDownProject expressions to the matching join child as aliases and replace them with slots in filter predicates. This keeps OR predicates above join while exposing nested-field access below join for storage-level pruning acceleration. Plan tree demo: Before: ``` LogicalFilter((match_any(info['context'], 'abc') OR r.k2 > 60)) LogicalJoin(INNER) left: LogicalOlapScan(student) right: LogicalOlapScan(score) ``` After: ``` LogicalFilter((slot#x OR r.k2 > 60)) LogicalJoin(INNER) left: LogicalProject(student.*, match_any(info['context'], 'abc') AS slot#x) LogicalOlapScan(student) right: LogicalOlapScan(score) ```
…mn metadata When MATCH expressions reference alias slots that have lost column metadata (e.g., CAST(variant_col['subkey'] AS VARCHAR) AS fn in CTE/subquery + JOIN), visitMatch() previously threw "SlotReference in Match failed to get Column". Now gracefully falls back to invertedIndex = null instead of throwing. The BE evaluates MATCH correctly via slow-path expression evaluation, or the PushDownProject / PushDownMatchProjectionAsVirtualColumn rules provide fast-path index evaluation. This change works together with PR apache#61635 which pushes PreferPushDownProject expressions (like match_any on variant sub-fields) down through joins as aliased projections.
Push PreferPushDownProject expressions to the matching join child as aliases and replace them with slots in filter predicates. This keeps OR predicates above join while exposing nested-field access below join for storage-level pruning acceleration.
Plan tree demo:
Before:
After:
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)