[Bug](scan) Preserve IN_LIST runtime filter predicates when key range… #62115
yiguolei merged 1 commit into apache:branch-4.0
… is a scope range (apache#62027)

This pull request fixes a bug in the OLAP scan operator where `IN_LIST` predicates could be incorrectly erased when both `MINMAX` and `IN` runtime filters targeted the same key column and the number of `IN` values exceeded the maximum allowed for pushdown. The change preserves `IN_LIST` predicates in that case, preventing incorrect query results, and adds a regression test for the fix.

**Bug fix in predicate handling:**

* Modified `_build_key_ranges_and_filters()` in `olap_scan_operator.cpp` so that `IN_LIST` predicates are not erased when the key range is a scope range (e.g., `>= X AND <= Y`) and the `IN` filter's value count exceeds `max_pushdown_conditions_per_column`. A scope range only bounds the key; it does not enforce membership in the `IN` list, so the `IN_LIST` predicate still carries filtering that the range does not capture.
* Enhanced the profiling output in `_process_conjuncts()` to reflect the set of predicates that actually reaches the storage layer after key range and filter construction, which makes predicate pushdown easier to debug and verify.

**Testing and regression coverage:**

* Added a regression test, `test_rf_in_list_not_erased_by_scope_range.groovy`, verifying that `IN_LIST` predicates are not erased when both `MINMAX` and `IN` filters are present and the `IN` list is too large to be absorbed into the key range.
* Added the corresponding expected output file for the new regression test.
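A minimal sketch of the range-construction rule described above, assuming a single integer key column. `KeyRange`, `build_key_range`, and `can_erase_in_list` are hypothetical names, and the fallback to the MINMAX bounds is a simplified model of the behavior this PR describes, not the actual `_build_key_ranges_and_filters()` code.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical model of the key range built for one integer key column.
struct KeyRange {
    bool is_fixed = false;  // true: exact value set; false: scope range [low, high]
    std::set<int> values;   // meaningful only when is_fixed is true
    int low = 0;            // meaningful only when is_fixed is false
    int high = 0;
};

// Combine a MINMAX filter [min_v, max_v] with an IN filter on the same column.
KeyRange build_key_range(int min_v, int max_v, const std::vector<int>& in_values,
                         std::size_t max_pushdown_conditions_per_column) {
    assert(min_v <= max_v);
    KeyRange range;
    if (in_values.size() <= max_pushdown_conditions_per_column) {
        // Small IN list: absorb it into the range as exact values inside [min_v, max_v].
        range.is_fixed = true;
        for (int v : in_values) {
            if (v >= min_v && v <= max_v) {
                range.values.insert(v);
            }
        }
    } else {
        // Too many IN values: only the MINMAX bounds become the key range.
        range.low = min_v;
        range.high = max_v;
    }
    return range;
}

// An IN_LIST predicate is subsumed only by a fixed range; a scope range also
// admits keys between the bounds that are not in the list.
bool can_erase_in_list(const KeyRange& range) { return range.is_fixed; }
```

For example, with `max_pushdown_conditions_per_column = 8`, an IN list of 3 values yields a fixed range and the `IN_LIST` predicate can be dropped, whereas 9 values leave only the scope range `[min, max]`, so the predicate must be kept.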
run buildall
Pull request overview
Fixes an OLAP scan predicate-pushdown bug where `IN_LIST` runtime filter predicates could be incorrectly removed when a `MINMAX` runtime filter produced a scope key range on the same key column (in particular when the number of `IN` values exceeds `max_pushdown_conditions_per_column`). Because a scope range does not enforce the `IN` value set, erasing the `IN_LIST` predicate weakened the filtering applied at the storage layer.
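To see why a scope range cannot stand in for an `IN_LIST` predicate, the following minimal C++ sketch models one integer key column. `scan_with_filters` and its parameters are hypothetical names for illustration, not Doris APIs; the sketch only shows the scan-level effect of dropping the `IN` check when the key range is `[low, high]`.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Return the probe-side keys that survive the scan.
// [low, high] is the scope key range from the MINMAX filter; in_values comes
// from the IN filter; keep_in_list says whether the IN_LIST predicate is still applied.
std::vector<int> scan_with_filters(const std::vector<int>& rows, int low, int high,
                                   const std::vector<int>& in_values, bool keep_in_list) {
    assert(low <= high);
    std::vector<int> out;
    for (int key : rows) {
        const bool in_range = key >= low && key <= high;
        const bool in_list =
            std::find(in_values.begin(), in_values.end(), key) != in_values.end();
        // Erasing the IN_LIST predicate leaves only the range check.
        if (in_range && (!keep_in_list || in_list)) {
            out.push_back(key);
        }
    }
    return out;
}
```

With rows `{0, 1, 3, 5, 10, 11}`, range `[1, 10]`, and IN values `{1, 5, 10}`, keeping the `IN_LIST` check returns `{1, 5, 10}`, while erasing it also lets `3` through.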
Changes:
- Adjusted key-range construction in `OlapScanLocalState::_build_key_ranges_and_filters()` to erase only predicates that are truly subsumed by the generated scan key range (preserving `IN_LIST` for scope ranges).
- Removed the `ColumnPredicate::could_be_erased()` overrides and moved the predicate-erasure decision into the OLAP scan operator, based on predicate type and range shape (fixed vs. scope).
- Added a regression test and expected output to validate that `IN_LIST` predicates are not erased by scope ranges.
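The second change moves the erasure decision out of the predicate classes and into the scan operator, keyed on predicate type and range shape. A hedged sketch of such a centralized check follows; `PredicateType`, `RangeShape`, and `could_be_erased` are hypothetical stand-ins (only a subset of predicate types is modeled), and the per-type rules are assumptions chosen to match the fixed-vs-scope distinction in this PR, not the exact Doris rules.

```cpp
#include <cassert>

// Hypothetical subset of predicate kinds.
enum class PredicateType { EQ, LT, LE, GT, GE, IN_LIST, NOT_IN_LIST };

// Shape of the key range built for a column.
enum class RangeShape { FIXED, SCOPE };

// Returns true only if the key range fully enforces the predicate, so it can be
// dropped before reaching the storage layer.
bool could_be_erased(PredicateType type, RangeShape shape) {
    switch (type) {
    case PredicateType::EQ:
    case PredicateType::IN_LIST:
        // Exact-value predicates are subsumed only by an exact value set.
        return shape == RangeShape::FIXED;
    case PredicateType::LT:
    case PredicateType::LE:
    case PredicateType::GT:
    case PredicateType::GE:
        // Assumption: bound predicates were folded into the range for both shapes.
        return true;
    case PredicateType::NOT_IN_LIST:
        // Exclusions are not modeled as key ranges in this sketch, so they always stay.
        return false;
    }
    assert(false && "unhandled PredicateType");
    return false;
}
```

One plausible motivation for this design: only the scan operator knows whether the final range came out fixed or scope, so a per-predicate `could_be_erased()` override has no way to take that into account.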
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| regression-test/suites/correctness_p0/test_rf_in_list_not_erased_by_scope_range.groovy | Adds a regression test reproducing the scope-range + oversized-IN runtime filter scenario. |
| regression-test/data/correctness_p0/test_rf_in_list_not_erased_by_scope_range.out | Expected output for the new regression test. |
| be/src/pipeline/exec/scan_operator.cpp | Updates profiling to report pushdown predicates after conjunct processing (more accurate post-normalization view). |
| be/src/pipeline/exec/olap_scan_operator.cpp | Fixes predicate erasure logic when key range is exact but only a scope range (preserve IN_LIST when not subsumed). |
| be/src/olap/null_predicate.h | Removes could_be_erased() override (logic moved to OLAP scan operator). |
| be/src/olap/in_list_predicate.h | Removes could_be_erased() override (logic moved to OLAP scan operator). |
| be/src/olap/comparison_predicate.h | Removes could_be_erased() override (logic moved to OLAP scan operator). |
| be/src/olap/column_predicate.h | Removes the could_be_erased() API from the base predicate interface. |
```groovy
// Use both IN and MIN_MAX runtime filter types so both are generated on the join key.
sql "set runtime_filter_type = 'IN_OR_BLOOM_FILTER,MIN_MAX';"
```
This regression test is intended to exercise the IN_LIST runtime filter path, but runtime_filter_type = 'IN_OR_BLOOM_FILTER,MIN_MAX' can legitimately switch to a Bloom filter (based on runtime_filter_max_in_num / build-side cardinality). If it switches to Bloom, the IN_LIST predicate isn’t generated and the test may no longer cover the bug it’s meant to prevent. Consider forcing an IN filter here (e.g. use runtime_filter_type = 'IN,MIN_MAX' and/or explicitly set runtime_filter_max_in_num to a value >= the build-side distinct count) so the test reliably validates IN_LIST-not-erased behavior.
Suggested change:

```diff
-// Use both IN and MIN_MAX runtime filter types so both are generated on the join key.
-sql "set runtime_filter_type = 'IN_OR_BLOOM_FILTER,MIN_MAX';"
+// Force the runtime filter path to generate an IN_LIST predicate together with MIN_MAX.
+// Also set runtime_filter_max_in_num above the 6 distinct build-side keys so the
+// engine cannot legitimately switch this test to a Bloom filter.
+sql "set runtime_filter_type = 'IN,MIN_MAX';"
+sql "set runtime_filter_max_in_num = 16;"
```
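The reviewer's concern can be modeled as a selection rule that runs once the build side is known. The sketch below is an assumption about the shape of that rule (the comparison direction and the exact inputs Doris uses may differ); `RuntimeFilterKind` and `resolve_in_or_bloom` are hypothetical names. It illustrates why pinning `runtime_filter_max_in_num` above the build-side distinct count keeps the test on the IN path.

```cpp
#include <cstddef>

enum class RuntimeFilterKind { InFilter, BloomFilter };

// Hypothetical resolution of IN_OR_BLOOM_FILTER once the build side is known:
// use an IN filter if the distinct build-side keys fit under the limit,
// otherwise fall back to a Bloom filter (which produces no IN_LIST predicate).
RuntimeFilterKind resolve_in_or_bloom(std::size_t build_side_distinct_keys,
                                      std::size_t runtime_filter_max_in_num) {
    return build_side_distinct_keys <= runtime_filter_max_in_num
               ? RuntimeFilterKind::InFilter
               : RuntimeFilterKind::BloomFilter;
}
```

With the suggested settings (6 distinct build-side keys, `runtime_filter_max_in_num = 16`) the sketch picks the IN filter; with a limit of 4 it would fall back to a Bloom filter, and the `IN_LIST` predicate under test would never be generated.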
Release note: None