[fix](fe) Avoid OFFSET path and NULL path for complex datatype when its children are accessed.#63229
Conversation
### What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary: cardinality/map_size on element_at(map, key) was collected as an OFFSET-only access path. element_at(map, key) still needs map keys for lookup, and pushing a nested *.OFFSET predicate path can make BE route OFFSET to an array item child and fail with an invalid access path. Fall back to normal element access for these expressions while preserving OFFSET-only optimization for direct array/map cardinality.
### Release note
None
### Check List (For Author)
- Test: Unit Test
- tools/fast-compile-fe.sh
- FE_UT_PARALLEL=0 ./run-fe-ut.sh --run org.apache.doris.nereids.rules.rewrite.PruneNestedColumnTest#testCardinalityMapElementDoesNotUseOffsetPath
- FE_UT_PARALLEL=0 ./run-fe-ut.sh --run org.apache.doris.nereids.rules.rewrite.PruneNestedColumnTest#testStructRootMapMixedAccessKeepsKeysPath+testCardinalityMapElementDoesNotUseOffsetPath
- cd fe && mvn checkstyle:check -pl fe-core -q
- ./build.sh --fe
- Behavior changed: No
- Does this need documentation: No
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
/review |
|
run buildall |
TPC-H: Total hot run time: 29771 ms |
TPC-DS: Total hot run time: 170352 ms |
FE UT Coverage ReportIncrement line coverage |
|
/review |
|
run buildall |
|
OpenCode automated review failed and did not complete. Error: Review step was skipped (possibly timeout or cancelled) Please inspect the workflow logs and rerun the review after the underlying issue is resolved. |
TPC-H: Total hot run time: 29523 ms |
TPC-DS: Total hot run time: 170201 ms |
There was a problem hiding this comment.
I found one blocking gap in the nested-column pruning fix: the new guard only recognizes a bare ElementAt, so the same invalid *.OFFSET access path can still be produced when the map lookup is wrapped by a transparent cast.
Critical checkpoints: goal is clear and partially achieved for direct cardinality(element_at(map, key))/map_size(element_at(map, key)); implementation is small and focused; no concurrency, lifecycle, configuration, compatibility, transaction, persistence, or FE/BE protocol changes are involved; functionally parallel wrapped-expression paths are not fully covered; test coverage covers direct array/map cases but misses cast/alias wrappers around map element access; no new observability needed.
User focus: no additional user-provided review focus was present.
FE Regression Coverage ReportIncrement line coverage |
|
run buildall |
|
run buildall |
TPC-H: Total hot run time: 29372 ms |
TPC-DS: Total hot run time: 171114 ms |
TPC-H: Total hot run time: 29619 ms |
TPC-DS: Total hot run time: 171088 ms |
FE UT Coverage ReportIncrement line coverage |
FE Regression Coverage ReportIncrement line coverage |
### What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: Nested column pruning could leave predicate access paths that were no longer present in all access paths after NULL/OFFSET-only paths were stripped because child data paths covered the same complex container. This broke the assumption that predicate access paths are a subset of all access paths and could make BE enter NULL_MAP_ONLY/OFFSET_ONLY mode while child data still needed to be read. Normalize predicate paths against final all paths and update regression coverage for NULL and OFFSET cases.
### Release note
None
### Check List (For Author)
- Test: Regression test
- tools/fast-compile-fe.sh fe/fe-core/src/main/java/org/apache/doris/nereids/rules/rewrite/NestedColumnPruning.java
- tools/fast-compile-fe.sh --test fe/fe-core/src/test/java/org/apache/doris/nereids/rules/rewrite/PruneNestedColumnTest.java
- ./run-regression-test.sh --run -d nereids_rules_p0/column_pruning -s null_column_pruning
- ./run-regression-test.sh --run -d nereids_rules_p0/column_pruning -s string_length_column_pruning
- Behavior changed: No
- Does this need documentation: No
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
run buildall |
|
/review |
There was a problem hiding this comment.
I found one blocking correctness issue. The PR improves the nullable/offset path subset handling for mixed projection+predicate cases, but the new predicate filtering still uses the pre-normalized collected paths. That misses the existing map-value offset-only normalization path, so predicateAccessPaths can remain outside allAccessPaths for predicate-only cardinality(map_col['k']) queries.
Critical checkpoint conclusions:
- Goal/test: The goal is to keep BE-safe all/predicate nested access paths for null and offset pruning. The added tests cover mixed projection+predicate cases, but not predicate-only map element cardinality after special map-path normalization.
- Scope: The change is mostly focused and localized to Nereids nested-column pruning plus tests.
- Concurrency/lifecycle/config/compatibility: No new concurrency, special lifecycle, config, storage format, or FE-BE protocol compatibility issue found.
- Parallel paths: A functionally parallel special path (
hasMapValueOffsetOnlyAccess) is not covered by the new subset filtering. - Data correctness: Blocking issue below can send BE predicate paths that are not present in the emitted all paths.
- Tests: Unit/regression coverage was added, but the missing predicate-only map element cardinality case should be covered.
- Observability/performance: No additional observability or performance concerns found.
User focus: no additional user-provided review focus was specified.
TPC-H: Total hot run time: 29392 ms |
TPC-DS: Total hot run time: 170638 ms |
FE Regression Coverage ReportIncrement line coverage |
### What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: When a field is read in full, NestedColumnPruning could still keep an exact metadata-only path for the same field, such as s.NULL or s.OFFSET. This may route BE iterators into metadata-only mode even though the data path is required.
### Release note
None
### Check List (For Author)
- Test: Regression test / Unit Test
- tools/fast-compile-fe.sh fe/fe-core/src/main/java/org/apache/doris/nereids/rules/rewrite/NestedColumnPruning.java
- tools/fast-compile-fe.sh --test fe/fe-core/src/test/java/org/apache/doris/nereids/rules/rewrite/PruneNestedColumnTest.java
- ./run-regression-test.sh --run -d nereids_rules_p0/column_pruning -s null_column_pruning -forceGenOut
- ./run-regression-test.sh --run -d nereids_rules_p0/column_pruning -s string_length_column_pruning -forceGenOut
- Behavior changed: Yes. Metadata-only NULL/OFFSET access paths are removed when the exact field is also read in full.
- Does this need documentation: No
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
### What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: Add explanatory comments and examples for NestedColumnPruning helper methods that remove redundant NULL/OFFSET access paths and keep predicate paths consistent with all access paths.
### Release note
None
### Check List (For Author)
- Test: No need to test (comment-only change)
- cd fe && mvn checkstyle:check -pl fe-core -Dcheckstyle.includes='**/NestedColumnPruning.java'
- Behavior changed: No
- Does this need documentation: No
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
run buildall |
TPC-H: Total hot run time: 31032 ms |
TPC-DS: Total hot run time: 169017 ms |
FE Regression Coverage ReportIncrement line coverage |
### What problem does this PR solve? Issue Number: None Related PR: apache#63229 Problem Summary: Predicate access path cleanup checked raw allAccessPaths instead of the final paths emitted to BE. When whole-column access collapsed all paths to the root column, redundant child NULL predicate paths could remain and fail nested column pruning invariants. ### Release note None ### Check List (For Author) - Test: Unit Test - FE_UT_PARALLEL=0 ./run-fe-ut.sh --run org.apache.doris.nereids.rules.rewrite.PruneNestedColumnTest#testProjectFilter - FE_UT_PARALLEL=0 ./run-fe-ut.sh --run 'org.apache.doris.nereids.rules.rewrite.PruneNestedColumnTest#testMapElementLengthWithMapValuesKeepsKeysPath+testStructRootMapMixedAccessKeepsKeysPath+testCardinalityMapElementKeepsValueOffsetPath+testCardinalityMapElementOffsetCoveredByValueFieldAccess+testMapElementArrayNullPathCoveredByValueFieldAccess' - Behavior changed: No - Does this need documentation: No Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
run buildall |
TPC-H: Total hot run time: 31357 ms |
TPC-DS: Total hot run time: 168900 ms |
### What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary: Nested column pruning could treat IF result null checks as full branch value access, and whole-column predicate path cleanup could try to mutate an immutable path list. This caused pruning to fail and fallback to the original struct type for mixed IF null-check and child-field predicates.
### Release note
None
### Check List (For Author)
- Test: Unit Test
- org.apache.doris.nereids.rules.rewrite.PruneNestedColumnTest#testFilter
- org.apache.doris.nereids.rules.rewrite.PruneNestedColumnTest
- Behavior changed: No
- Does this need documentation: No
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
run buildall |
TPC-H: Total hot run time: 30764 ms |
TPC-DS: Total hot run time: 168514 ms |
FE Regression Coverage ReportIncrement line coverage |
|
/review |
|
PR approved by at least one committer and no changes requested. |
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #62205
Problem Summary: cardinality/map_size on element_at(map, key) was collected as an OFFSET-only access path. element_at(map, key) still needs map keys for lookup, and pushing a nested *.OFFSET predicate path can make BE route OFFSET to an array item child and fail with an invalid access path. Fall back to normal element access for these expressions while preserving OFFSET-only optimization for direct array/map cardinality.
Release note
None
Check List (For Author)
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)