[feature](be) Add deferred read plan for pruned complex columns#63780
mrhhsg wants to merge 1 commit into
### What problem does this PR solve?

Issue Number: None

Related PR: apache#59263

Problem Summary: Nested column pruning can read only predicate subpaths before filtering. When a remaining conjunct also needs the same complex column as a common expression, pruned non-predicate subpaths still need to be materialized after row filtering.

This change introduces an explicit deferred read phase for pruned complex columns. Predicate reads keep parent metadata and predicate subpaths, deferred reads materialize the remaining nested targets by selected rowids, and the finalization step removes temporary default placeholders before returning the column.

The implementation keeps the phase and nested read plan on the iterator, propagates predicate/deferred intent through array/map/struct children, and exposes the prune switch through TQueryOptions.

### Release note

Improve nested column pruning read behavior for complex columns used by both predicates and remaining expressions.

### Check List (For Author)

- Test: Unit Test
    - `DORIS_HOME=/mnt/disk7/hushenggang/doris-abs ninja -C be/ut_build_ASAN src/storage/CMakeFiles/Storage.dir/segment/column_reader.cpp.o src/storage/CMakeFiles/Storage.dir/segment/segment_iterator.cpp.o test/CMakeFiles/doris_be_test.dir/storage/segment/column_reader_test.cpp.o`
    - `build-support/clang-format.sh`
    - `git diff --check`
- Behavior changed: Yes. Pruned complex common-expression columns can defer non-predicate nested materialization until after filtering.
- Does this need documentation: No
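To make the three phases in the summary concrete, here is a minimal, self-contained sketch. It is not code from this PR: `StoredStruct`, `read_predicate_phase`, `filter_rows`, `read_deferred_phase`, and `finalize` are hypothetical names, the "column" is a two-field struct held in memory, and the predicate is an arbitrary `id >= min_id` comparison chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

using rowid_t = uint32_t;

// Hypothetical stand-in for a stored struct column with two subfields:
// `id` is used by the predicate, `name` only by the remaining expression.
struct StoredStruct {
    std::vector<int32_t> id;
    std::vector<std::string> name;
};

// Result of the predicate phase: real data for the predicate subpath,
// default placeholders for the pruned subpath so row counts stay aligned.
struct PredicatePhaseResult {
    std::vector<int32_t> id;
    std::vector<std::string> name;  // one empty placeholder per row
};

// Phase 1: read only the predicate subpath for every row.
PredicatePhaseResult read_predicate_phase(const StoredStruct& s) {
    return {s.id, std::vector<std::string>(s.id.size())};
}

// Row filtering: collect the rowids that satisfy the (illustrative) predicate.
std::vector<rowid_t> filter_rows(const PredicatePhaseResult& p, int32_t min_id) {
    std::vector<rowid_t> selected;
    for (rowid_t r = 0; r < p.id.size(); ++r) {
        if (p.id[r] >= min_id) selected.push_back(r);
    }
    return selected;
}

// Phase 2: materialize the pruned subpath, but only for the selected rowids.
std::vector<std::string> read_deferred_phase(const StoredStruct& s,
                                             const std::vector<rowid_t>& rowids) {
    std::vector<std::string> out;
    out.reserve(rowids.size());
    for (rowid_t r : rowids) {
        assert(r < s.name.size());
        out.push_back(s.name[r]);
    }
    return out;
}

struct FinalColumn {
    std::vector<int32_t> id;
    std::vector<std::string> name;
};

// Finalization: keep the surviving predicate values and replace the
// placeholders with the deferred data, so no placeholder reaches the caller.
FinalColumn finalize(const PredicatePhaseResult& p, const std::vector<rowid_t>& rowids,
                     std::vector<std::string> deferred) {
    assert(deferred.size() == rowids.size());
    FinalColumn out;
    out.id.reserve(rowids.size());
    for (rowid_t r : rowids) out.id.push_back(p.id[r]);
    out.name = std::move(deferred);
    return out;
}
```

The point of the split is that the pruned subpath is only touched for rows that survive the filter; how much work that saves depends on the predicate's selectivity.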
/review
I found one blocking build issue. The new deferred nested read path uses DorisVector in segment_iterator.cpp, but this translation unit does not include core/custom_allocator.h, which is where DorisVector is defined. Please add the direct include or use std::vector here.
Critical checkpoint conclusions:
- Goal/test: The PR aims to propagate enable_prune_nested_column to BE and split nested predicate/deferred reads; regression and BE unit coverage were added, but the code currently has a compile blocker.
- Scope/focus: The change is focused on nested-column pruning and lazy materialization.
- Concurrency/lifecycle: No new shared mutable state, threads, locks, or special static lifecycle were introduced in the reviewed paths.
- Config/protocol: A thrift query option is added and FE sets it; no incompatible storage format change observed.
- Parallel paths: Query scan path is gated to READER_QUERY; compaction paths are not enabled for this feature.
- Tests/results: Tests were added for nested access paths and regression output, but they cannot validate the PR until the BE compile issue is fixed.
- Observability/performance: Existing profile stats are extended for deferred nested read; no additional blocking observability issue found.
- Transaction/persistence/data writes: Not applicable; this is a read-path/query-option change.
- User focus: No additional user-provided review focus was specified.
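The Config/protocol and Parallel paths checkpoints together describe a two-part gate: the query option must be on, and the reader must be a query reader rather than compaction. A minimal sketch of that condition follows. The `ReaderType` enum, the `QueryOptions` struct, and `use_deferred_nested_read` are hypothetical stand-ins; only the names `enable_prune_nested_column` and `READER_QUERY` come from the review, and the real types in Doris may look different.

```cpp
// Hypothetical reader kinds; the review only mentions READER_QUERY and compaction.
enum class ReaderType { READER_QUERY, READER_COMPACTION };

// Hypothetical mirror of the thrift query option that FE sets.
struct QueryOptions {
    bool enable_prune_nested_column;
};

// Deferred nested reads apply only when the option is on AND this is a query scan.
bool use_deferred_nested_read(ReaderType type, const QueryOptions& opts) {
    return type == ReaderType::READER_QUERY && opts.enable_prune_nested_column;
}
```

Keeping the check in one predicate like this makes it easy to see that compaction can never take the new path, whatever the option says.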
    if (!_deferred_nested_columns.empty()) {
        SCOPED_RAW_TIMER(&_opts.stats->deferred_nested_read_ns);
        DorisVector<rowid_t> rowids(_selected_size);
        for (size_t i = 0; i < _selected_size; ++i) {
This new use of DorisVector needs a direct include of core/custom_allocator.h in this translation unit. DorisVector is only defined there, and segment_iterator.cpp does not currently include that header, so this should fail to compile when the deferred nested read code is built. Please add the include (or use std::vector here if the custom allocator is not required).
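The requested fix is an instance of "include what you use": a translation unit that names a type alias should directly include the header that defines it, instead of relying on some other header to pull it in. The sketch below shows the shape of that in a single file. `SketchVector` is a hypothetical stand-in for `DorisVector` (whose real definition is not shown in this thread), and the loop body, `block_rowids`, and `sel_idx` are assumptions, since the quoted hunk stops before the body of the loop.

```cpp
// --- stand-in for the header that owns the alias (in the PR: core/custom_allocator.h) ---
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical alias: a vector parameterized on an allocator. The standard
// allocator is used here only to keep the sketch self-contained.
template <typename T>
using SketchVector = std::vector<T, std::allocator<T>>;
// --- end of header stand-in; a real .cpp would #include that header directly ---

using rowid_t = uint32_t;

// Assumed loop body: map each selected position in the current batch to its rowid.
SketchVector<rowid_t> gather_selected_rowids(const std::vector<rowid_t>& block_rowids,
                                             const std::vector<uint16_t>& sel_idx,
                                             size_t selected_size) {
    assert(selected_size <= sel_idx.size());
    SketchVector<rowid_t> rowids(selected_size);
    for (size_t i = 0; i < selected_size; ++i) {
        assert(sel_idx[i] < block_rowids.size());
        rowids[i] = block_rowids[sel_idx[i]];
    }
    return rowids;
}
```

If the custom allocator is not needed for this short-lived buffer, swapping the alias for plain `std::vector<rowid_t>` leaves the loop unchanged, which is the alternative the comment offers.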