[feature](be) Add deferred read plan for pruned complex columns#63780

Open: mrhhsg wants to merge 1 commit into apache:master from mrhhsg:codex/lazy-pruned-complex-plan

mrhhsg (Member) commented May 28, 2026

What problem does this PR solve?

Issue Number: None

Related PR: #59263

Problem Summary: Nested column pruning can read only the predicate subpaths of a complex column before filtering. When a remaining conjunct also needs the same complex column as a common expression, the pruned non-predicate subpaths must still be materialized after row filtering.

This change introduces an explicit deferred read phase for pruned complex columns:

  • Predicate reads keep the parent metadata and the predicate subpaths.
  • Deferred reads materialize the remaining nested targets by selected rowids.
  • Finalization removes the temporary default placeholders before the column is returned.

The implementation keeps the phase and the nested read plan on the iterator, propagates predicate/deferred intent through array/map/struct children, and exposes the prune switch through TQueryOptions.
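
As a rough illustration of the predicate and deferred phases described above, the following self-contained C++ sketch models a struct column with a predicate subfield `a` and a non-predicate subfield `b`. All names here (`StructStorage`, `Row`, `read_predicate_phase`, `filter_rows`, `read_deferred_phase`) are hypothetical and are not taken from the Doris code. The sketch simply replaces the placeholder values during the deferred read; it does not mirror the actual finalization step.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

using rowid_t = uint32_t;  // assumption: row ids are 32-bit unsigned integers

// Hypothetical stand-in for a struct column stored subfield by subfield,
// so that each subpath can be read independently.
struct StructStorage {
    std::vector<int32_t> a;      // predicate subpath
    std::vector<std::string> b;  // non-predicate subpath
};

struct Row {
    int32_t a;
    std::string b;
};

// Predicate phase: materialize only subpath `a` for every row.
// `b` stays a default-constructed (empty) placeholder.
std::vector<Row> read_predicate_phase(const StructStorage& s) {
    std::vector<Row> rows(s.a.size());
    for (size_t i = 0; i < s.a.size(); ++i) {
        rows[i].a = s.a[i];
    }
    return rows;
}

// Filtering: collect the rowids whose `a` is greater than `threshold`.
std::vector<rowid_t> filter_rows(const std::vector<Row>& rows, int32_t threshold) {
    std::vector<rowid_t> selected;
    for (size_t i = 0; i < rows.size(); ++i) {
        if (rows[i].a > threshold) {
            selected.push_back(static_cast<rowid_t>(i));
        }
    }
    return selected;
}

// Deferred phase: read subpath `b` only for the selected rowids and
// return rows in which the placeholder has been replaced by the real value.
std::vector<Row> read_deferred_phase(const StructStorage& s,
                                     const std::vector<Row>& rows,
                                     const std::vector<rowid_t>& selected) {
    std::vector<Row> out;
    out.reserve(selected.size());
    for (rowid_t rid : selected) {
        assert(rid < s.b.size() && rid < rows.size());
        out.push_back(Row{rows[rid].a, s.b[rid]});
    }
    return out;
}
```

With `a = {1, 5, 3, 9}` and a threshold of 4, only rowids 1 and 3 survive the filter, so `b` is read for two rows instead of four.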

Release note

Improve nested column pruning read behavior for complex columns used by both predicates and remaining expressions.

Check List (For Author)

  • Test: Unit Test
    • DORIS_HOME=/mnt/disk7/hushenggang/doris-abs ninja -C be/ut_build_ASAN src/storage/CMakeFiles/Storage.dir/segment/column_reader.cpp.o src/storage/CMakeFiles/Storage.dir/segment/segment_iterator.cpp.o test/CMakeFiles/doris_be_test.dir/storage/segment/column_reader_test.cpp.o
    • build-support/clang-format.sh
    • git diff --check
    • Not run: full ColumnReaderTest binary and regression suite in this worktree.
  • Behavior changed: Yes. Pruned complex common-expression columns can defer non-predicate nested materialization until after filtering.
  • Does this need documentation: No

hello-stephen (Contributor) commented

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

mrhhsg (Member, Author) commented May 28, 2026

/review

github-actions (Bot, Contributor) left a comment

I found one blocking build issue. The new deferred nested read path uses DorisVector in segment_iterator.cpp, but this translation unit does not include core/custom_allocator.h, which is where DorisVector is defined. Please add the direct include or use std::vector here.

Critical checkpoint conclusions:

  • Goal/test: The PR aims to propagate enable_prune_nested_column to BE and split nested predicate/deferred reads; regression and BE unit coverage were added, but the code currently has a compile blocker.
  • Scope/focus: The change is focused on nested-column pruning and lazy materialization.
  • Concurrency/lifecycle: No new shared mutable state, threads, locks, or special static lifecycle were introduced in the reviewed paths.
  • Config/protocol: A thrift query option is added and FE sets it; no incompatible storage format change observed.
  • Parallel paths: Query scan path is gated to READER_QUERY; compaction paths are not enabled for this feature.
  • Tests/results: Tests were added for nested access paths and regression output, but they cannot validate the PR until the BE compile issue is fixed.
  • Observability/performance: Existing profile stats are extended for deferred nested read; no additional blocking observability issue found.
  • Transaction/persistence/data writes: Not applicable; this is a read-path/query-option change.
  • User focus: No additional user-provided review focus was specified.

    if (!_deferred_nested_columns.empty()) {
        SCOPED_RAW_TIMER(&_opts.stats->deferred_nested_read_ns);
        DorisVector<rowid_t> rowids(_selected_size);
        for (size_t i = 0; i < _selected_size; ++i) {

This new use of DorisVector needs a direct include of core/custom_allocator.h in this translation unit. DorisVector is only defined there, and segment_iterator.cpp does not currently include that header, so this should fail to compile when the deferred nested read code is built. Please add the include (or use std::vector here if the custom allocator is not required).
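
If the custom allocator is not required, the `std::vector` alternative mentioned above might look like the following sketch. The helper name `gather_selected_rowids`, its parameters, and the loop body are assumptions made for illustration (the quoted diff hunk is cut off before the loop body), and `rowid_t` is assumed to be a 32-bit unsigned integer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>  // direct include for the container used in this file

using rowid_t = uint32_t;  // assumption: mirrors the rowid type used by the reader

// Hypothetical helper: map the first `selected_size` selection indices to rowids.
//   block_rowids[i] is the rowid of the i-th row in the current batch.
//   sel_idx[j] is the batch position of the j-th row that survived filtering.
std::vector<rowid_t> gather_selected_rowids(const std::vector<rowid_t>& block_rowids,
                                            const std::vector<uint16_t>& sel_idx,
                                            size_t selected_size) {
    assert(selected_size <= sel_idx.size());
    std::vector<rowid_t> rowids(selected_size);
    for (size_t i = 0; i < selected_size; ++i) {
        assert(sel_idx[i] < block_rowids.size());
        rowids[i] = block_rowids[sel_idx[i]];
    }
    return rowids;
}
```

Whichever container is used, including its header directly in the translation unit avoids relying on a transitive include that may disappear later.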
