[improvement](be) Eliminate redundant MultiCast block copies#63580
Merged
zclllyybb merged 1 commit intoMay 25, 2026
Conversation
### What problem does this PR solve? Issue Number: N/A Related PR: apache#60386 Problem Summary: MultiCastDataStreamer already shares pulled blocks through the column copy-on-write contract. The previous pull path still cloned every column for each consumer through _copy_block(), which added unnecessary per-consumer allocation and copy work. The copy-on-write assertion changes and regression coverage from the original attempt are already present on current master, so this change keeps the pull completion accounting in _finish_pull() and returns the shared block directly. ### Release note None ### Check List (For Author) - Test: - Build: ./build.sh --be --fe -j90 - Regression test: ./run-regression-test.sh --run -d query_p0/cte -s test_cte_multicast_complex - Format: build-support/clang-format.sh and build-support/check-format.sh with clang-format v16 - Static analysis: build-support/run-clang-tidy.sh --base upstream/master --build-dir be/build_ASAN - Behavior changed: No - Does this need documentation: No
Contributor
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
Contributor
Author
|
run buildall |
Contributor
Author
|
/review |
Contributor
There was a problem hiding this comment.
Review result: no blocking findings found in this PR.
Critical checkpoint conclusions:
- Goal and proof: The change removes redundant per-consumer
Blockcolumn cloning inMultiCastDataStreamer::pull; existing COW paths and the referencedtest_cte_multicast_complexcoverage exercise shared multicast consumers with filters/projections. - Scope: The modification is small and focused on replacing
_copy_blockwith_finish_pullaccounting; no unrelated code paths are changed. - Concurrency: The streamer mutex still protects queue positions and finish accounting. Returned blocks hold
ColumnPtrreferences, so popping the queued block after all consumers pull does not invalidate data; downstream mutation paths use column COW for shared columns. - Lifecycle/static initialization: No new special lifecycle or static/global initialization behavior is introduced.
- Configuration: No configuration items are added.
- Compatibility: No storage format, protocol, function symbol, or serialization compatibility change is introduced.
- Parallel paths: The spill-read cached block path remains independent and already returns owned blocks from spill; the in-memory multicast path is the only path affected by the removed clone.
- Conditional checks: No new conditional behavior is added.
- Tests: The PR reports BE/FE build, targeted regression test
query_p0/cte/test_cte_multicast_complex, format, and clang-tidy. I did not rerun them in this review. - Test result correctness: No
.outor expected-result files are modified by this PR. - Observability: No new observability appears necessary for this internal performance improvement.
- Transactions/persistence/data writes: Not applicable; the change is in pipeline block forwarding only.
- FE/BE variable passing: Not applicable.
- Performance: The change removes obvious redundant per-consumer clone work while relying on existing COW semantics, matching the stated performance goal.
- Additional review focus: The focus file contained no additional user-provided review focus.
Residual risk: This review did not execute the build or regression test locally; it is based on code inspection and the PR-reported validation.
Contributor
TPC-H: Total hot run time: 31209 ms |
Contributor
TPC-DS: Total hot run time: 173217 ms |
Contributor
Author
|
run beut |
Mryange
approved these changes
May 25, 2026
Contributor
|
PR approved by at least one committer and no changes requested. |
Contributor
|
PR approved by anyone and no changes requested. |
Contributor
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
Contributor
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What problem does this PR solve?
Issue Number: N/A
Related PR: #60386
Problem Summary: MultiCastDataStreamer already shares pulled blocks through the column copy-on-write contract. The previous pull path still cloned every column for each consumer through _copy_block(), which added unnecessary per-consumer allocation and copy work. The copy-on-write assertion changes and regression coverage from the original attempt are already present on current master, so this change keeps the pull completion accounting in _finish_pull() and returns the shared block directly.
test performance with:
result:
Release note
None
Check List (For Author)