[fix](be) Preserve borrowed block data for http send#63258
Conversation
### What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: Prevent broadcast exchange HTTP sends from consuming column_values owned by shared broadcast blocks. Keep the default owned-request attachment path moving column_values out of the request, and add a restore mode for borrowed block storage. Document intentional protobuf set_allocated ownership patterns to avoid unsafe future rewrites.
### Release note
None
### Check List (For Author)
- Test: Unit Test
- ./run-be-ut.sh --run --filter=ProtoUtilTest.*
- Behavior changed: Yes (broadcast exchange HTTP sends preserve borrowed column_values for later channel sends)
- Does this need documentation: No
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
### What problem does this PR solve? Issue Number: None Related PR: None Problem Summary: Apply clang-format to the proto attachment ownership guard added for borrowed column values. ### Release note None ### Check List (For Author) - Test: No need to test (format-only change) - Behavior changed: No - Does this need documentation: No
|
run buildall |
|
/review |
There was a problem hiding this comment.
Summary: I found one blocking issue in the HTTP borrowed-block fix. The new restore path can mutate a shared broadcast PBlock concurrently across destination RPC instances, which can crash or race when multiple queued broadcast sends enter the HTTP helper at the same time.
Critical checkpoint conclusions:
- Goal/test: The PR targets preserving broadcast column_values for HTTP large-block sends and adds focused unit tests for owned vs borrowed helper behavior, but the tests do not cover the multi-destination concurrent broadcast lifecycle where the borrowed object is shared.
- Scope: The change is small and focused, but the helper-level restore flag hides an important ownership/concurrency precondition.
- Concurrency: Blocking issue found. BroadcastPBlockHolder is shared across remote channels/RpcInstances; different brpc callback threads can process the same holder concurrently after queued prior RPCs complete. The new release/restore mutates the shared protobuf without a common lock.
- Lifecycle: The scoped restore handles single-call lifetime, but not shared-holder concurrent lifetime.
- Configuration/compatibility: No new config or protocol/storage format incompatibility observed.
- Parallel paths: Non-broadcast and non-HTTP paths remain unchanged; the issue is specific to broadcast HTTP borrowed block handling.
- Tests: Unit tests cover helper behavior in isolation, but miss the concurrent broadcast exchange scenario.
- Observability: No new observability required for the intended fix.
- Transaction/persistence/data visibility: Not applicable to this PR.
- Performance: The intended no-copy behavior is preserved, but the fix must avoid shared mutation or serialize access.
- User focus: No additional user-provided review focus was specified.
| // column_values field so the serialized request stays small, then restore it before | ||
| // returning so the real owner can still be reused by later sends. | ||
| auto* column_values = block->release_column_values(); | ||
| DORIS_CHECK(column_values != nullptr); |
There was a problem hiding this comment.
This restore mode mutates the borrowed PBlock by calling release_column_values(), but broadcast blocks are shared across remote channels and can be processed concurrently by different RpcInstance callback threads. A concrete path is: an unpartitioned exchange enqueues the same BroadcastPBlockHolder to multiple remote channels while those destinations already have in-flight RPCs; when those prior RPCs complete around the same time, each callback calls _send_rpc() for its destination and reaches this HTTP helper for the same holder. The first thread temporarily removes column_values; the second thread races on the same protobuf and can see nullptr here (hitting DORIS_CHECK) or otherwise race with the restore. Please avoid mutating the shared PBlock during attachment embedding, or add synchronization that covers all users of the shared holder.
TPC-H: Total hot run time: 29606 ms |
TPC-DS: Total hot run time: 170079 ms |
### What problem does this PR solve? Issue Number: close #xxx Related PR: apache#63258 Problem Summary: Avoid release/restore mutation of a shared broadcast PBlock when embedding HTTP attachment data. The broadcast HTTP path now sends a lightweight request containing only block metadata and borrows column_values as read-only attachment data. ### Release note None ### Check List (For Author) - Test: Unit Test - ./run-be-ut.sh --run --filter=ProtoUtilTest.* - build-support/run-clang-tidy.sh --build-dir be/ut_build_ASAN (fails on existing be/src/core/types.h unmatched NOLINTEND and toolchain/system stddef.h diagnostic) - Behavior changed: No - Does this need documentation: No
|
/review |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
There was a problem hiding this comment.
Review completed. I did not find additional blocking issues beyond the already-known inline thread, and the current head no longer mutates the shared broadcast PBlock in the HTTP path.
Critical checkpoint conclusions:
- Goal/test: The change targets preserving borrowed broadcast column_values during HTTP attachment sends. The current code accomplishes that by sending copied request/block metadata and read-only attachment data; focused ProtoUtil unit coverage is present for owned vs borrowed attachment embedding.
- Scope: The implementation is limited to the exchange HTTP broadcast path plus helper refactoring and ownership comments.
- Concurrency: The previously raised release/restore mutation race is avoided on the updated path. The per-RpcInstance send state machine and mutex usage remain unchanged; no new shared mutable state was introduced.
- Lifecycle/ownership: AutoReleaseClosure owns the lightweight HTTP request for the RPC lifetime, while BroadcastPBlockHolder remains the owner of the shared PBlock. The scoped protobuf set_allocated ownership comments match the release calls in the same function.
- Configuration/compatibility: No new config or protocol field was added; copied PTransmitDataParams/PBlock fields cover the current proto schema.
- Parallel paths: Non-broadcast and non-HTTP paths keep the existing behavior. Multi-block broadcast behavior remains unchanged because the HTTP helper is only used when the single block field is present.
- Error handling: Status results from attachment embedding and HTTP client setup continue to be propagated.
- Performance/memory: The extra metadata copy happens only on the large HTTP broadcast path and avoids copying column_values; no new untracked large allocation was introduced beyond the existing attachment construction.
- Tests: I did not run tests in this review. The PR states ./run-be-ut.sh --run --filter=ProtoUtilTest.* was run.
- User focus: No additional user-provided review focus was present.
|
run buildall |
TPC-H: Total hot run time: 29615 ms |
TPC-DS: Total hot run time: 171837 ms |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: Broadcast exchange sends can borrow a shared
PBlockthroughset_allocated_block. When such a request takes the HTTP attachment path, the helper must removecolumn_valuesfrom the protobuf request before serialization, but it must not permanently consume data owned by the shared broadcast block. This PR adds an explicit restore mode for borrowed block storage while keeping the default owned-request path unchanged.It also documents intentional protobuf
set_allocated_*ownership patterns in exchange sink buffer and rowset meta code, and adds focused unit coverage for the attachment helper's owned vs borrowed behavior.Release note
None
Check List (For Author)
./run-be-ut.sh --run --filter=ProtoUtilTest.*column_valuesfor later channel sends)