[fix](group commit) support forward group commit stream load#63594
[fix](group commit) support forward group commit stream load#63594mymeiyi wants to merge 2 commits into
Conversation
There was a problem hiding this comment.
Pull request overview
This PR adds support for forwarding group-commit stream-load planning requests from the master FE to follower FEs (round-robin) to better utilize follower capacity, and introduces/adjusts related FE configuration plus a regression test for multi-FE follower mode.
Changes:
- Add master-side forwarding logic in
FrontendServiceImpl.streamLoadPut()for non-off_modegroup commit requests (guarded by a new config). - Introduce
enable_forward_group_commit_stream_loadFE config toggle. - Enable
enable_group_commit_streamload_be_forwardby default and add a regression test that exercises group-commit stream load against multiple FEs in follower mode.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| regression-test/suites/load_p0/stream_load/test_group_commit_stream_load_multi_follower.groovy | New docker regression test for group-commit stream load with 3 FEs in follower mode. |
| fe/fe-core/src/main/java/org/apache/doris/service/FrontendServiceImpl.java | Forward group-commit streamLoadPut requests from master to follower FE via Thrift client pool. |
| fe/fe-common/src/main/java/org/apache/doris/common/Config.java | Add FE forwarding config and change default for BE-forwarding config. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
/review |
There was a problem hiding this comment.
I found one additional blocking issue beyond the existing inline threads. The master now synchronously forwards group-commit streamLoadPut requests to a follower, but the follower's group-commit planning path calls back to the master to select the BE, so concurrent requests can occupy all master RPC workers waiting for followers while those followers are waiting for master RPC workers. Existing review threads already cover the config default change, table type/catch behavior, and test style issues, so I did not duplicate them.
Critical checkpoints: Goal: forwarding group-commit planning to followers is partially implemented and has a docker regression test, but the nested RPC path can fail under concurrency. Scope: change is small, but it introduces a cross-FE RPC cycle on a hot load path. Concurrency: relevant; master RPC worker threads can block waiting on follower RPCs while followers need master RPC capacity for getGroupCommitLoadBeId. Lifecycle/static init: no additional issue found. Config: new mutable master-only config is present; existing thread covers the separate default flip. Compatibility/persistence/data versioning: no incompatible storage/editlog change found. Parallel paths: follower-direct group-commit path exists and is affected by the new master-forward path. Tests: added test verifies success with a small number of serial loads, but does not cover the concurrent worker-exhaustion case. Observability: logs exist, no additional blocker. User focus: no additional user-provided focus points.
| try { | ||
| client = ClientPool.frontendPool.borrowObject(address); | ||
| TStreamLoadPutResult streamLoadPutResult = client.streamLoadPut(request); | ||
| ok = true; |
There was a problem hiding this comment.
This synchronous forward creates a nested FE RPC cycle for every forwarded group-commit load: the master RPC worker blocks here waiting for the follower, and the follower's StreamLoadHandler.generatePlan() calls GroupCommitManager.selectBackendForGroupCommit(), whose non-master branch calls back to the master via MasterOpExecutor.getGroupCommitLoadBeId(). With enough concurrent group-commit stream loads, all master RPC workers can be occupied waiting on followers while the followers are waiting for free master RPC workers, causing the loads to stall until RPC timeout. The previous follower-direct path only used one follower-to-master RPC and did not hold a master worker while waiting for that callback. Please avoid the master->follower->master cycle, or make the forwarded request carry the already-selected group-commit backend/table routing decision so the follower does not need to call back to the master.
There was a problem hiding this comment.
StreamLoadHandler.generatePlan() does not call GroupCommitManager.selectBackendForGroupCommit()
|
run buildall |
TPC-H: Total hot run time: 31418 ms |
TPC-DS: Total hot run time: 172430 ms |
FE Regression Coverage ReportIncrement line coverage |
No description provided.