[opt](group-commit) Skip createLocation in group commit stream load sink#63561
[opt](group-commit) Skip createLocation in group commit stream load sink#63561liaoxin01 wants to merge 1 commit into
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
/review |
|
run buildall |
There was a problem hiding this comment.
Pull request overview
Skips the expensive createLocation tablet/replica enumeration for the group-commit stream load sink, which under cloud mode contends on CloudSystemInfoService's RW lock and dominates FE CPU on wide-partition tables. BE's GroupCommitBlockSinkOperatorX::init does not consume location/slave_location, so empty placeholders satisfy the required thrift field at O(1) cost.
Changes:
- Add a
protected initLocationParams(TOlapTableSink)hook onOlapTableSink(default delegates tocreateLocation) and route bothinitoverloads through it. - Override
initLocationParamsinGroupCommitBlockSinkto return two empty placeholderTOlapTableLocationParamobjects. - Add
GroupCommitBlockSinkTestcovering the override andparseGroupCommitparsing.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| fe/fe-core/src/main/java/org/apache/doris/planner/OlapTableSink.java | Introduces initLocationParams hook and routes both init overloads through it. |
| fe/fe-core/src/main/java/org/apache/doris/planner/GroupCommitBlockSink.java | Overrides the hook to return empty placeholder location params, skipping replica enumeration. |
| fe/fe-core/src/test/java/org/apache/doris/planner/GroupCommitBlockSinkTest.java | New unit tests for the override behavior and parseGroupCommit. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Review result: no blocking issues found.
Critical checkpoint conclusions:
- Goal/test coverage: The change avoids FE tablet replica location enumeration for GroupCommitBlockSink while preserving default OlapTableSink behavior. The added unit test covers the override contract and group_commit parsing. I could not run it locally because thirdparty/installed/bin/protoc is absent in this runner, and FE build instructions require stopping before Maven when protoc is missing.
- Scope/focus: The implementation is small and focused: a protected hook plus a GroupCommitBlockSink override.
- Concurrency/locking: The changed FE code does not introduce new shared mutable state or locks. It removes the high-contention createLocation path from the per-request group commit block sink initialization.
- Lifecycle/static initialization: No new lifecycle-sensitive objects or static initialization dependencies were added.
- Configuration/compatibility: No config items or storage/protocol-incompatible fields were added. Required thrift location fields are still populated with non-null placeholders.
- Parallel code paths: Both relevant OlapTableSink init overloads now route through the hook; non-group-commit and RemoteOlapTableSink paths retain createLocation via the default implementation.
- Error handling/data correctness: BE GroupCommitBlockSinkOperatorX::init only consumes schema, partition, ids, group_commit_mode, load_id, and max_filter_ratio; the real internal group-commit insert plan still obtains the normal OLAP sink planning path. I did not find a data visibility or transaction correctness regression from the skipped placeholder locations.
- Observability/performance: The change addresses the intended hot path without adding noisy logs. Existing group commit BE logging remains available.
User focus: No additional user-provided review focus was supplied.
4372498 to
cd29627
Compare
|
run buildall |
The BE-side GroupCommitBlockSinkOperatorX::init does not consume TOlapTableSink.location or slave_location (it only reads tuple_id, schema, db_id, table_id, partition, group_commit_mode, load_id and max_filter_ratio). However, FE still ran createLocation, which iterates O(partitions * indexes * tablets * replicas) and, for every replica, takes the CloudSystemInfoService RW read lock via CloudReplica.getCurrentClusterId. Under high-concurrency group commit stream load on wide-partition tables (3000+ partitions in one production incident), CAS contention on the RW lock's state cache line saturated all FE CPUs and the cluster could not recover even after scaling. Introduce an initLocationParams hook on OlapTableSink so subclasses can override how location params are populated. Both init(...) overloads now route through this hook. GroupCommitBlockSink overrides it to return empty placeholder params (location is a required thrift field, but its contents are unused on BE for the group commit path). Add GroupCommitBlockSinkTest to lock in the contract.
cd29627 to
683edaa
Compare
|
run buildall |
TPC-H: Total hot run time: 31551 ms |
TPC-H: Total hot run time: 31678 ms |
|
PR approved by at least one committer and no changes requested. |
|
PR approved by anyone and no changes requested. |
TPC-DS: Total hot run time: 173573 ms |
FE Regression Coverage ReportIncrement line coverage |
|
/review |
There was a problem hiding this comment.
Review result: no blocking issues found.
Critical checkpoint conclusions:
- Goal and proof: The PR aims to avoid expensive tablet replica location enumeration for group-commit stream-load sinks. The change accomplishes this by routing
OlapTableSinklocation initialization through an overridable hook and returning empty required thrift placeholders forGroupCommitBlockSink. The added unit test verifies the override returns empty placeholders without touching the table/tuple mocks. - Scope and clarity: The change is small and focused on the group-commit sink path; default
OlapTableSinkbehavior remains unchanged. - Concurrency: No new shared mutable state or locking is introduced. The optimization removes work that previously took backend-location related locks during
createLocation. - Lifecycle/static initialization: No special lifecycle or static initialization concerns found.
- Configuration/compatibility: No new configuration or storage/protocol-incompatible field is introduced. The required thrift
locationfield is still set; only its contents are empty on the group-commit sink path that BE does not consume. - Parallel code paths: Both modified
OlapTableSink.init(...)overloads now use the hook, so the normal insert and stream-load initialization paths remain consistent. - Test coverage:
GroupCommitBlockSinkTestcovers the new hook behavior and parse helper. I did not run tests in this review environment. - Observability: No additional observability seems necessary for this local FE planning CPU optimization.
- Transactions/data writes: The PR does not change transaction commit/publish semantics or BE load queue behavior; schema, partition, db/table ids, load id, group commit mode, and max filter ratio are still populated.
- Performance: The intended O(tablets * replicas) location enumeration is skipped only for the BE group-commit sink path where code inspection confirms it is unused.
User focus: no additional user-provided review focus was supplied.
FE Regression Coverage ReportIncrement line coverage |
Summary
The BE-side
GroupCommitBlockSinkOperatorX::initdoes not consumeTOlapTableSink.locationorslave_location(it only readstuple_id/schema/db_id/table_id/partition/group_commit_mode/load_id/max_filter_ratio). However, FE still runscreateLocation, which iteratesO(partitions * indexes * tablets * replicas)and, for every replica, takes theCloudSystemInfoServiceRW read lock viaCloudReplica.getCurrentClusterId.Under high-concurrency group commit stream load on wide-partition tables (3000+ partitions in a real production incident), CAS contention on the RW lock's
statecache line saturated all FE CPUs, and the cluster could not recover even after scaling out (more cores = more CAS contenders = worse contention).Change
protected initLocationParams(TOlapTableSink)hook onOlapTableSink. Default behavior delegates tocreateLocation, so non-group-commit sinks are unaffected.init(...)overloads inOlapTableSinkthrough the hook.GroupCommitBlockSinkoverrides the hook to return empty placeholderTOlapTableLocationParamobjects.TOlapTableSink.locationis a required thrift field, so we still set non-null placeholders, but no tablet/replica enumeration happens.Effect on the group-commit path:
O(partitions * indexes * tablets * replicas)→O(1)CloudSystemInfoServiceRW lock acquisitions: hundreds of concurrent CAS spinners → 0Test plan
GroupCommitBlockSinkTestcovering:initLocationParamsreturns 2 placeholders with empty tablet lists (verifies the override is what runs, notcreateLocation).parseGroupCommitparsesasync_mode/sync_mode/off_mode(case-insensitive) and returns null for unknown values.group_commit=truestill pass.CloudSystemInfoServicelock contention.