branch-4.1: [opt](group-commit) Skip createLocation in group commit stream load sink #63561 #63685

Open · github-actions[bot] wants to merge 1 commit into branch-4.1 from auto-pick-63561-branch-4.1

@github-actions
Contributor

Cherry-picked from #63561

…ink (#63561)

## Summary

The BE-side `GroupCommitBlockSinkOperatorX::init` does **not** consume
`TOlapTableSink.location` or `slave_location`; it only reads `tuple_id`,
`schema`, `db_id`, `table_id`, `partition`, `group_commit_mode`,
`load_id`, and `max_filter_ratio`. FE nevertheless still runs
`createLocation`, which iterates over
`O(partitions * indexes * tablets * replicas)` entries and, for every
replica, acquires the read side of the `CloudSystemInfoService` RW lock
via `CloudReplica.getCurrentClusterId`.
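To make the cost shape concrete, here is a minimal, hypothetical model of that walk. `SystemInfoServiceSketch` and `CreateLocationCostSketch` are illustrative stand-ins, not Doris classes; the only assumption carried over from the description is that each replica visit takes the read side of one shared `ReentrantReadWriteLock`.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for CloudSystemInfoService: one RW lock shared by every caller.
class SystemInfoServiceSketch {
    private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();
    final AtomicLong readLockAcquisitions = new AtomicLong();

    // Stand-in for the lookup reached via CloudReplica.getCurrentClusterId.
    // Taking the read lock updates the lock's shared state word with a CAS, so
    // concurrent readers contend on the same cache line even with no writer present.
    String currentClusterId() {
        rwLock.readLock().lock();
        try {
            readLockAcquisitions.incrementAndGet();
            return "cluster-0"; // hypothetical cluster id
        } finally {
            rwLock.readLock().unlock();
        }
    }
}

class CreateLocationCostSketch {
    // Models the shape of the createLocation walk: one cluster-id lookup per replica,
    // i.e. partitions * indexes * tablets * replicas lock acquisitions per request.
    static long walk(SystemInfoServiceSketch sys, int partitions, int indexes,
                     int tabletsPerIndex, int replicasPerTablet) {
        long replicasVisited = 0;
        for (int p = 0; p < partitions; p++) {
            for (int i = 0; i < indexes; i++) {
                for (int t = 0; t < tabletsPerIndex; t++) {
                    for (int r = 0; r < replicasPerTablet; r++) {
                        sys.currentClusterId();
                        replicasVisited++;
                    }
                }
            }
        }
        return replicasVisited;
    }
}
```

For example, `CreateLocationCostSketch.walk(sys, 3000, 1, 2, 3)` (3000 partitions as in the incident; one index, two tablets, and three replicas are made-up values) performs 18,000 read-lock acquisitions for a single request.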

Under highly concurrent group commit stream loads on wide-partition tables
(3000+ partitions in a real production incident), every one of those
read-lock acquisitions performs a CAS on the same RW lock `state` word.
Contention on that cache line saturated all FE CPUs, and the cluster
could not recover even after scaling out (more cores = more CAS
contenders = worse contention).

## Change

- Introduce a `protected initLocationParams(TOlapTableSink)` hook on
`OlapTableSink`. Default behavior delegates to `createLocation`, so
non-group-commit sinks are unaffected.
- Route both `init(...)` overloads in `OlapTableSink` through the hook.
- `GroupCommitBlockSink` overrides the hook to return empty placeholder
`TOlapTableLocationParam` objects. `TOlapTableSink.location` is a
required thrift field, so we still set non-null placeholders, but no
tablet/replica enumeration happens.
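The hook is a template-method override. The sketch below is a simplified, hypothetical model of it: `OlapTableSinkSketch`, `GroupCommitBlockSinkSketch`, `SinkParamsSketch`, and `LocationParamSketch` are stand-ins, not the real FE or thrift classes. It also assumes the hook fills the location fields on the sink object passed in, which may differ from the exact signature in the patch. Only one `init` overload is shown; in the change itself both overloads go through the hook.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for TOlapTableLocationParam: a list of (tabletId, replicaIndex) pairs.
class LocationParamSketch {
    final List<long[]> entries = new ArrayList<>();
}

// Hypothetical stand-in for TOlapTableSink; location is a required thrift field, so non-null.
class SinkParamsSketch {
    LocationParamSketch location;
    LocationParamSketch slaveLocation;
}

class OlapTableSinkSketch {
    static final int PARTITIONS = 4, TABLETS = 8, REPLICAS = 3; // illustrative sizes
    int replicaVisits = 0; // counts enumeration work done by createLocation

    // Hook: the default behavior delegates to the full enumeration.
    protected void initLocationParams(SinkParamsSketch sink) {
        sink.location = createLocation();
        sink.slaveLocation = new LocationParamSketch();
    }

    // Simplified createLocation: visits every (partition, tablet, replica) combination.
    LocationParamSketch createLocation() {
        LocationParamSketch loc = new LocationParamSketch();
        for (int p = 0; p < PARTITIONS; p++) {
            for (int t = 0; t < TABLETS; t++) {
                for (int r = 0; r < REPLICAS; r++) {
                    replicaVisits++;
                    loc.entries.add(new long[] {(long) p * TABLETS + t, r});
                }
            }
        }
        return loc;
    }

    // init routes through the hook instead of calling createLocation directly.
    SinkParamsSketch init() {
        SinkParamsSketch sink = new SinkParamsSketch();
        initLocationParams(sink);
        return sink;
    }
}

class GroupCommitBlockSinkSketch extends OlapTableSinkSketch {
    // Override: non-null placeholders, no tablet/replica enumeration at all.
    @Override
    protected void initLocationParams(SinkParamsSketch sink) {
        sink.location = new LocationParamSketch();
        sink.slaveLocation = new LocationParamSketch();
    }
}
```

Keeping the override in the subclass means the regular sink's code path is unchanged, while the group-commit sink still satisfies the required-field constraint with empty objects.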

Effect on the group-commit path:
- Per-request FE CPU: `O(partitions * indexes * tablets * replicas)` → `O(1)`
- `CloudSystemInfoService` RW lock acquisitions: one per replica per
request (hundreds of concurrent CAS spinners under load) → 0
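As a worked example of the per-request difference, take the 3000 partitions from the incident and assume, purely for illustration, one index, 16 tablets per partition, and 3 replicas per tablet:

$$
\underbrace{3000}_{\text{partitions}} \times \underbrace{1}_{\text{indexes}} \times \underbrace{16}_{\text{tablets}} \times \underbrace{3}_{\text{replicas}} = 144{,}000
$$

That is 144,000 read-lock acquisitions per request before the change and 0 after, multiplied again by the number of concurrent stream loads.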
github-actions[bot] requested a review from yiguolei as a code owner on May 26, 2026 10:39
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (ideally including the specific error message), and how it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@hello-stephen
Contributor

run buildall
