branch-4.1: pick #61318, #60543, #60705 #61874

Merged
yiguolei merged 3 commits into apache:branch-4.1 from mymeiyi:branch-4.1-pick-60543 on Mar 31, 2026

@mymeiyi
Contributor

@mymeiyi mymeiyi commented Mar 30, 2026

pick:

  1. [fix](cloud) modify CloudTabletRebalancer and CloudTabletStatMgr to reduce memory (#61318)
  2. [improve](cloud) cloud reduce get_tablet_stats rpc to meta_service (#60543)
  3. [fix](cloud) checkpoint save cloud tablet stats to image (#60705)

mymeiyi added 3 commits March 30, 2026 15:33
…educe memory (apache#61318)

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Reduce FE memory by:
1. moving top-N table stats filtering from PrometheusMetricVisitor into
CloudTabletStatMgr, so it is computed once per stat cycle instead of once
per Prometheus scrape;
2. removing the unused beToTablets field from InfightTask, to avoid
retaining a reference to a large map;
3. changing InfightTablet.tabletId from Long to long, to avoid boxing
overhead.
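
The first and third items can be illustrated with a minimal sketch. It assumes a per-table data-size map as input; `TableStat`, `TopNTableStatCache`, and their methods are hypothetical names for illustration, not the classes touched by this PR. The stat cycle sorts once and publishes an immutable snapshot, so each Prometheus scrape only reads a reference, and the entry type keeps its ids as primitive `long` fields rather than boxed `Long`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical per-table stat entry (not a Doris class).
// Primitive long fields avoid one boxed Long object per field per entry.
final class TableStat {
    final long tableId;
    final long dataSize;

    TableStat(long tableId, long dataSize) {
        this.tableId = tableId;
        this.dataSize = dataSize;
    }
}

// Hypothetical cache: the stat cycle writes, metric scrapes only read.
final class TopNTableStatCache {
    private final int topN;
    // Immutable snapshot, replaced as a whole at the end of each stat cycle.
    private volatile List<TableStat> topTables = Collections.emptyList();

    TopNTableStatCache(int topN) {
        this.topN = topN;
    }

    // Called once per stat cycle with the freshly aggregated per-table sizes.
    void refresh(Map<Long, Long> tableIdToDataSize) {
        List<TableStat> all = new ArrayList<>(tableIdToDataSize.size());
        for (Map.Entry<Long, Long> e : tableIdToDataSize.entrySet()) {
            all.add(new TableStat(e.getKey(), e.getValue()));
        }
        // Largest tables first.
        all.sort(Comparator.comparingLong((TableStat s) -> s.dataSize).reversed());
        int n = Math.min(topN, all.size());
        topTables = Collections.unmodifiableList(new ArrayList<>(all.subList(0, n)));
    }

    // Called on every scrape: returns the published snapshot without sorting.
    List<TableStat> topTables() {
        return topTables;
    }
}
```

With this shape, the cost of sorting is paid per stat cycle regardless of how often the metrics endpoint is scraped.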

None

- Test <!-- At least one of them must be included. -->
    - [ ] Regression test
    - [ ] Unit Test
    - [ ] Manual test (add detailed scripts or steps below)
    - [ ] No need to test or manual test. Explain why:
        - [ ] This is a refactor/code format and no logic has been changed.
        - [ ] Previous test can cover this change.
        - [ ] No code files have been changed.
        - [ ] Other reason <!-- Add your reason?  -->

- Behavior changed:
    - [ ] No.
    - [ ] Yes. <!-- Explain the behavior change -->

- Does this need documentation?
    - [ ] No.
    - [ ] Yes. <!-- Add document PR link here. eg: apache/doris-website#1214 -->

- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label <!-- Add branch pick label that this PR should merge into -->
Copilot AI review requested due to automatic review settings March 30, 2026 07:45
@mymeiyi mymeiyi requested a review from yiguolei as a code owner March 30, 2026 07:45
@Thearas
Contributor

Thearas commented Mar 30, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@mymeiyi
Contributor Author

mymeiyi commented Mar 30, 2026

run buildall

Copilot AI left a comment

Pull request overview

Backports three cloud-mode improvements to the 4.1 branch to reduce FE memory usage, reduce meta-service get_tablet_stats RPC volume, and ensure cloud tablet stats are preserved/updated via checkpoints and FE-to-FE sync.

Changes:

  • Add tablet-id propagation on commit/compaction notifications and use it to mark tablets “active” for faster stats refresh.
  • Rework CloudTabletStatMgr to compute and filter the top-N table stats for Prometheus once per stats cycle, add an interval-ladder polling strategy, and push active tablet stats from master FE to other FEs.
  • Enhance checkpointing to optionally trigger in cloud mode based on image staleness and to copy serving-env tablet stats into the checkpoint image.
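
The overview names an interval-ladder polling strategy and "active" tablets without giving details. The following is only a sketch of one plausible reading: each tablet tracks a rung on a ladder of poll intervals, climbs one rung whenever a fetch finds its stats unchanged, and drops back to the shortest interval when it is marked active. The class name `IntervalLadder`, its methods, and the ladder values are assumptions for illustration, not taken from the PR.

```java
// Hypothetical per-tablet polling state (not the Doris implementation).
final class IntervalLadder {
    // Illustrative rungs in seconds; not Doris defaults.
    private static final long[] LADDER_SECONDS = {60, 300, 1800, 7200};

    private int rung = 0;
    private long lastFetchSeconds;

    IntervalLadder(long nowSeconds) {
        this.lastFetchSeconds = nowSeconds;
    }

    // A commit, compaction, or schema change touched the tablet:
    // return to the shortest interval so its stats refresh soon.
    void markActive() {
        rung = 0;
    }

    // True when the current rung's interval has elapsed since the last fetch.
    boolean isDue(long nowSeconds) {
        return nowSeconds - lastFetchSeconds >= LADDER_SECONDS[rung];
    }

    // Record a fetch: unchanged stats climb one rung (capped at the top),
    // changed stats reset to the shortest interval.
    void onFetched(long nowSeconds, boolean statsChanged) {
        lastFetchSeconds = nowSeconds;
        if (statsChanged) {
            rung = 0;
        } else {
            rung = Math.min(rung + 1, LADDER_SECONDS.length - 1);
        }
    }

    long currentIntervalSeconds() {
        return LADDER_SECONDS[rung];
    }
}
```

Under this reading, idle tablets settle on the longest interval and account for few get_tablet_stats RPCs, while recently written tablets are polled at the shortest one.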

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| gensrc/thrift/FrontendService.thrift | Adds tabletIds to commit report requests and introduces FE-to-FE syncCloudTabletStats RPC. |
| fe/fe-core/src/main/java/org/apache/doris/transaction/GlobalTransactionMgrIface.java | Extends commit callback to accept tabletIds for downstream stats refresh. |
| fe/fe-core/src/main/java/org/apache/doris/transaction/GlobalTransactionMgr.java | Updates interface implementation signature (no-op in non-cloud mgr). |
| fe/fe-core/src/main/java/org/apache/doris/service/FrontendServiceImpl.java | Handles tabletIds on commit/compaction report; implements syncCloudTabletStats. |
| fe/fe-core/src/main/java/org/apache/doris/qe/SessionVariable.java | Adds session var to force tablet-stats sync via proc-tablets path. |
| fe/fe-core/src/main/java/org/apache/doris/persist/Storage.java | Tracks latest image mtime to support “stale image” checkpoint triggering. |
| fe/fe-core/src/main/java/org/apache/doris/metric/PrometheusMetricVisitor.java | Uses CloudTabletStatMgr precomputed totals/top-N instead of recomputing per scrape. |
| fe/fe-core/src/main/java/org/apache/doris/master/Checkpoint.java | Adds stale-image-based checkpoint trigger (cloud) + copies serving-env tablet stats into checkpoint image. |
| fe/fe-core/src/main/java/org/apache/doris/common/proc/TabletsProcDir.java | Optional “force sync tablet stats” behavior gated by session variable (cloud). |
| fe/fe-core/src/main/java/org/apache/doris/common/ClientPool.java | Adds a dedicated FE client pool for tablet-stats sync RPC. |
| fe/fe-core/src/main/java/org/apache/doris/cloud/transaction/CloudGlobalTransactionMgr.java | Threads tabletIds through commit path and marks tablets active post-commit. |
| fe/fe-core/src/main/java/org/apache/doris/cloud/catalog/CloudTabletRebalancer.java | Reduces memory retention by removing unused map and avoiding boxing for tabletId. |
| fe/fe-core/src/main/java/org/apache/doris/cloud/catalog/CloudReplica.java | Adds persisted fields for interval-ladder bookkeeping; shortens persisted keys. |
| fe/fe-core/src/main/java/org/apache/doris/catalog/CloudTabletStatMgr.java | Implements active/interval-ladder fetching, precomputed top-N filtering, and master-to-follower stats push. |
| fe/fe-core/src/main/java/org/apache/doris/alter/CloudSchemaChangeJobV2.java | Marks related tablets active after schema change completion. |
| fe/fe-core/src/main/java/org/apache/doris/alter/CloudRollupJobV2.java | Marks related tablets active after rollup completion. |
| fe/fe-common/src/main/java/org/apache/doris/common/Config.java | Adds cloud checkpoint staleness threshold + tablet stats sync/version configs. |
| be/src/cloud/cloud_meta_mgr.cpp | Sends tabletIds alongside commit/compaction notifications to FE. |
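
The Storage.java and Checkpoint.java entries describe a checkpoint trigger based on how old the latest image is. A minimal sketch of such a check follows, assuming the decision reduces to comparing the image file's modification time against a configured threshold. `StaleImageTrigger`, its method names, and the "non-positive threshold disables the trigger" convention are hypothetical and not taken from the PR; only `Files.getLastModifiedTime` is a real JDK call.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper (not the Doris Checkpoint or Storage code).
final class StaleImageTrigger {
    private StaleImageTrigger() {
    }

    // Reads the image file's last-modified time in epoch milliseconds.
    static long imageMtimeMillis(Path imageFile) throws IOException {
        return Files.getLastModifiedTime(imageFile).toMillis();
    }

    // True when the image is at least thresholdMillis old.
    // Assumption: a non-positive threshold disables the trigger.
    static boolean isImageStale(long imageMtimeMillis, long nowMillis, long thresholdMillis) {
        if (thresholdMillis <= 0) {
            return false;
        }
        return nowMillis - imageMtimeMillis >= thresholdMillis;
    }
}
```

A caller would combine this with its existing journal-based condition, for example `StaleImageTrigger.isImageStale(mtime, System.currentTimeMillis(), thresholdMillis)`.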

@hello-stephen
Contributor

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
| --- | --- |
| Function Coverage | 78.44% (1786/2277) |
| Line Coverage | 64.23% (32023/49856) |
| Region Coverage | 65.08% (16021/24619) |
| Branch Coverage | 55.59% (8530/15344) |

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 10.66% (37/347) 🎉
Increment coverage report
Complete coverage report

@doris-robot

BE UT Coverage Report

Increment line coverage 0.00% (0/17) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
| --- | --- |
| Function Coverage | 52.85% (19677/37235) |
| Line Coverage | 36.24% (184220/508331) |
| Region Coverage | 32.54% (142555/438085) |
| Branch Coverage | 33.72% (62634/185744) |

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 58.82% (10/17) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
| --- | --- |
| Function Coverage | 72.73% (26510/36449) |
| Line Coverage | 56.18% (284625/506627) |
| Region Coverage | 53.71% (237431/442053) |
| Branch Coverage | 55.43% (103251/186272) |

@yiguolei yiguolei merged commit 7a331a4 into apache:branch-4.1 Mar 31, 2026
28 of 33 checks passed