[improvement](executor) use real elapsed time to compute workload group metrics refresh interval#63535

Closed
bosswnx wants to merge 2 commits into apache:master from bosswnx:master
bosswnx commented May 22, 2026

What problem does this PR solve?

Problem Summary:

The original implementation of WorkloadGroupMetrics::refresh_metrics() uses config::workload_group_metrics_interval_ms / 1000 as a fixed
divisor to compute per-second CPU and scan IO rates. This is inaccurate when:

  1. The refresh thread is delayed due to system load or scheduling jitter
  2. The configured interval is changed at runtime

In both cases, the reported per-second CPU/IO rates diverge from reality.
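
To make the size of the error concrete, here is a minimal sketch comparing the two divisors. The function names and the numbers are hypothetical and chosen only for illustration; they are not taken from the Doris code base or its default configuration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: per-second rate using the configured interval as a
// fixed divisor, mirroring the `interval_ms / 1000` pattern described above.
// The integer division means an interval below 1000 ms would yield a zero
// divisor, so this sketch requires at least one second.
inline uint64_t rate_with_fixed_interval(uint64_t delta, int64_t configured_interval_ms) {
    assert(configured_interval_ms >= 1000);
    return delta / static_cast<uint64_t>(configured_interval_ms / 1000);
}

// Hypothetical helper: per-second rate using the time that really elapsed
// between two refreshes, in milliseconds.
inline uint64_t rate_with_elapsed_time(uint64_t delta, int64_t elapsed_ms) {
    assert(elapsed_ms > 0);
    return delta * 1000 / static_cast<uint64_t>(elapsed_ms);
}
```

For example, if the configured interval is 5000 ms but the refresh thread actually wakes up after 7500 ms, a counter delta of 15000 is reported as 3000 per second by the fixed divisor, while the true rate is 2000 per second.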

This PR replaces the fixed config-based interval with the actual monotonic time delta between two consecutive refreshes, so the rates stay
accurate regardless of thread scheduling delays or runtime config changes. It also adds a division-by-zero guard for sub-second refresh
intervals and corresponding unit tests.
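
The approach can be sketched roughly as follows. This is an illustrative outline only, not the code in this PR: `RateTracker` and its members are hypothetical names, and the real `WorkloadGroupMetrics` class may structure this differently. The clock reading is passed in as a parameter so the logic can be exercised deterministically in a unit test.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical sketch: derives a per-second rate from a cumulative counter
// using the monotonic time that elapsed between two consecutive refreshes.
class RateTracker {
public:
    using Clock = std::chrono::steady_clock;

    // `now` is a monotonic timestamp, `total` is the cumulative counter value.
    // Returns the per-second rate since the previous call, or 0 on the first
    // call and whenever no measurable time has elapsed.
    uint64_t refresh(Clock::time_point now, uint64_t total) {
        uint64_t rate = 0;
        if (_has_last) {
            // A monotonic clock never goes backwards.
            assert(now >= _last_time);
            const double seconds = std::chrono::duration<double>(now - _last_time).count();
            // Treat a counter that moved backwards as "no progress".
            const uint64_t delta = total >= _last_total ? total - _last_total : 0;
            if (seconds > 0.0) { // guard against division by zero
                rate = static_cast<uint64_t>(static_cast<double>(delta) / seconds);
            }
        }
        _last_time = now;
        _last_total = total;
        _has_last = true;
        return rate;
    }

private:
    Clock::time_point _last_time {};
    uint64_t _last_total = 0;
    bool _has_last = false;
};
```

In a refresh thread this would be called as `tracker.refresh(RateTracker::Clock::now(), counter_value)`. Using `std::chrono::steady_clock` rather than the wall clock keeps the delta unaffected by system time adjustments.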

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

…up metrics refresh interval

  Replace the fixed config-based interval with the actual monotonic time delta
  between two refreshes when calculating per-second CPU and scan IO rates in
  WorkloadGroupMetrics, so the rates stay accurate even when the refresh thread
  is delayed or the configured interval is changed at runtime.

  Also add a guard against division by zero when two refreshes happen within
  less than one second, and add unit tests covering:
  - Real elapsed time rate computation
  - Sub-second interval safety (no division by zero)
  - Proportional rate vs interval relationship
  - Memory metrics correctness
  - First-refresh boundary behavior
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@bosswnx
Author

bosswnx commented May 22, 2026

/review
