Skip to content

[bvar] fix sampler interval after switch to cpuwide_time_ns#3278

Merged
chenBright merged 1 commit intoapache:masterfrom
hjwsm1989:fix-bvar-sampler-time-unit
Apr 23, 2026
Merged

[bvar] fix sampler interval after switch to cpuwide_time_ns#3278
chenBright merged 1 commit intoapache:masterfrom
hjwsm1989:fix-bvar-sampler-time-unit

Conversation

@hjwsm1989
Copy link
Copy Markdown
Contributor

What problem does this PR solve?

Fixes the unit-mismatch regression introduced by #3268 (commit 12fb539, "Use monotonic time instead of wall time").

Before this PR, on master / 1.17.0-rc1, the bvar sampler thread effectively runs at ~1 kHz instead of 1 Hz and spams

sampler.cpp:192 bvar is busy at sampling for 2 seconds!

PR #3268 switched the three time-source calls in SamplerCollector::run() from gettimeofday_us() (microseconds) to cpuwide_time_ns() (nanoseconds), but left the 1-second offset constant (abstime += 1000000L) and the usleep(abstime - now) call untouched. As a result:

  • abstime += 1000000L now adds 1 ms instead of 1 s, so the sampler spins at ~1 kHz instead of 1 Hz;
  • usleep(abstime - now) is passed a nanosecond delta that usleep() interprets as microseconds, which further distorts the sleep duration.

See #3277 for the full analysis and a reproducer.

What is changed and how does it work?

src/bvar/detail/sampler.cpp::SamplerCollector::run():

  • abstime += 1000000L (1 ms in nanoseconds) → abstime += 1000000000L (1 s in nanoseconds), restoring the intended 1-second sample period.
  • ::usleep(abstime - now)usleep() takes microseconds, but the difference is now in nanoseconds; divide by 1000.

Checklist

  • Fix compiles and links (verified against an application tree linking brpc master).
  • Manually verified on a live cluster: "bvar is busy at sampling" warning count dropped from ~17k/minute per process to 0; sampler thread CPU dropped accordingly.
  • No dedicated unit test added — the existing SamplerCollector::run() loop is not currently covered by a direct unit test. Happy to add one if reviewers can suggest a good shape (the challenge is driving the loop deterministically without sleeping wall-clock time).

Fixes #3277.

@hjwsm1989
Copy link
Copy Markdown
Contributor Author

hjwsm1989 commented Apr 22, 2026

Hi @chenBright, when you have a moment, would you mind taking a look at this PR? It tweaks sampler.cpp to line up the 1-second offset and the usleep() argument with the nanosecond time source after the nice monotonic-time migration in #3268 — full analysis in #3277. Since you have the most context on that area, your feedback would be very much appreciated. Thanks!

@chenBright
Copy link
Copy Markdown
Contributor

I made a mistake before.

Please use cpuwide_time_us in sampler.cpp to fix this issue.

@hjwsm1989
Copy link
Copy Markdown
Contributor Author

I made a mistake before.

Please use cpuwide_time_us in sampler.cpp to fix this issue.

ok, doing

Commit 12fb539 ("Use monotonic time instead of wall time", apache#3268)
switched the three time-source calls in SamplerCollector::run() from
gettimeofday_us() to cpuwide_time_ns(), but the surrounding code still
treats the timestamps as microseconds:

- abstime += 1000000L now represents 1 ms (not 1 s), causing the
  sampler to spin at ~1 kHz instead of 1 Hz;
- usleep(abstime - now) receives a nanosecond delta, which usleep()
  interprets as microseconds.

Use cpuwide_time_us() instead, which preserves the monotonic behavior
from apache#3268 while keeping the existing microsecond-based arithmetic
correct.

Fixes apache#3277.
@hjwsm1989 hjwsm1989 force-pushed the fix-bvar-sampler-time-unit branch from 9e6fdd0 to 95ca976 Compare April 23, 2026 02:24
Copy link
Copy Markdown
Contributor

@chenBright chenBright left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@chenBright
Copy link
Copy Markdown
Contributor

cc @ivanallen

@chenBright chenBright merged commit b5dc6fc into apache:master Apr 23, 2026
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bvar: sampler busy-loops at ~1 kHz after switch to cpuwide_time_ns (regression from #3268)

2 participants