release-26.2: mmaprototype: dynamically verbose logs for overloaded stores by wenyihu6 · Pull Request #169344 · cockroachdb/cockroach

wenyihu6 · 2026-04-29T15:09:17Z

Backport 12/12 commits from #165719 on behalf of @wenyihu6.

Resolves: #162254
Epic: CRDB-55052
Release note: none

This PR makes MMA's rebalancing logs useful in production without
relying on operators flipping vmodule.

What changes for the operator:

A periodic store-level summary (cluster means, sorted shedding
stores, per-pass results) is emitted at Infof once per
rebalance pass, instead of being hidden behind V(2).
When a store has been continuously overloaded longer than 30
minutes and hasn't had a detailed burst recently, the
per-shedding-store log path (top-K ranges, candidate selection,
failure reasons) is promoted to Infof for that store on that
iteration. A burst-suppression timer prevents spamming.
Per-range and per-candidate logging stays gated by V(3), so
enabling vmodule on a single MMA file shows the detailed
decision trail without drowning the rest of the log.
The bridge that promotes verbose logs to Infof checks
vmodule at the call site (not at construction), so
--vmodule=load=3 enables verbose output for load.go's
emissions even when the logger was constructed elsewhere.

No behavior change for clusters that don't hit persistent
overload; the detailed-burst path is dormant.

Release justification: low risk obs improvements for opt-in functionality.

This change routes all `VEventf` calls in the MMA rebalancing code through an `mmaLogf` function type so that a follow-up commit can selectively promote calls to `Infof` for persistently overloaded stores. `re.logf()` on `rebalanceEnv` delegates to `VEventf`. Functions that called `VEventf` directly now receive a `logf mmaLogf` parameter instead. `computeLoadSummary` and `loadSummaryForDimension` accept a `detailedLog bool` that replaces the `log.ExpensiveLogEnabled(ctx, 3)` guard that previously gated per-dimension log emission inside `loadSummaryForDimension`. All callers pass `log.ExpensiveLogEnabled(ctx, 3)`, preserving the existing behavior. No behavior change. Release note: None Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>

This change teaches `re.logf()` to emit at `Infof` level when `re.detailedLog` is true. The flag is set per-shedding-store when `log.ExpensiveLogEnabled(ctx, 3)` is true or when the store has been continuously overloaded for longer than `persistentOverloadThreshold` (30min) and has not had a detailed log burst within `detailedLogInterval` (30min). `lastDetailedLogTime` is updated only for the persistent-overload case, not for verbose logging. This ensures that disabling verbose logging does not suppress future persistent-overload log bursts. Release note: None Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>

This change switches each `rebalance-stores` call in `dynamic_verbose_logs.txt` to `vmodule=0`, so the captured output reflects what an operator at the default vmodule would actually see. The per-shedding-store dynamic-verbose-logging promotion is now observable across four passes as `detailedLog` toggles between false (t=10m, t=45m) and true (t=40m, t=56m). Release note: None

This change removes the `detailedLog` field from `rebalanceEnv` and threads it as an explicit parameter through `rebalanceStore`, `rebalanceReplicas`, and `rebalanceLeasesFromLocalStoreID`. A new `re.detailedLogf` method emits at Infof, and `re.makeLogf(detailedLog)` selects between it and `re.logf` for a single shedding store. The field was mutated on each iteration of the shedding-stores loop and read implicitly by `re.logf` from anywhere in the call chain. The implicit coupling required a manual reset after the loop to avoid leaking the per-store flag into subsequent calls. Threading the value explicitly removes both concerns. No behavior change. The per-shedding-store burst this control gates (top-K, load summaries per dim, candidates, results) is throttled per (store, localStoreID) pair to one burst per `detailedLogInterval` (10 min) once `persistentOverloadThreshold` (30 min) is crossed. Volume estimate, with assumptions: - 20 nodes × 2 stores/node = 40 store rebalancers, each calling rebalanceStores every LoadBasedRebalanceInterval (1 min default). - 10 stores sustained overloaded beyond persistentOverloadThreshold (30 min); all 40 rebalancers see them as shedding stores. - The 1-min pass cadence is faster than detailedLogInterval (10 min), so the throttle is the binding constraint. - ~35 lines per burst (top-K, ~4-5 load summaries per dim, candidates, results, etc.; varies with range count). Bursts per hour per (store, localStoreID) pair = (60 min/hr) / detailedLogInterval (10 min) = 6. Pairs = 10 overloaded × 40 rebalancers = 400. Cluster-wide rate = 400 × 6 × 35 = 84,000 lines/hour ≈ 23 lines/sec. Healthy clusters emit zero of these (no store crosses the 30-minute persistent-overload threshold). Release note: None

This change adds an optional `vmodule=N` arg to the `rebalance-stores` datadriven command. Without it, the command emits the recording-span trace as before. With it, the pass runs outside any tracing span -- so `log.ExpensiveLogEnabled` does not artificially force `detailedLog` true -- `N` is applied as `*=N` for the pass via `log.SetVModule`, and INFO+ entries from the `mmaprototype` package are intercepted and printed. The new `dynamic_verbose_logs.txt` walks four rebalance passes against a single overloaded store at t=10m, t=40m, t=45m, and t=56m, printing the recording-span trace for each. A follow-up commit switches the file to `vmodule=0` to demonstrate the operator-visible effect. Release note: None

This change promotes the `rebalanceStores` outer-loop narrative (rebalanceStores begins, cluster means, evaluating sN, overload-start/continued/end, store added to shedding list) from VEventf to Infof at most once per `outerLogInterval` (10 minutes), via `util.EveryN` over the `clusterState`'s time source. A pure timer (no overload condition) is used because the content is small and useful even on healthy clusters. This is independent from the per-store persistent-overload promotion, which fires deep per-shedding-store detail after a 30-minute threshold. The two have separate throttles and serve different purposes: this one is a low-cadence pass-level summary; the per-store control is a deep dive triggered by sustained problems. Volume estimate, with assumptions: - 20 nodes × 2 stores/node = 40 store rebalancers, each calling rebalanceStores every LoadBasedRebalanceInterval (1 min default). - The 1-min pass cadence is faster than outerLogInterval (10 min), so the throttle is the binding constraint. - Lines per emission = 12 fixed (begin, cluster means, overload-* transitions, sorted shedding) + 40 `evaluating sN` (one per cluster store) = 52 lines. Emissions per hour per rebalancer = (60 min/hr) / outerLogInterval (10 min) = 6. Cluster-wide rate = 40 × 6 × 52 = 12,480 lines/hour ≈ 3.5 lines/sec. The rate is the same for healthy and overloaded clusters since the timer has no overload condition. Release note: None

This change gates both Infof-promotion controls in `rebalanceStores` on `re.passObs != nil`. The production caller in `mma_store_rebalancer.go` calls rebalance in a loop, with the first call setting `PeriodicCall=true` and subsequent calls (continuing the same logical pass) setting it false. `passObs` is initialized only for periodic calls, so `re.passObs != nil` already identifies the first iteration of the rebalance loop. The outer-loop control was already implicitly suppressed via `EveryN`'s update-on-fire behavior; the gate makes the intent explicit. The per-store burst was scoped per-store via `lastDetailedLogTimes`, which allowed iteration N to fire fresh bursts for stores not processed in iteration 1, even though those belong to the same operator-visible time window. The gate scopes the burst strictly to the first iteration; a store first encountered in iteration 2+ defers its burst to the next periodic call (~1 minute later), which is operationally invisible. Note that `log.ExpensiveLogEnabled(ctx, 3)` continues to bypass the gate, so vmodule=3 always emits. Release note: None

This change increases `detailedLogInterval` from 10 minutes to 30 minutes, reducing per-store burst cadence by 3x. The throttle is the binding constraint on burst volume in production, so each minute of throttle materially affects log output. Volume estimate, with the same scenario as the earlier commits (20 nodes × 2 stores/node = 40 rebalancers, 10 sustained overloaded stores): Bursts per hour per (store, localStoreID) pair = (60 min/hr) / detailedLogInterval (30 min) = 2. Pairs = 10 overloaded × 40 rebalancers = 400. Cluster-wide rate = 400 × 2 × 35 = 28,000 lines/hour ≈ 7.8 lines/sec. Down from ~23 lines/sec at the previous 10-minute interval. The dynamic_verbose_logs.txt test scenarios are updated: the "burst fires after interval expires" pass moves from t=56m (16 min after t=40m) to t=75m (35 min after t=40m). Release note: None

Makes 0 (vmodule unset, log capture only) the default for the `vmodule` arg on the `rebalance-stores` datadriven command. The previous default ran inside a verbose recording span, which forced `log.ExpensiveLogEnabled(ctx, 3)` true and masked the dynamic-verbose- logging behavior. The new default reflects what an operator at the default vmodule actually sees. Existing `vmodule=0` invocations in dynamic_verbose_logs.txt are dropped as redundant. All other testdata files that relied on the recording-span trace gain `vmodule=3` to retain their previous output. Release note: None Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>

Splits the conjunctive condition into the specific reason that fired (pending decrease over threshold, or pending increase over epsilon) and appends a compact list of the unenacted pending changes contributing to the skip: skipping overloaded store s4 (worst dim: WriteBandwidth): pending increase 0.09 >= epsilon; pending: r1+s4[cpu:1ns/s, write-bandwidth: 880 kB/s, byte-size:0 B] Each pending change is rendered as r{rangeID}{op}s{storeID}{loadDelta} where op is + (add), - (remove), or ~ (other), in changeID order. Release note: None Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>

Replaces the (logf mmaLogf, detailedLog bool) param pair threaded through MMA rebalancing with a single mmaLogf bridge struct that internally tracks whether logging is enabled. The bridge has a noop field that callers can inspect to skip arg construction at hot sites. Before this change, dropping upstream's `if log.ExpensiveLogEnabled(ctx, 3)` wrappers (during the rebase conflict resolution) caused per-call variadic arg slices to escape to the heap even when logging was off, adding ~500 allocs/op to BenchmarkRebalanceStores (162 -> 666). The mmaLogf type is now: type mmaLogf struct { noop bool internalLogf func(ctx, level, format, args ...interface{}) } makeLogf returns a noop bridge when neither detailedLog nor V(3) is on. Cold-path callers continue to use lb.logf(ctx, 3, ...) unconditionally; hot-path callers (sortTargetCandidateSetAndPick, the result(failed) sites, etc.) wrap the call in `if !lb.noop`. The change also drops the now-redundant detailedLog bool parameter from sortTargetCandidateSetAndPick, computeCandidatesForReplicaTransfer, canShedAndAddLoad, rebalanceStore, rebalanceReplicas, and rebalanceLeasesFromLocalStoreID. computeLoadSummary and loadSummaryForDimension still take detailedLog (they have a separate Infof-vs-skip semantic). BenchmarkRebalanceStores goes from 666 -> 175 allocs/op (vs 162 baseline); time/op back within 1% of baseline. Release note: None Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>

The mmaLogger bridge previously decided at construction time whether it would emit. The vmodule check ran in the file containing the constructor, so a logf call in a different file inherited a stale verbosity decision keyed off the constructor's source location. Setting --vmodule=cluster_state_rebalance_stores=3 did not enable verbose output for callees in load.go (and vice versa), which is surprising. Replace the construction-time noop flag with a per-call vmodule check via log.ExpensiveLogEnabledVDepth, so attribution lands on the file that actually emits the message. The bridge keeps a verboseToInfof bool for the two production paths (outer-loop heartbeat and per- shedding-store burst) that promote to Infof; in that mode logf short-circuits the V check and always emits. Hot-path call sites that previously guarded expensive arg construction with `if !ml.noop` now use `if ml.V(ctx, level)`, passing the same level they will pass to logf. The rebalanceEnv-specific constructor is gone; makeMMALogger(bool) is the single entry point. Epic: none Release note: None Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>

trunk-io · 2026-04-29T15:09:21Z

😎 Merged successfully - details.

blathers-crl · 2026-04-29T15:09:24Z

Thanks for opening a backport.

Before merging, please confirm that the change does not break backwards compatibility and otherwise complies with the backport policy. Include a brief release justification in the PR description explaining why the backport is appropriate. All backports must be reviewed by the TL for the owning area. While the stricter LTS policy does not yet apply, please exercise judgment and consider gating non-critical changes behind a disabled-by-default feature flag when appropriate.

cockroach-teamcity · 2026-04-29T15:10:01Z

This change is

wenyihu6 · 2026-04-29T17:09:40Z

/trunk merge

7 already backported via separate release PRs (cockroachdb#169344, cockroachdb#169590, cockroachdb#169711, cockroachdb#169734, cockroachdb#169742, cockroachdb#169761, cockroachdb#169876) but invisible to script because the cherry-picked commits don't reference the original PR number. 7 are unrelated to MMA (admission/AC, sql, kvserver/storage, ci, roachtest/perturbation). 2 are master-and-onward only by intent: cockroachdb#169430 (enable MMA by default in v26.3) and cockroachdb#169669 (retire load_based_lease_rebalancing setting). Net: 0 PRs need backporting to release-26.2. Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>

wenyihu6 and others added 12 commits March 13, 2026 18:06

wenyihu6 requested review from a team as code owners April 29, 2026 15:09

blathers-crl Bot added blathers-backport This is a backport that Blathers created automatically. O-robot Originated from a bot. labels Apr 29, 2026

blathers-crl Bot assigned wenyihu6 Apr 29, 2026

blathers-crl Bot requested a review from tbg April 29, 2026 15:09

blathers-crl Bot added backport Label PR's that are backports to older release branches T-kv KV Team labels Apr 29, 2026

tbg approved these changes Apr 29, 2026

View reviewed changes

trunk-io Bot merged commit 3a5a123 into cockroachdb:release-26.2 Apr 29, 2026
23 checks passed

celeste-cockroachdb Bot added the target-release-26.2.1 label Apr 29, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

release-26.2: mmaprototype: dynamically verbose logs for overloaded stores#169344

release-26.2: mmaprototype: dynamically verbose logs for overloaded stores#169344
trunk-io[bot] merged 12 commits into
cockroachdb:release-26.2from
wenyihu6:blathers/backport-release-26.2-165719

wenyihu6 commented Apr 29, 2026 •

edited by tbg

Loading

Uh oh!

trunk-io Bot commented Apr 29, 2026 •

edited

Loading

Uh oh!

blathers-crl Bot commented Apr 29, 2026

Uh oh!

cockroach-teamcity commented Apr 29, 2026

Uh oh!

wenyihu6 commented Apr 29, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

wenyihu6 commented Apr 29, 2026 • edited by tbg Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

trunk-io Bot commented Apr 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

blathers-crl Bot commented Apr 29, 2026

Uh oh!

cockroach-teamcity commented Apr 29, 2026

Uh oh!

wenyihu6 commented Apr 29, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

wenyihu6 commented Apr 29, 2026 •

edited by tbg

Loading

trunk-io Bot commented Apr 29, 2026 •

edited

Loading