release-26.2: mmaprototype: dynamically verbose logs for overloaded stores#169344
Merged
trunk-io[bot] merged 12 commits intoApr 29, 2026
Merged
Conversation
This change routes all `VEventf` calls in the MMA rebalancing code through an `mmaLogf` function type so that a follow-up commit can selectively promote calls to `Infof` for persistently overloaded stores. `re.logf()` on `rebalanceEnv` delegates to `VEventf`. Functions that called `VEventf` directly now receive a `logf mmaLogf` parameter instead. `computeLoadSummary` and `loadSummaryForDimension` accept a `detailedLog bool` that replaces the `log.ExpensiveLogEnabled(ctx, 3)` guard that previously gated per-dimension log emission inside `loadSummaryForDimension`. All callers pass `log.ExpensiveLogEnabled(ctx, 3)`, preserving the existing behavior. No behavior change. Release note: None Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
This change teaches `re.logf()` to emit at `Infof` level when `re.detailedLog` is true. The flag is set per-shedding-store when `log.ExpensiveLogEnabled(ctx, 3)` is true or when the store has been continuously overloaded for longer than `persistentOverloadThreshold` (30min) and has not had a detailed log burst within `detailedLogInterval` (30min). `lastDetailedLogTime` is updated only for the persistent-overload case, not for verbose logging. This ensures that disabling verbose logging does not suppress future persistent-overload log bursts. Release note: None Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
This change switches each `rebalance-stores` call in `dynamic_verbose_logs.txt` to `vmodule=0`, so the captured output reflects what an operator at the default vmodule would actually see. The per-shedding-store dynamic-verbose-logging promotion is now observable across four passes as `detailedLog` toggles between false (t=10m, t=45m) and true (t=40m, t=56m). Release note: None
This change removes the `detailedLog` field from `rebalanceEnv` and
threads it as an explicit parameter through `rebalanceStore`,
`rebalanceReplicas`, and `rebalanceLeasesFromLocalStoreID`. A new
`re.detailedLogf` method emits at Infof, and `re.makeLogf(detailedLog)`
selects between it and `re.logf` for a single shedding store.
The field was mutated on each iteration of the shedding-stores loop
and read implicitly by `re.logf` from anywhere in the call chain. The
implicit coupling required a manual reset after the loop to avoid
leaking the per-store flag into subsequent calls. Threading the value
explicitly removes both concerns. No behavior change.
The per-shedding-store burst this control gates (top-K, load
summaries per dim, candidates, results) is throttled per (store,
localStoreID) pair to one burst per `detailedLogInterval` (10 min)
once `persistentOverloadThreshold` (30 min) is crossed.
Volume estimate, with assumptions:
- 20 nodes × 2 stores/node = 40 store rebalancers, each calling
rebalanceStores every LoadBasedRebalanceInterval (1 min default).
- 10 stores sustained overloaded beyond persistentOverloadThreshold
(30 min); all 40 rebalancers see them as shedding stores.
- The 1-min pass cadence is faster than detailedLogInterval (10 min),
so the throttle is the binding constraint.
- ~35 lines per burst (top-K, ~4-5 load summaries per dim,
candidates, results, etc.; varies with range count).
Bursts per hour per (store, localStoreID) pair
= (60 min/hr) / detailedLogInterval (10 min) = 6.
Pairs = 10 overloaded × 40 rebalancers = 400.
Cluster-wide rate = 400 × 6 × 35 = 84,000 lines/hour ≈ 23 lines/sec.
Healthy clusters emit zero of these (no store crosses the 30-minute
persistent-overload threshold).
Release note: None
This change adds an optional `vmodule=N` arg to the `rebalance-stores` datadriven command. Without it, the command emits the recording-span trace as before. With it, the pass runs outside any tracing span -- so `log.ExpensiveLogEnabled` does not artificially force `detailedLog` true -- `N` is applied as `*=N` for the pass via `log.SetVModule`, and INFO+ entries from the `mmaprototype` package are intercepted and printed. The new `dynamic_verbose_logs.txt` walks four rebalance passes against a single overloaded store at t=10m, t=40m, t=45m, and t=56m, printing the recording-span trace for each. A follow-up commit switches the file to `vmodule=0` to demonstrate the operator-visible effect. Release note: None
This change promotes the `rebalanceStores` outer-loop narrative
(rebalanceStores begins, cluster means, evaluating sN,
overload-start/continued/end, store added to shedding list) from
VEventf to Infof at most once per `outerLogInterval` (10 minutes),
via `util.EveryN` over the `clusterState`'s time source. A pure
timer (no overload condition) is used because the content is small
and useful even on healthy clusters.
This is independent from the per-store persistent-overload promotion,
which fires deep per-shedding-store detail after a 30-minute
threshold. The two have separate throttles and serve different
purposes: this one is a low-cadence pass-level summary; the per-store
control is a deep dive triggered by sustained problems.
Volume estimate, with assumptions:
- 20 nodes × 2 stores/node = 40 store rebalancers, each calling
rebalanceStores every LoadBasedRebalanceInterval (1 min default).
- The 1-min pass cadence is faster than outerLogInterval (10 min),
so the throttle is the binding constraint.
- Lines per emission = 12 fixed (begin, cluster means, overload-*
transitions, sorted shedding) + 40 `evaluating sN` (one per
cluster store) = 52 lines.
Emissions per hour per rebalancer
= (60 min/hr) / outerLogInterval (10 min) = 6.
Cluster-wide rate = 40 × 6 × 52 = 12,480 lines/hour ≈ 3.5 lines/sec.
The rate is the same for healthy and overloaded clusters since the
timer has no overload condition.
Release note: None
This change gates both Infof-promotion controls in `rebalanceStores` on `re.passObs != nil`. The production caller in `mma_store_rebalancer.go` calls rebalance in a loop, with the first call setting `PeriodicCall=true` and subsequent calls (continuing the same logical pass) setting it false. `passObs` is initialized only for periodic calls, so `re.passObs != nil` already identifies the first iteration of the rebalance loop. The outer-loop control was already implicitly suppressed via `EveryN`'s update-on-fire behavior; the gate makes the intent explicit. The per-store burst was scoped per-store via `lastDetailedLogTimes`, which allowed iteration N to fire fresh bursts for stores not processed in iteration 1, even though those belong to the same operator-visible time window. The gate scopes the burst strictly to the first iteration; a store first encountered in iteration 2+ defers its burst to the next periodic call (~1 minute later), which is operationally invisible. Note that `log.ExpensiveLogEnabled(ctx, 3)` continues to bypass the gate, so vmodule=3 always emits. Release note: None
This change increases `detailedLogInterval` from 10 minutes to 30
minutes, reducing per-store burst cadence by 3x. The throttle is the
binding constraint on burst volume in production, so each minute of
throttle materially affects log output.
Volume estimate, with the same scenario as the earlier commits
(20 nodes × 2 stores/node = 40 rebalancers, 10 sustained overloaded
stores):
Bursts per hour per (store, localStoreID) pair
= (60 min/hr) / detailedLogInterval (30 min) = 2.
Pairs = 10 overloaded × 40 rebalancers = 400.
Cluster-wide rate = 400 × 2 × 35 = 28,000 lines/hour ≈ 7.8 lines/sec.
Down from ~23 lines/sec at the previous 10-minute interval.
The dynamic_verbose_logs.txt test scenarios are updated: the
"burst fires after interval expires" pass moves from t=56m (16 min
after t=40m) to t=75m (35 min after t=40m).
Release note: None
Makes 0 (vmodule unset, log capture only) the default for the `vmodule` arg on the `rebalance-stores` datadriven command. The previous default ran inside a verbose recording span, which forced `log.ExpensiveLogEnabled(ctx, 3)` true and masked the dynamic-verbose- logging behavior. The new default reflects what an operator at the default vmodule actually sees. Existing `vmodule=0` invocations in dynamic_verbose_logs.txt are dropped as redundant. All other testdata files that relied on the recording-span trace gain `vmodule=3` to retain their previous output. Release note: None Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
Splits the conjunctive condition into the specific reason that fired
(pending decrease over threshold, or pending increase over epsilon) and
appends a compact list of the unenacted pending changes contributing
to the skip:
skipping overloaded store s4 (worst dim: WriteBandwidth): pending
increase 0.09 >= epsilon; pending: r1+s4[cpu:1ns/s, write-bandwidth:
880 kB/s, byte-size:0 B]
Each pending change is rendered as r{rangeID}{op}s{storeID}{loadDelta}
where op is + (add), - (remove), or ~ (other), in changeID order.
Release note: None
Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
Replaces the (logf mmaLogf, detailedLog bool) param pair threaded
through MMA rebalancing with a single mmaLogf bridge struct that
internally tracks whether logging is enabled. The bridge has a noop
field that callers can inspect to skip arg construction at hot sites.
Before this change, dropping upstream's `if log.ExpensiveLogEnabled(ctx,
3)` wrappers (during the rebase conflict resolution) caused per-call
variadic arg slices to escape to the heap even when logging was off,
adding ~500 allocs/op to BenchmarkRebalanceStores (162 -> 666).
The mmaLogf type is now:
type mmaLogf struct {
noop bool
internalLogf func(ctx, level, format, args ...interface{})
}
makeLogf returns a noop bridge when neither detailedLog nor V(3) is on.
Cold-path callers continue to use lb.logf(ctx, 3, ...) unconditionally;
hot-path callers (sortTargetCandidateSetAndPick, the result(failed)
sites, etc.) wrap the call in `if !lb.noop`.
The change also drops the now-redundant detailedLog bool parameter from
sortTargetCandidateSetAndPick, computeCandidatesForReplicaTransfer,
canShedAndAddLoad, rebalanceStore, rebalanceReplicas, and
rebalanceLeasesFromLocalStoreID. computeLoadSummary and
loadSummaryForDimension still take detailedLog (they have a separate
Infof-vs-skip semantic).
BenchmarkRebalanceStores goes from 666 -> 175 allocs/op (vs 162 baseline);
time/op back within 1% of baseline.
Release note: None
Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
The mmaLogger bridge previously decided at construction time whether it would emit. The vmodule check ran in the file containing the constructor, so a logf call in a different file inherited a stale verbosity decision keyed off the constructor's source location. Setting --vmodule=cluster_state_rebalance_stores=3 did not enable verbose output for callees in load.go (and vice versa), which is surprising. Replace the construction-time noop flag with a per-call vmodule check via log.ExpensiveLogEnabledVDepth, so attribution lands on the file that actually emits the message. The bridge keeps a verboseToInfof bool for the two production paths (outer-loop heartbeat and per- shedding-store burst) that promote to Infof; in that mode logf short-circuits the V check and always emits. Hot-path call sites that previously guarded expensive arg construction with `if !ml.noop` now use `if ml.V(ctx, level)`, passing the same level they will pass to logf. The rebalanceEnv-specific constructor is gone; makeMMALogger(bool) is the single entry point. Epic: none Release note: None Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
Contributor
|
😎 Merged successfully - details. |
|
Thanks for opening a backport. Before merging, please confirm that the change does not break backwards compatibility and otherwise complies with the backport policy. Include a brief release justification in the PR description explaining why the backport is appropriate. All backports must be reviewed by the TL for the owning area. While the stricter LTS policy does not yet apply, please exercise judgment and consider gating non-critical changes behind a disabled-by-default feature flag when appropriate. |
Member
tbg
approved these changes
Apr 29, 2026
Contributor
Author
|
/trunk merge |
tbg
added a commit
to 5hubh4m/cockroach
that referenced
this pull request
May 8, 2026
7 already backported via separate release PRs (cockroachdb#169344, cockroachdb#169590, cockroachdb#169711, cockroachdb#169734, cockroachdb#169742, cockroachdb#169761, cockroachdb#169876) but invisible to script because the cherry-picked commits don't reference the original PR number. 7 are unrelated to MMA (admission/AC, sql, kvserver/storage, ci, roachtest/perturbation). 2 are master-and-onward only by intent: cockroachdb#169430 (enable MMA by default in v26.3) and cockroachdb#169669 (retire load_based_lease_rebalancing setting). Net: 0 PRs need backporting to release-26.2. Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Backport 12/12 commits from #165719 on behalf of @wenyihu6.
Resolves: #162254
Epic: CRDB-55052
Release note: none
This PR makes MMA's rebalancing logs useful in production without
relying on operators flipping vmodule.
What changes for the operator:
stores, per-pass results) is emitted at
Infofonce perrebalance pass, instead of being hidden behind
V(2).minutes and hasn't had a detailed burst recently, the
per-shedding-store log path (top-K ranges, candidate selection,
failure reasons) is promoted to
Infoffor that store on thatiteration. A burst-suppression timer prevents spamming.
V(3), soenabling vmodule on a single MMA file shows the detailed
decision trail without drowning the rest of the log.
Infofchecksvmodule at the call site (not at construction), so
--vmodule=load=3enables verbose output forload.go'semissions even when the logger was constructed elsewhere.
No behavior change for clusters that don't hit persistent
overload; the detailed-burst path is dormant.
Release justification: low risk obs improvements for opt-in functionality.