kvserverbase: add COCKROACH_DISABLE_MMA env var override #167897
…rride

Add a COCKROACH_DISABLE_MMA environment variable that acts as an emergency kill switch for multi-metric allocator (MMA) rebalancing. When set, the new GetLoadBasedRebalancingMode getter returns LBRebalancingOff regardless of the cluster setting value. This is intended for situations where MMA causes crashes so frequent that the cluster setting cannot be changed through normal means.

Also add OverrideLoadBasedRebalancingMode for use in tests.

Subsequent commits will migrate callers to these new accessors and then unexport the setting variable to prevent direct access.

Epic: none
Release note: None
Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>

Mechanical change: replace all direct LoadBasedRebalancingMode.Get() calls with GetLoadBasedRebalancingMode() so that the COCKROACH_DISABLE_MMA env var override is respected at every read site.

Epic: none
Release note: None
Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>

…cingMode

Mechanical change: replace all direct LoadBasedRebalancingMode.Override() calls with OverrideLoadBasedRebalancingMode() in tests and simulation code. After this commit, there are no external references to LoadBasedRebalancingMode, so the next commit can unexport it.

Epic: none
Release note: None
Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>

Now that all callers use GetLoadBasedRebalancingMode and OverrideLoadBasedRebalancingMode, unexport the setting variable to prevent direct access that would bypass the COCKROACH_DISABLE_MMA override.

Epic: none
Release note: None
Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
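
For readers who want the shape of the accessor pattern these commits describe, here is a minimal standalone sketch. It is not the code in `base.go`: the real getter takes a `*settings.Values`, while this sketch stands in an `atomic.Int64` for the cluster setting. The `parseDisableMMA` helper, the `LBRebalancingMultiMetric` constant, and the use of `strconv.ParseBool` are assumptions made for illustration only.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"sync/atomic"
)

// LBRebalancingMode is a simplified stand-in for the kvserverbase type.
type LBRebalancingMode int64

const (
	LBRebalancingOff LBRebalancingMode = iota
	LBRebalancingLeasesAndReplicas
	LBRebalancingMultiMetric // hypothetical name for an MMA mode
)

// String makes the mode readable when printed.
func (m LBRebalancingMode) String() string {
	switch m {
	case LBRebalancingOff:
		return "off"
	case LBRebalancingLeasesAndReplicas:
		return "leases and replicas"
	case LBRebalancingMultiMetric:
		return "multi-metric"
	default:
		return fmt.Sprintf("unknown(%d)", int64(m))
	}
}

// parseDisableMMA reports whether COCKROACH_DISABLE_MMA holds a true value.
// Assumption: the real code may parse the variable differently.
func parseDisableMMA(lookup func(string) (string, bool)) bool {
	v, ok := lookup("COCKROACH_DISABLE_MMA")
	if !ok {
		return false
	}
	enabled, err := strconv.ParseBool(v)
	return err == nil && enabled
}

// disableMMA is read once at process start, so it takes effect even when
// the cluster setting cannot be changed.
var disableMMA = parseDisableMMA(os.LookupEnv)

// loadBasedRebalancingMode stands in for the cluster setting. It is
// unexported so that every read goes through the getter below.
var loadBasedRebalancingMode atomic.Int64

// GetLoadBasedRebalancingMode returns the configured mode unless the
// kill switch is set, in which case it returns LBRebalancingOff.
func GetLoadBasedRebalancingMode() LBRebalancingMode {
	if disableMMA {
		return LBRebalancingOff
	}
	return LBRebalancingMode(loadBasedRebalancingMode.Load())
}

// OverrideLoadBasedRebalancingMode sets the stand-in setting, for tests.
func OverrideLoadBasedRebalancingMode(m LBRebalancingMode) {
	loadBasedRebalancingMode.Store(int64(m))
}

func main() {
	OverrideLoadBasedRebalancingMode(LBRebalancingMultiMetric)
	fmt.Println("disableMMA:", disableMMA)
	fmt.Println("effective mode:", GetLoadBasedRebalancingMode())
}
```

Running the sketch with `COCKROACH_DISABLE_MMA=true` in the environment should report the effective mode as off even though the stand-in setting was overridden to an MMA mode. Keeping the setting variable unexported is what makes the override hard to bypass: no caller outside the package can read the raw value.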
😎 Merged successfully.
…_MMA

The `COCKROACH_DISABLE_MMA` environment variable unconditionally returns `LBRebalancingOff`, which disables all load-based rebalancing, not just the MMA modes. This is unnecessarily disruptive: an operator setting this kill switch to work around MMA crashes should not also lose legacy lease and replica rebalancing.

This change modifies `GetLoadBasedRebalancingMode` to fall back to `LBRebalancingLeasesAndReplicas` when the env var is set and the configured mode is an MMA mode. Non-MMA modes are returned as-is.

Epic: none
Release note: None
Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
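
The fallback rule in this commit can be sketched as a pure function. This is an illustrative sketch, not the code in `base.go`: `effectiveMode`, `isMMA`, `LBRebalancingLeasesOnly`, and `LBRebalancingMultiMetric` are hypothetical names, and the set of modes is reduced to one MMA mode for brevity.

```go
package main

import "fmt"

// LBRebalancingMode is a simplified stand-in for the kvserverbase type.
type LBRebalancingMode int

const (
	LBRebalancingOff LBRebalancingMode = iota
	LBRebalancingLeasesOnly // hypothetical non-MMA mode
	LBRebalancingLeasesAndReplicas
	LBRebalancingMultiMetric // hypothetical MMA mode
)

// isMMA reports whether the mode relies on the multi-metric allocator.
func (m LBRebalancingMode) isMMA() bool {
	return m == LBRebalancingMultiMetric
}

// String makes the mode readable when printed.
func (m LBRebalancingMode) String() string {
	switch m {
	case LBRebalancingOff:
		return "off"
	case LBRebalancingLeasesOnly:
		return "leases"
	case LBRebalancingLeasesAndReplicas:
		return "leases and replicas"
	case LBRebalancingMultiMetric:
		return "multi-metric"
	default:
		return fmt.Sprintf("unknown(%d)", int(m))
	}
}

// effectiveMode applies the kill switch: MMA modes fall back to
// leases-and-replicas, and every other mode is returned unchanged.
func effectiveMode(configured LBRebalancingMode, disableMMA bool) LBRebalancingMode {
	if disableMMA && configured.isMMA() {
		return LBRebalancingLeasesAndReplicas
	}
	return configured
}

func main() {
	modes := []LBRebalancingMode{
		LBRebalancingOff,
		LBRebalancingLeasesOnly,
		LBRebalancingLeasesAndReplicas,
		LBRebalancingMultiMetric,
	}
	for _, m := range modes {
		fmt.Printf("configured=%v effective=%v\n", m, effectiveMode(m, true))
	}
}
```

With the kill switch set, only the MMA mode is rewritten. A cluster configured with rebalancing off stays off, which matches the "returned as-is" wording above.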
wenyihu6
left a comment
@wenyihu6 made 2 comments.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on tbg).
pkg/kv/kvserver/kvserverbase/base.go line 143 at r1 (raw file):
func GetLoadBasedRebalancingMode(sv *settings.Values) LBRebalancingMode {
	if disableMMA {
		return LBRebalancingOff
Should we fall back to `leases-and-replicas` instead? `LBRebalancingOff` disables the old store rebalancer as well. Added a commit for this.
tbg
left a comment
TFTR!
/trunk merge
@tbg made 2 comments.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on wenyihu6).
pkg/kv/kvserver/kvserverbase/base.go line 143 at r1 (raw file):
Previously, wenyihu6 (Wenyi Hu) wrote…
Should we fall back to `leases-and-replicas` instead - LBRebalancingOff disables old store rebalancer as well. Added a commit for this.
I intentionally didn't, so that we wouldn't force someone who intentionally disabled lease rebalancing back into lease rebalancing, but it doesn't matter, this works too.
The idea was that if you use the env var, the first thing you'd do after is to set the cluster setting to what you actually wanted, and then remove the env var.
Triages 28 new author-based candidates as ignored: admission/AC SQL CPU token workstream, SQL/UI/docs/CI changes, roachtest infra, and unrelated kvserver/storage work. Also ignores cockroachdb#167861 (touches non-mma kvserver load/allocator code), cockroachdb#167655 (test deflake), and cockroachdb#167696 (roachtest-only scoring).

Leaves 4 PRs to backport: cockroachdb#167051, cockroachdb#167110 (asim configs/docs), cockroachdb#167897 (COCKROACH_DISABLE_MMA killswitch), cockroachdb#167939 (disable follow-the-workload when MMA enabled).

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
…orted, invisible to script)

cockroachdb#167897 was cherry-picked manually (b5f0518, d3ea308, 5bae83c) without backport-tool's PR-ref convention. cockroachdb#167939 was bundled into the backport PR cockroachdb#168475 whose subject doesn't list the original PR number. Both surfaced as conflicts during a backport run; verified the changes are already on release-26.2.

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
Summary
- Add a `COCKROACH_DISABLE_MMA` environment variable that forces `kv.allocator.load_based_rebalancing` to `off`, regardless of the cluster setting value. This is an emergency kill switch for when MMA causes crashes too frequent to change the setting through normal means.
- Unexport the `LoadBasedRebalancingMode` setting variable so all access goes through `GetLoadBasedRebalancingMode`, preventing callers from bypassing the override.

Early commits are mechanical migrations of `.Get()` and `.Override()` call sites; the last commit unexports the setting.
Epic: none