
kvserverbase: add COCKROACH_DISABLE_MMA env var override #167897

Merged
trunk-io[bot] merged 5 commits into cockroachdb:master from tbg:disable-mma-env-var
Apr 8, 2026

@tbg tbg commented Apr 8, 2026

Summary

  • Add a COCKROACH_DISABLE_MMA environment variable that forces
    kv.allocator.load_based_rebalancing to off, regardless of the cluster
    setting value. This is an emergency kill switch for when MMA causes crashes
    too frequent to change the setting through normal means.
  • Unexport the LoadBasedRebalancingMode setting variable so all access goes
    through GetLoadBasedRebalancingMode, preventing callers from bypassing the
    override.

Early commits are mechanical migrations of .Get() and .Override() call
sites; the last commit unexports the setting.

Epic: none

tbg and others added 4 commits April 8, 2026 09:43
…rride

Add a COCKROACH_DISABLE_MMA environment variable that acts as an emergency
kill switch for multi-metric allocator (MMA) rebalancing. When set, the new
GetLoadBasedRebalancingMode getter returns LBRebalancingOff regardless of
the cluster setting value.

This is intended for situations where MMA causes crashes so frequent that
the cluster setting cannot be changed through normal means.

Also add OverrideLoadBasedRebalancingMode for use in tests. Subsequent
commits will migrate callers to these new accessors and then unexport the
setting variable to prevent direct access.

Epic: none
Release note: None

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
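
The behavior this first commit describes can be sketched roughly as follows. This is a minimal, self-contained illustration, not the code from the PR: the real getter takes a `*settings.Values` argument, and the way the environment variable is parsed here (`os.Getenv` plus `strconv.ParseBool`), the atomic stand-in for the cluster setting, and the `LBRebalancingExampleMMA` constant are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"sync/atomic"
)

// LBRebalancingMode is a stand-in for the real enum. Only LBRebalancingOff is
// named in the commit message; LBRebalancingExampleMMA is a hypothetical
// placeholder for an MMA-enabled mode.
type LBRebalancingMode int64

const (
	LBRebalancingOff LBRebalancingMode = iota
	LBRebalancingExampleMMA
)

// envDisableMMA reports whether COCKROACH_DISABLE_MMA is set to a true value.
// Treating unparsable values as "not set" is an assumption of this sketch.
func envDisableMMA() bool {
	v, err := strconv.ParseBool(os.Getenv("COCKROACH_DISABLE_MMA"))
	return err == nil && v
}

// disableMMA is evaluated once at process start, so changing the kill switch
// requires a restart.
var disableMMA = envDisableMMA()

// loadBasedRebalancingMode is an unexported stand-in for the cluster setting.
var loadBasedRebalancingMode atomic.Int64

// GetLoadBasedRebalancingMode returns LBRebalancingOff whenever the kill
// switch is active, and the configured value otherwise.
func GetLoadBasedRebalancingMode() LBRebalancingMode {
	if disableMMA {
		return LBRebalancingOff
	}
	return LBRebalancingMode(loadBasedRebalancingMode.Load())
}

// OverrideLoadBasedRebalancingMode sets the configured value, as a test would.
func OverrideLoadBasedRebalancingMode(m LBRebalancingMode) {
	loadBasedRebalancingMode.Store(int64(m))
}

func main() {
	OverrideLoadBasedRebalancingMode(LBRebalancingExampleMMA)
	// The result depends on whether COCKROACH_DISABLE_MMA is set in the environment.
	fmt.Println("effective mode:", GetLoadBasedRebalancingMode())
}
```

Because the only way to read the value is through the getter, the override cannot be skipped by accident, which is the motivation for unexporting the setting in the last commit.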
Mechanical change: replace all direct LoadBasedRebalancingMode.Get() calls
with GetLoadBasedRebalancingMode() so that the COCKROACH_DISABLE_MMA env
var override is respected at every read site.

Epic: none
Release note: None

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
…cingMode

Mechanical change: replace all direct LoadBasedRebalancingMode.Override()
calls with OverrideLoadBasedRebalancingMode() in tests and simulation code.

After this commit, there are no external references to
LoadBasedRebalancingMode, so the next commit can unexport it.

Epic: none
Release note: None

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
Now that all callers use GetLoadBasedRebalancingMode and
OverrideLoadBasedRebalancingMode, unexport the setting variable to prevent
direct access that would bypass the COCKROACH_DISABLE_MMA override.

Epic: none
Release note: None

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>

trunk-io Bot commented Apr 8, 2026

😎 Merged successfully - details.

@cockroach-teamcity

This change is Reviewable

@tbg tbg marked this pull request as ready for review April 8, 2026 08:15
@tbg tbg requested review from a team as code owners April 8, 2026 08:15
@tbg tbg requested a review from wenyihu6 April 8, 2026 08:15
@tbg tbg added the backport-26.2.x label (flags PRs that need to be backported to 26.2) Apr 8, 2026
…_MMA

The `COCKROACH_DISABLE_MMA` environment variable unconditionally returns
`LBRebalancingOff`, which disables all load-based rebalancing — not just
the MMA modes. This is unnecessarily disruptive: an operator setting
this kill switch to work around MMA crashes should not also lose
legacy lease and replica rebalancing.

This change modifies `GetLoadBasedRebalancingMode` to fall back to
`LBRebalancingLeasesAndReplicas` when the env var is set and the
configured mode is an MMA mode. Non-MMA modes are returned as-is.

Epic: none
Release note: None

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
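
The fallback rule from this commit message can be written as a small pure function. The sketch below is an illustration under stated assumptions, not the PR's code: `effectiveMode` and `isMMAMode` are hypothetical helper names, and `LBRebalancingExampleMMA` is a placeholder for whichever modes enable MMA. Only `LBRebalancingOff` and `LBRebalancingLeasesAndReplicas` are names taken from the text above.

```go
package main

import "fmt"

// LBRebalancingMode is a stand-in for the real enum.
type LBRebalancingMode int

const (
	LBRebalancingOff LBRebalancingMode = iota
	LBRebalancingLeasesAndReplicas
	LBRebalancingExampleMMA // hypothetical placeholder for an MMA mode
)

// isMMAMode reports whether the mode relies on the multi-metric allocator.
// The real set of MMA modes is not listed in this conversation.
func isMMAMode(m LBRebalancingMode) bool {
	return m == LBRebalancingExampleMMA
}

// effectiveMode applies the kill switch: MMA modes fall back to
// LBRebalancingLeasesAndReplicas, and every other mode is returned unchanged.
func effectiveMode(configured LBRebalancingMode, disableMMA bool) LBRebalancingMode {
	if disableMMA && isMMAMode(configured) {
		return LBRebalancingLeasesAndReplicas
	}
	return configured
}

func main() {
	modes := []LBRebalancingMode{
		LBRebalancingOff, LBRebalancingLeasesAndReplicas, LBRebalancingExampleMMA,
	}
	for _, disabled := range []bool{false, true} {
		for _, m := range modes {
			fmt.Printf("disableMMA=%t configured=%d effective=%d\n",
				disabled, m, effectiveMode(m, disabled))
		}
	}
}
```

Note that a cluster configured with `LBRebalancingOff` stays off under this rule; only clusters configured with an MMA mode change behavior when the variable is set.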

@wenyihu6 wenyihu6 left a comment

:lgtm: mod one comment.

@wenyihu6 made 2 comments.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on tbg).


pkg/kv/kvserver/kvserverbase/base.go line 143 at r1 (raw file):

func GetLoadBasedRebalancingMode(sv *settings.Values) LBRebalancingMode {
	if disableMMA {
		return LBRebalancingOff

Should we fall back to leases-and-replicas instead? LBRebalancingOff disables the old store rebalancer as well. Added a commit for this.


@tbg tbg left a comment

TFTR!

/trunk merge

@tbg made 2 comments.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on wenyihu6).


pkg/kv/kvserver/kvserverbase/base.go line 143 at r1 (raw file):

Previously, wenyihu6 (Wenyi Hu) wrote…

Should we fall back to leases-and-replicas instead? LBRebalancingOff disables the old store rebalancer as well. Added a commit for this.

I intentionally didn't, so that we wouldn't force someone who had intentionally disabled lease rebalancing back into lease rebalancing, but it doesn't matter; this works too.

The idea was that if you use the env var, the first thing you'd do after is to set the cluster setting to what you actually wanted, and then remove the env var.

@trunk-io trunk-io Bot merged commit 839d4a5 into cockroachdb:master Apr 8, 2026
36 of 39 checks passed
tbg added a commit to 5hubh4m/cockroach that referenced this pull request Apr 29, 2026
Triages 28 new author-based candidates as ignored: admission/AC SQL CPU
token workstream, SQL/UI/docs/CI changes, roachtest infra, and unrelated
kvserver/storage work. Also ignores cockroachdb#167861 (touches non-mma kvserver
load/allocator code), cockroachdb#167655 (test deflake), and cockroachdb#167696 (roachtest-only
scoring).

Leaves 4 PRs to backport: cockroachdb#167051, cockroachdb#167110 (asim configs/docs), cockroachdb#167897
(COCKROACH_DISABLE_MMA killswitch), cockroachdb#167939 (disable follow-the-workload
when MMA enabled).

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
tbg added a commit to 5hubh4m/cockroach that referenced this pull request Apr 29, 2026
…orted, invisible to script)

cockroachdb#167897 was cherry-picked manually (b5f0518, d3ea308, 5bae83c)
without backport-tool's PR-ref convention. cockroachdb#167939 was bundled into the
backport PR cockroachdb#168475 whose subject doesn't list the original PR number.

Both surfaced as conflicts during a backport run; verified the changes
are already on release-26.2.

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>

Labels

backport-26.2.x (flags PRs that need to be backported to 26.2)
target-release-26.3.0


3 participants