Skip to content

release-26.2: mmaprototype: add structured snapshot of cluster state#169711

Merged
trunk-io[bot] merged 7 commits into
cockroachdb:release-26.2from
tbg:blathers/backport-release-26.2-169230
May 5, 2026
Merged

release-26.2: mmaprototype: add structured snapshot of cluster state#169711
trunk-io[bot] merged 7 commits into
cockroachdb:release-26.2from
tbg:blathers/backport-release-26.2-169230

Conversation

@tbg
Copy link
Copy Markdown
Member

@tbg tbg commented May 5, 2026

Backport 7/7 commits from #169230 on behalf of @tbg.


Add a structured proto-based snapshot of mmaprototype.clusterState for
diagnostics and offline analysis (mma-investigator-style debugging, regression
record/replay). The snapshot is produced via clusterState.Snapshot() and
serializes to proto binary, jsonpb, or text — consumers pick.

A new sibling package mmasnappb owns the proto schema. Only types declared
in the mmaprototype package get re-declared as proto messages; shared types
from roachpb are imported and used as-is. Scratch fields, runtime injections
(clocks, interface back-refs), and pure caches are intentionally omitted.

Coverage is enforced by two reflect-walk tests:

  • TestSnapshotCoversAllFields: every declared field of every registered
    owned struct must either map to a proto field or carry an explicit
    OmitReason.
  • TestSnapshotDispositionsCoverReachable: every owned struct reachable from
    clusterState through non-omitted fields must be registered. Together
    these prevent new state added to the snapshotted graph from silently
    slipping out of coverage.

A protoutil round-trip smoke test exercises the converter end-to-end on a
small hand-built clusterState.

The asim datadriven harness writes an n1 snapshot per eval directive
under pkg/kv/kvserver/asim/tests/testdata/generated/sma/rebalancing_qps
when run with -rewrite, giving representative real-world snapshots
checked into the tree.

Epic: CRDB-56265
Release note: None


Release justification: low-risk obs improvement for default-off functionality

tbg and others added 7 commits April 28, 2026 09:52
Introduce a structured snapshot of the in-memory clusterState used by the
multi-metric allocator. The snapshot is intended for diagnostics and offline
analysis (mma-investigator-style debugging, regression record/replay) rather
than for round-tripping live state.

This commit lays down the scaffolding:

- A new sibling package mmasnappb owns the proto schema. Standing up a fresh
  proto package up front lets subsequent commits grow the schema in isolation
  without churning mmaprototype's BUILD or imports each time.
- clusterState.Snapshot returns a *mmasnappb.ClusterStateSnapshot. Today it
  populates only the four top-level scalars (mma id, two disk-utilization
  thresholds, and the change-id sequence generator). Subsequent commits in
  this arc fill in nodes, stores, ranges, pending changes, the constraint
  matcher, and the locality interner.
- A reflect-walk test (TestSnapshotCoversAllFields) requires every declared
  field of every owned struct registered in snapshotFieldDispositions to
  either map to a proto field or carry an explicit OmitReason. Adding a new
  field anywhere in the registered graph fails the test until the author
  decides whether the field belongs in the snapshot. Fields not yet covered
  by the proto are tagged with "TODO(mma-snapshot)" omission reasons that
  follow-up commits will replace.

Only clusterState itself is registered in this commit. Subsequent commits
add nodeState, storeState, rangeState, pendingReplicaChange, and the
constraint/interner types as their proto messages and converters land.

Epic: none
Release note: None
Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
Extend the cluster-state snapshot proto and converter to cover nodeState
and storeState (including the embedded storeLoad,
storeAttributesAndLocalityWithNodeTier, localityTiers, and the anonymous
adjusted struct).

Notable choices:

- The reflect-walk test registers every owned struct now reachable from
  clusterState (nodeState, NodeLoad, storeState, storeLoad,
  storeAttributesAndLocalityWithNodeTier, the unnamed adjusted struct,
  Status, Disposition, ReplicaState/IDAndType/Type, localityTiers,
  topKReplicas, replicaLoad). Each of their declared fields either maps to
  a generated proto field or carries an explicit OmitReason. The unnamed
  storeState.adjusted struct has no Go-level name, so its reflect.Type is
  pulled from a zero storeState at package init.

- Where an embedded source struct is flattened into a single proto message
  (ReplicaIDAndType and ReplicaType into ReplicaState), the embedded field
  carries an OmitReason that points at the disposition for the embedded
  type, and the embedded type's fields get the ProtoField mapping. This
  keeps the test's "field exists on proto" check honest while avoiding
  artificial nested proto messages.

- LoadVector and SecondaryLoadVector are fixed-length arrays of LoadValue.
  Proto has no fixed arrays, so the snapshot uses repeated int64 with the
  documented dimension order. Casting the slice element type back to
  mmaprototype.LoadValue would create a proto-package import cycle, so the
  proto stays at int64 and the converter does the cast.

- topKReplicas.replicaHeap is a transient construction-time heap, not
  state, and is omitted with that reason. Its ordered output (replicas)
  is what the snapshot carries.

- storeState.adjusted.loadPendingChanges is encoded as a list of changeID
  uint64s referencing the canonical clusterState.pendingChanges map. That
  canonical map is still TODO and lands in the next commit; the IDs will
  resolve once it does.

clusterState.ranges, pendingChanges, the constraint matcher, and the
locality interner remain TODO and are addressed by subsequent commits in
this arc.

Epic: none
Release note: None
Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
Extend the cluster-state snapshot to cover rangeState and the canonical
clusterState.pendingChanges map.

The snapshot now carries a single canonical PendingReplicaChange entry per
change, keyed by changeID at the top level. Per-store
(StoreAdjusted.LoadPendingChangeIds) and per-range
(RangeSnapshot.PendingChangeIds) references store only the changeID; readers
can resolve them through the canonical map. This matches how clusterState
denormalizes a single pendingReplicaChange across three locations
(top-level map, storeState.adjusted.loadPendingChanges, and
rangeState.pendingChanges) without duplicating the payload on the wire.

To make ReplicaIDAndType available standalone (it is the type of
ReplicaChange.next), the proto representation of replica state is
restructured: ReplicaState now nests a ReplicaIDAndType message, and
ReplicaIDAndType nests a ReplicaType message, mirroring the source struct
hierarchy. The previous flattened ReplicaState shape introduced in the
nodes/stores commit is replaced. The reflect-walk dispositions for
ReplicaState/ReplicaIDAndType/ReplicaType are simplified accordingly: each
source struct now maps to its own proto message, so no field needs an
"embedded; flattened" omission reason.

rangeState.conf and rangeState.constraints remain TODO; they belong to the
constraint-matcher subsystem and land with the matcher proto in the next
commit. clusterState.constraintMatcher and localityTierInterner are still
unsnapshotted for the same reason.

Epic: none
Release note: None
Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
Add a structural reachability test that prevents future additions to
clusterState from silently slipping out of snapshot coverage, plus a small
end-to-end protobuf round-trip smoke test.

TestSnapshotDispositionsCoverReachable walks the source-side type graph
starting at clusterState through every non-omitted field and asserts that
any owned struct it discovers has an entry in snapshotFieldDispositions.
Combined with the existing per-field check in TestSnapshotCoversAllFields,
this means a new owned struct cannot enter the snapshotted graph without
either being registered or having the field that pulls it in marked
omitted.

TestSnapshotProtoRoundTrip builds a tiny clusterState, snapshots it, and
round-trips the result through the protobuf wire format via protoutil. This
catches proto registration / wire-format regressions and exercises the
converter end-to-end. Diagnostic consumers may render via jsonpb or text
on top; those paths are exercised by the consumer code.

The per-type proto target check in TestSnapshotCoversAllFields now skips
types that have no ProtoField entries (i.e. fully omitted, transformed
types), so registering them does not require inventing a placeholder
proto target.

The clusterState.constraintMatcher and clusterState.localityTierInterner
fields graduate from TODO omissions to permanent omissions: the matcher
is a derived index over StoreAttributes (already in StoreSnapshot) and
the constraint patterns referenced by snapshotted span configs (TODO in a
follow-up); the interner is a dedup table whose codes are resolved to
plain strings at snapshot time, so no consumer ever needs the table.

Epic: none
Release note: None
Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
Snapshot rangeState.conf — the (normalized) span config the allocator uses to
make all constraint-relevant decisions for the range. Without it, a snapshot
could not explain why MMA targets any particular constraint outcome, defeating
much of the diagnostic value.

The conf is un-interned back into a roachpb.SpanConfig (only the fields
normalizedSpanConfig actually carries — NumVoters, NumReplicas, Constraints,
VoterConstraints, LeasePreferences) and stored in a top-level
ClusterStateSnapshot.SpanConfigs map. Each RangeSnapshot carries just a
uint32 ConfID into that map.

Deduplication is required because each RangeMsg constructs a fresh
*normalizedSpanConfig (so pointer equality is useless), and on real clusters
most ranges share the same span config — inlining the full conf per range
would balloon the snapshot 100x. The dedup registry keys on the marshaled
proto bytes; structurally identical configs share an id. ConfID=0 is reserved
for "no span config attached" (rangeState.conf was nil).

rangeState.constraints (the rangeAnalyzedConstraints cache) is now
permanently omitted with a "derivable from conf + replicas via
analyzeConstraints" reason, paralleling meansMemo's treatment.
normalizedSpanConfig is registered in snapshotFieldDispositions with all
fields OmitReason so the reachability walk recognizes it as covered without
inventing per-field proto mappings (its content is folded into the
SpanConfigs table).

A new TestSnapshotSpanConfigDedup verifies the property end-to-end: three
ranges with two distinct configs produce a SpanConfigs table with two
entries, and the two ranges with structurally identical configs share a
ConfID.

Epic: none
Release note: None
Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
Wrap the public Snapshot entry point with a deferred recover that converts
any panic — including non-error panics like panic("string") — into an
assertion-failure error. Snapshot is purely read-only with respect to
clusterState, so dropping the in-flight proto and returning an error on
panic is safe and leaves cs untouched.

This guards the server against bugs in the snapshot conversion path. The
snapshot is invoked by diagnostics surfaces (debug zip, status RPCs,
mma-investigator); a bug in a converter must not be able to tank a
production node. errorutil.MaybeCatchPanic was unsuitable because it
re-panics on non-error panic values, of which mmaprototype has many
(panic(fmt.Sprintf(...)), panic("unimplemented"), etc.).

Snapshot now returns (*mmasnappb.ClusterStateSnapshot, error). The body is
factored into an unexported snapshot() method so the recover wraps a single
call rather than the entire function body.

A new TestSnapshotRecoversFromPanic sabotages cs.localityTierInterner to
force a deterministic nil-pointer panic inside snapshotLocalityTiers and
asserts that the panic surfaces as an error rather than crashing.

Epic: none
Release note: None
Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
Wire the MMA cluster state snapshot through the asim datadriven test
harness so each `eval` directive that runs with -rewrite produces one
per-node snapshot JSON alongside the existing metric and trace artifacts.
This gives us a representative real-world snapshot to inspect (for
mma-investigator, dashboards, debug-zip consumers) and lets the asimview
tool consume the same proto schema that production debug-zip produces.

Changes:

- mmaprototype.Allocator gains ClusterStateSnapshot() (returning the
  proto snapshot or any panic recovered as an error). The implementation
  on allocatorState acquires the existing mutex around cs.Snapshot, since
  Snapshot reads cs.nodes and cs.stores. With the recover already inside
  Snapshot, callers see panics as errors.

- The datadriven test, after each simulator.RunSim, iterates
  state.Nodes() and writes each node's snapshot to
  {plotDir}/{testName}_{sample}_n{nodeID}_mma_snapshot.json via jsonpb
  (pretty-printed). Writes are gated on the existing -rewrite flag, the
  same as the metric and setup writes.

- testdata/generated/.gitignore is extended to track only
  *_n1_mma_snapshot.json under sma/rebalancing_qps; per-node snapshots
  for n2..n7 are still written locally but kept out of the PR to avoid
  bloat (n1 alone is sufficient as a representative sample). Total
  tracked addition is ~60 KiB across 4 files (sma-count and mma-only,
  samples 1 and 2).

The rebalancing_qps test that produces the tracked artifacts is gated on
COCKROACH_RUN_ASIM_TESTS=true, matching the existing skip_under_ci
treatment of that test.

Epic: none
Release note: None
Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
@tbg tbg force-pushed the blathers/backport-release-26.2-169230 branch from 58b5784 to d323c15 Compare May 5, 2026 01:39
@tbg tbg requested review from a team as code owners May 5, 2026 01:39
@blathers-crl blathers-crl Bot added blathers-backport This is a backport that Blathers created automatically. O-robot Originated from a bot. labels May 5, 2026
@trunk-io
Copy link
Copy Markdown
Contributor

trunk-io Bot commented May 5, 2026

😎 Merged directly without going through the merge queue, as the queue was empty and the PR was up to date with the target branch - details.

@blathers-crl blathers-crl Bot requested a review from wenyihu6 May 5, 2026 01:39
@blathers-crl
Copy link
Copy Markdown

blathers-crl Bot commented May 5, 2026

Thanks for opening a backport.

Before merging, please confirm that the change does not break backwards compatibility and otherwise complies with the backport policy. Include a brief release justification in the PR description explaining why the backport is appropriate. All backports must be reviewed by the TL for the owning area. While the stricter LTS policy does not yet apply, please exercise judgment and consider gating non-critical changes behind a disabled-by-default feature flag when appropriate.

@blathers-crl blathers-crl Bot added backport Label PR's that are backports to older release branches T-admission-control Admission Control labels May 5, 2026
@blathers-crl
Copy link
Copy Markdown

blathers-crl Bot commented May 5, 2026

Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@blathers-crl
Copy link
Copy Markdown

blathers-crl Bot commented May 5, 2026

Detected infrastructure failure (matched: self-hosted runner lost communication with the server). Automatically rerunning failed jobs. (run link)

@stevendanna
Copy link
Copy Markdown
Collaborator

/trunk merge

@trunk-io trunk-io Bot merged commit 2a3adbe into cockroachdb:release-26.2 May 5, 2026
33 of 34 checks passed
tbg added a commit to 5hubh4m/cockroach that referenced this pull request May 8, 2026
7 already backported via separate release PRs (cockroachdb#169344, cockroachdb#169590, cockroachdb#169711,
cockroachdb#169734, cockroachdb#169742, cockroachdb#169761, cockroachdb#169876) but invisible to script because the
cherry-picked commits don't reference the original PR number.

7 are unrelated to MMA (admission/AC, sql, kvserver/storage, ci,
roachtest/perturbation).

2 are master-and-onward only by intent: cockroachdb#169430 (enable MMA by default in
v26.3) and cockroachdb#169669 (retire load_based_lease_rebalancing setting).

Net: 0 PRs need backporting to release-26.2.

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport Label PR's that are backports to older release branches blathers-backport This is a backport that Blathers created automatically. O-robot Originated from a bot. T-admission-control Admission Control target-release-26.2.1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants