release-26.2: mmaprototype: add structured snapshot of cluster state#169711
Conversation
Introduce a structured snapshot of the in-memory clusterState used by the multi-metric allocator. The snapshot is intended for diagnostics and offline analysis (mma-investigator-style debugging, regression record/replay) rather than for round-tripping live state. This commit lays down the scaffolding: - A new sibling package mmasnappb owns the proto schema. Standing up a fresh proto package up front lets subsequent commits grow the schema in isolation without churning mmaprototype's BUILD or imports each time. - clusterState.Snapshot returns a *mmasnappb.ClusterStateSnapshot. Today it populates only the four top-level scalars (mma id, two disk-utilization thresholds, and the change-id sequence generator). Subsequent commits in this arc fill in nodes, stores, ranges, pending changes, the constraint matcher, and the locality interner. - A reflect-walk test (TestSnapshotCoversAllFields) requires every declared field of every owned struct registered in snapshotFieldDispositions to either map to a proto field or carry an explicit OmitReason. Adding a new field anywhere in the registered graph fails the test until the author decides whether the field belongs in the snapshot. Fields not yet covered by the proto are tagged with "TODO(mma-snapshot)" omission reasons that follow-up commits will replace. Only clusterState itself is registered in this commit. Subsequent commits add nodeState, storeState, rangeState, pendingReplicaChange, and the constraint/interner types as their proto messages and converters land. Epic: none Release note: None Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
Extend the cluster-state snapshot proto and converter to cover nodeState and storeState (including the embedded storeLoad, storeAttributesAndLocalityWithNodeTier, localityTiers, and the anonymous adjusted struct). Notable choices: - The reflect-walk test registers every owned struct now reachable from clusterState (nodeState, NodeLoad, storeState, storeLoad, storeAttributesAndLocalityWithNodeTier, the unnamed adjusted struct, Status, Disposition, ReplicaState/IDAndType/Type, localityTiers, topKReplicas, replicaLoad). Each of their declared fields either maps to a generated proto field or carries an explicit OmitReason. The unnamed storeState.adjusted struct has no Go-level name, so its reflect.Type is pulled from a zero storeState at package init. - Where an embedded source struct is flattened into a single proto message (ReplicaIDAndType and ReplicaType into ReplicaState), the embedded field carries an OmitReason that points at the disposition for the embedded type, and the embedded type's fields get the ProtoField mapping. This keeps the test's "field exists on proto" check honest while avoiding artificial nested proto messages. - LoadVector and SecondaryLoadVector are fixed-length arrays of LoadValue. Proto has no fixed arrays, so the snapshot uses repeated int64 with the documented dimension order. Casting the slice element type back to mmaprototype.LoadValue would create a proto-package import cycle, so the proto stays at int64 and the converter does the cast. - topKReplicas.replicaHeap is a transient construction-time heap, not state, and is omitted with that reason. Its ordered output (replicas) is what the snapshot carries. - storeState.adjusted.loadPendingChanges is encoded as a list of changeID uint64s referencing the canonical clusterState.pendingChanges map. That canonical map is still TODO and lands in the next commit; the IDs will resolve once it does. clusterState.ranges, pendingChanges, the constraint matcher, and the locality interner remain TODO and are addressed by subsequent commits in this arc. Epic: none Release note: None Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
Extend the cluster-state snapshot to cover rangeState and the canonical clusterState.pendingChanges map. The snapshot now carries a single canonical PendingReplicaChange entry per change, keyed by changeID at the top level. Per-store (StoreAdjusted.LoadPendingChangeIds) and per-range (RangeSnapshot.PendingChangeIds) references store only the changeID; readers can resolve them through the canonical map. This matches how clusterState denormalizes a single pendingReplicaChange across three locations (top-level map, storeState.adjusted.loadPendingChanges, and rangeState.pendingChanges) without duplicating the payload on the wire. To make ReplicaIDAndType available standalone (it is the type of ReplicaChange.next), the proto representation of replica state is restructured: ReplicaState now nests a ReplicaIDAndType message, and ReplicaIDAndType nests a ReplicaType message, mirroring the source struct hierarchy. The previous flattened ReplicaState shape introduced in the nodes/stores commit is replaced. The reflect-walk dispositions for ReplicaState/ReplicaIDAndType/ReplicaType are simplified accordingly: each source struct now maps to its own proto message, so no field needs an "embedded; flattened" omission reason. rangeState.conf and rangeState.constraints remain TODO; they belong to the constraint-matcher subsystem and land with the matcher proto in the next commit. clusterState.constraintMatcher and localityTierInterner are still unsnapshotted for the same reason. Epic: none Release note: None Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
Add a structural reachability test that prevents future additions to clusterState from silently slipping out of snapshot coverage, plus a small end-to-end protobuf round-trip smoke test. TestSnapshotDispositionsCoverReachable walks the source-side type graph starting at clusterState through every non-omitted field and asserts that any owned struct it discovers has an entry in snapshotFieldDispositions. Combined with the existing per-field check in TestSnapshotCoversAllFields, this means a new owned struct cannot enter the snapshotted graph without either being registered or having the field that pulls it in marked omitted. TestSnapshotProtoRoundTrip builds a tiny clusterState, snapshots it, and round-trips the result through the protobuf wire format via protoutil. This catches proto registration / wire-format regressions and exercises the converter end-to-end. Diagnostic consumers may render via jsonpb or text on top; those paths are exercised by the consumer code. The per-type proto target check in TestSnapshotCoversAllFields now skips types that have no ProtoField entries (i.e. fully omitted, transformed types), so registering them does not require inventing a placeholder proto target. The clusterState.constraintMatcher and clusterState.localityTierInterner fields graduate from TODO omissions to permanent omissions: the matcher is a derived index over StoreAttributes (already in StoreSnapshot) and the constraint patterns referenced by snapshotted span configs (TODO in a follow-up); the interner is a dedup table whose codes are resolved to plain strings at snapshot time, so no consumer ever needs the table. Epic: none Release note: None Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
Snapshot rangeState.conf — the (normalized) span config the allocator uses to make all constraint-relevant decisions for the range. Without it, a snapshot could not explain why MMA targets any particular constraint outcome, defeating much of the diagnostic value. The conf is un-interned back into a roachpb.SpanConfig (only the fields normalizedSpanConfig actually carries — NumVoters, NumReplicas, Constraints, VoterConstraints, LeasePreferences) and stored in a top-level ClusterStateSnapshot.SpanConfigs map. Each RangeSnapshot carries just a uint32 ConfID into that map. Deduplication is required because each RangeMsg constructs a fresh *normalizedSpanConfig (so pointer equality is useless), and on real clusters most ranges share the same span config — inlining the full conf per range would balloon the snapshot 100x. The dedup registry keys on the marshaled proto bytes; structurally identical configs share an id. ConfID=0 is reserved for "no span config attached" (rangeState.conf was nil). rangeState.constraints (the rangeAnalyzedConstraints cache) is now permanently omitted with a "derivable from conf + replicas via analyzeConstraints" reason, paralleling meansMemo's treatment. normalizedSpanConfig is registered in snapshotFieldDispositions with all fields OmitReason so the reachability walk recognizes it as covered without inventing per-field proto mappings (its content is folded into the SpanConfigs table). A new TestSnapshotSpanConfigDedup verifies the property end-to-end: three ranges with two distinct configs produce a SpanConfigs table with two entries, and the two ranges with structurally identical configs share a ConfID. Epic: none Release note: None Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
Wrap the public Snapshot entry point with a deferred recover that converts
any panic — including non-error panics like panic("string") — into an
assertion-failure error. Snapshot is purely read-only with respect to
clusterState, so dropping the in-flight proto and returning an error on
panic is safe and leaves cs untouched.
This guards the server against bugs in the snapshot conversion path. The
snapshot is invoked by diagnostics surfaces (debug zip, status RPCs,
mma-investigator); a bug in a converter must not be able to tank a
production node. errorutil.MaybeCatchPanic was unsuitable because it
re-panics on non-error panic values, of which mmaprototype has many
(panic(fmt.Sprintf(...)), panic("unimplemented"), etc.).
Snapshot now returns (*mmasnappb.ClusterStateSnapshot, error). The body is
factored into an unexported snapshot() method so the recover wraps a single
call rather than the entire function body.
A new TestSnapshotRecoversFromPanic sabotages cs.localityTierInterner to
force a deterministic nil-pointer panic inside snapshotLocalityTiers and
asserts that the panic surfaces as an error rather than crashing.
Epic: none
Release note: None
Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
Wire the MMA cluster state snapshot through the asim datadriven test
harness so each `eval` directive that runs with -rewrite produces one
per-node snapshot JSON alongside the existing metric and trace artifacts.
This gives us a representative real-world snapshot to inspect (for
mma-investigator, dashboards, debug-zip consumers) and lets the asimview
tool consume the same proto schema that production debug-zip produces.
Changes:
- mmaprototype.Allocator gains ClusterStateSnapshot() (returning the
proto snapshot or any panic recovered as an error). The implementation
on allocatorState acquires the existing mutex around cs.Snapshot, since
Snapshot reads cs.nodes and cs.stores. With the recover already inside
Snapshot, callers see panics as errors.
- The datadriven test, after each simulator.RunSim, iterates
state.Nodes() and writes each node's snapshot to
{plotDir}/{testName}_{sample}_n{nodeID}_mma_snapshot.json via jsonpb
(pretty-printed). Writes are gated on the existing -rewrite flag, the
same as the metric and setup writes.
- testdata/generated/.gitignore is extended to track only
*_n1_mma_snapshot.json under sma/rebalancing_qps; per-node snapshots
for n2..n7 are still written locally but kept out of the PR to avoid
bloat (n1 alone is sufficient as a representative sample). Total
tracked addition is ~60 KiB across 4 files (sma-count and mma-only,
samples 1 and 2).
The rebalancing_qps test that produces the tracked artifacts is gated on
COCKROACH_RUN_ASIM_TESTS=true, matching the existing skip_under_ci
treatment of that test.
Epic: none
Release note: None
Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
58b5784 to
d323c15
Compare
|
😎 Merged directly without going through the merge queue, as the queue was empty and the PR was up to date with the target branch - details. |
|
Thanks for opening a backport. Before merging, please confirm that the change does not break backwards compatibility and otherwise complies with the backport policy. Include a brief release justification in the PR description explaining why the backport is appropriate. All backports must be reviewed by the TL for the owning area. While the stricter LTS policy does not yet apply, please exercise judgment and consider gating non-critical changes behind a disabled-by-default feature flag when appropriate. |
|
Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
|
Detected infrastructure failure (matched: self-hosted runner lost communication with the server). Automatically rerunning failed jobs. (run link) |
|
/trunk merge |
7 already backported via separate release PRs (cockroachdb#169344, cockroachdb#169590, cockroachdb#169711, cockroachdb#169734, cockroachdb#169742, cockroachdb#169761, cockroachdb#169876) but invisible to script because the cherry-picked commits don't reference the original PR number. 7 are unrelated to MMA (admission/AC, sql, kvserver/storage, ci, roachtest/perturbation). 2 are master-and-onward only by intent: cockroachdb#169430 (enable MMA by default in v26.3) and cockroachdb#169669 (retire load_based_lease_rebalancing setting). Net: 0 PRs need backporting to release-26.2. Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
Backport 7/7 commits from #169230 on behalf of @tbg.
Add a structured proto-based snapshot of
mmaprototype.clusterStatefordiagnostics and offline analysis (mma-investigator-style debugging, regression
record/replay). The snapshot is produced via
clusterState.Snapshot()andserializes to proto binary, jsonpb, or text — consumers pick.
A new sibling package
mmasnappbowns the proto schema. Only types declaredin the
mmaprototypepackage get re-declared as proto messages; shared typesfrom
roachpbare imported and used as-is. Scratch fields, runtime injections(clocks, interface back-refs), and pure caches are intentionally omitted.
Coverage is enforced by two reflect-walk tests:
TestSnapshotCoversAllFields: every declared field of every registeredowned struct must either map to a proto field or carry an explicit
OmitReason.
TestSnapshotDispositionsCoverReachable: every owned struct reachable fromclusterStatethrough non-omitted fields must be registered. Togetherthese prevent new state added to the snapshotted graph from silently
slipping out of coverage.
A
protoutilround-trip smoke test exercises the converter end-to-end on asmall hand-built
clusterState.The asim datadriven harness writes an n1 snapshot per
evaldirectiveunder
pkg/kv/kvserver/asim/tests/testdata/generated/sma/rebalancing_qpswhen run with -rewrite, giving representative real-world snapshots
checked into the tree.
Epic: CRDB-56265
Release note: None
Release justification: low-risk obs improvement for default-off functionality