diff --git a/docs/design/2026_04_28_proposed_keyviz_adapter_labels.md b/docs/design/2026_04_28_proposed_keyviz_adapter_labels.md new file mode 100644 index 00000000..88ad5e4e --- /dev/null +++ b/docs/design/2026_04_28_proposed_keyviz_adapter_labels.md @@ -0,0 +1,891 @@ +--- +status: proposed +phase: 2-C+ +parent_design: docs/admin_ui_key_visualizer_design.md +author: bootjp +date: 2026-04-28 +--- + +# KeyViz adapter / namespace labels + +## 1. Background + +Phase 2-A through 2-C ship a fully functional KeyViz heatmap, but +the smallest unit of attribution is a **Raft route** — a contiguous +key range owned by one group. When multiple adapters share a route +(the default single-group config has every adapter writing into a +single `[-∞, +∞)` route), the heatmap shows one row with the +combined traffic and the operator cannot tell whether the spike +came from DynamoDB, Redis, S3, SQS, or RawKV. + +A user observed this in production: 6-node cluster, fan-out +returning a single row at `route:1` with `Total = 378` writes, +covering all five adapters indistinguishably. + +The hotspot-shard-split design and the multi-group startup flags +(`--raftRedisMap` / `--raftDynamoMap` / `--raftS3Map` / +`--raftSqsMap`) already give operators a way to split traffic +across distinct Raft groups so per-route attribution becomes +per-adapter attribution. But that is an **operational** workaround: +single-group deployments — the most common shape for first-time +operators and small clusters — still get the all-traffic-in-one-row +view. + +This proposal adds an **independent label dimension** to the +sampler so a single Raft group can still surface per-adapter +breakdown in the heatmap. + +## 2. Goals and non-goals + +### 2.1 Goals + +- Attribution **inside a single route**: a row that today reads + `route:1, total=378` should optionally split into sub-rows like + `route:1 / dynamo`, `route:1 / s3`, `route:1 / redis`, … +- Minimal hot-path penalty when labels are **not** configured: a + default deployment incurs **no additional lookups and no + additional allocations**, with only the slightly wider hash + and equality work that comes from the `slotKey{uint64, Label}` + map key compared to today's bare `uint64`. (Copilot + round-15-4th-pass observed that the original "zero penalty" + framing was overstated — the key shape widens regardless of + whether labels are enabled.) +- Adapter-side wiring is the natural place to set labels (every + adapter already has the dispatch entry into + `ShardedCoordinator.Observe…` — see `kv/sharded_coordinator.go`). + No global key-prefix table on the operator side. +- Cluster fan-out merges per-(route, label) cells, not just + per-route — operators see the same per-adapter breakdown across + the whole cluster. + +### 2.2 Non-goals (deferred) + +- **Free-form custom labels** (per-table, per-bucket, per-queue). + This proposal stops at one level: the adapter family. A + follow-up can add a second dimension (e.g. `dynamo / users`, + `dynamo / orders`) once the wire format settles. +- **Persistence**. Labels are in-memory like the rest of the + Phase 2 sampler. +- **Per-key-byte attribution**. Sub-route classification is done + at the dispatch entry (the adapter knows its identity); we do + not attempt to classify by inspecting the key bytes. +- **Backwards-incompatible wire-format changes**. The label is + added as an optional field; old SPAs against new servers keep + working. + +## 3. Surface + +The user-visible delta is one extra label on every heatmap row. +Default config emits the empty label (legacy behaviour); when +adapters set labels, the heatmap splits a single route into one +row per (route, label) pair: + +```text +Today (single group, no labels): + Row 0 route:1 Total 378 + +After (adapters tag their traffic): + Row 0 route:1 / dynamo Total 142 + Row 1 route:1 / redis Total 200 + Row 2 route:1 / sqs Total 18 + Row 3 route:1 / s3 Total 11 + Row 4 route:1 / rawkv Total 7 +``` + +## 4. Options for label propagation + +Four ways to get the label from the adapter to the sampler. Pick +one based on hot-path cost vs. plumbing weight. + +### 4.1 (Recommended) Per-Observe label + +Extend the sampler signature with a typed `Label` (defined as +`type Label string` in `keyviz/labels.go`, see §9 Q2): + +```go +Observe(routeID uint64, op Op, keyLen, valueLen int, label Label) +``` + +Empty `label` (`LabelLegacy`) → existing behaviour, single row +per route. Adapters set their own canonical constant +(`LabelDynamo`, `LabelRedis`, …) at the dispatch site they +already own — typed constants prevent typos like `"DynamoDB"` / +`"dynamo"` drift at compile time. + +Cost on hot path: still one `slots` map lookup per `Observe` on +the hit path (the existing code already does a single +`tbl.slots[routeID]` lookup, with a fallback into +`virtualForRoute` only on miss). What changes is the **key +shape**: `(routeID, label)` instead of just `routeID`, so the +map key is a `{uint64, Label}` struct — slightly wider hash and +equality work per lookup. `label` is a small interned constant +(`"dynamo"`), so the lookup itself allocates nothing. The miss +rate is unchanged because PR-C+D+E pre-creates a slot for every +member of `AllLabels` at `RegisterRoute` time, so labeled +`Observe` calls hit the same way today's unlabeled calls do. +(Copilot round-15 caught the earlier "one extra map lookup" +phrasing — the lookup count is unchanged; only the key shape +widens.) + +Storage: each (route, label) gets its own `routeSlot` **when +labels are enabled** (`--keyvizLabelsEnabled=true`). Total slot +memory is `MaxTrackedRoutes × (len(AllLabels)+1)` with labels +enabled, and `MaxTrackedRoutes × 1` with labels disabled (legacy +slot only — same number of slots as today's baseline). The +pre-allocation cost is gated on the flag, so a cluster that +deploys the PR-C+D+E binary but leaves the flag off avoids the +**multiplicative** `len(AllLabels)+1` slot-count increase, though +each slot's map key is `slotKey{uint64, Label}` (a struct with a +string header) instead of a bare `uint64`, so there is a small +per-slot overhead vs today even with the flag off. The dominant +memory cost — the multiplicative slot count — is the part the +flag actually gates (Codex round-15-4th-pass P1; Copilot round-23 +caught the "no memory penalty" overstatement). +`MaxTrackedRoutes` retains its route-counting semantics under +Option A (§4.1.1), so no operator action is required at PR-C+D+E +rollout. (Vestigial "when operators raise it for label use" +framing was rejected — Codex round-10 P2 + Claude bot round-11.) + +#### 4.1.1 Slot lifecycle and the lockless invariant + +The current `Observe` is a lockless `Load → lookup → atomic.Add` +hot path. On a miss in `tbl.slots[...]`, it may fall back to +`tbl.virtualForRoute[...]`; if both lookups miss, the sample is +silently dropped. The invariant is **not** "no fallback" — it is +"`Observe` never allocates and never creates slots on the hot +path". `RegisterRoute` pre-creates every slot before traffic +arrives, so the hot path never needs to allocate. We keep that +invariant by making `RegisterRoute` **pre-create one slot per +known label** at registration time when `--keyvizLabelsEnabled=true`, +but only on the individual-tracking path. (Copilot round-15-3rd-pass +caught the earlier "no fallback" wording — the fallback exists today +and must be preserved; what stays unchanged is the no-allocate +property.) The label set is the canonical `keyviz/labels.go` +constants (§9 Q2); per-route slot count is `len(labels) + 1` when +labels are enabled (the +1 is the empty-label legacy slot, kept so +callers that pass `label=""` still hit a slot) and `1` (legacy slot +only) when labels are disabled. Total individual slots in the table +is therefore `len(routes) × (len(labels) + 1)` with labels enabled +and `len(routes) × 1` with labels disabled. (Codex round-15-4th-pass +P1 caught the earlier framing that pre-allocated labeled siblings +even when the flag was off — that imposed an unnecessary multi-x +memory penalty on clusters that never enabled labels.) + +#### Pre-allocation is gated on PR-C, not PR-B + +PR-B alone changes the slot-key type from `uint64` to `slotKey` +but still only pre-creates the **legacy empty-label slot**. The +labeled siblings are created by PR-C, when adapters actually +start passing non-empty labels. This split is deliberate and +preserves PR-B as a behavior-neutral refactor: + +- After PR-B alone, slot count per route is **1** (just the + legacy slot, identical to today's `routeID`-only behavior). + A deployment running 1024 routes today still supports 1024 + routes after PR-B — no early coarsening, no virtual-bucket + fold, no per-operator action required. (Codex P1 on round-5.) +- After PR-C with `--keyvizLabelsEnabled=true`, slot count per + route grows to `len(labels) + 1`. Operators **do not** need to + raise `--keyvizMaxTrackedRoutes` — the §4.1.1 Option A + coarsening check divides slot count by `slotsPerRoute` (which + is `len(AllLabels)+1` when the flag is on, `1` when off), so + the operator-set cap continues to mean "individual routes." + (An earlier draft told operators to scale the cap; that + contradicted Option A. Codex round-10 P2.) +- After PR-C with `--keyvizLabelsEnabled=false` (the default + immediately after deploying the bundled binary), slot count per + route is **1** (legacy slot only) — `RegisterRoute` skips + labeled-sibling pre-creation, so memory matches today's + baseline. The flag-flip restart introduces the labeled siblings. + (Codex round-15-4th-pass P1.) + +Concretely PR-B's `RegisterRoute` body looks like: + +```go +// PR-B body: +slot := s.reclaimRetiredSlot(routeID, "") // (RouteID, Label) dedupe — PR-B widened the signature +if slot == nil { + slot = newSlot(routeID, "", start, end, groupID) +} +next.slots[slotKey{RouteID: routeID, Label: ""}] = slot + +// PR-C extends to (flag-gated, Codex round-15-4th-pass P1): +labelsToCreate := []Label{LabelLegacy} // legacy slot always +if s.opts.KeyVizLabelsEnabled { + labelsToCreate = allLabelsWithLegacy() // AllLabels ∪ {LabelLegacy}, fresh slice +} +for _, label := range labelsToCreate { + slot := s.reclaimRetiredSlot(routeID, label) // reclaim per (routeID, label), not just legacy + if slot == nil { + slot = newSlot(routeID, label, start, end, groupID) + } + next.slots[slotKey{RouteID: routeID, Label: label}] = slot +} +``` + +`allLabelsWithLegacy()` is an unexported helper in +**`keyviz/labels.go`** (same package as `sampler.go`) that returns +a **freshly allocated slice** containing `AllLabels` followed by +`LabelLegacy`. Concretely: `result := make([]Label, +len(AllLabels), len(AllLabels)+1); copy(result, AllLabels[:]); +result = append(result, LabelLegacy)` — pre-size the backing +array via the capacity hint so the final length is exactly +`len(AllLabels)+1` with one allocation. The explicit `AllLabels[:]` +re-slice is required because §9 Q2 declares `AllLabels` as a +fixed-size array (`[5]Label`) and Go's `copy` builtin needs slice +arguments. Naively returning `append(AllLabels[:], LabelLegacy)` +would be a footgun — Go's `append` against a slice-of-array may +share storage with the underlying array, which would then visibly +mutate the canonical `AllLabels` (silently overwriting the legacy +entry into the array's tail or, more commonly, panicking because +the array is full). Co-locating the helper with `AllLabels` keeps the +canonical set in one file and avoids `sampler.go` reconstructing +the same expansion at every call site. The function is +unexported because no caller outside `package keyviz` needs it. +(Claude bot round-9 minor; Copilot round-15-4th-pass added the +fresh-slice safety requirement; Copilot round-19 caught the +off-by-one in the prior "length len(AllLabels)+1, then append" +phrasing — that produced length `len(AllLabels)+2`. Now uses +length `len(AllLabels)`, capacity `len(AllLabels)+1`, then +append, yielding the intended length `len(AllLabels)+1`.) + +When `RegisterRoute` decides a route will be coarsened into a +virtual bucket (i.e. it returns `false`), **no labeled slots are +created** for that route — the `tbl.virtualForRoute[routeID]` +fallback in `Observe` already handles all labels uniformly per +§6.5 (virtual buckets always emit `Label = ""`). Pre-creating +labeled slots that would never be hit would just burn allocator +work. + +#### `MaxTrackedRoutes` semantics: count routes, not slots + +After the slot-key widens to `slotKey{uint64, Label}`, the live +table contains `len(AllLabels) + 1` slots per individually-tracked +route when `--keyvizLabelsEnabled=true` (and `1` slot per route +when the flag is off) — bare `len(next.slots)` no longer equals +"number of routes." We pick **Option A (route-counting)**: the +coarsening check at `keyviz/sampler.go:416` divides by +`slotsPerRoute`, which is `len(AllLabels) + 1` when the flag is +enabled and `1` when off. `MaxTrackedRoutes` continues to count +routes, not slots, in both regimes. This preserves the operator- +visible meaning of the existing flag — `--keyvizMaxTrackedRoutes=1024` +still means "1024 individual routes," not "1024 slots that may be +~170 routes." Total slot memory is `MaxTrackedRoutes × +slotsPerRoute` (i.e. `MaxTrackedRoutes × (len(AllLabels) + 1)` +when labels are enabled, `MaxTrackedRoutes × 1` when off), which +we document but never count against the cap. + +PR-B body (unconditional, no flag check — Codex round-26 P2): + +```go +// keyviz/sampler.go:416 — PR-B body +slotsPerRoute := 1 // legacy slot only; unconditional in PR-B +if len(next.slots) / slotsPerRoute < s.opts.MaxTrackedRoutes { + // ... pre-create slot(s) for this route +} +``` + +PR-C+D+E replaces the literal `1` with the flag-gated +conditional (Codex round-15-4th-pass P1): + +```go +// keyviz/sampler.go:416 — PR-C+D+E body (replaces PR-B body above) +slotsPerRoute := 1 // legacy slot only (when flag off) +if s.opts.KeyVizLabelsEnabled { + slotsPerRoute = len(AllLabels) + 1 // 6 with flag on (in-package, no keyviz. qualifier) +} +if len(next.slots) / slotsPerRoute < s.opts.MaxTrackedRoutes { + // ... pre-create slot(s) for this route +} +``` + +(Two separate code blocks — the dual `slotsPerRoute :=` form +in a single Go scope is a compile error, "no new variables on +left side of `:=`"; Claude round-27 minor caught the +single-block presentation.) + +The alternative (Option B: `MaxTrackedRoutes` redefined as +"slots") would silently halve the route capacity at PR-C +rollout — an operator setting 1024 today would suddenly track +~170 routes after PR-C deploys with five labels, even though +the flag value is unchanged. We reject this on the +"behavior-preserving for legacy single-group deployments" +goal in §2.1. (Claude bot moderate on PR #694 round-5.) + +New labels added after a route is registered are not auto-bound; +an operator deploying a new adapter must **deploy a new binary** +that includes the new constant (a process restart alone does +nothing — the canonical set lives at compile time). Matches the +current "ApplySplit / ApplyMerge in a future PR" semantics for +the static route catalog. + +**`virtualForRoute` stays `map[uint64]*routeSlot`** (no widening +to `slotKey`). Virtual buckets aggregate across labels per §6.5, +so the fallback lookup keys on `RouteID` alone — widening it to +`slotKey` would make every coarsened-route `Observe` miss both +the labeled lookup AND the virtual fallback, silently dropping +all traffic for those routes. The new `Observe` lookup chain is: + +```go +slot, ok := tbl.slots[slotKey{RouteID: routeID, Label: label}] +if !ok { + slot, ok = tbl.virtualForRoute[routeID] // coarsened route: label irrelevant, aggregate all traffic + if !ok { + return // unknown route — drop, same as today + } +} +// atomic.Add as today +``` + +`routeSlot` gains a `Label string` field so `Flush` / +`appendDrainedRow` can read the label off the slot rather than +threading it through a separate channel: + +```go +type routeSlot struct { + metaMu sync.RWMutex + RouteID uint64 + Label Label // new — empty for the legacy unlabeled slot, typed via keyviz/labels.go + GroupID uint64 // Phase 2-C+, see PR-3a + Start, End []byte + ... +} +``` + +`reclaimRetiredSlot` (`keyviz/sampler.go:710`) must dedupe on +`(RouteID, Label)`, not `RouteID` alone — a slot retired as +`(routeID, "dynamo")` and re-registered as `(routeID, "redis")` +is **not** a reclaim candidate; they are different slots. + +#### Symmetric teardown in `RemoveRoute` + +`RemoveRoute(routeID)` must be updated to remove **every** +`slotKey{routeID, label}` for `label ∈ AllLabels ∪ {""}` — not +just the legacy `(routeID, "")` entry. The current code at +`keyviz/sampler.go:532`: + +```go +delete(next.slots, routeID) +s.retiredSlots = append(s.retiredSlots, retiredSlot{slot: individual, retiredAt: retiredAt}) +``` + +is a `map[uint64]*routeSlot` delete; after key widening this +becomes a compile error. The PR-B fix loops over `AllLabels` +plus the empty legacy label and retires N+1 slots in one pass: + +```go +for _, label := range allLabelsWithLegacy() { // AllLabels ∪ {""} + key := slotKey{RouteID: routeID, Label: label} + if slot := next.slots[key]; slot != nil { + delete(next.slots, key) + s.retiredSlots = append(s.retiredSlots, retiredSlot{slot: slot, retiredAt: retiredAt}) + } +} +``` + +If only the legacy slot is retired, every labeled +`(routeID, "dynamo")` / `(routeID, "redis")` / … remains in the +live `routeTable` receiving Observe traffic forever — orphaned +slots accumulating until process restart. The §8 lens-5 test +"`RemoveRoute` retires N+1 slots" pins this. (Claude bot moderate +on PR #694 round-4.) + +#### `RegisterRoute` idempotency check after key widening + +The current idempotency guard is `if _, ok := cur.slots[routeID]` +which becomes a type mismatch after the key widens to `slotKey`. +PR-B replaces it with a check on the legacy slot: +`if _, ok := cur.slots[slotKey{RouteID: routeID}]; ok { return true }` +— the legacy slot's presence implies the labeled siblings were +already created by an earlier `RegisterRoute` call (the +pre-creation loop runs under the `routesMu` lock). + +Two alternatives we explicitly reject: + +- **First-seen miss-fall-back to empty label**: surprising silent + data-loss for the first Observe per (route, label) pair. Hard + to debug. +- **First-seen creates a slot under mutex inside `Observe`**: + breaks the zero-mutex hot-path contract. Non-starter. + +(Reviewer notes from PR #694: Claude bot critical, Gemini medium.) + +### 4.2 Per-adapter sampler instance + +Wire one `*MemSampler` per adapter, each with a fixed label. The +admin handler queries every sampler and concatenates the results. + +- Pro: zero hot-path code change in the existing sampler. +- Con: every adapter gets its own ring buffer, history, and + retention machinery. Memory is N× higher and per-route + metadata duplicates across samplers. + +### 4.3 Per-key-prefix taxonomy (operator-configured) + +Static `{prefix → label}` map registered at startup. Sampler +classifies each key at Observe time by prefix-matching. + +- Pro: no adapter wiring; works with any caller that goes through + `ShardedCoordinator`. +- Con: prefix-match per Observe is a hot-path cost, and the + taxonomy is a new operator-facing config the design has been + careful to avoid. + +### 4.4 Hash the adapter into the route catalog + +Make `distribution.Route` carry an adapter label and route by +adapter at the catalog layer. The sampler stays single-keyed. + +- Pro: solves attribution at the catalog level, where it actually + belongs. +- Con: the catalog is the wrong place for this — adapters share + Raft groups by design, and forcing `Route` to carry adapter + identity bakes a different separation into the route topology. + Reverses the multi-group startup-flag story. + +**Recommendation: Option 4.1.** Lowest plumbing weight, and the +label originates where it is most naturally available (the +adapter's dispatch entry). The hot path remains a **single** map +lookup per `Observe`; only the lookup key widens (so the change +is wider hash and equality work, not an additional lookup). The +`routeSlot` map shape changes from `map[uint64]*routeSlot` to +`map[slotKey]*routeSlot` with `slotKey = {uint64, Label}`. +(Copilot round-19 caught the earlier "smallest hot-path delta +(one map lookup)" wording, which could be misread as adding an +extra lookup.) + +## 5. Wire format extension + +Three layers gain a label field. They are kept in sync at PR-D+E +(both shipping together) so an aggregator running PR-D against an +upstream still on PR-C never sees a wire shape mismatch: + +```diff + // keyviz/sampler.go — internal type, available from PR-B. + // MatrixRow.Label uses the typed `keyviz.Label` so adapters + // referencing it via the `Sampler` interface can't accidentally + // hand-write a string literal that drifts from the canonical + // constant set. The proto and JSON shapes below use plain `string` + // because the wire form is operator-visible and we want to allow + // unknown labels through without a schema bump (an old binary + // returning a label a new SPA hasn't seen yet still renders). + type MatrixRow struct { + RouteID uint64 ++ Label Label // typed; see keyviz/labels.go + Start, End []byte + … + } + + // proto/admin.proto — wire form, PR-D+E. + // + // Reuse the **existing** field 4 (`string label`), which is + // already declared on KeyVizRow but currently unused and + // undocumented (no doc comment on the field) — no schema + // migration, no field-number bump. + // The current proto declares `string label = 4;` with no + // doc comment, and the nearby `bucket_id` comment only mentions + // the legacy `route:` / `virtual:` forms. PR-D+E starts + // filling the field AND updates the proto comments in the same + // commit (both the new label-field doc comment and the + // `bucket_id` comment so it lists the composite + // `route::