feat(main): keyviz wiring follow-up (--keyvizHistoryColumns + Phase-2 read TODO) #651
Round-2 review fixes for PR #647 (Claude bot):

- Add --keyvizHistoryColumns flag so operators can shorten the ring buffer for high-cardinality clusters; defaults to keyviz.DefaultHistoryColumns (1440 = 24h at 60s).
- startKeyVizFlusher now early-returns when the sampler is nil instead of spawning a goroutine that just parks on ctx.Done.
- Document the Phase-2 read-sampling milestone on observeMutation so future readers don't think the missing read path is a bug. The design (§5.1, §10) calls for read attribution from the node that serves the read (LeaseRead / LinearizableRead / follower reads), and that wiring is intentionally deferred to the next slice. Until then the Reads/ReadBytes series stay zero.
@claude review

Claude finished @bootjp's task in 1m 59s.

Code Review: PR #651

Summary: Clean, minimal follow-up. All three stated changes are correct. One minor test isolation gap worth fixing.
Code Review
This pull request introduces the keyvizHistoryColumns configuration flag to manage the KeyViz ring buffer size and optimizes the flusher by avoiding goroutine creation when the sampler is disabled. It also adds documentation regarding future read sampling support. A review comment suggests implementing an upper bound for the history columns flag to mitigate potential out-of-memory risks from large pre-allocations.
    keyvizStep = flag.Duration("keyvizStep", keyviz.DefaultStep, "Flush interval / matrix-column resolution for the keyviz sampler")
    keyvizMaxTrackedRoutes = flag.Int("keyvizMaxTrackedRoutes", keyviz.DefaultMaxTrackedRoutes, "Maximum routes tracked individually before excess routes coarsen into virtual buckets")
    keyvizMaxMemberRoutesPerSlot = flag.Int("keyvizMaxMemberRoutesPerSlot", keyviz.DefaultMaxMemberRoutesPerSlot, "Maximum members listed on a virtual bucket; excess routes still drive the bucket counters")
    keyvizHistoryColumns = flag.Int("keyvizHistoryColumns", keyviz.DefaultHistoryColumns, "Maximum matrix columns retained in the keyviz ring buffer (each column = one Step)")
While this flag allows operators to adjust the ring buffer size, setting an excessively large value could lead to significant memory allocation and potential OOM issues at startup, as the buffer is pre-allocated. Consider adding a reasonable upper bound (e.g., 100,000 columns) in buildKeyVizSampler or NewMemSampler to prevent accidental misconfiguration from crashing the node.
References
- To prevent unbounded memory growth and potential OOM issues, apply a fixed bound to collections that can grow from external requests or configuration changes.
Round-1 review fix for PR #651 (Gemini medium):

newRingBuffer pre-allocates a slice of capacity HistoryColumns at construction. A misconfiguration like --keyvizHistoryColumns=100000000 (operator typo) would reserve gigabytes upfront and likely OOM the node before the heatmap returned its first column.

Add MaxHistoryColumns = 100_000 (~70 days at 60s Step) and clamp opts.HistoryColumns in NewMemSampler so excessive values silently land at the cap instead of being trusted as-is.

The cap lives in the keyviz package (next to the existing Default* constants) because the risk is a property of the data structure, not the caller. Operators wanting longer retention should use the Phase 3 persistence path (per-Raft-group `!admin|keyviz|*` namespace), not a giant in-memory ring.

Test TestHistoryColumnsClampedAboveMax confirms both that above-cap input clamps to MaxHistoryColumns and that exactly-at-cap input is preserved.
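A minimal sketch of the clamp, assuming it is a simple bounds check applied before the ring buffer is allocated. The two constants take their values from this PR; `clampHistoryColumns` is a hypothetical helper name, and the fallback to the default for non-positive input is an assumption that the PR text does not confirm.

```go
package main

import "fmt"

const (
	// Values quoted in the PR discussion.
	DefaultHistoryColumns = 1440    // 24h at a 60s Step
	MaxHistoryColumns     = 100_000 // roughly 70 days at a 60s Step
)

// clampHistoryColumns is a hypothetical helper showing the bound that
// NewMemSampler is described as applying to opts.HistoryColumns.
func clampHistoryColumns(n int) int {
	switch {
	case n <= 0:
		// Assumption: non-positive input falls back to the default.
		return DefaultHistoryColumns
	case n > MaxHistoryColumns:
		// Excessive values silently land at the cap.
		return MaxHistoryColumns
	default:
		return n
	}
}

func main() {
	for _, n := range []int{0, 16, MaxHistoryColumns, 100_000_000} {
		fmt.Printf("%d -> %d\n", n, clampHistoryColumns(n))
	}
}
```

Clamping inside the constructor rather than at flag-parse time means every caller gets the bound, which matches the stated reasoning that the risk belongs to the data structure.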
@claude review

Claude finished @bootjp's task in 1m 31s.

Code Review: PR #651

Overall verdict: Approve ✓

All three stated goals are correctly implemented. The Gemini-raised OOM concern is fully addressed by the second commit. One pre-existing test isolation gap remains.
Round-2 review fix for PR #651 (Claude bot minor): withFlags helper saved/restored every keyviz flag except the new --keyvizHistoryColumns, so a future test mutating the flag through the helper would leak state across parallel tests. Add the slot, update both call sites, and pin the flag → buildKeyVizSampler → MemSamplerOptions.HistoryColumns forwarding by asserting s.HistoryColumns() in TestBuildKeyVizSamplerHonorsEnabledFlag. Also adds a HistoryColumns() accessor on *MemSampler paralleling Step() so wiring tests don't need to reach into the unexported opts struct.
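The save/restore pattern behind the withFlags helper might look roughly like this. It is a sketch only: the two package-level pointers stand in for the real flag variables (`keyvizEnabled` is a hypothetical name), and the helper's actual parameter list in the repository is not shown in this conversation.

```go
package main

import "fmt"

// Hypothetical stand-ins for the package-level flag pointers in main.
var (
	keyvizEnabled        = new(bool)
	keyvizHistoryColumns = new(int)
)

// withFlags overrides the flags, runs fn, and restores the previous values
// afterwards (even if fn panics). Every flag a test may mutate needs its own
// save/restore slot, otherwise the override leaks into later tests.
func withFlags(enabled bool, historyColumns int, fn func()) {
	prevEnabled, prevCols := *keyvizEnabled, *keyvizHistoryColumns
	defer func() {
		*keyvizEnabled, *keyvizHistoryColumns = prevEnabled, prevCols
	}()
	*keyvizEnabled, *keyvizHistoryColumns = enabled, historyColumns
	fn()
}

func main() {
	*keyvizHistoryColumns = 1440

	withFlags(true, 16, func() {
		fmt.Println("inside:", *keyvizEnabled, *keyvizHistoryColumns)
	})
	fmt.Println("after:", *keyvizEnabled, *keyvizHistoryColumns)
}
```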
@claude review

Claude finished @bootjp's task in 2m 4s.

Code Review: PR #651

Overall verdict: Approve ✓

All three issues raised in the previous two review rounds have been closed out. The PR is clean.

Round-2 gaps, now closed

Forwarding path assertion: `require.Equal(t, 16, s.HistoryColumns())`

This pins the wiring
## Summary

Phase 2-B of the Key Visualizer design (`docs/admin_ui_key_visualizer_design.md`): integrate the heatmap into the existing `web/admin/` SPA rather than building a parallel dashboard. Doc + implementation in one PR (doc commit first, per CLAUDE.md design-doc-first workflow).

- New page at `/keyviz`: canvas heatmap polling `/admin/api/v1/keyviz/matrix` with series picker (writes / reads / write_bytes / read_bytes), row-budget input (clamped at 1024), and off / 5 s / 30 s auto-refresh.
- Cold cells (value 0) render as the page background, not a faint blue, so spotting actually-cold routes stays the dominant visual signal.
- Row-detail flyout on hover: bucket_id, start, end, aggregate, route_count, route_ids (with truncation indicator). `start` / `end` decode through a printable-or-hex preview so binary keys do not render as mojibake.
- Dependency-free: hand-rolled five-stop colour ramp in `lib/colorRamp.ts`. No d3, no ECharts. Bundle grew from ~155 kB to ~208 kB (raw), 64 kB gzipped.

Backend is unchanged. The handler at `/admin/api/v1/keyviz/matrix` and its sampler wiring already shipped under Phase 2-A (PRs #639 / #645 / #646 / #647 / #651 / #660 / #661 / #672).

## Design

`docs/design/2026_04_27_proposed_keyviz_spa_integration.md`: proposed status, lands with this PR. Parent design §12 phase table is split into 2-A (server, shipped) / 2-B (SPA, this PR) / 2-C (cluster fan-out, open).

## Five-lens self-review

1. **Data loss**: n/a; SPA is read-only against an existing handler.
2. **Concurrency / distributed**: n/a; single browser tab polling a single handler. Sampler concurrency was already covered by Phase 2-A tests.
3. **Performance**: Canvas + per-cell `fillRect` runs under §10 budget at 1024 × 500. Auto-refresh defaults off; 5-second cadence is the lower bound (sampler flush is 1 s).
4. **Data consistency**: SPA renders whatever the handler returns; consistency guarantees come from the existing leader-issued counters in the sampler.
5. **Test coverage**: `tsc -b --noEmit` clean; `vite build` clean; `go build ./internal/admin/...` clean (embed glob unaffected); `go test ./internal/admin/...` clean. Manual verification documented in the design doc §5.

## Test plan

- [x] `npm run lint` (`tsc -b --noEmit`): clean
- [x] `npm run build` (Vite): clean, output goes to `internal/admin/dist`
- [x] `go build ./internal/admin/...`: clean
- [x] `go test ./internal/admin/...`: clean
- [ ] Manual: `make run` + `make client`, navigate to `/keyviz`, see hot routes light up red within ~5 s of write traffic
- [ ] Manual: series picker swaps the displayed counter; row-budget input clamps at 1024; auto-refresh polls without flicker

## Out of scope

- **Cluster fan-out**: handler is currently node-local. Phase 2-C will add a cross-node admin RPC; this PR will pick up the aggregate view automatically once that ships.
- **Drill-down per-route sparkline**: Phase 3.
- **Routes / Raft Groups correlation**: Phase 1 SPA pages not yet built; correlation lands when those pages do.
- **`localStorage` for series / rows / refresh**: punt to follow-up.
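The printable-or-hex preview mentioned for `start` / `end` can be illustrated with a short sketch. The SPA itself is TypeScript; this version is written in Go only to match the rest of the repository, and `previewKey`, the printable-ASCII range check, and the `0x` prefix are assumptions rather than the SPA's actual rules.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// previewKey returns the key as text when every byte is printable ASCII
// (0x20 through 0x7e) and as a 0x-prefixed hex string otherwise, so binary
// keys never reach the UI as garbled characters.
func previewKey(b []byte) string {
	for _, c := range b {
		if c < 0x20 || c > 0x7e {
			return "0x" + hex.EncodeToString(b)
		}
	}
	return string(b)
}

func main() {
	fmt.Println(previewKey([]byte("user/42")))         // printable: shown as-is
	fmt.Println(previewKey([]byte{0x00, 0xff, 0x10})) // binary: shown as hex
}
```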
Summary
Follow-up to PR #647, which was merged at the round-1 commit before round-2 review fixes propagated. This PR carries the orphaned round-2 changes:
- `--keyvizHistoryColumns` flag (defaults to `keyviz.DefaultHistoryColumns = 1440`, i.e. 24h at 60s) so operators can shorten the ring buffer for high-cardinality clusters without rebuilding.
- `startKeyVizFlusher` early-returns when the sampler is nil instead of spawning a goroutine that just parks on `ctx.Done`; the goroutine was harmless but did no useful work.
- A TODO comment on `observeMutation` documenting the Phase-2 read-sampling milestone (design §5.1, §10) so future readers don't think the missing read path is a regression. Until that wiring lands, the matrix's `Reads` / `ReadBytes` series stay zero.

These items came out of Claude bot's round-2 review of #647 but landed after the merge button was pressed.
Test plan
- `go build .`, `go vet .`, `golangci-lint run ./...` clean.
- `go test -race -count=1 -run 'TestBuildKeyVizSampler|TestSeedKeyVizRoutes|TestStartKeyVizFlusher' .` clean.