Skip to content

feat(main): keyviz wiring follow-up (--keyvizHistoryColumns + Phase-2 read TODO)#651

Merged
bootjp merged 3 commits intomainfrom
feat/keyviz-main-wiring-followup
Apr 26, 2026
Merged

feat(main): keyviz wiring follow-up (--keyvizHistoryColumns + Phase-2 read TODO)#651
bootjp merged 3 commits intomainfrom
feat/keyviz-main-wiring-followup

Conversation

@bootjp
Copy link
Copy Markdown
Owner

@bootjp bootjp commented Apr 26, 2026

Summary

Follow-up to PR #647, which was merged at the round-1 commit before round-2 review fixes propagated. This PR carries the orphaned round-2 changes:

  • New --keyvizHistoryColumns flag (defaults to keyviz.DefaultHistoryColumns = 1440, i.e. 24h at 60s) so operators can shorten the ring buffer for high-cardinality clusters without rebuilding.
  • startKeyVizFlusher early-returns when the sampler is nil instead of spawning a goroutine that just parks on ctx.Done — the goroutine was harmless but had no signal.
  • TODO on observeMutation documenting the Phase-2 read-sampling milestone (design §5.1, §10) so future readers don't think the missing read path is a regression. Until that wiring lands the matrix's Reads/ReadBytes series stay zero.

These items came out of Claude bot's round-2 review of #647 but landed after the merge button was pressed.

Test plan

  • go build ., go vet ., golangci-lint run ./... clean.
  • go test -race -count=1 -run 'TestBuildKeyVizSampler|TestSeedKeyVizRoutes|TestStartKeyVizFlusher' . clean.

Round-2 review fixes for PR #647 (Claude bot):

- Add --keyvizHistoryColumns flag so operators can shorten the ring
  buffer for high-cardinality clusters; defaults to
  keyviz.DefaultHistoryColumns (1440 = 24h at 60s).
- startKeyVizFlusher now early-returns when the sampler is nil
  instead of spawning a goroutine that just parks on ctx.Done.
- Document the Phase-2 read-sampling milestone on observeMutation
  so future readers don't think the missing read path is a bug —
  the design (§5.1, §10) calls for read attribution from the node
  that serves the read (LeaseRead / LinearizableRead / follower
  reads), and that wiring is intentionally deferred to the next
  slice. Until then the Reads/ReadBytes series stay zero.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 26, 2026

Warning

Rate limit exceeded

@bootjp has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 14 minutes and 5 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 14 minutes and 5 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0cc4e5b3-a2f7-4547-9caa-f1e9135a3beb

📥 Commits

Reviewing files that changed from the base of the PR and between 41c7f0f and 8f5d0e0.

📒 Files selected for processing (5)
  • keyviz/sampler.go
  • keyviz/sampler_test.go
  • kv/sharded_coordinator.go
  • main.go
  • main_keyviz_test.go
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/keyviz-main-wiring-followup

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Apr 26, 2026

@claude review

@claude
Copy link
Copy Markdown

claude Bot commented Apr 26, 2026

Claude finished @bootjp's task in 1m 59s —— View job


Code Review — PR #651

Todo list:

  • Gather context and read PR description
  • Diff changed files against origin/main
  • Read main.go changes
  • Read kv/sharded_coordinator.go changes
  • Verify flag wiring, defaults, and usage consistency
  • Cross-check design doc references (§5.1, §10)
  • Post final review

Summary

Clean, minimal follow-up. All three stated changes are correct. One minor test isolation gap worth fixing.


main.go--keyvizHistoryColumns flag (+8/-2)

Flag registration and wiring (main.go:138, main.go:1380) — correct. The flag defaults to keyviz.DefaultHistoryColumns (1440), the value flows directly into MemSamplerOptions.HistoryColumns inside buildKeyVizSampler, and the flag description accurately describes the unit. No issues here.

startKeyVizFlusher nil guard — the early return on s == nil (before eg.Go) is a clean improvement. The prior code would spawn a goroutine that blocked on ctx.Done with no productive work; the updated comment text is accurate.

Minor: withFlags helper in main_keyviz_test.go doesn't cover the new flag (main_keyviz_test.go:94). The helper saves/restores keyvizEnabled, keyvizStep, keyvizMaxTrackedRoutes, and keyvizMaxMemberRoutesPerSlot — but keyvizHistoryColumns is now a fifth package-level flag var and is not covered. This won't cause any of the current tests to fail, but if a future test modifies *keyvizHistoryColumns and calls withFlags, the value won't be restored, leaking state across tests. Adding it now while the change is fresh keeps the helper consistent with the flag surface: Fix this →

Minor: no test pins the HistoryColumns forwarding path. The three tests cited in the PR test plan exercise buildKeyVizSampler (nil vs non-nil), seedKeyVizRoutes, and startKeyVizFlusher lifecycle — but none asserts that the value of *keyvizHistoryColumns actually reaches MemSamplerOptions.HistoryColumns. A one-line assertion inside TestBuildKeyVizSamplerHonorsEnabledFlag (e.g., set the flag to a non-default value and call s.opts.HistoryColumns == expected, or check through a method if the field is unexported) would lock down the wiring. Not a blocker, but it is the only new wiring in this PR.


kv/sharded_coordinator.go — Phase-2 TODO comment (+7)

The comment accurately describes what is deferred and why — the design (§5.1, §10 of docs/admin_ui_key_visualizer_design.md) calls for read attribution from the node that serves the read; that wiring is intentionally out of scope for this writes-first slice. The referenced sections match the actual design doc content. The comment will prevent future readers from filing a "reads are always zero, is this a bug?" question. Good documentation.

One minor style note: the TODO token is TODO(keyviz Phase 2) which is non-standard for Go (TODO(username) or TODO(issue#) being the common idioms). This is a project-level convention question and not a linting failure — just worth knowing if you plan to grep for open TODOs later.


Overall

Approve. The flag wiring and nil-guard are correct, the TODO documents a real design gap without over-promising, and the tests cited in the PR description pass. The two "minor" items above are non-blocking but worth a follow-up commit or issue if you want the helper to stay tidy.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces the keyvizHistoryColumns configuration flag to manage the KeyViz ring buffer size and optimizes the flusher by avoiding goroutine creation when the sampler is disabled. It also adds documentation regarding future read sampling support. A review comment suggests implementing an upper bound for the history columns flag to mitigate potential out-of-memory risks from large pre-allocations.

Comment thread main.go
keyvizStep = flag.Duration("keyvizStep", keyviz.DefaultStep, "Flush interval / matrix-column resolution for the keyviz sampler")
keyvizMaxTrackedRoutes = flag.Int("keyvizMaxTrackedRoutes", keyviz.DefaultMaxTrackedRoutes, "Maximum routes tracked individually before excess routes coarsen into virtual buckets")
keyvizMaxMemberRoutesPerSlot = flag.Int("keyvizMaxMemberRoutesPerSlot", keyviz.DefaultMaxMemberRoutesPerSlot, "Maximum members listed on a virtual bucket; excess routes still drive the bucket counters")
keyvizHistoryColumns = flag.Int("keyvizHistoryColumns", keyviz.DefaultHistoryColumns, "Maximum matrix columns retained in the keyviz ring buffer (each column = one Step)")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

While this flag allows operators to adjust the ring buffer size, setting an excessively large value could lead to significant memory allocation and potential OOM issues at startup, as the buffer is pre-allocated. Consider adding a reasonable upper bound (e.g., 100,000 columns) in buildKeyVizSampler or NewMemSampler to prevent accidental misconfiguration from crashing the node.

References
  1. To prevent unbounded memory growth and potential OOM issues, apply a fixed bound to collections that can grow from external requests or configuration changes.

Round-1 review fix for PR #651 (Gemini medium):

newRingBuffer pre-allocates a slice of capacity HistoryColumns at
construction. A misconfiguration like --keyvizHistoryColumns=100000000
(operator typo) would reserve gigabytes upfront and likely OOM the
node before the heatmap returned its first column.

Add MaxHistoryColumns = 100_000 (~70 days at 60s Step) and clamp
opts.HistoryColumns in NewMemSampler so excessive values silently
land at the cap instead of being trusted as-is. The cap lives in the
keyviz package (next to the existing Default* constants) because the
risk is a property of the data structure, not the caller.

Operators wanting longer retention should use the Phase 3 persistence
path (per-Raft-group `!admin|keyviz|*` namespace) — not a giant
in-memory ring.

Test TestHistoryColumnsClampedAboveMax confirms both above-cap input
clamps to MaxHistoryColumns and exactly-at-cap input is preserved.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Apr 26, 2026

@claude review

@claude
Copy link
Copy Markdown

claude Bot commented Apr 26, 2026

Claude finished @bootjp's task in 1m 31s —— View job


Code Review — PR #651

Todo list:

  • Gather context and read PR description
  • Read changed files (keyviz/sampler.go, keyviz/sampler_test.go, kv/sharded_coordinator.go, main.go)
  • Check if OOM concern (gemini review) was addressed by MaxHistoryColumns clamp commit
  • Verify test isolation gap (withFlags helper coverage)
  • Post final review

Overall verdict: Approve

All three stated goals are correctly implemented. The gemini-raised OOM concern is fully addressed by the second commit. One pre-existing test isolation gap remains.


keyviz/sampler.goMaxHistoryColumns upper bound

New constant + clamp (sampler.go:85-92, sampler.go:315-317):

const MaxHistoryColumns = 100_000

if opts.HistoryColumns > MaxHistoryColumns {
    opts.HistoryColumns = MaxHistoryColumns
}

This directly addresses the gemini review concern. The clamp sits between the <= 0 default and the first allocation call, so both under- and over-range inputs are handled in the same place. The constant's doc comment explains the design rationale (Phase 3 persistence as the intended path for long retention). Clean.

TestHistoryColumnsClampedAboveMax (sampler_test.go:878-895) — covers 10× MaxHistoryColumns (clamped) and MaxHistoryColumns exactly (preserved). Both branches exercised, behaviour locked down.


main.go--keyvizHistoryColumns flag

Flag registration (main.go:138):

keyvizHistoryColumns = flag.Int("keyvizHistoryColumns", keyviz.DefaultHistoryColumns, "Maximum matrix columns retained in the keyviz ring buffer (each column = one Step)")

Default matches keyviz.DefaultHistoryColumns (1440). Description is accurate. Flag name follows the existing keyviz* naming convention.

Wiring into buildKeyVizSampler (main.go:1380): HistoryColumns: *keyvizHistoryColumns is the only new line here. The MaxHistoryColumns guard in NewMemSampler means the coordinator path doesn't need a defensive clamp.

startKeyVizFlusher nil guard (main.go:1420-1422): Early return on s == nil is correct. The removed goroutine was allocating a stack and a goroutine for the sole purpose of waiting on ctx.Done — harmless, but worthless. The updated comment accurately describes why it was removed.


kv/sharded_coordinator.go — Phase-2 TODO

The comment is accurate and self-contained — it names the design document, the specific sections (§5.1, §10), and explains the zero Reads/ReadBytes series without leaving it ambiguous whether that is a bug or an intent. One minor style note (carry-over from the previous round):

TODO(keyviz Phase 2) is non-standard Go TODO style — the common idiom is TODO(username) or TODO(issue#). This won't cause a linting failure, but if you grep for open TODOs later a standardised tag is easier to filter. Low priority, not a blocker.


Remaining gap: withFlags helper doesn't cover keyvizHistoryColumns

main_keyviz_test.go:94–117 saves and restores keyvizEnabled, keyvizStep, keyvizMaxTrackedRoutes, and keyvizMaxMemberRoutesPerSlot — but not the new keyvizHistoryColumns. No current test in main_keyviz_test.go sets *keyvizHistoryColumns, so no existing test is broken by this. The risk is that a future test modifying the flag through withFlags will leak state across parallel tests. Fixing it now while the context is fresh is preferable to debugging a subtle test flake later: Fix this →

The end-to-end forwarding path (*keyvizHistoryColumnsbuildKeyVizSamplerMemSamplerOptions.HistoryColumns) is also not directly tested via main_keyviz_test.go — a one-liner assertion inside TestBuildKeyVizSamplerHonorsEnabledFlag (set the flag to a non-default value, verify s.opts.HistoryColumns matches) would lock down this wiring. Not blocking, but consistent with the project convention that new flag wiring has a forwarding test.


| Branch: feat/keyviz-main-wiring-followup

Round-2 review fix for PR #651 (Claude bot minor):

withFlags helper saved/restored every keyviz flag except the new
--keyvizHistoryColumns, so a future test mutating the flag through
the helper would leak state across parallel tests. Add the slot,
update both call sites, and pin the flag → buildKeyVizSampler →
MemSamplerOptions.HistoryColumns forwarding by asserting
s.HistoryColumns() in TestBuildKeyVizSamplerHonorsEnabledFlag.

Also adds a HistoryColumns() accessor on *MemSampler paralleling
Step() so wiring tests don't need to reach into the unexported opts
struct.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Apr 26, 2026

@claude review

@claude
Copy link
Copy Markdown

claude Bot commented Apr 26, 2026

Claude finished @bootjp's task in 2m 4s —— View job


Code Review — PR #651

Todo list:

  • Gather context and read PR description
  • Read all changed files
  • Verify withFlags helper coverage for keyvizHistoryColumns
  • Check MaxHistoryColumns clamp implementation and tests
  • Check forwarding path test coverage
  • Post final review

Overall verdict: Approve

All three issues raised in the previous two review rounds have been closed out. The PR is clean.


Round-2 gaps — now closed

withFlags helper coverage (main_keyviz_test.go:98–124) — fixed. The helper now accepts historyColumns int as a 5th parameter, saves prevHistoryColumns := *keyvizHistoryColumns, sets the flag, and restores it in the defer. Both call sites pass an explicit value (16 in the enabled branch, default in others). No state will leak across parallel tests.

Forwarding path assertion (main_keyviz_test.go:30) — fixed. The test now calls:

require.Equal(t, 16, s.HistoryColumns())

This pins the wiring *keyvizHistoryColumns → buildKeyVizSampler → MemSamplerOptions.HistoryColumns end-to-end. The HistoryColumns() accessor (sampler.go:781) is nil-safe and exposes the post-default, post-clamp value, so the assertion correctly reflects what the ring buffer actually uses.


keyviz/sampler.goMaxHistoryColumns clamp (commit 5bddeba)

const MaxHistoryColumns = 100_000

if opts.HistoryColumns > MaxHistoryColumns {
    opts.HistoryColumns = MaxHistoryColumns
}

Correct placement: sits after the <= 0 defaulting block and before the first allocation. The doc comment explains the 70-day upper bound and why longer retention belongs to the Phase 3 persistence path. The HistoryColumns() accessor mirrors the nil-safe pattern of the existing Step() accessor.

TestHistoryColumnsClampedAboveMax (sampler_test.go:878–895) exercises both branches: 10× MaxHistoryColumns (clamped to cap) and MaxHistoryColumns exactly (preserved). Both critical boundary conditions covered.


kv/sharded_coordinator.go — Phase-2 TODO

The comment accurately describes what is deferred, cites the design sections (§5.1, §10), and makes explicit that zero Reads/ReadBytes is intentional, not a regression. One carry-over style note:

TODO(keyviz Phase 2) is non-standard Go idiom (conventional form is TODO(username) or TODO(issue#)). Not a linter failure, but worth standardising if you ever grep open TODOs by tag. Low priority.


main.go — nil guard and flag wiring

startKeyVizFlusher early-return on s == nil is correct and the comment accurately explains why the prior goroutine was dropped. Flag default matches keyviz.DefaultHistoryColumns (1440) and the description text is accurate.


All items from the previous two rounds are resolved. No blocking issues remain.
| Branch: feat/keyviz-main-wiring-followup

@bootjp bootjp enabled auto-merge April 26, 2026 09:47
@bootjp bootjp merged commit cc7108f into main Apr 26, 2026
8 checks passed
@bootjp bootjp deleted the feat/keyviz-main-wiring-followup branch April 26, 2026 09:49
bootjp added a commit that referenced this pull request Apr 26, 2026
## Summary

Phase 2-B of the Key Visualizer design
(`docs/admin_ui_key_visualizer_design.md`): integrate the heatmap into
the existing `web/admin/` SPA rather than building a parallel dashboard.
Doc + implementation in one PR (doc commit first, per CLAUDE.md
design-doc-first workflow).

- New page at `/keyviz` — canvas heatmap polling
`/admin/api/v1/keyviz/matrix` with series picker (writes / reads /
write_bytes / read_bytes), row-budget input (clamped at 1024), and off /
5 s / 30 s auto-refresh.
- Cold cells (value 0) render as the page background, not a faint blue —
spotting actually-cold routes stays the dominant visual signal.
- Row-detail flyout on hover: bucket_id, start, end, aggregate,
route_count, route_ids (with truncation indicator). `start` / `end`
decode through a printable-or-hex preview so binary keys do not render
as mojibake.
- Dependency-free: hand-rolled five-stop colour ramp in
`lib/colorRamp.ts`. No d3, no ECharts. Bundle grew from ~155 kB to ~208
kB (raw), 64 kB gzipped.

Backend is unchanged. The handler at `/admin/api/v1/keyviz/matrix` and
its sampler wiring already shipped under Phase 2-A (PRs #639 / #645 /
#646 / #647 / #651 / #660 / #661 / #672).

## Design

`docs/design/2026_04_27_proposed_keyviz_spa_integration.md` — proposed
status, lands with this PR. Parent design §12 phase table is split into
2-A (server, shipped) / 2-B (SPA, this PR) / 2-C (cluster fan-out,
open).

## Five-lens self-review

1. **Data loss** — n/a; SPA is read-only against an existing handler.
2. **Concurrency / distributed** — n/a; single browser tab polling a
single handler. Sampler concurrency was already covered by Phase 2-A
tests.
3. **Performance** — Canvas + per-cell `fillRect` runs under §10 budget
at 1024 × 500. Auto-refresh defaults off; 5-second cadence is the lower
bound (sampler flush is 1 s).
4. **Data consistency** — SPA renders whatever the handler returns;
consistency guarantees come from the existing leader-issued counters in
the sampler.
5. **Test coverage** — `tsc -b --noEmit` clean; `vite build` clean; `go
build ./internal/admin/...` clean (embed glob unaffected); `go test
./internal/admin/...` clean. Manual verification documented in the
design doc §5.

## Test plan

- [x] `npm run lint` (`tsc -b --noEmit`) — clean
- [x] `npm run build` (Vite) — clean, output goes to
`internal/admin/dist`
- [x] `go build ./internal/admin/...` — clean
- [x] `go test ./internal/admin/...` — clean
- [ ] Manual: `make run` + `make client`, navigate to `/keyviz`, see hot
routes light up red within ~5 s of write traffic
- [ ] Manual: series picker swaps the displayed counter; row-budget
input clamps at 1024; auto-refresh polls without flicker

## Out of scope

- **Cluster fan-out** — handler is currently node-local. Phase 2-C will
add a cross-node admin RPC; this PR will pick up the aggregate view
automatically once that ships.
- **Drill-down per-route sparkline** — Phase 3.
- **Routes / Raft Groups correlation** — Phase 1 SPA pages not yet
built; correlation lands when those pages do.
- **`localStorage` for series / rows / refresh** — punt to follow-up.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant