feat(kv): observe lease + linearizable reads via keyviz Sampler#661
Conversation
Wires read sampling into the shard coordinator's two key-routed read entry points so the heatmap's `Reads` / `ReadBytes` series get populated (design §5.1: "Reads are sampled by the node that actually serves the read"). Mirrors the write-side wiring in `observeMutation`; both feed the same `keyviz.Sampler` the coordinator already holds.

- New `observeReadKey` helper, called from `LinearizableReadForKey` and `LeaseReadForKey` after the route is resolved.
- `valueLen` is always 0 at this layer — the consistency check doesn't fetch data; the actual `MVCCStore.GetAt` happens further down the stack and is sampled separately. Adapter direct-read paths (Redis / DynamoDB / S3 hitting `MVCCStore.GetAt` without going through the coordinator) still bypass keyviz; sampling those is task B in the design's Phase 2 follow-up — see the `observeReadKey` doc comment.
- Leadership-check entries (`IsLeaderForKey`, `VerifyLeaderForKey`, `RaftLeaderForKey`) intentionally do NOT call `observeReadKey` — they're internal routing checks, not user-facing data reads.

Tests:

- `TestShardedCoordinatorObservesLeaseAndLinearizableReads` — drives one of each through a real raft + store, asserts exactly one `Observe(routeID, OpRead, len(key), 0)` per call.
- `TestShardedCoordinatorSkipsObserveForLeadershipChecks` — pins the negative contract that the three leadership checks stay silent.
Code Review
This pull request implements read-side sampling for key visualization by integrating the observeReadKey helper into the LinearizableReadForKey and LeaseReadForKey methods of the ShardedCoordinator. It also includes unit tests to verify that user-facing reads are correctly recorded while internal leadership checks are ignored. Feedback was provided regarding the performance overhead of redundant routing lookups within the observeReadKey helper and the potential for typed-nil interface issues when checking the sampler.
```go
func (c *ShardedCoordinator) observeReadKey(key []byte) {
	if c.sampler == nil {
		return
	}
	route, ok := c.engine.GetRoute(routeKey(key))
	if !ok {
		return
	}
	c.sampler.Observe(route.RouteID, keyviz.OpRead, len(key), 0)
}
```
The observeReadKey helper introduces a redundant routing lookup on the read hot path. Since the caller (LinearizableReadForKey or LeaseReadForKey) has already performed a lookup via groupForKey, performing it again here doubles the overhead (lock acquisition and binary search) when sampling is enabled.
Additionally, the c.sampler == nil check does not account for typed-nil interface values (e.g., a *keyviz.MemSampler(nil)), which are explicitly supported and tested. For such cases, the expensive GetRoute lookup is still performed before the sampler's Observe method (which is nil-safe) returns.
While the comment mentions that this keeps the helper nil-safe, it contradicts the goal of keeping the hot path efficient. A better approach would be to refactor the routing logic to return the Route object and pass the RouteID directly to the observation helper, mirroring the efficient pattern used in observeMutation.
References
- For frequently accessed fields and hot paths, prioritize performance by minimizing overhead such as redundant lookups and lock acquisitions.
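The typed-nil pitfall called out above can be shown with a minimal, self-contained Go sketch. All names here (`Sampler`, `MemSampler`, the `Observe` signature) are illustrative stand-ins, not the project's real API:

```go
package main

// Sampler and MemSampler mirror the shape of the keyviz types being
// discussed; the names and Observe signature are assumptions for
// illustration only.
type Sampler interface {
	Observe(routeID uint64, keyLen, valueLen int)
}

type MemSampler struct {
	observed int
}

// Observe is nil-safe: calling it on a nil *MemSampler is a no-op.
func (m *MemSampler) Observe(routeID uint64, keyLen, valueLen int) {
	if m == nil {
		return
	}
	m.observed++
}

// TypedNilSampler stores a nil *MemSampler in a Sampler interface.
// The result compares non-nil: the interface value still carries the
// concrete type, so a plain `c.sampler == nil` guard does not
// short-circuit it, and any work done before Observe is still paid.
func TypedNilSampler() Sampler {
	var m *MemSampler // nil pointer
	return m
}
```

This is why the review argues the expensive lookup should happen only after a guard that actually catches the typed-nil case, or be hoisted out entirely.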
Force-pushed 9bca535 to 254b9cf
@claude review
Claude finished @bootjp's task in 3m 7s — Code Review

The change correctly wires read-side keyviz sampling and the test structure is solid. One structural issue merits addressing before merge, and there are a few smaller observations. Performance: redundant `GetRoute` lookup in `observeReadKey`.
Round-1 review fix for PR #661 (Gemini medium): `observeReadKey` was repeating the `GetRoute` lookup that the calling `LinearizableReadForKey` / `LeaseReadForKey` had already done via `groupForKey` — doubling the per-read routing cost when sampling was enabled.

Refactor:

- New `routeAndGroupForKey` returns both the route ID and the `*ShardGroup` in one `engine.GetRoute` pass; reads use it instead of `groupForKey`.
- `observeRead` now takes the precomputed `routeID` directly. The `c.sampler == nil` guard is the sole short-circuit; we never pay `GetRoute` for the keyviz call.
- Leadership-only entries (`IsLeaderForKey`, `VerifyLeaderForKey`, `RaftLeaderForKey`) keep using `groupForKey` since they don't need the route ID.
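The shape of this refactor can be sketched in a self-contained form. Everything below except the two method names from the commit message (`routeAndGroupForKey`, `observeRead`) is a minimal stand-in for the project's real types (`engine`, `Route`, `ShardGroup`, the sampler), kept small so the single-lookup structure is visible:

```go
package main

import "errors"

// Route, routeEngine, ShardGroup, and Sampler are illustrative
// stand-ins, not the real project types.
type Route struct {
	RouteID uint64
	GroupID int
}

type routeEngine struct{ routes map[string]Route }

func (e *routeEngine) GetRoute(key string) (Route, bool) {
	r, ok := e.routes[key]
	return r, ok
}

type ShardGroup struct{ ID int }

type Sampler interface {
	Observe(routeID uint64, op string, keyLen, valueLen int)
}

var errNoRoute = errors.New("no route for key")

type ShardedCoordinator struct {
	engine  *routeEngine
	groups  map[int]*ShardGroup
	sampler Sampler
}

// routeAndGroupForKey resolves the route once, returning both the
// route ID (for keyviz) and the group (for serving the read).
func (c *ShardedCoordinator) routeAndGroupForKey(key []byte) (uint64, *ShardGroup, error) {
	route, ok := c.engine.GetRoute(string(key))
	if !ok {
		return 0, nil, errNoRoute
	}
	return route.RouteID, c.groups[route.GroupID], nil
}

// observeRead takes the precomputed route ID; the nil-sampler guard is
// the sole short-circuit, so the keyviz call never pays a second
// GetRoute on the hot path.
func (c *ShardedCoordinator) observeRead(routeID uint64, keyLen int) {
	if c.sampler == nil {
		return
	}
	c.sampler.Observe(routeID, "read", keyLen, 0)
}
```

A read entry point would call `routeAndGroupForKey` once, serve the read through the returned group, then pass the already-resolved ID to `observeRead`.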
@claude review
Claude finished @bootjp's task in 2m 26s — Code Review
The PR is well-structured and the second commit resolves the round-1 feedback.

**Data Loss** — No issues.

**Concurrency / Distributed Failures** — No issues. The typed-nil concern Gemini raised in round 1 is not a bug: the sampler's `Observe` method is nil-safe.

**Performance** — The second commit resolves the double-lookup.

**Data Consistency** — No issues.

**Test Coverage** — Tests are solid.

**Issues**

1. PR/review-tool references in production comments — [fix before merge]

   Per CLAUDE.md conventions: "Do not reference the current task, fix, or callers...since those belong in the PR description and rot as the codebase evolves." These references will be meaningless in six months. Replace with just the rationale:

   ```go
   // routeAndGroupForKey is groupForKey extended to return the RouteID so
   // read entry points that need to observe into keyviz can do so without
   // a second GetRoute lookup. Leadership-only callers (IsLeaderForKey /
   // VerifyLeaderForKey / RaftLeaderForKey) keep using groupForKey because
   // they don't need the route ID.
   ```

2. Suggested doc comment for the pre-operation counting semantics:

   ```go
   // Counted pre-operation so failed reads (context-cancelled barrier,
   // lease expiry) still contribute to offered-load traffic — intentional,
   // matching observeMutation's semantics.
   ```

**Minor Observations (not blocking)**

**Overall:** Two commits, clean separation of concerns, the performance concern was resolved before review. The only required fix is the review-tool attribution in comments. Everything else is nitpicks.
## Summary

Phase 2-B of the Key Visualizer design (`docs/admin_ui_key_visualizer_design.md`): integrate the heatmap into the existing `web/admin/` SPA rather than building a parallel dashboard. Doc + implementation in one PR (doc commit first, per CLAUDE.md design-doc-first workflow).

- New page at `/keyviz` — canvas heatmap polling `/admin/api/v1/keyviz/matrix` with series picker (writes / reads / write_bytes / read_bytes), row-budget input (clamped at 1024), and off / 5 s / 30 s auto-refresh.
- Cold cells (value 0) render as the page background, not a faint blue — spotting actually-cold routes stays the dominant visual signal.
- Row-detail flyout on hover: bucket_id, start, end, aggregate, route_count, route_ids (with truncation indicator). `start` / `end` decode through a printable-or-hex preview so binary keys do not render as mojibake.
- Dependency-free: hand-rolled five-stop colour ramp in `lib/colorRamp.ts`. No d3, no ECharts. Bundle grew from ~155 kB to ~208 kB (raw), 64 kB gzipped.

Backend is unchanged. The handler at `/admin/api/v1/keyviz/matrix` and its sampler wiring already shipped under Phase 2-A (PRs #639 / #645 / #646 / #647 / #651 / #660 / #661 / #672).

## Design

`docs/design/2026_04_27_proposed_keyviz_spa_integration.md` — proposed status, lands with this PR. Parent design §12 phase table is split into 2-A (server, shipped) / 2-B (SPA, this PR) / 2-C (cluster fan-out, open).

## Five-lens self-review

1. **Data loss** — n/a; SPA is read-only against an existing handler.
2. **Concurrency / distributed** — n/a; single browser tab polling a single handler. Sampler concurrency was already covered by Phase 2-A tests.
3. **Performance** — Canvas + per-cell `fillRect` runs under the §10 budget at 1024 × 500. Auto-refresh defaults off; the 5-second cadence is the lower bound (sampler flush is 1 s).
4. **Data consistency** — SPA renders whatever the handler returns; consistency guarantees come from the existing leader-issued counters in the sampler.
5. **Test coverage** — `tsc -b --noEmit` clean; `vite build` clean; `go build ./internal/admin/...` clean (embed glob unaffected); `go test ./internal/admin/...` clean. Manual verification documented in the design doc §5.

## Test plan

- [x] `npm run lint` (`tsc -b --noEmit`) — clean
- [x] `npm run build` (Vite) — clean, output goes to `internal/admin/dist`
- [x] `go build ./internal/admin/...` — clean
- [x] `go test ./internal/admin/...` — clean
- [ ] Manual: `make run` + `make client`, navigate to `/keyviz`, see hot routes light up red within ~5 s of write traffic
- [ ] Manual: series picker swaps the displayed counter; row-budget input clamps at 1024; auto-refresh polls without flicker

## Out of scope

- **Cluster fan-out** — handler is currently node-local. Phase 2-C will add a cross-node admin RPC; this PR will pick up the aggregate view automatically once that ships.
- **Drill-down per-route sparkline** — Phase 3.
- **Routes / Raft Groups correlation** — Phase 1 SPA pages not yet built; correlation lands when those pages do.
- **`localStorage` for series / rows / refresh** — punt to follow-up.
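For illustration, the interpolation idea behind a hand-rolled five-stop colour ramp can be sketched as follows. The actual ramp lives in `lib/colorRamp.ts` (TypeScript); this Go sketch uses placeholder stop colours, not the real values:

```go
package main

import "fmt"

// stop is one anchor colour in the ramp. The five colours below are
// placeholders (cold blue through hot red), not the values from
// lib/colorRamp.ts.
type stop struct {
	pos     float64
	r, g, b uint8
}

var stops = []stop{
	{0.00, 0x1d, 0x4e, 0xd8}, // deep blue (cold)
	{0.25, 0x06, 0xb6, 0xd4}, // cyan
	{0.50, 0x22, 0xc5, 0x5e}, // green
	{0.75, 0xf5, 0x9e, 0x0b}, // amber
	{1.00, 0xdc, 0x26, 0x26}, // red (hot)
}

func lerp(a, b uint8, t float64) uint8 {
	return uint8(float64(a) + (float64(b)-float64(a))*t)
}

// Ramp maps a normalized heat value t in [0,1] to a CSS hex colour by
// linear interpolation between the two surrounding stops; values
// outside [0,1] clamp to the end stops.
func Ramp(t float64) string {
	if t <= 0 {
		s := stops[0]
		return fmt.Sprintf("#%02x%02x%02x", s.r, s.g, s.b)
	}
	for i := 1; i < len(stops); i++ {
		if t <= stops[i].pos {
			lo, hi := stops[i-1], stops[i]
			f := (t - lo.pos) / (hi.pos - lo.pos)
			return fmt.Sprintf("#%02x%02x%02x",
				lerp(lo.r, hi.r, f), lerp(lo.g, hi.g, f), lerp(lo.b, hi.b, f))
		}
	}
	s := stops[len(stops)-1]
	return fmt.Sprintf("#%02x%02x%02x", s.r, s.g, s.b)
}
```

Cells with value 0 would skip the ramp entirely and take the page background, matching the "cold cells render as background" choice above.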
Summary

Wires read sampling into the shard coordinator's two key-routed read entry points so the heatmap's `Reads` / `ReadBytes` series get populated (design §5.1: "Reads are sampled by the node that actually serves the read"). Mirrors the write-side `observeMutation` wiring; both feed the same `keyviz.Sampler` the coordinator already holds.

- New `observeReadKey` helper, called from `LinearizableReadForKey` and `LeaseReadForKey` after the route is resolved.
- `valueLen` is always 0 at this layer — the consistency check doesn't fetch data; the actual `MVCCStore.GetAt` happens further down the stack and is sampled separately. Adapter direct-read paths (Redis / DynamoDB / S3 hitting `MVCCStore.GetAt` without going through the coordinator) still bypass keyviz; sampling those is a future slice — see the `observeReadKey` doc comment.
- Leadership-check entries (`IsLeaderForKey`, `VerifyLeaderForKey`, `RaftLeaderForKey`) intentionally do NOT call `observeReadKey` — they're internal routing checks, not user-facing data reads.

Closes the read-side gap noted on the `observeMutation` Phase-2 TODO. The matrix's `Reads` and `ReadBytes` series are no longer permanently zero.

Test plan

- `TestShardedCoordinatorObservesLeaseAndLinearizableReads` — drives one of each through a real raft + store, asserts exactly one `Observe(routeID, OpRead, len(key), 0)` per call.
- `TestShardedCoordinatorSkipsObserveForLeadershipChecks` — pins the negative contract that the three leadership checks stay silent.
- `go test -race -count=1 -run TestShardedCoordinator ./kv/...` clean.
- `golangci-lint run ./kv/...` clean.
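Asserting "exactly one `Observe(routeID, OpRead, len(key), 0)` per call" needs a sampler test double that records every call. A hedged sketch of that double — the `Observe` signature is inferred from the PR text, with the op modelled as a plain string rather than the real `keyviz` constant:

```go
package main

import "sync"

// observeCall captures one Observe invocation; the field set mirrors
// the arguments named in the PR text.
type observeCall struct {
	routeID  uint64
	op       string
	keyLen   int
	valueLen int
}

// recordingSampler records every Observe call so a test can turn
// "exactly one observation per read" into a length check plus field
// comparisons, and "leadership checks stay silent" into len == 0.
type recordingSampler struct {
	mu    sync.Mutex
	calls []observeCall
}

func (r *recordingSampler) Observe(routeID uint64, op string, keyLen, valueLen int) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.calls = append(r.calls, observeCall{routeID, op, keyLen, valueLen})
}

// Calls returns a snapshot copy so assertions don't race with
// concurrent observations from the coordinator under test.
func (r *recordingSampler) Calls() []observeCall {
	r.mu.Lock()
	defer r.mu.Unlock()
	return append([]observeCall(nil), r.calls...)
}
```

The mutex matters because the real reads flow through raft machinery that may observe from another goroutine, and the tests run under `-race`.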