docs: add admin UI and key visualizer design #545

Merged: bootjp merged 46 commits into main from feat/admin_ui_key_visualizer on Apr 25, 2026.

Conversation

@bootjp bootjp commented Apr 19, 2026

Proposes a standalone cmd/elastickv-admin binary and a TiKV-style key visualizer heatmap. Avoids the Prometheus client dependency in the initial phases by adding an in-process LiveSummary alongside the existing observers, and keeps sampler hot-path overhead below the benchmark noise floor via adaptive 1-in-N sampling with a ≥95% capture SLO.
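For illustration only, a minimal sketch of what adaptive 1-in-N sampling on the hot path could look like; the names (`sampler`, `Observe`, `retune`) and the retune heuristic are assumptions, not the PR's API. The point is that the common case costs one atomic load plus a cheap PRNG draw, and the controller only runs when a flush window closes.

```go
// Hypothetical sketch of hot-path 1-in-N sampling with an adaptive rate.
package keyviz

import (
	"math/rand/v2"
	"sync/atomic"
)

type sampler struct {
	n atomic.Uint64 // current 1-in-N rate; 0 or 1 means "sample everything"
}

// Observe is called on the request hot path. Heavy accounting (the record
// callback) only runs for the sampled fraction of observations.
func (s *sampler) Observe(record func()) {
	n := s.n.Load()
	if n <= 1 || rand.Uint64N(n) == 0 {
		record()
	}
}

// retune runs off the hot path, e.g. once per flush window. With 1-in-N
// sampling the capture rate is roughly 1/N, so N is halved when capture falls
// below the target (the >=95% SLO analogue) and doubled when it overshoots.
func (s *sampler) retune(observed, sampled uint64, targetCapture float64) {
	if observed == 0 {
		return
	}
	capture := float64(sampled) / float64(observed)
	n := s.n.Load()
	switch {
	case capture < targetCapture && n > 1:
		s.n.Store(n / 2) // sample more aggressively
	case capture > targetCapture*1.2:
		s.n.Store(n * 2) // back off
	}
}
```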

Summary by CodeRabbit

  • Documentation

    • Added a comprehensive design spec for the admin Web UI and Key Visualizer (architecture, sampling, persistence, UI interactions, phased plan).
  • New Features

    • Standalone admin HTTP server with cluster-overview API, membership discovery, CLI controls, and graceful shutdown.
    • Token-protected Admin gRPC API exposing cluster, raft, adapter, key-visualizer, route detail, and live event streams.
    • Key Visualizer: sampled heatmap, live column updates, drill-down, uncertainty/hatched rendering, and continuity across splits/merges.
  • Tests

    • Extensive unit and integration tests covering admin HTTP/gRPC, discovery/fanout, auth, client caching, token loading, and startup wiring.
  • Chores

    • Protobuf generation updated to include the new admin service.


coderabbitai Bot commented Apr 19, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds a new admin subsystem: a standalone elastickv-admin HTTP frontend that fans out to per-node, token-protected, read-only Admin gRPC endpoints; a node-side Admin gRPC server and auth interceptors; new admin protobufs and generation; CLI/server wiring and tests; and a Key Visualizer design document defining sampling, storage, and UI behavior.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Design doc<br>`docs/admin_ui_key_visualizer_design.md` | New comprehensive design for an operator-facing admin Web UI and Key Visualizer: sampler placement, adaptive sampling, leadership semantics, split/merge continuity, phased persistence into Raft groups, namespace isolation, adapter summaries, frontend UX/WS behaviors, testing matrix, and delivery phases. |
| Admin protobufs & generation<br>`proto/admin.proto`, `proto/Makefile` | Adds Admin gRPC service with RPCs (GetClusterOverview, GetRaftGroups, GetAdapterSummary, GetKeyVizMatrix, GetRouteDetail, StreamEvents), messages/enums, and updates proto/Makefile to generate Go/gRPC code for admin.proto. |
| Node-side gRPC server<br>`adapter/admin_grpc.go`, `adapter/admin_grpc_test.go` | New AdminServer, AdminGroup interface and NodeIdentity; registration API; GetClusterOverview and GetRaftGroups implementations with deterministic sorted responses and merged member logic; AdminTokenAuth unary/stream interceptors and comprehensive unit tests covering auth, ordering, concurrency, and timing semantics. |
| Admin CLI / frontend server<br>`cmd/elastickv-admin/main.go`, `cmd/elastickv-admin/main_test.go` | New elastickv-admin binary: CLI flags (bind, seeds, token file, TLS/insecure), membership cache with singleflight refresh, concurrent fan-out to node Admin gRPCs with client leasing and eviction, /api/cluster/overview and /healthz handlers, proto→JSON marshaling with size caps, graceful shutdown, and extensive tests for parsing, token loading, transport credentials, discovery/fanout, caching, and HTTP behavior. |
| Node integration & flags<br>`main.go`, `main_admin_test.go` | Node startup gains admin token-file / insecure-no-auth flags, token loading and validation, deterministic self-address derivation, conditional per-raft Admin service registration, and interceptor chaining; tests validate wiring, token enforcement, and bootstrap member computation. |
| Helpers: gRPC & tokens<br>`internal/grpc.go`, `internal/tokenfile.go`, `internal/tokenfile_test.go` | Adds GRPCCallOptions() to set per-call max message sizes; LoadBearerTokenFile to read/validate token files with max-size enforcement; and tests covering trim, empty, oversize, and missing-file cases. |
| Proto-generated integration<br>`proto/...` | Build rules updated to include admin.proto in Go/gRPC generation. |
| Minor refactor<br>`internal/raftengine/etcd/fsm_snapshot_file.go` | Replaced hardcoded .snap literals with a shared snapFileExt constant for snapshot filename parsing/enumeration. |
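As a rough illustration of the AdminTokenAuth idea listed above, a bearer-token unary interceptor might look like the following; the actual function names, error messages, and the stream variant in adapter/admin_grpc.go may differ.

```go
// Minimal sketch of a bearer-token unary server interceptor.
package adapter

import (
	"context"
	"crypto/subtle"

	"google.golang.org/grpc"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/metadata"
	"google.golang.org/grpc/status"
)

func adminTokenUnaryInterceptor(token string) grpc.UnaryServerInterceptor {
	return func(ctx context.Context, req any, info *grpc.UnaryServerInfo, handler grpc.UnaryHandler) (any, error) {
		md, ok := metadata.FromIncomingContext(ctx)
		if !ok {
			return nil, status.Error(codes.Unauthenticated, "missing metadata")
		}
		auth := md.Get("authorization")
		if len(auth) == 0 {
			return nil, status.Error(codes.Unauthenticated, "missing bearer token")
		}
		want := "Bearer " + token
		// Compare in constant time for equal-length inputs.
		if subtle.ConstantTimeCompare([]byte(auth[0]), []byte(want)) != 1 {
			return nil, status.Error(codes.PermissionDenied, "invalid admin token")
		}
		return handler(ctx, req)
	}
}
```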

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Browser as Admin UI (browser)
    participant Fanout as elastickv-admin (HTTP)
    participant Node as elastickv node (Admin gRPC)
    participant Engine as Raft Group Engine

    Browser->>Fanout: GET /api/cluster/overview
    Fanout->>Node: GetClusterOverview (gRPC, Bearer token)
    Node->>Engine: Configuration()/Status() per registered group
    Engine-->>Node: group status/configuration
    Node-->>Fanout: GetClusterOverviewResponse
    Fanout-->>Browser: aggregated JSON { nodes: [...] }

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐇 I hopped through bytes and counted each route,

I sampled warm keys and marked the shout,
Heatmaps hum, tokens guard the gate,
Fans spin out nodes and stitch the state,
A rabbit cheers — clusters look great!

🚥 Pre-merge checks: 4 passed, 1 failed

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 46.00%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title Check | ✅ Passed | The title clearly summarizes the main change: it describes the primary deliverable (admin UI and key visualizer design) being added to the documentation. |
| Linked Issues Check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes Check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |



@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request proposes a design for a standalone Admin UI and Key Visualizer for Elastickv, utilizing a low-overhead sampler and a Svelte-based frontend to provide cluster insights. The review feedback identifies several critical areas for refinement, including potential concurrency bugs in the sampler's map access, scalability concerns regarding memory usage for high route counts, and the operational risk of centralizing data persistence in the default Raft group. Furthermore, the feedback suggests re-evaluating the mathematical definition of the capture rate SLO, the performance overhead of T-Digests on the hot path, and the necessity of a cluster-wide aggregation strategy to ensure a complete view of traffic.

…aling

- Replace lock-free claim with concrete shard-striped map + atomic
  pointer-swap flush; child counters are installed before RouteID is
  published so Observe never races with splits/merges (H1).
- Distribute KeyViz persistence across the user Raft groups that own
  each route, with per-group hourly batches, instead of centralising
  writes on the default group (H2).
- Cap tracked routes via --keyvizMaxTrackedRoutes (default 10000) with
  activity-weighted coarsening of adjacent routes; memory footprint is
  now bounded independent of total route count (H3).
- Redefine the accuracy SLO as plus-or-minus 5 percent relative error
  at 95 percent CI on bucket totals, since raw capture rate is
  incompatible with Horvitz-Thompson estimation under stepwise
  sub-sampling (M1).
- Replace the hot-path t-digest in LiveSummary with a fixed-bucket
  log-linear histogram so every observation is a single atomic add (M2);
  a sketch of that bucket layout follows this list.
- Make cluster-wide fan-out the default mode of cmd/elastickv-admin;
  GetKeyVizMatrix now only serves this node's leader-owned routes and
  the admin binary merges and reports per-node partial status (M3).
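A minimal sketch of the fixed-bucket log-linear histogram idea in (M2), assuming 64 buckets and two sub-buckets per power-of-two band; the real bucket count and boundaries are not specified by the commit message.

```go
// Sketch of a "single atomic add per observation" log-linear histogram.
package keyviz

import (
	"math/bits"
	"sync/atomic"
)

const histBuckets = 64

type latencyHist struct {
	counts [histBuckets]atomic.Uint64
}

// bucketFor maps a latency (microseconds) to a bucket index: the position of
// the highest set bit picks a power-of-two band, and the next bit splits each
// band in two (log-linear).
func bucketFor(us uint64) int {
	if us == 0 {
		return 0
	}
	exp := bits.Len64(us) - 1 // index of the highest set bit
	sub := 0
	if exp > 0 {
		sub = int((us >> uint(exp-1)) & 1)
	}
	idx := exp*2 + sub
	if idx >= histBuckets {
		idx = histBuckets - 1
	}
	return idx
}

// Observe is the hot-path entry point: one atomic increment, no locks.
func (h *latencyHist) Observe(latencyMicros uint64) {
	h.counts[bucketFor(latencyMicros)].Add(1)
}
```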

bootjp commented Apr 19, 2026

/gemini review

@gemini-code-assist

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!


bootjp commented Apr 19, 2026

/gemini review

@gemini-code-assist

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!


bootjp commented Apr 20, 2026

/gemini review


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a comprehensive design document for a new Admin UI and Key Visualizer for Elastickv, featuring a standalone admin binary and a low-overhead sampling mechanism for traffic heatmaps. The review feedback highlights several critical improvements: ensuring historical data continuity by using stable identifiers for persistence instead of volatile route IDs, optimizing the hot-path sampling logic with a lock-free map to reduce cache line contention, and correcting the mathematical formula for the estimator's relative error to ensure accurate sampling statistics.


bootjp commented Apr 21, 2026

/gemini review


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a comprehensive design for a built-in Admin UI and a Key Visualizer for Elastickv, featuring a standalone admin binary and a high-performance traffic sampler. The design includes strategies for adaptive sub-sampling, memory management, and a phased implementation. The review feedback identifies several critical design gaps and operational risks, such as potential unbounded data growth due to storage layer limitations, the risk of exposing internal metadata to user operations, and potential data loss during leadership transitions. It also emphasizes the need for unique ID generation and state recovery mechanisms to ensure historical continuity.

Comment thread docs/admin_ui_key_visualizer_design.md Outdated
- Each compacted KeyViz column is written to the **Raft group that owns its key range**, under a group-local reserved key `!admin|keyviz|range|<lineageID>|<unix-hour>`; the prefix is not routed through the global user keyspace or default group. `lineageID` is a stable KeyViz identifier stored with `{start, end, routeID, validFromHLC, validToHLC, parentLineageIDs}` metadata; `RouteID` is recorded only as the current routing hint, never as the primary history key.
- Split and merge events append small group-local lineage records under `!admin|keyviz|lineage|<lineageID>`. On split, both children point back to the parent lineage and inherit the parent's compacted history for continuity. On merge, the survivor records both child lineage IDs and the reader sums overlapping intervals. If a node sees historical rows without a lineage record during an upgrade, the admin reader falls back to overlap on the persisted `[start, end)` range before using `RouteID`.
- Writes are batched hourly per group (not per flush) and dispatched as a single low-priority proposal per group, keeping the write amplification proportional to the group's own traffic.
- A TTL of 7 days is applied via the existing HLC-based expiry (`store/lsm_store.go:24`).
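To make the quoted key layout concrete, a hypothetical helper that builds the group-local column key; the excerpt specifies only the logical format `!admin|keyviz|range|<lineageID>|<unix-hour>`, so the string encoding below is an assumption.

```go
// Illustrative key builder for the group-local KeyViz column records.
package keyviz

import "fmt"

const adminKeyVizRangePrefix = "!admin|keyviz|range|"

// rangeColumnKey builds the per-hour column key for one lineage.
func rangeColumnKey(lineageID string, unixHour int64) []byte {
	return []byte(fmt.Sprintf("%s%s|%d", adminKeyVizRangePrefix, lineageID, unixHour))
}
```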

medium

The design relies on HLC-based expiry for a 7-day TTL, but the pebbleStore.Compact implementation in store/lsm_store.go (lines 1115-1138) only removes old MVCC versions and does not check ExpireAt for the latest version. This poses an operational risk of unbounded data growth; the design should detail mitigation strategies, such as updating the store's GC logic, to ensure data is correctly evicted.

References
  1. When a design document identifies a significant operational risk, it must also detail potential mitigation strategies.

Comment thread docs/admin_ui_key_visualizer_design.md Outdated

Phase 3 persists compacted columns **distributed across the user Raft groups themselves, not the default group**. Concentrating KeyViz writes on the default group would centralise I/O and Raft-log growth onto a single group, creating exactly the kind of hotspot this feature is built to surface. Instead:

- Each compacted KeyViz column is written to the **Raft group that owns its key range**, under a group-local reserved key `!admin|keyviz|range|<lineageID>|<unix-hour>`; the prefix is not routed through the global user keyspace or default group. `lineageID` is a stable KeyViz identifier stored with `{start, end, routeID, validFromHLC, validToHLC, parentLineageIDs}` metadata; `RouteID` is recorded only as the current routing hint, never as the primary history key.

medium

The !admin| prefix for internal KeyViz data is not currently filtered by ShardedCoordinator.maxLatestCommitTS or pebbleStore.ScanAt. This could expose internal metadata to user-plane operations or cause user transactions to receive unnecessarily advanced StartTS values.


bootjp commented Apr 21, 2026

/gemini review


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request outlines the design for a standalone Admin UI and Key Visualizer for Elastickv, introducing a heatmap-based traffic monitor and cluster overview. The design focuses on high-performance sampling and distributed persistence. Feedback highlights the need for earlier authentication to secure node-level metadata, the inclusion of follower-local read sampling to avoid incomplete traffic views, and API enhancements to clarify when route data has been coarsened for the UI.


bootjp commented Apr 21, 2026

/gemini review


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a design document for an Admin UI and Key Visualizer for Elastickv, proposing a standalone admin binary and a hot-path-safe sampler for per-route traffic. The review feedback identifies several operational risks that require mitigation strategies, such as the I/O overhead of the garbage collection pass, potential collisions in UUIDv7 generation, and the risk of data loss due to hourly batching. Additionally, the reviewer suggests improving data isolation by rejecting user-plane writes to the internal admin namespace and updating store-level metadata checks to support prefix-based filtering.

Comment thread docs/admin_ui_key_visualizer_design.md Outdated
- Split and merge events append small group-local lineage records under `!admin|keyviz|lineage|<lineageID>` and mark closed branches with `validToHLC` so retention GC can later prune them. On split, both children point back to the parent lineage and inherit the parent's compacted history for continuity. On merge, the survivor records both child lineage IDs and the reader sums overlapping intervals. If a node sees historical rows without a lineage record during an upgrade, the admin reader falls back to overlap on the persisted `[start, end)` range before using `RouteID`.
- On startup, the sampler rebuilds its in-memory `RouteID → lineageID` map by scanning the group-local lineage index for routes currently owned by the node's groups and matching active `[start, end)` ranges from `ListRoutes`. If a route exists without a matching lineage record, the node creates a new lineage record with a parent pointer to the best overlapping retained range. This makes rolling restarts and upgrades preserve historical continuity.
- Writes are batched hourly per group (not per flush) and dispatched as a single low-priority proposal per group, keeping the write amplification proportional to the group's own traffic.
- Retention is enforced by a KeyViz-specific GC pass, not by assuming ordinary HLC expiry will delete the latest MVCC version. Phase 3 includes either updating `pebbleStore.Compact` to collect latest versions whose `ExpireAt` is past the retention horizon or adding a KeyViz maintenance delete that tombstones expired column and lineage records before compaction. Persistence refuses to enable if this GC capability is absent, avoiding unbounded growth.
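A simplified sketch of the startup rebuild described in the excerpt: match each active route against retained lineage records by `[start, end)` overlap and mint a new lineage only when nothing overlaps. The types, the `newLineageID` helper, and the "first overlap wins" shortcut are assumptions; the design additionally records a parent pointer to the best overlapping retained range, which is omitted here.

```go
// Illustrative RouteID -> lineageID rebuild on startup.
package keyviz

import "bytes"

type lineageRecord struct {
	ID         string
	Start, End []byte // [start, end) the record was written for; assumed non-empty
}

type route struct {
	ID         uint64
	Start, End []byte
}

func overlaps(aStart, aEnd, bStart, bEnd []byte) bool {
	return bytes.Compare(aStart, bEnd) < 0 && bytes.Compare(bStart, aEnd) < 0
}

// rebuildLineageMap maps each active route to a retained lineage when their
// ranges overlap, and otherwise starts a fresh lineage.
func rebuildLineageMap(routes []route, retained []lineageRecord, newLineageID func() string) map[uint64]string {
	m := make(map[uint64]string, len(routes))
	for _, r := range routes {
		matched := false
		for _, rec := range retained {
			if overlaps(r.Start, r.End, rec.Start, rec.End) {
				m[r.ID] = rec.ID // reuse retained lineage for historical continuity
				matched = true
				break
			}
		}
		if !matched {
			m[r.ID] = newLineageID() // no history to attach to; start a new lineage
		}
	}
	return m
}
```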

high

The design proposes a KeyViz-specific GC pass that either updates pebbleStore.Compact or adds a maintenance delete. Since KeyViz data can be high-volume, performing a full scan-and-delete GC pass might introduce significant I/O and CPU overhead. As this represents a significant operational risk, the design must detail mitigation strategies per repository guidelines. Leveraging a CompactionFilter to drop expired versions during background compactions is a recommended mitigation to avoid manual scan overhead.

References
  1. When a design document identifies a significant operational risk, it must also detail potential mitigation strategies.

Comment thread docs/admin_ui_key_visualizer_design.md Outdated
Phase 3 persists compacted columns **distributed across the user Raft groups themselves, not the default group**. Concentrating KeyViz writes on the default group would centralise I/O and Raft-log growth onto a single group, creating exactly the kind of hotspot this feature is built to surface. Instead:

- Each compacted KeyViz column is written to the **Raft group that owns its key range**, under a group-local admin namespace `!admin|keyviz|range|<lineageID>|<unix-hour>`; the prefix is not routed through the default group. Phase 3 also adds an explicit system-namespace filter so `pebbleStore.ScanAt`, `ReverseScanAt`, and `ShardedCoordinator.maxLatestCommitTS` ignore `!admin|*` records for user-plane requests. This prevents internal metadata from leaking through scans or advancing user transaction timestamps.
- `lineageID` is generated as a UUIDv7 using the route transition HLC plus crypto-random entropy, making it cluster-wide unique without coordinating through the default group. The lineage record stores `{start, end, routeID, validFromHLC, validToHLC, parentLineageIDs}`; `RouteID` is recorded only as the current routing hint, never as the primary history key.

high

The design mentions generating lineageID as a UUIDv7 using the route transition HLC. Truncating the 64-bit HLC to the 48-bit UUIDv7 timestamp field might lead to collisions or loss of ordering precision, which is a significant operational risk. Per repository guidelines, the design must detail mitigation strategies. Please specify how the 64-bit HLC is mapped into the UUIDv7 structure (e.g., by using the physical part for the timestamp and the logical part for the sequence/random bits) to mitigate this risk.

References
  1. When a design document identifies a significant operational risk, it must also detail potential mitigation strategies.

Comment thread docs/admin_ui_key_visualizer_design.md Outdated
- `lineageID` is generated as a UUIDv7 using the route transition HLC plus crypto-random entropy, making it cluster-wide unique without coordinating through the default group. The lineage record stores `{start, end, routeID, validFromHLC, validToHLC, parentLineageIDs}`; `RouteID` is recorded only as the current routing hint, never as the primary history key.
- Split and merge events append small group-local lineage records under `!admin|keyviz|lineage|<lineageID>` and mark closed branches with `validToHLC` so retention GC can later prune them. On split, both children point back to the parent lineage and inherit the parent's compacted history for continuity. On merge, the survivor records both child lineage IDs and the reader sums overlapping intervals. If a node sees historical rows without a lineage record during an upgrade, the admin reader falls back to overlap on the persisted `[start, end)` range before using `RouteID`.
- On startup, the sampler rebuilds its in-memory `RouteID → lineageID` map by scanning the group-local lineage index for routes currently owned by the node's groups and matching active `[start, end)` ranges from `ListRoutes`. If a route exists without a matching lineage record, the node creates a new lineage record with a parent pointer to the best overlapping retained range. This makes rolling restarts and upgrades preserve historical continuity.
- Writes are batched hourly per group (not per flush) and dispatched as a single low-priority proposal per group, keeping the write amplification proportional to the group's own traffic.

high

Batching KeyViz writes hourly per group minimizes Raft overhead but introduces a risk of losing up to one hour of historical heatmap data if a node crashes before the batch is persisted. This is a significant operational risk. According to repository rules, the design must detail potential mitigation strategies. For Phase 3, or even as a configurable option now, the design should specify a more frequent persistence interval or a recovery strategy to ensure operational continuity.

References
  1. When a design document identifies a significant operational risk, it must also detail potential mitigation strategies.

Comment thread docs/admin_ui_key_visualizer_design.md Outdated

Phase 3 persists compacted columns **distributed across the user Raft groups themselves, not the default group**. Concentrating KeyViz writes on the default group would centralise I/O and Raft-log growth onto a single group, creating exactly the kind of hotspot this feature is built to surface. Instead:

- Each compacted KeyViz column is written to the **Raft group that owns its key range**, under a group-local admin namespace `!admin|keyviz|range|<lineageID>|<unix-hour>`; the prefix is not routed through the default group. Phase 3 also adds an explicit system-namespace filter so `pebbleStore.ScanAt`, `ReverseScanAt`, and `ShardedCoordinator.maxLatestCommitTS` ignore `!admin|*` records for user-plane requests. This prevents internal metadata from leaking through scans or advancing user transaction timestamps.

security-medium medium

The design specifies filtering !admin|* records from user-plane scans and timestamp selection. To ensure complete isolation and prevent accidental or malicious interference with internal metadata, the data-plane adapters (gRPC, Redis, etc.) should also be updated to reject user-plane write requests (PUT, DELETE) that target keys starting with the !admin| prefix.

Comment thread docs/admin_ui_key_visualizer_design.md Outdated

Phase 3 persists compacted columns **distributed across the user Raft groups themselves, not the default group**. Concentrating KeyViz writes on the default group would centralise I/O and Raft-log growth onto a single group, creating exactly the kind of hotspot this feature is built to surface. Instead:

- Each compacted KeyViz column is written to the **Raft group that owns its key range**, under a group-local admin namespace `!admin|keyviz|range|<lineageID>|<unix-hour>`; the prefix is not routed through the default group. Phase 3 also adds an explicit system-namespace filter so `pebbleStore.ScanAt`, `ReverseScanAt`, and `ShardedCoordinator.maxLatestCommitTS` ignore `!admin|*` records for user-plane requests. This prevents internal metadata from leaking through scans or advancing user transaction timestamps.

medium

In the current implementation of store/lsm_store.go, the isPebbleMetaKey function (line 299) only checks for exact matches with specific metadata keys. To support the !admin|* namespace filtering mentioned in the design, this function (or its equivalent in Phase 3) should be updated to perform prefix checks. This will ensure that nextScannableUserKey and prevScannableUserKey correctly skip all internal KeyViz records during user-plane scans.

- Detail UUIDv7 HLC mapping (physical→ts, logical→rand_a, crypto rand_b).
- Use Pebble CompactionFilter as preferred GC path; bounded sweep as fallback.
- Replace hourly batch with 5 min default + per-node WAL for crash recovery.
- Reject user-plane writes to !admin|* in coordinator; widen isPebbleMetaKey to prefix match.
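The "widen isPebbleMetaKey to prefix match" item boils down to something like the following; the helper name is illustrative, and the real check in store/lsm_store.go would also keep its existing exact-match metadata keys.

```go
// Sketch of a prefix-based system-namespace check for user-plane scans.
package store

import "bytes"

var adminNamespacePrefix = []byte("!admin|")

// isSystemKey reports whether a key belongs to the internal admin namespace
// and must be skipped by user-plane scans and timestamp selection.
func isSystemKey(key []byte) bool {
	return bytes.HasPrefix(key, adminNamespacePrefix)
}
```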

bootjp commented Apr 21, 2026

/gemini review


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a design for a built-in Admin UI and Key Visualizer for Elastickv, featuring a standalone admin binary and a sampling-based heatmap for traffic analysis. Feedback highlights several technical concerns: the potential for node startup delays due to WAL replay, a security gap where point reads could bypass system namespace isolation, and a risk of ID collisions when mapping 16-bit HLC counters to 12-bit UUIDv7 fields. Additionally, it is recommended to use dynamic membership refresh with caching for the admin binary's fan-out logic to ensure heatmap completeness during cluster scaling.

Comment thread docs/admin_ui_key_visualizer_design.md Outdated
- `lineageID` is generated as a UUIDv7 derived from the route transition HLC, making it cluster-wide unique without coordinating through the default group. To avoid losing precision, the 64-bit HLC is mapped explicitly: the **physical** part (millisecond-resolution wall clock) populates the 48-bit `unix_ts_ms` timestamp field, the low 12 bits of the HLC **logical** counter populate the `rand_a` sequence field (preserving per-millisecond ordering for transitions in the same physical tick), and the remaining 62 bits of `rand_b` come from `crypto/rand`. This avoids the collision and lost-ordering risk of naïvely truncating the HLC into the timestamp field. The lineage record stores `{start, end, routeID, validFromHLC, validToHLC, parentLineageIDs}`; `RouteID` is recorded only as the current routing hint, never as the primary history key.
- Split and merge events append small group-local lineage records under `!admin|keyviz|lineage|<lineageID>` and mark closed branches with `validToHLC` so retention GC can later prune them. On split, both children point back to the parent lineage and inherit the parent's compacted history for continuity. On merge, the survivor records both child lineage IDs and the reader sums overlapping intervals. If a node sees historical rows without a lineage record during an upgrade, the admin reader falls back to overlap on the persisted `[start, end)` range before using `RouteID`.
- On startup, the sampler rebuilds its in-memory `RouteID → lineageID` map by scanning the group-local lineage index for routes currently owned by the node's groups and matching active `[start, end)` ranges from `ListRoutes`. If a route exists without a matching lineage record, the node creates a new lineage record with a parent pointer to the best overlapping retained range. This makes rolling restarts and upgrades preserve historical continuity.
- Writes are batched per group on a configurable interval (`--keyvizPersistInterval`, **default 5 min**, max 1 h) and dispatched as a single low-priority Raft proposal per group, keeping the write amplification proportional to the group's own traffic. Hourly was rejected as the default because a node crash between flushes would lose up to one hour of heatmap; 5 min bounds worst-case loss while still amortising Raft cost. As a defence-in-depth against single-point loss, each node also keeps the most recent unflushed window in a small **append-only WAL file** (`<dataDir>/keyviz/wal-<hour>.log`) under the same retention contract; on restart the sampler replays the WAL into the in-memory ring buffer, then truncates entries that have since been included in a persisted batch. Operators that want stricter durability set `--keyvizPersistInterval=30s`.

high

Replaying a 5-minute WAL into the in-memory ring buffer on every restart could significantly delay node startup under high QPS. This presents a significant operational risk, particularly for rolling upgrades where nodes must return to service quickly. Per repository guidelines, the design must detail mitigation strategies, such as more frequent persistence (checkpointing) or a mechanism to bound the WAL size/replay time, to ensure the node remains responsive during recovery.

References
  1. When a design document identifies a significant operational risk, such as the inability to perform rolling upgrades, it must also detail potential mitigation strategies, like implementing a temporary 'bridge' or 'proxy' mode.

Comment thread docs/admin_ui_key_visualizer_design.md Outdated

Phase 3 persists compacted columns **distributed across the user Raft groups themselves, not the default group**. Concentrating KeyViz writes on the default group would centralise I/O and Raft-log growth onto a single group, creating exactly the kind of hotspot this feature is built to surface. Instead:

- Each compacted KeyViz column is written to the **Raft group that owns its key range**, under a group-local admin namespace `!admin|keyviz|range|<lineageID>|<unix-hour>`; the prefix is not routed through the default group. Phase 3 also adds an explicit system-namespace filter so `pebbleStore.ScanAt`, `ReverseScanAt`, and `ShardedCoordinator.maxLatestCommitTS` ignore `!admin|*` records for user-plane requests. The current `isPebbleMetaKey` exact-match check (`store/lsm_store.go:299`) is widened to a prefix check on `!admin|`, and the same check is applied in `nextScannableUserKey` / `prevScannableUserKey` so internal KeyViz records are skipped during user-plane scans. To prevent the inverse leak, every data-plane adapter (gRPC `RawKV`/`TransactionalKV`, Redis, DynamoDB, S3) rejects user-plane writes — `Put`, `Delete`, transactional mutations, and Redis equivalents — whose key starts with `!admin|`. The check is centralised in `kv.ShardedCoordinator` so adapters cannot forget it; a write attempting an `!admin|*` key returns `InvalidArgument` and is recorded in the audit metric.
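A hedged sketch of the centralised write guard this excerpt describes; only the `!admin|` prefix and the `InvalidArgument` outcome come from the text, while the function name, the metric hook, and where exactly it sits inside `kv.ShardedCoordinator` are assumptions.

```go
// Illustrative user-plane write guard for the internal admin namespace.
package kv

import (
	"bytes"

	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

var adminPrefix = []byte("!admin|")

// guardUserPlaneWrite rejects any user-plane mutation targeting the internal
// admin namespace before it reaches Raft.
func guardUserPlaneWrite(key []byte) error {
	if bytes.HasPrefix(key, adminPrefix) {
		// The design only says the event is "recorded in the audit metric";
		// the metric emission itself is omitted here.
		return status.Error(codes.InvalidArgument, "writes to the !admin| namespace are reserved")
	}
	return nil
}
```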

security-medium medium

The namespace isolation logic for !admin|* keys should also be applied to GetAt and ExistsAt operations in the ShardedCoordinator. Currently, the design specifies filtering for ScanAt, ReverseScanAt, and maxLatestCommitTS, and rejecting user-plane writes. However, if point reads are not also restricted, a user could potentially read internal KeyViz metadata or lineage records by discovering the keys, which undermines the intended isolation of the system namespace.

Comment thread docs/admin_ui_key_visualizer_design.md Outdated
Phase 3 persists compacted columns **distributed across the user Raft groups themselves, not the default group**. Concentrating KeyViz writes on the default group would centralise I/O and Raft-log growth onto a single group, creating exactly the kind of hotspot this feature is built to surface. Instead:

- Each compacted KeyViz column is written to the **Raft group that owns its key range**, under a group-local admin namespace `!admin|keyviz|range|<lineageID>|<unix-hour>`; the prefix is not routed through the default group. Phase 3 also adds an explicit system-namespace filter so `pebbleStore.ScanAt`, `ReverseScanAt`, and `ShardedCoordinator.maxLatestCommitTS` ignore `!admin|*` records for user-plane requests. The current `isPebbleMetaKey` exact-match check (`store/lsm_store.go:299`) is widened to a prefix check on `!admin|`, and the same check is applied in `nextScannableUserKey` / `prevScannableUserKey` so internal KeyViz records are skipped during user-plane scans. To prevent the inverse leak, every data-plane adapter (gRPC `RawKV`/`TransactionalKV`, Redis, DynamoDB, S3) rejects user-plane writes — `Put`, `Delete`, transactional mutations, and Redis equivalents — whose key starts with `!admin|`. The check is centralised in `kv.ShardedCoordinator` so adapters cannot forget it; a write attempting an `!admin|*` key returns `InvalidArgument` and is recorded in the audit metric.
- `lineageID` is generated as a UUIDv7 derived from the route transition HLC, making it cluster-wide unique without coordinating through the default group. To avoid losing precision, the 64-bit HLC is mapped explicitly: the **physical** part (millisecond-resolution wall clock) populates the 48-bit `unix_ts_ms` timestamp field, the low 12 bits of the HLC **logical** counter populate the `rand_a` sequence field (preserving per-millisecond ordering for transitions in the same physical tick), and the remaining 62 bits of `rand_b` come from `crypto/rand`. This avoids the collision and lost-ordering risk of naïvely truncating the HLC into the timestamp field. The lineage record stores `{start, end, routeID, validFromHLC, validToHLC, parentLineageIDs}`; `RouteID` is recorded only as the current routing hint, never as the primary history key.

medium

Mapping the HLC logical counter to the 12-bit rand_a field in UUIDv7 is problematic for high-throughput clusters. Standard HLC implementations often use a 16-bit logical counter to handle high-concurrency bursts within a single millisecond. Truncating this to 12 bits will cause collisions and break the monotonic ordering guarantee for events occurring within the same millisecond if the logical counter exceeds 4095. Since lineageID is used to track continuity across split/merge events, losing ordering could lead to incorrect history reconstruction in the UI.

Comment thread docs/admin_ui_key_visualizer_design.md Outdated
## 13. Open Questions

1. Default value of `--keyvizMaxTrackedRoutes`. 10 000 is conservative; operators with very large clusters may prefer a higher default paired with shorter retention. Settle during Phase 2 benchmarking.
2. In fan-out (§9.1), should the admin binary **pin** to the seed list or dynamically refresh membership from `GetClusterOverview` on every request? Dynamic is more correct during scale events; pinned is simpler and avoids stampedes on the seed.

medium

Regarding the open question on fan-out membership: dynamic refresh is strongly recommended over pinning to a seed list. Relying on a pinned list will lead to incomplete heatmaps as the cluster scales out or as nodes are replaced. To avoid stampedes on the seed nodes, the admin binary can cache the membership for a short duration (e.g., 10-30 seconds) rather than refreshing on every single request.

- Bound WAL replay: size cap + per-interval checkpoint + shed-on-overflow so rolling restarts stay responsive.
- Extend namespace isolation to GetAt/ExistsAt so point reads cannot probe !admin|*.
- Use full 16-bit HLC logical when packing UUIDv7 (rand_a + top nibble of rand_b) to keep monotonic ordering.
- Resolve fan-out membership open question: dynamic refresh with short-lived cache + lazy invalidation.
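The "dynamic refresh with short-lived cache" resolution in the last bullet could be structured roughly as below, using singleflight so concurrent requests trigger at most one refresh; the TTL value and the `refresh` callback are placeholders, not the PR's actual implementation.

```go
// Sketch of short-TTL membership caching with singleflight refresh.
package main

import (
	"context"
	"sync"
	"time"

	"golang.org/x/sync/singleflight"
)

type membershipCache struct {
	mu      sync.RWMutex
	nodes   []string
	fetched time.Time
	ttl     time.Duration // e.g. 10-30 s, per the review suggestion
	group   singleflight.Group
	refresh func(ctx context.Context) ([]string, error)
}

// Nodes returns cached membership if it is fresh, otherwise refreshes it once
// no matter how many requests arrive concurrently (no seed stampede).
func (c *membershipCache) Nodes(ctx context.Context) ([]string, error) {
	c.mu.RLock()
	if time.Since(c.fetched) < c.ttl && len(c.nodes) > 0 {
		defer c.mu.RUnlock()
		return c.nodes, nil
	}
	c.mu.RUnlock()

	v, err, _ := c.group.Do("membership", func() (any, error) {
		nodes, err := c.refresh(ctx)
		if err != nil {
			return nil, err
		}
		c.mu.Lock()
		c.nodes, c.fetched = nodes, time.Now()
		c.mu.Unlock()
		return nodes, nil
	})
	if err != nil {
		return nil, err
	}
	return v.([]string), nil
}
```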

bootjp commented Apr 21, 2026

/gemini review


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (2)
docs/admin_ui_key_visualizer_design.md (2)

252-262: Fix table formatting in the Integration Points section.

Static analysis flags table column count mismatches at lines 257 and 260. Reviewing the table structure at lines 252-262, the two-column layout (File | Change) has rows where the Change cell contains inline code spans with spaces (e.g., `Put`/`Delete`), which the Markdown linter interprets as additional columns.

To fix the linting warnings and improve readability:

  1. Normalize inline code spans around slashes: either remove the surrounding spaces (`Put`/`Delete`) or use a single code span (`Put / Delete`).
  2. Ensure each table row has exactly two pipe-separated cells.

Also, the code block at line 167 is missing a language identifier (flagged by MD040). Tag it as text or leave it untagged if it's intentionally pseudo-code:

-```
+```text
 Sampler
  ├─ routes atomic.Pointer[routeTable]
  └─ history *ringBuffer[matrixColumn]

These are minor formatting issues that don't affect technical correctness but improve documentation quality.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/admin_ui_key_visualizer_design.md` around lines 252 - 262, The
Integration Points table has rows where the "Change" column contains inline code
spans that include spaces around slashes which the Markdown linter treats as
extra columns (breaks the two-column `File | Change` table); edit the table so
each row has exactly two pipe-separated cells and normalize inline code spans
like `Put`/`Delete` (remove surrounding spaces or use `Put / Delete`) and any
other backticked phrases to avoid introducing additional pipes, and add a
language tag (e.g., text) to the standalone code block containing the Sampler
snippet (the block starting with "Sampler ├─ routes atomic.Pointer[routeTable]")
so MD040 is resolved.

85-86: Consider documenting token lifecycle and rotation strategy.

The design specifies a shared read-only admin token loaded from --adminTokenFile, but doesn't address token rotation, minimum entropy requirements, or compromise recovery. For production deployment, operators need guidance on:

  • Token generation (e.g., require minimum 256-bit entropy)
  • Safe distribution mechanism
  • Rotation procedure without downtime (e.g., dual-token grace period)
  • Monitoring for unauthorized access attempts

Add a "Security Considerations" subsection under §4 that specifies:

### 4.2 Token Security

- Generate admin tokens with at least 256 bits of cryptographic entropy (e.g., `openssl rand -base64 32`).
- Distribute via secure channels (sealed secrets, key management systems).
- Support dual-token mode: `--adminTokenFile` may contain multiple newline-separated tokens; nodes accept any valid token, enabling zero-downtime rotation.
- Log failed authentication attempts to the audit stream for monitoring.
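If the dual-token mode suggested above were adopted, the token-file handling could look roughly like this; the function names are illustrative and not the actual internal/tokenfile.go API.

```go
// Sketch of multi-token loading and matching for zero-downtime rotation.
package internal

import (
	"crypto/subtle"
	"os"
	"strings"
)

// loadBearerTokens reads a token file and returns all non-empty lines.
func loadBearerTokens(path string) ([]string, error) {
	raw, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	var tokens []string
	for _, line := range strings.Split(string(raw), "\n") {
		if t := strings.TrimSpace(line); t != "" {
			tokens = append(tokens, t)
		}
	}
	return tokens, nil
}

// anyTokenMatches accepts a presented token if it equals any configured token,
// so either the old or the new token works during a rotation window.
func anyTokenMatches(presented string, configured []string) bool {
	ok := false
	for _, t := range configured {
		if subtle.ConstantTimeCompare([]byte(presented), []byte(t)) == 1 {
			ok = true
		}
	}
	return ok
}
```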
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/admin_ui_key_visualizer_design.md` around lines 85 - 86, Add a "Security
Considerations" subsection under §4 that documents admin token lifecycle and
rotation for the Admin service (registered alongside RawKV), specifying that
admin tokens loaded via --adminTokenFile (and sent from admin binary using
--nodeTokenFile) must be generated with >=256-bit entropy (example: openssl rand
-base64 32), should be distributed via secure channels (sealed secrets / KMS),
support dual-token mode by allowing --adminTokenFile to contain multiple
newline-separated tokens so nodes accept any valid token for zero-downtime
rotation, require logging of failed authentication attempts to the audit stream,
and include a brief compromise-recovery procedure and guidance on using
--adminInsecureNoAuth only for local development (with startup warning).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/admin_ui_key_visualizer_design.md`:
- Line 161: Clarify how CPU attributed to Observe is measured: update the text
around "sampleRate" and the adaptive controller (and add a brief note in §5.2)
to state the chosen measurement approach explicitly—preferably Option 1
(synthetic cost model), e.g., compute Observe cost as total flush-window
duration or routeCount × benchmarkedCostPerObserve to avoid hot-path profiling;
if Option 2 is chosen instead, state using runtime.SetCPUProfileRate with a low
sampling rate (e.g., ~10Hz) and that attribution is approximate and
low-overhead. Mention both options briefly and mark which one the design uses.
Ensure references to "sampleRate" and "Observe" remain consistent.
- Around line 206-207: The text around lineageID and "route transition HLC" must
explicitly state the timestamp source to ensure leader-issued timestamps are
used: clarify that the "route transition HLC" used to derive lineageID (and
populate validFromHLC/validToHLC) is the Raft-leader-issued HLC/timestamp
attached to the SplitRange/MergeRange Raft proposal (or otherwise derived
deterministically from the Raft log index), not a node-local HLC snapshot;
update the wording for lineageID, validFromHLC, validToHLC and any references to
split/merge events to require leader-issued timestamps or a deterministic
Raft-index-based derivation to avoid violating timestamp safety.
- Around line 206-207: The lineageID UUIDv7 mapping drops the top 4 bits of the
16-bit HLC logical counter (rand_a only takes 12 bits) and uses crypto/rand for
rand_b making IDs non-deterministic across nodes; fix the lineageID generation
by encoding the full 16-bit HLC logical into the UUIDv7 fields (for example,
keep unix_ts_ms from the 48-bit HLC physical, place all 16 logical bits by using
the remaining bits of rand_a plus the high bits of rand_b as a deterministic
prefix) and derive the rest of rand_b deterministically (e.g., from a secure
hash of the HLC or node-specific stable ID + HLC) instead of crypto/rand so
identical HLCs produce identical lineageID values; update the mapping logic used
where lineageID is produced and anywhere rand_a/rand_b are referenced to ensure
deterministic, full-precision encoding.
- Line 143: Add a note in §5.1 or §5.6 describing behavior on leadership loss:
when a node's lease-loss callback for a Raft group fires, the sampler should
mark all leaderWrite samples for that group in the current and previous step
window with staleLeader=true (or optionally clear them immediately), and
document that the admin binary's fan-out/merge logic (§9.1) must exclude
staleLeader samples so they aren't double-counted; reference the sampler,
leaderWrite, staleLeader flag, lease-loss callback, and the admin fan-out merge
in the text.

---

Nitpick comments:
In `@docs/admin_ui_key_visualizer_design.md`:
- Around line 252-262: The Integration Points table has rows where the "Change"
column contains inline code spans that include spaces around slashes which the
Markdown linter treats as extra columns (breaks the two-column `File | Change`
table); edit the table so each row has exactly two pipe-separated cells and
normalize inline code spans like `Put`/`Delete` (remove surrounding spaces or
use `Put / Delete`) and any other backticked phrases to avoid introducing
additional pipes, and add a language tag (e.g., text) to the standalone code
block containing the Sampler snippet (the block starting with "Sampler ├─ routes
atomic.Pointer[routeTable]") so MD040 is resolved.
- Around line 85-86: Add a "Security Considerations" subsection under §4 that
documents admin token lifecycle and rotation for the Admin service (registered
alongside RawKV), specifying that admin tokens loaded via --adminTokenFile (and
sent from admin binary using --nodeTokenFile) must be generated with >=256-bit
entropy (example: openssl rand -base64 32), should be distributed via secure
channels (sealed secrets / KMS), support dual-token mode by allowing
--adminTokenFile to contain multiple newline-separated tokens so nodes accept
any valid token for zero-downtime rotation, require logging of failed
authentication attempts to the audit stream, and include a brief
compromise-recovery procedure and guidance on using --adminInsecureNoAuth only
for local development (with startup warning).

📥 Commits

Reviewing files that changed from the base of the PR and between f1dcc14 and 6a88d26.

📒 Files selected for processing (1)
  • docs/admin_ui_key_visualizer_design.md


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a comprehensive design for an Admin UI and Key Visualizer for Elastickv, featuring a standalone binary, a Svelte-based frontend, and a sophisticated sampling and persistence mechanism for traffic heatmaps. The review feedback highlights several critical areas for refinement in the Phase 3 persistence and recovery logic, including the need for a WAL corruption strategy, clarification on how follower-local reads are persisted to avoid data loss during rebalancing, and the necessity of restricting lineage record creation to Raft leaders to prevent conflicting IDs. Additionally, the feedback suggests jittering persistence intervals to prevent cluster-wide load spikes and ensuring that namespace isolation is consistently applied to point reads as well as scans.

- `lineageID` is generated as a UUIDv7 derived from the route transition HLC, making it cluster-wide unique without coordinating through the default group. To avoid losing precision, the 64-bit HLC is mapped explicitly: the **physical** part (millisecond-resolution wall clock) populates the 48-bit `unix_ts_ms` timestamp field, and the full HLC **logical** counter (Elastickv's HLC uses a 16-bit logical) populates `rand_a` (12 bits) **concatenated** with the top 4 bits of `rand_b`'s per-UUID random payload — i.e. logical bits `[15:4]` → `rand_a`, logical bits `[3:0]` → the top nibble of `rand_b`, so the full 16-bit logical is preserved inside the UUID body. The remaining 58 bits of `rand_b` come from `crypto/rand`. This preserves the HLC monotonic ordering guarantee within a single millisecond (up to 65 535 transitions per ms per node, well beyond the observed split/merge rate) and still gives ~2^58 random bits to keep collision probability negligible. The lineage record stores `{start, end, routeID, validFromHLC, validToHLC, parentLineageIDs}` with `validFromHLC` carrying the full HLC so the reader can re-sort authoritatively; `RouteID` is recorded only as the current routing hint, never as the primary history key.
- Split and merge events append small group-local lineage records under `!admin|keyviz|lineage|<lineageID>` and mark closed branches with `validToHLC` so retention GC can later prune them. On split, both children point back to the parent lineage and inherit the parent's compacted history for continuity. On merge, the survivor records both child lineage IDs and the reader sums overlapping intervals. If a node sees historical rows without a lineage record during an upgrade, the admin reader falls back to overlap on the persisted `[start, end)` range before using `RouteID`.
- On startup, the sampler rebuilds its in-memory `RouteID → lineageID` map by scanning the group-local lineage index for routes currently owned by the node's groups and matching active `[start, end)` ranges from `ListRoutes`. If a route exists without a matching lineage record, the node creates a new lineage record with a parent pointer to the best overlapping retained range. This makes rolling restarts and upgrades preserve historical continuity.
- Writes are batched per group on a configurable interval (`--keyvizPersistInterval`, **default 5 min**, max 1 h) and dispatched as a single low-priority Raft proposal per group, keeping the write amplification proportional to the group's own traffic. Hourly was rejected as the default because a node crash between flushes would lose up to one hour of heatmap; 5 min bounds worst-case loss while still amortising Raft cost. As a defence-in-depth against single-point loss, each node also keeps the most recent unflushed window in a small **append-only WAL file** (`<dataDir>/keyviz/wal-<hour>.log`) under the same retention contract, with two hard bounds to keep restart fast: the WAL is **size-capped at `--keyvizWALMaxBytes` (default 64 MiB)** and **checkpointed every `--keyvizPersistInterval`** — when a batch is persisted to Raft, the corresponding WAL prefix is truncated. This caps worst-case replay at one interval's worth of data (at the default, tens of MiB at most), and a target recovery budget of **≤1 s replay time at 1 M ops/s**. If the WAL exceeds its size cap before the next flush — indicating the node is behind on persistence — the sampler drops the oldest records and records a `keyviz_wal_shed_total` metric instead of blocking the hot path. On startup the sampler fast-loads the WAL without running the adaptive controller, then resumes normal operation; readiness is gated on WAL replay completion so rolling upgrades do not route traffic to a node that is still rebuilding state. Operators that want stricter durability set `--keyvizPersistInterval=30s`; those that want faster restart at the cost of more write amplification set a smaller `--keyvizWALMaxBytes`.
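To illustrate the size-cap and checkpoint behaviour quoted above, a stripped-down WAL sketch follows. The flag name, default, and the `keyviz_wal_shed_total` metric come from the text; the file handling is an assumption, and as a simplification it drops the incoming record on overflow, whereas the design sheds the oldest records.

```go
// Simplified size-capped, shed-on-overflow KeyViz WAL.
package keyviz

import (
	"os"
	"sync/atomic"
)

type keyvizWAL struct {
	f        *os.File
	size     int64
	maxBytes int64         // --keyvizWALMaxBytes, default 64 MiB in the design
	shed     atomic.Uint64 // surfaced as the keyviz_wal_shed_total metric
}

// Append writes one encoded record. If the WAL is already at its cap, the
// record is dropped and counted instead of blocking the hot path.
func (w *keyvizWAL) Append(record []byte) error {
	if w.size+int64(len(record)) > w.maxBytes {
		w.shed.Add(1)
		return nil
	}
	n, err := w.f.Write(record)
	w.size += int64(n)
	return err
}

// Checkpoint is called after a batch has been persisted through Raft; the
// design truncates the corresponding WAL prefix, simplified here to a full
// truncate of the flushed window.
func (w *keyvizWAL) Checkpoint() error {
	if err := w.f.Truncate(0); err != nil {
		return err
	}
	if _, err := w.f.Seek(0, 0); err != nil {
		return err
	}
	w.size = 0
	return nil
}
```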


high

The design gates node readiness on WAL replay completion. While this ensures data continuity, the design should specify how to handle a corrupted WAL file. Without a defined "ignore and continue" or "move aside" strategy, a single corrupted KeyViz WAL segment could prevent a node from joining the cluster or passing health checks, turning a monitoring feature into an availability risk.

References
  1. When a design document identifies a significant operational risk, such as the inability to perform rolling upgrades, it must also detail potential mitigation strategies, like implementing a temporary "bridge" or "proxy" mode.

Comment on lines +203 to +205
Phase 3 persists compacted columns **distributed across the user Raft groups themselves, not the default group**. Concentrating KeyViz writes on the default group would centralise I/O and Raft-log growth onto a single group, creating exactly the kind of hotspot this feature is built to surface. Instead:

- Each compacted KeyViz column is written to the **Raft group that owns its key range**, under a group-local admin namespace `!admin|keyviz|range|<lineageID>|<unix-hour>`; the prefix is not routed through the default group. Phase 3 also adds an explicit system-namespace filter so every user-plane read and timestamp-selection path — `pebbleStore.ScanAt`, `ReverseScanAt`, `GetAt`, `ExistsAt`, and `ShardedCoordinator.maxLatestCommitTS` — ignores `!admin|*` records; point reads that target an `!admin|*` key return `NotFound` as if the key did not exist, so an attacker cannot distinguish "hidden" from "missing". The current `isPebbleMetaKey` exact-match check (`store/lsm_store.go:299`) is widened to a prefix check on `!admin|`, and the same check is applied in `nextScannableUserKey` / `prevScannableUserKey` so internal KeyViz records are skipped during user-plane scans. To prevent the inverse leak, every data-plane adapter (gRPC `RawKV`/`TransactionalKV`, Redis, DynamoDB, S3) rejects user-plane writes — `Put`, `Delete`, transactional mutations, and Redis equivalents — whose key starts with `!admin|`. The check is centralised in `kv.ShardedCoordinator` so adapters cannot forget it; a write attempting an `!admin|*` key returns `InvalidArgument` and is recorded in the audit metric.

medium

The design for Phase 3 persistence should clarify whether follower-local read samples are proposed to the Raft leader or stored only in the node's local Pebble instance. Section 5.1 states that pointing the admin binary at a single node produces a "partial view" because reads are spread across followers, which suggests the data is not aggregated in the Raft log. However, Section 5.6 describes persistence via "Raft proposal per group" into a namespace that is part of the FSM. If follower reads are not replicated via Raft, they will be lost if a group is rebalanced to a new node, contradicting the goal of preserving 7 days of history across cluster changes.


Contributor (medium)

The design states that point reads targeting !admin|* keys should return NotFound. However, the integration points in Section 8 only mention widening the isPebbleMetaKey check for scans. To ensure full isolation, the pebbleStore.getAt and latestCommitTS methods (and their public counterparts) must also be updated to explicitly reject or skip keys with the !admin| prefix, as they do not use the scannable key helpers.

Comment thread docs/admin_ui_key_visualizer_design.md Outdated
- Each compacted KeyViz column is written to the **Raft group that owns its key range**, under a group-local admin namespace `!admin|keyviz|range|<lineageID>|<unix-hour>`; the prefix is not routed through the default group. Phase 3 also adds an explicit system-namespace filter so every user-plane read and timestamp-selection path — `pebbleStore.ScanAt`, `ReverseScanAt`, `GetAt`, `ExistsAt`, and `ShardedCoordinator.maxLatestCommitTS` — ignores `!admin|*` records; point reads that target an `!admin|*` key return `NotFound` as if the key did not exist, so an attacker cannot distinguish "hidden" from "missing". The current `isPebbleMetaKey` exact-match check (`store/lsm_store.go:299`) is widened to a prefix check on `!admin|`, and the same check is applied in `nextScannableUserKey` / `prevScannableUserKey` so internal KeyViz records are skipped during user-plane scans. To prevent the inverse leak, every data-plane adapter (gRPC `RawKV`/`TransactionalKV`, Redis, DynamoDB, S3) rejects user-plane writes — `Put`, `Delete`, transactional mutations, and Redis equivalents — whose key starts with `!admin|`. The check is centralised in `kv.ShardedCoordinator` so adapters cannot forget it; a write attempting an `!admin|*` key returns `InvalidArgument` and is recorded in the audit metric.
- `lineageID` is generated as a UUIDv7 derived from the route transition HLC, making it cluster-wide unique without coordinating through the default group. To avoid losing precision, the 64-bit HLC is mapped explicitly: the **physical** part (millisecond-resolution wall clock) populates the 48-bit `unix_ts_ms` timestamp field, and the full HLC **logical** counter (Elastickv's HLC uses a 16-bit logical) populates `rand_a` (12 bits) **concatenated** with the top 4 bits of `rand_b`'s per-UUID random payload — i.e. logical bits `[15:4]` → `rand_a`, logical bits `[3:0]` → the top nibble of `rand_b`, so the full 16-bit logical is preserved inside the UUID body. The remaining 58 bits of `rand_b` come from `crypto/rand`. This preserves the HLC monotonic ordering guarantee within a single millisecond (up to 65 535 transitions per ms per node, well beyond the observed split/merge rate) and still gives ~2^58 random bits to keep collision probability negligible. The lineage record stores `{start, end, routeID, validFromHLC, validToHLC, parentLineageIDs}` with `validFromHLC` carrying the full HLC so the reader can re-sort authoritatively; `RouteID` is recorded only as the current routing hint, never as the primary history key.
- Split and merge events append small group-local lineage records under `!admin|keyviz|lineage|<lineageID>` and mark closed branches with `validToHLC` so retention GC can later prune them. On split, both children point back to the parent lineage and inherit the parent's compacted history for continuity. On merge, the survivor records both child lineage IDs and the reader sums overlapping intervals. If a node sees historical rows without a lineage record during an upgrade, the admin reader falls back to overlap on the persisted `[start, end)` range before using `RouteID`.
- On startup, the sampler rebuilds its in-memory `RouteID → lineageID` map by scanning the group-local lineage index for routes currently owned by the node's groups and matching active `[start, end)` ranges from `ListRoutes`. If a route exists without a matching lineage record, the node creates a new lineage record with a parent pointer to the best overlapping retained range. This makes rolling restarts and upgrades preserve historical continuity.
Contributor (medium)

The logic for creating new lineage records on startup should be restricted to the Raft leader of the group. If multiple nodes in a group detect a missing lineage record simultaneously and use the proposed UUIDv7 generation (which includes 58 random bits), they will propose conflicting lineage IDs for the same key range. While Raft will linearize these, it results in redundant records and potential confusion in the lineage tree.

Contributor (medium)

To avoid synchronized load spikes across the cluster, the 5-minute persistence interval (--keyvizPersistInterval) should be jittered or staggered per Raft group. If all groups on all nodes attempt to propose their compacted KeyViz columns simultaneously, it could lead to transient Raft election timeouts or increased latency on the data plane hot path.

References
  1. When a design document identifies a significant operational risk, such as the inability to perform rolling upgrades, it must also detail potential mitigation strategies, like implementing a temporary "bridge" or "proxy" mode.
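For illustration, a tiny sketch of the jitter this comment asks for; the ±10% spread and the helper name are assumptions, not part of the design.

```go
// Sketch only: re-arm each group's flush timer with a jittered interval so
// per-group KeyViz flushes drift apart instead of firing cluster-wide at once.
package keyviz

import (
	"math/rand"
	"time"
)

// jitteredInterval returns --keyvizPersistInterval plus a random offset in
// [-10%, +10%). Assumes the interval is large enough that interval/5 > 0.
func jitteredInterval(interval time.Duration) time.Duration {
	j := time.Duration(rand.Int63n(int64(interval)/5)) - interval/10
	return interval + j
}
```

Each group would call this when scheduling its next flush, so even groups that start in lockstep stop flushing together.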

…p loss

- Leadership loss: mark in-window leaderWrite samples staleLeader=true; fan-out keys writes by (group, term, window) so stale + fresh samples never double-count.
- CPU attribution: use synthetic costPerObserveNs × observeCount rather than runtime profiling to keep hot path clean.
- lineageID is generated once by the Raft leader as part of the split/merge proposal (leader-issued HLC), deterministically derived from HLC + log index + BLAKE2b so every replica computes the same UUIDv7.
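A sketch of the deterministic derivation named in the last bullet. It assumes a millisecond-physical / 16-bit-logical HLC and uses `golang.org/x/crypto/blake2b` for the filler bits; the function name and exact byte layout are illustrative, not the committed format.

```go
// Sketch only: derive a deterministic UUIDv7-shaped lineageID from the
// leader-issued HLC and the Raft log index, so every replica applying the
// split/merge entry computes the same ID. Layout follows the design text:
// unix_ts_ms <- HLC physical ms; rand_a plus the top nibble of rand_b <- the
// 16-bit HLC logical; the remaining bits come from BLAKE2b(HLC, log index)
// instead of crypto/rand, to keep the result replica-deterministic.
package keyviz

import (
	"encoding/binary"

	"golang.org/x/crypto/blake2b"
)

func deterministicLineageID(hlcPhysicalMs uint64, hlcLogical uint16, logIndex uint64) [16]byte {
	var id [16]byte

	// 48-bit unix_ts_ms from the HLC physical part.
	id[0] = byte(hlcPhysicalMs >> 40)
	id[1] = byte(hlcPhysicalMs >> 32)
	id[2] = byte(hlcPhysicalMs >> 24)
	id[3] = byte(hlcPhysicalMs >> 16)
	id[4] = byte(hlcPhysicalMs >> 8)
	id[5] = byte(hlcPhysicalMs)

	// Deterministic filler derived from (HLC, log index).
	seed := make([]byte, 18)
	binary.BigEndian.PutUint64(seed[0:8], hlcPhysicalMs)
	binary.BigEndian.PutUint16(seed[8:10], hlcLogical)
	binary.BigEndian.PutUint64(seed[10:18], logIndex)
	sum := blake2b.Sum256(seed)

	// Version nibble + rand_a = logical bits [15:4].
	id[6] = 0x70 | byte(hlcLogical>>12)
	id[7] = byte(hlcLogical >> 4)

	// Variant bits (10), then logical bits [3:0] as the top nibble of the
	// rand_b payload, then deterministic filler for the rest.
	id[8] = 0x80 | byte(hlcLogical&0x0F)<<2 | (sum[0] & 0x03)
	for i := 9; i < 16; i++ {
		id[i] = sum[i-8]
	}
	return id
}
```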
@bootjp
Owner Author

bootjp commented Apr 21, 2026

/gemini review

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a comprehensive design document for a new Admin UI and Key Visualizer for Elastickv, featuring a standalone admin binary, a high-performance traffic sampler, and a distributed persistence model. The review feedback highlights several critical areas for improvement: addressing potential performance bottlenecks in the cluster-wide fan-out mechanism as the cluster scales, extending write protections to all internal metadata prefixes (such as _meta_), and mitigating a race condition in the lock-free counter flush logic to ensure data accuracy.


Because writes are recorded by Raft leaders and follower-local reads are recorded by the followers that serve them (§5.1), pointing the admin binary at a single node produces a **partial heatmap**. To give operators a complete view by default, the admin binary runs in **fan-out mode**:

- `--nodes` accepts a comma-separated list of seed addresses. The admin binary calls `GetClusterOverview` on any reachable seed to discover the current full membership (node → gRPC endpoint, plus per-group leader identity). Membership is cached for `--nodesRefreshInterval` (**default 15 s**) so a stampede of concurrent browser requests hits at most one `GetClusterOverview` per interval per seed, while scale-out events are still reflected within seconds. The cache is refreshed lazily on the first request after expiry and invalidated immediately on any per-node `Unavailable` error, so removed or replaced nodes are dropped on the next request instead of waiting for the next tick.
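A compact sketch of the caching semantics in this bullet, with assumed names (`membershipCache`, `targets`, `invalidate`); the admin binary's actual fanout type differs, but the refresh and invalidation rules are the ones described above.

```go
// Sketch only: lazily refreshed membership cache with invalidate-on-Unavailable.
package main

import (
	"sync"
	"time"
)

type membershipCache struct {
	mu        sync.Mutex
	nodes     []string // gRPC endpoints discovered via GetClusterOverview
	fetchedAt time.Time
	ttl       time.Duration // --nodesRefreshInterval, e.g. 15s
}

// targets returns the cached membership, refreshing at most once per TTL even
// when many browser requests arrive concurrently.
func (c *membershipCache) targets(discover func() ([]string, error)) []string {
	c.mu.Lock()
	defer c.mu.Unlock()
	if time.Since(c.fetchedAt) < c.ttl && len(c.nodes) > 0 {
		return c.nodes
	}
	if fresh, err := discover(); err == nil {
		c.nodes, c.fetchedAt = fresh, time.Now()
	}
	return c.nodes
}

// invalidate is called when a per-node RPC returns codes.Unavailable, so the
// next request re-discovers membership instead of waiting for the TTL.
func (c *membershipCache) invalidate() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.fetchedAt = time.Time{}
}
```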
Contributor (high)

The cluster-wide fan-out mode for the admin binary (querying every node) may become a performance bottleneck as the cluster scales to hundreds or thousands of nodes. This represents a significant operational risk for large-scale deployments. While acceptable for an MVP, the design must detail potential mitigation strategies for Phase 3, such as querying only the leaders of the relevant Raft groups or implementing a hierarchical aggregation layer to reduce the fan-out pressure.

References
  1. When a design document identifies a significant operational risk, such as performance bottlenecks at scale, it must also detail potential mitigation strategies.


Contributor (security-medium, medium)

The design mentions widening isPebbleMetaKey to a prefix check on !admin| to protect internal KeyViz records from user-plane scans. However, the existing meta keys (e.g., _meta_last_commit_ts) are also sensitive and should be protected from user-plane writes. To mitigate the operational risk of metadata corruption, the design should explicitly state that all internal prefixes (including _meta_) must be rejected by the ShardedCoordinator for user-plane writes.

References
  1. When a design document identifies a significant operational risk, such as metadata corruption, it must also detail potential mitigation strategies.

Comment thread docs/admin_ui_key_visualizer_design.md Outdated
Comment on lines +149 to +150
- **Current-window counters**: `routes` is an immutable `routeTable` published through `atomic.Pointer[routeTable]`. `routeTable` owns `map[RouteID]*routeSlot`; each `routeSlot` owns an `atomic.Pointer[routeCounters]`. `Observe` loads the current table, performs a plain map lookup against that immutable snapshot, loads the slot's counter pointer, and uses `atomic.AddUint64` on the counter fields. Adding a new `RouteID` or replacing split/merge mappings performs a copy-on-write table update under a non-hot-path `routesMu`, then publishes the new table with one atomic store. No `Observe` call ever runs against a Go map that can be mutated concurrently.
- **Flush**: instead of holding a long write lock, the flush goroutine **atomically swaps** the `*routeCounters` pointer for each key using `atomic.Pointer[routeCounters]`, then reads the old pointer's frozen counters to build the new matrix column. `Observe` that loaded the old pointer before the swap completes its increments against the (now-retired) old counters, which the next flush will harvest. No counts are lost; at most one step-boundary's worth of counts land in the next column instead of the current one.
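A condensed sketch of the structure these two bullets describe: an immutable route table behind an `atomic.Pointer`, per-slot counter pointers, atomic adds on the hot path, and a pointer swap at flush. Names are illustrative, and the final comment deliberately notes the read-after-swap window that the review comment below discusses.

```go
// Sketch only: per-route counters updated with atomic adds against an
// immutable snapshot, harvested by swapping the counter pointer at each
// step boundary.
package keyviz

import "sync/atomic"

type routeCounters struct {
	reads, writes atomic.Uint64
}

type routeSlot struct {
	counters atomic.Pointer[routeCounters]
}

type routeTable struct {
	slots map[uint64]*routeSlot // RouteID -> slot; the map is never mutated in place
}

type sampler struct {
	routes atomic.Pointer[routeTable] // replaced wholesale (copy-on-write) on split/merge
}

// Observe is the hot path: one table load, one lookup on an immutable
// snapshot, one counter-pointer load, then atomic adds.
func (s *sampler) Observe(routeID uint64, isWrite bool) {
	tbl := s.routes.Load()
	if tbl == nil {
		return
	}
	slot, ok := tbl.slots[routeID]
	if !ok {
		return
	}
	c := slot.counters.Load()
	if c == nil {
		return
	}
	if isWrite {
		c.writes.Add(1)
	} else {
		c.reads.Add(1)
	}
}

// flushRoute retires the current counters by swapping in a fresh struct and
// reads the retired values. An Observe that loaded the old pointer but has
// not yet executed its Add can still land after this read; that window is
// exactly what the review comment below points at.
func (s *sampler) flushRoute(routeID uint64) (reads, writes uint64) {
	tbl := s.routes.Load()
	if tbl == nil {
		return 0, 0
	}
	slot, ok := tbl.slots[routeID]
	if !ok {
		return 0, 0
	}
	old := slot.counters.Swap(&routeCounters{})
	if old == nil {
		return 0, 0
	}
	return old.reads.Load(), old.writes.Load()
}
```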
Contributor (medium)

The lock-free pointer swap scheme for routeCounters is efficient, but there is a potential race condition during the flush. If the flush goroutine reads the old counters immediately after the swap, it might miss increments from Observe calls that loaded the old pointer just before the swap but haven't executed atomic.AddUint64 yet. To ensure no counts are lost as claimed, the flush mechanism should include a small grace period or a double-buffering strategy to allow pending increments to settle before harvesting the frozen values.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request implements Phase 0 of the Elastickv admin Web UI, introducing a standalone elastickv-admin binary and a node-side Admin gRPC service. The changes include cluster membership discovery, connection caching, and token-based authentication for administrative RPCs. Review feedback suggests enhancing operational visibility by logging errors during the closure of network connections and files, and recommends increasing the aggregate response body size limit to support larger cluster configurations.

if f.closed {
	f.mu.Unlock()
	_ = conn.Close()
	return nil, func() {}, errFanoutClosed
Contributor (medium)

The error from conn.Close() is silently ignored here. Per the general rules, errors from Close() methods on resources like network connections should be logged to ensure resource leaks or cleanup problems are visible.

if err := conn.Close(); err != nil {
	log.Printf("elastickv-admin: close orphaned connection for %s: %v", addr, err)
}
References
  1. Do not silently ignore errors from Close() methods on resources like network connections. Log them to ensure resource leaks or other cleanup problems are visible.

Comment thread cmd/elastickv-admin/main.go Outdated
// operators running clusters where the overview legitimately exceeds this
// can raise the constant. Keep this aligned with handleOverview's fan-out
// cap so a misbehaving node cannot force unbounded memory growth.
const maxResponseBodyBytes = 32 << 20
Contributor (medium)

The 32 MiB cap for the aggregated response body might be too restrictive for large clusters. As noted in the comments, a 512-node cluster overview can reach ~75 MiB. Consider increasing this limit or making it a configurable flag to avoid 500 errors on legitimate large responses.

Suggested change
const maxResponseBodyBytes = 32 << 20
const maxResponseBodyBytes = 128 << 20

Comment thread internal/tokenfile.go Outdated
if err != nil {
return "", errors.Wrapf(err, "open %s file", humanName)
}
defer func() { _ = f.Close() }()
Contributor (medium)

While this is a file resource rather than a network connection, it is still good practice to log errors from Close() to maintain visibility into potential resource leaks or filesystem issues, aligning with the project's general rules for resource cleanup.

Suggested change
defer func() { _ = f.Close() }()
defer func() {
	if err := f.Close(); err != nil {
		log.Printf("internal: close %s file %s: %v", humanName, abs, err)
	}
}()
References
  1. Do not silently ignore errors from Close() methods on resources like network connections. Log them to ensure resource leaks or other cleanup problems are visible.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4488ba55d0


Comment thread adapter/admin_grpc.go Outdated
Comment on lines +268 to +272
if _, known := addrByID[m.NodeID]; known {
	continue
}
addrByID[m.NodeID] = m.GRPCAddress
*order = append(*order, m.NodeID)
P2: Drop removed seed members from cluster overview

GetClusterOverview is documented to pick up scale-in events, but mergeSeedMembers re-adds any bootstrap seed whose NodeID is missing from live Raft configuration. In a real scale-in (node removed from all groups), that node stays in Members forever, so the admin fan-out keeps dialing a decommissioned peer and reporting persistent false errors instead of converging to current membership.


bootjp added 2 commits April 25, 2026 22:35
Codex P2 + 3 Gemini M on 14698e8:

- (Codex P2) mergeSeedMembers used to re-add any bootstrap seed whose
  NodeID was missing from live Raft config, so a node that was removed
  via raft RemoveServer kept appearing in /api/cluster/overview forever
  and the admin fan-out kept dialing the dead peer. Track which NodeIDs
  the live configs reported (in any state, even with a blank address),
  plus an "authoritative" flag set when ≥1 Configuration call
  succeeded. Seed backfill now: (a) supplies the address when a NodeID
  was seen but with empty Address (etcd convergence transient); (b)
  drops removed seeds when the live result is authoritative; (c) falls
  back to seeds only on cold-start or all-errors. Regression test pins
  scale-in.

- (Gemini M) cmd/elastickv-admin/main.go: log conn.Close() error in
  installOrAttach's "fanout closed mid-dial" branch instead of swallowing.

- (Gemini M) maxResponseBodyBytes raised 32 → 128 MiB to clear the
  ~75 MiB worst case for a 512-node fan-out (rationale comment is
  unchanged otherwise).

- (Gemini M) internal/tokenfile.go now logs f.Close() errors instead
  of swallowing in the deferred close.

Test refresh: TestGetClusterOverviewReturnsSelfAndLeaders now populates
the fakeGroup.Servers with both nodes so the live config is consistent
with the bootstrap seed (otherwise the scale-in fix correctly drops
the seed since live is "authoritatively empty").
@bootjp
Owner Author

bootjp commented Apr 25, 2026

/gemini review @codex review

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request implements Phase 0 of the Admin UI and Key Visualizer design. It introduces a new standalone binary, cmd/elastickv-admin, which serves as an HTTP gateway that fans out requests to cluster nodes using a newly defined Admin gRPC service. Key features include a connection caching mechanism with eviction logic, bearer token authentication for the gRPC service, and detailed design documentation. My feedback focuses on a performance optimization in the connection eviction path to avoid redundant map allocations while holding a global mutex.

Comment thread cmd/elastickv-admin/main.go Outdated
Comment on lines +551 to +554
seeds := make(map[string]struct{}, len(f.seeds))
for _, s := range f.seeds {
	seeds[s] = struct{}{}
}
Contributor (medium)

The seeds map is rebuilt on every call to evictOneLocked, which occurs under the global fanout.mu lock whenever the client cache is full. Since the seed list is immutable after construction, consider pre-calculating this map in newFanout and storing it in the fanout struct to avoid redundant allocations and work on the eviction path.

Gemini medium on 9485138: evictOneLocked rebuilt a map[string]struct{}
of seed addresses on every cache-full eviction call (under fanout.mu).
Seeds are immutable after newFanout, so the per-call map allocation
was pure waste and slowed down the hot eviction path while holding the
global mutex.

Cache the set as fanout.seedSet at construction time and reuse it.
@bootjp
Owner Author

bootjp commented Apr 25, 2026

/gemini review @codex review

Reviewdog flagged cmd/elastickv-admin/main.go's import block; the local
golangci-lint --fix didn't normalise it because of a stale lint cache.
Bringing it into shape: bootjp-prefix imports separated into the third
group, matching the project gci config used by reviewdog in CI.
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request implements Phase 0 of the Elastickv Admin UI and Key Visualizer, introducing a node-side Admin gRPC service with token-based authentication and a standalone admin binary for cluster-wide query fan-out. The changes include a connection cache for gRPC clients, membership discovery logic, and detailed design documentation. Feedback suggests several improvements for robustness and security: enforcing request body size limits on all HTTP handlers to prevent DoS, using a worker pool or semaphore to bound concurrent goroutines during gRPC fan-out, and ensuring that background goroutines in configuration polling are properly synchronized or terminated upon context cancellation to avoid resource leaks.

Comment on lines +130 to +148
// testability and to keep run() under the cyclop budget.
func buildMux(fan *fanout) *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	mux.HandleFunc("/api/cluster/overview", fan.handleOverview)
	mux.HandleFunc("/api/", func(w http.ResponseWriter, _ *http.Request) {
		writeJSONError(w, http.StatusServiceUnavailable, "endpoint not implemented in phase 0")
	})
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path != "/" {
			http.NotFound(w, r)
			return
		}
		w.Header().Set("Content-Type", "text/plain; charset=utf-8")
		_, _ = w.Write([]byte("elastickv-admin: phase 0 — SPA not yet embedded\n"))
	})
	return mux
Contributor (security-medium, medium)

The HTTP server handlers for /healthz, /api/, and / do not currently enforce a request body size limit. While these endpoints are not expected to receive large payloads, applying a global limit via middleware or http.MaxBytesReader in each handler is a recommended security practice to prevent potential Denial of Service (DoS) attacks from oversized request bodies.

References
  1. When handling HTTP requests, always limit the size of the request body (e.g., using http.MaxBytesReader) to prevent Denial of Service attacks from oversized payloads.
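A small sketch of the middleware this comment suggests; the 1 MiB cap and the wrapper name are assumptions.

```go
// Sketch only: wrap the admin mux so every handler sees a size-limited body.
package main

import "net/http"

const maxRequestBodyBytes = 1 << 20 // assumed 1 MiB limit for admin endpoints

func limitRequestBody(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// Reads past the limit surface a *http.MaxBytesError to the handler,
		// which can then respond with 413.
		r.Body = http.MaxBytesReader(w, r.Body, maxRequestBodyBytes)
		next.ServeHTTP(w, r)
	})
}
```

Wiring it once around the mux, for example `limitRequestBody(buildMux(fan))`, would cover every route, including ones added later.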

Comment on lines +846 to +875
wg.Add(1)
go func(i int, addr string) {
	defer wg.Done()
	entry := perNodeResult{Node: addr}
	cli, release, err := f.clientFor(addr)
	if err != nil {
		entry.Error = err.Error()
		results[i] = entry
		return
	}
	defer release()
	resp, err := cli.client.GetClusterOverview(f.outgoingCtx(ctx), &pb.GetClusterOverviewRequest{})
	if err != nil {
		if status.Code(err) == codes.Unavailable {
			f.invalidateClient(addr)
		}
		entry.Error = err.Error()
		results[i] = entry
		return
	}
	data, mErr := marshalProto(resp)
	if mErr != nil {
		entry.Error = errors.Wrap(mErr, "marshal response").Error()
		results[i] = entry
		return
	}
	entry.OK = true
	entry.Data = data
	results[i] = entry
}(i, addr)
Contributor (medium)

The fan-out logic in handleOverview spawns a goroutine for every target node (up to 512). To prevent unbounded memory growth and potential OOM issues from external requests, apply a fixed bound to these concurrent operations (e.g., using a worker pool or semaphore) and consider rejecting new requests when the bound is reached.

References
  1. To prevent unbounded memory growth and potential OOM issues, apply a fixed bound to collections or resources that can grow from external requests. Reject new requests when the bound is reached.

Comment thread adapter/admin_grpc.go
Comment on lines +200 to +222
func fanoutConfigurationCalls(ctx context.Context, groups []groupEntry) []configResult {
	resultsCh := make(chan configResult, len(groups))
	for i, entry := range groups {
		go func(i int, entry groupEntry) {
			if err := ctx.Err(); err != nil {
				resultsCh <- configResult{i: i, err: err}
				return
			}
			cfg, err := entry.group.Configuration(ctx)
			resultsCh <- configResult{i: i, cfg: cfg, err: err}
		}(i, entry)
	}
	got := make([]configResult, 0, len(groups))
	for range groups {
		select {
		case res := <-resultsCh:
			got = append(got, res)
		case <-ctx.Done():
			return got
		}
	}
	return got
}
Contributor (medium)

In fanoutConfigurationCalls, if the parent ctx is cancelled, the function returns immediately with a partial got slice. However, the goroutines spawned for each group will continue to run in the background until they either finish their gRPC call or observe the cancellation themselves. While the resultsCh is buffered to prevent blocking, these 'orphaned' goroutines still consume resources. Consider using a sync.WaitGroup or a derived context to ensure all spawned goroutines are accounted for or terminated before the function returns, especially if this pattern is reused for more expensive operations.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 968c041af7


Comment thread adapter/admin_grpc.go
Comment on lines +302 to +305
if !seen && live.authoritative {
// Live config is authoritative and doesn't know this node:
// it was removed via raft RemoveServer. Skip.
continue
P2: Avoid pruning seed nodes after partial config failures

mergeSeedMembers drops any seed node not seen in live config when live.authoritative is true, but collectLiveMembers sets authoritative after any single successful Configuration call. If one group succeeds and another group’s Configuration errors, nodes that are only visible in the failed group are incorrectly treated as removed and omitted from GetClusterOverview, so admin fan-out can stop querying healthy peers. This should only prune unseen seeds when live membership is complete (or otherwise proven authoritative across groups), not on partial results.


@bootjp
Owner Author

bootjp commented Apr 25, 2026

/gemini review

Codex P2 on 9485138: collectLiveMembers flipped live.authoritative on
any single successful Configuration call, so mergeSeedMembers would
prune a seed NodeID just because at least one group reported its own
view — even if other groups errored and the missing peer was only
known to a failing group. That regressed admin fan-out under partial
config failure.

Tighten the contract: authoritative is now true only when every queried
group reported successfully (and the fanout did not return early on ctx
cancellation, so we counted as many results as groups). Under any
partial failure, ctx-cancel, or zero groups, mergeSeedMembers falls
back to the seed list; a peer visible only via a failing group stays in
the overview.

Doc comment updated to document both regressions (round-24 over-pruned;
round-22 never pruned). Regression test fires GetClusterOverview with
one healthy group and one ctx.DeadlineExceeded group, asserts both the
live-known peer and the seed-only peer remain.
@bootjp
Owner Author

bootjp commented Apr 25, 2026

/gemini review @codex review

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a node-side Admin gRPC service and a standalone elastickv-admin binary for cluster management and traffic visualization, featuring token-based authentication and a connection-cached fan-out mechanism. Review feedback highlighted a bug in the membership authority logic that could lead to erroneous node removal during transient failures and recommended implementing concurrency limits for per-group configuration polling to prevent potential resource exhaustion.

Comment thread adapter/admin_grpc.go
Comment on lines +250 to +259
sort.Slice(got, func(a, b int) bool { return got[a].i < got[b].i })

live := liveMembers{
	addrByID: map[string]string{},
	seenID:   map[string]struct{}{},
	order:    []string{},
}
successes := 0
for _, res := range got {
	if res.err != nil {
Contributor (high)

The authoritative flag logic is too aggressive. Currently, it is set to true if at least one Raft group returns its configuration successfully. In a multi-raft system, a single group only knows its own members, not the entire cluster. If some groups fail to respond (e.g., due to a timeout), the aggregated membership list will be incomplete. Marking this partial list as authoritative causes mergeSeedMembers to incorrectly drop seed nodes that might still be active members of the groups that didn't respond. This can lead to nodes being erroneously removed from the cluster overview during transient failures. To ensure the persisted peer list remains complete and consistent, the server must have successfully received responses from all registered groups before claiming authority.

	live := liveMembers{
		addrByID:      map[string]string{},
		seenID:        map[string]struct{}{},
		order:         []string{},
		authoritative: len(got) == len(groups) && len(groups) > 0,
	}
	for _, res := range got {
		if res.err != nil {
			live.authoritative = false
			continue
		}
References
  1. Raft membership operations must ensure the persisted peer list remains complete and consistent to prevent erroneous node removal during restarts or transient failures.

Comment thread adapter/admin_grpc.go
Comment on lines +203 to +211
go func(i int, entry groupEntry) {
	if err := ctx.Err(); err != nil {
		resultsCh <- configResult{i: i, err: err}
		return
	}
	cfg, err := entry.group.Configuration(ctx)
	resultsCh <- configResult{i: i, cfg: cfg, err: err}
}(i, entry)
}
Contributor (medium)

The fanoutConfigurationCalls function spawns a goroutine for every registered Raft group without any concurrency control. While the number of groups per node is typically manageable, a node participating in a very large number of groups could experience a significant burst of goroutines and concurrent gRPC calls for every GetClusterOverview request. To prevent unbounded resource growth and potential OOM issues, consider using a worker pool or a semaphore to bound the maximum number of concurrent configuration polls, rejecting or queuing new requests when the bound is reached.

References
  1. To prevent unbounded memory growth and potential OOM issues, apply a fixed bound to collections or processes that can grow from external requests.

Gemini medium on 19a16be: fanoutConfigurationCalls spawned a goroutine
per registered Raft group with no concurrency control. A node hosting
hundreds of shards would burst that many goroutines + concurrent gRPC
calls on every /api/cluster/overview, with no upper bound on
goroutine/connection footprint under load.

Add a buffered-channel semaphore configFanoutMaxConcurrency=64 so at
most 64 Configuration calls run concurrently while still spawning
len(groups) goroutines (the rest park on the semaphore acquire). The
acquire respects ctx so a cancelled fanout doesn't block on the
semaphore. Regression test fires 4×cap groups, holds them mid-call,
and asserts the peak in-flight never exceeds the cap.

(Skipped Gemini's "authoritative too aggressive" High on the same
commit — already addressed in round-26 19a16be itself; the suggested
patch matches what's in tree.)
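A minimal sketch of the buffered-channel semaphore pattern described in the commit message above, with the ctx-respecting acquire; the helper shape is an assumption, and the real code keeps the per-group result channel from `fanoutConfigurationCalls`.

```go
// Sketch only: bound concurrent per-group calls with a buffered-channel
// semaphore whose acquire respects context cancellation.
package adapter

import "context"

const configFanoutMaxConcurrency = 64

// boundedFanout spawns one goroutine per target but allows at most
// configFanoutMaxConcurrency calls in flight; a cancelled fan-out does not
// block goroutines that are still parked on the semaphore.
func boundedFanout(ctx context.Context, n int, call func(i int)) {
	sem := make(chan struct{}, configFanoutMaxConcurrency)
	done := make(chan struct{}, n)
	for i := 0; i < n; i++ {
		go func(i int) {
			defer func() { done <- struct{}{} }()
			select {
			case sem <- struct{}{}: // acquire one slot
				defer func() { <-sem }() // release on return
			case <-ctx.Done():
				return
			}
			call(i)
		}(i)
	}
	for i := 0; i < n; i++ {
		<-done
	}
}
```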
@bootjp
Owner Author

bootjp commented Apr 25, 2026

/gemini review @codex review

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a new Admin gRPC service and a standalone admin binary (cmd/elastickv-admin) to provide cluster observability, including a key visualizer. The changes include implementing the Admin gRPC service on nodes, adding authentication via bearer tokens, and creating a fan-out mechanism in the admin binary to aggregate cluster-wide data. My feedback suggests making the maxDiscoveredNodes constant configurable to better support larger clusters.

Comment thread cmd/elastickv-admin/main.go Outdated
// /api/cluster/overview fan-out dials every discovered node; the
// per-conn cache is sized to match so a healthy cluster-wide query
// reuses connections instead of thrashing the LRU.
maxDiscoveredNodes = 512
Contributor (medium)

The maxDiscoveredNodes constant is set to 512. This fixed bound prevents unbounded memory growth from external requests. Consider making this configurable via a flag for scalability in larger clusters.

References
  1. To prevent unbounded memory growth and potential OOM issues, apply a fixed bound to collections that can grow from external requests.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request implements Phase 0 of the Elastickv Admin UI, introducing a node-side Admin gRPC service and a standalone elastickv-admin proxy binary. The proxy facilitates cluster-wide data aggregation and membership discovery, serving a JSON API for operational insights while enforcing bearer-token authentication. The PR also includes the design specification for the Key Visualizer and internal helpers for secure token management. Feedback indicates that a custom itoa helper in the test suite should be replaced with the standard library's strconv.Itoa for better maintainability.

Comment thread cmd/elastickv-admin/main_test.go Outdated
Comment on lines +279 to +289
func strconvItoa(i int) string {
	if i == 0 {
		return "0"
	}
	var digits []byte
	for i > 0 {
		digits = append([]byte{byte('0' + i%10)}, digits...)
		i /= 10
	}
	return string(digits)
}
Contributor (medium)

This custom implementation of itoa is not necessary. The standard library's strconv.Itoa should be used instead for code clarity, maintainability, and idiomatic Go. The reason given in the comment (to avoid an import) is not a standard Go practice, as strconv is a core package that is very likely already part of the test binary's dependencies.

Two Gemini mediums on 0ba2e96:

- maxDiscoveredNodes was a hard-coded const at 512. Add
  --maxDiscoveredNodes flag (default 512); fanout stores the value as a
  per-instance maxNodes field and discoveryAccumulator takes it as a
  constructor arg. Bounds both the per-overview discovery list and the
  gRPC client cache, so operators with clusters > 512 nodes can raise
  it without a code change. Values ≤0 fall back to the default to
  preserve the bound.

- Drop the custom strconvItoa test helper; just import strconv.Itoa.
  The "avoid imports" rationale was thin and the standard library is
  more idiomatic + maintainable.

Test refresh: every newFanout call passes defaultMaxDiscoveredNodes
explicitly; membersFrom takes the cap as an arg. References to the
old const names in tests are remapped to defaultMaxDiscoveredNodes.
@bootjp
Owner Author

bootjp commented Apr 25, 2026

/gemini review @codex review

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request implements Phase 0 of the Admin UI and Key Visualizer design, introducing a new node-side Admin gRPC service and a standalone elastickv-admin binary. The changes provide cluster membership discovery, Raft group status reporting, and a bearer-token authentication mechanism. Feedback suggests increasing the fan-out concurrency limit to better support larger clusters and refactoring tests to use context-based polling instead of time.Sleep to improve reliability and reduce flakiness.

Comment thread adapter/admin_grpc.go
// connection footprint bounded under load. Smaller than maxDiscoveredNodes
// (per-fanout target cap) on purpose: this is per-RPC concurrency, not
// total target count.
const configFanoutMaxConcurrency = 64
Contributor (medium)

The constant configFanoutMaxConcurrency is defined as 64. While maintaining a fixed bound is consistent with repository rules to prevent OOM, this limit may be too restrictive given defaultMaxDiscoveredNodes is 512. Consider increasing this bound to reduce latency in large clusters while still maintaining a fixed memory limit.

References
  1. Apply a fixed bound to collections that can grow from external requests to prevent unbounded memory growth and OOM issues.

t.Parallel()
resp := &pb.GetClusterOverviewResponse{
Self: &pb.NodeIdentity{GrpcAddress: "self:1"},
}
Contributor (medium)

Avoid using time.Sleep for polling in tests as it can be flaky. Instead, use context.Context for managing deadlines and cancellation, and consider centralizing this retry logic into a shared helper function to improve maintainability.

References
  1. Use context.Context for managing deadlines and cancellation instead of separate timeout parameters.
  2. Centralize common complex logic, such as retry/backoff mechanisms, into shared helper functions.
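A sketch of the shared polling helper this comment suggests in place of `time.Sleep`; the name and the tick interval are assumptions, and it would live in a `_test.go` file.

```go
// Sketch only: poll a condition under a context deadline instead of sleeping
// for a fixed duration, so tests fail fast with the cancellation cause.
package main

import (
	"context"
	"testing"
	"time"
)

func waitFor(t *testing.T, ctx context.Context, cond func() bool) {
	t.Helper()
	tick := time.NewTicker(10 * time.Millisecond)
	defer tick.Stop()
	for {
		if cond() {
			return
		}
		select {
		case <-ctx.Done():
			t.Fatalf("condition not met before deadline: %v", ctx.Err())
		case <-tick.C:
		}
	}
}
```

A caller would pass a bounded context, for example `ctx, cancel := context.WithTimeout(context.Background(), time.Second)`, then `waitFor(t, ctx, func() bool { ... })`.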

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 501b0173bb


Comment thread cmd/elastickv-admin/main.go Outdated
}

log.Printf("elastickv-admin: all seeds unreachable for membership refresh; falling back to static seed list")
return append([]string(nil), f.seeds...)
P2: Enforce maxDiscoveredNodes when falling back to seed targets

When every discovery RPC fails, this path returns the full f.seeds list without applying maxDiscoveredNodes. That bypasses the fan-out bound enforced in membersFrom, so /api/cluster/overview can suddenly launch far more concurrent RPCs than configured (one per seed) during outages or with oversized seed lists, causing avoidable CPU/memory pressure exactly when the system is already degraded. Clamp/dedupe the seed fallback to f.maxNodes before returning.


Codex P2 on 501b017: when every discovery RPC fails (or the caller
context cancels, or refreshMembership returns an unexpected type),
currentTargets / refreshMembership returned the raw f.seeds list,
bypassing the maxNodes bound that membersFrom otherwise enforces. With
an oversized --nodes list (or under outage conditions where the
fallback path actually fires), /api/cluster/overview could spawn one
RPC per seed entry — exactly when the system is already degraded.

Add fanout.seedTargets() that returns a deduplicated copy of f.seeds
clamped to f.maxNodes, and replace all three raw-seed-return sites
with it. Regression test feeds 9 seeds (1 duplicate) with maxNodes=5
and asserts the result is 5 distinct entries.
@bootjp
Owner Author

bootjp commented Apr 25, 2026

/gemini review @codex review

@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 6

🧹 Nitpick comments (4)
internal/grpc.go (1)

30-42: Optional: deduplicate the per-call options block.

GRPCDialOptions() (lines 21–29) defines the same WithDefaultCallOptions(MaxCallRecvMsgSize/MaxCallSendMsgSize) block. Consider folding it through the new helper to keep a single source of truth for the message-size cap.

♻️ Proposed refactor
 func GRPCDialOptions() []grpc.DialOption {
 	return []grpc.DialOption{
 		grpc.WithTransportCredentials(insecure.NewCredentials()),
-		grpc.WithDefaultCallOptions(
-			grpc.MaxCallRecvMsgSize(GRPCMaxMessageBytes),
-			grpc.MaxCallSendMsgSize(GRPCMaxMessageBytes),
-		),
+		GRPCCallOptions(),
 	}
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/grpc.go` around lines 30 - 42, The per-call message-size options are
duplicated between GRPCCallOptions() and GRPCDialOptions(); refactor so the two
share a single source of truth by having GRPCDialOptions() reuse
GRPCCallOptions() (or extract a tiny helper that returns the grpc.CallOption
slice used by both), ensuring both the MaxCallRecvMsgSize and MaxCallSendMsgSize
(GRPCMaxMessageBytes) are set only once; update GRPCDialOptions() to incorporate
the shared options instead of repeating the WithDefaultCallOptions block.
cmd/elastickv-admin/main.go (1)

694-715: cap shadows the Go builtin; bound check uses f.maxNodes instead of the local.

Two small issues in seedTargets:

  1. cap := f.maxNodes shadows the predeclared cap builtin. golangci-lint's predeclared linter will flag this; the project style is to not shadow builtins.
  2. The slice/map are sized using local cap, but the loop break at line 708 uses f.maxNodes directly. They're equivalent today (newFanout normalizes maxNodes to >0), but the inconsistency is a future-refactor footgun.
♻️ Proposed refactor
 func (f *fanout) seedTargets() []string {
-	cap := f.maxNodes
-	if cap <= 0 {
-		cap = defaultMaxDiscoveredNodes
-	}
-	if len(f.seeds) < cap {
-		cap = len(f.seeds)
-	}
-	out := make([]string, 0, cap)
-	seen := make(map[string]struct{}, cap)
+	limit := f.maxNodes
+	if limit <= 0 {
+		limit = defaultMaxDiscoveredNodes
+	}
+	hint := limit
+	if len(f.seeds) < hint {
+		hint = len(f.seeds)
+	}
+	out := make([]string, 0, hint)
+	seen := make(map[string]struct{}, hint)
 	for _, s := range f.seeds {
 		if _, dup := seen[s]; dup {
 			continue
 		}
-		if len(out) >= f.maxNodes && f.maxNodes > 0 {
+		if len(out) >= limit {
 			break
 		}
 		seen[s] = struct{}{}
 		out = append(out, s)
 	}
 	return out
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/elastickv-admin/main.go` around lines 694 - 715, The variable named cap
in seedTargets shadows the predeclared cap builtin and the loop break uses
f.maxNodes instead of the local computed bound; rename that local to something
like limit (or maxCount), compute limit := f.maxNodes; if limit <= 0 { limit =
defaultMaxDiscoveredNodes }; if len(f.seeds) < limit { limit = len(f.seeds) };
use limit when sizing out and seen and when checking the loop break (e.g., if
len(out) >= limit { break }), keeping all references consistent and avoiding
builtin shadowing.
internal/tokenfile_test.go (1)

10-59: Consider consolidating into a table-driven test.

The four TestLoadBearerTokenFile* functions all set up a temp file then assert (return value, error substring) — a classic table-driven shape. As per coding guidelines: "prefer table-driven test cases".

♻️ Proposed refactor (sketch)
func TestLoadBearerTokenFile(t *testing.T) {
	t.Parallel()
	cases := []struct {
		name        string
		contents    string // empty string + missing=true means no file
		missing     bool
		maxBytes    int64
		want        string
		wantErrSub  string
	}{
		{name: "happy path", contents: "\n  s3cret \n", maxBytes: 4 << 10, want: "s3cret"},
		{name: "rejects empty", contents: "   \n", maxBytes: 4 << 10, wantErrSub: "is empty"},
		{name: "rejects oversize", contents: strings.Repeat("x", 65), maxBytes: 64, wantErrSub: "exceeds maximum"},
		{name: "missing file", missing: true, maxBytes: 4 << 10, wantErrSub: ""}, // any error acceptable
	}
	for _, tc := range cases {
		t.Run(tc.name, func(t *testing.T) {
			t.Parallel()
			path := "/definitely/not/there"
			if !tc.missing {
				path = filepath.Join(t.TempDir(), "tok")
				if err := os.WriteFile(path, []byte(tc.contents), 0o600); err != nil {
					t.Fatal(err)
				}
			}
			got, err := LoadBearerTokenFile(path, tc.maxBytes, "admin token")
			// ... assert tc.want / tc.wantErrSub ...
		})
	}
}

As per coding guidelines: "Unit tests must be co-located with packages in *_test.go files; prefer table-driven test cases".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/tokenfile_test.go` around lines 10 - 59, Consolidate the four tests
into a single table-driven test (e.g., TestLoadBearerTokenFile) that defines
cases for "happy path", "rejects empty", "rejects oversize", and "missing file";
for each case use t.Run and t.Parallel, create the temp file only when missing
is false, call LoadBearerTokenFile(path, maxBytes, "admin token"), and assert
either the expected want string or that the error contains wantErrSub (for
missing-file case accept any non-nil error). Reference the existing test names
(TestLoadBearerTokenFileHappyPath, TestLoadBearerTokenFileRejectsEmpty,
TestLoadBearerTokenFileRejectsOversize, TestLoadBearerTokenFileMissingFile) and
the target function LoadBearerTokenFile to locate and replace the current tests
with the single table-driven implementation.
adapter/admin_grpc.go (1)

162-209: Doc comments for collectLiveMembers and fanoutConfigurationCalls are misattributed.

The two narrative blocks are contiguous with the doc comments for the next declaration, so godoc/golint will attach them to the wrong symbol:

  • Lines 162–179 (about collectLiveMembers) flow without a blank line into 180–182, so the entire 162–182 block becomes the doc for type configResult (and collectLiveMembers at line 264 ends up with no doc).
  • Lines 188–199 (about fanoutConfigurationCalls) flow without a blank line into 200–206, so the entire 188–206 block becomes the doc for const configFanoutMaxConcurrency (and fanoutConfigurationCalls at line 209 ends up with no doc).

Move each narrative block to immediately above its target declaration, separated from unrelated declarations by a blank line.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@adapter/admin_grpc.go` around lines 162 - 209, The narrative doc comments are
misattached: move the comment block describing collectLiveMembers so it sits
immediately above the collectLiveMembers function declaration (not above type
configResult) and ensure there's a blank line between that comment and the
unrelated type configResult; likewise move the comment describing
fanoutConfigurationCalls to immediately precede the fanoutConfigurationCalls
function (not above const configFanoutMaxConcurrency) and add a blank line
separating it from the const declaration; this ensures godoc/golint attach docs
to collectLiveMembers and fanoutConfigurationCalls rather than to configResult
or configFanoutMaxConcurrency.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@adapter/admin_grpc.go`:
- Around line 329-346: In mergeSeedMembers, skip seed entries with an empty
GRPCAddress so a blank seed can't shadow a usable address: when iterating seeds
in mergeSeedMembers, add a guard that continues if m.GRPCAddress == "" (similar
to the srv.Address == "" check in collectLiveMembers) before inserting into
live.addrByID or live.order; keep the existing checks for m.NodeID == "" and
selfID, and preserve behavior around live.seenID and live.authoritative.

In `@cmd/elastickv-admin/main_test.go`:
- Around line 369-411: Test TestFanoutClientForOrphanedDialClosed never hits
installOrAttach's orphan branch (the shouldClose := c.conn != conn path); fix by
injecting a test-only dial hook into fanout so clientFor can produce a different
conn and trigger the orphan-close logic. Add a dialDeduped func field to the
fanout type (used by clientFor instead of calling grpc.NewClient directly when
non-nil), update clientFor/installOrAttach to use that hook, and in
TestFanoutClientForOrphanedDialClosed set f.dialDeduped to return a distinct
*grpc.ClientConn (then close it as the test intends) so the installOrAttach
branch where c.conn != conn executes deterministically. Ensure cleanup by
calling rel1() and asserting no panic/leak.

In `@cmd/elastickv-admin/main.go`:
- Around line 1-1024: Replace the std log usage with slog: swap the import "log"
for "log/slog" and update every log.Printf/log.Printf-like call to structured
slog calls (slog.Info/Warn/Error) across the file (examples to change: run’s
startup log, Close, dialDeduped, installOrAttach orphan-close logs,
refreshMembership seed/unreachable logs, membersFrom truncation log,
invalidateClient/invalidate-close, releaseFunc deferred close, writeJSON
encode/write errors, and any other log.Printf uses). For each call, move the
formatted message into a short message string and add structured fields such as
slog.String("addr", addr), slog.String("error", err.Error()) or slog.Any("err",
err), and when logging numeric caps use keys like slog.Int("maxNodes", maxNodes)
or slog.Int("bytes", maxResponseBodyBytes) so logs follow the project’s
structured key conventions.

In `@internal/tokenfile.go`:
- Around line 40-44: Replace the log.Printf call in the deferred close handler
with structured slog logging: when the deferred func that runs cerr := f.Close()
detects an error, call slog.Error (or slog.Log with level Error) and include the
error and structured keys such as "humanName" (humanName) and "path" (abs)
instead of formatting a message string; update the deferred closure around
f.Close() to use those symbols (f, cerr, humanName, abs) to produce structured
logs consistent with the codebase.

In `@main.go`:
- Around line 724-726: The warning for insecure admin mode currently uses
log.Printf; replace that with structured slog logging to match the rest of
main.go: where the flag adminInsecureNoAuth is checked, call slog.Warn (or the
package's logger) instead of log.Printf and include a clear message plus
structured key(s) such as "flag"="adminInsecureNoAuth" and "warning" or "reason"
to maintain consistency with other slog calls (see existing slog usage in
main.go) so the insecure-mode warning is emitted via slog with structured
fields.
- Around line 625-633: The block comment describing setupAdminService is
misplaced above type serversInput; move that entire comment so it sits
immediately above the func setupAdminService(...) declaration (the definition
around line ~697) so godoc links it to the function, and replace or add an
appropriate comment for serversInput where that struct is defined; ensure the
moved comment still begins with "setupAdminService" to satisfy Go doc
conventions.

---

Nitpick comments:
In `@adapter/admin_grpc.go`:
- Around line 162-209: The narrative doc comments are misattached: move the
comment block describing collectLiveMembers so it sits immediately above the
collectLiveMembers function declaration (not above type configResult) and ensure
there's a blank line between that comment and the unrelated type configResult;
likewise move the comment describing fanoutConfigurationCalls to immediately
precede the fanoutConfigurationCalls function (not above const
configFanoutMaxConcurrency) and add a blank line separating it from the const
declaration; this ensures godoc/golint attach docs to collectLiveMembers and
fanoutConfigurationCalls rather than to configResult or
configFanoutMaxConcurrency.

In `@cmd/elastickv-admin/main.go`:
- Around line 694-715: The variable named cap in seedTargets shadows the
predeclared cap builtin and the loop break uses f.maxNodes instead of the local
computed bound; rename that local to something like limit (or maxCount), compute
limit := f.maxNodes; if limit <= 0 { limit = defaultMaxDiscoveredNodes }; if
len(f.seeds) < limit { limit = len(f.seeds) }; use limit when sizing out and
seen and when checking the loop break (e.g., if len(out) >= limit { break }),
keeping all references consistent and avoiding builtin shadowing.

In `@internal/grpc.go`:
- Around line 30-42: The per-call message-size options are duplicated between
GRPCCallOptions() and GRPCDialOptions(); refactor so the two share a single
source of truth by having GRPCDialOptions() reuse GRPCCallOptions() (or extract
a tiny helper that returns the grpc.CallOption slice used by both), ensuring
both the MaxCallRecvMsgSize and MaxCallSendMsgSize (GRPCMaxMessageBytes) are set
only once; update GRPCDialOptions() to incorporate the shared options instead of
repeating the WithDefaultCallOptions block.

In `@internal/tokenfile_test.go`:
- Around line 10-59: Consolidate the four tests into a single table-driven test
(e.g., TestLoadBearerTokenFile) that defines cases for "happy path", "rejects
empty", "rejects oversize", and "missing file"; for each case use t.Run and
t.Parallel, create the temp file only when missing is false, call
LoadBearerTokenFile(path, maxBytes, "admin token"), and assert either the
expected want string or that the error contains wantErrSub (for missing-file
case accept any non-nil error). Reference the existing test names
(TestLoadBearerTokenFileHappyPath, TestLoadBearerTokenFileRejectsEmpty,
TestLoadBearerTokenFileRejectsOversize, TestLoadBearerTokenFileMissingFile) and
the target function LoadBearerTokenFile to locate and replace the current tests
with the single table-driven implementation.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 51324863-667b-4e13-8345-f439d9eda19f

📥 Commits

Reviewing files that changed from the base of the PR and between a126e71 and 492b22d.

⛔ Files ignored due to path filters (1)
  • proto/admin.pb.go is excluded by !**/*.pb.go
📒 Files selected for processing (11)
  • adapter/admin_grpc.go
  • adapter/admin_grpc_test.go
  • cmd/elastickv-admin/main.go
  • cmd/elastickv-admin/main_test.go
  • docs/admin_ui_key_visualizer_design.md
  • internal/grpc.go
  • internal/tokenfile.go
  • internal/tokenfile_test.go
  • main.go
  • main_admin_test.go
  • proto/admin.proto
🚧 Files skipped from review as they are similar to previous changes (2)
  • adapter/admin_grpc_test.go
  • main_admin_test.go

Comment thread adapter/admin_grpc.go
Comment on lines +329 to +346
func mergeSeedMembers(seeds []NodeIdentity, selfID string, live *liveMembers) {
	for _, m := range seeds {
		if m.NodeID == "" || m.NodeID == selfID {
			continue
		}
		if _, hasAddr := live.addrByID[m.NodeID]; hasAddr {
			continue
		}
		_, seen := live.seenID[m.NodeID]
		if !seen && live.authoritative {
			// Live config is authoritative and doesn't know this node:
			// it was removed via raft RemoveServer. Skip.
			continue
		}
		live.addrByID[m.NodeID] = m.GRPCAddress
		live.order = append(live.order, m.NodeID)
	}
}

⚠️ Potential issue | 🟡 Minor

Mirror the empty-address skip on the seed path.

collectLiveMembers deliberately drops live entries with srv.Address == "" (line 292) so a blank address can't shadow a usable one. The seed path has no symmetric guard: a seed entry with GRPCAddress == "" (e.g., misconfigured operator input) flows through line 343 and ends up in the response, after which the admin binary would attempt to dial an empty target. Worth a one-line check for safety.

Proposed fix
 	for _, m := range seeds {
 		if m.NodeID == "" || m.NodeID == selfID {
 			continue
 		}
+		if m.GRPCAddress == "" {
+			continue
+		}
 		if _, hasAddr := live.addrByID[m.NodeID]; hasAddr {
 			continue
 		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@adapter/admin_grpc.go` around lines 329 - 346, In mergeSeedMembers, skip seed
entries with an empty GRPCAddress so a blank seed can't shadow a usable address:
when iterating seeds in mergeSeedMembers, add a guard that continues if
m.GRPCAddress == "" (similar to the srv.Address == "" check in
collectLiveMembers) before inserting into live.addrByID or live.order; keep the
existing checks for m.NodeID == "" and selfID, and preserve behavior around
live.seenID and live.authoritative.

Comment on lines +369 to +411
func TestFanoutClientForOrphanedDialClosed(t *testing.T) {
	t.Parallel()
	peer := &fakeAdminServer{members: []string{"m:1"}}
	addr := startFakeAdmin(t, peer)

	f := newFanout([]string{addr}, "", time.Second, insecure.NewCredentials(), defaultMaxDiscoveredNodes)
	defer f.Close()

	// First clientFor — installs into cache.
	c1, rel1, err := f.clientFor(addr)
	if err != nil {
		t.Fatal(err)
	}

	// Hand-craft an orphaned conn by simulating "post-singleflight, cache
	// already has different conn". We bypass singleflight to deterministically
	// produce a second *grpc.ClientConn that won't equal c1.conn, then drive
	// the cache-hit-after-dial branch via a second clientFor call.
	conn2, err := grpc.NewClient(addr, grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		t.Fatal(err)
	}
	// Stuff conn2 into the singleflight cache slot via the public API path
	// is not feasible; instead, verify the live path: a second concurrent
	// clientFor should not leak — the dedup test already covered the
	// happy path. Here we just assert that the orphan branch's Close()
	// does not panic on a fresh conn (nil-safe Close behavior on
	// already-closed conn is what protects the path).
	if err := conn2.Close(); err != nil {
		t.Fatalf("conn2.Close: %v", err)
	}

	rel1()
	// Second clientFor should still succeed (cache hit) and not panic.
	c2, rel2, err := f.clientFor(addr)
	if err != nil {
		t.Fatal(err)
	}
	defer rel2()
	if c1 != c2 {
		t.Fatalf("nodeClient pointer mismatch — cache lookup did not return the cached entry")
	}
}

⚠️ Potential issue | 🟡 Minor

TestFanoutClientForOrphanedDialClosed does not exercise the orphan branch.

The test name promises coverage of installOrAttach's shouldClose := c.conn != conn path, but the body comment is explicit: "Stuff conn2 into the singleflight cache slot via the public API path is not feasible; instead, verify the live path...". As written, the test:

  1. Calls clientFor(addr) once → installs c1 in cache.
  2. Dials conn2 separately and closes it itself (never touches the fanout).
  3. Calls clientFor(addr) again → simple cache hit, returns c1.

The orphan-conn-close path (lines 510–516 in cmd/elastickv-admin/main.go) is never reached. This gives a false coverage signal — a regression that leaks the orphaned conn would not fail this test.

Either (a) delete this test (the dedup test at 311–362 already covers the singleflight collapse case, and the race test at 815–859 covers concurrency), or (b) introduce a test-only hook (e.g., a dialDeduped injection point) so the orphan branch can be deterministically driven.
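
A rough shape of option (b), shown only as a sketch with illustrative names (fanoutSketch, dialHook), not the existing fanout implementation:

package sketch

import (
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

// fanoutSketch stands in for the admin binary's fanout type; only the pieces
// relevant to the test hook are shown.
type fanoutSketch struct {
	// dialHook, when non-nil, replaces the production dial so a test can hand
	// back a distinct *grpc.ClientConn and deterministically reach the branch
	// that closes an orphaned conn after a cache race.
	dialHook func(addr string) (*grpc.ClientConn, error)
}

// dial is what clientFor would call instead of invoking grpc.NewClient directly.
func (f *fanoutSketch) dial(addr string) (*grpc.ClientConn, error) {
	if f.dialHook != nil {
		return f.dialHook(addr)
	}
	return grpc.NewClient(addr, grpc.WithTransportCredentials(insecure.NewCredentials()))
}

A test would then set dialHook to return a second real connection to the fake admin server and assert that installOrAttach closes it once the cached entry wins.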

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/elastickv-admin/main_test.go` around lines 369 - 411, Test
TestFanoutClientForOrphanedDialClosed never hits installOrAttach's orphan branch
(the shouldClose := c.conn != conn path); fix by injecting a test-only dial hook
into fanout so clientFor can produce a different conn and trigger the
orphan-close logic. Add a dialDeduped func field to the fanout type (used by
clientFor instead of calling grpc.NewClient directly when non-nil), update
clientFor/installOrAttach to use that hook, and in
TestFanoutClientForOrphanedDialClosed set f.dialDeduped to return a distinct
*grpc.ClientConn (then close it as the test intends) so the installOrAttach
branch where c.conn != conn executes deterministically. Ensure cleanup by
calling rel1() and asserting no panic/leak.

Comment on lines +1 to +1024
// Command elastickv-admin serves the Elastickv admin Web UI described in
// docs/admin_ui_key_visualizer_design.md. Phase 0: token-protected passthrough
// of Admin.GetClusterOverview at /api/cluster/overview, no SPA yet.
package main

import (
"bytes"
"context"
"crypto/tls"
"crypto/x509"
"encoding/json"
"flag"
"fmt"
"log"
"net"
"net/http"
"os"
"os/signal"
"strings"
"sync"
"syscall"
"time"

internalutil "github.com/bootjp/elastickv/internal"
pb "github.com/bootjp/elastickv/proto"
"github.com/cockroachdb/errors"
"golang.org/x/sync/singleflight"
"google.golang.org/grpc"
"google.golang.org/grpc/codes"
"google.golang.org/grpc/credentials"
"google.golang.org/grpc/credentials/insecure"
"google.golang.org/grpc/metadata"
"google.golang.org/grpc/status"
"google.golang.org/protobuf/encoding/protojson"
"google.golang.org/protobuf/proto"
)

const (
defaultBindAddr = "127.0.0.1:8080"
defaultNodesRefreshInterval = 15 * time.Second
defaultGRPCRequestTimeout = 10 * time.Second
discoveryRPCTimeout = 2 * time.Second
// discoveryWaitBudget is how long a request handler is willing to wait
// for the singleflight membership refresh before falling back to the
// cached (or static seed) list. Kept well below defaultGRPCRequestTimeout
// so a slow discovery cannot starve the subsequent per-node fan-out.
discoveryWaitBudget = 3 * time.Second
// membershipRefreshBudget caps the detached background refresh so it
// cannot run forever even if every seed is slow. Sized for up to a few
// sequential discoveryRPCTimeout attempts before the singleflight
// collapses.
membershipRefreshBudget = 10 * time.Second
readHeaderTimeout = 5 * time.Second
readTimeout = 30 * time.Second
writeTimeout = 30 * time.Second
idleTimeout = 120 * time.Second
shutdownTimeout = 5 * time.Second
maxRequestBodyBytes = 4 << 10
// maxTokenFileBytes caps the admin-token file so a misconfigured path
// pointing at a huge file (for example a log) cannot force the admin
// process to allocate arbitrary memory before the bearer-token check.
maxTokenFileBytes = 4 << 10
// defaultMaxDiscoveredNodes is the out-of-the-box cap on the member
// list returned by a peer's GetClusterOverview. The runtime value is
// the per-fanout maxDiscoveredNodes field, configurable via
// --maxDiscoveredNodes; this constant is just the default.
//
// A single /api/cluster/overview fan-out dials every discovered node
// up to this cap; the per-conn cache is sized to match so a healthy
// cluster-wide query reuses connections instead of thrashing the LRU.
defaultMaxDiscoveredNodes = 512
)

var (
bindAddr = flag.String("bindAddr", defaultBindAddr, "HTTP bind address for the admin UI")
nodes = flag.String("nodes", "", "Comma-separated list of elastickv node gRPC addresses")
nodeTokenFile = flag.String("nodeTokenFile", "", "File containing the bearer token sent to nodes' Admin service")
nodesRefreshInterval = flag.Duration("nodesRefreshInterval", defaultNodesRefreshInterval, "Duration to cache cluster membership before re-fetching")
insecureNoAuth = flag.Bool("adminInsecureNoAuth", false, "Skip bearer token authentication; development only")
// Node gRPC is plaintext in Phase 0, so the admin binary defaults to
// plaintext too. TLS is opt-in: set --nodeTLSCACertFile (preferred) or
// --nodeTLSInsecureSkipVerify to switch to TLS. When the cluster turns
// on TLS, operators flip the flag without code changes.
nodeTLSCACertFile = flag.String("nodeTLSCACertFile", "", "PEM file with CA certificates used to verify nodes' gRPC TLS; setting this flag enables TLS dialing")
nodeTLSServerName = flag.String("nodeTLSServerName", "", "Expected TLS server name when connecting to nodes (overrides the address host); only honoured when TLS is enabled")
nodeTLSSkipVerify = flag.Bool("nodeTLSInsecureSkipVerify", false, "Dial nodes with TLS but skip certificate verification; development only. Implies TLS.")
allowRemoteBind = flag.Bool("allowRemoteBind", false, "Allow --bindAddr to listen on a non-loopback interface. The admin UI has no browser-facing auth; set this only when the UI is fronted by an authenticating reverse proxy.")
// --maxDiscoveredNodes bounds both the discovery list returned by a
// peer's GetClusterOverview and the per-conn client cache. Operators
// running clusters larger than the default 512 nodes can raise this;
// values ≤0 fall back to the default to avoid disabling the bound.
maxDiscoveredNodesFlag = flag.Int("maxDiscoveredNodes", defaultMaxDiscoveredNodes, "Maximum number of cluster nodes the admin binary will fan out to (caps both discovery list size and the gRPC client-conn cache)")
)

func main() {
flag.Parse()
if err := run(); err != nil {
log.Fatal(err)
}
}

type runConfig struct {
seeds []string
fan *fanout
}

// initRun consolidates flag parsing and fanout construction so run() stays
// under the project's cyclop budget.
func initRun() (runConfig, error) {
seeds := splitNodes(*nodes)
if len(seeds) == 0 {
return runConfig{}, errors.New("--nodes is required (comma-separated gRPC addresses)")
}
token, err := loadToken(*nodeTokenFile, *insecureNoAuth)
if err != nil {
return runConfig{}, err
}
if err := validateBindAddr(*bindAddr, *allowRemoteBind); err != nil {
return runConfig{}, err
}
creds, err := loadTransportCredentials(*nodeTLSCACertFile, *nodeTLSServerName, *nodeTLSSkipVerify)
if err != nil {
return runConfig{}, err
}
fan := newFanout(seeds, token, *nodesRefreshInterval, creds, *maxDiscoveredNodesFlag)
return runConfig{seeds: seeds, fan: fan}, nil
}

// buildMux wires the Phase 0 HTTP surface. Lives outside run() both for
// testability and to keep run() under the cyclop budget.
func buildMux(fan *fanout) *http.ServeMux {
mux := http.NewServeMux()
mux.HandleFunc("/healthz", func(w http.ResponseWriter, _ *http.Request) {
w.WriteHeader(http.StatusOK)
})
mux.HandleFunc("/api/cluster/overview", fan.handleOverview)
mux.HandleFunc("/api/", func(w http.ResponseWriter, _ *http.Request) {
writeJSONError(w, http.StatusServiceUnavailable, "endpoint not implemented in phase 0")
})
mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
if r.URL.Path != "/" {
http.NotFound(w, r)
return
}
w.Header().Set("Content-Type", "text/plain; charset=utf-8")
_, _ = w.Write([]byte("elastickv-admin: phase 0 — SPA not yet embedded\n"))
})
return mux
}

func run() error {
cfg, err := initRun()
if err != nil {
return err
}
defer cfg.fan.Close()

srv := &http.Server{
Addr: *bindAddr,
Handler: buildMux(cfg.fan),
ReadHeaderTimeout: readHeaderTimeout,
ReadTimeout: readTimeout,
WriteTimeout: writeTimeout,
IdleTimeout: idleTimeout,
}

ctx, cancel := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM)
defer cancel()

errCh := make(chan error, 1)
go func() {
log.Printf("elastickv-admin listening on %s (seeds=%v)", *bindAddr, cfg.seeds)
if err := srv.ListenAndServe(); err != nil && !errors.Is(err, http.ErrServerClosed) {
errCh <- err
return
}
errCh <- nil
}()

select {
case <-ctx.Done():
shutdownCtx, shutdownCancel := context.WithTimeout(context.Background(), shutdownTimeout)
defer shutdownCancel()
if err := srv.Shutdown(shutdownCtx); err != nil {
return errors.Wrap(err, "shutdown")
}
return nil
case err := <-errCh:
return err
}
}

// validateBindAddr rejects a non-loopback bind unless the operator has
// explicitly opted into --allowRemoteBind. The admin binary performs no
// browser-side authentication in Phase 0 while holding a privileged node
// admin token, so a misconfigured 0.0.0.0:8080 would expose that token-gated
// cluster view to anyone on the network.
func validateBindAddr(addr string, allow bool) error {
if allow {
return nil
}
host, _, err := net.SplitHostPort(addr)
if err != nil {
return errors.Wrapf(err, "invalid --bindAddr %q", addr)
}
host = strings.TrimSpace(host)
if host == "" {
return fmt.Errorf("--bindAddr %q has an empty host; pass an explicit loopback host like 127.0.0.1 or set --allowRemoteBind when fronted by an auth proxy", addr)
}
ip := net.ParseIP(host)
switch {
case host == "localhost":
return nil
case ip != nil && ip.IsLoopback():
return nil
}
return fmt.Errorf("--bindAddr %q is not loopback; set --allowRemoteBind to expose the admin UI remotely (the UI has no browser-side auth — do so only behind an auth proxy)", addr)
}

func splitNodes(raw string) []string {
parts := strings.Split(raw, ",")
out := make([]string, 0, len(parts))
for _, p := range parts {
p = strings.TrimSpace(p)
if p != "" {
out = append(out, p)
}
}
return out
}

func loadToken(path string, insecureMode bool) (string, error) {
if path == "" {
if insecureMode {
return "", nil
}
return "", errors.New("--nodeTokenFile is required; pass --adminInsecureNoAuth for insecure dev mode")
}
if insecureMode {
return "", errors.New("--adminInsecureNoAuth and --nodeTokenFile are mutually exclusive")
}
tok, err := internalutil.LoadBearerTokenFile(path, maxTokenFileBytes, "admin token")
if err != nil {
return "", errors.Wrap(err, "load admin token")
}
return tok, nil
}

// loadTransportCredentials builds the gRPC TransportCredentials used to dial
// nodes. Phase 0 nodes expose a plaintext gRPC server, so the default is
// insecure credentials — if neither --nodeTLSCACertFile nor
// --nodeTLSInsecureSkipVerify is set, the admin binary dials plaintext.
// Passing either flag opts into TLS; --nodeTLSServerName is honoured only
// alongside a TLS opt-in.
func loadTransportCredentials(
caFile, serverName string,
skipVerify bool,
) (credentials.TransportCredentials, error) {
tlsRequested := caFile != "" || skipVerify
if !tlsRequested {
if serverName != "" {
return nil, errors.New("--nodeTLSServerName requires TLS; set --nodeTLSCACertFile or --nodeTLSInsecureSkipVerify")
}
return insecure.NewCredentials(), nil
}
if caFile != "" && skipVerify {
return nil, errors.New("--nodeTLSCACertFile and --nodeTLSInsecureSkipVerify are mutually exclusive")
}
cfg := &tls.Config{
MinVersion: tls.VersionTLS12,
ServerName: serverName,
InsecureSkipVerify: skipVerify, //nolint:gosec // gated behind --nodeTLSInsecureSkipVerify; dev-only.
}
if caFile != "" {
pem, err := os.ReadFile(caFile)
if err != nil {
return nil, errors.Wrap(err, "read node TLS CA file")
}
pool := x509.NewCertPool()
if !pool.AppendCertsFromPEM(pem) {
return nil, errors.New("no certificates parsed from --nodeTLSCACertFile")
}
cfg.RootCAs = pool
}
return credentials.NewTLS(cfg), nil
}

type nodeClient struct {
addr string
conn *grpc.ClientConn
client pb.AdminClient

// refcount, evicted, and closed are protected by fanout.mu. They let the
// cache evict entries while RPCs are in flight: eviction removes the
// entry from the map and marks it evicted, and the conn is closed only
// once the last borrower calls release. closed guards against a second
// release on an already-closed client so the public contract (extra
// release() calls are no-ops) holds even when refcount transiently
// bounces back to zero.
refcount int
evicted bool
closed bool
}

type membership struct {
addrs []string
fetchedAt time.Time
}

type fanout struct {
seeds []string
// seedSet is a pre-computed lookup over seeds for evictOneLocked's
// "skip seed entries" check. Seeds are immutable after construction so
// rebuilding the map on every cache-full eviction (under f.mu) is pure
// waste — Gemini flagged the per-call allocation.
seedSet map[string]struct{}
// maxNodes bounds both the per-overview discovery list and the gRPC
// client cache. Configurable via --maxDiscoveredNodes; values ≤0 fall
// back to defaultMaxDiscoveredNodes so the bound is never disabled.
maxNodes int
token string
refreshInterval time.Duration
creds credentials.TransportCredentials

mu sync.Mutex
clients map[string]*nodeClient
members *membership
closed bool

// refreshGroup deduplicates concurrent membership refresh RPCs so a burst
// of browser requests immediately after cache expiry collapses into a
// single GetClusterOverview call against one seed.
refreshGroup singleflight.Group

// dialGroup deduplicates concurrent grpc.NewClient calls for the same
// address. Without it, N goroutines that all miss the cache for the
// same addr would each run a parallel dial (DNS/parsing/setup); only
// one is kept. With singleflight, only one dial runs and every waiter
// gets the same *grpc.ClientConn — refcount is bumped per waiter
// before they each return.
dialGroup singleflight.Group
}

// errFanoutClosed is returned by clientFor when Close has already run, so
// callers can treat it as a graceful shutdown signal instead of bubbling up as
// a generic map-panic.
var errFanoutClosed = errors.New("admin fanout is closed")

func newFanout(
seeds []string,
token string,
refreshInterval time.Duration,
creds credentials.TransportCredentials,
maxNodes int,
) *fanout {
if refreshInterval <= 0 {
refreshInterval = defaultNodesRefreshInterval
}
if creds == nil {
creds = insecure.NewCredentials()
}
if maxNodes <= 0 {
maxNodes = defaultMaxDiscoveredNodes
}
seedSet := make(map[string]struct{}, len(seeds))
for _, s := range seeds {
seedSet[s] = struct{}{}
}
return &fanout{
seeds: seeds,
seedSet: seedSet,
maxNodes: maxNodes,
token: token,
refreshInterval: refreshInterval,
creds: creds,
clients: make(map[string]*nodeClient),
}
}

func (f *fanout) Close() {
f.mu.Lock()
if f.closed {
f.mu.Unlock()
return
}
f.closed = true
// Shutdown is an intentional cancellation of any in-flight RPCs; close
// connections eagerly and let borrowers see the cancel. Borrowers that
// still hold leases will observe the conn as closed on their next call.
// Mark each client closed inside the lock so the deferred release path
// does not attempt a double-close, then collect the *grpc.ClientConn
// references and run conn.Close() outside the lock — Close() can do
// network I/O and waits for the gRPC client transport to drain, which
// would block any concurrent clientFor / invalidateClient / RPC waiting
// on f.mu for the entire shutdown window.
conns := make([]struct {
addr string
conn *grpc.ClientConn
}, 0, len(f.clients))
for _, c := range f.clients {
if c.closed {
continue
}
c.closed = true
conns = append(conns, struct {
addr string
conn *grpc.ClientConn
}{addr: c.addr, conn: c.conn})
}
// Replace with an empty map rather than nil so the remaining
// closed-guarded accessors can still iterate or lookup without panicking
// while still releasing the client references for GC.
f.clients = map[string]*nodeClient{}
f.mu.Unlock()

for _, e := range conns {
if err := e.conn.Close(); err != nil {
log.Printf("elastickv-admin: close gRPC connection to %s: %v", e.addr, err)
}
}
}

// clientFor returns a leased nodeClient that callers must release once they
// finish the RPC (release is the second return value, always non-nil and safe
// to call). The cache is bounded by maxCachedClients; if the cache is full,
// one entry is evicted — prefer non-seed victims, fall back to any entry when
// the cache is saturated with seeds. Evicted entries stop accepting new leases
// but their underlying *grpc.ClientConn is kept alive until every outstanding
// borrower has released; this prevents an eviction from canceling a healthy
// concurrent GetClusterOverview.
//
// The dial step (grpc.NewClient) runs outside f.mu through a singleflight
// keyed by addr — concurrent dials for the same addr collapse into one,
// avoiding wasted DNS/parsing work plus the post-dial close-the-loser
// dance. NewClient itself is non-blocking but parses the target and may
// trigger synchronous DNS depending on resolver config, so holding the
// global mutex for that wall-clock time would serialize concurrent
// clientFor calls for distinct addrs.
func (f *fanout) clientFor(addr string) (*nodeClient, func(), error) {
if c, release, err, ok := f.cacheLookup(addr); ok {
return c, release, err
}
conn, err := f.dialDeduped(addr)
if err != nil {
return nil, func() {}, err
}
return f.installOrAttach(addr, conn)
}

// cacheLookup returns (client, release, err, true) when either the cache hit
// or the fanout-closed branch fires; the caller can short-circuit. Returns
// (_,_,_,false) when the caller still needs to dial.
func (f *fanout) cacheLookup(addr string) (*nodeClient, func(), error, bool) {
f.mu.Lock()
if f.closed {
f.mu.Unlock()
return nil, func() {}, errFanoutClosed, true
}
if c, ok := f.clients[addr]; ok {
c.refcount++
release := f.releaseFunc(c)
f.mu.Unlock()
return c, release, nil, true
}
f.mu.Unlock()
return nil, nil, nil, false
}

// dialDeduped runs grpc.NewClient inside the dialGroup singleflight so
// concurrent first-time dials for addr collapse to one conn.
func (f *fanout) dialDeduped(addr string) (*grpc.ClientConn, error) {
v, err, _ := f.dialGroup.Do(addr, func() (any, error) {
return grpc.NewClient(
addr,
grpc.WithTransportCredentials(f.creds),
internalutil.GRPCCallOptions(),
)
})
if err != nil {
return nil, errors.Wrapf(err, "dial %s", addr)
}
conn, ok := v.(*grpc.ClientConn)
if !ok {
return nil, fmt.Errorf("dial %s: unexpected singleflight value type %T", addr, v)
}
return conn, nil
}

// installOrAttach installs the just-dialed conn into the cache or, if a
// concurrent waiter beat us to it, takes a lease on the existing entry and
// closes the orphaned conn (when its pointer differs from the cached entry).
func (f *fanout) installOrAttach(addr string, conn *grpc.ClientConn) (*nodeClient, func(), error) {
f.mu.Lock()
if f.closed {
f.mu.Unlock()
if err := conn.Close(); err != nil {
log.Printf("elastickv-admin: close orphaned dial for %s after shutdown: %v", addr, err)
}
return nil, func() {}, errFanoutClosed
}
// If another waiter already installed a cache entry, take a lease on it.
// Two cases: (a) singleflight collapsed concurrent dials so the cached
// entry's conn IS this conn (same pointer) — must NOT Close it because
// the cache holds the only reference; (b) a non-concurrent earlier dial
// installed a different conn before our Do call — our just-dialed conn
// is orphaned and must be closed to avoid leaking fds/goroutines.
if c, ok := f.clients[addr]; ok {
c.refcount++
release := f.releaseFunc(c)
shouldClose := c.conn != conn
f.mu.Unlock()
if shouldClose {
if err := conn.Close(); err != nil {
log.Printf("elastickv-admin: close orphaned dial for %s: %v", addr, err)
}
}
return c, release, nil
}
var evicted *grpc.ClientConn
if len(f.clients) >= f.maxNodes {
evicted = f.evictOneLocked()
}
c := &nodeClient{addr: addr, conn: conn, client: pb.NewAdminClient(conn), refcount: 1}
f.clients[addr] = c
release := f.releaseFunc(c)
f.mu.Unlock()
if evicted != nil {
if err := evicted.Close(); err != nil {
log.Printf("elastickv-admin: evict-close: %v", err)
}
}
return c, release, nil
}

// releaseFunc returns the closer used to drop a lease. On the last release
// of an evicted client the underlying connection is finally closed; that
// Close() runs after f.mu is dropped because grpc.ClientConn.Close can do
// network I/O and waits for the transport to drain — holding the global
// fanout mutex across that would block any concurrent clientFor /
// invalidateClient / RPC waiting on f.mu. Extra release() calls after the
// conn is already closed are safe no-ops.
func (f *fanout) releaseFunc(c *nodeClient) func() {
return func() {
f.mu.Lock()
if c.refcount > 0 {
c.refcount--
}
var toClose *grpc.ClientConn
if c.refcount == 0 && c.evicted && !c.closed {
c.closed = true
toClose = c.conn
}
f.mu.Unlock()

if toClose == nil {
return
}
if err := toClose.Close(); err != nil {
log.Printf("elastickv-admin: deferred close for %s: %v", c.addr, err)
}
}
}

// evictOneLocked removes exactly one entry from f.clients. Prefers non-seed
// entries; falls back to any entry if none are eligible (for example when
// len(seeds) >= maxCachedClients). Returns the *grpc.ClientConn that needs
// closing (or nil if the entry has outstanding leases or was already
// closed) — caller must run Close() outside f.mu. Closing is deferred to
// the last release (see releaseFunc) when leases are still held.
func (f *fanout) evictOneLocked() *grpc.ClientConn {
var fallback string
var fallbackClient *nodeClient
for victim, vc := range f.clients {
if fallback == "" {
fallback, fallbackClient = victim, vc
}
if _, keep := f.seedSet[victim]; keep {
continue
}
return f.retireLocked(victim, vc)
}
if fallbackClient != nil {
return f.retireLocked(fallback, fallbackClient)
}
return nil
}

// retireLocked removes a client from the cache and, if no lease is currently
// held, marks it for closing. Returns the connection that needs to be closed
// (or nil) so the caller can run conn.Close() outside f.mu — Close() blocks
// on transport teardown and must not run with the global fanout mutex held.
// Otherwise the connection stays open until the last borrower releases, so
// an evicted entry never cancels an in-flight RPC. Idempotent — double-retiring
// or retiring after the last release is a no-op. Caller must hold f.mu.
func (f *fanout) retireLocked(addr string, c *nodeClient) *grpc.ClientConn {
delete(f.clients, addr)
if c.evicted {
return nil
}
c.evicted = true
if c.refcount > 0 || c.closed {
return nil
}
c.closed = true
return c.conn
}

// invalidateClient drops a cached connection — used when a peer returns
// Unavailable so the next request re-dials or skips the removed node. The
// connection stays open until the last borrower releases, so invalidating
// does not cancel other goroutines' in-flight RPCs.
func (f *fanout) invalidateClient(addr string) {
f.mu.Lock()
if f.closed {
f.mu.Unlock()
return
}
f.members = nil
var toClose *grpc.ClientConn
if c, ok := f.clients[addr]; ok {
toClose = f.retireLocked(addr, c)
}
f.mu.Unlock()

if toClose != nil {
if err := toClose.Close(); err != nil {
log.Printf("elastickv-admin: invalidate %s: close: %v", addr, err)
}
}
}

func (f *fanout) outgoingCtx(parent context.Context) context.Context {
if f.token == "" {
return parent
}
return metadata.AppendToOutgoingContext(parent, "authorization", "Bearer "+f.token)
}

// currentTargets returns the list of node addresses to fan out to. If the
// membership cache is fresh it is returned directly; otherwise the admin binary
// queries seeds via GetClusterOverview and caches the resulting member list
// for refreshInterval. Concurrent refreshes are collapsed through singleflight
// so a burst of requests after cache expiry hits only one seed. The shared
// refresh runs on a detached background context bounded by
// membershipRefreshBudget so one caller canceling (e.g., browser tab close)
// does not abort the work for every other concurrent waiter. On total failure
// the admin binary falls back to the static seed list so a single unreachable
// seed does not take the admin offline.
func (f *fanout) currentTargets(ctx context.Context) []string {
f.mu.Lock()
if f.members != nil && time.Since(f.members.fetchedAt) < f.refreshInterval {
addrs := append([]string(nil), f.members.addrs...)
f.mu.Unlock()
return addrs
}
f.mu.Unlock()

ch := f.refreshGroup.DoChan("members", func() (any, error) {
bgCtx, cancel := context.WithTimeout(context.Background(), membershipRefreshBudget)
defer cancel()
return f.refreshMembership(bgCtx), nil
})
select {
case r := <-ch:
// refreshMembership always returns a []string today, but explicitly
// check the assertion so a future return-type change turns into a
// loud, non-panicking fallback to seeds instead of a silent crash.
if addrs, ok := r.Val.([]string); ok {
return addrs
}
log.Printf("elastickv-admin: membership refresh returned unexpected type %T; falling back to seeds", r.Val)
return f.seedTargets()
case <-ctx.Done():
// Caller bailed. Give them whatever targets we can assemble without
// blocking: the last cached membership if we have one, else seeds.
// The detached refresh continues in the background and will populate
// the cache for the next request.
f.mu.Lock()
defer f.mu.Unlock()
if f.members != nil {
return append([]string(nil), f.members.addrs...)
}
return f.seedTargets()
}
}

// seedTargets returns a deduplicated copy of f.seeds clamped to f.maxNodes.
// Callers use it on seed-fallback paths (discovery error, ctx cancel,
// unexpected refresh result, no cached members yet) so a misconfigured huge
// --nodes list never bypasses the fan-out bound that membersFrom otherwise
// enforces. Codex P2 on 501b0173: previously these paths returned the raw
// f.seeds, which under outages or oversized seed lists could spawn more
// concurrent RPCs than configured.
func (f *fanout) seedTargets() []string {
cap := f.maxNodes
if cap <= 0 {
cap = defaultMaxDiscoveredNodes
}
if len(f.seeds) < cap {
cap = len(f.seeds)
}
out := make([]string, 0, cap)
seen := make(map[string]struct{}, cap)
for _, s := range f.seeds {
if _, dup := seen[s]; dup {
continue
}
if len(out) >= f.maxNodes && f.maxNodes > 0 {
break
}
seen[s] = struct{}{}
out = append(out, s)
}
return out
}

// refreshMembership performs the actual discovery RPC. It honours the caller's
// context for overall cancellation but derives a short per-seed timeout from
// discoveryRPCTimeout so a slow first seed does not stall the whole request.
func (f *fanout) refreshMembership(ctx context.Context) []string {
for _, seed := range f.seeds {
cli, release, err := f.clientFor(seed)
if err != nil {
log.Printf("elastickv-admin: dial seed %s: %v", seed, err)
continue
}
rpcCtx, cancel := context.WithTimeout(ctx, discoveryRPCTimeout)
resp, err := cli.client.GetClusterOverview(f.outgoingCtx(rpcCtx), &pb.GetClusterOverviewRequest{})
cancel()
release()
if err != nil {
if status.Code(err) == codes.Unavailable {
f.invalidateClient(seed)
}
log.Printf("elastickv-admin: discover membership via %s: %v", seed, err)
continue
}
addrs := membersFrom(seed, resp, f.maxNodes)
f.mu.Lock()
f.members = &membership{addrs: addrs, fetchedAt: time.Now()}
f.mu.Unlock()
return append([]string(nil), addrs...)
}

log.Printf("elastickv-admin: all seeds unreachable for membership refresh; falling back to static seed list")
return f.seedTargets()
}

// membersFrom extracts a deduplicated address list from a cluster overview
// response, always including the node that answered so the answering seed is
// still queried even if it omits itself from members. The result is capped at
// maxDiscoveredNodes so a malicious or misconfigured peer cannot inflate the
// fan-out.
//
// Deduplication keys on NodeID when available, falling back to the raw
// grpc_address otherwise. This prevents a fan-out from querying the same
// node twice when the seed address (e.g. "localhost:50051") and the node's
// self-advertised address (e.g. "127.0.0.1:50051") are different aliases
// for the same process — Codex flagged that the previous address-only
// dedup distorted overview results in that case.
//
// Initial slice capacity is bounded by maxDiscoveredNodes (rather than
// len(members)+1) so a misbehaving peer that returns 10× the cap does not
// force a giant allocation just to truncate immediately afterward.
func membersFrom(seed string, resp *pb.GetClusterOverviewResponse, maxNodes int) []string {
if maxNodes <= 0 {
maxNodes = defaultMaxDiscoveredNodes
}
acc := newDiscoveryAccumulator(len(resp.GetMembers())+1, maxNodes)

// Add the seed under the responding node's ID so a later entry for that
// same NodeID (most likely resp.Self.GrpcAddress, an alias of the seed)
// is deduped instead of producing a duplicate fan-out target.
self := resp.GetSelf()
var selfID string
if self != nil {
selfID = self.GetNodeId()
}
acc.add(selfID, seed)

// When the response advertises a different self.GrpcAddress, only add it
// when we have no NodeID to anchor the seed to (legacy nodes); otherwise
// the dedup above already covers it.
if self != nil && selfID == "" {
acc.add("", self.GetGrpcAddress())
}

for _, m := range resp.GetMembers() {
acc.add(m.GetNodeId(), m.GetGrpcAddress())
}
if acc.truncated {
log.Printf("elastickv-admin: discovery response exceeded %d nodes; truncating (peer=%s)", maxNodes, seed)
}
return acc.out
}

// discoveryAccumulator dedups (NodeID, address) pairs while building the
// fan-out target list. Extracted from membersFrom so the surrounding loop
// stays under the cyclop budget. The cap is per-instance (not a package
// constant) so operators can raise it via --maxDiscoveredNodes for very
// large clusters.
type discoveryAccumulator struct {
out []string
seenAddr map[string]struct{}
seenID map[string]struct{}
maxNodes int
truncated bool
}

func newDiscoveryAccumulator(suggestedCap, maxNodes int) *discoveryAccumulator {
if maxNodes <= 0 {
maxNodes = defaultMaxDiscoveredNodes
}
if suggestedCap > maxNodes {
suggestedCap = maxNodes
}
return &discoveryAccumulator{
out: make([]string, 0, suggestedCap),
seenAddr: map[string]struct{}{},
seenID: map[string]struct{}{},
maxNodes: maxNodes,
}
}

// add records a fan-out target keyed by its NodeID (when known) and address.
// Returns silently when the entry is empty, a duplicate, or would push the
// list past maxNodes; the caller can read truncated to log a truncation
// event once.
func (a *discoveryAccumulator) add(id, addr string) {
addr = strings.TrimSpace(addr)
if addr == "" {
return
}
if _, dup := a.seenAddr[addr]; dup {
return
}
if id != "" {
if _, dup := a.seenID[id]; dup {
return
}
}
if len(a.out) >= a.maxNodes {
a.truncated = true
return
}
a.seenAddr[addr] = struct{}{}
if id != "" {
a.seenID[id] = struct{}{}
}
a.out = append(a.out, addr)
}

// perNodeResult wraps a fan-out response from one node. Data is stored as
// json.RawMessage so it can be filled with a protojson-encoded protobuf
// message — encoding/json would lose the proto3 field-name mapping and
// well-known-type handling.
type perNodeResult struct {
Node string `json:"node"`
OK bool `json:"ok"`
Error string `json:"error,omitempty"`
Data json.RawMessage `json:"data,omitempty"`
}

// marshalProto encodes a protobuf message with the JSON mapping that preserves
// proto3 field names and well-known-type semantics.
var protoMarshaler = protojson.MarshalOptions{EmitUnpopulated: true, UseProtoNames: false}

func marshalProto(m proto.Message) (json.RawMessage, error) {
raw, err := protoMarshaler.Marshal(m)
if err != nil {
return nil, errors.Wrap(err, "protojson marshal")
}
return raw, nil
}

func (f *fanout) handleOverview(w http.ResponseWriter, r *http.Request) {
if r.Method != http.MethodGet {
writeJSONError(w, http.StatusMethodNotAllowed, "method not allowed")
return
}
r.Body = http.MaxBytesReader(w, r.Body, maxRequestBodyBytes)

// Split the discovery and per-node fan-out budgets. Reusing one ctx for
// both lets a slow membership refresh consume the entire deadline and
// leave the fan-out with an already-canceled context, so separate them.
discoveryCtx, discoveryCancel := context.WithTimeout(r.Context(), discoveryWaitBudget)
targets := f.currentTargets(discoveryCtx)
discoveryCancel()

ctx, cancel := context.WithTimeout(r.Context(), defaultGRPCRequestTimeout)
defer cancel()
results := make([]perNodeResult, len(targets))
var wg sync.WaitGroup
for i, addr := range targets {
wg.Add(1)
go func(i int, addr string) {
defer wg.Done()
entry := perNodeResult{Node: addr}
cli, release, err := f.clientFor(addr)
if err != nil {
entry.Error = err.Error()
results[i] = entry
return
}
defer release()
resp, err := cli.client.GetClusterOverview(f.outgoingCtx(ctx), &pb.GetClusterOverviewRequest{})
if err != nil {
if status.Code(err) == codes.Unavailable {
f.invalidateClient(addr)
}
entry.Error = err.Error()
results[i] = entry
return
}
data, mErr := marshalProto(resp)
if mErr != nil {
entry.Error = errors.Wrap(mErr, "marshal response").Error()
results[i] = entry
return
}
entry.OK = true
entry.Data = data
results[i] = entry
}(i, addr)
}
wg.Wait()

writeJSON(w, http.StatusOK, map[string]any{"nodes": results})
}

// maxResponseBodyBytes caps writeJSON's encode buffer. Worst-case sizing:
// fan-out hits at most maxDiscoveredNodes (=512) nodes, each returning a
// GetClusterOverview proto. The proto is dominated by the members list
// (≤maxDiscoveredNodes entries × ~few-hundred bytes each) plus the group
// leaders map (one entry per Raft group; clusters carry tens, not hundreds),
// so the per-node JSON is bounded around ~150 KiB and the aggregated body is
// bounded around 75 MiB even before deduplication. The 128 MiB cap below
// comfortably covers that worst case while still rejecting clearly
// oversized payloads; operators running clusters where the overview
// legitimately exceeds this can raise the constant. Keep this aligned with
// handleOverview's fan-out cap so a misbehaving node cannot force unbounded
// memory growth.
const maxResponseBodyBytes = 128 << 20

// writeJSONBufferPool reuses encode buffers across requests so a steady stream
// of /api/* calls doesn't churn the heap with per-request allocations. The
// pool stores *bytes.Buffer; each user resets and bounds the buffer.
var writeJSONBufferPool = sync.Pool{
New: func() any { return new(bytes.Buffer) },
}

// writeJSON marshals body into a pooled, size-capped buffer first, so an
// encoding failure can still surface as a 500 instead of a truncated body
// under a committed 2xx header. The cap (maxResponseBodyBytes) bounds memory
// even if a misbehaving downstream returns an oversized payload.
func writeJSON(w http.ResponseWriter, code int, body any) {
buf, ok := writeJSONBufferPool.Get().(*bytes.Buffer)
if !ok {
buf = new(bytes.Buffer)
}
defer func() {
// Drop very large buffers rather than retaining them in the pool —
// keeps steady-state memory close to the typical response size.
const maxRetainBytes = 1 << 20
if buf.Cap() > maxRetainBytes {
return
}
buf.Reset()
writeJSONBufferPool.Put(buf)
}()
buf.Reset()

limited := &cappedWriter{w: buf, max: maxResponseBodyBytes}
if err := json.NewEncoder(limited).Encode(body); err != nil || limited.exceeded {
if limited.exceeded {
log.Printf("elastickv-admin: response exceeded %d-byte cap; returning 500", maxResponseBodyBytes)
} else {
log.Printf("elastickv-admin: encode JSON response: %v", err)
}
w.Header().Set("Content-Type", "application/json; charset=utf-8")
w.WriteHeader(http.StatusInternalServerError)
const fallback = `{"code":500,"message":"internal server error"}` + "\n"
if _, werr := w.Write([]byte(fallback)); werr != nil {
log.Printf("elastickv-admin: write fallback response: %v", werr)
}
return
}
w.Header().Set("Content-Type", "application/json; charset=utf-8")
w.WriteHeader(code)
if _, err := w.Write(buf.Bytes()); err != nil {
log.Printf("elastickv-admin: write JSON response: %v", err)
}
}

// cappedWriter wraps an io.Writer and refuses writes once `written` would
// exceed `max`. Used by writeJSON so json.Encoder stops streaming bytes into
// the buffer past the cap; the encoder reports the short-write and writeJSON
// returns a 500 instead of an oversized body.
type cappedWriter struct {
w *bytes.Buffer
max int
written int
exceeded bool
}

func (c *cappedWriter) Write(p []byte) (int, error) {
if c.exceeded {
return 0, errors.New("response body cap exceeded")
}
if c.written+len(p) > c.max {
c.exceeded = true
return 0, fmt.Errorf("response body would exceed %d bytes", c.max)
}
n, err := c.w.Write(p)
c.written += n
if err != nil {
return n, errors.Wrap(err, "buffer write")
}
return n, nil
}

func writeJSONError(w http.ResponseWriter, code int, msg string) {
writeJSON(w, code, map[string]any{"code": code, "message": msg})
}

⚠️ Potential issue | 🟡 Minor

Use slog instead of log.Printf throughout this binary.

This entire file (lines 172, 418, 497, 514, 529, 559, 627, 671, 724, 735, 745, 792, 976, 978, 984, 991) uses the standard log package, but the project standard is slog with structured keys (addr, error, etc.). This is a new binary, so there is no migration cost — please switch before this lands so future log scraping doesn't have to special-case elastickv-admin. Replace the "log" import with "log/slog" and convert each call to slog.Info/Warn/Error with structured fields.

As per coding guidelines: "Use slog for logging and maintain structured logging keys (e.g., key, commit_ts)".
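
To make the conversion concrete, here is one possible shape for a single call site (the message text and key names are a suggestion, not a prescribed form):

package main

import (
	"errors"
	"log/slog"
)

// logSeedDialFailure is the structured equivalent of the existing
// log.Printf("elastickv-admin: dial seed %s: %v", seed, err) call.
func logSeedDialFailure(seed string, err error) {
	slog.Warn("dial seed failed", slog.String("addr", seed), slog.Any("error", err))
}

func main() {
	logSeedDialFailure("127.0.0.1:50051", errors.New("connection refused"))
}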

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/elastickv-admin/main.go` around lines 1 - 1024, Replace the std log usage
with slog: swap the import "log" for "log/slog" and update every
log.Printf/log.Printf-like call to structured slog calls (slog.Info/Warn/Error)
across the file (examples to change: run’s startup log, Close, dialDeduped,
installOrAttach orphan-close logs, refreshMembership seed/unreachable logs,
membersFrom truncation log, invalidateClient/invalidate-close, releaseFunc
deferred close, writeJSON encode/write errors, and any other log.Printf uses).
For each call, move the formatted message into a short message string and add
structured fields such as slog.String("addr", addr), slog.String("error",
err.Error()) or slog.Any("err", err), and when logging numeric caps use keys
like slog.Int("maxNodes", maxNodes) or slog.Int("bytes", maxResponseBodyBytes)
so logs follow the project’s structured key conventions.

Comment thread internal/tokenfile.go
Comment on lines +40 to +44
	defer func() {
		if cerr := f.Close(); cerr != nil {
			log.Printf("internal: close %s file %s: %v", humanName, abs, cerr)
		}
	}()

⚠️ Potential issue | 🟡 Minor

Use slog for the close-error log.

The package uses log.Printf to surface a deferred-close failure; the rest of the codebase logs through slog with structured keys, and this is new code so there's no compatibility cost in switching.

🔧 Proposed fix
 import (
 	"fmt"
 	"io"
-	"log"
+	"log/slog"
 	"os"
 	"path/filepath"
 	"strings"
 
 	"github.com/cockroachdb/errors"
 )
@@
 	defer func() {
 		if cerr := f.Close(); cerr != nil {
-			log.Printf("internal: close %s file %s: %v", humanName, abs, cerr)
+			slog.Warn("close token file failed",
+				"kind", humanName, "path", abs, "error", cerr)
 		}
 	}()

As per coding guidelines: "Use slog for logging and maintain structured logging keys".

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
	defer func() {
		if cerr := f.Close(); cerr != nil {
			log.Printf("internal: close %s file %s: %v", humanName, abs, cerr)
		}
	}()
import (
	"fmt"
	"io"
	"log/slog"
	"os"
	"path/filepath"
	"strings"

	"github.com/cockroachdb/errors"
)
	defer func() {
		if cerr := f.Close(); cerr != nil {
			slog.Warn("close token file failed",
				"kind", humanName, "path", abs, "error", cerr)
		}
	}()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/tokenfile.go` around lines 40 - 44, Replace the log.Printf call in
the deferred close handler with structured slog logging: when the deferred func
that runs cerr := f.Close() detects an error, call slog.Error (or slog.Log with
level Error) and include the error and structured keys such as "humanName"
(humanName) and "path" (abs) instead of formatting a message string; update the
deferred closure around f.Close() to use those symbols (f, cerr, humanName, abs)
to produce structured logs consistent with the codebase.

Comment thread main.go
Comment on lines +625 to +633
// setupAdminService is a thin wrapper around configureAdminService that also
// binds each Raft runtime to the server and logs an operator warning when
// running without authentication. Keeping this out of run() preserves run's
// cyclomatic-complexity budget. Members are seeded from the bootstrap
// configuration so GetClusterOverview advertises peer node addresses to the
// admin binary's fan-out discovery path.
// serversInput bundles the values run() passes to startServers so the
// signature stays compact and run() stays under the cyclop budget.
type serversInput struct {

⚠️ Potential issue | 🟡 Minor

Misplaced doc comment for setupAdminService.

Lines 625–630 document setupAdminService, but they sit immediately above type serversInput struct (line 633). The actual setupAdminService function is defined at line 697. go doc will attribute this comment to neither. Move the block to immediately above func setupAdminService(...).

🔧 Proposed fix
-// setupAdminService is a thin wrapper around configureAdminService that also
-// binds each Raft runtime to the server and logs an operator warning when
-// running without authentication. Keeping this out of run() preserves run's
-// cyclomatic-complexity budget. Members are seeded from the bootstrap
-// configuration so GetClusterOverview advertises peer node addresses to the
-// admin binary's fan-out discovery path.
 // serversInput bundles the values run() passes to startServers so the
 // signature stays compact and run() stays under the cyclop budget.
 type serversInput struct {
@@
+// setupAdminService is a thin wrapper around configureAdminService that also
+// binds each Raft runtime to the server and logs an operator warning when
+// running without authentication. Keeping this out of run() preserves run's
+// cyclomatic-complexity budget. Members are seeded from the bootstrap
+// configuration so GetClusterOverview advertises peer node addresses to the
+// admin binary's fan-out discovery path.
 func setupAdminService(
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// setupAdminService is a thin wrapper around configureAdminService that also
// binds each Raft runtime to the server and logs an operator warning when
// running without authentication. Keeping this out of run() preserves run's
// cyclomatic-complexity budget. Members are seeded from the bootstrap
// configuration so GetClusterOverview advertises peer node addresses to the
// admin binary's fan-out discovery path.
// serversInput bundles the values run() passes to startServers so the
// signature stays compact and run() stays under the cyclop budget.
type serversInput struct {
// serversInput bundles the values run() passes to startServers so the
// signature stays compact and run() stays under the cyclop budget.
type serversInput struct {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@main.go` around lines 625 - 633, The block comment describing
setupAdminService is misplaced above type serversInput; move that entire comment
so it sits immediately above the func setupAdminService(...) declaration (the
definition around line ~697) so godoc links it to the function, and replace or
add an appropriate comment for serversInput where that struct is defined; ensure
the moved comment still begins with "setupAdminService" to satisfy Go doc
conventions.

Comment thread main.go
Comment on lines +724 to +726
	if *adminInsecureNoAuth {
		log.Printf("WARNING: --adminInsecureNoAuth is set; Admin gRPC service exposed without authentication")
	}

⚠️ Potential issue | 🟡 Minor

Use slog.Warn for the insecure-mode warning.

This is a new code path; the rest of main.go already uses slog for warnings (e.g., lines 167, 188, 200, and 858), so this log.Printf stands out and violates the coding guidelines.

🔧 Proposed fix
 	if *adminInsecureNoAuth {
-		log.Printf("WARNING: --adminInsecureNoAuth is set; Admin gRPC service exposed without authentication")
+		slog.Warn("Admin gRPC service exposed without authentication",
+			"flag", "--adminInsecureNoAuth")
 	}

As per coding guidelines: "Use slog for logging and maintain structured logging keys".

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
	if *adminInsecureNoAuth {
		log.Printf("WARNING: --adminInsecureNoAuth is set; Admin gRPC service exposed without authentication")
	}
	if *adminInsecureNoAuth {
		slog.Warn("Admin gRPC service exposed without authentication",
			"flag", "--adminInsecureNoAuth")
	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@main.go` around lines 724 - 726, The warning for insecure admin mode
currently uses log.Printf; replace that with structured slog logging to match
the rest of main.go: where the flag adminInsecureNoAuth is checked, call
slog.Warn (or the package's logger) instead of log.Printf and include a clear
message plus structured key(s) such as "flag"="adminInsecureNoAuth" and
"warning" or "reason" to maintain consistency with other slog calls (see
existing slog usage in main.go) so the insecure-mode warning is emitted via slog
with structured fields.

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. What shall we delve into next?

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@bootjp bootjp merged commit eb1ec90 into main Apr 25, 2026
8 checks passed
@bootjp bootjp deleted the feat/admin_ui_key_visualizer branch April 25, 2026 15:35