feat(encryption): Stage 5 PR-A — EncryptionAdmin proto + read-only RPCs + status CLI #754
…pped) Stages 0-4 of the data-at-rest encryption rollout (foundation package, sidecar crash-durable I/O, storage envelope, raft envelope + engine pre-apply hook, FSM-internal Raft entry types) all landed via PRs #719/#722/#742/#744/#748. Per CLAUDE.md's design-doc lifecycle, rename the proposal to `_partial_` and add a milestone status table so the remaining stages (5-9) and their section references are visible at a glance. The pipeline is now end-to-end byte-tag complete but production-inert: no caller wires WithEncryption, no flag exists, no admin RPC proposes a bootstrap entry. Stage 5 begins the operator-facing surface.
…Cs + status CLI

First slice of Stage 5 (operator-facing surface for data-at-rest encryption). Lands the proto.EncryptionAdmin service, a server scaffold serving the read-only RPCs, and an `elastickv-admin encryption status` CLI subcommand. Stays production-inert: the server is not yet registered on main's gRPC listener, the mutating RPCs return Unimplemented, and Stage 4's WithEncryption applier seam is still un-wired.

## proto/encryption_admin.proto

Full v1 surface per the design's section 6.1 admin.go:

- GetCapability / GetSidecarState - read-only, leaderless, fan-out source for the section 7.1 cutover capability check.
- BootstrapEncryption / RotateDEK / RegisterEncryptionWriter - mutating; proposers for the section 11.3 reserved opcodes 0x03 / 0x04 / 0x05 wired in Stage 4. PR-A returns Unimplemented; PR-B will route them through raftengine.Proposer using the Stage 4 fsmwire encoders.
- ResyncSidecar - section 5.5 follower repair, read-only.

local_epoch travels as uint32 on the wire (proto3 has no uint16); the schema comment documents the value <= 0xFFFF validation rule that every decode site must apply.

## adapter/encryption_admin.go

EncryptionAdminServer implements pb.EncryptionAdminServer. Options:

- WithEncryptionAdminSidecarPath - the section 5.1 keys.json path; empty means "encryption not configured" and GetCapability returns encryption_capable=false gracefully.
- WithEncryptionAdminKeystore - reserved for the PR-B RotateDEK fast path.
- WithEncryptionAdminFullNodeID - reported in CapabilityReport for the section 5.6 step 1a writer-registry batch.
- WithEncryptionAdminBuildSHA - overrides runtime/debug auto-detect.
- WithEncryptionAdminLatestAppliedIndex - the section 5.5 escape hatch so a caller can substitute the in-memory apply counter for the sidecar's persisted value.

Read-only RPCs read the sidecar via encryption.ReadSidecar on every call; the file is small, and the server itself holds no shared mutable state.
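The `WithEncryptionAdmin*` names suggest Go's functional-options pattern. The sketch below is a minimal version of that shape. It assumes `WithEncryptionAdminLatestAppliedIndex` takes a callback and that the applied index is a `uint64`; both are assumptions. `adminServer`, `adminOption` and the lowercase `with*` helpers are hypothetical stand-ins, not the PR's code. The sketch shows how the escape hatch overrides the sidecar's persisted value:

```go
package main

// adminServer is a hypothetical, trimmed stand-in for EncryptionAdminServer
// that keeps only the two fields this sketch exercises.
type adminServer struct {
	sidecarPath   string        // empty means "encryption not configured"
	latestApplied func() uint64 // nil means "use the sidecar's persisted value"
}

// adminOption mirrors the shape of the WithEncryptionAdmin* options.
type adminOption func(*adminServer)

func withSidecarPath(path string) adminOption {
	return func(s *adminServer) { s.sidecarPath = path }
}

func withLatestAppliedIndex(f func() uint64) adminOption {
	return func(s *adminServer) { s.latestApplied = f }
}

func newAdminServer(opts ...adminOption) *adminServer {
	s := &adminServer{}
	for _, opt := range opts {
		opt(s)
	}
	return s
}

// configured reports whether a sidecar path was supplied.
func (s *adminServer) configured() bool { return s.sidecarPath != "" }

// appliedIndex prefers the caller-supplied in-memory counter and falls back
// to the post-fsync value read from the sidecar.
func (s *adminServer) appliedIndex(persisted uint64) uint64 {
	if s.latestApplied != nil {
		return s.latestApplied()
	}
	return persisted
}
```

With no callback, `appliedIndex` simply echoes whatever the sidecar recorded. A later caller can pass a closure over a live apply counter without touching the RPC handlers.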
Sidecar errors map: not-exist on GetCapability returns a graceful "not capable"; not-exist on GetSidecarState / ResyncSidecar returns FailedPrecondition; a version mismatch returns FailedPrecondition with the error embedded; anything else returns Internal. writer_registry_for_caller is an empty non-nil map in PR-A; Stage 7 fills it from the writer registry.

## cmd/elastickv-admin/encryption.go

main() dispatches to encryptionMain(os.Args[2:]) when argv[1] is "encryption", before flag.Parse(), so the encryption subcommand tree owns a flag namespace separate from the HTTP-server surface.

Sub-subcommands: PR-A wires only `status`. It dials the configured endpoint via grpc.NewClient (avoiding the deprecated grpc.DialContext), calls GetCapability and GetSidecarState, and writes a plain-text table. A FailedPrecondition from GetSidecarState on an unconfigured node downgrades to a "<unavailable>" line rather than failing the whole command, so `status` is informative everywhere.

## Tests

- adapter/encryption_admin_test.go covers: no-sidecar path, missing sidecar file, sidecar-present-but-not-bootstrapped, bootstrapped, GetSidecarState error code, GetSidecarState happy path with wrapped DEK ids, callback override of latest_applied_index, ResyncSidecar happy path, and the three Unimplemented stubs.
- cmd/elastickv-admin/encryption_test.go brings up a real gRPC server on a loopback listener (matching the production dial path, not bufconn) and exercises runEncryptionStatus end-to-end with both bootstrapped and unconfigured fixtures. Listener cleanup uses GracefulStop with a 2s ceiling so a stuck client cannot hang the test process.

## Five-pass self-review

1. Data loss - read-only RPCs, no Raft proposals, no sidecar writes. Mutating stubs return Unimplemented and never touch state. The section 5.1 sidecar write protocol is untouched.
2. Concurrency - the server has no mutable state; every call snapshots the sidecar via ReadSidecar. The CLI gRPC client uses grpc.NewClient (non-blocking, lazy-connecting). Test listener cleanup is bounded.
3. Performance - hot paths are unchanged; the new RPCs are operator-facing and off the data plane. ReadSidecar parses a small JSON file per call, which is fine for an admin RPC.
4. Consistency - GetSidecarState reports the sidecar's view, which is the persisted, post-fsync value; the callback escape hatch lets PR-B substitute the freshest applied index when leadership wiring lands. writer_registry_for_caller is empty (no Stage 7 yet), and the cutover path is documented to source its batch from GetCapability fan-out, not from this RPC.
5. Test coverage - unit tests on every read-only path plus the Unimplemented stubs; CLI tests cover both bootstrapped and unconfigured fixtures end-to-end through a real gRPC server.

## Verification

- go test -race ./adapter -run TestEncryptionAdmin: PASS (1.16s)
- go test -race ./cmd/elastickv-admin -run 'TestEncryptionMain|TestRunEncryptionStatus': PASS (1.08s)
- go test -race ./internal/encryption/...: PASS
- golangci-lint run on touched paths: 0 issues
Walkthrough

This PR introduces the EncryptionAdmin gRPC service for encryption observability and sidecar recovery. It defines proto contracts for capability reporting and sidecar state inspection, implements three read-only RPCs in a new gRPC server, adds an elastickv-admin CLI command to query encryption status via GetCapability and GetSidecarState, and updates the design document to track that Stages 0–4 of the encryption rollout are now shipped.

Changes: Encryption Admin Control Plane

Estimated code review effort: 3 (Moderate), ~25 minutes

Pre-merge checks: 4 passed, 1 failed (1 warning)
Code Review
This pull request implements the EncryptionAdmin gRPC service and the encryption status CLI subcommand, providing read-only observability for the data-at-rest encryption feature. The changes include server-side logic for capability and state probes, comprehensive tests, and updated design documentation. Review feedback identifies a critical logic error in the EncryptionCapable flag that could prevent cluster bootstrapping and notes inconsistencies in the ResyncSidecar RPC implementation relative to the design specifications. Additionally, the reviewer recommended logging errors during gRPC connection cleanup and adopting slices.Sort for improved performance.
```go
func (s *EncryptionAdminServer) GetCapability(_ context.Context, _ *pb.Empty) (*pb.CapabilityReport, error) {
	if s.sidecarPath == "" {
		return &pb.CapabilityReport{BuildSha: s.buildSHA}, nil
	}
	sc, err := encryption.ReadSidecar(s.sidecarPath)
	switch {
	case err == nil:
	case errors.Is(err, os.ErrNotExist) || encryption.IsNotExist(err):
		return &pb.CapabilityReport{BuildSha: s.buildSHA}, nil
	default:
		return nil, statusFromSidecarErr(err)
	}
	return &pb.CapabilityReport{
		EncryptionCapable: sc.Active.Storage != 0,
		BuildSha:          s.buildSHA,
		SidecarPresent:    true,
		FullNodeId:        s.fullNodeID,
		// LocalEpoch stays at 0 until Stage 7 wires the §4.1
		// writer-registry counter. Bootstrap pre-check is
		// expected to call GetCapability before any DEK exists,
		// so 0 is the correct value at that point regardless.
		LocalEpoch: 0,
	}, nil
}
```
The EncryptionCapable logic in GetCapability appears to be incorrect based on the design doc §7.1 Phase 0. The doc states that encryption_capable should be true for any node running the new binary with the encryption flag set, allowing the leader to distinguish capable nodes from older ones during the Phase 0/1 capability poll.
Currently, if the sidecar is missing or sc.Active.Storage == 0 (not yet bootstrapped), the RPC returns false. This will block the bootstrap process itself, as the cutover command (§7.1 step 3) refuses to proceed unless all members report true. The capability check should return true if the node is configured for encryption (i.e., sidecarPath is set), regardless of whether it has already been bootstrapped.
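The fix requested here, which the first fix-up commit later applies, separates "configured for encryption" from "sidecar exists". The sketch below uses only the standard library and a hypothetical trimmed `capabilityReport`; the real type is `pb.CapabilityReport`. Following that commit, it uses a non-empty sidecar path as the proxy for configuration and reports `SidecarPresent` as file existence, which matches the commit's regression tests:

```go
package main

// capabilityReport is a hypothetical, trimmed mirror of pb.CapabilityReport.
type capabilityReport struct {
	EncryptionCapable bool
	SidecarPresent    bool
}

// reportCapability gates EncryptionCapable on configuration alone, so the
// Phase 0 capability poll succeeds on a configured node before the first
// DEK has been bootstrapped. Bootstrap progress (for example
// Active.Storage != 0) deliberately plays no part in the answer.
func reportCapability(sidecarPath string, sidecarExists bool) capabilityReport {
	if sidecarPath == "" {
		// Not started with encryption enabled: not capable.
		return capabilityReport{}
	}
	return capabilityReport{
		EncryptionCapable: true,
		SidecarPresent:    sidecarExists,
	}
}
```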
```proto
// default Raft group's leader (BootstrapEncryption, RotateDEK,
// RegisterEncryptionWriter, ResyncSidecar). They route to the
// leader; followers return FailedPrecondition with the
```
The comment groups ResyncSidecar under "Mutating RPCs that propose FSM-internal entries". However, the design doc §5.5 and the implementation in adapter/encryption_admin.go indicate that this is a read-only RPC used for follower repair that does not involve a Raft proposal. Please update the comment to reflect its read-only nature to avoid confusion for future implementers.
```go
func (s *EncryptionAdminServer) ResyncSidecar(_ context.Context, _ *pb.ResyncSidecarRequest) (*pb.ResyncSidecarResponse, error) {
	if s.sidecarPath == "" {
		return nil, grpcStatusError(codes.FailedPrecondition, "encryption: sidecar path is not configured on this node")
	}
	sc, err := encryption.ReadSidecar(s.sidecarPath)
	if err != nil {
		return nil, statusFromSidecarErr(err)
	}
	return &pb.ResyncSidecarResponse{
		WrappedDeksById:          wrappedDEKMap(sc),
		ActiveStorageId:          sc.Active.Storage,
		ActiveRaftId:             sc.Active.Raft,
		LeaderLatestAppliedIndex: s.appliedIndex(sc.RaftAppliedIndex),
	}, nil
}
```
The ResyncSidecar implementation is missing the writer_registry_for_caller information required by the recovery protocol described in Design Doc §5.5. The doc specifies that a recovering follower uses the leader's recorded last_seen_local_epoch for that specific node to re-derive its local_epoch monotonically.
While the PR description notes that the writer registry is deferred to Stage 7, the ResyncSidecarResponse proto message is also missing this field entirely, whereas SidecarStateReport (from GetSidecarState) includes it. Given §5.5 says the fallback uses GetSidecarState, the purpose and completeness of ResyncSidecar as a separate RPC should be reconciled with the design.
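Whatever shape the §5.5 re-derivation takes, every `local_epoch` value a follower reads from a `writer_registry_for_caller`-style map must first pass the `<= 0xFFFF` decode rule the schema documents, because proto3 carries the value as `uint32`. Below is a sketch of that validation step only. It assumes string node-ID keys, which is an assumption, and `decodeLocalEpoch` and `decodeWriterRegistry` are hypothetical helpers. It does not model the monotonic re-derivation itself:

```go
package main

import (
	"fmt"
	"math"
)

// decodeLocalEpoch applies the schema's decode rule: local_epoch is a
// 16-bit value carried as uint32 (proto3 has no uint16), so anything
// above 0xFFFF is rejected instead of being silently truncated.
func decodeLocalEpoch(v uint32) (uint16, error) {
	if v > math.MaxUint16 {
		return 0, fmt.Errorf("local_epoch %d exceeds 0xFFFF", v)
	}
	return uint16(v), nil
}

// decodeWriterRegistry validates every entry of a writer-registry map.
// The result is non-nil even for an empty input, matching the
// "empty non-nil map" contract described for PR-A.
func decodeWriterRegistry(in map[string]uint32) (map[string]uint16, error) {
	out := make(map[string]uint16, len(in))
	for nodeID, v := range in {
		epoch, err := decodeLocalEpoch(v)
		if err != nil {
			return nil, fmt.Errorf("writer %q: %w", nodeID, err)
		}
		out[nodeID] = epoch
	}
	return out, nil
}
```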
```go
	if err != nil {
		return err
	}
	defer func() { _ = closeFn() }()
```
The error from closeFn() (which closes the gRPC client connection) is silently ignored. Per the general rules for this repository, errors from Close() methods on resources like network connections should be logged to ensure that cleanup problems or resource leaks are visible.
Suggested change:

```diff
-defer func() { _ = closeFn() }()
+defer func() {
+	if err := closeFn(); err != nil {
+		fmt.Fprintf(os.Stderr, "encryption: close connection: %v\n", err)
+	}
+}()
```
References
- Do not silently ignore errors from Close() methods on resources like network connections. Log them to ensure resource leaks or other cleanup problems are visible.
```go
	for id := range state.WrappedDeksById {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })
```
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b6799e520d
```go
	if stateErr != nil {
		_, _ = fmt.Fprintf(out, "sidecar_state: <unavailable: %v>\n", stateErr)
		return nil
```
Return non-precondition sidecar-state failures
writeEncryptionStatus treats every GetSidecarState error as a soft “unavailable” condition and returns success, but only FailedPrecondition is expected for unconfigured nodes. If GetSidecarState fails for other reasons (e.g., Unimplemented, permission issues, transport faults after capability succeeds), elastickv-admin encryption status exits 0 and can mislead operators into thinking the check passed. Restrict this fallback path to codes.FailedPrecondition and propagate all other RPC errors.
Addresses the first review wave on PR #754. Per CLAUDE.md, the P1 defect fix is paired with a failing regression test in the same commit.

## gemini P1 (high) - adapter/encryption_admin.go:137

`GetCapability` was gating `encryption_capable` on `sc.Active.Storage != 0`, which is the post-bootstrap state. The §7.1 Phase 0 cutover gate runs *before* bootstrap proposes the first DEK, so gating on Active.Storage != 0 deadlocks the cutover against the very entry it gates.

Fix: encryption_capable is now gated on "node started with --encryption-enabled" (proxied here by a non-empty sidecar path). SidecarPresent carries the orthogonal "bootstrap entry landed" signal. The §6.1 doc-comment is rewritten so the next reader does not re-introduce the bug.

Regression tests:

- TestEncryptionAdmin_GetCapability_SidecarMissing now asserts EncryptionCapable=true with SidecarPresent=false (Phase 0 window: configured but not yet bootstrapped).
- TestEncryptionAdmin_GetCapability_NotBootstrapped now asserts EncryptionCapable=true with SidecarPresent=true (sidecar present but Active.Storage==0).

Caller audit: GetCapability has one production caller today (cmd/elastickv-admin/encryption.go), which just prints the bool. No semantic logic depends on the old gating. Future cutover callers (PR-B / Stage 6) will read the new semantics fresh.

## gemini medium - proto/encryption_admin.proto:22

ResyncSidecar was grouped under "mutating RPCs". Move it into the read-only group and document the §5.5 follower-repair semantics (it is addressed to the leader because the recovery flow consults only the leader's record, but no Raft proposal is involved).

## gemini medium - adapter/encryption_admin.go:192

ResyncSidecarResponse was missing the writer_registry_for_caller field that §5.5 requires for the follower to re-derive its §4.1 local_epoch. Add the field on the proto and populate an empty non-nil map in PR-A (Stage 7 wires the registry); the field comment documents the value <= 0xFFFF decode rule the schema requires.

## gemini medium - cmd/elastickv-admin/encryption.go:81

The closeFn() error was silently ignored. Log it to stderr so an FD leak or transport bug stays visible.

## gemini medium - cmd/elastickv-admin/encryption.go:144

sort.Slice replaced with slices.Sort.

## codex P2 - cmd/elastickv-admin/encryption.go:116

writeEncryptionStatus treated every GetSidecarState error as the "node not configured" soft fallback. Narrow the fallback to codes.FailedPrecondition (the documented sentinel for unconfigured nodes); any other code (Internal / Unavailable / Unimplemented) now propagates so the CLI exits non-zero.

Regression test (TestRunEncryptionStatus_PropagatesNonPreconditionError) spins up a stub gRPC server that returns codes.Internal from GetSidecarState and verifies that runEncryptionStatus returns a wrapped error rather than silently exiting 0.

Caller audit: writeEncryptionStatus has a single production callsite (runEncryptionStatus, just above the narrowing). The narrowing happens before writeEncryptionStatus is called, so the helper's contract is unchanged at its call boundary.

## Verification

- go test -race -timeout=60s ./adapter ./cmd/elastickv-admin (Stage 5 scope): PASS
- golangci-lint run on touched paths: 0 issues
@claude review
Claude finished @bootjp's task in 3m 31s.

Claude's Review

Status of Prior Automated Review Comments

Before noting new findings, I verified the current state of every issue flagged by the earlier automated reviewers:

All automated reviewer concerns are resolved in the current code.

Independent Review
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@cmd/elastickv-admin/encryption.go`:
- Around lines 74-76: The flag-parsing path treats any error from fs.Parse(args) as fatal. Modify the error handling in the encryption command (where fs.Parse(args) is called in encryption.go) to treat flag.ErrHelp as a non-error: after calling fs.Parse(args), if err == flag.ErrHelp, return nil so the help invocation exits successfully; otherwise return errors.Wrap(err, "parse flags"). Import flag if it is not already referenced, and keep the existing errors.Wrap behavior for non-help errors.
- Around lines 31-40: The command treats help-style invocations as an unknown-subcommand error. Update the subcommand handling so that when sub is "help", "-h", or "--help", it prints the usage string ("usage: elastickv-admin encryption <subcommand> [flags]") to os.Stdout and returns nil (exit 0) instead of falling through to errors.Errorf. Modify the switch that dispatches on sub (the same block that calls runEncryptionStatus(rest, os.Stdout)) to handle these help tokens and emit the usage text rather than an error.
- Around lines 131-133: The fmt.Fprintf call that writes "sidecar_state: <unavailable: %v>\n" currently discards its error. Capture the error returned by fmt.Fprintf(out, ...) and return (or wrap) it, optionally with stateErr for context, so write failures on the output stream propagate instead of being dropped with "_, _ =".
In `@docs/design/2026_04_29_partial_data_at_rest_encryption.md`:
- Line 16: Update the Stage 5 status for the "EncryptionAdmin gRPC + CLI" row in
the design doc: change the Stage column value currently set to `open` to
`partial` (or split into separate 5A/5B rows if you prefer) and add `#754` to
the "Landed in" column so the row reflects this PR as Stage 5 PR-A; locate the
row containing the text "EncryptionAdmin gRPC + CLI (§5.6 bootstrap, §6.6 admin
commands)" and make these two edits to keep milestone tracking accurate.
Review info: configuration used: defaults; review profile: CHILL; plan: Pro; run ID: 7d42f2fa-0b40-4cf0-b149-4a978908ab51.

Files ignored due to path filters (2):

- proto/encryption_admin.pb.go is excluded by `!**/*.pb.go`
- proto/encryption_admin_grpc.pb.go is excluded by `!**/*.pb.go`

Files selected for processing (7):

- adapter/encryption_admin.go
- adapter/encryption_admin_test.go
- cmd/elastickv-admin/encryption.go
- cmd/elastickv-admin/encryption_test.go
- cmd/elastickv-admin/main.go
- docs/design/2026_04_29_partial_data_at_rest_encryption.md
- proto/encryption_admin.proto
Addresses claude[bot]'s round-2 review on PR #754. claude[bot] LGTM'd the round-1 fixes (gemini P1 + 4 mediums + codex P2) and surfaced only minor follow-ups in the round-2 pass.

## adapter/encryption_admin.go

- ResyncSidecar now binds `req` and explicitly assigns `_ = req`, with a comment noting that PR-B's leader-only guard will read CallerFullNodeId. No semantic change; this signals to reviewers that the unused field is intentional, not an oversight.
- wrappedDEKMap logs an `slog.Error` when parseSidecarKeyID rejects an entry. The comment still documents the "programming error" expectation; the log makes a future sidecar-format migration that outpaces validateSidecar visible in operator logs rather than producing a silently shrunken wrapped_deks_by_id map.

## cmd/elastickv-admin/encryption.go

- dialEncryption gets a doc-comment clarifying that the context argument is reserved for the PR-B auth path (TLS + token); grpc.NewClient itself is non-blocking and does not use it.

## adapter/encryption_admin_test.go

- TestEncryptionAdmin_ResyncSidecar_ShipsWrappedDEKs now asserts the same non-nil empty-map contract on WriterRegistryForCaller that TestEncryptionAdmin_GetSidecarState_ShipsWrappedDEKs already checks. This mirrors the §5.5 promise so a future regression to nil is caught.

## cmd/elastickv-admin/encryption_test.go

- startCustomEncryptionAdminTestServer's Serve goroutine no longer silently drops the error. After GracefulStop, Serve returns nil, so the happy-path exit prints nothing; an unexpected listener or accept failure now surfaces via t.Logf.

## Skipped (claude[bot] cosmetic)

- "Service RPC ordering doesn't match comment grouping": reordering the rpc definitions would shuffle the generated ServiceDesc.Methods order. The wire format is unaffected, but the cosmetic risk-to-reward is unfavourable, and the existing comment grouping is clear enough.

## Verification

- go test -race -timeout=60s on ./adapter and ./cmd/elastickv-admin: PASS
- golangci-lint run on touched paths: 0 issues
@claude review

Claude finished @bootjp's task in 1m 46s.

Claude's Review (Round 2)

Status: Prior Round-1 Findings

All five issues flagged in the previous Claude review are addressed in the current code:

All prior automated-reviewer findings (Gemini P1/M1–M4, Codex P2) remain resolved as verified in the prior round.

Remaining Open Items (CodeRabbit batch, not yet addressed)

1.
…bot]) claude[bot] round 2 says "Ready to merge"; coderabbitai round 1 filed 4 minor findings. Together they reduce to 3 distinct cleanups, all addressed here. No correctness change; the diff is paper-cut polish that keeps shell ergonomics and the design-doc lifecycle clean.

## cmd/elastickv-admin/encryption.go

- `encryption -h` / `--help` / `help` now print usage to stdout and exit 0 instead of falling through to the unknown-subcommand error path.
- `encryption status -h` now treats `flag.ErrHelp` as success. With flag.ContinueOnError, Parse has already written the per-flag usage to fs.Output(); we just stop wrapping the sentinel into an error.
- The unavailable-sidecar branch no longer drops `fmt.Fprintf` write errors; it now matches the propagation pattern that every other Fprintf in `writeEncryptionStatus` uses.

## cmd/elastickv-admin/encryption_test.go

- TestEncryptionMain_HelpFlagExitsZero locks in the three help-flag forms (`-h`, `--help`, `help`) all returning nil.
- TestRunEncryptionStatus_HelpFlagExitsZero locks in `flag.ErrHelp` on subcommand flags also returning nil.

## docs/design/2026_04_29_partial_data_at_rest_encryption.md

- The Stage 5 milestone row is split into 5A (this PR, shipped) and 5B (mutating RPCs + capability fan-out + remaining CLI + main.go wiring, open). The `Landed in` column links #754 for 5A.

## Verification

- go test -race -timeout=60s on ./adapter and ./cmd/elastickv-admin: PASS
- golangci-lint run on touched paths: 0 issues
@claude review

Claude finished @bootjp's task in 2m 30s.

Claude's Review (Round 3)

Status of Prior Round 2 / CodeRabbit Findings

All three open items from Round 2 are resolved in commit

New pinning tests added for the help fixes:

Independent Round 3 Observations

No new correctness, data-loss, or concurrency issues found. One minor style note:

Everything else - the

Overall Assessment

This PR is ready to merge. All automated-reviewer findings (Gemini P1/M1–M4, Codex P2), all Round 1 Claude findings, and all three Round 2 Claude / CodeRabbit findings are resolved. The implementation quality is high, the test coverage is thorough, and the design-doc lifecycle (5A shipped / 5B open) is correctly maintained.
Summary
First slice of Stage 5 of the data-at-rest encryption rollout (design doc: `docs/design/2026_04_29_partial_data_at_rest_encryption.md`, §5.6 initial DEK bootstrap, §6.1 admin.go service, §6.6 admin commands).

Production-inert: the new server is not yet registered on `main`'s gRPC listener; only the read-only RPCs are wired; the propose-side RPCs return `Unimplemented` until PR-B.

This PR pairs the doc-lifecycle transition (`_proposed_` → `_partial_`, per the CLAUDE.md design-doc-first workflow) with PR-A of the implementation.
What lands
- `proto.EncryptionAdmin` v1 surface: `GetCapability`, `GetSidecarState`, `BootstrapEncryption`, `RotateDEK`, `RegisterEncryptionWriter`, `ResyncSidecar`. The schema doc-comments pin the `local_epoch <= 0xFFFF` validation rule that every decode site must apply.
- `EncryptionAdminServer` in `adapter/`, with the read-only RPCs fully implemented (sidecar-backed via `encryption.ReadSidecar`); the three propose-side RPCs are `Unimplemented` stubs.
- `elastickv-admin encryption status` CLI subcommand: dispatches via `argv[1] == "encryption"` before `flag.Parse()` so subcommand flags do not bleed into the HTTP-server global flag namespace. Uses `grpc.NewClient` (avoiding the deprecated `DialContext`).

Out of scope (deferred to PR-B / Stage 6)
- `RegisterEncryptionAdminServer` onto `main.go`'s gRPC listener.
- `BootstrapEncryption` / `RotateDEK` / `RegisterEncryptionWriter` implementations on top of `raftengine.Proposer` + Stage 4 fsmwire bodies, with leader-only routing.
- … `bootstrap`, `rotate-dek`, `register-writer`, `resync-sidecar`).
- … `--enable-storage-envelope`, `--enable-raft-envelope`) and §9.1 startup-refusal guards.

Test plan
- `go test -race ./adapter -run TestEncryptionAdmin`: 9 cases cover no-sidecar / missing-file / not-bootstrapped / bootstrapped / callback override / ResyncSidecar happy path / Unimplemented stubs.
- `go test -race ./cmd/elastickv-admin -run 'TestEncryptionMain|TestRunEncryptionStatus'`: a real gRPC server on a loopback listener (not bufconn) exercises the CLI dial path with both bootstrapped and unconfigured fixtures.
- `go test -race ./internal/encryption/...` regression: pass.
- `golangci-lint run` on touched paths: 0 issues.

Self-review (CLAUDE.md 5 passes)
1. Data loss: … `encryption.ReadSidecar`; no writes, no Raft proposals. Mutating RPCs return `Unimplemented` and never reach the FSM apply path. The §5.1 crash-durable write protocol is untouched.
2. Concurrency: … reads are inherently snapshot-safe under the §5.1 atomic-rename protocol. The CLI uses `grpc.NewClient` (non-blocking, lazy connect). Test listener cleanup is bounded by a 2s `GracefulStop` deadline so a stuck client cannot hang `go test`.
3. Performance: … plane. `ReadSidecar` parses a small JSON file per call, which is fine at admin-RPC rates and matches the existing §5.5 fallback pattern.
4. Consistency: `GetSidecarState` reports the persisted post-fsync sidecar value; `WithEncryptionAdminLatestAppliedIndex` is the §5.5 escape hatch so PR-B can substitute the freshest in-memory apply counter when leadership wiring lands. `writer_registry_for_caller` is empty in PR-A (no Stage 7), and the cutover path sources its batch from `GetCapability` fan-out per §5.6 step 1a, not from this RPC.
5. Test coverage: … `Unimplemented` stub has a unit test; the CLI status path has end-to-end coverage through a real gRPC server with both bootstrapped and unconfigured fixtures.
Notes
- … `libprotoc 29.6` locally (29.3 is not installable on macOS via Homebrew); the only diff vs the canonical toolchain is the `protoc v5.29.x` version-comment string in generated headers.
- `protoc-gen-go v1.36.11` and `protoc-gen-go-grpc 1.6.1` match the `proto/Makefile` pins exactly, so the wire-relevant output is identical.
- … `adapter/` test-package long timeout reproduced on `main` without my changes (`go test -race -count=1 ./adapter/` hangs after ~10 minutes on grpc dns_resolver goroutines); my tests run cleanly in isolation (`go test -race -run TestEncryptionAdmin ./adapter/` finishes in 1.16s).

Summary by CodeRabbit

New Features

- `elastickv-admin encryption status` CLI command to report encryption capability and sidecar state.

Tests

Documentation