Skip to content

feat(encryption): Stage 5 PR-B — RotateDEK + RegisterEncryptionWriter mutators#756

Open
bootjp wants to merge 8 commits into
mainfrom
feat/encryption-admin-mutators
Open

feat(encryption): Stage 5 PR-B — RotateDEK + RegisterEncryptionWriter mutators#756
bootjp wants to merge 8 commits into
mainfrom
feat/encryption-admin-mutators

Conversation

@bootjp
Copy link
Copy Markdown
Owner

@bootjp bootjp commented May 15, 2026

Summary

Stage 5 PR-B of the data-at-rest encryption rollout (design doc:
docs/design/2026_04_29_partial_data_at_rest_encryption.md, §5.2
rotation, §4.1 writer registry, §6.1 admin.go service, §6.6 admin
commands). Wires two of the three §11.3 mutating opcodes through
raftengine.Proposer on top of Stage 4's fsmwire body encoders.

Production-inert: main.go does not yet register the
EncryptionAdminServer on its gRPC listener, and
WithEncryptionAdminProposer is unset by default — every mutating
RPC then short-circuits to FailedPrecondition with "proposer not
configured". Stage 6 flips this on under the --encryption-enabled
cluster flag; PR-C lands BootstrapEncryption + §5.6 step 1a
capability fan-out.

What lands

  • RotateDEK + RegisterEncryptionWriter server-side
    implementations on top of raftengine.Proposer. Leader-only,
    with the leader id+address embedded in FailedPrecondition
    status detail.
  • ResyncSidecar is now leader-gated (it was always-on in PR-A);
    §5.5 recovery only consults the leader's record.
  • WithEncryptionAdminProposer / WithEncryptionAdminLeaderView
    options.
  • elastickv-admin encryption rotate-dek and
    elastickv-admin encryption register-writer subcommands. The
    wrapped DEK material is taken as base64.
  • Design doc Stage-5 row split: 5A shipped (feat(encryption): Stage 5 PR-A — EncryptionAdmin proto + read-only RPCs + status CLI #754), 5B in PR, 5C
    open.

Out of scope (PR-C / Stage 6)

  • BootstrapEncryption mutator + bootstrap CLI.
  • §5.6 step 1a capability fan-out across voters and learners.
  • main.go registers EncryptionAdminServer on the gRPC listener
    (Stage 6, under the cluster flag).
  • TLS / token auth for mutating RPCs (Stage 6).

Test plan

  • go test -race -timeout=60s ./adapter -run TestEncryptionAdmin
    — 14 cases including:
    • RotateDEK happy path (wire-layout round-trip via
      fsmwire.DecodeRotation).
    • RotateDEK rejects on follower (leader hint in status detail).
    • RotateDEK input-validation table (zero new_dek_id, empty
      wrapped, unspecified purpose, local_epoch > 0xFFFF).
    • RegisterEncryptionWriter happy path (round-trip via
      fsmwire.DecodeRegistration).
    • RegisterEncryptionWriter rejects multi-writer batches.
    • ResyncSidecar rejects on follower.
    • MutatingRPCs_RejectWithoutProposer pins the production-inert
      guarantee.
  • go test -race -timeout=60s ./cmd/elastickv-admin -run TestRunEncryption
    — CLI happy path through a real gRPC loopback server, plus the
    parse-time validation (--purpose=junk, --proposer-local-epoch=70000).
  • golangci-lint run on touched paths: 0 issues.
  • CI verifies build / lint / Jepsen.

Self-review (CLAUDE.md 5 passes)

  1. Data loss — every mutating RPC is leader-gated; followers
    return FailedPrecondition before the proposer is touched. A
    nil proposer rejects with FailedPrecondition. Stage 4's
    HaltApply seam already covers the apply-side failure mode.
  2. Concurrency — the server adds no mutable state; the
    proposer is set once at construction. raftengine.Proposer's
    contract serialises proposals through the engine's queue.
  3. Performance — hot path is unchanged; mutating RPCs are
    operator-facing and rare.
  4. Consistency — fsmwire encoders are the single source of
    truth for the on-wire byte layout. Round-trip tests
    (Encode→Decode) lock the server's view against fsmwire's, so
    a future change to either side cannot silently desync.
  5. Test coverage — input-validation table, leader-rejection,
    wire-layout round-trip, and CLI happy path are all covered.
    No-proposer = FailedPrecondition is pinned.

Notes

… mutators

Stage 5 PR-B of the data-at-rest encryption rollout (design doc
docs/design/2026_04_29_partial_data_at_rest_encryption.md, §5.2
rotation, §4.1 writer registry, §6.1 admin.go service, §6.6 admin
commands). Wires two of the three §11.3 mutating opcodes through
raftengine.Proposer on top of Stage 4's fsmwire body encoders.

Production-inert: main.go does not yet register the
EncryptionAdminServer on its gRPC listener, and the
WithEncryptionAdminProposer option is unset by default — every
mutating RPC then short-circuits to FailedPrecondition with
"proposer not configured". Stage 6 flips this on under the
--encryption-enabled cluster flag; PR-C lands BootstrapEncryption
+ §5.6 step 1a capability fan-out.

## adapter/encryption_admin.go

- New options: WithEncryptionAdminProposer (raftengine.Proposer)
  and WithEncryptionAdminLeaderView (raftengine.LeaderView). Both
  are nil-safe; a nil leaderView treats the node as always-leader
  (test affordance) while a nil proposer rejects every mutating
  RPC at the FailedPrecondition boundary.
- RotateDEK validates purpose / new_dek_id != 0 / non-empty
  wrapped_new_dek / proposer_local_epoch <= 0xFFFF at the gRPC
  boundary, composes a fsmwire.RotationPayload with
  SubTag=RotateSubRotateDEK, prepends the §11.3 0x05 opcode tag,
  and Propose()s. Returns the engine's CommitIndex as
  applied_index.
- RegisterEncryptionWriter requires exactly one writer per call
  (multi-writer batches go through BootstrapEncryption per §5.6
  step 1a) and applies the same local_epoch <= 0xFFFF bound.
  Encodes a fsmwire.RegistrationPayload under the 0x03 opcode tag.
- ResyncSidecar now requires leadership: §5.5 recovery only
  consults the leader's record, so a follower's sidecar should
  not be used as a peer's recovery source.
- proposeEncryptionEntry is the byte-level glue between the
  fsmwire-encoded body and Proposer.Propose, prepending the
  opcode tag in one place so the wire layout is locked.
- requireLeader embeds the leader's id + address in the
  FailedPrecondition status detail so the operator's CLI can
  re-target without parsing free-form error text.
- uint32ToLocalEpoch is a defence-in-depth narrowing helper for
  the §4.1 16-bit nonce; bound-checked at the handler boundary
  plus masked here.

## cmd/elastickv-admin/encryption_mutators.go (new)

- `encryption rotate-dek --purpose=storage|raft --new-dek-id=
  --wrapped-new-dek=<base64> --proposer-node-id= --proposer-local-epoch=`
- `encryption register-writer --dek-id= --full-node-id= --local-epoch=`
- Shared --endpoint / --timeout flags via the PR-A
  newEncryptionEndpointFlags helper. Help-flag (`-h` / `--help`)
  is handled via flag.ErrHelp per the PR-A convention.
- Wrapped DEK material accepted as base64 because operator-side
  KEK wrapping is done out-of-band (Stage 9 CLI); base64 is the
  lingua franca for pasting opaque bytes through a terminal.
- narrowUint32 is the CLI-side twin of uint32ToLocalEpoch:
  bound-checked + masked.

## cmd/elastickv-admin/encryption.go

- encryptionMain dispatches the two new subcommands and updates
  the usage / unknown-subcommand error text to list them.

## docs/design/2026_04_29_partial_data_at_rest_encryption.md

- Stage 5 milestone row split further: 5A (shipped, PR #754), 5B
  (this PR), 5C (Bootstrap + fan-out + main.go wiring).

## Tests

adapter/encryption_admin_test.go:
- RotateDEK happy path verifies the proposed entry byte layout
  by round-tripping through fsmwire.DecodeRotation.
- RotateDEK follower-rejection embeds the leader hint in the
  error string.
- RotateDEK input-validation table: zero new_dek_id, empty
  wrapped, unspecified purpose, local_epoch > 0xFFFF.
- RegisterEncryptionWriter happy path round-trips through
  fsmwire.DecodeRegistration.
- RegisterEncryptionWriter rejects batch (len(writers) != 1).
- ResyncSidecar follower-rejection.
- MutatingRPCs_RejectWithoutProposer pins the production-inert
  guarantee.

cmd/elastickv-admin/encryption_test.go:
- runEncryptionRotateDEK happy path against a stub gRPC server.
- runEncryptionRotateDEK rejects bad --purpose and epoch overflow
  at the CLI parse boundary (no round-trip needed).
- runEncryptionRegisterWriter happy path against a stub.

## Self-review (CLAUDE.md 5 passes)

1. Data loss: every mutating RPC is leader-gated; followers
   return FailedPrecondition before the proposer is touched. A
   nil proposer rejects with FailedPrecondition. Stage 4's
   HaltApply seam already covers the apply-side failure mode.
2. Concurrency: the server adds no mutable state; the proposer
   is set once at construction. raftengine.Proposer's contract
   serialises proposals through the engine's queue.
3. Performance: hot path is unchanged. Mutating RPCs are
   operator-facing and rare.
4. Consistency: fsmwire encoders are the single source of truth
   for the on-wire byte layout. Round-trip tests
   (Encode→DecodeRotation / Encode→DecodeRegistration) lock the
   server's view against fsmwire's, so a future change to
   either side cannot silently desync.
5. Test coverage: input-validation table, leader-rejection,
   wire-layout round-trip, and CLI happy path are all covered.
   No proposer wired = FailedPrecondition is pinned.

## Verification

- go test -race -timeout=60s ./adapter ./cmd/elastickv-admin
  (Stage 5 scope): PASS
- golangci-lint run on touched paths: 0 issues
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 15, 2026

Warning

Rate limit exceeded

@bootjp has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 48 minutes and 23 seconds before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 106a6db9-c4ed-4f70-bfb4-b90742631342

📥 Commits

Reviewing files that changed from the base of the PR and between ee42528 and 109ee7d.

📒 Files selected for processing (6)
  • adapter/encryption_admin.go
  • adapter/encryption_admin_test.go
  • cmd/elastickv-admin/encryption.go
  • cmd/elastickv-admin/encryption_mutators.go
  • cmd/elastickv-admin/encryption_test.go
  • docs/design/2026_04_29_partial_data_at_rest_encryption.md
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/encryption-admin-mutators

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 748f537db3

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread adapter/encryption_admin.go Outdated
// surface here. We surface the engine's error rather
// than rewriting it so the operator can grep against
// the engine's own logs.
return 0, pkgerrors.Wrapf(err, "encryption: propose 0x%02x", opcode)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Return gRPC status for proposer-side proposal errors

When Propose fails (e.g. leadership changes between requireLeader() and Propose), this handler returns a plain wrapped error instead of a status error, so gRPC reports codes.Unknown to clients. That drops structured retry semantics for known Raft conditions (ErrNotLeader, ErrLeadershipLost, transfer-in-progress) and makes mutating admin RPC behavior inconsistent with the rest of this service’s FailedPrecondition-based leader routing.

Useful? React with 👍 / 👎.

Comment on lines +394 to +397
payload := fsmwire.RegistrationPayload{
DEKID: req.GetDekId(),
FullNodeID: w.GetFullNodeId(),
LocalEpoch: uint32ToLocalEpoch(w.GetLocalEpoch()),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Reject zero full_node_id in writer registration payloads

This path accepts writers[0].full_node_id == 0 and forwards it directly into the Raft registration payload. In this codebase full_node_id=0 is used as a sentinel for “not capable/unset”, so persisting it into writer-registry state can create an invalid writer identity and break nonce-uniqueness assumptions if multiple registrations collapse onto node id 0. The server should reject zero full_node_id (and similarly for rotate proposer id) at the RPC boundary.

Useful? React with 👍 / 👎.

…error mapping)

Addresses both codex P1 findings on PR #756. Per CLAUDE.md, the
defect fixes are paired with failing regression tests added in the
same commit.

## codex P1 #1 — reject zero full_node_id at the RPC boundary

`adapter/encryption_admin.go` accepted `proposer_node_id == 0`
(RotateDEK) and `writers[0].full_node_id == 0`
(RegisterEncryptionWriter), forwarding the sentinel directly into
the §11.3 0x03 / 0x05 Raft entry body. §6.1 reserves
`full_node_id = 0` as the "node not encryption-capable" marker;
persisting it into a writer-registry row would collide with every
other un-bootstrapped node's sentinel and break the §4.1
nonce-uniqueness invariant.

Fix: reject with `codes.InvalidArgument` at the gRPC boundary in
both RPCs.

Regression tests:
- TestEncryptionAdmin_RotateDEK_RejectsZeroProposerNodeID
- TestEncryptionAdmin_RegisterEncryptionWriter_RejectsZeroFullNodeID

Caller audit: ProposerNodeId / FullNodeId are read by tests (all
use non-zero fixtures), the CLI flag parsers (operator-controlled,
pass straight through), and the new validators here. No production
caller depends on the old "accept zero" behaviour.

## codex P1 #2 — map Propose errors to gRPC status codes

`proposeEncryptionEntry` returned `pkgerrors.Wrap` on Propose
failure, which gRPC surfaces as `codes.Unknown`. Clients then lose
structured retry semantics for the known engine errors
(`ErrNotLeader`, `ErrLeadershipLost`,
`ErrLeadershipTransferInProgress`).

Fix: new `proposeErrorToStatus` helper maps:
- leadership errors → FailedPrecondition (same shape as the
  up-front requireLeader rejection so clients can re-target the
  current leader uniformly)
- context.Canceled → Canceled
- context.DeadlineExceeded → DeadlineExceeded
- everything else → Unavailable (transient retryable)

Regression tests:
- TestEncryptionAdmin_RotateDEK_MapsProposeLeaderErrorToFailedPrecondition
  (table-driven across all three leadership sentinels)
- TestEncryptionAdmin_RotateDEK_MapsProposeOtherErrorToUnavailable

Caller audit: proposeEncryptionEntry is called only from RotateDEK
and RegisterEncryptionWriter, both of which `return err` directly
without rewrapping, so the gRPC status reaches the client intact.
CLI code wraps via `errors.Wrap(err, "RotateDEK")` but
`status.Code(err)` walks the Unwrap chain and still recovers the
status, so the CLI-side error rendering is unaffected.

## Verification

- go test -race -timeout=60s ./adapter ./cmd/elastickv-admin
  -run 'TestEncryptionAdmin|TestEncryptionMain|TestRunEncryption':
  PASS
- golangci-lint run on touched paths: 0 issues
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented May 15, 2026

@claude review

@claude
Copy link
Copy Markdown

claude Bot commented May 15, 2026

Claude finished @bootjp's task in 2m 47s —— View job


PR Review — feat(encryption): Stage 5 PR-B

Todo list:

  • Read changed files and gather context
  • Review adapter/encryption_admin.go — server-side implementation
  • Review adapter/encryption_admin_test.go — test coverage
  • Review cmd/elastickv-admin/encryption_mutators.go — CLI commands
  • Review cmd/elastickv-admin/encryption_test.go — CLI tests
  • Review cmd/elastickv-admin/encryption.go — CLI entrypoint changes
  • Review design doc update
  • Assess Codex P1 findings and additional issues
  • Post final review

Summary

This is a well-structured PR. The architecture is sound, the Codex P1 issues cited in the automated review have both been addressed in the follow-up commit 54f0dc6. The leader-only gating, fsmwire round-trip tests, and proposeErrorToStatus coverage are all solid. A few issues below, roughly P2–P3 in severity.


Codex P1 status — both resolved ✅

P1-A: gRPC status for proposer-side errors — resolved by proposeErrorToStatus in adapter/encryption_admin.go:430–470. It correctly maps ErrNotLeader / ErrLeadershipLost / ErrLeadershipTransferInProgressFailedPrecondition, context errors → Canceled/DeadlineExceeded, and everything else → Unavailable. Tests in TestEncryptionAdmin_RotateDEK_MapsProposeLeaderError* pin this.

P1-B: zero full_node_id rejection — resolved in both paths:

  • RotateDEK at adapter/encryption_admin.go:343–350: rejects ProposerNodeId == 0 with InvalidArgument.
  • RegisterEncryptionWriter at adapter/encryption_admin.go:401–405: rejects writers[0].FullNodeId == 0 with InvalidArgument.
    Both have dedicated regression tests (TestEncryptionAdmin_RotateDEK_RejectsZeroProposerNodeID, TestEncryptionAdmin_RegisterEncryptionWriter_RejectsZeroFullNodeID).

Issues found

P2 — requireLeader nil bypass is not enforced at construction time

adapter/encryption_admin.go:477–491: when leaderView == nil, requireLeader() returns nil unconditionally. The comment says "production wiring in main.go MUST register one." But there is no enforcement: a Stage 6 wiring that sets WithEncryptionAdminProposer but forgets WithEncryptionAdminLeaderView would silently allow followers to mutate state. The pattern is intentional for unit tests, but Stage 6 needs a startup-time assertion.

Suggestion for Stage 6: add a validation step on NewEncryptionAdminServer (or a separate Validate() method called at startup) that returns an error if proposer != nil && leaderView == nil. That makes the "MUST" machine-enforced rather than convention-enforced.

Fix this →

P2 — RegisterEncryptionWriter error message is misleading for zero writers

adapter/encryption_admin.go:390–394:

if len(writers) != 1 {
    return nil, grpcStatusErrorf(codes.InvalidArgument,
        "encryption: RegisterEncryptionWriter requires exactly one writer in PR-B, got %d (use BootstrapEncryption for multi-writer batches)",
        len(writers))
}

When len(writers) == 0, the message tells the operator to "use BootstrapEncryption for multi-writer batches", which is wrong advice for the empty case. An empty batch is a caller bug, not a multi-writer scenario. Consider distinguishing the two cases:

switch len(writers) {
case 0:
    return nil, grpcStatusError(codes.InvalidArgument,
        "encryption: RegisterEncryptionWriter requires exactly one writer, got 0")
case 1:
    // ok
default:
    return nil, grpcStatusErrorf(codes.InvalidArgument,
        "encryption: RegisterEncryptionWriter requires exactly one writer in PR-B, got %d (use BootstrapEncryption for multi-writer batches)",
        len(writers))
}

Fix this →

P3 — CLI missing zero-value guard for --full-node-id in register-writer

cmd/elastickv-admin/encryption_mutators.go:145–162: runEncryptionRegisterWriter validates dekID == 0 and localEpoch > MaxUint16 at the CLI level before dialing the server, but does not reject fullNodeID == 0. The server enforces it (P1-B above), so correctness is unaffected, but the operator would get a round-trip before seeing the error. Compare to runEncryptionRotateDEK which validates proposerNodeID == 0 on the CLI side (via the gRPC call path; actually it doesn't have a CLI-level check either — this is consistent). Mark as informational rather than a defect.

P3 — RegisterEncryptionWriter lacks an equivalent to RotateDEK_RejectsBadInputs table test

adapter/encryption_admin_test.go: RotateDEK has a 4-case table test covering zero new_dek_id, empty wrapped, unspecified purpose, and epoch > 0xFFFF. RegisterEncryptionWriter has individual tests for zero full_node_id and multi-writer batch rejection, but no table test for dek_id == 0 or local_epoch > 0xFFFF. These paths are code-covered by the implementation matching RotateDEK, but a table test would lock them explicitly and makes the intent clearer for future readers.


Observations (non-blocking)

  • requireUint16Plus1 naming (cmd/elastickv-admin/encryption_mutators.go:192): the "Plus1" suffix is slightly confusing — the function checks v > MaxUint16, it doesn't add 1. requireFitsUint16 or requireUint16Bound would read more naturally.

  • localEpochMask const placement (adapter/encryption_admin.go:527–528): the const is declared immediately after the multi-line comment for uint32ToLocalEpoch. It's easy to miss the function boundary. This is very minor — just an unusual style for Go where consts usually precede the function they serve.

  • Design doc split (docs/design/2026_04_29_partial_data_at_rest_encryption.md): the 5B→5C split is clean and the table rows accurately reflect what landed vs. what's deferred.

  • Wire-layout round-trip tests (TestEncryptionAdmin_RotateDEK_HappyPath, TestEncryptionAdmin_RegisterEncryptionWriter_HappyPath): these are the right pattern for pinning the server–fsmwire contract. Solid.

  • proposeErrorToStatus opcode parameter: including the opcode in the error message (propose 0x%02x rejected...) is a nice touch for operator debugging across multiple concurrent rotations.


Verdict

Approve with minor notes. The two Codex P1s are fully addressed, input validation is thorough, and the wire-format round-trip tests lock the critical invariant. The P2 issues (nil leaderView bypass, empty-writers message) are worth addressing before Stage 6 flips the cluster flag; the P3 items can go in as follow-ups.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements the Stage 5B foundation for encryption administration, specifically wiring the RotateDEK and RegisterEncryptionWriter RPCs as leader-only proposers. It introduces the necessary raftengine.Proposer and LeaderView dependencies to the EncryptionAdminServer, adds CLI subcommands for these operations, and includes comprehensive unit tests. Feedback was provided regarding error handling in proposeEncryptionEntry, where engine errors like context cancellation or leadership changes should be mapped to specific gRPC status codes to improve client-side error handling.

Comment on lines +418 to +424

// proposeEncryptionEntry prepends the §11.3 opcode tag to a
// fsmwire-encoded body and submits the resulting Raft entry. The
// Stage 4 FSM apply path peels the tag, dispatches into
// applyEncryption, and Halt-Applies on any decode failure — this
// helper is just the byte-level glue between the server-side
// encoder and raftengine.Proposer.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The current error handling in proposeEncryptionEntry wraps all engine errors using pkgerrors.Wrapf, which results in a codes.Unknown status for the gRPC client.

Specifically, context.Canceled and context.DeadlineExceeded should be mapped to their respective gRPC status codes (codes.Canceled and codes.DeadlineExceeded). Additionally, if the engine returns raftengine.ErrNotLeader, we should attempt to provide the leader hint by calling requireLeader() before returning the error.

res, err := s.proposer.Propose(ctx, entry)
	if err != nil {
		if errors.Is(err, context.Canceled) {
			return 0, grpcStatusError(codes.Canceled, "encryption: propose canceled")
		}
		if errors.Is(err, context.DeadlineExceeded) {
			return 0, grpcStatusError(codes.DeadlineExceeded, "encryption: propose timeout")
		}
		if errors.Is(err, raftengine.ErrNotLeader) {
			if lErr := s.requireLeader(); lErr != nil {
				return 0, lErr
			}
		}
		// Raft-engine errors are operator-visible diagnostics:
		// not-leader / context-canceled / propose-queue full all
		// surface here. We surface the engine's error rather
		// than rewriting it so the operator can grep against
		// the engine's own logs.
		return 0, pkgerrors.Wrapf(err, "encryption: propose 0x%02x", opcode)
	}

…e + table test)

claude[bot] round-1 said "Approve with minor notes" after confirming
both codex P1 fixes resolved. This commit picks up the two P2/P3
items that are cheap to land before Stage 6 flips the cluster
flag; the third (nil-leaderView startup enforcement) is correctly
Stage 6 territory and is intentionally deferred.

## adapter/encryption_admin.go

- RegisterEncryptionWriter switches the writers-length check from
  a single != 1 branch to a three-arm switch so the zero-length
  case gets a "got 0" message instead of misrouting the operator
  to BootstrapEncryption. The >1 case keeps its existing
  multi-writer hint.

## adapter/encryption_admin_test.go

- New TestEncryptionAdmin_RegisterEncryptionWriter_RejectsBadInputs
  is the table-driven twin of the existing RotateDEK table test.
  Covers zero dek_id, local_epoch > 0xFFFF, zero full_node_id,
  empty writers — all returning InvalidArgument.
- New TestEncryptionAdmin_RegisterEncryptionWriter_EmptyWritersMessage
  pins the message-routing fix: an empty writers list must
  surface "got 0" and must NOT mention BootstrapEncryption.

## Deferred (Stage 6)

claude[bot]'s P2 #1 — `requireLeader` nil bypass — is a
production-wiring concern. The current pattern (nil = always
leader) is intentional for unit tests, and claude[bot]
explicitly defers the startup-time assertion to Stage 6. The
production wiring in main.go will gate the gRPC-server
registration behind `--encryption-enabled`, at which point a
nil leaderView is a config bug to fail closed on.

## Verification

- go test -race -timeout=60s on ./adapter and ./cmd/elastickv-admin
  (Stage 5 scope): PASS
- golangci-lint run on touched paths: 0 issues
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented May 15, 2026

@claude review

@claude
Copy link
Copy Markdown

claude Bot commented May 15, 2026

Claude finished @bootjp's task in 2m 7s —— View job


PR Review — feat(encryption): Stage 5 PR-B

Todo list:

  • Read changed files and gather context
  • Review adapter/encryption_admin.go — server-side implementation
  • Review adapter/encryption_admin_test.go — test coverage
  • Review cmd/elastickv-admin/encryption_mutators.go — CLI commands
  • Review cmd/elastickv-admin/encryption_test.go — CLI tests
  • Review cmd/elastickv-admin/encryption.go — CLI entrypoint changes
  • Review design doc update
  • Assess previous round-1 and round-2 fixes
  • Post final review

Previous review findings — status

Codex P1-A (propose error mapping) — ✅ fully resolved. proposeErrorToStatus at adapter/encryption_admin.go:458–475 correctly maps ErrNotLeader/ErrLeadershipLost/ErrLeadershipTransferInProgressFailedPrecondition, context errors → Canceled/DeadlineExceeded, everything else → Unavailable. Tests TestEncryptionAdmin_RotateDEK_MapsProposeLeaderError* and TestEncryptionAdmin_RotateDEK_MapsProposeOtherErrorToUnavailable lock this in.

Codex P1-B (zero full_node_id) — ✅ fully resolved. RotateDEK rejects at encryption_admin.go:343–350; RegisterEncryptionWriter rejects at encryption_admin.go:407–411. Regression tests RejectsZeroProposerNodeID and RejectsZeroFullNodeID are present.

Claude round-1 P2 (empty-writers message) — ✅ resolved in 72c174a. The switch len(writers) at encryption_admin.go:390–400 now correctly separates the 0-writers case (no BootstrapEncryption hint) from the multi-writer case. TestEncryptionAdmin_RegisterEncryptionWriter_EmptyWritersMessage pins the message content.

Claude round-1 P3 (RegisterEncryptionWriter table test) — ✅ resolved. TestEncryptionAdmin_RegisterEncryptionWriter_RejectsBadInputs is now a 4-case table covering zero dek_id, local_epoch > 0xFFFF, zero full_node_id, and empty writers.


New findings

P2 — Stale docstring on ResyncSidecar (adapter/encryption_admin.go:218–221)

The current docstring says:

"PR-A serves this from the local sidecar without leadership verification… PR-B will add a leader-only guard so a follower with a stale sidecar cannot poison the recovery path of a peer with an even-staler sidecar."

But PR-B has already landed that guard (requireLeader() at line 230). The docstring reads as a future intent but the implementation is already present. This will confuse future readers who'll wonder whether the guard is live or pending.

Fix this →

P2 — Stale code comment in encryption.go dispatcher (cmd/elastickv-admin/encryption.go:28–30)

The comment says:

"PR-A wires only status. PR-B adds bootstrap / rotate-dek / register-writer / resync-sidecar; Stage 6 adds enable-storage-envelope and enable-raft-envelope."

But bootstrap is deferred to PR-C per the PR description, and resync-sidecar has no CLI command in PR-B (it's server-side only). The switch correctly wires only status, rotate-dek, and register-writer — the comment is wrong.

Fix this →

P2 (carry-forward from round-1) — requireLeader nil-bypass must be enforced at Stage 6 wiring

adapter/encryption_admin.go:483–484: when leaderView == nil, requireLeader() returns nil. The comment correctly documents this as a test escape hatch with "the production wiring in main.go MUST register one." However, there is no machine-enforced check: Stage 6 code that wires WithEncryptionAdminProposer but forgets WithEncryptionAdminLeaderView would silently let followers mutate state. The MUST should become a runtime assertion at NewEncryptionAdminServer when proposer != nil && leaderView == nil.

This is intentionally deferred here (the server is production-inert until Stage 6), but it should be recorded as a Stage 6 TODO.


Observations (non-blocking)

requireUint16Plus1 naming (cmd/elastickv-admin/encryption_mutators.go:193): the "Plus1" suffix is opaque — the function enforces v <= MaxUint16, it doesn't add 1 to anything. requireFitsUint16 or requireUint16Bound would match the intent.

localEpochMask const placement (adapter/encryption_admin.go:527–528): the const is declared inline between the function's multi-line doc comment and its body, which is unusual Go style and easy to miss. Standard placement would be a const block before the doc comment or at package level alongside other masks.

No proposeErrorToStatus coverage for RegisterEncryptionWriter path: both error-mapping tests (MapsProposeLeaderError*, MapsProposeOtherErrorToUnavailable) go through RotateDEK. Since both handlers call the same proposeErrorToStatus, code coverage is complete, but a future reader may not notice the shared helper is already exercised.


Verdict

Approve with two P2 comment fixes before merge. The architecture is sound, all round-1 and round-2 findings are addressed, leader-only gating is correctly enforced, and the wire-format round-trip tests lock the critical server ↔ fsmwire contract. The two P2 items (stale docstrings) should be a quick fixup commit. The requireLeader nil-bypass is a Stage 6 concern; leave a TODO comment in the constructor if you want to track it.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 72c174a1c4

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread adapter/encryption_admin.go Outdated
Comment on lines +487 to +488
if s.leaderView.State() == raftengine.StateLeader {
return nil
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Verify quorum before accepting leader-only encryption RPCs

requireLeader returns success solely from State()==StateLeader, but that state can remain true on a partitioned former leader until it observes step-down; during that window ResyncSidecar will serve sidecar data without any Raft proposal or linearizable-read barrier. A follower resyncing against that stale node can overwrite its local sidecar with an outdated DEK set and miss recent rotations. Please add a real leadership confirmation (LeaderView.VerifyLeader/linearizable check) on the request path before treating the node as authoritative.

Useful? React with 👍 / 👎.

… claude[bot] P2 docstrings

## codex P1 — VerifyLeader quorum confirmation in requireLeader

Codex round-2 finding: requireLeader returned success purely from
State() == StateLeader. A partitioned former leader's local
State() can still report StateLeader until the engine observes
step-down. During that window:

- RotateDEK / RegisterEncryptionWriter would Propose() against a
  Raft engine that no longer has a quorum; the proposal queues
  and either fails late or applies under a new leader's term.
- ResyncSidecar would ship the local sidecar's wrapped-DEK set
  to a recovering follower. The set may be missing recent
  rotations committed under a newer leader, silently overwriting
  the follower's freshly-recovered keys with a stale snapshot.

Fix: requireLeader now takes ctx and calls LeaderView.VerifyLeader
after the fast-path State() check. VerifyLeader does a ReadIndex
round-trip; a partitioned leader cannot complete it. The status
detail surfaces the engine error so the operator can see the
quorum-loss reason in the CLI output.

This is a fail-closed semantic change for requireLeader. Caller
audit: 3 sites (RotateDEK, RegisterEncryptionWriter,
ResyncSidecar). All gRPC handlers, all have a ctx in scope. All
three updated to pass ctx. No external callers.

Regression tests:
- TestEncryptionAdmin_RotateDEK_RejectsStaleLeader — stubLeaderView
  with State=Leader but VerifyLeader=error; expects
  FailedPrecondition with "VerifyLeader" in the message.
- TestEncryptionAdmin_ResyncSidecar_RejectsStaleLeader — the
  read-only twin; ResyncSidecar has no Propose() to catch
  leadership loss, so VerifyLeader is the only defence.

stubLeaderView gains a verifyErr field to simulate the
partitioned-leader case; existing happy-path tests use the zero
value (verifyErr=nil) and continue to pass.

## claude[bot] P2 — stale docstrings

claude[bot] round-2 flagged two outdated comments:

- adapter/encryption_admin.go ResyncSidecar docstring said "PR-A
  serves this from the local sidecar without leadership
  verification... PR-B will add a leader-only guard". PR-B has
  already added it; the docstring now documents the actual
  behaviour (VerifyLeader-based quorum confirmation + the §5.5
  rationale for the strict guard).
- cmd/elastickv-admin/encryption.go encryptionMain comment listed
  "PR-B adds bootstrap / rotate-dek / register-writer /
  resync-sidecar". bootstrap is deferred to PR-C and ResyncSidecar
  has no CLI surface. Comment now reflects the actual subcommand
  matrix.

## Deferred

claude[bot]'s carry-forward P2 (nil leaderView startup
enforcement) remains a Stage 6 concern; the production wiring
will gate the server registration on --encryption-enabled and a
nil leaderView at that point is a config bug to fail closed on.

## Verification

- go test -race -timeout=60s on ./adapter and ./cmd/elastickv-admin:
  PASS (includes 2 new stale-leader regression tests)
- golangci-lint run on touched paths: 0 issues
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented May 15, 2026

@claude review

@claude
Copy link
Copy Markdown

claude Bot commented May 15, 2026

Claude finished @bootjp's task in 2m 47s —— View job


PR Review — feat(encryption): Stage 5 PR-B (Round 3)

Todo list:

  • Read changed files and gather context
  • Check round-2 P2 docstring fixes (ResyncSidecar + encryption.go dispatcher)
  • Assess round-3 codex P1 fix — VerifyLeader quorum guard
  • Assess carry-forward items (nil-bypass, naming, const placement)
  • Identify any new issues in round-3 commit
  • Post final review

Previous round findings — status

Claude round-2 P2 (stale ResyncSidecar docstring) — ✅ resolved. adapter/encryption_admin.go:211–221 now documents the VerifyLeader-based quorum confirmation with the §5.5 rationale for why the strict guard is needed. Clean.

Claude round-2 P2 (stale dispatcher comment) — ✅ resolved. cmd/elastickv-admin/encryption.go:26–30 now accurately says "PR-B added rotate-dek and register-writer. PR-C adds bootstrap". ResyncSidecar correctly noted as server-side only with no CLI surface.

Codex round-2 P1 (VerifyLeader quorum guard) — ✅ fully resolved.

requireLeader at adapter/encryption_admin.go:498–516 now runs two checks:

  1. Fast path: State() != StateLeader → reject followers/candidates immediately without a Raft round-trip.
  2. Quorum confirmation: VerifyLeader(ctx) → ReadIndex round-trip that a partitioned former leader cannot complete. Surfaced as FailedPrecondition with the engine error in the message.

Both mutating RPCs (RotateDEK, RegisterEncryptionWriter) and the read-only ResyncSidecar pass through the new guard. The status detail includes the engine error so an operator knows whether they hit a follower rejection or a quorum-loss timeout.

Regression tests:

  • TestEncryptionAdmin_RotateDEK_RejectsStaleLeaderstubLeaderView with State=StateLeader, verifyErr=<error>FailedPrecondition with "VerifyLeader" in the message.
  • TestEncryptionAdmin_ResyncSidecar_RejectsStaleLeader — same pattern for the read-only path, where Propose() is absent and VerifyLeader is the only defence.

stubLeaderView gains a verifyErr field; existing happy-path tests use the zero value (verifyErr=nil) and are unaffected.

Claude round-2 P2 carry-forward (nil leaderView startup enforcement) — explicitly deferred to Stage 6 in the commit message. The docstring at adapter/encryption_admin.go:481–484 now reads "enforced as a Stage 6 startup-time assertion". Acceptable.


New findings

P3 — WithEncryptionAdminKeystore docstring still forward-tense (adapter/encryption_admin.go:55)

The docstring says:

"PR-B will use it to fast-path the RotateDEK pre-check."

PR-B landed without implementing the keystore fast-path — the keystore field is stored but never read anywhere in the current implementation. The comment is now a stale intent reference. Non-blocking; can be updated in a follow-up cleanup commit or at Stage 7 when the keystore is actually wired.

P3 — ResyncSidecar happy-path test skips the leaderView code path (adapter/encryption_admin_test.go:207–240)

TestEncryptionAdmin_ResyncSidecar_ShipsWrappedDEKs constructs the server without WithEncryptionAdminLeaderView, so requireLeader returns nil immediately via the nil escape hatch and VerifyLeader is never called on the happy path. The stale-leader regression test covers VerifyLeader for ResyncSidecar specifically, so correctness is not at risk, but the happy-path test does not exercise the State()==StateLeader && VerifyLeader()==nil success path end-to-end. Adding WithEncryptionAdminLeaderView(stubLeaderView{state: raftengine.StateLeader}) to this test fixture would give complete coverage without changing the test's intent.

P3 — newEncryptionEndpointFlags docstring still says "deferred to PR-B" (cmd/elastickv-admin/encryption.go:68)

The comment says TLS / token auth for the CLI surface "is deferred to PR-B for the CLI surface so the initial PR stays scoped to read-only status." PR-B has now landed and TLS is still not implemented (it's a Stage 6 concern). Minor stale reference. Non-blocking.


Observations (unchanged from round-2, non-blocking)

  • requireUint16Plus1 naming (cmd/elastickv-admin/encryption_mutators.go:193): the "Plus1" suffix is still opaque — the function checks v > MaxUint16, it does not add 1. requireFitsUint16 or requireUint16Bound would match the intent. Still P3.

  • localEpochMask const placement (adapter/encryption_admin.go:547): const is declared between the preceding function's doc comment and its body, an unusual Go pattern. Very minor.


Verdict

Approve. All round-2 and codex round-2 issues are fully addressed:

  • The VerifyLeader quorum guard closes the partitioned-leader window for both mutating RPCs and ResyncSidecar.
  • Both stale docstrings are corrected.
  • The nil-bypass carry-forward is properly tracked for Stage 6.

The three P3 items above (stale keystore docstring, happy-path test gap, stale TLS comment) can go in as follow-ups. The wire-format round-trip tests, the stale-leader regression tests, and proposeErrorToStatus coverage collectively make this a solid foundation for Stage 6.

claude[bot] round-3 said "Approve." with three P3 follow-ups —
all stale-comment / coverage-gap items. None block correctness;
landing them now keeps the PR fully clean before merge.

## adapter/encryption_admin.go

- `WithEncryptionAdminKeystore` docstring updated. The previous
  "PR-B will use it to fast-path the RotateDEK pre-check" was
  never realised; the keystore is currently set-but-unread.
  Comment now points to Stage 7 (writer registry) as the actual
  consumer.

## cmd/elastickv-admin/encryption.go

- `newEncryptionEndpointFlags` docstring updated. The previous
  "deferred to PR-B for the CLI surface" was misleading after
  PR-B shipped without TLS — TLS is a Stage 6 concern. Comment
  now reads "shares the existing flags on the HTTP surface and
  is wired in Stage 6 alongside the --encryption-enabled gate".

## adapter/encryption_admin_test.go

- `TestEncryptionAdmin_ResyncSidecar_ShipsWrappedDEKs` now wires a
  `stubLeaderView{state: StateLeader}` so the happy path
  exercises `State()==StateLeader && VerifyLeader()==nil`
  end-to-end. Previously it short-circuited through the
  nil-leaderView test escape hatch and never exercised the
  VerifyLeader call site on the success path.

## Skipped (claude[bot] style observations)

- `requireUint16Plus1` rename — non-blocking and cosmetic.
- `localEpochMask` const placement — non-blocking and cosmetic.

## Verification

- go test -race -timeout=60s ./adapter ./cmd/elastickv-admin: PASS
- golangci-lint run on touched paths: 0 issues
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented May 15, 2026

@claude review

@claude
Copy link
Copy Markdown

claude Bot commented May 15, 2026

Claude finished @bootjp's task in 1m 48s —— View job


PR Review — feat(encryption): Stage 5 PR-B (Round 4)

Todo list:

  • Read changed files and gather context
  • Verify round-3 P3 items addressed (keystore docstring, happy-path test gap, TLS comment, naming, const placement)
  • Check round-4 commit for new issues
  • Post final review

Round-3 P3 items — all addressed ✅

Stale keystore docstring (adapter/encryption_admin.go:52–57) — ✅ resolved. The "PR-B will use it to fast-path the RotateDEK pre-check" text is replaced with an accurate description: "Stage 7 (writer registry) will use it for the in-memory counter fast-path." No more forward-tense fiction.

ResyncSidecar happy-path test gap (adapter/encryption_admin_test.go:207–247) — ✅ resolved. TestEncryptionAdmin_ResyncSidecar_ShipsWrappedDEKs now wires stubLeaderView{state: raftengine.StateLeader}, exercising State()==StateLeader && VerifyLeader()==nil end-to-end instead of short-circuiting through the nil-leaderView test escape hatch.

Stale TLS comment in newEncryptionEndpointFlags (cmd/elastickv-admin/encryption.go:63–69) — ✅ resolved. "deferred to PR-B for the CLI surface" is replaced with "wired into the encryption CLI in Stage 6 alongside the --encryption-enabled cluster-flag gate." Accurate.

requireUint16Plus1 rename and localEpochMask placement — explicitly skipped in the commit message with rationale (non-blocking, cosmetic). The commit message clearly documents this decision. Acceptable.


New finding

P3 — dialEncryption context comment still says "PR-B auth path" (cmd/elastickv-admin/encryption.go:78–79)

// dialEncryption opens a gRPC client. The context argument is
// reserved for the PR-B auth path (TLS handshake + token attach);

This is the same pattern as the newEncryptionEndpointFlags comment fixed in round-3. PR-B shipped without TLS; the reference should say "Stage 6 auth path" or equivalent. Non-blocking but inconsistent with the round-3 fix one function above.

Fix this →


Observations (carry-forward, non-blocking)

  • requireUint16Plus1 naming (cmd/elastickv-admin/encryption_mutators.go:193): function checks v > MaxUint16, does not add 1. Cosmetic; explicitly deferred in the round-4 commit.
  • localEpochMask const placement (adapter/encryption_admin.go:547–549): const declared between a preceding function's doc comment and its body. Unusual Go style; explicitly deferred in the round-4 commit.
  • TestEncryptionAdmin_GetCapability_Bootstrapped comment (adapter/encryption_admin_test.go:116): says "PR-A always reports local_epoch=0" — a stale PR-stage label in a test comment, very minor.

Verdict

Approve. All round-3 findings are cleanly addressed. The two explicitly-skipped cosmetic items are a deliberate decision correctly documented in the commit message. The one new P3 item (stale dialEncryption comment) can go in as a follow-up — it is parallel to the already-fixed newEncryptionEndpointFlags comment and does not affect correctness.

The PR is in excellent shape: P1s resolved (proposer error mapping, zero node_id), P2s resolved (empty-writers message, VerifyLeader quorum guard, stale docstrings), wire-format round-trip tests lock the critical server ↔ fsmwire contract, and the ResyncSidecar happy-path now exercises the full leaderView code path.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 83e4c49726

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +511 to +513
if err := s.leaderView.VerifyLeader(ctx); err != nil {
return grpcStatusErrorf(codes.FailedPrecondition,
"encryption: VerifyLeader failed, refusing to act on stale-leader state: %v", err)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve context error codes from VerifyLeader

requireLeader currently converts every VerifyLeader failure into codes.FailedPrecondition, including context.Canceled and context.DeadlineExceeded. In slow or partitioned clusters, a client timeout during the ReadIndex check will therefore be reported as “not leader” instead of cancellation/timeout, which breaks retry semantics and makes callers treat transport timing issues as leadership errors.

Useful? React with 👍 / 👎.

…+ claude[bot] P3 stale-comment cleanup

## codex P1 — preserve context codes from VerifyLeader

Codex round-4 finding: requireLeader mapped every VerifyLeader
failure to codes.FailedPrecondition, including context.Canceled
and context.DeadlineExceeded. A client whose ctx cancels or times
out during the Raft ReadIndex round-trip would see "not leader",
which collides with the genuine leadership-rejection branch and
breaks transport-timing retry semantics in slow or partitioned
clusters.

Fix: new verifyLeaderErrorToStatus helper mirrors the existing
proposeErrorToStatus pattern. context.Canceled →
codes.Canceled, context.DeadlineExceeded → codes.DeadlineExceeded,
everything else → codes.FailedPrecondition (the
stale-leader / partition case).

Caller audit: verifyLeaderErrorToStatus is called only from
requireLeader. requireLeader has 3 callers (RotateDEK,
RegisterEncryptionWriter, ResyncSidecar), all gRPC handlers, all
propagate the status via `return err` without rewrap. CLI uses
errors.Wrap; status.Code walks the Unwrap chain. No silent
behaviour change downstream.

Regression test:
- TestEncryptionAdmin_RotateDEK_VerifyLeader_PreservesContextCodes
  is a two-case table (context.Canceled, context.DeadlineExceeded)
  that pins the new mapping.

## claude[bot] P3 — two more stale comments

- dialEncryption (`cmd/elastickv-admin/encryption.go:78`): "PR-B
  auth path" updated to "Stage 6 auth path" so the comment
  matches the just-fixed newEncryptionEndpointFlags comment.
- TestEncryptionAdmin_GetCapability_Bootstrapped
  (`adapter/encryption_admin_test.go:116`): the "PR-A always
  reports local_epoch=0" label is generalised to "Stage 5
  reports... pre-Stage-7" since both PR-A and PR-B observe the
  invariant.

## Skipped (claude[bot] non-blocking observations)

- requireUint16Plus1 rename — cosmetic only.
- localEpochMask const placement — cosmetic only.

Both items are explicitly carried forward and the commit message
records the deferral rationale.

## Verification

- go test -race -timeout=60s ./adapter ./cmd/elastickv-admin: PASS
  (includes 1 new VerifyLeader context-code regression test)
- golangci-lint run on touched paths: 0 issues
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented May 15, 2026

@claude review

@claude
Copy link
Copy Markdown

claude Bot commented May 15, 2026

Claude finished @bootjp's task in 2m 24s —— View job


PR Review — feat(encryption): Stage 5 PR-B (Round 5)

Todo list:

  • Read changed files and gather context
  • Verify round-4 P3 fix — dialEncryption stale comment
  • Verify codex round-4 P1 fix — VerifyLeader context error codes
  • Check for new issues in round-5 commit
  • Post final review

Round-4 findings — all resolved ✅

Codex round-4 P1 (VerifyLeader context code preservation) — ✅ fully resolved.

verifyLeaderErrorToStatus at adapter/encryption_admin.go:526–538 correctly implements the three-way dispatch:

case errors.Is(err, context.Canceled):
    return grpcStatusErrorf(codes.Canceled, ...)
case errors.Is(err, context.DeadlineExceeded):
    return grpcStatusErrorf(codes.DeadlineExceeded, ...)
default:
    return grpcStatusErrorf(codes.FailedPrecondition, ...)

Context timeouts during the ReadIndex round-trip now surface as codes.DeadlineExceeded / codes.Canceled, not FailedPrecondition. requireLeader at line 513 calls verifyLeaderErrorToStatus(err) — the helper is used uniformly across RotateDEK, RegisterEncryptionWriter, and ResyncSidecar.

TestEncryptionAdmin_RotateDEK_VerifyLeader_PreservesContextCodes (lines 536–561) is a two-case table that pins both the Canceled and DeadlineExceeded paths explicitly. ✅

Claude round-4 P3 (stale dialEncryption comment) — ✅ resolved. cmd/elastickv-admin/encryption.go:77–81 now reads "reserved for the Stage 6 auth path", consistent with the newEncryptionEndpointFlags fix from round-3. ✅


New findings

P3 — No VerifyLeader context-code twin test for ResyncSidecar

TestEncryptionAdmin_RotateDEK_VerifyLeader_PreservesContextCodes covers Canceled / DeadlineExceeded preservation only for RotateDEK. Both ResyncSidecar and RegisterEncryptionWriter go through the same requireLeader()verifyLeaderErrorToStatus() call chain, so correctness is guaranteed by the shared helper. But a reader tracing ResyncSidecar_RejectsStaleLeader (which uses errors.New("synthetic quorum loss") → hits the default FailedPrecondition arm) may not notice the context-code contract is also locked in for that path.

Adding stubLeaderView{state: StateLeader, verifyErr: context.DeadlineExceeded} to a ResyncSidecar variant would make the pin explicit without duplicating logic. Non-blocking; the current coverage is correct.

P3 — PR-stage archaeological references in code docstrings

verifyLeaderErrorToStatus (line 525) ends with "Codex P1 round-4 finding on PR #756"; requireLeader (line 499) ends with "Codex P1 round-2 finding on PR #756." Per CLAUDE.md, code comments should not reference the current task/fix/PR ("those belong in the PR description"). The WHY is already captured by the surrounding explanation — the PR-stage labels are redundant once the PR merges. Very minor; can be stripped in a follow-up.


Observations (carry-forward, non-blocking, explicitly deferred in round-4 commit)

  • requireUint16Plus1 naming (cmd/elastickv-admin/encryption_mutators.go:193): checks v > MaxUint16, does not add 1. requireFitsUint16 would match the intent.
  • localEpochMask const placement (adapter/encryption_admin.go:569): declared between a preceding function's doc comment and its body, unusual Go style.

Verdict

Approve. All round-4 findings are resolved. The codex P1 (VerifyLeader context codes) is cleanly handled via verifyLeaderErrorToStatus, which is correctly wired through requireLeader for all three mutating/read-only paths. The PR now carries no P1 or P2 open items:

  • P1-A/B (propose error mapping, zero node_id) — ✅ resolved round-1
  • P2 (empty-writers message, stale docstrings) — ✅ resolved rounds 2–3
  • P2 (VerifyLeader quorum guard) — ✅ resolved round-3
  • P2 (VerifyLeader context codes) — ✅ resolved round-5
  • P3 items — informational, can follow up

The wire-format round-trip tests, the stale-leader regression tests, proposeErrorToStatus coverage, and now verifyLeaderErrorToStatus coverage collectively make this a solid foundation for Stage 6.

claude[bot] round-5 verdict was "Approve" with one new P3 noting
that prior commits accumulated PR-stage / round-number references
in code comments (e.g. "Codex P1 round-4 finding on PR #756").
CLAUDE.md says:

  Don't reference the current task, fix, or callers ("used by X",
  "added for the Y flow", "handles the case from issue #123"),
  since those belong in the PR description and rot as the
  codebase evolves.

This commit removes the PR/round provenance from every code and
test comment that picked it up across rounds 1-5, while preserving
the invariant-level WHY (which IS load-bearing for future readers).

Files touched:
- adapter/encryption_admin.go (2 sites: requireLeader,
  verifyLeaderErrorToStatus)
- adapter/encryption_admin_test.go (6 sites: regression-test
  docstrings)
- cmd/elastickv-admin/encryption_test.go (2 sites: help-flag
  test, non-precondition propagation test)

No behaviour change. No tests changed structurally — only
docstrings.

## Skipped (claude[bot] non-blocking observations)

- ResyncSidecar context-code twin test: correctness guaranteed
  by the shared verifyLeaderErrorToStatus helper that the
  existing RotateDEK test exercises.
- requireUint16Plus1 rename: cosmetic carry-forward.
- localEpochMask const placement: cosmetic carry-forward.

## Verification

- go test -race -timeout=60s ./adapter ./cmd/elastickv-admin: PASS
- golangci-lint run on touched paths: 0 issues
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented May 15, 2026

@claude review

@claude
Copy link
Copy Markdown

claude Bot commented May 15, 2026

Claude finished @bootjp's task in 2m 48s —— View job


PR Review — feat(encryption): Stage 5 PR-B (Round 6)

Todo list:

  • Read changed files and gather context
  • Verify round-5 P3 fixes (PR-stage refs stripped, ResyncSidecar context-code test)
  • Check round-6 commit for new issues
  • Post final review

Round-5 P3 items — status

verifyLeaderErrorToStatus and requireLeader docstrings — ✅ resolved. The "Codex P1 round-4 finding on PR #756" and "Codex P1 round-2 finding on PR #756" suffixes are gone. Both functions now stand on their own rationale without internal review-tracking artifacts.

TestEncryptionAdmin_GetCapability_Bootstrapped test comment — ✅ resolved. "PR-A always reports local_epoch=0" is replaced with "Stage 5 reports local_epoch=0 unconditionally; Stage 7 wires the writer-registry counter…" at adapter/encryption_admin_test.go:115–120. Accurate and doesn't rot on merge.

No VerifyLeader context-code twin test for ResyncSidecar — still open (P3/informational). No new test covering context.Canceled/context.DeadlineExceeded via the ResyncSidecar path was added. Correctness is unaffected — the shared verifyLeaderErrorToStatus helper is exercised by TestEncryptionAdmin_RotateDEK_VerifyLeader_PreservesContextCodes. Can follow up.


New findings

P3 — PR-stage labels in two operator-visible error strings

The round-6 sweep correctly removed internal review-tracking labels from docstrings, but two user-facing error strings still contain "PR-B":

cmd/elastickv-admin/encryption.go:54:

return errors.Errorf("encryption: unknown subcommand %q (PR-B supports: status, rotate-dek, register-writer)", sub)

An operator who types an unknown subcommand sees the text "PR-B supports:…" — the PR-stage label is meaningless outside the delivery team. Suggest:

return errors.Errorf("encryption: unknown subcommand %q (supported: status, rotate-dek, register-writer)", sub)

Fix this →

adapter/encryption_admin.go:401:

"encryption: RegisterEncryptionWriter requires exactly one writer in PR-B, got %d (use BootstrapEncryption for multi-writer batches)"

The "in PR-B" is an internal delivery note embedded in a gRPC status error that operators see. Suggest dropping it:

"encryption: RegisterEncryptionWriter requires exactly one writer, got %d (use BootstrapEncryption for multi-writer batches)"

Fix this →


Observations (carry-forward, explicitly deferred)

  • requireUint16Plus1 naming (cmd/elastickv-admin/encryption_mutators.go:193): checks v > MaxUint16, does not add 1. requireFitsUint16 would match the intent. Deferred in round-4 commit.
  • localEpochMask const placement (adapter/encryption_admin.go:565–567): const declared mid-function (between the preceding function's multi-line doc comment and its body). Unusual Go style. Deferred in round-4 commit.
  • PR-A/PR-B/PR-C roadmap references in docstrings (adapter/encryption_admin.go:19–26, adapter/encryption_admin.go:376–378, cmd/elastickv-admin/encryption.go:27–28): these describe incremental feature delivery and are different from the "Codex P1 round-X" tracking artifacts just removed. They will partially rot once PR-C lands. Lower priority than the two user-visible error strings above.

Verdict

Approve. The round-6 commit cleanly strips the internal review-tracking artifacts from verifyLeaderErrorToStatus and requireLeader docstrings — the primary intent is complete. No regressions or new correctness issues found. The two P3 items (PR-stage labels in user-visible error strings) are minor but worth a one-line fixup before Stage 6 makes the server production-active, since operators will see those messages directly.

All P1/P2 items from prior rounds remain closed:

  • P1-A/B (propose error mapping, zero node_id) — ✅ round-1
  • P2 (empty-writers message, stale docstrings) — ✅ rounds 2–3
  • P2 (VerifyLeader quorum guard) — ✅ round-3
  • P2 (VerifyLeader context codes) — ✅ round-5

…visible errors

claude[bot] round-6 verdict was "Approve" with two P3 follow-ups:
operator-visible error strings still carried internal PR-stage
labels that don't belong in user-facing output.

## adapter/encryption_admin.go

RegisterEncryptionWriter multi-writer error dropped "in PR-B"; the
gRPC InvalidArgument status now reads "requires exactly one
writer, got %d (use BootstrapEncryption for multi-writer
batches)" — same content, no internal delivery note leaking out
to the operator.

## cmd/elastickv-admin/encryption.go

Unknown-subcommand CLI error dropped "PR-B supports:" in favor of
"supported:". Same list, no delivery-tracking label.

## Skipped

claude[bot] flagged the PR-A/PR-B/PR-C roadmap references in
function-level docstrings as a separate concern from the
review-tracking artifacts removed in round-6. These are
load-bearing scope markers for current readers of the in-flight
stack and will be cleaned up as a single sweep once Stage 5
fully ships (PR-C lands).

## Verification

- go test -race -timeout=60s ./adapter ./cmd/elastickv-admin: PASS
- golangci-lint run on touched paths: 0 issues
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented May 15, 2026

@claude review

@claude
Copy link
Copy Markdown

claude Bot commented May 15, 2026

Claude finished @bootjp's task in 2m 31s —— View job


PR Review — feat(encryption): Stage 5 PR-B (Round 7)

Todo list:

  • Read changed files and gather context
  • Verify round-6 P3 fixes (PR-stage labels in operator-visible error strings)
  • Check carry-forward items (ResyncSidecar context-code twin test, naming, const placement)
  • Identify any new issues in round-7 commit
  • Post final review

Round-6 P3 items — both resolved ✅

cmd/elastickv-admin/encryption.go:54 — ✅ resolved in 109ee7d. The unknown-subcommand error now reads:

return errors.Errorf("encryption: unknown subcommand %q (supported: status, rotate-dek, register-writer)", sub)

"PR-B supports:" is gone; operators see "supported:" without the delivery-tracking label.

adapter/encryption_admin.go:400–402 — ✅ resolved in 109ee7d. The multi-writer InvalidArgument message no longer contains "in PR-B":

"encryption: RegisterEncryptionWriter requires exactly one writer, got %d (use BootstrapEncryption for multi-writer batches)"

The round-7 commit is a clean, minimal 2-line change (one per file) that precisely targets the two flagged strings and nothing else. No regressions visible.


Carry-forward status

No VerifyLeader context-code twin test for ResyncSidecar (carry-forward from round-5, informational) — still open. Both Canceled and DeadlineExceeded paths are locked in by the shared verifyLeaderErrorToStatus helper exercised by TestEncryptionAdmin_RotateDEK_VerifyLeader_PreservesContextCodes. Correctness is unaffected.

requireUint16Plus1 naming (cmd/elastickv-admin/encryption_mutators.go:193) — still deferred; explicitly noted in the round-4 commit message. The function checks v > MaxUint16 (doesn't add 1). Cosmetic.

localEpochMask const placement (adapter/encryption_admin.go:565–567) — still deferred. Unusual Go style but functionally correct.

PR-A/PR-B/PR-C roadmap refs in function docstrings — explicitly preserved per the round-7 commit message rationale ("load-bearing scope markers for current readers of the in-flight stack, cleaned up as a single sweep once Stage 5 fully ships"). Reasonable decision.


Verdict

Approve. All P1/P2 findings from every prior round are resolved. The two round-6 P3 items (operator-visible error strings with PR-stage labels) are cleanly addressed in 109ee7d. No new issues in the round-7 commit.

Complete P-level history for this PR:

  • P1-A (propose error mapping → proposeErrorToStatus) — ✅ round-1
  • P1-B (zero full_node_id rejection) — ✅ round-1
  • P2 (empty-writers message, stale docstrings) — ✅ rounds 2–3
  • P2 (VerifyLeader quorum guard) — ✅ round-3
  • P2 (VerifyLeader context codes → verifyLeaderErrorToStatus) — ✅ round-5
  • P3 (stale keystore docstring, TLS comment, ResyncSidecar happy-path test gap) — ✅ round-4
  • P3 (PR-stage refs in internal docstrings) — ✅ round-6
  • P3 (PR-stage labels in operator-visible errors) — ✅ round-7

This is a solid foundation for Stage 6.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant