docs(encryption): Stage 8b design proposal — WAL coverage analysis and closure #897
bootjp wants to merge 3 commits into
…d closure

Closure-style design that consolidates the parent §4.3 + §4.6 analyses on what stays cleartext in the etcd-raft WAL after the §4.2 raft envelope ships (Stage 6E).

Why this doc exists, not why this work exists:
- The parent's Stage 8 row was labeled 'Snapshot header v2 + WAL coverage'. 8a (#886) shipped the snapshot header v2.
- The WAL-coverage half has been settled in §4.3 ('no direct file-level wrapping') but never formally closed. The 8a doc's (§4.5) shorthand for 8b was a stale label.
- Operators reasonably ask 'where does the design address WAL-at-rest?' and the answer is currently scattered.

What 8b says:
- Affirms §4.3's existing no-file-wrapping decision; lists why (upstream maintenance, crash-safety semantics, perf hot-path overhead).
- Enumerates the residual cleartext on disk (term, index, type, proposal ID, ConfChange topology) and frames it as traffic-analysis metadata, not user data.
- Documents the threat-model justification for accepting the residual: defense-in-depth atop infrastructure FS encryption (LUKS, EBS, GCE); high-compliance overlays deferred to Stage 9.
- Carves out a forward-compat hook at internal/raftengine/etcd/wal_store.go for a hypothetical future application-level file-encryption layer. Not committed.

What 8b does NOT do:
- Ship any code. This is a documentation-only closure.
- Re-litigate §4.3's decision (the design predates Stage 6E but the reasoning still holds).
- Commit to any forward-compat work.

After 8b lands, the parent Stage 8 row collapses to '8a shipped, 8b closed by docs' and the encryption roadmap moves to Stage 9 once 6E ships.
Code Review
This pull request proposes a design document for Stage 8b, providing a documentation-only closure analysis for WAL coverage and affirming the decision to rely on the Raft envelope payload encryption rather than direct file-level wrapping. The review feedback points out discrepancies between the document's description of the proposal envelope header and threat-model details versus the actual implementation in internal/encryption/raft_envelope.go. Specifically, the diagram and text should be updated to correctly reference the actual Envelope header fields (such as KeyID and Nonce) and the leaked monotonic sequence (write_count within the deterministic nonce) instead of a stale "proposal ID" representation.
| | Proposal envelope header (cleartext) | | ||
| | - version byte (0x01), proposal ID (8B) | |
There is a discrepancy between the proposal envelope header described here and the actual Envelope structure implemented in internal/encryption/raft_envelope.go.
According to internal/encryption/raft_envelope.go, the Envelope struct consists of:
- Version (1 byte)
- Flag (1 byte)
- KeyID (4 bytes)
- Nonce (12 bytes)
- Body (ciphertext)
The diagram and subsequent text refer to a proposal ID (8B) instead of the KeyID and Nonce (which contains the deterministic nonce components like write_count). To ensure high-fidelity documentation, the diagram should be updated to accurately reflect the actual Envelope header fields (KeyID and Nonce).
Suggested change (original lines):
| Proposal envelope header (cleartext) | |
| - version byte (0x01), proposal ID (8B) | |
Suggested change (replacement lines):
| Proposal envelope header (cleartext) | |
| - version (1B), flag (1B), key ID (4B), | |
| - nonce (12B) | |
2. **Entry index gaps** (presence of compaction / snapshot installs).
3. **Term changes** (visible from raft-entry framing).
4. **Entry types**: normal entry vs `ConfChangeV2` / `ConfChange`. ConfChange entries carry node IDs and addresses (topology, not user data).
5. **Proposal ID sequence** (8-byte counter, mostly monotonic per leader). Leaks throughput rate and the rough cadence of leadership flips.
Consistent with the actual Envelope header fields, the monotonic sequence leaked is the write_count component within the 12-byte deterministic Nonce (constructed as node_id ‖ local_epoch ‖ write_count), rather than a standalone 8-byte proposal ID. Updating this description will align the threat analysis with the actual implementation.
Suggested change (original line):
5. **Proposal ID sequence** (8-byte counter, mostly monotonic per leader). Leaks throughput rate and the rough cadence of leadership flips.
Suggested change (replacement line):
5. **Nonce sequence** (specifically the write_count component within the 12-byte deterministic nonce). Leaks throughput rate and the rough cadence of leadership flips.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9587da1018
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
What the adversary **cannot** observe (because §4.2 wraps the payload):

- User keys, values, operation type (PUT/DEL/GET/COMMIT/ABORT).
- Transaction grouping or two-phase commit metadata.
- Backup or admin operation contents.
Document pre-cutover WAL exposure
The parent design states that after raft_envelope_active is committed, the WAL still interleaves cleartext pre-flag entries with post-flag raft-envelope entries, so on any cluster that enables Stage 6E after existing traffic, raw WAL access can still reveal user keys, values, and op tags until those old entries are compacted/removed. This unconditional “cannot observe” list makes the Stage 8b closure look effective immediately after cutover and can mislead operators relying on WAL-at-rest guarantees.
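The exposure window described here can be sketched in Go as a simple index comparison. The function names and the `cutoverIndex` parameter are hypothetical, and the strict "index > cutover" rule is taken from the discussion in this thread rather than from the engine source. The sketch only counts retained entries written at or before the cutover; it does not inspect entry contents.

```go
package main

import "fmt"

// isEnveloped reports whether the entry at idx is covered by the raft
// envelope, assuming the strict "index > cutover" rule from the thread.
func isEnveloped(idx, cutoverIndex uint64) bool {
	return idx > cutoverIndex
}

// cleartextExposure counts entries still on disk that were written at or
// before the cutover. firstRetained is the lowest index that survives
// compaction; lastIndex is the highest index in the WAL.
func cleartextExposure(firstRetained, lastIndex, cutoverIndex uint64) uint64 {
	if firstRetained > lastIndex || firstRetained > cutoverIndex {
		return 0 // nothing retained, or compaction already passed the cutover
	}
	end := cutoverIndex
	if lastIndex < end {
		end = lastIndex
	}
	return end - firstRetained + 1
}

func main() {
	// Before compaction: entries 1..40 predate the cutover at index 40.
	fmt.Println(cleartextExposure(1, 100, 40))
	// After a snapshot and segment purge moved the first retained index past 40.
	fmt.Println(cleartextExposure(41, 100, 40))
}
```

Under these assumptions the "cannot observe" list becomes unconditionally true only once `cleartextExposure` reaches zero, that is, once compaction has removed every pre-cutover segment.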
…write_count nonce leakage)

gemini MEDIUM #1 on §2 (line 41)

Original diagram conflated two distinct envelopes that both appear cleartext on disk:
- Outer 'proposal envelope' from the etcd-raft engine (proposalEnvelopeVersion 1B + proposal_id 8B); owned by internal/raftengine/etcd/engine.go:3887 encodeProposalEnvelope.
- Inner 'AEAD envelope' from §4.2 (Version 1B + Flag 1B + KeyID 4B + Nonce 12B + Body); owned by internal/encryption/envelope.go:68.

6E activates the wrap-on-propose / unwrap-on-apply boundary against the INNER one. Both have cleartext headers on disk; only the AEAD body is encrypted. Updated §2's ASCII diagram to show both envelopes stacked with the field accounting matching the real implementation.

gemini MEDIUM #2 on §3 (line 59)

§3 item 5 (residual cleartext) listed only the 8-byte proposal ID as a monotonic-counter leak. The 12B deterministic nonce (node_id || local_epoch || write_count) ALSO leaks its write_count component on disk. Both counters move under steady state but diverge under leader flip / DEK rotation in informative ways. Updated §3 to enumerate BOTH counters and added §3 item 6 for DEK rotation events (KeyID changes between consecutive entries).

Both fixes improve the threat-model fidelity of the closure analysis; the underlying §4.3 decision (no file wrapping) is unaffected. No semantic change to the design's conclusions, just factual corrections to the residual-leakage enumeration.
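The stacking of the two envelopes can be illustrated with a short Go sketch of the outer layer alone. `wrapProposal` and `unwrapProposal` are hypothetical names (this is not the engine's encodeProposalEnvelope), and the big-endian encoding of the proposal ID is an assumption. The 1-byte version plus 8-byte proposal ID layout and the 0x01 version value are taken from this thread.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

const (
	outerVersion    = 0x01  // version byte shown in the quoted diagram
	outerHeaderSize = 1 + 8 // version (1B) + proposal ID (8B)
)

// wrapProposal prepends the cleartext outer header to an inner payload,
// which in this model is the AEAD envelope (itself with a cleartext header).
func wrapProposal(proposalID uint64, inner []byte) []byte {
	out := make([]byte, outerHeaderSize, outerHeaderSize+len(inner))
	out[0] = outerVersion
	binary.BigEndian.PutUint64(out[1:9], proposalID) // byte order is an assumption
	return append(out, inner...)
}

// unwrapProposal reads the outer header back without any key material.
func unwrapProposal(raw []byte) (uint64, []byte, error) {
	if len(raw) < outerHeaderSize {
		return 0, nil, errors.New("short proposal envelope")
	}
	if raw[0] != outerVersion {
		return 0, nil, errors.New("unknown proposal envelope version")
	}
	return binary.BigEndian.Uint64(raw[1:9]), raw[9:], nil
}

func main() {
	inner := []byte{0x01, 0x00, 0x00, 0x00, 0x00, 0x07} // stand-in inner envelope prefix
	raw := wrapProposal(99, inner)
	id, rest, err := unwrapProposal(raw)
	fmt.Println(id, len(rest), err)
}
```

Because both layers expose their headers, an observer can compare the KeyID of consecutive inner envelopes to spot the DEK rotation events that §3 item 6 describes.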
Round-1 fixes in 1423fa3:

Both fixes improve threat-model fidelity; the §4.3 'no file wrapping' decision is unaffected.

@claude review
Claude finished @bootjp's task in 3m 14s

Review of PR #897 — Stage 8b WAL Coverage Closure (Round-1 Fixes)

Round-1 Fixes: Verified ✅

Both round-1 changes are correct against the implementation.

§2 dual-envelope diagram: The outer proposal envelope layout (

§3 dual-counter enumeration: Both the proposal ID (

Issues Found

1. Pre-cutover cleartext WAL entries:
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1423fa3d3f
the rough cadence of leadership flips (the counter resets on leadership change).
Correct proposal-ID leakage description
In clusters where a node stays up across leadership changes or handles reads/admin requests, the WAL does not show a counter that resets on each leadership change: Engine.nextID() is a process-wide atomic counter shared by Propose, linearizable reads, and admin conf changes, and there is no reset on leader transitions. Documenting resets as a leader-flip signal gives operators the wrong traffic-analysis model; this residual should describe process-local request IDs with possible gaps from non-WAL reads/admin work instead of per-leader reset behavior.
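A toy Go model shows why the on-disk sequence has gaps and no reset. The `engine` type and its methods are hypothetical stand-ins; the only detail taken from the thread is that a single process-wide atomic counter is shared by proposals and by paths that never reach the WAL.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// engine is a hypothetical stand-in for the real engine. Only the shared
// atomic counter mirrors what the review comment describes.
type engine struct {
	nextRequestID atomic.Uint64
	wal           []uint64 // proposal IDs that were written to the WAL
}

// nextID hands out process-wide request IDs. Nothing resets it on a
// leadership change.
func (e *engine) nextID() uint64 { return e.nextRequestID.Add(1) }

// propose consumes an ID and records it in the WAL.
func (e *engine) propose() { e.wal = append(e.wal, e.nextID()) }

// linearizableRead consumes an ID but writes nothing to the WAL.
func (e *engine) linearizableRead() { _ = e.nextID() }

// countGaps returns how many adjacent WAL IDs are not consecutive.
func countGaps(ids []uint64) int {
	gaps := 0
	for i := 1; i < len(ids); i++ {
		if ids[i] != ids[i-1]+1 {
			gaps++
		}
	}
	return gaps
}

func main() {
	e := &engine{}
	e.propose()
	e.linearizableRead()
	e.linearizableRead()
	e.propose()
	e.propose()
	fmt.Println(e.wal, countGaps(e.wal))
}
```

In this model a gap in the WAL sequence reflects non-WAL work on the same process, so it is at best a rough throughput hint and says nothing reliable about leader transitions.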
codex P2 #1 + claude Issue #1: pre-cutover cleartext window

The 'cannot observe' block in §3 was unconditional; in reality it applies only to entries with index > raftEnvelopeCutoverIndex. On a cluster that enables Stage 6E after existing traffic, the WAL interleaves cleartext pre-cutover entries with post-cutover ciphertext until log compaction removes the pre-cutover segments (snapshot install + segment purge). During that window an adversary with raw WAL access can still read user data from the pre-cutover segments. Added an 'Important caveat' note that documents the 'eventually true post-compaction' framing, matching the engine's strict-> dispatch semantics.

codex P2 #2: proposal-ID leakage description corrected

The original wording claimed the 8B proposal ID 'resets on leadership change', which is wrong. Engine.nextID() at engine.go:3134 is Engine.nextRequestID.Add(1), a process-wide atomic shared by Propose, linearizable reads, and admin conf changes. It does NOT reset across leadership transitions and is bumped by non-WAL paths, so the on-disk WAL sequence has gaps relative to the in-process counter. Updated §3 item 5 to describe the actual semantics: process-local request IDs with gaps from non-WAL traffic, useful for throughput estimation but NOT a reliable leader-flip signal.

claude Issue #2: nonce field widths in §2 diagram

Added explicit byte widths (node_id 2B + local_epoch 2B + write_count 8B = 12B total, all big-endian) with a back-reference to internal/encryption/nonce_factory.go:13-15. Matches the precision the round-1 fix applied to the envelope fields.

claude Issue #3: §6 forward-compat hook scope clarification

The original wording presented wal_store.go as 'the implementation would land here as a transparent os.File-wrapping layer', but wal_store.go's WAL operations route through the upstream go.etcd.io/etcd/server/v3/storage/wal package, which manages *os.File internally and exposes no external io.Writer intercept surface. Updated §6 to clarify that wal_store.go is the operator control surface but the actual wrapping mechanism still has to take one of the approaches §5 rules out (fork the upstream package, or move to an FS-level interposition that's effectively §4's infrastructure-FS-encryption answer). Adds a back-reference to §5 so the reader sees the structural challenge consistently.

All four fixes improve threat-model fidelity without changing the §4.3 'no file wrapping' decision.
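The nonce layout from claude Issue #2 can be sketched in Go as follows. The widths and big-endian byte order are the ones stated in the commit message above; the function names `buildNonce` and `splitNonce` are hypothetical and this is not the code in internal/encryption/nonce_factory.go.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const nonceSize = 12 // node_id (2B) + local_epoch (2B) + write_count (8B)

// buildNonce lays the three fields out big-endian, per the commit message.
func buildNonce(nodeID, localEpoch uint16, writeCount uint64) [nonceSize]byte {
	var n [nonceSize]byte
	binary.BigEndian.PutUint16(n[0:2], nodeID)
	binary.BigEndian.PutUint16(n[2:4], localEpoch)
	binary.BigEndian.PutUint64(n[4:12], writeCount)
	return n
}

// splitNonce recovers the fields from a cleartext nonce. An observer with
// raw WAL access can do the same without any key.
func splitNonce(n [nonceSize]byte) (nodeID, localEpoch uint16, writeCount uint64) {
	nodeID = binary.BigEndian.Uint16(n[0:2])
	localEpoch = binary.BigEndian.Uint16(n[2:4])
	writeCount = binary.BigEndian.Uint64(n[4:12])
	return nodeID, localEpoch, writeCount
}

func main() {
	n := buildNonce(3, 1, 1000)
	node, epoch, count := splitNonce(n)
	fmt.Println(node, epoch, count)
}
```

Since the nonce travels in the clear next to the ciphertext, the difference in `writeCount` between two entries from the same node and epoch gives an observer the number of encrypted writes in between, which is the throughput leak §3 enumerates.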
Round-2 fixes in 2d3f80e:

All four fixes improve threat-model fidelity; the §4.3 'no file wrapping' decision is unaffected.

@claude review
Claude finished @bootjp's task in 2m 51s

Review of PR #897 — Stage 8b WAL Coverage Closure (Round-2 Fixes)

Round-2 Fixes: Verified ✅

Fix #1, pre-cutover cleartext caveat (§3): The "Important caveat" block correctly characterizes the window. The strict-

Fix #2, proposal-ID semantics (§3 item 5): Verified against the implementation.

Fix #3, nonce field widths in §2 diagram: The added widths (

Fix #4, §6 scope clarification: The revised text correctly identifies

Issues Found

1. §2 diagram still says "monotonic per leader", which contradicts §3 item 5 (Fix #2 is incomplete)

Fix #2 correctly updated §3 item 5 to say the proposal ID is "monotonic per process load — NOT per leader", but the §2 diagram label was not updated. These two statements now contradict each other in the same document. The label should read "monotonic per process load" to be consistent.

2. §4 traffic-analysis bullet still says "proposal-ID resets", which also contradicts Fix #2

Same root cause as issue #1. Fix #2 removed the "per-leader reset" framing from §3 item 5 but left it intact in §4's traffic-analysis enumeration. The phrase "proposal-ID resets" is now factually wrong per the corrected §3 item 5 (the ID is process-wide and does not reset on leader flip). Term increments remain a valid leader-flip signal; proposal-ID resets do not. Suggested replacement:

Non-issues / Confirmed Correct

Overall: Both remaining issues are minor textual inconsistencies introduced when fix #2 updated §3 item 5 but left the matching phrase in §2's diagram label and §4's traffic-analysis bullet unchanged. No new structural or threat-model concerns. Address issues #1 and #2 (a two-line edit) and the doc is ready to merge.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2d3f80e427
## 4. Threat-model justification for accepting the residual

The threat model in §2 of the parent design protects "the persisted state of the cluster" against an adversary with disk access. The residual cleartext above does not reveal user data; it reveals **traffic-analysis metadata**:
Scope the no-user-data claim to post-compaction entries
Fresh evidence: although lines 113-127 now correctly document that pre-cutover WAL segments can still expose keys/values until compaction, this threat-model summary remains unconditional. In clusters enabling Stage 6E after existing traffic, the residual cleartext above does reveal user data from pre-cutover entries, so this sentence should be qualified to index > raftEnvelopeCutoverIndex / after old WAL segments are purged rather than presented as generally true.
- Cluster throughput (entries per WAL segment ≈ ops per epoch).
- Cluster topology changes (ConfChange entries).
- Leader-flip cadence (proposal-ID resets, term increments).
Remove proposal-ID resets from leader-flip summary
Fresh evidence: the detailed §3 text now says proposal IDs are process-local, shared with reads/admin requests, and do not reset on leadership transitions, which matches Engine.nextID() being a process-wide atomic counter used by Propose, reads, and admin paths. This summary still lists proposal-ID resets as a leader-flip signal, so operators reading only the threat-model justification get the wrong traffic-analysis model; keep term changes or another signal instead.
Summary
Stage 8b design — closure-style doc that formalizes the parent §4.3 + §4.6 analyses on what stays cleartext in the etcd-raft WAL after Stage 6E's raft envelope ships.
This PR ships no code. The parent design already settled WAL file-level wrapping in §4.3 ("no direct file-level wrapping ... protection comes entirely from §4.2"); 8b consolidates the analysis, justifies the trade-off explicitly, and closes the Stage 8 row.
Why this doc exists
(§4.5) shorthand for 8b was a stale label (corrected in PR docs(encryption): Stage 8a design proposal — snapshot header v2 #877 round-3).
What's in the doc
internal/raftengine/etcd/wal_store.go: informational only, not committed.
Sequencing
Stage 8b is decoupled from Stage 6E (the 8a closure notes 8b is "not blocked on 6E"). Lands as a docs-only PR; the parent doc's Stage 8 row update can ride this PR or land separately.
Test plan