docs(encryption): Stage 8a design proposal - snapshot header v2 (#877)
…_envelope_cutover_index carriage)

Slices the parent design's Stage 8 ('Snapshot header v2 + WAL coverage') into 8a (header v2) and 8b (WAL coverage). 8a is the lower-risk slice: a versioned snapshot header that carries the raft_envelope_cutover_index alongside the existing HLC ceiling, so a node restoring a Phase-2 snapshot can route subsequent Raft entries through the correct wrap/unwrap path without side-channeling through the sidecar.

Architecture: distinct 8-byte magic per version (EKVTHLC1 / EKVTHLC2). The parent design (§4.4) already settled this choice; the rejected alternative was a heuristic on shared magic that misclassifies early-epoch HLC ceilings as v2. This slice is the implementation cut.

Properties preserved:
- Existing v1 snapshot files restore byte-for-byte under the new build.
- Headerless legacy snapshots (TestFSMSnapshotRestoreOldFormat / TestFSMSnapshotRestoreSmallLegacy regression tests) still restore via the leave-bytes-in-stream fallback.
- No migration step required; rollout is purely additive.

Sibling Stage 8b (WAL coverage / §4.5) is deferred to a separate design and PR; the two are conceptually independent.

Sequencing constraint: 8a MUST ship before Stage 6E (the enable-raft-envelope cutover). 6E populates the sidecar's raft_envelope_cutover_index, and 8a's v2 writer is the carrier for that value across snapshot restore. The reverse order would leave 6E with no way to thread the cutover through a snapshot restore.

A 7-test verification plan covers all four read-path branches (v1, v2 with forward-compat trailing bytes, unknown EKVTHLC*, headerless), the write-path v1/v2 selector, and the restore-side cutover plumbing.
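The "distinct magic per version" decision means the header version is decided from the 8 magic bytes alone, never from the payload that follows. A minimal Go sketch of that discriminator is below; `classifyMagic`, `headerKind` and the magic variable names are hypothetical, and treating a stream shorter than 8 bytes as headerless follows the short-stream rule adopted later in this thread rather than anything stated in this commit message.

```go
package main

import (
	"bytes"
	"fmt"
)

// Hypothetical names; only the magic strings themselves come from the design.
var (
	hlcSnapshotMagicV1 = [8]byte{'E', 'K', 'V', 'T', 'H', 'L', 'C', '1'}
	hlcSnapshotMagicV2 = [8]byte{'E', 'K', 'V', 'T', 'H', 'L', 'C', '2'}
	hlcMagicPrefix     = []byte("EKVTHLC") // shared 7-byte family prefix
)

type headerKind int

const (
	kindHeaderless     headerKind = iota // legacy stream, nothing consumed
	kindV1                               // magic + HLC ceiling
	kindV2                               // magic + len + ceiling + cutover
	kindUnknownVersion                   // EKVTHLC + unrecognised version byte: fail closed
)

func (k headerKind) String() string {
	return [...]string{"headerless", "v1", "v2", "unknown-version"}[k]
}

// classifyMagic compares only the leading magic bytes. Whatever follows
// (for example an HLC ceiling value) never influences the decision.
func classifyMagic(peeked []byte) headerKind {
	switch {
	case len(peeked) < 8:
		return kindHeaderless
	case bytes.Equal(peeked[:8], hlcSnapshotMagicV1[:]):
		return kindV1
	case bytes.Equal(peeked[:8], hlcSnapshotMagicV2[:]):
		return kindV2
	case bytes.HasPrefix(peeked, hlcMagicPrefix):
		return kindUnknownVersion
	default:
		return kindHeaderless
	}
}

func main() {
	for _, in := range []string{"EKVTHLC1", "EKVTHLC2", "EKVTHLC3", "\x00\x00\x01\x8f\x00\x00\x00\x00", "EKV"} {
		fmt.Printf("%q -> %v\n", in, classifyMagic([]byte(in)))
	}
}
```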
📝 Walkthrough

Introduces a design specification for Stage 8a (Snapshot header v2), which defines a versioned snapshot header format using magic-based versioning.

Changes: Snapshot Header v2 Design (Stage 8a)

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
@claude review Focus areas:
Claude finished @bootjp's task in 4m 45s. Review: Stage 8a - Snapshot Header v2 Design
| # | Finding | Severity |
|---|---|---|
| §1 | §4.5 cross-reference in "Out of scope" doesn't match the parent doc | Low |
| §3.2 Gap A | Short-stream behavior not specified (`bufio.Reader.Peek` returns fewer than 8 bytes) | Moderate |
| §3.2 Gap B | `len < 16` validation missing from the algorithm; implied only by a §5 test | Low |
| §3.3 | No-downgrade enforcement mechanism not specified | Low |
| §3.1/§7 | Forward-compat `len` hatch scope not bounded (optional vs. required fields) | Low |
The core architecture (distinct magic, v2 layout, 4-branch discriminator, 6E sequencing) is sound and approved by the parent. Addressing the two §3.2 gaps in the doc before the implementation PR starts will prevent the implementer from having to infer behavior from ambiguous specs.
Code Review
This pull request proposes a design document for Stage 8a, introducing Snapshot Header v2 to carry the raft_envelope_cutover_index field. This enables nodes restoring from a snapshot to correctly identify whether subsequent Raft log entries are plaintext or AEAD-wrapped. The review feedback highlights three important improvements: handling short streams during the 8-byte peek to prevent failing valid small legacy snapshots, validating that the payload length is at least 16 bytes in the v2 reader to avoid out-of-bounds panics, and clarifying the routing logic when cutover == 0 to prevent incorrectly attempting to unwrap plaintext entries.
1. Peek the 8 leading bytes from the snapshot stream (`bufio.Reader`
   or equivalent; the bytes MUST be replayable into the
   inner-store reader if the discriminator falls through to the
   headerless branch).
When peeking the 8 leading bytes, if the stream is shorter than 8 bytes (e.g., legacy empty or very small snapshots), `Peek(8)` returns only the available bytes together with a non-nil error (typically `io.EOF`).
To prevent failing the restore on valid small legacy snapshots, the design should specify how to handle these Peek errors:
- If `Peek(8)` returns `io.EOF` or `io.ErrUnexpectedEOF`, check the bytes actually read.
- If the read bytes (even if fewer than 8) have "EKVTHLC" as a prefix, fail closed with a truncated-header error.
- Otherwise, safely fall back to the headerless legacy path (step 5).
2. `bytes.Equal(peeked, hlcSnapshotMagicV2[:])`:
   - Consume the 8 magic bytes.
   - Read 2 bytes → `len` (big-endian uint16).
   - Read `len` payload bytes.
   - Parse `ceiling` from the first 8 of payload, `cutover` from
     the next 8, ignore any trailing bytes (forward-compat).
   - Return `(ceiling, cutover, nil)`.
The design should explicitly specify that the reader must validate len >= 16 before parsing ceiling and cutover. If len < 16, it should return a typed error (e.g., ErrSnapshotHeaderInvalidLength or similar) to prevent out-of-bounds slice access or panics when attempting to parse the 8-byte ceiling and 8-byte cutover from a shorter payload.
- `cutover` is plumbed to the applier / engine pre-apply hook so
  subsequent Raft entries route through the correct wrap/unwrap
  path:
  - Entries with `raftIdx < cutover` are pre-Phase-2 plaintext;
    pass through.
  - Entries with `raftIdx >= cutover` are AEAD-wrapped; unwrap.
In §3.4, the routing logic states:
- Entries with `raftIdx < cutover` are pre-Phase-2 plaintext; pass through.
- Entries with `raftIdx >= cutover` are AEAD-wrapped; unwrap.

However, if `cutover == 0` (the default/pre-cutover state), a naive implementation of `raftIdx >= cutover` would evaluate to true for all entries (since `raftIdx >= 0` is always true), leading to attempts to unwrap plaintext entries as AEAD-wrapped.
The design should explicitly clarify that the routing logic must first check if `cutover != 0` (or `cutover > 0`) before applying the index-based routing, and if `cutover == 0`, all entries should be treated as plaintext/pre-cutover.
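Because the guard ordering is easy to get wrong, here is a minimal sketch of the routing decision with the `cutover == 0` precondition checked first. `routeEntry` and `entryRoute` are hypothetical names; the real hook lives in the applier/engine and is not shown in this thread.

```go
package main

import "fmt"

// entryRoute is a hypothetical name for the per-entry decision.
type entryRoute int

const (
	routePlaintext entryRoute = iota // pre-Phase-2 entry: pass through
	routeUnwrap                      // AEAD-wrapped entry: unwrap before apply
)

func (r entryRoute) String() string {
	if r == routeUnwrap {
		return "unwrap"
	}
	return "plaintext"
}

// routeEntry checks the guard first: cutover == 0 means no cutover has been
// recorded, so every entry is plaintext. Without the guard, raftIdx >= 0
// holds for every uint64 index and plaintext would be sent to unwrap.
func routeEntry(raftIdx, cutover uint64) entryRoute {
	if cutover == 0 {
		return routePlaintext
	}
	if raftIdx < cutover {
		return routePlaintext
	}
	return routeUnwrap
}

func main() {
	for _, c := range []struct{ idx, cutover uint64 }{{5, 0}, {99, 100}, {100, 100}, {101, 100}} {
		fmt.Printf("raftIdx=%d cutover=%d -> %v\n", c.idx, c.cutover, routeEntry(c.idx, c.cutover))
	}
}
```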
Actionable comments posted: 2
🧹 Nitpick comments (2)
docs/design/2026_05_29_proposed_8a_snapshot_header_v2.md (2)
77-84: ⚡ Quick win. Clarify the 8a/6E sequencing constraint.
Lines 77-81 state that 8a "does NOT require 6E to ship first," but lines 282-284 later state "8a MUST ship before 6E." While technically consistent (8a can compile without 6E merged, but must be deployed before 6E is enabled), the phrasing creates confusion.
Consider revising lines 79-81 to: "8a does not require 6E to be merged first — the v2 writer can compile and run with the sidecar field defined but always-zero — but 8a MUST be fully deployed before 6E is enabled in production."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/design/2026_05_29_proposed_8a_snapshot_header_v2.md` around lines 77 - 84, Update the ambiguous phrasing around the 8a/6E sequencing: change the sentence in the "Stage 6E" paragraph that currently reads "8a does NOT require 6E to ship first" to explicitly state that 8a can be merged/compiled before 6E but must be deployed before 6E is enabled; reference the same concepts (8a, 6E, the v2 writer, and raft_envelope_cutover_index) and replace the line with: "8a does not require 6E to be merged first — the v2 writer can compile and run with the sidecar field defined but always-zero — but 8a MUST be fully deployed before 6E is enabled in production."
173-176: ⚡ Quick win. Rephrase to clarify the write path is stateless.
Lines 173-176 state "subsequent snapshots from this load stay on v2," which implies the writer maintains state. However, the write path is stateless and simply checks the sidecar's `raft_envelope_cutover_index` each time.

Suggest rephrasing: "Because the sidecar's `raft_envelope_cutover_index`, once set non-zero by `enable-raft-envelope`, is never cleared back to 0 by rotation (out of scope for current semantics), subsequent snapshots will continue to use the v2 layout without requiring writer-side state."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/design/2026_05_29_proposed_8a_snapshot_header_v2.md` around lines 173 - 176, Reword the sentence to clarify that the writer is stateless and simply reads the sidecar flag each time: replace the phrase asserting "subsequent snapshots from this load stay on v2" with a statement explaining that because the sidecar's raft_envelope_cutover_index (set non-zero by enable-raft-envelope) is not cleared by rotation under current semantics, each snapshot writer will observe that flag and continue producing v2-layout snapshots without any writer-side state or special tracking; mention raft_envelope_cutover_index, enable-raft-envelope, rotation and v2 explicitly to make the mechanism and responsibilities clear.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/design/2026_05_29_proposed_8a_snapshot_header_v2.md`:
- Around line 117-119: After reading the uint16 `len` in the snapshot v2 read
path (the function handling ReadSnapshotHeader_V2 / the coded reader for §3.2),
validate that len is within bounds 16 <= len <= MAX_HEADER_PAYLOAD (define
MAX_HEADER_PAYLOAD = 1024 or suitable constant) and if not return a typed error
ErrSnapshotHeaderLenOutOfBounds; add a unit test
TestReadSnapshotHeader_V2WithLenTooLarge that constructs a v2 header with len =
0xFFFF and asserts the reader returns ErrSnapshotHeaderLenOutOfBounds to prevent
excessive allocation/IO on malformed headers.
- Around line 126-161: Clarify ReadSnapshotHeader's behavior for short streams:
update the algorithm (or add a step 0) in the ReadSnapshotHeader description to
state "Peek up to 8 bytes; if the reader returns fewer than 8 bytes (EOF or
io.ErrUnexpectedEOF), treat the snapshot as headerless legacy, do not consume
those bytes, and pass the available bytes through to the inner-store restore."
Reference ReadSnapshotHeader and kv/fsm.go::Restore and ensure this behavior
aligns with existing tests TestFSMSnapshotRestoreOldFormat and
TestFSMSnapshotRestoreSmallLegacy.
…claude moderate/low

Seven findings from PR #877 round-1 reviews on the 8a design doc:

1. (coderabbit MAJOR / gemini medium / claude moderate) Short-stream Peek behavior. The §3.2 read-path algorithm didn't specify what happens when the input stream has fewer than 8 bytes; today's TestFSMSnapshotRestoreSmallLegacy exercises this case. Added step 0 to the algorithm: Peek(8) returning fewer than 8 bytes falls through to the headerless-legacy branch with partial bytes left in the bufio.Reader. Added TestReadSnapshotHeader_ShortStreamFallsBackToLegacy covering 0/4/7-byte sub-cases.
2. (coderabbit MAJOR) `len` upper-bound DoS hardening. A malformed snapshot could specify len=0xFFFF and force a 64 KiB allocation per restore. §3.2 step 2 now mandates 16 <= len <= maxSnapshotHeaderPayload (suggested 1024); both bounds return ErrSnapshotHeaderInvalidLength. Added TestReadSnapshotHeader_V2WithLenTooLargeFails.
3. (gemini medium / claude low Gap B) `len >= 16` lower bound. Same typed error path; the algorithm explicitly says "len < 16 → error" rather than relying on the implicit panic-on-slice-indexing of the round-0 spec. The existing TestReadSnapshotHeader_V2WithLenMismatchFails was renamed to TestReadSnapshotHeader_V2WithLenTooShortFails for clarity.
4. (gemini medium) cutover==0 routing edge case in §3.4. A naive `raftIdx >= cutover` check is always true when cutover=0. The routing logic is now two-step, with the `if cutover == 0 → all plaintext` precondition guard explicitly first.
5. (claude moderate / §3.3) No-downgrade enforcement was a documentation note, not a specified mechanism. Added a per-process-load latch: once the writer observes a non-zero raft_envelope_cutover_index, it stays on v2 even if a future sidecar read returns 0 (defends against silent data corruption from hypothetical reseed/factory-reset paths). log.Error (not Fatal, because this runs in the snapshot goroutine) on the non-zero→0 transition. Added TestWriteSnapshotHeader_NoDowngradeAfterCutoverSeen.
6. (claude low §1) Cross-reference correction. The round-0 doc said WAL coverage is parent §4.5, but parent §4.5 is "Distribution catalog and HLC ceiling entries"; WAL-file encryption is not yet detailed in any §4.x. Rewrote the Out-of-scope bullet to acknowledge the parent label discrepancy.
7. (claude low §3.1) Forward-compat boundary. Added §3.1.1 spelling out that the len-skip hatch is only safe for ADVISORY/OPTIONAL fields; any mandatory future field requires EKVTHLC3, not a v2 payload extension.

Architecture and scope slicing unchanged. The core read/write paths, 6E sequencing constraint, and rollout-without-migration property all stand as in round-0.
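Item 5's latch can be sketched with `sync/atomic` types (Go 1.19+) and `log/slog` (Go 1.21+). This is an illustration under assumptions, not the code that eventually landed: the `effective` method name is hypothetical, and logging only the first downgrade per process load is one reading of "log.Error on the non-zero→0 transition".

```go
package main

import (
	"fmt"
	"log/slog"
	"sync/atomic"
)

// noDowngradeLatch remembers the last non-zero cutover seen in this process
// load. It is in-memory only, so a restart starts empty and re-reads the sidecar.
type noDowngradeLatch struct {
	last   atomic.Uint64 // last non-zero cutover observed
	warned atomic.Bool   // ensures the downgrade error is logged at most once
}

// effective returns the cutover the snapshot writer should use for a given
// sidecar reading; zero means "write v1".
func (l *noDowngradeLatch) effective(sidecar uint64) uint64 {
	if sidecar != 0 {
		l.last.Store(sidecar)
		return sidecar
	}
	latched := l.last.Load()
	if latched != 0 && l.warned.CompareAndSwap(false, true) {
		// Error, not Fatal: this runs on the snapshot goroutine.
		slog.Error("sidecar raft_envelope_cutover_index went non-zero -> 0; staying on v2",
			"latched_cutover", latched)
	}
	return latched
}

func main() {
	var l noDowngradeLatch
	for _, sidecar := range []uint64{0, 120, 0, 0} {
		fmt.Printf("sidecar=%d -> writer uses cutover %d\n", sidecar, l.effective(sidecar))
	}
}
```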
Round-1 review feedback addressed in 8583991 (7 findings across coderabbit major / gemini medium / claude moderate/low):
Architecture and scope slicing unchanged.

@claude review
Claude finished @bootjp's task in 3m 51s. Round-2 Review: Stage 8a - Snapshot Header v2 Design
Verification of the 7 claimed fixes

All 7 are correctly implemented. Details below.

Fix 1: Short-stream Peek / step 0 + test ✅ §3.2 step 0 correctly specifies the fall-through to step 5 when

Fix 2: §3.2 step 2 now requires

Fix 3: Same step 2 now also requires

Fix 4: §3.4 now has the two-step precondition with

Fix 5: No-downgrade enforcement latch + log.Error + test ✅ §3.3 now specifies the per-process-load latch mechanism ("in-memory state on the writer remembers 'I have seen non-zero cutover'"), the log.Error (not Fatal) on the non-zero → zero sidecar transition, and the rationale for not using Fatal (snapshot goroutine context). One observation worth noting for the implementation: the per-process-load latch does not survive restart. The doc correctly acknowledges this ("Process restart re-reads the sidecar"): a legitimate pre-Phase-2 node that never saw a non-zero cutover never engages the latch. This is acceptable, and the scenario where a sidecar legitimately resets to 0 post-Phase-2 is out of scope.

Fix 6: §4.5 cross-reference rewrite ✅ The Out-of-scope WAL bullet now explains the discrepancy explicitly: "the parent doc's Stage 8 table row labels both slices as '… (§4.4, §4.5)', but §4.5 in the parent is actually 'Distribution catalog and HLC ceiling entries'; WAL-file encryption is a separate concern not yet detailed in any §4.x." The old misdirecting

Fix 7: Forward-compat §3.1.1 advisory/mandatory boundary ✅ New §3.1.1 clearly states:

Three remaining issues

Issue 1: §3.1.1 has a forward reference labeled "above" (low). §3.1.1 line 145 says:

§3.1.1 is a subsection of §3.1 and appears before §3.2, where step 4 is defined. This should be "(step 4 below)" or "(see §3.2, step 4)". A reader following the reference "step 4 above" will scroll backward and find nothing.

Issue 2: §3.2 preamble (lines 149–154) says:

The phrase "takes ownership of" doesn't make it clear whether

The correct API shape is:

Issue 3: §6 rollout: pre-8a binary reading a v2 snapshot (very low, documentation precision). Lines 360–365 describe a pre-8a binary reading a v2 snapshot and say it "would fall into the headerless-legacy branch (no

The net result is the same (loud failure, not silent corruption), and the doc's conclusion is correct. This is a precision note, not a behavioral gap.

Summary

The spec is substantially complete. New issues 1 and 2 are worth a follow-up edit before the implementation PR starts; issue 2 in particular can prevent a subtle "lost buffered bytes" bug if the implementer reads the preamble too quickly. Issue 1 is a trivial one-word fix.
… forward-ref + pre-8a precision

Three follow-ups from claude round-2 review on PR #877 (round-2 verdict: all 7 round-1 fixes verified correct; these are new low findings):

1. (Low, implementation-critical) §3.2 bufio.Reader ownership was ambiguous. 'takes ownership of' didn't make clear whether ReadSnapshotHeader creates the bufio.Reader internally or accepts one as a parameter. Rewrote the preamble to specify the API signature (caller-owned bufio.Reader passed in) AND the load-bearing invariant: the caller MUST pass the SAME *bufio.Reader to the inner-store restore on ALL branches (v1, v2, headerless), not just headerless. Reason: bufio.Reader may have read more from the underlying io.Reader than it returned (opportunistic buffer fill); inner-store bytes can sit in the buffer between header parse and inner-store read; switching to the original io.Reader after header parsing silently drops those bytes.
2. (Low, trivial) §3.1.1 forward reference. 'step 4 above' is wrong since §3.1.1 appears before §3.2 where step 4 is defined. Changed to '(see §3.2, step 4)'.
3. (Very low, precision) §6 pre-8a binary reading a v2 snapshot. Added a parenthetical noting that pre-8a code has no 'unknown EKVTHLC* magic' branch, so the EKVTHLC prefix in EKVTHLC2 does NOT trigger a typed error on pre-8a readers; the fail-closed guarantee in §3.2 step 4 is a property of the 8a reader only. Outcome unchanged (loud inner-store failure, not silent corruption); precision note for documentation accuracy.

Architecture and core spec unchanged.
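The failure mode behind item 1 can be reproduced in a few lines. The sketch below (hypothetical `restoreInner` helper, v1-sized 16-byte header) reads the header through a `bufio.Reader` and then continues either on that same reader or on the wrapped `io.Reader`; because `bufio` fills its buffer opportunistically, the second variant silently loses the inner-store bytes.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
)

// restoreInner reads a 16-byte header through a bufio.Reader, then reads the
// inner-store bytes from the same *bufio.Reader or, incorrectly, from the
// original io.Reader it wraps.
func restoreInner(snapshot []byte, sameReader bool) (string, error) {
	raw := bytes.NewReader(snapshot)
	br := bufio.NewReader(raw)
	var hdr [16]byte
	if _, err := io.ReadFull(br, hdr[:]); err != nil {
		return "", err
	}
	var inner io.Reader = raw // bug: bytes already buffered in br are skipped
	if sameReader {
		inner = br // correct: keep draining the caller-owned *bufio.Reader
	}
	b, err := io.ReadAll(inner)
	return string(b), err
}

func main() {
	snap := append([]byte("EKVTHLC1\x00\x00\x00\x00\x00\x00\x00\x2a"), "inner-store bytes"...)
	good, _ := restoreInner(snap, true)
	bad, _ := restoreInner(snap, false)
	// The snapshot is far smaller than bufio's default 4096-byte buffer, so the
	// first fill drains raw entirely and the buggy path sees nothing.
	fmt.Printf("same *bufio.Reader: %q\noriginal io.Reader: %q\n", good, bad)
}
```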
@claude review |
Claude finished @bootjp's task in 3m 58s. Round-3 Review: Stage 8a - Snapshot Header v2 Design
Round-2 fix verification

All three round-2 issues are correctly addressed.

Fix 1: §3.1.1 forward ref ✅

Fix 2:

Fix 3: §6 pre-8a/v2 precision ✅

Remaining findings

Finding 1: The out-of-scope section (§1, lines 71–77) correctly explains that the parent's §4.5 is "Distribution catalog and HLC ceiling entries", not WAL coverage. However, the same mislabeling persists in two places that were not updated:

A reader who lands on §7 directly (or skims the metadata block) sees

The simplest fix: drop the

Summary

The spec is complete. The one remaining open item (R3-1) is a two-word fix: dropping
@claude review |
Claude finished @bootjp's task in 2m 37s. Round-3 Review: Stage 8a - Snapshot Header v2 Design
Verification of round-2 remaining issues

Issue 1 (round-2): §3.1.1 "step 4 above" → forward ref ✅ Fixed. Lines 144–145 now read:

The forward reference "(step 4 above)" is gone. ✅

Issue 2 (round-2): Lines 149–170 now include a load-bearing preamble titled "API shape and

This is the right place, phrased correctly, and will prevent the "use original io.Reader for v1/v2" implementation mistake. ✅

Issue 3 (round-2): §6 pre-8a/v2 scenario precision ✅ Fixed. Lines 372–384 now include: "(Note: pre-8a code has no 'unknown

This closes the precision gap: the pre-8a path is now accurately described (falls to legacy, passes

Fresh independent pass

One residual inconsistency found (low). §1 Out-of-Scope correctly explains (lines 71–77) that the parent doc's

However, this fix was not propagated to the two shorthand references that still carry the stale

A reader following either of those references to §4.5 in the parent doc finds "Distribution catalog and HLC ceiling entries", which would be confusing without first reading §1's footnote. Since §1 already has the correct explanation, both shorthand occurrences should drop the

Summary

The spec is substantively complete and correct. All moderate and implementation-critical issues from prior rounds are resolved. The single remaining item is a low-severity label inconsistency between §1's correct footnote and two shorthand references that weren't updated to match. Safe to approve once that is cleaned up (or defer to the implementation PR as a doc touch-up).
…and for Stage 8b WAL coverage

The §1 Out-of-Scope footnote correctly explains the parent doc's stale "(§4.5)" label for 8b, but two shorthand references (header table line 9 and §7 line 400) still carried the bare "(§4.5)" label, sending a reader who follows them to the parent's "Distribution catalog and HLC ceiling entries" section. Drop the §4.5 label and cross-reference §1 instead so the shorthand matches the footnote.
Addressed round-3 LOW finding in 0266818:
The §1 Out-of-Scope footnote already correctly explains that the parent doc's

@claude review
Claude finished @bootjp's task in 1m 44s. Round-4 Review: Stage 8a - Snapshot Header v2 Design
Verification of R3-1

R3-1: Metadata table (line 9):

The stale

§7 (lines 400–401):

Same treatment. Both shorthand references now match §1's correct explanation.

Fresh independent pass

One minor observation, not a blocker. §3.2 step 1 description uses

Step 1 (lines 179–180) reads:

Steps 2 and 3 do use

No other issues found.

Summary
The design doc is complete and correct. All moderate and implementation-critical issues from prior rounds are resolved; both shorthand §4.5 references are updated to match §1's explanation. The architecture (distinct magic, v2 layout, 6-branch read-path, 6E sequencing, no-downgrade latch) is sound and ready for the implementation PR.
…SnapshotHeader API + no-downgrade latch (#886)

## Summary

Implements Stage 8a: the snapshot header v2 carrier for `raft_envelope_cutover_index`. Lands on top of the merged design doc (`docs/design/2026_05_29_proposed_8a_snapshot_header_v2.md`, #877). No on-disk migration; v1 snapshots restore byte-for-byte under the new build, and only Phase-2 nodes ever emit v2.

### New API surface

- `ReadSnapshotHeader(r *bufio.Reader) (ceiling, cutover uint64, err error)`: the 6-branch §3.2 reader. The caller-owned `bufio.Reader` is passed unchanged into the inner store on every branch (guarding against the bufio-buffering-loses-bytes failure mode flagged in the PR #877 review).
- `CutoverSource` interface + `WithCutoverSource(...)` FSM option: writer-side view of the sidecar's cutover index.
- `noDowngradeLatch` (process-load): §3.3's "once non-zero observed → always v2" invariant. Preserves the last non-zero cutover across hypothetical sidecar resets and slogs an Error once on the non-zero→0 transition.
- `kvFSM.RestoredCutover()`: test seam for the §3.4 snapshot-to-applier handoff. Stage 6E will consume this via the apply hook.

### Read-path branches (all covered by tests)

| Input | Result |
|---|---|
| v1 magic `EKVTHLC1` | `(ceiling, 0, nil)` |
| v2 magic `EKVTHLC2` + valid len | `(ceiling, cutover, nil)`, trailing payload bytes skipped |
| `EKVTHLC*` w/ unknown version byte | `(0, 0, ErrSnapshotHeaderUnknownMagic)`, fail-closed |
| v2 magic + `len < 16` or `len > 1024` | `(0, 0, ErrSnapshotHeaderInvalidLength)`, fail-closed (DoS bound) |
| No `EKVTHLC` prefix | `(0, 0, nil)`, headerless legacy, bytes preserved in `bufio.Reader` |
| Stream `< 8` bytes | `(0, 0, nil)`, short-stream fallback, bytes preserved |

### Verification

- 11 new tests in `kv/snapshot_test.go` per design §5.
- 4 pre-8a regression tests (TestSnapshot, TestFSMSnapshotPreservesCeiling, TestFSMSnapshotRestoreOldFormat, TestFSMSnapshotRestoreSmallLegacy) stay green.
- `go test -race ./kv/...` clean.
- `golangci-lint --config=.golangci.yaml run ./kv/...` 0 issues.

### Self-review (5-lens)

1. **Data loss**: v1 + headerless restore byte-for-byte (TestWriteSnapshotHeader_PreCutoverWritesV1 pins the v1 bytewise output; TestFSMSnapshotRestoreOldFormat / TestFSMSnapshotRestoreSmallLegacy stay green).
2. **Concurrency / distributed failures**: `noDowngradeLatch` uses `atomic.Uint64`; v1/v2 selection is captured at Snapshot() time so concurrent sidecar mutation cannot flip the format mid-write.
3. **Performance**: the header is a stack-allocated `[26]byte`; v2 payload alloc bounded by `maxSnapshotHeaderPayload = 1024`. The reader uses `bufio.Reader.Discard` / `io.ReadFull`; no per-byte copies.
4. **Data consistency**: the no-downgrade latch enforces §3.3; cutover plumbed through `kvFSM.restoredCutover` for Stage 6E. `cutover == 0` from a v2 snapshot routes correctly via Stage 6E's precondition guard.
5. **Test coverage**: every read-path branch + write-path selector + latch + restore-side handoff has a dedicated test. A bytewise pin on the v1 output guards against any future regression.

## Test plan

- [x] `go test -race ./kv/...` clean
- [x] `golangci-lint --config=.golangci.yaml run ./kv/...` clean
- [x] All 11 new tests + 4 legacy regression tests pass
- [ ] @claude review for the read-path branch correctness, the no-downgrade latch semantics, and the §3.4 cutover-handoff API

Summary

Slices the parent design's Stage 8 ('Snapshot header v2 + WAL coverage') into 8a (header v2) and 8b (WAL coverage). 8a is the lower-risk slice: a versioned snapshot header that carries the `raft_envelope_cutover_index` alongside the existing HLC ceiling.

Why

Once Stage 6E lands the `enable-raft-envelope` cutover, every Phase-2-aware snapshot needs to carry the cutover index so a node restoring from snapshot can route subsequent Raft entries through the correct wrap/unwrap path. Without 8a, the cutover would have to be side-channeled out-of-band through the sidecar, forking the source of truth for FSM-level decisions away from the snapshot it was applied against.

Architecture (settled in parent §4.4)

Distinct 8-byte magic per version (`EKVTHLC1` / `EKVTHLC2`). The rejected alternative (heuristic on shared magic) misclassifies early-epoch HLC ceilings as v2.

Properties preserved

- Headerless legacy snapshots (`TestFSMSnapshotRestoreOldFormat` / `TestFSMSnapshotRestoreSmallLegacy` regression tests) still restore via the leave-bytes-in-stream fallback.

Sequencing constraint

8a MUST ship before Stage 6E: 6E populates the sidecar's `raft_envelope_cutover_index`, and 8a's v2 writer is the carrier for that value across snapshot restore.

7-test verification plan covers

`EKVTHLC*`, headerless.

Test plan

🤖 Generated with Claude Code