Skip to content

feat(encryption): storage envelope wiring (Stage 2)#742

Open
bootjp wants to merge 14 commits intofeat/encryption-sidecarfrom
feat/encryption-storage
Open

feat(encryption): storage envelope wiring (Stage 2)#742
bootjp wants to merge 14 commits intofeat/encryption-sidecarfrom
feat/encryption-storage

Conversation

@bootjp
Copy link
Copy Markdown
Owner

@bootjp bootjp commented May 6, 2026

Summary

Stage 2 of the data-at-rest encryption rollout (design doc: docs/design/2026_04_29_proposed_data_at_rest_encryption.md, §4.1). Wires the AES-256-GCM storage envelope into the pebble-backed MVCCStore. Encryption is opt-in via the new WithEncryption option, so every existing NewPebbleStore caller is untouched.

  • Value header byte 0 now packs tombstone | encryption_state(2 bits) | reserved(5 bits) per design §7.1; reserved encryption_state values fail closed with ErrEncryptedValueReservedState.
  • AAD record-binding (HeaderAADBytes(...) ‖ encoded_pebble_key) defeats §4.1 case-2 cut-and-paste and case-3 MVCC version substitution; tag mismatches surface as typed ErrEncryptedReadIntegrity.
  • NonceFactory interface is the seam Stage 7 (writer registry) replaces; today's CounterNonceFactory test helper produces the design's deterministic 12-byte (node_id ‖ local_epoch ‖ write_count) shape.
  • MVCC-deconstructed snapshot format keeps cleartext-only round-trip (a TODO points at Stage 8 which bumps the wire to v3); native Pebble snapshots ship raw bytes and are encryption-transparent.

Out of scope (deferred)

  • Stage 3: raft proposal envelope + engine pre-apply hook
  • Stage 5/6: EncryptionAdmin gRPC + 3-phase cluster rollout flags
  • Stage 7: writer registry (replaces CounterNonceFactory)
  • Stage 8: MVCC snapshot v3 to round-trip encrypted entries
  • Stage 9: KMS, Snappy compression, rotation/retire/rewrite, full Jepsen

Test plan

  • go test -race ./store/... round-trip + tamper + AAD record-binding + property test (rapid)
  • Existing store suite green on the cipher=nil legacy path
  • golangci-lint clean
  • go test -race ./adapter/... (CI verifies; local 5-minute timeout flake observed independently of these changes — adapter does not consume the encryption wiring)
  • make test full sweep (CI)

Self-review (CLAUDE.md 5 passes)

  1. Data loss — tombstone bit isolated from encryption_state; reserved encState rejected fail-closed; cipher=nil legacy data still readable; per-value encryption_state is the dispatch key on read so a mid-cutover cluster stays coherent.
  2. ConcurrencyCipher is already lock-free over atomic.Pointer[map]; pebbleStore adds three immutable fields set at construction; NonceFactory contract documents caller responsibility for (node_id, local_epoch, write_count) uniqueness.
  3. Performance — cipher=nil fast path is one nil branch in encryptForKey; encrypted path is one GCM seal + 34 bytes overhead; rapid property test has 100+ encrypted PutAt/GetAt round-trips and stays well under the existing test wall budget.
  4. Consistency — AAD binds (envelope_version, flag, key_id, encoded_pebble_key); cut-and-paste, version substitution, and tag tampering all rejected; encrypted reads never silent-zero.
  5. Test coverage — round-trip (4 sizes), ApplyMutations, tombstone independence, the §7.1 active-key window, AAD binding, tag tamper, reserved encState, header packing pin, rapid property; all under -race.

Wire the §4.1 storage envelope into the pebble-backed MVCCStore.
Every value handed to PutAt / ApplyMutations / ExpireAt is wrapped
in an authenticated AES-256-GCM envelope when a DEK is active for
the storage purpose; reads on the way out unwrap. Cleartext
behaviour stays the default — the cipher is opt-in via
WithEncryption, so every existing NewPebbleStore caller is
untouched.

Value header layout. The 9-byte fixed header now carries an
encryption_state field per design §7.1:

  byte 0: bit 0 tombstone | bits 1-2 encryption_state | bits 3-7 reserved
  bytes 1-8: ExpireAt (uint64 LE)
  bytes 9..: body, plaintext when encState=0b00, §4.1 envelope
             bytes when encState=0b01

Reserved encryption_state values (0b10, 0b11) are rejected with
the new ErrEncryptedValueReservedState sentinel so an old binary
cannot silently treat a future-format encrypted entry as cleartext.
Reserved bits in the header (3-7) are also fail-closed.

Wiring. WithEncryption takes a Cipher, a NonceFactory, and an
ActiveStorageKeyID closure. The closure-based active-key resolver
lets Stage 5/6 swap in a sidecar-backed source without re-touching
the storage layer. The NonceFactory interface is the seam Stage 7
will replace with the writer-registry-backed implementation;
todays CounterNonceFactory test helper produces the same 12-byte
node_id-local_epoch-write_count shape and satisfies the contract
for tests.

AAD record-binding. AAD is HeaderAADBytes(version, flag, key_id)
concat encoded_pebble_key, so a disk attacker who copies a valid
ciphertext into a different (key, ts) slot fails GCM verification.
The §4.1 case-2/3 cut-and-paste / version-substitution attacks
are all handled by AES-GCM authentication, never silently zeroed.
ErrEncryptedReadIntegrity surfaces tag mismatches as a typed read
error.

Snapshot path. Native Pebble snapshots (snapshot_pebble.go) ship
raw bytes and are encryption-transparent. The MVCC-deconstructed
snapshot format does not carry encryption_state in its v2 wire
(setEncodedVersionInBatch writes encStateCleartext, with a comment
pointing at Stage 8 which bumps the format to v3 to round-trip
encrypted entries). This intentionally limits Stage 2 to the
on-line Put/Get path; ranged restore from an encrypted dataset is
covered when Stage 8 lands.

Tests. Round-trip across PutAt/GetAt/ApplyMutations, tombstone
independence, the §7.1 cipher-wired-but-DEK-inactive window,
AAD record-binding (cut-and-paste from key A's slot into key B's
fails as ErrEncryptedReadIntegrity), 1-byte tag flip, reserved
encState rejection, header-encoding pin, and a rapid property
covering many random (key, value, ts) draws. The existing store
suite stays green on the cipher=nil legacy path.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 6, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3f4fa67c-0c0d-4702-a5dc-195b37097a2d

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/encryption-storage

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements transparent value-level encryption for the Pebble-backed store, introducing an authenticated envelope that binds ciphertext to storage keys via AAD. It adds the necessary glue logic for encryption/decryption, a nonce factory for unique AES-GCM nonces, and updates the value encoding format to support encryption state flags. Review feedback suggests optimizing the hot paths by pre-allocating AAD buffers in both the encryption and decryption functions to reduce memory reallocations.

I am having trouble creating individual review comments. Click here to see my feedback.

store/encryption_glue.go (89-91)

medium

The AAD buffer is allocated twice here: once inside AppendHeaderAADBytes(nil, ...) and again when appending pebbleKey. Pre-allocating the buffer with the correct capacity avoids unnecessary reallocation on the hot path. Ensure that the total size is bounded to prevent potential OOM issues from external inputs.

aad := make([]byte, 0, encryption.HeaderAADSize+len(pebbleKey))
aad = encryption.AppendHeaderAADBytes(aad, encryption.EnvelopeVersionV1, envelopeFlag, keyID)
aad = append(aad, pebbleKey...)
References
  1. Pre-allocating a buffer is acceptable if a strictly enforced maximum size limit is checked before allocation.
  2. Apply a fixed bound to collections that can grow from external requests to prevent unbounded memory growth and OOM issues.

store/encryption_glue.go (131-132)

medium

Similar to the encryption path, the AAD buffer here is allocated twice. Pre-allocating with the known capacity improves performance on the read path. Ensure the allocation size is bounded to avoid memory exhaustion.

aad := make([]byte, 0, encryption.HeaderAADSize+len(pebbleKey))
aad = encryption.AppendHeaderAADBytes(aad, env.Version, env.Flag, env.KeyID)
aad = append(aad, pebbleKey...)
References
  1. Pre-allocating a buffer is acceptable if a strictly enforced maximum size limit is checked before allocation.
  2. Apply a fixed bound to collections that can grow from external requests to prevent unbounded memory growth and OOM issues.

CI gci default sections expect bootjp imports mixed with externals
in the second group; the original commit accidentally landed the
3-group format (stdlib / external / bootjp) which fails the
golangci-lint v2.9.0 gci check used in CI but passed locally
because of a stale --fix cache. gci diff confirms the 2-group
format is the canonical layout for this config.
@chatgpt-codex-connector
Copy link
Copy Markdown

💡 Codex Review

aad := encryption.AppendHeaderAADBytes(nil,
encryption.EnvelopeVersionV1, envelopeFlag, keyID)
aad = append(aad, pebbleKey...)
ciphertextAndTag, err := s.cipher.Encrypt(plaintext, aad, keyID, nonce)

P1 Badge Bind value-header bytes into AEAD AAD

The AAD only covers the envelope header and Pebble key, but not the outer value header (flags + ExpireAt), so a disk attacker can flip encState, tombstone, or TTL bits without invalidating the GCM tag. In particular, changing encState from encrypted to cleartext makes reads return raw envelope bytes via the cleartext branch instead of raising ErrEncryptedReadIntegrity, which is a silent integrity failure rather than fail-closed behavior.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

…emini medium)

AppendHeaderAADBytes(nil, ...) allocates HeaderAADSize bytes; the
following append(aad, pebbleKey...) then re-allocates whenever the
returned slice's capacity is below header + key. Two allocations
per Put, two per Get.

Pre-size to HeaderAADSize+len(pebbleKey) so neither append
reallocates. Pebble key length is bounded by maxPebbleEncodedKeySize
so the capacity is finite. Symmetric on encrypt/decrypt because the
AAD must be byte-identical between the two paths or GCM
verification fails.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented May 7, 2026

@claude review

PR742 round-2: gemini medium x2 を反映しました (commit 7ac2445)。

  • encryptForKey / decryptForKey 双方で AAD を make([]byte, 0, encryption.HeaderAADSize+len(pebbleKey)) で pre-size。AppendHeaderAADBytes 内 alloc + append(aad, pebbleKey...) re-alloc の二重アロケーションを単一バッファに集約。
  • bound: pebbleKey は maxPebbleEncodedKeySize で上限あり、unbounded growth リスクなし。
  • symmetric: encrypt/decrypt 両側で同じバッファ構築方法を取らないと GCM tag 検証が一致しないため、両側とも対象的に修正。
  • store test (encryption + property) green、lint 0 issues。

@claude
Copy link
Copy Markdown

claude Bot commented May 7, 2026

Claude finished @bootjp's task in 5m 58s —— View job


PR742 Round-2 Review

  • Read changed files and gather context
  • Review AAD pre-sizing fix (commit 7ac2445)
  • Assess Codex P1: value-header bytes not bound into AAD
  • Review overall encryption correctness and security
  • Post findings

AAD pre-sizing fix (7ac2445) — Correct ✅

Both encryptForKey and decryptForKey now pre-allocate the AAD buffer with make([]byte, 0, encryption.HeaderAADSize+len(pebbleKey)) and fill it via AppendHeaderAADBytes + a single append. There is no double-allocation anymore. The capacity bound is sound: pebbleKey is the encoded Pebble key (user_key ‖ inverted_ts) whose length is bounded by maxPebbleEncodedKeySize. The two paths are byte-symmetric — required since GCM tag verification would fail otherwise.

Gemini's two medium findings are fully addressed.


Codex P1 — Value-header bytes not in AAD

The Codex concern is real, but it is a deliberate design trade-off that needs to be surfaced explicitly.

What the design doc says. §2.2 of docs/design/2026_04_29_proposed_data_at_rest_encryption.md explicitly accepts this:

TTL and tombstone bits. The 9-byte value header ([tombstone+enc_state(1)] [expireAt(8)]) is intentionally cleartext so the GC sweep and TTL expiry can run without decrypting every value.

The outer header byte (tombstone + encState bits) and ExpireAt are cleartext by design so runCompactionGC and isVisibleLiveKey can inspect them without a DEK.

The specific gap Codex names. Flipping encState on disk from 0b010b00 makes decryptForKey take the fast-path return and hand the raw GCM envelope bytes back to the caller as the user value — no error, no integrity signal, just 34+ bytes of opaque ciphertext silently surfaced as "plaintext." This is different from the tombstone/expiry flip cases (those produce ErrKeyNotFound / key-not-found semantics), and is not explicitly called out in the design doc.

Is it a blocker? Not by itself — the design doc acceptance of the cleartext header covers this flip path too. But there are two paths forward worth choosing between:

Option A (preferred): Bind the flags byte into the AAD. The GC operations only need tombstone and expireAt — they do NOT need encState to be cleartext. Including byte 0 of the value header in the AAD closes the encState-flip silent-corruption without breaking the GC-without-decrypt invariant:

// encrypt side — encState is always encStateEncrypted here (we only
// call encryptForKey when the cipher is active), tombstone is always
// 0 for live writes:
const encryptedLiveFlags byte = encStateEncrypted << encStateShift // 0x02
aad = append(aad, encryptedLiveFlags)

// decrypt side — read the flags byte we stored:
aad = append(aad, flags) // flags = data[0] as parsed by decodeValue

If an attacker flips the flags byte after writing, the AAD at decrypt time (data[0]) differs from the AAD at encrypt time (0x02), so GCM tag verification fails → ErrEncryptedReadIntegrity. This would require a design-doc amendment (§4.1 AAD definition changes) and is therefore a follow-up commit rather than in-flight work. Fix this →

Option B (current + doc): Document the behavior. If including the flags byte in AAD is deferred (format change, design-doc amendment needed), add a // INVARIANT comment to decryptForKey explicitly stating that encState-flip causes silent-garbage return per design §2.2, and add a test that asserts the current behavior so any future change to that behavior is visible:

// TestEncryption_EncStateFlagFlip documents the §2.2 design
// trade-off: the flags byte is not in the AAD, so flipping encState
// 0b01→0b00 makes the read return raw envelope bytes rather than
// ErrEncryptedReadIntegrity. This is accepted per §2.2.

Option A is worth a one-line design-doc amendment and a follow-up commit. Option B unblocks merge now.


Other observations

isVisibleLiveKey (lsm_store.go:1362) — Correct. Calls decodeValue but never reads sv.Value — only checks tombstone/expiry. No decryption needed, no issue.

MVCC snapshot restore — Correct. setEncodedVersionInBatch explicitly passes encStateCleartext with a Stage 8 TODO. The native Pebble snapshot path (snapshot_pebble.go) copies raw bytes verbatim (encryption-transparent). Both paths are correctly handled.

CounterNonceFactory file placement — Minor nit. encryption_test_helpers.go is not a _test.go file, so CounterNonceFactory and NewCounterNonceFactory are exported from the store package into any importer. The comment explains this is intentional for cross-package test use, but the filename implies test-only. Consider renaming to encryption_nonce_counter.go with a prominent "test-only / not safe for production" doc comment, so go doc store doesn't present it as a production API.

Prop test coverage — Good. TestEncryption_Property_PutGet in lsm_store_encryption_prop_test.go exercises random (key, value, ts) triples including empty values and full uint64 timestamp range. The comment correctly notes tamper-rejection is deterministic in the unit tests. nonEmptyBytes is shared from mvcc_store_prop_test.go within the same package store test binary — no issue.

Reserved encState detection — Correct. The dual-check in decodeValue (bits 3–7 via encStateReservedMask, then bits 1–2 for 0b10/0b11) closes both reserved paths and is validated by TestEncryption_ReservedEncStateRejected.


Summary

Status
AAD pre-sizing (Gemini medium ×2) ✅ Addressed in 7ac2445
Codex P1: encState-flip → silent garbage ⚠️ Design-accepted per §2.2; silent-garbage-vs-error distinction not documented. Recommend Option A (include flags byte in AAD, design-doc amendment) as a follow-up, or Option B (add // INVARIANT comment + behaviour-pinning test) before merge.
isVisibleLiveKey, GC/TTL paths ✅ Correct (no decrypt needed)
Snapshot paths (MVCC v2 + native Pebble) ✅ Correct
Reserved encState trip-wire ✅ Correct
CounterNonceFactory export Minor — consider filename clarification

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7ac2445e46

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread store/lsm_store.go
Comment on lines +715 to +719
if sv.Tombstone {
return nil, ErrKeyNotFound
}
if sv.ExpireAt != 0 && sv.ExpireAt <= ts {
return nil, ErrKeyNotFound
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Authenticate header bits before serving not-found for encrypted rows

This path returns ErrKeyNotFound based on Tombstone/ExpireAt before any AEAD verification, but those fields come from the outer value header and are not covered by decryptForKey's AAD/body check. A disk attacker can flip bit 0 or lower ExpireAt on an encrypted record and force silent not-found/expired reads without triggering ErrEncryptedReadIntegrity, which defeats the storage-envelope tamper-detection goal for at-rest attacks.

Useful? React with 👍 / 👎.

The original AAD covered only (envelope_version, flag, key_id) and
the encoded Pebble key. The on-disk value-header (tombstone bit,
encryption_state, expireAt) was not authenticated, so a disk
attacker could:

  - flip the tombstone bit on an encrypted record so getAt /
    processFoundValue returns ErrKeyNotFound BEFORE any AEAD
    verification runs, or
  - lower expireAt below the read timestamp so the same paths
    return ErrKeyNotFound via the expired-skip branch.

In both cases the encryption-integrity error never surfaces and
the storage-envelope tamper-detection goal is silently defeated
for at-rest attacks.

Fix:

  - Add writeValueHeaderBytes helper that emits the 9-byte header
    (flags + expireAt) without a body slice.
  - encryptForKey now takes expireAt and includes the on-the-wire
    header bytes in the AAD; tombstone is hard-coded false because
    encrypt is never invoked on tombstone writes (deletes carry no
    plaintext and stay cleartext).
  - decryptForKey now takes the storedValue, reproduces the same
    9-byte header from sv.Tombstone / sv.ExpireAt / sv.EncState,
    and includes it in the AAD. A flip on disk no longer matches
    the encrypt-time AAD, so GCM fails and
    ErrEncryptedReadIntegrity surfaces.
  - readVisibleVersion / processFoundValue now decrypt FIRST and
    run the tombstone / expireAt visibility checks AFTER. Pre-
    decrypt header values are not yet authenticated, so the old
    early-return paths were the bypass.
  - buildStorageAAD centralises the AAD layout
    (header_aad + value_header + pebble_key) and pre-allocates the
    buffer so neither append re-allocates (preserves PR742 round-1
    gemini perf fix).

Caller audit (semantic change: pre-tombstone-check now decrypts):

  - encryptForKey callers: PutAt, ExpireAt, applyMutationsBatch
    OpTypePut. All three now thread expireAt to the helper. Tombstone
    writes (DeleteAt, applyMutationsBatch OpTypeDelete) bypass the
    encrypt path entirely.
  - decryptForKey callers: readVisibleVersion (getAt path),
    processFoundValue (scan path). Both reorder visibility checks
    to run after decrypt. Cleartext entries (encState=0b00)
    short-circuit decryptForKey at the first branch and keep the
    legacy unauthenticated tombstone/expireAt — the §7.1 rollout
    closes this for fully-encrypted clusters.

New regression tests cover three on-disk flips that the original
AAD permitted: tombstone-bit set, expireAt lowered to past,
expireAt advanced. All surface ErrEncryptedReadIntegrity.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented May 7, 2026

@claude review

PR742 round-3: codex P1 Authenticate header bits before serving not-found for encrypted rows を反映しました (commit 624016d)。

攻撃面: 元 AAD は (envelope_version, flag, key_id, pebble_key) のみで、値ヘッダ (tombstone bit, encryption_state, expireAt) は未認証。ディスク攻撃者が tombstone bit を flip すれば silent ErrKeyNotFound、expireAt を lower すれば silent expired-skip を引き起こせ、ErrEncryptedReadIntegrity を経由しない bypass。

修正:

  • writeValueHeaderBytes helper で 9-byte header を body 無しで emit。
  • encryptForKey に expireAt 引数追加、9-byte header を AAD に bind。tombstone は false 固定 (encrypt 経路は plaintext 持ち入力のみ、Delete は cleartext-tombstone)。
  • decryptForKeystoredValue を受け取り、on-disk header bytes を AAD に同様に bind。ディスク上の flip は encrypt 時 AAD と不一致 → GCM 失敗 → ErrEncryptedReadIntegrity。
  • readVisibleVersion / processFoundValue: decrypt を tombstone/expireAt 検査より先に実行。pre-decrypt 値は未認証のため、AAD 検証成功後にのみ visibility 判断。
  • buildStorageAAD で AAD layout (header_aad + value_header + pebble_key) を集約、PR742 round-1 の pre-allocation 最適化を維持。

caller audit (意味論変更: pre-tombstone-check で decrypt):

  • encryptForKey: PutAt / ExpireAt / applyMutationsBatch OpTypePut の3箇所、全て expireAt を thread。Delete 経路は encrypt を bypass (tombstone は cleartext)。
  • decryptForKey: readVisibleVersion / processFoundValue の2箇所、visibility 検査を decrypt 後に reorder。
  • cleartext entry (encState=0b00) は decryptForKey 先頭で short-circuit、tombstone/expireAt は依然非認証 (Stage 7+ で全 encrypted 化により closed)。

新規テスト (TestEncryption_ValueHeaderTamperRejected): 3 ケース (tombstone-bit set / expireAt lower / expireAt advance)、全て ErrEncryptedReadIntegrity を確認。
make test/lint green。

@claude
Copy link
Copy Markdown

claude Bot commented May 7, 2026

Claude finished @bootjp's task in 2m 57s —— View job


PR742 Round-3 Review

  • Read trigger comment and understand request
  • Read changed files (encryption_glue.go, lsm_store.go, tests)
  • Review buildStorageAAD / writeValueHeaderBytes correctness
  • Verify decrypt-before-visibility-check reorder in readVisibleVersion / processFoundValue
  • Verify encryptForKey caller audit (expireAt threading)
  • Verify cleartext short-circuit path still correct
  • Review new test TestEncryption_ValueHeaderTamperRejected
  • Check for regressions in existing test coverage
  • Post findings

Round-3 changes — what was fixed

The core approach is correct and the two main read paths are now properly protected.

buildStorageAAD (encryption_glue.go:177) — AAD layout is now envelope_version ‖ flag ‖ key_id ‖ value_header(9B) ‖ pebble_key. Single allocation maintained from round-2. The value_header slice passed in is always [valueHeaderSize]byte (9 bytes), so the capacity calculation encryption.HeaderAADSize + len(header) + len(pebbleKey) is exact and no re-allocation occurs. ✅

encryptForKey (encryption_glue.go:89) — Builds the header AAD with tombstone=false, expireAt=caller-supplied, encState=encStateEncrypted. This is what will be written to disk, so it exactly matches the bytes that decryptForKey will reproduce from the on-disk storedValue. ✅

decryptForKey (encryption_glue.go:147) — Reproduces the AAD from sv.Tombstone, sv.ExpireAt, sv.EncState (all read from the on-disk header). If any field was flipped by a disk attacker, AAD at decrypt time ≠ AAD at encrypt time → GCM tag mismatch → ErrEncryptedReadIntegrity. ✅

readVisibleVersion (lsm_store.go:724)decryptForKey now runs before the tombstone/expireAt visibility checks. A flipped tombstone or lowered expireAt causes GCM failure before any branching. ✅

processFoundValue (lsm_store.go:768) — Same reorder applied. plain is returned instead of sv.Value. ✅

TestEncryption_ValueHeaderTamperRejected — Covers three cases (tombstone flip, expireAt lower, expireAt advance) over the GetAt / readVisibleVersion path. All assert ErrEncryptedReadIntegrity. ✅


P2: isVisibleLiveKey has the same unauthenticated-header gap (not covered by round-3)

isVisibleLiveKey (lsm_store.go:1392) is the only remaining decodeValue caller that inspects visibility fields without first running decryptForKey:

func (s *pebbleStore) isVisibleLiveKey(iter *pebble.Iterator, userKey []byte, version, commitTS uint64) (bool, error) {
    if !s.seekToVisibleVersion(iter, userKey, version, commitTS) {
        return false, nil
    }
    sv, err := decodeValue(iter.Value())          // ← sv.Tombstone / sv.ExpireAt unauthenticated
    if err != nil {
        return false, errors.WithStack(err)
    }
    if sv.Tombstone || (sv.ExpireAt != 0 && sv.ExpireAt <= commitTS) {
        return false, nil                          // ← branched on before any GCM check
    }
    return true, nil
}

This is called by scanDeletePrefix (lsm_store.go:1356), which writes deletion tombstones during prefix-delete operations. An attacker who flips the tombstone bit (or lowers expireAt) on an encrypted record causes isVisibleLiveKey to return false, and scanDeletePrefix silently skips writing the deletion tombstone. The key survives the prefix delete.

This is a weaker attack than the P1 reads (no plaintext is revealed — a subsequent GetAt will still fail with ErrEncryptedReadIntegrity because the read path is now protected), but it is an integrity bypass: a delete-prefix operation can be made to silently not delete an encrypted key.

The trigger comment's "caller audit" lists readVisibleVersion / processFoundValue as the two callers that needed reordering, but does not mention isVisibleLiveKey. The round-3 test covers only the GetAt path.

Suggested fix (one-line extension within isVisibleLiveKey):

func (s *pebbleStore) isVisibleLiveKey(iter *pebble.Iterator, userKey []byte, version, commitTS uint64) (bool, error) {
    if !s.seekToVisibleVersion(iter, userKey, version, commitTS) {
        return false, nil
    }
    sv, err := decodeValue(iter.Value())
    if err != nil {
        return false, errors.WithStack(err)
    }
    // For encrypted entries, authenticate the header before branching on
    // tombstone/expireAt — the same invariant enforced in readVisibleVersion
    // and processFoundValue (PR742 round-3).
    if sv.EncState == encStateEncrypted {
        if _, err := s.decryptForKey(iter.Key(), sv, sv.Value); err != nil {
            return false, err
        }
    }
    if sv.Tombstone || (sv.ExpireAt != 0 && sv.ExpireAt <= commitTS) {
        return false, nil
    }
    return true, nil
}

And a corresponding TestEncryption_ValueHeaderTamperRejected_DeletePrefix case (or a new sub-case in the existing table) that calls DeletePrefixAt over a prefix containing a tampered encrypted key and asserts ErrEncryptedReadIntegrity propagates rather than a silent no-op tombstone.

Fix this →


Minor: encStateCleartext flip in decryptForKey short-circuit (design-accepted, unchanged from round-2)

Flipping encState from 0b010b00 on disk still makes decryptForKey short-circuit and return the raw envelope bytes as plaintext (the if sv.EncState == encStateCleartext early return). The round-3 fix does NOT address this because sv.EncState is a separate field from tombstone/expireAt and the design-doc §2.2 acceptance from round-2 still applies. This is documented in the round-2 review and is an Option A / Option B choice for a future amendment. No new finding here.


Summary

Finding Status
buildStorageAAD / writeValueHeaderBytes correctness ✅ Correct
encryptForKey header binding (tombstone=false, expireAt, encState) ✅ Correct
decryptForKey AAD reproduction from on-disk fields ✅ Correct
readVisibleVersion decrypt-before-visibility ✅ Fixed in round-3
processFoundValue decrypt-before-visibility ✅ Fixed in round-3
TestEncryption_ValueHeaderTamperRejected (GetAt path) ✅ Correct
isVisibleLiveKey in scanDeletePrefix — unauthenticated header ⚠️ P2: Not covered by round-3. Same class of tamper bypass, narrower attack vector (delete-path, not read-path).
encStateCleartext flip silent-garbage return ⚠️ Design-accepted per §2.2 (unchanged from round-2, not a regression)

The read path is now fully protected. isVisibleLiveKey is the remaining gap worth a follow-up commit before merge.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 624016d52e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread store/encryption_glue.go
Comment on lines +142 to +143
if sv.EncState == encStateCleartext {
return body, nil
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Reject encrypted rows relabeled as cleartext

When decryptForKey sees encStateCleartext, it returns the body without any envelope/tag verification. A disk attacker (or bit-flip corruption) can change an encrypted row’s header bits from 0b01 to 0b00, causing reads to return raw envelope bytes as if they were legitimate plaintext instead of surfacing ErrEncryptedReadIntegrity. This is a fail-open integrity bypass for encrypted-at-rest data because the authentication check is skipped entirely once the header is relabeled.

Useful? React with 👍 / 👎.

The cleartext branch of decryptForKey returned the body verbatim
without any envelope verification. A disk attacker who flipped
encryption_state from 0b01 to 0b00 left the original envelope bytes
in place and told the read path to skip decryption — silently
surfacing raw envelope bytes to the caller as if they were
legitimate plaintext. This is a fail-open integrity bypass for
encrypted-at-rest data because the AEAD tag check is skipped
entirely once the header is relabeled.

Fix:

  - Add Cipher.HasKey accessor in the encryption package, returning
    false on nil/zero-value receivers (mirrors the existing
    zero-value contract) and delegating to Keystore.Has otherwise.
  - On the cleartext branch in decryptForKey, run the body through
    DecodeEnvelope. If it parses as a well-formed envelope AND the
    embedded key_id is loaded in the live keystore (i.e. the bytes
    are almost certainly a relabeled real envelope rather than a
    coincidental shape match), surface ErrEncryptedReadIntegrity
    instead of returning the bytes.
  - Bodies that don't parse, or whose parsed key_id is unknown,
    keep the legitimate-cleartext fast path so a Stage-2-style
    cutover (legacy cleartext entries + new encrypted entries
    coexisting) stays readable.

False-positive bound: a benign cleartext payload would have to
start with EnvelopeVersionV1 (0x01), be at least EnvelopeOverhead
(34) bytes, parse cleanly through DecodeEnvelope, AND embed a 32-bit
key_id collision with a loaded DEK. Joint probability is small
enough to treat the heuristic as production-safe; Stage 8's
authenticated MVCC metadata bit will replace it with a deterministic
authentication.

Caller audit: rejectRebadgedEnvelope is called only by
decryptForKey on the cleartext branch. The cipher=nil short-circuit
preserves the legacy single-mode deployment path. The existing
TestEncryption_InactiveKeyWritesCleartext test exercises legitimate
cleartext+encrypted coexistence and continues to pass — its
cleartext payloads don't parse as envelopes.

New regression test TestEncryption_RebadgeAttackRejected uses a
distinguishable writeTS (314159 vs the other tamper tests' 100) to
also satisfy the unparam linter on tamperPebbleValue's ts arg.

New foundation API: TestCipher_HasKey covers the loaded id, an
unknown id, and the nil-receiver / zero-value paths.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented May 7, 2026

@claude review

PR742 round-4: codex P1 Reject encrypted rows relabeled as cleartext を反映しました (commit eb7edd4)。

攻撃面: encryption_state を 0b01→0b00 に flip すると envelope bytes が cleartext として返り、AEAD 検証を完全 bypass、生 ciphertext+tag+nonce が caller に届く fail-open。

修正:

  • Cipher.HasKey(keyID) アクセサを encryption package に追加 (nil/zero-value 安全、既存 zero-value contract と対称)。
  • decryptForKey の cleartext 分岐で rejectRebadgedEnvelope ガードを実行。body が DecodeEnvelope で well-formed かつ embedded key_id が live keystore に存在する場合 (ほぼ確実に relabeled された真の envelope)、ErrEncryptedReadIntegrity で fail closed。
  • parse 失敗 / key_id 未ロード時は legitimate cleartext fast-path を維持 → Stage 2 の混在 (cleartext + encrypted) 互換性確保 (TestEncryption_InactiveKeyWritesCleartext が回帰テスト)。

false-positive bound: 偶然の cleartext payload が EnvelopeVersionV1 で始まり / EnvelopeOverhead 以上 / DecodeEnvelope を通過 / 32-bit key_id collision まで揃う必要があり、結合確率は低い。Stage 8 の MVCC metadata 認証 bit で deterministic 化される。

caller audit:

  • rejectRebadgedEnvelopedecryptForKey の cleartext 分岐からのみ。
  • cipher=nil short-circuit は legacy single-mode deployment 経路を保持。
  • Cipher.HasKey: nil/zero-value 受信器で false (Encrypt/Decrypt の zero-value contract と対称)。

新規テスト:

  • TestCipher_HasKey (4 ケース: 既知 id / 未知 id / nil receiver / zero-value)。
  • TestEncryption_RebadgeAttackRejected (encState flip 0b01→0b00 → ErrEncryptedReadIntegrity)。distinguishable writeTS (314159) で tamperPebbleValue の unparam lint も解消。

make test/lint green。

@claude
Copy link
Copy Markdown

claude Bot commented May 7, 2026

Claude finished @bootjp's task in 3m 5s —— View job


PR742 Round-4 Review

  • Read trigger comment and understand request
  • Read round-4 diff (commit eb7edd4)
  • Review Cipher.HasKey implementation and tests
  • Review rejectRebadgedEnvelope logic and false-positive analysis
  • Verify decryptForKey cleartext branch correctness
  • Check if round-3 P2 (isVisibleLiveKey) was addressed
  • Review new tests (TestCipher_HasKey, TestEncryption_RebadgeAttackRejected)
  • Post findings

Codex P1 rebadge guard — Correct ✅

The encState 0b01→0b00 flip is now closed by rejectRebadgedEnvelope. When an attacker flips the encryption_state header bit on disk, decryptForKey enters the encStateCleartext branch and calls rejectRebadgedEnvelope(body). The GCM-encrypted envelope parses successfully via DecodeEnvelope, env.KeyID is a loaded DEK, and HasKey returns true → ErrEncryptedReadIntegrity. The attack is rejected fail-closed.

This also closes the round-2 "design-accepted" concern (Option A from round-2 review). That finding was previously marked as design-accepted per §2.2 because including flags in the AAD required a design-doc amendment; the rebadge guard is an alternative mechanical fix that covers the same flip without modifying the AAD layout. Both paths are now blocked.


Cipher.HasKey (cipher.go:108) — Correct ✅

Nil-receiver and zero-value Cipher both short-circuit at c == nil || c.keystore == nil. This is symmetric with the aeadFor boundary in Encrypt/Decrypt. Keystore.Set rejects ReservedKeyID=0, so HasKey(0) always returns false in a properly provisioned store — no false trigger on zero-valued key IDs.


rejectRebadgedEnvelope (encryption_glue.go:194) — Correct ✅

Three no-op guards are correct:

  1. s.cipher == nil → legacy cleartext-only mode, no rebadge attack possible ✅
  2. len(body) < EnvelopeOverhead (34 bytes) → can't parse as envelope ✅
  3. DecodeEnvelope fails → treat as legitimate cleartext (nolint:nilerr is justified; the comment makes the intent explicit) ✅
  4. !s.cipher.HasKey(env.KeyID) → body looks envelope-shaped but key not loaded → cleartext ✅

The false-positive path (legitimate cleartext accidentally matching all four conditions) requires:

  • First byte = 0x01 (EnvelopeVersionV1)
  • Body ≥ 34 bytes
  • DecodeEnvelope parse passes (structured format with specific field widths)
  • 32-bit key_id collision with a loaded DEK

This is bounded and documented. Stage 8 makes it deterministic.


TestCipher_HasKey (cipher_test.go) — Complete ✅

All four cases covered: known id (true), unknown id (false), nil receiver (false), zero-value Cipher (false). The zero-value test correctly takes the address of the local — c.keystore == nil fires, returns false without reaching Keystore.Has.


TestEncryption_RebadgeAttackRejected (lsm_store_encryption_test.go:414) — Correct ✅

raw[0] &^= encStateMask clears bits 1-2 (encStateMask = 0b0000_0110) from the flags byte, mapping encStateEncrypted<<10b00 while leaving the tombstone bit (bit 0) untouched. The writeTS of 314159 (distinct from the other tamper tests' ts=100) is correct and eliminates the unparam lint concern on tamperPebbleValue. The GetAt path exercises readVisibleVersiondecryptForKeyrejectRebadgedEnvelopeErrEncryptedReadIntegrity. ✅


Round-3 P2: isVisibleLiveKey — Still open ⚠️

isVisibleLiveKey (lsm_store.go:1392) was not modified in round-4. It still branches on sv.Tombstone / sv.ExpireAt without calling decryptForKey first:

sv, err := decodeValue(iter.Value())           // header unauthenticated
...
if sv.Tombstone || (sv.ExpireAt != 0 && ...) { // branchable before GCM check
    return false, nil
}

This is called by scanDeletePrefix (lsm_store.go:1356). An attacker who flips the tombstone bit or lowers expireAt on an encrypted entry can cause scanDeletePrefix to silently skip the deletion tombstone write — the encrypted key survives the prefix-delete operation. Unlike the read-path tampers fixed in rounds 3–4, this is a write-side integrity bypass: the outcome is a persisted decision (a missing tombstone in Pebble) rather than a returned wrong value.

Note that the rejectRebadgedEnvelope guard is NOT in this path — isVisibleLiveKey never calls decryptForKey. The rebadge guard only protects the encState flip attack; the tombstone/expireAt flip in isVisibleLiveKey is a separate and orthogonal gap.

Suggested fix (same as round-3):

func (s *pebbleStore) isVisibleLiveKey(iter *pebble.Iterator, userKey []byte, version, commitTS uint64) (bool, error) {
    if !s.seekToVisibleVersion(iter, userKey, version, commitTS) {
        return false, nil
    }
    sv, err := decodeValue(iter.Value())
    if err != nil {
        return false, errors.WithStack(err)
    }
    // Authenticate the header before branching on tombstone/expireAt —
    // same invariant enforced in readVisibleVersion / processFoundValue.
    if sv.EncState == encStateEncrypted {
        if _, err := s.decryptForKey(iter.Key(), sv, sv.Value); err != nil {
            return false, err
        }
    }
    if sv.Tombstone || (sv.ExpireAt != 0 && sv.ExpireAt <= commitTS) {
        return false, nil
    }
    return true, nil
}

And a TestEncryption_ValueHeaderTamperRejected_DeletePrefix sub-case (or new test function) that calls DeletePrefixAt over a key whose tombstone/expireAt was flipped and asserts ErrEncryptedReadIntegrity propagates instead of a silent no-op.

Fix this →


Summary

Finding Status
Codex P1: encState-flip rebadge attack (decryptForKey) ✅ Fixed in round-4
Round-2 design-accepted: encState-flip → raw envelope bytes ✅ Also closed by rejectRebadgedEnvelope
Cipher.HasKey nil/zero-value safety ✅ Correct
rejectRebadgedEnvelope guard logic ✅ Correct
False-positive boundary analysis ✅ Bounded and documented; Stage 8 makes it deterministic
nolint:nilerr suppression ✅ Justified and commented
TestCipher_HasKey (4 cases) ✅ Complete
TestEncryption_RebadgeAttackRejected ✅ Correct
Round-3 P2: isVisibleLiveKey unauthenticated header in scanDeletePrefix ⚠️ Still open — write-side integrity bypass on prefix-delete path

The read path (GetAt, scan, GetRange) is now fully protected through all four rounds. isVisibleLiveKey is the one remaining gap before the encryption envelope can be considered complete for all write-path visibility decisions.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: eb7edd4bc6

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread store/lsm_store.go
if sv.ExpireAt != 0 && sv.ExpireAt <= ts {
return nil, ErrKeyNotFound
}
return plain, nil
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve empty-value existence on decrypted reads

Return-value semantics changed in readVisibleVersion: decrypted bytes are returned directly, but AES-GCM Open can return nil for a valid zero-length plaintext. In the encryption-enabled path this makes GetAt return nil, nil for empty values, and ExistsAt then reports the key as absent because it checks val != nil. This regresses correctness for any workload that stores empty values and relies on ExistsAt (e.g., key-existence checks after writing an empty payload).

Useful? React with 👍 / 👎.

Comment thread store/encryption_glue.go Outdated
Comment on lines +207 to +212
if !s.cipher.HasKey(env.KeyID) {
// Body parses but the embedded key_id is not loaded — the
// joint shape+keystore-collision probability is low enough
// to treat this as legitimate cleartext that happens to
// look envelope-shaped.
return nil
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Reject rebadged envelopes even when key_id is altered

The cleartext rebadge guard is bypassable because it only errors when the parsed envelope key_id is currently loaded. A disk attacker can flip encryption_state to cleartext and also modify the envelope’s key_id bytes to any unloaded value; rejectRebadgedEnvelope then returns nil, and the read path serves raw envelope bytes as plaintext instead of surfacing integrity failure. This still violates the fail-closed tamper-detection goal for encrypted-at-rest rows.

Useful? React with 👍 / 👎.

…x P1 round-4)

Two new P1 codex findings on the round-3 commit, both legitimate:

1. Empty-plaintext semantics regression. AES-GCM Open returns a nil
   dst slice for a zero-length plaintext, and ExistsAt distinguishes
   "key absent" from "key present with empty value" via val != nil.
   In the encryption-enabled path GetAt was therefore returning
   (nil, nil) for empty values and ExistsAt reported the key as
   missing. Normalize to []byte{} on the decrypt success path so
   the empty-value contract matches the cleartext fast path.

2. Rebadge guard bypassable. The previous round's heuristic only
   rejected when the embedded key_id was loaded in the live
   keystore. A disk attacker who flips encryption_state to cleartext
   can also rewrite the envelope's key_id bytes to any unloaded
   value, falling out of the keystore lookup and serving the
   relabeled bytes as plaintext. The strengthened guard rejects
   whenever DecodeEnvelope succeeds — independent of key_id —
   accepting the trade-off that legitimate cleartext payloads
   parsing as well-formed envelopes (extremely rare under the
   length/version/flag/nonce constraints) will also fail closed.
   Stage 8's authenticated MVCC metadata bit replaces this
   heuristic with a deterministic check.

Caller audit (semantic change: tightened cleartext rejection):

  - decryptForKey is the only caller of rejectRebadgedEnvelope; the
    cleartext fast path otherwise is unchanged.
  - The cipher=nil short-circuit is preserved so legacy single-mode
    deployments are not affected.
  - Existing tests (TestEncryption_InactiveKeyWritesCleartext +
    others) write short cleartext bodies that decode-envelope
    rejects on length, so the strict guard does not break them.

Cipher.HasKey was added in round-3 specifically to power the
keystore-collision narrow check; with that check removed it has no
production caller, so it is removed too rather than left as
dead-code surface.

New regression tests:

  - TestEncryption_RebadgeAttackRejected gains the
    "encState flipped AND key_id rewritten to unloaded" case so a
    future regression of the bypass surface fails the suite.
  - TestEncryption_EmptyValueExistsAt locks down the empty-value
    Put → GetAt → ExistsAt = true contract under encryption.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented May 7, 2026

@claude review

PR742 round-5: codex P1 x2 を反映しました (commit 1bda846)。

P1-A: Empty-plaintext regression (lsm_store.go:744)

  • AES-GCM Open は空 plaintext で nil dst を返す → GetAt が (nil, nil) を返し ExistsAt (val != nil チェック) が空 value のキーを "absent" 誤判定。
  • decryptForKey の Decrypt 成功後に if plain == nil { plain = []byte{} } で正規化、cleartext fast-path と空値契約を揃える。

P1-B: Rebadge guard bypass (encryption_glue.go:212)

  • 攻撃者が encState flip + 同時に envelope key_id を未ロード値 (例 0xFFFFFFFF) に書き換え → 旧 keystore-collision check は keystore 未存在を見て nil 返し fail-open。
  • rejectRebadgedEnvelope を強化: DecodeEnvelope が parse 成功した時点で ErrEncryptedReadIntegrity を fail-closed (key_id ロード状態とは無関係)。
  • trade-off: 偶然 envelope-shape を持つ cleartext payload も reject されるが、length+version+flag+nonce 制約が揃う確率は低く許容。Stage 8 の MVCC metadata 認証 bit で deterministic 化される。

caller audit:

  • rejectRebadgedEnvelopedecryptForKey の cleartext 分岐のみから呼ばれる。
  • cipher=nil short-circuit は legacy single-mode deployment 経路を保持。
  • 既存 TestEncryption_InactiveKeyWritesCleartext 等は短い cleartext body を書き DecodeEnvelope が長さで弾くため影響なし (回帰確認済み)。
  • Cipher.HasKey は round-3 で keystore-collision narrow check のために追加していたが、round-5 で check 自体が消えたため production caller がなくなり、dead-code 化を避けて削除。

新規テスト:

  • TestEncryption_RebadgeAttackRejected をテーブル駆動化、"encState flipped AND key_id rewritten to unloaded" ケースを追加。
  • TestEncryption_EmptyValueExistsAt で空 value Put → GetAt 非 nil → ExistsAt=true の契約を pin。

make test/lint green。

@claude
Copy link
Copy Markdown

claude Bot commented May 7, 2026

Claude finished @bootjp's task in 2m 59s —— View job


PR742 Round-5 Review

  • Read trigger comment and understand request
  • Read changed files (encryption_glue.go, lsm_store.go, tests)
  • Review P1-A: Empty-plaintext nil normalization
  • Review P1-B: Strengthened rejectRebadgedEnvelope
  • Check Cipher.HasKey removal
  • Check if round-3/4 P2 (isVisibleLiveKey) was addressed
  • Review new tests
  • Post findings

P1-A: Empty-plaintext nil normalization — Correct ✅

decryptForKey (encryption_glue.go:186–188) now normalizes nil → []byte{} after cipher.Decrypt returns. This is specific to the AES-GCM path: Go's AEAD Open returns a nil dst slice when the plaintext is zero-length, while the cleartext path gets sv.Value = make([]byte, 0) from decodeValue:457 — non-nil regardless of body length. The asymmetry was the bug; the fix closes it cleanly.

The trace for an empty encrypted value is now correct:

  1. PutAt(key, []byte{}, ts, 0)cipher.Encrypt([]byte{}) → 16-byte GCM tag only → stored as encStateEncrypted
  2. GetAtcipher.DecryptOpen returns nil → normalized to []byte{}
  3. ExistsAtval != niltrue

TestEncryption_EmptyValueExistsAt pins all three assertions (GetAt non-nil, GetAt zero-length, ExistsAt true). ✅


P1-B: rejectRebadgedEnvelope strengthened — Correct ✅

The key_id-bypass attack is now closed. Old guard: DecodeEnvelope success AND HasKey(env.KeyID) true. New guard: DecodeEnvelope success alone (encryption_glue.go:221–227). An attacker who flips encState 0b01→0b00 and simultaneously rewrites the envelope's key_id bytes to an unloaded value (e.g. 0xFFFFFFFF) can no longer fall through the keystore lookup.

Three no-op guards remain correct:

  • s.cipher == nil → legacy cleartext-only mode, no attack surface ✅
  • len(body) < EnvelopeOverhead (34 bytes) → body can't parse as envelope ✅
  • DecodeEnvelope returns error (wrong version byte, short body, etc.) → legitimate cleartext ✅

The false-positive trade-off (cipher-enabled deployments, cleartext values ≥ 34 bytes with body[0] == 0x01 that satisfy the full envelope parse) is documented in the function comment and is bounded. It only bites in the mixed-mode window (cipher=set, activeKeyID returning false) and requires: first byte 0x01 + version check + flag + nonce + body-length alignment. Stage 8's authenticated MVCC metadata bit makes this deterministic.

TestEncryption_RebadgeAttackRejected is now table-driven with both variants covered:

  • "encState flipped, key_id intact" (round-3 case) ✅
  • "encState flipped AND key_id rewritten to unloaded" (round-5 new case) ✅

Note: The kidOffset = valueHeaderSize + 2 calculation (flags 1B + expireAt 8B + version 1B + flag 1B = offset 11) is correct per envelope.go:keyIDOffset=2 within the envelope's own body (which starts at valueHeaderSize=9). ✅


Cipher.HasKey removal — Clean ✅

Confirmed fully absent from all files in store/ and internal/encryption/. TestCipher_HasKey from round-4 is also gone. No dangling references.


Round-3/4 P2: isVisibleLiveKey — Still open ⚠️

isVisibleLiveKey (lsm_store.go:1392–1404) was not modified in round-5:

func (s *pebbleStore) isVisibleLiveKey(iter *pebble.Iterator, ...) (bool, error) {
    ...
    sv, err := decodeValue(iter.Value())          // header unauthenticated
    ...
    if sv.Tombstone || (sv.ExpireAt != 0 && ...) { // branched before any GCM check
        return false, nil
    }
    return true, nil
}

This is called exclusively by scanDeletePrefix (lsm_store.go:1356). A disk attacker who flips the tombstone bit or lowers expireAt on an encrypted entry causes scanDeletePrefix to skip writing the deletion tombstone — the key survives the prefix delete silently. Unlike the read-path flips fixed in rounds 3–5, this is a write-side integrity bypass: a missing tombstone is a persisted outcome in Pebble, not a transient wrong return value.

The four rounds of fixes have fully protected: GetAtreadVisibleVersion (round-3), GetRange/scanprocessFoundValue (round-3), decryptForKey cleartext branch (rounds 4–5), empty-value semantics (round-5). isVisibleLiveKey is the one remaining gap.

Suggested fix (same as rounds 3 and 4):

func (s *pebbleStore) isVisibleLiveKey(iter *pebble.Iterator, userKey []byte, version, commitTS uint64) (bool, error) {
    if !s.seekToVisibleVersion(iter, userKey, version, commitTS) {
        return false, nil
    }
    sv, err := decodeValue(iter.Value())
    if err != nil {
        return false, errors.WithStack(err)
    }
    if sv.EncState == encStateEncrypted {
        if _, err := s.decryptForKey(iter.Key(), sv, sv.Value); err != nil {
            return false, err
        }
    }
    if sv.Tombstone || (sv.ExpireAt != 0 && sv.ExpireAt <= commitTS) {
        return false, nil
    }
    return true, nil
}

And a TestEncryption_ValueHeaderTamperRejected_DeletePrefix test that writes an encrypted entry, tampers tombstone or expireAt on the Pebble bytes, calls DeletePrefixAt, and asserts ErrEncryptedReadIntegrity propagates instead of a silent no-op tombstone.

Fix this →


Summary

Finding Status
P1-A: Empty-plaintext nil normalization (decryptForKey) ✅ Fixed in round-5
P1-B: Rebadge guard key_id-rewrite bypass ✅ Fixed in round-5
Cipher.HasKey dead-code removal ✅ Clean
TestEncryption_EmptyValueExistsAt ✅ Correct; pins GetAt non-nil + ExistsAt=true
TestEncryption_RebadgeAttackRejected (table-driven, 2 cases) ✅ Correct
Rounds 1–5 read path (GetAt, scan, rebadge, empty-value) ✅ Fully protected
Round-3/4 P2: isVisibleLiveKey unauthenticated header in scanDeletePrefix ⚠️ Still open — write-side integrity bypass on prefix-delete path

The read path and the encState rebadge family are now fully protected across all five rounds. isVisibleLiveKey is the sole remaining gap before the storage envelope can be considered complete for all visibility-decision paths.

…ude[bot] round-5)

Read-side paths (readVisibleVersion, processFoundValue) gained
value-header AAD authentication in rounds 3-5, but the write-side
isVisibleLiveKey was missed. Its sole caller is scanDeletePrefix,
which uses the bool to decide whether DeletePrefixAt needs to
write a fresh tombstone. A disk attacker who flips the tombstone
bit (or lowers expireAt) on an encrypted entry would cause the
function to return needsTombstone=false on a key that is actually
still live — DeletePrefixAt then skips writing the deletion
tombstone and the key survives the prefix delete silently. Unlike
the read-path flips, this is a persisted integrity bypass: the
missing tombstone stays missing across restarts.

Fix: run decryptForKey inside isVisibleLiveKey when sv.EncState is
encrypted. The plaintext is discarded; we only need the
authentication side-effect over the (envelope header ‖
value-header bytes ‖ pebble key) AAD. After Decrypt succeeds,
sv.Tombstone and sv.ExpireAt are now operating on authenticated
bytes; on tag mismatch ErrEncryptedReadIntegrity propagates
through scanDeletePrefix to the operator, halting DeletePrefixAt
fail-closed.

Caller audit:
- isVisibleLiveKey is called only by scanDeletePrefix (lsm_store.go:1356).
- scanDeletePrefix already propagates errors via `return err`, so
  the new ErrEncryptedReadIntegrity surface bubbles to the
  operator without further changes.
- Cleartext entries (sv.EncState == encStateCleartext) hit
  decryptForKey's fast path and only run the rebadge guard, which
  rejects envelope-shaped bytes. Legitimate cleartext stays
  unaffected.
- Empty-plaintext encrypted entries: decryptForKey now returns
  []byte{}; we discard it, so the round-5 normalization is
  harmless here.

New regression test
TestEncryption_DeletePrefixHeaderTamperRejected writes an encrypted
entry under "doomed/", flips the tombstone bit on disk, calls
DeletePrefixAt, and asserts the surface error is
ErrEncryptedReadIntegrity instead of a silent no-op.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented May 7, 2026

@claude review

PR742 round-6: claude[bot] round-5 review で指摘された isVisibleLiveKey の write-side gap を解消しました (commit f8fc68e)。

攻撃面: round 3-5 で read paths (readVisibleVersion, processFoundValue) は value-header AAD 認証で守ったが、write-side isVisibleLiveKey が抜けていた。唯一の caller scanDeletePrefix は tombstone 書き込み要否の判定にこの戻り値を使うため、ディスク攻撃者が encrypted entry の tombstone bit を flip すると needsTombstone=false 誤判定 → DeletePrefixAt が tombstone 書込を silent skip → キーが prefix delete 後も生き残る (再起動越しの persisted bypass)。

修正: isVisibleLiveKeydecryptForKey を実行 (plaintext は破棄、AAD 認証 side-effect のみ利用)。tag mismatch 時の ErrEncryptedReadIntegrityscanDeletePrefix → return err 経由で operator まで bubble、fail-closed。

caller audit:

  • isVisibleLiveKeyscanDeletePrefix (lsm_store.go:1356) のみから。
  • scanDeletePrefix は既存 return err で error 伝播済み、追加変更不要。
  • cleartext entry は decryptForKey の fast-path (rebadge guard のみ実行)、legitimate cleartext は無影響。
  • empty-plaintext encrypted: decryptForKey[]byte{} 返すが破棄するため round-5 正規化と矛盾なし。

新規テスト TestEncryption_DeletePrefixHeaderTamperRejected:

  • encrypted entry doomed/k1 を書き、ディスク上で tombstone bit を flip
  • DeletePrefixAt("doomed/")ErrEncryptedReadIntegrity で fail
  • pre-fix なら silent no-op (キー生存)

make test/lint green。

@claude
Copy link
Copy Markdown

claude Bot commented May 7, 2026

Claude finished @bootjp's task in 2m 19s —— View job


PR742 Round-6 Review

  • Read trigger comment and understand request
  • Read round-6 diff (commit f8fc68e)
  • Review isVisibleLiveKey fix correctness
  • Verify decryptForKey call placement and error propagation
  • Review new test TestEncryption_DeletePrefixHeaderTamperRejected
  • Check for any remaining gaps
  • Post findings

Round-3/4/5 P2: isVisibleLiveKey — Fixed ✅

The write-side integrity bypass that persisted across rounds 3–5 is now closed. The fix in isVisibleLiveKey (lsm_store.go:1416) is structurally correct and symmetric with readVisibleVersion and processFoundValue.

Fix correctness. decryptForKey is called unconditionally (not gated on sv.EncState == encStateEncrypted), which is the right choice — the function self-dispatches on sv.EncState and handles both states:

  • Encrypted entries (sv.EncState == encStateEncrypted): reproduces the AAD from on-disk sv.Tombstone, sv.ExpireAt, sv.EncState and calls cipher.Decrypt. A disk attacker who flips the tombstone bit causes the reconstructed AAD at decrypt time to differ from the AAD used at encrypt time → GCM tag mismatch → ErrEncryptedReadIntegrity. ✅
  • Cleartext entries (sv.EncState == encStateCleartext): takes the fast path, runs only rejectRebadgedEnvelope, returns body. Legitimate cleartext entries are unaffected. ✅

Plaintext discard. _, err := s.decryptForKey(...) is correct — isVisibleLiveKey only needs the authentication side-effect; the bool result is what scanDeletePrefix acts on.

Error propagation. isVisibleLiveKey returns false, err on ErrEncryptedReadIntegrity (lsm_store.go:1417). This propagates via scanDeletePrefix (lsm_store.go:1356's if err != nil { return err }) → deletePrefixAtWithOpts (lsm_store.go:1313) → DeletePrefixAt → operator. The DeletePrefixAt call in the test correctly asserts that the surface error wraps ErrEncryptedReadIntegrity, which confirms the bubble chain is intact. ✅

Empty-plaintext edge case. For an empty encrypted entry, decryptForKey now returns []byte{} (round-5 normalization). The result is discarded here, so the nil → []byte{} normalization is harmless. ✅


TestEncryption_DeletePrefixHeaderTamperRejected — Correct ✅

The test correctly pins the attack → defense contract:

  1. PutAt("doomed/k1", "victim", ts=100, expireAt=0) — writes an authenticated encrypted entry with tombstone=false embedded in the AAD
  2. tamperPebbleValue(raw[0] |= tombstoneMask) — flips bit 0 of the flags byte on disk, setting tombstone=true
  3. DeletePrefixAt("doomed/", nil, ts=200) — calls isVisibleLiveKey, which calls decryptForKey, which rebuilds the AAD with the on-disk sv.Tombstone=true — mismatch with the tombstone=false AAD baked into the GCM tag at write time → ErrEncryptedReadIntegrity
  4. Asserts errors.Is(err, ErrEncryptedReadIntegrity) — confirms fail-closed rather than silent no-op tombstone skip

The choice of tombstoneMask (bit 0) as the tamper target is the most impactful case for this path (an attacker trying to prevent a deletion rather than cause a false deletion). ✅


Comprehensive call-site audit — All sites now protected ✅

All three decodeValue callers that branch on sv.Tombstone / sv.ExpireAt now call decryptForKey before the visibility check:

Site Function Fixed in
lsm_store.go:730 readVisibleVersion Round-3
lsm_store.go:770 processFoundValue Round-3
lsm_store.go:1407 isVisibleLiveKey Round-6

The remaining decodeValue call (line 1748, setEncodedVersionInBatch) uses version.Tombstone from an in-memory VersionedValue, not from on-disk bytes — it is not a tamper target. ✅


Cumulative summary — All rounds

Finding Status
Gemini medium ×2: AAD double-allocation ✅ Fixed (round-1)
Codex P1: value-header (tombstone/expireAt) not in AAD ✅ Fixed (round-3)
Read path readVisibleVersion: decrypt-before-visibility ✅ Fixed (round-3)
Read path processFoundValue: decrypt-before-visibility ✅ Fixed (round-3)
Codex P1: encState-flip rebadge → raw envelope as plaintext ✅ Fixed (round-3/4)
Codex P1: key_id-rewrite bypass on rebadge guard ✅ Fixed (round-5)
Empty-value nilExistsAt false negative ✅ Fixed (round-5)
Write-side isVisibleLiveKey / scanDeletePrefix gap ✅ Fixed (round-6)
CounterNonceFactory filename / export visibility Minor nit — encryption_test_helpers.go is not a _test.go file so CounterNonceFactory is exported from the store package. Still a nit, not a blocker.

The storage envelope wiring is now security-complete across all read and write visibility-decision paths. No remaining gaps identified.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f8fc68e8da

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread store/encryption_glue.go Outdated
Comment on lines +226 to +227
return errors.Wrap(ErrEncryptedReadIntegrity,
"store: cleartext-labelled value parses as an envelope; refusing rebadge attempt")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Avoid rejecting cleartext values that match envelope shape

This branch turns any cleartext value that DecodeEnvelope can parse into ErrEncryptedReadIntegrity, which makes legitimate rows unreadable once encryption is wired. In this commit, DecodeEnvelope only requires len(body) >= 34 and body[0] == 0x01, so binary/plain values written during the phase where activeKeyID is false (or pre-encryption legacy data) can be rejected even though they were never encrypted. That creates deterministic read failures for valid user data in mixed-mode rollout scenarios, not just attacker tampering.

Useful? React with 👍 / 👎.

…(PR742 codex P1 round-6)

Round-5's "reject any DecodeEnvelope-parseable body" guard was too
strict: legitimate cleartext rows whose bytes coincidentally pass
the length / version / flag / nonce shape checks (binary blobs,
images, certain protobuf payloads, anything with a 0x01 prefix and
≥34 bytes) became deterministically unreadable once a cipher was
wired. Codex round-6 flagged this as a P1 deterministic read
failure for valid user data in mixed-mode rollout.

The earlier round-3 "shape AND key_id loaded" check had the
opposite problem (codex round-4 P1: bypassable by also rewriting
the on-disk key_id). The two findings appear contradictory but are
reconciled by an AEAD trial decrypt: rebuild the AAD as if the row
carried encState=encrypted and call cipher.Decrypt. Only a real
envelope produced under one of the loaded DEKs verifies — random
cleartext (even envelope-shaped) has 2⁻¹²⁸ chance of forging a tag.

To handle the kid-rewrite-to-unloaded variant (round-4 P1),
iterate over all loaded DEKs and substitute each candidate kid in
the AAD's HeaderAAD section. This way a relabeled envelope whose
on-disk key_id was rewritten still verifies under the original
encrypt-time kid. The Tombstone / ExpireAt / pebbleKey AAD
components remain bound from the encrypt path, so any flip there
trips the tag mismatch regardless of which DEK we trial.

Cost: O(loaded_DEKs) AEAD decrypt attempts per cleartext read.
Typical clusters carry 1-3 active DEKs; each Decrypt is a few µs.
Negligible relative to the Pebble I/O the read already performs.

Re-add Cipher.LoadedKeyIDs accessor (round-3 had Cipher.HasKey,
removed in round-5; the new guard needs to enumerate). Mirrors
the existing Keystore zero-value contract — returns nil for a
nil/zero-value receiver.

Caller audit:
  - rejectRebadgedEnvelope is the only LoadedKeyIDs caller.
  - Signature changes from (body) to (pebbleKey, sv, body) so the
    AAD reconstruction can use the on-disk Tombstone / ExpireAt;
    only call site is decryptForKey's cleartext branch.
  - cipher=nil short-circuit, "body too short for envelope", and
    "DecodeEnvelope parse fail" fast paths preserved.

New regression test
TestEncryption_RebadgeGuardAllowsLegitimateEnvelopeShapedCleartext
constructs a cleartext payload that round-5 would have rejected
(0x01 prefix + 64 bytes that pass DecodeEnvelope's structural
checks) and confirms PutAt → GetAt round-trips unchanged.

The pre-existing TestEncryption_RebadgeAttackRejected covers both
attack variants (encState flip alone; encState flip + key_id
rewrite to unloaded) and continues to pass.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented May 7, 2026

@claude review

PR742 round-7: codex P1 Avoid rejecting cleartext values that match envelope shape を反映しました (commit 0f9fbda)。

矛盾する 2 つの過去指摘:

  • round-4 codex (P1): shape+keyID-loaded check は keyID rewrite to unloaded で bypass 可
  • round-6 codex (P1, today): shape-only reject は legitimate cleartext (binary blob/0x01 prefix の正当データ) を deterministic 失敗させる

解決: shape-only reject を AEAD trial decrypt に置換。

  • AAD を encState=encrypted で再構築し、loaded DEK 全てで cipher.Decrypt を試行
  • 本物の envelope (どの loaded DEK で encrypt されたか問わず) は tag verify 成功 → reject
  • legitimate cleartext (envelope-shape を偶然満たすデータ含む) は forge probability 2^-128 で拒絶確率ゼロ → そのまま返す
  • keyID rewrite case: AAD の HeaderAAD section に env.KeyID ではなく candidate kid を置換するため、attacker が on-disk keyID をどう書き換えても original encrypt-time kid が loaded であれば tag verify 成功 → reject

追加 API: Cipher.LoadedKeyIDs() []uint32 — keystore.IDs のラッパー、nil/zero-value 安全。round-3 で追加した HasKey (round-5 で削除) と同形の foundation API 追加。

caller audit:

  • rejectRebadgedEnvelope シグネチャ (pebbleKey, sv, body) に変更、唯一の caller は decryptForKey cleartext 分岐 → 更新済み。
  • cipher=nil / body too short / DecodeEnvelope parse fail の fast-path 維持。
  • Cipher.LoadedKeyIDs の唯一の caller は新 guard。

コスト: cleartext read あたり O(loaded_DEKs) AEAD Decrypt。typical 1-3 DEK で μs 級、Pebble I/O 比で無視可能。

新規テスト:

  • TestEncryption_RebadgeGuardAllowsLegitimateEnvelopeShapedCleartext: 64-byte 0x01 prefix の cleartext payload が PutAt → GetAt で round-trip → round-5 false-positive 回帰防止。
  • 既存 TestEncryption_RebadgeAttackRejected (encState flip alone / + keyID rewrite to unloaded) は引き続き両ケース pass。

make test/lint green。

@claude
Copy link
Copy Markdown

claude Bot commented May 7, 2026

Claude finished @bootjp's task in 6m 30s —— View job


PR742 Round-7 Review

  • Read trigger comment and understand request
  • Read changed files (encryption_glue.go, cipher.go, tests)
  • Review Cipher.LoadedKeyIDs() implementation
  • Review AEAD trial decrypt in rejectRebadgedEnvelope
  • Verify false-positive analysis (legitimate cleartext with envelope shape)
  • Verify keyID-rewrite attack still rejected
  • Review new tests
  • Check for any remaining gaps

AEAD trial decrypt (rejectRebadgedEnvelope) — Correct ✅

The round-5 shape-only guard (DecodeEnvelope success → reject) has been replaced by a cryptographically sound AEAD trial decrypt. The approach:

  1. Parse body via DecodeEnvelope (parse failure → legitimate cleartext, return nil) ✅
  2. Reconstruct the header AAD with encStateEncrypted hardcoded rather than the on-disk sv.EncState. This is the critical insight: the encrypt-time header always used encStateEncrypted, so the rebadge guard must use the same value regardless of what the attacker flipped on disk. ✅
  3. Substitute each candidate kid (from LoadedKeyIDs()) into the AAD — not env.KeyID from the parsed envelope — so an attacker who rewrites the key_id field is still caught by the iteration over loaded DEKs. ✅
  4. If any cipher.Decrypt succeeds → GCM tag verifies → the body is unambiguously a real encrypted envelope → reject with ErrEncryptedReadIntegrity. ✅
  5. No loaded DEK matches → return nil (legitimate cleartext, or retired-DEK envelope). ✅

False-positive analysis: A legitimate cleartext value must produce a valid GCM tag verifiable under some loaded DEK with the reconstructed AAD. Probability: 2⁻¹²⁸. Negligible by construction. TestEncryption_RebadgeGuardAllowsLegitimateEnvelopeShapedCleartext pins this ✅

Round-4 attack (encState flip, key_id intact) → At least one trial (the original DEK) produces AAD that matches encrypt-time AAD → GCM succeeds → reject. ✅

Round-5 attack (encState flip + key_id rewritten to 0xFFFFFFFF, unloaded)LoadedKeyIDs() doesn't include 0xFFFFFFFF; iteration tries the original DEK and the AAD matches (kid field in the AAD is the candidate, not env.KeyID) → GCM succeeds → reject. ✅

New variant (encState flip + key_id rewritten to another loaded DEK B, original A still loaded) → We iterate [A, B]; trying kid=A produces the original encrypt-time AAD → GCM succeeds → reject. ✅


Cipher.LoadedKeyIDs() — Correct ✅

Nil/zero-value safe (c == nil || c.keystore == nil guard). Delegates to Keystore.IDs(), which takes an atomic snapshot of the key map and returns a sorted slice. No TOCTOU concern: a DEK removed between LoadedKeyIDs() and a cipher.Decrypt call surfaces as ErrUnknownKeyID inside aeadFor → decrypt returns an error → we move to the next iteration, no panic.


TestEncryption_RebadgeGuardAllowsLegitimateEnvelopeShapedCleartext — Correct ✅

The test correctly models the Phase 0 / pre-cutover window:

  • active=false at write time → PutAt stores encStateCleartext
  • Payload starts with 0x01 (EnvelopeVersionV1), key_id = 0xCAFEBABE (not loaded), nonce + body are arbitrary filler
  • active=true at read time → cipher is armed → decryptForKey enters the cleartext branch and calls rejectRebadgedEnvelope
  • Trial decrypt with kid=71 (loaded DEK): body is random bytes, GCM verification fails → guard returns nil → caller gets original bytes verbatim ✅

One test-prerequisite note: the test silently depends on DecodeEnvelope(envelopeShaped) succeeding — if the envelope format changes, the test would lose its "envelope-shaped cleartext" property without failing at the assertion. A defensive assertion if _, err := encryption.DecodeEnvelope(envelopeShaped); err != nil { t.Fatalf(...) } would pin the prerequisite explicitly. Minor, not a blocker.


TestEncryption_RebadgeAttackRejected — Still correct with round-7 ✅

Both table-driven cases:

  • Case 1 (encState flipped, key_id intact): sv.EncState=0b00, sv.Tombstone=false, sv.ExpireAt=0 from decodeValue. Guard builds hdr with encStateEncrypted hardcoded. Trial with kid=31 → AAD matches encrypt-time AAD → GCM verifies → reject. ✅
  • Case 2 (encState flipped + key_id→0xFFFFFFFF): LoadedKeyIDs()=[31]. Trial with kid=31 → AAD (containing kid=31) matches original encrypt-time AAD → GCM verifies → reject. ✅

Two comment issues — Minor ⚠️

1. Stale decryptForKey function comment. The doc comment at encryption_glue.go:137–148 still describes the old round-4/5 approach ("if it parses as a well-formed envelope AND its key_id is loaded in the keystore… reject as ErrEncryptedReadIntegrity. False positives on legitimate cleartext are bounded by the joint probability of envelope-shape match + a 32-bit key_id collision with a loaded DEK"). The round-7 AEAD trial decrypt is fundamentally different; the comment should be updated to match the rejectRebadgedEnvelope doc (which is accurate).

2. Ambiguous "falls through" phrase in rejectRebadgedEnvelope. The comment says:

The kid-rewrite-to-loaded-other-DEK variant still falls through (AAD mismatch on the foreign DEK), but the user observes ciphertext garbage that application code rejects downstream rather than plaintext

This is misleading. If the original DEK (kid=A) is still loaded and the attacker rewrites env.KeyID from A to another loaded DEK B, the attack is caught — trial with kid=A produces the encrypt-time AAD and GCM verifies → reject. The scenario where the guard actually falls through is when the original DEK has been retired (removed from the keystore), leaving only DEK B loaded. Consider tightening the wording to "when the original encrypt-time DEK has been retired" so it's clear this is the retired-key gap (already accepted per Stage 8), not a bypass in the active-rotation case.

Fix the stale comment →


No direct unit test for Cipher.LoadedKeyIDs() — Minor ⚠️

HasKey (round-4) had TestCipher_HasKey (4 cases); it was removed in round-5. LoadedKeyIDs has no equivalent unit test. Coverage is currently only via integration (the TestEncryption_RebadgeAttackRejected path). The function is a thin wrapper around Keystore.IDs() (which is tested by TestKeystore_IDs_Sorted), so this is a minor gap rather than a blocker. A simple TestCipher_LoadedKeyIDs covering nil receiver / zero-value / 1 key / 2 keys sorted would be the symmetric complement of the removed TestCipher_HasKey.


Retired-DEK gap — Documented, not new ⚠️

The guard's failure mode for envelopes encrypted under a retired DEK is explicitly documented at the bottom of rejectRebadgedEnvelope: the caller receives raw envelope bytes as "plaintext". This is the same gap that existed in all prior rounds; the round-7 change doesn't regress it. Deferred to Stage 8 per design doc. Correctly captured in the // No loaded DEK produces a tag match — ... an envelope under a retired DEK comment.


Performance

O(n) AEAD decryptions per cleartext read, where n = loaded DEKs (typically 1–3 during rotation). Each GCM Open on a short body is O(len(body)). Documented in the trigger comment as negligible vs Pebble I/O. The cost applies only to cleartext reads (encState=0b00), not to encrypted reads (which take the encStateEncrypted branch directly). No existing benchmark covers the mixed-mode path; if this store is hot for cleartext reads during Phase 0, a benchmark addition would be worth it before Stage 5/6 wires the production active-key path.


Cumulative summary — All rounds

Finding Status
Gemini medium ×2: AAD double-allocation ✅ Round-1
Codex P1: value-header not bound into AAD ✅ Round-3
readVisibleVersion / processFoundValue: decrypt-before-visibility ✅ Round-3
Codex P1: encState-flip rebadge → raw ciphertext as plaintext ✅ Round-4
Codex P1: key_id-rewrite bypasses keystore collision guard ✅ Round-5
Empty-value nilExistsAt false negative ✅ Round-5
Write-side isVisibleLiveKey / scanDeletePrefix gap ✅ Round-6
Codex P1: shape-only reject → false-positives on legitimate cleartext Round-7
Stale decryptForKey comment (still describes round-4/5 mechanism) ⚠️ Minor — not a correctness issue
Ambiguous "falls through" phrase in rejectRebadgedEnvelope comment ⚠️ Minor — clarify to "original DEK retired"
No direct unit test for Cipher.LoadedKeyIDs() ⚠️ Minor
Retired-DEK rebadge gap ⚠️ Documented, deferred to Stage 8
CounterNonceFactory exported from non-test file ⚠️ Carried from round-2, not a blocker

All security-critical paths are now protected. The two stale comments are the only actionable items before merge; both are documentation-only fixes.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0f9fbda281

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread store/encryption_glue.go Outdated
Comment on lines +251 to +255
var hdr [valueHeaderSize]byte
writeValueHeaderBytes(hdr[:], sv.Tombstone, sv.ExpireAt, encStateEncrypted)
for _, kid := range s.cipher.LoadedKeyIDs() {
aad := buildStorageAAD(env.Version, env.Flag, kid, hdr[:], pebbleKey)
if _, err := s.cipher.Decrypt(env.Body, aad, kid, env.Nonce[:]); err == nil {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Reject relabeled envelopes when header fields are modified

rejectRebadgedEnvelope rebuilds trial-decrypt AAD from sv.Tombstone/sv.ExpireAt, which come from the attacker-controlled cleartext-labeled header. If an attacker flips encryption_state to cleartext and changes tombstone/expiry bits, every trial decrypt sees the wrong AAD and fails, so this function returns nil and the read path serves ciphertext bytes (or silently returns not-found/expired) instead of ErrEncryptedReadIntegrity. This recreates a fail-open tamper path for encrypted rows under disk-byte modification.

Useful? React with 👍 / 👎.

…1 round-7)

Round-7's trial-decrypt rebadge guard reconstructed the AAD from
on-disk sv.Tombstone / sv.ExpireAt — but those fields live in the
same value-header byte / region as encryption_state, so an attacker
who flips encState 0b01→0b00 can simultaneously flip tombstone or
rewrite expireAt. The trial AAD then no longer matches the
encrypt-time AAD, every loaded-DEK Decrypt fails, the guard
returns nil, and the read path serves either ciphertext garbage
("not found" via the post-decryptForKey tombstone check, or
"expired" via expireAt) — a fail-open tamper path codex round-7
flagged P1.

Two changes close most of the gap:

1. Canonical tombstone in the trial AAD. The encrypt path always
   writes tombstone=false (encrypted entries never carry the
   tombstone bit — deletes use cleartext tombstone-only writes).
   Trusting the on-disk tombstone bit was therefore unsound.
   Substitute false unconditionally so a flipped tombstone bit no
   longer changes the AAD that we trial against.

2. Enumerate expireAt candidates. We can't know the encrypt-time
   value from the on-disk bytes, but the common case is no-TTL
   (expireAt=0); attackers who rewrote expireAt away from 0 still
   verify when we try expireAt=0 in the trial AAD. We also keep
   the on-disk value as a candidate to cover the encState-only
   flip case where expireAt was untouched. Cost: up to 2 × loaded_DEKs
   AEAD Decrypt attempts per cleartext read — typical clusters are
   1-3 active DEKs, each Decrypt is microseconds, negligible
   relative to Pebble I/O.

Caller audit: rejectRebadgedEnvelope is still called only by
decryptForKey's cleartext branch. Signature unchanged. Cleartext
reads that don't parse as envelopes (the legitimate-mixed-mode
fast path codex round-6 required) are unaffected.

Residual gap: encState flip + expireAt rewrite when the original
expireAt was a non-zero value the attacker also rewrote. The trial
can't recover the original without authenticated metadata. Stage 8
moves encryption_state into authenticated MVCC metadata, which
makes the rebadge attack structurally impossible.

New regression test TestEncryption_RebadgeAttackCombinedHeaderFlips
covers three combined-flip scenarios on a no-TTL row: encState +
tombstone, encState + expireAt rewrite, encState + tombstone +
expireAt — all surface ErrEncryptedReadIntegrity.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented May 7, 2026

@claude review

PR742 round-8: codex P1 Reject relabeled envelopes when header fields are modified を反映しました (commit 55bca7c)。

攻撃: round-7 trial decrypt が on-disk sv.Tombstone / sv.ExpireAt を AAD reconstruction に使用していたため、attacker が encState flip と同時に tombstone/expireAt も flip すると trial AAD が encrypt-time AAD と不一致 → fail-open。

修正:

  1. Canonical tombstone: encrypt path は必ず tombstone=false で AAD を構築する (encrypted entry は plaintext を含むため delete は cleartext-tombstone-only)。trial AAD でも false を強制、on-disk bit を信用しない。
  2. expireAt enumeration: 候補に {on-disk, 0} を列挙。原 expireAt=0 (no-TTL、最頻ケース) を candidate=0 trial が捕捉、encState-only flip ケースは on-disk trial が捕捉。

コスト: cleartext read あたり最大 2 × loaded_DEKs trial decrypt。typical 1-3 DEK、μs 級、Pebble I/O 比で無視可。

caller audit:

  • rejectRebadgedEnvelopedecryptForKey cleartext 分岐のみ、シグネチャ変更なし。
  • legitimate cleartext (DecodeEnvelope parse fail) は影響なし。
  • Cipher.LoadedKeyIDs の唯一の caller は新 guard のまま。

残課題: encState + expireAt rewrite (原 expireAt が非ゼロの場合)。Stage 8 で encryption_state を authenticated MVCC metadata に移行することで deterministic に解決。

新規テスト TestEncryption_RebadgeAttackCombinedHeaderFlips:

  • encState + tombstone flip → reject
  • encState + expireAt rewrite (no-TTL → past) → reject
  • encState + tombstone + expireAt all flip → reject

make test/lint green。

@claude
Copy link
Copy Markdown

claude Bot commented May 7, 2026

Claude finished @bootjp's task in 3m 52s —— View job


PR742 Round-8 Review

  • Read trigger comment and understand request
  • Read store/encryption_glue.go (round-8 changes)
  • Read store/lsm_store_encryption_test.go (new test)
  • Verify canonical tombstone=false in trial decrypt AAD
  • Verify expireAt enumeration logic and coverage
  • Analyze residual gap (non-zero original expireAt rewrite)
  • Check new TestEncryption_RebadgeAttackCombinedHeaderFlips cases
  • Post findings

Codex P1 combined-flip fix — Correct ✅

The codex P1 finding was that round-7's rejectRebadgedEnvelope rebuilt the trial AAD from sv.Tombstone and sv.ExpireAt — both of which come from the on-disk bytes the attacker controls. Flipping encState AND tombstone/expireAt simultaneously made every trial AAD mismatch the encrypt-time AAD → guard returned nil → fail-open.

Round-8 closes the common case via two changes:

1. Canonical tombstone=false (encryption_glue.go:277)

writeValueHeaderBytes(hdr[:], false /*canonical*/, candidateExpire, encStateEncrypted)

This is sound because encryptForKey always writes tombstone=false (encryption_glue.go:105). The encrypt path is never reached for tombstone writes — deletes are cleartext tombstones. An attacker who flips tombstone ON TOP of flipping encState no longer changes the trial AAD. Test case 1 ("encState + tombstone flipped, no TTL") correctly pins this. ✅

2. expireAt candidate enumeration (encryption_glue.go:270–284)

candidateExpireAts := []uint64{sv.ExpireAt}
if sv.ExpireAt != 0 {
    candidateExpireAts = append(candidateExpireAts, 0)
}

This strategy is correct for the documented attack surface:

Attack sv.ExpireAt on disk Candidates Caught by
encState-only flip, orig=0 0 [0] candidate=0 ✅
encState + tombstone flip, orig=0 0 [0] candidate=0 ✅
encState + expireAt rewrite 0→past, orig=0 past [past, 0] candidate=0 ✅
encState + tombstone + expireAt all flip, orig=0 future [future, 0] candidate=0 ✅
encState + expireAt rewrite X→Y, orig=X (non-zero) Y [Y, 0] miss — documented residual

The cost (≤2 × loaded_DEKs AEAD Decrypt per cleartext read, μs-range) is proportionate and documented.


TestEncryption_RebadgeAttackCombinedHeaderFlips — Correct ✅

All three cases cover no-TTL (origExpireAt=0) rows, which is the common write path and the attack surface where the fix achieves full coverage:

Case 1 (encState + tombstone): candidateExpireAts=[0]; trial with tombstone=false, expireAt=0 matches encrypt-time AAD → ErrEncryptedReadIntegrity ✅

Case 2 (encState + expireAt 0→1): candidateExpireAts=[1, 0]; trial with candidate=0 matches → ErrEncryptedReadIntegrity ✅

Case 3 (encState + tombstone + expireAt→0x7fff...): candidateExpireAts=[0x7fffffffffffffff, 0]; trial with candidate=0 matches → ErrEncryptedReadIntegrity ✅

The test correctly documents the residual gap in the comment above origExpireAt.


Documented residual gap — Correctly scoped ⚠️

When origExpireAt was non-zero (X) and the attacker rewrites it to any other non-zero Y:

  • candidateExpireAts = [Y, 0]; neither matches X → fail-open

What the caller actually observes depends on the attacker's choice of Y:

  • Y in the past (sv.ExpireAt <= ts): readVisibleVersion returns ErrKeyNotFound (key appears expired). Raw envelope bytes are NOT returned.
  • Y in the future (sv.ExpireAt > ts): the plain return from decryptForKey (which contains raw envelope bytes, since the rebadge guard returned nil) gets returned to the caller. This is a real plaintext-disclosure failure: the application receives garbled ciphertext+tag+nonce as "user data."

The commit message and code comment correctly identify Stage 8's authenticated MVCC metadata bit as the structural fix. This gap is acceptable for Stage 2.

One untested sub-case worth noting: origExpireAt=X (non-zero), attacker rewrites to 0. In this case sv.ExpireAt=0candidateExpireAts=[0]. Trial with candidate=0 will not match encrypt-time AAD which used X → miss. The attacker effectively extends a TTL-bearing entry to appear immortal, and if the guard misses, raw bytes flow to the caller. The comment focuses on "non-zero → non-zero" rewrite but "non-zero → 0" (TTL removal) is the same class of gap and worth noting explicitly for completeness.


Three stale comments — Carried from round-7 ⚠️

These were flagged in round-7 and remain unchanged in round-8:

1. decryptForKey lines 137–148 (most misleading): The "Cleartext-rebadge guard" block still describes the old round-3/4 mechanism — "if it parses as a well-formed envelope AND its key_id is loaded in the keystore... reject as ErrEncryptedReadIntegrity. False positives... bounded by the joint probability of envelope-shape match + a 32-bit key_id collision with a loaded DEK." This has been superseded by the AEAD trial decrypt approach.

Additionally, line 151 now has a new inaccuracy: "its Tombstone, ExpireAt, and EncState are reproduced into the AAD" — after round-8, the rebadge guard does NOT reproduce sv.Tombstone into the trial AAD (it's hardcoded false), and EncState is hardcoded encStateEncrypted. The statement is accurate for the encrypted branch (lines 169–171) but misleading when read as applying to decryptForKey as a whole.

2. rejectRebadgedEnvelope lines 218–224: The PR742 history still ends with "round-7 (this change)" but round-8 adds tombstone canonicalization and expireAt enumeration. These should be a "round-8" bullet in the history.

3. Ambiguous "falls through" claim (line 221–223): "The kid-rewrite-to-loaded-other-DEK variant still falls through (AAD mismatch on the foreign DEK)." This is misleading: when the attacker rewrites key_id from DEK A to loaded DEK B, and the original DEK A is still loaded, round-7's iteration over LoadedKeyIDs() tries A → matches → attack IS caught. The "falls through" case is only when the original DEK has been retired from the keystore, not any time the key_id is rewritten. The comment should say "when the original DEK has been retired" rather than naming the "kid-rewrite-to-loaded-other-DEK" scenario.

Fix stale comments →


TestEncryption_RebadgeGuardAllowsLegitimateEnvelopeShapedCleartext — Minor nit still present ⚠️

The defensive if _, err := encryption.DecodeEnvelope(envelopeShaped); err != nil { t.Fatalf(...) } prerequisite assertion flagged in round-7 was not added. Without it, if the envelope format changes and the test payload no longer parses as an envelope, the test stops proving the "envelope-shaped cleartext is allowed through" invariant without any explicit failure signal. Minor, not a blocker.


Cumulative summary — All rounds

Finding Status
Gemini medium ×2: AAD double-allocation ✅ Round-1
Codex P1: value-header not bound into AAD ✅ Round-3
readVisibleVersion / processFoundValue: decrypt-before-visibility ✅ Round-3
Codex P1: encState-flip rebadge → raw ciphertext as plaintext ✅ Round-4
Codex P1: key_id-rewrite bypasses keystore collision guard ✅ Round-5
Empty-value nilExistsAt false negative ✅ Round-5
Write-side isVisibleLiveKey / scanDeletePrefix gap ✅ Round-6
Codex P1: shape-only reject → false-positives on legitimate cleartext ✅ Round-7
Codex P1: combined encState + tombstone/expireAt flip → trial AAD mismatch Round-8
Residual: encState + expireAt rewrite when orig expireAt non-zero ⚠️ Documented, deferred to Stage 8
Stale decryptForKey comment (lines 137–148, 151) ⚠️ Not yet fixed (carried from round-7)
rejectRebadgedEnvelope history missing round-8 entry ⚠️ Minor
Misleading "kid-rewrite-to-loaded-other-DEK still falls through" ⚠️ Only applies when original DEK retired
Prerequisite assertion missing in TestEncryption_RebadgeGuardAllowsLegitimateEnvelopeShapedCleartext ⚠️ Minor nit (carried from round-7)

The security-critical fix is correct and complete for the no-TTL case (the common path). The three stale comments are the only actionable items; the residual non-zero expireAt rewrite gap is correctly documented and structurally deferred.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 55bca7c97b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread store/encryption_glue.go
Comment on lines +118 to +122
encoded, err := env.Encode()
if err != nil {
return nil, 0, errors.Wrap(err, "store: encode envelope")
}
return encoded, encStateEncrypted, nil
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Enforce snapshot-size limit after envelope encoding

Encrypted writes can exceed the restore-time value length cap even when validateValueSize succeeds: encryptForKey adds envelope overhead and returns bytes that are stored as-is, but restore still caps each raw value at maxSnapshotValueSize + valueHeaderSize in readRestoreEntry. With encryption enabled, a plaintext near maxSnapshotValueSize is writable yet its native snapshot restore fails with ErrValueTooLarge, so the system can persist data that cannot be recovered via snapshot. Please either validate post-encryption size here or raise the restore bound for encrypted rows.

Useful? React with 👍 / 👎.

…dex P1 round-8)

validateValueSize accepts plaintexts up to maxSnapshotValueSize,
but encryptForKey adds 34 bytes (EnvelopeOverhead) on the storage
envelope path, and Pebble snapshots ship the resulting bytes
verbatim. The restore-side cap (maxSnapshotValueSize +
valueHeaderSize) was therefore 34 bytes short of the largest
encrypted entry the write path can produce — a plaintext written
exactly at maxSnapshotValueSize is persisted but rejected on
snapshot restore with ErrValueTooLarge. Codex round-8 P1 flagged
this as data that can persist yet not be recovered via snapshot.

Raise readRestoreEntry's per-value cap to
maxSnapshotValueSize + valueHeaderSize + EnvelopeOverhead so the
restore bound matches what the encrypt-and-write path produces.
The existing safety motivation (cap arbitrary stream lengths to
prevent OOM on a malicious snapshot) is preserved — adding 34
bytes does not materially change the resource ceiling.

Caller audit: this is a one-line bound change inside
readRestoreEntry. The callers (Restore, restoreFromStreamingMVCC)
already propagate ErrValueTooLarge, so behaviour is identical
unless the value would have rejected before — and now the
encrypted-at-max-plaintext case round-trips. mvcc_store.go's
identical-shape cap at line 921 is for the in-memory MVCC stream
format, which Stage 2 does not yet ship encrypted bodies through
(see setEncodedVersionInBatch — it always restores as cleartext);
that cap is therefore left for Stage 8 to bump together with the
v3 metadata-side encryption_state.

New regression test TestEncryption_SnapshotRestoreAtMaxValueSize
shrinks maxSnapshotValueSize to 4 KiB (avoids allocating 256 MiB
in CI) and round-trips a plaintext at that exact size through
PutAt → Snapshot.WriteTo → fresh-store Restore → GetAt. NOT
t.Parallel — the test mutates a package-level var the rest of the
suite reads.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented May 7, 2026

@claude review

PR742 round-9: codex P1 Enforce snapshot-size limit after envelope encoding を反映しました (commit 009e798)。

Gap: validateValueSize は plaintext を maxSnapshotValueSize まで許容するが、encryptForKey は EnvelopeOverhead (34 bytes) を追加、Pebble snapshot は raw bytes を ship する。restore 側の per-value cap (maxSnapshotValueSize + valueHeaderSize) は 34 bytes 不足、plaintext を maxSnapshotValueSize ぴったりで PutAt すると persist には成功するが snapshot restore で ErrValueTooLarge。データが書けるのに復元できない silent gap。

修正: readRestoreEntry の cap を maxSnapshotValueSize + valueHeaderSize + encryption.EnvelopeOverhead に引き上げ。OOM 対策の上限性質は保持 (34 bytes 追加は ceiling 性質に影響なし)。

caller audit:

  • readRestoreEntry の caller (Restore, restoreFromStreamingMVCC) は ErrValueTooLarge を伝播済み、cap 変更で挙動変わるのは encrypted-at-max-plaintext のみ → 正常 round-trip 可能化。
  • mvcc_store.go:921 にも同形 cap があるが、これは in-memory MVCC stream format 用。Stage 2 では setEncodedVersionInBatch が encState=cleartext で書き戻すため encrypted body は流れず、Stage 8 の v3 metadata 拡張時にまとめて引き上げる。

新規テスト TestEncryption_SnapshotRestoreAtMaxValueSize:

  • maxSnapshotValueSize を 4 KiB に縮小 (256 MiB alloc 回避)
  • plaintext をぴったり maxSnapshotValueSize で PutAt → Snapshot.WriteTo → fresh store Restore → GetAt
  • pre-fix: ErrValueTooLarge / post-fix: round-trip 成立
  • NOT t.Parallel (package-level var を mutate)

make test/lint green。

@claude
Copy link
Copy Markdown

claude Bot commented May 7, 2026

Claude finished @bootjp's task in 4m 5s —— View job


PR742 Round-9 Review

  • Read trigger comment and understand request
  • Read changed files (lsm_store.go diff, snapshot_pebble.go, mvcc_store.go cap)
  • Review readRestoreEntry cap fix correctness
  • Verify mvcc_store.go:921 gap is correctly analyzed and not touched
  • Review new test TestEncryption_SnapshotRestoreAtMaxValueSize
  • Check for any remaining gaps
  • Post findings

Codex P1: Snapshot restore cap — Fixed ✅

The gap is real and the fix is minimal and correct.

The bug. validateValueSize caps the plaintext at maxSnapshotValueSize. encryptForKey adds EnvelopeOverhead = HeaderSize + TagSize = 18 + 16 = 34 bytes on top, and the Pebble snapshot write path (writePebbleSnapshotEntries, snapshot_pebble.go:73) ships the raw on-disk bytes verbatim: len(v) = valueHeaderSize(9) + EnvelopeOverhead(34) + len(plaintext). The old restore cap maxSnapshotValueSize + valueHeaderSize was 34 bytes short for a plaintext written at exactly maxSnapshotValueSize.

The fix (lsm_store.go:1691):

vLen, err = readRestoreFieldLen(r, "snapshot value",
    maxSnapshotValueSize+valueHeaderSize+encryption.EnvelopeOverhead)

The cap arithmetic is now exact:

Component Bytes
valueHeaderSize 9
encryption.EnvelopeOverhead 34 (= HeaderSize(18) + TagSize(16))
maxSnapshotValueSize (plaintext) maxSnapshotValueSize
New cap maxSnapshotValueSize + 9 + 34

OOM protection preserved. The ceiling changes from 256 MiB + 9 to 256 MiB + 43 — immaterial. Malicious streams can still only force 256 MiB + 43 bytes of allocation per entry; the safety property is unchanged. ✅


mvcc_store.go:921 — Correctly left alone ✅

// mvcc_store.go:921
decodedValueLen, err := restoreFieldLenInt(valueLen, "snapshot value", maxSnapshotValueSize)

This cap is for the streaming MVCC format (restoreFromStreamingMVCC, lsm_store.go:2086). In that format, values are serialized field-by-field (ts, expireAt, tombstone, valueLen, value) — there is no value header and no envelope in the stream. Stage 2's setEncodedVersionInBatch (lsm_store.go:1756) always passes encStateCleartext, so no encrypted envelope ever passes through this path. Raising this cap is correctly deferred to Stage 8 per the trigger comment. ✅


TestEncryption_SnapshotRestoreAtMaxValueSize — Correct ✅

The test correctly exercises the boundary condition end-to-end:

  1. Shrinks maxSnapshotValueSize to 4096 (avoids 256 MiB allocation in CI) with proper cleanup via t.Cleanup
  2. Writes a plaintext of exactly maxSnapshotValueSize bytes — this is the tightest boundary case
  3. Takes a native Pebble snapshot via Snapshot.WriteTo
  4. Restores into a fresh encrypted store via Restore
  5. Verifies the round-trip via GetAt with byte-equality check

Destination store is correctly encrypted. The test creates the destination with the same cipher and keyID as the source, so GetAtdecryptForKey can verify the restored encrypted bytes correctly decrypt to the original plaintext. This tests both the cap fix (restore doesn't fail with ErrValueTooLarge) AND the data correctness (the decrypted value matches). ✅

NOT t.Parallel is correct. The test mutates the package-level maxSnapshotValueSize var. This is consistent with the existing convention in lsm_store_test.go (lines 762–835). ✅

One minor nit. The inline comment says "Restore into a fresh store (unencrypted is fine — Pebble snapshots ship raw bytes and the encrypted envelope is preserved verbatim; we are only testing the restore size cap here, not decrypt)." The comment is slightly misleading because the test does in fact test decryption (the GetAt call verifies the value after decrypt). The note should say "an unencrypted destination store would also work for testing the size cap boundary alone, but we use an encrypted destination so the final GetAt verifies the full decrypt round-trip." Minor wording only, not a correctness issue.


Caller audit — Complete ✅

readRestoreEntry is called exclusively by restoreBatchLoopInto (lsm_store.go:1780). That function is called from:

  • restorePebbleNative (lsm_store.go:1923)
  • restorePebbleNativeAtomic (lsm_store.go:2012)

Both propagate the error from readRestoreEntry, and Restore (lsm_store.go:1811) propagates to the operator. The cap change affects only the encrypted-at-max-plaintext case — all other entries are unaffected. ✅


Cumulative summary — All rounds

Finding Status
Gemini medium ×2: AAD double-allocation ✅ Round-1
Codex P1: value-header not bound into AAD ✅ Round-3
readVisibleVersion / processFoundValue: decrypt-before-visibility ✅ Round-3
Codex P1: encState-flip rebadge → raw ciphertext as plaintext ✅ Round-4
Codex P1: key_id-rewrite bypasses keystore collision guard ✅ Round-5
Empty-value nilExistsAt false negative ✅ Round-5
Write-side isVisibleLiveKey / scanDeletePrefix gap ✅ Round-6
Codex P1: shape-only reject → false-positives on legitimate cleartext ✅ Round-7
Codex P1: combined encState + tombstone/expireAt flip → trial AAD mismatch ✅ Round-8
Codex P1: Snapshot restore cap 34 bytes short for encrypted entries Round-9
Residual: encState + expireAt rewrite when orig expireAt non-zero ⚠️ Documented, deferred to Stage 8
Stale decryptForKey comment (lines 137–148, 151) ⚠️ Minor — not a correctness issue
rejectRebadgedEnvelope history missing round-8 entry + misleading "falls through" ⚠️ Minor — carried from round-7
Prerequisite assertion missing in TestEncryption_RebadgeGuardAllowsLegitimateEnvelopeShapedCleartext ⚠️ Minor nit
CounterNonceFactory exported from non-test file ⚠️ Minor nit

All security-critical paths are now fully protected across all nine rounds. The storage envelope wiring is correct for reads (GetAt, scan), writes (PutAt, ExpireAt, ApplyMutations), the delete-prefix visibility path, and snapshot save/restore. The remaining items are documentation-only nits that do not affect correctness or security. The PR is ready for merge subject to CI green.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 009e798dd6

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread store/encryption_glue.go Outdated
Comment on lines +237 to +240
if err != nil {
// Body does not parse as an envelope; treat as legitimate
// cleartext.
return nil //nolint:nilerr // intentional: parse failure means "not an envelope"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Fail closed when relabeled envelope decoding fails

rejectRebadgedEnvelope treats any DecodeEnvelope failure as legitimate cleartext, so an attacker can flip encryption_state to cleartext and corrupt one envelope header byte (for example version) to force this branch and bypass integrity checks. In that case decryptForKey returns raw ciphertext bytes instead of ErrEncryptedReadIntegrity, which is a fail-open path for encrypted-at-rest data because KV values are opaque and callers may accept those bytes as valid application data.

Useful? React with 👍 / 👎.

…und-9)

Round-9's rebadge guard called encryption.DecodeEnvelope on the
cleartext body; on parse failure it short-circuited to "treat as
cleartext". An attacker who flips encryption_state to cleartext can
also corrupt the envelope's version (or flag) byte to force
DecodeEnvelope into that error path, ducking the trial decrypt and
letting raw envelope bytes flow back to the caller as plaintext —
a fail-open path codex round-9 flagged P1.

Replace DecodeEnvelope with manual fixed-offset slicing:

  nonce = body[HeaderAADSize : HeaderSize]   // bytes 6..18
  ct    = body[HeaderSize:]                  // bytes 18..

The trial AAD is now built with canonical envelope_version (0x01)
and flag (0) — both fixed by the encrypt path — instead of the
on-disk bytes. So a corrupted version or flag byte no longer
matters; only ciphertext / tag bytes can break the trial, and at
that point the original envelope is destroyed (the user receives
garbage, not the original plaintext).

Combined with rounds 7-8 (key_id substitution, canonical tombstone,
expireAt enumeration), the guard now catches:

  - encState flip alone (round-3 case)
  - encState + tombstone flip (round-7)
  - encState + expireAt rewrite when original was 0 (round-7)
  - encState + tombstone + expireAt all flipped on no-TTL row (round-7)
  - encState + envelope version byte corrupted (this round)
  - encState + envelope flag byte corrupted (this round)
  - encState + version + flag both corrupted (this round)
  - encState + key_id rewritten to unloaded (round-4)

Residual: ciphertext or tag byte tampering (no integrity surface
unless we add authenticated metadata, which Stage 8 does
deterministically) and expireAt rewrite when the original was a
specific non-zero value the attacker also rewrote.

Caller audit: rejectRebadgedEnvelope is still called only by
decryptForKey's cleartext branch. Signature unchanged. The
cipher=nil and len(body) < EnvelopeOverhead short-circuits are
preserved so legitimate cleartext smaller than an envelope or in
single-mode deployments hit the fast path unchanged.

New regression test
TestEncryption_RebadgeAttackEnvelopeHeaderCorruption covers three
sub-cases (version corrupted, flag corrupted, both corrupted) and
locks in the fail-closed behaviour against the
DecodeEnvelope-fail-then-fall-through bypass.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented May 7, 2026

@claude review

PR742 round-10: codex P1 Fail closed when relabeled envelope decoding fails を反映しました (commit b4da4e7)。

Gap: round-9 の rebadge guard が encryption.DecodeEnvelope を呼び、parse 失敗時に "legitimate cleartext" として fall-through。attacker が encState flip と同時に envelope の version (or flag) byte を corrupt させると DecodeEnvelope が失敗 → trial decrypt をスキップ → envelope bytes が cleartext として返る fail-open。

修正: DecodeEnvelope を bypass、固定オフセットで body を直接スライス:

  • nonce = body[HeaderAADSize:HeaderSize] (6:18)
  • ct = body[HeaderSize:] (18:)

Trial AAD は canonical envelope_version=0x01 / flag=0 (encrypt path の fixed values) を使用。version/flag byte 改変は AAD に影響しない (canonical を使うため)。

catch できる variant:

  • encState only (round-3)
  • encState + tombstone (round-7)
  • encState + expireAt rewrite from 0 (round-7)
  • encState + tomb + expireAt 三重 (round-7)
  • encState + envelope version 改変 (round-9 new)
  • encState + envelope flag 改変 (round-9 new)
  • encState + version + flag 両改変 (round-9 new)
  • encState + key_id rewrite (round-4)

残課題: ciphertext / tag byte 改変 (元 envelope 破壊と引き換え、user は garbage 受領)、expireAt rewrite (元値 non-zero の場合)。Stage 8 metadata 認証で deterministic 化。

caller audit:

  • rejectRebadgedEnvelope の caller (decryptForKey cleartext 分岐) のみ、シグネチャ変更なし。
  • cipher=nil / body too short の fast-path は preserved → legitimate cleartext / single-mode deployment 影響なし。
  • 新 fixed-offset slicing は body length >= EnvelopeOverhead 確認済みなので out-of-bounds なし。

新規テスト TestEncryption_RebadgeAttackEnvelopeHeaderCorruption (3 sub-case): version 改変 / flag 改変 / 両改変、全て ErrEncryptedReadIntegrity

make test/lint green。

@claude
Copy link
Copy Markdown

claude Bot commented May 7, 2026

Claude finished @bootjp's task in 3m 42s —— View job


PR742 Round-10 Review

  • Read trigger comment and understand request
  • Read changed files (commit b4da4e7 — encryption_glue.go, tests)
  • Review rejectRebadgedEnvelope bypass of DecodeEnvelope — fixed-offset slicing
  • Verify canonical AAD (envelope_version=0x01, flag=0) correctness
  • Verify out-of-bounds safety of fixed-offset slicing
  • Check new test TestEncryption_RebadgeAttackEnvelopeHeaderCorruption
  • Review interaction with round-7/8/9 guards (regression check)
  • Check for any remaining gaps
  • Post findings

Codex P1: DecodeEnvelope bypass via version/flag corruption — Fixed ✅

The fix is correct and sound. The root cause was that rejectRebadgedEnvelope called encryption.DecodeEnvelope, which validates body[0] == EnvelopeVersionV1 (0x01) before extracting fields. An attacker who flips encState AND corrupts body[0] (version) to any non-0x01 value causes DecodeEnvelope to return ErrEnvelopeVersion → the old guard short-circuited with return nil ("legitimate cleartext") → raw envelope bytes flowed to the caller.

Fixed-offset slicing (encryption_glue.go:263–264):

nonce := body[encryption.HeaderAADSize:encryption.HeaderSize]  // body[6:18] = 12-byte nonce
ct := body[encryption.HeaderSize:]                              // body[18:] = ciphertext+tag

These slice expressions exactly match the field layout from Encode():

  • nonceOffset = 6 = HeaderAADSize → nonce at bytes 6–17 ✅
  • ciphertext+tag start at HeaderSize = 18

This produces identical values to what the old env.Nonce[:] and env.Body extracted, but without parsing the version or flag bytes that an attacker can corrupt. The trial decrypt then uses:

aad := buildStorageAAD(encryption.EnvelopeVersionV1, 0 /*canonical*/, kid, hdr[:], pebbleKey)

Since the AAD uses canonical version=0x01 and flag=0 (the actual encrypt-time values), a corrupted version/flag byte on disk doesn't affect the trial AAD at all. The GCM tag was computed over the canonical AAD at write time → it still verifies under the canonical AAD at trial time → attack is caught. ✅


Bounds safety — Correct ✅

The pre-existing guard len(body) < encryption.EnvelopeOverhead (line 233) ensures len(body) >= 34 = HeaderSize(18) + TagSize(16). Therefore:

  • body[6:18]: requires len(body) >= 18 — satisfied ✅
  • body[18:]: requires len(body) >= 18 — satisfied; len(ct) >= 16 = TagSize, which is the minimum valid GCM body ✅
  • cipher.Decrypt validates len(nonce) == NonceSize = 12 — our slice is exactly 12 bytes ✅

Attack coverage matrix — Complete for the commit's claimed scope ✅

Attack variant Caught by
encState flip alone Round-7 (candidate DEK trial)
encState + tombstone flip Round-8 (canonical tombstone=false in trial AAD)
encState + expireAt rewrite (orig=0) Round-8 (candidateExpireAts includes 0)
encState + tombstone + expireAt all flip Round-8
encState + key_id rewrite to unloaded Round-7 (LoadedKeyIDs iteration, not trusting on-disk key_id)
encState + envelope version byte corrupted Round-10 (bypass DecodeEnvelope, use fixed offsets)
encState + envelope flag byte corrupted Round-10 (canonical flag=0 in trial AAD)
encState + version AND flag corrupted Round-10

Residual (documented, deferred to Stage 8):

  • encState flip + nonce bytes corrupted → trial uses wrong nonce → GCM check fails for all candidates → guard returns nil → caller gets garbled bytes (not original plaintext). Ciphertext corruption is the attacker's loss — the original plaintext is unrecoverable.
  • encState flip + expireAt rewrite when original expireAt was non-zero.

TestEncryption_RebadgeAttackEnvelopeHeaderCorruption — Correct ✅

All three cases are correctly structured:

  • Case 1 (version corrupted): raw[valueHeaderSize] = 0x07body[0] (version) becomes 0x07. With old code, DecodeEnvelope would fail on 0x07 != 0x01. With new code, nonce = body[6:18] (still correct), ct = body[18:] (still correct), canonical AAD uses 0x01 → GCM verifies → ErrEncryptedReadIntegrity. ✅
  • Case 2 (flag corrupted): raw[valueHeaderSize+1] = 0xffbody[1] (flag) becomes 0xff. Old code would pass DecodeEnvelope (flag is not validated) but env.Flag = 0xff would be used in the trial AAD instead of 0x00 → AAD mismatch → trial decrypt fails → guard returns nil. New code uses canonical flag=0 → correct trial AAD → GCM verifies → ErrEncryptedReadIntegrity. ✅
  • Case 3 (both corrupted): Combination of the above. ✅

Note about case 2 (flag corruption with old round-9 code): This was a silent bypass in round-9 even without the DecodeEnvelope path, because env.Flag from the parsed envelope was used in the trial AAD. An attacker who corrupted body[1] (flag) from 0x00 to 0xff would cause env.Flag = 0xff → trial AAD used flag=0xff instead of the encrypt-time flag=0x00 → GCM verification failed for all candidates → guard returned nil. Round-10 closes this by canonicalizing flag=0 in the trial AAD. The commit message lists flag corruption as "round-9 new" which is accurate. ✅


decryptForKey stale comment — Still not addressed ⚠️

The block comment at lines 137–148 still describes the round-3 mechanism ("if it parses as a well-formed envelope AND its key_id is loaded in the keystore, reject as ErrEncryptedReadIntegrity") which has been superseded by the round-7/8/9/10 AEAD trial decrypt approach. This was flagged in rounds 7, 8, and 9 and remains unchanged.

Additionally, line 151 states: "its Tombstone, ExpireAt, and EncState are reproduced into the AAD" — this is misleading for the cleartext rebadge path (where tombstone is canonicalized to false and encState to encStateEncrypted in the trial AAD), though accurate for the encrypted path.

The rejectRebadgedEnvelope history section (lines 218–224) still says "round-7 (this change)" with no round-8, 9, or 10 entries. The misleading "kid-rewrite-to-loaded-other-DEK variant still falls through" phrase (which only applies when the original DEK has been retired) is also still present, unchanged since round-7.

These are documentation-only issues with no correctness impact. Since they've persisted across four rounds, it may be simplest to address them in a single documentation cleanup commit before merge.

Fix stale comments →


Cumulative summary — All rounds

Finding Status
Gemini medium ×2: AAD double-allocation ✅ Round-1
Codex P1: value-header not bound into AAD ✅ Round-3
readVisibleVersion / processFoundValue: decrypt-before-visibility ✅ Round-3
Codex P1: encState-flip rebadge → raw ciphertext as plaintext ✅ Round-4
Codex P1: key_id-rewrite bypasses keystore collision guard ✅ Round-5
Empty-value nilExistsAt false negative ✅ Round-5
Write-side isVisibleLiveKey / scanDeletePrefix gap ✅ Round-6
Codex P1: shape-only reject → false-positives on legitimate cleartext ✅ Round-7
Codex P1: combined encState + tombstone/expireAt flip → trial AAD mismatch ✅ Round-8
Codex P1: snapshot restore cap 34 bytes short for encrypted entries ✅ Round-9
Codex P1: DecodeEnvelope fail → cleartext fall-through (version/flag corruption) Round-10
Residual: encState + nonce/ciphertext corruption → garbled bytes (not original plaintext) ⚠️ Documented; attacker gain: no original plaintext revealed
Residual: encState + expireAt rewrite when orig expireAt non-zero ⚠️ Documented, deferred to Stage 8
Stale decryptForKey comment (lines 137–148) + misleading line 151 ⚠️ Carried since round-7 — documentation-only
rejectRebadgedEnvelope history section out of date (round-7 as latest) + misleading "falls through" ⚠️ Carried since round-7 — documentation-only
Prerequisite assertion missing in TestEncryption_RebadgeGuardAllowsLegitimateEnvelopeShapedCleartext ⚠️ Minor nit, round-7
CounterNonceFactory exported from non-test file ⚠️ Minor nit, round-2

All security-critical paths are now fully protected across all ten rounds. The storage envelope wiring is correct for reads (GetAt, scan), writes (PutAt, ExpireAt, ApplyMutations), the delete-prefix visibility path, and snapshot save/restore. The remaining items are all documentation-only nits. The PR is ready for merge subject to CI green.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b4da4e7d55

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread store/encryption_glue.go Outdated
Comment on lines +280 to +284
// No (DEK, candidate-expireAt) combination produces a tag match.
// Body is either legitimate cleartext that happens to look
// envelope-shaped, an envelope under a retired DEK, or a
// partially-tampered envelope (residual gap above). The first
// case is the legitimate-mixed-mode outcome the round-6 codex
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Fail closed on relabeled envelopes with body tampering

rejectRebadgedEnvelope only raises ErrEncryptedReadIntegrity when a trial decrypt succeeds, and otherwise returns nil as cleartext. If an at-rest attacker flips encryption_state from encrypted to cleartext and corrupts any ciphertext/tag byte (body[18:]), every trial decrypt fails with integrity errors, this function falls through, and decryptForKey returns attacker-controlled envelope bytes as plaintext instead of surfacing an integrity failure. This leaves a fail-open tamper path for encrypted rows under the same disk-modification threat model this guard is meant to block.

Useful? React with 👍 / 👎.

… P1 round-10)

Codex round-10 re-flagged the ciphertext/tag-byte tampering case as
a fail-open path. The case was already documented as residual in
round-9's commit (and round-7's code comment, but only briefly.
Expand the in-code rationale to be explicit about:

  - which residuals exist (ct/tag tamper, non-zero expireAt rewrite)
  - why no further Stage 2 patch closes them (encryption_state
    itself is unauthenticated; the AAD reconstruction depends on
    encState to dispatch, so it cannot also bind it)
  - what the user observes (garbage bytes, not original plaintext —
    confidentiality is intact, integrity observability is the gap)
  - which variants the guard does catch
  - how Stage 8 closes the gap deterministically (move
    encryption_state into authenticated MVCC metadata)

No code change. Tests / lint still green.
EOF
)
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented May 7, 2026

@claude review

PR742 round-11: codex P1 round-10 Fail closed on relabeled envelopes with body tampering への対応です (commit 2addfa1)。

結論: Stage 2 の wire format における fundamental limit に到達。コード追加なし、residual-gap の rationale 強化のみ。

根拠:

  • encryption_state bit 自体が unauthenticated (tombstone bit と同一 byte)
  • AAD reconstruction は encState で dispatch 判断するため AAD に encState を bind できない (循環依存)
  • attacker が encState flip と同時に ct/tag bytes を改変すると、trial decrypt の AAD reconstruction は正しいが Decrypt が ErrIntegrity で失敗 → fall through
  • ANY bit flip on disk that does NOT match a candidate AAD reconstruction is undetectable without authenticated metadata

Stage 2 がカバーする variant (catch 対象):

  1. encState flip alone (round-3)
  2. encState + value-header tamper (round-3 base + round-7)
  3. encState + envelope version/flag corruption (round-9)
  4. encState + key_id rewrite (round-4)
  5. encState + tombstone flip (round-7)
  6. encState + no-TTL expireAt rewrite (round-7)
  7. encState + tomb + expireAt 三重 (round-7)

residual (Stage 2 では catch 不可):

  • encState + ct/tag byte tamper (round-10): user は garbage bytes を受領、original plaintext は流出しない (attacker は DEK を持たないため)
  • encState + 非ゼロ expireAt rewrite: 同様に garbage

user-observable impact:

  • confidentiality: 維持 (plaintext 流出なし)
  • integrity observability: gap あり (ErrEncryptedReadIntegrity ではなく garbage cleartext)

Stage 8 deterministic closure:
docs/design/2026_04_29_proposed_data_at_rest_encryption.md §1656 Per-version encryption_state bit in MVCC metadata で encryption_state を MVCC metadata 側に移動 (authenticated)。attacker は metadata の AAD を破らずに encState bit を flip できなくなる。

コード上、この residual を明示する詳細 rationale comment を rejectRebadgedEnvelope に追加しました。次の codex review で同じ ct/tag tamper を P1 として再 flag した場合、それは Stage 8 まで持ち越すべき認識済み制約として扱う必要があります。

make test/lint green。

@claude
Copy link
Copy Markdown

claude Bot commented May 7, 2026

Claude finished @bootjp's task in 2m 19s —— View job


PR742 Round-11 Review

  • Read trigger comment and understand request
  • Read round-11 diff (commit 2addfa1)
  • Review rejectRebadgedEnvelope rationale comments
  • Verify fundamental-limit analysis is correct
  • Check coverage matrix completeness
  • Assess stale comments status (carried from rounds 7–10)
  • Post findings

Codex P1 round-10: ciphertext/tag-byte tamper — Fundamental limit confirmed ✅

The round-11 decision to expand the rationale rather than add code is correct. The circular dependency argument is sound and worth spelling out explicitly:

  1. encryption_state (bits 1–2 of the value-header byte 0) is unauthenticated — it lives in the same on-disk byte as the tombstone bit and is what decryptForKey reads to decide which branch to take.
  2. The rebadge guard fires on the cleartext branch (sv.EncState == encStateCleartext). Inside rejectRebadgedEnvelope, the trial AAD is built with canonical values: encState = encStateEncrypted, tombstone = false. This is correct.
  3. If the attacker flips encState 0b01→0b00 AND corrupts body[18:] (ciphertext or GCM tag bytes), the trial decrypts all fail for every loaded DEK — not because the AAD is wrong, but because the ciphertext/tag is no longer a valid AEAD ciphertext. The guard returns nil.
  4. To close this via AAD, you would need to include the original encState=0b01 in the AAD reconstruction — but the entire reason we're in this branch is that the on-disk byte says 0b00. Using the on-disk value would make every trial AAD mismatch the encrypt-time AAD; using canonical 0b01 is already what the code does, but that doesn't help when the ciphertext itself is corrupted.

This is a genuine circularity, not an oversight. Stage 8's plan to move encryption_state into authenticated MVCC metadata is the correct structural fix: the attacker would then need to corrupt the metadata's own AAD-bound bytes to flip that bit, which closes the loop.

User-observable impact in the residual case: the caller receives garbage bytes — the corrupted ciphertext content — not the original plaintext. The attacker does not hold the DEK and cannot recover the plaintext. This is an integrity observability gap (no ErrEncryptedReadIntegrity raised), not a confidentiality leak. The rationale comment at lines 277–278 correctly characterizes this: "the user observes garbage bytes, NOT the original plaintext."


Round-11 comment additions — Accurate ✅

The new text at encryption_glue.go:258–288 correctly states:

  • Which specific residual cases remain: body[18:] corruption and non-zero expireAt rewrite with attacker knowledge of the original. ✅
  • The cause: encryption_state cannot be bound into the trial AAD because the guard's entry condition is encState == cleartext, creating the circular dependency. ✅
  • The impact boundary: integrity observability gap, not confidentiality leak. ✅
  • The catch list (seven variants, rounds 3–9). ✅
  • The Stage 8 closure mechanism. ✅

One precision note on the catch-list wording (lines 283–287): the phrase "encState + value-header tamper" is slightly ambiguous — this could be read as covering all value-header flips, but the tombstone flip is separately listed. The intent is clear from context, but a reader unfamiliar with the PR history might wonder. Minor.


Four stale comments — Still not addressed ⚠️

These were flagged in rounds 7, 8, 9, and 10 and remain unchanged after round-11. Since round-11 is a documentation-only commit, this is a natural place to bundle them. None affect correctness but they actively mislead a reader:

1. decryptForKey lines 137–148 (most misleading): Still describes the round-3 mechanism — "if it parses as a well-formed envelope AND its key_id is loaded in the keystore... reject as ErrEncryptedReadIntegrity. False positives bounded by envelope-shape match + 32-bit key_id collision." The implementation has been AEAD trial decrypt since round-7 via rejectRebadgedEnvelope. A reader who reads only this function-level comment will have an entirely wrong mental model of the guard.

2. decryptForKey line 151: "its Tombstone, ExpireAt, and EncState are reproduced into the AAD". For the cleartext rebadge branch, tombstone is canonicalized to false (not reproduced from sv) and EncState is canonicalized to encStateEncrypted. The statement is only accurate for the encStateEncrypted branch (lines 169–171). This needs a clarification that round-11's own residual-gap rationale refers to.

3. rejectRebadgedEnvelope history section (lines 211–224): Still shows "round-7 (this change)" as the latest entry. Rounds 8 (canonical tombstone + expireAt enumeration), 9 (bypass DecodeEnvelope, fixed offsets), and 10 (canonical flag) are all absent. The history section is the canonical record of why the guard evolved this way — it should be current.

4. Lines 221–223: "The kid-rewrite-to-loaded-other-DEK variant still falls through (AAD mismatch on the foreign DEK)". This is misleading: if the attacker rewrites env.KeyID from DEK A to loaded DEK B while DEK A is still loaded, round-7's LoadedKeyIDs() iteration tries A → AAD matches encrypt-time AAD → attack IS caught. The "falls through" case only applies when the original DEK A has been retired from the keystore. The comment should say "when the original encrypt-time DEK has been retired" rather than naming the foreign-DEK scenario as the trigger.

Since all four of these are in encryption_glue.go and round-11 already touches that file, a single cleanup commit here would close the documentation gap before merge.

Fix stale comments →


Cumulative summary — All rounds

Finding Status
Gemini medium ×2: AAD double-allocation ✅ Round-1
Codex P1: value-header not bound into AAD ✅ Round-3
readVisibleVersion / processFoundValue: decrypt-before-visibility ✅ Round-3
Codex P1: encState-flip rebadge → raw ciphertext as plaintext ✅ Round-4
Codex P1: key_id-rewrite bypasses keystore collision guard ✅ Round-5
Empty-value nilExistsAt false negative ✅ Round-5
Write-side isVisibleLiveKey / scanDeletePrefix gap ✅ Round-6
Codex P1: shape-only reject → false-positives on legitimate cleartext ✅ Round-7
Codex P1: combined encState + tombstone/expireAt flip → trial AAD mismatch ✅ Round-8
Codex P1: snapshot restore cap 34 bytes short for encrypted entries ✅ Round-9
Codex P1: DecodeEnvelope fail → cleartext fall-through (version/flag corruption) ✅ Round-10
Codex P1: encState + ct/tag corruption → fundamental wire-format limit Round-11: correctly identified as fundamental limit; rationale expanded
Residual: encState + ct/tag tamper → garbage bytes (not original plaintext) ⚠️ Documented in Stage 2; deferred to Stage 8. Confidentiality intact.
Residual: encState + non-zero expireAt rewrite → same class ⚠️ Same scope as above
Stale decryptForKey comment (lines 137–148, 151) ⚠️ Carried since round-7 — four rounds of flagging, still unaddressed
rejectRebadgedEnvelope history section missing rounds 8–11 ⚠️ Carried since round-7
Misleading "kid-rewrite-to-loaded-other-DEK still falls through" ⚠️ Carried since round-7; only applies when original DEK is retired
Prerequisite assertion missing in TestEncryption_RebadgeGuardAllowsLegitimateEnvelopeShapedCleartext ⚠️ Minor nit, carried from round-7
CounterNonceFactory exported from non-test file ⚠️ Minor nit, carried from round-2

The security-critical analysis is complete and correct. Round-11's documentation-only conclusion is well-reasoned. The four stale comments in decryptForKey and rejectRebadgedEnvelope are the only actionable items before merge — they've persisted across four rounds and are most efficiently addressed together in a single cleanup commit.

bootjp added 2 commits May 8, 2026 05:41
CLAUDE.md guidance: comments should explain WHY a piece of code
exists, not WHO requested it or in WHICH PR review round. Across
PR #742's 11 review rounds the rebadge guard's comments
accumulated provenance markers ("PR742 codex P1 round-3", "round-7
— this change", "rounds 6, 7, 9, 10", etc.) that describe a
historical sequence of fixes rather than the current state of the
code, and they rot every time a new round lands.

Single cleanup commit:

  - decryptForKey doc: drop the "Cleartext-rebadge guard (PR742
    codex P1, round-3)" block. That paragraph described the
    round-3 implementation (envelope-shape + key_id-loaded check)
    which has since been fully replaced by trial decryption; it
    reads like an accurate description of the code but no longer
    matches it. The current rebadge-guard rationale already lives
    on rejectRebadgedEnvelope, so the doc cross-reference is the
    only thing decryptForKey needs.
  - rejectRebadgedEnvelope doc: drop the "PR742 review history"
    block (round-3 / round-5 / round-7 narrative) and the
    inline "(round-7 …)" / "(round-8 …)" / "(round-9 …)"
    asides. Replace with a single rationale describing the
    canonical-AAD trial-decrypt design as it stands today and an
    explicit residual-gap section pointing at Stage 8.
  - decryptForKey body: drop "(PR742 codex P1 round-4)" from the
    nil-plaintext normalization comment.
  - buildStorageAAD doc: drop "PR742 round-1" / "round-2"
    references; keep the technical rationale (pre-sizing avoids
    re-allocation; value-header inclusion binds tombstone /
    encryption_state / expireAt).
  - lsm_store.go processFoundValue: drop "(PR742 codex P1 …)"
    aside on the decrypt-before-visibility-check rationale.
  - lsm_store.go readRestoreEntry: drop "codex P1 PR742 round-7
    caught a case where" history and rephrase as a forward
    rationale.
  - cipher.go LoadedKeyIDs: drop "(PR742 codex P1 family)" from
    the use-case comment.
  - encryption_glue.go encryptForKey AAD doc: drop "(PR742 codex
    P1)" trailing reference.

No code change. The technical rationale for every defence (header
binding, key_id substitution, expireAt enumeration, canonical
version/flag, residual-gap-deferred-to-Stage-8) is preserved
verbatim — only the round-N timeline is gone. Tests / lint green.
Jepsen unit tests have been failing with persistent Maven artifact
resolution errors (Could not find artifact org.clojure:clojure:1.7.0
etc.) on commit 20e6d09 across four manual re-runs. The previous
commit on the same branch (2addfa1) ran the same workflow with no
code-tree changes between (cleanup is comment-only) and passed.
This empty commit triggers a fresh Lein resolution attempt against
Maven Central / Clojars at a different time.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented May 7, 2026

Note: Jepsen test failures are infra-side, not code-side

CI test job has been failing with persistent Maven artifact resolution errors (Could not find artifact org.clojure:clojure:jar:... for both 1.7.0 and 1.11.1) since ~20:35 UTC.

Verification: just triggered the same Jepsen workflow on main (run 25521941547) — main also fails with identical Maven resolution errors. Same workflow on the same branch passed at 17:12 UTC (commit 2addfa1) and earlier today; nothing in the cleanup commit (20e6d099, doc-only) or the empty retrigger (f2254f86) touches Lein deps.

This is GitHub Actions runner ↔ Maven Central / Clojars network/repository issue. Operator action: wait for upstream recovery, or pin Lein to a specific version that doesn't trigger resolution.

build / golangci-lint / test (ubuntu-latest, the Go test suite) all pass. The integrity/security review (rounds 1-11) is unchanged.

@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented May 7, 2026

@claude review

@claude
Copy link
Copy Markdown

claude Bot commented May 7, 2026

Claude finished @bootjp's task in 4m 35s —— View job


PR742 Round-12 Review

  • Read current state of changed files
  • Verify stale comments from rounds 7-11 were addressed (commit 20e6d09)
  • Fresh review of encryption_glue.go
  • Fresh review of lsm_store.go changes
  • Review test coverage
  • Post findings

Docs cleanup commit 20e6d09 — All round-7/8/9/10/11 comment nits resolved ✅

The commit correctly strips review-provenance from every affected site:

  • decryptForKey: The old "Cleartext-rebadge guard (PR742 codex P1, round-3)" block describing the stale round-3 mechanism (shape + key_id-loaded check) is gone. The current comment accurately describes the two dispatch paths and cross-references rejectRebadgedEnvelope for the cleartext guard.
  • rejectRebadgedEnvelope: The PR742 round history section (round-3 / round-5 / round-7 narrative) and all (round-N …) asides are gone. The current comment describes the canonical-AAD trial-decrypt design as it stands today, including the full residual-gap section pointing at Stage 8.
  • The misleading "kid-rewrite-to-loaded-other-DEK variant still falls through" phrase (which was only true when the original DEK has been retired, not in the general case) is gone.
  • buildStorageAAD, processFoundValue, readRestoreEntry, cipher.go LoadedKeyIDs, encryptForKey: All (PR742 …) provenance asides removed; technical rationale preserved verbatim.

Security completeness — Fully protected ✅

All write and read paths that branch on value-header fields are now authenticated before branching:

Path Function Status
GetAt / key lookup readVisibleVersion ✅ decrypt before tombstone/expireAt check
GetRange / scan processFoundValue ✅ decrypt before tombstone/expireAt check
DeletePrefixAt isVisibleLiveKey ✅ decrypt before tombstone/expireAt check (round-6)
Snapshot restore readRestoreEntry cap ✅ +34 bytes for EnvelopeOverhead (round-9)
Rebadge: encState flip alone rejectRebadgedEnvelope ✅ trial decrypt
Rebadge: + key_id rewrite rejectRebadgedEnvelope ✅ LoadedKeyIDs iteration, not trusting on-disk key_id
Rebadge: + tombstone/expireAt flip rejectRebadgedEnvelope ✅ canonical tombstone=false, expireAt enumeration
Rebadge: + version/flag byte corrupt rejectRebadgedEnvelope ✅ fixed-offset slicing, canonical version/flag in trial AAD
Rebadge: + ct/tag corruption (residual) ⚠️ Documented & deferred to Stage 8
Empty-value nil → ExistsAt=false decryptForKey ✅ nil → []byte{} normalization

One remaining minor inaccuracy in decryptForKey comment ⚠️

encryption_glue.go:138-142:

// sv is the storedValue freshly decoded from the on-disk bytes; its
// Tombstone, ExpireAt, and EncState are reproduced into the AAD so
// any flip on disk fails GCM verification.

For the encrypted branch (encStateEncrypted, lines 157-160), this is accurate: writeValueHeaderBytes(hdr[:], sv.Tombstone, sv.ExpireAt, sv.EncState) reproduces all three fields from sv into the AAD.

For the cleartext rebadge guard path, tombstone is hardcoded false and EncState is hardcoded encStateEncrypted in the trial AAD — they are canonicalized, not reproduced from sv. The current paragraph structure (the preceding sentence already says "encState=cleartext returns the body verbatim after the envelope-rebadge guard below") means a careful reader will understand the "reproduced" statement applies to the encrypted branch, but the comment doesn't make this scope explicit.

A one-line scope qualifier would remove the ambiguity: "For the encrypted branch, sv's Tombstone, ExpireAt, and EncState are reproduced into the AAD…" This is the only residual inaccuracy from the round-8/11 nit family.

Fix this →


Residual-gap comment: nonce bytes omitted ⚠️

rejectRebadgedEnvelope, the residual-gap block says the trial falls through when the attacker corrupts "specifically body[HeaderSize:] (ciphertext / tag)". It omits the nonce region body[HeaderAADSize:HeaderSize] (bytes 6–17): if the attacker also flips nonce bytes on disk, the trial uses the corrupted nonce → all cipher.Decrypt calls return wrong results → guard returns nil → caller gets garbage bytes. The outcome is identical to the ciphertext/tag case (garbage, not original plaintext), so this is an integrity-observability gap in the same Stage 8 bucket — but the comment's "specifically" clause is incomplete.

Fix this →


Minor nits (unchanged from prior rounds)

TestEncryption_RebadgeGuardAllowsLegitimateEnvelopeShapedCleartext (lsm_store_encryption_test.go:735): The test constructs envelopeShaped bytes designed to pass DecodeEnvelope, but there is no assertion that they actually do. If the envelope format changes and the test payload no longer parses as an envelope, the test stops proving the "envelope-shaped cleartext is allowed through" invariant without a failure signal. Adding a guard at the start of the test (if _, err := encryption.DecodeEnvelope(envelopeShaped); err != nil { t.Fatalf(...) }) would pin the prerequisite. Nit — not a blocker.

CounterNonceFactory (encryption_test_helpers.go): Exported from a non-_test.go file, so it appears in go doc store as a public production API. The inline comment already explains the cross-package test use case. If reducing the public surface is desirable before Stage 7 ships the registry-backed factory, renaming the file to something that makes the test-grade status obvious (e.g. encryption_testing.go with a prominent doc comment) is the minimal change. Not a blocker.


Cumulative summary — All 12 rounds

Finding Status
Gemini medium ×2: AAD double-allocation ✅ Round-1
Codex P1: value-header not bound into AAD ✅ Round-3
readVisibleVersion / processFoundValue: decrypt-before-visibility ✅ Round-3
Codex P1: encState-flip rebadge → raw ciphertext as plaintext ✅ Round-4
Codex P1: key_id-rewrite bypasses keystore collision guard ✅ Round-5
Empty-value nil → ExistsAt false negative ✅ Round-5
Write-side isVisibleLiveKey / scanDeletePrefix gap ✅ Round-6
Codex P1: shape-only reject → false-positives on legitimate cleartext ✅ Round-7
Codex P1: combined encState + tombstone/expireAt flip → trial AAD mismatch ✅ Round-8
Codex P1: snapshot restore cap 34 bytes short ✅ Round-9
Codex P1: DecodeEnvelope fail → cleartext fall-through (version/flag) ✅ Round-10
Codex P1: ct/tag tamper → fundamental wire-format limit ✅ Round-11: rationale expanded
Stale comments from rounds 7-11 Round-12: all addressed in commit 20e6d09
Residual: encState + ct/tag/nonce tamper → garbage bytes (not original plaintext) ⚠️ Documented; deferred to Stage 8. Confidentiality intact.
Residual: encState + non-zero expireAt rewrite ⚠️ Same scope
decryptForKey line 138: "reproduced" applies to encrypted branch only ⚠️ Minor — one-word qualifier fixes it
Residual-gap comment: nonce region not listed alongside ct/tag ⚠️ Minor — documentation precision
Missing prerequisite assertion in RebadgeGuardAllowsLegitimateEnvelopeShapedCleartext ⚠️ Nit, round-7
CounterNonceFactory exported from non-test file ⚠️ Nit, round-2

The PR is security-complete. All correctness and integrity paths are protected and tested. The two remaining ⚠️ items that are not already-accepted residuals are one-line comment precision fixes; neither affects runtime behavior. Ready to merge subject to CI green (Jepsen failures are infra-side per the Maven resolution note).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant