
AES-GCM nonce reuse in WAL encryption on LSN rewind / snapshot restore #36

@hollanf


Breaks WAL confidentiality and enables forgery under a realistic operator workflow.

Summary

nodedb-wal/src/crypto.rs::lsn_to_nonce derives the AES-256-GCM 96-bit nonce solely from the 8-byte LSN: bytes 0..8 hold the little-endian LSN and the remaining 4 bytes are hard-coded to zero. There is no per-writer epoch, random prefix, or any value committed to the segment header that would disambiguate nonces across WAL lifetimes.

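AES-GCM requires that every (key, nonce) pair be unique. Reusing a nonce under the same key catastrophically breaks both confidentiality (GCM encrypts in counter mode, so two ciphertexts under the same nonce XOR to the XOR of their plaintexts) and integrity (a repeated nonce lets an attacker solve for the GHASH authentication key and forge valid tags; this attack is well documented).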
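The confidentiality half of this is plain algebra: each GCM ciphertext is plaintext XOR keystream(key, nonce), so reusing the pair reuses the keystream and it cancels out. The std-only sketch below shows that identity under one loud assumption: toy_keystream is a hypothetical, non-cryptographic stand-in (built on DefaultHasher) for AES-CTR, used only because the identity holds for any keystream. It does not model GHASH or the forgery side.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// HYPOTHETICAL, NON-CRYPTOGRAPHIC stand-in for AES-CTR keystream generation.
/// Like the real thing, it is a deterministic function of (key, nonce, position).
fn toy_keystream(key: &[u8; 32], nonce: &[u8; 12], len: usize) -> Vec<u8> {
    (0..len as u64)
        .map(|i| {
            let mut h = DefaultHasher::new();
            key.hash(&mut h);
            nonce.hash(&mut h);
            i.hash(&mut h);
            h.finish() as u8
        })
        .collect()
}

/// Byte-wise XOR of two buffers (truncated to the shorter one).
fn xor(a: &[u8], b: &[u8]) -> Vec<u8> {
    a.iter().zip(b).map(|(x, y)| x ^ y).collect()
}

/// Counter-mode style encryption: ciphertext = plaintext XOR keystream.
fn encrypt(key: &[u8; 32], nonce: &[u8; 12], pt: &[u8]) -> Vec<u8> {
    xor(pt, &toy_keystream(key, nonce, pt.len()))
}

fn main() {
    let key = [7u8; 32];
    // Same nonce in both WAL lifetimes: LSN 1 in bytes 0..8, zeros after.
    let mut nonce = [0u8; 12];
    nonce[..8].copy_from_slice(&1u64.to_le_bytes());

    let p_old: &[u8] = b"balance=100"; // written before the restore
    let p_new: &[u8] = b"balance=999"; // written after the restore
    let c_old = encrypt(&key, &nonce, p_old);
    let c_new = encrypt(&key, &nonce, p_new);

    // The keystream cancels: c_old ^ c_new == p_old ^ p_new.
    assert_eq!(xor(&c_old, &c_new), xor(p_old, p_new));

    // Knowing (or choosing) p_new yields p_old outright, without the key.
    let recovered = xor(&xor(&c_old, &c_new), p_new);
    assert_eq!(recovered.as_slice(), p_old);
    println!("recovered old record: {}", String::from_utf8_lossy(&recovered));
    // prints "recovered old record: balance=100"
}
```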

Current code

nodedb-wal/src/crypto.rs:186-195

```rust
/// Derive a 12-byte nonce from an LSN.
///
/// AES-256-GCM requires a 96-bit (12 byte) nonce. Since LSNs are monotonically
/// increasing and globally unique, they make ideal deterministic nonces.
/// We zero-pad the 8-byte LSN to 12 bytes.
fn lsn_to_nonce(lsn: u64) -> aes_gcm::Nonce<aes_gcm::aead::consts::U12> {
    let mut nonce_bytes = [0u8; 12];
    nonce_bytes[..8].copy_from_slice(&lsn.to_le_bytes());
    nonce_bytes.into()
}
```
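A dependency-free mirror of lsn_to_nonce makes the layout concrete. lsn_to_nonce_bytes is a hypothetical name, and it returns a plain [u8; 12] instead of the aes_gcm nonce type so the sketch builds with std alone; the byte layout matches the quoted code. It shows that 32 of the 96 nonce bits never vary and that nothing except the LSN feeds the nonce.

```rust
/// Std-only mirror of `lsn_to_nonce` (hypothetical name): same byte layout,
/// plain array return type instead of `aes_gcm::Nonce<U12>`.
fn lsn_to_nonce_bytes(lsn: u64) -> [u8; 12] {
    let mut nonce = [0u8; 12];
    nonce[..8].copy_from_slice(&lsn.to_le_bytes());
    nonce
}

fn main() {
    // LSN 1: little-endian in bytes 0..8, then four zero bytes.
    assert_eq!(lsn_to_nonce_bytes(1), [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]);

    // Bytes 8..12 stay zero even at the top of the LSN range.
    assert_eq!(&lsn_to_nonce_bytes(u64::MAX)[8..], &[0u8; 4]);

    // The LSN is the only input: any writer, process, or WAL lifetime that
    // reaches the same LSN computes the same nonce.
    assert_eq!(lsn_to_nonce_bytes(42), lsn_to_nonce_bytes(42));
}
```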

Combined with the writer at nodedb-wal/src/writer.rs:207:

```rust
let lsn = self.next_lsn.fetch_add(1, Ordering::Relaxed);
```

where next_lsn is seeded from recovery::recover(), which scans the current WAL file from offset 0 and sets next_lsn = last_lsn + 1. There is no persisted monotonic counter independent of the WAL file's own content.
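The following sketch models that seeding to show exactly where LSNs repeat. recover_next_lsn and Writer are hypothetical stand-ins, not nodedb's types: recovery is reduced to "last LSN in the file + 1, or 1 if there is no file", and append allocates LSNs the same way writer.rs:207 does.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical model of recovery::recover(): derive next_lsn purely from the
/// WAL's own content (last LSN + 1), or 1 when no WAL file exists.
fn recover_next_lsn(wal_lsns: &[u64]) -> u64 {
    wal_lsns.last().map_or(1, |last| last + 1)
}

/// Hypothetical writer that allocates LSNs like writer.rs:207.
struct Writer {
    next_lsn: AtomicU64,
}

impl Writer {
    fn open(wal_lsns: &[u64]) -> Self {
        Writer { next_lsn: AtomicU64::new(recover_next_lsn(wal_lsns)) }
    }

    fn append(&self) -> u64 {
        self.next_lsn.fetch_add(1, Ordering::Relaxed)
    }
}

fn main() {
    // Lifetime 1: fresh WAL, three writes.
    let w1 = Writer::open(&[]);
    let first: Vec<u64> = (0..3).map(|_| w1.append()).collect();
    assert_eq!(first, vec![1, 2, 3]);

    // Restore wipes the WAL directory, so lifetime 2 recovers nothing and
    // hands out LSN 1 again: the same nonce under the same key.
    let w2 = Writer::open(&[]);
    assert_eq!(w2.append(), 1);

    // Two databases recovered from the same WAL both continue at LSN 4.
    let a = Writer::open(&first);
    let b = Writer::open(&first);
    assert_eq!(a.append(), b.append());
}
```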

The KeyRing at crypto.rs:108-113 tracks current + previous keys for rotation but does not carry a nonce prefix.

Why it's broken

Any operator workflow that re-issues already-used LSNs under the same key reuses nonces. Concrete paths:

  1. Snapshot restore + WAL truncation. Restore a snapshot at LSN X, delete the WAL directory (the standard restore flow), and restart: recover() finds no file, so next_lsn = 1. New writes are encrypted at lsn = 1, 2, 3, … under the same key that already encrypted those LSNs in the previous WAL lifetime, and the earlier ciphertexts still exist in backups, off-site replicas, or tape. An attacker XORs ciphertexts with matching LSNs to obtain pt_old ^ pt_new, which yields the full plaintext of one side if the other is known.
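  2. Operator clone / replay from backup into a new DB with the same key. The clone recovers the same WAL and continues from the same next_lsn; if the original keeps writing too, both issue identical LSNs for different records.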
  3. Segment truncation before compaction that resets next_lsn to an already-used value.

Since the nonce has no random or per-lifetime component whatsoever, this is a latent landmine even where current operational procedures happen to rotate the key: anyone who misses the key-rotation step on restore silently loses confidentiality.
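One possible direction, offered as a sketch rather than as the project's design: put a 32-bit writer epoch in the four bytes that are currently always zero. The epoch is assumed to be durably stored outside the WAL directory (for example alongside the key material, since the KeyRing currently carries no nonce prefix) and bumped on every writer open, so a WAL wipe cannot roll it back. epoch_lsn_nonce and next_epoch are hypothetical names. A random 32-bit prefix would also work for a while, but birthday collisions become non-negligible after on the order of 2^16 writer lifetimes, which is why a persisted counter fits better here.

```rust
/// Sketch: nonce = LSN (bytes 0..8, LE) || writer epoch (bytes 8..12, LE).
/// ASSUMPTION: `epoch` comes from a hypothetical counter persisted outside
/// the WAL directory and incremented on every writer open.
/// With epoch 0 this reproduces today's nonces byte for byte.
fn epoch_lsn_nonce(epoch: u32, lsn: u64) -> [u8; 12] {
    let mut nonce = [0u8; 12];
    nonce[..8].copy_from_slice(&lsn.to_le_bytes());
    nonce[8..].copy_from_slice(&epoch.to_le_bytes());
    nonce
}

/// Hypothetical epoch bump. It refuses to wrap, because a wrapped epoch
/// would reintroduce nonce reuse; the caller must rotate the key instead.
fn next_epoch(persisted: u32) -> Option<u32> {
    persisted.checked_add(1)
}

fn main() {
    // Lifetime 1 and lifetime 2 (after a WAL wipe) both write LSN 1,
    // but their epochs differ, so their nonces differ.
    let e1 = next_epoch(0).expect("epoch space exhausted");
    let e2 = next_epoch(e1).expect("epoch space exhausted");
    assert_ne!(epoch_lsn_nonce(e1, 1), epoch_lsn_nonce(e2, 1));

    // Exhaustion is surfaced instead of silently wrapping to 0.
    assert_eq!(next_epoch(u32::MAX), None);
}
```

The uniqueness guarantee is only as strong as the epoch's durability: if the counter can be restored from a backup along with the data, the same reuse comes back.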

Reproduction

```
# Enable encryption with a fixed key.
nodedb --wal-encrypt-key=<K> ...
# Write some records.
INSERT ... ; INSERT ... ; INSERT ...
# Stop, save ciphertext of the WAL segment, then wipe the WAL dir.
rm -rf $DATA_DIR/wal
# Restart with same key.
nodedb --wal-encrypt-key=<K> ...
# Write DIFFERENT plaintext records.
INSERT ... ;
# Diff the two ciphertexts at matching LSNs — XOR recovers pt_old ^ pt_new.
```

Notes

  • Found during a CPU/memory audit sweep of nodedb-wal/src/*.
  • No evidence this has been exploited in the wild; filing as a design-level crypto defect.
