Integer overflow in mmap vector segment header enables OOB read (memory unsafety) #37

@hollanf

A crafted or corrupt segment file triggers an out-of-bounds read (undefined behavior) that escapes the mmap'd region.

Summary

MmapVectorSegment::open reads dim and count directly from the file header without any range checks, then computes the expected file size with unchecked usize multiplication. Even on 64-bit hosts, a crafted header overflows that multiplication. In release builds (where overflow checks are off by default) it wraps silently, the size validation passes against a bogus expected value, and every subsequent get_vector() call performs an unchecked slice::from_raw_parts whose offset and length are derived from those bogus values, reading memory outside the mmap region. Debug builds panic on the overflow instead, which is still a crash.

Current code

nodedb-vector/src/mmap_segment.rs:86-114

```rust
let dim = unsafe {
    let ptr = base as *const u32;
    u32::from_le(*ptr) as usize
};
let count = unsafe {
    let ptr = base.add(4) as *const u32;
    u32::from_le(*ptr) as usize
};

let expected = HEADER_SIZE + count * dim * 4;   // usize * usize, unchecked
if file_size < expected {
    ...
    return Err(...);
}
```
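A minimal sketch of what a checked version of this header parse could look like, written against a plain `&[u8]` view of the mapping rather than the raw `base` pointer. `parse_header`, `read_u32_le` and `HeaderError` are hypothetical names, and `header_size` stands in for `HEADER_SIZE`, whose value the issue does not give (assumed here to be at least 8). Any overflow in the size computation rejects the file instead of wrapping:

```rust
/// Hypothetical error type for this sketch; the crate's real error type is not shown in the issue.
#[derive(Debug, PartialEq)]
pub enum HeaderError {
    TooShort,
    Overflow,
    Truncated { expected: usize, actual: usize },
}

/// Bounds-checked little-endian u32 read; returns None if `bytes` is too short.
fn read_u32_le(bytes: &[u8], at: usize) -> Option<u32> {
    let c = bytes.get(at..at.checked_add(4)?)?;
    Some(u32::from_le_bytes([c[0], c[1], c[2], c[3]]))
}

/// Parses `dim` (offset 0) and `count` (offset 4) and checks that the file can hold
/// `count * dim` f32 values after the header. Returns (dim, count).
/// `header_size` is a stand-in for HEADER_SIZE (assumed >= 8).
pub fn parse_header(bytes: &[u8], header_size: usize) -> Result<(usize, usize), HeaderError> {
    let dim = read_u32_le(bytes, 0).ok_or(HeaderError::TooShort)? as usize;
    let count = read_u32_le(bytes, 4).ok_or(HeaderError::TooShort)? as usize;

    // Every step is checked: an overflow rejects the file instead of wrapping.
    let expected = count
        .checked_mul(dim)
        .and_then(|n| n.checked_mul(std::mem::size_of::<f32>()))
        .and_then(|n| n.checked_add(header_size))
        .ok_or(HeaderError::Overflow)?;

    if bytes.len() < expected {
        return Err(HeaderError::Truncated { expected, actual: bytes.len() });
    }
    Ok((dim, count))
}
```

With the header from the reproduction (dim = count = 0x8000_0000) this returns `Err(HeaderError::Overflow)` on a 64-bit host instead of accepting the file.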

nodedb-vector/src/mmap_segment.rs:117-129

```rust
pub fn get_vector(&self, id: u32) -> Option<&[f32]> {
    let idx = id as usize;
    if idx >= self.count {
        return None;
    }
    let offset = self.data_offset + idx * self.dim * 4;   // also unchecked
    unsafe {
        let ptr = self.base.add(offset) as *const f32;
        Some(std::slice::from_raw_parts(ptr, self.dim))   // unchecked slice into mmap
    }
}
```

The same pattern repeats in prefetch (lines 132-139).
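Because get_vector and prefetch share the same offset arithmetic, one option is to route both through a single checked helper and only build the raw slice once it returns an in-bounds range. The sketch below assumes a hypothetical `vector_byte_range` function that takes the mapping length as `mmap_len`; the other parameters mirror the fields in the get_vector snippet. It does not address alignment, which still relies on `data_offset` being a multiple of 4.

```rust
use std::ops::Range;

/// Hypothetical helper: byte range of vector `idx` inside a mapping of `mmap_len` bytes,
/// or None if `idx` is out of range, any step overflows, or the range leaves the mapping.
pub fn vector_byte_range(
    data_offset: usize,
    dim: usize,
    count: usize,
    idx: usize,
    mmap_len: usize,
) -> Option<Range<usize>> {
    if idx >= count {
        return None;
    }
    // Bytes per vector; overflow means the header is bogus.
    let stride = dim.checked_mul(std::mem::size_of::<f32>())?;
    let start = idx.checked_mul(stride)?.checked_add(data_offset)?;
    let end = start.checked_add(stride)?;
    // The whole vector must lie inside the mapping, not just its first byte.
    if end > mmap_len {
        return None;
    }
    Some(start..end)
}
```

Only when this returns `Some(range)` would the existing unsafe code run, with `base.add(range.start)` and `from_raw_parts(ptr, dim)` then staying inside the mapping.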

Why it's broken

  1. count * dim * 4 uses plain usize multiplication. On a 64-bit host with count = dim = 0x8000_0000, count * dim is 2^62 and the final * 4 wraps to exactly 0 in a release build, so expected collapses to HEADER_SIZE and the file_size < expected check passes for any file at least that large.
  2. open() then stores the bogus dim and count on self.
  3. get_vector(id) recomputes idx * self.dim * 4 with the same unchecked arithmetic, so the resulting offset is attacker-controlled and is never compared against the mapping length.
  4. base.add(offset) followed by slice::from_raw_parts(ptr, dim) builds a slice of dim * 4 bytes that the file does not back; even id = 0 yields an 8 GiB slice over a 16-byte file. Depending on what follows the mmap region in the address space, this is either a SIGSEGV/SIGBUS DoS or a silent read of adjacent process memory (heap, other mappings).
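The wrap in step 1 can be checked directly. This sketch uses u64 so the result does not depend on the host (it matches usize on 64-bit targets); `wrapping_mul` reproduces what a release build computes with overflow checks off, and both function names are illustrative only. With count = dim = 0x8000_0000 the product is 2^62 and the final * 4 wraps to exactly 0, whereas 0x4000_0001 squared times 4 (2^62 + 2^33 + 4) still fits in 64 bits and does not wrap:

```rust
/// Mirrors `count * dim * 4` from open() as a release build computes it (wrapping on overflow).
pub fn wrapped_payload_len(count: u64, dim: u64) -> u64 {
    count.wrapping_mul(dim).wrapping_mul(4)
}

/// The same computation with checked arithmetic: None on overflow.
pub fn checked_payload_len(count: u64, dim: u64) -> Option<u64> {
    count.checked_mul(dim)?.checked_mul(4)
}
```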

Anyone who can place or modify a file under the data directory can trigger this. Realistic sources include a segment corrupted by an unclean shutdown, a corrupt segment pulled in by cold-storage sync, or a malicious neighbor on a shared filesystem.

Reproduction

```sh
# Build a 16-byte file whose header claims dim = count = 0x80000000 (little-endian u32s).
python3 -c 'import sys; sys.stdout.buffer.write(bytes.fromhex("00000080 00000080") + b"\x00"*8)' \
  > "$DATA_DIR/vector_segment_0.bin"
# Start nodedb and trigger a vector search that opens the segment.
# Observe SIGSEGV/SIGBUS, or run under ASan to see the OOB read.
```
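For a regression test, the same 16-byte file can be produced from Rust. `write_crafted_segment` is a hypothetical helper; the file name and its location under the data directory are whatever the test chooses. It writes dim = count = 0x8000_0000, the values that make the size computation wrap to zero on 64-bit hosts:

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Writes the reproduction file: dim and count as little-endian u32s, then 8 zero bytes.
pub fn write_crafted_segment(path: &Path) -> io::Result<()> {
    let mut bytes = Vec::with_capacity(16);
    bytes.extend_from_slice(&0x8000_0000u32.to_le_bytes()); // dim
    bytes.extend_from_slice(&0x8000_0000u32.to_le_bytes()); // count
    bytes.extend_from_slice(&[0u8; 8]); // padding, matching the Python one-liner
    fs::write(path, &bytes)
}
```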

Fuzzing MmapVectorSegment::open on the first 8 bytes with AFL++ trips this within a second.

Notes

  • Found during a CPU/memory audit sweep of nodedb-vector/src/*.
  • All arithmetic here should use checked_mul / checked_add and return Err on overflow.
  • get_vector (and prefetch) should additionally verify, with checked arithmetic, that offset + dim * 4 <= mmap_size before constructing the slice, since header-time validation alone isn't a sufficient guard once dim and count come from an untrusted file.
