A crafted or corrupt segment file produces an undefined-behavior out-of-bounds read that escapes the mmap'd region.
Summary
MmapVectorSegment::open reads dim and count directly from the file header without any range checks, then computes the expected file size with unchecked usize multiplication. On 64-bit hosts running a release build (where Rust disables integer overflow checks by default), the multiplication wraps silently for crafted headers and the size validation passes with a bogus expected value. Every subsequent get_vector() call then performs an unchecked slice::from_raw_parts at an offset outside the mapping, reading memory adjacent to the mmap region.
Current code
nodedb-vector/src/mmap_segment.rs:86-114
```rust
let dim = unsafe {
    let ptr = base as *const u32;
    u32::from_le(*ptr) as usize
};
let count = unsafe {
    let ptr = base.add(4) as *const u32;
    u32::from_le(*ptr) as usize
};
let expected = HEADER_SIZE + count * dim * 4; // usize * usize, unchecked
if file_size < expected {
    ...
    return Err(...);
}
```
nodedb-vector/src/mmap_segment.rs:117-129
```rust
pub fn get_vector(&self, id: u32) -> Option<&[f32]> {
    let idx = id as usize;
    if idx >= self.count {
        return None;
    }
    let offset = self.data_offset + idx * self.dim * 4; // also unchecked
    unsafe {
        let ptr = self.base.add(offset) as *const f32;
        Some(std::slice::from_raw_parts(ptr, self.dim)) // unchecked slice into mmap
    }
}
```
Same pattern repeats in prefetch at lines 132-139.
Why it's broken
- count * dim * 4 uses unchecked usize multiplication. On a 64-bit host, count = dim = 0x8000_0000 gives count * dim = 2^62, and multiplying by 4 yields 2^64, which wraps to 0 in a release build (a debug build panics on the overflow instead). expected collapses to HEADER_SIZE, so the file_size < expected check passes for any file at least that long.
- open() then stores the bogus dim and count on self.
- get_vector(id) recomputes idx * self.dim * 4 with the same unchecked arithmetic, producing an attacker-controlled offset far outside the mapping (for other dim values this product can wrap as well).
- base.add(offset) + slice::from_raw_parts(ptr, dim) builds a slice of dim * 4 bytes starting at an out-of-range pointer, which is undefined behavior in itself. Depending on what follows the mmap region in the address space, the eventual read is either a SIGSEGV/SIGBUS (DoS) or a silent read of adjacent process memory (heap, other mappings, thread stacks).

Anyone who can place a file under the data directory can trigger this. The realistic threat model covers both accidents and attackers: a truncated file left by an unclean shutdown, a corrupt segment from cold-storage sync, or a malicious neighbor on a shared filesystem.
Reproduction
```shell
# Build a 16-byte file whose header claims dim = count = 0x80000000.
python3 -c 'import sys; sys.stdout.buffer.write(bytes.fromhex("00000080 00000080") + b"\x00"*8)' \
  > "$DATA_DIR/vector_segment_0.bin"
# Start nodedb and trigger a vector search that opens the segment.
# Observe SIGSEGV/SIGBUS, or run under ASan to see the OOB read.
```
Fuzzing MmapVectorSegment::open on the first 8 bytes with AFL++ trips this within a second.
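The header arithmetic can also be checked in isolation, without nodedb. The sketch below mirrors the expression from open() using wrapping_mul / wrapping_add, which matches what a release build does by default. HEADER_SIZE = 16 and the helper expected_size_as_released are assumptions for illustration, not code from the crate. Run it with cargo test or rustc --test.

```rust
/// Assumed value; the real HEADER_SIZE in mmap_segment.rs is not shown in the excerpt.
const HEADER_SIZE: usize = 16;

/// Hypothetical helper mirroring `HEADER_SIZE + count * dim * 4` as a release
/// build evaluates it: overflow checks are off, so each step wraps modulo 2^64.
fn expected_size_as_released(header: [u8; 8]) -> usize {
    let dim = u32::from_le_bytes([header[0], header[1], header[2], header[3]]) as usize;
    let count = u32::from_le_bytes([header[4], header[5], header[6], header[7]]) as usize;
    HEADER_SIZE.wrapping_add(count.wrapping_mul(dim).wrapping_mul(4))
}

#[test]
fn crafted_header_passes_size_check() {
    // dim = count = 0x8000_0000: count * dim = 2^62, and 2^62 * 4 = 2^64 wraps to 0,
    // so `expected` collapses to HEADER_SIZE and a 16-byte file is accepted.
    let crafted = [0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x80];
    assert_eq!(expected_size_as_released(crafted), HEADER_SIZE);
}

#[cfg(target_pointer_width = "64")]
#[test]
fn value_0x4000_0001_does_not_wrap() {
    // 0x4000_0001^2 * 4 = 0x4000_0002_0000_0004 still fits in 64 bits,
    // so the size check would reject this header rather than pass it.
    let h = [0x01, 0x00, 0x00, 0x40, 0x01, 0x00, 0x00, 0x40];
    assert_eq!(expected_size_as_released(h), HEADER_SIZE + 0x4000_0002_0000_0004);
}
```

Swapping the wrapping_* calls for checked_* turns both cases into None, which is the behavior the notes ask open() to adopt.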
Notes
- Found during a CPU/memory audit sweep of nodedb-vector/src/*.
- All arithmetic here should use checked_mul / checked_add and return Err on overflow.
- get_vector should additionally verify that offset + dim * 4 (itself computed with checked_mul / checked_add) is <= mmap_size before constructing the slice, since header-time validation alone isn't sufficient once dim/count are attacker-controlled.
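The two recommendations above can be combined into one sketch. It replaces the mmap with a plain byte slice so it runs standalone; Segment, SegmentError, read_u32_le, and HEADER_SIZE = 16 are hypothetical names and values, and the data offset is assumed to equal HEADER_SIZE (the real code keeps a separate data_offset field).

```rust
/// Assumed layout, matching the excerpt: dim (u32 LE) at byte 0, count (u32 LE)
/// at byte 4, vector data starting at HEADER_SIZE. The value 16 is an assumption.
const HEADER_SIZE: usize = 16;

#[derive(Debug, PartialEq)]
enum SegmentError {
    TooShort,
    SizeOverflow,
    Truncated { expected: usize, actual: usize },
}

/// Hypothetical stand-in for MmapVectorSegment; `bytes` plays the role of the
/// mmap'd region so the sketch runs without a real file.
struct Segment<'a> {
    bytes: &'a [u8],
    dim: usize,
    count: usize,
}

fn read_u32_le(b: &[u8], at: usize) -> u32 {
    u32::from_le_bytes([b[at], b[at + 1], b[at + 2], b[at + 3]])
}

impl<'a> Segment<'a> {
    fn open(bytes: &'a [u8]) -> Result<Self, SegmentError> {
        if bytes.len() < HEADER_SIZE {
            return Err(SegmentError::TooShort);
        }
        let dim = read_u32_le(bytes, 0) as usize;
        let count = read_u32_le(bytes, 4) as usize;
        // HEADER_SIZE + count * dim * 4, with every step checked.
        let expected = count
            .checked_mul(dim)
            .and_then(|n| n.checked_mul(4))
            .and_then(|n| n.checked_add(HEADER_SIZE))
            .ok_or(SegmentError::SizeOverflow)?;
        if bytes.len() < expected {
            return Err(SegmentError::Truncated { expected, actual: bytes.len() });
        }
        Ok(Segment { bytes, dim, count })
    }

    fn get_vector(&self, id: u32) -> Option<&[f32]> {
        let idx = id as usize;
        if idx >= self.count {
            return None;
        }
        // Recompute the byte range with checked arithmetic and bound it by the
        // actual buffer length, independently of what open() validated.
        let len = self.dim.checked_mul(4)?;
        let start = idx.checked_mul(len)?.checked_add(HEADER_SIZE)?;
        let end = start.checked_add(len)?;
        let raw = self.bytes.get(start..end)?;
        let ptr = raw.as_ptr();
        if ptr as usize % std::mem::align_of::<f32>() != 0 {
            // With a page-aligned mmap base and a 4-byte-aligned header this never fires.
            return None;
        }
        // SAFETY: `raw` is in bounds (checked by `get`), holds exactly
        // `self.dim * 4` bytes, is aligned for f32 (checked above), and every
        // bit pattern is a valid f32.
        Some(unsafe { std::slice::from_raw_parts(ptr as *const f32, self.dim) })
    }
}
```

Keeping SizeOverflow distinct from Truncated lets a caller tell an impossible header apart from a merely short file, which may matter when deciding whether to quarantine a segment or wait for a sync to finish.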