fix(adi): correct multi-level char-key index build (runaway + truncated child page)#59
Conversation
There was a problem hiding this comment.
Code Review
This pull request fixes critical bugs in the ADI index driver related to character-key indexes and multi-level B-tree splits, including widening the child page number field to 4 bytes, reserving pages immediately upon allocation, and properly handling root-branch splits to avoid infinite loops. It also introduces a defense-in-depth descent depth check and a regression test. The review feedback suggests removing a duplicated comment block in the descent check and replacing a potentially unsafe std::memcpy with std::snprintf in the test code to prevent buffer overflows.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| // Defense-in-depth: a healthy B-tree is only a handful of levels deep. | ||
| // If the descent ever exceeds a sane bound, the index is corrupt | ||
| // (e.g. a self-referential child pointer); fail loudly instead of | ||
| // looping forever and exhausting memory. | ||
| // Defense-in-depth: a healthy B-tree is only a handful of levels deep. | ||
| // If the descent ever exceeds a sane bound the index is corrupt (e.g. a | ||
| // self-referential child pointer); fail loudly instead of looping | ||
| // forever and exhausting memory. | ||
| if (path.size() > 128) { | ||
| return util::Error{6106, 0, | ||
| "ADI index corrupt: descent exceeded max depth", ""}; | ||
| } |
There was a problem hiding this comment.
The comment explaining the defense-in-depth check is duplicated with slightly different phrasing. Removing the duplicate block keeps the code clean and readable.
// Defense-in-depth: a healthy B-tree is only a handful of levels deep.
// If the descent ever exceeds a sane bound, the index is corrupt
// (e.g. a self-referential child pointer); fail loudly instead of
// looping forever and exhausting memory.
if (path.size() > 128) {
return util::Error{6106, 0,
"ADI index corrupt: descent exceeded max depth", ""};
}| UNSIGNED8 srv[260]{}; | ||
| std::memcpy(srv, tmp.string().c_str(), tmp.string().size()); |
There was a problem hiding this comment.
Using std::memcpy to copy the temporary directory path into the fixed-size srv buffer can cause a buffer overflow if the path length exceeds 259 characters. Using std::snprintf instead is safer as it guarantees null-termination and prevents buffer overflow.
| UNSIGNED8 srv[260]{}; | |
| std::memcpy(srv, tmp.string().c_str(), tmp.string().size()); | |
| UNSIGNED8 srv[260]{}; | |
| std::snprintf(reinterpret_cast<char*>(srv), sizeof(srv), "%s", tmp.string().c_str()); |
…ed page) Building a character-key ADI index that grew past a single page level ran away in time and memory: a 25k-row tag consumed minutes of CPU and gigabytes of RAM, and never finished. Two defects, both in the multi-level path: 1. Root-branch split reused root_page_ for the new root AND its left child. When the root branch itself split, promote_split_ wrote the new root to root_page_ while the left half had been left in place at the same page, so the new root's first child pointed at itself. The next insert's descent then looped forever, growing the path stack without bound. Fix: when the branch being split is the root, move the left half to a fresh page (mirrors the root dense-leaf split, which already pushes both halves to new pages). 2. The character branch entry stored the child page number in a single byte, capping a char-key index at 256 pages and silently truncating larger page numbers into corrupt child pointers. Widen it to 4 bytes little-endian, the same width the numeric branch entry already uses. Also: alloc_page_ now reserves the page by extending the file, so two allocations issued before the first is written can't hand out the same number; and the insert descent fails loudly (index-corrupt error) instead of looping forever if it ever exceeds a sane depth. Adds abi_adi_multilevel_build_test (8k char rows -> multi-level tree, full ordered walk + spread seeks). Full unit suite 575/575, 0 regressions; the ADS-created reference fixtures (leases.adi / landlords.adi) still read. Known remaining limit (separate, not addressed here): the dense-leaf record number is 2 bytes, so a tag over 65535 rows still raises 5000.
76dc821 to
28e290b
Compare
Fixes #58.
Building a character-key ADI index that grows past a single page level ran away in time and memory — a 25k-row tag consumed minutes of CPU and gigabytes of RAM and never finished. Two defects in the multi-level path, plus hardening.
1. Root-branch split produced a self-referential child
When the root branch itself split, the left half was left in place at
frame.page_no(==root_page_) and thenpromote_split_rewrote the new root branch to that sameroot_page_. The new root's first child therefore pointed at itself. The next insert's descent looped forever, growing the path stack without bound — the source of the time/memory runaway.Fix: when the branch being split is the root, move the left half to a fresh page (mirrors the root dense-leaf split, which already pushes both halves to new pages).
2. Character branch entry truncated the child page number
The char branch entry stored the child page in a single byte (
page_no & 0xFF), capping a char-key index at 256 pages and silently corrupting larger page pointers. Widened to 4 bytes little-endian — the same width the numeric branch entry already uses.Hardening
alloc_page_now reserves the page by extending the file, so two allocations issued before the first is written can't return the same number.Tests
abi_adi_multilevel_build_test: 8k character rows force a multi-level tree (root-branch split); verifies the build completes, a full ordered walk visits every row ascending, and spread seeks all hit. This case did not terminate before the fix.leases.adi/landlords.adi) still read correctly.Known remaining limit (separate, not in this PR)
The dense-leaf record number is 2 bytes, so a single tag over 65535 rows still raises
5000. That needs a dense-leaf format change and is left for a follow-up.