Skip to content

fix(adi): correct multi-level char-key index build (runaway + truncated child page)#59

Merged
FiveTechSoft merged 1 commit into
FiveTechSoft:mainfrom
Admnwk:fix/adi-index-build-runaway
Jun 23, 2026
Merged

fix(adi): correct multi-level char-key index build (runaway + truncated child page)#59
FiveTechSoft merged 1 commit into
FiveTechSoft:mainfrom
Admnwk:fix/adi-index-build-runaway

Conversation

@Admnwk

@Admnwk Admnwk commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

Fixes #58.

Building a character-key ADI index that grows past a single page level ran away in time and memory — a 25k-row tag consumed minutes of CPU and gigabytes of RAM and never finished. Two defects in the multi-level path, plus hardening.

1. Root-branch split produced a self-referential child

When the root branch itself split, the left half was left in place at frame.page_no (== root_page_) and then promote_split_ rewrote the new root branch to that same root_page_. The new root's first child therefore pointed at itself. The next insert's descent looped forever, growing the path stack without bound — the source of the time/memory runaway.

Fix: when the branch being split is the root, move the left half to a fresh page (mirrors the root dense-leaf split, which already pushes both halves to new pages).

2. Character branch entry truncated the child page number

The char branch entry stored the child page in a single byte (page_no & 0xFF), capping a char-key index at 256 pages and silently corrupting larger page pointers. Widened to 4 bytes little-endian — the same width the numeric branch entry already uses.

Hardening

  • alloc_page_ now reserves the page by extending the file, so two allocations issued before the first is written can't return the same number.
  • The insert descent fails loudly with an index-corrupt error instead of looping forever if depth ever exceeds a sane bound.

Tests

  • New abi_adi_multilevel_build_test: 8k character rows force a multi-level tree (root-branch split); verifies the build completes, a full ordered walk visits every row ascending, and spread seeks all hit. This case did not terminate before the fix.
  • Full unit suite 575/575, 0 regressions; the ADS-created reference fixtures (leases.adi / landlords.adi) still read correctly.

Known remaining limit (separate, not in this PR)

The dense-leaf record number is 2 bytes, so a single tag over 65535 rows still raises 5000. That needs a dense-leaf format change and is left for a follow-up.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request fixes critical bugs in the ADI index driver related to character-key indexes and multi-level B-tree splits, including widening the child page number field to 4 bytes, reserving pages immediately upon allocation, and properly handling root-branch splits to avoid infinite loops. It also introduces a defense-in-depth descent depth check and a regression test. The review feedback suggests removing a duplicated comment block in the descent check and replacing a potentially unsafe std::memcpy with std::snprintf in the test code to prevent buffer overflows.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +1118 to +1129
// Defense-in-depth: a healthy B-tree is only a handful of levels deep.
// If the descent ever exceeds a sane bound, the index is corrupt
// (e.g. a self-referential child pointer); fail loudly instead of
// looping forever and exhausting memory.
// Defense-in-depth: a healthy B-tree is only a handful of levels deep.
// If the descent ever exceeds a sane bound the index is corrupt (e.g. a
// self-referential child pointer); fail loudly instead of looping
// forever and exhausting memory.
if (path.size() > 128) {
return util::Error{6106, 0,
"ADI index corrupt: descent exceeded max depth", ""};
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The comment explaining the defense-in-depth check is duplicated with slightly different phrasing. Removing the duplicate block keeps the code clean and readable.

        // Defense-in-depth: a healthy B-tree is only a handful of levels deep.
        // If the descent ever exceeds a sane bound, the index is corrupt
        // (e.g. a self-referential child pointer); fail loudly instead of
        // looping forever and exhausting memory.
        if (path.size() > 128) {
            return util::Error{6106, 0,
                "ADI index corrupt: descent exceeded max depth", ""};
        }

Comment on lines +30 to +31
UNSIGNED8 srv[260]{};
std::memcpy(srv, tmp.string().c_str(), tmp.string().size());

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using std::memcpy to copy the temporary directory path into the fixed-size srv buffer can cause a buffer overflow if the path length exceeds 259 characters. Using std::snprintf instead is safer as it guarantees null-termination and prevents buffer overflow.

Suggested change
UNSIGNED8 srv[260]{};
std::memcpy(srv, tmp.string().c_str(), tmp.string().size());
UNSIGNED8 srv[260]{};
std::snprintf(reinterpret_cast<char*>(srv), sizeof(srv), "%s", tmp.string().c_str());

…ed page)

Building a character-key ADI index that grew past a single page level ran
away in time and memory: a 25k-row tag consumed minutes of CPU and gigabytes
of RAM, and never finished.

Two defects, both in the multi-level path:

1. Root-branch split reused root_page_ for the new root AND its left child.
   When the root branch itself split, promote_split_ wrote the new root to
   root_page_ while the left half had been left in place at the same page,
   so the new root's first child pointed at itself. The next insert's descent
   then looped forever, growing the path stack without bound. Fix: when the
   branch being split is the root, move the left half to a fresh page (mirrors
   the root dense-leaf split, which already pushes both halves to new pages).

2. The character branch entry stored the child page number in a single byte,
   capping a char-key index at 256 pages and silently truncating larger page
   numbers into corrupt child pointers. Widen it to 4 bytes little-endian, the
   same width the numeric branch entry already uses.

Also: alloc_page_ now reserves the page by extending the file, so two
allocations issued before the first is written can't hand out the same number;
and the insert descent fails loudly (index-corrupt error) instead of looping
forever if it ever exceeds a sane depth.

Adds abi_adi_multilevel_build_test (8k char rows -> multi-level tree, full
ordered walk + spread seeks). Full unit suite 575/575, 0 regressions; the
ADS-created reference fixtures (leases.adi / landlords.adi) still read.

Known remaining limit (separate, not addressed here): the dense-leaf record
number is 2 bytes, so a tag over 65535 rows still raises 5000.
@FiveTechSoft FiveTechSoft force-pushed the fix/adi-index-build-runaway branch from 76dc821 to 28e290b Compare June 23, 2026 18:22
@FiveTechSoft FiveTechSoft merged commit 53c02e6 into FiveTechSoft:main Jun 23, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ADI: index build on ADT tables runs away (time + memory) past ~3000 rows — linear below, hard cliff above

3 participants