Skip to content

fix(cdx): size a character index key from the field width, not the trimmed first record#68

Merged
FiveTechSoft merged 2 commits into
FiveTechSoft:mainfrom
Admnwk:fix/cdx-char-key-length
Jun 24, 2026
Merged

fix(cdx): size a character index key from the field width, not the trimmed first record#68
FiveTechSoft merged 2 commits into
FiveTechSoft:mainfrom
Admnwk:fix/cdx-char-key-length

Conversation

@Admnwk

@Admnwk Admnwk commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

Problem

A character index key is fixed-width on disk, but AdsCreateIndex61 derived the
key length from the trimmed value of the first record. When the first row is
short and later rows share a longer prefix, every later key is truncated to the
first row's width and collapses onto the same key — distinct values become
indistinguishable and a seek lands on the wrong record, both inside the index
and for native FoxPro/Clipper readers of the bag.

Reproduction

A NOME C(30) table whose first row is "ANA" and whose later rows are
"ANABELA CARDOSO" / "ANABELA FERREIRA": before the fix all three keys are
stored as the 3-byte "ANA", so seek("ANABELA FERREIRA") returns record 1
instead of record 3.

Fix

Derive the key width from the declared field length for a bare character field;
fall back to the untrimmed first-record key width for a composite
expression; keep the 32-char default only for an empty table with nothing to
probe. (Numeric CDX/NTX key widths are unchanged.)

Test

Adds abi_cdx_char_keylen_test, mirroring the existing NTX numeric-key interop
test: it builds the index through the ACE ABI and asserts every stored key is
the full field width and that a seek for a long, prefix-sharing name finds the
right record. The change is additive; the full suite stays green at 739/739.

🤖 Generated with Claude Code

…immed first record

A character index key is fixed-width on disk, but AdsCreateIndex61 derived the
key length from the *trimmed* value of the first record. When the first row is
short and later rows share a longer prefix (e.g. "ANA", then "ANABELA CARDOSO"
and "ANABELA FERREIRA"), every later key was truncated to the first row's width
and collapsed onto the same key: distinct names became indistinguishable and a
seek landed on the wrong record -- both inside the index and for native
FoxPro/Clipper readers of the bag.

Derive the width from the declared field length for a bare character field;
fall back to the untrimmed first-record key width for a composite expression;
keep the 32-char default only for an empty table with nothing to probe.

Adds abi_cdx_char_keylen_test, mirroring the NTX numeric-key interop test: it
asserts every stored key is the full field width and that a seek for a long,
prefix-sharing name finds the right record. Full suite 739/739.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_019Um1qrC4vrHD4rA643CjAJ

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request modifies how the on-disk key length is determined for character keys during index creation, preferring the declared field width of bare character fields over the trimmed value of the first record to prevent key truncation and collisions. A new unit test is added to verify this fix. The reviewer pointed out a significant issue where composite expressions will always result in a key length of 254 bytes due to padding in evaluate_index_expr, which wastes disk space and degrades performance.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread src/abi/ace_exports.cpp
Comment on lines +6977 to +6981
if (auto k = openads::engine::evaluate_index_expr(*t, expr, 254)) {
std::size_t n = std::move(k).value().size(); // untrimmed
if (n > 0)
klen = static_cast<std::uint16_t>(
std::min<std::size_t>(n, 254));

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Mismatch in Key Length Determination for Composite Expressions\n\nThere is a subtle but significant issue with the fallback logic for composite expressions here.\n\n1. evaluate_index_expr Always Pads:\n The call openads::engine::evaluate_index_expr(*t, expr, 254) explicitly passes 254 as the key_len. Inside evaluate_index_expr, the returned string is always padded or resized to exactly key_len (254). Therefore, std::move(k).value().size() will always be 254, and klen will always be set to 254 for any composite expression. This results in unnecessarily large, fixed-width index keys on disk (254 bytes), wasting significant disk space and memory, and degrading performance.\n\n2. read_field Trims Trailing Spaces:\n Even if evaluate_index_expr did not pad the result, t->read_field trims trailing spaces from character fields. For an expression like UPPER(NOME) where the first record is "ANA", the evaluator only sees "ANA" (length 3). Thus, the evaluated length would still be 3, which would re-introduce the truncation bug for subsequent longer records (e.g., "ANABELA").\n\n### Suggested Solutions:\nTo correctly determine the untrimmed key length of a composite expression without defaulting to 254 or truncating:\n- Option A (Architectural): Introduce a way to evaluate index expressions using the declared/padded field lengths instead of trimmed values (e.g., an option in read_field or a separate evaluation path).\n- Option B (Heuristic/Fallback): If a robust evaluation is too complex, consider using a more reasonable default fallback length (e.g., 128 or max(32, trimmed_len)) or document that composite expressions will default to 254 bytes.

@FiveTechSoft FiveTechSoft merged commit 377a413 into FiveTechSoft:main Jun 24, 2026
5 checks passed
FiveTechSoft added a commit that referenced this pull request Jun 24, 2026
…ld fix

Record PR #68 (character index key sized from field width, not the
trimmed first record) and the <cstdint> include that unblocked the
ninja-clang -Werror CI job. Bump fallback project VERSION to 1.2.3.
Suite now 739/739.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
FiveTechSoft added a commit that referenced this pull request Jun 26, 2026
fix(cdx): don't pin composite index keys to the 254-byte probe width (follow-up to #68)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants