fix(cdx): size a character index key from the field width, not the trimmed first record#68
Conversation
…immed first record A character index key is fixed-width on disk, but AdsCreateIndex61 derived the key length from the *trimmed* value of the first record. When the first row is short and later rows share a longer prefix (e.g. "ANA", then "ANABELA CARDOSO" and "ANABELA FERREIRA"), every later key was truncated to the first row's width and collapsed onto the same key: distinct names became indistinguishable and a seek landed on the wrong record -- both inside the index and for native FoxPro/Clipper readers of the bag. Derive the width from the declared field length for a bare character field; fall back to the untrimmed first-record key width for a composite expression; keep the 32-char default only for an empty table with nothing to probe. Adds abi_cdx_char_keylen_test, mirroring the NTX numeric-key interop test: it asserts every stored key is the full field width and that a seek for a long, prefix-sharing name finds the right record. Full suite 739/739. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_019Um1qrC4vrHD4rA643CjAJ
There was a problem hiding this comment.
Code Review
This pull request modifies how the on-disk key length is determined for character keys during index creation, preferring the declared field width of bare character fields over the trimmed value of the first record to prevent key truncation and collisions. A new unit test is added to verify this fix. The reviewer pointed out a significant issue where composite expressions will always result in a key length of 254 bytes due to padding in evaluate_index_expr, which wastes disk space and degrades performance.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| if (auto k = openads::engine::evaluate_index_expr(*t, expr, 254)) { | ||
| std::size_t n = std::move(k).value().size(); // untrimmed | ||
| if (n > 0) | ||
| klen = static_cast<std::uint16_t>( | ||
| std::min<std::size_t>(n, 254)); |
There was a problem hiding this comment.
Mismatch in Key Length Determination for Composite Expressions\n\nThere is a subtle but significant issue with the fallback logic for composite expressions here.\n\n1. evaluate_index_expr Always Pads:\n The call openads::engine::evaluate_index_expr(*t, expr, 254) explicitly passes 254 as the key_len. Inside evaluate_index_expr, the returned string is always padded or resized to exactly key_len (254). Therefore, std::move(k).value().size() will always be 254, and klen will always be set to 254 for any composite expression. This results in unnecessarily large, fixed-width index keys on disk (254 bytes), wasting significant disk space and memory, and degrading performance.\n\n2. read_field Trims Trailing Spaces:\n Even if evaluate_index_expr did not pad the result, t->read_field trims trailing spaces from character fields. For an expression like UPPER(NOME) where the first record is "ANA", the evaluator only sees "ANA" (length 3). Thus, the evaluated length would still be 3, which would re-introduce the truncation bug for subsequent longer records (e.g., "ANABELA").\n\n### Suggested Solutions:\nTo correctly determine the untrimmed key length of a composite expression without defaulting to 254 or truncating:\n- Option A (Architectural): Introduce a way to evaluate index expressions using the declared/padded field lengths instead of trimmed values (e.g., an option in read_field or a separate evaluation path).\n- Option B (Heuristic/Fallback): If a robust evaluation is too complex, consider using a more reasonable default fallback length (e.g., 128 or max(32, trimmed_len)) or document that composite expressions will default to 254 bytes.
…ld fix Record PR #68 (character index key sized from field width, not the trimmed first record) and the <cstdint> include that unblocked the ninja-clang -Werror CI job. Bump fallback project VERSION to 1.2.3. Suite now 739/739. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
fix(cdx): don't pin composite index keys to the 254-byte probe width (follow-up to #68)
Problem
A character index key is fixed-width on disk, but
AdsCreateIndex61derived thekey length from the trimmed value of the first record. When the first row is
short and later rows share a longer prefix, every later key is truncated to the
first row's width and collapses onto the same key — distinct values become
indistinguishable and a seek lands on the wrong record, both inside the index
and for native FoxPro/Clipper readers of the bag.
Reproduction
A
NOME C(30)table whose first row is"ANA"and whose later rows are"ANABELA CARDOSO"/"ANABELA FERREIRA": before the fix all three keys arestored as the 3-byte
"ANA", soseek("ANABELA FERREIRA")returns record 1instead of record 3.
Fix
Derive the key width from the declared field length for a bare character field;
fall back to the untrimmed first-record key width for a composite
expression; keep the 32-char default only for an empty table with nothing to
probe. (Numeric CDX/NTX key widths are unchanged.)
Test
Adds
abi_cdx_char_keylen_test, mirroring the existing NTX numeric-key interoptest: it builds the index through the ACE ABI and asserts every stored key is
the full field width and that a seek for a long, prefix-sharing name finds the
right record. The change is additive; the full suite stays green at 739/739.
🤖 Generated with Claude Code