Skip to content

Enrich E007 IoError with positional context on mid-record reads#222

Merged
dchud merged 1 commit into
mainfrom
fix/e007-positional-context-bd-0x73.21.6
May 28, 2026
Merged

Enrich E007 IoError with positional context on mid-record reads#222
dchud merged 1 commit into
mainfrom
fix/e007-positional-context-bd-0x73.21.6

Conversation

@dchud
Copy link
Copy Markdown
Owner

@dchud dchud commented May 28, 2026

What

MarcError::IoError (E007) raised while reading a record's data area now carries positional context — record_index, byte_offset, and source_name — instead of the context-free From<std::io::Error> fallback that left all three None. Before this, a read failure partway through a multi-record stream surfaced as a bare "IO error" with no indication of which record or byte position failed.

How

read_record_data (src/iso2709.rs) now constructs the error via ParseContext::err_io(e) rather than e.into(). At that point begin_record has run and the leader has been consumed, so the context is fully positioned for the in-progress record. All three ISO 2709 readers (bibliographic, authority, holdings) share the parse_iso2709_record skeleton, so this single site covers every reader's mid-record path.

The leader read at a record boundary (read_leader_bytes) deliberately keeps the context-free From fallback: it runs before begin_record, and a clean EOF there legitimately means there is no next record, so the context's record_index still names the previous record. Enriching there would misattribute a boundary-read failure. This reasoning is now documented on both read_leader_bytes and the From<io::Error> impl.

Python's OSError surface is unchanged — E007 still maps to built-in OSError; only the typed Rust variant gained fields.

Tests

  • New tests/e007_io_positional_context.rs: a multi-record stream whose second record's data read fails asserts the surfaced IoError carries record_index == 2, the reader's source_name, and a byte_offset past record 1. A companion test locks the leader-boundary fallback (no record_index).
  • Error-coverage manifest: the E007 entry is split into a raw-io case (io_error, leader boundary, no context) and a parse-path case (io_error_parse_path, mid-record, expected_context = [record_index, byte_offset, source_name]), with a matching source_name → source probe mapping and a Python harness skip branch.

Closes #210

Bead: bd-0x73.21.6

read_record_data now constructs MarcError::IoError via
ParseContext::err_io rather than the context-free From<io::Error>
conversion, so an I/O failure while reading a record's data area
carries record_index, byte_offset, and source_name. The leader read at
a record boundary keeps the From fallback by design: begin_record has
not run there, so enriching would misattribute the failure to the
previous record (or none, on the first). Documented on both
read_leader_bytes and the From impl.

All three ISO 2709 readers (bibliographic, authority, holdings) share
the parse_iso2709_record skeleton, so the single read_record_data fix
covers every reader's mid-record path. Python's OSError surface is
unchanged.

Adds a regression test (multi-record stream, the second record's data
read fails) asserting record_index and source_name, plus a companion
test locking the leader-boundary context-free fallback. Splits the
error-coverage manifest E007 entry into raw-io (leader boundary, no
context) and parse-path (mid-record, enriched) cases.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@codspeed-hq
Copy link
Copy Markdown

codspeed-hq Bot commented May 28, 2026

Merging this PR will improve performance by 19.81%

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚠️ Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments,
which may affect the accuracy of the results.

Open the report in CodSpeed to investigate

⚡ 11 improved benchmarks
✅ 49 untouched benchmarks
⏩ 18 skipped benchmarks1

Performance Changes

Mode Benchmark BASE HEAD Efficiency
WallTime test_parallel_read_4x_1k 110.1 ms 90.7 ms +21.48%
WallTime test_threaded_with_title_extraction_2x_10k 359.3 ms 300.6 ms +19.56%
WallTime test_file_parallel_4x_10k 1.2 s 1 s +14.14%
WallTime test_file_parallel_4x_10k_with_extraction 1,217 ms 998.4 ms +21.9%
WallTime test_parallel_read_with_extract_4x_1k 116.1 ms 98.6 ms +17.72%
WallTime test_threaded_reading_4x_1k 111.3 ms 89.9 ms +23.89%
WallTime test_threaded_reading_4x_10k 1,114.9 ms 924 ms +20.67%
WallTime test_parallel_read_4x_10k 1,135.7 ms 949.3 ms +19.63%
WallTime test_threading_speedup_4x_10k 1,153.1 ms 961.1 ms +19.97%
WallTime test_threaded_with_title_extraction_4x_10k 1,147.7 ms 955.1 ms +20.16%
WallTime test_parallel_read_with_extract_4x_10k 1.2 s 1 s +19.11%

Tip

Curious why this is faster? Comment @codspeedbot explain why this is faster on this PR, or directly use the CodSpeed MCP with your agent.


Comparing fix/e007-positional-context-bd-0x73.21.6 (9627258) with main (15bf24c)

Open in CodSpeed

Footnotes

  1. 18 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@dchud dchud self-assigned this May 28, 2026
@dchud dchud merged commit e423469 into main May 28, 2026
48 checks passed
@dchud dchud deleted the fix/e007-positional-context-bd-0x73.21.6 branch May 28, 2026 00:34
dchud added a commit that referenced this pull request May 28, 2026
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
dchud added a commit that referenced this pull request May 28, 2026
Three ParseContext helpers had no callers: err_record_length_invalid,
err_base_address_invalid, and err_base_address_not_found. Their error
firing sites live in Leader::from_bytes and Leader::validate_for_reading,
which are low-level Leader methods with no ParseContext in scope. The
ISO 2709 / JSON / marcjson parse paths enrich those errors after the fact
via e.with_position(ctx).with_bytes_near(...), so the helpers are
redundant rather than refactorable into the firing sites. Decision:
delete (option b). err_io, originally listed as dead, is now live after
the E007 enrichment work (PR #222), so it stays.

The two u32::try_from checks in MarcWriter::write_record are redundant
runtime guards: check_iso2709_size caps both record_length and
base_address at 99_999, far below u32::MAX, so the error arms are
unreachable from the public API. Decision: keep with a comment (option c)
— they still prevent a silent usize->u32 truncation if a future refactor
reaches the leader-population step without routing through
check_iso2709_size first.

Bead: bd-0x73.21.4

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

E007 positional context propagation — enrich IoError at all io-fallible parse sites

1 participant