Enrich E007 IoError with positional context on mid-record reads#222
Conversation
read_record_data now constructs MarcError::IoError via ParseContext::err_io rather than the context-free From<io::Error> conversion, so an I/O failure while reading a record's data area carries record_index, byte_offset, and source_name. The leader read at a record boundary keeps the From fallback by design: begin_record has not run there, so enriching would misattribute the failure to the previous record (or none, on the first). Documented on both read_leader_bytes and the From impl. All three ISO 2709 readers (bibliographic, authority, holdings) share the parse_iso2709_record skeleton, so the single read_record_data fix covers every reader's mid-record path. Python's OSError surface is unchanged. Adds a regression test (multi-record stream, the second record's data read fails) asserting record_index and source_name, plus a companion test locking the leader-boundary context-free fallback. Splits the error-coverage manifest E007 entry into raw-io (leader boundary, no context) and parse-path (mid-record, enriched) cases. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Merging this PR will improve performance by 19.81%
|
| Mode | Benchmark | BASE |
HEAD |
Efficiency | |
|---|---|---|---|---|---|
| ⚡ | WallTime | test_parallel_read_4x_1k |
110.1 ms | 90.7 ms | +21.48% |
| ⚡ | WallTime | test_threaded_with_title_extraction_2x_10k |
359.3 ms | 300.6 ms | +19.56% |
| ⚡ | WallTime | test_file_parallel_4x_10k |
1.2 s | 1 s | +14.14% |
| ⚡ | WallTime | test_file_parallel_4x_10k_with_extraction |
1,217 ms | 998.4 ms | +21.9% |
| ⚡ | WallTime | test_parallel_read_with_extract_4x_1k |
116.1 ms | 98.6 ms | +17.72% |
| ⚡ | WallTime | test_threaded_reading_4x_1k |
111.3 ms | 89.9 ms | +23.89% |
| ⚡ | WallTime | test_threaded_reading_4x_10k |
1,114.9 ms | 924 ms | +20.67% |
| ⚡ | WallTime | test_parallel_read_4x_10k |
1,135.7 ms | 949.3 ms | +19.63% |
| ⚡ | WallTime | test_threading_speedup_4x_10k |
1,153.1 ms | 961.1 ms | +19.97% |
| ⚡ | WallTime | test_threaded_with_title_extraction_4x_10k |
1,147.7 ms | 955.1 ms | +20.16% |
| ⚡ | WallTime | test_parallel_read_with_extract_4x_10k |
1.2 s | 1 s | +19.11% |
Tip
Curious why this is faster? Comment @codspeedbot explain why this is faster on this PR, or directly use the CodSpeed MCP with your agent.
Comparing fix/e007-positional-context-bd-0x73.21.6 (9627258) with main (15bf24c)
Footnotes
-
18 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports. ↩
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three ParseContext helpers had no callers: err_record_length_invalid, err_base_address_invalid, and err_base_address_not_found. Their error firing sites live in Leader::from_bytes and Leader::validate_for_reading, which are low-level Leader methods with no ParseContext in scope. The ISO 2709 / JSON / marcjson parse paths enrich those errors after the fact via e.with_position(ctx).with_bytes_near(...), so the helpers are redundant rather than refactorable into the firing sites. Decision: delete (option b). err_io, originally listed as dead, is now live after the E007 enrichment work (PR #222), so it stays. The two u32::try_from checks in MarcWriter::write_record are redundant runtime guards: check_iso2709_size caps both record_length and base_address at 99_999, far below u32::MAX, so the error arms are unreachable from the public API. Decision: keep with a comment (option c) — they still prevent a silent usize->u32 truncation if a future refactor reaches the leader-population step without routing through check_iso2709_size first. Bead: bd-0x73.21.4 Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
What
MarcError::IoError(E007) raised while reading a record's data area now carries positional context —record_index,byte_offset, andsource_name— instead of the context-freeFrom<std::io::Error>fallback that left all threeNone. Before this, a read failure partway through a multi-record stream surfaced as a bare "IO error" with no indication of which record or byte position failed.How
read_record_data(src/iso2709.rs) now constructs the error viaParseContext::err_io(e)rather thane.into(). At that pointbegin_recordhas run and the leader has been consumed, so the context is fully positioned for the in-progress record. All three ISO 2709 readers (bibliographic, authority, holdings) share theparse_iso2709_recordskeleton, so this single site covers every reader's mid-record path.The leader read at a record boundary (
read_leader_bytes) deliberately keeps the context-freeFromfallback: it runs beforebegin_record, and a clean EOF there legitimately means there is no next record, so the context'srecord_indexstill names the previous record. Enriching there would misattribute a boundary-read failure. This reasoning is now documented on bothread_leader_bytesand theFrom<io::Error>impl.Python's
OSErrorsurface is unchanged — E007 still maps to built-inOSError; only the typed Rust variant gained fields.Tests
tests/e007_io_positional_context.rs: a multi-record stream whose second record's data read fails asserts the surfacedIoErrorcarriesrecord_index == 2, the reader'ssource_name, and abyte_offsetpast record 1. A companion test locks the leader-boundary fallback (norecord_index).io_error, leader boundary, no context) and a parse-path case (io_error_parse_path, mid-record,expected_context = [record_index, byte_offset, source_name]), with a matchingsource_name → sourceprobe mapping and a Python harness skip branch.Closes #210
Bead: bd-0x73.21.6