Pymarc-compatible current_exception and current_chunk accessors on MARCReader by dchud · Pull Request #182 · dchud/mrrc

dchud · 2026-05-10T23:42:51Z

Closes #156

Summary

After each MARCReader.__next__ step, two pymarc-compatible accessors carry diagnostic information about what was just read:

current_chunk — bytes of the record just read from the source (declared length per the leader). Set on every successful chunk read, regardless of whether the parse step then succeeded or failed.
current_exception — typed MrrcException swallowed by permissive=True, or None on a clean read.

This is the bead's recommended Option A: enrich permissive=True rather than add a new pymarc_compat mode. Existing pymarc-shape error-diagnosis code (if reader.current_exception: ...) works against mrrc unchanged.

Implementation

Rust binding (src-python/src/readers.rs): PyMARCReader carries last_chunk: Option<Vec<u8>>, populated in both read_record() and __next__() after the chunk is read, before the parse step. Exposed via #[getter] fn last_chunk.
Python wrapper (mrrc/__init__.py): MARCReader.__init__ initializes the two attributes to None; __next__ reads _inner.last_chunk and stashes both. Clean-read path resets current_exception so callers don't see stale state.
Type stub (mrrc/_mrrc.pyi): documented _MARCReader.last_chunk property.

Tests

tests/python/test_pymarc_compat_accessors.py covers:

Default state (both None before any iteration).
Clean read: current_chunk matches leader-declared length, current_exception stays None.
Clean iteration clears stale current_exception after a prior failure.
Permissive swallow: both accessors populated, full chunk bytes preserved.
Strict mode raises (the accessors aren't the intended catch surface).
Cross-library iteration-shape parity vs pymarc — guarded by pytest.importorskip("pymarc") so we don't gain a hard dep.

Documented divergences

Updated docs/guides/migration-from-pymarc.md:

Encoding strictness: mrrc raises EncodingError (swallowed via current_exception under permissive=True) on invalid UTF-8 in subfield values where pymarc applies lossy substitution silently. Iteration shape unchanged.
current_chunk on byte-read errors: When the underlying read of the next record's bytes fails before parsing begins (truncated stream, I/O error), current_chunk may be None even though current_exception is set. For parse failures of fully-read chunks (the common case), current_chunk carries the full record bytes as pymarc does.

Verification

cargo test --test fuzz_regressions + full Python suite — 538 Rust + 700+ Python tests pass
tests/python/test_pymarc_compat_accessors.py — 5 pass, 1 skip (pymarc parity, since pymarc isn't a project dep)
.cargo/check.sh --quick and full — all green
PR CI green on all platforms

Bead: bd-0x73.7

🤖 Generated with Claude Code

Closes #156 The Python MARCReader wrapper now matches pymarc's default-reader shape for the diagnostic accessors. After each __next__ step: - current_chunk holds the bytes of the record just read from the source (declared-length bytes per the leader). Set on every successful chunk read, regardless of whether the parse step then succeeds or fails. - current_exception holds the typed MrrcException caught by permissive=True, or None on a clean read. Implementation: - Rust binding: PyMARCReader carries last_chunk: Option<Vec<u8>>, populated in both read_record() and __next__() after reading the record bytes, before the parse step. Exposed via #[getter] last_chunk on _MARCReader. - Python wrapper: MARCReader.__init__ initializes current_exception and current_chunk to None; __next__ reads _inner.last_chunk and stashes both, mirroring pymarc semantics. The clean-read path resets current_exception so callers don't see stale state. - Type stub: _MARCReader.last_chunk getter. Tests: tests/python/test_pymarc_compat_accessors.py covers default state, clean reads, permissive swallow, strict-mode raises, and cross-library iteration shape parity (the cross-library test importorskips pymarc so the project doesn't gain a hard dependency). Documented two divergences from pymarc in docs/guides/migration-from-pymarc.md: - Encoding strictness — mrrc raises EncodingError on invalid UTF-8 in subfield values where pymarc applies lossy substitution silently. Same iteration shape under permissive=True; callers using `except Exception:` keep working. - current_chunk None on byte-read errors — when the underlying read of the next record's bytes fails before parsing begins (truncated stream, I/O error), current_chunk may be None even though current_exception is set. For parse failures of fully-read chunks (the common case), current_chunk carries the full record bytes as pymarc does. Bead: bd-0x73.7 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

codspeed-hq · 2026-05-10T23:47:34Z

Merging this PR will degrade performance by 18.51%

⚠️

Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚡ 7 improved benchmarks
❌ 1 regressed benchmark
✅ 52 untouched benchmarks
⏩ 16 skipped benchmarks¹

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

	Mode	Benchmark	`BASE`	`HEAD`	Efficiency
⚡	WallTime	`test_pipeline_parallel_4x_10k_threaded`	118.2 ms	102.1 ms	+15.81%
⚡	WallTime	`test_pipeline_sequential_extraction_4x_10k`	118.5 ms	104.1 ms	+13.86%
⚡	WallTime	`test_process_4_files_sequential`	99.8 ms	84.5 ms	+18.13%
⚡	WallTime	`test_process_4_files_parallel_4_threads`	118.3 ms	97 ms	+21.96%
❌	WallTime	`test_threaded_reading_2x_10k`	285.3 ms	350.1 ms	-18.51%
⚡	WallTime	`test_pipeline_sequential_1x_10k`	22.4 ms	20.3 ms	+10.21%
⚡	WallTime	`test_pipeline_sequential_4x_10k`	99.6 ms	85.5 ms	+16.55%
⚡	WallTime	`test_pipeline_parallel_2x_10k_threaded`	54.5 ms	47.5 ms	+14.71%

_{Comparing feat/permissive-pymarc-compat (b1e0f2e) with main (297626a)}

16 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports. ↩

…ccessors Four additional pytest cases for the pymarc-compat accessors: - Multi-record sequencing: walk a 3-record stream; current_chunk reflects each successive record's leader-declared length and content, not just the first. - EOF retention: after StopIteration, current_chunk and current_exception retain their last values (matches pymarc). - Default strict mode: current_chunk tracks on every successful chunk read regardless of permissive, so even strict callers get it populated. - iter_with_errors independence: the alternate diagnostic surface (yielded tuple) does not clobber the pymarc-compat accessors; they stay at their initial None when only iter_with_errors is exercised. Documents the two surfaces' non-interference contract. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Splits the responsibilities cleanly between the two error-handling docs to remove the duplication my earlier commit introduced: docs/reference/error-handling.md (the catalog — for users writing new mrrc code): - Updates the stale "MARCReader.current_exception / current_chunk" section. The reference said "mrrc does not currently expose these reader-side attributes" — predated PR #182 (bd-0x73.7). Replaced with the actual landed semantics plus the encoding-strictness and byte-read-error divergences. - Adds a "Known hierarchy divergences from pymarc" subsection noting the FatalReaderError parentage difference (its tree diagram showed the actual mrrc shape but didn't call out the comparison to pymarc) and the PymarcException → MrrcException base-class rename. - Expands the omissions row in the mapping table into a full sub-table with rationale and mrrc-equivalent behavior for NoFieldsFound, WriteNeedsRecord, NoActiveFile, BadLeaderValue, MissingLinkedFields. docs/guides/migration-from-pymarc.md (the porter's recipe — for users with existing pymarc code in front of them): - Drops the duplicated class-name mapping table and the omissions table that just landed; replaced with a one-sentence summary and a link to the reference for the catalog. - Keeps the inline migration recipes (base-class rename, the two FatalReaderError catch patterns) since porters want them inline, not "go read the reference." - Trims explanatory prose for the FatalReaderError divergence and links to the reference for the rationale. Net diff: -66 noise lines in the guide, +93 substance lines in the reference. CHANGELOG entry updated to reflect the split. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Pymarc exception-class-name parity verification + documentation Closes #155 bd-0x73.6's narrow scope per the 2026-05-03 re-scope: verify that a pymarc-shaped error-handling code path keeps working after a port, and document the small gaps where it doesn't. Adds tests/python/test_pymarc_exception_parity.py — 47 cases that pin: - Every pymarc 5.3.1 exception class name a typical except clause catches is importable from both mrrc.exceptions and the top-level mrrc package (10 names: RecordLengthInvalid, RecordLeaderInvalid, RecordDirectoryInvalid, BaseAddressInvalid, BaseAddressNotFound, EndOfRecordNotFound, TruncatedRecord, FieldNotFound, FatalReaderError, BadSubfieldCodeWarning). - Each shared-name class is a proper Exception/Warning subclass and, for the exceptions, inherits from MrrcException so callers can catch the mrrc base class instead of PymarcException. - pymarc names mrrc deliberately omits (NoFieldsFound, WriteNeedsRecord, NoActiveFile, BadLeaderValue, MissingLinkedFields) remain absent — guarding against accidental drift — and each omission carries a one-line rationale in the test source. - A pymarc-shape except RecordDirectoryInvalid: actually catches what mrrc raises on the E101 non-ASCII-tag fixture. - A port using except MrrcException: catches every typed variant. Extends docs/guides/migration-from-pymarc.md with the class-name mapping table, the omissions table with rationale per omission, and an explicit section on the FatalReaderError inheritance-hierarchy divergence (mrrc keeps the fatal record-level classes as siblings under MrrcException; FatalReaderError in mrrc means "recovered-error cap exceeded" specifically). Two pymarc-compatible migration recipes spelled out for that divergence. No code changes — the verification is the deliverable. Bead: bd-0x73.6 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Consolidate pymarc-compat docs: catalog in reference, recipes in guide Splits the responsibilities cleanly between the two error-handling docs to remove the duplication my earlier commit introduced: docs/reference/error-handling.md (the catalog — for users writing new mrrc code): - Updates the stale "MARCReader.current_exception / current_chunk" section. The reference said "mrrc does not currently expose these reader-side attributes" — predated PR #182 (bd-0x73.7). Replaced with the actual landed semantics plus the encoding-strictness and byte-read-error divergences. - Adds a "Known hierarchy divergences from pymarc" subsection noting the FatalReaderError parentage difference (its tree diagram showed the actual mrrc shape but didn't call out the comparison to pymarc) and the PymarcException → MrrcException base-class rename. - Expands the omissions row in the mapping table into a full sub-table with rationale and mrrc-equivalent behavior for NoFieldsFound, WriteNeedsRecord, NoActiveFile, BadLeaderValue, MissingLinkedFields. docs/guides/migration-from-pymarc.md (the porter's recipe — for users with existing pymarc code in front of them): - Drops the duplicated class-name mapping table and the omissions table that just landed; replaced with a one-sentence summary and a link to the reference for the catalog. - Keeps the inline migration recipes (base-class rename, the two FatalReaderError catch patterns) since porters want them inline, not "go read the reference." - Trims explanatory prose for the FatalReaderError divergence and links to the reference for the rationale. Net diff: -66 noise lines in the guide, +93 substance lines in the reference. CHANGELOG entry updated to reflect the split. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

dchud self-assigned this May 10, 2026

dchud merged commit ffeccfd into main May 11, 2026
47 of 48 checks passed

dchud deleted the feat/permissive-pymarc-compat branch May 11, 2026 00:01

dchud added a commit that referenced this pull request May 11, 2026

chore(beads): close bd-0x73.7 (landed in #182)

7b8d9dd

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Pymarc-compatible current_exception and current_chunk accessors on MARCReader#182

Pymarc-compatible current_exception and current_chunk accessors on MARCReader#182
dchud merged 2 commits into
mainfrom
feat/permissive-pymarc-compat

dchud commented May 10, 2026

Uh oh!

codspeed-hq Bot commented May 10, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

dchud commented May 10, 2026

Summary

Implementation

Tests

Documented divergences

Verification

Uh oh!

codspeed-hq Bot commented May 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Merging this PR will degrade performance by 18.51%

Performance Changes

Footnotes

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

codspeed-hq Bot commented May 10, 2026 •

edited

Loading