Pymarc exception-class-name parity verification + docs#185
Conversation
Closes #155 bd-0x73.6's narrow scope per the 2026-05-03 re-scope: verify that a pymarc-shaped error-handling code path keeps working after a port, and document the small gaps where it doesn't. Adds tests/python/test_pymarc_exception_parity.py — 47 cases that pin: - Every pymarc 5.3.1 exception class name a typical except clause catches is importable from both mrrc.exceptions and the top-level mrrc package (10 names: RecordLengthInvalid, RecordLeaderInvalid, RecordDirectoryInvalid, BaseAddressInvalid, BaseAddressNotFound, EndOfRecordNotFound, TruncatedRecord, FieldNotFound, FatalReaderError, BadSubfieldCodeWarning). - Each shared-name class is a proper Exception/Warning subclass and, for the exceptions, inherits from MrrcException so callers can catch the mrrc base class instead of PymarcException. - pymarc names mrrc deliberately omits (NoFieldsFound, WriteNeedsRecord, NoActiveFile, BadLeaderValue, MissingLinkedFields) remain absent — guarding against accidental drift — and each omission carries a one-line rationale in the test source. - A pymarc-shape except RecordDirectoryInvalid: actually catches what mrrc raises on the E101 non-ASCII-tag fixture. - A port using except MrrcException: catches every typed variant. Extends docs/guides/migration-from-pymarc.md with the class-name mapping table, the omissions table with rationale per omission, and an explicit section on the FatalReaderError inheritance-hierarchy divergence (mrrc keeps the fatal record-level classes as siblings under MrrcException; FatalReaderError in mrrc means "recovered-error cap exceeded" specifically). Two pymarc-compatible migration recipes spelled out for that divergence. No code changes — the verification is the deliverable. Bead: bd-0x73.6 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Merging this PR will degrade performance by 19.5%
|
| Mode | Benchmark | BASE |
HEAD |
Efficiency | |
|---|---|---|---|---|---|
| ⚡ | WallTime | test_memory_streaming_read_10k |
274.9 ms | 249.6 ms | +10.15% |
| ⚡ | WallTime | test_process_8_files_parallel_4_threads |
275.2 ms | 228.6 ms | +20.41% |
| ❌ | WallTime | test_file_sequential_1x_10k |
81.3 ms | 92.1 ms | -11.72% |
| ⚡ | WallTime | test_pipeline_sequential_1x_10k |
29.3 ms | 22.4 ms | +30.49% |
| ❌ | WallTime | test_threading_speedup_4x_10k |
861.1 ms | 976.8 ms | -11.85% |
| ❌ | WallTime | test_parallel_read_with_extract_4x_10k |
963.5 ms | 1,100 ms | -12.41% |
| ❌ | WallTime | test_file_parallel_4x_10k_with_extraction |
901.9 ms | 1,066 ms | -15.39% |
| ⚡ | WallTime | test_process_4_files_parallel_4_threads |
132.3 ms | 101.3 ms | +30.56% |
| ❌ | WallTime | test_parallel_read_4x_1k |
79.8 ms | 96.6 ms | -17.39% |
| ❌ | WallTime | test_file_parallel_4x_10k |
844.2 ms | 1,048.7 ms | -19.5% |
| ⚡ | WallTime | test_pipeline_sequential_extraction_4x_10k |
142.9 ms | 122.4 ms | +16.78% |
| ❌ | WallTime | test_parallel_read_with_extract_4x_1k |
94.5 ms | 107.8 ms | -12.29% |
| ⚡ | WallTime | test_pipeline_sequential_4x_10k |
123.9 ms | 98.2 ms | +26.18% |
| ❌ | WallTime | test_threaded_with_title_extraction_4x_10k |
883.7 ms | 1,020.8 ms | -13.43% |
| ⚡ | WallTime | test_write_pathlib_1k_rustfile |
5.3 ms | 4.4 ms | +20.61% |
| ❌ | WallTime | test_threaded_reading_4x_1k |
85.3 ms | 95.9 ms | -11.04% |
| ⚡ | WallTime | test_process_4_files_sequential |
121.7 ms | 91.2 ms | +33.38% |
| ⚡ | WallTime | test_pipeline_parallel_4x_10k_threaded |
130.3 ms | 116.8 ms | +11.56% |
| ❌ | WallTime | test_parallel_read_4x_10k |
863.5 ms | 1,008.5 ms | -14.38% |
| ❌ | WallTime | test_file_sequential_4x_10k |
331.1 ms | 371.3 ms | -10.82% |
| ... | ... | ... | ... | ... | ... |
ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.
Comparing feat/pymarc-exception-class-parity (3e40d89) with main (f9760b0)
Footnotes
-
16 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports. ↩
Splits the responsibilities cleanly between the two error-handling docs to remove the duplication my earlier commit introduced: docs/reference/error-handling.md (the catalog — for users writing new mrrc code): - Updates the stale "MARCReader.current_exception / current_chunk" section. The reference said "mrrc does not currently expose these reader-side attributes" — predated PR #182 (bd-0x73.7). Replaced with the actual landed semantics plus the encoding-strictness and byte-read-error divergences. - Adds a "Known hierarchy divergences from pymarc" subsection noting the FatalReaderError parentage difference (its tree diagram showed the actual mrrc shape but didn't call out the comparison to pymarc) and the PymarcException → MrrcException base-class rename. - Expands the omissions row in the mapping table into a full sub-table with rationale and mrrc-equivalent behavior for NoFieldsFound, WriteNeedsRecord, NoActiveFile, BadLeaderValue, MissingLinkedFields. docs/guides/migration-from-pymarc.md (the porter's recipe — for users with existing pymarc code in front of them): - Drops the duplicated class-name mapping table and the omissions table that just landed; replaced with a one-sentence summary and a link to the reference for the catalog. - Keeps the inline migration recipes (base-class rename, the two FatalReaderError catch patterns) since porters want them inline, not "go read the reference." - Trims explanatory prose for the FatalReaderError divergence and links to the reference for the rationale. Net diff: -66 noise lines in the guide, +93 substance lines in the reference. CHANGELOG entry updated to reflect the split. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes #155
Summary
bd-0x73.6's narrow scope per the 2026-05-03 re-scope: verify that pymarc-shaped error-handling code keeps working after a port, and document the small gaps where it doesn't.
This is a verification + documentation pass — no Rust or Python source changes outside tests and docs.
Tests
Adds
tests/python/test_pymarc_exception_parity.py— 47 cases that pin:exceptclause catches is importable from bothmrrc.exceptionsand the top-levelmrrcpackage:RecordLengthInvalid,RecordLeaderInvalid,RecordDirectoryInvalid,BaseAddressInvalid,BaseAddressNotFound,EndOfRecordNotFound,TruncatedRecord,FieldNotFound,FatalReaderError,BadSubfieldCodeWarning.Exception/Warningsubclass and (for the exceptions) inherits fromMrrcExceptionsoexcept MrrcException:catches them all.NoFieldsFound,WriteNeedsRecord,NoActiveFile,BadLeaderValue,MissingLinkedFields) remain absent — guarding against drift — and each omission carries a rationale.except RecordDirectoryInvalid:actually catches what mrrc raises on the E101 non-ASCII-tag fixture (real catch-flow check, not justisinstance).except MrrcException:catches every typed variant from a parse failure.Documentation
Extends
docs/guides/migration-from-pymarc.mdwith three new subsections under "Permissive Mode":FatalReaderErrorinheritance-hierarchy divergence — spells out that mrrc keeps the fatal record-level classes as siblings underMrrcException(not children ofFatalReaderErroras in pymarc), and gives two pymarc-compatible migration recipes.What's deliberately NOT in this PR
FatalReaderErrorparentage. Documented as a known divergence with migration recipes instead — restructuring would change the semantic meaning ofmrrc.FatalReaderError(currently "recovered-error cap exceeded") and is its own decision.Verification
tests/python/test_pymarc_exception_parity.py— 46 pass, 1 skip (BadSubfieldCodeWarningtop-level re-export, guarded bypytest.skipif absent — it IS re-exported, so it passes here).cargo/check.sh --quick— all greenBead: bd-0x73.6
🤖 Generated with Claude Code