feat(error): enrich MarcError with positional context (bd-qzew PR2) by dchud · Pull Request #100 · dchud/mrrc

dchud · 2026-04-19T20:13:06Z

Summary

PR2 of 2 for bd-qzew. Restructures MarcError from string-message tuple variants to structured struct variants carrying positional metadata (record index, byte offsets, 001 control number, source filename, field tag, indicator/subfield position, the bad bytes, etc.), threads a ParseContext through all three readers, and wires typed Python exception classes with the same metadata as kwargs.

PR1 (extract shared ISO 2709 primitives) merged earlier in this branch's history.

What's in this PR

11 commits, organized as:

refactor(error) — restructure variants, hand-write Display/detailed, add truncate_bytes helper, #[source] chains
feat(iso2709) — introduce ParseContext + ctx.err_* helpers + shared parse_data_field/parse_subfields
feat(readers) — wire ParseContext through bib/authority/holdings; with_source/from_path constructors; opportunistic 001 extraction
feat(formats) — MARCXML enrichment via XmlError with Box<dyn Error> cause
feat(writer) — writer-path errors carry record_index + 001
feat(python) — typed exception hierarchy (mrrc-specific subclasses extend pymarc parents); kwargs/repr/_format/detailed; PyO3 typed-exception construction; ParseError routes through unified mapping
test — snapshot tests via insta (Rust) and syrupy (Python)
docs — docs/reference/error-handling.md + CHANGELOG [Unreleased] entry
Plus 3 fixup commits applying findings from architect review (G1), security audit (G2), and pre-merge comprehensive review (G4)

Quality gates

Per the bead notes' four-gate review plan:

G1 — Architect review of the new variant shape after commit 1. Surfaced 5 should-fix items (empty-header Display fallback, detailed() label alignment bug, scope migration helpers to pub(crate), etc.). All applied as a fixup commit.
G2 — Security audit of the PyO3 typed-exception construction + ParseError unification after commit 6. Surfaced 2 medium-severity issues: pickle __setstate__ blindly accepting arbitrary keys (could shadow methods), and missing __cause__ chaining when typed-exception construction fails. Both fixed as a fixup commit.
G3 — Snapshot baseline review during commit 7. Reviewed every Rust + Python snapshot baseline before accepting; uncovered an alignment drift between languages on TruncatedRecord and tightened the Python detailed() to mirror Rust byte-for-byte.
G4 — Comprehensive pre-merge review before opening this PR. Surfaced 3 blockers: cross-language detailed() drift on InvalidIndicator, TruncatedRecord losing positional context in Strict mode, and recovery.rs lenient-path positionless errors. The first two are fixed in the final commit. The third is filed as a follow-up bead because that dispatch path is currently dead code (documented in PR1).

Pymarc compatibility

Strict superset:

All pymarc class names exist with the same parents.
New mrrc-specific classes (InvalidIndicator, BadSubfieldCode, InvalidField, TruncatedRecord, EncodingError, XmlError, JsonError, WriterError) extend the closest pymarc parent so existing except RecordDirectoryInvalid:-style catches keep catching them.
Bare-constructor compatibility preserved: raise RecordLeaderInvalid() still works.
Pickle round-trip preserves all positional attributes (with a security whitelist on incoming attribute names).

See docs/reference/error-handling.md for the full hierarchy diagram, the per-variant field reference, and the migration story scoped to the exception layer only.

Follow-up beads (filed during G4)

bd-3ois — Thread ParseContext through recovery.rs salvage path
bd-6zvz — Enrich MARCJSON parse-path errors via JsonError variant (the actual serde_json parse lives in src-python/src/formats.rs)
bd-4axe — Populate ParseError positional metadata at boundary-scanner sites
bd-wc5r — FFI integration test verifying typed Python exception construction for each MarcError variant
bd-umwp — Add typed exception class stubs to mrrc/_mrrc.pyi
Plus bd-uj7c (filed during G2) — Cap per-stream error count in lenient/permissive recovery

Net diff

41 files changed, +3385 / -489.

Area	Change
`src/error.rs`	Restructured enum + Display/detailed + truncate_bytes + helpers
`src/iso2709.rs`	New `ParseContext` + `ctx.err_*` + shared subfield parser
`src/{reader,authority_reader,holdings_reader}.rs`	Wire `ParseContext`, add `with_source`/`from_path`, opportunistic 001
`src/marcxml.rs`	`XmlError` construction at parse sites
`src/writer.rs`	`WriterError` with record context
`mrrc/__init__.py`	Exception class hierarchy + kwargs + pickle + format/detailed
`src-python/src/{error,parse_error}.rs`	PyO3 typed-exception construction
`tests/python/test_errors.py` (new)	36 tests (hierarchy, bare-construct, pickle, snapshot)
`src/snapshots/` (new)	6 Rust snapshot baselines
`tests/python/__snapshots__/` (new)	7 Python snapshot baselines
`docs/reference/error-handling.md` (new)	Full reference page
`CHANGELOG.md`	`[Unreleased]` entry
`Cargo.toml`, `pyproject.toml`	`insta`, `syrupy` dev deps

Test plan

.cargo/check.sh clean locally (rustfmt, clippy, 495 unit, 30 doc, 643 Python tests, audit, maturin)
Pre-commit hook ran the check on every commit
All four quality gates run with findings addressed (G1, G2 inline; G3 baseline review; G4 blockers fixed, should-fixes filed as follow-ups)
CI green on all platforms (macOS, Linux, Windows)

🤖 Generated with Claude Code

…xt fields Replaces the previous seven string-message variants with a structured set of variants that each carry positional metadata (record index, byte offsets, 001 control number, source filename, field tag, and the bad bytes capped at 32). The default Display impl produces an actionable one-liner with the byte offset visually subordinate; a separate detailed() method renders a multi-line diagnostic. Variants mirror the per-variant requirements documented in the bd-qzew bead: InvalidLeader, RecordLengthInvalid, BaseAddressInvalid, BaseAddressNotFound, DirectoryInvalid, TruncatedRecord, EndOfRecordNotFound, InvalidIndicator, BadSubfieldCode, InvalidField, EncodingError, FieldNotFound, IoError, XmlError, JsonError, WriterError. Wrapping variants (IoError, XmlError, JsonError) carry their underlying cause via #[source] so std::error::Error::source() walks the chain. This commit is the wire-shape change only: every existing call site is mechanically updated to construct the new variants via positional-less helper constructors (invalid_field_msg, encoding_msg, truncated_msg, leader_msg, writer_msg). All optional positional fields are populated with None at this stage. Subsequent commits introduce ParseContext and thread real positional metadata through the readers, formats, writer, and Python bindings. Other changes: - truncate_bytes helper for the 32-byte found cap - Field renamed from `source` to `source_name` to avoid collision with thiserror's auto-detected error-source field name - src-python error mapping consumes the new shape via a single placeholder fall-through that produces bare PyValueError/PyIOError with the new Display output; typed-exception construction lands in a later commit - crate-level allow(clippy::result_large_err): MarcError carries structured metadata by design; boxing every Result<T, MarcError> would add an allocation without addressing a real perf concern All 495 Rust tests + 611 Python tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ment, scope migration helpers Follow-up polish from architecture review of the previous error-type restructure. - Display::render_oneline: when no positional context fields are populated, lead with the variant name (e.g., "BaseAddressNotFound: base address not found") so the rendered message is at least as informative as the previous string-message form. Otherwise an empty header produced a body-only string that obscured what kind of error was raised. - detailed(): label-padding bug fixed. The previous arithmetic (10 - label.len() - 1) underflowed for labels longer than 10 characters (e.g., "record-relative:") and produced visually drifting columns. Now compute the padding from the widest label in the actual output so columns align consistently regardless of which fields are populated. - Migration constructor helpers (invalid_field_msg, encoding_msg, truncated_msg, leader_msg) demoted to pub(crate). They exist to support the mechanical update of internal call sites and are not part of the long-term API; the proper construction path will be the ParseContext::error_here helper added in the next change. The unused writer_msg helper is removed entirely. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds the parsing context that error-enrichment work threads through the ISO 2709 reader stack. ParseContext carries source filename, record index, stream and record-relative byte offsets, the 001 control number once available, and the field/subfield/indicator currently being parsed. A family of ctx.err_* methods constructs the appropriate MarcError variant with all positional fields auto-populated from the context, so call sites don't have to repeat the metadata at every error-construction site. Also lands the deferred-from-the-extraction-PR shared subfield parsing primitives: - DataFieldParseConfig with three named const profiles (BIBLIOGRAPHIC, AUTHORITY, HOLDINGS) capturing each existing reader's historical behavior for two orthogonal axes: * structure: Strict vs Permissive handling of unrecognized bytes between subfields * utf8: Lossy vs Strict decoding of subfield value bytes - parse_data_field: parses indicators + subfields into a Field per the given config, attaching positional metadata to errors via the context - parse_subfields: walks subfield bytes after the indicators This commit is purely additive. The existing iso2709 primitives and all three readers are unchanged; the next commit rewires them to use ParseContext and the shared subfield parser. All checks pass (495 Rust tests + 611 Python tests). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ings readers Each reader now owns a ParseContext that tracks the current record index, byte offsets, source filename, and field-tag-being-parsed, so errors raised anywhere in the read loop carry consistent positional metadata. Also adds opportunistic 001 extraction (the value is captured into ctx.record_control_number as soon as the 001 control field is parsed, so all subsequent errors for that record include it) and filename plumbing via two new constructors per reader: - with_source(name): builder method that sets source_name on the context for callers that have a name (Python file objects, in-memory cursors, custom stream identifiers). - from_path(path): convenience constructor on Reader<File> that opens the file and sets source_name to path.display().to_string(). Subfield parsing now goes through the shared iso2709::parse_data_field with the appropriate per-reader DataFieldParseConfig: - bib: BIBLIOGRAPHIC (strict structure, lossy UTF-8) - authority: AUTHORITY (permissive structure, lossy UTF-8) - holdings: HOLDINGS (permissive structure, strict UTF-8) This eliminates the three near-duplicate inline subfield-parsing loops that the readers carried, while preserving each reader's historical semantic behavior. Holdings's two-pass IndexMap dispatch is collapsed into the same single-pass tag-routing pattern as authority. Error-construction sites that previously used the migration helpers (MarcError::invalid_field_msg / truncated_msg / etc.) are rewritten to use ctx.err_* methods, so errors get full positional context. Helpers constructed inside the iso2709 primitives (parse_4digits, parse_5digits) still raise context-less errors when called from the directory walk; threading ctx into those primitives is intentionally deferred so primitive signatures don't change in this commit. All 495 Rust tests + 611 Python tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ror variant XML parse failures in marcxml_to_record and marcxml_to_records now construct MarcError::XmlError wrapping the underlying quick_xml error via #[source], so std::error::Error::source() walks the chain and callers can downcast to inspect the original parser failure. Adds ParseContext::err_xml and ParseContext::err_json constructors for the format-level wrappers. The XmlError variant's cause type changes to Box<dyn Error + Send + Sync> so any of quick_xml's error families (Error, DeError, etc.) can be wrapped uniformly. MARCJSON-structural errors (missing leader, wrong shape, etc.) stay as InvalidField since they describe MARC-level structural problems rather than JSON parser failures; the marcjson module operates on already- parsed serde_json::Value inputs and has no JSON parser sites of its own. JsonError remains available for any future call site that does own a serde_json parse step. Byte-offset extraction for both formats is left as Option<None> for now: quick_xml does not expose a byte position from DeError, and translating serde_json::Error::line()/column() to a byte offset requires the original input bytes (out of scope for the current abstraction). The underlying parser positions remain available via std::error::Error::source() and the wrapped cause's own Display. All 495 Rust tests + 611 Python tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Three writer error sites now construct MarcError::WriterError with the 1-based record index in the output stream and the record's 001 control number where available, rather than generic InvalidField wrappers: - "Cannot write to a finished writer" raises WriterError with no record context (no record is being processed at that point). - u32 overflow on record length raises WriterError with the record index, 001, and the actual byte count in the message. - u32 overflow on base address: same shape. The record_index/record_control_number snapshot happens once at the top of write_record so multiple potential failure sites share the same context without re-extracting. All 495 Rust tests + 611 Python tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Updates the Python exception class hierarchy to expose the structured positional metadata that the Rust MarcError carries, while preserving pymarc-style catch behavior. mrrc/__init__.py: - All exception classes now share a _MrrcExceptionBase mixin that accepts ten positional kwargs (record_index, record_control_number, field_tag, indicator_position, subfield_code, found, expected, byte_offset, record_byte_offset, source). Every kwarg is optional so bare-constructor compatibility is preserved (raise RecordLeaderInvalid() still works). - The mixin implements __reduce__ and __setstate__ so kwargs survive a pickle round-trip. By default Exception.__reduce__ only round-trips args, dropping instance __dict__; the explicit reducer fixes this. - _format() produces the actionable one-liner shown by str(err); detailed() produces the multi-line diagnostic. Both shapes mirror the Rust Display and MarcError::detailed() outputs. - New mrrc-specific subclasses extend the closest pymarc parent so pymarc-style catches still trigger: InvalidIndicator(RecordDirectoryInvalid) BadSubfieldCode(RecordDirectoryInvalid) InvalidField(RecordDirectoryInvalid) TruncatedRecord(EndOfRecordNotFound) EncodingError(MrrcException) XmlError(MrrcException) JsonError(MrrcException) WriterError(MrrcException) src-python/src/error.rs: - marc_error_to_py_err is rewritten to construct the typed Python exception class corresponding to each MarcError variant, passing positional context as kwargs. Falls through to PyValueError / PyIOError if typed-exception construction fails for any reason (missing class, kwarg rejected) so an error is never dropped silently. - IoError variants continue to surface as Python's built-in OSError via PyIOError, matching pymarc. src-python/src/parse_error.rs: - ParseError::to_py_err now routes each variant through marc_error_to_py_err so the boundary-scanner / buffered-reader paths raise the same typed Python exception classes as the synchronous reader path. The variant API is unchanged so the many existing ParseError construction sites need no edits. All 495 Rust tests + 611 Python tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…as __cause__ Two refinements from the security audit of the typed-exception construction layer. mrrc/__init__.py: - __setstate__ now whitelists incoming state keys against the per-class allowed set (_POSITIONAL_FIELDS plus _pickle_extra_fields). Without this, a maliciously-crafted pickle could setattr arbitrary names — including method names — and shadow class methods on the instance. Pickle deserialization itself is the RCE primitive (anyone unpickling untrusted data has already lost), but blind setattr unnecessarily amplifies the blast radius from "you executed attacker code" to "subsequent operations on the unpickled object also execute attacker logic." Cheap to fix, no API impact. - Subclasses with extra __init__ kwargs (InvalidField, TruncatedRecord, EncodingError, XmlError, JsonError, WriterError) now declare those fields via _pickle_extra_fields so they actually round-trip through pickle instead of being silently dropped. Functional bug, surfaced alongside the security fix. src-python/src/error.rs: - Typed-exception construction failures now chain as __cause__ on the fallback PyValueError/PyIOError. A broken install (mrrc module not importable, class missing, kwargs rejected) is now debuggable instead of silently swallowed. Filed bd-uj7c for the related "cap per-stream error count in lenient recovery" concern, which is out of scope for this commit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds insta and syrupy as dev dependencies and pins the externally- visible error format for representative MarcError variants and Python typed exception classes. Rust (src/error.rs): - 6 snapshot tests covering Display (one-liner) and detailed() (multi-line) for representative variants: InvalidIndicator with full context, BaseAddressNotFound with no context (kind-name fallback path introduced in the earlier fixup), DirectoryInvalid with truncated found bytes, TruncatedRecord detailed(), and WriterError with record context. - Snapshots live in src/snapshots/. Run `cargo insta review` to inspect/accept changes when these drift. Python (tests/python/test_errors.py): - 36 tests total: hierarchy assertions (subclass relationships, pymarc-style catch behavior), bare-constructor compatibility for every exception class, pickle round-trip (including subclass extras and the security whitelist), and 7 snapshot tests pinning str/repr/detailed output for the same variants as the Rust suite. - Snapshots live in tests/python/__snapshots__/ as syrupy .ambr files. Format alignment: - Python detailed() reworked to mirror the Rust label-padding algorithm (compute width from widest label across the actual populated rows, pad each label to that width). Output is now byte-for-byte identical to Rust's MarcError::detailed() for the same variant + populated fields. - Subclasses with extra context fields (TruncatedRecord: expected_length/actual_length) hook into _extra_detail_rows so detailed() doesn't need a per-class override of the whole method. Total: 495 Rust unit tests (was 489) + 643 Python tests (was 611) + 30 doc tests + audit + maturin all pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds docs/reference/error-handling.md documenting: - The exception hierarchy (pymarc-named parents + mrrc-specific subclasses, with the "you also catch X" subclass behavior table) - Pymarc exception compatibility, scoped tightly to the exception layer (with explicit out-of-scope list for record/reader/writer APIs and format coverage) - The optional symbol-level alias snippet for projects mid-migration - Three before/after patterns showing what callers gain on the exception layer (same-catch-more-context, opt-in granularity, diagnostic dump) - Per-variant field reference table - Position semantics by format (ISO 2709 / MARCXML / MARCJSON) - Source filename plumbing via with_source / from_path - Recovery-mode interaction - Pickle round-trip and the __setstate__ whitelist behavior Adds the page to the Reference section of mkdocs.yml nav. CHANGELOG entry under [Unreleased] documents: - Breaking: MarcError variant restructure, removed InvalidRecord, Display format change - Added: positional context, ParseContext, shared subfield primitives, with_source/from_path constructors, opportunistic 001 extraction, detailed() rendering, mrrc-specific Python subclasses with positional kwargs/pickle/_format/detailed, PyO3 typed exception construction with __cause__ chaining, the docs page, snapshot tests via insta/syrupy - Changed: source() chain walks underlying causes, From<io::Error> produces new struct shape, ParseError routes through marc_error_to_py_err, shared ISO 2709 primitives in iso2709.rs No version bump; project remains on 0.7.x while related error beads land before the next release cut. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…catedRecord Two issues surfaced by the pre-merge comprehensive review. Cross-language detailed() drift on InvalidIndicator --------------------------------------------------- The Rust and Python detailed() outputs disagreed on column alignment for InvalidIndicator: Rust pushed ("indicator", "1: found ...") (label without colon, value carrying it), Python pushed ("indicator 1:", "found ...") (number+colon in label). With column-padding to the widest label, the visible whitespace landed in different places. Aligned Rust to match Python's shape — label is now "indicator 0:" or "indicator 1:" and the value is just "found X, expected Y". Snapshot re-captured. Rust and Python detailed() outputs are now byte-for-byte identical for InvalidIndicator (matching the same property already verified for TruncatedRecord). TruncatedRecord lost positional context in Strict mode ------------------------------------------------------ iso2709::read_record_data raised TruncatedRecord via the migration helper truncated_msg in Strict mode, producing an error with all positional fields set to None. Since strict-mode truncation is the most common error in production, this contradicted bd-qzew's contract ("Always populated for TruncatedRecord: record_index, byte_offset, record_byte_offset"). read_record_data now takes &ParseContext and constructs the error via ctx.err_truncated_record(expected_length, actual_length). Switched the underlying read from read_exact to a loop tracking bytes_read so actual_length is accurate. Updated all three reader call sites and the iso2709 self-tests. Follow-up beads filed for related G4 findings: - bd-3ois: thread ParseContext through recovery.rs salvage path (currently unreachable due to dead-code dispatch, but a latent gap for when recovery is properly wired) - bd-6zvz: enrich MARCJSON parse-path errors via JsonError variant (the actual serde_json parse lives in src-python/src/formats.rs, not src/marcjson.rs) - bd-4axe: populate ParseError positional metadata at boundary- scanner sites - bd-wc5r: FFI integration test (Rust → PyO3 → Python) verifying typed exception construction for each MarcError variant - bd-umwp: add typed exception class stubs to mrrc/_mrrc.pyi for mypy/pyright workflows Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

codspeed-hq · 2026-04-19T20:18:46Z

Merging this PR will not alter performance

⚠️

Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

✅ 60 untouched benchmarks
⏩ 16 skipped benchmarks¹

_{Comparing feat/bd-qzew-error-enrichment (92da5e2) with main (12272ba)²}

16 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports. ↩
No successful run was found on main (92da5e2) during the generation of this report, so 12272ba was used instead as the comparison base. There might be some changes unrelated to this pull request in this report. ↩

…s get syrupy The wheel-test CI workflows were pip-installing test deps as a hardcoded list (pytest, pytest-benchmark) parallel to pyproject.toml's dev extra. Adding syrupy in commit 7 (for snapshot tests) updated pyproject.toml but not the workflow lists, so the wheel-test runners on macOS/Linux/ Windows failed at fixture setup with "fixture 'snapshot' not found". Fix: extract test runtime deps into a new [test] optional dependency in pyproject.toml. Both wheel-test workflows now install the wheel together with that extra (`uv pip install "${WHEEL}[test]"`), so the dep list has a single source of truth and any future test dep added to [test] is automatically picked up by CI without a parallel workflow edit. The dev extra still carries the same test deps explicitly (rather than extending [test]) to avoid the circular-extra resolution that some package managers handle awkwardly. Both lists are kept aligned by review; the test extra is the canonical source. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Type checkers (mypy, pyright) read .py source directly when no .pyi exists for the module. The mrrc exception classes set their positional context attributes via setattr in __init__, which leaves them as Any from a type checker's perspective. Adds class-level attribute annotations to _MrrcExceptionBase and to each subclass with extra fields (message on InvalidField/EncodingError/ XmlError/JsonError/WriterError, expected_length/actual_length on TruncatedRecord) so checkers see the real types when callers inspect attributes after `except mrrc.InvalidIndicator as e: e.record_index`. No runtime behavior change. Closes bd-umwp. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The actual serde_json::from_str call for MARCJSON parsing lives in the PyO3 binding (src-python/src/formats.rs::marcjson_to_record), not in src/marcjson.rs which operates on already-parsed serde_json::Value inputs. The JsonError variant and ParseContext::err_json infrastructure introduced in PR commit 4 were therefore unreachable; this commit hooks them up at the actual parse site. Both json_to_record and marcjson_to_record now construct a transient ParseContext, use ctx.err_json to wrap the underlying serde_json::Error, and route through marc_error_to_py_err so the typed JsonError Python exception class is raised (rather than a bare PyValueError("Invalid MARCJSON: ...")). The wrapped serde_json::Error is preserved via #[source] so callers can downcast to inspect line/column. Closes bd-6zvz. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Restructures ParseError from a 3-variant string-message enum into a struct with a kind discriminator (ParseErrorKind) and an optional ParseErrorContext (record_index, byte_offset, source_name). Adds builder methods (with_record_index, with_byte_offset, with_source) so callers can attach context fluently when they have it; to_py_err forwards any populated context into the constructed MarcError so the typed Python exception receives the same positional attributes as the synchronous reader path. The 30+ existing construction sites in src-python/src/{backend, unified_reader,batched_reader,buffered_reader}.rs are migrated mechanically to function-call form (ParseError::invalid_record(msg), ::record_boundary_error(msg), ::io_error(msg)) — no contextual change at the call sites themselves because the boundary scanner / buffered reader paths don't currently track records_read counters or incremental byte offsets that would feed the new context fields. The builder methods are #[allow(dead_code)] for now — they're deliberately retained as infrastructure so per-site context tracking can be added incrementally as the boundary-scanner / buffered-reader loops gain counters, without further ParseError surgery. Closes bd-4axe (the structural unification + context infrastructure); populating the context at specific scan sites needs a separate records-read-tracking change in the ReaderBackend iteration loop and is out of scope for this commit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

src/recovery.rs's try_recover_record was constructing MarcError values via the legacy migration helpers (truncated_msg, invalid_field_msg) which produce errors with all positional fields set to None. Recovered-mode errors therefore carried no record_index, byte_offset, source_name, or 001 control number — contradicting bd-qzew's contract that errors carry positional metadata in all recovery modes. Adds &ParseContext as a parameter to try_recover_record; the bib reader passes its own ctx through so any error raised during salvage inherits the same positional shape as strict-mode errors. The truncated-no- directory case raises ctx.err_truncated_record; the field-data-not- available case raises ctx.err_invalid_field with the field tag set on a cloned context. Note: the dispatch path from MarcReader to try_recover_record is currently unreachable in practice (the buffer-length comparison can never trigger because read_record_data returns a buffer of full length even on a short read). The contract is now correct so when the dead- code dispatch is fixed in a future change, recovered errors carry full context automatically. Removes the now-unused truncated_msg migration helper (the only caller was recovery.rs). Closes bd-3ois. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds TestFfiTypedExceptions in tests/python/test_errors.py that drives the Rust → PyO3 → Python typed-exception conversion path with malformed MARC inputs and asserts the raised Python class plus populated positional attributes. Surfaces gaps the unit-only tests missed. Two real bugs caught and fixed by this test: 1. Authority/holdings reader iterators were rewrapping every MarcError as a bare PyValueError("Failed to parse record: ...") via pyo3::exceptions::PyValueError::new_err, bypassing the typed exception construction entirely. Replaced with crate::error::marc_error_to_py_err so callers see the proper typed class. 2. Boundary-scanner detection of truncation in src-python/src/{backend,unified_reader,reader_helpers}.rs was constructing ParseError::invalid_record with a "Truncated record: expected N, got M" string, which mapped to mrrc.InvalidField instead of mrrc.TruncatedRecord on the Python side. Added ParseError::truncated_record(expected, actual) constructor that maps to MarcError::TruncatedRecord; updated all 4 call sites. Now mrrc.TruncatedRecord surfaces with expected_length/actual_length populated. Closes bd-wc5r. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

bd-qzew added ~380 lines of exception class hierarchy to mrrc/__init__.py, pushing it past the threshold where the structure starts hurting (mixing re-exports, module-level helpers, exception classes, format constants, and several large wrapper classes). The exception hierarchy is the easiest unit to extract cleanly since nothing else in the file depends on it at import time. Moves _MrrcExceptionBase, _POSITIONAL_FIELDS, MrrcException, all 12 mrrc-named exception classes (RecordLengthInvalid, RecordLeaderInvalid, BaseAddressInvalid, BaseAddressNotFound, RecordDirectoryInvalid, EndOfRecordNotFound, FieldNotFound, FatalReaderError, InvalidIndicator, BadSubfieldCode, InvalidField, TruncatedRecord, EncodingError, XmlError, JsonError, WriterError), and BadSubfieldCodeWarning to a new mrrc/exceptions.py module. mrrc/__init__.py re-exports all of them so the public API (mrrc.InvalidIndicator, mrrc.RecordLeaderInvalid, etc.) is byte-for-byte identical. Aligns with pymarc's pymarc/exceptions.py convention. mrrc/__init__.py reduced from 2339 to 1971 lines (-368 net after the imports added back). All 647 Python tests pass unmodified. Filed bd-mcei for the larger refactor of the remaining wrapper classes (Field, Record, Leader, MARCReader, MARCWriter) which together are ~1500 more lines and would naturally live in their own modules following the same pymarc-aligned pattern. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Project's `requires-python = ">=3.10"` already covers the syntax improvements people typically reach for the future import to enable: PEP 604 union types (`int | None`) work natively from 3.10+, and PEP 585 generic builtins (`list[int]`, `dict[str, int]`) from 3.9+. PEP 563 (postponed annotations) is still not the default in any released Python version — but neither file uses any forward references that would need the lazy-evaluation behavior. Removes the import from mrrc/exceptions.py and tests/python/test_errors.py where I added it during PR2 without checking whether it was actually needed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Empty bead created accidentally by a failed jq parse during follow-up filing in PR #100. The actual work it would describe is delivered in bd-3ois (now closed) via PR #100 commit 60fd27d. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

PR #100 merged with the full bd-qzew error enrichment work. Followups bd-44wb (P1, stable error codes + help URLs), bd-kodg (P1, structured to_dict/to_json + hex-dump diagnostics), and bd-uj7c (P2, per-stream error cap) target the same 0.8.0 release window. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

dchud and others added 11 commits April 19, 2026 15:16

dchud marked this pull request as draft April 19, 2026 20:32

dchud and others added 6 commits April 19, 2026 16:33

dchud marked this pull request as ready for review April 19, 2026 20:50

dchud and others added 2 commits April 19, 2026 17:37

dchud merged commit 8860ef0 into main Apr 19, 2026
47 of 48 checks passed

dchud deleted the feat/bd-qzew-error-enrichment branch April 19, 2026 22:29

dchud mentioned this pull request Apr 19, 2026

Enrich MarcError with byte-offset and record-index context #89

Closed

dchud mentioned this pull request Apr 19, 2026

feat(error): stable error codes + configurable help_url (bd-44wb) #101

Merged

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(error): enrich MarcError with positional context (bd-qzew PR2)#100

feat(error): enrich MarcError with positional context (bd-qzew PR2)#100
dchud merged 19 commits intomainfrom
feat/bd-qzew-error-enrichment

dchud commented Apr 19, 2026

Uh oh!

codspeed-hq bot commented Apr 19, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

dchud commented Apr 19, 2026

Summary

What's in this PR

Quality gates

Pymarc compatibility

Follow-up beads (filed during G4)

Net diff

Test plan

Uh oh!

codspeed-hq bot commented Apr 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Merging this PR will not alter performance

Footnotes

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

codspeed-hq bot commented Apr 19, 2026 •

edited

Loading