feat(error): enrich MarcError with positional context (bd-qzew PR2)#100
feat(error): enrich MarcError with positional context (bd-qzew PR2)#100
Conversation
…xt fields Replaces the previous seven string-message variants with a structured set of variants that each carry positional metadata (record index, byte offsets, 001 control number, source filename, field tag, and the bad bytes capped at 32). The default Display impl produces an actionable one-liner with the byte offset visually subordinate; a separate detailed() method renders a multi-line diagnostic. Variants mirror the per-variant requirements documented in the bd-qzew bead: InvalidLeader, RecordLengthInvalid, BaseAddressInvalid, BaseAddressNotFound, DirectoryInvalid, TruncatedRecord, EndOfRecordNotFound, InvalidIndicator, BadSubfieldCode, InvalidField, EncodingError, FieldNotFound, IoError, XmlError, JsonError, WriterError. Wrapping variants (IoError, XmlError, JsonError) carry their underlying cause via #[source] so std::error::Error::source() walks the chain. This commit is the wire-shape change only: every existing call site is mechanically updated to construct the new variants via positional-less helper constructors (invalid_field_msg, encoding_msg, truncated_msg, leader_msg, writer_msg). All optional positional fields are populated with None at this stage. Subsequent commits introduce ParseContext and thread real positional metadata through the readers, formats, writer, and Python bindings. Other changes: - truncate_bytes helper for the 32-byte found cap - Field renamed from `source` to `source_name` to avoid collision with thiserror's auto-detected error-source field name - src-python error mapping consumes the new shape via a single placeholder fall-through that produces bare PyValueError/PyIOError with the new Display output; typed-exception construction lands in a later commit - crate-level allow(clippy::result_large_err): MarcError carries structured metadata by design; boxing every Result<T, MarcError> would add an allocation without addressing a real perf concern All 495 Rust tests + 611 Python tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ment, scope migration helpers Follow-up polish from architecture review of the previous error-type restructure. - Display::render_oneline: when no positional context fields are populated, lead with the variant name (e.g., "BaseAddressNotFound: base address not found") so the rendered message is at least as informative as the previous string-message form. Otherwise an empty header produced a body-only string that obscured what kind of error was raised. - detailed(): label-padding bug fixed. The previous arithmetic (10 - label.len() - 1) underflowed for labels longer than 10 characters (e.g., "record-relative:") and produced visually drifting columns. Now compute the padding from the widest label in the actual output so columns align consistently regardless of which fields are populated. - Migration constructor helpers (invalid_field_msg, encoding_msg, truncated_msg, leader_msg) demoted to pub(crate). They exist to support the mechanical update of internal call sites and are not part of the long-term API; the proper construction path will be the ParseContext::error_here helper added in the next change. The unused writer_msg helper is removed entirely. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the parsing context that error-enrichment work threads through the
ISO 2709 reader stack. ParseContext carries source filename, record
index, stream and record-relative byte offsets, the 001 control number
once available, and the field/subfield/indicator currently being
parsed. A family of ctx.err_* methods constructs the appropriate
MarcError variant with all positional fields auto-populated from the
context, so call sites don't have to repeat the metadata at every
error-construction site.
Also lands the deferred-from-the-extraction-PR shared subfield
parsing primitives:
- DataFieldParseConfig with three named const profiles
(BIBLIOGRAPHIC, AUTHORITY, HOLDINGS) capturing each existing reader's
historical behavior for two orthogonal axes:
* structure: Strict vs Permissive handling of unrecognized bytes
between subfields
* utf8: Lossy vs Strict decoding of subfield value bytes
- parse_data_field: parses indicators + subfields into a Field per the
given config, attaching positional metadata to errors via the context
- parse_subfields: walks subfield bytes after the indicators
This commit is purely additive. The existing iso2709 primitives and
all three readers are unchanged; the next commit rewires them to use
ParseContext and the shared subfield parser.
All checks pass (495 Rust tests + 611 Python tests).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ings readers Each reader now owns a ParseContext that tracks the current record index, byte offsets, source filename, and field-tag-being-parsed, so errors raised anywhere in the read loop carry consistent positional metadata. Also adds opportunistic 001 extraction (the value is captured into ctx.record_control_number as soon as the 001 control field is parsed, so all subsequent errors for that record include it) and filename plumbing via two new constructors per reader: - with_source(name): builder method that sets source_name on the context for callers that have a name (Python file objects, in-memory cursors, custom stream identifiers). - from_path(path): convenience constructor on Reader<File> that opens the file and sets source_name to path.display().to_string(). Subfield parsing now goes through the shared iso2709::parse_data_field with the appropriate per-reader DataFieldParseConfig: - bib: BIBLIOGRAPHIC (strict structure, lossy UTF-8) - authority: AUTHORITY (permissive structure, lossy UTF-8) - holdings: HOLDINGS (permissive structure, strict UTF-8) This eliminates the three near-duplicate inline subfield-parsing loops that the readers carried, while preserving each reader's historical semantic behavior. Holdings's two-pass IndexMap dispatch is collapsed into the same single-pass tag-routing pattern as authority. Error-construction sites that previously used the migration helpers (MarcError::invalid_field_msg / truncated_msg / etc.) are rewritten to use ctx.err_* methods, so errors get full positional context. Helpers constructed inside the iso2709 primitives (parse_4digits, parse_5digits) still raise context-less errors when called from the directory walk; threading ctx into those primitives is intentionally deferred so primitive signatures don't change in this commit. All 495 Rust tests + 611 Python tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ror variant XML parse failures in marcxml_to_record and marcxml_to_records now construct MarcError::XmlError wrapping the underlying quick_xml error via #[source], so std::error::Error::source() walks the chain and callers can downcast to inspect the original parser failure. Adds ParseContext::err_xml and ParseContext::err_json constructors for the format-level wrappers. The XmlError variant's cause type changes to Box<dyn Error + Send + Sync> so any of quick_xml's error families (Error, DeError, etc.) can be wrapped uniformly. MARCJSON-structural errors (missing leader, wrong shape, etc.) stay as InvalidField since they describe MARC-level structural problems rather than JSON parser failures; the marcjson module operates on already- parsed serde_json::Value inputs and has no JSON parser sites of its own. JsonError remains available for any future call site that does own a serde_json parse step. Byte-offset extraction for both formats is left as Option<None> for now: quick_xml does not expose a byte position from DeError, and translating serde_json::Error::line()/column() to a byte offset requires the original input bytes (out of scope for the current abstraction). The underlying parser positions remain available via std::error::Error::source() and the wrapped cause's own Display. All 495 Rust tests + 611 Python tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three writer error sites now construct MarcError::WriterError with the 1-based record index in the output stream and the record's 001 control number where available, rather than generic InvalidField wrappers: - "Cannot write to a finished writer" raises WriterError with no record context (no record is being processed at that point). - u32 overflow on record length raises WriterError with the record index, 001, and the actual byte count in the message. - u32 overflow on base address: same shape. The record_index/record_control_number snapshot happens once at the top of write_record so multiple potential failure sites share the same context without re-extracting. All 495 Rust tests + 611 Python tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Updates the Python exception class hierarchy to expose the structured
positional metadata that the Rust MarcError carries, while preserving
pymarc-style catch behavior.
mrrc/__init__.py:
- All exception classes now share a _MrrcExceptionBase mixin that
accepts ten positional kwargs (record_index, record_control_number,
field_tag, indicator_position, subfield_code, found, expected,
byte_offset, record_byte_offset, source). Every kwarg is optional so
bare-constructor compatibility is preserved (raise RecordLeaderInvalid()
still works).
- The mixin implements __reduce__ and __setstate__ so kwargs survive a
pickle round-trip. By default Exception.__reduce__ only round-trips
args, dropping instance __dict__; the explicit reducer fixes this.
- _format() produces the actionable one-liner shown by str(err);
detailed() produces the multi-line diagnostic. Both shapes mirror the
Rust Display and MarcError::detailed() outputs.
- New mrrc-specific subclasses extend the closest pymarc parent so
pymarc-style catches still trigger:
InvalidIndicator(RecordDirectoryInvalid)
BadSubfieldCode(RecordDirectoryInvalid)
InvalidField(RecordDirectoryInvalid)
TruncatedRecord(EndOfRecordNotFound)
EncodingError(MrrcException)
XmlError(MrrcException)
JsonError(MrrcException)
WriterError(MrrcException)
src-python/src/error.rs:
- marc_error_to_py_err is rewritten to construct the typed Python
exception class corresponding to each MarcError variant, passing
positional context as kwargs. Falls through to PyValueError /
PyIOError if typed-exception construction fails for any reason
(missing class, kwarg rejected) so an error is never dropped silently.
- IoError variants continue to surface as Python's built-in OSError via
PyIOError, matching pymarc.
src-python/src/parse_error.rs:
- ParseError::to_py_err now routes each variant through
marc_error_to_py_err so the boundary-scanner / buffered-reader paths
raise the same typed Python exception classes as the synchronous
reader path. The variant API is unchanged so the many existing
ParseError construction sites need no edits.
All 495 Rust tests + 611 Python tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…as __cause__ Two refinements from the security audit of the typed-exception construction layer. mrrc/__init__.py: - __setstate__ now whitelists incoming state keys against the per-class allowed set (_POSITIONAL_FIELDS plus _pickle_extra_fields). Without this, a maliciously-crafted pickle could setattr arbitrary names — including method names — and shadow class methods on the instance. Pickle deserialization itself is the RCE primitive (anyone unpickling untrusted data has already lost), but blind setattr unnecessarily amplifies the blast radius from "you executed attacker code" to "subsequent operations on the unpickled object also execute attacker logic." Cheap to fix, no API impact. - Subclasses with extra __init__ kwargs (InvalidField, TruncatedRecord, EncodingError, XmlError, JsonError, WriterError) now declare those fields via _pickle_extra_fields so they actually round-trip through pickle instead of being silently dropped. Functional bug, surfaced alongside the security fix. src-python/src/error.rs: - Typed-exception construction failures now chain as __cause__ on the fallback PyValueError/PyIOError. A broken install (mrrc module not importable, class missing, kwargs rejected) is now debuggable instead of silently swallowed. Filed bd-uj7c for the related "cap per-stream error count in lenient recovery" concern, which is out of scope for this commit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds insta and syrupy as dev dependencies and pins the externally- visible error format for representative MarcError variants and Python typed exception classes. Rust (src/error.rs): - 6 snapshot tests covering Display (one-liner) and detailed() (multi-line) for representative variants: InvalidIndicator with full context, BaseAddressNotFound with no context (kind-name fallback path introduced in the earlier fixup), DirectoryInvalid with truncated found bytes, TruncatedRecord detailed(), and WriterError with record context. - Snapshots live in src/snapshots/. Run `cargo insta review` to inspect/accept changes when these drift. Python (tests/python/test_errors.py): - 36 tests total: hierarchy assertions (subclass relationships, pymarc-style catch behavior), bare-constructor compatibility for every exception class, pickle round-trip (including subclass extras and the security whitelist), and 7 snapshot tests pinning str/repr/detailed output for the same variants as the Rust suite. - Snapshots live in tests/python/__snapshots__/ as syrupy .ambr files. Format alignment: - Python detailed() reworked to mirror the Rust label-padding algorithm (compute width from widest label across the actual populated rows, pad each label to that width). Output is now byte-for-byte identical to Rust's MarcError::detailed() for the same variant + populated fields. - Subclasses with extra context fields (TruncatedRecord: expected_length/actual_length) hook into _extra_detail_rows so detailed() doesn't need a per-class override of the whole method. Total: 495 Rust unit tests (was 489) + 643 Python tests (was 611) + 30 doc tests + audit + maturin all pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds docs/reference/error-handling.md documenting: - The exception hierarchy (pymarc-named parents + mrrc-specific subclasses, with the "you also catch X" subclass behavior table) - Pymarc exception compatibility, scoped tightly to the exception layer (with explicit out-of-scope list for record/reader/writer APIs and format coverage) - The optional symbol-level alias snippet for projects mid-migration - Three before/after patterns showing what callers gain on the exception layer (same-catch-more-context, opt-in granularity, diagnostic dump) - Per-variant field reference table - Position semantics by format (ISO 2709 / MARCXML / MARCJSON) - Source filename plumbing via with_source / from_path - Recovery-mode interaction - Pickle round-trip and the __setstate__ whitelist behavior Adds the page to the Reference section of mkdocs.yml nav. CHANGELOG entry under [Unreleased] documents: - Breaking: MarcError variant restructure, removed InvalidRecord, Display format change - Added: positional context, ParseContext, shared subfield primitives, with_source/from_path constructors, opportunistic 001 extraction, detailed() rendering, mrrc-specific Python subclasses with positional kwargs/pickle/_format/detailed, PyO3 typed exception construction with __cause__ chaining, the docs page, snapshot tests via insta/syrupy - Changed: source() chain walks underlying causes, From<io::Error> produces new struct shape, ParseError routes through marc_error_to_py_err, shared ISO 2709 primitives in iso2709.rs No version bump; project remains on 0.7.x while related error beads land before the next release cut. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…catedRecord
Two issues surfaced by the pre-merge comprehensive review.
Cross-language detailed() drift on InvalidIndicator
---------------------------------------------------
The Rust and Python detailed() outputs disagreed on column alignment
for InvalidIndicator: Rust pushed ("indicator", "1: found ...") (label
without colon, value carrying it), Python pushed ("indicator 1:",
"found ...") (number+colon in label). With column-padding to the
widest label, the visible whitespace landed in different places.
Aligned Rust to match Python's shape — label is now "indicator 0:" or
"indicator 1:" and the value is just "found X, expected Y". Snapshot
re-captured. Rust and Python detailed() outputs are now byte-for-byte
identical for InvalidIndicator (matching the same property already
verified for TruncatedRecord).
TruncatedRecord lost positional context in Strict mode
------------------------------------------------------
iso2709::read_record_data raised TruncatedRecord via the migration
helper truncated_msg in Strict mode, producing an error with all
positional fields set to None. Since strict-mode truncation is the
most common error in production, this contradicted bd-qzew's contract
("Always populated for TruncatedRecord: record_index, byte_offset,
record_byte_offset").
read_record_data now takes &ParseContext and constructs the error via
ctx.err_truncated_record(expected_length, actual_length). Switched
the underlying read from read_exact to a loop tracking bytes_read so
actual_length is accurate. Updated all three reader call sites and
the iso2709 self-tests.
Follow-up beads filed for related G4 findings:
- bd-3ois: thread ParseContext through recovery.rs salvage path
(currently unreachable due to dead-code dispatch, but a latent gap
for when recovery is properly wired)
- bd-6zvz: enrich MARCJSON parse-path errors via JsonError variant
(the actual serde_json parse lives in src-python/src/formats.rs,
not src/marcjson.rs)
- bd-4axe: populate ParseError positional metadata at boundary-
scanner sites
- bd-wc5r: FFI integration test (Rust → PyO3 → Python) verifying
typed exception construction for each MarcError variant
- bd-umwp: add typed exception class stubs to mrrc/_mrrc.pyi for
mypy/pyright workflows
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Merging this PR will not alter performance
|
…s get syrupy
The wheel-test CI workflows were pip-installing test deps as a hardcoded
list (pytest, pytest-benchmark) parallel to pyproject.toml's dev extra.
Adding syrupy in commit 7 (for snapshot tests) updated pyproject.toml
but not the workflow lists, so the wheel-test runners on macOS/Linux/
Windows failed at fixture setup with "fixture 'snapshot' not found".
Fix: extract test runtime deps into a new [test] optional dependency in
pyproject.toml. Both wheel-test workflows now install the wheel together
with that extra (`uv pip install "${WHEEL}[test]"`), so the dep list has
a single source of truth and any future test dep added to [test] is
automatically picked up by CI without a parallel workflow edit.
The dev extra still carries the same test deps explicitly (rather than
extending [test]) to avoid the circular-extra resolution that some
package managers handle awkwardly. Both lists are kept aligned by
review; the test extra is the canonical source.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Type checkers (mypy, pyright) read .py source directly when no .pyi exists for the module. The mrrc exception classes set their positional context attributes via setattr in __init__, which leaves them as Any from a type checker's perspective. Adds class-level attribute annotations to _MrrcExceptionBase and to each subclass with extra fields (message on InvalidField/EncodingError/ XmlError/JsonError/WriterError, expected_length/actual_length on TruncatedRecord) so checkers see the real types when callers inspect attributes after `except mrrc.InvalidIndicator as e: e.record_index`. No runtime behavior change. Closes bd-umwp. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The actual serde_json::from_str call for MARCJSON parsing lives in the
PyO3 binding (src-python/src/formats.rs::marcjson_to_record), not in
src/marcjson.rs which operates on already-parsed serde_json::Value
inputs. The JsonError variant and ParseContext::err_json infrastructure
introduced in PR commit 4 were therefore unreachable; this commit hooks
them up at the actual parse site.
Both json_to_record and marcjson_to_record now construct a transient
ParseContext, use ctx.err_json to wrap the underlying serde_json::Error,
and route through marc_error_to_py_err so the typed JsonError Python
exception class is raised (rather than a bare PyValueError("Invalid
MARCJSON: ...")). The wrapped serde_json::Error is preserved via
#[source] so callers can downcast to inspect line/column.
Closes bd-6zvz.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Restructures ParseError from a 3-variant string-message enum into a
struct with a kind discriminator (ParseErrorKind) and an optional
ParseErrorContext (record_index, byte_offset, source_name). Adds
builder methods (with_record_index, with_byte_offset, with_source) so
callers can attach context fluently when they have it; to_py_err
forwards any populated context into the constructed MarcError so the
typed Python exception receives the same positional attributes as the
synchronous reader path.
The 30+ existing construction sites in src-python/src/{backend,
unified_reader,batched_reader,buffered_reader}.rs are migrated
mechanically to function-call form (ParseError::invalid_record(msg),
::record_boundary_error(msg), ::io_error(msg)) — no contextual change
at the call sites themselves because the boundary scanner / buffered
reader paths don't currently track records_read counters or
incremental byte offsets that would feed the new context fields.
The builder methods are #[allow(dead_code)] for now — they're
deliberately retained as infrastructure so per-site context tracking
can be added incrementally as the boundary-scanner / buffered-reader
loops gain counters, without further ParseError surgery.
Closes bd-4axe (the structural unification + context infrastructure);
populating the context at specific scan sites needs a separate
records-read-tracking change in the ReaderBackend iteration loop and
is out of scope for this commit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
src/recovery.rs's try_recover_record was constructing MarcError values via the legacy migration helpers (truncated_msg, invalid_field_msg) which produce errors with all positional fields set to None. Recovered-mode errors therefore carried no record_index, byte_offset, source_name, or 001 control number — contradicting bd-qzew's contract that errors carry positional metadata in all recovery modes. Adds &ParseContext as a parameter to try_recover_record; the bib reader passes its own ctx through so any error raised during salvage inherits the same positional shape as strict-mode errors. The truncated-no- directory case raises ctx.err_truncated_record; the field-data-not- available case raises ctx.err_invalid_field with the field tag set on a cloned context. Note: the dispatch path from MarcReader to try_recover_record is currently unreachable in practice (the buffer-length comparison can never trigger because read_record_data returns a buffer of full length even on a short read). The contract is now correct so when the dead- code dispatch is fixed in a future change, recovered errors carry full context automatically. Removes the now-unused truncated_msg migration helper (the only caller was recovery.rs). Closes bd-3ois. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds TestFfiTypedExceptions in tests/python/test_errors.py that drives
the Rust → PyO3 → Python typed-exception conversion path with malformed
MARC inputs and asserts the raised Python class plus populated
positional attributes. Surfaces gaps the unit-only tests missed.
Two real bugs caught and fixed by this test:
1. Authority/holdings reader iterators were rewrapping every MarcError
as a bare PyValueError("Failed to parse record: ...") via
pyo3::exceptions::PyValueError::new_err, bypassing the typed
exception construction entirely. Replaced with
crate::error::marc_error_to_py_err so callers see the proper
typed class.
2. Boundary-scanner detection of truncation in
src-python/src/{backend,unified_reader,reader_helpers}.rs was
constructing ParseError::invalid_record with a "Truncated record:
expected N, got M" string, which mapped to mrrc.InvalidField
instead of mrrc.TruncatedRecord on the Python side.
Added ParseError::truncated_record(expected, actual) constructor
that maps to MarcError::TruncatedRecord; updated all 4 call sites.
Now mrrc.TruncatedRecord surfaces with expected_length/actual_length
populated.
Closes bd-wc5r.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
bd-qzew added ~380 lines of exception class hierarchy to mrrc/__init__.py, pushing it past the threshold where the structure starts hurting (mixing re-exports, module-level helpers, exception classes, format constants, and several large wrapper classes). The exception hierarchy is the easiest unit to extract cleanly since nothing else in the file depends on it at import time. Moves _MrrcExceptionBase, _POSITIONAL_FIELDS, MrrcException, all 12 mrrc-named exception classes (RecordLengthInvalid, RecordLeaderInvalid, BaseAddressInvalid, BaseAddressNotFound, RecordDirectoryInvalid, EndOfRecordNotFound, FieldNotFound, FatalReaderError, InvalidIndicator, BadSubfieldCode, InvalidField, TruncatedRecord, EncodingError, XmlError, JsonError, WriterError), and BadSubfieldCodeWarning to a new mrrc/exceptions.py module. mrrc/__init__.py re-exports all of them so the public API (mrrc.InvalidIndicator, mrrc.RecordLeaderInvalid, etc.) is byte-for-byte identical. Aligns with pymarc's pymarc/exceptions.py convention. mrrc/__init__.py reduced from 2339 to 1971 lines (-368 net after the imports added back). All 647 Python tests pass unmodified. Filed bd-mcei for the larger refactor of the remaining wrapper classes (Field, Record, Leader, MARCReader, MARCWriter) which together are ~1500 more lines and would naturally live in their own modules following the same pymarc-aligned pattern. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Project's `requires-python = ">=3.10"` already covers the syntax improvements people typically reach for the future import to enable: PEP 604 union types (`int | None`) work natively from 3.10+, and PEP 585 generic builtins (`list[int]`, `dict[str, int]`) from 3.9+. PEP 563 (postponed annotations) is still not the default in any released Python version — but neither file uses any forward references that would need the lazy-evaluation behavior. Removes the import from mrrc/exceptions.py and tests/python/test_errors.py where I added it during PR2 without checking whether it was actually needed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR #100 merged with the full bd-qzew error enrichment work. Followups bd-44wb (P1, stable error codes + help URLs), bd-kodg (P1, structured to_dict/to_json + hex-dump diagnostics), and bd-uj7c (P2, per-stream error cap) target the same 0.8.0 release window. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
PR2 of 2 for bd-qzew. Restructures
MarcErrorfrom string-message tuple variants to structured struct variants carrying positional metadata (record index, byte offsets, 001 control number, source filename, field tag, indicator/subfield position, the bad bytes, etc.), threads aParseContextthrough all three readers, and wires typed Python exception classes with the same metadata as kwargs.PR1 (extract shared ISO 2709 primitives) merged earlier in this branch's history.
What's in this PR
11 commits, organized as:
refactor(error)— restructure variants, hand-write Display/detailed, addtruncate_byteshelper,#[source]chainsfeat(iso2709)— introduceParseContext+ctx.err_*helpers + sharedparse_data_field/parse_subfieldsfeat(readers)— wireParseContextthrough bib/authority/holdings;with_source/from_pathconstructors; opportunistic 001 extractionfeat(formats)— MARCXML enrichment viaXmlErrorwithBox<dyn Error>causefeat(writer)— writer-path errors carry record_index + 001feat(python)— typed exception hierarchy (mrrc-specific subclasses extend pymarc parents); kwargs/repr/_format/detailed; PyO3 typed-exception construction;ParseErrorroutes through unified mappingtest— snapshot tests viainsta(Rust) andsyrupy(Python)docs—docs/reference/error-handling.md+ CHANGELOG[Unreleased]entryQuality gates
Per the bead notes' four-gate review plan:
detailed()label alignment bug, scope migration helpers topub(crate), etc.). All applied as a fixup commit.ParseErrorunification after commit 6. Surfaced 2 medium-severity issues: pickle__setstate__blindly accepting arbitrary keys (could shadow methods), and missing__cause__chaining when typed-exception construction fails. Both fixed as a fixup commit.TruncatedRecordand tightened the Pythondetailed()to mirror Rust byte-for-byte.detailed()drift onInvalidIndicator,TruncatedRecordlosing positional context in Strict mode, andrecovery.rslenient-path positionless errors. The first two are fixed in the final commit. The third is filed as a follow-up bead because that dispatch path is currently dead code (documented in PR1).Pymarc compatibility
Strict superset:
InvalidIndicator,BadSubfieldCode,InvalidField,TruncatedRecord,EncodingError,XmlError,JsonError,WriterError) extend the closest pymarc parent so existingexcept RecordDirectoryInvalid:-style catches keep catching them.raise RecordLeaderInvalid()still works.See
docs/reference/error-handling.mdfor the full hierarchy diagram, the per-variant field reference, and the migration story scoped to the exception layer only.Follow-up beads (filed during G4)
bd-3ois— ThreadParseContextthroughrecovery.rssalvage pathbd-6zvz— Enrich MARCJSON parse-path errors viaJsonErrorvariant (the actualserde_jsonparse lives insrc-python/src/formats.rs)bd-4axe— PopulateParseErrorpositional metadata at boundary-scanner sitesbd-wc5r— FFI integration test verifying typed Python exception construction for eachMarcErrorvariantbd-umwp— Add typed exception class stubs tomrrc/_mrrc.pyibd-uj7c(filed during G2) — Cap per-stream error count in lenient/permissive recoveryNet diff
41 files changed, +3385 / -489.
src/error.rssrc/iso2709.rsParseContext+ctx.err_*+ shared subfield parsersrc/{reader,authority_reader,holdings_reader}.rsParseContext, addwith_source/from_path, opportunistic 001src/marcxml.rsXmlErrorconstruction at parse sitessrc/writer.rsWriterErrorwith record contextmrrc/__init__.pysrc-python/src/{error,parse_error}.rstests/python/test_errors.py(new)src/snapshots/(new)tests/python/__snapshots__/(new)docs/reference/error-handling.md(new)CHANGELOG.md[Unreleased]entryCargo.toml,pyproject.tomlinsta,syrupydev depsTest plan
.cargo/check.shclean locally (rustfmt, clippy, 495 unit, 30 doc, 643 Python tests, audit, maturin)🤖 Generated with Claude Code