Breaking
MarcError variants restructured to struct form carrying
positional metadata (record_index, byte_offset, record_byte_offset, record_control_number, field_tag, indicator_position, subfield_code, found, expected, source_name). Wrapping variants (IoError, XmlError, JsonError)
chain via #[source], so Error::source() walks correctly; Display
output changed for every variant. Direct constructors and pattern
matches on variants need updating. Migration: match on .code() ("E001"–"E099", "E101"+) instead of variant names; codes are stable
across future enum changes.
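To illustrate the code-based migration, here is a minimal Rust sketch. `SketchError` and `policy` are hypothetical names, not mrrc types; only the E099 code for the fatal reader error comes from this changelog, and the E001/E002 assignments are placeholders. The point is that the caller's `match` is keyed on the string code, so it keeps working if variants are renamed or their fields change again.

```rust
/// Hypothetical stand-in for a struct-form error enum. Variant names, fields,
/// and the "E001"/"E002" assignments are placeholders; only "E099" for the
/// fatal reader error comes from the changelog.
#[allow(dead_code)]
#[derive(Debug)]
pub enum SketchError {
    BadIndicator { record_index: usize, byte_offset: u64, field_tag: [u8; 3], found: u8 },
    Truncated { record_index: usize, byte_offset: u64, expected: usize },
    FatalReader { recovered: usize },
}

impl SketchError {
    /// Stable identifier that survives variant renames and field changes.
    pub fn code(&self) -> &'static str {
        match self {
            SketchError::BadIndicator { .. } => "E001",
            SketchError::Truncated { .. } => "E002",
            SketchError::FatalReader { .. } => "E099",
        }
    }
}

/// Caller-side policy keyed on the code rather than on variant names.
pub fn policy(err: &SketchError) -> &'static str {
    match err.code() {
        "E099" => "abort",
        "E001" | "E002" => "skip-record",
        _ => "log",
    }
}
```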
MarcError::InvalidRecord removed. Use the new specific variants
(DirectoryInvalid, RecordLengthInvalid, …) or InvalidField as
the fall-through.
MarcError::ParseError removed. Same migration as InvalidRecord: the new specific variants and InvalidField
cover the cases that ParseError used to wrap. Match on .code()
for stable identity.
recovery::RecoveryContext removed from the public re-exports.
It had no external consumers; the type was an internal helper that
accumulated messages and dropped them. Pattern-match on MarcError instead.
recovery::try_recover_record now takes a &ParseContext as a
fifth parameter. The v0.7 signature constructed RecoveryContext internally; v0.8 callers pass the ParseContext that carries record-index / byte-offset state.
Added
Structured positional context on every parse error: record
index, absolute and record-relative byte offsets, 001 control number,
field tag (and indicator position or subfield code where applicable),
offending bytes (capped at 32), and source filename. The new MarcError::detailed() method renders a multi-line diagnostic that includes a hex dump of
the byte window around the error offset, while the default Display stays a
one-liner. Both produce byte-for-byte identical output in Rust
and Python via matching _format() / detailed() methods on every
Python exception class.
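A rough sketch of a byte-window hex dump around an error offset. `hex_window` is a hypothetical helper; mrrc's actual detailed() layout is not reproduced here, and centering the window on the offset is an assumption. The 32-byte cap mirrors the limit mentioned above.

```rust
/// Render up to `cap` bytes around `offset` as one hex-dump line:
/// absolute start offset, hex bytes, and a printable-ASCII gutter.
/// Hypothetical helper; the real detailed() layout may differ.
pub fn hex_window(data: &[u8], offset: usize, cap: usize) -> String {
    // Try to center the window on the offset, clamped to the buffer.
    let start = offset.saturating_sub(cap / 2).min(data.len());
    let end = (start + cap).min(data.len());
    let window = &data[start..end];
    let hex: Vec<String> = window.iter().map(|b| format!("{:02x}", b)).collect();
    // Non-printable bytes (e.g. 0x1F subfield delimiters) show as '.'.
    let ascii: String = window
        .iter()
        .map(|&b| if b.is_ascii_graphic() || b == b' ' { b as char } else { '.' })
        .collect();
    format!("{:08x}  {}  |{}|", start, hex.join(" "), ascii)
}
```

For example, `hex_window(b"abc\x1fdef", 3, 4)` returns `00000001  62 63 1f 64  |bc.d|`.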
Typed Python exception subclasses for every MarcError variant
(e.g. InvalidIndicator, BadSubfieldCode, TruncatedRecord, XmlError, JsonError, WriterError, FatalReaderError)
extending the closest pymarc-named parent, so existing pymarc-style except clauses keep catching the same conditions. Each carries
positional kwargs, supports pickle round-trips, and survives the
PyO3 boundary with all attributes intact.
Stable error codes (E001–E007, E099, E101, E105, E106, E201, E202, E301, E401, E402, E404) on every
variant via MarcError::code() / slug(), with matching code / slug / help_url() on every Python class. Codes are never
renumbered (policy in CONTRIBUTING.md); MRRC_DOCS_BASE_URL
overrides the help-URL host. All codes are documented in docs/reference/error-codes.md.
Structured error serialization via to_dict() / to_json() on
every Python exception (suitable for ELK / Datadog / Splunk) and MarcError::to_json_value() / to_json() on the Rust side. Bytes
fields are hex-encoded, _cause flattens the exception chain, and schema_version: 1 is included for forward compatibility. A new public BytesNear
struct exposes the captured byte window.
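The serialization conventions (hex-encoded bytes, a `schema_version` of 1) can be sketched without a JSON library. `hex_encode` and `error_to_json` are hypothetical, and every key name other than `schema_version` is an assumption rather than mrrc's actual schema.

```rust
/// Lowercase hex encoding for byte-valued fields.
pub fn hex_encode(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

/// Hypothetical payload builder. Assumes `code` needs no JSON escaping
/// (true for codes shaped like "E099"); key names besides schema_version
/// are illustrative only.
pub fn error_to_json(code: &str, record_index: usize, found: &[u8]) -> String {
    format!(
        r#"{{"schema_version":1,"code":"{}","record_index":{},"found":"{}"}}"#,
        code,
        record_index,
        hex_encode(found)
    )
}
```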
with_source() / from_path() builder methods on MarcReader, AuthorityMarcReader, and HoldingsMarcReader. from_path populates
the source name from the file path so emitted errors carry the
filename.
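A minimal sketch of the `with_source()` / `from_path()` builder pattern using only the standard library. `SketchReader` and `describe` are hypothetical stand-ins, not mrrc's MarcReader API, and storing the full displayed path (rather than just the file name) is an assumption.

```rust
use std::fs::File;
use std::io::{self, BufReader, Read};
use std::path::Path;

/// Hypothetical reader with the same two builder names as the changelog.
#[allow(dead_code)]
pub struct SketchReader<R: Read> {
    inner: R,
    source_name: Option<String>,
}

impl<R: Read> SketchReader<R> {
    pub fn new(inner: R) -> Self {
        SketchReader { inner, source_name: None }
    }

    /// Attach a human-readable source name for emitted errors.
    pub fn with_source(mut self, name: impl Into<String>) -> Self {
        self.source_name = Some(name.into());
        self
    }

    /// Prefix a message with the source name, if one was set.
    pub fn describe(&self, msg: &str) -> String {
        match &self.source_name {
            Some(name) => format!("{name}: {msg}"),
            None => msg.to_string(),
        }
    }
}

impl SketchReader<BufReader<File>> {
    /// Open a file and record its displayed path as the source name.
    pub fn from_path(path: impl AsRef<Path>) -> io::Result<Self> {
        let path = path.as_ref();
        let file = File::open(path)?;
        Ok(SketchReader::new(BufReader::new(file)).with_source(path.display().to_string()))
    }
}
```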
Per-stream recovered-error cap
(#110) via with_max_errors(n) on all three readers. In Lenient / Permissive modes the reader counts each recovered failure and
halts with MarcError::FatalReaderError (E099) once the cap
(default 10000) trips. Pass 0 to disable.
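The cap's counting logic can be sketched as a small state machine. `CapSketch` and `note_recovery` are hypothetical names (the real type is recovery::RecoveryCap, whose API is not shown here), and tripping on the (n+1)-th recovery rather than the n-th is an assumption.

```rust
/// Hypothetical per-stream cap: count each recovered failure and signal a
/// fatal condition once the count passes the limit. A limit of 0 disables it.
pub struct CapSketch {
    max: usize,
    recovered: usize,
}

impl CapSketch {
    /// Default limit named in the changelog.
    pub const DEFAULT_MAX: usize = 10_000;

    pub fn new(max: usize) -> Self {
        CapSketch { max, recovered: 0 }
    }

    /// Count one recovery; returns Err once the cap has been exceeded.
    pub fn note_recovery(&mut self) -> Result<(), String> {
        self.recovered += 1;
        if self.max != 0 && self.recovered > self.max {
            return Err(format!(
                "E099: {} recovered errors exceeded the cap of {}",
                self.recovered, self.max
            ));
        }
        Ok(())
    }
}
```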
Per-field lenient/permissive recovery parity across all three
ISO 2709 readers (#121, #122). AuthorityMarcReader and HoldingsMarcReader now honor the same
per-field error sites as the bibliographic reader (bad
field-length / start-position digits, field-extends-past-data,
data-field-too-short-for-indicators, parse_data_field failure).
Strict-mode behavior is unchanged at every site; lenient/permissive
mode skips the offending entry, counts the recovery against the
per-stream cap, and continues. Truncated-record dispatch in
lenient/permissive mode no longer returns a hard error on a short
read; instead it notes the recovery and falls through to
best-effort directory parsing.
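The strict-vs-lenient split at one per-field site (the field-length digits) might look like the sketch below. `Mode` and `field_length` are hypothetical, and a plain counter stands in for the per-stream cap.

```rust
/// Hypothetical parse modes mirroring the Strict / Lenient / Permissive names.
#[derive(Clone, Copy, PartialEq, Debug)]
pub enum Mode {
    Strict,
    Lenient,
    Permissive,
}

/// Parse a directory entry's ASCII field-length digits.
/// Strict: bad digits are an error. Lenient/Permissive: skip the entry
/// (Ok(None)) and count the recovery.
pub fn field_length(digits: &[u8], mode: Mode, recovered: &mut usize) -> Result<Option<usize>, String> {
    // Reject empty input and any non-digit byte.
    let parsed = if !digits.is_empty() && digits.iter().all(u8::is_ascii_digit) {
        Some(digits.iter().fold(0usize, |acc, &d| acc * 10 + (d - b'0') as usize))
    } else {
        None
    };
    match (parsed, mode) {
        (Some(n), _) => Ok(Some(n)),
        (None, Mode::Strict) => Err(format!("bad field-length digits: {:?}", digits)),
        (None, Mode::Lenient | Mode::Permissive) => {
            *recovered += 1; // would be charged against the per-stream cap
            Ok(None)
        }
    }
}
```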
Eight property tests (tests/properties.rs) covering binary,
MARCXML, and MARCJSON round-trips plus four ISO 2709 structural
invariants (leader length, directory tiling, indicator byte set,
subfield code shape), run with ProptestConfig { cases: 64 }; the full suite takes
~3 s locally. A primer is at docs/contributing/formal-methods.md
(#111).
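The directory-tiling invariant can be stated as a plain function. This sketch assumes the standard 12-byte ISO 2709 directory entry (3-byte tag, 4-digit length, 5-digit start) and that entries in a writer-produced record appear in order and tile the field data exactly, excluding the record terminator. `parse_directory` and `directory_tiles` are hypothetical helpers, not the property tests in tests/properties.rs.

```rust
/// Parse ASCII decimal digits; None on any non-digit.
fn ascii_num(b: &[u8]) -> Option<usize> {
    b.iter()
        .try_fold(0usize, |acc, &c| c.is_ascii_digit().then(|| acc * 10 + (c - b'0') as usize))
}

/// Split a directory (without its 0x1E terminator) into (tag, length, start).
pub fn parse_directory(dir: &[u8]) -> Option<Vec<([u8; 3], usize, usize)>> {
    if dir.len() % 12 != 0 {
        return None;
    }
    dir.chunks(12)
        .map(|e| Some(([e[0], e[1], e[2]], ascii_num(&e[3..7])?, ascii_num(&e[7..12])?)))
        .collect()
}

/// True if the entries start at 0, each begins where the previous ended,
/// and the last ends exactly at `data_len`.
pub fn directory_tiles(entries: &[([u8; 3], usize, usize)], data_len: usize) -> bool {
    let mut expected_start = 0;
    for &(_, len, start) in entries {
        if start != expected_start {
            return false;
        }
        expected_start = start + len;
    }
    expected_start == data_len
}
```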
Coverage-guided fuzzing
(#90, #115). A standalone fuzz/ Cargo workspace uses cargo-fuzz; a nightly CI matrix runs parse_record (full reader) and roundtrip_binary (reader →
writer → reader coupling) for 5 minutes daily at 03:00 UTC.
Findings are not a PR gate; reproducers are copied into tests/data/fuzz-regressions/ so they run on every PR. The triage playbook
is in docs/contributing/fuzzing.md.
New docs extra in pyproject.toml for the mkdocs site dependencies;
CI installs them via uv sync --extra docs --no-install-project, so the
docs-only job doesn't need a Rust toolchain.
Changed
Shared ISO 2709 parsing primitives + generic skeleton
(#125). src/iso2709.rs
owns the leader read, truncation-aware record-data read,
single-entry directory parsing, ASCII numeric helpers, and the
control-field-tag predicate. A new iso2709_skeleton::Iso2709Builder
trait and parse_iso2709_record<R, B> skeleton drive one record's
parse end to end; each reader implements Iso2709Builder via a
small adapter (~60 lines each), so read_record collapses to a
one-line dispatch. About 200 lines of near-duplicate read_record
bodies across the three readers are replaced by one shared
implementation. A new recovery::RecoveryCap struct consolidates the
per-stream cap state machine that had been duplicated. Behavior
changes that fall out of the unification: AuthorityMarcReader now
treats too-short data fields as a strict-Err / lenient-skip event
with cap accounting (previously they were skipped silently in all
modes); HoldingsMarcReader's field-extends-beyond-data lenient branch
now salvages a clamped slice (matching the bibliographic reader); and
the holdings "field exceeds data" error message was reworded to match
the other two readers. The skeleton is generic over
<R: Read, B: Iso2709Builder> with no dyn dispatch, so trait calls
monomorphize and can inline at every call site, preserving hot-loop
characteristics. Per-type quirks (authority's strict UTF-8 handling
of tags and its trailing 0x1F trim, holdings' strict UTF-8 on control
fields) are preserved via trait-method overrides; the wider
strict-vs-lossy unification is tracked in bd-bov7.
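A compact sketch of the trait-plus-generic-skeleton shape described above. `RecordBuilder`, `parse_one`, and `Collect` are hypothetical stand-ins for Iso2709Builder, parse_iso2709_record, and a per-reader adapter; they skip most leader validation, recovery modes, and encoding handling. Because `parse_one` is generic over both `R: Read` and `B: RecordBuilder` with no `dyn`, each reader/builder pair is compiled separately, and the default `is_control_tag` method shows where a per-type override could hook in.

```rust
use std::io::{self, Read};

/// Hypothetical builder trait: the skeleton handles framing, the builder
/// receives fields. Method names are illustrative, not mrrc's API.
pub trait RecordBuilder {
    type Output;
    fn control_field(&mut self, tag: [u8; 3], data: &[u8]);
    fn data_field(&mut self, tag: [u8; 3], data: &[u8]);
    fn finish(&mut self, leader: [u8; 24]) -> Self::Output;
    /// Overridable per-type hook; the default treats 00X tags as control fields.
    fn is_control_tag(&self, tag: [u8; 3]) -> bool {
        tag[0] == b'0' && tag[1] == b'0'
    }
}

fn bad(msg: &str) -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, msg.to_string())
}

fn num(b: &[u8]) -> io::Result<usize> {
    b.iter().try_fold(0usize, |acc, &c| {
        if c.is_ascii_digit() { Ok(acc * 10 + (c - b'0') as usize) } else { Err(bad("non-digit in numeric field")) }
    })
}

/// Generic over reader and builder with no `dyn`: each (R, B) pair is
/// monomorphized, so builder calls are candidates for inlining.
pub fn parse_one<R: Read, B: RecordBuilder>(r: &mut R, b: &mut B) -> io::Result<B::Output> {
    let mut leader = [0u8; 24];
    r.read_exact(&mut leader)?;
    let record_len = num(&leader[0..5])?;
    let base = num(&leader[12..17])?; // base address of data
    if base < 25 || base > record_len {
        return Err(bad("base address out of range"));
    }
    let mut rest = vec![0u8; record_len - 24];
    r.read_exact(&mut rest)?;
    let dir = &rest[..base - 25]; // directory without its 0x1E terminator
    let data = &rest[base - 24..];
    if dir.len() % 12 != 0 {
        return Err(bad("directory length not a multiple of 12"));
    }
    for e in dir.chunks(12) {
        let tag = [e[0], e[1], e[2]];
        let (len, start) = (num(&e[3..7])?, num(&e[7..12])?);
        let field = data.get(start..start + len).ok_or_else(|| bad("field extends past data"))?;
        let body = field.strip_suffix(b"\x1e").unwrap_or(field); // drop field terminator
        if b.is_control_tag(tag) { b.control_field(tag, body) } else { b.data_field(tag, body) }
    }
    Ok(b.finish(leader))
}

/// Toy adapter that records each field as "C<tag>=<text>" or "D<tag>=<text>".
#[derive(Default)]
pub struct Collect {
    fields: Vec<String>,
}

impl RecordBuilder for Collect {
    type Output = Vec<String>;
    fn control_field(&mut self, tag: [u8; 3], data: &[u8]) {
        self.fields.push(format!("C{}={}", String::from_utf8_lossy(&tag), String::from_utf8_lossy(data)));
    }
    fn data_field(&mut self, tag: [u8; 3], data: &[u8]) {
        self.fields.push(format!("D{}={}", String::from_utf8_lossy(&tag), String::from_utf8_lossy(data)));
    }
    fn finish(&mut self, _leader: [u8; 24]) -> Vec<String> {
        std::mem::take(&mut self.fields)
    }
}
```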
HoldingsMarcReader::with_max_errors is now active. It originally
landed inert when the cap was introduced (the holdings path had no
recovery sites); the recovery sites added in this cycle now count against it.
The MarcError source-error chain now walks correctly for IoError, XmlError, and JsonError (previously the chain was empty). Code that
checked Error::source().is_none() may need updating.
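A small sketch of what a walkable chain means in practice. `IoWrap` is a hypothetical wrapper, not MarcError::IoError, and `chain` simply follows Error::source() until it returns None.

```rust
use std::error::Error;
use std::fmt;
use std::io;

/// Hypothetical wrapper that exposes its I/O cause through source().
#[derive(Debug)]
pub struct IoWrap {
    pub record_index: usize,
    pub inner: io::Error,
}

impl fmt::Display for IoWrap {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "I/O error at record {}", self.record_index)
    }
}

impl Error for IoWrap {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.inner)
    }
}

/// Collect the Display text of an error and every cause beneath it.
pub fn chain(err: &dyn Error) -> Vec<String> {
    let mut out = vec![err.to_string()];
    let mut cur = err.source();
    while let Some(e) = cur {
        out.push(e.to_string());
        cur = e.source();
    }
    out
}
```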
Pinned Rust toolchain to 1.95.0
(#96, #97) via rust-toolchain.toml. Library MSRV
(Cargo.toml → rust-version = "1.71") is unchanged.
CI: skip workflows on docs-only changes. Added paths-ignore
for **.md, docs/**, mkdocs.yml, LICENSE, and .gitignore to lint, test, build, python-build, benchmark-python, and benchmark-rust workflows. A docs-only push or PR previously fired
~47 jobs across the six workflows; it now fires zero. Mixed PRs
(code + docs) still run normally: paths-ignore skips a workflow only when
every changed path matches.
.cargo/check.sh builds the docs site (mkdocs build) in full
mode, surfacing broken cross-links before push. The step is skipped
under --quick and requires the docs extra (uv sync --all-extras).
mkdocs warnings cleanup. Excluded docs/history/ (archival per
CLAUDE.md) from the published site, removed its nav entry, and
fixed broken cross-links and a stale anchor in active docs. Build
warnings dropped from 18 to 4.
Fixed
Python typing fidelity improvements. Stub gaps in mrrc/_mrrc.pyi
closed (parse_batch_parallel / parse_batch_parallel_limited
signatures, Field.delete_subfield, Record.to_marc21,
module-level __version__) and several wrapper-side narrowing
issues fixed. mypy mrrc/ and pyright mrrc/ now both report zero
errors and run in .cargo/check.sh full mode. These are type-only
changes with no runtime behavior change.
MARCXML reader: missing XML 1.1 §2.11 end-of-line normalization
in text and CDATA content
(#112). Switched both
arms of read_leaf_text from decode() to xml_content() so
CR / CRLF / NEL / LSEP normalize to LF per spec. Domain impact is
small (MARC field content rarely carries line separators), but the
divergence from the spec was real.
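For reference, XML 1.1 §2.11 maps CR LF, CR NEL, a lone CR, NEL (U+0085), and LSEP (U+2028) to a single LF. The function below is a hypothetical standalone illustration operating on already-decoded text; it is not the xml_content() code path the fix switched to.

```rust
/// Apply XML 1.1 end-of-line normalization to decoded text.
pub fn normalize_eol_xml11(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut chars = input.chars().peekable();
    while let Some(c) = chars.next() {
        match c {
            '\r' => {
                // CR LF and CR NEL collapse to one LF; a lone CR becomes LF.
                if matches!(chars.peek(), Some('\n') | Some('\u{85}')) {
                    chars.next();
                }
                out.push('\n');
            }
            '\u{85}' | '\u{2028}' => out.push('\n'), // NEL, LSEP
            other => out.push(other),
        }
    }
    out
}
```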
Read-path performance regression from the structured-error refactor
(#117). The ParseContext refactor stopped the compiler from inlining parse_data_field, costing ~15–17% on read hot paths vs v0.7.6
(investigated in #116). Most of the loss is recovered (remaining
overhead +9–11% vs baseline) via #[inline(always)] on parse_data_field, paired with shrinking ParseContext::current_field_tag to Option<[u8; 3]>. The
compact context is what keeps forced inlining from ballooning L1
instruction-cache usage on parallel workloads.
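To make the size difference concrete, the sketch below compares two hypothetical context structs that differ only in the tag field. The changelog does not say what current_field_tag was before the change; a heap-allocated Option<String> is assumed purely for comparison. Option<[u8; 3]> is a few bytes stored inline with no allocation, which is consistent with the compactness argument above.

```rust
use std::mem::size_of;

/// Hypothetical "wide" context, assuming a String-backed tag for comparison.
#[allow(dead_code)]
pub struct WideCtx {
    pub record_index: usize,
    pub byte_offset: u64,
    pub current_field_tag: Option<String>,
}

/// Hypothetical compact context: the tag is three inline bytes.
#[allow(dead_code)]
pub struct CompactCtx {
    pub record_index: usize,
    pub byte_offset: u64,
    pub current_field_tag: Option<[u8; 3]>,
}

/// (wide, compact) sizes in bytes on the current target.
pub fn sizes() -> (usize, usize) {
    (size_of::<WideCtx>(), size_of::<CompactCtx>())
}
```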
CI: Clippy collapsible_match errors in src/bibframe/converter.rs after Rust stable advanced to 1.95.0
(#96, #97).
CI: ASAN job failed with -Zbuild-std on stable; the ASAN step now sets RUSTUP_TOOLCHAIN=nightly
(#105).
CI: Miri job could not run insta snapshot tests; setting INSTA_WORKSPACE_ROOT skips the cargo metadata spawn
(#106).
CI: docs deploy broke when #130
switched the install line to build mrrc itself; the job now uses uv sync --extra docs --no-install-project
(#131).