Release v0.8.0 · dchud/mrrc

Breaking

MarcError variants restructured to struct form carrying
positional metadata (record_index, byte_offset,
record_byte_offset, record_control_number, field_tag,
indicator_position, subfield_code, found, expected,
source_name). Wrapping variants (IoError, XmlError, JsonError)
chain via #[source] so Error::source() walks correctly; Display
output changed for every variant. Direct constructors and pattern
matches need updating. Migration: match on .code() ("E001"–
"E099", "E101"+) instead of variant names — codes are stable
across future enum changes.
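The migration can be sketched as follows. This uses a stand-in enum, not the crate's real MarcError: the variant fields, the Display wording, and the E001 assignment are assumptions for illustration. Only the pairing of FatalReaderError with E099 comes from these notes.

```rust
use std::fmt;

/// Stand-in for a struct-form error enum; NOT the real `mrrc::MarcError`.
#[derive(Debug)]
enum DemoError {
    /// Hypothetical variant carrying positional metadata.
    FieldProblem { record_index: usize, field_tag: String, found: u8 },
    /// The release notes pair this variant name with code E099.
    FatalReaderError { record_index: usize },
}

impl DemoError {
    /// Stable identifier. The E001 mapping is assumed for this sketch.
    fn code(&self) -> &'static str {
        match self {
            DemoError::FieldProblem { .. } => "E001",
            DemoError::FatalReaderError { .. } => "E099",
        }
    }
}

impl fmt::Display for DemoError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            DemoError::FieldProblem { record_index, field_tag, found } => write!(
                f,
                "[{}] record {}: field {} has unexpected byte 0x{:02x}",
                self.code(), record_index, field_tag, found
            ),
            DemoError::FatalReaderError { record_index } => {
                write!(f, "[{}] reader halted at record {}", self.code(), record_index)
            }
        }
    }
}

/// Caller-side decision keyed on the code instead of the variant name,
/// so it keeps compiling if variants are renamed or split later.
fn should_abort(err: &DemoError) -> bool {
    err.code() == "E099"
}
```

Comparing codes keeps call sites independent of the enum's shape, which is the point of the migration advice above.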
MarcError::InvalidRecord removed. Use the new specific variants
(DirectoryInvalid, RecordLengthInvalid, …) or InvalidField as
the fall-through.
MarcError::ParseError removed. Same migration as
InvalidRecord: the new specific variants and InvalidField
cover the cases that ParseError used to wrap. Match on .code()
for stable identity.
recovery::RecoveryContext removed from the public re-exports.
No external consumer; the type was an internal helper that
accumulated messages and dropped them. Pattern-match on
MarcError instead.
recovery::try_recover_record takes a &ParseContext as a
fifth parameter. The previous v0.7 signature constructed
RecoveryContext internally; v0.8 callers pass the
ParseContext that carries record-index / byte-offset state.

Added

Structured positional context on every parse error — record
index, absolute and record-relative byte offsets, 001 control number,
field tag (and indicator position or subfield code where applicable),
offending bytes (capped at 32), and source filename. New
MarcError::detailed() multi-line diagnostic includes a hex-dump of
the byte window around the error offset; the default Display is a
one-liner. Both produce byte-for-byte identical output across Rust
and Python via matching _format() / detailed() methods on every
Python exception class.
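To make the idea of a byte window concrete, here is a minimal sketch of capturing and hex-encoding the bytes around an error offset. The function names, the centring rule, and the output layout are assumptions; the actual detailed() format is not reproduced here, and reusing the 32-byte cap as the window size is a simplification.

```rust
/// Window size; reuses the 32-byte cap mentioned in the notes (an assumption
/// for the window itself).
const MAX_WINDOW: usize = 32;

/// Returns up to `MAX_WINDOW` bytes of `data`, starting half a window
/// before `offset` and clamped to the bounds of `data`.
fn byte_window(data: &[u8], offset: usize) -> &[u8] {
    let half = MAX_WINDOW / 2;
    let start = offset.saturating_sub(half).min(data.len());
    let end = (start + MAX_WINDOW).min(data.len());
    &data[start..end]
}

/// Renders bytes as space-separated lowercase hex pairs.
fn hex_line(bytes: &[u8]) -> String {
    bytes
        .iter()
        .map(|b| format!("{:02x}", b))
        .collect::<Vec<_>>()
        .join(" ")
}
```

An offset past the end of the input yields an empty window rather than a panic, which matters when the error is itself a truncation.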
Typed Python exception subclasses for every MarcError variant
(e.g. InvalidIndicator, BadSubfieldCode, TruncatedRecord,
XmlError, JsonError, WriterError, FatalReaderError)
extending the closest pymarc-named parent so existing pymarc-style
except clauses keep catching the same conditions. Each carries
positional kwargs, supports pickle round-trip, and survives the
PyO3 boundary with all attributes intact.
Stable error codes (E001–E007, E099, E101, E105,
E106, E201, E202, E301, E401, E402, E404) on every
variant via MarcError::code() / slug(), with matching code /
slug / help_url() on every Python class. Codes never get
renumbered (policy in CONTRIBUTING.md); MRRC_DOCS_BASE_URL
overrides the help-URL host. All codes documented in
docs/reference/error-codes.md.
Structured error serialization via to_dict() / to_json() on
every Python exception (suitable for ELK / Datadog / Splunk) and
MarcError::to_json_value() / to_json() on the Rust side; bytes
fields hex-encoded, _cause flattens the exception chain,
schema_version: 1 for forward-compat. New BytesNear public
struct exposes the captured byte window.
with_source() / from_path() builder methods on MarcReader,
AuthorityMarcReader, and HoldingsMarcReader. from_path populates
the source name from the file path so emitted errors carry the
filename.
Per-stream recovered-error cap (#110) via with_max_errors(n) on
all three readers. In Lenient /
Permissive modes the reader counts each recovered failure and
halts with MarcError::FatalReaderError (E099) once the cap
(default 10000) trips. Pass 0 to disable.
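The cap behaves like a small counter with a trip condition. The sketch below is a stand-in, not the crate's RecoveryCap: the type name, method names, and the choice to trip when the count exceeds the limit (rather than when it reaches it) are assumptions.

```rust
/// Stand-in for a per-stream recovered-error counter.
#[derive(Debug)]
struct CapSketch {
    /// 0 means "no limit", matching the notes.
    max_errors: usize,
    recovered: usize,
}

impl CapSketch {
    /// Default limit taken from the notes.
    const DEFAULT_MAX: usize = 10_000;

    fn new(max_errors: usize) -> Self {
        CapSketch { max_errors, recovered: 0 }
    }

    /// Records one recovered failure. Returns `Err` with the running count
    /// once the count exceeds the limit (assumed trip condition); a real
    /// reader would turn that into its fatal error.
    fn note_recovery(&mut self) -> Result<(), usize> {
        self.recovered += 1;
        if self.max_errors != 0 && self.recovered > self.max_errors {
            Err(self.recovered)
        } else {
            Ok(())
        }
    }
}
```

With a limit of 2, the first two recoveries pass and the third returns Err(3); with a limit of 0 the counter never trips.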
Per-field lenient/permissive recovery parity across all three
ISO 2709 readers (#121, #122).
AuthorityMarcReader and HoldingsMarcReader now honor the same
per-field error sites the bibliographic reader does (bad
field-length / start-position digits, field-extends-past-data,
data-field-too-short-for-indicators, parse_data_field failure).
Strict-mode behavior unchanged at every site; lenient/permissive
mode skips the offending entry, counts the recovery against the
per-stream cap, and continues. Truncated-record dispatch in
lenient/permissive mode no longer returns a hard error on a short
read; instead it notes the recovery and falls through to
best-effort directory parsing.
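The strict-versus-lenient split at one of these sites (bad field-length or start-position digits) can be sketched as below. It relies on the MARC 21 directory layout of 12-byte entries: a 3-byte tag, a 4-digit field length, and a 5-digit start position. The types and function names are hypothetical, and the sketch covers only the bad-digits case, not the other recovery sites.

```rust
/// One parsed directory entry.
#[derive(Debug, PartialEq)]
struct Entry {
    tag: [u8; 3],
    length: usize,
    start: usize,
}

/// Parses a run of ASCII decimal digits; `None` if any byte is not a digit.
fn ascii_number(bytes: &[u8]) -> Option<usize> {
    bytes.iter().try_fold(0usize, |acc, &b| {
        if b.is_ascii_digit() {
            Some(acc * 10 + usize::from(b - b'0'))
        } else {
            None
        }
    })
}

/// Walks 12-byte directory entries. Strict mode fails on the first entry
/// with non-digit length/start bytes; lenient mode skips it and counts the
/// skip so the caller can charge it against a per-stream cap.
fn parse_directory(dir: &[u8], lenient: bool) -> Result<(Vec<Entry>, usize), String> {
    let mut entries = Vec::new();
    let mut skipped = 0;
    for chunk in dir.chunks_exact(12) {
        let numbers = ascii_number(&chunk[3..7]).zip(ascii_number(&chunk[7..12]));
        match numbers {
            Some((length, start)) => entries.push(Entry {
                tag: [chunk[0], chunk[1], chunk[2]],
                length,
                start,
            }),
            None if lenient => skipped += 1,
            None => {
                return Err(format!(
                    "bad directory entry for tag {}",
                    String::from_utf8_lossy(&chunk[..3])
                ))
            }
        }
    }
    Ok((entries, skipped))
}
```

Returning the skip count alongside the entries keeps the cap accounting in the caller, which is one plausible way to share the logic across readers.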
8 property tests (tests/properties.rs) covering binary,
MARCXML, and MARCJSON round-trips plus four ISO 2709 structural
invariants (leader length, directory tiling, indicator byte set,
subfield code shape). ProptestConfig { cases: 64 }; full suite
~3s locally. Primer at docs/contributing/formal-methods.md
(#111).
Coverage-guided fuzzing (#90, #115). Standalone
fuzz/ Cargo workspace with cargo-fuzz; nightly CI matrix runs
parse_record (full reader) and roundtrip_binary (reader →
writer → reader coupling) for 5 minutes daily at 03:00 UTC.
Findings are not a PR gate; reproducers get copied into
tests/data/fuzz-regressions/ to run on every PR. Triage playbook
in docs/contributing/fuzzing.md.
docs extra in pyproject.toml for mkdocs site dependencies;
CI builds via uv sync --extra docs --no-install-project so the
docs-only job doesn't need a Rust toolchain.

Changed

Shared ISO 2709 parsing primitives + generic skeleton
(#125). src/iso2709.rs
owns the leader read, truncation-aware record-data read,
single-entry directory parsing, ASCII numeric helpers, and the
control-field-tag predicate. New iso2709_skeleton::Iso2709Builder
trait + parse_iso2709_record<R, B> skeleton drives one record's
parse end-to-end; each reader implements Iso2709Builder via a
small adapter (~60 lines each) so read_record collapses to a
one-line dispatch. ~200 lines of near-duplicate read_record body
across three readers replaced by one shared implementation. New
recovery::RecoveryCap struct consolidates the per-stream cap
state machine that had been duplicated. Behavior changes that
fall out of the unification: AuthorityMarcReader now treats
too-short data fields as a strict-Err / lenient-skip event with
cap accounting (was silently skipped in all modes);
HoldingsMarcReader's field-extends-beyond-data lenient branch now
salvages a clamped slice (matching bib); the holdings "field
exceeds data" error message changed wording to match the other two
readers. The skeleton is <R: Read, B: Iso2709Builder> with no
dyn dispatch so trait calls monomorphize and inline at every
call site, preserving hot-loop characteristics. Per-type quirks
(authority's tag UTF-8 strictness + trailing 0x1F trim, holdings'
strict UTF-8 on control fields) preserved via trait-method
overrides; the wider strict-vs-lossy unification is tracked in
bd-bov7.
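The shape of a generic skeleton with static dispatch can be sketched as follows. The trait below is hypothetical: the real Iso2709Builder methods are not listed in these notes, and this driver ignores recovery modes and treats a partial leader the same as a clean end of input.

```rust
use std::io::{self, Read};

/// Hypothetical per-reader hooks (not the real `Iso2709Builder`).
trait BuilderSketch {
    type Record;
    /// Receives the 24-byte leader.
    fn start(&mut self, leader: [u8; 24]);
    /// Receives the rest of the record (directory plus data).
    fn body(&mut self, bytes: &[u8]);
    /// Produces the finished record.
    fn finish(self) -> Self::Record;
}

/// Shared driver for one record. Generic over `R` and `B` with no `dyn`,
/// so the compiler emits a specialised copy per reader/builder pair.
fn parse_record_sketch<R: Read, B: BuilderSketch>(
    reader: &mut R,
    mut builder: B,
) -> io::Result<Option<B::Record>> {
    let mut leader = [0u8; 24];
    match reader.read_exact(&mut leader) {
        Ok(()) => {}
        // Simplification: any short read here counts as end of input.
        Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => return Ok(None),
        Err(e) => return Err(e),
    }
    // Leader bytes 0..5 hold the total record length as ASCII digits.
    let total = std::str::from_utf8(&leader[..5])
        .ok()
        .and_then(|s| s.parse::<usize>().ok())
        .filter(|&n| n >= 24)
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidData, "bad record length"))?;
    let mut rest = vec![0u8; total - 24];
    reader.read_exact(&mut rest)?;
    builder.start(leader);
    builder.body(&rest);
    Ok(Some(builder.finish()))
}

/// Example adapter: reports how many body bytes it saw.
struct CountingBuilder {
    seen: usize,
}

impl BuilderSketch for CountingBuilder {
    type Record = usize;
    fn start(&mut self, _leader: [u8; 24]) {}
    fn body(&mut self, bytes: &[u8]) {
        self.seen = bytes.len();
    }
    fn finish(self) -> usize {
        self.seen
    }
}
```

Each concrete reader would supply its own adapter, and per-type quirks would live in the adapter's method bodies rather than in the shared driver.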
HoldingsMarcReader::with_max_errors is now active. It originally
landed inert when the cap was introduced (no recovery sites in the
holdings path); the recovery sites added in this cycle hook into it.
MarcError source-error chain now walks correctly for IoError,
XmlError, and JsonError (the chain was previously empty). Code that
relied on Error::source() returning None may need updating.
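Walking a source chain uses only the standard std::error::Error trait. In this sketch the wrapper type and its fields are stand-ins for a wrapping variant such as IoError; the crate's real definitions may differ.

```rust
use std::error::Error;
use std::fmt;
use std::io;

/// Stand-in wrapper around an I/O failure (hypothetical fields).
#[derive(Debug)]
struct IoWrapper {
    source_name: String,
    cause: io::Error,
}

impl fmt::Display for IoWrapper {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "I/O failure while reading {}", self.source_name)
    }
}

impl Error for IoWrapper {
    /// Exposing the inner error is what lets callers walk the chain.
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.cause)
    }
}

/// Collects the Display text of `err` and of every error beneath it.
fn chain_messages(err: &(dyn Error + 'static)) -> Vec<String> {
    let mut out = Vec::new();
    let mut current = Some(err);
    while let Some(e) = current {
        out.push(e.to_string());
        current = e.source();
    }
    out
}
```

Code that previously treated a missing source as the normal case should now expect at least one inner error for the wrapping variants.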
Pinned Rust toolchain to 1.95.0 (#96, #97) via rust-toolchain.toml.
Library MSRV (Cargo.toml → rust-version = "1.71") is unchanged.
CI: skip workflows on docs-only changes. Added paths-ignore
for **.md, docs/**, mkdocs.yml, LICENSE, and .gitignore to
lint, test, build, python-build, benchmark-python, and
benchmark-rust workflows. A docs-only push or PR previously fired
~47 jobs across the six workflows; it now fires zero. Mixed PRs
(code + docs) still run normally — paths-ignore skips only when
every changed path matches.
.cargo/check.sh builds the docs site (mkdocs build) in full
mode, surfacing broken cross-links pre-push. Skipped under --quick.
Requires the docs extra (uv sync --all-extras).
mkdocs warnings cleanup. Excluded docs/history/ (archival per
CLAUDE.md) from the published site, removed its nav entry, and
fixed broken cross-links and a stale anchor in active docs. Build
warnings down from 18 to 4.

Fixed

Python typing fidelity improvements. Stub gaps in mrrc/_mrrc.pyi
closed (parse_batch_parallel / parse_batch_parallel_limited
signatures, Field.delete_subfield, Record.to_marc21,
module-level __version__) and several wrapper-side narrowing
issues fixed. mypy mrrc/ and pyright mrrc/ now both report zero
errors and run in .cargo/check.sh full mode. Type-only — no
runtime behavior change.
MARCXML reader: missing XML 1.1 §2.11 end-of-line normalization
in text and CDATA content (#112). Switched both
arms of read_leaf_text from decode() to xml_content() so
CR / CRLF / NEL / LSEP normalize to LF per spec. Domain impact is
small (MARC field content rarely carries line separators) but the
divergence from spec was real.
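For reference, the XML 1.1 §2.11 rules translate CR LF, CR NEL, a lone NEL (U+0085), LSEP (U+2028), and a lone CR each into a single LF. The function below is a standalone sketch of those rules, not the implementation behind xml_content(); its name is made up for this example.

```rust
/// Applies the XML 1.1 section 2.11 line-end rules to `input`.
fn normalize_eol(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut chars = input.chars().peekable();
    while let Some(c) = chars.next() {
        match c {
            '\r' => {
                // Swallow the second half of a CR LF or CR NEL pair.
                if matches!(chars.peek(), Some(&'\n') | Some(&'\u{85}')) {
                    chars.next();
                }
                out.push('\n');
            }
            // Lone NEL and LSEP each become one LF.
            '\u{85}' | '\u{2028}' => out.push('\n'),
            other => out.push(other),
        }
    }
    out
}
```

Two consecutive CRs stay two line breaks, since only the pairs CR LF and CR NEL collapse into one.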
Read-path performance regression from structured-error refactor
(#117). The ParseContext refactor stopped the compiler from inlining
parse_data_field, costing ~15-17% on read hot paths vs v0.7.6
(investigated in #116). Restored to
within +9-11% of baseline via #[inline(always)] on
parse_data_field paired with shrinking
ParseContext::current_field_tag to Option<[u8; 3]>. The
compact context is what lets forced inlining avoid ballooning L1-i
cache usage on parallel workloads.
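The size argument can be checked directly with std::mem::size_of. In this sketch Option<String> is used only as a convenient heap-backed comparison; the notes do not say what the field's earlier type was, and the helper function is hypothetical.

```rust
use std::mem::size_of;

/// Compact tag slot as described in the notes: three raw bytes, no heap.
type CompactTag = Option<[u8; 3]>;

/// Heap-backed alternative, used here only for comparison (assumption).
type HeapTag = Option<String>;

/// Hypothetical hot-path helper with forced inlining. MARC control fields
/// are the 00X tags, so the check is a two-byte prefix match.
#[inline(always)]
fn tag_is_control(tag: CompactTag) -> bool {
    matches!(tag, Some([b'0', b'0', _]))
}

/// Returns (compact size, heap-backed size) in bytes for the current target.
fn tag_sizes() -> (usize, usize) {
    (size_of::<CompactTag>(), size_of::<HeapTag>())
}
```

A smaller context struct means less data to copy or keep live when the callee is inlined into every call site, which fits the reasoning given above.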
CI: Clippy collapsible_match errors in
src/bibframe/converter.rs after Rust stable advanced to 1.95.0
(#96, #97).
CI: ASAN job failed with -Zbuild-std on stable; the ASAN step now
sets RUSTUP_TOOLCHAIN=nightly (#105).
CI: Miri job could not run insta snapshot tests; setting
INSTA_WORKSPACE_ROOT skips the cargo metadata spawn (#106).
CI: docs deploy broke when #130 switched the install line to build
mrrc itself; now uses uv sync --extra docs --no-install-project
(#131).

Dependencies

Bump ruff from 0.15.11 to 0.15.12
Bump mypy from 1.20.1 to 1.20.2
Bump pyright from 1.1.408 to 1.1.409
