v0.8.0

Released by github-actions on 29 Apr 05:05 (commit 9ef496f)

Breaking

  • MarcError variants restructured to struct form carrying
    positional metadata (record_index, byte_offset,
    record_byte_offset, record_control_number, field_tag,
    indicator_position, subfield_code, found, expected,
    source_name). Wrapping variants (IoError, XmlError, JsonError)
    chain via #[source] so Error::source() walks correctly; Display
    output changed for every variant. Direct constructors and pattern
    matches need updating. Migration: match on .code() ("E001" through
    "E099", "E101"+) instead of variant names; codes are stable
    across future enum changes.
  • MarcError::InvalidRecord removed. Use the new specific variants
    (DirectoryInvalid, RecordLengthInvalid, …) or InvalidField as
    the fall-through.
  • MarcError::ParseError removed. Same migration as
    InvalidRecord: the new specific variants and InvalidField
    cover the cases that ParseError used to wrap. Match on .code()
    for stable identity.
  • recovery::RecoveryContext removed from the public re-exports.
    No external consumer; the type was an internal helper that
    accumulated messages and dropped them. Pattern-match on
    MarcError instead.
  • recovery::try_recover_record takes a &ParseContext as a
    fifth parameter. The previous v0.7 signature constructed
    RecoveryContext internally; v0.8 callers pass the
    ParseContext that carries record-index / byte-offset state.
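
The migration advice above (match on .code(), and rely on Error::source() for wrapped errors) can be sketched roughly as follows. This is a minimal illustration, not mrrc's actual definition: MarcErrorSketch, its two variants' field sets, should_abort, and chain are hypothetical stand-ins, and "E000" is a placeholder rather than a real mrrc code. Only the E099 / FatalReaderError pairing comes from these notes.

```rust
use std::error::Error;
use std::fmt;
use std::io;

/// Hypothetical stand-in for the real enum: struct-form variants that
/// carry positional metadata (only a small, assumed subset of fields).
#[derive(Debug)]
enum MarcErrorSketch {
    IoError { source: io::Error, record_index: usize },
    FatalReaderError { record_index: usize },
}

impl MarcErrorSketch {
    /// "E099" for FatalReaderError is taken from the release notes.
    /// "E000" is a placeholder, NOT a real mrrc code.
    fn code(&self) -> &'static str {
        match self {
            Self::IoError { .. } => "E000",
            Self::FatalReaderError { .. } => "E099",
        }
    }
}

impl fmt::Display for MarcErrorSketch {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::IoError { record_index, .. } => {
                write!(f, "I/O error at record {record_index}")
            }
            Self::FatalReaderError { record_index } => {
                write!(f, "error cap reached at record {record_index}")
            }
        }
    }
}

impl Error for MarcErrorSketch {
    /// Wrapping variants expose the underlying error; others have no source.
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            Self::IoError { source, .. } => Some(source),
            Self::FatalReaderError { .. } => None,
        }
    }
}

/// Dispatch on the stable code string instead of the variant name.
fn should_abort(err: &MarcErrorSketch) -> bool {
    err.code() == "E099"
}

/// Collect the Display text of an error and every source beneath it.
fn chain(err: &(dyn Error + 'static)) -> Vec<String> {
    let mut out = vec![err.to_string()];
    let mut current = err.source();
    while let Some(inner) = current {
        out.push(inner.to_string());
        current = inner.source();
    }
    out
}
```

Code written this way keeps compiling if variants are renamed or split again, because the only thing it depends on is the code string. Callers that previously assumed source() was always None would now see the wrapped I/O, XML, or JSON error as the second entry of the chain.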

Added

  • Structured positional context on every parse error — record
    index, absolute and record-relative byte offsets, 001 control number,
    field tag (and indicator position or subfield code where applicable),
    offending bytes (capped at 32), and source filename. New
    MarcError::detailed() multi-line diagnostic includes a hex-dump of
    the byte window around the error offset; the default Display is a
    one-liner. Both produce byte-for-byte identical output across Rust
    and Python via matching _format() / detailed() methods on every
    Python exception class.
  • Typed Python exception subclasses for every MarcError variant
    (e.g. InvalidIndicator, BadSubfieldCode, TruncatedRecord,
    XmlError, JsonError, WriterError, FatalReaderError)
    extending the closest pymarc-named parent so existing pymarc-style
    except clauses keep catching the same conditions. Each carries
    positional kwargs, supports pickle round-trip, and survives the
    PyO3 boundary with all attributes intact.
  • Stable error codes (E001 through E007, E099, E101, E105,
    E106, E201, E202, E301, E401, E402, E404) on every
    variant via MarcError::code() / slug(), with matching code /
    slug / help_url() on every Python class. Codes never get
    renumbered (policy in CONTRIBUTING.md); MRRC_DOCS_BASE_URL
    overrides the help-URL host. All codes documented in
    docs/reference/error-codes.md.
  • Structured error serialization via to_dict() / to_json() on
    every Python exception (suitable for ELK / Datadog / Splunk) and
    MarcError::to_json_value() / to_json() on the Rust side; bytes
    fields hex-encoded, _cause flattens the exception chain,
    schema_version: 1 for forward-compat. New BytesNear public
    struct exposes the captured byte window.
  • with_source() / from_path() builder methods on MarcReader,
    AuthorityMarcReader, and HoldingsMarcReader. from_path populates
    the source name from the file path so emitted errors carry the
    filename.
  • Per-stream recovered-error cap (#110) via with_max_errors(n) on
    all three readers. In Lenient / Permissive modes the reader counts
    each recovered failure and halts with MarcError::FatalReaderError
    (E099) once the cap (default 10000) trips. Pass 0 to disable.
  • Per-field lenient/permissive recovery parity across all three
    ISO 2709 readers (#121, #122).
    AuthorityMarcReader and HoldingsMarcReader now honor the same
    per-field error sites the bibliographic reader does (bad
    field-length / start-position digits, field-extends-past-data,
    data-field-too-short-for-indicators, parse_data_field failure).
    Strict-mode behavior unchanged at every site; lenient/permissive
    mode skips the offending entry, counts the recovery against the
    per-stream cap, and continues. Truncated-record dispatch in
    lenient/permissive mode no longer returns a hard error on a short
    read; instead it notes the recovery and falls through to
    best-effort directory parsing.
  • 8 property tests (tests/properties.rs) covering binary,
    MARCXML, and MARCJSON round-trips plus four ISO 2709 structural
    invariants (leader length, directory tiling, indicator byte set,
    subfield code shape). ProptestConfig { cases: 64 }; full suite
    ~3s locally. Primer at docs/contributing/formal-methods.md
    (#111).
  • Coverage-guided fuzzing (#90, #115). Standalone
    fuzz/ Cargo workspace with cargo-fuzz; nightly CI matrix runs
    parse_record (full reader) and roundtrip_binary (reader →
    writer → reader coupling) for 5 minutes daily at 03:00 UTC.
    Findings are not a PR gate; reproducers get copied into
    tests/data/fuzz-regressions/ to run on every PR. Triage playbook
    in docs/contributing/fuzzing.md.
  • docs extra in pyproject.toml for mkdocs site dependencies;
    CI builds via uv sync --extra docs --no-install-project so the
    docs-only job doesn't need a Rust toolchain.
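
The per-stream recovered-error cap described above behaves roughly like the counter below. This is a sketch under stated assumptions, not the library's code: CapSketch, CapOutcome, and read_lenient are hypothetical names, and the choice to trip once the count exceeds the cap (rather than when it reaches it) is an assumption. The default of 10000, the "0 disables" rule, and the E099 code are taken from the notes.

```rust
/// Hypothetical stand-in for the per-stream cap state.
#[derive(Debug)]
struct CapSketch {
    max_errors: usize, // 0 means "no cap"
    recovered: usize,
}

#[derive(Debug, PartialEq)]
enum CapOutcome {
    Continue,
    Fatal { recovered: usize },
}

impl CapSketch {
    /// Default cap stated in the release notes.
    const DEFAULT_MAX: usize = 10_000;

    fn new() -> Self {
        Self::with_max_errors(Self::DEFAULT_MAX)
    }

    fn with_max_errors(max_errors: usize) -> Self {
        Self { max_errors, recovered: 0 }
    }

    /// Count one recovered failure and report whether the stream must halt.
    /// Assumption: the cap trips when the count exceeds max_errors.
    fn note_recovery(&mut self) -> CapOutcome {
        self.recovered += 1;
        if self.max_errors != 0 && self.recovered > self.max_errors {
            CapOutcome::Fatal { recovered: self.recovered }
        } else {
            CapOutcome::Continue
        }
    }
}

/// Toy lenient reader: Ok items are kept, Err items are skipped and
/// counted against the cap; the stream halts once the cap trips.
fn read_lenient<'a>(
    items: &[Result<&'a str, &'a str>],
    max_errors: usize,
) -> Result<Vec<&'a str>, String> {
    let mut cap = CapSketch::with_max_errors(max_errors);
    let mut good = Vec::new();
    for item in items {
        match item {
            Ok(record) => good.push(*record),
            Err(_) => {
                if let CapOutcome::Fatal { recovered } = cap.note_recovery() {
                    return Err(format!(
                        "E099: {recovered} recovered errors exceed cap {max_errors}"
                    ));
                }
            }
        }
    }
    Ok(good)
}
```

With two bad entries in the input, a cap of 2 lets the stream finish, a cap of 1 halts it at the second bad entry, and a cap of 0 never halts.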

Changed

  • Shared ISO 2709 parsing primitives + generic skeleton
    (#125). src/iso2709.rs
    owns the leader read, truncation-aware record-data read,
    single-entry directory parsing, ASCII numeric helpers, and the
    control-field-tag predicate. New iso2709_skeleton::Iso2709Builder
    trait + parse_iso2709_record<R, B> skeleton drives one record's
    parse end-to-end; each reader implements Iso2709Builder via a
    small adapter (~60 lines each) so read_record collapses to a
    one-line dispatch. ~200 lines of near-duplicate read_record body
    across three readers replaced by one shared implementation. New
    recovery::RecoveryCap struct consolidates the per-stream cap
    state machine that had been duplicated. Behavior changes that
    fall out of the unification: AuthorityMarcReader now treats
    too-short data fields as a strict-Err / lenient-skip event with
    cap accounting (was silently skipped in all modes);
    HoldingsMarcReader's field-extends-beyond-data lenient branch now
    salvages a clamped slice (matching bib); the holdings "field
    exceeds data" error message changed wording to match the other two
    readers. The skeleton is <R: Read, B: Iso2709Builder> with no
    dyn dispatch so trait calls monomorphize and inline at every
    call site, preserving hot-loop characteristics. Per-type quirks
    (authority's tag UTF-8 strictness + trailing 0x1F trim, holdings'
    strict UTF-8 on control fields) preserved via trait-method
    overrides; the wider strict-vs-lossy unification is tracked in
    bd-bov7.
  • HoldingsMarcReader::with_max_errors is now active. Originally
    landed inert when the cap was introduced (no recovery sites in the
    holdings path); the recovery sites added in this cycle hook into it.
  • MarcError source-error chain walks correctly for IoError,
    XmlError, JsonError (was previously empty). Code that was
    checking Error::source() == None may need updating.
  • Pinned Rust toolchain to 1.95.0 (#96, #97) via
    rust-toolchain.toml. Library MSRV
    (Cargo.toml rust-version = "1.71") is unchanged.
  • CI: skip workflows on docs-only changes. Added paths-ignore
    for **.md, docs/**, mkdocs.yml, LICENSE, and .gitignore to
    lint, test, build, python-build, benchmark-python, and
    benchmark-rust workflows. A docs-only push or PR previously fired
    ~47 jobs across the six workflows; it now fires zero. Mixed PRs
    (code + docs) still run normally — paths-ignore skips only when
    every changed path matches.
  • .cargo/check.sh builds the docs site (mkdocs build) in full
    mode, surfacing broken cross-links pre-push. Skipped under --quick.
    Requires the docs extra (uv sync --all-extras).
  • mkdocs warnings cleanup. Excluded docs/history/ (archival per
    CLAUDE.md) from the published site, removed its nav entry, and
    fixed broken cross-links and a stale anchor in active docs. Build
    warnings down from 18 to 4.
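
The shared-skeleton design above (one generic parse routine, one small builder adapter per reader, no dyn dispatch) might look roughly like this. It is a simplified sketch, not the crate's implementation: RecordBuilder, parse_record_sketch, ascii_num, and TagCollector are hypothetical names, the trait methods are assumed, and recovery modes, truncation handling, and positional error context are all omitted. The leader and directory offsets follow the standard ISO 2709 / MARC 21 layout (24-byte leader, 12-byte directory entries).

```rust
use std::io::{self, Read};

/// Hypothetical builder trait: the skeleton reports events, and each
/// reader type decides what to construct from them.
trait RecordBuilder {
    type Output;
    fn leader(&mut self, leader: &[u8; 24]);
    fn field(&mut self, tag: [u8; 3], data: &[u8]);
    fn finish(self) -> Self::Output;
}

/// Parse ASCII decimal digits; None if any byte is not a digit.
fn ascii_num(bytes: &[u8]) -> Option<usize> {
    bytes.iter().try_fold(0usize, |acc, &b| {
        b.is_ascii_digit().then(|| acc * 10 + usize::from(b - b'0'))
    })
}

fn bad(msg: &str) -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, msg.to_string())
}

/// Generic over reader and builder. There is no `dyn`, so every (R, B)
/// pair is monomorphized and the builder calls can be inlined.
fn parse_record_sketch<R: Read, B: RecordBuilder>(
    mut src: R,
    mut builder: B,
) -> io::Result<B::Output> {
    let mut leader = [0u8; 24];
    src.read_exact(&mut leader)?;
    // Leader bytes 0..5: record length; bytes 12..17: base address of data.
    let record_len = ascii_num(&leader[0..5]).ok_or_else(|| bad("record length"))?;
    let base = ascii_num(&leader[12..17]).ok_or_else(|| bad("base address"))?;
    if base < 25 || record_len < base {
        return Err(bad("leader offsets"));
    }
    let mut rest = vec![0u8; record_len - 24];
    src.read_exact(&mut rest)?;
    builder.leader(&leader);

    // The directory runs from the end of the leader to the terminator
    // byte just before the base address; field data starts at the base.
    let directory = &rest[..base - 25];
    let data = &rest[base - 24..];
    for entry in directory.chunks_exact(12) {
        let tag = [entry[0], entry[1], entry[2]];
        let len = ascii_num(&entry[3..7]).ok_or_else(|| bad("field length"))?;
        let start = ascii_num(&entry[7..12]).ok_or_else(|| bad("start position"))?;
        let end = start
            .checked_add(len)
            .filter(|&e| e <= data.len())
            .ok_or_else(|| bad("field extends past data"))?;
        builder.field(tag, &data[start..end]);
    }
    Ok(builder.finish())
}

/// Example adapter: collects the tags of all fields in directory order.
struct TagCollector(Vec<String>);

impl RecordBuilder for TagCollector {
    type Output = Vec<String>;
    fn leader(&mut self, _leader: &[u8; 24]) {}
    fn field(&mut self, tag: [u8; 3], _data: &[u8]) {
        self.0.push(String::from_utf8_lossy(&tag).into_owned());
    }
    fn finish(self) -> Vec<String> {
        self.0
    }
}
```

A second reader type would add another small impl of the trait and reuse parse_record_sketch unchanged, which is the duplication the notes describe removing. Per-type quirks would live in the adapter's methods rather than in the shared loop.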

Fixed

  • Python typing fidelity improvements. Stub gaps in mrrc/_mrrc.pyi
    closed (parse_batch_parallel / parse_batch_parallel_limited
    signatures, Field.delete_subfield, Record.to_marc21,
    module-level __version__) and several wrapper-side narrowing
    issues fixed. mypy mrrc/ and pyright mrrc/ now both report zero
    errors and run in .cargo/check.sh full mode. Type-only — no
    runtime behavior change.
  • MARCXML reader: missing XML 1.1 §2.11 end-of-line normalization
    in text and CDATA content (#112). Switched both
    arms of read_leaf_text from decode() to xml_content() so
    CR / CRLF / NEL / LSEP normalize to LF per spec. Domain impact is
    small (MARC field content rarely carries line separators) but the
    divergence from spec was real.
  • Read-path performance regression from structured-error refactor
    (#117). The ParseContext refactor stopped the compiler from inlining
    parse_data_field, costing ~15-17% on read hot paths vs v0.7.6
    (investigated in #116). Restored to
    within +9-11% of baseline via #[inline(always)] on
    parse_data_field paired with shrinking
    ParseContext::current_field_tag to Option<[u8; 3]>. The
    compact context is what lets forced inlining avoid ballooning L1-i
    cache usage on parallel workloads.
  • CI: Clippy collapsible_match errors in
    src/bibframe/converter.rs after Rust stable advanced to 1.95.0
    (#96, #97).
  • CI: ASAN job failed with -Zbuild-std on stable; the workflow now
    sets RUSTUP_TOOLCHAIN=nightly on the ASAN step (#105).
  • CI: Miri job could not run insta snapshot tests; setting
    INSTA_WORKSPACE_ROOT skips the cargo metadata spawn (#106).
  • CI: docs deploy broke when #130
    switched the install line to build mrrc itself; now uses
    uv sync --extra docs --no-install-project
    (#131).
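
For reference, the end-of-line rule from XML 1.1 §2.11 that the MARCXML fix above restores can be written as a small standalone function. This is an illustrative sketch of the rule itself, not the reader's code and not the XML library's implementation; normalize_eol is a hypothetical name.

```rust
/// Normalize line ends per XML 1.1 section 2.11: the pairs CR LF and
/// CR NEL, and the single characters CR, NEL (U+0085) and LSEP (U+2028),
/// each become a single LF.
fn normalize_eol(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut chars = input.chars().peekable();
    while let Some(c) = chars.next() {
        match c {
            '\r' => {
                // A CR absorbs an immediately following LF or NEL.
                if matches!(chars.peek().copied(), Some('\n') | Some('\u{85}')) {
                    chars.next();
                }
                out.push('\n');
            }
            '\u{85}' | '\u{2028}' => out.push('\n'),
            other => out.push(other),
        }
    }
    out
}
```

Text that already uses bare LF passes through unchanged, so applying the function to content that was normalized earlier is harmless.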

Dependencies

  • Bump ruff from 0.15.11 to 0.15.12
  • Bump mypy from 1.20.1 to 1.20.2
  • Bump pyright from 1.1.408 to 1.1.409