A test harness for the mrrc MARC record processor. Discovers bugs that curated unit test fixtures miss by running mrrc against real-world MARC data at scale.
The testbed exercises mrrc across two dimensions:
- Rust core tests: stress testing, malformed record handling, encoding roundtrips, concurrency, and edge case discovery against large datasets
- Python binding tests: pymarc API compatibility, encoding through bindings, iteration at scale
Two test modes control which data is used:
- CI mode (default): Runs against committed fixture records only (~680 KB in data/fixtures/). Fast, deterministic, no downloads required.
- Local mode (MRRC_TEST_MODE=local): Runs against downloaded public datasets and optional bring-your-own data. Thorough, may take minutes.
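
As a rough illustration of the mode switch, the sketch below reads MRRC_TEST_MODE and falls back to CI mode. The function name `resolve_test_mode` is hypothetical, and treating the value case-insensitively (and treating any unrecognized value as CI mode) is an assumption, not documented testbed behavior.

```python
import os


def resolve_test_mode(env=None):
    """Return "local" when MRRC_TEST_MODE=local is set, otherwise "ci".

    Hypothetical helper: the case-insensitive comparison and the fallback
    to "ci" for unknown values are assumptions made for this sketch.
    """
    env = os.environ if env is None else env
    value = env.get("MRRC_TEST_MODE", "").strip().lower()
    return "local" if value == "local" else "ci"
```

Passing the environment as an argument keeps the helper easy to test without mutating `os.environ`.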
git clone https://github.com/dchud/mrrc-testbed.git
cd mrrc-testbed
just setup # cargo build, uv sync, copy .env.example -> .env

Prerequisites: Rust (current edition), Python 3.13+, uv, just. Optional: gh (for just report).
just test # run all Rust + Python tests
just test-rust # Rust only
just test-python # Python only

just test-local # all suites against downloaded data
just test-stress # stress tests only (with verbose output)
just bench # same as test-stress (alias)

Local-mode tests are marked #[ignore] in Rust and @pytest.mark.local in Python. They are automatically included when running just test-local.
Optional pre-commit and pre-push hooks gate commits on lint/format and pushes on tests. Test failures are checked against state/known-failures.yaml: known upstream mrrc failures pass through, and only unexpected failures block.
just install-hooks # symlink hooks into .git/hooks/
just uninstall-hooks # remove the symlinks

The known-failures file is scoped by mrrc_source (released, local, or any), so failures tied to a released mrrc version won't suppress the same test when running against a local checkout with the fix.
just known-failures # show current allow-list
just check-known-failures # audit for stale entries
just update-known-failures # auto-remove stale entries
just add-known-failure cargo test_name "mrrc #42 - description"

CI enforces the same known-failure checks regardless of hook installation.
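
The scoping rule can be sketched as a small filter. This is only an illustration: the real schema of state/known-failures.yaml is not shown here, so the entry keys `runner`, `test_id`, and `mrrc_source` and both function names are assumptions.

```python
def is_known_failure(entries, runner, test_id, mrrc_source):
    """Return True if an allow-list entry covers this failure.

    An entry applies when its runner and test id match and its scope is
    either "any" or the mrrc source currently in use ("released"/"local").
    The dictionary keys are hypothetical, chosen for this sketch.
    """
    for entry in entries:
        if entry["runner"] != runner or entry["test_id"] != test_id:
            continue
        if entry["mrrc_source"] in ("any", mrrc_source):
            return True
    return False


def unexpected_failures(failed, entries, mrrc_source):
    """Keep only the (runner, test_id) failures that should block."""
    return [
        (runner, test_id)
        for runner, test_id in failed
        if not is_known_failure(entries, runner, test_id, mrrc_source)
    ]
```

With an entry scoped to `released`, the same failing test is tolerated against the released mrrc but reported as unexpected against a local checkout, which is the behavior described above.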
just download watson # ~20 MB, 11 files, good starting point
just download ia_lendable # ~129 MB, Internet Archive lendable books
just download-verify # check integrity of all downloaded datasets

Available datasets:
| Name | Size | Description |
|---|---|---|
| watson | ~20 MB | Watson MARC test collection (11 .mrc files) |
| ia_lendable | ~129 MB | Internet Archive lendable books metadata |
| loc_books | ~15 GB | Library of Congress Books All (deferred) |
| loc_names | ~1.5 GB | Library of Congress Name Authority File |
| loc_subjects | ~1 GB | Library of Congress Subject Authority File |
Downloads go to data/downloads/ (gitignored).
Set environment variables in .env to point at your own MARC files:
# Override a specific dataset name
MRRC_WATSON=/path/to/my/watson.mrc
# Or set a local directory (subdirectories should match dataset names)
MRRC_LOCAL_DIR=/path/to/my/marc/data
# Or point to a single local file
MRRC_LOCAL_DATASET=/path/to/any/file.mrc

The dataset priority cascade in local mode is: env override -> local path -> downloads -> fixtures.
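
The cascade can be pictured as a first-match lookup over candidate paths. The sketch below is not the testbed's resolver: the function name, the `MRRC_<NAME>` pattern for dataset overrides other than MRRC_WATSON, and the `data/downloads/<name>` and `data/fixtures/<name>` layout are assumptions. MRRC_LOCAL_DATASET is left out because the source does not say where it sits in the order.

```python
import os
from pathlib import Path


def resolve_dataset(name, env=None, root=Path("."), exists=Path.exists):
    """Return the first existing candidate path for a dataset, or None.

    Order follows the documented cascade:
    env override -> local path -> downloads -> fixtures.
    Directory layout and env var naming are assumptions for this sketch.
    """
    env = os.environ if env is None else env
    candidates = []

    # 1. Per-dataset override, e.g. MRRC_WATSON=/path/to/my/watson.mrc
    override = env.get(f"MRRC_{name.upper()}")
    if override:
        candidates.append(Path(override))

    # 2. Local directory whose subdirectories match dataset names
    local_dir = env.get("MRRC_LOCAL_DIR")
    if local_dir:
        candidates.append(Path(local_dir) / name)

    # 3. Downloaded public data, then 4. committed fixtures
    candidates.append(root / "data" / "downloads" / name)
    candidates.append(root / "data" / "fixtures" / name)

    for candidate in candidates:
        if exists(candidate):
            return candidate
    return None
```

The `exists` parameter is there only so the lookup order can be exercised without touching the filesystem.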
Local-mode tests automatically scan for parsing errors and unusual records. The results flow through a two-stage pipeline:
Stage 1: Test output (ephemeral)
Tests write JSON to results/discoveries/ (gitignored). This happens automatically during just test-local.
Stage 2: Import to state (persistent)
just import # deduplicate and convert to YAML in state/
just discoveries # list all discoveries
just show disc-ia-20260226-0001 # view details of a specific discovery

The import step deduplicates by SHA-256 hash of the raw record bytes, so re-running tests and re-importing is safe.
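
A minimal sketch of hash-based deduplication shows why re-importing is idempotent. The function name and the in-memory `seen` set are illustrative assumptions; the actual import script presumably checks hashes already stored in state/.

```python
import hashlib


def import_records(records, seen):
    """Split raw records into new ones and a count of duplicates.

    `seen` is a set of hex digests from earlier imports; it is updated
    in place so a second import of the same data adds nothing.
    """
    new, duplicates = [], 0
    for raw in records:
        digest = hashlib.sha256(raw).hexdigest()
        if digest in seen:
            duplicates += 1
            continue
        seen.add(digest)
        new.append((digest, raw))
    return new, duplicates
```

Because the key is the hash of the raw bytes, the same record found by two different test runs, or in two datasets, collapses to one discovery.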
When the testbed discovers a bug, this is the full cycle from discovery through fix verification to permanent regression test.
After a local-mode test run, import and review discoveries:
just import
just discoveries

Output:
ID Date Category Dataset Control#
disc-ia-20260226-0042 2026-02-26 truncated_record ia_lendable ocm12345678
disc-2026-02-27-001 2026-02-27 malformed_record ia_lendable unknown
...
Each row is a distinct problematic record. View full details with:
just show disc-ia-20260226-0042

The discovery YAML includes the error message, source dataset, byte offset, and path to an extracted copy of the record in state/records/.
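
To show how a byte offset is enough to pull one record out of a large file, here is a sketch that relies on the MARC leader: its first five bytes hold the total record length as ASCII digits. This is not how mrrc or the testbed extracts records; `extract_record` is a hypothetical helper.

```python
def extract_record(data: bytes, offset: int) -> bytes:
    """Return the raw MARC record starting at `offset` in `data`.

    Reads the 5-digit record length from the leader (positions 00-04)
    and slices that many bytes. Raises ValueError if the length field is
    not numeric or the data ends before the declared length.
    """
    length_field = data[offset:offset + 5]
    if len(length_field) < 5 or not length_field.isdigit():
        raise ValueError(f"no valid record length at offset {offset}")
    declared = int(length_field)
    record = data[offset:offset + declared]
    if len(record) < declared:
        raise ValueError(
            f"truncated record: expected {declared} bytes, got {len(record)}"
        )
    return record
```

A caller could write the returned bytes to a standalone .mrc file, which is roughly what an extracted copy in state/records/ represents.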
just report disc-ia-20260226-0042

This creates a GitHub issue on dchud/mrrc with the error details, source dataset, reproduction info, and a link back to the testbed discovery. Copy the issue URL for the promote step later.
Once there's a fix in your local mrrc checkout (any branch):
just use-local-mrrc ../mrrc

This patches both the Rust and Python dependencies and confirms the switch:
mrrc source: local (/Users/you/mrrc)
Rust: mrrc v0.7.3
Python: mrrc 0.7.3 (/Users/you/mrrc)
Check the current state at any time with:
just mrrc-status

just promote disc-ia-20260226-0042
just test

The promote step copies the extracted record into data/fixtures/edge_cases/, making it a permanent regression test. CI tests validate that all fixtures parse cleanly, so if the fix works, the tests pass.
Optionally link the mrrc issue for provenance:
just promote disc-ia-20260226-0042 edge_cases --issue=https://github.com/dchud/mrrc/issues/42

After the fix ships in a new mrrc release, update the version pins in Cargo.toml and pyproject.toml, then switch back:
just use-released-mrrc
just test

The promoted fixture stays permanently as a regression test.
| Recipe | Description |
|---|---|
| just setup | Build Rust, install Python deps, create .env |
| just test | CI-mode tests (Rust + Python, fixtures only) |
| just test-local | Local-mode tests (all datasets) |
| just test-rust | Rust CI-mode tests only |
| just test-python | Python CI-mode tests only |
| just test-stress | Stress tests with verbose output |
| just bench | Alias for test-stress |
| just lint | Check formatting and linting (cargo fmt, clippy, ruff) |
| just fmt | Auto-fix formatting |
| just download NAME | Download a specific dataset |
| just download-verify | Verify all downloaded datasets |
| just validate | Validate committed fixtures and manifests |
| just import | Import test results to persistent state |
| just discoveries | List all discoveries |
| just show ID | Show details of a specific discovery |
| just promote ID [FIXTURE] | Promote discovery to fixture (default: edge_cases) |
| just report ID | File an mrrc issue from a discovery (requires gh) |
| just use-local-mrrc [PATH] | Point testbed at a local mrrc checkout |
| just use-released-mrrc | Revert to released mrrc from crates.io / PyPI |
| just mrrc-status | Show which mrrc version is active |
| just install-hooks | Install git hooks (pre-commit lint, pre-push tests) |
| just uninstall-hooks | Remove git hooks |
| just known-failures | Show current known-failures allow-list |
| just check-known-failures | Audit for stale/unexpected known failures |
| just update-known-failures | Auto-remove stale known-failure entries |
| just add-known-failure RUNNER TEST_ID REASON | Add a known failure entry |
Each discovery in state/discoveries/ is a YAML file:
discovery_id: disc-ia-20260226-0001
discovered_at: '2026-02-26T23:38:23'
discovered_in_run: run-2026-02-26-001
mrrc_version: 0.1.0
test_suite: ia_lendable_discovery
test_name: full_scan
record:
  sha256: f334f844...
  control_number: 8087primer00palm
  source_dataset: ia_lendable
  source_offset: 1039123
  extracted_file: state/records/ia_lendable_0001.mrc
error:
  category: truncated_record
  message: 'Invalid record: Truncated record: expected 930 bytes, got 930'
  mrrc_error: 'Invalid record: Truncated record: expected 930 bytes, got 930'

Each import creates a run record in state/runs/:
run_id: run-2026-02-26-001
started_at: '2026-02-27T04:41:01'
completed_at: '2026-02-27T04:41:01'
environment:
  mrrc_version: 0.1.0
results:
  total_records: 233
  new_discoveries: 233
  duplicates_skipped: 0
discovery_ids:
- disc-ia-20260226-0001
- disc-ia-20260226-0002
# ...

Each fixture directory contains a manifest.json with provenance for every committed record:
[
  {
    "control_number": "2004436158",
    "source": "LOC Catalog SRU",
    "query": "bath.title=\"history\" and dc.date=\"2003\"",
    "retrieved_at": "2026-02-27T00:15:42",
    "record_index": 0,
    "sha256": "a1b2c3d4..."
  }
]

Validate fixture integrity with just validate.
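
One plausible form of the integrity check is to recompute each record's hash and compare it with the manifest. The sketch below assumes that `sha256` is the full hex digest of the raw record bytes and that `record_index` is the record's position within the fixture file. Neither is confirmed by the source, and `check_manifest` is a hypothetical name rather than what just validate runs.

```python
import hashlib
import json


def check_manifest(manifest_json: str, records: list[bytes]) -> list[str]:
    """Return a list of problems found; an empty list means all entries match.

    `records` holds the raw bytes of each record in the fixture file,
    in file order, so `record_index` can be used as a list index.
    """
    problems = []
    for entry in json.loads(manifest_json):
        index = entry["record_index"]
        label = entry.get("control_number", "?")
        if index >= len(records):
            problems.append(f"{label}: record_index {index} out of range")
            continue
        actual = hashlib.sha256(records[index]).hexdigest()
        if actual != entry["sha256"]:
            problems.append(f"{label}: sha256 mismatch at index {index}")
    return problems
```

Returning a list of problems instead of raising on the first one lets a single run report every mismatched fixture.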
crates/mrrc_testbed/ Rust test harness (lib + integration tests)
src/mrrc_testbed/ Python package (config, datasets, state, discovery)
suites/ Python test suites (pymarc compat, encoding, discovery)
scripts/ CLI tools (download, validate, import, curate)
data/fixtures/ Committed test records with manifest.json provenance
data/downloads/ Gitignored large public datasets
data/local/ Gitignored BYOD data
state/ Discovery and run YAML files (committed)
results/ Gitignored per-run output