OpenDQI

Turn EMIR/SFTR Trade Repository files into deterministic data quality reports and Arrow tables — locally, reproducibly, from your existing data stack.

A local-first Rust engine that ingests both the reports a firm submits to its Trade Repository and the files the TR sends back, then produces reproducible HTML, JSON, CSV, Parquet, and pyarrow.Table outputs. Use it from your terminal (CLI), your browser (local web UI on http://127.0.0.1:7878), or your Python / PyArrow notebook. No network calls. No cloud dependency. Embeds in your own pipeline.

The three things OpenDQI does

Three workflows mirror the three layers of any TR/firm conversation. Each one is one command. Each one writes a deterministic HTML/CSV/JSON triple under --out.

1. TR state health — "what does the TR think I have open?"

opendqi emir tr-state-scan auth107.xml --out ./report/

Ingests the daily auth.107 Trade State Report and surfaces stale valuations, trades past maturity, duplicate active UTIs, valuations dated after termination, and placeholder maturity dates. On the shipped 8-record fixture it produces a score and 16 issues.
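
To make two of these checks concrete (trades past maturity and duplicate active UTIs), here is a minimal sketch in plain Python. It only illustrates the idea and is not OpenDQI's implementation: the record layout and the field names `uti` and `maturity_date` are assumptions made for the example.

```python
from collections import Counter
from datetime import date

def past_maturity(records, as_of):
    """Return UTIs of open trades whose maturity date is before `as_of`."""
    return [r["uti"] for r in records
            if r["maturity_date"] is not None and r["maturity_date"] < as_of]

def duplicate_active_utis(records):
    """Return UTIs that appear more than once among the open trades."""
    counts = Counter(r["uti"] for r in records)
    return sorted(uti for uti, n in counts.items() if n > 1)

# Hypothetical trade-state rows; a real auth.107 carries many more fields.
state = [
    {"uti": "UTI-A", "maturity_date": date(2024, 1, 31)},
    {"uti": "UTI-B", "maturity_date": date(2030, 6, 30)},
    {"uti": "UTI-B", "maturity_date": date(2030, 6, 30)},
]

expired = past_maturity(state, as_of=date(2025, 1, 1))
duplicates = duplicate_active_utis(state)
print(expired, duplicates)
```

Both functions take the as-of date and the records as explicit inputs, so the same input always gives the same findings.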

2. Rejection intelligence — "what's the TR throwing back, and why?"

opendqi emir feedback auth092.xml --store ./history.db --out ./feedback/
opendqi feedback analytics --store ./history.db --regime emir --out ./rejection_profile.yml
opendqi emir scan next-batch.csv --mapping mapping.yml --rejection-profile ./rejection_profile.yml --out ./pre-flight/

auth.092 rejections feed a SQLite history store and an Open/Resolved/Stale workflow, then roll up into a rejection_profile.yml that gates the next submission via the EMIR.PSC.* check family. This closes the post-TR ↔ pre-TR feedback loop.
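
The roll-up step can be pictured as counting past rejections per code. The sketch below is a rough illustration under stated assumptions: the field names `uti` and `code`, the placeholder codes, and the output shape are all hypothetical and do not describe the actual rejection_profile.yml schema.

```python
from collections import Counter

def rejection_profile(rejections, top_n=3):
    """Summarise rejection records as (code, count, share) rows, most frequent first."""
    counts = Counter(r["code"] for r in rejections)
    total = sum(counts.values())
    return [
        {"code": code, "count": n, "share": round(n / total, 3)}
        for code, n in counts.most_common(top_n)
    ]

# Hypothetical rejection rows; the codes are placeholders, not real TR codes.
history = [
    {"uti": "UTI-1", "code": "R-001"},
    {"uti": "UTI-2", "code": "R-001"},
    {"uti": "UTI-3", "code": "R-002"},
    {"uti": "UTI-4", "code": "R-001"},
]

profile = rejection_profile(history)
print(profile)
```

A pre-submission gate could then prioritise the fields behind the most frequent codes when checking the next batch.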

3. Combined audit — "one report for the committee"

opendqi emir tr-audit --tar auth030.xml --tsr auth107.xml --feedback auth092.xml --out ./audit/

Three layers, one HTML, plus 3 cross-layer EMIR.AUD.* coherence checks (rejected-but-outstanding-in-TSR, MODI-without-prior-NEWT, TERM-but-still-outstanding). 251 issues on the shipped 20-record audit fixture.
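
The three coherence checks named above boil down to set operations across the layers. The following is a simplified sketch, not OpenDQI's implementation: the tuple layout, the function name, and the finding labels are assumptions made for illustration.

```python
def cross_layer_findings(activity, rejected_utis, outstanding_utis):
    """Run three set-based coherence checks across activity, feedback and state.

    activity: list of (uti, action_type) tuples in reporting order.
    rejected_utis: UTIs the TR rejected (feedback layer).
    outstanding_utis: UTIs the TR reports as still open (state layer).
    """
    findings = []
    seen_new = set()
    terminated = set()
    for uti, action in activity:
        if action == "NEWT":
            seen_new.add(uti)
        elif action == "MODI" and uti not in seen_new:
            findings.append(("MODI-without-prior-NEWT", uti))
        elif action == "TERM":
            terminated.add(uti)
    for uti in sorted(rejected_utis & outstanding_utis):
        findings.append(("rejected-but-outstanding", uti))
    for uti in sorted(terminated & outstanding_utis):
        findings.append(("TERM-but-still-outstanding", uti))
    return findings

findings = cross_layer_findings(
    activity=[("U1", "NEWT"), ("U1", "MODI"), ("U2", "MODI"),
              ("U3", "NEWT"), ("U3", "TERM")],
    rejected_utis={"U4"},
    outstanding_utis={"U1", "U3", "U4"},
)
print(findings)
```

Sorting the set intersections keeps the output order stable from run to run, which matters for reproducible reports.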

Operator scenarios for each workflow are in docs/use-cases.md. SFTR has the same surface (opendqi sftr ...) minus rejection feedback, because auth.080 is a reconciliation status advice rather than a feedback message.

30-second demo

git clone https://github.com/PauFou/OpenDQI && cd OpenDQI
bash scripts/demo.sh

Runs all three workflows above against the synthetic kit at examples/quickstart-emir/, drops three report.html files under /tmp/opendqi-demo/, and opens the consolidated audit report in your default browser. Builds a debug binary on the first run; subsequent runs are sub-second.

Install

Pick the channel that matches how you work — same engine, same 216 checks, same v1.0 Arrow output contract behind each.

Python (recommended for data teams)

pip install opendqi                # core: pyarrow only
pip install "opendqi[spark]"       # + pyspark>=3.5 (v0.14)
pip install "opendqi[polars]"      # + polars>=0.20 (v0.14)
pip install "opendqi[all]"         # + both (quotes keep zsh from globbing the brackets)

CLI (recommended for ops / one-shot audits)

curl -sSL https://github.com/PauFou/OpenDQI/releases/download/v0.16.0/opendqi-cli-installer.sh | sh

Pre-built binaries for Linux (x86_64, ARM64) and macOS (x86_64, ARM64) are available from the GitHub Releases page.

Rust source (build from a tag)

cargo install --git https://github.com/PauFou/OpenDQI --tag v0.16.0 opendqi-cli

Then 5 lines that work after either install

import opendqi
table = opendqi.emir.parse_xml("auth030.xml")
result = opendqi.emir.scan_table(table, mapping={c: c for c in table.column_names})
print(result.summary)   # dict — same shape as summary.json
print(result.issues)    # pyarrow.Table — v1.0 stable 11-column schema

Or one of the v0.13 cross-message workflows — 3 files, one call, post-TR audit including the 3 EMIR.AUD.* cross-layer coherence checks:

result = opendqi.emir.tr_audit(
    tar="auth030.xml", tsr="auth107.xml", feedback="auth092.xml",
)

Or v0.14 native Spark: partition-friendly mapInPandas that stays distributed across executors:

issues_sdf = opendqi.spark.scan_spark_dataframe(
    spark_df, regime="emir",
    mapping={"uti": "trade_uti"},  # ... remaining column mappings elided
)   # returns a pyspark.sql.DataFrame of issues

Or the Data Quality Pack: 28 regulator-style indicators across EMIR + SFTR (numerator / denominator / rate / threshold / status) with drill-down evidence, with the granular issues produced alongside:

# EMIR — 24 indicators (10 v0.15 + 14 v0.16)
result = opendqi.emir.data_quality_pack(
    tsr="auth107.xml", tar="auth030.xml", feedback="auth092.xml",
)
print(result.indicators.to_pandas())   # 24 rows, v1.0 stable Arrow schema

# SFTR — 4 indicators (v0.16 T2-layer subset, paths-only)
sftr = opendqi.sftr.data_quality_pack(
    tsr="auth079.xml", tar="auth052.xml",
)
print(sftr.indicators.to_pandas())     # 4 rows, same v1.0 schema
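
To show how the numerator / denominator / rate / threshold / status fields relate, here is a small stand-alone sketch. It is not the pack's implementation: the `Indicator` class, the "green" label, the rule that an empty population counts as a 0.0 rate, and the figures are all assumptions for illustration. Only the indicator name DQI_VAL_MISSING and the "red" status come from this README.

```python
from dataclasses import dataclass

@dataclass
class Indicator:
    name: str
    numerator: int
    denominator: int
    threshold: float  # maximum acceptable rate before the indicator turns red

    @property
    def rate(self):
        # Assumption: an empty population is reported as a 0.0 rate.
        return self.numerator / self.denominator if self.denominator else 0.0

    @property
    def status(self):
        return "red" if self.rate > self.threshold else "green"

# Hypothetical figures: 3 of 40 open trades lack a valuation, threshold 5 %.
ind = Indicator("DQI_VAL_MISSING", numerator=3, denominator=40, threshold=0.05)
print(ind.name, ind.rate, ind.status)
```

Keeping the threshold as data rather than code mirrors the point that thresholds are firm-configurable.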

The full Python surface (v0.12 → v0.16) adds 15 entry points: scan_parquet/scan_table/parse_xml × 2 regimes + multi-file scan_directory/scan_files + cross-message tr_audit/collateral_audit/book_reconcile/missing_collateral + native data-platform opendqi.spark.scan_spark_dataframe + opendqi.polars.scan_lazyframe + opendqi.emir.data_quality_pack + opendqi.sftr.data_quality_pack (+ the EXPERIMENTAL opendqi.spark.emir.data_quality_pack wrapper).

More on the Python side: minimal demo in examples/python/quickstart.py · 8 progressively-realistic patterns in examples/python/ · executable Jupyter notebook in examples/python/quickstart.ipynb · full quickstart guide in docs/python.md · DQI spec in docs/data-quality-pack.md · architecture spec in docs/python-roadmap.md.

Coverage at a glance

Check family	EMIR	SFTR	Total
Single-batch DQ checks (6 dimensions)	89	44	133
TR-layer & cross-message (TSR · TAR · MAR · MSR · Recon · Warnings · Missing-collat · Audit · Collateral-audit · Book-vs-TR · Lifecycle · Feedback · Pre-submission)	62	21	83
Live catalog (post-v0.10.0)	151	65	216

12 ISO 20022 messages parsed against the real ESMA XSD subset, gated locally by OPENDQI_XSD_DIR (SWIFT-licensed XSDs never redistributed). Full catalogues: docs/emir-checks.md · docs/sftr-checks.md. Per-message coverage notes: docs/auth-messages/.

Documentation

Get started: docs/use-cases.md (operator scenarios) · examples/quickstart-emir/ (3-file kit) · scripts/demo.sh (one-shot) · docs/python.md (Python quickstart) · examples/python/ (7 scripts + Jupyter notebook).
Data Quality Pack (v0.16): docs/data-quality-pack.md covers the 28 regulator-style indicators (24 EMIR + 4 SFTR) that sit above the 216 granular checks; this is the "committee-readable" view. It also covers the TARGET2 business-day calendar used for stale-data indicators.
Positioning: docs/positioning.md (3-layer mental model).
Per-workflow: docs/tr-state-checks.md · docs/tr-activity-checks.md · docs/tr-audit.md · docs/tr-feedback.md · docs/rejection-analytics.md · docs/pre-submission-checks.md · docs/book-reconcile.md · docs/collateral-audit.md · docs/emir-mar-msr.md · docs/emir-recon-stats.md · docs/emir-warnings.md · docs/sftr-missing-collateral.md.
Engineering: docs/auth-messages.md · docs/iso20022-emir.md · docs/iso20022-sftr.md · docs/xml-format.md · docs/xsd-validation.md · docs/parquet-normalize.md · docs/history-store.md · docs/lifecycle-cross-batch.md · docs/desktop-web-ui.md · docs/email-notifications.md.
Reliability: docs/reliability.md · docs/performance.md · CHANGELOG.md.
What's next: docs/python-roadmap.md (Python/Arrow bindings architecture; v0.12 implemented, v0.13+ deferred).
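
The TARGET2 business-day calendar mentioned in the list above can be sketched as follows. This is an independent illustration, not the code OpenDQI ships: the function names are hypothetical, and the closing days used are the standard TARGET ones (weekends, 1 January, Good Friday, Easter Monday, 1 May, 25 and 26 December).

```python
from datetime import date, timedelta

def easter_sunday(year):
    """Gregorian Easter Sunday via the anonymous (Meeus/Jones/Butcher) algorithm."""
    a = year % 19
    b, c = divmod(year, 100)
    d, e = divmod(b, 4)
    f = (b + 8) // 25
    g = (b - f + 1) // 3
    h = (19 * a + b - d - g + 15) % 30
    i, k = divmod(c, 4)
    l = (32 + 2 * e + 2 * i - h - k) % 7
    m = (a + 11 * h + 22 * l) // 451
    month, day = divmod(h + l - 7 * m + 114, 31)
    return date(year, month, day + 1)

def is_target_business_day(d):
    """True when `d` is neither a weekend nor a TARGET closing day."""
    if d.weekday() >= 5:
        return False
    easter = easter_sunday(d.year)
    closed = {
        date(d.year, 1, 1),
        easter - timedelta(days=2),   # Good Friday
        easter + timedelta(days=1),   # Easter Monday
        date(d.year, 5, 1),
        date(d.year, 12, 25),
        date(d.year, 12, 26),
    }
    return d not in closed

def business_days_between(start, end):
    """Count TARGET business days in the half-open interval (start, end]."""
    days = 0
    current = start
    while current < end:
        current += timedelta(days=1)
        days += is_target_business_day(current)
    return days

# A valuation dated Thursday 28 March 2024, checked on Tuesday 2 April 2024.
age = business_days_between(date(2024, 3, 28), date(2024, 4, 2))
print(age)
```

Counting in business days means a valuation dated just before the Easter break is only one business day old on the following Tuesday, so a stale-data indicator would not flag it for the holiday gap.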

Input formats

CSV (with YAML mapping), ISO 20022 XML (12 supported messages — see above), directories of mixed files, .zip archives (no zip-slip — csv/xml/parquet members only, directory components dropped), single-stream .gz, and Parquet (read + write round-trip — same canonical schema as the bindings spec in docs/python-roadmap.md). Optional XSD validation via xmllint.
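
The zip handling described above (allowed member types only, directory components dropped) can be approximated with the standard library. This is a sketch of the technique rather than OpenDQI's Rust code; the function name and the allow-list constant are illustrative.

```python
import io
import zipfile
from pathlib import PurePosixPath

ALLOWED_SUFFIXES = {".csv", ".xml", ".parquet"}

def safe_members(archive):
    """Yield (flat_name, bytes) for allowed members, dropping any directory part."""
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            # Keep only the final path component so "../../x.csv" cannot escape.
            flat_name = PurePosixPath(info.filename.replace("\\", "/")).name
            if PurePosixPath(flat_name).suffix.lower() in ALLOWED_SUFFIXES:
                yield flat_name, zf.read(info)

# Build a small in-memory archive with one hostile and one disallowed member.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("../../evil.csv", "a,b\n1,2\n")
    zf.writestr("nested/dir/report.xml", "<Document/>")
    zf.writestr("notes.txt", "ignored")
buf.seek(0)

extracted = dict(safe_members(buf))
print(sorted(extracted))
```

Flattening names can make two members collide; a production reader would need a rule for that case, which this sketch leaves out.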

Status & roadmap

Version	Theme
v0.16.0 (current)	DQI expansion + SFTR pack v1: 10 → 28 indicators (24 EMIR + 4 SFTR T2-layer), TARGET2 business-day calendar for stale-data DQIs, new `opendqi sftr data-quality-pack` CLI + `opendqi.sftr.data_quality_pack` Python; v1.0 stable Arrow schemas unchanged (added rows, not columns). See `docs/data-quality-pack.md`.
v0.15.0	EMIR Data Quality Pack v1: 10 regulator-style indicators above the 216 checks; `opendqi emir data-quality-pack` CLI + `opendqi.emir.data_quality_pack` Python; v1.0 stable Arrow schemas for indicators + evidence.
v0.14.0	Native Spark `mapInPandas` UDF + Polars LazyFrame fast path + `pip install opendqi[spark,polars,all]`
v0.13.0	Python feature expansion — 10 new entry points (multi-file + cross-message + Spark experimental)
v0.12.0	Python/Arrow bindings preview — `opendqi.emir.scan_parquet` + `scan_table(arrow_tbl, mapping)`, issues as `pyarrow.Table`
v0.10.0	Streaming-issue pipeline end-to-end (~32 % EMIR-1M peak RSS reduction, honest measurement on 3 views) + EMIR Article 11 collateral cross-reference (`COL.*`)
v1.0.0	Stable CLI / output / Arrow contract. Locked schemas.

MSRV: Rust 1.87.0 (verified in CI). CI: cargo fmt --check, cargo clippy -D warnings, build, 831 tests + 20/20 goldens byte-identical, and a daily cargo-deny run, all on Ubuntu + macOS. Local preflight: ./scripts/preflight.sh (one-time setup: cargo install cargo-deny --locked). Auto-run on push: ./scripts/install-hooks.sh.
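
Byte-identical golden comparison only works when serialisation is canonical. The sketch below shows one common way to get that for JSON: sorted keys, fixed separators, and a hash comparison. It illustrates the general technique and makes no claim about how OpenDQI's goldens are produced; the sample values are made up.

```python
import hashlib
import json

def canonical_json(obj):
    """Serialise with sorted keys and fixed separators so equal data gives equal bytes."""
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")

def digest(data):
    """Hex SHA-256 of a byte string, used as a compact fingerprint."""
    return hashlib.sha256(data).hexdigest()

# Two summaries with the same (made-up) content but different key insertion order.
run_a = {"regime": "emir", "records": 8, "issues": 16}
run_b = {"issues": 16, "records": 8, "regime": "emir"}

same = digest(canonical_json(run_a)) == digest(canonical_json(run_b))
print(same)
```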

Full release history in CHANGELOG.md.

Shell completions & man page

opendqi completions bash | sudo tee /etc/bash_completion.d/opendqi
opendqi completions zsh  > ~/.zfunc/_opendqi
opendqi completions fish > ~/.config/fish/completions/opendqi.fish
opendqi man > opendqi.1 && man ./opendqi.1

Supported shells: bash, zsh, fish, powershell, elvish.

Contributing

See CONTRIBUTING.md.

Security

See SECURITY.md for the vulnerability disclosure process. No SWIFT-licensed XSDs are committed; all fixtures are synthetic.

Disclaimer

OpenDQI is not a Trade Repository, ARM, reporting gateway, or regulatory certification tool. It does not submit reports. It provides local data quality analysis and validation support.

Specifically for the v0.15 Data Quality Pack: the DQIs (DQI_VAL_MISSING, DQI_REJ_RATE, etc.) are internal data quality indicators, not regulatory verdicts. A red status is an internal alert asking the firm to investigate — not a declaration of non-compliance. The aggregation rates and the thresholds that drive them are firm-configurable. See docs/data-quality-pack.md for the full vocabulary discipline.

Users remain responsible for their regulatory reporting obligations and should validate outputs against applicable rules, internal controls, and professional advice.

License

Licensed under the Apache License, Version 2.0. See LICENSE.
