Turn EMIR/SFTR Trade Repository files into deterministic data quality reports and Arrow tables — locally, reproducibly, from your existing data stack.
A local-first Rust engine that ingests both the reports a firm submits to its Trade Repository and the files the TR sends back, then produces reproducible HTML, JSON, CSV, Parquet, and pyarrow.Table outputs. Use it from your terminal (CLI), your browser (local web UI on http://127.0.0.1:7878), or your Python / PyArrow notebook. No network calls. No cloud dependency. Embeds in your own pipeline.
Three workflows mirror the three layers of any TR/firm conversation. Each one is one command. Each one writes a deterministic HTML/CSV/JSON triple under --out.
opendqi emir tr-state-scan auth107.xml --out ./report/

Ingests the daily auth.107 Trade State Report and surfaces stale valuations, trades past maturity, duplicate active UTIs, valuations after termination, and placeholder maturity dates. Score + 16 issues on the shipped 8-record fixture.
opendqi emir feedback auth092.xml --store ./history.db --out ./feedback/
opendqi feedback analytics --store ./history.db --regime emir --out ./rejection_profile.yml
opendqi emir scan next-batch.csv --mapping mapping.yml --rejection-profile ./rejection_profile.yml --out ./pre-flight/

auth.092 rejections feed a SQLite history store and an Open/Resolved/Stale workflow, then roll up into a rejection_profile.yml that gates the next submission via the EMIR.PSC.* family. This closes the post-TR ↔ pre-TR feedback loop.
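The three commands of the feedback loop can also be chained from a script. The sketch below only rebuilds the invocations shown above and runs them in order; the helper names are hypothetical, and it assumes the CLI signals failure through a non-zero exit code, which this README does not state.

```python
import shlex
import shutil
import subprocess


def feedback_loop_commands(feedback_xml, next_batch_csv, mapping_yml,
                           store="./history.db",
                           profile="./rejection_profile.yml"):
    """Return the three CLI invocations of the feedback loop, in order."""
    return [
        # 1. Load auth.092 rejections into the SQLite history store.
        ["opendqi", "emir", "feedback", feedback_xml,
         "--store", store, "--out", "./feedback/"],
        # 2. Roll the history up into a rejection profile.
        ["opendqi", "feedback", "analytics", "--store", store,
         "--regime", "emir", "--out", profile],
        # 3. Gate the next batch with that profile before submission.
        ["opendqi", "emir", "scan", next_batch_csv,
         "--mapping", mapping_yml, "--rejection-profile", profile,
         "--out", "./pre-flight/"],
    ]


def run_feedback_loop(commands):
    """Run each step in order; print the plan instead if the CLI is absent."""
    if shutil.which("opendqi") is None:
        for cmd in commands:
            print("would run:", shlex.join(cmd))
        return False
    for cmd in commands:
        # Assumption: a failed step exits non-zero, so check=True stops the loop.
        subprocess.run(cmd, check=True)
    return True


commands = feedback_loop_commands("auth092.xml", "next-batch.csv", "mapping.yml")
run_feedback_loop(commands)
```

Keeping the argument lists separate from the runner makes the sequence easy to log or dry-run in a scheduler before anything touches the history store.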
opendqi emir tr-audit --tar auth030.xml --tsr auth107.xml --feedback auth092.xml --out ./audit/

Three layers, one HTML, plus 3 cross-layer EMIR.AUD.* coherence checks (rejected-but-outstanding-in-TSR, MODI-without-prior-NEWT, TERM-but-still-outstanding). 251 issues on the shipped 20-record audit fixture.
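To make the three cross-layer checks concrete, here is a toy illustration of the logic their names suggest. It is not the engine's implementation: the input shapes (tuples and sets keyed by UTI) and the check labels are hypothetical simplifications, and the real EMIR.AUD.* rules may consider more fields.

```python
def coherence_issues(tar_events, tsr_outstanding, rejected_utis):
    """Flag cross-layer inconsistencies between three simplified inputs.

    tar_events      -- list of (uti, action_type) in reporting order
    tsr_outstanding -- set of UTIs the trade state report shows as outstanding
    rejected_utis   -- set of UTIs rejected in the feedback file
    Returns a sorted list of (check_label, uti).
    """
    issues = set()
    seen_newt = set()
    for uti, action in tar_events:
        if action == "NEWT":
            seen_newt.add(uti)
        elif action == "MODI" and uti not in seen_newt:
            # A modification arrived for a trade never reported as new.
            issues.add(("MODI_WITHOUT_PRIOR_NEWT", uti))
        elif action == "TERM" and uti in tsr_outstanding:
            # The trade was terminated, yet the TR still lists it.
            issues.add(("TERM_BUT_STILL_OUTSTANDING", uti))
    for uti in rejected_utis & tsr_outstanding:
        # A rejected submission should not show up as an outstanding trade.
        issues.add(("REJECTED_BUT_OUTSTANDING", uti))
    return sorted(issues)


# Made-up identifiers, for illustration only.
tar_events = [
    ("UTI-A", "NEWT"), ("UTI-A", "MODI"),
    ("UTI-B", "MODI"),
    ("UTI-C", "NEWT"), ("UTI-C", "TERM"),
]
found = coherence_issues(
    tar_events,
    tsr_outstanding={"UTI-A", "UTI-C", "UTI-D"},
    rejected_utis={"UTI-D", "UTI-E"},
)
for label, uti in found:
    print(label, uti)
```

Each check needs at least two of the three files, which is why they only become possible once the activity report, the state report, and the feedback file are loaded together.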
Operator scenarios for each workflow in docs/use-cases.md. SFTR has the same surface (opendqi sftr ...) minus rejection feedback (auth.080 is a reconciliation status advice, not feedback).
git clone https://github.com/PauFou/OpenDQI && cd OpenDQI
bash scripts/demo.sh

Runs all three workflows above against the synthetic kit at examples/quickstart-emir/, drops three report.html files under /tmp/opendqi-demo/, and opens the consolidated audit report in your default browser. Builds a debug binary on the first run; subsequent runs are sub-second.
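Since the outputs are described as deterministic, two runs over the same inputs should be byte-identical. One way to check that on your own machine is to hash both output directories and compare, as in this minimal sketch. The helper names are hypothetical and the demo uses stand-in files rather than real reports.

```python
import hashlib
import tempfile
from pathlib import Path


def directory_digest(root):
    """Map each file's path (relative to root) to the SHA-256 of its bytes."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def differing_files(run_a, run_b):
    """List files that are missing from one run or whose bytes differ."""
    a, b = directory_digest(run_a), directory_digest(run_b)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))


# Self-contained demonstration with stand-in content, not real reports.
with tempfile.TemporaryDirectory() as tmp:
    first, second = Path(tmp, "run1"), Path(tmp, "run2")
    for run in (first, second):
        run.mkdir()
        (run / "report.html").write_text("stand-in report")
        (run / "summary.json").write_text("stand-in summary")
    identical = differing_files(first, second)
    (second / "summary.json").write_text("stand-in summary, changed")
    drifted = differing_files(first, second)

print(identical)
print(drifted)
```

Pointing `differing_files` at two real `--out` directories produced from the same inputs gives a quick reproducibility check; any file it lists deserves a closer look.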
Pick the channel that matches how you work — same engine, same 216 checks, same v1.0 Arrow output contract behind each.
pip install opendqi # core: pyarrow only
pip install opendqi[spark] # + pyspark>=3.5 (v0.14)
pip install opendqi[polars] # + polars>=0.20 (v0.14)
pip install opendqi[all] # + both

curl -sSL https://github.com/PauFou/OpenDQI/releases/download/v0.16.0/opendqi-cli-installer.sh | sh

Pre-built binaries for Linux x86_64 + ARM64 and macOS x86_64 + ARM64 from the GitHub Releases page.

cargo install --git https://github.com/PauFou/OpenDQI --tag v0.16.0 opendqi-cli

import opendqi
table = opendqi.emir.parse_xml("auth030.xml")
result = opendqi.emir.scan_table(table, mapping={c: c for c in table.column_names})
print(result.summary) # dict — same shape as summary.json
print(result.issues) # pyarrow.Table, v1.0 stable 11-column schema

Or one of the v0.13 cross-message workflows (3 files, one call), a post-TR audit including the 3 EMIR.AUD.* cross-layer coherence checks:
result = opendqi.emir.tr_audit(
tar="auth030.xml", tsr="auth107.xml", feedback="auth092.xml",
)

Or v0.14 native Spark, using partition-friendly mapInPandas so the scan stays distributed across executors:
issues_sdf = opendqi.spark.scan_spark_dataframe(
spark_df, regime="emir", mapping={"uti": "trade_uti", ...},
) # returns a pyspark.sql.DataFrame of issues

Or the Data Quality Pack: 28 regulator-style indicators across EMIR + SFTR (numerator / denominator / rate / threshold / status) with drill-down evidence, co-produced with the granular issues:
# EMIR — 24 indicators (10 v0.15 + 14 v0.16)
result = opendqi.emir.data_quality_pack(
tsr="auth107.xml", tar="auth030.xml", feedback="auth092.xml",
)
print(result.indicators.to_pandas()) # 24 rows, v1.0 stable Arrow schema
# SFTR — 4 indicators (v0.16 T2-layer subset, paths-only)
sftr = opendqi.sftr.data_quality_pack(
tsr="auth079.xml", tar="auth052.xml",
)
print(sftr.indicators.to_pandas()) # 4 rows, same v1.0 schema

The full Python surface (v0.12 → v0.16) adds 15 entry points: scan_parquet/scan_table/parse_xml × 2 regimes, multi-file scan_directory/scan_files, cross-message tr_audit/collateral_audit/book_reconcile/missing_collateral, the native data-platform opendqi.spark.scan_spark_dataframe and opendqi.polars.scan_lazyframe, plus opendqi.emir.data_quality_pack and opendqi.sftr.data_quality_pack (and the EXPERIMENTAL opendqi.spark.emir.data_quality_pack wrapper).

More on the Python side: minimal demo in examples/python/quickstart.py · 8 progressively realistic patterns in examples/python/ · executable Jupyter notebook in examples/python/quickstart.ipynb · full quickstart guide in docs/python.md · DQI spec in docs/data-quality-pack.md · architecture spec in docs/python-roadmap.md.
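Each Data Quality Pack indicator is described by a numerator, a denominator, a rate, a threshold, and a status. The sketch below shows how those five values could relate. It is an illustration, not the pack's logic: the two-level green/red status, the "rate above threshold is red" rule, the "n/a" handling of an empty denominator, and the numbers are all assumptions. Only the indicator codes come from this README.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Indicator:
    code: str
    numerator: int      # records exhibiting the problem
    denominator: int    # records in scope for the indicator
    threshold: float    # firm-configurable tolerance, as a fraction

    @property
    def rate(self) -> Optional[float]:
        # No records in scope means the rate is undefined, not zero.
        if self.denominator == 0:
            return None
        return self.numerator / self.denominator

    @property
    def status(self) -> str:
        rate = self.rate
        if rate is None:
            return "n/a"
        # Assumption: breaching the threshold raises an internal alert.
        return "red" if rate > self.threshold else "green"


# Made-up counts and thresholds, for illustration only.
rejections = Indicator("DQI_REJ_RATE", numerator=3, denominator=200, threshold=0.01)
missing_val = Indicator("DQI_VAL_MISSING", numerator=1, denominator=200, threshold=0.01)
empty = Indicator("DQI_VAL_MISSING", numerator=0, denominator=0, threshold=0.01)

for ind in (rejections, missing_val, empty):
    print(ind.code, ind.rate, ind.status)
```

Keeping the threshold on the indicator rather than in the computation mirrors the idea that thresholds are firm-configurable: changing the tolerance changes the status, never the underlying counts.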
| | EMIR | SFTR | Total |
|---|---|---|---|
| Single-batch DQ checks (6 dimensions) | 89 | 44 | 133 |
| TR-layer & cross-message (TSR · TAR · MAR · MSR · Recon · Warnings · Missing-collat · Audit · Collateral-audit · Book-vs-TR · Lifecycle · Feedback · Pre-submission) | 62 | 21 | 83 |
| Live catalog (post-v0.10.0) | 151 | 65 | 216 |
12 ISO 20022 messages parsed against the real ESMA XSD subset, gated locally by OPENDQI_XSD_DIR (SWIFT-licensed XSDs never redistributed). Full catalogues: docs/emir-checks.md · docs/sftr-checks.md. Per-message coverage notes: docs/auth-messages/.
- Get started: docs/use-cases.md (operator scenarios) · examples/quickstart-emir/ (3-file kit) · scripts/demo.sh (one-shot) · docs/python.md (Python quickstart) · examples/python/ (7 scripts + Jupyter notebook).
- Data Quality Pack (v0.16): docs/data-quality-pack.md, the "committee-readable" view with 28 regulator-style indicators (24 EMIR + 4 SFTR) above the 216 granular checks. TARGET2 business-day calendar for stale-data indicators.
- Positioning: docs/positioning.md (3-layer mental model).
- Per-workflow: docs/tr-state-checks.md · docs/tr-activity-checks.md · docs/tr-audit.md · docs/tr-feedback.md · docs/rejection-analytics.md · docs/pre-submission-checks.md · docs/book-reconcile.md · docs/collateral-audit.md · docs/emir-mar-msr.md · docs/emir-recon-stats.md · docs/emir-warnings.md · docs/sftr-missing-collateral.md.
- Engineering: docs/auth-messages.md · docs/iso20022-emir.md · docs/iso20022-sftr.md · docs/xml-format.md · docs/xsd-validation.md · docs/parquet-normalize.md · docs/history-store.md · docs/lifecycle-cross-batch.md · docs/desktop-web-ui.md · docs/email-notifications.md.
- Reliability: docs/reliability.md · docs/performance.md · CHANGELOG.md.
- What's next: docs/python-roadmap.md (Python/Arrow bindings architecture; v0.12 implemented, v0.13+ deferred).
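The Data Quality Pack entry above mentions a TARGET2 business-day calendar for stale-data indicators. As a rough sketch of what such a calendar involves: TARGET2 closes on weekends, New Year's Day, Good Friday, Easter Monday, 1 May, 25 December, and 26 December. How OpenDQI counts staleness is not specified here, so the counting convention below (days after the start date, up to and including the end date) and all function names are assumptions.

```python
from datetime import date, timedelta


def easter_sunday(year):
    """Gregorian Easter Sunday (anonymous Gregorian algorithm)."""
    a = year % 19
    b, c = divmod(year, 100)
    d, e = divmod(b, 4)
    f = (b + 8) // 25
    g = (b - f + 1) // 3
    h = (19 * a + b - d - g + 15) % 30
    i, k = divmod(c, 4)
    ell = (32 + 2 * e + 2 * i - h - k) % 7
    m = (a + 11 * h + 22 * ell) // 451
    month, day = divmod(h + ell - 7 * m + 114, 31)
    return date(year, month, day + 1)


def target2_holidays(year):
    """Weekday closing days of the TARGET2 calendar for one year."""
    easter = easter_sunday(year)
    return {
        date(year, 1, 1),             # New Year's Day
        easter - timedelta(days=2),   # Good Friday
        easter + timedelta(days=1),   # Easter Monday
        date(year, 5, 1),             # Labour Day
        date(year, 12, 25),           # Christmas Day
        date(year, 12, 26),           # 26 December
    }


def is_business_day(day):
    return day.weekday() < 5 and day not in target2_holidays(day.year)


def business_days_between(start, end):
    """Count business days after `start`, up to and including `end`."""
    count = 0
    day = start
    while day < end:
        day += timedelta(days=1)
        if is_business_day(day):
            count += 1
    return count


# Thursday before Easter 2025 to the Tuesday after: only Tuesday counts.
age = business_days_between(date(2025, 4, 17), date(2025, 4, 22))
print(age)
```

Counting in business days rather than calendar days keeps a valuation dated just before a long weekend from being flagged as stale on the first day markets reopen.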
CSV (with YAML mapping), ISO 20022 XML (12 supported messages — see above), directories of mixed files, .zip archives (no zip-slip — csv/xml/parquet members only, directory components dropped), single-stream .gz, and Parquet (read + write round-trip — same canonical schema as the bindings spec in docs/python-roadmap.md). Optional XSD validation via xmllint.
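The zip handling described above (only csv/xml/parquet members, directory components dropped) is a common defence against zip-slip path traversal. The following standard-library sketch illustrates the idea under stated assumptions: the function name is hypothetical, it is not the engine's Rust implementation, and it does not handle two members that collapse to the same flat name.

```python
import io
import zipfile
from pathlib import PurePosixPath

ALLOWED_SUFFIXES = {".csv", ".xml", ".parquet"}


def safe_members(archive):
    """Yield (flat_name, data) for the allowed members of an open ZipFile."""
    for info in archive.infolist():
        if info.is_dir():
            continue
        # Keep only the final path component, so "../../x.csv" becomes "x.csv".
        name = PurePosixPath(info.filename.replace("\\", "/")).name
        if PurePosixPath(name).suffix.lower() not in ALLOWED_SUFFIXES:
            continue
        yield name, archive.read(info)


# Build a small in-memory archive with a traversal attempt and a stray file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("../../etc/evil.csv", "a,b\n1,2\n")
    zf.writestr("nested/dir/auth030.xml", "<Document/>")
    zf.writestr("notes.txt", "ignored")

with zipfile.ZipFile(buffer) as zf:
    accepted = [name for name, _ in safe_members(zf)]

print(accepted)
```

Because member names are never joined onto a destination path with their directories intact, a crafted `../` prefix cannot steer a write outside the working area.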
| Version | Theme |
|---|---|
| v0.16.0 (current) | DQI expansion + SFTR pack v1: 10 → 28 indicators (24 EMIR + 4 SFTR T2-layer), TARGET2 business-day calendar for stale-data DQIs, new opendqi sftr data-quality-pack CLI + opendqi.sftr.data_quality_pack Python; v1.0 stable Arrow schemas unchanged (added rows, not columns). See docs/data-quality-pack.md. |
| v0.15.0 | EMIR Data Quality Pack v1: 10 regulator-style indicators above the 216 checks; opendqi emir data-quality-pack CLI + opendqi.emir.data_quality_pack Python; v1.0 stable Arrow schemas for indicators + evidence. |
| v0.14.0 | Native Spark mapInPandas UDF + Polars LazyFrame fast path + pip install opendqi[spark,polars,all] |
| v0.13.0 | Python feature expansion — 10 new entry points (multi-file + cross-message + Spark experimental) |
| v0.12.0 | Python/Arrow bindings preview — opendqi.emir.scan_parquet + scan_table(arrow_tbl, mapping), issues as pyarrow.Table |
| v0.10.0 | Streaming-issue pipeline end-to-end (~32 % EMIR-1M peak RSS reduction, honest measurement on 3 views) + EMIR Article 11 collateral cross-reference (COL.*) |
| v1.0.0 | Stable CLI / output / Arrow contract. Locked schemas. |
MSRV: Rust 1.87.0 (verified in CI). CI: cargo fmt --check, cargo clippy -D warnings, build, 831 tests + 20/20 goldens byte-identical, and cargo-deny daily, all on Ubuntu + macOS. Local preflight: ./scripts/preflight.sh (one-shot setup: cargo install cargo-deny --locked). Auto-run on push: ./scripts/install-hooks.sh.
Full release history in CHANGELOG.md.
opendqi completions bash | sudo tee /etc/bash_completion.d/opendqi
opendqi completions zsh > ~/.zfunc/_opendqi
opendqi completions fish > ~/.config/fish/completions/opendqi.fish
opendqi man > opendqi.1 && man ./opendqi.1

Supported shells: bash, zsh, fish, powershell, elvish.
See CONTRIBUTING.md.
See SECURITY.md for the vulnerability disclosure process. No SWIFT-licensed XSDs are committed; all fixtures are synthetic.
OpenDQI is not a Trade Repository, ARM, reporting gateway, or regulatory certification tool. It does not submit reports. It provides local data quality analysis and validation support.
Specifically for the v0.15 Data Quality Pack: the DQIs (DQI_VAL_MISSING, DQI_REJ_RATE, etc.) are internal data quality indicators, not regulatory verdicts. A red status is an internal alert asking the firm to investigate — not a declaration of non-compliance. The aggregation rates and the thresholds that drive them are firm-configurable. See docs/data-quality-pack.md for the full vocabulary discipline.
Users remain responsible for their regulatory reporting obligations and should validate outputs against applicable rules, internal controls, and professional advice.
Licensed under the Apache License, Version 2.0. See LICENSE.