An open-source EDM-style reference data platform simulator for Azure. It gives developers, data engineers, and architects a runnable reference for common enterprise data management capabilities: ingestion, staging, validation, survivorship, golden source, audit, lineage, monitoring, and downstream distribution.
This is not Markit EDM and does not include proprietary vendor code, SDKs, schemas, or real bank data. It is a generic, educational, production-inspired simulator built with synthetic data.

It is intended for:

- Data engineers learning reference data platform patterns.
- SQL Server developers modernizing batch-oriented data platforms.
- Azure architects designing ADF, Azure SQL, ADLS, Key Vault, and monitoring patterns.
- Teams that need a non-proprietary sandbox for validation, survivorship, audit, and lineage discussions.

You can use it to:

- Validate fake vendor securities, prices, and ratings files.
- Generate markdown data quality summaries.
- Load CSV files into SQL Server staging tables.
- Study Azure SQL-compatible staging, core, audit, lineage, and golden source scripts.
- Review ADF sample pipeline JSON for ingestion and distribution.
- Use Bicep templates as a starting point for Azure resource design.
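
To give a feel for the kind of checks a price-file validator might run, here is a minimal sketch. It is not the project's `file_validator.py`: the column names (`security_id`, `price`, `price_date`), the `MISSING_PRICE` rule code, and the five-day staleness threshold are assumptions made for illustration. Only the `STALE_PRICE` rule name is taken from the sample report in this README.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Issue:
    row: int
    severity: str  # "error" or "warning"
    rule: str


def validate_price_rows(rows, as_of, max_age_days=5):
    """Apply two illustrative rules to an iterable of price rows (dicts)."""
    issues = []
    for number, row in enumerate(rows, start=1):
        raw_price = (row.get("price") or "").strip()
        if not raw_price:
            # Hypothetical rule: a price row without a price is an error.
            issues.append(Issue(number, "error", "MISSING_PRICE"))
            continue
        price_date = date.fromisoformat(row["price_date"])
        if as_of - price_date > timedelta(days=max_age_days):
            # Assumed threshold: older than max_age_days counts as stale.
            issues.append(Issue(number, "warning", "STALE_PRICE"))
    return issues


# Synthetic rows, made up for this example.
sample = io.StringIO(
    "security_id,price,price_date\n"
    "SEC001,101.25,2024-01-10\n"
    "SEC002,,2024-01-10\n"
    "SEC003,55.10,2023-12-01\n"
)
issues = validate_price_rows(csv.DictReader(sample), as_of=date(2024, 1, 11))
for issue in issues:
    print(issue.row, issue.severity, issue.rule)
```

Keeping each rule as a small check that emits a severity and a rule code makes it straightforward to count errors and warnings per file afterwards.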

```mermaid
flowchart LR
    VendorFiles[Fake vendor CSV files] --> Landing[ADLS-style landing zone]
    Landing --> ADF[Azure Data Factory pipelines]
    ADF --> Staging[SQL staging tables]
    Staging --> Validation[Validation rules]
    Validation --> Audit[Audit and DQ issue tables]
    Validation --> Survivorship[Survivorship rules]
    Survivorship --> Core[Core master tables]
    Core --> Golden[Golden source views]
    Golden --> Distribution[Downstream distribution extracts]
    ADF --> Lineage[Lineage events]
    KeyVault[Azure Key Vault] --> ADF
    Monitor[Azure Monitor] --> ADF
```
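
The survivorship step in the flow above can be illustrated with a small sketch. It assumes one common approach, source priority: for each attribute, keep the value from the most trusted vendor that actually supplied one. The vendor names, priority order, and attribute names are hypothetical, and the project's own rules in `sql/006_survivorship_rules.sql` and `reconciliation.py` may work differently.

```python
# Lower number = more trusted source (an assumed ranking for this example).
VENDOR_PRIORITY = {"vendor_a": 1, "vendor_b": 2}


def survive(records, attributes, priority=VENDOR_PRIORITY):
    """Build one golden record from several vendor records for the same security."""
    ranked = sorted(records, key=lambda r: priority.get(r["vendor"], float("inf")))
    golden, sources = {}, {}
    for attribute in attributes:
        for record in ranked:
            value = record.get(attribute)
            if value not in (None, ""):
                golden[attribute] = value
                sources[attribute] = record["vendor"]  # keep provenance for lineage
                break
        else:
            # No vendor supplied this attribute.
            golden[attribute] = None
            sources[attribute] = None
    return golden, sources


# Synthetic candidate records for a single security.
candidates = [
    {"vendor": "vendor_b", "name": "ACME CORP", "currency": "USD", "rating": "AA"},
    {"vendor": "vendor_a", "name": "Acme Corporation", "currency": "", "rating": None},
]
golden, sources = survive(candidates, ["name", "currency", "rating"])
print(golden)
print(sources)
```

Here the name survives from `vendor_a`, while currency and rating fall back to `vendor_b` because the preferred vendor left them empty. Recording the winning source per attribute is what makes the result auditable.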

```powershell
git clone https://github.com/corzosoft/azure-edm-reference-data-platform.git
cd azure-edm-reference-data-platform
python -m venv .venv
.\.venv\Scripts\Activate.ps1
python -m pip install --upgrade pip
python -m pip install -e ".[dev]"
python -m pytest
python -m ruff check .
```

On macOS/Linux, activate the environment with `source .venv/bin/activate`.

```powershell
edm-ref validate-file sample-data/vendor_a_securities.csv
edm-ref validate-file sample-data/vendor_b_prices.csv --file-type price
edm-ref quality-report sample-data/vendor_a_securities.csv sample-data/vendor_b_prices.csv
edm-ref generate-lineage-report --batch-id demo-001 --records 4
```

Expected quality report shape:

# Data Quality Summary
| File | Type | Errors | Warnings | Rule Counts |
| --- | --- | ---: | ---: | --- |
| sample-data/vendor_b_prices.csv | price | 0 | 1 | STALE_PRICE=1 |
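
A report with this shape could be assembled along the following lines. This is a sketch, not the project's report generator: the input structure (a list of `(file, type, issues)` tuples with `(severity, rule)` pairs) and the `", "` separator between multiple rule counts are assumptions.

```python
from collections import Counter


def render_quality_summary(results):
    """Render a markdown table from (file, file_type, issues) tuples.

    Each issue is a (severity, rule) pair, with severity "error" or "warning".
    """
    lines = [
        "# Data Quality Summary",
        "| File | Type | Errors | Warnings | Rule Counts |",
        "| --- | --- | ---: | ---: | --- |",
    ]
    for path, file_type, issues in results:
        errors = sum(1 for severity, _ in issues if severity == "error")
        warnings = sum(1 for severity, _ in issues if severity == "warning")
        counts = Counter(rule for _, rule in issues)
        # Sort rule codes so the output is stable between runs.
        rule_counts = ", ".join(f"{rule}={n}" for rule, n in sorted(counts.items()))
        lines.append(f"| {path} | {file_type} | {errors} | {warnings} | {rule_counts} |")
    return "\n".join(lines)


report = render_quality_summary(
    [("sample-data/vendor_b_prices.csv", "price", [("warning", "STALE_PRICE")])]
)
print(report)
```

With a single `STALE_PRICE` warning as input, the last line printed matches the sample row shown above.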

```powershell
docker compose up -d
```

SQL Server starts on `localhost,1433` with user `sa` and password `YourStrong!Passw0rd`.

Apply scripts from sql/ in numeric order using Azure Data Studio, SQL Server Management Studio, or sqlcmd.
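
If you go the `sqlcmd` route, a small helper can take care of the ordering. The sketch below sorts scripts by their numeric prefix and builds one `sqlcmd` argument list per script; it assumes `sqlcmd` is on your `PATH` and reuses the local connection details given above. The helper names are made up for this example, and nothing is executed by the snippet itself.

```python
import re
from pathlib import Path


def ordered_sql_scripts(paths):
    """Return .sql paths sorted by their leading numeric prefix (e.g. 002_...)."""
    numbered = []
    for path in map(Path, paths):
        match = re.match(r"(\d+)_", path.name)
        if path.suffix == ".sql" and match:
            numbered.append((int(match.group(1)), path.name, path))
    return [path for _, _, path in sorted(numbered)]


def sqlcmd_args(script, server="localhost,1433", user="sa", password="YourStrong!Passw0rd"):
    """Build a sqlcmd argument list for one script (not executed here)."""
    # -b makes sqlcmd stop with a non-zero exit code when a script fails.
    return ["sqlcmd", "-S", server, "-U", user, "-P", password, "-b", "-i", str(script)]


# File names taken from the table in this README; the .md entry is skipped.
names = [
    "sql/007_golden_source_views.sql",
    "sql/002_create_staging_tables.sql",
    "sql/README.md",
    "sql/003_create_core_tables.sql",
]
scripts = ordered_sql_scripts(names)
for script in scripts:
    print(script.name)
```

In practice you would pass `Path("sql").glob("*.sql")` instead of a hard-coded list and run each command with `subprocess.run(sqlcmd_args(script), check=True)`. Depending on your `sqlcmd` version, you may also need to trust the container's self-signed certificate.
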
To load staging tables from the CLI, install the optional SQL Server dependency and Microsoft ODBC Driver for SQL Server:

```powershell
python -m pip install -e ".[mssql]"
edm-ref load-staging sample-data/vendor_a_securities.csv
```

| Concept | Where To Look |
|---|---|
| Staging model | sql/002_create_staging_tables.sql |
| Core master model | sql/003_create_core_tables.sql |
| Audit and lineage | sql/004_create_audit_lineage_tables.sql |
| Validation rules | sql/005_validation_rules.sql, src/edm_reference_platform/file_validator.py |
| Survivorship | sql/006_survivorship_rules.sql, src/edm_reference_platform/reconciliation.py |
| Golden source views | sql/007_golden_source_views.sql |
| ADF samples | adf/pipelines/ |
| Azure infrastructure | infra/bicep/ |
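
For readers curious about what a staging load involves, the sketch below prepares a parameterized insert from CSV text using only the standard library. It does not reproduce what `edm-ref load-staging` does internally: the table name `stg.vendor_securities`, the `batch_id` column, and the sample columns are hypothetical, and the real staging schema lives in `sql/002_create_staging_tables.sql`.

```python
import csv
import io


def build_staging_insert(table, csv_text, batch_id):
    """Return (sql, rows) for a parameterized bulk insert of CSV rows."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = ["batch_id", *header]
    placeholders = ", ".join("?" for _ in columns)
    # Identifiers are interpolated directly, so only use trusted table and header names.
    column_list = ", ".join(f"[{name}]" for name in columns)
    sql = f"INSERT INTO {table} ({column_list}) VALUES ({placeholders})"
    # Empty CSV fields become None so they load as NULL rather than ''.
    rows = [(batch_id, *(field or None for field in line)) for line in reader if line]
    return sql, rows


# Synthetic sample data, made up for this example.
csv_text = (
    "security_id,isin,name\n"
    "SEC001,US0000000001,Acme Corporation\n"
    "SEC002,,Beta Holdings\n"
)
sql, rows = build_staging_insert("stg.vendor_securities", csv_text, "demo-001")
print(sql)
for row in rows:
    print(row)
```

With a driver such as `pyodbc`, the resulting pair could be passed to `cursor.executemany(sql, rows)`. Tagging every row with a batch identifier is one way to tie staging data back to audit and lineage records.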
The Bicep templates are reference templates for:
- ADLS Gen2-style storage account.
- Azure SQL logical server and database.
- Azure Key Vault secret pattern.
- Log Analytics and Application Insights.
They are intentionally simple so teams can adapt them to their own network, identity, private endpoint, policy, and naming standards.
This project is suitable as a learning tool, architecture accelerator, and local simulator. Before production use, add:
- Enterprise authentication and managed identity.
- Private networking and Key Vault-backed secrets.
- Production-grade orchestration and retry policy.
- Data contracts, schema evolution, and source file reconciliation.
- Formal access controls, PII controls, and audit retention.
- Performance testing with representative volumes.

Planned improvements:

- Add dbt-style documentation for SQL entities.
- Add optional Azure SQL deployment scripts.
- Add more asset classes and corporate action examples.
- Add sample data quality dashboard output.
- Add containerized SQL script bootstrap.
Contributions are welcome. See CONTRIBUTING.md.
Use fake or synthetic data only. Do not submit proprietary vendor code, real bank data, or confidential schemas.