Financial firms depend on reference data, the master records describing every security they trade. ISIN identifiers, currency codes, asset classes, expiry dates. When this data is wrong, trades fail, risk calculations break, and regulatory reports become inaccurate.
This project is a full reference data quality system built to simulate the validation infrastructure used by data operations teams at hedge funds. It ingests a securities dataset, runs automated validation across five data quality dimensions, enforces SLA rules against defined thresholds, scores vendor data quality, and surfaces everything in a live Streamlit dashboard.
data/securities.csv
|
v
db/schema.py Creates PostgreSQL database and tables
db/seed.py Loads securities data into PostgreSQL
|
v
validation/sql_checks.py 10 SQL validation queries
validation/schema_validation.py Pandera schema enforcement
validation/sla_engine.py SLA rules engine and vendor scorecard
|
v
dashboard/app.py Streamlit dashboard: metrics, failures, SLA, drill-down
| Check | Dimension | Severity | Result |
|---|---|---|---|
| Null ISIN values | Completeness | High | FAIL |
| Null CUSIP values | Completeness | High | FAIL |
| Duplicate ISIN values | Uniqueness | High | FAIL |
| Invalid ISIN format | Validity | High | FAIL |
| Negative price values | Validity | High | FAIL |
| Expired securities with active status | Consistency | Medium | FAIL |
| Currency country mismatch | Consistency | Medium | FAIL |
| Invalid status values | Validity | Medium | PASS |
| Null sector values | Completeness | Low | PASS |
| Zero price on active securities | Validity | Medium | PASS |
| Pandera: price_usd schema enforcement | Validity | High | FAIL |
| Rule | Threshold | Actual | Status |
|---|---|---|---|
| ISIN Completeness | 95% | 94.4% | FAIL |
| Zero Duplicate ISINs | 0 | 1 | FAIL |
| Zero Negative Prices | 0 | 1 | FAIL |
| All Active Securities Have Price | 0 missing | 0 missing | PASS |
| No Expired Securities Marked Active | 0 | 1 | FAIL |
| Vendor A ISIN Completeness | 95% | 100.0% | PASS |
| Vendor B ISIN Completeness | 95% | 88.9% | FAIL |
| Finding | Detail |
|---|---|
| Total securities | 18 records across equity and fixed income |
| Validation failures | 8 out of 11 checks failed |
| SLA breaches | 5 out of 7 rules breached |
| Critical issues | Null ISIN, duplicate ISIN, invalid ISIN format, negative price |
| Consistency violations | 1 expired security still marked Active, 1 currency/country mismatch |
| Vendor performance | Vendor A clean at 100% ISIN completeness, Vendor B failing at 88.9% |
Python | PostgreSQL | Pandas | Pandera | Streamlit | SQLAlchemy
-
Install dependencies:
pip install pandas pandera streamlit sqlalchemy psycopg2-binary -
Set database password:
export DB_PASSWORD=your_password_here -
Create database and tables:
python db/schema.py -
Load securities data:
python db/seed.py -
Run SQL validation checks:
python validation/sql_checks.py -
Run schema validation:
python validation/schema_validation.py -
Run SLA engine:
python validation/sla_engine.py -
Launch dashboard:
streamlit run dashboard/app.py