Data Quality
Giacomo Saccaggi edited this page Jun 19, 2026 · 1 revision
| Check | Description |
|---|---|
| Overview | Row/column count, memory usage, type distribution |
| Missing values | Count and percentage per column |
| Cardinality | Unique values, flags near-unique (possible IDs) and binary columns |
| Constants | Columns with only 1 unique value |
| Duplicates | Exact duplicate rows |
| High correlations | Pairs with correlation ≥ 0.95 |
| Type inference | Detects dates and numbers stored as strings |
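
To make the checks in the table concrete, here is a minimal, dependency-free sketch of how missing values, cardinality flags, constants, and duplicate rows could be computed. It is not the `scomp_link` implementation: the function names, the toy `columns` data, and the 0.95 near-unique ratio are all assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical toy data: column name -> list of values (None marks a missing value).
columns = {
    "id":      [1, 2, 3, 4, 4],
    "country": ["IT", "DE", None, "FR", "FR"],
    "active":  [True, False, True, False, False],
    "source":  ["web", "web", "web", "web", "web"],
}


def missing_summary(cols):
    """Count and percentage of missing (None) values per column."""
    out = {}
    for name, values in cols.items():
        count = sum(v is None for v in values)
        out[name] = {"count": count, "pct": 100.0 * count / len(values)}
    return out


def cardinality_summary(cols, near_unique_ratio=0.95):
    """Unique non-missing values per column, with illustrative flags.

    near_unique_ratio is an assumed threshold, not a documented one.
    """
    out = {}
    for name, values in cols.items():
        present = [v for v in values if v is not None]
        unique = len(set(present))
        out[name] = {
            "unique": unique,
            # Almost every value is distinct: the column may be an identifier.
            "near_unique": bool(present) and unique / len(present) >= near_unique_ratio,
            "binary": unique == 2,
            "constant": unique == 1,
        }
    return out


def duplicate_row_count(cols):
    """Number of rows that exactly repeat an earlier row."""
    rows = zip(*cols.values())  # rebuild rows from the column lists
    counts = Counter(rows)
    return sum(c - 1 for c in counts.values())


print(missing_summary(columns)["country"])
print(cardinality_summary(columns)["source"])
print(duplicate_row_count(columns))
```

In this toy data, `country` has one missing value, `source` is constant, `active` is binary, and the last two rows are exact duplicates.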
```python
from scomp_link import DataQualityReport

# df is the dataset to check, loaded beforehand
dqr = DataQualityReport(df)
report = dqr.generate()

# Access individual sections
print(report['overview'])
print(report['missing'])
print(report['constants'])
print(report['duplicates'])
print(report['correlations'])
print(report['cardinality'])

# Generate standalone HTML report
dqr.save_html('quality_report.html')
```

The report can also be generated from the command line:

```bash
scomp-link quality --data raw_data.csv --output quality_report.html
```

The generated HTML report includes:
- Summary cards (rows, columns, memory, duplicates, constants)
- Missing values table (sorted by percentage)
- Cardinality table with flags
- Type inference table
- High correlation pairs
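
The type inference and high correlation sections listed above can be illustrated with a small standard-library sketch. This is an assumption-laden illustration, not the library's code: `infer_string_type`, `pearson`, `high_correlations`, and the single `%Y-%m-%d` date format are hypothetical, and the use of the absolute value of the correlation is also an assumption (the table only states "≥ 0.95").

```python
import math
from datetime import datetime
from itertools import combinations


def infer_string_type(values, date_format="%Y-%m-%d"):
    """Return 'number', 'date' or 'string' for a column of string values."""
    present = [v for v in values if v is not None]
    if not present:
        return "string"

    def all_parse(parser):
        # True only if every non-missing value is accepted by the parser.
        for v in present:
            try:
                parser(v)
            except ValueError:
                return False
        return True

    if all_parse(float):
        return "number"
    if all_parse(lambda v: datetime.strptime(v, date_format)):
        return "date"
    return "string"


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists; None if either is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return None
    return cov / math.sqrt(vx * vy)


def high_correlations(numeric_cols, threshold=0.95):
    """List (col_a, col_b, r) for pairs whose |r| meets the threshold (assumed rule)."""
    pairs = []
    for (a, xs), (b, ys) in combinations(numeric_cols.items(), 2):
        r = pearson(xs, ys)
        if r is not None and abs(r) >= threshold:
            pairs.append((a, b, r))
    return pairs


# Hypothetical example data.
numeric = {
    "height_cm": [150, 160, 170, 180],
    "height_in": [59.1, 63.0, 66.9, 70.9],
    "weight_kg": [70, 50, 80, 60],
}
print(infer_string_type(["1", "2.5", None]))
print(infer_string_type(["2024-01-31", "2023-12-01"]))
print([(a, b) for a, b, _ in high_correlations(numeric)])
```

Only the two height columns are reported as a pair here, since they encode the same measurement in different units.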