Data Quality

Giacomo Saccaggi edited this page Jun 19, 2026 · 1 revision

What It Checks

Check              Description
Overview           Row/column count, memory usage, type distribution
Missing values     Count and percentage per column
Cardinality        Unique values, flags near-unique (possible IDs) and binary columns
Constants          Columns with only 1 unique value
Duplicates         Exact duplicate rows
High correlations  Pairs with correlation ≥ 0.95
Type inference     Detects dates and numbers stored as strings
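
To make the checks in the table concrete, here is a minimal sketch that approximates several of them with plain pandas. It is not scomp_link's implementation: the function name `basic_quality_checks`, the result keys, the 95% cutoff for "near-unique", and the use of absolute correlation are all assumptions made for illustration.

```python
import pandas as pd


def basic_quality_checks(df: pd.DataFrame, corr_threshold: float = 0.95) -> dict:
    """Hypothetical helper: approximate the table's checks with plain pandas."""
    n_rows = len(df)

    # Missing values: count and percentage per column.
    missing = pd.DataFrame({
        "count": df.isna().sum(),
        "percent": df.isna().mean() * 100,
    })

    # Cardinality: number of distinct non-null values per column.
    nunique = df.nunique()
    constants = nunique[nunique == 1].index.tolist()
    binary = nunique[nunique == 2].index.tolist()
    # Assumption: "near-unique" means at least 95% of rows hold distinct values.
    near_unique = nunique[nunique >= 0.95 * n_rows].index.tolist()

    # Duplicates: rows that exactly repeat an earlier row.
    duplicate_rows = int(df.duplicated().sum())

    # High correlations: numeric column pairs with |r| >= threshold.
    # Constant columns yield NaN correlations, which never pass the comparison.
    corr = df.select_dtypes(include="number").corr().abs()
    cols = corr.columns
    high_correlations = [
        (cols[i], cols[j], float(corr.iloc[i, j]))
        for i in range(len(cols))
        for j in range(i + 1, len(cols))
        if corr.iloc[i, j] >= corr_threshold
    ]

    return {
        "missing": missing,
        "constants": constants,
        "binary": binary,
        "near_unique": near_unique,
        "duplicate_rows": duplicate_rows,
        "high_correlations": high_correlations,
    }


# Small made-up dataset: the last row repeats the one before it.
sample = pd.DataFrame({
    "id": [1, 2, 3, 4, 4],
    "x": [1.0, 2.0, 3.0, 4.0, 4.0],
    "y": [2.0, 4.0, 6.0, 8.0, 8.0],
    "flag": ["a", "b", "a", "b", "b"],
    "const": [7, 7, 7, 7, 7],
    "note": ["u", None, "v", "w", "w"],
})
result = basic_quality_checks(sample)
print(result["missing"])
print(result["high_correlations"])
```

Running the checks by hand like this can help when interpreting a report, but the thresholds and definitions used by `DataQualityReport` may differ from the ones assumed here.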

Python API

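import pandas as pd
from scomp_link import DataQualityReport

# Load the data to profile; df is assumed here to be a pandas DataFrame
df = pd.read_csv('raw_data.csv')

dqr = DataQualityReport(df)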
report = dqr.generate()

# Access individual sections
print(report['overview'])
print(report['missing'])
print(report['constants'])
print(report['duplicates'])
print(report['correlations'])
print(report['cardinality'])

# Generate standalone HTML report
dqr.save_html('quality_report.html')

CLI

scomp-link quality --data raw_data.csv --output quality_report.html

HTML Report Contents

The generated HTML report includes:

  • Summary cards (rows, columns, memory, duplicates, constants)
  • Missing values table (sorted by percentage)
  • Cardinality table with flags
  • Type inference table
  • High correlation pairs
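
As a rough illustration of what a type inference table might flag, the sketch below classifies a column of strings as numeric, date, or plain string using only the standard library. The helper names, the two date formats, and the 90% agreement threshold are assumptions for this example and do not describe how scomp_link performs the detection.

```python
from datetime import datetime

# Assumption: only these two date layouts are recognised in this sketch.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def looks_numeric(value: str) -> bool:
    """True if float() can parse the string (this also accepts 'nan' and 'inf')."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def looks_like_date(value: str) -> bool:
    """True if the string matches one of the assumed date formats."""
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            continue
    return False


def infer_string_column(values, min_share: float = 0.9) -> str:
    """Hypothetical helper: return 'numeric', 'date', or 'string' for a column."""
    # Ignore missing entries and blank strings.
    present = [v.strip() for v in values if isinstance(v, str) and v.strip()]
    if not present:
        return "string"
    numeric_share = sum(looks_numeric(v) for v in present) / len(present)
    if numeric_share >= min_share:
        return "numeric"
    date_share = sum(looks_like_date(v) for v in present) / len(present)
    if date_share >= min_share:
        return "date"
    return "string"


print(infer_string_column(["1", "2.5", "-3", None]))
print(infer_string_column(["2024-01-31", "2023-12-01"]))
print(infer_string_column(["a", "b", "1"]))
```

Numbers are tested before dates, so a column of bare years such as "2024" is treated as numeric in this sketch.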
