v0.2.0
Correlation family — 5 correlation types, 2 new data quality alerts.
Added
- 5 correlation types, including Spearman, Kendall tau-b, Cramér's V and Phi K, with interactive tab-based HTML heatmaps
- Phi K displayed first as the universal metric (works for all column types on a single comparable scale)
- Kendall tau-b via scipy's C merge-sort implementation: O(n log n), replacing the O(n²) polars-statistics implementation
- Cramér's V via Polars `group_by` + `pivot` contingency tables + `ps.cramers_v` (Rust)
- Phi K via Polars contingency tables + `ps.chisq_test` (Rust) + `phik_from_chi2` with noise correction
- `HIGH_CORRELATION` alert: scans the Phi K matrix for pairs above the threshold (default 0.8)
- `UNIFORM` alert: chi-squared goodness-of-fit test via `ps.chisq_goodness_of_fit` (Rust)
- Extensible `Alert.details` dict for structured metadata (`column_b`, `method`, `p_value`)
- Alert HTML rendering: color-coded badges per type; `HIGH_CORRELATION` shows both columns
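Cramér's V is defined as V = sqrt(chi2 / (n * (min(r, c) - 1))) over an r x c contingency table, where chi2 is Pearson's chi-squared statistic. A minimal stdlib sketch of the uncorrected formula, with a `Counter` standing in for the Polars `group_by` + `pivot` step; `cramers_v` here is a hypothetical helper, not `ps.cramers_v`, and these notes do not say whether the Rust implementation applies a bias correction.

```python
import math
from collections import Counter


def cramers_v(a, b):
    """Hypothetical helper: uncorrected Cramér's V for two categorical sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and equally long")
    n = len(a)
    # Sparse contingency table: (row category, column category) -> count.
    cells = Counter(zip(a, b))
    row_totals = Counter(a)
    col_totals = Counter(b)
    # Pearson chi-squared over all r * c cells, including empty ones.
    chi2 = 0.0
    for r, r_tot in row_totals.items():
        for c, c_tot in col_totals.items():
            expected = r_tot * c_tot / n
            observed = cells.get((r, c), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals))
    if k < 2:
        return 0.0  # convention in this sketch: a constant column has no association
    return math.sqrt(chi2 / (n * (k - 1)))
```

A perfectly associated pair such as `cramers_v(["x", "x", "y", "y"], ["p", "p", "q", "q"])` returns 1.0, while fully independent columns return 0.0.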
Changed
- Correlation threshold default: 0.9 → 0.8
- Pipeline order: correlations computed before alerts (alerts now receive correlation results)
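Because alerts now receive the correlation results, a `HIGH_CORRELATION` scan reduces to walking the upper triangle of the Phi K matrix. A sketch, assuming a hypothetical `Alert` dataclass and a dict-of-dicts matrix; apart from `details` and its `column_b` key, the field names and the `"phi_k"` method label are illustrative, not the package's actual API. "Above the threshold" is read strictly (`>`).

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    # Hypothetical shape: the release only states that Alert carries an
    # extensible `details` dict (e.g. column_b, method, p_value).
    alert_type: str
    column: str
    details: dict = field(default_factory=dict)


def high_correlation_alerts(phik_matrix, threshold=0.8):
    """Flag each unordered column pair whose Phi K exceeds `threshold`.

    phik_matrix: hypothetical dict mapping column -> {other column: Phi K}.
    """
    alerts = []
    columns = list(phik_matrix)
    for i, col_a in enumerate(columns):
        # Upper triangle only: skips the diagonal and mirrored (b, a) pairs.
        for col_b in columns[i + 1:]:
            value = phik_matrix[col_a].get(col_b)
            if value is not None and value > threshold:
                alerts.append(Alert(
                    "HIGH_CORRELATION",
                    col_a,
                    {"column_b": col_b, "method": "phi_k", "value": value},
                ))
    return alerts
```

Keeping the second column in `details` rather than in a dedicated field is what lets the HTML renderer show both columns without changing the alert's shape for other alert types.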
Dependencies
- Added `scipy>=1.12` (Kendall tau-b C implementation)
- Added `phik>=0.12` (Phi K scalar conversion)
PyPI: https://pypi.org/project/dataxid-profiling/0.2.0/
Full Changelog: v0.1.0...v0.2.0