
v0.2.0


@BurakUlver released this on 27 Mar at 16:10
Commit 41175d7

Correlation family — 5 correlation types, 2 new data quality alerts.

Added

  • 5 correlation types: Spearman, Kendall tau-b, Cramér's V, Phi K — with interactive tab-based HTML heatmaps
  • Phi K shown first as the universal metric: it covers all column types on a single comparable scale
  • Kendall tau-b computed with SciPy's C merge-sort implementation, O(n log n), replacing the O(n²) polars-statistics version
  • Cramér's V via Polars group_by + pivot contingency tables + ps.cramers_v (Rust)
  • Phi K via Polars contingency tables + ps.chisq_test (Rust), with the chi-squared value converted by phik_from_chi2 (noise correction applied)
  • HIGH_CORRELATION alert — scans Phi K matrix for pairs above threshold (default 0.8)
  • UNIFORM alert — chi-squared goodness-of-fit test via ps.chisq_goodness_of_fit (Rust)
  • Extensible Alert.details dict for structured metadata (column_b, method, p_value)
  • Alert HTML rendering: color-coded badges per alert type; HIGH_CORRELATION alerts show both columns
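To illustrate why the Kendall tau-b switch matters, here is a minimal pure-Python sketch of the O(n log n) idea (Knight's merge-sort algorithm): sort the pairs by x, then count the swaps a merge sort needs to order the y values. Each swap is one discordant pair, and tie counts adjust the denominator. The function names are hypothetical and this is not the package's or SciPy's code; in practice `scipy.stats.kendalltau(x, y)` returns tau-b by default.

```python
import math
from collections import Counter


def _merge_count(values):
    """Merge-sort `values`; return (sorted list, number of strict inversions).

    An inversion is a pair i < j with values[i] > values[j]; equal values
    are not counted, so ties in y never register as discordant.
    """
    if len(values) <= 1:
        return list(values), 0
    mid = len(values) // 2
    left, inv_left = _merge_count(values[:mid])
    right, inv_right = _merge_count(values[mid:])
    merged, inversions = [], inv_left + inv_right
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            # right[j] is smaller than every remaining left element
            merged.append(right[j])
            j += 1
            inversions += len(left) - i
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, inversions


def _tied_pairs(items):
    """Number of unordered pairs that share the same value."""
    return sum(t * (t - 1) // 2 for t in Counter(items).values())


def kendall_tau_b(x, y):
    """Kendall tau-b in O(n log n) via merge sort (hypothetical helper)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    n0 = n * (n - 1) // 2            # all pairs
    n1 = _tied_pairs(x)              # pairs tied in x
    n2 = _tied_pairs(y)              # pairs tied in y
    n3 = _tied_pairs(zip(x, y))      # pairs tied in both x and y
    # Sorting by (x, y) keeps y ascending inside each x-tie, so x-tied pairs add no inversions.
    ys = [b for _, b in sorted(zip(x, y))]
    _, discordant = _merge_count(ys)
    denom = math.sqrt((n0 - n1) * (n0 - n2))
    if denom == 0:
        return float("nan")          # a constant column: tau-b is undefined
    # n0 - n1 - n2 + n3 equals concordant + discordant, so this is (C - D) / denom
    return (n0 - n1 - n2 + n3 - 2 * discordant) / denom
```

For example, `kendall_tau_b([1, 1, 2, 3], [2, 1, 2, 3])` gives 0.8: four concordant pairs, no discordant ones, and one tie in each column.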
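The Cramér's V path builds a contingency table and then scores it. A minimal sketch, assuming the textbook uncorrected formula V = sqrt(chi² / (n · (min(rows, cols) - 1))). A `collections.Counter` stands in for the Polars group_by + pivot step, the helper names are hypothetical, and `ps.cramers_v` may differ in details such as bias correction.

```python
import math
from collections import Counter


def contingency_table(a, b):
    """Count co-occurrences of categories; rows follow `a`, columns follow `b`."""
    rows, cols = sorted(set(a)), sorted(set(b))
    counts = Counter(zip(a, b))
    return [[counts[(r, c)] for c in cols] for r in rows]


def chi2_statistic(table):
    """Pearson chi-squared statistic of independence for a contingency table."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n


def cramers_v(a, b):
    """Uncorrected Cramér's V for two categorical columns (hypothetical helper)."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty columns of equal length")
    table = contingency_table(a, b)
    chi2, n = chi2_statistic(table)
    k = min(len(table), len(table[0])) - 1
    if k == 0:
        return float("nan")  # one column has a single category
    return math.sqrt(chi2 / (n * k))
```

For example, `cramers_v(["x", "x", "y", "y"], ["p", "p", "q", "q"])` returns 1.0 (perfect association), while swapping the second column to `["p", "q", "p", "q"]` returns 0.0.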
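The UNIFORM check compares observed category counts with the equal counts a uniform distribution would produce. The sketch below is a stand-in for `ps.chisq_goodness_of_fit` with hypothetical names: it computes the statistic and an exact chi-squared p-value with k - 1 degrees of freedom using only the standard library. The notes do not say which p-value cutoff raises the alert, so the sketch stops at returning (statistic, p_value), the kind of value `Alert.details` can carry.

```python
import math
from collections import Counter


def chi2_sf(x, df):
    """P(X > x) for a chi-squared distribution with integer df >= 1.

    Uses the recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    for the regularized upper incomplete gamma Q, with z = x / 2, starting from
    Q(1, z) = exp(-z) for even df or Q(1/2, z) = erfc(sqrt(z)) for odd df.
    """
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))
    while a < df / 2.0:
        q += math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
        a += 1.0
    return q


def uniformity_test(values):
    """Chi-squared goodness-of-fit of `values` against equal category counts."""
    counts = Counter(values)
    k, n = len(counts), sum(counts.values())
    if k < 2:
        return 0.0, float("nan")  # a single category cannot be tested
    expected = n / k
    stat = sum((c - expected) ** 2 / expected for c in counts.values())
    return stat, chi2_sf(stat, k - 1)
```

A perfectly balanced column such as `list("abcd") * 25` yields a statistic of 0.0 and a p-value of 1.0; a 90/10 split between two categories yields a statistic of 64.0 and a p-value near zero.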

Changed

  • Correlation threshold default: 0.9 → 0.8
  • Pipeline order: correlations are now computed before alerts, so alert checks receive the correlation results
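With correlations computed first, the alert step can take the finished Phi K matrix as input. A minimal sketch, assuming a square matrix given as nested lists plus a list of column names, a strict "above threshold" comparison, and a hypothetical `Alert` dataclass that mirrors the `details` dict described above; the `"phik"` method label and the `value` key are illustrative assumptions.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Alert:
    """Hypothetical stand-in for the package's Alert type."""
    alert_type: str
    column: str
    details: dict = field(default_factory=dict)


def high_correlation_alerts(columns, phik_matrix, threshold=0.8):
    """Scan the upper triangle of a symmetric Phi K matrix for strongly related pairs."""
    alerts = []
    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):  # skip the diagonal and mirrored pairs
            value = phik_matrix[i][j]
            if math.isnan(value) or value <= threshold:
                continue  # undefined, or not above the threshold
            alerts.append(
                Alert(
                    alert_type="HIGH_CORRELATION",
                    column=columns[i],
                    # column_b and method are named in the release notes; value is illustrative
                    details={"column_b": columns[j], "method": "phik", "value": value},
                )
            )
    return alerts
```

Scanning only the upper triangle reports each pair once, which matches the HTML rendering showing both columns on a single alert.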

Dependencies

  • Added scipy>=1.12 (Kendall tau-b C implementation)
  • Added phik>=0.12 (Phi K scalar conversion)

PyPI: https://pypi.org/project/dataxid-profiling/0.2.0/
Full Changelog: v0.1.0...v0.2.0