v1.0.0

@fdosani released this 25 May 21:18 · commit 377c876 · 7 commits to main since this release

What's Changed

Full Changelog: v0.19.5...v1.0.0

Summary

This is the v1.0.0 general availability release of DataComPy, a major version bump from 0.19.x. It introduces a new comparator architecture, sensitive column handling, and several quality-of-life improvements across all backends.

Breaking Changes

  • Comparator subpackage (datacompy/comparator/): New strategy-pattern architecture for column comparison logic. Numeric, string, and array comparators are now backend-specific classes (PandasNumericComparator, SparkStringComparator, etc.) rather than inline logic. Custom comparators can be injected into any backend compare class.
  • validate_tolerance_parameter is now a public API (renamed from _validate_tolerance_parameter).
  • Fugue integration removed.
  • Minimum Python version is now 3.12 (CI/tooling); the PySpark dependency is split by Python version.
  • Snowflake: snowflake-snowpark-python minimum bumped to 1.37.
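
The strategy-pattern idea behind the comparator subpackage can be sketched in plain Python. This is an illustration of the pattern only, not DataComPy's actual classes: the names BaseComparator, ExactComparator, NumericComparator, CaseInsensitiveStringComparator, and compare_columns are hypothetical, and columns are modelled as plain lists rather than backend DataFrames.

```python
from abc import ABC, abstractmethod
from math import isclose


class BaseComparator(ABC):
    """Hypothetical strategy interface: one comparison rule per column."""

    @abstractmethod
    def compare(self, left: list, right: list) -> list[bool]:
        """Return one match flag per row."""


class ExactComparator(BaseComparator):
    """Fallback strategy: values match only when they are equal."""

    def compare(self, left: list, right: list) -> list[bool]:
        return [a == b for a, b in zip(left, right)]


class NumericComparator(BaseComparator):
    """Numeric strategy: values match within an absolute tolerance."""

    def __init__(self, abs_tol: float = 0.0) -> None:
        self.abs_tol = abs_tol

    def compare(self, left: list, right: list) -> list[bool]:
        return [
            isclose(a, b, rel_tol=0.0, abs_tol=self.abs_tol)
            for a, b in zip(left, right)
        ]


class CaseInsensitiveStringComparator(BaseComparator):
    """Custom strategy: strings match regardless of letter case."""

    def compare(self, left: list, right: list) -> list[bool]:
        return [a.casefold() == b.casefold() for a, b in zip(left, right)]


def compare_columns(left: dict, right: dict, comparators: dict) -> dict:
    """Compare shared columns, using an injected comparator where one is given."""
    default = ExactComparator()
    return {
        col: comparators.get(col, default).compare(left[col], right[col])
        for col in sorted(left.keys() & right.keys())
    }


# Usage: inject a different strategy per column.
left = {"amount": [1.0, 2.0], "name": ["Ann", "BOB"], "id": [1, 2]}
right = {"amount": [1.05, 2.5], "name": ["ann", "bob"], "id": [1, 3]}
result = compare_columns(
    left,
    right,
    comparators={
        "amount": NumericComparator(abs_tol=0.1),
        "name": CaseInsensitiveStringComparator(),
    },
)
```

Keeping each rule behind a single compare method means the calling code never branches on column type; swapping in custom logic for one column only requires passing a different object. The real base classes and method signatures are in the datacompy/comparator/ subpackage and may differ from this sketch.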

New Features

  • Sensitive column masking across all four backends (Pandas, Polars, Spark, Snowflake): columns can be hidden or hashed without modifying the original DataFrames.
  • Custom comparators: pass a comparator instance per column to any backend for fully custom comparison logic.
  • cols_with_mismatches method: programmatic access to the set of mismatching column names.
  • cache_intermediates option for SparkSQLCompare: controls intermediate DataFrame caching to tune Spark job performance.
  • Pandas 3 support.
  • Per-column tolerance as a dict in addition to a global float.

Fixes

  • SparkSQLCompare: forbid case-sensitive join columns.
  • Snowflake: max_diff now correctly defaults to 0 when None.
  • Fixed None handling for join columns.

Other

  • Jinja2 template-based report rendering.
  • Copyright year updated to 2026.
  • CLAUDE.md added.
  • Dependency ranges updated across all backends.