v1.25.0

@benzsevern released this 01 Jun 13:49
1ccab9e

Arrow-native pipeline groundwork (roadmap Phases 0-6), single-node memory and wall-clock optimizations, an optional native Rust kernel for the cluster oversized-split path, and several user-facing bug fixes.

All Arrow / native work is additive and opt-in: the default pure-Python + Polars pipeline is unchanged in behavior and remains the byte-for-byte reference. The columnar entry points and goldenmatch._native kernels are currently exercised by the bench/profiler harnesses; they will be wired into the pipeline once follow-up parity work lands.

Highlights:

  • Columnar pair-stream scorer entry points + native columnar inner loop for score_blocks_columnar; two-frame ClusterFrames representation; hash-by-cluster_id partitioner.
  • Native (Rust / Arrow-C) kernels in the optional goldenmatch._native extension: build_clusters, record_fingerprints, dedup_pairs, and a max-weight-spanning-tree oversized-split kernel.
  • Native Polars chains for standardization (name_proper, address) and golden/bucket memory slimming (default ON): -2.6 GB and -3.8 GB peak memory at 10M rows.
  • Fix: GoldenCheck quality scan silently dropped every finding in the scan-only path (MCP scan_quality, A2A quality skill, web /api/v1/quality). Findings now serialize correctly.
  • Fix: prep-cache id() recycle flake.
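
For readers unfamiliar with the hazard behind the last fix: in CPython, id() values are only unique among live objects, so a cache keyed on id(obj) can intermittently return a stale entry after the original object is freed and a new one reuses the same id. The sketch below illustrates one common guard against that class of bug. It is not the goldenmatch prep-cache, and the release note does not describe the actual fix; IdentityCache, get_or_compute, and prep are hypothetical names chosen for illustration.

```python
class IdentityCache:
    """Hypothetical cache keyed by id(obj) that guards against id() reuse."""

    def __init__(self):
        # id -> (strong reference to the key object, cached value)
        self._entries = {}

    def get_or_compute(self, obj, compute):
        entry = self._entries.get(id(obj))
        # Check identity, not just the id: a recycled id would map to a
        # different object than the one stored in the entry.
        if entry is not None and entry[0] is obj:
            return entry[1]
        value = compute(obj)
        # Storing obj keeps it alive, so its id cannot be reissued while
        # this entry exists.
        self._entries[id(obj)] = (obj, value)
        return value


calls = []


def prep(rows):
    # Stand-in for an expensive preparation step.
    calls.append(1)
    return [s.strip() for s in rows]


cache = IdentityCache()
rows = ["  Ada ", " Grace"]        # lists are unhashable, hence id() keying
first = cache.get_or_compute(rows, prep)
second = cache.get_or_compute(rows, prep)    # served from the cache
other = cache.get_or_compute(list(rows), prep)  # equal contents, new object
```

The trade-off in this sketch is that the cache pins every key object in memory until its entry is evicted; an approach based on weak references would avoid that, at the cost of only working for types that support them.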

Full detail: packages/python/goldenmatch/CHANGELOG.md (section 1.25.0). PRs #588-#650.

Note: 1.25.0 was published to PyPI via workflow_dispatch on 2026-06-01; this Release backfills the missing tag/Release object so release history stays consistent.