goldenmatch 1.30.0
New since 1.29.0:
- Native PPRL bloom CLK kernel (opt-in, default off). New goldenmatch-native
symbol bloom_clk_batch (rayon + GIL-release, 256-bit Cryptographic Longterm
Key encoding) accelerates the PPRL bloom_filter transform. Reachable via
GOLDENMATCH_NATIVE=1; pure-Python stays the reproducible default and the
graceful fallback when the symbol is absent. Needs goldenmatch-native 0.1.5
(released separately). (#826)
- Probabilistic EM training-pair sampling is now deterministic (#829).
_sample_blocked_pairs applied a seeded shuffle to bare block indices whose order was
itself non-deterministic (parallel / hash-bucketed construction), so the EM
training sample (and thus the m/u weights, threshold, and precision/recall)
varied run-to-run. On one CI run, three invocations of the identical
probabilistic path gave historical_50k pairwise F1 of 0.805 / 0.779 / 0.643.
The fix sorts blocks by their stable block_key before the seeded shuffle;
post-fix the three bench harnesses agree within 0.002. The committed Splink
head-to-head and bake-off numbers are now deterministic (see
docs/benchmarks/2026-06-09-splink-bakeoff.md). The previously published
dblp_acm = 0.879 was a non-deterministic lucky draw; the reproducible value
is 0.377 -- use the weighted path for bibliographic data (0.964 on DBLP-ACM).
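
The opt-in behaviour of the native kernel (environment flag, pure-Python default, fallback when the symbol is absent) can be pictured with a small sketch. It is not goldenmatch's actual loader: the helper name resolve_bloom_kernel and the import name goldenmatch_native are assumptions made for illustration; only the GOLDENMATCH_NATIVE=1 flag and the bloom_clk_batch symbol come from the notes above.

```python
import importlib
import os


def resolve_bloom_kernel(pure_python_impl,
                         module_name="goldenmatch_native",  # assumed import name
                         symbol="bloom_clk_batch"):
    """Return the native kernel only when opted in and available.

    Hypothetical helper: falls back to the pure-Python callable when the
    flag is unset, the module cannot be imported, or the symbol is missing.
    """
    if os.environ.get("GOLDENMATCH_NATIVE") != "1":
        return pure_python_impl  # default path: reproducible pure Python
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return pure_python_impl  # native package not installed
    # An older native build may lack the symbol; fall back gracefully.
    return getattr(module, symbol, pure_python_impl)
```

Under these assumptions, a caller would pass its existing pure-Python transform and use whatever callable comes back, so an unset flag leaves behaviour unchanged.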
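
The determinism fix (sort by a stable key, then apply the seeded shuffle) can be illustrated with a minimal sketch. This is not the library's _sample_blocked_pairs: the function name sample_pairs_deterministic and the dict-of-lists block layout are assumptions chosen to keep the example self-contained.

```python
import random


def sample_pairs_deterministic(blocks, n_pairs, seed=0):
    """Sample up to n_pairs within-block record pairs reproducibly.

    blocks: mapping of block_key -> list of record ids (assumed layout).
    A seeded shuffle is only reproducible if its input order is stable,
    so the keys are sorted before shuffling.
    """
    rng = random.Random(seed)
    keys = sorted(blocks)  # stable order, independent of construction order
    rng.shuffle(keys)      # seeded shuffle over a deterministic sequence
    pairs = []
    for key in keys:
        members = sorted(blocks[key])
        for i in range(len(members)):
            for j in range(i + 1, len(members)):
                pairs.append((members[i], members[j]))
                if len(pairs) >= n_pairs:
                    return pairs
    return pairs
```

With this ordering, two block maps holding the same contents but built in a different order yield the same sample for a given seed, which is the property the unsorted version lacked.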
Full changelog: packages/python/goldenmatch/CHANGELOG.md