goldenmatch 1.29.0
Probabilistic (Fellegi-Sunter) auto-config v2 is now on by default; setting GOLDENMATCH_FS_AUTOCONFIG_V2=0 restores the legacy field set byte-identically.
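As a minimal sketch of opting out from inside a Python process: the only name taken from this release is the GOLDENMATCH_FS_AUTOCONFIG_V2 variable. The `legacy_fs_autoconfig` helper is hypothetical and not part of goldenmatch. These notes do not say whether the flag is read at import time or at call time, so exporting it in the shell before starting Python is the safer route when in doubt.

```python
import os
from contextlib import contextmanager

# Flag name taken from the release note; everything else here is illustrative.
FLAG = "GOLDENMATCH_FS_AUTOCONFIG_V2"


@contextmanager
def legacy_fs_autoconfig():
    """Hypothetical helper: set the flag to "0" for a block, then restore it."""
    previous = os.environ.get(FLAG)
    os.environ[FLAG] = "0"
    try:
        yield
    finally:
        # Put the environment back exactly as it was found.
        if previous is None:
            os.environ.pop(FLAG, None)
        else:
            os.environ[FLAG] = previous
```

Wrapping a run in `with legacy_fs_autoconfig(): ...` keeps the opt-out scoped to that block, assuming the flag is consulted when the probabilistic auto-config is built rather than when the package is imported.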
Under the shared bench_er_headtohead evaluator (pairwise F1), the probabilistic auto-config path now matches or beats Splink on every measured ER dataset:
| Dataset | goldenmatch v2 | Splink |
|---|---|---|
| historical_50k | 0.779 | 0.757 |
| febrl3 | 0.991 | 0.965 |
| synthetic_person | 0.998 | 0.996 |
| dblp_acm | 0.879 | (skips) |
Levers (probabilistic auto-config path only; the weighted and zero-config dedupe_df paths are untouched):
- admit dob/date columns as a levenshtein discriminator;
- drop redundant name composites when atomic given+family exist;
- additively diversify blocking onto orthogonal stable keys (date-year + postcode/zip);
- admit description/multi_name as token_sort.
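The additive blocking lever can be pictured with a small standalone sketch. This is not goldenmatch's implementation: the record layout, the `block_pairs` and `additive_blocking` helpers, and both key functions are assumptions made for illustration. The point it shows is that adding an orthogonal key (date-year plus postcode) only ever adds candidate pairs, so a pair missed by one key because of a typo can still be recovered by another.

```python
from collections import defaultdict
from itertools import combinations


def block_pairs(records, key_fn):
    """Return the candidate id pairs whose records share a non-empty blocking key."""
    buckets = defaultdict(list)
    for rec in records:
        key = key_fn(rec)
        if key:  # records with a missing key are not blocked on this pass
            buckets[key].append(rec["id"])
    pairs = set()
    for ids in buckets.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


def additive_blocking(records, key_fns):
    """Union the candidate pairs from every key: new keys add pairs, never remove them."""
    pairs = set()
    for key_fn in key_fns:
        pairs |= block_pairs(records, key_fn)
    return pairs


def by_surname(rec):
    return rec["surname"]


def by_year_postcode(rec):
    # Hypothetical orthogonal key: year of birth plus postcode.
    if rec["dob"] and rec["postcode"]:
        return (rec["dob"][:4], rec["postcode"])
    return None


records = [
    {"id": 1, "surname": "smith", "dob": "1980-02-11", "postcode": "AB1 2CD"},
    {"id": 2, "surname": "smyth", "dob": "1980-02-11", "postcode": "AB1 2CD"},
    {"id": 3, "surname": "smith", "dob": "1975-07-30", "postcode": "ZZ9 9ZZ"},
]

base = block_pairs(records, by_surname)
combined = additive_blocking(records, [by_surname, by_year_postcode])
```

Here `base` contains only the pair (1, 3), while `combined` also contains (1, 2): records 1 and 2 differ in surname spelling but agree on birth year and postcode.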
Note: these are pairwise F1 scores under one shared evaluator; the often-cited ~0.97 Splink figure on historical_50k is a cluster-level metric, not within-cluster pairwise F1.
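For readers unfamiliar with the metric, here is a minimal sketch of within-cluster pairwise F1 as it is commonly defined. It is not the bench_er_headtohead code, and the function names are hypothetical. Every unordered pair of records placed in the same cluster counts as a predicted match, and precision and recall are computed over those pairs against the pairs implied by the ground-truth clusters.

```python
from itertools import combinations


def within_cluster_pairs(clusters):
    """Expand clusters (iterables of record ids) into the set of co-clustered id pairs."""
    pairs = set()
    for members in clusters:
        pairs.update(combinations(sorted(members), 2))
    return pairs


def pairwise_f1(predicted, truth):
    """Pairwise F1 between predicted and ground-truth clusterings."""
    pred_pairs = within_cluster_pairs(predicted)
    true_pairs = within_cluster_pairs(truth)
    if not pred_pairs and not true_pairs:
        return 1.0  # nothing to link and nothing linked
    true_positives = len(pred_pairs & true_pairs)
    precision = true_positives / len(pred_pairs) if pred_pairs else 0.0
    recall = true_positives / len(true_pairs) if true_pairs else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


truth = [[1, 2, 3], [4, 5]]
predicted = [[1, 2], [3], [4, 5]]
score = pairwise_f1(predicted, truth)
```

In this example the prediction recovers 2 of the 4 true pairs with no false pairs, so precision is 1.0, recall is 0.5, and `score` is 2/3. Because large clusters contribute quadratically many pairs, this metric can differ substantially from a cluster-level score on the same output.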
Wheel and sdist are cosign-keyless signed (sigstore bundles attached) with a build-provenance attestation.