goldenmatch v1.13.0
Release plumbing wave. No algorithm changes - DQbench / Febrl3 / NCVR / DBLP-ACM numbers unchanged from v1.12.0.
Added
- Typed accessor API on MatchkeyConfig / MatchkeyField (PR #151). New properties: MatchkeyConfig.fuzzy_threshold, MatchkeyField.fuzzy_scorer, MatchkeyField.fuzzy_weight, MatchkeyField.resolved_field. Each raises ValueError when the matchkey is not a fuzzy/weighted type, so the invariant is now enforceable in pyright strict.
from goldenmatch.config.schemas import MatchkeyConfig, MatchkeyField
mk = MatchkeyConfig(
    name="identity",
    type="weighted",
    threshold=0.85,
    fields=[
        MatchkeyField(field="name", transforms=["lowercase"], scorer="jaro_winkler", weight=1.0),
    ],
)
assert mk.fuzzy_threshold == 0.85 # safe on weighted matchkey
# mk.fuzzy_threshold on an exact matchkey raises ValueError
- docs/scale-envelope.md (PR #149): Polars / DuckDB / Ray operating ranges plus block-size failure modes.
- Postgres CI lane (PR #144): flipped from skipped to live.
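For readers unfamiliar with the typed-accessor pattern from PR #151, here is a minimal standalone sketch of the idea. It is not goldenmatch's implementation: SketchMatchkey, its fields, and the error message are hypothetical stand-ins, and the real MatchkeyConfig may be built differently. It only shows why a raising property lets a strict type checker see a plain float instead of an optional one.

```python
from dataclasses import dataclass
from typing import Literal, Optional


@dataclass
class SketchMatchkey:
    """Hypothetical stand-in for MatchkeyConfig; not goldenmatch's real class."""

    name: str
    type: Literal["exact", "weighted"]
    threshold: Optional[float] = None  # only meaningful for weighted matchkeys

    @property
    def fuzzy_threshold(self) -> float:
        # Narrow Optional[float] to float once, here, instead of at every call site.
        if self.type != "weighted" or self.threshold is None:
            raise ValueError(f"matchkey {self.name!r} has no fuzzy threshold")
        return self.threshold


weighted = SketchMatchkey(name="identity", type="weighted", threshold=0.85)
exact = SketchMatchkey(name="ssn", type="exact")

print(weighted.fuzzy_threshold)

try:
    exact.fuzzy_threshold
except ValueError as err:
    print(f"rejected: {err}")
```

Because the property's return type is float rather than Optional[float], callers need no None checks or type-suppression comments. The exact-matchkey case fails at the access site rather than further downstream.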
Changed
- PyPI metadata corrected (PR #148): [project.urls] Homepage / Repository / Documentation now point at the monorepo. This release is what makes the refresh land on PyPI.
Fixed
- Reproducibility of all four published benchmark numbers (PR #152, replaces #150): DQbench composite 91.04, DBLP-ACM 0.9641, Febrl3 0.9443, NCVR 0.9719 all reproduce from a fresh clone. See docs/reproducing-benchmarks.md.
Internal (contributors only)
- Ruff lint expanded to F / I / B-narrowed / UP rule sets across packages/python/ (PR #146).
- Pyright strict on the 21-file core slice of goldenmatch (PR #147). Typed accessors in PR #151 eliminated 7 type-suppression workarounds.
Benchmarks (zero-config, no LLM)
Unchanged vs v1.12.0 - algorithm not touched this wave.
| Dataset | v1.12.0 | v1.13.0 | Delta |
|---|---|---|---|
| DBLP-ACM | 0.9641 | 0.9641 | +0.0000 |
| Febrl3 | 0.9443 | 0.9443 | +0.0000 |
| NCVR | 0.9719 | 0.9719 | +0.0000 |
| DQbench composite | 91.04 | 91.04 | +0.00 |