v0.3.1 -- Domain Packs, Evaluate CLI, Incremental Matching, GitHub Actions Try-It
What's New
Domain Packs (7 built-in)
Pre-built YAML rulebooks for instant domain-specific entity resolution:
- Electronics -- model numbers, SKUs, specs (36 brands)
- Software -- versions, editions, platforms (23 brands)
- Healthcare -- NDC, NPI, ICD-10, pharma brands (20 brands)
- Financial -- CUSIP, ISIN, LEI, institutions (20 brands)
- Real Estate -- ZIP, APN, MLS, property attributes (10 brokerages)
- People -- SSN, DOB, phone, email patterns
- Retail -- UPC, EAN, GTIN, CPG brands (20 brands)
Custom packs: drop a YAML file in .goldenmatch/domains/ and it's auto-discovered.
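As a rough illustration of directory-based auto-discovery, the sketch below lists the YAML files found under .goldenmatch/domains/. It is not goldenmatch's actual loader: the function name discover_packs, the accepted .yaml/.yml suffixes, and the automotive.yaml file name are all assumptions made for the example.

```python
from pathlib import Path
import tempfile


def discover_packs(project_root):
    """Return the names of YAML files under <root>/.goldenmatch/domains/.

    Hypothetical helper: the real discovery logic may differ.
    """
    domains_dir = Path(project_root) / ".goldenmatch" / "domains"
    if not domains_dir.is_dir():
        return []
    # Assumption: both .yaml and .yml suffixes count as pack files.
    files = [p for p in domains_dir.iterdir() if p.suffix in {".yaml", ".yml"}]
    return sorted(p.stem for p in files)


# Build a throwaway project layout to show the expected directory structure.
with tempfile.TemporaryDirectory() as root:
    domains = Path(root) / ".goldenmatch" / "domains"
    domains.mkdir(parents=True)
    (domains / "automotive.yaml").write_text("# hypothetical pack contents\n")
    (domains / "notes.txt").write_text("not a pack\n")
    found = discover_packs(root)

print(found)
```

The pack schema itself is not shown here; use one of the built-in packs as the reference for what a rulebook contains.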
New CLI Commands
- goldenmatch evaluate -- measure precision/recall/F1 against ground truth CSV
- goldenmatch incremental -- match new CSV records against an existing base dataset without re-running the full pipeline
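For readers unfamiliar with the metrics the evaluate command reports, here is a minimal sketch of pairwise precision, recall, and F1 for entity resolution. It shows the standard definitions only; the helper names and the cluster-based input format are assumptions for illustration, not the command's internals or its CSV schema.

```python
from itertools import combinations


def pairs_from_clusters(clusters):
    """Expand clusters of record IDs into a set of unordered matched pairs."""
    pairs = set()
    for members in clusters:
        for a, b in combinations(sorted(members), 2):
            pairs.add((a, b))
    return pairs


def pairwise_prf(predicted_pairs, truth_pairs):
    """Compute pairwise precision, recall, and F1 from two sets of pairs."""
    true_positives = len(predicted_pairs & truth_pairs)
    precision = true_positives / len(predicted_pairs) if predicted_pairs else 0.0
    recall = true_positives / len(truth_pairs) if truth_pairs else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Ground truth: records 1-3 are one entity, records 4-5 are another.
truth = pairs_from_clusters([{1, 2, 3}, {4, 5}])
# Prediction: misses record 3 and wrongly merges records 6 and 7.
predicted = pairs_from_clusters([{1, 2}, {4, 5}, {6, 7}])

precision, recall, f1 = pairwise_prf(predicted, truth)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Here two of the three predicted pairs are correct and two of the four true pairs are found, so precision is 2/3, recall is 1/2, and F1 is 4/7.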
GitHub Actions "Try It"
Zero-install demo: paste a CSV URL into the workflow_dispatch form, get deduplication results as a downloadable artifact. No setup required.
Codespaces
One-click dev environment via .devcontainer. Open a Codespace, start coding immediately.
dbt Integration
New dbt-goldenmatch package for DuckDB-based entity resolution in dbt pipelines.
Community
- GitHub Discussions enabled with seed posts
- Bug report and feature request issue templates
- Contributing guide, Code of Conduct, Security policy
- Download count badge on README
Stats
- 855 tests passing (+ 6 skipped)
- 19 CLI commands
- 268 PyPI downloads in first 3 days
Install / Upgrade
pip install --upgrade goldenmatch