Release v0.5.0 -- In-Context LLM Clustering + Uncertainty Scores · benseverndev-oss/goldenmatch

Phase 1 of the v1.0.0 Roadmap

In-Context LLM Clustering

Instead of asking the LLM "is A the same as B?" one pair at a time, GoldenMatch now sends blocks of 50-100 borderline records in a single prompt and asks the LLM to cluster them directly. The LLM sees all candidates at once and can make better group decisions.

llm_scorer:
  enabled: true
  mode: cluster          # new! (default: "pairwise" for legacy behavior)
  cluster_max_size: 100
  cluster_min_size: 5
  budget:
    max_cost_usd: 0.50

How it works:

1. The traditional pipeline scores all pairs as usual.
2. Borderline pairs (0.75-0.95) are grouped into connected components.
3. Each component is sent to the LLM as a single prompt.
4. The LLM returns cluster assignments with confidence scores.
5. Results merge back into the pipeline seamlessly.
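
To make the grouping step concrete, here is a minimal sketch of how borderline pairs could be collected into connected components with a union-find. It is an illustration only, not GoldenMatch's actual code: the function name `borderline_components`, the `(id_a, id_b, score)` tuple layout, and the inclusive treatment of the 0.75 and 0.95 bounds are all assumptions.

```python
from collections import defaultdict

# Borderline band from the release notes; inclusive bounds are an assumption.
LOW, HIGH = 0.75, 0.95


def borderline_components(scored_pairs, low=LOW, high=HIGH):
    """Group records linked by borderline pairs into connected components.

    scored_pairs: iterable of (id_a, id_b, score) tuples (hypothetical layout).
    Returns a sorted list of sorted record-id lists.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Only pairs inside the borderline band contribute edges.
    for a, b, score in scored_pairs:
        if low <= score <= high:
            union(a, b)

    groups = defaultdict(list)
    for node in parent:
        groups[find(node)].append(node)
    return sorted(sorted(group) for group in groups.values())


pairs = [(1, 2, 0.80), (2, 3, 0.90), (4, 5, 0.78), (6, 7, 0.99), (8, 9, 0.40)]
components = borderline_components(pairs)
print(components)  # [[1, 2, 3], [4, 5]]
```

Records 6-9 do not appear in any component because their pairs fall outside the borderline band, so in this sketch they would never reach the LLM.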

Smart degradation: If a block is too large, it splits by removing weakest edges. If the LLM call fails, it falls back to pairwise scoring. If the budget runs out, it stops gracefully.
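
One way to picture the "split by removing weakest edges" behavior is the sketch below. It is a simplified assumption, not the library's implementation: it drops the globally weakest edge one at a time and recomputes components until none exceeds the size limit. The names `find_components` and `split_block` are hypothetical.

```python
def find_components(nodes, edges):
    """Connected components of an undirected graph, via iterative DFS."""
    adjacency = {n: set() for n in nodes}
    for a, b, _score in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen, result = set(), []
    for start in nodes:
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:
            current = stack.pop()
            component.append(current)
            for neighbor in adjacency[current]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        result.append(sorted(component))
    return result


def split_block(nodes, edges, max_size):
    """Drop the weakest edges until every component has at most max_size nodes."""
    remaining = sorted(edges, key=lambda e: e[2])  # weakest edge first
    blocks = find_components(nodes, remaining)
    while remaining and max(len(b) for b in blocks) > max_size:
        remaining = remaining[1:]  # remove the current weakest edge
        blocks = find_components(nodes, remaining)
    return sorted(blocks)


nodes = [1, 2, 3, 4, 5]
edges = [(1, 2, 0.90), (2, 3, 0.76), (3, 4, 0.92), (4, 5, 0.88)]
blocks = split_block(nodes, edges, max_size=3)
print(blocks)  # [[1, 2], [3, 4, 5]]
```

Here the 0.76 edge between records 2 and 3 is the weakest, so removing it alone is enough to bring both resulting blocks under the limit.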

Uncertainty Scores

Every LLM cluster now carries a confidence score (0.0-1.0) from the LLM. Low-confidence clusters (< 0.7) are auto-flagged for human review.
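
The review rule can be illustrated with a short sketch. The `LLMCluster` class and its fields are hypothetical stand-ins, not GoldenMatch's actual data model; only the 0.7 threshold comes from the text above.

```python
from dataclasses import dataclass

REVIEW_THRESHOLD = 0.7  # clusters with confidence below this are flagged


@dataclass
class LLMCluster:
    """Hypothetical container for one LLM-produced cluster."""
    record_ids: list
    confidence: float  # 0.0-1.0, as reported by the LLM

    @property
    def needs_review(self):
        return self.confidence < REVIEW_THRESHOLD


clusters = [
    LLMCluster([1, 2, 3], 0.92),
    LLMCluster([4, 5], 0.55),
    LLMCluster([6, 7], 0.70),  # exactly 0.7 is not below the threshold
]
flagged = [c.record_ids for c in clusters if c.needs_review]
print(flagged)  # [[4, 5]]
```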

Stats

875 tests passing (20 new, 0 regressions)
CI green on Python 3.11/3.12/3.13

Install / Upgrade

pip install --upgrade goldenmatch

What's Next

v0.6.0: Privacy-preserving record linkage (multi-party SMC)
v1.0.0: API freeze, production-stable release
