v1.3.0 — CCMS Cluster Comparison & Sensitivity Analysis

@benzsevern benzsevern released this 03 Apr 19:37
· 1607 commits to main since this release

What's New

CCMS Cluster Comparison

Compare two entity resolution (ER) clustering outcomes without ground truth, using the Case Count Metric System (CCMS; Talburt et al., arXiv:2601.02824v1).

import goldenmatch as gm

result = gm.compare_clusters(clusters_a, clusters_b)
print(result.summary())
# {"unchanged": 42, "merged": 3, "partitioned": 5, "overlapping": 1, "twi": 0.92, ...}

CLI:

goldenmatch compare-clusters run_a.json run_b.json --details --case-type merged
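
For intuition about the numbers in the summary, here is a minimal standalone sketch of how case counts and TWI could be computed. It is not goldenmatch's implementation. The case definitions are an assumed reading (each cluster of run A is labeled by what run B does to its records), both runs are assumed to cover the same record IDs, and the helper names `classify_cases` and `twi` are hypothetical. TWI uses the usual Talburt-Wang form, sqrt(|A| * |B|) / |V|, where V is the set of non-empty overlaps between clusters of A and B.

```python
from collections import Counter
from math import sqrt


def classify_cases(clusters_a, clusters_b):
    """Count how each cluster of run A is transformed in run B (assumed definitions)."""
    a_sets = [frozenset(c) for c in clusters_a]
    b_sets = [frozenset(c) for c in clusters_b]
    counts = Counter()
    for a in a_sets:
        # Clusters of B that share at least one record with this A cluster.
        touching = [b for b in b_sets if a & b]
        if len(touching) == 1 and touching[0] == a:
            counts["unchanged"] += 1
        elif len(touching) == 1 and a < touching[0]:
            # The whole A cluster was absorbed into one larger B cluster.
            counts["merged"] += 1
        elif all(b <= a for b in touching):
            # The A cluster was split into pieces that stay inside it.
            counts["partitioned"] += 1
        else:
            # Records were split and also mixed with records from other clusters.
            counts["overlapping"] += 1
    return dict(counts)


def twi(clusters_a, clusters_b):
    """Talburt-Wang Index: sqrt(|A| * |B|) / number of non-empty overlaps."""
    a_sets = [frozenset(c) for c in clusters_a]
    b_sets = [frozenset(c) for c in clusters_b]
    overlaps = sum(1 for a in a_sets for b in b_sets if a & b)
    return sqrt(len(a_sets) * len(b_sets)) / overlaps


# Toy clusterings over record IDs 1..10 (illustrative data only).
run_a = [[1, 2], [3], [4], [5, 6, 7], [8, 9], [10]]
run_b = [[1, 2], [3, 4], [5, 6], [7], [8], [9, 10]]

print(classify_cases(run_a, run_b))
print(round(twi(run_a, run_b), 2))
```

Identical clusterings give a TWI of 1.0 under this formula, and the value drops as the two runs fragment each other's clusters.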

Parameter Sensitivity Analysis

Sweep config parameters and see how clustering changes at each value:

from goldenmatch import run_sensitivity, SweepParam

results = run_sensitivity(
    file_specs=[("data.csv", "src")],
    config=cfg,
    sweep_params=[SweepParam("threshold", 0.70, 0.95, 0.05)],
    sample_size=5000,
)
for r in results:
    print(r.stability_report())

CLI:

goldenmatch sensitivity data.csv -c config.yaml --sweep threshold:0.70:0.95:0.05 --sample 5000
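
To make the sweep spec concrete, the sketch below expands a `name:start:stop:step` string into the list of values to try. It assumes the four fields mean parameter name, start, stop, and step, and that the stop value is included; the release notes do not spell this out, and `parse_sweep` and `sweep_values` are hypothetical helpers, not part of goldenmatch.

```python
def parse_sweep(spec):
    """Split 'name:start:stop:step' into a name and three floats (assumed format)."""
    name, start, stop, step = spec.split(":")
    return name, float(start), float(stop), float(step)


def sweep_values(start, stop, step):
    """Return start, start+step, ... up to and including stop (assumed inclusive)."""
    # Count steps with a small tolerance so float error does not drop the endpoint.
    n_steps = int((stop - start) / step + 1e-9)
    # Index-based stepping plus rounding avoids accumulated drift such as 0.7999999.
    return [round(start + i * step, 10) for i in range(n_steps + 1)]


name, start, stop, step = parse_sweep("threshold:0.70:0.95:0.05")
values = sweep_values(start, stop, step)
print(name, values)
```

Computing each value from its index, rather than repeatedly adding the step, keeps the grid points clean enough to use as dictionary keys or report labels.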

Details

  • Four cluster transformation cases: unchanged, merged, partitioned, overlapping
  • Talburt-Wang Index (TWI) as a normalized similarity measure
  • Per-point error handling: failed sweep points are logged and skipped, and partial results are preserved
  • 16 new tests, 1260 total passing
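
The per-point error handling listed above follows a common pattern, sketched here in generic form. This is an illustration of the idea, not goldenmatch's code: `run_sweep` and `fake_run` are hypothetical names, and the failure is simulated.

```python
import logging

logger = logging.getLogger("sweep_sketch")


def run_sweep(values, run_point):
    """Run run_point for each value; log and skip failures, keep the rest."""
    results = []
    for value in values:
        try:
            results.append((value, run_point(value)))
        except Exception:
            # One bad point should not discard the points that already succeeded.
            logger.exception("sweep point %s failed; skipping", value)
    return results


def fake_run(threshold):
    """Stand-in for a real pipeline run; fails at one value on purpose."""
    if threshold == 0.8:
        raise ValueError("simulated failure")
    return f"ok@{threshold}"


partial = run_sweep([0.7, 0.8, 0.9], fake_run)
print(partial)
```

Keeping each point's value alongside its result makes it clear afterwards which points are missing from a partial sweep.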

Full Changelog: v1.2.7...v1.3.0