What's Changed
- Ingesting research plan gen by @delip in #5
- Distinguish error-induced verdicts from real verdicts (#4) by @delip in #6
- Route compute_metrics CANNOT_ASSESS handling through filter_cannot_assess by @delip in #7
- Report inter-judge agreement (Krippendorff's alpha + Fleiss' kappa) in compute_metrics by @delip in #8
- Metrics & scoring consistency audit: unified scoring core, first-class abstention, undefined→None metrics by @delip in #9
- Add HealthBench dataset ingestion (scripts, docs, paper) by @delip in #10
- Align agreement-metrics reporting with the reporting checklist by @delip in #11

Full Changelog: v1.0.1...v1.5.0