v1.5.0

Latest

@delip released this 30 May 21:50
· 4 commits to main since this release

What's Changed

  • Ingesting research plan gen by @delip in #5
  • Distinguish error-induced verdicts from real verdicts (#4) by @delip in #6
  • Route compute_metrics CANNOT_ASSESS handling through filter_cannot_assess by @delip in #7
  • Report inter-judge agreement (Krippendorff's alpha + Fleiss' kappa) in compute_metrics by @delip in #8
  • Metrics & scoring consistency audit: unified scoring core, first-class abstention, undefined→None metrics by @delip in #9
  • Add HealthBench dataset ingestion (scripts, docs, paper) by @delip in #10
  • Align agreement-metrics reporting with the reporting checklist by @delip in #11

Full Changelog: v1.0.1...v1.5.0