
v0.3.0


Released by @adewale on 10 Jun 23:53 (commit 04610f6).


Adds three eval-quality features:

  • script objective assertions for deterministic, repo-owned oracle commands, gated behind an explicit --allow-scripts flag.
  • Prompt/assertion leakage lint in validate and audit-manifest; pass --strict-leakage to make detected leakage fail validation.
  • skill-benchmark judge --judge-cmd ... for pluggable judge backends that emit judge-results.jsonl output compatible with the existing benchmark merging.
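To illustrate the warn-by-default versus --strict-leakage behaviour of the leakage lint, here is a minimal Python sketch of what such a check could look like. It is not this project's implementation: the function names find_leakage and validate, the verbatim-substring heuristic, and the min_len threshold are all hypothetical assumptions chosen for illustration.

```python
import re


def _normalize(text: str) -> str:
    # Lowercase and collapse whitespace so trivial formatting differences
    # do not hide a verbatim leak.
    return re.sub(r"\s+", " ", text).strip().lower()


def find_leakage(prompt: str, assertions: list[str], min_len: int = 8) -> list[str]:
    """Hypothetical check: return assertions whose text appears verbatim in the prompt."""
    norm_prompt = _normalize(prompt)
    leaked = []
    for assertion in assertions:
        needle = _normalize(assertion)
        # Skip very short assertions; they match too often to be meaningful.
        if len(needle) < min_len:
            continue
        if needle in norm_prompt:
            leaked.append(assertion)
    return leaked


def validate(prompt: str, assertions: list[str], strict_leakage: bool = False) -> bool:
    """Hypothetical validator: report leakage, fail only in strict mode."""
    leaked = find_leakage(prompt, assertions)
    for assertion in leaked:
        print(f"warning: assertion text leaks into prompt: {assertion!r}")
    # Leakage is always reported, but it only makes validation fail
    # when strict mode is requested.
    return not (strict_leakage and leaked)


if __name__ == "__main__":
    prompt = "Write a function that sums a list. The output must contain 'def total('."
    assertions = ["output must contain 'def total('", "uses a for loop"]
    print(validate(prompt, assertions))                       # warns, passes
    print(validate(prompt, assertions, strict_leakage=True))  # warns, fails
```

A plain substring match is the simplest possible heuristic; a real lint might also look for paraphrases or partial overlaps, which this sketch deliberately does not attempt.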

Also includes tests and README updates.