KORA v0.2.0-alpha

Pre-release

@hkalbertkim hkalbertkim released this 05 May 21:23
· 497 commits to main since this release
0082f35

Summary

KORA v0.2.0-alpha expands the deterministic-heavy benchmark evidence path while keeping release claims bounded to reproducible simulated benchmark evidence.

This alpha release adds deterministic expected-output correctness checks, Markdown benchmark summary generation from result JSON artifacts, and expanded correctness/error/fallback benchmark coverage. It also records the decision not to freeze or commit raw benchmark JSON artifacts for this release.

Benchmark Evidence Expansion

Current deterministic-heavy benchmark evidence:

| Metric | Value |
| --- | --- |
| Workload | experiments/workloads/deterministic_heavy_v1_100.json |
| Total tasks | 100 |
| Deterministic/no-model tasks | 80 |
| Fallback/model-candidate tasks | 20 |
| Direct-baseline simulated model invocations | 100 |
| KORA-controlled simulated model invocations | 20 |
| Avoided simulated model invocations | 80 |
| Avoided invocation rate | 80% |
| Deterministic outputs checked | 80 |
| Mismatches | 0 |
| Fallback/model-candidate skipped | 20 |

Safe claim:

In a reproducible 100-task deterministic-heavy benchmark workload, KORA-controlled execution avoided 80 of 100 simulated model invocations versus a naive direct baseline.
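
The arithmetic behind the claim can be checked with a short sketch. The function name `avoided_invocations` and its parameters are hypothetical and are not taken from the KORA codebase; only the numbers come from the table above.

```python
def avoided_invocations(baseline: int, controlled: int) -> tuple[int, float]:
    """Return (avoided count, avoided rate) for simulated model invocations.

    Hypothetical helper for illustration only; not part of KORA.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    if not 0 <= controlled <= baseline:
        raise ValueError("controlled must be between 0 and baseline")
    avoided = baseline - controlled
    return avoided, avoided / baseline


# Values from the release table: the direct baseline invokes the simulated
# model for all 100 tasks, KORA-controlled execution for 20 of them.
avoided, rate = avoided_invocations(baseline=100, controlled=20)
print(f"{avoided} avoided, {rate:.0%}")  # 80 avoided, 80%
```

The rate is relative to the naive direct baseline, so it describes this workload's task mix and not a general property of KORA.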

Included Changes

  • Deterministic expected-output correctness checks in the benchmark runner.
  • Markdown benchmark summary generation from benchmark result JSON artifacts.
  • Expanded correctness/error/fallback benchmark test coverage.
  • Raw artifact freeze decision: raw benchmark JSON artifacts are not frozen or committed for this alpha release.
  • Reproducible regeneration path through the tracked workload, generator, benchmark runner, summary generator, and documentation.
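
As a rough illustration of what a deterministic expected-output check involves, the sketch below counts checked, mismatched, and skipped tasks. The record fields (`route`, `expected_output`, `actual_output`) and the names `CheckResult` and `check_outputs` are assumptions made for this example; the actual benchmark runner may be structured differently.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    checked: int = 0      # deterministic outputs compared
    mismatches: int = 0   # deterministic outputs that differed
    skipped: int = 0      # fallback/model-candidate tasks, not compared


def check_outputs(tasks: list[dict]) -> CheckResult:
    """Compare actual and expected outputs for deterministic tasks only.

    Hypothetical schema: each task has "route", "expected_output",
    and "actual_output" keys.
    """
    result = CheckResult()
    for task in tasks:
        if task["route"] != "deterministic":
            # Fallback tasks have no fixed expected output, so skip them.
            result.skipped += 1
            continue
        result.checked += 1
        if task["actual_output"] != task["expected_output"]:
            result.mismatches += 1
    return result


tasks = [
    {"id": "t1", "route": "deterministic", "expected_output": "4", "actual_output": "4"},
    {"id": "t2", "route": "deterministic", "expected_output": "ok", "actual_output": "ok"},
    {"id": "t3", "route": "fallback", "expected_output": None, "actual_output": None},
]
summary = check_outputs(tasks)
print(summary)
```

On the release workload, a check of this shape would correspond to the reported 80 outputs checked, 0 mismatches, and 20 fallback/model-candidate tasks skipped.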

Regeneration

See docs/reports/benchmark_artifact_policy.md for commands to regenerate the workload, benchmark result JSON files under /tmp, and the Markdown benchmark summary.
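
For readers unfamiliar with this kind of step, here is a minimal sketch of rendering a Markdown summary from a result JSON document. The JSON keys (`total_tasks`, `controlled_invocations`) and the function `render_summary` are hypothetical; the real commands and schema are the ones documented in the policy file above.

```python
import json


def render_summary(result: dict) -> str:
    """Render a two-column Markdown table from a benchmark result dict.

    Assumes a hypothetical schema with "total_tasks" and
    "controlled_invocations"; the direct baseline is taken to invoke
    the simulated model once per task.
    """
    total = result["total_tasks"]
    controlled = result["controlled_invocations"]
    avoided = total - controlled
    rows = [
        ("Total tasks", total),
        ("KORA-controlled simulated model invocations", controlled),
        ("Avoided simulated model invocations", avoided),
        ("Avoided invocation rate", f"{avoided / total:.0%}"),
    ]
    lines = ["| Metric | Value |", "| --- | --- |"]
    lines += [f"| {name} | {value} |" for name, value in rows]
    return "\n".join(lines)


# Inline JSON stands in for a result file regenerated under /tmp.
raw = '{"total_tasks": 100, "controlled_invocations": 20}'
markdown = render_summary(json.loads(raw))
print(markdown)
```

Deriving the summary from the JSON on each run keeps the Markdown reproducible without committing the raw artifacts.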

Non-Claims

This release does not claim:

  • production cost reduction proof
  • real API-cost reduction proof
  • production benchmark proof
  • full runtime-integrated benchmark evidence
  • broad workload superiority proof
  • energy reduction evidence

Release Notes

  • Pre-release: yes
  • Assets uploaded: none
  • Raw benchmark JSON artifacts uploaded: none