KORA v0.2.0-alpha
Pre-release
Summary
KORA v0.2.0-alpha expands the deterministic-heavy benchmark evidence path while keeping release claims bounded to reproducible simulated benchmark evidence.
This alpha release adds deterministic expected-output correctness checks, benchmark Markdown summary generation from result JSON artifacts, expanded correctness/error/fallback benchmark coverage, and a raw artifact freeze decision for this release.
Benchmark Evidence Expansion
Current deterministic-heavy benchmark evidence:
| Metric | Value |
|---|---|
| Workload | experiments/workloads/deterministic_heavy_v1_100.json |
| Total tasks | 100 |
| Deterministic/no-model tasks | 80 |
| Fallback/model-candidate tasks | 20 |
| Direct-baseline simulated model invocations | 100 |
| KORA-controlled simulated model invocations | 20 |
| Avoided simulated model invocations | 80 |
| Avoided invocation rate | 80% |
| Deterministic outputs checked | 80 |
| Mismatches | 0 |
| Fallback/model-candidate tasks skipped | 20 |
Safe claim:
In a reproducible 100-task deterministic-heavy benchmark workload, KORA-controlled execution avoided 80 of 100 simulated model invocations versus a naive direct baseline.
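The arithmetic behind the table can be sketched as follows. This is a minimal illustration, not the benchmark runner's code: the function name and dictionary keys are hypothetical, and it assumes the naive direct baseline issues one simulated invocation per task while KORA-controlled execution issues one only for fallback/model-candidate tasks.

```python
def avoided_invocations(total_tasks: int, fallback_tasks: int) -> dict:
    """Derive the invocation metrics from two task counts (hypothetical helper)."""
    # Assumption: the direct baseline invokes the simulated model once per task.
    baseline = total_tasks
    # Assumption: controlled execution invokes it only for fallback tasks.
    controlled = fallback_tasks
    avoided = baseline - controlled
    rate = avoided / baseline if baseline else 0.0
    return {
        "baseline": baseline,
        "controlled": controlled,
        "avoided": avoided,
        "avoided_rate": rate,
    }


# Counts taken from the table above: 100 total tasks, 20 fallback tasks.
summary = avoided_invocations(total_tasks=100, fallback_tasks=20)
print(f"avoided {summary['avoided']} of {summary['baseline']} "
      f"({summary['avoided_rate']:.0%})")
```

With the counts from the table, this yields 80 avoided invocations out of 100, which matches the reported 80% rate.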
Included Changes
- Deterministic expected-output correctness checks in the benchmark runner.
- Markdown benchmark summary generation from benchmark result JSON artifacts.
- Expanded correctness/error/fallback benchmark test coverage.
- Raw artifact freeze decision: raw benchmark JSON artifacts are not frozen or committed for this alpha release.
- Reproducible regeneration path through the tracked workload, generator, benchmark runner, summary generator, and documentation.
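As a rough illustration of the first two items, the sketch below checks deterministic outputs against expected values and renders the counts as a Markdown table. It is not the project's runner or summary generator: the result schema (`expected_output`, `actual_output`) and both function names are assumptions made for this example, and a missing expected output is assumed to mark a fallback/model-candidate task that the check skips.

```python
import json


def check_outputs(results: list) -> dict:
    """Count checked, mismatched, and skipped results (hypothetical schema)."""
    checked = mismatches = skipped = 0
    for result in results:
        expected = result.get("expected_output")
        if expected is None:
            # Assumption: no expected output means a fallback task; skip it.
            skipped += 1
            continue
        checked += 1
        if result.get("actual_output") != expected:
            mismatches += 1
    return {"checked": checked, "mismatches": mismatches, "skipped": skipped}


def to_markdown(counts: dict) -> str:
    """Render the counts as a two-column Markdown table."""
    lines = ["| Metric | Value |", "|---|---|"]
    lines.append(f"| Deterministic outputs checked | {counts['checked']} |")
    lines.append(f"| Mismatches | {counts['mismatches']} |")
    lines.append(f"| Fallback/model-candidate skipped | {counts['skipped']} |")
    return "\n".join(lines)


# Tiny in-memory stand-in for a benchmark result JSON artifact.
raw = json.dumps([
    {"task_id": "t1", "expected_output": "4", "actual_output": "4"},
    {"task_id": "t2", "expected_output": "ok", "actual_output": "ok"},
    {"task_id": "t3", "expected_output": None, "actual_output": None},
])
counts = check_outputs(json.loads(raw))
print(to_markdown(counts))
```

Keeping the check separate from the rendering means the same counts can feed both a test assertion and the generated summary.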
Regeneration
See docs/reports/benchmark_artifact_policy.md for the commands that regenerate the workload, write the benchmark result JSON files under /tmp, and produce the Markdown benchmark summary.
Non-Claims
This release does not claim:
- production cost reduction proof
- real API-cost reduction proof
- production benchmark proof
- full runtime-integrated benchmark evidence
- broad workload superiority proof
- energy reduction evidence
Release Notes
- Pre-release: yes
- Assets uploaded: none
- Raw benchmark JSON artifacts uploaded: none