Releases: Krako-Labs/KORA
KORA v0.3.0-alpha
This prerelease adds KORA's initial runtime-path benchmark evidence flow.
Highlights
- Initial runtime-path benchmark harness:
python3 -m kora run runtime_integrated_benchmark -- --offline
- Runtime benchmark JSON output path:
python3 -m kora run runtime_integrated_benchmark -- --offline --json-out /tmp/kora_runtime_integrated_benchmark.json
- Markdown evidence packet/report generator:
python3 examples/runtime_integrated_benchmark/report.py --input /tmp/kora_runtime_integrated_benchmark.json --md-out /tmp/kora_runtime_integrated_benchmark.md
- Telemetry-connected summary path:
python3 -m kora telemetry --input /tmp/kora_runtime_integrated_benchmark.json --json-out /tmp/kora_runtime_integrated_benchmark.telemetry.json --md-out /tmp/kora_runtime_integrated_benchmark.telemetry.md
- Reviewer-facing reproduction guide.
- Release-readiness checklist.
- Docs cross-link audit and claim-boundary review.
- Release validation packet and approval checkpoint.
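The JSON-to-Markdown report step in the list above can be pictured with a minimal sketch. This is not the code in examples/runtime_integrated_benchmark/report.py; the function name `render_markdown_table` and the assumption that the counters form a flat key/value mapping are hypothetical and only illustrate the idea.

```python
from typing import Mapping


def render_markdown_table(counters: Mapping[str, object]) -> str:
    """Render a flat mapping of counters as a two-column Markdown table.

    ASSUMPTION: counters are a flat key/value mapping; the real report
    generator may use a different structure and layout.
    """
    lines = ["| Metric | Value |", "|---|---|"]
    for name, value in counters.items():
        lines.append(f"| {name} | {value} |")
    return "\n".join(lines)


if __name__ == "__main__":
    # Sample values taken from the expected counters in these notes.
    sample = {"total_tasks": 100, "mismatch_count": 0}
    print(render_markdown_table(sample))
```

Keeping the renderer a pure function of the parsed JSON means the Markdown can always be regenerated from the JSON output, which fits the policy of not uploading generated artifacts.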
Current bounded benchmark claim
In a reproducible 100-task deterministic-heavy benchmark workload, KORA-controlled execution avoided 80 of 100 simulated model invocations versus a naive direct baseline.
Expected counters
- total_tasks: 100
- deterministic_route_count: 80
- fallback_or_model_candidate_route_count: 20
- simulated_baseline_model_invocations: 100
- kora_controlled_model_invocations: 20
- avoided_simulated_model_invocations: 80
- avoided_simulated_model_invocation_rate: 0.8
- deterministic_outputs_checked: 80
- mismatch_count: 0
- runtime_path_execution_status: ok
- telemetry_event_count: 100
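A reviewer who wants to compare a locally generated result against these expected counters could use a small check like the sketch below. It assumes the counters appear as top-level keys of the benchmark JSON, which these notes do not confirm; the helper names `find_counter_mismatches` and `check_file` are hypothetical.

```python
import json
from pathlib import Path

# Expected values copied from the release notes above.
EXPECTED_COUNTERS = {
    "total_tasks": 100,
    "deterministic_route_count": 80,
    "fallback_or_model_candidate_route_count": 20,
    "simulated_baseline_model_invocations": 100,
    "kora_controlled_model_invocations": 20,
    "avoided_simulated_model_invocations": 80,
    "avoided_simulated_model_invocation_rate": 0.8,
    "deterministic_outputs_checked": 80,
    "mismatch_count": 0,
    "runtime_path_execution_status": "ok",
    "telemetry_event_count": 100,
}


def find_counter_mismatches(result, expected=EXPECTED_COUNTERS):
    """Return {key: (expected, actual)} for each counter that differs or is missing."""
    mismatches = {}
    for key, want in expected.items():
        got = result.get(key)  # None when the key is absent
        if got != want:
            mismatches[key] = (want, got)
    return mismatches


def check_file(path):
    # ASSUMPTION: counters are top-level keys of the JSON document.
    result = json.loads(Path(path).read_text(encoding="utf-8"))
    return find_counter_mismatches(result)
```

An empty return value means every expected counter matched; for example, `check_file("/tmp/kora_runtime_integrated_benchmark.json")` could be run after the benchmark command above, provided the top-level-key assumption holds for the real output.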
Expected telemetry counters
- total_llm_calls: 20
- events_ok: 100
- events_fail: 0
- events_skipped: 0
- stage_counts: ADAPTER 20 / DETERMINISTIC 80
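To show how summary counters of this shape could be derived from per-event records, here is a rough sketch. The event fields `stage` and `status`, and the rule that one ADAPTER-stage event stands for one simulated model call, are assumptions made for illustration; they are consistent with the numbers above but are not taken from the KORA telemetry code.

```python
from collections import Counter


def summarize_events(events):
    """Aggregate per-event records into summary counters.

    ASSUMPTION: each event is a dict with hypothetical "stage" and "status"
    keys, and an ADAPTER-stage event represents one simulated model call.
    """
    stage_counts = Counter(event["stage"] for event in events)
    status_counts = Counter(event["status"] for event in events)
    return {
        "total_llm_calls": stage_counts.get("ADAPTER", 0),
        "events_ok": status_counts.get("ok", 0),
        "events_fail": status_counts.get("fail", 0),
        "events_skipped": status_counts.get("skipped", 0),
        "stage_counts": dict(stage_counts),
    }


# Synthetic events shaped to match the 80/20 split described in these notes.
events = (
    [{"stage": "DETERMINISTIC", "status": "ok"}] * 80
    + [{"stage": "ADAPTER", "status": "ok"}] * 20
)
summary = summarize_events(events)
print(summary)
```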
Non-claims
This prerelease does not claim:
- production cost reduction proof
- real API-cost reduction proof
- production benchmark proof
- full runtime-integrated benchmark evidence
- broad workload superiority proof
- energy reduction evidence
- formal government validation
- signed partner validation
- guaranteed adoption or funding
Artifact policy
No release assets are uploaded for this prerelease. Raw generated benchmark JSON/Markdown artifacts are not uploaded. Generated outputs should be reproduced locally in /tmp or another user-provided output path.
KORA v0.2.0-alpha
Summary
KORA v0.2.0-alpha expands the deterministic-heavy benchmark evidence path while keeping release claims bounded to reproducible simulated benchmark evidence.
This alpha release adds deterministic expected-output correctness checks, benchmark Markdown summary generation from result JSON artifacts, expanded correctness/error/fallback benchmark coverage, and a raw artifact freeze decision for this release.
Benchmark Evidence Expansion
Current deterministic-heavy benchmark evidence:
| Metric | Value |
|---|---|
| Workload | experiments/workloads/deterministic_heavy_v1_100.json |
| Total tasks | 100 |
| Deterministic/no-model tasks | 80 |
| Fallback/model-candidate tasks | 20 |
| Direct-baseline simulated model invocations | 100 |
| KORA-controlled simulated model invocations | 20 |
| Avoided simulated model invocations | 80 |
| Avoided invocation rate | 80% |
| Deterministic outputs checked | 80 |
| Mismatches | 0 |
| Fallback/model-candidate skipped | 20 |
Safe claim:
In a reproducible 100-task deterministic-heavy benchmark workload, KORA-controlled execution avoided 80 of 100 simulated model invocations versus a naive direct baseline.
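The arithmetic behind this claim is simple enough to spell out. The sketch below uses a hypothetical helper, `avoided_invocations`, to compute the avoided count and rate from the two invocation totals in the table; it is an illustration, not part of the benchmark runner.

```python
from typing import Tuple


def avoided_invocations(baseline: int, controlled: int) -> Tuple[int, float]:
    """Return (avoided count, avoided rate) relative to the direct baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be a positive invocation count")
    avoided = baseline - controlled
    return avoided, avoided / baseline


if __name__ == "__main__":
    # 100 baseline invocations versus 20 KORA-controlled invocations.
    print(avoided_invocations(100, 20))
```

The rate is relative to the simulated direct baseline only, which is why the claim stays bounded to this workload.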
Included Changes
- Deterministic expected-output correctness checks in the benchmark runner.
- Markdown benchmark summary generation from benchmark result JSON artifacts.
- Expanded correctness/error/fallback benchmark test coverage.
- Raw artifact freeze decision: raw benchmark JSON artifacts are not frozen or committed for this alpha release.
- Reproducible regeneration path through the tracked workload, generator, benchmark runner, summary generator, and documentation.
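The expected-output correctness check in the list above can be sketched roughly as follows. The task fields `id`, `route`, and `expected_output` are hypothetical and the real workload schema may differ; the point is only that deterministic tasks are compared against a fixed expected output while fallback/model-candidate tasks are skipped.

```python
def check_deterministic_outputs(tasks, outputs):
    """Compare produced outputs with expected outputs for deterministic tasks.

    ASSUMPTION: each task is a dict with hypothetical keys "id", "route"
    ("deterministic" or "fallback") and, for deterministic tasks,
    "expected_output". `outputs` maps task id to the produced output.
    """
    checked = mismatches = skipped = 0
    for task in tasks:
        if task["route"] != "deterministic":
            skipped += 1  # no fixed expected output to compare against
            continue
        checked += 1
        if outputs.get(task["id"]) != task["expected_output"]:
            mismatches += 1
    return {"checked": checked, "mismatches": mismatches, "skipped": skipped}


# Tiny synthetic example: two deterministic tasks and one fallback task.
tasks = [
    {"id": "t1", "route": "deterministic", "expected_output": "4"},
    {"id": "t2", "route": "deterministic", "expected_output": "ok"},
    {"id": "t3", "route": "fallback"},
]
outputs = {"t1": "4", "t2": "ok"}
print(check_deterministic_outputs(tasks, outputs))
```

Applied to the 100-task workload, a check of this shape would correspond to the "Deterministic outputs checked", "Mismatches", and "Fallback/model-candidate skipped" rows in the table above.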
Regeneration
See docs/reports/benchmark_artifact_policy.md for commands to regenerate the workload, benchmark result JSON files under /tmp, and the Markdown benchmark summary.
Non-Claims
This release does not claim:
- production cost reduction proof
- real API-cost reduction proof
- production benchmark proof
- full runtime-integrated benchmark evidence
- broad workload superiority proof
- energy reduction evidence
Release Notes
- Pre-release: yes
- Assets uploaded: none
- Raw benchmark JSON artifacts uploaded: none
KORA v0.1.1-alpha
Summary
KORA v0.1.1-alpha adds CI-backed validation and the first controlled benchmark skeleton. In a deterministic-heavy benchmark skeleton, KORA-controlled execution avoided 16 of 20 simulated model invocations versus a naive direct baseline.
This is an alpha maintenance / benchmark skeleton release. It keeps the v0.1.0-alpha terminal-first surface intact while adding a reproducible CI and benchmark path for future technical preview work.
What Changed Since v0.1.0-alpha
- Added GitHub Actions CI for editable install, release smoke, and pytest validation.
- Added the experiments/ benchmark experiment directory.
- Added the deterministic-heavy workload draft at experiments/workloads/deterministic_heavy_v0.json.
- Added dry-run benchmark mode for workload validation and task/category counting.
- Added direct-baseline benchmark mode to simulate one model invocation per task.
- Added KORA-controlled benchmark mode to simulate deterministic-first execution from workload metadata.
- Added the 2026-05-04 progress report.
- Added the v0.1.1-alpha release note.
CI-Backed Validation
GitHub Actions now validates changes to main with:
python3 -m pip install -e ".[dev]"
./scripts/release_smoke.sh
python3 -m pytest -q
The release candidate was also verified locally with release smoke, benchmark modes, JSON validation, and pytest.
Benchmark Skeleton
This release introduces the initial benchmark structure:
experiments/
experiments/README.md
experiments/run_benchmark.py
experiments/workloads/deterministic_heavy_v0.json
experiments/results/.gitkeep
The benchmark runner currently supports:
- dry-run
- direct-baseline
- kora-controlled
These modes are simulated and metadata-based. They do not call real models or external APIs.
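To make the difference between the three modes concrete, here is a minimal simulation sketch. It is not the code in experiments/run_benchmark.py; the task keys `category` and `deterministic` are hypothetical stand-ins for whatever metadata the workload file actually carries.

```python
from collections import Counter


def run_mode(mode, tasks):
    """Simulate one benchmark mode from task metadata; no model is called.

    ASSUMPTION: each task is a dict with hypothetical "category" (str)
    and "deterministic" (bool) keys.
    """
    if mode == "dry-run":
        # Validate and count only; nothing is simulated.
        categories = Counter(task["category"] for task in tasks)
        return {"mode": mode, "total_tasks": len(tasks),
                "category_counts": dict(categories)}
    if mode == "direct-baseline":
        # One simulated model invocation per task.
        invocations = len(tasks)
    elif mode == "kora-controlled":
        # Deterministic-first: only non-deterministic tasks reach a model.
        invocations = sum(1 for task in tasks if not task["deterministic"])
    else:
        raise ValueError(f"unknown mode: {mode}")
    return {"mode": mode, "total_tasks": len(tasks),
            "simulated_model_invocations": invocations}


# Synthetic 20-task workload with the 16/4 split reported in these notes.
tasks = (
    [{"category": "arithmetic", "deterministic": True}] * 16
    + [{"category": "open_ended", "deterministic": False}] * 4
)
for mode in ("dry-run", "direct-baseline", "kora-controlled"):
    print(run_mode(mode, tasks))
```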
Current Controlled Benchmark Result
| Metric | Value |
|---|---|
| Total tasks | 20 |
| Direct-baseline simulated model invocations | 20 |
| KORA-controlled simulated model invocations | 4 |
| Deterministic resolutions | 16 |
| Fallback candidates | 4 |
| Avoided simulated model invocations | 16 |
| Avoided invocation rate | 80% |
How To Verify Locally
./scripts/release_smoke.sh
python3 -m pytest -q
python3 experiments/run_benchmark.py --mode dry-run --workload experiments/workloads/deterministic_heavy_v0.json --output /tmp/kora_deterministic_heavy_v0.dry_run.json
python3 experiments/run_benchmark.py --mode direct-baseline --workload experiments/workloads/deterministic_heavy_v0.json --output /tmp/kora_deterministic_heavy_v0.direct_baseline.json
python3 experiments/run_benchmark.py --mode kora-controlled --workload experiments/workloads/deterministic_heavy_v0.json --output /tmp/kora_deterministic_heavy_v0.kora_controlled.json
Optional JSON checks:
python3 -m json.tool /tmp/kora_deterministic_heavy_v0.dry_run.json > /tmp/kora_dry_run_check.json
python3 -m json.tool /tmp/kora_deterministic_heavy_v0.direct_baseline.json > /tmp/kora_direct_baseline_check.json
python3 -m json.tool /tmp/kora_deterministic_heavy_v0.kora_controlled.json > /tmp/kora_kora_controlled_check.json
Limitations
- The benchmark is simulated.
- No real model calls are made.
- No external API calls are made.
- No real API-cost measurement is included.
- No production cost reduction claim is supported.
- No full KORA runtime integration is implemented yet.
- The workload is intentionally small and deterministic-heavy.
Next Planned Work
- Decide whether benchmark result artifacts should remain generated-only or be committed under a tracked path.
- Add a concise benchmark result summary markdown if the result should be reviewed publicly.
- Extend the benchmark runner toward real KORA runtime integration in a later alpha.
- Continue expanding deterministic-heavy workloads beyond the initial 20-task draft.