Releases: Krako-Labs/KORA

v0.3.0-alpha

07 May 18:05
078de07

v0.3.0-alpha Pre-release

KORA v0.3.0-alpha

This prerelease adds KORA's initial runtime-path benchmark evidence flow.

Highlights

  • Initial runtime-path benchmark harness:
    • python3 -m kora run runtime_integrated_benchmark -- --offline
  • Runtime benchmark JSON output path:
    • python3 -m kora run runtime_integrated_benchmark -- --offline --json-out /tmp/kora_runtime_integrated_benchmark.json
  • Markdown evidence packet/report generator:
    • python3 examples/runtime_integrated_benchmark/report.py --input /tmp/kora_runtime_integrated_benchmark.json --md-out /tmp/kora_runtime_integrated_benchmark.md
  • Telemetry-connected summary path:
    • python3 -m kora telemetry --input /tmp/kora_runtime_integrated_benchmark.json --json-out /tmp/kora_runtime_integrated_benchmark.telemetry.json --md-out /tmp/kora_runtime_integrated_benchmark.telemetry.md
  • Reviewer-facing reproduction guide.
  • Release-readiness checklist.
  • Docs cross-link audit and claim-boundary review.
  • Release validation packet and approval checkpoint.

Current bounded benchmark claim

In a reproducible 100-task deterministic-heavy benchmark workload, KORA-controlled execution avoided 80 of 100 simulated model invocations versus a naive direct baseline.

Expected counters

  • total_tasks: 100
  • deterministic_route_count: 80
  • fallback_or_model_candidate_route_count: 20
  • simulated_baseline_model_invocations: 100
  • kora_controlled_model_invocations: 20
  • avoided_simulated_model_invocations: 80
  • avoided_simulated_model_invocation_rate: 0.8
  • deterministic_outputs_checked: 80
  • mismatch_count: 0
  • runtime_path_execution_status: ok
  • telemetry_event_count: 100
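A reviewer could compare a local run against the counters above with a small script. The following is a minimal sketch, assuming the counters appear as top-level keys in the file written by --json-out; that layout is an assumption, not something these notes confirm, and check_counters is a hypothetical helper.

```python
# Expected values copied from the "Expected counters" list in these notes.
EXPECTED = {
    "total_tasks": 100,
    "deterministic_route_count": 80,
    "fallback_or_model_candidate_route_count": 20,
    "simulated_baseline_model_invocations": 100,
    "kora_controlled_model_invocations": 20,
    "avoided_simulated_model_invocations": 80,
    "avoided_simulated_model_invocation_rate": 0.8,
    "deterministic_outputs_checked": 80,
    "mismatch_count": 0,
    "runtime_path_execution_status": "ok",
    "telemetry_event_count": 100,
}


def check_counters(result: dict) -> list[str]:
    """Return human-readable problems; an empty list means every check passed."""
    # Assumption: counters are top-level keys of the result dictionary.
    problems = [
        f"{key}: expected {want!r}, got {result.get(key)!r}"
        for key, want in EXPECTED.items()
        if result.get(key) != want
    ]
    # Internal arithmetic: avoided = baseline - controlled, rate = avoided / baseline.
    baseline = result.get("simulated_baseline_model_invocations", 0)
    controlled = result.get("kora_controlled_model_invocations", 0)
    avoided = result.get("avoided_simulated_model_invocations", 0)
    if baseline - controlled != avoided:
        problems.append("avoided invocations != baseline - controlled")
    if baseline and avoided / baseline != result.get("avoided_simulated_model_invocation_rate"):
        problems.append("avoided rate != avoided / baseline")
    return problems


# In practice `result` would come from json.load() on the --json-out file.
print(check_counters(dict(EXPECTED)))  # []
```

Returning a list of problems rather than raising on the first failure lets a reviewer see every deviating counter in one pass.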

Expected telemetry counters

  • total_llm_calls: 20
  • events_ok: 100
  • events_fail: 0
  • events_skipped: 0
  • stage_counts: ADAPTER 20 / DETERMINISTIC 80
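The telemetry counters above are mutually consistent: the three event outcomes sum to the 100 telemetry events, and so do the two stage counts. A rough sketch of that cross-check follows. The dictionary layout (including stage_counts as a mapping) and the check_telemetry helper are illustrative assumptions, as is the rule that ADAPTER-stage events equal total_llm_calls, which is inferred only from the matching value of 20.

```python
def check_telemetry(telemetry: dict, telemetry_event_count: int) -> list[str]:
    """Cross-check a telemetry summary; an empty list means it is consistent."""
    problems = []

    # Every event should land in exactly one outcome bucket.
    outcomes = telemetry["events_ok"] + telemetry["events_fail"] + telemetry["events_skipped"]
    if outcomes != telemetry_event_count:
        problems.append(f"event outcomes sum to {outcomes}, expected {telemetry_event_count}")

    # Every event should also be attributed to exactly one stage.
    staged = sum(telemetry["stage_counts"].values())
    if staged != telemetry_event_count:
        problems.append(f"stage counts sum to {staged}, expected {telemetry_event_count}")

    # Assumption: only ADAPTER-stage events correspond to LLM calls.
    adapter = telemetry["stage_counts"].get("ADAPTER", 0)
    if adapter != telemetry["total_llm_calls"]:
        problems.append(f"ADAPTER count {adapter} != total_llm_calls {telemetry['total_llm_calls']}")

    return problems


# Values copied from the "Expected telemetry counters" list.
expected_telemetry = {
    "total_llm_calls": 20,
    "events_ok": 100,
    "events_fail": 0,
    "events_skipped": 0,
    "stage_counts": {"ADAPTER": 20, "DETERMINISTIC": 80},
}
print(check_telemetry(expected_telemetry, telemetry_event_count=100))  # []
```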

Non-claims

This prerelease does not claim:

  • production cost reduction proof
  • real API-cost reduction proof
  • production benchmark proof
  • full runtime-integrated benchmark evidence
  • broad workload superiority proof
  • energy reduction evidence
  • formal government validation
  • signed partner validation
  • guaranteed adoption or funding

Artifact policy

No release assets are uploaded for this prerelease, including raw generated benchmark JSON/Markdown artifacts. Reproduce generated outputs locally under /tmp or another output path of your choice.
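To reproduce the outputs under a different directory, the commands from the Highlights section can be rebuilt with another path prefix. The sketch below only assembles and prints the command lines; it does not run them. The reproduction_commands helper is hypothetical, while the subcommands, flags, and file names are copied from this release note.

```python
import shlex


def reproduction_commands(out_dir: str = "/tmp") -> list[list[str]]:
    """Build the benchmark, report, and telemetry commands for a chosen output directory."""
    base = f"{out_dir.rstrip('/')}/kora_runtime_integrated_benchmark"
    result_json = f"{base}.json"
    return [
        # 1. Run the offline benchmark and write the result JSON.
        ["python3", "-m", "kora", "run", "runtime_integrated_benchmark",
         "--", "--offline", "--json-out", result_json],
        # 2. Render the Markdown evidence report from that JSON.
        ["python3", "examples/runtime_integrated_benchmark/report.py",
         "--input", result_json, "--md-out", f"{base}.md"],
        # 3. Produce the telemetry summary in JSON and Markdown form.
        ["python3", "-m", "kora", "telemetry", "--input", result_json,
         "--json-out", f"{base}.telemetry.json", "--md-out", f"{base}.telemetry.md"],
    ]


for argv in reproduction_commands("/tmp"):
    print(shlex.join(argv))
```

Each argument list could be passed to subprocess.run(argv, check=True) from the repository root once the printed commands look right.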

KORA v0.2.0-alpha

05 May 21:23
0082f35

KORA v0.2.0-alpha Pre-release

Summary

KORA v0.2.0-alpha expands the deterministic-heavy benchmark evidence path while keeping release claims bounded to reproducible simulated benchmark evidence.

This alpha release adds deterministic expected-output correctness checks, Markdown benchmark summary generation from result JSON artifacts, and expanded correctness/error/fallback benchmark coverage. It also records the decision not to freeze or commit raw benchmark JSON artifacts for this release.

Benchmark Evidence Expansion

Current deterministic-heavy benchmark evidence:

| Metric | Value |
| --- | --- |
| Workload | experiments/workloads/deterministic_heavy_v1_100.json |
| Total tasks | 100 |
| Deterministic/no-model tasks | 80 |
| Fallback/model-candidate tasks | 20 |
| Direct-baseline simulated model invocations | 100 |
| KORA-controlled simulated model invocations | 20 |
| Avoided simulated model invocations | 80 |
| Avoided invocation rate | 80% |
| Deterministic outputs checked | 80 |
| Mismatches | 0 |
| Fallback/model-candidate skipped | 20 |

Safe claim:

In a reproducible 100-task deterministic-heavy benchmark workload, KORA-controlled execution avoided 80 of 100 simulated model invocations versus a naive direct baseline.

Included Changes

  • Deterministic expected-output correctness checks in the benchmark runner.
  • Markdown benchmark summary generation from benchmark result JSON artifacts.
  • Expanded correctness/error/fallback benchmark test coverage.
  • Raw artifact freeze decision: raw benchmark JSON artifacts are not frozen or committed for this alpha release.
  • Reproducible regeneration path through the tracked workload, generator, benchmark runner, summary generator, and documentation.
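As an illustration of the expected-output correctness checks listed above, the sketch below counts checked, mismatched, and skipped tasks. It is not the benchmark runner's actual code: the task fields (route, expected_output, actual_output) and the check_deterministic_outputs helper are hypothetical names chosen for this example.

```python
def check_deterministic_outputs(tasks: list[dict]) -> dict:
    """Compare actual and expected outputs for deterministic tasks; skip the rest."""
    checked = mismatches = skipped = 0
    for task in tasks:
        if task["route"] != "deterministic":
            # Fallback/model-candidate tasks have no deterministic expected output.
            skipped += 1
            continue
        checked += 1
        if task["actual_output"] != task["expected_output"]:
            mismatches += 1
    return {
        "deterministic_outputs_checked": checked,
        "mismatch_count": mismatches,
        "fallback_skipped": skipped,
    }


# Toy workload with the same 80/20 split reported for this release.
tasks = (
    [{"route": "deterministic", "expected_output": "4", "actual_output": "4"}] * 80
    + [{"route": "fallback"}] * 20
)
print(check_deterministic_outputs(tasks))
# {'deterministic_outputs_checked': 80, 'mismatch_count': 0, 'fallback_skipped': 20}
```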

Regeneration

See docs/reports/benchmark_artifact_policy.md for commands to regenerate the workload, benchmark result JSON files under /tmp, and the Markdown benchmark summary.
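For orientation only, the final step (turning a result dictionary into a Markdown summary) might look roughly like the sketch below. The render_summary helper and the metric labels are illustrative assumptions; the commands documented in docs/reports/benchmark_artifact_policy.md remain the reference path.

```python
def render_summary(result: dict) -> str:
    """Render a metric-to-value mapping as a two-column Markdown table."""
    lines = ["| Metric | Value |", "| --- | --- |"]
    lines += [f"| {metric} | {value} |" for metric, value in result.items()]
    return "\n".join(lines)


# Hypothetical result dictionary using values from the evidence table in these notes.
result = {
    "Total tasks": 100,
    "KORA-controlled simulated model invocations": 20,
    "Avoided simulated model invocations": 80,
    "Avoided invocation rate": "80%",
}
print(render_summary(result))
```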

Non-Claims

This release does not claim:

  • production cost reduction proof
  • real API-cost reduction proof
  • production benchmark proof
  • full runtime-integrated benchmark evidence
  • broad workload superiority proof
  • energy reduction evidence

Release Notes

  • Pre-release: yes
  • Assets uploaded: none
  • Raw benchmark JSON artifacts uploaded: none

KORA v0.1.1-alpha

03 May 19:57
0f4c761

KORA v0.1.1-alpha

Summary

KORA v0.1.1-alpha adds CI-backed validation and the first controlled benchmark skeleton. In a deterministic-heavy benchmark skeleton, KORA-controlled execution avoided 16 of 20 simulated model invocations versus a naive direct baseline.

This is an alpha maintenance / benchmark skeleton release. It keeps the v0.1.0-alpha terminal-first surface intact while adding a reproducible CI and benchmark path for future technical preview work.

What Changed Since v0.1.0-alpha

  • Added GitHub Actions CI for editable install, release smoke, and pytest validation.
  • Added the experiments/ benchmark experiment directory.
  • Added the deterministic-heavy workload draft at experiments/workloads/deterministic_heavy_v0.json.
  • Added dry-run benchmark mode for workload validation and task/category counting.
  • Added direct-baseline benchmark mode to simulate one model invocation per task.
  • Added KORA-controlled benchmark mode to simulate deterministic-first execution from workload metadata.
  • Added the 2026-05-04 progress report.
  • Added the v0.1.1-alpha release note.

CI-Backed Validation

GitHub Actions now validates changes to main with:

python3 -m pip install -e ".[dev]"
./scripts/release_smoke.sh
python3 -m pytest -q

The release candidate was also verified locally with release smoke, benchmark modes, JSON validation, and pytest.

Benchmark Skeleton

This release introduces the initial benchmark structure:

experiments/
experiments/README.md
experiments/run_benchmark.py
experiments/workloads/deterministic_heavy_v0.json
experiments/results/.gitkeep

The benchmark runner currently supports:

  • dry-run
  • direct-baseline
  • kora-controlled

These modes are simulated and metadata-based. They do not call real models or external APIs.
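The idea behind the simulated modes can be sketched in a few lines: the direct baseline counts one invocation per task, while the controlled mode counts only tasks whose metadata marks them as non-deterministic. This is a simplified illustration, not the code in experiments/run_benchmark.py; the deterministic metadata field and the simulated_invocations helper are hypothetical, and treating dry-run as zero invocations is an assumption.

```python
def simulated_invocations(tasks: list[dict], mode: str) -> int:
    """Count simulated model invocations for a benchmark mode; no real model is called."""
    if mode == "dry-run":
        return 0  # Assumption: dry-run only validates and counts tasks.
    if mode == "direct-baseline":
        return len(tasks)  # One simulated invocation per task.
    if mode == "kora-controlled":
        # Deterministic-first: only non-deterministic tasks reach the model.
        return sum(1 for task in tasks if not task["deterministic"])
    raise ValueError(f"unknown mode: {mode}")


# Toy 20-task workload: 16 deterministic tasks and 4 fallback candidates.
workload = [{"id": i, "deterministic": i < 16} for i in range(20)]
baseline = simulated_invocations(workload, "direct-baseline")
controlled = simulated_invocations(workload, "kora-controlled")
avoided = baseline - controlled
print(baseline, controlled, avoided, f"{avoided / baseline:.0%}")  # 20 4 16 80%
```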

Current Controlled Benchmark Result

| Metric | Value |
| --- | --- |
| Total tasks | 20 |
| Direct-baseline simulated model invocations | 20 |
| KORA-controlled simulated model invocations | 4 |
| Deterministic resolutions | 16 |
| Fallback candidates | 4 |
| Avoided simulated model invocations | 16 |
| Avoided invocation rate | 80% |

How To Verify Locally

./scripts/release_smoke.sh
python3 -m pytest -q
python3 experiments/run_benchmark.py --mode dry-run --workload experiments/workloads/deterministic_heavy_v0.json --output /tmp/kora_deterministic_heavy_v0.dry_run.json
python3 experiments/run_benchmark.py --mode direct-baseline --workload experiments/workloads/deterministic_heavy_v0.json --output /tmp/kora_deterministic_heavy_v0.direct_baseline.json
python3 experiments/run_benchmark.py --mode kora-controlled --workload experiments/workloads/deterministic_heavy_v0.json --output /tmp/kora_deterministic_heavy_v0.kora_controlled.json

Optional JSON checks:

python3 -m json.tool /tmp/kora_deterministic_heavy_v0.dry_run.json > /tmp/kora_dry_run_check.json
python3 -m json.tool /tmp/kora_deterministic_heavy_v0.direct_baseline.json > /tmp/kora_direct_baseline_check.json
python3 -m json.tool /tmp/kora_deterministic_heavy_v0.kora_controlled.json > /tmp/kora_kora_controlled_check.json
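The same well-formedness check can be done from Python without writing the intermediate *_check.json files. The sketch below is an alternative to the json.tool commands above, not part of the repository; validate_json_files is a hypothetical helper that reports whether each file is present and parses as JSON.

```python
import json
from pathlib import Path


def validate_json_files(paths) -> dict:
    """Map each path to 'ok', 'missing', or a short JSON parse error message."""
    results = {}
    for path in paths:
        try:
            json.loads(Path(path).read_text(encoding="utf-8"))
            results[str(path)] = "ok"
        except FileNotFoundError:
            results[str(path)] = "missing"
        except json.JSONDecodeError as exc:
            results[str(path)] = f"invalid JSON: {exc.msg} (line {exc.lineno})"
    return results


# Output paths taken from the verification commands above.
paths = [
    "/tmp/kora_deterministic_heavy_v0.dry_run.json",
    "/tmp/kora_deterministic_heavy_v0.direct_baseline.json",
    "/tmp/kora_deterministic_heavy_v0.kora_controlled.json",
]
for path, status in validate_json_files(paths).items():
    print(f"{path}: {status}")
```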

Limitations

  • The benchmark is simulated.
  • No real model calls are made.
  • No external API calls are made.
  • No real API-cost measurement is included.
  • No production cost reduction claim is supported.
  • No full KORA runtime integration is implemented yet.
  • The workload is intentionally small and deterministic-heavy.

Next Planned Work

  • Decide whether benchmark result artifacts should remain generated-only or be committed under a tracked path.
  • Add a concise Markdown summary of the benchmark result if the result should be reviewed publicly.
  • Extend the benchmark runner toward real KORA runtime integration in a later alpha.
  • Continue expanding deterministic-heavy workloads beyond the initial 20-task draft.