07 May 18:05

hkalbertkim

078de07

v0.3.0-alpha Pre-release

KORA v0.3.0-alpha

This prerelease adds KORA's initial runtime-path benchmark evidence flow.

Highlights

Initial runtime-path benchmark harness:
- python3 -m kora run runtime_integrated_benchmark -- --offline
Runtime benchmark JSON output path:
- python3 -m kora run runtime_integrated_benchmark -- --offline --json-out /tmp/kora_runtime_integrated_benchmark.json
Markdown evidence packet/report generator:
- python3 examples/runtime_integrated_benchmark/report.py --input /tmp/kora_runtime_integrated_benchmark.json --md-out /tmp/kora_runtime_integrated_benchmark.md
Telemetry-connected summary path:
- python3 -m kora telemetry --input /tmp/kora_runtime_integrated_benchmark.json --json-out /tmp/kora_runtime_integrated_benchmark.telemetry.json --md-out /tmp/kora_runtime_integrated_benchmark.telemetry.md
Reviewer-facing reproduction guide.
Release-readiness checklist.
Docs cross-link audit and claim-boundary review.
Release validation packet and approval checkpoint.
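
The benchmark, report, and telemetry commands listed above can be chained into one reproduction script. The following is a minimal sketch, not part of the release: the helper names `build_commands` and `run_all` are hypothetical, and only the command lines themselves come from this note. The first highlighted command (without `--json-out`) is omitted because the second one is the same run with a JSON output path added.

```python
import shlex
import subprocess
from pathlib import Path


def build_commands(out_dir="/tmp"):
    """Return argv lists for the benchmark, report, and telemetry steps."""
    base = Path(out_dir) / "kora_runtime_integrated_benchmark"
    result_json = f"{base}.json"
    return [
        # 1. Offline benchmark run that writes the result JSON.
        ["python3", "-m", "kora", "run", "runtime_integrated_benchmark",
         "--", "--offline", "--json-out", result_json],
        # 2. Markdown evidence packet generated from that JSON.
        ["python3", "examples/runtime_integrated_benchmark/report.py",
         "--input", result_json, "--md-out", f"{base}.md"],
        # 3. Telemetry summary (JSON and Markdown) from the same input.
        ["python3", "-m", "kora", "telemetry", "--input", result_json,
         "--json-out", f"{base}.telemetry.json",
         "--md-out", f"{base}.telemetry.md"],
    ]


def run_all(out_dir="/tmp"):
    """Run the three steps in order; check=True stops at the first failure."""
    for argv in build_commands(out_dir):
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Print the commands only; call run_all() from a KORA checkout to execute.
    for argv in build_commands():
        print(shlex.join(argv))
```

Running the steps through `subprocess.run(..., check=True)` means a failing benchmark run never feeds a stale JSON file into the report or telemetry step.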

Current bounded benchmark claim

In a reproducible 100-task deterministic-heavy benchmark workload, KORA-controlled execution avoided 80 of 100 simulated model invocations versus a naive direct baseline.

Expected counters

total_tasks: 100
deterministic_route_count: 80
fallback_or_model_candidate_route_count: 20
simulated_baseline_model_invocations: 100
kora_controlled_model_invocations: 20
avoided_simulated_model_invocations: 80
avoided_simulated_model_invocation_rate: 0.8
deterministic_outputs_checked: 80
mismatch_count: 0
runtime_path_execution_status: ok
telemetry_event_count: 100
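
The counters above are related by simple arithmetic, which makes them easy to sanity-check. The sketch below hard-codes the values from this note into a dict instead of reading the benchmark JSON, because the layout of that file is not described here. The function name `check_counters` is hypothetical, and the relationships it tests (routes sum to the task total, avoided = baseline minus KORA-controlled, rate = avoided / baseline) are assumptions inferred from the counter names.

```python
# Values copied from the "Expected counters" list in this release note.
EXPECTED = {
    "total_tasks": 100,
    "deterministic_route_count": 80,
    "fallback_or_model_candidate_route_count": 20,
    "simulated_baseline_model_invocations": 100,
    "kora_controlled_model_invocations": 20,
    "avoided_simulated_model_invocations": 80,
    "avoided_simulated_model_invocation_rate": 0.8,
}


def check_counters(c):
    """Return a list of inconsistencies; an empty list means all checks pass."""
    problems = []
    routes = (c["deterministic_route_count"]
              + c["fallback_or_model_candidate_route_count"])
    if routes != c["total_tasks"]:
        problems.append("route counts do not sum to total_tasks")
    baseline = c["simulated_baseline_model_invocations"]
    avoided = baseline - c["kora_controlled_model_invocations"]
    if avoided != c["avoided_simulated_model_invocations"]:
        problems.append("avoided count is not baseline minus controlled")
    rate = avoided / baseline
    if abs(rate - c["avoided_simulated_model_invocation_rate"]) > 1e-9:
        problems.append("avoided rate is not avoided / baseline")
    return problems


if __name__ == "__main__":
    print(check_counters(EXPECTED))  # prints []
```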

Expected telemetry counters

total_llm_calls: 20
events_ok: 100
events_fail: 0
events_skipped: 0
stage_counts: ADAPTER 20 / DETERMINISTIC 80
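
The telemetry counters can be cross-checked against the benchmark counters in the same spirit. This is a sketch under stated assumptions: the nested-dict shape of `stage_counts` and the helper name `check_telemetry` are hypothetical. The idea that ADAPTER-stage events correspond to model-candidate calls is only suggested by the matching numbers (20 and 20), not stated in this note.

```python
# Values copied from the "Expected telemetry counters" list in this note.
# Representing stage_counts as a dict is an assumption about the JSON shape.
TELEMETRY = {
    "total_llm_calls": 20,
    "events_ok": 100,
    "events_fail": 0,
    "events_skipped": 0,
    "stage_counts": {"ADAPTER": 20, "DETERMINISTIC": 80},
}
TELEMETRY_EVENT_COUNT = 100  # telemetry_event_count from "Expected counters"


def check_telemetry(t, event_count):
    """Return a list of inconsistencies; an empty list means all checks pass."""
    problems = []
    outcomes = t["events_ok"] + t["events_fail"] + t["events_skipped"]
    if outcomes != event_count:
        problems.append("event outcomes do not sum to telemetry_event_count")
    if sum(t["stage_counts"].values()) != event_count:
        problems.append("stage counts do not sum to telemetry_event_count")
    # Assumption: every ADAPTER-stage event is one model-candidate call.
    if t["stage_counts"].get("ADAPTER", 0) != t["total_llm_calls"]:
        problems.append("ADAPTER stage count differs from total_llm_calls")
    return problems


if __name__ == "__main__":
    print(check_telemetry(TELEMETRY, TELEMETRY_EVENT_COUNT))  # prints []
```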

Non-claims

This prerelease does not claim:

production cost reduction proof
real API-cost reduction proof
production benchmark proof
full runtime-integrated benchmark evidence
broad workload superiority proof
energy reduction evidence
formal government validation
signed partner validation
guaranteed adoption or funding

Artifact policy

No release assets are uploaded for this prerelease. Raw generated benchmark JSON/Markdown artifacts are not uploaded. Generated outputs should be reproduced locally in /tmp or another user-provided output path.

05 May 21:23

hkalbertkim

v0.2.0-alpha

0082f35

KORA v0.2.0-alpha Pre-release

Summary

KORA v0.2.0-alpha expands the deterministic-heavy benchmark evidence path while keeping release claims bounded to reproducible simulated benchmark evidence.

This alpha release adds deterministic expected-output correctness checks, benchmark Markdown summary generation from result JSON artifacts, expanded correctness/error/fallback benchmark coverage, and a raw artifact freeze decision for this release.

Benchmark Evidence Expansion

Current deterministic-heavy benchmark evidence:

Metric	Value
Workload	`experiments/workloads/deterministic_heavy_v1_100.json`
Total tasks	`100`
Deterministic/no-model tasks	`80`
Fallback/model-candidate tasks	`20`
Direct-baseline simulated model invocations	`100`
KORA-controlled simulated model invocations	`20`
Avoided simulated model invocations	`80`
Avoided invocation rate	`80%`
Deterministic outputs checked	`80`
Mismatches	`0`
Fallback/model-candidate skipped	`20`

Safe claim:

In a reproducible 100-task deterministic-heavy benchmark workload, KORA-controlled execution avoided 80 of 100 simulated model invocations versus a naive direct baseline.

Included Changes

Deterministic expected-output correctness checks in the benchmark runner.
Markdown benchmark summary generation from benchmark result JSON artifacts.
Expanded correctness/error/fallback benchmark test coverage.
Raw artifact freeze decision: raw benchmark JSON artifacts are not frozen or committed for this alpha release.
Reproducible regeneration path through the tracked workload, generator, benchmark runner, summary generator, and documentation.

Regeneration

See docs/reports/benchmark_artifact_policy.md for commands to regenerate the workload, benchmark result JSON files under /tmp, and the Markdown benchmark summary.

Non-Claims

This release does not claim:

production cost reduction proof
real API-cost reduction proof
production benchmark proof
full runtime-integrated benchmark evidence
broad workload superiority proof
energy reduction evidence

Release Notes

Pre-release: yes
Assets uploaded: none
Raw benchmark JSON artifacts uploaded: none

03 May 19:57

hkalbertkim

v0.1.1-alpha

0f4c761

KORA v0.1.1-alpha Latest

KORA v0.1.1-alpha

Summary

KORA v0.1.1-alpha adds CI-backed validation and the first controlled benchmark skeleton. In a deterministic-heavy benchmark skeleton, KORA-controlled execution avoided 16 of 20 simulated model invocations versus a naive direct baseline.

This is an alpha maintenance / benchmark skeleton release. It keeps the v0.1.0-alpha terminal-first surface intact while adding a reproducible CI and benchmark path for future technical preview work.

What Changed Since v0.1.0-alpha

Added GitHub Actions CI for editable install, release smoke, and pytest validation.
Added the experiments/ benchmark experiment directory.
Added the deterministic-heavy workload draft at experiments/workloads/deterministic_heavy_v0.json.
Added dry-run benchmark mode for workload validation and task/category counting.
Added direct-baseline benchmark mode to simulate one model invocation per task.
Added KORA-controlled benchmark mode to simulate deterministic-first execution from workload metadata.
Added the 2026-05-04 progress report.
Added the v0.1.1-alpha release note.

CI-Backed Validation

GitHub Actions now validates changes to main with:

python3 -m pip install -e ".[dev]"
./scripts/release_smoke.sh
python3 -m pytest -q

The release candidate was also verified locally with release smoke, benchmark modes, JSON validation, and pytest.

Benchmark Skeleton

This release introduces the initial benchmark structure:

experiments/
experiments/README.md
experiments/run_benchmark.py
experiments/workloads/deterministic_heavy_v0.json
experiments/results/.gitkeep

The benchmark runner currently supports:

dry-run
direct-baseline
kora-controlled

These modes are simulated and metadata-based. They do not call real models or external APIs.

Current Controlled Benchmark Result

Metric	Value
Total tasks	20
Direct-baseline simulated model invocations	20
KORA-controlled simulated model invocations	4
Deterministic resolutions	16
Fallback candidates	4
Avoided simulated model invocations	16
Avoided invocation rate	80%
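
To illustrate how the three simulated modes could produce the numbers in the table above, here is a minimal sketch. It is not the code in `experiments/run_benchmark.py`: the task schema (an `id` and a `category` field), the function names, and the synthetic 16/4 workload are all assumptions made for illustration. Like the real modes, nothing here calls a model or an external API.

```python
from collections import Counter


def make_sample_workload(deterministic=16, fallback=4):
    """Build a synthetic workload; the task schema here is hypothetical."""
    tasks = [{"id": f"det-{i}", "category": "deterministic"}
             for i in range(deterministic)]
    tasks += [{"id": f"fb-{i}", "category": "fallback"}
              for i in range(fallback)]
    return tasks


def dry_run(tasks):
    """Count tasks per category without simulating any invocation."""
    return Counter(task["category"] for task in tasks)


def direct_baseline(tasks):
    """Naive baseline: one simulated model invocation per task."""
    return len(tasks)


def kora_controlled(tasks):
    """Deterministic-first: only non-deterministic tasks reach the model."""
    return sum(1 for task in tasks if task["category"] != "deterministic")


if __name__ == "__main__":
    tasks = make_sample_workload()
    baseline = direct_baseline(tasks)
    controlled = kora_controlled(tasks)
    avoided = baseline - controlled
    # prints: 20 4 16 80%
    print(baseline, controlled, avoided, f"{avoided / baseline:.0%}")
```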

How To Verify Locally

./scripts/release_smoke.sh
python3 -m pytest -q
python3 experiments/run_benchmark.py --mode dry-run --workload experiments/workloads/deterministic_heavy_v0.json --output /tmp/kora_deterministic_heavy_v0.dry_run.json
python3 experiments/run_benchmark.py --mode direct-baseline --workload experiments/workloads/deterministic_heavy_v0.json --output /tmp/kora_deterministic_heavy_v0.direct_baseline.json
python3 experiments/run_benchmark.py --mode kora-controlled --workload experiments/workloads/deterministic_heavy_v0.json --output /tmp/kora_deterministic_heavy_v0.kora_controlled.json

Optional JSON checks:

python3 -m json.tool /tmp/kora_deterministic_heavy_v0.dry_run.json > /tmp/kora_dry_run_check.json
python3 -m json.tool /tmp/kora_deterministic_heavy_v0.direct_baseline.json > /tmp/kora_direct_baseline_check.json
python3 -m json.tool /tmp/kora_deterministic_heavy_v0.kora_controlled.json > /tmp/kora_kora_controlled_check.json
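
The same parse check can be done in one pass from Python, which reports a status per file instead of writing pretty-printed copies. This is a sketch only; `validate_json_files` is a hypothetical helper, and the paths are the `--output` paths from the verification commands above.

```python
import json

PATHS = [
    "/tmp/kora_deterministic_heavy_v0.dry_run.json",
    "/tmp/kora_deterministic_heavy_v0.direct_baseline.json",
    "/tmp/kora_deterministic_heavy_v0.kora_controlled.json",
]


def validate_json_files(paths):
    """Map each path to 'ok', 'missing', or a short parse-error message."""
    results = {}
    for path in paths:
        try:
            with open(path, encoding="utf-8") as fh:
                json.load(fh)  # raises JSONDecodeError on malformed input
            results[path] = "ok"
        except FileNotFoundError:
            results[path] = "missing"
        except json.JSONDecodeError as exc:
            results[path] = f"invalid JSON: {exc.msg} (line {exc.lineno})"
    return results


if __name__ == "__main__":
    for path, status in validate_json_files(PATHS).items():
        print(f"{path}: {status}")
```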

Limitations

The benchmark is simulated.
No real model calls are made.
No external API calls are made.
No real API-cost measurement is included.
No production cost reduction claim is supported.
No full KORA runtime integration is implemented yet.
The workload is intentionally small and deterministic-heavy.

Next Planned Work

Decide whether benchmark result artifacts should remain generated-only or be committed under a tracked path.
Add a concise benchmark result summary markdown if the result should be reviewed publicly.
Extend the benchmark runner toward real KORA runtime integration in a later alpha.
Continue expanding deterministic-heavy workloads beyond the initial 20-task draft.

Releases: Krako-Labs/KORA