KORA v0.1.1-alpha

Summary

KORA v0.1.1-alpha adds CI-backed validation and the first controlled benchmark skeleton. On the simulated deterministic-heavy workload, KORA-controlled execution avoided 16 of 20 simulated model invocations relative to a naive direct baseline.

This is an alpha maintenance / benchmark skeleton release. It keeps the v0.1.0-alpha terminal-first surface intact while adding a reproducible CI and benchmark path for future technical preview work.

What Changed Since v0.1.0-alpha

Added GitHub Actions CI for editable install, release smoke, and pytest validation.
Added the experiments/ directory for benchmark experiments.
Added the deterministic-heavy workload draft at experiments/workloads/deterministic_heavy_v0.json.
Added dry-run benchmark mode for workload validation and task/category counting.
Added direct-baseline benchmark mode to simulate one model invocation per task.
Added KORA-controlled benchmark mode to simulate deterministic-first execution from workload metadata.
Added the 2026-05-04 progress report.
Added the v0.1.1-alpha release note.

CI-Backed Validation

GitHub Actions now validates changes to main with:

python3 -m pip install -e ".[dev]"
./scripts/release_smoke.sh
python3 -m pytest -q

The release candidate was also verified locally by running the release smoke script, all three benchmark modes, JSON validation of the benchmark outputs, and pytest.

Benchmark Skeleton

This release introduces the initial benchmark structure:

experiments/
experiments/README.md
experiments/run_benchmark.py
experiments/workloads/deterministic_heavy_v0.json
experiments/results/.gitkeep

The benchmark runner currently supports:

dry-run
direct-baseline
kora-controlled

These modes are simulated and metadata-based. They do not call real models or external APIs.
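
As a rough illustration of what "simulated and metadata-based" means here, the sketch below counts invocations from task metadata alone. It is not the code in experiments/run_benchmark.py: the task fields (id, category, deterministic), the simulate function, and the result keys are all hypothetical, since these notes do not show the workload schema or the runner's output format.

```python
from collections import Counter

# Hypothetical task records. The real schema of
# experiments/workloads/deterministic_heavy_v0.json is not shown in these notes.
SAMPLE_TASKS = [
    {"id": "t1", "category": "arithmetic", "deterministic": True},
    {"id": "t2", "category": "lookup", "deterministic": True},
    {"id": "t3", "category": "open_ended", "deterministic": False},
]


def simulate(tasks, mode):
    """Summarize a workload for one mode. No model or external API is called."""
    total = len(tasks)
    if mode == "dry-run":
        # Validation-style pass: count tasks and tasks per category.
        categories = Counter(task["category"] for task in tasks)
        return {"total_tasks": total, "categories": dict(categories)}
    if mode == "direct-baseline":
        # Naive baseline: one simulated model invocation per task.
        return {"total_tasks": total, "simulated_model_invocations": total}
    if mode == "kora-controlled":
        # Deterministic-first: tasks flagged deterministic are resolved
        # without a model; the rest become fallback candidates.
        deterministic = sum(1 for task in tasks if task["deterministic"])
        fallback = total - deterministic
        return {
            "total_tasks": total,
            "deterministic_resolutions": deterministic,
            "fallback_candidates": fallback,
            "simulated_model_invocations": fallback,
        }
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    for mode in ("dry-run", "direct-baseline", "kora-controlled"):
        print(mode, simulate(SAMPLE_TASKS, mode))
```

Because every count is derived from metadata flags, the result reflects how the workload is labeled rather than how any real model or runtime behaves.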

Current Controlled Benchmark Result

Metric	Value
Total tasks	20
Direct-baseline simulated model invocations	20
KORA-controlled simulated model invocations	4
Deterministic resolutions	16
Fallback candidates	4
Avoided simulated model invocations	16
Avoided invocation rate	80%
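
The avoided-invocation figures follow directly from the two invocation counts in the table. A minimal sketch of the arithmetic, using a hypothetical helper name that is not part of the benchmark runner:

```python
def avoided_invocation_rate(baseline_invocations, controlled_invocations):
    """Return (avoided count, avoided rate) relative to the baseline."""
    if baseline_invocations <= 0:
        raise ValueError("baseline_invocations must be positive")
    avoided = baseline_invocations - controlled_invocations
    return avoided, avoided / baseline_invocations


# Values taken from the table above: 20 baseline vs 4 KORA-controlled invocations.
avoided, rate = avoided_invocation_rate(20, 4)
print(avoided, f"{rate:.0%}")  # prints: 16 80%
```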

How To Verify Locally

./scripts/release_smoke.sh
python3 -m pytest -q
python3 experiments/run_benchmark.py --mode dry-run --workload experiments/workloads/deterministic_heavy_v0.json --output /tmp/kora_deterministic_heavy_v0.dry_run.json
python3 experiments/run_benchmark.py --mode direct-baseline --workload experiments/workloads/deterministic_heavy_v0.json --output /tmp/kora_deterministic_heavy_v0.direct_baseline.json
python3 experiments/run_benchmark.py --mode kora-controlled --workload experiments/workloads/deterministic_heavy_v0.json --output /tmp/kora_deterministic_heavy_v0.kora_controlled.json

Optional JSON checks:

python3 -m json.tool /tmp/kora_deterministic_heavy_v0.dry_run.json > /tmp/kora_dry_run_check.json
python3 -m json.tool /tmp/kora_deterministic_heavy_v0.direct_baseline.json > /tmp/kora_direct_baseline_check.json
python3 -m json.tool /tmp/kora_deterministic_heavy_v0.kora_controlled.json > /tmp/kora_kora_controlled_check.json
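
If a scripted check is preferred over python3 -m json.tool, the same parse test can be sketched in Python. The is_valid_json helper is hypothetical and only confirms that each file exists and parses as JSON; it assumes nothing about the keys the runner writes.

```python
import json
from pathlib import Path


def is_valid_json(path):
    """Return True if the file at `path` can be read and parsed as JSON."""
    try:
        with Path(path).open(encoding="utf-8") as handle:
            json.load(handle)
    except (OSError, ValueError):
        # OSError: missing or unreadable file.
        # ValueError: covers json.JSONDecodeError and UnicodeDecodeError.
        return False
    return True


if __name__ == "__main__":
    # Output paths match the --output arguments used in the commands above.
    for suffix in ("dry_run", "direct_baseline", "kora_controlled"):
        path = f"/tmp/kora_deterministic_heavy_v0.{suffix}.json"
        print(path, "ok" if is_valid_json(path) else "missing or invalid")
```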

Limitations

The benchmark is simulated.
No real model calls are made.
No external API calls are made.
No real API-cost measurement is included.
No production cost reduction claim is supported.
No full KORA runtime integration is implemented yet.
The workload is intentionally small and deterministic-heavy.

Next Planned Work

Decide whether benchmark result artifacts should remain generated-only or be committed under a tracked path.
Add a concise Markdown summary of the benchmark result if it should be reviewed publicly.
Extend the benchmark runner toward real KORA runtime integration in a later alpha.
Continue expanding deterministic-heavy workloads beyond the initial 20-task draft.
