KORA v0.1.1-alpha
Summary
KORA v0.1.1-alpha adds CI-backed validation and the first controlled benchmark skeleton. On its 20-task deterministic-heavy workload, KORA-controlled execution avoided 16 of 20 simulated model invocations (80%) compared with a naive direct baseline.
This is an alpha maintenance / benchmark skeleton release. It keeps the v0.1.0-alpha terminal-first surface intact while adding a reproducible CI and benchmark path for future technical preview work.
What Changed Since v0.1.0-alpha
- Added GitHub Actions CI for editable install, release smoke, and pytest validation.
- Added the experiments/ benchmark experiment directory.
- Added the deterministic-heavy workload draft at experiments/workloads/deterministic_heavy_v0.json.
- Added dry-run benchmark mode for workload validation and task/category counting.
- Added direct-baseline benchmark mode to simulate one model invocation per task.
- Added KORA-controlled benchmark mode to simulate deterministic-first execution from workload metadata.
- Added the 2026-05-04 progress report.
- Added the v0.1.1-alpha release note.
CI-Backed Validation
GitHub Actions now validates changes to main with:
python3 -m pip install -e ".[dev]"
./scripts/release_smoke.sh
python3 -m pytest -q
The release candidate was also verified locally with release smoke, benchmark modes, JSON validation, and pytest.
Benchmark Skeleton
This release introduces the initial benchmark structure:
experiments/
experiments/README.md
experiments/run_benchmark.py
experiments/workloads/deterministic_heavy_v0.json
experiments/results/.gitkeep
The benchmark runner currently supports:
- dry-run
- direct-baseline
- kora-controlled
These modes are simulated and metadata-based. They do not call real models or external APIs.
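As a rough illustration of what "simulated and metadata-based" means here, the three modes can be sketched in a few lines of Python. This is not the code in experiments/run_benchmark.py. The workload schema (a `tasks` list with `id`, `category`, and `deterministic` fields) and the function names are assumptions made for this example only.

```python
from collections import Counter

# Hypothetical workload shape; the real deterministic_heavy_v0.json may differ.
workload = {
    "tasks": [
        {"id": "t1", "category": "format", "deterministic": True},
        {"id": "t2", "category": "lookup", "deterministic": True},
        {"id": "t3", "category": "summarize", "deterministic": False},
    ]
}


def dry_run(workload):
    """Validate the workload shape and count tasks per category."""
    tasks = workload["tasks"]
    for task in tasks:
        missing = {"id", "category", "deterministic"} - task.keys()
        if missing:
            raise ValueError(f"task is missing fields: {sorted(missing)}")
    categories = Counter(task["category"] for task in tasks)
    return {"total_tasks": len(tasks), "categories": dict(categories)}


def direct_baseline(workload):
    """Simulate one model invocation per task. No model is called."""
    total = len(workload["tasks"])
    return {"total_tasks": total, "simulated_model_invocations": total}


def kora_controlled(workload):
    """Simulate deterministic-first execution from task metadata only."""
    tasks = workload["tasks"]
    deterministic = sum(1 for task in tasks if task["deterministic"])
    fallback = len(tasks) - deterministic
    return {
        "total_tasks": len(tasks),
        "deterministic_resolutions": deterministic,
        "fallback_candidates": fallback,
        # Only fallback candidates would reach a model in this sketch.
        "simulated_model_invocations": fallback,
    }


print(dry_run(workload))
print(direct_baseline(workload))
print(kora_controlled(workload))
```

In this sketch, every count is derived from the workload metadata alone, which is why the numbers describe the workload's composition and not measured runtime behavior.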
Current Controlled Benchmark Result
| Metric | Value |
|---|---|
| Total tasks | 20 |
| Direct-baseline simulated model invocations | 20 |
| KORA-controlled simulated model invocations | 4 |
| Deterministic resolutions | 16 |
| Fallback candidates | 4 |
| Avoided simulated model invocations | 16 |
| Avoided invocation rate | 80% |
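The derived rows in the table follow from the two invocation counts. A minimal Python check of that arithmetic, using only the values reported above (the variable names are illustrative):

```python
# Values copied from the result table above.
total_tasks = 20
direct_invocations = 20  # direct-baseline: one simulated invocation per task
kora_invocations = 4     # KORA-controlled: fallback candidates only

avoided = direct_invocations - kora_invocations
rate = avoided / direct_invocations

print(f"avoided={avoided}, rate={rate:.0%}")  # avoided=16, rate=80%
```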
How To Verify Locally
./scripts/release_smoke.sh
python3 -m pytest -q
python3 experiments/run_benchmark.py --mode dry-run --workload experiments/workloads/deterministic_heavy_v0.json --output /tmp/kora_deterministic_heavy_v0.dry_run.json
python3 experiments/run_benchmark.py --mode direct-baseline --workload experiments/workloads/deterministic_heavy_v0.json --output /tmp/kora_deterministic_heavy_v0.direct_baseline.json
python3 experiments/run_benchmark.py --mode kora-controlled --workload experiments/workloads/deterministic_heavy_v0.json --output /tmp/kora_deterministic_heavy_v0.kora_controlled.json
Optional JSON checks:
python3 -m json.tool /tmp/kora_deterministic_heavy_v0.dry_run.json > /tmp/kora_dry_run_check.json
python3 -m json.tool /tmp/kora_deterministic_heavy_v0.direct_baseline.json > /tmp/kora_direct_baseline_check.json
python3 -m json.tool /tmp/kora_deterministic_heavy_v0.kora_controlled.json > /tmp/kora_kora_controlled_check.json
Limitations
- The benchmark is simulated.
- No real model calls are made.
- No external API calls are made.
- No real API-cost measurement is included.
- No production cost reduction claim is supported.
- No full KORA runtime integration is implemented yet.
- The workload is intentionally small and deterministic-heavy.
Next Planned Work
- Decide whether benchmark result artifacts should remain generated-only or be committed under a tracked path.
- Add a concise Markdown summary of the benchmark results if they should be reviewed publicly.
- Extend the benchmark runner toward real KORA runtime integration in a later alpha.
- Continue expanding deterministic-heavy workloads beyond the initial 20-task draft.