KORA v0.1.1-alpha

@hkalbertkim hkalbertkim released this 03 May 19:57
0f4c761

Summary

KORA v0.1.1-alpha adds CI-backed validation and the first controlled benchmark skeleton. On the deterministic-heavy workload, simulated KORA-controlled execution avoided 16 of 20 simulated model invocations (80%) compared with a naive direct baseline that invokes a model once per task.

This is an alpha maintenance / benchmark skeleton release. It keeps the v0.1.0-alpha terminal-first surface intact while adding a reproducible CI and benchmark path for future technical preview work.

What Changed Since v0.1.0-alpha

  • Added GitHub Actions CI for editable install, release smoke, and pytest validation.
  • Added the experiments/ directory for benchmark experiments.
  • Added the deterministic-heavy workload draft at experiments/workloads/deterministic_heavy_v0.json.
  • Added dry-run benchmark mode for workload validation and task/category counting.
  • Added direct-baseline benchmark mode to simulate one model invocation per task.
  • Added KORA-controlled benchmark mode to simulate deterministic-first execution from workload metadata.
  • Added the 2026-05-04 progress report.
  • Added the v0.1.1-alpha release note.

CI-Backed Validation

GitHub Actions now validates changes to main with:

python3 -m pip install -e ".[dev]"
./scripts/release_smoke.sh
python3 -m pytest -q

The release candidate was also verified locally by running the release smoke script, all three benchmark modes, JSON validation of the benchmark outputs, and pytest.

Benchmark Skeleton

This release introduces the initial benchmark structure:

experiments/
experiments/README.md
experiments/run_benchmark.py
experiments/workloads/deterministic_heavy_v0.json
experiments/results/.gitkeep

The benchmark runner currently supports:

  • dry-run
  • direct-baseline
  • kora-controlled

These modes are simulated and metadata-based. They do not call real models or external APIs.
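
As an illustration of what "simulated and metadata-based" can mean here, the following is a minimal sketch, not the code in experiments/run_benchmark.py. It assumes each task record carries a boolean flag saying whether it can be resolved without a model. The field name `deterministic` and the function name `count_simulated_invocations` are hypothetical, and the toy workload is built in memory rather than read from the real workload file.

```python
from typing import Iterable, Mapping

MODES = ("dry-run", "direct-baseline", "kora-controlled")


def count_simulated_invocations(tasks: Iterable[Mapping], mode: str) -> int:
    """Count simulated model invocations for a list of task records.

    Assumption: each task has a boolean "deterministic" field. This name is
    hypothetical; the real workload schema may differ.
    """
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    tasks = list(tasks)
    if mode == "dry-run":
        # Validation and counting only; no invocation is simulated.
        return 0
    if mode == "direct-baseline":
        # One simulated invocation per task.
        return len(tasks)
    # kora-controlled: only tasks not marked deterministic fall back to a model.
    return sum(1 for task in tasks if not task.get("deterministic", False))


# Toy in-memory workload: 20 tasks, 16 of them flagged deterministic.
toy_tasks = [{"id": i, "deterministic": i < 16} for i in range(20)]
print(count_simulated_invocations(toy_tasks, "direct-baseline"))  # 20
print(count_simulated_invocations(toy_tasks, "kora-controlled"))  # 4
```

No network or model call appears anywhere in the sketch; every count is derived from the task metadata alone, which is the property the paragraph above describes.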

Current Controlled Benchmark Result

Metric                                        Value
Total tasks                                   20
Direct-baseline simulated model invocations   20
KORA-controlled simulated model invocations   4
Deterministic resolutions                     16
Fallback candidates                           4
Avoided simulated model invocations           16
Avoided invocation rate                       80%
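
The last two rows follow arithmetically from the invocation counts above them. A short sketch of that derivation, using a hypothetical helper name (`avoided_invocation_rate`) that is not part of the benchmark runner:

```python
def avoided_invocation_rate(baseline: int, controlled: int) -> float:
    """Fraction of baseline invocations that the controlled run avoided."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - controlled) / baseline


baseline_invocations = 20   # direct-baseline: one simulated invocation per task
controlled_invocations = 4  # kora-controlled: fallback candidates only

avoided = baseline_invocations - controlled_invocations
rate = avoided_invocation_rate(baseline_invocations, controlled_invocations)
print(avoided, f"{rate:.0%}")  # 16 80%
```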

How To Verify Locally

./scripts/release_smoke.sh
python3 -m pytest -q
python3 experiments/run_benchmark.py --mode dry-run --workload experiments/workloads/deterministic_heavy_v0.json --output /tmp/kora_deterministic_heavy_v0.dry_run.json
python3 experiments/run_benchmark.py --mode direct-baseline --workload experiments/workloads/deterministic_heavy_v0.json --output /tmp/kora_deterministic_heavy_v0.direct_baseline.json
python3 experiments/run_benchmark.py --mode kora-controlled --workload experiments/workloads/deterministic_heavy_v0.json --output /tmp/kora_deterministic_heavy_v0.kora_controlled.json

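Optional JSON checks (each command fails with a non-zero exit status if the corresponding output file is not valid JSON):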

python3 -m json.tool /tmp/kora_deterministic_heavy_v0.dry_run.json > /tmp/kora_dry_run_check.json
python3 -m json.tool /tmp/kora_deterministic_heavy_v0.direct_baseline.json > /tmp/kora_direct_baseline_check.json
python3 -m json.tool /tmp/kora_deterministic_heavy_v0.kora_controlled.json > /tmp/kora_kora_controlled_check.json

Limitations

  • The benchmark is simulated.
  • No real model calls are made.
  • No external API calls are made.
  • No real API-cost measurement is included.
  • The results do not support any production cost-reduction claim.
  • No full KORA runtime integration is implemented yet.
  • The workload is intentionally small and deterministic-heavy.

Next Planned Work

  • Decide whether benchmark result artifacts should remain generated-only or be committed under a tracked path.
  • Add a concise Markdown summary of the benchmark results if they should be reviewed publicly.
  • Extend the benchmark runner toward real KORA runtime integration in a later alpha.
  • Continue expanding deterministic-heavy workloads beyond the initial 20-task draft.