fix(slots): correct kokoro/moonshine state probe + add idle state (closes #30 #31)#50

Merged
thinmintdev merged 1 commit into main from fix/issue-30-31-slot-state
May 17, 2026

@thinmintdev
Contributor

Summary

  • Slot state machine reports offline while kokoro/moonshine slots are serving #30 (HIGH): SlotManager.status() only reconciled one direction — READY/SERVING/IDLE → ERROR when systemd had died. The inverse drift was unreconciled, so a kokoro / moonshine slot started outside load() (e.g. via boot-time systemctl enable --now or an external orchestrator) was reported as offline indefinitely even though the dispatcher was routing traffic to it. Adds a one-shot _probe_once health probe; status() now adopts running-but-OFFLINE slots into ready (model loaded) or idle (no model) via a forced transition.
  • Add idle slot state for process-up-no-model containers #31 (MEDIUM): _await_ready previously had only two terminal outcomes — READY or SlotHealthFailed → ERROR. A llama-server launched with --model "" either timed out, or worse, for /health-only providers got marked ready despite serving zero models. _await_ready now returns a SlotState; after a short stabilisation window an alive-but-empty upstream resolves to idle instead. The /health-based probe also pairs /health with /v1/models so a llama-server with no model loaded lands in idle.
  • Adds WARMING → IDLE to LEGAL_TRANSITIONS and documents the resulting nine-state lifecycle in ARCHITECTURE.md §State.
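
The `_await_ready` behaviour summarised above can be sketched roughly as follows. This is a minimal illustration, not the code in src/hal0/slots/manager.py: the names `classify_probe` and `await_ready`, the probe callable returning `(health_ok, models)`, the default timings, and the injectable `clock`/`sleep` parameters are all assumptions made for the example. `SlotState` and `SlotHealthFailed` are named in this PR but are redefined here so the block runs on its own.

```python
import enum
import time
from typing import Callable, Optional, Sequence, Tuple


class SlotState(enum.Enum):
    READY = "ready"
    IDLE = "idle"


class SlotHealthFailed(RuntimeError):
    """Stand-in for the exception the PR says leads to ERROR."""


def classify_probe(health_ok: bool, models: Sequence[str]) -> Optional[SlotState]:
    """Map one probe result to a state; None means the upstream is not up yet."""
    if not health_ok:
        return None
    # Pairing /health with /v1/models: an empty model list means idle, not ready.
    return SlotState.READY if models else SlotState.IDLE


def await_ready(
    probe: Callable[[], Tuple[bool, Sequence[str]]],
    timeout_s: float = 30.0,          # assumed default
    idle_stabilise_s: float = 2.0,    # assumed stand-in for _IDLE_STABILISE_S
    poll_s: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> SlotState:
    """Poll until the slot is READY, or has stayed alive-but-empty long enough."""
    deadline = clock() + timeout_s
    first_alive: Optional[float] = None
    while clock() < deadline:
        state = classify_probe(*probe())
        if state is SlotState.READY:
            return SlotState.READY
        if state is SlotState.IDLE:
            now = clock()
            if first_alive is None:
                first_alive = now
            # Only settle on IDLE once the empty upstream has been stable.
            if now - first_alive >= idle_stabilise_s:
                return SlotState.IDLE
        else:
            first_alive = None  # upstream dropped; restart the window
        sleep(poll_s)
    raise SlotHealthFailed("upstream never became healthy")
```

Injecting `clock` and `sleep` keeps the sketch testable without real waiting; whether the real implementation does this is not stated in the PR.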

Root cause for #30

SlotManager.status() (src/hal0/slots/manager.py) handled only one drift direction: it would transition READY/SERVING/IDLE → ERROR when systemctl is-active reported the unit dead. The inverse case, where state.json says OFFLINE but the unit is up, was silently dropped. Because state.json is written only by load(), any slot whose container was started outside that path was stuck at OFFLINE forever in /api/slots. This is most visible on kokoro/moonshine because those slots are commonly pre-started for testing without going through the hal0 lifecycle.
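
To make the two drift directions concrete, here is a small sketch of a bidirectional reconciler as a pure function. It is an illustration under assumptions, not the actual `status()` code: the function name `reconcile`, its parameters, and the two state sets are hypothetical, and only the states named in this PR are modelled. The adoptable set follows the commit message (OFFLINE/ERROR).

```python
import enum
from typing import Optional


class SlotState(enum.Enum):
    OFFLINE = "offline"
    WARMING = "warming"
    READY = "ready"
    SERVING = "serving"
    IDLE = "idle"
    ERROR = "error"


# States that claim a live process (old, one-directional check).
LIVE_STATES = frozenset({SlotState.READY, SlotState.SERVING, SlotState.IDLE})
# States from which a running unit may be adopted (new, inverse check).
ADOPTABLE_STATES = frozenset({SlotState.OFFLINE, SlotState.ERROR})


def reconcile(
    recorded: SlotState,
    unit_active: bool,
    probe_result: Optional[SlotState],
) -> SlotState:
    """Return the state to report given on-disk state and observed reality.

    recorded     -- state read from state.json
    unit_active  -- whether systemd reports the unit as active
    probe_result -- READY/IDLE from a one-shot probe, or None if it failed
    """
    # Direction 1: state says live, but the unit has died.
    if recorded in LIVE_STATES and not unit_active:
        return SlotState.ERROR
    # Direction 2: state says down, but the unit is up and answers the probe.
    if (
        recorded in ADOPTABLE_STATES
        and unit_active
        and probe_result in (SlotState.READY, SlotState.IDLE)
    ):
        return probe_result
    # No drift detected (or probe inconclusive): keep the recorded state.
    return recorded
```

For example, `reconcile(SlotState.OFFLINE, True, SlotState.READY)` yields `SlotState.READY`, which corresponds to the pre-started kokoro case described above.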

Files changed

  • src/hal0/slots/state.py: added WARMING → IDLE to LEGAL_TRANSITIONS; broadened the IDLE docstring to cover both "process-up-no-model" and "warm but quiet".
  • src/hal0/slots/manager.py: _await_ready returns a SlotState; new _probe_once one-shot health probe; new _maybe_adopt_running_slot helper; status() runs the bidirectional reconciler; _IDLE_STABILISE_S tunable for the warming→idle window.
  • ARCHITECTURE.md: new "Slot lifecycle state machine" subsection in §State.
  • tests/slots/test_adoption_and_idle.py: 15 new tests covering both fixes.
  • tests/slots/conftest.py, tests/api/test_slots_routes.py: updated stub_await_ready to return SlotState.READY (was a void function).
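
As a rough picture of how a transition table and a forced transition fit together, consider the sketch below. Only the WARMING → IDLE edge is confirmed by this PR; every other edge, the `transition` helper, its `force` flag, and the `IllegalTransition` exception are assumptions for illustration, and the table covers only a subset of the nine states documented in ARCHITECTURE.md.

```python
import enum


class SlotState(enum.Enum):
    OFFLINE = "offline"
    WARMING = "warming"
    READY = "ready"
    IDLE = "idle"
    ERROR = "error"


class IllegalTransition(ValueError):
    """Hypothetical error for edges missing from the table."""


# Partial, assumed table. The WARMING -> IDLE edge is the one this PR adds.
LEGAL_TRANSITIONS = {
    SlotState.OFFLINE: frozenset({SlotState.WARMING}),
    SlotState.WARMING: frozenset(
        {SlotState.READY, SlotState.IDLE, SlotState.ERROR}
    ),
    SlotState.READY: frozenset({SlotState.ERROR}),
    SlotState.IDLE: frozenset({SlotState.ERROR}),
}


def transition(
    current: SlotState, target: SlotState, *, force: bool = False
) -> SlotState:
    """Validate an edge against the table; force=True bypasses the check.

    A forced transition is how an adopted slot could jump straight from
    OFFLINE to READY or IDLE without passing through WARMING.
    """
    allowed = LEGAL_TRANSITIONS.get(current, frozenset())
    if not force and target not in allowed:
        raise IllegalTransition(f"{current.value} -> {target.value}")
    return target
```

With this table, `transition(SlotState.WARMING, SlotState.IDLE)` succeeds, while `transition(SlotState.OFFLINE, SlotState.READY)` raises unless `force=True` is passed.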

Test plan

  • pytest tests/slots/ — 70 pass, 3 integration skipped (no hal0-slot@.service template on this host)
  • pytest tests/ — 667 pass, 6 skipped (3 integration + 3 unrelated)
  • ruff check src/hal0/slots/ tests/slots/ — clean
  • ruff format --check — clean
  • Run bash scripts/harness.sh and confirm new slot-state rows pass (deferred: harness requires the wider hal0 install / toolbox images)
  • On hal0-test LXC (10.0.1.230): curl :8080/api/slots and confirm kokoro reports ready (or idle if no model is loaded), not offline

Acceptance criteria

#30:

  • After a successful health probe to a kokoro slot, /api/slots/<name>.state reports ready — covered by test_status_adopts_running_slot_with_offline_state + test_probe_once_kokoro_ready
  • After a successful health probe to a moonshine slot, /api/slots/<name>.state reports ready — covered by test_probe_once_moonshine_model_not_loaded_returns_idle (and the _await_ready moonshine test for the happy path)
  • Unit test exercises the state writer for each non-llama provider type — test_await_ready_returns_*_for_{kokoro,moonshine} + test_probe_once_*
  • β-tier integration test: POST to /v1/audio/speech + check state — deferred to release-gate
  • β-tier integration test: POST to /v1/audio/transcriptions + check state — deferred to release-gate

#31:

  • Slot state vocabulary includes idle distinct from ready and offline — was already present; broadened semantics
  • A slot with --model "" reports idle, not ready: covered by test_load_lands_in_idle_when_no_model + test_await_ready_returns_idle_when_*
  • A slot with a model loaded and /v1/models non-empty reports ready — happy-path tests pass
  • Dashboard slot card distinguishes idle from ready visually — already in place (SlotCard.vue has distinct sc-state-idle CSS, badge text renders status string verbatim)
  • State-transition unit tests cover the idle ↔ ready edges — test_warming_to_idle_is_a_legal_transition + existing state_transitions tests
  • ARCHITECTURE.md reflects the new state — added a §State subsection with the full lifecycle diagram

🤖 Generated with Claude Code

…oses #30 #31)

#30: ``SlotManager.status()`` only reconciled one direction —
READY/SERVING/IDLE → ERROR when the systemd unit had died.  It never
reconciled the inverse drift, so a kokoro / moonshine slot whose
container was started outside the hal0 ``load()`` lifecycle (e.g.
``systemctl enable --now`` on boot, or by an external orchestrator)
sat at OFFLINE forever in ``/api/slots`` even while the dispatcher
routed traffic to it.  Add a one-shot ``_probe_once`` health probe and
call it from ``status()`` whenever the on-disk state is OFFLINE/ERROR
but the unit is active.  When the probe converges the slot is adopted
into READY (model loaded) or IDLE (no model) via a forced transition.

#31: ``ready`` used to be the only successful resolution of ``load()``,
so a llama-server launched with ``--model ""`` either timed out in
WARMING and landed in ERROR, or — for ``/health``-only providers —
got marked READY despite serving zero models.  Either way the
dispatcher treated the slot as routable and every inference request
4xx'd.  Broaden ``_await_ready`` to return a ``SlotState`` (READY or
IDLE); after a short stabilisation window an alive-but-empty upstream
resolves to IDLE.  Pair ``/health`` with ``/v1/models`` for the
llama-server / kokoro strategy so an empty model list lands in IDLE
not READY.  Add WARMING → IDLE to ``LEGAL_TRANSITIONS``.

Documents the resulting nine-state lifecycle in ARCHITECTURE.md §State.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
thinmintdev added a commit that referenced this pull request May 16, 2026
Format-only — unblocks CI for #45/#46/#47/#48/#49/#50.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@thinmintdev thinmintdev merged commit 81eaba3 into main May 17, 2026
3 of 6 checks passed
@thinmintdev thinmintdev deleted the fix/issue-30-31-slot-state branch May 21, 2026 20:11