fix(slots): correct kokoro/moonshine state probe + add idle state (closes #30 #31) #50
Merged
…oses #30 #31)

#30: ``SlotManager.status()`` only reconciled one direction: READY/SERVING/IDLE → ERROR when the systemd unit had died. It never reconciled the inverse drift, so a kokoro / moonshine slot whose container was started outside the hal0 ``load()`` lifecycle (e.g. ``systemctl enable --now`` on boot, or by an external orchestrator) sat at OFFLINE forever in ``/api/slots`` even while the dispatcher routed traffic to it.

Add a one-shot ``_probe_once`` health probe and call it from ``status()`` whenever the on-disk state is OFFLINE/ERROR but the unit is active. When the probe converges the slot is adopted into READY (model loaded) or IDLE (no model) via a forced transition.

#31: ``ready`` used to be the only successful resolution of ``load()``, so a llama-server launched with ``--model ""`` either timed out in WARMING and landed in ERROR, or (for ``/health``-only providers) got marked READY despite serving zero models. Either way the dispatcher treated the slot as routable and every inference request 4xx'd.

Broaden ``_await_ready`` to return a ``SlotState`` (READY or IDLE); after a short stabilisation window an alive-but-empty upstream resolves to IDLE. Pair ``/health`` with ``/v1/models`` for the llama-server / kokoro strategy so an empty model list lands in IDLE not READY. Add WARMING → IDLE to ``LEGAL_TRANSITIONS``.

Documents the resulting nine-state lifecycle in ARCHITECTURE.md §State.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
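The two drift directions described in the commit message can be sketched as a pure function. This is only an illustration, not the code in `src/hal0/slots/manager.py`: the `reconcile` function, the `LIVE_STATES` / `DOWN_STATES` sets, and the reduced `SlotState` enum are hypothetical names, and the probe is assumed to return `None` when it does not converge.

```python
from enum import Enum
from typing import Callable, Optional


class SlotState(Enum):
    # Hypothetical subset of the lifecycle; the real enum has more states.
    OFFLINE = "offline"
    WARMING = "warming"
    READY = "ready"
    SERVING = "serving"
    IDLE = "idle"
    ERROR = "error"


# Assumed groupings, taken from the states named in the commit message.
LIVE_STATES = {SlotState.READY, SlotState.SERVING, SlotState.IDLE}
DOWN_STATES = {SlotState.OFFLINE, SlotState.ERROR}


def reconcile(
    recorded: SlotState,
    unit_active: bool,
    probe_once: Callable[[], Optional[SlotState]],
) -> SlotState:
    """Return the state a status call should report after reconciling drift."""
    # Direction 1 (pre-existing behaviour): recorded as live, but the unit died.
    if recorded in LIVE_STATES and not unit_active:
        return SlotState.ERROR
    # Direction 2 (the fix for #30): recorded as down, but the unit is running.
    if recorded in DOWN_STATES and unit_active:
        adopted = probe_once()  # assumed to return None if the probe fails
        if adopted in (SlotState.READY, SlotState.IDLE):
            return adopted
    # No drift, or the probe did not converge: keep the recorded state.
    return recorded
```

Passing the probe in as a callable keeps the sketch testable without a running unit; the actual helper may be structured differently.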
thinmintdev added a commit that referenced this pull request on May 16, 2026
Summary
- `SlotManager.status()` only reconciled one direction: `READY/SERVING/IDLE → ERROR` when the systemd unit had died. The inverse drift was unreconciled, so a kokoro / moonshine slot started outside `load()` (e.g. via boot-time `systemctl enable --now` or an external orchestrator) was reported as `offline` indefinitely even though the dispatcher was routing traffic to it. Adds a one-shot `_probe_once` health probe; `status()` now adopts running-but-OFFLINE slots into `ready` (model loaded) or `idle` (no model) via a forced transition.
- `_await_ready` previously had only two terminal outcomes: `READY` or `SlotHealthFailed → ERROR`. A llama-server launched with `--model ""` either timed out or, worse, for `/health`-only providers got marked `ready` despite serving zero models. `_await_ready` now returns a `SlotState`; after a short stabilisation window an alive-but-empty upstream resolves to `idle` instead. The `/health`-based probe also pairs `/health` with `/v1/models` so a llama-server with no model loaded lands in `idle`.
- Adds `WARMING → IDLE` to `LEGAL_TRANSITIONS` and documents the resulting nine-state lifecycle in `ARCHITECTURE.md` §State.

Root cause for #30
`SlotManager.status()` (`src/hal0/slots/manager.py`) only handled one drift direction: it would transition `READY/SERVING/IDLE → ERROR` when `systemctl is-active` reported the unit dead. The inverse case, where `state.json` says OFFLINE but the unit is up, was silently dropped. `state.json` is only written by `load()`, so any slot whose container was started outside that path was stuck at OFFLINE forever in `/api/slots`. This was most visible on kokoro/moonshine because those slots are commonly pre-started for testing without going through the hal0 lifecycle.

Files changed
- `src/hal0/slots/state.py`: added `WARMING → IDLE` to `LEGAL_TRANSITIONS`, broadened the `IDLE` docstring to cover both "process-up-no-model" and "warm but quiet".
- `src/hal0/slots/manager.py`: `_await_ready` returns a `SlotState`; new `_probe_once` one-shot health probe; new `_maybe_adopt_running_slot` helper; `status()` runs the bidirectional reconciler; `_IDLE_STABILISE_S` tunable for the warming→idle window.
- `ARCHITECTURE.md`: new "Slot lifecycle state machine" subsection in §State.
- `tests/slots/test_adoption_and_idle.py`: 15 new tests covering both fixes.
- `tests/slots/conftest.py`, `tests/api/test_slots_routes.py`: updated `stub_await_ready` to return `SlotState.READY` (was a void function).

Test plan
- `pytest tests/slots/`: 70 pass, 3 integration skipped (no `hal0-slot@.service` template on this host)
- `pytest tests/`: 667 pass, 6 skipped (3 integration + 3 unrelated)
- `ruff check src/hal0/slots/ tests/slots/`: clean
- `ruff format --check`: clean
- `bash scripts/harness.sh` and confirm new slot-state rows pass (deferred: harness requires the wider hal0 install / toolbox images)
- `curl :8080/api/slots` and confirm kokoro reports `ready` (or `idle` if no model is loaded), not `offline`

Acceptance criteria
#30:
- `/api/slots/<name>.state` reports `ready`: covered by `test_status_adopts_running_slot_with_offline_state` + `test_probe_once_kokoro_ready`
- `/api/slots/<name>.state` reports `ready`: covered by `test_probe_once_moonshine_model_not_loaded_returns_idle` (and the `_await_ready` moonshine test for the happy path)
- `test_await_ready_returns_*_for_{kokoro,moonshine}` + `test_probe_once_*`
- `/v1/audio/speech` + check state: deferred to release-gate
- `/v1/audio/transcriptions` + check state: deferred to release-gate

#31:
- `idle` distinct from `ready` and `offline`: was already present; broadened semantics
- `--model ""` reports `idle`, not `ready`: `test_load_lands_in_idle_when_no_model` + `test_await_ready_returns_idle_when_*`
- `/v1/models` non-empty reports `ready`: happy-path tests pass
- `idle` from `ready` visually: already in place (`SlotCard.vue` has distinct `sc-state-idle` CSS, badge text renders status string verbatim)
- `idle ↔ ready` edges: `test_warming_to_idle_is_a_legal_transition` + existing state_transitions tests
- `ARCHITECTURE.md` reflects the new state: added a §State subsection with the full lifecycle diagram

🤖 Generated with Claude Code
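For readers who want a concrete picture of the #31 behaviour, the following is a minimal sketch of a readiness loop that resolves to READY or IDLE. It is not the project's `_await_ready`: the `await_ready` signature, the `poll` callable (assumed to return whether `/health` succeeded plus the model ids from `/v1/models`), the injected `now` / `sleep` hooks, and the use of `TimeoutError` in place of `SlotHealthFailed` are all assumptions made for illustration.

```python
import time
from enum import Enum
from typing import Callable, List, Tuple


class SlotState(Enum):
    # Hypothetical subset of the lifecycle states.
    WARMING = "warming"
    READY = "ready"
    IDLE = "idle"


# Only the edges relevant here; WARMING -> IDLE is the one this PR adds.
LEGAL_TRANSITIONS = {SlotState.WARMING: {SlotState.READY, SlotState.IDLE}}


def await_ready(
    poll: Callable[[], Tuple[bool, List[str]]],
    *,
    timeout_s: float,
    idle_stabilise_s: float,
    interval_s: float = 0.5,
    now: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> SlotState:
    """Poll until the upstream is READY, or alive-but-empty long enough for IDLE."""
    deadline = now() + timeout_s
    alive_since = None  # first moment the upstream answered with no models
    while now() < deadline:
        health_ok, model_ids = poll()
        if health_ok and model_ids:
            return SlotState.READY  # healthy and serving at least one model
        if health_ok:
            if alive_since is None:
                alive_since = now()
            # Alive but empty for the whole stabilisation window: IDLE.
            if now() - alive_since >= idle_stabilise_s:
                return SlotState.IDLE
        else:
            alive_since = None  # health flapped, restart the window
        sleep(interval_s)
    # Stand-in for the real failure path (SlotHealthFailed -> ERROR).
    raise TimeoutError("upstream never became healthy")
```

Injecting `now` and `sleep` lets a test drive the loop with a fake clock instead of waiting in real time; the stabilisation window exists so that a server still loading its model is not classified as IDLE too early.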