feat(slots): wire PULLING / SERVING / IDLE — complete 9-state lifecycle#11
Merged
2 tasks
PLAN.md §5 declared 9 slot states; only 6 had writers. This change adds the missing three so dashboards, idle-unload heuristics, and load progress reflect reality:

- PULLING: load() now transitions OFFLINE→PULLING→STARTING when an injected pull_runner reports the model is not on disk. Cached models skip pulling and go straight OFFLINE→STARTING (both edges already in LEGAL_TRANSITIONS). pull_runner failures surface as ERROR via the existing try/except, and the exception propagates to the caller.
- SERVING: SlotManager.serving() is a per-slot async context manager guarded by an asyncio.Lock + in-flight counter. First entry flips READY/IDLE → SERVING; last exit flips SERVING → READY. Dispatcher threads a slot_name through UpstreamCall and wraps forward() in the context, so streaming responses hold SERVING until the body iterator drains. SingleFlightGroup prefetch never enters the context (it lives in dispatch(), not forward()).
- IDLE: a background sweeper (start_idle_monitor) demotes READY slots whose last_used is older than IDLE_AFTER_S (default 300s) to IDLE. serving() exit bumps last_used so a steady request stream resets the clock. API lifespan starts the monitor at startup and cancels it on shutdown.

LEGAL_TRANSITIONS untouched: every needed edge already existed.

Test fixtures (slot_root / systemctl_stub / stub_await_ready) moved from test_manager.py to tests/slots/conftest.py so the new suite can share them.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
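The SERVING and IDLE rules above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `Slot` and `State` classes, the `demote_if_idle` helper, and the use of `time.monotonic()` for `last_used` are all assumptions made for the example; only the lock + in-flight counter pattern, the READY/IDLE → SERVING → READY edges, and the `IDLE_AFTER_S` default of 300s come from the description.

```python
import asyncio
import time
from contextlib import asynccontextmanager
from enum import Enum

IDLE_AFTER_S = 300.0  # default taken from the PR description


class State(Enum):
    READY = "ready"
    IDLE = "idle"
    SERVING = "serving"


class Slot:
    """Hypothetical stand-in for one managed slot (not the real SlotManager)."""

    def __init__(self):
        self.state = State.READY
        self.in_flight = 0
        self.last_used = time.monotonic()  # assumption: monotonic clock
        self.lock = asyncio.Lock()

    @asynccontextmanager
    async def serving(self):
        # First entry flips READY/IDLE -> SERVING; later entries only count.
        async with self.lock:
            self.in_flight += 1
            if self.in_flight == 1:
                self.state = State.SERVING
        try:
            yield self
        finally:
            # Every exit bumps last_used; the last exit flips SERVING -> READY.
            async with self.lock:
                self.in_flight -= 1
                self.last_used = time.monotonic()
                if self.in_flight == 0:
                    self.state = State.READY


def demote_if_idle(slot, now, idle_after_s=IDLE_AFTER_S):
    """One sweeper step: demote a READY slot that has been unused too long."""
    if slot.state is State.READY and now - slot.last_used > idle_after_s:
        slot.state = State.IDLE
        return True
    return False


async def demo():
    """Run two overlapping requests and record the state each one observes."""
    slot = Slot()
    seen = []

    async def request():
        async with slot.serving():
            seen.append((slot.state, slot.in_flight))
            await asyncio.sleep(0)  # let the other request overlap

    await asyncio.gather(request(), request())
    return slot, seen


if __name__ == "__main__":
    final_slot, observed = asyncio.run(demo())
    print(final_slot.state, observed)
```

Counting in-flight requests under the lock is what lets overlapping requests share one SERVING period: only the 0→1 and 1→0 edges touch the state, so a second concurrent request cannot flip the slot back to READY while the first is still streaming.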
2e696e9 to
7be987c
5 tasks
thinmintdev added a commit that referenced this pull request May 21, 2026
…§30 path traversal) (#85)

Two critical pre-launch findings from the v1.0 security review:

§29: the installer router was mounted bare, so an unauthenticated LAN peer could re-run /api/install/probe (rewriting /etc/hal0/hardware.json), fire /api/install/complete (writing the first-run sentinel and locking the operator out of the password flow), kick off /api/install/pick-default (multi-GB HuggingFace pulls), or PUT a model into any slot. The router now declares require_token at include_router time and every mutating endpoint additionally declares Depends(require_writer) at the route level, matching the #11 admin-router pattern.

§30: _assign_to_slot built the slot config path with f"{slot}.toml" from a user-supplied body / URL field. slot="../../tmp/pwn" used to resolve to /tmp/pwn.toml on a default install, giving an attacker an arbitrary TOML write primitive. Both /api/install/pick-default (body slot) and PUT /api/install/slots/{slot}/model (URL slot) now validate against ^[a-z0-9][a-z0-9_-]{0,31}$ (the same regex SlotConfig.name already enforces) BEFORE any filesystem op.

First-run wizard preserved: when HAL0_AUTH_ENABLED is unset, both require_token and require_writer short-circuit to a pass-through, so the wizard still runs anonymously on a fresh install with no password set. Once auth is on, the wizard rides the operator's session cookie like any other admin surface.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
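The §30 fix (validate the slot name before building any path) might look roughly like the sketch below. The regex is the one quoted in the commit message; the function name `slot_config_path`, the `config_dir` parameter, and the example directory are hypothetical and only serve the illustration.

```python
import re
from pathlib import Path

# Regex quoted in the commit message (same one SlotConfig.name enforces).
SLOT_NAME_RE = re.compile(r"^[a-z0-9][a-z0-9_-]{0,31}$")


def slot_config_path(slot: str, config_dir: Path) -> Path:
    """Return the TOML path for a slot, rejecting unsafe names first.

    Hypothetical helper: validation happens before any filesystem op,
    so a name such as "../../tmp/pwn" never reaches path construction.
    """
    # fullmatch is used because "$" alone would also accept a trailing newline.
    if not SLOT_NAME_RE.fullmatch(slot):
        raise ValueError(f"invalid slot name: {slot!r}")
    return config_dir / f"{slot}.toml"


if __name__ == "__main__":
    base = Path("/etc/hal0/slots")  # hypothetical directory, for the demo only
    print(slot_config_path("chat", base))
    try:
        slot_config_path("../../tmp/pwn", base)
    except ValueError as exc:
        print("rejected:", exc)
```

An allow-list regex is used here rather than normalising the path afterwards, since the character class simply has no room for `/` or `.` and the check stays independent of where the config directory lives.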
thinmintdev added a commit that referenced this pull request May 22, 2026
…138)

Lands the upstream-codebase research handoff that resolved several open questions on ADR-0006, plus the ADR + migration-plan tightenings those answers produced.

## What lands

- `docs/internal/lemonade-repo-deep-dive-2026-05-22.md` (276L): read of `lemonade-sdk/lemonade@7af26f75` (HEAD of main, 2026-05-21): dev internals, full API surface, embeddable build, omni recipe, WS protocol. The "what's actually in there" companion to the spike's "what happens when we run it" findings.

## ADR-0006 tightenings (from deep-dive)

- §3 (Drive method): "HTTP-first with CLI fallback" → "HTTP only". The spike's `/v1/load` "type must be string but is null" failure was a malformed body (nlohmann::json[] throws on null access), not a missing field. CLI fallback isn't needed.
- §3 (Schema): documents that only `model_name` is required for `/v1/load`; everything else is optional.
- §5/§6 (Process supervision + bundling): "containerised lemond" → "AMD's embeddable tarball + bare systemd unit". Lemonade ships an `embeddable` cmake target producing a portable lemond+lemonade tarball, which is the official redistributable artifact. Building a container around it duplicates AMD's work and reintroduces the docker-build apparmor pain hal0 has on LXC.

## migration-plan tightenings

- Decision #7 (Bundling+pin): `{image, digest, version}` → `{tarball_url, sha256, version}` to match the embeddable distribution.
- Decision #11 (Drive method): mirrors ADR §3: HTTP-only with the resolved schema, no CLI bootstrap fallback.

## Why a separate PR from #137 (the client skeleton)

PR #137 already implements what these docs describe. Keeping the docs/code split on a per-PR basis makes the migration story easier to read in `git log`: "client conforms to ADR-0006 §3 (HTTP-only)" reads cleanly because the ADR §3 it conforms to is the post-tightening version, landed alongside.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
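Since the `/v1/load` failure came from a null value in the body rather than from a missing field, one way a client could avoid it is to omit unset options instead of serialising them as `null`. The sketch below assumes exactly that; `build_load_body` and the option name used in the demo are hypothetical, and only the required `model_name` key comes from the ADR notes above.

```python
import json


def build_load_body(model_name: str, **options) -> dict:
    """Build a /v1/load request body with unset options left out.

    Hypothetical helper: only `model_name` is required per ADR-0006 §3;
    any option whose value is None is dropped so it is never sent as null.
    """
    if not isinstance(model_name, str) or not model_name:
        raise ValueError("model_name must be a non-empty string")
    body = {"model_name": model_name}
    body.update({key: value for key, value in options.items() if value is not None})
    return body


if __name__ == "__main__":
    # "some_option" is a placeholder name, not a documented Lemonade field.
    print(json.dumps(build_load_body("example-model", some_option=None)))
```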
Summary

- load() transitions OFFLINE→PULLING→STARTING when injected pull_runner reports model not on disk; cached models skip pulling
- SlotManager.serving() async context manager guarded by asyncio.Lock + in-flight counter; dispatcher wraps forward() so streaming responses hold SERVING until body drains; SingleFlightGroup prefetch never enters

Test plan

- tests/slots/ green (87 pass / 3 skipped; was 72 before)
- tests/dispatcher/ green

🤖 Generated with Claude Code