feat(slots): wire PULLING / SERVING / IDLE — complete 9-state lifecycle#11

Merged
thinmintdev merged 1 commit into main from feat/slot-states-complete-2026-05-15 on May 16, 2026

@thinmintdev
Contributor

Summary

  • PLAN.md §5 declared 9 slot states; only 6 had writers. Wires the missing 3:
    • PULLING: load() transitions OFFLINE→PULLING→STARTING when the injected pull_runner reports that the model is not on disk; cached models skip PULLING and go straight to STARTING
    • SERVING: SlotManager.serving() is an async context manager guarded by an asyncio.Lock plus an in-flight counter; the dispatcher wraps forward() in it so streaming responses hold SERVING until the body drains; SingleFlightGroup prefetch never enters the context
    • IDLE: a background sweeper demotes READY → IDLE when last_used is older than IDLE_AFTER_S (default 300s); serving() exit bumps last_used; the API lifespan starts the monitor at startup and cancels it on shutdown
  • LEGAL_TRANSITIONS untouched — every needed edge already existed
  • Closes task docs(release-manifest): document hal0-web releases hosting flow #10
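
The PULLING path in the summary can be sketched as a small state machine. This is a minimal, synchronous sketch and not the project's code: the `Slot` class, the `LEGAL_EDGES` subset, the `FakePullRunner` double, and the `is_cached` / `pull` method names on the pull runner are all assumptions made for illustration.

```python
import enum


class SlotState(enum.Enum):
    OFFLINE = "offline"
    PULLING = "pulling"
    STARTING = "starting"
    READY = "ready"
    ERROR = "error"


# Hypothetical subset of edges; the project's real LEGAL_TRANSITIONS table
# is not shown in this PR.
LEGAL_EDGES = {
    (SlotState.OFFLINE, SlotState.PULLING),
    (SlotState.OFFLINE, SlotState.STARTING),
    (SlotState.PULLING, SlotState.STARTING),
    (SlotState.STARTING, SlotState.READY),
}


class Slot:
    def __init__(self) -> None:
        self.state = SlotState.OFFLINE
        self.history = [self.state]

    def transition(self, new: SlotState) -> None:
        # ERROR is reachable from any state in this sketch.
        if new is not SlotState.ERROR and (self.state, new) not in LEGAL_EDGES:
            raise ValueError(f"illegal transition {self.state.name} -> {new.name}")
        self.state = new
        self.history.append(new)


def load(slot: Slot, model: str, pull_runner) -> None:
    """Bring a slot from OFFLINE to READY, pulling only if the model is missing."""
    try:
        if not pull_runner.is_cached(model):  # assumed method name
            slot.transition(SlotState.PULLING)
            pull_runner.pull(model)  # assumed method name
        slot.transition(SlotState.STARTING)
        # Stand-in for launching the backend and awaiting readiness.
        slot.transition(SlotState.READY)
    except Exception:
        # Failures mark the slot ERROR and still reach the caller.
        slot.transition(SlotState.ERROR)
        raise


class FakePullRunner:
    """Test double: reports cache status and optionally fails the pull."""

    def __init__(self, cached: bool, fail: bool = False) -> None:
        self.cached = cached
        self.fail = fail

    def is_cached(self, model: str) -> bool:
        return self.cached

    def pull(self, model: str) -> None:
        if self.fail:
            raise RuntimeError(f"pull failed for {model}")
```

With `FakePullRunner(cached=True)` the slot's history skips PULLING entirely, which mirrors the "cached models skip pulling" behaviour described above.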

Test plan

  • tests/slots/ green (87 pass / 3 skipped — was 72 before)
  • tests/dispatcher/ green
  • +981 lines, mostly tests

🤖 Generated with Claude Code

PLAN.md §5 declared 9 slot states; only 6 had writers. Adds the
missing three so dashboards, idle-unload heuristics, and load
progress reflect reality:

- PULLING: load() now transitions OFFLINE→PULLING→STARTING when an
  injected pull_runner reports the model is not on disk. Cached
  models skip pulling and go straight OFFLINE→STARTING (both edges
  already in LEGAL_TRANSITIONS). pull_runner failures move the slot to
  ERROR via the existing try/except, and the exception propagates to
  the caller.

- SERVING: SlotManager.serving() is a per-slot async context manager
  guarded by an asyncio.Lock + in-flight counter. First entry flips
  READY/IDLE → SERVING; last exit flips SERVING → READY. Dispatcher
  threads a slot_name through UpstreamCall and wraps forward() in
  the context — streaming responses hold SERVING until the body
  iterator drains. SingleFlightGroup prefetch never enters the
  context (it lives in dispatch(), not forward()).

- IDLE: a background sweeper (start_idle_monitor) demotes READY
  slots whose last_used is older than IDLE_AFTER_S (default 300s)
  to IDLE. serving() exit bumps last_used so a steady request
  stream resets the clock. API lifespan starts the monitor at
  startup and cancels it on shutdown.
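
The SERVING bullet above (lock plus in-flight counter, first entry flips to SERVING, last exit flips back to READY) could look roughly like the sketch below. It is an illustration under assumptions, not the merged implementation: the storage layout, the `add_slot`, `state`, and `in_flight` helpers, and the use of plain strings for states are all hypothetical.

```python
import asyncio
import time
from contextlib import asynccontextmanager


class SlotManager:
    def __init__(self):
        self._state = {}      # slot name -> state string
        self._in_flight = {}  # slot name -> number of active requests
        self._last_used = {}  # slot name -> monotonic timestamp
        self._locks = {}      # slot name -> asyncio.Lock

    def add_slot(self, name, state="READY"):
        self._state[name] = state
        self._in_flight[name] = 0
        self._last_used[name] = time.monotonic()
        self._locks[name] = asyncio.Lock()

    def state(self, name):
        return self._state[name]

    def in_flight(self, name):
        return self._in_flight[name]

    @asynccontextmanager
    async def serving(self, name):
        lock = self._locks[name]
        async with lock:
            if self._in_flight[name] == 0:
                # First entry: only READY or IDLE slots may start serving.
                if self._state[name] not in ("READY", "IDLE"):
                    raise RuntimeError(f"slot {name} is {self._state[name]}")
                self._state[name] = "SERVING"
            self._in_flight[name] += 1
        try:
            yield
        finally:
            async with lock:
                self._in_flight[name] -= 1
                # Bump last_used on exit so steady traffic resets the idle clock.
                self._last_used[name] = time.monotonic()
                if self._in_flight[name] == 0:
                    # Last exit: hand the slot back as READY.
                    self._state[name] = "READY"


async def demo():
    mgr = SlotManager()
    mgr.add_slot("chat", state="IDLE")
    async with mgr.serving("chat"):
        inside = mgr.state("chat")
    return inside, mgr.state("chat")
```

Holding the context open for the lifetime of a streaming body is what keeps the slot in SERVING until the iterator drains; a caller that returns before the stream finishes would need to keep the `async with` alive around the iteration itself.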

LEGAL_TRANSITIONS untouched — every needed edge already existed.
Test fixtures (slot_root / systemctl_stub / stub_await_ready)
moved from test_manager.py to tests/slots/conftest.py so the new
suite can share them.
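
For the IDLE sweeper described in the bullets above, a minimal sketch follows. `start_idle_monitor` and `IDLE_AFTER_S` are names from this commit, but the signature, the `sweep_once` helper, the sweep interval, and the dict-based storage are assumptions for illustration.

```python
import asyncio
import time

IDLE_AFTER_S = 300.0  # default quoted in the commit message


def sweep_once(states, last_used, now, idle_after_s=IDLE_AFTER_S):
    """Demote READY slots whose last use is older than idle_after_s."""
    demoted = []
    for name, state in states.items():
        if state == "READY" and now - last_used[name] > idle_after_s:
            states[name] = "IDLE"
            demoted.append(name)
    return demoted


async def idle_monitor(states, last_used, interval_s, idle_after_s):
    # Runs until cancelled; the API lifespan would cancel it on shutdown.
    while True:
        sweep_once(states, last_used, time.monotonic(), idle_after_s)
        await asyncio.sleep(interval_s)


def start_idle_monitor(states, last_used, interval_s=30.0,
                       idle_after_s=IDLE_AFTER_S):
    """Schedule the sweeper on the running loop and return its task."""
    return asyncio.create_task(
        idle_monitor(states, last_used, interval_s, idle_after_s)
    )
```

Keeping `sweep_once` pure (the clock is passed in as `now`) makes the demotion rule testable without sleeping; only the thin `idle_monitor` loop touches real time.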

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@thinmintdev thinmintdev force-pushed the feat/slot-states-complete-2026-05-15 branch from 2e696e9 to 7be987c on May 16, 2026 02:08
@thinmintdev thinmintdev merged commit 8ca27b7 into main May 16, 2026
2 of 4 checks passed
thinmintdev added a commit that referenced this pull request May 21, 2026
…§30 path traversal) (#85)

Two critical pre-launch findings from the v1.0 security review:

§29 — the installer router was mounted bare, so an unauthenticated LAN
peer could re-run /api/install/probe (rewriting /etc/hal0/hardware.json),
fire /api/install/complete (writing the first-run sentinel and locking
the operator out of the password flow), kick off /api/install/pick-default
(multi-GB HuggingFace pulls), or PUT a model into any slot. The router
now declares require_token at include_router time and every mutating
endpoint additionally declares Depends(require_writer) at the route
level — matches the #11 admin-router pattern.

§30 — _assign_to_slot built the slot config path with f"{slot}.toml"
from a user-supplied body / URL field. slot="../../tmp/pwn" used to
resolve to /tmp/pwn.toml on a default install, giving an attacker an
arbitrary TOML write primitive. Both /api/install/pick-default (body
slot) and PUT /api/install/slots/{slot}/model (URL slot) now validate
against ^[a-z0-9][a-z0-9_-]{0,31}$ — the same regex SlotConfig.name
already enforces — BEFORE any filesystem op.
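
The §30 fix amounts to validating the slot name before it ever reaches a path join. A minimal sketch, assuming a helper named `slot_config_path` and a caller-supplied slots directory (both hypothetical); only the regex is taken from the commit message.

```python
import re
from pathlib import Path

# Regex quoted in the commit message above.
SLOT_NAME_RE = re.compile(r"^[a-z0-9][a-z0-9_-]{0,31}$")


def slot_config_path(slots_dir: Path, slot: str) -> Path:
    """Return the TOML path for a slot, rejecting unsafe names first."""
    # fullmatch also rejects a trailing newline, which `$` alone would allow.
    if not SLOT_NAME_RE.fullmatch(slot):
        raise ValueError(f"invalid slot name: {slot!r}")
    return slots_dir / f"{slot}.toml"
```

Because the pattern admits no `/` or `.`, a value such as `../../tmp/pwn` fails validation and no filesystem operation is attempted.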

First-run wizard preserved: when HAL0_AUTH_ENABLED is unset, both
require_token and require_writer short-circuit to a pass-through, so
the wizard still runs anonymously on a fresh install with no password
set. Once auth is on, the wizard rides the operator's session cookie
like any other admin surface.
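
The pass-through behaviour can be illustrated without a web framework. This is a sketch only: the real `require_token` and `require_writer` are dependencies whose signatures are not shown here, and the token set, the `"writer"` role string, `AuthError`, and the rule that any non-empty `HAL0_AUTH_ENABLED` value turns auth on are assumptions.

```python
import os


class AuthError(Exception):
    """Raised when a request fails an auth check."""


def auth_enabled(env=None) -> bool:
    # Assumption: any non-empty value enables auth. The commit message only
    # states that an unset variable disables it.
    env = os.environ if env is None else env
    return bool(env.get("HAL0_AUTH_ENABLED"))


def require_token(token, valid_tokens, env=None):
    if not auth_enabled(env):
        return None  # pass-through: the first-run wizard stays anonymous
    if token not in valid_tokens:
        raise AuthError("missing or invalid token")
    return token


def require_writer(role, env=None):
    if not auth_enabled(env):
        return None  # pass-through, same as require_token
    if role != "writer":
        raise AuthError("writer role required")
    return role
```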

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@thinmintdev thinmintdev deleted the feat/slot-states-complete-2026-05-15 branch May 21, 2026 20:11
thinmintdev added a commit that referenced this pull request May 22, 2026
…138)

Lands the upstream-codebase research handoff that resolved several
open questions on ADR-0006, plus the ADR + migration-plan tightenings
those answers produced.

## What lands

- `docs/internal/lemonade-repo-deep-dive-2026-05-22.md` (276L) — read
  of `lemonade-sdk/lemonade@7af26f75` (HEAD of main, 2026-05-21): dev
  internals, full API surface, embeddable build, omni recipe, WS
  protocol. The "what's actually in there" companion to the spike's
  "what happens when we run it" findings.

## ADR-0006 tightenings (from deep-dive)

- §3 (Drive method): "HTTP-first with CLI fallback" → "HTTP only". The
  spike's `/v1/load` "type must be string but is null" failure was a
  malformed body (nlohmann::json[] throws on null access), not a
  missing field. CLI fallback isn't needed.
- §3 (Schema): documents that only `model_name` is required for
  `/v1/load`; everything else is optional.
- §5/§6 (Process supervision + bundling): "containerised lemond" →
  "AMD's embeddable tarball + bare systemd unit". Lemonade ships an
  `embeddable` cmake target producing a portable lemond+lemonade
  tarball — that's the official redistributable artifact. Building
  a container around it duplicates AMD's work and reintroduces the
  docker-build apparmor pain hal0 has on LXC.
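
One defensive reading of the §3 finding is to build the `/v1/load` body so that unset options are omitted rather than sent as JSON `null`. The sketch below assumes a helper named `build_load_body`; only `model_name` comes from the text above, and any option keys passed through `**options` are placeholders, not documented Lemonade fields.

```python
import json


def build_load_body(model_name: str, **options) -> bytes:
    """Serialise a /v1/load request body with no null-valued fields."""
    if not isinstance(model_name, str) or not model_name:
        raise ValueError("model_name must be a non-empty string")
    body = {"model_name": model_name}
    # Drop unset options instead of emitting null, since a null where a
    # string is expected is the failure mode the spike ran into.
    body.update({k: v for k, v in options.items() if v is not None})
    return json.dumps(body).encode("utf-8")
```

Calling `build_load_body("some-model")` yields a body containing only `model_name`, which is the minimal request the resolved schema describes.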

## migration-plan tightenings

- Decision #7 (Bundling+pin): `{image, digest, version}` →
  `{tarball_url, sha256, version}` to match the embeddable distribution.
- Decision #11 (Drive method): mirrors ADR §3 — HTTP-only with the
  resolved schema, no CLI bootstrap fallback.
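
A pin of the form `{tarball_url, sha256, version}` lends itself to a digest check before unpacking. The sketch below is an assumption about how such a pin might be consumed: `TarballPin`, `pin_from_mapping`, and `verify_tarball` are hypothetical names, and only the three field names come from Decision #7.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class TarballPin:
    tarball_url: str
    sha256: str
    version: str


def pin_from_mapping(data) -> TarballPin:
    """Build a pin from a parsed manifest entry, requiring all three keys."""
    missing = {"tarball_url", "sha256", "version"} - set(data)
    if missing:
        raise ValueError(f"pin is missing fields: {sorted(missing)}")
    return TarballPin(data["tarball_url"], data["sha256"], data["version"])


def verify_tarball(payload: bytes, pin: TarballPin) -> None:
    """Raise ValueError unless the payload's SHA-256 matches the pin."""
    digest = hashlib.sha256(payload).hexdigest()
    if not hmac.compare_digest(digest, pin.sha256.lower()):
        raise ValueError(f"sha256 mismatch for version {pin.version}")
```

Verification works on bytes already downloaded, so the fetch step (and its URL) stays outside the sketch.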

## Why a separate PR from #137 (the client skeleton)

PR #137 already implements what these docs describe — keeping the
docs/code split on a per-PR basis makes the migration story easier to
read in `git log`: "client conforms to ADR-0006 §3 (HTTP-only)" reads
cleanly because the ADR §3 it conforms to is the post-tightening
version, landed alongside.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>