fix(slots): drift to OFFLINE not ERROR when lemond evicts model (#275 bug 6) #276

Merged: thinmintdev merged 1 commit into main from fix/slot-drift-not-error on May 23, 2026
Conversation

@thinmintdev
Contributor

Closes part of #275. Surfaced via real-hardware CRUD sweep on the v3 dashboard: primary slot loaded successfully then flickered to red ERROR seconds later because Lemonade evicted the model (routine LRU/idle/nuclear-evict behavior). The slot config is fine — dispatcher reloads on next request. Demote drift from ERROR → OFFLINE with a neutral message.

…bug 6)

Surfaced 2026-05-23 by the v3 dashboard CRUD sweep: loading primary
slot via the dashboard succeeded, but seconds later the slot card
flickered to red ERROR with message "model not loaded in lemond".
Lemonade had quietly evicted the model (per-type LRU budget /
nuclear evict / idle-unload — see memory `hal0_lemonade_gotchas`).

The drift reconciliation in `SlotManager.status()` (manager.py:723)
treated this as slot-broken and transitioned to ERROR. That was correct
semantics under the per-slot-systemd model (PR-10 retired), where unit
failure meant slot config was wrong. Under Lemonade, eviction is
routine and recoverable — the dispatcher reloads on the next request.

Surface as OFFLINE with a neutral "model evicted from lemond
(auto-reloads on next request)" message instead. Red ERROR was
misleading + scary; OFFLINE matches what the user can take action on.

Updates `test_status_reconciles_drift` to assert the new contract.
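
The reconciliation rule described above can be sketched roughly as follows. This is a minimal illustration, not the code in `manager.py`: the `Slot` dataclass, the `SlotState` enum, the `reconcile_drift` helper, and the model name are all hypothetical stand-ins, and only the ERROR → OFFLINE demotion and the message text come from this PR.

```python
from dataclasses import dataclass
from enum import Enum


class SlotState(Enum):  # hypothetical; the real enum may have more states
    READY = "ready"
    OFFLINE = "offline"
    ERROR = "error"


@dataclass
class Slot:  # hypothetical shape, for illustration only
    name: str
    model: str
    state: SlotState
    message: str = ""


EVICTED_MESSAGE = "model evicted from lemond (auto-reloads on next request)"


def reconcile_drift(slot: Slot, loaded_models: set) -> Slot:
    """Return the slot as it should be reported, given what lemond has loaded."""
    if slot.state is SlotState.READY and slot.model not in loaded_models:
        # Eviction is routine and recoverable, so report OFFLINE rather than ERROR.
        return Slot(slot.name, slot.model, SlotState.OFFLINE, EVICTED_MESSAGE)
    return slot


primary = Slot("primary", "some-model", SlotState.READY)
after = reconcile_drift(primary, loaded_models=set())
print(after.state.name)  # OFFLINE
```

The slot configuration itself is left untouched in this sketch, which mirrors the point of the change: nothing about the slot is broken, it simply has no model resident until the next request.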

Closes part of #275 (bug 6 of 7).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@thinmintdev thinmintdev merged commit 2d87f93 into main May 23, 2026
2 of 4 checks passed
@thinmintdev thinmintdev deleted the fix/slot-drift-not-error branch May 23, 2026 22:44
thinmintdev added a commit that referenced this pull request May 23, 2026
…275 bug 5) (#277)

Surfaced 2026-05-23 by the v3 dashboard CRUD sweep: clicking Chat
against a Lemonade-loaded model returned `dispatch.no_route` because
hal0's dispatcher only knows about models in hal0's registry +
configured upstreams. Models pulled via Lemonade `POST /v1/pull`
land in `lemond.loaded[]` but never in hal0's registry, so the
specialized `/v1/chat/completions` route 404'd even though the
catch-all `/v1/{path:path}` proxy (PR #248) would have served them.

Cause: FastAPI matches the specialized route (registered first) before
falling through to the catch-all proxy. The dispatcher's NoRouteFound
became the response instead of bubbling up to the proxy.

Fix: catch NoRouteFound inside `_dispatch_and_forward` and delegate to
`lemonade_proxy._proxy` directly. Preserves the specialized routes'
value (OmniRouter tool-call loop, FLM trio detection, TTFT
instrumentation) for the cases they actually handle, while letting
bare Lemonade-loaded models round-trip through hal0-api without
needing a hal0 registry entry.
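
A rough sketch of the fallthrough pattern, assuming a dispatcher that raises `NoRouteFound` for unknown models. The registry contents, the `dispatch` function, and the `proxy_to_lemonade` stand-in (in place of `lemonade_proxy._proxy`) are hypothetical; the real handlers work with FastAPI requests and responses rather than plain dicts.

```python
import asyncio


class NoRouteFound(Exception):
    """Raised when the dispatcher has no registry entry or upstream for a model."""


# Hypothetical registry: models the dispatcher knows how to route.
REGISTRY = {"registered-model": "handled by specialized route"}


async def dispatch(model: str) -> dict:
    if model not in REGISTRY:
        raise NoRouteFound(model)
    return {"status": 200, "via": "dispatcher", "detail": REGISTRY[model]}


async def proxy_to_lemonade(model: str) -> dict:
    # Stand-in for delegating to the catch-all proxy.
    return {"status": 200, "via": "proxy", "detail": f"forwarded {model}"}


async def dispatch_and_forward(model: str) -> dict:
    try:
        # Specialized path first, so its extra behavior is preserved.
        return await dispatch(model)
    except NoRouteFound:
        # Previously this surfaced as a 404; now the proxy gets a chance.
        return await proxy_to_lemonade(model)


print(asyncio.run(dispatch_and_forward("registered-model"))["via"])     # dispatcher
print(asyncio.run(dispatch_and_forward("lemonade-only-model"))["via"])  # proxy
```

Catching the exception inside the handler, rather than relying on route ordering, keeps the fallback explicit: the specialized route still matches first, but an unknown model no longer ends there.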

Closes part of #275 (bug 5 of 7). Together with #276 (bug 6), the
v3 dashboard chat surface against a Lemonade-loaded model now works
end-to-end through `:8080`.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
thinmintdev added a commit that referenced this pull request May 23, 2026
…havior (#285)

PR #277 changed the dispatcher: NoRouteFound now delegates to the
lemonade_proxy catch-all instead of bubbling up as 404. Six existing
tests asserted the OLD 404 contract, breaking CI for #281, #282, #283
and any future PR.

Fixes:
- tests/api/test_v1_dispatch.py: 3 `_no_route_envelope` tests renamed +
  asserts widened to accept either 404 (no proxy) or 503
  (proxy→lemonade unreachable).
- tests/api/test_v1_proxy.py: test_v1_chat_completions_still_hits_dispatcher
  renamed to ..._falls_through_to_proxy + assertion flipped (proxy now
  MUST be consulted on dispatcher no_route, not skipped).
- tests/omni_router/test_api_wiring.py: test_chat_completions_without_omni
  widened to either 404 or 503 + lemonade.unavailable code.
- tests/slots/test_manager.py::test_status_reconciles_drift: drop the
  "evicted" message assertion (the operator-facing message is reset to
  empty in the post-transition Slot rebuild path; only state==OFFLINE
  is contract per #276).
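
The widened assertion shape can be illustrated with a small helper. This is a sketch under assumptions: `is_acceptable_no_route` is a hypothetical name, and the actual tests assert on HTTP responses directly; only the 404/503 pair and the `lemonade.unavailable` code are taken from the list above.

```python
from typing import Optional


def is_acceptable_no_route(status_code: int, error_code: Optional[str] = None) -> bool:
    """True if a dispatcher no_route outcome matches the post-#277 contract."""
    if status_code == 404:
        # No proxy configured: the old 404 outcome is still valid.
        return True
    if status_code == 503:
        # Proxy was consulted but Lemonade was unreachable. The code is
        # checked only when the caller inspected the error envelope.
        return error_code in (None, "lemonade.unavailable")
    return False


print(is_acceptable_no_route(404))                          # True
print(is_acceptable_no_route(503, "lemonade.unavailable"))  # True
print(is_acceptable_no_route(200))                          # False
```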

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
thinmintdev added a commit that referenced this pull request May 23, 2026
…gation 1) (#283)

The 5-capability-slot loadout (primary + embed + rerank + tts + stt)
exceeds the global budget of 4. Lemonade evicts to make room; if the
incoming load fails (e.g., whisper backend missing), it nuclear-evicts
EVERYTHING — leaving /v1/health.loaded[] empty and all hal0 slots in
the post-#276 OFFLINE drift state (was ERROR pre-#276).

Bumping `max_loaded_models` to 8 matches the slot-port ceiling (8081-8099 minus reserved) so
the canonical full loadout (5 capability slots + 3 NPU trio + 1 image)
fits without forcing eviction churn. Doesn't fix the nuclear-evict
behavior itself (out of our control without a lemond upstream PR) but
reduces how often we hit it.
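
To illustrate why a cap below the loadout size forces eviction churn, here is a toy LRU pool. It is not Lemonade's implementation: the `LoadedModels` class and `churn` helper are hypothetical, it models only plain least-recently-used eviction under a single global cap, and it does not model the nuclear-evict path that follows a failed load.

```python
from collections import OrderedDict


class LoadedModels:
    """Toy LRU pool with a global cap (illustration only)."""

    def __init__(self, max_loaded: int) -> None:
        self.max_loaded = max_loaded
        self._models = OrderedDict()  # insertion order doubles as recency order

    def load(self, name: str) -> list:
        """Load or touch a model; return the names evicted to make room."""
        if name in self._models:
            self._models.move_to_end(name)  # mark as most recently used
            return []
        evicted = []
        while len(self._models) >= self.max_loaded:
            oldest, _ = self._models.popitem(last=False)  # drop least recent
            evicted.append(oldest)
        self._models[name] = None
        return evicted


# The five capability slots named in the commit message.
LOADOUT = ["primary", "embed", "rerank", "tts", "stt"]


def churn(cap: int) -> list:
    """Load the whole loadout once and collect everything evicted on the way."""
    pool = LoadedModels(cap)
    return [name for slot in LOADOUT for name in pool.load(slot)]


print(churn(4))  # ['primary']
print(churn(8))  # []
```

With a cap of 4, loading the fifth capability slot pushes out the first; with a cap of 8, the same five loads evict nothing.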

Tracked in #275 bug 7 deep-dive comment. Mitigation #2 (collateral-
eviction recovery in SlotManager) + #3 (meaningful "install backend X"
error message) are separate follow-ups.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
thinmintdev added a commit that referenced this pull request May 28, 2026
…rough + gut installer auth section (#390)

- docs/operate/lemonade.md (new, .md canonical): operator reference for
  the v0.2 Lemonade runtime — what it is, where state lives, the /v1/*
  proxy + dispatcher fallthrough (PRs #248/#277), slot ↔ Lemonade
  model mapping (PRs #281/#282), max_loaded_models = 8 LRU cap (PR
  #283), per-type LRU eviction per ADR-0008 (supersedes nuclear-evict
  ADR-0007), OFFLINE-on-eviction (PR #276), and the three known v0.3
  caveats (Vulkan KV gauge missing, whisper RUNPATH workaround, GPU
  cleanup unload hang).

- docs/dashboard/v3.md (new, .md canonical, new docs/dashboard/ dir):
  page-by-page tour of the v3 React dashboard shipped in
  v0.3.0-alpha.1 (PR #235). Covers the shell + Mock-badge convention,
  /dashboard (system overview after #356), /chat (real surface per
  #309/#314/#315/#351), /slots (sidebar mirror per #357 + #344 UX
  sweep), /models (#313/#319/#353), /mcp (#304/#300), /agents (Peers
  per #299), /memory (graph #297, throughput #308), Settings (no Auth
  tab post-ADR-0012), and the footer journal (Epic #322 — PRs
  #321/#328/#329/#330/#332). Mock-fallback issues linked via the
  dashboard-v3 label, not enumerated.

- installer/README.md: gut ~95 lines of stale auth prose (Caddy,
  Bearer-token mint/use/revoke, first-run OTP claim wizard,
  HAL0_AUTH_ENABLED/HAL0_AUTH_DISABLED, password recovery, basic_auth
  upgrade path, the TLS recipe). Replace with one paragraph pointing
  at docs/operate/auth.mdx for the reverse-proxy recipe and
  docs/agents/identity.md for the X-hal0-Agent identity model. Auth
  was removed in v0.3.0-alpha.1 per ADR-0012; the README hadn't
  caught up.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>