feat(cli): hal0 slot create gains --type + derives Lemonade device (#275 bug 3) #282
Merged
…275 bugs 1+2)

Surfaced 2026-05-23 by the v3 dashboard CRUD sweep. Two compat hops at the POST /api/slots boundary:

**Bug 1**: body accepts top-level `model: "name"` (Lemonade-shape, per the slots audit + the v3 dashboard's create-slot modal) but the serializer at slots.py:191 reads `cfg.get("model").get("default")` (the nested [model] table that the SlotConfig pydantic model and persistent TOML loaders both use). Result: `model_default` MISSING from /api/slots responses for any slot created via POST. Cards rendered with no model name despite the TOML having the model. Workaround was hand-writing TOMLs in nested shape.

**Bug 2**: POST never auto-assigns a port. New slots persisted as `port=0`, dashboard card chips showed port=0 instead of a useable port.

Fix: add `_normalize_create_body()` that runs before `sm.create()`:
- Top-level string `model` → nested `{"default": <string>}`.
- Missing/zero `port` → next free port in 8081-8099 via the new `_next_free_slot_port()` helper (walks existing slot TOMLs to find the lowest unclaimed port).

Closes parts of #275 (bugs 1+2 of 7).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
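The normalization described in this commit message could look roughly like the sketch below. It is not the code from slots.py: the function names drop the leading underscore, and the set of claimed ports is passed in as a plain argument (an assumption made here for self-containment), whereas the real helper is said to walk the existing slot TOMLs.

```python
from typing import Any, Iterable

# 8081-8099 inclusive, the range named in the commit message.
PORT_RANGE = range(8081, 8100)


def next_free_slot_port(used_ports: Iterable[int]) -> int:
    """Return the lowest port in 8081-8099 that is not already claimed."""
    used = set(used_ports)
    for port in PORT_RANGE:
        if port not in used:
            return port
    raise RuntimeError("no free slot port in 8081-8099")


def normalize_create_body(
    body: dict[str, Any], used_ports: Iterable[int]
) -> dict[str, Any]:
    """Return a copy of the POST body in the nested shape, with a port set."""
    out = dict(body)  # shallow copy so the caller's dict is untouched

    # Bug 1: top-level string `model` becomes the nested {"default": ...} table.
    model = out.get("model")
    if isinstance(model, str):
        out["model"] = {"default": model}

    # Bug 2: a missing or zero port gets the next free port assigned.
    if not out.get("port"):
        out["port"] = next_free_slot_port(used_ports)

    return out
```

With ports 8081 and 8082 already claimed, a body of `{"name": "a", "model": "qwen"}` would come back with `model` nested and `port` set to 8083. A body that already carries a nested `model` table and a non-zero port passes through unchanged.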
bug 3)

Surfaced 2026-05-23 by the v3 dashboard CRUD sweep: the CLI couldn't create embed/rerank/transcription/tts slots because `hal0 slot create` had no `--type` flag and used the v0.1 hardware enum ([vulkan|rocm|cpu]) instead of the Lemonade `device` enum ([gpu-vulkan|gpu-rocm|cpu|npu]). Operators creating non-LLM slots had to bypass the CLI and POST to /api/slots directly.

Adds:
- `SlotType` enum (`llm | embedding | reranking | transcription | tts | image`), the Lemonade type vocab.
- `--type` / `-t` flag on `hal0 slot create`, defaults to `llm` for backward compat with the v0.1 chat-only create path.
- Derives `device` from `--hardware` (vulkan→gpu-vulkan, rocm→gpu-rocm, cpu→cpu) so the body POST'd matches the audit-recommended Lemonade-shape SlotConfig.

Keeps `--provider` + `--hardware` flags (now documented as legacy v0.1 compat). Pairs with PR #281 which normalizes the POST body server-side.

Closes part of #275 (bug 3 of 7).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
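As a rough illustration of the enum and the hardware-to-device derivation listed above, here is a minimal sketch. The enum values and the three mappings come from the commit message. The `build_create_body` helper and the body field names (`name`, `type`, `device`) are hypothetical and are not taken from the CLI source.

```python
from enum import Enum
from typing import Any


class SlotType(str, Enum):
    """Lemonade type vocab, as listed in the commit message."""

    LLM = "llm"
    EMBEDDING = "embedding"
    RERANKING = "reranking"
    TRANSCRIPTION = "transcription"
    TTS = "tts"
    IMAGE = "image"


# v0.1 --hardware value -> Lemonade `device` value.
HARDWARE_TO_DEVICE = {
    "vulkan": "gpu-vulkan",
    "rocm": "gpu-rocm",
    "cpu": "cpu",
}


def build_create_body(
    name: str,
    model: str,
    hardware: str,
    slot_type: SlotType = SlotType.LLM,  # default keeps the chat-only path working
) -> dict[str, Any]:
    """Hypothetical helper: assemble a Lemonade-shape body for POST /api/slots."""
    try:
        device = HARDWARE_TO_DEVICE[hardware]
    except KeyError:
        raise ValueError(f"unknown hardware: {hardware!r}") from None
    return {
        "name": name,
        "model": model,
        "type": slot_type.value,
        "device": device,
    }
```

Under these assumptions, `build_create_body("embed0", "some-model", "rocm", SlotType.EMBEDDING)` yields a body with `type` set to `embedding` and `device` set to `gpu-rocm`. The `npu` device has no v0.1 hardware counterpart in the mapping above, so it is not reachable through `--hardware` in this sketch.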
thinmintdev added a commit that referenced this pull request May 23, 2026
…havior (#285)

PR #277 changed the dispatcher: NoRouteFound now delegates to the lemonade_proxy catch-all instead of bubbling up as 404. Six existing tests asserted the OLD 404 contract, breaking CI for #281, #282, #283 and any future PR.

Fixes:
- tests/api/test_v1_dispatch.py: 3 `_no_route_envelope` tests renamed + asserts widened to accept either 404 (no proxy) or 503 (proxy→lemonade unreachable).
- tests/api/test_v1_proxy.py: test_v1_chat_completions_still_hits_dispatcher renamed to ..._falls_through_to_proxy + assertion flipped (proxy now MUST be consulted on dispatcher no_route, not skipped).
- tests/omni_router/test_api_wiring.py: test_chat_completions_without_omni widened to either 404 or 503 + lemonade.unavailable code.
- tests/slots/test_manager.py::test_status_reconciles_drift: drop the "evicted" message assertion (the operator-facing message is reset to empty in the post-transition Slot rebuild path; only state==OFFLINE is contract per #276).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
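The widened assertions might be expressed with a small predicate like the one below. This is only a sketch of the idea: the helper name is hypothetical, and the actual tests may assert on the response envelope as well as the status code.

```python
# Status codes the commit message describes as acceptable when the
# dispatcher finds no route: 404 (no proxy configured) or 503 (proxy
# could not reach Lemonade).
ACCEPTED_NO_ROUTE_STATUSES = frozenset({404, 503})


def is_acceptable_no_route_status(status_code: int) -> bool:
    """Hypothetical helper: True if the status fits the post-#277 contract."""
    return status_code in ACCEPTED_NO_ROUTE_STATUSES
```

A test would then assert `is_acceptable_no_route_status(response.status_code)` instead of comparing against 404 alone.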
thinmintdev added a commit that referenced this pull request May 28, 2026
…rough + gut installer auth section (#390)

- docs/operate/lemonade.md (new, .md canonical): operator reference for the v0.2 Lemonade runtime: what it is, where state lives, the /v1/* proxy + dispatcher fallthrough (PRs #248/#277), slot ↔ Lemonade model mapping (PRs #281/#282), max_loaded_models = 8 LRU cap (PR #283), per-type LRU eviction per ADR-0008 (supersedes nuclear-evict ADR-0007), OFFLINE-on-eviction (PR #276), and the three known v0.3 caveats (Vulkan KV gauge missing, whisper RUNPATH workaround, GPU cleanup unload hang).
- docs/dashboard/v3.md (new, .md canonical, new docs/dashboard/ dir): page-by-page tour of the v3 React dashboard shipped in v0.3.0-alpha.1 (PR #235). Covers the shell + Mock-badge convention, /dashboard (system overview after #356), /chat (real surface per #309/#314/#315/#351), /slots (sidebar mirror per #357 + #344 UX sweep), /models (#313/#319/#353), /mcp (#304/#300), /agents (Peers per #299), /memory (graph #297, throughput #308), Settings (no Auth tab post-ADR-0012), and the footer journal (Epic #322: PRs #321/#328/#329/#330/#332). Mock-fallback issues linked via the dashboard-v3 label, not enumerated.
- installer/README.md: gut ~95 lines of stale auth prose (Caddy, Bearer-token mint/use/revoke, first-run OTP claim wizard, HAL0_AUTH_ENABLED/HAL0_AUTH_DISABLED, password recovery, basic_auth upgrade path, the TLS recipe). Replace with one paragraph pointing at docs/operate/auth.mdx for the reverse-proxy recipe and docs/agents/identity.md for the X-hal0-Agent identity model. Auth was removed in v0.3.0-alpha.1 per ADR-0012; the README hadn't caught up.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes part of #275. Adds `SlotType` enum + `--type` flag to `hal0 slot create` so embedding/rerank/transcription/tts slots are creatable via CLI. Derives Lemonade `device` from the v0.1 `--hardware` flag. Pairs with #281, which does the server-side body normalization.