feat(providers): LemonadeProvider + capability dispatch via Lemonade (PR-8) #161
Merged
…(PR-8)

Wires hal0's capability layer to Lemonade Server (lemond) end-to-end without breaking SlotManager's v0.1.x caller surface. PR-8 is the facade landing for the v0.2 migration; PR-10 (later) simplifies SlotManager and migrates the seven+ caller files.

LemonadeProvider (src/hal0/providers/lemonade.py)
- Implements the Provider ABC. Docker/systemd-shaped methods are informational stubs (build_env, start_cmd, image_ref) or raise NotImplementedError (container_spec, render_systemd_override); those paths are unreachable when SlotManager dispatches via Lemonade.
- New lifecycle methods (load / unload / status) translate SlotConfig fields onto LemonadeClient calls.
- device_to_backend maps hal0's device enum (gpu-rocm | gpu-vulkan | cpu | npu) onto Lemonade's (recipe, llamacpp_backend) pair per plan §4.1 + ADR-0008 §6. NPU → recipe="flm" (no llamacpp_backend); every other device → recipe=None + concrete llamacpp_backend.
- lemonade_active() is the env-var gate (HAL0_BACKEND=lemonade). It lives in the providers/lemonade module and is shared with SlotManager, so both subsystems read the same predicate.

SlotManager bridge (src/hal0/slots/manager.py)
- _spawn_locked, terminate, _await_ready and _is_active all gate on _lemonade_active(). In Lemonade mode they delegate to the new provider methods instead of writing override.conf and driving systemctl; the (now-retired) hal0-slot@.service template is no longer required. The v0.1.x toolbox path is preserved verbatim for any deployment not yet on Lemonade.
- LemonadeError subclasses are wrapped as SlotSpawnFailed so the existing "except Exception → ERROR" branch in SlotManager.load keeps recording stable error envelopes.

Caller surface preserved (per PR-10's stopped-agent map)
- src/hal0/api/__init__.py (iter_configs / start_idle_monitor / stop_idle_monitor): unchanged.
- src/hal0/api/routes/slots.py (list / create / status / delete / get_config / update_config / load / unload / restart / swap / state_stream): unchanged.
- src/hal0/api/routes/backends.py / models.py / hardware.py / health.py: unchanged.
- src/hal0/dispatcher/router.py:539 serving() context: unchanged.
- src/hal0/capabilities/orchestrator.py (status / load / unload / swap / create / update_config): unchanged.

Provider registry
- LemonadeProvider added to _PROVIDERS under the name "lemonade". Existing singletons (llama-server / flm / moonshine / kokoro / comfyui) survive per the anti-scope in PR-8's brief; PR-10 owns their removal.
- New helper providers.lemonade_provider() returns the singleton with the typed return value SlotManager prefers.

Tests
- tests/providers/test_lemonade.py: 33 unit tests covering device mapping, /v1/load body construction (rocm / vulkan / cpu / npu paths + extra_args string serialisation), /v1/unload idempotence, /v1/health status derivation (both loaded[] and all_models_loaded[] field names), error envelopes, ABC stubs, and registry integration.
- tests/slots/test_manager_lemonade_bridge.py: 5 integration tests driving the full SlotManager → LemonadeProvider path against an httpx.MockTransport-backed Lemonade: load / unload / swap / error propagation / _is_active reconciliation.
- Full suite: 1517 passed, 8 skipped (1479 baseline + 33 lemonade + 5 bridge). No regressions.

References:
- docs/internal/lemonade-adoption-plan-2026-05-22.md §11 PR-8 (re-sequenced ahead of PR-10)
- ADR-0008 §1 / §2 / §6

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
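The device mapping and the `/v1/load` body construction can be sketched in a few lines of Python. This is a minimal, hypothetical sketch, not the code in `src/hal0/providers/lemonade.py`. Only the `(recipe, llamacpp_backend)` pairs and the rule that `None` values are dropped from the body come from this PR. The other body key names (`model_name`, `ctx_size`, `llamacpp_args`), the space-joined `extra_args` string, and the `ValueError` for unknown devices are assumptions.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical sketch. hal0 device -> (Lemonade recipe, llamacpp_backend),
# following the mapping locked in plan §4.1 / ADR-0008 §6.
DEVICE_MAP: dict[str, tuple[Optional[str], Optional[str]]] = {
    "gpu-rocm": (None, "rocm"),
    "gpu-vulkan": (None, "vulkan"),
    "cpu": (None, "cpu"),
    "npu": ("flm", None),  # NPU uses the FLM recipe, with no llama.cpp backend
}


def device_to_backend(device: str) -> tuple[Optional[str], Optional[str]]:
    """Return the (recipe, llamacpp_backend) pair for a hal0 device."""
    try:
        return DEVICE_MAP[device]
    except KeyError:
        # Assumption: unknown devices are rejected rather than defaulted.
        raise ValueError(f"unknown hal0 device: {device!r}") from None


def build_load_body(
    model: str,
    device: str,
    context_size: Optional[int] = None,
    extra_args: Optional[list[str]] = None,
) -> dict[str, object]:
    """Build a /v1/load request body, omitting every key whose value is None."""
    recipe, backend = device_to_backend(device)
    body: dict[str, object] = {
        "model_name": model,  # assumed key name
        "recipe": recipe,
        "llamacpp_backend": backend,
        "ctx_size": context_size,  # assumed key name
        # Assumed key name; extra_args is serialised to a single string.
        "llamacpp_args": " ".join(extra_args) if extra_args else None,
    }
    # Dropping None-valued keys lets Lemonade apply its own defaults.
    return {key: value for key, value in body.items() if value is not None}
```

For example, `build_load_body("example-model", "npu")` yields `{"model_name": "example-model", "recipe": "flm"}`. Because the mapping is stored as data, the NPU special case is a table row rather than a branch.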
Summary
PR-8 of the v0.2 Lemonade migration: wire hal0's capability layer to Lemonade Server end-to-end (chat.primary, embed, rerank, stt, tts, img) without breaking any existing caller of SlotManager. This is the facade landing: SlotManager's v0.1.x method signatures stay STABLE, so the seven+ caller files outside `src/hal0/slots/` keep working untouched. PR-10 (later) will simplify the SlotManager surface and migrate callers.

Re-sequenced ahead of PR-10 per the locked plan: PR-10's brief originally rewrote SlotManager, which would have broken 7+ callers. PR-8 lands first to ship the Lemonade dispatch path under stable signatures.
Tracks plan §11 PR-8 and ADR-0008 §1 / §2 / §6.
LemonadeProvider design
The existing `hal0.providers.base.Provider` ABC is shaped around docker+systemd: `build_env`, `start_cmd`, `container_spec`, `image_ref`, `render_systemd_override`. None of those abstractions map to a daemon-backed runtime where `lemond` owns process lifecycle and there is no per-slot container to render. `LemonadeProvider` implements the ABC to satisfy the contract, but the docker/systemd-shaped methods are operationally vestigial:

- `build_env` returns a slot-identity env block (`HAL0_SLOT_NAME`, `HAL0_PORT`, `HAL0_DEVICE`, `HAL0_PROVIDER=lemonade`, `HAL0_LEMONADE_RECIPE`, …). It is useful for diagnostics, audit logs, and `hal0 slot inspect`, and is never written to a file.
- `start_cmd` returns the conceptual control-plane invocation (`/opt/lemonade/lemond /var/lib/hal0/lemonade --slot-name=…`) so anyone tracing a slot back to its process can find the responsible unit.
- `image_ref` returns a stable `lemonade://recipe/<recipe>` or `lemonade://llamacpp/<backend>` identifier so audit logs and the slot inspector can distinguish two slots on different devices.
- `container_spec` and `render_systemd_override` raise `NotImplementedError` with pointers to the module docstring. These methods are only reached via `hal0.slots.unit_template.render_override`, which SlotManager skips on the Lemonade-active branch. The raising stubs are therefore unreachable in production code, but the ABC still has them defined.

On top of the ABC, three new lifecycle methods do the real work:
- `LemonadeProvider.load(slot_cfg, model_info=None)` translates `slot_cfg.device` to `(recipe, llamacpp_backend)` via `device_to_backend`, pulls `model.default` / `model.context_size` / `server.extra_args` off the slot config, and calls `LemonadeClient.load(...)`. It returns the parsed response body and propagates `LemonadeLoadError` and other `LemonadeError` subclasses unchanged.
- `LemonadeProvider.unload(slot_cfg)` calls `LemonadeClient.unload(model.default)`. When the slot has no model assigned, it returns `{ok: True, noop: "no model to unload"}` without hitting the network.
- `LemonadeProvider.status(slot_cfg)` filters `LemonadeClient.health().loaded[]` for the slot's model. It also accepts `all_models_loaded[]`, because Lemonade has used both field names across versions. It never raises, because this method is on the dashboard hot path.

The `device_to_backend` mapping is locked per plan §4.1 + ADR-0008 §6:

| hal0 device | recipe | llamacpp_backend |
|---|---|---|
| `gpu-rocm` | `None` | `rocm` |
| `gpu-vulkan` | `None` | `vulkan` |
| `cpu` | `None` | `cpu` |
| `npu` | `flm` | `None` |

`None` causes the corresponding key to be omitted from the `/v1/load` body so Lemonade picks its own defaults.

Bridging strategy
The bridge between `SlotManager`'s public surface and `LemonadeProvider` lives entirely inside `SlotManager` private methods. No public signatures change.

Gating: `_lemonade_active()` reads `HAL0_BACKEND`, the same env var that gates the existing idle-unload driver in `api/__init__.py`. When it is set to `lemonade`, four private methods route through Lemonade:

- `_spawn_locked` skips the toolbox path (`write_slot_env` + `render_override` + `systemctl start hal0-slot@…`) and calls `LemonadeProvider.load`. Critically, `hal0-slot@.service` no longer exists post-PR-9, so the legacy path can't run on a real v0.2 install regardless. The existing test suite, however, mocks `_systemctl` and would keep passing either way. This gate makes the runtime correct, not just the tests.
- `terminate` skips `systemctl stop` and calls `LemonadeProvider.unload`. Lemonade's `/v1/unload` is synchronous; we keep the `timeout_s` poll loop intact for signature compatibility, but it normally exits immediately.
- `_await_ready` short-circuits. `LemonadeClient.load` blocks until the model is paged in, so if `_spawn_locked` returned cleanly, the slot is ready by construction. A confirming `LemonadeProvider.status` probe separates the rare "loaded but missing from /v1/health" case (lands in IDLE) from the normal "loaded" case (lands in READY). This matches the toolbox provider's `--model ""` handling for symmetry.
- `_is_active` reads `LemonadeClient.health().loaded[]` instead of `systemctl is-active`, so the `status()` drift reconciler behaves correctly under Lemonade.

Errors are wrapped: `LemonadeError` subclasses raised by `LemonadeProvider.load` are caught in a new `_spawn_via_lemonade` helper and re-raised as `SlotSpawnFailed`. As a result, `SlotManager.load`'s existing `except Exception → ERROR` branch keeps producing stable `slot.spawn_failed` error envelopes. The toolbox path is unchanged.

Caller surface preserved
All v0.1.x SlotManager methods keep their signatures and return shapes. The list below was mapped by PR-10's stopped agent, not re-derived:

- `src/hal0/api/__init__.py` → `iter_configs` / `start_idle_monitor` / `stop_idle_monitor`
- `src/hal0/api/routes/slots.py` → `list` / `create` / `status` / `delete` / `get_config` / `update_config` / `load` / `unload` / `restart` / `swap` / `state_stream`
- `src/hal0/api/routes/backends.py` → `status` / `list` / `create` / `load` / `unload` / `delete`
- `src/hal0/api/routes/models.py` → `status` / `unload`
- `src/hal0/api/routes/hardware.py` → `list`
- `src/hal0/api/routes/health.py` → `list`
- `src/hal0/dispatcher/router.py:539` → `serving()` context manager
- `src/hal0/capabilities/orchestrator.py` → `status` / `load` / `unload` / `swap` / `create` / `update_config`

None of these required source changes. Verified by running the full test suite (1479 baseline → 1517 with new tests; 0 regressions).
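The bridge keeps those signatures stable because only a private helper branches on the env gate. That helper also translates Lemonade failures into the exception the caller already handles. The sketch below illustrates this with stand-in classes. Everything in it is hypothetical: `SlotManagerSketch`, `loaded_models`, `ready_state`, and the shape of health entries (assumed to be either plain strings or dicts with a `model_name` key). Only `HAL0_BACKEND=lemonade`, `LemonadeError`, `SlotSpawnFailed`, the `loaded[]` / `all_models_loaded[]` field names, and the IDLE/READY outcome come from the PR.

```python
from __future__ import annotations

import os
from typing import Any, Callable, Mapping, Optional


class LemonadeError(Exception):
    """Stand-in for hal0's Lemonade client error base class."""


class SlotSpawnFailed(Exception):
    """Stand-in for the error SlotManager.load already maps to ERROR."""


def lemonade_active(env: Optional[Mapping[str, str]] = None) -> bool:
    """Env-var gate: True only when HAL0_BACKEND is set to 'lemonade'."""
    env = os.environ if env is None else env
    return env.get("HAL0_BACKEND") == "lemonade"


def loaded_models(health: Mapping[str, Any]) -> set[str]:
    """Collect loaded model names, accepting either health field name."""
    entries = health.get("loaded") or health.get("all_models_loaded") or []
    names: set[str] = set()
    for entry in entries:
        # Assumed entry shape: a bare name or a dict carrying "model_name".
        name = entry if isinstance(entry, str) else entry.get("model_name")
        if name:
            names.add(name)
    return names


def ready_state(model: str, health: Mapping[str, Any]) -> str:
    """Post-load probe: READY if health lists the model, otherwise IDLE."""
    return "READY" if model in loaded_models(health) else "IDLE"


class SlotManagerSketch:
    """The public spawn() keeps one signature; only the private path branches."""

    def __init__(
        self,
        lemonade_load: Callable[[str], dict],
        toolbox_spawn: Callable[[str], dict],
        env: Optional[Mapping[str, str]] = None,
    ) -> None:
        self._lemonade_load = lemonade_load
        self._toolbox_spawn = toolbox_spawn
        self._env = env

    def spawn(self, slot_name: str) -> dict:
        if lemonade_active(self._env):
            return self._spawn_via_lemonade(slot_name)
        return self._toolbox_spawn(slot_name)  # v0.1.x path, untouched

    def _spawn_via_lemonade(self, slot_name: str) -> dict:
        try:
            return self._lemonade_load(slot_name)
        except LemonadeError as exc:
            # Re-raise as the error the caller's existing handler expects,
            # chaining the original so the cause stays inspectable.
            raise SlotSpawnFailed(f"{slot_name}: {exc}") from exc
```

Because `spawn` never changes shape, the caller files listed above are unaffected whichever branch runs. Chaining with `from exc` keeps the Lemonade error reachable through `__cause__` for logging.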
Anti-scope respected
- `SEEDED_SLOTS` / `route_for_request` / `default_slot_for` / `add_slot` / `remove_slot` (PR-10).
- `src/hal0/api/routes/*.py`.
- `src/hal0/providers/{llama_server,flm,moonshine,kokoro,comfyui}.py`: they survive as registered singletons until PR-10.
- `extra.*` namespace usage (ADR-0008 §7).

Files touched
- `src/hal0/providers/lemonade.py` (new, 539 lines)
- `src/hal0/providers/__init__.py` (+25 lines): `lemonade_provider()` helper.
- `src/hal0/slots/manager.py` (+227 / -2): `_spawn_locked` / `terminate` / `_await_ready` / `_is_active` gated on `_lemonade_active()`; adds `_spawn_via_lemonade` / `_terminate_via_lemonade` / `_lemonade_await_ready` / `_lemonade_is_active` helpers.
- `tests/providers/test_lemonade.py` (new, 414 lines, 33 tests)
- `tests/slots/test_manager_lemonade_bridge.py` (new, 221 lines, 5 tests)

LOC delta: +1428 / -2.
Test plan
- `ruff check src tests` clean
- `ruff format --check src tests` clean
- `HAL0_BACKEND=lemonade`: out of scope for PR-8; PR-11 (dashboard wiring) is the first PR that requires a live lemond.

🤖 Generated with Claude Code