
feat(providers): LemonadeProvider + capability dispatch via Lemonade (PR-8)#161

Merged
thinmintdev merged 1 commit into
mainfrom
feat/lemonade-capability-dispatch-pr8
May 23, 2026

@thinmintdev
Contributor

Summary

PR-8 of the v0.2 Lemonade migration: wire hal0's capability layer to Lemonade Server end-to-end (chat.primary, embed, rerank, stt, tts, img) without breaking any existing caller of SlotManager. This is the facade landing — SlotManager's v0.1.x method signatures stay STABLE so the seven+ caller files outside src/hal0/slots/ keep working untouched. PR-10 (later) will simplify the SlotManager surface and migrate callers.

Re-sequenced ahead of PR-10 per the locked plan: PR-10's brief originally rewrote SlotManager, which would break 7+ callers. PR-8 lands first to ship the Lemonade dispatch path under stable signatures.

Tracks plan §11 PR-8 and ADR-0008 §1 / §2 / §6.

LemonadeProvider design

The existing hal0.providers.base.Provider ABC is shaped around docker+systemd: build_env, start_cmd, container_spec, image_ref, render_systemd_override. None of those abstractions map to a daemon-backed runtime where lemond owns process lifecycle and there is no per-slot container to render. LemonadeProvider implements the ABC to satisfy the contract, but the docker/systemd-shaped methods are operationally vestigial:

  • build_env returns a slot-identity env block (HAL0_SLOT_NAME, HAL0_PORT, HAL0_DEVICE, HAL0_PROVIDER=lemonade, HAL0_LEMONADE_RECIPE, …) — useful for diagnostics, audit logs, and `hal0 slot inspect`, never written to a file.
  • start_cmd returns the conceptual control-plane invocation (/opt/lemonade/lemond /var/lib/hal0/lemonade --slot-name=…) so anyone tracing a slot back to its process can find the responsible unit.
  • image_ref returns a stable lemonade://recipe/<recipe> or lemonade://llamacpp/<backend> identifier so audit logs and the slot inspector can distinguish two slots on different devices.
  • container_spec and render_systemd_override raise NotImplementedError with pointers to the module docstring. These methods are only reached via hal0.slots.unit_template.render_override, which SlotManager skips on the Lemonade-active branch — so the raising stubs are unreachable in production code but the ABC still has them defined.
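The split between informational stubs and raising stubs described above can be sketched roughly as follows. This is a minimal illustration, not the code in src/hal0/providers/lemonade.py: the `Provider` class here is a two-method stand-in for the real ABC, the slot is modelled as a plain dict, and the method signatures and error message are assumptions.

```python
from abc import ABC, abstractmethod


class Provider(ABC):
    """Stand-in for hal0.providers.base.Provider (reduced to two methods)."""

    @abstractmethod
    def image_ref(self, slot: dict) -> str: ...

    @abstractmethod
    def container_spec(self, slot: dict) -> dict: ...


class LemonadeProviderSketch(Provider):
    """Hypothetical daemon-backed provider: identifiers only, no containers."""

    def image_ref(self, slot: dict) -> str:
        # Informational stub: a stable identifier for audit logs.
        recipe = slot.get("recipe")
        if recipe:
            return f"lemonade://recipe/{recipe}"
        return f"lemonade://llamacpp/{slot.get('backend')}"

    def container_spec(self, slot: dict) -> dict:
        # Raising stub: there is no per-slot container to describe.
        raise NotImplementedError(
            "Lemonade slots have no container; see the module docstring"
        )
```

Implementing every abstract method, even as a raising stub, keeps the class instantiable while making an accidental call on the container path fail loudly.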

On top of the ABC, three new lifecycle methods do the real work:

  • LemonadeProvider.load(slot_cfg, model_info=None) translates slot_cfg.device to (recipe, llamacpp_backend) via device_to_backend, pulls model.default / model.context_size / server.extra_args off the slot config, and calls LemonadeClient.load(...). Returns the parsed body. Propagates LemonadeLoadError / other LemonadeError subclasses unchanged.
  • LemonadeProvider.unload(slot_cfg) calls LemonadeClient.unload(model.default). When the slot has no model assigned, returns {ok: True, noop: "no model to unload"} without hitting the network.
  • LemonadeProvider.status(slot_cfg) filters LemonadeClient.health().loaded[] (also accepts all_models_loaded[] — Lemonade has used both field names across versions) for the slot's model. Never raises — this method is on the dashboard hot path.
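The unload no-op and the never-raising status probe described above might look something like the sketch below. The client calls are passed in as plain callables so the block stays self-contained; the shape of each loaded entry (a dict with a `model_name` key, or a bare string) and the returned status dict are assumptions, not Lemonade's documented schema.

```python
from typing import Any, Callable, Dict, Optional


def unload_slot(
    unload: Callable[[str], Dict[str, Any]], model: Optional[str]
) -> Dict[str, Any]:
    """Skip the network call entirely when the slot has no model assigned."""
    if not model:
        return {"ok": True, "noop": "no model to unload"}
    return unload(model)


def slot_status(health: Callable[[], Dict[str, Any]], model: str) -> Dict[str, Any]:
    """Report whether `model` is loaded; swallow every error (hot path)."""
    try:
        payload = health()
        # Accept either field name for the loaded-model list.
        entries = payload.get("loaded") or payload.get("all_models_loaded") or []
        for entry in entries:
            # Entry shape is assumed: dict with "model_name", or a bare name.
            name = entry.get("model_name") if isinstance(entry, dict) else entry
            if name == model:
                return {"loaded": True, "model": model}
    except Exception as exc:
        return {"loaded": False, "model": model, "error": str(exc)}
    return {"loaded": False, "model": model}
```

Catching the broad `Exception` is deliberate in this sketch: a dashboard poll should degrade to "not loaded" rather than surface a transport error.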

The device_to_backend mapping is locked per plan §4.1 + ADR-0008 §6:

| device | recipe | llamacpp_backend |
| --- | --- | --- |
| gpu-rocm | None | rocm |
| gpu-vulkan | None | vulkan |
| cpu | None | cpu |
| npu | flm | None |
| unknown / empty | None | None |

"None" causes the corresponding key to be omitted from the /v1/load body so Lemonade picks its own defaults.
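As a rough illustration of the mapping above and of the key-omission rule, here is a minimal sketch. The device, recipe, and backend values are taken from the mapping; the function signature, the `_DEVICE_MAP` name, the `build_load_body` helper, and the `model_name` body key are assumptions made for the example.

```python
from typing import Dict, Optional, Tuple

# Values copied from the mapping above; the dict name is illustrative.
_DEVICE_MAP: Dict[str, Tuple[Optional[str], Optional[str]]] = {
    "gpu-rocm": (None, "rocm"),
    "gpu-vulkan": (None, "vulkan"),
    "cpu": (None, "cpu"),
    "npu": ("flm", None),
}


def device_to_backend(device: Optional[str]) -> Tuple[Optional[str], Optional[str]]:
    """Return (recipe, llamacpp_backend); unknown or empty maps to (None, None)."""
    return _DEVICE_MAP.get(device or "", (None, None))


def build_load_body(model: str, device: Optional[str]) -> Dict[str, str]:
    """Hypothetical helper: assemble a load body, omitting None-valued keys."""
    recipe, backend = device_to_backend(device)
    body = {"model_name": model}  # key name is an assumption
    if recipe is not None:
        body["recipe"] = recipe
    if backend is not None:
        body["llamacpp_backend"] = backend
    return body
```

For example, `build_load_body("some-model", "npu")` carries `recipe` but no `llamacpp_backend`, and an unknown device yields a body with neither key, leaving the choice to Lemonade.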

Bridging strategy

The bridge between SlotManager's public surface and LemonadeProvider lives entirely inside SlotManager private methods. No public signatures change.

Gating: _lemonade_active() reads HAL0_BACKEND (the same env var that gates the existing idle-unload driver in api/__init__.py). When it is set to lemonade, four SlotManager methods route through Lemonade:

  • _spawn_locked skips the toolbox path (write_slot_env + render_override + systemctl start hal0-slot@…) and calls LemonadeProvider.load. Critically: hal0-slot@.service no longer exists post-PR-9, so the legacy path can't run on a real v0.2 install regardless — but the existing test suite mocks _systemctl and would keep passing either way. This gate makes the runtime correct, not just the tests.
  • terminate skips systemctl stop and calls LemonadeProvider.unload. Lemonade's /v1/unload is synchronous; we keep the timeout_s poll loop intact for signature compatibility but it normally exits immediately.
  • _await_ready short-circuits — LemonadeClient.load blocks until the model is paged in, so if _spawn_locked returned cleanly the slot is by construction ready. A confirming LemonadeProvider.status probe separates the rare "loaded but missing from /v1/health" case (lands in IDLE) from the normal "loaded" case (lands in READY), matching the toolbox provider's --model "" handling for symmetry.
  • _is_active reads LemonadeClient.health().loaded[] instead of systemctl is-active. Drives the status() drift reconciler correctly under Lemonade.
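The gating pattern above amounts to one predicate checked at the top of each bridged method. A minimal sketch of that shape, assuming an exact string comparison on HAL0_BACKEND (the real predicate may normalise case or whitespace) and using a hypothetical `SlotManagerSketch` class with an injectable environment for testability:

```python
import os
from typing import Any, Mapping, Optional


def lemonade_active(env: Optional[Mapping[str, str]] = None) -> bool:
    """True when HAL0_BACKEND selects Lemonade (exact match is an assumption)."""
    env = os.environ if env is None else env
    return env.get("HAL0_BACKEND", "") == "lemonade"


class SlotManagerSketch:
    """Hypothetical reduction of SlotManager to the spawn path only."""

    def __init__(self, provider: Any, env: Optional[Mapping[str, str]] = None):
        self._provider = provider
        self._env = env

    def _spawn_locked(self, slot_cfg: dict) -> Any:
        if lemonade_active(self._env):
            # Lemonade branch: no env file, no override, no systemctl.
            return self._provider.load(slot_cfg)
        return self._spawn_via_systemd(slot_cfg)

    def _spawn_via_systemd(self, slot_cfg: dict) -> Any:
        # Placeholder for the legacy toolbox path, not modelled here.
        raise RuntimeError("legacy toolbox path not modelled in this sketch")
```

Because the branch sits inside the private method, callers see the same signature in both modes.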

Errors are wrapped: LemonadeError subclasses raised by LemonadeProvider.load are caught in a new _spawn_via_lemonade helper and re-raised as SlotSpawnFailed, so SlotManager.load's existing except Exception → ERROR branch keeps producing stable slot.spawn_failed error envelopes. The toolbox path is unchanged.
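The wrapping described above is the usual catch-and-re-raise pattern with exception chaining. A minimal sketch, with the three exception classes redefined locally as stand-ins for hal0's own and the loader passed in as a callable; the message format is an assumption:

```python
from typing import Any, Callable


class LemonadeError(Exception):
    """Stand-in for the Lemonade client's base error."""


class LemonadeLoadError(LemonadeError):
    """Stand-in for a failed load call."""


class SlotSpawnFailed(Exception):
    """Stand-in for SlotManager's spawn failure."""


def spawn_via_lemonade(load: Callable[[dict], Any], slot_cfg: dict) -> Any:
    """Call the provider's load and translate Lemonade errors for callers."""
    try:
        return load(slot_cfg)
    except LemonadeError as exc:
        # `from exc` keeps the original error reachable via __cause__.
        raise SlotSpawnFailed(f"lemonade load failed: {exc}") from exc
```

Only `LemonadeError` subclasses are translated in this sketch; anything else propagates unchanged.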

Caller surface preserved

All v0.1.x SlotManager methods keep their signatures and return shapes. The caller map below was produced by PR-10's stopped agent and is reused here rather than re-derived:

  • src/hal0/api/__init__.py: iter_configs / start_idle_monitor / stop_idle_monitor
  • src/hal0/api/routes/slots.py: list / create / status / delete / get_config / update_config / load / unload / restart / swap / state_stream
  • src/hal0/api/routes/backends.py: status / list / create / load / unload / delete
  • src/hal0/api/routes/models.py: status / unload
  • src/hal0/api/routes/hardware.py: list
  • src/hal0/api/routes/health.py: list
  • src/hal0/dispatcher/router.py:539: serving() context manager
  • src/hal0/capabilities/orchestrator.py: status / load / unload / swap / create / update_config

None of these required source changes. Verified by running the full test suite (1479 baseline → 1517 with new tests; 0 regressions).

Anti-scope respected

  • DID NOT change SlotManager public signatures (PR-10).
  • DID NOT add SEEDED_SLOTS / route_for_request / default_slot_for / add_slot / remove_slot (PR-10).
  • DID NOT rewire src/hal0/api/routes/*.py.
  • DID NOT delete src/hal0/providers/{llama_server,flm,moonshine,kokoro,comfyui}.py — they survive as registered singletons until PR-10.
  • DID NOT introduce extra.* namespace usage (ADR-0008 §7).
  • DID NOT reintroduce preload validation (ADR-0008 §3, removed in chore(lemonade): remove preload validation per ADR-0008 §3 #155).

Files touched

| File | Rationale |
| --- | --- |
| src/hal0/providers/lemonade.py (new, 539 lines) | LemonadeProvider class + device_to_backend + lemonade_active helper. |
| src/hal0/providers/__init__.py (+25 lines) | Register "lemonade" in _PROVIDERS, add typed lemonade_provider() helper. |
| src/hal0/slots/manager.py (+227 / -2) | Gate _spawn_locked / terminate / _await_ready / _is_active on _lemonade_active(); add _spawn_via_lemonade / _terminate_via_lemonade / _lemonade_await_ready / _lemonade_is_active helpers. |
| tests/providers/test_lemonade.py (new, 414 lines, 33 tests) | Unit coverage for LemonadeProvider + device mapping + active-env gating. |
| tests/slots/test_manager_lemonade_bridge.py (new, 221 lines, 5 tests) | Integration coverage for SlotManager → LemonadeProvider dispatch. |

LOC delta: +1428 / -2.

Test plan

  • Full suite passes: 1517 passed, 8 skipped (1479 baseline + 33 lemonade + 5 bridge)
  • No existing test required modification
  • ruff check src tests clean
  • ruff format --check src tests clean
  • CI green (will populate after push)
  • Manual smoke on hal0 LXC under HAL0_BACKEND=lemonade — out of scope for PR-8; PR-11 (dashboard wiring) is the first PR that requires a live lemond.

🤖 Generated with Claude Code

…(PR-8)

Wires hal0's capability layer to Lemonade Server (lemond) end-to-end
without breaking SlotManager's v0.1.x caller surface. PR-8 is the facade
landing for the v0.2 migration; PR-10 (later) simplifies SlotManager
and migrates the seven+ caller files.

LemonadeProvider (src/hal0/providers/lemonade.py)
- Implements the Provider ABC. Docker/systemd-shaped methods are
  informational stubs (build_env, start_cmd, image_ref) or raise
  NotImplementedError (container_spec, render_systemd_override) — those
  paths are unreachable when SlotManager dispatches via Lemonade.
- New lifecycle methods (load / unload / status) translate SlotConfig
  fields onto LemonadeClient calls.
- device_to_backend maps hal0's device enum (gpu-rocm | gpu-vulkan |
  cpu | npu) onto Lemonade's (recipe, llamacpp_backend) pair per plan
  §4.1 + ADR-0008 §6. NPU → recipe="flm" (no llamacpp_backend); every
  other device → recipe=None + concrete llamacpp_backend.
- lemonade_active() is the env-var gate (HAL0_BACKEND=lemonade); shared
  with SlotManager via the providers/lemonade module so both subsystems
  read the same predicate.

SlotManager bridge (src/hal0/slots/manager.py)
- _spawn_locked, terminate, _await_ready, _is_active all gate on
  _lemonade_active(). Under Lemonade mode they delegate to the new
  provider methods instead of writing override.conf + driving
  systemctl; the (now-retired) hal0-slot@.service template is no longer
  required. v0.1.x toolbox path is preserved verbatim for any deployment
  not yet on Lemonade.
- LemonadeError subclasses are wrapped as SlotSpawnFailed so the
  existing "except Exception → ERROR" branch in SlotManager.load keeps
  recording stable error envelopes.

Caller surface preserved (per PR-10's stopped-agent map)
- src/hal0/api/__init__.py (iter_configs / start_idle_monitor /
  stop_idle_monitor): unchanged.
- src/hal0/api/routes/slots.py (list / create / status / delete /
  get_config / update_config / load / unload / restart / swap /
  state_stream): unchanged.
- src/hal0/api/routes/backends.py / models.py / hardware.py /
  health.py: unchanged.
- src/hal0/dispatcher/router.py:539 serving() context: unchanged.
- src/hal0/capabilities/orchestrator.py (status / load / unload /
  swap / create / update_config): unchanged.

Provider registry
- LemonadeProvider added to _PROVIDERS under name "lemonade". Existing
  singletons (llama-server / flm / moonshine / kokoro / comfyui) survive
  per the anti-scope in PR-8's brief — PR-10 owns their removal.
- New helper providers.lemonade_provider() returns the singleton with
  the typed return value SlotManager prefers.

Tests
- tests/providers/test_lemonade.py: 33 unit tests covering device
  mapping, /v1/load body construction (rocm / vulkan / cpu / npu paths
  + extra_args string serialisation), /v1/unload idempotence, /v1/health
  status derivation (both loaded[] and all_models_loaded[] field
  names), error envelopes, ABC stubs, and registry integration.
- tests/slots/test_manager_lemonade_bridge.py: 5 integration tests
  driving the full SlotManager → LemonadeProvider path with an
  httpx.MockTransport-backed Lemonade — load / unload / swap / error
  propagation / _is_active reconciliation.
- Full suite: 1517 passed, 8 skipped (1479 baseline + 33 lemonade +
  5 bridge). No regressions.

References:
- docs/internal/lemonade-adoption-plan-2026-05-22.md §11 PR-8
  (re-sequenced ahead of PR-10)
- ADR-0008 §1 / §2 / §6

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@thinmintdev thinmintdev merged commit 3e429f4 into main May 23, 2026
4 checks passed
@thinmintdev thinmintdev deleted the feat/lemonade-capability-dispatch-pr8 branch May 27, 2026 16:57