What to build
Single-model end-to-end vertical slice. A fresh hal0 box installs Lemonade, brings up lemond under systemd, and serves a chat completion via Lemonade for the primary slot when the HAL0_BACKEND=lemonade flag is set. The v0.1.x toolbox path remains the default and keeps working.
Touches every layer end-to-end:
- install.sh: download AMD's embeddable tarball, sha256-verify, extract to /opt/lemonade, apt-install unzip + libxrt-npu2, run lemonade backends install llamacpp:rocm at first boot
- systemd: write /etc/systemd/system/lemond.service running lemond /opt/lemonade --port 9100 with hardening directives (NoNewPrivileges, ProtectSystem=strict, ProtectHome, PrivateTmp, RestrictAddressFamilies)
- manifest.json schema v2: lemonade: { tarball_url, sha256, version }
- src/hal0/lemonade/client.py: minimal HTTP client wrapping /v1/load, /v1/unload, /v1/health, /v1/chat/completions, /live. /v1/load body shape: {"model_name": "..."} (only required field, per research)
- src/hal0/providers/lemonade.py: LemonadeProvider for the primary slot only; delegates to LemonadeClient
- HAL0_BACKEND=lemonade flag: env var or CLI flag plumbing; old LlamaServerProvider is default; new path only when flag set
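A rough sketch of what the minimal client in src/hal0/lemonade/client.py might look like, using only the Python standard library. The endpoint paths, port 9100, and the /v1/load body come from the list above. Everything else is an assumption made for illustration: the 127.0.0.1 host, the method names, the body-less /v1/unload call, the OpenAI-style chat body, and treating /live as a plain HTTP 200 check. Streaming is left out.

```python
import json
import urllib.request


class LemonadeClient:
    """Hypothetical minimal client for a local lemond instance."""

    def __init__(self, base_url="http://127.0.0.1:9100", timeout=30.0):
        # Host is an assumption; the port matches the lemond unit above.
        self.base_url = base_url.rstrip("/")
        self.timeout = timeout

    def build_request(self, method, path, body=None):
        """Build (but do not send) a request, so it can be inspected offline."""
        data = None
        headers = {}
        if body is not None:
            data = json.dumps(body).encode("utf-8")
            headers["Content-Type"] = "application/json"
        return urllib.request.Request(
            self.base_url + path, data=data, headers=headers, method=method
        )

    def _send_json(self, method, path, body=None):
        """Send a request and decode a JSON response (empty body -> {})."""
        req = self.build_request(method, path, body)
        with urllib.request.urlopen(req, timeout=self.timeout) as resp:
            raw = resp.read()
        return json.loads(raw) if raw else {}

    def live(self):
        """Return True if /live answers HTTP 200; the body is ignored."""
        req = self.build_request("GET", "/live")
        try:
            with urllib.request.urlopen(req, timeout=self.timeout) as resp:
                return resp.status == 200
        except OSError:
            # URLError, HTTPError and socket timeouts are all OSError subclasses.
            return False

    def health(self):
        return self._send_json("GET", "/v1/health")

    def load(self, model_name):
        # model_name is the only required field, per the list above.
        return self._send_json("POST", "/v1/load", {"model_name": model_name})

    def unload(self):
        # Assumption: unload accepts a POST without a body.
        return self._send_json("POST", "/v1/unload")

    def chat_completions(self, model, messages):
        # Assumption: OpenAI-style request body, non-streaming.
        body = {"model": model, "messages": messages}
        return self._send_json("POST", "/v1/chat/completions", body)
```

Splitting build_request from the sending path keeps the request shape checkable without a running lemond; a LemonadeProvider could then delegate to load and chat_completions for the primary slot.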
Demoable: SSH to test box → set flag → /v1/chat/completions returns a non-empty response from a Lemonade-served primary slot.
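The flag plumbing could be sketched roughly as follows. Only the HAL0_BACKEND name and the lemonade value come from this issue. The function name, the "llama-server" label for the existing LlamaServerProvider path, the CLI-over-env precedence, and the silent fallback on unrecognised values are all assumptions for illustration.

```python
import os

DEFAULT_BACKEND = "llama-server"  # hypothetical label for the LlamaServerProvider path
LEMONADE_BACKEND = "lemonade"


def resolve_backend(cli_value=None, env=None):
    """Pick the backend name; a CLI value wins over the env var (assumed precedence)."""
    env = os.environ if env is None else env
    raw = cli_value if cli_value is not None else env.get("HAL0_BACKEND", "")
    value = raw.strip().lower()
    if value == LEMONADE_BACKEND:
        return LEMONADE_BACKEND
    # Unset or unrecognised values fall back to the existing default path.
    return DEFAULT_BACKEND
```

Falling back instead of raising keeps the v0.1.x path working when the variable is unset or mistyped; a stricter variant could reject unknown values.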
Acceptance criteria
- HAL0_BACKEND=lemonade brings up lemond.service healthy via /live
- LemonadeClient.load("qwen3.5-0.8b") returns success; /v1/health shows the model loaded
- /v1/chat/completions proxied through hal0 returns a valid streaming response
- manifest.json v2 schema validates
- unzip + libxrt-npu2 prereqs handled in install.sh idempotently
Blocked by