What to build
Operational hardening per ADR-0007: mitigates the nuclear-evict-all behavior observed live in the spike.
Pre-load validation (src/hal0/lemonade/preload.py):
- Run BEFORE every LemonadeClient.load() call
- Check file exists at registry path
- Check sha256 matches registry entry
- Check size matches registry
- Check GGUF magic bytes
- On any failure: slot state → error, do NOT call /v1/load (so Lemonade's nuclear-evict-all is NOT triggered for the other loaded models)
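The checks above could be sketched roughly as follows. This is a minimal illustration, not the actual preload.py: RegistryEntry, validate_before_load, and every PreloadError variant shown here are hypothetical names (only PreloadError.LOAD_TIMEOUT appears in this issue). The one external fact it relies on is that GGUF files begin with the four ASCII bytes "GGUF". Cheap checks run first so that a missing or truncated file is rejected before the full file is hashed.

```python
import hashlib
from dataclasses import dataclass
from enum import Enum
from pathlib import Path
from typing import Optional

GGUF_MAGIC = b"GGUF"  # first four bytes of every GGUF file


class PreloadError(Enum):
    # Hypothetical variant names, chosen for illustration only.
    FILE_MISSING = "file_missing"
    SIZE_MISMATCH = "size_mismatch"
    BAD_MAGIC = "bad_magic"
    SHA256_MISMATCH = "sha256_mismatch"


@dataclass(frozen=True)
class RegistryEntry:
    # Hypothetical shape of a registry record.
    path: Path
    sha256: str
    size: int


def validate_before_load(entry: RegistryEntry) -> Optional[PreloadError]:
    """Return the first failed check, or None if the file looks loadable."""
    if not entry.path.is_file():
        return PreloadError.FILE_MISSING
    if entry.path.stat().st_size != entry.size:
        return PreloadError.SIZE_MISMATCH
    digest = hashlib.sha256()
    with entry.path.open("rb") as fh:
        if fh.read(4) != GGUF_MAGIC:
            return PreloadError.BAD_MAGIC
        fh.seek(0)
        # Hash in 1 MiB chunks so large model files are not read into memory at once.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    if digest.hexdigest() != entry.sha256.lower():
        return PreloadError.SHA256_MISMATCH
    return None
```

A caller would mark the slot as error and skip /v1/load whenever the return value is not None.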
Idle-unload driver (src/hal0/lemonade/idle.py):
- Poll /v1/health.all_models_loaded[].last_use periodically (existing 30s cadence)
- For models stale beyond 300s (existing policy), call POST /v1/unload {"model_name": "..."}
- Preserves hal0's existing idle policy, for which Lemonade has no built-in equivalent
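One way the polling loop might look is sketched below. It assumes that last_use is a Unix timestamp in seconds and that each all_models_loaded entry carries a model_name key; neither is confirmed by this issue. The HTTP calls are injected as callables (get_health, unload), and max_polls and clock are hypothetical parameters that exist only to make the loop testable.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable, Optional

POLL_INTERVAL_S = 30.0  # existing cadence
IDLE_LIMIT_S = 300.0    # existing idle policy


def stale_models(health: dict[str, Any], now: float,
                 idle_limit_s: float = IDLE_LIMIT_S) -> list[str]:
    """Names of loaded models whose last use is more than idle_limit_s ago."""
    stale = []
    for model in health.get("all_models_loaded", []):
        # Assumption: last_use is epoch seconds, model_name identifies the model.
        if now - model["last_use"] > idle_limit_s:
            stale.append(model["model_name"])
    return stale


async def idle_unload_loop(
    get_health: Callable[[], Awaitable[dict[str, Any]]],
    unload: Callable[[str], Awaitable[None]],
    *,
    poll_interval_s: float = POLL_INTERVAL_S,
    idle_limit_s: float = IDLE_LIMIT_S,
    clock: Callable[[], float] = time.time,
    max_polls: Optional[int] = None,
) -> None:
    """Poll health and unload stale models; runs forever unless max_polls is set."""
    polls = 0
    while max_polls is None or polls < max_polls:
        health = await get_health()
        for name in stale_models(health, clock(), idle_limit_s):
            await unload(name)  # would wrap POST /v1/unload {"model_name": name}
        polls += 1
        await asyncio.sleep(poll_interval_s)
```

Keeping the staleness decision in a pure function means the 300s policy can be unit-tested without a running Lemonade server.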
Load timeout in LemonadeClient.load():
- Wrap in asyncio.wait_for with default 120s timeout (configurable per slot)
- On timeout: surface PreloadError.LOAD_TIMEOUT, slot → error, do NOT retry (retry would risk evict-all again)
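The timeout wrapper could be as small as the sketch below. The helper name load_with_timeout and the injected do_load callable are hypothetical, and the PreloadError enum here is a stand-in that defines only the LOAD_TIMEOUT variant named in this issue. asyncio.wait_for cancels the wrapped coroutine when the deadline passes, and the helper deliberately makes a single attempt.

```python
import asyncio
from enum import Enum
from typing import Awaitable, Callable, Optional


class PreloadError(Enum):
    # Stand-in enum; only LOAD_TIMEOUT is named in the issue.
    LOAD_TIMEOUT = "load_timeout"


async def load_with_timeout(
    do_load: Callable[[], Awaitable[None]],
    timeout_s: float = 120.0,  # default; intended to be configurable per slot
) -> Optional[PreloadError]:
    """Run do_load once; return LOAD_TIMEOUT if it exceeds timeout_s, else None."""
    try:
        await asyncio.wait_for(do_load(), timeout=timeout_s)
    except asyncio.TimeoutError:
        # No retry: a second /v1/load could trigger evict-all again.
        return PreloadError.LOAD_TIMEOUT
    return None
```

The caller would set the slot to error when LOAD_TIMEOUT is returned and leave any retry decision to an operator.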
Acceptance criteria
- [...] error state without /v1/load being called
- [...] /v1/unload
- [...] /v1/load → surfaces error in 120s, no retry
- [...] PreloadError variant
Blocked by