Add a CPU profile and a kind-based reconciler CPU canary#21
Add a CPU profile and a kind-based reconciler CPU canary#21EdHasNoLife wants to merge 3 commits into
Conversation
Lets the CacheBackend reconciler stand up a healthy, serving backend without a GPU so the substrate can be validated off-GPU. - backendConfig profile=cpu renders a GPU-free vLLM engine: no nvidia.com/gpu limit and no LMCache connector, but prefix caching and the KV-event publisher stay on, plus CPU flags (--dtype/--max-model-len/--enforce-eager) and VLLM_CPU_KVCACHE_SPACE, defaulting to the vLLM CPU image + a tiny ungated model. The default (gpu) profile is unchanged and still owns real LMCache offload, which requires a GPU. Selection is case-insensitive; unknown/empty falls back to gpu. Covered by unit tests; the GPU rendering is byte-identical. - docs/reference-stack/scripts/canary_c2_reconcile.sh + an on-demand workflow: bring up kind, run the controller, apply a profile=cpu CacheBackend, and assert the reconciler reports Ready with a published endpoint, an engine prefix-cache hit through the Service, and owner-ref GC on delete — the real-pod path envtest can't cover. - A profile=cpu sample and the backendConfig key reference in the design doc.
Codex reviewBlocking
Should-fix
Nit
No vendor-neutral naming issues found in the PR diff, and there are no proto/contract changes to mirror. Verdict: changes-requested. |
Address review feedback on the CPU profile: - The update path now reconciles the container Resources (the GPU limit differs by profile) and the pod Volumes (the shm size differs), so switching an existing backend gpu<->cpu actually reaches the live Deployment instead of only taking effect on create. Both fields are builder-owned and not API-server-defaulted, so reconciling them stays churn-free. Adds a controller test for the profile switch. - profile=cpu now requires backendConfig.image: the upstream CPU image is arch- tagged (latest-arm64 / latest-x86_64) with no safe multi-arch default, so the builder errors rather than rendering a bogus :latest. Tests + design doc updated.
Codex reviewBlocking Should-Fix
Nit I did not find vendor-neutral naming violations, proto/contract drift, or generated-code issues in this diff. I could not run the focused Go tests because this review sandbox is read-only and Go could not create Verdict: changes-requested. |
The CPU image is arch-tagged with no multi-arch tag, so a hardcoded latest-arm64 would fail on x86 hosts. Ship a deliberate :latest-<arch> placeholder (matching the reference stack's non-applyable-image convention) so applying as-is fails fast rather than silently running the wrong arch.
Codex reviewBlocking: None. Should-fix: None. Nit: None. Verdict: approve. The PR stays vendor-neutral in the changed surfaces, does not touch proto/gRPC contracts, keeps the CPU-specific behavior isolated in the backend adapter/controller rendering path, and adds focused unit coverage plus an on-demand real-cluster canary. I did not run the test suite in this read-only review environment. |
Summary
Lets the CacheBackend reconciler stand up a healthy, serving backend without a GPU, so the substrate can be validated off-GPU (closing the one C2 DoD item — "apply CR → healthy pods" — that previously needed a GPU host).
backendConfig.profile: cpu): renders a GPU-free vLLM engine — nonvidia.com/gpulimit and no LMCache connector, but prefix caching + the KV-event publisher stay on, plus CPU flags (--dtype=bfloat16 --max-model-len=8192 --enforce-eager) andVLLM_CPU_KVCACHE_SPACE, defaulting to the vLLM CPU image + a tiny ungated model (Qwen/Qwen2.5-0.5B-Instruct). The defaultgpuprofile is unchanged and still owns real LMCache offload (which requires a GPU). Selection is case-insensitive; unknown/empty falls back togpu.docs/reference-stack/scripts/canary_c2_reconcile.sh+ an on-demand workflow): brings up kind, runs the controller, applies aprofile=cpuCacheBackend, and asserts the reconciler reportsReadywith a published endpoint, an engine prefix-cache hit through the Service, and owner-ref GC on delete — the real-pod path envtest can't cover.profile=cpusample and thebackendConfigkey reference indocs/design/cachebackend-api.md.This complements PR #17 (the C1-chain CPU canary); once C1 lands on
main, the two canaries can be unified.Test plan
make ci/make pre-prgreen; GPU rendering is byte-identical (existing builder + controller unit tests unchanged and passing).bash -nclean; focused diff review found no defects.c2-reconciler-canaryworkflow dispatch or locally.