Add full-chain CPU e2e canary for the cache substrate #17
Draft
EdHasNoLife wants to merge 3 commits into
scripts/canary_e2e.sh brings up the whole stack on a CPU vLLM engine (no GPU) —
engine → ZMQ KV events → kvevent-subscriber → policy server → index — fires a
repeated long prefix, and asserts both an engine prefix-cache hit
(vllm:prefix_cache_hits increases) and that the server index populated
(inferencecache_index_entries{model} > 0). Builds the binaries, manages/cleans up
the engine container, and exits non-zero on failure. On-demand (needs Docker +
the vLLM CPU image + adequate VM RAM), not a blocking CI gate. Documented in the
reference-stack README.
Verified locally: prefix_cache_hits 0->2560, index_entries{model=canary}=20, PASS.
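The two assertions described above come down to reading Prometheus text metrics and comparing values. The following is a minimal sketch of that check, not the code in `canary_e2e.sh`: the function names `metric_sum` and `assert_increased`, the `model_name` label, and the inline sample text are illustrative assumptions, and the sample values are the ones from the verification line. It also assumes samples carry no trailing timestamp, so the value is the last field on the line.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Sum every sample of one metric from Prometheus text exposition on stdin.
# Matches the bare metric name with or without a label set and skips
# "# HELP" / "# TYPE" comment lines. Prints 0 when the metric is absent.
metric_sum() {
  local name="$1"
  awk -v name="$name" '
    /^#/ { next }
    {
      key = $1
      sub(/\{.*/, "", key)      # drop the {label="..."} part
      if (key == name) { sum += $NF; found = 1 }
    }
    END { if (found) printf "%d\n", sum; else print 0 }
  '
}

# Fail (non-zero return) unless the counter went up between two readings.
assert_increased() {
  local label="$1" before="$2" after="$3"
  if [ "$after" -gt "$before" ]; then
    echo "PASS: $label $before -> $after"
  else
    echo "FAIL: $label did not increase ($before -> $after)" >&2
    return 1
  fi
}

# Illustrative samples standing in for two scrapes of the engine /metrics page.
sample_before='# TYPE vllm:prefix_cache_hits counter
vllm:prefix_cache_hits{model_name="canary"} 0.0'
sample_after='vllm:prefix_cache_hits{model_name="canary"} 2560.0'

before=$(printf '%s\n' "$sample_before" | metric_sum vllm:prefix_cache_hits)
after=$(printf '%s\n' "$sample_after" | metric_sum vllm:prefix_cache_hits)
assert_increased prefix_cache_hits "$before" "$after"
```

Against a live engine, the sample text would be replaced by a scrape such as `curl -fsS "$ENGINE_URL/metrics" | metric_sum vllm:prefix_cache_hits`, where `ENGINE_URL` is a hypothetical variable pointing at the engine. The same `metric_sum` helper could read `inferencecache_index_entries` from the server and compare the result against zero.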
Codex review
Blocking
Should-fix
Nit
Verdict
I reviewed the full PR diff and surrounding server/subscriber/index code. No vendor-neutral naming violations, proto/CRD contract changes, or fail-open semantic regressions were introduced. Static checks passed:
cpu-substrate-canary.yml runs docs/reference-stack/scripts/canary_e2e.sh on a nightly cron and on manual dispatch (with a runner-label input to target a self-hosted Docker host). Not a per-PR gate — it pulls a multi-GB image and needs ~10 GiB RAM. Uploads server/subscriber logs on failure. Depends on cmd/kvevent-subscriber, so it only functions once C1 is on the default branch.
Codex review
Blocking
Should-fix
Nit
Vendor-neutral check passed for the PR diff; no proto/CRD/core API contract changes detected.
Verdict: changes-requested.
…nxuedward/c1-cpu-e2e-canary
Codex review
Blocking
Should-fix: None.
Nit: None.
Verdict: changes-requested.
Summary
A one-shot, GPU-free canary that exercises the whole substrate end to end and asserts it works:
docs/reference-stack/scripts/canary_e2e.sh builds server + kvevent-subscriber, starts a CPU vLLM engine (with KV events), the policy server, and the subscriber, fires a repeated long prefix, then asserts:
- an engine prefix-cache hit (vllm:prefix_cache_hits_total increases), and
- the server index populated (inferencecache_index_entries{model} > 0).

It manages/cleans up the engine container, exits non-zero on failure, and is arch-aware (arm64/x86_64 image tag). Documented in the reference-stack README.
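As an illustration of the arch-aware tag and the container cleanup, here is a minimal sketch under stated assumptions. It is not the script's actual code: `arch_name`, `ENGINE_IMAGE`, `ENGINE_CONTAINER`, and the `example.invalid` image reference are hypothetical placeholders, and the real image name and tag scheme are whatever the script and README define.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Map `uname -m` output to the two architecture names mentioned in the summary.
arch_name() {
  case "$1" in
    arm64|aarch64) echo "arm64" ;;
    x86_64|amd64)  echo "x86_64" ;;
    *) echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

ARCH="$(arch_name "$(uname -m)")"

# Hypothetical placeholders: substitute the real image reference and name.
ENGINE_IMAGE="${ENGINE_IMAGE:-example.invalid/vllm-cpu:latest-${ARCH}}"
ENGINE_CONTAINER="${ENGINE_CONTAINER:-canary-engine}"

# Remove the engine container on every exit path, including failures.
# Errors are ignored so cleanup never masks the script's own exit status.
cleanup() {
  docker rm -f "$ENGINE_CONTAINER" >/dev/null 2>&1 || true
}
trap cleanup EXIT

echo "would run $ENGINE_IMAGE as $ENGINE_CONTAINER"
```

Registering the trap before the container starts means a failed assertion still tears the engine down, and because `cleanup` does not call `exit`, the non-zero status from the failing step is what the caller sees.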
How it's run
On-demand (not a blocking CI gate): it needs Docker and a Docker VM with ~10+ GiB RAM, and it pulls the multi-GB vLLM CPU image. Run it locally, or wire it into a scheduled/dispatch job on a Docker-capable runner.
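Since the run depends on Docker having roughly 10 GiB of RAM, a preflight check can fail fast before the image pull. The sketch below is an assumption about how such a check could look, not something the PR is known to contain: `has_enough_ram`, `preflight`, and `MIN_GIB` are hypothetical names. It relies on `docker info --format '{{.MemTotal}}'`, which reports the memory available to the Docker daemon in bytes.

```shell
#!/usr/bin/env bash
set -euo pipefail

MIN_GIB="${MIN_GIB:-10}"   # the ~10 GiB figure from the description

# True when total_bytes is at least min_gib GiB (1 GiB = 1024^3 bytes).
has_enough_ram() {
  local total_bytes="$1" min_gib="$2"
  [ "$total_bytes" -ge $(( min_gib * 1024 * 1024 * 1024 )) ]
}

# Check that Docker is reachable and has enough memory for the CPU engine.
preflight() {
  local mem
  if ! mem="$(docker info --format '{{.MemTotal}}' 2>/dev/null)"; then
    echo "FAIL: Docker is not installed or the daemon is not reachable" >&2
    return 1
  fi
  if ! has_enough_ram "$mem" "$MIN_GIB"; then
    echo "FAIL: Docker reports $mem bytes of RAM; need at least ${MIN_GIB} GiB" >&2
    return 1
  fi
  echo "preflight ok: $mem bytes available to Docker"
}
```

A script would call `preflight || exit 1` before building or pulling anything, so an undersized Docker VM is reported in seconds rather than after a multi-GB download.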
Verification
Ran locally end-to-end: engine healthy, requests 200/200, prefix_cache_hits 0→2560, index_entries{model=canary}=20, PASS.
Note
Stacked on #15 (C1, the kvevent-subscriber the full chain needs). Base is the C1 branch; will retarget to main once C1 merges. make pre-pr green.