Add full-chain CPU e2e canary for the cache substrate by EdHasNoLife · Pull Request #17 · cachebox-project/inference-cache

EdHasNoLife · 2026-05-27T19:12:52Z

Summary

A one-shot, GPU-free canary that exercises the whole substrate end to end and asserts it works:

CPU vLLM engine ──ZMQ KV events──▶ kvevent-subscriber ──gRPC──▶ policy server ──▶ index

docs/reference-stack/scripts/canary_e2e.sh builds the server and kvevent-subscriber binaries; starts a CPU vLLM engine with KV events enabled, the policy server, and the subscriber; sends requests that repeat the same long prompt prefix; then asserts:

an engine prefix-cache hit (vllm:prefix_cache_hits_total increases), and
the index populated end-to-end (inferencecache_index_entries{model} > 0).
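A minimal sketch of how those two assertions could be evaluated in bash. It assumes both the engine and the policy server expose Prometheus text format (vLLM's OpenAI-compatible server serves it at /metrics; the policy server's metrics URL is not given here). The helper names metric_sum and gt are hypothetical, not the script's actual functions.

```shell
#!/usr/bin/env bash
set -euo pipefail

# metric_sum NAME [LABEL] : read Prometheus text exposition on stdin and print
# the sum of every sample named NAME. If LABEL (e.g. 'model="canary"') is
# given, only samples whose label set contains that string are counted.
# Assumes label values contain no spaces, so the sample value is field 2.
metric_sum() {
  awk -v name="$1" -v want="${2:-}" '
    /^#/ || NF < 2 { next }            # skip HELP/TYPE comments and blank lines
    {
      base = $1
      sub(/[{].*/, "", base)           # strip the {label="..."} set, if any
      if (base != name) next
      if (want != "" && index($1, want) == 0) next
      sum += $2
    }
    END { print sum + 0 }              # prints 0 when nothing matched
  '
}

# gt A B : succeed if A > B, compared numerically (counters may be floats).
gt() { awk -v a="$1" -v b="$2" 'BEGIN { exit !((a + 0) > (b + 0)) }'; }
```

For the engine check, read vllm:prefix_cache_hits_total from /metrics before and after the requests and require gt "$after" "$before"; for the index check, pass 'model="canary"' as the label filter on inferencecache_index_entries and require the result to be greater than 0. Summing every matching series keeps the check independent of extra labels (vLLM attaches model_name, for instance).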

The script starts and cleans up the engine container, exits non-zero on failure, and is architecture-aware (it picks the arm64 or x86_64 image tag). It is documented in the reference-stack README.
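The architecture-aware image choice could look roughly like this sketch, assuming the script keys the tag off uname -m. The helper name arch_suffix and its output values are placeholders; the PR does not show the real tag names.

```shell
#!/usr/bin/env bash
set -euo pipefail

# arch_suffix [ARCH] : map a machine architecture (default: uname -m) to a
# hypothetical image-tag suffix. Linux reports aarch64 where macOS reports
# arm64, so both spellings are normalized.
arch_suffix() {
  local arch="${1:-$(uname -m)}"
  case "$arch" in
    x86_64|amd64)  echo "x86_64" ;;
    aarch64|arm64) echo "arm64" ;;
    *) echo "unsupported architecture: $arch" >&2; return 1 ;;
  esac
}
```

Normalizing both spellings lets the same script run on an Apple Silicon laptop and on an arm64 Linux runner, and an unknown architecture fails loudly instead of pulling an image that cannot run.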

How it's run

On-demand (not a blocking CI gate): it needs Docker, pulls the multi-GB vLLM CPU image, and requires a Docker VM with roughly 10 GiB of RAM or more. Run it locally, or wire it into a scheduled or manually dispatched job on a Docker-capable runner.
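A preflight check could fail fast, before the multi-GB pull, when the Docker VM is too small. This is a sketch; MIN_GIB, enough_memory, and preflight are hypothetical names, and it relies on docker info --format '{{.MemTotal}}', which reports the VM's total memory in bytes.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Minimum Docker VM memory in GiB (hypothetical knob; the PR says ~10 GiB).
MIN_GIB="${MIN_GIB:-10}"

# enough_memory BYTES [MIN_GIB] : succeed if BYTES is at least MIN_GIB GiB.
enough_memory() {
  local bytes="$1" min_gib="${2:-$MIN_GIB}"
  (( bytes >= min_gib * 1024 * 1024 * 1024 ))
}

# preflight : fail before pulling the image if Docker is missing or its VM
# reports too little memory.
preflight() {
  if ! command -v docker >/dev/null 2>&1; then
    echo "FAIL: docker not found on PATH" >&2
    return 1
  fi
  local bytes
  bytes="$(docker info --format '{{.MemTotal}}')"
  if ! enough_memory "$bytes"; then
    echo "FAIL: Docker VM has $(( bytes / 1024 / 1024 / 1024 )) GiB; canary needs ~${MIN_GIB} GiB" >&2
    return 1
  fi
}
```

The VM may report somewhat less than its configured size (the kernel reserves part of it), so a VM configured with exactly 10 GiB might need a slightly lower MIN_GIB.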

Verification

Ran locally end to end: the engine came up healthy, both completion requests returned HTTP 200, prefix_cache_hits went from 0 to 2560, index_entries{model=canary} reached 20, and the canary reported PASS.

Note

Stacked on #15 (C1, the kvevent-subscriber that the full chain needs). The base branch is the C1 branch; I will retarget to main once C1 merges. make pre-pr is green.

scripts/canary_e2e.sh brings up the whole stack on a CPU vLLM engine (no GPU) — engine → ZMQ KV events → kvevent-subscriber → policy server → index — fires a repeated long prefix, and asserts both an engine prefix-cache hit (vllm:prefix_cache_hits increases) and that the server index populated (inferencecache_index_entries{model} > 0). Builds the binaries, manages/cleans up the engine container, and exits non-zero on failure. On-demand (needs Docker + the vLLM CPU image + adequate VM RAM), not a blocking CI gate. Documented in the reference-stack README. Verified locally: prefix_cache_hits 0->2560, index_entries{model=canary}=20, PASS.

github-actions · 2026-05-27T19:15:02Z

Codex review

Blocking
None.

Should-fix
None.

Nit
None.

Verdict
Approve.

I reviewed the full PR diff and the surrounding server/subscriber/index code. No vendor-neutral naming violations, proto/CRD contract changes, or fail-open semantic regressions were introduced. Static checks passed: bash -n, make verify-naming, and make verify-no-internal-refs. ShellCheck reported only informational notes about A && B || true cleanup-style lines, which are not a correctness issue here. The full Docker/vLLM canary was not run in this review environment.

cpu-substrate-canary.yml runs docs/reference-stack/scripts/canary_e2e.sh on a nightly cron and on manual dispatch (with a runner-label input to target a self-hosted Docker host). Not a per-PR gate — it pulls a multi-GB image and needs ~10 GiB RAM. Uploads server/subscriber logs on failure. Depends on cmd/kvevent-subscriber, so it only functions once C1 is on the default branch.

github-actions · 2026-05-27T19:20:53Z

Codex review

Blocking
None.

Should-fix

docs/reference-stack/scripts/canary_e2e.sh: the canary logs the two completion HTTP status codes but never asserts they are 200. A run can still reach the later metric assertions even if one request failed, which weakens the “full-chain works” signal described in the PR body. Capture both codes and fail immediately unless both are 200.
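One way to act on this, sketched under assumptions: the requests go to the engine's OpenAI-compatible /v1/completions endpoint, the served model name is canary (matching the model label in the verification run), and post_completion and require_all_200 are hypothetical helper names.

```shell
#!/usr/bin/env bash
set -euo pipefail

# require_all_200 CODE... : fail unless every captured HTTP status is 200.
require_all_200() {
  (( $# > 0 )) || { echo "FAIL: no HTTP status codes captured" >&2; return 1; }
  local code
  for code in "$@"; do
    if [[ "$code" != "200" ]]; then
      echo "FAIL: completion request returned HTTP $code (want 200)" >&2
      return 1
    fi
  done
}

# post_completion URL MODEL PROMPT : send one completion and print its status.
# curl prints 000 when it cannot connect; "|| true" keeps set -e from
# aborting before the status is captured. PROMPT is spliced into the JSON
# unescaped, so it must not contain quotes or backslashes.
post_completion() {
  curl -s -o /dev/null -w '%{http_code}' \
    -H 'Content-Type: application/json' \
    -d "{\"model\":\"$2\",\"prompt\":\"$3\",\"max_tokens\":1}" \
    "$1/v1/completions" || true
}
```

Call post_completion twice with the same long prompt and pass both results to require_all_200 before reading any metrics, so a failed request stops the run immediately instead of surfacing later as a confusing metric failure.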

Nit

docs/reference-stack/scripts/canary_e2e.sh: cleanup/log upload only preserves server and subscriber logs. Engine startup/request failures are likely to be the common failure mode, so dumping or saving docker logs "$CONTAINER" on failure would make the scheduled canary much easier to diagnose.
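A sketch of an exit handler that saves the engine's output only when the run fails and always removes the container. CONTAINER, LOG_DIR, and on_exit are assumed names; the workflow's artifact upload would also need to include LOG_DIR.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical names standing in for whatever canary_e2e.sh actually uses.
CONTAINER="${CONTAINER:-canary-vllm}"
LOG_DIR="${LOG_DIR:-./canary-logs}"

# on_exit RC : on failure, save the engine's stdout and stderr next to the
# server/subscriber logs; always remove the engine container.
on_exit() {
  local rc="$1"
  if (( rc != 0 )); then
    mkdir -p "$LOG_DIR"
    docker logs "$CONTAINER" >"$LOG_DIR/engine.log" 2>&1 || true
    echo "FAIL (exit $rc); engine logs saved to $LOG_DIR/engine.log" >&2
  fi
  docker rm -f "$CONTAINER" >/dev/null 2>&1 || true
}
```

Register it near the top of the script with trap 'rc=$?; on_exit "$rc"; exit "$rc"' EXIT. The explicit exit in the trap makes the canary's original status the script's final status, so a failing run still exits non-zero after cleanup.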

Vendor-neutral check passed for the PR diff; no proto/CRD/core API contract changes detected.

Verdict: changes-requested.

…nxuedward/c1-cpu-e2e-canary

github-actions · 2026-05-27T21:32:42Z

Codex review

Blocking

docs/reference-stack/scripts/canary_e2e.sh: go build -o bin/server and the next line assume bin/ already exists. It is ignored by .gitignore and not tracked, so a fresh GitHub Actions checkout will fail immediately with a missing parent directory. Add mkdir -p bin before building, or build through a target that guarantees the output directory exists. This also breaks the new workflow invocation at .github/workflows/cpu-substrate-canary.yml.
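A sketch of the suggested fix, with the build wrapped in a function so the directory creation is explicit. BIN_DIR and build_binaries are hypothetical names, and ./cmd/server is an assumed package path (only cmd/kvevent-subscriber is named in this PR).

```shell
#!/usr/bin/env bash
set -euo pipefail

# Output directory for the canary binaries. bin/ is gitignored, so it does
# not exist on a fresh checkout.
BIN_DIR="${BIN_DIR:-bin}"

# build_binaries : create the output directory, then build both binaries.
build_binaries() {
  mkdir -p "$BIN_DIR"
  go build -o "$BIN_DIR/server" ./cmd/server
  go build -o "$BIN_DIR/kvevent-subscriber" ./cmd/kvevent-subscriber
}
```

Creating the directory up front keeps the script from depending on whether go build creates missing parent directories for -o, or on a bin/ left behind by an earlier local run.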

Should-fix

None.

Nit

None.

Verdict: changes-requested.

EdHasNoLife marked this pull request as draft May 27, 2026 19:18

Merge branch 'sunxuedward/cac-21-c1-vllm-kv-event-subscriber' into su…

3b9c857

…nxuedward/c1-cpu-e2e-canary

EdHasNoLife mentioned this pull request on May 27, 2026 in: Add a CPU profile and a kind-based reconciler CPU canary #21 (Open, 4 tasks)

EdHasNoLife wants to merge 3 commits into sunxuedward/cac-21-c1-vllm-kv-event-subscriber from sunxuedward/c1-cpu-e2e-canary