Add vLLM KV-event subscriber → ReportCacheState (C1) #15
First slice of the vLLM KV-event subscriber (pkg/adapters/engine):
- events.go: decode vLLM's msgpack EventBatch wire format (msgspec array-tagged) into BlockStored/BlockRemoved/AllBlocksCleared. token_ids/parent_block_hash/lora_id are intentionally never materialized; metadata only. Unknown event tags are skipped (forward-compatible).
- mapper.go: translate events to the gRPC contract: BlockStored → CacheStateUpdate (ReportCacheState, additive), BlockRemoved → PREFIX_EVICTED, AllBlocksCleared → ALL_CLEARED. uint64 block hashes encode as 8-byte big-endian prefix_hash bytes; every message is stamped with replica/model/tenant/hash_scheme.
- config.go: per-replica identity + validation (non-empty hash_scheme, which would otherwise be dropped server-side).
- Unit tests for decode (incl. >2^63 hashes, malformed input, unknown-tag skip) and mapping/validation.
Adds vmihailenco/msgpack/v5. ZMQ subscriber + gRPC forwarder + cmd entrypoint + live integration follow.
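The hash encoding described for mapper.go can be sketched as follows. This is a minimal illustration, not the PR's code: the function name `encodePrefixHash` is hypothetical, and only the rule itself (a uint64 block hash becomes 8 big-endian bytes) comes from the commit message.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodePrefixHash renders a uint64 block hash as 8 big-endian bytes.
// The name is hypothetical; only the encoding rule is taken from the PR text.
func encodePrefixHash(h uint64) []byte {
	b := make([]byte, 8)
	binary.BigEndian.PutUint64(b, h)
	return b
}

func main() {
	// A hash above 2^63 has its top bit set, so it must stay unsigned
	// end to end to round-trip without loss.
	fmt.Printf("%x\n", encodePrefixHash(1<<63+1))
}
```

Keeping the value unsigned is what lets the ">2^63 hashes" case from the unit tests pass, since a signed 64-bit integer cannot hold those values.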
- forwarder.go: Reporter forwards events to the server. BlockStored adds are debounced over a window onto a long-lived ReportCacheState stream; removals go via unary PublishEvent (PREFIX_EVICTED / ALL_CLEARED). Fail-soft: the stream reconnects with backoff, errors are logged not propagated, a clear supersedes buffered adds. bufconn integration test against a recording gRPC server.
- subscriber.go: ZMQ SUB loop behind a frameSource interface (pure-Go go-zeromq/zmq4); decodes each batch and emits it; reconnects with backoff; malformed batches are dropped. Unit-tested via a fake source (real socket interop verified separately against a live engine).
- cmd/kvevent-subscriber: sidecar entrypoint wiring config (engine endpoint, topic, server, replica/model/tenant/hash-scheme, window) + signal-driven graceful drain.
- Build the new binary in `make build`; refresh package doc to the sidecar model.
Codex review
Blocking
Should-fix
Nit
Vendor-neutral naming and proto/CRD scope look fine. I did not run the test suite in this read-only sandbox.
Verdict: changes-requested.
Codex review
Blocking
Should-fix
Nit
Vendor-neutral surfaces and proto/CRD contract look clean: no forbidden OCI/Oracle identity leakage, no proto changes, and the adapter stays metadata-only. I could not run tests in this sandbox because the filesystem is read-only and Go could not create a build cache.
Verdict: changes-requested
Codex review
Blocking
Should-Fix
Nit
Verdict
I could not run
… guard
- Block hashes are now opaque bytes ([][]byte). vLLM's ExternalBlockHash is a union of bytes and int; the decoder accepts both (binary passes through, int normalizes to 8-byte big-endian). Previously only uint64 was decoded, so a byte-hash engine would have its batches dropped as undecodable. Adds a byte-hash wire test. (Blocking)
- Graceful shutdown now truly drains: the reporter runs on a background context and stops by draining a closed `out` channel, so batches already buffered on shutdown are flushed instead of lost to a ctx-cancel race; the final flush reopens the stream under a detached context. Adds a shutdown-flush test.
- NewReporter clamps a non-positive window so time.NewTicker can't panic (-window=0). Adds a clamp test.
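The bytes-or-int normalization in the first item might look roughly like the sketch below. It assumes the decoded msgpack value arrives as an interface value holding `[]byte`, `uint64`, or `int64`; the name `normalizeHash`, the rejection of negative integers, and the omission of narrower integer widths are illustrative assumptions, not the PR's actual decoder.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// normalizeHash turns a decoded block hash into opaque bytes.
// Hypothetical helper: binary hashes pass through unchanged, integer
// hashes become 8 big-endian bytes, anything else is an error.
func normalizeHash(v any) ([]byte, error) {
	switch h := v.(type) {
	case []byte:
		return h, nil
	case uint64:
		return binary.BigEndian.AppendUint64(nil, h), nil
	case int64:
		// Assumption for this sketch: negative values are treated as malformed.
		if h < 0 {
			return nil, fmt.Errorf("negative block hash %d", h)
		}
		return binary.BigEndian.AppendUint64(nil, uint64(h)), nil
	default:
		return nil, fmt.Errorf("unsupported block hash type %T", v)
	}
}

func main() {
	for _, v := range []any{[]byte{0xde, 0xad}, uint64(258), int64(7), "oops"} {
		b, err := normalizeHash(v)
		fmt.Printf("%x %v\n", b, err)
	}
}
```

Returning an error for unexpected types, rather than panicking, lets the caller log and skip the batch, in line with the fail-soft behavior the PR describes.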
Addressed the Codex review in the latest commit (also rebased onto current main):
Verified locally:
Codex review
Blocking
Should-fix
Nit
Vendor-neutral naming looks fine: no OCI/Oracle identity leakage, and the vLLM-specific code stays under
Verdict: changes-requested.
…ckoff
Codex round 2:
- Reporter now uses a bounded per-flush/per-call context for every gRPC op (fresh time-bounded ReportCacheState stream per flush; bounded PublishEvent), so a stalled/unreachable server can't block the loop indefinitely. This restores the fail-soft sidecar contract and removes the long-lived-stream lifecycle.
- Decoder rejects a known tag with a truncated tuple (e.g. BlockStored without block_size) instead of indexing token_count=0. Adds a test.
- pendTs is reset after each flush (so a later ts<=0 batch keeps "0 = server now").
- NewSubscriber clamps a non-positive backoff (no tight reconnect loop).
Round 2 addressed in the latest commit:
Verified: race tests clean, coverage 79.0% ≥ 65%,
Codex review
Blocking
Should-fix
Nit
Verification
Verdict
Codex round 3 (should-fix + nit):
- StoredPrefixes now reports a cumulative token_count per block hash, (i+1)*block_size, since vLLM block hashes chain their parent, so block i's hash covers (i+1) blocks. Reporting a constant block_size for every hash flattened the matched_tokens ranking signal. Counts within-event only (parent length isn't in the event); never uses token contents. Updated the mapper test to assert [128, 256].
- events.go: note the EventBatch may carry trailing fields (e.g. a data-parallel rank) which the decoder ignores.
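The cumulative count is simple arithmetic; a minimal sketch, with the hypothetical function name `cumulativeTokenCounts`, is shown here. It counts within a single event only, as the commit message states, and never looks at token contents.

```go
package main

import "fmt"

// cumulativeTokenCounts returns (i+1)*blockSize for each of numHashes
// chained block hashes in one event. Hypothetical name for illustration.
func cumulativeTokenCounts(numHashes, blockSize int) []int {
	counts := make([]int, numHashes)
	for i := range counts {
		// Block i's hash chains its parents, so it covers i+1 blocks.
		counts[i] = (i + 1) * blockSize
	}
	return counts
}

func main() {
	// Two hashes with block_size 128, matching the values the mapper test asserts.
	fmt.Println(cumulativeTokenCounts(2, 128)) // [128 256]
}
```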
Round 3 addressed in b838695:
Codex review
Blocking
Should-fix
Nit
I did not find vendor-neutral naming, proto/CRD, or metadata-contract violations in the PR. I could not run tests in this environment because the filesystem is read-only and Go cannot create its build cache.
Verdict: changes-requested.
Codex round 4 (blocking): adds were debounced but removals were published immediately, so when a block was stored and then evicted within one window, the eviction was sent first and the later flush re-added the evicted prefix (additive adds vs. removal pruning), leaving a stale index entry until TTL. The BlockRemoved path now flushes pending adds before publishing the eviction, preserving order. Adds an over-the-wire test asserting the add is received before the evict.
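The ordering fix can be illustrated with a toy reporter that records what it would send. Everything here is a hypothetical stand-in: the `reporter` type, its string-based `sent` log, and the method names only model the flush-before-evict rule, not the real gRPC forwarder.

```go
package main

import "fmt"

// reporter is a toy model: pending holds debounced adds, and sent records
// messages in the order they would go over the wire.
type reporter struct {
	pending []string
	sent    []string
}

// store buffers an add until the next flush.
func (r *reporter) store(hash string) { r.pending = append(r.pending, hash) }

// flush sends all buffered adds and empties the buffer.
func (r *reporter) flush() {
	for _, h := range r.pending {
		r.sent = append(r.sent, "add:"+h)
	}
	r.pending = r.pending[:0]
}

// remove flushes pending adds first, so an eviction can never overtake
// the add it is meant to undo.
func (r *reporter) remove(hash string) {
	r.flush()
	r.sent = append(r.sent, "evict:"+hash)
}

func main() {
	r := &reporter{}
	r.store("a")
	r.remove("a")       // stored and evicted within one debounce window
	fmt.Println(r.sent) // [add:a evict:a]
}
```

Without the flush inside `remove`, the log would read evict-then-add, which is the stale-entry case the review flagged.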
Round 4 (blocking) addressed in 5126603: the
Codex review
Blocking
Should-fix
Nit
Verdict
I did not find vendor-neutral naming violations in the new core identity surfaces, and no proto/CRD/generated-code changes were introduced. I could not run
Codex should-fix: a BlockStored with block_size <= 0 would forward a zero/negative token_count and could create a PREFIX_MATCH hint with no valid matched tokens. Reject it during decode so the batch fails open instead. Test.
Addressed the latest should-fix in aac4782: a
Codex review
Blocking
Should-fix
Nit
Verdict: approve. I reviewed the PR diff and surrounding contract/index/server code. The changes stay in the optional engine adapter/sidecar path, do not touch proto/CRD/core identity, preserve metadata-only forwarding, and include focused tests for decode, mapping, forwarding, and subscriber control flow.
Verification note: I could not run
…-vllm-kv-event-subscriber # Conflicts: # go.sum
Codex review
Blocking
Should-fix
Nit
Verdict
I did not find vendor-neutral naming violations or proto/CRD contract drift in this PR. I could not run tests in this environment because the filesystem is read-only and Go cannot create its build cache, even with
Addressed the should-fix in a686dd4:
Codex review
Blocking
Should-Fix
Nit
Verdict
I also checked the vendor-neutral rule against the changed files; no OCI/Oracle core identity issue found. Proto/CRD files are unchanged. I could not run
A msgpack nil in block_hashes reached reflect-based normalization; guard it with an explicit nil check that returns an error so one malformed engine frame is logged and skipped (fail-soft) rather than risking a decoder panic. Test added.
Addressed in 0cdfad2:
Codex review
Blocking
Should-Fix
Nit
Verdict
I reviewed the PR diff and surrounding contract/index code. The changes stay in the optional engine adapter plus a sidecar command, do not touch proto/CRD surfaces, do not introduce OCI/Oracle naming in the PR diff, preserve metadata-only reporting, and include focused unit/integration coverage for decoding, mapping, forwarding, and subscriber behavior. I could not run tests in this sandbox:
Codex review
Blocking
Should-fix
Nit
Verdict
I could not run tests in this environment because the filesystem is read-only and Go could not initialize the build cache:
Codex review
Blocking
Should-fix
Nit
Verdict: changes-requested.
Summary
Adds the vLLM KV-event subscriber: a sidecar that subscribes to a vLLM engine replica's KV-cache events over ZMQ and reports cache state to the policy server, keeping the B6 `CacheIndex` live from real engine events. Metadata only: never tokens or prompt text. Lands in `pkg/adapters/engine/` + a new `cmd/kvevent-subscriber` binary.
- `events.go`: decodes vLLM's msgpack `EventBatch` wire format (msgspec array-tagged) into `BlockStored`/`BlockRemoved`/`AllBlocksCleared`. `token_ids`/`parent_block_hash`/`lora_id` are never materialized into the Go types (metadata-only by construction); unknown event tags are skipped (forward-compatible).
- `mapper.go`: translates to the merged gRPC contract: `BlockStored` → `CacheStateUpdate` (additive, via `ReportCacheState`), `BlockRemoved` → `PREFIX_EVICTED`, `AllBlocksCleared` → `ALL_CLEARED` (via `PublishEvent`). uint64 block hashes encode as 8-byte big-endian `prefix_hash`; every message is stamped with replica/model/tenant/hash_scheme.
- `forwarder.go`: `Reporter` debounces adds over a window onto a long-lived `ReportCacheState` stream; removals go via unary `PublishEvent`. Fail-soft: reconnect-with-backoff, errors logged not propagated, a clear supersedes buffered adds, graceful-shutdown final flush.
- `subscriber.go`: ZMQ SUB loop (pure-Go `go-zeromq/zmq4`) behind a `frameSource` interface; reconnects with backoff; malformed batches dropped.
- `cmd/kvevent-subscriber`: sidecar entrypoint (engine endpoint, topic, server, replica/model/tenant/hash-scheme, window) + signal-driven drain.
No proto/CRD changes. Vendor-neutral; new deps: `vmihailenco/msgpack/v5`, `go-zeromq/zmq4`.
Verification
Unit / integration (CI):
… `bufconn` to a recording gRPC server (stream adds, `PublishEvent` removals, clear-supersedes-adds).
`make pre-pr` green (naming + fmt + vet + test + build + no drift).
Live end-to-end (run locally, GPU-free): real vLLM `0.21.0` CPU engine → ZMQ → `kvevent-subscriber` → policy server → B6 index. Two prefix requests populated the index: `inferencecache_index_entries{model="qwen"} 17`. Confirms the pure-Go ZMQ ↔ vLLM libzmq interop and the full add path against the real server.
Follow-ups (out of scope)
… `buffer_steps`/ROUTER) for gap recovery on reconnect; soft state tolerates loss for now.