Back LookupRoute with an in-memory CacheIndex (B6 engine)#7
Conversation
Implement the index engine half of B6: a concurrent, soft-state cache
aggregator in pkg/index keyed by (tenant, model, hash_scheme,
prefix_hash) → replica entries, plus per-replica stats.
- ReportCacheState ingests authoritative updates (idempotent on
replica/scheme/prefix); PublishEvent applies PREFIX_EVICTED /
REPLICA_UPDATED / ALL_CLEARED deltas (PREFIX_ADDED refreshes known
entries — ReportCacheState is authoritative for additions).
- LookupRoute returns replicas holding the exact prefix within a
matching hash_scheme, ranked by matched-tokens x freshness; no match
fails open with NO_HINT. GetCacheState returns the (tenant, model)
aggregate.
- TTL eviction goroutine (tied to Serve's ctx) + a max-entries cap bound
memory; prefix_hash stays engine-opaque.
- /readyz now consults index.Ready(); /metrics adds index_entries{model}
and lookup_route_calls_total / lookup_route_latency_seconds. Drained
models have their index_entries gauge zeroed rather than left stale.
The CacheIndex CRD + controller status surface is split to CAC-50.
Codex reviewBlocking
Should-fix
Nit
I did not find vendor-neutral naming issues or proto/CRD drift in this PR. I could not run tests because the environment is read-only and Go could not create its build cache. Verdict: changes-requested. |
Address PR review: the code described ReportCacheState ingestion as "authoritative" while behaving as additive/refresh, which is contradictory if read as snapshot semantics. Make the contract explicit instead: CacheStateUpdate is an incremental delta — a prefix's absence from a later update does not remove it. Removals arrive via CacheEvent (PREFIX_EVICTED / ALL_CLEARED) or TTL expiry, matching the engine KV-event model (vLLM BlockStored / BlockRemoved). Documented in code and grpc-contract.md.
|
Thanks — this is a real contract ambiguity, addressed in 430b1b4 by making the semantics explicit rather than switching behavior. The key decision: Why delta (not snapshot-replace):
So the defect you caught was the wording: the code said "authoritative" (reads as snapshot) while behaving as additive/refresh. Fixed by nailing the contract in three places:
Removals are already handled: |
Codex reviewBlocking Should-fix
Nit
Vendor-neutral naming check: no new core identity violations found. I could not run tests because the environment is read-only and Go cannot create a build cache, even with Verdict: changes-requested |
Address PR review: CacheStateUpdate carries a redundant nested stats.replica_id alongside the top-level replica_id that the index keys on. If an adapter mismatches or omits the nested field, GetCacheState would report a replica whose stats payload disagrees with its key. Enforce the top-level replica_id as authoritative in the index (and at the proto translation boundary), so the stored stats always match their key. Regression test covers a deliberately mismatched nested id.
|
Addressed in c8e5642. Should-fix (stats replica id): fixed. The top-level Nit (wrap Recv error): intentionally leaving it raw. The non-EOF |
Codex reviewShould-Fix
Nit
No vendor-neutral naming violations found in the PR diff. No proto/generated-code drift concerns because I could not run tests because the sandbox filesystem is read-only and Go could not create its build cache. Verdict: approve-with-nits. |
Address PR review nits: - grpc-contract.md no longer says LookupRoute et al. are fail-open stubs / out of scope: note that B6 backs LookupRoute, ReportCacheState, PublishEvent, and GetCacheState with the in-memory CacheIndex. - index.New clamps non-positive TTL / sweep interval to defaults so a misconfigured WithSweepInterval(0) can't panic time.NewTicker(0) (and TTL can't divide-by-zero in freshness).
|
Both addressed in adcc348.
|
Codex reviewBlocking
Should-fix
Nit
Vendor-neutral naming looks clean in the changed core surfaces. I could not run tests because the sandbox is read-only and Go cannot create its build cache. Verdict: changes-requested. |
Address PR review: - CacheEvent carries no hash_scheme and prefix_hash is only meaningful within a scheme, so events no longer refresh scheme-specific prefix freshness (that risked keeping one engine's stale entry fresh via another engine's event → a false PREFIX_MATCH). PREFIX_ADDED is now a no-op (ReportCacheState is authoritative for adds/refreshes); REPLICA_UPDATED only refreshes replica stats liveness; PREFIX_EVICTED / ALL_CLEARED removals stay (removal is conservative/fail-soft). - Serve now derives a cancelable context with defer cancel() for index.Start, so the eviction goroutine and Ready() are torn down on any return (including the internal error branch), not just caller ctx done. - Tighten the grpc-contract.md B4/B6 scope text.
|
All three addressed in 8f99dd0.
|
Codex reviewBlocking
Should-fix
Nit
Verdict: changes-requested. I could not run tests: the sandbox is read-only, so |
Address PR review (Blocking): an empty/unspecified hash_scheme would collapse all engines into one compatibility domain, letting an ingest or lookup that omits the tag produce a PREFIX_MATCH on engine-opaque bytes. The index now drops ingest prefixes without a hash_scheme and returns no hint for lookups without one (stats stay scheme-independent). Documented the guarantee in grpc-contract.md; regression test covers both paths.
|
Addressed in ea216a6. Blocking (empty
So a |
Codex reviewBlocking Should-fix
Nit Verification Verdict: changes-requested. |
Address PR review: reportEntries snapshotted per-model counts before acquiring reportMu, so two concurrent reporters could publish out of order and leave a stale inferencecache_index_entries gauge. Take the snapshot while holding reportMu, so reporters are serialized and the last one always publishes the live count (lock order stays reportMu -> i.mu). Adds a concurrent-ingest regression test asserting the final gauge.
|
Addressed in ee089a3. Should-fix (stale |
Codex reviewBlocking Should-Fix Nit Verdict I reviewed the PR diff against the vendor-neutral rule, gRPC contract/design doc, fail-open lookup semantics, and generated-code scope. I did not find issues in the introduced changes. Verification note: I attempted |
Summary
Implements the index-engine half of B6 (CAC-20) — real ingestion + ranked lookups behind the previously fail-open stubs. The
CacheIndexCRD + controller status surface is split to CAC-50 (kept this PR server-only and focused).New package
pkg/index: a concurrent, soft-state aggregator keyed by(tenant, model, hash_scheme, prefix_hash)→ replica entries (+ per-replica stats).ReportCacheStateingests authoritative updates — idempotent on(replica, hash_scheme, prefix_hash).PublishEventappliesPREFIX_EVICTED/REPLICA_UPDATED/ALL_CLEAREDdeltas (PREFIX_ADDEDrefreshes known entries;ReportCacheStateis authoritative for additions).LookupRoutereturns replicas holding the exact prefix within a matchinghash_scheme, ranked by matched-tokens × freshness; no match → empty +NO_HINT(fail open, side-effect-free apart from metrics).GetCacheStatereturns the(tenant, model)aggregate (replica stats + prefix count).Serve's ctx) + a max-entries cap bound memory.prefix_hashstays engine-opaque./readyznow consultsindex.Ready();/metricsaddsinferencecache_index_entries{model},inferencecache_lookup_route_calls_total{model,reason_code,hint_used},inferencecache_lookup_route_latency_seconds{model}. A drained model'sindex_entriesseries is zeroed rather than left stale.Design notes
pkg/indexis decoupled from proto (pure domain types); the service translates proto ↔ index, so the index stays unit-testable in isolation./readyzgating (waiting for initial KV-event sync) arrives with the C1 hook; today the index is ready as soon asServestarts it.Request Flow
A new request arrives whose stable prefix hashes (in vLLM's scheme) to X:
Test plan
make pre-prgreen (naming + buf lint + fmt + vet + test + build + drift)go test ./pkg/... -race— index unit tests (ranking/top-K, idempotency, tenant + hash-scheme isolation, TTL eviction, max-entries cap, drained-gauge zeroing, ready lifecycle) + over-the-wireReportCacheState→LookupRoutePREFIX_MATCHgrpcurl): ingest →PREFIX_MATCH(replica ranked, matched_tokens=128), cross-scheme →NO_HINT,GetCacheStateaggregate,/metricsshowsindex_entries+lookup_route_calls_total