A Kubernetes-native cache plane for LLM inference.
One operator, split across two binaries plus the CRDs.
CRDs — the API
api/v1alpha1/ — Go types (CacheBackend; CachePolicy, CacheTenant, PromptTemplate, PDTopology, CacheIndex as they land) + generated deepcopy
config/ — generated CRD, RBAC, and sample manifests
inferencecache-controller (cmd/controller) — watches CRDs and provisions cache backends
cmd/controller/ — controller-runtime manager entrypoint
internal/controller/ — reconcilers
pkg/adapters/runtime/ — runtime adapter library: inject backend config into engine pods
pkg/adapters/backend/ — cache-backend provisioning helpers
inferencecache-server (cmd/server) — gRPC policy server + cache-state index + metrics
cmd/server/ — gRPC + HTTP server entrypoint
pkg/server/ — gRPC service (LookupRoute, RenderTemplate, …), health, metrics
proto/ (+ generated stubs) — the gRPC contract
pkg/index/ — cache-state aggregator (CacheIndex)
pkg/render/ — mutable-slot prompt rendering engine (the wedge); importable library
pkg/adapters/engine/ — engine KV-event hook (feeds the index)
Shared — pkg/version/, hack/, dockerfiles/, .githooks/
make proto-gen
make build
make test
Run the server locally:
bin/server --grpc-bind-address=:9090 --http-bind-address=:8080
curl -i http://localhost:8080/healthz # liveness
curl -i http://localhost:8080/readyz # readiness
curl -s http://localhost:8080/metrics # Prometheus metrics (inferencecache_*)
Create a kind cluster for controller development:
make dev-cluster
By default this creates or reuses a cluster named inference-cache. You can override the name and node image:
make dev-cluster KIND_CLUSTER=cache-dev KIND_NODE_IMAGE=kindest/node:v1.31.0
make build: build controller and server binaries
make test: run unit tests
make lint: run gofmt and go vet
make ci-lint: run golangci-lint
make proto-gen: regenerate protobuf Go code
make generate: regenerate Kubernetes deepcopy code
make manifests: regenerate CRD and RBAC manifests
make image-build: build controller and server images
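The curl checks against a locally running server can also be scripted in Go, for example as a smoke test after `make build`. The following is a minimal sketch, not part of the repository: it assumes the server listens on `localhost:8080` as in the example invocation, that any 2xx response counts as a passing probe, and that `/metrics` serves the Prometheus text format. The helper names (`probeURL`, `healthy`, `filterMetrics`, `check`) are hypothetical.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
	"time"
)

// probeURL joins a base URL and an endpoint path with exactly one slash.
func probeURL(base, path string) string {
	return strings.TrimRight(base, "/") + "/" + strings.TrimLeft(path, "/")
}

// healthy reports whether a status code counts as a passing probe.
// Assumption: any 2xx response means the check passed.
func healthy(status int) bool {
	return status >= 200 && status < 300
}

// filterMetrics returns the sample lines of a Prometheus text exposition
// whose metric name starts with prefix, skipping comments and blank lines.
func filterMetrics(body, prefix string) []string {
	var out []string
	for _, line := range strings.Split(body, "\n") {
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		if strings.HasPrefix(line, prefix) {
			out = append(out, line)
		}
	}
	return out
}

// check performs one GET and reports whether the endpoint passed.
func check(client *http.Client, base, path string) (bool, error) {
	resp, err := client.Get(probeURL(base, path))
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	return healthy(resp.StatusCode), nil
}

func main() {
	base := "http://localhost:8080" // assumed: --http-bind-address=:8080
	client := &http.Client{Timeout: 2 * time.Second}

	// Liveness and readiness, mirroring the two curl probes.
	for _, path := range []string{"/healthz", "/readyz"} {
		ok, err := check(client, base, path)
		if err != nil {
			fmt.Printf("%s: error: %v\n", path, err)
			continue
		}
		fmt.Printf("%s: passing=%v\n", path, ok)
	}

	// Print only the inferencecache_* samples from /metrics.
	resp, err := client.Get(probeURL(base, "/metrics"))
	if err != nil {
		fmt.Printf("/metrics: error: %v\n", err)
		return
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Printf("/metrics: read error: %v\n", err)
		return
	}
	for _, line := range filterMetrics(string(body), "inferencecache_") {
		fmt.Println(line)
	}
}
```

If the server is not running, each probe prints a connection error instead of a status, so the sketch is safe to run before the server starts.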
Design docs live under docs/:
docs/design/grpc-contract.md — the InferenceCache gRPC service contract (B4)
Contributor guide: CONTRIBUTING.md (layout, naming rule, push/PR gates).
Apache-2.0