inference-cache

A Kubernetes-native cache plane for LLM inference.

Repository layout

One operator, split across two binaries plus the CRDs.

CRDs — the API

api/v1alpha1/ — Go types (CacheBackend today; CachePolicy, CacheTenant, PromptTemplate, PDTopology, and CacheIndex as they land) plus generated deepcopy code
config/ — generated CRD, RBAC, and sample manifests

inferencecache-controller (cmd/controller) — watches CRDs and provisions cache backends

cmd/controller/ — controller-runtime manager entrypoint
internal/controller/ — reconcilers
pkg/adapters/runtime/ — runtime adapter library: injects backend config into engine pods
pkg/adapters/backend/ — cache-backend provisioning helpers

inferencecache-server (cmd/server) — gRPC policy server + cache-state index + metrics

cmd/server/ — gRPC + HTTP server entrypoint
pkg/server/ — gRPC service (LookupRoute, RenderTemplate, …), health, metrics
proto/ (+ generated stubs) — the gRPC contract
pkg/index/ — cache-state aggregator (CacheIndex)
pkg/render/ — mutable-slot prompt rendering engine (the wedge); importable library
pkg/adapters/engine/ — engine KV-event hook (feeds the index)

Shared — pkg/version/, hack/, dockerfiles/, .githooks/

Quick Start

make proto-gen
make build
make test

Run the server locally:

bin/server --grpc-bind-address=:9090 --http-bind-address=:8080
curl -i http://localhost:8080/healthz   # liveness
curl -i http://localhost:8080/readyz    # readiness
curl -s http://localhost:8080/metrics   # Prometheus metrics (inferencecache_*)

Local Development Cluster

Create a kind cluster for controller development:

make dev-cluster

By default this creates or reuses a cluster named inference-cache. You can override the name and node image:

make dev-cluster KIND_CLUSTER=cache-dev KIND_NODE_IMAGE=kindest/node:v1.31.0

Common Targets

make build: build controller and server binaries
make test: run unit tests
make lint: run gofmt and go vet
make ci-lint: run golangci-lint
make proto-gen: regenerate protobuf Go code
make generate: regenerate Kubernetes deepcopy code
make manifests: regenerate CRD and RBAC manifests
make image-build: build controller and server images

Documentation

Design docs live under docs/:

docs/design/grpc-contract.md — the InferenceCache gRPC service contract (B4)

Contributor guide: CONTRIBUTING.md (layout, naming rule, push/PR gates).

License

Apache-2.0
