cachebox-project/inference-cache

inference-cache

A Kubernetes-native cache plane for LLM inference.

Repository layout

One operator, split across two binaries plus the CRDs.

CRDs — the API

  • api/v1alpha1/ — Go types (CacheBackend; CachePolicy, CacheTenant, PromptTemplate, PDTopology, CacheIndex as they land) + generated deepcopy
  • config/ — generated CRD, RBAC, and sample manifests

inferencecache-controller (cmd/controller) — watches CRDs and provisions cache backends

  • cmd/controller/ — controller-runtime manager entrypoint
  • internal/controller/ — reconcilers
  • pkg/adapters/runtime/ — runtime adapter library: inject backend config into engine pods
  • pkg/adapters/backend/ — cache-backend provisioning helpers

inferencecache-server (cmd/server) — gRPC policy server + cache-state index + metrics

  • cmd/server/ — gRPC + HTTP server entrypoint
  • pkg/server/ — gRPC service (LookupRoute, RenderTemplate, …), health, metrics
  • proto/ (+ generated stubs) — the gRPC contract
  • pkg/index/ — cache-state aggregator (CacheIndex)
  • pkg/render/ — mutable-slot prompt rendering engine (the wedge); importable library
  • pkg/adapters/engine/ — engine KV-event hook (feeds the index)

Shared: pkg/version/, hack/, dockerfiles/, .githooks/

Quick Start

make proto-gen
make build
make test

Run the server locally:

bin/server --grpc-bind-address=:9090 --http-bind-address=:8080
curl -i http://localhost:8080/healthz   # liveness
curl -i http://localhost:8080/readyz    # readiness
curl -s http://localhost:8080/metrics   # Prometheus metrics (inferencecache_*)
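The same checks can be scripted, for example in a smoke test that waits for the server before reading metrics. The following is a minimal Go sketch, not code from this repository: `retry`, `expectOK`, `filterSamples`, and `errNotReady` are hypothetical helpers, and it assumes that /readyz answers 200 OK once the server is ready (the usual convention, which this README does not state) and that /metrics serves the Prometheus text format.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"io"
	"net/http"
	"os"
	"strings"
	"time"
)

// errNotReady is a hypothetical sentinel for "endpoint did not answer 200 OK".
var errNotReady = errors.New("not ready")

// retry calls check up to attempts times, sleeping interval between tries,
// and returns nil as soon as check succeeds.
func retry(attempts int, interval time.Duration, check func() error) error {
	err := errNotReady
	for i := 0; i < attempts; i++ {
		if err = check(); err == nil {
			return nil
		}
		if i < attempts-1 {
			time.Sleep(interval)
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

// expectOK returns a check that passes when GET url answers 200 OK.
func expectOK(client *http.Client, url string) func() error {
	return func() error {
		resp, err := client.Get(url)
		if err != nil {
			return err
		}
		defer resp.Body.Close()
		if resp.StatusCode != http.StatusOK {
			return fmt.Errorf("%w: %s returned %s", errNotReady, url, resp.Status)
		}
		return nil
	}
}

// filterSamples keeps the lines of a Prometheus text exposition that start
// with prefix. Comment lines (# HELP, # TYPE) start with '#', so they never match.
func filterSamples(exposition, prefix string) []string {
	var out []string
	sc := bufio.NewScanner(strings.NewReader(exposition))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if strings.HasPrefix(line, prefix) {
			out = append(out, line)
		}
	}
	return out
}

func fail(err error) {
	fmt.Fprintln(os.Stderr, err)
	os.Exit(1)
}

func main() {
	base := "http://localhost:8080" // matches --http-bind-address=:8080 above
	client := &http.Client{Timeout: 2 * time.Second}

	// Wait up to roughly ten seconds for readiness.
	if err := retry(20, 500*time.Millisecond, expectOK(client, base+"/readyz")); err != nil {
		fail(err)
	}

	resp, err := client.Get(base + "/metrics")
	if err != nil {
		fail(err)
	}
	body, err := io.ReadAll(resp.Body)
	resp.Body.Close()
	if err != nil {
		fail(err)
	}
	for _, line := range filterSamples(string(body), "inferencecache_") {
		fmt.Println(line)
	}
}
```

The bounded retry keeps the script from hanging forever if the server never becomes ready, and filtering on the inferencecache_ prefix hides the Go runtime and process metrics that a Prometheus endpoint typically exposes alongside application metrics.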

Local Development Cluster

Create a kind cluster for controller development:

make dev-cluster

By default this creates or reuses a cluster named inference-cache. You can override the name and node image:

make dev-cluster KIND_CLUSTER=cache-dev KIND_NODE_IMAGE=kindest/node:v1.31.0

Common Targets

  • make build: build controller and server binaries
  • make test: run unit tests
  • make lint: run gofmt and go vet
  • make ci-lint: run golangci-lint
  • make proto-gen: regenerate protobuf Go code
  • make generate: regenerate Kubernetes deepcopy code
  • make manifests: regenerate CRD and RBAC manifests
  • make image-build: build controller and server images

Documentation

Design docs live under docs/.

Contributor guide: CONTRIBUTING.md (layout, naming rule, push/PR gates).

License

Apache-2.0
