
inferenced-operator

Kubernetes operator that orchestrates fleets of inferenced daemons across Apple Silicon hosts.

inferenced-operator watches two custom resources:

  • InferenceHost — one per inferenced daemon in your fleet. Declares the daemon's HTTP endpoint and labels.
  • InferenceModel — declares a model that should be served by the cluster. Has a host selector + replica count.
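The real schemas live in the CRD Reference and the examples/ directory; the sketch below is illustrative only — the API group/version and field names here are assumptions, not the authoritative spec:

```yaml
# Illustrative sketch — see the CRD Reference for the real schema.
apiVersion: inferenced.dormlab.dev/v1alpha1   # hypothetical group/version
kind: InferenceHost
metadata:
  name: amelia
spec:
  endpoint: http://amelia.local:11434   # the daemon's HTTP endpoint
  labels:
    chip: m2-ultra
---
apiVersion: inferenced.dormlab.dev/v1alpha1   # hypothetical group/version
kind: InferenceModel
metadata:
  name: qwen-3b
spec:
  model: mlx-community/Qwen2.5-3B-Instruct-4bit
  replicas: 2
  hostSelector:                         # which InferenceHosts may serve it
    chip: m2-ultra
```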

For each InferenceModel, the operator picks compatible hosts, calls each chosen daemon's POST /admin/models to load the model, and creates a Kubernetes Service + EndpointSlice so cluster pods can reach the fleet under a single in-cluster DNS name like qwen-3b.inferenced.svc.cluster.local:11434.
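The published objects are ordinary Kubernetes resources. Since the daemons run on machines outside the cluster, the Service presumably carries no pod selector and the operator manages the EndpointSlice itself; a sketch of what the generated Service might look like (labels and port naming are assumptions):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: qwen-3b            # named after the InferenceModel
  namespace: inferenced
spec:
  # No selector: the operator writes a matching EndpointSlice with the
  # IPs of the hosts currently serving this model.
  ports:
    - port: 11434
      protocol: TCP
```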

                          ┌──────────────────────────────────────────┐
   InferenceModel ─────►  │        inferenced-operator               │
   InferenceHost  ─────►  │   (kube-rs Controller, Rust)             │
                          └────────────────┬─────────────────────────┘
                                           │ HTTP /admin/models
                                           ▼
                ┌──────────────────────────────────────────────────┐
                │  inferenced  ║  inferenced  ║  inferenced        │
                │  (mac-1)     ║  (mac-2)     ║  (mac-3)           │
                └─────┬────────╨──────┬───────╨──────┬─────────────┘
                      │ Apple GPU     │              │
                      ▼               ▼              ▼
                  Metal/MLX       Metal/MLX      Metal/MLX

The operator publishes a Service named after each InferenceModel. Apply a host and a model, then query the Service from any pod:

kubectl apply -f examples/host-amelia.yaml
kubectl apply -f examples/model-qwen-3b.yaml

kubectl get inferencemodel
# NAME      MODEL                                       REPLICAS  READY  AGE
# qwen-3b   mlx-community/Qwen2.5-3B-Instruct-4bit      2         2      30s

kubectl run -it --rm chat --image=alpine -- sh
> apk add curl
> curl -s qwen-3b.inferenced.svc.cluster.local:11434/v1/chat/completions \
    -H 'content-type: application/json' \
    -d '{"model":"mlx-community/Qwen2.5-3B-Instruct-4bit","messages":[{"role":"user","content":"hi"}]}'

Documentation

  • Architecture — the reconcile loops, what the operator owns, what the daemon owns.
  • Installation — Helm chart install, RBAC, and CRD installation.
  • CRD Reference — full schema for InferenceHost and InferenceModel, with examples.
  • Examples — working YAML for hosts and models.
  • Development — building, running locally against a kubeconfig, contributing.
  • Troubleshooting — common reconcile failures and how to diagnose them.

Quick install (Helm)

# 1. Install the operator (CRDs included).
helm install inferenced-operator \
  oci://ghcr.io/dormlab/charts/inferenced-operator \
  --namespace inferenced \
  --create-namespace

# 2. Tell it about your hosts.
kubectl apply -f examples/host-amelia.yaml

# 3. Declare a model.
kubectl apply -f examples/model-qwen-3b.yaml

# 4. Wait for ready.
kubectl get inferencemodel -w

Quick install (raw manifests, without Helm)

# CRDs.
kubectl apply -f crds/

# Operator (use your own image or build from source).
kubectl create namespace inferenced
kubectl apply -n inferenced -f deploy/  # see docs/installation.md

Status

v0.1: InferenceHost and InferenceModel CRDs, two reconcile loops, and owner references on Service and EndpointSlice for proper GC. Tests cover CRD parsing and reconcile error paths.

Roadmap in docs/architecture.md#roadmap.

License

MIT. See LICENSE.
