"Stop burning GPU dollars. Start slicing."
In the era of ubiquitous AI, GPU scarcity is no longer the only bottleneck; GPU waste is. Most development, CI/CD, and inference workloads request a full NVIDIA GPU but use less than 15% of its capacity.
- Cloud Bills: You pay for 100% of a GPU while your workloads use a fraction.
- Scheduling Bottlenecks: Pending Pods waiting for a "Full GPU" while existing GPUs sit idle.
- Developer Friction: Teams manually editing YAMLs to share resources.
CastSlice is a lightweight, non-invasive Kubernetes Mutating Webhook that automatically converts "Whole GPU" requests into "Fractional/Shared GPU" slices based on smart policy.
It sits in your Kubernetes control plane, intercepts Pod creation, and performs on-the-fly resource transformation without changing a single line of your application code.
| Feature | The "Old" Way | The CastSlice Way |
|---|---|---|
| Cost | Full GPU per Pod | Shared GPU across multiple Pods |
| Concurrency | 1 Pod per GPU | Multiple Pods per GPU |
| Developer UX | Manual YAML changes | Zero-touch. Just add an annotation. |
| Vendor Lock-in | Locked to specific CSP tools | Cloud Agnostic. Works on EKS, GKE, AKS, or On-prem. |
CastSlice transparently rewrites `nvidia.com/gpu` resource requests into `nvidia.com/gpu-shared` for Pods that opt in via an annotation.
```
Pod CREATE request
        │
        ▼
Kubernetes API server
        │ (forwards to webhook)
        ▼
CastSlice webhook
        │
        ├── castops.io/optimize: "true" annotation present?
        │        │ YES                      │ NO
        │        ▼                          ▼
        │   resolve slice ratio        allow unchanged
        │   (slice-ratio > workload-type > default: 1)
        │        │
        │   remove nvidia.com/gpu
        │   add nvidia.com/gpu-shared: <ratio>
        │        │
        ▼        ▼
JSON Patch returned → Pod scheduled with shared GPU
```
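For an opted-in Pod, the admission response carries a JSON Patch along these lines (a simplified illustration, not the webhook's exact output; the paths depend on the container index, and `~1` is the JSON Pointer escape for `/` inside the resource name):

```json
[
  { "op": "remove", "path": "/spec/containers/0/resources/limits/nvidia.com~1gpu" },
  { "op": "add", "path": "/spec/containers/0/resources/limits/nvidia.com~1gpu-shared", "value": "2" }
]
```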
Annotations:
| Annotation | Value | Effect |
|---|---|---|
| `castops.io/optimize` | `"true"` | Enable GPU slice optimization (required) |
| `castops.io/workload-type` | `training` / `inference` / `batch` / `dev` | Select a preset slice ratio |
| `castops.io/slice-ratio` | `"N"` (positive integer) | Override the slice count directly |
Preset ratios by workload type:
| Workload Type | GPU Slices | Use Case |
|---|---|---|
| `training` | 4 | Model training jobs (higher GPU share) |
| `inference` | 2 | Serving / Triton inference servers |
| `batch` | 2 | Batch preprocessing and feature extraction |
| `dev` | 1 | Development and debugging (default) |
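The precedence rule (explicit `slice-ratio` wins over `workload-type`, which falls back to a default of 1) can be sketched in Go; this is a simplified standalone illustration of the documented behavior, not the webhook's actual source:

```go
package main

import (
	"fmt"
	"strconv"
)

// Preset slice counts per workload type, mirroring the table above.
var presets = map[string]int{
	"training":  4,
	"inference": 2,
	"batch":     2,
	"dev":       1,
}

// resolveSliceRatio applies the documented precedence:
// explicit slice-ratio > workload-type preset > default of 1.
func resolveSliceRatio(annotations map[string]string) int {
	if raw, ok := annotations["castops.io/slice-ratio"]; ok {
		if n, err := strconv.Atoi(raw); err == nil && n > 0 {
			return n
		}
	}
	if wt, ok := annotations["castops.io/workload-type"]; ok {
		if n, found := presets[wt]; found {
			return n
		}
	}
	return 1 // no annotation: single dev-style slice
}

func main() {
	fmt.Println(resolveSliceRatio(map[string]string{"castops.io/workload-type": "inference"})) // 2
	fmt.Println(resolveSliceRatio(map[string]string{"castops.io/slice-ratio": "8"}))           // 8
	fmt.Println(resolveSliceRatio(nil))                                                        // 1
}
```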
- Go 1.24+
- A Kubernetes cluster (KinD / Minikube for local testing)
- cert-manager for TLS certificate injection
```bash
# Install cert-manager (if not already present)
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/latest/download/cert-manager.yaml

# Wait for cert-manager to be ready
kubectl rollout status deployment/cert-manager -n cert-manager

# Deploy CastSlice
kubectl apply -f https://github.com/castops/cast-slice/releases/latest/download/install.yaml

# Create the TLS certificate for the webhook (issued by cert-manager)
kubectl apply -f config/cert/certificate.yaml

# Wait for the webhook pod to be ready
kubectl rollout status deployment/cast-slice -n cast-slice
```

Add the `castops.io/optimize` annotation and optionally specify the workload type:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama-inference
spec:
  selector:
    matchLabels:
      app: ollama-inference
  template:
    metadata:
      labels:
        app: ollama-inference
      annotations:
        castops.io/optimize: "true"
        castops.io/workload-type: "inference" # → gpu-shared: 2
    spec:
      containers:
        - name: ollama
          image: ollama/ollama
          resources:
            limits:
              nvidia.com/gpu: 1 # CastSlice rewrites this based on workload type
```

For fine-grained control, use an explicit ratio:
```yaml
annotations:
  castops.io/optimize: "true"
  castops.io/slice-ratio: "8" # explicit override → gpu-shared: 8
```

```bash
# Check the mutated pod
kubectl get pod -o yaml | grep gpu-shared

# training workload:      nvidia.com/gpu-shared: "4"
# inference workload:     nvidia.com/gpu-shared: "2"
# dev workload (default): nvidia.com/gpu-shared: "1"
```

CastSlice exposes Prometheus metrics on `:8080/metrics` via the standard controller-runtime metrics server.
| Metric | Type | Description |
|---|---|---|
| `castslice_requests_total` | Counter | Total admission requests processed |
| `castslice_mutations_total` | Counter | Pods mutated with GPU slice rewrites |
| `castslice_noop_total` | Counter | Pods allowed without mutation |
| `castslice_errors_total` | Counter | Requests rejected with an error |
```bash
# Port-forward the metrics service
kubectl port-forward svc/cast-slice-metrics 8080:8080 -n cast-slice

# Scrape metrics
curl http://localhost:8080/metrics | grep castslice
```

Example output:
```
# HELP castslice_errors_total Total number of admission requests rejected with an error.
# TYPE castslice_errors_total counter
castslice_errors_total 0
# HELP castslice_mutations_total Total number of Pods mutated with GPU slice rewrites.
# TYPE castslice_mutations_total counter
castslice_mutations_total 42
# HELP castslice_noop_total Total number of Pods allowed without mutation (no annotation or no GPU limits).
# TYPE castslice_noop_total counter
castslice_noop_total 158
# HELP castslice_requests_total Total number of admission requests processed by the CastSlice webhook.
# TYPE castslice_requests_total counter
castslice_requests_total 200
```
A ready-to-use Grafana dashboard is included at config/monitoring/grafana-dashboard.yaml. It provides 5 panels:
- Webhook Request Rate — overall admission throughput
- GPU Slice Mutations Rate — how many Pods per second get GPU sharing enabled
- No-op Rate — Pods passing through unchanged
- Error Rate — invalid annotation rejections (alert if non-zero)
- Mutation Efficiency — fraction of requests resulting in a GPU slice (higher = more GPU sharing)
Import via kubectl (auto-loads if kube-prometheus-stack sidecar dashboards are enabled):
```bash
kubectl apply -f config/monitoring/grafana-dashboard.yaml
```

Manual import: Grafana → Dashboards → Import → paste the JSON from the ConfigMap's `castslice-finops-dashboard.json` key.
The `cast-slice-metrics` Service is deployed with standard Prometheus annotations (`prometheus.io/scrape: "true"`), so annotation-based Prometheus auto-discovery picks it up automatically. No additional scrape config is required for most setups.
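For clusters where annotation-based discovery is not already wired up, a minimal scrape job along these lines will pick the Service up (a sketch; the job name and any port/path relabeling are assumptions to adapt to your setup):

```yaml
scrape_configs:
  - job_name: castslice
    kubernetes_sd_configs:
      - role: endpoints
    relabel_configs:
      # Keep only endpoints whose Service carries prometheus.io/scrape: "true"
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
```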
```
cast-slice/
├── main.go                           # Manager + Webhook registration
├── TODOS.md                          # Planned improvements and deferred work
├── internal/
│   └── webhook/
│       ├── pod_webhook.go            # Mutating webhook handler
│       ├── metrics.go                # Prometheus counter definitions
│       └── pod_webhook_test.go       # Unit tests
├── config/
│   ├── deploy/deployment.yaml        # Namespace, SA, Deployment, Services (webhook + metrics)
│   ├── webhook/mutating_webhook.yaml # MutatingWebhookConfiguration
│   └── monitoring/
│       └── grafana-dashboard.yaml    # FinOps Grafana dashboard ConfigMap
└── docs/
    ├── local-testing.md              # How to test without a real GPU
    ├── node-mock.yaml                # Mock node labels
    └── test-pod.yaml                 # Test Pod that triggers the webhook
```
```bash
# Build the binary
go build -o cast-slice .

# Run unit tests
go test ./...
```

```bash
# Apply workload manifests
kubectl apply -f config/deploy/deployment.yaml
kubectl apply -f config/webhook/mutating_webhook.yaml
```

See `docs/local-testing.md` for a step-by-step guide on mocking GPU capacity and validating webhook behavior.
- v0.1.0: Basic Mutating Webhook (Static Slicing).
- v0.2.0: Smart Slicing (Dynamic ratios based on workload type).
- v0.3.0: FinOps Dashboard (Live GPU utilization metrics).
- v0.4.0: Policy Engine (Namespace-level and label-based rules).
- v0.5.0: Multi-GPU Support (Cross-node GPU sharing).
We're looking for FinOps-minded engineers to help optimize GPU infrastructure for the AI era.
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
- Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the Branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
Distributed under the MIT License. See LICENSE for more information.
Built by CastOps - Engineering the Future of AI Infrastructure.