A custom Kubernetes Autoscaler.
Instead of using HPA based on CPU or RAM, the K8s controller in this sandbox project monitors real-time application load (Redis Streams depth) to dynamically scale consumer replicas.
This implementation bridges K8s orchestration with application-level concurrency by leveraging:
- Core Engine:
workerpoolhandles in-memory task execution and backpressure within the pods - Ingress Bridge:
redisstreamadapter to pull tasks from Redis Streams into the worker pool
- Infrastructure: Redis-backed task pipeline (Producer + Worker)
- Observability: Live stream metrics pulled via
XINFO GROUPS - Control Loop: Go binary using
k8s.io/client-goto check stream load andPATCHdeployment replicas
This project is configured to deploy directly to a remote K3s cluster (in my case Raspberry Pi 4) isolated within the worker-scaler namespace.
Ensure your KUBECONFIG is pointing to your remote cluster, then execute the full pipeline. This will build the cross-platform images, push them to your registry, create the namespace, and apply all manifests:
make deploy-allTo test how the autoscaler reacts, we need to simulate traffic. The producer component floods the stream with synthetic tasks, artificially inflating the Lag and Pending metrics so we can watch the controller trigger scale-up events in real time:
kubectl apply -n worker-scaler -f manifests/producer-job.yaml# watch the controller logs to see scaling
kubectl logs -n worker-scaler -l app=scaler-controller -f
# worker activity
kubectl logs -n worker-scaler -l app=redis-worker --tail=-1 -f
# inspect stream data
kubectl exec -n worker-scaler -it deployment/redis -- redis-cli XINFO GROUPS orders_streamThese rules govern how the autoscaler calculates demand and protects cluster stability.
The controller evaluates target replicas purely on backlog demand, bounded by safe limits.
-
Backlog: total active work (
$Lag + Pending$ ) -
TasksPerPod: target capacity per pod (Goroutines
$\times$ target efficiency) - Clamp: prevents scaling outside predefined boundaries
Rule: calculate backlog using
XINFO GROUPS.
- Lag: messages sitting in the stream that have never been read
- Pending: msgs read by a worker but not yet acknowledged (
XACK)
Rule: Maximize vertical capacity before scaling horizontally.
- Goroutines (Internal): cheap and fast. Controlled inside the application via
workerpool - Pods (External): expensive and slow to provision. Controlled by the K8s API
- Keep the internal pool optimized for high throughput. Only scale pods horizontally when underlying node resources or network limits saturate.
Keeping at least one pod active eliminates cold-start latency and keeps the Redis connection warm.
To guarantee "at-least-once" processing, a pod must finish its accepted batch before exiting.
- catch
SIGTERM - stop fetching new tasks from the stream
- complete processing for all active in-flight Goroutines
- issue
XACKto Redis for completed tasks, then exit
| Action / Metric | Strategy | Reason |
|---|---|---|
| Metrics Source | XINFO GROUPS |
captures both unread traffic and active in-flight load |
| Sync Period | eval loop every 10s
|
balances responsiveness with API server overhead |
| Scale Down Protection | MinReplicas: 1 |
eliminates cold-start delays and system jitter |
| Cluster Protection | strict MaxReplicas cap |
prevents runaway resource consumption during spikes |
| Pod Exit Lifecycle |
SIGTERM XACK |
prevents orphaned tasks from getting trapped in the PEL |