# Autoscaling Seldon Deployments


## Prerequisites
 
- The cluster should have `metrics-server` running in the `kube-system` namespace.
- For Kind, install `../../testing/scripts/metrics.yaml`; see https://github.com/kubernetes-sigs/kind/issues/398.
- For Minikube, run:
    
    ```
    minikube addons enable metrics-server
    ```
    
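The HPA cannot compute CPU utilization unless the resource metrics API is being served, so it can help to confirm that before continuing. Below is a minimal sketch. It assumes `kubectl` is on the `PATH` and that metrics-server registers the `v1beta1.metrics.k8s.io` APIService, which is its usual name. Both helper names are hypothetical.

```python
import json
import subprocess


def apiservice_available(apiservice_json: str) -> bool:
    """Return True if an APIService object reports condition Available=True."""
    obj = json.loads(apiservice_json)
    for cond in obj.get("status", {}).get("conditions", []):
        if cond.get("type") == "Available":
            return cond.get("status") == "True"
    # No Available condition reported yet
    return False


def metrics_api_ready(name: str = "v1beta1.metrics.k8s.io") -> bool:
    """Ask the cluster (via kubectl) whether the metrics APIService is available."""
    result = subprocess.run(
        ["kubectl", "get", "apiservice", name, "-o", "json"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # APIService not registered, e.g. metrics-server is not installed
        return False
    return apiservice_available(result.stdout)
```

Calling `metrics_api_ready()` from the notebook should return `True` once metrics-server is up. `kubectl top pods` is a quick manual alternative.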

## Setup Seldon Core

Use the setup notebook to [Setup Cluster](https://docs.seldon.io/projects/seldon-core/en/latest/examples/seldon_core_setup.html#Setup-Cluster) with [Ambassador Ingress](https://docs.seldon.io/projects/seldon-core/en/latest/examples/seldon_core_setup.html#Ambassador) and [Install Seldon Core](https://docs.seldon.io/projects/seldon-core/en/latest/examples/seldon_core_setup.html#Install-Seldon-Core). Instructions [also online](https://docs.seldon.io/projects/seldon-core/en/latest/examples/seldon_core_setup.html).

In [1]:
!kubectl create namespace seldon

Error from server (AlreadyExists): namespaces "seldon" already exists


In [2]:
!kubectl config set-context $(kubectl config current-context) --namespace=seldon

Context "kind-kind" modified.


## Create model with v2beta1 autoscaler

To create a model with a HorizontalPodAutoscaler using the v2beta1 metric format, there are two steps:


  1. If you scale on a standard resource metric such as CPU or memory, make sure the container declares a resource request for it, e.g.:
  
```yaml
resources:
  requests:
    cpu: '0.5'
```
     
  1. Add an `hpaSpec` that uses the v2beta1 metric format (the `metrics` field) to the same component spec, e.g.:
  
```yaml
- hpaSpec:
    maxReplicas: 3
    minReplicas: 1
    metrics:
    - resource:
        name: cpu
        targetAverageUtilization: 10
      type: Resource
```
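
The HPA measures utilization as average usage divided by the resource request. It then proposes `ceil(currentReplicas * currentUtilization / targetUtilization)` replicas, clamped to `minReplicas`/`maxReplicas`, and skips changes that fall within a small tolerance of the target. The sketch below is simplified: it ignores stabilization windows, pod readiness and missing metrics. The function and its arguments are hypothetical.

```python
import math
from typing import Sequence


def desired_replicas(
    current_replicas: int,
    cpu_usage_per_pod: Sequence[float],  # observed CPU per pod, in cores (non-empty)
    cpu_request: float,                  # resources.requests.cpu per pod, in cores
    target_utilization: float,           # targetAverageUtilization, in percent
    min_replicas: int,
    max_replicas: int,
    tolerance: float = 0.1,              # default HPA controller tolerance
) -> int:
    """Simplified HPA calculation for a single CPU utilization metric."""
    # Utilization is average usage expressed as a percentage of the request
    avg_usage = sum(cpu_usage_per_pod) / len(cpu_usage_per_pod)
    current_utilization = 100.0 * avg_usage / cpu_request
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: leave the replica count unchanged
        proposed = current_replicas
    else:
        proposed = math.ceil(current_replicas * ratio)
    # Respect the minReplicas/maxReplicas bounds from the hpaSpec
    return max(min_replicas, min(max_replicas, proposed))
```

Take the values used here: a request of `0.5` CPU and a 10% target. A single pod averaging 0.25 CPU is at 50% utilization, so the sketch proposes `ceil(1 * 50 / 10) = 5` replicas, which is clamped to `maxReplicas: 3`. Scale-out begins once average usage exceeds about 0.055 CPU per pod (10% of 0.5 CPU, plus the 10% tolerance).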

The full SeldonDeployment spec is shown below.

In [3]:
!pygmentize model_with_hpa_v2beta1.yaml

```yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: seldon-model
spec:
  name: test-deployment
  predictors:
  - componentSpecs:
    - hpaSpec:
        maxReplicas: 3
        metrics:
        - resource:
            name: cpu
            targetA
```

In [4]:
!kubectl create -f model_with_hpa_v2beta1.yaml

seldondeployment.machinelearning.seldon.io/seldon-model created


In [5]:
!kubectl wait sdep/seldon-model \
  --for=condition=ready \
  --timeout=120s \
  -n seldon

seldondeployment.machinelearning.seldon.io/seldon-model condition met


### Create Load

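We label a node for the load tester. The command labels the first node returned by `kubectl get nodes`; on Kind this is the control-plane node. The output `not labeled` means the node already carries the `role=locust` label from an earlier run.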

In [6]:
!kubectl label nodes $(kubectl get nodes -o jsonpath='{.items[0].metadata.name}') role=locust

node/kind-control-plane not labeled


In [8]:
!helm install loadtester ../../../helm-charts/seldon-core-loadtesting  \
    --set locust.host=http://seldon-model-example:8000 \
    --set oauth.enabled=false \
    --set locust.hatchRate=1 \
    --set locust.clients=1 \
    --set loadtest.sendFeedback=0 \
    --set locust.minWait=0 \
    --set locust.maxWait=0 \
    --set replicaCount=1

NAME: loadtester
LAST DEPLOYED: Thu Nov 20 09:58:26 2025
NAMESPACE: seldon
STATUS: deployed
REVISION: 1
TEST SUITE: None


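After a few minutes you should see the deployment `seldon-model-example-0-classifier` scaled to 3 replicas. The cell below polls every 5 seconds for up to 5 minutes and stops as soon as more than one replica is reported.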

In [9]:
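import json
import time


def getNumberPods():
    # IPython shell magic: capture kubectl's JSON output as a list of lines
    dp = !kubectl get deployment seldon-model-example-0-classifier -o json
    dp = json.loads("".join(dp))
    # Non-terminated pods of the Deployment (includes pods that are not ready yet)
    return dp["status"]["replicas"]


# Poll every 5 s for up to 5 minutes until the HPA adds replicas
scaled = False
for _ in range(60):
    pods = getNumberPods()
    print(pods)
    if pods > 1:
        scaled = True
        break
    time.sleep(5)
assert scaled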

1
1
1
1
1
1
1
1
3
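
The loop above relies on IPython's `!` magic, so it only runs inside a notebook. The sketch below does the same check from plain Python. It uses `subprocess` and `kubectl`'s JSONPath output, and takes a generic `wait_for` helper with an injectable `sleep` so the polling logic can be tested without a cluster. Both helper names are hypothetical.

```python
import subprocess
import time
from typing import Callable


def get_replicas(deployment: str, namespace: str = "seldon", field: str = "replicas") -> int:
    """Read a Deployment status field (e.g. replicas, readyReplicas) via kubectl."""
    out = subprocess.run(
        ["kubectl", "get", "deployment", deployment, "-n", namespace,
         "-o", "jsonpath={.status." + field + "}"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    # Zero-valued status fields are omitted from the object, so empty means 0
    return int(out) if out else 0


def wait_for(predicate: Callable[[], bool], timeout: float = 300.0,
             interval: float = 5.0,
             sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call predicate every `interval` seconds until it is True or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

For example, `wait_for(lambda: get_replicas("seldon-model-example-0-classifier") > 1)` mirrors the cell above. Passing `field="readyReplicas"` gives a stricter check. As the pod listing below shows, newly created replicas are counted in `status.replicas` while they are still `0/2` ready.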


In [10]:
!kubectl get pods,deployments,hpa

NAME                                                                  READY   STATUS    RESTARTS   AGE
pod/graph-metadata-grpc-example-0-node-one-node-two-98d867d8c-nzrzj   3/3     Running   0          5m5s
pod/locust-master-1-hp6rw                                             1/1     Running   0          64s
pod/locust-slave-1-gbh28                                              1/1     Running   0          64s
pod/seldon-model-example-0-classifier-8666ff588b-462vd                2/2     Running   0          2m26s
pod/seldon-model-example-0-classifier-8666ff588b-cwsw7                0/2     Running   0          11s
pod/seldon-model-example-0-classifier-8666ff588b-zsbs9                0/2     Running   0          11s
pod/simplest-77ff97f74b-8r5hh                                         1/1     Running   0          19h
pod/simplest-agent-daemonset-vqmzq                                    1/1     Running   0          19h

NAME                                                              REA
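
The `hpa` part of the listing above is cut off. To read the autoscaler's own view, the sketch below summarizes the JSON produced by `kubectl get hpa -o json`. It assumes only the standard HPA fields `spec.minReplicas`, `spec.maxReplicas`, `status.currentReplicas` and `status.desiredReplicas`. The helper name is hypothetical.

```python
import json
from typing import Dict, Tuple


def hpa_replica_summary(hpa_list_json: str) -> Dict[str, Tuple[int, int, int, int]]:
    """Map HPA name -> (min, max, current, desired) from `kubectl get hpa -o json`."""
    summary = {}
    for item in json.loads(hpa_list_json).get("items", []):
        spec = item.get("spec", {})
        status = item.get("status", {})
        summary[item["metadata"]["name"]] = (
            spec.get("minReplicas", 1),  # the API defaults minReplicas to 1
            spec["maxReplicas"],
            status.get("currentReplicas", 0),
            status.get("desiredReplicas", 0),
        )
    return summary
```

To use it, feed it the output of `subprocess.run(["kubectl", "get", "hpa", "-n", "seldon", "-o", "json"], capture_output=True, text=True, check=True).stdout`. A `desired` value above `current` typically means the HPA has decided to scale out and the extra pods are still being created.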

In [11]:
!helm delete loadtester -n seldon

release "loadtester" uninstalled


In [12]:
!kubectl delete -f model_with_hpa_v2beta1.yaml

seldondeployment.machinelearning.seldon.io "seldon-model" deleted


## Create model with v2 autoscaler

To create a model with a HorizontalPodAutoscaler using the v2 metric format, there are two steps:


  1. If you scale on a standard resource metric such as CPU or memory, make sure the container declares a resource request for it, e.g.:
  
```yaml
resources:
  requests:
    cpu: '0.5'
```
     
  1. Add an `hpaSpec` that uses the v2 metric format (the `metricsv2` field) to the same component spec, e.g.:
  
```yaml
- hpaSpec:
    maxReplicas: 3
    minReplicas: 1
    metricsv2:
    - resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 10
      type: Resource
```
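
Compared with the v2beta1 example, only the shape of the metric changes. `targetAverageUtilization` moves into a nested `target` object with an explicit `type`, and the list sits under `metricsv2` instead of `metrics`. The sketch below translates a single Resource metric between the two forms. It is a hypothetical helper written for illustration and handles only `targetAverageUtilization` and `targetAverageValue`.

```python
from typing import Any, Dict


def resource_metric_v2beta1_to_v2(metric: Dict[str, Any]) -> Dict[str, Any]:
    """Convert one hpaSpec.metrics (v2beta1) Resource entry to hpaSpec.metricsv2 (v2) form."""
    if metric.get("type") != "Resource":
        raise ValueError("this sketch only handles type: Resource")
    old = metric["resource"]
    if "targetAverageUtilization" in old:
        # Percentage of the resource request, averaged over pods
        target = {"type": "Utilization",
                  "averageUtilization": old["targetAverageUtilization"]}
    elif "targetAverageValue" in old:
        # Absolute quantity (e.g. "100m"), averaged over pods
        target = {"type": "AverageValue", "averageValue": old["targetAverageValue"]}
    else:
        raise ValueError("no supported target found in resource metric")
    return {"type": "Resource", "resource": {"name": old["name"], "target": target}}
```

Applied to the v2beta1 CPU metric from the previous section, it yields the `metricsv2` entry shown above.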

The full SeldonDeployment spec is shown below.

In [13]:
!pygmentize model_with_hpa_v2.yaml

```yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: seldon-model
spec:
  name: test-deployment
  predictors:
  - componentSpecs:
    - hpaSpec:
        maxReplicas: 3
        metricsv2:
        - resource:
            name: cpu
            targe
```

In [14]:
!kubectl create -f model_with_hpa_v2.yaml

seldondeployment.machinelearning.seldon.io/seldon-model created


In [15]:
!kubectl wait sdep/seldon-model \
  --for=condition=ready \
  --timeout=120s \
  -n seldon

seldondeployment.machinelearning.seldon.io/seldon-model condition met


### Create Load

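We label a node for the load tester. The command labels the first node returned by `kubectl get nodes`; on Kind this is the control-plane node. The output `not labeled` means the node already carries the `role=locust` label from an earlier run.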

In [16]:
!kubectl label nodes $(kubectl get nodes -o jsonpath='{.items[0].metadata.name}') role=locust

node/kind-control-plane not labeled


In [17]:
!helm install loadtester ../../../helm-charts/seldon-core-loadtesting  \
    --set locust.host=http://seldon-model-example:8000 \
    --set oauth.enabled=false \
    --set locust.hatchRate=1 \
    --set locust.clients=1 \
    --set loadtest.sendFeedback=0 \
    --set locust.minWait=0 \
    --set locust.maxWait=0 \
    --set replicaCount=1

NAME: loadtester
LAST DEPLOYED: Thu Nov 20 10:00:41 2025
NAMESPACE: seldon
STATUS: deployed
REVISION: 1
TEST SUITE: None


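After a few minutes you should see the deployment `seldon-model-example-0-classifier` scaled to 3 replicas. The cell below polls every 5 seconds for up to 5 minutes and stops as soon as more than one replica is reported.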

In [18]:
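import json
import time


def getNumberPods():
    # IPython shell magic: capture kubectl's JSON output as a list of lines
    dp = !kubectl get deployment seldon-model-example-0-classifier -o json
    dp = json.loads("".join(dp))
    # Non-terminated pods of the Deployment (includes pods that are not ready yet)
    return dp["status"]["replicas"]


# Poll every 5 s for up to 5 minutes until the HPA adds replicas
scaled = False
for _ in range(60):
    pods = getNumberPods()
    print(pods)
    if pods > 1:
        scaled = True
        break
    time.sleep(5)
assert scaled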

1
1
1
1
1
1
1
1
3


In [19]:
!kubectl get pods,deployments,hpa

NAME                                                                  READY   STATUS    RESTARTS   AGE
pod/graph-metadata-grpc-example-0-node-one-node-two-98d867d8c-nzrzj   3/3     Running   0          7m13s
pod/locust-master-1-zf5d8                                             1/1     Running   0          58s
pod/locust-slave-1-bh59m                                              1/1     Running   0          58s
pod/seldon-model-example-0-classifier-77b6bcfcb7-l26fl                0/2     Running   0          17s
pod/seldon-model-example-0-classifier-77b6bcfcb7-p5c7w                0/2     Running   0          17s
pod/seldon-model-example-0-classifier-77b6bcfcb7-t59rx                2/2     Running   0          92s
pod/simplest-77ff97f74b-8r5hh                                         1/1     Running   0          19h
pod/simplest-agent-daemonset-vqmzq                                    1/1     Running   0          19h

NAME                                                              READ

In [20]:
!helm delete loadtester -n seldon

release "loadtester" uninstalled


In [21]:
!kubectl delete -f model_with_hpa_v2.yaml

seldondeployment.machinelearning.seldon.io "seldon-model" deleted
