# Shadow Rollout with Seldon and Ambassador

This notebook shows how to create "shadow" deployments, which direct traffic not only to the main Seldon Deployment but also to a shadow deployment whose responses are discarded. This lets you test new models in a production setting with production traffic and analyse how they perform before putting them live.

Shadow deployments are useful when you want to test a new model or a higher-latency inference pipeline (e.g., one with explanation components) against production traffic without affecting the live deployment.
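
The mirroring idea can be illustrated without a cluster. The following is a minimal sketch, not how Ambassador is implemented: `MirroringRouter`, `main_model`, and `shadow_model` are hypothetical names, and the toy calls the shadow synchronously, whereas a real ingress would typically mirror requests without waiting for the shadow.

```python
from typing import Callable, Dict, List

Request = Dict[str, float]
Response = Dict[str, float]


class MirroringRouter:
    """Toy stand-in for an ingress that mirrors traffic to a shadow backend."""

    def __init__(
        self,
        main: Callable[[Request], Response],
        shadow: Callable[[Request], Response],
    ) -> None:
        self.main = main
        self.shadow = shadow
        self.shadow_log: List[Response] = []  # kept only for offline analysis

    def handle(self, request: Request) -> Response:
        # The shadow receives a copy of the same request.
        try:
            self.shadow_log.append(self.shadow(dict(request)))
        except Exception:
            # A failing shadow must never affect the caller.
            pass
        # Only the main backend's response is returned.
        return self.main(request)


def main_model(request: Request) -> Response:
    return {"proba": 0.1}


def shadow_model(request: Request) -> Response:
    return {"proba": 0.9}


router = MirroringRouter(main_model, shadow_model)
responses = [router.handle({"x": float(i)}) for i in range(3)]
print(responses)          # every entry comes from main_model
print(router.shadow_log)  # shadow outputs, never seen by the caller
```

Callers only ever see the main model's answers, while the shadow's answers accumulate on the side for later comparison.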


## Setup Seldon Core

Use the setup notebook to [Setup Cluster](https://docs.seldon.io/projects/seldon-core/en/latest/examples/seldon_core_setup.html#Setup-Cluster) with [Ambassador Ingress](https://docs.seldon.io/projects/seldon-core/en/latest/examples/seldon_core_setup.html#Ambassador) and [Install Seldon Core](https://docs.seldon.io/projects/seldon-core/en/latest/examples/seldon_core_setup.html#Install-Seldon-Core). The instructions are [also available online](https://docs.seldon.io/projects/seldon-core/en/latest/examples/seldon_core_setup.html).

In [1]:
!kubectl create namespace seldon

Error from server (AlreadyExists): namespaces "seldon" already exists


In [2]:
!kubectl config set-context $(kubectl config current-context) --namespace=seldon

Context "kind-kind" modified.


In [3]:
from IPython.core.magic import register_line_cell_magic


@register_line_cell_magic
def writetemplate(line, cell):
    with open(line, "w") as f:
        f.write(cell.format(**globals()))

In [4]:
VERSION = !cat ../../../version.txt
VERSION = VERSION[0]
VERSION

'1.19.0-dev'

## Launch main model

We will create a very simple Seldon Deployment with a dummy model image, `seldonio/mock_classifier`, tagged with the `VERSION` read above. This deployment is named `ambassador-example`.

In [5]:
%%writetemplate ambassador-example-model.yaml
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  labels:
    app: seldon
  name: ambassador-example
spec:
  name: production-model
  predictors:
  - componentSpecs:
    - spec:
        containers:
        - image: seldonio/mock_classifier:{VERSION}
          imagePullPolicy: IfNotPresent
          name: classifier
        terminationGracePeriodSeconds: 1
    graph:
      children: []
      endpoint:
        type: REST
      name: classifier
      type: MODEL
    name: default
    replicas: 1


In [6]:
!kubectl apply -f ambassador-example-model.yaml

seldondeployment.machinelearning.seldon.io/ambassador-example created


In [7]:
!kubectl wait sdep/ambassador-example \
  --for=condition=ready \
  --timeout=120s \
  -n seldon

seldondeployment.machinelearning.seldon.io/ambassador-example condition met


### Get predictions

In [8]:
from seldon_core.seldon_client import SeldonClient

sc = SeldonClient(deployment_name="ambassador-example", namespace="seldon")



#### REST Request

In [9]:

from tenacity import retry, stop_after_delay, wait_exponential

@retry(stop=stop_after_delay(300), wait=wait_exponential(multiplier=1, min=0.5, max=5))
def make_prediction():
    r = sc.predict(gateway="ambassador", transport="rest")
    return r

r = make_prediction()
assert r.success
print(r)

Success:True message:
Request:
meta {
}
data {
  tensor {
    shape: 1
    shape: 1
    values: 0.56299089551262638
  }
}

Response:
{'data': {'names': ['proba'], 'tensor': {'shape': [1, 1], 'values': [0.08677404570660194]}}, 'meta': {'requestPath': {'classifier': 'seldonio/mock_classifier:1.19.0-dev'}}}
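
To work with a prediction programmatically, the response payload can be unpacked as a plain dictionary. This is a small sketch that assumes the dictionary printed under `Response:` above is what you have in hand (for example via `r.response`); `tensor_rows` is a hypothetical helper, not part of the Seldon client.

```python
# Response payload copied from the output above.
response = {
    "data": {
        "names": ["proba"],
        "tensor": {"shape": [1, 1], "values": [0.08677404570660194]},
    },
    "meta": {"requestPath": {"classifier": "seldonio/mock_classifier:1.19.0-dev"}},
}


def tensor_rows(payload):
    """Reshape the flat `values` list into rows using the 2-D `shape`."""
    tensor = payload["data"]["tensor"]
    n_rows, n_cols = tensor["shape"]
    values = tensor["values"]
    return [values[i * n_cols:(i + 1) * n_cols] for i in range(n_rows)]


rows = tensor_rows(response)
served_by = response["meta"]["requestPath"]

print(dict(zip(response["data"]["names"], rows[0])))  # column name -> value
print(served_by)  # which container image answered the request
```

The `requestPath` entry is handy later on: it shows that the answer came from the main predictor's container rather than the shadow.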


## Launch Shadow

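We will now update the `ambassador-example` Seldon Deployment, adding a second predictor named `shadow` with `shadow: true`. This predictor receives a copy of the traffic sent to the `default` predictor, and its responses are discarded. In this example the shadow runs the same mock classifier image as the main model.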

In [10]:
%%writetemplate ambassador-example-model.yaml
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  labels:
    app: seldon
  name: ambassador-example
spec:
  name: shadow-model
  predictors:
  - componentSpecs:
    - spec:
        containers:
        - image: seldonio/mock_classifier:{VERSION}
          imagePullPolicy: IfNotPresent
          name: classifier
        terminationGracePeriodSeconds: 1
    graph:
      children: []
      endpoint:
        type: REST
      name: classifier
      type: MODEL
    name: default
    replicas: 1
  - componentSpecs:
    - spec:
        containers:
        - image: seldonio/mock_classifier:{VERSION}
          imagePullPolicy: IfNotPresent
          name: classifier
    graph:
      children: []
      endpoint:
        type: REST
      name: classifier
      type: MODEL
    name: shadow
    replicas: 1
    shadow: true
    traffic: 100


In [11]:
!kubectl apply -f ambassador-example-model.yaml

seldondeployment.machinelearning.seldon.io/ambassador-example configured


In [12]:
!kubectl wait sdep/ambassador-example \
  --for=condition=ready \
  --timeout=120s \
  -n seldon

seldondeployment.machinelearning.seldon.io/ambassador-example condition met
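
The updated spec now has two predictors, and only the `shadow` flag distinguishes the live one from the mirrored one. As a rough sketch, the list below is a hand-written mirror of the routing-relevant fields from the YAML above, and `split_predictors` is a hypothetical helper. The `seldon-app` label values follow the `<deployment>-<predictor>` pattern used by the log selectors in this notebook.

```python
# Hand-written mirror of the two predictors in the YAML above
# (only the fields relevant to routing are kept).
DEPLOYMENT_NAME = "ambassador-example"
predictors = [
    {"name": "default", "replicas": 1},
    {"name": "shadow", "replicas": 1, "shadow": True, "traffic": 100},
]


def split_predictors(predictors):
    """Return (live, shadow) predictor names based on the `shadow` flag."""
    live = [p["name"] for p in predictors if not p.get("shadow", False)]
    shadow = [p["name"] for p in predictors if p.get("shadow", False)]
    return live, shadow


live, shadow = split_predictors(predictors)

# Label values as used with `kubectl logs -l seldon-app==...` in this notebook.
live_labels = [f"{DEPLOYMENT_NAME}-{name}" for name in live]
shadow_labels = [f"{DEPLOYMENT_NAME}-{name}" for name in shadow]

print("live:", live_labels)
print("shadow:", shadow_labels)
```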


Let's send ten requests to the endpoint. Each one should be served by the main predictor and mirrored to the shadow.

In [13]:
import time

time.sleep(10)  # wait before sending requests
for i in range(10):
    r = sc.predict(gateway="ambassador", transport="rest")

In [14]:
@retry(stop=stop_after_delay(300), wait=wait_exponential(multiplier=1, min=0.5, max=5))
def get_requests_count():
    count = !kubectl logs -l seldon-app==ambassador-example-default -c classifier --tail 1000 | grep "root.predict" | wc -l
    return int(count[0])

time.sleep(10)  # wait for logs to be flushed
default_count = get_requests_count()
print(f"main logs count {default_count}")

# 1 request sent before the shadow was added + 10 from the loop above
assert default_count == 11

main logs count 11


In [15]:
@retry(stop=stop_after_delay(300), wait=wait_exponential(multiplier=1, min=0.5, max=5))
def get_shadow_requests_count():
    count = !kubectl logs -l seldon-app==ambassador-example-shadow -c classifier --tail 1000 | grep "root.predict" | wc -l
    return int(count[0])

time.sleep(10)  # wait for logs to be flushed
shadow_count = get_shadow_requests_count()
print(f"shadow logs count {shadow_count}")

# only the 10 requests sent after the shadow was added are mirrored
assert shadow_count == 10

shadow logs count 10
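
The two counts differ because the main predictor also served the single request sent before the shadow existed. The sketch below reproduces the `grep "root.predict" | wc -l` step in plain Python and spells out the expected totals. `count_predict_lines` and the sample log text are hypothetical; real classifier log lines may look different, and the arithmetic assumes the main pod's log still contains the earlier request.

```python
def count_predict_lines(log_text: str, marker: str = "root.predict") -> int:
    """Count log lines containing the marker, like `grep marker | wc -l`."""
    return sum(1 for line in log_text.splitlines() if marker in line)


# Hypothetical log excerpt used only to exercise the helper.
sample_log = "\n".join(
    [
        "INFO root.predict called",
        "INFO health check",
        "INFO root.predict called",
    ]
)
print(count_predict_lines(sample_log))  # 2

requests_before_shadow = 1   # the single REST request sent to the main model
requests_after_shadow = 10   # the loop of ten requests

expected_main = requests_before_shadow + requests_after_shadow
expected_shadow = requests_after_shadow
print(expected_main, expected_shadow)  # 11 10
```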


## Teardown

In [16]:
!kubectl delete -f ambassador-example-model.yaml

seldondeployment.machinelearning.seldon.io "ambassador-example" deleted
