# Example Model Servers with Seldon

Follow the [docs](https://docs.seldon.io/projects/seldon-core/en/latest/) to install Seldon Core before running this notebook.

In [1]:
!kubectl create namespace seldon || echo "Already exists"

Error from server (AlreadyExists): namespaces "seldon" already exists
Already exists


In [2]:
import json

In [3]:
from IPython.core.magic import register_line_cell_magic


@register_line_cell_magic
def writetemplate(line, cell):
    with open(line, "w") as f:
        f.write(cell.format(**globals()))

In [4]:
VERSION = !cat ../version.txt
VERSION = VERSION[0]
VERSION

'1.19.0-dev'
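
The `writetemplate` magic above renders each cell with `str.format`, so a placeholder such as `{VERSION}` is replaced by the notebook global of the same name before the file is written. A minimal sketch of that substitution outside IPython, where the `template` string and the `values` dictionary are illustrative stand-ins for the cell body and `globals()`:

```python
# Illustrative stand-in for a templated cell body.
template = """\
graph:
  implementation: SKLEARN_SERVER
  modelUri: gs://seldon-models/v{VERSION}/sklearn/iris
"""

# Stand-in for globals(); the magic passes every notebook global instead.
values = {"VERSION": "1.19.0-dev"}

rendered = template.format(**values)
print(rendered)

# str.format treats braces as placeholders, so a literal brace in a
# template must be doubled to survive rendering.
literal = "labels: {{}}".format(**values)
print(literal)  # labels: {}
```

One consequence of this design is that any YAML written through `writetemplate` needs its literal braces doubled, whereas cells written with the built-in `%%writefile` are copied verbatim.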

## Serve SKLearn Iris Model

**Note**: Seldon has adopted the industry-standard Open Inference Protocol (OIP) and is no longer maintaining the Seldon and TensorFlow protocols. This transition allows for greater interoperability among various model serving runtimes, such as MLServer. To learn more about implementing OIP for model serving in Seldon Core 1, see [MLServer](https://docs.seldon.ai/mlserver).

We strongly encourage you to adopt the OIP, which provides seamless integration across diverse model serving runtimes, supports the development of versatile client and benchmarking tools, and ensures a high-performance, consistent, and unified inference experience.

In order to deploy SKLearn artifacts, we can leverage the [pre-packaged SKLearn inference server](https://docs.seldon.io/projects/seldon-core/en/latest/servers/sklearn.html).
The exposed API can follow either:

- The Open Inference Protocol (also known as the V2 protocol).
- The default Seldon protocol.
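
The two protocols differ mainly in the shape of the request body. As a rough sketch based on the requests used later in this notebook, the hypothetical helpers `seldon_payload` and `oip_payload` build both bodies from the same rows. The input name `predict` and the datatype `FP32` are taken from the V2 examples in this notebook and may differ for other models.

```python
import json


def seldon_payload(rows):
    """Seldon protocol body: the rows are nested under data.ndarray."""
    return {"data": {"ndarray": rows}}


def oip_payload(rows, name="predict", datatype="FP32"):
    """Open Inference Protocol (V2) body: a named tensor with an explicit shape."""
    n_cols = len(rows[0]) if rows else 0
    return {
        "inputs": [
            {
                "name": name,
                "shape": [len(rows), n_cols],
                "datatype": datatype,
                "data": rows,
            }
        ]
    }


rows = [[1.0, 2.0, 5.0, 6.0]]
print(json.dumps(seldon_payload(rows)))
print(json.dumps(oip_payload(rows), indent=2))
```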




### Default Seldon protocol

To deploy and start serving an SKLearn artifact using Seldon's default protocol, we can use a config like the one below:

In [5]:
%%writetemplate ../servers/sklearnserver/samples/iris.yaml
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: sklearn
spec:
  predictors:
  - graph:
      name: classifier
      implementation: SKLEARN_SERVER
      modelUri: gs://seldon-models/v{VERSION}/sklearn/iris
    name: default
    replicas: 1
    svcOrchSpec:
      env:
      - name: SELDON_LOG_LEVEL
        value: DEBUG

We can then apply it to deploy it to our Kubernetes cluster.

In [6]:
!kubectl apply -f ../servers/sklearnserver/samples/iris.yaml -n seldon

seldondeployment.machinelearning.seldon.io/sklearn created


In [7]:
!kubectl wait sdep/sklearn \
  --for=condition=ready \
  --timeout=120s \
  -n seldon

seldondeployment.machinelearning.seldon.io/sklearn condition met


Once it's deployed, we can send requests to our SKLearn model.
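
The request cells in this notebook wrap each call in a `tenacity` retry, so a failing request is repeated with exponential backoff for up to 300 seconds. A minimal standard-library sketch of that behavior follows. The helper `retry_with_backoff` is hypothetical and only approximates tenacity's wait schedule; it is not a reproduction of the library.

```python
import time


def retry_with_backoff(fn, max_delay_s=300.0, min_wait_s=0.5, max_wait_s=5.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Call fn until it succeeds or max_delay_s has elapsed."""
    start = clock()
    attempt = 0
    while True:
        try:
            return fn()
        except Exception:
            # Give up and re-raise once the overall time budget is spent.
            if clock() - start >= max_delay_s:
                raise
            # Double the wait on each attempt, clamped to [min_wait_s, max_wait_s].
            wait = min(max(min_wait_s, 2.0 ** attempt), max_wait_s)
            sleep(wait)
            attempt += 1


# Demo: a call that fails twice before succeeding. Waits are recorded
# instead of slept so the example finishes immediately.
calls = {"n": 0}
waits = []


def flaky_request():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("endpoint not ready yet")
    return {"status": "ok"}


result = retry_with_backoff(flaky_request, sleep=waits.append)
print(result, waits)
```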

#### REST Requests

In [8]:
from tenacity import retry, stop_after_delay, wait_exponential

@retry(stop=stop_after_delay(300), wait=wait_exponential(multiplier=1, min=0.5, max=5))
def get_prediction():
    X=!curl -s -d '{"data": {"ndarray":[[1.0, 2.0, 5.0, 6.0]]}}' \
       -X POST http://localhost:8004/seldon/seldon/sklearn/api/v1.0/predictions \
       -H "Content-Type: application/json"
    d=json.loads(X[0])
    return d

get_prediction()

{'data': {'names': ['t:0', 't:1', 't:2'],
  'ndarray': [[9.912315378486718e-07,
    0.0007015931307743852,
    0.9992974156376878]]},
 'meta': {'requestPath': {'classifier': 'seldonio/sklearnserver:1.19.0-dev'}}}
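
The `ndarray` in this response appears to hold one score per class for the input row, labelled by the entries in `names`. As a small sketch, the hypothetical helper `top_class` picks the highest-scoring label from the response shown above:

```python
# Response copied from the output above.
response = {
    "data": {
        "names": ["t:0", "t:1", "t:2"],
        "ndarray": [
            [9.912315378486718e-07, 0.0007015931307743852, 0.9992974156376878]
        ],
    },
    "meta": {"requestPath": {"classifier": "seldonio/sklearnserver:1.19.0-dev"}},
}


def top_class(response, row=0):
    """Return (name, score) of the highest-scoring column for one row."""
    names = response["data"]["names"]
    scores = response["data"]["ndarray"][row]
    best = max(range(len(scores)), key=scores.__getitem__)
    return names[best], scores[best]


print(top_class(response))  # ('t:2', 0.9992974156376878)
```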

In [9]:
from seldon_core.seldon_client import SeldonClient

sc = SeldonClient(deployment_name="sklearn", namespace="seldon")



In [10]:
@retry(stop=stop_after_delay(300), wait=wait_exponential(multiplier=1, min=0.5, max=5))
def get_prediction():
    r = sc.predict(gateway="istio", transport="rest", shape=(1, 4))
    assert r.success
    return r


get_prediction()

Success:True message:
Request:
meta {
}
data {
  tensor {
    shape: 1
    shape: 4
    values: 0.14787194124962422
    values: 0.32198446931492986
    values: 0.33941049815591862
    values: 0.76553379633129714
  }
}

Response:
{'data': {'names': ['t:0', 't:1', 't:2'], 'tensor': {'shape': [1, 3], 'values': [0.2198365064139343, 0.30168774346547544, 0.47847575012059024]}}, 'meta': {'requestPath': {'classifier': 'seldonio/sklearnserver:1.19.0-dev'}}}

#### gRPC Requests

In [11]:
@retry(stop=stop_after_delay(300), wait=wait_exponential(multiplier=1, min=0.5, max=5))
def get_prediction():
    r = sc.predict(gateway="istio", transport="grpc", shape=(1, 4))
    assert r.success
    return r


get_prediction()

Success:True message:
Request:
{'meta': {}, 'data': {'tensor': {'shape': [1, 4], 'values': [0.17106294799117205, 0.07662539230757592, 0.7878235218109609, 0.49791035829811037]}}}
Response:
{'meta': {'requestPath': {'classifier': 'seldonio/sklearnserver:1.19.0-dev'}}, 'data': {'names': ['t:0', 't:1', 't:2'], 'tensor': {'shape': [1, 3], 'values': [0.08252667562106618, 0.41591785926004404, 0.5015554651188898]}}}

In [12]:
@retry(stop=stop_after_delay(300), wait=wait_exponential(multiplier=1, min=0.5, max=5))
def get_prediction():
    X=!cd ../executor/proto && grpcurl -d '{"data":{"ndarray":[[1.0,2.0,5.0,6.0]]}}' \
            -rpc-header seldon:sklearn -rpc-header namespace:seldon \
            -plaintext \
            -proto ./prediction.proto  0.0.0.0:8004 seldon.protos.Seldon/Predict
    d=json.loads("".join(X))
    return d

get_prediction()

{'meta': {'requestPath': {'classifier': 'seldonio/sklearnserver:1.19.0-dev'}},
 'data': {'names': ['t:0', 't:1', 't:2'],
  'ndarray': [[9.912315378486718e-07,
    0.0007015931307743852,
    0.9992974156376878]]}}

Finally, we can delete the model we deployed.

In [13]:
!kubectl delete -f ../servers/sklearnserver/samples/iris.yaml -n seldon

seldondeployment.machinelearning.seldon.io "sklearn" deleted


### Open Inference Protocol or V2 Protocol

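We can deploy an SKLearn model exposing an API compatible with the V2 protocol by specifying the `protocol` of our `SeldonDeployment` as `v2`.
For example, we can consider the config below: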

In [14]:
%%writetemplate ./resources/iris-sklearn-v2.yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: sklearn
spec:
  name: iris
  protocol: v2
  predictors:
  - graph:
      children: []
      implementation: SKLEARN_SERVER
      modelUri: gs://seldon-models/sklearn/iris-0.23.2/lr_model
      name: classifier
    name: default
    replicas: 1

We can then apply it to deploy our model to our Kubernetes cluster.

In [15]:
!kubectl apply -f resources/iris-sklearn-v2.yaml -n seldon

seldondeployment.machinelearning.seldon.io/sklearn created


In [16]:
!kubectl wait sdep/sklearn \
  --for=condition=ready \
  --timeout=120s \
  -n seldon

seldondeployment.machinelearning.seldon.io/sklearn condition met


Once it's deployed, we can send inference requests to our model.
Note that, since it's using the V2 Protocol, these requests will be different to the ones using the default Seldon Protocol.

In [17]:
import requests
from tenacity import retry, stop_after_delay, wait_exponential


@retry(stop=stop_after_delay(300), wait=wait_exponential(multiplier=1, min=0.5, max=5))
def get_prediction():
    inference_request = {
        "inputs": [
            {
                "name": "predict",
                "shape": [1, 4],
                "datatype": "FP32",
                "data": [[1, 2, 3, 4]],
            }
        ]
    }

    endpoint = "http://localhost:8004/seldon/seldon/sklearn/v2/models/infer"
    response = requests.post(endpoint, json=inference_request)

    assert response.status_code == 200
    return response.json()


response = get_prediction()
print(json.dumps(response, indent=2))

{
  "model_name": "classifier",
  "model_version": "v1",
  "id": "f6d4be09-639a-4874-9311-ffec15d2eda8",
  "parameters": {},
  "outputs": [
    {
      "name": "predict",
      "shape": [
        1,
        1
      ],
      "datatype": "INT64",
      "parameters": {
        "content_type": "np"
      },
      "data": [
        2
      ]
    }
  ]
}
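
A V2 response carries a list of named output tensors rather than a single `data` field. As a sketch, the hypothetical helper `outputs_by_name` indexes the response shown above by output name:

```python
# Response copied from the output above.
response = {
    "model_name": "classifier",
    "model_version": "v1",
    "id": "f6d4be09-639a-4874-9311-ffec15d2eda8",
    "parameters": {},
    "outputs": [
        {
            "name": "predict",
            "shape": [1, 1],
            "datatype": "INT64",
            "parameters": {"content_type": "np"},
            "data": [2],
        }
    ],
}


def outputs_by_name(response):
    """Map each output tensor's name to its flat data list."""
    return {out["name"]: out["data"] for out in response["outputs"]}


predictions = outputs_by_name(response)
print(predictions["predict"])  # [2]
```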


Finally, we can delete the model we deployed.

In [18]:
!kubectl delete -f resources/iris-sklearn-v2.yaml -n seldon

seldondeployment.machinelearning.seldon.io "sklearn" deleted


## Serve XGBoost Iris Model

In order to deploy XGBoost models, we can leverage the [pre-packaged XGBoost inference server](https://docs.seldon.io/projects/seldon-core/en/latest/servers/xgboost.html).
The exposed API can follow either:

- The default Seldon protocol.
- The V2 protocol.

### Default Seldon protocol

We can deploy an XGBoost model uploaded to an object store by using the XGBoost model server implementation, as shown in the config below:

In [19]:
%%writefile resources/iris-xgboost.yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: xgboost
spec:
  name: iris
  predictors:
  - graph:
      children: []
      implementation: XGBOOST_SERVER
      modelUri: gs://seldon-models/xgboost/iris
      name: classifier
    name: default
    replicas: 1

Overwriting resources/iris-xgboost.yaml


We can then apply it to deploy it to our Kubernetes cluster.

In [20]:
!kubectl apply -f resources/iris-xgboost.yaml -n seldon

seldondeployment.machinelearning.seldon.io/xgboost created


In [21]:
!kubectl wait sdep/xgboost \
  --for=condition=ready \
  --timeout=120s \
  -n seldon

seldondeployment.machinelearning.seldon.io/xgboost condition met


#### REST Requests

In [22]:
@retry(stop=stop_after_delay(300), wait=wait_exponential(multiplier=1, min=0.5, max=5))
def make_prediction():
   X=!curl -s -d '{"data": {"ndarray":[[1.0, 2.0, 5.0, 6.0]]}}' \
      -X POST http://localhost:8004/seldon/seldon/xgboost/api/v1.0/predictions \
      -H "Content-Type: application/json"
   d=json.loads(X[0])
   return d

make_prediction()

{'data': {'names': [], 'ndarray': [2.0]},
 'meta': {'requestPath': {'classifier': 'seldonio/xgboostserver:1.19.0-dev'}}}

In [23]:
from seldon_core.seldon_client import SeldonClient

sc = SeldonClient(deployment_name="xgboost", namespace="seldon")

In [24]:
@retry(stop=stop_after_delay(300), wait=wait_exponential(multiplier=1, min=0.5, max=5))
def make_prediction():
    r = sc.predict(gateway="istio", transport="rest", shape=(1, 4))
    assert r.success
    return r


make_prediction()

Success:True message:
Request:
meta {
}
data {
  tensor {
    shape: 1
    shape: 4
    values: 0.92469335535301134
    values: 0.97894376239504333
    values: 0.59700550857578927
    values: 0.91746990073558743
  }
}

Response:
{'data': {'names': [], 'tensor': {'shape': [1], 'values': [0.0]}}, 'meta': {'requestPath': {'classifier': 'seldonio/xgboostserver:1.19.0-dev'}}}

#### gRPC Requests

In [25]:
@retry(stop=stop_after_delay(300), wait=wait_exponential(multiplier=1, min=0.5, max=5))
def make_prediction():
    r = sc.predict(gateway="istio", transport="grpc", shape=(1, 4))
    assert r.success
    return r


make_prediction()

Success:True message:
Request:
{'meta': {}, 'data': {'tensor': {'shape': [1, 4], 'values': [0.17500398214513357, 0.23679446418379912, 0.607292745959222, 0.395723978746674]}}}
Response:
{'meta': {'requestPath': {'classifier': 'seldonio/xgboostserver:1.19.0-dev'}}, 'data': {'tensor': {'shape': [1], 'values': [0.0]}}}

In [26]:
@retry(stop=stop_after_delay(300), wait=wait_exponential(multiplier=1, min=0.5, max=5))
def make_prediction():
    X=!cd ../executor/proto && grpcurl -d '{"data":{"ndarray":[[1.0,2.0,5.0,6.0]]}}' \
            -rpc-header seldon:xgboost -rpc-header namespace:seldon \
            -plaintext \
            -proto ./prediction.proto  0.0.0.0:8004 seldon.protos.Seldon/Predict
    d=json.loads("".join(X))
    return d

make_prediction()

{'meta': {'requestPath': {'classifier': 'seldonio/xgboostserver:1.19.0-dev'}},
 'data': {'ndarray': [2]}}

Finally, we can delete the model we deployed.

In [27]:
!kubectl delete -f resources/iris-xgboost.yaml -n seldon

seldondeployment.machinelearning.seldon.io "xgboost" deleted


### V2 protocol

We can deploy an XGBoost model exposing an API compatible with the V2 protocol by specifying the `protocol` of our `SeldonDeployment` as `v2`.
For example, we can consider the config below:

In [28]:
%%writefile ./resources/iris-xgboost-v2.yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: xgboost-v2
spec:
  name: iris
  protocol: v2
  predictors:
  - graph:
      children: []
      implementation: XGBOOST_SERVER
      modelUri: gs://seldon-models/xgboost/iris
      name: iris
    name: default
    replicas: 1

Overwriting ./resources/iris-xgboost-v2.yaml


We can then apply it to deploy our model to our Kubernetes cluster.

In [29]:
!kubectl apply -f ./resources/iris-xgboost-v2.yaml -n seldon

seldondeployment.machinelearning.seldon.io/xgboost-v2 created


In [30]:
!kubectl wait sdep/xgboost-v2 \
  --for=condition=ready \
  --timeout=120s \
  -n seldon

seldondeployment.machinelearning.seldon.io/xgboost-v2 condition met


Once it's deployed, we can send inference requests to our model.
Note that, since it's using the V2 Protocol, these requests will be different to the ones using the default Seldon Protocol.

In [31]:
@retry(stop=stop_after_delay(300), wait=wait_exponential(multiplier=1, min=0.5, max=5))
def make_prediction():
    inference_request = {
        "inputs": [
            {
                "name": "predict",
                "shape": [1, 4],
                "datatype": "FP32",
                "data": [[1, 2, 3, 4]],
            }
        ]
    }

    endpoint = "http://localhost:8004/seldon/seldon/xgboost-v2/v2/models/infer"
    response = requests.post(endpoint, json=inference_request)
    assert response.status_code == 200
    return response.json()


print(json.dumps(make_prediction(), indent=2))

{
  "model_name": "iris",
  "model_version": "v0.1.0",
  "id": "4dcd8b9d-1fb4-4ba6-92a2-d91b4f8f1d6f",
  "parameters": {},
  "outputs": [
    {
      "name": "predict",
      "shape": [
        1,
        1
      ],
      "datatype": "FP32",
      "parameters": {
        "content_type": "np"
      },
      "data": [
        2.0
      ]
    }
  ]
}
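
The REST endpoints used in this notebook follow a common pattern: an ingress prefix of `/seldon/<namespace>/<deployment>`, followed by a protocol-specific path. A small sketch of that pattern is given below. The helper `seldon_endpoint` and its parameter names are hypothetical, and the default host `localhost:8004` is simply the one used in the cells above.

```python
def seldon_endpoint(deployment, protocol="seldon", namespace="seldon",
                    host="localhost:8004"):
    """Build the REST URL for a deployment, based on the URLs in this notebook."""
    paths = {
        "seldon": "/api/v1.0/predictions",  # default Seldon protocol
        "v2": "/v2/models/infer",           # V2 / Open Inference Protocol
    }
    if protocol not in paths:
        raise ValueError(f"unsupported protocol: {protocol!r}")
    return f"http://{host}/seldon/{namespace}/{deployment}{paths[protocol]}"


print(seldon_endpoint("xgboost"))
print(seldon_endpoint("xgboost-v2", protocol="v2"))
```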


Finally, we can delete the model we deployed.

In [32]:
!kubectl delete -f ./resources/iris-xgboost-v2.yaml -n seldon

seldondeployment.machinelearning.seldon.io "xgboost-v2" deleted


## Serve TensorFlow MNIST Model

We can deploy a TensorFlow model uploaded to an object store by using the
TensorFlow model server implementation, as shown in the configs below.

This notebook contains two examples: the first shows how to use the
TFServing prepackaged server with the Seldon protocol, and the second
shows how to deploy it using the TensorFlow protocol (so you can send
requests in the exact format you would send to a TFServing server).

### Serve TensorFlow MNIST Model with Seldon Protocol

The config file below shows how you can deploy your TensorFlow model so that
it exposes the Seldon protocol.

In [33]:
%%writefile ./resources/mnist_rest.yaml
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: tfserving
spec:
  name: mnist
  predictors:
  - graph:
      children: []
      implementation: TENSORFLOW_SERVER
      modelUri: gs://seldon-models/tfserving/mnist-model
      name: mnist-model
      parameters:
        - name: signature_name
          type: STRING
          value: predict_images
        - name: model_name
          type: STRING
          value: mnist-model
        - name: model_input
          type: STRING
          value: images
        - name: model_output
          type: STRING
          value: scores
    name: default
    replicas: 1

Overwriting ./resources/mnist_rest.yaml


In [34]:
!kubectl apply -f ./resources/mnist_rest.yaml -n seldon

seldondeployment.machinelearning.seldon.io/tfserving created


In [35]:
!kubectl wait sdep/tfserving \
  --for=condition=ready \
  --timeout=120s \
  -n seldon

seldondeployment.machinelearning.seldon.io/tfserving condition met


In [36]:
from seldon_core.seldon_client import SeldonClient

sc = SeldonClient(deployment_name="tfserving", namespace="seldon")
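
The requests below use `shape=(1, 784)`, which corresponds to one 28×28 MNIST image flattened into a single row. With only a shape given, the client appears to send random values, as the request outputs show. As a sketch of how a real image could be packed into the same tensor layout, assuming a hypothetical 28×28 nested list `image` of pixel intensities:

```python
ROWS, COLS = 28, 28

# Hypothetical all-zero image standing in for real pixel data.
image = [[0.0] * COLS for _ in range(ROWS)]


def image_to_tensor_payload(image):
    """Flatten a 2-D image row by row into a Seldon tensor payload."""
    values = [float(pixel) for row in image for pixel in row]
    return {"data": {"tensor": {"shape": [1, len(values)], "values": values}}}


payload = image_to_tensor_payload(image)
print(payload["data"]["tensor"]["shape"])  # [1, 784]
```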

#### REST Request

In [37]:
@retry(stop=stop_after_delay(300), wait=wait_exponential(multiplier=1, min=0.5, max=5))
def make_prediction():
    r = sc.predict(gateway="istio", transport="rest", shape=(1, 784))
    assert r.success
    return r


make_prediction()

Success:True message:
Request:
meta {
}
data {
  tensor {
    shape: 1
    shape: 784
    values: 0.31730886528229751
    values: 0.007801243769958055
    values: 0.49521745744268142
    values: 0.43602393012906826
    values: 0.21307895332513527
    values: 0.86628591656716847
    values: 0.15658265468352639
    values: 0.23779594848768226
    values: 0.05611141276379461
    values: 0.39863536963145696
    values: 0.83059947140581758
    values: 0.11047747367349947
    values: 0.80064602682196573
    values: 0.89120530350130767
    values: 0.78315035103034947
    values: 0.8559918117096269
    values: 0.61434364374151385
    values: 0.10748413531692247
    values: 0.82108993178840151
    values: 0.27943881704192564
    values: 0.588204122576824
    values: 0.35556354927313194
    values: 0.087056372552643646
    values: 0.57493491130152841
    values: 0.35497992011834689
    values: 0.31643983804212494
    values: 0.51165313686546776
    values: 0.86771488625404813
    values: 0.87954

#### gRPC Request

In [38]:
@retry(stop=stop_after_delay(300), wait=wait_exponential(multiplier=1, min=0.5, max=5))
def make_prediction():
    r = sc.predict(gateway="istio", transport="grpc", shape=(1, 784))
    assert r.success
    return r


make_prediction()

Success:True message:
Request:
{'meta': {}, 'data': {'tensor': {'shape': [1, 784], 'values': [0.7936773152319514, 0.06807525274986692, 0.4066587608943436, 0.7372724658158328, 0.08231395850425904, 0.6904584940474869, 0.7994788183638157, 0.9082402384532205, 0.8453120423607171, 0.38351260194195413, 0.45549333626291477, 0.9974876492483501, 0.3506423191105281, 0.5594988173120975, 0.34467672791053794, 0.963790509010875, 0.2683848236209404, 0.6803337959972149, 0.5137462905780573, 0.3870692655036201, 0.5793124553977409, 0.9035787734402588, 0.35970224324561406, 0.8666379513063275, 0.2140931650886856, 0.9144821612682054, 0.8567643436110552, 0.9549034782830401, 0.5681284033508572, 0.5925420513660424, 0.9296548639351296, 0.6712893678764253, 0.9477325194977938, 0.8504762389692805, 0.7982633643800328, 0.07440296339355612, 0.4442099848099216, 0.17114867483125806, 0.8933423955525915, 0.16078144862304233, 0.2604313763070444, 0.23918930614009204, 0.5355462113720195, 0.6128491759471206, 0.380370031548716

Finally, we can delete the model we deployed.

In [39]:
!kubectl delete -f ./resources/mnist_rest.yaml -n seldon

seldondeployment.machinelearning.seldon.io "tfserving" deleted


### Serve TensorFlow Model with TensorFlow protocol

The config file below shows how you can deploy your TensorFlow model so that
it exposes the TensorFlow protocol.

In [40]:
%%writefile ./resources/halfplustwo_rest.yaml
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: hpt
spec:
  name: hpt
  protocol: tensorflow
  transport: rest
  predictors:
  - graph:
      children: []
      implementation: TENSORFLOW_SERVER
      modelUri: gs://seldon-models/tfserving/half_plus_two
      name:  halfplustwo
      parameters:
        - name: model_name
          type: STRING
          value: halfplustwo
    name: default
    replicas: 1

Overwriting ./resources/halfplustwo_rest.yaml


In [41]:
!kubectl apply -f ./resources/halfplustwo_rest.yaml -n seldon

seldondeployment.machinelearning.seldon.io/hpt created


In [42]:
!kubectl wait sdep/hpt \
  --for=condition=ready \
  --timeout=120s \
  -n seldon

seldondeployment.machinelearning.seldon.io/hpt condition met


In [43]:
@retry(stop=stop_after_delay(300), wait=wait_exponential(multiplier=1, min=0.5, max=5))
def make_prediction():
   X=!curl -s -d '{"instances": [1.0, 2.0, 5.0]}' \
      -X POST http://localhost:8004/seldon/seldon/hpt/v1/models/halfplustwo/:predict \
      -H "Content-Type: application/json"
   d=json.loads("".join(X))
   assert d["predictions"][0] == 2.5
   return d

make_prediction()

{'predictions': [2.5, 3.0, 4.5]}
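
The `half_plus_two` model is a small test model that computes `x / 2 + 2` for each input, which is why the cell asserts that the first prediction is `2.5`. A local sketch of the same arithmetic, where the Python function `half_plus_two` is just an illustrative stand-in for the served model:

```python
def half_plus_two(instances):
    """Apply x / 2 + 2 element-wise, mirroring the served test model."""
    return [0.5 * x + 2.0 for x in instances]


expected = half_plus_two([1.0, 2.0, 5.0])
print({"predictions": expected})  # {'predictions': [2.5, 3.0, 4.5]}
```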

In [44]:
@retry(stop=stop_after_delay(300), wait=wait_exponential(multiplier=1, min=0.5, max=5))
def make_prediction():
   X=!cd ../executor/proto && grpcurl \
      -d '{"model_spec":{"name":"halfplustwo"},"inputs":{"x":{"dtype": 1, "tensor_shape": {"dim":[{"size": 3}]}, "floatVal" : [1.0, 2.0, 3.0]}}}' \
      -rpc-header seldon:hpt -rpc-header namespace:seldon \
      -plaintext -proto ./prediction_service.proto \
      0.0.0.0:8004 tensorflow.serving.PredictionService/Predict
   d=json.loads("".join(X))
   assert d["outputs"]["x"]["floatVal"][0] == 2.5
   return d

make_prediction()

{'outputs': {'x': {'dtype': 'DT_FLOAT',
   'tensorShape': {'dim': [{'size': '3'}]},
   'floatVal': [2.5, 3, 3.5]}},
 'modelSpec': {'name': 'halfplustwo',
  'version': '123',
  'signatureName': 'serving_default'}}
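
In the JSON rendering of the gRPC response, the tensor dimension size arrives as a string (`'3'`) rather than an integer, which is how protobuf's JSON mapping encodes 64-bit integers. A sketch of converting the response shown above into plain Python numbers, using a hypothetical helper `tensor_to_python`:

```python
# Response copied from the output above.
response = {
    "outputs": {
        "x": {
            "dtype": "DT_FLOAT",
            "tensorShape": {"dim": [{"size": "3"}]},
            "floatVal": [2.5, 3, 3.5],
        }
    },
    "modelSpec": {
        "name": "halfplustwo",
        "version": "123",
        "signatureName": "serving_default",
    },
}


def tensor_to_python(tensor):
    """Return (shape, values) with sizes as ints and values as floats."""
    shape = [int(dim["size"]) for dim in tensor["tensorShape"]["dim"]]
    values = [float(v) for v in tensor["floatVal"]]
    return shape, values


shape, values = tensor_to_python(response["outputs"]["x"])
print(shape, values)  # [3] [2.5, 3.0, 3.5]
```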

In [45]:
!kubectl delete -f ./resources/halfplustwo_rest.yaml -n seldon

seldondeployment.machinelearning.seldon.io "hpt" deleted


## Serve MLflow Elasticnet Wines Model

In order to deploy MLflow models, we can leverage the [pre-packaged MLflow inference server](https://docs.seldon.io/projects/seldon-core/en/latest/servers/mlflow.html).
The exposed API can follow either:

- The default Seldon protocol. 
- The V2 protocol.

### Default Seldon protocol

We can deploy an MLflow model uploaded to an object store by using the MLflow model server implementation, as shown in the config below:

In [46]:
%%writetemplate ./resources/elasticnet_wine.yaml
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: mlflow
spec:
  name: wines
  predictors:
  - componentSpecs:
    - spec:
        # We set a high failureThreshold because installing conda dependencies
        # can take a long time, and we want to avoid k8s killing the container prematurely
        containers:
        - name: classifier
          livenessProbe:
            initialDelaySeconds: 80
            failureThreshold: 200
            periodSeconds: 5
            successThreshold: 1
            httpGet:
              path: /health/ping
              port: http
              scheme: HTTP
          readinessProbe:
            initialDelaySeconds: 80
            failureThreshold: 200
            periodSeconds: 5
            successThreshold: 1
            httpGet:
              path: /health/ping
              port: http
              scheme: HTTP
    graph:
      children: []
      implementation: MLFLOW_SERVER
      modelUri: gs://seldon-models/v{VERSION}/mlflow/elasticnet_wine
      name: classifier
    name: default
    replicas: 1
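
The probe settings above determine how long the container may take to start before Kubernetes gives up on it. As a rough worked example, assuming the time budget is approximately `initialDelaySeconds + failureThreshold * periodSeconds` (probe timeouts and scheduling jitter are ignored), the hypothetical helper `startup_budget_seconds` computes it for the values in this config:

```python
def startup_budget_seconds(initial_delay_s, period_s, failure_threshold):
    """Approximate time before the probe exhausts its allowed failures."""
    return initial_delay_s + period_s * failure_threshold


budget = startup_budget_seconds(initial_delay_s=80, period_s=5, failure_threshold=200)
print(f"{budget} s (~{budget / 60:.0f} min)")  # 1080 s (~18 min)
```

Under that assumption, the conda environment has roughly 18 minutes to install before the liveness probe would restart the container.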

In [47]:
!kubectl apply -f ./resources/elasticnet_wine.yaml -n seldon

seldondeployment.machinelearning.seldon.io/mlflow created


In [48]:
!kubectl wait sdep/mlflow \
  --for=condition=ready \
  --timeout=120s \
  -n seldon

seldondeployment.machinelearning.seldon.io/mlflow condition met


#### REST requests

In [49]:
@retry(stop=stop_after_delay(300), wait=wait_exponential(multiplier=1, min=0.5, max=5))
def make_prediction():
   X=!curl -s -d '{"data": {"ndarray":[[0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0,1.1]]}}' \
      -X POST http://localhost:8004/seldon/seldon/mlflow/api/v1.0/predictions \
      -H "Content-Type: application/json"
   d=json.loads(X[0])
   return d

make_prediction()

{'data': {'names': [], 'ndarray': [5.275558760255375]},
 'meta': {'requestPath': {'classifier': 'seldonio/mlflowserver:1.19.0-dev'}}}

In [50]:
from seldon_core.seldon_client import SeldonClient

sc = SeldonClient(deployment_name="mlflow", namespace="seldon")

In [51]:
@retry(stop=stop_after_delay(300), wait=wait_exponential(multiplier=1, min=0.5, max=5))
def make_prediction():
    r = sc.predict(gateway="istio", transport="rest", shape=(1, 11))
    assert r.success
    return r


make_prediction()

Success:True message:
Request:
meta {
}
data {
  tensor {
    shape: 1
    shape: 11
    values: 0.8966108067974875
    values: 0.37550041364676123
    values: 0.57545189884807013
    values: 0.010278591412713922
    values: 0.48100392009063631
    values: 0.60901404777494739
    values: 0.48850027144684305
    values: 0.72479941629254541
    values: 0.16320618504982587
    values: 0.35789387810004469
    values: 0.59777331626506691
  }
}

Response:
{'data': {'names': [], 'tensor': {'shape': [1], 'values': [5.228265131394911]}}, 'meta': {'requestPath': {'classifier': 'seldonio/mlflowserver:1.19.0-dev'}}}

#### gRPC Requests

In [52]:
@retry(stop=stop_after_delay(300), wait=wait_exponential(multiplier=1, min=0.5, max=5))
def make_prediction():
    X=!cd ../executor/proto && grpcurl -d '{"data":{"ndarray":[[0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0,1.1]]}}' \
            -rpc-header seldon:mlflow -rpc-header namespace:seldon \
            -plaintext \
            -proto ./prediction.proto  0.0.0.0:8004 seldon.protos.Seldon/Predict
    d=json.loads("".join(X))
    return d

make_prediction()

{'meta': {'requestPath': {'classifier': 'seldonio/mlflowserver:1.19.0-dev'}},
 'data': {'ndarray': [5.275558760255375]}}

In [53]:
@retry(stop=stop_after_delay(300), wait=wait_exponential(multiplier=1, min=0.5, max=5))
def make_prediction():
    r = sc.predict(gateway="istio", transport="grpc", shape=(1, 11))
    assert r.success
    return r


make_prediction()

Success:True message:
Request:
{'meta': {}, 'data': {'tensor': {'shape': [1, 11], 'values': [0.7489959664575474, 0.04255962515445855, 0.07747823532166664, 0.2337930266543109, 0.09490371717575774, 0.6545193298514377, 0.5194049659811394, 0.5216506599075572, 0.6492525538169563, 0.7144466316568587, 0.5908586797895121]}}}
Response:
{'meta': {'requestPath': {'classifier': 'seldonio/mlflowserver:1.19.0-dev'}}, 'data': {'tensor': {'shape': [1], 'values': [5.2278422746819055]}}}

In [54]:
!kubectl delete -f ./resources/elasticnet_wine.yaml -n seldon

seldondeployment.machinelearning.seldon.io "mlflow" deleted


### V2 protocol

We can deploy an MLflow model exposing an API compatible with the V2 protocol by specifying the `protocol` of our `SeldonDeployment` as `v2`.
For example, we can consider the config below:

In [55]:
%%writetemplate ./resources/elasticnet_wine_v2.yaml
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: mlflow
spec:
  protocol: v2  # Activate v2 protocol
  name: wines
  predictors:
    - graph:
        children: []
        implementation: MLFLOW_SERVER
        modelUri: gs://seldon-models/v{VERSION}/mlflow/elasticnet_wine
        name: classifier
      name: default
      replicas: 1

In [56]:
!kubectl apply -f ./resources/elasticnet_wine_v2.yaml -n seldon

seldondeployment.machinelearning.seldon.io/mlflow created


In [57]:
!kubectl wait sdep/mlflow \
  --for=condition=ready \
  --timeout=300s \
  -n seldon

seldondeployment.machinelearning.seldon.io/mlflow condition met


Once it's deployed, we can send inference requests to our model.
Note that, since it's using the V2 Protocol, these requests will be different to the ones using the default Seldon Protocol.

In [59]:
import json

import requests


@retry(stop=stop_after_delay(300), wait=wait_exponential(multiplier=1, min=0.5, max=5))
def make_prediction():
    inference_request = {
        "parameters": {"content_type": "pd"},
        "inputs": [
            {
                "name": "fixed acidity",
                "shape": [1],
                "datatype": "FP32",
                "data": [7.4],
                "parameters": {"content_type": "np"},
            },
            {
                "name": "volatile acidity",
                "shape": [1],
                "datatype": "FP32",
                "data": [0.7000],
                "parameters": {"content_type": "np"},
            },
            {
                "name": "citric acid",
                "shape": [1],
                "datatype": "FP32",
                "data": [0],
                "parameters": {"content_type": "np"},
            },
            {
                "name": "residual sugar",
                "shape": [1],
                "datatype": "FP32",
                "data": [1.9],
                "parameters": {"content_type": "np"},
            },
            {
                "name": "chlorides",
                "shape": [1],
                "datatype": "FP32",
                "data": [0.076],
                "parameters": {"content_type": "np"},
            },
            {
                "name": "free sulfur dioxide",
                "shape": [1],
                "datatype": "FP32",
                "data": [11],
                "parameters": {"content_type": "np"},
            },
            {
                "name": "total sulfur dioxide",
                "shape": [1],
                "datatype": "FP32",
                "data": [34],
                "parameters": {"content_type": "np"},
            },
            {
                "name": "density",
                "shape": [1],
                "datatype": "FP32",
                "data": [0.9978],
                "parameters": {"content_type": "np"},
            },
            {
                "name": "pH",
                "shape": [1],
                "datatype": "FP32",
                "data": [3.51],
                "parameters": {"content_type": "np"},
            },
            {
                "name": "sulphates",
                "shape": [1],
                "datatype": "FP32",
                "data": [0.56],
                "parameters": {"content_type": "np"},
            },
            {
                "name": "alcohol",
                "shape": [1],
                "datatype": "FP32",
                "data": [9.4],
                "parameters": {"content_type": "np"},
            },
        ],
    }

    endpoint = "http://localhost:8004/seldon/seldon/mlflow/v2/models/infer"
    response = requests.post(endpoint, json=inference_request)

    assert response.status_code == 200

    return response.json()


print(json.dumps(make_prediction(), indent=2))

{
  "model_name": "classifier",
  "model_version": "v1",
  "id": "f913f957-31a3-4566-b68d-c4d05f970d75",
  "parameters": {
    "content_type": "np"
  },
  "outputs": [
    {
      "name": "output-1",
      "shape": [
        1,
        1
      ],
      "datatype": "FP64",
      "parameters": {
        "content_type": "np"
      },
      "data": [
        6.016145744177843
      ]
    }
  ]
}
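
The request body above repeats the same structure for every column, so it can also be generated from a plain mapping of column names to values. A sketch using a hypothetical helper `build_pandas_request`, which reproduces the per-column layout used in the cell above (one `FP32` tensor of shape `[1]` per column, with the request-level `content_type` set to `pd`):

```python
def build_pandas_request(row, datatype="FP32"):
    """Build a V2 request with one single-value input tensor per column."""
    return {
        "parameters": {"content_type": "pd"},
        "inputs": [
            {
                "name": column,
                "shape": [1],
                "datatype": datatype,
                "data": [value],
                "parameters": {"content_type": "np"},
            }
            for column, value in row.items()
        ],
    }


# Same wine sample as in the request above.
wine = {
    "fixed acidity": 7.4,
    "volatile acidity": 0.7,
    "citric acid": 0,
    "residual sugar": 1.9,
    "chlorides": 0.076,
    "free sulfur dioxide": 11,
    "total sulfur dioxide": 34,
    "density": 0.9978,
    "pH": 3.51,
    "sulphates": 0.56,
    "alcohol": 9.4,
}

inference_request = build_pandas_request(wine)
print(len(inference_request["inputs"]))  # 11
```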


In [60]:
!kubectl delete -f ./resources/elasticnet_wine_v2.yaml -n seldon

seldondeployment.machinelearning.seldon.io "mlflow" deleted
