# Example Model Servers with Seldon

Follow the [Seldon Core documentation](https://docs.seldon.io/projects/seldon-core/en/latest/) to install Seldon Core.

In [1]:
!kubectl create namespace seldon || echo "Already exists"

Error from server (AlreadyExists): namespaces "seldon" already exists
Already exists


In [3]:
import json

In [4]:
from IPython.core.magic import register_line_cell_magic

@register_line_cell_magic
def writetemplate(line, cell):
    with open(line, 'w') as f:
        f.write(cell.format(**globals()))

In [5]:
VERSION=!cat ../version.txt
VERSION=VERSION[0]
VERSION

'1.19.0-dev'
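
The `writetemplate` magic defined above fills `{VERSION}`-style placeholders by calling `str.format` on the cell body. The snippet below is a minimal sketch of that substitution outside of IPython. The `labels: {{}}` line is a hypothetical addition, included only to show that literal braces in a template must be doubled.

```python
# Value read from ../version.txt in this run of the notebook.
VERSION = "1.19.0-dev"

# Hypothetical two-line template; only the modelUri line mirrors the notebook.
template = (
    "modelUri: gs://seldon-models/v{VERSION}/sklearn/iris\n"
    "labels: {{}}\n"  # doubled braces render as a literal "{}"
)

# str.format replaces {VERSION} and collapses the doubled braces.
rendered = template.format(VERSION=VERSION)
print(rendered)
```

Any YAML that contains literal `{` or `}` characters would need the same escaping when written through `writetemplate`.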

## Serve SKLearn Iris Model

**Note**: Seldon has adopted the industry-standard Open Inference Protocol (OIP) and is no longer maintaining the Seldon and TensorFlow protocols. This transition allows for greater interoperability among various model serving runtimes, such as MLServer. To learn more about implementing OIP for model serving in Seldon Core 1, see [MLServer](https://docs.seldon.ai/mlserver).

We strongly encourage you to adopt the OIP, which provides seamless integration across diverse model serving runtimes, supports the development of versatile client and benchmarking tools, and ensures a high-performance, consistent, and unified inference experience.

To deploy SKLearn artifacts, we can use the [pre-packaged SKLearn inference server](https://docs.seldon.io/projects/seldon-core/en/latest/servers/sklearn.html).
The exposed API can follow either:

- The Open Inference Protocol (also known as the V2 protocol).
- The default Seldon protocol.

### Default Seldon protocol

To deploy and start serving an SKLearn artifact using Seldon's default protocol, we can use a config like the one below:

In [6]:
%%writetemplate ../servers/sklearnserver/samples/iris.yaml
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: sklearn
spec:
  predictors:
  - graph:
      name: classifier
      implementation: SKLEARN_SERVER
      modelUri: gs://seldon-models/v{VERSION}/sklearn/iris
    name: default
    replicas: 1
    svcOrchSpec:
      env:
      - name: SELDON_LOG_LEVEL
        value: DEBUG

We can then apply it to deploy it to our Kubernetes cluster.

In [7]:
!kubectl apply -f ../servers/sklearnserver/samples/iris.yaml

seldondeployment.machinelearning.seldon.io/sklearn created


In [9]:
!kubectl wait sdep/sklearn \
  --for=condition=ready \
  --timeout=120s \
  -n seldon

seldondeployment.machinelearning.seldon.io/sklearn condition met


Once it's deployed, we can send requests to our SKLearn model.

#### REST Requests

In [12]:
from tenacity import retry, stop_after_delay, wait_exponential

@retry(stop=stop_after_delay(300), wait=wait_exponential(multiplier=1, min=0.5, max=5))
def get_prediction():
    X=!curl -s -d '{"data": {"ndarray":[[1.0, 2.0, 5.0, 6.0]]}}' \
       -X POST http://localhost:8004/seldon/seldon/sklearn/api/v1.0/predictions \
       -H "Content-Type: application/json"
    d=json.loads(X[0])
    return d

get_prediction()

{'data': {'names': ['t:0', 't:1', 't:2'],
  'ndarray': [[9.912315378486718e-07,
    0.0007015931307743852,
    0.9992974156376878]]},
 'meta': {'requestPath': {'classifier': 'seldonio/sklearnserver:1.19.0-dev'}}}
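
The request body and response above both use the Seldon protocol's `ndarray` envelope. The following sketch builds that body in Python and picks the highest-scoring class from the response shown above. It assumes the three returned values are per-class scores; `build_ndarray_payload` and `predicted_classes` are hypothetical helper names, not part of `seldon_core`.

```python
import json


def build_ndarray_payload(rows):
    """Wrap a list of feature rows in the Seldon protocol `ndarray` envelope."""
    return {"data": {"ndarray": [[float(v) for v in row] for row in rows]}}


def predicted_classes(response):
    """Return the index of the largest score in each response row."""
    return [
        max(range(len(row)), key=row.__getitem__)
        for row in response["data"]["ndarray"]
    ]


payload = build_ndarray_payload([[1.0, 2.0, 5.0, 6.0]])
print(json.dumps(payload))

# Response copied from the cell output above.
sample_response = {
    "data": {
        "names": ["t:0", "t:1", "t:2"],
        "ndarray": [[9.912315378486718e-07, 0.0007015931307743852, 0.9992974156376878]],
    },
    "meta": {"requestPath": {"classifier": "seldonio/sklearnserver:1.19.0-dev"}},
}
print(predicted_classes(sample_response))
```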

In [11]:
from seldon_core.seldon_client import SeldonClient

sc = SeldonClient(deployment_name="sklearn", namespace="seldon")

2025-12-04 08:44:46.522582: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1764837886.539905 3508983 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1764837886.545132 3508983 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-12-04 08:44:46.564514: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


In [13]:
@retry(stop=stop_after_delay(300), wait=wait_exponential(multiplier=1, min=0.5, max=5))
def get_prediction():
    r = sc.predict(gateway="istio", transport="rest", shape=(1, 4))
    assert r.success == True
    return r

get_prediction()

Success:True message:
Request:
meta {
}
data {
  tensor {
    shape: 1
    shape: 4
    values: 0.0845312565148979
    values: 0.579166945101894
    values: 0.9623785247068104
    values: 0.86666503216397861
  }
}

Response:
{'data': {'names': ['t:0', 't:1', 't:2'], 'tensor': {'shape': [1, 3], 'values': [0.09019749677529479, 0.26997995460333873, 0.6398225486213664]}}, 'meta': {'requestPath': {'classifier': 'seldonio/sklearnserver:1.19.0-dev'}}}

#### gRPC Requests

In [14]:
@retry(stop=stop_after_delay(300), wait=wait_exponential(multiplier=1, min=0.5, max=5))
def get_prediction():
    r = sc.predict(gateway="istio", transport="grpc", shape=(1, 4))
    assert r.success == True
    return r

get_prediction()

Success:True message:
Request:
{'meta': {}, 'data': {'tensor': {'shape': [1, 4], 'values': [0.9123345796189455, 0.20504276667353116, 0.8092876586267712, 0.10797168995005357]}}}
Response:
{'meta': {'requestPath': {'classifier': 'seldonio/sklearnserver:1.19.0-dev'}}, 'data': {'names': ['t:0', 't:1', 't:2'], 'tensor': {'shape': [1, 3], 'values': [0.19352912388430024, 0.5850512351026881, 0.22141964101301165]}}}

In [15]:
@retry(stop=stop_after_delay(300), wait=wait_exponential(multiplier=1, min=0.5, max=5))
def get_prediction():
    X=!cd ../executor/proto && grpcurl -d '{"data":{"ndarray":[[1.0,2.0,5.0,6.0]]}}' \
            -rpc-header seldon:sklearn -rpc-header namespace:seldon \
            -plaintext \
            -proto ./prediction.proto  0.0.0.0:8004 seldon.protos.Seldon/Predict
    d=json.loads("".join(X))
    return d

get_prediction()

{'meta': {'requestPath': {'classifier': 'seldonio/sklearnserver:1.19.0-dev'}},
 'data': {'names': ['t:0', 't:1', 't:2'],
  'ndarray': [[9.912315378486718e-07,
    0.0007015931307743852,
    0.9992974156376878]]}}

Finally, we can delete the model we deployed.

In [16]:
!kubectl delete -f ../servers/sklearnserver/samples/iris.yaml

seldondeployment.machinelearning.seldon.io "sklearn" deleted


### Open Inference Protocol or V2 Protocol

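We can deploy an SKLearn model exposing an API compatible with the V2 protocol by specifying the `protocol` of our `SeldonDeployment` as `v2`.
For example, we can consider the config below: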

In [17]:
%%writetemplate ./resources/iris-sklearn-v2.yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: sklearn
spec:
  name: iris
  protocol: v2
  predictors:
  - graph:
      children: []
      implementation: SKLEARN_SERVER
      modelUri: gs://seldon-models/sklearn/iris-0.23.2/lr_model
      name: classifier
    name: default
    replicas: 1

We can then apply it to deploy our model to our Kubernetes cluster.

In [18]:
!kubectl apply -f resources/iris-sklearn-v2.yaml

seldondeployment.machinelearning.seldon.io/sklearn created


In [19]:
!kubectl wait sdep/sklearn \
  --for=condition=ready \
  --timeout=120s \
  -n seldon

seldondeployment.machinelearning.seldon.io/sklearn condition met


Once it's deployed, we can send inference requests to our model.
Note that, since it's using the V2 Protocol, these requests will be different to the ones using the default Seldon Protocol.

In [23]:
import requests

from tenacity import retry, stop_after_delay, wait_exponential

@retry(stop=stop_after_delay(300), wait=wait_exponential(multiplier=1, min=0.5, max=5))
def get_prediction():
    inference_request = {
        "inputs": [
            {"name": "predict", "shape": [1, 4], "datatype": "FP32", "data": [[1, 2, 3, 4]]}
        ]
    }

    endpoint = "http://localhost:8004/seldon/seldon/sklearn/v2/models/infer"
    response = requests.post(endpoint, json=inference_request)

    assert response.status_code == 200
    return response.json()

response = get_prediction()
print(json.dumps(response, indent=2))

{
  "model_name": "classifier",
  "model_version": "v1",
  "id": "a7dedce6-f64c-408a-8f34-0a429e30833b",
  "parameters": {},
  "outputs": [
    {
      "name": "predict",
      "shape": [
        1,
        1
      ],
      "datatype": "INT64",
      "parameters": {
        "content_type": "np"
      },
      "data": [
        2
      ]
    }
  ]
}
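
In the V2 request above, the `shape` field must agree with the nested `data`. The sketch below derives the shape from the rows instead of writing it by hand. `build_v2_request` is a hypothetical helper, and the input name `predict` and datatype `FP32` are simply the values used in the request above.

```python
def build_v2_request(name, rows, datatype="FP32"):
    """Build a V2 inference request holding a single 2-D input tensor."""
    width = len(rows[0])
    # Reject ragged input so the declared shape always matches the data.
    if any(len(row) != width for row in rows):
        raise ValueError("all rows must have the same number of features")
    return {
        "inputs": [
            {
                "name": name,
                "shape": [len(rows), width],
                "datatype": datatype,
                "data": rows,
            }
        ]
    }


request = build_v2_request("predict", [[1, 2, 3, 4]])
print(request["inputs"][0]["shape"])
```

The resulting dictionary can be passed as the `json=` argument of `requests.post`, as in the cell above.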


Finally, we can delete the model we deployed.

In [24]:
!kubectl delete -f resources/iris-sklearn-v2.yaml

seldondeployment.machinelearning.seldon.io "sklearn" deleted


## Serve XGBoost Iris Model

In order to deploy XGBoost models, we can leverage the [pre-packaged XGBoost inference server](https://docs.seldon.io/projects/seldon-core/en/latest/servers/xgboost.html).
The exposed API can follow either:

- The default Seldon protocol.
- The V2 protocol.
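
The two protocols are served on different URL paths. The sketch below assembles the REST endpoints used in this notebook, assuming the ingress is reachable at `localhost:8004` (as in the `curl` calls here) and the deployment lives in the `seldon` namespace. `rest_endpoint` is a hypothetical helper, not part of any Seldon library.

```python
BASE_URL = "http://localhost:8004"  # assumption: port-forwarded ingress, as used in this notebook


def rest_endpoint(deployment, protocol="seldon", namespace="seldon"):
    """Return the REST inference URL for a SeldonDeployment."""
    prefix = f"{BASE_URL}/seldon/{namespace}/{deployment}"
    if protocol == "seldon":
        return f"{prefix}/api/v1.0/predictions"
    if protocol == "v2":
        return f"{prefix}/v2/models/infer"
    raise ValueError(f"unsupported protocol: {protocol}")


print(rest_endpoint("xgboost"))
print(rest_endpoint("xgboost-v2", protocol="v2"))
```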

### Default Seldon protocol

We can deploy an XGBoost model uploaded to an object store by using the XGBoost model server implementation, as shown in the config below:

In [25]:
%%writefile resources/iris-xgboost.yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: xgboost
spec:
  name: iris
  predictors:
  - graph:
      children: []
      implementation: XGBOOST_SERVER
      modelUri: gs://seldon-models/xgboost/iris
      name: classifier
    name: default
    replicas: 1

Writing resources/iris-xgboost.yaml


We can then apply it to deploy our model to our Kubernetes cluster.

In [26]:
!kubectl apply -f resources/iris-xgboost.yaml

seldondeployment.machinelearning.seldon.io/xgboost created


In [27]:
!kubectl wait sdep/xgboost \
  --for=condition=ready \
  --timeout=120s \
  -n seldon

seldondeployment.machinelearning.seldon.io/xgboost condition met


#### REST Requests

In [28]:
@retry(stop=stop_after_delay(300), wait=wait_exponential(multiplier=1, min=0.5, max=5))
def make_prediction():
   X=!curl -s -d '{"data": {"ndarray":[[1.0, 2.0, 5.0, 6.0]]}}' \
      -X POST http://localhost:8004/seldon/seldon/xgboost/api/v1.0/predictions \
      -H "Content-Type: application/json"
   d=json.loads(X[0])
   return d

make_prediction()

{'data': {'names': [], 'ndarray': [2.0]},
 'meta': {'requestPath': {'classifier': 'seldonio/xgboostserver:1.19.0-dev'}}}

In [29]:
from seldon_core.seldon_client import SeldonClient

sc = SeldonClient(deployment_name="xgboost", namespace="seldon")

In [30]:
@retry(stop=stop_after_delay(300), wait=wait_exponential(multiplier=1, min=0.5, max=5))
def make_prediction():
    r = sc.predict(gateway="istio", transport="rest", shape=(1, 4))
    assert r.success == True
    return r

make_prediction()

Success:True message:
Request:
meta {
}
data {
  tensor {
    shape: 1
    shape: 4
    values: 0.4203723887636549
    values: 0.40155950698810738
    values: 0.16298335960382315
    values: 0.965110212738053
  }
}

Response:
{'data': {'names': [], 'tensor': {'shape': [1], 'values': [0.0]}}, 'meta': {'requestPath': {'classifier': 'seldonio/xgboostserver:1.19.0-dev'}}}

#### gRPC Requests

In [31]:
@retry(stop=stop_after_delay(300), wait=wait_exponential(multiplier=1, min=0.5, max=5))
def make_prediction():
    r = sc.predict(gateway="istio", transport="grpc", shape=(1, 4))
    assert r.success == True
    return r

make_prediction()

Success:True message:
Request:
{'meta': {}, 'data': {'tensor': {'shape': [1, 4], 'values': [0.633458304320003, 0.221914946681799, 0.1807787312067315, 0.03487693275107551]}}}
Response:
{'meta': {'requestPath': {'classifier': 'seldonio/xgboostserver:1.19.0-dev'}}, 'data': {'tensor': {'shape': [1], 'values': [0.0]}}}

In [32]:
@retry(stop=stop_after_delay(300), wait=wait_exponential(multiplier=1, min=0.5, max=5))
def make_prediction():
    X=!cd ../executor/proto && grpcurl -d '{"data":{"ndarray":[[1.0,2.0,5.0,6.0]]}}' \
            -rpc-header seldon:xgboost -rpc-header namespace:seldon \
            -plaintext \
            -proto ./prediction.proto  0.0.0.0:8004 seldon.protos.Seldon/Predict
    d=json.loads("".join(X))
    return d

make_prediction()

{'meta': {'requestPath': {'classifier': 'seldonio/xgboostserver:1.19.0-dev'}},
 'data': {'ndarray': [2]}}

Finally, we can delete the model we deployed.

In [33]:
!kubectl delete -f resources/iris-xgboost.yaml

seldondeployment.machinelearning.seldon.io "xgboost" deleted


### V2 protocol

We can deploy an XGBoost model exposing an API compatible with the V2 protocol by specifying the `protocol` of our `SeldonDeployment` as `v2`.
For example, we can consider the config below:

In [34]:
%%writefile ./resources/iris-xgboost-v2.yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: xgboost-v2
spec:
  name: iris
  protocol: v2
  predictors:
  - graph:
      children: []
      implementation: XGBOOST_SERVER
      modelUri: gs://seldon-models/xgboost/iris
      name: iris
    name: default
    replicas: 1

Writing ./resources/iris-xgboost-v2.yaml


We can then apply it to deploy our model to our Kubernetes cluster.

In [35]:
!kubectl apply -f ./resources/iris-xgboost-v2.yaml

seldondeployment.machinelearning.seldon.io/xgboost-v2 created


In [36]:
!kubectl wait sdep/xgboost-v2 \
  --for=condition=ready \
  --timeout=120s \
  -n seldon

seldondeployment.machinelearning.seldon.io/xgboost-v2 condition met


Once it's deployed, we can send inference requests to our model.
Note that, since it's using the V2 Protocol, these requests will be different to the ones using the default Seldon Protocol.

In [38]:
@retry(stop=stop_after_delay(300), wait=wait_exponential(multiplier=1, min=0.5, max=5))
def make_prediction():
    inference_request = {
        "inputs": [
            {"name": "predict", "shape": [1, 4], "datatype": "FP32", "data": [[1, 2, 3, 4]]}
        ]
    }

    endpoint = "http://localhost:8004/seldon/seldon/xgboost-v2/v2/models/infer"
    response = requests.post(endpoint, json=inference_request)
    assert response.status_code == 200
    return response.json()

print(json.dumps(make_prediction(), indent=2))

{
  "model_name": "iris",
  "model_version": "v0.1.0",
  "id": "2e319f5b-4b0b-48d4-91aa-0a8c45c3f3bc",
  "parameters": {},
  "outputs": [
    {
      "name": "predict",
      "shape": [
        1,
        1
      ],
      "datatype": "FP32",
      "parameters": {
        "content_type": "np"
      },
      "data": [
        2.0
      ]
    }
  ]
}
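
A V2 response carries its results in a list of named output tensors. The sketch below indexes that list by output name, using the response shown above as sample data. `outputs_by_name` is a hypothetical helper introduced for illustration.

```python
def outputs_by_name(response):
    """Map each output tensor's name to its flat data list."""
    return {output["name"]: output["data"] for output in response["outputs"]}


# Response copied from the cell output above.
sample_response = {
    "model_name": "iris",
    "model_version": "v0.1.0",
    "id": "2e319f5b-4b0b-48d4-91aa-0a8c45c3f3bc",
    "parameters": {},
    "outputs": [
        {
            "name": "predict",
            "shape": [1, 1],
            "datatype": "FP32",
            "parameters": {"content_type": "np"},
            "data": [2.0],
        }
    ],
}

predictions = outputs_by_name(sample_response)
print(predictions["predict"])
```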


Finally, we can delete the model we deployed.

In [39]:
!kubectl delete -f ./resources/iris-xgboost-v2.yaml

seldondeployment.machinelearning.seldon.io "xgboost-v2" deleted


## Serve Tensorflow MNIST Model

We can deploy a TensorFlow model uploaded to an object store by using the
TensorFlow model server implementation, as shown in the configs below.

This notebook contains two examples: one shows how to use the
TFServing pre-packaged server with the Seldon protocol, and the other
shows how to deploy it using the TensorFlow protocol (so you can send
requests in the exact format you would send to a TFServing server).

### Serve Tensorflow MNIST Model with Seldon Protocol

The config file below shows how to deploy a TensorFlow model that
exposes the Seldon protocol.

In [40]:
%%writefile ./resources/mnist_rest.yaml
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: tfserving
spec:
  name: mnist
  predictors:
  - graph:
      children: []
      implementation: TENSORFLOW_SERVER
      modelUri: gs://seldon-models/tfserving/mnist-model
      name: mnist-model
      parameters:
        - name: signature_name
          type: STRING
          value: predict_images
        - name: model_name
          type: STRING
          value: mnist-model
        - name: model_input
          type: STRING
          value: images
        - name: model_output
          type: STRING
          value: scores
    name: default
    replicas: 1

Writing ./resources/mnist_rest.yaml


In [41]:
!kubectl apply -f ./resources/mnist_rest.yaml

seldondeployment.machinelearning.seldon.io/tfserving created


In [42]:
!kubectl wait sdep/tfserving \
  --for=condition=ready \
  --timeout=120s \
  -n seldon

seldondeployment.machinelearning.seldon.io/tfserving condition met


In [43]:
from seldon_core.seldon_client import SeldonClient

sc = SeldonClient(deployment_name="tfserving", namespace="seldon")

#### REST Request

In [44]:
@retry(stop=stop_after_delay(300), wait=wait_exponential(multiplier=1, min=0.5, max=5))
def make_prediction():
    r = sc.predict(gateway="istio", transport="rest", shape=(1, 784))
    assert r.success == True
    return r

make_prediction()

Success:True message:
Request:
meta {
}
data {
  tensor {
    shape: 1
    shape: 784
    values: 0.70807869209877061
    values: 0.757964801036474
    values: 0.987183276260433
    values: 0.0053221209467174679
    values: 0.60867939682113614
    values: 0.99882004410268055
    values: 0.87887074389099351
    values: 0.33795531517308419
    values: 0.7133529238566918
    values: 0.59299325927648328
    values: 0.51914745678942442
    values: 0.72286789044540745
    values: 0.069179062030471639
    values: 0.15727206874209632
    values: 0.46173938311783813
    values: 0.10277633871152636
    values: 0.10967092801358025
    values: 0.6191969646279295
    values: 0.015162463324234099
    values: 0.66130121510046747
    values: 0.86313736989329726
    values: 0.47957347646389625
    values: 0.69837539015901662
    values: 0.6185380094370404
    values: 0.28626048892210265
    values: 0.24020020098079731
    values: 0.78656598730170812
    values: 0.60145311367662158
    values: 0.8716253
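
The request echoed above is a Seldon protocol `tensor` with shape `[1, 784]`, filled with random values by the client. The sketch below builds an equivalent payload by hand, assuming the 784 values stand for one flattened 28x28 image. `build_tensor_payload` is a hypothetical helper, not part of `seldon_core`.

```python
import random


def build_tensor_payload(values, shape):
    """Wrap flat values in the Seldon protocol `tensor` envelope."""
    expected = 1
    for dim in shape:
        expected *= dim
    # The flat value list must contain exactly prod(shape) elements.
    if len(values) != expected:
        raise ValueError(f"expected {expected} values, got {len(values)}")
    return {"data": {"tensor": {"shape": list(shape), "values": list(values)}}}


random.seed(0)  # fixed seed so the sketch is reproducible
pixels = [random.random() for _ in range(28 * 28)]
payload = build_tensor_payload(pixels, (1, 784))
print(payload["data"]["tensor"]["shape"], len(payload["data"]["tensor"]["values"]))
```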

#### gRPC Request

In [45]:
@retry(stop=stop_after_delay(300), wait=wait_exponential(multiplier=1, min=0.5, max=5))
def make_prediction():
    r = sc.predict(gateway="istio", transport="grpc", shape=(1, 784))
    assert r.success == True
    return r

make_prediction()

Success:True message:
Request:
{'meta': {}, 'data': {'tensor': {'shape': [1, 784], 'values': [0.26120701685911873, 0.6747735987461442, 0.5463491785127622, 0.12510489417095305, 0.5028660800645561, 0.2406770476387079, 0.615085952536505, 0.7301176608965265, 0.02404964707011148, 0.9317752648954618, 0.5269328852587479, 0.42706473563140646, 0.560781446993061, 0.4815380572780623, 0.1450853029398057, 0.5974570624743423, 0.5557924650602404, 0.6673242785497608, 0.7764314992467034, 0.8384663519657212, 0.012783110826785848, 0.7637767299566334, 0.4376964086373618, 0.46159407901403116, 0.01397202357128513, 0.2627738650881558, 0.9112402400772235, 0.9715468880956024, 0.2424954473224129, 0.22871838418856827, 0.18767867030568697, 0.7027874719313539, 0.978302823876292, 0.2539619257361121, 0.4212297617816261, 0.2047723858456557, 0.39334804691375913, 0.3286485115713359, 0.630345397663343, 0.46011173032031116, 0.9444325482536657, 0.17353135040532108, 0.885502884997245, 0.446260546159743, 0.45686974829964055

Finally, we can delete the model we deployed.

In [46]:
!kubectl delete -f ./resources/mnist_rest.yaml

seldondeployment.machinelearning.seldon.io "tfserving" deleted


### Serve Tensorflow Model with Tensorflow protocol

The config file below shows how to deploy a TensorFlow model that
exposes the TensorFlow protocol.

In [47]:
%%writefile ./resources/halfplustwo_rest.yaml
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: hpt
spec:
  name: hpt
  protocol: tensorflow
  transport: rest
  predictors:
  - graph:
      children: []
      implementation: TENSORFLOW_SERVER
      modelUri: gs://seldon-models/tfserving/half_plus_two
      name:  halfplustwo
      parameters:
        - name: model_name
          type: STRING
          value: halfplustwo
    name: default
    replicas: 1

Writing ./resources/halfplustwo_rest.yaml


In [48]:
!kubectl apply -f ./resources/halfplustwo_rest.yaml

seldondeployment.machinelearning.seldon.io/hpt created


In [49]:
!kubectl wait sdep/hpt \
  --for=condition=ready \
  --timeout=120s \
  -n seldon

seldondeployment.machinelearning.seldon.io/hpt condition met


In [50]:
@retry(stop=stop_after_delay(300), wait=wait_exponential(multiplier=1, min=0.5, max=5))
def make_prediction():
   X=!curl -s -d '{"instances": [1.0, 2.0, 5.0]}' \
      -X POST http://localhost:8004/seldon/seldon/hpt/v1/models/halfplustwo/:predict \
      -H "Content-Type: application/json"
   d=json.loads("".join(X))
   assert(d["predictions"][0] == 2.5)
   return d

make_prediction()

{'predictions': [2.5, 3.0, 4.5]}
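
The `half_plus_two` model computes `0.5 * x + 2` for each input, which is why the cell above asserts that the first prediction is `2.5`. The sketch below reproduces the expected predictions locally for the same `instances`, without calling the server.

```python
def half_plus_two(x):
    """Local stand-in for the served model: y = 0.5 * x + 2."""
    return 0.5 * x + 2.0


# Same request body as the curl call above.
request_body = {"instances": [1.0, 2.0, 5.0]}

expected = {"predictions": [half_plus_two(x) for x in request_body["instances"]]}
print(expected)
```

Comparing the whole list rather than only the first element would make the assertion in the cell above stricter.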

In [51]:
@retry(stop=stop_after_delay(300), wait=wait_exponential(multiplier=1, min=0.5, max=5))
def make_prediction():
   X=!cd ../executor/proto && grpcurl \
      -d '{"model_spec":{"name":"halfplustwo"},"inputs":{"x":{"dtype": 1, "tensor_shape": {"dim":[{"size": 3}]}, "floatVal" : [1.0, 2.0, 3.0]}}}' \
      -rpc-header seldon:hpt -rpc-header namespace:seldon \
      -plaintext -proto ./prediction_service.proto \
      0.0.0.0:8004 tensorflow.serving.PredictionService/Predict
   d=json.loads("".join(X))
   assert(d["outputs"]["x"]["floatVal"][0] == 2.5)
   return d

make_prediction()

{'outputs': {'x': {'dtype': 'DT_FLOAT',
   'tensorShape': {'dim': [{'size': '3'}]},
   'floatVal': [2.5, 3, 3.5]}},
 'modelSpec': {'name': 'halfplustwo',
  'version': '123',
  'signatureName': 'serving_default'}}

In [52]:
!kubectl delete -f ./resources/halfplustwo_rest.yaml

seldondeployment.machinelearning.seldon.io "hpt" deleted


## Serve MLFlow Elasticnet Wines Model

In order to deploy MLflow models, we can leverage the [pre-packaged MLflow inference server](https://docs.seldon.io/projects/seldon-core/en/latest/servers/mlflow.html).
The exposed API can follow either:

- The default Seldon protocol. 
- The V2 protocol.

### Default Seldon protocol

We can deploy an MLflow model uploaded to an object store by using the MLflow model server implementation, as shown in the config below:

In [None]:
%%writetemplate ./resources/elasticnet_wine.yaml
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: mlflow
spec:
  name: wines
  predictors:
  - componentSpecs:
    - spec:
        # We set a high failureThreshold because installing conda dependencies
        # can take a long time, and we want to avoid k8s killing the container prematurely
        containers:
        - name: classifier
          livenessProbe:
            initialDelaySeconds: 80
            failureThreshold: 200
            periodSeconds: 5
            successThreshold: 1
            httpGet:
              path: /health/ping
              port: http
              scheme: HTTP
          readinessProbe:
            initialDelaySeconds: 80
            failureThreshold: 200
            periodSeconds: 5
            successThreshold: 1
            httpGet:
              path: /health/ping
              port: http
              scheme: HTTP
    graph:
      children: []
      implementation: MLFLOW_SERVER
      modelUri: gs://seldon-models/v{VERSION}/mlflow/elasticnet_wine
      name: classifier
    name: default
    replicas: 1

In [23]:
!kubectl apply -f ./resources/elasticnet_wine.yaml -n seldon

seldondeployment.machinelearning.seldon.io/mlflow created


In [24]:
!kubectl wait sdep/mlflow \
  --for=condition=ready \
  --timeout=120s \
  -n seldon

seldondeployment.machinelearning.seldon.io/mlflow condition met


#### REST requests

In [25]:
X=!curl -s -d '{"data": {"ndarray":[[0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0,1.1]]}}' \
   -X POST http://localhost:8004/seldon/seldon/mlflow/api/v1.0/predictions \
   -H "Content-Type: application/json"
d=json.loads(X[0])
print(d)

{'data': {'names': [], 'ndarray': [5.275558760255375]}, 'meta': {'requestPath': {'classifier': 'seldonio/mlflowserver:1.19.0-dev'}}}


In [26]:
from seldon_core.seldon_client import SeldonClient

sc = SeldonClient(deployment_name="mlflow", namespace="seldon")

In [27]:
r = sc.predict(gateway="istio", transport="rest", shape=(1, 11))
print(r)
assert r.success == True

Success:True message:
Request:
meta {
}
data {
  tensor {
    shape: 1
    shape: 11
    values: 0.39279240633315315
    values: 0.30804070327445643
    values: 0.78529499806346414
    values: 0.26345135374877937
    values: 0.17842204517493565
    values: 0.47775754930976766
    values: 0.060324943302633827
    values: 0.26301745164767087
    values: 0.683629604071006
    values: 0.90045141098129777
    values: 0.57310781025162838
  }
}

Response:
{'data': {'names': [], 'tensor': {'shape': [1], 'values': [5.226650247867405]}}, 'meta': {'requestPath': {'classifier': 'seldonio/mlflowserver:1.19.0-dev'}}}


#### gRPC Requests

In [28]:
X=!cd ../executor/proto && grpcurl -d '{"data":{"ndarray":[[0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0,1.1]]}}' \
         -rpc-header seldon:mlflow -rpc-header namespace:seldon \
         -plaintext \
         -proto ./prediction.proto  0.0.0.0:8004 seldon.protos.Seldon/Predict
d=json.loads("".join(X))
print(d)

{'meta': {'requestPath': {'classifier': 'seldonio/mlflowserver:1.19.0-dev'}}, 'data': {'ndarray': [5.275558760255375]}}


In [29]:
r = sc.predict(gateway="istio", transport="grpc", shape=(1, 11))
print(r)
assert r.success == True

Success:True message:
Request:
{'meta': {}, 'data': {'tensor': {'shape': [1, 11], 'values': [0.947317933866799, 0.48231711124722054, 0.6004800916452399, 0.833379857465156, 0.8831333949760046, 0.7230304987801658, 0.2410341482134596, 0.5973407475528579, 0.5680036340671593, 0.23218710005947252, 0.2879490396941179]}}}
Response:
{'meta': {'requestPath': {'classifier': 'seldonio/mlflowserver:1.19.0-dev'}}, 'data': {'tensor': {'shape': [1], 'values': [5.200483178159554]}}}


In [30]:
!kubectl delete -f ./resources/elasticnet_wine.yaml -n seldon

seldondeployment.machinelearning.seldon.io "mlflow" deleted


### V2 protocol

We can deploy an MLflow model exposing an API compatible with the V2 protocol by specifying the `protocol` of our `SeldonDeployment` as `v2`.
For example, we can consider the config below:

In [None]:
%%writefile ./resources/elasticnet_wine_v2.yaml
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: mlflow
spec:
  protocol: v2  # Activate v2 protocol
  name: wines
  predictors:
    - graph:
        children: []
        implementation: MLFLOW_SERVER
        modelUri: gs://seldon-models/v1.19.0-dev/mlflow/elasticnet_wine
        name: classifier
      name: default
      replicas: 1

Overwriting ./resources/elasticnet_wine_v2.yaml


In [None]:
!kubectl apply -f ./resources/elasticnet_wine_v2.yaml -n seldon

seldondeployment.machinelearning.seldon.io/mlflow created


In [19]:
!kubectl wait sdep/mlflow \
  --for=condition=ready \
  --timeout=300s \
  -n seldon

seldondeployment.machinelearning.seldon.io/mlflow condition met


Once it's deployed, we can send inference requests to our model.
Note that, since it's using the V2 Protocol, these requests will be different to the ones using the default Seldon Protocol.

In [21]:
import json

import requests

inference_request = {
    "parameters": {"content_type": "pd"},
    "inputs": [
        {
            "name": "fixed acidity",
            "shape": [1],
            "datatype": "FP32",
            "data": [7.4],
            "parameters": {"content_type": "np"},
        },
        {
            "name": "volatile acidity",
            "shape": [1],
            "datatype": "FP32",
            "data": [0.7000],
            "parameters": {"content_type": "np"},
        },
        {
            "name": "citric acid",
            "shape": [1],
            "datatype": "FP32",
            "data": [0],
            "parameters": {"content_type": "np"},
        },
        {
            "name": "residual sugar",
            "shape": [1],
            "datatype": "FP32",
            "data": [1.9],
            "parameters": {"content_type": "np"},
        },
        {
            "name": "chlorides",
            "shape": [1],
            "datatype": "FP32",
            "data": [0.076],
            "parameters": {"content_type": "np"},
        },
        {
            "name": "free sulfur dioxide",
            "shape": [1],
            "datatype": "FP32",
            "data": [11],
            "parameters": {"content_type": "np"},
        },
        {
            "name": "total sulfur dioxide",
            "shape": [1],
            "datatype": "FP32",
            "data": [34],
            "parameters": {"content_type": "np"},
        },
        {
            "name": "density",
            "shape": [1],
            "datatype": "FP32",
            "data": [0.9978],
            "parameters": {"content_type": "np"},
        },
        {
            "name": "pH",
            "shape": [1],
            "datatype": "FP32",
            "data": [3.51],
            "parameters": {"content_type": "np"},
        },
        {
            "name": "sulphates",
            "shape": [1],
            "datatype": "FP32",
            "data": [0.56],
            "parameters": {"content_type": "np"},
        },
        {
            "name": "alcohol",
            "shape": [1],
            "datatype": "FP32",
            "data": [9.4],
            "parameters": {"content_type": "np"},
        },
    ],
}

endpoint = "http://localhost:8004/seldon/seldon/mlflow/v2/models/infer"
response = requests.post(endpoint, json=inference_request)

print(json.dumps(response.json(), indent=2))
assert response.ok

{
  "model_name": "classifier",
  "model_version": "v1",
  "id": "3def86ed-86fe-4326-bb13-9d1de9c5ac22",
  "parameters": {
    "content_type": "np"
  },
  "outputs": [
    {
      "name": "output-1",
      "shape": [
        1,
        1
      ],
      "datatype": "FP64",
      "parameters": {
        "content_type": "np"
      },
      "data": [
        6.016145744177843
      ]
    }
  ]
}
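
The request above repeats the same structure for every column of the input row. The sketch below generates that structure from a plain dictionary, keeping the `pd` and `np` content types and the `FP32` datatype used above. `build_pandas_request` is a hypothetical helper, not part of MLServer or Seldon Core.

```python
def build_pandas_request(row, datatype="FP32"):
    """Build a V2 request with one single-element input per column."""
    inputs = [
        {
            "name": column,
            "shape": [1],
            "datatype": datatype,
            "data": [value],
            "parameters": {"content_type": "np"},
        }
        for column, value in row.items()
    ]
    # The request-level content type matches the one used in the cell above.
    return {"parameters": {"content_type": "pd"}, "inputs": inputs}


# Same column names and values as the request above.
wine = {
    "fixed acidity": 7.4,
    "volatile acidity": 0.7,
    "citric acid": 0,
    "residual sugar": 1.9,
    "chlorides": 0.076,
    "free sulfur dioxide": 11,
    "total sulfur dioxide": 34,
    "density": 0.9978,
    "pH": 3.51,
    "sulphates": 0.56,
    "alcohol": 9.4,
}

inference_request = build_pandas_request(wine)
print(len(inference_request["inputs"]))
```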


In [22]:
!kubectl delete -f ./resources/elasticnet_wine_v2.yaml -n seldon

seldondeployment.machinelearning.seldon.io "mlflow" deleted
