# Example Model Servers with Seldon

Follow the [Seldon Core documentation](https://docs.seldon.io/projects/seldon-core/en/latest/) to install Seldon Core.

In [1]:
!kubectl create namespace seldon || echo "Already exists"

Error from server (AlreadyExists): namespaces "seldon" already exists
Already exists


In [2]:
!kubectl config set-context $(kubectl config current-context) --namespace=seldon

Context "kind-kind" modified.


In [3]:
import json

## Serve SKLearn Iris Model

**Note**: Seldon is no longer maintaining the Seldon and TensorFlow protocols. Instead, Seldon is adopting the industry-standard Open Inference Protocol (OIP). As part of this transition, you need to use [MLServer](https://github.com/SeldonIO/MLServer) for model serving in Seldon Core 1.

We strongly encourage you to adopt the OIP, which provides seamless integration across diverse model serving runtimes, supports the development of versatile client and benchmarking tools, and ensures a high-performance, consistent, and unified inference experience.

In order to deploy SKLearn artifacts, we can leverage the [pre-packaged SKLearn inference server](https://docs.seldon.io/projects/seldon-core/en/latest/servers/sklearn.html).
The exposed API can follow either:

- The Open Inference Protocol (OIP), also referred to as the V2 protocol.
- The default Seldon protocol.
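
The two protocols are served under different URL paths. As a rough sketch, the helper below builds the REST URLs that the requests in this notebook use. `prediction_url` is a hypothetical name, not part of any Seldon client, and the sketch assumes the ingress is reachable at `localhost:8004` and that the deployment lives in the `seldon` namespace.

```python
def prediction_url(deployment, namespace="seldon", protocol="seldon",
                   host="localhost:8004"):
    """Build the REST prediction URL for a SeldonDeployment (hypothetical helper)."""
    base = f"http://{host}/seldon/{namespace}/{deployment}"
    # Paths mirror the URLs used by the curl and requests calls in this notebook.
    paths = {
        "seldon": "/api/v1.0/predictions",
        "v2": "/v2/models/infer",
    }
    if protocol not in paths:
        raise ValueError(f"unsupported protocol: {protocol!r}")
    return base + paths[protocol]


print(prediction_url("sklearn"))
print(prediction_url("sklearn", protocol="v2"))
```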




### Default Seldon protocol

To deploy and start serving an SKLearn artifact using Seldon's default protocol, we can use a config like the one below:

In [4]:
%%writefile ../servers/sklearnserver/samples/iris.yaml
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: sklearn
spec:
  predictors:
  - graph:
      name: classifier
      implementation: SKLEARN_SERVER
      modelUri: gs://seldon-models/v1.19.0-dev/sklearn/iris
    name: default
    replicas: 1
    svcOrchSpec:
      env:
      - name: SELDON_LOG_LEVEL
        value: DEBUG

Overwriting ../servers/sklearnserver/samples/iris.yaml


We can then apply it to deploy the model to our Kubernetes cluster.

In [5]:
!kubectl apply -f ../servers/sklearnserver/samples/iris.yaml

seldondeployment.machinelearning.seldon.io/sklearn created


In [6]:
!kubectl rollout status deploy/$(kubectl get deploy -l seldon-deployment-id=sklearn -o jsonpath='{.items[0].metadata.name}')

Waiting for deployment "sklearn-default-0-classifier" rollout to finish: 0 of 1 updated replicas are available...
deployment "sklearn-default-0-classifier" successfully rolled out


Once it's deployed, we can send requests to our SKLearn model.

#### REST Requests

In [7]:
X=!curl -s -d '{"data": {"ndarray":[[1.0, 2.0, 5.0, 6.0]]}}' \
   -X POST http://localhost:8004/seldon/seldon/sklearn/api/v1.0/predictions \
   -H "Content-Type: application/json"
d=json.loads(X[0])
print(d)

{'data': {'names': ['t:0', 't:1', 't:2'], 'ndarray': [[9.912315378486718e-07, 0.0007015931307743852, 0.9992974156376878]]}, 'meta': {'requestPath': {'classifier': 'seldonio/sklearnserver:1.18.0'}}}
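
The response carries one probability per class in `ndarray`, with matching labels in `names`. A minimal sketch of picking the most likely class from such a response follows; `top_class` is a hypothetical helper, and the dictionary is copied from the output above.

```python
def top_class(response):
    """Return (class_name, probability) for the first row of a Seldon response."""
    data = response["data"]
    probs = data["ndarray"][0]
    # Index of the largest probability in the first (and only) row.
    best = max(range(len(probs)), key=probs.__getitem__)
    return data["names"][best], probs[best]


# Response copied from the REST output above.
seldon_response = {
    "data": {
        "names": ["t:0", "t:1", "t:2"],
        "ndarray": [[9.912315378486718e-07, 0.0007015931307743852, 0.9992974156376878]],
    },
    "meta": {"requestPath": {"classifier": "seldonio/sklearnserver:1.18.0"}},
}

name, prob = top_class(seldon_response)
print(name, prob)
```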


In [8]:
from seldon_core.seldon_client import SeldonClient

sc = SeldonClient(deployment_name="sklearn", namespace="seldon")

2024-01-10 16:09:43.411334: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-01-10 16:09:43.448748: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-01-10 16:09:43.527779: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-01-10 16:09:43.528677: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


In [9]:
r = sc.predict(gateway="istio", transport="rest", shape=(1, 4))
print(r)
assert r.success == True

Success:True message:
Request:
meta {
}
data {
  tensor {
    shape: 1
    shape: 4
    values: 0.17141173176497349
    values: 0.5937726115406986
    values: 0.3305595820782128
    values: 0.2046623631664879
  }
}

Response:
{'data': {'names': ['t:0', 't:1', 't:2'], 'tensor': {'shape': [1, 3], 'values': [0.41804496555456905, 0.3930727629239069, 0.1888822715215241]}}, 'meta': {'requestPath': {'classifier': 'seldonio/sklearnserver:1.18.0'}}}
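
No input data is passed to `sc.predict` here, only a `shape`, so the client sends a randomly generated tensor. This is why the request values and the predictions change from run to run. The sketch below mimics the shape of such a payload using only the standard library. `random_tensor_payload` is a hypothetical helper, and the real client builds its payload internally.

```python
import random
from math import prod


def random_tensor_payload(shape, seed=None):
    """Build a Seldon-protocol tensor payload filled with random values in [0, 1)."""
    rng = random.Random(seed)
    values = [rng.random() for _ in range(prod(shape))]
    return {"data": {"tensor": {"shape": list(shape), "values": values}}}


payload = random_tensor_payload((1, 4), seed=0)
print(payload)
```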


#### gRPC Requests

In [10]:
r = sc.predict(gateway="istio", transport="grpc", shape=(1, 4))
print(r)
assert r.success == True

Success:True message:
Request:
{'meta': {}, 'data': {'tensor': {'shape': [1, 4], 'values': [0.39156951217983293, 0.9277883706101252, 0.965137780400803, 0.7889296899083068]}}}
Response:
{'meta': {'requestPath': {'classifier': 'seldonio/sklearnserver:1.18.0'}}, 'data': {'names': ['t:0', 't:1', 't:2'], 'tensor': {'shape': [1, 3], 'values': [0.17859883395119414, 0.2418372547184021, 0.5795639113304039]}}}


In [11]:
X=!cd ../executor/proto && grpcurl -d '{"data":{"ndarray":[[1.0,2.0,5.0,6.0]]}}' \
         -rpc-header seldon:sklearn -rpc-header namespace:seldon \
         -plaintext \
         -proto ./prediction.proto  0.0.0.0:8004 seldon.protos.Seldon/Predict
d=json.loads("".join(X))
print(d)

{'meta': {'requestPath': {'classifier': 'seldonio/sklearnserver:1.18.0'}}, 'data': {'names': ['t:0', 't:1', 't:2'], 'ndarray': [[9.912315378486718e-07, 0.0007015931307743852, 0.9992974156376878]]}}


Finally, we can delete the model we deployed.

In [12]:
!kubectl delete -f ../servers/sklearnserver/samples/iris.yaml

seldondeployment.machinelearning.seldon.io "sklearn" deleted


### V2 protocol

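We can deploy an SKLearn artifact, exposing an API compatible with the V2 protocol, by specifying the `protocol` of our `SeldonDeployment` as `v2`.
For example, we can consider the config below: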

In [13]:
%%writefile ./resources/iris-sklearn-v2.yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: sklearn
spec:
  name: iris
  protocol: v2
  predictors:
  - graph:
      children: []
      implementation: SKLEARN_SERVER
      modelUri: gs://seldon-models/sklearn/iris-0.23.2/lr_model
      name: classifier
    name: default
    replicas: 1

Overwriting ./resources/iris-sklearn-v2.yaml


We can then apply it to deploy our model to our Kubernetes cluster.

In [14]:
!kubectl apply -f resources/iris-sklearn-v2.yaml

seldondeployment.machinelearning.seldon.io/sklearn created


In [15]:
!kubectl rollout status deploy/$(kubectl get deploy -l seldon-deployment-id=sklearn -o jsonpath='{.items[0].metadata.name}')

deployment "sklearn-default-0-classifier" successfully rolled out


Once it's deployed, we can send inference requests to our model.
Note that, because the deployment uses the V2 protocol, these requests differ from the ones used with the default Seldon protocol.

In [16]:
import json

import requests

inference_request = {
    "inputs": [
        {"name": "predict", "shape": [1, 4], "datatype": "FP32", "data": [[1, 2, 3, 4]]}
    ]
}

endpoint = "http://localhost:8004/seldon/seldon/sklearn/v2/models/infer"
response = requests.post(endpoint, json=inference_request)

print(json.dumps(response.json(), indent=2))
assert response.ok

{
  "model_name": "classifier",
  "model_version": "v1",
  "id": "99c6d14e-4e8b-4458-b41c-6d2922c544a4",
  "parameters": {},
  "outputs": [
    {
      "name": "predict",
      "shape": [
        1,
        1
      ],
      "datatype": "INT64",
      "parameters": {
        "content_type": "np"
      },
      "data": [
        2
      ]
    }
  ]
}
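
A V2 request wraps the input in an `inputs` list, where each entry declares its own `name`, `shape`, and `datatype`. As a rough sketch, the hypothetical helper below derives the `shape` from a list of rows so that it stays consistent with `data`. The default name `predict` and datatype `FP32` are taken from the request above.

```python
def build_v2_request(rows, name="predict", datatype="FP32"):
    """Wrap a list of equal-length rows in a single-input V2 inference request."""
    n_cols = len(rows[0])
    if any(len(row) != n_cols for row in rows):
        raise ValueError("all rows must have the same length")
    return {
        "inputs": [
            {
                "name": name,
                "shape": [len(rows), n_cols],
                "datatype": datatype,
                "data": rows,
            }
        ]
    }


request = build_v2_request([[1, 2, 3, 4]])
print(request)
```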


Finally, we can delete the model we deployed.

In [17]:
!kubectl delete -f resources/iris-sklearn-v2.yaml

seldondeployment.machinelearning.seldon.io "sklearn" deleted


## Serve XGBoost Iris Model

In order to deploy XGBoost models, we can leverage the [pre-packaged XGBoost inference server](https://docs.seldon.io/projects/seldon-core/en/latest/servers/xgboost.html).
The exposed API can follow either:

- The default Seldon protocol.
- The V2 protocol.

### Default Seldon protocol

We can deploy an XGBoost model uploaded to an object store by using the XGBoost model server implementation, as shown in the config below:

In [18]:
%%writefile resources/iris.yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: xgboost
spec:
  name: iris
  predictors:
  - graph:
      children: []
      implementation: XGBOOST_SERVER
      modelUri: gs://seldon-models/xgboost/iris
      name: classifier
    name: default
    replicas: 1

Overwriting resources/iris.yaml


We can then apply it to deploy the model to our Kubernetes cluster.

In [19]:
!kubectl apply -f resources/iris.yaml

seldondeployment.machinelearning.seldon.io/xgboost created


In [20]:
!kubectl rollout status deploy/$(kubectl get deploy -l seldon-deployment-id=xgboost -o jsonpath='{.items[0].metadata.name}')

Waiting for deployment "xgboost-default-0-classifier" rollout to finish: 0 of 1 updated replicas are available...
deployment "xgboost-default-0-classifier" successfully rolled out


#### REST Requests

In [21]:
X=!curl -s -d '{"data": {"ndarray":[[1.0, 2.0, 5.0, 6.0]]}}' \
   -X POST http://localhost:8004/seldon/seldon/xgboost/api/v1.0/predictions \
   -H "Content-Type: application/json"
d=json.loads(X[0])
print(d)

{'data': {'names': [], 'ndarray': [2.0]}, 'meta': {'requestPath': {'classifier': 'seldonio/xgboostserver:1.18.0'}}}


In [22]:
from seldon_core.seldon_client import SeldonClient

sc = SeldonClient(deployment_name="xgboost", namespace="seldon")

In [23]:
r = sc.predict(gateway="istio", transport="rest", shape=(1, 4))
print(r)
assert r.success == True

Success:True message:
Request:
meta {
}
data {
  tensor {
    shape: 1
    shape: 4
    values: 0.5059863560669698
    values: 0.6819976066125761
    values: 0.8325416286359293
    values: 0.2531200764967788
  }
}

Response:
{'data': {'names': [], 'tensor': {'shape': [1], 'values': [0.0]}}, 'meta': {'requestPath': {'classifier': 'seldonio/xgboostserver:1.18.0'}}}


#### gRPC Requests

In [24]:
r = sc.predict(gateway="istio", transport="grpc", shape=(1, 4))
print(r)
assert r.success == True

Success:True message:
Request:
{'meta': {}, 'data': {'tensor': {'shape': [1, 4], 'values': [0.3402484582438542, 0.7075322398328269, 0.2491942933114213, 0.08837167380879685]}}}
Response:
{'meta': {'requestPath': {'classifier': 'seldonio/xgboostserver:1.18.0'}}, 'data': {'tensor': {'shape': [1], 'values': [0.0]}}}


In [25]:
X=!cd ../executor/proto && grpcurl -d '{"data":{"ndarray":[[1.0,2.0,5.0,6.0]]}}' \
         -rpc-header seldon:xgboost -rpc-header namespace:seldon \
         -plaintext \
         -proto ./prediction.proto  0.0.0.0:8004 seldon.protos.Seldon/Predict
d=json.loads("".join(X))
print(d)

{'meta': {'requestPath': {'classifier': 'seldonio/xgboostserver:1.18.0'}}, 'data': {'ndarray': [2]}}


Finally, we can delete the model we deployed.

In [26]:
!kubectl delete -f resources/iris.yaml

seldondeployment.machinelearning.seldon.io "xgboost" deleted


### V2 protocol

We can deploy an XGBoost model, exposing an API compatible with the V2 protocol, by specifying the `protocol` of our `SeldonDeployment` as `v2`.
For example, we can consider the config below:

In [27]:
%%writefile ./resources/iris-xgboost-v2.yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: xgboost
spec:
  name: iris
  protocol: v2
  predictors:
  - graph:
      children: []
      implementation: XGBOOST_SERVER
      modelUri: gs://seldon-models/xgboost/iris
      name: iris
    name: default
    replicas: 1

Overwriting ./resources/iris-xgboost-v2.yaml


We can then apply it to deploy our model to our Kubernetes cluster.

In [28]:
!kubectl apply -f ./resources/iris-xgboost-v2.yaml

seldondeployment.machinelearning.seldon.io/xgboost created


In [29]:
!kubectl rollout status deploy/$(kubectl get deploy -l seldon-deployment-id=xgboost -o jsonpath='{.items[0].metadata.name}')

Waiting for deployment "xgboost-default-0-iris" rollout to finish: 0 of 1 updated replicas are available...
deployment "xgboost-default-0-iris" successfully rolled out


Once it's deployed, we can send inference requests to our model.
Note that, because the deployment uses the V2 protocol, these requests differ from the ones used with the default Seldon protocol.

In [30]:
import json

import requests

inference_request = {
    "inputs": [
        {"name": "predict", "shape": [1, 4], "datatype": "FP32", "data": [[1, 2, 3, 4]]}
    ]
}

endpoint = "http://localhost:8004/seldon/seldon/xgboost/v2/models/infer"
response = requests.post(endpoint, json=inference_request)

print(json.dumps(response.json(), indent=2))
assert response.ok

{
  "model_name": "iris",
  "model_version": "v0.1.0",
  "id": "3cab5b54-2ab7-4fe4-b9a2-7dd0f5346e2f",
  "parameters": {},
  "outputs": [
    {
      "name": "predict",
      "shape": [
        1,
        1
      ],
      "datatype": "FP32",
      "parameters": {
        "content_type": "np"
      },
      "data": [
        2.0
      ]
    }
  ]
}
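
A V2 response returns its predictions as a list of named `outputs`. A minimal sketch for collecting them into a dictionary follows; `v2_outputs` is a hypothetical helper, and the sample body is a trimmed copy of the response above.

```python
def v2_outputs(response):
    """Map each output name to its data list in a V2 inference response."""
    return {out["name"]: out["data"] for out in response["outputs"]}


# Trimmed copy of the response printed above.
response_body = {
    "model_name": "iris",
    "model_version": "v0.1.0",
    "outputs": [
        {"name": "predict", "shape": [1, 1], "datatype": "FP32", "data": [2.0]}
    ],
}

print(v2_outputs(response_body))
```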


Finally, we can delete the model we deployed.

In [31]:
!kubectl delete -f ./resources/iris-xgboost-v2.yaml

seldondeployment.machinelearning.seldon.io "xgboost" deleted


## Serve TensorFlow MNIST Model

We can deploy a TensorFlow model uploaded to an object store by using the
TensorFlow model server implementation, as shown in the configs below.

This notebook contains two examples: the first shows how you can use the
prepackaged TFServing server with the Seldon protocol, and the second shows
how you can deploy it using the TensorFlow protocol (so you can send requests
in exactly the same format as you would to a TFServing server).

### Serve TensorFlow MNIST Model with Seldon Protocol

The config file below shows how you can deploy a TensorFlow model that
exposes the Seldon protocol.

In [32]:
%%writefile ./resources/mnist_rest.yaml
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: tfserving
spec:
  name: mnist
  predictors:
  - graph:
      children: []
      implementation: TENSORFLOW_SERVER
      modelUri: gs://seldon-models/tfserving/mnist-model
      name: mnist-model
      parameters:
        - name: signature_name
          type: STRING
          value: predict_images
        - name: model_name
          type: STRING
          value: mnist-model
        - name: model_input
          type: STRING
          value: images
        - name: model_output
          type: STRING
          value: scores
    name: default
    replicas: 1

Overwriting ./resources/mnist_rest.yaml


In [33]:
!kubectl apply -f ./resources/mnist_rest.yaml

seldondeployment.machinelearning.seldon.io/tfserving created


In [34]:
!kubectl rollout status deploy/$(kubectl get deploy -l seldon-deployment-id=tfserving -o jsonpath='{.items[0].metadata.name}')

Waiting for deployment "tfserving-default-0-mnist-model" rollout to finish: 0 of 1 updated replicas are available...
deployment "tfserving-default-0-mnist-model" successfully rolled out


In [35]:
from seldon_core.seldon_client import SeldonClient

sc = SeldonClient(deployment_name="tfserving", namespace="seldon")

#### REST Request

In [36]:
r = sc.predict(gateway="istio", transport="rest", shape=(1, 784))
print(r)
assert r.success == True

Success:True message:
Request:
meta {
}
data {
  tensor {
    shape: 1
    shape: 784
    values: 0.40042030856645994
    values: 0.12706617370273143
    values: 0.5995010949374973
    values: 0.20673616604277123
    values: 0.267965079749103
    values: 0.28732441282638177
    values: 0.05431953974251236
    values: 0.9386931014283844
    values: 0.7783730270524005
    values: 0.33817086214875713
    values: 0.29145765680083213
    values: 0.44285879591906574
    values: 0.6551587968022324
    values: 0.31238417218286163
    values: 0.4256316305209036
    values: 0.6511456989870854
    values: 0.5006871383565527
    values: 0.4686249869286161
    values: 0.12155327610393096
    values: 0.37477288407329035
    values: 0.8231317995094778
    values: 0.48279144040735
    values: 0.6486814039080503
    values: 0.4618496078306792
    values: 0.849852223594817
    values: 0.5694862005802025
    values: 0.6927834049569376
    values: 0.7441208124168749
    values: 0.06791460371138147
    val

#### gRPC Request

In [37]:
r = sc.predict(gateway="istio", transport="grpc", shape=(1, 784))
print(r)
assert r.success == True

Success:True message:
Request:
{'meta': {}, 'data': {'tensor': {'shape': [1, 784], 'values': [0.36172822444251884, 0.4699208601130421, 0.04599955578582804, 0.34137886436611387, 0.40651901318089734, 0.11725893073124771, 0.08081417032125349, 0.6187537605124053, 0.6572519990872054, 0.22070654283129476, 0.14889592272489427, 0.6104922652730083, 0.10322906851201841, 0.27183310335921584, 0.5855684888514012, 0.42078354827751796, 0.672782225323545, 0.2211601879314843, 0.5768984691728823, 0.42885906901903037, 0.40884498929123125, 0.08394924428324924, 0.24106290579477363, 0.6005980762674409, 0.3923047694739721, 0.8139942032757727, 0.6311803028136543, 0.4017185415933836, 0.6317912433120589, 0.8611012609994128, 0.2575202918955518, 0.9212772313103703, 0.4970618587075495, 0.22865387697765416, 0.4863167559295244, 0.9278355712441783, 0.9985559720177586, 0.7497805206325209, 0.2175860779635055, 0.2512634254597744, 0.9202650545933119, 0.46307333534983375, 0.911638722228198, 0.10659921520125704, 0.70650090

Finally, we can delete the model we deployed.

In [38]:
!kubectl delete -f ./resources/mnist_rest.yaml

seldondeployment.machinelearning.seldon.io "tfserving" deleted


### Serve TensorFlow Model with TensorFlow protocol

The config file below shows how you can deploy a TensorFlow model that
exposes the TensorFlow protocol.

In [39]:
%%writefile ./resources/halfplustwo_rest.yaml
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: hpt
spec:
  name: hpt
  protocol: tensorflow
  transport: rest
  predictors:
  - graph:
      children: []
      implementation: TENSORFLOW_SERVER
      modelUri: gs://seldon-models/tfserving/half_plus_two
      name:  halfplustwo
      parameters:
        - name: model_name
          type: STRING
          value: halfplustwo
    name: default
    replicas: 1

Overwriting ./resources/halfplustwo_rest.yaml


In [40]:
!kubectl apply -f ./resources/halfplustwo_rest.yaml

seldondeployment.machinelearning.seldon.io/hpt created


In [41]:
!kubectl rollout status deploy/$(kubectl get deploy -l seldon-deployment-id=hpt -o jsonpath='{.items[0].metadata.name}')

Waiting for deployment "hpt-default-0-halfplustwo" rollout to finish: 0 of 1 updated replicas are available...
deployment "hpt-default-0-halfplustwo" successfully rolled out


In [42]:
import json
X=!curl -s -d '{"instances": [1.0, 2.0, 5.0]}' \
   -X POST http://localhost:8004/seldon/seldon/hpt/v1/models/halfplustwo/:predict \
   -H "Content-Type: application/json"
d=json.loads("".join(X))
print(d)
assert(d["predictions"][0] == 2.5)

{'predictions': [2.5, 3.0, 4.5]}
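
The predictions are consistent with what the model name suggests: each input is halved and then increased by two. The sketch below reproduces the expected values locally, assuming the model computes `x / 2 + 2`; the function is only an illustration and does not load the served model.

```python
def half_plus_two(instances):
    """Compute x / 2 + 2 for each input, mirroring the assumed model behaviour."""
    return [x / 2 + 2 for x in instances]


expected = half_plus_two([1.0, 2.0, 5.0])
print(expected)
```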


In [43]:
X=!cd ../executor/proto && grpcurl \
   -d '{"model_spec":{"name":"halfplustwo"},"inputs":{"x":{"dtype": 1, "tensor_shape": {"dim":[{"size": 3}]}, "floatVal" : [1.0, 2.0, 3.0]}}}' \
   -rpc-header seldon:hpt -rpc-header namespace:seldon \
   -plaintext -proto ./prediction_service.proto \
   0.0.0.0:8004 tensorflow.serving.PredictionService/Predict
d=json.loads("".join(X))
print(d)
assert(d["outputs"]["x"]["floatVal"][0] == 2.5)

{'outputs': {'x': {'dtype': 'DT_FLOAT', 'tensorShape': {'dim': [{'size': '3'}]}, 'floatVal': [2.5, 3, 3.5]}}, 'modelSpec': {'name': 'halfplustwo', 'version': '123', 'signatureName': 'serving_default'}}
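
In the JSON rendering of the gRPC response, the dimension `size` arrives as a string (`'3'`) because protobuf's JSON mapping encodes 64-bit integers as strings, and whole-number floats such as `3` lose their decimal point. A minimal sketch for normalising one output tensor follows; `parse_tf_output` is a hypothetical helper, and the dictionary is copied from the output above.

```python
def parse_tf_output(response, output_name="x"):
    """Return (shape, values) for one output tensor of a TF Serving JSON response."""
    tensor = response["outputs"][output_name]
    # int64 fields such as `size` are strings in protobuf JSON, so cast them.
    shape = [int(dim["size"]) for dim in tensor["tensorShape"]["dim"]]
    values = [float(v) for v in tensor["floatVal"]]
    return shape, values


# Response copied from the grpcurl output above.
tf_response = {
    "outputs": {
        "x": {
            "dtype": "DT_FLOAT",
            "tensorShape": {"dim": [{"size": "3"}]},
            "floatVal": [2.5, 3, 3.5],
        }
    },
    "modelSpec": {"name": "halfplustwo", "version": "123", "signatureName": "serving_default"},
}

shape, values = parse_tf_output(tf_response)
print(shape, values)
```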


In [44]:
!kubectl delete -f ./resources/halfplustwo_rest.yaml

seldondeployment.machinelearning.seldon.io "hpt" deleted


## Serve MLflow Elasticnet Wines Model

In order to deploy MLflow models, we can leverage the [pre-packaged MLflow inference server](https://docs.seldon.io/projects/seldon-core/en/latest/servers/mlflow.html).
The exposed API can follow either:

- The default Seldon protocol. 
- The V2 protocol.

### Default Seldon protocol

We can deploy an MLflow model uploaded to an object store by using the MLflow model server implementation, as shown in the config below:

In [45]:
%%writefile ./resources/elasticnet_wine.yaml
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: mlflow
spec:
  name: wines
  predictors:
  - componentSpecs:
    - spec:
        # We are setting high failureThreshold as installing conda dependencies
        # can take long time and we want to avoid k8s killing the container prematurely
        containers:
        - name: classifier
          livenessProbe:
            initialDelaySeconds: 80
            failureThreshold: 200
            periodSeconds: 5
            successThreshold: 1
            httpGet:
              path: /health/ping
              port: http
              scheme: HTTP
          readinessProbe:
            initialDelaySeconds: 80
            failureThreshold: 200
            periodSeconds: 5
            successThreshold: 1
            httpGet:
              path: /health/ping
              port: http
              scheme: HTTP
    graph:
      children: []
      implementation: MLFLOW_SERVER
      modelUri: gs://seldon-models/v1.18.0/mlflow/elasticnet_wine
      name: classifier
    name: default
    replicas: 1

Overwriting ./resources/elasticnet_wine.yaml
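
The probe settings in this config give the container a generous start-up window while conda dependencies install. As a rough estimate, which ignores probe timeouts and scheduling jitter, the kubelet waits `initialDelaySeconds` and then tolerates `failureThreshold` consecutive failures spaced `periodSeconds` apart before acting. The hypothetical helper below works out that budget for the values above.

```python
def probe_budget_seconds(initial_delay, period, failure_threshold):
    """Approximate time before a continuously failing probe exhausts its threshold."""
    return initial_delay + period * failure_threshold


budget = probe_budget_seconds(initial_delay=80, period=5, failure_threshold=200)
print(f"{budget} s (about {budget / 60:.0f} min)")
```

With the values used here, the container has roughly 18 minutes to become healthy.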


In [46]:
!kubectl apply -f ./resources/elasticnet_wine.yaml

seldondeployment.machinelearning.seldon.io/mlflow created


In [47]:
!kubectl rollout status deploy/$(kubectl get deploy -l seldon-deployment-id=mlflow -o jsonpath='{.items[0].metadata.name}')

Waiting for deployment "mlflow-default-0-classifier" rollout to finish: 0 of 1 updated replicas are available...
deployment "mlflow-default-0-classifier" successfully rolled out


#### REST requests

In [48]:
X=!curl -s -d '{"data": {"ndarray":[[0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0,1.1]]}}' \
   -X POST http://localhost:8004/seldon/seldon/mlflow/api/v1.0/predictions \
   -H "Content-Type: application/json"
d=json.loads(X[0])
print(d)

{'data': {'names': [], 'ndarray': [5.275558760255375]}, 'meta': {'requestPath': {'classifier': 'seldonio/mlflowserver:1.18.0'}}}


In [49]:
from seldon_core.seldon_client import SeldonClient

sc = SeldonClient(deployment_name="mlflow", namespace="seldon")

In [50]:
r = sc.predict(gateway="istio", transport="rest", shape=(1, 11))
print(r)
assert r.success == True

Success:True message:
Request:
meta {
}
data {
  tensor {
    shape: 1
    shape: 11
    values: 0.5653756616147143
    values: 0.8420723980045103
    values: 0.5062962018951862
    values: 0.5278571532571091
    values: 0.8664864216170011
    values: 0.45973619467487525
    values: 0.9263453669947618
    values: 0.6150362295808135
    values: 0.5089439556920178
    values: 0.9282257119213217
    values: 0.7953848591020709
  }
}

Response:
{'data': {'names': [], 'tensor': {'shape': [1], 'values': [5.244154959197477]}}, 'meta': {'requestPath': {'classifier': 'seldonio/mlflowserver:1.18.0'}}}


#### gRPC Requests

In [51]:
X=!cd ../executor/proto && grpcurl -d '{"data":{"ndarray":[[0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0,1.1]]}}' \
         -rpc-header seldon:mlflow -rpc-header namespace:seldon \
         -plaintext \
         -proto ./prediction.proto  0.0.0.0:8004 seldon.protos.Seldon/Predict
d=json.loads("".join(X))
print(d)

{'meta': {'requestPath': {'classifier': 'seldonio/mlflowserver:1.18.0'}}, 'data': {'ndarray': [5.275558760255375]}}


In [52]:
r = sc.predict(gateway="istio", transport="grpc", shape=(1, 11))
print(r)
assert r.success == True

Success:True message:
Request:
{'meta': {}, 'data': {'tensor': {'shape': [1, 11], 'values': [0.49331471179570063, 0.6224359102414585, 0.9804140644783785, 0.3380112388366434, 0.8942524358731484, 0.2837043142995588, 0.9867737754039229, 0.47015800689221765, 0.5398691845908773, 0.6674452878134565, 0.4007242201724862]}}}
Response:
{'meta': {'requestPath': {'classifier': 'seldonio/mlflowserver:1.18.0'}}, 'data': {'tensor': {'shape': [1], 'values': [5.204512993729161]}}}


In [53]:
!kubectl delete -f ./resources/elasticnet_wine.yaml

seldondeployment.machinelearning.seldon.io "mlflow" deleted


### V2 protocol

We can deploy an MLflow model, exposing an API compatible with the V2 protocol, by specifying the `protocol` of our `SeldonDeployment` as `v2`.
For example, we can consider the config below:

In [54]:
%%writefile ./resources/elasticnet_wine_v2.yaml
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: mlflow
spec:
  protocol: v2  # Activate v2 protocol
  name: wines
  predictors:
    - graph:
        children: []
        implementation: MLFLOW_SERVER
        modelUri: gs://seldon-models/v1.12.0-dev/mlflow/elasticnet_wine
        name: classifier
      name: default
      replicas: 1

Overwriting ./resources/elasticnet_wine_v2.yaml


In [55]:
!kubectl apply -f ./resources/elasticnet_wine_v2.yaml

seldondeployment.machinelearning.seldon.io/mlflow created


In [56]:
!kubectl rollout status deploy/$(kubectl get deploy -l seldon-deployment-id=mlflow -o jsonpath='{.items[0].metadata.name}')

Waiting for deployment "mlflow-default-0-classifier" rollout to finish: 0 of 1 updated replicas are available...
deployment "mlflow-default-0-classifier" successfully rolled out


Once it's deployed, we can send inference requests to our model.
Note that, because the deployment uses the V2 protocol, these requests differ from the ones used with the default Seldon protocol.

In [57]:
import json

import requests

inference_request = {
    "parameters": {"content_type": "pd"},
    "inputs": [
        {
            "name": "fixed acidity",
            "shape": [1],
            "datatype": "FP32",
            "data": [7.4],
            "parameters": {"content_type": "np"},
        },
        {
            "name": "volatile acidity",
            "shape": [1],
            "datatype": "FP32",
            "data": [0.7000],
            "parameters": {"content_type": "np"},
        },
        {
            "name": "citric acidity",
            "shape": [1],
            "datatype": "FP32",
            "data": [0],
            "parameters": {"content_type": "np"},
        },
        {
            "name": "residual sugar",
            "shape": [1],
            "datatype": "FP32",
            "data": [1.9],
            "parameters": {"content_type": "np"},
        },
        {
            "name": "chlorides",
            "shape": [1],
            "datatype": "FP32",
            "data": [0.076],
            "parameters": {"content_type": "np"},
        },
        {
            "name": "free sulfur dioxide",
            "shape": [1],
            "datatype": "FP32",
            "data": [11],
            "parameters": {"content_type": "np"},
        },
        {
            "name": "total sulfur dioxide",
            "shape": [1],
            "datatype": "FP32",
            "data": [34],
            "parameters": {"content_type": "np"},
        },
        {
            "name": "density",
            "shape": [1],
            "datatype": "FP32",
            "data": [0.9978],
            "parameters": {"content_type": "np"},
        },
        {
            "name": "pH",
            "shape": [1],
            "datatype": "FP32",
            "data": [3.51],
            "parameters": {"content_type": "np"},
        },
        {
            "name": "sulphates",
            "shape": [1],
            "datatype": "FP32",
            "data": [0.56],
            "parameters": {"content_type": "np"},
        },
        {
            "name": "alcohol",
            "shape": [1],
            "datatype": "FP32",
            "data": [9.4],
            "parameters": {"content_type": "np"},
        },
    ],
}

endpoint = "http://localhost:8004/seldon/seldon/mlflow/v2/models/infer"
response = requests.post(endpoint, json=inference_request)

print(json.dumps(response.json(), indent=2))
assert response.ok

{
  "model_name": "classifier",
  "model_version": "v1",
  "id": "732f2602-cc8a-479d-86c8-ee99bafd1a1e",
  "parameters": null,
  "outputs": [
    {
      "name": "predict",
      "shape": [
        1
      ],
      "datatype": "FP64",
      "parameters": null,
      "data": [
        6.016145744177844
      ]
    }
  ]
}
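
The request above repeats the same structure for every column: one single-value input per feature, tagged with the `np` content type, inside a request tagged with the `pd` content type. As a sketch, the same payload can be generated from a plain dictionary. `build_pd_request` is a hypothetical helper, and the column names and values are copied from the request above.

```python
def build_pd_request(row, datatype="FP32"):
    """Build a V2 request with one single-value input per DataFrame column."""
    return {
        "parameters": {"content_type": "pd"},
        "inputs": [
            {
                "name": column,
                "shape": [1],
                "datatype": datatype,
                "data": [value],
                "parameters": {"content_type": "np"},
            }
            for column, value in row.items()
        ],
    }


# Column names and values copied from the request above.
wine = {
    "fixed acidity": 7.4,
    "volatile acidity": 0.7000,
    "citric acidity": 0,
    "residual sugar": 1.9,
    "chlorides": 0.076,
    "free sulfur dioxide": 11,
    "total sulfur dioxide": 34,
    "density": 0.9978,
    "pH": 3.51,
    "sulphates": 0.56,
    "alcohol": 9.4,
}

inference_request = build_pd_request(wine)
print(len(inference_request["inputs"]))
```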


In [58]:
!kubectl delete -f ./resources/elasticnet_wine_v2.yaml

seldondeployment.machinelearning.seldon.io "mlflow" deleted
