# Python Wrapper Benchmarking

## Prerequisites

 * An authenticated Kubernetes cluster with Istio and Seldon Core installed
 * The vegeta and ghz benchmarking tools
 
 Port-forward to the Istio ingress gateway:
 
 ```
 kubectl port-forward $(kubectl get pods -l istio=ingressgateway -n istio-system -o jsonpath='{.items[0].metadata.name}') -n istio-system 8003:8080
 ```
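Before running the cells below, it can help to confirm that the forwarded port is accepting connections. The following is a minimal sketch using only the standard library. The helper name `port_is_open` is hypothetical, and the default host and port simply mirror the port-forward command above.

```python
import socket


def port_is_open(host: str = "localhost", port: int = 8003, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection raises OSError (e.g. connection refused, timeout) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `port_is_open()` returns `False` when the port-forward is not running. A `True` result only shows that something is listening on the port, not that Seldon is healthy behind it.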

In [120]:
from IPython.core.magic import register_line_cell_magic

@register_line_cell_magic
def writetemplate(line, cell):
    with open(line, 'w') as f:
        f.write(cell.format(**globals()))

In [121]:
VERSION=!cat ../../../version.txt
VERSION=VERSION[0]
VERSION

'1.10.0-dev'

In [122]:
!kubectl create namespace seldon

Error from server (AlreadyExists): namespaces "seldon" already exists


In [123]:
!helm upgrade --install seldon-core seldon-core-operator --repo https://storage.googleapis.com/seldon-charts --version 1.9.0 --namespace seldon-system --set istio.enabled="true" --set istio.gateway="seldon-gateway.istio-system.svc.cluster.local"

Release "seldon-core" has been upgraded. Happy Helming!
NAME: seldon-core
LAST DEPLOYED: Mon Jun 28 10:20:06 2021
NAMESPACE: seldon-system
STATUS: deployed
REVISION: 3
TEST SUITE: None


## Test with Predict method

The `seldontest_predict` image exposes a simple `predict` method that runs a loop with a configurable number of iterations (default 1) to simulate work. The iteration count can be set as a Seldon parameter, but here the goal is to benchmark the serialization/deserialization cost, so the model should do as little work as possible.
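To make the shape of such a model concrete, here is a hypothetical stand-in. It is not the source of `seldontest_predict`: the class name `LoopModel`, the parameter name `iterations`, and the constant return value are all assumptions made for illustration. It relies on the Seldon Core Python wrapper convention that parameters declared on the graph node are passed to the constructor as keyword arguments.

```python
class LoopModel:
    """Hypothetical stand-in for a minimal-work test model."""

    def __init__(self, iterations: int = 1):
        # Assumption: the iteration count arrives as a constructor keyword argument
        self.iterations = int(iterations)

    def predict(self, X, features_names=None, meta=None):
        # Simulate work with a simple loop; the input is deliberately ignored
        total = 0
        for i in range(self.iterations):
            total += i
        # Return a tiny constant result so response encoding stays cheap
        return [1]
```

With the default of one iteration, nearly all of the request time is spent decoding the 100,000-element request and encoding the small response, which is the cost this benchmark targets.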

In [139]:
%%writetemplate model.yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: seldon-model
  namespace: seldon
spec:
  predictors:
  - annotations:
      seldon.io/no-engine: "true"
    componentSpecs:
    - spec:
        containers:
        - image: seldonio/seldontest_predict:{VERSION}
          imagePullPolicy: IfNotPresent
          name: classifier
          resources:
            requests:
              cpu: 1
            limits:
              cpu: 1
          env:
          - name: GUNICORN_WORKERS
            value: "1"
          - name: GUNICORN_THREADS
            value: "1"
        tolerations:
        - key: model
          operator: Exists
          effect: NoSchedule
    graph:
      children: []
      name: classifier
      type: MODEL
    name: default
    replicas: 1

In [140]:
!kubectl apply -f model.yaml

seldondeployment.machinelearning.seldon.io/seldon-model created


In [141]:
!kubectl wait --for condition=ready --timeout=600s pods --all -n seldon

pod/seldon-model-default-0-classifier-5445bd4ccf-rfj8n condition met


Create payloads and the associated vegeta target files for each payload type:

  1. ndarray
  1. tensor
  1. tftensor
  
Each payload holds an array of 100,000 consecutive integers (0 to 99,999).

In [142]:
sz = 100000
vals = list(range(sz))
valStr = f"{vals}"
payload = '{"data": {"ndarray": ['+valStr+']}}'
with open("data_ndarray.json","w") as f:
    f.write(payload)
payload_tensor = '{"data":{"tensor":{"shape":[1,'+str(sz)+'],"values":'+valStr+'}}}'
with open("data_tensor.json","w") as f:
    f.write(payload_tensor)
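As a sanity check, the size of each request body can be computed directly and compared with the mean `Bytes Out` value that vegeta reports in the REST runs later in this notebook. This sketch rebuilds the two strings exactly as the cell above does; only the variable names differ.

```python
sz = 100000
vals = list(range(sz))
val_str = f"{vals}"  # Python list repr: "[0, 1, 2, ...]" with ", " separators

payload_ndarray = '{"data": {"ndarray": [' + val_str + ']}}'
payload_tensor = (
    '{"data":{"tensor":{"shape":[1,' + str(sz) + '],"values":' + val_str + '}}}'
)

# The payloads are pure ASCII, so the encoded length equals the byte count on the wire
ndarray_bytes = len(payload_ndarray.encode("ascii"))
tensor_bytes = len(payload_tensor.encode("ascii"))
print(ndarray_bytes, tensor_bytes)  # 688915 688940
```

The two JSON payloads differ by only 25 bytes, so any latency difference between `ndarray` and `tensor` comes from how the server decodes them rather than from the amount of data transferred.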

In [143]:
import json

import numpy as np
import tensorflow as tf
from google.protobuf import json_format
array = np.array(vals)
tftensor = tf.make_tensor_proto(array)
jStrTensor = json_format.MessageToJson(tftensor)
jTensor = json.loads(jStrTensor)
payload_tftensor = '{"data":{"tftensor":'+json.dumps(jTensor,separators=(',', ':'))+'}}'
with open("data_tftensor.json","w") as f:
    f.write(payload_tftensor)

In [144]:
import base64
import json
  
sample_string_bytes = payload_tensor.encode("ascii")
base64_bytes = base64.b64encode(sample_string_bytes)
base64_string = base64_bytes.decode("ascii")
jqPayload = {"method": "POST", 
             "url": "http://localhost:8003/seldon/seldon/seldon-model/api/v1.0/predictions", 
             "body": base64_string,
             "header": {"Content-Type": ["application/json"]}}
with open("vegeta_tensor.json","w") as f:
    f.write(json.dumps(jqPayload,separators=(',', ':')))
    f.write("\n")
    
sample_string_bytes = payload.encode("ascii")
base64_bytes = base64.b64encode(sample_string_bytes)
base64_string = base64_bytes.decode("ascii")
jqPayload = {"method": "POST", 
             "url": "http://localhost:8003/seldon/seldon/seldon-model/api/v1.0/predictions", 
             "body": base64_string,
             "header": {"Content-Type": ["application/json"]}}
with open("vegeta_ndarray.json","w") as f:
    f.write(json.dumps(jqPayload,separators=(',', ':')))
    f.write("\n")
    
    
sample_string_bytes = payload_tftensor.encode("ascii")
base64_bytes = base64.b64encode(sample_string_bytes)
base64_string = base64_bytes.decode("ascii")
jqPayload = {"method": "POST", 
             "url": "http://localhost:8003/seldon/seldon/seldon-model/api/v1.0/predictions", 
             "body": base64_string,
             "header": {"Content-Type": ["application/json"]}}
with open("vegeta_tftensor.json","w") as f:
    f.write(json.dumps(jqPayload,separators=(',', ':')))
    f.write("\n")
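The three blocks above repeat the same steps: base64-encode the body, wrap it in a vegeta JSON target, and write one target per line. A small helper could remove the duplication. This is a sketch only; the function names `make_vegeta_target` and `write_vegeta_target` are hypothetical.

```python
import base64
import json

URL = "http://localhost:8003/seldon/seldon/seldon-model/api/v1.0/predictions"


def make_vegeta_target(body: str, url: str = URL) -> str:
    """Build one line of a vegeta JSON-format targets file for a POST request."""
    # vegeta's JSON target format expects the request body to be base64-encoded
    encoded = base64.b64encode(body.encode("ascii")).decode("ascii")
    target = {
        "method": "POST",
        "url": url,
        "body": encoded,
        "header": {"Content-Type": ["application/json"]},
    }
    # One compact JSON object per line
    return json.dumps(target, separators=(",", ":")) + "\n"


def write_vegeta_target(path: str, body: str, url: str = URL) -> None:
    """Write a single-target file that `vegeta attack -format=json` can read."""
    with open(path, "w") as f:
        f.write(make_vegeta_target(body, url))
```

For example, `write_vegeta_target("vegeta_ndarray.json", payload)` would produce the same file as the corresponding block above.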

Smoke-test the port-forward to check that everything is working:

In [145]:
!curl -X POST -H 'Content-Type: application/json' \
   -d '@./data_tftensor.json' \
    http://localhost:8003/seldon/seldon/seldon-model/api/v1.0/predictions

{"data":{"names":[],"tftensor":{"dtype":"DT_INT64","int64Val":["1"],"tensorShape":{"dim":[{"size":"1"}]}}},"meta":{"requestPath":{"classifier":"seldonio/seldontest_predict:1.10.0-dev"}}}


Test REST

 1. ndarray
 1. tensor
 1. tftensor
 
These tests can be run locally because the results are meant to show the relative differences between payload types rather than precise timings.

In [146]:
%%bash
vegeta attack -format=json -duration=10s -rate=0 -max-workers=1 -targets=vegeta_ndarray.json | 
  vegeta report -type=text

Requests      [total, rate, throughput]         505, 50.49, 50.39
Duration      [total, attack, wait]             10.022s, 10.002s, 19.945ms
Latencies     [min, mean, 50, 90, 95, 99, max]  17.513ms, 19.841ms, 19.551ms, 21.102ms, 22.914ms, 25.205ms, 31.239ms
Bytes In      [total, mean]                     58075, 115.00
Bytes Out     [total, mean]                     347902075, 688915.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:505  
Error Set:


In [147]:
%%bash
vegeta attack -format=json -duration=10s -rate=0 -max-workers=1 -targets=vegeta_tensor.json | 
  vegeta report -type=text

Requests      [total, rate, throughput]         509, 50.85, 50.75
Duration      [total, attack, wait]             10.029s, 10.01s, 18.859ms
Latencies     [min, mean, 50, 90, 95, 99, max]  17.584ms, 19.7ms, 19.531ms, 20.669ms, 21.421ms, 24.327ms, 26.721ms
Bytes In      [total, mean]                     69733, 137.00
Bytes Out     [total, mean]                     350670460, 688940.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:509  
Error Set:


In [148]:
%%bash
vegeta attack -format=json -duration=10s -rate=0 -max-workers=1 -targets=vegeta_tftensor.json | 
  vegeta report -type=text

Requests      [total, rate, throughput]         620, 61.94, 61.84
Duration      [total, attack, wait]             10.026s, 10.01s, 16.001ms
Latencies     [min, mean, 50, 90, 95, 99, max]  13.766ms, 16.168ms, 15.876ms, 17.819ms, 18.89ms, 21.744ms, 32.567ms
Bytes In      [total, mean]                     115940, 187.00
Bytes Out     [total, mean]                     661398020, 1066771.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:620  
Error Set:
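The mean latency can be pulled out of a text report programmatically instead of being copied by hand. The sketch below parses the `Latencies` line of the text format shown above. The function names are hypothetical, and the duration parser handles only single-unit values such as `19.841ms` or `10.022s`, not compound durations such as `1m2s`.

```python
import re

# Conversion factors from duration units to milliseconds
UNIT_TO_MS = {"ns": 1e-6, "µs": 1e-3, "us": 1e-3, "ms": 1.0, "s": 1000.0}


def parse_duration_ms(text: str) -> float:
    """Convert a single-unit duration such as '19.841ms' to milliseconds."""
    match = re.fullmatch(r"([0-9.]+)(ns|µs|us|ms|s)", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return float(match.group(1)) * UNIT_TO_MS[match.group(2)]


def mean_latency_ms(report: str) -> float:
    """Return the mean latency from a `vegeta report -type=text` output."""
    for line in report.splitlines():
        if line.startswith("Latencies"):
            # Values follow the bracketed legend: min, mean, 50, 90, 95, 99, max
            values = line.split("]", 1)[1].split(",")
            return parse_duration_ms(values[1])
    raise ValueError("no Latencies line found")
```

Passing the captured output of a run as a string to `mean_latency_ms` returns the second value on the `Latencies` line in milliseconds.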


Example results (mean latency per request over REST):

| ndarray | tensor | tftensor |
| ------- | ------ | -------- |
| 19.8ms | 19.7ms | 16.2ms |

Test gRPC
 
  1. ndarray
  1. tensor
  1. tftensor

In [149]:
%%bash
ghz \
    --insecure \
    --proto ../../../proto/prediction.proto \
    --call seldon.protos.Seldon/Predict \
    --data-file=./data_ndarray.json \
    --qps=0 \
    --cpus=1 \
    --concurrency=1 \
    --duration="10s" \
    --format summary \
    --metadata='{"seldon": "seldon-model", "namespace": "seldon"}' \
    localhost:8003


Summary:
  Count:	23
  Total:	10.06 s
  Slowest:	285.51 ms
  Fastest:	254.39 ms
  Average:	253.76 ms
  Requests/sec:	2.29

Response time histogram:
  254.391 [1]	|∎∎∎∎∎∎∎
  257.502 [1]	|∎∎∎∎∎∎∎
  260.614 [6]	|∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  263.725 [3]	|∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  266.837 [5]	|∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  269.948 [0]	|
  273.060 [2]	|∎∎∎∎∎∎∎∎∎∎∎∎∎
  276.171 [1]	|∎∎∎∎∎∎∎
  279.283 [2]	|∎∎∎∎∎∎∎∎∎∎∎∎∎
  282.394 [0]	|
  285.506 [1]	|∎∎∎∎∎∎∎

Latency distribution:
  10 % in 257.61 ms 
  25 % in 258.97 ms 
  50 % in 265.91 ms 
  75 % in 272.90 ms 
  90 % in 278.27 ms 
  95 % in 285.51 ms 
  0 % in 0 ns 

Status code distribution:
  [OK]         22 responses   
  [Canceled]   1 responses    

Error distribution:
  [1]   rpc error: code = Canceled desc = grpc: the client connection is closing   



In [150]:
%%bash
ghz \
    --insecure \
    --proto ../../../proto/prediction.proto \
    --call seldon.protos.Seldon/Predict \
    --data-file=./data_tensor.json \
    --qps=0 \
    --cpus=1 \
    --concurrency=1 \
    --duration="10s" \
    --format summary \
    --metadata='{"seldon": "seldon-model", "namespace": "seldon"}' \
    localhost:8003


Summary:
  Count:	90
  Total:	10.10 s
  Slowest:	22.38 ms
  Fastest:	4.89 ms
  Average:	8.02 ms
  Requests/sec:	8.91

Response time histogram:
  4.892 [1]	|∎
  6.641 [48]	|∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  8.390 [16]	|∎∎∎∎∎∎∎∎∎∎∎∎∎
  10.139 [11]	|∎∎∎∎∎∎∎∎∎
  11.888 [3]	|∎∎∎
  13.637 [1]	|∎
  15.386 [0]	|
  17.135 [2]	|∎∎
  18.884 [2]	|∎∎
  20.633 [3]	|∎∎∎
  22.382 [2]	|∎∎

Latency distribution:
  10 % in 5.56 ms 
  25 % in 5.97 ms 
  50 % in 6.55 ms 
  75 % in 8.83 ms 
  90 % in 16.66 ms 
  95 % in 19.07 ms 
  0 % in 0 ns 

Status code distribution:
  [OK]         89 responses   
  [Canceled]   1 responses    

Error distribution:
  [1]   rpc error: code = Canceled desc = grpc: the client connection is closing   



In [151]:
%%bash
ghz \
    --insecure \
    --proto ../../../proto/prediction.proto \
    --call seldon.protos.Seldon/Predict \
    --data-file=./data_tftensor.json \
    --qps=0 \
    --cpus=1 \
    --concurrency=1 \
    --duration="10s" \
    --format summary \
    --metadata='{"seldon": "seldon-model", "namespace": "seldon"}' \
    localhost:8003


Summary:
  Count:	420
  Total:	10.05 s
  Slowest:	13.37 ms
  Fastest:	3.89 ms
  Average:	5.24 ms
  Requests/sec:	41.80

Response time histogram:
  3.890 [1]	|
  4.837 [178]	|∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  5.785 [159]	|∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  6.733 [43]	|∎∎∎∎∎∎∎∎∎∎
  7.680 [22]	|∎∎∎∎∎
  8.628 [7]	|∎∎
  9.576 [4]	|∎
  10.524 [3]	|∎
  11.471 [1]	|
  12.419 [0]	|
  13.367 [1]	|

Latency distribution:
  10 % in 4.36 ms 
  25 % in 4.61 ms 
  50 % in 4.96 ms 
  75 % in 5.48 ms 
  90 % in 6.59 ms 
  95 % in 7.34 ms 
  99 % in 10.42 ms 

Status code distribution:
  [OK]         419 responses   
  [Canceled]   1 responses     

Error distribution:
  [1]   rpc error: code = Canceled desc = grpc: the client connection is closing   



Example results (mean latency per request over gRPC):

| ndarray | tensor | tftensor |
| ------- | ------ | -------- |
| 253.8ms | 8.0ms | 5.2ms |

## Conclusions

 * gRPC is generally faster than REST, except for ndarray, which is much slower over gRPC (about 254ms versus about 20ms over REST) and should not be used with gRPC.
 * tftensor is the fastest payload type over both REST and gRPC.
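The relative differences can be made explicit with a little arithmetic. The sketch below uses the mean latencies from the vegeta and ghz outputs above. The numbers come from a single local run, so they are indicative only.

```python
# Mean latencies in milliseconds, taken from the reports above
rest_ms = {"ndarray": 19.841, "tensor": 19.7, "tftensor": 16.168}
grpc_ms = {"ndarray": 253.76, "tensor": 8.02, "tftensor": 5.24}

# A ratio above 1 means gRPC was faster than REST for that payload type
speedup = {fmt: rest_ms[fmt] / grpc_ms[fmt] for fmt in rest_ms}

for fmt, ratio in speedup.items():
    print(f"{fmt}: REST/gRPC latency ratio = {ratio:.2f}")
```

In this run gRPC is roughly 2.5 times faster than REST for tensor and roughly 3 times faster for tftensor, while ndarray is roughly 13 times slower over gRPC.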

In [152]:
!kubectl delete -f model.yaml

seldondeployment.machinelearning.seldon.io "seldon-model" deleted


## Test Predict Raw

In [158]:
%%writetemplate model.yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: seldon-model
  namespace: seldon
spec:
  predictors:
  - annotations:
      seldon.io/no-engine: "true"
    componentSpecs:
    - spec:
        containers:
        - image: seldonio/seldontest_predict_raw:{VERSION}
          imagePullPolicy: IfNotPresent
          name: classifier
          resources:
            requests:
              cpu: 1
            limits:
              cpu: 1
          env:
          - name: GUNICORN_WORKERS
            value: "1"
          - name: GUNICORN_THREADS
            value: "1"
        tolerations:
        - key: model
          operator: Exists
          effect: NoSchedule
    graph:
      children: []
      name: classifier
      type: MODEL
    name: default
    replicas: 1

In [159]:
!kubectl apply -f model.yaml

seldondeployment.machinelearning.seldon.io/seldon-model created


In [160]:
!kubectl wait --for condition=ready --timeout=600s pods --all -n seldon

pod/seldon-model-default-0-classifier-5dc8fbd597-kk7td condition met


Smoke-test the port-forward to check that everything is working:

In [161]:
!curl -X POST -H 'Content-Type: application/json' \
   -d '@./data_tftensor.json' \
    http://localhost:8003/seldon/seldon/seldon-model/api/v1.0/predictions

[1]


Test REST

 1. ndarray
 1. tensor
 1. tftensor
 
These tests can be run locally because the results are meant to show the relative differences between payload types rather than precise timings.

In [162]:
%%bash
vegeta attack -format=json -duration=10s -rate=0 -max-workers=1 -targets=vegeta_ndarray.json | 
  vegeta report -type=text

Requests      [total, rate, throughput]         724, 72.35, 72.25
Duration      [total, attack, wait]             10.021s, 10.007s, 14.458ms
Latencies     [min, mean, 50, 90, 95, 99, max]  12.228ms, 13.838ms, 13.683ms, 14.641ms, 15.489ms, 17.888ms, 22.263ms
Bytes In      [total, mean]                     2896, 4.00
Bytes Out     [total, mean]                     498774460, 688915.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:724  
Error Set:


In [163]:
%%bash
vegeta attack -format=json -duration=10s -rate=0 -max-workers=1 -targets=vegeta_tensor.json | 
  vegeta report -type=text

Requests      [total, rate, throughput]         724, 72.32, 72.22
Duration      [total, attack, wait]             10.025s, 10.011s, 14.307ms
Latencies     [min, mean, 50, 90, 95, 99, max]  12.362ms, 13.844ms, 13.701ms, 14.655ms, 15.493ms, 17.976ms, 18.802ms
Bytes In      [total, mean]                     2896, 4.00
Bytes Out     [total, mean]                     498792560, 688940.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:724  
Error Set:


In [164]:
%%bash
vegeta attack -format=json -duration=10s -rate=0 -max-workers=1 -targets=vegeta_tftensor.json | 
  vegeta report -type=text

Requests      [total, rate, throughput]         901, 90.04, 89.93
Duration      [total, attack, wait]             10.018s, 10.007s, 11.64ms
Latencies     [min, mean, 50, 90, 95, 99, max]  8.955ms, 11.116ms, 10.994ms, 12.099ms, 12.721ms, 15.208ms, 19.918ms
Bytes In      [total, mean]                     3604, 4.00
Bytes Out     [total, mean]                     961160671, 1066771.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:901  
Error Set:


Example results (mean latency per request over REST):

| ndarray | tensor | tftensor |
| ------- | ------ | -------- |
| 13.8ms | 13.8ms | 11.1ms |

Test gRPC
 
  1. ndarray
  1. tensor
  1. tftensor

In [165]:
%%bash
ghz \
    --insecure \
    --proto ../../../proto/prediction.proto \
    --call seldon.protos.Seldon/Predict \
    --data-file=./data_ndarray.json \
    --qps=0 \
    --cpus=1 \
    --concurrency=1 \
    --duration="10s" \
    --format summary \
    --metadata='{"seldon": "seldon-model", "namespace": "seldon"}' \
    localhost:8003


Summary:
  Count:	44
  Total:	10.04 s
  Slowest:	69.07 ms
  Fastest:	44.44 ms
  Average:	46.03 ms
  Requests/sec:	4.38

Response time histogram:
  44.440 [1]	|∎
  46.904 [31]	|∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  49.367 [6]	|∎∎∎∎∎∎∎∎
  51.831 [2]	|∎∎∎
  54.294 [2]	|∎∎∎
  56.758 [0]	|
  59.221 [0]	|
  61.684 [0]	|
  64.148 [0]	|
  66.611 [0]	|
  69.075 [1]	|∎

Latency distribution:
  10 % in 45.05 ms 
  25 % in 45.40 ms 
  50 % in 46.30 ms 
  75 % in 47.34 ms 
  90 % in 50.16 ms 
  95 % in 53.38 ms 
  0 % in 0 ns 

Status code distribution:
  [OK]         43 responses   
  [Canceled]   1 responses    

Error distribution:
  [1]   rpc error: code = Canceled desc = grpc: the client connection is closing   



In [166]:
%%bash
ghz \
    --insecure \
    --proto ../../../proto/prediction.proto \
    --call seldon.protos.Seldon/Predict \
    --data-file=./data_tensor.json \
    --qps=0 \
    --cpus=1 \
    --concurrency=1 \
    --duration="10s" \
    --format summary \
    --metadata='{"seldon": "seldon-model", "namespace": "seldon"}' \
    localhost:8003


Summary:
  Count:	92
  Total:	10.10 s
  Slowest:	19.81 ms
  Fastest:	4.93 ms
  Average:	7.91 ms
  Requests/sec:	9.11

Response time histogram:
  4.932 [1]	|∎
  6.419 [53]	|∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  7.907 [12]	|∎∎∎∎∎∎∎∎∎
  9.395 [5]	|∎∎∎∎
  10.882 [4]	|∎∎∎
  12.370 [1]	|∎
  13.858 [3]	|∎∎
  15.346 [3]	|∎∎
  16.833 [2]	|∎∎
  18.321 [3]	|∎∎
  19.809 [4]	|∎∎∎

Latency distribution:
  10 % in 5.21 ms 
  25 % in 5.68 ms 
  50 % in 6.04 ms 
  75 % in 8.27 ms 
  90 % in 15.77 ms 
  95 % in 19.04 ms 
  0 % in 0 ns 

Status code distribution:
  [OK]         91 responses   
  [Canceled]   1 responses    

Error distribution:
  [1]   rpc error: code = Canceled desc = grpc: the client connection is closing   



In [167]:
%%bash
ghz \
    --insecure \
    --proto ../../../proto/prediction.proto \
    --call seldon.protos.Seldon/Predict \
    --data-file=./data_tftensor.json \
    --qps=0 \
    --cpus=1 \
    --concurrency=1 \
    --duration="10s" \
    --format summary \
    --metadata='{"seldon": "seldon-model", "namespace": "seldon"}' \
    localhost:8003


Summary:
  Count:	426
  Total:	10.03 s
  Slowest:	11.74 ms
  Fastest:	3.67 ms
  Average:	5.02 ms
  Requests/sec:	42.48

Response time histogram:
  3.668 [1]	|
  4.475 [174]	|∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  5.282 [141]	|∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  6.089 [43]	|∎∎∎∎∎∎∎∎∎∎
  6.897 [30]	|∎∎∎∎∎∎∎
  7.704 [16]	|∎∎∎∎
  8.511 [6]	|∎
  9.318 [8]	|∎∎
  10.126 [2]	|
  10.933 [1]	|
  11.740 [3]	|∎

Latency distribution:
  10 % in 4.08 ms 
  25 % in 4.27 ms 
  50 % in 4.61 ms 
  75 % in 5.30 ms 
  90 % in 6.62 ms 
  95 % in 7.66 ms 
  99 % in 10.26 ms 

Status code distribution:
  [OK]         425 responses   
  [Canceled]   1 responses     

Error distribution:
  [1]   rpc error: code = Canceled desc = grpc: the client connection is closing   



Example results (mean latency per request over gRPC):

| ndarray | tensor | tftensor |
| ------- | ------ | -------- |
| 46ms | 7.9ms | 5.0ms |

## Conclusions

 * `predict_raw` is faster than `predict`, but you must handle serialization and deserialization yourself. That extra work may cancel out the gain unless your use case allows specific optimizations.
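To illustrate what handling the payload yourself involves, here is a hypothetical `predict_raw` sketch. It is not the source of `seldontest_predict_raw`: the class name `RawModel` and its response are assumptions. It covers only REST requests, where the Python wrapper passes the decoded JSON body as a dict, and it assumes a two-dimensional `ndarray`. Over gRPC the message would be a protobuf object and need different handling.

```python
class RawModel:
    """Hypothetical model that reads the request payload directly."""

    def predict_raw(self, msg):
        # Assumption: REST request, so msg is the decoded JSON body as a dict
        data = msg.get("data", {})
        if "tensor" in data:
            # tensor payloads carry a flat list of values plus a shape
            values = data["tensor"]["values"]
        elif "ndarray" in data:
            # Assumption: ndarray is a list of rows; flatten it
            values = [v for row in data["ndarray"] for v in row]
        else:
            raise ValueError("unsupported payload type in this sketch")
        # Build the response by hand: here, just the number of values received
        return {"data": {"ndarray": [len(values)]}}
```

Each branch is code that `predict` would otherwise get from the wrapper. Once every payload type is supported this way, the latency advantage may shrink.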