# Service Orchestrator Benchmark Tests

These tests use a pretrained model for the [TensorFlow flowers dataset](https://www.tensorflow.org/datasets/catalog/tf_flowers).

 * Tests the extra latency added by the service orchestrator (svcOrch, the executor in the tests below) for a medium-sized image (224x224) classification model.

## Setup

 * Create a 3-node GCP cluster of n1-standard-8 nodes
 * Install Seldon Core

In [1]:
!kubectl create namespace seldon

Error from server (AlreadyExists): namespaces "seldon" already exists


In [2]:
!kubectl config set-context $(kubectl config current-context) --namespace=seldon

Context "gke_seldon-demos_europe-west2-b_loadtest" modified.


In [11]:
import json  # used by json.dumps in the test cells below
import sys

sys.path.append('../')
from vegeta_utils import *

## Tensorflow Flowers Model - Latency Test

In [12]:
%%writefile tf_flowers.yaml
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: tf-flowers
spec:
  protocol: tensorflow
  transport: rest
  predictors:
  - graph:
      implementation: TENSORFLOW_SERVER
      modelUri: gs://kfserving-samples/models/tensorflow/flowers
      name:  flowers
      parameters:
        - name: model_name
          type: STRING
          value: flowers
    componentSpecs:
    - spec:
        containers:
        - name: flowers
          resources:
            requests:
              cpu: '2'
    name: default
    replicas: 1

Overwriting tf_flowers.yaml


In [13]:
run_model("tf_flowers.yaml")

Available


In [14]:
results = run_test("tf_vegeta_cfg.yaml","vegeta_1worker.yaml","21m")
print(json.dumps(results, indent=4))
mean_with_executor=results["latencies"]["mean"]

{
    "latencies": {
        "total": 1200122285781,
        "mean": 87434233,
        "50th": 86289248,
        "90th": 92084428,
        "95th": 94640378,
        "99th": 100283524,
        "max": 744456093,
        "min": 81534907
    },
    "bytes_in": {
        "total": 2992268,
        "mean": 218
    },
    "bytes_out": {
        "total": 222018050,
        "mean": 16175
    },
    "earliest": "2020-07-09T07:33:37.975420123Z",
    "latest": "2020-07-09T07:53:38.044683957Z",
    "end": "2020-07-09T07:53:38.143936319Z",
    "duration": 1200069263834,
    "wait": 99252362,
    "requests": 13726,
    "rate": 11.437673152421189,
    "throughput": 11.436727271854549,
    "success": 1,
    "status_codes": {
        "200": 13726
    },
    "errors": []
}
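Vegeta reports latencies in nanoseconds, which makes the figures above hard to read at a glance. The following is a minimal sketch that converts them to milliseconds. The values are copied from the report above; the helper name `to_ms` is hypothetical and is not part of `vegeta_utils`.

```python
# Latency summary copied from the report above (values in nanoseconds).
latencies_ns = {
    "mean": 87434233,
    "50th": 86289248,
    "90th": 92084428,
    "95th": 94640378,
    "99th": 100283524,
    "max": 744456093,
    "min": 81534907,
}


def to_ms(latencies):
    """Return a copy of a latency dict with nanoseconds converted to milliseconds.

    Hypothetical helper for illustration only.
    """
    return {name: value / 1e6 for name, value in latencies.items()}


latencies_ms = to_ms(latencies_ns)
for name, value in latencies_ms.items():
    print(f"{name:>5}: {value:8.2f} ms")
```

Read this way, the mean is about 87.4 ms, while the single worst request took about 744 ms.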


## Tensorflow Flowers Model - No executor - Latency Test
This test runs the same model without the executor. Comparing its mean latency with the previous test shows how much the executor adds to each request, which is expected to be a few milliseconds.

In [15]:
%%writefile tf_flowers.yaml
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: tf-flowers
spec:
  protocol: tensorflow
  transport: rest
  predictors:
  - graph:
      implementation: TENSORFLOW_SERVER
      modelUri: gs://kfserving-samples/models/tensorflow/flowers
      name:  flowers
      parameters:
        - name: model_name
          type: STRING
          value: flowers
    annotations:
        seldon.io/no-engine: "true"
    componentSpecs:
    - spec:
        containers:
        - name: flowers
          resources:
            requests:
              cpu: '2'
    name: default
    replicas: 1

Overwriting tf_flowers.yaml


In [16]:
run_model("tf_flowers.yaml")

Failed
Available


In [17]:
results = run_test("tf_standalone_vegeta_cfg.yaml","vegeta_1worker.yaml","21m")
print(json.dumps(results, indent=4))
mean_no_executor=results["latencies"]["mean"]

{
    "latencies": {
        "total": 1200085544000,
        "mean": 83834128,
        "50th": 82507619,
        "90th": 89062704,
        "95th": 94367150,
        "99th": 100908596,
        "max": 887568018,
        "min": 76060432
    },
    "bytes_in": {
        "total": 3120670,
        "mean": 218
    },
    "bytes_out": {
        "total": 231545125,
        "mean": 16175
    },
    "earliest": "2020-07-09T07:59:00.137250198Z",
    "latest": "2020-07-09T08:19:00.19068938Z",
    "end": "2020-07-09T08:19:00.273006669Z",
    "duration": 1200053439182,
    "wait": 82317289,
    "requests": 14315,
    "rate": 11.928635452899185,
    "throughput": 11.927817268017469,
    "success": 1,
    "status_codes": {
        "200": 14315
    },
    "errors": []
}


If either run above reports errors, rerun it, because network errors can skew the results.
The observed difference is between 2 and 4 ms.
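The check for errors can be automated rather than done by eye. The sketch below assumes the report has the `success`, `status_codes` and `errors` fields shown in the output above. The function name `is_clean_run` is hypothetical, and the `failed` report is made-up data used only to show the negative case.

```python
def is_clean_run(results):
    """Return True when a Vegeta-style report shows no failed requests.

    Hypothetical helper: checks the success ratio, the error list and
    that every response had HTTP status 200.
    """
    only_200 = set(results.get("status_codes", {})) == {"200"}
    no_errors = not results.get("errors")
    return bool(results.get("success") == 1 and no_errors and only_200)


# Fields copied from the no-executor report above.
sample = {"success": 1, "status_codes": {"200": 14315}, "errors": []}

# Made-up report illustrating a run that should be repeated.
failed = {
    "success": 0.98,
    "status_codes": {"200": 980, "0": 20},
    "errors": ["connection refused"],
}

print(is_clean_run(sample), is_clean_run(failed))
```

Calling such a check on each `results` dictionary before computing the difference would avoid comparing a clean run against a skewed one.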

In [18]:
diff = (mean_with_executor - mean_no_executor) / 1e6
print("Diff in ms",diff)

Diff in ms 3.600105
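The mean is only one view of the overhead. As a rough sketch, the same subtraction can be applied to each percentile of the two reports above. The values are copied from those reports; the name `overhead_ms` is hypothetical.

```python
# Latencies in nanoseconds, copied from the two reports above.
with_executor = {
    "mean": 87434233,
    "50th": 86289248,
    "90th": 92084428,
    "95th": 94640378,
    "99th": 100283524,
}
no_executor = {
    "mean": 83834128,
    "50th": 82507619,
    "90th": 89062704,
    "95th": 94367150,
    "99th": 100908596,
}


def overhead_ms(with_ns, without_ns):
    """Per-statistic latency difference in milliseconds (hypothetical helper)."""
    return {key: (with_ns[key] - without_ns[key]) / 1e6 for key in with_ns}


overhead = overhead_ms(with_executor, no_executor)

# Mean overhead relative to the no-executor baseline.
relative_mean = (with_executor["mean"] - no_executor["mean"]) / no_executor["mean"]

for key, value in overhead.items():
    print(f"{key:>5}: {value:+.3f} ms")
print(f"relative mean overhead: {relative_mean:.1%}")
```

For these two runs the mean and median differ by roughly 3.6 to 3.8 ms, the 95th percentile is almost identical, and the 99th percentile is slightly lower with the executor. This suggests the tail is dominated by run-to-run noise rather than by the executor, although a single pair of runs is not enough to say so with confidence.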
