# TensorFlow Load and Benchmark Tests

Using a pretrained model for the [TensorFlow flowers dataset](https://www.tensorflow.org/datasets/catalog/tf_flowers), this notebook:

 * Tests the extra latency added by the service orchestrator (svcOrch, the executor) for a medium-sized image (224x224) classification model.
 * Load tests the model at a fixed rate.
 * Benchmarks the model to find its maximum throughput and how it handles saturation.
 
## Setup

 * Create a 1-node GCP cluster with an n1-standard-8 node.
 * Install Seldon Core.

In [1]:
!kubectl create namespace seldon

namespace/seldon created


In [2]:
!kubectl config set-context $(kubectl config current-context) --namespace=seldon

Context "gke_seldon-demos_europe-west4-a_cluster-2" modified.


In [3]:
import json
import time
import yaml

## TensorFlow Flowers Model - Latency Test

In [4]:
def run_model(model_name):
    """Apply a SeldonDeployment manifest and wait until it reports Available."""
    with open(model_name, 'r') as stream:
        resource = yaml.safe_load(stream)
        metaName = resource["metadata"]["name"]
        %env metaName=$metaName
        !kubectl apply -f $model_name
        # Wait for the underlying Kubernetes Deployment to finish rolling out.
        !kubectl rollout status deploy/$(kubectl get deploy -l seldon-deployment-id=$metaName \
                                 -o jsonpath='{.items[0].metadata.name}')
        # Poll the SeldonDeployment state once a second, for at most 60 attempts.
        for i in range(60):
            state=!kubectl get sdep $metaName -o jsonpath='{.status.state}'
            state=state[0]
            print(state)
            if state=="Available":
                break
            time.sleep(1)
def run_test(vegeta_cfg,vegeta_job,wait_time):
    """Run a vegeta load-test Job and return its JSON report as a dict."""
    with open(vegeta_job, 'r') as stream:
        resource = yaml.safe_load(stream)
        metaName = resource["metadata"]["name"]
        %env metaName=$metaName
        !kubectl apply -f $vegeta_cfg
        !kubectl create -f $vegeta_job
        # Wait on the Job named in the manifest rather than a hard-coded name.
        !kubectl wait --for=condition=complete --timeout=$wait_time job/$metaName
        # The Job's pod prints the vegeta JSON report, which is parsed from the first log line.
        raw=!kubectl logs $(kubectl get pods -l job-name=$metaName -o  jsonpath='{.items[0].metadata.name}')
        results = json.loads(raw[0])
        !kubectl delete -f $vegeta_cfg
        !kubectl delete -f $vegeta_job
        return results

def print_results(results):
    print("Latencies:")
    print("\tmean:",results["latencies"]["mean"]/1e6,"ms")
    print("\t50th:",results["latencies"]["50th"]/1e6,"ms")
    print("\t90th:",results["latencies"]["90th"]/1e6,"ms")
    print("\t95th:",results["latencies"]["95th"]/1e6,"ms")
    print("\t99th:",results["latencies"]["99th"]/1e6,"ms")
    print("")
    print("Throughput:",str(results["throughput"])+"/s")
    print("Errors:",len(results["errors"])>0)
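The polling loop in `run_model` can be factored into a small reusable helper. The following is a minimal sketch and not part of the original notebook: `wait_for_state` and its parameters are hypothetical names, and the state lookup is passed in as a callable so the helper can be exercised without a cluster.

```python
import time


def wait_for_state(get_state, target="Available", attempts=60, delay=1.0, sleep=time.sleep):
    """Poll get_state() until it returns target or the attempts run out.

    Returns the list of observed states so the caller can inspect transitions.
    """
    seen = []
    for _ in range(attempts):
        state = get_state()
        seen.append(state)
        if state == target:
            break
        sleep(delay)
    return seen


# A fake state source stands in for
# `kubectl get sdep <name> -o jsonpath='{.status.state}'`.
fake_states = iter(["Failed", "Available"])
observed = wait_for_state(lambda: next(fake_states), sleep=lambda _: None)
print(observed)
```

Returning the observed states makes it easy to tell a deployment that never became available (the last entry is not `Available`) from one that recovered after a transient `Failed` state.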

In [5]:
%%writefile tf_flowers.yaml
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: tf-flowers
spec:
  protocol: tensorflow
  transport: rest
  predictors:
  - graph:
      implementation: TENSORFLOW_SERVER
      modelUri: gs://kfserving-samples/models/tensorflow/flowers
      name:  flowers
      parameters:
        - name: model_name
          type: STRING
          value: flowers
    componentSpecs:
    - spec:
        containers:
        - name: flowers
          resources:
            requests:
              cpu: '2'
    name: default
    replicas: 1

Overwriting tf_flowers.yaml


In [6]:
run_model("tf_flowers.yaml")

env: metaName=tf-flowers
seldondeployment.machinelearning.seldon.io/tf-flowers created
Waiting for deployment "tf-flowers-default-0-flowers" rollout to finish: 0 of 1 updated replicas are available...
deployment "tf-flowers-default-0-flowers" successfully rolled out
Available


In [10]:
results = run_test("tf_vegeta_cfg.yaml","vegeta_1worker.yaml","21m")
print(json.dumps(results, indent=4))
mean_with_executor=results["latencies"]["mean"]

env: metaName=tf-load-test
configmap/tf-vegeta-cfg unchanged
job.batch/tf-load-test created
job.batch/tf-load-test condition met
configmap "tf-vegeta-cfg" deleted
job.batch "tf-load-test" deleted
{
    "latencies": {
        "total": 1200035534611,
        "mean": 77803133,
        "50th": 76854636,
        "90th": 82629749,
        "95th": 85047245,
        "99th": 89765289,
        "max": 127443810,
        "min": 71799948
    },
    "bytes_in": {
        "total": 3347008,
        "mean": 217
    },
    "bytes_out": {
        "total": 249483200,
        "mean": 16175
    },
    "earliest": "2020-07-04T18:03:08.918608479Z",
    "latest": "2020-07-04T18:23:08.921688249Z",
    "end": "2020-07-04T18:23:08.998265316Z",
    "duration": 1200003079770,
    "wait": 76577067,
    "requests": 15424,
    "rate": 12.853300345659328,
    "throughput": 12.852480176734598,
    "success": 1,
    "status_codes": {
        "200": 15424
    },
    "errors": []
}


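## TensorFlow Flowers Model - No executor - Latency Test
This test repeats the previous latency test with the executor disabled (the `seldon.io/no-engine` annotation). Comparing the two mean latencies shows that the executor adds around 1-2 ms to each request.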

In [11]:
%%writefile tf_flowers.yaml
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: tf-flowers
spec:
  protocol: tensorflow
  transport: rest
  predictors:
  - graph:
      implementation: TENSORFLOW_SERVER
      modelUri: gs://kfserving-samples/models/tensorflow/flowers
      name:  flowers
      parameters:
        - name: model_name
          type: STRING
          value: flowers
    annotations:
        seldon.io/no-engine: "true"
    componentSpecs:
    - spec:
        containers:
        - name: flowers
          resources:
            requests:
              cpu: '2'
    name: default
    replicas: 1

Overwriting tf_flowers.yaml


In [12]:
run_model("tf_flowers.yaml")

env: metaName=tf-flowers
seldondeployment.machinelearning.seldon.io/tf-flowers configured
Waiting for deployment "tf-flowers-default-0-flowers" rollout to finish: 1 old replicas are pending termination...
Waiting for deployment "tf-flowers-default-0-flowers" rollout to finish: 1 old replicas are pending termination...
deployment "tf-flowers-default-0-flowers" successfully rolled out
Failed
Available


In [18]:
results = run_test("tf_standalone_vegeta_cfg.yaml","vegeta_1worker.yaml","21m")
print(json.dumps(results, indent=4))
mean_no_executor=results["latencies"]["mean"]

env: metaName=tf-load-test
configmap/tf-vegeta-cfg created
job.batch/tf-load-test created
job.batch/tf-load-test condition met
configmap "tf-vegeta-cfg" deleted
job.batch "tf-load-test" deleted
{
    "latencies": {
        "total": 1200069382510,
        "mean": 76059664,
        "50th": 75062613,
        "90th": 80622085,
        "95th": 83526579,
        "99th": 88975382,
        "max": 101852309,
        "min": 70775664
    },
    "bytes_in": {
        "total": 3423826,
        "mean": 217
    },
    "bytes_out": {
        "total": 255209150,
        "mean": 16175
    },
    "earliest": "2020-07-04T18:57:37.737338947Z",
    "latest": "2020-07-04T19:17:37.779148449Z",
    "end": "2020-07-04T19:17:37.854295243Z",
    "duration": 1200041809502,
    "wait": 75146794,
    "requests": 15778,
    "rate": 13.147875244903044,
    "throughput": 13.147051974581446,
    "success": 1,
    "status_codes": {
        "200": 15778
    },
    "errors": []
}


If either report above contains errors, rerun the test: network errors can skew the latency comparison.

In [20]:
diff = (mean_with_executor - mean_no_executor) / 1e6
print("Diff in ms",diff)

Diff in ms 1.743469
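The mean is only one view of the overhead. As a rough cross-check, the same subtraction can be applied to each reported percentile. This is a sketch using the latency values copied from the two reports above; `overhead_ms` is a hypothetical helper name.

```python
# Latencies in nanoseconds, copied from the two vegeta reports above.
with_executor = {"mean": 77803133, "50th": 76854636, "90th": 82629749,
                 "95th": 85047245, "99th": 89765289}
no_executor = {"mean": 76059664, "50th": 75062613, "90th": 80622085,
               "95th": 83526579, "99th": 88975382}


def overhead_ms(with_exec, without_exec):
    """Return the per-key latency difference in milliseconds."""
    return {key: (with_exec[key] - without_exec[key]) / 1e6 for key in with_exec}


overhead = overhead_ms(with_executor, no_executor)
for key, value in overhead.items():
    print(f"{key}: {value:.3f} ms")
```

For these two runs the differences fall between roughly 0.8 ms and 2.0 ms. Each figure comes from a single run per configuration, so they are indicative rather than statistically robust.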


## Benchmark with Saturation Test

In [21]:
%%writefile tf_flowers.yaml
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: tf-flowers
spec:
  protocol: tensorflow
  transport: rest
  predictors:
  - graph:
      implementation: TENSORFLOW_SERVER
      modelUri: gs://kfserving-samples/models/tensorflow/flowers
      name:  flowers
      parameters:
        - name: model_name
          type: STRING
          value: flowers
    componentSpecs:
    - spec:
        containers:
        - name: flowers
          resources:
            requests:
              cpu: '2'
    name: default
    replicas: 1

Overwriting tf_flowers.yaml


In [22]:
run_model("tf_flowers.yaml")

env: metaName=tf-flowers
seldondeployment.machinelearning.seldon.io/tf-flowers configured
Waiting for deployment "tf-flowers-default-0-flowers" rollout to finish: 1 old replicas are pending termination...
Waiting for deployment "tf-flowers-default-0-flowers" rollout to finish: 1 old replicas are pending termination...
deployment "tf-flowers-default-0-flowers" successfully rolled out
Failed
Available


In [23]:
results = run_test("tf_vegeta_cfg.yaml","vegeta_max.yaml","11m")
print(json.dumps(results, indent=4))
saturation_throughput=int(results["throughput"])
print("Max Throughput=",saturation_throughput)

env: metaName=tf-load-test
configmap/tf-vegeta-cfg created
job.batch/tf-load-test created
job.batch/tf-load-test condition met
configmap "tf-vegeta-cfg" deleted
job.batch "tf-load-test" deleted
{
    "latencies": {
        "total": 6230883133856,
        "mean": 4591660378,
        "50th": 4229617284,
        "90th": 6169903843,
        "95th": 7032964423,
        "99th": 7872317613,
        "max": 8901616694,
        "min": 886076277
    },
    "bytes_in": {
        "total": 294469,
        "mean": 217
    },
    "bytes_out": {
        "total": 21949475,
        "mean": 16175
    },
    "earliest": "2020-07-04T19:31:56.520502929Z",
    "latest": "2020-07-04T19:32:56.548603582Z",
    "end": "2020-07-04T19:33:00.898401339Z",
    "duration": 60028100653,
    "wait": 4349797757,
    "requests": 1357,
    "rate": 22.60607924019301,
    "throughput": 21.078662608054525,
    "success": 1,
    "status_codes": {
        "200": 1357
    },
    "errors": []
}
Max Throughput= 21
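The report distinguishes `rate` (requests sent per second of attack) from `throughput` (successful responses per second, measured over the attack duration plus the wait for the final response). The sketch below recomputes both from the fields of the saturation report as a sanity check. The function name `rate_and_throughput` is illustrative, and the formulas reflect my understanding of how vegeta derives these fields rather than a quotation of its source.

```python
def rate_and_throughput(report):
    """Recompute rate and throughput from a vegeta JSON report (times in ns)."""
    duration_s = report["duration"] / 1e9
    wait_s = report["wait"] / 1e9
    successes = report["success"] * report["requests"]
    rate = report["requests"] / duration_s
    throughput = successes / (duration_s + wait_s)
    return rate, throughput


# Fields copied from the saturation test report.
saturation = {"duration": 60028100653, "wait": 4349797757,
              "requests": 1357, "success": 1}
rate, throughput = rate_and_throughput(saturation)
print(f"rate={rate:.3f}/s throughput={throughput:.3f}/s")
```

The gap between the two numbers comes from the several seconds spent waiting for the last responses after the attack stopped sending, which is what a saturated server looks like from the client side.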


## Load Tests with HPA

Run a load test at twice the saturation throughput with an HPA enabled to check:
  * How scaling affects latencies
  * Whether any requests fail


In [24]:
%%writefile tf_flowers.yaml
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: tf-flowers
spec:
  protocol: tensorflow
  transport: rest
  predictors:
  - graph:
      implementation: TENSORFLOW_SERVER
      modelUri: gs://kfserving-samples/models/tensorflow/flowers
      name:  flowers
      parameters:
        - name: model_name
          type: STRING
          value: flowers
    componentSpecs:
    - hpaSpec:
        minReplicas: 1
        maxReplicas: 5
        metrics:
        - resource:
            name: cpu
            targetAverageUtilization: 70
          type: Resource
      spec:
        containers:
        - name: flowers
          resources:
            requests:
              cpu: '1'
          livenessProbe:
            failureThreshold: 3
            initialDelaySeconds: 60
            periodSeconds: 5
            successThreshold: 1
            tcpSocket:
              port: http
            timeoutSeconds: 5
          readinessProbe:
            failureThreshold: 3
            initialDelaySeconds: 20
            periodSeconds: 5
            successThreshold: 1
            tcpSocket:
              port: http
            timeoutSeconds: 5
    name: default
    replicas: 1

Overwriting tf_flowers.yaml
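To reason about how the `hpaSpec` above might react under load, the scaling rule documented for the Kubernetes Horizontal Pod Autoscaler can be approximated as `ceil(currentReplicas * currentUtilization / targetUtilization)`, clamped to the configured bounds. The sketch below is a simplification: it ignores the autoscaler's tolerance band and stabilization behavior, `desired_replicas` is a hypothetical helper, and the utilization readings are made-up inputs rather than measurements from this test.

```python
import math


def desired_replicas(current_replicas, current_utilization, target_utilization=70,
                     min_replicas=1, max_replicas=5):
    """Approximate the HPA rule: ceil(current * observed / target), clamped to bounds."""
    raw = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, raw))


# Hypothetical CPU utilization readings, as a percentage of the 1-CPU request.
for utilization in (35, 140, 400):
    print(utilization, "->", desired_replicas(1, utilization))
```

With `maxReplicas: 5`, any sustained utilization that would call for more than five replicas is capped, so the deployment cannot scale past that point however high the offered load is.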


In [25]:
run_model("tf_flowers.yaml")

env: metaName=tf-flowers
seldondeployment.machinelearning.seldon.io/tf-flowers configured
Waiting for deployment "tf-flowers-default-0-flowers" rollout to finish: 1 old replicas are pending termination...
Waiting for deployment "tf-flowers-default-0-flowers" rollout to finish: 1 old replicas are pending termination...
deployment "tf-flowers-default-0-flowers" successfully rolled out
Available


In [26]:
rate=saturation_throughput*2
duration="10m"
%env DURATION=$duration
%env RATE=$rate/1s
!cat vegeta_cfg.tmpl.yaml | envsubst > vegeta.tmp.yaml
!cat vegeta.tmp.yaml

env: DURATION=10m
env: RATE=42/1s
apiVersion: batch/v1
kind: Job
metadata:
  name: tf-load-test
spec:
  backoffLimit: 6
  parallelism: 1
  template:
    metadata:
      annotations:
        sidecar.istio.io/inject: "false"
    spec:
      containers:
        - args:
            - vegeta -cpus=4 attack -keepalive=false -duration=10m -rate=42/1s -targets=/var/vegeta/cfg
              | vegeta report -type=json
          command:
            - sh
            - -c
          image: peterevans/vegeta:latest
          imagePullPolicy: Always
          name: vegeta
          volumeMounts:
            - mountPath: /var/vegeta
              name: tf-vegeta-cfg
      restartPolicy: Never
      volumes:
        - configMap:
            defaultMode: 420
            name: tf-vegeta-cfg
          name: tf-vegeta-cfg
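The `envsubst` step fills `DURATION` and `RATE` into the Job template. The same substitution can be sketched in pure Python with `string.Template`. The template text below is an assumption reconstructed from the rendered output, since the contents of `vegeta_cfg.tmpl.yaml` are not shown in this notebook.

```python
from string import Template

# Assumed fragment of vegeta_cfg.tmpl.yaml; the real template is not shown here.
template = Template(
    "vegeta -cpus=4 attack -keepalive=false -duration=${DURATION} "
    "-rate=${RATE} -targets=/var/vegeta/cfg"
)

saturation_throughput = 21  # value measured in the saturation test
command = template.safe_substitute(
    DURATION="10m",
    RATE=f"{saturation_throughput * 2}/1s",
)
print(command)
```

`safe_substitute` leaves any unknown `$` placeholders untouched instead of raising, which is closer to how `envsubst` treats a template than `substitute` would be.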


In [27]:
results = run_test("tf_vegeta_cfg.yaml","vegeta.tmp.yaml","11m")
print(json.dumps(results, indent=4))

env: metaName=tf-load-test
configmap/tf-vegeta-cfg created
job.batch/tf-load-test created
job.batch/tf-load-test condition met
configmap "tf-vegeta-cfg" deleted
job.batch "tf-load-test" deleted
{
    "latencies": {
        "total": 149270246390025,
        "mean": 5923422475,
        "50th": 300207052,
        "90th": 30001304102,
        "95th": 30032935844,
        "99th": 30541862186,
        "max": 32350943067,
        "min": 72003155
    },
    "bytes_in": {
        "total": 4678954,
        "mean": 185.67277777777778
    },
    "bytes_out": {
        "total": 348991800,
        "mean": 13848.880952380952
    },
    "earliest": "2020-07-04T19:38:42.009484454Z",
    "latest": "2020-07-04T19:48:41.989427412Z",
    "end": "2020-07-04T19:48:42.179397209Z",
    "duration": 599979942958,
    "wait": 189969797,
    "requests": 25200,
    "rate": 42.00140403987481,
    "throughput": 35.9264927177414,
    "success": 0.8556349206349206,
    "status_codes": {
        "0": 3624,
        "200"

In [28]:
print_results(results)

Latencies:
	mean: 5923.422475 ms
	50th: 300.207052 ms
	90th: 30001.304102 ms
	95th: 30032.935844 ms
	99th: 30541.862186 ms

Throughput: 35.9264927177414/s
Errors: True
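`print_results` only reports whether any errors occurred. To quantify them, the success ratio and request count from the report can be combined. This is a minimal sketch; `failure_summary` is a hypothetical helper and not part of the notebook.

```python
def failure_summary(report):
    """Return (succeeded, failed) request counts from a vegeta JSON report."""
    succeeded = round(report["success"] * report["requests"])
    failed = report["requests"] - succeeded
    return succeeded, failed


# Fields copied from the HPA load test report.
hpa_report = {"requests": 25200, "success": 0.8556349206349206}
succeeded, failed = failure_summary(hpa_report)
print(f"succeeded={succeeded} failed={failed} ({failed / hpa_report['requests']:.1%})")
```

The 90th percentile and above sit at roughly 30 s, which is consistent with requests hitting a client-side timeout (vegeta's default timeout is 30 s, to my knowledge), and status code `0` is how vegeta records requests that received no HTTP response.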
