# TensorFlow Load and Benchmark Tests

This notebook uses a pretrained model for the [TensorFlow flowers dataset](https://www.tensorflow.org/datasets/catalog/tf_flowers) to:

 * Load test the model at a fixed rate
 * Benchmark the model to find its maximum throughput and see how it handles saturation
 
 ## Setup
 
  * Create a 3-node GCP cluster of n1-standard-8 nodes
  * Install Seldon Core
  
 ## TODO
 
  * gRPC
  * Run vegeta on a separate node from the model servers, using affinity/taints

In [1]:
!kubectl create namespace seldon

Error from server (AlreadyExists): namespaces "seldon" already exists


In [2]:
!kubectl config set-context $(kubectl config current-context) --namespace=seldon

Context "gke_seldon-demos_europe-west2-b_loadtest" modified.


In [4]:
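import json  # used below to pretty-print the vegeta reports
import sys
sys.path.append('../')  # make the shared vegeta_utils helpers importable
from vegeta_utils import *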

## Benchmark with Saturation Test

In [5]:
%%writefile tf_flowers.yaml
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: tf-flowers
spec:
  protocol: tensorflow
  transport: rest
  predictors:
  - graph:
      implementation: TENSORFLOW_SERVER
      modelUri: gs://kfserving-samples/models/tensorflow/flowers
      name:  flowers
      parameters:
        - name: model_name
          type: STRING
          value: flowers
    componentSpecs:
    - spec:
        containers:
        - name: flowers
          resources:
            requests:
              cpu: '2'
    name: default
    replicas: 1

Overwriting tf_flowers.yaml


In [6]:
run_model("tf_flowers.yaml")

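Run a saturation test to measure the model's maximum throughput. The measured value is truncated to a whole number of requests per second so it can be reused as the rate for the load tests.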

In [7]:
results = run_test("tf_vegeta_cfg.yaml","vegeta_max.yaml","11m")
print(json.dumps(results, indent=4))
saturation_throughput=int(results["throughput"])

{
    "latencies": {
        "total": 18288021903509,
        "mean": 5842818499,
        "50th": 5514724729,
        "90th": 7604202154,
        "95th": 8413391930,
        "99th": 10730548917,
        "max": 14035522987,
        "min": 928993664
    },
    "bytes_in": {
        "total": 682340,
        "mean": 218
    },
    "bytes_out": {
        "total": 50627750,
        "mean": 16175
    },
    "earliest": "2020-07-09T12:52:58.723093584Z",
    "latest": "2020-07-09T12:55:58.804550443Z",
    "end": "2020-07-09T12:56:04.265767959Z",
    "duration": 180081456859,
    "wait": 5461217516,
    "requests": 3130,
    "rate": 17.38102331352597,
    "throughput": 16.869434541371124,
    "success": 1,
    "status_codes": {
        "200": 3130
    },
    "errors": []
}
Max Throughput= 16


In [8]:
print("Max Throughput=",saturation_throughput)

Max Throughput= 16
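
The `rate` and `throughput` fields in the report can be cross-checked against its other fields. The sketch below assumes vegeta's reporting conventions: durations are in nanoseconds, `rate` is total requests divided by the attack `duration`, and `throughput` is successful requests divided by `duration` plus `wait`. The `summarise_report` helper is hypothetical and is not part of `vegeta_utils`; the numbers are copied from the saturation report above.

```python
NS_PER_SECOND = 1_000_000_000

def summarise_report(report):
    """Recompute rate and throughput (requests/second) from a vegeta JSON report.

    Hypothetical helper, assuming durations are in nanoseconds and
    `success` is a ratio in [0, 1].
    """
    attack_seconds = report["duration"] / NS_PER_SECOND
    wait_seconds = report["wait"] / NS_PER_SECOND
    successes = report["requests"] * report["success"]
    return {
        "rate": report["requests"] / attack_seconds,
        "throughput": successes / (attack_seconds + wait_seconds),
    }

# Values copied from the saturation report above.
saturation_report = {
    "duration": 180081456859,
    "wait": 5461217516,
    "requests": 3130,
    "success": 1,
}

summary = summarise_report(saturation_report)
# int() truncates rather than rounds, which is how the notebook derives
# its whole-number rate from the measured throughput.
print(summary, int(summary["throughput"]))
```

Because `int()` truncates, the derived rate sits slightly below the measured saturation point rather than above it.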


## Load Tests with HPA

Run the model with a Horizontal Pod Autoscaler (HPA) at the saturation rate to check:
  * How latencies are affected by scaling
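
How the replica count reacts to load influences how long requests queue during a scale-up. The Kubernetes HPA documentation describes the scaling rule as `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`, skipped when that ratio is within a tolerance of 1.0 (0.1 by default). The function below is a simplified illustration of that rule, not the autoscaler's actual implementation: it ignores stabilisation windows, pod readiness, and missing metrics. The default arguments mirror the 70% CPU target and the 1 to 5 replica bounds used in the manifest in this section.

```python
import math

def desired_replicas(current_replicas, current_utilization,
                     target_utilization=70, min_replicas=1,
                     max_replicas=5, tolerance=0.1):
    """Simplified sketch of the documented HPA scaling rule."""
    ratio = current_utilization / target_utilization
    # Within the tolerance band the autoscaler leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the bounds configured in the hpaSpec.
    return max(min_replicas, min(max_replicas, desired))

# Starting from a single replica at various average CPU utilizations (%).
for utilization in (60, 72, 140, 400):
    print(utilization, desired_replicas(1, utilization))
```

Under this rule a single replica running at twice its CPU target is scaled to two replicas, and any larger spike is capped at `maxReplicas`.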


In [11]:
%%writefile tf_flowers.yaml
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: tf-flowers
spec:
  protocol: tensorflow
  transport: rest
  predictors:
  - graph:
      implementation: TENSORFLOW_SERVER
      modelUri: gs://kfserving-samples/models/tensorflow/flowers
      name:  flowers
      parameters:
        - name: model_name
          type: STRING
          value: flowers
    componentSpecs:
    - hpaSpec:
        minReplicas: 1
        maxReplicas: 5
        metrics:
        - resource:
            name: cpu
            targetAverageUtilization: 70
          type: Resource
      spec:
        containers:
        - name: flowers
          resources:
            requests:
              cpu: '1'
          livenessProbe:
            failureThreshold: 3
            initialDelaySeconds: 60
            periodSeconds: 5
            successThreshold: 1
            tcpSocket:
              port: http
            timeoutSeconds: 5
          readinessProbe:
            failureThreshold: 3
            initialDelaySeconds: 20
            periodSeconds: 5
            successThreshold: 1
            tcpSocket:
              port: http
            timeoutSeconds: 5
    name: default
    replicas: 1

Overwriting tf_flowers.yaml


In [12]:
run_model("tf_flowers.yaml")

In [7]:
rate=saturation_throughput
duration="10m"
%env DURATION=$duration
%env RATE=$rate/1s
!cat vegeta_cfg.tmpl.yaml | envsubst > vegeta.tmp.yaml
!cat vegeta.tmp.yaml

env: DURATION=10m
env: RATE=18/1s
apiVersion: batch/v1
kind: Job
metadata:
  name: tf-load-test
spec:
  backoffLimit: 6
  parallelism: 1
  template:
    metadata:
      annotations:
        sidecar.istio.io/inject: "false"
    spec:
      containers:
        - args:
            - vegeta -cpus=4 attack -keepalive=false -duration=10m -rate=18/1s -targets=/var/vegeta/cfg
              | vegeta report -type=json
          command:
            - sh
            - -c
          image: peterevans/vegeta:latest
          imagePullPolicy: Always
          name: vegeta
          volumeMounts:
            - mountPath: /var/vegeta
              name: tf-vegeta-cfg
      restartPolicy: Never
      volumes:
        - configMap:
            defaultMode: 420
            name: tf-vegeta-cfg
          name: tf-vegeta-cfg


In [8]:
results = run_test("tf_vegeta_cfg.yaml","vegeta.tmp.yaml","11m")
print(json.dumps(results, indent=4))

{
    "latencies": {
        "total": 6235317971355,
        "mean": 577344256,
        "50th": 114263582,
        "90th": 1927105619,
        "95th": 4408763192,
        "99th": 6560764472,
        "max": 11078287773,
        "min": 80305724
    },
    "bytes_in": {
        "total": 2354400,
        "mean": 218
    },
    "bytes_out": {
        "total": 174690000,
        "mean": 16175
    },
    "earliest": "2020-07-09T07:01:17.395810851Z",
    "latest": "2020-07-09T07:11:17.340159956Z",
    "end": "2020-07-09T07:11:17.475663766Z",
    "duration": 599944349105,
    "wait": 135503810,
    "requests": 10800,
    "rate": 18.00166968171547,
    "throughput": 17.99760473133198,
    "success": 1,
    "status_codes": {
        "200": 10800
    },
    "errors": []
}


In [9]:
print_results(results)

Latencies:
	mean: 577.344256 ms
	50th: 114.263582 ms
	90th: 1927.105619 ms
	95th: 4408.763192 ms
	99th: 6560.764472 ms

Throughput: 17.99760473133198/s
Errors: False
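
Vegeta reports latencies in nanoseconds, so the millisecond figures above are the raw values divided by 10^6. The sketch below shows that conversion and adds a p99/p50 ratio as one simple way to summarise how far the tail sits from the median in the HPA run. `latencies_in_ms` is a hypothetical helper and may differ from what `print_results` in `vegeta_utils` does internally; the values are copied from the report above.

```python
NS_PER_MS = 1_000_000

def latencies_in_ms(latencies_ns):
    """Convert a vegeta `latencies` mapping from nanoseconds to milliseconds."""
    return {name: value / NS_PER_MS for name, value in latencies_ns.items()}

# Percentiles copied from the HPA load-test report above.
hpa_latencies_ns = {
    "mean": 577344256,
    "50th": 114263582,
    "90th": 1927105619,
    "95th": 4408763192,
    "99th": 6560764472,
}

hpa_latencies_ms = latencies_in_ms(hpa_latencies_ns)
for name, value in hpa_latencies_ms.items():
    print(f"{name}: {value:.3f} ms")

# Ratio of tail to median latency: a large value means a small share of
# requests were much slower than the typical request.
tail_ratio = hpa_latencies_ms["99th"] / hpa_latencies_ms["50th"]
print(f"p99/p50: {tail_ratio:.1f}")
```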
