## Benchmarking with Argo Workflows & Vegeta

In this notebook we will dive into how you can run benchmarking as a batch process with Argo Workflows, Seldon Core and Vegeta.

Dependencies:

* Seldon Core installed as per the docs with Istio as an ingress
* Argo Workflows installed in the cluster (and the argo CLI for commands)


## Setup

### Install Seldon Core
Use the notebook to [set-up Seldon Core with Ambassador or Istio Ingress](https://docs.seldon.io/projects/seldon-core/en/latest/examples/seldon_core_setup.html).

Note: If running with KIND you need to make sure to follow [these steps](https://github.com/argoproj/argo/issues/2376#issuecomment-595593237) as a workaround to the `/.../docker.sock` known issue.


### Install Argo Workflows
You can follow the instructions from the official [Argo Workflows Documentation](https://github.com/argoproj/argo#quickstart).

You also need to make sure that Argo has permissions to create Seldon Deployments. For this you can create a default-admin rolebinding as follows:

In [2]:
!kubectl create rolebinding default-admin --clusterrole=admin --serviceaccount=default:default

rolebinding.rbac.authorization.k8s.io/default-admin created


In [117]:
def get_results(results, print_results=True):
    """Normalise a ghz (gRPC) or Vegeta (REST) report into one dict.

    Both tools report latencies in nanoseconds; they are converted to ms here.
    """
    final = {}
    if "average" in results:
        # ghz report (gRPC): percentiles are read by position from the
        # end of "latencyDistribution".
        final["mean"] = results["average"] / 1e6
        if results.get("latencyDistribution", False):
            final["50th"] = results["latencyDistribution"][-5]["latency"] / 1e6
            final["90th"] = results["latencyDistribution"][-3]["latency"] / 1e6
            final["95th"] = results["latencyDistribution"][-2]["latency"] / 1e6
            final["99th"] = results["latencyDistribution"][-1]["latency"] / 1e6
        final["rate"] = results["rps"]
        final["errors"] = results["statusCodeDistribution"]
    else:
        # Vegeta report (REST): percentiles are keyed by name.
        final["mean"] = results["latencies"]["mean"] / 1e6
        final["50th"] = results["latencies"]["50th"] / 1e6
        final["90th"] = results["latencies"]["90th"] / 1e6
        final["95th"] = results["latencies"]["95th"] / 1e6
        final["99th"] = results["latencies"]["99th"] / 1e6
        final["rate"] = results["throughput"]
        final["errors"] = results["errors"]
    if print_results:
        print("Latencies:")
        print("\tmean:", final["mean"], "ms")
        print("\t50th:", final["50th"], "ms")
        print("\t90th:", final["90th"], "ms")
        print("\t95th:", final["95th"], "ms")
        print("\t99th:", final["99th"], "ms")
        print("")
        print("Rate:", str(final["rate"]) + "/s")
        print("Errors:", final["errors"])
    return final
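
The gRPC branch of `get_results` reads percentiles by position (`[-5]`, `[-3]`, ...), which only holds while the report lists the same percentiles in the same order. A minimal sketch of a lookup by label instead, assuming each `latencyDistribution` entry carries a `percentage` field next to `latency`. `latency_at` is a hypothetical helper, and the sample values are the gRPC latencies reported later in this notebook.

```python
def latency_at(distribution, percentage):
    """Return the latency in ms for the entry labelled with `percentage`."""
    for entry in distribution:
        # Assumption: every entry has "percentage" and "latency" (ns) keys.
        if entry["percentage"] == percentage:
            return entry["latency"] / 1e6  # nanoseconds -> milliseconds
    raise KeyError(f"percentile {percentage} not present in report")


# Sample entries built from the gRPC run shown later in this notebook.
sample = [
    {"percentage": 50, "latency": 65257398},
    {"percentage": 90, "latency": 82939600},
    {"percentage": 95, "latency": 89598800},
    {"percentage": 99, "latency": 101463001},
]

for p in (50, 90, 95, 99):
    print(f"{p}th:", latency_at(sample, p), "ms")
```

A missing percentile then raises a `KeyError` instead of silently returning the wrong entry.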

### Create Benchmark Argo Workflow

To create a benchmark, we packaged a simple Argo Workflow template as a Helm chart, so each run can be configured through chart values.

Before we dive into the contents of the full helm chart, let's first give it a try with some of the settings.

We will run a batch job that sets up a Seldon Deployment with 1 replica, then benchmarks it from a worker with 4 CPUs and up to 100 workers sending requests.

In [2]:
!helm template seldon-benchmark-workflow helm-charts/seldon-benchmark-workflow/ \
    --set workflow.name=seldon-benchmark-process \
    --set seldonDeployment.name=sklearn \
    --set seldonDeployment.replicas=1 \
    --set seldonDeployment.serverWorkers=1 \
    --set seldonDeployment.serverThreads=10 \
    --set seldonDeployment.apiType=rest \
    --set benchmark.cpus=4 \
    --set benchmark.maxWorkers=100 \
    --set benchmark.duration=30s \
    --set benchmark.rate=0 \
    --set benchmark.data='\{"data": {"ndarray": [[0\,1\,2\,3]]\}\}' \
    | argo submit -

Name:                seldon-benchmark-process
Namespace:           default
ServiceAccount:      default
Status:              Pending
Created:             Fri Aug 07 18:09:40 +0100 (now)
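
The same `helm template` call can also be assembled in Python, which avoids shell quoting when values come from variables. This is a sketch only: `build_helm_command` is a hypothetical helper, the chart values are the ones used in the cell above, and the payload string is copied verbatim with its escaping.

```python
import shlex


def build_helm_command(release, chart, values):
    """Build the argv list for `helm template`, with one --set per value."""
    cmd = ["helm", "template", release, chart]
    for key, value in values.items():
        cmd += ["--set", f"{key}={value}"]
    return cmd


values = {
    "workflow.name": "seldon-benchmark-process",
    "seldonDeployment.name": "sklearn",
    "seldonDeployment.replicas": 1,
    "seldonDeployment.serverWorkers": 1,
    "seldonDeployment.serverThreads": 10,
    "seldonDeployment.apiType": "rest",
    "benchmark.cpus": 4,
    "benchmark.maxWorkers": 100,
    "benchmark.duration": "30s",
    "benchmark.rate": 0,
    # Payload copied verbatim from the cell above, including its escaping.
    "benchmark.data": r'\{"data": {"ndarray": [[0\,1\,2\,3]]\}\}',
}

cmd = build_helm_command(
    "seldon-benchmark-workflow",
    "helm-charts/seldon-benchmark-workflow/",
    values,
)
# Print a shell-safe rendering of the command for inspection.
print(shlex.join(cmd))
```

The resulting list could be passed to `subprocess.run` and its output piped to `argo submit -`; that step is left out here because it needs a live cluster.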


In [3]:
!argo list

NAME                       STATUS      AGE   DURATION   PRIORITY
seldon-benchmark-process   Succeeded   2m    1m         0


In [4]:
!argo get seldon-benchmark-process

Name:                seldon-benchmark-process
Namespace:           default
ServiceAccount:      default
Status:              Succeeded
Created:             Fri Aug 07 18:09:40 +0100 (2 minutes ago)
Started:             Fri Aug 07 18:09:40 +0100 (2 minutes ago)
Finished:            Fri Aug 07 18:11:09 +0100 (51 seconds ago)
Duration:            1 minute 29 seconds

STEP                                                             PODNAME                              DURATION  MESSAGE
 ✔ seldon-benchmark-process (seldon-benchmark-process)
 ├---✔ create-seldon-resource (create-seldon-resource-template)  seldon-benchmark-process-3980407503  2s
 ├---✔ wait-seldon-resource (wait-seldon-resource-template)      seldon-benchmark-process-2136965893  49s
 └---✔ run-benchmark (run-benchmark-template)                    seldon-benchmark-process-780051119   32s


In [5]:
!argo logs -w seldon-benchmark-process

create-seldon-resource:	time="2020-08-07T17:09:41.804Z" level=info msg="Starting Workflow Executor" version=v2.9.3
create-seldon-resource:	time="2020-08-07T17:09:41.809Z" level=info msg="Creating a docker executor"
create-seldon-resource:	time="2020-08-07T17:09:41.809Z" level=info msg="Executor (version: v2.9.3, build_date: 2020-07-18T19:11:19Z) initialized (pod: default/seldon-benchmark-process-3980407503) with template:\n{\"name\":\"create-seldon-resource-template\",\"arguments\":{},\"inputs\":{},\"outputs\":{},\"metadata\":{},\"resource\":{\"action\":\"create\",\"manifest\":\"apiVersion: machinelearning.seldon.io/v1\\nkind: SeldonDeployment\\nmetadata:\\n  name: \\\"sklearn\\\"\\n  namespace: default\\n  ownerReferences:\\n  - apiVersion: argoproj.io/v1alpha1\\n    blockOwnerDeletion: true\\n    kind: Workflow\\n    name: \\\"seldon-benchmark-process\\\"\\n    uid: \\\"e0364966-b2c1-4ee7-a7cf-421952ba3c7a\\\"\\nspec:\\n  annotations:\\n    seldon.io/exec

In [6]:
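import json
wf_logs = !argo logs -w seldon-benchmark-process
# The last log line holds the benchmark report as JSON.
wf_bench = wf_logs[-1]
# Skip the 24-character log prefix (colour codes, step name, colon and tab).
wf_json_str = wf_bench[24:]
results = json.loads(wf_json_str)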

print("Latencies:")
print("\tmean:", results["latencies"]["mean"] / 1e6, "ms")
print("\t50th:", results["latencies"]["50th"] / 1e6, "ms")
print("\t90th:", results["latencies"]["90th"] / 1e6, "ms")
print("\t95th:", results["latencies"]["95th"] / 1e6, "ms")
print("\t99th:", results["latencies"]["99th"] / 1e6, "ms")
print("")
print("Throughput:", str(results["throughput"]) + "/s")
print("Errors:", len(results["errors"]) > 0)

Latencies:
	mean: 339.033885 ms
	50th: 272.84063 ms
	90th: 339.539236 ms
	95th: 368.299307 ms
	99th: 4982.426813 ms

Throughput: 293.63176909007353/s
Errors: False
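
The fixed offset `wf_bench[24:]` depends on the exact length of the log prefix, so it breaks if the step name or colouring changes. A minimal sketch of a more tolerant extraction, assuming the prefix itself never contains a `{`. `parse_benchmark_line` is a hypothetical helper and the sample line is a shortened stand-in for the real log output, using the mean and throughput printed above.

```python
import json


def parse_benchmark_line(line):
    """Parse the JSON report that follows the log prefix on `line`."""
    start = line.find("{")  # assumption: the prefix contains no "{"
    if start == -1:
        raise ValueError("no JSON object found in log line")
    return json.loads(line[start:])


# Shortened stand-in for the last line of `argo logs`.
line = (
    'run-benchmark:\t{"latencies": {"mean": 339033885}, '
    '"throughput": 293.63176909007353, "errors": []}'
)

report = parse_benchmark_line(line)
print("mean:", report["latencies"]["mean"] / 1e6, "ms")
print("throughput:", report["throughput"])
```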


In [7]:
!argo delete seldon-benchmark-process

Workflow 'seldon-benchmark-process' deleted


## Create gRPC benchmark with ghz and Argo Workflows

In [24]:
!helm template seldon-benchmark-workflow helm-charts/seldon-benchmark-workflow/ \
    --set workflow.name=seldon-benchmark-process \
    --set seldonDeployment.name=sklearn \
    --set seldonDeployment.replicas=1 \
    --set seldonDeployment.serverWorkers=1 \
    --set seldonDeployment.serverThreads=10 \
    --set seldonDeployment.apiType=grpc \
    --set benchmark.cpus=4 \
    --set benchmark.maxWorkers=100 \
    --set benchmark.duration="120s" \
    --set benchmark.rate=0 \
    --set benchmark.data='\{"data": {"ndarray": [[0\,1\,2\,3]]\}\}' \
    | argo submit -

Name:                seldon-benchmark-process
Namespace:           default
ServiceAccount:      default
Status:              Pending
Created:             Fri Aug 07 18:22:38 +0100 (now)


In [25]:
!argo list

NAME                       STATUS      AGE   DURATION   PRIORITY
seldon-benchmark-process   Succeeded   4m    2m         0


In [26]:
!argo get seldon-benchmark-process

Name:                seldon-benchmark-process
Namespace:           default
ServiceAccount:      default
Status:              Succeeded
Created:             Fri Aug 07 18:22:38 +0100 (4 minutes ago)
Started:             Fri Aug 07 18:22:38 +0100 (4 minutes ago)
Finished:            Fri Aug 07 18:25:11 +0100 (1 minute ago)
Duration:            2 minutes 33 seconds

STEP                                                             PODNAME                              DURATION  MESSAGE
 ✔ seldon-benchmark-process (seldon-benchmark-process)
 ├---✔ create-seldon-resource (create-seldon-resource-template)  seldon-benchmark-process-3980407503  2s
 ├---✔ wait-seldon-resource (wait-seldon-resource-template)      seldon-benchmark-process-2136965893  26s
 └---✔ run-benchmark (run-benchmark-template)                    seldon-benchmark-process-780051119   2m


In [27]:
!argo logs -w seldon-benchmark-process

create-seldon-resource:	time="2020-08-07T17:22:39.446Z" level=info msg="Starting Workflow Executor" version=v2.9.3
create-seldon-resource:	time="2020-08-07T17:22:39.450Z" level=info msg="Creating a docker executor"
create-seldon-resource:	time="2020-08-07T17:22:39.450Z" level=info msg="Executor (version: v2.9.3, build_date: 2020-07-18T19:11:19Z) initialized (pod: default/seldon-benchmark-process-3980407503) with template:\n{\"name\":\"create-seldon-resource-template\",\"arguments\":{},\"inputs\":{},\"outputs\":{},\"metadata\":{},\"resource\":{\"action\":\"create\",\"manifest\":\"apiVersion: machinelearning.seldon.io/v1\\nkind: SeldonDeployment\\nmetadata:\\n  name: \\\"sklearn\\\"\\n  namespace: default\\n  ownerReferences:\\n  - apiVersion: argoproj.io/v1alpha1\\n    blockOwnerDeletion: true\\n    kind: Workflow\\n    name: \\\"seldon-benchmark-process\\\"\\n    uid: \\\"e472d69d-44ed-4a45-86b3-d4b64146002b\\\"\\nspec:\\n  name: \\\"sklearn\\\"\\n  transpo

In [28]:
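import json
wf_logs = !argo logs -w seldon-benchmark-process
# The last log line holds the benchmark report as JSON.
wf_bench = wf_logs[-1]
# Skip the 24-character log prefix (colour codes, step name, colon and tab).
wf_json_str = wf_bench[24:]
results = json.loads(wf_json_str)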

print("Latencies:")
print("\tmean:", results["average"] / 1e6, "ms")
print("\t50th:", results["latencyDistribution"][-5]["latency"] / 1e6, "ms")
print("\t90th:", results["latencyDistribution"][-3]["latency"] / 1e6, "ms")
print("\t95th:", results["latencyDistribution"][-2]["latency"] / 1e6, "ms")
print("\t99th:", results["latencyDistribution"][-1]["latency"] / 1e6, "ms")
print("")
print("Rate:", str(results["rps"]) + "/s")
print("Errors:", results["statusCodeDistribution"].get("Unavailable", 0) > 0)
print("Errors:", results["statusCodeDistribution"])

Latencies:
	mean: 67.376309 ms
	50th: 65.257398 ms
	90th: 82.9396 ms
	95th: 89.5988 ms
	99th: 101.463001 ms

Rate: 740.6102874631579/s
Errors: True
Errors: {'OK': 88824, 'Unavailable': 50}
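
The boolean `Errors: True` hides how small the failure share is. A short sketch that turns a status code distribution into an error rate, treating every code other than `OK` as a failure (an assumption made here, not something the workflow defines). `error_rate` is a hypothetical helper.

```python
def error_rate(status_codes):
    """Return the fraction of requests whose status code is not "OK"."""
    total = sum(status_codes.values())
    if total == 0:
        return 0.0  # no requests recorded, so nothing failed
    failed = total - status_codes.get("OK", 0)
    return failed / total


# Distribution printed by the gRPC run above.
codes = {"OK": 88824, "Unavailable": 50}
print(f"error rate: {error_rate(codes):.4%}")
```

For the run above this comes to roughly 0.06% of requests.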


In [22]:
!argo delete seldon-benchmark-process

Workflow 'seldon-benchmark-process' deleted


## Run a set of tests

We can now leverage the helm charts we created above to run a grid search on a set of parameters.
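
Before launching the grid it helps to see how many runs it expands to. The sketch below uses the same option values as the cell that follows, but with plain key names and dictionary insertion order in place of the letter prefixes and `sorted`. `expand_grid` is a hypothetical helper.

```python
import itertools

grid_opts = {
    "replicas": [1, 3],
    "server_workers": [1, 4],
    "server_threads": [50, 200],
    "api_type": ["rest", "grpc"],
    "cpus": [1, 4],
    "max_workers": [100, 300],
    "use_engine": ["true", "false"],
}


def expand_grid(options):
    """Yield one dict per combination of option values."""
    names = list(options)  # dicts keep insertion order in Python 3.7+
    for values in itertools.product(*(options[name] for name in names)):
        yield dict(zip(names, values))


runs = list(expand_grid(grid_opts))
print(len(runs), "combinations")
print(runs[0])
```

Seven options with two values each give 2 ** 7 = 128 runs. At 120 seconds of load per run, that is a little over four hours of benchmark time before counting deployment start-up.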

In [None]:
import itertools as it
import json
import time

grid_opts = {
    "A-replicas": [1, 3],
    "B-serverWorkers": [1, 4],
    "C-serverThreads": [50, 200],
    "D-apiType": ["rest", "grpc"],
    "E-cpus": [1, 4],
    "F-maxWorkers": [100, 300],
    "G-useEngine": ["true", "false"],
}

allNames = sorted(grid_opts)
combinations = it.product(*(grid_opts[Name] for Name in allNames))
all_results = []
for curr_values in combinations:
    print("VALUES:", curr_values)
    replicas, server_workers, server_threads, api_type, cpus, max_wokers, use_engine = curr_values

    # For some reason python vars don't work with multiline helm charts
    %env REPLICAS=$replicas
    %env SERVER_WORKERS=$server_workers
    %env SERVER_THREADS=$server_threads
    %env API_TYPE=$api_type
    %env CPUS=$cpus
    %env MAX_WORKERS=$max_wokers
    %env USE_ENGINE=$use_engine
    
    !helm template seldon-benchmark-workflow helm-charts/seldon-benchmark-workflow/ \
        --set workflow.name=seldon-benchmark-process \
        --set seldonDeployment.name=sklearn \
        --set seldonDeployment.replicas=$REPLICAS \
        --set seldonDeployment.serverWorkers=$SERVER_WORKERS \
        --set seldonDeployment.serverThreads=$SERVER_THREADS \
        --set seldonDeployment.apiType=$API_TYPE \
        --set seldonDeployment.useEngine=\"$USE_ENGINE\" \
        --set benchmark.cpus=$CPUS \
        --set benchmark.maxWorkers=$MAX_WORKERS \
        --set benchmark.duration=120s \
        --set benchmark.rate=0 \
        --set benchmark.data='\{"data": {"ndarray": [[0\,1\,2\,3]]\}\}' \
        | argo submit --wait -
    
    !argo wait seldon-benchmark-process 
    
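    wf_logs = !argo logs -w seldon-benchmark-process
    # The last log line holds the benchmark report as JSON.
    wf_bench = wf_logs[-1]
    # Skip the 24-character log prefix (colour codes, step name, colon and tab).
    wf_json_str = wf_bench[24:]
    results = json.loads(wf_json_str)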
    
    result = get_results(results)
    result["replicas"] = replicas
    result["server_workers"] = server_workers
    result["server_threads"] = server_threads
    result["apiType"] = api_type
    result["cpus"] = cpus
    result["max_wokers"] = max_wokers
    result["use_engine"] = use_engine
    all_results.append(result)
    
    !argo delete seldon-benchmark-process
    time.sleep(1)
    print("\n\n")
    

## Deeper Analysis
Now that we have all the results, we can do a deeper analysis.

In [186]:
import pandas as pd
# Build the frame from the accumulated per-run results, not the last raw report.
df = pd.DataFrame(all_results)
df.head()

| Unnamed: 0 | replicas | server_workers | server_threads | apiType | cpus | max_wokers | use_engine | mean | 50th | 90th | 95th | 99th | rate | errors |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 50 | rest | 1 | 200 | True | 489.269344 | 455.617128 | 612.294382 | 672.510108 | 832.322767 | 407.879172 | [] |
| 1 | 1 | 1 | 50 | rest | 1 | 200 | False | 529.767457 | 514.151876 | 591.278115 | 621.463805 | 749.348556 | 376.649458 | [] |
| 2 | 1 | 1 | 50 | rest | 4 | 200 | True | 547.618426 | 526.472215 | 661.947413 | 720.039676 | 863.596098 | 364.363839 | [] |
| 3 | 1 | 1 | 50 | rest | 4 | 200 | False | 593.880113 | 602.945695 | 737.99329 | 770.777543 | 1003.510371 | 336.075411 | [] |
| 4 | 1 | 1 | 50 | grpc | 1 | 200 | True | 95.322943 | 97.896699 | 117.221999 | 125.8524 | 141.615501 | 523.62816 | {'OK': 62790, 'Unavailable': 50} |


### gRPC, as expected, outperforms REST

In [189]:
df.sort_values("rate", ascending=False)

| Unnamed: 0 | replicas | server_workers | server_threads | apiType | cpus | max_wokers | use_engine | mean | 50th | 90th | 95th | 99th | rate | errors |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 60 | 3 | 4 | 200 | grpc | 1 | 200 | true | 31.389861 | 23.769589 | 71.583795 | 78.881398 | 91.312797 | 1586.593680 | {'OK': 190361, 'Unavailable': 48} |
| 52 | 3 | 4 | 50 | grpc | 1 | 200 | true | 31.398451 | 26.313000 | 64.841515 | 73.035800 | 88.744198 | 1586.555365 | {'OK': 190333, 'Unavailable': 71} |
| 45 | 3 | 1 | 200 | grpc | 1 | 200 | false | 32.191240 | 30.448302 | 60.616301 | 68.724406 | 91.484308 | 1547.003054 | {'OK': 185606, 'Unavailable': 49} |
| 61 | 3 | 4 | 200 | grpc | 1 | 200 | false | 32.727674 | 28.483400 | 63.750796 | 72.597310 | 90.693812 | 1521.590875 | {'OK': 182555, 'Unavailable': 49} |
| 55 | 3 | 4 | 50 | grpc | 4 | 200 | false | 33.629848 | 29.610701 | 67.065895 | 77.773100 | 97.296599 | 1479.320474 | {'OK': 177471, 'Unavailable': 50} |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 10 | 1 | 1 | 200 | rest | 4 | 200 | true | 571.452398 | 556.699256 | 693.093315 | 751.197598 | 1024.233714 | 348.889260 | [] |
| 11 | 1 | 1 | 200 | rest | 4 | 200 | false | 587.900216 | 556.869872 | 723.744376 | 774.244702 | 939.994423 | 339.396160 | [] |
| 3 | 1 | 1 | 50 | rest | 4 | 200 | false | 593.880113 | 602.945695 | 737.993290 | 770.777543 | 1003.510371 | 336.075411 | [] |
| 8 | 1 | 1 | 200 | rest | 1 | 200 | true | 633.043624 | 617.853285 | 741.229073 | 776.560578 | 1846.623159 | 314.908167 | [] |


### Deeper dive on REST
As expected, the number of replicas has the biggest impact. The parameters on the benchmark worker do not seem to affect throughput.

In [190]:
df[df["apiType"]=="rest"].sort_values("rate", ascending=False)

| Unnamed: 0 | replicas | server_workers | server_threads | apiType | cpus | max_wokers | use_engine | mean | 50th | 90th | 95th | 99th | rate | errors |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 41 | 3 | 1 | 200 | rest | 1 | 200 | False | 201.167546 | 8.844305 | 629.250888 | 690.807158 | 809.635194 | 992.298652 | [] |
| 48 | 3 | 4 | 50 | rest | 1 | 200 | True | 208.429576 | 11.377699 | 655.466848 | 685.265506 | 758.664504 | 957.846772 | [] |
| 51 | 3 | 4 | 50 | rest | 4 | 200 | False | 211.228526 | 13.592301 | 641.484819 | 675.713639 | 795.682869 | 945.09098 | [] |
| 59 | 3 | 4 | 200 | rest | 4 | 200 | False | 214.358834 | 13.573121 | 670.449768 | 690.048496 | 722.537613 | 930.694079 | [] |
| 57 | 3 | 4 | 200 | rest | 1 | 200 | False | 216.64632 | 9.336961 | 684.733598 | 704.485018 | 733.636276 | 921.350903 | [] |
| 40 | 3 | 1 | 200 | rest | 1 | 200 | True | 217.722397 | 16.593757 | 657.144743 | 695.158232 | 745.726065 | 916.80316 | [] |
| 32 | 3 | 1 | 50 | rest | 1 | 200 | True | 218.817952 | 10.808913 | 689.809571 | 757.737985 | 867.650689 | 912.589694 | [] |
| 56 | 3 | 4 | 200 | rest | 1 | 200 | True | 221.031876 | 9.197338 | 690.217169 | 711.800471 | 742.657817 | 903.072311 | [] |
| 50 | 3 | 4 | 50 | rest | 4 | 200 | True | 221.263249 | 16.583482 | 688.637696 | 711.870214 | 781.197685 | 902.31585 | [] |
| 58 | 3 | 4 | 200 | rest | 4 | 200 | True | 221.566956 | 11.037262 | 685.417461 | 713.923684 | 771.814053 | 901.132352 | [] |
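
The effect of replicas on REST throughput can be checked by averaging `rate` per replica count. The sketch below uses only the standard library and a handful of rows copied from the tables in this notebook (the first four single-replica runs and the four fastest three-replica runs), so it illustrates the method on a partial, non-representative sample.

```python
from collections import defaultdict
from statistics import mean

# (replicas, rate) pairs copied from the REST rows shown in this notebook.
rows = [
    (1, 407.879172),
    (1, 376.649458),
    (1, 364.363839),
    (1, 336.075411),
    (3, 992.298652),
    (3, 957.846772),
    (3, 945.09098),
    (3, 930.694079),
]

# Group the rates by replica count.
by_replicas = defaultdict(list)
for replicas, rate in rows:
    by_replicas[replicas].append(rate)

# Average requests per second for each replica count.
averages = {replicas: mean(rates) for replicas, rates in by_replicas.items()}
for replicas in sorted(averages):
    print(f"replicas={replicas}: {averages[replicas]:.1f} req/s")
```

On the full DataFrame the same comparison is `df.groupby(["apiType", "replicas"])["rate"].mean()`.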


### Deeper dive on gRPC

In [191]:
df[df["apiType"]=="grpc"].sort_values("rate", ascending=False)

| Unnamed: 0 | replicas | server_workers | server_threads | apiType | cpus | max_wokers | use_engine | mean | 50th | 90th | 95th | 99th | rate | errors |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 60 | 3 | 4 | 200 | grpc | 1 | 200 | True | 31.389861 | 23.769589 | 71.583795 | 78.881398 | 91.312797 | 1586.59368 | {'OK': 190361, 'Unavailable': 48} |
| 52 | 3 | 4 | 50 | grpc | 1 | 200 | True | 31.398451 | 26.313 | 64.841515 | 73.0358 | 88.744198 | 1586.555365 | {'OK': 190333, 'Unavailable': 71} |
| 45 | 3 | 1 | 200 | grpc | 1 | 200 | False | 32.19124 | 30.448302 | 60.616301 | 68.724406 | 91.484308 | 1547.003054 | {'OK': 185606, 'Unavailable': 49} |
| 61 | 3 | 4 | 200 | grpc | 1 | 200 | False | 32.727674 | 28.4834 | 63.750796 | 72.59731 | 90.693812 | 1521.590875 | {'OK': 182555, 'Unavailable': 49} |
| 55 | 3 | 4 | 50 | grpc | 4 | 200 | False | 33.629848 | 29.610701 | 67.065895 | 77.7731 | 97.296599 | 1479.320474 | {'OK': 177471, 'Unavailable': 50} |
| 47 | 3 | 1 | 200 | grpc | 4 | 200 | False | 33.861023 | 30.2074 | 70.272698 | 83.485103 | 105.639301 | 1469.503585 | {'OK': 176302, 'Unavailable': 50} |
| 62 | 3 | 4 | 200 | grpc | 4 | 200 | True | 34.746801 | 31.896585 | 72.732796 | 84.032763 | 99.43309 | 1432.045405 | {'OK': 171799, 'Unavailable': 50} |
| 54 | 3 | 4 | 50 | grpc | 4 | 200 | True | 34.786883 | 32.141197 | 72.554313 | 82.649702 | 95.049705 | 1430.209225 | {'OK': 171578, 'Unavailable': 49} |
| 37 | 3 | 1 | 50 | grpc | 1 | 200 | False | 35.149376 | 35.153187 | 62.8428 | 72.7918 | 94.240299 | 1416.745392 | {'OK': 169973, 'Unavailable': 50} |
| 36 | 3 | 1 | 50 | grpc | 1 | 200 | True | 35.167657 | 31.8593 | 65.644895 | 76.240799 | 98.925899 | 1415.967279 | {'OK': 169880, 'Unavailable': 48} |
