## Benchmarking with Argo Workflows & Vegeta

In this notebook we will dive into how you can run benchmarking as a batch process with Argo Workflows, Seldon Core and Vegeta.

Dependencies:

* Seldon Core installed as per the docs with Istio as an ingress
* Argo Workflows installed in the cluster (and the argo CLI for commands)


## Setup

### Install Seldon Core
Use the notebook to [set up Seldon Core with Ambassador or Istio Ingress](https://docs.seldon.io/projects/seldon-core/en/latest/examples/seldon_core_setup.html).

Note: If running with KIND, make sure to follow [these steps](https://github.com/argoproj/argo/issues/2376#issuecomment-595593237) as a workaround for the known `/.../docker.sock` issue.


### Install Argo Workflows
You can follow the instructions from the official [Argo Workflows Documentation](https://github.com/argoproj/argo#quickstart).

Download the right CLI for your environment following the documentation (https://github.com/argoproj/argo-workflows/releases/tag/v3.0.8).

Argo also needs permission to create Seldon Deployments; the role bindings for this are created further below. First, install Argo Workflows into the `argo` namespace:

In [600]:
!kubectl create namespace argo || echo "namespace already created"
!kubectl apply -n argo -f https://raw.githubusercontent.com/argoproj/argo/stable/manifests/install.yaml
!kubectl rollout status -n argo deployment/argo-server
!kubectl rollout status -n argo deployment/workflow-controller

Error from server (AlreadyExists): namespaces "argo" already exists
namespace already created
customresourcedefinition.apiextensions.k8s.io/clusterworkflowtemplates.argoproj.io unchanged
customresourcedefinition.apiextensions.k8s.io/cronworkflows.argoproj.io unchanged
customresourcedefinition.apiextensions.k8s.io/workfloweventbindings.argoproj.io unchanged
customresourcedefinition.apiextensions.k8s.io/workflows.argoproj.io unchanged
customresourcedefinition.apiextensions.k8s.io/workflowtemplates.argoproj.io unchanged
serviceaccount/argo unchanged
serviceaccount/argo-server unchanged
role.rbac.authorization.k8s.io/argo-role unchanged
clusterrole.rbac.authorization.k8s.io/argo-aggregate-to-admin unchanged
clusterrole.rbac.authorization.k8s.io/argo-aggregate-to-edit unchanged
clusterrole.rbac.authorization.k8s.io/argo-aggregate-to-view unchanged
clusterrole.rbac.authorization.k8s.io/argo-cluster-role unchanged
clusterrole.rbac.authorization.k8s.io/argo-server-cluster-role unchanged
rolebi

Set up the RBAC so the argo workflow is able to create seldon deployments.

In [646]:
!kubectl create rolebinding argo-default-admin --clusterrole=admin --serviceaccount=argo:default -n argo
!kubectl create rolebinding argo-seldon-workflow --clusterrole=seldon-manager-role-seldon-system --serviceaccount=argo:default -n argo

rolebinding.rbac.authorization.k8s.io/argo-default-admin created


Set up the configmap so that workflows can run in KIND and other environments where Docker may not be the main runtime (see https://github.com/argoproj/argo-workflows/issues/5243#issuecomment-792993742).

In [602]:
%%bash
kubectl apply -n argo -f - << END
apiVersion: v1
kind: ConfigMap
metadata:
  name: workflow-controller-configmap
data:
  containerRuntimeExecutor: k8sapi
END

configmap/workflow-controller-configmap configured


### Create Benchmark Argo Workflow

In order to create a benchmark, we created a simple argo workflow template so you can leverage the power of the helm charts.

Before we dive into the contents of the full helm chart, let's first give it a try with some of the settings.

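We will run a batch job that sets up Seldon Deployments for the parameter values below and sends requests to each of them. Values separated by `|` (for example `replicas="1|2"`) list the alternatives to benchmark, which is why the logs further down contain entries named `sklearn-0`, `sklearn-1`, and so on.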

In [655]:
!helm template seldon-benchmark-workflow helm-charts/seldon-benchmark-workflow/ \
    --set workflow.namespace=argo \
    --set workflow.name=seldon-benchmark-process \
    --set workflow.parallelism=3 \
    --set seldonDeployment.name=sklearn \
    --set seldonDeployment.replicas="1|2" \
    --set seldonDeployment.serverWorkers="2|5|10" \
    --set seldonDeployment.serverThreads=1 \
    --set seldonDeployment.modelUri="gs://seldon-models/sklearn/iris" \
    --set seldonDeployment.server="SKLEARN_SERVER" \
    --set seldonDeployment.apiType="rest|grpc" \
    --set seldonDeployment.requests.cpu="500Mi|1000Mi|2000Mi" \
    --set seldonDeployment.requests.limits="2000Mi" \
    --set seldonDeployment.disableOrchestrator="true|false" \
    --set benchmark.cpu="5" \
    --set benchmark.concurrency="1" \
    --set benchmark.duration="30s" \
    --set benchmark.rate=0 \
    --set benchmark.data='\{"data": {"ndarray": [[0\,1\,2\,3]]\}\}' \
    | argo submit -

Name:                seldon-benchmark-process
Namespace:           argo
ServiceAccount:      default
Status:              Pending
Created:             Sun Jun 27 13:41:39 +0100 (now)
Progress:            
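
The `|`-separated values in the command above read like a parameter grid. As a rough sketch, assuming the chart expands them as a Cartesian product (the chart's templates are not shown here, so this is an assumption, and `expand_grid` is a hypothetical helper), the number of combinations can be estimated as follows:

```python
from itertools import product

# Values copied from the helm command above; "|" separates the alternatives.
settings = {
    "replicas": "1|2",
    "serverWorkers": "2|5|10",
    "serverThreads": "1",
    "apiType": "rest|grpc",
    "requestsCpu": "500Mi|1000Mi|2000Mi",
    "disableOrchestrator": "true|false",
}


def expand_grid(settings):
    """Return one dict per combination of the "|"-separated values.

    Hypothetical helper: it mirrors what the chart is assumed to do,
    not the chart's actual implementation.
    """
    keys = list(settings)
    choices = [str(settings[k]).split("|") for k in keys]
    return [dict(zip(keys, combo)) for combo in product(*choices)]


grid = expand_grid(settings)
print(len(grid))  # 2 * 3 * 1 * 2 * 3 * 2 combinations
print(grid[0])
```

With `workflow.parallelism=3` and a 30s duration per run, a grid of this size takes a while to complete, so it can be worth trimming the value lists for a first trial.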


In [656]:
!argo list -n argo

NAME                       STATUS      AGE   DURATION   PRIORITY
seldon-benchmark-process   Running     1s    1s         0
hello-world-gpb5q          Succeeded   9m    10s        0
hello-world-dglk9          Error       10m   20s        0
hello-world-r87z6          Error       12m   20s        0
hello-world-c4bwz          Error       15m   20s        0
hello-world-4kdw7          Succeeded   16m   10s        0
hello-world-p4c7k          Error       17m   20s        0
hello-world-rp4vp          Error       18m   20s        0
hello-world-zscsv          Error       18m   20s        0
hello-world-ssqgp          Succeeded   30m   10s        0
hello-world-h2p9t          Error       31m   20s        0
hello-world-qjgfv          Succeeded   35m   10s        0
hello-world-26bzw          Succeeded   39m   10s        0
hello-world-l5nkr          Error       2h    42s        0


In [None]:
!argo logs -f seldon-benchmark-process -n argo

seldon-benchmark-process-635956972: [{"name": "sklearn-0", "replicas": "1", "serverWorkers": "2", "serverThreads": "1", "modelUri": "gs://seldon-models/sklearn/iris", "image": "", "server": "SKLEARN_SERVER", "apiType": "rest", "requestsCpu": "500Mi", "requestsMemory": "100Mi", "limitsCpu": "50m", "limitsMemory": "1000Mi", "benchmarkCpu": "5", "concurrency": "1", "duration": "30s", "rate": "0", "disableOrchestrator": "true", "params": "{\"name\": \"sklearn-0\", \"replicas\": \"1\", \"serverWorkers\": \"2\", \"serverThreads\": \"1\", \"modelUri\": \"gs://seldon-models/sklearn/iris\", \"image\": \"\", \"server\": \"SKLEARN_SERVER\", \"apiType\": \"rest\", \"requestsCpu\": \"500Mi\", \"requestsMemory\": \"100Mi\", \"limitsCpu\": \"50m\", \"limitsMemory\": \"1000Mi\", \"benchmarkCpu\": \"5\", \"concurrency\": \"1\", \"duration\": \"30s\", \"rate\": \"0\", \"disableOrchestrator\": \"true\"}"}, {"name": "sklearn-1", "replicas": "1", "serverWorkers": "2", "serverThreads": "1", "modelUri":

seldon-benchmark-process-1926760347: time="2021-06-27T12:41:50.906Z" level=info msg="Starting Workflow Executor" version="{v3.0.3 2021-05-11T21:14:20Z 02071057c082cf295ab8da68f1b2027ff8762b5a v3.0.3 clean go1.15.7 gc linux/amd64}"
seldon-benchmark-process-1926760347: time="2021-06-27T12:41:50.913Z" level=info msg="Creating a K8sAPI executor"
seldon-benchmark-process-1926760347: time="2021-06-27T12:41:50.913Z" level=info msg="Executor (version: v3.0.3, build_date: 2021-05-11T21:14:20Z) initialized (pod: argo/seldon-benchmark-process-1926760347) with template:\n{\"name\":\"create-seldon-resource-template\",\"inputs\":{\"parameters\":[{\"name\":\"inparam\",\"value\":\"sklearn-1\"},{\"name\":\"replicas\",\"value\":\"1\"},{\"name\":\"serverWorkers\",\"value\":\"2\"},{\"name\":\"serverThreads\",\"value\":\"1\"},{\"name\":\"modelUri\",\"value\":\"gs://seldon-models/sklearn/iris\"},{\"name\":\"image\",\"value\":\"\"},{\"name\":\"server\",\"value\":\"SKLEARN_SERVER\"},{\"

In [None]:
!argo get seldon-benchmark-process -n argo

## Process the results

We can now print the results in a consumable format.

In [494]:
def get_results(results, print_results=True):
    final = {}
    # For GHZ / grpc
    if "average" in results:
        final["mean"] = results["average"] / 1e6
        if results.get("latencyDistribution", False):
            final["50th"] = results["latencyDistribution"][-5]["latency"] / 1e6
            final["90th"] = results["latencyDistribution"][-3]["latency"] / 1e6
            final["95th"] = results["latencyDistribution"][-2]["latency"] / 1e6
            final["99th"] = results["latencyDistribution"][-1]["latency"] / 1e6
        final["throughputAchieved"] = results["rps"]
        final["success"] = results["statusCodeDistribution"].get("OK", 0)
        final["errors"] = sum(results["statusCodeDistribution"].values()) - final["success"]
    # For vegeta / rest
    else:
        final["mean"] = results["latencies"]["mean"] / 1e6
        final["50th"] = results["latencies"]["50th"] / 1e6
        final["90th"] = results["latencies"]["90th"] / 1e6
        final["95th"] = results["latencies"]["95th"] / 1e6
        final["99th"] = results["latencies"]["99th"] / 1e6
        final["throughputAchieved"] = results["throughput"]
        final["success"] = results["status_codes"].get("200", 0)
        final["errors"] = sum(results["status_codes"].values()) - final["success"]
    for k in results["params"].keys():
        final[k] = results["params"][k]
    if print_results:
        print("-----")
        print("ParamNames:", results["params"].keys())
        print("ParamValues:", results["params"].values())
        print("\tLatencies:")
        print("\t\tmean:", final["mean"], "ms")
        print("\t\t50th:", final["50th"], "ms")
        print("\t\t90th:", final["90th"], "ms")
        print("\t\t95th:", final["95th"], "ms")
        print("\t\t99th:", final["99th"], "ms")
        print("")
        print("\tRate:", str(final["throughputAchieved"]) + "/s")
        print("\tSuccess:", final["success"])
        print("\tErrors:", final["errors"])
    return final
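
The gRPC branch above picks percentiles by negative index, which only works while the `latencyDistribution` list keeps the same length and order. A minimal sketch of a lookup by percentile value instead, assuming each entry carries a `percentage` field alongside `latency` in nanoseconds (the field name is an assumption about the report format, and `percentile_latencies_ms` is a hypothetical helper):

```python
def percentile_latencies_ms(latency_distribution, wanted=(50, 90, 95, 99)):
    """Map each wanted percentile to its latency in milliseconds.

    Assumes entries shaped like {"percentage": 50, "latency": <nanoseconds>}.
    Percentiles missing from the input are left out of the result.
    """
    by_pct = {entry["percentage"]: entry["latency"] for entry in latency_distribution}
    return {f"{p}th": by_pct[p] / 1e6 for p in wanted if p in by_pct}


# Hypothetical distribution for illustration only (not real benchmark data).
sample = [
    {"percentage": 10, "latency": 4_000_000},
    {"percentage": 25, "latency": 4_500_000},
    {"percentage": 50, "latency": 5_000_000},
    {"percentage": 75, "latency": 6_000_000},
    {"percentage": 90, "latency": 7_000_000},
    {"percentage": 95, "latency": 8_000_000},
    {"percentage": 99, "latency": 10_000_000},
]

latencies = percentile_latencies_ms(sample)
print(latencies)
```

Looking entries up by value avoids silently reporting the wrong percentile if the tool's default percentile list ever differs.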

In [None]:
import json

wf_logs = !argo logs --no-color seldon-benchmark-process -n argo
# Keep only the lines carrying benchmark results and drop the "<pod-name>:" prefix
wf_bench = [json.loads(":".join(w.split(":")[1:])) for w in wf_logs if "latenc" in w]
print(wf_bench)
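
The parsing above raises an exception if any non-JSON log line happens to contain the substring `latenc`. A slightly more defensive sketch, assuming each result line has the form `<pod-name>: <json>` (the sample lines and the `parse_benchmark_lines` helper below are hypothetical and for illustration only):

```python
import json


def parse_benchmark_lines(lines):
    """Extract the JSON payload from "<pod-name>: <json>" log lines."""
    parsed = []
    for line in lines:
        if "latenc" not in line:
            continue
        # Split only on the first ": " so colons inside the JSON are untouched.
        _, _, payload = line.partition(": ")
        try:
            parsed.append(json.loads(payload))
        except json.JSONDecodeError:
            # Skip lines that mention latency but are not JSON documents.
            continue
    return parsed


# Hypothetical log lines, not taken from a real run.
sample_lines = [
    'example-pod-1: {"latencies": {"mean": 5000000}, "status_codes": {"200": 10}}',
    'example-pod-2: level=info msg="latency probe started"',
]

parsed = parse_benchmark_lines(sample_lines)
print(parsed)
```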

In [None]:
results = []
for w in wf_bench:
    # Prints the results in a consumable format
    results.append(get_results(w))

In [654]:
!argo delete seldon-benchmark-process -n argo || echo "Argo workflow already deleted or not exists"

Workflow 'seldon-benchmark-process' not found


In [None]:
!kubectl delete -n argo -f https://raw.githubusercontent.com/argoproj/argo/v3.1.0/manifests/install.yaml

## Deeper Analysis
Now that we have all the parameters alongside the results, we can do a deeper analysis.

In [497]:
import pandas as pd

df = pd.DataFrame.from_dict(results)
df.head()

| | mean | 50th | 90th | 95th | 99th | throughputAchieved | success | errors | name | replicas | ... | apiType | requestsCpu | requestsMemory | limitsCpu | limitsMemory | benchmarkCpu | concurrency | duration | rate | disableOrchestrator |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 6.828045 | 6.39052 | 9.264792 | 10.765109 | 14.086176 | 146.276304 | 4389 | 0 | sklearn-1 | 1 | ... | rest | 500Mi | 100Mi | 50m | 1000Mi | 5 | 1 | 30s | 0 | False |
| 1 | 5.401355 | 5.254559 | 6.584599 | 7.153629 | 8.984036 | 184.916561 | 5549 | 0 | sklearn-5 | 1 | ... | rest | 2000Mi | 100Mi | 50m | 1000Mi | 5 | 1 | 30s | 0 | False |
| 2 | 5.291129 | 5.183541 | 6.444586 | 7.040763 | 9.199069 | 188.787112 | 5665 | 0 | sklearn-4 | 1 | ... | rest | 2000Mi | 100Mi | 50m | 1000Mi | 5 | 1 | 30s | 0 | True |
| 3 | 4.646862 | 4.383044 | 5.952588 | 6.775633 | 9.886973 | 215.054059 | 6453 | 0 | sklearn-15 | 1 | ... | rest | 1000Mi | 100Mi | 50m | 1000Mi | 5 | 1 | 30s | 0 | False |
| 4 | 6.56833 | 6.142206 | 8.891155 | 10.257702 | 13.721939 | 152.068454 | 4563 | 0 | sklearn-0 | 1 | ... | rest | 500Mi | 100Mi | 50m | 1000Mi | 5 | 1 | 30s | 0 | True |
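
One possible next step is to group the runs by a single parameter and compare the achieved throughput. The sketch below uses only the standard library and the `requestsCpu` and `throughputAchieved` values from the five rows shown above; five rows are far too few to draw conclusions, so treat it as an illustration of the grouping only.

```python
from collections import defaultdict
from statistics import mean

# (requestsCpu, throughputAchieved) pairs copied from the five rows shown above.
rows = [
    ("500Mi", 146.276304),
    ("2000Mi", 184.916561),
    ("2000Mi", 188.787112),
    ("1000Mi", 215.054059),
    ("500Mi", 152.068454),
]

# Collect the throughput values per requestsCpu setting.
by_cpu = defaultdict(list)
for cpu, throughput in rows:
    by_cpu[cpu].append(throughput)

# Average throughput per setting, printed from lowest to highest.
mean_throughput = {cpu: mean(values) for cpu, values in by_cpu.items()}
for cpu, value in sorted(mean_throughput.items(), key=lambda kv: kv[1]):
    print(f"{cpu}: {value:.1f} req/s")
```

On the full DataFrame the same grouping can be written as `df.groupby("requestsCpu")["throughputAchieved"].mean()`.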
