# Batch processing with Argo Workflows

In this notebook we will dive into how you can run batch processing with Argo Workflows and Seldon Core.

Dependencies:

* Seldon Core installed as per the docs with an ingress
* Minio running in your cluster to use as local (s3) object storage
* Argo Workflows installed in the cluster (and the `argo` CLI for commands)

### Setup

#### Install Seldon Core
Use the notebook to [set-up Seldon Core with Ambassador or Istio Ingress](https://docs.seldon.io/projects/seldon-core/en/latest/examples/seldon_core_setup.html).

Note: If running with KIND, make sure to follow [these steps](https://github.com/argoproj/argo-workflows/issues/2376#issuecomment-595593237) as a workaround for the known `/.../docker.sock` issue.

#### Set up Minio in your cluster
Use the notebook to [set-up Minio in your cluster](https://docs.seldon.io/projects/seldon-core/en/latest/examples/minio_setup.html).

#### Create rclone configuration
In this example, the workflow stages responsible for pulling data from and pushing data to the in-cluster MinIO S3 storage use the `rclone` CLI.
To configure the CLI, we create the following secret:

In [1]:
%%writefile rclone-config.yaml
apiVersion: v1
kind: Secret
metadata:
  name: rclone-config-secret
type: Opaque
stringData:
  rclone.conf: |
    [cluster-minio]
    type = s3
    provider = minio
    env_auth = false
    access_key_id = minioadmin
    secret_access_key = minioadmin
    endpoint = http://minio.minio-system.svc.cluster.local:9000

Overwriting rclone-config.yaml


In [2]:
!kubectl apply -n default -f rclone-config.yaml

secret/rclone-config-secret created
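
The `rclone.conf` payload in the secret is INI-style, and rclone addresses a remote as `<section-name>:<path>`, so the section name `cluster-minio` is what a workflow step would use as the remote name. As a minimal sketch (not part of the original notebook), the same text can be sanity-checked locally with Python's standard `configparser` before applying the secret; the `RCLONE_CONF` variable name is only illustrative.

```python
import configparser

# Same content as the rclone.conf key in rclone-config.yaml above.
RCLONE_CONF = """\
[cluster-minio]
type = s3
provider = minio
env_auth = false
access_key_id = minioadmin
secret_access_key = minioadmin
endpoint = http://minio.minio-system.svc.cluster.local:9000
"""

# Disable interpolation so that '%' characters in credentials are read literally.
parser = configparser.ConfigParser(interpolation=None)
parser.read_string(RCLONE_CONF)

remote = parser["cluster-minio"]
print(remote["type"], remote["provider"], remote["endpoint"])
```

If the section name or endpoint is mistyped, this check fails locally rather than inside a workflow pod.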


#### Install Argo Workflows
You can follow the instructions from the official [Argo Workflows Documentation](https://github.com/argoproj/argo#quickstart).

You also need to make sure that Argo has permission to create Seldon deployments. For this you can create a role:

In [3]:
%%writefile role.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: workflow
rules:
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - "*"
- apiGroups:
  - "apps"
  resources:
  - deployments
  verbs:
  - "*"
- apiGroups:
  - ""
  resources:
  - pods/log
  verbs:
  - "*"
- apiGroups:
  - machinelearning.seldon.io
  resources:
  - "*"
  verbs:
  - "*"

Overwriting role.yaml


In [4]:
!!kubectl apply -n default -f role.yaml

['role.rbac.authorization.k8s.io/workflow created']

A service account:

In [5]:
!kubectl create -n default serviceaccount workflow

serviceaccount/workflow created


And a role binding:

In [6]:
!kubectl create rolebinding workflow -n default --role=workflow --serviceaccount=default:workflow

rolebinding.rbac.authorization.k8s.io/workflow created


### Create some input for our model

We will create a file containing the inputs that will be sent to our model, one request payload per line:

In [7]:
mkdir -p assets/

In [8]:
import random

random.seed(0)
with open("assets/input-data.txt", "w") as f:
    for _ in range(10000):
        data = [random.random() for _ in range(4)]
        data = "[[" + ", ".join(str(x) for x in data) + "]]\n"
        f.write(data)

#### Check the contents of the file

In [9]:
!wc -l assets/input-data.txt
!head assets/input-data.txt

10000 assets/input-data.txt
[[0.8444218515250481, 0.7579544029403025, 0.420571580830845, 0.25891675029296335]]
[[0.5112747213686085, 0.4049341374504143, 0.7837985890347726, 0.30331272607892745]]
[[0.4765969541523558, 0.5833820394550312, 0.9081128851953352, 0.5046868558173903]]
[[0.28183784439970383, 0.7558042041572239, 0.6183689966753316, 0.25050634136244054]]
[[0.9097462559682401, 0.9827854760376531, 0.8102172359965896, 0.9021659504395827]]
[[0.3101475693193326, 0.7298317482601286, 0.8988382879679935, 0.6839839319154413]]
[[0.47214271545271336, 0.1007012080683658, 0.4341718354537837, 0.6108869734438016]]
[[0.9130110532378982, 0.9666063677707588, 0.47700977655271704, 0.8653099277716401]]
[[0.2604923103919594, 0.8050278270130223, 0.5486993038355893, 0.014041700164018955]]
[[0.7197046864039541, 0.39882354222426875, 0.824844977148233, 0.6681532012318508]]
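
Each line of the file is a JSON-encoded 2-D array holding a single row of four features. The following is a minimal sketch for checking that layout before uploading; `validate_input_lines` is a hypothetical helper written for this example, not part of Seldon Core, and the sample rows are copied from the output above.

```python
import json


def validate_input_lines(lines, n_features=4):
    """Return the number of valid rows; raise ValueError on the first bad line."""
    count = 0
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        row = json.loads(line)  # invalid JSON also raises a ValueError subclass
        # Expect [[f1, ..., fn]]: one instance with n_features numeric values.
        if (
            not isinstance(row, list)
            or len(row) != 1
            or not isinstance(row[0], list)
            or len(row[0]) != n_features
            or not all(isinstance(x, (int, float)) for x in row[0])
        ):
            raise ValueError(f"line {lineno}: unexpected shape {line!r}")
        count += 1
    return count


sample = [
    "[[0.8444218515250481, 0.7579544029403025, 0.420571580830845, 0.25891675029296335]]\n",
    "[[0.5112747213686085, 0.4049341374504143, 0.7837985890347726, 0.30331272607892745]]\n",
]
print(validate_input_lines(sample))
```

To check the real file, pass an open file handle instead of `sample`, for example `validate_input_lines(open("assets/input-data.txt"))`.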


#### Upload the file to our minio

In [10]:
!mc mb minio-seldon/data
!mc cp assets/input-data.txt minio-seldon/data/

Bucket created successfully `minio-seldon/data`.
...-data.txt:  820.96 KiB / 820.96 KiB ┃▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓┃ 71.44 MiB/s 0s

#### Create Argo Workflow

To keep things simple, the Argo workflow is created from a Helm chart, so its settings can be changed with `--set` values.

Before we dive into the contents of the full helm chart, let's first give it a try with some of the settings.

We will run a batch job that will set up a Seldon Deployment with 10 replicas and 100 batch client workers to send requests.

In [11]:
!helm template seldon-batch-workflow helm-charts/seldon-batch-workflow/ \
    --set workflow.name=seldon-batch-process \
    --set seldonDeployment.name=sklearn \
    --set seldonDeployment.replicas=10 \
    --set seldonDeployment.serverWorkers=1 \
    --set seldonDeployment.serverThreads=10 \
    --set batchWorker.workers=100 \
    --set batchWorker.payloadType=ndarray \
    --set batchWorker.dataType=data \
    | argo submit --serviceaccount workflow -

Name:                seldon-batch-process
Namespace:           default
ServiceAccount:      workflow
Status:              Pending
Created:             Fri Jan 15 11:44:56 +0000 (now)
Progress:            
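
When the same chart is rendered repeatedly with different settings, it can help to keep the values in a dictionary and build the `helm template` arguments from it. The sketch below only assembles the command line using the chart values from the cell above; `helm_template_command` is a hypothetical helper, not something the chart or Seldon Core provides.

```python
import shlex


def helm_template_command(release, chart_path, values):
    """Build the `helm template` argument list from a dict of chart values."""
    cmd = ["helm", "template", release, chart_path]
    for key, value in values.items():
        cmd += ["--set", f"{key}={value}"]
    return cmd


# Values taken from the notebook cell above.
values = {
    "workflow.name": "seldon-batch-process",
    "seldonDeployment.name": "sklearn",
    "seldonDeployment.replicas": 10,
    "seldonDeployment.serverWorkers": 1,
    "seldonDeployment.serverThreads": 10,
    "batchWorker.workers": 100,
    "batchWorker.payloadType": "ndarray",
    "batchWorker.dataType": "data",
}
cmd = helm_template_command(
    "seldon-batch-workflow", "helm-charts/seldon-batch-workflow/", values
)
print(shlex.join(cmd))
```

The rendered manifest still has to be piped into `argo submit --serviceaccount workflow -`, as in the cell above.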


In [12]:
!argo list -n default

NAME                   STATUS    AGE   DURATION   PRIORITY
seldon-batch-process   Running   10s   10s        0
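
Rather than re-running `argo list` by hand until the workflow finishes, the status can be polled from Python. This is a rough sketch, assuming the JSON returned by `argo get -o json` exposes the workflow state under `status.phase`; `get_phase` and `wait_for_workflow` are hypothetical helper names, and the demo at the end uses a stubbed phase sequence so it runs without a cluster.

```python
import json
import subprocess
import time

# Phases after which an Argo workflow no longer changes state.
TERMINAL_PHASES = {"Succeeded", "Failed", "Error"}


def get_phase(name, namespace="default"):
    """Read status.phase from `argo get -o json` (assumed JSON layout)."""
    out = subprocess.run(
        ["argo", "get", "-n", namespace, name, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return json.loads(out).get("status", {}).get("phase")


def wait_for_workflow(name, fetch_phase=get_phase, interval=5.0, timeout=600.0):
    """Poll until the workflow reaches a terminal phase or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        phase = fetch_phase(name)
        if phase in TERMINAL_PHASES:
            return phase
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{name} still in phase {phase!r} after {timeout}s")
        time.sleep(interval)


# Demo with a stubbed phase sequence instead of a real cluster.
fake_phases = iter(["Pending", "Running", "Succeeded"])
print(
    wait_for_workflow(
        "seldon-batch-process",
        fetch_phase=lambda _name: next(fake_phases),
        interval=0,
    )
)
```

Against a real cluster, calling `wait_for_workflow("seldon-batch-process")` with the default `fetch_phase` shells out to the `argo` CLI on each poll.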


In [14]:
!argo get -n default seldon-batch-process

Name:                seldon-batch-process
Namespace:           default
ServiceAccount:      workflow
Status:              Succeeded
Conditions:          
 Completed           True
Created:             Fri Jan 15 11:44:56 +0000 (2 minutes ago)
Started:             Fri Jan 15 11:44:56 +0000 (2 minutes ago)
Finished:            Fri Jan 15 11:47:00 +0000 (36 seconds ago)
Duration:            2 minutes 4 seconds
Progress:            6/6
ResourcesDuration:   2m18s*(1 cpu),2m18s*(100Mi memory)

STEP                           TEMPLATE                         PODNAME                          DURATION  MESSAGE
 ✔ seldon-batch-process        seldon-batch-process                                                          
 ├───✔ create-seldon-resource  create-seldon-resource-template  seldon-batch-process-3626514072  2s          
 ├───✔ wait-seldon-resource    wait-seldon-resource-template    seldon-batch-process-2052519094  31s         
 ├───✔ download-o

In [15]:
!argo -n default logs seldon-batch-process

seldon-batch-process-3626514072: time="2021-01-15T11:44:57.620Z" level=info msg="Starting Workflow Executor" version=v2.12.3
seldon-batch-process-3626514072: time="2021-01-15T11:44:57.622Z" level=info msg="Creating a K8sAPI executor"
seldon-batch-process-3626514072: time="2021-01-15T11:44:57.622Z" level=info msg="Executor (version: v2.12.3, build_date: 2021-01-05T00:54:54Z) initialized (pod: default/seldon-batch-process-3626514072) with template:\n{\"name\":\"create-seldon-resource-template\",\"arguments\":{},\"inputs\":{},\"outputs\":{},\"metadata\":{\"annotations\":{\"sidecar.istio.io/inject\":\"false\"}},\"resource\":{\"action\":\"create\",\"manifest\":\"apiVersion: machinelearning.seldon.io/v1\\nkind: SeldonDeployment\\nmetadata:\\n  name: \\\"sklearn\\\"\\n  namespace: default\\n  ownerReferences:\\n  - apiVersion: argoproj.io/v1alpha1\\n    blockOwnerDeletion: true\\n    kind: Workflow\\n    name: \\\"seldon-batch-process\\\"\\n    uid: \\\"511f64a2-0699-42

### Check output in object store

We can now visualise the output that we obtained in the object store.

First we can check that the file is present:

In [16]:
import json

wf_arr = !argo get -n default seldon-batch-process -o json
wf = json.loads("".join(wf_arr))
WF_UID = wf["metadata"]["uid"]
print(f"Workflow UID is {WF_UID}")

Workflow UID is 511f64a2-0699-42eb-897a-c0a57b24072c


In [17]:
!mc ls minio-seldon/data/output-data-"$WF_UID".txt

[2021-01-15 11:46:42 GMT]  3.4MiB output-data-511f64a2-0699-42eb-897a-c0a57b24072c.txt

Now we can copy the file locally with `mc cp` and print its first lines with `head`.

In [18]:
!mc cp minio-seldon/data/output-data-"$WF_UID".txt assets/output-data.txt
!head assets/output-data.txt

...4072c.txt:  3.36 MiB / 3.36 MiB ┃▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓┃ 192.59 MiB/s 0s
{"data": {"names": ["t:0", "t:1", "t:2"], "ndarray": [[0.1859090109477526, 0.46433848375587844, 0.349752505296369]]}, "meta": {"requestPath": {"classifier": "seldonio/sklearnserver:1.6.0-dev"}, "tags": {"tags": {"batch_id": "3c4000b8-5727-11eb-91c1-6e88dc41eb63", "batch_index": 1.0, "batch_instance_id": "3c40e1e0-5727-11eb-9fe5-6e88dc41eb63"}}}}
{"data": {"names": ["t:0", "t:1", "t:2"], "ndarray": [[0.1679456497678022, 0.42318259169768935, 0.4088717585345084]]}, "meta": {"requestPath": {"classifier": "seldonio/sklearnserver:1.6.0-dev"}, "tags": {"tags": {"batch_id": "3c4000b8-5727-11eb-91c1-6e88dc41eb63", "batch_index": 22.0, "batch_instance_id": "3c42efb2-5727-11eb-9fe5-6e88dc41eb63"}}}}
{"data": {"names": ["t:0", "t:1", "t:2"], "ndarray": [[0.5329356306409886, 0.2531124742231082, 0.21395189513590318]]}, "meta": {"requestPath": {"classifier": "seldonio/sklearnserver:1.6.0-dev"}, "tags": {
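
The first two records above carry `batch_index` values of 1.0 and 22.0, so the output lines are not necessarily in the same order as the input lines. The sketch below restores input order by sorting on that tag. It assumes every line has the shape shown above, with the index under `meta.tags.tags.batch_index`; `load_predictions` is a hypothetical helper, and the sample records are trimmed copies of the output.

```python
import json


def load_predictions(lines):
    """Parse JSON-lines output and return (batch_index, ndarray) pairs in input order."""
    results = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        record = json.loads(line)
        # The nested "tags" -> "tags" path mirrors the sample output above.
        index = int(record["meta"]["tags"]["tags"]["batch_index"])
        results.append((index, record["data"]["ndarray"]))
    results.sort(key=lambda pair: pair[0])
    return results


# Trimmed copies of two output records, deliberately out of order.
sample = [
    '{"data": {"names": ["t:0", "t:1", "t:2"], "ndarray": [[0.1679456497678022, 0.42318259169768935, 0.4088717585345084]]}, "meta": {"tags": {"tags": {"batch_index": 22.0}}}}',
    '{"data": {"names": ["t:0", "t:1", "t:2"], "ndarray": [[0.1859090109477526, 0.46433848375587844, 0.349752505296369]]}, "meta": {"tags": {"tags": {"batch_index": 1.0}}}}',
]
for index, ndarray in load_predictions(sample):
    print(index, ndarray)
```

For the full file, pass `open("assets/output-data.txt")` instead of `sample`.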

In [19]:
!argo delete -n default seldon-batch-process

Workflow 'seldon-batch-process' deleted
