# Batch processing with Argo Workflows

In this notebook we will dive into how you can run batch processing with Argo Workflows and Seldon Core.

Dependencies:

* Seldon Core installed as per the docs with an ingress
* Minio running in your cluster to use as local (S3) object storage
* Argo Workflows installed in the cluster (and the argo CLI for commands)


### Setup

#### Install Seldon Core
Use the notebook to [set up Seldon Core with Ambassador or Istio Ingress](https://docs.seldon.io/projects/seldon-core/en/latest/examples/seldon_core_setup.html).

Note: If running with KIND, make sure to follow [these steps](https://github.com/argoproj/argo/issues/2376#issuecomment-595593237) as a workaround for the known `/.../docker.sock` issue.

#### Set up Minio in your cluster
Use the notebook to [set up Minio in your cluster](https://docs.seldon.io/projects/seldon-core/en/latest/examples/minio_setup.html).

#### Copy the Minio Secret to namespace

We need to reuse the Minio secret for the batch job. The simplest way to do this is to copy the secret that was created in the `minio-system` namespace.

The command below copies the secret named "minio" from the `minio-system` namespace to the `default` namespace.

In [2]:
!kubectl get secret minio -n minio-system -o json | jq '{apiVersion,data,kind,metadata,type} | .metadata |= {"annotations", "name"}' | kubectl apply -n default -f -

secret/minio created
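
To make the `jq` filter above less opaque, the following Python sketch applies the same transformation to a dictionary. It keeps only the top-level fields `apiVersion`, `data`, `kind`, `metadata` and `type`, and reduces `metadata` to `annotations` and `name`, so namespace-specific fields are dropped before the secret is applied elsewhere. The function name `strip_secret`, the sample secret, and its key names and placeholder values are hypothetical and only illustrate the shape of the object.

```python
import json


def strip_secret(secret):
    """Mimic the jq filter: keep five top-level keys and trim metadata."""
    kept = {
        key: secret.get(key)
        for key in ("apiVersion", "data", "kind", "metadata", "type")
    }
    metadata = secret.get("metadata") or {}
    # Only annotations and name survive; missing keys become None (jq: null).
    kept["metadata"] = {
        "annotations": metadata.get("annotations"),
        "name": metadata.get("name"),
    }
    return kept


# Hypothetical secret, for illustration only (keys and values are placeholders).
example = {
    "apiVersion": "v1",
    "kind": "Secret",
    "type": "Opaque",
    "data": {"accesskey": "PLACEHOLDER", "secretkey": "PLACEHOLDER"},
    "metadata": {
        "name": "minio",
        "namespace": "minio-system",
        "uid": "hypothetical-uid",
        "annotations": {},
    },
}

stripped = strip_secret(example)
print(json.dumps(stripped, indent=2))
```

Dropping `namespace`, `uid` and similar fields is what allows `kubectl apply -n default` to create a fresh copy rather than trying to update the original object.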


#### Install Argo Workflows
You can follow the instructions from the official [Argo Workflows Documentation](https://github.com/argoproj/argo#quickstart).

You also need to make sure that Argo has permission to create Seldon deployments. For this you can create a role:

In [19]:
%%writefile role.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: workflow
rules:
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - "*"
- apiGroups:
  - "apps"
  resources:
  - deployments
  verbs:
  - "*"
- apiGroups:
  - ""
  resources:
  - pods/log
  verbs:
  - "*"
- apiGroups:
  - machinelearning.seldon.io
  resources:
  - "*"
  verbs:
  - "*"

Overwriting role.yaml


In [21]:
!!kubectl apply -f role.yaml

['role.rbac.authorization.k8s.io/workflow configured']

A service account:

In [22]:
!kubectl create serviceaccount workflow

Error from server (AlreadyExists): serviceaccounts "workflow" already exists


And a role binding:

In [12]:
!kubectl create rolebinding workflow --role=workflow --serviceaccount=default:workflow

rolebinding.rbac.authorization.k8s.io/workflow created


### Create some input for our model

We will create a file containing the inputs that will be sent to our model.

In [13]:
mkdir -p assets/

In [14]:
with open("assets/input-data.txt", "w") as f:
    for i in range(10000):
        f.write('[[1, 2, 3, 4]]\n')

#### Check the contents of the file

In [15]:
!wc -l assets/input-data.txt
!head assets/input-data.txt

10000 assets/input-data.txt
[[1, 2, 3, 4]]
[[1, 2, 3, 4]]
[[1, 2, 3, 4]]
[[1, 2, 3, 4]]
[[1, 2, 3, 4]]
[[1, 2, 3, 4]]
[[1, 2, 3, 4]]
[[1, 2, 3, 4]]
[[1, 2, 3, 4]]
[[1, 2, 3, 4]]
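
Before uploading, it can be worth checking that every line is well formed. The sketch below assumes each line should be a JSON-encoded 2-D array like the ones written above. The helper name `check_input_lines` is hypothetical, and the check runs on a small in-memory sample so that it is self-contained.

```python
import json


def check_input_lines(lines):
    """Return the number of valid rows; raise ValueError on the first bad one."""
    count = 0
    for number, line in enumerate(lines, start=1):
        # json.JSONDecodeError is a subclass of ValueError.
        row = json.loads(line)
        is_2d = isinstance(row, list) and all(isinstance(r, list) for r in row)
        if not is_2d:
            raise ValueError(f"line {number} is not a 2-D array: {line!r}")
        count += 1
    return count


# Same content as the generated file, shortened to three lines.
sample = ["[[1, 2, 3, 4]]\n"] * 3
print(check_input_lines(sample))
```

To run the same check against the real file, pass the open file object instead of `sample`, for example `check_input_lines(open("assets/input-data.txt"))`.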


#### Upload the file to our minio

In [16]:
!mc mb minio-seldon/data
!mc cp assets/input-data.txt minio-seldon/data/

[m[32;1mBucket created successfully `minio-seldon/data`.[0m
...-data.txt:  146.48 KiB / 146.48 KiB ┃▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓┃ 14.57 MiB/s 0s[0m[0m[m[32;1m

#### Create Argo Workflow

To keep things simple, the Argo workflow is packaged as a Helm chart that you can configure through its values.

Before we dive into the contents of the full helm chart, let's first give it a try with some of the settings.

We will run a batch job that will set up a Seldon Deployment with 10 replicas and 100 batch client workers to send requests.

In [32]:
!helm template seldon-batch-workflow helm-charts/seldon-batch-workflow/ \
    --set workflow.name=seldon-batch-process \
    --set seldonDeployment.name=sklearn \
    --set seldonDeployment.replicas=10 \
    --set seldonDeployment.serverWorkers=1 \
    --set seldonDeployment.serverThreads=10 \
    --set batchWorker.workers=100 \
    --set batchWorker.payloadType=ndarray \
    --set batchWorker.dataType=data \
    | argo submit --serviceaccount workflow -

Name:                seldon-batch-process
Namespace:           default
ServiceAccount:      workflow
Status:              Pending
Created:             Thu Nov 19 15:09:43 +0000 (now)
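
The `--set` flags above are plain `key=value` overrides, so they can be generated from a dictionary when you want to script several runs with different settings. This is a minimal sketch, assuming the same chart path and value names used in the command above. The helper name `helm_template_args` is hypothetical.

```python
def helm_template_args(release, chart, values):
    """Build the argument list for `helm template` with one --set per value."""
    args = ["helm", "template", release, chart]
    for key, value in values.items():
        args += ["--set", f"{key}={value}"]
    return args


# Values copied from the command above.
values = {
    "workflow.name": "seldon-batch-process",
    "seldonDeployment.name": "sklearn",
    "seldonDeployment.replicas": 10,
    "seldonDeployment.serverWorkers": 1,
    "seldonDeployment.serverThreads": 10,
    "batchWorker.workers": 100,
    "batchWorker.payloadType": "ndarray",
    "batchWorker.dataType": "data",
}

args = helm_template_args(
    "seldon-batch-workflow", "helm-charts/seldon-batch-workflow/", values
)
print(" ".join(args))
```

The resulting list only renders the chart. The rendered manifest would still need to be piped to `argo submit --serviceaccount workflow -` as in the cell above.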


In [33]:
!argo list

NAME                   STATUS    AGE   DURATION   PRIORITY
seldon-batch-process   Running   1m    1m         0


In [25]:
!argo get seldon-batch-process

Name:                seldon-batch-process
Namespace:           default
ServiceAccount:      workflow
Status:              Succeeded
Conditions:          
 Completed           True
Created:             Thu Nov 19 14:20:58 +0000 (27 minutes ago)
Started:             Thu Nov 19 14:20:58 +0000 (27 minutes ago)
Finished:            Thu Nov 19 14:25:30 +0000 (23 minutes ago)
Duration:            4 minutes 32 seconds
ResourcesDuration:   7m52s*(1 cpu),7m52s*(100Mi memory)

STEP                           TEMPLATE                         PODNAME                          DURATION  MESSAGE
 ✔ seldon-batch-process        seldon-batch-process                                                          
 ├---✔ create-seldon-resource  create-seldon-resource-template  seldon-batch-process-3626514072  1s          
 ├---✔ wait-seldon-resource    wait-seldon-resource-template    seldon-batch-process-2052519094  38s         
 ├---✔ download-object-

In [27]:
!argo logs -w seldon-batch-process || argo logs seldon-batch-process # The 2nd command is for argo 2.8+

seldon-batch-process-3626514072: time="2020-11-19T14:20:59.511Z" level=info msg="Starting Workflow Executor" version=v2.11.7
seldon-batch-process-3626514072: time="2020-11-19T14:20:59.514Z" level=info msg="Creating a K8sAPI executor"
seldon-batch-process-3626514072: time="2020-11-19T14:20:59.514Z" level=info msg="Executor (version: v2.11.7, build_date: 2020-11-02T21:05:12Z) initialized (pod: default/seldon-batch-process-3626514072) with template:\n{\"name\":\"create-seldon-resource-template\",\"arguments\":{},\"inputs\":{},\"outputs\":{},\"metadata\":{},\"resource\":{\"action\":\"create\",\"manifest\":\"apiVersion: machinelearning.seldon.io/v1\\nkind: SeldonDeployment\\nmetadata:\\n  name: \\\"sklearn\\\"\\n  namespace: default\\n  ownerReferences:\\n  - apiVersion: argoproj.io/v1alpha1\\n    blockOwnerDeletion: true\\n    kind: Workflow\\n    name: \\\"seldon-batch-process\\\"\\n    uid: \\\"3dc52b6d-937c-47a8-b5f7-d3ca99c74f4e\\\"\\nspec:\\n  name: \\\"sklear

### Check output in object store

We can now visualise the output that we obtained in the object store.

First we can check that the file is present:

In [34]:
import json
wf_arr = !argo get seldon-batch-process -o json
wf = json.loads("".join(wf_arr))
WF_ID = wf["metadata"]["uid"]
print(f"Workflow ID is {WF_ID}")

Workflow ID is 323b666e-c546-431f-a64d-5fc780d68a18


In [35]:
!mc ls minio-seldon/data/output-data-"$WF_ID".txt

[m[32m[2020-11-19 15:11:17 GMT][0m[33m 2.7MiB[0m[1m output-data-323b666e-c546-431f-a64d-5fc780d68a18.txt[0m
[0m

Now we can copy the file locally with `mc cp` and print its first lines with `head`.

In [36]:
!mc cp minio-seldon/data/output-data-"$WF_ID".txt assets/output-data.txt
!head assets/output-data.txt

...68a18.txt:  2.75 MiB / 2.75 MiB ┃▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓┃ 159.35 MiB/s 0s
{"data": {"names": ["t:0", "t:1", "t:2"], "ndarray": [[0.0006985194531162841, 0.003668039039435755, 0.9956334415074478]]}, "meta": {"tags": {"tags": {"batch_id": "60b44a9c-2a79-11eb-b0aa-820f6474333b", "batch_index": 0.0, "batch_instance_id": "60b49614-2a79-11eb-9b0d-820f6474333b"}}}}
{"data": {"names": ["t:0", "t:1", "t:2"], "ndarray": [[0.0006985194531162841, 0.003668039039435755, 0.9956334415074478]]}, "meta": {"tags": {"tags": {"batch_id": "60b44a9c-2a79-11eb-b0aa-820f6474333b", "batch_index": 3.0, "batch_instance_id": "60b50932-2a79-11eb-9b0d-820f6474333b"}}}}
{"data": {"names": ["t:0", "t:1", "t:2"], "ndarray": [[0.0006985194531162841, 0.003668039039435755, 0.9956334415074478]]}, "meta": {"tags": {"tags": {"batch_id": "60b44a9c-2a79-11eb-b0aa-820f6474333b", "batch_index": 1.0, "batch_instance_id": "60b49d9e-2a79-11eb-9b0d-820f6474333b"}}}}
{"data": {"names": ["t:0", "t:1", "t:2"], "ndarray": [
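
Note that the results above are not in input order (`batch_index` 0, 3, 1), presumably because many batch workers write concurrently. The sketch below shows one way to restore the order by parsing each JSON line and sorting on `meta.tags.tags.batch_index`. The helper name `load_sorted_predictions` is hypothetical, and the sample lines are trimmed copies of the records shown above.

```python
import json


def load_sorted_predictions(lines):
    """Parse JSON lines and return the records ordered by batch_index."""
    records = [json.loads(line) for line in lines if line.strip()]
    return sorted(
        records, key=lambda r: r["meta"]["tags"]["tags"]["batch_index"]
    )


# Trimmed records taken from the output above (other fields omitted).
prediction = "[[0.0006985194531162841, 0.003668039039435755, 0.9956334415074478]]"
sample = [
    '{"data": {"ndarray": %s}, "meta": {"tags": {"tags": {"batch_index": %s}}}}'
    % (prediction, index)
    for index in ("0.0", "3.0", "1.0")
]

ordered = load_sorted_predictions(sample)
print([r["meta"]["tags"]["tags"]["batch_index"] for r in ordered])
```

For the real file, the same function can be given the open file object for `assets/output-data.txt` instead of `sample`.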

In [37]:
!argo delete seldon-batch-process

Workflow 'seldon-batch-process' deleted
