## Batch processing with Argo Workflows

In this notebook we will dive into how you can run batch processing with Argo Workflows and Seldon Core.

Dependencies:

* Seldon Core installed as per the docs with an ingress
* Argo Workflows installed in the cluster (and the argo CLI for commands)


## Seldon Core Batch with Object Store

In some cases we may want to read the input data from an object store.

Here we will show how you can do this, using Minio as the object store.

The workflow will look as follows:

![](assets/seldon-batch.jpg)

For this we will assume you have installed the Minio client CLI (`mc`). We will use a Minio instance running in the cluster, but you can use another object store provider such as S3, Google Cloud, Azure, etc.

### Set up Minio in your cluster

In [481]:
%%bash 
helm install minio stable/minio \
    --set accessKey=minioadmin \
    --set secretKey=minioadmin \
    --set image.tag=RELEASE.2020-04-15T19-42-18Z

NAME: minio
LAST DEPLOYED: Thu Apr 30 10:57:00 2020
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Minio can be accessed via port 9000 on the following DNS name from within your cluster:
minio.default.svc.cluster.local

To access Minio from localhost, run the below commands:

  1. export POD_NAME=$(kubectl get pods --namespace default -l "release=minio" -o jsonpath="{.items[0].metadata.name}")

  2. kubectl port-forward $POD_NAME 9000 --namespace default

Read more about port forwarding here: http://kubernetes.io/docs/user-guide/kubectl/kubectl_port-forward/

You can now access Minio server on http://localhost:9000. Follow the below steps to connect to Minio server with mc client:

  1. Download the Minio mc client - https://docs.minio.io/docs/minio-client-quickstart-guide

  2. mc config host add minio-local http://localhost:9000 minioadmin minioadmin S3v4

  3. mc ls minio-local

Alternately, you can use your browser or the Minio SDK to access the server - ht

### Forward the Minio port so you can access it

You can do this by running the following command in your terminal:
```
kubectl port-forward svc/minio 9000:9000
```

### Configure local minio client

In [483]:
!mc config host add minio-local http://localhost:9000 minioadmin minioadmin

[m[32mAdded `minio-local` successfully.[0m
[0m

### Create some input for our model

We will create a file containing the inputs to send to our model, one request payload per line.

In [123]:
with open("assets/input-data.txt", "w") as f:
    for i in range(10000):
        f.write('[[1, 2, 3, 4]]\n')

### Check the contents of the file

In [124]:
!wc -l assets/input-data.txt
!head assets/input-data.txt

10000 assets/input-data.txt
[[1, 2, 3, 4]]
[[1, 2, 3, 4]]
[[1, 2, 3, 4]]
[[1, 2, 3, 4]]
[[1, 2, 3, 4]]
[[1, 2, 3, 4]]
[[1, 2, 3, 4]]
[[1, 2, 3, 4]]
[[1, 2, 3, 4]]
[[1, 2, 3, 4]]
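
Before uploading, it can be useful to confirm that every line parses as a JSON payload. The following is a minimal sketch, assuming each line should be a JSON list of rows as in the file above; `validate_batch_lines` is a hypothetical helper and not part of Seldon Core.

```python
import json


def validate_batch_lines(lines):
    """Return (line_number, reason) pairs for lines that are not a JSON list of rows."""
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            problems.append((number, "empty line"))
            continue
        try:
            payload = json.loads(text)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        # Assumption: a valid payload is a list whose items are themselves lists (rows).
        if not (isinstance(payload, list) and all(isinstance(row, list) for row in payload)):
            problems.append((number, "expected a list of rows"))
    return problems


# Small in-memory sample: one valid line, one with a missing bracket, one empty.
sample = ["[[1, 2, 3, 4]]\n", "[[1, 2, 3, 4]\n", "\n"]
for number, reason in validate_batch_lines(sample):
    print(f"line {number}: {reason}")
```

To check the real file, pass the open file object instead of `sample`, for example `validate_batch_lines(open("assets/input-data.txt"))`.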


### Upload the file to our Minio bucket

In [126]:
!mc mb minio-local/data
!mc cp assets/input-data.txt minio-local/data/

mc: <ERROR> Unable to make bucket `minio-local/data`. Your previous request to create the named bucket succeeded and you already own it.
...-data.txt:  146.48 KiB / 146.48 KiB ┃▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓┃ 2.36 MiB/s 0s

### Create Argo Workflow

To make it simple to create the Argo workflow, we provide a Helm chart that you can configure through its values.

Before we dive into the contents of the full helm chart, let's first give it a try with some of the settings.

We will run a batch job that will set up a Seldon Deployment with 10 replicas and 100 batch client workers to send requests.

In [132]:
!helm template seldon-batch-workflow helm-charts/seldon-batch-workflow/ \
    --set workflow.name=seldon-batch-process \
    --set seldonDeployment.name=sklearn \
    --set seldonDeployment.replicas=10 \
    --set batchWorker.workers=100 \
    --set batchWorker.payloadType=data \
    --set batchWorker.dataType=ndarray \
    | argo submit -

Name:                seldon-batch-process
Namespace:           default
ServiceAccount:      default
Status:              Pending
Created:             Thu Jun 04 18:48:39 +0100 (now)
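
If you want to vary these settings from Python (for example, to try different replica or worker counts), the `--set` flags can be generated from a dictionary. This is only a sketch: `helm_set_args` is a hypothetical helper, the keys are the ones used in the command above, and the code only builds and prints the command rather than running it.

```python
def helm_set_args(values):
    """Turn a flat dict of chart values into a list of `--set key=value` arguments."""
    args = []
    for key, value in values.items():
        args += ["--set", f"{key}={value}"]
    return args


# Same settings as the helm template command above.
settings = {
    "workflow.name": "seldon-batch-process",
    "seldonDeployment.name": "sklearn",
    "seldonDeployment.replicas": 10,
    "batchWorker.workers": 100,
    "batchWorker.payloadType": "data",
    "batchWorker.dataType": "ndarray",
}

command = [
    "helm",
    "template",
    "seldon-batch-workflow",
    "helm-charts/seldon-batch-workflow/",
] + helm_set_args(settings)

# Print the command for inspection; its output would still be piped to `argo submit -`.
print(" ".join(command))
```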


In [133]:
!argo list

NAME                   STATUS      AGE   DURATION   PRIORITY
seldon-batch-process   Succeeded   1m    1m         0


In [134]:
!argo get seldon-batch-process

Name:                seldon-batch-process
Namespace:           default
ServiceAccount:      default
Status:              Succeeded
Created:             Thu Jun 04 18:48:39 +0100 (1 minute ago)
Started:             Thu Jun 04 18:48:39 +0100 (1 minute ago)
Finished:            Thu Jun 04 18:50:23 +0100 (3 seconds ago)
Duration:            1 minute 44 seconds

STEP                                                             PODNAME                          DURATION  MESSAGE
 ✔ seldon-batch-process (seldon-batch-process)
 ├---✔ create-seldon-resource (create-seldon-resource-template)  seldon-batch-process-3626514072  2s
 ├---✔ wait-seldon-resource (wait-seldon-resource-template)      seldon-batch-process-2052519094  27s
 ├---✔ download-object-store (download-object-store-template)    seldon-batch-process-1257652469  3s
 ├---✔ process-batch

In [135]:
!argo logs -w seldon-batch-process 

create-seldon-resource:	time="2020-06-04T17:48:40Z" level=info msg="Starting Workflow Executor" version=v2.8.0-rc4+8f69617.dirty
create-seldon-resource:	time="2020-06-04T17:48:40Z" level=info msg="Creating a docker executor"
create-seldon-resource:	time="2020-06-04T17:48:40Z" level=info msg="Executor (version: v2.8.0-rc4+8f69617.dirty, build_date: 2020-05-12T15:17:15Z) initialized (pod: default/seldon-batch-process-3626514072) with template:\n{\"name\":\"create-seldon-resource-template\",\"arguments\":{},\"inputs\":{},\"outputs\":{},\"metadata\":{},\"resource\":{\"action\":\"create\",\"manifest\":\"apiVersion: machinelearning.seldon.io/v1\\nkind: SeldonDeployment\\nmetadata:\\n  name: \\\"sklearn\\\"\\n  namespace: default\\n  ownerReferences:\\n  - apiVersion: argoproj.io/v1alpha1\\n    blockOwnerDeletion: true\\n    kind: Workflow\\n    name: \\\"seldon-batch-process\\\"\\n    uid: \\\"d80adc37-5794-45c5-9fb6-0d1e90ba84d0\\\"\\nspec:\\n  name: \\\"sklearn

## Check output in object store

We can now visualise the output that we obtained in the object store.

First we retrieve the workflow ID, which forms part of the output file name, and then check that the file is present:

In [136]:
import json
wf_arr = !argo get seldon-batch-process -o json
wf = json.loads("".join(wf_arr))
WF_ID = wf["metadata"]["uid"]
print(f"Workflow ID is {WF_ID}")

Workflow ID is d80adc37-5794-45c5-9fb6-0d1e90ba84d0


In [138]:
!mc ls minio-local/data/output-data-"$WF_ID".txt

[m[32m[2020-06-04 18:50:20 BST] [0m[33m 1.9MiB [0m[1moutput-data-d80adc37-5794-45c5-9fb6-0d1e90ba84d0.txt[0m
[0m

Now we can copy the output file locally with `mc cp` and print its first lines with `head`.

In [139]:
!mc cp minio-local/data/output-data-"$WF_ID".txt assets/output-data.txt
!head assets/output-data.txt

...a84d0.txt:  1.91 MiB / 1.91 MiB ┃▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓┃ 27.26 MiB/s 0s
{"data": {"names": ["t:0", "t:1", "t:2"], "ndarray": [[0.0006985194531162841, 0.003668039039435755, 0.9956334415074478]]}, "meta": {"tags": {"tags": {"puid": "b90c2bee-a68b-11ea-9fbf-3587242bbd96"}}}}
{"data": {"names": ["t:0", "t:1", "t:2"], "ndarray": [[0.0006985194531162841, 0.003668039039435755, 0.9956334415074478]]}, "meta": {"tags": {"tags": {"puid": "b914e0fa-a68b-11ea-a606-2987bbff97f2"}}}}
{"data": {"names": ["t:0", "t:1", "t:2"], "ndarray": [[0.0006985194531162841, 0.003668039039435755, 0.9956334415074478]]}, "meta": {"tags": {"tags": {"puid": "b90bfcc0-a68b-11ea-9052-7921f7c31425"}}}}
{"data": {"names": ["t:0", "t:1", "t:2"], "ndarray": [[0.0006985194531162841, 0.003668039039435755, 0.9956334415074478]]}, "meta": {"tags": {"tags": {"puid": "b9253a1e-a68b-11ea-9b7a-a2bea649c686"}}}}
{"data": {"names": ["t:0", "t:1", "t:2"], "ndarray": [[0.0006985194531162841, 0.003668039039435755
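
Each output line is a JSON document, so the results can be post-processed with the standard library. The sketch below assumes the response layout shown above (`data.names`, `data.ndarray` and the nested `meta.tags.tags.puid`); `summarise_prediction` is a hypothetical helper that picks the highest-scoring class for the first row of a response.

```python
import json


def summarise_prediction(line):
    """Return the request id and top-scoring class for one output line."""
    record = json.loads(line)
    names = record["data"]["names"]
    scores = record["data"]["ndarray"][0]  # first (and here only) row of the response
    best = max(range(len(scores)), key=scores.__getitem__)
    return {
        "puid": record["meta"]["tags"]["tags"]["puid"],
        "label": names[best],
        "score": scores[best],
    }


# First line of the output shown above.
sample_line = (
    '{"data": {"names": ["t:0", "t:1", "t:2"], '
    '"ndarray": [[0.0006985194531162841, 0.003668039039435755, 0.9956334415074478]]}, '
    '"meta": {"tags": {"tags": {"puid": "b90c2bee-a68b-11ea-9fbf-3587242bbd96"}}}}'
)
print(summarise_prediction(sample_line))
```

To process the whole file, apply the helper to each line of `assets/output-data.txt`. Since the output carries a `puid` per response, do not assume the lines are in the same order as the input.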

In [140]:
!argo delete seldon-batch-process

Workflow 'seldon-batch-process' deleted
