## Batch processing with Argo Workflows

In this notebook we will dive into how you can run batch processing with Argo Workflows and Seldon Core.

Dependencies:

* Seldon Core installed as per the docs, with an ingress
* Argo Workflows installed in the cluster (and the `argo` CLI for the commands below)


## Argo Workflows Example

Let's start with a plain Argo Workflows example to see how it works.

The workflow below has three steps: `hello1` runs first, and `hello2a` and `hello2b` run in parallel once it completes.

In [1]:
mkdir -p assets

In [12]:
%%writefile assets/argo-example.yaml
---
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: steps-
spec:
  entrypoint: hello-hello-hello
  # This spec contains two templates: hello-hello-hello and whalesay
  templates:
  - name: hello-hello-hello
    # Instead of just running a container
    # This template has a sequence of steps
    steps:
    - - name: hello1            # hello1 is run before the following steps
        template: whalesay
        arguments:
          parameters:
          - name: message
            value: "hello1"
    - - name: hello2a           # double dash => run after previous step
        template: whalesay
        arguments:
          parameters:
          - name: message
            value: "hello2a"
      - name: hello2b           # single dash => run in parallel with previous step
        template: whalesay
        arguments:
          parameters:
          - name: message
            value: "hello2b"
  # The whalesay template prints its `message` input parameter with cowsay
  - name: whalesay
    inputs:
      parameters:
      - name: message
    container:
      image: docker/whalesay
      command: [cowsay]
      args: ["{{inputs.parameters.message}}"]
        

Overwriting assets/argo-example.yaml
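
The nesting of the `steps` list in the workflow above decides what runs sequentially and what runs in parallel. The following is a minimal Python sketch of that rule, not Argo's own scheduler: the outer list is walked in order, and the names inside one inner list are treated as a parallel group. The function name `execution_plan` is hypothetical and only used for illustration.

```python
# Argo `steps` is a list of lists: outer entries run one after another,
# entries inside the same inner list run in parallel.
steps = [
    ["hello1"],              # first group: runs alone
    ["hello2a", "hello2b"],  # second group: both start after the first group
]


def execution_plan(steps):
    """Return (stage_number, step_names) pairs in scheduling order."""
    return [(stage, list(group)) for stage, group in enumerate(steps, start=1)]


for stage, names in execution_plan(steps):
    print(f"stage {stage}: {' | '.join(names)}")
```

Each printed stage corresponds to one `- -` entry in the YAML, and the names joined by `|` are the steps that share that entry.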


In [13]:
!argo submit assets/argo-example.yaml

Name:                steps-9tgj9
Namespace:           default
ServiceAccount:      default
Status:              Pending
Created:             Fri Apr 17 08:27:14 +0100 (8 hours ago)


In [14]:
!argo list

NAME          STATUS      AGE   DURATION   PRIORITY
steps-9tgj9   Succeeded   8h    3m         0


In [20]:
output=!argo list | grep steps
WF_NAME=output[0].split()[0]
print(WF_NAME)

steps-9tgj9
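
The cell above takes the first line that matches `grep steps`, which is enough when only one such workflow exists. As a hedged alternative, the table printed by `argo list` can be parsed into dictionaries so that rows can be filtered by status or name. This sketch assumes the whitespace-separated column layout shown in the output above and that no column value contains spaces; `parse_argo_list` is a hypothetical helper, not part of the Argo CLI.

```python
def parse_argo_list(text):
    """Parse the whitespace-separated table printed by `argo list` into dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


# Sample copied from the `argo list` output shown earlier in this notebook.
sample = """\
NAME          STATUS      AGE   DURATION   PRIORITY
steps-9tgj9   Succeeded   8h    3m         0
"""

rows = parse_argo_list(sample)
succeeded = [row["NAME"] for row in rows if row["STATUS"] == "Succeeded"]
print(succeeded)
```

In the notebook, the captured `output` list could be joined with `"\n".join(output)` before being passed to the helper.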


In [23]:
!argo get $WF_NAME

Name:                steps-9tgj9
Namespace:           default
ServiceAccount:      default
Status:              Succeeded
Created:             Fri Apr 17 08:27:14 +0100 (8 hours ago)
Started:             Fri Apr 17 08:27:14 +0100 (8 hours ago)
Finished:            Fri Apr 17 08:30:48 +0100 (8 hours ago)
Duration:            3 minutes 34 seconds

STEP                                PODNAME                 DURATION  MESSAGE
 ✔ steps-9tgj9 (hello-hello-hello)                                    
 ├---✔ hello1 (whalesay)            steps-9tgj9-3240403473  3m        
 └-·-✔ hello2a (whalesay)           steps-9tgj9-3510808138  3s        
   └-✔ hello2b (whalesay)           steps-9tgj9-3494030519  5s        


In [22]:
!argo logs -w $WF_NAME

hello1:	 ________ 
hello1:	< hello1 >
hello1:	 -------- 
hello1:	    \
hello1:	     \
hello1:	      \     
hello1:	                    ##        .            
hello1:	              ## ## ##       ==            
hello1:	           ## ## ## ##      ===            
hello1:	       /""""""""""""""""___/ ===        
hello1:	  ~~~ {~~ ~~~~ ~~~ ~~~~ ~~ ~ /  ===- ~~~   
hello1:	       \______ o          __/            
hello1:	        \    \        __/             
hello1:	          \____\______/   
hello2a:	 _________ 
hello2a:	< hello2a >
hello2a:	 --------- 
hello2a:	    \
hello2a:	     \
hello2a:	      \     
hello2a:	                    ##        .            
hello2a:	              ## ## ##       ==            
hello2a:	           ## ## ## ##      ===            

In [25]:
!argo delete $WF_NAME

Workflow 'steps-9tgj9' deleted


## Seldon Core Batch 
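Now we can use the same mechanism to run a batch job against a Seldon Core model. The workflow below creates a SeldonDeployment, waits for its rollout to finish, and then sends it a batch of requests.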

In [47]:
%%writefile assets/seldon-batch.yaml
---
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: seldon-batch-
spec:
  entrypoint: seldon-batch-process
  templates:
  - name: seldon-batch-process
    steps:
    - - name: create-seldon-resource            
        template: create-seldon-resource-template
    - - name: wait-seldon-resource
        template: wait-seldon-resource-template
    - - name: process-batch-inputs
        template: process-batch-inputs-template
            
  - name: create-seldon-resource-template
    resource:
      action: create
      manifest: |
        apiVersion: machinelearning.seldon.io/v1
        kind: SeldonDeployment
        metadata:
          name: "{{workflow.uid}}"
          ownerReferences:
          - apiVersion: argoproj.io/v1alpha1
            blockOwnerDeletion: true
            kind: Workflow
            name: "{{workflow.name}}"
            uid: "{{workflow.uid}}"
        spec:
          name: "{{workflow.uid}}"
          predictors:
            - graph:
                children: []
                implementation: SKLEARN_SERVER
                modelUri: gs://seldon-models/sklearn/iris
                name: classifier
              name: default
              replicas: 1
                
  - name: wait-seldon-resource-template
    script:
      image: seldonio/core-builder:0.14
      command: [bash]
      source: |
        kubectl rollout status deploy/$(kubectl get deploy -l seldon-deployment-id="{{workflow.uid}}" -o jsonpath='{.items[0].metadata.name}')
        
  - name: process-batch-inputs-template
    script:
      image: seldonio/seldon-core-s2i-python3:1.1.1-SNAPSHOT
      command: [python]
      source: |
        from seldon_core.seldon_client import SeldonClient
        import numpy as np
        import time
        time.sleep(10)
        sc = SeldonClient(
            gateway_endpoint="istio-ingressgateway.istio-system.svc.cluster.local",
            deployment_name="{{workflow.uid}}",
            namespace="default")
        for i in range(10):
            data = np.array([[i, i, i, i]])
            output = sc.predict(data=data)
            print(output.response)
            

Overwriting assets/seldon-batch.yaml
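
The processing step in the workflow above pauses for a fixed `time.sleep(10)` before sending requests. A polling loop with a timeout is one possible alternative. The sketch below is an illustration under assumptions, not what the workflow runs: `wait_until` is a hypothetical helper, and the readiness check is simulated so the example runs without a cluster.

```python
import time


def wait_until(check, timeout=60.0, interval=2.0):
    """Call `check()` until it returns True or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            if check():
                return True
        except Exception:
            # Treat errors (for example a refused connection) as "not ready yet".
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Simulated readiness check: succeeds on the third attempt.
attempts = {"n": 0}


def ready_on_third_try():
    attempts["n"] += 1
    return attempts["n"] >= 3


print(wait_until(ready_on_third_try, timeout=5, interval=0.01))
print(attempts["n"])
```

Inside the workflow script, `check` could wrap a single test prediction and report whether it succeeded; that wiring is an assumption and is not exercised here.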


In [48]:
!argo submit assets/seldon-batch.yaml

Name:                seldon-batch-wxbr5
Namespace:           default
ServiceAccount:      default
Status:              Pending
Created:             Sat Apr 18 12:36:57 +0100 (now)


In [49]:
!argo list

NAME                 STATUS      AGE   DURATION   PRIORITY
seldon-batch-wxbr5   Running     2s    2s         0
seldon-batch-kslgh   Succeeded   2m    42s        0


In [50]:
output=!argo list | grep seldon-batch
WF_NAME=output[0].split()[0]
print(WF_NAME)

seldon-batch-wxbr5


In [51]:
!argo get $WF_NAME

Name:                seldon-batch-wxbr5
Namespace:           default
ServiceAccount:      default
Status:              Running
Created:             Sat Apr 18 12:36:57 +0100 (3 seconds ago)
Started:             Sat Apr 18 12:36:57 +0100 (3 seconds ago)
Duration:            3 seconds

STEP                                                             PODNAME                        DURATION  MESSAGE
 ● seldon-batch-wxbr5 (seldon-batch-process)                                                              
 └---● create-seldon-resource (create-seldon-resource-template)  seldon-batch-wxbr5-2588046603  3s        


In [54]:
!argo logs -w $WF_NAME

create-seldon-resource:	time="2020-04-18T11:36:58Z" level=info msg="Starting Workflow Executor" version=vv2.7.4+50b209c.dirty
create-seldon-resource:	time="2020-04-18T11:36:58Z" level=info msg="Creating a docker executor"
create-seldon-resource:	time="2020-04-18T11:36:58Z" level=info msg="Executor (version: vv2.7.4+50b209c.dirty, build_date: 2020-04-16T16:37:57Z) initialized (pod: default/seldon-batch-wxbr5-2588046603) with template:\n{\"name\":\"create-seldon-resource-template\",\"arguments\":{},\"inputs\":{},\"outputs\":{},\"metadata\":{},\"resource\":{\"action\":\"create\",\"manifest\":\"apiVersion: machinelearning.seldon.io/v1\\nkind: SeldonDeployment\\nmetadata:\\n  name: \\\"b83971e5-e0e9-488f-9bd0-4c57dc97c79c\\\"\\n  ownerReferences:\\n  - apiVersion: argoproj.io/v1alpha1\\n    blockOwnerDeletion: true\\n    kind: Workflow\\n    name: \\\"seldon-batch-wxbr5\\\"\\n    uid: \\\"b83971e5-e0e9-488f-9bd0-4c57dc97c79c\\\"\\nspec:\\n  name: \\\"b83971e5-e0

In [55]:
outputs = !(argo logs -w $WF_NAME --no-color | grep "process-batch-inputs" | cut -c 23-)
for o in outputs:
    print(o)

{'data': {'names': ['t:0', 't:1', 't:2'], 'tensor': {'shape': [1, 3], 'values': [0.3664487684438811, 0.48528762951761806, 0.14826360203850078]}}, 'meta': {}}
{'data': {'names': ['t:0', 't:1', 't:2'], 'tensor': {'shape': [1, 3], 'values': [0.2075509473561692, 0.2443463805811625, 0.5481026720626684]}}, 'meta': {}}
{'data': {'names': ['t:0', 't:1', 't:2'], 'tensor': {'shape': [1, 3], 'values': [0.06995304386311439, 0.04864300564562103, 0.8814039504912645]}}, 'meta': {}}
{'data': {'names': ['t:0', 't:1', 't:2'], 'tensor': {'shape': [1, 3], 'values': [0.01859472366015777, 0.006956450489196832, 0.9744488258506454]}}, 'meta': {}}
{'data': {'names': ['t:0', 't:1', 't:2'], 'tensor': {'shape': [1, 3], 'values': [0.004653433216061866, 0.0009398331072469446, 0.9944067336766912]}}, 'meta': {}}
{'data': {'names': ['t:0', 't:1', 't:2'], 'tensor': {'shape': [1, 3], 'values': [0.0011463235173706913, 0.0001256712307515923, 0.9987280052518777]}}, 'meta': {}}
{'data': {'names': ['t:0', 't:1', 't:2'], 'ten
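
Each line printed above is the Python `repr` of a response dictionary, so it can be turned back into a dictionary with `ast.literal_eval`. The sketch below picks the most likely class from one such line. It assumes a single-row tensor (shape `[1, n]`) as in the output above, and `predicted_class` is a hypothetical helper name.

```python
import ast


def predicted_class(line):
    """Parse one printed response and return (class_index, probability)."""
    response = ast.literal_eval(line)
    values = response["data"]["tensor"]["values"]
    best = max(range(len(values)), key=values.__getitem__)
    return best, values[best]


# First response line copied from the output above.
line = (
    "{'data': {'names': ['t:0', 't:1', 't:2'], 'tensor': {'shape': [1, 3], "
    "'values': [0.3664487684438811, 0.48528762951761806, 0.14826360203850078]}}, "
    "'meta': {}}"
)

print(predicted_class(line))  # (1, 0.48528762951761806)
```

Applied to every element of `outputs`, this would give one class index per input row.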

In [56]:
!argo delete $WF_NAME

Workflow 'seldon-batch-wxbr5' deleted


## Seldon Core Batch with Object Store

In some cases we may want to read the data from an object store.

This section assumes you have the Minio client CLI (`mc`) installed. We will run a Minio server in the cluster, but you can use another object store provider such as S3, Google Cloud Storage or Azure.

### Set up Minio in your cluster

In [None]:
%%bash 
kubectl create ns minio-system
helm install minio stable/minio \
    --set accessKey=minioadmin \
    --set secretKey=minioadmin \
    --namespace minio-system

In [58]:
!kubectl get pods -n minio-system

NAME                     READY   STATUS    RESTARTS   AGE
minio-79bcd67c45-xjfj9   1/1     Running   0          3m21s


### Forward the Minio port so you can access it

You can do this by running the following command in your terminal:
```
kubectl port-forward -n minio-system svc/minio 9000:9000
```

### Configure local Minio client

In [61]:
!mc config host add minio-local http://localhost:9000 minioadmin minioadmin

mc: Configuration written to `/home/alejandro/.mc/config.json`. Please update your access credentials.
mc: Successfully created `/home/alejandro/.mc/share`.
mc: Initialized share uploads `/home/alejandro/.mc/share/uploads.json` file.
mc: Initialized share downloads `/home/alejandro/.mc/share/downloads.json` file.
Added `minio-local` successfully.


### Create some input for our model

We will create a file containing the inputs to send to our model, one request payload per line.

In [71]:
with open("assets/input-data.txt", "w") as f:
    for i in range(10):
        f.write(f"[[{i}, {i}, {i}, {i}]]\n")

### Check the contents of the file

In [73]:
!cat assets/input-data.txt

[[0, 0, 0, 0]]
[[1, 1, 1, 1]]
[[2, 2, 2, 2]]
[[3, 3, 3, 3]]
[[4, 4, 4, 4]]
[[5, 5, 5, 5]]
[[6, 6, 6, 6]]
[[7, 7, 7, 7]]
[[8, 8, 8, 8]]
[[9, 9, 9, 9]]
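
Because every line of the file is a valid JSON array, a batch step can read it back one request at a time with `json.loads`. The sketch below illustrates that idea: it recreates the same file in a temporary directory so it runs on its own, and `read_batch_inputs` is a hypothetical helper name.

```python
import json
import os
import tempfile


def read_batch_inputs(path):
    """Yield one nested list per non-empty line of the input file."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


# Recreate the input file in a temporary directory so the sketch is self-contained.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "input-data.txt")
with open(path, "w") as f:
    for i in range(10):
        f.write(f"[[{i}, {i}, {i}, {i}]]\n")

inputs = list(read_batch_inputs(path))
print(len(inputs), inputs[3])  # 10 [[3, 3, 3, 3]]
```

Each yielded value has the same nested-list shape as the `np.array([[i, i, i, i]])` payloads used in the earlier workflow.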


### Upload the file to our Minio bucket

In [75]:
!mc mb minio-local/data
!mc cp assets/input-data.txt minio-local/data/

Bucket created successfully `minio-local/data`.
...-data.txt:  150 B / 150 B ┃▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓┃ 2.12 KiB/s 0s

### Create Job to Execute

In [None]:
%%writefile assets/seldon-batch-store.yaml
---
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: seldon-batch-
spec:
  entrypoint: seldon-batch-process
  templates:
  - name: seldon-batch-process
    steps:
    - - name: create-seldon-resource            
        template: create-seldon-resource-template
    - - name: wait-seldon-resource
        template: wait-seldon-resource-template
    - - name: download-object-store
        template: download-object-store-template
    - - name: process-batch-inputs
        template: process-batch-inputs-template
    - - name: upload-object-store
        template: upload-object-store-template
            
  - name: create-seldon-resource-template
    resource:
      action: create
      manifest: |
        apiVersion: machinelearning.seldon.io/v1
        kind: SeldonDeployment
        metadata:
          name: "{{workflow.uid}}"
          ownerReferences:
          - apiVersion: argoproj.io/v1alpha1
            blockOwnerDeletion: true
            kind: Workflow
            name: "{{workflow.name}}"
            uid: "{{workflow.uid}}"
        spec:
          name: "{{workflow.uid}}"
          predictors:
            - graph:
                children: []
                implementation: SKLEARN_SERVER
                modelUri: gs://seldon-models/sklearn/iris
                name: classifier
              name: default
              replicas: 1
                
  - name: wait-seldon-resource-template
    script:
      image: seldonio/core-builder:0.14
      command: [bash]
      source: |
        kubectl rollout status deploy/$(kubectl get deploy -l seldon-deployment-id="{{workflow.uid}}" -o jsonpath='{.items[0].metadata.name}')
                     
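  - name: download-object-store-template
    # NOTE: as written, this step only repeats the rollout check from
    # wait-seldon-resource-template; it does not yet download
    # input-data.txt from the object store.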
    script:
      image: seldonio/core-builder:0.14
      command: [bash]
      source: |
        kubectl rollout status deploy/$(kubectl get deploy -l seldon-deployment-id="{{workflow.uid}}" -o jsonpath='{.items[0].metadata.name}')
        
  - name: process-batch-inputs-template
    script:
      image: seldonio/seldon-core-s2i-python3:1.1.1-SNAPSHOT
      command: [python]
      source: |
        from seldon_core.seldon_client import SeldonClient
        import numpy as np
        import time
        time.sleep(10)
        sc = SeldonClient(
            gateway_endpoint="istio-ingressgateway.istio-system.svc.cluster.local",
            deployment_name="{{workflow.uid}}",
            namespace="default")
        for i in range(10):
            data = np.array([[i, i, i, i]])
            output = sc.predict(data=data)
            print(output.response)
            
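  - name: upload-object-store-template
    # NOTE: as written, this step only repeats the rollout check from
    # wait-seldon-resource-template; it does not yet upload any
    # results to the object store.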
    script:
      image: seldonio/core-builder:0.14
      command: [bash]
      source: |
        kubectl rollout status deploy/$(kubectl get deploy -l seldon-deployment-id="{{workflow.uid}}" -o jsonpath='{.items[0].metadata.name}')
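
The workflow above lists download, process and upload steps around the model call. As a rough sketch of what the processing step could do with a downloaded file, the code below reads one request per line, passes it to a prediction function and writes one JSON result per line. Everything here is an assumption for illustration: `process_batch` and `fake_predict` are hypothetical names, the output file name is invented, and `fake_predict` stands in for a real call to the deployed model.

```python
import json
import os
import tempfile


def process_batch(input_path, output_path, predict):
    """Send each input line to `predict` and write one JSON result per line."""
    count = 0
    with open(input_path) as src, open(output_path, "w") as dst:
        for line in src:
            line = line.strip()
            if not line:
                continue
            result = predict(json.loads(line))
            dst.write(json.dumps(result) + "\n")
            count += 1
    return count


def fake_predict(rows):
    """Hypothetical stand-in for the model call: returns the sum of each row."""
    return {"sums": [sum(row) for row in rows]}


# Build a small input file so the sketch runs without a cluster or object store.
work_dir = tempfile.mkdtemp()
input_path = os.path.join(work_dir, "input-data.txt")
output_path = os.path.join(work_dir, "output-data.txt")
with open(input_path, "w") as f:
    for i in range(3):
        f.write(f"[[{i}, {i}, {i}, {i}]]\n")

processed = process_batch(input_path, output_path, fake_predict)
print(processed)
with open(output_path) as f:
    print(f.read())
```

Keeping the prediction call behind a plain function argument means the file handling can be tested locally, and the real client call can be swapped in inside the workflow script.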
        