## Batch processing with Argo Workflows

In this notebook we will dive into how you can run batch processing with Argo Workflows and Seldon Core.

Dependencies:

* Seldon Core installed as per the docs, with an ingress
* Argo Workflows installed in the cluster (and the `argo` CLI for commands)


## Seldon Core Batch with Object Store

In some cases we may want to read the input data from an object store rather than from a local file.

This section shows how to do that, using MinIO as the object store.

The workflow will look as follows:

![](assets/seldon-batch.jpg)

This section assumes you have installed the MinIO client CLI (`mc`). We will use MinIO running in the cluster, but you can use another object store provider such as S3, Google Cloud Storage or Azure.
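Before the pipeline can read anything from the object store, an input file has to exist. The following is a minimal sketch of how one could be generated locally. It assumes the batch processor reads one JSON payload per line and that each payload is a 2-D array with a single 4-feature instance (matching the iris model used later). The helper name `write_batch_input`, the value range and the row count are illustrative, not part of Seldon Core.

```python
import json
import random


def write_batch_input(path, n_rows=1000, n_features=4, seed=0):
    """Write n_rows lines, each a JSON-encoded 2-D array holding one instance.

    Assumption: the batch processor treats every line as one request payload.
    """
    rng = random.Random(seed)  # seeded so the file is reproducible
    with open(path, "w") as f:
        for _ in range(n_rows):
            row = [round(rng.uniform(0.0, 8.0), 2) for _ in range(n_features)]
            # One payload per line, shaped like [[f1, f2, f3, f4]]
            f.write(json.dumps([row]) + "\n")
    return path
```

Calling `write_batch_input("assets/input-data.txt")` would produce a file that can then be copied with `mc` into the bucket referenced by the pipeline's `input_path` parameter.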

### Set up Kubeflow Pipelines

In [None]:
%%bash
export PIPELINE_VERSION=0.5.1
kubectl apply -k github.com/kubeflow/pipelines/manifests/kustomize/cluster-scoped-resources?ref=$PIPELINE_VERSION
kubectl wait --for condition=established --timeout=60s crd/applications.app.k8s.io
kubectl apply -k github.com/kubeflow/pipelines/manifests/kustomize/env/dev?ref=$PIPELINE_VERSION

In [None]:
pip install kfp

In [2]:
mkdir -p assets/

In [31]:
%%writefile assets/seldon-batch-pipeline.py

import kfp.dsl as dsl
import yaml
from kubernetes import client as k8s

@dsl.pipeline(
  name='SeldonBatch',
  description='A batch processing pipeline for seldon models'
)
def nlp_pipeline(
        deployment_name="seldon-batch",
        namespace="kubeflow",
        seldon_server="SKLEARN_SERVER",
        model_path="gs://seldon-models/sklearn/iris",
        gateway_endpoint="istio-ingressgateway.istio-system.svc.cluster.local",
        retries=3,
        replicas=10,
        workers=100,
        input_path="s3://data/input-data.txt",
        output_path="s3://data/output-data.txt"):
    """
    Deploy a SeldonDeployment, then run the Seldon batch processor against it.
    """
    
    seldon_deployment_yaml = f"""
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: "{deployment_name}"
  namespace: "{namespace}"
spec:
  name: "{deployment_name}"
  predictors:
    - graph:
        children: []
        implementation: "{seldon_server}"
        modelUri: "{model_path}"
        name: classifier
      name: default
      replicas: "{replicas}"
    """
    
    deploy_step = dsl.ResourceOp(
        name="deploy_seldon",
        action="create",
        k8s_resource=yaml.safe_load(seldon_deployment_yaml))
    
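    # NOTE: dsl.ContainerOp requires at least `name` and `image`; an empty call
    # raises a TypeError at compile time. A step that waits for the deployment
    # to become ready would go here.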

    # Run the Seldon batch processor against the deployed model.
    batch_process_step = dsl.ContainerOp(
        name='data_downloader',
        image='seldonio/seldon-core-s2i-python37:1.1.1-SNAPSHOT',
        command="seldon-batch-processor",
        arguments=[
            "--deployment-name", deployment_name,
            "--namespace", namespace,
            "--host", gateway_endpoint,
            "--retries", retries,
            # Pass the otherwise unused `workers` pipeline parameter through.
            "--workers", workers,
            "--input-data-path", input_path,
            "--output-data-path", output_path
        ]
    )
    
    batch_process_step.after(deploy_step)

if __name__ == '__main__':
  import kfp.compiler as compiler
  compiler.Compiler().compile(nlp_pipeline, __file__ + '.tar.gz')


Overwriting assets/seldon-batch-pipeline.py


In [32]:
!python assets/seldon-batch-pipeline.py



In [30]:
!ls assets/

seldon-batch-pipeline.py  seldon-batch-pipeline.py.tar.gz  seldon-batch.jpg
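The listing above shows that compilation produced a `.tar.gz` package. A quick way to check what the compiler generated is to look inside that archive with the standard library. This is a minimal sketch: the helper names are illustrative, and the assumption that the archive holds a single workflow definition (commonly named `pipeline.yaml`) should be verified against your own output.

```python
import tarfile


def list_package_members(package_path):
    """Return the file names stored in a compiled pipeline archive."""
    with tarfile.open(package_path, "r:gz") as archive:
        return archive.getnames()


def read_package_member(package_path, member_name):
    """Return the decoded text of one file inside the archive."""
    with tarfile.open(package_path, "r:gz") as archive:
        extracted = archive.extractfile(member_name)
        if extracted is None:
            # extractfile returns None for directories and other non-files
            raise ValueError(f"{member_name} is not a regular file")
        with extracted:
            return extracted.read().decode("utf-8")
```

For example, `list_package_members("assets/seldon-batch-pipeline.py.tar.gz")` returns the member names, and `read_package_member` can then print the generated workflow so you can confirm the deploy and batch steps appear in the expected order.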
