# Super simple Kubeflow Pipelines

Here we will schedule a single pipeline that downloads artifacts from a minio bucket.

## Environment setup

In [3]:
%load_ext extensions

from ipython_secrets import get_secret
from os import environ

# our minio credentials
S3_ENDPOINT = environ['S3_ENDPOINT']
S3_ACCESS_KEY = get_secret('S3_ACCESS_KEY')
S3_SECRET_KEY = get_secret('S3_SECRET_KEY')
S3_BUCKET = 'default'

IMAGE = 'harbor.svc.cluster3.antoncloud1.dev.superhub.io/library/hello'
REGISTRY_SECRET = 'harbor'

EXPERIMENT_NAME = 'Super-Simple'



## Generate Docker and Kubernetes configs

Below we generate the files used to build a container image with some `minio` awareness:
- `Dockerfile` - defines the container image that will be built and pushed to the private Docker registry
- `Kaniko` - a deployment job that carries out the container build

In [4]:
%%template Dockerfile
FROM gcr.io/ml-pipeline/ml-pipeline-dataflow-tfdv:85c6413a2e13da4b8f198aeac1abc2f3a74fe789
RUN echo {{IMAGE}}

In [5]:
%templatefile extensions/templates/kaniko.yaml -o kaniko.yaml

### Upload to bucket
Generated files must be uploaded to an object storage bucket (e.g. S3 or minio), because the Docker build process (Kaniko) needs access to that bucket to read them.

In [10]:
import boto3

s3_client = boto3.client('s3',
    region_name = 'us-east-1',
    endpoint_url = S3_ENDPOINT,
    aws_access_key_id = S3_ACCESS_KEY,
    aws_secret_access_key = S3_SECRET_KEY)

s3_client.upload_file('Dockerfile' , S3_BUCKET, 'Dockerfile')
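
If more than one generated file has to reach the bucket, the upload can be wrapped in a small helper. This is only a sketch: `upload_generated_files` is a hypothetical name, it assumes each file should be stored under its base name, and which files the Kaniko build actually needs is an assumption you should check against your own setup.

```python
import os


def upload_generated_files(client, bucket, paths):
    """Upload each local file to `bucket`, using the file's base name as the key.

    `client` is expected to behave like the boto3 S3 client created above,
    i.e. to expose upload_file(filename, bucket, key).
    """
    uploaded = []
    for path in paths:
        key = os.path.basename(path)
        client.upload_file(path, bucket, key)
        uploaded.append(key)
    return uploaded
```

With the client from the cell above, a call could look like `upload_generated_files(s3_client, S3_BUCKET, ['Dockerfile'])`.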

## Create an experiment
We must create a new experiment if it does not already exist
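
The "get it, or create it if missing" step can be sketched as a small function. `get_or_create_experiment` is a hypothetical helper name. It assumes a KFP v1 style client whose `get_experiment` raises `ValueError` when no experiment with that name exists, which may not hold for other SDK versions.

```python
def get_or_create_experiment(client, name):
    """Return the experiment called `name`, creating it when it is missing.

    Assumes `client.get_experiment` raises ValueError for an unknown name.
    """
    try:
        return client.get_experiment(experiment_name=name)
    except ValueError:
        return client.create_experiment(name)
```

With a `kfp.Client()` instance this would be called as `get_or_create_experiment(client, EXPERIMENT_NAME)`.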

## Define a Pipeline

In [22]:
import kfp.dsl as dsl
from kubernetes import client as k8sc

@dsl.pipeline(
  name='Super simple minio integration',
  description='I as a pipeline want to read a file from minio bucket'
)
def hello_minio_pipeline(filename: dsl.PipelineParam):
    op1 = dsl.ContainerOp(
        name='download',
        image='minio/mc',
        command=['mc', '--no-color'],
        arguments=['cp', 'minio/%s' % filename, '/tmp/results.txt'],
        file_outputs={'downloaded': '/tmp/results.txt'}
    ).add_env_variable(
        k8sc.V1EnvVar(
            name='MC_HOSTS_minio',
            # drop any scheme already present in S3_ENDPOINT so the URL stays well formed
            value='https://%s:%s@%s' % (S3_ACCESS_KEY, S3_SECRET_KEY, S3_ENDPOINT.split('://')[-1]),
        ))
    op2 = dsl.ContainerOp(
        name='echo',
        image='library/bash:4.4.23',
        command=['sh', '-c'],
        arguments=['echo %s' % op1.output])
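
The `MC_HOSTS_minio` variable above carries the credentials inside a URL of the form `scheme://access:secret@host`. Because `S3_ENDPOINT` is also passed to boto3 as `endpoint_url`, it probably already includes a scheme. The sketch below shows one way to build the value while keeping that scheme. `mc_host_url` and `default_scheme` are hypothetical names, and credentials containing characters such as `/` or `@` are not handled here.

```python
from urllib.parse import urlsplit


def mc_host_url(endpoint, access_key, secret_key, default_scheme='https'):
    """Build 'scheme://access:secret@host[:port]' from an endpoint string."""
    # urlsplit only recognizes the host part when a scheme is present
    if '://' not in endpoint:
        endpoint = '%s://%s' % (default_scheme, endpoint)
    parts = urlsplit(endpoint)
    return '%s://%s:%s@%s' % (parts.scheme, access_key, secret_key, parts.netloc)
```

In the pipeline this could replace the formatted string: `value=mc_host_url(S3_ENDPOINT, S3_ACCESS_KEY, S3_SECRET_KEY)`.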

### Execute the pipeline

The code below creates the experiment named by `EXPERIMENT_NAME` (**Super-Simple**) if it does not exist yet, then compiles the pipeline and runs it

In [26]:
import kfp
import kfp.compiler as compiler

client = kfp.Client()

try:
    exp = client.get_experiment(experiment_name=EXPERIMENT_NAME)
except ValueError:
    exp = client.create_experiment(EXPERIMENT_NAME)

compiler.Compiler().compile(hello_minio_pipeline, 'pipeline.tar.gz')

run = client.run_pipeline(exp.id, 
                          'pipeline 1', 
                          'pipeline.tar.gz',
                          params={'filename': 'default/hello.txt'})
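
After submitting the run it is often useful to block until it finishes and look at the final status. This is a minimal sketch, assuming the KFP v1 SDK, where `Client.wait_for_run_completion(run_id, timeout)` returns an object with a `run.status` field. `wait_and_report` and `timeout_seconds` are hypothetical names.

```python
def wait_and_report(client, run_id, timeout_seconds=600):
    """Wait for a pipeline run to finish and return its final status string.

    Assumes a KFP v1 style client; other SDK versions may expose the
    status under a different attribute.
    """
    detail = client.wait_for_run_completion(run_id, timeout_seconds)
    status = detail.run.status
    print('Run %s finished with status: %s' % (run_id, status))
    return status
```

With the objects from the cell above this would be called as `wait_and_report(client, run.id)`.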


%%deploy_webhook 'pipeline.tar.gz' 'http://xyz'