# Super simple Kubeflow Pipelines

Here we schedule a single pipeline that downloads artifacts from a Minio bucket.

In [2]:
%load_ext extensions
!pip install --upgrade 'https://storage.googleapis.com/ml-pipeline/release/0.1.8/kfp.tar.gz'

The extensions extension is already loaded. To reload it, use:
  %reload_ext extensions
Collecting https://storage.googleapis.com/ml-pipeline/release/0.1.8/kfp.tar.gz
  Using cached https://storage.googleapis.com/ml-pipeline/release/0.1.8/kfp.tar.gz
Installing collected packages: kfp
  Found existing installation: kfp 0.1
    Uninstalling kfp-0.1:
      Successfully uninstalled kfp-0.1
  Running setup.py install for kfp ... done
Successfully installed kfp-0.1


## Environment setup

In [3]:
from ipython_secrets import get_secret
from os import environ

EXPERIMENT_NAME = 'Das-Experiment-1'

AWS_S3_ENDPOINT = environ.get('AWS_S3_ENDPOINT', 's3.amazonaws.com')
AWS_S3_BUCKET = environ.get('AWS_S3_BUCKET')
AWS_DEFAULT_REGION = environ.get('AWS_DEFAULT_REGION', 'us-east-1')
AWS_ACCESS_KEY_ID = get_secret('AWS_ACCESS_KEY_ID')
AWS_SECRET_ACCESS_KEY = get_secret('AWS_SECRET_ACCESS_KEY')
AWS_SECRET_NAME = environ.get('AWS_SECRET_NAME')

TAG = 'latest'
DOCKER_REGISTRY = environ.get('DOCKER_REGISTRY', 'docker.io')
DOCKER_IMAGE = f'{DOCKER_REGISTRY}/library/kubectl:{TAG}'
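`AWS_S3_BUCKET` has no default, so `environ.get` returns `None` when it is unset and the problem only surfaces later, inside the boto3 upload call. A minimal sketch of a fail-fast check, assuming the same environment variable names as the cell above; `missing_settings` is a hypothetical helper, not part of any library used here.

```python
import os
from typing import List, Mapping, Optional


def missing_settings(names: List[str], env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required setting names that are unset or empty."""
    # Fall back to the real process environment when no mapping is given
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


# Example with an explicit mapping instead of the real environment
print(missing_settings(['AWS_S3_BUCKET'], env={'AWS_S3_ENDPOINT': 'minio:9000'}))
```

In the notebook you would call it without `env` and raise if the returned list is non-empty.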

## Generate Docker and Kubernetes configs

Below we generate the files that add `minio` awareness to the build:
- `Dockerfile` - a Docker image definition that will be built and pushed to a private Docker registry
- `Kaniko` - a deployment job (`kaniko.yaml`) that carries out the container build

In [4]:
%%template Dockerfile -v
FROM gcr.io/google-samples/ml-pipeline-t2ttrain:latest
RUN echo "{{DOCKER_IMAGE}}"
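The `%%template` and `%templatefile` magics come from the local `extensions` package loaded in the first cell, and their implementation is not shown. Assuming they replace `{{NAME}}` placeholders with the notebook variable of the same name, the rendering step amounts to the sketch below; `render` is a hypothetical stand-in, and the real magics may use a full template engine such as Jinja2.

```python
import re


def render(template: str, variables: dict) -> str:
    """Replace each {{NAME}} placeholder with str(variables[NAME]).

    Unknown names raise KeyError instead of being left in the output.
    """
    return re.sub(r'\{\{\s*(\w+)\s*\}\}',
                  lambda m: str(variables[m.group(1)]),
                  template)


dockerfile = render(
    'FROM gcr.io/google-samples/ml-pipeline-t2ttrain:latest\n'
    'RUN echo "{{DOCKER_IMAGE}}"\n',
    {'DOCKER_IMAGE': 'docker.io/library/kubectl:latest'},
)
print(dockerfile)
```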

In [5]:
%templatefile extensions/templates/kaniko-workflow.yaml -o kaniko.yaml -v
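The contents of `extensions/templates/kaniko-workflow.yaml` are not shown in this notebook. Assuming it runs the Kaniko executor against the archive uploaded in the next step, the container arguments would look roughly like this. `--dockerfile`, `--context` (with an `s3://bucket/path/context.tar.gz` value) and `--destination` are Kaniko executor flags; `kaniko_args` and the bucket name `my-bucket` are hypothetical, and credentials for the bucket and the registry are outside this sketch.

```python
from typing import List


def kaniko_args(bucket: str, key: str, destination: str,
                dockerfile: str = 'Dockerfile') -> List[str]:
    """Return Kaniko executor arguments for a tar.gz build context stored in S3."""
    return [
        f'--dockerfile={dockerfile}',       # path inside the unpacked context
        f'--context=s3://{bucket}/{key}',   # the uploaded dockerbuild.tar.gz
        f'--destination={destination}',     # image reference to push
    ]


print(kaniko_args('my-bucket',
                  'Das-Experiment-1/dockerbuild.tar.gz',
                  'docker.io/library/kubectl:latest'))
```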

### Upload generated files to an object storage bucket
The generated files must be uploaded to an object storage bucket (e.g. S3 or Minio), because the Docker build process (Kaniko) reads its build context from that bucket.

In [6]:
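import boto3
import tarfile

# Pack the generated Dockerfile at the archive root; this tarball is Kaniko's build context
with tarfile.open("dockerbuild.tar.gz", "w:gz") as tar:
    tar.add("Dockerfile", arcname="Dockerfile")

s3_client = boto3.client('s3',
    region_name=AWS_DEFAULT_REGION,
    aws_access_key_id=AWS_ACCESS_KEY_ID,
    aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
    # For Minio, uncomment; boto3 needs the endpoint with a scheme
    # endpoint_url=f'https://{AWS_S3_ENDPOINT}',
)

# Upload the build context under a per-experiment prefix, then list the bucket to confirm
s3_client.upload_file('dockerbuild.tar.gz', AWS_S3_BUCKET, f'{EXPERIMENT_NAME}/dockerbuild.tar.gz')
[k['Key'] for k in s3_client.list_objects(Bucket=AWS_S3_BUCKET).get('Contents', [])]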

['Das-Experiment-1/dockerbuild.tar.gz']
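With the default `--dockerfile=Dockerfile`, Kaniko looks for the file at the root of the unpacked context, which is why the cell above passes `arcname="Dockerfile"`. A minimal in-memory sketch for checking an archive's layout before uploading it; `build_context_archive` and `archive_members` are hypothetical helpers built only on the standard `tarfile` module.

```python
import io
import tarfile
from typing import Dict, List


def build_context_archive(files: Dict[str, bytes]) -> bytes:
    """Pack {archive_name: file_bytes} into an in-memory tar.gz build context."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode='w:gz') as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def archive_members(archive: bytes) -> List[str]:
    """List member names of a tar.gz archive held in memory."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode='r:gz') as tar:
        return tar.getnames()


archive = build_context_archive({'Dockerfile': b'FROM scratch\n'})
print(archive_members(archive))  # the Dockerfile sits at the archive root
```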

## Kubernetes connection

Set up connectivity to the Kubernetes cluster. The notebook must be able to schedule the templated `kaniko.yaml` job.

In [7]:
from kubernetes import client as k8s_client
from kubernetes import config as k8s_config

# !kubectl apply -f kaniko.yaml 
# !kubectl wait --for=condition=Completed -f kaniko.yaml
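The commented `kubectl wait` call blocks until the build has finished. To do the same from Python, a polling loop like the one below works with any status check; `wait_until` is a hypothetical helper. For a plain Kubernetes Job, the check could read `status.succeeded` from `kubernetes.client.BatchV1Api().read_namespaced_job_status(name, namespace)`, but that assumes the template produces a Job rather than, say, an Argo Workflow.

```python
import time
from typing import Callable


def wait_until(is_done: Callable[[], bool],
               timeout: float = 600.0,
               interval: float = 5.0) -> bool:
    """Poll is_done() until it returns True or `timeout` seconds elapse.

    Returns True if the condition was met, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if is_done():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```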

## Create an experiment
We must create a new experiment if it does not exist yet.

## Define a Pipeline

In [8]:
import kfp.dsl as dsl
from kubernetes import client as k8sc

@dsl.pipeline(
  name='Super simple minio integration',
  description='I as a pipeline want to read a file from minio bucket'
)
def hello_minio_pipeline(filename: dsl.PipelineParam):
    op1 = dsl.ContainerOp(
        name='download',
        image='minio/mc',
        command=['mc', '--no-color'],
        arguments=['cp', f'minio/{filename}', '/tmp/results.txt'],
        file_outputs={'downloaded': '/tmp/results.txt'}
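    ).add_env_variable(
        # mc reads the `minio` alias (endpoint and credentials) from MC_HOSTS_<alias>.
        # Note: the credentials end up in plain text in the compiled pipeline.
        k8sc.V1EnvVar(
            name='MC_HOSTS_minio',
            value=f'https://{AWS_ACCESS_KEY_ID}:{AWS_SECRET_ACCESS_KEY}@{AWS_S3_ENDPOINT}'
        ))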
    op2 = dsl.ContainerOp(
        name='echo',
        image='library/bash:4.4.23',
        command=['sh', '-c'],
        arguments=[f'echo {op1.output}'])
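The `MC_HOSTS_minio` value packs the endpoint and both credentials into one URL. Secret keys often contain `/` or `+`, which can break that URL, so a sketch that percent-encodes the credentials first may help. `mc_host_url` is a hypothetical helper, and whether a given `mc` release percent-decodes the user-info part is an assumption to verify against your version.

```python
from urllib.parse import quote


def mc_host_url(access_key: str, secret_key: str, endpoint: str,
                scheme: str = 'https') -> str:
    """Build a scheme://ACCESS:SECRET@ENDPOINT value for an mc alias variable."""
    # safe='' also encodes '/', so characters like '/', '+', '@', ':' cannot split the URL
    return (f"{scheme}://{quote(access_key, safe='')}:"
            f"{quote(secret_key, safe='')}@{endpoint}")


print(mc_host_url('minio', 'abc/def+ghi', 'minio-service:9000', scheme='http'))
```

The access key, secret and `minio-service:9000` endpoint above are placeholder values.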

### Execute the pipeline

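The code below creates the experiment named by `EXPERIMENT_NAME` if it does not exist yet and compiles `hello_minio_pipeline` into `pipeline.tar.gz`; uncomment the `run_pipeline` call to submit a run with `filename` set to `default/hello.txt`. `kfp.Client()` connects by default to the in-cluster address `ml-pipeline.kubeflow.svc.cluster.local:8888`, so when the notebook runs outside the cluster the first API request fails with the `MaxRetryError` shown below.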

In [9]:
import kfp
import kfp.compiler as compiler

client = kfp.Client()

try:
    exp = client.get_experiment(experiment_name=EXPERIMENT_NAME)
except ValueError:
    exp = client.create_experiment(EXPERIMENT_NAME)

compiler.Compiler().compile(hello_minio_pipeline, 'pipeline.tar.gz')

# run = client.run_pipeline(exp.id, 
#                           'pipeline 1', 
#                           'pipeline.tar.gz',
#                           params={'filename': 'default/hello.txt'})




MaxRetryError: HTTPConnectionPool(host='ml-pipeline.kubeflow.svc.cluster.local', port=8888): Max retries exceeded with url: /apis/v1beta1/experiments?page_token=&page_size=100&sort_by= (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x10f66ecc0>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known',))
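Before creating `kfp.Client()`, a cheap TCP probe tells you whether the Pipelines API is reachable at all, which separates network problems like the one above from API errors. A minimal standard-library sketch; `can_connect` is a hypothetical helper.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures (socket.gaierror), refused connections and timeouts
        return False
```

Called as `can_connect('ml-pipeline.kubeflow.svc.cluster.local', 8888)` from a machine where that name does not resolve, as in the run above, it returns `False` instead of raising.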