
In [0]:
#KFP_PACKAGE = 'https://storage.googleapis.com/ml-pipeline/release/0.1.20/kfp.tar.gz'
#!pip3 install $KFP_PACKAGE --upgrade

# Hello world with Kubeflow Pipelines

Welcome to your first step with Kubeflow Pipelines (KFP). This demo focuses on:

* Creating a Kubeflow pipeline using the KFP SDK
* Creating your first experiment and submitting the pipeline to the KFP runtime environment using the SDK

Run this notebook in your JupyterHub environment on Kubeflow.

Reference documentation: 
* https://www.kubeflow.org/docs/pipelines/sdk/build-component/
* https://www.kubeflow.org/docs/pipelines/sdk/sdk-overview/

## Setup

Change the following constants in the code so that they match your project and bucket:

- PROJECT_NAME
- OUTPUT_DIR

In [0]:
# Set your project and output location. You must do this before you can proceed.
EXPERIMENT_NAME = 'Data-Science'
PROJECT_NAME = 'lunar-demo'  # your GCP project name
OUTPUT_DIR = 'gs://lunar-demo/'  # path to your GCS bucket
BASE_IMAGE = 'tensorflow/tensorflow:1.11.0-py3'  # Base image used in various steps of the pipeline
TARGET_IMAGE = 'gcr.io/%s/pusher:latest' % PROJECT_NAME  # Target image that will include our final code
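Because every later step depends on these constants, it can help to sanity-check them before building anything. The sketch below is a hypothetical helper (`check_config` is not part of KFP). It assumes that `OUTPUT_DIR` should be a `gs://` URI and that `TARGET_IMAGE` should live under `gcr.io/<project>/`, as in the cell above.

```python
# Hypothetical helper, not part of the KFP SDK: sanity-check the notebook constants.
def check_config(project_name, output_dir, target_image):
    """Return a list of human-readable problems; an empty list means the values look consistent."""
    problems = []
    if not project_name or project_name == 'your-gcp-project-name':
        problems.append('PROJECT_NAME is still a placeholder')
    if not output_dir.startswith('gs://'):
        problems.append('OUTPUT_DIR should be a gs:// URI')
    if not target_image.startswith('gcr.io/%s/' % project_name):
        problems.append('TARGET_IMAGE should live under gcr.io/<PROJECT_NAME>/')
    return problems


# Values from the setup cell above.
PROJECT_NAME = 'lunar-demo'
OUTPUT_DIR = 'gs://lunar-demo/'
TARGET_IMAGE = 'gcr.io/%s/pusher:latest' % PROJECT_NAME

print(check_config(PROJECT_NAME, OUTPUT_DIR, TARGET_IMAGE))  # []
```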

In [0]:
# Let's import the libraries needed
import kfp
import kfp.dsl as dsl
from kfp.gcp import use_gcp_secret
from kubernetes import client as k8s_client
from kfp import compiler
from kfp import notebook
from kfp import components as comp

## Create a function that we will turn into a component

Here we implement a Python function that takes two arguments, multiplies them with NumPy's `np.multiply`, and returns the result. Later we will use this function to create a KFP component.

In [0]:
@dsl.python_component(
    name='mult_opp',
    description='component that takes two arguments and multiplies them',
    base_image=BASE_IMAGE  # note you can define the base image here, or during build time
)
def computation(a: int, b: int) -> int:
    '''Multiply two arguments'''

    # Import inside the function so it is available when the function runs in the container
    import numpy as np

    c = np.array(a)
    d = np.array(b)
    total = np.multiply(c, d)

    return total

In [0]:
# simple test for our function

a = 2, 4
b = 2, 4

z = computation(a, b)

print(z)

[ 4 16]
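The output above is an array because the test passes tuples, so `np.multiply` works element-wise. With scalar inputs, NumPy returns a NumPy scalar rather than a built-in `int`. The following sketch illustrates both cases with plain NumPy (no KFP required). Converting with `int(...)` is shown as one possible way to match the `-> int` annotation; it is not something the original function does.

```python
import numpy as np

# Scalar inputs: the result is a NumPy scalar, not a built-in int.
scalar_result = np.multiply(np.array(2), np.array(4))
print(int(scalar_result))  # 8

# Tuple inputs: multiplication is element-wise.
vector_result = np.multiply(np.array((2, 4)), np.array((2, 4)))
print(vector_result.tolist())  # [4, 16]
```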


## Build a Pipeline Step With the Above Function

#### Option One: Specify the dependency directly

First we create our component from the Python function. `build_python_component` automatically builds a container image for `component_func` based on `base_image` and pushes it to `target_image`.

In [0]:
# Build Python Component

from kfp import compiler

mult_opp = compiler.build_python_component(
    component_func=computation,  # the function we created above
    staging_gcs_path=OUTPUT_DIR,  # staging directory
    dependency=[kfp.compiler.VersionedDependency(name='google-api-python-client', version='1.7.0')],
    base_image=BASE_IMAGE,  # base image
    target_image=TARGET_IMAGE  # target image
)

2019-10-16 15:28:45:INFO:Build an image that is based on tensorflow/tensorflow:1.11.0-py3 and push the image to gcr.io/lunar-demo/pusher:latest
2019-10-16 15:28:45:INFO:Checking path: gs://lunar-demo/...
2019-10-16 15:28:45:INFO:Generate entrypoint and serialization codes.
2019-10-16 15:28:45:INFO:Generate build files.

OSError: Project was not passed and could not be determined from the environment.
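This error is raised by the Google Cloud client libraries when they cannot infer a default project. One common remedy is to set a default project before rebuilding the component. The sketch below assumes the client reads the `GOOGLE_CLOUD_PROJECT` environment variable (as `google-auth` does when resolving a default project). Whether this is sufficient depends on how credentials are configured in your environment.

```python
import os

PROJECT_NAME = 'lunar-demo'  # same value as in the setup cell

# Assumption: the Google Cloud client resolves its default project from this variable.
# setdefault keeps any value that is already configured in the environment.
os.environ.setdefault('GOOGLE_CLOUD_PROJECT', PROJECT_NAME)

print(os.environ['GOOGLE_CLOUD_PROJECT'])
```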

## Build a pipeline using the component

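Now we can create a pipeline that chains three instances of the component: two multiply the pipeline parameters, and a third multiplies their outputs.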

In [0]:
import kfp.dsl as dsl

@dsl.pipeline(
    name='Numpy multiply pipeline',
    description='A toy pipeline that performs numpy calculations.'
)
def calc_pipeline(a, b, c):

    # Pass pipeline parameters as operation arguments
    add_task_1 = mult_opp(a, b)  # Returns a dsl.ContainerOp class instance

    # You can create an explicit dependency between tasks using xyz_task.after(abc_task)
    add_task_2 = mult_opp(a, c)

    # Multiply the outputs of the two previous tasks; this creates an implicit dependency on both
    add_task_total = mult_opp(add_task_1.output, add_task_2.output)
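To see what the pipeline is expected to compute without a cluster, the same dataflow can be traced in plain Python. This is only a local sketch: `multiply` stands in for the `mult_opp` component and `calc_locally` is a hypothetical name, so nothing here touches KFP.

```python
# Local stand-in for the mult_opp component (hypothetical; no KFP involved).
def multiply(a, b):
    return a * b


def calc_locally(a, b, c):
    """Mirror the pipeline graph: two independent products, then their product."""
    task_1 = multiply(a, b)          # corresponds to add_task_1
    task_2 = multiply(a, c)          # corresponds to add_task_2
    return multiply(task_1, task_2)  # corresponds to add_task_total


print(calc_locally(2, 4, 8))  # (2 * 4) * (2 * 8) = 128
```

The first two tasks do not depend on each other, while the third consumes both of their outputs and therefore has to wait for them.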


### Compile the pipeline

In [0]:
import kfp.compiler as compiler

pipeline_func = calc_pipeline
pipeline_filename = pipeline_func.__name__ + '.pipeline.zip'
compiler.Compiler().compile(pipeline_func, pipeline_filename)

NameError: name 'mult_opp' is not defined

### Create experiment

In [0]:
import kfp
client = kfp.Client()
experiment = client.create_experiment(EXPERIMENT_NAME)

### Submit Pipeline

In [0]:
import time

# Specify pipeline argument values
arguments = {'a': '2', 'b': '4', 'c': '8'}

# Submit a pipeline run
run_name = pipeline_func.__name__ + ' run-%s' % (int(time.time()))
run_result = client.run_pipeline(experiment.id, run_name, pipeline_filename, arguments)

# This link leads to the run information page.
# Note: There is a bug in JupyterLab that modifies the URL and makes the link stop working
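The run name above combines the pipeline function's name with a Unix timestamp, so submissions made at different seconds get different names. A minimal sketch of that naming scheme follows, with `make_run_name` as a hypothetical helper and the timestamp passed in so the result is reproducible:

```python
import time


# Hypothetical helper mirroring the run_name expression in the cell above.
def make_run_name(func_name, timestamp=None):
    if timestamp is None:
        timestamp = time.time()
    return func_name + ' run-%s' % int(timestamp)


print(make_run_name('calc_pipeline', 1571239725))  # calc_pipeline run-1571239725
print(make_run_name('calc_pipeline'))              # uses the current time
```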

# Copyright 2019 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.