# Kubeflow Pipelines - Building containers

In this notebook, we will demo: 

* Building a container image to use as the base image for components

Reference documentation: 
* https://www.kubeflow.org/docs/pipelines/sdk/build-component/
* https://www.kubeflow.org/docs/pipelines/sdk/sdk-overview/

In [2]:
# Install Kubeflow Pipelines SDK
!PIP_DISABLE_PIP_VERSION_CHECK=1 pip3 install 'kfp>=0.1.32.2' --quiet

You are using pip version 19.0.1, however version 19.3.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.


In [11]:
import kfp
# Initializing the client
client = kfp.Client()

# ! Use kfp.Client(host='https://xxxxx.notebooks.googleusercontent.com/') if working from GCP notebooks (or local notebooks)

## Create a python function

In [3]:
def add(a: float, b: float) -> float:
    '''Calculates sum of two arguments'''
    
    print("Adding two values %s and %s" %(a, b))
    
    return a + b
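
Because the component is built from an ordinary Python function, it can be sanity-checked locally before any container is involved. The sketch below repeats the function so that it runs on its own; the input values are arbitrary and chosen only for illustration.

```python
def add(a: float, b: float) -> float:
    '''Calculates sum of two arguments'''
    print("Adding two values %s and %s" % (a, b))
    return a + b


# Call the function directly, without Kubeflow, to check its behaviour.
result = add(1.0, 7.0)
print(result)
```

Catching a logic error at this stage is much cheaper than finding it after an image build and a pipeline run.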

## Build a new container and use it as the base image for components

* Build and push a new container image that includes the required packages. The packages are specified in the requirements.txt file in the working directory.
* Use the new container image as the base image for components.
* Create a new component from the function above.
* The return value "add_op" represents a step that can be used directly in a pipeline function.

In [4]:
from pathlib import Path
print(Path('requirements.txt').read_text())

google-api-python-client == 1.7.0
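
Since the image is built from whatever `requirements.txt` lists, it may be worth checking that every package is pinned to an exact version, so that rebuilding the image later gives the same result. The following is a minimal sketch; `unpinned_requirements` is a hypothetical helper written for this illustration, not part of the KFP SDK, and it only handles simple one-package-per-line files.

```python
import re


def unpinned_requirements(text: str) -> list:
    """Return requirement lines that do not pin a version with '=='."""
    unpinned = []
    for line in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split('#', 1)[0].strip()
        if not line:
            continue
        # A pinned requirement looks like "name == 1.2.3".
        if not re.search(r'==\s*[\w.]+', line):
            unpinned.append(line)
    return unpinned


sample = "google-api-python-client == 1.7.0\n"
print(unpinned_requirements(sample))
```

In the notebook, the same check could be applied to `Path('requirements.txt').read_text()`.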



In [5]:
import kfp
from kfp.containers import build_image_from_working_dir
from kfp.components import func_to_container_op

In [10]:
# Building and pushing new container image
image_with_packages = build_image_from_working_dir(
    #image_name=...,
)
print(image_with_packages)

gcr.io/avolkov-31337/notebook-kubeflow-kfp-notebook/kfp_container@sha256:80c2fc392286355f4f5feef286faea5879743735763246bafd4278a791fabc48


In [6]:
# Creating component while explicitly specifying the newly-built base image
add_op = func_to_container_op(add, base_image=image_with_packages)

In [7]:
# You can also set the image builder as the default so that it is used whenever no base image is given
kfp.components.default_base_image_or_builder = build_image_from_working_dir

# Or, if you want to customize the builder, you can use a lambda:
kfp.components.default_base_image_or_builder = lambda: build_image_from_working_dir(base_image='google/cloud-sdk:latest')

# Now all Python components will use that container builder by default:
add_op2 = func_to_container_op(add)
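
The cell above assigns either a function or a lambda to the default, which suggests a "value or zero-argument callable" pattern. The sketch below illustrates that general pattern in plain Python. It is not the SDK's implementation: `resolve_base_image` and `fake_builder` are hypothetical names, and the image names are placeholders.

```python
from typing import Callable, Union


def resolve_base_image(default: Union[str, Callable[[], str]]) -> str:
    """Return an image name from a fixed string or a zero-argument builder."""
    # A callable is invoked lazily; a plain string is used as-is.
    return default() if callable(default) else default


def fake_builder() -> str:
    # Hypothetical stand-in for build_image_from_working_dir; builds nothing.
    return 'registry.example/my-image:latest'


fixed = resolve_base_image('python:3.7')
built = resolve_base_image(fake_builder)
# A lambda lets the caller customize the builder while keeping it lazy.
customized = resolve_base_image(lambda: fake_builder() + '-custom')
print(fixed, built, customized)
```

Wrapping the builder in a lambda, as in the notebook cell, defers the (potentially slow) image build until a component actually needs a base image.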

## Build a pipeline using this component

In [8]:
import kfp.dsl as dsl
@dsl.pipeline(
   name='Calculation pipeline',
   description='A sample pipeline that performs arithmetic calculations.'
)
def calc_pipeline(
   a='1',
   b='7',
   c='17',
):
    # Pass pipeline parameters as operation arguments
    add_task = add_op(a, b)  # Returns a dsl.ContainerOp class instance.

    # You can create an explicit dependency between tasks using xyz_task.after(abc_task)
    add_2_task = add_op(b, c)
    
    add_3_task = add_op(add_task.output, add_2_task.output)
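
To see what the pipeline graph computes, the same data flow can be written as plain function calls. This is only a local sketch of the arithmetic, assuming numeric inputs; `calc_pipeline_local` is a hypothetical name, and nothing here runs on Kubeflow.

```python
def add(a: float, b: float) -> float:
    return a + b


def calc_pipeline_local(a: float = 1, b: float = 7, c: float = 17) -> float:
    # Mirrors the three tasks: two independent sums, then a sum of their outputs.
    add_result = add(a, b)
    add_2_result = add(b, c)
    return add(add_result, add_2_result)


# Same overrides as the run arguments used later: a=7, b=8, c left at its default.
local_result = calc_pipeline_local(a=7, b=8)
print(local_result)
```

In the real pipeline the first two tasks have no dependency on each other, so they may run in parallel; the third waits for both because it consumes their outputs.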

## Submit the pipeline for execution

In [9]:
arguments = {'a': '7', 'b': '8'}
client.create_run_from_pipeline_func(pipeline_func=calc_pipeline, arguments=arguments)

# This should output a link that leads to the run information page.
# Note: There is a bug in JupyterLab that modifies the URL and makes the link stop working

RunPipelineResult(run_id=a003d112-f76e-11e9-93ae-42010a800216)