# Creating components from python functions

Lightweight Python components do not require you to build a new container image for every code change.
They are intended for fast iteration in a notebook environment.

#### Building a lightweight python component
To create a component, write a stand-alone Python function and then call `kfp.components.func_to_container_op(func)` to convert it into a component that can be used in a pipeline.

There are several requirements for the function:
* The function should be stand-alone. It should not use any code declared outside of the function definition. All imports should be added inside the main function. Any helper functions should also be defined inside the main function.
* The function can only import packages that are available in the base image plus the packages dynamically installed by specifying the `packages_to_install` argument of `func_to_container_op`.
* If the function operates on numbers, the parameters need to have type hints. Supported types are ```[str, int, float, bool]```. Everything else is passed as a string.
* To build a component with multiple output values, use the `typing.NamedTuple` type hint syntax: ```NamedTuple('Outputs', [('output_name_1', type), ('output_name_2', float)])```
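
The stand-alone requirement can be illustrated without KFP. The sketch below assumes, for illustration only, that just the text of the function is shipped and executed in a fresh namespace; `run_in_isolation` and both `circle_area` variants are hypothetical names, not part of the KFP SDK.

```python
import textwrap

# Source of a stand-alone function: the import and the helper live inside the body.
standalone_source = textwrap.dedent('''
    def circle_area(radius: float) -> float:
        import math

        def square(x):
            return x * x

        return math.pi * square(radius)
''')

# Source of a function that silently depends on a module-level `import math`.
dependent_source = textwrap.dedent('''
    def circle_area(radius: float) -> float:
        return math.pi * radius * radius
''')


def run_in_isolation(source, func_name, *args):
    """Hypothetical helper: execute only the function source in a fresh namespace, then call it."""
    namespace = {}
    exec(source, namespace)
    return namespace[func_name](*args)


# The stand-alone version works because everything it needs is inside the function.
area = run_in_isolation(standalone_source, 'circle_area', 1.0)
print(area)

# The dependent version fails because `math` was never imported in the fresh namespace.
try:
    run_in_isolation(dependent_source, 'circle_area', 1.0)
    isolation_error = None
except NameError as error:
    isolation_error = error
print(isolation_error)
```

The second call raises `NameError`, which is the kind of failure a component is likely to hit at run time if its function relies on code declared outside its own body.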

In [None]:
# Install Kubeflow Pipelines SDK
!PIP_DISABLE_PIP_VERSION_CHECK=1 pip3 install 'kfp>=0.1.32.2' --quiet

In [None]:
import kfp
# Initializing the client
client = kfp.Client()

# ! Use kfp.Client(host='https://xxxxx.notebooks.googleusercontent.com/') if working from GCP notebooks (or local notebooks)

A simple function that just adds two numbers:

In [None]:
#Define a Python function
def add(a: float, b: float) -> float:
   '''Calculates sum of two arguments'''
   return a + b

Convert the function to a pipeline operation

In [None]:
from kfp.components import func_to_container_op

add_op = func_to_container_op(add)
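
The type hints on `add` matter because values that are not annotated are treated as strings. The following is a rough sketch of that idea in plain Python, not KFP's actual implementation; `call_with_string_args` and `add_untyped` are hypothetical names introduced for illustration.

```python
from typing import get_type_hints


def add(a: float, b: float) -> float:
    '''Calculates sum of two arguments'''
    return a + b


def add_untyped(a, b):
    '''Same body as add, but without type hints'''
    return a + b


def call_with_string_args(func, **string_args):
    """Hypothetical helper: convert each string argument using the parameter's type hint."""
    hints = get_type_hints(func)
    converted = {}
    for name, raw_value in string_args.items():
        target_type = hints.get(name, str)  # parameters without a hint stay strings
        converted[name] = target_type(raw_value)
    return func(**converted)


typed_result = call_with_string_args(add, a='7', b='4')
untyped_result = call_with_string_args(add_untyped, a='7', b='4')

print(typed_result)    # 11.0
print(untyped_result)  # 74 (string concatenation, not addition)
```

Without the hints, `+` concatenates the two strings instead of adding numbers, which is why numeric parameters need annotations.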

In very simple cases, `func_to_container_op` can be used as a decorator. However, in this case no options can be specified, and the original function can no longer be called directly.

In [None]:
@func_to_container_op
def print_op(string):
   '''Prints a small string'''
   print(string)
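
The reason the original function becomes unavailable is ordinary Python decorator behavior: the decorated name is rebound to whatever the decorator returns. The sketch below uses a hypothetical stand-in decorator, `to_task_factory`, rather than the real `func_to_container_op`, so it runs without KFP.

```python
def to_task_factory(func):
    """Hypothetical stand-in for a decorator that turns a function into a task factory."""
    def factory(*args):
        # Describe the call instead of running the function body
        return {'component': func.__name__, 'arguments': args}
    return factory


@to_task_factory
def shout(text):
    return text.upper()


# The name `shout` now refers to the factory, so the body never runs here.
task = shout('hello')
print(task)  # {'component': 'shout', 'arguments': ('hello',)}


# Converting explicitly keeps both the original function and the factory available.
def whisper(text):
    return text.lower()


whisper_op = to_task_factory(whisper)
print(whisper('HELLO'))     # hello
print(whisper_op('HELLO'))  # {'component': 'whisper', 'arguments': ('HELLO',)}
```

If you want to keep testing the function locally, the explicit form `op = func_to_container_op(func)` is the safer choice.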

A slightly more advanced function that demonstrates how to use imports and helper functions, and how to produce multiple outputs:

In [None]:
#Advanced function
#Demonstrates imports, helper functions and multiple outputs
from typing import NamedTuple

def my_divmod(dividend: float, divisor:float) -> NamedTuple('MyDivmodOutput', [('quotient', float), ('remainder', float)]):
    '''Divides two numbers and calculates the quotient and remainder'''

    #Imports inside a component function:
    import numpy as np

    #This function demonstrates how to use nested functions inside a component function:
    def divmod_helper(dividend, divisor):
        return np.divmod(dividend, divisor)

    (quotient, remainder) = divmod_helper(dividend, divisor)

    return (quotient, remainder)

Test the Python function by running it directly:

In [None]:
my_divmod(100, 7)
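
For local testing, it can help to return an instance of the declared `NamedTuple` so that each output is accessible by name as well as by position. A minimal sketch, assuming Python's built-in `divmod` in place of NumPy so that it runs without extra packages; `DivmodOutput` and `plain_divmod` are illustrative names.

```python
from typing import NamedTuple

# Same functional NamedTuple syntax as the return annotation of my_divmod
DivmodOutput = NamedTuple('DivmodOutput', [('quotient', float), ('remainder', float)])


def plain_divmod(dividend: float, divisor: float) -> DivmodOutput:
    '''Divides two numbers and returns the quotient and remainder by name'''
    quotient, remainder = divmod(dividend, divisor)
    return DivmodOutput(quotient, remainder)


result = plain_divmod(100.0, 7.0)

# Outputs are available by field name...
print(result.quotient, result.remainder)  # 14.0 2.0

# ...and the result still unpacks like a plain tuple.
quotient, remainder = result
print(quotient * 7.0 + remainder)  # 100.0
```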

#### Convert the function to a pipeline operation

You can specify an alternative base container image (the image needs to have Python 3.5+ installed).

In [None]:
divmod_op = func_to_container_op(
    my_divmod,
    output_component_file='my_divmod.component.yaml', # Optional. Write component spec to file for sharing
    base_image='tensorflow/tensorflow:1.15.0-py3',    # Optional. Base container image to use
    packages_to_install=['pandas==0.24'],             # Optional. List of packages to install dynamically
)

#### Write a pipeline function

In [None]:
def calc_pipeline(
   a,
   b = '7',
   c = '17',
):
    #Passing pipeline parameter and a constant value as operation arguments
    add_task = add_op(a, 4) #Returns a dsl.ContainerOp class instance. 
    
    #Passing a task output reference as operation arguments
    #For an operation with a single return value, the output reference can be accessed using `task.output` or `task.outputs['output_name']` syntax
    divmod_task = divmod_op(add_task.output, b)

    #For an operation with multiple return values, the output references can be accessed using `task.outputs['output_name']` syntax
    result_task = add_op(divmod_task.outputs['quotient'], c)
    
    print_op(divmod_task.outputs['quotient'])
    print_op(divmod_task.outputs['remainder'])

#### Submit the pipeline for execution

In [None]:
#Specify pipeline argument values
arguments = {'a': '7', 'b': '8'}

client.create_run_from_pipeline_func(
    calc_pipeline,
    arguments=arguments,
)
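
To sanity-check what the run should compute, the same data flow can be traced locally in plain Python. This is only a sketch: it assumes the string arguments `'7'`, `'8'` and the default `'17'` are converted to `float` as the type hints declare, and it uses the built-in `divmod` as a stand-in for the NumPy call inside `my_divmod`.

```python
def add(a: float, b: float) -> float:
    '''Calculates sum of two arguments'''
    return a + b


# Pipeline arguments as submitted, plus the default for c, converted per the float type hints
a, b, c = float('7'), float('8'), float('17')

# add_task = add_op(a, 4)
add_result = add(a, 4)

# divmod_task = divmod_op(add_task.output, b)
quotient, remainder = divmod(add_result, b)

# result_task = add_op(divmod_task.outputs['quotient'], c)
final_result = add(quotient, c)

print(add_result)           # 11.0
print(quotient, remainder)  # 1.0 3.0
print(final_result)         # 18.0
```

With these inputs, the two `print_op` tasks would be expected to log the quotient and the remainder, here `1.0` and `3.0`, although the exact text formatting in the container logs may differ.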