# What are Kubeflow Pipelines? 

A <b>pipeline</b> is a description of a machine learning (ML) workflow, including all of the components in the workflow and how the components relate to each other in the form of a graph.
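
To make the graph idea concrete, here is a minimal sketch in plain Python (not the Kubeflow API). The step names are illustrative, modeled on the iris pipeline built later in this workshop, and `execution_stages` is a hypothetical helper that groups steps into stages whose members do not depend on each other.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each key is a step; its value is the set of steps it depends on.
# The step names are illustrative and are not Kubeflow identifiers.
pipeline_graph = {
    "train_tree": set(),
    "train_knn": set(),
    "save_tree_model": {"train_tree", "train_knn"},
    "save_knn_model": {"train_tree", "train_knn"},
}


def execution_stages(graph):
    """Group steps into stages; steps in the same stage can run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # steps whose dependencies are done
        stages.append(ready)
        sorter.done(*ready)
    return stages


stages = execution_stages(pipeline_graph)
print(stages)
# [['train_knn', 'train_tree'], ['save_knn_model', 'save_tree_model']]
```

The two training steps land in the same stage because neither depends on the other, which is why they can run in parallel.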


# What is a Pipeline Component?

A pipeline component is one step of a pipeline that does a specific task.

<img src="./img/iris_classification_pipeline.png" alt="iris classification pipeline" align="left" style="padding-left: 10px; padding-right: 40px;"/> 


The upper-left image shows the pipeline this workshop creates. It is a simple pipeline with just a few steps:

1. The first two steps train models using the decision tree and K-nearest neighbors algorithms, respectively. These two steps run in parallel.
2. The results of these steps feed the next steps, which are conditional. These steps check which model achieves the higher accuracy.
3. Depending on the outcome of the conditional steps, only the model with the higher accuracy is saved for later use or serving.
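
The selection logic of the conditional steps described above can be sketched in plain Python. This is only an illustration: `steps_to_run` is a hypothetical helper, and the accuracy values in the usage lines are made up. The two checks mirror the comparisons used in the pipeline definition later in this workshop (`>=` for the tree, `>` for KNN).

```python
def steps_to_run(tree_accuracy: float, knn_accuracy: float) -> list:
    """Return the names of the save steps that would run for these accuracies."""
    steps = []
    # First condition: the tree model wins, including on a tie.
    if tree_accuracy >= knn_accuracy:
        steps.append("Save Tree model")
    # Second condition: the KNN model wins only if it is strictly better.
    if knn_accuracy > tree_accuracy:
        steps.append("Save KNN model")
    return steps


# Made-up accuracies, for illustration only.
print(steps_to_run(0.93, 0.96))  # ['Save KNN model']
print(steps_to_run(0.95, 0.95))  # ['Save Tree model']
```

Because one comparison is `>=` and the other is the strict `>`, exactly one save step runs for any pair of accuracies.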

The image on the right depicts a pipeline component. We can think of a pipeline component as a function that has a body (its logic) and a signature: a name, input parameters, and outputs. The distinguishing feature of a pipeline component is that its logic (the ML code) resides in a Docker image. So, as part of the component's specification, we need to provide the container image, the command that runs the component's code, and the command-line arguments to pass to that code.
When a pipeline is run, the system launches one or more Kubernetes Pods corresponding to the steps (components) in the pipeline. The Pods start Docker containers, and the containers in turn start your programs.
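
As a rough mental model, the parts of a component specification can be written down as a small data structure. The sketch below is plain Python, not the kfp API: `ComponentSketch` and `container_argv` are hypothetical names, and the field values are taken from the decision tree step defined later in this workshop, with the `splitter` parameter filled in as `random`.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ComponentSketch:
    """Hypothetical stand-in for a component specification (not a kfp class)."""
    name: str                 # display name of the step
    image: str                # container image that holds the ML code
    command: List[str]        # command used to run the code
    arguments: List[str]      # command-line arguments passed to the code
    file_outputs: Dict[str, str] = field(default_factory=dict)  # output name -> file path

    def container_argv(self) -> List[str]:
        """Full argument vector the container would be started with."""
        return self.command + self.arguments


train_tree = ComponentSketch(
    name="Train using Decision Tree",
    image="annajung/iris:latest",
    command=["sh", "-c"],
    arguments=["python iris_classification.py build_model tree random"],
    file_outputs={"output": "/tmp/accuracy_tree.txt"},
)

print(train_tree.container_argv())
# ['sh', '-c', 'python iris_classification.py build_model tree random']
```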


# Steps to create a Pipeline
1. Write the ML Python code
2. Containerize the component code
3. Define a Pipeline
    - Define the component specifications

*Leverage the [kfp](https://kubeflow-pipelines.readthedocs.io/en/latest/source/kfp.html) library

## 1. ML Python Code
You can find the ML code here:
[iris_classification.py](../ml/iris_classification.py)
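
The pipeline defined later in this workshop picks up each training step's accuracy from a file inside the container, such as `/tmp/accuracy_tree.txt`. The sketch below illustrates that file-based hand-off in plain Python. It is an assumption about the general pattern, not a copy of `iris_classification.py`: the helper names are hypothetical, the accuracy value is made up, and a temporary directory stands in for the container's `/tmp`.

```python
import tempfile
from pathlib import Path


def write_accuracy(accuracy: float, output_path: Path) -> None:
    """Write the metric as plain text, the way a training step might."""
    output_path.write_text(str(accuracy))


def read_step_output(output_path: Path) -> str:
    """Read the file back as text; the value is passed on as a string."""
    return output_path.read_text()


# A temporary directory stands in for the container's /tmp here.
with tempfile.TemporaryDirectory() as tmp_dir:
    accuracy_file = Path(tmp_dir) / "accuracy_tree.txt"
    write_accuracy(0.95, accuracy_file)  # 0.95 is a made-up value
    step_output = read_step_output(accuracy_file)

print(step_output)  # 0.95
```

The point of the sketch is that a step's output is whatever text the program writes to the agreed file path.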


## 2. Containerize the component code

We already created an image that contains our ML code:

>**annajung/iris:latest**

## 3. Create Iris Classification Pipeline

In [9]:
import kfp
from kfp import dsl

In [10]:
###################################################################################
# 1. Define a Pipeline specification
#
# 2. Define component specifications:
#     - Component that builds a model based on Decision Tree
#     - Component that builds a model based on K-Nearest Neighbors
#     - Conditional component that checks whether the Decision Tree has the higher accuracy
#         : Component that saves the tree model into a file
#     - Conditional component that checks whether KNN has the higher accuracy
#         : Component that saves the KNN model into a file
###################################################################################

@dsl.pipeline(
    name='iris-classification',
    description='A basic pipeline example for iris classification'
)
def iris_classification_pipeline(n_neighbors=2, splitter="random"):
    tree = dsl.ContainerOp(
        name="Train using Decision Tree",
        image="annajung/iris:latest",
        command=["sh", "-c"],
        arguments=["python iris_classification.py build_model tree " + str(splitter)],
        file_outputs={'output': '/tmp/accuracy_tree.txt'}
    )

    knn = dsl.ContainerOp(
        name="Train using K Nearest Neighbors",
        image="annajung/iris:latest",
        command=["sh", "-c"],
        arguments=["python iris_classification.py build_model knn " + str(n_neighbors)],
        file_outputs={'output': '/tmp/accuracy_knn.txt'}
    )

    with dsl.Condition(tree.output >= knn.output):
        dsl.ContainerOp(
            name='Save Tree model',
            image="annajung/iris:latest",
            command=['sh', '-c'],
            arguments=["python3 iris_classification.py save_final_model tree " + str(splitter)],
            file_outputs={'output': '/tmp/tree.pkl'},
        )

    with dsl.Condition(knn.output > tree.output):
        dsl.ContainerOp(
            name='Save KNN model',
            image="annajung/iris:latest",
            command=['sh', '-c'],
            arguments=["python3 iris_classification.py save_final_model knn " + str(n_neighbors)],
            file_outputs={'output': '/tmp/knn.pkl'},
        )

    # Clean up the finished workflow 500 seconds after it completes.
    dsl.get_pipeline_conf().set_ttl_seconds_after_finished(500)

The DSL compiler compiles the given pipeline function into a workflow YAML file.

In [11]:
kfp.compiler.Compiler().compile(iris_classification_pipeline, 'iris_classification_pipeline.yaml')

