### A Brief Introduction to [Kubeflow Pipelines](https://www.kubeflow.org/docs/pipelines/pipelines-overview/)

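#### 1. What is a Kubeflow pipeline?
* It is similar to an experiment in AML (Azure Machine Learning) Studio: a description of an ML workflow, including all of its steps and how they connect.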
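
In effect, a pipeline is a graph: a step can run only after the steps it depends on have finished. The sketch below uses Python's standard `graphlib` module (Python 3.9+) and three hypothetical step names to show how dependencies determine the run order. It only illustrates the idea. Kubeflow does its own scheduling, and this is not its code.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical three-step pipeline: each key maps to the set of steps it depends on.
pipeline = {
    "download": set(),
    "preprocess": {"download"},
    "train": {"preprocess"},
}

# static_order() yields every step after all of its dependencies.
order = list(TopologicalSorter(pipeline).static_order())
print(order)  # ['download', 'preprocess', 'train']
```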


#### 2. What is a Kubeflow [component](https://www.kubeflow.org/docs/pipelines/concepts/component/)?
* A pipeline consists of one or more components, and each component is like a single step in an AML Studio experiment.

* A component has a **Docker image** (which contains its source code) and an **interface**, which specifies its inputs and outputs.
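
To make "image + interface" concrete: SDK v1 can also describe a component in a `component.yaml` file, which can be loaded with `kfp.components.load_component_from_file`. The sketch below builds such a description as a plain Python dict for the `echo` step used later in this section. It assumes the KFP v1 `component.yaml` layout (`name`, `inputs`, `outputs`, `implementation.container`), and `component_spec` is a hypothetical helper, so treat the result as an approximation rather than a validated spec. JSON is, for practical purposes, valid YAML, so the printed text has the shape of a component file.

```python
import json


def component_spec(name, image, command, inputs=(), outputs=()):
    """Hypothetical helper: return a dict shaped like a KFP v1 component.yaml (assumed layout)."""
    spec = {
        "name": name,
        # The Docker image and the command that runs inside it.
        "implementation": {"container": {"image": image, "command": list(command)}},
    }
    # The interface: named inputs and outputs (all typed as plain strings here).
    if inputs:
        spec["inputs"] = [{"name": n, "type": "String"} for n in inputs]
    if outputs:
        spec["outputs"] = [{"name": n, "type": "String"} for n in outputs]
    return spec


echo_spec = component_spec(
    name="echo",
    image="library/bash:4.4.23",
    # {"inputValue": "text"} is a placeholder that is replaced by the 'text' input at run time.
    command=["sh", "-c", 'echo "$0"', {"inputValue": "text"}],
    inputs=["text"],
)

print(json.dumps(echo_spec, indent=2))
```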

#### 3. How do I write a pipeline and its components?
* Use the [Kubeflow Pipelines SDK](https://www.kubeflow.org/docs/pipelines/sdk/) and follow the steps below.

**Step 1**: Put the source code into a Docker image. To do that, first [install Docker](https://docs.docker.com/docker-for-windows/install/). Then write a [Dockerfile](https://docs.docker.com/develop/develop-images/dockerfile_best-practices/) and build the Docker image.
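
Step 1 can be scripted. The following is a minimal sketch that assumes the component is a single script, `main.py`, running on a `python:3.9-slim` base image. The Dockerfile contents, the file names, and the tag `my-component:0.1` are all hypothetical. The script writes the build context (the script plus its Dockerfile) to a directory and prints the matching `docker build` command. Pass `dry_run=False` to actually build; that requires a running Docker daemon.

```python
import subprocess
import tempfile
from pathlib import Path

# Hypothetical Dockerfile: copies main.py onto a slim Python base image and runs it.
DOCKERFILE = """\
FROM python:3.9-slim
WORKDIR /app
COPY main.py .
ENTRYPOINT ["python", "main.py"]
"""


def write_build_context(context_dir, script_source):
    """Write main.py and the Dockerfile into context_dir, the docker build context."""
    context = Path(context_dir)
    (context / "main.py").write_text(script_source)
    (context / "Dockerfile").write_text(DOCKERFILE)
    return context


def build_image(tag, context_dir, dry_run=True):
    """Build context_dir into an image named `tag` (or just print the command)."""
    cmd = ["docker", "build", "-t", tag, str(context_dir)]
    if dry_run:
        print(" ".join(cmd))
    else:
        subprocess.run(cmd, check=True)  # needs a running Docker daemon
    return cmd


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        ctx = write_build_context(tmp, 'print("hello from my component")\n')
        build_image("my-component:0.1", ctx)  # set dry_run=False to really build
```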

**Step 2**: Push the Docker image to a container registry such as [Docker Hub](https://cloud.docker.com/u/guobowen1990/repository/docker/guobowen1990/cnn-demo) or [GCR (Google Container Registry)](https://console.cloud.google.com/gcr/images/kubeflow-trial-241202?project=kubeflow-trial-241202&folder&organizationId).
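
Pushing mostly comes down to naming the image correctly. Docker Hub images are named `USERNAME/REPOSITORY:TAG`, and GCR images are named `gcr.io/PROJECT_ID/IMAGE:TAG`. The sketch below re-tags a local image with its registry name and pushes it. The user name, project ID, and image names are placeholders. It also assumes you have already authenticated, with `docker login` for Docker Hub or `gcloud auth configure-docker` for GCR. By default it only prints the commands.

```python
import subprocess


def dockerhub_ref(username, repo, tag="latest"):
    """Full image name on Docker Hub: USERNAME/REPOSITORY:TAG."""
    return f"{username}/{repo}:{tag}"


def gcr_ref(project_id, image, tag="latest"):
    """Full image name on Google Container Registry: gcr.io/PROJECT_ID/IMAGE:TAG."""
    return f"gcr.io/{project_id}/{image}:{tag}"


def push_image(local_image, remote_ref, dry_run=True):
    """Re-tag a locally built image with its registry name, then push it."""
    cmds = [
        ["docker", "tag", local_image, remote_ref],
        ["docker", "push", remote_ref],
    ]
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # needs Docker and registry credentials
    return cmds


if __name__ == "__main__":
    # Placeholder account and project names: replace them with your own.
    push_image("my-component:0.1", dockerhub_ref("my-dockerhub-user", "my-component", "0.1"))
    push_image("my-component:0.1", gcr_ref("my-gcp-project", "my-component", "0.1"))
```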

Repeat Steps 1 and 2 for every component in the pipeline, since every component needs a Docker image.

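**Step 3**: The component images are now ready, but the components themselves are not finished yet. Next, we need a **YAML file** that serves as an **intermediate representation** of the pipeline, describing each component's interface and how the components connect.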

* The YAML file is generated by the Kubeflow Pipelines SDK. More specifically, the pipeline is defined with the [**kfp.dsl**](https://www.kubeflow.org/docs/pipelines/sdk/dsl-overview/) package and compiled with `kfp.compiler`. BTW, DSL stands for domain-specific language.

* Below is a small Python program that shows how to use the SDK to generate the YAML file. It does two things: 1. it defines the component interfaces, and 2. it defines a pipeline by connecting the components.

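```python
# Requires the Kubeflow Pipelines SDK v1 (e.g. `pip install "kfp<2"`);
# dsl.ContainerOp is not available in SDK v2.
import kfp
import kfp.compiler
from kfp import dsl


def gcs_download_op(url):
    """Component 1: print the file at `url` and save a copy to /tmp/results.txt."""
    return dsl.ContainerOp(
        name='GCS - Download',
        image='google/cloud-sdk:216.0.0',
        command=['sh', '-c'],
        # With `sh -c`, the arguments after the script are bound to $0 and $1.
        arguments=['gsutil cat $0 | tee $1', url, '/tmp/results.txt'],
        # Expose the contents of /tmp/results.txt as this component's output 'data'.
        file_outputs={
            'data': '/tmp/results.txt',
        }
    )


def echo_op(text):
    """Component 2: print `text`."""
    return dsl.ContainerOp(
        name='echo',
        image='library/bash:4.4.23',
        command=['sh', '-c'],
        arguments=['echo "$0"', text]
    )


@dsl.pipeline(
    name='Sequential pipeline',
    description='A pipeline with two sequential steps.'
)
def sequential_pipeline(url='gs://ml-pipeline-playground/shakespeare1.txt'):
    """A pipeline with two sequential steps."""
    download_task = gcs_download_op(url)
    # Consuming download_task.output makes the echo step run after the download step.
    echo_task = echo_op(download_task.output)


if __name__ == '__main__':
    # Compile the pipeline into a YAML file next to this script.
    kfp.compiler.Compiler().compile(sequential_pipeline, __file__ + '.yaml')
```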
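
To see what the `sh -c` arguments and `file_outputs` do, here is a rough local simulation of the two steps in plain Python, with no Kubeflow and no containers involved. It rests on several assumptions: a local file stands in for the `gs://` URL, `cat` stands in for `gsutil cat`, and the step-1 output is read back from the results file to mimic how `file_outputs` collects it. With `sh -c SCRIPT ARG0 ARG1`, the shell binds `ARG0` to `$0` and `ARG1` to `$1`, which is why the `arguments` lists above start at `$0`.

```python
import subprocess
import tempfile
from pathlib import Path


def run_step(command, arguments):
    """Run command + arguments locally (no container) and return its stdout."""
    result = subprocess.run(command + arguments, capture_output=True, text=True, check=True)
    return result.stdout


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "shakespeare.txt"  # stand-in for the gs:// URL
    source.write_text("To be, or not to be\n")
    results = Path(tmp) / "results.txt"  # stand-in for /tmp/results.txt

    # Step 1: `tee` both prints the file and saves a copy to $1.
    step1_stdout = run_step(["sh", "-c"], ["cat $0 | tee $1", str(source), str(results)])

    # Mimic file_outputs: the saved file's contents become the step's output value.
    data = results.read_text()

    # Step 2: step 1's output is passed as the echo step's argument ($0).
    echoed = run_step(["sh", "-c"], ['echo "$0"', data])
    print(echoed)
```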

* Alternatively, use the `dsl-compile` command-line tool, which is installed with the SDK, to compile the program into the YAML file `demo.yaml`:

  `dsl-compile --py [path/to/python/file] --output demo.yaml`


**Step 4**: Now it is time to deploy the pipeline! To do that, simply upload the YAML file to the Kubeflow Pipelines UI, then you 