# Deploying and running Kubeflow pipelines

In [8]:
import os
import sys
from pipeline import configs

#!pip install --upgrade pip
#!pip install --upgrade tfx[kfp]==1.0.0

### GCP Project

In [9]:
shell_output=!gcloud config list --format 'value(core.project)' 2>/dev/null
GOOGLE_CLOUD_PROJECT=shell_output[0]
%env GOOGLE_CLOUD_PROJECT={GOOGLE_CLOUD_PROJECT}
print("GCP project ID:" + GOOGLE_CLOUD_PROJECT)

env: GOOGLE_CLOUD_PROJECT=tfx-mlops-dataops-project
GCP project ID:tfx-mlops-dataops-project
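
The cell above reads `shell_output[0]`, which raises a bare `IndexError` when `gcloud` has no default project configured. A minimal sketch of a more explicit check follows; the helper name `first_project_id` is hypothetical and not part of the original notebook.

```python
def first_project_id(shell_lines):
    """Return the first non-empty line of the gcloud output, or raise if there is none."""
    for line in shell_lines:
        project = line.strip()
        if project:
            return project
    raise RuntimeError(
        "No default project is configured; "
        "run `gcloud config set project <PROJECT_ID>` first."
    )


# Example with the value shown in the output above.
print(first_project_id(["tfx-mlops-dataops-project"]))
```

In the notebook this could be called as `first_project_id(shell_output)` in place of `shell_output[0]`.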


### Endpoint

In [10]:
ENDPOINT = configs.ENDPOINT 
if not ENDPOINT:
    from absl import logging
    logging.error('Set your ENDPOINT in this cell.')
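
The check above only logs an error, so later cells still run with an empty endpoint. As a sketch of a stricter alternative, the hypothetical helper `require_endpoint` below (not part of the original notebook) raises instead, which stops the notebook at the cell where the problem is.

```python
def require_endpoint(endpoint):
    """Return the stripped endpoint, or raise if it is missing or blank."""
    if not endpoint or not endpoint.strip():
        raise ValueError("ENDPOINT is empty; set configs.ENDPOINT or assign it in this cell.")
    return endpoint.strip()
```

Usage in the notebook would be `ENDPOINT = require_endpoint(configs.ENDPOINT)`.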

### Docker image name

In [11]:
CUSTOM_TFX_IMAGE = configs.PIPELINE_IMAGE 

In [12]:
PIPELINE_NAME = configs.PIPELINE_NAME

### Upload data

In [13]:
!gsutil cp data/* gs://{GOOGLE_CLOUD_PROJECT}-kubeflowpipelines-default/data/

Copying file://data/__init__.py [Content-Type=text/x-python]...
Copying file://data/processed.dvc [Content-Type=application/octet-stream]...    
Copying file://data/raw.dvc [Content-Type=application/octet-stream]...          
Omitting directory "file://data/test_dataset". (Did you mean to do cp -r?)      

Operation completed over 3 objects/188.0 B.                                      


The cell above uploads our sample data to the GCS bucket so that the pipeline can use it later.
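
The output above shows that `gsutil cp` skipped the `data/test_dataset` directory because the copy was not recursive. If that directory is also needed in the bucket (an assumption; the notebook does not say), the command can be built with `-r`. The helper name `upload_command` is hypothetical; the bucket naming follows the cell above.

```python
def upload_command(project_id, source_dir="data", recursive=True):
    """Build the gsutil command that copies source_dir into the default KFP bucket."""
    destination = f"gs://{project_id}-kubeflowpipelines-default/data/"
    flag = "-r " if recursive else ""
    return f"gsutil cp {flag}{source_dir}/* {destination}"


print(upload_command("tfx-mlops-dataops-project"))
```

The printed string can be pasted after `!` in a notebook cell.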

### Create pipeline image

Let's create a TFX pipeline using the `tfx pipeline create` command.

>Note: A pipeline for KFP needs a container image to run in, and `skaffold` builds that image for us. Because skaffold pulls base images from Docker Hub, the first build takes 5 to 10 minutes; later builds are much faster.

In [17]:
!tfx pipeline create \
--pipeline-path=kubeflow_runner.py \
--endpoint={ENDPOINT} \
--build-image

CLI
Creating pipeline
Detected Kubeflow.
Use --engine flag if you intend to use a different orchestrator.
INFO:absl:Excluding no splits because exclude_splits is not set.
INFO:absl:Excluding no splits because exclude_splits is not set.
INFO:absl:Excluding no splits because exclude_splits is not set.
[Docker] Step 1/4 : FROM tensorflow/tfx:1.0.0[Docker] 
[Docker]  ---> ec18173ab098
[Docker] Step 2/4 : WORKDIR /pipeline[Docker] 
[Docker]  ---> Using cache
[Docker]  ---> 743e16a22fed
[Docker] Step 3/4 : COPY ./ ./[Docker] 
[Docker]  ---> f3ee3e790882
[Docker] Step 4/4 : ENV PYTHONPATH="/pipeline:${PYTHONPATH}"[Docker] 
[Docker]  ---> Running in e740574c8ae2
[Docker] Removing intermediate container e740574c8ae2
[Docker]  ---> 4f7140e4ef12
[Docker] Successfully built 4f7140e4ef12
[Docker] Successfully tagged gcr.io/tfx-mlops-dataops-project/mlops-dataops-pipeline:latest
[Docker] The push refers to repository [gcr.io/tfx-mlops-dataops-project/mlops-dataops-pipeline]
[Docker] Preparing
[Docke

Creating a pipeline generates a `Dockerfile`, which is used to build the Docker image. Remember to add it to source control (for example, git) along with the other source files.

NOTE: `kubeflow` will be automatically selected as an orchestration engine if `airflow` is not installed and `--engine` is not specified.
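
To avoid relying on automatic engine detection, the `--engine` flag can be passed explicitly. The sketch below only assembles the argument list; the helper name `pipeline_create_args` is hypothetical, and the defaults mirror the cell above.

```python
def pipeline_create_args(endpoint, pipeline_path="kubeflow_runner.py",
                         engine="kubeflow", build_image=True):
    """Build the argument list for `tfx pipeline create` with an explicit engine."""
    args = [
        "tfx", "pipeline", "create",
        f"--engine={engine}",
        f"--pipeline-path={pipeline_path}",
        f"--endpoint={endpoint}",
    ]
    if build_image:
        args.append("--build-image")
    return args
```

The resulting list can be handed to `subprocess.run(args, check=True)` when running outside a notebook.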

Now start an execution run with the newly created pipeline using the `tfx run create` command.

### Pipeline execution

In [16]:
!tfx run create --pipeline-name={PIPELINE_NAME} --endpoint={ENDPOINT}

CLI
Creating a run for pipeline: mlops-dataops-pipeline
Detected Kubeflow.
Use --engine flag if you intend to use a different orchestrator.
Run created for pipeline: mlops-dataops-pipeline
| pipeline_name          | run_id                               | status | created_at                | link                                                                                                                         |
| mlops-dataops-pipeline | 677964b2-9ee8-4b8a-a5bd-053aff5ef74f | None   | 2021-08-13T00:11:49+00:00 | https://33a28ea90c347185-dot-us-central1.pipelines.googleusercontent.com/#/runs/details/677964b2-9ee8-4b8a-a5bd-053aff5ef74f |
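
The run ID and dashboard link in the table above can be extracted programmatically, for example to log them or open the link later. This is a minimal sketch that assumes the pipe-delimited layout shown above; the function name `parse_run_table` is hypothetical and the real CLI output may include extra border lines, which the sketch skips.

```python
def parse_run_table(output):
    """Parse pipe-delimited table rows into a list of dicts keyed by the header row."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in output.splitlines()
        if line.strip().startswith("|")
    ]
    if len(rows) < 2:
        return []
    header, *records = rows
    return [dict(zip(header, record)) for record in records]


sample = """
| pipeline_name          | run_id                               | status | created_at                | link |
| mlops-dataops-pipeline | 677964b2-9ee8-4b8a-a5bd-053aff5ef74f | None   | 2021-08-13T00:11:49+00:00 | https://33a28ea90c347185-dot-us-central1.pipelines.googleusercontent.com/#/runs/details/677964b2-9ee8-4b8a-a5bd-053aff5ef74f |
"""
runs = parse_run_table(sample)
print(runs[0]["run_id"])
```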



You can also start a run from the KFP Dashboard. The new execution run is listed under Experiments in the KFP Dashboard. Clicking into the experiment lets you monitor progress and visualize the artifacts created during the run.

We recommend visiting the KFP Dashboard, which you can open from the Cloud AI Platform Pipelines menu in Google Cloud Console. There you can find the pipeline and detailed information about it. For example, your runs appear under the *Experiments* menu, and opening a run there shows all of its artifacts under the *Artifacts* menu.

>Note: If your pipeline run fails, you can see detailed logs for each TFX component in the Experiments tab in the KFP Dashboard.
    
One major source of failure is permission-related problems. Make sure your KFP cluster has permission to access Google Cloud APIs. You can configure this [when you create a KFP cluster in GCP](https://cloud.google.com/ai-platform/pipelines/docs/setting-up); see also the [troubleshooting document in GCP](https://cloud.google.com/ai-platform/pipelines/docs/troubleshooting).

### Update and run pipeline

In [None]:
# Update the pipeline
!tfx pipeline update \
--pipeline-path=kubeflow_runner.py \
--endpoint={ENDPOINT}

# Start a new run with the updated pipeline
!tfx run create --pipeline-name={PIPELINE_NAME} --endpoint={ENDPOINT}
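
In a notebook cell, a failed `!` command does not stop the lines after it, so a run could start even if the update failed. The sketch below chains the two steps with `subprocess.run(..., check=True)` so that the run is only created after a successful update. The function name `update_and_run` and the injectable `run` parameter are hypothetical additions for illustration and testing.

```python
import subprocess


def update_and_run(pipeline_name, endpoint, pipeline_path="kubeflow_runner.py",
                   run=subprocess.run):
    """Update the pipeline, then create a run; stop at the first failing command."""
    update_cmd = [
        "tfx", "pipeline", "update",
        f"--pipeline-path={pipeline_path}",
        f"--endpoint={endpoint}",
    ]
    run_cmd = [
        "tfx", "run", "create",
        f"--pipeline-name={pipeline_name}",
        f"--endpoint={endpoint}",
    ]
    for cmd in (update_cmd, run_cmd):
        # check=True raises CalledProcessError on a non-zero exit code,
        # so the run is never created after a failed update.
        run(cmd, check=True)
    return [update_cmd, run_cmd]
```

Calling `update_and_run(PIPELINE_NAME, ENDPOINT)` would execute both commands in order.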

- [Google Container Registry](https://console.cloud.google.com/gcr)
- [Google Kubernetes Engine](https://console.cloud.google.com/kubernetes)
