# Kubeflow pipelines

**Learning Objectives:**
  1. Learn how to deploy a Kubeflow cluster on GCP
  1. Learn how to create an experiment in Kubeflow
  1. Learn how to package your code into a Kubeflow pipeline
  1. Learn how to run a Kubeflow pipeline in a repeatable and traceable way


## Introduction

In this notebook, we will first set up a Kubeflow cluster on GCP.
Then, we will create a Kubeflow experiment and a Kubeflow pipeline from our taxifare machine learning code. Finally, we will run the pipeline on the Kubeflow cluster, which gives us a reproducible and traceable way to execute machine learning code.

In [1]:
pip freeze | grep kfp || pip install kfp

Collecting kfp
  Downloading kfp-1.1.1.tar.gz (162 kB)
Collecting kubernetes<12.0.0,>=8.0.0
  Downloading kubernetes-11.0.0-py3-none-any.whl (1.5 MB)
Collecting requests_toolbelt>=0.8.0
  Downloading requests_toolbelt-0.9.1-py2.py3-none-any.whl (54 kB)
Collecting kfp-server-api<2.0.0,>=0.2.5
  Downloading kfp-server-api-1.0.4.tar.gz (51 kB)
Collecting Deprecated
  Downloading Deprecated-1.2.10-py2.py3-none-any.whl (8.7 kB)
Collecting strip-hints
  Downloading strip-hints-0.1.9.tar.gz (30 kB)
Collecting docstring-parser>=0.7.3
  Downloading docstring_parser-0.7.3.tar.gz (13 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
    Preparing wheel

In [2]:
import os  # needed for the os.environ assignments further down
from os import path

import kfp
import kfp.compiler as compiler
import kfp.components as comp
import kfp.dsl as dsl
import kfp.gcp as gcp
import kfp.notebook

## Set up a Kubeflow cluster on GCP

**TODO 1**

To deploy a [Kubeflow](https://www.kubeflow.org/) cluster
in your GCP project, use [AI Platform Pipelines](https://console.cloud.google.com/ai-platform/pipelines):

1. Go to [AI Platform Pipelines](https://console.cloud.google.com/ai-platform/pipelines) in the GCP Console.
2. Create a new instance.
3. Hit "Configure".
4. Check the box "Allow access to the following Cloud APIs".
5. Hit "Create Cluster".
6. Hit "Deploy".

When the cluster is ready, go back to the AI Platform Pipelines page and click on the "SETTINGS" entry for your cluster.
This will bring up a pop-up with code snippets showing how to access the cluster
programmatically.

Copy the "host" entry and assign it to the `HOST` variable below.


In [47]:
HOST = "<your-kfp-host>"  # TODO: fill in the HOST information for the cluster
BUCKET = "<your-gcs-bucket>"  # TODO: fill in the GCS bucket
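
A copy-paste slip in the "host" value (a stray space, a trailing slash, a missing scheme) typically only surfaces later as a connection error. The following is a minimal sketch of a sanity check, assuming the host is an `http(s)` URL. `normalize_host` is a hypothetical helper and not part of the `kfp` SDK, and the hostname in the example is made up.

```python
from urllib.parse import urlparse


def normalize_host(host: str) -> str:
    """Return a cleaned-up HOST value, or raise ValueError if it looks wrong.

    Hypothetical helper for illustration; not part of the kfp SDK.
    """
    host = host.strip().rstrip("/")
    if not host:
        raise ValueError("HOST is empty; copy it from the SETTINGS pop-up")
    # Add a scheme when only the bare hostname was pasted (assumption: https).
    if "://" not in host:
        host = "https://" + host
    parsed = urlparse(host)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"HOST does not look like a URL: {host!r}")
    return host


# Made-up hostname, used only to show the behaviour.
print(normalize_host(" example-dot-us-central1.pipelines.googleusercontent.com/ "))
```

The cleaned value could then be assigned back, for example `HOST = normalize_host(HOST)`, before the client is created.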

### Authenticate your KFP cluster with a Kubernetes secret

If you run pipelines that call any GCP services, each pipeline step needs application default credentials. You provide them by mounting the proper GCP service account token as a Kubernetes secret.

First point your kubectl current context to your cluster. Go back to your [Kubeflow cluster dashboard](https://console.cloud.google.com/ai-platform/pipelines/clusters) or navigate to `Navigation menu > AI Platform > Pipelines` and look up the cluster name, zone and namespace for the pipeline instance you deployed above. The cluster is likely called `cluster-1` if this is the first AI Platform Pipelines instance you've created.

In [43]:
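PROJECT_ID = "<your-gcp-project>"
CLUSTER = "<your-cluster-name>"
ZONE = "<your-cluster-zone>"
NAMESPACE = "<your-cluster-namespace>"  # used by the kubectl command below

os.environ["PROJECT_ID"] = PROJECT_ID
os.environ["CLUSTER"] = CLUSTER
os.environ["ZONE"] = ZONE
os.environ["NAMESPACE"] = NAMESPACE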

In [40]:
# Configure kubectl to connect with the cluster
!gcloud container clusters get-credentials "$CLUSTER" --zone "$ZONE" --project "$PROJECT_ID"

Fetching cluster endpoint and auth data.
kubeconfig entry generated for cluster-1.


We'll create a service account called `kfpdemo` with the IAM permissions needed for our cluster secret, covering every GCP service the pipeline calls. This `taxifare` pipeline needs access to Cloud Storage and AI Platform, so we'll give the account the `storage.admin` and `ml.admin` roles. Open a Cloud Shell and copy/paste this code in the terminal there. Be sure to replace the project ID placeholder in the code below with your own project ID.

```bash
PROJECT_ID=<your-gcp-project-here>

# Create service account
gcloud iam service-accounts create kfpdemo \
  --display-name kfpdemo --project $PROJECT_ID

# Grant permissions to the service account by binding roles
gcloud projects add-iam-policy-binding $PROJECT_ID \
    --member=serviceAccount:kfpdemo@$PROJECT_ID.iam.gserviceaccount.com \
    --role=roles/storage.admin

gcloud projects add-iam-policy-binding $PROJECT_ID \
    --member=serviceAccount:kfpdemo@$PROJECT_ID.iam.gserviceaccount.com \
    --role=roles/ml.admin
```
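
The role bindings above all follow one pattern: the same member string, repeated once per role. As a rough sketch, the commands can be generated from a list of roles, which makes it harder to mistype the service account email. `service_account_email` and `binding_commands` are hypothetical helpers, and `my-project` is a placeholder project ID. The code only builds the argument lists and does not call `gcloud`.

```python
from typing import List


def service_account_email(name: str, project_id: str) -> str:
    """Build the email address GCP assigns to a user-managed service account."""
    return f"{name}@{project_id}.iam.gserviceaccount.com"


def binding_commands(name: str, project_id: str, roles: List[str]) -> List[List[str]]:
    """Return one `gcloud projects add-iam-policy-binding` command per role."""
    member = f"serviceAccount:{service_account_email(name, project_id)}"
    return [
        [
            "gcloud", "projects", "add-iam-policy-binding", project_id,
            f"--member={member}",
            f"--role={role}",
        ]
        for role in roles
    ]


# "my-project" is a placeholder; the roles are the two granted above.
for command in binding_commands(
    "kfpdemo", "my-project", ["roles/storage.admin", "roles/ml.admin"]
):
    print(" ".join(command))
```

Each returned list could be handed to `subprocess.run(command, check=True)` in an environment where `gcloud` is installed and authenticated.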

Then, we'll create and download a key for this service account and store the service account credential as a Kubernetes secret called `user-gcp-sa` in the cluster.

In [None]:
%%bash
gcloud iam service-accounts keys create application_default_credentials.json \
  --iam-account kfpdemo@$PROJECT_ID.iam.gserviceaccount.com

In [None]:
# Check that the key was downloaded correctly.
!ls application_default_credentials.json

In [None]:
# Create a Kubernetes secret. If it already exists, overwrite it.
!kubectl create secret generic user-gcp-sa \
  --from-file=user-gcp-sa.json=application_default_credentials.json \
  -n $NAMESPACE --dry-run -o yaml  |  kubectl apply -f -
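
To make it clearer what the command above produces: a generic Kubernetes secret created with `--from-file=KEY=PATH` stores the file contents base64-encoded under `data.KEY`. The sketch below rebuilds an equivalent manifest in plain Python. `secret_manifest` is a hypothetical helper, the `default` namespace is an assumption, and the key content is a stand-in rather than a real credential.

```python
import base64
import json


def secret_manifest(name: str, namespace: str, filename: str, content: bytes) -> dict:
    """Build a manifest similar to what `kubectl create secret generic` emits."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        # Secret values are stored base64-encoded, not encrypted.
        "data": {filename: base64.b64encode(content).decode("ascii")},
    }


# Stand-in for the downloaded key file; never print a real key.
fake_key = json.dumps({"type": "service_account"}).encode("utf-8")
manifest = secret_manifest("user-gcp-sa", "default", "user-gcp-sa.json", fake_key)
print(json.dumps(manifest, indent=2))
```

Because base64 is only an encoding, anyone who can read the secret can recover the key, so access to the namespace should be restricted accordingly.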

## Create an experiment

**TODO 2**

We will start by creating a Kubeflow client to interact with the Kubeflow cluster:

In [48]:
client = kfp.Client(host=HOST)

Let's look at the experiments on this cluster. On a freshly launched cluster, you should see only a single "Default" experiment. The output below comes from a rerun of this notebook, so it already lists the `taxifare` experiment as well:

In [49]:
client.list_experiments()

{'experiments': [{'created_at': datetime.datetime(2020, 11, 17, 19, 13, 33, tzinfo=tzlocal()),
                  'description': 'All runs created without specifying an '
                                 'experiment will be grouped here.',
                  'id': 'f242234d-b1fd-4151-beec-d2f24c049996',
                  'name': 'Default',
                  'resource_references': None,
                  'storage_state': 'STORAGESTATE_AVAILABLE'},
                 {'created_at': datetime.datetime(2020, 11, 17, 19, 17, 51, tzinfo=tzlocal()),
                  'description': None,
                  'id': '437e8233-5fe3-40bb-8c76-dbd9e1d18fef',
                  'name': 'taxifare',
                  'resource_references': None,
                  'storage_state': 'STORAGESTATE_AVAILABLE'}],
 'next_page_token': None,
 'total_size': 2}

Now let's create a `taxifare` experiment where we can look at all the runs of our taxifare pipeline:

In [6]:
exp = client.create_experiment(name="taxifare")

Let's make sure the experiment has been created correctly:

In [50]:
client.list_experiments()

{'experiments': [{'created_at': datetime.datetime(2020, 11, 17, 19, 13, 33, tzinfo=tzlocal()),
                  'description': 'All runs created without specifying an '
                                 'experiment will be grouped here.',
                  'id': 'f242234d-b1fd-4151-beec-d2f24c049996',
                  'name': 'Default',
                  'resource_references': None,
                  'storage_state': 'STORAGESTATE_AVAILABLE'},
                 {'created_at': datetime.datetime(2020, 11, 17, 19, 17, 51, tzinfo=tzlocal()),
                  'description': None,
                  'id': '437e8233-5fe3-40bb-8c76-dbd9e1d18fef',
                  'name': 'taxifare',
                  'resource_references': None,
                  'storage_state': 'STORAGESTATE_AVAILABLE'}],
 'next_page_token': None,
 'total_size': 2}
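
Reading the printed listing works for two experiments, but a programmatic check scales better. Here is a minimal sketch that looks up an experiment by name. `find_experiment` is a hypothetical helper that only relies on each entry having `name` and `id` attributes, as in the output above, and the stand-in objects replace a live cluster response.

```python
from types import SimpleNamespace
from typing import Any, Iterable, Optional


def find_experiment(experiments: Optional[Iterable[Any]], name: str) -> Optional[Any]:
    """Return the first experiment whose `name` matches, or None."""
    # The list may be None when the cluster has no experiments at all.
    for experiment in experiments or []:
        if experiment.name == name:
            return experiment
    return None


# Stand-ins that mimic the `name` and `id` fields shown in the output above.
listed = [
    SimpleNamespace(name="Default", id="f242234d-b1fd-4151-beec-d2f24c049996"),
    SimpleNamespace(name="taxifare", id="437e8233-5fe3-40bb-8c76-dbd9e1d18fef"),
]

match = find_experiment(listed, "taxifare")
print(match.id if match else "experiment not found")
```

With a live client this could be called as `find_experiment(client.list_experiments().experiments, "taxifare")`. Note that this only inspects the first page of results.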

## Packaging your code into Kubeflow components

We have packaged our taxifare ML pipeline into three components:
* `./components/bq2gcs`, which creates the training and evaluation data from BigQuery and exports it to GCS
* `./components/trainjob`, which launches the training container on AI Platform and exports the model
* `./components/deploymodel`, which deploys the trained model to AI Platform as a REST API

Each of these components has been wrapped into a Docker container, in the same way we did with the taxifare training code in the previous lab.

If you inspect the code in these folders, you'll notice that the `main.py` or `main.sh` files contain the code we previously executed in the notebooks (loading the data to GCS from BQ, launching a training job on AI Platform, etc.). The last line in each `Dockerfile` tells you that these files are executed when the container is run.
So we just packaged our ML code into lightweight container images for reproducibility.

We have made it simple for you to build the container images and push them to the Google Container Registry (gcr.io) in your project:

In [17]:
# Builds the taxifare trainer container in case you skipped the optional part of lab 1
!taxifare/scripts/build.sh

Sending build context to Docker daemon  113.2kB
Step 1/4 : FROM gcr.io/deeplearning-platform-release/tf2-cpu
latest: Pulling from deeplearning-platform-release/tf2-cpu
Digest: sha256:1766be1e60470066cce26bd50acc760a7ba12061519e2da4d093b81e870774a3

In [18]:
# Pushes the taxifare trainer container to gcr.io
!taxifare/scripts/push.sh

The push refers to repository [gcr.io/qwiklabs-gcp-00-568a75dfa3e1/taxifare_training_container]
latest: digest: sha256:0a2d0d43957a3fe43df420ca980461de36a09b82c6d711fc163370c6b0507ce7 size: 4505


In [19]:
# Builds the KF component containers and pushes them to gcr.io
!cd pipelines && make components

make[1]: Entering directory '/home/jupyter/asl-ml-immersion/notebooks/building_production_ml_systems/solutions/pipelines/components/bq2gcs'
rm: cannot remove './venv': No such file or directory
rm: cannot remove './component.yaml': No such file or directory
OK
Sending build context to Docker daemon   16.9kB
Step 1/6 : FROM google/cloud-sdk:latest
latest: Pulling from google/cloud-sdk
Digest: sha256:12e3701513753324519dcca0cd64c7fd23b00621d08aa91cdbcb2b67c7a27f36

Now that the container images are pushed to the [registry in your project](https://console.cloud.google.com/gcr), we need to create YAML files that describe to Kubeflow how to use these containers. This essentially boils down to
* describing what arguments Kubeflow needs to pass to the containers when it runs them
* telling Kubeflow where to fetch the corresponding Docker images

In the cells below, we have three of these "Kubeflow component description files", one for each of our components.
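
To make the first point concrete, here is a simplified illustration of how an `args` list with `{inputValue: ...}` placeholders, like the ones in the component files in the following cells, turns into the command-line arguments a container receives. This is a sketch of the idea, not the Kubeflow implementation. `resolve_args` is a hypothetical helper, and `my-bucket` is a placeholder value.

```python
from typing import Dict, List, Union

# An argument is either a literal string or a placeholder mapping.
Arg = Union[str, Dict[str, str]]


def resolve_args(args: List[Arg], input_values: Dict[str, str]) -> List[str]:
    """Replace every {"inputValue": <input name>} entry by the value passed in."""
    resolved = []
    for arg in args:
        if isinstance(arg, dict) and "inputValue" in arg:
            resolved.append(input_values[arg["inputValue"]])
        else:
            resolved.append(str(arg))
    return resolved


# Same shape as the `args` entry of the bq2gcs component description.
args = ["--bucket", {"inputValue": "Input Bucket"}]
print(resolve_args(args, {"Input Bucket": "my-bucket"}))
# prints ['--bucket', 'my-bucket']
```

The component file therefore acts as a template: the image and the literal flags are fixed, and only the placeholder slots change from run to run.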

**TODO 3**

**IMPORTANT: Modify the image URI in each cell below to reflect that you pushed the images into the gcr.io registry associated with your project.**

In [34]:
%%writefile bq2gcs.yaml

name: bq2gcs
    
description: |
    This component creates the training and
    validation datasets as BigQuery tables and exports
    them into a Google Cloud Storage bucket at
    gs://<BUCKET>/taxifare/data.
        
inputs:
    - {name: Input Bucket , type: String, description: 'GCS directory path.'}

implementation:
    container:
        image: gcr.io/qwiklabs-gcp-00-568a75dfa3e1/taxifare-bq2gcs
        args: ["--bucket", {inputValue: Input Bucket}]

Overwriting bq2gcs.yaml


In [28]:
%%writefile trainjob.yaml

name: trainjob
    
description: |
    This component trains a model to predict the taxi fare in NY.
    It takes as argument a GCS bucket and expects its training and
    eval data to be at gs://<BUCKET>/taxifare/data/ and will export
    the trained model at gs://<BUCKET>/taxifare/model/.
        
inputs:
    - {name: Input Bucket , type: String, description: 'GCS directory path.'}

implementation:
    container:
        image: gcr.io/qwiklabs-gcp-00-568a75dfa3e1/taxifare-trainjob
        args: [{inputValue: Input Bucket}]

Overwriting trainjob.yaml


In [29]:
%%writefile deploymodel.yaml

name: deploymodel
    
description: |
    This component deploys a trained taxifare model on GCP as taxifare:dnn.
    It takes as argument a GCS bucket and expects the model to deploy 
    to be found at gs://<BUCKET>/taxifare/model/export/savedmodel/
        
inputs:
    - {name: Input Bucket , type: String, description: 'GCS directory path.'}

implementation:
    container:
        image: gcr.io/qwiklabs-gcp-00-568a75dfa3e1/taxifare-deploymodel
        args: [{inputValue: Input Bucket}]

Overwriting deploymodel.yaml
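
Each component declares an input named `Input Bucket`, yet the pipeline code later in this notebook passes it as the keyword argument `input_bucket`. The SDK derives a Python-friendly parameter name from the declared input name. The sketch below approximates that mapping for simple names. `pythonic_name` is a hypothetical helper, and the SDK's exact rules may differ in edge cases such as names that start with a digit.

```python
import re


def pythonic_name(component_input_name: str) -> str:
    """Approximate how a component input name maps to a keyword argument."""
    # Lowercase, then collapse every run of other characters into "_".
    name = re.sub(r"[^a-z0-9]+", "_", component_input_name.lower())
    return name.strip("_")


print(pythonic_name("Input Bucket "))
# prints input_bucket
```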


## Create a Kubeflow pipeline

The code below creates a Kubeflow pipeline by decorating a regular function with the
`@dsl.pipeline` decorator. The arguments of the decorated function become
the input parameters of the Kubeflow pipeline.

Inside the function, we describe the pipeline by
* loading the YAML component files we created above into Kubeflow `op`s
* specifying the order in which the Kubeflow ops should be run
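
Calls such as `trainjob.after(bq2gcs)` declare dependencies between steps rather than running anything immediately. As a conceptual sketch only, the same dependencies can be written as a graph and sorted with the standard library to see which execution order they imply. This is not how Kubeflow schedules steps internally, and the dictionary below is written by hand for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each key maps to the set of steps that must finish before it can start,
# mirroring trainjob.after(bq2gcs) and deploymodel.after(trainjob).
dependencies = {
    "bq2gcs": set(),
    "trainjob": {"bq2gcs"},
    "deploymodel": {"trainjob"},
}

order = list(TopologicalSorter(dependencies).static_order())
print(order)
# prints ['bq2gcs', 'trainjob', 'deploymodel']
```

Steps without a dependency between them have no fixed relative order, which is why the ordering has to be stated explicitly when one step consumes what another step writes to the bucket.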

In [51]:
# TODO 3
PIPELINE_TAR = "taxifare.tar.gz"
BQ2GCS_YAML = "./bq2gcs.yaml"
TRAINJOB_YAML = "./trainjob.yaml"
DEPLOYMODEL_YAML = "./deploymodel.yaml"


@dsl.pipeline(
    name="Taxifare",
    description="Train an ML model to predict the taxi fare in NY",
)
def pipeline(gcs_bucket_name="<bucket where data and model will be exported>"):

    bq2gcs_op = comp.load_component_from_file(BQ2GCS_YAML)
    bq2gcs = bq2gcs_op(
        input_bucket=gcs_bucket_name,
    )


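# NOTE: the training and deployment steps below are wrapped in a string literal,
# so this version of the pipeline only runs bq2gcs. Deleting the two
# triple-quote lines makes the indented code part of `pipeline` again.
"""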
    trainjob_op = comp.load_component_from_file(TRAINJOB_YAML)
    trainjob = trainjob_op(
        input_bucket=gcs_bucket_name,
    )

    deploymodel_op = comp.load_component_from_file(DEPLOYMODEL_YAML)
    deploymodel = deploymodel_op(
        input_bucket=gcs_bucket_name,
    )

    trainjob.after(bq2gcs)
    deploymodel.after(trainjob)
"""

'\n    trainjob_op = comp.load_component_from_file(TRAINJOB_YAML)\n    trainjob = trainjob_op(\n        input_bucket=gcs_bucket_name,\n    )\n\n    deploymodel_op = comp.load_component_from_file(DEPLOYMODEL_YAML)\n    deploymodel = deploymodel_op(\n        input_bucket=gcs_bucket_name,\n    )\n\n    trainjob.after(bq2gcs)\n    deploymodel.after(trainjob)\n'

The Kubeflow compiler then uses the pipeline function above to create a Kubeflow pipeline artifact, which can be uploaded to the Kubeflow cluster either from the UI or programmatically, as we will do below:

In [52]:
compiler.Compiler().compile(pipeline, PIPELINE_TAR)

In [53]:
ls $PIPELINE_TAR

taxifare.tar.gz


If you untar and unzip this pipeline artifact, you'll see that the compiler has transformed the
Python description of the pipeline into a YAML description!
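
The archive can also be inspected from Python without extracting it to disk. The sketch below reads every `.yaml` member of a `.tar.gz` archive into memory. `read_yaml_members` and `make_demo_archive` are hypothetical helpers, and the demo archive with its `pipeline.yaml` member is a stand-in built on the fly so that the example runs without a compiled pipeline.

```python
import io
import tarfile
from typing import Dict


def read_yaml_members(archive_bytes: bytes) -> Dict[str, str]:
    """Map each *.yaml member of a .tar.gz archive to its decoded text."""
    found = {}
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as tar:
        for member in tar.getmembers():
            if member.isfile() and member.name.endswith(".yaml"):
                found[member.name] = tar.extractfile(member).read().decode("utf-8")
    return found


def make_demo_archive() -> bytes:
    """Build a tiny in-memory .tar.gz standing in for the compiled pipeline."""
    payload = b"kind: Workflow\n"
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        info = tarfile.TarInfo(name="pipeline.yaml")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


print(read_yaml_members(make_demo_archive()))
```

For the real artifact, the bytes could be loaded with `open(PIPELINE_TAR, "rb").read()` and passed to `read_yaml_members`.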

Now let's feed Kubeflow with our pipeline and run it using our client:

In [54]:
# TODO 4
run = client.run_pipeline(
    experiment_id=exp.id,
    job_name="taxifare",
    pipeline_package_path="taxifare.tar.gz",
    params={
        "gcs_bucket_name": BUCKET,
    },
)

Follow the link displayed in the cell output to monitor the run.

Now all the runs are nicely organized under the experiment in the UI, and new runs can be either manually launched or scheduled through the UI in a completely repeatable and traceable way!