# CI/CD for a Kubeflow pipeline on Vertex AI

**Learning Objectives:**
1. Learn how to create a custom Cloud Build builder to pilot Vertex AI Pipelines
1. Learn how to write a Cloud Build config file to build and push all the artifacts for a KFP pipeline
1. Learn how to set up a Cloud Build GitHub trigger to start a new run of the Kubeflow pipeline

In this lab you will walk through authoring a **Cloud Build** CI/CD workflow that automatically builds, deploys, and runs a Kubeflow pipeline on Vertex AI. You will also integrate your workflow with **GitHub** by setting up a trigger that starts the workflow when a new tag is applied to the **GitHub** repo hosting the pipeline's code.


In [1]:
import os

## Configuring environment settings

In [2]:
PROJECT_ID = !(gcloud config get-value project)
PROJECT_ID = PROJECT_ID[0]
REGION = "us-central1"
ARTIFACT_STORE = f"gs://{PROJECT_ID}-kfp-artifact-store"
os.environ["REGION"] = REGION
os.environ["ARTIFACT_STORE"] = ARTIFACT_STORE

Let us make sure that the artifact store exists:

In [24]:
!gsutil ls | grep ^{ARTIFACT_STORE}/$ || gsutil mb -l {REGION} {ARTIFACT_STORE}

gs://qwiklabs-asl-01-19968276eb55-kfp-artifact-store/


Also, this notebook assumes the dataset is already created and stored in Google Cloud Storage following the instructions covered in the [walkthrough notebook](https://github.com/GoogleCloudPlatform/asl-ml-immersion/blob/master/notebooks/kubeflow_pipelines/walkthrough/solutions/kfp_walkthrough_vertex.ipynb).

If you haven't run it, please run the cell below and create the dataset before running the pipeline.

In [25]:
%%bash
gsutil cp gs://asl-public/data/covertype/training/dataset.csv $ARTIFACT_STORE/data/training/dataset.csv
gsutil cp gs://asl-public/data/covertype/validation/dataset.csv $ARTIFACT_STORE/data/validation/dataset.csv

Copying gs://asl-public/data/covertype/training/dataset.csv [Content-Type=application/octet-stream]...
/ [1 files][  2.1 MiB/  2.1 MiB]                                                
Operation completed over 1 objects/2.1 MiB.                                      
Copying gs://asl-public/data/covertype/validation/dataset.csv [Content-Type=application/octet-stream]...
/ [1 files][529.7 KiB/529.7 KiB]                                                
Operation completed over 1 objects/529.7 KiB.                                    


## Creating the KFP CLI builder for Vertex AI

### Exercise

In the cell below, write a Dockerfile that
* Uses `gcr.io/deeplearning-platform-release/base-cpu` as the base image
* Installs the Python packages `kfp` with version `2.4.0`, `google-cloud-aiplatform` with version `1.43.0`, and `fire`
* Starts `/bin/bash` as the entrypoint

In [26]:
%%writefile kfp-cli_vertex/Dockerfile

FROM gcr.io/deeplearning-platform-release/base-cpu
RUN pip install kfp==2.4.0 google-cloud-aiplatform==1.43.0 fire
ENTRYPOINT ["/bin/bash"]

Overwriting kfp-cli_vertex/Dockerfile


### Build the image and push it to your project's **Artifact Registry**

In [27]:
ARTIFACT_REGISTRY_DIR = "asl-artifact-repo"
KFP_CLI_IMAGE_NAME = "kfp-cli-vertex"
KFP_CLI_IMAGE_URI = f"us-docker.pkg.dev/{PROJECT_ID}/{ARTIFACT_REGISTRY_DIR}/{KFP_CLI_IMAGE_NAME}:latest"
KFP_CLI_IMAGE_URI

'us-docker.pkg.dev/qwiklabs-asl-01-19968276eb55/asl-artifact-repo/kfp-cli-vertex:latest'
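
The image URI above follows the `HOST/PROJECT/REPOSITORY/IMAGE:TAG` layout. As a minimal sketch, the same string can be assembled by a small helper; the function name `artifact_registry_image_uri` and its default arguments are illustrative assumptions, not part of the lab code.

```python
def artifact_registry_image_uri(
    project_id: str,
    repository: str,
    image: str,
    tag: str = "latest",
    host: str = "us-docker.pkg.dev",
) -> str:
    """Assemble an image URI in the HOST/PROJECT/REPOSITORY/IMAGE:TAG layout.

    Hypothetical helper: it only formats a string and does not check
    that the repository or image exists.
    """
    return f"{host}/{project_id}/{repository}/{image}:{tag}"


# "my-project" is a placeholder project ID used only for illustration.
example_uri = artifact_registry_image_uri(
    "my-project", "asl-artifact-repo", "kfp-cli-vertex"
)
print(example_uri)
```

Keeping the repository and image name as separate arguments makes it easy to reuse the helper for the trainer image later in the lab.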

### Exercise

In the cell below, use `gcloud builds` to build the `kfp-cli-vertex` Docker image and push it to the project Artifact Registry.

In [28]:
# The build context is the `kfp-cli_vertex` directory that holds the Dockerfile
# (note the underscore; the image name `kfp-cli-vertex` uses a hyphen).
!gcloud builds submit --tag {KFP_CLI_IMAGE_URI} kfp-cli_vertex


## Understanding the **Cloud Build** workflow

### Exercise

In the cell below, you'll complete the `cloudbuild_vertex.yaml` file, which describes the CI/CD workflow and shows how environment-specific settings are abstracted using **Cloud Build** variables.

The CI/CD workflow automates the steps you walked through manually during `lab-02_vertex`:
1. Builds the trainer image
1. Compiles the pipeline
1. Uploads the pipeline to the Vertex AI Pipelines environment and runs it
1. Pushes the trainer image to your project's **Artifact Registry**

The **Cloud Build** workflow configuration uses both standard and custom [Cloud Build builders](https://cloud.google.com/cloud-build/docs/cloud-builders). The custom builder encapsulates **KFP CLI**. 
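
To give a rough idea of what the custom builder is used for, the sketch below shows how a launcher script could submit a compiled pipeline to Vertex AI with the `google-cloud-aiplatform` SDK installed in the builder image. It is not the repo's actual `run_pipeline.py`: the function names, the argument names, and the choice to disable caching are assumptions made for illustration.

```python
from typing import Any, Dict


def pipeline_job_kwargs(
    project_id: str,
    region: str,
    template_path: str,
    display_name: str,
    enable_caching: bool = False,
) -> Dict[str, Any]:
    """Collect the keyword arguments passed to aiplatform.PipelineJob."""
    return {
        "display_name": display_name,
        "template_path": template_path,  # compiled pipeline file
        "enable_caching": enable_caching,
        "project": project_id,
        "location": region,
    }


def run_pipeline(
    project_id: str, region: str, template_path: str, display_name: str
) -> None:
    """Submit the compiled pipeline and wait for the run to finish."""
    # Imported lazily so the helper above works without the SDK installed.
    from google.cloud import aiplatform

    job = aiplatform.PipelineJob(
        **pipeline_job_kwargs(project_id, region, template_path, display_name)
    )
    job.run()  # blocks until the pipeline run completes


# To expose run_pipeline as a command-line tool inside the builder image,
# one option is `fire.Fire(run_pipeline)` from the `fire` package.
```

Because `job.run()` waits for the pipeline to complete, a build step that calls such a script lasts as long as the pipeline run itself.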

In [None]:
%%writefile cloudbuild_vertex.yaml
# Copyright 2021 Google LLC

# Licensed under the Apache License, Version 2.0 (the "License"); you may not use this
# file except in compliance with the License. You may obtain a copy of the License at

# https://www.apache.org/licenses/LICENSE-2.0
    
# Unless required by applicable law or agreed to in writing, software 
# distributed under the License is distributed on an "AS IS"
# BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either 
# express or implied. See the License for the specific language governing 
# permissions and limitations under the License.

steps:
# Build the trainer image
- name: # TODO
  args: # TODO
  dir:  # TODO


# Push the trainer image, to make it available in the compile step
- name: # TODO
  args: # TODO
  dir:  # TODO


# Compile the pipeline
- name: 'us-docker.pkg.dev/$PROJECT_ID/asl-artifact-repo/kfp-cli-vertex'
  args:
  - '-c'
  - |
    kfp dsl compile # TODO
  env:
  - 'PIPELINE_ROOT=gs://$PROJECT_ID-kfp-artifact-store/pipeline'
  - 'PROJECT_ID=$PROJECT_ID'
  - 'REGION=$_REGION'
  - 'SERVING_CONTAINER_IMAGE_URI=us-docker.pkg.dev/vertex-ai/prediction/sklearn-cpu.1-0:latest'
  - 'TRAINING_CONTAINER_IMAGE_URI=us-docker.pkg.dev/$PROJECT_ID/asl-artifact-repo/trainer_image_covertype_vertex:latest'
  - 'TRAINING_FILE_PATH=gs://$PROJECT_ID-kfp-artifact-store/data/training/dataset.csv'
  - 'VALIDATION_FILE_PATH=gs://$PROJECT_ID-kfp-artifact-store/data/validation/dataset.csv'
  dir: $_PIPELINE_FOLDER/pipeline_vertex

# Run the pipeline
- name: 'us-docker.pkg.dev/$PROJECT_ID/asl-artifact-repo/kfp-cli-vertex'
  args:
  - '-c'
  - |
    python $_PIPELINE_FOLDER/kfp-cli_vertex/run_pipeline.py  # TODO

logsBucket: 'gs://$PROJECT_ID-cicd-log'

# Push the images to Artifact Registry
# TODO: List the images to be pushed to the project Docker registry
images: # TODO


# This is required since the pipeline run overflows the default timeout
timeout: 10800s
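
The compile step hands its settings to the pipeline code through the `env` entries listed in the config. A minimal sketch of how pipeline code might read them is shown below; the `PipelineSettings` class and the `load_settings` function are hypothetical names, and the real pipeline module may read these variables differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Environment variable names taken from the compile step's `env` list.
ENV_NAMES = (
    "PIPELINE_ROOT",
    "PROJECT_ID",
    "REGION",
    "SERVING_CONTAINER_IMAGE_URI",
    "TRAINING_CONTAINER_IMAGE_URI",
    "TRAINING_FILE_PATH",
    "VALIDATION_FILE_PATH",
)


@dataclass(frozen=True)
class PipelineSettings:
    """Hypothetical container for the compile-time settings."""

    pipeline_root: str
    project_id: str
    region: str
    serving_container_image_uri: str
    training_container_image_uri: str
    training_file_path: str
    validation_file_path: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> PipelineSettings:
    """Read every expected variable and fail early if any is missing."""
    env = os.environ if env is None else env
    missing = [name for name in ENV_NAMES if not env.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return PipelineSettings(**{name.lower(): env[name] for name in ENV_NAMES})
```

Failing early on a missing variable surfaces a misconfigured build step at compile time rather than in the middle of a pipeline run.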


Let's create a GCS bucket to save the build log.

In [None]:
BUCKET = PROJECT_ID + "-cicd-log"
os.environ["BUCKET"] = BUCKET

In [None]:
%%bash

exists=$(gsutil ls -d | grep -w gs://${BUCKET}/)
if [ -n "$exists" ]; then
    echo -e "Bucket exists, let's not recreate it."
else
    echo "Creating a new GCS bucket."
    gsutil mb -l ${REGION} gs://${BUCKET}
    echo "Here are your current buckets:"
    gsutil ls
fi

## Manually triggering CI/CD runs

You can manually trigger **Cloud Build** runs using the [gcloud builds submit command](https://cloud.google.com/sdk/gcloud/reference/builds/submit).

In [None]:
SUBSTITUTIONS = f"_REGION={REGION},_PIPELINE_FOLDER=./"
SUBSTITUTIONS
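
The `--substitutions` flag expects a comma-separated list of `KEY=VALUE` pairs, and Cloud Build requires user-defined keys to start with an underscore. The sketch below builds that string from a dictionary; `format_substitutions` is an illustrative helper name, and it assumes that the values contain no commas.

```python
import re
from typing import Mapping

# User-defined substitution keys: an underscore followed by
# uppercase letters, digits, or underscores.
_KEY_PATTERN = re.compile(r"^_[A-Z0-9_]+$")


def format_substitutions(values: Mapping[str, str]) -> str:
    """Join KEY=VALUE pairs into the form expected by --substitutions."""
    for key in values:
        if not _KEY_PATTERN.match(key):
            raise ValueError(f"Invalid user-defined substitution key: {key!r}")
    return ",".join(f"{key}={value}" for key, value in values.items())


example = format_substitutions(
    {"_REGION": "us-central1", "_PIPELINE_FOLDER": "./"}
)
print(example)
```

Validating the keys locally gives a clearer error than waiting for the build submission to be rejected.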

In [None]:
!gcloud builds submit . --config cloudbuild_vertex.yaml --substitutions {SUBSTITUTIONS}

**Note:** If Cloud Build has trouble accessing Vertex AI, you may need to run the following commands in **Cloud Shell**:

```
PROJECT_ID=$(gcloud config get-value project)
PROJECT_NUMBER=$(gcloud projects describe $PROJECT_ID --format="value(projectNumber)")

gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member serviceAccount:$PROJECT_NUMBER@cloudbuild.gserviceaccount.com \
  --role roles/editor
gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member serviceAccount:$PROJECT_NUMBER-compute@developer.gserviceaccount.com \
  --role roles/storage.objectAdmin
```
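
The two service accounts in the commands above are derived from the project number. As a small sketch, the helpers below format those addresses and assemble the corresponding `gcloud` argument lists without running them; the function names and the sample project values are hypothetical.

```python
from typing import Dict, List


def build_service_accounts(project_number: str) -> Dict[str, str]:
    """Return the service account emails used in the note above."""
    return {
        "cloud_build": f"{project_number}@cloudbuild.gserviceaccount.com",
        "compute_default": (
            f"{project_number}-compute@developer.gserviceaccount.com"
        ),
    }


def iam_binding_command(project_id: str, member_email: str, role: str) -> List[str]:
    """Build the argument list for `gcloud projects add-iam-policy-binding`."""
    return [
        "gcloud", "projects", "add-iam-policy-binding", project_id,
        "--member", f"serviceAccount:{member_email}",
        "--role", role,
    ]


# "my-project" and "123456789012" are placeholder values.
accounts = build_service_accounts("123456789012")
command = iam_binding_command(
    "my-project", accounts["cloud_build"], "roles/editor"
)
print(" ".join(command))
```

The resulting list could be passed to `subprocess.run(command, check=True)` in an environment where `gcloud` is installed and authenticated.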

## Setting up GitHub integration

### Exercise

In this exercise you will integrate your CI/CD workflow with **GitHub**, using the [Cloud Build GitHub App](https://github.com/marketplace/google-cloud-build).
You will set up a trigger that starts the CI/CD workflow when a new tag is applied to the **GitHub** repo managing the pipeline source code. You will use a fork of this repo as your source GitHub repository.

### Step 1: Create a fork of this repo
[Follow the GitHub documentation](https://help.github.com/en/github/getting-started-with-github/fork-a-repo) to fork [this repo](https://github.com/GoogleCloudPlatform/asl-ml-immersion).

### Step 2: Update the YAML file and create a commit

Go to the page of your fork of this repo and open the file `asl-ml-immersion/notebooks/kubeflow_pipelines/cicd/labs/cloudbuild_vertex.yaml`.

Click the Edit button and paste the contents of your updated YAML file directly into the page.
![image](https://user-images.githubusercontent.com/6895245/158727133-e5d77f0c-354c-4b2b-a710-8209ee67571f.png)

Click the 'Commit changes' button to create a new commit.
![image](https://user-images.githubusercontent.com/6895245/158727565-13b4981a-8bce-401b-8f1a-d09a33a163a8.png)

### Step 3: Create a **Cloud Build** trigger

Connect the fork you created in the previous step to your Google Cloud project and create a trigger following the steps in the [Creating GitHub app trigger](https://cloud.google.com/cloud-build/docs/create-github-app-triggers) article. Use the following values on the **Edit trigger** form:

|Field|Value|
|-----|-----|
|Name|[YOUR TRIGGER NAME]|
|Description|[YOUR TRIGGER DESCRIPTION]|
|Event| Tag|
|Source| [YOUR FORK]|
|Tag (regex)|.\*|
|Build Configuration|Cloud Build configuration file (yaml or json)|
|Cloud Build configuration file location| ./notebooks/kubeflow_pipelines/cicd/labs/cloudbuild_vertex.yaml|
|Service account| `<PROJECT NUMBER>-compute@developer.gserviceaccount.com` |


Use the following values for the substitution variables:

|Variable|Value|
|--------|-----|
|_REGION|us-central1|
|_PIPELINE_FOLDER|notebooks/kubeflow_pipelines/cicd/labs|

### Step 4: Trigger the build

To start an automated build [create a new release of the repo in GitHub](https://help.github.com/en/github/administering-a-repository/creating-releases). Alternatively, you can start the build by applying a tag using `git`. 
```
git tag [TAG NAME]
git push origin --tags
```


After running the commands above, a build should be triggered automatically. You should be able to inspect it [here](https://console.cloud.google.com/cloud-build/builds).

Copyright 2021 Google LLC

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    https://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.