# Orchestrating TFX pipelines on Google Cloud with Vertex Pipelines

## Learning objectives

1.  Use the TFX CLI to build a TFX pipeline container.
2.  Deploy a TFX pipeline container to Vertex Pipelines on Google Cloud.
3.  Create and monitor a TFX pipeline run using the Vertex Pipelines UI.

In this lab, you will utilize the following tools and Google Cloud services to build a TFX pipeline that orchestrates the training and deployment of a TensorFlow classifier to predict forest cover type from tabular cartographic data:

* The [**TFX CLI**](https://www.tensorflow.org/tfx/guide/cli) utility to build and deploy a TFX pipeline.
* [**Vertex Pipelines**](https://cloud.google.com/vertex-ai/docs/pipelines) for TFX pipeline orchestration.
* [**Dataflow**](https://cloud.google.com/dataflow) for scalable, distributed data processing for TFX Beam-based components.
* A [**Vertex Training**](https://cloud.google.com/ai-platform/) job for model training and flock management of tuning trials. 
* [**Vertex Prediction**](https://cloud.google.com/ai-platform/), a model server destination for blessed pipeline model versions.
* [**CloudTuner**](https://www.tensorflow.org/tfx/guide/tuner#tuning_on_google_cloud_platform_gcp) (KerasTuner implementation) and [**Vertex Vizier**](https://cloud.google.com/ai-platform/optimizer/docs/overview) for advanced model hyperparameter tuning.

## Setup

### Define constants

In [3]:
# Add required libraries to Python PATH.
PATH=%env PATH
%env PATH={PATH}:/home/jupyter/.local/bin

env: PATH=/usr/local/cuda/bin:/opt/conda/bin:/opt/conda/condabin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/home/jupyter/.local/bin


In [17]:
PROJECT_ID = !(gcloud config get-value core/project)
PROJECT_ID = PROJECT_ID[0]
PROJECT_NUMBER= !$(gcloud projects describe $PROJECT_ID --format="value(projectNumber)")
REGION = 'us-central1'

!echo {PROJECT_ID}
!echo {PROJECT_NUMBER}
!echo {REGION}

dougkelly-vertex-demos
[API [cloudresourcemanager.googleapis.com] not enabled on project , [617979904441]. Would you like to enable and retry (this will take a , few minutes)? (y/N)?  , ERROR: (gcloud.projects.describe) User [617979904441-compute@developer.gserviceaccount.com] does not have permission to access projects instance [dougkelly-vertex-demos] (or it may not exist): Cloud Resource Manager API has not been used in project 617979904441 before or it is disabled. Enable it by visiting https://console.developers.google.com/apis/api/cloudresourcemanager.googleapis.com/overview?project=617979904441 then retry. If you enabled this API recently, wait a few minutes for the action to propagate to our systems and retry., - '@type': type.googleapis.com/google.rpc.Help,   links:,   - description: Google developers console API activation,     url: https://console.developers.google.com/apis/api/cloudresourcemanager.googleapis.com/overview?project=617979904441, - '@type': type.googleapis.com/

### Configure service accounts for your project for Vertex Pipelines

In [None]:
! gcloud services enable \
compute.googleapis.com \
iam.googleapis.com \
cloudbuild.googleapis.com \
container.googleapis.com \
notebooks.googleapis.com \
aiplatform.googleapis.com \
dataflow.googleapis.com \
bigquery.googleapis.com \
bigquerydatatransfer.googleapis.com \  
artifactregistry.googleapis.com \
cloudresourcemanager.googleapis.com \
cloudtrace.googleapis.com \
iamcredentials.googleapis.com \
monitoring.googleapis.com \
logging.googleapis.com

In [18]:
SERVICE_ACCOUNT_ID=tfx-vertex-pipelines-sa
gcloud iam service-accounts create $SERVICE_ACCOUNT_ID \
    --description="TFX on Google Cloud Vertex Pipelines" \
    --display-name="TFX Vertex Pipelines service account" \
    --project=$PROJECT_ID

Created service account [tfx-vertex-pipelines].


In [None]:
# Add Vertex Pipelines
gcloud projects add-iam-policy-binding $PROJECT_ID \
    --member="serviceAccount:${SERVICE_ACCOUNT_ID}@${PROJECT_ID}.iam.gserviceaccount.com" \
    --role="roles/aiplatform.user"

In [None]:
# Add BigQuery
gcloud projects add-iam-policy-binding $PROJECT_ID \
    --member="serviceAccount:${SERVICE_ACCOUNT_ID}@${PROJECT_ID}.iam.gserviceaccount.com" \
    --role="roles/bigquery.user"

In [None]:
# Add GCS
gcloud projects add-iam-policy-binding $PROJECT_ID \
    --member=serviceAccount:${SERVICE_ACCOUNT_ID}@${PROJECT_ID}.iam.gserviceaccount.com \
    --role=roles/storage.objectAdmin
    
# gsutil iam ch \
# serviceAccount:${SERVICE_ACCOUNT_ID}@${PROJECT_ID}.iam.gserviceaccount.com:roles/storage.objectCreator \
# $BUCKET_NAME

# gsutil iam ch \
# serviceAccount:${SERVICE_ACCOUNT_ID}@${PROJECT_ID}.iam.gserviceaccount.com:roles/storage.objectViewer \
# $BUCKET_NAME

# https://cloud.google.com/vertex-ai/docs/pipelines/configure-project
gcloud iam service-accounts add-iam-policy-binding \
    $SERVICE_ACCOUNT_ID@$PROJECT_ID.iam.gserviceaccount.com \
    --member="user:dougkelly@google.com" \
    --role="roles/iam.serviceAccountUser"

### Create a storage bucket to store pipeline artifacts

In [None]:
GCS_BUCKET = f"gs://{PROJECT_ID}-bucket"

ARTIFACT_STORE_URI = os.path.join(GCS_LOCATION, "tfx_artifacts")

!echo {BUCKET_NAME}

In [10]:
!gsutil ls -al $BUCKET_NAME

gs://cloud-ai-platform-e2834baf-1af1-45b3-9237-b58dbefadbbe/
gs://dougkelly-vertex-demos-bucket/


### Import libraries

In [9]:
import tensorflow as tf
from tfx import v1 as tfx
import kfp

print('TensorFlow version: {}'.format(tf.__version__))
print('TFX version: {}'.format(tfx.__version__))
print('KFP version: {}'.format(kfp.__version__))



TensorFlow version: 2.4.2
TFX version: 0.30.1
KFP version: 1.6.2


## Review the TFX pipeline design pattern for Google Cloud

In [7]:
%cd pipeline

/home/jupyter/training-data-analyst/self-paced-labs/vertex-pipelines/tfx/pipeline


In [8]:
!ls -la

total 8
drwxr-xr-x 2 jupyter jupyter 4096 Jun 27 18:18 .
drwxr-xr-x 4 jupyter jupyter 4096 Jun 27 20:43 ..
