# Custom Image Models with Kubeflow Pipeline and Vertex AI

**Learning Objectives:**
1. Learn how to use Kubeflow pre-built components
1. Learn how to build a Kubeflow pipeline with these components
1. Learn how to compile, upload, and run a Kubeflow pipeline


In this lab, you will build, deploy, and execute a Kubeflow pipeline that orchestrates Vertex AI services to train and deploy a simple custom image classification model.

## Setup

In [1]:
PROJECT = !(gcloud config get-value core/project)
PROJECT = PROJECT[0]
BUCKET = PROJECT + "-flowers"
TRAIN_DATA_PATH = f"gs://{BUCKET}/data/train*"
EVAL_DATA_PATH = f"gs://{BUCKET}/data/eval*"
REGION = "us-central1"
ARTIFACT_STORE = f"gs://{BUCKET}-kfp-artifact-store"

%env BUCKET={BUCKET}
%env PROJECT={PROJECT}
%env REGION={REGION}
%env ARTIFACT_STORE={ARTIFACT_STORE}

env: BUCKET=kylesteckler-instructor-flowers
env: PROJECT=kylesteckler-instructor
env: REGION=us-central1
env: ARTIFACT_STORE=gs://kylesteckler-instructor-flowers-kfp-artifact-store


### Data 
Create the bucket and copy the data necessary for this lab.

First, create the bucket.

In [2]:
%%bash
exists=$(gsutil ls -d | grep -w gs://${BUCKET}/)

if [ -n "$exists" ]; then
  echo -e "Bucket gs://${BUCKET} already exists."
    
else
   echo "Creating a new GCS bucket."
   gsutil mb -l ${REGION} gs://${BUCKET}
   echo -e "\nHere are your current buckets:"
   gsutil ls
fi

Bucket gs://kylesteckler-instructor-flowers already exists.


Copy the data to your bucket.

In [3]:
!gsutil -m cp gs://asl-public/data/flowers/tfrecords/* gs://{BUCKET}/data/

Copying gs://asl-public/data/flowers/tfrecords/eval.tfrecord-00000-of-00003 [Content-Type=application/octet-stream]...
Copying gs://asl-public/data/flowers/tfrecords/eval.tfrecord-00001-of-00003 [Content-Type=application/octet-stream]...
Copying gs://asl-public/data/flowers/tfrecords/eval.tfrecord-00002-of-00003 [Content-Type=application/octet-stream]...
Copying gs://asl-public/data/flowers/tfrecords/train.tfrecord-00007-of-00010 [Content-Type=application/octet-stream]...
Copying gs://asl-public/data/flowers/tfrecords/train.tfrecord-00003-of-00010 [Content-Type=application/octet-stream]...
Copying gs://asl-public/data/flowers/tfrecords/train.tfrecord-00002-of-00010 [Content-Type=application/octet-stream]...
Copying gs://asl-public/data/flowers/tfrecords/train.tfrecord-00008-of-00010 [Content-Type=application/octet-stream]...
Copying gs://asl-public/data/flowers/tfrecords/train.tfrecord-00000-of-00010 [Content-Type=application/octet-stream]...
Copying gs://asl-public/data/flowers/tfreco

Make sure the data is available.

In [4]:
!gsutil ls {TRAIN_DATA_PATH}

gs://kylesteckler-instructor-flowers/data/train.tfrecord-00000-of-00010
gs://kylesteckler-instructor-flowers/data/train.tfrecord-00001-of-00010
gs://kylesteckler-instructor-flowers/data/train.tfrecord-00002-of-00010
gs://kylesteckler-instructor-flowers/data/train.tfrecord-00003-of-00010
gs://kylesteckler-instructor-flowers/data/train.tfrecord-00004-of-00010
gs://kylesteckler-instructor-flowers/data/train.tfrecord-00005-of-00010
gs://kylesteckler-instructor-flowers/data/train.tfrecord-00006-of-00010
gs://kylesteckler-instructor-flowers/data/train.tfrecord-00007-of-00010
gs://kylesteckler-instructor-flowers/data/train.tfrecord-00008-of-00010
gs://kylesteckler-instructor-flowers/data/train.tfrecord-00009-of-00010


In [5]:
!gsutil ls {EVAL_DATA_PATH}

gs://kylesteckler-instructor-flowers/data/eval.tfrecord-00000-of-00003
gs://kylesteckler-instructor-flowers/data/eval.tfrecord-00001-of-00003
gs://kylesteckler-instructor-flowers/data/eval.tfrecord-00002-of-00003
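The listings above should show 10 training shards and 3 eval shards, each named `<prefix>-NNNNN-of-MMMMM`. Eyeballing works for a dozen files, but the copy log was long and easy to miss errors in. As a minimal sketch, a hypothetical `missing_shards` helper can compare the shard indices present against the `-of-MMMMM` total; it assumes every path belongs to the same sharded dataset.

```python
import re

# Matches the "-NNNNN-of-MMMMM" suffix of sharded file names such as
# "train.tfrecord-00003-of-00010".
SHARD_RE = re.compile(r"-(?P<index>\d{5})-of-(?P<total>\d{5})$")


def missing_shards(paths):
    """Return the sorted shard indices missing from a list of sharded paths.

    Hypothetical helper; assumes all paths belong to one sharded dataset.
    """
    indices = set()
    total = None
    for path in paths:
        match = SHARD_RE.search(path.strip())
        if match is None:
            raise ValueError(f"Not a sharded file name: {path!r}")
        shard_total = int(match.group("total"))
        if total is None:
            total = shard_total
        elif shard_total != total:
            raise ValueError(f"Inconsistent shard totals in {path!r}")
        indices.add(int(match.group("index")))
    if total is None:
        return []
    return sorted(set(range(total)) - indices)


# Example: shard 00001 of the eval set is deliberately left out.
eval_paths = [
    "gs://kylesteckler-instructor-flowers/data/eval.tfrecord-00000-of-00003",
    "gs://kylesteckler-instructor-flowers/data/eval.tfrecord-00002-of-00003",
]
print(missing_shards(eval_paths))  # [1]
```

In the notebook you could feed it the live listing, e.g. `paths = !gsutil ls {TRAIN_DATA_PATH}`, and expect an empty list back.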


### Create Containerized Training Application
* Create a TensorFlow training application for image classification
* Create a Dockerfile
* Build and push the training application

In [6]:
!mkdir -p ./flower_trainer


Create `train.py`, a Python script that builds and trains a TensorFlow/Keras model for image classification on the flowers dataset. This training application uses a [pre-trained TF Hub model](https://tfhub.dev/google/imagenet/mobilenet_v2_100_224/feature_vector/4) as a frozen feature extractor.

In [7]:
%%writefile ./flower_trainer/train.py

import tensorflow as tf 
import fire 
import tensorflow_hub as hub
import os

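# Vertex AI sets AIP_MODEL_DIR (to <base_output_directory>/model) when the
# custom job specifies a base output directory.
AIP_MODEL_DIR = os.environ["AIP_MODEL_DIR"]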

def parse_example(example, img_shape):
    feature_description = {
        "image": tf.io.FixedLenFeature([], tf.string),
        "label": tf.io.FixedLenFeature([], tf.int64)
    }
    example = tf.io.parse_single_example(example, feature_description)
    img = tf.io.decode_jpeg(example["image"], channels=3)
    
    # resize and scale
    return tf.image.resize(img, img_shape[:-1])/255.0, example["label"]

def create_dataset(pattern, batch_size, img_shape, mode='train'):
    AUTOTUNE = tf.data.AUTOTUNE

    filenames = tf.io.gfile.glob(pattern) # List of files matching pattern
    ds = tf.data.TFRecordDataset(filenames)
    ds = ds.map(lambda x: parse_example(x, img_shape=img_shape), num_parallel_calls=AUTOTUNE)
    
    # Configure for performance
    ds = ds.cache()
    if mode=='train':
        ds = ds.shuffle(buffer_size=10*batch_size).repeat()
    else:
        ds = ds.repeat(1)
    
    return ds.batch(batch_size).prefetch(buffer_size=AUTOTUNE)

def build_hub_model(
    input_shape,
    dense_units,
    dropout,
    num_classes,
    module_handle
):
    
    inputs = tf.keras.layers.Input(shape=input_shape) # [height, width, channels]
    
    # Pre-trained model from tfhub 
    # trainable=False means frozen weights 
    hub_layer = hub.KerasLayer(module_handle, trainable=False)(inputs)
    
    # Additional learned dense layer 
    dense_layer = tf.keras.layers.Dense(dense_units, activation="relu")(hub_layer)
    x = tf.keras.layers.Dropout(dropout)(dense_layer)
    
    # Output the logits 
    output = tf.keras.layers.Dense(num_classes)(x)
    
    # Instantiate keras model
    model = tf.keras.Model(inputs=inputs, outputs=output)
    
    return model 
    

def train_evaluate(
    train_data_path,
    eval_data_path,
    batch_size,
    num_steps,
    num_evals,
    img_shape=[224,224,3],
    dense_units=16,
    dropout=0.1,
    num_classes=5,
    module_handle="https://tfhub.dev/google/imagenet/mobilenet_v2_100_224/feature_vector/4",
):
    
    train_ds = create_dataset(train_data_path, batch_size=batch_size, img_shape=img_shape)
    val_ds = create_dataset(eval_data_path, batch_size=64, img_shape=img_shape,mode='test')

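    # num_steps is split across num_evals epochs and also divided by batch_size,
    # so it effectively sets the total number of training examples.
    steps_per_epoch = num_steps // (num_evals * batch_size)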
    
    # Build model 
    model = build_hub_model(
        input_shape=img_shape,
        dense_units=dense_units,
        dropout=dropout,
        num_classes=num_classes,
        module_handle=module_handle
    )
    
    model.compile(
        optimizer='adam',
        loss=tf.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=['accuracy'])

    history = model.fit(
        train_ds,
        validation_data=val_ds,
        epochs=num_evals,
        steps_per_epoch=steps_per_epoch,
        verbose=2
    )
    
    model.save(AIP_MODEL_DIR)

if __name__ == '__main__':
    fire.Fire(train_evaluate)

Overwriting ./flower_trainer/train.py
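As a quick sanity check on the training schedule, the sketch below reproduces the `steps_per_epoch` arithmetic from `train_evaluate()` in plain Python, plugging in the defaults that `pipeline.py` passes later in this lab (`NUM_STEPS=10000`, `NUM_EVALS=10`, `BATCH_SIZE=64`). `training_schedule` is a hypothetical helper, not part of `train.py`.

```python
def training_schedule(num_steps, num_evals, batch_size):
    """Mirror the schedule arithmetic in train.py's train_evaluate().

    train.py computes steps_per_epoch = num_steps // (num_evals * batch_size)
    and calls model.fit(epochs=num_evals, steps_per_epoch=steps_per_epoch).
    """
    steps_per_epoch = num_steps // (num_evals * batch_size)
    total_batches = steps_per_epoch * num_evals
    total_examples = total_batches * batch_size
    return steps_per_epoch, total_batches, total_examples


print(training_schedule(num_steps=10000, num_evals=10, batch_size=64))  # (15, 150, 9600)
```

With those defaults each of the 10 epochs runs 15 batches, for 9,600 training examples in total. If `num_steps` is smaller than `num_evals * batch_size`, `steps_per_epoch` becomes 0, which is almost certainly not what you want.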


Create a Dockerfile, then build and push your training application.

In [8]:
%%writefile ./flower_trainer/Dockerfile
FROM gcr.io/deeplearning-platform-release/tf2-gpu.2-7

# Installs fire, which train.py uses to parse command-line flags
RUN pip install -U fire

# Copies the trainer code to the docker image.
WORKDIR /app
COPY train.py .

# Sets up the entry point to invoke the trainer.
ENTRYPOINT ["python", "train.py"]

Overwriting ./flower_trainer/Dockerfile
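The `ENTRYPOINT` runs `python train.py`, and Vertex AI passes the container `args` from the pipeline's `worker_pool_specs` (defined later in this lab) to that entrypoint. `fire.Fire(train_evaluate)` then maps each `--name=value` flag onto the `train_evaluate()` parameter of the same name, inferring the value's type from its text. As a rough stdlib-only illustration, the sketch below uses `argparse` as a stand-in for fire (this is not how fire works internally, and the types are declared explicitly here); `parse_trainer_args` and the `gs://my-bucket` paths are hypothetical.

```python
import argparse


def parse_trainer_args(argv):
    """Approximate how fire maps CLI flags onto train_evaluate()'s parameters.

    argparse stands in for fire here; flag names and defaults match train.py.
    """
    parser = argparse.ArgumentParser(description="flower trainer (sketch)")
    parser.add_argument("--train_data_path", required=True)
    parser.add_argument("--eval_data_path", required=True)
    parser.add_argument("--batch_size", type=int, required=True)
    parser.add_argument("--num_steps", type=int, required=True)
    parser.add_argument("--num_evals", type=int, required=True)
    # Optional hyperparameters with the same defaults as train_evaluate().
    parser.add_argument("--dense_units", type=int, default=16)
    parser.add_argument("--dropout", type=float, default=0.1)
    parser.add_argument("--num_classes", type=int, default=5)
    return vars(parser.parse_args(argv))


# The same flag format the pipeline passes through worker_pool_specs "args".
args = parse_trainer_args([
    "--train_data_path=gs://my-bucket/data/train*",
    "--eval_data_path=gs://my-bucket/data/eval*",
    "--num_steps=10000",
    "--num_evals=10",
    "--batch_size=64",
])
print(args["batch_size"], args["dropout"])  # 64 0.1
```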


In [9]:
IMAGE_NAME = "flowers_trainer_tf"
TAG = "latest"
TRAINING_CONTAINER_IMAGE_URI = f"gcr.io/{PROJECT}/{IMAGE_NAME}:{TAG}"
TRAINING_CONTAINER_IMAGE_URI

'gcr.io/kylesteckler-instructor/flowers_trainer_tf:latest'

In [None]:
!gcloud builds submit --machine-type=e2-highcpu-32 --timeout=15m --tag $TRAINING_CONTAINER_IMAGE_URI flower_trainer

### Kubeflow Pipeline
Create a Kubeflow pipeline that uses Vertex AI services to train and deploy the custom image classification model. Use the pre-built Google Cloud Pipeline Components for Vertex AI custom training, model upload, endpoint creation, and model deployment.

In [10]:
!mkdir -p ./image_classification_pipeline


In [11]:
%%writefile ./image_classification_pipeline/pipeline.py

import os

from google_cloud_pipeline_components.aiplatform import (
    EndpointCreateOp,
    ModelDeployOp,
    ModelUploadOp,
)

from google_cloud_pipeline_components.experimental.custom_job import (
    CustomTrainingJobOp,
)

from kfp.v2 import dsl

PIPELINE_ROOT = os.getenv("PIPELINE_ROOT")
PROJECT = os.getenv("PROJECT")
REGION = os.getenv("REGION")

TRAINING_CONTAINER_IMAGE_URI = os.getenv("TRAINING_CONTAINER_IMAGE_URI")
SERVING_CONTAINER_IMAGE_URI = os.getenv("SERVING_CONTAINER_IMAGE_URI")
SERVING_MACHINE_TYPE = os.getenv("SERVING_MACHINE_TYPE", "n1-standard-16")

TRAIN_DATA_PATH = os.getenv("TRAIN_DATA_PATH")
EVAL_DATA_PATH = os.getenv("EVAL_DATA_PATH")

BATCH_SIZE = int(os.getenv("BATCH_SIZE", "64"))
NUM_STEPS = int(os.getenv("NUM_STEPS", "10000"))
NUM_EVALS = int(os.getenv("NUM_EVALS", "10"))

PIPELINE_NAME = os.getenv("PIPELINE_NAME", "flower-classification")
BASE_OUTPUT_DIR = os.getenv("BASE_OUTPUT_DIR", PIPELINE_ROOT)
MODEL_DISPLAY_NAME = os.getenv("MODEL_DISPLAY_NAME", "flower_classifier")

@dsl.pipeline(
    name=PIPELINE_NAME,
    description="Kubeflow pipeline that trains and deploys custom image classifier on Vertex AI",
    pipeline_root=PIPELINE_ROOT,
)
def create_pipeline():
    worker_pool_specs = [
        {
            "machine_spec": {
                "machine_type": "n1-standard-16",
                "accelerator_type": "NVIDIA_TESLA_T4",
                "accelerator_count": 1,
            },
            "replica_count": 1,
            "container_spec": {
                "image_uri": TRAINING_CONTAINER_IMAGE_URI,
                "args": [
                    f"--train_data_path={TRAIN_DATA_PATH}",
                    f"--eval_data_path={EVAL_DATA_PATH}",
                    f"--num_steps={NUM_STEPS}",
                    f"--num_evals={NUM_EVALS}",
                    f"--batch_size={BATCH_SIZE}"
                ],
            },
        }
    ]
    
    training_task = CustomTrainingJobOp(
        project=PROJECT,
        location=REGION,
        display_name=f"{PIPELINE_NAME}-kfp-training-job",
        worker_pool_specs=worker_pool_specs,
        # train.py saves to AIP_MODEL_DIR, which Vertex AI sets to
        # <base_output_directory>/model.
        base_output_directory=BASE_OUTPUT_DIR,
    )
    
    model_upload_task = ModelUploadOp(
        project=PROJECT,
        location=REGION,
        display_name=f"{PIPELINE_NAME}-kfp-model-upload-job",
        artifact_uri=f"{BASE_OUTPUT_DIR}/model",
        serving_container_image_uri=SERVING_CONTAINER_IMAGE_URI,
    )
    model_upload_task.after(training_task)
    
    endpoint_create_task = EndpointCreateOp(
        project=PROJECT,
        location=REGION,
        display_name=f"{PIPELINE_NAME}-kfp-create-endpoint-job",
    )
    endpoint_create_task.after(model_upload_task)
    
    model_deploy_op = ModelDeployOp(  # pylint: disable=unused-variable
        model=model_upload_task.outputs["model"],
        endpoint=endpoint_create_task.outputs["endpoint"],
        deployed_model_display_name=MODEL_DISPLAY_NAME,
        dedicated_resources_machine_type=SERVING_MACHINE_TYPE,
        dedicated_resources_min_replica_count=1,
        dedicated_resources_max_replica_count=1,
    )

Writing ./image_classification_pipeline/pipeline.py
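The pipeline has four tasks. `ModelUploadOp` and `EndpointCreateOp` take no outputs from the tasks before them, so the `.after()` calls are what order them, while `ModelDeployOp` depends on the upload and endpoint tasks through their `model` and `endpoint` outputs. The stdlib sketch below (Python 3.9+ `graphlib`) encodes those dependencies and prints the resulting execution order; the task names are illustrative labels, not necessarily the names KFP assigns.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks it must wait for, as wired in pipeline.py.
dependencies = {
    "custom-training-job": set(),
    "model-upload": {"custom-training-job"},              # .after(training_task)
    "endpoint-create": {"model-upload"},                  # .after(model_upload_task)
    "model-deploy": {"model-upload", "endpoint-create"},  # consumes both outputs
}

order = list(TopologicalSorter(dependencies).static_order())
print(order)
# ['custom-training-job', 'model-upload', 'endpoint-create', 'model-deploy']
```

Because the dependencies form a single chain, this order is the only valid one. Removing `endpoint_create_task.after(model_upload_task)` would let endpoint creation start immediately, in parallel with training.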


### Compile the pipeline
Before compiling the pipeline, make sure that the `ARTIFACT_STORE` bucket exists, and create it if it does not.

In [12]:
!gsutil ls | grep ^{ARTIFACT_STORE}/$ || gsutil mb -l {REGION} {ARTIFACT_STORE}

gs://kylesteckler-instructor-flowers-kfp-artifact-store/
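The one-liner above relies on `grep ^{ARTIFACT_STORE}/$` matching an entire line of `gsutil ls` output, so a bucket whose name merely shares a prefix does not count. The same whole-line check in plain Python is sketched below; `bucket_exists` and the `gs://my-project-*` bucket names are hypothetical. Unlike the grep pattern, the string comparison treats dots in bucket names literally.

```python
def bucket_exists(ls_output, bucket_uri):
    """Return True if bucket_uri appears as a whole line in `gsutil ls` output."""
    target = bucket_uri.rstrip("/") + "/"  # gsutil ls lists buckets as gs://name/
    return any(line.strip() == target for line in ls_output.splitlines())


listing = "gs://my-project-flowers/\ngs://my-project-flowers-kfp-artifact-store/\n"
print(bucket_exists(listing, "gs://my-project-flowers"))  # True
print(bucket_exists(listing, "gs://my-project"))          # False
```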


Define the environment variables that `pipeline.py` reads when the pipeline is compiled.

In [13]:
PIPELINE_ROOT = f"{ARTIFACT_STORE}/pipeline"
SERVING_CONTAINER_IMAGE_URI = (
    "us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-7:latest"
)

%env PIPELINE_ROOT={PIPELINE_ROOT}
%env SERVING_CONTAINER_IMAGE_URI={SERVING_CONTAINER_IMAGE_URI}
%env TRAIN_DATA_PATH={TRAIN_DATA_PATH}
%env EVAL_DATA_PATH={EVAL_DATA_PATH}
%env TRAINING_CONTAINER_IMAGE_URI={TRAINING_CONTAINER_IMAGE_URI}

env: PIPELINE_ROOT=gs://kylesteckler-instructor-flowers-kfp-artifact-store/pipeline
env: SERVING_CONTAINER_IMAGE_URI=us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-7:latest
env: TRAIN_DATA_PATH=gs://kylesteckler-instructor-flowers/data/train*
env: EVAL_DATA_PATH=gs://kylesteckler-instructor-flowers/data/eval*
env: TRAINING_CONTAINER_IMAGE_URI=gcr.io/kylesteckler-instructor/flowers_trainer_tf:latest
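`pipeline.py` reads these values with `os.getenv` when the compiler loads the file, so they are baked into the compiled JSON. If a variable without a default is missing, `os.getenv` returns `None` and an f-string such as `f"--train_data_path={TRAIN_DATA_PATH}"` silently becomes `--train_data_path=None`. A minimal sketch of a fail-fast alternative follows; `require_env` and the `FLOWERS_DEMO_PATH` variable are hypothetical, not part of the lab.

```python
import os


def require_env(name):
    """Return environment variable `name`, raising if it is unset or empty.

    Hypothetical helper: pipeline.py as written uses plain os.getenv instead.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name} must be set before compiling the pipeline"
        )
    return value


# The failure mode it guards against: an unset variable formats as "None".
os.environ.pop("FLOWERS_DEMO_PATH", None)  # hypothetical variable, for illustration
print(f"--train_data_path={os.getenv('FLOWERS_DEMO_PATH')}")  # --train_data_path=None

os.environ["FLOWERS_DEMO_PATH"] = "gs://my-bucket/data/train*"
print(require_env("FLOWERS_DEMO_PATH"))  # gs://my-bucket/data/train*
```

Inside `pipeline.py` you would then write, for example, `TRAIN_DATA_PATH = require_env("TRAIN_DATA_PATH")`, so compilation stops with a clear error instead of producing a broken pipeline.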


Compile the pipeline defined in `image_classification_pipeline/pipeline.py` into a JSON pipeline specification with `dsl-compile-v2`.

In [14]:
PIPELINE_JSON = "flower_classification_kfp_pipeline.json"

In [15]:
!dsl-compile-v2 --py image_classification_pipeline/pipeline.py --output $PIPELINE_JSON



The result is the compiled pipeline file:

In [16]:
!head {PIPELINE_JSON}

{
  "pipelineSpec": {
    "components": {
      "comp-custom-training-job": {
        "executorLabel": "exec-custom-training-job",
        "inputDefinitions": {
          "parameters": {
            "base_output_directory": {
              "type": "STRING"
            },
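`head` shows only the first few lines of the spec. To list every component in the compiled file, you could load it with the standard `json` module and read the keys under `pipelineSpec.components`, the layout visible in the output above. This is a sketch: `list_components` and `list_components_in_file` are hypothetical helpers, and the sample dictionary is a hand-built stand-in containing only the one component name shown above.

```python
import json


def list_components(spec):
    """Return the component names defined in a compiled pipeline spec dict."""
    return sorted(spec.get("pipelineSpec", {}).get("components", {}))


def list_components_in_file(path):
    """Load a compiled pipeline JSON file and list its components."""
    with open(path) as f:
        return list_components(json.load(f))


# Hand-built stand-in mirroring the structure printed by `head` above.
sample = {
    "pipelineSpec": {
        "components": {
            "comp-custom-training-job": {"executorLabel": "exec-custom-training-job"},
        }
    }
}
print(list_components(sample))  # ['comp-custom-training-job']
```

In the notebook, `list_components_in_file(PIPELINE_JSON)` would list all components in the file you just compiled.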


### Run the pipeline on Vertex AI Pipelines

In [17]:
from google.cloud import aiplatform

aiplatform.init(project=PROJECT, location=REGION)

pipeline = aiplatform.PipelineJob(
    display_name="flower-kfp-pipeline",
    template_path=PIPELINE_JSON,
    enable_caching=False,
)

pipeline.run()

Creating PipelineJob
PipelineJob created. Resource name: projects/335831560329/locations/us-central1/pipelineJobs/flower-classification-20221213182108
To use this PipelineJob in another session:
pipeline_job = aiplatform.PipelineJob.get('projects/335831560329/locations/us-central1/pipelineJobs/flower-classification-20221213182108')
View Pipeline Job:
https://console.cloud.google.com/vertex-ai/locations/us-central1/pipelines/runs/flower-classification-20221213182108?project=335831560329
PipelineJob projects/335831560329/locations/us-central1/pipelineJobs/flower-classification-20221213182108 current state:
PipelineState.PIPELINE_STATE_RUNNING
PipelineJob projects/335831560329/locations/us-central1/pipelineJobs/flower-classification-20221213182108 current state:
PipelineState.PIPELINE_STATE_RUNNING
PipelineJob projects/335831560329/locations/us-central1/pipelineJobs/flower-classification-20221213182108 current state:
PipelineState.PIPELINE_STATE_RUNNING
PipelineJob projects/335831560329/l
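The resource name printed above follows the pattern `projects/{project}/locations/{location}/pipelineJobs/{job_id}`, where the project segment is the numeric project number, and the log shows that `aiplatform.PipelineJob.get(...)` accepts it directly. If you need the individual parts, for example to rebuild the console link, a small parser such as the hypothetical `parse_pipeline_job_name` below is enough; the URL format is copied from the log output above.

```python
import re

JOB_NAME_RE = re.compile(
    r"^projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)"
    r"/pipelineJobs/(?P<job_id>[^/]+)$"
)


def parse_pipeline_job_name(resource_name):
    """Split a PipelineJob resource name into project, location, and job_id."""
    match = JOB_NAME_RE.match(resource_name)
    if match is None:
        raise ValueError(f"Unexpected resource name: {resource_name!r}")
    return match.groupdict()


def console_url(resource_name):
    """Rebuild the console link in the same format the SDK printed above."""
    parts = parse_pipeline_job_name(resource_name)
    return (
        "https://console.cloud.google.com/vertex-ai/locations/"
        f"{parts['location']}/pipelines/runs/{parts['job_id']}"
        f"?project={parts['project']}"
    )


name = "projects/335831560329/locations/us-central1/pipelineJobs/flower-classification-20221213182108"
print(parse_pipeline_job_name(name)["job_id"])  # flower-classification-20221213182108
print(console_url(name))
```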

Feel free to go to [Vertex AI Pipelines](https://console.cloud.google.com/vertex-ai/pipelines) to watch the execution of your pipeline.

Congrats! You have successfully built and deployed a Kubeflow pipeline that orchestrates Vertex AI custom training, model upload, endpoint creation, and model deployment.