In [None]:
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# E2E ML on GCP: MLOps stage 3 : formalization: get started with Vertex AI custom training pipeline components

<table align="left">
  <td>
    <a href="https://colab.research.google.com/github/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/community/ml_ops/stage3/get_started_with_custom_training_pipeline_components.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Colab logo"> Run in Colab
    </a>
  </td>  
  <td>
    <a href="https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/community/ml_ops/stage3/get_started_with_custom_training_pipeline_components.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo">
      View on GitHub
    </a>
  </td>
  <td>
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/main/notebooks/community/ml_ops/stage3/get_started_with_custom_training_pipeline_components.ipynb">
      <img src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo">
      Open in Vertex AI Workbench
    </a>
  </td>
</table>
<br/><br/><br/>

## Overview


This tutorial demonstrates how to use Vertex AI for end-to-end (E2E) MLOps on Google Cloud in production. It covers stage 3, formalization: getting started with custom training pipeline components.

### Objective

In this tutorial, you learn how to use prebuilt `Google Cloud Pipeline Components` for `Vertex AI Training`.

This tutorial uses the following Google Cloud ML services:

- `Vertex AI Pipelines`
- `Vertex AI Training`
- `Google Cloud Pipeline Components`
- `Vertex AI Dataset, Model and Endpoint` resources
- `Vertex AI Prediction`

The steps performed include:

- Construct a pipeline for:
    - Training a Vertex AI custom trained model.
    - Testing the serving binary with a batch prediction job.
    - Deploying a Vertex AI custom trained model.
- Execute a Vertex AI pipeline.
- Construct a pipeline for:
    - Constructing a custom training component.
    - Converting the custom training component to `CustomTrainingJobOp`.
    - Training a Vertex AI custom trained model using the converted component.
    - Deploying a Vertex AI custom trained model.
- Execute a Vertex AI pipeline.

### Dataset

The dataset used for this tutorial is the [Flowers dataset](https://www.tensorflow.org/datasets/catalog/tf_flowers) from [TensorFlow Datasets](https://www.tensorflow.org/datasets/catalog/overview). The version of the dataset in this tutorial is stored in a public Cloud Storage bucket. The trained model predicts which of five flower types an image shows: daisy, dandelion, rose, sunflower, or tulip.

### Costs 


This tutorial uses billable components of Google Cloud:

* Vertex AI
* Cloud Storage


Learn about [Vertex AI
pricing](https://cloud.google.com/vertex-ai/pricing) and [Cloud Storage
pricing](https://cloud.google.com/storage/pricing), and use the [Pricing
Calculator](https://cloud.google.com/products/calculator/)
to generate a cost estimate based on your projected usage.

## Installations

Install the required packages for executing the notebook.

In [None]:
import os

# The Vertex AI Workbench Notebook product has specific requirements
IS_WORKBENCH_NOTEBOOK = os.getenv("DL_ANACONDA_HOME") and not os.getenv("VIRTUAL_ENV")
IS_USER_MANAGED_WORKBENCH_NOTEBOOK = os.path.exists(
    "/opt/deeplearning/metadata/env_version"
)

# Vertex AI Notebook requires dependencies to be installed with '--user'
USER_FLAG = ""
if IS_WORKBENCH_NOTEBOOK:
    USER_FLAG = "--user"

! pip3 install -U tensorflow $USER_FLAG -q
! pip3 install --upgrade google-cloud-aiplatform[tensorboard] $USER_FLAG -q
! pip3 install --upgrade google-cloud-pipeline-components $USER_FLAG -q
! pip3 install --upgrade kfp $USER_FLAG -q

### Restart the kernel

Once you've installed the additional packages, you need to restart the notebook kernel so it can find the packages.

In [None]:
import os

if not os.getenv("IS_TESTING"):
    # Automatically restart kernel after installs
    import IPython

    app = IPython.Application.instance()
    app.kernel.do_shutdown(True)

## Before you begin

### Set up your Google Cloud project

**The following steps are required, regardless of your notebook environment.**

1. [Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.

1. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).

1. [Enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com,dataflow.googleapis.com).

1. If you are running this notebook locally, you need to install the [Cloud SDK](https://cloud.google.com/sdk).

1. Enter your project ID in the cell below. Then run the cell to make sure the
Cloud SDK uses the right project for all the commands in this notebook.

**Note**: Jupyter runs lines prefixed with `!` as shell commands, and it interpolates Python variables prefixed with `$` into these commands.

#### Set your project ID

**If you don't know your project ID**, you may be able to get your project ID using `gcloud`.

In [None]:
PROJECT_ID = "[your-project-id]"  # @param {type:"string"}

In [None]:
if PROJECT_ID == "" or PROJECT_ID is None or PROJECT_ID == "[your-project-id]":
    # Get your GCP project id from gcloud
    shell_output = ! gcloud config list --format 'value(core.project)' 2>/dev/null
    PROJECT_ID = shell_output[0]
    print("Project ID:", PROJECT_ID)

In [None]:
! gcloud config set project $PROJECT_ID

#### Region

You can also change the `REGION` variable, which is used for operations
throughout the rest of this notebook.  Below are regions supported for Vertex AI. We recommend that you choose the region closest to you.

- Americas: `us-central1`
- Europe: `europe-west4`
- Asia Pacific: `asia-east1`

You may not use a multi-regional bucket for training with Vertex AI. Not all regions provide support for all Vertex AI services.

Learn more about [Vertex AI regions](https://cloud.google.com/vertex-ai/docs/general/locations).

In [None]:
REGION = "[your-region]"  # @param {type:"string"}
if REGION == "[your-region]":
    REGION = "us-central1"

#### Timestamp

If you are in a live tutorial session, you might be using a shared test account or project. To avoid name collisions between users on resources created, you create a timestamp for each instance session, and append the timestamp onto the name of resources you create in this tutorial.

In [None]:
from datetime import datetime

TIMESTAMP = datetime.now().strftime("%Y%m%d%H%M%S")

### Authenticate your Google Cloud account


**If you are using Vertex AI Workbench Notebooks**, your environment is already
authenticated. Skip this step.

**If you are using Colab**, run the cell below and follow the instructions when prompted to authenticate your account via OAuth.

**Otherwise**, follow these steps:

1. In the Cloud Console, go to the [**Create service account key**
   page](https://console.cloud.google.com/apis/credentials/serviceaccountkey).

2. Click **Create service account**.

3. In the **Service account name** field, enter a name, and
   click **Create**.

4. In the **Grant this service account access to project** section, click the **Role** drop-down list. Type "Vertex AI"
into the filter box, and select
   **Vertex AI Administrator**. Type "Storage Object Admin" into the filter box, and select **Storage Object Admin**.

5. Click **Create**. A JSON file that contains your key downloads to your
local environment.

6. Enter the path to your service account key as the
`GOOGLE_APPLICATION_CREDENTIALS` variable in the cell below and run the cell.

In [None]:
# If you are running this notebook in Colab, run this cell and follow the
# instructions to authenticate your GCP account. This provides access to your
# Cloud Storage bucket and lets you submit training jobs and prediction
# requests.

import os
import sys

# If on Vertex AI Workbench, then don't execute this code
IS_COLAB = False
if not os.path.exists("/opt/deeplearning/metadata/env_version") and not os.getenv(
    "DL_ANACONDA_HOME"
):
    if "google.colab" in sys.modules:
        IS_COLAB = True
        from google.colab import auth as google_auth

        google_auth.authenticate_user()

    # If you are running this notebook locally, replace the string below with the
    # path to your service account key and run this cell to authenticate your GCP
    # account.
    elif not os.getenv("IS_TESTING"):
        %env GOOGLE_APPLICATION_CREDENTIALS ''

### Create a Cloud Storage bucket

**The following steps are required, regardless of your notebook environment.**

When you initialize the Vertex SDK for Python, you specify a Cloud Storage staging bucket. The staging bucket is where all the data associated with your dataset and model resources are retained across sessions.

Set the name of your Cloud Storage bucket below. Bucket names must be globally unique across all Google Cloud projects, including those outside of your organization.
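Before creating the bucket, it can help to check the name locally. The sketch below is a simplified check, assuming the commonly documented naming rules (3 to 63 characters; lowercase letters, digits, dashes, underscores, and dots; starting and ending with a letter or digit). The helper name `looks_like_valid_bucket_name` is hypothetical, and the check does not cover every rule or verify global uniqueness.

```python
import re

# Simplified pattern: first and last characters must be a lowercase letter or
# digit; the characters in between may also be dots, underscores, or dashes.
# Total length is 3 to 63 characters. This is an assumption-based sketch and
# does not cover every documented rule.
_BUCKET_NAME_RE = re.compile(r"^[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]$")


def looks_like_valid_bucket_name(name: str) -> bool:
    """Return True if `name` passes the simplified naming check."""
    return bool(_BUCKET_NAME_RE.match(name))


print(looks_like_valid_bucket_name("my-project-aip-20220101120000"))
print(looks_like_valid_bucket_name("[your-bucket-name]"))
```

The placeholder value `[your-bucket-name]` fails the check, which is one way to catch a cell that was run without being edited.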

In [None]:
BUCKET_NAME = "[your-bucket-name]"  # @param {type:"string"}
BUCKET_URI = f"gs://{BUCKET_NAME}"

In [None]:
if BUCKET_URI == "" or BUCKET_URI is None or BUCKET_URI == "gs://[your-bucket-name]":
    BUCKET_URI = "gs://" + PROJECT_ID + "-aip-" + TIMESTAMP

**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket.

In [None]:
! gsutil mb -l $REGION $BUCKET_URI

Finally, validate access to your Cloud Storage bucket by examining its contents:

In [None]:
! gsutil ls -al $BUCKET_URI

#### Service Account

**If you don't know your service account**, try to get it with the `gcloud` command by executing the second cell below.

In [None]:
SERVICE_ACCOUNT = "[your-service-account]"  # @param {type:"string"}

In [None]:
if (
    SERVICE_ACCOUNT == ""
    or SERVICE_ACCOUNT is None
    or SERVICE_ACCOUNT == "[your-service-account]"
):
    # Get your service account from gcloud
    if not IS_COLAB:
        shell_output = !gcloud auth list 2>/dev/null
        SERVICE_ACCOUNT = shell_output[2].replace("*", "").strip()

    if IS_COLAB:
        shell_output = ! gcloud projects describe  $PROJECT_ID
        # print("shell_output=", shell_output)
        project_number = shell_output[-1].split(":")[1].strip().replace("'", "")
        SERVICE_ACCOUNT = f"{project_number}-compute@developer.gserviceaccount.com"

    print("Service Account:", SERVICE_ACCOUNT)

#### Set service account access for Vertex AI Pipelines

Run the following commands to grant your service account access to read and write pipeline artifacts in the bucket that you created in the previous step -- you only need to run these once per service account.

In [None]:
! gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectCreator $BUCKET_URI

! gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectViewer $BUCKET_URI

### Set up variables

Next, set up some variables used throughout the tutorial.

### Import libraries and define constants

In [None]:
import google.cloud.aiplatform as aip

#### Import TensorFlow

Import the TensorFlow package into your Python environment.

In [None]:
import tensorflow as tf

In [None]:
import json

from kfp import dsl
from kfp.v2 import compiler
from kfp.v2.dsl import component

### Initialize Vertex AI SDK for Python

Initialize the Vertex AI SDK for Python for your project and corresponding bucket.

In [None]:
aip.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)

#### Set hardware accelerators

You can set hardware accelerators for training and prediction.

Set the variables `TRAIN_GPU/TRAIN_NGPU` and `DEPLOY_GPU/DEPLOY_NGPU` to use a container image supporting a GPU and the number of GPUs allocated to the virtual machine (VM) instance. For example, to use a GPU container image with 4 Nvidia Tesla K80 GPUs allocated to each VM, you would specify:

    (aip.gapic.AcceleratorType.NVIDIA_TESLA_K80, 4)


Otherwise specify `(None, None)` to use a container image to run on a CPU.

Learn more about [hardware accelerator support for your region](https://cloud.google.com/vertex-ai/docs/general/locations#accelerators).

*Note*: TF releases before 2.3 with GPU support fail to load the custom model in this tutorial. This is a known issue, caused by static graph ops that are generated in the serving function, and it is fixed in TF 2.3. If you encounter this issue with your own custom models, use a container image for TF 2.3 with GPU support.

In [None]:
if os.getenv("IS_TESTING_TRAIN_GPU"):
    TRAIN_GPU, TRAIN_NGPU = (
        aip.gapic.AcceleratorType.NVIDIA_TESLA_K80,
        int(os.getenv("IS_TESTING_TRAIN_GPU")),
    )
else:
    TRAIN_GPU, TRAIN_NGPU = (None, None)

if os.getenv("IS_TESTING_DEPLOY_GPU"):
    DEPLOY_GPU, DEPLOY_NGPU = (
        aip.gapic.AcceleratorType.NVIDIA_TESLA_K80,
        int(os.getenv("IS_TESTING_DEPLOY_GPU")),
    )
else:
    DEPLOY_GPU, DEPLOY_NGPU = (None, None)

#### Set pre-built containers

Set the pre-built Docker container image for training and prediction.


For the latest list, see [Pre-built containers for training](https://cloud.google.com/ai-platform-unified/docs/training/pre-built-containers).


For the latest list, see [Pre-built containers for prediction](https://cloud.google.com/ai-platform-unified/docs/predictions/pre-built-containers).
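The image URIs in the next cell follow a fixed pattern: a multi-region prefix taken from the region name, the image kind (`training` or `prediction`), and a version tag. The sketch below mirrors that string formatting in a standalone helper; the function name `prebuilt_image_uri` is hypothetical and exists only for illustration.

```python
def prebuilt_image_uri(region: str, kind: str, version: str) -> str:
    """Compose a pre-built container URI using the pattern from this tutorial.

    `kind` is "training" or "prediction"; `version` is a tag such as
    "tf-cpu.2-5". The multi-region prefix is the part of the region name
    before the first dash (for example "us" for "us-central1").
    """
    multi_region = region.split("-")[0]
    return f"{multi_region}-docker.pkg.dev/vertex-ai/{kind}/{version}:latest"


print(prebuilt_image_uri("us-central1", "training", "tf-cpu.2-5"))
# us-docker.pkg.dev/vertex-ai/training/tf-cpu.2-5:latest
print(prebuilt_image_uri("europe-west4", "prediction", "tf2-cpu.2-5"))
# europe-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-5:latest
```

Whether a given tag actually exists in the registry is not checked here; consult the lists linked above for the available images.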

In [None]:
if os.getenv("IS_TESTING_TF"):
    TF = os.getenv("IS_TESTING_TF")
else:
    TF = "2.5".replace(".", "-")

if TF[0] == "2":
    if TRAIN_GPU:
        TRAIN_VERSION = "tf-gpu.{}".format(TF)
    else:
        TRAIN_VERSION = "tf-cpu.{}".format(TF)
    if DEPLOY_GPU:
        DEPLOY_VERSION = "tf2-gpu.{}".format(TF)
    else:
        DEPLOY_VERSION = "tf2-cpu.{}".format(TF)
else:
    if TRAIN_GPU:
        TRAIN_VERSION = "tf-gpu.{}".format(TF)
    else:
        TRAIN_VERSION = "tf-cpu.{}".format(TF)
    if DEPLOY_GPU:
        DEPLOY_VERSION = "tf-gpu.{}".format(TF)
    else:
        DEPLOY_VERSION = "tf-cpu.{}".format(TF)

TRAIN_IMAGE = "{}-docker.pkg.dev/vertex-ai/training/{}:latest".format(
    REGION.split("-")[0], TRAIN_VERSION
)
DEPLOY_IMAGE = "{}-docker.pkg.dev/vertex-ai/prediction/{}:latest".format(
    REGION.split("-")[0], DEPLOY_VERSION
)

print("Training:", TRAIN_IMAGE, TRAIN_GPU, TRAIN_NGPU)
print("Deployment:", DEPLOY_IMAGE, DEPLOY_GPU, DEPLOY_NGPU)

#### Set machine type

Next, set the machine type to use for training and prediction.

- Set the variables `TRAIN_COMPUTE` and `DEPLOY_COMPUTE` to configure the compute resources for the VMs you will use for training and prediction.
 - `machine type`
     - `n1-standard`: 3.75 GB of memory per vCPU
     - `n1-highmem`: 6.5 GB of memory per vCPU
     - `n1-highcpu`: 0.9 GB of memory per vCPU
 - `vCPUs`: number of \[2, 4, 8, 16, 32, 64, 96 \]

*Note: The following is not supported for training:*

 - `standard`: 2 vCPUs
 - `highcpu`: 2, 4 and 8 vCPUs

*Note: You may also use n2 and e2 machine types for training and deployment, but they do not support GPUs*.
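As a quick worked example of the figures above, the sketch below builds a machine type string and estimates its total memory from the per-vCPU values listed. The helper `describe_machine` and the lookup table are hypothetical names introduced only for this illustration.

```python
from typing import Tuple

# Memory per vCPU in GB for the N1 families listed in this section.
MEMORY_PER_VCPU_GB = {"n1-standard": 3.75, "n1-highmem": 6.5, "n1-highcpu": 0.9}


def describe_machine(family: str, vcpus: int) -> Tuple[str, float]:
    """Return the machine type name and its approximate total memory in GB."""
    machine_type = f"{family}-{vcpus}"
    memory_gb = MEMORY_PER_VCPU_GB[family] * vcpus
    return machine_type, memory_gb


print(describe_machine("n1-standard", 4))  # ('n1-standard-4', 15.0)
print(describe_machine("n1-highmem", 8))   # ('n1-highmem-8', 52.0)
```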

In [None]:
if os.getenv("IS_TESTING_TRAIN_MACHINE"):
    MACHINE_TYPE = os.getenv("IS_TESTING_TRAIN_MACHINE")
else:
    MACHINE_TYPE = "n1-standard"

VCPU = "4"
TRAIN_COMPUTE = MACHINE_TYPE + "-" + VCPU
print("Train machine type", TRAIN_COMPUTE)

if os.getenv("IS_TESTING_DEPLOY_MACHINE"):
    MACHINE_TYPE = os.getenv("IS_TESTING_DEPLOY_MACHINE")
else:
    MACHINE_TYPE = "n1-standard"

VCPU = "4"
DEPLOY_COMPUTE = MACHINE_TYPE + "-" + VCPU
print("Deploy machine type", DEPLOY_COMPUTE)

#### Location of Cloud Storage training data.

Now set the variable `IMPORT_FILE` to the location of the CSV index file in Cloud Storage.

In [None]:
IMPORT_FILE = (
    "gs://cloud-samples-data/vision/automl_classification/flowers/all_data_v2.csv"
)

### Examine the training package

#### Package layout

Before you start the training, you will look at how a Python package is assembled for a custom training job. When unarchived, the package contains the following directory/file layout.

- PKG-INFO
- README.md
- setup.cfg
- setup.py
- trainer
  - \_\_init\_\_.py
  - task.py

The files `setup.cfg` and `setup.py` are the instructions for installing the package into the operating environment of the Docker image.

The file `trainer/task.py` is the Python script for executing the custom training job. *Note*: when you refer to it in the worker pool specification, replace the directory slash with a dot (`trainer.task`) and drop the file suffix (`.py`).
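The path-to-module conversion described above can be sketched in a few lines. The helper name `module_name_from_path` is hypothetical; it only illustrates the naming rule and is not part of any SDK.

```python
from pathlib import PurePosixPath


def module_name_from_path(script_path: str) -> str:
    """Convert a package-relative script path into a dotted module name."""
    path = PurePosixPath(script_path)
    # Drop the ".py" suffix, then join the path components with dots.
    return ".".join(path.with_suffix("").parts)


print(module_name_from_path("trainer/task.py"))  # trainer.task
```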

#### Package Assembly

In the following cells, you will assemble the training package.

In [None]:
# Make folder for Python training script
! rm -rf custom
! mkdir custom

# Add package information
! touch custom/README.md

setup_cfg = "[egg_info]\n\ntag_build =\n\ntag_date = 0"
! echo "$setup_cfg" > custom/setup.cfg

setup_py = "import setuptools\n\nsetuptools.setup(\n\n    install_requires=[\n\n        'tensorflow==2.5.0',\n\n        'tensorflow_datasets==1.3.0',\n\n    ],\n\n    packages=setuptools.find_packages())"
! echo "$setup_py" > custom/setup.py

pkg_info = "Metadata-Version: 1.0\n\nName: Flowers image classification\n\nVersion: 0.0.0\n\nSummary: Demonstration training script\n\nHome-page: www.google.com\n\nAuthor: Google\n\nAuthor-email: aferlitsch@google.com\n\nLicense: Public\n\nDescription: Demo\n\nPlatform: Vertex"
! echo "$pkg_info" > custom/PKG-INFO

# Make the training subfolder
! mkdir custom/trainer
! touch custom/trainer/__init__.py

### Create the task script for the Python training package

Next, you create the `task.py` script for driving the training package. Some notable steps include:

- Command-line arguments:
    - `data-format`: The format of the data. In this example, the data is exported from an `ImageDataSet` and will be in a JSONL format.
    - `train-data-dir`, `val-data-dir`, `test-data-dir`: The Cloud Storage locations of the train, validation and test data. When using Vertex AI custom training, these locations will be specified in the corresponding environment variables: `AIP_TRAINING_DATA_URI`, `AIP_VALIDATION_DATA_URI`, and `AIP_TEST_DATA_URI`.
    - `model-dir`: The location to save the trained model. When using Vertex AI custom training, the location will be specified in the environment variable `AIP_MODEL_DIR`.
    - `distribute`: The training strategy: `single`, `mirrored` or `multiworker`.
- Data preprocessing (`get_data()`):
    - Compiles one or more JSONL data files for a dataset, and constructs a `tf.data.Dataset()` generator for data preprocessing and model feeding.
- Model architecture (`get_model()`):
    - Builds the corresponding model architecture.
- Training (`train_model()`):
    - Trains the model.
- Model artifact saving
    - Saves the model artifacts to a Cloud Storage location that depends on the distribution training strategy.
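The pattern of using an environment variable as the default for a command-line argument can be tried in isolation. The sketch below is a reduced parser, not the tutorial's full `task.py`; the bucket path is a hypothetical value, and setting `AIP_MODEL_DIR` by hand only simulates what Vertex AI custom training would provide.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Environment-variable defaults")
    # The default is read from the environment at the time the parser is built.
    parser.add_argument(
        "--model-dir",
        dest="model_dir",
        type=str,
        default=os.getenv("AIP_MODEL_DIR"),
    )
    parser.add_argument("--epochs", type=int, default=10)
    return parser


# Simulate the variable that the training service would set (hypothetical path).
os.environ["AIP_MODEL_DIR"] = "gs://example-bucket/model"

args = build_parser().parse_args([])
print(args.model_dir)  # falls back to the environment value

args = build_parser().parse_args(["--model-dir", "/tmp/model", "--epochs", "3"])
print(args.model_dir, args.epochs)  # explicit flags take precedence
```

This is why the same script can run unchanged both locally, with explicit flags, and inside a managed training job, where the environment supplies the locations.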

In [None]:
%%writefile custom/trainer/task.py
import tensorflow as tf
from tensorflow.python.client import device_lib
import argparse
import os
import sys
import json
import logging
import tqdm
import hypertune

def parse_args():
    parser = argparse.ArgumentParser(description="TF.Keras Image Classification")

    # data source
    parser.add_argument("--data-format", default=os.getenv('AIP_DATA_FORMAT'), dest="data_format", type=str, help="data format")
    parser.add_argument("--train-data-dir", default=os.getenv('AIP_TRAINING_DATA_URI'), dest="train_data_dir", type=str, help="train data directory")
    parser.add_argument("--val-data-dir", default=os.getenv('AIP_VALIDATION_DATA_URI'), dest="val_data_dir", type=str, help="validation data directory")
    parser.add_argument("--test-data-dir", default=os.getenv('AIP_TEST_DATA_URI'), dest="test_data_dir", type=str, help="test data directory")

    # data preprocessing
    parser.add_argument("--image-width", dest="image_width", default=32, type=int, help="image width")
    parser.add_argument("--image-height", dest="image_height", default=32, type=int, help="image height")

    # model artifact location
    parser.add_argument(
        "--model-dir",
        default=os.getenv("AIP_MODEL_DIR"),
        type=str,
        help="model directory",
    )

    # training hyperparameters
    parser.add_argument(
        "--lr", dest="lr", default=0.01, type=float, help="Learning rate."
    )
    parser.add_argument(
        "--batch-size", dest="batch_size", default=16, type=int, help="mini-batch size"
    )
    parser.add_argument(
        "--epochs", default=10, type=int, help="number of training epochs"
    )
    parser.add_argument(
        "--steps",
        dest="steps",
        default=200,
        type=int,
        help="Number of steps per epoch.",
    )
    parser.add_argument(
        "--distribute",
        dest="distribute",
        type=str,
        default="single",
        help="distributed training strategy",
    )

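    # NOTE: argparse's type=bool converts any non-empty string (including
    # "False") to True, so omit the flag entirely to keep tuning disabled.
    parser.add_argument(
        "--tuning",
        dest="tuning",
        type=bool,
        default=False,
        help="hyperparameter tuning"
    )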

    args = parser.parse_args()
    return args


args = parse_args()

logging.getLogger().setLevel(logging.DEBUG)
logging.info('DEVICES'  + str(device_lib.list_local_devices()))

# Single Machine, single compute device
if args.distribute == 'single':
    # tf.test.is_gpu_available() is deprecated; query the physical devices instead.
    if tf.config.list_physical_devices("GPU"):
        strategy = tf.distribute.OneDeviceStrategy(device="/gpu:0")
    else:
        strategy = tf.distribute.OneDeviceStrategy(device="/cpu:0")
    logging.info("Single device training")
# Single Machine, multiple compute device
elif args.distribute == 'mirrored':
    strategy = tf.distribute.MirroredStrategy()
    logging.info("Mirrored Strategy distributed training")
# Multi Machine, multiple compute device
elif args.distribute == 'multiworker':
    strategy = tf.distribute.MultiWorkerMirroredStrategy()
    logging.info("Multi-worker Strategy distributed training")
    logging.info('TF_CONFIG = {}'.format(os.environ.get('TF_CONFIG', 'Not found')))

logging.info('num_replicas_in_sync = {}'.format(strategy.num_replicas_in_sync))

NUM_WORKERS = strategy.num_replicas_in_sync
GLOBAL_BATCH_SIZE = args.batch_size * NUM_WORKERS


def _is_chief(task_type, task_id):
    ''' Check for primary if multiworker training
    '''
    return (task_type == 'chief') or (task_type == 'worker' and task_id == 0) or task_type is None


def get_data():
    logging.info('DATA_FORMAT ' + args.data_format)
    logging.info('TRAINING_DATA_URI ' + args.train_data_dir)
    logging.info('VALIDATION_DATA_URI ' + args.val_data_dir)
    logging.info('TEST_DATA_URI ' + args.test_data_dir)

    class_names = ["daisy", "dandelion", "roses", "sunflowers", "tulips"]
    class_indices = dict(zip(class_names, range(len(class_names))))
    num_classes = len(class_names)

    GLOBAL_BATCH_SIZE = args.batch_size * NUM_WORKERS

    def parse_image(filename):
        image = tf.io.read_file(filename)
        image = tf.image.decode_jpeg(image, channels=3)
        image = tf.image.resize(image, [args.image_width, args.image_height])
        return image

    def scale(image, label):
        image = tf.cast(image, tf.float32)
        image /= 255.0
        return image, label

    def extract(data_dir, batch_size=GLOBAL_BATCH_SIZE, repeat=True):
        data = []
        labels = []
        for data_uri in tqdm.tqdm(tf.io.gfile.glob(pattern=data_dir)):
            with tf.io.gfile.GFile(name=data_uri, mode="r") as gfile:
                for line in gfile.readlines():
                    instance = json.loads(line)
                    data.append(instance["imageGcsUri"])
                    classification_annotation = instance["classificationAnnotations"][0]
                    label = classification_annotation["displayName"]
                    labels.append(class_indices[label])

        data_dataset = tf.data.Dataset.from_tensor_slices(data)
        data_dataset = data_dataset.map(
            parse_image, num_parallel_calls=tf.data.experimental.AUTOTUNE
        )

        label_dataset = tf.data.Dataset.from_tensor_slices(labels)
        label_dataset = label_dataset.map(lambda x: tf.one_hot(x, num_classes))

        dataset = tf.data.Dataset.zip((data_dataset, label_dataset)).map(scale).cache().shuffle(batch_size * 32)
        if repeat:
            dataset = dataset.repeat()
        dataset = dataset.batch(batch_size)

        # Add property to retain the class names
        dataset.class_names = class_names

        return dataset


    logging.info('Prepare training data')
    train_dataset = extract(args.train_data_dir)

    logging.info('Prepare validation data')
    val_dataset = extract(args.val_data_dir, batch_size=1, repeat=False)

    return num_classes, train_dataset, val_dataset


def get_model(num_classes):
    logging.info("Get model architecture")
    model = tf.keras.Sequential(
        [
            tf.keras.layers.Conv2D(
                32, 3, activation="relu", input_shape=(args.image_width, args.image_height, 3)
            ),
            tf.keras.layers.MaxPooling2D(),
            tf.keras.layers.Conv2D(32, 3, activation="relu"),
            tf.keras.layers.MaxPooling2D(),
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(num_classes, activation="softmax"),
        ]
    )
    model.compile(
        loss=tf.keras.losses.categorical_crossentropy,
        optimizer=tf.keras.optimizers.SGD(learning_rate=args.lr),
        metrics=["accuracy"],
    )
    return model

def train_model(model, train_dataset, val_dataset):
    logging.info("Start model training")
    # The datasets are already batched, so batch_size is not passed to fit().
    history = model.fit(
        x=train_dataset,
        epochs=args.epochs,
        steps_per_epoch=args.steps,
        validation_data=val_dataset,
    )

    if args.tuning:
        hp_metric = history.history['val_accuracy'][-1]

        hpt = hypertune.HyperTune()
        hpt.report_hyperparameter_tuning_metric(
            hyperparameter_metric_tag='accuracy',
            metric_value=hp_metric,
            global_step=args.epochs
        )

    return history

num_classes, train_dataset, val_dataset = get_data()
with strategy.scope():
    model = get_model(num_classes=num_classes)
history = train_model(model, train_dataset, val_dataset)

logging.info("Save the model to: " + args.model_dir)
if args.distribute == 'multiworker':
    task_type, task_id = (strategy.cluster_resolver.task_type,
                              strategy.cluster_resolver.task_id)
else:
    task_type, task_id = None, None

# single, mirrored or primary for multiworker
if _is_chief(task_type, task_id):
    model.save(args.model_dir)
# non-primary workers for multi-workers
else:
    # each worker saves their model instance to a unique temp location
    worker_dir = args.model_dir + '/workertemp_' + str(task_id)
    tf.io.gfile.makedirs(worker_dir)
    model.save(worker_dir)

#### Store training script on your Cloud Storage bucket

Next, you package the training folder into a compressed tarball, and then store it in your Cloud Storage bucket.
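If you prefer to build the archive from Python rather than shell commands, the standard library `tarfile` module can produce an equivalent gzip-compressed tarball. This is a minimal sketch under that assumption; the helper name `make_source_tarball` is hypothetical, and uploading the result to Cloud Storage is left to the `gsutil cp` step.

```python
import os
import tarfile


def make_source_tarball(source_dir: str, output_path: str) -> str:
    """Archive `source_dir` into a gzip-compressed tarball at `output_path`."""
    with tarfile.open(output_path, "w:gz") as tar:
        # arcname keeps the top-level folder name (for example "custom") as
        # the root of the archive, as `tar cvf custom.tar custom` does.
        root_name = os.path.basename(os.path.normpath(source_dir))
        tar.add(source_dir, arcname=root_name)
    return output_path
```

For example, `make_source_tarball("custom", "custom.tar.gz")` would archive the `custom` folder assembled earlier.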

In [None]:
! rm -f custom.tar custom.tar.gz
! tar cvf custom.tar custom
! gzip custom.tar
! gsutil cp custom.tar.gz $BUCKET_URI/trainer_flowers.tar.gz

## Construct custom training pipeline

In the example below, you construct a pipeline for training a custom model using pre-built Google Cloud Pipeline Components for Vertex AI Training, as follows:


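1. Pipeline arguments, which specify:
    - `import_file`: The CSV index file for the dataset.
    - `display_name`: The display name for the resources the pipeline creates.
    - `batch_files`: The Cloud Storage locations of the input files for the batch prediction job.
    - `python_package`: The custom training Python package.
    - `python_module`: The entry module in the package to execute.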


2. Use the prebuilt component `ImageDatasetCreateOp` to create a Vertex AI Dataset resource, where:
    - The display name for the dataset is passed into the pipeline.
    - The import file for the dataset is passed into the pipeline.
    - The component returns the dataset resource as `outputs["dataset"]`


3. Use the prebuilt component `CustomPythonPackageTrainingJobRunOp` to train a custom model and upload the custom model as a Vertex AI Model resource, where:
    - The display name for the training job and the model is passed into the pipeline.
    - The dataset is the output from the `ImageDatasetCreateOp`.
    - The Python package and module name are passed into the pipeline; the command-line arguments are set in the pipeline definition.
    - The training and serving containers are specified in the pipeline definition.
    - The component returns the model resource as `outputs["model"]`.


4. Use the prebuilt component `ModelBatchPredictOp` to do a test batch prediction, where:
    - The model is the output from the `CustomPythonPackageTrainingJobRunOp`.


5. Use the prebuilt component `EndpointCreateOp` to create a Vertex AI Endpoint to deploy the trained model to, where:
    - Since the component has no dependencies on other components, by default it would be executed in parallel with the model training.
    - The `after(batch_op)` is added to serialize its execution, so it is only executed after the batch prediction, and therefore the training, completes successfully.
    - The component returns the endpoint resource as `outputs["endpoint"]`.


6. Use the prebuilt component `ModelDeployOp` to deploy the trained Vertex AI model to the endpoint, where:
    - The machine type and replica counts for the deployment are set in the pipeline definition.
    - The model is the output from the `CustomPythonPackageTrainingJobRunOp`.
    - The endpoint is the output from the `EndpointCreateOp`.

*Note:* Since each component is executed as a graph node in its own execution context, you pass the parameter `project` to each component op, in contrast to calling `aip.init(project=project)` once, as you would in a Python script that calls the SDK methods directly within the same execution context.
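To see how data dependencies and the explicit `.after(batch_op)` call determine execution order, the following toy model represents the pipeline steps as a dependency graph and sorts it with the standard library. It does not call KFP or Vertex AI; the step names are hypothetical labels for the components in the pipeline definition that follows, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each key runs only after every step in its value set has finished.
dependencies = {
    "dataset": set(),
    "training": {"dataset"},               # consumes outputs["dataset"]
    "batch_predict": {"training"},         # consumes outputs["model"]
    "endpoint_create": {"batch_predict"},  # explicit .after(batch_op)
    "model_deploy": {"training", "endpoint_create"},
}

execution_order = list(TopologicalSorter(dependencies).static_order())
print(execution_order)
# ['dataset', 'training', 'batch_predict', 'endpoint_create', 'model_deploy']
```

Removing the `"batch_predict"` entry from the `endpoint_create` dependencies would leave that step with no prerequisites, which corresponds to the default behavior of running it in parallel with training.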

In [None]:
PIPELINE_ROOT = "{}/pipeline_root/custom_icn_training".format(BUCKET_URI)
DEPLOY_COMPUTE = "n1-standard-4"

# Use a single GPU of the configured type when TRAIN_GPU is set;
# otherwise run without an accelerator.
if TRAIN_GPU:
    gpu = TRAIN_GPU.name
    accelerator_count = 1
else:
    gpu = "ACCELERATOR_TYPE_UNSPECIFIED"  # Unspecified accelerator type, which means no accelerator.
    accelerator_count = 0


@dsl.pipeline(
    name="flowers-custom-training",
    description="Custom image classification training",
)
def pipeline(
    import_file: str,
    display_name: str,
    batch_files: list,
    python_package: str,
    python_module: str,
    bucket: str = PIPELINE_ROOT,
    project: str = PROJECT_ID,
    region: str = REGION,
):
    from google_cloud_pipeline_components import aiplatform as gcc_aip
    from google_cloud_pipeline_components.v1.batch_predict_job import \
        ModelBatchPredictOp
    from google_cloud_pipeline_components.v1.endpoint import (EndpointCreateOp,
                                                              ModelDeployOp)

    dataset_op = gcc_aip.ImageDatasetCreateOp(
        project=project,
        display_name=display_name,
        gcs_source=import_file,
        import_schema_uri=aip.schema.dataset.ioformat.image.single_label_classification,
    )

    training_op = gcc_aip.CustomPythonPackageTrainingJobRunOp(
        project=project,
        display_name=display_name,
        dataset=dataset_op.outputs["dataset"],
        # Training
        python_package_gcs_uri=python_package,
        python_module_name=python_module,
        container_uri=TRAIN_IMAGE,
        staging_bucket=PIPELINE_ROOT,
        annotation_schema_uri=aip.schema.dataset.annotation.image.classification,
        args=["--epochs", "50", "--image-width", "32", "--image-height", "32"],
        replica_count=1,
        machine_type=TRAIN_COMPUTE,
        accelerator_type=gpu,
        accelerator_count=accelerator_count,
        # Serving - As part of this operation, the model is registered to Vertex AI
        model_serving_container_image_uri=DEPLOY_IMAGE,
        model_display_name=display_name,
    )

    batch_op = ModelBatchPredictOp(
        project=project,
        job_display_name="batch_predict_job",
        model=training_op.outputs["model"],
        gcs_source_uris=batch_files,
        gcs_destination_output_uri_prefix=bucket,
        instances_format="jsonl",
        predictions_format="jsonl",
        model_parameters={},
        machine_type=DEPLOY_COMPUTE,
        starting_replica_count=1,
        max_replica_count=1,
    )

    endpoint_op = EndpointCreateOp(
        project=project,
        location=region,
        display_name=display_name,
    ).after(batch_op)

    _ = ModelDeployOp(
        model=training_op.outputs["model"],
        endpoint=endpoint_op.outputs["endpoint"],
        dedicated_resources_min_replica_count=1,
        dedicated_resources_max_replica_count=1,
        dedicated_resources_machine_type=DEPLOY_COMPUTE,
    )

### Get test item(s)

Next, prepare items for a batch prediction request to your Vertex AI model. You use arbitrary examples from the dataset as test items. Don't be concerned that the examples were likely used in training the model; the goal is only to demonstrate how to make a prediction.
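
The cell below handles index rows with either two or three comma-separated fields. As a minimal sketch of that parsing, the hypothetical helper `parse_index_row` (not part of the notebook) returns the image path and label in both cases. It assumes the extra leading field, when present, can be ignored; the sample rows are invented for illustration.

```python
def parse_index_row(row: str):
    """Return (gcs_uri, label) from one CSV row of the dataset index file.

    Assumes a row has either two fields (uri,label) or three fields
    (extra_leading_field,uri,label), mirroring the check in the notebook cell.
    """
    fields = row.strip().split(",")
    if len(fields) == 3:
        _, uri, label = fields  # drop the leading field
    elif len(fields) == 2:
        uri, label = fields
    else:
        raise ValueError(f"Unexpected number of fields: {len(fields)}")
    return uri, label


# Hypothetical rows, for illustration only
print(parse_index_row("gs://example-bucket/flowers/daisy/1.jpg,daisy"))
print(parse_index_row("test,gs://example-bucket/flowers/tulip/2.jpg,tulip"))
```

Raising on any other field count makes a malformed index file fail early instead of producing a misaligned path and label.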

In [None]:
test_items = !gsutil cat $IMPORT_FILE | head -n2
if len(str(test_items[0]).split(",")) == 3:
    _, test_item_1, test_label_1 = str(test_items[0]).split(",")
    _, test_item_2, test_label_2 = str(test_items[1]).split(",")
else:
    test_item_1, test_label_1 = str(test_items[0]).split(",")
    test_item_2, test_label_2 = str(test_items[1]).split(",")

print(test_item_1, test_label_1)
print(test_item_2, test_label_2)

### Copy test item(s)

For the batch prediction, copy the test items over to your Cloud Storage bucket.

In [None]:
file_1 = test_item_1.split("/")[-1]
file_2 = test_item_2.split("/")[-1]

! gsutil cp $test_item_1 $BUCKET_URI/$file_1
! gsutil cp $test_item_2 $BUCKET_URI/$file_2

test_item_1 = BUCKET_URI + "/" + file_1
test_item_2 = BUCKET_URI + "/" + file_2

### Make the batch input file

Now make a batch input file, which you store in your Cloud Storage bucket. In this tutorial, the batch input file is in JSONL format. In a JSONL file, you make one dictionary entry per line for each data item (instance). The dictionary contains the key/value pairs:

- `content`: The Cloud Storage path to the image.
- `mime_type`: The content type. In this example, it is `image/jpeg`.

For example:

                        {"content": "[your-bucket]/file1.jpg", "mime_type": "image/jpeg"}
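
The JSONL layout described above can be checked locally before writing anything to Cloud Storage. The following is a minimal sketch; the helper `to_jsonl` and the bucket paths are hypothetical and used only for illustration.

```python
import json


def to_jsonl(image_uris, mime_type="image/jpeg"):
    """Serialize one JSON object per line, one line per instance."""
    lines = [
        json.dumps({"content": uri, "mime_type": mime_type}) for uri in image_uris
    ]
    return "\n".join(lines) + "\n"


# Hypothetical Cloud Storage paths
payload = to_jsonl(["gs://example-bucket/file1.jpg", "gs://example-bucket/file2.jpg"])
print(payload, end="")

# Each line must parse on its own as a complete JSON object
for line in payload.splitlines():
    assert set(json.loads(line)) == {"content", "mime_type"}
```

Using `json.dumps` rather than hand-built strings ensures double-quoted keys and values, which JSON requires.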

In [None]:
gcs_input_uri = BUCKET_URI + "/test.jsonl"
with tf.io.gfile.GFile(gcs_input_uri, "w") as f:
    data = {"content": test_item_1, "mime_type": "image/jpeg"}
    f.write(json.dumps(data) + "\n")
    data = {"content": test_item_2, "mime_type": "image/jpeg"}
    f.write(json.dumps(data) + "\n")

print(gcs_input_uri)
! gsutil cat $gcs_input_uri

### Compile and execute the pipeline

Next, you compile the pipeline and then execute it. The pipeline takes the following parameters, which are passed as the dictionary `parameter_values`:

- `import_file`: The Cloud Storage path to the dataset index file.
- `batch_files`: A list of Cloud Storage paths to the input batch files.
- `display_name`: The display name for the generated Vertex AI resources.
- `python_package`: The Python package for the custom training job.
- `python_module`: The Python module in the package to execute.
- `project`: The project ID.
- `region`: The region.
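
A mismatch between the keys in `parameter_values` and the pipeline's argument names is easy to introduce. As a rough sketch, you could compare the two locally with `inspect.signature` before submitting the job. The helper `check_parameter_values` and the stand-in function `pipeline_signature` are hypothetical; the stand-in copies the argument names from the pipeline definition above with placeholder defaults, because a decorated pipeline object may not expose a plain Python signature.

```python
import inspect


def check_parameter_values(func, parameter_values):
    """Return (missing, unknown) parameter names for a pipeline-like function."""
    params = inspect.signature(func).parameters
    required = {
        name for name, p in params.items() if p.default is inspect.Parameter.empty
    }
    missing = sorted(required - set(parameter_values))
    unknown = sorted(set(parameter_values) - set(params))
    return missing, unknown


# Stand-in with the same argument names as the pipeline (placeholder defaults)
def pipeline_signature(
    import_file: str,
    display_name: str,
    batch_files: list,
    python_package: str,
    python_module: str,
    bucket: str = "gs://example-bucket/pipeline_root",
    project: str = "example-project",
    region: str = "us-central1",
):
    pass


# Deliberately flawed values: "python_module" is absent, "location" is not an argument
values = {
    "import_file": "gs://example-bucket/index.csv",
    "display_name": "flowers",
    "batch_files": ["gs://example-bucket/test.jsonl"],
    "python_package": "gs://example-bucket/trainer_flowers.tar.gz",
    "location": "us-central1",
}
missing, unknown = check_parameter_values(pipeline_signature, values)
print("missing:", missing)  # missing: ['python_module']
print("unknown:", unknown)  # unknown: ['location']
```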

In [None]:
compiler.Compiler().compile(
    pipeline_func=pipeline, package_path="custom_icn_training.json"
)

pipeline = aip.PipelineJob(
    display_name="flowers-custom_training",
    template_path="custom_icn_training.json",
    pipeline_root=PIPELINE_ROOT,
    parameter_values={
        "import_file": IMPORT_FILE,
        "batch_files": [gcs_input_uri],
        "display_name": "flowers" + TIMESTAMP,
        "python_package": f"{BUCKET_URI}/trainer_flowers.tar.gz",
        "python_module": "trainer.task",
        "project": PROJECT_ID,
        "region": REGION,
    },
)

pipeline.run()

! rm -f custom_icn_training.json

### View custom model training pipeline results

Finally, you will view the artifact outputs of each task in the pipeline.
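
The cell below locates each task's output file under the pipeline root and then reads a resource name out of its JSON content. A minimal sketch of those two steps follows. The helper names, the IDs, and the sample JSON document are made up for illustration; the JSON is only shaped to match the lookups used in the cell.

```python
import json


def task_output_path(
    pipeline_root, project_number, job_id, task_name, task_id,
    filename="executor_output.json",
):
    """Build the path <root>/<project number>/<job>/<task name>_<task id>/<file>."""
    return f"{pipeline_root}/{project_number}/{job_id}/{task_name}_{task_id}/{filename}"


def resource_name(executor_output: str, artifact_key: str) -> str:
    """Extract the resource name of the first artifact stored under artifact_key."""
    doc = json.loads(executor_output)
    return doc["artifacts"][artifact_key]["artifacts"][0]["metadata"]["resourceName"]


# Made-up document with the same nesting as the lookups in the notebook cell
sample = json.dumps(
    {
        "artifacts": {
            "model": {
                "artifacts": [
                    {
                        "metadata": {
                            "resourceName": "projects/123/locations/us-central1/models/456"
                        }
                    }
                ]
            }
        }
    }
)

print(task_output_path("gs://example-bucket/pipeline_root", "123", "job-1", "model-upload", 789))
print(resource_name(sample, "model"))
```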

In [None]:
PROJECT_NUMBER = pipeline.gca_resource.name.split("/")[1]
print(PROJECT_NUMBER)


def print_pipeline_output(job, output_task_name):
    """Print and return the path of the first output file found for a task."""
    JOB_ID = job.name
    print(JOB_ID)
    for task in job.gca_resource.job_detail.task_details:
        # Outputs are written under <root>/<project number>/<job>/<task name>_<task id>/
        task_dir = (
            f"{PIPELINE_ROOT}/{PROJECT_NUMBER}/{JOB_ID}/"
            f"{output_task_name}_{task.task_id}"
        )
        # Check the candidate output files in the same order of precedence
        for filename in ("executor_output.json", "gcp_resources", "evaluation_metrics"):
            output_path = f"{task_dir}/{filename}"
            if tf.io.gfile.exists(output_path):
                ! gsutil cat $output_path
                return output_path

    return None


print("image-dataset-create")
artifacts = print_pipeline_output(pipeline, "image-dataset-create")
print("\n\n")
output = !gsutil cat $artifacts
output = json.loads(output[0])
dataset_id = output["artifacts"]["dataset"]["artifacts"][0]["metadata"]["resourceName"]
print("\n\n")
print("custompythonpackagetrainingjob-run")
artifacts = print_pipeline_output(pipeline, "custompythonpackagetrainingjob-run")
print("\n\n")
output = !gsutil cat $artifacts
output = json.loads(output[0])
model_id = output["artifacts"]["model"]["artifacts"][0]["metadata"]["resourceName"]
print("\n\n")
print("endpoint-create")
artifacts = print_pipeline_output(pipeline, "endpoint-create")
print("\n\n")
output = !gsutil cat $artifacts
output = json.loads(output[0])
endpoint_id = output["artifacts"]["endpoint"]["artifacts"][0]["metadata"][
    "resourceName"
]
print("model-deploy")
artifacts = print_pipeline_output(pipeline, "model-deploy")
print("\n\n")
print("model-batch-predict")
artifacts = print_pipeline_output(pipeline, "model-batch-predict")
output = !gsutil cat $artifacts
output = json.loads(output[0])
print("\n\n")
# Read both values from the same parsed metadata instead of re-reading the file
batch_metadata = output["artifacts"]["batchpredictionjob"]["artifacts"][0]["metadata"]
print(batch_metadata["gcsOutputDirectory"])
batch_job_id = batch_metadata["resourceName"]
print("\n\n")

### Delete a pipeline job

After a pipeline job is completed, you can delete the pipeline job with the method `delete()`.  Prior to completion, a pipeline job can be canceled with the method `cancel()`.

In [None]:
pipeline.delete()

#### Delete the Vertex AI Dataset, Model, Endpoint and BatchPredictionJob resources

Next, delete the dataset, undeploy the model from the endpoint, and delete the Vertex AI Model, Endpoint and BatchPredictionJob resources.

In [None]:
dataset = aip.ImageDataset(dataset_id)
try:
    dataset.delete()
except Exception as e:
    print(e)


endpoint = aip.Endpoint(endpoint_id)
endpoint.undeploy_all()
try:
    endpoint.delete()
except Exception as e:
    print(e)

model = aip.Model(model_id)
try:
    model.delete()
except Exception as e:
    print(e)

batch_job = aip.BatchPredictionJob(batch_job_id)
batch_job.delete()

## Using `create_custom_training_job_op_from_component`

An alternative approach is to create your own component to do custom training, instead of creating a Python package and executing it with a `CustomPythonPackageTrainingJobRunOp`. In this case, what would have been inside the Python package is instead directly embedded in your component.

You might do this for example if you are early on in the development of the training package and you want speed and convenience over scaling. One issue with this is that when executed it will only be seen and tracked as a component artifact, versus being seen and tracked as a CustomTrainingJob.

The utility `create_custom_training_job_op_from_component`, imported in this notebook from `google_cloud_pipeline_components.v1.custom_job.utils`, provides you the benefits of both. This utility takes your custom training component as input and returns it converted to a `CustomTrainingJobOp` component.

### Create a custom training job component

First, you create a custom training component `custom_train_model`. In this component, you use a very simple script to train a CIFAR-10 model. The script has very few bells and whistles other than:

- Setting the learning rate, number of epochs, batch size and number of steps per epoch as parameters to your component.
- Setting the Cloud Storage location to save the trained model artifacts to.

Note that you set the default value of `model_dir` to an empty string. The reason is that the training service may alternatively specify the location with the environment variable `AIP_MODEL_DIR`. The code logic is: if the parameter `model_dir` is set (non-empty), use that value; otherwise, use the location specified by the environment variable `AIP_MODEL_DIR`.

Once the model is trained, you need to know where the model artifacts are located, since their location can be either that of the `model_dir` parameter or the environment variable `AIP_MODEL_DIR`. You determine this with the component `model_artifacts()`. If the `model_dir` parameter is non-empty, it returns that value; otherwise, it constructs the value of the `AIP_MODEL_DIR` setting from the `base_output_directory` parameter.
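
The fallback logic described above can be tried outside a pipeline. The sketch below uses two hypothetical helpers, `resolve_model_dir` and `expected_artifact_location`, which mirror the training component and `model_artifacts()` respectively. The bucket paths are placeholders, and the `/model` suffix is taken from the `model_artifacts` component in this notebook rather than from the service documentation.

```python
import os


def resolve_model_dir(model_dir: str = "") -> str:
    """Prefer the explicit parameter; otherwise fall back to AIP_MODEL_DIR."""
    if model_dir:
        return model_dir
    return os.environ.get("AIP_MODEL_DIR", "")


def expected_artifact_location(model_dir: str, base_output_directory: str) -> str:
    """Predict where the artifacts land, without reading the environment."""
    return model_dir if model_dir else base_output_directory + "/model"


# Simulate the variable the training service would set (hypothetical value)
os.environ["AIP_MODEL_DIR"] = "gs://example-bucket/root/model"

print(resolve_model_dir(""))  # falls back to the environment variable
print(resolve_model_dir("gs://example-bucket/override"))  # explicit value wins
print(expected_artifact_location("", "gs://example-bucket/root"))
```

The second helper exists because a downstream component runs in its own execution context and cannot read the training job's environment, so it has to derive the same location from the pipeline parameters.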

In [None]:
from google_cloud_pipeline_components.v1.custom_job import utils


@component(
    base_image="tensorflow/tensorflow:latest",
    packages_to_install=["tensorflow_datasets"],
)
def custom_train_model(
    model_dir: str = "",
    lr: float = 0.01,
    epochs: int = 10,
    steps: int = 200,
    batch_size: int = 64,
):
    import os

    import tensorflow as tf
    import tensorflow_datasets as tfds

    if model_dir == "":
        model_dir = os.getenv("AIP_MODEL_DIR")

    # Preparing dataset
    BUFFER_SIZE = 10000

    def get_data():

        # Scaling CIFAR10 data from [0, 255] to [0., 1.]
        def scale(image, label):
            image = tf.cast(image, tf.float32)
            image /= 255.0
            return image, label

        datasets, info = tfds.load(name="cifar10", with_info=True, as_supervised=True)
        return datasets["train"].map(scale).cache().shuffle(BUFFER_SIZE).repeat()

    # Build the Keras model
    def get_model():
        model = tf.keras.Sequential(
            [
                tf.keras.layers.Conv2D(
                    32, 3, activation="relu", input_shape=(32, 32, 3)
                ),
                tf.keras.layers.MaxPooling2D(),
                tf.keras.layers.Conv2D(32, 3, activation="relu"),
                tf.keras.layers.MaxPooling2D(),
                tf.keras.layers.Flatten(),
                tf.keras.layers.Dense(10, activation="softmax"),
            ]
        )
        model.compile(
            loss=tf.keras.losses.sparse_categorical_crossentropy,
            optimizer=tf.keras.optimizers.SGD(learning_rate=lr),
            metrics=["accuracy"],
        )
        return model

    def train_model(model, train_dataset):
        model.fit(x=train_dataset, epochs=epochs, steps_per_epoch=steps)
        return model

    train_dataset = get_data().batch(batch_size)

    model = get_model()

    model = train_model(model, train_dataset)

    model.save(model_dir)


@component()
def model_artifacts(model_dir: str, base_output_directory: str) -> str:
    # location of model artifacts overridden by model_dir parameter
    if model_dir != "":
        return model_dir
    # location of model artifacts specified by base_output_directory
    else:
        return base_output_directory + "/model"

### Convert your custom training component to a predefined CustomTrainingJobOp component

Next, use the utility to convert your custom training component to a CustomTrainingJobOp component. The component itself is the only required parameter. Additional optional keyword parameters override default settings in the worker pool specification and the service account, and optionally set a TensorBoard instance and an encryption key.

Learn more in the [`create_custom_training_job_op_from_component` reference](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-0.2.0/google_cloud_pipeline_components.experimental.custom_job.html).

In [None]:
custom_job_training_op = utils.create_custom_training_job_op_from_component(
    custom_train_model, replica_count=1
)

## Construct custom training pipeline

In the example below, you construct a pipeline for training a custom model using:

1. Pipeline arguments:
    - `display_name`: The human readable name for the model and endpoint.
    - `model_dir`: Optional location (override) for saving the model artifacts.
    - `epochs`: The number of epochs.
    - `steps`: The number of steps per epoch.
    - `lr`: The learning rate.
    - `project`: The project for executing the pipeline components.
    - `location`: The location for executing the pipeline components.
    - `deploy_image`: The serving container.


2. Use the converted `custom_job_training_op` to train the custom model.
    - If `model_dir` is a non-empty string, it overrides the setting of `base_output_directory`.

3. Use the custom component `model_artifacts` to determine the location of the model artifacts.
    - If `model_dir` is a non-empty string, return its location.
    - Otherwise, return the location derived from `base_output_directory`.

4. Use the prebuilt component `ModelUploadOp` to create a `Vertex AI Model` resource from the model artifacts.
    - The location of the model artifacts is the output from the component `model_artifacts`.
    - The `after(custom_job_op)` is added to serialize its execution, so it's only executed if the training operation completes successfully.

5. Use the prebuilt component `EndpointCreateOp` to create a `Vertex AI Endpoint` to deploy the trained model to, where:
    - Since the component has no dependencies on other components, by default it would be executed in parallel with the model training.
    - The `after(model_upload_op)` is added to serialize its execution, so it's only executed if the model upload completes successfully.
    - The component returns the endpoint resource as `outputs["endpoint"]`.

6. Use the prebuilt component `ModelDeployOp` to deploy the trained `Vertex AI Model` to the endpoint, where:
    - The machine type and replica counts are set with the `dedicated_resources_*` parameters.
    - The model is the output from the `ModelUploadOp`.
    - The endpoint is the output from the `EndpointCreateOp`.
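
To see how the data dependencies and the explicit `after()` calls in the steps above constrain execution order, you can model the tasks as a small dependency graph. This is only an illustration using the standard library's `graphlib` (Python 3.9 or later), not how the pipeline service schedules tasks. The task names follow the names used later in this notebook when viewing results.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks it must wait for.
dependencies = {
    "custom-train-model": set(),
    "model-artifacts": set(),
    # waits on model-artifacts (data dependency) and on training (explicit after())
    "model-upload": {"model-artifacts", "custom-train-model"},
    # has no data dependency; ordered only by the explicit after()
    "endpoint-create": {"model-upload"},
    # consumes the outputs of both the upload and the endpoint creation
    "model-deploy": {"model-upload", "endpoint-create"},
}

order = list(TopologicalSorter(dependencies).static_order())
print(order)

# Training always precedes upload, which precedes endpoint creation and deployment
assert order.index("custom-train-model") < order.index("model-upload")
assert order.index("model-upload") < order.index("endpoint-create")
assert order.index("endpoint-create") < order.index("model-deploy")
```

Removing `"model-upload"` from the `endpoint-create` entry models what happens without the `after()` call: endpoint creation is then free to run at any point before deployment, including in parallel with training.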

In [None]:
PIPELINE_ROOT = "{}/pipeline_root/custom_cifar10_training".format(BUCKET_URI)


@dsl.pipeline(name="custom-model-training-sample-pipeline")
def pipeline(
    display_name: str,
    model_dir: str = "",
    lr: float = 0.01,
    epochs: int = 10,
    steps: int = 200,
    project: str = PROJECT_ID,
    location: str = REGION,
    deploy_image: str = "us-docker.pkg.dev/cloud-aiplatform/prediction/tf2-cpu.2-3:latest",
):
    from google_cloud_pipeline_components import aiplatform as gcc_aip

    custom_job_op = custom_job_training_op(
        model_dir=model_dir,
        lr=lr,
        epochs=epochs,
        steps=steps,
        project=project,
        location=location,
        base_output_directory=PIPELINE_ROOT,
    )

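    # Pass component arguments by keyword so each input is bound unambiguously
    artifacts_op = model_artifacts(
        model_dir=model_dir, base_output_directory=PIPELINE_ROOT
    )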

    model_upload_op = gcc_aip.ModelUploadOp(
        project=project,
        display_name=display_name,
        artifact_uri=artifacts_op.output,
        serving_container_image_uri=deploy_image,
    ).after(custom_job_op)

    endpoint_op = gcc_aip.EndpointCreateOp(
        project=project,
        location=location,
        display_name=display_name,
    ).after(model_upload_op)

    _ = gcc_aip.ModelDeployOp(
        model=model_upload_op.outputs["model"],
        endpoint=endpoint_op.outputs["endpoint"],
        dedicated_resources_min_replica_count=1,
        dedicated_resources_max_replica_count=1,
        dedicated_resources_machine_type=DEPLOY_COMPUTE,
    )

### Compile and execute the pipeline

Next, you compile the pipeline and then execute it. The pipeline takes the following parameters, which are passed as the dictionary `parameter_values`:

- `display_name`: The display name for the generated Vertex AI resources.
- `epochs`: The number of epochs.
- `project`: The project ID.
- `location`: The region.

*Note:* In this execution, you do not override the location of the model artifacts; that is, the `model_dir` parameter is an empty string.

In [None]:
compiler.Compiler().compile(
    pipeline_func=pipeline, package_path="custom_cifar10_training.json"
)

pipeline = aip.PipelineJob(
    display_name="cifar10-custom_training",
    template_path="custom_cifar10_training.json",
    pipeline_root=PIPELINE_ROOT,
    parameter_values={
        "display_name": "simple-example",
        "epochs": 20,
        "project": PROJECT_ID,
        "location": REGION,
    },
)

pipeline.run()

! rm -f custom_cifar10_training.json

### View custom model training pipeline results

Finally, you will view the artifact outputs of each task in the pipeline.

In [None]:
PROJECT_NUMBER = pipeline.gca_resource.name.split("/")[1]
print(PROJECT_NUMBER)

print("custom-train-model")
artifacts = print_pipeline_output(pipeline, "custom-train-model")
print("\n\n")
print("model-artifacts")
artifacts = print_pipeline_output(pipeline, "model-artifacts")
print("\n\n")
print("model-upload")
artifacts = print_pipeline_output(pipeline, "model-upload")
print("\n\n")
output = !gsutil cat $artifacts
output = json.loads(output[0])
model_id = output["artifacts"]["model"]["artifacts"][0]["metadata"]["resourceName"]
print("endpoint-create")
artifacts = print_pipeline_output(pipeline, "endpoint-create")
print("\n\n")
output = !gsutil cat $artifacts
output = json.loads(output[0])
endpoint_id = output["artifacts"]["endpoint"]["artifacts"][0]["metadata"][
    "resourceName"
]
print("model-deploy")
artifacts = print_pipeline_output(pipeline, "model-deploy")
print("\n\n")

### Delete a pipeline job

After a pipeline job is completed, you can delete the pipeline job with the method `delete()`.  Prior to completion, a pipeline job can be canceled with the method `cancel()`.

In [None]:
pipeline.delete()

#### Delete the Vertex AI Model and Endpoint resources

Next, undeploy the model from the endpoint, and delete the Vertex AI Model and Endpoint resources.

In [None]:
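# Re-create the handle from this pipeline's endpoint; the earlier one was deleted
endpoint = aip.Endpoint(endpoint_id)
endpoint.undeploy_all()
try:
    endpoint.delete()
except Exception as e:
    print(e)

model = aip.Model(model_id)
try:
    model.delete()
except Exception as e:
    print(e)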

# Cleaning up

To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud
project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.

Otherwise, you can delete the individual resources you created in this tutorial.

In [None]:
delete_bucket = False

if delete_bucket or os.getenv("IS_TESTING"):
    ! gsutil rm -r $BUCKET_URI