In [None]:
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# E2E ML on GCP: MLOps stage 5 : deployment: Get started with Vertex AI Endpoints
<table align="left">
  <td>
    <a href="https://colab.research.google.com/github/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/community/ml_ops/stage5/get_started_with_vertex_endpoints.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Colab logo"> Run in Colab
    </a>
  </td>
  <td>
    <a href="https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/community/ml_ops/stage5/get_started_with_vertex_endpoints.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo">
      View on GitHub
    </a>
  </td>
  <td>
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/main/notebooks/community/ml_ops/stage5/get_started_with_vertex_endpoints.ipynb">
      <img src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo">
      Open in Vertex AI Workbench
    </a>
  </td>         
</table>
<br/><br/><br/>

## Overview

This tutorial demonstrates how to use the Vertex AI SDK to create and use `Vertex AI Endpoint` resources for serving models. `Vertex AI Endpoints` provide the ability to virtualize the serving binaries and serving infrastructure.


### Objective

In this tutorial, you learn how to use `Vertex AI Endpoint` resources.

This tutorial uses the following Google Cloud ML services and resources:

- `Vertex AI Endpoints`
- `Vertex AI Models`

The steps performed include:

- Create an `Endpoint` resource.
- List all `Endpoint` resources.
- List `Endpoint` resources by query filter.
- Configure the serving binary of a `Model` resource for deployment to an `Endpoint` resource.
- Deploy a single `Model` resource to an `Endpoint` resource.
- Get deployment settings for a deployed `Model` resource.
- Configure auto-scaling.
- Deploy multiple `Model` resources to an `Endpoint` resource and configure a traffic split.
- Dynamically change the traffic split for an `Endpoint` resource.
- Undeploy a single `Model` resource from an `Endpoint` resource.
- Undeploy all `Model` resources from an `Endpoint` resource.
- Delete an `Endpoint` resource.
- In pipeline: Create an `Endpoint` resource and deploy an existing `Model` resource to the `Endpoint` resource.
- In pipeline: Deploy an existing `Model` resource to an existing `Endpoint` resource.

### Dataset

This tutorial uses a pretrained image classification model from TensorFlow Hub, which is trained on the ImageNet dataset.

Learn more about the [ResNet V2 pretrained model](https://tfhub.dev/google/imagenet/resnet_v2_101/classification/5).

### Costs

This tutorial uses billable components of Google Cloud:

* Vertex AI
* Cloud Storage

Learn about [Vertex AI
pricing](https://cloud.google.com/vertex-ai/pricing) and [Cloud Storage
pricing](https://cloud.google.com/storage/pricing), and use the [Pricing
Calculator](https://cloud.google.com/products/calculator/)
to generate a cost estimate based on your projected usage.

## Installation

Install the following packages to execute this notebook.

In [None]:
import os

# The Vertex AI Workbench Notebook product has specific requirements
IS_WORKBENCH_NOTEBOOK = os.getenv("DL_ANACONDA_HOME")
IS_USER_MANAGED_WORKBENCH_NOTEBOOK = os.path.exists(
    "/opt/deeplearning/metadata/env_version"
)

# Vertex AI Notebook requires dependencies to be installed with '--user'
USER_FLAG = ""
if IS_WORKBENCH_NOTEBOOK:
    USER_FLAG = "--user"

! pip3 install --upgrade google-cloud-aiplatform $USER_FLAG -q
! pip3 install --upgrade google-cloud-pipeline-components $USER_FLAG -q
! pip3 install tensorflow-hub $USER_FLAG -q

### Restart the kernel

After you install the additional packages, you need to restart the notebook kernel so it can find the packages.

In [None]:
# Automatically restart kernel after installs
import os

if not os.getenv("IS_TESTING"):
    # Automatically restart kernel after installs
    import IPython

    app = IPython.Application.instance()
    app.kernel.do_shutdown(True)

## Before you begin

### GPU runtime

*Make sure you're running this notebook in a GPU runtime if you have that option. In Colab, select* **Runtime > Change Runtime Type > GPU**

### Set up your Google Cloud project

**The following steps are required, regardless of your notebook environment.**

1. [Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.

2. [Make sure that billing is enabled for your project.](https://cloud.google.com/billing/docs/how-to/modify-project)

3. [Enable the following APIs: Vertex AI APIs, Compute Engine APIs, and Cloud Storage.](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com,compute_component,storage-component.googleapis.com)

4. If you are running this notebook locally, you need to install the [Cloud SDK](https://cloud.google.com/sdk).

5. Enter your project ID in the cell below. Then run the cell to make sure the
Cloud SDK uses the right project for all the commands in this notebook.

**Note**: Jupyter runs lines prefixed with `!` as shell commands, and it interpolates Python variables prefixed with `$`.

#### Set your project ID

**If you don't know your project ID**, you may be able to get your project ID using `gcloud`.

In [None]:
PROJECT_ID = "[your-project-id]"  # @param {type:"string"}

In [None]:
if PROJECT_ID == "" or PROJECT_ID is None or PROJECT_ID == "[your-project-id]":
    # Get your GCP project id from gcloud
    shell_output = ! gcloud config list --format 'value(core.project)' 2>/dev/null
    PROJECT_ID = shell_output[0]
    print("Project ID:", PROJECT_ID)

In [None]:
! gcloud config set project $PROJECT_ID

#### Region

You can also change the `REGION` variable, which is used for operations
throughout the rest of this notebook.  Below are regions supported for Vertex AI. We recommend that you choose the region closest to you.

- Americas: `us-central1`
- Europe: `europe-west4`
- Asia Pacific: `asia-east1`

You may not use a multi-regional bucket for training with Vertex AI. Not all regions provide support for all Vertex AI services.

Learn more about [Vertex AI regions](https://cloud.google.com/vertex-ai/docs/general/locations).

In [None]:
REGION = "[your-region]"  # @param {type: "string"}

if REGION == "[your-region]":
    REGION = "us-central1"

#### Timestamp

If you are in a live tutorial session, you might be using a shared test account or project. To avoid name collisions between users on resources created, you create a timestamp for each instance session, and append the timestamp onto the name of resources you create in this tutorial.
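As a rough illustration of the naming scheme described above, the following sketch appends a timestamp to a base name. The helper `timestamped_name` is hypothetical and not part of the Vertex AI SDK; it only shows why two sessions started at different times end up with different resource names.

```python
from datetime import datetime


def timestamped_name(base: str, now: datetime) -> str:
    """Return `base` with a YYYYMMDDHHMMSS suffix (hypothetical helper)."""
    return f"{base}_{now.strftime('%Y%m%d%H%M%S')}"


# Two sessions started one second apart get distinct resource names.
first = timestamped_name("example", datetime(2022, 1, 1, 12, 0, 0))
second = timestamped_name("example", datetime(2022, 1, 1, 12, 0, 1))
print(first)
print(second)
```

Note that two users who start a session within the same second would still collide, so the timestamp reduces name clashes rather than ruling them out.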

In [None]:
from datetime import datetime

TIMESTAMP = datetime.now().strftime("%Y%m%d%H%M%S")

### Authenticate your Google Cloud account

**If you are using Vertex AI Workbench Notebooks**, your environment is already authenticated. 

**If you are using Colab**, run the cell below and follow the instructions when prompted to authenticate your account via OAuth.

**Otherwise**, follow these steps:

In the Cloud Console, go to the [Create service account key](https://console.cloud.google.com/apis/credentials/serviceaccountkey) page.

**Click Create service account**.

In the **Service account name** field, enter a name, and click **Create**.

In the **Grant this service account access to project** section, click the Role drop-down list. Type "Vertex" into the filter box, and select **Vertex Administrator**. Type "Storage Object Admin" into the filter box, and select **Storage Object Admin**.

Click Create. A JSON file that contains your key downloads to your local environment.

Enter the path to your service account key as the GOOGLE_APPLICATION_CREDENTIALS variable in the cell below and run the cell.

In [None]:
# If you are running this notebook in Colab, run this cell and follow the
# instructions to authenticate your GCP account. This provides access to your
# Cloud Storage bucket and lets you submit training jobs and prediction
# requests.

import os
import sys

# If on Vertex AI Workbench, then don't execute this code
IS_COLAB = "google.colab" in sys.modules
if not os.path.exists("/opt/deeplearning/metadata/env_version") and not os.getenv(
    "DL_ANACONDA_HOME"
):
    if "google.colab" in sys.modules:
        from google.colab import auth as google_auth

        google_auth.authenticate_user()

    # If you are running this notebook locally, replace the string below with the
    # path to your service account key and run this cell to authenticate your GCP
    # account.
    elif not os.getenv("IS_TESTING"):
        %env GOOGLE_APPLICATION_CREDENTIALS ''

### Create a Cloud Storage bucket

**The following steps are required, regardless of your notebook environment.**

When you initialize the Vertex AI SDK for Python, you specify a Cloud Storage staging bucket. The staging bucket is where all the data associated with your dataset and model resources are retained across sessions.

Set the name of your Cloud Storage bucket below. Bucket names must be globally unique across all Google Cloud projects, including those outside of your organization.
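Before creating the bucket, it can help to check the name locally. The sketch below covers only a subset of the Cloud Storage naming rules (3 to 63 characters; lowercase letters, digits, dashes, and underscores; starting and ending with a letter or digit). The function name is hypothetical, dotted domain-style names are deliberately not handled, and a passing check does not guarantee that the name is still available.

```python
import re

# Subset of the Cloud Storage naming rules: 3-63 characters, lowercase letters,
# digits, dashes and underscores, starting and ending with a letter or digit.
# Dotted (domain-style) names follow extra rules that this sketch ignores.
_BUCKET_NAME_RE = re.compile(r"[a-z0-9][a-z0-9_-]{1,61}[a-z0-9]")


def looks_like_valid_bucket_name(name: str) -> bool:
    """Return True if `name` passes the simplified checks above."""
    return _BUCKET_NAME_RE.fullmatch(name) is not None


print(looks_like_valid_bucket_name("my-project-aip-20220101120000"))
print(looks_like_valid_bucket_name("My_Bucket"))  # uppercase is rejected
print(looks_like_valid_bucket_name("a" * 64))  # too long
```

A check like this is mostly useful when the bucket name is generated from a project ID plus a timestamp, since a long project ID can push the result past the length limit.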

In [None]:
BUCKET_NAME = "[your-bucket-name]"  # @param {type:"string"}
BUCKET_URI = f"gs://{BUCKET_NAME}"

In [None]:
if BUCKET_NAME == "" or BUCKET_NAME is None or BUCKET_NAME == "[your-bucket-name]":
    BUCKET_NAME = PROJECT_ID + "aip-" + TIMESTAMP
    BUCKET_URI = f"gs://{BUCKET_NAME}"

**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket.

In [None]:
! gsutil mb -l $REGION $BUCKET_URI

Finally, validate access to your Cloud Storage bucket by examining its contents:

In [None]:
! gsutil ls -al $BUCKET_URI

### Set up variables

Next, set up some variables used throughout the tutorial.

### Import libraries and define constants

In [None]:
import google.cloud.aiplatform as aip
import tensorflow as tf
import tensorflow_hub as hub

### Initialize Vertex AI SDK for Python

Initialize the Vertex AI SDK for Python for your project and corresponding bucket.

In [None]:
aip.init(project=PROJECT_ID, staging_bucket=BUCKET_URI)

#### Set hardware accelerators

You can set hardware accelerators for training and prediction.

Set the variables `DEPLOY_GPU/DEPLOY_NGPU` to use a container image supporting a GPU and the number of GPUs allocated to the virtual machine (VM) instance. For example, to use a GPU container image with 4 Nvidia Tesla K80 GPUs allocated to each VM, you would specify:

    (aip.gapic.AcceleratorType.NVIDIA_TESLA_K80, 4)


Otherwise specify `(None, None)` to use a container image to run on a CPU.

Learn more about [hardware accelerator support for your region](https://cloud.google.com/vertex-ai/docs/general/locations#accelerators).

*Note*: TF releases before 2.3 for GPU support will fail to load the custom model in this tutorial. It is a known issue and fixed in TF 2.3. This is caused by static graph ops that are generated in the serving function. If you encounter this issue on your own custom models, use a container image for TF 2.3 with GPU support.

In [None]:
if os.getenv("IS_TESTING_DEPLOY_GPU"):
    DEPLOY_GPU, DEPLOY_NGPU = (
        aip.gapic.AcceleratorType.NVIDIA_TESLA_K80,
        int(os.getenv("IS_TESTING_DEPLOY_GPU")),
    )
else:
    DEPLOY_GPU, DEPLOY_NGPU = (None, None)

#### Set pre-built containers

Set the pre-built Docker container image for prediction.


For the latest list, see [Pre-built containers for prediction](https://cloud.google.com/ai-platform-unified/docs/predictions/pre-built-containers).

In [None]:
if os.getenv("IS_TESTING_TF"):
    TF = os.getenv("IS_TESTING_TF")
else:
    TF = "2.5".replace(".", "-")

if TF[0] == "2":
    if DEPLOY_GPU:
        DEPLOY_VERSION = "tf2-gpu.{}".format(TF)
    else:
        DEPLOY_VERSION = "tf2-cpu.{}".format(TF)
else:
    if DEPLOY_GPU:
        DEPLOY_VERSION = "tf-gpu.{}".format(TF)
    else:
        DEPLOY_VERSION = "tf-cpu.{}".format(TF)

DEPLOY_IMAGE = "{}-docker.pkg.dev/vertex-ai/prediction/{}:latest".format(
    REGION.split("-")[0], DEPLOY_VERSION
)

print("Deployment:", DEPLOY_IMAGE, DEPLOY_GPU, DEPLOY_NGPU)

#### Set machine type

Next, set the machine type to use for prediction.

- Set the variable `DEPLOY_COMPUTE` to configure the compute resources for the VMs you will use for prediction.
 - `machine type`
     - `n1-standard`: 3.75 GB of memory per vCPU
     - `n1-highmem`: 6.5 GB of memory per vCPU
     - `n1-highcpu`: 0.9 GB of memory per vCPU
 - `vCPUs`: number of \[2, 4, 8, 16, 32, 64, 96 \]

*Note: You may also use n2 and e2 machine types for training and deployment, but they do not support GPUs*.
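To get a feel for what a given machine type provides, the per-vCPU figures listed above can be turned into a total memory estimate. This is a minimal sketch: `machine_memory_gb` is a hypothetical helper, and it only knows the three `n1` families named in this section.

```python
# Memory per vCPU in GB, taken from the list in this section.
MEMORY_PER_VCPU_GB = {
    "n1-standard": 3.75,
    "n1-highmem": 6.5,
    "n1-highcpu": 0.9,
}


def machine_memory_gb(machine_type: str) -> float:
    """Estimate total memory for a name such as 'n1-standard-4'."""
    # Split 'n1-standard-4' into the family ('n1-standard') and vCPU count ('4').
    family, _, vcpus = machine_type.rpartition("-")
    return MEMORY_PER_VCPU_GB[family] * int(vcpus)


print(machine_memory_gb("n1-standard-4"))  # 15.0
print(machine_memory_gb("n1-highmem-8"))  # 52.0
```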

In [None]:
if os.getenv("IS_TESTING_DEPLOY_MACHINE"):
    MACHINE_TYPE = os.getenv("IS_TESTING_DEPLOY_MACHINE")
else:
    MACHINE_TYPE = "n1-standard"

VCPU = "4"
DEPLOY_COMPUTE = MACHINE_TYPE + "-" + VCPU
print("Deploy machine type", DEPLOY_COMPUTE)

## Get pretrained model from TensorFlow Hub

For demonstration purposes, this tutorial uses a pretrained model from TensorFlow Hub (TFHub), which is then uploaded to a `Vertex AI Model` resource. Once you have a `Vertex AI Model` resource, the model can be deployed to a `Vertex AI Endpoint` resource.

### Download the pretrained model

First, you download the pretrained model from TensorFlow Hub. The model gets downloaded as a TF.Keras layer. To finalize the model, in this example, you create a `Sequential()` model with the downloaded TFHub model as a layer, and specify the input shape to the model.

In [None]:
tfhub_model = tf.keras.Sequential(
    [hub.KerasLayer("https://tfhub.dev/google/imagenet/resnet_v2_101/classification/5")]
)

tfhub_model.build([None, 224, 224, 3])

tfhub_model.summary()

### Save the model artifacts

At this point, the model is in memory. Next, you save the model artifacts to a Cloud Storage location.

In [None]:
MODEL_DIR = BUCKET_URI + "/model"
tfhub_model.save(MODEL_DIR)

### Upload the TensorFlow Hub model to a `Vertex AI Model` resource

Finally, you upload the model artifacts from the TFHub model into a `Vertex AI Model` resource.

*Note:* When you upload the model artifacts to a `Vertex AI Model` resource, you specify the corresponding deployment container image.

In [None]:
model = aip.Model.upload(
    display_name="example_" + TIMESTAMP,
    artifact_uri=MODEL_DIR,
    serving_container_image_uri=DEPLOY_IMAGE,
)

print(model)

## Creating an `Endpoint` resource

You create an `Endpoint` resource using the `Endpoint.create()` method. At a minimum, you specify the display name for the endpoint. Optionally, you can specify the project and location (region); otherwise these settings are inherited from the values you set when you initialized the Vertex AI SDK with the `init()` method.

In this example, the following parameters are specified:

- `display_name`: A human readable name for the `Endpoint` resource.
- `project`: Your project ID.
- `location`: Your region.
- `labels`: (optional) User defined metadata for the `Endpoint` in the form of key/value pairs.

This method returns an `Endpoint` object.

Learn more about [Vertex AI Endpoints](https://cloud.google.com/vertex-ai/docs/predictions/deploy-model-api).

In [None]:
endpoint = aip.Endpoint.create(
    display_name="example_" + TIMESTAMP,
    project=PROJECT_ID,
    location=REGION,
    labels={"your_key": "your_value"},
)

print(endpoint)

### Get details on an `Endpoint` resource

You can get the underlying details of an `Endpoint` object with the property `gca_resource`.

In [None]:
print(endpoint.gca_resource)

### Listing `Endpoints`

The method `Endpoint.list()` will return a list of all the `Endpoint` resources for your project.

In [None]:
endpoints = aip.Endpoint.list()
print(len(endpoints))

### Listing `Endpoints` using a filter

You can narrow the `Endpoint` resources returned from the `list()` method using the parameter `filter`. The format for the parameter is:

    filter='<endpoint-property>=<value>[AND <endpoint-property>=<value>, ...]'
    
In this example, you filter on a single `Endpoint` property - `display_name`.

If the `list()` method returns multiple `Endpoints`, you can sort them using the parameter `order_by`. The format for the parameter is:

    order_by='<endpoint-property>'
    
In this example, you order the returned list by the `Endpoint` property - `create_time`.
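The filter string can also be assembled from a dictionary of properties. The sketch below simply follows the `<endpoint-property>=<value>` format shown above and joins several conditions with `AND`. `build_filter` is a hypothetical helper, and how values containing spaces or special characters should be quoted is not covered here; this tutorial only exercises the single-property form.

```python
from typing import Dict


def build_filter(properties: Dict[str, str]) -> str:
    """Join property=value pairs with AND, following the format shown above."""
    return " AND ".join(f"{name}={value}" for name, value in properties.items())


single = build_filter({"display_name": "example_20220101120000"})
print(single)
```

The resulting string would be passed as the `filter` argument of `Endpoint.list()`.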

In [None]:
aip.Endpoint.list(filter="display_name=example_" + TIMESTAMP, order_by="create_time")

## Deploying `Model` resources to an `Endpoint` resource

You can deploy one or more `Vertex AI Model` resource instances to the same endpoint. Each `Vertex AI Model` resource that is deployed will have its own deployment container for the serving binary.

*Note:* For this example, you specified the deployment container for the TFHub model in the previous step of uploading the model artifacts to a `Vertex AI Model` resource.

### Deploying a single `Model` resource

In the next example, you deploy a single `Vertex AI Model` resource to a `Vertex AI Endpoint` resource. The `Vertex AI Model` resource already has its deployment container image defined. To deploy, you specify the following additional configuration settings:

- The machine type.
- The (if any) type and number of GPUs.
- Static, manual or auto-scaling of VM instances.

In this example, you deploy the model with the minimal amount of specified parameters, as follows:

- `model`: The `Model` resource.
- `deployed_model_display_name`: The human readable name for the deployed model instance.
- `machine_type`: The machine type for each VM instance.

Due to the requirements to provision the resource, this may take up to a few minutes.

In [None]:
response = endpoint.deploy(
    model=model,
    deployed_model_display_name="example_" + TIMESTAMP,
    machine_type=DEPLOY_COMPUTE,
)

print(endpoint)

#### Get information on the deployed model

You can get the deployment settings of the deployed model from the `Endpoint` resource configuration data `gca_resource.deployed_models`. In this example, only one model is deployed -- hence the reference to the subscript `[0]`.

In [None]:
print(endpoint.gca_resource.deployed_models[0])

### Undeploy `Model` resource from `Endpoint` resource

When a `Model` resource is deployed to an `Endpoint` resource, the deployed `Model` resource instance is assigned an ID -- commonly referred to as the deployed model ID.

You can undeploy a specific `Model` resource instance with the `undeploy()` method, with the following parameters:

- `deployed_model_id`: The ID assigned to the deployed model.
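When several models are deployed, picking the ID by list position is fragile. The sketch below looks up deployed model IDs by display name instead. It runs against a small stand-in class so that it works without a live endpoint; `DeployedModelStub` and `deployed_model_ids` are hypothetical names, and the only assumption about real deployed model entries is that they expose `id` and `display_name`.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class DeployedModelStub:
    """Stand-in exposing the two fields this sketch reads."""

    id: str
    display_name: str


def deployed_model_ids(deployed_models: Iterable, display_name: str) -> List[str]:
    """Return the IDs of all deployed models with the given display name."""
    return [m.id for m in deployed_models if m.display_name == display_name]


models = [
    DeployedModelStub("111", "example_a"),
    DeployedModelStub("222", "example_b"),
    DeployedModelStub("333", "example_a"),
]
print(deployed_model_ids(models, "example_a"))  # ['111', '333']
```

With a live endpoint, the same function could be given `endpoint.gca_resource.deployed_models` as its first argument.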

In [None]:
deployed_model_id = endpoint.gca_resource.deployed_models[0].id
print(deployed_model_id)

endpoint.undeploy(deployed_model_id)

## Configuring GPU resources and scaling

In the next example, you additionally configure the deployed model for hardware accelerators (GPUs) and the number of VM instances, with the following additional parameters:

- `accelerator_type`: The type of hardware accelerator (GPU).
- `accelerator_count`: The number of hardware accelerators per VM instance.
- `min_replica_count`: The minimum number of VM instances for auto-scaling. *Note:* Must be at least one.
- `max_replica_count`: The maximum number of VM instances for auto-scaling.

`Vertex AI Endpoints` support the following types of scaling.

- Single Instance: The min and max replica count are both set to 1.
- Manual Scaling: The min and max replica count are set to the same value greater than 1. In this case, that number of VM instances is provisioned at startup and remains constant.
- Auto Scaling: The max replica count is greater than the min replica count. On startup, the minimum number of VM instances are provisioned, and may dynamically increase up to the max replica count and decrease back down to the min replica count, depending on the load.
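The three cases above can be expressed as a small function over the two replica counts. This is only an illustration of the rules as stated in this section; `scaling_mode` is a hypothetical helper, not an SDK call.

```python
def scaling_mode(min_replica_count: int, max_replica_count: int) -> str:
    """Classify a (min, max) replica pair using the rules listed above."""
    if min_replica_count < 1:
        raise ValueError("min_replica_count must be at least 1")
    if max_replica_count < min_replica_count:
        raise ValueError("max_replica_count must be >= min_replica_count")
    if min_replica_count == max_replica_count == 1:
        return "single instance"
    if min_replica_count == max_replica_count:
        return "manual scaling"
    return "auto scaling"


print(scaling_mode(1, 1))  # single instance
print(scaling_mode(3, 3))  # manual scaling
print(scaling_mode(1, 2))  # auto scaling
```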

In [None]:
MIN_NODES = 1
MAX_NODES = 2


response = endpoint.deploy(
    model=model,
    deployed_model_display_name="example_" + TIMESTAMP,
    machine_type=DEPLOY_COMPUTE,
    accelerator_type=DEPLOY_GPU,
    accelerator_count=DEPLOY_NGPU,
    min_replica_count=MIN_NODES,
    max_replica_count=MAX_NODES,
)

deployed_model_id = endpoint.gca_resource.deployed_models[0].id

### Deploy multiple `Model` resources to an `Endpoint` resource

An `Endpoint` resource can have multiple deployed models. When a prediction request is sent to an `Endpoint`, it is routed to one of the deployed models, based on load balancing. All the deployed models on the same `Endpoint` resource must be homogeneous -- i.e., they must accept the same input vector and produce the same output.

When you deploy more than one `Model` resource to an `Endpoint`, you specify how to split the prediction traffic between the deployed `Model` resources. A common use for this is a `canary rollout`, where you have a new production version of a model and incrementally roll it out while observing that the new model has no negative consequences.

The parameter `traffic_split` has the format:

    { "0": percent, deployed_model_id: percent, ... }

The key "0" refers to the model to be deployed. In this example, it will get 10 percent. Each subsequent key/value pair refers to the existing deployed models, where the key is the deployed model ID and the value is the new percentage for that model. All the values must add up to 100 (100 percent).
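The sketch below builds such a dictionary for the common case of one existing deployed model plus one new model, and checks that the percentages add up to 100. Both helpers are hypothetical and written only for illustration; the ID `"1234"` is a made-up deployed model ID.

```python
from typing import Dict


def validate_traffic_split(traffic_split: Dict[str, int]) -> None:
    """Raise ValueError unless every value is 0-100 and the values total 100."""
    for key, percent in traffic_split.items():
        if not isinstance(percent, int) or not 0 <= percent <= 100:
            raise ValueError(f"invalid percentage for {key!r}: {percent!r}")
    total = sum(traffic_split.values())
    if total != 100:
        raise ValueError(f"percentages add up to {total}, expected 100")


def canary_split(deployed_model_id: str, canary_percent: int) -> Dict[str, int]:
    """Give the model being deployed (key "0") `canary_percent` of the traffic."""
    split = {"0": canary_percent, deployed_model_id: 100 - canary_percent}
    validate_traffic_split(split)
    return split


print(canary_split("1234", 10))  # {'0': 10, '1234': 90}
```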

In [None]:
response = endpoint.deploy(
    model=model,
    deployed_model_display_name="example_" + TIMESTAMP,
    machine_type=DEPLOY_COMPUTE,
    traffic_split={"0": 10, deployed_model_id: 90},
)

print(endpoint.gca_resource.deployed_models)

### Reconfigure traffic split

Currently, reconfiguring the traffic split is not supported in the SDK (but will be in the future). You can dynamically reconfigure the traffic split using the GAPIC API interface. An example of dynamically reconfiguring the traffic split is for a progressive canary rollout.

To do this, you:

1. Create a client interface with the call `gapic.EndpointServiceClient()`
2. Obtain the GAPIC object for the corresponding endpoint using the method `get_endpoint()`
3. Update the traffic split in the in-memory GAPIC endpoint object.
4. Issue the method `update_endpoint()` to dynamically update the traffic split settings on the load balancer for the `Endpoint` resource.
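Step 3 amounts to computing a new dictionary from the current one. A minimal sketch of that computation, independent of the GAPIC client, is shown here. `shift_traffic` is a hypothetical helper, and the IDs `"111"` and `"222"` are made-up deployed model IDs.

```python
from typing import Dict


def shift_traffic(
    traffic_split: Dict[str, int], source_id: str, target_id: str, percent: int
) -> Dict[str, int]:
    """Return a copy with `percent` points moved from source_id to target_id."""
    if percent < 0 or traffic_split[source_id] < percent:
        raise ValueError("cannot move that much traffic from the source model")
    updated = dict(traffic_split)  # leave the caller's dictionary untouched
    updated[source_id] -= percent
    updated[target_id] = updated.get(target_id, 0) + percent
    return updated


current = {"111": 90, "222": 10}
print(shift_traffic(current, "111", "222", 10))  # {'111': 80, '222': 20}
```

Because the total is preserved, repeated calls can drive a progressive rollout one step at a time, with each result pushed to the `Endpoint` resource through `update_endpoint()`.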

#### Create the client interface

First, you create the GAPIC client interface.

In [None]:
# API service endpoint
API_ENDPOINT = "{}-aiplatform.googleapis.com".format(REGION)

# Vertex location root path for your dataset, model and endpoint resources
PARENT = "projects/" + PROJECT_ID + "/locations/" + REGION

# client options same for all services
client_options = {"api_endpoint": API_ENDPOINT}


def create_endpoint_client():
    client = aip.gapic.EndpointServiceClient(client_options=client_options)
    return client


clients = {}
clients["endpoint"] = create_endpoint_client()

traffic_split = endpoint.traffic_split
print(traffic_split)
new_traffic_split = {}
for key, value in traffic_split.items():
    if value == 90:
        value = 80
    else:
        value = 20
    new_traffic_split[key] = value
print(new_traffic_split)

#### Update the traffic split

Next, you set the new traffic split on the in-memory GAPIC endpoint, and then push it out to the load balancer on the `Endpoint` resource using the `update_endpoint()` method.

In [None]:
from google.protobuf.field_mask_pb2 import FieldMask

gapic_endpoint = clients["endpoint"].get_endpoint(name=endpoint.resource_name)

gapic_endpoint.traffic_split = new_traffic_split
gapic_endpoint.deployed_models = []

clients["endpoint"].update_endpoint(
    endpoint=gapic_endpoint, update_mask=FieldMask(paths=["traffic_split"])
)

# refetch the endpoint
gapic_endpoint = clients["endpoint"].get_endpoint(name=endpoint.resource_name)
print(gapic_endpoint.traffic_split)

### Undeploy a single `Model` resource from an `Endpoint` resource

Next, you undeploy a single `Model` instance from an `Endpoint` resource with multiple deployed `Model` instances.

In [None]:
model_0 = endpoint.gca_resource.deployed_models[0].id
model_1 = endpoint.gca_resource.deployed_models[1].id

endpoint.undeploy(model_0)
print(endpoint.gca_resource.deployed_models)

### Undeploy all `Model` resources from an `Endpoint` resource

Finally, you can undeploy all `Model` instances from an `Endpoint` resource using the `undeploy_all()` method.

In [None]:
endpoint.undeploy_all()

### Delete an `Endpoint` resource

You can delete an `Endpoint` resource with the `delete()` method if the `Endpoint` resource has no deployed models.

In [None]:
endpoint.delete()

## `Endpoint` as part of a `Vertex AI Pipeline`

The next sections demonstrate how to use an `Endpoint` resource in a `Vertex AI Pipeline`.

### Create `Endpoint` in pipeline

In this pipeline, you create an `Endpoint` resource, and then you deploy a `Model` resource to the `Endpoint` resource. The `Model` resource to deploy is your existing TFHub model, which you previously imported as a `Model` resource. The steps are:

- For pipeline parameters, pass the resource name for the existing `Model` resource.
- Use the `GetVertexModelOp()` component to create a `VertexModel` pipeline artifact for the model.
- Create an `Endpoint` resource.
- Using the `VertexModel` pipeline artifact, deploy the `Model` resource to the `Endpoint` resource.

In [None]:
from kfp import dsl
from kfp.v2 import compiler
from kfp.v2.dsl import Artifact, Output, component

PIPELINE_ROOT = "{}/pipeline_root/endpoint_example".format(BUCKET_URI)


@dsl.pipeline(
    name="create-endpoint-deploy-model",
    description="create an endpoint and deploy a model",
)
def pipeline(
    display_name: str,
    resource_name: str,
    project: str = PROJECT_ID,
    region: str = REGION,
):
    from google_cloud_pipeline_components.experimental.evaluation import \
        GetVertexModelOp
    from google_cloud_pipeline_components.v1.endpoint import (EndpointCreateOp,
                                                              ModelDeployOp)

    model = GetVertexModelOp(model_resource_name=resource_name)

    endpoint_op = EndpointCreateOp(
        project=project,
        location=region,
        display_name=display_name,
    )

    _ = ModelDeployOp(
        model=model.outputs["model"],
        endpoint=endpoint_op.outputs["endpoint"],
        dedicated_resources_min_replica_count=1,
        dedicated_resources_max_replica_count=1,
        dedicated_resources_machine_type=DEPLOY_COMPUTE,
    )


try:
    compiler.Compiler().compile(
        pipeline_func=pipeline, package_path="create_endpoint_and_deploy_model.json"
    )
except Exception as e:
    print(e)

### Execute the pipeline

Next, you execute the pipeline. The pipeline takes the following parameters, which are passed as the dictionary `parameter_values`:

- `display_name`: The display name for the generated Vertex AI resources.
- `resource_name`: The resource name of the existing `Model` resource.
- `project`: The project ID.
- `region`: The region.

In [None]:
try:
    pipeline = aip.PipelineJob(
        display_name="create-endpoint-deploy-pipeline",
        template_path="create_endpoint_and_deploy_model.json",
        pipeline_root=PIPELINE_ROOT,
        parameter_values={
            "display_name": "create_endpoint_and_deploy_model_" + TIMESTAMP,
            "resource_name": model.resource_name,
            "project": PROJECT_ID,
            "region": REGION,
        },
        enable_caching=False,
    )

    pipeline.run()

    ! rm -f create_endpoint_and_deploy_model.json
except Exception as e:
    print(e)

### View the pipeline results

Finally, you will view the artifact outputs of each task in the pipeline.
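The lookup in this section assumes that each task writes its outputs under a path built from the pipeline root, the project number, the job ID, and the task name and ID. The sketch below isolates that path construction. `task_output_uri` is a hypothetical helper, the layout mirrors the one used by this tutorial's code, and the sample values are made up.

```python
def task_output_uri(
    pipeline_root: str,
    project_number: str,
    job_id: str,
    task_name: str,
    task_id: int,
    leaf: str,
) -> str:
    """Build <root>/<project>/<job>/<task_name>_<task_id>/<leaf>."""
    return f"{pipeline_root}/{project_number}/{job_id}/{task_name}_{task_id}/{leaf}"


uri = task_output_uri(
    "gs://my-bucket/pipeline_root/endpoint_example",  # made-up bucket
    "123456789",  # made-up project number
    "create-endpoint-deploy-model-20220101120000",  # made-up job ID
    "endpoint-create",
    42,
    "executor_output.json",
)
print(uri)
```

The same function covers the `gcp_resources` and `evaluation_metrics` files by changing the `leaf` argument.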

In [None]:
import json


def print_pipeline_output(job, output_task_name):
    PROJECT_NUMBER = job.gca_resource.name.split("/")[1]
    print(PROJECT_NUMBER)

    JOB_ID = job.name
    print(JOB_ID)
    for task in job.gca_resource.job_detail.task_details:
        TASK_ID = task.task_id
        EXECUTE_OUTPUT = (
            PIPELINE_ROOT
            + "/"
            + PROJECT_NUMBER
            + "/"
            + JOB_ID
            + "/"
            + output_task_name
            + "_"
            + str(TASK_ID)
            + "/executor_output.json"
        )
        GCP_RESOURCES = (
            PIPELINE_ROOT
            + "/"
            + PROJECT_NUMBER
            + "/"
            + JOB_ID
            + "/"
            + output_task_name
            + "_"
            + str(TASK_ID)
            + "/gcp_resources"
        )
        EVAL_METRICS = (
            PIPELINE_ROOT
            + "/"
            + PROJECT_NUMBER
            + "/"
            + JOB_ID
            + "/"
            + output_task_name
            + "_"
            + str(TASK_ID)
            + "/evaluation_metrics"
        )
        if tf.io.gfile.exists(EXECUTE_OUTPUT):
            ! gsutil cat $EXECUTE_OUTPUT
            return EXECUTE_OUTPUT
        elif tf.io.gfile.exists(GCP_RESOURCES):
            ! gsutil cat $GCP_RESOURCES
            return GCP_RESOURCES
        elif tf.io.gfile.exists(EVAL_METRICS):
            ! gsutil cat $EVAL_METRICS
            return EVAL_METRICS

    return None


try:
    print("endpoint-create")
    artifacts = print_pipeline_output(pipeline, "endpoint-create")
    print("\n\n")
    output = !gsutil cat $artifacts
    output = json.loads(output[0])
    endpoint_id = output["artifacts"]["endpoint"]["artifacts"][0]["metadata"][
        "resourceName"
    ]
    print("model-deploy")
    artifacts = print_pipeline_output(pipeline, "model-deploy")
    print("\n\n")
except Exception as e:
    print(e)

#### Clean up pipeline resources

Delete all the `Vertex AI` resources created for this example -- except for the `Endpoint` resource, which you use in the next example.

In [None]:
try:
    endpoint = aip.Endpoint(endpoint_id)
    endpoint.undeploy_all()

    pipeline.delete()
except Exception as e:
    print(e)

### Pass existing `Endpoint` into a pipeline

In this pipeline, you deploy a `Model` resource to the `Endpoint` resource. The `Model` resource to deploy is your existing TFHub model, which you previously imported as a `Model` resource. The `Endpoint` resource is the existing `Endpoint` from the previous pipeline example. The steps are:

- For pipeline parameters, pass the resource names and resource URIs for the existing `Model` and `Endpoint` resources.
- Use the `importer_node()` component to create a `VertexEndpoint` pipeline artifact for the endpoint.
- Use the `GetVertexModelOp()` component to create a `VertexModel` pipeline artifact for the model.
- Using the `VertexModel` and `VertexEndpoint` pipeline artifacts, deploy the `Model` resource to the `Endpoint` resource.

*Note:* The `importer_node()` step is currently blocked by internal issue b/219835305, so the code uses a small custom component, `return_unmanaged_endpoint`, as a workaround.
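The endpoint resource URI passed to the pipeline combines the regional API host with the endpoint's resource name. A minimal sketch of that construction is shown here; it reads the region from the resource name so the host always matches the endpoint's location. `endpoint_resource_uri` is a hypothetical helper, and the project and endpoint IDs in the example are made up.

```python
def endpoint_resource_uri(resource_name: str) -> str:
    """Build the v1 REST URI for an endpoint resource name."""
    parts = resource_name.split("/")
    # Expected form: projects/<project>/locations/<region>/endpoints/<id>
    if len(parts) != 6 or parts[0::2] != ["projects", "locations", "endpoints"]:
        raise ValueError(f"unexpected endpoint resource name: {resource_name!r}")
    region = parts[3]
    return f"https://{region}-aiplatform.googleapis.com/v1/{resource_name}"


name = "projects/123/locations/europe-west4/endpoints/456"
print(endpoint_resource_uri(name))
```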

In [None]:
PIPELINE_ROOT = "{}/pipeline_root/endpoint_example_2".format(BUCKET_URI)


# (WORKAROUND  b/219835305)
@component(
    base_image="python:3.9",
    packages_to_install=["google-cloud-aiplatform"],
)
def return_unmanaged_endpoint(resource_name: str, endpoint: Output[Artifact]):

    endpoint.metadata["resourceName"] = resource_name


@dsl.pipeline(
    name="deploy-model-existing-endpoint",
    description="deploy a model to an existing endpoint",
)
def pipeline(
    display_name: str,
    model_resource_name: str,
    endpoint_resource_uri: str,
    endpoint_resource_name: str,
    project: str = PROJECT_ID,
    region: str = REGION,
):
    from google_cloud_pipeline_components.experimental.evaluation import \
        GetVertexModelOp
    from google_cloud_pipeline_components.v1.endpoint import ModelDeployOp

    # Desired sequence: blocked by b/219835305
    """
    from kfp.v2.components import importer_node
    from google_cloud_pipeline_components.types import artifact_types
    """

    model = GetVertexModelOp(model_resource_name=model_resource_name)

    # Desired sequence: blocked by b/219835305
    """
    endpoint = importer_node.importer(
        artifact_uri=endpoint_resource_uri,
        artifact_class=artifact_types.VertexEndpoint,
        metadata={"resourceName": endpoint_resource_name},
    )
    """

    # (WORKAROUND  b/219835305)
    endpoint = return_unmanaged_endpoint(resource_name=endpoint_resource_name)

    _ = ModelDeployOp(
        model=model.outputs["model"],
        endpoint=endpoint.outputs["endpoint"],
        dedicated_resources_min_replica_count=1,
        dedicated_resources_max_replica_count=1,
        dedicated_resources_machine_type=DEPLOY_COMPUTE,
    )


try:
    compiler.Compiler().compile(
        pipeline_func=pipeline, package_path="deploy_model_existing_endpoint.json"
    )
except Exception as e:
    print(e)

### Execute the pipeline

Next, you execute the pipeline. The pipeline takes the following parameters, which are passed as the dictionary `parameter_values`:

- `display_name`: The display name for the generated Vertex AI resources.
- `model_resource_name`: The resource name of the existing `Model` resource.
- `endpoint_resource_name`: The resource name of the existing `Endpoint` resource.
- `endpoint_resource_uri`: The resource uri of the existing `Endpoint` resource.
- `project`: The project ID.
- `region`: The region.

In [None]:
try:
    pipeline = aip.PipelineJob(
        display_name="deploy-model-existing-endpoint",
        template_path="deploy_model_existing_endpoint.json",
        pipeline_root=PIPELINE_ROOT,
        parameter_values={
            "display_name": "deploy_model_existing_endpoint_" + TIMESTAMP,
            "model_resource_name": model.resource_name,
            "endpoint_resource_name": endpoint.resource_name,
            # Use the configured region rather than a hard-coded one
            "endpoint_resource_uri": f"https://{REGION}-aiplatform.googleapis.com/v1/"
            + endpoint.resource_name,
            "project": PROJECT_ID,
            "region": REGION,
        },
    )

    pipeline.run()

    ! rm -f deploy_model_existing_endpoint.json
except Exception as e:
    print(e)

### View the pipeline results

Finally, you will view the artifact outputs of each task in the pipeline.

In [None]:
try:
    print("model-deploy")
    artifacts = print_pipeline_output(pipeline, "model-deploy")
    print("\n\n")
except Exception as e:
    print(e)

#### Clean up pipeline resources

Delete all the `Vertex AI` resources created for this example.

In [None]:
try:
    endpoint.undeploy_all()
    endpoint.delete()
    pipeline.delete()
except Exception as e:
    print(e)

## Cleaning up

To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud
project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.

Otherwise, you can delete the individual resources you created in this tutorial:

In [None]:
delete_bucket = False
delete_model = True

if delete_model:
    try:
        model.delete()
    except Exception as e:
        print(e)

if delete_bucket or os.getenv("IS_TESTING"):
    ! gsutil rm -rf {BUCKET_URI}