In [None]:
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# E2E ML on GCP: MLOps stage 6 : Get started with Optimized TensorFlow Enterprise container with Vertex AI Prediction / text models

<table align="left">
  <td>
    <a href="https://colab.research.google.com/github/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/community/ml_ops/stage6/get_started_with_optimized_tfe_bert.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Colab logo"> Run in Colab
    </a>
  </td>
  <td>
    <a href="https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/community/ml_ops/stage6/get_started_with_optimized_tfe_bert.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo">
      View on GitHub
    </a>
  </td>
  <td>
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/main/notebooks/community/ml_ops/stage6/get_started_with_optimized_tfe_bert.ipynb">
      <img src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo">
      Open in Vertex AI Workbench
    </a>
  </td>       
</table>
<br/><br/><br/>

## Overview

This tutorial demonstrates how to upload and deploy a TensorFlow text model to a `Vertex AI Endpoint` with a `TensorFlow Enterprise (TFE)` run-time optimized container.

While the optimizations can be applied to any TensorFlow model, the optimizations work best for tabular and text model types.

*Note:* Training the model takes about 15 minutes on a T4 GPU. Training on a CPU takes several hours.

### Objective

In this tutorial, you learn how to use the `TensorFlow Enterprise Optimized` container for TensorFlow models deployed to a `Vertex AI Endpoint` resource.

This tutorial uses the following Google Cloud ML services and resources:

- `Vertex AI Prediction`
- `Vertex AI Models`
- `Vertex AI Endpoints`
- `TensorFlow Enterprise Optimized` container

The steps performed include:

- Download a pretrained BERT model from TensorFlow Hub.
- Fine-tune (transfer learning) the BERT model as a binary classifier.
- Upload the TensorFlow Hub model as a `Vertex AI Model` resource with the standard TensorFlow serving container.
- Upload the TensorFlow Hub model as a `Vertex AI Model` resource with the TensorFlow Enterprise Optimized container.
- Create two `Endpoint` resources.
- Deploy both `Model` resources to separate `Endpoint` resources.
- Make the same online prediction requests to both `Model` resource instances deployed to the `Endpoint` resources.
- Compare the prediction accuracy between the two deployed `Model` resources.
- Configure container settings for fine-grained control of optimizations.
- Create a `Private Endpoint` resource.
- Deploy the `Model` resource with the `TensorFlow Enterprise Optimized` container to the `Private Endpoint` resource.
- Make an online prediction request to the `Private Endpoint` resource.

Learn more about the [TensorFlow Enterprise Optimized container](https://cloud.google.com/vertex-ai/docs/predictions/optimized-tensorflow-runtime).

### Dataset

This tutorial uses a pretrained BERT text model from TensorFlow Hub, which is then fine-tuned (transfer learning) on the [Large Movie Review Dataset](https://ai.stanford.edu/~amaas/data/sentiment/), which contains 50,000 movie reviews from the [Internet Movie Database](https://www.imdb.com/). The trained model predicts whether a movie review is positive or negative.

Learn more about the [BERT pretrained encoder model](https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/3).

### Costs

This tutorial uses billable components of Google Cloud:

* Vertex AI
* Cloud Storage

Learn about [Vertex AI
pricing](https://cloud.google.com/vertex-ai/pricing) and [Cloud Storage
pricing](https://cloud.google.com/storage/pricing), and use the [Pricing
Calculator](https://cloud.google.com/products/calculator/)
to generate a cost estimate based on your projected usage.

## Installation

Install the following packages to execute this notebook.

In [None]:
import os

# The Vertex AI Workbench Notebook product has specific requirements
IS_WORKBENCH_NOTEBOOK = os.getenv("DL_ANACONDA_HOME")
IS_USER_MANAGED_WORKBENCH_NOTEBOOK = os.path.exists(
    "/opt/deeplearning/metadata/env_version"
)

# Vertex AI Notebook requires dependencies to be installed with '--user'
USER_FLAG = ""
if IS_WORKBENCH_NOTEBOOK:
    USER_FLAG = "--user"

! pip3 install --upgrade google-cloud-aiplatform $USER_FLAG -q
! pip3 install --upgrade google-cloud-pipeline-components $USER_FLAG -q
! pip3 install tensorflow tensorflow-text $USER_FLAG -q
! pip3 install tensorflow-hub $USER_FLAG -q
! pip3 install tf-models-official $USER_FLAG -q

### Restart the kernel

After you install the additional packages, you need to restart the notebook kernel so it can find the packages.

In [None]:
# Automatically restart kernel after installs
import os

if not os.getenv("IS_TESTING"):
    import IPython

    app = IPython.Application.instance()
    app.kernel.do_shutdown(True)

## Before you begin

### GPU runtime

*Make sure you're running this notebook in a GPU runtime if you have that option. In Colab, select* **Runtime > Change Runtime Type > GPU**

### Set up your Google Cloud project

**The following steps are required, regardless of your notebook environment.**

1. [Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.

2. [Make sure that billing is enabled for your project.](https://cloud.google.com/billing/docs/how-to/modify-project)

3. [Enable the following APIs: Vertex AI APIs, Compute Engine APIs, and Cloud Storage.](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com,compute_component,storage-component.googleapis.com)

4. If you are running this notebook locally, you need to install the [Cloud SDK](https://cloud.google.com/sdk).

5. Enter your project ID in the cell below. Then run the cell to make sure the
Cloud SDK uses the right project for all the commands in this notebook.

**Note**: Jupyter runs lines prefixed with `!` as shell commands, and it interpolates Python variables prefixed with `$`.

#### Set your project ID

**If you don't know your project ID**, you may be able to get your project ID using `gcloud`.

In [None]:
PROJECT_ID = "[your-project-id]"  # @param {type:"string"}

In [None]:
if PROJECT_ID == "" or PROJECT_ID is None or PROJECT_ID == "[your-project-id]":
    # Get your GCP project id from gcloud
    shell_output = ! gcloud config list --format 'value(core.project)' 2>/dev/null
    PROJECT_ID = shell_output[0]
    print("Project ID:", PROJECT_ID)

In [None]:
! gcloud config set project $PROJECT_ID

#### Region

You can also change the `REGION` variable, which is used for operations
throughout the rest of this notebook.  Below are regions supported for Vertex AI. We recommend that you choose the region closest to you.

- Americas: `us-central1`
- Europe: `europe-west4`
- Asia Pacific: `asia-east1`

You may not use a multi-regional bucket for training with Vertex AI. Not all regions provide support for all Vertex AI services.

Learn more about [Vertex AI regions](https://cloud.google.com/vertex-ai/docs/general/locations).

In [None]:
REGION = "[your-region]"  # @param {type: "string"}

if REGION == "[your-region]":
    REGION = "us-central1"

#### Timestamp

If you are in a live tutorial session, you might be using a shared test account or project. To avoid name collisions between users on resources created, you create a timestamp for each instance session, and append the timestamp onto the name of resources you create in this tutorial.

In [None]:
from datetime import datetime

TIMESTAMP = datetime.now().strftime("%Y%m%d%H%M%S")
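
As a minimal sketch of how the timestamp is used later, the helper below appends a second-resolution timestamp to a resource name prefix. The function name `unique_name` is hypothetical, and the fixed datetime is used only to make the example reproducible.

```python
from datetime import datetime


def unique_name(prefix: str, now: datetime) -> str:
    """Append a second-resolution timestamp to a resource name prefix."""
    return prefix + now.strftime("%Y%m%d%H%M%S")


# A fixed datetime (illustrative only) keeps the result reproducible.
example_name = unique_name("standard_", datetime(2022, 6, 1, 12, 30, 45))
print(example_name)
```

Two sessions started in the same second would still collide, so this is a convenience for shared tutorial projects rather than a uniqueness guarantee.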

### Authenticate your Google Cloud account

**If you are using Google Cloud Notebooks**, your environment is already authenticated. Skip this step.

**If you are using Colab**, run the cell below and follow the instructions when prompted to authenticate your account via OAuth.

**Otherwise**, follow these steps:

In the Cloud Console, go to the [Create service account key](https://console.cloud.google.com/apis/credentials/serviceaccountkey) page.

**Click Create service account**.

In the **Service account name** field, enter a name, and click **Create**.

In the **Grant this service account access to project** section, click the Role drop-down list. Type "Vertex" into the filter box, and select **Vertex Administrator**. Type "Storage Object Admin" into the filter box, and select **Storage Object Admin**.

Click Create. A JSON file that contains your key downloads to your local environment.

Enter the path to your service account key as the GOOGLE_APPLICATION_CREDENTIALS variable in the cell below and run the cell.

In [None]:
# If you are running this notebook in Colab, run this cell and follow the
# instructions to authenticate your GCP account. This provides access to your
# Cloud Storage bucket and lets you submit training jobs and prediction
# requests.

import os
import sys

# If on Vertex AI Workbench, then don't execute this code
IS_COLAB = False
if not os.path.exists("/opt/deeplearning/metadata/env_version") and not os.getenv(
    "DL_ANACONDA_HOME"
):
    if "google.colab" in sys.modules:
        IS_COLAB = True
        from google.colab import auth as google_auth

        google_auth.authenticate_user()

    # If you are running this notebook locally, replace the string below with the
    # path to your service account key and run this cell to authenticate your GCP
    # account.
    elif not os.getenv("IS_TESTING"):
        %env GOOGLE_APPLICATION_CREDENTIALS ''

### Create a Cloud Storage bucket

**The following steps are required, regardless of your notebook environment.**

When you initialize the Vertex AI SDK for Python, you specify a Cloud Storage staging bucket. The staging bucket is where all the data associated with your dataset and model resources are retained across sessions.

Set the name of your Cloud Storage bucket below. Bucket names must be globally unique across all Google Cloud projects, including those outside of your organization.

In [None]:
BUCKET_NAME = "[your-bucket-name]"  # @param {type:"string"}
BUCKET_URI = f"gs://{BUCKET_NAME}"

In [None]:
if BUCKET_URI == "" or BUCKET_URI is None or BUCKET_URI == "gs://[your-bucket-name]":
    BUCKET_URI = "gs://" + PROJECT_ID + "aip-" + TIMESTAMP
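
Before creating the bucket, it can help to check that the derived name looks valid. The sketch below is a simplified check based on the basic Cloud Storage naming rules (3 to 63 characters; lowercase letters, digits, dashes, underscores, and dots; starting and ending with a letter or digit). It does not cover every rule, and the helper names are hypothetical.

```python
import re

# Simplified pattern: 3-63 characters, lowercase letters, digits, dashes,
# underscores and dots, starting and ending with a letter or digit.
_BUCKET_NAME_RE = re.compile(r"[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]")


def bucket_name_from_uri(uri: str) -> str:
    """Strip the gs:// scheme from a bucket URI, if present."""
    return uri[len("gs://"):] if uri.startswith("gs://") else uri


def looks_like_valid_bucket_name(name: str) -> bool:
    """Return True if the name passes the simplified naming check."""
    return _BUCKET_NAME_RE.fullmatch(name) is not None


# The project ID below is an illustrative placeholder.
print(looks_like_valid_bucket_name(
    bucket_name_from_uri("gs://my-projectaip-20220601123045")
))
print(looks_like_valid_bucket_name(bucket_name_from_uri("gs://[your-bucket-name]")))
```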

**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket.

In [None]:
! gsutil mb -l $REGION $BUCKET_URI

Finally, validate access to your Cloud Storage bucket by examining its contents:

In [None]:
! gsutil ls -al $BUCKET_URI

### Set up variables

Next, set up some variables used throughout the tutorial.

### Import libraries and define constants

In [None]:
import google.cloud.aiplatform as aip
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_text  # noqa: F401  (registers the ops used by the BERT preprocessor)
from official.nlp import optimization  # to create AdamW optimizer

### Initialize Vertex AI SDK for Python

Initialize the Vertex AI SDK for Python for your project and corresponding bucket.

In [None]:
aip.init(project=PROJECT_ID, staging_bucket=BUCKET_URI)

#### Set hardware accelerators

You can set hardware accelerators for training and prediction.

Set the variables `DEPLOY_GPU/DEPLOY_NGPU` to use a container image supporting a GPU and the number of GPUs allocated to the virtual machine (VM) instance. For example, to use a GPU container image with 4 Nvidia Tesla K80 GPUs allocated to each VM, you would specify:

    (aip.gapic.AcceleratorType.NVIDIA_TESLA_K80, 4)


Otherwise specify `(None, None)` to use a container image to run on a CPU.

Learn more about [hardware accelerator support for your region](https://cloud.google.com/vertex-ai/docs/general/locations#accelerators).

*Note*: TF releases before 2.3 for GPU support will fail to load the custom model in this tutorial. It is a known issue and fixed in TF 2.3. This is caused by static graph ops that are generated in the serving function. If you encounter this issue on your own custom models, use a container image for TF 2.3 with GPU support.

In [None]:
if os.getenv("IS_TESTING_DEPLOY_GPU"):
    DEPLOY_GPU, DEPLOY_NGPU = (
        aip.gapic.AcceleratorType.NVIDIA_TESLA_K80,
        int(os.getenv("IS_TESTING_DEPLOY_GPU")),
    )
else:
    DEPLOY_GPU, DEPLOY_NGPU = (None, None)

#### Set pre-built containers

Set the pre-built Docker container image for prediction.


For the latest list, see [Pre-built containers for prediction](https://cloud.google.com/ai-platform-unified/docs/predictions/pre-built-containers).

In [None]:
if os.getenv("IS_TESTING_TF"):
    TF = os.getenv("IS_TESTING_TF")
else:
    TF = "2.5".replace(".", "-")

if TF[0] == "2":
    if DEPLOY_GPU:
        DEPLOY_VERSION = "tf2-gpu.{}".format(TF)
    else:
        DEPLOY_VERSION = "tf2-cpu.{}".format(TF)
else:
    if DEPLOY_GPU:
        DEPLOY_VERSION = "tf-gpu.{}".format(TF)
    else:
        DEPLOY_VERSION = "tf-cpu.{}".format(TF)

DEPLOY_IMAGE = "{}-docker.pkg.dev/vertex-ai/prediction/{}:latest".format(
    REGION.split("-")[0], DEPLOY_VERSION
)

print("Deployment:", DEPLOY_IMAGE, DEPLOY_GPU, DEPLOY_NGPU)
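
The image URI logic above can be condensed into a small function. This is a sketch that mirrors the cell above; the function name `build_deploy_image` is hypothetical, and the function only builds the string without checking that the image exists.

```python
def build_deploy_image(region: str, tf_version: str, use_gpu: bool) -> str:
    """Build a pre-built prediction container URI following the pattern above."""
    tf = tf_version.replace(".", "-")             # "2.5" -> "2-5"
    family = "tf2" if tf.startswith("2") else "tf"
    device = "gpu" if use_gpu else "cpu"
    multi_region = region.split("-")[0]           # "us-central1" -> "us"
    return (
        f"{multi_region}-docker.pkg.dev/vertex-ai/prediction/"
        f"{family}-{device}.{tf}:latest"
    )


print(build_deploy_image("us-central1", "2.5", use_gpu=False))
```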

#### Set machine type

Next, set the machine type to use for prediction.

- Set the variable `DEPLOY_COMPUTE` to configure the compute resources for the VMs you will use for prediction.
 - `machine type`
     - `n1-standard`: 3.75GB of memory per vCPU.
     - `n1-highmem`: 6.5GB of memory per vCPU
     - `n1-highcpu`: 0.9 GB of memory per vCPU
 - `vCPUs`: number of \[2, 4, 8, 16, 32, 64, 96 \]

*Note: You may also use n2 and e2 machine types for training and deployment, but they do not support GPUs*.

In [None]:
if os.getenv("IS_TESTING_DEPLOY_MACHINE"):
    MACHINE_TYPE = os.getenv("IS_TESTING_DEPLOY_MACHINE")
else:
    MACHINE_TYPE = "n1-standard"

VCPU = "4"
DEPLOY_COMPUTE = MACHINE_TYPE + "-" + VCPU
print("Deploy machine type", DEPLOY_COMPUTE)
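
To get a feel for what a machine type string implies, the sketch below estimates total memory from the per-vCPU figures listed above. The helper name `total_memory_gb` is hypothetical, and the figures are taken from the list in this section.

```python
# Memory per vCPU (GB), as listed in this section.
MEMORY_GB_PER_VCPU = {
    "n1-standard": 3.75,
    "n1-highmem": 6.5,
    "n1-highcpu": 0.9,
}


def total_memory_gb(machine: str) -> float:
    """Estimate total memory for a machine string such as 'n1-standard-4'."""
    family, _, vcpus = machine.rpartition("-")
    return MEMORY_GB_PER_VCPU[family] * int(vcpus)


print(total_memory_gb("n1-standard-4"))
```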

## Get pretrained encoder embedding model from TensorFlow Hub

For demonstration purposes, this tutorial uses a pretrained BERT encoder model from TensorFlow Hub (TFHub) and fine-tunes it as a binary classifier. The fine-tuned model is then uploaded as a `Vertex AI Model` resource. Once you have a `Vertex AI Model` resource, the model can be deployed to a `Vertex AI Endpoint` resource.

### Download the pretrained encoder model

First, you download the pretrained BERT encoder model and the corresponding BERT text preprocessor from TensorFlow Hub. The encoder and text preprocessor are each downloaded as a TF.Keras layer. To finalize the model in this example, you create a `Functional` model from the downloaded TFHub preprocessor and encoder, and add a classifier layer to complete the model as a binary classifier.

In [None]:
inputs = tf.keras.layers.Input(shape=(), dtype=tf.string)
preprocess = hub.KerasLayer("https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3")
encoder = hub.KerasLayer(
    "https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/3", trainable=True
)
classifier = tf.keras.layers.Dense(
    1, activation="sigmoid", kernel_regularizer=tf.keras.regularizers.l2(0.001)
)

x = preprocess(inputs)
x = encoder(x)
outputs = classifier(x["pooled_output"])
tfhub_model = tf.keras.Model(inputs, outputs)

tfhub_model.summary()

### Download the dataset

Download the [Large Movie Review Dataset](https://ai.stanford.edu/~amaas/data/sentiment/) from the [Internet Movie Database](https://www.imdb.com/). 

In [None]:
import shutil

url = "https://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz"

dataset = tf.keras.utils.get_file(
    "aclImdb_v1.tar.gz", url, untar=True, cache_dir=".", cache_subdir=""
)

dataset_dir = os.path.join(os.path.dirname(dataset), "aclImdb")

train_dir = os.path.join(dataset_dir, "train")

# remove unused folders to make it easier to load the data
remove_dir = os.path.join(train_dir, "unsup")
shutil.rmtree(remove_dir)

### Preprocess the dataset

The IMDB dataset has already been divided into train and test sets, but it lacks a validation set. To create a validation set, you use an 80:20 split of the training data by using the `validation_split` argument.

In [None]:
AUTOTUNE = tf.data.AUTOTUNE
batch_size = 32
seed = 42

raw_train_ds = tf.keras.preprocessing.text_dataset_from_directory(
    "aclImdb/train",
    batch_size=batch_size,
    validation_split=0.2,
    subset="training",
    seed=seed,
)

class_names = raw_train_ds.class_names
train_ds = raw_train_ds.cache().prefetch(buffer_size=AUTOTUNE)

val_ds = tf.keras.preprocessing.text_dataset_from_directory(
    "aclImdb/train",
    batch_size=batch_size,
    validation_split=0.2,
    subset="validation",
    seed=seed,
)

val_ds = val_ds.cache().prefetch(buffer_size=AUTOTUNE)

test_ds = tf.keras.preprocessing.text_dataset_from_directory(
    "aclImdb/test", batch_size=batch_size
)

test_ds = test_ds.cache().prefetch(buffer_size=AUTOTUNE)
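
The arithmetic behind the 80:20 split can be sketched as follows, assuming the labeled training set holds 25,000 reviews (half of the 50,000 labeled reviews in the dataset). The helper names are hypothetical.

```python
import math


def split_sizes(num_examples: int, validation_split: float):
    """Return (train, validation) example counts for a fractional split."""
    num_val = int(num_examples * validation_split)
    return num_examples - num_val, num_val


def num_batches(num_examples: int, batch_size: int) -> int:
    """Number of batches per epoch, counting a final partial batch."""
    return math.ceil(num_examples / batch_size)


train_n, val_n = split_sizes(25_000, 0.2)
train_batches = num_batches(train_n, 32)
val_batches = num_batches(val_n, 32)
print(train_n, val_n, train_batches, val_batches)
```

The training batch count corresponds to the number of steps per epoch reported by `fit()`.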

### Compile the model

Next, compile for subsequent fine-tuning (transfer learning).

In [None]:
epochs = 3
steps_per_epoch = tf.data.experimental.cardinality(train_ds).numpy()
num_train_steps = steps_per_epoch * epochs
num_warmup_steps = int(0.1 * num_train_steps)

init_lr = 3e-5
optimizer = optimization.create_optimizer(
    init_lr=init_lr,
    num_train_steps=num_train_steps,
    num_warmup_steps=num_warmup_steps,
    optimizer_type="adamw",
)

loss = tf.keras.losses.BinaryCrossentropy()
metrics = tf.metrics.BinaryAccuracy()

tfhub_model.compile(optimizer=optimizer, loss=loss, metrics=metrics)
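
To make the warmup arithmetic concrete, the sketch below computes the step counts for 625 steps per epoch and evaluates a generic linear-warmup, linear-decay learning rate. This schedule is a simplified assumption for illustration; it is not guaranteed to match the exact schedule that `optimization.create_optimizer` builds.

```python
def learning_rate_at(step, init_lr, num_train_steps, num_warmup_steps):
    """Generic linear warmup followed by linear decay to zero (illustrative)."""
    if step < num_warmup_steps:
        # Ramp up linearly from 0 during warmup.
        return init_lr * step / num_warmup_steps
    # After warmup, decay linearly so the rate reaches 0 at the final step.
    clipped = min(step, num_train_steps)
    return init_lr * (1.0 - clipped / num_train_steps)


steps_per_epoch = 625            # assumed: 20,000 training reviews / batch size 32
epochs = 3
num_train_steps = steps_per_epoch * epochs
num_warmup_steps = int(0.1 * num_train_steps)

for step in (0, num_warmup_steps, num_train_steps):
    print(step, learning_rate_at(step, 3e-5, num_train_steps, num_warmup_steps))
```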

### Train the model

Next, you train (fine-tune) the model.

In [None]:
history = tfhub_model.fit(x=train_ds, validation_data=val_ds, epochs=epochs)

Epoch 1/3
625/625 [==============================] - 5081s 8s/step - loss: 0.7344 - binary_accuracy: 0.4785 - val_loss: 0.7025 - val_binary_accuracy: 0.4992
Epoch 2/3
625/625 [==============================] - 5080s 8s/step - loss: 0.6966 - binary_accuracy: 0.5167 - val_loss: 0.6946 - val_binary_accuracy: 0.5338
Epoch 3/3
625/625 [==============================] - ETA: 0s - loss: 0.6917 - binary_accuracy: 0.5344

### Evaluate the model

Next, evaluate the model. The validation loss is expected to be ~0.43.

In [None]:
loss, accuracy = tfhub_model.evaluate(test_ds)

print(f"Loss: {loss}")
print(f"Accuracy: {accuracy}")

### Save the model artifacts

At this point, the model is in memory. Next, you save the model artifacts to a Cloud Storage location.

In [None]:
MODEL_DIR = BUCKET_URI + "/model"
tfhub_model.save(MODEL_DIR)

### Upload the TensorFlow Hub model to a `Vertex AI Model` resource with standard TensorFlow container

Next, you upload the model artifacts from the TFHub model into a `Vertex AI Model` resource with a standard TensorFlow container.

In [None]:
DEPLOY_IMAGE = "us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-7:latest"

model_standard = aip.Model.upload(
    display_name="standard_" + TIMESTAMP,
    artifact_uri=MODEL_DIR,
    serving_container_image_uri=DEPLOY_IMAGE,
)

print(model_standard)

### Upload the TensorFlow Hub model to a `Vertex AI Model` resource with optimized TensorFlow container

Next, you upload the model artifacts from the TFHub model into a second `Vertex AI Model` resource with the TensorFlow Enterprise optimized container: `us-docker.pkg.dev/vertex-ai-restricted/prediction/tensorflow-enterprise-inference-engine-alpha:latest`.

Two optimization options are applied to the model.

- *allow_precompilation* - turns on model precompilation for better performance. Note that model precompilation happens when the first request with a new batch size arrives, and the response for that request is sent only after precompilation is complete. To mitigate this, specify a warmup file. Model precompilation works for different kinds of models, and in most cases has a positive effect on performance. However, we recommend that you try it out for your model before you enable it in production.

- *allow_precision_affecting_optimizations* - enables precision-affecting optimizations. In some cases this makes the model run significantly faster at the cost of a very small loss in the model's predictive power. You should assess the precision impact on your model when using this optimization.


In [None]:
DEPLOY_IMAGE = "us-docker.pkg.dev/vertex-ai-restricted/prediction/tensorflow-enterprise-inference-engine-alpha:latest"
model_tfe_opt = aip.Model.upload(
    display_name="tfe_opt_" + TIMESTAMP,
    artifact_uri=MODEL_DIR,
    serving_container_image_uri=DEPLOY_IMAGE,
    serving_container_args=[
        "--allow_precompilation=true",
        "--allow_precision_affecting_optimizations=true",
    ],
)

print(model_tfe_opt)
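
If you want to toggle the optimization options programmatically, one possible approach is to build the `serving_container_args` list from a dictionary of booleans. The helper name `to_container_args` is hypothetical; the two flag names are the ones used in the cell above.

```python
def to_container_args(options: dict) -> list:
    """Turn {"flag": bool} pairs into "--flag=true|false" argument strings."""
    return [f"--{name}={str(value).lower()}" for name, value in options.items()]


args = to_container_args(
    {
        "allow_precompilation": True,
        "allow_precision_affecting_optimizations": True,
    }
)
print(args)
```

The resulting list could then be passed as `serving_container_args` when uploading the model.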

## Creating two public `Endpoint` resources

You create two public `Endpoint` resources using the `Endpoint.create()` method. At a minimum, you specify the display name for the endpoint. Optionally, you can specify the project and location (region); otherwise the settings are inherited from the values you set when you initialized the Vertex AI SDK with the `init()` method.

In this example, the following parameters are specified:

- `display_name`: A human readable name for the `Endpoint` resource.
- `project`: Your project ID.
- `location`: Your region.
- `labels`: (optional) User defined metadata for the `Endpoint` in the form of key/value pairs.

This method returns an `Endpoint` object.

Learn more about [Vertex AI Endpoints](https://cloud.google.com/vertex-ai/docs/predictions/deploy-model-api).

In [None]:
endpoint_standard = aip.Endpoint.create(
    display_name="standard_" + TIMESTAMP,
    project=PROJECT_ID,
    location=REGION,
    labels={"your_key": "your_value"},
)

print(endpoint_standard)

endpoint_tfe_opt = aip.Endpoint.create(
    display_name="tfe_opt_" + TIMESTAMP,
    project=PROJECT_ID,
    location=REGION,
    labels={"your_key": "your_value"},
)

print(endpoint_tfe_opt)

## Deploying the `Model` resources to the `Endpoint` resources

Next, you deploy both instances of the model, standard and optimized containers, to separate public `Vertex AI Endpoints`.

*Note:* For this example, you specified the deployment container for the TFHub model in the previous step of uploading the model artifacts to a `Vertex AI Model` resource.

In this example, you deploy the model with the minimal amount of specified parameters, as follows:

- `model`: The `Model` resource.
- `deployed_model_display_name`: The human readable name for the deployed model instance.
- `machine_type`: The machine type for each VM instance.

Due to the requirements to provision the resource, this may take up to a few minutes.

In [None]:
response = endpoint_standard.deploy(
    model=model_standard,
    deployed_model_display_name="standard_" + TIMESTAMP,
    machine_type=DEPLOY_COMPUTE,
)

print(endpoint_standard)

response = endpoint_tfe_opt.deploy(
    model=model_tfe_opt,
    deployed_model_display_name="tfe_opt_" + TIMESTAMP,
    machine_type=DEPLOY_COMPUTE,
)

print(endpoint_tfe_opt)

### Prepare test data for prediction

Next, you prepare the test data for the prediction. In this example, you use synthetic data.

In [None]:
INSTANCES = ["This was the best movie ever", "Movie was boring"]

### Make the prediction

Now that your `Model` resource is deployed to an `Endpoint` resource, you can do online predictions by sending prediction requests to the Endpoint resource.

#### Request

Each instance in the prediction request is a dictionary entry of the form:

                        {input_name: content}

- `input_name`: the name of the input layer of the underlying model.
- `content`: The data item. In this tutorial, each data item is a text string.

Since the `predict()` service can take multiple data items (instances), you send your data items as a list of instances, with one dictionary per movie review. The Vertex AI SDK packages the instances list into Google's protobuf format before passing it to the `predict()` service.

#### Response

The response from the `predict()` call is a Python dictionary with the following entries:

- `ids`: The internal assigned unique identifiers for each prediction request.
- `predictions`: The predicted confidence, between 0 and 1, per class label.
- `deployed_model_id`: The Vertex AI identifier for the deployed `Model` resource which did the predictions.

In [None]:
serving_input = tfhub_model.input.name

# The format of each instance should conform to the deployed model's prediction input schema.
instances = [{serving_input: INSTANCES[0]}, {serving_input: INSTANCES[1]}]

prediction_standard = endpoint_standard.predict(instances=instances)

print(prediction_standard)

In [None]:
# The format of each instance should conform to the deployed model's prediction input schema.
instances = [{serving_input: INSTANCES[0]}, {serving_input: INSTANCES[1]}]

prediction_tfe_opt = endpoint_tfe_opt.predict(instances=instances)

print(prediction_tfe_opt)
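
The request format described above can be sketched without calling the service. The input name `"input_1"` is a hypothetical placeholder; in the notebook, the actual name comes from `tfhub_model.input.name`. The JSON body shows how the same instances would look in a REST prediction request.

```python
import json


def build_instances(input_name: str, texts: list) -> list:
    """Wrap each text in a {input_name: content} dictionary."""
    return [{input_name: text} for text in texts]


# "input_1" is an assumed input layer name, for illustration only.
example_instances = build_instances(
    "input_1", ["This was the best movie ever", "Movie was boring"]
)
request_body = json.dumps({"instances": example_instances})
print(request_body)
```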

### Compare difference in precision

Finally, compare the difference in precision between the non-optimized deployed model and the optimized deployed model. The difference is expected to be negligible.

In [None]:
import numpy as np

abs(
    np.asarray(prediction_standard.predictions)
    - np.asarray(prediction_tfe_opt.predictions)
)
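
To turn the element-wise comparison into a single pass/fail check, a possible approach is to take the largest absolute difference and compare it with a tolerance. The sketch below uses plain Python lists; the prediction values and the tolerance are invented for illustration and are not results from this tutorial.

```python
def max_abs_diff(a, b):
    """Largest absolute element-wise difference between two nested lists."""
    return max(
        abs(x - y)
        for row_a, row_b in zip(a, b)
        for x, y in zip(row_a, row_b)
    )


# Invented values, shaped like one sigmoid output per instance.
standard = [[0.75], [0.25]]
optimized = [[0.75], [0.21875]]

diff = max_abs_diff(standard, optimized)
TOLERANCE = 0.05  # hypothetical threshold; choose one that suits your model
print(diff, diff <= TOLERANCE)
```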

## Cleaning up

To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud
project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.

Otherwise, you can delete the individual resources you created in this tutorial:

In [None]:
delete_bucket = False
delete_model = True
delete_endpoint = True

if delete_endpoint:
    try:
        endpoint_standard.undeploy_all()
        endpoint_standard.delete()
        endpoint_tfe_opt.undeploy_all()
        endpoint_tfe_opt.delete()
    except Exception as e:
        print(e)

if delete_model:
    try:
        # Delete the Vertex AI Model resources (not the in-memory Keras model)
        model_standard.delete()
        model_tfe_opt.delete()
    except Exception as e:
        print(e)

if delete_bucket or os.getenv("IS_TESTING"):
    ! gsutil rm -rf {BUCKET_URI}