In [None]:
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Deploying a PyTorch Text Classification Model on [Vertex AI](https://cloud.google.com/vertex-ai)

**Please reach out to the Vertex AI team before you run any scale tests or if you have any questions.**


# Overview

This example shows how to deploy a PyTorch text classification model on Vertex AI using the Vertex AI Prediction pre-built PyTorch images. It assumes you already have a trained model stored at a Cloud Storage (GCS) URI. If you have not trained a PyTorch text classification model yet, you can follow [this notebook](https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/community-content/pytorch_text_classification_using_vertex_sdk_and_gcloud/pytorch-text-classification-vertex-ai-train-tune-deploy.ipynb) to train one.

### Objective

This notebook shows how to **deploy PyTorch models on [Vertex AI](https://cloud.google.com/vertex-ai)** and highlights the support for deploying PyTorch models on Vertex AI. It does not cover training PyTorch models on Vertex AI. Check out [the training notebook](https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/community-content/pytorch_text_classification_using_vertex_sdk_and_gcloud/pytorch-text-classification-vertex-ai-train-tune-deploy.ipynb) to learn more about support for training on Vertex AI.

### Table of Contents

This notebook covers the following sections:

- [Creating Notebooks instance](#Creating-Notebooks-instance-on-Google-Cloud)
- [Deploying](#Deploying)
    - [Deploying the model on Vertex AI Predictions with the pre-built PyTorch container](#Deploying-the-pre-built-PyTorch-container-to-Vertex-AI-Predictions)

### Costs 

This tutorial uses billable components of Google Cloud Platform (GCP):

* [Vertex AI Workbench](https://cloud.google.com/vertex-ai-workbench)
* [Vertex AI Predictions](https://cloud.google.com/vertex-ai/docs/predictions/getting-predictions)
* [Cloud Storage](https://cloud.google.com/storage)

Learn about [Vertex AI Pricing](https://cloud.google.com/vertex-ai/pricing) and [Cloud Storage Pricing](https://cloud.google.com/storage/pricing), and use the [Pricing Calculator](https://cloud.google.com/products/calculator/) to generate a cost estimate based on your projected usage.

### Set up your local development environment

**If you are using Colab or Google Cloud Notebooks**, your environment already meets
all the requirements to run this notebook. You can skip this step.

**Otherwise**, make sure your environment meets this notebook's requirements.
You need the following:

* The Google Cloud SDK
* Git
* Python 3
* virtualenv
* Jupyter notebook running in a virtual environment with Python 3

The Google Cloud guide to [Setting up a Python development
environment](https://cloud.google.com/python/setup) and the [Jupyter
installation guide](https://jupyter.org/install) provide detailed instructions
for meeting these requirements. The following steps provide a condensed set of
instructions:

1. [Install and initialize the Cloud SDK.](https://cloud.google.com/sdk/docs/)
1. [Install Python 3.](https://cloud.google.com/python/setup#installing_python)
1. [Install virtualenv](https://cloud.google.com/python/setup#installing_and_using_virtualenv) and create a virtual environment that uses Python 3. Activate the virtual environment.
1. To install Jupyter, run `pip3 install jupyter` on the command-line in a terminal shell.
1. To launch Jupyter, run `jupyter notebook` on the command-line in a terminal shell.
1. Open this notebook in the Jupyter Notebook Dashboard.

### Install additional packages

The Python dependencies required for this notebook are [Torch](https://pypi.org/project/torch/), [Torch Model Archiver](https://pypi.org/project/torch-model-archiver/), and the Vertex AI SDK for Python, which the following cells install in the notebook environment.

In [None]:
import os

# The Google Cloud Notebook product has specific requirements
IS_GOOGLE_CLOUD_NOTEBOOK = os.path.exists("/opt/deeplearning/metadata/env_version")

# Google Cloud Notebook requires dependencies to be installed with '--user'
USER_FLAG = ""
if IS_GOOGLE_CLOUD_NOTEBOOK:
    USER_FLAG = "--user"

In [None]:
!pip install {USER_FLAG} --upgrade torch==1.11

You will be using [Vertex AI SDK for Python](https://cloud.google.com/vertex-ai/docs/start/client-libraries#python) to interact with Vertex AI services. The high-level `aiplatform` library is designed to simplify common data science workflows by using wrapper classes and opinionated defaults. 

#### Install Vertex AI SDK for Python

In [None]:
!pip install {USER_FLAG} --upgrade google-cloud-aiplatform[prediction]

#### Install Torch Model Archiver
This is required if you want to deploy the trained model to Vertex AI using Vertex AI Prediction pre-built PyTorch images.

In [None]:
!pip install {USER_FLAG} --upgrade torch-model-archiver

### Restart the Kernel

After you install the additional packages, you need to restart the notebook kernel so it can find the packages.

In [None]:
# Automatically restart kernel after installs
import os

if not os.getenv("IS_TESTING"):
    # Automatically restart kernel after installs
    import IPython

    app = IPython.Application.instance()
    app.kernel.do_shutdown(True)

## Before you begin

### Set up your Google Cloud project

**The following steps are required, regardless of your notebook environment.**

1. [Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.
1. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).
1. Enable the following APIs in your project, which are required for running the tutorial:
    - [Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com)
    - [Cloud Storage API](https://console.cloud.google.com/flows/enableapi?apiid=storage.googleapis.com)
1. If you are running this notebook locally, you will need to install the [Cloud SDK](https://cloud.google.com/sdk).
1. Enter your project ID in the cell below. Then run the cell to make sure the Cloud SDK uses the right project for all the commands in this notebook.

**Note**: Jupyter runs lines prefixed with `!` as shell commands, and it interpolates Python variables prefixed with `$` into these commands.

#### Set your project ID

**If you don't know your project ID**, you may be able to get your project ID using `gcloud` or `google.auth`.

In [None]:
PROJECT_ID = "[your-project-id]"  # <---CHANGE THIS TO YOUR PROJECT

import os

# If no project ID was set above, try to get it using google.auth.
# A project ID that you set explicitly is not overwritten.
if PROJECT_ID in ("", None, "[your-project-id]") and not os.getenv("IS_TESTING"):
    import google.auth

    _, PROJECT_ID = google.auth.default()

print("Project ID: ", PROJECT_ID)

# validate PROJECT_ID
if not PROJECT_ID or PROJECT_ID == "[your-project-id]":
    print(
        f"Please set your project id before proceeding to next step. Currently it's set as {PROJECT_ID}"
    )

#### Timestamp

If you are in a live tutorial session, you might be using a shared test account or project. To avoid name collisions between users on resources created, you create a timestamp for each instance session, and append it onto the name of resources you create in this tutorial.

In [None]:
from datetime import datetime


def get_timestamp():
    return datetime.now().strftime("%Y%m%d%H%M%S")


TIMESTAMP = get_timestamp()
print(f"TIMESTAMP = {TIMESTAMP}")

### Authenticate your Google Cloud account

---

**If you are using Google Cloud Notebooks**, your environment is already authenticated. Skip this step.

---

**If you are using Colab**, run the cell below and follow the instructions
when prompted to authenticate your account via OAuth.

**Otherwise**, follow these steps:

1. In the Cloud Console, go to the [**Create service account key** page](https://console.cloud.google.com/apis/credentials/serviceaccountkey).
2. Click **Create service account**.
3. In the **Service account name** field, enter a name, and click **Create**.
4. In the **Grant this service account access to project** section, click the **Role** drop-down list. Type "Vertex AI" into the filter box, and select **Vertex AI Administrator**. Type "Storage Object Admin" into the filter box, and select **Storage Object Admin**.
5. Click **Create**. A JSON file that contains your key downloads to your local environment.
6. Enter the path to your service account key as the `GOOGLE_APPLICATION_CREDENTIALS` variable in the cell below and run the cell.

In [None]:
import os
import sys

# If you are running this notebook in Colab, run this cell and follow the
# instructions to authenticate your GCP account. This provides access to your
# Cloud Storage bucket and lets you submit training jobs and prediction
# requests.

# The Google Cloud Notebook product has specific requirements
IS_GOOGLE_CLOUD_NOTEBOOK = os.path.exists("/opt/deeplearning/metadata/env_version")

# If on Google Cloud Notebooks, then don't execute this code
if not IS_GOOGLE_CLOUD_NOTEBOOK:
    if "google.colab" in sys.modules:
        from google.colab import auth as google_auth

        google_auth.authenticate_user()

    # If you are running this notebook locally, replace the string below with the
    # path to your service account key and run this cell to authenticate your GCP
    # account.
    elif not os.getenv("IS_TESTING"):
        %env GOOGLE_APPLICATION_CREDENTIALS ''

### Create a Cloud Storage bucket

**The following steps are required, regardless of your notebook environment.**

Set the name of your Cloud Storage bucket below. It must be unique across all Cloud Storage buckets.

You may also change the `REGION` variable, which is used for operations throughout the rest of this notebook. Make sure to [choose a region where Vertex AI services are available](https://cloud.google.com/vertex-ai/docs/general/locations#available_regions). You may not use a Multi-Regional Storage bucket for prediction with Vertex AI.
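
Before creating the bucket, it can help to check that the name is plausible. The following is a minimal sketch, assuming a simplified subset of the Cloud Storage naming rules (3 to 63 characters; lowercase letters, digits, dashes, underscores, and dots; starting and ending with a letter or digit). The helper name `looks_like_valid_bucket_uri` is hypothetical and not part of any Google Cloud library; the service itself remains the final authority on what it accepts.

```python
import re

# Simplified subset of the Cloud Storage naming rules (an assumption of this
# sketch): 3-63 characters, lowercase letters, digits, dashes, underscores and
# dots, starting and ending with a letter or digit.
_BUCKET_NAME_PATTERN = re.compile(r"[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]")


def looks_like_valid_bucket_uri(bucket_uri: str) -> bool:
    """Return True if bucket_uri is 'gs://<name>' and <name> passes the basic checks."""
    if not bucket_uri.startswith("gs://"):
        return False
    name = bucket_uri[len("gs://"):]
    # Names may not start with "goog" or contain "google"
    if name.startswith("goog") or "google" in name:
        return False
    return _BUCKET_NAME_PATTERN.fullmatch(name) is not None


print(looks_like_valid_bucket_uri("gs://my-project-aip-20220101120000"))
print(looks_like_valid_bucket_uri("gs://[your-bucket-name]"))
```

A check like this catches the unedited placeholder early, before `gsutil mb` fails with a less obvious error.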

In [None]:
BUCKET_NAME = "gs://[your-bucket-name]"  # <---CHANGE THIS TO YOUR BUCKET
REGION = "us-central1"  # @param {type:"string"}

In [None]:
if BUCKET_NAME == "" or not BUCKET_NAME or BUCKET_NAME == "gs://[your-bucket-name]":
    BUCKET_NAME = f"gs://{PROJECT_ID}aip-{get_timestamp()}"

In [None]:
print(f"PROJECT_ID = {PROJECT_ID}")
print(f"BUCKET_NAME = {BUCKET_NAME}")
print(f"REGION = {REGION}")

---

**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket.

---

In [None]:
! gsutil mb -l $REGION $BUCKET_NAME

Finally, validate access to your Cloud Storage bucket by examining its contents:

In [None]:
! gsutil ls -al $BUCKET_NAME

## Deploying

There is no need to create a custom container to deploy a PyTorch model on [Vertex AI](https://cloud.google.com/vertex-ai/docs/predictions/getting-predictions) that serves online predictions. You deploy a Vertex AI Prediction pre-built PyTorch container to serve predictions from a Hugging Face Transformers model that was fine-tuned for a sentiment analysis task. You can then use Vertex AI to classify the sentiment of input texts.

In essence, deploying a PyTorch model on Vertex AI with the Vertex AI Prediction pre-built PyTorch images takes the following steps:

1. Package the trained model artifacts including [default](https://pytorch.org/serve/#default-handlers) or [custom](https://pytorch.org/serve/custom_service.html) handlers by creating an archive file using [Torch model archiver](https://github.com/pytorch/serve/tree/master/model-archiver).
2. (Optional) Run the pre-built PyTorch image locally with the model artifacts.
3. Upload the model with the pre-built PyTorch image to serve predictions as a Vertex AI Model resource.
4. Create a Vertex AI Endpoint and [deploy the model](https://cloud.google.com/vertex-ai/docs/predictions/deploy-model-api) resource.

#### **Download model artifacts**

Download the model artifacts that were saved as part of the training (or hyperparameter tuning) job from Cloud Storage to a local directory. Follow [this notebook](https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/community-content/pytorch_text_classification_using_vertex_sdk_and_gcloud/pytorch-text-classification-vertex-ai-train-tune-deploy.ipynb) if you don't have a trained PyTorch text classification model yet.

In [None]:
GCS_TRAINED_MODEL_URI = "gs://[your-gcs-path]"  # <---CHANGE THIS TO YOUR GCS PATH THAT CONTAINS MODEL ARTIFACTS

Validate model artifact files in the Cloud Storage bucket.

In [None]:
!gsutil ls -r $GCS_TRAINED_MODEL_URI

Copy files from Cloud Storage to local directory.

In [None]:
!mkdir trained_model
!gsutil -m cp -r $GCS_TRAINED_MODEL_URI/ ./trained_model

In [None]:
!ls -ltrR ./trained_model

In [None]:
LOCAL_TRAINED_MODEL_DIRECTORY = "[your-local-directory]"  # <---CHANGE THIS TO YOUR LOCAL DIRECTORY THAT CONTAINS MODEL ARTIFACTS

#### **Create a custom model handler to handle prediction requests**

Predicting the sentiment of input text with the fine-tuned transformer model requires pre-processing the input text and post-processing the output by adding a name (positive/negative) to the target label (1/0). The handler in this notebook returns only the name, not a probability (or confidence). You create a custom handler script that is packaged with the model artifacts, and the pre-built PyTorch image executes the code when it runs.

The custom handler script does the following:

- Pre-process input text before sending it to the model for inference
- Customize how the model is invoked for inference
- Post-process output from the model before sending back a response

Please refer to the [TorchServe documentation](https://pytorch.org/serve/custom_service.html) for defining a custom handler.
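
To make the three stages concrete before looking at the real handler, here is a toy sketch that mirrors the same preprocess, inference, and postprocess flow in plain Python. It is not a TorchServe handler: the class `ToyKeywordHandler`, its keyword list, and its single-argument `handle` method are all hypothetical stand-ins, and the "model" is a keyword lookup rather than a transformer.

```python
class ToyKeywordHandler:
    """Toy stand-in that mirrors the preprocess/inference/postprocess flow."""

    def __init__(self):
        # Same label-to-name mapping idea as the real handler
        self.mapping = {"0": "Negative", "1": "Positive"}
        # Hypothetical keyword list standing in for a trained model
        self.positive_words = {"good", "great", "best"}

    def preprocess(self, data):
        # Read the "data" or "body" field of the first request and tokenize it
        raw = data[0].get("data")
        if raw is None:
            raw = data[0].get("body")
        return raw.decode("utf-8").lower().split()

    def inference(self, tokens):
        # Placeholder "model": label 1 if any positive keyword is present
        hit = any(token.strip(".,!?") in self.positive_words for token in tokens)
        return 1 if hit else 0

    def postprocess(self, label):
        # Return one result per request, as a list
        return [self.mapping[str(label)]]

    def handle(self, data):
        return self.postprocess(self.inference(self.preprocess(data)))


handler = ToyKeywordHandler()
print(handler.handle([{"data": b"One of the best I have seen to date."}]))
```

The real handler below follows the same shape, with tokenization in `preprocess` and the transformer forward pass in `inference`.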

In [None]:
PREDICTOR_DIRECTORY = "./predictor"

!mkdir $PREDICTOR_DIRECTORY

In [None]:
%%writefile $PREDICTOR_DIRECTORY/custom_handler.py

import os
import json
import logging

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from ts.torch_handler.base_handler import BaseHandler

logger = logging.getLogger(__name__)


class TransformersClassifierHandler(BaseHandler):
    """
    The handler takes an input string and returns the classification text 
    based on the serialized transformers checkpoint.
    """
    def __init__(self):
        super(TransformersClassifierHandler, self).__init__()
        self.initialized = False

    def initialize(self, ctx):
        """ Loads the serialized model file and initializes the model object.
        Instantiates Tokenizer for preprocessor to use
        Loads labels to name mapping file for post-processing inference response
        """
        self.manifest = ctx.manifest

        properties = ctx.system_properties
        model_dir = properties.get("model_dir")
        self.device = torch.device("cuda:" + str(properties.get("gpu_id")) if torch.cuda.is_available() else "cpu")

        # Read model serialize/pt file
        serialized_file = self.manifest["model"]["serializedFile"]
        model_pt_path = os.path.join(model_dir, serialized_file)
        if not os.path.isfile(model_pt_path):
            raise RuntimeError("Missing the model.pt or pytorch_model.bin file")
        
        # Load model
        self.model = AutoModelForSequenceClassification.from_pretrained(model_dir)
        self.model.to(self.device)
        self.model.eval()
        logger.debug('Transformer model from path {0} loaded successfully'.format(model_dir))
        
        # Ensure to use the same tokenizer used during training
        self.tokenizer = AutoTokenizer.from_pretrained('bert-base-cased')

        # Read the mapping file, index to object name
        mapping_file_path = os.path.join(model_dir, "index_to_name.json")

        if os.path.isfile(mapping_file_path):
            with open(mapping_file_path) as f:
                self.mapping = json.load(f)
        else:
            logger.warning('Missing the index_to_name.json file. Inference output will default.')
            self.mapping = {"0": "Negative",  "1": "Positive"}

        self.initialized = True

    def preprocess(self, data):
        """ Preprocessing input request by tokenizing
            Extend with your own preprocessing steps as needed
        """
        text = data[0].get("data")
        if text is None:
            text = data[0].get("body")
        sentences = text.decode('utf-8')
        logger.info("Received text: '%s'", sentences)

        # Tokenize the texts
        tokenizer_args = ((sentences,))
        inputs = self.tokenizer(*tokenizer_args,
                                padding='max_length',
                                max_length=128,
                                truncation=True,
                                return_tensors = "pt")
        return inputs

    def inference(self, inputs):
        """ Predict the class of a text using a trained transformer model.
        """
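        # Pass the attention mask as well so that padding tokens are ignored
        with torch.no_grad():
            logits = self.model(
                input_ids=inputs['input_ids'].to(self.device),
                attention_mask=inputs['attention_mask'].to(self.device),
            )[0]
        prediction = logits.argmax().item()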

        if self.mapping:
            prediction = self.mapping[str(prediction)]

        logger.info("Model predicted: '%s'", prediction)
        return [prediction]

    def postprocess(self, inference_output):
        return inference_output


##### **Generate target label to name file**

In the custom handler, you refer to a mapping file between target labels and their meaningful names that will be used to format the prediction response. Here you are mapping target label "0" as "Negative" and "1" as "Positive".
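
As an illustration of how the mapping is applied, the sketch below picks the highest-scoring class from a list of logits and looks up its name. It also computes a softmax probability, which the handler in this notebook does not return; that part is an optional extension shown only as an assumption. The function name `label_with_confidence` and the sample logits are hypothetical.

```python
import math


def label_with_confidence(logits, mapping):
    """Return (label name, softmax probability) for the highest-scoring class."""
    # Subtract the max logit for numerical stability before exponentiating
    max_logit = max(logits)
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Index of the most probable class, used as the key into the mapping
    best_index = max(range(len(probs)), key=probs.__getitem__)
    return mapping[str(best_index)], probs[best_index]


mapping = {"0": "Negative", "1": "Positive"}
name, confidence = label_with_confidence([-1.2, 2.3], mapping)
print(name, round(confidence, 3))
```

Note that the mapping keys are strings, which is why the index is converted with `str()` before the lookup, just as in the handler.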

In [None]:
%%writefile $PREDICTOR_DIRECTORY/index_to_name.json

{
    "0": "Negative", 
    "1": "Positive"
}

#### **Package the trained model artifacts**

The pre-built PyTorch images require a model archive file using [Torch model archiver](https://github.com/pytorch/serve/tree/master/model-archiver).

In [None]:
ARCHIVED_MODEL_PATH = "./archived_model"

!mkdir $ARCHIVED_MODEL_PATH

In [None]:
IS_GOOGLE_CLOUD_NOTEBOOK = os.path.exists("/opt/deeplearning/metadata/env_version")

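# On Google Cloud Notebooks, add the user-level bin directory to PATH so that
# the installed torch-model-archiver can be found. The home directory is
# expanded explicitly because "~" inside PATH is not expanded by every shell.
if IS_GOOGLE_CLOUD_NOTEBOOK:
    os.environ["PATH"] = f'{os.environ.get("PATH")}:{os.path.expanduser("~/.local/bin")}'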

Package the trained model artifacts including [default](https://pytorch.org/serve/#default-handlers) or [custom](https://pytorch.org/serve/custom_service.html) handlers by creating an archive file using [Torch model archiver](https://github.com/pytorch/serve/tree/master/model-archiver). The pre-built PyTorch image requires the model archived file named as `model.mar` so you need to set the model-name as `model`.
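
Because the `--extra-files` argument is a long comma-separated list, it can be easier to assemble the command programmatically. The following is a minimal sketch: the helper `build_archiver_command` is hypothetical, the directory names passed to it are placeholders, and the file names are the ones used in the cell below. It only builds the argument list and does not run the archiver.

```python
import os
import shlex


def build_archiver_command(model_dir, predictor_dir, export_path):
    """Build the torch-model-archiver argument list for the files used in this notebook."""
    model_files = [
        "config.json",
        "tokenizer.json",
        "training_args.bin",
        "tokenizer_config.json",
        "special_tokens_map.json",
        "vocab.txt",
    ]
    extra_files = [os.path.join(model_dir, name) for name in model_files]
    extra_files.append(os.path.join(predictor_dir, "index_to_name.json"))
    return [
        "torch-model-archiver",
        "-f",
        "--model-name=model",  # produces model.mar, as the pre-built image expects
        "--version=1.0",
        f"--serialized-file={os.path.join(model_dir, 'pytorch_model.bin')}",
        f"--handler={os.path.join(predictor_dir, 'custom_handler.py')}",
        "--extra-files",
        ",".join(extra_files),
        f"--export-path={export_path}",
    ]


# Placeholder directories for illustration only
command = build_archiver_command("./trained_model", "./predictor", "./archived_model")
print(" ".join(shlex.quote(part) for part in command))
```

The resulting list could be passed to `subprocess.run(command, check=True)` instead of using the shell cell.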

In [None]:
!torch-model-archiver -f \
  --model-name=model \
  --version=1.0 \
  --serialized-file=$LOCAL_TRAINED_MODEL_DIRECTORY/pytorch_model.bin \
  --handler=$PREDICTOR_DIRECTORY/custom_handler.py \
  --extra-files "$LOCAL_TRAINED_MODEL_DIRECTORY/config.json,$LOCAL_TRAINED_MODEL_DIRECTORY/tokenizer.json,$LOCAL_TRAINED_MODEL_DIRECTORY/training_args.bin,$LOCAL_TRAINED_MODEL_DIRECTORY/tokenizer_config.json,$LOCAL_TRAINED_MODEL_DIRECTORY/special_tokens_map.json,$LOCAL_TRAINED_MODEL_DIRECTORY/vocab.txt,$PREDICTOR_DIRECTORY/index_to_name.json" \
  --export-path=$ARCHIVED_MODEL_PATH

#### **Run the pre-built PyTorch images locally (optional)**

Before you upload a model to Vertex AI, you can run the pre-built PyTorch images locally with the model file.

Vertex AI Predictions provides pre-built images in the different [Artifact Registry multi-regions](https://cloud.google.com/artifact-registry/docs/repositories/repo-locations). To run the images locally, use the image that matches your target region. This example uses PyTorch 1.11 on CPU, so you can choose any of the following images:
- us-docker.pkg.dev/vertex-ai/prediction/pytorch-cpu.1-11:latest
- europe-docker.pkg.dev/vertex-ai/prediction/pytorch-cpu.1-11:latest
- asia-docker.pkg.dev/vertex-ai/prediction/pytorch-cpu.1-11:latest

Vertex AI provides [Vertex SDK](https://github.com/googleapis/python-aiplatform/tree/main/google/cloud/aiplatform/prediction) to help test images locally. You will use Vertex SDK to test the pre-built PyTorch images with the archived model file.

Set up the logging config in the notebook.

In [None]:
import logging

logging.basicConfig(level=logging.INFO)

Select the image URI matching the region that the models will be deployed to. For the details of Artifact Registry multi-regions, check out [the documentation](https://cloud.google.com/artifact-registry/docs/repositories/repo-locations).
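
If you prefer to derive the image URI from the `REGION` value, a small helper can do the mapping. This is a sketch under stated assumptions: the function `prebuilt_pytorch_cpu_image_uri` is hypothetical, it only knows the three multi-regions listed above, and falling back to `us` for any other region prefix is an arbitrary choice made for this example.

```python
def prebuilt_pytorch_cpu_image_uri(region: str) -> str:
    """Map a Vertex AI region such as 'us-central1' to a pre-built PyTorch 1.11 CPU image URI."""
    # Assumption of this sketch: default to the "us" multi-region unless the
    # region name starts with "europe" or "asia".
    multi_region = "us"
    for prefix in ("europe", "asia"):
        if region.startswith(prefix):
            multi_region = prefix
    return f"{multi_region}-docker.pkg.dev/vertex-ai/prediction/pytorch-cpu.1-11:latest"


print(prebuilt_pytorch_cpu_image_uri("us-central1"))
print(prebuilt_pytorch_cpu_image_uri("europe-west4"))
```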

In [None]:
serving_container_image_uri = "[your-multi-region]-docker.pkg.dev/vertex-ai/prediction/pytorch-cpu.1-11:latest"  # <---CHANGE THIS TO YOUR MULTI REGION, COULD BE `us`, `europe`, or `asia`

Since you are using pre-built PyTorch images locally, you need to populate the necessary routes and ports. You do not need to populate routes or ports while you upload models to Vertex AI with pre-built PyTorch images.

In [None]:
health_route = "/ping"
predict_route = "/predictions/model"
serving_container_ports = [8080]

Create a local model.

In [None]:
from google.cloud.aiplatform.prediction import LocalModel

local_model = LocalModel(
    serving_container_image_uri=serving_container_image_uri,
    serving_container_predict_route=predict_route,
    serving_container_health_route=health_route,
    serving_container_ports=serving_container_ports,
)

Store test instances. To learn more about formatting input instances in JSON, [read the documentation](https://cloud.google.com/vertex-ai/docs/predictions/online-predictions-custom-models#request-body-details).
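
The request file can also be written from Python, which avoids relying on the GNU-specific `base64 --wrap=0` option. The sketch below is one possible way to do it; the helper `write_instances_file` is hypothetical, and it writes to a temporary directory so it does not overwrite `./instances.json`. Unlike the `echo`-based cell, it does not append a trailing newline to the text before encoding.

```python
import base64
import json
import os
import tempfile


def write_instances_file(texts, path):
    """Write a prediction request body with base64-encoded text instances."""
    instances = [
        {"data": {"b64": base64.b64encode(text.encode("utf-8")).decode("utf-8")}}
        for text in texts
    ]
    with open(path, "w") as f:
        json.dump({"instances": instances}, f, indent=2)
    return path


sample_text = "Take away the CGI and the A-list cast and you end up with film with less punch."
example_path = os.path.join(tempfile.mkdtemp(), "instances.json")
write_instances_file([sample_text], example_path)

with open(example_path) as f:
    print(f.read())
```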

In [None]:
INPUT_FILE = "./instances.json"

In [None]:
%%bash

cat > ./instances.json <<END
{ 
   "instances": [
     { 
       "data": {
         "b64": "$(echo 'Take away the CGI and the A-list cast and you end up with film with less punch.' | base64 --wrap=0)"
       }
     }
   ]
}
END

Run and send requests to the container locally. In this test, you run a health check and a prediction request.

In [None]:
with local_model.deploy_to_local_endpoint(
    artifact_uri=f"{ARCHIVED_MODEL_PATH}",
) as local_endpoint:
    health_check_response = local_endpoint.run_health_check()

    predict_response = local_endpoint.predict(
        request_file=INPUT_FILE,
        headers={"Content-Type": "application/json"},
    )

Print out the health check response and its content.

In [None]:
print(health_check_response, health_check_response.content)

Print out the predict response and its content.

In [None]:
print(predict_response, predict_response.content)

Also print out all the container logs.

In [None]:
local_endpoint.print_container_logs(show_all=True)

#### **Deploying the pre-built PyTorch container to Vertex AI Predictions**

You create a model resource on Vertex AI and deploy the model to a Vertex AI Endpoint. You must deploy a model to an endpoint before using the model. The deployed model runs the pre-built PyTorch image to serve predictions.

In [None]:
ARCHIVED_MODEL_GCS_URI = f"{BUCKET_NAME}/archived-pytorch-model"

Copy the archived model to GCS.

In [None]:
!gsutil cp -r $ARCHIVED_MODEL_PATH $ARCHIVED_MODEL_GCS_URI

Validate that the archived model file exists in the Cloud Storage bucket. To deploy PyTorch models using Vertex AI Prediction pre-built PyTorch images, you must have a `model.mar` file directly under the artifact URI.
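
The "directly under the artifact URI" requirement can be expressed as a simple check on a list of object URIs, for example the lines printed by `gsutil ls`. This is a minimal sketch; the helper `has_model_archive` and the bucket name in the sample listing are hypothetical.

```python
def has_model_archive(artifact_uri, object_uris):
    """Return True if model.mar sits directly under artifact_uri."""
    expected = artifact_uri.rstrip("/") + "/model.mar"
    return expected in object_uris


# Hypothetical listing, shaped like the output of `gsutil ls`
listing = ["gs://example-bucket/archived-pytorch-model/model.mar"]
print(has_model_archive("gs://example-bucket/archived-pytorch-model", listing))

# A nested archive does not satisfy the requirement
nested = ["gs://example-bucket/archived-pytorch-model/archived_model/model.mar"]
print(has_model_archive("gs://example-bucket/archived-pytorch-model", nested))
```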

In [None]:
! gsutil ls -al $ARCHIVED_MODEL_GCS_URI

##### **Initialize the Vertex AI SDK for Python**

In [None]:
from google.cloud import aiplatform

aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_NAME)

##### **Create a Model resource with the pre-built PyTorch image**

In [None]:
VERSION = 1
model_display_name = f"pytorch-v{VERSION}-{TIMESTAMP}"
model_description = "PyTorch based text classifier with the pre-built PyTorch image"

##### **Option 1. Create a Model resource through the LocalModel (if you have run the container locally)**

In [None]:
model = aiplatform.Model.upload(
    local_model=local_model,
    display_name=model_display_name,
    description=model_description,
    artifact_uri=ARCHIVED_MODEL_GCS_URI,
)

model.wait()

print(model.display_name)
print(model.resource_name)

##### **Option 2. Create a Model resource through Model (if you have NOT run the container locally)**

In [None]:
serving_container_image_uri = "[your-multi-region]-docker.pkg.dev/vertex-ai/prediction/pytorch-cpu.1-11:latest"  # <---CHANGE THIS TO YOUR MULTI REGION, COULD BE `us`, `europe`, or `asia`

In [None]:
model = aiplatform.Model.upload(
    display_name=model_display_name,
    description=model_description,
    serving_container_image_uri=serving_container_image_uri,
    artifact_uri=ARCHIVED_MODEL_GCS_URI,
)

model.wait()

print(model.display_name)
print(model.resource_name)

For more context on uploading or importing a model, refer to the [documentation](https://cloud.google.com/vertex-ai/docs/general/import-model).

##### **Create an Endpoint for Model with pre-built PyTorch image**

In [None]:
endpoint_display_name = f"pytorch-endpoint-{TIMESTAMP}"
endpoint = aiplatform.Endpoint.create(display_name=endpoint_display_name)

##### **Deploy the Model to Endpoint**

Deploying a model associates physical resources with the model so it can serve online predictions with low latency. 

**NOTE:** This step takes a few minutes to deploy the resources.

In [None]:
traffic_percentage = 100
machine_type = "n1-standard-4"
deployed_model_display_name = model_display_name
sync = True

endpoint = model.deploy(
    endpoint=endpoint,
    deployed_model_display_name=deployed_model_display_name,
    machine_type=machine_type,
    traffic_percentage=traffic_percentage,
    sync=sync,
)

#### **Invoking the Endpoint with deployed Model using Vertex AI SDK to make predictions**

##### **List the deployed models of the endpoint**

In [None]:
endpoint.list_models()

##### **Formatting input for online prediction**

This notebook uses [TorchServe's KServe based inference API](https://pytorch.org/serve/inference_api.html#kserve-inference-api), which is also a [Vertex AI Predictions compatible format](https://cloud.google.com/vertex-ai/docs/predictions/custom-container-requirements#prediction). For online prediction requests, format the prediction input instances as JSON with base64 encoding as shown here:

```
[
    {
        "data": {
            "b64": "<base64 encoded string>"
        }
    }
]
```

Define sample texts to test predictions.

In [None]:
test_instances = [
    b"Jaw dropping visual effects and action! One of the best I have seen to date.",
    b"Take away the CGI and the A-list cast and you end up with film with less punch.",
]

##### **Sending an online prediction request**

Format the input text string, call the prediction endpoint with the formatted request, and get the response.

In [None]:
import base64
import json

print("=" * 100)
for instance in test_instances:
    print(f"Input text: \n\t{instance.decode('utf-8')}\n")
    b64_encoded = base64.b64encode(instance)
    test_instance = [{"data": {"b64": b64_encoded.decode("utf-8")}}]
    print(f"Formatted input: \n{json.dumps(test_instance, indent=4)}\n")
    prediction = endpoint.predict(instances=test_instance)
    print(f"Prediction response: \n\t{prediction}")
    print("=" * 100)

##### ***[Optional]*** **Make prediction requests using gcloud CLI**
You can also call the Vertex AI Endpoint to make predictions using [`gcloud beta ai endpoints predict`](https://cloud.google.com/sdk/gcloud/reference/beta/ai/endpoints/predict). 

The following cell shows how to make a prediction request to Vertex AI Endpoints using `gcloud` CLI: 

In [None]:
%%bash -s $REGION $endpoint_display_name

REGION=$1
endpoint_display_name=$2

# get endpoint id
echo "REGION = ${REGION}"
echo "ENDPOINT DISPLAY NAME = ${endpoint_display_name}"
endpoint_id=$(gcloud beta ai endpoints list --region ${REGION} --filter "display_name=${endpoint_display_name}" --format "value(ENDPOINT_ID)")
echo "ENDPOINT_ID = ${endpoint_id}"

# call prediction endpoint
input_text="Take away the CGI and the A-list cast and you end up with film with less punch."
echo "INPUT TEXT = ${input_text}"

prediction=$(
echo """
{ 
   "instances": [
     { 
       "data": {
         "b64": "$(echo ${input_text} | base64 --wrap=0)"
       }
     }
   ]
}
""" | gcloud beta ai endpoints predict ${endpoint_id} --region=$REGION --json-request -)

echo "PREDICTION RESPONSE = ${prediction}"

## Cleaning up 

### Cleaning up training and deployment resources

To clean up all Google Cloud resources used in this notebook, you can [delete the Google Cloud project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.

Otherwise, you can delete the individual resources you created in this tutorial:

- Model
- Endpoint
- Cloud Storage Bucket


Set flags for the resource types to be deleted:

In [None]:
delete_endpoint = True
delete_model = True
delete_bucket = False

#### **Undeploy models and delete endpoints**

In [None]:
if delete_endpoint:
    endpoint.delete(force=True)

#### **Deleting models**

In [None]:
if delete_model:
    model.delete()

#### **Delete contents from the staging bucket**

---

***NOTE: Everything in this Cloud Storage bucket will be DELETED. Please run it with caution.***

---

In [None]:
if delete_bucket and "BUCKET_NAME" in globals():
    print(f"Deleting all contents from the bucket {BUCKET_NAME}")

    shell_output = ! gsutil du -as $BUCKET_NAME
    print(
        f"Size of the bucket {BUCKET_NAME} before deleting = {shell_output[0].split()[0]} bytes"
    )

    # uncomment below line to delete contents of the bucket
    # ! gsutil rm -r $BUCKET_NAME

    shell_output = ! gsutil du -as $BUCKET_NAME
    if float(shell_output[0].split()[0]) > 0:
        print(
            "PLEASE UNCOMMENT LINE TO DELETE BUCKET. CONTENT FROM THE BUCKET NOT DELETED"
        )

### Cleaning up Notebook Environment

After you are done experimenting, you can either [STOP](https://cloud.google.com/ai-platform/notebooks/docs/shut-down) or DELETE the Notebooks instance to avoid further charges. If you want to save your work, stop the instance instead of deleting it.

```
# Stopping Notebook instance
gcloud notebooks instances stop example-instance --location=us-central1-a


# Deleting Notebook instance
gcloud notebooks instances delete example-instance --location=us-central1-a
```