In [None]:
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Training, tuning and deploying a PyTorch text sentiment classification model on Vertex AI

<table align="left">
  <td>
    <a href="https://colab.research.google.com/github/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/training/pytorch-text-sentiment-classification-custom-train-deploy.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Colab logo"> Run in Colab
    </a>
  </td>
  <td>
    <a href="https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/training/pytorch-text-sentiment-classification-custom-train-deploy.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo">
      View on GitHub
    </a>
  </td>
  <td>
<a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/main/notebooks/official/training/pytorch-text-sentiment-classification-custom-train-deploy.ipynb" target='_blank'>
      <img src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo">
      Open in Vertex AI Workbench
     </a>
  </td>
</table>
<br/><br/><br/>

## Overview

This notebook demonstrates building and deploying a text sentiment classification model by fine-tuning a pre-trained [BERT](https://huggingface.co/bert-base-cased) model using Vertex AI and PyTorch. This example is inspired by the Hugging Face [Token_Classification](https://github.com/huggingface/notebooks/blob/master/examples/token_classification.ipynb) notebook and the [Run_Glue](https://github.com/huggingface/transformers/blob/v2.5.0/examples/run_glue.py) script.

You can find more details about the model at [Hugging Face Hub](https://huggingface.co/bert-base-cased). For more notebooks featuring state-of-the-art PyTorch, TensorFlow, and JAX models, explore [Hugging Face Notebooks](https://huggingface.co/transformers/notebooks.html).

Learn more about [Custom training](https://cloud.google.com/vertex-ai/docs/training/custom-training).

### Objective

In this tutorial, you learn to build, train, tune and deploy a PyTorch model on [Vertex AI](https://cloud.google.com/vertex-ai). You mainly focus on support for custom model training and deployment on Vertex AI. 


This tutorial uses the following Google Cloud ML services:

- Vertex AI `Workbench`
- Vertex AI `Training` (Custom Python Package Training)
- Vertex AI `Model Registry`
- Vertex AI `Endpoint`

The steps performed include:

- Create training package for the text classification model.
- Train the model with custom training on Vertex AI.
- Check the created model artifacts.
- Create a custom container for predictions.
- Deploy the trained model to a Vertex AI Endpoint using the custom container for predictions.
- Send online prediction requests to the deployed model and validate.
- Clean up the resources created in this notebook.

### Dataset

The dataset used for this tutorial is the [Happy Moments dataset](https://www.kaggle.com/ritresearch/happydb) from [Kaggle Datasets](https://www.kaggle.com/ritresearch/happydb). The version of the dataset you use in this tutorial is stored in a public Cloud Storage bucket.

More information about this dataset can be found on [the HappyDB website](https://rit-public.github.io/HappyDB/).

### Costs 

This tutorial uses billable components of Google Cloud:

* Vertex AI
* Cloud Storage
* Cloud Build
* Artifact Registry

Learn about [Vertex AI
pricing](https://cloud.google.com/vertex-ai/pricing), [Cloud Storage
pricing](https://cloud.google.com/storage/pricing), [Cloud Build pricing](https://cloud.google.com/build/pricing), [Artifact Registry pricing](https://cloud.google.com/artifact-registry/pricing) and use the [Pricing
Calculator](https://cloud.google.com/products/calculator/)
to generate a cost estimate based on your projected usage.

## Installation

Install the packages required for executing this notebook.

In [None]:
! pip3 install --upgrade --quiet google-cloud-aiplatform

### Colab only: Uncomment the following cell to restart the kernel.

In [None]:
# Automatically restart kernel after installs so that your environment can access the new packages
# import IPython

# app = IPython.Application.instance()
# app.kernel.do_shutdown(True)

#### Set your project ID

**If you don't know your project ID**, try the following:
* Run `gcloud config list`.
* Run `gcloud projects list`.
* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)

In [None]:
PROJECT_ID = "[your-project-id]"  # @param {type:"string"}

# Set the project id
! gcloud config set project {PROJECT_ID}

#### Set the region

**Optional**: Update the 'REGION' variable to specify the region that you want to use. Learn more about [Vertex AI regions](https://cloud.google.com/vertex-ai/docs/general/locations).

In [None]:
REGION = "us-central1"  # @param {type: "string"}

#### UUID

If you are in a live tutorial session, you might be using a shared test account or project. To avoid name collisions between users on the resources created, you generate a UUID for each session and append it to the names of the resources you create in this tutorial.

In [None]:
import random
import string


# Generate a uuid of a specified length (default=8)
def generate_uuid(length: int = 8) -> str:
    return "".join(random.choices(string.ascii_lowercase + string.digits, k=length))


UUID = generate_uuid()

### Authenticate your Google Cloud account

To authenticate your Google Cloud account, follow the instructions for your Jupyter environment:

**1. Vertex AI Workbench**
<br>You are already authenticated.

**2. Local JupyterLab instance**
<br>Uncomment and run the following code:

In [None]:
# ! gcloud auth login

**3. Colab**
<br>Uncomment and run the following code:

In [None]:
# from google.colab import auth

# auth.authenticate_user()

**4. Service account or other**
* See how to grant Cloud Storage permissions to your service account at https://cloud.google.com/storage/docs/gsutil/commands/iam#ch-examples.

### Create a Cloud Storage bucket

Create a storage bucket to store intermediate artifacts such as datasets.

In [None]:
BUCKET_URI = f"gs://your-bucket-name-{PROJECT_ID}-unique"  # @param {type:"string"}

**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket.

In [None]:
! gsutil mb -l $REGION -p $PROJECT_ID $BUCKET_URI

### Import libraries and define constants

Import the required libraries for this notebook.

In [None]:
import base64
import json
import os
import sys

from google.cloud import aiplatform
from google.protobuf.json_format import MessageToDict

Define the constants needed for this tutorial.

In [None]:
# Name for the package application / model / repository
APP_NAME = "finetuned-bert-classifier"

# URI for the pre-built container for custom training
PRE_BUILT_TRAINING_CONTAINER_IMAGE_URI = (
    "us-docker.pkg.dev/vertex-ai/training/pytorch-gpu.1-11:latest"
)

# Name of the folder where the python package needs to be stored
PYTHON_PACKAGE_APPLICATION_DIR = "python_package"

# Path to the source distribution tar of the python package
source_package_file_name = f"{PYTHON_PACKAGE_APPLICATION_DIR}/dist/trainer-0.1.tar.gz"

# GCS path where the python package is stored
python_package_gcs_uri = (
    f"{BUCKET_URI}/pytorch-on-gcp/{APP_NAME}/train/python_package/trainer-0.1.tar.gz"
)

# Module name for training application
python_module_name = "trainer.task"

# Training job's display name
JOB_NAME = f"{APP_NAME}-pytorch-pkg-train-{UUID}"

# Set training job's machine-type
TRAIN_MACHINE_TYPE = "n1-standard-8"
# Set training job's accelerator type
TRAIN_ACCELERATOR_TYPE = "NVIDIA_TESLA_V100"
# Set no. of h/w accelerators needed for the training job
TRAIN_ACCELERATOR_COUNT = 1

# Set the name of the container image for prediction
CUSTOM_PREDICTOR_IMAGE_URI = (
    f"{REGION}-docker.pkg.dev/{PROJECT_ID}/{APP_NAME}/pytorch_predict_{APP_NAME}:latest"
)

# Set the version for model-deployment
VERSION = 1
# Set the model display name
model_display_name = f"{APP_NAME}-v{VERSION}"
# Set the model description
model_description = "PyTorch based text classifier with custom container"

# Set the health route for prediction container
health_route = "/ping"
# Set the predict route for prediction container
predict_route = f"/predictions/{APP_NAME}"
# Set the serving container ports for prediction
serving_container_ports = [7080]

# Set the display name for endpoint
endpoint_display_name = f"{APP_NAME}-endpoint"
# Set the machine-type for deployment
DEPLOY_MACHINE_TYPE = "n1-standard-4"

### Initialize the Vertex AI SDK for Python

In [None]:
aiplatform.init(project=PROJECT_ID, staging_bucket=BUCKET_URI)

## Custom Training on Vertex AI

__Recommended Training Application Structure__

You can structure your training application in any way you like. However, the [following structure](https://cloud.google.com/vertex-ai/docs/training/create-python-pre-built-container#structure) is commonly used in Vertex AI samples, and having your project organized similarly can make it easier for you to follow the samples.

The following `python_package` directory structure shows a sample packaging approach.

```
├── python_package
│   ├── setup.py
│   └── trainer
│       ├── __init__.py
│       ├── experiment.py
│       ├── metadata.py
│       ├── model.py
│       ├── task.py
│       └── utils.py
└── pytorch-text-sentiment-classification-custom-train-deploy.ipynb    --> This notebook
```

* The main project directory contains your `setup.py` file with the dependencies.
* Inside the `trainer` directory:
    - `task.py` - The main application module. It initializes and parses the task arguments (hyperparameters) and serves as the entry point to the trainer.
    - `model.py` - Includes a function to create a model with a sequence classification head from a pre-trained model.
    - `experiment.py` - Runs the model training and evaluation experiment, and exports the final model.
    - `metadata.py` - Defines the metadata for classification tasks, such as the pre-trained model name, dataset paths, and target labels.
    - `utils.py` - Includes utility functions, such as those for reading data and saving models to Cloud Storage buckets.
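
Before building the source distribution, it can help to confirm that the layout above is complete. The following is a minimal sketch, not part of the original tutorial: the helper name `missing_package_files` is hypothetical, and the file list simply mirrors the directory tree shown above.

```python
from pathlib import Path

# Files listed in the directory structure above, relative to the package root.
EXPECTED_FILES = [
    "setup.py",
    "trainer/__init__.py",
    "trainer/experiment.py",
    "trainer/metadata.py",
    "trainer/model.py",
    "trainer/task.py",
    "trainer/utils.py",
]


def missing_package_files(package_root):
    """Return the expected files that are not present under package_root."""
    root = Path(package_root)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


# "python_package" is the directory name used in this notebook.
missing = missing_package_files("python_package")
print("Missing files:", missing or "none")
```

Running the check again after all `%%writefile` cells have executed should report no missing files.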

### Create files required for the Python package

Create directories for the python package.

In [None]:
!mkdir -p python_package/trainer
!mkdir -p python_package/scripts
!touch ./python_package/trainer/__init__.py

Create the `model.py` file that returns the specified pre-trained model.

In [None]:
%%writefile ./python_package/trainer/model.py

from transformers import AutoModelForSequenceClassification
from trainer import metadata

def create(num_labels):
    """create the model by loading a pretrained model or define your 
    own

    Args:
      num_labels: number of target labels
    """
    # Load the pre-trained model with a sequence classification head
    model = AutoModelForSequenceClassification.from_pretrained(
        metadata.PRETRAINED_MODEL_NAME,
        num_labels=num_labels
    )
    
    return model

Create the `utils.py` file that defines utility functions for data-loading, preprocessing, and model-saving.

In [None]:
%%writefile ./python_package/trainer/utils.py

import os
import datetime
import pandas as pd

from google.cloud import storage

from transformers import AutoTokenizer
from datasets import Dataset, DatasetDict
from trainer import metadata


def preprocess_function(examples):
    tokenizer = AutoTokenizer.from_pretrained(
        metadata.PRETRAINED_MODEL_NAME,
        use_fast=True,
    )
    
    # Tokenize the texts
    tokenizer_args = (
        (examples['text'],) 
    )
    result = tokenizer(*tokenizer_args, 
                       padding='max_length', 
                       max_length=metadata.MAX_SEQ_LENGTH, 
                       truncation=True)
    
    # Use the explicit label-to-ID mapping defined in metadata.py instead of
    # inferring the labels from the dataset
    
    label_to_id = metadata.TARGET_LABELS
    
    # Map labels to IDs (not necessary for GLUE tasks)
    if label_to_id is not None and "label" in examples:
        result["label"] = [label_to_id[l] for l in examples["label"]]

    return result


def load_data(args):
    """Loads the train and test CSV files and returns them as tokenized datasets.

        Args:
            args: arguments passed to the python script
    """
    # Read the train and test splits from the CSV paths defined in metadata.py
    
    df_train = pd.read_csv(metadata.TRAIN_DATA)
    df_test = pd.read_csv(metadata.TEST_DATA)
    
    dataset = DatasetDict({"train": Dataset.from_pandas(df_train),"test": Dataset.from_pandas(df_test)})

    dataset = dataset.map(preprocess_function, 
                          batched=True, 
                          load_from_cache_file=True)

    train_dataset, test_dataset = dataset["train"], dataset["test"]

    return train_dataset, test_dataset


def save_model(args):
    """Saves the model to Google Cloud Storage or local file system

    Args:
      args: contains name for saved model.
    """
    scheme = 'gs://'
    if args.job_dir.startswith(scheme):
        job_dir = args.job_dir.split("/")
        bucket_name = job_dir[2]
        object_prefix = "/".join(job_dir[3:]).rstrip("/")

        if object_prefix:
            model_path = '{}/{}'.format(object_prefix, args.model_name)
        else:
            model_path = '{}'.format(args.model_name)

        bucket = storage.Client().bucket(bucket_name)    
        local_path = os.path.join("/tmp", args.model_name)
        files = [f for f in os.listdir(local_path) if os.path.isfile(os.path.join(local_path, f))]
        for file in files:
            local_file = os.path.join(local_path, file)
            blob = bucket.blob("/".join([model_path, file]))
            blob.upload_from_filename(local_file)
        print(f"Saved model files in gs://{bucket_name}/{model_path}")
    else:
        print(f"Saved model files at {os.path.join('/tmp', args.model_name)}")
        print(f"To save model files in GCS bucket, please specify job_dir starting with gs://")
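
The path handling in `save_model` can be looked at in isolation. The sketch below mirrors the same string operations with only the standard library; the helper names `split_gcs_uri` and `model_object_path` and the example bucket are hypothetical and used only for illustration.

```python
def split_gcs_uri(uri):
    """Split 'gs://bucket/some/prefix/' into ('bucket', 'some/prefix')."""
    scheme = "gs://"
    if not uri.startswith(scheme):
        raise ValueError(f"Expected a URI starting with {scheme!r}, got {uri!r}")
    parts = uri.split("/")
    # parts looks like ['gs:', '', '<bucket>', '<prefix part>', ...]
    bucket_name = parts[2]
    object_prefix = "/".join(parts[3:]).rstrip("/")
    return bucket_name, object_prefix


def model_object_path(job_dir, model_name):
    """Return the object path under which the model files are uploaded."""
    _, object_prefix = split_gcs_uri(job_dir)
    return f"{object_prefix}/{model_name}" if object_prefix else model_name


# Hypothetical job directory, for illustration only.
print(split_gcs_uri("gs://example-bucket/aiplatform-custom-training/model/"))
print(model_object_path("gs://example-bucket/aiplatform-custom-training/model/", "demo-model"))
```

Stripping the trailing slash before joining avoids a double slash in the object names.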


Create the `metadata.py` file for defining the constants used in the training application.

In [None]:
%%writefile ./python_package/trainer/metadata.py

# Task type can be either 'classification', 'regression', or 'custom'.
# This is based on the target feature in the dataset.
TASK_TYPE = 'classification'

# Dataset paths
    
TRAIN_DATA = "gs://cloud-samples-data/ai-platform-unified/datasets/text/happydb/happydb_train.csv"
TEST_DATA = "gs://cloud-samples-data/ai-platform-unified/datasets/text/happydb/happydb_test.csv"

# pre-trained model name
PRETRAINED_MODEL_NAME = 'bert-base-cased'

# List of the class values (labels) in a classification dataset.
TARGET_LABELS = {"leisure": 0, "exercise":1, "enjoy_the_moment":2, "affection":3,"achievement":4, "nature":5, "bonding":6}


# maximum sequence length
MAX_SEQ_LENGTH = 128

Create the `experiment.py` file which defines the functions for hyperparameter tuning and training. 

In [None]:
%%writefile ./python_package/trainer/experiment.py

import os
import numpy as np
import hypertune

from transformers import (
    AutoTokenizer,
    EvalPrediction,
    Trainer,
    TrainingArguments,
    default_data_collator,
    TrainerCallback
)

from trainer import model, metadata, utils


class HPTuneCallback(TrainerCallback):
    """
    A custom callback class that reports a metric to hypertuner
    at the end of each epoch.
    """
    
    def __init__(self, metric_tag, metric_value):
        super(HPTuneCallback, self).__init__()
        self.metric_tag = metric_tag
        self.metric_value = metric_value
        self.hpt = hypertune.HyperTune()
        
    def on_evaluate(self, args, state, control, **kwargs):
        print(f"HP metric {self.metric_tag}={kwargs['metrics'][self.metric_value]}")
        self.hpt.report_hyperparameter_tuning_metric(
            hyperparameter_metric_tag=self.metric_tag,
            metric_value=kwargs['metrics'][self.metric_value],
            global_step=state.epoch)


def compute_metrics(p: EvalPrediction):
    preds = p.predictions[0] if isinstance(p.predictions, tuple) else p.predictions
    preds = np.argmax(preds, axis=1)
    return {"accuracy": (preds == p.label_ids).astype(np.float32).mean().item()}


def train(args, model, train_dataset, test_dataset):
    """Create the training loop to load pretrained model and tokenizer and 
    start the training process

    Args:
      args: read arguments from the runner to set training hyperparameters
      model: The neural network that you are training
      train_dataset: The training dataset
      test_dataset: The test dataset for evaluation
    """
    
    # initialize the tokenizer
    tokenizer = AutoTokenizer.from_pretrained(
        metadata.PRETRAINED_MODEL_NAME,
        use_fast=True,
    )
    
    # set training arguments
    training_args = TrainingArguments(
        evaluation_strategy="epoch",
        learning_rate=args.learning_rate,
        per_device_train_batch_size=args.batch_size,
        per_device_eval_batch_size=args.batch_size,
        num_train_epochs=args.num_epochs,
        weight_decay=args.weight_decay,
        output_dir=os.path.join("/tmp", args.model_name)
    )
    
    # initialize our Trainer
    trainer = Trainer(
        model,
        training_args,
        train_dataset=train_dataset,
        eval_dataset=test_dataset,
        data_collator=default_data_collator,
        tokenizer=tokenizer,
        compute_metrics=compute_metrics
    )
    
    # add hyperparameter tuning callback to report metrics when enabled
    if args.hp_tune == "y":
        trainer.add_callback(HPTuneCallback("accuracy", "eval_accuracy"))
    
    # training
    trainer.train()
    
    return trainer


def run(args):
    """Load the data, train, evaluate, and export the model for serving and
     evaluating.

    Args:
      args: experiment parameters.
    """
    # Open our dataset
    train_dataset, test_dataset = utils.load_data(args)

    label_list = train_dataset.unique("label")
    num_labels = len(label_list)
    
    # Create the model with one output per label found in the training data
    text_classifier = model.create(num_labels=num_labels)
    
    # Train / Test the model
    trainer = train(args, text_classifier, train_dataset, test_dataset)

    metrics = trainer.evaluate(eval_dataset=test_dataset)
    trainer.save_metrics("all", metrics)

    # Export the trained model
    trainer.save_model(os.path.join("/tmp", args.model_name))

    # Save the model to GCS
    if args.job_dir:
        utils.save_model(args)
    else:
        print(f"Saved model files at {os.path.join('/tmp', args.model_name)}")
        print(f"To save model files in GCS bucket, please specify job_dir starting with gs://")
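
To make the metric concrete, here is a minimal pure-Python sketch of what `compute_metrics` calculates: take the highest-scoring class for each row of logits and compare it with the true label. The toy logits and labels are made up. During evaluation, `Trainer` prefixes metric names with `eval_`, which is why the callback above reads `eval_accuracy`.

```python
def argmax(values):
    """Index of the largest value in a sequence (first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def accuracy_from_logits(logits, label_ids):
    """Fraction of rows whose highest-scoring class equals the true label."""
    predictions = [argmax(row) for row in logits]
    correct = sum(int(p == y) for p, y in zip(predictions, label_ids))
    return correct / len(label_ids)


# Three made-up examples over three classes; the second one is misclassified.
toy_logits = [[2.0, 0.1, -1.0], [0.3, 0.2, 1.5], [-0.5, 3.0, 0.0]]
toy_labels = [0, 1, 1]
print(accuracy_from_logits(toy_logits, toy_labels))
```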


Create the `task.py` which is the main file that runs the training application.

In [None]:
%%writefile ./python_package/trainer/task.py

import argparse
import os

from trainer import experiment


def get_args():
    """Define the task arguments with the default values.

    Returns:
        experiment parameters
    """
    args_parser = argparse.ArgumentParser()


    # Experiment arguments
    args_parser.add_argument(
        '--batch-size',
        help='Batch size for each training and evaluation step.',
        type=int,
        default=16)
    args_parser.add_argument(
        '--num-epochs',
        help='Number of passes over the training dataset.',
        default=1,
        type=int,
    )
    args_parser.add_argument(
        '--seed',
        help='Random seed (default: 42)',
        type=int,
        default=42,
    )

    # Estimator arguments
    args_parser.add_argument(
        '--learning-rate',
        help='Learning rate value for the optimizers.',
        default=2e-5,
        type=float)
    args_parser.add_argument(
        '--weight-decay',
        help="""
      Weight decay coefficient that the AdamW optimizer applies to all
      weights except bias and LayerNorm weights.

      If set to 0, then no weight decay is applied.
      """,
        default=0.01,
        type=float)

    # Enable hyperparameter tuning
    args_parser.add_argument(
        '--hp-tune',
        default="n",
        help='Enable hyperparameter tuning. Valid values are: "y" - enable, "n" - disable')
    
    # Saved model arguments
    args_parser.add_argument(
        '--job-dir',
        default=os.getenv('AIP_MODEL_DIR'),
        help='GCS location to export models')
    args_parser.add_argument(
        '--model-name',
        default="finetuned-bert-classifier",
        help='The name of your saved model')

    return args_parser.parse_args()


def main():
    """Setup / Start the experiment
    """
    args = get_args()
    print(args)
    experiment.run(args)


if __name__ == '__main__':
    main()
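
The following sketch shows how `argparse` maps dashed flags to attributes, using a reduced copy of the parser above (three flags only; `build_parser` is a hypothetical helper name). A flag such as `--num-epochs` becomes `args.num_epochs`, and flags that are not passed keep their defaults.

```python
import argparse


def build_parser():
    """A reduced copy of the task.py parser with three of its flags."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--num-epochs", type=int, default=1)
    parser.add_argument("--learning-rate", type=float, default=2e-5)
    parser.add_argument("--model-name", default="finetuned-bert-classifier")
    return parser


# Passing an explicit list makes argparse parse it instead of sys.argv.
args = build_parser().parse_args(["--num-epochs", "2", "--model-name", "demo-model"])
print(args.num_epochs, args.learning_rate, args.model_name)
```

The same list-of-strings form is what the training job passes to the module as command-line arguments.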


The following is the `setup.py` file for the training application. The `find_packages()` function inside `setup.py` includes the `trainer` directory in the package because it contains an `__init__.py` file, which marks the directory as a Python package that [Python Setuptools](https://setuptools.readthedocs.io/en/latest/) can discover.

In `setup.py`, you also specify the Python packages that are required for the training application such as `transformers`, `datasets`, `cloudml-hypertune` and `tqdm`.

In [None]:
%%writefile ./{PYTHON_PACKAGE_APPLICATION_DIR}/setup.py

from setuptools import find_packages
from setuptools import setup


REQUIRED_PACKAGES = [
    'transformers',
    'datasets',
    'tqdm',
    'cloudml-hypertune'
]

setup(
    name='trainer',
    version='0.1',
    install_requires=REQUIRED_PACKAGES,
    packages=find_packages(),
    include_package_data=True,
    description='Vertex AI | Training | PyTorch | Text Classification | Python Package'
)

Run the following command to create a source distribution.

In [None]:
!cd {PYTHON_PACKAGE_APPLICATION_DIR} && python3 setup.py sdist --formats=gztar

Now upload the source distribution with the training application to the Cloud Storage bucket.

In [None]:
!gsutil cp {source_package_file_name} {python_package_gcs_uri}

Validate that the source distribution exists in the Cloud Storage bucket.

In [None]:
!gsutil ls -l {python_package_gcs_uri}

### Run a custom job in Vertex AI using a pre-built container

In this notebook, you are using Hugging Face Datasets and fine-tuning a transformer model from the Hugging Face Transformers library for sentiment analysis tasks using PyTorch. You don't need to build a PyTorch environment from scratch for running the training application because Vertex AI provides [pre-built containers](https://cloud.google.com/vertex-ai/docs/training/pre-built-containers#available_container_images). 

Vertex AI pre-built containers are Docker container images that you can use for custom training. They include some common dependencies used in training code based on the machine learning framework and framework version.

You use a [pre-built container for PyTorch](https://cloud.google.com/vertex-ai/docs/training/pre-built-containers#pytorch) and the packaged training application to run the training job on Vertex AI.

Configure a [Custom Job](https://cloud.google.com/vertex-ai/docs/training/create-custom-job) with the [pre-built container](https://cloud.google.com/vertex-ai/docs/training/pre-built-containers) image for PyTorch and training code packaged as Python source distribution. 

In [None]:
job = aiplatform.CustomPythonPackageTrainingJob(
    display_name=JOB_NAME,
    python_package_gcs_uri=python_package_gcs_uri,
    python_module_name=python_module_name,
    container_uri=PRE_BUILT_TRAINING_CONTAINER_IMAGE_URI,
)

Run the Custom training job with the following parameters:
- `machine_type`: Machine type on which the job needs to run.
- `accelerator_type`: Hardware accelerator type for running the job. One of _ACCELERATOR_TYPE_UNSPECIFIED_, _NVIDIA_TESLA_K80_, _NVIDIA_TESLA_P100_, _NVIDIA_TESLA_V100_, _NVIDIA_TESLA_P4_, _NVIDIA_TESLA_T4_, _NVIDIA_TESLA_A100_.
- `accelerator_count`: The number of accelerators to attach to a worker replica.
- `replica_count`: The number of worker replicas.
- `args`: Command line arguments to be passed to the Python script.

Learn more about Vertex AI's [Custom Python Package Training](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.CustomPythonPackageTrainingJob).

*Note*: This training job may take over 24 hours.

In [None]:
if os.getenv("IS_TESTING"):
    sys.exit(0)

In [None]:
training_args = ["--num-epochs", "2", "--model-name", APP_NAME]

model = job.run(
    replica_count=1,
    machine_type=TRAIN_MACHINE_TYPE,
    accelerator_type=TRAIN_ACCELERATOR_TYPE,
    accelerator_count=TRAIN_ACCELERATOR_COUNT,
    args=training_args,
)

Validate that the model artifacts are written to Cloud Storage by the training code after the job completes successfully.

In [None]:
job_response = MessageToDict(job._gca_resource._pb)
GCS_MODEL_ARTIFACTS_URI = job_response["trainingTaskInputs"]["baseOutputDirectory"][
    "outputUriPrefix"
]
print(f"Model artifacts are available at {GCS_MODEL_ARTIFACTS_URI}")

In [None]:
!gsutil ls -lr $GCS_MODEL_ARTIFACTS_URI/

## Deployment

Deploying a PyTorch [model on Vertex AI](https://cloud.google.com/vertex-ai/docs/predictions/getting-predictions) requires you to use a custom container that serves online predictions on a Vertex AI Endpoint. You deploy a container running [PyTorch's TorchServe](https://pytorch.org/serve/) tool in order to serve predictions from the fine-tuned transformer model for a sentiment analysis task. You can then use Vertex AI's online prediction service to classify the sentiment of input texts.

### Deploying a model on Vertex AI using a custom container

To use a custom container to serve predictions from a PyTorch model, you must provide Vertex AI with a Docker container image that runs an HTTP server application, such as TorchServe in this case. Learn more about the [prediction container requirements on Vertex AI](https://cloud.google.com/vertex-ai/docs/predictions/custom-container-requirements).

Essentially, the following steps are needed to deploy a PyTorch model on Vertex AI:

1. Package the trained model artifacts including [default](https://pytorch.org/serve/#default-handlers) or [custom](https://pytorch.org/serve/custom_service.html) handlers by creating an archive file using [Torch model archiver](https://github.com/pytorch/serve/tree/master/model-archiver).
2. Build a [custom container](https://cloud.google.com/vertex-ai/docs/predictions/custom-container-requirements) compatible with Vertex AI to serve the model using TorchServe.
3. Upload the model with the custom container image to serve predictions as a Vertex AI model resource.
4. Create a Vertex AI Endpoint and [deploy the model](https://cloud.google.com/vertex-ai/docs/predictions/deploy-model-api) resource.
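
Step 1 can be sketched as the command line passed to the `torch-model-archiver` tool. The helper `build_archiver_command` is hypothetical, and the file paths (for example `pytorch_model.bin` and `config.json` under `predictor/model/`) are assumptions about where the artifacts end up, so adjust them to your own layout. The sketch only builds and prints the command; it does not run it.

```python
import shlex


def build_archiver_command(model_name, version, serialized_file, handler,
                           extra_files, export_path):
    """Build the argument list for the torch-model-archiver CLI."""
    return [
        "torch-model-archiver",
        "--force",  # overwrite an existing archive with the same name
        "--model-name", model_name,
        "--version", str(version),
        "--serialized-file", serialized_file,
        "--handler", handler,
        "--extra-files", ",".join(extra_files),  # comma-separated list
        "--export-path", export_path,
    ]


# All paths below are illustrative assumptions, not values from this notebook.
command = build_archiver_command(
    model_name="finetuned-bert-classifier",
    version=1,
    serialized_file="predictor/model/pytorch_model.bin",
    handler="predictor/custom_handler.py",
    extra_files=["predictor/model/config.json", "predictor/index_to_name.json"],
    export_path="predictor/model-store",
)
print(shlex.join(command))
```

Building the command as a list keeps each argument separate, so it can be handed to `subprocess.run` without shell quoting concerns.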


#### Create a custom model handler to handle prediction requests

When passing input text to the fine-tuned transformer model, the input text needs to be pre-processed. Once the model generates predictions, some post-processing has to be performed on the generated output to label it into the underlying classes and serve their probabilities (or confidence scores). 

To include steps such as pre-processing and post-processing, you create a custom handler script that is packaged with the model artifacts. Later, TorchServe executes the script when the model is deployed.

The custom handler script does the following:

- Pre-process input text before sending it to the model for inference
- Customize how the model is invoked for inference
- Post-process output from the model before sending back a response

Learn more about defining a custom handler from [TorchServe documentation](https://pytorch.org/serve/custom_service.html).
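
As an illustration of the post-processing step, the sketch below turns raw logits into a class name and a confidence score with a softmax, using only the standard library. The logits are made up and `label_with_confidence` is a hypothetical helper; the handler defined in this notebook returns only the predicted label.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)
    # Subtracting the maximum keeps exp() numerically stable.
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_with_confidence(logits, index_to_name):
    """Return the most likely class name and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return {"label": index_to_name[str(best)], "confidence": probs[best]}


# Class names follow the label mapping used in this notebook.
index_to_name = {
    "0": "leisure", "1": "exercise", "2": "enjoy_the_moment", "3": "affection",
    "4": "achievement", "5": "nature", "6": "bonding",
}
made_up_logits = [0.1, -1.2, 0.4, 3.1, 0.9, -0.3, 1.0]
print(label_with_confidence(made_up_logits, index_to_name))
```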

Create a directory to define a function for handling predictions.

In [None]:
!mkdir -p predictor

Create the `custom_handler.py` file that handles the prediction requests when deployed.

In [None]:
%%writefile predictor/custom_handler.py

import os
import json
import logging

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from ts.torch_handler.base_handler import BaseHandler

logger = logging.getLogger(__name__)


class TransformersClassifierHandler(BaseHandler):
    """
    The handler takes an input string and returns the classification text 
    based on the serialized transformers checkpoint.
    """
    def __init__(self):
        super(TransformersClassifierHandler, self).__init__()
        self.initialized = False

    def initialize(self, ctx):
        """ Loads the model.pt file and initializes the model object.
        Instantiates the tokenizer for the preprocessor to use.
        Loads the label-to-name mapping file for post-processing the inference response.
        """
        self.manifest = ctx.manifest

        properties = ctx.system_properties
        model_dir = properties.get("model_dir")
        self.device = torch.device("cuda:" + str(properties.get("gpu_id")) if torch.cuda.is_available() else "cpu")

        # Read model serialize/pt file
        serialized_file = self.manifest["model"]["serializedFile"]
        model_pt_path = os.path.join(model_dir, serialized_file)
        if not os.path.isfile(model_pt_path):
            raise RuntimeError("Missing the model.pt or pytorch_model.bin file")
        
        # Load model
        self.model = AutoModelForSequenceClassification.from_pretrained(model_dir)
        self.model.to(self.device)
        self.model.eval()
        logger.debug('Transformer model from path {0} loaded successfully'.format(model_dir))
        
        # Ensure to use the same tokenizer used during training
        self.tokenizer = AutoTokenizer.from_pretrained('bert-base-cased')

        # Read the mapping file, index to object name
        mapping_file_path = os.path.join(model_dir, "index_to_name.json")

        if os.path.isfile(mapping_file_path):
            with open(mapping_file_path) as f:
                self.mapping = json.load(f)
        else:
            logger.warning('Missing the index_to_name.json file. Inference output defaults.')
            self.mapping = {"0": "Negative",  "1": "Positive"}

        self.initialized = True

    def preprocess(self, data):
        """ Preprocessing input request by tokenizing
            Extend with your own preprocessing steps as needed
        """
        text = data[0].get("data")
        if text is None:
            text = data[0].get("body")
        sentences = text.decode('utf-8')
        logger.info("Received text: '%s'", sentences)

        # Tokenize the texts
        tokenizer_args = ((sentences,))
        inputs = self.tokenizer(*tokenizer_args,
                                padding='max_length',
                                max_length=128,
                                truncation=True,
                                return_tensors = "pt")
        return inputs

    def inference(self, inputs):
        """ Predict the class of a text using a trained transformer model.
        """
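        # Pass the attention mask so that padding tokens are ignored,
        # and disable gradient tracking because this is inference only.
        with torch.no_grad():
            logits = self.model(
                input_ids=inputs['input_ids'].to(self.device),
                attention_mask=inputs['attention_mask'].to(self.device),
            )[0]
        prediction = logits.argmax().item()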

        if self.mapping:
            prediction = self.mapping[str(prediction)]

        logger.info("Model predicted: '%s'", prediction)
        return [prediction]

    def postprocess(self, inference_output):
        return inference_output


### Generate a file for class names

For the custom handler, create the following mapping file (`index_to_name.json`) that is used to associate the target labels with their meaningful names while formatting the prediction responses.

In [None]:
%%writefile ./predictor/index_to_name.json

{
    "0": "leisure",
    "1": "exercise",
    "2": "enjoy_the_moment",
    "3": "affection",
    "4": "achievement",
    "5": "nature",
    "6": "bonding"
}
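To illustrate how the handler uses this file, the following sketch maps a list of class scores to a label name in the same way the `inference` method does: take the index of the highest score, then look it up by its string key. The `label_for` helper and the `example_logits` values are made up for illustration, and the mapping is inlined rather than read from disk so that the snippet runs on its own.

```python
import json

# Same content as index_to_name.json, inlined so the sketch is self-contained.
INDEX_TO_NAME = json.loads("""
{
    "0": "leisure",
    "1": "exercise",
    "2": "enjoy_the_moment",
    "3": "affection",
    "4": "achievement",
    "5": "nature",
    "6": "bonding"
}
""")


def label_for(logits, mapping=INDEX_TO_NAME):
    """Return the label name for the highest-scoring class index."""
    best_index = max(range(len(logits)), key=lambda i: logits[i])
    # JSON object keys are always strings, so convert the index before the lookup.
    return mapping[str(best_index)]


# Hypothetical scores for a single input; index 3 holds the largest value.
example_logits = [0.1, -1.2, 0.4, 2.7, 0.9, -0.3, 1.1]
print(label_for(example_logits))
```

Because the keys are strings, a mapping file with missing indices only fails at lookup time, so it is worth keeping one entry per class the model was trained on.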

### Create a custom container image to serve predictions

Next, you use [Artifact Registry](https://cloud.google.com/artifact-registry) and [Cloud Build](https://cloud.google.com/build) to create the custom container image in the following steps:

#### Download the model artifacts

Download the model artifacts that the training (or hyperparameter tuning) job saved to Cloud Storage into a local directory.

First, validate that the model artifact files exist in the Cloud Storage bucket.

In [None]:
!gsutil ls -r $GCS_MODEL_ARTIFACTS_URI/model/

Copy the files from Cloud Storage to a local directory.

In [None]:
!gsutil -m cp -r $GCS_MODEL_ARTIFACTS_URI/model/ ./predictor/

In [None]:
!ls -ltrR ./predictor/model
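
Before building the image, it may help to confirm that the downloaded directory contains every file the Dockerfile in the next step passes to `torch-model-archiver`. The sketch below is one possible check; the `missing_model_files` helper and the example directory name are hypothetical, and the file list is copied from the `--serialized-file` and `--extra-files` arguments of that Dockerfile.

```python
from pathlib import Path

# File names referenced by the Dockerfile's torch-model-archiver step.
# index_to_name.json is left out because it is written to ./predictor directly.
EXPECTED_MODEL_FILES = [
    "pytorch_model.bin",
    "config.json",
    "tokenizer.json",
    "training_args.bin",
    "tokenizer_config.json",
    "special_tokens_map.json",
    "vocab.txt",
]


def missing_model_files(model_dir):
    """Return the expected file names that are not present in model_dir."""
    directory = Path(model_dir)
    return [name for name in EXPECTED_MODEL_FILES if not (directory / name).is_file()]


# Hypothetical directory; the real one depends on your APP_NAME value.
print(missing_model_files("./predictor/model/my-app"))
```

An empty list means all referenced files were found; any name that is printed would cause the archiver step of the image build to fail.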

#### Create a Dockerfile for the image

Create a Dockerfile with TorchServe as the base image by performing the following steps:

 - Install dependencies such as `transformers`.
 - Add model artifacts to the `/home/model-server/` directory in the container image.
 - Add the custom handler script to the `/home/model-server/` directory in the container image.
 - Create `/home/model-server/config.properties` to define the serving configuration (health and prediction listener ports).
 - Run [Torch model archiver](https://github.com/pytorch/serve/tree/master/model-archiver#creating-a-model-archive) to create a model archive file from the files copied into the `/home/model-server/` directory in the container image. The model archive is saved in the `/home/model-server/model-store/` directory as `<model-name>.mar`.
 - Launch the TorchServe HTTP server, which references the configuration properties and enables serving for the model.

In [None]:
%%bash -s $APP_NAME

APP_NAME=$1

cat << EOF > ./predictor/Dockerfile

FROM pytorch/torchserve:latest-cpu

# install dependencies
RUN python3 -m pip install --upgrade pip
RUN pip3 install transformers

USER model-server

# copy model artifacts, custom handler and other dependencies
COPY ./custom_handler.py /home/model-server/
COPY ./index_to_name.json /home/model-server/
COPY ./model/$APP_NAME/ /home/model-server/

# create torchserve configuration file
USER root
RUN printf "\nservice_envelope=json" >> /home/model-server/config.properties
RUN printf "\ninference_address=http://0.0.0.0:7080" >> /home/model-server/config.properties
RUN printf "\nmanagement_address=http://0.0.0.0:7081" >> /home/model-server/config.properties
USER model-server

# expose health and prediction listener ports from the image
EXPOSE 7080
EXPOSE 7081

# create model archive file packaging model artifacts and dependencies
RUN torch-model-archiver -f \
  --model-name=$APP_NAME \
  --version=1.0 \
  --serialized-file=/home/model-server/pytorch_model.bin \
  --handler=/home/model-server/custom_handler.py \
  --extra-files "/home/model-server/config.json,/home/model-server/tokenizer.json,/home/model-server/training_args.bin,/home/model-server/tokenizer_config.json,/home/model-server/special_tokens_map.json,/home/model-server/vocab.txt,/home/model-server/index_to_name.json" \
  --export-path=/home/model-server/model-store

# run the TorchServe HTTP server to respond to prediction requests
CMD ["torchserve", \
     "--start", \
     "--ts-config=/home/model-server/config.properties", \
     "--models", \
     "$APP_NAME=$APP_NAME.mar", \
     "--model-store", \
     "/home/model-server/model-store"]
EOF

echo "Writing ./predictor/Dockerfile"

#### Create a Docker repository

Create your own Docker repository in Artifact Registry, where you push the Docker image for serving predictions.

1. Run the `gcloud artifacts repositories create` command to create a new Docker repository with your specified region and description.

2. Run the `gcloud artifacts repositories list` command to verify that your repository is created.

Set `APP_NAME` to the name of your repository.

In [None]:
# Create the repository in Artifact Registry
! gcloud artifacts repositories create {APP_NAME} --repository-format=docker --location={REGION} --description="Docker repository"

# List all repositories and check your repository
! gcloud artifacts repositories list

#### Build the Docker image tagged with the image path

Next, you build the Docker image with Cloud Build and push it to the repository you created. Cloud Build locates the target repository from the path provided in the tag.

Learn more about [building and pushing a docker image with Cloud Build](https://cloud.google.com/build/docs/build-push-docker-image).
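
The tag passed to Cloud Build must point at the Artifact Registry repository. `CUSTOM_PREDICTOR_IMAGE_URI` is defined earlier in this notebook; the sketch below only illustrates the general shape of an Artifact Registry Docker image path, and assumes your variable follows it. The `artifact_registry_image_uri` helper and the argument values are hypothetical.

```python
def artifact_registry_image_uri(region, project_id, repository, image, tag="latest"):
    """Build an image path of the form REGION-docker.pkg.dev/PROJECT/REPOSITORY/IMAGE:TAG."""
    return f"{region}-docker.pkg.dev/{project_id}/{repository}/{image}:{tag}"


# Hypothetical values for illustration only.
print(artifact_registry_image_uri("us-central1", "my-project", "my-app", "my-predictor"))
```

If the region or repository segment of the tag does not match the repository created above, the push at the end of the build fails.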

In [None]:
!gcloud builds submit --region={REGION} --tag=$CUSTOM_PREDICTOR_IMAGE_URI ./predictor

### Deploying the serving container to Vertex AI

Next, you create a model resource on Vertex AI and deploy the model to a Vertex AI Endpoint. You must deploy a model to an endpoint for serving online predictions. The deployed model runs the custom container image to serve predictions. 

#### Create a Vertex AI Model resource

Create a Vertex AI model resource with the created model artifacts and the container image.
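
The upload call relies on `predict_route`, `health_route`, and `serving_container_ports`, which are defined earlier in this notebook. As a hedged sketch, the values below are what a TorchServe container configured like the Dockerfile above would be expected to use: TorchServe serves predictions at `/predictions/<model-name>` and health checks at `/ping`, and the inference listener was set to port 7080. The `torchserve_routes` helper is hypothetical, and your notebook variables may differ.

```python
def torchserve_routes(model_name, inference_port=7080):
    """Return serving settings that match a TorchServe container for model_name."""
    return {
        # TorchServe inference API: POST /predictions/{model_name}
        "predict_route": f"/predictions/{model_name}",
        # TorchServe health check: GET /ping
        "health_route": "/ping",
        # Port of the inference_address set in config.properties
        "serving_container_ports": [inference_port],
    }


# Hypothetical model name; in this notebook it would be the APP_NAME value.
print(torchserve_routes("my-app"))
```

The model name in the predict route has to match the name registered with `--models` when TorchServe starts, otherwise requests return a "model not found" error.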

In [None]:
model = aiplatform.Model.upload(
    display_name=model_display_name,
    description=model_description,
    serving_container_image_uri=CUSTOM_PREDICTOR_IMAGE_URI,
    serving_container_predict_route=predict_route,
    serving_container_health_route=health_route,
    serving_container_ports=serving_container_ports,
)

model.wait()

print(model.display_name)
print(model.resource_name)

#### Create a Vertex AI Endpoint

Create a Vertex AI Endpoint to deploy the registered Vertex AI model.

In [None]:
endpoint = aiplatform.Endpoint.create(display_name=endpoint_display_name)

#### Deploy the Model to Endpoint

Deploying a model associates physical resources with the model so it can serve online predictions with low latency. 

**NOTE:** It takes a few minutes to deploy the resources.

In [None]:
model.deploy(
    endpoint=endpoint,
    deployed_model_display_name=model_display_name,
    machine_type=DEPLOY_MACHINE_TYPE,
    sync=True,
)

## Send online prediction requests

Now, invoke the endpoint where the model is deployed using the Vertex AI SDK to make predictions for some test instances.

### Format input for online prediction

This notebook uses [TorchServe's KServe-based inference API](https://pytorch.org/serve/inference_api.html#kserve-inference-api), which is also a [format compatible with Vertex AI predictions](https://cloud.google.com/vertex-ai/docs/predictions/custom-container-requirements#prediction). For online prediction requests, format the prediction input instances as JSON with base64 encoding as follows:

```
[
    {
        "data": {
            "b64": "<base64 encoded string>"
        }
    }
]
```
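
As a small sketch of this format, the helpers below encode a text string into one instance and decode it back. The function names `to_instance` and `from_instance` are hypothetical; the decode step only mirrors what is assumed to happen on the serving side before the handler's `preprocess` method receives the raw bytes.

```python
import base64
import json


def to_instance(text):
    """Wrap a text string as a base64-encoded prediction instance."""
    encoded = base64.b64encode(text.encode("utf-8")).decode("utf-8")
    return {"data": {"b64": encoded}}


def from_instance(instance):
    """Recover the original text from an instance built by to_instance."""
    return base64.b64decode(instance["data"]["b64"]).decode("utf-8")


instances = [to_instance("I woke up this morning to birds chirping.")]
print(json.dumps(instances, indent=4))
print(from_instance(instances[0]))
```

Base64 keeps the payload valid JSON regardless of quotes, newlines, or non-ASCII characters in the input text.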

In [None]:
test_instances = [
    b"I went to a meeting that went really well.",
    b"I ran four miles this morning with a good time.",
    b"Watching the storms we had yesterday.  The lightning was incredible!",
    b"The last night I said with her 'I love you '. And she said ' Yes'.",
    b"I had followed a complex recipe making roasted duck, which took me hours and I had successfully made it.",
    b"I woke up this morning to birds chirping.",
]

### Send online prediction requests

For each test instance, base64-encode the input text, send the formatted request to the prediction endpoint, and print the response.

In [None]:
# print the test instances and their responses
for instance in test_instances:
    print(f"Input text: \n\t{instance.decode('utf-8')}\n")
    b64_encoded = base64.b64encode(instance).decode("utf-8")
    test_instance = [{"data": {"b64": b64_encoded}}]
    print(f"Formatted input: \n{json.dumps(test_instance, indent=4)}\n")
    prediction = endpoint.predict(instances=test_instance)
    print(f"Prediction response: \n\t{prediction}")
    print("=" * 100)

## Cleaning up 

To clean up all Google Cloud resources used in this notebook, you can [delete the Google Cloud project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.

Otherwise, you can delete the individual resources you created in this tutorial:

- Training Job
- Vertex AI Model
- Vertex AI Endpoint
- Cloud Storage Bucket (set `delete_bucket` to **True** to delete the bucket)
- Image Repository (Artifact Registry)

In [None]:
delete_bucket = False

# Delete the Custom training job
job.delete()

# Undeploy the model from the endpoint
endpoint.undeploy_all()
# Delete the endpoint
endpoint.delete()

# Delete the Vertex AI Model resource
model.delete()

# Delete the Cloud Storage bucket
if delete_bucket or os.getenv("IS_TESTING"):
    ! gsutil -m rm -r $BUCKET_URI

# Delete artifact repository
! gcloud artifacts repositories delete $APP_NAME --location=$REGION --quiet