In [None]:
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Get started with Vertex AI Training for LightGBM

<table align="left">
  <td style="text-align: center">
    <a href="https://colab.research.google.com/github/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/training/get_started_vertex_training_lightgbm.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Google Colaboratory logo"><br> Open in Colab
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/colab/import/https:%2F%2Fraw.githubusercontent.com%2FGoogleCloudPlatform%2Fvertex-ai-samples%2Fmain%2Fnotebooks%2Fofficial%2Ftraining%2Fget_started_vertex_training_lightgbm.ipynb">
      <img width="32px" src="https://cloud.google.com/ml-engine/images/colab-enterprise-logo-32px.png" alt="Google Cloud Colab Enterprise logo"><br> Open in Colab Enterprise
    </a>
  </td>    
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/main/notebooks/official/training/get_started_vertex_training_lightgbm.ipynb">
      <img src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo"><br> Open in Workbench
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/training/get_started_vertex_training_lightgbm.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo"><br> View on GitHub
    </a>
  </td>
</table>

## Overview


This tutorial demonstrates how to train a LightGBM model with Vertex AI Training.

Learn more about [Custom training](https://cloud.google.com/vertex-ai/docs/training/custom-training).


__NOTE__: This notebook is a revised version of a [notebook](https://github.com/RajeshThallam/vertex-ai-labs/blob/main/07-vertex-train-deploy-lightgbm/vertex-train-deploy-lightgbm-model.ipynb) from the [**vertex-ai-labs** public repo](https://github.com/RajeshThallam/vertex-ai-labs).

### Objective

In this tutorial, you learn how to train a custom LightGBM model with Vertex AI Training using a Python training package, and how to serve the model from a custom container.

This tutorial uses the following Vertex AI services:

- Vertex AI Training
- Vertex AI Model Registry
- Vertex AI Batch predictions
- Vertex AI Online prediction

The steps performed include:

- Train using a Python package.
- Save the model artifacts to Cloud Storage using GCSFuse.
- Construct a FastAPI prediction server.
- Construct a Dockerfile deployment image for the server.
- Test the deployment image locally (optional and not for Colab users).
- Create a Vertex AI model resource.
- Run a batch prediction job.
- Deploy the model to an endpoint and send online prediction requests.
- Clean up the created resources.

### Dataset

The dataset used for this tutorial is the [Iris dataset](https://www.tensorflow.org/datasets/catalog/iris), which is also listed in [TensorFlow Datasets](https://www.tensorflow.org/datasets/catalog/overview). This dataset doesn't require any feature engineering. The training script in this tutorial loads the dataset with scikit-learn's `load_iris()` function. The trained model predicts the species of an Iris flower from three classes: setosa, versicolor, or virginica.

### Costs 

This tutorial uses billable components of Google Cloud:

* Vertex AI
* Cloud Storage
* Artifact Registry
* Cloud Build

Learn about [Vertex AI pricing](https://cloud.google.com/vertex-ai/pricing), [Cloud Storage pricing](https://cloud.google.com/storage/pricing), [Artifact Registry pricing](https://cloud.google.com/artifact-registry/pricing), and [Cloud Build pricing](https://cloud.google.com/build/pricing), and use the [Pricing Calculator](https://cloud.google.com/products/calculator/) to generate a cost estimate based on your projected usage.

## Get started

### Install Vertex AI SDK for Python and other required packages


In [None]:
! pip3 install --upgrade -q google-cloud-aiplatform \
                            tensorflow==2.17.0

### Restart runtime (Colab only)

To use the newly installed packages, you must restart the runtime on Google Colab.

In [None]:
import sys

if "google.colab" in sys.modules:

    import IPython

    app = IPython.Application.instance()
    app.kernel.do_shutdown(True)

<div class="alert alert-block alert-warning">
<b>⚠️ The kernel is going to restart. Wait until it's finished before continuing to the next step. ⚠️</b>
</div>


### Authenticate your notebook environment (Colab only)

Authenticate your environment on Google Colab.


In [None]:
import sys

if "google.colab" in sys.modules:

    from google.colab import auth

    auth.authenticate_user()

### Set Google Cloud project information

To run this tutorial, you must have an existing Google Cloud project. Learn more about [setting up a project and a development environment](https://cloud.google.com/vertex-ai/docs/start/cloud-environment).

In [None]:
PROJECT_ID = "[your-project-id]"  # @param {type:"string"}
LOCATION = "us-central1"  # @param {type:"string"}

### Create a Cloud Storage bucket

Create a storage bucket to store intermediate artifacts such as datasets.

In [None]:
BUCKET_URI = f"gs://your-bucket-name-{PROJECT_ID}-unique"  # @param {type:"string"}

**If your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket.

In [None]:
! gsutil mb -l {LOCATION} -p {PROJECT_ID} {BUCKET_URI}

### Import the required libraries

In [None]:
import json
import os
import sys

import google.cloud.aiplatform as aiplatform
import tensorflow as tf

### Initialize Vertex AI SDK for Python

To get started using Vertex AI, you must [enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com).

Initialize the Vertex AI SDK for Python using your project and Cloud Storage bucket.

In [None]:
aiplatform.init(project=PROJECT_ID, location=LOCATION, staging_bucket=BUCKET_URI)

### Enable Artifact Registry API

Learn more about [Enabling service](https://cloud.google.com/artifact-registry/docs/enable-service).

In [None]:
! gcloud services enable artifactregistry.googleapis.com --project {PROJECT_ID}

if os.getenv("IS_TESTING"):
    ! sudo apt-get update --yes && sudo apt-get --only-upgrade --yes install google-cloud-sdk-cloud-run-proxy google-cloud-sdk-harbourbridge google-cloud-sdk-cbt google-cloud-sdk-gke-gcloud-auth-plugin google-cloud-sdk-kpt google-cloud-sdk-local-extract google-cloud-sdk-minikube google-cloud-sdk-app-engine-java google-cloud-sdk-app-engine-go google-cloud-sdk-app-engine-python google-cloud-sdk-spanner-emulator google-cloud-sdk-bigtable-emulator google-cloud-sdk-nomos google-cloud-sdk-package-go-module google-cloud-sdk-firestore-emulator kubectl google-cloud-sdk-datastore-emulator google-cloud-sdk-app-engine-python-extras google-cloud-sdk-cloud-build-local google-cloud-sdk-kubectl-oidc google-cloud-sdk-anthos-auth google-cloud-sdk-app-engine-grpc google-cloud-sdk-pubsub-emulator google-cloud-sdk-datalab google-cloud-sdk-skaffold google-cloud-sdk google-cloud-sdk-terraform-tools google-cloud-sdk-config-connector
    ! gcloud components update --quiet

## Create a private Docker repository

Your first step is to create your own Docker repository in the Artifact Registry.

1. Run the `gcloud artifacts repositories create` command to create a new Docker repository in your region with the description "Prediction repository".

2. Run the `gcloud artifacts repositories list` command to verify that your repository is created.

In [None]:
# Set a display name for the app to use later
APP_NAME = "iris-classification"
# Set the name for your private repo
PRIVATE_REPO = f"{APP_NAME}-repo-unique"
# Create the repo
! gcloud artifacts repositories create {PRIVATE_REPO} \
                                      --repository-format=docker \
                                      --location={LOCATION} \
                                      --project={PROJECT_ID} \
                                      --description="Prediction repository"
# List the repos and check if your repo is created
! gcloud artifacts repositories list --project={PROJECT_ID} --location={LOCATION}

## Configure authentication to your private repo

Before you push or pull container images, configure Docker to use the `gcloud` command-line tool to authenticate requests to `Artifact Registry` for your region.

In [None]:
! gcloud auth configure-docker {LOCATION}-docker.pkg.dev --quiet

## Set container image paths

Set the prebuilt Docker container image for training and the custom container image for predictions.


For the latest list, see [Prebuilt containers for training](https://cloud.google.com/ai-platform-unified/docs/training/pre-built-containers).
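
The two image paths follow different patterns: the prebuilt training image lives in a multi-region Google-owned repository, while the serving image lives in your own regional Artifact Registry repository. The following is a minimal sketch of both patterns. The helper names and the `my-project` project ID are placeholders for illustration and aren't part of the Vertex AI SDK.

```python
def training_image_uri(location: str, version: str) -> str:
    """Build a prebuilt training image URI from a region such as 'us-central1'."""
    multi_region = location.split("-")[0]  # 'us-central1' -> 'us'
    return f"{multi_region}-docker.pkg.dev/vertex-ai/training/{version}:latest"


def serving_image_uri(location: str, project_id: str, repo: str, image: str) -> str:
    """Build the Artifact Registry path for a custom serving image."""
    return f"{location}-docker.pkg.dev/{project_id}/{repo}/{image}:latest"


# "my-project" is a placeholder project ID used only for illustration.
print(training_image_uri("us-central1", "scikit-learn-cpu.0-23"))
print(serving_image_uri("us-central1", "my-project", "iris-classification-repo-unique", "lightgbm-cpu"))
```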

In [None]:
# Set prebuilt image for training
TRAIN_VERSION = "scikit-learn-cpu.0-23"
TRAIN_IMAGE = "{}-docker.pkg.dev/vertex-ai/training/{}:latest".format(
    LOCATION.split("-")[0], TRAIN_VERSION
)

# Set the custom image for serving (built and pushed later in this tutorial)
DEPLOY_VERSION = "lightgbm-cpu"
DEPLOY_IMAGE = "{}-docker.pkg.dev/{}/{}/{}:latest".format(
    LOCATION, PROJECT_ID, PRIVATE_REPO, DEPLOY_VERSION
)
print("Deploy image:", DEPLOY_IMAGE)

## Set machine type

Next, set the machine type to use for training and prediction.

- Set the variables `TRAIN_COMPUTE` and `DEPLOY_COMPUTE` to configure the compute resources (VMs) needed for training and prediction.
 - `machine type`
     - `n1-standard`: 3.75 GB of memory per vCPU
     - `n1-highmem`: 6.5 GB of memory per vCPU
     - `n1-highcpu`: 0.9 GB of memory per vCPU
 - `vCPUs`: one of \[2, 4, 8, 16, 32, 64, 96\]

**Note:** The following isn't supported for training:

 - `standard`: 2 vCPUs
 - `highcpu`: 2, 4 and 8 vCPUs

**Note:** You may also use n2 and e2 machine types for training and deployment, but they don't support GPUs.
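
The constraints above can be checked before you submit a job. The following is a small sketch that encodes only the combinations listed in this section. The helper name `training_machine_type` is hypothetical, and the authoritative list of supported machine types is in the Vertex AI documentation.

```python
# Combinations that this section lists as unsupported for training.
UNSUPPORTED_FOR_TRAINING = {
    "n1-standard": {2},
    "n1-highcpu": {2, 4, 8},
}
# vCPU counts listed in this section.
VALID_VCPUS = {2, 4, 8, 16, 32, 64, 96}


def training_machine_type(family: str, vcpus: int) -> str:
    """Return a machine type such as 'n1-standard-4', or raise ValueError."""
    if vcpus not in VALID_VCPUS:
        raise ValueError(f"Unsupported vCPU count: {vcpus}")
    if vcpus in UNSUPPORTED_FOR_TRAINING.get(family, set()):
        raise ValueError(f"{family}-{vcpus} isn't supported for training")
    return f"{family}-{vcpus}"


print(training_machine_type("n1-standard", 4))
```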

In [None]:
MACHINE_TYPE = "n1-standard"

VCPU = "4"
TRAIN_COMPUTE = MACHINE_TYPE + "-" + VCPU
print("Train machine type", TRAIN_COMPUTE)

VCPU = "4"
DEPLOY_COMPUTE = MACHINE_TYPE + "-" + VCPU
print("Deploy machine type", DEPLOY_COMPUTE)

## Create the training package

### Package layout

Before you start the training, take a look at how a Python package is assembled for a custom training job. When unarchived, the package contains the following directory layout.

- PKG-INFO
- README.md
- setup.cfg
- setup.py
- trainer
  - \_\_init\_\_.py
  - task.py

The files `setup.cfg` and `setup.py` include the instructions for installing the package into the operating environment of the Docker image.

The file `trainer/task.py` is the Python script for executing the custom training job. 

**Note**: When the file is referenced in the worker pool specification, the file suffix (`.py`) is dropped and the directory slash is replaced with a dot (`trainer.task`).
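
The path-to-module conversion described in the note can be sketched as follows. The helper name `module_name` is an illustrative assumption, not part of any SDK.

```python
from pathlib import PurePosixPath


def module_name(script_path: str) -> str:
    """Convert a package-relative script path to a dotted module name."""
    path = PurePosixPath(script_path).with_suffix("")  # drop the '.py' suffix
    return ".".join(path.parts)  # replace directory slashes with dots


print(module_name("trainer/task.py"))
```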

### Package Assembly

In the following cells, you assemble the training package.

In [None]:
# Make folder for Python training script
! rm -rf custom
! mkdir custom

# Add package information
! touch custom/README.md

setup_cfg = "[egg_info]\n\ntag_build =\n\ntag_date = 0"
! echo "$setup_cfg" > custom/setup.cfg

setup_py = "import setuptools\n\nsetuptools.setup(\n\n    install_requires=[\n\n'lightgbm'    ],\n\n    packages=setuptools.find_packages())"
! echo "$setup_py" > custom/setup.py

pkg_info = "Metadata-Version: 1.0\n\nName: Iris tabular classification\n\nVersion: 0.0.0\n\nSummary: Demonstration training script\n\nHome-page: www.google.com\n\nAuthor: Google\n\nAuthor-email: aferlitsch@google.com\n\nLicense: Public\n\nDescription: Demo\n\nPlatform: Vertex"
! echo "$pkg_info" > custom/PKG-INFO

# Make the training subfolder
! mkdir custom/trainer
! touch custom/trainer/__init__.py

### Create *task.py* script

Next, you create the *task.py* script for running the training package. Some notable steps include:

- **Parse command-line arguments**: 
    - `model-dir`: The location to save the trained model. When using Vertex AI custom training, the location is specified through the environment variable: `AIP_MODEL_DIR`
    
- **Data preprocessing** (`get_data()`): Download the dataset and split it into training and test sets.
    
- **Training** (`train_model()`): Train the model.
    
- **Saving the model artifacts**: Save the model artifacts and evaluation metrics to the Cloud Storage location specified in `model-dir`.
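
During Vertex AI custom training, Cloud Storage buckets are mounted under `/gcs/` through Cloud Storage FUSE, so a `gs://` location can be written to like a local path. The following is a minimal sketch of that prefix conversion. The helper name `to_gcsfuse_path` and the bucket name `example-bucket` are placeholders for illustration.

```python
import posixpath

GS_PREFIX = "gs://"
GCSFUSE_PREFIX = "/gcs/"


def to_gcsfuse_path(uri: str) -> str:
    """Map 'gs://bucket/path' to '/gcs/bucket/path'; leave other paths as-is."""
    if uri.startswith(GS_PREFIX):
        return GCSFUSE_PREFIX + uri[len(GS_PREFIX):]
    return uri


# "example-bucket" is a placeholder bucket name.
model_dir = to_gcsfuse_path("gs://example-bucket/model/model/")
print(posixpath.join(model_dir, "model.txt"))
```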

In [None]:
%%writefile custom/trainer/task.py
# Single Instance Training for Iris

import datetime
import os
import subprocess
import sys

from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
import pandas as pd

import lightgbm as lgb

import argparse
import logging

logging.getLogger().setLevel(logging.INFO)

logging.info("Parsing arguments")

parser = argparse.ArgumentParser()
parser.add_argument(
    '--model-dir', 
    dest='model_dir',        
    default=os.getenv('AIP_MODEL_DIR'), 
    type=str, 
    help='Location to export GCS model')
args = parser.parse_args()
logging.info(args)

def get_data():
    # Download data
    logging.info("Downloading data")
    iris = load_iris()
    print(iris.data.shape)

    # split data
    print("Splitting data into test and train")
    x, y = iris.data, iris.target
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=123)

    # create dataset for lightgbm
    print("creating dataset for LightGBM")
    lgb_train = lgb.Dataset(x_train, y_train)
    lgb_eval = lgb.Dataset(x_test, y_test, reference=lgb_train)
    
    return lgb_train, lgb_eval

def train_model(lgb_train, lgb_eval):
    # specify your configurations as a dict
    params = {
        'boosting_type': 'gbdt',
        'objective': 'multiclass',
        'metric': {'multi_error'},
        'num_leaves': 31,
        'learning_rate': 0.05,
        'feature_fraction': 0.9,
        'bagging_fraction': 0.8,
        'bagging_freq': 5,
        'verbose': 0,
        'num_class' : 3
    }

    # train lightgbm model
    logging.info('Starting training...')
    model = lgb.train(params,
                    lgb_train,
                    num_boost_round=20,
                    valid_sets=lgb_eval)
    
    return model

lgb_train, lgb_eval = get_data()
model = train_model(lgb_train, lgb_eval)

# GCSFuse conversion
gs_prefix = 'gs://'
gcsfuse_prefix = '/gcs/'
if args.model_dir.startswith(gs_prefix):
    args.model_dir = args.model_dir.replace(gs_prefix, gcsfuse_prefix)
    dirpath = os.path.split(args.model_dir)[0]
    if not os.path.isdir(dirpath):
        os.makedirs(dirpath)
        
# save model to file
logging.info('Saving model...')
model_filename = 'model.txt'
gcs_model_path = os.path.join(args.model_dir, model_filename)
model.save_model(gcs_model_path)

### Store training package in Cloud Storage bucket

Next, package the training folder into a compressed tarball, and then store it in your Cloud Storage bucket.

In [None]:
# Remove any existing tar and zip files
! rm -f custom.tar custom.tar.gz
# Create a tar file
! tar cvf custom.tar custom
# Compress the tar file with gzip
! gzip custom.tar
# Copy the package to Cloud Storage bucket
! gsutil cp custom.tar.gz $BUCKET_URI/trainer_iris.tar.gz

## Create custom training job

In this step, you create a custom training job using the `CustomPythonPackageTrainingJob` class, with the following parameters:

- `display_name`: The human readable name for the custom training job.
- `container_uri`: The training container image.

- `python_package_gcs_uri`: The Cloud Storage location of the Python training package.
- `python_module_name`: The relative path to the training script in the Python package.

**Note:** To specify any dependencies, use the `install_requires` parameter in the *setup.py* script.

In [None]:
# Set display name for training job
DISPLAY_NAME = f"{APP_NAME}-training-unique"

# Define the training job
job = aiplatform.CustomPythonPackageTrainingJob(
    display_name=DISPLAY_NAME,
    python_package_gcs_uri=f"{BUCKET_URI}/trainer_iris.tar.gz",
    python_module_name="trainer.task",
    container_uri=TRAIN_IMAGE,
    project=PROJECT_ID,
)

## Prepare your training arguments

Now, define the parameters to run your custom training container:

- `--model-dir`: Command-line argument to specify where to store the model artifacts. You can use either of the following methods to specify the storage location for artifacts.
    - **method-1** (set *DIRECT* to True): You pass the Cloud Storage location as a command-line argument to your training script.
    - **method-2** (set *DIRECT* to False): You pass the Cloud Storage location as the environment variable `AIP_MODEL_DIR` to your training script. In this case, you provide the model artifact location in the job specification itself as `base_output_dir`.
    
**Note**: Depending on how you pass your model artifact location to the training job, the training task must be configured to receive the value. In this tutorial, the training task parses the `--model-dir` argument and, if no value is passed, falls back to `AIP_MODEL_DIR`, which Vertex AI sets from `base_output_dir`.
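
The fallback between the two methods can be sketched with `argparse` alone. The function name `resolve_model_dir` and the `gs://example-bucket/...` values are hypothetical. The environment is passed in as a plain dictionary so the sketch runs outside of a training job.

```python
import argparse


def resolve_model_dir(argv, environ):
    """Return --model-dir if given, else the AIP_MODEL_DIR value (or None)."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--model-dir",
        dest="model_dir",
        default=environ.get("AIP_MODEL_DIR"),  # used when the flag is absent
        type=str,
    )
    return parser.parse_args(argv).model_dir


# method-1: the location arrives as a command-line argument.
print(resolve_model_dir(["--model-dir=gs://example-bucket/model"], {}))
# method-2: the location arrives through the environment variable.
print(resolve_model_dir([], {"AIP_MODEL_DIR": "gs://example-bucket/model/model/"}))
```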

In [None]:
MODEL_DIR = f"{BUCKET_URI}/model"

DIRECT = False
if DIRECT:
    CMDARGS = [
        "--model-dir=" + MODEL_DIR,
    ]
else:
    CMDARGS = []

## Run the custom training job

Next, run the custom job to start training by invoking the `run` method, with the following parameters:

- `args`: The command-line arguments to pass to the training script.
- `replica_count`: The number of compute instances for training (replica_count = 1 is single node training).
- `machine_type`: The machine type for the compute instances.
- `accelerator_type`: The hardware accelerator type.
- `accelerator_count`: The number of accelerators to attach to a worker replica.
- `base_output_dir`: The Cloud Storage location to save the model artifacts.
- `sync`: Set **True** to wait until the completion of the job.

In [None]:
# Run the job
job.run(
    args=CMDARGS,
    replica_count=1,
    machine_type=TRAIN_COMPUTE,
    base_output_dir=MODEL_DIR,
    sync=False,
)

### Wait for completion of custom training job

Next, wait for the custom training job to complete. Alternatively, you can set the parameter `sync` to `True` in the `run()` method so that the call blocks until the job is completed.

In [None]:
job.wait()

### Verify the model artifacts

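Next, verify that the training script has successfully saved the trained model to your Cloud Storage location. With the default `DIRECT = False` setting, Vertex AI sets `AIP_MODEL_DIR` to the `model` subdirectory of `base_output_dir`, so the artifacts are stored under `MODEL_DIR + "/model"`.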

In [None]:
# Set the path where model is saved
model_path_to_deploy = MODEL_DIR + "/model"
print(f"Model path with trained model artifacts {model_path_to_deploy}")

# List the contents of the model folder
! gsutil ls $model_path_to_deploy

## Build a server app using FastAPI

Next, you use FastAPI to implement an HTTP server as a custom deployment container. The container must listen and respond to liveness checks, health checks, and prediction requests. The HTTP server must listen for requests on **0.0.0.0**.

Learn more about [deployment container requirements](https://cloud.google.com/ai-platform-unified/docs/predictions/custom-container-requirements#image) and [FastAPI](https://fastapi.tiangolo.com/).

### Create a folder to store resources for your app
In the following cell, you create a folder called `serve` to store your artifacts for serving. Then, you create a subfolder called `app` to store your serving script.

When complete, your app has the following folder structure:

```
    - serve
        - app
            - main.py
            - prestart.sh
        - requirements.txt
        - Dockerfile
        - instances.json
```

In [None]:
# Remove if the folder exists already
! rm -rf serve

# Create a new folder
! mkdir serve

# Create a subfolder for storing the serving scripts
! mkdir serve/app

### Create requirements file for the serving container

Next, create the `requirements.txt` file which lists the Python packages needed for the serving container.

In [None]:
%%writefile serve/requirements.txt
numpy~=1.20
scikit-learn~=0.24
lightgbm==4.5.0

### Write the FastAPI serving script

Next, you write the serving script for the HTTP server using `FastAPI`.

The serving script consists of the following steps:

- Instantiate a `FastAPI` application class.
- Load the trained model from the artifacts.
- Load the class names from the dataset (the Iris dataset from scikit-learn) used for training.
- Define a `health()` method to address your app's [health check requests](https://cloud.google.com/vertex-ai/docs/predictions/custom-container-requirements#health).
- Define a `predict()` method to return responses to the [prediction requests](https://cloud.google.com/vertex-ai/docs/predictions/custom-container-requirements#prediction).
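
For a multiclass model, the prediction step has to turn one row of per-class scores into a class label. The following dependency-free sketch shows that mapping. The scores are made-up values for illustration, the helper name `to_labels` is hypothetical, and the class order matches the `target_names` of scikit-learn's Iris dataset.

```python
# Class order used by scikit-learn's Iris dataset.
CLASS_NAMES = ["setosa", "versicolor", "virginica"]


def to_labels(scores):
    """Map each row of per-class scores to the name of the top-scoring class."""
    labels = []
    for row in scores:
        best = max(range(len(row)), key=row.__getitem__)  # index of the max score
        labels.append(CLASS_NAMES[best])
    return labels


# Made-up scores for two instances, used only for illustration.
print(to_labels([[0.1, 0.7, 0.2], [0.8, 0.1, 0.1]]))
```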

In [None]:
%%writefile serve/app/main.py
from fastapi.logger import logger
from fastapi import FastAPI, Request
import numpy as np
import os
from sklearn.datasets import load_iris
import lightgbm as lgb
import logging

# Set logging
gunicorn_logger = logging.getLogger('gunicorn.error')
logger.handlers = gunicorn_logger.handlers

if __name__ != "__main__":
    logger.setLevel(gunicorn_logger.level)
else:
    logger.setLevel(logging.DEBUG)

# Create the server app
app = FastAPI()

# Load the model
logger.info("Loading the model")
model = lgb.Booster(model_file="model/model.txt")
    
# Load the class names
logger.info("Loading the target class labels")
class_names = load_iris().target_names

@app.get(os.environ['AIP_HEALTH_ROUTE'], status_code=200)
def health():
    """ health check to ensure HTTP server is ready to handle 
        prediction requests
    """
    return {"status": "healthy"}

@app.post(os.environ['AIP_PREDICT_ROUTE'])
async def predict(request: Request):
    # Asynchronous wait for HTTP requests
    body = await request.json()
    # Get the content of the prediction request
    instances = body["instances"]
    # Reformat prediction request as a numpy array
    inputs = np.asarray(instances)
    # Invoke the model to make predictions
    outputs = model.predict(inputs)
    # Return formatted predictions as response
    logger.info(f"Outputs {outputs}")
    return {"predictions": [class_names[class_num] for class_num in np.argmax(outputs, axis=1)]}

### Add a pre-start script

The default start command of the `tiangolo/uvicorn-gunicorn-fastapi` base image, which you use for the serving container later in this tutorial, runs the `prestart.sh` script before starting the server. The script sets the environment variable `PORT` equal to `AIP_HTTP_PORT` so that the server runs on the port expected by Vertex AI. Vertex AI sends liveness checks, health checks, and prediction requests to `AIP_HTTP_PORT` on the container. Your container's HTTP server must listen for requests on this port.
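
The same settings can also be read from Python. The following is a minimal sketch that collects the `AIP_*` serving variables in one place. The helper name `serving_config` is hypothetical, and the fallback values are assumptions intended only for local runs where Vertex AI hasn't set the variables.

```python
import os


def serving_config(environ=None):
    """Read the serving settings that Vertex AI passes to the container."""
    env = os.environ if environ is None else environ
    return {
        # Fallbacks below are assumptions for local testing only.
        "port": int(env.get("AIP_HTTP_PORT", "8080")),
        "health_route": env.get("AIP_HEALTH_ROUTE", "/health"),
        "predict_route": env.get("AIP_PREDICT_ROUTE", "/predict"),
    }


print(serving_config({"AIP_HTTP_PORT": "7080"}))
```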

In [None]:
%%writefile serve/app/prestart.sh
#!/bin/bash
export PORT=$AIP_HTTP_PORT

Set the variable `PORT` to use while running the app inside a container.

In [None]:
# Set the port for serving the app
PORT = 7080

### Store test instances

Next, you create some examples to subsequently test the FastAPI server and the hosted LightGBM model.

Learn more about [JSON formatting of prediction requests for custom models](https://cloud.google.com/ai-platform-unified/docs/predictions/online-predictions-custom-models#request-body-details).
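
A request body for this model is a JSON object with an `instances` key whose value is a list of feature lists. The following sketch builds such a body and checks the feature count first. The helper name `build_request_body` is hypothetical, and the four-feature check is an assumption based on the Iris dataset.

```python
import json

NUM_FEATURES = 4  # sepal length, sepal width, petal length, petal width


def build_request_body(instances):
    """Serialize instances into the {"instances": [...]} request format."""
    for instance in instances:
        if len(instance) != NUM_FEATURES:
            raise ValueError(f"Expected {NUM_FEATURES} features, got {len(instance)}")
    return json.dumps({"instances": instances})


print(build_request_body([[6.7, 3.1, 4.7, 1.5], [4.6, 3.1, 1.5, 0.2]]))
```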

In [None]:
%%writefile serve/instances.json
{
    "instances": [
        [6.7, 3.1, 4.7, 1.5],
        [4.6, 3.1, 1.5, 0.2]
    ]
}

## Build the custom container image for serving

In this section, you containerize your serving app and create an image for it. You use the image later while uploading your model to Vertex AI Model Registry.

### Create the Dockerfile

Write the Dockerfile, using `tiangolo/uvicorn-gunicorn-fastapi:python3.9` as base image. This automatically runs FastAPI for you using Gunicorn and Uvicorn. 

Learn more about [Deploying FastAPI with Docker](https://fastapi.tiangolo.com/deployment/docker/).

In [None]:
%%bash -s $MODEL_DIR $PORT

MODEL_DIR=$1
PORT=$2
mkdir -p ./serve/model/
gsutil cp $MODEL_DIR/model/* ./serve/model/

cat > ./serve/Dockerfile <<EOF
# Load the base image
FROM tiangolo/uvicorn-gunicorn-fastapi:python3.9
# Set the workdir
WORKDIR /app
# Copy the required files and folders to the workdir
COPY ./app /app
COPY ./model /app/model
COPY requirements.txt requirements.txt
# Install the requirements
RUN pip3 install -r requirements.txt
# Expose the required port for running the server
EXPOSE $PORT
# Start the server
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "$PORT"]
EOF

### Build the container image locally

Next, build your custom container image. While building the image, pass the tag as `DEPLOY_IMAGE` which is later interpreted as the repo path when pushed to Artifact Registry.

**Note**: This step doesn't work in Colab. For pushing the image from a Colab notebook, skip to the __[Submit the image using Cloud Build (for Colab users)](#colab-section)__ section.

In [None]:
IS_COLAB = "google.colab" in sys.modules

if not IS_COLAB:
    ! docker build --tag={DEPLOY_IMAGE} ./serve

### Run and test the container locally (optional)

Before you push the container image to Artifact Registry, you can run it as a container in your local environment to verify that the server works as expected.

In this step, you run the container locally in detached mode (`-d`) and provide the environment variables needed for serving predictions. Then, you test the `/health` and `/predict` routes to make sure the container works as intended.

In [None]:
if not IS_COLAB:
    # stop and remove if there's any pre-existing container
    ! sudo docker stop local-iris
    ! sudo docker rm local-iris
    # create and run the container
    ! docker run -t -d --rm -p {PORT}:{PORT} \
        --name=local-iris \
        -e AIP_HTTP_PORT={PORT} \
        -e AIP_HEALTH_ROUTE=/health \
        -e AIP_PREDICT_ROUTE=/predict \
        -e AIP_STORAGE_URI={MODEL_DIR} \
        {DEPLOY_IMAGE}
    # list the containers and verify
    ! docker container ls
    # wait a few seconds to let the server become ready
    ! sleep 10

#### Health check

Send a health check request to the container.

In [None]:
if not IS_COLAB:
    ! curl http://localhost:{PORT}/health

If successful, the server returns the following response:

```
{
  "status": "healthy"
}
```

#### Prediction check

Send a prediction request to the container.

In [None]:
if not IS_COLAB:
    ! curl -X POST \
      -d @serve/instances.json \
      -H "Content-Type: application/json; charset=utf-8" \
      http://localhost:{PORT}/predict

If successful, the server returns predictions in the following format:

```
{"predictions":["versicolor","setosa"]}
```

#### Stop the container

Finally, stop the local container.

In [None]:
if not IS_COLAB:
    ! docker stop local-iris

#### Push the container image to Artifact Registry

In [None]:
if not IS_COLAB:
    ! docker push $DEPLOY_IMAGE

<a id="colab-section"></a>
## Submit the image using Cloud Build (for Colab users)

You may skip this section if you aren't running this tutorial in Colab. 

In the following cell, the `gcloud builds submit` command uses Cloud Build to build the serving app image and push it to Artifact Registry.

In [None]:
if IS_COLAB:
    ! gcloud builds submit ./serve --project={PROJECT_ID} --region={LOCATION} --tag={DEPLOY_IMAGE}

## Upload the model to Vertex AI Model Registry

Next, upload your model to Vertex AI Model Registry using the `Model.upload()` method, with the following parameters:

- `display_name`: The human readable name for the model resource.
- `description`: A description of the model (optional).
- `artifact_uri`: The path to the directory containing the Model artifact and any of its supporting files.
- `serving_container_image_uri`: The serving container image path in Artifact Registry.
- `serving_container_predict_route`:  HTTP path to send prediction requests to the container.
- `serving_container_health_route`: HTTP path to send health check requests to the container.
- `serving_container_ports`: The ports exposed by the container to listen to requests.
- `sync`: Set **False** to run the job asynchronously.

If the `upload()` method is run asynchronously, you can subsequently block the cell until completion using the `wait()` method.

**Note**: In this tutorial, the saved model (`model.txt`) is loaded directly from the custom container image itself, so `artifact_uri` isn't passed to `upload()`. However, this method isn't recommended if your model is large. For large models, you can access the model from the Cloud Storage bucket directly inside the container using the `artifact_uri` parameter. Learn more about [model upload to Vertex AI](https://cloud.google.com/vertex-ai/docs/samples/aiplatform-upload-model-sample).

In [None]:
# Set the model display name
MODEL_DISPLAY_NAME = f"{APP_NAME}-model-unique"
# Set the model description(optional)
MODEL_DESCRIPTION = "LightGBM based iris flower classifier with custom container"
# Set the health route
HEALTH_ROUTE = "/health"
# Set the predict route
PREDICT_ROUTE = "/predict"
# Set the ports used for serving
SERVING_PORTS = [PORT]

# Upload the model
model = aiplatform.Model.upload(
    display_name=MODEL_DISPLAY_NAME,
    description=MODEL_DESCRIPTION,
    serving_container_image_uri=DEPLOY_IMAGE,
    serving_container_predict_route=PREDICT_ROUTE,
    serving_container_health_route=HEALTH_ROUTE,
    serving_container_ports=SERVING_PORTS,
)
# Print the model display name
print(model.display_name)
# Print the model resource name
print(model.resource_name)

## Make batch predictions

In this section, you create batch prediction requests for your uploaded model using sample examples.

Learn more about [prediction requests in Vertex AI](https://cloud.google.com/vertex-ai/docs/predictions/batch-predictions).

### Create test samples

Create sample examples to use as input instances for the batch prediction job.

__Note__: These examples are used as test samples only to demonstrate how to make a prediction request.

In [None]:
# Define the test samples
INSTANCES = [[6.7, 3.1, 4.7, 1.5], [4.6, 3.1, 1.5, 0.2]]

### Make batch input file

Now make a batch input file and store it in your Cloud Storage bucket. Each instance in the instances list is itself a list of feature values. For this example, you use the `.jsonl` format, with one instance per line:

                        [instance_1]
                        [instance_2]
                            .
                            .

**Note**: In this example, only two instances are stored in the input file for demonstration purposes. In general, online predictions are more suitable for short payloads that need low latency.
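
The JSON Lines layout can be produced with the standard `json` module, which guarantees that each line is valid JSON. The following is a minimal sketch. The helper name `to_jsonl` is hypothetical.

```python
import json


def to_jsonl(instances):
    """Serialize each instance as one JSON document per line."""
    return "".join(json.dumps(instance) + "\n" for instance in instances)


print(to_jsonl([[6.7, 3.1, 4.7, 1.5], [4.6, 3.1, 1.5, 0.2]]), end="")
```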

In [None]:
# Set a Cloud Storage path to save the test input file
BATCH_INPUT_URI = f"{BUCKET_URI}/{APP_NAME}/test/batch_input/test.jsonl"

# Write the instances to the file in Cloud Storage bucket
with tf.io.gfile.GFile(BATCH_INPUT_URI, "w") as f:
    for i in INSTANCES:
        f.write(json.dumps(i) + "\n")

# Show the file contents
! gsutil cat $BATCH_INPUT_URI

### Make batch prediction request

You can create a batch prediction job by invoking the `batch_predict()` method, with the following parameters:

- `job_display_name`: The human readable name for the batch prediction job.
- `gcs_source`: A list of one or more batch request input files.
- `gcs_destination_prefix`: The Cloud Storage location for storing the batch prediction results.
- `instances_format`: The format for the input instances, either 'csv' or 'jsonl'. Defaults to 'jsonl'.
- `predictions_format`: The format for the output predictions, either 'csv' or 'jsonl'. Defaults to 'jsonl'.
- `machine_type`: The type of machine to use for batch prediction.
- `sync`: If set to **True**, the call blocks while waiting for the batch job to complete execution.

In [None]:
# Set the batch job's display name
BATCH_JOB_DISPLAY_NAME = f"{APP_NAME}-batch-unique"

# Create a batch prediction job
batch_predict_job = model.batch_predict(
    job_display_name=BATCH_JOB_DISPLAY_NAME,
    gcs_source=BATCH_INPUT_URI,
    gcs_destination_prefix=f"{BUCKET_URI}/{APP_NAME}/test/batch_output/",
    instances_format="jsonl",
    predictions_format="jsonl",
    model_parameters=None,
    machine_type=DEPLOY_COMPUTE,
    starting_replica_count=1,
    max_replica_count=1,
    sync=False,
)

### Wait for completion of the batch prediction job

If you set `sync` to **False** earlier, you can wait for the batch prediction job to complete execution using the `wait()` method.

In [None]:
batch_predict_job.wait()

### Get batch prediction results

Next, get the results from the completed batch prediction job. The results are written to the Cloud Storage output location you specified in the batch prediction request.

Call the `iter_outputs()` method to iterate over the Cloud Storage files generated in the results. Each results file contains one or more prediction responses in JSON format with the following keys:

- `instance`: The input from the prediction request.
- `prediction`: The prediction response.
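The following sketch shows one way to turn such a results file into Python objects. It runs on a hypothetical in-memory sample rather than a real output file: the numeric values are made up, and the shape of `prediction` (a list of per-class scores) is an assumption that depends on what your serving container returns. `parse_prediction_lines` is an illustrative helper, not part of the Vertex AI SDK.

```python
import json

# Hypothetical contents of one prediction results file; the values are made up
sample_output = (
    '{"instance": [6.7, 3.1, 4.7, 1.5], "prediction": [0.1, 0.8, 0.1]}\n'
    '{"instance": [4.6, 3.1, 1.5, 0.2], "prediction": [0.9, 0.05, 0.05]}\n'
)


def parse_prediction_lines(text):
    """Return a list of (instance, prediction) pairs from JSON Lines text."""
    pairs = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        pairs.append((record["instance"], record["prediction"]))
    return pairs


pairs = parse_prediction_lines(sample_output)
for instance, prediction in pairs:
    print(instance, "->", prediction)
```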

In [None]:
# List the output files of the batch prediction job
bp_iter_outputs = batch_predict_job.iter_outputs()

# Keep only the files that hold prediction results
prediction_results = list()
for blob in bp_iter_outputs:
    if blob.name.split("/")[-1].startswith("prediction"):
        prediction_results.append(blob.name)

# Print the first prediction record from each results file
for prediction_result in prediction_results:
    gfile_name = f"gs://{bp_iter_outputs.bucket.name}/{prediction_result}"
    with tf.io.gfile.GFile(name=gfile_name, mode="r") as gfile:
        for line in gfile.readlines():
            line = json.loads(line)
            print(line)
            break

## Deploy the model to an endpoint

Next, deploy your model for online predictions. To deploy the model, invoke the `deploy()` method with the following parameters:

- `deployed_model_display_name`: A human readable name for the deployed model.
- `traffic_split`: The percentage of traffic at the endpoint that goes to each deployed model, specified as a dictionary of one or more key/value pairs.
    - If only one model is deployed, specify `{ "0": 100 }`, where "0" refers to the model being deployed and 100 means 100% of the traffic.
    - If other models are already deployed to the endpoint and the traffic needs to be split, specify `{ "0": percent, model_id: percent, ... }`, where *model_id* is the ID of a model already deployed to the endpoint. The percentages must add up to 100.
- `machine_type`: The type of machine to use for serving.
- `min_replica_count`: The minimum number of compute instances to provision.
- `max_replica_count`: The maximum number of compute instances to scale up to. In this tutorial, only one instance is provisioned.
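Because the percentages in `traffic_split` must add up to 100, it can help to check a split locally before calling `deploy()`. The sketch below is a hypothetical helper, not part of the Vertex AI SDK. It assumes percentages are whole numbers, and the ID `"1234567890"` is a placeholder for an existing deployed model.

```python
def validate_traffic_split(traffic_split):
    """Return traffic_split unchanged if it is valid, else raise ValueError."""
    if not traffic_split:
        raise ValueError("traffic_split must contain at least one entry")
    for model_id, percent in traffic_split.items():
        # bool is a subclass of int, so reject it explicitly
        if isinstance(percent, bool) or not isinstance(percent, int):
            raise ValueError(f"percentage for {model_id!r} must be an integer")
        if not 0 <= percent <= 100:
            raise ValueError(f"percentage for {model_id!r} must be between 0 and 100")
    total = sum(traffic_split.values())
    if total != 100:
        raise ValueError(f"percentages must add up to 100, got {total}")
    return traffic_split


# A single model that receives all traffic
print(validate_traffic_split({"0": 100}))

# The new model takes 20% and a placeholder existing model keeps 80%
print(validate_traffic_split({"0": 20, "1234567890": 80}))
```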

In [None]:
# Set the display name
DEPLOYED_MODEL_DISPLAY_NAME = f"{APP_NAME}-deployed-model-unique"
# Set traffic split for the endpoint
TRAFFIC_SPLIT = {"0": 100}
# Set min. no. of nodes
MIN_NODES = 1
# Set max. no. of nodes
MAX_NODES = 1

# Deploy the model to an endpoint
endpoint = model.deploy(
    deployed_model_display_name=DEPLOYED_MODEL_DISPLAY_NAME,
    traffic_split=TRAFFIC_SPLIT,
    machine_type=DEPLOY_COMPUTE,
    min_replica_count=MIN_NODES,
    max_replica_count=MAX_NODES,
)

## Send online prediction requests

After your model is successfully deployed to an endpoint, you can send online prediction requests to your model.

### Request

The format of each instance is:

    [feature_list]

You can send more than one instance in a prediction request to your model. However, there's a limit of 1.5 MB on the request payload size.

Learn more about [sending online prediction requests](https://cloud.google.com/vertex-ai/docs/predictions/get-online-predictions).
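To get a rough idea of whether a batch of instances fits within the payload limit, you can measure the size of its JSON encoding. The sketch below is an estimate under stated assumptions: it treats the request body as `{"instances": [...]}` encoded as UTF-8 JSON and reads 1.5 MB as 1.5 × 1024 × 1024 bytes. The actual size on the wire may differ, and the helper names are hypothetical.

```python
import json

# Assumption: 1.5 MB interpreted as 1.5 * 1024 * 1024 bytes
MAX_PAYLOAD_BYTES = int(1.5 * 1024 * 1024)


def request_payload_size(instances):
    """Estimate the size in bytes of a JSON request body holding the instances."""
    body = json.dumps({"instances": instances})
    return len(body.encode("utf-8"))


def fits_in_one_request(instances, limit=MAX_PAYLOAD_BYTES):
    """Return True if the estimated payload size is within the limit."""
    return request_payload_size(instances) <= limit


instances = [[6.7, 3.1, 4.7, 1.5], [4.6, 3.1, 1.5, 0.2]]
size = request_payload_size(instances)
print(f"Estimated payload: {size} bytes, fits: {fits_in_one_request(instances)}")
```

If the estimate exceeds the limit, splitting the instances across several requests or switching to batch prediction are the usual options.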

### Response

The `predict()` call returns a `Prediction` object. Its fields include:

- `predictions`: The predicted confidence, between 0 and 1, per class label.
- `deployed_model_id`: The Vertex AI identifier for the deployed model resource.

Learn more about [getting predictions on Vertex AI](https://cloud.google.com/vertex-ai/docs/predictions/overview).
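When `predictions` holds one confidence score per class label, as described above, the predicted class is the label with the highest score. The following sketch illustrates that step with made-up values. The class names and scores are hypothetical and don't come from this tutorial's model, and `top_class` is an illustrative helper.

```python
# Hypothetical class names and scores; neither comes from the tutorial's model
CLASS_NAMES = ["setosa", "versicolor", "virginica"]
predictions = [[0.02, 0.90, 0.08], [0.97, 0.02, 0.01]]


def top_class(scores, class_names):
    """Return the (label, confidence) pair with the highest confidence."""
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return class_names[best_index], scores[best_index]


for scores in predictions:
    label, confidence = top_class(scores, CLASS_NAMES)
    print(f"{label}: {confidence:.2f}")
```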

In [None]:
# Send prediction requests to the endpoint
prediction_response = endpoint.predict(INSTANCES)

# Print the prediction response
print(prediction_response)

## Cleaning up

To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud
project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.

Otherwise, you can delete the individual resources you created in this tutorial.


In [None]:
# Undeploy the model from endpoint
endpoint.undeploy_all()

# Delete the endpoint
endpoint.delete()

# Delete the model
model.delete()

# Delete the training job
job.delete()

# Delete the batch prediction job
batch_predict_job.delete()

# Delete the repo in the Artifact Registry
! gcloud artifacts repositories delete {PRIVATE_REPO} --project={PROJECT_ID} --location={LOCATION} --quiet

# Delete the Cloud Storage bucket
delete_bucket = True
if delete_bucket:
    ! gsutil -m rm -r $BUCKET_URI

# Delete the locally generated files and folders
! rm -rf custom serve

# Delete the local docker container image
if not IS_COLAB:
    ! docker image rm $DEPLOY_IMAGE -f