In [None]:
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Get started with Vertex AI Training for LightGBM

<table align="left">

  <td>
    <a href="https://colab.research.google.com/github/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/training/get_started_vertex_training_lightgbm.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Colab logo"> Run in Colab
    </a>
  </td>
  <td>
    <a href="https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/training/get_started_vertex_training_lightgbm.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo">
      View on GitHub
    </a>
  </td>
  <td>
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/main/notebooks/official/training/get_started_vertex_training_lightgbm.ipynb">
      <img src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo">
      Open in Vertex AI Workbench
    </a>
  </td>
</table>
<br/><br/>
This notebook is a revised version of a notebook by [Rajesh Thallam](https://github.com/RajeshThallam/vertex-ai-labs/blob/main/07-vertex-train-deploy-lightgbm/vertex-train-deploy-lightgbm-model.ipynb).

## Overview


This tutorial demonstrates how to use Vertex AI Training for training a LightGBM model.

Learn more about [Custom training](https://cloud.google.com/vertex-ai/docs/training/custom-training).

### Objective

In this tutorial, you learn how to use `Vertex AI Training` for training a LightGBM custom model.

This tutorial uses the following Google Cloud ML services:

- `Vertex AI Training`
- `Vertex AI Model` resource

The steps performed include:

- Train using a Python package.
- Save the model artifacts to Cloud Storage using GCSFuse.
- Construct a FastAPI prediction server.
- Construct a Dockerfile deployment image.
- Test the deployment image locally.
- Create a `Vertex AI Model` resource.

### Dataset

The dataset used for this tutorial is the [Iris dataset](https://www.tensorflow.org/datasets/catalog/iris) from [TensorFlow Datasets](https://www.tensorflow.org/datasets/catalog/overview). This dataset does not require any feature engineering. The training script in this tutorial loads the dataset with scikit-learn's `load_iris()`. The trained model predicts the Iris species as one of three classes: setosa, versicolor, or virginica.

### Costs 

This tutorial uses billable components of Google Cloud:

* Vertex AI
* Cloud Storage

Learn about [Vertex AI
pricing](https://cloud.google.com/vertex-ai/pricing) and [Cloud Storage
pricing](https://cloud.google.com/storage/pricing), and use the [Pricing
Calculator](https://cloud.google.com/products/calculator/)
to generate a cost estimate based on your projected usage.

## Installation

Install the following packages to execute this notebook.

In [None]:
import os

! pip3 install --upgrade google-cloud-aiplatform  -q
! pip3 install -U google-cloud-storage  -q
! pip3 install -U lightgbm  -q
! pip3 install --upgrade google-auth -q

if os.getenv("IS_TESTING"):
    ! pip3 install --upgrade tensorflow -q

### Colab only: Uncomment the following cell to restart the kernel.

In [None]:
# Automatically restart kernel after installs so that your environment can access the new packages
# import IPython

# app = IPython.Application.instance()
# app.kernel.do_shutdown(True)

#### Set your project ID

**If you don't know your project ID**, try the following:
* Run `gcloud config list`.
* Run `gcloud projects list`.
* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)

In [None]:
PROJECT_ID = "[your-project-id]"  # @param {type:"string"}

# Set the project id
! gcloud config set project {PROJECT_ID}

#### Set the region

**Optional**: Update the 'REGION' variable to specify the region that you want to use. Learn more about [Vertex AI regions](https://cloud.google.com/vertex-ai/docs/general/locations).

In [None]:
REGION = "us-central1"  # @param {type: "string"}

### Authenticate your Google Cloud account

To authenticate your Google Cloud account, follow the instructions for your Jupyter environment:

**1. Vertex AI Workbench**
<br>You are already authenticated.

**2. Local JupyterLab instance**
<br>Uncomment and run the following code:

In [None]:
# ! gcloud auth login

**3. Colab**
<br>Uncomment and run the following code:

In [None]:
# from google.colab import auth

# auth.authenticate_user()

**4. Service account or other**
* See how to grant Cloud Storage permissions to your service account at https://cloud.google.com/storage/docs/gsutil/commands/iam#ch-examples.

### Create a Cloud Storage bucket

Create a storage bucket to store intermediate artifacts such as datasets.

In [None]:
BUCKET_URI = f"gs://your-bucket-name-{PROJECT_ID}-unique"  # @param {type:"string"}

**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket.

In [None]:
! gsutil mb -l {REGION} -p {PROJECT_ID} {BUCKET_URI}

### Set up variables

Next, set up some variables used throughout the tutorial.

### Import libraries and define constants

In [None]:
import google.cloud.aiplatform as aiplatform
import lightgbm

In [None]:
print(f"LightGBM version {lightgbm.__version__}")

## Initialize Vertex SDK for Python

Initialize the Vertex AI SDK for Python for your project and corresponding bucket.

In [None]:
aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)

### Enable Artifact Registry API

First, you must enable the Artifact Registry API service for your project.

Learn more about [Enabling service](https://cloud.google.com/artifact-registry/docs/enable-service).

In [None]:
! gcloud services enable artifactregistry.googleapis.com

if os.getenv("IS_TESTING"):
    ! sudo apt-get update --yes && sudo apt-get --only-upgrade --yes install google-cloud-sdk-cloud-run-proxy google-cloud-sdk-harbourbridge google-cloud-sdk-cbt google-cloud-sdk-gke-gcloud-auth-plugin google-cloud-sdk-kpt google-cloud-sdk-local-extract google-cloud-sdk-minikube google-cloud-sdk-app-engine-java google-cloud-sdk-app-engine-go google-cloud-sdk-app-engine-python google-cloud-sdk-spanner-emulator google-cloud-sdk-bigtable-emulator google-cloud-sdk-nomos google-cloud-sdk-package-go-module google-cloud-sdk-firestore-emulator kubectl google-cloud-sdk-datastore-emulator google-cloud-sdk-app-engine-python-extras google-cloud-sdk-cloud-build-local google-cloud-sdk-kubectl-oidc google-cloud-sdk-anthos-auth google-cloud-sdk-app-engine-grpc google-cloud-sdk-pubsub-emulator google-cloud-sdk-datalab google-cloud-sdk-skaffold google-cloud-sdk google-cloud-sdk-terraform-tools google-cloud-sdk-config-connector
    ! gcloud components update --quiet

### Create a private Docker repository

Your first step is to create your own Docker repository in Google Artifact Registry.

1. Run the `gcloud artifacts repositories create` command to create a new Docker repository in your region with the description "Prediction repository".

2. Run the `gcloud artifacts repositories list` command to verify that your repository was created.

In [None]:
PRIVATE_REPO = "prediction"

! gcloud artifacts repositories create {PRIVATE_REPO} --repository-format=docker --location={REGION} --description="Prediction repository"

! gcloud artifacts repositories list

### Configure authentication to your private repo

Before you push or pull container images, configure Docker to use the `gcloud` command-line tool to authenticate requests to `Artifact Registry` for your region.

In [None]:
! gcloud auth configure-docker {REGION}-docker.pkg.dev --quiet

#### Set pre-built containers

Set the pre-built Docker container image for training and prediction.


For the latest list, see [Pre-built containers for training](https://cloud.google.com/ai-platform-unified/docs/training/pre-built-containers).


For the latest list, see [Pre-built containers for prediction](https://cloud.google.com/ai-platform-unified/docs/predictions/pre-built-containers).
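
As a rough illustration of how the image URIs in the next cell are put together, the sketch below composes an Artifact Registry image URI from its parts. The helper name `artifact_registry_image_uri` and the project ID `my-project` are hypothetical; the URI pattern itself mirrors the format strings used in this notebook.

```python
def artifact_registry_image_uri(
    location: str, project: str, repository: str, image: str, tag: str = "latest"
) -> str:
    """Compose a Docker image URI hosted on Artifact Registry.

    Hypothetical helper: it only formats a string and does not check
    that the image exists.
    """
    return f"{location}-docker.pkg.dev/{project}/{repository}/{image}:{tag}"


region = "us-central1"

# Pre-built training images are addressed by multi-region prefix ("us"),
# which this notebook derives from the first part of the region name.
train_image = artifact_registry_image_uri(
    region.split("-")[0], "vertex-ai", "training", "scikit-learn-cpu.0-23"
)

# The custom serving image lives in your own regional repository.
# "my-project" is a placeholder project ID.
deploy_image = artifact_registry_image_uri(
    region, "my-project", "prediction", "lightgbm-cpu"
)

print(train_image)
print(deploy_image)
```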

In [None]:
TRAIN_VERSION = "scikit-learn-cpu.0-23"
DEPLOY_VERSION = "lightgbm-cpu"

# prebuilt
TRAIN_IMAGE = "{}-docker.pkg.dev/vertex-ai/training/{}:latest".format(
    REGION.split("-")[0], TRAIN_VERSION
)

DEPLOY_IMAGE = "{}-docker.pkg.dev/{}/{}/{}:latest".format(
    REGION, PROJECT_ID, PRIVATE_REPO, DEPLOY_VERSION
)
print("Deploy image:", DEPLOY_IMAGE)

#### Set machine type

Next, set the machine type to use for training and prediction.

- Set the variables `TRAIN_COMPUTE` and `DEPLOY_COMPUTE` to configure the compute resources for the VMs you will use for training and prediction.
 - `machine type`
     - `n1-standard`: 3.75 GB of memory per vCPU.
     - `n1-highmem`: 6.5 GB of memory per vCPU.
     - `n1-highcpu`: 0.9 GB of memory per vCPU.
 - `vCPUs`: one of \[2, 4, 8, 16, 32, 64, 96 \]

*Note: The following is not supported for training:*

 - `standard`: 2 vCPUs
 - `highcpu`: 2, 4 and 8 vCPUs

*Note: You may also use n2 and e2 machine types for training and deployment, but they do not support GPUs*.
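
The rules above can be encoded as a small check before a job is submitted. This is only a sketch of the constraints listed in this section; the helper `compose_machine_type` is hypothetical, and the authoritative list of supported machine types is the Vertex AI documentation.

```python
# vCPU counts listed in this tutorial for the n1 machine families.
VALID_VCPUS = (2, 4, 8, 16, 32, 64, 96)

# Combinations that the note above lists as unsupported for training.
UNSUPPORTED_FOR_TRAINING = {
    "n1-standard": {2},
    "n1-highcpu": {2, 4, 8},
}


def compose_machine_type(family: str, vcpus: int, for_training: bool = True) -> str:
    """Return a machine type string such as "n1-standard-4".

    Raises ValueError when the combination conflicts with the rules
    stated in this section.
    """
    if vcpus not in VALID_VCPUS:
        raise ValueError(f"{vcpus} is not one of {VALID_VCPUS}")
    if for_training and vcpus in UNSUPPORTED_FOR_TRAINING.get(family, set()):
        raise ValueError(f"{family}-{vcpus} is not supported for training")
    return f"{family}-{vcpus}"


print(compose_machine_type("n1-standard", 4))
```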

In [None]:
MACHINE_TYPE = "n1-standard"

VCPU = "4"
TRAIN_COMPUTE = MACHINE_TYPE + "-" + VCPU
print("Train machine type", TRAIN_COMPUTE)

VCPU = "4"
DEPLOY_COMPUTE = MACHINE_TYPE + "-" + VCPU
print("Deploy machine type", DEPLOY_COMPUTE)

### Examine the training package

#### Package layout

Before you start the training, you will look at how a Python package is assembled for a custom training job. When unarchived, the package contains the following directory/file layout.

- PKG-INFO
- README.md
- setup.cfg
- setup.py
- trainer
  - \_\_init\_\_.py
  - task.py

The files `setup.cfg` and `setup.py` are the instructions for installing the package into the operating environment of the Docker image.

The file `trainer/task.py` is the Python script for executing the custom training job. *Note*: when you refer to it as the `python_module_name` of the training job, replace the directory slash with a dot (`trainer.task`) and drop the file suffix (`.py`).
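
The path-to-module conversion can be sketched in a few lines. The function name `module_name_from_path` is hypothetical and is shown only to make the naming rule concrete.

```python
from pathlib import PurePosixPath


def module_name_from_path(script_path: str) -> str:
    """Convert a relative script path like "trainer/task.py" to "trainer.task"."""
    path = PurePosixPath(script_path)
    if path.suffix != ".py":
        raise ValueError(f"expected a .py file, got {script_path!r}")
    # Drop the suffix, then join the path components with dots.
    return ".".join(path.with_suffix("").parts)


print(module_name_from_path("trainer/task.py"))
```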

#### Package Assembly

In the following cells, you will assemble the training package.

In [None]:
# Make folder for Python training script
! rm -rf custom
! mkdir custom

# Add package information
! touch custom/README.md

setup_cfg = "[egg_info]\n\ntag_build =\n\ntag_date = 0"
! echo "$setup_cfg" > custom/setup.cfg

setup_py = "import setuptools\n\nsetuptools.setup(\n\n    install_requires=[\n\n'lightgbm'    ],\n\n    packages=setuptools.find_packages())"
! echo "$setup_py" > custom/setup.py

pkg_info = "Metadata-Version: 1.0\n\nName: Iris tabular classification\n\nVersion: 0.0.0\n\nSummary: Demonstration training script\n\nHome-page: www.google.com\n\nAuthor: Google\n\nAuthor-email: aferlitsch@google.com\n\nLicense: Public\n\nDescription: Demo\n\nPlatform: Vertex"
! echo "$pkg_info" > custom/PKG-INFO

# Make the training subfolder
! mkdir custom/trainer
! touch custom/trainer/__init__.py

### Create the task script for the Python training package

Next, you create the `task.py` script for driving the training package. Some notable steps include:

- Command-line arguments:
    - `model-dir`: The location to save the trained model. When using Vertex AI custom training, the location will be specified in the environment variable: `AIP_MODEL_DIR`
    
- Data preprocessing (`get_data()`):
    - Download the dataset and split into training and test.
- Training (`train_model()`):
    - Trains the model
- Model artifact saving
    - Saves the model artifacts to the Cloud Storage location specified by `model-dir`.
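
To make the `model-dir` behavior concrete, here is a minimal sketch of the argument handling in isolation. The wrapper `parse_model_dir` and the bucket name `example-bucket` are assumptions for illustration; the `argparse` options match the ones used in `task.py`.

```python
import argparse
import os


def parse_model_dir(argv, environ=None):
    """Return the model directory from the command line, falling back to AIP_MODEL_DIR.

    `environ` is injectable so the fallback can be demonstrated without
    changing the real process environment.
    """
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--model-dir",
        dest="model_dir",
        default=environ.get("AIP_MODEL_DIR"),
        type=str,
        help="Location to export the model",
    )
    return parser.parse_args(argv).model_dir


# Explicit flag: the command-line value is used.
print(parse_model_dir(["--model-dir=gs://example-bucket/model"], environ={}))

# No flag: the value comes from the environment variable.
print(parse_model_dir([], environ={"AIP_MODEL_DIR": "gs://example-bucket/model/"}))
```

Note that the flag is spelled `--model-dir` with a hyphen; `argparse` maps it to the attribute `model_dir`.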

In [None]:
%%writefile custom/trainer/task.py
# Single Instance Training for Iris

import datetime
import os
import subprocess
import sys

from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
import pandas as pd

import lightgbm as lgb

import argparse
import logging

logging.getLogger().setLevel(logging.INFO)

logging.info("Parsing arguments")

parser = argparse.ArgumentParser()
parser.add_argument(
    '--model-dir', 
    dest='model_dir',        
    default=os.getenv('AIP_MODEL_DIR'), 
    type=str, 
    help='Location to export GCS model')
args = parser.parse_args()
logging.info(args)

def get_data():
    # Download data
    logging.info("Downloading data")
    iris = load_iris()
    print(iris.data.shape)

    # split data
    print("Splitting data into test and train")
    x, y = iris.data, iris.target
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=123)

    # create dataset for lightgbm
    print("creating dataset for LightGBM")
    lgb_train = lgb.Dataset(x_train, y_train)
    lgb_eval = lgb.Dataset(x_test, y_test, reference=lgb_train)
    
    return lgb_train, lgb_eval

def train_model(lgb_train, lgb_eval):
    # specify your configurations as a dict
    params = {
        'boosting_type': 'gbdt',
        'objective': 'multiclass',
        'metric': {'multi_error'},
        'num_leaves': 31,
        'learning_rate': 0.05,
        'feature_fraction': 0.9,
        'bagging_fraction': 0.8,
        'bagging_freq': 5,
        'verbose': 0,
        'num_class' : 3
    }

    # train lightgbm model
    logging.info('Starting training...')
    model = lgb.train(params,
                    lgb_train,
                    num_boost_round=20,
                    valid_sets=lgb_eval)
    
    return model

lgb_train, lgb_eval = get_data()
model = train_model(lgb_train, lgb_eval)

# GCSFuse conversion
gs_prefix = 'gs://'
gcsfuse_prefix = '/gcs/'
if args.model_dir.startswith(gs_prefix):
    args.model_dir = args.model_dir.replace(gs_prefix, gcsfuse_prefix)
    dirpath = os.path.split(args.model_dir)[0]
    if not os.path.isdir(dirpath):
        os.makedirs(dirpath)
        
# save model to file
logging.info('Saving model...')
model_filename = 'model.txt'
gcs_model_path = os.path.join(args.model_dir, model_filename)
model.save_model(gcs_model_path)

#### Store training script on your Cloud Storage bucket

Next, you package the training folder into a compressed tar ball, and then store it in your Cloud Storage bucket.

In [None]:
! rm -f custom.tar custom.tar.gz
! tar cvf custom.tar custom
! gzip custom.tar
! gsutil cp custom.tar.gz $BUCKET_URI/trainer_iris.tar.gz
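
If you prefer to stay in Python, the archive step can also be sketched with the standard-library `tarfile` module. This is an alternative to the `tar` and `gzip` commands above, not what the notebook runs; the helper `make_package_tarball` is hypothetical, the demo builds a throwaway package in a temporary directory, and the upload to Cloud Storage is not covered.

```python
import os
import tarfile
import tempfile


def make_package_tarball(package_dir: str, output_path: str) -> str:
    """Write a gzip-compressed tarball whose top-level entry is the package folder."""
    top_level = os.path.basename(os.path.normpath(package_dir))
    with tarfile.open(output_path, "w:gz") as tar:
        tar.add(package_dir, arcname=top_level)
    return output_path


# Demo on an empty stand-in for the "custom" folder created earlier.
with tempfile.TemporaryDirectory() as tmp:
    pkg = os.path.join(tmp, "custom")
    os.makedirs(os.path.join(pkg, "trainer"))
    for rel in ("setup.py", "trainer/__init__.py", "trainer/task.py"):
        open(os.path.join(pkg, *rel.split("/")), "w").close()

    archive = make_package_tarball(pkg, os.path.join(tmp, "custom.tar.gz"))
    with tarfile.open(archive, "r:gz") as tar:
        members = sorted(tar.getnames())

print(members)
```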

### Create and run custom training job


To train a custom model, you perform two steps: 1) create a custom training job, and 2) run the job.

#### Create custom training job

A custom training job is created with the `CustomPythonPackageTrainingJob` class, with the following parameters:

- `display_name`: The human readable name for the custom training job.
- `container_uri`: The training container image.

- `python_package_gcs_uri`: The location of the Python training package as a tarball.
- `python_module_name`: The relative path to the training script in the Python package.

*Note:* There is no requirements parameter. You specify any requirements in the `setup.py` script in your Python package.

In [None]:
DISPLAY_NAME = "iris_"

job = aiplatform.CustomPythonPackageTrainingJob(
    display_name=DISPLAY_NAME,
    python_package_gcs_uri=f"{BUCKET_URI}/trainer_iris.tar.gz",
    python_module_name="trainer.task",
    container_uri=TRAIN_IMAGE,
    project=PROJECT_ID,
)

### Prepare your command-line arguments

Now define the command-line arguments for your custom training container:

- `args`: The command-line arguments to pass to the executable that is set as the entry point into the container.
  - `--model-dir` : For our demonstrations, we use this command-line argument to specify where to store the model artifacts.
      - direct: You pass the Cloud Storage location as a command line argument to your training script (set variable `DIRECT = True`), or
      - indirect: The service passes the Cloud Storage location as the environment variable `AIP_MODEL_DIR` to your training script (set variable `DIRECT = False`). In this case, you tell the service the model artifact location in the job specification.
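
Whichever way the location arrives, the training script rewrites a `gs://` URI into the Cloud Storage FUSE mount path under `/gcs/` before saving the model. The sketch below isolates that conversion; the function name `to_gcsfuse_path` and the bucket name are illustrative only.

```python
GS_PREFIX = "gs://"
GCSFUSE_PREFIX = "/gcs/"


def to_gcsfuse_path(uri: str) -> str:
    """Map gs://bucket/path to /gcs/bucket/path; leave other paths unchanged."""
    if uri.startswith(GS_PREFIX):
        return GCSFUSE_PREFIX + uri[len(GS_PREFIX):]
    return uri


print(to_gcsfuse_path("gs://example-bucket/model/model/"))
print(to_gcsfuse_path("/tmp/model"))
```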

In [None]:
MODEL_DIR = "{}/model".format(BUCKET_URI)

DIRECT = False
if DIRECT:
    CMDARGS = [
        "--model-dir=" + MODEL_DIR,
    ]
else:
    CMDARGS = []

#### Run the custom training job

Next, you run the custom job to start the training job by invoking the method `run`, with the following parameters:

- `args`: The command-line arguments to pass to the training script.
- `replica_count`: The number of compute instances for training (replica_count = 1 is single node training).
- `machine_type`: The machine type for the compute instances.
- `accelerator_type`: The hardware accelerator type.
- `accelerator_count`: The number of accelerators to attach to a worker replica.
- `base_output_dir`: The Cloud Storage location to write the model artifacts to.
- `sync`: Whether to block until completion of the job.

In [None]:
job.run(
    args=CMDARGS,
    replica_count=1,
    machine_type=TRAIN_COMPUTE,
    base_output_dir=MODEL_DIR,
    sync=False,
)

model_path_to_deploy = MODEL_DIR + "/model"

### List a custom training job

In [None]:
_job = job.list(filter=f'display_name="{DISPLAY_NAME}"')
print(_job)

### Wait for completion of custom training job

Next, wait for the custom training job to complete. Alternatively, one can set the parameter `sync` to `True` in the `run()` method to block until the custom training job is completed.

In [None]:
job.wait()

### Delete a custom training job

After a training job is completed, you can delete the training job with the method `delete()`.  Prior to completion, a training job can be canceled with the method `cancel()`.

In [None]:
job.delete()

### Verify the model artifacts

Next, verify the training script successfully saved the trained model to your Cloud Storage location.

In [None]:
print(f"Model path with trained model artifacts {model_path_to_deploy}")

! gsutil ls $model_path_to_deploy

## Deploy LightGBM model to Vertex AI Endpoint

### Build a HTTP Server with FastAPI

Next, you use FastAPI to implement the HTTP server as a custom deployment container. The container must listen and respond to liveness checks, health checks, and prediction requests. The HTTP server must listen for requests on 0.0.0.0.

Learn more about [deployment container requirements](https://cloud.google.com/ai-platform-unified/docs/predictions/custom-container-requirements#image).

Learn more about [FastAPI](https://fastapi.tiangolo.com/).

In [None]:
# Make folder for serving script
! rm -rf serve
! mkdir serve

# Make the predictor subfolder
! mkdir serve/app

### Create the requirements file for the serving container

Next, create the `requirements.txt` file for the server environment which specifies which Python packages need to be installed on the serving container.

In [None]:
%%writefile serve/requirements.txt

numpy
scikit-learn>=0.24
pandas
lightgbm==3.2.1
google-cloud-storage


### Write the FastAPI serving script

Next, you write the serving script for the HTTP server using `FastAPI`, as follows:

- `app`: Instantiate a `FastAPI` application
- `health()`: Define the response to health request.
    - return status code 200
- `predict()`: Define the response to the predict request.
    - `body = await request.json()`: Asynchronously waits for and parses the JSON body of the HTTP request.
    - `instances = body["instances"]` : Content of the prediction request.
    - `inputs = np.asarray(instances)`: Reformats prediction request as a numpy array.
    - `outputs = model.predict(inputs)`: Invoke the model to make the predictions.
    - `return {"predictions": ... }`: Return formatted predictions in response body.
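
The last step, turning per-class scores into class names, can be sketched without NumPy. The serving script itself uses `np.argmax`; the pure-Python version below is an equivalent illustration, and the scores in `sample_outputs` are made up rather than real model output.

```python
# Class names in the order returned by scikit-learn's load_iris().target_names.
CLASS_NAMES = ["setosa", "versicolor", "virginica"]


def to_labels(scores):
    """Pick the highest-scoring class for each row of per-class scores."""
    labels = []
    for row in scores:
        best = max(range(len(row)), key=row.__getitem__)
        labels.append(CLASS_NAMES[best])
    return labels


# Made-up scores for two instances, one row per instance.
sample_outputs = [[0.05, 0.90, 0.05], [0.92, 0.05, 0.03]]
response = {"predictions": to_labels(sample_outputs)}
print(response)
```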

In [None]:
%%writefile serve/app/main.py
from fastapi.logger import logger
from fastapi import FastAPI, Request

import numpy as np
import os

from sklearn.datasets import load_iris
import lightgbm as lgb

import logging

gunicorn_logger = logging.getLogger('gunicorn.error')
logger.handlers = gunicorn_logger.handlers

if __name__ != "__main__":
    logger.setLevel(gunicorn_logger.level)
else:
    logger.setLevel(logging.DEBUG)

app = FastAPI()

model_f = "/model/model.txt"

logger.info("Loading model")
_model = lgb.Booster(model_file=model_f)
logger.info("Loading target class labels")
_class_names = load_iris().target_names

@app.get(os.environ['AIP_HEALTH_ROUTE'], status_code=200)
def health():
    """ health check to ensure HTTP server is ready to handle 
        prediction requests
    """
    return {"status": "healthy"}


@app.post(os.environ['AIP_PREDICT_ROUTE'])
async def predict(request: Request):
    body = await request.json()

    instances = body["instances"]
    inputs = np.asarray(instances)
    
    outputs = _model.predict(inputs)

    logger.info(f"Outputs {outputs}")
    return {"predictions": [_class_names[class_num] for class_num in np.argmax(outputs, axis=1)]}

### Add pre-start script
The base image's startup script runs this script before starting the server. The `PORT` environment variable is set equal to `AIP_HTTP_PORT` so that FastAPI runs on the same port expected by Vertex AI.

In [None]:
%%writefile serve/app/prestart.sh

#!/bin/bash
export PORT=$AIP_HTTP_PORT

### Store test instances

Next, you create synthetic examples to subsequently test the FastAPI server and the trained LightGBM model.

Learn more about [JSON formatting of prediction requests for custom models](https://cloud.google.com/ai-platform-unified/docs/predictions/online-predictions-custom-models#request-body-details).
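
A request body can also be built and sanity-checked in Python before it is written to a file. This is a minimal sketch; the helper `build_request_body` is hypothetical, and the check for four features per instance is an assumption based on the Iris dataset used here.

```python
import json

NUM_FEATURES = 4  # the Iris dataset has four numeric features per instance


def build_request_body(instances):
    """Serialize instances into the {"instances": [...]} request format."""
    for index, instance in enumerate(instances):
        if len(instance) != NUM_FEATURES:
            raise ValueError(
                f"instance {index} has {len(instance)} values, expected {NUM_FEATURES}"
            )
    return json.dumps({"instances": instances})


body = build_request_body([[6.7, 3.1, 4.7, 1.5], [4.6, 3.1, 1.5, 0.2]])
print(body)
```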

In [None]:
%%writefile serve/instances.json
{
    "instances": [
        [6.7, 3.1, 4.7, 1.5],
        [4.6, 3.1, 1.5, 0.2]
    ]
}

## Build and push prediction container to Artifact Registry

Write the Dockerfile, using `tiangolo/uvicorn-gunicorn-fastapi` as a base image. This will automatically run FastAPI for you using Gunicorn and Uvicorn. 

Learn more about [Deploying FastAPI with Docker](https://fastapi.tiangolo.com/deployment/docker/).

In [None]:
%%bash -s $MODEL_DIR

MODEL_DIR=$1

mkdir -p ./serve/model/
gsutil cp -r ${MODEL_DIR}/model/ ./serve/model/ 

cat > ./serve/Dockerfile <<EOF
FROM tiangolo/uvicorn-gunicorn-fastapi:python3.7

COPY ./app /app
COPY ./model /
COPY requirements.txt requirements.txt

RUN pip3 install -r requirements.txt

EXPOSE 7080
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "7080"]

EOF

#### Build the container locally

Next, you build and tag your custom deployment container.

In [None]:
import sys

IS_COLAB = "google.colab" in sys.modules

if not IS_COLAB:
    ! docker build --tag={DEPLOY_IMAGE} ./serve
else:
    # install docker daemon
    ! apt-get -qq install docker.io

### Run and test the container locally (optional)

Run the container locally in detached mode and provide the environment variables that the container requires. Vertex AI provides these environment variables to the container once it is deployed. Test the `/health` and `/predict` routes, then stop the running container.

Before pushing the container image to Artifact Registry to use it with Vertex AI Prediction, you can run it as a container in your local environment to verify that the server works as expected.

In [None]:
if not IS_COLAB:
    ! docker rm local-iris 2>/dev/null
    ! docker run -t -d --rm -p 7080:7080 \
        --name=local-iris \
        -e AIP_HTTP_PORT=7080 \
        -e AIP_HEALTH_ROUTE=/health \
        -e AIP_PREDICT_ROUTE=/predict \
        -e AIP_STORAGE_URI={MODEL_DIR} \
        {DEPLOY_IMAGE}
    ! docker container ls
    ! sleep 10

#### Health check

Send the container's server a health check. The output should be {"status": "healthy"}.

In [None]:
if not IS_COLAB:
    ! curl http://localhost:7080/health

If successful, the server returns the following response:

```
{
  "status": "healthy"
}
```

#### Prediction check

Send the container's server a prediction request.

In [None]:
if not IS_COLAB:
    ! curl -X POST \
      -d @serve/instances.json \
      -H "Content-Type: application/json; charset=utf-8" \
      http://localhost:7080/predict

This request uses the two test instances stored in `serve/instances.json`. If successful, the server returns predictions in the following format:

```
{"predictions":["versicolor","setosa"]}
```
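
The same request can be prepared from Python with the standard library. The sketch below only builds the request object; sending it requires the local container to be running. The helper `build_predict_request` is hypothetical, and the URL matches the port and route used in the local test above.

```python
import json
import urllib.request


def build_predict_request(url, instances):
    """Build (but do not send) a POST request carrying the prediction instances."""
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


request = build_predict_request(
    "http://localhost:7080/predict",
    [[6.7, 3.1, 4.7, 1.5], [4.6, 3.1, 1.5, 0.2]],
)
print(request.get_method(), request.full_url)

# With the container running, the request could be sent like this:
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```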

#### Stop the local container

Finally, stop the local container.

In [None]:
if not IS_COLAB:
    ! docker stop local-iris

#### Push the serving container to Artifact Registry

Push your container image with inference code and dependencies to your Artifact Registry.

In [None]:
if not IS_COLAB:
    ! docker push $DEPLOY_IMAGE

*Executes in Colab*

In [None]:
%%bash -s $IS_COLAB $DEPLOY_IMAGE
if [ $1 == "False" ]; then
  exit 0
fi
set -x
dockerd -b none --iptables=0 -l warn &
for i in $(seq 5); do [ ! -S "/var/run/docker.sock" ] && sleep 2 || break; done
docker build --tag=$2 ./serve
docker push $2
kill $(jobs -p)

#### Deploying the serving container to Vertex Predictions
Create a `Model` resource on Vertex AI and deploy it to a Vertex AI `Endpoint` resource. You must deploy a model to an endpoint before using it for online predictions. The deployed model runs the custom container image to serve predictions.

## Upload the model

Next, upload your model to a `Model` resource using `Model.upload()` method, with the following parameters:

- `display_name`: The human readable name for the `Model` resource.
- `artifact_uri`: The Cloud Storage location of the trained model artifacts. It is omitted in this tutorial because the model file is copied into the serving container image.
- `serving_container_image_uri`: The serving container image.
- `serving_container_predict_route`:  HTTP path to send prediction requests to the container.
- `serving_container_health_route`: HTTP path to send health check requests to the container.
- `serving_container_ports`: The ports exposed by the container to listen to requests.
- `sync`: Whether to execute the upload synchronously (blocking until done) or asynchronously.

If the `upload()` method is run asynchronously, you can subsequently block until completion with the `wait()` method.

In [None]:
APP_NAME = "iris"

model_display_name = f"{APP_NAME}"
model_description = "LightGBM based iris flower classifier with custom container"

MODEL_NAME = APP_NAME
health_route = "/ping"
predict_route = "/predict"
serving_container_ports = [7080]

In [None]:
if not os.getenv("IS_TESTING"):
    model = aiplatform.Model.upload(
        display_name=model_display_name,
        description=model_description,
        serving_container_image_uri=DEPLOY_IMAGE,
        serving_container_predict_route=predict_route,
        serving_container_health_route=health_route,
        serving_container_ports=serving_container_ports,
    )

    model.wait()

    print(model.display_name)
    print(model.resource_name)

## Make batch predictions

Documentation - [Batch prediction requests - Vertex AI](https://cloud.google.com/vertex-ai/docs/predictions/batch-predictions)

### Make test items

You will use synthetic data as test data items. Don't be concerned that we are using synthetic data -- we just want to demonstrate how to make a prediction.

In [None]:
INSTANCES = [[6.7, 3.1, 4.7, 1.5], [4.6, 3.1, 1.5, 0.2]]

### Make the batch input file

Now make a batch input file in JSON Lines format, with one instance per line, and store it in your Cloud Storage bucket. Together, the instances in the prediction request form a list:

                        [ [ content_1], [content_2] ]

- `content`: The feature values of the test item as a list.
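
To see the file format without touching Cloud Storage, the sketch below writes the instances to a local JSON Lines file and reads them back. The helper `write_jsonl` and the temporary file are for illustration only; the notebook itself writes directly to the bucket.

```python
import json
import os
import tempfile


def write_jsonl(path, instances):
    """Write one JSON-encoded instance per line."""
    with open(path, "w") as f:
        for instance in instances:
            f.write(json.dumps(instance) + "\n")


instances = [[6.7, 3.1, 4.7, 1.5], [4.6, 3.1, 1.5, 0.2]]

with tempfile.TemporaryDirectory() as tmp:
    local_path = os.path.join(tmp, "test.jsonl")
    write_jsonl(local_path, instances)
    # Read the file back to confirm each line parses as one instance.
    with open(local_path) as f:
        round_trip = [json.loads(line) for line in f]

print(round_trip)
```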

In [None]:
import json

[json.dumps(record) for record in INSTANCES]

In [None]:
import tensorflow as tf

gcs_input_uri = f"{BUCKET_URI}/{APP_NAME}/test/batch_input/test.jsonl"
with tf.io.gfile.GFile(gcs_input_uri, "w") as f:
    for i in INSTANCES:
        f.write(str(i) + "\n")

! gsutil cat $gcs_input_uri

### Make the batch prediction request

Now that your `Model` resource is uploaded, you can make a batch prediction by invoking the `batch_predict()` method, with the following parameters:

- `job_display_name`: The human readable name for the batch prediction job.
- `gcs_source`: A list of one or more batch request input files.
- `gcs_destination_prefix`: The Cloud Storage location for storing the batch prediction results.
- `instances_format`: The format for the input instances, either 'csv' or 'jsonl'. Defaults to 'jsonl'.
- `predictions_format`: The format for the output predictions, either 'csv' or 'jsonl'. Defaults to 'jsonl'.
- `machine_type`: The type of machine to use for batch prediction.
- `sync`: If set to True, the call will block while waiting for the asynchronous batch job to complete.

In [None]:
if not os.getenv("IS_TESTING"):
    MIN_NODES = 1
    MAX_NODES = 1

    batch_predict_job = model.batch_predict(
        job_display_name=f"{APP_NAME}",
        gcs_source=gcs_input_uri,
        gcs_destination_prefix=f"{BUCKET_URI}/{APP_NAME}/test/batch_output/",
        instances_format="jsonl",
        predictions_format="jsonl",
        model_parameters=None,
        machine_type=DEPLOY_COMPUTE,
        starting_replica_count=MIN_NODES,
        max_replica_count=MAX_NODES,
        sync=False,
    )

    print(batch_predict_job)

### Wait for completion of batch prediction job

Next, wait for the batch job to complete. Alternatively, one can set the parameter `sync` to `True` in the `batch_predict()` method to block until the batch prediction job is completed.

In [None]:
if not os.getenv("IS_TESTING"):
    batch_predict_job.wait()

### Get the predictions

Next, get the results from the completed batch prediction job.

The results are written to the Cloud Storage output bucket you specified in the batch prediction request. You call the method iter_outputs() to get a list of each Cloud Storage file generated with the results. Each file contains one or more prediction requests in a JSON format:

- `instance`: The prediction request.
- `prediction`: The prediction response.
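
Given that structure, each output line can be parsed into an (instance, prediction) pair. The sketch below assumes the two keys described above; the helper `parse_prediction_lines` is hypothetical, and `sample_lines` is hand-written example data, not actual job output.

```python
import json


def parse_prediction_lines(lines):
    """Parse JSON Lines records into (instance, prediction) pairs, skipping blank lines."""
    results = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        results.append((record["instance"], record["prediction"]))
    return results


# Hand-written sample lines in the format described above.
sample_lines = [
    '{"instance": [6.7, 3.1, 4.7, 1.5], "prediction": "versicolor"}\n',
    '{"instance": [4.6, 3.1, 1.5, 0.2], "prediction": "setosa"}\n',
    "\n",
]

parsed = parse_prediction_lines(sample_lines)
for instance, prediction in parsed:
    print(instance, "->", prediction)
```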

In [None]:
import json

if not os.getenv("IS_TESTING"):
    bp_iter_outputs = batch_predict_job.iter_outputs()

    prediction_results = list()
    for blob in bp_iter_outputs:
        if blob.name.split("/")[-1].startswith("prediction"):
            prediction_results.append(blob.name)

    tags = list()
    for prediction_result in prediction_results:
        gfile_name = f"gs://{bp_iter_outputs.bucket.name}/{prediction_result}"
        with tf.io.gfile.GFile(name=gfile_name, mode="r") as gfile:
            for line in gfile.readlines():
                line = json.loads(line)
                print(line)
                break

## Make online predictions

Documentation - [Online prediction request](https://cloud.google.com/vertex-ai/docs/predictions/deploy-model-api)

## Deploy the model

Next, deploy your model for online prediction. To deploy the model, you invoke the `deploy` method, with the following parameters:

- `deployed_model_display_name`: A human readable name for the deployed model.
- `traffic_split`: Percent of traffic at the endpoint that goes to this model, which is specified as a dictionary of one or more key/value pairs.
    - If there is only one model, specify `{ "0": 100 }`, where "0" refers to the model being deployed and 100 means 100% of the traffic.
    - If there are existing models on the endpoint, for which the traffic will be split, specify `{ "0": percent, model_id: percent, ... }`, where `model_id` is the ID of a model already deployed to the endpoint. The percentages must add up to 100.
- `machine_type`: The type of machine to use for serving predictions.
- `min_replica_count`: The number of compute instances to initially provision.
- `max_replica_count`: The maximum number of compute instances to scale to. In this tutorial, only one instance is provisioned.
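
The traffic split rule (percentages that add up to 100) is easy to check before calling `deploy`. The sketch below is a hypothetical client-side check, not part of the Vertex AI SDK, and the model ID `"1234567890"` is a placeholder.

```python
def validate_traffic_split(traffic_split):
    """Check that a traffic split maps model IDs to integer percentages summing to 100."""
    if not traffic_split:
        raise ValueError("traffic split must contain at least one entry")
    for model_id, percent in traffic_split.items():
        if not isinstance(percent, int) or not 0 <= percent <= 100:
            raise ValueError(f"invalid percentage for {model_id!r}: {percent!r}")
    total = sum(traffic_split.values())
    if total != 100:
        raise ValueError(f"percentages add up to {total}, expected 100")
    return traffic_split


# A single new model receives all traffic.
print(validate_traffic_split({"0": 100}))

# A new model shares traffic with an already deployed model (placeholder ID).
print(validate_traffic_split({"0": 20, "1234567890": 80}))
```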

In [None]:
DEPLOYED_NAME = f"{APP_NAME}"

TRAFFIC_SPLIT = {"0": 100}

MIN_NODES = 1
MAX_NODES = 1

if not os.getenv("IS_TESTING"):
    endpoint = model.deploy(
        deployed_model_display_name=DEPLOYED_NAME,
        traffic_split=TRAFFIC_SPLIT,
        machine_type=DEPLOY_COMPUTE,
        min_replica_count=MIN_NODES,
        max_replica_count=MAX_NODES,
    )

### Make the prediction

Now that your `Model` resource is deployed to an `Endpoint` resource, you can do online predictions by sending prediction requests to the `Endpoint` resource.

#### Request

The format of each instance is:

    [feature_list]

Since the predict() method can take multiple items (instances), send your single test item as a list of one test item.

#### Response

The response from the predict() call is a Python dictionary with the following entries:

- `ids`: The internal assigned unique identifiers for each prediction request.
- `predictions`: The predicted class label for each instance, as returned by the custom serving container.
- `deployed_model_id`: The Vertex AI identifier for the deployed `Model` resource which did the predictions.

In [None]:
instances_list = INSTANCES

if not os.getenv("IS_TESTING"):
    prediction = endpoint.predict(instances_list)
    print(prediction)

## Undeploy the model

When you are done making predictions, undeploy the model from the `Endpoint` resource. This deprovisions all compute resources and ends billing for the deployed model.

In [None]:
if not os.getenv("IS_TESTING"):
    endpoint.undeploy_all()

### Deleting your private Docker repository

Finally, once your private repository becomes obsolete, use the command `gcloud artifacts repositories delete` to delete it from `Google Artifact Registry`.

In [None]:
! gcloud artifacts repositories delete {PRIVATE_REPO} --location={REGION} --quiet

# Cleaning up

To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud
project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.

Otherwise, you can delete the individual resources you created in this tutorial.


In [None]:
# Delete the model and endpoint resources using the Vertex AI SDK objects
try:
    model.delete()
    endpoint.delete()
except Exception as e:
    print(e)

delete_bucket = False
if delete_bucket or os.getenv("IS_TESTING"):
    ! gsutil rm -r $BUCKET_URI