In [None]:
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# E2E ML on GCP: MLOps stage 6: Get started with Custom Prediction Routine (CPR)
<table align="left">
  <td>
    <a href="https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/community/ml_ops/stage6/get_started_with_custom_predictions.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo">
      View on GitHub
    </a>
  </td>
  <td>
    <a href="https://console.cloud.google.com/ai/platform/notebooks/deploy-notebook?download_url=https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/community/ml_ops/stage6/get_started_with_custom_predictions.ipynb">
      Open in Vertex Workbench
    </a>
  </td>
</table>
<br/><br/><br/>

## Overview

This tutorial demonstrates how to use the Vertex AI SDK to build a custom container that uses the Custom Prediction Routine (CPR) model server to serve a scikit-learn model on Vertex AI Predictions. This is currently an **experimental** feature and is not yet officially supported by the Vertex AI SDK. In this tutorial, you install the Vertex AI SDK from an experimental branch on GitHub.


### Objective

In this tutorial, you learn how to use Custom Prediction Routine (CPR) for `Vertex AI Predictions`.

This tutorial uses the following Google Cloud ML services:

- `Vertex AI Training`
- `Vertex AI Predictions`
- `Vertex AI Custom Predictions`
- `Google Artifact Registry`

The steps performed include:

- Write a custom data preprocessor.
- Train the model.
- Build a custom scikit-learn serving container with custom data preprocessing using the Custom Prediction Routine model server.
    - Test the model serving container locally.
    - Upload and deploy the model serving container to Vertex AI Endpoint.
    - Make a prediction request.
- Build a custom scikit-learn serving container with custom predictor (post-processing) using the Custom Prediction Routine model server.
    - Implement custom predictor.
    - Test the model serving container locally.
    - Upload and deploy the model serving container to Vertex AI Endpoint.
    - Make a prediction request.
- Build a custom scikit-learn serving container with custom predictor and HTTP request handler using the Custom Prediction Routine model server.
    - Implement a custom handler.
    - Test the model serving container locally.
    - Upload and deploy the model serving container to Vertex AI Endpoint.
    - Make a prediction request.
- Customize the Dockerfile for a custom scikit-learn serving container with custom predictor and HTTP request handler using the Custom Prediction Routine model server.
    - Implement a custom Dockerfile.
    - Test the model serving container locally.
    - Upload and deploy the model serving container to Vertex AI Endpoint.
    - Make a prediction request.

### Dataset

The dataset used for this tutorial is the [Iris dataset](https://www.tensorflow.org/datasets/catalog/iris) from [TensorFlow Datasets](https://www.tensorflow.org/datasets/catalog/overview). This dataset does not require any feature engineering. In this tutorial, you load the dataset with scikit-learn's `load_iris()` function. The trained model predicts the species of an Iris flower from three classes: setosa, versicolor, or virginica.

### Install additional packages

Install additional package dependencies that are not installed in your notebook environment, such as NumPy, scikit-learn, FastAPI, Uvicorn, and joblib. Use the latest major GA version of each package.

In [None]:
! mkdir src

In [None]:
%%writefile src/requirements.txt
fastapi
uvicorn
joblib~=1.0
numpy~=1.20
scikit-learn~=0.24
google-cloud-storage>=1.26.0,<2.0.0dev
google-cloud-aiplatform[prediction] @ git+https://github.com/googleapis/python-aiplatform.git@custom-prediction-routine

**The model you deploy has a different set of preinstalled dependencies than your notebook environment. Don't assume that code that works in the notebook also works in the model. Instead, list the model's dependencies explicitly in `requirements.txt`, then use `pip install` to install exactly the same dependencies in the notebook. A dependency that already exists in the notebook could still be missing from `requirements.txt`; in that case, your code runs in the notebook but fails in the model. To guard against this, you test the model locally before deploying it to the cloud.**
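One way to cross-check `requirements.txt` is to list the top-level modules that your serving code actually imports and compare them by hand against the packages you pinned. The following is a minimal sketch using Python's standard `ast` module; the helper name `top_level_imports` is hypothetical, and mapping module names to pip package names (for example `sklearn` to `scikit-learn`) is still up to you.

```python
import ast


def top_level_imports(source: str) -> set:
    """Returns the top-level module names imported by a piece of Python source.

    Relative imports (for example `from . import x`) are skipped because they
    refer to local modules rather than installed packages.
    """
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "google.cloud.storage" -> "google"
                modules.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            if node.level == 0 and node.module:
                modules.add(node.module.split(".")[0])
    return modules


example_source = """
import numpy as np
import pickle
from google.cloud import storage
from sklearn.datasets import load_iris
from . import helpers
"""

print(sorted(top_level_imports(example_source)))
```

In the notebook, you could run it over each file in `src/`, for example `top_level_imports(open("src/predictor.py").read())`, and check that every result outside the standard library is covered by `src/requirements.txt`.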

In [None]:
# Install the same dependencies used in the serving container in the notebook
# environment.
%pip install -U --user -r src/requirements.txt

### Restart the kernel

After you install the additional packages, you need to restart the notebook kernel so it can find the packages.

In [None]:
# Automatically restart kernel after installs
import os

if not os.getenv("IS_TESTING"):
    # Automatically restart kernel after installs
    import IPython

    app = IPython.Application.instance()
    app.kernel.do_shutdown(True)

#### Set your project ID

**If you don't know your project ID**, you may be able to get your project ID using `gcloud`.

In [None]:
PROJECT_ID = "[your-project-id]"  # @param {type:"string"}

In [None]:
if PROJECT_ID == "" or PROJECT_ID is None or PROJECT_ID == "[your-project-id]":
    # Get your GCP project id from gcloud
    shell_output = ! gcloud config list --format 'value(core.project)' 2>/dev/null
    PROJECT_ID = shell_output[0]
    print("Project ID:", PROJECT_ID)

In [None]:
! gcloud config set project $PROJECT_ID

#### Region

You can also change the `REGION` variable, which is used for operations throughout the rest of this notebook. The following are some of the regions that support Vertex AI. We recommend that you choose the region closest to you.

- Americas: `us-central1`
- Europe: `europe-west4`
- Asia Pacific: `asia-east1`

You may not use a multi-regional bucket for training or prediction with Vertex AI. Not all regions provide support for all Vertex AI services.

Learn more about [Vertex AI regions](https://cloud.google.com/vertex-ai/docs/general/locations).

In [None]:
REGION = "us-central1"  # @param {type: "string"}

#### Timestamp

If you are in a live tutorial session, you might be using a shared test account or project. To avoid name collisions between users on resources created, you create a timestamp for each instance session, and append the timestamp onto the name of resources you create in this tutorial.

In [None]:
from datetime import datetime

TIMESTAMP = datetime.now().strftime("%Y%m%d%H%M%S")

### Create a Cloud Storage bucket

**The following steps are required, regardless of your notebook environment.**

When you initialize the Vertex AI SDK for Python, you specify a Cloud Storage staging bucket. The staging bucket is where all the data associated with your dataset and model resources is retained across sessions.

Set the name of your Cloud Storage bucket below. Bucket names must be globally unique across all Google Cloud projects, including those outside of your organization.
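Before creating the bucket, you can sanity-check a candidate name. The sketch below encodes only a subset of the documented Cloud Storage naming rules (3 to 63 characters; lowercase letters, digits, dashes, underscores, and dots; starting and ending with a letter or digit). It treats every name as limited to 63 characters, which is stricter than the rules for names containing dots, and it ignores other rules such as the restriction on the `goog` prefix. `is_plausible_bucket_name` is a hypothetical helper, not part of any Google Cloud library.

```python
import re

# Simplified pattern: 3-63 characters made of lowercase letters, digits, '.', '_' or '-',
# beginning and ending with a letter or digit.
_BUCKET_NAME_PATTERN = re.compile(r"[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]")


def is_plausible_bucket_name(name: str) -> bool:
    """Checks a bucket name (with or without gs://) against a simplified rule subset."""
    if name.startswith("gs://"):
        name = name[len("gs://"):]
    return _BUCKET_NAME_PATTERN.fullmatch(name) is not None


for candidate in ["my-project-aip-20220101120000", "My_Bucket", "ab", "-bucket"]:
    print(candidate, is_plausible_bucket_name(candidate))
```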

In [None]:
BUCKET_NAME = "[your-bucket-name]"  # @param {type:"string"}
BUCKET_URI = f"gs://{BUCKET_NAME}"

In [None]:
if BUCKET_URI == "" or BUCKET_URI is None or BUCKET_URI == "gs://[your-bucket-name]":
    # Fall back to a unique default name: <project-id>-aip-<timestamp>
    BUCKET_URI = f"gs://{PROJECT_ID}-aip-{TIMESTAMP}"

**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket.

In [None]:
! gsutil mb -l $REGION $BUCKET_URI

Finally, validate access to your Cloud Storage bucket by examining its contents:

In [None]:
! gsutil ls -al $BUCKET_URI

### Set up variables

Next, set up some variables used throughout the tutorial.

### Import libraries and define constants

In [None]:
import google.cloud.aiplatform as aip

### Initialize Vertex AI SDK for Python

Initialize the Vertex AI SDK for Python for your project and corresponding bucket.

In [None]:
aip.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)

## Write your custom data preprocessing

First, you write the module `preprocess.py` for preprocessing the data. Because all the features are numeric, each feature column is standardized to a mean of 0 and a standard deviation of 1, which is also referred to as scaling the numeric feature values. The means and standard deviations are computed once, from the training data, and reused for every later call.
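To make the transformation concrete, here is a minimal, pure-Python sketch of the same standardization on a single feature column. Like `np.std` with its default settings, it uses the population standard deviation (`statistics.pstdev`). The column values are illustrative, not taken from the dataset.

```python
import statistics

column = [4.9, 5.8, 6.4, 7.1]  # illustrative feature values

mean = statistics.fmean(column)
std = statistics.pstdev(column)  # population std, i.e. ddof=0 as in np.std
if std == 0:
    raise ValueError("Column has standard deviation of 0.")

# z = (x - mean) / std for every value in the column
standardized = [(x - mean) / std for x in column]

print([round(z, 3) for z in standardized])
print(round(statistics.fmean(standardized), 6), round(statistics.pstdev(standardized), 6))
```

The standardized column has a mean of (numerically) 0 and a standard deviation of 1, which is what `MySimpleScaler.preprocess` produces for each column of the input array.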

In [None]:
%%writefile src/preprocess.py
import numpy as np

class MySimpleScaler(object):
    def __init__(self):
        self._means = None
        self._stds = None

    def preprocess(self, data):
        if self._means is None:  # during training only
            self._means = np.mean(data, axis=0)

        if self._stds is None:  # during training only
            self._stds = np.std(data, axis=0)
            if not self._stds.all():
                raise ValueError("At least one column has standard deviation of 0.")

        return (data - self._means) / self._stds

## Train and store model and data preprocessing module

Next, you train the model as follows:

1. Use `preprocess.MySimpleScaler` to preprocess the Iris data.
2. Train a random forest classifier using scikit-learn.
3. Export your trained model as a joblib (`.joblib`) file.
4. Export your `MySimpleScaler` instance as a pickle (`.pkl`) file.
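The scaler is pickled alongside the model because the means and standard deviations computed on the training data must be reused unchanged at serving time. The sketch below illustrates that round trip with a simplified, pure-Python stand-in for `MySimpleScaler` (the `TinyScaler` class is hypothetical and exists only for this example): after unpickling, the object skips refitting because its statistics are already set.

```python
import pickle
import statistics


class TinyScaler:
    """Pure-Python stand-in for MySimpleScaler, for a single feature column."""

    def __init__(self):
        self._mean = None
        self._std = None

    def preprocess(self, values):
        if self._mean is None:  # fit only on the first (training) call
            self._mean = statistics.fmean(values)
            self._std = statistics.pstdev(values)
        return [(v - self._mean) / self._std for v in values]


scaler = TinyScaler()
scaler.preprocess([1.0, 2.0, 3.0, 4.0, 5.0])  # "training" data fixes the statistics

# Serialize and restore, as happens with preprocessor.pkl between training and serving.
restored = pickle.loads(pickle.dumps(scaler))
print(restored.preprocess([3.0]))  # scaled with the training mean/std, not refit
```

Note that pickle stores a reference to the class by module and name rather than the class code itself, so the `preprocess` module must also be importable wherever `preprocessor.pkl` is loaded.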

In [None]:
%mkdir model

In [None]:
%cd src

import pickle

import joblib
from preprocess import MySimpleScaler
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
scaler = MySimpleScaler()

X = scaler.preprocess(iris.data)
y = iris.target

model = RandomForestClassifier()
model.fit(X, y)

joblib.dump(model, "../model/model.joblib")
with open("../model/preprocessor.pkl", "wb") as f:
    pickle.dump(scaler, f)

%cd ..

### Upload model artifacts and custom data preprocessor to Cloud Storage

To deploy your model, the model artifacts `model.joblib` and data preprocessor `preprocessor.pkl` need to be stored in Cloud Storage.

In [None]:
! gsutil cp model/* {BUCKET_URI}/model/
! gsutil ls {BUCKET_URI}/model/

## Build a custom model serving container using the CPR model server: Scenario 1: implementing the pre- and post-processor

Next, it's time to build a custom serving container for the trained model and the data preprocessor. For data preprocessing in TensorFlow models, one option is to fuse the preprocessor into the model by using the AutoGraph compiler (for example, the `@tf.function` decorator) to convert the Python code into a static graph. This approach has a couple of limitations:

- Not all Python operations can be converted to graph operations.
- Only static graph operations are supported.

While this simple data preprocessor could be converted to a static graph, many more complex pre- and post-processing steps cannot. In that case, you want the pre- and post-processing steps to run as pure Python code, where:

- The data preprocessing is inserted between the HTTP server and the model input.
- The data preprocessing is sandboxed, so that an exception thrown during preprocessing does not bring down the model server.

The Vertex AI Custom Prediction Routine provides a template for doing this that you can use out of the box.

Learn more about [Custom Prediction Routine model server](https://github.com/googleapis/python-aiplatform/blob/custom-prediction-routine/google/cloud/aiplatform/prediction/model_server.py).
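The sandboxing idea can be illustrated without the SDK. The sketch below wraps a user-supplied preprocessing function so that an exception becomes an error response instead of propagating to the server. It is a conceptual sketch only: `run_preprocessing`, `scale_by_ten`, the status codes, and the error dictionary format are assumptions, not how the CPR model server actually reports errors.

```python
def run_preprocessing(preprocess_fn, payload):
    """Runs preprocess_fn on payload["instances"]; returns (status_code, body)."""
    try:
        return 200, {"instances": preprocess_fn(payload["instances"])}
    except Exception as exc:
        # Catch the failure so it does not propagate to the server loop;
        # report it to the caller instead.
        return 400, {"error": f"{type(exc).__name__}: {exc}"}


def scale_by_ten(rows):
    """Toy preprocessing step: divide every feature by 10."""
    return [[value / 10 for value in row] for row in rows]


print(run_preprocessing(scale_by_ten, {"instances": [[10, 20], [30, 40]]}))
print(run_preprocessing(scale_by_ten, {"instances": [["a", 1]]}))
```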

A custom model serving container contains the following three code components:

1. [Model server](https://github.com/googleapis/python-aiplatform/blob/custom-prediction-routine/google/cloud/aiplatform/prediction/model_server.py)
    * HTTP server that hosts the model
    * Responsible for setting up routes/ports/etc.
    * In this example we will use the `google.cloud.aiplatform.prediction.model_server.ModelServer` out of the box.
2. [Request Handler](https://github.com/googleapis/python-aiplatform/blob/custom-prediction-routine/google/cloud/aiplatform/prediction/handler.py)
    * Responsible for the web server aspects of handling a request, such as deserializing the request body, serializing the response, and setting response headers.
    * In this example, we will use the default Handler, `google.cloud.aiplatform.prediction.handler.PredictionHandler` provided in the SDK.
3. [Predictor](https://github.com/googleapis/python-aiplatform/blob/custom-prediction-routine/google/cloud/aiplatform/prediction/predictor.py)
    * Responsible for the ML logic for processing a prediction request.

Each of these three components can be customized based on the requirements of the custom container. 


You use the predefined [`SklearnPredictor`](https://github.com/googleapis/python-aiplatform/blob/custom-prediction-routine/google/cloud/aiplatform/prediction/sklearn/predictor.py) as your `CprPredictor`'s base class. You only need to implement the `load`, `preprocess`, and `postprocess` methods.

```
import numpy as np

from google.cloud.aiplatform.prediction.sklearn.predictor import SklearnPredictor


class CprPredictor(SklearnPredictor):

    def __init__(self):
        return

    def load(self, gcs_artifacts_uri: str):
        """(super) Loads the model artifact.
        Loads the preprocessor module.
        """

    def preprocess(self, prediction_input: dict):
        """Applies the preprocessor to the input data."""

    def postprocess(self, prediction_results: np.ndarray):
        """Converts class indices to class names."""

    def predict(self, instances: np.ndarray):
        """(super) Performs prediction."""
```

Note that the [`PredictionHandler`](https://github.com/googleapis/python-aiplatform/blob/custom-prediction-routine/google/cloud/aiplatform/prediction/handler.py) is used for prediction request handling, and it executes the following:
```
self._predictor.postprocess(self._predictor.predict(self._predictor.preprocess(prediction_input)))
```
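To see what that composition does end to end, here is a minimal pure-Python sketch of the same handler flow: deserialize the JSON body, run `preprocess`, `predict`, and `postprocess` in that order, and serialize the result. `ToyPredictor` and `handle` are hypothetical stand-ins, and the petal-length threshold rule replaces the trained random forest purely for illustration; the real `PredictionHandler` and `SklearnPredictor` are in the SDK branch linked above and also take care of details such as response headers.

```python
import json


class ToyPredictor:
    """Stand-in predictor: extracts instances, applies a toy rule, maps indices to names."""

    _class_names = ["setosa", "versicolor", "virginica"]

    def preprocess(self, prediction_input):
        # Extract the instances from the request dictionary.
        return prediction_input["instances"]

    def predict(self, instances):
        # Toy rule in place of a trained model: threshold on petal length (index 2).
        return [0 if row[2] < 2.5 else (1 if row[2] < 5.0 else 2) for row in instances]

    def postprocess(self, prediction_results):
        return {"predictions": [self._class_names[i] for i in prediction_results]}


def handle(body: bytes, predictor) -> bytes:
    """Mirrors postprocess(predict(preprocess(input))) around JSON (de)serialization."""
    prediction_input = json.loads(body)
    result = predictor.postprocess(predictor.predict(predictor.preprocess(prediction_input)))
    return json.dumps(result).encode("utf-8")


request_body = b'{"instances": [[6.7, 3.1, 4.7, 1.5], [4.6, 3.1, 1.5, 0.2]]}'
print(handle(request_body, ToyPredictor()).decode("utf-8"))
```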

First, implement a custom `Predictor` that loads the preprocessor. The preprocessor is then applied in the `preprocess` method.

In [None]:
%%writefile src/predictor.py

import joblib
import numpy as np
import pickle

from google.cloud import storage
from google.cloud.aiplatform.prediction.sklearn.predictor import SklearnPredictor

from sklearn.datasets import load_iris


class CprPredictor(SklearnPredictor):
    
    def __init__(self):
        return
    
    def load(self, gcs_artifacts_uri: str):
        """Loads the preprocessor artifacts."""
        super().load(gcs_artifacts_uri)
        gcs_client = storage.Client()
        with open("preprocessor.pkl", 'wb') as preprocessor_f:
            gcs_client.download_blob_to_file(
                f"{gcs_artifacts_uri}/preprocessor.pkl", preprocessor_f
            )

        with open("preprocessor.pkl", "rb") as f:
            preprocessor = pickle.load(f)

        self._class_names = load_iris().target_names
        self._preprocessor = preprocessor
    
    def preprocess(self, prediction_input):
        """Perform scaling preprocessing"""
        inputs = super().preprocess(prediction_input)
        return self._preprocessor.preprocess(inputs)
    
    def postprocess(self, prediction_results):
        """Convert class indices to class names."""
        return {"predictions": [self._class_names[class_num] for class_num in prediction_results]}

## Build and push container to Artifact Registry

### Build your container

Building a custom container normally also requires writing an entrypoint for the image that starts the model server. With the Custom Prediction Routine feature, you don't need to write the entrypoint yourself: the Vertex AI SDK populates the entrypoint with the custom predictor you provide.

#### Set up credentials (for local execution)

Setting up credentials is only required to run the custom serving container locally. The credentials are needed to execute the `Predictor`'s `load` method, which downloads the model artifacts from Cloud Storage.

There are two options for setting up your credentials, depending on permissions granted to your service account.

First, enable the IAM API if it's not already enabled.

In [None]:
! gcloud services enable iam.googleapis.com

#### Option 1: Service Account

Follow these steps:

1. In the Cloud Console, go to the [**Create service account key** page](https://console.cloud.google.com/apis/credentials/serviceaccountkey).

2. Click **Create service account**.

3. In the **Service account name** field, enter a name, and click **Create**.

4. In the **Grant this service account access to project** section, click the **Role** drop-down list. Type "Vertex AI" into the filter box, and select **Vertex AI Administrator**. Type "Storage Object Admin" into the filter box, and select **Storage Object Admin**.

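Next, generate the service account key and save it to `credentials.json` in the directory where you are running this notebook. If you choose this option, point `CREDENTIALS_FILE` (set at the end of this section) at this `credentials.json` file instead of the default application credentials path.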
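The cell that fills in `SERVICE_ACCOUNT` automatically takes the third line (`shell_output[2]`) of the `gcloud auth list` output, which depends on the exact output layout. A slightly more defensive sketch looks for the line marked with `*` (the active account) instead. The sample output below is an assumption about the format, not captured from a real run, and `active_account` is a hypothetical helper.

```python
def active_account(lines):
    """Returns the account on the line marked with '*', or None if there is none."""
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("*"):
            return stripped.lstrip("*").strip()
    return None


# Assumed shape of `gcloud auth list` output (illustrative only).
sample_output = [
    "Credentialed Accounts",
    "ACTIVE  ACCOUNT",
    "*       my-sa@my-project.iam.gserviceaccount.com",
    "        someone@example.com",
]
print(active_account(sample_output))
```

Because the result of `!gcloud auth list` in IPython is list-like, you could pass `shell_output` to `active_account` directly.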

In [None]:
SERVICE_ACCOUNT = "[your-service-account]"  # @param {type:"string"}

In [None]:
if (
    SERVICE_ACCOUNT == ""
    or SERVICE_ACCOUNT is None
    or SERVICE_ACCOUNT == "[your-service-account]"
):
    # Get the active account from gcloud
    shell_output = !gcloud auth list 2>/dev/null
    SERVICE_ACCOUNT = shell_output[2].replace("*", "").strip()
    print("Service Account:", SERVICE_ACCOUNT)

In [None]:
! gcloud iam service-accounts keys create credentials.json --iam-account=$SERVICE_ACCOUNT
! gcloud auth application-default login

#### Option 2: User Account

Follow these steps:

1. Open a terminal and `cd` to the directory where you are running the notebook.

2. Run the command `gcloud auth application-default login` and answer yes to continue. This opens an authentication tab in your browser. Follow the instructions.
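Whichever option you choose, the file you pass as the credentials path is a JSON file. A service account key generated in Option 1 has `"type": "service_account"`, while the application default credentials written by `gcloud auth application-default login` in Option 2 have `"type": "authorized_user"`. The sketch below reads that field so you can confirm which kind of file you are pointing to; `credentials_type` is a hypothetical helper, and the temporary file only contains a placeholder, not real credentials.

```python
import json
import os
import tempfile


def credentials_type(path: str) -> str:
    """Returns the "type" field of a Google credentials JSON file, or "unknown"."""
    if not os.path.exists(path):
        raise FileNotFoundError(f"No credentials file at {path}")
    with open(path) as f:
        return json.load(f).get("type", "unknown")


# Demonstration with a placeholder file instead of real credentials.
with tempfile.TemporaryDirectory() as tmp:
    fake_path = os.path.join(tmp, "credentials.json")
    with open(fake_path, "w") as f:
        json.dump({"type": "authorized_user"}, f)
    print(credentials_type(fake_path))
```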

In [None]:
CREDENTIALS_FILE = "/home/jupyter/.config/gcloud/application_default_credentials.json"

#### Build your custom model serving container

Building a custom image normally requires a Dockerfile that describes the contents of the image. With the Custom Prediction Routine feature, the Vertex AI SDK auto-generates the Dockerfile and builds the image for you.

By default, the SDK uses `python:3.7` as the base image.

In [None]:
import os

from google.cloud.aiplatform.prediction import LocalModel
from src.predictor import CprPredictor

REPOSITORY = "custom-preprocess-container-prediction"  # @param {type:"string"}
SERVER_IMAGE = "sklearn-cpr-preprocess-server"  # @param {type:"string"}

local_model = LocalModel.create_cpr_model(
    "src",
    f"{REGION}-docker.pkg.dev/{PROJECT_ID}/{REPOSITORY}/{SERVER_IMAGE}",
    predictor=CprPredictor,
    requirements_path="src/requirements.txt",
)

#### Get the specification for the serving container

Next, display the specification of the custom serving container you just built.

In [None]:
local_model.get_serving_container_spec()

### Create example data

Next, create some example data and store the examples in JSON format for prediction.

Learn more about [formatting input instances in JSON](https://cloud.google.com/vertex-ai/docs/predictions/online-predictions-custom-models#request-body-details).
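If you prefer to build the request file from Python rather than writing it by hand, a minimal sketch with the standard `json` module follows. It assumes each instance is a list of four numeric features in the same order as the training data (sepal length, sepal width, petal length, petal width); `make_request_body` is a hypothetical helper.

```python
import json

NUM_FEATURES = 4  # sepal length, sepal width, petal length, petal width


def make_request_body(instances):
    """Validates instance shapes and returns a JSON request body string."""
    for i, row in enumerate(instances):
        if len(row) != NUM_FEATURES:
            raise ValueError(f"Instance {i} has {len(row)} features, expected {NUM_FEATURES}")
    return json.dumps({"instances": [[float(v) for v in row] for row in instances]})


body = make_request_body([[6.7, 3.1, 4.7, 1.5], [4.6, 3.1, 1.5, 0.2]])
print(body)
```

Writing `body` to `instances.json` produces a request equivalent to the hand-written file.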

In [None]:
INPUT_FILE = "instances.json"

In [None]:
%%writefile $INPUT_FILE
{
    "instances": [
        [6.7, 3.1, 4.7, 1.5],
        [4.6, 3.1, 1.5, 0.2]
    ]
}

### Test the custom model serving container locally

Next, you test your CPR-based custom model serving container locally. In this example, the container executes a prediction request and a health check.

*Note:* You need to have the credentials set up in the previous step and pass the path to the credentials while running the container. The service account should have the **Storage Object Admin** permission.

In [None]:
with local_model.deploy_to_local_endpoint(
    artifact_uri=f"{BUCKET_URI}/model",
    credential_path=CREDENTIALS_FILE,  # Update this to the path to your credentials.
) as local_endpoint:
    predict_response = local_endpoint.predict(
        request_file=INPUT_FILE,
        headers={"Content-Type": "application/json"},
    )

    health_check_response = local_endpoint.run_health_check()

Print out the predict response and its content.

In [None]:
predict_response, predict_response.content

Print out the health check response and its content.

In [None]:
health_check_response, health_check_response.content

Also print out all the container logs.

In [None]:
local_endpoint.print_container_logs(show_all=True)

### Push the container to Artifact Registry

#### Configure Docker to access Artifact Registry

In [None]:
! gcloud services enable artifactregistry.googleapis.com

In [None]:
! gcloud beta artifacts repositories create {REPOSITORY} \
    --repository-format=docker \
    --location=$REGION

In [None]:
! gcloud auth configure-docker {REGION}-docker.pkg.dev --quiet

#### Push your container image to your Artifact Registry repository

In [None]:
local_model.push_image()

## Deploy custom model serving container to Vertex AI

### Upload the custom serving container to a `Vertex AI Model` resource

Use the LocalModel instance to upload the custom serving container to a `Vertex AI Model` resource. It will populate the container specification automatically for you.

In [None]:
model = local_model.upload(
    display_name="iris_" + TIMESTAMP,
    artifact_uri=f"{BUCKET_URI}/model",
)

### Deploy the model to `Vertex AI Endpoint` resource

Next, deploy the Vertex AI Model resource to a Vertex AI Endpoint resource, for prediction.

In [None]:
endpoint = model.deploy(machine_type="n1-standard-4")

## Make predictions with the deployed model

### Using Vertex AI SDK

First, you make a prediction request using the Vertex AI SDK.

In [None]:
endpoint.predict(instances=[[6.7, 3.1, 4.7, 1.5], [4.6, 3.1, 1.5, 0.2]])

### Using REST

Next, you repeat the same, but use the REST interface to make a prediction request.
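The `curl` call targets the endpoint's `:predict` URL. If you want to build the same request from Python without extra dependencies, the following sketch uses `urllib.request` but does not send anything. `build_predict_request` is a hypothetical helper, and you still need a real access token (for example, from `gcloud auth print-access-token`) before passing the result to `urllib.request.urlopen`.

```python
import json
import urllib.request


def build_predict_request(region, project_id, endpoint_id, instances, access_token):
    """Builds (but does not send) a Vertex AI online prediction request."""
    url = (
        f"https://{region}-aiplatform.googleapis.com/v1/projects/{project_id}"
        f"/locations/{region}/endpoints/{endpoint_id}:predict"
    )
    data = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_predict_request(
    "us-central1", "my-project", "1234567890", [[6.7, 3.1, 4.7, 1.5]], "PLACEHOLDER_TOKEN"
)
print(req.get_method(), req.full_url)
```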

In [None]:
ENDPOINT_ID = endpoint.name

In [None]:
! curl \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-d @instances.json \
https://{REGION}-aiplatform.googleapis.com/v1/projects/{PROJECT_ID}/locations/{REGION}/endpoints/{ENDPOINT_ID}:predict

### Using gcloud CLI

Finally, you repeat the same, but use the gcloud command line interface to make a prediction request.

In [None]:
! gcloud ai endpoints predict $ENDPOINT_ID \
  --region=$REGION \
  --json-request=instances.json

### Cleanup: Scenario 1

In [None]:
try:
    # Undeploy model and delete endpoint
    endpoint.delete(force=True)

    # Delete the model resource
    model.delete()
except Exception as e:
    print(e)

# Delete the container image from Artifact Registry
! gcloud artifacts docker images delete \
    --quiet \
    --delete-tags \
    {REGION}-docker.pkg.dev/{PROJECT_ID}/{REPOSITORY}/{SERVER_IMAGE}

! rm -rf model

## Build a custom model serving container using the CPR model server: Scenario 2: implementing the predictor

Next, you implement a custom `predict()` method for the CPR model server instead of relying on the prebuilt `SklearnPredictor`. The `predict()` method sends the instance data to the model and returns the predictions. In this example, you inherit from the base class `Predictor` and implement the `load()` and `predict()` methods. Because `preprocess()` is not overridden, `predict()` itself applies the scaling preprocessor to the input data before sending it to the model.

```
import numpy as np

from google.cloud.aiplatform.prediction.predictor import Predictor


class CprPredictor(Predictor):
    """Custom Predictor implementation for the scikit-learn Iris model."""

    def __init__(self):
        return

    def load(self, gcs_artifacts_uri: str):
        """Loads the model artifact.
        Loads the preprocessor module.
        """

    def preprocess(self, prediction_input: dict):
        """(super) Inherited from the base class; not overridden here."""

    def predict(self, instances: dict):
        """Applies the preprocessor and performs prediction."""
```
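In this scenario, `predict()` indexes `instances["instances"]` directly, which suggests that the base `Predictor`'s `preprocess()` and `postprocess()` hand data through unchanged when they are not overridden. The sketch below models that arrangement with a hypothetical `BasePredictor` abstract class and a toy `ThresholdPredictor`; both are assumption-based stand-ins, not the SDK classes.

```python
from abc import ABC, abstractmethod


class BasePredictor(ABC):
    """Hypothetical stand-in for a predictor base class with pass-through hooks."""

    @abstractmethod
    def load(self, artifacts_uri: str):
        ...

    def preprocess(self, prediction_input):
        return prediction_input  # pass-through by default

    @abstractmethod
    def predict(self, instances):
        ...

    def postprocess(self, prediction_results):
        return prediction_results  # pass-through by default


class ThresholdPredictor(BasePredictor):
    """Overrides only load() and predict(), like CprPredictor in this scenario."""

    def load(self, artifacts_uri: str):
        self._threshold = 2.5  # stands in for loading model artifacts

    def predict(self, instances):
        rows = instances["instances"]  # the raw request dict arrives here
        return {"predictions": ["small" if r[2] < self._threshold else "large" for r in rows]}


predictor = ThresholdPredictor()
predictor.load("gs://unused")
request = {"instances": [[6.7, 3.1, 4.7, 1.5], [4.6, 3.1, 1.5, 0.2]]}
print(predictor.postprocess(predictor.predict(predictor.preprocess(request))))
```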

In [None]:
%%writefile src/predictor.py

import joblib
import numpy as np
import pickle

from google.cloud import storage
from google.cloud.aiplatform.prediction.predictor import Predictor

from sklearn.datasets import load_iris


class CprPredictor(Predictor):
    
    def __init__(self):
        return
    
    def load(self, gcs_artifacts_uri: str):
        """Loads the preprocessor and model artifacts."""
        gcs_client = storage.Client()
        with open("preprocessor.pkl", 'wb') as preprocessor_f, open("model.joblib", 'wb') as model_f:
            gcs_client.download_blob_to_file(
                f"{gcs_artifacts_uri}/preprocessor.pkl", preprocessor_f
            )
            gcs_client.download_blob_to_file(
                f"{gcs_artifacts_uri}/model.joblib", model_f
            )

        with open("preprocessor.pkl", "rb") as f:
            preprocessor = pickle.load(f)

        self._class_names = load_iris().target_names
        self._model = joblib.load("model.joblib")
        self._preprocessor = preprocessor

    def predict(self, instances):
        """Performs prediction."""
        instances = instances["instances"]
        inputs = np.asarray(instances)
        preprocessed_inputs = self._preprocessor.preprocess(inputs)
        outputs = self._model.predict(preprocessed_inputs)

        return {"predictions": [self._class_names[class_num] for class_num in outputs]}

## Build the custom model serving container

Next, you build the custom model serving container.

In [None]:
import os

from google.cloud.aiplatform.prediction import LocalModel
from src.predictor import CprPredictor

local_model = LocalModel.create_cpr_model(
    "src",
    f"{REGION}-docker.pkg.dev/{PROJECT_ID}/{REPOSITORY}/{SERVER_IMAGE}",
    predictor=CprPredictor,
    requirements_path=os.path.join("src", "requirements.txt"),
)

#### Get the specification for the serving container

Next, display the specification of the custom serving container you just built.

In [None]:
local_model.get_serving_container_spec()

### Test the custom model serving container locally

Next, you test your CPR-based custom model serving container locally. In this example, the container executes a prediction request and a health check.

*Note:* You need to have the credentials set up in the previous step and pass the path to the credentials while running the container. The service account should have the **Storage Object Admin** permission.

In [None]:
with local_model.deploy_to_local_endpoint(
    artifact_uri=f"{BUCKET_URI}/model",
    credential_path=CREDENTIALS_FILE,
) as local_endpoint:
    predict_response = local_endpoint.predict(
        request_file=INPUT_FILE,
        headers={"Content-Type": "application/json"},
    )

    health_check_response = local_endpoint.run_health_check()

Print out the predict response and its content.

In [None]:
predict_response, predict_response.content

Print out the health check response and its content.

In [None]:
health_check_response, health_check_response.content

Also print out all the container logs.

In [None]:
local_endpoint.print_container_logs(show_all=True)

### Push the container to Artifact Registry

#### Configure Docker to access Artifact Registry

In [None]:
! gcloud auth configure-docker {REGION}-docker.pkg.dev --quiet

#### Push your container image to your Artifact Registry repository

In [None]:
local_model.push_image()

## Deploy custom model serving container to Vertex AI

### Upload the custom serving container to a `Vertex AI Model` resource

Use the LocalModel instance to upload the custom serving container to a `Vertex AI Model` resource. It will populate the container specification automatically for you.

In [None]:
model = local_model.upload(
    display_name="iris_" + TIMESTAMP,
    artifact_uri=f"{BUCKET_URI}/model",
)

### Deploy the model to `Vertex AI Endpoint` resource

Next, deploy the Vertex AI Model resource to a Vertex AI Endpoint resource, for prediction.

In [None]:
endpoint = model.deploy(machine_type="n1-standard-4")

## Make predictions with the deployed model

### Using Vertex AI SDK

Make a prediction request using the Vertex AI SDK.

In [None]:
endpoint.predict(instances=[[6.7, 3.1, 4.7, 1.5], [4.6, 3.1, 1.5, 0.2]])

### Cleanup: Scenario 2

In [None]:
try:
    # Undeploy model and delete endpoint
    endpoint.delete(force=True)

    # Delete the model resource
    model.delete()
except Exception as e:
    print(e)

# Delete the container image from Artifact Registry
! gcloud artifacts docker images delete \
    --quiet \
    --delete-tags \
    {REGION}-docker.pkg.dev/{PROJECT_ID}/{REPOSITORY}/{SERVER_IMAGE}

! rm -rf model src/entrypoint.py

## Build a custom model serving container using the CPR model server: Scenario 3: implementing the predictor and request handler

Next, you implement a custom `handle()` method for the CPR model server instead of using the prebuilt HTTP request handler. The `handle()` method extracts the prediction instances from the HTTP request body. It then passes those instances through the predictor's `preprocess()`, `predict()`, and `postprocess()` methods to produce the response. In this example, the handler accepts instances in CSV format.

A [`Handler`](https://github.com/googleapis/python-aiplatform/blob/custom-prediction-routine/google/cloud/aiplatform/prediction/handler.py) must implement the following interface.

```
from typing import Optional, Type

from fastapi import Request, Response

from google.cloud.aiplatform.prediction.handler import PredictionHandler
from google.cloud.aiplatform.prediction.predictor import Predictor


class CprHandler(PredictionHandler):
    """Interface for Handler class to handle prediction requests."""

    def __init__(
        self, gcs_artifacts_uri: str, predictor: Optional[Type[Predictor]] = None,
    ):
        """Initializes a Handler instance.
        Args:
            gcs_artifacts_uri (str):
                Required. The value of the environment variable AIP_STORAGE_URI.
            predictor (Type[Predictor]):
                Optional. The Predictor class this handler uses to initiate predictor
                instance if given.
        """

    async def handle(self, request: Request) -> Response:
        """Handles a prediction request.
        Args:
            request (Request):
                The request sent to the application.
        Returns:
            The response of the prediction request.
        """
```

In [None]:
%%writefile src/handler.py

import csv
from io import StringIO
import json

from fastapi import Response

from google.cloud.aiplatform.prediction.handler import PredictionHandler

class CprHandler(PredictionHandler):
    """Custom prediction handler that accepts prediction instances in CSV format."""

    async def handle(self, request):
        """Handles a prediction request."""
        # Read the raw request body and convert the CSV rows to lists of floats.
        request_body = await request.body()
        prediction_instances = self._convert_csv_to_list(request_body)
        prediction_instances = {"instances": prediction_instances}

        # Run the predictor's preprocess -> predict -> postprocess chain.
        prediction_results = self._predictor.postprocess(
            self._predictor.predict(self._predictor.preprocess(prediction_instances))
        )

        return Response(content=json.dumps(prediction_results), media_type="application/json")
    
    def _convert_csv_to_list(self, data):
        """Converts list of string in csv format to list of float.
        
        Example input:
          b"1.1,2.2,3.3,4.4\n2.3,3.4,4.5,5.6\n"
          
        Example output:
            [
                [1.1, 2.2, 3.3, 4.4],
                [2.3, 3.4, 4.5, 5.6],
            ]
        """
        res = []
        for r in csv.reader(StringIO(data.decode("utf-8")), quoting=csv.QUOTE_NONNUMERIC):
            res.append(r)
        return res

## Build the custom model serving container

Next, you build the custom model serving container.

In [None]:
import os

from google.cloud.aiplatform.prediction import LocalModel
from src.handler import CprHandler
from src.predictor import CprPredictor

local_model = LocalModel.create_cpr_model(
    "src",
    f"{REGION}-docker.pkg.dev/{PROJECT_ID}/{REPOSITORY}/{SERVER_IMAGE}",
    predictor=CprPredictor,
    handler=CprHandler,
    requirements_path=os.path.join("src", "requirements.txt"),
)

#### Get the specification for the serving container

Next, display the specification of the custom serving container you just built.

In [None]:
local_model.get_serving_container_spec()

### Create example data

Next, create some synthetic example data, and store the examples in a CSV format for prediction.

To send the instances as CSV, you need to use raw predict, which accepts an arbitrary HTTP payload instead of the JSON format that the standard predict method requires.

Learn more about [Raw Predict](https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.endpoints/rawPredict)
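
The difference between the two request styles is the shape of the HTTP body. The sketch below builds both from the same two CSV rows: the raw predict body is the CSV bytes unchanged, while a standard predict request needs a JSON object with an `instances` list. The variable names are illustrative only.

```python
import csv
import json
from io import StringIO

csv_text = "6.7,3.1,4.7,1.5\n4.6,3.1,1.5,0.2\n"

# Raw predict: the HTTP body is the CSV text itself, sent as bytes.
raw_body = csv_text.encode("utf-8")

# Standard predict: the body must be JSON of the form {"instances": [...]}.
rows = [[float(value) for value in row] for row in csv.reader(StringIO(csv_text))]
json_body = json.dumps({"instances": rows}).encode("utf-8")

print(raw_body)
print(json_body)
```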

In [None]:
INPUT_FILE = "instances.csv"

In [None]:
%%writefile $INPUT_FILE
6.7,3.1,4.7,1.5
4.6,3.1,1.5,0.2
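
Two rows are enough for a smoke test. If you want a larger CSV file, here is a minimal sketch that uses the standard `csv` and `random` modules. The per-feature ranges are assumptions chosen to resemble the iris measurements (sepal length, sepal width, petal length, petal width, in cm), not values taken from the training data, and `make_instances` is a hypothetical helper.

```python
import csv
import io
import random

# Assumed (min, max) ranges per feature, roughly matching the iris dataset.
FEATURE_RANGES = [(4.3, 7.9), (2.0, 4.4), (1.0, 6.9), (0.1, 2.5)]


def make_instances(n_rows: int, seed: int = 0) -> str:
    """Returns n_rows of random feature values as CSV text, one instance per line."""
    rng = random.Random(seed)
    buffer = io.StringIO()
    # Use "\n" so the output matches the hand-written instances file.
    writer = csv.writer(buffer, lineterminator="\n")
    for _ in range(n_rows):
        writer.writerow([round(rng.uniform(low, high), 1) for low, high in FEATURE_RANGES])
    return buffer.getvalue()


csv_text = make_instances(5)
print(csv_text)
```

To use it in the notebook, write `csv_text` to `INPUT_FILE` instead of running the `%%writefile` cell.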

### Test the custom model serving container locally

Next, test your custom model serving container with CPR locally. In this example, you send the container a prediction request and a health check.

*Note:* You need the credentials set up in a previous step, and you pass the path to the credentials file when you run the container. The service account needs the **Storage Object Admin** role.

In [None]:
with local_model.deploy_to_local_endpoint(
    artifact_uri=f"{BUCKET_URI}/model",
    credential_path=CREDENTIALS_FILE,
) as local_endpoint:
    predict_response = local_endpoint.predict(
        request_file=INPUT_FILE,
        headers={"Content-Type": "text/csv"},
    )

    health_check_response = local_endpoint.run_health_check()

Print out the predict response and its content.

In [None]:
predict_response, predict_response.content

Print out the health check response and its content.

In [None]:
health_check_response, health_check_response.content

Also print out all the container logs.

In [None]:
local_endpoint.print_container_logs(show_all=True)

### Push the container to Artifact Registry

#### Configure Docker to access Artifact Registry

In [None]:
! gcloud auth configure-docker {REGION}-docker.pkg.dev --quiet

#### Push your container image to your Artifact Registry repository

In [None]:
local_model.push_image()

## Deploy custom model serving container to Vertex AI

### Upload the custom serving container to a `Vertex AI Model` resource

Use the `LocalModel` instance to upload the custom serving container to a `Vertex AI Model` resource. The `upload()` method populates the container specification for you.

In [None]:
model = local_model.upload(
    display_name="iris_" + TIMESTAMP,
    artifact_uri=f"{BUCKET_URI}/model",
)

### Deploy the model to a `Vertex AI Endpoint` resource

Next, deploy the `Vertex AI Model` resource to a `Vertex AI Endpoint` resource for prediction.

In [None]:
endpoint = model.deploy(machine_type="n1-standard-4")

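## Make predictions with the deployed model

### Using the Vertex AI SDK

Make a prediction request with the `raw_predict()` method of the low-level `PredictionServiceClient`, which sends the CSV file as the HTTP body with a `text/csv` content type.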

In [None]:
from google.api import httpbody_pb2
from google.cloud import aiplatform_v1 as gapic

prediction_client = gapic.PredictionServiceClient(
    client_options={"api_endpoint": f"{REGION}-aiplatform.googleapis.com"}
)

with open(INPUT_FILE) as f:
    http_body = httpbody_pb2.HttpBody(
        data=f.read().encode("utf-8"),
        content_type="text/csv",
    )

request = gapic.RawPredictRequest(
    endpoint=endpoint.resource_name,
    http_body=http_body,
)

prediction_client.raw_predict(request=request)
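
`raw_predict()` returns a `google.api.httpbody_pb2.HttpBody` message whose `data` field holds the response bytes that the handler wrote. Because `CprHandler` serializes its result with `json.dumps`, you can decode it with `json.loads`. The sketch below assumes that JSON format; `parse_raw_predict_body` is a hypothetical helper, and the sample payload only stands in for a real response, whose exact shape depends on what `CprPredictor.postprocess()` returns.

```python
import json
from typing import Any


def parse_raw_predict_body(data: bytes) -> Any:
    """Decodes the JSON bytes that CprHandler writes into the HTTP response body."""
    return json.loads(data.decode("utf-8"))


# Hypothetical payload standing in for `response.data`; the real structure
# depends on CprPredictor.postprocess().
sample_data = b'{"predictions": [1, 0]}'
print(parse_raw_predict_body(sample_data))

# In the notebook (not run here):
#   response = prediction_client.raw_predict(request=request)
#   parse_raw_predict_body(response.data)
```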

### Cleanup: Scenario 3

In [None]:
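# Undeploy the model and delete the endpoint
try:
    endpoint.delete(force=True)
except Exception as e:
    print(e)

# Delete the model resource, even if the endpoint deletion failed
try:
    model.delete()
except Exception as e:
    print(e)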

# Delete the container image from Artifact Registry
! gcloud artifacts docker images delete \
    --quiet \
    --delete-tags \
    {REGION}-docker.pkg.dev/{PROJECT_ID}/{REPOSITORY}/{SERVER_IMAGE}

! rm -rf model src/entrypoint.py

## Build a custom model serving container using the CPR model server: Scenario 4: implementing the Docker build process

Next, you implement the Docker build process yourself instead of using the one that `LocalModel.create_cpr_model()` provides.

First, write the container's entrypoint file, which launches the custom model server.

In [None]:
%%writefile src/entrypoint.py

import os
from typing import Optional, Type

from google.cloud.aiplatform import prediction

from predictor import CprPredictor
from handler import CprHandler


def main(
    predictor_class: Optional[Type[prediction.predictor.Predictor]] = None,
    handler_class: Type[prediction.handler.Handler] = prediction.handler.PredictionHandler,
    model_server_class: Type[prediction.model_server.ModelServer] = prediction.model_server.ModelServer,
):
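    # AIP_STORAGE_URI points at the model artifacts in Cloud Storage. Vertex AI
    # sets it at deployment; `docker run -e` sets it for local testing.
    handler = handler_class(
        os.environ.get("AIP_STORAGE_URI"), predictor=predictor_class
    )

    # Start the HTTP model server, which routes requests to the handler.
    return model_server_class(handler).start()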

if __name__ == "__main__":
    main(
        predictor_class=CprPredictor,
        handler_class=CprHandler
    )

### Build the custom model serving container

#### Write the Dockerfile

First, write the Dockerfile. *Note:* The `ENTRYPOINT` instruction runs the `entrypoint.py` module you defined above.

In [None]:
%%writefile Dockerfile

# Select the base image.
FROM python:3.7

# Create the directories and make them accessible to any user.
RUN mkdir -m 777 -p /home /usr/app
ENV HOME=/home
WORKDIR /usr/app

# Copy the source code and the requirements file into the image.
COPY src /usr/app/src
COPY src/requirements.txt /usr/app/requirements.txt

# Install the Python dependencies.
RUN pip3 install --no-cache-dir -r /usr/app/requirements.txt

# Inform Docker that the container listens on port 8080 at runtime.
EXPOSE 8080

# Start the model server when the container runs.
ENTRYPOINT ["python3", "/usr/app/src/entrypoint.py"]

#### Build the container image

Next, build the container image.

In [None]:
! docker build --tag={REGION}-docker.pkg.dev/{PROJECT_ID}/{REPOSITORY}/{SERVER_IMAGE} .

### Test the custom model serving container locally

Next, test your custom model serving container with CPR locally. In this example, you start the container with `docker run`, and then send it a health check and a prediction request with `curl`.

*Note:* You need the credentials set up in a previous step, and you pass the path to the credentials file when you run the container. The service account needs the **Storage Object Admin** role.

In [None]:
! docker run -d -p 80:8080 \
    --name=local-iris-custom \
    -e AIP_HTTP_PORT=8080 \
    -e AIP_HEALTH_ROUTE=/health \
    -e AIP_PREDICT_ROUTE=/predict \
    -e AIP_STORAGE_URI={BUCKET_URI}/model \
    -e GOOGLE_APPLICATION_CREDENTIALS=/usr/app/credentials.json \
    -e GOOGLE_CLOUD_PROJECT={PROJECT_ID} \
    -v {CREDENTIALS_FILE}:/usr/app/credentials.json \
    {REGION}-docker.pkg.dev/{PROJECT_ID}/{REPOSITORY}/{SERVER_IMAGE}
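
The `-e` flags emulate the environment variables that Vertex AI sets for a custom serving container: `AIP_HTTP_PORT`, `AIP_HEALTH_ROUTE`, `AIP_PREDICT_ROUTE`, and `AIP_STORAGE_URI`. A server reads its listening port and routes from them at startup. The sketch below shows that lookup in isolation; `ServingConfig` and `read_serving_config` are hypothetical, and the fallback values are assumptions that mirror the flags in the `docker run` command, not documented defaults of the CPR model server.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ServingConfig:
    http_port: int
    health_route: str
    predict_route: str
    storage_uri: Optional[str]


def read_serving_config(env: Mapping[str, str] = os.environ) -> ServingConfig:
    """Reads serving settings from AIP_* variables, using assumed fallbacks."""
    return ServingConfig(
        http_port=int(env.get("AIP_HTTP_PORT", "8080")),
        health_route=env.get("AIP_HEALTH_ROUTE", "/health"),
        predict_route=env.get("AIP_PREDICT_ROUTE", "/predict"),
        storage_uri=env.get("AIP_STORAGE_URI"),  # None when unset
    )


# "gs://my-bucket/model" is a hypothetical bucket path.
print(read_serving_config({"AIP_HTTP_PORT": "8080", "AIP_STORAGE_URI": "gs://my-bucket/model"}))
```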

Send a health check request to the running container.

In [None]:
! curl localhost/health
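
`docker run -d` returns as soon as the container starts, so the model server may still be starting when the health check runs, and an early `curl` can fail with a connection error. A minimal polling sketch using only the standard library follows; `wait_until_healthy` is a hypothetical helper, and its timeout and interval values are arbitrary assumptions.

```python
import time
import urllib.request


def wait_until_healthy(url: str, timeout: float = 60.0, interval: float = 2.0) -> bool:
    """Polls `url` until it answers HTTP 200 or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as response:
                if response.status == 200:
                    return True
        except OSError:
            # Connection refused, timeouts, and HTTP error statuses all land here:
            # urllib.error.URLError and HTTPError are subclasses of OSError.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

In the notebook, you could call `wait_until_healthy("http://localhost/health")` right after `docker run` and only send requests once it returns `True`.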

Send the CSV instances in a prediction request. `--data-binary` sends the file unchanged, keeping the newlines between rows.

In [None]:
! curl -X POST \
  --data-binary @instances.csv \
  -H "Content-Type: text/csv" \
  localhost/predict
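
`curl -d @file` removes carriage returns and newlines from the file before sending it, while `--data-binary @file` sends the bytes unchanged. With CSV input that difference matters, as this sketch shows by feeding both versions through the same `csv.reader` call that `CprHandler._convert_csv_to_list()` uses. The helper name is illustrative, and the stripped bytes simulate what `-d` would send.

```python
import csv
from io import StringIO


def convert_csv_to_list(data: bytes) -> list:
    """Same parsing as CprHandler._convert_csv_to_list."""
    return list(csv.reader(StringIO(data.decode("utf-8")), quoting=csv.QUOTE_NONNUMERIC))


file_bytes = b"6.7,3.1,4.7,1.5\n4.6,3.1,1.5,0.2\n"

# What `curl --data-binary @instances.csv` sends: the file unchanged.
print(convert_csv_to_list(file_bytes))

# What `curl -d @instances.csv` sends: the same bytes with newlines removed.
stripped = file_bytes.replace(b"\n", b"")
parse_error = None
try:
    convert_csv_to_list(stripped)
except ValueError as err:
    # The last field of row 1 and the first field of row 2 fuse into "1.54.6",
    # which cannot be converted to a float.
    parse_error = err
    print("parse failed:", err)
```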

Stop and remove the local container.

In [None]:
! docker stop local-iris-custom
! docker rm local-iris-custom

### Push the container to Artifact Registry

#### Configure Docker to access Artifact Registry

In [None]:
! gcloud auth configure-docker {REGION}-docker.pkg.dev --quiet

#### Push your container image to your Artifact Registry repository

In [None]:
! docker push {REGION}-docker.pkg.dev/{PROJECT_ID}/{REPOSITORY}/{SERVER_IMAGE}

## Deploy custom model serving container to Vertex AI

### Upload the custom serving container to a `Vertex AI Model` resource

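Use the `LocalModel` instance from Scenario 3 to upload the custom serving container to a `Vertex AI Model` resource. Because that instance references the same image URI, `upload()` uses the image you just built and pushed, and populates the container specification for you.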

In [None]:
model = local_model.upload(
    display_name="iris_" + TIMESTAMP,
    artifact_uri=f"{BUCKET_URI}/model",
)

### Deploy the model to a `Vertex AI Endpoint` resource

Next, deploy the `Vertex AI Model` resource to a `Vertex AI Endpoint` resource for prediction.

In [None]:
endpoint = model.deploy(machine_type="n1-standard-4")

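## Make predictions with the deployed model

### Using the Vertex AI SDK

Make a prediction request with the `raw_predict()` method of the low-level `PredictionServiceClient`, which sends the CSV file as the HTTP body with a `text/csv` content type.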

In [None]:
from google.api import httpbody_pb2
from google.cloud import aiplatform_v1 as gapic

prediction_client = gapic.PredictionServiceClient(
    client_options={"api_endpoint": f"{REGION}-aiplatform.googleapis.com"}
)

with open(INPUT_FILE) as f:
    http_body = httpbody_pb2.HttpBody(
        data=f.read().encode("utf-8"),
        content_type="text/csv",
    )

request = gapic.RawPredictRequest(
    endpoint=endpoint.resource_name,
    http_body=http_body,
)

prediction_client.raw_predict(request=request)

### Cleanup: Scenario 4

In [None]:
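# Undeploy the model and delete the endpoint
try:
    endpoint.delete(force=True)
except Exception as e:
    print(e)

# Delete the model resource, even if the endpoint deletion failed
try:
    model.delete()
except Exception as e:
    print(e)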

# Delete the container image from Artifact Registry
! gcloud artifacts docker images delete \
    --quiet \
    --delete-tags \
    {REGION}-docker.pkg.dev/{PROJECT_ID}/{REPOSITORY}/{SERVER_IMAGE}

! rm -rf model src/entrypoint.py

## Cleaning up

To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud
project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.

Otherwise, you can delete the individual resources you created in this tutorial:

In [None]:
! gsutil rm -rf {BUCKET_URI}

! rm -rf src model instances.json instances.csv Dockerfile