In [None]:
# Copyright 2020 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Deploying R models on Vertex AI

## Overview

This tutorial walks through building a custom container to serve an R model on Vertex Predictions. You will use the [`plumber` R package](https://www.rplumber.io/articles/introduction.html) to create prediction and health-check endpoints from trained model artifacts.

## Dataset

This tutorial uses R.A. Fisher's Iris dataset, a small dataset that is popular for trying out machine learning techniques. Each instance has four numerical features, which are different measurements of a flower, and a target label that marks it as one of three types of iris: Iris setosa, Iris versicolor, or Iris virginica.

This tutorial uses the copy of the [Iris dataset included in the R package](https://www.rdocumentation.org/packages/datasets/versions/3.6.2/topics/iris).

## Objective
The goal is to:

- Train a model locally in the notebook, using a flower's measurements as input to predict what type of iris it is
- Save the model
- Build a web service using `plumber` to handle predictions and health checks
- Build a custom container with model artifacts
- Upload and deploy the custom container to Vertex Predictions

This tutorial focuses more on deploying this model with Vertex AI than on the design of the model itself.

## Costs
This tutorial uses billable components of Google Cloud:

- Vertex AI

Learn about [Vertex AI pricing](https://cloud.google.com/vertex-ai/pricing), and use the [Pricing Calculator](https://cloud.google.com/products/calculator/) to generate a cost estimate based on your projected usage.

## Creating a Notebooks instance on Google Cloud

This notebook assumes you are working in a development environment that supports both Python and R. You can create a Notebooks instance with R support using the Google Cloud Console or the [gcloud](https://cloud.google.com/sdk/gcloud/reference/notebooks/instances/create) command:

```
gcloud notebooks instances create r-notebook-instance \
    --vm-image-project=deeplearning-platform-release \
    --vm-image-family=r-4-0-cpu-experimental-notebooks \
    --machine-type=n1-standard-4 \
    --location=us-central1-a \
    --boot-disk-size=100 \
    --network=default
```

This notebook runs R and Python in the same notebook file using the [`rpy2`](https://pypi.org/project/rpy2/) package, which runs embedded R code through the line and cell magic commands `%R` and `%%R`. Refer to the documentation on using [R and Python in the same notebook](https://cloud.google.com/notebooks/docs/r-python-same-notebook) file.

In [None]:
%load_ext rpy2.ipython

In [None]:
!python --version

### Install additional packages

Install additional R package dependencies that are not already installed in your notebook environment, such as `randomForest` and `plumber`. Use the latest major GA version of each package from CRAN.

In [None]:
%%R

install.packages(c("randomForest", "plumber"), repos = "http://cran.us.r-project.org")

In [None]:
import os

# The Google Cloud Notebook product has specific requirements
IS_GOOGLE_CLOUD_NOTEBOOK = os.path.exists("/opt/deeplearning/metadata/env_version")

# Google Cloud Notebook requires dependencies to be installed with '--user'
USER_FLAG = ""
if IS_GOOGLE_CLOUD_NOTEBOOK:
    USER_FLAG = "--user"

We will be using [Vertex SDK for Python](https://cloud.google.com/vertex-ai/docs/start/client-libraries#python) to interact with Vertex AI services. The high-level aiplatform library is designed to simplify common data science workflows by using wrapper classes and opinionated defaults.

**Install Vertex SDK for Python**

In [None]:
!pip -q install {USER_FLAG} --upgrade google-cloud-aiplatform

### Restart the kernel
After you install the additional packages, you need to restart the notebook kernel so it can find the packages.

In [None]:
# Automatically restart kernel after installs
import os

if not os.getenv("IS_TESTING"):
    # Automatically restart kernel after installs
    import IPython

    app = IPython.Application.instance()
    app.kernel.do_shutdown(True)

Load R magic commands

In [None]:
%load_ext rpy2.ipython

## Before you begin

### Set up your Google Cloud project
The following steps are required, regardless of your notebook environment.

1. [Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.
2. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).
3. Enable the following APIs:
    - [Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com)
    - [Cloud Storage API](https://console.cloud.google.com/flows/enableapi?apiid=storage.googleapis.com)
    - [Container Registry API](https://console.cloud.google.com/flows/enableapi?apiid=containerregistry.googleapis.com)
4. If you are running this notebook locally, you will need to install the [Cloud SDK](https://cloud.google.com/sdk).
5. Enter your project ID in the cell below. Then run the cell to make sure the Cloud SDK uses the right project for all the commands in this notebook.

Note: Jupyter runs lines prefixed with `!` as shell commands and lines prefixed with `%` as magic commands, and it interpolates Python variables prefixed with `$` or wrapped in `{}` into these commands.



#### Set your project ID

If you don't know your project ID, you may be able to get your project ID using `gcloud` or `google.auth`.

In [None]:
PROJECT_ID = "[your-project-id]"  # <---CHANGE THIS TO YOUR PROJECT

import os

# Get your Google Cloud project ID using google.auth
if not os.getenv("IS_TESTING"):
    import google.auth

    _, PROJECT_ID = google.auth.default()
    print("Project ID: ", PROJECT_ID)

# validate PROJECT_ID
if PROJECT_ID == "" or PROJECT_ID is None or PROJECT_ID == "[your-project-id]":
    print(
        f"Please set your project id before proceeding to next step. Currently it's set as {PROJECT_ID}"
    )

#### Timestamp

If you are in a live tutorial session, you might be using a shared test account or project. To avoid name collisions between users on resources created, you create a timestamp for each instance session, and append it onto the name of resources you create in this tutorial.

In [None]:
from datetime import datetime


def get_timestamp():
    return datetime.now().strftime("%Y%m%d%H%M%S")


TIMESTAMP = get_timestamp()
print(f"TIMESTAMP = {TIMESTAMP}")
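
As a minimal sketch of how the timestamp can be appended to a resource name, the helper below builds a suffixed name. The function name `timestamped_name` and the optional `now` argument are assumptions added for illustration; they are not part of this notebook.

```python
from datetime import datetime
from typing import Optional


def timestamped_name(base: str, now: Optional[datetime] = None) -> str:
    """Return `base` followed by a YYYYMMDDHHMMSS suffix."""
    # `now` is injectable so the result is reproducible in tests;
    # by default the current local time is used, as in get_timestamp().
    stamp = (now or datetime.now()).strftime("%Y%m%d%H%M%S")
    return f"{base}-{stamp}"


# Example with a fixed time so the output is predictable
print(timestamped_name("custom-rf-classifier", datetime(2021, 6, 1, 12, 30, 0)))
```

With a fixed `datetime(2021, 6, 1, 12, 30, 0)`, the call above prints `custom-rf-classifier-20210601123000`.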

### Authenticate your Google Cloud account

---

If you are using Google Cloud Notebooks, your environment is already authenticated. Skip this step.

---

**If you are using Colab**, run the cell below and follow the instructions when prompted to authenticate your account via OAuth.

**Otherwise**, follow these steps:

1. In the Cloud Console, go to the [Create service account key page](https://console.cloud.google.com/apis/credentials/serviceaccountkey).
2. Click **Create service account**.
3. In the **Service account name** field, enter a name, and click **Create**.
4. In the **Grant this service account access to project** section, click the **Role** drop-down list. Type "Vertex AI" into the filter box, and select **Vertex AI Administrator**. Type "Storage Object Admin" into the filter box, and select **Storage Object Admin**.
5. Click **Create**. A JSON file that contains your key downloads to your local environment.
6. Enter the path to your service account key as the `GOOGLE_APPLICATION_CREDENTIALS` variable in the cell below and run the cell.

In [None]:
import os
import sys

# If you are running this notebook in Colab, run this cell and follow the
# instructions to authenticate your GCP account. This provides access to your
# Cloud Storage bucket and lets you submit training jobs and prediction
# requests.

# The Google Cloud Notebook product has specific requirements
IS_GOOGLE_CLOUD_NOTEBOOK = os.path.exists("/opt/deeplearning/metadata/env_version")

# If on Google Cloud Notebooks, then don't execute this code
if not IS_GOOGLE_CLOUD_NOTEBOOK:
    if "google.colab" in sys.modules:
        from google.colab import auth as google_auth

        google_auth.authenticate_user()

    # If you are running this notebook locally, replace the string below with the
    # path to your service account key and run this cell to authenticate your GCP
    # account.
    elif not os.getenv("IS_TESTING"):
        %env GOOGLE_APPLICATION_CREDENTIALS ''

### Create a Cloud Storage bucket
**The following steps are required, regardless of your notebook environment.**

This tutorial trains the model locally and packages it in the serving container, so no training code or model artifact is uploaded to Cloud Storage. The bucket is passed to the Vertex SDK for Python as its staging bucket when the SDK is initialized later in this notebook.

Set the name of your Cloud Storage bucket below. It must be unique across all Cloud Storage buckets.

You may also change the REGION variable, which is used for operations throughout the rest of this notebook. Make sure to [choose a region where Vertex AI services are available](https://cloud.google.com/vertex-ai/docs/general/locations#available_regions). You may not use a Multi-Regional Storage bucket for training with Vertex AI.
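
Before setting the name, a quick format check can catch a placeholder that was left unchanged. The sketch below covers only a simplified subset of the Cloud Storage naming rules (a `gs://` prefix, 3 to 63 characters, lowercase letters, digits, dashes, underscores and dots, and no `goog` prefix or `google` substring). It cannot tell whether a name is globally unique, and the helper name `looks_like_valid_bucket_uri` is an assumption for illustration.

```python
import re

# Simplified pattern: starts and ends with a letter or digit, 3-63 chars total.
_BUCKET_NAME_RE = re.compile(r"^[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]$")


def looks_like_valid_bucket_uri(uri: str) -> bool:
    """Return True if `uri` passes a simplified bucket-name format check."""
    if not uri.startswith("gs://"):
        return False
    name = uri[len("gs://"):]
    # Names may not begin with "goog" or contain "google".
    if name.startswith("goog") or "google" in name:
        return False
    return bool(_BUCKET_NAME_RE.match(name))


print(looks_like_valid_bucket_uri("gs://[your-bucket-name]"))  # placeholder left as-is
print(looks_like_valid_bucket_uri("gs://my-project-aip-20210601123000"))
```

The first call prints `False` because the placeholder still contains brackets; the second prints `True`.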

In [None]:
BUCKET_NAME = "gs://[your-bucket-name]"  # <---CHANGE THIS TO YOUR BUCKET
REGION = "us-central1"  # @param {type:"string"}

In [3]:
if BUCKET_NAME == "" or BUCKET_NAME is None or BUCKET_NAME == "gs://[your-bucket-name]":
    BUCKET_NAME = f"gs://{PROJECT_ID}aip-{get_timestamp()}"

In [None]:
print(f"PROJECT_ID = {PROJECT_ID}")
print(f"BUCKET_NAME = {BUCKET_NAME}")
print(f"REGION = {REGION}")

---

Only if your bucket doesn't already exist: Run the following cell to create your Cloud Storage bucket.

---

In [None]:
! gsutil mb -l $REGION $BUCKET_NAME

Finally, validate access to your Cloud Storage bucket by examining its contents:

In [None]:
! gsutil ls -al $BUCKET_NAME

### Import libraries and define constants


In [None]:
import os
import sys

import google.auth
from google.cloud import aiplatform

In [None]:
%%R

library(ggplot2)
library(randomForest)
library(plumber)

In [None]:
APP_NAME = "custom-rf-classifier"

## Train R model locally

In [None]:
!mkdir -p ./predictor

In [None]:
%%R 

head(iris)

In [None]:
%%R 

scatter <- ggplot(data=iris, aes(x = Sepal.Length, y = Sepal.Width)) 
scatter + geom_point(aes(color=Species, shape=Species)) +
  xlab("Sepal Length") +  ylab("Sepal Width") +
  ggtitle("Sepal Length-Width")

In [None]:
%%R 

scatter <- ggplot(data=iris, aes(x = Petal.Length, y = Petal.Width)) 
scatter + geom_point(aes(color=Species, shape=Species)) +
  xlab("Petal Length") +  ylab("Petal Width") +
  ggtitle("Petal Length-Width")

In [None]:
%%R

# train model
model = randomForest(Species ~ ., data = iris)
# save model
save(model, file = "./predictor/model.RData")

## Deploying

Deploying an R model on [Vertex Predictions](https://cloud.google.com/vertex-ai/docs/predictions/getting-predictions) requires a custom container that serves online predictions. You will deploy a container that runs the [`plumber`](https://www.rplumber.io/) R package to serve predictions from the trained model artifacts. You can then use Vertex Predictions to classify iris flowers from their measurements.

### Deploying model on Vertex Predictions with custom container
To use a custom container to serve predictions from an R model, you must provide Vertex AI with a Docker container image that runs an HTTP server, such as `plumber` in this case. Refer to the [container image requirements](https://cloud.google.com/vertex-ai/docs/predictions/custom-container-requirements) documentation for what an image needs in order to be compatible with Vertex Predictions.

![Serving with Custom Containers on Vertex Predictions](./images/serving-with-custom-containers-on-vertex-predictions.png)
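
To make the HTTP contract concrete, here is a minimal sketch in Python (not the R server this tutorial deploys). It shows a health route that answers `GET` with status 200, and a predict route that accepts a `POST` body of the form `{"instances": [...]}` and replies with `{"predictions": [...]}`. The routes and port mirror the values used later in this tutorial. `placeholder_predict` and its threshold are hypothetical stand-ins for the trained model.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

HEALTH_ROUTE = "/ping"
PREDICT_ROUTE = "/classify"


def placeholder_predict(instance: dict) -> str:
    """Hypothetical stand-in for the model: a single threshold rule."""
    return "setosa" if float(instance["petal_length"]) < 2.5 else "other"


def predict_response(body: bytes) -> dict:
    """Turn a JSON request body into the response the predict route returns."""
    payload = json.loads(body)
    return {"predictions": [placeholder_predict(i) for i in payload["instances"]]}


class Handler(BaseHTTPRequestHandler):
    def _send_json(self, status: int, obj: dict) -> None:
        data = json.dumps(obj).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def do_GET(self) -> None:
        # Health check: any 200 response marks the server as ready.
        if self.path == HEALTH_ROUTE:
            self._send_json(200, {"status": "OK"})
        else:
            self._send_json(404, {"error": "not found"})

    def do_POST(self) -> None:
        if self.path != PREDICT_ROUTE:
            self._send_json(404, {"error": "not found"})
            return
        length = int(self.headers.get("Content-Length", 0))
        self._send_json(200, predict_response(self.rfile.read(length)))


def serve(port: int = 7080) -> None:
    """Listen on all interfaces; call this explicitly to start the server."""
    HTTPServer(("0.0.0.0", port), Handler).serve_forever()
```

Keeping `predict_response` separate from the handler makes the request and response shapes easy to check without starting a server.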

#### Serving script

We will create an R script file that returns predicted values for input requests and reports the status of the server. `plumber` uses comment “annotations” above functions to define the web service. When you feed the file into `plumber`, you get a runnable web service that other systems can interact with over a network.

In [None]:
%%bash

cat << EOF > ./predictor/serving.R

# serving.R

library("randomForest")

#* Health check
#* @get /ping
#* @serializer unboxedJSON
function() {
    list(status = "OK")
}

#* @apiTitle flower classifier
#* @param petal_length 
#* @param petal_width 
#* @param sepal_length
#* @param sepal_width
#* @post /classify
function (req) 
{
    instances <- as.data.frame(jsonlite::fromJSON(req\$postBody))
    results <- list()
    
    load("./model.RData")
    
    for (i in seq_len(nrow(instances))) {  # loop over rows, one per instance
        petal_length <- instances[i, "instances.petal_length"]
        petal_width <- instances[i, "instances.petal_width"]
        sepal_length <- instances[i, "instances.sepal_length"]
        sepal_width <- instances[i, "instances.sepal_width"]
        test = c(sepal_length, sepal_width, petal_length, petal_width)
        test = sapply(test, as.numeric)
        test = data.frame(matrix(test, ncol = 4))
        colnames(test) = colnames(iris[, 1:4])
        results <- append(results, predict(model, test))
    }
    
    list(predictions = results)
}

EOF

We will create a file that starts the server on port 7080.

In [None]:
%%bash

cat << EOF > ./predictor/startServer.R

library(plumber)
pr <- plumb("serving.R")
pr\$run(host = "0.0.0.0", port = 7080)

EOF

#### Create serving container image `Dockerfile`

Now package the model artifacts and the serving script into a Docker image. The `Dockerfile` uses the base image supplied by RStudio (`rstudio/plumber`), installs `randomForest`, and then adds the model, the serving script, and the startup script. Finally, it runs the code that starts the server, which listens on port 7080.

In [None]:
%%bash

cat << EOF > ./predictor/Dockerfile

FROM rstudio/plumber

# install random forest
RUN R -e 'install.packages(c("randomForest"), repos = "https://cran.rstudio.com/")'

# Copy model and script
RUN mkdir /app
COPY model.RData /app
COPY serving.R /app
COPY startServer.R /app
WORKDIR /app

# plumber & run server
EXPOSE 7080

ENTRYPOINT ["R", "-f", "/app/startServer.R"]

EOF

In [None]:
CUSTOM_PREDICTOR_IMAGE_URI = f"gcr.io/{PROJECT_ID}/r-predict-{APP_NAME}"
print(f"CUSTOM_PREDICTOR_IMAGE_URI = {CUSTOM_PREDICTOR_IMAGE_URI}")

#### Build and upload the image

In [None]:
!docker build \
  --tag=$CUSTOM_PREDICTOR_IMAGE_URI \
  ./predictor

#### Run the container locally *[Optional]*

Before pushing the container image to Container Registry for use with Vertex Predictions, you can run it as a container in your local environment to verify that the server works as expected.

1. To run the container image as a container locally, run the following command:

In [None]:
!echo $CUSTOM_PREDICTOR_IMAGE_URI
!docker stop local_rf_classifier
!docker run -t -d --rm -p 7080:7080 --name=local_rf_classifier $CUSTOM_PREDICTOR_IMAGE_URI
!sleep 10

In [None]:
!docker container ls

2. To send the container's server a health check, run the following command:

In [None]:
!curl http://localhost:7080/ping

3. To send the container's server a prediction request, run the following commands:

In [None]:
%%bash

cat > ./predictor/instances.json <<END
{
  "instances": [{
      "sepal_width": 1,
      "sepal_length": 2,
      "petal_width": 3,
      "petal_length": 1
    },
    {
      "sepal_width": 4,
      "sepal_length": 2,
      "petal_width": 1,
      "petal_length": 1
    }
  ]
}
END

curl -s -X POST \
  -H "Content-Type: application/json; charset=utf-8" \
  -d @./predictor/instances.json \
  http://localhost:7080/classify
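
The same request can be sent from Python instead of `curl`. The sketch below uses only the standard library and assumes the local container from the previous step is listening on port 7080. The helper names `build_classify_request` and `classify` are assumptions for illustration.

```python
import json
import urllib.request

LOCAL_CLASSIFY_URL = "http://localhost:7080/classify"


def build_classify_request(instances, url=LOCAL_CLASSIFY_URL):
    """Build (but do not send) a POST request wrapping `instances` as JSON."""
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json; charset=utf-8"},
    )


def classify(instances, url=LOCAL_CLASSIFY_URL):
    """Send the request and return the decoded JSON response."""
    request = build_classify_request(instances, url)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage, once the container is running:
# classify([{"sepal_width": 1, "sepal_length": 2, "petal_width": 3, "petal_length": 1}])
```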

4. To stop the container, run the following command:

In [None]:
!docker stop local_rf_classifier

#### Deploying the serving container to Vertex Predictions

We create a model resource on Vertex AI and deploy the model to a Vertex AI endpoint. You must deploy a model to an endpoint before you can use it to serve predictions. The deployed model runs the custom container image to serve predictions.

##### **Push the serving container to Container Registry**

Push your container image, with its inference code and dependencies, to Container Registry.

In [None]:
!docker push $CUSTOM_PREDICTOR_IMAGE_URI

##### **Initialize the Vertex SDK for Python**

In [None]:
aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_NAME)

##### **Create a Model resource with custom serving container**

In [None]:
VERSION = 1
model_display_name = f"{APP_NAME}-v{VERSION}"
model_description = "R based flower classifier with custom container"

MODEL_NAME = APP_NAME
health_route = "/ping"
predict_route = "/classify"
serving_container_ports = [7080]

In [None]:
model = aiplatform.Model.upload(
    display_name=model_display_name,
    description=model_description,
    serving_container_image_uri=CUSTOM_PREDICTOR_IMAGE_URI,
    serving_container_predict_route=predict_route,
    serving_container_health_route=health_route,
    serving_container_ports=serving_container_ports,
)

model.wait()

print(model.display_name)
print(model.resource_name)

For more context on uploading or importing a model, refer to the [documentation](https://cloud.google.com/vertex-ai/docs/general/import-model).

##### **Create an Endpoint for Model with Custom Container**

In [None]:
endpoint_display_name = f"{APP_NAME}-endpoint"
endpoint = aiplatform.Endpoint.create(display_name=endpoint_display_name)

##### **Deploy the Model to Endpoint**

Deploying a model associates physical resources with the model so it can serve online predictions with low latency.

**NOTE:** Deploying the resources takes a few minutes.

In [None]:
traffic_percentage = 100
machine_type = "n1-standard-4"
deployed_model_display_name = model_display_name
min_replica_count = 1
max_replica_count = 3
sync = True

model.deploy(
    endpoint=endpoint,
    deployed_model_display_name=deployed_model_display_name,
    machine_type=machine_type,
    min_replica_count=min_replica_count,
    max_replica_count=max_replica_count,
    traffic_percentage=traffic_percentage,
    sync=sync,
)

#### Invoking the Endpoint with deployed Model using Vertex SDK to make predictions

##### **Get the Endpoint id**

In [None]:
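endpoint_display_name = f"{APP_NAME}-endpoint"
endpoint_filter = f'display_name="{endpoint_display_name}"'

# List every endpoint whose display name matches
endpoints = aiplatform.Endpoint.list(filter=endpoint_filter)
for endpoint_info in endpoints:
    print(
        f"Endpoint display name = {endpoint_info.display_name} resource id = {endpoint_info.resource_name}"
    )

# Display names are not unique; as before, use the last match in the list
if not endpoints:
    raise RuntimeError(f"No endpoint named {endpoint_display_name} was found")
endpoint = aiplatform.Endpoint(endpoints[-1].resource_name)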

In [None]:
endpoint.list_models()

##### **Formatting input for online prediction**
For online prediction requests, pass the prediction input instances as a list:

```
[
    <instance>,
    ...
]
```

The Vertex SDK sends this list as the `instances` field of the JSON request body. The field is required, and must contain the list of instances to get predictions for. In the following example, each input instance is a dict of the four flower measurements:

In [None]:
instances = [
    {
        "sepal_width": 1,
        "sepal_length": 2,
        "petal_width": 3,
        "petal_length": 1
    },
    {
        "sepal_width": 4,
        "sepal_length": 2,
        "petal_width": 1,
        "petal_length": 1
    }
]
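
Because the serving script looks up each measurement by name, a missing key only surfaces as an error on the server. As a minimal sketch, the instances can be checked on the client before sending. The helper `validate_instances` and the constant `REQUIRED_FEATURES` are assumptions for illustration; the feature names come from the serving script above.

```python
REQUIRED_FEATURES = ("sepal_length", "sepal_width", "petal_length", "petal_width")


def validate_instances(instances):
    """Raise ValueError if an instance lacks a feature or has a non-numeric value."""
    for index, instance in enumerate(instances):
        missing = [name for name in REQUIRED_FEATURES if name not in instance]
        if missing:
            raise ValueError(f"instance {index} is missing: {', '.join(missing)}")
        for name in REQUIRED_FEATURES:
            value = instance[name]
            # bool is a subclass of int, so reject it explicitly
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                raise ValueError(f"instance {index}: {name!r} must be numeric")
    return instances


# Usage: validate_instances(instances) returns the list unchanged when it is well formed.
```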

##### **Sending an online prediction request**

In [None]:
endpoint.predict(instances)

### Cleaning up

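To avoid ongoing charges, undeploy the model from the endpoint, then delete the endpoint and the model resource. Also delete the Cloud Storage bucket and the container image if you no longer need them.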
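
The cleanup of the Vertex AI resources might be sketched as follows. This assumes `endpoint` and `model` are the `aiplatform.Endpoint` and `aiplatform.Model` objects created earlier in this notebook. The wrapper function `clean_up` is a hypothetical helper, and it does not remove the Cloud Storage bucket or the container image.

```python
def clean_up(endpoint, model):
    """Undeploy all models from the endpoint, then delete the endpoint and the model."""
    # An endpoint cannot be deleted while models are still deployed to it.
    endpoint.undeploy_all()
    endpoint.delete()
    # Delete the model resource last, once nothing references it.
    model.delete()


# Usage, with the objects from the cells above:
# clean_up(endpoint, model)
```

The order matters: the deployed model is removed first, so that the endpoint and then the model resource can be deleted.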

#### Cleaning up Notebook Environment
After you are done experimenting, you can either STOP or DELETE the AI Notebook instance to prevent any charges. If you want to save your work, you can choose to stop the instance instead.

```
# Stopping Notebook instance
gcloud notebooks instances stop r-notebook-instance --location=us-central1-a


# Deleting Notebook instance
gcloud notebooks instances delete r-notebook-instance --location=us-central1-a
```