# LAB 5b:  Deploy and predict with Keras model on Vertex AI

**Learning Objectives**

1. Set up the environment
1. Deploy trained Keras model to an endpoint for online prediction on Vertex AI
1. Online predict from model on Vertex AI
1. Batch predict from model on Vertex AI

## Introduction 
In this notebook, we'll be deploying our Keras model to Vertex AI and creating predictions.

We will set up the environment, deploy a trained Keras model to a Vertex AI endpoint, request online predictions from the deployed model, and run a batch prediction job on Vertex AI.

Each learning objective will correspond to a __#TODO__ in this student lab notebook -- try to complete this notebook first and then review the [solution notebook](../solutions/5b_deploy_keras_ai_platform_babyweight.ipynb).

## Set up environment variables and load necessary libraries

Import necessary libraries.

In [None]:
try:
    from google.cloud import aiplatform

except ImportError:
    !pip3 install -U google-cloud-aiplatform --user

    print("Please restart the kernel and re-run the notebook.")

In [None]:
import os
from google.protobuf import json_format
from google.protobuf.struct_pb2 import Value

### Set environment variables.

Set environment variables so that we can use them throughout the entire lab. We will be using our project name for our bucket, so you only need to change your project and region.

In [None]:
%%bash
PROJECT=$(gcloud config list project --format "value(core.project)")
echo "Your current GCP Project Name is: "$PROJECT

In [None]:
# Change these to try this notebook out
PROJECT = "asl-ml-immersion"  # Replace with your PROJECT
BUCKET = PROJECT  # defaults to PROJECT
REGION = "us-central1"  # Replace with your REGION

In [None]:
os.environ["PROJECT"] = PROJECT
os.environ["BUCKET"] = BUCKET
os.environ["REGION"] = REGION

In [None]:
%%bash
gcloud config set project $PROJECT
gcloud config set ai/region $REGION

## Check our trained model files

Let's check the directory structure of the trained model outputs in the folder we exported the model to in our last [lab](../solutions/10_train_keras_ai_platform_babyweight.ipynb). We'll want to deploy the `saved_model.pb` within the tuned model's directory, along with the variable values in the `variables` folder. Therefore, we need the path of the latest tuned directory so that Vertex AI's model deployment service can find everything within it. Note that the `2*` substrings are there to match timestamp strings.

In [None]:
%%bash
gsutil ls gs://${BUCKET}/babyweight/tuned_2*

In [None]:
%%bash
MODEL_LOCATION=$(gsutil ls -d -- gs://${BUCKET}/babyweight/tuned_2*/2* \
                 | tail -1)
gsutil ls ${MODEL_LOCATION}
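
The `tail -1` step above picks the last path in the listing, which is the newest export as long as the timestamped directory names sort lexicographically in chronological order. A minimal Python sketch of the same selection follows. The helper name `latest_export` and the example paths are hypothetical and are not taken from the lab's bucket.

```python
def latest_export(paths):
    """Return the path that sorts last, mirroring `gsutil ls ... | tail -1`.

    Assumes directory names embed timestamps that sort lexicographically in
    chronological order (for example, YYYYMMDD_HHMMSS).
    """
    if not paths:
        raise ValueError("no exported model directories found")
    return sorted(paths)[-1]


# Hypothetical listing, for illustration only.
example_paths = [
    "gs://my-bucket/babyweight/tuned_20210101_000000/20210101010101/",
    "gs://my-bucket/babyweight/tuned_20210102_000000/20210102010101/",
]
print(latest_export(example_paths))
```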

## Upload model, create endpoint and deploy trained model

Uploading our SavedModel from the above `MODEL_LOCATION`, creating an endpoint, and deploying the trained model to act as a REST web service take three simple gcloud calls. We also capture the fully qualified resource name of the endpoint, `ENDPOINT_RESOURCENAME`, from the output of the endpoint creation command.

In [None]:
%%bash
TIMESTAMP=$(date -u +%Y%m%d_%H%M%S)
MODEL_DISPLAYNAME=babyweight_model_$TIMESTAMP
ENDPOINT_DISPLAYNAME=babyweight_endpoint_$TIMESTAMP
IMAGE_URI="us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-3:latest"
MODEL_LOCATION=$(gsutil ls -d -- gs://${BUCKET}/babyweight/tuned_2*/2* \
                 | tail -1)
echo "MODEL_LOCATION=${MODEL_LOCATION}"

# Model
MODEL_RESOURCENAME=$(gcloud ai models upload \
    --region=$REGION \
    --display-name=$MODEL_DISPLAYNAME \
    --container-image-uri=$IMAGE_URI \
    --artifact-uri=$MODEL_LOCATION \
    --format="value(model)")

MODEL_ID=$(echo $MODEL_RESOURCENAME | cut -d"/" -f6)

echo "MODEL_DISPLAYNAME=${MODEL_DISPLAYNAME}"
echo "MODEL_RESOURCENAME=${MODEL_RESOURCENAME}"
echo "MODEL_ID=${MODEL_ID}"

# Endpoint
ENDPOINT_RESOURCENAME=$(gcloud ai endpoints create \
  --region=$REGION \
  --display-name=$ENDPOINT_DISPLAYNAME \
  --format="value(name)")

ENDPOINT_ID=$(echo $ENDPOINT_RESOURCENAME | cut -d"/" -f6)

echo "ENDPOINT_DISPLAYNAME=${ENDPOINT_DISPLAYNAME}"
echo "ENDPOINT_RESOURCENAME=${ENDPOINT_RESOURCENAME}"
echo "ENDPOINT_ID=${ENDPOINT_ID}"

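# Deployment
# --traffic-split=0=100 sends 100% of the endpoint's traffic to the model
# being deployed; the key "0" refers to that model.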
DEPLOYEDMODEL_DISPLAYNAME=${MODEL_DISPLAYNAME}_deployment
MACHINE_TYPE=n1-standard-2
MIN_REPLICA_COUNT=1
MAX_REPLICA_COUNT=3

gcloud ai endpoints deploy-model $ENDPOINT_RESOURCENAME \
  --region=$REGION \
  --model=$MODEL_RESOURCENAME \
  --display-name=$DEPLOYEDMODEL_DISPLAYNAME \
  --machine-type=$MACHINE_TYPE \
  --min-replica-count=$MIN_REPLICA_COUNT \
  --max-replica-count=$MAX_REPLICA_COUNT \
  --traffic-split=0=100
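
The `cut -d"/" -f6` calls above pull the trailing ID out of a resource name of the form `projects/<project>/locations/<region>/<collection>/<id>`. The same extraction can be sketched in Python. Here `resource_id` is a hypothetical helper, and the resource name is a made-up placeholder rather than a real endpoint.

```python
def resource_id(resource_name):
    """Return the sixth "/"-separated field, like `cut -d"/" -f6`."""
    parts = resource_name.split("/")
    if len(parts) < 6:
        raise ValueError(f"unexpected resource name: {resource_name!r}")
    return parts[5]


# Placeholder resource name, for illustration only.
example = "projects/123456/locations/us-central1/endpoints/987654"
print(resource_id(example))
```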

## Use model to make online prediction.

### Python API

We can use the Python API to send a JSON request to the endpoint of the service to make it predict a baby's weight. The responses are returned in the same order as the instances.

In [None]:
ENDPOINT_RESOURCENAME = (
    ""  # TODO: Copy your `ENDPOINT_RESOURCENAME` from above.
)
os.environ["ENDPOINT_RESOURCENAME"] = ENDPOINT_RESOURCENAME

api_endpoint = f"{REGION}-aiplatform.googleapis.com"

# The AI Platform services require regional API endpoints.
client_options = {"api_endpoint": api_endpoint}
# Initialize client that will be used to create and send requests.
# This client only needs to be created once, and can be reused for multiple requests.
client = aiplatform.gapic.PredictionServiceClient(
    client_options=client_options
)

instances = [
    {
        "is_male": "True",
        "mother_age": 26.0,
        "plurality": "Single(1)",
        "gestation_weeks": 39,
    },
    {
        "is_male": "False",
        "mother_age": 29.0,
        "plurality": "Single(1)",
        "gestation_weeks": 38,
    },
    {
        "is_male": "True",
        "mother_age": 26.0,
        "plurality": "Triplets(3)",
        "gestation_weeks": 39,
    },
    {
        "is_male": "Unknown",
        "mother_age": 29.0,
        "plurality": "Multiple(2+)",
        "gestation_weeks": 38,
    },
]

instances = [
    json_format.ParseDict(instance, Value()) for instance in instances
]
response = client.predict(endpoint=ENDPOINT_RESOURCENAME, instances=instances)

# The predictions are a google.protobuf.Value representation of the model's predictions.
print(" prediction:", response.predictions)

The predictions for the four instances were: 5.33, 6.09, 2.50, and 5.86 pounds respectively when I ran it (your results might be different).
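
Because the responses come back in the same order as the instances, each prediction can be matched to its input by position. The sketch below uses the four values reported above as stand-ins; in the notebook they would come from `response.predictions`, whose exact nesting depends on the model's output signature. The helper `pair_predictions` is hypothetical.

```python
# Stand-in values: the four predictions (in pounds) reported in the text above.
reported_predictions = [5.33, 6.09, 2.50, 5.86]

# The same four instances that were sent to the endpoint.
instances = [
    {"is_male": "True", "mother_age": 26.0,
     "plurality": "Single(1)", "gestation_weeks": 39},
    {"is_male": "False", "mother_age": 29.0,
     "plurality": "Single(1)", "gestation_weeks": 38},
    {"is_male": "True", "mother_age": 26.0,
     "plurality": "Triplets(3)", "gestation_weeks": 39},
    {"is_male": "Unknown", "mother_age": 29.0,
     "plurality": "Multiple(2+)", "gestation_weeks": 38},
]


def pair_predictions(instances, predictions):
    """Pair each instance with the prediction at the same position."""
    if len(instances) != len(predictions):
        raise ValueError("instances and predictions differ in length")
    return list(zip(instances, predictions))


pairs = pair_predictions(instances, reported_predictions)
for instance, pounds in pairs:
    print(f"{instance['plurality']:<13} is_male={instance['is_male']:<8} {pounds:.2f} lb")
```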

### gcloud shell API

Alternatively, we could use the gcloud command-line tool. Create a JSON request file whose `instances` field lists the instances, and submit it using gcloud.

In [None]:
%%writefile inputs.json
{
    "instances": [
        {
            "is_male": "True",
            "mother_age": 26.0,
            "plurality": "Single(1)",
            "gestation_weeks": 39
        },
        {
            "is_male": "False",
            "mother_age": 26.0,
            "plurality": "Single(1)",
            "gestation_weeks": 39
        }
    ]
}

Now call `gcloud ai endpoints predict` using the JSON we just created and point to our deployed `ENDPOINT_RESOURCENAME`.

In [None]:
%%bash
gcloud ai endpoints predict $ENDPOINT_RESOURCENAME \
    --region=$REGION \
    --json-request=inputs.json

## Use model to make batch prediction.

Batch prediction is commonly used when you have thousands to millions of predictions to make. It creates a Vertex AI batch prediction job. We will upload our prediction request JSONL file (one JSON record per line) to GCS and use the Python API to request the job.

In [None]:
%%writefile inputs.jsonl
{"is_male": "True", "mother_age": 26.0, "plurality": "Single(1)", "gestation_weeks": 39}
{"is_male": "False", "mother_age": 26.0, "plurality": "Single(1)", "gestation_weeks": 39}
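
A JSONL file holds one complete JSON object per line, with no trailing commas. One way to avoid formatting mistakes is to generate the file contents from Python dictionaries with `json.dumps`. The following is a minimal sketch; the helper name `to_jsonl` is hypothetical, and the records are the two instances used in this lab.

```python
import json


def to_jsonl(records):
    """Serialize records as JSON Lines: one compact JSON object per line."""
    return "".join(json.dumps(record) + "\n" for record in records)


records = [
    {"is_male": "True", "mother_age": 26.0,
     "plurality": "Single(1)", "gestation_weeks": 39},
    {"is_male": "False", "mother_age": 26.0,
     "plurality": "Single(1)", "gestation_weeks": 39},
]

jsonl_text = to_jsonl(records)
print(jsonl_text, end="")

# Every line must parse on its own for the text to be valid JSONL.
parsed = [json.loads(line) for line in jsonl_text.splitlines()]
```

Writing `jsonl_text` to `inputs.jsonl` would give a file equivalent to the one created with `%%writefile`.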

In [None]:
!gsutil cp inputs.jsonl gs://$BUCKET/babyweight/batchpred/inputs.jsonl

In [None]:
MODEL_RESOURCENAME = (
    ""  # TODO: replace with your MODEL_RESOURCENAME from above
)

aiplatform.init(project=PROJECT, location=REGION)

my_model = aiplatform.Model(MODEL_RESOURCENAME)

batch_prediction_job = my_model.batch_predict(
    job_display_name="babyweight_batch",
    gcs_source=f"gs://{BUCKET}/babyweight/batchpred/inputs.jsonl",
    gcs_destination_prefix=f"gs://{BUCKET}/babyweight/batchpred/outputs",
    machine_type="n1-standard-2",
    accelerator_count=0,
    starting_replica_count=1,
    max_replica_count=1,
)

batch_prediction_job.wait()

print(batch_prediction_job.display_name)
print(batch_prediction_job.resource_name)
print(batch_prediction_job.state)

In [None]:
!gsutil cat $(gsutil ls gs://$BUCKET/babyweight/batchpred/outputs | tail -n1)prediction.errors_stats-*

In [None]:
!gsutil cat $(gsutil ls gs://$BUCKET/babyweight/batchpred/outputs | tail -n1)prediction.results-*
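
Once the results file has been printed, its lines can be parsed back into Python objects. The sketch below assumes each result line is a JSON object with `instance` and `prediction` keys; check this against your own output before relying on it. The helper `parse_results` is hypothetical, and the sample line uses a placeholder prediction value, not real model output.

```python
import json


def parse_results(lines):
    """Yield (instance, prediction) pairs from batch prediction result lines.

    Assumes each non-empty line is a JSON object with "instance" and
    "prediction" keys.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        yield record["instance"], record["prediction"]


# Hypothetical result line; the prediction value is a placeholder.
sample_lines = [
    '{"instance": {"is_male": "True", "mother_age": 26.0, '
    '"plurality": "Single(1)", "gestation_weeks": 39}, "prediction": [0.0]}',
]

for instance, prediction in parse_results(sample_lines):
    print(instance["plurality"], prediction)
```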

## Lab Summary:
In this lab, we set up the environment, deployed a trained Keras model to Vertex AI, requested online predictions from the deployed model, and ran a batch prediction job on Vertex AI.

Copyright 2021 Google LLC
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
    https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.