# Using TF Serving with AI Platform Prediction Custom Containers (Beta)

This notebook demonstrates how to deploy a TensorFlow 2.x model using AI Platform Prediction Custom Containers (Alpha) and TensorFlow Serving.


Although this notebook uses the custom serving module developed in the `01-prepare-for-serving.ipynb` notebook, the techniques discussed here can be applied to any TF 2.x model.

For more information about the AI Platform Prediction Custom Containers feature refer to [TBD].

In [33]:
import base64
import os
import json
import time
import numpy as np
import tensorflow as tf

import google.auth

from google.auth.credentials import Credentials
from google.auth.transport.requests import AuthorizedSession

from typing import List, Optional, Text, Tuple

## Setting up the environment

This notebook was tested on **AI Platform Notebooks** using the standard TF 2.2 image.

### Set the model store path

Set `SAVED_MODEL_PATH` to the GCS location of the `SavedModel` created in the `01-prepare-for-serving.ipynb` notebook.

In [34]:
SAVED_MODEL_PATH = 'gs://mlops-dev-workspace/models/resnet_serving'

### Configuring a regional Artifact Registry


A container image to be deployed to AI Platform Prediction must be stored in a regional [Google Cloud Artifact Registry](https://cloud.google.com/artifact-registry/docs). Using an external registry like Docker Hub or a Container Registry is not supported. 

If you already have an Artifact Registry repository you can use, skip to the **Push TF Serving container images to the Artifact Registry** section.

#### Create a regional Artifact Registry

In [35]:
repository_name = 'aipp-images'
region = 'us-central1'

!gcloud beta artifacts repositories create {repository_name} \
--repository-format=docker \
--location={region}

ERROR: (gcloud.beta.artifacts.repositories.create) ALREADY_EXISTS: the repository already exists


In [36]:
!gcloud beta artifacts repositories list

Listing items under project mlops-dev-env, across all locations.

Note: To perform actions on the Container Registry repositories listed below please use 'gcloud container images'.

                                   ARTIFACT_REGISTRY
REPOSITORY   FORMAT  DESCRIPTION  LOCATION     CREATE_TIME          UPDATE_TIME
aipp-images  DOCKER               us-central1  2020-08-17T20:00:20  2020-08-17T20:34:36

         CONTAINER_REGISTRY
HOSTNAME           LOCATION
gcr.io             us


#### Set up authentication for Docker

In [37]:
hostname = f'{region}-docker.pkg.dev'

!gcloud beta auth configure-docker {hostname} --quiet


{
  "credHelpers": {
    "us-central1-docker.pkg.dev": "gcloud", 
    "asia.gcr.io": "gcloud", 
    "staging-k8s.gcr.io": "gcloud", 
    "us.gcr.io": "gcloud", 
    "gcr.io": "gcloud", 
    "marketplace.gcr.io": "gcloud", 
    "eu.gcr.io": "gcloud"
  }
}
Adding credentials for: us-central1-docker.pkg.dev
gcloud credential helpers already registered correctly.


#### Push TF Serving container images to the Artifact Registry

If you skipped the previous steps and are reusing an existing Artifact Registry repository, set the `region` and `repository_name` variables in the cell below to the values that match your environment.

In [38]:
#region = 'your-region'
#repository_name = 'your-repository-name'

In [39]:
_ , project_id = google.auth.default()

#cpu_image_name = f'{region}-docker.pkg.dev/{project_id}/{repository_name}/tensorflow_serving:latest-cpu'
#gpu_image_name = f'{region}-docker.pkg.dev/{project_id}/{repository_name}/tensorflow_serving:latest-gpu'
cpu_image_name = f'gcr.io/{project_id}/tensorflow_serving:latest-cpu'
gpu_image_name = f'gcr.io/{project_id}/tensorflow_serving:latest-gpu'
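The commented-out names in the cell above follow the Artifact Registry pattern `REGION-docker.pkg.dev/PROJECT/REPOSITORY/IMAGE:TAG`. A minimal sketch of a helper that assembles such a name is shown below. The function name `artifact_registry_image` is hypothetical and not part of any Google library; the sample values are the ones that appear in this notebook's outputs.

```python
def artifact_registry_image(region: str, project_id: str, repository: str,
                            image: str, tag: str) -> str:
    """Builds the name of a Docker image hosted in a regional Artifact Registry."""
    # The registry hostname is derived from the region, as in the
    # `hostname` variable used for `gcloud beta auth configure-docker`.
    hostname = f'{region}-docker.pkg.dev'
    return f'{hostname}/{project_id}/{repository}/{image}:{tag}'


# Sample values taken from the outputs in this notebook.
example_image = artifact_registry_image(
    'us-central1', 'mlops-dev-env', 'aipp-images', 'tensorflow_serving', 'latest-gpu')
print(example_image)
```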

In [40]:
!docker pull tensorflow/serving:latest
!docker pull tensorflow/serving:latest-gpu

latest: Pulling from tensorflow/serving
Digest: sha256:a94b7e3b0e825350675e83b0c2f2fc28f34be358c34e4126a1d828de899ec44f
Status: Image is up to date for tensorflow/serving:latest
docker.io/tensorflow/serving:latest
latest-gpu: Pulling from tensorflow/serving
Digest: sha256:9f2154baa458bf7b523d5f3c9f545056ed14d75ceac00742d1903d37d80393e9
Status: Image is up to date for tensorflow/serving:latest-gpu
docker.io/tensorflow/serving:latest-gpu


In [41]:
!docker tag tensorflow/serving:latest {cpu_image_name}
!docker tag tensorflow/serving:latest-gpu {gpu_image_name}

In [42]:
!docker push {cpu_image_name}
!docker push {gpu_image_name}

The push refers to repository [gcr.io/mlops-dev-env/tensorflow_serving]
ac716820: Preparing
bd8c4bd3: Preparing
e785c230: Preparing
a73fd165: Preparing
f9a74649: Preparing
da143c91: Preparing
287e1f04: Preparing
287e1f04: Layer already exists
latest-cpu: digest: sha256:a94b7e3b0e825350675e83b0c2f2fc28f34be358c34e4126a1d828de899ec44f size: 1989
The push refers to repository [gcr.io/mlops-dev-env/tensorflow_serving]
41b4553f: Preparing
6ab262b7: Preparing
fdb5f1f9: Preparing
64ade40f: Preparing
0889ee68: Preparing
d332a58a: Preparing
f11cbf29: Preparing
a4b22186: Preparing
afb09dc3: Preparing
b5a53aac: Preparing
c8e5063e: Preparing
7c529ced: Layer already exists
latest-gpu: digest: sha256:9f2154baa458bf7b523d5f3c9f545056ed14d75ceac00742d1903d37d80393e9 size: 2835


In [43]:
repository_id = f'{region}-docker.pkg.dev/{project_id}/{repository_name}'

!gcloud beta artifacts docker images list {repository_id}

Listing items under project mlops-dev-env, location us-central1, repository aipp-images.

IMAGE                                                                    DIGEST                                                                   CREATE_TIME          UPDATE_TIME
us-central1-docker.pkg.dev/mlops-dev-env/aipp-images/tensorflow_serving  sha256:9f2154baa458bf7b523d5f3c9f545056ed14d75ceac00742d1903d37d80393e9  2020-08-17T20:34:36  2020-08-17T20:34:36
us-central1-docker.pkg.dev/mlops-dev-env/aipp-images/tensorflow_serving  sha256:a94b7e3b0e825350675e83b0c2f2fc28f34be358c34e4126a1d828de899ec44f  2020-08-17T20:32:43  2020-08-17T20:32:43


## Deploying a model version

We will use the AI Platform Prediction REST API to create model and model version resources.

### Create an authorized session 

Calls to the AI Platform Prediction REST API must be authorized through OAuth 2. We will use the `google.auth.transport.requests.AuthorizedSession` client, which handles the OAuth authorization flow transparently.

In [44]:
service_endpoint = 'https://alpha-ml.googleapis.com'

credentials, project_ = google.auth.default()
authed_session = AuthorizedSession(credentials)

### List all models in the project

In [45]:
url = f'{service_endpoint}/v1/projects/{project_id}/models/'

response = authed_session.get(url)
response.json()

{'models': [{'name': 'projects/mlops-dev-env/models/ResNet101',
   'regions': ['us-central1'],
   'etag': 'S7FgvSfwfUY='}]}

### Create a model resource

In [46]:
model_name = 'ResNet101'

url = f'{service_endpoint}/v1/projects/{project_id}/models/'

request_body = {
    "name": model_name
}

response = authed_session.post(url, data=json.dumps(request_body))
response.json()

{'error': {'code': 409,
  'message': 'Field: model.name Error: A model with the same name already exists.',
  'status': 'ALREADY_EXISTS',
  'details': [{'@type': 'type.googleapis.com/google.rpc.BadRequest',
    'fieldViolations': [{'field': 'model.name',
      'description': 'A model with the same name already exists.'}]}]}}
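The 409 response above only means that the model resource already exists, which is harmless when the notebook is re-run. A minimal sketch of treating that case as success follows. `model_exists_or_created` is a hypothetical helper; it inspects only the `error.status` field visible in the response above and makes no other assumptions about the error format.

```python
def model_exists_or_created(response_json: dict) -> bool:
    """Returns True if the create call succeeded or the model already existed."""
    error = response_json.get('error')
    if error is None:
        # No error object: the model resource was created.
        return True
    # A model with the same name is acceptable; any other error is not.
    return error.get('status') == 'ALREADY_EXISTS'


# Payloads shaped like the responses shown in this notebook.
already_exists = {'error': {'code': 409, 'status': 'ALREADY_EXISTS'}}
created = {'name': 'projects/mlops-dev-env/models/ResNet101'}
print(model_exists_or_created(already_exists), model_exists_or_created(created))
```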

### Get the model's info

In [47]:
url = f'{service_endpoint}/v1/projects/{project_id}/models/{model_name}'

response = authed_session.get(url)
response.json()

{'name': 'projects/mlops-dev-env/models/ResNet101',
 'regions': ['us-central1'],
 'etag': 'S7FgvSfwfUY='}

### Create a model version

When deploying a custom container to AI Platform Prediction, you need to configure two groups of settings. The first group defines the configuration of the AI Platform Prediction service that hosts your container: for example, the node type, manual or autoscaling parameters, and the accelerator configuration. The second group consists of the settings specific to a given container.

Refer to [TBD]() for a detailed discussion of the available service settings.

There are three ways of passing configuration settings to a container:
* the settings can be embedded in a custom container image
* you can pass the settings as command line arguments, or 
* you can supply a configuration file. 

With the first method, the configuration settings are fixed when the container image is built. The other two methods let you supply the settings at deployment time.

Some model servers commonly used in AI Platform Prediction custom containers, including TF Serving used in this notebook, also expose a management API that allows you to change configurations after the server has been deployed. Configuring the server through the management API is currently not supported due to the constraints of the REST interface exposed by AI Platform Prediction.


Supplying configuration settings through a command line interface is straightforward. The AI Platform Prediction REST API utilizes JSON to encode requests and responses. You can provide the command line arguments as the `args` key of the JSON `container` object in the create model version request body.
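As an illustration, the `args` list can be generated from a dictionary of flags instead of being written by hand. This is a minimal sketch: `build_container_args` is a hypothetical helper, `IMAGE_URI` is a placeholder, and the flags are the TF Serving flags used later in this notebook.

```python
from typing import Dict, List, Optional


def build_container_args(flags: Dict[str, Optional[str]]) -> List[str]:
    """Renders flags as `--name=value`, or as a bare `--name` when value is None."""
    args = []
    for name, value in flags.items():
        args.append(f'--{name}' if value is None else f'--{name}={value}')
    return args


# The `container` object of the create model version request body.
container = {
    'image': 'IMAGE_URI',  # placeholder for the serving image name
    'args': build_container_args({
        'rest_api_port': '8080',
        'model_name': 'ResNet101',
        'enable_batching': None,  # boolean switch, rendered without a value
    }),
}
print(container['args'])
```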


Passing a config file to a container hosted in AI Platform Prediction is trickier. The container runs in an isolated environment and does not have access to resources (including Cloud Storage) outside of this environment. To pass file-based assets (including a config file) to the container, you need to stage them in the GCS deployment location. The GCS deployment location, set through the `deployment_uri` field of the REST API request body, is copied to the isolated environment by the create model version request. The URL of the copy in the isolated environment is exposed through the `AIP_STORAGE_URI` environment variable.
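The relationship between the staged file and the path the container sees can be sketched as follows. `staged_file_locations` is a hypothetical helper: it only builds the two strings and does not copy anything. The `$(AIP_STORAGE_URI)` reference syntax is the one used in the container arguments later in this notebook.

```python
from typing import Tuple


def staged_file_locations(deployment_uri: str, filename: str) -> Tuple[str, str]:
    """Returns (GCS destination to copy the file to, reference to use in container args)."""
    # Where to copy the file before creating the model version.
    gcs_destination = f"{deployment_uri.rstrip('/')}/{filename}"
    # How the container refers to the copy inside the isolated environment.
    container_reference = f'$(AIP_STORAGE_URI)/{filename}'
    return gcs_destination, container_reference


destination, reference = staged_file_locations(
    'gs://mlops-dev-workspace/models/resnet_serving', 'batching.pbtxt')
print(destination)
print(reference)
```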

In the following example you will use both command line arguments and a configuration file to configure the TF Serving model server. Most of the settings will be passed as command line arguments. The [server-side batching](https://www.tensorflow.org/tfx/serving/serving_config#batching_configuration) parameters will be passed as a config file.


#### Create the config file with batching settings

In [48]:
batching_config = 'batching.pbtxt'

In [49]:
%%writefile {batching_config}

max_batch_size { value: 128 }
batch_timeout_micros { value: 150000 }
max_enqueued_batches { value: 16 }
num_batch_threads { value: 8 }

Writing batching.pbtxt
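When the batching values are computed in the notebook, the same file content can be rendered from a Python dictionary. A minimal sketch, assuming only the flat `name { value: N }` entries used above; `render_batching_config` is a hypothetical helper, not a TF Serving API.

```python
from typing import Dict


def render_batching_config(params: Dict[str, int]) -> str:
    """Renders batching parameters in the text format used by batching.pbtxt."""
    lines = [f'{name} {{ value: {value} }}' for name, value in params.items()]
    return '\n'.join(lines) + '\n'


batching_text = render_batching_config({
    'max_batch_size': 128,
    'batch_timeout_micros': 150000,
    'max_enqueued_batches': 16,
    'num_batch_threads': 8,
})
print(batching_text)
```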


#### Copy the batch config file to the staging location in GCS

You will use the folder where the custom ResNet101 model was saved as the staging location.

In [50]:
!gsutil cp {batching_config} {SAVED_MODEL_PATH}/{batching_config}

Copying file://batching.pbtxt [Content-Type=application/octet-stream]...
/ [1 files][  136.0 B/  136.0 B]                                                
Operation completed over 1 objects/136.0 B.                                      


In [51]:
!gsutil cat {SAVED_MODEL_PATH}/batching.pbtxt


max_batch_size { value: 128 }
batch_timeout_micros { value: 150000 }
max_enqueued_batches { value: 16 }
num_batch_threads { value: 8 }


In [52]:
!gsutil ls {SAVED_MODEL_PATH}

gs://mlops-dev-workspace/models/resnet_serving/batching.pbtxt
gs://mlops-dev-workspace/models/resnet_serving/1/


#### Deploy the container

In [53]:
version_name = 'batching_150'

url = f'{service_endpoint}/v1/projects/{project_id}/models/{model_name}/versions'

request_body = {
    # Service settings
    "name": version_name,
    "deployment_uri": SAVED_MODEL_PATH,
    "machine_type": 'n1-standard-8',
    "accelerator_config": {
        "count": 1,
        "type": 'NVIDIA_TESLA_P4'},
    "routes": {
        "predict": f"/v1/models/{model_name}:predict",
        "health": f"/v1/models/{model_name}"},
    
    # Container settings
    "container": {
        "image": gpu_image_name,
        "args": [
            "--rest_api_port=8080",
            f"--model_name={model_name}",
            "--model_base_path=$(AIP_STORAGE_URI)",
            "--enable_batching",
            "--batching_parameters_file=$(AIP_STORAGE_URI)/batching.pbtxt"]}
}
            
response = authed_session.post(url, data=json.dumps(request_body))
response.json()

{'name': 'projects/mlops-dev-env/operations/create_ResNet101_batching_150-1597710897245',
 'metadata': {'@type': 'type.googleapis.com/google.cloud.ml.v1.OperationMetadata',
  'createTime': '2020-08-18T00:34:58Z',
  'operationType': 'CREATE_VERSION',
  'modelName': 'projects/mlops-dev-env/models/ResNet101',
  'version': {'name': 'projects/mlops-dev-env/models/ResNet101/versions/batching_150',
   'deploymentUri': 'gs://mlops-dev-workspace/models/resnet_serving',
   'createTime': '2020-08-18T00:34:57Z',
   'etag': 'GM7PM2uxJB0=',
   'machineType': 'n1-standard-8',
   'acceleratorConfig': {'count': '1', 'type': 'NVIDIA_TESLA_P4'},
   'container': {'image': 'gcr.io/mlops-dev-env/tensorflow_serving:latest-gpu',
    'args': ['--rest_api_port=8080',
     '--model_name=ResNet101',
     '--model_base_path=$(AIP_STORAGE_URI)',
     '--enable_batching',
     '--batching_parameters_file=$(AIP_STORAGE_URI)/batching.pbtxt']},
   'routes': {'predict': '/v1/models/ResNet101:predict',
    'health': '/v1

#### Check the deployment status

In [55]:
url = f'{service_endpoint}/v1/projects/{project_id}/models/{model_name}/versions/{version_name}'

response = authed_session.get(url)
response.json()

{'name': 'projects/mlops-dev-env/models/ResNet101/versions/batching_150',
 'isDefault': True,
 'deploymentUri': 'gs://mlops-dev-workspace/models/resnet_serving',
 'createTime': '2020-08-18T00:34:57Z',
 'state': 'READY',
 'etag': 'Kg+4YsdtKYY=',
 'machineType': 'n1-standard-8',
 'acceleratorConfig': {'count': '1', 'type': 'NVIDIA_TESLA_P4'},
 'container': {'image': 'gcr.io/mlops-dev-env/tensorflow_serving:latest-gpu',
  'args': ['--rest_api_port=8080',
   '--model_name=ResNet101',
   '--model_base_path=$(AIP_STORAGE_URI)',
   '--enable_batching',
   '--batching_parameters_file=$(AIP_STORAGE_URI)/batching.pbtxt']},
 'routes': {'predict': '/v1/models/ResNet101:predict',
  'health': '/v1/models/ResNet101'}}
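Creating a version takes several minutes, so the status check above usually has to be repeated until `state` becomes `READY`. A minimal polling sketch follows. `wait_until_ready` is a hypothetical helper; the state is fetched through a callable so the loop can be exercised without calling the service. Treating `FAILED` as the only other terminal state is an assumption based on the v1 `Version` resource, and the timeout and interval defaults are arbitrary.

```python
import time
from typing import Callable


def wait_until_ready(get_state: Callable[[], str],
                     timeout_s: float = 1800.0,
                     poll_interval_s: float = 30.0,
                     sleep: Callable[[float], None] = time.sleep) -> str:
    """Polls `get_state` until it returns a terminal state or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        state = get_state()
        if state in ('READY', 'FAILED'):  # assumed terminal states
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f'Version still in state {state!r} after {timeout_s} s')
        sleep(poll_interval_s)


# In the notebook, the callable could wrap the GET request above, for example:
#   wait_until_ready(lambda: authed_session.get(url).json().get('state'))
simulated_states = iter(['CREATING', 'CREATING', 'READY'])
final_state = wait_until_ready(lambda: next(simulated_states), sleep=lambda seconds: None)
print(final_state)
```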

## Testing the deployed model

You will now run inference by invoking the TF Serving `Predict` API.

Refer to the [TF Serving REST API Reference](https://www.tensorflow.org/tfx/serving/api_rest) for more information about the API format.

#### Load sample images

In [56]:
image_folder = 'test_images'
raw_images = [tf.io.read_file(os.path.join(image_folder, image_path)).numpy()
         for image_path in os.listdir(image_folder)]

encoded_images = [{'b64': base64.b64encode(image).decode('utf-8')} for image in raw_images]  
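The TF Serving REST API expects binary values to be sent as JSON objects of the form `{"b64": "<base64 string>"}`, which is why each image is wrapped that way above. The sketch below shows the encoding on a tiny byte string so the result is easy to inspect. `build_predict_body` is a hypothetical helper, and the `serving_preprocess` signature name is the one used later in this notebook.

```python
import base64
import json
from typing import List


def build_predict_body(raw_items: List[bytes], signature_name: str) -> str:
    """Serializes binary inputs into a JSON predict request body."""
    instances = [{'b64': base64.b64encode(item).decode('utf-8')} for item in raw_items]
    return json.dumps({'signature_name': signature_name, 'instances': instances})


# b'abc' stands in for the raw bytes of an image file.
body = build_predict_body([b'abc'], 'serving_preprocess')
print(body)
```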

#### Call the `predict` endpoint 

In [57]:
url = f'{service_endpoint}/v1/projects/{project_id}/models/{model_name}/versions/{version_name}:predict'
signature = 'serving_preprocess'

request_body = {
            'signature_name': signature,
            'instances': encoded_images
        }
            
response = authed_session.post(url, data=json.dumps(request_body))
response.json()

{'predictions': [{'labels': ['military uniform',
    'suit',
    'Windsor tie',
    'pickelhaube',
    'bow tie'],
   'probabilities': [0.940013826,
    0.0485324822,
    0.00640657172,
    0.00201301626,
    0.000604337547]},
  {'labels': ['Egyptian cat', 'tiger cat', 'tabby', 'lynx', 'Siamese cat'],
   'probabilities': [0.827052057,
    0.131283119,
    0.0410555713,
    0.0005708182,
    1.89249167e-05]}]}
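Each prediction in the response holds parallel `labels` and `probabilities` lists. A minimal sketch of extracting the most likely label per image follows. `top_predictions` is a hypothetical helper, and the sample response is an abbreviated copy of the output above.

```python
from typing import List, Tuple


def top_predictions(response_json: dict) -> List[Tuple[str, float]]:
    """Returns the (label, probability) pair with the highest probability per instance."""
    results = []
    for prediction in response_json['predictions']:
        pairs = zip(prediction['labels'], prediction['probabilities'])
        results.append(max(pairs, key=lambda pair: pair[1]))
    return results


sample_response = {'predictions': [
    {'labels': ['military uniform', 'suit'], 'probabilities': [0.940013826, 0.0485324822]},
    {'labels': ['Egyptian cat', 'tiger cat'], 'probabilities': [0.827052057, 0.131283119]},
]}
best = top_predictions(sample_response)
print(best)
```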

## Cleaning up

### Delete model version and model resources
#### List model versions

In [58]:
model_name = 'ResNet101'

url = f'{service_endpoint}/v1/projects/{project_id}/models/{model_name}/versions'

response = authed_session.get(url)
response.json()

{'versions': [{'name': 'projects/mlops-dev-env/models/ResNet101/versions/batching_150',
   'isDefault': True,
   'deploymentUri': 'gs://mlops-dev-workspace/models/resnet_serving',
   'createTime': '2020-08-18T00:34:57Z',
   'lastUseTime': '2020-08-18T00:42:19Z',
   'state': 'READY',
   'etag': 'Kg+4YsdtKYY=',
   'machineType': 'n1-standard-8',
   'acceleratorConfig': {'count': '1', 'type': 'NVIDIA_TESLA_P4'},
   'container': {'image': 'gcr.io/mlops-dev-env/tensorflow_serving:latest-gpu',
    'args': ['--rest_api_port=8080',
     '--model_name=ResNet101',
     '--model_base_path=$(AIP_STORAGE_URI)',
     '--enable_batching',
     '--batching_parameters_file=$(AIP_STORAGE_URI)/batching.pbtxt']},
   'routes': {'predict': '/v1/models/ResNet101:predict',
    'health': '/v1/models/ResNet101'}}]}

#### Delete the specific version

In [59]:
version_name = 'batching_150'

url = f'{service_endpoint}/v1/projects/{project_id}/models/{model_name}/versions/{version_name}'

response = authed_session.delete(url)
response.json()

{'name': 'projects/mlops-dev-env/operations/delete_ResNet101_batching_150-1597711347712',
 'metadata': {'@type': 'type.googleapis.com/google.cloud.ml.v1.OperationMetadata',
  'createTime': '2020-08-18T00:42:27Z',
  'operationType': 'DELETE_VERSION',
  'modelName': 'projects/mlops-dev-env/models/ResNet101',
  'version': {'name': 'projects/mlops-dev-env/models/ResNet101/versions/batching_150',
   'deploymentUri': 'gs://mlops-dev-workspace/models/resnet_serving',
   'createTime': '2020-08-18T00:34:57Z',
   'state': 'READY',
   'etag': 'Kg+4YsdtKYY=',
   'machineType': 'n1-standard-8',
   'acceleratorConfig': {'count': '1', 'type': 'NVIDIA_TESLA_P4'},
   'container': {'image': 'gcr.io/mlops-dev-env/tensorflow_serving:latest-gpu',
    'args': ['--rest_api_port=8080',
     '--model_name=ResNet101',
     '--model_base_path=$(AIP_STORAGE_URI)',
     '--enable_batching',
     '--batching_parameters_file=$(AIP_STORAGE_URI)/batching.pbtxt']},
   'routes': {'predict': '/v1/models/ResNet101:predict

#### Delete the model

In [None]:
url = f'{service_endpoint}/v1/projects/{project_id}/models/{model_name}'

response = authed_session.delete(url)
response.json()

## Next Steps

Walk through the `aipp_deploy.ipynb` notebook to learn how to deploy the custom serving module created in this notebook to **AI Platform Prediction** using a TF Serving container image.

## License

<font size=-1>Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at [https://www.apache.org/licenses/LICENSE-2.0](https://www.apache.org/licenses/LICENSE-2.0)

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.</font>