# 05Tools: Prediction - Local

Predictions from models created in the 05 series of notebooks.

This notebook is part of a collection of examples that showcase many ways to serve models:
- Online:
    - Vertex AI Endpoints: Python, REST, CLI (gcloud): [05Tools - Prediction - Online.ipynb](./05Tools%20-%20Prediction%20-%20Online.ipynb)
    - (**THIS NOTEBOOK**) Local with TensorFlow ModelServer: [05Tools - Prediction - Local.ipynb](./05Tools%20-%20Prediction%20-%20Local.ipynb)
    - Custom: Build a custom container with TensorFlow ModelServer: [05Tools - Prediction - Custom.ipynb](./05Tools%20-%20Prediction%20-%20Custom.ipynb)
        - Remote Service with Cloud Run
        - Local Service with Docker Run
- Batch: [05Tools - Prediction - Batch.ipynb](./05Tools%20-%20Prediction%20-%20Batch.ipynb)
    - BigQuery ML Model Import
    - Vertex AI Batch Prediction Jobs

**Prerequisites:**
- At least one of the notebooks in this series [05a-05i]
- The [05 - Vertex AI Custom Model - TensorFlow - in Notebook](./05%20-%20Vertex%20AI%20Custom%20Model%20-%20TensorFlow%20-%20in%20Notebook.ipynb) notebook is required to run the [Keras model example](#keras) below.

**Conceptual Flow & Workflow**

<p align="center">
  <img alt="Conceptual Flow" src="../architectures/slides/05tools_pred_arch.png" width="45%">
&nbsp; &nbsp; &nbsp; &nbsp;
  <img alt="Workflow" src="../architectures/slides/05tools_pred_console.png" width="45%">
</p>

---
## Setup

inputs:

In [1]:
project = !gcloud config get-value project
PROJECT_ID = project[0]
PROJECT_ID

'statmike-mlops-349915'

In [2]:
REGION = 'us-central1'
EXPERIMENT = '05_predictions'
SERIES = '05'

# source data
BQ_PROJECT = PROJECT_ID
BQ_DATASET = 'fraud'
BQ_TABLE = 'fraud_prepped'

# Resources
DEPLOY_COMPUTE = 'n1-standard-4'
DEPLOY_IMAGE='us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-7:latest'

# Model Training
VAR_TARGET = 'Class'
VAR_OMIT = 'transaction_id' # add more variables to the string with space delimiters

packages:

In [3]:
from google.cloud import aiplatform
from google.cloud import bigquery

import tensorflow as tf
import requests
import json
import numpy as np

import multiprocessing

clients:

In [4]:
aiplatform.init(project=PROJECT_ID, location=REGION)
bq = bigquery.Client()

parameters:

In [5]:
BUCKET = PROJECT_ID
DIR = f"temp/{EXPERIMENT}"

environment:

In [6]:
!rm -rf {DIR}
!mkdir -p {DIR}

---
## Get a Model For Predictions
This project already has a model serving online predictions at a Vertex AI Endpoint.  This section uses the endpoint to retrieve the deployed model and its information for use with the local prediction methods in this notebook.

### Get Endpoint

[Endpoint Properties and Methods](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.Endpoint):

```python
endpoint
endpoint.display_name
endpoint.resource_name
endpoint.traffic_split
endpoint.list_models()
```

In [7]:
endpoints = aiplatform.Endpoint.list(filter = f"labels.series={SERIES}")
endpoint = endpoints[0]

In [8]:
print(f'Review the Endpoint in the Console:\nhttps://console.cloud.google.com/vertex-ai/locations/{REGION}/endpoints/{endpoint.name}?project={PROJECT_ID}')

Review the Endpoint in the Console:
https://console.cloud.google.com/vertex-ai/locations/us-central1/endpoints/1961322035766362112?project=statmike-mlops-349915


### Get Model at Endpoint
Using the model on the endpoint for the current series:

In [9]:
endpoint

<google.cloud.aiplatform.models.Endpoint object at 0x7f0b23585350> 
resource name: projects/1026793852137/locations/us-central1/endpoints/1961322035766362112

In [10]:
#endpoint.list_models()[0]

In [11]:
model = aiplatform.Model(
    model_name = endpoint.list_models()[0].model+f'@{endpoint.list_models()[0].model_version_id}'
)

### Review Model Information

In [12]:
model.display_name

'05_05h'

In [13]:
model.resource_name

'projects/1026793852137/locations/us-central1/models/model_05_05h'

In [14]:
model.version_id

'1'

In [15]:
model.version_description

'run-20220927230247-6'

In [16]:
model.versioned_resource_name

'projects/1026793852137/locations/us-central1/models/model_05_05h@1'

In [17]:
model.supported_input_storage_formats

['jsonl', 'bigquery', 'csv', 'tf-record', 'tf-record-gzip', 'file-list']

In [18]:
model.name

'model_05_05h'

In [19]:
model.uri

'gs://statmike-mlops-349915/05/05h/models/20220927230247/6/model'

In [20]:
print(f'Review the model in the Vertex AI Model Registry:\nhttps://console.cloud.google.com/vertex-ai/locations/{REGION}/models/{model.name}/versions/{model.version_id}/properties?project={PROJECT_ID}')

Review the model in the Vertex AI Model Registry:
https://console.cloud.google.com/vertex-ai/locations/us-central1/models/model_05_05h/versions/1/properties?project=statmike-mlops-349915


#### Review Model Information Using the `aiplatform_v1` Model Client
The [ModelServiceClient](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform_v1.services.model_service.ModelServiceClient) in version 1 of the client offers another way to review the model attributes.  The example code below retrieves the same model version with this client.

Curious about client versions and layers?  Check out this tip document [aiplatform_notes.md](../Tips/aiplatform_notes.md).

In [21]:
from google.cloud import aiplatform_v1

client_options = {"api_endpoint": f"{REGION}-aiplatform.googleapis.com"}
ModelClientv1 = aiplatform_v1.ModelServiceClient(client_options = client_options)

ModelClientv1.get_model(
    name = model.versioned_resource_name
)

name: "projects/1026793852137/locations/us-central1/models/model_05_05h@1"
display_name: "05_05h"
predict_schemata {
}
metadata {
}
container_spec {
  image_uri: "us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-7:latest"
}
supported_deployment_resources_types: DEDICATED_RESOURCES
supported_deployment_resources_types: SHARED_RESOURCES
supported_input_storage_formats: "jsonl"
supported_input_storage_formats: "bigquery"
supported_input_storage_formats: "csv"
supported_input_storage_formats: "tf-record"
supported_input_storage_formats: "tf-record-gzip"
supported_input_storage_formats: "file-list"
supported_output_storage_formats: "jsonl"
supported_output_storage_formats: "bigquery"
create_time {
  seconds: 1664323764
  nanos: 618427000
}
update_time {
  seconds: 1664323768
  nanos: 500624000
}
deployed_models {
  endpoint: "projects/1026793852137/locations/us-central1/endpoints/1961322035766362112"
  deployed_model_id: "6805735083375329280"
}
etag: "AMEw9yNcp6VeV0ro_K4TYuC4RycegqUGGvZ-j2T

---
## Retrieve Records For Prediction

In [22]:
n = 1000
pred = bq.query(query = f"SELECT * FROM {BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE} WHERE splits='TEST' LIMIT {n}").to_dataframe()

In [23]:
pred.head(4)

| | Time | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | ... | V23 | V24 | V25 | V26 | V27 | V28 | Amount | Class | transaction_id | splits |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 35337 | 1.092844 | -0.01323 | 1.359829 | 2.731537 | -0.707357 | 0.873837 | -0.79613 | 0.437707 | 0.39677 | ... | -0.167647 | 0.027557 | 0.592115 | 0.219695 | 0.03697 | 0.010984 | 0.0 | 0 | a1b10547-d270-48c0-b902-7a0f735dadc7 | TEST |
| 1 | 60481 | 1.238973 | 0.035226 | 0.063003 | 0.641406 | -0.260893 | -0.580097 | 0.049938 | -0.034733 | 0.405932 | ... | -0.057718 | 0.104983 | 0.537987 | 0.589563 | -0.046207 | -0.006212 | 0.0 | 0 | 814c62c8-ade4-47d5-bf83-313b0aafdee5 | TEST |
| 2 | 139587 | 1.870539 | 0.211079 | 0.224457 | 3.889486 | -0.380177 | 0.249799 | -0.577133 | 0.179189 | -0.120462 | ... | 0.180776 | -0.060226 | -0.228979 | 0.080827 | 0.009868 | -0.036997 | 0.0 | 0 | d08a1bfa-85c5-4f1b-9537-1c5a93e6afd0 | TEST |
| 3 | 162908 | -3.368339 | -1.980442 | 0.153645 | -0.159795 | 3.847169 | -3.516873 | -1.209398 | -0.292122 | 0.760543 | ... | -1.171627 | 0.214333 | -0.159652 | -0.060883 | 1.294977 | 0.120503 | 0.0 | 0 | 802f3307-8e5a-4475-b795-5d5d8d7d0120 | TEST |


Remove columns not included as features in the model:

In [24]:
newobs = pred[pred.columns[~pred.columns.isin(VAR_OMIT.split()+[VAR_TARGET, 'splits'])]].to_dict(orient='records')
#newobs[0]

In [25]:
len(newobs)

1000

In [26]:
newobs[0]

{'Time': 35337,
 'V1': 1.0928441854981998,
 'V2': -0.0132303486713432,
 'V3': 1.35982868199426,
 'V4': 2.7315370965921004,
 'V5': -0.707357349219652,
 'V6': 0.8738370029866129,
 'V7': -0.7961301510622031,
 'V8': 0.437706509544851,
 'V9': 0.39676985012996396,
 'V10': 0.587438102569443,
 'V11': -0.14979756231827498,
 'V12': 0.29514781622888103,
 'V13': -1.30382621882143,
 'V14': -0.31782283120234495,
 'V15': -2.03673231037199,
 'V16': 0.376090905274179,
 'V17': -0.30040350116459497,
 'V18': 0.433799615590844,
 'V19': -0.145082264348681,
 'V20': -0.240427548108996,
 'V21': 0.0376030733329398,
 'V22': 0.38002620963091405,
 'V23': -0.16764742731151097,
 'V24': 0.0275573495476881,
 'V25': 0.59211469704354,
 'V26': 0.219695164116351,
 'V27': 0.0369695108704894,
 'V28': 0.010984441006191,
 'Amount': 0.0}
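
The column filtering above can also be pictured on plain Python records. The following is a minimal sketch, assuming the rows are already a list of dictionaries; the helper name `drop_keys` and the shortened sample row are hypothetical and only stand in for the BigQuery result:

```python
def drop_keys(records, omit):
    """Return copies of the records without the keys listed in `omit`."""
    omit = set(omit)
    return [{k: v for k, v in rec.items() if k not in omit} for rec in records]

# Hypothetical, shortened row standing in for one record of the query result
rows = [
    {"Time": 35337, "Amount": 0.0, "Class": 0,
     "transaction_id": "a1b10547", "splits": "TEST"},
]

# Omit the identifier, the target, and the split indicator, as in the cell above
features = drop_keys(rows, ["transaction_id", "Class", "splits"])
print(features)
```

The original rows are left untouched, so the target column remains available for comparing predictions against actual values later.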

---
## Notebook Predictions: Load Keras Model
<a id = 'keras'></a>

The version of TensorFlow used in the training job that created the model may differ from the one running in this notebook.  This can cause an issue with `tf.keras.models.load_model`.  Make sure the versions are the same to prevent issues.  For the demonstration here, it is known that the model from notebook/experiment `05` was trained in the notebook environment, so its version will match the one in this notebook.
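
One way to make the version check explicit is to compare version strings before loading. This is a minimal sketch, assuming that agreement on the major and minor components is the criterion of interest (that threshold is an assumption here, not a compatibility guarantee); the helper name `same_major_minor` is hypothetical. In the notebook, one of the two strings could come from `tf.__version__`:

```python
def same_major_minor(version_a: str, version_b: str) -> bool:
    """Compare two version strings on their major.minor components only."""
    return version_a.split(".")[:2] == version_b.split(".")[:2]

# Example version strings; the training version has to be known from the training job
print(same_major_minor("2.7.4", "2.7.0"))
print(same_major_minor("2.7.4", "2.10.0"))
```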

To start, the `05` model will be retrieved.  When creating models in the Vertex AI Model Registry, the `model_id` was set to be in the format `model_{SERIES}_{EXPERIMENT}` and the specific `model_id` to reference here is `model_05_05`:

In [79]:
model_05 = aiplatform.Model(model_name = 'model_05_05')

In [80]:
model_05.versioned_resource_name

'projects/1026793852137/locations/us-central1/models/model_05_05@3'

In [81]:
model_05.uri

'gs://statmike-mlops-349915/05/05/models/20220927184222/model'

In [82]:
keras_model = tf.keras.models.load_model(model_05.uri) #(model.uri)

In [83]:
predictions = keras_model.predict(
    {key: tf.constant([value], dtype=tf.float32, name = key) for key, value in newobs[0].items()}
)
predictions

array([[9.9997985e-01, 2.0149941e-05]], dtype=float32)

In [84]:
np.argmax(predictions[0])

0

---
## Local Predictions: With TensorFlow ModelServer
Locally run [TensorFlow Serving with Docker](https://www.tensorflow.org/tfx/serving/docker#serving_example)
- Official [tensorflow/serving repository](https://hub.docker.com/r/tensorflow/serving/tags/) of images by version
- Note that there may be dependencies between the version of TensorFlow that trained the model and the version of the serving container that is used to serve predictions from the model.


### Load the Model and Review Signature
Load the model currently on the series endpoint: stored in `model` above

In [31]:
reloaded_model = tf.saved_model.load(model.uri)

In [32]:
reloaded_model.signatures

_SignatureMap({'serving_default': <ConcreteFunction signature_wrapper(Amount, Time, V1, V10, V11, V12, V13, V14, V15, V16, V17, V18, V19, V2, V20, V21, V22, V23, V24, V25, V26, V27, V28, V3, V4, V5, V6, V7, V8, V9) at 0x7F0A5266CD10>})

In [33]:
reloaded_model.signatures['serving_default']

<ConcreteFunction signature_wrapper(Amount, Time, V1, V10, V11, V12, V13, V14, V15, V16, V17, V18, V19, V2, V20, V21, V22, V23, V24, V25, V26, V27, V28, V3, V4, V5, V6, V7, V8, V9) at 0x7F0A5266CD10>

In [34]:
reloaded_model.signatures['serving_default'].structured_input_signature

((),
 {'V7': TensorSpec(shape=(None, 1), dtype=tf.float32, name='V7'),
  'V25': TensorSpec(shape=(None, 1), dtype=tf.float32, name='V25'),
  'V10': TensorSpec(shape=(None, 1), dtype=tf.float32, name='V10'),
  'V27': TensorSpec(shape=(None, 1), dtype=tf.float32, name='V27'),
  'V1': TensorSpec(shape=(None, 1), dtype=tf.float32, name='V1'),
  'V22': TensorSpec(shape=(None, 1), dtype=tf.float32, name='V22'),
  'V12': TensorSpec(shape=(None, 1), dtype=tf.float32, name='V12'),
  'V5': TensorSpec(shape=(None, 1), dtype=tf.float32, name='V5'),
  'V9': TensorSpec(shape=(None, 1), dtype=tf.float32, name='V9'),
  'V17': TensorSpec(shape=(None, 1), dtype=tf.float32, name='V17'),
  'Amount': TensorSpec(shape=(None, 1), dtype=tf.float32, name='Amount'),
  'V2': TensorSpec(shape=(None, 1), dtype=tf.float32, name='V2'),
  'V26': TensorSpec(shape=(None, 1), dtype=tf.float32, name='V26'),
  'V18': TensorSpec(shape=(None, 1), dtype=tf.float32, name='V18'),
  'V21': TensorSpec(shape=(None, 1), dtype=tf.f

In [35]:
#!saved_model_cli show --dir {DIR}/model --all
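
The second element of `structured_input_signature` is a dictionary keyed by feature name, so its keys can be compared with the keys of a prediction instance before sending a request. The following is a minimal sketch of that comparison using plain Python; the helper name `compare_feature_names` and the shortened name lists are hypothetical:

```python
def compare_feature_names(expected, instance):
    """Return (missing, unexpected) feature names for one instance."""
    expected, provided = set(expected), set(instance)
    return sorted(expected - provided), sorted(provided - expected)

# Hypothetical, shortened inputs; in the notebook these could be
# the signature's input names and newobs[0]
expected_names = ["Time", "V1", "Amount"]
instance = {"Time": 35337, "V1": 1.09, "Class": 0}

missing, unexpected = compare_feature_names(expected_names, instance)
print("missing:", missing)
print("unexpected:", unexpected)
```

Two empty lists indicate that the instance carries exactly the inputs the signature names.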

### Copy the model to local directory

Review the local directory for this notebook (created above):

In [36]:
DIR

'temp/05_predictions'

In [37]:
!ls {DIR}

Copy the model files to the local directory for this notebook:

In [38]:
!gsutil cp -R {model.uri} {DIR}

Copying gs://statmike-mlops-349915/05/05h/models/20220927230247/6/model/keras_metadata.pb...
Copying gs://statmike-mlops-349915/05/05h/models/20220927230247/6/model/saved_model.pb...
/ [2 files][513.3 KiB/513.3 KiB]                                                
==> NOTE: You are performing a sequence of gsutil operations that may
run significantly faster if you instead use gsutil -m cp ... Please
see the -m section under "gsutil help options" for further information
about when gsutil -m can be advantageous.

Copying gs://statmike-mlops-349915/05/05h/models/20220927230247/6/model/variables/variables.data-00000-of-00001...
Copying gs://statmike-mlops-349915/05/05h/models/20220927230247/6/model/variables/variables.index...
/ [4 files][558.2 KiB/558.2 KiB]                                                
Operation completed over 4 objects/558.2 KiB.                                    


In [39]:
!ls {DIR}

model


In [40]:
!ls {DIR}/model

keras_metadata.pb  saved_model.pb  variables


### Download (Pull) The Docker Image and Start Serving Container

**Note:** In this series of notebooks, `05` used the local notebook to train with TensorFlow, while the notebooks `05a-i` use a pre-built container with a possibly different version. Knowing these versions may be important for pulling the correct image from the tensorflow/serving repository of images.  The example below pulls the image tagged `2.7.4` to align with the version of TensorFlow used to train models in notebooks `05a-i`.

In [41]:
serving_version = '2.7.4'
#!docker pull tensorflow/serving # defaults to :latest
!docker pull tensorflow/serving:{serving_version}

2.7.4: Pulling from tensorflow/serving
Digest: sha256:100e9b37cd68cd6c049ef8399c408b2c51c1b024e90e2b6bc67690915cbef6dd
Status: Image is up to date for tensorflow/serving:2.7.4
docker.io/tensorflow/serving:2.7.4


### Run the serving image locally

The container is going to be run with commands in this notebook.  To run the serving container without tying up further executions in this notebook, a subprocess will be launched using `multiprocessing`. To learn more about multiprocessing and running tasks from Python in parallel, visit the tips notebook [Python Multiprocessing](../Tips/Python%20Multiprocessing.ipynb).

First, build the syntax of the `docker run` command.  Note that the local model folder `{DIR}/model` is mounted to the folder `/models/{SERIES}/01` in the container.  The subfolder for `SERIES` matches the environment variable passed in as `MODEL_NAME`, which is where the serving image looks for the model (`/models/{MODEL_NAME}`).

In this demonstration the model is directly in the mounted folder, while in many production environments there might be multiple runs of a model in the folder, organized by versioned subfolders.  For this reason the `/01` subfolder is added to the folder path to mimic the versioning expected by the image.

In [57]:
command = f'''docker run -t -p 8501:8501 \
-v "/$(pwd)/{DIR}/model:/models/{SERIES}/01" \
-e MODEL_NAME={SERIES} \
tensorflow/serving:{serving_version}'''
print(command)

docker run -t -p 8501:8501 -v "/$(pwd)/temp/05_predictions/model:/models/05/01" -e MODEL_NAME=05 tensorflow/serving:2.7.4
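
The same command could also be assembled as an argument list, which avoids shell quoting and can be handed to `subprocess` directly. This is a minimal sketch under the same mount layout as above; the function name `build_serving_command` and its parameters are hypothetical:

```python
import shlex
from pathlib import Path

def build_serving_command(model_dir, model_name, image_tag,
                          host_port=8501, version="01"):
    """Build a `docker run` argument list for the tensorflow/serving image."""
    host_path = Path(model_dir).resolve()  # absolute path for the volume mount
    return [
        "docker", "run", "-t",
        "-p", f"{host_port}:8501",  # the REST API is exposed on 8501 in the container
        "-v", f"{host_path}:/models/{model_name}/{version}",
        "-e", f"MODEL_NAME={model_name}",
        f"tensorflow/serving:{image_tag}",
    ]

cmd = build_serving_command("temp/05_predictions/model", "05", "2.7.4")
print(shlex.join(cmd))
```

Resolving the path in Python replaces the `$(pwd)` shell substitution, so the result does not depend on a shell being involved.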


Run the command in a subprocess at the local folder of this notebook using `multiprocessing.Process()`:

In [58]:
!pwd

/home/jupyter/vertex-ai-mlops/05 - TensorFlow


In [59]:
def docker_runner():
    !{command}
    #!docker run -t -p 8501:8501 -v "/$(pwd)/temp/05_predictions/model:/models/05/01" -e MODEL_NAME=05 tensorflow/serving:2.7.4

def main():
    p = multiprocessing.Process(target=docker_runner)
    p.start()
    return p
    
p = main()

2022-09-30 12:56:25.198285: I tensorflow_serving/model_servers/server.cc:89] Building single TensorFlow model file config:  model_name: 05 model_base_path: /models/05
2022-09-30 12:56:25.198652: I tensorflow_serving/model_servers/server_core.cc:465] Adding/updating models.
2022-09-30 12:56:25.198689: I tensorflow_serving/model_servers/server_core.cc:591]  (Re-)adding model: 05
2022-09-30 12:56:25.299288: I tensorflow_serving/core/basic_manager.cc:740] Successfully reserved resources to load servable {name: 05 version: 1}
2022-09-30 12:56:25.299348: I tensorflow_serving/core/loader_harness.cc:66] Approving load for servable version {name: 05 version: 1}
2022-09-30 12:56:25.299380: I tensorflow_serving/core/loader_harness.cc:74] Loading servable version {name: 05 version: 1}
2022-09-30 12:56:25.299441: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:43] Reading SavedModel from: /models/05/01
2022-09-30 12:56:25.315147: I external/org_tensorflow/tensorflow/cc/saved_model/rea
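
Because the container starts in the background, a request sent too early can fail before the model has loaded. The following is a minimal sketch of a readiness check, assuming the TensorFlow Serving REST model status endpoint (`GET /v1/models/{MODEL_NAME}`) and its `model_version_status` response field; the helper names `is_available` and `wait_until_available`, and the injectable `fetch` parameter, are hypothetical:

```python
import json
import time
import urllib.request

def is_available(status: dict) -> bool:
    """True if any version in a model status response reports AVAILABLE."""
    return any(
        v.get("state") == "AVAILABLE"
        for v in status.get("model_version_status", [])
    )

def wait_until_available(url, timeout=60.0, interval=1.0, fetch=None):
    """Poll `url` until a model version is AVAILABLE or `timeout` seconds pass."""
    def default_fetch(u):
        with urllib.request.urlopen(u, timeout=5) as resp:
            return json.loads(resp.read())

    fetch = fetch or default_fetch
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            if is_available(fetch(url)):
                return True
        except OSError:
            pass  # server is not accepting connections yet
        time.sleep(interval)
    return False

# Possible use in this notebook (not executed here):
# wait_until_available("http://localhost:8501/v1/models/05")
```

The `fetch` parameter exists only so the polling logic can be exercised without a running server.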

### Get Predictions on Exposed Port

Prepare an instance (observation) for the prediction request:

In [67]:
tryob_json = json.dumps({"instances": [newobs[0]]})
tryob_json

'{"instances": [{"Time": 35337, "V1": 1.0928441854981998, "V2": -0.0132303486713432, "V3": 1.35982868199426, "V4": 2.7315370965921004, "V5": -0.707357349219652, "V6": 0.8738370029866129, "V7": -0.7961301510622031, "V8": 0.437706509544851, "V9": 0.39676985012996396, "V10": 0.587438102569443, "V11": -0.14979756231827498, "V12": 0.29514781622888103, "V13": -1.30382621882143, "V14": -0.31782283120234495, "V15": -2.03673231037199, "V16": 0.376090905274179, "V17": -0.30040350116459497, "V18": 0.433799615590844, "V19": -0.145082264348681, "V20": -0.240427548108996, "V21": 0.0376030733329398, "V22": 0.38002620963091405, "V23": -0.16764742731151097, "V24": 0.0275573495476881, "V25": 0.59211469704354, "V26": 0.219695164116351, "V27": 0.0369695108704894, "V28": 0.010984441006191, "Amount": 0.0}]}'

Make prediction request:

In [68]:
json_response = requests.post(f'http://localhost:8501/v1/models/{SERIES}:predict', data=tryob_json, headers={"content-type": "application/json"})

In [69]:
json_response

<Response [200]>

In [70]:
print(json_response.text)

{
    "predictions": [[0.999359429, 0.000640570885]
    ]
}


In [71]:
predictions = json.loads(json_response.text)['predictions']
predictions

[[0.999359429, 0.000640570885]]

In [72]:
np.argmax(predictions[0])

0
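
The request and response handling above extends to several instances at once, since the `instances` list can hold more than one observation. This is a minimal sketch of the payload construction and response parsing in plain Python; the helper names `build_predict_payload` and `predicted_classes` are hypothetical, and the sample response reuses the values returned above:

```python
import json

def build_predict_payload(instances):
    """Serialize a list of feature dictionaries into a predict request body."""
    return json.dumps({"instances": list(instances)})

def predicted_classes(response_text):
    """Return the index of the largest score for each prediction row."""
    predictions = json.loads(response_text)["predictions"]
    return [max(range(len(p)), key=p.__getitem__) for p in predictions]

# Response text matching the single-instance result shown above
sample_response = '{"predictions": [[0.999359429, 0.000640570885]]}'
print(predicted_classes(sample_response))  # [0]
```

In the notebook, `build_predict_payload(newobs[:10])` could be passed as the `data` argument of `requests.post` to score ten observations in one call.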

### Shutdown TensorFlow Serving Container
There are two entities running: a subprocess called `p` and a Docker container that was started by the subprocess.  Stopping `p` alone is not enough, because the container keeps running.  Stopping the container might be sufficient on its own, since the subprocess would then terminate on completion.  The commands below stop the subprocess `p` and then stop and remove the container.

In [73]:
p.terminate()

In [74]:
p.is_alive()

False

In [75]:
docker = !docker ps -a
docker

['CONTAINER ID   IMAGE                          COMMAND                  CREATED              STATUS              PORTS                              NAMES',
 '5acedd8a6d6d   tensorflow/serving:2.7.4       "/usr/bin/tf_serving…"   About a minute ago   Up About a minute   8500/tcp, 0.0.0.0:8501->8501/tcp   thirsty_curie',
 'cfc6fa1ae606   gcr.io/inverting-proxy/agent   "/bin/sh -c \'/opt/bi…"   6 weeks ago          Up 3 days                                              proxy-agent']

In [76]:
for d in docker:
    if 'tensorflow/serving' in d:
        print(d.split()[-1])
        !docker stop {d.split()[-1]}
        !docker rm {d.split()[0]}

thirsty_curie
thirsty_curie
5acedd8a6d6d


In [77]:
!docker ps -a

CONTAINER ID   IMAGE                          COMMAND                  CREATED       STATUS      PORTS     NAMES
cfc6fa1ae606   gcr.io/inverting-proxy/agent   "/bin/sh -c '/opt/bi…"   6 weeks ago   Up 3 days             proxy-agent
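
The cleanup loop above matches `tensorflow/serving` anywhere in each line of the `docker ps -a` output. A slightly stricter variant checks only the IMAGE column. The following is a minimal sketch that parses the captured lines; the helper name `serving_containers` is hypothetical, and the sample lines are copied from the output above:

```python
def serving_containers(ps_lines, image_prefix="tensorflow/serving"):
    """Return (container_id, name) pairs for rows whose IMAGE column matches."""
    found = []
    for line in ps_lines[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 2 and fields[1].startswith(image_prefix):
            found.append((fields[0], fields[-1]))  # ID is first, NAMES is last
    return found

# Lines as captured by `docker = !docker ps -a` earlier in this notebook
ps_lines = [
    'CONTAINER ID   IMAGE                          COMMAND                  CREATED              STATUS              PORTS                              NAMES',
    '5acedd8a6d6d   tensorflow/serving:2.7.4       "/usr/bin/tf_serving…"   About a minute ago   Up About a minute   8500/tcp, 0.0.0.0:8501->8501/tcp   thirsty_curie',
    'cfc6fa1ae606   gcr.io/inverting-proxy/agent   "/bin/sh -c \'/opt/bi…"   6 weeks ago          Up 3 days                                              proxy-agent',
]
print(serving_containers(ps_lines))
```

Each returned pair gives the container ID and name, either of which can be passed to `docker stop` and `docker rm`.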
