
# 05Tools: Prediction - Online
Predictions from models created in the 05 series of notebooks.

This notebook is part of a collection of examples that showcase many ways to serve models:
- Online:
    - (**THIS NOTEBOOK**) Vertex AI Endpoints: Python, REST, CLI (gcloud): [05Tools - Prediction - Online.ipynb](./05Tools%20-%20Prediction%20-%20Online.ipynb)
    - Local with TensorFlow ModelServer: [05Tools - Prediction - Local.ipynb](./05Tools%20-%20Prediction%20-%20Local.ipynb)
    - Custom: Build a custom container with TensorFlow ModelServer: [05Tools - Prediction - Custom.ipynb](./05Tools%20-%20Prediction%20-%20Custom.ipynb)
        - Remote Service with Cloud Run
        - Local Service with Docker Run
- Batch: [05Tools - Prediction - Batch.ipynb](./05Tools%20-%20Prediction%20-%20Batch.ipynb)
    - BigQuery ML Model Import
    - Vertex AI Batch Prediction Jobs

**Prerequisites:**
-  At least 1 of the notebooks in this series [05, 05a-05i]

**Conceptual Flow & Workflow**

<p align="center">
  <img alt="Conceptual Flow" src="../architectures/slides/05tools_pred_arch.png" width="45%">
&nbsp; &nbsp; &nbsp; &nbsp;
  <img alt="Workflow" src="../architectures/slides/05tools_pred_console.png" width="45%">
</p>

---
## Setup

inputs:

In [1]:
project = !gcloud config get-value project
PROJECT_ID = project[0]
PROJECT_ID

'statmike-mlops-349915'

In [2]:
REGION = 'us-central1'
EXPERIMENT = '05_predictions'
SERIES = '05'

# source data
BQ_PROJECT = PROJECT_ID
BQ_DATASET = 'fraud'
BQ_TABLE = 'fraud_prepped'

# Resources
DEPLOY_COMPUTE = 'n1-standard-4'
DEPLOY_IMAGE='us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-7:latest'

# Model Training
VAR_TARGET = 'Class'
VAR_OMIT = 'transaction_id' # add more variables to the string with space delimiters

packages:

In [53]:
from google.cloud import aiplatform
from google.cloud import bigquery

import tensorflow as tf

from google.api import httpbody_pb2
from datetime import datetime
import json
import numpy as np

import asyncio
import time
import multiprocessing

clients:

In [4]:
aiplatform.init(project = PROJECT_ID, location = REGION)
bq = bigquery.Client(project = PROJECT_ID)

parameters:

In [5]:
BUCKET = PROJECT_ID
DIR = f"temp/{EXPERIMENT}"

environment:

In [6]:
!rm -rf {DIR}
!mkdir -p {DIR}

---
## Get Vertex AI Endpoint

This project already has a model serving online predictions at a Vertex AI Endpoint.  This section will use the endpoint to retrieve the deployed model and get its information to use for online prediction methods in this notebook.

### Get Endpoint

[Endpoint Properties and Methods](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.Endpoint):

```python
endpoint
endpoint.display_name
endpoint.resource_name
endpoint.traffic_split
endpoint.list_models()
```

In [7]:
endpoints = aiplatform.Endpoint.list(filter = f"labels.series={SERIES}")
if endpoints:
    endpoint = endpoints[0]
    print(f"Endpoint Exists: {endpoints[0].resource_name}")
else:
    print(f"There does not appear to be an endpoint for SERIES = {SERIES}")

Endpoint Exists: projects/1026793852137/locations/us-central1/endpoints/7422110306790277120


In [8]:
endpoint.display_name

'05'

In [10]:
print(f'Review the Endpoint in the Console:\nhttps://console.cloud.google.com/vertex-ai/locations/{REGION}/endpoints/{endpoint.name}?project={PROJECT_ID}')

Review the Endpoint in the Console:
https://console.cloud.google.com/vertex-ai/locations/us-central1/endpoints/7422110306790277120?project=statmike-mlops-349915


In [11]:
endpoint.traffic_split

{'6452842428394110976': 100}
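
When an endpoint hosts more than one deployed model, the traffic split shows which one receives most requests. The following is a minimal sketch, assuming the split is a plain dictionary of deployed model IDs to percentages as in the output above. The helper name `primary_deployed_model` and the second ID in the example are hypothetical:

```python
def primary_deployed_model(traffic_split):
    """Return the (deployed_model_id, percent) pair receiving the most traffic."""
    if not traffic_split:
        raise ValueError("traffic_split is empty: no model is deployed")
    # compare the dictionary items on the traffic percentage
    return max(traffic_split.items(), key=lambda item: item[1])

# the split shown above: a single deployed model receiving all traffic
print(primary_deployed_model({'6452842428394110976': 100}))

# hypothetical split between two deployed models (the second ID is made up)
print(primary_deployed_model({'6452842428394110976': 80, '1111111111111111111': 20}))
```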

In [13]:
endpoint.list_models()[0]

id: "6452842428394110976"
model: "projects/1026793852137/locations/us-central1/models/model_05_05a"
display_name: "05_05a"
create_time {
  seconds: 1696372313
  nanos: 959486000
}
dedicated_resources {
  machine_spec {
    machine_type: "n1-standard-4"
  }
  min_replica_count: 1
  max_replica_count: 1
}
model_version_id: "16"

---
## Retrieve Records For Prediction

In [181]:
n = 1000
samples = bq.query(
    query = f"""
        SELECT * EXCEPT({VAR_TARGET}, {VAR_OMIT}, splits)
        FROM {BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE}
        WHERE splits='TEST'
        LIMIT {n}"""
).to_dataframe()

In [182]:
samples.head(4)

| | Time | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | ... | V20 | V21 | V22 | V23 | V24 | V25 | V26 | V27 | V28 | Amount |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 35337 | 1.092844 | -0.01323 | 1.359829 | 2.731537 | -0.707357 | 0.873837 | -0.79613 | 0.437707 | 0.39677 | ... | -0.240428 | 0.037603 | 0.380026 | -0.167647 | 0.027557 | 0.592115 | 0.219695 | 0.03697 | 0.010984 | 0.0 |
| 1 | 60481 | 1.238973 | 0.035226 | 0.063003 | 0.641406 | -0.260893 | -0.580097 | 0.049938 | -0.034733 | 0.405932 | ... | -0.26508 | -0.060003 | -0.053585 | -0.057718 | 0.104983 | 0.537987 | 0.589563 | -0.046207 | -0.006212 | 0.0 |
| 2 | 139587 | 1.870539 | 0.211079 | 0.224457 | 3.889486 | -0.380177 | 0.249799 | -0.577133 | 0.179189 | -0.120462 | ... | -0.374356 | 0.196006 | 0.656552 | 0.180776 | -0.060226 | -0.228979 | 0.080827 | 0.009868 | -0.036997 | 0.0 |
| 3 | 162908 | -3.368339 | -1.980442 | 0.153645 | -0.159795 | 3.847169 | -3.516873 | -1.209398 | -0.292122 | 0.760543 | ... | -0.923275 | -0.545992 | -0.252324 | -1.171627 | 0.214333 | -0.159652 | -0.060883 | 1.294977 | 0.120503 | 0.0 |


Convert the records to a list of dictionaries, one per instance (the query above already excluded the columns that are not features in the model):

In [183]:
newobs = samples.to_dict(orient='records')
#newobs[0]

In [184]:
len(newobs)

1000

In [185]:
newobs[0]

{'Time': 35337,
 'V1': 1.0928441854981998,
 'V2': -0.0132303486713432,
 'V3': 1.35982868199426,
 'V4': 2.7315370965921004,
 'V5': -0.707357349219652,
 'V6': 0.8738370029866129,
 'V7': -0.7961301510622031,
 'V8': 0.437706509544851,
 'V9': 0.39676985012996396,
 'V10': 0.587438102569443,
 'V11': -0.14979756231827498,
 'V12': 0.29514781622888103,
 'V13': -1.30382621882143,
 'V14': -0.31782283120234495,
 'V15': -2.03673231037199,
 'V16': 0.376090905274179,
 'V17': -0.30040350116459497,
 'V18': 0.433799615590844,
 'V19': -0.145082264348681,
 'V20': -0.240427548108996,
 'V21': 0.0376030733329398,
 'V22': 0.38002620963091405,
 'V23': -0.16764742731151097,
 'V24': 0.0275573495476881,
 'V25': 0.59211469704354,
 'V26': 0.219695164116351,
 'V27': 0.0369695108704894,
 'V28': 0.010984441006191,
 'Amount': 0.0}
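
Every method in this notebook eventually sends these records as JSON, so it can help to confirm that they serialize cleanly before calling the endpoint. The following is a minimal sketch using only the standard library. The helper name `to_request_body` and the shortened records are hypothetical:

```python
import json

def to_request_body(records):
    """Serialize feature records into a JSON body with an "instances" key."""
    # allow_nan=False makes json.dumps raise ValueError for NaN/Infinity,
    # which are not valid JSON values
    return json.dumps({"instances": records}, allow_nan=False)

# two shortened, hypothetical records (real instances carry all feature columns)
records = [{'Time': 35337, 'V1': 1.0928441854981998, 'Amount': 0.0},
           {'Time': 60481, 'V1': 1.238973, 'Amount': 0.0}]
body = to_request_body(records)
print(len(json.loads(body)["instances"]))

# a record with a missing value is rejected before any request is made
try:
    to_request_body([{'Time': 1, 'V1': float('nan'), 'Amount': 0.0}])
except ValueError as error:
    print(f"rejected: {error}")
```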

---
## Online Predictions: Methods for Vertex AI Endpoints

There are multiple ways to interact with a Vertex AI Endpoint.  This notebook gives examples for Python (multiple versions and layers of the client), as well as REST and the `gcloud` CLI.  To better understand these clients, review the notes here: [aiplatform_notes.md](../Tips/aiplatform_notes.md).

>**Explanations**
>For each of the methods below, the `predict` part of the request can be exchanged for `explain` if the endpoint has a model deployed with explanations set up.  Note: This will not work for the raw prediction methods.  See more about setting up explainability in the explainability notebooks within this series.
>- [05Tools - Explainability - Example-Based.ipynb](./05Tools%20-%20Explainability%20-%20Example-Based.ipynb)
>- [05Tools - Explainability - Feature-Based.ipynb](./05Tools%20-%20Explainability%20-%20Feature-Based.ipynb)

---
### Get Predictions: Python Client

[aiplatform.Endpoint.predict()](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.Endpoint#google_cloud_aiplatform_Endpoint_predict)

In [20]:
prediction = endpoint.predict(instances = newobs[0:1])
prediction

Prediction(predictions=[[0.99990654, 9.34729251e-05]], deployed_model_id='6452842428394110976', model_version_id='16', model_resource_name='projects/1026793852137/locations/us-central1/models/model_05_05a', explanations=None)

In [21]:
prediction.predictions[0]

[0.99990654, 9.34729251e-05]

In [22]:
np.argmax(prediction.predictions[0])

0
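
Each prediction is a pair of class probabilities, and the index of the largest value is the predicted class. The following is a minimal sketch of that mapping without NumPy, assuming index 0 means not fraud and index 1 means fraud. The `LABELS` list and the `classify` helper are hypothetical names:

```python
# class labels, assuming index 0 = not fraud and index 1 = fraud
LABELS = ['not fraud', 'fraud']

def classify(probabilities):
    """Return (class_index, label, probability) for one row of class probabilities."""
    # index of the largest probability, equivalent to np.argmax for a flat list
    index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return index, LABELS[index], probabilities[index]

print(classify([0.99990654, 9.34729251e-05]))
```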

#### Raw Predictions

Use arbitrary headers

[aiplatform.Endpoint.raw_predict()](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.Endpoint#google_cloud_aiplatform_Endpoint_raw_predict)

In [37]:
instances = {"instances": [newobs[0]], "signature_name": "serving_default"}
headers = {'Content-Type':'application/json'}

In [38]:
prediction = endpoint.raw_predict(
    body = json.dumps(instances).encode("utf-8"),
    headers = headers
)
prediction

<Response [200]>

In [39]:
prediction = json.loads(prediction.text)
prediction

{'predictions': [[0.99990654, 9.34729251e-05]]}

In [40]:
prediction['predictions'][0]

[0.99990654, 9.34729251e-05]

In [41]:
np.argmax(prediction['predictions'][0])

0
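
The raw prediction pattern has two halves: encode a JSON body as bytes, then decode the JSON text that comes back. The following is a minimal sketch of both halves with no network call. The helper names and the shortened instance are hypothetical:

```python
import json

def build_raw_body(instances, signature_name="serving_default"):
    """Encode a TensorFlow Serving style request as UTF-8 bytes."""
    payload = {"instances": instances, "signature_name": signature_name}
    return json.dumps(payload).encode("utf-8")

def parse_raw_response(text):
    """Decode the JSON response text and return the list of predictions."""
    return json.loads(text)["predictions"]

# shortened, hypothetical instance (real instances carry all feature columns)
body = build_raw_body([{'Time': 35337, 'Amount': 0.0}])
print(type(body).__name__)

# response text in the same shape as the raw prediction response
response_text = '{\n    "predictions": [[0.99990654, 9.34729251e-05]\n    ]\n}'
print(parse_raw_response(response_text))
```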

#### Async Predictions

Make the call for a prediction an [awaitable](https://docs.python.org/3/library/asyncio-task.html#awaitables) coroutine.

[aiplatform.Endpoint.predict_async()](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.Endpoint#google_cloud_aiplatform_Endpoint_predict_async)

In [42]:
prediction = await endpoint.predict_async(instances = newobs[0:1])
prediction

Prediction(predictions=[[0.99990654, 9.34729251e-05]], deployed_model_id='6452842428394110976', model_version_id='16', model_resource_name='projects/1026793852137/locations/us-central1/models/model_05_05a', explanations=None)

In [43]:
prediction.predictions[0]

[0.99990654, 9.34729251e-05]

In [44]:
np.argmax(prediction.predictions[0])

0

---
### Get Predictions: Python Client (gapic access to v1)

This is functionally the same as the Python Client V1 section below.

[aiplatform.gapic.PredictionServiceClient()](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform_v1.services.prediction_service.PredictionServiceClient)

#### Client

In [45]:
client_options = {"api_endpoint": f"{REGION}-aiplatform.googleapis.com"}

In [46]:
predictor = aiplatform.gapic.PredictionServiceClient(client_options = client_options)

#### Predictions

In [50]:
prediction = predictor.predict(
    endpoint = endpoint.resource_name,
    instances = newobs[0:1]
)
prediction

predictions {
  list_value {
    values {
      number_value: 0.99990654
    }
    values {
      number_value: 9.34729251e-05
    }
  }
}
deployed_model_id: "6452842428394110976"
model: "projects/1026793852137/locations/us-central1/models/model_05_05a"
model_display_name: "05_05a"
model_version_id: "16"

In [51]:
prediction.predictions[0]

[0.99990654, 9.34729251e-05]

In [52]:
np.argmax(prediction.predictions[0])

0

#### Raw Predictions

In [55]:
instances = {"instances": [newobs[0]], "signature_name": "serving_default"}
body = httpbody_pb2.HttpBody(
    data = json.dumps(instances).encode("utf-8"),
    content_type = "application/json"
)

In [56]:
prediction = predictor.raw_predict(
    endpoint = endpoint.resource_name,
    http_body = body
)
prediction

content_type: "application/json"
data: "{\n    \"predictions\": [[0.99990654, 9.34729251e-05]\n    ]\n}"

In [58]:
prediction = json.loads(prediction.data)
prediction

{'predictions': [[0.99990654, 9.34729251e-05]]}

In [59]:
prediction['predictions'][0]

[0.99990654, 9.34729251e-05]

In [60]:
np.argmax(prediction['predictions'][0])

0

#### Async Predictions

Make the call for a prediction an [awaitable](https://docs.python.org/3/library/asyncio-task.html#awaitables) coroutine.

[aiplatform.gapic.PredictionServiceAsyncClient()](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform_v1.services.prediction_service.PredictionServiceAsyncClient#google_cloud_aiplatform_v1_services_prediction_service_PredictionServiceAsyncClient_predict)

In [45]:
client_options = {"api_endpoint": f"{REGION}-aiplatform.googleapis.com"}

In [61]:
async_predictor = aiplatform.gapic.PredictionServiceAsyncClient(client_options = client_options)

In [62]:
prediction = await async_predictor.predict(
    endpoint = endpoint.resource_name,
    instances = newobs[0:1]
)
prediction

predictions {
  list_value {
    values {
      number_value: 0.99990654
    }
    values {
      number_value: 9.34729251e-05
    }
  }
}
deployed_model_id: "6452842428394110976"
model: "projects/1026793852137/locations/us-central1/models/model_05_05a"
model_display_name: "05_05a"
model_version_id: "16"

In [63]:
prediction.predictions[0]

[0.99990654, 9.34729251e-05]

In [64]:
np.argmax(prediction.predictions[0])

0

#### Async Raw Predictions

Make the call for a raw prediction an [awaitable](https://docs.python.org/3/library/asyncio-task.html#awaitables) coroutine.

[aiplatform.gapic.PredictionServiceAsyncClient()](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform_v1.services.prediction_service.PredictionServiceAsyncClient#google_cloud_aiplatform_v1_services_prediction_service_PredictionServiceAsyncClient_predict)

In [45]:
client_options = {"api_endpoint": f"{REGION}-aiplatform.googleapis.com"}

In [61]:
async_predictor = aiplatform.gapic.PredictionServiceAsyncClient(client_options = client_options)

In [65]:
instances = {"instances": [newobs[0]], "signature_name": "serving_default"}
body = httpbody_pb2.HttpBody(
    data = json.dumps(instances).encode("utf-8"),
    content_type = "application/json"
)

In [66]:
prediction = await async_predictor.raw_predict(
    endpoint = endpoint.resource_name,
    http_body = body
)
prediction

content_type: "application/json"
data: "{\n    \"predictions\": [[0.99990654, 9.34729251e-05]\n    ]\n}"

In [67]:
prediction = json.loads(prediction.data)
prediction

{'predictions': [[0.99990654, 9.34729251e-05]]}

In [68]:
prediction['predictions'][0]

[0.99990654, 9.34729251e-05]

In [69]:
np.argmax(prediction['predictions'][0])

0

---
### Get Predictions: Python Client V1

This is functionally the same as the Python Client gapic section above.

[aiplatform_v1.PredictionServiceClient()](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform_v1.services.prediction_service.PredictionServiceClient)

#### Client

In [70]:
from google.cloud import aiplatform_v1

In [71]:
client_options = {"api_endpoint": f"{REGION}-aiplatform.googleapis.com"}

In [72]:
predictor = aiplatform_v1.PredictionServiceClient(client_options = client_options)

#### Predictions

In [73]:
prediction = predictor.predict(
    endpoint = endpoint.resource_name,
    instances = newobs[0:1]
)
prediction

predictions {
  list_value {
    values {
      number_value: 0.99990654
    }
    values {
      number_value: 9.34729251e-05
    }
  }
}
deployed_model_id: "6452842428394110976"
model: "projects/1026793852137/locations/us-central1/models/model_05_05a"
model_display_name: "05_05a"
model_version_id: "16"

In [74]:
prediction.predictions[0]

[0.99990654, 9.34729251e-05]

In [75]:
np.argmax(prediction.predictions[0])

0

#### Raw Predictions

In [76]:
instances = {"instances": [newobs[0]], "signature_name": "serving_default"}
body = httpbody_pb2.HttpBody(
    data = json.dumps(instances).encode("utf-8"),
    content_type = "application/json"
)

In [77]:
prediction = predictor.raw_predict(
    endpoint = endpoint.resource_name,
    http_body = body
)
prediction

content_type: "application/json"
data: "{\n    \"predictions\": [[0.99990654, 9.34729251e-05]\n    ]\n}"

In [78]:
prediction = json.loads(prediction.data)
prediction

{'predictions': [[0.99990654, 9.34729251e-05]]}

In [79]:
prediction['predictions'][0]

[0.99990654, 9.34729251e-05]

In [80]:
np.argmax(prediction['predictions'][0])

0

#### Async Predictions

Make the call for a prediction an [awaitable](https://docs.python.org/3/library/asyncio-task.html#awaitables) coroutine.

[aiplatform_v1.PredictionServiceAsyncClient()](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform_v1.services.prediction_service.PredictionServiceAsyncClient)

In [45]:
client_options = {"api_endpoint": f"{REGION}-aiplatform.googleapis.com"}

In [82]:
async_predictor = aiplatform_v1.PredictionServiceAsyncClient(client_options = client_options)

In [83]:
prediction = await async_predictor.predict(
    endpoint = endpoint.resource_name,
    instances = newobs[0:1]
)
prediction

predictions {
  list_value {
    values {
      number_value: 0.99990654
    }
    values {
      number_value: 9.34729251e-05
    }
  }
}
deployed_model_id: "6452842428394110976"
model: "projects/1026793852137/locations/us-central1/models/model_05_05a"
model_display_name: "05_05a"
model_version_id: "16"

In [84]:
prediction.predictions[0]

[0.99990654, 9.34729251e-05]

In [85]:
np.argmax(prediction.predictions[0])

0

#### Async Raw Predictions

Make the call for a raw prediction an [awaitable](https://docs.python.org/3/library/asyncio-task.html#awaitables) coroutine.

[aiplatform_v1.PredictionServiceAsyncClient()](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform_v1.services.prediction_service.PredictionServiceAsyncClient#google_cloud_aiplatform_v1_services_prediction_service_PredictionServiceAsyncClient_predict)

In [45]:
client_options = {"api_endpoint": f"{REGION}-aiplatform.googleapis.com"}

In [86]:
async_predictor = aiplatform_v1.PredictionServiceAsyncClient(client_options = client_options)

In [87]:
instances = {"instances": [newobs[0]], "signature_name": "serving_default"}
body = httpbody_pb2.HttpBody(
    data = json.dumps(instances).encode("utf-8"),
    content_type = "application/json"
)

In [88]:
prediction = await async_predictor.raw_predict(
    endpoint = endpoint.resource_name,
    http_body = body
)
prediction

content_type: "application/json"
data: "{\n    \"predictions\": [[0.99990654, 9.34729251e-05]\n    ]\n}"

In [89]:
prediction = json.loads(prediction.data)
prediction

{'predictions': [[0.99990654, 9.34729251e-05]]}

In [90]:
prediction['predictions'][0]

[0.99990654, 9.34729251e-05]

In [91]:
np.argmax(prediction['predictions'][0])

0

---
### Get Predictions: Python Client V1 beta 1

[aiplatform_v1beta1.PredictionServiceClient()](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform_v1beta1.services.prediction_service.PredictionServiceClient)

#### Client

In [92]:
from google.cloud import aiplatform_v1beta1

In [93]:
client_options = {"api_endpoint": f"{REGION}-aiplatform.googleapis.com"}

In [94]:
predictor = aiplatform_v1beta1.PredictionServiceClient(client_options = client_options)

#### Predictions

In [95]:
prediction = predictor.predict(
    endpoint = endpoint.resource_name,
    instances = newobs[0:1]
)
prediction

predictions {
  list_value {
    values {
      number_value: 0.99990654
    }
    values {
      number_value: 9.34729251e-05
    }
  }
}
deployed_model_id: "6452842428394110976"
model: "projects/1026793852137/locations/us-central1/models/model_05_05a"
model_display_name: "05_05a"
model_version_id: "16"

In [96]:
prediction.predictions[0]

[0.99990654, 9.34729251e-05]

In [97]:
np.argmax(prediction.predictions[0])

0

#### Raw Predictions

In [98]:
instances = {"instances": [newobs[0]], "signature_name": "serving_default"}
body = httpbody_pb2.HttpBody(
    data = json.dumps(instances).encode("utf-8"),
    content_type = "application/json"
)

In [99]:
prediction = predictor.raw_predict(
    endpoint = endpoint.resource_name,
    http_body = body
)
prediction

content_type: "application/json"
data: "{\n    \"predictions\": [[0.99990654, 9.34729251e-05]\n    ]\n}"

In [100]:
prediction = json.loads(prediction.data)
prediction

{'predictions': [[0.99990654, 9.34729251e-05]]}

In [101]:
prediction['predictions'][0]

[0.99990654, 9.34729251e-05]

In [102]:
np.argmax(prediction['predictions'][0])

0

#### Async Predictions

Make the call for a prediction an [awaitable](https://docs.python.org/3/library/asyncio-task.html#awaitables) coroutine.

[aiplatform_v1beta1.PredictionServiceAsyncClient()](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform_v1beta1.services.prediction_service.PredictionServiceAsyncClient)

In [45]:
client_options = {"api_endpoint": f"{REGION}-aiplatform.googleapis.com"}

In [104]:
async_predictor = aiplatform_v1beta1.PredictionServiceAsyncClient(client_options = client_options)

In [105]:
prediction = await async_predictor.predict(
    endpoint = endpoint.resource_name,
    instances = newobs[0:1]
)
prediction

predictions {
  list_value {
    values {
      number_value: 0.99990654
    }
    values {
      number_value: 9.34729251e-05
    }
  }
}
deployed_model_id: "6452842428394110976"
model: "projects/1026793852137/locations/us-central1/models/model_05_05a"
model_display_name: "05_05a"
model_version_id: "16"

In [106]:
prediction.predictions[0]

[0.99990654, 9.34729251e-05]

In [107]:
np.argmax(prediction.predictions[0])

0

#### Async Raw Predictions

Make the call for a raw prediction an [awaitable](https://docs.python.org/3/library/asyncio-task.html#awaitables) coroutine.

[aiplatform_v1beta1.PredictionServiceAsyncClient()](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform_v1beta1.services.prediction_service.PredictionServiceAsyncClient)

In [45]:
client_options = {"api_endpoint": f"{REGION}-aiplatform.googleapis.com"}

In [108]:
async_predictor = aiplatform_v1beta1.PredictionServiceAsyncClient(client_options = client_options)

In [109]:
instances = {"instances": [newobs[0]], "signature_name": "serving_default"}
body = httpbody_pb2.HttpBody(
    data = json.dumps(instances).encode("utf-8"),
    content_type = "application/json"
)

In [110]:
prediction = await async_predictor.raw_predict(
    endpoint = endpoint.resource_name,
    http_body = body
)
prediction

content_type: "application/json"
data: "{\n    \"predictions\": [[0.99990654, 9.34729251e-05]\n    ]\n}"

In [111]:
prediction = json.loads(prediction.data)
prediction

{'predictions': [[0.99990654, 9.34729251e-05]]}

In [112]:
prediction['predictions'][0]

[0.99990654, 9.34729251e-05]

In [113]:
np.argmax(prediction['predictions'][0])

0

---
### Get Prediction: REST

REST Resource [v1.projects.locations.endpoints](https://cloud.google.com/vertex-ai/docs/reference/rest#rest-resource:-v1.projects.locations.endpoints)

#### Method 1: Command Line CURL

In [130]:
with open(f'{DIR}/request.json','w') as file:
    file.write(json.dumps({"instances": newobs[0:1]}))

In [131]:
prediction = !curl -s -X POST \
-H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
-H "Content-Type: application/json; charset=utf-8" \
-d @{DIR}/request.json \
https://{REGION}-aiplatform.googleapis.com/v1/{endpoint.resource_name}:predict

prediction = json.loads(''.join([p.strip() for p in prediction]))
prediction

{'predictions': [[0.99990654, 9.34729251e-05]],
 'deployedModelId': '6452842428394110976',
 'model': 'projects/1026793852137/locations/us-central1/models/model_05_05a',
 'modelDisplayName': '05_05a',
 'modelVersionId': '16'}

In [132]:
prediction['predictions'][0]

[0.99990654, 9.34729251e-05]

In [133]:
np.argmax(prediction['predictions'][0])

0
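
The REST URL is assembled from the region, the endpoint resource name, and the method name after the colon. The following is a minimal sketch of a helper that builds it. The function name `prediction_url` is hypothetical, and the list of accepted methods is an assumption limited to the methods mentioned in this notebook:

```python
def prediction_url(region, resource_name, method="predict", version="v1"):
    """Build the REST URL for an endpoint prediction method."""
    # assumption: only the methods discussed in this notebook are accepted
    if method not in ("predict", "rawPredict", "explain"):
        raise ValueError(f"unsupported method: {method}")
    return f"https://{region}-aiplatform.googleapis.com/{version}/{resource_name}:{method}"

resource_name = "projects/1026793852137/locations/us-central1/endpoints/7422110306790277120"
print(prediction_url("us-central1", resource_name))
print(prediction_url("us-central1", resource_name, method="rawPredict"))
```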

##### Use CURL for Raw Predictions

In [134]:
with open(f'{DIR}/request.json','w') as file:
    file.write(json.dumps({"signature_name": "serving_default", "instances": [newobs[0]]}))

In [135]:
prediction = !curl -s -X POST \
-H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
-H "Content-Type: application/json; charset=utf-8" \
-d @{DIR}/request.json \
https://{REGION}-aiplatform.googleapis.com/v1/{endpoint.resource_name}:rawPredict

prediction = json.loads(''.join([p.strip() for p in prediction]))
prediction

{'predictions': [[0.99990654, 9.34729251e-05]]}

In [136]:
prediction['predictions'][0]

[0.99990654, 9.34729251e-05]

In [137]:
np.argmax(prediction['predictions'][0])

0

#### Method 2: Python with requests

In [138]:
import requests

In [139]:
token = !gcloud auth application-default print-access-token
headers = {
    "content-type": "application/json; charset=utf-8",
    "Authorization": f'Bearer {token[0]}'
}
json_response = requests.post(
    f'https://{REGION}-aiplatform.googleapis.com/v1/{endpoint.resource_name}:predict',
    data = json.dumps({"instances": [newobs[0]]}),
    headers = headers
)

In [140]:
print(json_response.text)

{
  "predictions": [
    [
      0.99990654,
      9.34729251e-05
    ]
  ],
  "deployedModelId": "6452842428394110976",
  "model": "projects/1026793852137/locations/us-central1/models/model_05_05a",
  "modelDisplayName": "05_05a",
  "modelVersionId": "16"
}



In [141]:
predictions = json.loads(json_response.text)['predictions']
predictions

[[0.99990654, 9.34729251e-05]]

In [142]:
np.argmax(predictions[0])

0

##### Use Requests for Raw Predictions

In [103]:
import requests

In [143]:
token = !gcloud auth application-default print-access-token
headers = {
    "content-type": "application/json; charset=utf-8", 
    "Authorization": f'Bearer {token[0]}'
}
json_response = requests.post(
    f'https://{REGION}-aiplatform.googleapis.com/v1/{endpoint.resource_name}:rawPredict', 
    data = json.dumps({"signature_name": "serving_default", "instances": [newobs[0]]}),
    headers = headers
)

In [144]:
print(json_response.text)

{
    "predictions": [[0.99990654, 9.34729251e-05]
    ]
}


In [145]:
predictions = json.loads(json_response.text)['predictions']
predictions

[[0.99990654, 9.34729251e-05]]

In [146]:
np.argmax(predictions[0])

0

---
### Get Prediction: gcloud (CLI)

[gcloud ai endpoints](https://cloud.google.com/sdk/gcloud/reference/ai/endpoints)

In [147]:
with open(f'{DIR}/request.json','w') as file:
    file.write(json.dumps({"instances": [newobs[0]]}))

In [148]:
prediction = !gcloud ai endpoints predict {endpoint.name.rsplit('/',1)[-1]} --region={REGION} --json-request={DIR}/request.json
prediction

['Using endpoint [https://us-central1-prediction-aiplatform.googleapis.com/]',
 '[[0.99990654, 9.34729251e-05]]']

In [149]:
import ast
prediction = ast.literal_eval(prediction[1])
prediction[0]

[0.99990654, 9.34729251e-05]

In [150]:
np.argmax(prediction[0])

0
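
The captured `gcloud` output mixes an informational line with the prediction itself, so parsing by position can be fragile. The following is a minimal sketch that filters by content instead, assuming informational lines start with `Using endpoint` as in the output above. The helper name `parse_gcloud_predict` is hypothetical:

```python
import json

def parse_gcloud_predict(lines):
    """Parse captured gcloud prediction output lines into a Python object."""
    # drop informational lines such as 'Using endpoint [...]'
    payload = [line for line in lines if not line.startswith('Using endpoint')]
    # the remaining lines form one JSON document, possibly split across lines
    return json.loads(''.join(payload))

captured = ['Using endpoint [https://us-central1-prediction-aiplatform.googleapis.com/]',
            '[[0.99990654, 9.34729251e-05]]']
print(parse_gcloud_predict(captured))
```

The same helper should also handle output that spans several lines, since the lines are joined before parsing.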

#### Use gcloud (CLI) For Raw Predictions

In [151]:
with open(f'{DIR}/request.json','w') as file:
    file.write(json.dumps({"signature_name": "serving_default", "instances": [newobs[0]]}))

In [152]:
prediction = !gcloud ai endpoints raw-predict {endpoint.name.rsplit('/',1)[-1]} --region={REGION} --format="json" --request=@{DIR}/request.json
prediction

['Using endpoint [https://us-central1-aiplatform.googleapis.com/]',
 '{',
 '  "predictions": [',
 '    [',
 '      0.99990654,',
 '      9.34729251e-05',
 '    ]',
 '  ]',
 '}']

In [153]:
prediction = json.loads("".join(prediction[1:]))
prediction

{'predictions': [[0.99990654, 9.34729251e-05]]}

In [154]:
prediction['predictions'][0]

[0.99990654, 9.34729251e-05]

In [155]:
np.argmax(prediction['predictions'][0])

0

---
## Requesting Many Predictions: Synchronous and Asynchronous

There are times when you want to make many requests of an endpoint for predictions.  If you send requests one at a time (synchronously), the endpoint fulfills each request as it receives it.  An endpoint is also designed to handle simultaneous requests, so a client can send them concurrently (asynchronously).  If you set `max_replica_count` > 1 when deploying the model to the endpoint then it will also scale up to handle the amount of traffic.  
- [Configure compute resources for prediction](https://cloud.google.com/vertex-ai/docs/predictions/configure-compute)

Making concurrent requests from Python is a concurrency task.  The examples below use `asyncio`, which interleaves requests on a single thread rather than running multiple processes.  A review of the tip notebook [Python Multiprocessing](../Tips/Python%20Multiprocessing.ipynb) can be helpful for understanding the method used below to make asynchronous requests concurrently using `asyncio`.
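
The core idea of bounded concurrency can be tried without an endpoint. The following is a minimal sketch, assuming a stand-in coroutine `fake_predict` (hypothetical) that sleeps instead of calling a service. A semaphore caps the number of requests in flight, and results are written by index so they stay in input order:

```python
import asyncio

async def fake_predict(instance):
    """Stand-in for a network call: waits briefly, then returns a dummy class."""
    await asyncio.sleep(0.01)
    return 0

async def run_bounded(instances, limit_concur_request=10):
    limit = asyncio.Semaphore(limit_concur_request)
    in_flight = 0   # requests currently running
    peak = 0        # highest number of simultaneous requests observed
    results = [None] * len(instances)

    async def worker(i, instance):
        nonlocal in_flight, peak
        async with limit:
            in_flight += 1
            peak = max(peak, in_flight)
            # store by index so completion order does not matter
            results[i] = await fake_predict(instance)
            in_flight -= 1

    await asyncio.gather(*(worker(i, x) for i, x in enumerate(instances)))
    return results, peak

results, peak = asyncio.run(run_bounded(list(range(50)), limit_concur_request=5))
print(len(results), peak)
```

In a notebook an event loop is already running, so use `await run_bounded(...)` in place of `asyncio.run(...)`.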

---
## Online Predictions: Synchronous Examples
Synchronous calls to the Vertex AI Endpoint with different batch sizes of instances.  Each call packages multiple prediction requests into a single request to the endpoint, and the number of instances per call is `batch_size`.
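
Batching is list slicing: each request carries the next `batch_size` instances, and the last request may carry fewer. The following is a minimal sketch of that slicing on placeholder data, with no endpoint involved. The generator name `batches` is hypothetical:

```python
def batches(instances, batch_size=1):
    """Yield consecutive slices of at most batch_size instances."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(instances), batch_size):
        yield instances[start:start + batch_size]

instances = list(range(1000))  # placeholder for 1000 feature records
for size in (1, 2, 10, 300):
    chunks = list(batches(instances, size))
    print(f"batch_size={size}: {len(chunks)} requests, last batch has {len(chunks[-1])} instances")
```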

In [156]:
len(newobs)

1000

In [173]:
def syncPredictions(instances, batch_size = 1):
    predictions = []
    start = time.perf_counter()
    # a loop where each step requests predictions for batch_size instances in a single request
    for p in range(0, len(instances), batch_size):
        #instances = [json_format.ParseDict(example, Value()) for example in newobs[p:p+batch_size]]
        preds = endpoint.predict(instances = instances[p:p+batch_size])
        predictions.extend(np.argmax(pred) for pred in preds.predictions)
    elapsed = time.perf_counter() - start
    print(f'{elapsed:0.5f} seconds for {len(instances)} instances in synchronous batches of size = {batch_size}')
    return predictions

In [175]:
# default batch_size = 1
predictions = syncPredictions(newobs)

13.41630 seconds for 1000 instances in synchronous batches of size = 1


In [176]:
# specify batch_size = 2 - expecting half the time if the endpoint can handle multiple at the same time
predictions = syncPredictions(newobs, batch_size = 2)

6.59109 seconds for 1000 instances in synchronous batches of size = 2


In [177]:
# specify batch_size = 10 - expecting 1/10 the time if the endpoint can handle this many at the same time
predictions = syncPredictions(newobs, batch_size = 10)

1.69778 seconds for 1000 instances in synchronous batches of size = 10


In [178]:
# get a count of the number of predictions that resulted in 0 (not fraud) and 1 (fraud)
from collections import Counter
c = Counter(predictions)
c

Counter({0: 998, 1: 2})

In [186]:
# get the index for the predictions that resulted in predicting = 1 (fraud)
[i for i, j in enumerate(predictions) if j == 1]

[53, 576]

In [187]:
# review the inputs that lead to a prediction = 1 (fraud)
samples.iloc[[i for i, j in enumerate(predictions) if j == 1]]

| | Time | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | ... | V20 | V21 | V22 | V23 | V24 | V25 | V26 | V27 | V28 | Amount |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 53 | 85285 | -7.030308 | 3.421991 | -9.525072 | 5.270891 | -4.02463 | -2.865682 | -6.989195 | 3.791551 | -4.62273 | ... | 0.545698 | 1.103398 | -0.541855 | 0.036943 | -0.355519 | 0.353634 | 1.042458 | 1.359516 | -0.272188 | 0.0 |
| 576 | 56887 | -0.075483 | 1.812355 | -2.566981 | 4.127549 | -1.628532 | -0.805895 | -3.390135 | 1.019353 | -2.451251 | ... | 0.338598 | 0.794372 | 0.270471 | -0.143624 | 0.013566 | 0.634203 | 0.213693 | 0.773625 | 0.387434 | 5.0 |


---
## Online Predictions: Asynchronous Examples

Asynchronous calls to the Vertex AI Endpoint with different batch sizes and numbers of concurrent requests.  

In [236]:
len(newobs)

1000

In [227]:
async def asyncPredictions(instances, batch_size = 1, limit_concur_request = 10):
    limit = asyncio.Semaphore(limit_concur_request)
    # requests come back out of order so create an ordered structure to capture them
    predictions = [None] * len(instances)
    
    # function to make prediction requests
    async def predictor(p, instances):
        async with limit:
            if limit.locked():
                await asyncio.sleep(.01)
            preds = await endpoint.predict_async(instances = instances)
        predictions[p:p+batch_size] = [np.argmax(pred) for pred in preds.predictions]

    # function to manage concurrent prediction requests
    async def runner(instances):
        tasks = []
        for p in range(0, len(instances), batch_size):
            task = asyncio.create_task(predictor(p, instances[p:p+batch_size]))
            tasks.append(task)
        results = await asyncio.gather(*tasks)
    
    start = time.perf_counter()
    await runner(instances)
    elapsed = time.perf_counter() - start
    print(f'{elapsed:0.5f} seconds for {len(instances)} instances in asynchronous batches of size = {batch_size} managed within {limit_concur_request} concurrent requests')
    
    return predictions

In [228]:
# force synchronous request in batch_size = 1
predictions = await asyncPredictions(newobs, batch_size = 1, limit_concur_request = 1)

25.39337 seconds for 1000 instances in asynchronous batches of size = 1 managed within 1 concurrent requests


In [229]:
# force synchronous request in batch_size = 2
predictions = await asyncPredictions(newobs, batch_size = 2, limit_concur_request = 1)

14.36343 seconds for 1000 instances in asynchronous batches of size = 2 managed within 1 concurrent requests


In [230]:
# force synchronous request in batch_size = 10
predictions = await asyncPredictions(newobs, batch_size = 10, limit_concur_request = 1)

2.89454 seconds for 1000 instances in asynchronous batches of size = 10 managed within 1 concurrent requests


In [231]:
# force asynchronous with 2 concurrent and batch_size = 1
predictions = await asyncPredictions(newobs, batch_size = 1, limit_concur_request = 2)

12.36634 seconds for 1000 instances in asynchronous batches of size = 1 managed within 2 concurrent requests


In [232]:
# force asynchronous with 10 concurrent and batch_size = 1
predictions = await asyncPredictions(newobs, batch_size = 1, limit_concur_request = 10)

2.48452 seconds for 1000 instances in asynchronous batches of size = 1 managed within 10 concurrent requests


In [233]:
# force asynchronous with 20 concurrent and batch_size = 1
predictions = await asyncPredictions(newobs, batch_size = 1, limit_concur_request = 20)

1.45734 seconds for 1000 instances in asynchronous batches of size = 1 managed within 20 concurrent requests


In [234]:
# repeat asynchronous with 20 concurrent and batch_size = 1
predictions = await asyncPredictions(newobs, batch_size = 1, limit_concur_request = 20)

1.44272 seconds for 1000 instances in asynchronous batches of size = 1 managed within 20 concurrent requests


In [240]:
# all at once with 10 concurrent and batch_size = 100
predictions = await asyncPredictions(newobs, batch_size = 100, limit_concur_request = 10)

0.27778 seconds for 1000 instances in asynchronous batches of size = 100 managed within 10 concurrent requests


In [241]:
# all at once with 100 concurrent and batch_size = 10
predictions = await asyncPredictions(newobs, batch_size = 10, limit_concur_request = 100)

0.39970 seconds for 1000 instances in asynchronous batches of size = 10 managed within 100 concurrent requests
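
The "all at once" runs above follow from simple arithmetic: the number of requests is the instance count divided by the batch size, rounded up, and those requests proceed in groups no larger than the concurrency limit. The following is a rough sketch of that calculation, assuming an idealized model in which requests complete in lockstep "waves". The function name `request_plan` is hypothetical:

```python
import math

def request_plan(n_instances, batch_size, limit_concur_request):
    """Return (number of requests, number of sequential waves of requests)."""
    requests_needed = math.ceil(n_instances / batch_size)
    # idealized: each wave runs up to limit_concur_request requests together
    waves = math.ceil(requests_needed / limit_concur_request)
    return requests_needed, waves

# the (batch_size, limit_concur_request) pairs used in the runs above
for batch_size, limit in [(1, 1), (1, 10), (1, 20), (100, 10), (10, 100)]:
    print(batch_size, limit, request_plan(1000, batch_size, limit))
```

A single wave means every request is in flight at the same time. Real timings also depend on network latency and on how many requests the endpoint's replicas can serve together.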


In [242]:
# get a count of the number of predictions that resulted in 0 (not fraud) and 1 (fraud)
from collections import Counter
c = Counter(predictions)
c

Counter({0: 998, 1: 2})

In [243]:
# get the index for the predictions that resulted in predicting = 1 (fraud)
[i for i, j in enumerate(predictions) if j == 1]

[53, 576]

In [244]:
# review the inputs that lead to a prediction = 1 (fraud)
samples.iloc[[i for i, j in enumerate(predictions) if j == 1]]

| | Time | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | ... | V20 | V21 | V22 | V23 | V24 | V25 | V26 | V27 | V28 | Amount |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 53 | 85285 | -7.030308 | 3.421991 | -9.525072 | 5.270891 | -4.02463 | -2.865682 | -6.989195 | 3.791551 | -4.62273 | ... | 0.545698 | 1.103398 | -0.541855 | 0.036943 | -0.355519 | 0.353634 | 1.042458 | 1.359516 | -0.272188 | 0.0 |
| 576 | 56887 | -0.075483 | 1.812355 | -2.566981 | 4.127549 | -1.628532 | -0.805895 | -3.390135 | 1.019353 | -2.451251 | ... | 0.338598 | 0.794372 | 0.270471 | -0.143624 | 0.013566 | 0.634203 | 0.213693 | 0.773625 | 0.387434 | 5.0 |
