# Serving Machine Learning models with Google Vertex AI

Great to have you here. This is the code for the following article:

* https://medium.com/google-cloud/serving-machine-learning-models-with-google-vertex-ai-5d9644ededa3

Your feedback and questions are highly appreciated. <br>You can find me on Twitter [@HeyerSascha](https://twitter.com/HeyerSascha) or connect with me via [LinkedIn](https://www.linkedin.com/in/saschaheyer/). <br>Even better, subscribe to my [YouTube](https://www.youtube.com/channel/UC--Sm3D-rqCUeLXmraypdPQ) channel ❤️.

In [None]:
# @title
from IPython.display import HTML

HTML(
    '<iframe width="560" height="315" src="https://www.youtube.com/embed/brNMT7Snlh0" frameborder="0" allowfullscreen></iframe>'
)

In [None]:
from google.colab import auth

auth.authenticate_user()

In [None]:
!gcloud config set project joyas-vietnam

Updated property [core/project].


In [2]:
!gcloud auth application-default login

Your browser has been opened to visit:

    https://accounts.google.com/o/oauth2/auth?response_type=code&client_id=764086051850-6qr4p6gpi6hn506pt8ejuq83di341hur.apps.googleusercontent.com&redirect_uri=http%3A%2F%2Flocalhost%3A8085%2F&scope=openid+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fuserinfo.email+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcloud-platform+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fsqlservice.login&state=7mOLKckXlVY9JOuUG5cReF1Se1FHaN&access_type=offline&code_challenge=DfMbxtUPqVNvcwm7Q7D2rPuMOJYkofCHLWDr5M0UWao&code_challenge_method=S256


Credentials saved to file: [/Users/datkhong/.config/gcloud/application_default_credentials.json]

These credentials will be used by any library that requests Application Default Credentials (ADC).

Quota project "joyas-vietnam" was added to ADC which can be used by Google client libraries for billing and quota. Note that some services may still bill the project owning the resource.



## Custom Prediction Container with FastAPI

In [None]:
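from google.cloud import aiplatform

PROJECT_ID = "joyas-vietnam"
PIPELINE_ROOT = "gs://dev-joyas-recommendation/"
# Use a concrete region such as "asia-southeast1" rather than the multi-region "asia"
LOCATION = "asia-southeast1"

aiplatform.init(project=PROJECT_ID, location=LOCATION)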

In [1]:
%%writefile main.py
import uvicorn

import tensorflow as tf
import os
import numpy as np
from enum import Enum
from typing import List, Optional
from pydantic import BaseModel

from fastapi import Request, FastAPI, Response
from fastapi.responses import JSONResponse
from transformers import DistilBertTokenizerFast
from transformers import TFDistilBertForSequenceClassification

tokenizer = DistilBertTokenizerFast.from_pretrained("distilbert-base-uncased")
model = TFDistilBertForSequenceClassification.from_pretrained("../sentiment")

app = FastAPI(title="Sentiment Analysis")

AIP_HEALTH_ROUTE = os.environ.get("AIP_HEALTH_ROUTE", "/health")
AIP_PREDICT_ROUTE = os.environ.get("AIP_PREDICT_ROUTE", "/predict")


class Prediction(BaseModel):
    sentiment: str
    confidence: Optional[float]


class Predictions(BaseModel):
    predictions: List[Prediction]


# Instead of creating a class, we could also have loaded this information
# from the model configuration, which is better if you introduce new labels over time.
class Sentiment(Enum):
    NEGATIVE = 0
    POSITIVE = 1


@app.get(AIP_HEALTH_ROUTE, status_code=200)
async def health():
    return {"health": "ok"}


@app.post(
    AIP_PREDICT_ROUTE,
    response_model=Predictions,
    response_model_exclude_unset=True,
)
async def predict(request: Request):
    body = await request.json()
    print(body)

    instances = body["instances"]
    print(instances)
    print(type(instances))
    instances = [x["text"] for x in instances]
    print(instances)

    tf_batch = tokenizer(
        instances,
        max_length=128,
        padding=True,
        truncation=True,
        return_tensors="tf",
    )

    print(tf_batch)

    tf_outputs = model(tf_batch)

    print(tf_outputs)

    tf_predictions = tf.nn.softmax(tf_outputs[0], axis=-1)
    print(tf_predictions)

    indices = np.argmax(tf_predictions, axis=-1)
    confidences = np.max(tf_predictions, axis=-1)

    outputs = []

    for index, confidence in zip(indices, confidences):
        sentiment = Sentiment(index).name
        print(index)
        print(confidence)
        outputs.append(Prediction(sentiment=sentiment, confidence=confidence))

    return Predictions(predictions=outputs)


if __name__ == "__main__":
    # FastAPI apps have no run() method; start the ASGI server through uvicorn instead
    uvicorn.run(app, host="0.0.0.0", port=8080)

Writing main.py
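
The post-processing inside `predict` (softmax over the logits, then argmax to pick a label) can be tried without TensorFlow or the model. The sketch below is a pure-Python approximation of that step, not the code that runs in the container. The helper names `softmax` and `postprocess` and the example logits are assumptions made for illustration.

```python
import math
from enum import Enum


class Sentiment(Enum):
    NEGATIVE = 0
    POSITIVE = 1


def softmax(logits):
    """Numerically stable softmax over one row of logits."""
    highest = max(logits)
    exps = [math.exp(x - highest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def postprocess(batch_logits):
    """Turn a batch of logit rows into the response shape used by main.py."""
    predictions = []
    for logits in batch_logits:
        probs = softmax(logits)
        # argmax: index of the most probable class
        index = max(range(len(probs)), key=probs.__getitem__)
        predictions.append(
            {"sentiment": Sentiment(index).name, "confidence": probs[index]}
        )
    return {"predictions": predictions}


# Hypothetical logits for two inputs (not real model output).
example = postprocess([[-1.0, 2.0], [3.0, -2.5]])
print(example)
```

The returned dictionary has the same `predictions` key that Vertex AI expects in a custom container response.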


In [14]:
%%writefile Dockerfile
FROM tiangolo/uvicorn-gunicorn-fastapi:python3.8-slim
RUN pip install --no-cache-dir transformers==4.1.1 tensorflow==2.9.1 numpy==1.23.1 pydantic==1.9.1
COPY main.py ./main.py
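# main.py loads the model from ../sentiment (that is, /sentiment).
# Uncomment once ./sentiment is in the build context, e.g. after the Cloud Build download step.
# COPY ./sentiment /sentiment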

Overwriting Dockerfile


In [15]:
!docker build -t sentiment-fast-api .

#0 building with "desktop-linux" instance using docker driver

#1 [internal] load build definition from Dockerfile
#1 transferring dockerfile: 249B done
#1 DONE 0.0s

#2 [internal] load metadata for docker.io/tiangolo/uvicorn-gunicorn-fastapi:python3.8-slim
#2 DONE 0.9s

#3 [internal] load .dockerignore
#3 transferring context: 2B done
#3 DONE 0.0s

#4 [internal] load build context
#4 transferring context: 29B done
#4 DONE 0.0s

#5 [1/3] FROM docker.io/tiangolo/uvicorn-gunicorn-fastapi:python3.8-slim@sha256:cce370ade672f3bfcac80d0c80314fc6b6530d3c623dab384af12da76cd2db6b
#5 resolve docker.io/tiangolo/uvicorn-gunicorn-fastapi:python3.8-slim@sha256:cce370ade672f3bfcac80d0c80314fc6b6530d3c623dab384af12da76cd2db6b 0.0s done
#5 sha256:bb068c84195eeb57b82fe8388fea53eb6fa847d8ba240f27c3a3866b34f9669c 0B / 11.67MB 0.1s
#5 sha256:a2318d6c47ec9cac5acc500c47c79602bcf953cec711a18bc898911a0984365b 0B / 29.13MB 0.1s
#5 sha256:fdb547ee6440a47755ef9d71a7c5d0f0686966305c2d17667ea5658657a7ef6b 3.51kB / 

In [17]:
%%writefile cloudbuild.yaml
steps:
# Download the model to embed it into the image
- name: 'gcr.io/cloud-builders/gsutil'
  args: ['cp','-r', 'gs://dev-joyas-recommendation/models/sentiment', '.']
  id: 'download-model'
# Build the container image
- name: 'gcr.io/cloud-builders/docker'
  args: ['build', '-t', 'gcr.io/joyas-vietnam/sentiment-fast-api', '.']
  waitFor: ['download-model']
# Push the container image to Container Registry
- name: 'gcr.io/cloud-builders/docker'
  args: ['push', 'gcr.io/joyas-vietnam/sentiment-fast-api']

images:
- gcr.io/joyas-vietnam/sentiment-fast-api

Overwriting cloudbuild.yaml


In [18]:
!gcloud builds submit --config cloudbuild.yaml

^C


## Upload and deploy model

In [None]:
!gcloud ai models upload \
  --container-ports=80 \
  --container-predict-route="/predict" \
  --container-health-route="/health" \
  --region=us-central1 \
  --display-name=sentiment-fast-api \
  --container-image-uri=gcr.io/sascha-playground-doit/sentiment-fast-api

Using endpoint [https://us-central1-aiplatform.googleapis.com/]


In [None]:
!gcloud ai endpoints create \
  --project=sascha-playground-doit \
  --region=us-central1 \
  --display-name=sentiment-fast-api

Using endpoint [https://us-central1-aiplatform.googleapis.com/]
Created Vertex AI endpoint: projects/234439745674/locations/us-central1/endpoints/7608484124768075776.


Get the model and endpoint IDs from the previous steps.
Deployment takes approximately 15 minutes.
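
The IDs are the last path segment of the resource names that `gcloud` prints. As a minimal sketch, assuming the output keeps the `projects/.../locations/.../endpoints/...` form shown above, a regular expression can pull them out. The helper `parse_resource_name` is a hypothetical name, not part of any Google library.

```python
import re

# Matches Vertex AI resource names for endpoints and models.
RESOURCE_PATTERN = re.compile(
    r"projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)/"
    r"(?P<collection>endpoints|models)/(?P<resource_id>\d+)"
)


def parse_resource_name(text):
    """Return the project, location, collection and ID found in the text."""
    match = RESOURCE_PATTERN.search(text)
    if match is None:
        raise ValueError(f"no Vertex AI resource name found in: {text!r}")
    return match.groupdict()


line = (
    "Created Vertex AI endpoint: "
    "projects/234439745674/locations/us-central1/endpoints/7608484124768075776."
)
info = parse_resource_name(line)
print(info["resource_id"])  # prints 7608484124768075776
```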

In [None]:
!gcloud ai endpoints deploy-model 7608484124768075776 \
  --project=sascha-playground-doit \
  --region=us-central1 \
  --model=8709323962590429184 \
  --traffic-split=0=100 \
  --machine-type="n1-standard-2" \
  --display-name=sentiment

Using endpoint [https://us-central1-aiplatform.googleapis.com/]
Deployed a model to the endpoint 7608484124768075776. Id of the deployed model: 1352896281420234752.


## Predictions


### gcloud

In [None]:
%%writefile request.json
{
    "instances": [
        {"text": "DoiT is a great company."},
        {"text": "The beach was nice but overall the hotel was very bad."}
    ]
}

Overwriting request.json
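
Generating the request file with a JSON serialiser is one way to avoid syntax slips such as trailing commas, which strict JSON parsers reject. The sketch below builds the same `instances` payload from a list of texts. The helpers `build_request` and `write_request` are hypothetical names used only for this example.

```python
import json
from pathlib import Path


def build_request(texts):
    """Wrap each text in the {"text": ...} shape that main.py reads."""
    return {"instances": [{"text": text} for text in texts]}


def write_request(texts, path="request.json"):
    """Serialise the payload to disk and return it for inspection."""
    payload = build_request(texts)
    Path(path).write_text(json.dumps(payload, indent=4), encoding="utf-8")
    return payload


payload = write_request(
    [
        "DoiT is a great company.",
        "The beach was nice but overall the hotel was very bad.",
    ]
)

# Reading the file back with a strict parser confirms it is valid JSON.
assert json.loads(Path("request.json").read_text(encoding="utf-8")) == payload
```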


In [None]:
!gcloud ai endpoints predict 4078442670165327872 \
  --region=us-central1 \
  --json-request=request.json

Using endpoint [https://us-central1-prediction-aiplatform.googleapis.com/]
[{'confidence': 0.9409326314926147, 'sentiment': 'POSITIVE'}, {'confidence': 0.9964427351951599, 'sentiment': 'NEGATIVE'}]


### Vertex AI SDK

In [None]:
!pip install google-cloud-aiplatform==1.14.0 --upgrade

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting google-cloud-aiplatform==1.14.0
  Downloading google_cloud_aiplatform-1.14.0-py2.py3-none-any.whl (1.9 MB)
Collecting protobuf<4.0.0dev,>=3.19.0
  Downloading protobuf-3.20.3-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.whl (1.0 MB)
Collecting google-cloud-resource-manager<3.0.0dev,>=1.3.3
  Downloading google_cloud_resource_manager-1.6.3-py2.py3-none-any.whl (233 kB)
Collecting proto-plus<2.0.0dev,>=1.15.0
  Downloading proto_plus-1.22.1-py3-none-any.whl (47 kB)
Collecting google-cloud-storage<3.0.0dev,>=1.32.0
  Downloading google_cloud_storage-2.5.0-py2.py3-none-any.whl (106 kB)

In [None]:
from google.cloud import aiplatform

project = "sascha-playground-doit"
location = "us-central1"

aiplatform.init(project=project, location=location)

In [None]:
instances = [
    {"text": "DoiT is a great company."},
    {"text": "The beach was nice but overall the hotel was very bad."},
]


endpoint = aiplatform.Endpoint(
    "projects/234439745674/locations/us-central1/endpoints/7608484124768075776"
)

prediction = endpoint.predict(instances=instances)
print(prediction)

Prediction(predictions=[{'sentiment': 'POSITIVE', 'confidence': 0.9409326314926147}, {'confidence': 0.9964427351951599, 'sentiment': 'NEGATIVE'}], deployed_model_id='1352896281420234752', explanations=None)
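
The `predictions` list comes back in the same order as the submitted instances, so the two can be paired by position. The sketch below uses a plain list copied from the output above instead of calling the endpoint; with a live endpoint, `prediction.predictions` would presumably take its place. The helper `summarize` is a hypothetical name.

```python
instances = [
    {"text": "DoiT is a great company."},
    {"text": "The beach was nice but overall the hotel was very bad."},
]

# Copied from the printed Prediction above (stand-in for prediction.predictions).
predictions = [
    {"sentiment": "POSITIVE", "confidence": 0.9409326314926147},
    {"sentiment": "NEGATIVE", "confidence": 0.9964427351951599},
]


def summarize(instances, predictions):
    """Pair each input text with its predicted label and confidence."""
    rows = []
    for instance, pred in zip(instances, predictions):
        rows.append(
            f"{pred['sentiment']:<8} {pred['confidence']:.3f}  {instance['text']}"
        )
    return rows


rows = summarize(instances, predictions)
for row in rows:
    print(row)
```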
