# Vertex AI Batch Prediction

This is the **third** part of a series. I recommend reading the previous articles first, as this part builds on them.

1. [Train Machine Learning models with Vertex AI Training](https://medium.com/google-cloud/how-to-train-ml-models-with-vertex-ai-training-f9046bfbcfab)
2. [Serving Machine Learning models with Google Vertex AI](https://medium.com/google-cloud/serving-machine-learning-models-with-google-vertex-ai-5d9644ededa3)
3. [Vertex AI Batch Prediction (this article)](https://medium.com/google-cloud/google-vertex-ai-batch-predictions-ad7057d18d1f)


Your feedback and questions are highly appreciated. <br>You can find me on Twitter [@HeyerSascha](https://twitter.com/HeyerSascha) or connect with me via [LinkedIn](https://www.linkedin.com/in/saschaheyer/). <br>Even better, subscribe to my [YouTube](https://www.youtube.com/channel/UC--Sm3D-rqCUeLXmraypdPQ) channel ❤️.

## Authentication and Dependencies

In [None]:
from google.colab import auth
auth.authenticate_user()

In [None]:
! gcloud config set project sascha-playground-doit

Updated property [core/project].


In [None]:
!pip install google-cloud-aiplatform==1.26.0

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting google-cloud-aiplatform==1.26.0
  Downloading google_cloud_aiplatform-1.26.0-py2.py3-none-any.whl (2.6 MB)
Collecting google-cloud-resource-manager<3.0.0dev,>=1.3.3 (from google-cloud-aiplatform==1.26.0)
  Downloading google_cloud_resource_manager-1.10.1-py2.py3-none-any.whl (321 kB)
Collecting shapely<2.0.0 (from google-cloud-aiplatform==1.26.0)
  Downloading Shapely-1.8.5.post1-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (2.0 MB)
Collecting grpc-google-iam-v1<1.0.0dev,>=0.12.4 (from google-cloud-resourc

## Init SDK

In [None]:
from google.cloud import aiplatform

In [None]:
aiplatform.init(project='sascha-playground-doit')

## Upload model

If you want to know how the container image (prediction container) was created, check out my previous article: <br>[Serving Machine Learning models with Google Vertex AI](https://medium.com/google-cloud/serving-machine-learning-models-with-google-vertex-ai-5d9644ededa3)

### gcloud

In [None]:
!gcloud ai models upload \
  --container-ports=80 \
  --container-predict-route="/predict" \
  --container-health-route="/health" \
  --region=us-central1 \
  --display-name=sentiment-batch-example \
  --container-image-uri=gcr.io/sascha-playground-doit/sentiment-fast-api

Using endpoint [https://us-central1-aiplatform.googleapis.com/]


### SDK

In [None]:
model = aiplatform.Model.upload(
    display_name="sentiment-batch-example",
    serving_container_image_uri="gcr.io/sascha-playground-doit/sentiment-fast-api",
    serving_container_predict_route="/predict",
    serving_container_health_route="/health",
    serving_container_ports=[80],
)
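
To make the two routes more concrete: Vertex AI sends the instances to the predict route wrapped in a JSON body of the form `{"instances": [...]}` and expects `{"predictions": [...]}` back, with one prediction per instance. The sketch below mimics that contract in plain Python. `score_text` is a hypothetical stand-in for the real sentiment model, and the actual FastAPI container from the previous article may be structured differently.

```python
import json


def score_text(text: str) -> dict:
    """Hypothetical stand-in for the real sentiment model."""
    label = "positive" if "good" in text.lower() else "negative"
    return {"label": label}


def handle_predict(request_body: str) -> str:
    """Mimic a /predict handler: {"instances": [...]} in, {"predictions": [...]} out."""
    instances = json.loads(request_body)["instances"]
    # Return exactly one prediction per instance, in the same order.
    predictions = [score_text(instance["text"]) for instance in instances]
    return json.dumps({"predictions": predictions})


request = json.dumps({"instances": [{"text": "A good movie"}, {"text": "Boring"}]})
response = json.loads(handle_predict(request))
print(response)
```

During a batch prediction job the lines of the input file end up as entries of `instances`, so the container needs no batch-specific code.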

## Reference the model

In [None]:
# If the model is already uploaded, create a reference to it by its resource name.
model = aiplatform.Model('projects/sascha-playground-doit/locations/us-central1/models/5347349457363009536')
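
The resource name passed to `aiplatform.Model` follows a fixed pattern of project, location, and model ID. As a small illustration, the hypothetical helper below assembles it from its parts; the helper is not part of the SDK.

```python
def model_resource_name(project: str, location: str, model_id: str) -> str:
    """Build a fully qualified Vertex AI model resource name (hypothetical helper)."""
    return f"projects/{project}/locations/{location}/models/{model_id}"


name = model_resource_name(
    project="sascha-playground-doit",
    location="us-central1",
    model_id="5347349457363009536",
)
print(name)
```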

## Start Batch Prediction / Cloud Storage

In [None]:
batch_prediction_job = model.batch_predict(
    instances_format='jsonl',
    job_display_name="batch_predict_sentiment",
    gcs_source=['gs://doit-vertex-demo/batch/input/batch-key-2.jsonl'],
    gcs_destination_prefix='gs://doit-vertex-demo/batch/output',
    machine_type="n1-standard-4",
    starting_replica_count=4
)

Creating BatchPredictionJob

BatchPredictionJob created. Resource name: projects/234439745674/locations/us-central1/batchPredictionJobs/5331795628437536768

To use this BatchPredictionJob in another session:

bpj = aiplatform.BatchPredictionJob('projects/234439745674/locations/us-central1/batchPredictionJobs/5331795628437536768')

View Batch Prediction Job:
https://console.cloud.google.com/ai/platform/locations/us-central1/batch-predictions/5331795628437536768?project=234439745674

BatchPredictionJob projects/234439745674/locations/us-central1/batchPredictionJobs/5331795628437536768 current state:
JobState.JOB_STATE_PENDING

BatchPredictionJob projects/234439745674/locations/us-central1/batchPredictionJobs/5331795628437536768 current state:
JobState.JOB_STATE_RUNNING

KeyboardInterrupt: ignored
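
The call above blocks while the job runs, which is why the cell was interrupted by hand. The log also shows how to reattach to the job from another session. Below is a minimal sketch of polling such a job until it finishes. `wait_for_job` is a hypothetical helper. It assumes the job object exposes a `state` attribute whose `name` matches the state strings in the log, which to my knowledge is how `aiplatform.BatchPredictionJob` behaves. `FakeJob` and `FakeState` exist only so the example runs without a cloud project.

```python
import time
from enum import Enum

# States treated as terminal in this sketch; anything else means "keep waiting".
TERMINAL_STATES = {
    "JOB_STATE_SUCCEEDED",
    "JOB_STATE_FAILED",
    "JOB_STATE_CANCELLED",
}


def wait_for_job(job, poll_seconds: float = 30.0, sleep=time.sleep) -> str:
    """Poll job.state until it is terminal and return the state name."""
    while True:
        state_name = job.state.name
        if state_name in TERMINAL_STATES:
            return state_name
        sleep(poll_seconds)


class FakeState(Enum):
    """Stand-in for the SDK's job state enum (illustration only)."""
    JOB_STATE_PENDING = 1
    JOB_STATE_RUNNING = 2
    JOB_STATE_SUCCEEDED = 3


class FakeJob:
    """Stand-in job that walks through a fixed sequence of states."""

    def __init__(self, states):
        self._states = iter(states)

    @property
    def state(self):
        return next(self._states)


fake = FakeJob([
    FakeState.JOB_STATE_PENDING,
    FakeState.JOB_STATE_RUNNING,
    FakeState.JOB_STATE_SUCCEEDED,
])
final_state = wait_for_job(fake, poll_seconds=0, sleep=lambda seconds: None)
print(final_state)
```

With the real SDK, `job` would be the `aiplatform.BatchPredictionJob` object created from the resource name printed in the log. If I recall the SDK correctly, passing `sync=False` to `batch_predict` returns right away instead of blocking the cell.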

## Start Batch Prediction / BigQuery

In [None]:
batch_prediction_job = model.batch_predict(
    job_display_name="batch_predict_sentiment",
    machine_type="n1-standard-4",
    starting_replica_count=2,

    instances_format="bigquery",
    predictions_format="bigquery",
    bigquery_source='bq://sascha-playground-doit.batch.data',
    bigquery_destination_prefix="bq://sascha-playground-doit.batch",
)

Creating BatchPredictionJob

BatchPredictionJob created. Resource name: projects/234439745674/locations/us-central1/batchPredictionJobs/8770293943934910464

To use this BatchPredictionJob in another session:

bpj = aiplatform.BatchPredictionJob('projects/234439745674/locations/us-central1/batchPredictionJobs/8770293943934910464')

View Batch Prediction Job:
https://console.cloud.google.com/ai/platform/locations/us-central1/batch-predictions/8770293943934910464?project=234439745674

BatchPredictionJob projects/234439745674/locations/us-central1/batchPredictionJobs/8770293943934910464 current state:
JobState.JOB_STATE_RUNNING

BatchPredictionJob projects/234439745674/locations/us-central1/batchPredictionJobs/8770293943934910464 current state:
JobState.JOB_STATE_RUNNING

BatchPredictionJob projects/234439745674/locations/us-central1/batchPredictionJobs/8770293943934910464 current state:
JobState.JOB_STATE_RUNNING

BatchPredictionJob projects/234439745674/locations/us-central1/batchPredictionJobs/8770293943934910464 current state:
JobState.JOB_STATE_RUNNING

KeyboardInterrupt: ignored
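
Note the difference between the two BigQuery URIs in the call: the source names a table (`project.dataset.table`), while the destination prefix names only a dataset (`project.dataset`). The hypothetical helper below splits such a URI so the distinction can be checked before submitting a job. It is plain string handling and not part of the Vertex AI SDK.

```python
def parse_bq_uri(uri: str) -> dict:
    """Split a bq:// URI into project, dataset and (optionally) table."""
    prefix = "bq://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a BigQuery URI: {uri}")
    parts = uri[len(prefix):].split(".")
    if len(parts) not in (2, 3) or not all(parts):
        raise ValueError(f"expected project.dataset[.table], got: {uri}")
    # zip stops at the shorter sequence, so a dataset URI has no "table" key.
    return dict(zip(("project", "dataset", "table"), parts))


source = parse_bq_uri("bq://sascha-playground-doit.batch.data")
destination = parse_bq_uri("bq://sascha-playground-doit.batch")
print(source)
print(destination)
```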

## IMDB Dataset preprocessing (dataset used in this notebook)
This notebook runs the batch prediction on the test split of the IMDB dataset.

https://ai.stanford.edu/~amaas/data/sentiment/

In [None]:
!wget https://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz

In [None]:
!tar -xvf "/content/aclImdb_v1.tar.gz"

In [None]:
!pip install jsonlines==3.1.0

In [None]:
import os

import jsonlines

sentences = []      # instances written to the JSONL input file
sentences_raw = []  # raw review texts, kept for quick inspection

index = 0

# Read the negative reviews first, then the positive ones.
for path in ['/content/aclImdb/test/neg', '/content/aclImdb/test/pos']:
    for filename in os.listdir(path):
        if filename.endswith(".txt"):
            with open(os.path.join(path, filename), "r", encoding="utf-8") as file:
                text = file.read()
            # "key" is a running index over all reviews.
            sentence = {"text": text, "key": index, "test": "test"}
            sentences.append(sentence)
            sentences_raw.append(text)
            index += 1

# Write one JSON object per line.
with jsonlines.open('batch-key-2.jsonl', 'w') as writer:
    writer.write_all(sentences)



In [None]:
print(sentences_raw[1])
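
Before uploading the file to Cloud Storage, it can be worth checking that every line is a standalone JSON object with the expected fields. The sketch below does this with the standard `json` module. `validate_jsonl` is a hypothetical helper, and the two-record sample file it writes is made up so the example runs on its own.

```python
import json
import os
import tempfile


def validate_jsonl(path: str, required_keys=("text", "key")) -> int:
    """Return the number of records; raise ValueError on the first bad line."""
    count = 0
    with open(path, "r", encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            # json.loads raises json.JSONDecodeError (a ValueError) on malformed input.
            record = json.loads(line)
            missing = [key for key in required_keys if key not in record]
            if missing:
                raise ValueError(f"line {line_number} is missing {missing}")
            count += 1
    return count


# Made-up sample data, written to a temporary file for the demonstration.
sample = [{"text": "Great film", "key": 0}, {"text": "Terrible film", "key": 1}]
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.jsonl")
    with open(sample_path, "w", encoding="utf-8") as handle:
        for record in sample:
            handle.write(json.dumps(record) + "\n")
    record_count = validate_jsonl(sample_path)

print(record_count)
```

In the notebook, the same check could be pointed at `batch-key-2.jsonl` instead of the temporary sample file.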