
In [None]:
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Vertex Pipelines: Ready-to-go text classification model training pipeline
<table align="left">

  <td>
    <a href="https://colab.research.google.com/github/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/community/pipelines/google_cloud_pipeline_components_ready_to_go_text_classification_pipeline.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Colab logo"> Run in Colab
    </a>
  </td>
  <td>
    <a href="https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/community/pipelines/google_cloud_pipeline_components_ready_to_go_text_classification_pipeline.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo">
      View on GitHub
    </a>
  </td>
  <td>
    <a href="https://console.cloud.google.com/ai/platform/notebooks/deploy-notebook?download_url=https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/community-content/ready_to_go_text_classification_pipeline/ready_to_go_text_classification_pipeline.ipynb">
      <img src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo">
      Open in Vertex AI Workbench
    </a>
  </td>
</table>

## Overview

This demo shows how to use [Google Cloud Pipeline Components (GCPC)](https://pypi.org/project/google-cloud-pipeline-components/), [Kubeflow Pipelines (KFP)](https://pypi.org/project/kfp/), and Vertex AI services such as [Vertex Pipelines](https://cloud.google.com/vertex-ai/docs/pipelines/introduction), [Vertex Tensorboard](https://cloud.google.com/vertex-ai/docs/experiments/tensorboard-overview), [Vertex Training for distributed training](https://cloud.google.com/vertex-ai/docs/training/distributed-training) with accelerators, [Vertex Online Prediction](https://cloud.google.com/vertex-ai/docs/predictions/getting-predictions), and [Vertex Model Evaluation components](https://cloud.google.com/vertex-ai/docs/pipelines/model-evaluation-component) to build an end-to-end text classification pipeline that identifies the category of a news article from its headline and short description. The demo is intended to show developers how to build end-to-end pipelines with KFP and Vertex Pipelines to classify their own text data.


### Objective

In this tutorial, you will build an end-to-end text classification pipeline to classify news headlines and descriptions. You will use KFP, Vertex AI, and Vertex Pipelines to generate a managed and highly scalable solution.

The Text Classification Pipeline includes the following steps:

- Split the data into training and validation datasets
- Fine-tune a pre-trained [BERT](https://www.tensorflow.org/text/tutorials/classify_text_with_bert) model
- Upload your model to Vertex AI Model Registry
- Deploy your model to a Vertex AI endpoint
- Perform model evaluation

### Dataset

This demo uses a Kaggle [News Category Dataset](https://www.kaggle.com/datasets/rmisra/news-category-dataset), which contains around 200k news headlines from the years 2012-2018 obtained from HuffPost. It is located in the public samples Cloud Storage bucket as `gs://cloud-samples-data/vertex-ai/community-content/datasets/news/news_category_data.json`.
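The pipeline expects each sample to carry `text` and `label` keys (the format is described under "Fill out the following required configurations"). If you start from the raw Kaggle download instead of the prepared sample file, you need a conversion step. The sketch below is a minimal illustration that assumes the raw records use the Kaggle field names `headline`, `short_description`, and `category`; check these against your copy of the data. `to_training_record` and `convert_jsonl` are hypothetical helpers, not part of GCPC or KFP.

```python
import json
from typing import Dict, Iterable, Iterator


# Assumption: raw records use the Kaggle field names "headline",
# "short_description", and "category". Verify against your copy of the data.
def to_training_record(raw: Dict[str, str]) -> Dict[str, str]:
    """Map one raw news record to the {"text", "label"} schema the pipeline expects."""
    parts = [raw.get("headline", "").strip(), raw.get("short_description", "").strip()]
    # Join headline and description with a single space, skipping empty parts.
    text = " ".join(part for part in parts if part)
    return {"text": text, "label": raw["category"]}


def convert_jsonl(raw_lines: Iterable[str]) -> Iterator[str]:
    """Convert raw JSONL strings into JSONL strings in the pipeline's format."""
    for line in raw_lines:
        if line.strip():  # skip blank lines
            yield json.dumps(to_training_record(json.loads(line)))


# Example with one made-up record.
raw_example = ['{"category": "CRIME", "headline": "A headline", "short_description": "A description."}']
for converted in convert_jsonl(raw_example):
    print(converted)  # prints {"text": "A headline A description.", "label": "CRIME"}
```

Joining the headline and short description appears to match the example instance used later for online prediction, which combines both into a single string.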

### Costs

This tutorial uses billable components of Google Cloud:

* Vertex AI
* Cloud Storage

Learn about [Vertex AI
pricing](https://cloud.google.com/vertex-ai/pricing) and [Cloud Storage
pricing](https://cloud.google.com/storage/pricing), and use the [Pricing
Calculator](https://cloud.google.com/products/calculator/)
to generate a cost estimate based on your projected usage.

### Set up your local development environment

**If you are using Colab or Vertex AI Workbench Notebooks**, your environment already meets
all the requirements to run this notebook. Skip to the 'Install additional packages' section below.

**Otherwise**, make sure your environment meets this notebook's requirements.
You need the following:

* The Google Cloud SDK
* Git
* Python 3
* virtualenv
* Jupyter notebook running in a virtual environment with Python 3

The Google Cloud guide to [Setting up a Python development
environment](https://cloud.google.com/python/setup) and the [Jupyter
installation guide](https://jupyter.org/install) provide detailed instructions
for meeting these requirements. The following steps provide a condensed set of
instructions:

1. [Install and initialize the Cloud SDK.](https://cloud.google.com/sdk/docs/)

1. [Install Python 3.](https://cloud.google.com/python/setup#installing_python)

1. [Install
   virtualenv](https://cloud.google.com/python/setup#installing_and_using_virtualenv)
   and create a virtual environment that uses Python 3. Activate the virtual environment.

1. To install Jupyter, run `pip3 install jupyter` on the
command-line in a terminal shell.

1. To launch Jupyter, run `jupyter notebook` on the command-line in a terminal shell.

1. Open this notebook in the Jupyter Notebook Dashboard.

### Install additional packages

Install the following packages required to execute this notebook.

In [None]:
import os

# The Vertex AI Workbench Notebook product has specific requirements
IS_WORKBENCH_NOTEBOOK = os.getenv("DL_ANACONDA_HOME")
IS_USER_MANAGED_WORKBENCH_NOTEBOOK = os.path.exists(
    "/opt/deeplearning/metadata/env_version"
)

# Vertex AI Notebook requires dependencies to be installed with '--user'
USER_FLAG = ""
if IS_WORKBENCH_NOTEBOOK:
    USER_FLAG = "--user"

In [None]:
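# The code below uses the `kfp.v2` namespace and GCPC `experimental` components,
# which were removed or reorganized in the 2.x releases of these SDKs, so pin both below 2.0.
! pip3 install --upgrade pip google-cloud-aiplatform "google-cloud-pipeline-components<2" "kfp<2" tensorflow tensorboard numpy {USER_FLAG} -q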

### Restart the kernel

Restart your notebook kernel to ensure all newly installed packages can be found.

In [None]:
# Automatically restart kernel after installs
import os

if not os.getenv("IS_TESTING"):
    import IPython

    app = IPython.Application.instance()
    app.kernel.do_shutdown(True)

## Before you begin

### Set up your Google Cloud project

**The following steps are required, regardless of your notebook environment.**

1. [Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.

1. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).

1. [Enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com,storage.googleapis.com).

1. If you are running this notebook locally, you will need to install the [Cloud SDK](https://cloud.google.com/sdk).

1. Enter your project ID in the cell below. Then run the cell to make sure the
Cloud SDK uses the right project for all the commands in this notebook.

**Note**: Jupyter runs lines prefixed with `!` as shell commands, and it interpolates Python variables prefixed with `$` (or wrapped in `{}`) into these commands.

#### Set your project ID

**If you don't know your project ID**, you may be able to get your project ID using `gcloud`.

In [None]:
import os

PROJECT_ID = ""

# Get your Google Cloud project ID from gcloud
if not os.getenv("IS_TESTING"):
    shell_output = !gcloud config list --format 'value(core.project)' 2>/dev/null
    PROJECT_ID = shell_output[0]
    print("Project ID: ", PROJECT_ID)

Otherwise, set your project ID here.

In [None]:
if PROJECT_ID == "" or PROJECT_ID is None:
    PROJECT_ID = ""  # @param {type:"string"}

In [None]:
! gcloud config set project $PROJECT_ID

#### Region

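You can also change the `REGION` variable, which is used for operations
throughout the rest of this notebook. Below are some of the regions that support Vertex AI. We recommend that you choose the region closest to you. Because the training step in this pipeline requests an NVIDIA A100 GPU, choose a region where A100s are available, such as `us-central1`.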

- Americas: `us-central1`
- Europe: `europe-west4`
- Asia Pacific: `asia-east1`

You may not use a multi-regional bucket for training with Vertex AI. Not all regions provide support for all Vertex AI services.

Learn more about [Vertex AI regions](https://cloud.google.com/vertex-ai/docs/general/locations).

In [None]:
REGION = "[your-region]"  # @param {type: "string"}

if REGION == "[your-region]":
    REGION = "us-central1"

In [None]:
import random
import string


# Generate a random string of lowercase letters and digits of the given length (default: 8)
def generate_uuid(length: int = 8) -> str:
    return "".join(random.choices(string.ascii_lowercase + string.digits, k=length))


UUID = generate_uuid()

### Authenticate your Google Cloud account

**If you are using Vertex AI Workbench Notebooks**, your environment is already
authenticated. Skip this step.

**If you are using Colab**, run the cell below and follow the instructions
when prompted to authenticate your account via OAuth.

**Otherwise**, follow these steps:

1. In the Cloud Console, go to the [**Create service account key**
   page](https://console.cloud.google.com/apis/credentials/serviceaccountkey).

2. Click **Create service account**.

3. In the **Service account name** field, enter a name, and
   click **Create**.

4. In the **Grant this service account access to project** section, click the **Role** drop-down list. Type "Vertex AI" into the filter box, and select **Vertex AI Administrator**. Type "Storage Object Admin" into the filter box, and select **Storage Object Admin**.

5. Click **Create**. A JSON file that contains your key downloads to your local environment.

6. Enter the path to your service account key as the
`GOOGLE_APPLICATION_CREDENTIALS` variable in the cell below and run the cell.

In [None]:
# If you are running this notebook in Colab, run this cell and follow the
# instructions to authenticate your GCP account. This provides access to your
# Cloud Storage bucket and lets you submit training jobs and prediction
# requests.

import os
import sys

# If on Vertex AI Workbench, then don't execute this code
IS_COLAB = "google.colab" in sys.modules
if not os.path.exists("/opt/deeplearning/metadata/env_version") and not os.getenv(
    "DL_ANACONDA_HOME"
):
    if "google.colab" in sys.modules:
        from google.colab import auth as google_auth

        google_auth.authenticate_user()

    # If you are running this notebook locally, replace the string below with the
    # path to your service account key and run this cell to authenticate your GCP
    # account.
    elif not os.getenv("IS_TESTING"):
        %env GOOGLE_APPLICATION_CREDENTIALS ''

### Create a Cloud Storage bucket

**The following steps are required, regardless of your notebook environment.**

In this tutorial, a Cloud Storage bucket holds the News Category dataset file that is used to train the model. Vertex AI also saves the pipeline's artifacts in the same bucket, such as the training and validation splits generated by the preprocessing component and the trained model. The trained model artifact is then used to create a Vertex AI model and an endpoint that serves online predictions.

Set the name of your Cloud Storage bucket below. It must be unique across all
Cloud Storage buckets.

In [None]:
BUCKET_NAME = "[your-bucket-name]"  # @param {type:"string"}
BUCKET_URI = f"gs://{BUCKET_NAME}"

In [None]:
if BUCKET_NAME == "" or BUCKET_NAME is None or BUCKET_NAME == "[your-bucket-name]":
    BUCKET_NAME = PROJECT_ID + "aip-" + UUID
    BUCKET_URI = f"gs://{BUCKET_NAME}"

**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket.

In [None]:
! gsutil mb -l $REGION -p $PROJECT_ID $BUCKET_URI

Finally, validate access to your Cloud Storage bucket by examining its contents:

In [None]:
! gsutil ls -al $BUCKET_URI

### Set up service account and permissions
A service account is used to run the pipeline job, including its custom training step. If you do not want to use your project's Compute Engine service account, set `SERVICE_ACCOUNT` to another service account ID. You can create a service account by following these [instructions](https://cloud.google.com/iam/docs/creating-managing-service-accounts#creating).

In [None]:
SERVICE_ACCOUNT = "[your-service-account]"  # @param {type:"string"}

In [None]:
if (
    SERVICE_ACCOUNT == ""
    or SERVICE_ACCOUNT is None
    or SERVICE_ACCOUNT == "[your-service-account]"
):
    # Get your service account from gcloud
    if not IS_COLAB:
        shell_output = !gcloud auth list 2>/dev/null
        SERVICE_ACCOUNT = shell_output[2].replace("*", "").strip()

    else:  # IS_COLAB:
        shell_output = ! gcloud projects describe  $PROJECT_ID
        project_number = shell_output[-1].split(":")[1].strip().replace("'", "")
        SERVICE_ACCOUNT = f"{project_number}-compute@developer.gserviceaccount.com"

    print("Service Account:", SERVICE_ACCOUNT)

#### Set service account access

Run the following commands to grant your service account access to read and write pipeline artifacts in the bucket that you created in the previous step. You only need to run this step once per service account.

In [None]:
! gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectCreator $BUCKET_URI

! gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectViewer $BUCKET_URI

### Import libraries

In [None]:
from google.cloud import aiplatform
from kfp import components
from kfp.v2 import compiler, dsl
from kfp.v2.dsl import InputPath, component

In [None]:
# Model evaluation components
from google_cloud_pipeline_components.experimental.evaluation import \
    GetVertexModelOp as get_vertex_model_op
from google_cloud_pipeline_components.experimental.evaluation import \
    ModelEvaluationClassificationOp as evaluation_classification_op
from google_cloud_pipeline_components.experimental.evaluation import \
    ModelImportEvaluationOp as model_import_evaluation_op
from google_cloud_pipeline_components.experimental.evaluation import \
    TargetFieldDataRemoverOp as target_field_data_remover_op
# Text Classification components
from google_cloud_pipeline_components.experimental.sklearn import \
    SklearnTrainTestSplitJsonlOp as train_test_split_op
from google_cloud_pipeline_components.experimental.text_classification import \
    TextClassificationTrainingOp as text_classification_training_op
from google_cloud_pipeline_components.v1.batch_predict_job import \
    ModelBatchPredictOp as batch_prediction_op

### Fill out the following required configurations

This pipeline accepts a JSONL dataset in which each line is a JSON object representing one sample, with two required keys: `text` and `label`. The `text` key maps to the sample's text, and the `label` key maps to its category.

Here is an example of a JSON object in the dataset used in this demo:

`{"label": "CRIME", "text": "There Were 2 Mass Shootings In Texas Last Week, But Only 1 On TV"}`
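Before launching the pipeline, it can help to check that every line in your file matches this format. The following is a minimal sketch using only the Python standard library; `validate_jsonl_lines` is a hypothetical helper, not part of GCPC. It reports lines that are not JSON objects, lines missing a non-empty `text` or `label`, and, optionally, labels outside a given set of class names.

```python
import json
from typing import Iterable, List, Optional, Set


def validate_jsonl_lines(lines: Iterable[str], class_names: Optional[Set[str]] = None) -> List[str]:
    """Return human-readable problems found in JSONL lines; an empty list means no problems."""
    problems = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as err:
            problems.append(f"line {lineno}: invalid JSON ({err.msg})")
            continue
        if not isinstance(record, dict):
            problems.append(f"line {lineno}: expected a JSON object")
            continue
        # Both required keys must be present and hold non-empty strings.
        for key in ("text", "label"):
            value = record.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"line {lineno}: missing or empty '{key}'")
        # Optionally check a non-empty label against a known set of categories.
        label = record.get("label")
        if (
            class_names is not None
            and isinstance(label, str)
            and label.strip()
            and label not in class_names
        ):
            problems.append(f"line {lineno}: unknown label {label!r}")
    return problems


sample_lines = [
    '{"label": "CRIME", "text": "There Were 2 Mass Shootings In Texas Last Week, But Only 1 On TV"}',
    '{"label": "CRIME"}',
]
print(validate_jsonl_lines(sample_lines, {"CRIME"}))  # prints ["line 2: missing or empty 'text'"]
```

To check the real file, you could download it locally with `gsutil cp` and pass the open file object (which iterates over lines) together with `set(CLASS_NAMES)`.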

In [None]:
BASE_OUTPUT_DIR = f"gs://{BUCKET_NAME}"

SAMPLE_DATA_URI = "gs://cloud-samples-data/vertex-ai/community-content/datasets/news/news_category_data.json"
TRAINING_DATA_URI = (
    f"{BASE_OUTPUT_DIR}/data/news_category_data.json"  # @param {type:"string"}
)

# The Cloud Storage directory for staging files for model evaluation.
ROOT_DIR = f"{BASE_OUTPUT_DIR}/root"  # @param {type:"string"}

Copy the sample data to TRAINING_DATA_URI.

In [None]:
! gsutil cp {SAMPLE_DATA_URI} {TRAINING_DATA_URI}

`CLASS_NAMES` should be a list of all the categories into which a text sample can be classified.

In this demo, there are 41 genre categories (listed below) into which a headline can be classified.

In [None]:
CLASS_NAMES = [
    "GOOD NEWS",
    "STYLE",
    "STYLE & BEAUTY",
    "ARTS",
    "IMPACT",
    "WEIRD NEWS",
    "FIFTY",
    "ENTERTAINMENT",
    "ARTS & CULTURE",
    "HEALTHY LIVING",
    "WEDDINGS",
    "PARENTING",
    "BLACK VOICES",
    "GREEN",
    "RELIGION",
    "POLITICS",
    "PARENTS",
    "BUSINESS",
    "DIVORCE",
    "WELLNESS",
    "FOOD & DRINK",
    "THE WORLDPOST",
    "MEDIA",
    "COLLEGE",
    "WOMEN",
    "TASTE",
    "WORLDPOST",
    "TRAVEL",
    "CULTURE & ARTS",
    "SPORTS",
    "CRIME",
    "QUEER VOICES",
    "TECH",
    "COMEDY",
    "MONEY",
    "WORLD NEWS",
    "LATINO VOICES",
    "SCIENCE",
    "EDUCATION",
    "HOME & LIVING",
    "ENVIRONMENT",
]
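Because predicted probabilities are returned in the order of `class_names` (as described in the prediction section below), the list above must cover every label in the data and stay fixed between training and prediction. As a sketch, assuming you have a local copy of the JSONL file, a hypothetical helper such as `collect_class_names` can list the distinct labels so you can compare them with `CLASS_NAMES`:

```python
import json
from collections import Counter
from typing import Iterable, List


def collect_class_names(lines: Iterable[str]) -> List[str]:
    """Return the distinct labels in JSONL lines, most frequent first (ties broken alphabetically)."""
    counts = Counter()
    for line in lines:
        if line.strip():  # skip blank lines
            counts[json.loads(line)["label"]] += 1
    # Sort by descending count, then by name, so the result is deterministic.
    return [label for label, _ in sorted(counts.items(), key=lambda item: (-item[1], item[0]))]


example_lines = [
    '{"label": "POLITICS", "text": "a"}',
    '{"label": "CRIME", "text": "b"}',
    '{"label": "POLITICS", "text": "c"}',
]
print(collect_class_names(example_lines))  # prints ['POLITICS', 'CRIME']
```

Comparing sets, for example `set(collect_class_names(lines)) == set(CLASS_NAMES)`, checks coverage without depending on order; the hand-written order in `CLASS_NAMES` is the one passed to the pipeline.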

### Load Components
The KFP SDK provides several ways to [load a component](https://www.kubeflow.org/docs/components/pipelines/sdk/component-development/#using-your-component-in-a-pipeline) for use in a pipeline. In this demo, most components are imported from GCPC, two are loaded from published component definitions, and one is defined as a Python function-based component.

This pipeline is composed of the following components:

- **train_test_split_jsonl_with_sklearn** - splits data into training and validation datasets.
- **train_tensorflow_text_classification_model** - trains a text classification TensorFlow model.
- **upload_Tensorflow_model_to_Google_Cloud_Vertex_AI** - uploads a TensorFlow model to Vertex AI as a Vertex AI model.
- **deploy_model_to_endpoint** - deploys a Vertex AI model to an endpoint for online predictions.
- **get_gcs_uris_from_jsonl_artifact** - a Python function-based component that converts the validation data artifact to a format accepted by the model evaluation components.
- **target_field_data_remover** - removes the target (label) field in the validation data for downstream Vertex Batch Predictions.
- **model_batch_predict** - submits a batch prediction job.
- **model_evaluation_classification** - calculates and exports evaluation metrics.
- **model_evaluation_import** - imports model evaluation metrics results.
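The first component splits the data into training and validation sets. As an illustration of the idea, not of the GCPC component's implementation (which, judging by the name `SklearnTrainTestSplitJsonlOp`, is based on scikit-learn), the sketch below shuffles JSONL lines with a fixed seed and holds out a fraction for validation. It uses 0.2, the default of the pipeline's `validation_split` parameter; `split_jsonl_lines` is a hypothetical helper, and its splits will not match the component's exactly.

```python
import math
import random
from typing import List, Sequence, Tuple


def split_jsonl_lines(
    lines: Sequence[str], validation_split: float = 0.2, random_seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Shuffle non-blank lines with a fixed seed and hold out a validation fraction."""
    if not 0.0 < validation_split < 1.0:
        raise ValueError("validation_split must be strictly between 0 and 1")
    records = [line for line in lines if line.strip()]
    rng = random.Random(random_seed)  # local RNG: same seed gives the same split
    rng.shuffle(records)
    # Round up so that a small non-empty dataset keeps at least one validation record.
    num_validation = math.ceil(len(records) * validation_split)
    return records[num_validation:], records[:num_validation]


data = [f'{{"text": "sample {i}", "label": "CRIME"}}' for i in range(10)]
train, validation = split_jsonl_lines(data)
print(len(train), len(validation))  # prints 8 2
```

Using a local `random.Random` instance keeps the split reproducible without touching Python's global random state.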

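Use `components.load_component_from_url` to load components that have been published as `component.yaml` files. The two components below are loaded from the `Ark-kun/pipeline_components` GitHub repository.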

In [None]:
upload_tensorflow_model_to_vertex_op = components.load_component_from_url(
    "https://raw.githubusercontent.com/Ark-kun/pipeline_components/c6a8b67d1ada2cc17665c99ff6b410df588bee28/components/google-cloud/Vertex_AI/Models/Upload_Tensorflow_model/workaround_for_buggy_KFPv2_compiler/component.yaml"
)
deploy_model_to_endpoint_op = components.load_component_from_url(
    "https://raw.githubusercontent.com/Ark-kun/pipeline_components/27a5ea25e849c9e8c0cb6ed65518bc3ece259aaf/components/google-cloud/Vertex_AI/Models/Deploy_to_endpoint/workaround_for_buggy_KFPv2_compiler/component.yaml"
)

The following Python function-based component converts the validation data (an artifact of type `JSONLines`) into a list of Cloud Storage URIs (type: `Sequence[str]`) that the downstream `target_field_data_remover_op` and `evaluation_classification_op` components accept as their `gcs_source_uris` and `ground_truth_gcs_source` inputs.

In [None]:
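@component(
    base_image="python:3.9",
)
def get_gcs_uris_from_jsonl_artifact(input_jsonl: InputPath("JSONLines")) -> list:
    # Vertex Pipelines exposes Cloud Storage artifacts at local "/gcs/..." paths;
    # strip the 5-character "/gcs/" prefix and rebuild the equivalent gs:// URI.
    return ["gs://" + input_jsonl[5:]]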
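The component above relies on Vertex Pipelines making Cloud Storage artifacts available at local paths under `/gcs/` (Cloud Storage FUSE), so `input_jsonl[5:]` drops that 5-character prefix. The sketch below expresses the same conversion as a plain, testable function that fails loudly on unexpected input instead of silently producing a bad URI; `fuse_path_to_gcs_uri` is a hypothetical helper, not part of KFP.

```python
def fuse_path_to_gcs_uri(path: str) -> str:
    """Convert a Cloud Storage FUSE path like /gcs/bucket/obj to gs://bucket/obj."""
    prefix = "/gcs/"
    if not path.startswith(prefix):
        raise ValueError(f"expected a path starting with {prefix!r}, got {path!r}")
    return "gs://" + path[len(prefix):]


print(fuse_path_to_gcs_uri("/gcs/my-bucket/data/validation.jsonl"))
# prints gs://my-bucket/data/validation.jsonl
```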

### Build a pipeline
The following pipeline code links the inputs and outputs of the loaded components. The resulting pipeline performs the following steps:
- Partitions the data into training and validation splits.
- Trains a new text classification model.
- Uploads the model to Vertex AI Model Registry.
- Performs batch prediction on the validation data.
- Evaluates the model's performance using the batch prediction results.
- Imports the evaluation metrics into the model.
- Deploys the model to an endpoint for online predictions.
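Before batch prediction, the pipeline removes the ground-truth field from the validation data, so the model sees only its inputs while the evaluation step still compares predictions against the original labels. The sketch below shows the idea on in-memory JSONL lines; `remove_target_field` is a hypothetical helper, while the real `target_field_data_remover` component reads and writes files in Cloud Storage.

```python
import json
from typing import Iterable, Iterator


def remove_target_field(lines: Iterable[str], target_field_name: str = "label") -> Iterator[str]:
    """Yield each non-blank JSONL record with the target (ground-truth) field removed."""
    for line in lines:
        if not line.strip():
            continue
        record = json.loads(line)
        record.pop(target_field_name, None)  # no error if the field is already absent
        yield json.dumps(record)


for instance in remove_target_field(['{"label": "CRIME", "text": "Some headline"}']):
    print(instance)  # prints {"text": "Some headline"}
```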

In [None]:
@dsl.pipeline(name="text-classification-pipeline")
def text_pipeline(
    project: str,
    training_data_uri: str,
    class_names: list,
    root_dir: str,
    target_field_name: str,
    batch_predict_display_name: str,
    batch_predict_instances_format: str = "jsonl",
    batch_predictions_format: str = "jsonl",
    model_name: str = "small_bert/bert_en_uncased_L-2_H-128_A-2",
    validation_split: float = 0.2,
    batch_size: int = 256,
    learning_rate: float = 3e-4,
    num_epochs: int = 5,
    random_seed: int = 0,
) -> None:

    """End-to-end text classification pipeline.

    Args:
    project: Required. GCP project ID.
    training_data_uri: Required. Data in JSON lines format.
    class_names: Required. List of categories (string) for classification.
    root_dir: Required. The GCS directory for keeping staging files for model evaluation.
    target_field_name: Required. The name of the features target field in the predictions file (e.g. 'label').
    batch_predict_instances_format: The file format for the ground truth files.
    batch_predictions_format: The file format for the batch prediction results.
    batch_predict_display_name: Required. The user-defined name of this BatchPredictionJob.
    model_name: Optional. Name of pre-trained BERT model to be used.
                Default: "small_bert/bert_en_uncased_L-2_H-128_A-2"
    validation_split: Optional. Fraction of data that will make up validation dataset.
                      Default: 0.2
    batch_size: Optional. Batch size
                Default: 256
    learning_rate: Optional. Learning rate
                   Default: 3e-4
    num_epochs: Optional. Number of epochs
                Default: 5
    random_seed: Optional. Random seed
                 Default: 0
    """

    text_data_preprocess_task = train_test_split_op(
        input_data_path=training_data_uri,
    )

    training_data = text_data_preprocess_task.outputs["training_data_path"]

    validation_data = text_data_preprocess_task.outputs["validation_data_path"]

    # Select the NVIDIA A100 accelerator type for this step. For CPU, memory, and GPU settings, see https://cloud.google.com/vertex-ai/docs/pipelines/machine-types
    model = (
        text_classification_training_op(
            preprocessed_training_data_path=training_data,
            preprocessed_validation_data_path=validation_data,
            model_name=model_name,
            class_names=class_names,
            batch_size=batch_size,
            learning_rate=learning_rate,
            num_epochs=num_epochs,
            random_seed=random_seed,
        )
    ).add_node_selector_constraint(
        "cloud.google.com/gke-accelerator", "NVIDIA_TESLA_A100"
    )  # Note that A100 is available on us-central1

    vertex_model_name = upload_tensorflow_model_to_vertex_op(
        model=model.outputs["trained_model_path"],
    ).outputs["model_name"]

    # Model evaluation
    # Need a component to convert Artifact('JsonLinesDataset') to JsonArray
    validation_data_uris = get_gcs_uris_from_jsonl_artifact(validation_data).output

    evaluation_data_for_batch_predict = target_field_data_remover_op(
        project=project,
        root_dir=root_dir,
        target_field_name=target_field_name,
        gcs_source_uris=validation_data_uris,
    ).outputs["gcs_output_directory"]

    vertex_model = get_vertex_model_op(
        model_resource_name=vertex_model_name,
    ).outputs["model"]

    batch_prediction_task = batch_prediction_op(
        project=project,
        model=vertex_model,
        job_display_name=batch_predict_display_name,
        gcs_source_uris=evaluation_data_for_batch_predict,
        gcs_destination_output_uri_prefix=root_dir,
        instances_format=batch_predict_instances_format,
        predictions_format=batch_predictions_format,
        machine_type="n1-standard-32",
        starting_replica_count=5,
        max_replica_count=10,
    )

    # Run the evaluation based on prediction type
    eval_task = evaluation_classification_op(
        project=project,
        root_dir=root_dir,
        ground_truth_gcs_source=validation_data_uris,
        target_field_name=target_field_name,
        prediction_score_column="prediction",
        prediction_label_column="",
        class_labels=class_names,
        ground_truth_format=batch_predict_instances_format,
        predictions_format=batch_predictions_format,
        predictions_gcs_source=batch_prediction_task.outputs["gcs_output_directory"],
    )
    # Import the model evaluations to the Vertex AI model
    model_import_evaluation_op(
        classification_metrics=eval_task.outputs["evaluation_metrics"],
        model=vertex_model,
        dataset_type="jsonl",
    )

    # For online predictions
    _ = deploy_model_to_endpoint_op(
        model_name=vertex_model_name,
    ).outputs["endpoint_name"]


pipeline_func = text_pipeline
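The evaluation component reports classification metrics computed from the batch predictions and the ground-truth labels. As a much-simplified illustration, not the component's implementation (which reports a broader set of metrics), top-1 accuracy can be computed from probability vectors ordered like `class_names`. `top1_accuracy` is a hypothetical helper.

```python
from typing import Sequence


def top1_accuracy(
    probabilities: Sequence[Sequence[float]],
    true_labels: Sequence[str],
    class_names: Sequence[str],
) -> float:
    """Fraction of examples whose highest-probability class equals the true label."""
    if len(probabilities) != len(true_labels) or not true_labels:
        raise ValueError("need the same, non-zero number of predictions and labels")
    correct = 0
    for probs, truth in zip(probabilities, true_labels):
        if len(probs) != len(class_names):
            raise ValueError("each probability vector must have one entry per class")
        # Index of the largest probability; ties resolve to the earliest class.
        best = max(range(len(probs)), key=probs.__getitem__)
        correct += class_names[best] == truth
    return correct / len(true_labels)


print(top1_accuracy([[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]], ["A", "B"], ["A", "B", "C"]))  # prints 0.5
```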

### Run the Pipeline

The following blocks compile the pipeline function above and submit a pipeline run to Vertex AI. You can view the pipeline's artifacts in [Vertex ML Metadata (MLMD)](https://cloud.google.com/vertex-ai/docs/ml-metadata/introduction) by following the link that is printed when the run is submitted.

Only the pipeline parameters need to change to adapt this pipeline to your use case. Specify the required pipeline parameters, and any optional ones, in the `parameters` dictionary, which is passed to the job as `parameter_values`.

In [None]:
compiler.Compiler().compile(
    pipeline_func=pipeline_func,
    package_path="text_classification_pipeline.json",
)

In [None]:
PIPELINE_DISPLAY_NAME = f"text-classification-train-evaluate-{UUID}"  # "[your-pipeline-display-name]"  # @param {type:"string"}

BATCH_PREDICTION_DISPLAY_NAME = f"batch-prediction-on-pipelines-model-{UUID}"

parameters = {
    "project": PROJECT_ID,
    "training_data_uri": TRAINING_DATA_URI,
    "class_names": CLASS_NAMES,
    "num_epochs": 5,
    "root_dir": ROOT_DIR,
    "target_field_name": "label",
    "batch_predict_display_name": BATCH_PREDICTION_DISPLAY_NAME,
}
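A typo in a parameter name is only reported once the job is submitted. As a sketch, you can compare the dictionary keys against the pipeline function's signature with Python's `inspect` module before submitting; `check_pipeline_parameters` and `example_pipeline` below are hypothetical, and the stand-in function only mimics the shape of `text_pipeline`.

```python
import inspect
from typing import Callable, Dict, List, Tuple


def check_pipeline_parameters(func: Callable, params: Dict[str, object]) -> Tuple[List[str], List[str]]:
    """Return (missing required names, unknown names) for a call to func with params."""
    signature = inspect.signature(func)
    required = [
        name
        for name, param in signature.parameters.items()
        if param.default is inspect.Parameter.empty
    ]
    missing = [name for name in required if name not in params]
    unknown = [name for name in params if name not in signature.parameters]
    return missing, unknown


# Hypothetical stand-in with the same shape as a pipeline function.
def example_pipeline(project: str, training_data_uri: str, num_epochs: int = 5) -> None:
    """Placeholder body; only the signature matters here."""


print(check_pipeline_parameters(example_pipeline, {"project": "my-project", "num_epoch": 3}))
# prints (['training_data_uri'], ['num_epoch'])
```

With KFP 1.8, `@dsl.pipeline` appears to return the decorated Python function, so calling `check_pipeline_parameters(text_pipeline, parameters)` should work; other KFP versions may wrap the function differently, so treat that as an assumption.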

In [None]:
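job = aiplatform.PipelineJob(
    display_name=PIPELINE_DISPLAY_NAME,
    template_path="text_classification_pipeline.json",
    # Store pipeline artifacts in the bucket created above.
    pipeline_root=f"{BUCKET_URI}/pipeline_root",
    project=PROJECT_ID,
    location=REGION,
    enable_caching=True,
    parameter_values=parameters,
)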

job.submit(service_account=SERVICE_ACCOUNT)

## Make predictions

Once your model is deployed to an endpoint, you can use it to make predictions from the Google Cloud console or with the Vertex AI SDK for Python.

To use the console, follow [these steps](https://cloud.google.com/vertex-ai/docs/predictions/online-predictions-custom-models).

To make predictions with the Vertex AI SDK, you need the endpoint to which the model was deployed. Because `job.submit()` returns without waiting for the run to finish, the code below first waits for the pipeline to complete and then extracts the `ENDPOINT_ID` from the PipelineJob.

In [None]:
# job.submit() returns immediately; wait for the run to finish so that
# the deployment task's outputs are available.
job.wait()

task_id = "deploy-model-to-endpoint-for-google-cloud-vertex-ai-model"
deploy_task_detail = [
    task_details
    for task_details in job.task_details
    if task_details.task_name == task_id
][0]
ENDPOINT_ID = deploy_task_detail.execution.metadata["output:endpoint_name"]
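Depending on the component version, `output:endpoint_name` may hold a full resource name such as `projects/PROJECT_NUMBER/locations/REGION/endpoints/ENDPOINT_ID` rather than a bare ID. `aiplatform.Endpoint` accepts either form, but if you need only the trailing ID (for example, for logs or console URLs), a small hypothetical helper like the one below can extract it. The resource-name pattern is an assumption based on the standard Vertex AI format.

```python
import re

# Assumed shape of a full Vertex AI endpoint resource name.
_ENDPOINT_RE = re.compile(r"^projects/[^/]+/locations/[^/]+/endpoints/([^/]+)$")


def endpoint_id_from_resource_name(name: str) -> str:
    """Return the trailing endpoint ID from a full resource name; pass bare IDs through."""
    match = _ENDPOINT_RE.match(name)
    if match:
        return match.group(1)
    if "/" in name:
        raise ValueError(f"unrecognized endpoint resource name: {name!r}")
    return name  # already a bare ID


print(endpoint_id_from_resource_name("projects/123/locations/us-central1/endpoints/456"))  # prints 456
```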

Each instance must be a dictionary with a `text` key, and `instances` is a list that holds all of the instances to send.

In [None]:
instances = [
    {
        "text": "Irish Voters Set To Liberalize Abortion Laws In Landslide, Exit Poll Signals Vote counting will begin Saturday."
    }
]

The endpoint returns a list of predictions in the same order as the instances above. Each prediction is a vector of probabilities, one for each category in `class_names`, in the same order as `class_names`.

In [None]:
endpoint = aiplatform.Endpoint(ENDPOINT_ID)
prediction = endpoint.predict(instances=instances)
print(prediction)
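Raw probability vectors are hard to read. Since each vector is ordered like `class_names`, you can pair the two and keep the most likely categories. A minimal sketch; `top_k_labels` is a hypothetical helper:

```python
from typing import List, Sequence, Tuple


def top_k_labels(
    probs: Sequence[float], class_names: Sequence[str], k: int = 3
) -> List[Tuple[str, float]]:
    """Pair each probability with its class name and return the k most likely classes."""
    if len(probs) != len(class_names):
        raise ValueError("probs and class_names must have the same length")
    # sorted() is stable, so equal probabilities keep class_names order.
    ranked = sorted(zip(class_names, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


print(top_k_labels([0.1, 0.6, 0.3], ["STYLE", "POLITICS", "CRIME"], k=2))
# prints [('POLITICS', 0.6), ('CRIME', 0.3)]
```

Applied to the endpoint's response, this might look like `for probs in prediction.predictions: print(top_k_labels(probs, CLASS_NAMES))`, assuming each prediction is a flat list of probabilities as described above.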

## Cleaning up

To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud
project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.

Otherwise, you can delete the individual resources you created in this tutorial:

In [None]:
# Delete GCS bucket.
! gsutil -m rm -r {BUCKET_URI}

# Undeploy all models from the endpoint, then delete the endpoint
# (an endpoint that still has deployed models cannot be deleted).
aiplatform.Endpoint(ENDPOINT_ID).delete(force=True)