In [None]:
# Copyright 2024 Google. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# PaLM 2 Text Bison Reinforcement Learning from Human Feedback

## Overview

This project showcases the use of Reinforcement Learning from Human Feedback (RLHF) to tune the [PaLM 2 Text Bison][bison] model from Vertex AI Model Garden. Learn more about [RLHF tuning][rlhf] on Vertex AI.

[bison]: https://cloud.google.com/vertex-ai/docs/generative-ai/model-reference/text
[rlhf]: https://cloud.google.com/vertex-ai/docs/generative-ai/models/tune-text-models-rlhf 

### Objective

In this tutorial, you will learn to tune and deploy a large language model. To facilitate model development, deployment, and management, the project uses several services, including [Vertex AI][vertex], [Artifact Registry][ar], [Cloud Build][cb], [Cloud Deploy][cd] and [Cloud Storage][cs].
  

[vertex]: https://cloud.google.com/vertex-ai
[ar]: https://cloud.google.com/artifact-registry
[cb]: https://cloud.google.com/build
[cd]: https://cloud.google.com/deploy
[cs]: https://cloud.google.com/storage

### Costs

Learn about pricing for [Vertex AI](https://cloud.google.com/vertex-ai/pricing), [Cloud Storage](https://cloud.google.com/storage/pricing), [Cloud Build](https://cloud.google.com/build/pricing), [Cloud Deploy](https://cloud.google.com/deploy/pricing) and [Artifact Registry](https://cloud.google.com/artifact-registry/pricing). Use the [Pricing Calculator](https://cloud.google.com/products/calculator/)
to generate a cost estimate based on your projected usage.

## Setup

Before executing this notebook, use the [setup](/docs/SETUP.md) guide to configure your project, service account, permissions, storage and more.

## Installation

Install the following packages required to execute this notebook.

In [None]:
! pip3 install --upgrade --quiet -r requirements.txt

### Colab Only

Restart the kernel after installation so that your environment can access the new packages. Uncomment and run the cell below to restart the kernel.

In [None]:
# import IPython

# app = IPython.Application.instance()
# app.kernel.do_shutdown(True)

## Configure variables

### Set Project ID

In [None]:
import google.auth

_, PROJECT_ID = google.auth.default()
print("Project ID: ", PROJECT_ID)

! gcloud config set project {PROJECT_ID}

### Set Region

Set the `REGION` variable; it defaults to `"us-central1"`. For now, the only other supported region is `europe-west4`. Learn more about Vertex AI [regions](https://cloud.google.com/vertex-ai/docs/general/locations).
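A small guard can catch an unsupported region before any resources are created. The sketch below assumes the two regions named above are the only ones available; `SUPPORTED_REGIONS` and `validate_region` are hypothetical names and are not part of this project.

```python
# Hypothetical helper: the region list mirrors the two regions named above
# and may need updating as RLHF tuning becomes available elsewhere.
SUPPORTED_REGIONS = frozenset({"us-central1", "europe-west4"})


def validate_region(region: str) -> str:
    """Return the region unchanged, or raise if it is not in the assumed list."""
    if region not in SUPPORTED_REGIONS:
        raise ValueError(
            f"Unsupported region {region!r}; expected one of {sorted(SUPPORTED_REGIONS)}"
        )
    return region
```

Calling `validate_region(REGION)` right after setting `REGION` would surface a typo immediately instead of partway through the pipeline.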

In [None]:
REGION = "us-central1" # @param {type: "string"}

### (Optional) Update Configuration

If needed, you can update the default configuration for the [tuner](/pkg/tuner/metadata.py) and [predictor](/pkg/tuner/metadata.py) packages.

## Authenticate Google Cloud Account

Depending on your Jupyter environment, you may need to authenticate manually. Follow the relevant instructions below.

**1. Vertex AI Workbench**

No action is needed; you are already authenticated.

**2. Vertex AI Colab**

If you are using Colab, uncomment and run:

In [None]:
# from google.colab import auth
# auth.authenticate_user()

**3. Local**

If you are running the notebook locally, uncomment and run:

In [None]:
# ! gcloud auth login

## Import Libraries

In [None]:
%load_ext autoreload
%autoreload 2

import google.cloud.aiplatform as aiplatform
from kfp import compiler

from pkg.tuner import metadata, parameters, registry, rlhf, steps
from pkg.predictor import online

## Initialize Vertex AI SDK

Initialize the Vertex AI SDK for Python for your project and corresponding bucket.

In [None]:
aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=metadata.BUCKET_URI)

## Move Data to Cloud Storage Bucket

RLHF tuning uses three datasets:

* Prompt [dataset](/data/prompt/shard-00000-of-00002.jsonl) with unlabeled prompts.
* Human preference [dataset](/data/preference/shard-00000-of-00002.jsonl) with prompts labeled with human preferences.
* Evaluation [dataset](/data/evaluation/shard-00000-of-00002.jsonl) with unlabeled prompts for prediction after the model is tuned. This dataset is optional.

Learn more about data for RLHF in the Vertex AI [documentation].

[documentation]: https://cloud.google.com/vertex-ai/docs/generative-ai/models/tune-text-models-rlhf#prepare_rlhf_tuning_datasets
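Before copying the files, it can help to check that each JSONL line carries the expected fields. The following is a minimal sketch that assumes the field names described in the linked documentation (`input_text` for prompts; `input_text`, `candidate_0`, `candidate_1` and `choice` for preferences). The helper `check_record` and the sample record are hypothetical, so verify the schema against your own data.

```python
import json

# Assumed schema, based on the linked documentation; verify before relying on it.
PROMPT_FIELDS = {"input_text"}
PREFERENCE_FIELDS = {"input_text", "candidate_0", "candidate_1", "choice"}


def check_record(line: str, required: set) -> dict:
    """Parse one JSONL line and confirm that the required fields are present."""
    record = json.loads(line)
    missing = required - set(record)
    if missing:
        raise ValueError(f"Missing fields: {sorted(missing)}")
    # In a preference record, `choice` selects candidate_0 or candidate_1.
    if "choice" in required and record["choice"] not in (0, 1):
        raise ValueError("choice must be 0 or 1")
    return record


# Made-up example record, for illustration only.
preference_line = json.dumps({
    "input_text": "How can I use VR in fitness?",
    "candidate_0": "Try rhythm games that track your movement.",
    "candidate_1": "VR is not useful.",
    "choice": 0,
})
record = check_record(preference_line, PREFERENCE_FIELDS)
```

Running the same check over every line of a shard, with `PROMPT_FIELDS` for the prompt and evaluation files, would flag malformed records before the pipeline starts.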

In [None]:
! gsutil -m cp -r data {metadata.BUCKET_URI}

## Compile the RLHF Pipeline

Compile the pipeline into a YAML file that will be submitted to Vertex AI.

In [None]:
compiler.Compiler().compile(
    pipeline_func=rlhf.tune, package_path=metadata.COMPILED_PIPELINE_PATH
)

Upload the compiled pipeline to the Vertex AI Pipeline Registry, which is backed by Artifact Registry.
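Templates stored in a Kubeflow Pipelines repository in Artifact Registry are addressed by a URI that combines region, project, repository, package name and tag. The sketch below only builds such a URI; it does not reproduce what `registry.upload` does internally, and the function names, package name and tag are hypothetical.

```python
def registry_host(region: str, project_id: str, repository: str) -> str:
    """Build the host URL of a KFP repository in Artifact Registry."""
    return f"https://{region}-kfp.pkg.dev/{project_id}/{repository}"


def template_uri(host: str, package_name: str, tag: str = "latest") -> str:
    """Build the URI of one tagged pipeline template under the given host."""
    return f"{host}/{package_name}/{tag}"


# Placeholder values for illustration; substitute your own project and names.
host = registry_host("us-central1", "my-project", "my-pipelines")
uri = template_uri(host, "rlhf-tune", "v1")
```

A URI of this shape is what a pipeline job's `template_path` can point to once the template has been uploaded.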

In [None]:
PIPELINE_TEMPLATE_URI = registry.upload(
    project_id=PROJECT_ID,
    region=REGION,
    pipeline_registry=metadata.PIPELINE_REGISTRY,
    compiled_pipeline_path=metadata.COMPILED_PIPELINE_PATH,
)

## Calculate Training Steps

Choose suitable values for the number of reward model and reinforcement learning training steps to avoid overfitting. The right values depend on the size of the datasets. If needed, configure the values in [metadata.py](/pkg/tuner/metadata.py).
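One common way to relate training steps to dataset size is to fix a number of epochs and a batch size, then derive the step count. This is a rough sketch of that arithmetic, not the logic of the `steps` module: the batch size of 64 and the epoch counts are assumptions, and `train_steps` is a hypothetical helper.

```python
import math


def train_steps(num_examples: int, epochs: int, batch_size: int = 64) -> int:
    """Steps needed to pass over `num_examples` for `epochs` epochs."""
    if min(num_examples, epochs, batch_size) <= 0:
        raise ValueError("num_examples, epochs and batch_size must be positive")
    # Each step consumes one batch, so round up to cover the final partial batch.
    return math.ceil(num_examples * epochs / batch_size)


# Illustrative numbers only: 5,000 preference examples trained for 30 epochs.
reward_steps = train_steps(num_examples=5000, epochs=30)
```

Lowering the epoch count for a large dataset, or raising it for a small one, keeps the number of passes over the data in a range that is less likely to overfit.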

In [None]:
REWARD_MODEL_TRAIN_STEPS = steps.get_reward_model_train_steps()

REINFORCEMENT_LEARNING_TRAIN_STEPS = steps.get_reinforcement_learning_train_steps()

## Construct the Pipeline Job and Run on Vertex AI

Define a pipeline job from the pipeline template that was uploaded to the registry earlier.


In [None]:
job = aiplatform.PipelineJob(
    display_name=metadata.MODEL_DISPLAY_NAME,
    pipeline_root=metadata.PIPELINE_ROOT,
    template_path=PIPELINE_TEMPLATE_URI,
    parameter_values=parameters.get_values(
        preference_dataset=metadata.PREFERENCE_DATASET,
        prompt_dataset=metadata.PROMPT_DATASET,
        eval_dataset=metadata.EVALUATION_DATASET,
        reward_model_train_steps=REWARD_MODEL_TRAIN_STEPS,
        reinforcement_learning_train_steps=REINFORCEMENT_LEARNING_TRAIN_STEPS
    )
)

Run the pipeline on Vertex AI. Note that this will take about 2 hours.

In [None]:
job.run()

## Get Predictions

Make predictions using either [Online Prediction](https://cloud.google.com/vertex-ai/docs/predictions/get-online-predictions) or [Batch Prediction](https://cloud.google.com/vertex-ai/docs/predictions/get-batch-predictions).

First, get the endpoint resource name.
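The lookup in the next cell reads the endpoint from a fixed position in the `gcloud` output, which fails with an `IndexError` if no model has been deployed yet. A more defensive parser could look like the sketch below. It assumes the flattened output consists of `key: value` lines preceded by a `---` separator, and `parse_endpoint` is a hypothetical helper.

```python
def parse_endpoint(lines):
    """Return the first value whose key ends in 'endpoint', or raise."""
    for line in lines:
        key, sep, value = line.partition(":")
        if sep and key.strip().endswith("endpoint") and value.strip():
            return value.strip()
    raise LookupError("No deployed endpoint found in the gcloud output")


# Assumed shape of the flattened output; the resource name is a placeholder.
sample_output = [
    "---",
    "deployedModels[0].endpoint: projects/123/locations/us-central1/endpoints/456",
]
endpoint = parse_endpoint(sample_output)
```

Raising a descriptive error makes it clearer that the pipeline has not finished deploying the model, as opposed to a parsing problem.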

In [None]:
stdout = ! gcloud ai models list --project=$PROJECT_ID --region=$REGION --filter="DISPLAY_NAME: $metadata.MODEL_DISPLAY_NAME" --sort-by=~creationTimestamp --limit=1 --format="flattened(deployedModels[0].endpoint)" 2>/dev/null
ENDPOINT_RESOURCE = stdout[1].split()[1]
print("Endpoint resource:", ENDPOINT_RESOURCE)

### Online Predictions 

Invoke the endpoint where the model is deployed to make predictions for some test instances.
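For orientation, a text model endpoint is typically called with a JSON body that holds a list of instances and a set of generation parameters. The sketch below assembles such a body. It is an assumption about the request shape, not a description of what `online.send` does, and `build_request` and its defaults are hypothetical.

```python
def build_request(prompt: str, temperature: float = 0.2, max_output_tokens: int = 256) -> dict:
    """Assemble an assumed request body for a text generation endpoint."""
    return {
        # One instance per prompt; this sketch sends a single prompt.
        "instances": [{"prompt": prompt}],
        # Generation settings; the defaults here are illustrative only.
        "parameters": {
            "temperature": temperature,
            "maxOutputTokens": max_output_tokens,
        },
    }


request = build_request("How can I use VR in fitness?")
```

Lower temperatures make the output more deterministic, which is convenient when comparing the tuned model against the base model on the same prompt.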

In [None]:
PROMPT = "How can I use VR in fitness?"

online.send(
    resource=ENDPOINT_RESOURCE,
    prompt=PROMPT
)

### Batch Predictions

Coming soon...