## Tune an LLM with RLHF

The RLHF training process is implemented as a machine learning pipeline in the Google Cloud Pipeline Components library. The pipeline can run on any platform that supports KubeFlow Pipelines (an open source framework), including Google Cloud's Vertex AI Pipelines.


Required packages:

* google-cloud-pipeline-components
* kfp (KubeFlow Pipelines)
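
Before compiling, it can help to confirm that both packages are importable. The following is a minimal sketch, not part of either library: `missing_packages` is a hypothetical helper, and the mapping from pip names to import names is written out by hand.

```python
from importlib.util import find_spec

# pip name -> import name for the two packages this notebook needs.
REQUIRED_MODULES = {
    "google-cloud-pipeline-components": "google_cloud_pipeline_components",
    "kfp": "kfp",
}


def missing_packages(required=REQUIRED_MODULES):
    """Return the pip names of packages whose import module cannot be found."""
    return [
        pip_name
        for pip_name, module_name in required.items()
        if find_spec(module_name) is None
    ]


missing = missing_packages()
if missing:
    print("Missing packages; install with: pip install " + " ".join(missing))
else:
    print("All required packages are installed.")
```

The check only looks up the module on the import path; it does not verify versions.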


#### Using Vertex AI from GCP

### Compile the pipeline

In [2]:
# RLHF is currently in preview
from google_cloud_pipeline_components.preview.llm \
import rlhf_pipeline

In [1]:
# KubeFlow pipelines
from kfp import compiler

In [3]:
# path to the yaml file
RLHF_PIPELINE_PKG_PATH = "rlhf_pipeline.yaml"

In [4]:
# compile function from kfp.compiler.Compiler().compile()
compiler.Compiler().compile(
    pipeline_func=rlhf_pipeline,  # the pipeline we are using 
    package_path=RLHF_PIPELINE_PKG_PATH  # to store the yaml file, no need to edit this file
)

In [5]:
!head rlhf_pipeline.yaml  # printing first few lines (head) of yaml file

# PIPELINE DEFINITION
# Name: rlhf-train-template
# Description: Performs reinforcement learning from human feedback.
# Inputs:
#    deploy_model: bool [Default: True]
#    eval_dataset: str
#    instruction: str
#    kl_coeff: float [Default: 0.1]
#    large_model_reference: str
#    location: str [Default: '{{$.pipeline_google_cloud_location}}']


**Note**: to print the whole YAML file, use the following:
```Python
!cat rlhf_pipeline.yaml
```
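
The inputs listed in the YAML header can also be read programmatically. The sketch below assumes the header keeps the comment layout shown in the `head` output above; `parse_pipeline_inputs` is a hypothetical helper, not a `kfp` function, and the sample text is copied from that output.

```python
import re

# Matches lines such as "#    kl_coeff: float [Default: 0.1]".
INPUT_LINE = re.compile(r"^#\s+(\w+):\s+(\w+)(?:\s+\[Default:\s+(.*)\])?\s*$")


def parse_pipeline_inputs(header_text):
    """Return {name: (type, default_or_None)} from the '# Inputs:' comment block."""
    inputs = {}
    in_inputs = False
    for line in header_text.splitlines():
        if line.strip() == "# Inputs:":
            in_inputs = True
            continue
        if in_inputs:
            match = INPUT_LINE.match(line)
            if not match:
                break  # first non-matching line ends the inputs block
            name, type_name, default = match.groups()
            inputs[name] = (type_name, default)
    return inputs


sample_header = """\
# PIPELINE DEFINITION
# Name: rlhf-train-template
# Description: Performs reinforcement learning from human feedback.
# Inputs:
#    deploy_model: bool [Default: True]
#    eval_dataset: str
#    instruction: str
#    kl_coeff: float [Default: 0.1]
#    large_model_reference: str
#    location: str [Default: '{{$.pipeline_google_cloud_location}}']
"""

for name, (type_name, default) in parse_pipeline_inputs(sample_header).items():
    print(f"{name}: {type_name}, default={default}")
```

Inputs without a `[Default: ...]` suffix are reported with a default of `None`, which is a quick way to spot the values that have to be supplied.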

## Vertex AI pipeline job

#### Location of the training and evaluation data and other params
The datasets are stored in Google Cloud Storage (GCS) in JSON Lines format. All datasets should be stored in the same GCS bucket.
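
JSON Lines means one JSON object per line. As a minimal sketch of that format, the helper below parses JSON Lines text and rejects lines that are not objects. `read_jsonl` is a hypothetical helper, and the field names in the sample (`text`, `label`) are illustrative assumptions, not the schema the pipeline expects.

```python
import json


def read_jsonl(text):
    """Parse JSON Lines text: one JSON object per non-empty line."""
    records = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        if not isinstance(record, dict):
            raise ValueError(f"line {line_number} is not a JSON object")
        records.append(record)
    return records


# Hypothetical records; the field names are placeholders for illustration only.
sample = "\n".join([
    json.dumps({"text": "first example", "label": 0}),
    json.dumps({"text": "second example", "label": 1}),
])

records = read_jsonl(sample)
print(len(records), sorted(records[0]))
```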

### Calculate the number of reward model training steps

- The number of reward model training steps depends on the size of the preference dataset; training for roughly 20-30 epochs is a reasonable starting point.

$$ \text{stepsPerEpoch} = \left\lceil \frac{\text{datasetSize}}{\text{batchSize}} \right\rceil $$
$$ \text{trainSteps} = \text{stepsPerEpoch} \times \text{numEpochs} $$

* The dataset is divided into batches, and each training step processes one batch.
* The number of training steps is therefore the total number of batches processed across all epochs.



In [15]:
# Preference dataset size
PREF_DATASET_SIZE = 3000

In [16]:
BATCH_SIZE = 64  # it is fixed, can't change

In [17]:
import math

In [18]:
REWARD_STEPS_PER_EPOCH = math.ceil(PREF_DATASET_SIZE / BATCH_SIZE)
print(REWARD_STEPS_PER_EPOCH)

47


In [19]:
REWARD_NUM_EPOCHS = 30

In [20]:
# Calculate number of steps in the reward model training
reward_model_train_steps = REWARD_STEPS_PER_EPOCH * REWARD_NUM_EPOCHS

In [21]:
print(reward_model_train_steps)   # total reward model training steps

1410


### Calculate the number of reinforcement learning training steps

- Reward hacking: if trained for too many steps, the policy model may find a way to exploit the reward model and exhibit undesired behavior, so the number of reinforcement learning steps should be kept moderate.
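
The pipeline's `kl_coeff` input relates to this problem. A common way to describe a KL penalty in RLHF is to subtract a scaled estimate of how far the policy has drifted from the reference model from the reward. The sketch below illustrates that idea only; the function name and the per-sample log-probability estimate are assumptions, not the pipeline's documented implementation.

```python
def penalized_reward(reward, policy_logprob, reference_logprob, kl_coeff=0.1):
    """Reward minus a KL-style penalty for drifting from the reference model.

    The penalty uses the per-sample log-probability difference as a simple
    KL estimate. This is an illustrative assumption, not the pipeline's code.
    """
    kl_estimate = policy_logprob - reference_logprob
    return reward - kl_coeff * kl_estimate


# Same reward and drift, two different penalty strengths.
print(penalized_reward(reward=2.0, policy_logprob=-1.0, reference_logprob=-3.0))
print(penalized_reward(reward=2.0, policy_logprob=-1.0, reference_logprob=-3.0,
                       kl_coeff=0.5))
```

With a larger coefficient, the same amount of drift costs more reward, which discourages the policy from moving far from the base model just to score well.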

In [22]:
# Prompt dataset size
PROMPT_DATASET_SIZE = 2000

In [23]:
# Batch size is fixed at 64
BATCH_SIZE = 64

In [24]:
RL_STEPS_PER_EPOCH = math.ceil(PROMPT_DATASET_SIZE / BATCH_SIZE)
print(RL_STEPS_PER_EPOCH)

32


In [25]:
RL_NUM_EPOCHS = 10

In [26]:
# Calculate the number of steps in the RL training
reinforcement_learning_train_steps = RL_STEPS_PER_EPOCH * RL_NUM_EPOCHS

In [27]:
print(reinforcement_learning_train_steps)

320
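
Both calculations above follow the same pattern, so they can be wrapped in one function. This is a minimal sketch; `compute_train_steps` is a hypothetical helper, and the default batch size of 64 is taken from the cells above.

```python
import math


def compute_train_steps(dataset_size, num_epochs, batch_size=64):
    """Return ceil(dataset_size / batch_size) * num_epochs."""
    if dataset_size <= 0 or num_epochs <= 0 or batch_size <= 0:
        raise ValueError("dataset_size, num_epochs and batch_size must be positive")
    steps_per_epoch = math.ceil(dataset_size / batch_size)
    return steps_per_epoch * num_epochs


# Reward model: 3000 preference examples, 30 epochs.
print(compute_train_steps(3000, 30))  # 1410
# Reinforcement learning: 2000 prompts, 10 epochs.
print(compute_train_steps(2000, 10))  # 320
```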



- The `instruction` parameter should be the same instruction used in the preference dataset.

In [33]:
# Completed values for the dictionary
parameter_values = {
    # Path to the preference dataset
    "preference_dataset": "gs://vertex-ai/generative-ai/rlhf/text_small/summarize_from_feedback_tfds/comparisons/train/*.jsonl",
    # Path to the prompt dataset
    "prompt_dataset": "gs://vertex-ai/generative-ai/rlhf/text_small/reddit_tfds/train/*.jsonl",
    # An evaluation dataset can also be provided
    "eval_dataset": "gs://vertex-ai/generative-ai/rlhf/text_small/reddit_tfds/val/*.jsonl",
    # The base LLM to tune
    "large_model_reference": "llama-2-7b",
    # Training steps calculated above
    "reward_model_train_steps": 1410,
    "reinforcement_learning_train_steps": 320,
    "reward_model_learning_rate_multiplier": 1.0,
    "reinforcement_learning_rate_multiplier": 1.0,
    # Larger values penalize drift from the base model, which helps limit reward hacking
    "kl_coeff": 0.1,
    "instruction": "Summarize in less than 50 words",
}

In [40]:
# Open the file in read mode
with open("utils.py", "r") as file:
    # Read the lines from the file
    lines = file.readlines()
    
    # Count the number of lines
    num_lines = len(lines)
    
    # Print the number of lines
    print(f"The file 'utils' has {num_lines} lines.")


The file 'utils' has 36 lines.


In [34]:
# Authenticate in utils
from utils import authenticate
credentials, PROJECT_ID, STAGING_BUCKET = authenticate()

REGION = "europe-west4"  # RLHF pipeline available in this region

In [43]:
import inspect

source_code = inspect.getsource(authenticate)
print(source_code)

def authenticate():
    #Load .env
    load_dotenv()
    #DLAI Custom Key
    return "DLAI_CREDENTIALS", "DLAI_PROJECT", "gs://gcp-sc2-rlhf"
    
    #Decode key and store in .JSON
    SERVICE_ACCOUNT_KEY_STRING_B64 = os.getenv('SERVICE_ACCOUNT_KEY')
    SERVICE_ACCOUNT_KEY_BYTES_B64 = SERVICE_ACCOUNT_KEY_STRING_B64.encode("ascii")
    SERVICE_ACCOUNT_KEY_STRING_BYTES = base64.b64decode(SERVICE_ACCOUNT_KEY_BYTES_B64)
    SERVICE_ACCOUNT_KEY_STRING = SERVICE_ACCOUNT_KEY_STRING_BYTES.decode("ascii")

    SERVICE_ACCOUNT_KEY = json.loads(SERVICE_ACCOUNT_KEY_STRING)


    # Create credentials based on key from service account
    # Make sure your account has the roles listed in the Google Cloud Setup section
    credentials = Credentials.from_service_account_info(
        SERVICE_ACCOUNT_KEY,
        scopes=['https://www.googleapis.com/auth/cloud-platform'])

    if credentials.expired:
        credentials.refresh(Request())
    
    #Set project ID according to environment variable    
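
In the printed source, the early `return` means the decoding steps after it never run in this environment. Those steps can still be exercised on their own. The sketch below mirrors the base64-then-JSON decoding with a fake placeholder key; `decode_service_account_key` is a hypothetical helper, and the key contents are invented for illustration.

```python
import base64
import json


def decode_service_account_key(key_b64):
    """Decode a base64-encoded service account key string into a dict."""
    key_json = base64.b64decode(key_b64.encode("ascii")).decode("ascii")
    return json.loads(key_json)


# Fake placeholder key for illustration only; a real key has more fields.
fake_key = {"type": "service_account", "project_id": "example-project"}
encoded = base64.b64encode(json.dumps(fake_key).encode("ascii")).decode("ascii")

decoded = decode_service_account_key(encoded)
print(decoded["project_id"])
```

The resulting dictionary is the kind of value that `Credentials.from_service_account_info` receives in the printed source.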
  

In [46]:
import utils

members = inspect.getmembers(utils)

for name, member in members:
    if inspect.isfunction(member):
        print(f"Function: {name}")
        print(inspect.getsource(member))
        print("-" * 40)  # Separator between functions

    elif inspect.isclass(member):
        print(f"Class: {name}")
        print(inspect.getsource(member))
        print("-" * 40)  # Separator between classes

Class: Credentials
class Credentials(
    credentials.Signing,
    credentials.Scoped,
    credentials.CredentialsWithQuotaProject,
    credentials.CredentialsWithTokenUri,
):
    """Service account credentials

    Usually, you'll create these credentials with one of the helper
    constructors. To create credentials using a Google service account
    private key JSON file::

        credentials = service_account.Credentials.from_service_account_file(
            'service-account.json')

    Or if you already have the service account file loaded::

        service_account_info = json.load(open('service_account.json'))
        credentials = service_account.Credentials.from_service_account_info(
            service_account_info)

    Both helper methods pass on arguments to the constructor, so you can
    specify additional scopes and a subject if necessary::

        credentials = service_account.Credentials.from_service_account_file(
            'service-account.json',
            sco

## Run the pipeline job on Vertex AI

Now that the dictionary of parameter values is complete, we can create a `PipelineJob`. The RLHF pipeline then executes on Vertex AI: it does not run locally in the notebook, but on Google Cloud servers.

In [47]:
import google.cloud.aiplatform as aiplatform

In [48]:
aiplatform.init(project=PROJECT_ID,
                location=REGION,
                credentials=credentials)

In [49]:
# Look at the path for the YAML file
RLHF_PIPELINE_PKG_PATH

'rlhf_pipeline.yaml'

- This job takes about a full day to run with multiple accelerators (TPUs/GPUs)

In [None]:
job = aiplatform.PipelineJob(
    display_name="tutorial-rlhf-tuning",
    pipeline_root=STAGING_BUCKET,
    template_path=RLHF_PIPELINE_PKG_PATH,
    parameter_values=parameter_values)

job.run()

In [50]:
!ls

'Fine Tune LLM with RLHF.ipynb'   __pycache__   rlhf_pipeline.yaml   utils.py
