# Reinforcement learning with GRPO - Amazon SageMaker AI

---
In this notebook, we run reinforcement learning on an LLM with Amazon SageMaker AI, using PyTorch, the Hugging Face TRL implementation of GRPO, DDP, and QLoRA. A SageMaker ModelTrainer executes the training job.
---

## Prerequisites

In [None]:
%pip install -r ./scripts/requirements.txt --upgrade

***

## Set up the MLflow configuration

If you have created a Managed MLflow tracking server, copy its ARN into `mlflow_uri` below and assign a name to the experiment.

In [None]:
import os

os.environ["mlflow_uri"] = ""
os.environ["mlflow_experiment_name"] = "arcee-lite-1.5b-grpo"

***

## Visualize and upload the dataset

In [None]:
import sagemaker

In [None]:
sagemaker_session = sagemaker.Session()
bucket_name = sagemaker_session.default_bucket()
default_prefix = sagemaker_session.default_bucket_prefix

In [None]:
from datasets import load_dataset
import pandas as pd

dataset = load_dataset("w601sxs/processed_simpleCoT_b1ade", split="train[:1%]")
df = pd.DataFrame(dataset)

df.head()

We build the chat-formatted dataset for the RL training loop.

In [None]:
def prepare_dataset(example):
    # System prompt asking the model to reason in <think> and answer in <answer>
    SYSTEM_PROMPT = """
    Respond in the following format:
    <think>
    ...
    </think>
    <answer>
    ...
    </answer>
    """

    # Drop the instruction preamble and keep only the context/question part
    tmp_prompt = example["prompt"].split(
        "You are a useful AI assistant. Use the provided context to provide a rationale, and then answer the question that follows"
    )[-1]

    # Extract the text enclosed in "<...>" for the context and the question
    context = tmp_prompt.split("context: <")[-1].split(">\n")[0]
    question = tmp_prompt.split("question: <")[-1].split(">\n")[0]

    row = {}
    row["question"] = question
    # Chat-format prompt: system instructions, then context and question as the user turn
    row["prompt"] = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": context + question},
    ]

    # Reference answer, taken from the "answer: <...>" part of the completion
    row["answer"] = example["completion"].split("answer: <")[-1].split(">")[0]

    return row
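To make the string handling in `prepare_dataset` concrete, the sketch below applies the same splits to a hand-written record. The record layout (`context: <...>`, `question: <...>`, `answer: <...>`) is an assumption inferred from the split markers, not a sample taken from the dataset, and `parse_record` is a hypothetical helper name.

```python
# Instruction prefix, copied from the split marker used in prepare_dataset.
INSTRUCTION = (
    "You are a useful AI assistant. Use the provided context to provide "
    "a rationale, and then answer the question that follows"
)


def parse_record(prompt: str, completion: str) -> dict:
    """Hypothetical helper mirroring the splits in prepare_dataset."""
    body = prompt.split(INSTRUCTION)[-1]
    context = body.split("context: <")[-1].split(">\n")[0]
    question = body.split("question: <")[-1].split(">\n")[0]
    answer = completion.split("answer: <")[-1].split(">")[0]
    # As in prepare_dataset, context and question are joined without a separator.
    return {"user": context + question, "question": question, "answer": answer}


# Hand-written record in the assumed layout (not real dataset content).
sample_prompt = (
    INSTRUCTION
    + "\ncontext: <Paris is the capital of France.>"
    + "\nquestion: <What is the capital of France?>\n"
)
sample_completion = "rationale: <The context names the capital.>\nanswer: <Paris>"

parsed = parse_record(sample_prompt, sample_completion)
print(parsed)
```

Running this on a few real records before mapping the whole dataset is a cheap way to confirm that the markers match the actual data.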

Convert the DataFrame into a Hugging Face `Dataset`, apply `prepare_dataset` to every record, and print a random example to check the result. The original `completion` column is then dropped, since the reference answer is now stored in `answer`.

In [None]:
from datasets import Dataset
from random import randint

train_dataset = Dataset.from_pandas(df)

train_dataset = train_dataset.map(prepare_dataset)

# randint is inclusive on both ends, so the upper bound must be len - 1
print(train_dataset[randint(0, len(train_dataset) - 1)])

train_dataset = train_dataset.remove_columns("completion")
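GRPO scores each sampled completion with one or more reward functions. The training script is not shown here, so the following is only a sketch of two rewards that would fit the `<think>`/`<answer>` system prompt defined above: one for format and one for exact-match correctness against the `answer` column. The function names, the reward values, and the single-string signatures are assumptions. In TRL, reward functions receive batches of completions, so a real implementation would wrap this per-sample logic accordingly.

```python
import re

# Assumed pattern: a <think> block followed by an <answer> block, as requested
# by the system prompt. DOTALL lets "." span multiple lines.
FORMAT_RE = re.compile(
    r"^\s*<think>.*?</think>\s*<answer>(.*?)</answer>\s*$", re.DOTALL
)


def format_reward(completion: str) -> float:
    """Return 1.0 if the completion follows the requested layout, else 0.0."""
    return 1.0 if FORMAT_RE.match(completion) else 0.0


def correctness_reward(completion: str, answer: str) -> float:
    """Return 2.0 if the extracted answer equals the reference, else 0.0."""
    match = FORMAT_RE.match(completion)
    if match is None:
        return 0.0
    return 2.0 if match.group(1).strip() == answer.strip() else 0.0


# Illustrative completions (not model output).
good = "<think>\nThe context names the capital.\n</think>\n<answer>\nParis\n</answer>"
bad = "The answer is Paris."

print(format_reward(good), correctness_reward(good, "Paris"))
print(format_reward(bad), correctness_reward(bad, "Paris"))
```

Splitting format and correctness into separate rewards lets the model earn partial credit for following the layout even when the answer is wrong.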

### Upload to Amazon S3

In [None]:
import boto3
import shutil
import sagemaker

In [None]:
sagemaker_session = sagemaker.Session()
s3_client = boto3.client('s3')

bucket_name = sagemaker_session.default_bucket()
default_prefix = sagemaker_session.default_bucket_prefix

In [None]:
# Build the S3 key prefix for the dataset, honoring the session's default prefix if set
if default_prefix:
    input_path = f"{default_prefix}/datasets/llm-rl-modeltrainer-grpo"
else:
    input_path = "datasets/llm-rl-modeltrainer-grpo"

# Save the dataset locally as JSON records, then upload it to S3
# We will fine-tune with only 20 records due to the limited compute resources for the workshop
train_dataset.to_json("./data/train/dataset.json", orient="records")

s3_client.upload_file("./data/train/dataset.json", bucket_name, f"{input_path}/train/dataset.json")
train_dataset_s3_path = f"s3://{bucket_name}/{input_path}/train/dataset.json"

shutil.rmtree("./data")

print("Training data uploaded to:")
print(train_dataset_s3_path)
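The "with or without default prefix" branch recurs several times in this notebook. As an optional alternative, it could be factored into a small helper. The sketch below uses hypothetical names (`build_s3_key`, `build_s3_uri`, and the bucket `example-bucket`) and only the standard library.

```python
from typing import Optional


def build_s3_key(default_prefix: Optional[str], *parts: str) -> str:
    """Join an optional bucket prefix and path parts into a single S3 key."""
    segments = [default_prefix, *parts] if default_prefix else list(parts)
    # Strip stray slashes so segments are joined by exactly one "/".
    return "/".join(s.strip("/") for s in segments if s and s.strip("/"))


def build_s3_uri(bucket: str, key: str) -> str:
    """Format a bucket name and key as an s3:// URI."""
    return f"s3://{bucket}/{key}"


key = build_s3_key(None, "datasets", "llm-rl-modeltrainer-grpo", "train", "dataset.json")
print(build_s3_uri("example-bucket", key))
```

Passing `sagemaker_session.default_bucket_prefix` as the first argument would cover both branches of the `if default_prefix:` check in one call.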

***

## Model fine-tuning

We are now ready to fine-tune our model using GRPO as the RL technique. We will use the [GRPOTrainer](https://huggingface.co/docs/trl/main/en/grpo_trainer) from Hugging Face TRL to execute the training workload. We prepared a script, [train.py](./scripts/train.py), which loads the dataset from disk, prepares the model and tokenizer, and starts the training.
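The core idea of GRPO (Group Relative Policy Optimization) is that several completions are sampled for each prompt, and each one is scored relative to the rest of its group rather than by a learned value model. A minimal sketch of that normalization, in plain Python and with made-up reward values, is shown below. `group_advantages` and the `eps` constant are illustrative names, and the actual `GRPOTrainer` implementation may differ in details such as the standard-deviation estimator.

```python
from statistics import mean, pstdev


def group_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """Normalize rewards within one group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up rewards for four completions sampled from the same prompt.
rewards = [3.0, 1.0, 0.0, 0.0]
advantages = group_advantages(rewards)
print([round(a, 3) for a in advantages])
```

Completions that score above the group mean receive a positive advantage and are reinforced. If every completion in a group gets the same reward, all advantages are zero and that prompt contributes no learning signal.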

For configuration we use `TrlParser`, which allows us to provide hyperparameters in a `yaml` file. This file will be uploaded and provided to Amazon SageMaker in the same way as our dataset. We save the config file as `args.yaml` and upload it to S3.

In [None]:
%%bash

cat > ./args.yaml <<EOF
model_id: "arcee-ai/arcee-lite"       # Hugging Face model id
mlflow_uri: "${mlflow_uri}"
mlflow_experiment_name: "${mlflow_experiment_name}"
# sagemaker specific parameters
output_dir: "/opt/ml/model"                       # local path whose contents SageMaker uploads to S3 as the model artifact
checkpoint_dir: "/opt/ml/checkpoints/"
train_dataset_path: "/opt/ml/input/data/train/"   # path where SageMaker places the train channel data
save_steps: 50                                    # save a checkpoint every 50 steps
# training parameters
lora_r: 8
lora_alpha: 16
lora_dropout: 0.1
learning_rate: 2e-4                    # learning rate
num_train_epochs: 1                    # number of training epochs
per_device_train_batch_size: 2         # batch size per device during training
per_device_eval_batch_size: 2          # batch size for evaluation
gradient_accumulation_steps: 2         # number of steps before performing a backward/update pass
gradient_checkpointing: true           # use gradient checkpointing
bf16: true                             # use bfloat16 precision
tf32: false                            # use tf32 precision
use_vllm: false                        # use vllm for inference
temperature: 0.2                       # Temperature for sampling
top_p: 0.9                             # Top-p for nucleus sampling
merge_weights: true                    # merge weights in the base model
EOF
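Two quantities implied by this configuration are worth working out. The sketch below computes the number of per-device samples that contribute to each optimizer step, and the LoRA scaling factor `lora_alpha / lora_r`. It assumes 8 GPUs, which is what a single `ml.p4d.24xlarge` instance provides, and one training process per GPU.

```python
# Values copied from args.yaml above.
per_device_train_batch_size = 2
gradient_accumulation_steps = 2
lora_r = 8
lora_alpha = 16

# Assumption: one ml.p4d.24xlarge instance with 8 GPUs, one process per GPU.
num_gpus = 8
instance_count = 1

world_size = num_gpus * instance_count
effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * world_size
)
lora_scaling = lora_alpha / lora_r  # factor applied to the low-rank update

print(f"effective batch size: {effective_batch_size}")
print(f"LoRA scaling factor: {lora_scaling}")
```

Changing the instance type or count changes `world_size`, and with it the effective batch size, so the learning rate may need to be revisited in that case.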

Let's upload the config file to S3.

In [None]:
from sagemaker.s3 import S3Uploader

if default_prefix:
    input_path = (
        f"s3://{bucket_name}/{default_prefix}/datasets/llm-rl-modeltrainer-grpo"
    )
else:
    input_path = f"s3://{bucket_name}/datasets/llm-rl-modeltrainer-grpo"

# upload the model yaml file to s3
model_yaml = "args.yaml"
train_config_s3_path = S3Uploader.upload(local_path=model_yaml, desired_s3_uri=f"{input_path}/config")

print("Training config uploaded to:")
print(train_config_s3_path)

## Fine-tune the model

The ModelTrainer below trains the model with QLoRA, merges the adapter into the base model, and saves the result in S3.

### Get PyTorch image_uri

We are going to use the native PyTorch container image, pre-built for Amazon SageMaker.

In [None]:
import sagemaker
from sagemaker.config import load_sagemaker_config

In [None]:
sagemaker_session = sagemaker.Session()

bucket_name = sagemaker_session.default_bucket()
default_prefix = sagemaker_session.default_bucket_prefix
configs = load_sagemaker_config()

In [None]:
instance_type = "ml.p4d.24xlarge" # Override the instance type if you want to get a different container version
instance_count = 1

instance_type

In [None]:
image_uri = sagemaker.image_uris.retrieve(
    framework="pytorch",
    region=sagemaker_session.boto_session.region_name,
    version="2.6.0",
    instance_type=instance_type,
    image_scope="training"
)

image_uri

In [None]:
model_id = "arcee-ai/arcee-lite"

In [None]:
from sagemaker.modules.configs import (
    CheckpointConfig,
    Compute,
    OutputDataConfig,
    SourceCode,
    StoppingCondition,
)
from sagemaker.modules.distributed import Torchrun
from sagemaker.modules.train import ModelTrainer

role = sagemaker.get_execution_role()

# Define the script to be run
source_code = SourceCode(
    source_dir="./scripts",
    requirements="requirements.txt",
    entry_script="train.py",
)

# Define the compute
compute_configs = Compute(
    instance_type=instance_type,
    instance_count=instance_count,
    keep_alive_period_in_seconds=0,
)

# define Training Job Name
job_name = f"train-{model_id.split('/')[-1].replace('.', '-')}-grpo"

# define OutputDataConfig path
if default_prefix:
    output_path = f"s3://{bucket_name}/{default_prefix}/{job_name}"
else:
    output_path = f"s3://{bucket_name}/{job_name}"

# Define the ModelTrainer
model_trainer = ModelTrainer(
    training_image=image_uri,
    source_code=source_code,
    base_job_name=job_name,
    compute=compute_configs,
    distributed=Torchrun(),
    role=role,
    stopping_condition=StoppingCondition(max_runtime_in_seconds=7200),
    hyperparameters={
        "config": "/opt/ml/input/data/config/args.yaml"  # path to TRL config which was uploaded to s3
    },
    output_data_config=OutputDataConfig(s3_output_path=output_path),
    checkpoint_config=CheckpointConfig(
        s3_uri=output_path + "/checkpoint", local_path="/opt/ml/checkpoints"
    ),
)
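SageMaker training job names may contain only alphanumeric characters and hyphens and are limited to 63 characters, which is why dots in the model name are replaced when `job_name` is built above. The following sketch checks a candidate name against that rule. `is_valid_job_name` is a hypothetical helper, and the check does not account for any suffix added to `base_job_name` when the job is created.

```python
import re

# Pattern documented for TrainingJobName in the SageMaker API:
# alphanumerics separated by hyphens, 1 to 63 characters.
JOB_NAME_RE = re.compile(r"[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}")


def is_valid_job_name(name: str) -> bool:
    """Hypothetical helper: check a name against the training job name rule."""
    return JOB_NAME_RE.fullmatch(name) is not None


model_id = "arcee-ai/arcee-lite"
job_name = f"train-{model_id.split('/')[-1].replace('.', '-')}-grpo"

print(job_name, is_valid_job_name(job_name))
print(is_valid_job_name("train-model.v1.5-grpo"))  # dots are rejected
```

Running such a check before calling `train` surfaces naming problems locally instead of as an API validation error.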

In [None]:
from sagemaker.modules.configs import InputData

# Pass the input data
train_input = InputData(
    channel_name="train",
    data_source=train_dataset_s3_path, # S3 path where training data is stored
)

config_input = InputData(
    channel_name="config",
    data_source=train_config_s3_path, # S3 path where the training config is stored
)

# Check input channels configured
data = [train_input, config_input]
data
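Each input channel is made available inside the training container under `/opt/ml/input/data/<channel_name>`. This is why the `config` hyperparameter points to `/opt/ml/input/data/config/args.yaml` and why `train_dataset_path` in `args.yaml` points to `/opt/ml/input/data/train/`. A small sketch of that mapping, using a hypothetical `channel_path` helper:

```python
from posixpath import join

# Base directory where SageMaker places input channels in the container.
INPUT_DATA_DIR = "/opt/ml/input/data"


def channel_path(channel_name: str, filename: str = "") -> str:
    """Hypothetical helper: container path for a channel (and optional file)."""
    # An empty filename leaves a trailing slash, i.e. the channel directory.
    return join(INPUT_DATA_DIR, channel_name, filename)


print(channel_path("train"))
print(channel_path("config", "args.yaml"))
```

If a channel is renamed, the matching path in `args.yaml` or in the `hyperparameters` dictionary has to be updated as well.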

In [None]:
# Start the training job with the uploaded dataset and config as input;
# wait=False returns immediately instead of blocking until the job completes
model_trainer.train(input_data_config=data, wait=False)