# Direct Preference Optimization (DPO) Training with SageMaker

This notebook demonstrates how to use the DPOTrainer to fine-tune large language models with Direct Preference Optimization (DPO). DPO aligns a model with human preferences by training directly on preference data, that is, pairs of chosen and rejected responses to the same prompt, without requiring a separate reward model.
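To make the objective concrete, the sketch below computes the DPO loss for a single preference pair from summed response log-probabilities under the trained policy and a frozen reference model. It follows the standard DPO formulation, the negative log-sigmoid of the scaled difference between the two log-ratios. The function name, the argument names, the sample log-probabilities, and `beta=0.1` are illustrative assumptions, not part of the DPOTrainer API.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (prompt, chosen, rejected) triple.

    All inputs are summed token log-probabilities of a full response.
    """
    # How much more the policy favors each response than the reference does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) rewritten as softplus(-margin) for numerical stability.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Hypothetical log-probabilities, chosen only for illustration.
no_preference = dpo_loss(-12.0, -15.0, -12.0, -15.0)   # policy equals reference
prefers_chosen = dpo_loss(-10.0, -18.0, -12.0, -15.0)  # policy shifted toward chosen
print(no_preference, prefers_chosen)
```

When the policy and the reference agree, the loss equals log 2. It falls as the policy raises the chosen response relative to the rejected one, and rises if the policy moves the other way.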

## Lab 1 - Data preparation

In this lab, we prepare the dataset that is later used to fine-tune Qwen 2.5 7B Instruct.

***

### Prerequisites

#### Install requirements

In [None]:
%pip install -r requirements.txt

#### Setup and dependencies

In [1]:
import boto3
from sagemaker.core.helper.session_helper import Session, get_execution_role

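sess = Session()
# Set this to a bucket name to override the session's default bucket.
sagemaker_session_bucket = None

if sagemaker_session_bucket is None and sess is not None:
    # Fall back to the session's default bucket when no bucket name is given.
    sagemaker_session_bucket = sess.default_bucket()

try:
    role = get_execution_role()
except ValueError:
    # No execution role could be resolved, so look the role up by name.
    # Replace the role name with one that exists in your account.
    iam = boto3.client("iam")
    role = iam.get_role(RoleName="sagemaker_execution_role")["Role"]["Arn"]

s3_client = boto3.client("s3")
# Recreate the session so that it is bound to the selected bucket.
sess = Session(default_bucket=sagemaker_session_bucket)
bucket_name = sess.default_bucket()
default_prefix = sess.default_bucket_prefix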

print(f"sagemaker role arn: {role}")
print(f"sagemaker bucket: {sess.default_bucket()}")
print(f"sagemaker session region: {sess.boto_region_name}")

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/sagemaker-user/.config/sagemaker/config.yaml
sagemaker role arn: arn:aws:iam::802453385504:role/service-role/AmazonSageMaker-ExecutionRole-20220310T105369
sagemaker bucket: sagemaker-us-east-1-802453385504
sagemaker session region: us-east-1


***

### Prepare the dataset

In [2]:
import datasets
from datasets import load_dataset

dataset = (
    load_dataset(
        "HumanLLMs/Human-Like-DPO-Dataset",
        split="train",
        streaming=True,
    )
    .take(3000)
    .shuffle(buffer_size=1000)
)

dataset = datasets.Dataset.from_generator(lambda: dataset, features=dataset.features)
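Because the dataset is streamed, `.take(3000)` keeps the first 3,000 examples, and `.shuffle(buffer_size=1000)` then reorders them approximately using a fixed-size buffer instead of loading everything at once. The sketch below illustrates the general idea of buffer-based shuffling in plain Python. It is a simplified illustration, not the `datasets` implementation, and `buffered_shuffle` is a hypothetical helper.

```python
import random
from itertools import islice


def buffered_shuffle(stream, buffer_size, seed=None):
    """Yield items from `stream` in approximately shuffled order."""
    rng = random.Random(seed)
    buffer = []
    for item in stream:
        if len(buffer) < buffer_size:
            # Fill the buffer before emitting anything.
            buffer.append(item)
            continue
        # Emit a random buffered item and put the new item in its slot.
        index = rng.randrange(buffer_size)
        yield buffer[index]
        buffer[index] = item
    # The stream is exhausted: emit what is left in random order.
    rng.shuffle(buffer)
    yield from buffer


first_items = islice(iter(range(10_000)), 3000)  # analogous to .take(3000)
shuffled = list(buffered_shuffle(first_items, buffer_size=1000, seed=42))
print(len(shuffled), shuffled[:5])
```

With this scheme, the item emitted at position k can only come from the first k + `buffer_size` items of the stream, so a larger buffer gives a more thorough shuffle at the cost of more memory.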

In [3]:
import pandas as pd

df = pd.DataFrame(dataset)

df.head()

| | prompt | chosen | rejected |
|---|---|---|---|
| 0 | Do you have a favorite type of music or band? | You know, I'm a big fan of music in general! 🎵... | I'm an artificial intelligence language model,... |
| 1 | Have you attended any concerts or festivals la... | Man, I wish! 😊 I've been stuck in this digital... | I'm an artificial intelligence language model,... |
| 2 | Have you ever tried a new skill or activity an... | Absolutely! 😊 I'm a big believer in trying new... | Good day. As a professional AI, I do not have ... |
| 3 | Do you prefer relaxing at home or going out an... | You know, I'm a bit of a mix. Sometimes I love... | I'm an artificial intelligence language model,... |
| 4 | Have you tried any new recipes or cooking styl... | You know, I've been meaning to try my hand at ... | Good day. I'm afraid I'm not capable of enjoyi... |


In [4]:
from sklearn.model_selection import train_test_split

# Hold out 10% of the data for testing first, then take 20% of the full
# dataset for validation from the remainder, so the three splits never overlap.
train_val, test = train_test_split(df, test_size=0.1, random_state=42)
train, val = train_test_split(train_val, test_size=round(0.2 * len(df)), random_state=42)

print("Number of train elements: ", len(train))
print("Number of validation elements: ", len(val))
print("Number of test elements: ", len(test))

Number of train elements:  2100
Number of validation elements:  600
Number of test elements:  300
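Whatever splitting strategy is used, a useful sanity check is that the three splits share no rows and together cover the whole dataset. Otherwise evaluation examples can leak into training. A minimal sketch of such a check follows, using plain Python lists of row ids in place of DataFrame indices. `check_disjoint_splits` is a hypothetical helper, and the 70/20/10 cut is only an example.

```python
import random


def check_disjoint_splits(train_ids, val_ids, test_ids, total):
    """Raise ValueError if the splits overlap or do not cover `total` rows."""
    train_ids, val_ids, test_ids = set(train_ids), set(val_ids), set(test_ids)
    overlaps = {
        "train/val": train_ids & val_ids,
        "train/test": train_ids & test_ids,
        "val/test": val_ids & test_ids,
    }
    for name, shared in overlaps.items():
        if shared:
            raise ValueError(f"{len(shared)} rows shared between {name}")
    covered = len(train_ids | val_ids | test_ids)
    if covered != total:
        raise ValueError(f"splits cover {covered} of {total} rows")


# Toy example: shuffle 3,000 row ids and cut them 70/20/10.
ids = list(range(3000))
random.Random(42).shuffle(ids)
train_ids, val_ids, test_ids = ids[:2100], ids[2100:2700], ids[2700:]
check_disjoint_splits(train_ids, val_ids, test_ids, total=len(ids))
print(len(train_ids), len(val_ids), len(test_ids))
```

With pandas DataFrames, the same helper could be called with `train.index`, `val.index`, and `test.index`.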


In [5]:
def prepare_dataset_sm_dpo_train_val(sample):
    try:
        return {
            "prompt": sample["prompt"],
            "chosen": sample["chosen"],
            "rejected": sample["rejected"]
        }
    except Exception as e:
        print(f"Error: {e}")

        raise e

def prepare_dataset_sm_dpo_test(sample):
    try:
        return {
            "query": sample["prompt"],
            "response": sample["chosen"]
        }
    except Exception as e:
        print(f"Error: {e}")

        raise e
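Applied to a single record, the two mapping functions produce the shapes shown below: preference triples for training and validation, and query/response pairs for testing. The sketch defines compact equivalents of the functions so that it runs on its own. The sample text is made up for illustration, and the `__index_level_0__` key stands in for an extra column that should be dropped, such as the index column that `Dataset.from_pandas` can add.

```python
import json


def to_preference_record(sample):
    # Same fields as prepare_dataset_sm_dpo_train_val.
    return {key: sample[key] for key in ("prompt", "chosen", "rejected")}


def to_test_record(sample):
    # Same fields as prepare_dataset_sm_dpo_test.
    return {"query": sample["prompt"], "response": sample["chosen"]}


sample = {
    "prompt": "Do you prefer tea or coffee?",
    "chosen": "Coffee in the morning, tea in the afternoon!",
    "rejected": "As an AI, I do not consume beverages.",
    "__index_level_0__": 17,
}

# Each mapped record becomes one line of the corresponding JSONL file.
print(json.dumps(to_preference_record(sample)))
print(json.dumps(to_test_record(sample)))
```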

In [6]:
from datasets import Dataset, DatasetDict

train_dataset = Dataset.from_pandas(train)
val_dataset = Dataset.from_pandas(val)

# LLMAJ supports a maximum of 1,000 records.
test_dataset = Dataset.from_pandas(test)

dataset = DatasetDict(
    {"train": train_dataset, "val": val_dataset, "test": test_dataset}
)


train_dataset = dataset["train"].map(
    prepare_dataset_sm_dpo_train_val, remove_columns=list(train_dataset.features)
)

val_dataset = dataset["val"].map(
    prepare_dataset_sm_dpo_train_val, remove_columns=list(val_dataset.features)
)

test_dataset = dataset["test"].map(
    prepare_dataset_sm_dpo_test, remove_columns=list(test_dataset.features)
)


#### Upload to Amazon S3

In [7]:
import shutil

In [8]:
# Build the S3 key prefix under which the train, validation, and test files are stored
if default_prefix:
    input_path = f"{default_prefix}/datasets/serverless-model-customization-sft"
else:
    input_path = "datasets/serverless-model-customization-sft"

train_dataset_s3_path = f"s3://{bucket_name}/{input_path}/train/humanlike_dpo_train.jsonl"
val_dataset_s3_path = f"s3://{bucket_name}/{input_path}/val/humanlike_dpo_val.jsonl"
test_dataset_s3_path = f"s3://{bucket_name}/{input_path}/test/humanlike_dpo_test.jsonl"
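The path construction above can also be wrapped in a small helper that handles the optional prefix in one place. This is only a sketch: `build_s3_uri`, the bucket name `example-bucket`, and the prefix `team-a/` are hypothetical.

```python
def build_s3_uri(bucket, *parts, prefix=None):
    """Join a bucket, an optional key prefix, and key parts into an S3 URI."""
    segments = [prefix] if prefix else []
    segments.extend(parts)
    # Strip stray slashes so that segments are joined by exactly one "/".
    key = "/".join(s.strip("/") for s in segments if s and s.strip("/"))
    return f"s3://{bucket}/{key}"


base = "datasets/serverless-model-customization-sft"
train_uri = build_s3_uri("example-bucket", base, "train", "humanlike_dpo_train.jsonl")
val_uri = build_s3_uri("example-bucket", base, "val", "humanlike_dpo_val.jsonl", prefix="team-a/")
print(train_uri)
print(val_uri)
```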

In [9]:
train_dataset.to_json("./data/train/humanlike_dpo_train.jsonl", orient="records")
val_dataset.to_json("./data/val/humanlike_dpo_val.jsonl", orient="records")
test_dataset.to_json("./data/test/humanlike_dpo_test.jsonl", orient="records")

s3_client.upload_file(
    "./data/train/humanlike_dpo_train.jsonl", bucket_name, f"{input_path}/train/humanlike_dpo_train.jsonl"
)
s3_client.upload_file(
    "./data/val/humanlike_dpo_val.jsonl", bucket_name, f"{input_path}/val/humanlike_dpo_val.jsonl"
)
s3_client.upload_file(
    "./data/test/humanlike_dpo_test.jsonl", bucket_name, f"{input_path}/test/humanlike_dpo_test.jsonl"
)

shutil.rmtree("./data")

print("Training data uploaded to:")
print(train_dataset_s3_path)
print(val_dataset_s3_path)
print(test_dataset_s3_path)

Training data uploaded to:
s3://sagemaker-us-east-1-802453385504/datasets/serverless-model-customization-sft/train/humanlike_dpo_train.jsonl
s3://sagemaker-us-east-1-802453385504/datasets/serverless-model-customization-sft/val/humanlike_dpo_val.jsonl
s3://sagemaker-us-east-1-802453385504/datasets/serverless-model-customization-sft/test/humanlike_dpo_test.jsonl
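A local check of the JSONL files before the upload step can catch formatting problems early, since each line must be a standalone JSON object with the expected keys. Here is a minimal sketch, assuming the `prompt`/`chosen`/`rejected` schema used for the training and validation files. `validate_jsonl` is a hypothetical helper, and the sample file is generated on the fly.

```python
import json
import os
import tempfile

REQUIRED_KEYS = {"prompt", "chosen", "rejected"}


def validate_jsonl(path, required_keys=REQUIRED_KEYS):
    """Return a list of (line_number, problem) tuples; empty means valid."""
    problems = []
    with open(path, encoding="utf-8") as f:
        for number, line in enumerate(f, start=1):
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append((number, f"invalid JSON: {exc.msg}"))
                continue
            if not isinstance(record, dict):
                problems.append((number, "not a JSON object"))
                continue
            missing = set(required_keys) - set(record)
            if missing:
                problems.append((number, f"missing keys: {sorted(missing)}"))
    return problems


# Write a tiny sample file with one complete and one incomplete record.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.jsonl")
    with open(path, "w", encoding="utf-8") as f:
        f.write(json.dumps({"prompt": "Hi", "chosen": "Hey!", "rejected": "Greetings."}) + "\n")
        f.write(json.dumps({"prompt": "Hi", "chosen": "Hey!"}) + "\n")
    problems = validate_jsonl(path)

print(problems)
```

For the test file, which uses the query/response format, the same helper could be called with `required_keys={"query", "response"}`.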


#### Create the training, validation, and test datasets

In [10]:
from sagemaker.ai_registry.dataset import DataSet
from sagemaker.ai_registry.dataset_utils import CustomizationTechnique

In [11]:
dataset_train = DataSet.create(
    name="humanlike-dpo-train",
    source=train_dataset_s3_path,
    customization_technique=CustomizationTechnique.DPO,
    wait=True,
)

print(f"TRAINING_DATASET ARN: {dataset_train.arn}")

dataset_val = DataSet.create(
    name="humanlike-dpo-val",
    source=val_dataset_s3_path,
    customization_technique=CustomizationTechnique.DPO,
    wait=True,
)

print(f"VALIDATION_DATASET ARN: {dataset_val.arn}")

dataset_test = DataSet.create(
    name="humanlike-dpo-test",
    source=test_dataset_s3_path,
    wait=True,
)

print(f"TEST_DATASET ARN: {dataset_test.arn}")

TRAINING_DATASET ARN: arn:aws:sagemaker:us-east-1:802453385504:hub-content/835JTV1JM579GSMLER9B24GD7I4GMBBJJTH3CG14DT120I3OLSV0/DataSet/humanlike-dpo-train/1.0.0


VALIDATION_DATASET ARN: arn:aws:sagemaker:us-east-1:802453385504:hub-content/835JTV1JM579GSMLER9B24GD7I4GMBBJJTH3CG14DT120I3OLSV0/DataSet/humanlike-dpo-val/1.0.0


TEST_DATASET ARN: arn:aws:sagemaker:us-east-1:802453385504:hub-content/835JTV1JM579GSMLER9B24GD7I4GMBBJJTH3CG14DT120I3OLSV0/DataSet/humanlike-dpo-test/1.0.0
