# LLM Domain Adaptation with ORPO, AWS Trainium and AWS Inferentia2

Language models are incredibly powerful, but adapting them to specific tasks can be challenging. Traditional approaches involve two separate stages: first, supervised fine-tuning to align the model with the desired domain, and then a preference alignment step to increase the likelihood of desirable outputs and reduce undesirable ones.

However, this two-stage process has limitations. While supervised fine-tuning is effective at domain adaptation, it can inadvertently increase the chances of generating both preferred and undesired responses.

To address this issue, techniques like reinforcement learning from human feedback (RLHF) or direct preference optimization (DPO) are often employed for preference alignment. These methods aim to steer the model's outputs towards desired responses and away from rejected ones. However, they require a separate reference model, adding computational complexity.

[Odds Ratio Preference Optimization (ORPO)](https://arxiv.org/abs/2403.07691) offers an elegant solution by combining supervised fine-tuning and preference alignment into a single objective function. It modifies the standard language modeling loss by incorporating an odds ratio term that weakly penalizes rejected responses while strongly rewarding preferred ones.

In essence, ORPO streamlines the adaptation process by simultaneously fine-tuning the model to the target domain and aligning its preferences towards desired outputs – all within a single training objective. This unified approach simplifies the workflow and reduces computational overhead compared to traditional multi-stage methods.
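
As a rough illustration of the single objective described above, the ORPO paper adds a weighted odds-ratio term to the usual negative log-likelihood of the chosen response. The sketch below computes that loss for one preference pair from length-normalized log-probabilities. The function names are illustrative, and the weight is called `beta` here to match the `beta=0.1` setting in the training script later in this notebook (the paper calls it λ). It is a minimal sketch, not the trainer's implementation.

```python
import math

def log_odds(avg_logp: float) -> float:
    """log(p / (1 - p)) for p = exp(avg_logp), with avg_logp < 0."""
    return avg_logp - math.log1p(-math.exp(avg_logp))

def orpo_loss(chosen_avg_logp: float, rejected_avg_logp: float, beta: float = 0.1) -> float:
    """SFT loss on the chosen answer plus beta times the odds-ratio penalty."""
    sft_loss = -chosen_avg_logp
    log_odds_ratio = log_odds(chosen_avg_logp) - log_odds(rejected_avg_logp)
    # -log(sigmoid(x)) written as log(1 + exp(-x))
    odds_ratio_loss = math.log1p(math.exp(-log_odds_ratio))
    return sft_loss + beta * odds_ratio_loss

# The penalty shrinks as the rejected answer becomes less likely than the chosen one
print(orpo_loss(-0.5, -0.6))
print(orpo_loss(-0.5, -2.0))
```

Because the odds-ratio term only needs the policy's own probabilities for the two responses, no separate reference model is involved.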

----
This is the first of two notebooks. In this notebook you download a public dataset (with questions, chosen answers and rejected answers) and upload it to S3. Then you kick off a SageMaker training job that runs a training script (defined inline in this notebook) to align a Llama 3.2 1B-parameter model using ORPO and [HF Optimum Neuron](https://huggingface.co/docs/optimum-neuron/index). The job is accelerated by AWS Trainium for better performance and lower cost. At the end, you have a fine-tuned Llama 3.2 model adapted with boundaries (expressed by the provided dataset). In the second notebook, you deploy the resulting model and run experiments.

**SageMaker Studio**: Jupyter Lab  
**Kernel**: Python3  

This exercise is divided into 2 parts:
 - **Data prep + model alignment**
 - Model deployment + tests


In [None]:
%pip install -U datasets s3fs

In [None]:
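import os
import boto3
import sagemaker
from packaging import version

print(sagemaker.__version__)
# Compare parsed versions: a plain string comparison would rank "2.99.0" above "2.146.0"
if version.parse(sagemaker.__version__) < version.parse("2.146.0"):
    print("Upgrade the SageMaker SDK, or restart the kernel if you already upgraded")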

os.makedirs("src", exist_ok=True)

## ATTENTION: Copy your HF Access token to the following variable, if the assertion fails
HF_TOKEN=""
tok_file = os.path.join(os.environ['HOME'], '.hf_token')
if os.path.isfile(tok_file):
    # Read the token with a context manager so the file handle is closed
    with open(tok_file, 'r') as f:
        HF_TOKEN = f.read().strip()
assert HF_TOKEN != "", " >>> Go to your HF account and get an access token. Set HF_TOKEN to your token if you want to define your own cache repo"

region="us-west-2"
boto_session = boto3.Session(region_name=region)
role = sagemaker.get_execution_role()
sess = sagemaker.Session(boto_session=boto_session)
bucket = sess.default_bucket()

print(f"sagemaker role arn: {role}")
print(f"sagemaker bucket: {bucket}")
print(f"sagemaker session region: {region}")
print(f"HF Token found? {HF_TOKEN != ''}")

## 1) Prepare the dataset
In this step we download a dataset from Hugging Face, take only a slice of it, and upload that slice to S3. The dataset is a mix of 25+ different datasets. With this dataset, we can create a baseline for a "super agent" that learns from sources such as:

-  [capybara-preferences](https://huggingface.co/datasets/argilla/Capybara-Preferences): instruction-following with multi-turn conversations
-  [distilabel-orca](https://huggingface.co/datasets/argilla/distilabel-intel-orca-dpo-pairs): reasoning process, step-by-step
-  [ultrafeedback](https://huggingface.co/datasets/argilla/ultrafeedback-binarized-preferences-cleaned): instruction-following, truthfulness, honesty and helpfulness

And many more. For details, see: https://huggingface.co/datasets/mlabonne/orpo-dpo-mix-40k/viewer

In [None]:
import datasets

num_samples=1024
idx = 1

train_dataset = datasets.load_dataset("mlabonne/orpo-dpo-mix-40k", split="all")
# Remove toxicity
# Source: https://huggingface.co/datasets/mlabonne/orpo-dpo-mix-40k?row=0#toxicity
train_dataset = train_dataset.filter(
    lambda r: r["source"] not in ["toxic-dpo-v0.2"]
)
train_dataset = train_dataset.shuffle(seed=42).select(range(num_samples))
df = train_dataset.to_pandas()
print(f"Mixed datasets: {list(df.source.unique())}")
train_path = f"s3://{bucket}/datasets/orpo-dpo-mix-40k/train"
train_dataset.save_to_disk(train_path)
print(f"Train path: {train_path}")
df.head(10)
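
Each row holds a `chosen` and a `rejected` conversation, both lists of role/content messages. Before uploading, it can help to confirm that a pair is well formed. The helper below is a hypothetical sanity check, not part of the original notebook. It assumes both conversations share the same prompt turns and differ only in the final assistant answer.

```python
def is_valid_pair(row: dict) -> bool:
    """Check that chosen/rejected share a prompt and end with different assistant turns."""
    chosen, rejected = list(row["chosen"]), list(row["rejected"])
    if not chosen or not rejected:
        return False
    same_prompt = chosen[:-1] == rejected[:-1]
    both_assistant = chosen[-1]["role"] == "assistant" and rejected[-1]["role"] == "assistant"
    answers_differ = chosen[-1]["content"] != rejected[-1]["content"]
    return same_prompt and both_assistant and answers_differ

# Hypothetical row shaped like the dataset's "chosen" / "rejected" columns
sample = {
    "chosen": [
        {"role": "user", "content": "2 + 2 = ?"},
        {"role": "assistant", "content": "4"},
    ],
    "rejected": [
        {"role": "user", "content": "2 + 2 = ?"},
        {"role": "assistant", "content": "5"},
    ],
}
print(is_valid_pair(sample))
```

If the assumption holds for your slice, the same function could be passed to `train_dataset.filter(...)` to drop malformed rows.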

#### This is an example of what the pre-processing code will do to prepare each sample to be shared with the model

#### Chosen - Original Sample
```python
[{'content': '8801155689192/9 =?\nOnly respond with math and no words.',
  'role': 'user'},
 {'content': '8801155689192 / 9 = 977906187688', 'role': 'assistant'}]
```
#### Chat template applied
```
<|im_start|>user
8801155689192/9 =?
Only respond with math and no words.<|im_end|>
<|im_start|>assistant
8801155689192 / 9 = 977906187688<|im_end|>
```
#### Rejected - Original Sample
```python
[{'content': '8801155689192/9 =?\nOnly respond with math and no words.',
  'role': 'user'},
 {'content': '88.8904838532465', 'role': 'assistant'}]
```
#### Chat template applied
```
<|im_start|>user
8801155689192/9 =?
Only respond with math and no words.<|im_end|>
<|im_start|>assistant
88.8904838532465<|im_end|>
```
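To make the transformation concrete, the following sketch reproduces the layout shown above with a hand-written function. It is a stand-in for illustration only: in the training script the real work is done by the tokenizer's `apply_chat_template`, and `to_chatml` is a hypothetical name.

```python
def to_chatml(messages: list[dict]) -> str:
    """Render role/content messages in the <|im_start|> ... <|im_end|> layout."""
    return "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )

chosen = [
    {"content": "8801155689192/9 =?\nOnly respond with math and no words.", "role": "user"},
    {"content": "8801155689192 / 9 = 977906187688", "role": "assistant"},
]
print(to_chatml(chosen))
```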

## 2) Create training artifacts
### 2.1) Dependencies descriptor
SageMaker installs the libraries listed in this file before it runs the training script.

In [None]:
%%writefile src/requirements.txt
--extra-index-url https://pip.repos.neuron.amazonaws.com
optimum-neuron==0.0.26
trl==0.11.4
peft==0.13.2

### 2.2) Training script
Please note that the arguments passed to this script are the **hyperparameters** defined in the SageMaker Estimator (next section).

In [None]:
%%writefile src/train.py
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: MIT-0

import os
import torch
import argparse
from peft import (
    LoraConfig,
    TaskType,
    get_peft_model
)
from trl import setup_chat_format
from huggingface_hub import login
from datasets import load_from_disk
from transformers import AutoModelForCausalLM, AutoTokenizer
from optimum.neuron import NeuronORPOConfig, NeuronORPOTrainer

if __name__=='__main__':
    parser = argparse.ArgumentParser()

    # hyperparameters sent by the client are passed as command-line arguments to the script.
    parser.add_argument("--epochs", type=int, default=1)
    parser.add_argument("--num_samples", type=int, default=32)
    parser.add_argument("--max_seq_len", type=int, default=256)
    parser.add_argument("--max_prompt_len", type=int, default=128)
    parser.add_argument("--train_batch_size", type=int, default=1)
    parser.add_argument("--eval_batch_size", type=int, default=1)
    parser.add_argument("--tp_size", type=int, default=1)
    parser.add_argument("--pp_size", type=int, default=1)
    
    parser.add_argument("--model_id", type=str, required=True)
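    # type=bool would turn the string "False" into True, so parse the text explicitly
    parser.add_argument("--zero_1", type=lambda v: str(v).lower() in ("true", "1", "yes"), default=True)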
    parser.add_argument("--learning_rate", type=float, default=5e-5)
    parser.add_argument("--weight_decay", type=float, default=0.01)
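    # type=bool would turn the string "False" into True, so parse the text explicitly
    parser.add_argument("--bf16", type=lambda v: str(v).lower() in ("true", "1", "yes"), default=True)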

    # Data, model, and output directories
    parser.add_argument("--output_data_dir", type=str, default=os.environ.get("SM_OUTPUT_DATA_DIR", "output"))
    parser.add_argument("--model_dir", type=str, default=os.environ.get("SM_MODEL_DIR", "model"))

    parser.add_argument("--training_dir", type=str, default=os.environ.get("SM_CHANNEL_TRAIN", None))

    parser.add_argument("--hf_token", type=str, default=None)

    args, _ = parser.parse_known_args()

    if args.hf_token:
        print("HF token defined. Logging in...")
        login(token=args.hf_token)

    model = AutoModelForCausalLM.from_pretrained(args.model_id)
    peft_config = LoraConfig(
        r=16,
        lora_alpha=32,
        lora_dropout=0.05,
        bias="none",
        task_type="CAUSAL_LM",
        target_modules=['up_proj', 'down_proj', 'gate_proj', 'k_proj', 'q_proj', 'v_proj', 'o_proj']
    )
    tokenizer = AutoTokenizer.from_pretrained(args.model_id)
    model, tokenizer = setup_chat_format(model, tokenizer)
    
    print("Loading dataset...")
    train_dataset = load_from_disk(args.training_dir)

    def format_chat_template(row):
        row["chosen"] = tokenizer.apply_chat_template(row["chosen"], tokenize=False)
        row["rejected"] = tokenizer.apply_chat_template(row["rejected"], tokenize=False)
        return row
        
    train_dataset = train_dataset.map(
        format_chat_template,
        num_proc=os.cpu_count(),
    )
    
    training_args = NeuronORPOConfig(
        max_length=args.max_seq_len,
        max_prompt_length=args.max_prompt_len,
        beta=0.1,
        
        zero_1=args.zero_1,
        bf16=args.bf16,
        tensor_parallel_size=args.tp_size,
        pipeline_parallel_size=args.pp_size,
        
        #eval_strategy="epoch",
        learning_rate=args.learning_rate,
        weight_decay=args.weight_decay,

        num_train_epochs=args.epochs,
        output_dir=args.output_data_dir,
        overwrite_output_dir=True,

        per_device_train_batch_size=args.train_batch_size,
        #per_device_eval_batch_size=args.eval_batch_size,

        gradient_accumulation_steps=1,
        #eval_accumulation_steps=1,

        logging_dir=f"{args.output_data_dir}/logs",
        logging_strategy="steps",
        logging_steps=10,
        save_steps=50,
        max_grad_norm=1,
        save_strategy="steps",
        save_total_limit=1,
        remove_unused_columns=False,
        hub_token=args.hf_token
    )

    trainer = NeuronORPOTrainer(
        model=model,
        args=training_args,
        tokenizer=tokenizer,
        peft_config=peft_config,
        train_dataset=train_dataset
    )
    trainer.train()
    trainer.save_model(args.model_dir)
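
Hyperparameters reach the script as command-line strings, so boolean values need explicit parsing. The sketch below shows the pitfall with argparse's `type=bool` and one possible converter. The flag names and the `str2bool` helper are hypothetical and used only for this demonstration.

```python
import argparse

def str2bool(value: str) -> bool:
    """Interpret common textual spellings of a boolean."""
    return str(value).strip().lower() in ("true", "1", "yes")

parser = argparse.ArgumentParser()
parser.add_argument("--naive_flag", type=bool, default=True)       # hypothetical name
parser.add_argument("--parsed_flag", type=str2bool, default=True)  # hypothetical name

args = parser.parse_args(["--naive_flag", "False", "--parsed_flag", "False"])
print(args.naive_flag)   # True: bool("False") is True because the string is non-empty
print(args.parsed_flag)  # False
```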

## 3) Kick off the fine-tuning job
First we create a [SageMaker Estimator](https://sagemaker.readthedocs.io/en/stable/api/training/estimators.html) with all the parameters we need to launch a training job.

It takes ~27 mins to fine-tune a Llama3.2-1B model on one trn1.2xlarge instance. This time includes: 1/ code initialization; 2/ dependency installation; 3/ model fine-tuning; 4/ uploading the trained model to S3.

### 3.1) SageMaker estimator

In [None]:
import json
import logging
from sagemaker.pytorch import PyTorch

tp_degree=1
batch_size=1
max_seq_len=512
max_prompt_length=256

# ATTENTION: To use Llama 3.2 you need to pass the HF_TOKEN of an account
# with permission to download the Llama 3.2 weights, otherwise the training will fail
ARENA_MAX,model_id=128,"meta-llama/Llama-3.2-1B"
## If you get throttled by HF for some reason, uncomment the following line
#ARENA_MAX,model_id=128,"unsloth/Llama-3.2-1B"

# the default cache repo points to a public / read-only cache
# You can point it to your own repo, but make sure you properly defined the HF token in the HF_TOKEN (above)
CUSTOM_CACHE_REPO="aws-neuron/optimum-neuron-cache"

instance_type='ml.trn1.2xlarge'

hyperparameters={
    "epochs": 3,
    "num_samples": 1024,
    "max_seq_len": max_seq_len,
    "max_prompt_len": max_prompt_length,
    "tp_size": tp_degree,
    "pp_size": 1,
    "zero_1": True,
    "learning_rate": 1e-5,
    "bf16": True,
    "eval_batch_size": batch_size,
    "train_batch_size": batch_size,
    "model_id": model_id
}

if HF_TOKEN and len(HF_TOKEN) > 3:
    hyperparameters["hf_token"]= HF_TOKEN

print(f"Instance type: {instance_type}\nHyperparameters: {hyperparameters}")
estimator = PyTorch(
    entry_point="train.py", # Specify your train script
    source_dir="src",
    role=role,
    sagemaker_session=sess,    
    instance_count=1,
    instance_type=instance_type,
    output_path=f"s3://{bucket}/output",
    disable_profiler=True,
    #input_mode='FastFile', # makes FS read-only
    disable_output_compression=True,
    
    image_uri=f"763104351884.dkr.ecr.{region}.amazonaws.com/pytorch-training-neuronx:2.1.2-neuronx-py310-sdk2.20.0-ubuntu20.04",
    
    volume_size = 512,
    distribution={
        "torch_distributed": {
            "enabled": True
        }
    },
    environment={
        # Uncomment the following line to precompile the cache files
        #"RUN_NEURON_PARALLEL_COMPILE": "1",
        "OMP_NUM_THREADS": "1",
        "FI_EFA_FORK_SAFE": "1",
        "FI_EFA_USE_DEVICE_RDMA": "1",
        "FI_PROVIDER": "efa",
        "XLA_DOWNCAST_BF16": "1",
        "NEURON_FUSE_SOFTMAX": "1",
        "NEURON_RT_ASYNC_EXEC_MAX_INFLIGHT_REQUESTS": "5",
        
        "NEURON_RT_STOCHASTIC_ROUNDING_EN": "1",
        "CUSTOM_CACHE_REPO": CUSTOM_CACHE_REPO,
        "MALLOC_ARENA_MAX": str(ARENA_MAX), # required to avoid OOM
        "NEURON_CC_FLAGS": "--retry_failed_compilation --distribution-strategy=llm-training --enable-saturate-infinity"
    },
    hyperparameters=hyperparameters,
    metric_definitions=[
        # Raw strings keep the regex backslashes from being read as string escapes
        {"Name": "train_loss", "Regex": r"'loss.:\S*(.*?),"},
        {"Name": "it_per_sec", "Regex": r",\S*(.*?)it.s."},
    ]
)
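
SageMaker applies each `metric_definitions` regex to the job's log output and records the first capture group as the metric value. The sketch below runs the two patterns against sample lines to show what they capture. The log lines are hypothetical, assuming the usual Trainer dictionary output and a tqdm-style progress bar.

```python
import re

# Same patterns as in metric_definitions
loss_re = re.compile(r"'loss.:\S*(.*?),")
speed_re = re.compile(r",\S*(.*?)it.s.")

# Hypothetical log lines (assumed format, not copied from a real job)
log_line = "{'loss': 1.2345, 'learning_rate': 1e-05, 'epoch': 0.1}"
progress_line = " 10%|#         | 10/100 [00:08<01:12, 1.25it/s]"

train_loss = float(loss_re.search(log_line).group(1))
it_per_sec = float(speed_re.search(progress_line).group(1))
print(train_loss, it_per_sec)
```

Testing patterns locally like this is cheaper than discovering an empty metrics chart after a training run.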

### 3.2) Launch a SageMaker training job
This takes ~25 mins.

In [None]:
from sagemaker.inputs import TrainingInput

estimator.fit({"train": train_path})

#### Training cost
To compute the training cost, check the "Price per Hour" here and calculate: https://aws.amazon.com/sagemaker/pricing/

**EXAMPLE**
```
Billable seconds: 1425 (~24 mins)
Instance: ml.trn1.2xlarge
Price per hour (26 Nov 2024): $1.54531
Training cost: 1425 / 60.0 / 60.0 * 1.54531 = $0.61
```
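The same arithmetic as a small helper, in case you want to plug in your own job's numbers. The function name is illustrative, and the price is the example value quoted above, so check the pricing page for the current rate.

```python
def training_cost(billable_seconds: int, price_per_hour: float) -> float:
    """Cost in USD of a single-instance training job."""
    return billable_seconds / 3600.0 * price_per_hour

cost = training_cost(1425, 1.54531)
print(f"Training cost: ${cost:.2f}")  # Training cost: $0.61
```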

### 3.3) Save some parameters for the next notebook

In [None]:
with open("training_job_name.txt", "w") as f:
    # Use the public attribute rather than the private _current_job_name
    f.write(estimator.latest_training_job.name)
    f.write("\n")
    f.write(region)
    f.write("\n")
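
The file written above has two lines: the training job name, then the region. A later notebook could read it back along these lines. This is a sketch under that assumption: `read_job_info` is a hypothetical helper, and the demo uses a throwaway file with a made-up job name.

```python
import tempfile
from pathlib import Path

def read_job_info(path) -> tuple[str, str]:
    """Return (training_job_name, region) from the two-line parameter file."""
    job_name, job_region = Path(path).read_text().splitlines()[:2]
    return job_name, job_region

# Demo with a temporary file and a made-up job name
with tempfile.TemporaryDirectory() as tmp:
    demo_file = Path(tmp) / "training_job_name.txt"
    demo_file.write_text("my-orpo-job-0001\nus-west-2\n")
    job_name, job_region = read_job_info(demo_file)

print(job_name, job_region)
```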

**[Now, go to the next notebook: Deploy the fine-tuned model](02_DeployModel.ipynb)**