<a href="https://colab.research.google.com/github/bacoco/LLM_train/blob/main/orpo_unsloth.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
%%capture
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install --no-deps "xformers<0.0.26" trl peft accelerate bitsandbytes
!pip install wandb

In [None]:
import wandb
from google.colab import userdata
from unsloth import FastLanguageModel
import torch
from datasets import load_dataset
from trl import ORPOConfig, ORPOTrainer
from transformers import TextStreamer
import pprint

In [None]:
wb_token = userdata.get('wandb')
wandb.login(key=wb_token)

[34m[1mwandb[0m: Currently logged in as: [33meduardovasquez_007[0m. Use [1m`wandb login --relogin`[0m to force relogin
[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc


True

In [None]:
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",
    max_seq_length=8192,
    dtype=None,
    load_in_4bit=True
)



==((====))==  Unsloth: Fast Llama patching release 2024.4
   \\   /|    GPU: Tesla T4. Max memory: 14.748 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.2.1+cu121. CUDA = 7.5. CUDA Toolkit = 12.1.
\        /    Bfloat16 = FALSE. Xformers = 0.0.25.post1. FA = False.
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth


Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [None]:
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)

Unsloth 2024.4 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.


In [None]:
model.print_trainable_parameters()

trainable params: 41,943,040 || all params: 8,072,204,288 || trainable%: 0.5195983464188562
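
The trainable-parameter count reported above can be reproduced by hand. A rank-`r` LoRA adapter on a linear layer with `d_in` inputs and `d_out` outputs adds `r * (d_in + d_out)` weights. The sketch below assumes the published Llama 3 8B layer shapes (hidden size 4096, MLP size 14336, 8 key/value heads of width 128, 32 decoder layers); those shapes are not printed anywhere in this notebook, so treat them as assumptions.

```python
# Assumed Llama 3 8B shapes (not printed in this notebook).
HIDDEN = 4096          # model width
KV_DIM = 8 * 128       # 8 key/value heads of width 128 (grouped-query attention)
MLP = 14336            # feed-forward width
NUM_LAYERS = 32
RANK = 16              # r passed to get_peft_model

# (d_in, d_out) for every module listed in target_modules.
SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP),
    "up_proj": (HIDDEN, MLP),
    "down_proj": (MLP, HIDDEN),
}


def lora_params(d_in, d_out, rank=RANK):
    """Weights added by one LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


per_layer = sum(lora_params(d_in, d_out) for d_in, d_out in SHAPES.values())
trainable = per_layer * NUM_LAYERS
all_params = 8_072_204_288  # "all params" as reported by print_trainable_parameters()

print(f"{trainable:,}")                          # 41,943,040
print(f"{100 * trainable / all_params:.4f}%")    # 0.5196%
```

Under these assumptions the adapters account for about half a percent of the weights, which is why the run fits on a single T4.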


In [None]:
dataset = load_dataset("reciperesearch/dolphin-sft-v0.1-preference")["train"]

In [None]:
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

EOS_TOKEN = tokenizer.eos_token # Appended to every completion so the model learns where to stop generating

def format_prompt(sample):
    instruction = sample["instruction"]
    input_text  = sample["input"]  # renamed so the built-in input() is not shadowed
    accepted    = sample["accepted"]
    rejected    = sample["rejected"]

    # ORPOTrainer expects prompt/chosen/rejected keys
    # See: https://huggingface.co/docs/trl/main/en/orpo_trainer
    sample["prompt"]   = alpaca_prompt.format(instruction, input_text, "")
    sample["chosen"]   = accepted + EOS_TOKEN
    sample["rejected"] = rejected + EOS_TOKEN
    return sample

In [None]:
dataset = dataset.map(format_prompt)
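
To see what the mapping does to a single row without loading the model or the dataset, here is a standalone sketch. The toy row and the hard-coded `EOS_TOKEN` are illustrative stand-ins (the real value comes from `tokenizer.eos_token`), and the function returns a new dict instead of mutating its argument, which is a small deviation from the cell above.

```python
ALPACA_PROMPT = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

EOS_TOKEN = "<|end_of_text|>"  # stand-in for tokenizer.eos_token


def format_prompt(sample):
    """Build the prompt/chosen/rejected keys from one raw row."""
    return {
        **sample,
        # The response slot stays empty: the completion lives in chosen/rejected.
        "prompt": ALPACA_PROMPT.format(sample["instruction"], sample["input"], ""),
        "chosen": sample["accepted"] + EOS_TOKEN,
        "rejected": sample["rejected"] + EOS_TOKEN,
    }


# Hypothetical row, using the same column names as the cell above.
toy_row = {
    "instruction": "You are a helpful assistant.",
    "input": "What is 2 + 2?",
    "accepted": "2 + 2 = 4.",
    "rejected": "2 + 2 = 5.",
}

formatted = format_prompt(toy_row)
print(formatted["prompt"])
print(repr(formatted["chosen"]))
print(repr(formatted["rejected"]))
```

The prompt ends right after `### Response:`, so the chosen and rejected completions are both scored as continuations of the same prefix.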

In [None]:
row = dataset[1]
print('INSTRUCTION: ' + '=' * 50)
pprint.pprint(row["prompt"])
print('ACCEPTED: ' + '=' * 50)
pprint.pprint(row["chosen"])
print('REJECTED: ' + '=' * 50)
pprint.pprint(row["rejected"])

('Below is an instruction that describes a task, paired with an input that '
 'provides further context. Write a response that appropriately completes the '
 'request.\n'
 '\n'
 '### Instruction:\n'
 'You are an AI assistant that helps people find information.\n'
 '\n'
 '### Input:\n'
 'Given the rationale, provide a reasonable question and answer. Step-by-step '
 'reasoning process: Xkcd comics are very popular amongst internet users.\n'
 ' The question and answer:\n'
 '\n'
 '### Response:\n')
('Question: What makes Xkcd comics popular among internet users?\n'
 '\n'
 'Answer: Xkcd comics are popular among internet users because of their clever '
 'humor, relatable themes, and minimalist art style. They often cover topics '
 'like science, technology, and life experiences, making them appealing to a '
 'broad audience.<|end_of_text|>')
('Question: What is the reason behind the popularity of Xkcd comics among '
 'internet users?\n'
 '\n'
 'Answer: Xkcd comics are popular among internet 

In [None]:
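# Shuffle once and take disjoint slices so no evaluation row is also a training row.
# Assumes the dataset has at least 4,000 rows.
shuffled = dataset.shuffle(seed=10)
train_dataset = shuffled.select(range(3000))
eval_dataset = shuffled.select(range(3000, 4000))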

In [None]:
orpo_args = ORPOConfig(
        max_length = 8192,
        max_prompt_length = 8192//2,
        max_completion_length = 8192//2,
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        beta = 0.1, # weight of the odds-ratio term in the ORPO loss (lambda in the ORPO paper)
        logging_steps = 1, # log metrics at every optimizer step
        optim = "adamw_8bit",
        lr_scheduler_type = "linear",
        max_steps = 10, # Change to num_train_epochs = 1 for full training runs
        fp16 = not torch.cuda.is_bf16_supported(), # Whether to use 16-bit floating-point precision for training
        bf16 = torch.cuda.is_bf16_supported(), # Whether to use 16-bit bfloat16 precision for training
        output_dir = "outputs",
        report_to="wandb",
    )
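
For intuition about what `beta` controls, the ORPO objective adds an odds-ratio penalty to the ordinary language-modelling loss on the chosen completion. The function below is a scalar sketch of that formula in plain Python, not the TRL implementation: the name `orpo_loss` and its arguments are made up for illustration, and the inputs are assumed to be length-averaged log-probabilities (so strictly negative).

```python
import math


def log_odds(avg_logp):
    """log(p / (1 - p)) computed from log p; requires avg_logp < 0."""
    return avg_logp - math.log1p(-math.exp(avg_logp))


def orpo_loss(nll_chosen, avg_logp_chosen, avg_logp_rejected, beta=0.1):
    """Sketch of the ORPO objective for one preference pair.

    nll_chosen        : negative log-likelihood of the chosen completion
    avg_logp_chosen   : mean per-token log-probability of the chosen completion
    avg_logp_rejected : mean per-token log-probability of the rejected completion
    beta              : weight of the odds-ratio term
    """
    diff = log_odds(avg_logp_chosen) - log_odds(avg_logp_rejected)
    # -log(sigmoid(diff)) written as log(1 + exp(-diff))
    odds_ratio_term = math.log1p(math.exp(-diff))
    return nll_chosen + beta * odds_ratio_term


# The penalty shrinks when the model already prefers the chosen completion.
print(orpo_loss(1.0, avg_logp_chosen=-0.2, avg_logp_rejected=-2.0))
print(orpo_loss(1.0, avg_logp_chosen=-2.0, avg_logp_rejected=-0.2))
```

With `beta = 0` this reduces to plain supervised fine-tuning on the chosen answers; larger values push harder on separating chosen from rejected.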

In [None]:
orpo_trainer = ORPOTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    args=orpo_args
)

max_steps is given, it will override any value given in num_train_epochs


In [None]:
orpo_trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 3,000 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 10
 "-____-"     Number of trainable parameters = 41,943,040


Could not estimate the number of tokens of the input, floating-point operations will not be computed


Step,Training Loss
1,4.3793
2,4.7945
3,5.0253
4,2.866
5,1.922
6,2.2792
7,1.9453
8,1.9789
9,1.9653
10,1.6552


TrainOutput(global_step=10, training_loss=2.8810996890068052, metrics={'train_runtime': 204.7093, 'train_samples_per_second': 0.391, 'train_steps_per_second': 0.049, 'total_flos': 0.0, 'train_loss': 2.8810996890068052, 'epoch': 0.02666666666666667})
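
The numbers in `TrainOutput` follow directly from the configuration and the per-step log. The short check below recomputes them from values copied out of the output above; nothing here touches the trainer itself.

```python
# Values copied from the configuration and training log above.
per_device_batch = 2
grad_accum_steps = 4
num_gpus = 1
max_steps = 10
num_examples = 3000
train_runtime = 204.7093  # seconds
step_losses = [4.3793, 4.7945, 5.0253, 2.866, 1.922,
               2.2792, 1.9453, 1.9789, 1.9653, 1.6552]

total_batch = per_device_batch * grad_accum_steps * num_gpus
samples_seen = total_batch * max_steps
epoch_fraction = samples_seen / num_examples
mean_loss = sum(step_losses) / len(step_losses)

print(total_batch)                               # 8
print(samples_seen)                              # 80
print(epoch_fraction)                            # fraction of one epoch covered
print(round(mean_loss, 4))                       # 2.8811
print(round(samples_seen / train_runtime, 3))    # 0.391
print(round(max_steps / train_runtime, 3))       # 0.049
```

The mean of the logged losses differs from the reported `training_loss` only in the later decimals, because the per-step values in the table are rounded. Ten steps cover 80 of the 3,000 training rows, so this run is a smoke test, not a full fine-tune.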

In [None]:
# alpaca_prompt = Copied from above
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
inputs = tokenizer(
[
    alpaca_prompt.format(
        "You are an AI assistant. User will you give you a task. Your goal is to complete the task as faithfully as you can. While performing the task think step-by-step and justify your steps.", # instruction
        "Solve the following algebraic equation: 2x + 5 = 15", # input
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")

text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128)

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


<|begin_of_text|>Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
You are an AI assistant. User will you give you a task. Your goal is to complete the task as faithfully as you can. While performing the task think step-by-step and justify your steps.

### Input:
Solve the following algebraic equation: 2x + 5 = 15

### Response:
2x + 5 = 15
2x = 10
x = 5<|end_of_text|>
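
The streamed text echoes the whole prompt before the answer. If only the answer is needed, one option is to split on the `### Response:` marker and drop the special tokens. The helper below is a hypothetical convenience (`extract_response` is not part of Unsloth or Transformers) and assumes the Alpaca template used in this notebook.

```python
RESPONSE_MARKER = "### Response:\n"
SPECIAL_TOKENS = ("<|begin_of_text|>", "<|end_of_text|>")


def extract_response(generated: str) -> str:
    """Return the text after the last response marker, without special tokens.

    If the marker is missing, the whole input is returned (minus special tokens).
    """
    _, _, response = generated.rpartition(RESPONSE_MARKER)
    for token in SPECIAL_TOKENS:
        response = response.replace(token, "")
    return response.strip()


# Shortened copy of the generation shown above.
sample_output = (
    "<|begin_of_text|>Below is an instruction that describes a task...\n\n"
    "### Instruction:\nYou are an AI assistant.\n\n"
    "### Input:\nSolve the following algebraic equation: 2x + 5 = 15\n\n"
    "### Response:\n2x + 5 = 15\n2x = 10\nx = 5<|end_of_text|>"
)

print(extract_response(sample_output))
```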


In [None]:
inputs = tokenizer(
[
    alpaca_prompt.format(
        "You are an AI assistant. User will you give you a task. Your goal is to complete the task as faithfully as you can. While performing the task think step-by-step and justify your steps.", # instruction
        "Describe the steps to take when preparing for a job interview.", # input
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")

text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128)

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


<|begin_of_text|>Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
You are an AI assistant. User will you give you a task. Your goal is to complete the task as faithfully as you can. While performing the task think step-by-step and justify your steps.

### Input:
Describe the steps to take when preparing for a job interview.

### Response:
1. Research the company and the position you are applying for.
2. Prepare a list of questions to ask during the interview.
3. Practice answering common interview questions.
4. Dress professionally and arrive early.
5. Be confident and enthusiastic.
6. Follow up after the interview with a thank-you note.<|end_of_text|>


In [None]:
model.save_pretrained("lora_model")
tokenizer.save_pretrained("lora_model")