<a href="https://colab.research.google.com/github/SepKeyPro/genAI/blob/main/llama3_dpo_fine_tuning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Fine-tuning Llama3-8B using the DPO technique**

In [1]:
pip install -U transformers datasets accelerate peft bitsandbytes wandb git+https://github.com/huggingface/trl

In [2]:
import torch
import wandb
from peft import LoraConfig, PeftModel, prepare_model_for_kbit_training
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
    pipeline,
)
from trl import DPOTrainer, DPOConfig, SFTTrainer, setup_chat_format
from huggingface_hub import login

In [12]:
login(token="Your Hugging Face access key")
wandb.login(key="Your wandb access key")  # Optional: lets you follow the training run on the wandb dashboard

In [4]:
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model = "meta-llama/Meta-Llama-3-8B"
new_model = "fine-tuned-llama-3-8B-dpo"
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"
chat = [
   {"role": "user", "content": "Hello, how is the weather today?"},
   {"role": "assistant", "content": "It's currently cloudy and 55.4 F?"},
]
tokenizer.apply_chat_template(chat, tokenize=False)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.

No chat template is defined for this tokenizer - using a default chat template that implements the ChatML format (without BOS/EOS tokens!). If the default is not appropriate for your model, please set `tokenizer.chat_template` to an appropriate template. See https://huggingface.co/docs/transformers/main/chat_templating for more information.
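
The warning above says the tokenizer falls back to a default ChatML template. As a rough illustration only, the layout can be reproduced by hand for the same conversation. The helper `to_chatml` is hypothetical and is not part of transformers; it is a sketch of the format, not the library's template engine.

```python
def to_chatml(messages, add_generation_prompt=False):
    """Render a list of {"role", "content"} dicts in ChatML layout (illustrative helper)."""
    text = ""
    for message in messages:
        # Each turn is wrapped in <|im_start|>ROLE ... <|im_end|> markers
        text += f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
    if add_generation_prompt:
        # Leave an open assistant turn for the model to complete
        text += "<|im_start|>assistant\n"
    return text


chat = [
    {"role": "user", "content": "Hello, how is the weather today?"},
    {"role": "assistant", "content": "It's currently cloudy and 55.4 F?"},
]
rendered = to_chatml(chat)
print(repr(rendered))
```

The rendered string should match what `tokenizer.apply_chat_template(chat, tokenize=False)` returns in this cell.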



"<|im_start|>user\nHello, how is the weather today?<|im_end|>\n<|im_start|>assistant\nIt's currently cloudy and 55.4 F?<|im_end|>\n"

In [5]:
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

peft_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=['up_proj', 'down_proj', 'gate_proj', 'k_proj', 'q_proj', 'v_proj', 'o_proj']
)
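
To get a feel for what `r=8` over these seven target modules means, the sketch below counts the LoRA parameters that would be trained. The layer shapes are assumptions taken from the public Llama-3-8B configuration (hidden size 4096, MLP size 14336, 8 key/value heads of dimension 128, 32 layers); they are not read from the loaded model here, so treat the result as an estimate.

```python
# Layer shapes assumed from the public Llama-3-8B config (not read from the model)
HIDDEN = 4096
INTERMEDIATE = 14336
KV_DIM = 8 * 128  # 8 key/value heads x head dimension 128
NUM_LAYERS = 32
RANK = 8          # r in LoraConfig
LORA_ALPHA = 16   # lora_alpha in LoraConfig

# (in_features, out_features) for each targeted linear layer
target_shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features, out_features, rank):
    # LoRA adds A (rank x in_features) and B (out_features x rank)
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in target_shapes.values())
total = per_layer * NUM_LAYERS
scaling = LORA_ALPHA / RANK  # factor applied to the low-rank update

print(f"trainable LoRA parameters per layer: {per_layer:,}")
print(f"trainable LoRA parameters in total:  {total:,}")
print(f"LoRA scaling factor (alpha / r):     {scaling}")
```

Under these assumptions the adapter holds roughly 21 million trainable parameters, a small fraction of the 8B base weights, which stay frozen in 4-bit form.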

In [7]:
dataset_name = "mlabonne/orpo-dpo-mix-40k"
dataset = load_dataset(dataset_name, split="all")
# Keep a small, reproducible subset of 100 examples for this demo
dataset = dataset.shuffle(seed=42).select(range(100))

# Model to fine-tune
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

model.config.use_cache = False

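# Reference model. It is not passed to DPOTrainer below (ref_model=None),
# so this load can be skipped to save GPU memory.
ref_model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,  # match the dtype of the model being trained
    device_map="auto",
)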

`low_cpu_mem_usage` was None, now set to True since model is quantized.

In [8]:
def format_chat_template(row):
    row["chosen"] = tokenizer.apply_chat_template(row["chosen"], tokenize=False)
    row["rejected"] = tokenizer.apply_chat_template(row["rejected"], tokenize=False)
    return row

dataset = dataset.map(format_chat_template)
dataset = dataset.train_test_split(test_size=0.01)
train_dataset = dataset['train']
eval_dataset = dataset['test']

In [9]:
training_args = DPOConfig(
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    max_steps=200,
    save_strategy="no",
    logging_steps=1,
    output_dir="./results",
    optim="paged_adamw_32bit",
    warmup_steps=100,
    bf16=True,
    report_to="wandb",
)


dpo_trainer = DPOTrainer(
    model,
    ref_model=None,  # with a peft_config, TRL uses the base weights (adapter disabled) as the reference
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    peft_config=peft_config,
    tokenizer=tokenizer,
)
dpo_trainer.train()



max_steps is given, it will override any value given in num_train_epochs
[34m[1mwandb[0m: Currently logged in as: [33msepehr-keykhaie[0m ([33msep-test[0m). Use [1m`wandb login --relogin`[0m to force relogin


Could not estimate the number of tokens of the input, floating-point operations will not be computed


Step,Training Loss
1,0.6931
2,0.6931
3,0.6932
4,0.667
5,0.6914
6,0.6887
7,0.6996
8,0.6908
9,0.6879
10,0.6969
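
The first logged losses sit at about 0.6931, which is ln 2. That is what the DPO objective gives when the policy and the reference assign the same log-probabilities, as they do before any update. The sketch below is a minimal scalar version of the sigmoid DPO loss; the log-probability values are made up for illustration, and `beta=0.1` is assumed to be the default because the notebook does not set it.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Sigmoid DPO loss for one preference pair (scalar sketch, beta assumed)."""
    policy_margin = policy_chosen_logp - policy_rejected_logp
    ref_margin = ref_chosen_logp - ref_rejected_logp
    logits = beta * (policy_margin - ref_margin)
    # -log(sigmoid(logits)) written as softplus(-logits)
    return math.log1p(math.exp(-logits))


# Hypothetical log-probabilities: policy equals reference before training
initial_loss = dpo_loss(-120.0, -135.0, -120.0, -135.0)

# Hypothetical later state: the policy now favours the chosen answer more strongly
later_loss = dpo_loss(-110.0, -150.0, -120.0, -135.0)

print(f"initial loss: {initial_loss:.4f}")
print(f"later loss:   {later_loss:.4f}")
```

The loss only falls below ln 2 once the policy widens the chosen-versus-rejected margin relative to the reference.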


TrainOutput(global_step=200, training_loss=0.2735063567684483, metrics={'train_runtime': 2446.242, 'train_samples_per_second': 0.327, 'train_steps_per_second': 0.082, 'total_flos': 0.0, 'train_loss': 0.2735063567684483, 'epoch': 8.080808080808081})
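
The `epoch` and throughput figures in `TrainOutput` can be checked from the training arguments. The short calculation below assumes a single GPU (the notebook does not state the device count) and uses the 99 training examples shown in the output above.

```python
max_steps = 200
per_device_train_batch_size = 1
gradient_accumulation_steps = 4
num_devices = 1               # assumption: one GPU
num_train_examples = 99       # 100 examples minus 1 held out for evaluation
train_runtime_seconds = 2446.242

# Examples consumed per optimizer step and over the whole run
examples_per_step = per_device_train_batch_size * gradient_accumulation_steps * num_devices
examples_seen = max_steps * examples_per_step

epochs = examples_seen / num_train_examples
steps_per_second = max_steps / train_runtime_seconds
samples_per_second = examples_seen / train_runtime_seconds

print(f"examples seen:      {examples_seen}")
print(f"epochs:             {epochs:.4f}")
print(f"steps per second:   {steps_per_second:.3f}")
print(f"samples per second: {samples_per_second:.3f}")
```

With only 99 examples, 200 optimizer steps pass over the data about eight times, which is worth keeping in mind when reading the low final training loss.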

In [12]:
dpo_trainer.model.save_pretrained("fine_tuned_dpo_model")
tokenizer.save_pretrained("fine_tuned_dpo_tokenizer")

In [12]:
original_model = AutoModelForCausalLM.from_pretrained(
    base_model, # "meta-llama/Meta-Llama-3-8B"
    return_dict=True,
    torch_dtype=torch.float16,
)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Attach the trained LoRA adapter, then merge its weights into the base model
model = PeftModel.from_pretrained(original_model, "fine_tuned_dpo_model")
model = model.merge_and_unload()

In [12]:
# Format prompt
chat = [
    {"role": "system", "content": "You are a helpful assistant chatbot."},
    {"role": "user", "content": "What is the biggest city in the world?"}
]

prompt = tokenizer.apply_chat_template(chat, add_generation_prompt=True, tokenize=False)

# Create pipeline
generation_pipeline = pipeline("text-generation", model=model, tokenizer=tokenizer)

# Generate text
sequences = generation_pipeline(
    prompt,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    max_length=200,  # counts prompt tokens plus generated tokens
)
print(sequences[0]['generated_text'])
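
The `temperature=0.7` and `top_p=0.9` settings shape the distribution each new token is drawn from. The sketch below shows the idea on a toy four-token vocabulary in plain Python. It is a simplified illustration with made-up logits and a hypothetical helper `sampling_distribution`, not the implementation used inside transformers.

```python
import math
import random


def sampling_distribution(logits, temperature=0.7, top_p=0.9):
    """Return {token_index: probability} after temperature scaling and top-p filtering."""
    # Temperature below 1 sharpens the distribution
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]  # subtract peak for numerical stability
    total = sum(weights)
    probs = [w / total for w in weights]

    # Keep the smallest set of most likely tokens whose mass reaches top_p
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for index in ranked:
        kept.append(index)
        cumulative += probs[index]
        if cumulative >= top_p:
            break

    # Renormalise over the kept tokens
    kept_mass = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_mass for i in kept}


toy_logits = [2.0, 1.0, 0.0, -1.0]  # made-up scores for a 4-token vocabulary
distribution = sampling_distribution(toy_logits)
next_token = random.choices(list(distribution), weights=list(distribution.values()))[0]
print(distribution, next_token)
```

For these toy logits only the two most likely tokens survive the filter, so the low-probability tail can never be sampled.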