# Fine-tune a Mistral-7b model with DPO

> 🗣️ [DPO Training](https://github.com/deepkapha/LLM-fine-tuning/blob/main/Fine_tune_a_Mistral_7b_model_with_DPO.ipynb)

❤️ Created by [tannisthamaiti](https://www.linkedin.com/in/tannisthamaiti/).

In [1]:
!pip install -q datasets trl peft bitsandbytes sentencepiece wandb

In [2]:
# Optional: delete a model repository from the Hugging Face Hub
# !pip install huggingface-hub
# from huggingface_hub import HfApi
# HfApi().delete_repo("sft_model")

In [3]:
import os
import gc
import torch

import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments, BitsAndBytesConfig
from datasets import load_dataset
from peft import LoraConfig, PeftModel, get_peft_model, prepare_model_for_kbit_training
from trl import DPOTrainer, DPOConfig
import bitsandbytes as bnb
from google.colab import userdata
import wandb

# Defined in the secrets tab in Google Colab
hf_token = userdata.get('HF_TOKEN')
wb_token = userdata.get('wandb')
wandb.login(key=wb_token)

model_name = "Tannistha/sft_model"
new_model = "NeuralHermes-2.5-Mistral-7B"

[34m[1mwandb[0m: Currently logged in as: [33mtannistha-iitkgp[0m ([33mtannistha-iitkgp-muzer[0m). Use [1m`wandb login --relogin`[0m to force relogin
[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc


## Format dataset

In [4]:
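def apply_dpo_template(sample, tokenizer):
    """Convert one UltraFeedback record into the prompt/chosen/rejected strings used for DPO."""
    # The second-to-last message of the conversation is the final user turn
    prompt_message = [sample["chosen"][-2]]

    # The last message of each conversation is the assistant reply to compare
    sample["chosen_final"] = sample["chosen"][-1]["content"] + "\n"
    sample["rejected_final"] = sample["rejected"][-1]["content"] + "\n"
    # Render the prompt with the chat template, ending with the assistant header
    sample["prompt_final"] = tokenizer.apply_chat_template(
        prompt_message, tokenize=False, add_generation_prompt=True
    )

    return sample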
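
To see what `apply_dpo_template` produces, the sketch below runs it on a hand-written sample. `StubTokenizer` and its Zephyr-style template are assumptions made for illustration (they mimic the `<|user|>` and `</s>` markers visible in the inference output of this notebook); the real template comes from the `Tannistha/sft_model` tokenizer and may differ.

```python
class StubTokenizer:
    """Hypothetical stand-in for the real tokenizer (assumed Zephyr-style template)."""

    def apply_chat_template(self, messages, tokenize=False, add_generation_prompt=False):
        text = "".join(f"<|{m['role']}|>\n{m['content']}</s>\n" for m in messages)
        if add_generation_prompt:
            # Open the assistant turn so the model continues from here
            text += "<|assistant|>\n"
        return text


def apply_dpo_template(sample, tokenizer):
    # Same logic as the notebook cell: last user turn -> prompt,
    # last assistant turn of each conversation -> chosen / rejected completion
    prompt_message = [sample["chosen"][-2]]
    sample["chosen_final"] = sample["chosen"][-1]["content"] + "\n"
    sample["rejected_final"] = sample["rejected"][-1]["content"] + "\n"
    sample["prompt_final"] = tokenizer.apply_chat_template(
        prompt_message, tokenize=False, add_generation_prompt=True
    )
    return sample


# Made-up preference pair in the same shape as an ultrafeedback_binarized record
sample = {
    "chosen": [
        {"role": "user", "content": "What is DPO?"},
        {"role": "assistant", "content": "A preference-tuning method."},
    ],
    "rejected": [
        {"role": "user", "content": "What is DPO?"},
        {"role": "assistant", "content": "I don't know."},
    ],
}
formatted = apply_dpo_template(sample, StubTokenizer())
print(repr(formatted["prompt_final"]))
print(repr(formatted["chosen_final"]))
print(repr(formatted["rejected_final"]))
```

Note that only the final user turn becomes the prompt, so earlier turns of a multi-turn conversation are dropped by this formatting.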

In [5]:
dataset = load_dataset(
    "HuggingFaceH4/ultrafeedback_binarized",
    split="train_prefs[:100]"
)


# Tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"

dataset = dataset.map(apply_dpo_template, fn_kwargs={"tokenizer": tokenizer}, remove_columns=['prompt', 'prompt_id', 'chosen', 'rejected', 'messages', 'score_chosen', 'score_rejected'])
dataset = dataset.rename_column("chosen_final", "chosen")
dataset = dataset.rename_column("rejected_final", "rejected")
dataset = dataset.rename_column("prompt_final", "prompt")


# Print sample
dataset[1]

{'chosen': "It is not recommended to modify built-in methods as it can lead to unexpected results and potential bugs. You may consider developing a new method or exploring other methods to achieve your desired outcome. Alternatively, you can search for other libraries or modules that offer similar functionalities or reach out to the library's support team for assistance.\n",
 'rejected': 'Thank you for reaching out for assistance! I\'m here to help you with your question. However, I must point out that the question itself may not be meaningful.\n\nThe `getPosition` method is a part of the AntV/G library, which is a gradient boosting framework. It is not clear what you mean by "transforming" this method, as it is not a functional programming concept. Additionally, the concept of "zrender" is not related to AntV/G.\n\nCould you please provide more context or clarify your question? I\'d be happy to help if there\'s a specific issue you\'re facing or if you have a misunderstanding about th

## Train model with DPO

In [6]:
# LoRA configuration
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=['k_proj', 'gate_proj', 'v_proj', 'up_proj', 'q_proj', 'o_proj', 'down_proj']
)

# Model to fine-tune
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    load_in_4bit=True
)
model.config.use_cache = False
# Add a LoRA adapter
model = get_peft_model(model, peft_config)
# Verify the adapter is added
print("Adapters added to the model.")
print("Active adapters:", model.active_adapters)

# Training arguments
training_args = DPOConfig(
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    max_steps=200,
    save_strategy="no",
    logging_steps=1,
    output_dir=new_model,
    optim="paged_adamw_32bit",
    warmup_steps=100,
    bf16=True,
    report_to="wandb",
    beta=0.1,
    max_prompt_length=1024,
    max_length=1536,
)

# Create DPO trainer
dpo_trainer = DPOTrainer(
    model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,

)

# Fine-tune model with DPO
dpo_trainer.train()

The `load_in_4bit` and `load_in_8bit` arguments are deprecated and will be removed in the future versions. Please, pass a `BitsAndBytesConfig` object in `quantization_config` argument instead.
`low_cpu_mem_usage` was None, now default to True since model is quantized.
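
The deprecation warning above suggests passing a `BitsAndBytesConfig` through `quantization_config`. A minimal sketch of that form follows. The NF4 quantization type, double quantization, and bfloat16 compute dtype are assumptions chosen to match the `bf16=True` training setting; they are not the settings of the recorded run, which used the library defaults behind `load_in_4bit=True`. The helper name `load_quantized` is hypothetical.

```python
# Quantization settings kept as a plain dict so they can be inspected without a GPU.
# The values are assumptions for illustration, not the settings of the recorded run.
QUANT_KWARGS = {
    "load_in_4bit": True,
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_compute_dtype": "bfloat16",  # a torch dtype such as torch.bfloat16 also works
    "bnb_4bit_use_double_quant": True,
}


def load_quantized(model_name):
    """Hypothetical helper: load a causal LM in 4-bit via quantization_config."""
    # Imported lazily so the settings above can be read without loading these packages
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    return AutoModelForCausalLM.from_pretrained(
        model_name,
        quantization_config=BitsAndBytesConfig(**QUANT_KWARGS),
    )


# Usage (requires a CUDA GPU with bitsandbytes installed):
# model = load_quantized("Tannistha/sft_model")
```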


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Adapters added to the model.
Active adapters: ['default']


wandb: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.




| Step | Training Loss |
|------|---------------|
| 1    | 0.6931        |
| 2    | 0.6931        |
| 3    | 0.6921        |
| 4    | 0.6872        |
| 5    | 0.6839        |
| 6    | 0.7025        |
| 7    | 0.1723        |
| 8    | 0.6638        |
| 9    | 0.6588        |
| 10   | 0.6235        |
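
The first logged losses of 0.6931 equal ln 2. That is the value the DPO objective takes when the policy and the reference model assign identical log-probabilities, which is expected at the start because a freshly initialised LoRA adapter leaves the model output unchanged. The sketch below evaluates the standard sigmoid DPO loss for a single preference pair. The log-probability values are made up for illustration, and the function is a hand-written version of the objective, not TRL's implementation.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Sigmoid DPO loss for one (chosen, rejected) pair of sequence log-probabilities."""
    # How much more the policy prefers "chosen" over "rejected" than the reference does
    margin = (policy_chosen_logp - ref_chosen_logp) - (policy_rejected_logp - ref_rejected_logp)
    # -log(sigmoid(beta * margin)) written as log(1 + exp(-beta * margin))
    return math.log1p(math.exp(-beta * margin))


# Policy identical to the reference: zero margin, loss = ln 2
print(round(dpo_loss(-10.0, -12.0, -10.0, -12.0), 4))  # 0.6931

# Policy has shifted probability mass toward the chosen answer: loss drops below ln 2
print(dpo_loss(-9.0, -13.0, -10.0, -12.0) < math.log(2))  # True
```

With `beta=0.1`, as in the training arguments, the loss moves slowly at first because the margin is scaled down by a factor of ten before entering the sigmoid.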


TrainOutput(global_step=200, training_loss=0.08225442668878567, metrics={'train_runtime': 3257.7002, 'train_samples_per_second': 0.982, 'train_steps_per_second': 0.061, 'total_flos': 0.0, 'train_loss': 0.08225442668878567, 'epoch': 28.64})
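
The reported throughput can be cross-checked against the training arguments. The sketch below is plain arithmetic on the numbers shown in this notebook; it assumes a single GPU and that every optimizer step consumed a full effective batch.

```python
per_device_batch = 4        # per_device_train_batch_size
grad_accum_steps = 4        # gradient_accumulation_steps
max_steps = 200             # max_steps
train_runtime_s = 3257.7002 # train_runtime from TrainOutput
dataset_size = 100          # split="train_prefs[:100]"

effective_batch = per_device_batch * grad_accum_steps  # samples per optimizer step
samples_seen = effective_batch * max_steps             # samples over the whole run

print(effective_batch)                           # 16
print(round(samples_seen / train_runtime_s, 3))  # 0.982, matches train_samples_per_second
print(round(max_steps / train_runtime_s, 3))     # 0.061, matches train_steps_per_second
print(samples_seen / dataset_size)               # 32.0 passes under the full-batch assumption
```

The trainer itself reports 28.64 epochs, so its epoch accounting differs somewhat from this estimate. Either way, each of the 100 examples is seen roughly thirty times, so the very low average loss of 0.082 likely reflects memorisation of this small subset at least as much as general preference learning.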

## Upload model

In [7]:
# Save artifacts
dpo_trainer.model.save_pretrained("final_checkpoint")
tokenizer.save_pretrained("final_checkpoint")

# Flush memory
del dpo_trainer, model
gc.collect()
torch.cuda.empty_cache()

# Reload the base model in FP16 (unquantized) so the adapter can be merged into it
base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    return_dict=True,
    torch_dtype=torch.float16,
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load the trained adapter onto the plain base model and merge it.
# PeftModel.from_pretrained attaches the saved adapter itself, so the base
# model is not wrapped with get_peft_model first.
model = PeftModel.from_pretrained(base_model, "final_checkpoint")
model = model.merge_and_unload()

# Save model and tokenizer
model.save_pretrained(new_model)
tokenizer.save_pretrained(new_model)

# Push them to the HF Hub
model.push_to_hub(new_model, use_temp_dir=False, token=hf_token)
tokenizer.push_to_hub(new_model, use_temp_dir=False, token=hf_token)

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]


No files have been modified since last commit. Skipping to prevent empty commit.


CommitInfo(commit_url='https://huggingface.co/Tannistha/NeuralHermes-2.5-Mistral-7B/commit/e1476d9e70c0a9855c5901ddcf0b6992a16d0f1e', commit_message='Upload tokenizer', commit_description='', oid='e1476d9e70c0a9855c5901ddcf0b6992a16d0f1e', pr_url=None, repo_url=RepoUrl('https://huggingface.co/Tannistha/NeuralHermes-2.5-Mistral-7B', endpoint='https://huggingface.co', repo_type='model', repo_id='Tannistha/NeuralHermes-2.5-Mistral-7B'), pr_revision=None, pr_num=None)

In [9]:
new_model

'NeuralHermes-2.5-Mistral-7B'

## Inference

In [10]:
# Format prompt
message = [
    {"role": "system", "content": "You are a helpful assistant chatbot."},
    {"role": "user", "content": "What is a Large Language Model?"},
    {"role": "user", "content": "How can I develop a habit of drawing daily"},
    {"role": "user", "content": "how can I transform the getPosition method of antv/g's group in zrender?"},

]
tokenizer = AutoTokenizer.from_pretrained(new_model)
prompt = tokenizer.apply_chat_template(message, add_generation_prompt=True, tokenize=False)

# Create pipeline
pipeline = transformers.pipeline(
    "text-generation",
    model=new_model,
    tokenizer=tokenizer
)

# Generate text
sequences = pipeline(
    prompt,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    num_return_sequences=1,
    max_length=200,
)
print(sequences[0]['generated_text'])

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Device set to use cuda:0
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.


<|system|>
You are a helpful assistant chatbot.</s>
<|user|>
What is a Large Language Model?</s>
<|user|>
How can I develop a habit of drawing daily</s>
<|user|>
how can I transform the getPosition method of antv/g's group in zrender?</s>
<|assistant|>
A Large Language Model (LLM) is a type of machine learning model that is trained on large amounts of text data. It is used for various natural language processing tasks such as language translation, text summarization, and text generation.
<|assistant|>
To develop a habit of drawing daily, you can try the following tips: 1. Set a specific time each day to draw. This could be in the morning, after work, or any time that works for you. 2. Start with small goals and gradually increase the difficulty as you become more
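
In the run above, all three questions were sent as consecutive user turns of one conversation, and the model answered them one after another until `max_length=200` (which counts prompt tokens as well as generated ones) cut the text off. A possible alternative, sketched below, builds one conversation per question. `build_conversations` is a hypothetical helper, and the commented-out loop assumes the `pipeline` and `tokenizer` objects from the inference cell.

```python
def build_conversations(system_prompt, questions):
    """Return one [system, user] message list per question."""
    return [
        [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": question},
        ]
        for question in questions
    ]


questions = [
    "What is a Large Language Model?",
    "How can I develop a habit of drawing daily",
    "how can I transform the getPosition method of antv/g's group in zrender?",
]
conversations = build_conversations("You are a helpful assistant chatbot.", questions)
print(len(conversations))  # 3

# With the objects from the inference cell, each conversation could be run separately.
# max_new_tokens limits only the generated part, unlike max_length.
# for messages in conversations:
#     prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
#     out = pipeline(prompt, do_sample=True, temperature=0.7, top_p=0.9, max_new_tokens=200)
#     print(out[0]["generated_text"])
```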
