<a href="https://colab.research.google.com/github/RobinSmits/Dutch-LLMs/blob/main/Qwen1_5_7B_Dutch_Chat_DPO.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Introduction

This notebook performs DPO alignment of the QLoRA adapter model [robinsmits/Qwen1.5-7B-Dutch-Chat-Sft](https://huggingface.co/robinsmits/Qwen1.5-7B-Dutch-Chat-Sft), which in turn is based on Qwen1.5-7B-Chat.

Officially, the Qwen1.5 models don't support Dutch. However, while running some experiments I noticed that the Dutch chat quality (for the 7B and larger sizes) was comparable to, or maybe even better than, that of the Mistral models. Mistral doesn't officially support Dutch either, yet it has already served as the base for some interesting Dutch chat models created by Bram Vanroy and Edwin Rijgersberg.

This is basically my attempt to further fine-tune and align the Qwen1.5-7B-Chat model and optimize it for Dutch.

The dataset used is the Dutch DPO alignment chat dataset [BramVanroy/ultra_feedback_dutch_cleaned](https://huggingface.co/datasets/BramVanroy/ultra_feedback_dutch_cleaned), created by Bram Vanroy. Kudos to Bram for this dataset!


## Install and Import Modules

In [None]:
# Install Modules
!pip install -q accelerate==0.27.2
!pip install -q bitsandbytes==0.43.0
!pip install -q datasets==2.17.1
!pip install -q peft==0.9.0
!pip install -q transformers==4.38.2
!pip install -q trl==0.8.1


In [None]:
# Import Modules
from datasets import load_dataset
from huggingface_hub import notebook_login
from peft import PeftModel, AutoPeftModelForCausalLM
from transformers import (AutoTokenizer,
                          AutoModelForCausalLM,
                          BitsAndBytesConfig,
                          TrainingArguments)
import torch
from trl import DPOTrainer

# Set TF32 for A100
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

## Constants

In [None]:
# Set Name Constants
base_model_name = 'Qwen/Qwen1.5-7B-Chat'
sft_model_name = 'robinsmits/Qwen1.5-7B-Dutch-Chat-Sft'
dpo_model_name = 'Qwen1.5-7B-Dutch-Chat-Dpo'
dpo_merged_model_name = 'Qwen1.5-7B-Dutch-Chat'

## Connect Google Drive

In [None]:
# Mount Google Drive
import os
from google.colab import drive
drive.mount('/content/drive')

# Set Folder to use...
WORK_DIR = '/content/drive/My Drive/QwenDutch/'
os.makedirs(WORK_DIR, exist_ok = True)

Mounted at /content/drive


## HuggingFace Login

In [None]:
# HuggingFace Hub Login
notebook_login()


## Tokenizer

In [None]:
# Create Tokenizer
tokenizer = AutoTokenizer.from_pretrained(sft_model_name)

# Set Tokenizer Settings
tokenizer.truncation_side = "left"
tokenizer.add_special_tokens({"bos_token": tokenizer.eos_token})
tokenizer.bos_token_id = tokenizer.eos_token_id
tokenizer.pad_token_id = tokenizer.eos_token_id

# Tokenizer Summary
print(tokenizer)

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


Qwen2TokenizerFast(name_or_path='robinsmits/Qwen1.5-7B-Dutch-Chat-Sft', vocab_size=151643, model_max_length=32768, is_fast=True, padding_side='right', truncation_side='left', special_tokens={'bos_token': '<|im_end|>', 'eos_token': '<|im_end|>', 'pad_token': '<|im_end|>', 'additional_special_tokens': ['<|im_start|>', '<|im_end|>']}, clean_up_tokenization_spaces=False),  added_tokens_decoder={
	151643: AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	151644: AddedToken("<|im_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	151645: AddedToken("<|im_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
}


## Create QLoRA Model Based on the Qwen1.5-7B-Dutch-Chat-Sft Model

In [None]:
# Create BitsAndBytesConfig
bnb_config = BitsAndBytesConfig(load_in_4bit = True,
                                bnb_4bit_use_double_quant = True,
                                bnb_4bit_quant_type = 'nf4',
                                bnb_4bit_compute_dtype = torch.bfloat16)

# Create Base Model
model = AutoModelForCausalLM.from_pretrained(base_model_name,
                                             quantization_config = bnb_config,
                                             torch_dtype = torch.bfloat16,
                                             device_map = 'auto')

# Set cache to False
model.config.use_cache = False

# Load Adapter
model = PeftModel.from_pretrained(model,
                                  sft_model_name,
                                  is_trainable = True)

# Show Model Parameter Count
model.print_trainable_parameters()

# Show Model Summary
print(model)

trainable params: 159,907,840 || all params: 7,881,232,384 || trainable%: 2.0289699910972705
PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): Qwen2ForCausalLM(
      (model): Qwen2Model(
        (embed_tokens): Embedding(151936, 4096)
        (layers): ModuleList(
          (0-31): 32 x Qwen2DecoderLayer(
            (self_attn): Qwen2SdpaAttention(
              (q_proj): lora.Linear4bit(
                (base_layer): Linear4bit(in_features=4096, out_features=4096, bias=True)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=64, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=64, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
 

## Add Adapter for DPO Reference

In [None]:
# Add Adapter for DPO Reference
model.load_adapter(sft_model_name,
                   adapter_name = 'reference')

# Show Model Parameter Count
model.print_trainable_parameters()

trainable params: 159,907,840 || all params: 8,041,140,224 || trainable%: 1.9886214584684252
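
The parameter counts printed above can be checked with a little arithmetic. The sketch below assumes the adapter uses rank 64 (as the `lora_A`/`lora_B` shapes for `q_proj` show) and targets all seven linear projections in every decoder layer. The truncated model summary only shows `q_proj`, so the full target list is an assumption, although the total it yields agrees with the reported count. The helper name `lora_params` is made up for this illustration.

```python
# Layer sizes as they appear in the printed model summaries.
HIDDEN = 4096          # q/k/v/o projection width
INTERMEDIATE = 11008   # MLP width
NUM_LAYERS = 32
RANK = 64              # from the lora_A / lora_B shapes of q_proj


def lora_params(in_features, out_features, rank=RANK):
    """Parameters in one LoRA pair: A is (rank x in), B is (out x rank)."""
    return in_features * rank + rank * out_features


# Assumption: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj and
# down_proj all carry a LoRA adapter (only q_proj is visible in the output).
attention = 4 * lora_params(HIDDEN, HIDDEN)
mlp = 2 * lora_params(HIDDEN, INTERMEDIATE) + lora_params(INTERMEDIATE, HIDDEN)
adapter_params = NUM_LAYERS * (attention + mlp)

# "all params" reported before the reference adapter was loaded.
total_with_one_adapter = 7_881_232_384
total_with_two_adapters = total_with_one_adapter + adapter_params

print(f"adapter parameters      : {adapter_params:,}")
print(f"total with two adapters : {total_with_two_adapters:,}")
```

Under these assumptions, loading the `reference` adapter adds a second, frozen copy of the adapter weights, which is why the total grows while the trainable count stays the same.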


## Show Dutch Chat Template

In [None]:
# Summary Chat Template
tokenizer.chat_template

"{% for message in messages %}{% if loop.first and messages[0]['role'] != 'system' %}{{ '<|im_start|>system\nJe bent een behulpzame AI assistent<|im_end|>\n' }}{% endif %}{{'<|im_start|>' + message['role'] + '\n' + message['content']}}{% if (loop.last and add_generation_prompt) or not loop.last %}{{ '<|im_end|>' + '\n'}}{% endif %}{% endfor %}{% if add_generation_prompt and messages[-1]['role'] != 'assistant' %}{{ '<|im_start|>assistant\n' }}{% endif %}"

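To make the Jinja template above easier to read, here is a hand-written Python equivalent. It is only an illustrative sketch: the function name `render_chat` is made up, and the real rendering is done by `tokenizer.apply_chat_template`.

```python
SYSTEM_PROMPT = "Je bent een behulpzame AI assistent"


def render_chat(messages, add_generation_prompt=False):
    """Mimic the chat template shown above in plain Python (illustration only)."""
    parts = []
    for index, message in enumerate(messages):
        is_first = index == 0
        is_last = index == len(messages) - 1
        # A default Dutch system prompt is injected when none is supplied.
        if is_first and messages[0]["role"] != "system":
            parts.append("<|im_start|>system\n" + SYSTEM_PROMPT + "<|im_end|>\n")
        parts.append("<|im_start|>" + message["role"] + "\n" + message["content"])
        # The final message is only closed when a generation prompt is requested.
        if (is_last and add_generation_prompt) or not is_last:
            parts.append("<|im_end|>\n")
    if messages and add_generation_prompt and messages[-1]["role"] != "assistant":
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


# Made-up example conversation.
example_messages = [{"role": "user", "content": "Hallo"}]
print(render_chat(example_messages, add_generation_prompt=True))
```

Note that without `add_generation_prompt` the last message is left open, so no `<|im_end|>` follows it.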
## Load DPO Dataset

For DPO alignment I used the 'dpo_hq' subset, as it has been cleaned further. I also tried the 'dpo_all' subset, but I didn't notice much difference between the models trained on each.

However, for safety and probably the best alignment quality, I will only use and publish the model based on the 'dpo_hq' subset.

In [None]:
# Load Dataset
dataset = load_dataset('BramVanroy/ultra_feedback_dutch_cleaned', 'dpo_hq')

# Split Datasets
train_dataset = dataset['train_prefs']
val_dataset = dataset['test_prefs']

# Summary
print(train_dataset)
print(val_dataset)

Dataset({
    features: ['prompt', 'chosen', 'rejected'],
    num_rows: 9815
})
Dataset({
    features: ['prompt', 'chosen', 'rejected'],
    num_rows: 517
})


In [None]:
# Used and modified from HuggingFace Alignment Handbook
def apply_chat_template(example, tokenizer):
    if all(k in example.keys() for k in ("chosen", "rejected")):
        # For DPO, the inputs are triples of (prompt, chosen, rejected), where `chosen` and `rejected` are the final turn of a dialogue
        # We therefore need to extract the N-1 turns to form the prompt
        prompt_messages = example["chosen"][:-1]

        # Now we extract the final turn to define chosen/rejected responses
        chosen_messages = example["chosen"][-1:]
        rejected_messages = example["rejected"][-1:]
        example["text_chosen"] = tokenizer.apply_chat_template(chosen_messages, tokenize=False)
        example["text_rejected"] = tokenizer.apply_chat_template(rejected_messages, tokenize=False)
        example["text_prompt"] = tokenizer.apply_chat_template(prompt_messages, tokenize=False)

    return example
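
As a small illustration of the slicing used in the function above, the sketch below splits a toy preference record into prompt, chosen, and rejected turns. The record is made up for this example and only mimics the list-of-messages layout of the `chosen` and `rejected` columns; the helper name `split_preference_example` is hypothetical.

```python
def split_preference_example(example):
    """Split a preference record into (prompt, chosen, rejected) message lists."""
    # Every turn except the last one forms the shared prompt.
    prompt_messages = example["chosen"][:-1]
    # The final turn of each dialogue is the response being compared.
    chosen_messages = example["chosen"][-1:]
    rejected_messages = example["rejected"][-1:]
    return prompt_messages, chosen_messages, rejected_messages


# Made-up record, not taken from the dataset.
toy_example = {
    "chosen": [
        {"role": "user", "content": "Wat is de hoofdstad van Nederland?"},
        {"role": "assistant", "content": "De hoofdstad van Nederland is Amsterdam."},
    ],
    "rejected": [
        {"role": "user", "content": "Wat is de hoofdstad van Nederland?"},
        {"role": "assistant", "content": "Dat weet ik niet."},
    ],
}

prompt, chosen, rejected = split_preference_example(toy_example)
print(prompt)
print(chosen)
print(rejected)
```

Slicing with `[-1:]` rather than `[-1]` keeps each response as a one-element list, which is the shape `tokenizer.apply_chat_template` expects.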

In [None]:
# Get Original columns
original_columns = train_dataset.column_names

# Process Train Dataset
train_dataset = train_dataset.map(apply_chat_template,
                                  fn_kwargs = {"tokenizer": tokenizer},
                                  remove_columns = original_columns)

# Process Validation Dataset
val_dataset = val_dataset.map(apply_chat_template,
                              fn_kwargs = {"tokenizer": tokenizer},
                              remove_columns = original_columns)

# Rename columns
train_dataset = train_dataset.rename_columns({"text_prompt": "prompt", "text_chosen": "chosen", "text_rejected": "rejected"})
val_dataset = val_dataset.rename_columns({"text_prompt": "prompt", "text_chosen": "chosen", "text_rejected": "rejected"})

# Summary
print(train_dataset)
print(val_dataset)

Dataset({
    features: ['chosen', 'rejected', 'prompt'],
    num_rows: 9815
})
Dataset({
    features: ['chosen', 'rejected', 'prompt'],
    num_rows: 517
})


## Train Model

In [None]:
# Set Steps
eval_steps = 30
save_steps = 30
logging_steps = 15

# DPO Training Arguments
training_args = TrainingArguments(num_train_epochs = 1,
                                  learning_rate = 1.0e-5,
                                  lr_scheduler_type = 'cosine',
                                  evaluation_strategy = "steps",
                                  logging_steps = logging_steps,
                                  save_strategy = 'steps',
                                  eval_steps = eval_steps,
                                  save_steps = save_steps,
                                  save_total_limit = 1,
                                  per_device_train_batch_size = 1,
                                  per_device_eval_batch_size = 2,
                                  gradient_accumulation_steps = 32,
                                  gradient_checkpointing = True,
                                  gradient_checkpointing_kwargs = {'use_reentrant': False},
                                  warmup_ratio = 0.05,
                                  bf16 = True,
                                  tf32 = True,
                                  output_dir = dpo_model_name,
                                  hub_model_id = dpo_model_name,
                                  remove_unused_columns = False,
                                  push_to_hub = True,
                                  hub_private_repo = True,
                                  optim = 'paged_adamw_8bit',
                                  report_to = 'tensorboard')

# Config DPOTrainer
dpo_trainer = DPOTrainer(model,
                         args = training_args,
                         beta = 0.05,
                         max_length = 1536,
                         max_prompt_length = 1024,
                         train_dataset = train_dataset,
                         eval_dataset = val_dataset,
                         tokenizer = tokenizer,
                         model_adapter_name = 'default',
                         ref_adapter_name = 'reference')
# Train DPO Model
dpo_trainer.train()

Could not estimate the number of tokens of the input, floating-point operations will not be computed


| Step | Training Loss | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|
| 30 | 0.5503 | 0.46836 | -0.043876 | -0.629464 | 0.891892 | 0.585588 | -837.951294 | -769.810303 | -0.933545 | -0.889388 |
| 60 | 0.4178 | 0.356786 | -0.371327 | -1.476905 | 0.901544 | 1.105578 | -854.900024 | -776.359375 | -0.876791 | -0.827572 |
| 90 | 0.3264 | 0.314266 | -0.489312 | -1.873036 | 0.915058 | 1.383724 | -862.822754 | -778.719055 | -0.842782 | -0.792899 |
| 120 | 0.2999 | 0.28849 | -0.683218 | -2.311808 | 0.915058 | 1.62859 | -871.598145 | -782.597107 | -0.825997 | -0.772953 |
| 150 | 0.3454 | 0.274907 | -0.723924 | -2.490368 | 0.918919 | 1.766443 | -875.16925 | -783.411255 | -0.823463 | -0.767826 |
| 180 | 0.3354 | 0.268453 | -0.677479 | -2.485877 | 0.916988 | 1.808398 | -875.079529 | -782.482422 | -0.81301 | -0.757381 |
| 210 | 0.2848 | 0.265212 | -0.71569 | -2.569226 | 0.913127 | 1.853536 | -876.746521 | -783.246643 | -0.815739 | -0.758648 |
| 240 | 0.3437 | 0.262121 | -0.72333 | -2.609051 | 0.915058 | 1.88572 | -877.54303 | -783.399353 | -0.813835 | -0.756063 |
| 270 | 0.2655 | 0.261088 | -0.718335 | -2.615443 | 0.915058 | 1.897108 | -877.670837 | -783.2995 | -0.810576 | -0.75241 |
| 300 | 0.3442 | 0.260963 | -0.724767 | -2.62241 | 0.916988 | 1.897643 | -877.810181 | -783.428162 | -0.810977 | -0.75283 |
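
As a rough guide to reading the reward columns in the training log above, here is a minimal sketch of the sigmoid DPO loss for a single preference pair. It assumes the standard DPO formulation with `beta = 0.05` as configured for the trainer; the function name `dpo_pair_loss` and the log-probabilities in the usage example are made up for illustration and are not taken from this run.

```python
import math


def dpo_pair_loss(policy_chosen_logp, policy_rejected_logp,
                  ref_chosen_logp, ref_rejected_logp, beta=0.05):
    """Sigmoid DPO loss and implicit rewards for one (chosen, rejected) pair."""
    # Implicit rewards: how far the policy has moved from the reference,
    # measured in summed log-probability and scaled by beta.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)), written as log(1 + exp(-margin)).
    loss = math.log1p(math.exp(-margin))
    return loss, chosen_reward, rejected_reward, margin


# Hypothetical summed log-probabilities for one pair (not from the log).
loss, chosen_reward, rejected_reward, margin = dpo_pair_loss(
    policy_chosen_logp=-780.0,
    policy_rejected_logp=-860.0,
    ref_chosen_logp=-770.0,
    ref_rejected_logp=-830.0,
)
print(f"loss={loss:.4f} chosen={chosen_reward:.2f} "
      f"rejected={rejected_reward:.2f} margin={margin:.2f}")
```

In the log, `Rewards/margins` is `Rewards/chosen` minus `Rewards/rejected`; at step 30, for example, -0.043876 - (-0.629464) = 0.585588. Both rewards can be negative while the margin still grows, which is the pattern seen in this run.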


TrainOutput(global_step=306, training_loss=0.3584630991898331, metrics={'train_runtime': 13085.7986, 'train_samples_per_second': 0.75, 'train_steps_per_second': 0.023, 'total_flos': 0.0, 'train_loss': 0.3584630991898331, 'epoch': 1.0})
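
The step count in the `TrainOutput` above follows from the batch settings. The sketch below reproduces that arithmetic and the shape of a cosine schedule with linear warmup. It assumes a single GPU and that leftover micro-batches at the end of the epoch do not form an extra optimizer step, which is consistent with the reported `global_step=306`; the `cosine_lr` helper is a simplified stand-in for illustration, not the library's scheduler.

```python
import math

NUM_TRAIN_EXAMPLES = 9815
PER_DEVICE_BATCH = 1
GRAD_ACCUM = 32
NUM_DEVICES = 1        # assumption: a single GPU
WARMUP_RATIO = 0.05
PEAK_LR = 1.0e-5

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM * NUM_DEVICES
micro_batches = math.ceil(NUM_TRAIN_EXAMPLES / (PER_DEVICE_BATCH * NUM_DEVICES))
# Assumption: an incomplete final accumulation window is not counted.
total_steps = micro_batches // GRAD_ACCUM
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def cosine_lr(step):
    """Learning rate at an optimizer step: linear warmup, then cosine decay."""
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(f"effective batch size: {effective_batch}")
print(f"optimizer steps     : {total_steps}")
print(f"warmup steps        : {warmup_steps}")
for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(f"step {step:3d}: lr = {cosine_lr(step):.2e}")
```

With evaluation every 30 steps, this is also why the log ends at step 300: the next evaluation point would fall beyond the last optimizer step.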

## Push to Hub

In [None]:
# Push tokenizer to hub
tokenizer.push_to_hub(dpo_model_name, private = True)

# Push model to hub
dpo_trainer.push_to_hub()

CommitInfo(commit_url='https://huggingface.co/robinsmits/Qwen1.5-7B-Dutch-Chat-Dpo/commit/b7e48de00d682165656696a8c5b7a55456345232', commit_message='End of training', commit_description='', oid='b7e48de00d682165656696a8c5b7a55456345232', pr_url=None, pr_revision=None, pr_num=None)

## Merge Model and Push to Hub

In [None]:
# Set Name Constants
model_name = f'robinsmits/{dpo_model_name}'
merged_model_name = f'robinsmits/{dpo_merged_model_name}'

# Summary
print(model_name)
print(merged_model_name)

robinsmits/Qwen1.5-7B-Dutch-Chat-Dpo
robinsmits/Qwen1.5-7B-Dutch-Chat


In [None]:
# Cleanup
del model, tokenizer

# Load from Hub
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Create Model - Dtype bfloat16
model = AutoPeftModelForCausalLM.from_pretrained(model_name,
                                                 torch_dtype = torch.bfloat16)

# Merge and Unload
model = model.merge_and_unload()

# Summary
print(model)

Qwen2ForCausalLM(
  (model): Qwen2Model(
    (embed_tokens): Embedding(151646, 4096)
    (layers): ModuleList(
      (0-31): 32 x Qwen2DecoderLayer(
        (self_attn): Qwen2SdpaAttention(
          (q_proj): Linear(in_features=4096, out_features=4096, bias=True)
          (k_proj): Linear(in_features=4096, out_features=4096, bias=True)
          (v_proj): Linear(in_features=4096, out_features=4096, bias=True)
          (o_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): Qwen2RotaryEmbedding()
        )
        (mlp): Qwen2MLP(
          (gate_proj): Linear(in_features=4096, out_features=11008, bias=False)
          (up_proj): Linear(in_features=4096, out_features=11008, bias=False)
          (down_proj): Linear(in_features=11008, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): Qwen2RMSNorm()
        (post_attention_layernorm): Qwen2RMSNorm()
      )
    )
    (norm): Qwen2RMSNorm()
  )
  (lm_head): L
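
To illustrate what merging an adapter means, the sketch below folds a tiny LoRA update into a weight matrix using the usual LoRA formula `W + (alpha / r) * B @ A`. It is a toy example in plain Python with made-up 2x2 numbers: the `alpha` value and the helper names are hypothetical, and the real work in this notebook is done by `merge_and_unload()`.

```python
def matmul(left, right):
    """Multiply two matrices given as lists of rows."""
    return [[sum(l * r for l, r in zip(row, col)) for col in zip(*right)]
            for row in left]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return weight + (alpha / rank) * (lora_b @ lora_a)."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]


# Made-up toy values: a 2x2 base weight and a rank-1 adapter.
base_weight = [[1.0, 0.0], [0.0, 1.0]]
lora_a = [[1.0, 2.0]]        # shape (rank, in_features)
lora_b = [[3.0], [4.0]]      # shape (out_features, rank)
alpha, rank = 2.0, 1         # hypothetical scaling settings

merged = merge_lora(base_weight, lora_a, lora_b, alpha, rank)
print(merged)

# The merged matrix gives the same output as base plus adapter paths.
x = [[1.0], [1.0]]
merged_out = matmul(merged, x)
base_out = matmul(base_weight, x)
adapter_out = matmul(lora_b, matmul(lora_a, x))
combined_out = [[b[0] + (alpha / rank) * a[0]] for b, a in zip(base_out, adapter_out)]
print(merged_out, combined_out)
```

After merging, the adapter branches are no longer needed, which matches the plain `Linear` layers shown in the merged model summary.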

In [None]:
# Push To Hub
tokenizer.push_to_hub(merged_model_name, private = True)
model.push_to_hub(merged_model_name, private = True)

CommitInfo(commit_url='https://huggingface.co/robinsmits/Qwen1.5-7B-Dutch-Chat/commit/f37da8096913d21f42a6ae10686bd47fa218b21a', commit_message='Upload Qwen2ForCausalLM', commit_description='', oid='f37da8096913d21f42a6ae10686bd47fa218b21a', pr_url=None, pr_revision=None, pr_num=None)