## GPU configuration

In [1]:
import os

# Restrict visibility before CUDA is initialized: physical GPUs 1 and 2 become devices 0 and 1.
os.environ["CUDA_VISIBLE_DEVICES"] = "1,2"
os.environ["WANDB_NOTEBOOK_NAME"] = "mistral_dpo"

import torch

print(torch.cuda.device_count())
print(torch.cuda.get_device_name())

print(torch.cuda.mem_get_info())
print(torch.cuda.current_device())

2
Tesla V100-PCIE-32GB
(22977576960, 34079899648)
0
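
The tuple printed by `torch.cuda.mem_get_info()` is `(free_bytes, total_bytes)` for the current device. Because `CUDA_VISIBLE_DEVICES` is `"1,2"`, device index 0 here corresponds to physical GPU 1. A small sketch that converts the values from the output above to GiB (the helper name `to_gib` is illustrative, not part of PyTorch):

```python
# Values copied from the cell output above: (free_bytes, total_bytes).
free_bytes, total_bytes = 22977576960, 34079899648

def to_gib(n_bytes: int) -> float:
    """Convert a byte count to gibibytes (2**30 bytes)."""
    return n_bytes / 2**30

free_gib = to_gib(free_bytes)
total_gib = to_gib(total_bytes)
used_fraction = 1 - free_bytes / total_bytes

print(f"free: {free_gib:.1f} GiB, total: {total_gib:.1f} GiB, in use: {used_fraction:.0%}")
```

By these numbers, roughly a third of the card's memory was already occupied when the cell ran, which matters when choosing a batch size later.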


## Weights & Biases for monitoring

In [26]:
import wandb 
wandb.login()

wandb_project = "mistral-DPO"
if len(wandb_project) > 0:
    os.environ["WANDB_PROJECT"] = wandb_project

# Use the same project name as WANDB_PROJECT so the run and the Trainer logs share one project
run = wandb.init(project=wandb_project, job_type="training", anonymous="allow")

[34m[1mwandb[0m: Currently logged in as: [33mtrevahok[0m. Use [1m`wandb login --relogin`[0m to force relogin


In [2]:
import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments, BitsAndBytesConfig
from datasets import load_dataset
from peft import LoraConfig, PeftModel, get_peft_model, prepare_model_for_kbit_training
from trl import DPOTrainer
import bitsandbytes as bnb
import wandb



In [3]:
model_name = 'teknium/OpenHermes-2.5-Mistral-7B'
new_model_name = 'mistral_instruct_7b_dpo'
dataset_name = 'Intel/orca_dpo_pairs'

## Orca DPO pairs dataset

In [4]:
dataset = load_dataset(dataset_name)['train']
original_columns = dataset.column_names

In [5]:
from pprint import pprint
pprint(dataset[0])

{'chosen': '[\n'
           '  ["AFC Ajax (amateurs)", "has ground", "Sportpark De Toekomst"],\n'
           '  ["Ajax Youth Academy", "plays at", "Sportpark De Toekomst"]\n'
           ']',
 'question': 'You will be given a definition of a task first, then some input '
             'of the task.\n'
             'This task is about using the specified sentence and converting '
             'the sentence to Resource Description Framework (RDF) triplets of '
             'the form (subject, predicate object). The RDF triplets generated '
             'must be such that the triplets accurately capture the structure '
             'and semantics of the input sentence. The input is a sentence and '
             'the output is a list of triplets of the form [subject, '
             'predicate, object] that capture the relationships present in the '
             'sentence. When a sentence has more than 1 RDF triplet possible, '
             'the output must contain all of them.\n'
           

## Load Tokenizer and Prompt format 

In [6]:
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"


Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [7]:
def chatml_format(example):
    # Format system
    if len(example['system']) > 0:
        message = {"role": "system", "content": example['system']}
        system = tokenizer.apply_chat_template([message], tokenize=False)
    else:
        system = ""

    # Format instruction
    message = {"role": "user", "content": example['question']}
    prompt = tokenizer.apply_chat_template([message], tokenize=False, add_generation_prompt=True)

    # Format chosen answer
    chosen = example['chosen'] + "<|im_end|>\n"

    # Format rejected answer
    rejected = example['rejected'] + "<|im_end|>\n"

    return {
        "prompt": system + prompt,
        "chosen": chosen,
        "rejected": rejected,
    }
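
For reference, the ChatML layout that `apply_chat_template` produces for this tokenizer is visible in the output of the next cell: each turn is wrapped as `<|im_start|>{role}\n...<|im_end|>\n`, and `add_generation_prompt=True` appends an open assistant turn. The sketch below reproduces that layout in plain Python. `render_chatml` is a hypothetical helper for illustration, not a `transformers` function, and it assumes the template applies no extra whitespace rules.

```python
def render_chatml(messages, add_generation_prompt=False):
    """Render a list of {"role", "content"} dicts in ChatML layout (illustrative only)."""
    text = ""
    for m in messages:
        text += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        text += "<|im_start|>assistant\n"
    return text

prompt = render_chatml([{"role": "user", "content": "Hello"}], add_generation_prompt=True)
completion = "Hi there!" + "<|im_end|>\n"  # same suffix that chatml_format appends
print(prompt + completion)
```

Appending `<|im_end|>\n` to both `chosen` and `rejected` means each completion closes the assistant turn that the prompt left open.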

In [8]:
chatml_format(dataset[0])

{'prompt': "<|im_start|>user\nYou will be given a definition of a task first, then some input of the task.\nThis task is about using the specified sentence and converting the sentence to Resource Description Framework (RDF) triplets of the form (subject, predicate object). The RDF triplets generated must be such that the triplets accurately capture the structure and semantics of the input sentence. The input is a sentence and the output is a list of triplets of the form [subject, predicate, object] that capture the relationships present in the sentence. When a sentence has more than 1 RDF triplet possible, the output must contain all of them.\n\nAFC Ajax (amateurs)'s ground is Sportpark De Toekomst where Ajax Youth Academy also play.\nOutput:<|im_end|>\n<|im_start|>assistant\n",
 'chosen': '[\n  ["AFC Ajax (amateurs)", "has ground", "Sportpark De Toekomst"],\n  ["Ajax Youth Academy", "plays at", "Sportpark De Toekomst"]\n]<|im_end|>\n',
 'rejected': " Sure, I'd be happy to help! Here a

In [9]:
dataset = dataset.map(chatml_format, remove_columns=original_columns)

In [10]:
dataset[0]

{'chosen': '[\n  ["AFC Ajax (amateurs)", "has ground", "Sportpark De Toekomst"],\n  ["Ajax Youth Academy", "plays at", "Sportpark De Toekomst"]\n]<|im_end|>\n',
 'rejected': " Sure, I'd be happy to help! Here are the RDF triplets for the input sentence:\n\n[AFC Ajax (amateurs), hasGround, Sportpark De Toekomst]\n[Ajax Youth Academy, playsAt, Sportpark De Toekomst]\n\nExplanation:\n\n* AFC Ajax (amateurs) is the subject of the first triplet, and hasGround is the predicate that describes the relationship between AFC Ajax (amateurs) and Sportpark De Toekomst.\n* Ajax Youth Academy is the subject of the second triplet, and playsAt is the predicate that describes the relationship between Ajax Youth Academy and Sportpark De Toekomst.\n\nNote that there may be other possible RDF triplets that could be derived from the input sentence, but the above triplets capture the main relationships present in the sentence.<|im_end|>\n",
 'prompt': "<|im_start|>user\nYou will be given a definition of a ta

## LoRA and quantization config

In [11]:
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=['k_proj', 'gate_proj', 'v_proj', 'up_proj', 'q_proj', 'o_proj', 'down_proj']
)
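
To get a feel for how many parameters this LoRA configuration trains, the count can be estimated by hand. The sketch below assumes the published Mistral-7B shapes (hidden size 4096, MLP size 14336, key/value projection width 1024, 32 decoder layers); these dimensions are not stated in this notebook, so treat them as assumptions.

```python
r, lora_alpha = 16, 16
hidden, intermediate, kv_dim, n_layers = 4096, 14336, 1024, 32  # assumed Mistral-7B shapes

# (in_features, out_features) of each targeted linear layer
shapes = {
    "q_proj": (hidden, hidden),
    "k_proj": (hidden, kv_dim),
    "v_proj": (hidden, kv_dim),
    "o_proj": (hidden, hidden),
    "gate_proj": (hidden, intermediate),
    "up_proj": (hidden, intermediate),
    "down_proj": (intermediate, hidden),
}

# LoRA adds A (r x in) and B (out x r) per targeted layer: r * (in + out) parameters.
per_block = sum(r * (fan_in + fan_out) for fan_in, fan_out in shapes.values())
trainable = per_block * n_layers
scaling = lora_alpha / r  # factor applied to the low-rank update

print(f"trainable LoRA parameters: {trainable:,}  (scaling = {scaling})")
```

Under these assumptions the adapter is well under 1% of the 7B base parameters, and `lora_alpha == r` leaves the low-rank update unscaled.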

In [12]:
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    load_in_4bit=True
)
model.config.use_cache = False

# Reference model
ref_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    load_in_4bit=True
)

Loading checkpoint shards: 100%|██████████| 2/2 [02:54<00:00, 87.06s/it] 
Loading checkpoint shards: 100%|██████████| 2/2 [00:09<00:00,  4.97s/it]


In [28]:
training_args = TrainingArguments(
    per_device_train_batch_size=32,
    auto_find_batch_size=True,
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    max_steps=200,
    save_strategy="no",
    logging_steps=1,
    output_dir=new_model_name,
    optim="paged_adamw_32bit",
    warmup_steps=100,
    bf16=False,
    fp16=True,
    report_to="wandb",
)

# Create DPO trainer
dpo_trainer = DPOTrainer(
    model,
    ref_model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
    beta=0.1,
    max_prompt_length=1024,
    max_length=1536,
)
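
With `warmup_steps=100` and `max_steps=200`, half of the run is spent in linear warmup before the cosine decay begins. The following is a sketch of that schedule shape in plain Python, assuming the usual linear-warmup-then-half-cosine form; `lr_at` is an illustrative helper, not a `transformers` API.

```python
import math

PEAK_LR, WARMUP_STEPS, MAX_STEPS = 5e-5, 100, 200  # values from TrainingArguments above

def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step: linear warmup, then cosine decay to zero."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (MAX_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

for step in (0, 50, 100, 150, 200):
    print(step, f"{lr_at(step):.2e}")

# Sequences consumed per optimizer step on one device
# (per_device_train_batch_size * gradient_accumulation_steps).
per_device_effective_batch = 32 * 4
```

Note that `auto_find_batch_size=True` may lower the per-device batch size if the first attempt runs out of memory, so the effective batch computed here is an upper bound rather than a guarantee.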

Map: 100%|██████████| 12859/12859 [01:21<00:00, 157.79 examples/s]
Detected kernel version 3.10.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.


In [29]:
dpo_trainer.train()


Could not estimate the number of tokens of the input, floating-point operations will not be computed


Step,Training Loss
1,0.694
2,0.6935
3,0.6902
4,0.6825
5,0.6748
6,0.6557
7,0.6372
8,0.6079
9,0.6034
10,0.5801
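
The first logged loss (0.694) is close to ln 2 ≈ 0.693, which is what the DPO objective yields when the policy and the reference model assign identical log-probabilities, as expected before any LoRA update has been applied. A minimal sketch of the per-example sigmoid DPO loss in plain Python, using made-up log-probabilities; `dpo_loss` is an illustrative function, not the TRL implementation:

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Sigmoid DPO loss for one preference pair, given summed sequence log-probs."""
    # Implicit reward margin: how much more the policy prefers the chosen answer
    # than the reference model does.
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    # -log(sigmoid(beta * margin)) written as log(1 + exp(-beta * margin))
    return math.log1p(math.exp(-beta * margin))

# Policy identical to the reference: the margin is 0 and the loss is ln 2.
loss_at_init = dpo_loss(-40.0, -55.0, -40.0, -55.0)
# Policy has shifted probability toward the chosen answer: the loss drops below ln 2.
loss_after_shift = dpo_loss(-35.0, -60.0, -40.0, -55.0)

print(loss_at_init, loss_after_shift)
```

The `beta=0.1` default mirrors the value passed to `DPOTrainer`; a larger `beta` penalizes drift from the reference model more strongly.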


Step,Training Loss
1,0.0363
2,0.4014
3,0.0528
4,0.0008
5,0.0806
6,0.0628
7,0.0142
8,0.0306
9,0.0021
10,0.0183


KeyboardInterrupt: 

## Save Artifacts

In [30]:
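# Save artifacts: the LoRA adapter goes to "final_<name>",
# the tokenizer to a separate "final_checkpoint_<name>" directory
dpo_trainer.model.save_pretrained("final_" + new_model_name)
tokenizer.save_pretrained("final_checkpoint_" + new_model_name)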


('final_checkpoint_mistral_instruct_7b_dpo/tokenizer_config.json',
 'final_checkpoint_mistral_instruct_7b_dpo/special_tokens_map.json',
 'final_checkpoint_mistral_instruct_7b_dpo/tokenizer.model',
 'final_checkpoint_mistral_instruct_7b_dpo/added_tokens.json',
 'final_checkpoint_mistral_instruct_7b_dpo/tokenizer.json')

## Clean up memory

In [None]:
# Flush memory
import gc 
del dpo_trainer, model, ref_model
gc.collect()
torch.cuda.empty_cache()


## Load models & Adapters for testing

In [11]:

# Reload the base model in FP16 (instead of 4-bit) so the adapter can be merged into full-precision weights
base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    return_dict=True,
    torch_dtype=torch.float16,
)
tokenizer = AutoTokenizer.from_pretrained(model_name)


Loading checkpoint shards: 100%|██████████| 2/2 [00:04<00:00,  2.35s/it]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [12]:
# Merge base model with the adapter
model = PeftModel.from_pretrained(base_model, "final_" + new_model_name)
model = model.merge_and_unload()
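
Conceptually, `merge_and_unload()` folds each low-rank update into its base weight, W' = W + (lora_alpha / r) · B·A, so inference no longer needs separate adapter modules. A toy sketch with plain Python lists and made-up 2×2 numbers (the helpers `matmul` and `merge` are illustrative, not PEFT functions):

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def merge(W, A, B, lora_alpha, r):
    """Return W + (lora_alpha / r) * (B @ A)."""
    scale = lora_alpha / r
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, delta)]

W = [[1.0, 0.0], [0.0, 1.0]]   # base weight (out x in)
A = [[0.5, -0.5]]              # LoRA A (r x in), r = 1
B = [[2.0], [4.0]]             # LoRA B (out x r)

merged = merge(W, A, B, lora_alpha=1, r=1)
print(merged)
```

After merging, the result has the same shape as the base weight, which is why the merged model can be used like an ordinary checkpoint.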


## Test your model 

In [13]:

# Format prompt
message = [
    {"role": "system", "content": "You are a helpful assistant chatbot."},
    {"role": "user", "content": "What is a Large Language Model?"}
]
tokenizer = AutoTokenizer.from_pretrained("final_checkpoint_mistral_instruct_7b_dpo")
prompt = tokenizer.apply_chat_template(message, add_generation_prompt=True, tokenize=False)

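# Create pipeline
# Passing the adapter directory makes the pipeline reload the base model plus adapter from disk;
# the merged `model` from the previous cell could be passed instead.
pipeline = transformers.pipeline(
    "text-generation",
    model="final_mistral_instruct_7b_dpo",
    tokenizer=tokenizer
)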

# Generate text
sequences = pipeline(
    prompt,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    num_return_sequences=1,
    max_length=200,  # total length: prompt tokens plus generated tokens
)
print(sequences[0]['generated_text'])

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Loading checkpoint shards: 100%|██████████| 2/2 [00:05<00:00,  2.81s/it]
Setting `pad_token_id` to `eos_token_id`:32000 for open-end generation.


<|im_start|>system
You are a helpful assistant chatbot.<|im_end|>
<|im_start|>user
What is a Large Language Model?<|im_end|>
<|im_start|>assistant
A large language model is a type of artificial intelligence system designed to process and generate human language. These models are typically based on deep learning techniques, such as neural networks, and are trained on vast amounts of text data. They can be used for various natural language processing tasks, including language translation, text generation, and text classification. Large language models are often capable of understanding and generating complex sentences and can learn to generate human-like text. Some popular examples of large language models include GPT-3, BERT, and T5.
