# QUICK FINE TUNING PROJECT 

</center>
[Qwen2.5-0.5B](https://huggingface.co/Qwen/Qwen2.5-0.5B)
<hr style="border:10px solid gray"> </hr>
<p style="text-align: justify;">

<br><br>
This **project** is a quick experiment in fine-tuning a pretrained model such as Qwen.
For now I provide a Jupyter notebook version; a refactored version with cleaner, library-based code will be released later. The plan is to:
- Finetune [Qwen2.5-0.5B](https://huggingface.co/Qwen/Qwen2.5-0.5B) on a question/answer dataset.

- To reduce the required GPU VRAM for the finetuning, we will use [LoRA](https://www.anyscale.com/blog/fine-tuning-llms-lora-or-full-parameter-an-in-depth-analysis-with-llama-2) and [quantization](https://huggingface.co/blog/4bit-transformers-bitsandbytes) techniques.

- Compare the results before and after instruction tuning.

- Finetune the model again on a preference dataset using [DPO](https://huggingface.co/docs/trl/main/dpo_trainer#dpo-trainer) (direct preference optimization).
 <br>


### <b>I) Finetuning Qwen2.5-0.5B using HuggingFace's Transformers</b>
In this section, we will finetune [Qwen2.5-0.5B](https://huggingface.co/Qwen/Qwen2.5-0.5B) on a question/answer dataset.

To reduce the required GPU VRAM for the finetuning, we will use [LoRA](https://www.anyscale.com/blog/fine-tuning-llms-lora-or-full-parameter-an-in-depth-analysis-with-llama-2) and [quantization](https://huggingface.co/blog/4bit-transformers-bitsandbytes) techniques.
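
To make the LoRA idea concrete, here is a minimal pure-Python sketch (not the PEFT implementation) of a single linear layer with a low-rank update: the frozen weight `W` is left untouched and only the two small matrices `A` and `B` would be trained. The toy sizes, the helper names and the `alpha / r` scaling convention are assumptions chosen for illustration.

```python
# Toy LoRA layer: y = W x + (alpha / r) * B (A x)
# W is frozen; only A (r x d_in) and B (d_out x r) would receive gradients.

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def lora_forward(W, A, B, x, alpha, r):
    """Frozen base projection plus the scaled low-rank correction."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]

d_in, d_out, r, alpha = 8, 8, 2, 4
W = [[0.1 * (i + j) for j in range(d_in)] for i in range(d_out)]  # frozen weights (toy values)
A = [[0.5] * d_in for _ in range(r)]    # hypothetical non-zero initial values
B = [[0.0] * r for _ in range(d_out)]   # zero init: the update starts as a no-op
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]

y_base = matvec(W, x)
y_lora = lora_forward(W, A, B, x, alpha, r)

full_params = d_in * d_out        # parameters of the frozen layer
lora_params = r * (d_in + d_out)  # parameters that would actually be trained
print(y_base == y_lora, full_params, lora_params)
```

Because `B` starts at zero, the adapted layer initially reproduces the base layer exactly, and the number of trained values grows with `r * (d_in + d_out)` rather than `d_in * d_out`.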

#### <b>Preparing the environment and installing libraries:</b>

In [None]:
!nvidia-smi

Sun Oct 20 16:15:18 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  Tesla T4                       Off | 00000000:00:04.0 Off |                    0 |
| N/A   47C    P0              26W /  70W |      0MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                    

In [None]:
!pip install -qqq bitsandbytes torch transformers peft accelerate datasets loralib einops trl

In [None]:
!pip install --upgrade transformers



In [None]:
import json
import os
from pprint import pprint

import bitsandbytes as bnb
import pandas as pd
import torch
import torch.nn as nn
import transformers
from datasets import load_dataset
from trl import DPOConfig, DPOTrainer

from peft import (
    LoraConfig,
    PeftConfig,
    PeftModel,
    TaskType,  # needed for task_type=TaskType.CAUSAL_LM in the LoRA config
    get_peft_model,
    prepare_model_for_kbit_training,
)
from transformers import (
    AutoConfig,
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
)

### <b>Loading the model and the tokenizer:</b>

In this section, we will load the Qwen model, using the bitsandbytes library for 8-bit quantization.
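
As a rough intuition for what 8-bit quantization does to the weights, the sketch below applies plain absmax quantization to a small vector in pure Python. This is a simplified illustration, not the actual LLM.int8() scheme implemented in bitsandbytes (which, among other things, treats outlier features separately); the function names and the sample values are hypothetical.

```python
def quantize_absmax(values):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [round(v / scale) for v in values]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate floats from the stored integers and the scale."""
    return [q * scale for q in quantized]

weights = [0.12, -0.5, 0.33, 0.0, 0.49]  # made-up weights
q, scale = quantize_absmax(weights)
restored = dequantize(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(q, max_error)
```

Each weight is stored as one small integer plus a shared scale, at the cost of a rounding error of at most half a quantization step.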

In [None]:
MODEL_NAME = "Qwen/Qwen2.5-0.5B"
# MODEL_NAME = "unsloth/Llama-3.2-1B" # Try Llama if you want

bnb_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_threshold=6.0,
    llm_int8_has_fp16_weight=False,
)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    device_map="auto",
    trust_remote_code=True,
    quantization_config=bnb_config,
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
tokenizer.pad_token = tokenizer.eos_token



In [None]:
def print_trainable_parameters(model):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        # only parameters that receive gradients count as trainable
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}"
    )

#### <b>Configuring LoRA:</b>

In [None]:
config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    # target_modules is left unset, so PEFT falls back to its default modules for this architecture
    bias="none",
    task_type=TaskType.CAUSAL_LM,
)

model = get_peft_model(model, config)
print_trainable_parameters(model)

trainable params: 1081344 || all params: 316200832 || trainable%: 0.34198012483408013
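
The trainable-parameter count reported above can be reproduced by hand. The sketch below assumes that PEFT's default LoRA targets for this architecture are the attention `q_proj` and `v_proj` layers, and that Qwen2.5-0.5B has 24 decoder layers, a hidden size of 896 and a value projection of width 128. These figures come from the model configuration, not from this notebook, so treat them as assumptions.

```python
r = 16              # LoRA rank, as in the LoraConfig above
num_layers = 24     # assumed number of decoder layers
hidden_size = 896   # assumed hidden size (q_proj maps 896 -> 896)
kv_width = 128      # assumed v_proj output width (v_proj maps 896 -> 128)

def lora_params(d_in, d_out, rank):
    """A has shape (rank x d_in) and B has shape (d_out x rank)."""
    return rank * (d_in + d_out)

per_layer = lora_params(hidden_size, hidden_size, r) + lora_params(hidden_size, kv_width, r)
total = per_layer * num_layers
print(total)
```

Under these assumptions the total is 1,081,344, which matches the number printed by `print_trainable_parameters`.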


### <b>Test the model before finetuning:</b>

In [None]:
prompt = "<human>: What equipment do I need for rock climbing?  \n <assistant>: "  
print(prompt)



generation_config = model.generation_config
generation_config.max_new_tokens = 200
generation_config.do_sample = True  # temperature and top_p only take effect when sampling
generation_config.temperature = 0.7
generation_config.top_p = 0.7
generation_config.num_return_sequences = 1
generation_config.pad_token_id = tokenizer.eos_token_id
generation_config.eos_token_id = tokenizer.eos_token_id
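
To illustrate what `temperature` and `top_p` control, here is a small pure-Python sketch of temperature scaling followed by nucleus (top-p) filtering on a toy distribution. It is a simplified illustration with hypothetical function names and made-up logits, not the code path used by `model.generate`.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Lower temperatures sharpen the distribution, higher ones flatten it."""
    scaled = [value / temperature for value in logits]
    highest = max(scaled)
    exps = [math.exp(s - highest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose mass reaches top_p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    # renormalize so that the surviving tokens form a distribution again
    return {i: probs[i] / mass for i in kept}

logits = [2.0, 1.0, 0.5, -1.0]  # toy scores for a 4-token vocabulary
probs = softmax_with_temperature(logits, temperature=0.7)
nucleus = top_p_filter(probs, top_p=0.7)
print(probs, nucleus)
```

With a temperature below 1 the most likely token gains probability mass, and the top-p step then discards the low-probability tail before a token is drawn.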

In [None]:
%%time
device = "cuda:0"

encoding = tokenizer(prompt, return_tensors="pt").to(device)
with torch.inference_mode():
    outputs = model.generate(
        input_ids=encoding.input_ids,
        attention_mask=encoding.attention_mask,
        generation_config=generation_config,
    )
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

### <b>Loading the question/answer dataset from HuggingFace:</b>

In [None]:
data = load_dataset("HuggingFaceH4/helpful-instructions")
pd.DataFrame(data["train"])

| | instruction | demonstration | meta |
|---|---|---|---|
| 0 | Hi, I want to learn to play horseshoes. Can yo... | I can, but maybe I should begin by telling you... | {'source': 'helpful-anthropic-raw'} |
| 1 | How do I teach kids to meditate? | Great question! That’s a really useful skill t... | {'source': 'helpful-anthropic-raw'} |
| 2 | Can you tell me the steps for getting a harbor... | Sure. I believe you’ll need a copy of the mari... | {'source': 'helpful-anthropic-raw'} |
| 3 | How can I store food if I don't have a pantry? | You could store the food in a refrigerator, th... | {'source': 'helpful-anthropic-raw'} |
| 4 | what are some good novels for a 9 year old? | That depends on the 9 year old, but if they li... | {'source': 'helpful-anthropic-raw'} |
| ... | ... | ... | ... |
| 147701 | Given the following sentence, classify it into... | Fact | {'source': 'helpful-self-instruct-raw'} |
| 147702 | A person wants to write a book. he/she writes ... | Chapter 1 - The History of China\nChapter 2 - ... | {'source': 'helpful-self-instruct-raw'} |
| 147703 | Tell me how you would make a popular app game. | I would make a game that is similar to 2048. T... | {'source': 'helpful-self-instruct-raw'} |
| 147704 | Describe your dream house to me.\n\nOutput: | My dream house is a two-story building with a ... | {'source': 'helpful-self-instruct-raw'} |


### <b>Preparing the finetuning data:</b>

In [None]:
def generate_prompt(data_point):
    question = data_point["instruction"]
    answer = data_point["demonstration"]
    prompt = f"<human>: {question}  \n<assistant>: {answer}"
    return prompt 

def generate_and_tokenize_prompt(data_point):
    full_prompt = generate_prompt(data_point)
    tokenized_full_prompt = tokenizer(full_prompt, padding=True, truncation=True)
    return tokenized_full_prompt

data = data["train"].shuffle(seed=42).map(generate_and_tokenize_prompt)

### <b>Finetuning:</b>

In [None]:
OUTPUT_DIR = "experiments"

training_args = transformers.TrainingArguments(
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    num_train_epochs=1,
    learning_rate=2e-4,
    fp16=True,
    save_total_limit=3,
    logging_steps=1,
    output_dir=OUTPUT_DIR,
    max_steps=200,   # try more steps 
    optim="paged_adamw_8bit",
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    report_to="tensorboard",
)

trainer = transformers.Trainer(
    model=model,
    train_dataset=data,
    args=training_args,
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
)

model.config.use_cache = False
trainer.train()

max_steps is given, it will override any value given in num_train_epochs


| Step | Training Loss |
|---|---|
| 1 | 3.2227 |
| 2 | 2.1505 |
| 3 | 2.7767 |
| 4 | 2.5064 |
| 5 | 2.8235 |
| 6 | 2.632 |
| 7 | 3.1241 |
| 8 | 2.1958 |
| 9 | 2.9324 |
| 10 | 2.6674 |


TrainOutput(global_step=200, training_loss=2.0172722285985945, metrics={'train_runtime': 189.8006, 'train_samples_per_second': 4.215, 'train_steps_per_second': 1.054, 'total_flos': 156597608090880.0, 'train_loss': 2.0172722285985945, 'epoch': 0.005416164543078819})
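
For reference, the sketch below imitates in pure Python what a causal-LM collator with `mlm=False` does to a batch: sequences are padded to a common length, and the labels are a copy of the input ids with padding positions set to -100 so that the loss ignores them. It is a simplified stand-in with hypothetical names and toy token ids, not the `transformers` implementation.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss

def collate_causal_lm(sequences, pad_token_id):
    """Right-pad token id lists and build labels that ignore the padding."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask, labels = [], [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(seq + [pad_token_id] * n_pad)
        attention_mask.append([1] * len(seq) + [0] * n_pad)
        labels.append(seq + [IGNORE_INDEX] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask, "labels": labels}

batch = collate_causal_lm([[11, 12, 13], [21, 22]], pad_token_id=0)
print(batch["labels"])
```

One caveat, as far as I know: the real collator masks labels by token id rather than by position, so with `pad_token = eos_token` genuine end-of-sequence tokens in the data are excluded from the loss as well.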

In [None]:
# %load_ext tensorboard
# %tensorboard --logdir experiments/runs --port 6008

## <b>Test the model after the finetuning:</b>

In [None]:
%%time
device = "cuda:0"

encoding = tokenizer(prompt, return_tensors="pt").to(device)
with torch.inference_mode():
    outputs = model.generate(
        input_ids=encoding.input_ids,
        attention_mask=encoding.attention_mask,
        generation_config=generation_config,
    )
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

In [None]:
def generate_response(question: str) -> str:
    prompt = f"<human>: {question}  \n<assistant>: "  # same template as the training prompts
    encoding = tokenizer(prompt, return_tensors="pt").to(device)
    with torch.inference_mode():
        outputs = model.generate(
            input_ids=encoding.input_ids,
            attention_mask=encoding.attention_mask,
            generation_config=generation_config,
        )
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)

    assistant_start = "<assistant>:"
    response_start = response.find(assistant_start)
    return response[response_start + len(assistant_start) :].strip()
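
One detail of the helper above: `str.find` returns -1 when the marker is missing, in which case the slice silently drops the first characters of the text. A slightly more defensive variant of the extraction step, shown here as a standalone sketch with a hypothetical function name, could use `str.partition` instead:

```python
def extract_assistant_reply(decoded: str, marker: str = "<assistant>:") -> str:
    """Return the text after the first marker, or the whole text if it is absent."""
    before, found, after = decoded.partition(marker)
    return after.strip() if found else decoded.strip()

print(extract_assistant_reply("<human>: Hi  \n<assistant>: Hello there!"))
print(extract_assistant_reply("no marker in this text"))
```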

In [None]:
prompt = "What program can I use to edit video clips I took with my phone?"
print('-', prompt,'\n')
print(generate_response(prompt))

prompt = "Do you know the reasons as to why people love coffee so much?"
print('\n\n\n-', prompt, '\n')
print(generate_response(prompt))



### II) DPO
In this part, we will use the instruction-tuned LLM as the starting point for direct preference optimization (DPO); see the paper: https://arxiv.org/abs/2305.18290

DPO tunes the model on preference data, where each example normally consists of a prompt, a preferred answer and a rejected answer.

The core advantage of DPO is its ability to simultaneously bypass the explicit reward modeling step while avoiding the complexities of reinforcement learning optimization.
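
To make the objective concrete, the sketch below evaluates the DPO loss from the paper for a single preference pair in pure Python. The log-probabilities are made-up numbers and the function name is hypothetical; in practice these quantities are computed by `DPOTrainer` from the policy and reference models.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """-log sigmoid(beta * (chosen log-ratio - rejected log-ratio))."""
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) is the same as log(1 + exp(-margin))
    return math.log1p(math.exp(-margin))

# Policy identical to the reference: the margin is 0 and the loss equals log(2).
loss_start = dpo_loss(-12.0, -15.0, -12.0, -15.0)

# Policy favors the chosen answer more than the reference does: the loss drops.
loss_better = dpo_loss(-10.0, -18.0, -12.0, -15.0)

print(loss_start, loss_better)
```

No reward model appears anywhere: the loss depends only on log-probabilities of the two answers under the policy and the reference model, scaled by `beta`.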

#### Test the model before DPO:


In [None]:
prompt_2 = "<system> You are a helpful assistant \n<human>: Can you taste this dish and tell me if it needs more spices?  \n<assistant>: "
print(prompt_2)

In [None]:
%%time
device = "cuda:0"

encoding = tokenizer(prompt_2, return_tensors="pt").to(device)
with torch.inference_mode():
    outputs = model.generate(
        input_ids=encoding.input_ids,
        attention_mask=encoding.attention_mask,
        generation_config=generation_config,
    )
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

## Loading the preference data from Huggingface:

In [None]:
data_dpo = load_dataset("CultriX/llama70B-dpo-dataset")
pd.DataFrame(data_dpo["train"])

| | system | question | chosen | rejected |
|---|---|---|---|---|
| 0 | You are a helpful assistant | Can you taste this dish and tell me if it need... | I'm not capable of physically tasting the dish... | I can taste the dish remotely and I think it n... |
| 1 | You are a helpful assistant | Can you smell this perfume and tell me if it's... | As a digital AI assistant, I don't have a phys... | I can smell the perfume, and it smells great. ... |
| 2 | You are a helpful assistant | Please turn off the lights in my room. | I'm a large language model, I don't have the c... | I can turn off the lights in your room remotel... |
| 3 | You are a helpful assistant | What is the primary reason why many approximat... | The primary reason why many approximate attent... | Approximate attention methods haven't gained w... |
| 4 | You are a helpful assistant | What are the properties of melanin that contri... | Melanin's photoprotective role can be attribut... | Melanin's photoprotective role is due to its a... |
| ... | ... | ... | ... | ... |
| 2174 | You are a helpful assistant | What is the sound barrier? | The sound barrier, also known as the sonic bar... | The sound barrier is a physical wall that prev... |
| 2175 | You are a helpful assistant | Is spinach a great source of dietary iron? | Spinach is a good source of dietary iron, but ... | Spinach is a poor source of dietary iron, and ... |
| 2176 | You are a helpful assistant | What challenges arise in training large langua... | Training large language models (LLMs) poses se... | The main challenge in training LLMs is the lac... |
| 2177 | You are a helpful assistant | Does the theory of evolution explain the origi... | The theory of evolution explains how life on E... | The theory of evolution fully explains the ori... |


## Preparing the data:

Similar to instruction tuning, we should first construct our prompt, which should follow the DPO format, see: https://huggingface.co/docs/trl/main/dataset_formats#preference

In [None]:
def preprocess_data_dpo(data_point):
    # DPOTrainer tokenizes the examples itself, so we only add the "prompt" text column;
    # the "chosen" and "rejected" columns are already present in the dataset.
    prompt = f"<system> You are a helpful assistant \n<human>: {data_point['question']} \n<assistant>:"
    return {"prompt": prompt}

# Apply the function to every example of the train split
data_dpo = data_dpo['train'].shuffle(seed=42).map(preprocess_data_dpo)

In [None]:
print(data_dpo)

Dataset({
    features: ['system', 'question', 'chosen', 'rejected', 'prompt'],
    num_rows: 2179
})


In [None]:
data_dpo[0]

## Finetuning

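Question: what is `beta` in `dpo_args`? In the DPO loss, `beta` controls how strongly the tuned model is kept close to the reference model: a higher value means less deviation from the reference.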

In [None]:
OUTPUT_DIR = "experiments_dpo"
tokenized_data_dpo = data_dpo.shuffle(seed=42).map(preprocess_data_dpo, batched=False)

dpo_args = {
    "beta": 0.1,
}

training_args = DPOConfig(
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    num_train_epochs=1,
    learning_rate=2e-4,
    fp16=True,
    save_total_limit=3,
    logging_steps=1,
    output_dir=OUTPUT_DIR,
    max_steps=200,
    optim="paged_adamw_8bit",
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    report_to="tensorboard",
    beta=dpo_args["beta"],  # beta is set through DPOConfig
)

# No data_collator is passed: it was never defined in this notebook,
# and DPOTrainer falls back to its own collator for preference pairs.
trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_data_dpo,
    tokenizer=tokenizer,  # recent TRL releases name this argument processing_class
)

model.config.use_cache = False
trainer.train()



max_steps is given, it will override any value given in num_train_epochs
Could not estimate the number of tokens of the input, floating-point operations will not be computed


| Step | Training Loss |
|---|---|
| 1 | 1.0971 |
| 2 | 0.6803 |
| 3 | 1.2872 |
| 4 | 1.2259 |
| 5 | 1.0344 |
| 6 | 1.2127 |
| 7 | 1.1776 |
| 8 | 1.0088 |
| 9 | 0.943 |
| 10 | 0.9533 |


TrainOutput(global_step=200, training_loss=0.18902093339980638, metrics={'train_runtime': 303.7971, 'train_samples_per_second': 2.633, 'train_steps_per_second': 0.658, 'total_flos': 0.0, 'train_loss': 0.18902093339980638, 'epoch': 0.36714089031665903})
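
The `epoch` value in the output above follows directly from the training arguments: each optimizer step consumes `per_device_train_batch_size * gradient_accumulation_steps` examples. A small sketch of that arithmetic, assuming a single GPU and using the 2179 rows reported for the preference dataset (the function name is hypothetical):

```python
def epochs_covered(max_steps, batch_size, grad_accum_steps, dataset_size, num_devices=1):
    """Fraction of the dataset seen after max_steps optimizer steps."""
    samples_seen = max_steps * batch_size * grad_accum_steps * num_devices
    return samples_seen / dataset_size

dpo_epochs = epochs_covered(max_steps=200, batch_size=1, grad_accum_steps=4, dataset_size=2179)
print(f"{dpo_epochs:.4f}")
```

So 200 steps cover roughly 37% of the preference data; about 545 steps would correspond to one full pass with these settings.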

## Test the model after DPO:

In [None]:
%%time
device = "cuda:0"

encoding = tokenizer(prompt_2, return_tensors="pt").to(device)
with torch.inference_mode():
    outputs = model.generate(
        input_ids=encoding.input_ids,
        attention_mask=encoding.attention_mask,
        generation_config=generation_config,
    )
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

In [None]:
def generate_response(question: str) -> str:
    prompt = f"<system> You are a helpful assistant \n<human>: {question} \n<assistant>:"
    encoding = tokenizer(prompt, return_tensors="pt").to(device)
    with torch.inference_mode():
        outputs = model.generate(
            input_ids=encoding.input_ids,
            attention_mask=encoding.attention_mask,
            generation_config=generation_config,
        )
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)

    assistant_start = "<assistant>:"
    response_start = response.find(assistant_start)
    return response[response_start + len(assistant_start) :].strip()

In [None]:
prompt = "Do people dream in color or black and white?"
print('-', prompt,'\n')
print(generate_response(prompt))

prompt = "Explain the concept of economic policies in simple terms"
print('\n\n\n-', prompt, '\n')
print(generate_response(prompt))

prompt = "Explain the effects of globalization on the environment."
print('\n\n\n-', prompt, '\n')
print(generate_response(prompt))

- Do people dream in color or black and white? 

As per the current understanding of dream interpretation, there is a range of possible dream experiences that can involve visual, auditory, or olfactory, gustatory, or tactile stimuli. However, the presence of color or black and white as a dream-related theme is less common and may be associated with various psychological, cultural, or symbolic meanings.



- Explain the concept of economic policies in simple terms 

Economic policies are strategic decisions made by governments or other actors to influence the behavior of the economy, such as monetary, fiscal, and regulatory policies. They aim to achieve certain goals, such as promoting growth, reducing inequality, or stabilizing the economy, among others. These policies can be implemented through various mechanisms, such as fiscal stimulus, tax reforms, or regulatory reforms, among others.



- Explain the effects of globalization on the environment. 

Globalization can bring various po

In [None]:
prompt = " What equipment do I need for rock climbing? "
print('-', prompt,'\n')
print(generate_response(prompt))