### Fine tuning the llama-2 7b chat model with LORA technique👇

Hello 👋

In this Notebook I will walk you through the steps required to fine-tune a Llama-2 7b model on your own Dataset and will learn about all different tunings that can be done !!  

Lets Start by importing few libraries :

1) accelerate 
2) peft 
3) transformers 
4) bitsandbytes 
5) trl 
6) dataset
7) torch

Add -q for quiet mode 


In [15]:
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
)
from peft import LoraConfig, PeftModel , prepare_model_for_kbit_training , get_peft_model
from trl import SFTTrainer


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

#### Developing a dataset and Exploring Chat Template / Prompt Template

Every model has a chat template common one is the ChatML but in the model description you can find it ( Basically its the way the the dataset was transformed to train the model on the original dataset)

Like here we are using the Meta's Llama 2 7b model (meta-llama/Llama-2-7b-chat-hf) this is a private model and needs a auth from the Meta 

So in this tutorial I am using another repo i.e. NousResearch/Llama-2-7b-chat-hf 

For chat template usually in the documentation of the hugging face repo you can find what template that model is using its important to do this before formatting your own dataset , here in our case the llama-2 7b model uses the following chat template :

![Llama-2-template](Images/llama-2-m1.png)

For more information look this link :https://github.com/facebookresearch/llama/blob/main/llama/generation.py#L44


No worries if you didnt went throught the above links !! I have done that for you 😉

\<s\>[INST] \<\<SYS\>\> System Prompt \<\<\/SYS\>\> User Prompt \[\/INST\] Answer \</s\>


You need your data to be in .jsonl format with input and output described as follows :

![custom-dataset](Images/custom_dataset.png)

(this is the dummy dataset , as I didnt want to expose the real dataset)

In [16]:
def generate_dataset_prompt(example):
    eoi_token = "<</SYS>>"
    input_prompt = example.get('Input', '') + '[/INST]'  # Default to an empty string if 'Input' is missing
    output_prompt = example.get('Output', '')
    eos_token = "</s>"
    formatted_string = "some formatted prompt based on data"
    return f"[INST] {input_prompt} {eoi_token} {output_prompt} {eos_token}{formatted_string}"


In [17]:
# load the dataset defined above ( this takes high RAM )
train_ds = load_dataset('json', data_files=r'D:\dummy\finetuning\data\finetuning_dataset.json', split='train')
# Since the dataset only has a 'train' split, we can use a fraction of it for testing.
# Here, we use 80% for training and 20% for testing.
test_ds = train_ds.train_test_split(test_size=0.2)['test']

#### Loading the base model from 🤗


BitsAndBytesConfig : https://huggingface.co/docs/transformers/main_classes/quantization#transformers.BitsAndBytesConfig



In [18]:
model_id = 'NousResearch/Llama-2-7b-chat-hf'
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16, #if your gpu supports it 
    bnb_4bit_quant_type = "nf4",
   #this quantises the quantised weights
)

base_model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map="auto")



ValueError: Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit the quantized model. If you want to dispatch the model on the CPU or the disk while keeping these modules in 32-bit, you need to set `llm_int8_enable_fp32_cpu_offload=True` and pass a custom `device_map` to `from_pretrained`. Check https://huggingface.co/docs/transformers/main/en/main_classes/quantization#offload-between-cpu-and-gpu for more details. 

In [6]:
# Training_tokenizer (https://huggingface.co/docs/transformers/v4.37.2/en/model_doc/auto#transformers.AutoTokenizer.from_pretrained)

# https://huggingface.co/docs/transformers/main_classes/tokenizer#transformers.PreTrainedTokenizer


tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    truncation_side = "right",
    padding_side="right",
    add_eos_token=True,
    add_bos_token=True,
)
tokenizer.pad_token = tokenizer.eos_token

Find appropriate length for your dataset for me 350-400 works for me !!

This is your dataset length that you want to send into the model ... aka context window 

In [7]:
max_seq_length = 400

#### Setting up lora for the quantisation 

Find the target modules you need to quantise in order to make the Lora finetuning work !
The target modules can be found out by printing the model arch and knowing which layers you want to quantise  

In [8]:
base_model.gradient_checkpointing_enable() #this to checkpoint grads 
model = prepare_model_for_kbit_training(base_model) #quantising the model (due to compute limits)

In [9]:
# https://github.com/TimDettmers/bitsandbytes/blob/main/bitsandbytes/nn/modules.py#L271
print(model)

LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(32000, 4096, padding_idx=0)
    (layers): ModuleList(
      (0-31): 32 x LlamaDecoderLayer(
        (self_attn): LlamaSdpaAttention(
          (q_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (v_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (o_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear4bit(in_features=4096, out_features=11008, bias=False)
          (up_proj): Linear4bit(in_features=4096, out_features=11008, bias=False)
          (down_proj): Linear4bit(in_features=11008, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm((4096,), eps=1e-05)
        (post_attention_layernorm): LlamaRMSNor

In [10]:
def printParameters(model):
    trainable_param = 0
    total_params = 0
    for name , param in model.named_parameters():
        total_params += param.numel()
        if param.requires_grad:
            trainable_param += param.numel()
            
            
    print(f"Total params : {total_params} , trainable_params : {trainable_param} , trainable % : {100 * trainable_param / total_params} ")

Find the target modules you want to apply Lora technique !! \
\
Here in our case we will be applying them on : \
\
![lora-paper](./Images/lora-1.png)
\
\
q_proj , k_proj , v_proj , o_proj , gate_proj , up_proj , down_proj , lm_head

In [11]:
#LoraConfig : https://github.com/huggingface/peft/blob/main/src/peft/tuners/lora/config.py#L44

# Its better to pass the values to the target_modules either "all-linear" or specific modules you need to quantise !!

# You can change these parameters depending on your use case
peft_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.1, 
    bias="none",
    target_modules=[ 
    "q_proj",
    "k_proj",
    "v_proj",
    "o_proj",
    "gate_proj",
    "up_proj",
    "down_proj",
    "lm_head",
    ],
    task_type="CAUSAL_LM"
)

model = get_peft_model(model , peft_config)
printParameters(model)

Total params : 3662630912 , trainable_params : 162217984 , trainable % : 4.429001662944519 


In [12]:
if torch.cuda.device_count() > 1:
    model.is_parallelizable = True
    model.model_parallel = True

In [13]:
# https://github.com/huggingface/transformers/blob/v4.37.2/src/transformers/training_args.py#L161

# max_steps and num_train_epochs : 
# 1 epoch = [ training_examples / (no_of_gpu * batch_size_per_device) ] steps


args = TrainingArguments(
  output_dir = "LLama-2 7b",
  # num_train_epochs=1000,
  max_steps = 1000, # comment out this line if you want to train in epochs
  per_device_train_batch_size = 4,
  warmup_steps = 0,
  gradient_accumulation_steps = 1,
  logging_steps=10,
  logging_strategy= "steps",
  save_strategy="steps",
  save_steps = 10,
  evaluation_strategy="steps",
  eval_steps=10, # comment out this line if you want to evaluate at the end of each epoch
  learning_rate=2.5e-5,
  bf16=True, #if your gpus supports this 
  logging_nan_inf_filter = False, #this helps to see if your loss values is coming out to be nan or inf and if that is the case then you may have ran into some problem 
  # lr_scheduler_type='constant',
  save_safetensors = True,
)    

trainer = SFTTrainer(
    model=model,
    peft_config=peft_config,
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    packing=False,  # Disable packing to test dataset loading
    formatting_func=generate_dataset_prompt,
    args=args,
    train_dataset=train_ds,
    eval_dataset=test_ds
)
   

model.config.use_cache = False
trainer.train()


Deprecated positional argument(s) used in SFTTrainer, please use the SFTConfig to set these arguments instead.


Map:   0%|          | 0/127 [00:00<?, ? examples/s]

ValueError: The `formatting_func` should return a list of processed strings since it can lead to silent bugs.

#### Getting output from the trained model

In [None]:
#load the trained model and generate some outputs from it 

ft_model = PeftModel.from_pretrained(base_model , 'Checkpoint/base-checkpoint-10') #replace with the actual checkpoint name

In [None]:
eval_prompt = "<s>[INST] <<SYS>> You are a coding model and your goal is to correctly tell sturtural or buliding  inspection details to the user based on the prompt they have entered and you get rewarded for correct output <</SYS>> Tell me the waethering defect in concrete [/INST]"
model_input = tokenizer(eval_prompt, return_tensors="pt").to("cuda")

ft_model.eval()
with torch.no_grad():
    print(tokenizer.decode(ft_model.generate(**model_input, max_new_tokens=150, repetition_penalty=1.15)[0], skip_special_tokens=True))


### Wooh !! The model just works fine and generates some cool outputs 😃

I hope you enjoyed this tutorial on Llama-2 7b model and was able to create a custom LLM just for your use case !! If you have any doubts just create a issue in the repo or create a pull request for the same 

Also smash that star button to get more amazing tutorials from me !! 🐱🐱🐱🐱