<a href="https://colab.research.google.com/github/Nakshatra1729yuvi/Finetuning/blob/main/Fine_Tuning_LLama.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install -U accelerate bitsandbytes peft transformers trl

Collecting bitsandbytes
  Downloading bitsandbytes-0.47.0-py3-none-manylinux_2_24_x86_64.whl.metadata (11 kB)
Collecting trl
  Downloading trl-0.23.0-py3-none-any.whl.metadata (11 kB)
Downloading bitsandbytes-0.47.0-py3-none-manylinux_2_24_x86_64.whl (61.3 MB)
Downloading trl-0.23.0-py3-none-any.whl (564 kB)
Installing collected packages: bitsandbytes, trl
Successfully installed bitsandbytes-0.47.0 trl-0.23.0
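Because `-U` always pulls the newest releases (here bitsandbytes 0.47.0 and trl 0.23.0), argument names used in later cells can drift between runs. A small sketch, using only the standard library's `importlib.metadata`, for recording which versions a session actually has; `installed_versions` is a hypothetical helper, not part of any of these libraries.

```python
from importlib.metadata import PackageNotFoundError, version

# Packages installed by the pip cell above
PACKAGES = ["accelerate", "bitsandbytes", "peft", "transformers", "trl"]

def installed_versions(packages):
    """Return {package: version string, or None if it is not installed}."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result

for name, ver in installed_versions(PACKAGES).items():
    print(f"{name:>14}: {ver or 'not installed'}")
```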


In [None]:
import os
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    HfArgumentParser,
    TrainingArguments,
    pipeline,
    logging
)
from peft import LoraConfig, PeftModel
from trl import SFTTrainer

In [None]:
# Each training example must follow the Llama 2 chat format:
# <s>[INST] <<SYS>>
# System prompt
# <</SYS>>
#
# User prompt [/INST] Model answer </s>

# Dataset: https://huggingface.co/datasets/mlabonne/guanaco-llama2-1k

# Reference notebook: https://colab.research.google.com/drive/1Ad7a9zMmkxuXTOh1Z7-rNSICA4dybpM2?usp=sharing
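A minimal sketch of assembling one training example in the format above. `build_llama2_prompt` is a hypothetical helper; we assume single-turn conversations, and the system block is optional (the rows in `guanaco-llama2-1k` appear to omit it).

```python
def build_llama2_prompt(user_prompt, answer=None, system_prompt=None):
    """Assemble a single-turn prompt in the Llama 2 chat format.

    If `answer` is given, the closing answer and </s> are appended (training
    example); otherwise the string ends at [/INST] (inference prompt).
    """
    text = "<s>[INST] "
    if system_prompt:
        text += f"<<SYS>>\n{system_prompt}\n<</SYS>>\n\n"
    text += f"{user_prompt} [/INST]"
    if answer is not None:
        text += f" {answer} </s>"
    return text

print(build_llama2_prompt("What is meta"))
# <s>[INST] What is meta [/INST]
```

Leaving `answer` out produces an inference-time prompt, so the same helper can serve both training and generation.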

In [None]:
model_name='NousResearch/Hermes-3-Llama-3.2-3B'
dataset_name='mlabonne/guanaco-llama2-1k'
new_model='Llama-3.2-3b-chat-finetune'  # output directory for the LoRA adapter


# QLoRA parameters

lora_rank=8

lora_alpha=16

lora_dropout=0.1

# bitsandbytes parameters

use_4bit=True
bnb_4bit_compute_dtype='float16'
bnb_4bit_quant_type='nf4'  # quantization type: 'nf4' or 'fp4'
use_nested_quant=False  # True enables nested (double) quantization


# TrainingArguments parameters
output_dir="./results"

num_train_epochs=1

fp16=False
bf16=False    # set bf16=True on GPUs that support bfloat16 (e.g. A100)

per_device_train_batch_size=1

per_device_eval_batch_size=1

gradient_accumulation_steps=1

gradient_checkpointing=True  # note: not passed to TrainingArguments below, so it is currently unused

max_grad_norm=0.3

learning_rate=2e-4

weight_decay=0.001

optim="paged_adamw_32bit"

lr_scheduler_type="cosine"

max_steps=-1

warmup_ratio=0.03

group_by_length=True

save_steps=0

logging_steps=25

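# SFT parameters

max_seq_length=None  # note: not passed to SFTTrainer below, so it is currently unused

packing=False  # note: not passed to SFTTrainer below, so it is currently unused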

device_map={"":0}



1. First, we load the dataset defined above. Here the dataset is already preprocessed, but this is usually where you would reformat the prompts, filter out bad text, combine multiple datasets, and so on.

2. Then, we configure bitsandbytes for 4-bit quantization.

3. Next, we load the Hermes 3 (Llama 3.2 3B) model in 4-bit precision on the GPU, together with its tokenizer.

4. Finally, we load the QLoRA configuration and the regular training parameters, and pass everything to the SFTTrainer. Training can then start.
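Step 2 is what lets a 3B model fit comfortably on a single GPU. A back-of-the-envelope sketch of weight memory, assuming about 3.2 billion parameters; it ignores quantization constants, layers that stay unquantized (such as embeddings), activations, and optimizer state, so treat the numbers as rough lower bounds.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Rough size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

num_params = 3.2e9  # assumption: roughly 3.2B parameters for the "3B" model

fp16_gb = weight_memory_gb(num_params, 16)  # unquantized half precision
nf4_gb = weight_memory_gb(num_params, 4)    # idealized 4-bit storage

print(f"fp16 ~ {fp16_gb:.1f} GB, 4-bit ~ {nf4_gb:.1f} GB")
```

The same estimate explains why the merge step further down reloads the base model in FP16: the merged weights need the full-precision footprint again.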

In [None]:
dataset=load_dataset(dataset_name,split="train")

compute_dtype=getattr(torch,bnb_4bit_compute_dtype)

bnb_config=BitsAndBytesConfig(
    load_in_4bit=use_4bit,
    bnb_4bit_quant_type=bnb_4bit_quant_type,
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=use_nested_quant
)

# if compute_dtype==torch.float16 and use_4bit:
#   major,_=torch.cuda.get_device_capability()
#   if major>=8:
#     print("-"*10)
#     print("Your GPU supports bfloat16: accelerate training with bf16=True")
#     print("-"*10)


model=AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map=device_map
)

model.config.use_cache=False
model.config.pretraining_tp=1

tokenizer=AutoTokenizer.from_pretrained(model_name,trust_remote_code=True)
tokenizer.pad_token=tokenizer.eos_token
tokenizer.padding_side="right"

peft_config=LoraConfig(
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    r=lora_rank,
    bias="none",
    task_type="CAUSAL_LM"
)

training_arguments = TrainingArguments(
    output_dir=output_dir,
    num_train_epochs=num_train_epochs,
    per_device_train_batch_size=per_device_train_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    optim=optim,
    save_steps=save_steps,
    logging_steps=logging_steps,
    learning_rate=learning_rate,
    weight_decay=weight_decay,
    fp16=fp16,
    bf16=bf16,
    max_grad_norm=max_grad_norm,
    max_steps=max_steps,
    warmup_ratio=warmup_ratio,
    group_by_length=group_by_length,
    lr_scheduler_type=lr_scheduler_type,
    report_to="tensorboard"
)
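
# Each row of the dataset already holds a fully formatted prompt in its "text" column
def formatting_prompts_func(example):
    return example["text"]

# No tokenizer is passed here, so SFTTrainer loads its own from the model name
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    args=training_arguments,
    formatting_func=formatting_prompts_func,
)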


trainer.train()

The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'eos_token_id': 128039, 'pad_token_id': 128001}.


Step,Training Loss
25,1.5574
50,1.9477
75,1.2669
100,1.9131
125,1.4037
150,1.7355
175,1.2271
200,1.7221
225,1.2527
250,1.5866


TrainOutput(global_step=1000, training_loss=1.5089237384796144, metrics={'train_runtime': 801.2465, 'train_samples_per_second': 1.248, 'train_steps_per_second': 1.248, 'total_flos': 6217061269518336.0, 'train_loss': 1.5089237384796144, 'epoch': 1.0})
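With `lr_scheduler_type="cosine"` and `warmup_ratio=0.03`, the 1,000 optimizer steps reported above start with ceil(0.03 × 1000) = 30 warmup steps. The sketch below reproduces the shape of that schedule (linear warmup, then half a cosine down to zero); it is written from the standard formula rather than copied from the Trainer, so treat it as an approximation of what the library computes.

```python
import math

def cosine_with_warmup_lr(step, base_lr, warmup_steps, total_steps):
    """Learning rate at an optimizer step: linear warmup, then cosine decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))

base_lr, warmup_steps, total_steps = 2e-4, 30, 1000  # from the run above
for step in (0, 15, 30, 515, 1000):
    lr = cosine_with_warmup_lr(step, base_lr, warmup_steps, total_steps)
    print(step, f"{lr:.2e}")
```

The learning rate peaks at `2e-4` right after warmup, is halved at the midpoint of the decay phase, and reaches zero on the final step.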

In [None]:
torch.cuda.empty_cache()

In [None]:
trainer.model.save_pretrained(new_model)
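Calling `save_pretrained` on the PEFT-wrapped model writes only the LoRA adapter, not the base weights, which is why the saved folder is small. A rough count of the adapter's parameters, assuming PEFT falls back to its default Llama targets (`q_proj` and `v_proj`) because `LoraConfig` sets no `target_modules`, and assuming the published Llama 3.2 3B shapes (hidden size 3072, 28 layers, 8 key/value heads of dimension 128):

```python
# Assumed Llama 3.2 3B shapes (from its published config, not from this notebook)
hidden_size = 3072
num_layers = 28
num_kv_heads, head_dim = 8, 128
lora_rank = 8  # from the QLoRA parameters above

def lora_params(in_features, out_features, r):
    """A LoRA pair adds A (r x in_features) and B (out_features x r)."""
    return r * in_features + out_features * r

q_proj = lora_params(hidden_size, hidden_size, lora_rank)              # 3072 -> 3072
v_proj = lora_params(hidden_size, num_kv_heads * head_dim, lora_rank)  # 3072 -> 1024 (GQA)
adapter_params = num_layers * (q_proj + v_proj)

print(adapter_params)  # 2293760
print(f"{adapter_params / 3.2e9:.3%} of an assumed 3.2B-parameter base model")
```

That is roughly 2.3M trainable parameters, well under 0.1% of the base model, so the adapter can be stored and shared separately from the base weights.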

In [None]:
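# Empty VRAM: drop the references to the 4-bit model and the trainer,
# run garbage collection, then return cached blocks to the GPU
import gc
del model, trainer
gc.collect()
gc.collect()
torch.cuda.empty_cache()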

In [None]:
# Reload model in FP16 and merge it with LoRA weights
base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    low_cpu_mem_usage=True,
    return_dict=True,
    torch_dtype=torch.float16,
    device_map=device_map,
)
model = PeftModel.from_pretrained(base_model, new_model)
model = model.merge_and_unload()

# Reload tokenizer to save it
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

`torch_dtype` is deprecated! Use `dtype` instead!


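`merge_and_unload()` folds each LoRA pair back into the frozen weight it adapts, so the result is a plain `transformers` model with no adapter overhead at inference. Conceptually, each targeted weight becomes W + (lora_alpha / r) · B A; LoRA dropout only acts during training and plays no part in the merge. A dependency-free toy sketch of that update (hypothetical helpers `matmul`, `matvec`, `merge_lora` on list-of-rows matrices) that checks the merged layer matches the base layer plus the scaled adapter path; the library additionally handles details this sketch skips, such as dtypes and weight layouts.

```python
def matmul(X, Y):
    """Matrix product of two list-of-rows matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def matvec(M, v):
    """Matrix-vector product."""
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]

def merge_lora(W, A, B, lora_alpha, r):
    """Return W + (lora_alpha / r) * (B @ A), the merged weight."""
    scale = lora_alpha / r
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

# Toy layer: W is out x in = 2 x 3, adapter rank r = 1
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
A = [[1.0, 2.0, 3.0]]   # r x in
B = [[0.5], [0.0]]      # out x r
lora_alpha, r = 2, 1    # same alpha / r ratio (2.0) as the notebook's 16 / 8

merged = merge_lora(W, A, B, lora_alpha, r)
x = [1.0, 1.0, 1.0]

# Unmerged forward pass: base output plus the scaled low-rank path
unmerged_out = [b + (lora_alpha / r) * d
                for b, d in zip(matvec(W, x), matvec(B, matvec(A, x)))]

print(merged)                           # [[2.0, 2.0, 3.0], [0.0, 1.0, 0.0]]
print(matvec(merged, x), unmerged_out)  # [7.0, 1.0] [7.0, 1.0]
```

Because the two forward passes agree, merging changes inference cost, not the model's outputs (up to floating-point rounding in real half-precision weights).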

In [None]:
logging.set_verbosity(logging.CRITICAL)
prompt="What is meta"
pipe=pipeline(task="text-generation",model=model,tokenizer=tokenizer,max_length=100)
result=pipe(f'<s>[INST]{prompt}[/INST]')
print(result[0]['generated_text'])

<s>[INST]What is meta[/INST] Meta is a term that is used to describe the concept of a concept, or an idea that refers to itself. In other words, it refers to something that is self-referential or circular in nature.

In the context of artificial intelligence and machine learning, meta refers to a model or system that can learn to learn. This means that the model is able to adapt and improve its own performance over time without being explicitly programmed or trained in a specific task.

Meta models are particularly useful in tasks such as image recognition, where the model needs to learn to recognize objects, people, or scenes from a large and diverse dataset. By using meta learning, the model can learn to learn from the examples provided and adapt to new data without needing to be retrained on a large dataset.

Meta learning is an active area of research in AI and machine learning, and there are many different approaches and techniques that have been developed to improve the performan
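By default, `generated_text` contains the prompt followed by the reply, as the output above shows. A small post-processing sketch that keeps only the reply by splitting on the last `[/INST]` tag; `extract_answer` is a hypothetical helper, and passing `return_full_text=False` when calling the pipeline should have the same effect without any string handling.

```python
def extract_answer(generated_text, end_tag="[/INST]"):
    """Return the text after the last end_tag, or the whole text if the tag is missing."""
    _, sep, answer = generated_text.rpartition(end_tag)
    return answer.strip() if sep else generated_text.strip()

sample = "<s>[INST]What is meta[/INST] Meta is a term that is used to describe the concept of a concept."
print(extract_answer(sample))
# Meta is a term that is used to describe the concept of a concept.
```

Splitting on the last tag, rather than the first, keeps the helper correct if a prompt ever contains earlier `[/INST]` turns.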