<a href="https://colab.research.google.com/github/Mayakshanesht/AV_opensource/blob/main/FineTuningLLM.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Fine-Tuning Large Language Models with Unsloth**

#### Parameter-Efficient Fine-Tuning (PEFT)

Traditional fine-tuning of Large Language Models (LLMs) involves updating all the model's parameters. This is computationally expensive and requires significant memory. PEFT techniques address this by only fine-tuning a small subset of the model's parameters while keeping the majority frozen. This significantly reduces computational costs and memory footprint, making it feasible to fine-tune LLMs on consumer-grade hardware or cloud platforms like Google Colab.

#### Low-Rank Adaptation (LoRA)

LoRA is a popular PEFT technique that adds a trainable low-rank update, the product of two small matrices, to selected weight matrices of the LLM. During training, only these low-rank matrices are updated, while the original weights remain frozen. This reduces the number of trainable parameters, which cuts gradient and optimizer memory and can speed up training. The rank `r` sets the size of the low-rank matrices; a higher rank gives the update more capacity but increases the number of trainable parameters and memory usage.
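
To make the low-rank update concrete, here is a minimal pure-Python sketch of the standard LoRA formulation, where the effective weight is `W + (alpha / r) * B @ A`. The helper names (`matmul`, `lora_effective_weight`) and the tiny 4x4 matrices are illustrative assumptions, not part of Unsloth or PEFT.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def lora_effective_weight(W, A, B, alpha, r):
    """Return W + (alpha / r) * (B @ A); W stays frozen, only A and B are trained."""
    scale = alpha / r
    delta = matmul(B, A)  # d_out x d_in, but rank at most r
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(W, delta)
    ]

# Hypothetical 4x4 frozen weight (identity) with a rank-1 adapter.
W = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
A = [[0.5, -0.5, 0.25, 0.0]]            # r x d_in
B_init = [[0.0], [0.0], [0.0], [0.0]]   # d_out x r, zero at initialization
B_trained = [[1.0], [0.0], [0.0], [0.0]]

unchanged = lora_effective_weight(W, A, B_init, alpha=2, r=1)
updated = lora_effective_weight(W, A, B_trained, alpha=2, r=1)

full_params = len(W) * len(W[0])              # 16 entries in W
lora_params = 1 * (len(W) + len(W[0]))        # r * (d_out + d_in) = 8
```

Because `B` starts at zero, the adapted model initially behaves exactly like the base model; training then moves only `A` and `B`.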


In [1]:
!pip install "unsloth[colab] @ git+https://github.com/unslothai/unsloth.git"


Collecting unsloth@ git+https://github.com/unslothai/unsloth.git (from unsloth[colab]@ git+https://github.com/unslothai/unsloth.git)
  Cloning https://github.com/unslothai/unsloth.git to /tmp/pip-install-tco_br1p/unsloth_466c2173b57546cd80c9db53b077c590
  Running command git clone --filter=blob:none --quiet https://github.com/unslothai/unsloth.git /tmp/pip-install-tco_br1p/unsloth_466c2173b57546cd80c9db53b077c590
  Resolved https://github.com/unslothai/unsloth.git to commit 6c234d5a66adb76b9b93fb0f2445648199d88e66
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting bitsandbytes>=0.43.3 (from unsloth@ git+https://github.com/unslothai/unsloth.git->unsloth[colab]@ git+https://github.com/unslothai/unsloth.git)
  Downloading bitsandbytes-0.45.5-py3-none-manylinux_2_24_x86_64.whl.metadata (5.0 kB)
Collecting xformers@ https://download.pytorch.org/whl/cu121/xformer

In [None]:
import torch
from unsloth import FastLanguageModel, is_bfloat16_supported
from unsloth.chat_templates import get_chat_template
from datasets import load_dataset
from trl import SFTTrainer
from transformers import TrainingArguments, DataCollatorForSeq2Seq, TextStreamer


## Load the LLM and Tokenizer

In [None]:
model_name = "unsloth/Llama-3.2-3B-Instruct-bnb-4bit" # Or any other model from Unsloth's model zoo
max_seq_length = 2048
load_in_4bit = True

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = model_name,
    max_seq_length = max_seq_length,
    dtype = None, # Auto-detect data type
    load_in_4bit = load_in_4bit
)
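
As a rough illustration of why `load_in_4bit` matters on Colab, the sketch below estimates the memory taken by the weights alone at different precisions. It assumes a nominal 3 billion parameters and ignores quantization constants, any layers kept in higher precision, activations, and optimizer state, so treat the numbers as lower bounds rather than measurements. The function name is hypothetical.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate memory for the weights only, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

NOMINAL_PARAMS = 3e9  # assumption: "3B" taken at face value

mem_16bit = weight_memory_gb(NOMINAL_PARAMS, 16)  # fp16 / bf16 weights
mem_4bit = weight_memory_gb(NOMINAL_PARAMS, 4)    # 4-bit quantized weights

print(f"16-bit weights: ~{mem_16bit:.1f} GB")
print(f" 4-bit weights: ~{mem_4bit:.1f} GB")
```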


## Prepare the PEFT Model (LoRA)

In [None]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = "unsloth",
    random_state = 3407,
    use_rslora = False,
    loftq_config = None,
)
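
The configuration above can be sanity-checked by counting the parameters LoRA adds. Each adapted linear layer of shape `d_in -> d_out` gains `r * (d_in + d_out)` trainable parameters. The layer shapes below are assumptions based on the published Llama 3.2 3B configuration (hidden size 3072, MLP size 8192, 28 layers, key/value projection width 1024); check `model.config` before relying on them.

```python
# Assumed (d_in, d_out) per target module; verify against model.config.
HIDDEN, KV_DIM, MLP, NUM_LAYERS = 3072, 1024, 8192, 28
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP),
    "up_proj": (HIDDEN, MLP),
    "down_proj": (MLP, HIDDEN),
}

def lora_param_count(shapes, r, num_layers):
    """Sum r * (d_in + d_out) over all adapted modules and layers."""
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in shapes.values())
    return per_layer * num_layers

trainable = lora_param_count(TARGET_SHAPES, r=16, num_layers=NUM_LAYERS)
print(f"LoRA trainable parameters: {trainable:,}")  # 24,313,856 under these assumptions
```

Under these assumptions the adapters amount to well under 1% of a 3-billion-parameter model, which is what keeps gradient and optimizer memory small.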


## Initialize Chat Template

In [None]:
tokenizer = get_chat_template(
    tokenizer,
    chat_template = "llama-3.1",  # Or other suitable template
)
# Do not call FastLanguageModel.for_inference(model) here; inference mode is enabled only after training.
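
To show roughly what a chat template does, the sketch below renders a conversation in the Llama 3 header/`<|eot_id|>` layout. It is a simplified approximation: the real `llama-3.1` template may also insert a default system header, and `render_llama3_chat` is a hypothetical helper, not an Unsloth or Transformers function.

```python
def render_llama3_chat(messages, add_generation_prompt=False):
    """Simplified Llama-3-style rendering of role/content messages."""
    text = "<|begin_of_text|>"
    for message in messages:
        text += (
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content'].strip()}<|eot_id|>"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        text += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return text

example = [
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi, how can I help?"},
]
training_text = render_llama3_chat(example)
prompt_text = render_llama3_chat(example[:1], add_generation_prompt=True)
print(prompt_text)
```

Training text includes the full assistant reply, while an inference prompt stops at the opened assistant header; this is the distinction that `add_generation_prompt` controls.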


## Load Dataset and Format Prompts

In [None]:
from unsloth.chat_templates import standardize_sharegpt

# Load the dataset and convert ShareGPT-style turns to role/content messages.
dataset = load_dataset("mlabonne/FineTome-100k", split = "train")
dataset = standardize_sharegpt(dataset)

def formatting_prompts_func(examples):
    # Render each conversation to one training string using the chat template.
    convos = examples["conversations"]
    texts = [tokenizer.apply_chat_template(convo, tokenize = False, add_generation_prompt = False) for convo in convos]
    return {"text": texts}

dataset = dataset.map(formatting_prompts_func, batched = True)
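
For intuition about the standardization step, the following sketch converts a ShareGPT-style conversation (`from`/`value` keys) into the `role`/`content` format that chat templates expect. It only approximates what `standardize_sharegpt` is assumed to do; the role mapping and the helper name are illustrative.

```python
# Assumed mapping from ShareGPT speaker names to chat roles.
ROLE_MAP = {"system": "system", "human": "user", "gpt": "assistant"}

def sharegpt_to_role_content(conversation):
    """Convert [{'from': ..., 'value': ...}] turns to [{'role': ..., 'content': ...}]."""
    return [
        {"role": ROLE_MAP[turn["from"]], "content": turn["value"]}
        for turn in conversation
    ]

raw = [
    {"from": "human", "value": "What is 2 + 2?"},
    {"from": "gpt", "value": "2 + 2 equals 4."},
]
converted = sharegpt_to_role_content(raw)
print(converted)
```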


## Configure and Run the SFT Trainer

In [None]:
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    data_collator = DataCollatorForSeq2Seq(tokenizer = tokenizer),
    dataset_num_proc = 2, # Adjust based on your Colab's CPU cores
    packing = False,  # Setting this to True can speed up training on short sequences
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        max_steps = 60,  # Reduced for demonstration; increase for better results
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(), # Use fp16 if bfloat16 is not supported
        bf16 = is_bfloat16_supported(), # Use bfloat16 if supported
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
    ),
)

trainer.train()
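
The arithmetic behind these settings can be sketched as follows: the effective batch size, the number of examples seen in 60 steps, and the learning rate at a given step under linear warmup followed by linear decay. The sketch assumes a single GPU and that `lr_scheduler_type = "linear"` follows this warmup-then-decay shape; `linear_schedule_lr` is a hypothetical helper written for illustration.

```python
PER_DEVICE_BATCH = 2
GRAD_ACCUM_STEPS = 4
MAX_STEPS = 60
NUM_GPUS = 1  # assumption: a single Colab GPU

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS * NUM_GPUS
examples_seen = effective_batch * MAX_STEPS

def linear_schedule_lr(step, base_lr=2e-4, warmup_steps=5, max_steps=MAX_STEPS):
    """Linear warmup from 0 to base_lr, then linear decay back to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0.0, (max_steps - step) / max(1, max_steps - warmup_steps))
    return base_lr * remaining

print(f"Effective batch size: {effective_batch}")
print(f"Examples seen in {MAX_STEPS} steps: {examples_seen}")
for step in (0, 5, 30, 60):
    print(f"step {step:2d}: lr = {linear_schedule_lr(step):.2e}")
```

With 480 examples out of a 100k-example dataset, this run covers only a small fraction of the data, which is why `max_steps` is flagged as a demonstration value.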


## Inference After Fine-Tuning

In [None]:
FastLanguageModel.for_inference(model)

messages = [
    {"role": "user", "content": "What is the Fibonacci number after 89?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize = True,
    add_generation_prompt = True,
    return_tensors = "pt",
).to("cuda")

text_streamer = TextStreamer(tokenizer, skip_prompt = True)
_ = model.generate(input_ids = inputs, streamer = text_streamer, max_new_tokens = 128,
                   use_cache = True, temperature = 1.5, min_p = 0.1)
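
The `temperature` and `min_p` arguments shape the sampling distribution. As a minimal sketch, assuming temperature is applied before the min-p filter: logits are divided by the temperature, converted to probabilities, and any token whose probability falls below `min_p` times the top probability is dropped before renormalizing. `min_p_filter` is a hypothetical helper, not the Transformers implementation.

```python
import math

def min_p_filter(logits, temperature=1.0, min_p=0.1):
    """Return {token_index: probability} after temperature scaling and min-p filtering."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # subtract peak for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]
    threshold = min_p * max(probs)               # cutoff relative to the top token
    kept = {i: p for i, p in enumerate(probs) if p >= threshold}
    norm = sum(kept.values())
    return {i: p / norm for i, p in kept.items()}

logits = [2.0, 1.0, 0.0, -3.0]  # made-up logits for four tokens
filtered = min_p_filter(logits, temperature=1.5, min_p=0.1)
strict = min_p_filter(logits, temperature=1.5, min_p=0.6)
print(filtered)
print(strict)
```

A high temperature flattens the distribution, and the min-p cutoff removes the unlikely tail that flattening would otherwise make easier to sample.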
