# ⚡ QLoRA Fine-Tuning Guide

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Gaurav14cs17/LLMs_Model/blob/main/Fine-Tuning-LLMs-Guide/notebooks/03_qlora_fine_tuning.ipynb)

**Fine-tune LLMs with 4-bit quantization - works on consumer GPUs!**

### 🚀 Why QLoRA?
- Fine-tune a **7B model with roughly 6GB of VRAM**
- Uses **NF4 (4-bit NormalFloat) quantization**, a data type designed for normally distributed weights
- Combines a frozen 4-bit base model with trainable LoRA adapters
- **Perfect for free Colab T4 GPUs!**

**⚠️ Requirements**: GPU with 6GB+ VRAM (T4 on free Colab works!)


In [None]:
# Install and import
!pip install -q transformers datasets accelerate peft bitsandbytes trl

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, prepare_model_for_kbit_training, get_peft_model

print(f"GPU: {torch.cuda.get_device_name(0)}")
print(f"Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")


In [None]:
# 4-bit Quantization Config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
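    bnb_4bit_quant_type="nf4",             # 4-bit NormalFloat data type
    bnb_4bit_compute_dtype=torch.float16,  # dequantize to fp16 for matmuls (the T4 has no bf16 support)
    bnb_4bit_use_double_quant=True,        # also quantize the per-block scaling constants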
)

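# Load the model in 4-bit. The quantized 7B weights alone take roughly 4 GB;
# the ~6 GB figure above includes training overhead.
# Note: this checkpoint is gated, so accept the license on the Hugging Face Hub and log in first.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=bnb_config,
    device_map="auto",
)
# Freeze the base weights and prepare the quantized model for adapter training
model = prepare_model_for_kbit_training(model)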
print(f"Memory used: {torch.cuda.memory_allocated() / 1e9:.2f} GB")
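To make the quantization config above more concrete, here is a minimal pure-Python sketch of blockwise 4-bit quantization with a codebook shaped like a normal distribution. It illustrates the idea behind NF4, not the bitsandbytes implementation: the codebook below is built from evenly spaced normal quantiles as a simplifying assumption (the real NF4 table differs, for example it contains an exact zero), and the helper names `normal_codebook`, `quantize_block`, and `dequantize_block` are hypothetical. The block size of 64 follows the QLoRA paper.

```python
import random
from statistics import NormalDist


def normal_codebook(levels=16):
    """Build a sorted codebook from evenly spaced normal quantiles, scaled to [-1, 1].

    Assumption: a simplified stand-in for the real NF4 table.
    """
    nd = NormalDist()
    # Keep probabilities away from 0 and 1 so the quantiles stay finite.
    probs = [(i + 0.5) / levels for i in range(levels)]
    quantiles = [nd.inv_cdf(p) for p in probs]
    top = max(abs(q) for q in quantiles)
    return [q / top for q in quantiles]


def quantize_block(block, codebook):
    """Absmax-scale one block to [-1, 1] and keep the nearest codebook index per value."""
    scale = max(abs(v) for v in block) or 1.0
    codes = [
        min(range(len(codebook)), key=lambda i: abs(codebook[i] - v / scale))
        for v in block
    ]
    return codes, scale


def dequantize_block(codes, scale, codebook):
    """Map 4-bit codes back to approximate floating-point values."""
    return [codebook[c] * scale for c in codes]


random.seed(0)
codebook = normal_codebook()
weights = [random.gauss(0.0, 0.02) for _ in range(64)]  # one block of 64 weights
codes, scale = quantize_block(weights, codebook)
restored = dequantize_block(codes, scale, codebook)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(f"levels used: {len(set(codes))} of 16, max abs error: {max_err:.5f}")
```

Each block stores 64 four-bit codes plus one scale. Double quantization, enabled by `bnb_4bit_use_double_quant=True`, compresses those per-block scales in turn.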


In [None]:
# Apply LoRA configuration
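lora_config = LoraConfig(
    r=16,                 # rank of the low-rank update matrices
    lora_alpha=32,        # scaling factor; updates are scaled by lora_alpha / r = 2
    lora_dropout=0.05,    # dropout applied on the adapter path
    bias="none",          # leave bias terms frozen
    task_type="CAUSAL_LM",
    # Adapt every attention and MLP projection in each decoder layer
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
)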

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
tokenizer.pad_token = tokenizer.eos_token  # Llama 2 ships without a pad token, so reuse EOS
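As a sanity check on what `print_trainable_parameters()` reports, the adapter size can be worked out by hand. The sketch below assumes the published Llama-2-7B shapes (hidden size 4096, MLP size 11008, 32 decoder layers, no grouped-query attention) and uses a hypothetical helper, `lora_params`. Each adapted linear layer gains two matrices, A of shape r x in and B of shape out x r.

```python
def lora_params(in_features, out_features, r):
    """Parameters added by one LoRA adapter: A is (r x in), B is (out x r)."""
    return r * (in_features + out_features)


# Assumed Llama-2-7B shapes; adjust these for other checkpoints.
HIDDEN, MLP, LAYERS, RANK = 4096, 11008, 32, 16

per_layer = {
    "q_proj": lora_params(HIDDEN, HIDDEN, RANK),
    "k_proj": lora_params(HIDDEN, HIDDEN, RANK),
    "v_proj": lora_params(HIDDEN, HIDDEN, RANK),
    "o_proj": lora_params(HIDDEN, HIDDEN, RANK),
    "gate_proj": lora_params(HIDDEN, MLP, RANK),
    "up_proj": lora_params(HIDDEN, MLP, RANK),
    "down_proj": lora_params(MLP, HIDDEN, RANK),
}

trainable = LAYERS * sum(per_layer.values())
print(f"LoRA parameters: {trainable:,}")  # prints: LoRA parameters: 39,976,960
```

That is about 40 million trainable parameters on top of roughly 7 billion frozen ones, which is why the optimizer state stays small.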


In [None]:
# Load and prepare dataset
from datasets import load_dataset
from trl import SFTTrainer, SFTConfig

dataset = load_dataset("tatsu-lab/alpaca", split="train[:1000]")

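def format_alpaca(sample):
    """Wrap an Alpaca record in the [INST] ... [/INST] prompt template."""
    # Note: the Llama tokenizer prepends a BOS token by default, so the literal <s>
    # here may produce a duplicate BOS; drop it if that matters for your setup.
    if sample.get("input", ""):
        return {"text": f"<s>[INST] {sample['instruction']}\n{sample['input']} [/INST] {sample['output']}</s>"}
    return {"text": f"<s>[INST] {sample['instruction']} [/INST] {sample['output']}</s>"}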

dataset = dataset.map(format_alpaca)
print(f"Dataset ready: {len(dataset)} samples")
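The inference prompt used later has to match the training template exactly. One way to keep them in sync is to build both from a single helper. This is a minimal sketch that mirrors the template above; the names `INST_TEMPLATE`, `build_prompt`, and `build_training_text` are hypothetical and not part of the notebook.

```python
INST_TEMPLATE = "<s>[INST] {prompt} [/INST]"


def build_prompt(instruction, context=""):
    """Prompt prefix shared by training and inference."""
    prompt = f"{instruction}\n{context}" if context else instruction
    return INST_TEMPLATE.format(prompt=prompt)


def build_training_text(instruction, output, context=""):
    """Full training example: shared prompt prefix, the answer, then the EOS marker."""
    return f"{build_prompt(instruction, context)} {output}</s>"


# Made-up record, for illustration only.
example = build_training_text("Add the numbers", "3", context="1 2")
print(repr(example))
```

Because the training text always starts with the output of `build_prompt`, a change to the template automatically applies to both sides.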


In [None]:
# QLoRA Training
training_args = SFTConfig(
    output_dir="./qlora-output",
    num_train_epochs=1,
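    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,  # effective batch size: 2 x 8 = 16
    learning_rate=2e-4,
    warmup_ratio=0.03,
    logging_steps=10,
    save_steps=50,
    fp16=True,                      # mixed precision; matches the fp16 compute dtype above
    max_seq_length=512,             # renamed to max_length in newer TRL releases
    gradient_checkpointing=True,    # trade extra compute for lower activation memory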
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,  # newer TRL releases expect processing_class=tokenizer instead
)

print("🚀 Starting QLoRA training...")
print(f"Memory before: {torch.cuda.memory_allocated() / 1e9:.2f} GB")
trainer.train()
print("✅ QLoRA training complete!")
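To get a feel for how long this run is, the schedule can be estimated from the training arguments above. This is back-of-the-envelope arithmetic under stated assumptions: a single GPU, no sequence packing, and all 1000 samples kept. The exact step count can differ by one depending on how the trainer handles the final partial accumulation window, so both bounds are shown.

```python
import math

num_samples = 1000      # train[:1000] split
per_device_batch = 2    # per_device_train_batch_size
grad_accum = 8          # gradient_accumulation_steps
num_gpus = 1            # assumption: one Colab GPU
warmup_ratio = 0.03
save_steps = 50

effective_batch = per_device_batch * grad_accum * num_gpus
steps_low = num_samples // effective_batch             # trailing partial window dropped
steps_high = math.ceil(num_samples / effective_batch)  # trailing partial window counted
warmup_steps = math.ceil(steps_high * warmup_ratio)
checkpoints = steps_high // save_steps                 # intermediate checkpoints written

print(f"effective batch size: {effective_batch}")
print(f"optimizer steps for one epoch: {steps_low} to {steps_high}")
print(f"warmup steps: about {warmup_steps}")
print(f"intermediate checkpoints: {checkpoints}")
```

With about 62 optimizer steps and `save_steps=50`, expect a single intermediate checkpoint before training ends.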


In [None]:
# Save and test
# Note: on a PEFT model, save_pretrained writes only the LoRA adapter weights, not the base model
model.save_pretrained("./qlora-output")
tokenizer.save_pretrained("./qlora-output")

# Test inference
def generate(prompt, max_tokens=100):
    inputs = tokenizer(f"<s>[INST] {prompt} [/INST]", return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_tokens, temperature=0.7, do_sample=True)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

print("🤖 Testing QLoRA model:")
print(generate("Explain quantum computing in simple terms."))
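In a fresh session, the saved adapter has to be re-attached to the base model before it can generate. The sketch below shows one way to do that with `PeftModel.from_pretrained`. It assumes the same base checkpoint and quantization settings used for training; `load_finetuned` is a hypothetical helper name. The heavy imports sit inside the function so that defining it needs neither a GPU nor a download.

```python
def load_finetuned(adapter_dir="./qlora-output", base_id="meta-llama/Llama-2-7b-hf"):
    """Reload the 4-bit base model and attach the saved LoRA adapter (needs a CUDA GPU)."""
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    # Same quantization settings as during training.
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.float16,
        bnb_4bit_use_double_quant=True,
    )
    base = AutoModelForCausalLM.from_pretrained(
        base_id,
        quantization_config=bnb_config,
        device_map="auto",
    )
    # Attach the adapter weights saved by model.save_pretrained(adapter_dir).
    model = PeftModel.from_pretrained(base, adapter_dir)
    model.eval()
    tokenizer = AutoTokenizer.from_pretrained(adapter_dir)
    return model, tokenizer
```

Calling `model, tokenizer = load_finetuned()` on a GPU runtime returns objects that work with the `generate` helper above.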


## 💡 QLoRA Memory Comparison

Approximate VRAM needed to fine-tune, by method:

| Model Size | Full FT | LoRA    | QLoRA      |
|------------|---------|---------|------------|
| 7B params  | ~56 GB  | ~14 GB  | **~6 GB**  |
| 13B params | ~104 GB | ~26 GB  | **~10 GB** |
| 70B params | ~560 GB | ~140 GB | **~35 GB** |
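The table's figures follow a simple bytes-per-parameter pattern, which the sketch below reproduces. The multipliers are inferred from the table rather than stated by it: 8 bytes per parameter for full fine-tuning, 2 bytes for LoRA on fp16 weights, and 0.5 bytes for 4-bit weights. The names `BYTES_PER_PARAM` and `estimate_gb` are hypothetical.

```python
# Assumed bytes per parameter, inferred from the table above.
BYTES_PER_PARAM = {
    "full_ft": 8.0,  # matches the Full FT column
    "lora": 2.0,     # fp16 base weights
    "qlora": 0.5,    # 4-bit base weights
}


def estimate_gb(params_billion, method):
    """Rough memory estimate in GB (1 GB = 1e9 bytes)."""
    return params_billion * BYTES_PER_PARAM[method]


for size in (7, 13, 70):
    row = ", ".join(f"{m}: {estimate_gb(size, m):.1f} GB" for m in BYTES_PER_PARAM)
    print(f"{size}B -> {row}")
```

For 7B and 13B, the QLoRA column sits above the weights-only estimate (3.5 GB and 6.5 GB), presumably to leave room for adapters, optimizer state, and activations.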

## 📚 Next Steps
- Try [DPO Training](./04_dpo_training.ipynb) for preference alignment
- Check out the [Deployment Guide](../08-Deployment/) for production

📖 Reference: [QLoRA Paper](https://arxiv.org/abs/2305.14314)
