# Fine-Tuning a Large Language Model

Fine-tuning makes a pretrained model more accurate, relevant, and efficient for a specific use case.

Types of fine-tuning:

1. Full Fine-Tuning (Heavy Fine-Tuning)
All of the model's weights are updated.
Requires a lot of compute and memory (GPUs/TPUs), because gradients and optimizer state are kept for every parameter.
Example: Training GPT on a medical dataset for AI-driven diagnosis.

In [None]:
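from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "meta-llama/Llama-2-7b-hf"  # Example
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # LLaMA tokenizers ship without a pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Load dataset
dataset = load_dataset("path_to_your_dataset")  # Replace with dataset path

# Tokenize the raw text; assumes each example has a "text" column
def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

dataset = dataset.map(tokenize, batched=True, remove_columns=dataset["train"].column_names)

# Pads each batch and copies input_ids into labels for causal LM training
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

# Training arguments
training_args = TrainingArguments(
    output_dir="./fine_tuned_model",
    num_train_epochs=3,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    logging_steps=50,
    save_steps=500,
    eval_strategy="steps",  # Named `evaluation_strategy` in older transformers releases
    learning_rate=2e-5,
    weight_decay=0.01,
    push_to_hub=False,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    data_collator=data_collator,
)

trainer.train()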
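
To put a number on "a lot of compute", here is a back-of-the-envelope estimate of the memory that full fine-tuning keeps per parameter. It is only a sketch: it assumes plain fp32 training with AdamW (4 bytes each for the weight, its gradient, and the two optimizer moments) and ignores activations and framework overhead. The name `full_finetune_memory_gb` is made up for this illustration.

```python
# Bytes held per parameter during plain fp32 training with AdamW (assumed setup)
BYTES_PER_PARAM = {
    "weights": 4,
    "gradients": 4,
    "adam_first_moment": 4,
    "adam_second_moment": 4,
}


def full_finetune_memory_gb(num_params: float) -> float:
    """Approximate training-state memory in GB, excluding activations."""
    return num_params * sum(BYTES_PER_PARAM.values()) / 1e9


print(f"7B model: ~{full_finetune_memory_gb(7e9):.0f} GB")  # 7B model: ~112 GB
```

Under these assumptions a 7B model needs more memory than a single 80 GB GPU offers, which is why full fine-tuning usually means multiple GPUs or memory-saving tricks.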


2. LoRA (Low-Rank Adaptation)
Updates only small low-rank adapter matrices while keeping the main model frozen.
Much cheaper and faster than full fine-tuning.
Example: Fine-tuning LLaMA for a customer support chatbot.

In [None]:
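from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer, DataCollatorForLanguageModeling, Trainer

model_name = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # LLaMA tokenizers ship without a pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Apply LoRA
lora_config = LoraConfig(
    r=8,  # Rank
    lora_alpha=16,  # Adapter updates are scaled by lora_alpha / r
    lora_dropout=0.1,
    task_type=TaskType.CAUSAL_LM,
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # Only the adapter weights require gradients

# Build a new Trainer around the LoRA-wrapped model; calling the earlier
# trainer again would train the original full model instead.
# Reuses `training_args` and `dataset` from the full fine-tuning cell.
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False),
)
trainer.train()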
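
Why LoRA is so much cheaper comes down to simple counting. For a frozen weight matrix of shape `d_out x d_in`, LoRA trains two small matrices, `A` (`r x d_in`) and `B` (`d_out x r`), and adds their scaled product to the frozen weight. The sketch below counts the trainable parameters for the `r=8` setting used above. The model dimensions (hidden size 4096, 32 layers) and the choice to adapt only the query and value projections are assumptions for illustration; the modules actually adapted depend on your `LoraConfig` and PEFT version.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix."""
    # A has shape (r, d_in) and B has shape (d_out, r)
    return r * d_in + d_out * r


hidden = 4096           # assumed hidden size of a 7B LLaMA-style model
num_layers = 32         # assumed number of transformer blocks
adapted_per_layer = 2   # assumption: adapters on query and value projections only
r = 8                   # LoRA rank, as in the config above

per_matrix = lora_param_count(hidden, hidden, r)
total = per_matrix * adapted_per_layer * num_layers
frozen = hidden * hidden * adapted_per_layer * num_layers

print(f"LoRA params per matrix: {per_matrix:,}")   # 65,536
print(f"LoRA params in total:   {total:,}")        # 4,194,304
print(f"Share of the adapted matrices: {total / frozen:.4%}")
```

Under these assumptions only about 4 million parameters receive gradients and optimizer state, compared with billions in full fine-tuning.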


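3. QLoRA (Even More Efficient)
QLoRA loads the frozen base model in 4-bit precision and trains LoRA adapters on top, cutting weight memory by roughly 75% compared with 16-bit weights.
Best for fine-tuning LLaMA 7B, 13B, or Falcon on a single GPU.

In [None]:
import torch
from peft import LoraConfig, TaskType, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# QLoRA stores the frozen base weights in 4-bit NF4 and computes in bfloat16
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-hf", quantization_config=bnb_config, device_map="auto"
)

# Prepare the quantized model for training, then attach trainable LoRA adapters
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, LoraConfig(r=8, lora_alpha=16, task_type=TaskType.CAUSAL_LM))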
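
The memory saving from quantization can be checked with quick arithmetic. This sketch counts only the bytes needed to store the base weights at different precisions. It ignores quantization constants, the LoRA adapters, activations, and optimizer state, so real usage will be somewhat higher. The helper name `weight_memory_gb` is made up for this illustration.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Memory needed to store the weights alone, in GB."""
    return num_params * bits_per_weight / 8 / 1e9


num_params = 13e9  # roughly the size of a 13B model
fp16_gb = weight_memory_gb(num_params, 16)
int4_gb = weight_memory_gb(num_params, 4)
saving = 1 - int4_gb / fp16_gb

print(f"16-bit weights: {fp16_gb:.1f} GB")  # 26.0 GB
print(f"4-bit weights:  {int4_gb:.1f} GB")  # 6.5 GB
print(f"Saving:         {saving:.0%}")      # 75%
```

Going from 16 bits to 4 bits per weight is where the figure of roughly 75% comes from; 8-bit loading would save only about half.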
