## Complete Guide to LoRA Fine-Tuning

### What is LoRA?
LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning technique that adapts large pre-trained models to specific tasks without updating all of the model's parameters. The original weights stay frozen, and LoRA adds small, trainable matrices to selected layers of the existing model.

### Key Concepts:

* Parameter Efficiency: Only trains a small fraction of parameters (typically 0.1-1% of the original model)
* Low-Rank Decomposition: Uses matrix factorization to reduce the number of trainable parameters
* Adapter Layers: Adds lightweight modules that can be easily swapped or combined
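
The low-rank decomposition can be written compactly. This is the commonly cited LoRA formulation; the symbols $r$ and $\alpha$ correspond to the `r` and `lora_alpha` settings used later in this guide, while $d$ and $k$ (the output and input sizes of the adapted weight matrix) are notation introduced here for illustration.

```latex
W' = W_0 + \Delta W = W_0 + \frac{\alpha}{r}\, B A,
\qquad B \in \mathbb{R}^{d \times r},\quad A \in \mathbb{R}^{r \times k},\quad r \ll \min(d, k)
```

Only $A$ and $B$ are trained while $W_0$ stays frozen, so each adapted matrix contributes $r(d + k)$ trainable parameters instead of $dk$.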

### Why Use LoRA?

#### Advantages:

* Memory Efficient: Requires significantly less GPU memory during training
* Faster Training: Far fewer parameters need gradients and optimizer state, which usually shortens each training step
* Storage Efficient: LoRA adapters are tiny files (few MBs vs GBs for full models)
* Modular: Can create multiple adapters for different tasks
* Reversible: Original model remains unchanged
* Cost-Effective: Enables fine-tuning on consumer hardware

#### Use Cases:

* Domain Adaptation: Adapting general models to specific domains (legal, medical, etc.)
* Task-Specific Tuning: Creating specialized versions for different tasks
* Personalization: Customizing models for individual users or organizations
* Multi-Task Learning: Training separate adapters for different tasks

### Setting Up the Environment
#### Required Libraries:

In [None]:
!pip install transformers peft datasets accelerate bitsandbytes torch -q

### Step-by-Step Implementation
#### Step 1: Import Libraries and Setup

In [2]:
print("📚 Importing libraries...")

import torch
from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    TrainingArguments,
    Trainer,
    DataCollatorForLanguageModeling
)
from peft import LoraConfig, get_peft_model, TaskType
from datasets import Dataset
import json

print("✅ Libraries imported successfully!")
print(f"🔧 Using PyTorch version: {torch.__version__}")
print(f"🔧 CUDA available: {torch.cuda.is_available()}")

📚 Importing libraries...
✅ Libraries imported successfully!
🔧 Using PyTorch version: 2.6.0+cu124
🔧 CUDA available: True


#### Step 2: Load Base Model and Tokenizer

In [3]:
print("🤖 Loading base model and tokenizer...")

# Choose a small model for demonstration (good for Colab)
model_name = "microsoft/DialoGPT-small"

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Add padding token if it doesn't exist
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Load model
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
    device_map="auto" if torch.cuda.is_available() else None
)

print(f"✅ Model loaded: {model_name}")
print(f"📊 Model parameters: {model.num_parameters():,}")

🤖 Loading base model and tokenizer...



✅ Model loaded: microsoft/DialoGPT-small
📊 Model parameters: 124,439,808


#### Step 3: Prepare Training Data

In [4]:
print("📝 Creating sample training dataset...")

# Sample conversational data for fine-tuning
sample_data = [
    "Hello! How can I help you today?",
    "I'm doing great, thank you for asking!",
    "The weather is beautiful today.",
    "I love learning about machine learning.",
    "Python is a fantastic programming language.",
    "Fine-tuning models is very interesting.",
    "LoRA makes training more efficient.",
    "Have a wonderful day ahead!"
]

# Create dataset
def create_dataset(texts, tokenizer, max_length=128):
    """Tokenize each text to a fixed length and return a Hugging Face Dataset."""
    tokenized_texts = []

    for text in texts:
        # Tokenize one text, padding or truncating it to exactly max_length tokens
        tokens = tokenizer(
            text,
            truncation=True,
            padding="max_length",
            max_length=max_length,
            return_tensors="pt"
        )
        # The tokenizer returns tensors of shape (1, max_length); drop the batch dimension
        input_ids = tokens["input_ids"].squeeze(0)
        tokenized_texts.append({
            "input_ids": input_ids,
            "attention_mask": tokens["attention_mask"].squeeze(0),
            "labels": input_ids.clone()  # For causal LM, labels start as a copy of input_ids
        })

    return Dataset.from_list(tokenized_texts)

# Create training dataset
train_dataset = create_dataset(sample_data, tokenizer)
print(f"✅ Dataset created with {len(train_dataset)} samples")

📝 Creating sample training dataset...
✅ Dataset created with 8 samples
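
Because every sample is padded to `max_length`, most label positions are padding, and the loss should skip them. As far as I know, `DataCollatorForLanguageModeling` with `mlm=False` handles this by rebuilding the labels from `input_ids` and setting pad-token positions to `-100`. The sketch below shows the same idea in plain Python, using the attention mask to find the padding. The helper name `mask_padding_labels` and the token ids are hypothetical.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default


def mask_padding_labels(input_ids, attention_mask):
    """Copy input_ids into labels, replacing padded positions with IGNORE_INDEX."""
    return [
        token if keep == 1 else IGNORE_INDEX
        for token, keep in zip(input_ids, attention_mask)
    ]


# Hypothetical token ids; 50256 stands in for the padding id here.
input_ids = [15496, 0, 50256, 50256]
attention_mask = [1, 1, 0, 0]

labels = mask_padding_labels(input_ids, attention_mask)
print(labels)
```

Note that when the pad token is the same as the EOS token, as in this guide, masking by token id also hides genuine EOS tokens from the loss. Masking by attention mask avoids that.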


#### Step 4: Configure LoRA

In [5]:
print("⚙️ Configuring LoRA settings...")

# LoRA configuration
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,  # Task type
    r=8,                           # Rank of adaptation
    lora_alpha=32,                 # LoRA scaling parameter
    lora_dropout=0.1,              # LoRA dropout
    target_modules=["c_attn"],     # Target modules to apply LoRA
)

print("✅ LoRA configuration created:")
print(f"   - Rank (r): {lora_config.r}")
print(f"   - Alpha: {lora_config.lora_alpha}")
print(f"   - Dropout: {lora_config.lora_dropout}")
print(f"   - Target modules: {lora_config.target_modules}")

⚙️ Configuring LoRA settings...
✅ LoRA configuration created:
   - Rank (r): 8
   - Alpha: 32
   - Dropout: 0.1
   - Target modules: {'c_attn'}
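
To make the roles of `r` and `lora_alpha` concrete, here is a minimal plain-Python sketch of a LoRA forward pass. It assumes the usual LoRA setup: the update is scaled by `alpha / r`, and matrix `B` starts at zero so the adapted layer initially behaves exactly like the frozen one. The sizes and helper names (`matvec`, `lora_forward`) are toy choices for illustration, not PEFT internals.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha, r):
    """Compute W x + (alpha / r) * B (A x)."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


random.seed(0)
d_out, d_in, r, alpha = 3, 4, 2, 8  # toy sizes; alpha / r = 4, the same ratio as 32 / 8

W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]  # frozen base weight
A = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(r)]      # trainable, random init
B = [[0.0] * r for _ in range(d_out)]                                  # trainable, zero init
x = [1.0, -2.0, 0.5, 3.0]

y_base = matvec(W, x)
y_lora = lora_forward(W, A, B, x, alpha, r)
print(y_lora == y_base)  # the adapter contributes nothing until B is trained
```

A larger `lora_alpha` relative to `r` amplifies whatever the adapter learns, so the two values are usually tuned together.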


#### Step 5: Apply LoRA to Model

In [6]:
print("🔧 Applying LoRA to the model...")

# Get PEFT model
peft_model = get_peft_model(model, lora_config)

# Print trainable parameters
peft_model.print_trainable_parameters()

print("✅ LoRA applied successfully!")

🔧 Applying LoRA to the model...
trainable params: 294,912 || all params: 124,734,720 || trainable%: 0.2364
✅ LoRA applied successfully!
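
The trainable-parameter count reported above can be reproduced by hand. The sketch assumes DialoGPT-small follows the GPT-2 small layout: 12 transformer blocks with hidden size 768, where `c_attn` is a fused query/key/value projection from 768 to 2304 features. Those architecture numbers are assumptions not stated in this guide, but they reproduce the printed figures.

```python
def lora_param_count(d_in, d_out, r, num_layers):
    """Trainable parameters when one d_in x d_out matrix per layer gets a rank-r adapter."""
    per_layer = r * d_in + d_out * r  # A is r x d_in, B is d_out x r
    return per_layer * num_layers


HIDDEN = 768      # assumed hidden size
NUM_LAYERS = 12   # assumed number of transformer blocks
BASE_PARAMS = 124_439_808  # parameter count printed when the base model was loaded

trainable = lora_param_count(d_in=HIDDEN, d_out=3 * HIDDEN, r=8, num_layers=NUM_LAYERS)
total = BASE_PARAMS + trainable

print(f"trainable params: {trainable:,}")
print(f"all params: {total:,}")
print(f"trainable%: {100 * trainable / total:.4f}")
```

Doubling `r` doubles the adapter size, and adding more entries to `target_modules` adds one such term per extra matrix.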




#### Step 6: Set Up Training

In [8]:
print("📋 Setting up training arguments...")

training_args = TrainingArguments(
    output_dir="./lora-output",           # Where checkpoints are written
    num_train_epochs=2,                   # Passes over the 8-sample dataset
    per_device_train_batch_size=2,        # Samples per forward/backward pass
    gradient_accumulation_steps=2,        # Effective batch size = 2 x 2 = 4
    warmup_steps=10,                      # Learning-rate warmup steps
    learning_rate=5e-4,                   # Peak learning rate
    logging_steps=5,                      # Log the loss every 5 optimizer steps
    save_steps=50,                        # Save a checkpoint every 50 optimizer steps
    eval_strategy="no",                   # No evaluation in this demo
    remove_unused_columns=False,          # Keep all dataset columns
    report_to=[],                         # Disable external loggers such as wandb
)

print("✅ Training arguments configured!")

📋 Setting up training arguments...
✅ Training arguments configured!


#### Step 7: Create the Data Collator

In [9]:
print("📦 Creating data collator...")

data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=False,  # We're doing causal language modeling, not masked LM
)

print("✅ Data collator created!")

📦 Creating data collator...
✅ Data collator created!


#### Step 8: Initialize the Trainer

In [12]:
print("👨‍🏫 Initializing trainer...")

trainer = Trainer(
    model=peft_model,
    args=training_args,
    train_dataset=train_dataset,
    data_collator=data_collator,
)

print("✅ Trainer initialized!")

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.


👨‍🏫 Initializing trainer...
✅ Trainer initialized!


#### Step 9: Start Training

In [13]:
print("🚀 Starting training...")
print("This might take a few minutes...")

# Train the model
trainer.train()

print("✅ Training completed!")

🚀 Starting training...
This might take a few minutes...


`loss_type=None` was set in the config but it is unrecognised.Using the default loss: `ForCausalLMLoss`.


Step,Training Loss


✅ Training completed!
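
The loss table in the output above has no rows. A likely explanation is that the run finishes before the first logging step. The sketch below approximates how many optimizer steps the settings in this guide produce on a single device. The helper `optimizer_steps` is hypothetical and only mirrors the usual batches-per-epoch arithmetic, not the exact `Trainer` internals.

```python
import math


def optimizer_steps(num_samples, batch_size, grad_accum, epochs):
    """Approximate number of optimizer updates for a single-device run."""
    batches_per_epoch = math.ceil(num_samples / batch_size)
    updates_per_epoch = max(batches_per_epoch // grad_accum, 1)
    return updates_per_epoch * epochs


total_steps = optimizer_steps(num_samples=8, batch_size=2, grad_accum=2, epochs=2)

print(f"optimizer steps: {total_steps}")
print(f"reaches logging_steps=5: {total_steps >= 5}")
print(f"finishes warmup_steps=10: {total_steps >= 10}")
```

Under these assumptions the run performs only 4 updates, so no loss is logged and the learning rate is still warming up when training ends. For a demo this small, `logging_steps=1` and a much lower `warmup_steps` would make the training signal visible.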


#### Step 10: Save the LoRA Adapter

In [14]:
print("💾 Saving LoRA adapter...")

# Save only the LoRA adapter (very small file!)
peft_model.save_pretrained("./lora-adapter")

print("✅ LoRA adapter saved to './lora-adapter'")

💾 Saving LoRA adapter...
✅ LoRA adapter saved to './lora-adapter'


#### Step 11: Test the Fine-Tuned Model

In [16]:
print("🧪 Testing the fine-tuned model...")

def generate_response(prompt, max_length=50):
    """Generate response using the fine-tuned model"""
    # Tokenize with attention mask and proper device handling
    inputs = tokenizer(
        prompt,
        return_tensors="pt",
        padding=True,
        truncation=True,
        max_length=128
    )

    # Move inputs to the same device as the model
    device = next(peft_model.parameters()).device
    inputs = {k: v.to(device) for k, v in inputs.items()}

    peft_model.eval()  # Switch off dropout (including LoRA dropout) for generation
    with torch.no_grad():
        outputs = peft_model.generate(
            input_ids=inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
            max_length=max_length,
            num_return_sequences=1,
            temperature=0.7,
            pad_token_id=tokenizer.eos_token_id,
            eos_token_id=tokenizer.eos_token_id,
            do_sample=True
        )

    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return response

# Test with some prompts
test_prompts = [
    "Hello",
    "How are you",
    "Tell me about"
]

print("\n🔍 Testing model responses:")
print("=" * 50)

for prompt in test_prompts:
    response = generate_response(prompt)
    print(f"Prompt: {prompt}")
    print(f"Response: {response}")
    print("-" * 30)

🧪 Testing the fine-tuned model...

🔍 Testing model responses:
Prompt: Hello
Response: Hellothelessisine
------------------------------
Prompt: How are you
Response: How are you excited enough to stilled Trian's name
------------------------------
Prompt: Tell me about
Response: Tell me about 5 minutes confirmer confirmer
------------------------------
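
The `temperature=0.7` and `do_sample=True` settings control how the next token is drawn. As a rough illustration, temperature divides the logits before the softmax, so values below 1 concentrate probability on the highest-scoring tokens. The logits and the helper `softmax_with_temperature` below are hypothetical, not the library's internal code.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing them by the temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens

p_default = softmax_with_temperature(logits, 1.0)
p_sharp = softmax_with_temperature(logits, 0.7)

print([round(p, 3) for p in p_default])
print([round(p, 3) for p in p_sharp])
print(p_sharp[0] > p_default[0])  # lower temperature favors the top token more strongly
```

Sampling is random, so rerunning the test cell will generally produce different responses.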


#### Step 12: Compare Model Sizes

In [17]:
print("\n📊 Model Size Comparison:")
print("=" * 50)

# Total parameters of the wrapped model. get_peft_model() injects the LoRA
# layers into `model` in place, so this count already includes the adapter weights.
total_params = model.num_parameters()

# LoRA adapter parameters (the only ones with requires_grad=True)
lora_params = sum(p.numel() for p in peft_model.parameters() if p.requires_grad)

print(f"Total parameters (base + LoRA): {total_params:,}")
print(f"LoRA trainable parameters: {lora_params:,}")
print(f"Percentage of trainable parameters: {(lora_params/total_params)*100:.2f}%")


📊 Model Size Comparison:
Total parameters (base + LoRA): 124,734,720
LoRA trainable parameters: 294,912
Percentage of trainable parameters: 0.24%
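
Parameter counts translate directly into rough storage sizes. The estimate below assumes the adapter is stored in float32 (4 bytes per parameter) and the base weights in float16 (2 bytes per parameter). Files on disk also carry metadata and may use a different dtype, so treat the numbers as ballpark figures.

```python
def size_mb(num_params, bytes_per_param):
    """Estimate raw tensor storage in megabytes (1 MB = 1e6 bytes)."""
    return num_params * bytes_per_param / 1e6


adapter_mb = size_mb(294_912, 4)          # assumed float32 adapter weights
base_fp16_mb = size_mb(124_439_808, 2)    # assumed float16 base weights

print(f"LoRA adapter: ~{adapter_mb:.2f} MB")
print(f"Base model:   ~{base_fp16_mb:.1f} MB")
print(f"Ratio:        ~{base_fp16_mb / adapter_mb:.0f}x")
```

This is why one base model plus many task-specific adapters is much cheaper to store than many fully fine-tuned copies.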


#### Step 13: Load the LoRA Adapter

In [18]:
print("\n🔄 Demonstrating how to load LoRA adapter...")

from peft import PeftModel

# Load base model again (simulating fresh start)
base_model = AutoModelForCausalLM.from_pretrained(model_name)

# Load the LoRA adapter
loaded_peft_model = PeftModel.from_pretrained(base_model, "./lora-adapter")

print("✅ LoRA adapter loaded successfully!")
print("✅ Model is ready for inference!")


🔄 Demonstrating how to load LoRA adapter...
✅ LoRA adapter loaded successfully!
✅ Model is ready for inference!
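
Once an adapter is loaded, it can also be folded into the base weights so that inference needs no extra matrix multiplications. PEFT exposes this for LoRA models as `merge_and_unload()`. The plain-Python sketch below only illustrates the underlying arithmetic with toy matrices; the helper names and values are hypothetical.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def matvec(M, v):
    """Multiply a matrix by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def merge_lora(W, A, B, alpha, r):
    """Return W + (alpha / r) * B A as a single merged weight matrix."""
    scale = alpha / r
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, delta)]


W = [[1.0, 2.0], [3.0, 4.0]]  # frozen base weight (toy values)
A = [[0.5, -1.0]]             # r x d_in with r = 1
B = [[2.0], [1.0]]            # d_out x r
alpha, r = 2, 1
x = [1.0, 1.0]

# Adapter path: base output plus the scaled low-rank update
y_adapter = [b + (alpha / r) * u for b, u in zip(matvec(W, x), matvec(B, matvec(A, x)))]

# Merged path: a single matrix multiplication
W_merged = merge_lora(W, A, B, alpha, r)
y_merged = matvec(W_merged, x)

print(W_merged)
print(y_merged == y_adapter)
```

Merging trades modularity for speed: the merged model no longer keeps the adapter separate, so keep the saved adapter files if you want to swap tasks later.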
