# Day 26: LoRA Implementation - Part 2

In this notebook, we'll explore more advanced LoRA techniques, including:

1. LoRA for generative models
2. Adapter merging
3. Multi-task adaptation
4. Different target modules and freezing strategies

We'll build on the concepts from Part 1 and see how LoRA can be applied in more complex scenarios.

## 1. Setup and Dependencies

In [None]:
!pip install -q transformers datasets peft evaluate accelerate bitsandbytes

In [None]:
import os
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    TrainingArguments,
    Trainer,
    DataCollatorForLanguageModeling
)
from peft import (
    get_peft_model,
    LoraConfig,
    TaskType,
    PeftModel,
    PeftConfig
)
import numpy as np

# Check if GPU is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

## 2. LoRA for Generative Models

Let's apply LoRA to a generative model like GPT-2 for a text generation task.

In [None]:
# Load a pre-trained generative model
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 doesn't have a pad token by default

model = AutoModelForCausalLM.from_pretrained(model_name)

# Define a function to count parameters
def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

print(f"Model has {count_parameters(model):,} trainable parameters")

### Configure LoRA for the Generative Model

For generative models, we'll target both attention and MLP layers for more comprehensive adaptation.

In [None]:
# Define LoRA configuration for generative model
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,  # Causal language modeling
    r=16,                          # Higher rank for more capacity
    lora_alpha=32,                 # Alpha scaling factor
    lora_dropout=0.05,             # Dropout probability
    # Target both attention and MLP layers ("c_proj" matches attn.c_proj and mlp.c_proj)
    target_modules=["c_attn", "c_proj", "c_fc"],
    bias="none",
)

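# Count all parameters before wrapping: get_peft_model freezes the base weights
# in place, so count_parameters(model) would afterwards see only the LoRA weights
total_params = sum(p.numel() for p in model.parameters())

# Create the PEFT model
peft_model = get_peft_model(model, lora_config)
peft_model = peft_model.to(device)

# Print trainable parameters
trainable_params = count_parameters(peft_model)
print(f"Full model parameters: {total_params:,}")
print(f"LoRA model trainable parameters: {trainable_params:,}")
print(f"Trainable fraction: {trainable_params / total_params * 100:.2f}%")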
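
As a sanity check on the printed numbers, we can work out the LoRA parameter count by hand: each adapted layer with `d_in` inputs and `d_out` outputs gains `r * (d_in + d_out)` weights. The sketch below assumes GPT-2 small's shapes (hidden size 768, 12 blocks) and that `c_proj` matches both the attention and MLP projections. `lora_params` is a hypothetical helper, not part of PEFT.

```python
# Assumed GPT-2 small shapes: hidden size 768, 12 transformer blocks
HIDDEN, N_BLOCKS, RANK = 768, 12, 16

# (d_in, d_out) of every layer matched by target_modules within one block
layers_per_block = {
    "attn.c_attn": (HIDDEN, 3 * HIDDEN),  # fused query/key/value projection
    "attn.c_proj": (HIDDEN, HIDDEN),      # attention output projection
    "mlp.c_fc": (HIDDEN, 4 * HIDDEN),     # MLP up-projection
    "mlp.c_proj": (4 * HIDDEN, HIDDEN),   # MLP down-projection
}

def lora_params(d_in, d_out, r):
    """LoRA adds A (r x d_in) and B (d_out x r) next to a frozen weight."""
    return r * d_in + d_out * r

per_block = sum(lora_params(d_in, d_out, RANK) for d_in, d_out in layers_per_block.values())
total_lora = per_block * N_BLOCKS

print(f"LoRA parameters per block: {per_block:,}")
print(f"LoRA parameters in total:  {total_lora:,}")
```

Under these assumptions the total is 2,359,296 adapter weights, a little under 2% of GPT-2 small's roughly 124M parameters.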

### Prepare a Dataset for Fine-tuning

We'll use a small dataset of poetry to fine-tune our generative model.

In [None]:
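# Load a poetry dataset
dataset = load_dataset("merve/poetry")
print(dataset)

# Column names are an assumption; check the printout above and adjust if needed.
# If there is no "title" column, fall back to "poem name" (assumed name).
title_col = "title" if "title" in dataset["train"].column_names else "poem name"

# Look at a few examples
for i in range(3):
    print(f"Example {i+1}:")
    print(f"Title: {dataset['train'][i][title_col]}")
    print(f"Content:\n{dataset['train'][i]['content'][:200]}...")
    print()

In [None]:
# Tokenize the dataset
def tokenize_function(examples):
    # Combine title and content
    texts = [f"Title: {title}\n\n{content}" for title, content in zip(examples[title_col], examples["content"])]
    return tokenizer(texts, truncation=True, max_length=512)

# Process the dataset, dropping every raw column so only the tokenizer outputs remain
tokenized_datasets = dataset.map(tokenize_function, batched=True, remove_columns=dataset["train"].column_names)

# Create smaller datasets for demonstration, capped by the actual dataset size
full_train = tokenized_datasets["train"]
n_eval = min(100, len(full_train) // 10)
n_train = min(1000, len(full_train) - n_eval)
train_dataset = full_train.select(range(n_train))                   # Up to 1000 examples for training
eval_dataset = full_train.select(range(n_train, n_train + n_eval))  # Up to 100 examples for evaluation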

# Create a data collator for language modeling
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)
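
To make the collator's role concrete, here is a rough pure-Python sketch of what `DataCollatorForLanguageModeling` with `mlm=False` produces: sequences are padded to a common length, and `labels` are a copy of `input_ids` with pad positions set to `-100` so the loss skips them. `collate_causal_lm` is a hypothetical stand-in written for illustration, not the library implementation, and it assumes right-side padding.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss ignores by default

def collate_causal_lm(sequences, pad_token_id):
    """Pad token-id lists on the right and build labels for causal LM training."""
    max_len = max(len(seq) for seq in sequences)
    batch = {"input_ids": [], "attention_mask": [], "labels": []}
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids = seq + [pad_token_id] * n_pad
        batch["input_ids"].append(input_ids)
        batch["attention_mask"].append([1] * len(seq) + [0] * n_pad)
        # Labels mirror the inputs; every pad-id position is excluded from the loss
        batch["labels"].append(
            [IGNORE_INDEX if tok == pad_token_id else tok for tok in input_ids]
        )
    return batch

# Toy batch; 50256 is GPT-2's EOS id, reused here as the pad id
batch = collate_causal_lm([[10, 11, 12, 13], [20, 21]], pad_token_id=50256)
print(batch["input_ids"])
print(batch["labels"])
```

One consequence of reusing EOS as the pad token is that genuine EOS tokens are masked out of the labels as well. The one-position shift between inputs and targets is not done here; the model applies it internally when computing the loss.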

### Train the Generative Model with LoRA

In [None]:
# Define training arguments
training_args = TrainingArguments(
    output_dir="./results/gpt2-poetry-lora",
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    num_train_epochs=3,
    weight_decay=0.01,
    eval_strategy="epoch",  # Named evaluation_strategy in transformers < 4.41
    save_strategy="epoch",
    load_best_model_at_end=True,
    push_to_hub=False,
    report_to="none",
)

# Create the trainer
trainer = Trainer(
    model=peft_model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,
    data_collator=data_collator,
)

In [None]:
# Train the model
trainer.train()
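
The evaluation loss that `Trainer` logs each epoch is a mean per-token cross-entropy, which is often easier to read as perplexity. A minimal sketch, assuming the loss is measured in nats (the PyTorch default); the loss values below are placeholders for illustration, not results from this run.

```python
import math

def perplexity(mean_cross_entropy):
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(mean_cross_entropy)

# Placeholder losses for illustration only
for loss in (4.0, 3.5, 3.0):
    print(f"eval_loss={loss:.2f} -> perplexity={perplexity(loss):.1f}")
```

With the trainer above, the real value would come from `trainer.evaluate()["eval_loss"]`.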

### Generate Text with the LoRA-adapted Model

In [None]:
# Save the LoRA adapter
peft_model_path = "./lora-gpt2-poetry"
peft_model.save_pretrained(peft_model_path)

# Function for text generation
def generate_text(model, tokenizer, prompt, max_length=200):
    model.eval()  # Disable dropout (including LoRA dropout) during generation
    inputs = tokenizer(prompt, return_tensors="pt").to(device)

    # Generate text
    with torch.no_grad():
        outputs = model.generate(
            input_ids=inputs.input_ids,
            attention_mask=inputs.attention_mask,
            max_length=max_length,  # Total length, prompt tokens included
            temperature=0.7,
            top_p=0.9,
            do_sample=True,
            no_repeat_ngram_size=2,
            pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token by default
        )

    # Decode the generated text
    generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return generated_text

# Test with the fine-tuned model
prompts = [
    "Title: Sunset\n\n",
    "Title: The Ocean's Whisper\n\n"
]

for prompt in prompts:
    generated_text = generate_text(peft_model, tokenizer, prompt)
    print(f"Prompt: {prompt}")
    print(f"Generated:\n{generated_text}")
    print("-" * 50)
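
The `temperature` and `top_p` arguments used in `generate_text` shape the distribution each next token is drawn from. The NumPy sketch below illustrates the idea on a toy vocabulary: logits are divided by the temperature, then only the most likely tokens whose cumulative probability reaches `top_p` are kept. `top_p_filter` is a hypothetical helper for illustration and is not how `transformers` implements it internally.

```python
import numpy as np

def top_p_filter(logits, temperature=0.7, top_p=0.9):
    """Return sampling probabilities after temperature scaling and nucleus filtering."""
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    probs = np.exp(scaled - scaled.max())  # subtract the max for numerical stability
    probs /= probs.sum()

    # Sort tokens from most to least likely and keep the smallest prefix
    # whose cumulative probability reaches top_p
    order = np.argsort(-probs)
    cumulative = np.cumsum(probs[order])
    n_keep = int(np.searchsorted(cumulative, top_p)) + 1
    kept = order[:n_keep]

    # Zero out everything else and renormalize
    filtered = np.zeros_like(probs)
    filtered[kept] = probs[kept]
    return filtered / filtered.sum()

logits = [2.0, 1.0, 0.5, -1.0, -3.0]  # toy 5-token vocabulary
probs = top_p_filter(logits)
rng = np.random.default_rng(0)
next_token = rng.choice(len(probs), p=probs)
print(np.round(probs, 3), next_token)
```

A lower temperature sharpens the distribution, and a lower `top_p` cuts off more of the tail, so both push generation toward safer, more repetitive text.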

## 3. Adapter Merging

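A useful property of LoRA is that an adapter can be merged into the base model's weights, which removes the extra adapter computation at inference time. Adapters can also be combined with one another, but here we focus on merging a single adapter into the base model.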

In [None]:
# Load the base model and record its parameter count before attaching the adapter
base_model = AutoModelForCausalLM.from_pretrained(model_name)
base_param_count = sum(p.numel() for p in base_model.parameters())

# Load the LoRA adapter (this injects LoRA layers into base_model in place)
peft_model_loaded = PeftModel.from_pretrained(base_model, peft_model_path)

# Merge the adapter with the base model
merged_model = peft_model_loaded.merge_and_unload()

# Check that the merged model has the same number of parameters as the base model
print(f"Base model parameters: {base_param_count:,}")
print(f"Merged model parameters: {sum(p.numel() for p in merged_model.parameters()):,}")

# Test the merged model
merged_model = merged_model.to(device)
for prompt in prompts:
    generated_text = generate_text(merged_model, tokenizer, prompt)
    print(f"Prompt: {prompt}")
    print(f"Generated (Merged Model):\n{generated_text}")
    print("-" * 50)
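
The reason merging leaves the parameter count unchanged is that the LoRA update `B @ A` has the same shape as the weight it adapts. The NumPy sketch below checks this on a single toy linear layer, using small made-up dimensions and random matrices in place of trained ones.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r, lora_alpha = 8, 6, 2, 4
scaling = lora_alpha / r  # LoRA scales the update by lora_alpha / r

W = rng.normal(size=(d_out, d_in))  # frozen base weight
A = rng.normal(size=(r, d_in))      # LoRA down-projection
B = rng.normal(size=(d_out, r))     # LoRA up-projection (random here, as if already trained)
x = rng.normal(size=(3, d_in))      # batch of 3 inputs

# Unmerged: base path plus the low-rank adapter path
y_adapter = x @ W.T + scaling * (x @ A.T) @ B.T

# Merged: fold the update into a single weight of the original shape
W_merged = W + scaling * (B @ A)
y_merged = x @ W_merged.T

print(W_merged.shape == W.shape)
print(np.allclose(y_adapter, y_merged))
```

Both paths give the same outputs up to floating-point error, so a merged model can be served like an ordinary checkpoint with no PEFT dependency.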

## 4. Multi-Task Adaptation with LoRA

Let's create a second adapter for a different task and see how we can switch between adapters.

In [None]:
# For demonstration, we'll create a simplified second task (technical writing)
# In a real scenario, you would train this on a different dataset

# Configuration a technical writing adapter might use (shown for illustration; not applied below)
technical_lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["c_attn", "c_proj"],
    bias="none",
)

# For demonstration purposes, we'll just save a copy of our poetry adapter with a different name
# In practice, you would train this on technical writing data
technical_adapter_path = "./lora-gpt2-technical"
peft_model.save_pretrained(technical_adapter_path)

In [None]:
# Load the base model again and attach both adapters under separate names
base_model = AutoModelForCausalLM.from_pretrained(model_name)
multi_adapter_model = PeftModel.from_pretrained(base_model, peft_model_path, adapter_name="poetry")
multi_adapter_model.load_adapter(technical_adapter_path, adapter_name="technical")
multi_adapter_model = multi_adapter_model.to(device)

# Function to activate an adapter and generate text
def generate_with_adapter(adapter_name, prompt):
    # Switch the active adapter; the base weights are shared and stay untouched
    multi_adapter_model.set_adapter(adapter_name)

    # Generate text
    return generate_text(multi_adapter_model, tokenizer, prompt)

# Test with different adapters
prompt = "Title: The Future of AI\n\n"

print("Using Poetry Adapter:")
poetry_output = generate_with_adapter("poetry", prompt)
print(poetry_output)
print("-" * 50)

print("Using Technical Adapter:")
technical_output = generate_with_adapter("technical", prompt)
print(technical_output)
print("-" * 50)
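
Conceptually, switching adapters only changes which pair of small matrices is added on top of a shared frozen weight. The toy NumPy layer below sketches that idea; `MultiAdapterLinear` and its methods are hypothetical names chosen for this illustration and do not reflect PEFT's internals.

```python
import numpy as np

class MultiAdapterLinear:
    """Toy linear layer: one frozen weight plus named LoRA adapters."""

    def __init__(self, weight):
        self.weight = weight  # shared base weight, never modified
        self.adapters = {}    # name -> (A, B, scaling)
        self.active = None

    def add_adapter(self, name, A, B, scaling):
        self.adapters[name] = (A, B, scaling)

    def set_adapter(self, name):
        if name not in self.adapters:
            raise KeyError(f"unknown adapter: {name}")
        self.active = name

    def __call__(self, x):
        y = x @ self.weight.T
        if self.active is not None:
            A, B, scaling = self.adapters[self.active]
            y = y + scaling * (x @ A.T) @ B.T
        return y

rng = np.random.default_rng(0)
layer = MultiAdapterLinear(rng.normal(size=(6, 8)))
for name in ("poetry", "technical"):
    # Random matrices stand in for separately trained adapters
    layer.add_adapter(name, rng.normal(size=(2, 8)), rng.normal(size=(6, 2)), scaling=2.0)

x = rng.normal(size=(1, 8))
layer.set_adapter("poetry")
y_poetry = layer(x)
layer.set_adapter("technical")
y_technical = layer(x)
print(np.allclose(y_poetry, y_technical))
```

Because only the small `A` and `B` matrices differ per task, many adapters can share one copy of the base model in memory.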

## 5. Different Target Modules and Freezing Strategies

Let's explore how targeting different layers affects the adaptation.

In [None]:
# Define different LoRA configurations
lora_configs = {
    "attention_only": LoraConfig(
        task_type=TaskType.CAUSAL_LM,
        r=16,
        lora_alpha=32,
        lora_dropout=0.05,
        target_modules=["c_attn"],  # Only query, key, value projections
        bias="none",
    ),
    "attention_output": LoraConfig(
        task_type=TaskType.CAUSAL_LM,
        r=16,
        lora_alpha=32,
        lora_dropout=0.05,
        target_modules=["attn.c_proj"],  # Only attention output projection ("c_proj" alone also matches mlp.c_proj)
        bias="none",
    ),
    "mlp_only": LoraConfig(
        task_type=TaskType.CAUSAL_LM,
        r=16,
        lora_alpha=32,
        lora_dropout=0.05,
        target_modules=["c_fc"],  # Only the MLP up-projection (add "mlp.c_proj" to cover the down-projection)
        bias="none",
    ),
    "comprehensive": LoraConfig(
        task_type=TaskType.CAUSAL_LM,
        r=16,
        lora_alpha=32,
        lora_dropout=0.05,
        target_modules=["c_attn", "c_proj", "c_fc"],  # All attention and MLP projections
        bias="none",
    ),
}

# Compare the number of trainable parameters for each configuration
for name, config in lora_configs.items():
    # Reload the base model each time: get_peft_model modifies it in place,
    # so reusing one instance would stack adapters and skew the counts
    base_model = AutoModelForCausalLM.from_pretrained(model_name)
    total_params = sum(p.numel() for p in base_model.parameters())
    model_with_lora = get_peft_model(base_model, config)
    trainable_params = count_parameters(model_with_lora)
    print(f"{name}: {trainable_params:,} trainable parameters ({trainable_params / total_params * 100:.2f}%)")
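
When `target_modules` is a list of strings, PEFT (as far as we understand its behavior) adapts every module whose full name equals an entry or ends with `.` followed by that entry. The sketch below mimics that suffix rule on GPT-2's module names to show which layers each target string would select. `matches_target` is a hypothetical helper, not a PEFT function.

```python
# Assumed GPT-2 module names for the first block (the pattern repeats for h.1 ... h.11)
module_names = [
    "transformer.h.0.attn.c_attn",
    "transformer.h.0.attn.c_proj",
    "transformer.h.0.mlp.c_fc",
    "transformer.h.0.mlp.c_proj",
]

def matches_target(module_name, target_modules):
    """Suffix rule: exact name, or name ending in '.<target>'."""
    return any(
        module_name == target or module_name.endswith("." + target)
        for target in target_modules
    )

for targets in (["c_attn"], ["c_proj"], ["attn.c_proj"], ["c_fc"]):
    matched = [name for name in module_names if matches_target(name, targets)]
    print(f"{targets} -> {matched}")
```

Note that a bare `"c_proj"` selects both the attention output projection and the MLP down-projection, while `"attn.c_proj"` selects only the former.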

## Conclusion

In this notebook, we've explored advanced LoRA techniques:

1. **LoRA for Generative Models**: We applied LoRA to GPT-2 for poetry generation, showing how it works for generative tasks.

2. **Adapter Merging**: We demonstrated how to merge LoRA adapters with the base model for deployment.

3. **Multi-Task Adaptation**: We showed how multiple adapters can be used with a single base model for different tasks.

4. **Different Target Modules**: We compared how targeting different layers changes the number of trainable parameters.

These techniques make LoRA a versatile tool for efficient fine-tuning of large language models. By choosing which layers to adapt and keeping a separate adapter per task, we can specialize a single base model while training only a small fraction of its parameters.