# Custom LoRA Recipe Tutorial

This tutorial shows how to create custom training recipes in SparkTrainer. It covers:

1. Understanding the Recipe System
2. Creating a Custom LoRA Recipe
3. Configuring Advanced LoRA Parameters
4. Implementing Custom Training Logic
5. Registering and Using Your Recipe

## What is a Training Recipe?

A **training recipe** in SparkTrainer is a reusable configuration that defines:
- How to prepare data
- How to build the model
- How to train the model
- How to evaluate the model

This abstraction makes it easy to experiment with different training strategies.
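
The four responsibilities above map naturally onto an abstract interface. The sketch below is illustrative only: `RecipeSketch` and `EchoRecipe` are hypothetical names, and SparkTrainer's actual base class may expose different methods or signatures.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class RecipeSketch(ABC):
    """Hypothetical recipe interface: one method per stage listed above."""

    @abstractmethod
    def prepare_data(self) -> Any: ...

    @abstractmethod
    def build_model(self) -> Any: ...

    @abstractmethod
    def train(self, model: Any, data: Any) -> Any: ...

    @abstractmethod
    def evaluate(self, model: Any, data: Any) -> Dict[str, float]: ...

    def run(self) -> Dict[str, float]:
        """Run the stages in order and return the evaluation metrics."""
        data = self.prepare_data()
        model = self.build_model()
        trained = self.train(model, data)
        return self.evaluate(trained, data)


class EchoRecipe(RecipeSketch):
    """Toy recipe that records the order in which the stages are called."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def prepare_data(self) -> Any:
        self.calls.append("prepare_data")
        return [1, 2, 3]

    def build_model(self) -> Any:
        self.calls.append("build_model")
        return "model"

    def train(self, model: Any, data: Any) -> Any:
        self.calls.append("train")
        return model

    def evaluate(self, model: Any, data: Any) -> Dict[str, float]:
        self.calls.append("evaluate")
        return {"num_examples": float(len(data))}


recipe_demo = EchoRecipe()
metrics = recipe_demo.run()
print(recipe_demo.calls, metrics)
```

Keeping each stage behind its own method is what lets one strategy (for example, a different `build_model`) be swapped without touching the others.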

In [None]:
# Install required packages
!pip install torch transformers peft bitsandbytes datasets accelerate

## 1. Recipe Interface

In SparkTrainer, all recipes inherit from the `TrainerRecipe` base class. The cell below defines the configuration dataclasses that a recipe consumes:

In [None]:
import sys
sys.path.append('/app/src')  # Adjust path as needed

from dataclasses import dataclass, field
from typing import Dict, Any, Optional, Tuple
from abc import ABC, abstractmethod
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

@dataclass
class DataConfig:
    """Configuration for data loading"""
    dataset_path: str
    batch_size: int = 4
    max_length: int = 2048
    train_split: float = 0.9
    val_split: float = 0.05
    test_split: float = 0.05

@dataclass
class ModelConfig:
    """Configuration for model setup"""
    model_name: str
    quantization: Optional[str] = None  # '4bit', '8bit', None
    device_map: str = 'auto'
    torch_dtype: str = 'float16'

@dataclass
class LoRAConfig:
    """LoRA-specific configuration"""
    r: int = 16  # Rank of LoRA matrices
    lora_alpha: int = 32  # Scaling factor
    lora_dropout: float = 0.05
    target_modules: list = field(default_factory=lambda: ['q_proj', 'v_proj'])
    bias: str = 'none'
    task_type: str = 'CAUSAL_LM'

@dataclass
class TrainingConfig:
    """Configuration for training"""
    learning_rate: float = 2e-4
    num_epochs: int = 3
    warmup_steps: int = 100
    gradient_accumulation_steps: int = 4
    max_grad_norm: float = 1.0
    optimizer: str = 'adamw_torch'
    lr_scheduler: str = 'cosine'

print("Recipe configuration classes loaded!")
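
The `train_split`, `val_split`, and `test_split` fields in `DataConfig` are only meaningful if they sum to 1. A minimal sketch of a check that could run before data loading follows; `split_sizes` is a hypothetical helper, not part of SparkTrainer, and its defaults simply mirror the `DataConfig` defaults above.

```python
import math
from typing import Tuple


def split_sizes(num_examples: int, train: float = 0.9,
                val: float = 0.05, test: float = 0.05) -> Tuple[int, int, int]:
    """Turn split fractions into example counts (hypothetical helper)."""
    fractions = (train, val, test)
    if any(f < 0 for f in fractions):
        raise ValueError("Split fractions must be non-negative")
    # Compare with a tolerance, since float sums can be off by rounding error
    if not math.isclose(sum(fractions), 1.0, abs_tol=1e-9):
        raise ValueError(f"Split fractions sum to {sum(fractions)}, expected 1.0")
    n_val = int(num_examples * val)
    n_test = int(num_examples * test)
    # Give the rounding remainder to the training split
    n_train = num_examples - n_val - n_test
    return n_train, n_val, n_test


print(split_sizes(10_000))
```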

## 2. Create a Custom LoRA Recipe

Let's create a custom recipe that implements **LoRA with 4-bit quantization** (QLoRA):

In [None]:
class CustomQLoRARecipe:
    """
    Custom QLoRA Recipe for efficient fine-tuning.
    
    This recipe implements:
    - 4-bit quantization for memory efficiency
    - LoRA adapters for parameter-efficient training
    - Gradient checkpointing for reduced memory
    """
    
    def __init__(
        self,
        model_config: ModelConfig,
        lora_config: LoRAConfig,
        training_config: TrainingConfig
    ):
        self.model_config = model_config
        self.lora_config = lora_config
        self.training_config = training_config
        self.model = None
        self.tokenizer = None
    
    def prepare_quantization_config(self) -> BitsAndBytesConfig:
        """
        Configure 4-bit quantization using bitsandbytes.
        
        Returns:
            BitsAndBytesConfig for 4-bit NormalFloat quantization
        """
        return BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_use_double_quant=True,  # Nested quantization
            bnb_4bit_quant_type="nf4",  # NormalFloat4 quantization
            bnb_4bit_compute_dtype=getattr(torch, self.model_config.torch_dtype),  # Compute dtype from ModelConfig, e.g. torch.bfloat16
        )
    
    def build_model(self) -> Tuple[Any, Any]:
        """
        Build the quantized model with LoRA adapters.
        
        Returns:
            Tuple of (model, tokenizer)
        """
        print(f"Loading model: {self.model_config.model_name}")
        
        # Load tokenizer
        self.tokenizer = AutoTokenizer.from_pretrained(
            self.model_config.model_name,
            trust_remote_code=True
        )
        
        # Set pad token if not present
        if self.tokenizer.pad_token is None:
            self.tokenizer.pad_token = self.tokenizer.eos_token
        
        # Configure quantization
        quantization_config = self.prepare_quantization_config()
        
        # Load base model with quantization
        base_model = AutoModelForCausalLM.from_pretrained(
            self.model_config.model_name,
            quantization_config=quantization_config,
            device_map=self.model_config.device_map,
            trust_remote_code=True,
        )
        
        print(f"Base model loaded. Parameters: {base_model.num_parameters():,}")
        
        # Prepare model for k-bit training
        base_model = prepare_model_for_kbit_training(
            base_model,
            use_gradient_checkpointing=True
        )
        
        # Configure LoRA
        peft_config = LoraConfig(
            r=self.lora_config.r,
            lora_alpha=self.lora_config.lora_alpha,
            lora_dropout=self.lora_config.lora_dropout,
            target_modules=self.lora_config.target_modules,
            bias=self.lora_config.bias,
            task_type=self.lora_config.task_type,
        )
        
        # Apply LoRA
        self.model = get_peft_model(base_model, peft_config)
        
        # Print trainable parameters. PEFT's helper accounts for packed 4-bit
        # weights; summing p.numel() directly undercounts quantized layers.
        trainable_params, all_params = self.model.get_nb_trainable_parameters()
        trainable_percent = 100 * trainable_params / all_params
        
        print(f"\nLoRA Configuration:")
        print(f"  Rank (r): {self.lora_config.r}")
        print(f"  Alpha: {self.lora_config.lora_alpha}")
        print(f"  Dropout: {self.lora_config.lora_dropout}")
        print(f"  Target modules: {self.lora_config.target_modules}")
        print(f"\nTrainable Parameters:")
        print(f"  Total: {all_params:,}")
        print(f"  Trainable: {trainable_params:,} ({trainable_percent:.2f}%)")
        
        return self.model, self.tokenizer
    
    def prepare_data(self, data_config: DataConfig):
        """
        Prepare and tokenize training data.
        
        Args:
            data_config: Configuration for data loading
        """
        from datasets import load_dataset
        
        print(f"Loading dataset: {data_config.dataset_path}")
        
        # Load dataset
        dataset = load_dataset(data_config.dataset_path)
        
        # Tokenization function
        def tokenize_function(examples):
            return self.tokenizer(
                examples['text'],
                truncation=True,
                max_length=data_config.max_length,
                padding='max_length'
            )
        
        # Tokenize dataset
        tokenized_dataset = dataset.map(
            tokenize_function,
            batched=True,
            remove_columns=dataset['train'].column_names
        )
        
        print(f"Dataset prepared. Train size: {len(tokenized_dataset['train'])}")
        
        return tokenized_dataset
    
    def train(self, dataset, output_dir: str = './outputs'):
        """
        Train the model using the Hugging Face Trainer.
        
        Args:
            dataset: Tokenized dataset
            output_dir: Directory to save checkpoints
        """
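        from transformers import DataCollatorForLanguageModeling, Trainer, TrainingArguments
        
        # Evaluate only if the dataset provides a validation or test split
        eval_dataset = dataset.get('validation', dataset.get('test'))
        
        training_args = TrainingArguments(
            output_dir=output_dir,
            num_train_epochs=self.training_config.num_epochs,
            per_device_train_batch_size=4,  # Keep in sync with DataConfig.batch_size
            gradient_accumulation_steps=self.training_config.gradient_accumulation_steps,
            warmup_steps=self.training_config.warmup_steps,
            learning_rate=self.training_config.learning_rate,
            # Match mixed precision to the configured dtype
            bf16=self.model_config.torch_dtype == 'bfloat16',
            fp16=self.model_config.torch_dtype == 'float16',
            logging_steps=10,
            save_strategy='epoch',
            # Called `evaluation_strategy` in older transformers releases
            eval_strategy='epoch' if eval_dataset is not None else 'no',
            max_grad_norm=self.training_config.max_grad_norm,
            optim=self.training_config.optimizer,
            lr_scheduler_type=self.training_config.lr_scheduler,
        )
        
        # The causal-LM collator copies input_ids into labels so the model returns a loss
        data_collator = DataCollatorForLanguageModeling(tokenizer=self.tokenizer, mlm=False)
        
        trainer = Trainer(
            model=self.model,
            args=training_args,
            train_dataset=dataset['train'],
            eval_dataset=eval_dataset,
            data_collator=data_collator,
        )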
        
        print("Starting training...")
        trainer.train()
        
        # Save the final model
        trainer.save_model(f"{output_dir}/final")
        
        print(f"Training complete! Model saved to {output_dir}/final")
        
        return trainer

print("Custom QLoRA Recipe defined!")
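
To get a feel for what 4-bit quantization with nested (double) quantization saves, the weight footprint can be estimated by hand. The sketch below assumes the scheme described in the QLoRA paper: 4 bits per weight, one quantization constant per block of 64 weights, and, with double quantization, 8-bit constants plus one FP32 constant per 256 blocks. `estimate_nf4_weight_gb` is a hypothetical helper, and the estimate ignores layers that stay in higher precision, activations, LoRA weights, and optimizer state.

```python
def estimate_nf4_weight_gb(num_params: float, block_size: int = 64,
                           double_quant: bool = True) -> float:
    """Rough footprint in GB of 4-bit weights plus their quantization constants."""
    if double_quant:
        # 8-bit constant per block, plus one FP32 constant per 256 blocks
        overhead_bits = 8 / block_size + 32 / (block_size * 256)
    else:
        # One FP32 constant per block
        overhead_bits = 32 / block_size
    bits_per_param = 4 + overhead_bits
    return num_params * bits_per_param / 8 / 1e9


params_7b = 7e9
fp16_gb = params_7b * 2 / 1e9  # 2 bytes per parameter
print(f"FP16 weights:        {fp16_gb:.2f} GB")
print(f"NF4:                 {estimate_nf4_weight_gb(params_7b, double_quant=False):.2f} GB")
print(f"NF4 + double quant:  {estimate_nf4_weight_gb(params_7b):.2f} GB")
```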

## 3. Configure and Run the Recipe

Now let's configure and run our custom recipe:

In [None]:
# Configure model
model_config = ModelConfig(
    model_name='meta-llama/Llama-2-7b-hf',  # Replace with your model
    quantization='4bit',
    device_map='auto',
    torch_dtype='bfloat16'
)

# Configure LoRA with custom parameters
lora_config = LoRAConfig(
    r=32,  # Higher rank for more capacity
    lora_alpha=64,  # 2x rank is typical
    lora_dropout=0.1,
    target_modules=['q_proj', 'k_proj', 'v_proj', 'o_proj'],  # Apply to all attention
    bias='none',
    task_type='CAUSAL_LM'
)

# Configure training
training_config = TrainingConfig(
    learning_rate=2e-4,
    num_epochs=3,
    warmup_steps=100,
    gradient_accumulation_steps=4,
    max_grad_norm=1.0,
    optimizer='adamw_torch',
    lr_scheduler='cosine'
)

# Create recipe instance
recipe = CustomQLoRARecipe(
    model_config=model_config,
    lora_config=lora_config,
    training_config=training_config
)

print("Recipe configured!")
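
The cost of a higher rank can be worked out before loading anything: each adapted linear layer of shape `(d_in, d_out)` gains `r * (d_in + d_out)` trainable parameters. The sketch below assumes Llama-2-7B dimensions (hidden size 4096, 32 decoder layers, square attention projections); `lora_param_count` is a hypothetical helper.

```python
from typing import Dict, Tuple


def lora_param_count(r: int, module_shapes: Dict[str, Tuple[int, int]],
                     num_layers: int) -> int:
    """Trainable LoRA parameters: r * (d_in + d_out) per adapted module, per layer."""
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in module_shapes.values())
    return per_layer * num_layers


# Assumed shapes for Llama-2-7B attention projections
attention_shapes = {
    name: (4096, 4096) for name in ("q_proj", "k_proj", "v_proj", "o_proj")
}

count_r32 = lora_param_count(r=32, module_shapes=attention_shapes, num_layers=32)
count_r16 = lora_param_count(r=16, module_shapes=attention_shapes, num_layers=32)
print(f"r=32: {count_r32:,} trainable parameters")
print(f"r=16: {count_r16:,} trainable parameters")
```

The count is linear in `r`, so doubling the rank doubles the adapter size and its optimizer state.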

In [None]:
# Build the model
model, tokenizer = recipe.build_model()

In [None]:
# Prepare data
data_config = DataConfig(
    dataset_path='your-dataset',  # Replace with your dataset
    batch_size=4,
    max_length=2048
)

dataset = recipe.prepare_data(data_config)

In [None]:
# Train the model
trainer = recipe.train(dataset, output_dir='./custom_lora_output')

## 4. Advanced LoRA Techniques

Let's explore some advanced LoRA configurations:

In [None]:
# Technique 1: Per-module LoRA ranks (different ranks for different modules)
advanced_lora_config = {
    'q_proj': {'r': 32, 'lora_alpha': 64},
    'v_proj': {'r': 32, 'lora_alpha': 64},
    'k_proj': {'r': 16, 'lora_alpha': 32},  # Lower rank for keys
    'o_proj': {'r': 16, 'lora_alpha': 32},  # Lower rank for output
}

print("Advanced LoRA Configuration:")
print(advanced_lora_config)

# Technique 2: Dynamic Rank Selection
def suggest_lora_rank(model_size_gb: float) -> int:
    """
    Suggest a starting LoRA rank from the size of the FP16 weights.
    
    This is a rough heuristic, not a tuned rule. At 2 bytes per
    parameter, 1B parameters is about 2 GB.
    
    Args:
        model_size_gb: Size of the FP16 weights in GB
    
    Returns:
        Suggested rank
    """
    if model_size_gb < 3:  # Small models (under ~1.5B params)
        return 8
    elif model_size_gb < 15:  # Medium models (up to ~7B params)
        return 16
    elif model_size_gb < 30:  # Large models (up to ~15B params)
        return 32
    else:  # Very large models (over ~15B params)
        return 64

# Example usage
model_size = 14  # GB (approximate FP16 size of a 7B model)
suggested_rank = suggest_lora_rank(model_size)
print(f"\nFor a {model_size}GB model, suggested LoRA rank: {suggested_rank}")

# Technique 3: LoRA with Layerwise Learning Rates
def create_layerwise_lr_config(base_lr: float, num_layers: int) -> Dict[str, float]:
    """
    Create layerwise learning rate configuration.
    Lower layers get smaller LR, higher layers get larger LR.
    """
    config = {}
    for i in range(num_layers):
        # Linear scaling from 0.5x (first layer) to 1.5x (last layer) of base LR
        layer_lr = base_lr * (0.5 + i / max(num_layers - 1, 1))
        config[f'layer_{i}'] = layer_lr
    return config

layerwise_config = create_layerwise_lr_config(base_lr=2e-4, num_layers=32)
print(f"\nLayerwise LR (first 5 layers): {dict(list(layerwise_config.items())[:5])}")
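
The per-module dictionary from Technique 1 is not something PEFT consumes directly. One way to apply it, sketched here under the assumption that the installed PEFT release supports the `rank_pattern` and `alpha_pattern` arguments of `LoraConfig`, is to pick default values and pass only the overrides. `to_peft_patterns` is a hypothetical helper.

```python
from typing import Dict, Tuple


def to_peft_patterns(
    per_module: Dict[str, Dict[str, int]], default_r: int, default_alpha: int
) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Keep only the modules whose rank or alpha differs from the defaults."""
    rank_pattern = {
        name: cfg["r"] for name, cfg in per_module.items() if cfg["r"] != default_r
    }
    alpha_pattern = {
        name: cfg["lora_alpha"]
        for name, cfg in per_module.items()
        if cfg["lora_alpha"] != default_alpha
    }
    return rank_pattern, alpha_pattern


per_module = {
    "q_proj": {"r": 32, "lora_alpha": 64},
    "v_proj": {"r": 32, "lora_alpha": 64},
    "k_proj": {"r": 16, "lora_alpha": 32},
    "o_proj": {"r": 16, "lora_alpha": 32},
}

rank_pattern, alpha_pattern = to_peft_patterns(per_module, default_r=16, default_alpha=32)
print("rank_pattern:", rank_pattern)
print("alpha_pattern:", alpha_pattern)

# With PEFT installed, these could be passed to LoraConfig, for example:
# LoraConfig(r=16, lora_alpha=32, target_modules=list(per_module),
#            rank_pattern=rank_pattern, alpha_pattern=alpha_pattern)
```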

## 5. Register Recipe with SparkTrainer

To use your custom recipe in SparkTrainer, register it via the API:

In [None]:
import requests
import json

# SparkTrainer API
API_BASE = "http://localhost:5000/api"

# Recipe registration payload
recipe_payload = {
    "name": "custom_qlora_v1",
    "type": "adapter",
    "description": "Custom QLoRA recipe with advanced configurations",
    "compatible_models": ["llama", "mistral", "gpt"],
    "hyperparameters": {
        "lora_r": 32,
        "lora_alpha": 64,
        "lora_dropout": 0.1,
        "learning_rate": 2e-4,
        "num_epochs": 3,
        "quantization": "4bit"
    },
    "code_path": "/path/to/custom_qlora_recipe.py",
    "version": "1.0.0"
}

# Register the recipe
response = requests.post(
    f"{API_BASE}/recipes",
    json=recipe_payload
)

if response.status_code == 201:
    print("✅ Recipe registered successfully!")
    print(json.dumps(response.json(), indent=2))
else:
    print(f"❌ Registration failed: {response.text}")
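
The hyperparameters in the registration payload repeat values already held in the config objects, so the two can drift apart. A minimal sketch of deriving the payload section from dataclasses instead follows. `LoRASettings` and `TrainSettings` are stand-ins for the tutorial's `LoRAConfig` and `TrainingConfig`, `build_hyperparameters` is a hypothetical helper, and the key names are copied from the payload above rather than from a documented SparkTrainer schema.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class LoRASettings:
    """Stand-in for the tutorial's LoRAConfig (only the fields used here)."""
    r: int = 32
    lora_alpha: int = 64
    lora_dropout: float = 0.1


@dataclass
class TrainSettings:
    """Stand-in for the tutorial's TrainingConfig (only the fields used here)."""
    learning_rate: float = 2e-4
    num_epochs: int = 3


def build_hyperparameters(lora: LoRASettings, train: TrainSettings,
                          quantization: Optional[str]) -> Dict[str, Any]:
    """Map config fields onto the payload's hyperparameter keys."""
    lora_fields = asdict(lora)
    return {
        "lora_r": lora_fields["r"],
        "lora_alpha": lora_fields["lora_alpha"],
        "lora_dropout": lora_fields["lora_dropout"],
        **asdict(train),  # learning_rate, num_epochs
        "quantization": quantization,
    }


hyperparameters = build_hyperparameters(LoRASettings(), TrainSettings(), "4bit")
print(hyperparameters)
```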

## 6. Use Your Custom Recipe in a Job

Now you can use your custom recipe in training jobs:

In [None]:
# Create a job using your custom recipe
job_config = {
    "name": "custom-qlora-training",
    "recipe": "custom_qlora_v1",  # Your custom recipe
    "base_model": "meta-llama/Llama-2-7b-hf",
    "dataset": "my-dataset",
    "hyperparameters": {
        "lora_r": 32,
        "lora_alpha": 64,
        "learning_rate": 2e-4
    },
    "resources": {
        "gpu_count": 1
    }
}

response = requests.post(f"{API_BASE}/jobs", json=job_config)

if response.status_code == 201:
    job = response.json()
    print(f"✅ Job created with custom recipe!")
    print(f"Job ID: {job['id']}")
    print(f"Recipe: {job.get('recipe', 'N/A')}")
else:
    print(f"❌ Job creation failed: {response.text}")

## 7. Recipe Best Practices

Here are some best practices for creating effective training recipes:

### Memory Optimization

1. **Use Gradient Checkpointing**: Trades compute for memory
2. **Enable Gradient Accumulation**: Larger effective batch size without a larger per-step memory footprint
3. **Mixed Precision Training**: Use FP16 or BF16
4. **Quantization**: 4-bit or 8-bit for large models
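
Gradient accumulation is easiest to reason about as arithmetic. The sketch below uses the values from this tutorial (per-device batch size 4, 4 accumulation steps, 1 GPU, sequences padded to 2048 tokens); the function names are hypothetical.

```python
def effective_batch_size(per_device_batch_size: int,
                         gradient_accumulation_steps: int,
                         num_gpus: int = 1) -> int:
    """Number of sequences that contribute to each optimizer step."""
    return per_device_batch_size * gradient_accumulation_steps * num_gpus


def tokens_per_optimizer_step(batch_size: int, max_length: int) -> int:
    """Upper bound on tokens per optimizer step when sequences are padded to max_length."""
    return batch_size * max_length


batch = effective_batch_size(per_device_batch_size=4, gradient_accumulation_steps=4)
print(f"Effective batch size: {batch}")
print(f"Tokens per optimizer step: {tokens_per_optimizer_step(batch, 2048):,}")
```

Only the per-device batch of 4 has to fit in memory at once; the remaining sequences are processed in later forward and backward passes before the optimizer steps.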

### LoRA Configuration

1. **Start with r=8-16**: Increase only if underfitting
2. **Alpha = 2 * r**: Good starting point
3. **Target Critical Modules**: Focus on attention (Q, K, V)
4. **Use Dropout**: Prevents overfitting (0.05-0.1)
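
The "Alpha = 2 * r" guidance follows from how LoRA scales its update. For a frozen weight $W_0$ and trainable low-rank factors $A$ and $B$, the adapted layer computes:

```latex
h = W_0 x + \frac{\alpha}{r} B A x,
\qquad B \in \mathbb{R}^{d_{\text{out}} \times r},
\quad A \in \mathbb{R}^{r \times d_{\text{in}}}
```

With $\alpha = 2r$ the scaling factor $\alpha / r$ is 2 for any rank (for example, $r = 32$, $\alpha = 64$), so changing `r` alters the adapter's capacity without also changing the scale of its update.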

### Training Stability

1. **Warmup**: Use 5-10% of total steps
2. **Learning Rate**: 2e-5 to 2e-4 for LoRA
3. **Gradient Clipping**: Prevent exploding gradients (1.0)
4. **Cosine Schedule**: Smooth LR decay
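
Warmup and cosine decay combine into a single schedule. The sketch below is a plain-Python approximation of that shape (linear warmup, then a half-cosine decay to zero), written for illustration rather than taken from any library; `warmup_steps_for` and `lr_at_step` are hypothetical names.

```python
import math


def warmup_steps_for(total_steps: int, fraction: float = 0.05) -> int:
    """Warmup length as a fraction of total optimizer steps."""
    return int(total_steps * fraction)


def lr_at_step(step: int, total_steps: int, warmup_steps: int, base_lr: float) -> float:
    """Learning rate with linear warmup followed by cosine decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


total = 1000
warmup = warmup_steps_for(total, fraction=0.1)
for step in (0, warmup // 2, warmup, (total + warmup) // 2, total):
    print(f"step {step:4d}: lr = {lr_at_step(step, total, warmup, 2e-4):.2e}")
```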

### Monitoring

1. **Log Frequently**: Every 10-50 steps
2. **Track Loss**: Both training and validation
3. **Monitor Gradients**: Check for vanishing/exploding
4. **Evaluate Regularly**: Every epoch or half-epoch
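
Gradient monitoring usually means tracking the global L2 norm, which is the same quantity that gradient clipping bounds. The sketch below works on plain lists of floats to show the arithmetic; the `low` and `high` thresholds in `classify_grad_norm` are illustrative assumptions, not recommended values, and all three function names are hypothetical.

```python
import math
from typing import Iterable, Sequence


def global_grad_norm(grads: Iterable[Sequence[float]]) -> float:
    """L2 norm over all gradient values, treated as one flat vector."""
    return math.sqrt(sum(g * g for tensor in grads for g in tensor))


def clip_coefficient(total_norm: float, max_norm: float = 1.0, eps: float = 1e-6) -> float:
    """Factor by which gradients are scaled so their norm does not exceed max_norm."""
    return min(1.0, max_norm / (total_norm + eps))


def classify_grad_norm(total_norm: float, low: float = 1e-4, high: float = 10.0) -> str:
    """Flag norms outside an assumed healthy range."""
    if math.isnan(total_norm) or math.isinf(total_norm):
        return "invalid"
    if total_norm < low:
        return "vanishing"
    if total_norm > high:
        return "exploding"
    return "ok"


norm = global_grad_norm([[3.0], [4.0]])
print(norm, clip_coefficient(norm), classify_grad_norm(norm))
```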

## Summary

You've learned how to:

- ✅ Understand the SparkTrainer recipe system
- ✅ Create custom LoRA recipes
- ✅ Configure advanced LoRA parameters
- ✅ Implement custom training logic
- ✅ Register and use recipes in SparkTrainer
- ✅ Apply best practices for recipe development

## Next Steps

- **Multimodal Training** (`03_multimodal_training.ipynb`)
- **Hyperparameter Optimization** (`04_advanced_optimization.ipynb`)
- **Model Deployment** (`05_model_deployment.ipynb`)

Happy training! 🚀