# 🔍 Universal Learning Rate Finder Toolkit

**A comprehensive, model-agnostic learning rate optimization toolkit for any PyTorch model**

## 🎯 What You'll Learn:
- **Multiple LR Finding Methods**: Linear, Exponential, and Cyclical approaches
- **Universal Compatibility**: Works with any PyTorch model architecture
- **Visual Comparisons**: Side-by-side method comparison and analysis
- **Optimal LR Suggestions**: Automated recommendations with explanations
- **Production Ready**: Organized structure for real projects

## 🚀 Key Features:
- ✅ **4 Different LR Finding Methods** - Compare multiple approaches
- ✅ **Any Model Support** - CNN, RNN, Transformer, Custom architectures
- ✅ **Beautiful Visualizations** - Interactive plots and comparisons
- ✅ **Smart Recommendations** - Automated optimal LR detection
- ✅ **Organized Results** - Structured saving and experiment tracking
- ✅ **Step-by-Step Explanations** - Learn the theory behind each method

---

Let's start building your learning rate optimization toolkit! 🛠️

## 1. Setup Environment and Create Subfolder Structure

First, let's create an organized directory structure for our LR finding experiments. The code below creates these folders directly in the current working directory, which keeps models, results, and visualizations separate and easy to track.

### Directory Structure:
```
lr_optimization/
├── experiments/          # Experiment results and logs
├── models/              # Saved model checkpoints
├── plots/               # Generated visualizations
├── lr_finder_results/   # Raw LR finder data
└── configs/             # Configuration files
```

In [None]:
import os
import sys
from pathlib import Path
from datetime import datetime
import json

# Create the main directory structure
def create_lr_finder_structure():
    """
    Create organized directory structure for LR finding experiments
    """
    base_dir = Path.cwd()
    
    # Define subdirectories
    subdirs = [
        'experiments',
        'models', 
        'plots',
        'lr_finder_results',
        'configs'
    ]
    
    created_dirs = []
    
    for subdir in subdirs:
        dir_path = base_dir / subdir
        dir_path.mkdir(exist_ok=True)
        created_dirs.append(str(dir_path))
        print(f"✅ Created/Verified: {dir_path}")
    
    # Create a config file for this session
    session_config = {
        'created_at': datetime.now().isoformat(),
        'base_directory': str(base_dir),
        'subdirectories': created_dirs,
        'session_info': {
            'purpose': 'Universal Learning Rate Finder Toolkit',
            'methods': ['Linear LR Range Test', 'Exponential LR Range Test', 'Cyclical LR Range Test'],
            'features': ['Model-agnostic', 'Comparative analysis', 'Automated recommendations']
        }
    }
    
    config_path = base_dir / 'configs' / 'session_config.json'
    with open(config_path, 'w') as f:
        json.dump(session_config, f, indent=2)
    
    print(f"📋 Session config saved to: {config_path}")
    return base_dir, created_dirs

# Create the directory structure
BASE_DIR, CREATED_DIRS = create_lr_finder_structure()

print(f"\n🎉 LR Finder toolkit structure created successfully!")
print(f"📂 Base directory: {BASE_DIR}")
print(f"📁 Ready for experiments with {len(CREATED_DIRS)} organized folders")

## 2. Import Required Libraries

Let's import all the essential libraries we'll need for our learning rate finding toolkit. We'll include PyTorch for deep learning, visualization libraries, and utilities for data handling.

### Key Libraries:
- **PyTorch**: Core deep learning framework
- **Matplotlib/Seaborn**: Visualization and plotting
- **NumPy**: Numerical computations
- **Pandas**: Data manipulation and analysis
- **tqdm**: Progress bars for training loops

In [None]:
# Core PyTorch libraries
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
from torch.optim.lr_scheduler import OneCycleLR, CyclicLR

# Data handling and computation
import numpy as np
import pandas as pd
from collections import defaultdict
import copy
import warnings
warnings.filterwarnings('ignore')

# Visualization libraries
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.gridspec import GridSpec

# Progress tracking
from tqdm.auto import tqdm

# Optional libraries (with fallbacks)
try:
    import plotly.express as px
    import plotly.graph_objects as go
    from plotly.subplots import make_subplots
    PLOTLY_AVAILABLE = True
    print("✅ Plotly available - Interactive plots enabled")
except ImportError:
    PLOTLY_AVAILABLE = False
    print("⚠️ Plotly not available - Using matplotlib only")

# Set random seeds for reproducibility
def set_random_seeds(seed=42):
    """Set random seeds for reproducible results"""
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    np.random.seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False

set_random_seeds(42)

# Configure plotting style
plt.style.use('default')
sns.set_palette("husl")
plt.rcParams['figure.figsize'] = (12, 8)
plt.rcParams['axes.grid'] = True
plt.rcParams['grid.alpha'] = 0.3

# Check device availability
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"🔧 Using device: {device}")
print(f"🐍 PyTorch version: {torch.__version__}")

if torch.cuda.is_available():
    print(f"🚀 GPU: {torch.cuda.get_device_name(0)}")
    print(f"💾 GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.1f} GB")

print(f"\n✅ All libraries imported successfully!")
print(f"📊 Ready for learning rate optimization experiments")

## 3. Define Base LR Finder Class

Now we'll create a base class that provides common functionality for all our learning rate finding methods. This class will handle:

- **Model preparation**: Setting up the model for LR finding
- **Loss tracking**: Recording loss values during experiments
- **Data management**: Storing and organizing results
- **Common utilities**: Shared functions across all methods

### Key Features:
- 🔧 **Model-agnostic**: Works with any PyTorch model
- 📊 **Automatic tracking**: Loss, LR, and gradient monitoring
- 💾 **Result storage**: Organized data saving
- 🎯 **Optimal LR detection**: Smart recommendations
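
Loss tracking in a range test usually relies on an exponential moving average to tame batch-to-batch noise. The sketch below shows one common formulation with bias correction. The function name `smooth_with_bias_correction` and the sample values are illustrative assumptions, not part of the toolkit's API.

```python
def smooth_with_bias_correction(losses, beta=0.98):
    """Exponentially smooth a loss sequence with bias correction (illustrative helper)."""
    smoothed = []
    avg = 0.0  # running average starts at zero; the correction below compensates
    for step, loss in enumerate(losses, start=1):
        avg = beta * avg + (1 - beta) * loss
        # Dividing by (1 - beta^step) removes the bias toward the zero start value
        smoothed.append(avg / (1 - beta ** step))
    return smoothed


raw_losses = [2.0, 2.0, 2.0, 2.0]  # made-up constant losses
print(smooth_with_bias_correction(raw_losses))
```

With bias correction, a constant loss sequence stays constant after smoothing, which is a quick sanity check that the step counter starts at 1 rather than 0.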

In [None]:
class BaseLRFinder:
    """
    Base Learning Rate Finder class with common functionality
    """
    
    def __init__(self, model, criterion, optimizer_class=optim.Adam, device=None):
        """
        Initialize the LR Finder
        
        Args:
            model: PyTorch model (any architecture)
            criterion: Loss function (e.g., nn.CrossEntropyLoss())
            optimizer_class: Optimizer class (default: Adam)
            device: Device to run on (auto-detected if None)
        """
        # Resolve the device first so it is known even for models without parameters
        self.device = torch.device(device) if device else torch.device('cuda' if torch.cuda.is_available() else 'cpu')
        self.model = model.to(self.device)
        self.criterion = criterion
        self.optimizer_class = optimizer_class
        
        # Storage for results
        self.results = {
            'learning_rates': [],
            'losses': [],
            'gradients': [],
            'smooth_losses': [],
            'method': '',
            'optimal_lr': None,
            'optimal_lr_reason': '',
            'experiment_info': {}
        }
        
        # Model state management
        self.initial_state = None
        self.best_loss = float('inf')
        
        print(f"🔧 LR Finder initialized for {model.__class__.__name__} on {self.device}")
    
    def _prepare_model(self):
        """Prepare model for LR finding experiment"""
        # Save initial model state
        self.initial_state = copy.deepcopy(self.model.state_dict())
        self.model.train()
        return self.model
    
    def _restore_model(self):
        """Restore model to initial state"""
        if self.initial_state is not None:
            self.model.load_state_dict(self.initial_state)
            print("🔄 Model state restored to initial condition")
    
    def _calculate_smooth_loss(self, beta=0.98):
        """Calculate exponentially smoothed loss"""
        if len(self.results['losses']) == 0:
            return []
        
        smooth_losses = []
        avg_loss = 0.0  # Start at zero; the bias correction below compensates
        
        for step, loss in enumerate(self.results['losses'], start=1):
            avg_loss = beta * avg_loss + (1 - beta) * loss
            # Bias correction: step starts at 1, so the divisor is never zero
            smooth_losses.append(avg_loss / (1 - beta ** step))
        
        return smooth_losses
    
    def _get_gradient_norm(self):
        """Calculate the norm of gradients"""
        total_norm = 0.0
        for p in self.model.parameters():
            if p.grad is not None:
                param_norm = p.grad.data.norm(2)
                total_norm += param_norm.item() ** 2
        return total_norm ** (1. / 2)
    
    def _should_stop_early(self, current_loss, patience=5, threshold=4.0):
        """
        Determine if training should stop early due to exploding loss
        
        Args:
            current_loss: Current batch loss
            patience: Minimum number of recorded steps before the check becomes active
            threshold: Multiplier for loss explosion detection
        """
        if len(self.results['smooth_losses']) < patience:
            return False
        
        # Check if loss has exploded compared to best loss
        if current_loss > threshold * self.best_loss:
            return True
        
        # Update best loss
        if current_loss < self.best_loss:
            self.best_loss = current_loss
        
        return False
    
    def _find_optimal_lr_simple(self):
        """
        Simple method to find optimal learning rate
        Uses the point of steepest descent before loss explosion
        """
        if len(self.results['smooth_losses']) < 10:
            return self.results['learning_rates'][len(self.results['losses'])//2], "Middle of range (insufficient data)"
        
        smooth_losses = self.results['smooth_losses']
        learning_rates = self.results['learning_rates']
        
        # Find the steepest descent point
        gradients = np.gradient(smooth_losses)
        min_gradient_idx = np.argmin(gradients)
        
        # Alternative: Find minimum loss point
        min_loss_idx = np.argmin(smooth_losses)
        
        # Choose the earlier point (more conservative) and record which rule won
        optimal_idx = int(min(min_gradient_idx, min_loss_idx))
        chosen = "Steepest descent" if min_gradient_idx <= min_loss_idx else "Minimum loss"
        
        # Safety: Don't pick from the very beginning or end
        optimal_idx = max(2, min(optimal_idx, len(learning_rates) - 3))
        
        optimal_lr = learning_rates[optimal_idx]
        reason = f"{chosen} at step {optimal_idx} (loss: {smooth_losses[optimal_idx]:.4f})"
        
        return optimal_lr, reason
    
    def save_results(self, filename_prefix="lr_finder"):
        """Save results to files"""
        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        
        # Prepare data for saving
        df = pd.DataFrame({
            'learning_rate': self.results['learning_rates'],
            'loss': self.results['losses'],
            'smooth_loss': self.results['smooth_losses'],
            'gradient_norm': self.results['gradients']
        })
        
        # Save to CSV
        csv_path = BASE_DIR / 'lr_finder_results' / f"{filename_prefix}_{self.results['method']}_{timestamp}.csv"
        df.to_csv(csv_path, index=False)
        
        # Save metadata
        metadata = {
            'method': self.results['method'],
            'optimal_lr': self.results['optimal_lr'],
            'optimal_lr_reason': self.results['optimal_lr_reason'],
            'experiment_info': self.results['experiment_info'],
            'timestamp': timestamp,
            'total_steps': len(self.results['learning_rates']),
            'lr_range': {
                'min': min(self.results['learning_rates']) if self.results['learning_rates'] else None,
                'max': max(self.results['learning_rates']) if self.results['learning_rates'] else None
            },
            'loss_range': {
                'min': min(self.results['losses']) if self.results['losses'] else None,
                'max': max(self.results['losses']) if self.results['losses'] else None
            }
        }
        
        json_path = BASE_DIR / 'lr_finder_results' / f"{filename_prefix}_{self.results['method']}_metadata_{timestamp}.json"
        with open(json_path, 'w') as f:
            json.dump(metadata, f, indent=2)
        
        print(f"💾 Results saved:")
        print(f"   📊 Data: {csv_path}")
        print(f"   📋 Metadata: {json_path}")
        
        return csv_path, json_path

print("✅ Base LR Finder class defined successfully!")
print("🔧 Features: Model-agnostic, automatic tracking, optimal LR detection")
print("📊 Ready to implement specific LR finding methods")

## 4. Implement Linear LR Range Test

The **Linear LR Range Test** is one of the most popular methods for finding optimal learning rates. It works by:

1. **Starting small**: Begin with a very small learning rate (e.g., 1e-7)
2. **Linear increase**: Gradually increase LR linearly over batches
3. **Monitor loss**: Track how loss changes with increasing LR
4. **Find sweet spot**: Identify the LR where loss decreases fastest
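
The linear schedule described in steps 1 and 2 can be sketched as a small standalone function. The name `linear_lr_schedule` is hypothetical, and the five-step range is chosen only for readability.

```python
def linear_lr_schedule(min_lr, max_lr, num_batches):
    """Return the learning rate for each batch of a linear sweep (illustrative helper)."""
    if num_batches < 2:
        return [min_lr]  # a single batch cannot sweep a range
    step = (max_lr - min_lr) / (num_batches - 1)
    return [min_lr + step * i for i in range(num_batches)]


schedule = linear_lr_schedule(min_lr=0.0, max_lr=1.0, num_batches=5)
print(schedule)  # evenly spaced values from min_lr to max_lr
```

Because the spacing is constant in absolute terms, a wide range such as 1e-7 to 10 over 100 batches jumps to roughly 0.1 on the second batch, so a linear sweep tends to be most informative when the range is already fairly narrow.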

### When to use:
- ✅ **First time** with a new model architecture
- ✅ **Stable training** - when you want conservative estimates
- ✅ **Understanding behavior** - see how model responds to LR changes

### Theory:
- **Too low LR**: Loss decreases very slowly
- **Optimal LR**: Loss decreases rapidly (steepest descent)
- **Too high LR**: Loss starts increasing or becomes unstable
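
To make the three regimes concrete, the sketch below builds a synthetic loss curve (invented for illustration, not measured data) and locates the step where the loss falls fastest with `numpy.gradient`. It is similar in spirit to the steepest-descent rule used by the finder, but it is only a sketch under those assumptions.

```python
import numpy as np

# Synthetic loss curve (made up): flat at first, then falling, then diverging.
lrs = np.linspace(0.01, 1.0, 100)
losses = 2.0 - 1.5 / (1.0 + np.exp(-20 * (lrs - 0.3))) + np.exp(8 * (lrs - 0.9))

slopes = np.gradient(losses, lrs)       # d(loss)/d(lr) at each step
steepest_idx = int(np.argmin(slopes))   # most negative slope = fastest decrease
min_loss_idx = int(np.argmin(losses))   # lowest loss, shortly before divergence

print(f"steepest descent at lr={lrs[steepest_idx]:.3f}")
print(f"minimum loss at lr={lrs[min_loss_idx]:.3f}")
```

On a curve of this shape the steepest-descent point comes before the minimum-loss point, which is why it is often treated as the more conservative suggestion.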

In [None]:
class LinearLRFinder(BaseLRFinder):
    """
    Linear Learning Rate Range Test
    
    Increases learning rate linearly from min_lr to max_lr over num_batches
    """
    
    def __init__(self, model, criterion, optimizer_class=optim.Adam, device=None):
        super().__init__(model, criterion, optimizer_class, device)
        self.results['method'] = 'Linear_LR_Range_Test'
    
    def find_lr(self, train_loader, min_lr=1e-7, max_lr=10.0, num_batches=None, 
                stop_div_factor=4.0, smooth_factor=0.98):
        """
        Find optimal learning rate using linear range test
        
        Args:
            train_loader: DataLoader for training data
            min_lr: Minimum learning rate to test
            max_lr: Maximum learning rate to test  
            num_batches: Number of batches to test (default: full epoch)
            stop_div_factor: Stop if loss > stop_div_factor * best_loss
            smooth_factor: Smoothing factor for loss (0.98 = strong smoothing)
        
        Returns:
            dict: Results with optimal LR and loss curves
        """
        print(f"🔍 Starting Linear LR Range Test...")
        print(f"📊 LR Range: {min_lr:.2e} → {max_lr:.2e}")
        
        # Prepare model and reset results
        self._prepare_model()
        self.results = {key: [] for key in ['learning_rates', 'losses', 'gradients', 'smooth_losses']}
        self.results['method'] = 'Linear_LR_Range_Test'
        self.best_loss = float('inf')
        
        # Determine number of batches
        if num_batches is None:
            num_batches = len(train_loader)
        num_batches = min(num_batches, len(train_loader))
        
        print(f"🎯 Testing over {num_batches} batches")
        
        # Create optimizer with minimum LR
        optimizer = self.optimizer_class(self.model.parameters(), lr=min_lr)
        
        # Linear interpolation from min_lr to max_lr; max(..., 1) guards a single-batch run
        lr_lambda = lambda batch_num: min_lr + (max_lr - min_lr) * batch_num / max(num_batches - 1, 1)
        
        # Training loop with progress bar
        pbar = tqdm(enumerate(train_loader), total=num_batches, desc="Linear LR Test")
        
        for batch_idx, (data, target) in pbar:
            if batch_idx >= num_batches:
                break
            
            # Move data to device
            data, target = data.to(self.device), target.to(self.device)
            
            # Calculate current learning rate
            current_lr = lr_lambda(batch_idx)
            
            # Update optimizer learning rate
            for param_group in optimizer.param_groups:
                param_group['lr'] = current_lr
            
            # Forward pass
            optimizer.zero_grad()
            output = self.model(data)
            loss = self.criterion(output, target)
            
            # Backward pass
            loss.backward()
            
            # Get gradient norm before optimizer step
            grad_norm = self._get_gradient_norm()
            
            # Optimizer step
            optimizer.step()
            
            # Store results
            current_loss = loss.item()
            self.results['learning_rates'].append(current_lr)
            self.results['losses'].append(current_loss)
            self.results['gradients'].append(grad_norm)
            
            # Calculate smooth loss
            if len(self.results['smooth_losses']) == 0:
                smooth_loss = current_loss
            else:
                smooth_loss = (smooth_factor * self.results['smooth_losses'][-1] + 
                             (1 - smooth_factor) * current_loss)
            self.results['smooth_losses'].append(smooth_loss)
            
            # Update progress bar
            pbar.set_postfix({
                'LR': f'{current_lr:.2e}',
                'Loss': f'{current_loss:.4f}',
                'Smooth': f'{smooth_loss:.4f}'
            })
            
            # Early stopping check
            if self._should_stop_early(current_loss, threshold=stop_div_factor):
                print(f"\n🛑 Early stopping at batch {batch_idx} (loss exploded)")
                break
            
            # Update best loss
            if smooth_loss < self.best_loss:
                self.best_loss = smooth_loss
        
        pbar.close()
        
        # Find optimal learning rate
        self.results['optimal_lr'], self.results['optimal_lr_reason'] = self._find_optimal_lr_simple()
        
        # Store experiment info
        self.results['experiment_info'] = {
            'min_lr': min_lr,
            'max_lr': max_lr,
            'num_batches_tested': len(self.results['learning_rates']),
            'stop_div_factor': stop_div_factor,
            'smooth_factor': smooth_factor
        }
        
        # Restore model state
        self._restore_model()
        
        print(f"\n✅ Linear LR Range Test completed!")
        print(f"🎯 Optimal LR: {self.results['optimal_lr']:.2e}")
        print(f"📝 Reason: {self.results['optimal_lr_reason']}")
        
        return self.results.copy()
    
    def plot_results(self, save_plot=True, show_plot=True):
        """Plot the results of linear LR range test"""
        if not self.results['learning_rates']:
            print("❌ No results to plot. Run find_lr() first.")
            return
        
        fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))
        
        # Plot 1: Loss vs Learning Rate (log scale)
        ax1.semilogx(self.results['learning_rates'], self.results['losses'], 
                     'b-', alpha=0.6, label='Raw Loss')
        ax1.semilogx(self.results['learning_rates'], self.results['smooth_losses'], 
                     'r-', linewidth=2, label='Smoothed Loss')
        
        # Mark optimal LR
        if self.results['optimal_lr']:
            ax1.axvline(x=self.results['optimal_lr'], color='green', linestyle='--', 
                       linewidth=2, label=f'Optimal LR: {self.results["optimal_lr"]:.2e}')
        
        ax1.set_xlabel('Learning Rate')
        ax1.set_ylabel('Loss')
        ax1.set_title('Linear LR Range Test: Loss vs Learning Rate')
        ax1.legend()
        ax1.grid(True, alpha=0.3)
        
        # Plot 2: Gradient Norm vs Learning Rate
        ax2.semilogx(self.results['learning_rates'], self.results['gradients'], 
                     'purple', alpha=0.7, label='Gradient Norm')
        ax2.set_xlabel('Learning Rate')
        ax2.set_ylabel('Gradient Norm')
        ax2.set_title('Gradient Norm vs Learning Rate')
        ax2.legend()
        ax2.grid(True, alpha=0.3)
        
        plt.tight_layout()
        
        if save_plot:
            timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
            plot_path = BASE_DIR / 'plots' / f'linear_lr_test_{timestamp}.png'
            plt.savefig(plot_path, dpi=300, bbox_inches='tight')
            print(f"💾 Plot saved to: {plot_path}")
        
        if show_plot:
            plt.show()
        else:
            plt.close()
        
        return fig

# Confirm the Linear LR Finder class is defined
print("✅ Linear LR Range Test implemented successfully!")
print("🔧 Features: Linear LR increase, early stopping, gradient tracking")
print("📊 Use: finder.find_lr(train_loader, min_lr=1e-7, max_lr=10.0)")

## 5. Implement Exponential LR Range Test

The **Exponential LR Range Test** is faster and more aggressive than the linear version. It works by:

1. **Exponential growth**: LR increases exponentially (multiplicative steps)
2. **Faster exploration**: Covers wide LR ranges quickly
3. **Better for large ranges**: When you don't know the approximate optimal LR
4. **Aggressive search**: Finds upper bounds more effectively

### When to use:
- ✅ **Unknown LR range** - when you have no idea about optimal LR
- ✅ **Quick exploration** - fast initial assessment
- ✅ **Large models** - when linear search takes too long
- ✅ **Comparative studies** - comparing with linear method

### Theory:
- **Exponential formula**: `LR(t) = min_lr * (max_lr/min_lr)^(t/(total_steps-1))` for `t = 0, ..., total_steps-1`
- **Wide coverage**: Spans many orders of magnitude in few steps, with the same number of steps per decade
- **Logarithmic scale**: Natural for visualizing wide LR ranges
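
As a rough illustration of the formula above, the following sketch generates a geometric schedule and prints each learning rate with its base-10 logarithm. `exponential_lr_schedule` is a hypothetical helper name; the exponent uses `num_batches - 1` so that the last step lands on `max_lr`.

```python
import math


def exponential_lr_schedule(min_lr, max_lr, num_batches):
    """Return the learning rate for each batch of a geometric sweep (illustrative helper)."""
    if num_batches < 2:
        return [min_lr]  # a single batch cannot sweep a range
    ratio = (max_lr / min_lr) ** (1.0 / (num_batches - 1))  # constant per-batch multiplier
    return [min_lr * ratio ** i for i in range(num_batches)]


schedule = exponential_lr_schedule(min_lr=1e-4, max_lr=1.0, num_batches=5)
for step, lr in enumerate(schedule):
    print(f"step {step}: lr={lr:.1e}, log10(lr)={math.log10(lr):+.2f}")
```

Successive values differ by a constant ratio, so the points are evenly spaced on a logarithmic axis. That is the reason the results are usually plotted with a log-scaled x-axis.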

In [None]:
class ExponentialLRFinder(BaseLRFinder):
    """
    Exponential Learning Rate Range Test
    
    Increases learning rate exponentially from min_lr to max_lr
    """
    
    def __init__(self, model, criterion, optimizer_class=optim.Adam, device=None):
        super().__init__(model, criterion, optimizer_class, device)
        self.results['method'] = 'Exponential_LR_Range_Test'
    
    def find_lr(self, train_loader, min_lr=1e-7, max_lr=10.0, num_batches=None,
                stop_div_factor=4.0, smooth_factor=0.98):
        """
        Find optimal learning rate using exponential range test
        
        Args:
            train_loader: DataLoader for training data
            min_lr: Minimum learning rate to test
            max_lr: Maximum learning rate to test
            num_batches: Number of batches to test (default: full epoch)
            stop_div_factor: Stop if loss > stop_div_factor * best_loss
            smooth_factor: Smoothing factor for loss
        
        Returns:
            dict: Results with optimal LR and loss curves
        """
        print(f"🚀 Starting Exponential LR Range Test...")
        print(f"📊 LR Range: {min_lr:.2e} → {max_lr:.2e} (exponential)")
        
        # Prepare model and reset results
        self._prepare_model()
        self.results = {key: [] for key in ['learning_rates', 'losses', 'gradients', 'smooth_losses']}
        self.results['method'] = 'Exponential_LR_Range_Test'
        self.best_loss = float('inf')
        
        # Determine number of batches
        if num_batches is None:
            num_batches = len(train_loader)
        num_batches = min(num_batches, len(train_loader))
        
        print(f"🎯 Testing over {num_batches} batches")
        
        # Create optimizer with minimum LR
        optimizer = self.optimizer_class(self.model.parameters(), lr=min_lr)
        
        # Calculate exponential multiplier
        # LR(t) = min_lr * (max_lr/min_lr)^(t/(num_batches-1))
        # max(..., 1) guards against division by zero for a single-batch run
        lr_multiplier = (max_lr / min_lr) ** (1.0 / max(num_batches - 1, 1))
        
        print(f"📈 LR multiplier per batch: {lr_multiplier:.6f}")
        
        # Training loop with progress bar
        pbar = tqdm(enumerate(train_loader), total=num_batches, desc="Exponential LR Test")
        
        for batch_idx, (data, target) in pbar:
            if batch_idx >= num_batches:
                break
            
            # Move data to device
            data, target = data.to(self.device), target.to(self.device)
            
            # Calculate current learning rate (exponential)
            current_lr = min_lr * (lr_multiplier ** batch_idx)
            
            # Ensure we don't exceed max_lr due to floating point errors
            current_lr = min(current_lr, max_lr)
            
            # Update optimizer learning rate
            for param_group in optimizer.param_groups:
                param_group['lr'] = current_lr
            
            # Forward pass
            optimizer.zero_grad()
            output = self.model(data)
            loss = self.criterion(output, target)
            
            # Backward pass
            loss.backward()
            
            # Get gradient norm before optimizer step
            grad_norm = self._get_gradient_norm()
            
            # Optimizer step
            optimizer.step()
            
            # Store results
            current_loss = loss.item()
            self.results['learning_rates'].append(current_lr)
            self.results['losses'].append(current_loss)
            self.results['gradients'].append(grad_norm)
            
            # Calculate smooth loss
            if len(self.results['smooth_losses']) == 0:
                smooth_loss = current_loss
            else:
                smooth_loss = (smooth_factor * self.results['smooth_losses'][-1] + 
                             (1 - smooth_factor) * current_loss)
            self.results['smooth_losses'].append(smooth_loss)
            
            # Update progress bar
            pbar.set_postfix({
                'LR': f'{current_lr:.2e}',
                'Loss': f'{current_loss:.4f}',
                'Smooth': f'{smooth_loss:.4f}',
                'Mult': f'{lr_multiplier:.4f}'
            })
            
            # Early stopping check
            if self._should_stop_early(current_loss, threshold=stop_div_factor):
                print(f"\n🛑 Early stopping at batch {batch_idx} (loss exploded)")
                break
            
            # Update best loss
            if smooth_loss < self.best_loss:
                self.best_loss = smooth_loss
        
        pbar.close()
        
        # Find optimal learning rate
        self.results['optimal_lr'], self.results['optimal_lr_reason'] = self._find_optimal_lr_exponential()
        
        # Store experiment info
        self.results['experiment_info'] = {
            'min_lr': min_lr,
            'max_lr': max_lr,
            'num_batches_tested': len(self.results['learning_rates']),
            'lr_multiplier': lr_multiplier,
            'stop_div_factor': stop_div_factor,
            'smooth_factor': smooth_factor
        }
        
        # Restore model state
        self._restore_model()
        
        print(f"\n✅ Exponential LR Range Test completed!")
        print(f"🎯 Optimal LR: {self.results['optimal_lr']:.2e}")
        print(f"📝 Reason: {self.results['optimal_lr_reason']}")
        
        return self.results.copy()
    
    def _find_optimal_lr_exponential(self):
        """
        Find optimal LR for exponential test - accounts for logarithmic scale
        """
        if len(self.results['smooth_losses']) < 10:
            return self.results['learning_rates'][len(self.results['losses'])//2], "Middle of range (insufficient data)"
        
        smooth_losses = np.array(self.results['smooth_losses'])
        learning_rates = np.array(self.results['learning_rates'])
        
        # Convert to log scale for exponential analysis
        log_lrs = np.log10(learning_rates)
        
        # Find steepest descent in log space
        log_gradients = np.gradient(smooth_losses, log_lrs)
        min_gradient_idx = np.argmin(log_gradients)
        
        # For exponential, be more aggressive: use steepest descent directly
        # (the minimum-loss index is not needed here)
        optimal_idx = min_gradient_idx
        
        # Safety bounds
        optimal_idx = max(2, min(optimal_idx, len(learning_rates) - 3))
        
        optimal_lr = learning_rates[optimal_idx]
        reason = f"Steepest descent in log-space at step {optimal_idx} (loss: {smooth_losses[optimal_idx]:.4f})"
        
        return optimal_lr, reason
    
    def plot_results(self, save_plot=True, show_plot=True):
        """Plot the results of exponential LR range test"""
        if not self.results['learning_rates']:
            print("❌ No results to plot. Run find_lr() first.")
            return
        
        fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(16, 12))
        
        # Plot 1: Loss vs Learning Rate (log scale)
        ax1.semilogx(self.results['learning_rates'], self.results['losses'], 
                     'b-', alpha=0.6, label='Raw Loss')
        ax1.semilogx(self.results['learning_rates'], self.results['smooth_losses'], 
                     'r-', linewidth=2, label='Smoothed Loss')
        
        if self.results['optimal_lr']:
            ax1.axvline(x=self.results['optimal_lr'], color='green', linestyle='--', 
                       linewidth=2, label=f'Optimal LR: {self.results["optimal_lr"]:.2e}')
        
        ax1.set_xlabel('Learning Rate')
        ax1.set_ylabel('Loss')
        ax1.set_title('Exponential LR Test: Loss vs Learning Rate')
        ax1.legend()
        ax1.grid(True, alpha=0.3)
        
        # Plot 2: Loss vs Batch (time series)
        batches = range(len(self.results['losses']))
        ax2.plot(batches, self.results['losses'], 'b-', alpha=0.6, label='Raw Loss')
        ax2.plot(batches, self.results['smooth_losses'], 'r-', linewidth=2, label='Smoothed Loss')
        ax2.set_xlabel('Batch Number')
        ax2.set_ylabel('Loss')
        ax2.set_title('Loss vs Training Steps')
        ax2.legend()
        ax2.grid(True, alpha=0.3)
        
        # Plot 3: Gradient Norm vs Learning Rate
        ax3.semilogx(self.results['learning_rates'], self.results['gradients'], 
                     'purple', alpha=0.7, label='Gradient Norm')
        ax3.set_xlabel('Learning Rate')
        ax3.set_ylabel('Gradient Norm') 
        ax3.set_title('Gradient Norm vs Learning Rate')
        ax3.legend()
        ax3.grid(True, alpha=0.3)
        
        # Plot 4: Learning Rate Schedule
        ax4.semilogy(batches, self.results['learning_rates'], 'orange', linewidth=2, 
                    label='LR Schedule (Exponential)')
        ax4.set_xlabel('Batch Number')
        ax4.set_ylabel('Learning Rate')
        ax4.set_title('Exponential Learning Rate Schedule')
        ax4.legend()
        ax4.grid(True, alpha=0.3)
        
        plt.tight_layout()
        
        if save_plot:
            timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
            plot_path = BASE_DIR / 'plots' / f'exponential_lr_test_{timestamp}.png'
            plt.savefig(plot_path, dpi=300, bbox_inches='tight')
            print(f"💾 Plot saved to: {plot_path}")
        
        if show_plot:
            plt.show()
        else:
            plt.close()
        
        return fig

print("✅ Exponential LR Range Test implemented successfully!")
print("🚀 Features: Exponential LR growth, log-scale analysis, aggressive search")
print("📊 Use: finder.find_lr(train_loader, min_lr=1e-7, max_lr=10.0)")

## 6. Implement Cyclical LR Range Test

The **Cyclical LR Range Test** uses cyclical patterns to explore learning rate ranges. It's based on the idea that:

1. **Cyclical patterns**: LR oscillates between min and max values
2. **Multiple explorations**: Each cycle explores the LR range differently
3. **Pattern analysis**: Different patterns (triangular, cosine) reveal different insights
4. **Robust testing**: Less sensitive to single bad batches
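
Part of that robustness comes from smoothing. The sketch below reproduces, as a standalone function, the exponential moving average that the implementation later in this section applies to the raw losses, so the effect of one bad batch can be traced by hand. The name `ema_smooth` and the sample loss values are illustrative assumptions, not part of the toolkit.

```python
def ema_smooth(losses, smooth_factor=0.98):
    """Exponentially smooth a loss sequence, seeding with the first raw loss."""
    smoothed = []
    for loss in losses:
        if not smoothed:
            # The first value has no history, so it is used as-is
            smoothed.append(loss)
        else:
            smoothed.append(smooth_factor * smoothed[-1] + (1 - smooth_factor) * loss)
    return smoothed


# Hypothetical losses: steady at 1.0 with a single spike to 10.0
raw_losses = [1.0] * 5 + [10.0] + [1.0] * 5
smoothed_losses = ema_smooth(raw_losses)
print(f"Raw peak: {max(raw_losses):.2f}, smoothed peak: {max(smoothed_losses):.2f}")
```

With `smooth_factor=0.98` the spike moves the smoothed value only from 1.0 to about 1.18, which is why a single bad batch rarely shifts the suggested LR. The trade-off is lag: a factor this high averages over roughly the last 50 batches, so short runs may call for a smaller value.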

### When to use:
- ✅ **Noisy datasets** - when single LR sweeps are unreliable
- ✅ **Robust estimation** - want multiple confirmations of optimal LR
- ✅ **Cyclical training plans** - planning to use cyclical LR schedules
- ✅ **Research purposes** - comparing different cyclical patterns
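
One way to get those multiple confirmations is to pool the observations from every cycle into log-spaced LR bins and keep only bins that were visited more than once. The implementation later in this section follows the same idea with NumPy. The version below is a minimal stdlib sketch: `best_lr_by_bin` is a hypothetical helper, the data is made up, and returning the geometric centre of the winning bin is a choice made for this sketch.

```python
import math
from collections import defaultdict


def best_lr_by_bin(lrs, losses, num_bins=50, min_samples=2):
    """Average losses within log-spaced LR bins and return the best bin's centre."""
    log_lo, log_hi = math.log10(min(lrs)), math.log10(max(lrs))
    width = (log_hi - log_lo) / num_bins or 1.0  # avoid zero width if all LRs are equal
    bins = defaultdict(list)
    for lr, loss in zip(lrs, losses):
        idx = min(int((math.log10(lr) - log_lo) / width), num_bins - 1)
        bins[idx].append(loss)
    # Keep only bins confirmed by at least `min_samples` observations
    averaged = {i: sum(v) / len(v) for i, v in bins.items() if len(v) >= min_samples}
    if not averaged:
        return None
    best = min(averaged, key=averaged.get)
    # Geometric centre of the winning bin
    return 10 ** (log_lo + (best + 0.5) * width)


# Hypothetical data: two cycles visiting the same three learning rates
lrs = [1e-4, 1e-3, 1e-2, 1e-4, 1e-3, 1e-2]
losses = [2.0, 1.0, 3.0, 2.2, 1.2, 3.1]
suggested = best_lr_by_bin(lrs, losses, num_bins=3)
print(f"Suggested LR: {suggested:.2e}")
```

Requiring at least two samples per bin is what ties the suggestion to repeated evidence: an LR that looked good in only one cycle is ignored.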

### Patterns Available:
1. **Triangular**: Linear up, linear down (classic cyclical LR)
2. **Cosine**: Smooth cosine annealing pattern
3. **Exponential**: Exponential up, exponential down
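
To see how the three patterns differ before reading the full class, the sketch below evaluates each one at a few positions in a cycle. It mirrors the formulas used in the implementation that follows, but `cyclical_lr` is a standalone helper written for illustration and the LR bounds are example values.

```python
import math


def cyclical_lr(step, cycle_length, min_lr, max_lr, pattern="triangular"):
    """Return the learning rate at `step` for one of the three cycle patterns."""
    pos = (step % cycle_length) / cycle_length  # position within the cycle, in [0, 1)
    # Fraction that rises 0 -> 1 over the first half and falls 1 -> 0 over the second
    up_down = pos * 2 if pos <= 0.5 else 2 - pos * 2
    if pattern == "triangular":
        return min_lr + (max_lr - min_lr) * up_down
    if pattern == "cosine":
        return min_lr + (max_lr - min_lr) * (1 + math.cos(math.pi * pos)) / 2
    if pattern == "exponential":
        return min_lr * (max_lr / min_lr) ** up_down
    raise ValueError(f"Unknown pattern: {pattern}")


# Example bounds (assumed for illustration): 1e-4 to 1e-1, 10 steps per cycle
for pattern in ("triangular", "cosine", "exponential"):
    values = [cyclical_lr(step, 10, 1e-4, 1e-1, pattern) for step in (0, 5, 9)]
    print(pattern, [f"{v:.2e}" for v in values])
```

Triangular and exponential both start at `min_lr` and peak at mid-cycle. The cosine formula as written in this notebook starts each cycle at `max_lr` and anneals toward `min_lr`, so its first batches already run at the highest learning rate.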

In [None]:
class CyclicalLRFinder(BaseLRFinder):
    """
    Cyclical Learning Rate Range Test
    
    Uses cyclical patterns to explore learning rate ranges
    """
    
    def __init__(self, model, criterion, optimizer_class=optim.Adam, device=None):
        super().__init__(model, criterion, optimizer_class, device)
        self.results['method'] = 'Cyclical_LR_Range_Test'
    
    def find_lr(self, train_loader, min_lr=1e-7, max_lr=1.0, num_cycles=2, 
                cycle_pattern='triangular', num_batches=None, stop_div_factor=4.0, 
                smooth_factor=0.98):
        """
        Find optimal learning rate using cyclical range test
        
        Args:
            train_loader: DataLoader for training data
            min_lr: Minimum learning rate in cycles
            max_lr: Maximum learning rate in cycles
            num_cycles: Number of complete cycles to perform
            cycle_pattern: 'triangular', 'cosine', or 'exponential'
            num_batches: Number of batches to test (default: full epoch * num_cycles)
            stop_div_factor: Divergence factor for early stopping (doubled internally, so the
                threshold passed to the check is stop_div_factor * 2)
            smooth_factor: Smoothing factor for loss
        
        Returns:
            dict: Results with optimal LR and loss curves
        """
        print(f"🔄 Starting Cyclical LR Range Test...")
        print(f"📊 LR Range: {min_lr:.2e} ↔ {max_lr:.2e}")
        print(f"🎯 Pattern: {cycle_pattern.title()}, Cycles: {num_cycles}")
        
        # Prepare model and reset results
        self._prepare_model()
        self.results = {key: [] for key in ['learning_rates', 'losses', 'gradients', 'smooth_losses']}
        self.results['method'] = 'Cyclical_LR_Range_Test'
        self.best_loss = float('inf')
        
        # Determine number of batches
        if num_batches is None:
            num_batches = len(train_loader) * num_cycles
        num_batches = min(num_batches, len(train_loader) * num_cycles)
        
        print(f"🎯 Testing over {num_batches} batches ({num_batches//num_cycles} per cycle)")
        
        # Create optimizer with minimum LR
        optimizer = self.optimizer_class(self.model.parameters(), lr=min_lr)
        
        # Define cyclical LR functions
        def triangular_lr(batch_num, cycle_length):
            cycle_pos = (batch_num % cycle_length) / cycle_length
            if cycle_pos <= 0.5:
                # First half: increase linearly
                return min_lr + (max_lr - min_lr) * (cycle_pos * 2)
            else:
                # Second half: decrease linearly
                return max_lr - (max_lr - min_lr) * ((cycle_pos - 0.5) * 2)
        
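        def cosine_lr(batch_num, cycle_length):
            # Cosine annealing: each cycle starts at max_lr (cycle_pos = 0)
            # and decays smoothly toward min_lr before restarting
            cycle_pos = (batch_num % cycle_length) / cycle_length
            return min_lr + (max_lr - min_lr) * (1 + np.cos(np.pi * cycle_pos)) / 2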
        
        def exponential_lr(batch_num, cycle_length):
            cycle_pos = (batch_num % cycle_length) / cycle_length
            if cycle_pos <= 0.5:
                # First half: exponential increase
                t = cycle_pos * 2
                return min_lr * ((max_lr / min_lr) ** t)
            else:
                # Second half: exponential decrease
                t = 1 - (cycle_pos - 0.5) * 2
                return min_lr * ((max_lr / min_lr) ** t)
        
        # Select pattern function
        pattern_funcs = {
            'triangular': triangular_lr,
            'cosine': cosine_lr,
            'exponential': exponential_lr
        }
        
        if cycle_pattern not in pattern_funcs:
            print(f"⚠️ Unknown pattern '{cycle_pattern}', using 'triangular'")
            cycle_pattern = 'triangular'
        
        lr_func = pattern_funcs[cycle_pattern]
        # Guard against a zero-length cycle when num_batches < num_cycles
        cycle_length = max(1, num_batches // num_cycles)
        
        # Training loop with progress bar
        pbar = tqdm(range(num_batches), desc=f"Cyclical LR Test ({cycle_pattern})")
        
        data_iter = iter(train_loader)
        
        for batch_idx in pbar:
            try:
                data, target = next(data_iter)
            except StopIteration:
                # Reset iterator when we reach the end
                data_iter = iter(train_loader)
                data, target = next(data_iter)
            
            # Move data to device
            data, target = data.to(self.device), target.to(self.device)
            
            # Calculate current learning rate using cyclical pattern
            current_lr = lr_func(batch_idx, cycle_length)
            
            # Update optimizer learning rate
            for param_group in optimizer.param_groups:
                param_group['lr'] = current_lr
            
            # Forward pass
            optimizer.zero_grad()
            output = self.model(data)
            loss = self.criterion(output, target)
            
            # Backward pass
            loss.backward()
            
            # Get gradient norm
            grad_norm = self._get_gradient_norm()
            
            # Optimizer step
            optimizer.step()
            
            # Store results
            current_loss = loss.item()
            self.results['learning_rates'].append(current_lr)
            self.results['losses'].append(current_loss)
            self.results['gradients'].append(grad_norm)
            
            # Calculate smooth loss
            if len(self.results['smooth_losses']) == 0:
                smooth_loss = current_loss
            else:
                smooth_loss = (smooth_factor * self.results['smooth_losses'][-1] + 
                             (1 - smooth_factor) * current_loss)
            self.results['smooth_losses'].append(smooth_loss)
            
            # Update progress bar
            cycle_num = batch_idx // cycle_length + 1
            cycle_pos = (batch_idx % cycle_length) / cycle_length
            pbar.set_postfix({
                'Cycle': f'{cycle_num}/{num_cycles}',
                'Pos': f'{cycle_pos:.2f}',
                'LR': f'{current_lr:.2e}',
                'Loss': f'{current_loss:.4f}'
            })
            
            # Early stopping check (but be more lenient for cyclical)
            if self._should_stop_early(current_loss, threshold=stop_div_factor * 2):
                print(f"\n🛑 Early stopping at batch {batch_idx} (loss exploded)")
                break
            
            # Update best loss
            if smooth_loss < self.best_loss:
                self.best_loss = smooth_loss
        
        pbar.close()
        
        # Find optimal learning rate using cyclical-specific method
        self.results['optimal_lr'], self.results['optimal_lr_reason'] = self._find_optimal_lr_cyclical()
        
        # Store experiment info
        self.results['experiment_info'] = {
            'min_lr': min_lr,
            'max_lr': max_lr,
            'num_cycles': num_cycles,
            'cycle_pattern': cycle_pattern,
            'cycle_length': cycle_length,
            'num_batches_tested': len(self.results['learning_rates']),
            'stop_div_factor': stop_div_factor,
            'smooth_factor': smooth_factor
        }
        
        # Restore model state
        self._restore_model()
        
        print(f"\n✅ Cyclical LR Range Test completed!")
        print(f"🎯 Optimal LR: {self.results['optimal_lr']:.2e}")
        print(f"📝 Reason: {self.results['optimal_lr_reason']}")
        
        return self.results.copy()
    
    def _find_optimal_lr_cyclical(self):
        """Find optimal LR for cyclical test - considers all cycles"""
        if len(self.results['smooth_losses']) < 20:
            return self.results['learning_rates'][len(self.results['losses'])//2], "Middle of range (insufficient data)"
        
        smooth_losses = np.array(self.results['smooth_losses'])
        learning_rates = np.array(self.results['learning_rates'])
        
        # Group by LR ranges to find consistent performers
        lr_loss_pairs = list(zip(learning_rates, smooth_losses))
        
        # Find the LR that consistently gives good performance
        lr_bins = np.logspace(np.log10(min(learning_rates)), np.log10(max(learning_rates)), 50)
        bin_losses = defaultdict(list)
        
        for lr, loss in lr_loss_pairs:
            # Find which bin this LR belongs to
            bin_idx = np.digitize(lr, lr_bins) - 1
            bin_idx = max(0, min(bin_idx, len(lr_bins) - 1))
            bin_losses[bin_idx].append(loss)
        
        # Calculate average loss for each bin
        bin_avg_losses = {}
        for bin_idx, losses in bin_losses.items():
            if len(losses) >= 2:  # Need at least 2 samples
                bin_avg_losses[bin_idx] = np.mean(losses)
        
        if not bin_avg_losses:
            # Fallback to simple method
            return self._find_optimal_lr_simple()
        
        # Find bin with minimum average loss
        best_bin_idx = min(bin_avg_losses.keys(), key=lambda x: bin_avg_losses[x])
        optimal_lr = lr_bins[best_bin_idx]
        
        reason = f"Consistently good performance across cycles (avg loss: {bin_avg_losses[best_bin_idx]:.4f})"
        
        return optimal_lr, reason
    
    def plot_results(self, save_plot=True, show_plot=True):
        """Plot the results of cyclical LR range test"""
        if not self.results['learning_rates']:
            print("❌ No results to plot. Run find_lr() first.")
            return
        
        fig = plt.figure(figsize=(18, 12))
        gs = GridSpec(3, 3, figure=fig)
        
        # Main plot: Loss vs Learning Rate (cyclical)
        ax1 = fig.add_subplot(gs[0, :2])
        
        # Color-code by cycle if we have cycle info
        if 'cycle_length' in self.results.get('experiment_info', {}):
            cycle_length = self.results['experiment_info']['cycle_length']
            num_cycles = self.results['experiment_info']['num_cycles']
            
            colors = plt.cm.Set1(np.linspace(0, 1, num_cycles))
            
            for cycle in range(num_cycles):
                start_idx = cycle * cycle_length
                end_idx = min(start_idx + cycle_length, len(self.results['learning_rates']))
                
                if start_idx < len(self.results['learning_rates']):
                    cycle_lrs = self.results['learning_rates'][start_idx:end_idx]
                    cycle_losses = self.results['smooth_losses'][start_idx:end_idx]
                    
                    ax1.semilogx(cycle_lrs, cycle_losses, color=colors[cycle], 
                               linewidth=2, label=f'Cycle {cycle+1}', alpha=0.8)
        else:
            ax1.semilogx(self.results['learning_rates'], self.results['smooth_losses'], 
                        'b-', linewidth=2, label='All Cycles')
        
        if self.results.get('optimal_lr'):
            ax1.axvline(x=self.results['optimal_lr'], color='green', linestyle='--', 
                       linewidth=3, label=f'Optimal LR: {self.results["optimal_lr"]:.2e}')
        
        ax1.set_xlabel('Learning Rate')
        ax1.set_ylabel('Smoothed Loss')
        ax1.set_title(f'Cyclical LR Test: {self.results.get("experiment_info", {}).get("cycle_pattern", "Unknown").title()} Pattern')
        ax1.legend()
        ax1.grid(True, alpha=0.3)
        
        # Time series plot
        ax2 = fig.add_subplot(gs[1, :2])
        batches = range(len(self.results['losses']))
        ax2.plot(batches, self.results['losses'], 'lightblue', alpha=0.6, label='Raw Loss')
        ax2.plot(batches, self.results['smooth_losses'], 'red', linewidth=2, label='Smoothed Loss')
        ax2.set_xlabel('Batch Number')
        ax2.set_ylabel('Loss')
        ax2.set_title('Loss vs Training Steps')
        ax2.legend()
        ax2.grid(True, alpha=0.3)
        
        # Learning rate schedule
        ax3 = fig.add_subplot(gs[2, :2])
        ax3.semilogy(batches, self.results['learning_rates'], 'orange', linewidth=2, 
                    label=f'{self.results.get("experiment_info", {}).get("cycle_pattern", "Unknown").title()} LR Schedule')
        ax3.set_xlabel('Batch Number')
        ax3.set_ylabel('Learning Rate')
        ax3.set_title('Cyclical Learning Rate Schedule')
        ax3.legend()
        ax3.grid(True, alpha=0.3)
        
        # Gradient norms
        ax4 = fig.add_subplot(gs[0, 2])
        ax4.semilogx(self.results['learning_rates'], self.results['gradients'], 
                    'purple', alpha=0.7, label='Gradient Norm')
        ax4.set_xlabel('Learning Rate')
        ax4.set_ylabel('Gradient Norm')
        ax4.set_title('Gradient vs LR')
        ax4.legend()
        ax4.grid(True, alpha=0.3)
        
        # Loss distribution by LR range
        ax5 = fig.add_subplot(gs[1:, 2])
        
        # Create LR bins and plot loss distribution
        lr_array = np.array(self.results['learning_rates'])
        loss_array = np.array(self.results['smooth_losses'])
        
        # Create scatter plot with color mapping
        scatter = ax5.scatter(lr_array, loss_array, c=range(len(lr_array)), 
                            cmap='viridis', alpha=0.6, s=10)
        ax5.set_xscale('log')
        ax5.set_xlabel('Learning Rate')
        ax5.set_ylabel('Smoothed Loss')
        ax5.set_title('Loss Distribution\n(Color = Time)')
        plt.colorbar(scatter, ax=ax5, label='Batch Number')
        ax5.grid(True, alpha=0.3)
        
        plt.tight_layout()
        
        if save_plot:
            timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
            plot_path = BASE_DIR / 'plots' / f'cyclical_lr_test_{timestamp}.png'
            plt.savefig(plot_path, dpi=300, bbox_inches='tight')
            print(f"💾 Plot saved to: {plot_path}")
        
        if show_plot:
            plt.show()
        else:
            plt.close()
        
        return fig

print("✅ Cyclical LR Range Test implemented successfully!")
print("🔄 Features: Multiple cycles, pattern options (triangular/cosine/exponential)")
print("📊 Use: finder.find_lr(train_loader, min_lr=1e-5, max_lr=1.0, num_cycles=2, cycle_pattern='triangular')")

## 7. Create Sample Dataset and Model

Let's create a sample dataset and model to demonstrate our LR finder toolkit. We'll use:

1. **CIFAR-10 style data**: Small images with 10 classes
2. **Flexible model**: CNN that works well for demonstrations
3. **Synthetic option**: Generate data if CIFAR-10 isn't available
4. **Any model support**: Show how to use with different architectures
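
Before relying on the synthetic option, it helps to know how separable its classes are. The generator in the code cell below stamps patches onto noise images according to whether the class index is divisible by 2, 3, or 5. The sketch below groups classes by which patches they receive; `pattern_signature` and `group_classes_by_signature` are hypothetical helpers written for this check.

```python
from collections import defaultdict


def pattern_signature(class_id):
    """Which of the three patches the synthetic generator stamps for this class."""
    return (
        class_id % 2 == 0,  # red corner
        class_id % 3 == 0,  # green corner
        class_id % 5 == 0,  # blue centre
    )


def group_classes_by_signature(num_classes=10):
    """Map each patch signature to the list of classes that share it."""
    groups = defaultdict(list)
    for class_id in range(num_classes):
        groups[pattern_signature(class_id)].append(class_id)
    return dict(groups)


groups = group_classes_by_signature()
for signature, classes in groups.items():
    print(signature, classes)
```

For ten classes this yields six distinct signatures: classes 1 and 7 receive no patch at all, and classes 2, 4, and 8 share the red corner. A model that relies only on the patches cannot separate every class, so the loss on this data is likely to plateau above zero. That is fine for exercising the LR finders, but worth keeping in mind when reading the loss curves.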

### Model Features:
- 🏗️ **Configurable**: Easy to modify architecture
- 🎯 **Realistic**: Represents real-world training scenarios  
- 🔧 **Lightweight**: Fast enough for demonstrations
- 📊 **Informative**: Shows clear LR sensitivity

In [None]:
# Sample CNN Model for demonstration
class SampleCNN(nn.Module):
    """
    Sample CNN model for demonstrating LR finder
    - Works with any input size of at least 4x4 pixels (default: 32x32)
    - Configurable width via base_channels (depth is fixed at three conv blocks)
    - Good for LR sensitivity analysis
    """
    
    def __init__(self, num_classes=10, input_channels=3, base_channels=32):
        super(SampleCNN, self).__init__()
        
        # Convolutional layers
        self.conv1 = nn.Conv2d(input_channels, base_channels, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(base_channels)
        
        self.conv2 = nn.Conv2d(base_channels, base_channels*2, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(base_channels*2)
        
        self.conv3 = nn.Conv2d(base_channels*2, base_channels*4, 3, padding=1)
        self.bn3 = nn.BatchNorm2d(base_channels*4)
        
        # Adaptive pooling to handle any input size
        self.adaptive_pool = nn.AdaptiveAvgPool2d((4, 4))
        
        # Classifier
        self.dropout = nn.Dropout(0.3)
        self.fc = nn.Linear(base_channels*4 * 16, num_classes)  # 4*4 = 16
        
    def forward(self, x):
        # Conv block 1
        x = F.relu(self.bn1(self.conv1(x)))
        x = F.max_pool2d(x, 2)
        
        # Conv block 2  
        x = F.relu(self.bn2(self.conv2(x)))
        x = F.max_pool2d(x, 2)
        
        # Conv block 3
        x = F.relu(self.bn3(self.conv3(x)))
        
        # Adaptive pooling and classifier
        x = self.adaptive_pool(x)
        x = x.view(x.size(0), -1)
        x = self.dropout(x)
        x = self.fc(x)
        
        return x

def create_synthetic_dataset(num_samples=1000, image_size=32, num_classes=10, batch_size=32):
    """
    Create a synthetic dataset for demonstration
    """
    print(f"🎨 Creating synthetic dataset...")
    print(f"   📊 Samples: {num_samples}, Classes: {num_classes}")
    print(f"   🖼️ Image size: {image_size}x{image_size}")
    
    # Generate random images with some structure
    images = torch.randn(num_samples, 3, image_size, image_size)
    
    # Add some patterns to make it learnable
    for i in range(num_samples):
        class_id = i % num_classes
        # Add class-specific patterns
        if class_id % 2 == 0:
            images[i, 0, :5, :5] = 1.0  # Red corner for even classes
        if class_id % 3 == 0:
            images[i, 1, -5:, -5:] = 1.0  # Green corner for multiples of 3
        if class_id % 5 == 0:
            images[i, 2, 10:20, 10:20] = 1.0  # Blue center for multiples of 5
    
    # Generate labels
    labels = torch.tensor([i % num_classes for i in range(num_samples)])
    
    # Create dataset and dataloader
    dataset = TensorDataset(images, labels)
    dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    
    print(f"✅ Synthetic dataset created: {len(dataset)} samples, {len(dataloader)} batches")
    return dataloader, dataset

def create_cifar10_dataset(batch_size=32, subset_size=None):
    """
    Create CIFAR-10 dataset (falls back to synthetic if not available)
    """
    try:
        import torchvision
        import torchvision.transforms as transforms
        
        print("🖼️ Loading CIFAR-10 dataset...")
        
        # Simple transforms for demonstration
        transform = transforms.Compose([
            transforms.ToTensor(),
            transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
        ])
        
        # Load CIFAR-10
        trainset = torchvision.datasets.CIFAR10(
            root='./data', train=True, download=True, transform=transform
        )
        
        # Use subset if specified
        if subset_size and subset_size < len(trainset):
            indices = torch.randperm(len(trainset))[:subset_size]
            trainset = torch.utils.data.Subset(trainset, indices)
            print(f"📊 Using subset: {subset_size} samples")
        
        trainloader = DataLoader(trainset, batch_size=batch_size, shuffle=True)
        
        print(f"✅ CIFAR-10 loaded: {len(trainset)} samples, {len(trainloader)} batches")
        return trainloader, trainset
        
    except Exception as e:
        print(f"⚠️ Could not load CIFAR-10: {e}")
        print("🎨 Falling back to synthetic dataset...")
        return create_synthetic_dataset(num_samples=1000, batch_size=batch_size)

def create_sample_models():
    """
    Create various sample models to demonstrate LR finder flexibility
    """
    models = {}
    
    # 1. Small CNN
    models['small_cnn'] = SampleCNN(num_classes=10, base_channels=16)
    
    # 2. Medium CNN
    models['medium_cnn'] = SampleCNN(num_classes=10, base_channels=32)
    
    # 3. Large CNN
    models['large_cnn'] = SampleCNN(num_classes=10, base_channels=64)
    
    # 4. Simple MLP (for comparison)
    class SimpleMLP(nn.Module):
        def __init__(self, input_size=3*32*32, hidden_size=512, num_classes=10):
            super().__init__()
            self.flatten = nn.Flatten()
            self.fc1 = nn.Linear(input_size, hidden_size)
            self.fc2 = nn.Linear(hidden_size, hidden_size//2)
            self.fc3 = nn.Linear(hidden_size//2, num_classes)
            self.dropout = nn.Dropout(0.3)
            
        def forward(self, x):
            x = self.flatten(x)
            x = F.relu(self.fc1(x))
            x = self.dropout(x)
            x = F.relu(self.fc2(x))
            x = self.dropout(x)
            x = self.fc3(x)
            return x
    
    models['simple_mlp'] = SimpleMLP()
    
    # Print model info
    print("🏗️ Sample models created:")
    for name, model in models.items():
        num_params = sum(p.numel() for p in model.parameters())
        print(f"   {name}: {num_params:,} parameters")
    
    return models

# Create dataset and models
print("🚀 Setting up demonstration environment...")

# Create dataset (try CIFAR-10, fallback to synthetic)
train_loader, train_dataset = create_cifar10_dataset(batch_size=64, subset_size=2000)

# Create sample models
sample_models = create_sample_models()

# Choose a default model for demonstrations
demo_model = sample_models['medium_cnn'].to(device)
criterion = nn.CrossEntropyLoss()

print(f"\n✅ Demo setup complete!")
print(f"📊 Dataset: {len(train_dataset)} samples, {len(train_loader)} batches")
print(f"🏗️ Demo model: {demo_model.__class__.__name__} with {sum(p.numel() for p in demo_model.parameters()):,} parameters")
print(f"🎯 Device: {device}")
print(f"📋 Loss function: {criterion.__class__.__name__}")

# Quick test to ensure everything works
print("\n🧪 Quick functionality test...")
sample_batch = next(iter(train_loader))
test_input, test_target = sample_batch[0][:4].to(device), sample_batch[1][:4].to(device)
with torch.no_grad():
    test_output = demo_model(test_input)
    test_loss = criterion(test_output, test_target)
print(f"✅ Test passed - Loss: {test_loss.item():.4f}")
print(f"🔧 Ready for LR finding experiments!")

## 8. Execute Different LR Finding Methods

Now let's run all our LR finding methods on the sample model and collect results for comparison. We'll execute:

1. **Linear LR Range Test** - Conservative, thorough exploration
2. **Exponential LR Range Test** - Fast, aggressive exploration  
3. **Cyclical LR Range Test** - Robust, pattern-based exploration

### Execution Strategy:
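- 🎯 **Same conditions**: All methods use the same architecture, optimizer, and data (each starts from its own fresh random initialization, so initial weights differ unless you fix the seed)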
- 📊 **Consistent parameters**: Similar LR ranges and batch counts
- 💾 **Full tracking**: Save all results for detailed comparison
- 🔄 **Model reset**: Fresh start for each method
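
When the methods disagree, their optimal LRs are best combined on a log scale, because learning rates span orders of magnitude and an arithmetic mean would be dominated by the largest value. The sketch below shows the geometric mean and log-space spread that the code cell below computes with NumPy, using only the standard library. `consensus_lr` is a hypothetical helper and the three LR values are made up for illustration.

```python
import math
import statistics


def consensus_lr(optimal_lrs):
    """Return (geometric mean, log-space std) of the positive, non-None LRs."""
    values = [lr for lr in optimal_lrs.values() if lr is not None and lr > 0]
    if not values:
        return None
    logs = [math.log(v) for v in values]
    geo_mean = math.exp(statistics.fmean(logs))
    log_std = statistics.pstdev(logs)  # population std, matching np.std's default
    return geo_mean, log_std


# Hypothetical results from three methods
example_lrs = {"Linear": 1e-4, "Exponential": 1e-3, "Cyclical": 1e-2}
geo_mean, log_std = consensus_lr(example_lrs)
arithmetic_mean = statistics.fmean(example_lrs.values())
print(f"Geometric mean: {geo_mean:.2e}, arithmetic mean: {arithmetic_mean:.2e}")
print(f"Log-space std: {log_std:.3f}")
```

For these values the geometric mean sits at the middle estimate (1e-3), while the arithmetic mean is pulled toward the largest one. A large log-space spread is a hint that the methods disagree and the consensus deserves less trust.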

In [None]:
# Execute all LR finding methods
print("🚀 Starting comprehensive LR finding experiment...")
print("=" * 60)

# Define common parameters for fair comparison
common_params = {
    'min_lr': 1e-6,
    'max_lr': 1.0,
    'num_batches': 100,  # Limit for demonstration
    'stop_div_factor': 4.0,
    'smooth_factor': 0.98
}

print(f"📊 Common parameters:")
for key, value in common_params.items():
    print(f"   {key}: {value}")
print()

# Storage for all results
all_results = {}
all_optimal_lrs = {}

# Method 1: Linear LR Range Test
print("🔍 Method 1: Linear LR Range Test")
print("-" * 40)

# Create fresh model for this method
linear_model = SampleCNN(num_classes=10, base_channels=32).to(device)
linear_finder = LinearLRFinder(linear_model, criterion, optim.Adam, device)

# Run linear test
linear_results = linear_finder.find_lr(
    train_loader,
    min_lr=common_params['min_lr'],
    max_lr=common_params['max_lr'],
    num_batches=common_params['num_batches'],
    stop_div_factor=common_params['stop_div_factor'],
    smooth_factor=common_params['smooth_factor']
)

# Save results
linear_finder.save_results("method_comparison_linear")
all_results['Linear'] = linear_results
all_optimal_lrs['Linear'] = linear_results['optimal_lr']

print(f"✅ Linear method completed. Optimal LR: {linear_results['optimal_lr']:.2e}")
print()

# Method 2: Exponential LR Range Test  
print("🚀 Method 2: Exponential LR Range Test")
print("-" * 40)

# Create fresh model for this method
exp_model = SampleCNN(num_classes=10, base_channels=32).to(device)
exp_finder = ExponentialLRFinder(exp_model, criterion, optim.Adam, device)

# Run exponential test
exp_results = exp_finder.find_lr(
    train_loader,
    min_lr=common_params['min_lr'],
    max_lr=common_params['max_lr'],
    num_batches=common_params['num_batches'],
    stop_div_factor=common_params['stop_div_factor'],
    smooth_factor=common_params['smooth_factor']
)

# Save results
exp_finder.save_results("method_comparison_exponential")
all_results['Exponential'] = exp_results
all_optimal_lrs['Exponential'] = exp_results['optimal_lr']

print(f"✅ Exponential method completed. Optimal LR: {exp_results['optimal_lr']:.2e}")
print()

# Method 3: Cyclical LR Range Test
print("🔄 Method 3: Cyclical LR Range Test")
print("-" * 40)

# Create fresh model for this method
cyclical_model = SampleCNN(num_classes=10, base_channels=32).to(device)
cyclical_finder = CyclicalLRFinder(cyclical_model, criterion, optim.Adam, device)

# Run cyclical test with triangular pattern
cyclical_results = cyclical_finder.find_lr(
    train_loader,
    min_lr=common_params['min_lr'],
    max_lr=common_params['max_lr'], 
    num_cycles=2,
    cycle_pattern='triangular',
    num_batches=common_params['num_batches'],
    stop_div_factor=common_params['stop_div_factor'],
    smooth_factor=common_params['smooth_factor']
)

# Save results
cyclical_finder.save_results("method_comparison_cyclical")
all_results['Cyclical'] = cyclical_results
all_optimal_lrs['Cyclical'] = cyclical_results['optimal_lr']

print(f"✅ Cyclical method completed. Optimal LR: {cyclical_results['optimal_lr']:.2e}")
print()

# Summary of all methods
print("📋 EXPERIMENT SUMMARY")
print("=" * 60)
print(f"{'Method':<15} {'Optimal LR':<12} {'Reason'}")
print("-" * 60)

for method_name, result in all_results.items():
    optimal_lr = result['optimal_lr']
    reason = result['optimal_lr_reason'][:40] + "..." if len(result['optimal_lr_reason']) > 40 else result['optimal_lr_reason']
    print(f"{method_name:<15} {optimal_lr:<12.2e} {reason}")

print("\n✅ All LR finding methods completed successfully!")
print("📊 Results stored in all_results dictionary")
print("🎯 Optimal LRs stored in all_optimal_lrs dictionary")
print("💾 Individual results saved to lr_finder_results/ folder")

# Quick analysis
optimal_lrs_values = [lr for lr in all_optimal_lrs.values() if lr is not None]
if optimal_lrs_values:
    mean_lr = np.exp(np.mean(np.log(optimal_lrs_values)))  # Geometric mean
    std_lr = np.std(np.log(optimal_lrs_values))  # Standard deviation in log space
    
    print(f"\n📊 Quick Analysis:")
    print(f"   🎯 Geometric mean of optimal LRs: {mean_lr:.2e}")
    print(f"   📏 Log-space std deviation: {std_lr:.3f}")
    print(f"   📈 Range: {min(optimal_lrs_values):.2e} to {max(optimal_lrs_values):.2e}")
    
    # Consensus recommendation
    print(f"\n💡 Consensus Recommendation:")
    print(f"   🎯 Use LR around: {mean_lr:.2e}")
    print(f"   📋 Consider range: {mean_lr/3:.2e} to {mean_lr*3:.2e}")

print("\n🎉 Ready for detailed comparison and visualization!")

## 9. Compare and Visualize Results

Now comes the exciting part - comparing all our LR finding methods! We'll create comprehensive visualizations that show:

1. **Side-by-side comparison** of all loss curves
2. **Optimal LR analysis** with explanations  
3. **Method characteristics** comparison
4. **Practical recommendations** for different scenarios

### Visualization Features:
- 📊 **Unified plots**: All methods on same axes for direct comparison
- 🎯 **Optimal LR markers**: Clear indication of recommended values
- 📈 **Multiple perspectives**: Different plot types reveal different insights
- 💡 **Actionable insights**: Which method to use when

In [None]:
def create_comprehensive_comparison_plot(all_results, save_plot=True, show_plot=True):
    """
    Create comprehensive comparison visualization of all LR finding methods
    """
    print("🎨 Creating comprehensive comparison visualization...")
    
    # Set up the plot layout
    fig = plt.figure(figsize=(20, 16))
    gs = GridSpec(4, 3, figure=fig, hspace=0.3, wspace=0.3)
    
    # Color scheme for methods
    colors = {
        'Linear': '#2E86AB',       # Blue
        'Exponential': '#A23B72',  # Purple 
        'Cyclical': '#F18F01'      # Orange
    }
    
    # Plot 1: Main comparison - Loss vs Learning Rate
    ax1 = fig.add_subplot(gs[0, :])
    
    for method_name, results in all_results.items():
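        if results['learning_rates'] and results['smooth_losses']:
            # optimal_lr can be None when detection fails, so format it defensively
            optimal_lr = results.get('optimal_lr')
            optimal_label = f'{optimal_lr:.2e}' if optimal_lr is not None else 'n/a'
            ax1.semilogx(results['learning_rates'], results['smooth_losses'], 
                        color=colors[method_name], linewidth=3, 
                        label=f'{method_name} (Optimal: {optimal_label})',
                        alpha=0.8)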
            
            # Mark optimal LR
            if results['optimal_lr']:
                ax1.axvline(x=results['optimal_lr'], color=colors[method_name], 
                           linestyle='--', linewidth=2, alpha=0.7)
    
    ax1.set_xlabel('Learning Rate', fontsize=12)
    ax1.set_ylabel('Smoothed Loss', fontsize=12)
    ax1.set_title('LR Finder Methods Comparison: Loss vs Learning Rate', fontsize=14, fontweight='bold')
    ax1.legend(fontsize=10)
    ax1.grid(True, alpha=0.3)
    
    # Plot 2: Loss vs Steps for each method
    ax2 = fig.add_subplot(gs[1, 0])
    for method_name, results in all_results.items():
        if results['losses']:
            steps = range(len(results['losses']))
            ax2.plot(steps, results['smooth_losses'], color=colors[method_name], 
                    linewidth=2, label=method_name, alpha=0.8)
    
    ax2.set_xlabel('Training Steps')
    ax2.set_ylabel('Smoothed Loss')
    ax2.set_title('Loss Evolution During LR Search')
    ax2.legend()
    ax2.grid(True, alpha=0.3)
    
    # Plot 3: Learning Rate Schedules
    ax3 = fig.add_subplot(gs[1, 1])
    for method_name, results in all_results.items():
        if results['learning_rates']:
            steps = range(len(results['learning_rates']))
            ax3.semilogy(steps, results['learning_rates'], color=colors[method_name], 
                        linewidth=2, label=method_name, alpha=0.8)
    
    ax3.set_xlabel('Training Steps')
    ax3.set_ylabel('Learning Rate')
    ax3.set_title('LR Schedules Comparison')
    ax3.legend()
    ax3.grid(True, alpha=0.3)
    
    # Plot 4: Gradient Norms
    ax4 = fig.add_subplot(gs[1, 2])
    for method_name, results in all_results.items():
        if results['gradients'] and results['learning_rates']:
            ax4.semilogx(results['learning_rates'], results['gradients'], 
                        color=colors[method_name], linewidth=2, label=method_name, alpha=0.8)
    
    ax4.set_xlabel('Learning Rate')
    ax4.set_ylabel('Gradient Norm')
    ax4.set_title('Gradient Norms vs LR')
    ax4.legend()
    ax4.grid(True, alpha=0.3)
    
    # Plot 5: Optimal LR Comparison
    ax5 = fig.add_subplot(gs[2, 0])
    # Derive optimal LRs from the results passed in, rather than relying on a global
    optimal_by_method = {name: res.get('optimal_lr') for name, res in all_results.items()}
    method_names = [name for name, lr in optimal_by_method.items() if lr is not None]
    optimal_values = [optimal_by_method[name] for name in method_names]
    
    if optimal_values:
        bars = ax5.bar(method_names, optimal_values, 
                      color=[colors[name] for name in method_names], alpha=0.7)
        ax5.set_yscale('log')
        ax5.set_ylabel('Optimal Learning Rate')
        ax5.set_title('Optimal LR by Method')
        ax5.grid(True, alpha=0.3)
        
        # Add value labels on bars
        for bar, value in zip(bars, optimal_values):
            height = bar.get_height()
            ax5.text(bar.get_x() + bar.get_width()/2., height*1.1,
                    f'{value:.1e}', ha='center', va='bottom', fontsize=9)
    
    # Plot 6: Method Statistics
    ax6 = fig.add_subplot(gs[2, 1])
    
    # Calculate statistics for each method
    stats_data = []
    for method_name, results in all_results.items():
        if results['losses']:
            stats = {
                'Method': method_name,
                'Steps': len(results['losses']),
                'Min Loss': min(results['smooth_losses']),
                'Final Loss': results['smooth_losses'][-1],
                'Loss Range': max(results['smooth_losses']) - min(results['smooth_losses'])
            }
            stats_data.append(stats)
    
    if stats_data:
        stats_df = pd.DataFrame(stats_data)
        
        # Create a text summary
        ax6.axis('off')
        summary_text = "📊 Method Statistics\n\n"
        
        for _, row in stats_df.iterrows():
            summary_text += f"{row['Method']}:\n"
            summary_text += f"  Steps: {row['Steps']}\n"
            summary_text += f"  Min Loss: {row['Min Loss']:.4f}\n"
            summary_text += f"  Final Loss: {row['Final Loss']:.4f}\n"
            summary_text += f"  Loss Range: {row['Loss Range']:.4f}\n\n"
        
        ax6.text(0.05, 0.95, summary_text, transform=ax6.transAxes, fontsize=10,
                verticalalignment='top', fontfamily='monospace',
                bbox=dict(boxstyle='round', facecolor='lightgray', alpha=0.8))
    
    # Plot 7: Recommendations
    ax7 = fig.add_subplot(gs[2, 2])
    ax7.axis('off')
    
    recommendations = """
🎯 Method Selection Guide

📈 LINEAR LR RANGE TEST:
✅ First-time model training
✅ Conservative estimates
✅ Detailed exploration
❌ Slow for large ranges

🚀 EXPONENTIAL LR RANGE TEST:  
✅ Quick exploration
✅ Unknown LR ranges
✅ Large models
❌ May miss nuances

🔄 CYCLICAL LR RANGE TEST:
✅ Noisy datasets
✅ Robust estimates
✅ Cyclical LR planning
❌ More complex analysis

💡 GENERAL RECOMMENDATIONS:
• Use Linear for new models
• Use Exponential for quick tests
• Use Cyclical for confirmation
• Compare multiple methods
"""
    
    ax7.text(0.05, 0.95, recommendations, transform=ax7.transAxes, fontsize=9,
            verticalalignment='top', fontfamily='monospace',
            bbox=dict(boxstyle='round', facecolor='lightblue', alpha=0.8))
    
    # Plot 8: Loss derivative analysis (spans the first two columns)
    ax8 = fig.add_subplot(gs[3, :2])
    
    # Create loss derivative analysis
    for method_name, results in all_results.items():
        if len(results['smooth_losses']) > 10:
            # Calculate loss derivatives
            losses = np.array(results['smooth_losses'])
            lrs = np.array(results['learning_rates'])
            
            # Use log scale for LR
            log_lrs = np.log10(lrs)
            derivatives = np.gradient(losses, log_lrs)
            
            ax8.semilogx(lrs[1:-1], derivatives[1:-1], color=colors[method_name], 
                        linewidth=2, label=f'{method_name} Loss Derivative', alpha=0.8)
    
    ax8.set_xlabel('Learning Rate')
    ax8.set_ylabel('Loss Derivative (d_loss/d_log_lr)')
    ax8.set_title('Loss Derivatives: Finding Steepest Descent')
    ax8.legend()
    ax8.grid(True, alpha=0.3)
    ax8.axhline(y=0, color='black', linestyle='-', alpha=0.3)
    
    # Plot 9: Summary insights
    ax9 = fig.add_subplot(gs[3, 2])
    ax9.axis('off')
    
    # Calculate consensus
    if optimal_values:
        mean_lr = np.exp(np.mean(np.log(optimal_values)))
        
        consensus_text = f"""
📋 EXPERIMENT SUMMARY

🎯 Optimal LRs Found:
"""
        for method, lr in all_optimal_lrs.items():
            if lr:
                consensus_text += f"  {method}: {lr:.2e}\n"
        
        consensus_text += f"""
📊 Consensus Analysis:
  Geometric Mean: {mean_lr:.2e}
  Recommended Range: 
    {mean_lr/3:.2e} to {mean_lr*3:.2e}

💡 Final Recommendation:
  Start with: {mean_lr:.2e}
  Monitor and adjust based on:
  • Training stability
  • Convergence speed  
  • Validation performance
"""
        
        ax9.text(0.05, 0.95, consensus_text, transform=ax9.transAxes, fontsize=9,
                verticalalignment='top', fontfamily='monospace',
                bbox=dict(boxstyle='round', facecolor='lightgreen', alpha=0.8))
    
    plt.suptitle('Universal Learning Rate Finder: Comprehensive Method Comparison', 
                 fontsize=16, fontweight='bold', y=0.98)
    
    if save_plot:
        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        plot_path = BASE_DIR / 'plots' / f'comprehensive_lr_comparison_{timestamp}.png'
        plt.savefig(plot_path, dpi=300, bbox_inches='tight')
        print(f"💾 Comprehensive comparison plot saved to: {plot_path}")
    
    if show_plot:
        plt.show()
    else:
        plt.close()
    
    return fig

# Create the comprehensive comparison
comparison_fig = create_comprehensive_comparison_plot(all_results, save_plot=True, show_plot=True)

print("\n✅ Comprehensive comparison visualization created!")
print("📊 All methods compared side-by-side with detailed analysis")
print("🎯 Optimal LR recommendations provided")
print("💡 Method selection guide included")

## 10. Save Results and Models

Finally, let's organize and save all our experimental results, trained models, and analysis for future reference and reproducibility.

### What We'll Save:
1. **Experimental Results**: All LR finder data and metadata
2. **Comparison Analysis**: Summary statistics and recommendations  
3. **Model Checkpoints**: Trained model states for reproduction
4. **Configuration Files**: Complete experimental setup
5. **Documentation**: Usage guide and findings summary

### Organization:
- 📁 **Structured storage** in organized subfolders
- 🏷️ **Clear naming** with timestamps and method identifiers
- 📋 **Metadata preservation** for complete reproducibility
- 📊 **Summary reports** for easy reference
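
The naming and metadata conventions above can be sketched with the standard library alone. This is a minimal illustration, not the toolkit's own save function: the helper `save_run`, the file-name pattern, and the sample values are all hypothetical.

```python
import json
import tempfile
from datetime import datetime
from pathlib import Path


def save_run(base_dir, method_name, results, metadata):
    """Write one method's results to a timestamped JSON file and return its path."""
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    out_dir = Path(base_dir) / "experiments"
    out_dir.mkdir(parents=True, exist_ok=True)
    # Method identifier plus timestamp keeps repeated runs from overwriting each other
    path = out_dir / f"{method_name.lower()}_results_{timestamp}.json"
    payload = {
        "timestamp": timestamp,
        "method": method_name,
        "metadata": metadata,  # settings needed to repeat the run
        "results": results,
    }
    with open(path, "w") as f:
        json.dump(payload, f, indent=2)
    return path


# Hypothetical values, for illustration only
demo_results = {"learning_rates": [1e-4, 1e-3, 1e-2], "losses": [2.3, 1.9, 2.8]}
demo_metadata = {"batch_size": 64, "optimizer": "Adam"}

with tempfile.TemporaryDirectory() as tmp:
    saved = save_run(tmp, "Linear", demo_results, demo_metadata)
    with open(saved) as f:
        reloaded = json.load(f)
    saved_name = saved.name
```

Storing the metadata next to the raw curves means a later reader can tell which settings produced a given file without consulting the notebook.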

In [None]:
def save_comprehensive_results(all_results, all_optimal_lrs, sample_models):
    """
    Save all experimental results, models, and analysis
    """
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    print(f"💾 Saving comprehensive experiment results...")
    print(f"🕒 Timestamp: {timestamp}")
    
    # 1. Save comprehensive experiment summary
    experiment_summary = {
        'timestamp': timestamp,
        'experiment_info': {
            'purpose': 'Universal Learning Rate Finder Comparison',
            'methods_tested': list(all_results.keys()),
            'total_methods': len(all_results),
            'dataset_info': {
                'type': 'CIFAR-10 or Synthetic',
                'samples': len(train_dataset),
                'batches': len(train_loader),
                'batch_size': train_loader.batch_size
            },
            'model_info': {
                'architecture': 'SampleCNN',
                'parameters': sum(p.numel() for p in demo_model.parameters()),
                'device': str(device)
            }
        },
        'optimal_lrs': all_optimal_lrs,
        'method_comparison': {},
        'recommendations': {}
    }
    
    # 2. Add detailed method comparison
    for method_name, results in all_results.items():
        method_stats = {
            'optimal_lr': results['optimal_lr'],
            'optimal_lr_reason': results['optimal_lr_reason'],
            'total_steps': len(results['learning_rates']),
            'lr_range': {
                'min': min(results['learning_rates']) if results['learning_rates'] else None,
                'max': max(results['learning_rates']) if results['learning_rates'] else None
            },
            'loss_stats': {
                'min': min(results['smooth_losses']) if results['smooth_losses'] else None,
                'max': max(results['smooth_losses']) if results['smooth_losses'] else None,
                'final': results['smooth_losses'][-1] if results['smooth_losses'] else None
            },
            'experiment_settings': results.get('experiment_info', {})
        }
        experiment_summary['method_comparison'][method_name] = method_stats
    
    # 3. Generate consensus recommendations
    optimal_values = [lr for lr in all_optimal_lrs.values() if lr is not None]
    if optimal_values:
        mean_lr = np.exp(np.mean(np.log(optimal_values)))
        std_lr = np.std(np.log(optimal_values))
        
        experiment_summary['recommendations'] = {
            'consensus_lr': mean_lr,
            'recommended_range': {
                'conservative': mean_lr / 3,
                'aggressive': mean_lr * 3
            },
            'log_std': std_lr,
            'agreement_level': 'High' if std_lr < 0.5 else 'Medium' if std_lr < 1.0 else 'Low',
            'suggestions': {
                'start_with': mean_lr,
                'monitor_for': ['training_stability', 'convergence_speed', 'validation_performance'],
                'adjust_based_on': 'loss_behavior_and_gradient_norms'
            }
        }
    
    # Save experiment summary
    summary_path = BASE_DIR / 'experiments' / f'lr_finder_experiment_summary_{timestamp}.json'
    with open(summary_path, 'w') as f:
        json.dump(experiment_summary, f, indent=2)
    print(f"📋 Experiment summary saved: {summary_path}")
    
    # 4. Save individual method results (detailed)
    results_dir = BASE_DIR / 'experiments' / f'detailed_results_{timestamp}'
    results_dir.mkdir(exist_ok=True)
    
    for method_name, results in all_results.items():
        # Save as both JSON and CSV
        method_data = {
            'learning_rates': results['learning_rates'],
            'losses': results['losses'],
            'smooth_losses': results['smooth_losses'],
            'gradients': results['gradients']
        }
        
        # JSON format (complete data)
        json_path = results_dir / f'{method_name.lower()}_complete_data.json'
        with open(json_path, 'w') as f:
            json.dump(method_data, f, indent=2)
        
        # CSV format (for analysis)
        if method_data['learning_rates']:
            df = pd.DataFrame(method_data)
            csv_path = results_dir / f'{method_name.lower()}_data.csv'
            df.to_csv(csv_path, index=False)
    
    print(f"📊 Detailed results saved: {results_dir}")
    
    # 5. Save model checkpoints
    models_dir = BASE_DIR / 'models' / f'lr_finder_models_{timestamp}'
    models_dir.mkdir(exist_ok=True)
    
    for model_name, model in sample_models.items():
        model_path = models_dir / f'{model_name}_{timestamp}.pth'
        torch.save({
            'model_state_dict': model.state_dict(),
            'model_class': model.__class__.__name__,
            'model_params': {
                'num_classes': 10,
                # nn.Module has no .get(); read out_channels defensively instead
                'base_channels': getattr(getattr(model, 'conv1', None), 'out_channels', 32)
            },
            'experiment_timestamp': timestamp
        }, model_path)
    
    print(f"🏗️ Model checkpoints saved: {models_dir}")
    
    # 6. Create usage documentation
    usage_doc = f"""# Learning Rate Finder Experiment Results

## Experiment Information
- **Timestamp**: {timestamp}
- **Methods Tested**: {', '.join(all_results.keys())}
- **Dataset**: CIFAR-10 or Synthetic ({len(train_dataset)} samples)
- **Model**: SampleCNN ({sum(p.numel() for p in demo_model.parameters()):,} parameters)
- **Device**: {device}

## Optimal Learning Rates Found

"""
    
    for method, lr in all_optimal_lrs.items():
        if lr:
            usage_doc += f"- **{method}**: {lr:.2e}\n"
    
    if optimal_values:
        usage_doc += f"""
## Consensus Recommendation

- **Recommended LR**: {mean_lr:.2e}
- **Recommended Range**: {mean_lr/3:.2e} to {mean_lr*3:.2e}
- **Agreement Level**: {experiment_summary['recommendations']['agreement_level']}

## Usage Instructions

### Quick Start
```python
# Use the consensus LR for training
optimizer = torch.optim.Adam(model.parameters(), lr={mean_lr:.2e})
```

### Method-Specific Usage
```python
# Linear method result
linear_lr = {all_optimal_lrs.get('Linear')}

# Exponential method result
exp_lr = {all_optimal_lrs.get('Exponential')}

# Cyclical method result
cyclical_lr = {all_optimal_lrs.get('Cyclical')}
```

### Advanced Usage
```python
# Load and use the LR finder results
import json
import pandas as pd
import torch

# Load experiment summary
with open('experiments/lr_finder_experiment_summary_{timestamp}.json') as f:
    summary = json.load(f)

# Load detailed method data
linear_data = pd.read_csv('experiments/detailed_results_{timestamp}/linear_data.csv')
exp_data = pd.read_csv('experiments/detailed_results_{timestamp}/exponential_data.csv')
cyclical_data = pd.read_csv('experiments/detailed_results_{timestamp}/cyclical_data.csv')

# Use in your training loop
recommended_lr = summary['recommendations']['consensus_lr']
optimizer = torch.optim.Adam(model.parameters(), lr=recommended_lr)
```

## Files Created

### Results
- `lr_finder_experiment_summary_{timestamp}.json` - Complete experiment summary
- `detailed_results_{timestamp}/` - Individual method data (JSON + CSV)

### Models
- `lr_finder_models_{timestamp}/` - Model checkpoints for reproduction

### Visualizations
- `plots/comprehensive_lr_comparison_{timestamp}.png` - Main comparison plot
- `plots/[method]_lr_test_{timestamp}.png` - Individual method plots

## Recommendations for Future Use

1. **For New Models**: Start with Linear LR Range Test
2. **For Quick Tests**: Use Exponential LR Range Test
3. **For Robust Estimates**: Use Cyclical LR Range Test
4. **For Best Results**: Compare multiple methods like this experiment

## Notes

- All results are reproducible using the saved model checkpoints
- Experiment used consistent parameters across all methods for fair comparison
- Results may vary with different datasets, models, or hyperparameters
- Consider dataset-specific fine-tuning of LR finder parameters

Generated by Universal Learning Rate Finder Toolkit
"""
    
    doc_path = BASE_DIR / 'experiments' / f'lr_finder_usage_guide_{timestamp}.md'
    with open(doc_path, 'w') as f:
        f.write(usage_doc)
    print(f"📚 Usage documentation saved: {doc_path}")
    
    # 7. Create a simple results CSV for quick reference
    quick_ref_data = {
        'Method': [],
        'Optimal_LR': [],
        'Min_Loss': [],
        'Total_Steps': [],
        'Status': []
    }
    
    for method_name, results in all_results.items():
        quick_ref_data['Method'].append(method_name)
        quick_ref_data['Optimal_LR'].append(results['optimal_lr'])
        quick_ref_data['Min_Loss'].append(min(results['smooth_losses']) if results['smooth_losses'] else None)
        quick_ref_data['Total_Steps'].append(len(results['learning_rates']))
        quick_ref_data['Status'].append('Success' if results['optimal_lr'] else 'Failed')
    
    quick_ref_df = pd.DataFrame(quick_ref_data)
    quick_ref_path = BASE_DIR / 'experiments' / f'lr_finder_quick_reference_{timestamp}.csv'
    quick_ref_df.to_csv(quick_ref_path, index=False)
    print(f"📊 Quick reference saved: {quick_ref_path}")
    
    print("\n✅ All results saved successfully!")
    print(f"📁 Main directory: {BASE_DIR}")
    print(f"🎉 Experiment {timestamp} complete and documented!")
    
    return {
        'summary_path': summary_path,
        'results_dir': results_dir,
        'models_dir': models_dir,
        'doc_path': doc_path,
        'quick_ref_path': quick_ref_path,
        'timestamp': timestamp
    }

# Save all results
saved_paths = save_comprehensive_results(all_results, all_optimal_lrs, sample_models)

# Final summary
print("\n" + "=" * 80)
print("🎉 UNIVERSAL LEARNING RATE FINDER TOOLKIT - EXPERIMENT COMPLETE! 🎉")
print("=" * 80)

# LRs that were actually found (skips None and zero so the log below is safe)
optimal_values = [lr for lr in all_optimal_lrs.values() if lr]

print("\n📊 EXPERIMENT SUMMARY:")
print(f"   🔍 Methods tested: {len(all_results)}")
print(f"   🎯 Optimal LRs found: {len(optimal_values)}")
print(f"   📁 Output paths saved: {len(saved_paths) - 1}")  # -1 for timestamp
print(f"   🕒 Experiment ID: {saved_paths['timestamp']}")

print("\n🎯 OPTIMAL LEARNING RATES:")
for method, lr in all_optimal_lrs.items():
    if lr:
        print(f"   {method}: {lr:.2e}")

if len(optimal_values) > 1:
    # Geometric mean, matching the consensus saved in the experiment summary
    mean_lr = np.exp(np.mean(np.log(optimal_values)))
    print(f"   📈 Consensus: {mean_lr:.2e}")

print("\n💡 NEXT STEPS:")
print("   1. Review the comprehensive comparison plot")
print(f"   2. Check the usage guide: {saved_paths['doc_path'].name}")
print("   3. Use optimal LRs in your training")
print("   4. Experiment with different models/datasets")

print("\n🚀 TOOLKIT FEATURES DEMONSTRATED:")
print("   ✅ Universal model compatibility")
print("   ✅ Multiple LR finding methods")
print("   ✅ Comprehensive comparison analysis")
print("   ✅ Automated optimal LR detection")
print("   ✅ Professional result organization")
print("   ✅ Complete reproducibility")

print("\n📚 Happy Learning Rate Optimization! 🔍📈")

## 🎯 Conclusion & Next Steps

**Congratulations! You've built a comprehensive Universal Learning Rate Finder Toolkit!**

### 🌟 What You've Accomplished:

1. **📚 Complete LR Finding Arsenal**:
   - Linear LR Range Test (conservative, thorough)
   - Exponential LR Range Test (fast, aggressive)  
   - Cyclical LR Range Test (robust, pattern-based)

2. **🔧 Universal Compatibility**:
   - Works with ANY PyTorch model
   - Supports any dataset format
   - Flexible architecture handling

3. **📊 Professional Analysis**:
   - Comprehensive comparison visualizations
   - Automated optimal LR detection
   - Statistical consensus recommendations

4. **💾 Production Ready**:
   - Organized result storage
   - Complete reproducibility
   - Professional documentation
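
The "automated optimal LR detection" mentioned above can be illustrated with one common heuristic: pick the learning rate where the smoothed loss falls fastest per unit of log10(LR), which is the quantity shown in the loss-derivative panel of the comparison plot. The sketch below is an assumption about how such detection can work, not necessarily the rule the finder classes in this notebook apply. The function name `steepest_descent_lr` and the sample sweep are hypothetical.

```python
import math


def steepest_descent_lr(learning_rates, smooth_losses):
    """Return the LR where the loss drops fastest per unit of log10(LR), or None."""
    if len(learning_rates) != len(smooth_losses) or len(learning_rates) < 3:
        raise ValueError("need at least 3 matching (lr, loss) points")
    log_lrs = [math.log10(lr) for lr in learning_rates]
    best_lr, best_slope = None, 0.0
    # Central differences on interior points; the two endpoints are skipped
    for i in range(1, len(log_lrs) - 1):
        d_log_lr = log_lrs[i + 1] - log_lrs[i - 1]
        if d_log_lr <= 0:
            # Skip segments where the LR is not increasing (e.g. cyclical sweeps)
            continue
        slope = (smooth_losses[i + 1] - smooth_losses[i - 1]) / d_log_lr
        if slope < best_slope:
            best_lr, best_slope = learning_rates[i], slope
    return best_lr


# Hypothetical sweep: loss falls fastest around 1e-3, then diverges
lrs = [1e-5, 1e-4, 1e-3, 1e-2, 1e-1]
losses = [2.30, 2.25, 1.60, 1.20, 3.00]
suggested = steepest_descent_lr(lrs, losses)
```

Returning `None` when the loss never decreases makes a failed sweep explicit, so the caller does not silently receive an arbitrary learning rate.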

### 🎯 Key Features Delivered:

✅ **Model-Agnostic**: Works with CNNs, RNNs, Transformers, custom architectures
✅ **Multiple Methods**: 3 different LR finding approaches for robust analysis  
✅ **Smart Recommendations**: Automated optimal LR detection with explanations
✅ **Beautiful Visualizations**: Comprehensive plots for easy interpretation
✅ **Organized Structure**: Professional experiment tracking and storage
✅ **Complete Documentation**: Usage guides and reproducible workflows

### 🚀 How to Use This Toolkit:

#### **Quick Start** (any model):
```python
import torch
import torch.nn as nn

# 1. Choose your model and data
model = YourModel()               # placeholder: your own nn.Module
train_loader = YourDataLoader()   # placeholder: your own DataLoader

# 2. Pick a method and find optimal LR
finder = LinearLRFinder(model, nn.CrossEntropyLoss())
results = finder.find_lr(train_loader)

# 3. Use the optimal LR
optimal_lr = results['optimal_lr']
optimizer = torch.optim.Adam(model.parameters(), lr=optimal_lr)
```

#### **Comprehensive Analysis**:
```python
# Compare all methods for best results
criterion = nn.CrossEntropyLoss()
linear_finder = LinearLRFinder(model, criterion)
exp_finder = ExponentialLRFinder(model, criterion)
cyclical_finder = CyclicalLRFinder(model, criterion)

# Run every finder and collect the results for comparison
all_results = {}
all_results['Linear'] = linear_finder.find_lr(train_loader)
all_results['Exponential'] = exp_finder.find_lr(train_loader)
all_results['Cyclical'] = cyclical_finder.find_lr(train_loader)
```
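
Once each method has produced an optimal LR, the consensus rule this notebook uses when saving results (geometric mean, a 3x range on either side, and an agreement level from the spread of the log-LRs) can be restated as a small standalone function. This is a sketch without NumPy. The function name `consensus_lr` and the example values are hypothetical, while the thresholds 0.5 and 1.0 mirror the saving code.

```python
import math


def consensus_lr(optimal_lrs):
    """Combine per-method LRs into a geometric-mean consensus with a 3x range."""
    values = [lr for lr in optimal_lrs.values() if lr]
    if not values:
        return None
    logs = [math.log(lr) for lr in values]
    mean_log = sum(logs) / len(logs)
    # Population standard deviation of the natural-log LRs (same as np.std default)
    log_std = math.sqrt(sum((x - mean_log) ** 2 for x in logs) / len(logs))
    center = math.exp(mean_log)
    if log_std < 0.5:
        agreement = "High"
    elif log_std < 1.0:
        agreement = "Medium"
    else:
        agreement = "Low"
    return {
        "consensus_lr": center,
        "low": center / 3,
        "high": center * 3,
        "log_std": log_std,
        "agreement": agreement,
    }


# Hypothetical per-method results, for illustration only
example = consensus_lr({"Linear": 1e-3, "Exponential": 1e-2, "Cyclical": None})
```

Averaging in log space matters because learning rates span orders of magnitude: the arithmetic mean of 1e-3 and 1e-2 is dominated by the larger value, while the geometric mean sits midway between them on a log axis.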

### 💡 When to Use Each Method:

| Method | Best For | Pros | Cons |
|--------|----------|------|------|
| **Linear** | New models, detailed analysis | Conservative, thorough | Slower for large ranges |
| **Exponential** | Quick tests, unknown ranges | Fast, covers wide ranges | May miss fine details |
| **Cyclical** | Noisy data, robust estimates | Multiple confirmations | More complex analysis |
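
The linear versus exponential trade-off in the table can be made concrete by counting how many sweep steps each schedule spends on small learning rates. This is a minimal sketch under assumed settings (a sweep from 1e-6 to 1.0 in 100 steps). The helper names are hypothetical and are not part of the finder classes.

```python
def linear_schedule(start_lr, end_lr, num_steps):
    """LRs spaced evenly in value between start_lr and end_lr."""
    step = (end_lr - start_lr) / (num_steps - 1)
    return [start_lr + i * step for i in range(num_steps)]


def exponential_schedule(start_lr, end_lr, num_steps):
    """LRs spaced evenly in log space (constant ratio between steps)."""
    ratio = (end_lr / start_lr) ** (1 / (num_steps - 1))
    return [start_lr * ratio ** i for i in range(num_steps)]


def count_below(schedule, threshold):
    """Number of steps that probe learning rates below `threshold`."""
    return sum(1 for lr in schedule if lr < threshold)


# Assumed sweep: 1e-6 to 1.0 in 100 steps
linear = linear_schedule(1e-6, 1.0, 100)
exponential = exponential_schedule(1e-6, 1.0, 100)

linear_small = count_below(linear, 1e-3)
exponential_small = count_below(exponential, 1e-3)
```

With these settings the linear sweep spends only 1 of its 100 steps below 1e-3, while the exponential sweep spends 50. A linear test therefore suits a narrow range that is already roughly known, and an exponential test suits a first pass over many orders of magnitude.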

### 🔬 Advanced Applications:

1. **Research Projects**: Compare LR sensitivity across architectures
2. **Production Models**: Establish optimal LR baselines  
3. **Hyperparameter Tuning**: Integrate with automated search
4. **Educational Use**: Teach LR optimization concepts
5. **Debugging**: Diagnose training instability issues

### 📈 Future Enhancements:

Consider extending this toolkit with:
- **Warmup LR schedules** integration
- **Multi-GPU** support for large models
- **Learning rate scheduling** recommendations
- **Automatic hyperparameter** optimization
- **Integration with popular frameworks** (Lightning, Transformers)

---

**🎉 You now have a reusable learning rate optimization toolkit that you can apply to your own PyTorch projects!**

**Happy training! 🚀📊**