# Learning Rate Finder 🔍

This notebook implements the Learning Rate Range Test (LR Finder) to help you choose a good starting learning rate for training a ResNet-50 model on ImageNet-style datasets (Tiny ImageNet, Imagenette/ImageWoof, or full ImageNet).

## What is Learning Rate Finder?

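The Learning Rate Finder is a technique that:
- **Systematically tests** different learning rates by raising the LR exponentially from a very small to a very large value over a short training run, one value per mini-batch
- **Plots loss vs learning rate** (smoothed loss, log-scaled LR axis) to visualize the relationship
- **Identifies the useful range** where loss decreases most rapidly
- **Helps prevent poor convergence** caused by learning rates that are too low (slow progress) or too high (divergence)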
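
The learning-rate sweep described above can be sketched in a few lines. This is a minimal stdlib-only illustration, assuming the same defaults the notebook configures later (`min_lr=1e-7`, `max_lr=10.0`, 100 iterations). The helper name `exponential_lr_schedule` is hypothetical and is not part of the notebook's code, which builds the same kind of schedule with `np.logspace`.

```python
import math

def exponential_lr_schedule(min_lr, max_lr, num_iterations):
    """Return learning rates spaced evenly on a log scale (hypothetical helper)."""
    if num_iterations < 2:
        raise ValueError("num_iterations must be at least 2")
    # Constant multiplicative step so that the last value lands on max_lr
    ratio = (max_lr / min_lr) ** (1.0 / (num_iterations - 1))
    return [min_lr * ratio ** i for i in range(num_iterations)]

schedule = exponential_lr_schedule(1e-7, 10.0, 100)
step = schedule[1] / schedule[0]
print(f"first={schedule[0]:.1e}, last={schedule[-1]:.1e}, per-step factor={step:.4f}")
```

With these defaults each iteration multiplies the learning rate by roughly 1.2, so the sweep covers eight orders of magnitude in 100 mini-batches.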

## Key Benefits:
- 🎯 **Data-driven starting LR** - Pick a learning rate suited to your model, dataset, and batch size instead of guessing
- ⚡ **Faster convergence** - Start training close to the largest learning rate that is still stable
- 📊 **Visual guidance** - The loss vs LR plot shows the usable range at a glance
- 🛡️ **Avoid pitfalls** - Rule out learning rates that are too high (divergence) or too low (slow progress)

## How to Use:
1. Update dataset path and model configuration
2. Run the learning rate finder
3. Analyze the loss vs LR plot
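4. Choose a learning rate near the point where the loss decreases most steeply, not the point where the loss is lowest (that one is usually already too close to divergence)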
5. Use this LR for your actual training
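
Step 4 can be illustrated on a small synthetic curve. The sketch below is an assumption-laden toy, not the notebook's implementation: the function name `steepest_descent_lr` and the six data points are made up for illustration, and it only looks at interior points with central differences, whereas the notebook uses `np.gradient` over the whole trimmed curve.

```python
def steepest_descent_lr(learning_rates, losses):
    """Return the LR at which the loss falls fastest between neighbouring points."""
    if len(learning_rates) != len(losses) or len(losses) < 3:
        raise ValueError("need at least 3 matching (lr, loss) points")
    # Central differences over the index; the sweep is log-spaced, so this is
    # proportional to the slope with respect to log(lr)
    slopes = [(losses[i + 1] - losses[i - 1]) / 2.0 for i in range(1, len(losses) - 1)]
    steepest = min(range(len(slopes)), key=slopes.__getitem__) + 1
    return learning_rates[steepest]

# Synthetic curve (illustrative values): flat, then falling, then diverging
lrs = [1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1.0]
loss_values = [5.30, 5.25, 4.90, 3.80, 3.60, 9.00]
suggested = steepest_descent_lr(lrs, loss_values)
conservative = suggested / 10  # same 1/10 rule of thumb the notebook prints
print(f"steepest descent at lr={suggested:.0e}, conservative={conservative:.0e}")
```

Note that the lowest loss in this toy curve sits at `1e-1`, one step before divergence, which is why the steepest point rather than the minimum is used as the suggestion.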

In [None]:
# Import Required Libraries
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader, Subset
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from tqdm import tqdm
import math
import copy
import warnings
import sys
from pathlib import Path
warnings.filterwarnings('ignore')

# Add parent directory to path for imports
parent_dir = Path('../').resolve()
sys.path.append(str(parent_dir))

# Import our custom modules
from imagenet_models import resnet50_imagenet
from imagenet_dataset import get_imagenet_transforms

# Set style for better plots
plt.style.use('default')
sns.set_palette("husl")

print("📚 Libraries imported successfully!")
print(f"🔥 PyTorch version: {torch.__version__}")
print(f"🖥️ CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"🎮 GPU: {torch.cuda.get_device_name()}")

In [None]:
# Configuration and Setup
# ⚠️ UPDATE THESE PATHS TO YOUR DATASET LOCATION
IMAGENET_ROOT = "../datasets/tiny-imagenet-200"  # Default to Tiny ImageNet
TRAIN_DIR = str(Path(IMAGENET_ROOT) / "train")
VAL_DIR = str(Path(IMAGENET_ROOT) / "val")

# Check if dataset exists, if not suggest alternatives
if not Path(IMAGENET_ROOT).exists():
    print("⚠️ Dataset not found at default location!")
    print("\n📁 Available dataset options:")
    datasets_dir = Path("../datasets")
    if datasets_dir.exists():
        for dataset_path in datasets_dir.iterdir():
            if dataset_path.is_dir():
                print(f"   • {dataset_path.name}")
    else:
        print("   📥 No datasets found. Use dataset_tools/imagenet_subset_downloader.ipynb to download one!")
    
    print("\n🔧 To use a different dataset, update IMAGENET_ROOT above")
else:
    print(f"✅ Dataset found: {IMAGENET_ROOT}")

# Learning Rate Finder Configuration
LR_FINDER_CONFIG = {
    'min_lr': 1e-7,           # Minimum learning rate to test
    'max_lr': 10.0,           # Maximum learning rate to test
    'num_iterations': 100,    # Number of iterations to run
    'beta': 0.98,             # Smoothing factor for loss
    'stop_div_factor': 5,     # Stop if loss increases by this factor
    'batch_size': 128,        # Batch size for LR finder
    'num_workers': 4,         # Number of data loading workers
    'subset_size': 5000,      # Use subset of data for faster testing
}

# Auto-detect dataset type and adjust configuration
if "tiny-imagenet" in IMAGENET_ROOT.lower():
    MODEL_CONFIG = {
        'model_name': 'resnet50',
        'num_classes': 200,       # Tiny ImageNet has 200 classes
        'pretrained': False,      # No pretrained for Tiny ImageNet
        'input_size': 64,         # 64x64 images
    }
    LR_FINDER_CONFIG['batch_size'] = 256  # Can use larger batch for smaller images
elif "imagenette" in IMAGENET_ROOT.lower() or "imagewoof" in IMAGENET_ROOT.lower():
    MODEL_CONFIG = {
        'model_name': 'resnet50',
        'num_classes': 10,        # Imagenette/ImageWoof have 10 classes
        'pretrained': True,       # Pretrained recommended for smaller datasets
        'input_size': 224,        # Standard ImageNet size
    }
else:
    # Full ImageNet or custom dataset
    MODEL_CONFIG = {
        'model_name': 'resnet50',
        'num_classes': 1000,      # Full ImageNet classes
        'pretrained': False,      # Set to True if you want pretrained weights
        'input_size': 224,        # Standard ImageNet size
    }

# Device setup
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"\n🖥️ Using device: {device}")

if device.type == 'cpu':
    print("⚠️ Warning: Using CPU. LR finder will be slow.")
    LR_FINDER_CONFIG['batch_size'] = 32  # Smaller batch for CPU
    LR_FINDER_CONFIG['subset_size'] = 1000  # Smaller subset for CPU

print("\n📋 LR Finder Configuration:")
for key, value in LR_FINDER_CONFIG.items():
    print(f"   {key}: {value}")

print(f"\n🏗️ Model Configuration:")
for key, value in MODEL_CONFIG.items():
    print(f"   {key}: {value}")
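
How much data the sweep actually sees follows directly from the configuration above. The sketch below is a small arithmetic check, assuming a `DataLoader` that keeps the last partial batch (the PyTorch default); the helper name `sweep_coverage` is hypothetical.

```python
import math

def sweep_coverage(subset_size, batch_size, num_iterations):
    """Return (batches_per_pass, passes) for a loader that keeps the last partial batch."""
    batches_per_pass = math.ceil(subset_size / batch_size)
    return batches_per_pass, num_iterations / batches_per_pass

# Values taken from LR_FINDER_CONFIG and its GPU/CPU overrides
for label, subset, batch in [("GPU, Tiny ImageNet", 5000, 256), ("CPU fallback", 1000, 32)]:
    batches, passes = sweep_coverage(subset, batch, num_iterations=100)
    print(f"{label}: {batches} batches per pass, {passes:.2f} passes over the subset")
```

With the GPU defaults for Tiny ImageNet this gives 20 batches per pass, so 100 iterations walk the 5,000-image subset five times. If that much reuse is a concern, raising `subset_size` or lowering `num_iterations` reduces it.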

In [None]:
class LearningRateFinder:
    """Learning Rate Range Test Implementation"""
    
    def __init__(self, model, optimizer, criterion, device):
        self.model = model
        self.optimizer = optimizer
        self.criterion = criterion
        self.device = device
        
        # Store initial model and optimizer state
        self.model_state = copy.deepcopy(model.state_dict())
        self.optimizer_state = copy.deepcopy(optimizer.state_dict())
        
        # Results storage
        self.learning_rates = []
        self.losses = []
        
    def reset(self):
        """Reset model and optimizer to initial state"""
        self.model.load_state_dict(self.model_state)
        self.optimizer.load_state_dict(self.optimizer_state)
        self.learning_rates = []
        self.losses = []
    
    def find_lr(self, train_loader, min_lr=1e-7, max_lr=10.0, num_iterations=100, 
                beta=0.98, stop_div_factor=5):
        """Perform learning rate range test"""
        
        print(f"🔍 Starting Learning Rate Finder...")
        print(f"   📊 LR range: {min_lr:.2e} to {max_lr:.2e}")
        print(f"   🔄 Iterations: {num_iterations}")
        
        # Reset to initial state
        self.reset()
        
        # Calculate learning rate schedule
        lr_schedule = np.logspace(np.log10(min_lr), np.log10(max_lr), num_iterations)
        
        # Training setup
        self.model.train()
        data_iter = iter(train_loader)
        
        # Initialize tracking variables
        smoothed_loss = 0
        best_loss = float('inf')
        
        # Progress bar
        pbar = tqdm(enumerate(lr_schedule), total=num_iterations, desc="LR Finder")
        
        for i, lr in pbar:
            # Update learning rate
            for param_group in self.optimizer.param_groups:
                param_group['lr'] = lr
            
            # Get next batch (cycle through dataset if needed)
            try:
                inputs, targets = next(data_iter)
            except StopIteration:
                data_iter = iter(train_loader)
                inputs, targets = next(data_iter)
            
            inputs, targets = inputs.to(self.device), targets.to(self.device)
            
            # Forward pass
            self.optimizer.zero_grad()
            outputs = self.model(inputs)
            loss = self.criterion(outputs, targets)
            
            # Backward pass
            loss.backward()
            self.optimizer.step()
            
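            # Track loss
            current_loss = loss.item()
            
            # Stop before recording anything if the loss is NaN or inf; a
            # non-finite value would poison the smoothed loss for every later step
            if not math.isfinite(current_loss):
                print(f"\n🛑 Stopping early at iteration {i}: loss is not finite!")
                break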
            
            # Smooth loss for better visualization
            if i == 0:
                smoothed_loss = current_loss
            else:
                smoothed_loss = beta * smoothed_loss + (1 - beta) * current_loss
            
            # Store values
            self.learning_rates.append(lr)
            self.losses.append(smoothed_loss)
            
            # Update progress bar
            pbar.set_postfix({
                'LR': f'{lr:.2e}',
                'Loss': f'{smoothed_loss:.4f}'
            })
            
            # Early stopping if loss explodes
            if i > 0 and smoothed_loss > stop_div_factor * best_loss:
                print(f"\n🛑 Stopping early at iteration {i}: loss exploded!")
                break
            
            # Track best loss
            if smoothed_loss < best_loss:
                best_loss = smoothed_loss
        
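        pbar.close()
        
        # Restore the initial weights and optimizer state so the model is not
        # left in the (possibly diverged) state reached at the highest LR
        self.model.load_state_dict(self.model_state)
        self.optimizer.load_state_dict(self.optimizer_state)
        
        print(f"✅ Learning rate finder completed!")
        print(f"   📊 Tested {len(self.learning_rates)} learning rates")
        print(f"   📉 Best loss: {best_loss:.4f}")
        
        return self.learning_rates, self.losses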
    
    def plot_results(self, skip_start=10, skip_end=5, suggest_lr=True):
        """Plot learning rate vs loss with suggestions"""
        
        if len(self.learning_rates) == 0:
            print("❌ No results to plot. Run find_lr() first!")
            return
        
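        # Skip noisy start and end portions; fall back to all points if the
        # run stopped too early for the requested trimming
        start_idx = skip_start
        end_idx = len(self.learning_rates) - skip_end
        if end_idx - start_idx < 2:
            start_idx, end_idx = 0, len(self.learning_rates)
        
        lr_plot = self.learning_rates[start_idx:end_idx]
        loss_plot = self.losses[start_idx:end_idx]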
        
        # Create the plot
        plt.figure(figsize=(12, 8))
        
        # Main plot
        plt.subplot(2, 1, 1)
        plt.semilogx(lr_plot, loss_plot, 'b-', linewidth=2, label='Loss vs LR')
        plt.xlabel('Learning Rate')
        plt.ylabel('Loss')
        plt.title('Learning Rate Finder Results', fontsize=14, fontweight='bold')
        plt.grid(True, alpha=0.3)
        
        # Suggest optimal learning rate
        if suggest_lr and len(loss_plot) > 10:
            # Find steepest descent (negative gradient)
            gradients = np.gradient(loss_plot)
            min_gradient_idx = np.argmin(gradients)
            suggested_lr = lr_plot[min_gradient_idx]
            
            # Also find minimum loss point for reference
            min_loss_idx = np.argmin(loss_plot)
            min_loss_lr = lr_plot[min_loss_idx]
            
            # Plot suggestions
            plt.axvline(suggested_lr, color='red', linestyle='--', alpha=0.8, 
                       label=f'Suggested LR: {suggested_lr:.2e}')
            plt.axvline(min_loss_lr, color='green', linestyle='--', alpha=0.8,
                       label=f'Min Loss LR: {min_loss_lr:.2e}')
            
            # Conservative suggestion (1/10 of steepest point)
            conservative_lr = suggested_lr / 10
            plt.axvline(conservative_lr, color='orange', linestyle='--', alpha=0.8,
                       label=f'Conservative LR: {conservative_lr:.2e}')
        
        plt.legend()
        
        # Gradient plot
        plt.subplot(2, 1, 2)
        if len(loss_plot) > 1:
            gradients = np.gradient(loss_plot)
            plt.semilogx(lr_plot, gradients, 'r-', linewidth=2, label='Loss Gradient')
            plt.axhline(0, color='black', linestyle='-', alpha=0.3)
            plt.xlabel('Learning Rate')
            plt.ylabel('Loss Gradient')
            plt.title('Loss Gradient (Choose LR where gradient is most negative)')
            plt.grid(True, alpha=0.3)
            plt.legend()
        
        plt.tight_layout()
        plt.show()
        
        # Print recommendations
        if suggest_lr and len(loss_plot) > 10:
            print("\n🎯 Learning Rate Recommendations:")
            print(f"   🔴 Steepest descent: {suggested_lr:.2e} (where loss decreases fastest)")
            print(f"   🟢 Minimum loss: {min_loss_lr:.2e} (lowest loss achieved)")
            print(f"   🟠 Conservative: {conservative_lr:.2e} (safer choice, 1/10 of steepest)")
            print("\n💡 Recommended approach:")
            print(f"   • Start with conservative LR: {conservative_lr:.2e}")
            print(f"   • If training is stable, try steepest: {suggested_lr:.2e}")
            print(f"   • Monitor loss and adjust as needed")
    
    def export_results(self, filename='lr_finder_results.txt'):
        """Export results to file"""
        with open(filename, 'w') as f:
            f.write("Learning Rate Finder Results\n")
            f.write("===========================\n\n")
            for lr, loss in zip(self.learning_rates, self.losses):
                f.write(f"{lr:.6e}\t{loss:.6f}\n")
        print(f"📄 Results exported to {filename}")

print("🏗️ LearningRateFinder class defined successfully!")
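
The loss smoothing inside `find_lr()` is an exponential moving average seeded with the first loss. The standalone sketch below reproduces just that step so its effect is easier to see; the function name `smooth_losses` and the sample values are illustrative assumptions, not part of the class.

```python
def smooth_losses(raw_losses, beta=0.98):
    """Exponential moving average seeded with the first loss, as in find_lr()."""
    smoothed = []
    for i, loss in enumerate(raw_losses):
        if i == 0:
            value = loss
        else:
            value = beta * smoothed[-1] + (1 - beta) * loss
        smoothed.append(value)
    return smoothed

raw = [5.0, 5.0, 5.0, 20.0]  # illustrative: a sudden spike at the last step
print("beta=0.98:", smooth_losses(raw, beta=0.98))
print("beta=0.50:", smooth_losses(raw, beta=0.5))
```

With `beta=0.98` each new batch contributes only 2% of the smoothed value, so the spike from 5.0 to 20.0 moves the smoothed loss to only about 5.3. The curve is less noisy, but the `stop_div_factor` check, which compares smoothed values, reacts several iterations after the raw loss has started to blow up. A lower `beta` trades smoothness for a faster reaction.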

In [None]:
# Prepare Dataset and DataLoader
print("📊 Preparing dataset...")

# Get transforms based on input size
if MODEL_CONFIG['input_size'] == 64:
    # Tiny ImageNet transforms
    transform = transforms.Compose([
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    ])
else:
    # Standard ImageNet transforms
    transform = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    ])

# Load dataset
try:
    train_dataset = torchvision.datasets.ImageFolder(
        root=TRAIN_DIR,
        transform=transform
    )
    
    print(f"✅ Dataset loaded successfully!")
    print(f"   📊 Total samples: {len(train_dataset):,}")
    print(f"   🏷️ Number of classes: {len(train_dataset.classes)}")
    
    # Create subset for faster LR finding
    subset_size = min(LR_FINDER_CONFIG['subset_size'], len(train_dataset))
    subset_indices = torch.randperm(len(train_dataset))[:subset_size]
    train_subset = Subset(train_dataset, subset_indices)
    
    print(f"   🎯 Using subset: {len(train_subset):,} samples")
    
    # Create DataLoader
    train_loader = DataLoader(
        train_subset,
        batch_size=LR_FINDER_CONFIG['batch_size'],
        shuffle=True,
        num_workers=LR_FINDER_CONFIG['num_workers'],
        pin_memory=(device.type == 'cuda')
    )
    
    print(f"   📦 Batch size: {LR_FINDER_CONFIG['batch_size']}")
    print(f"   🔄 Number of batches: {len(train_loader)}")
    
except Exception as e:
    print(f"❌ Error loading dataset: {e}")
    print("\n💡 Troubleshooting:")
    print("   1. Check if dataset path is correct")
    print("   2. Ensure dataset has train/ subdirectory")
    print("   3. Verify dataset structure (train/class_name/images.jpg)")
    print("   4. Download dataset using: ../dataset_tools/imagenet_subset_downloader.ipynb")
    raise

In [None]:
# Initialize Model, Optimizer, and Loss Function
print("🏗️ Initializing model...")

# Create model
model = resnet50_imagenet(
    num_classes=MODEL_CONFIG['num_classes'],
    pretrained=MODEL_CONFIG['pretrained']
)
model = model.to(device)

print(f"✅ Model created: {MODEL_CONFIG['model_name']}")
print(f"   🎯 Classes: {MODEL_CONFIG['num_classes']}")
print(f"   🔄 Pretrained: {MODEL_CONFIG['pretrained']}")

# Count parameters
total_params = sum(p.numel() for p in model.parameters())
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"   📊 Total parameters: {total_params:,}")
print(f"   🎓 Trainable parameters: {trainable_params:,}")

# Initialize optimizer. Use the same optimizer type, momentum, and weight decay
# you plan to train with: the loss-vs-LR curve depends on these settings.
optimizer = optim.SGD(
    model.parameters(),
    lr=LR_FINDER_CONFIG['min_lr'],  # Will be updated during LR finding
    momentum=0.9,
    weight_decay=1e-4
)

# Loss function
criterion = nn.CrossEntropyLoss()

print(f"✅ Optimizer: SGD with momentum=0.9, weight_decay=1e-4")
print(f"✅ Loss function: CrossEntropyLoss")

# Test forward pass
model.eval()
with torch.no_grad():
    try:
        # Create dummy input based on input size
        dummy_input = torch.randn(2, 3, MODEL_CONFIG['input_size'], MODEL_CONFIG['input_size']).to(device)
        dummy_output = model(dummy_input)
        print(f"✅ Forward pass test successful!")
        print(f"   📊 Input shape: {dummy_input.shape}")
        print(f"   📊 Output shape: {dummy_output.shape}")
    except Exception as e:
        print(f"❌ Forward pass test failed: {e}")
        raise

model.train()  # Set back to training mode
print("\n🚀 Ready for learning rate finding!")

In [None]:
# Run Learning Rate Finder
print("🔍 Running Learning Rate Finder...")
print("=" * 50)

# Create LR Finder instance
lr_finder = LearningRateFinder(
    model=model,
    optimizer=optimizer,
    criterion=criterion,
    device=device
)

# Run the learning rate range test
learning_rates, losses = lr_finder.find_lr(
    train_loader=train_loader,
    min_lr=LR_FINDER_CONFIG['min_lr'],
    max_lr=LR_FINDER_CONFIG['max_lr'],
    num_iterations=LR_FINDER_CONFIG['num_iterations'],
    beta=LR_FINDER_CONFIG['beta'],
    stop_div_factor=LR_FINDER_CONFIG['stop_div_factor']
)

print("\n" + "=" * 50)
print("🎉 Learning Rate Finder completed!")
print(f"📊 Tested {len(learning_rates)} learning rates")
print(f"📉 Loss range: {min(losses):.4f} to {max(losses):.4f}")

In [None]:
# Plot and Analyze Results
print("📊 Plotting results and analyzing optimal learning rate...")

# Plot the results with suggestions
lr_finder.plot_results(
    skip_start=10,    # Skip first 10 points (usually noisy)
    skip_end=5,       # Skip last 5 points (usually diverged)
    suggest_lr=True   # Show suggested learning rates
)

In [None]:
# Advanced Analysis and Export
print("🔬 Advanced Analysis")
print("=" * 30)

if len(learning_rates) > 20:
    # Skip noisy portions for analysis
    start_idx = 10
    end_idx = len(learning_rates) - 5
    lr_analysis = learning_rates[start_idx:end_idx]
    loss_analysis = losses[start_idx:end_idx]
    
    # Find key points
    min_loss_idx = np.argmin(loss_analysis)
    min_loss_lr = lr_analysis[min_loss_idx]
    min_loss = loss_analysis[min_loss_idx]
    
    # Calculate gradients for steepest descent
    gradients = np.gradient(loss_analysis)
    steepest_idx = np.argmin(gradients)
    steepest_lr = lr_analysis[steepest_idx]
    
    # Loss reduction analysis
    initial_loss = loss_analysis[0]
    max_reduction = initial_loss - min_loss
    reduction_percent = (max_reduction / initial_loss) * 100
    
    print(f"📊 Analysis Summary:")
    print(f"   🔴 Steepest descent LR: {steepest_lr:.2e}")
    print(f"   🟢 Minimum loss LR: {min_loss_lr:.2e}")
    print(f"   📉 Best loss achieved: {min_loss:.4f}")
    print(f"   📈 Initial loss: {initial_loss:.4f}")
    print(f"   🎯 Loss reduction: {reduction_percent:.1f}%")
    
    # Practical recommendations
    conservative_lr = steepest_lr / 10
    aggressive_lr = steepest_lr * 2
    
    print(f"\n💡 Training Recommendations:")
    print(f"   🟠 Conservative start: {conservative_lr:.2e}")
    print(f"   🔴 Suggested (steepest descent): {steepest_lr:.2e}")
    print(f"   🔥 Aggressive (risky): {aggressive_lr:.2e}")
    
    # Dataset-specific advice
    if MODEL_CONFIG['num_classes'] <= 10:
        print(f"\n🎯 For small datasets ({MODEL_CONFIG['num_classes']} classes):")
        print(f"   • Start conservative: {conservative_lr:.2e}")
        print(f"   • Use learning rate scheduling")
        print(f"   • Consider pretrained weights")
    elif MODEL_CONFIG['num_classes'] == 200:
        print(f"\n🎯 For Tiny ImageNet (200 classes):")
        print(f"   • Good starting LR: {steepest_lr:.2e}")
        print(f"   • Use step or cosine LR scheduling")
        print(f"   • Train for 50-100 epochs")
    else:
        print(f"\n🎯 For large datasets ({MODEL_CONFIG['num_classes']} classes):")
        print(f"   • Start with: {conservative_lr:.2e}")
        print(f"   • Gradually increase if stable")
        print(f"   • Use warmup for first few epochs")
    
    # Export results. export_results() writes a titled, tab-separated text
    # report (not CSV) and prints the output path itself.
    lr_finder.export_results('lr_finder_results.txt')
    
    print(f"\n🚀 Ready to start training with optimal learning rate!")
    
else:
    print("⚠️ Not enough data points for detailed analysis")
    print("Consider increasing num_iterations in configuration")
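
To reuse the exported numbers outside this notebook, the file written by `export_results()` can be read back with a few lines of standard-library Python. The sketch below assumes the layout produced by that method (a title line, a separator line, then `lr<TAB>loss` rows); the function name `load_lr_results` is hypothetical, and the demo writes its own small file with made-up values instead of relying on a real run.

```python
import os
import tempfile

def load_lr_results(path):
    """Parse 'lr<TAB>loss' rows, skipping the title and separator lines."""
    rows = []
    with open(path) as f:
        for line in f:
            parts = line.strip().split("\t")
            if len(parts) != 2:
                continue
            try:
                rows.append((float(parts[0]), float(parts[1])))
            except ValueError:
                continue  # not a data row
    return rows

# Demo with a small file written in the same layout as export_results()
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "lr_finder_results.txt")
    with open(demo_path, "w") as f:
        f.write("Learning Rate Finder Results\n")
        f.write("===========================\n\n")
        f.write(f"{1e-3:.6e}\t{4.9:.6f}\n")
        f.write(f"{1e-2:.6e}\t{3.8:.6f}\n")
    results = load_lr_results(demo_path)

print(results)
```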

## 🎯 Next Steps

Now that you have a candidate learning rate:

### 1. 🚀 Start Training
Go back to the main folder and use the suggested learning rate:

```bash
cd ..
python train_imagenet.py --lr YOUR_OPTIMAL_LR
```

### 2. 📊 Monitor Training
- Watch the loss curves carefully
- If loss increases rapidly, reduce LR
- If training is too slow, slightly increase LR

### 3. 🔧 Fine-tune
- Use learning rate scheduling (StepLR, CosineAnnealingLR)
- Consider warmup for the first few epochs
- Adjust based on validation performance
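
The warmup and scheduling advice above can be made concrete with a small, framework-free function. This is a sketch under stated assumptions: `lr_at_step` is a hypothetical helper, the warmup is linear, the decay is a single cosine half-period, and `base_lr = 1e-2` is a placeholder for whatever value you chose from the LR finder plot. In an actual PyTorch training loop you would more likely use the built-in schedulers such as `CosineAnnealingLR` or `StepLR`.

```python
import math

def lr_at_step(step, total_steps, base_lr, warmup_steps=0, min_lr=0.0):
    """Linear warmup to base_lr, then cosine decay to min_lr (hypothetical helper)."""
    if warmup_steps > 0 and step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(max(progress, 0.0), 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

base_lr = 1e-2  # placeholder; substitute the value chosen from the LR finder plot
for step in (0, 4, 5, 52, 100):
    lr = lr_at_step(step, total_steps=100, base_lr=base_lr, warmup_steps=5)
    print(f"step {step:3d}: lr={lr:.2e}")
```

The learning rate ramps up over the first five steps, reaches `base_lr`, and then decays smoothly toward `min_lr` by the final step. The same shape applies whether a "step" is an epoch or a mini-batch.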

### 4. 🎯 Dataset-Specific Tips
- **Tiny ImageNet**: Start with found LR, train 50-100 epochs
- **Imagenette/ImageWoof**: Use pretrained weights + found LR
- **Full ImageNet**: Start conservative, use warmup

Happy training! 🚀