# 🚀 Transfer Learning with ResNet50 + Grad-CAM

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/YOUR_USERNAME/oxford-pets-classification/blob/main/notebooks/02_transfer_learning.ipynb)

This notebook demonstrates transfer learning using a pretrained ResNet50 model.

**What you'll see:**
- 🏗️ Two-stage training (freeze → fine-tune)
- 📊 Performance comparison with custom CNN
- 🔍 Grad-CAM visualization (model interpretability)
- 🎯 Competitive or better accuracy with less training time

**Estimated time:** 30-40 minutes for the full 25-epoch schedule (15 + 10); the configuration below runs a shorter 15-epoch version (10 + 5).

## ⚙️ Setup

In [None]:
!git clone https://github.com/YOUR_USERNAME/oxford-pets-classification.git
%cd oxford-pets-classification
!pip install -q -r requirements.txt

In [None]:
import sys
import matplotlib.pyplot as plt
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim

sys.path.insert(0, '.')

from configs.config import TransferLearningConfig, GradCAMConfig
from models.architectures import get_model, count_parameters
from utils.data_utils import prepare_multiclass_dataloaders
from utils.trainer import MultiClassTrainer
from utils.visualization import (
    plot_transfer_learning_curves,
    visualize_gradcam_grid
)

device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"🖥️ Device: {device}")

## 🎛️ Configuration

In [None]:
config = TransferLearningConfig
config.EPOCHS_STAGE1 = 10  # Train classifier only (use 15 for full)
config.EPOCHS_STAGE2 = 5   # Fine-tune backbone (use 10 for full)
config.DEVICE = device
config.create_directories()

print(f"📋 Stage 1: {config.EPOCHS_STAGE1} epochs (classifier only)")
print(f"📋 Stage 2: {config.EPOCHS_STAGE2} epochs (fine-tune layer4)")

## 📊 Load Data

In [None]:
train_loader, val_loader, test_loader, class_names = prepare_multiclass_dataloaders(config)

print(f"Train: {len(train_loader.dataset):,}")
print(f"Val: {len(val_loader.dataset):,}")
print(f"Test: {len(test_loader.dataset):,}")
print(f"Classes: {len(class_names)}")

## 🧠 Create Model (Pretrained ResNet50)

In [None]:
print("📥 Loading pretrained ResNet50...")

model = get_model(
    'resnet50',
    num_classes=config.NUM_CLASSES,
    pretrained=True,
    freeze_backbone=True
)
model = model.to(config.DEVICE)

total_params, trainable_params = count_parameters(model)
print(f"\nTotal parameters: {total_params:,}")
print(f"Trainable (Stage 1): {trainable_params:,} (classifier only)")
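
`get_model` and `count_parameters` are helpers from this repository, so their internals are not shown here. As a rough sketch of what `freeze_backbone=True` and the parameter count amount to in plain PyTorch, the toy model below disables gradients on its backbone and counts what remains trainable. The names `TinyNet`, `freeze_backbone`, and `count_params` are hypothetical, and the tiny network is a stand-in for ResNet50, not the repo's architecture.

```python
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for a backbone + classifier-head model."""

    def __init__(self, num_classes: int = 37):  # Oxford-IIIT Pet has 37 breeds
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.classifier = nn.Linear(8, num_classes)

    def forward(self, x):
        return self.classifier(self.backbone(x))


def freeze_backbone(model):
    """Stop gradient updates for every backbone parameter."""
    for param in model.backbone.parameters():
        param.requires_grad = False


def count_params(model):
    """Return (total, trainable) parameter counts."""
    total = sum(p.numel() for p in model.parameters())
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return total, trainable


toy_model = TinyNet()
freeze_backbone(toy_model)
total, trainable = count_params(toy_model)
print(f"Total: {total:,} | Trainable: {trainable:,}")
```

After freezing, only the classifier's weights and bias are counted as trainable, which mirrors the "classifier only" figure printed by the cell above.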

## 🚀 Stage 1: Train Classifier Only

In this stage, we freeze the ResNet50 backbone (pretrained on ImageNet) and only train the custom classifier head.

In [None]:
criterion = nn.CrossEntropyLoss()
optimizer_stage1 = optim.AdamW(
    filter(lambda p: p.requires_grad, model.parameters()),
    lr=config.LEARNING_RATE_STAGE1,
    weight_decay=config.WEIGHT_DECAY
)

trainer = MultiClassTrainer(model, config.DEVICE, criterion, optimizer_stage1)

print("🔒 Backbone frozen - Training classifier only...\n")

history = trainer.fit(train_loader, val_loader, epochs=config.EPOCHS_STAGE1, verbose=True)

## 🔓 Stage 2: Fine-Tune Last Layers

Now we unfreeze the last residual block (layer4) and fine-tune with a lower learning rate.
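
`unfreeze_layers` is a method of the repo's model wrapper and is not shown here. A minimal sketch of the same idea, assuming layers are selected by parameter-name prefix: the helper `unfreeze_by_prefix`, the toy module layout, and the learning-rate and weight-decay values are all illustrative rather than taken from `config`.

```python
import torch.nn as nn
import torch.optim as optim

# Toy stand-in with ResNet-like block names; everything but the head starts frozen.
toy_layers = nn.ModuleDict({
    "layer3": nn.Linear(16, 16),
    "layer4": nn.Linear(16, 16),
    "fc": nn.Linear(16, 37),
})
for param in toy_layers.parameters():
    param.requires_grad = False
for param in toy_layers["fc"].parameters():
    param.requires_grad = True


def unfreeze_by_prefix(module, prefixes):
    """Re-enable gradients for parameters whose name starts with any prefix."""
    unfrozen = []
    for name, param in module.named_parameters():
        if name.startswith(tuple(prefixes)):
            param.requires_grad = True
            unfrozen.append(name)
    return unfrozen


# The trailing dot keeps "layer4." from matching a hypothetical "layer40".
unfrozen_names = unfreeze_by_prefix(toy_layers, ["layer4."])

# Rebuild the optimizer so it sees the newly trainable parameters,
# using a smaller (illustrative) learning rate for fine-tuning.
stage2_optimizer = optim.AdamW(
    (p for p in toy_layers.parameters() if p.requires_grad),
    lr=1e-4,
    weight_decay=1e-4,
)
```

Because the Stage 1 optimizer was built only from parameters that were trainable at the time, the newly unfrozen weights are not in it, which is why a fresh optimizer is created after unfreezing.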

In [None]:
# Unfreeze layer4
model.unfreeze_layers(config.UNFREEZE_LAYERS)

total_params, trainable_params = count_parameters(model)
print(f"Trainable (Stage 2): {trainable_params:,} (layer4 + classifier)\n")

# New optimizer with lower learning rate
optimizer_stage2 = optim.AdamW(
    filter(lambda p: p.requires_grad, model.parameters()),
    lr=config.LEARNING_RATE_STAGE2,
    weight_decay=config.WEIGHT_DECAY
)

trainer.optimizer = optimizer_stage2

print("🔓 Fine-tuning layer4...\n")

# Continue training
for epoch in range(config.EPOCHS_STAGE2):
    train_loss, train_acc = trainer.train_epoch(train_loader)
    val_loss, val_acc = trainer.evaluate(val_loader)
    
    trainer.history['train_loss'].append(train_loss)
    trainer.history['train_acc'].append(train_acc)
    trainer.history['val_loss'].append(val_loss)
    trainer.history['val_acc'].append(val_acc)
    
    if val_acc > trainer.best_val_acc:
        trainer.best_val_acc = val_acc
        trainer.best_epoch = config.EPOCHS_STAGE1 + epoch + 1
    
    print(
        f"Epoch {epoch+1}/{config.EPOCHS_STAGE2} | "
        f"Train Loss: {train_loss:.4f} | Train Acc: {train_acc:.4f} | "
        f"Val Loss: {val_loss:.4f} | Val Acc: {val_acc:.4f}"
    )

## 📈 Training Curves (Both Stages)

In [None]:
plot_transfer_learning_curves(
    trainer.history,
    stage1_epochs=config.EPOCHS_STAGE1
)

print(f"\n✅ Best Val Accuracy: {trainer.best_val_acc:.4f} (Epoch {trainer.best_epoch})")

## 🧪 Test Evaluation

In [None]:
test_loss, test_acc = trainer.evaluate(test_loader)

print("="*70)
print("TRANSFER LEARNING TEST RESULTS")
print("="*70)
print(f"Test Accuracy: {test_acc:.4f} ({test_acc*100:.2f}%)")
print("="*70)

print("\n📊 Performance Comparison:")
print("   • Custom CNN (100 epochs, expected): ~75-80%")
print(f"   • ResNet50 Transfer ({config.EPOCHS_STAGE1 + config.EPOCHS_STAGE2} epochs, measured): {test_acc*100:.2f}%")
print("\n   ✅ Transfer learning converges faster with competitive accuracy!")
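
`trainer.evaluate` comes from the repo's `MultiClassTrainer` and returns a `(loss, accuracy)` pair. As a sketch of what such a method typically does (the function `evaluate_sketch` and the toy data are hypothetical, not the repo's code), the loop below averages the loss per sample and counts correct top-1 predictions with gradients disabled:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


@torch.no_grad()
def evaluate_sketch(model, loader, criterion, device="cpu"):
    """Return (mean loss per sample, top-1 accuracy) over a data loader."""
    model.eval()
    total_loss, correct, seen = 0.0, 0, 0
    for inputs, labels in loader:
        inputs, labels = inputs.to(device), labels.to(device)
        logits = model(inputs)
        # criterion returns the batch mean, so weight it by the batch size
        total_loss += criterion(logits, labels).item() * labels.size(0)
        correct += (logits.argmax(dim=1) == labels).sum().item()
        seen += labels.size(0)
    return total_loss / seen, correct / seen


# Toy check: an identity "classifier" on 2-D inputs, with one label it gets wrong.
toy_classifier = nn.Linear(2, 2, bias=False)
with torch.no_grad():
    toy_classifier.weight.copy_(torch.eye(2))
toy_inputs = torch.tensor([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
toy_labels = torch.tensor([0, 1, 0, 0])
toy_loader = DataLoader(TensorDataset(toy_inputs, toy_labels), batch_size=2)

toy_loss, toy_acc = evaluate_sketch(toy_classifier, toy_loader, nn.CrossEntropyLoss())
print(f"Loss: {toy_loss:.4f} | Acc: {toy_acc:.2f}")
```

Weighting each batch loss by its size keeps the average correct when the last batch is smaller than the rest.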

## 🔍 Grad-CAM Visualization

Grad-CAM (Gradient-weighted Class Activation Mapping) shows which parts of the image the model focuses on when making predictions.
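
To make the idea concrete, here is a minimal Grad-CAM sketch in plain PyTorch. It is not the repo's `visualize_gradcam_grid` implementation: the function `grad_cam_sketch` and the toy CNN are hypothetical. The steps are the standard ones: capture the target layer's activations, take the gradient of the class score with respect to them, average those gradients per channel to get weights, and apply ReLU to the weighted sum of activation maps.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def grad_cam_sketch(model, target_layer, image, class_idx=None):
    """Return (heatmap, class_idx); heatmap is (H, W) in [0, 1] for a (1, C, H, W) image."""
    captured = {}

    def save_activation(module, inputs, output):
        captured["act"] = output

    handle = target_layer.register_forward_hook(save_activation)
    try:
        model.eval()
        # Track gradients from the input so this also works with frozen weights.
        image = image.detach().clone().requires_grad_(True)
        logits = model(image)
        if class_idx is None:
            class_idx = int(logits.argmax(dim=1))
        act = captured["act"]
        # Gradient of the chosen class score w.r.t. the captured feature maps.
        grads = torch.autograd.grad(logits[0, class_idx], act)[0]
    finally:
        handle.remove()

    weights = grads.mean(dim=(2, 3), keepdim=True)           # one weight per channel
    cam = F.relu((weights * act).sum(dim=1, keepdim=True))   # weighted sum, then ReLU
    cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear", align_corners=False)
    cam = cam[0, 0].detach()
    cam = cam - cam.min()
    return cam / (cam.max() + 1e-8), class_idx


# Toy CNN standing in for ResNet50; toy_cnn[3] plays the role of the target layer.
torch.manual_seed(0)
toy_cnn = nn.Sequential(
    nn.Conv2d(3, 4, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(4, 8, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 5),
)
heatmap, predicted = grad_cam_sketch(toy_cnn, toy_cnn[3], torch.randn(1, 3, 32, 32))
print(heatmap.shape, predicted)
```

The heatmap is upsampled to the input resolution so it can be drawn over the original image. Note that this sketch assumes the target layer's output is not modified in place by a later layer.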

In [None]:
gradcam_config = GradCAMConfig()

print("🔍 Generating Grad-CAM visualizations...")
print(f"   Target layer: {gradcam_config.TARGET_LAYER}\n")

visualize_gradcam_grid(
    model.resnet,  # Use the ResNet model directly
    test_loader.dataset,
    target_layer=gradcam_config.TARGET_LAYER,
    num_samples=12,
    class_names=class_names,
    seed=42
)

print("\n💡 Grad-CAM Interpretation:")
print("   • Red/yellow areas = High importance for prediction")
print("   • Blue areas = Low importance")
print("   • Green title = Correct prediction")
print("   • Red title = Incorrect prediction")
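
The red-to-blue color coding described above is what a "jet"-style colormap produces. The sketch below shows how a normalized heatmap could be colored and blended onto an image. It assumes a jet colormap and a simple alpha blend, neither of which is confirmed from the repo's plotting code, and `overlay_heatmap` is a hypothetical helper.

```python
import matplotlib.pyplot as plt
import numpy as np


def overlay_heatmap(image, heatmap, alpha=0.4):
    """Blend a heatmap in [0, 1] of shape (H, W) onto an RGB image in [0, 1] of shape (H, W, 3)."""
    colored = plt.get_cmap("jet")(heatmap)[..., :3]  # RGBA -> RGB, drop the alpha channel
    return (1.0 - alpha) * image + alpha * colored


# Toy example: a gray image with importance rising from left to right.
toy_image = np.full((4, 4, 3), 0.5)
toy_heatmap = np.tile(np.linspace(0.0, 1.0, 4), (4, 1))
blended = overlay_heatmap(toy_image, toy_heatmap)
print(blended.shape)
```

With this colormap, low values map to blue and high values to red, so the left edge of the toy output is tinted blue and the right edge red.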

## 🎯 Conclusion

**Key Takeaways:**

1. **Transfer Learning Benefits:**
   - ✅ Faster convergence (25 epochs for the full schedule vs 100 for the custom CNN)
   - ✅ Competitive or better accuracy
   - ✅ Requires less data to train
   - ✅ Leverages ImageNet knowledge

2. **Two-Stage Training:**
   - Stage 1: Train classifier (fast, stable)
   - Stage 2: Fine-tune last layers (refine features)

3. **Grad-CAM Insights:**
   - Shows whether the model focuses on relevant features (faces, fur patterns)
   - Helps debug misclassifications
   - Builds trust in model predictions

---

**🚀 Next Steps:**
- Try different pretrained models (EfficientNet, Vision Transformer)
- Experiment with different unfreezing strategies
- Use for your own datasets!

---

**Made with ❤️ for deep learning education**