# PyTorch Tutorial: Advanced Topics and Best Practices

This tutorial covers saving and loading models, running on a GPU, transfer learning, and debugging.

## Learning Objectives
- Save and load trained models
- Use GPU acceleration
- Apply the basics of transfer learning
- Debug common training problems

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms

torch.manual_seed(42)
print("PyTorch version:", torch.__version__)

# Check available devices
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA GPU: {torch.cuda.get_device_name(0)}")

print(f"MPS available: {torch.backends.mps.is_available()}")
if torch.backends.mps.is_available():
    print("MPS (Apple Silicon GPU) is available!")

## Saving and Loading Models

In [None]:
# Create a simple model
class SimpleModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 1)
    def forward(self, x):
        return self.fc(x)

model = SimpleModel()

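# Save entire model (pickles the whole object, so loading needs the same class definition)
torch.save(model, "model.pth")
print("Saved entire model")

# Save only state dict (recommended)
torch.save(model.state_dict(), "model_state.pth")
print("Saved state dict")

# Load entire model. PyTorch 2.6+ defaults to weights_only=True, which rejects
# full-model pickles, so pass weights_only=False (only for files you trust).
loaded_model = torch.load("model.pth", weights_only=False)
print("Loaded entire model")

# Load state dict (weights_only=True restricts unpickling to tensors and plain containers)
new_model = SimpleModel()
new_model.load_state_dict(torch.load("model_state.pth", weights_only=True))
print("Loaded state dict")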

# Clean up
import os
os.remove("model.pth")
os.remove("model_state.pth")
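
To resume training rather than only run inference, the state-dict approach can be extended to a checkpoint dictionary that also holds the optimizer state. The sketch below is one possible layout, not a fixed PyTorch format: the keys `epoch`, `model_state`, and `optimizer_state` and the file name `checkpoint.pth` are assumptions chosen for illustration.

```python
import os
import tempfile

import torch
import torch.nn as nn
import torch.optim as optim

# Same toy architecture as above, redefined so the sketch is self-contained
class SimpleModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 1)

    def forward(self, x):
        return self.fc(x)

model = SimpleModel()
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# Take one optimizer step so the optimizer has state (momentum buffers) to save
loss = model(torch.randn(4, 10)).sum()
loss.backward()
optimizer.step()

# Hypothetical checkpoint layout: the key names are illustrative only
checkpoint = {
    "epoch": 5,
    "model_state": model.state_dict(),
    "optimizer_state": optimizer.state_dict(),
}
path = os.path.join(tempfile.mkdtemp(), "checkpoint.pth")
torch.save(checkpoint, path)

# Restore into freshly constructed objects
restored_model = SimpleModel()
restored_optimizer = optim.SGD(restored_model.parameters(), lr=0.1, momentum=0.9)
loaded = torch.load(path, weights_only=True)
restored_model.load_state_dict(loaded["model_state"])
restored_optimizer.load_state_dict(loaded["optimizer_state"])
start_epoch = loaded["epoch"] + 1  # continue from the next epoch

os.remove(path)
```

Restoring the optimizer state matters mostly for optimizers that keep per-parameter buffers, such as SGD with momentum or Adam.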

## GPU Usage

In [None]:
# Determine best device (MPS > CUDA > CPU)
if torch.backends.mps.is_available():
    device = torch.device("mps")  # Apple Silicon GPU
    print("Using MPS (Apple Silicon GPU)")
elif torch.cuda.is_available():
    device = torch.device("cuda")  # NVIDIA GPU
    print("Using CUDA (NVIDIA GPU)")
else:
    device = torch.device("cpu")  # CPU fallback
    print("Using CPU")

print(f"Device: {device}")

# Move model to device
model = SimpleModel()
model = model.to(device)
print(f"Model on: {next(model.parameters()).device}")

# Move data to device
x = torch.randn(5, 10).to(device)
print(f"Data on: {x.device}")

# Compute on device
output = model(x)
print(f"Output computed on: {output.device}")

# Note: For Apple Silicon Macs, use 'mps' instead of 'cuda'
# Both work the same way: model.to('mps'), tensor.to('mps')
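
Because every backend is addressed through the same `torch.device` object, the selection logic can be wrapped once and reused. The following is a minimal sketch; `pick_device` is a hypothetical helper name, not part of PyTorch. It also shows allocating a tensor directly on the target device, which avoids creating it on the CPU first.

```python
import torch
import torch.nn as nn

def pick_device() -> torch.device:
    """Hypothetical helper: return the first available accelerator, else the CPU."""
    mps = getattr(torch.backends, "mps", None)  # absent in older PyTorch builds
    if mps is not None and mps.is_available():
        return torch.device("mps")
    if torch.cuda.is_available():
        return torch.device("cuda")
    return torch.device("cpu")

device = pick_device()
model = nn.Linear(10, 1).to(device)

# Allocate the input directly on the target device instead of creating it
# on the CPU and copying it over with .to(device)
x = torch.randn(5, 10, device=device)
output = model(x)

# Bring the result back to the CPU before converting to NumPy or plain Python
result = output.detach().cpu()
print(result.shape, result.device)
```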

## Transfer Learning

In [None]:
# Load pretrained model
from torchvision import models

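# Load ResNet18 with ImageNet weights (downloaded on first use).
# The `weights` argument replaces the deprecated `pretrained=True`.
resnet = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
print("Loaded pretrained ResNet18")

# Freeze all pretrained parameters; the replacement layer below is trainable by default
for param in resnet.parameters():
    param.requires_grad = False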

# Modify last layer for new task (e.g., 10 classes instead of 1000)
num_classes = 10
resnet.fc = nn.Linear(resnet.fc.in_features, num_classes)
print(f"Modified for {num_classes} classes")

# Only new layer will be trained
print(f"Trainable parameters: {sum(p.numel() for p in resnet.parameters() if p.requires_grad)}")
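
The freeze-then-replace pattern above can be checked end to end with one training step. To keep the sketch runnable without downloading weights, a small `nn.Sequential` named `backbone` stands in for the pretrained ResNet; that stand-in, the layer sizes, and the random data are all assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Stand-in for a pretrained network (assumption: a real run would use resnet18)
backbone = nn.Sequential(nn.Linear(20, 16), nn.ReLU())
for param in backbone.parameters():
    param.requires_grad = False  # freeze the "pretrained" part

head = nn.Linear(16, 10)  # new task-specific layer, trainable by default
net = nn.Sequential(backbone, head)

# Give the optimizer only the parameters that still require gradients
trainable = [p for p in net.parameters() if p.requires_grad]
optimizer = optim.SGD(trainable, lr=0.1)

frozen_before = backbone[0].weight.clone()
head_before = head.weight.clone()

# One training step on random data
inputs = torch.randn(8, 20)
targets = torch.randint(0, 10, (8,))
loss = nn.functional.cross_entropy(net(inputs), targets)
optimizer.zero_grad()
loss.backward()
optimizer.step()

print("Backbone unchanged:", torch.equal(backbone[0].weight, frozen_before))
print("Head updated:", not torch.equal(head.weight, head_before))
```

With a real ResNet, note that BatchNorm layers keep updating their running statistics in `train()` mode even when their parameters are frozen; putting the frozen part in `eval()` mode is one way to avoid that.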

## Best Practices

In [None]:
# 1. Always set model to eval() for inference
model.eval()
with torch.no_grad():
    output = model(x)

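# 2. Use DataLoader for batching (random tensors stand in for a real dataset)
dataset = torch.utils.data.TensorDataset(torch.randn(64, 10), torch.randn(64, 1))
train_loader = torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=True)

# 3. Use learning rate scheduling (call scheduler.step() once per epoch)
optimizer = optim.SGD(model.parameters(), lr=0.1)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

# 4. Gradient clipping (prevents exploding gradients);
#    in a training loop, call it after loss.backward() and before optimizer.step()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

# 5. Check for NaN/Inf
for name, param in model.named_parameters():
    if not torch.isfinite(param).all():
        print(f"NaN/Inf in {name}")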

print("Best practices demonstrated!")
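
The practices above matter most in how they are ordered inside a training loop. The following is a minimal sketch on synthetic data; the model size, learning rate, and scheduler settings are assumptions picked to keep the run short, not recommended values.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic regression data (assumption: stands in for a real dataset)
features = torch.randn(64, 10)
targets = features.sum(dim=1, keepdim=True)
train_loader = DataLoader(TensorDataset(features, targets), batch_size=32, shuffle=True)

model = nn.Linear(10, 1)
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.05)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.5)

epoch_losses = []
for epoch in range(4):
    model.train()
    running = 0.0
    for xb, yb in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        if not torch.isfinite(loss):
            raise RuntimeError(f"Non-finite loss at epoch {epoch}")
        loss.backward()
        # Clip after backward() and before step()
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        optimizer.step()
        running += loss.item()
    scheduler.step()  # once per epoch, after the optimizer steps
    epoch_losses.append(running / len(train_loader))

# Inference: eval mode and no gradient tracking
model.eval()
with torch.no_grad():
    preds = model(features)

print("Final learning rate:", scheduler.get_last_lr()[0])
```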

## Debugging Tips

In [None]:
# Common issues and solutions:
print("1. Loss not decreasing:")
print("   - Check learning rate (try 0.001, 0.0001)")
print("   - Verify data is correct")
print("   - Check model architecture")
print()
print("2. Out of memory:")
print("   - Reduce batch size")
print("   - Use gradient accumulation")
print("   - Clear cache:")
print("     * CUDA: torch.cuda.empty_cache()")
print("     * MPS: torch.mps.empty_cache()")
print()
print("3. Model not learning:")
print("   - Check if gradients are computed: param.grad is not None")
print("   - Verify loss function")
print("   - Check data normalization")
print()
print("4. Overfitting:")
print("   - Add dropout")
print("   - Use data augmentation")
print("   - Reduce model complexity")
print("   - Early stopping")
print()
print("5. MPS-specific (Apple Silicon):")
print("   - Some operations may not be supported on MPS")
print("   - If you get MPS errors, fall back to CPU: model.cpu(), tensor.cpu()")
print("   - MPS is usually faster than the CPU for large models and batches; small workloads may see no gain")
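
Two of the tips above, gradient accumulation (tip 2) and checking that gradients exist (tip 3), can be sketched together. The example assumes a loss with mean reduction, so dividing each micro-batch loss by the number of accumulation steps reproduces the full-batch gradient; the batch sizes and the name `accum_steps` are illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Linear(10, 1)
criterion = nn.MSELoss()
inputs = torch.randn(32, 10)
targets = torch.randn(32, 1)

# Reference: gradient from one full batch of 32
model.zero_grad()
criterion(model(inputs), targets).backward()
full_batch_grad = model.weight.grad.clone()

# Accumulate over 4 micro-batches of 8 to obtain the same gradient
accum_steps = 4
model.zero_grad()
for xb, yb in zip(inputs.chunk(accum_steps), targets.chunk(accum_steps)):
    loss = criterion(model(xb), yb) / accum_steps  # scale so gradients average
    loss.backward()  # gradients add up in param.grad across calls

# Confirm every parameter actually received a gradient
for name, param in model.named_parameters():
    assert param.grad is not None, f"No gradient for {name}"

print("Matches full batch:", torch.allclose(model.weight.grad, full_batch_grad, atol=1e-6))
```

In a real loop, `optimizer.step()` and `optimizer.zero_grad()` would run once after every `accum_steps` micro-batches, so peak memory follows the micro-batch size rather than the effective batch size.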

## Key Takeaways

1. **Save models**: Use state_dict for portability
2. **GPU Acceleration**: 
   - **CUDA**: NVIDIA GPUs (Windows/Linux)
   - **MPS**: Apple Silicon GPUs (M1/M2/M3 Macs)
   - Move model and data to device for speed
3. **Transfer Learning**: Use pretrained models for new tasks
4. **Best Practices**: eval(), no_grad(), schedulers, clipping
5. **Debugging**: Check gradients, learning rate, data

## Congratulations!

You've completed the PyTorch tutorial series! 🎉

You now know:
- Tensors and operations
- Automatic differentiation
- Building neural networks
- Training models
- Practical applications
- Advanced techniques

**Keep practicing and building!**