# PyTorch Tutorial: Optimization and Tuning

Building a model is just the start. Making it train fast and generalize well takes optimization and tuning. This notebook covers four techniques for that: learning rate schedulers, regularization, batch normalization, and early stopping.

## Learning Objectives
- Use Learning Rate Schedulers
- Apply Regularization (Dropout, Weight Decay)
- Implement Batch Normalization
- Understand Early Stopping


In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import matplotlib.pyplot as plt

torch.manual_seed(42)

## 1. Learning Rate Schedulers

A constant learning rate is rarely optimal. A common pattern is to start with a relatively high rate to make fast progress, then lower it so the optimizer can settle into a minimum instead of bouncing around it.

Common schedulers:
- `StepLR`: Multiplies the LR by `gamma` every `step_size` epochs
- `ReduceLROnPlateau`: Lowers the LR when a monitored metric (typically validation loss) stops improving
- `CosineAnnealingLR`: Decreases the LR along a cosine curve over `T_max` epochs

In [None]:
# Create a dummy model and optimizer
model = nn.Linear(10, 1)
optimizer = optim.SGD(model.parameters(), lr=0.1)

# Setup scheduler: Multiply LR by 0.1 every 5 epochs
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.1)

lrs = []
for epoch in range(20):
    optimizer.step()  # Simulate training step
    lrs.append(optimizer.param_groups[0]['lr'])
    scheduler.step()  # Update LR

plt.plot(lrs, marker='o')
plt.xlabel('Epoch')
plt.ylabel('Learning Rate')
plt.title('StepLR Scheduler')
plt.grid(True)
plt.show()
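
`ReduceLROnPlateau` differs from `StepLR` in that its `step()` call takes the metric being monitored. The sketch below shows that call pattern. The validation losses are made up for illustration (they improve for three epochs and then stall), and `factor=0.5` and `patience=2` are arbitrary choices, not recommendations.

```python
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 1)
optimizer = optim.SGD(model.parameters(), lr=0.1)

# Multiply the LR by `factor` once the loss has failed to improve
# for more than `patience` consecutive epochs
scheduler = optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode='min', factor=0.5, patience=2
)

# Hypothetical validation losses: improve, then plateau
fake_val_losses = [1.0, 0.8, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7]

plateau_lrs = []
for val_loss in fake_val_losses:
    optimizer.step()          # Simulate training step (no gradients here)
    scheduler.step(val_loss)  # Pass the monitored metric to the scheduler
    plateau_lrs.append(optimizer.param_groups[0]['lr'])

print(plateau_lrs)
```

In a real loop, `val_loss` would come from evaluating the model on a held-out validation set once per epoch.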

## 2. Regularization: Dropout and Weight Decay

**Overfitting** happens when a model memorizes the training data but performs poorly on new data. Regularization reduces this risk by constraining what the model can learn.

### Dropout
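Randomly zeros each activation with probability `p` during training and scales the surviving ones by `1 / (1 - p)`, which pushes the network toward features that do not depend on any single neuron. Dropout is only active in training mode; call `model.eval()` before validation or inference to turn it off.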

In [None]:
class RegularizedNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(20, 64)
        self.dropout = nn.Dropout(p=0.5)  # Zero each activation with probability 0.5
        self.fc2 = nn.Linear(64, 1)
    
    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = self.dropout(x)  # Apply dropout
        x = self.fc2(x)
        return x

model = RegularizedNet()
print(model)
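
Dropout behaves differently in training and evaluation mode, which is easy to forget. The small check below applies a standalone `nn.Dropout` layer to a tensor of ones (an input chosen only to make the effect visible) in both modes.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)
x = torch.ones(1000)

drop.train()
train_out = drop(x)  # Each entry is either 0 or scaled by 1 / (1 - p) = 2

drop.eval()
eval_out = drop(x)   # Dropout is a no-op in eval mode

print(train_out.unique())        # Distinct values seen in training mode
print(torch.equal(eval_out, x))  # Whether eval mode left the input unchanged
```

The same switch applies to a full model: `model.train()` and `model.eval()` set the mode of every submodule, including dropout layers.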

### Weight Decay (L2 Regularization)
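Penalizes large weights by shrinking them toward zero at every update. In PyTorch, this is set through the optimizer's `weight_decay` argument rather than added to the loss by hand. Note that `optim.Adam` applies `weight_decay` as an L2 penalty added to the gradient, which then passes through Adam's adaptive scaling; `optim.AdamW` applies the decay directly to the weights (decoupled weight decay), and the two are not equivalent.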

In [None]:
# Add weight_decay parameter to optimizer
optimizer = optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-5)
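
To see what `weight_decay` does to an update, it helps to look at plain SGD, where the effect is easy to write down: the optimizer adds `weight_decay * w` to the gradient before taking the step. The sketch below checks that by hand on a two-element parameter. The loss, learning rate, and decay value are arbitrary choices for illustration.

```python
import torch
import torch.optim as optim

w = torch.nn.Parameter(torch.tensor([1.0, -2.0]))
lr, wd = 0.1, 0.01
opt = optim.SGD([w], lr=lr, weight_decay=wd)

w_before = w.detach().clone()

loss = (w ** 2).sum()  # Toy loss; its gradient is 2 * w
opt.zero_grad()
loss.backward()
grad = w.grad.detach().clone()
opt.step()

# Manual update: the decay term wd * w is added to the gradient
expected = w_before - lr * (grad + wd * w_before)
print(torch.allclose(w.detach(), expected))
```

With adaptive optimizers such as Adam the arithmetic differs, because the combined gradient is rescaled per parameter before the step.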

## 3. Batch Normalization

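Normalizes each feature of a layer's input to mean 0 and variance 1 using the statistics of the current mini-batch, then applies a learnable scale and shift. This tends to stabilize training and often allows higher learning rates. In `eval()` mode the layer uses running averages collected during training instead of batch statistics.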

In [None]:
class BatchNormNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(20, 64)
        self.bn1 = nn.BatchNorm1d(64)  # Normalizes each of the 64 features across the batch
        self.fc2 = nn.Linear(64, 1)
    
    def forward(self, x):
        x = self.fc1(x)
        x = self.bn1(x)  # Apply BN before activation
        x = torch.relu(x)
        x = self.fc2(x)
        return x

model = BatchNormNet()
print(model)
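
The normalization can be checked directly on a standalone `nn.BatchNorm1d` layer. The sketch below feeds it a batch whose features are deliberately shifted and scaled (the batch size, feature count, and offsets are arbitrary) and compares training mode with evaluation mode.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(4)
x = torch.randn(32, 4) * 5 + 3  # Features with mean near 3 and std near 5

bn.train()
y_train = bn(x)  # Normalized with this batch's own mean and variance
print(y_train.mean(dim=0))                 # Per-feature mean after BN
print(y_train.var(dim=0, unbiased=False))  # Per-feature variance after BN

bn.eval()
y_eval = bn(x)   # Normalized with the running statistics instead
print(torch.allclose(y_train, y_eval))
```

Because the running statistics are only moving averages of what was seen during training, outputs in `eval()` mode generally differ from training-mode outputs on the same batch, especially early in training.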

## 4. Early Stopping

Stop training when the validation loss has not improved for a set number of epochs (the *patience*). This saves compute and limits overfitting.

*(Concept only - usually implemented as a loop check)*

```python
best_loss = float('inf')
patience = 5
counter = 0

for epoch in range(100):
    train(...)                # Placeholder: one epoch of training
    val_loss = validate(...)  # Placeholder: returns the validation loss
    
    if val_loss < best_loss:
        best_loss = val_loss
        counter = 0
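        # Save only the weights; restore with model.load_state_dict(torch.load('best_model.pth'))
        torch.save(model.state_dict(), 'best_model.pth')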
    else:
        counter += 1
        if counter >= patience:
            print("Early stopping!")
            break
```
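
The loop above can be made concrete with a small end-to-end sketch. Everything specific here is an assumption for illustration: the synthetic regression data, the linear model, full-batch SGD, and the choice to keep the best weights in memory with `copy.deepcopy` instead of writing them to disk.

```python
import copy
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Synthetic regression data, split into train and validation sets
X = torch.randn(200, 20)
y = X[:, :1] * 2.0 + 0.1 * torch.randn(200, 1)
X_train, y_train = X[:150], y[:150]
X_val, y_val = X[150:], y[150:]

model = nn.Linear(20, 1)
optimizer = optim.SGD(model.parameters(), lr=0.05)
loss_fn = nn.MSELoss()

best_loss = float('inf')
best_state = None
patience = 5
counter = 0
epochs_run = 0

for epoch in range(500):
    # One full-batch training step
    model.train()
    optimizer.zero_grad()
    loss_fn(model(X_train), y_train).backward()
    optimizer.step()

    # Validation without tracking gradients
    model.eval()
    with torch.no_grad():
        val_loss = loss_fn(model(X_val), y_val).item()

    epochs_run = epoch + 1
    if val_loss < best_loss:
        best_loss = val_loss
        best_state = copy.deepcopy(model.state_dict())  # Snapshot best weights
        counter = 0
    else:
        counter += 1
        if counter >= patience:
            print("Early stopping!")
            break

# Restore the best weights seen during training
model.load_state_dict(best_state)
print(epochs_run, best_loss)
```

The `deepcopy` matters: `state_dict()` returns references to the live parameter tensors, so without a copy the "best" snapshot would keep changing as training continues.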

## Key Takeaways

1. **Schedulers**: Adjust learning rate dynamically.
2. **Dropout**: Randomly disable neurons to improve robustness.
3. **Weight Decay**: Penalize large weights to prevent overfitting.
4. **Batch Norm**: Normalize layer inputs per mini-batch for more stable, faster training.
5. **Early Stopping**: Stop when you stop improving.