# 🔁 Notebook 07: Training & Validation Loops

**Purpose:** Implement the core training loop to train the CNN for 5 epochs, with validation after each epoch.

**What you'll learn:** How to structure training (forward→backward→step) and validation (eval mode + no_grad), track losses, and monitor overfitting.


## 🎯 Concept Primer: Training vs Validation

### Training Loop Structure
```python
for epoch in range(num_epochs):
    # TRAINING
    model.train()  # Training mode: dropout active, batchnorm uses batch statistics
    for images, labels in train_dataloader:
        images, labels = images.to(device), labels.to(device)
        
        optimizer.zero_grad()  # Clear old gradients
        outputs = model(images)  # Forward pass
        loss = criterion(outputs, labels.float())  # Compute loss
        loss.backward()  # Compute gradients
        optimizer.step()  # Update weights
    
    # VALIDATION
    model.eval()  # Eval mode: dropout off, batchnorm uses running statistics
    with torch.no_grad():  # Disable gradient tracking
        for images, labels in val_dataloader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            loss = criterion(outputs, labels.float())
```

### Key Differences

| Step | Training | Validation |
|------|----------|------------|
| **Mode** | `model.train()` | `model.eval()` |
| **Gradients** | Tracked | Not tracked (`torch.no_grad()`) |
| **Zero Grad** | ✅ Required | ❌ Not needed |
| **Backward** | ✅ `loss.backward()` | ❌ No backprop |
| **Optimizer** | ✅ `optimizer.step()` | ❌ No updates |
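
The **Mode** row can be seen in isolation with a single dropout layer. This is a minimal sketch, not part of the notebook's pipeline: `dropout`, `x`, and the other names below are hypothetical stand-ins, and `SimpleCNN` is not used.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in layer; the notebook's SimpleCNN is not used here.
dropout = nn.Dropout(p=0.5)
x = torch.ones(1000)

dropout.train()          # training mode: each entry is zeroed with probability p
train_out = dropout(x)   # surviving entries are scaled by 1 / (1 - p) = 2.0

dropout.eval()           # eval mode: dropout passes the input through unchanged
eval_out = dropout(x)

zeroed_fraction = (train_out == 0).float().mean().item()
print(f"zeroed in train mode: {zeroed_fraction:.2f}")
print(f"eval output unchanged: {torch.equal(eval_out, x)}")
```

The same switch applies to every dropout and batchnorm layer inside a model when you call `.train()` or `.eval()` on it.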

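### Why labels.float()?
- `nn.BCELoss` expects the target to be a `float` tensor (and its shape must match the model output)
- The DataLoader might return labels as `long` (int64)
- Convert before computing the loss: `labels.float()`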
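
A small sketch of the conversion, using made-up probabilities and labels rather than real model output. The `(N, 1)` case at the end is an assumption about how a model might shape its output; check what your own `SimpleCNN` returns.

```python
import torch
import torch.nn as nn

criterion = nn.BCELoss()

# Hypothetical batch: three sigmoid probabilities and integer class labels.
probs = torch.tensor([0.9, 0.2, 0.7])
labels = torch.tensor([1, 0, 1])           # dtype is torch.int64 (long)

float_labels = labels.float()              # now torch.float32
loss = criterion(probs, float_labels)

# Assumption: if a model outputs shape (N, 1), the target needs that shape too.
probs_2d = probs.unsqueeze(1)                              # shape (3, 1)
loss_2d = criterion(probs_2d, float_labels.unsqueeze(1))   # target shape (3, 1)

print(labels.dtype, float_labels.dtype)
print(f"loss: {loss.item():.4f}")
```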


## 📚 Learning Objectives

1. ✅ Create empty lists `train_losses` and `val_losses`
2. ✅ Set `num_epochs = 5`
3. ✅ Implement training loop: set mode, zero grad, forward, loss, backward, step
4. ✅ Implement validation loop: set mode, no_grad, forward, accumulate loss
5. ✅ Print train/val loss per epoch
6. ✅ Observe if validation loss tracks training loss
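
For objective 6, one simple way to compare the two curves is to look at the per-epoch difference between validation and training loss. The helper `loss_gaps` and the loss values below are hypothetical, made up purely for illustration; they are not results from this notebook.

```python
def loss_gaps(train_history, val_history):
    """Return val - train for each epoch."""
    return [val - train for train, val in zip(train_history, val_history)]

# Made-up loss histories, for illustration only.
example_train = [0.60, 0.45, 0.35, 0.28, 0.22]
example_val = [0.62, 0.50, 0.44, 0.43, 0.45]

gaps = loss_gaps(example_train, example_val)
for epoch, gap in enumerate(gaps, start=1):
    print(f"Epoch {epoch}: val - train = {gap:.2f}")
```

After training, you could call the same helper on your own `train_losses` and `val_losses` lists.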


## ✅ Acceptance Criteria

- [ ] Training loop completes 5 epochs without errors
- [ ] `train_losses` and `val_losses` lists contain 5 values each
- [ ] Both losses trend downward over the 5 epochs
- [ ] Each epoch prints: `Epoch [X/5] - Train Loss: Y.YYY - Val Loss: Z.ZZZ`
- [ ] Validation loss is computed WITHOUT gradients (`torch.no_grad()`)
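
One way to check the last criterion is to inspect a tensor produced inside the `torch.no_grad()` block. This is a sketch with a hypothetical `toy_model` and random input, not the notebook's CNN.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the CNN: a single linear layer with a sigmoid.
toy_model = nn.Sequential(nn.Linear(4, 1), nn.Sigmoid())
toy_batch = torch.randn(8, 4)

toy_model.eval()
with torch.no_grad():
    val_outputs = toy_model(toy_batch)

# Tensors computed inside no_grad carry no autograd history.
print(val_outputs.requires_grad)   # False
print(val_outputs.grad_fn)         # None

# For comparison, the same forward pass with gradient tracking enabled.
tracked_outputs = toy_model(toy_batch)
print(tracked_outputs.requires_grad)  # True
```

Note that `model.eval()` alone does not turn off gradient tracking; only `torch.no_grad()` does.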


---

## 💻 TODO 1: Import & Rebuild Everything

**Rebuild from previous notebooks:**
- SimpleCNN, device, criterion, optimizer
- train_dataloader, val_dataloader


In [None]:
# TODO 1: Import and rebuild all components
# Hint: You need torch, nn, transforms, DataLoader, PCamDataset
# Hint: Rebuild transforms, datasets, loaders, model, device, criterion, optimizer

# YOUR CODE HERE

print("✅ All components ready for training")


---

## 💻 TODO 2: Initialize Loss Tracking & Epochs


In [None]:
# TODO 2: Create empty lists and set num_epochs
# Hint: train_losses = []
# Hint: val_losses = []
# Hint: num_epochs = 5

# YOUR CODE HERE
train_losses = []
val_losses = []
num_epochs = 5

print(f"✅ Ready to train for {num_epochs} epochs")


---

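## 💻 TODO 3: Implement the Training + Validation Loop

**Structure of the training phase** (the validation phase mirrors it under `torch.no_grad()`, without the backward pass and optimizer step):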
```python
for epoch in range(num_epochs):
    # TRAINING PHASE
    cnn_model.train()
    total_train_loss = 0.0
    
    for images, labels in train_dataloader:
        # Move to device
        # Zero gradients
        # Forward pass
        # Compute loss (remember labels.float()!)
        # Backward pass
        # Optimizer step
        # Accumulate loss: total_train_loss += loss.item()
    
    avg_train_loss = total_train_loss / len(train_dataloader)
    train_losses.append(avg_train_loss)
```
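
A note on the averaging step: dividing by `len(train_dataloader)` averages the per-batch mean losses, which differs slightly from the true per-sample average when the last batch is smaller. The sketch below compares the two on made-up data; `probs`, `labels`, and `loader` are hypothetical and stand in for real model outputs and the notebook's dataloaders.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Hypothetical data: 10 samples with batch_size=4 gives batches of 4, 4, and 2.
probs = torch.rand(10).clamp(0.05, 0.95)    # pretend these are sigmoid outputs
labels = torch.randint(0, 2, (10,))         # int64 labels
loader = DataLoader(TensorDataset(probs, labels), batch_size=4)
criterion = nn.BCELoss()

total_loss = 0.0        # sum of per-batch mean losses
weighted_total = 0.0    # sum of per-sample losses
for batch_probs, batch_labels in loader:
    loss = criterion(batch_probs, batch_labels.float())
    total_loss += loss.item()                            # .item() returns a Python float
    weighted_total += loss.item() * batch_probs.size(0)  # weight by batch size

avg_per_batch = total_loss / len(loader)                 # len(loader) is the number of batches
avg_per_sample = weighted_total / len(loader.dataset)    # len(dataset) is the number of samples

print(f"per-batch average:  {avg_per_batch:.4f}")
print(f"per-sample average: {avg_per_sample:.4f}")
```

For large datasets the two numbers are nearly identical, so the simpler per-batch average used in this notebook is a reasonable choice.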


In [None]:
# TODO 3: Implement the full training + validation loop
# This is a single code cell that does both training and validation

# YOUR CODE HERE
# for epoch in range(num_epochs):
#     # TRAINING
#     cnn_model.train()
#     total_train_loss = 0.0
#     for images, labels in train_dataloader:
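#         # TODO: Move to device, zero grad, forward, loss, backward, step
#         # TODO: Accumulate: total_train_loss += loss.item()
#         pass
#     avg_train_loss = total_train_loss / len(train_dataloader)
#     train_losses.append(avg_train_loss)
#
#     # VALIDATION
#     cnn_model.eval()
#     total_val_loss = 0.0
#     with torch.no_grad():
#         for images, labels in val_dataloader:
#             # TODO: Move to device, forward, loss
#             # TODO: Accumulate: total_val_loss += loss.item()
#             pass
#     avg_val_loss = total_val_loss / len(val_dataloader)
#     val_losses.append(avg_val_loss)
#
#     # Print epoch results
#     print(f"Epoch [{epoch+1}/{num_epochs}] - Train Loss: {avg_train_loss:.4f} - Val Loss: {avg_val_loss:.4f}")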

print("✅ Training complete!")


---

## 💻 TODO 4: Plot Loss Curves (Optional Visualization)


In [None]:
# TODO 4 (Optional): Plot train and val losses
# Hint: import matplotlib.pyplot as plt
# Hint: plt.plot(train_losses, label='Train Loss')
# Hint: plt.plot(val_losses, label='Val Loss')

# YOUR CODE HERE (Optional)

print("✅ Loss curves plotted (if implemented)")


---

## 🤔 Reflection Prompts

### Question 1: Overfitting Detection
Look at your final losses:
- Train Loss: X.XX
- Val Loss: Y.YY

**Scenarios:**
- **Scenario A:** Train=0.15, Val=0.17 (close)
- **Scenario B:** Train=0.08, Val=0.25 (large gap)

Which scenario shows overfitting? Why?

**Your analysis:**

---

### Question 2: What If You Forgot model.train()?
What would happen if you skipped `cnn_model.train()` at the start of each epoch?

**Your answer:**

---

### Question 3: Why torch.no_grad() for Validation?
`torch.no_grad()` saves memory during validation because PyTorch does not store the intermediate activations it would need for a backward pass.

**Question:** If the validation loop had 10 batches, roughly how much memory is saved?

**Your intuition:**

---


## 🚀 Next Steps

Congratulations! Your model is trained.

**Move to Notebook 08:** Test Inference & Classification Report

**Key Takeaway:** Training = train mode + gradient tracking + backprop; validation = eval mode + `torch.no_grad()`!
