# 📊 Notebook 04: Val/Test DataLoaders

**Purpose:** Create DataLoaders for validation and test sets with `shuffle=False` and larger batch sizes.

**What you'll learn:** How evaluation data loading differs from training.


## 🎯 Concept Primer: Evaluation DataLoaders

### Key Differences: Training vs Evaluation

| Parameter | Training | Validation | Test |
|-----------|----------|------------|------|
| `shuffle` | ✅ True | ❌ False | ❌ False |
| `batch_size` | Small (8-16) | Large (32-64) | Large (32-64) |
| `drop_last` | Optional | ❌ False | ❌ False |

### Why shuffle=False for Evaluation?
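- Reproducible batch order: the same samples land in the same batch on every run
- Easier debugging: predictions line up with dataset indices
- No benefit from shuffling: shuffling exists to decorrelate gradient updates, and evaluation performs none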
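
The ordering difference can be seen on a toy dataset. This is a minimal sketch, not part of the notebook's pipeline: the `TensorDataset` of ten indices and the `epoch_order` helper are stand-ins invented for illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in dataset: each "sample" is just its own index, 0..9.
dataset = TensorDataset(torch.arange(10))

sequential = DataLoader(dataset, batch_size=4, shuffle=False)
shuffled = DataLoader(dataset, batch_size=4, shuffle=True)

def epoch_order(loader):
    """Return the sample indices in the order the loader yields them."""
    return [int(i) for (batch,) in loader for i in batch]

print(epoch_order(sequential))  # always 0, 1, ..., 9
print(epoch_order(shuffled))    # a random permutation, redrawn on each pass
```

With `shuffle=False`, position `k` in the concatenated outputs corresponds to dataset index `k`, which is what makes a wrong prediction easy to trace back to its image.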

### Why Larger Batch Sizes for Evaluation?
- **No backpropagation** → no activations stored for a backward pass (when run under `torch.no_grad()`), so less memory is needed
- **Faster inference** → fewer iterations
- Batch size of 32 or 64 is common
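
A typical evaluation loop might look like the sketch below. The random tensors and the tiny linear model are hypothetical placeholders for the PCam data and the CNN built in a later notebook; only the `model.eval()` / `torch.no_grad()` pattern is the point.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Hypothetical stand-ins: 100 random 3x96x96 "images" with binary labels,
# and a tiny linear classifier in place of the real CNN.
images = torch.randn(100, 3, 96, 96)
labels = torch.randint(0, 2, (100,))
loader = DataLoader(TensorDataset(images, labels), batch_size=32, shuffle=False)
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 96 * 96, 2))

model.eval()  # switch layers such as dropout/batch norm to inference behaviour
correct = 0
total = 0
with torch.no_grad():  # no autograd graph is built inside this block
    for batch_images, batch_labels in loader:
        logits = model(batch_images)
        correct += (logits.argmax(dim=1) == batch_labels).sum().item()
        total += batch_labels.size(0)

print(f"Evaluated {total} samples, accuracy = {correct / total:.3f}")
print(f"Gradients tracked: {logits.requires_grad}")
```

Because nothing is retained for a backward pass, the memory that training would spend on stored activations is available for a larger batch.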

### drop_last Consideration
- `drop_last=True`: Drops the last incomplete batch
- **Training:** Sometimes used to keep batch sizes consistent
- **Evaluation:** Always `False` (we want to evaluate ALL samples!)

**Example:** 100 samples, batch_size=32
- Batches: [32, 32, 32, 4]
- If `drop_last=True`, we'd skip 4 samples → biased metrics!
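
The 100-sample example can be reproduced directly. This is a small sketch using a dummy `TensorDataset`; the `batch_sizes` helper is made up for illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy dataset with 100 samples; the values themselves do not matter here.
dataset = TensorDataset(torch.zeros(100, 1))

def batch_sizes(drop_last):
    """Return the size of every batch produced in one pass."""
    loader = DataLoader(dataset, batch_size=32, shuffle=False, drop_last=drop_last)
    return [batch.size(0) for (batch,) in loader]

keep_all = batch_sizes(drop_last=False)
dropped = batch_sizes(drop_last=True)
print(keep_all, sum(keep_all))  # [32, 32, 32, 4] 100
print(dropped, sum(dropped))    # [32, 32, 32] 96
```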


## 📚 Learning Objectives

By the end of this notebook, you will:

1. ✅ Create `val_dataset` and `test_dataset` with `val_test_transform`
2. ✅ Create `val_dataloader` and `test_dataloader` with `shuffle=False`
3. ✅ Use `batch_size=32` for faster evaluation
4. ✅ Verify shapes: `images=[32,3,96,96]`, `labels=[32]` (or smaller for last batch)
5. ✅ Understand why evaluation doesn't shuffle


## ✅ Acceptance Criteria

Your val/test data loaders are correct when:

- [ ] `val_dataset` and `test_dataset` use `val_test_transform`
- [ ] `val_dataloader` and `test_dataloader` have `shuffle=False`
- [ ] Both use `batch_size=32`
- [ ] Iterating one batch produces `images.shape = [32, 3, 96, 96]` (or less for last batch)
- [ ] Running the iteration twice yields batches in the **same order**
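
Some of these criteria can be checked programmatically. The sketch below is one possible approach, not a required part of the notebook: `check_eval_loader` is a hypothetical helper, and `fake_dataset` stands in for `val_dataset` so the example runs on its own. It relies on the `batch_size`, `drop_last`, and `sampler` attributes that `DataLoader` exposes.

```python
import torch
from torch.utils.data import DataLoader, SequentialSampler, TensorDataset

def check_eval_loader(loader, expected_batch_size=32):
    """Hypothetical helper: assert that a DataLoader is configured for evaluation."""
    assert loader.batch_size == expected_batch_size, "unexpected batch size"
    assert not loader.drop_last, "drop_last must be False so every sample is evaluated"
    # With shuffle=False and no custom sampler, DataLoader uses a SequentialSampler.
    assert isinstance(loader.sampler, SequentialSampler), "loader is shuffling"
    return True

# Stand-in for val_dataset: 200 fake samples with the notebook's image shape.
fake_dataset = TensorDataset(
    torch.zeros(200, 3, 96, 96),
    torch.zeros(200, dtype=torch.long),
)
fake_loader = DataLoader(fake_dataset, batch_size=32, shuffle=False, num_workers=0)

images, labels = next(iter(fake_loader))
print(check_eval_loader(fake_loader), tuple(images.shape), tuple(labels.shape))
```

Once the real loaders exist, the same helper could be called as `check_eval_loader(val_dataloader)` and `check_eval_loader(test_dataloader)`.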


---

## 💻 TODO 1: Import Libraries & Rebuild val_test_transform

**What you need:**
- PyTorch, DataLoader, transforms
- `PCamDataset`
- Rebuild `val_test_transform` from Notebook 03


In [2]:
# TODO 1: Import libraries and rebuild val_test_transform
# Hint: import torch
# Hint: from torch.utils.data import DataLoader
# Hint: from torchvision import transforms
# Hint: from src.datasets.pcam_dataset import PCamDataset

# YOUR CODE HERE
import torch
from torch.utils.data import DataLoader
from torchvision import transforms
import sys
sys.path.append("..")
from src.datasets.pcam_dataset import PCamDataset

# Rebuild val_test_transform (copy from Notebook 03)
val_test_transform = transforms.Compose([
    transforms.Resize((96, 96)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
])

print("✅ Imports successful and val_test_transform ready")


✅ Imports successful and val_test_transform ready


---

## 💻 TODO 2: Create Validation & Test Datasets


In [3]:
# TODO 2: Create val_dataset and test_dataset
# Hint: val_dataset = PCamDataset(csv_file='../data/validation_labels.csv', transform=val_test_transform)
# Hint: test_dataset = PCamDataset(csv_file='../data/test_labels.csv', transform=val_test_transform)

# YOUR CODE HERE
val_dataset = PCamDataset(csv_file='../data/validation_labels.csv', transform=val_test_transform)
test_dataset = PCamDataset(csv_file='../data/test_labels.csv', transform=val_test_transform)

print(f"✅ Validation dataset: {len(val_dataset)} samples")
print(f"✅ Test dataset: {len(test_dataset)} samples")


✅ Validation dataset: 200 samples
✅ Test dataset: 200 samples


---

## 💻 TODO 3: Create Validation & Test DataLoaders (shuffle=False!)


In [4]:
# TODO 3: Create val_dataloader and test_dataloader
# Hint: DataLoader(dataset, batch_size=32, shuffle=False, num_workers=0)

# YOUR CODE HERE
val_dataloader = DataLoader(val_dataset, batch_size=32, shuffle=False, num_workers=0)
test_dataloader = DataLoader(test_dataset, batch_size=32, shuffle=False, num_workers=0)

print(f"✅ Validation DataLoader: {len(val_dataloader)} batches")
print(f"✅ Test DataLoader: {len(test_dataloader)} batches")


✅ Validation DataLoader: 7 batches
✅ Test DataLoader: 7 batches
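
The batch count follows from the dataset size: with `drop_last=False`, the loader yields the ceiling of `samples / batch_size` batches. A quick arithmetic check, using the 200 samples reported above:

```python
import math

num_samples = 200  # size printed for both val_dataset and test_dataset
batch_size = 32

num_batches = math.ceil(num_samples / batch_size)
last_batch = num_samples - (num_batches - 1) * batch_size

print(num_batches)  # 7
print(last_batch)   # 8
```

So six batches are full and the seventh holds the remaining 8 samples.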


---

## 💻 TODO 4: Test the Val DataLoader


In [7]:
# TODO 4: Iterate one batch from val_dataloader (then repeat for test_dataloader)
# Hint: for images, labels in val_dataloader: ...

# YOUR CODE HERE
for images, labels in val_dataloader:
    print("✅ Validation batch loaded")
    print(f"   Images shape: {images.shape}")
    print(f"   Labels shape: {labels.shape}")
    print(f"   Batch size: {len(labels)}")
    break

for images, labels in test_dataloader:
    print("✅ Test batch loaded")
    print(f"   Images shape: {images.shape}")
    print(f"   Labels shape: {labels.shape}")
    print(f"   Batch size: {len(labels)}")
    break


✅ Validation batch loaded
   Images shape: torch.Size([32, 3, 96, 96])
   Labels shape: torch.Size([32])
   Batch size: 32
✅ Test batch loaded
   Images shape: torch.Size([32, 3, 96, 96])
   Labels shape: torch.Size([32])
   Batch size: 32


---

## 🤔 Reflection Prompts

### Question 1: Why Larger Batch Sizes for Evaluation?
Training uses `batch_size=8`, but evaluation uses `batch_size=32`.

**Question:** What allows us to use larger batches during evaluation?

**Your answer:**
> During evaluation we run only the forward pass: no gradients are computed and no backpropagation happens, so (under `torch.no_grad()`) activations don't have to be kept in memory for a backward pass. That frees enough memory to use larger batches such as 32 instead of 8, which means fewer iterations and faster evaluation. We also don't drop any samples in validation/test, because an accurate evaluation needs every sample.

---

### Question 2: Consistency Check
Run this code, which fetches the first validation batch twice:

```python
first_batch_labels_run1 = next(iter(val_dataloader))[1]
first_batch_labels_run2 = next(iter(val_dataloader))[1]
```

**Questions:**
- Will `first_batch_labels_run1` equal `first_batch_labels_run2`?
- Why or why not?
- What if `val_dataloader` had `shuffle=True`?

**Your analysis:**
> Yes, they are equal: with `shuffle=False` the loader walks the dataset in index order, so every pass yields the same batches in the same order. The last batch may be smaller when the sample count isn't a multiple of the batch size (e.g., 70 samples with batch_size=32 gives batches of 32, 32, and 6); that is expected, since we don't drop samples in validation/test.
>
> If `val_dataloader` had `shuffle=True`, each pass would draw a new random order, so the two first batches would almost certainly differ. Metrics computed over the full set would stay essentially the same, because every sample is still visited exactly once, but per-batch numbers would change from run to run and predictions could no longer be matched to dataset indices by position, which makes debugging and run-to-run comparison harder.
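
The comparison can be made explicit with `torch.equal`. The sketch below uses a stand-in dataset of 70 samples whose label is simply the sample index (an assumption made so the result is easy to read). With the real loader, the first two lines of the check apply unchanged to `val_dataloader`.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in for val_dataset: 70 samples whose label is the sample index.
dataset = TensorDataset(torch.zeros(70, 1), torch.arange(70))

ordered = DataLoader(dataset, batch_size=32, shuffle=False)
run1 = next(iter(ordered))[1]
run2 = next(iter(ordered))[1]
print(torch.equal(run1, run2))  # True: same first batch on every pass

shuffled = DataLoader(dataset, batch_size=32, shuffle=True)
seen = torch.cat([batch_labels for _, batch_labels in shuffled])
# Shuffling changes the order, but one full pass still visits every sample once.
print(torch.equal(seen.sort().values, torch.arange(70)))  # True
```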

---


## 🚀 Next Steps

Excellent! You now have all three data loaders ready.

**Move to Notebook 05:** Simple CNN Architecture

**Key Takeaway:** Train=shuffle, Val/Test=no shuffle + larger batches!
