# Module 01 — Mathematical & Programming Foundations

## 01-05: Data Loading with PyTorch

**Objective:** Master PyTorch's Dataset/DataLoader pipeline — the standard interface for feeding data into neural networks efficiently and reproducibly.

**Prerequisites:** 01-01 (Python, NumPy & Tensor Speed), 01-02 (Advanced NumPy & PyTorch Operations), 01-03 (Pandas for Tabular Data)

---

## Part 0 — Setup & Prerequisites

Every deep learning pipeline starts with data loading. PyTorch provides a clean two-component abstraction:

- **`Dataset`** — defines how to access individual samples (mapping from index to data)
- **`DataLoader`** — handles batching, shuffling, and parallel loading

This notebook teaches:

- **Map-style Datasets** — subclass `Dataset` with `__len__` + `__getitem__`
- **Transforms** — composable data preprocessing pipelines
- **DataLoader mechanics** — batching, shuffling, collation, worker processes
- **Splitting strategies** — train/val/test splits with reproducible seeds
- **Built-in datasets** — torchvision, torchtext, torchaudio ecosystem

We use FashionMNIST and synthetic data to demonstrate all concepts.

In [None]:
# ── Imports ──────────────────────────────────────────────────────────────────
import sys
import warnings
warnings.filterwarnings('ignore')

import time
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
from torch.utils.data import (
    Dataset, DataLoader, TensorDataset, Subset,
    random_split, ConcatDataset,
)
import torchvision
import torchvision.transforms as T

from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

print(f'Python: {sys.version.split()[0]}')
print(f'PyTorch: {torch.__version__}')
print(f'Torchvision: {torchvision.__version__}')
print(f'NumPy: {np.__version__}')
if torch.cuda.is_available():
    print(f'CUDA: {torch.version.cuda}')
    print(f'GPU: {torch.cuda.get_device_name(0)}')

In [None]:
# ── Reproducibility ──────────────────────────────────────────────────────────
import random

SEED = 1103
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(SEED)

In [None]:
# ── Configuration ────────────────────────────────────────────────────────────
BATCH_SIZE = 64
NUM_WORKERS = 0          # 0 for Colab compatibility
PIN_MEMORY = torch.cuda.is_available()
DATA_DIR = '../data'

### Data Loading

We download FashionMNIST — a standard image classification dataset — and load the California Housing dataset for tabular examples.

In [None]:
# FashionMNIST — built-in torchvision dataset
fashion_train = torchvision.datasets.FashionMNIST(
    root=DATA_DIR, train=True, download=True,
    transform=T.ToTensor(),
)
fashion_test = torchvision.datasets.FashionMNIST(
    root=DATA_DIR, train=False, download=True,
    transform=T.ToTensor(),
)

CLASS_NAMES = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

print(f'FashionMNIST train: {len(fashion_train)} samples')
print(f'FashionMNIST test:  {len(fashion_test)} samples')
print(f'Image shape: {fashion_train[0][0].shape}')
# Read labels from .targets instead of iterating (and decoding) all 60,000 images twice
print(f'Label range: {fashion_train.targets.min().item()} - {fashion_train.targets.max().item()}')

# California Housing — tabular data
housing = fetch_california_housing()
housing_X = housing.data.astype(np.float32)
housing_y = housing.target.astype(np.float32)
print(f'\nCalifornia Housing: {housing_X.shape[0]} samples, {housing_X.shape[1]} features')

### Quick EDA: FashionMNIST Samples

Let's visualize sample images and the class distribution.

In [None]:
def show_fashion_eda(dataset: Dataset, class_names: list[str]) -> None:
    """Display sample images and class distribution for FashionMNIST.

    Args:
        dataset: FashionMNIST dataset.
        class_names: List of class label names.
    """
    fig, axes = plt.subplots(2, 8, figsize=(16, 4))

    # Show sample images
    for idx in range(8):
        img, label = dataset[idx]
        axes[0, idx].imshow(img.squeeze(), cmap='gray')
        axes[0, idx].set_title(class_names[label], fontsize=8)
        axes[0, idx].axis('off')

    # Show random images
    rng = np.random.RandomState(SEED)
    random_indices = rng.choice(len(dataset), 8, replace=False)
    for i, idx in enumerate(random_indices):
        img, label = dataset[idx]
        axes[1, i].imshow(img.squeeze(), cmap='gray')
        axes[1, i].set_title(class_names[label], fontsize=8)
        axes[1, i].axis('off')

    plt.suptitle('FashionMNIST — Sample Images', fontsize=13)
    plt.tight_layout()
    plt.show()

    # Class distribution
    labels = [t for _, t in dataset]
    fig, ax = plt.subplots(figsize=(10, 4))
    counts = pd.Series(labels).value_counts().sort_index()
    ax.bar(range(10), counts.values, color='#1E88E5', alpha=0.7)
    ax.set_xticks(range(10))
    ax.set_xticklabels(class_names, rotation=45, ha='right')
    ax.set_xlabel('Class')
    ax.set_ylabel('Count')
    ax.set_title('Class Distribution')
    for i, c in enumerate(counts.values):
        ax.text(i, c + 100, str(c), ha='center', fontsize=8)
    plt.tight_layout()
    plt.show()


show_fashion_eda(fashion_train, CLASS_NAMES)

---

## Part 1 — Dataset & DataLoader from Scratch

Before using PyTorch's built-in abstractions, let's understand the pattern by building it ourselves. The key insight is that data loading has two orthogonal concerns:

1. **Data access** — given an index $i$, return the $i$-th sample $(\mathbf{x}_i, y_i)$
2. **Batch construction** — group samples into batches, shuffle the order, handle the last incomplete batch

PyTorch's `Dataset` handles concern 1, and `DataLoader` handles concern 2.

### 1.1 Building a Dataset from Scratch

A PyTorch Dataset must implement two methods:

- `__len__()` — returns the total number of samples
- `__getitem__(index)` — returns one sample (features, label) by index

This simple protocol lets PyTorch handle everything else.

In [None]:
class NumpyDataset(Dataset):
    """A simple Dataset wrapping NumPy arrays.

    This is the most common pattern for tabular data: take features and
    labels as NumPy arrays and convert them to tensors once, at construction.

    Attributes:
        features: Feature matrix as float32 tensor.
        labels: Label vector as float32 tensor.
    """

    def __init__(
        self,
        features: np.ndarray,
        labels: np.ndarray,
    ) -> None:
        """Initialize from NumPy arrays.

        Args:
            features: Feature matrix of shape (n_samples, n_features).
            labels: Label vector of shape (n_samples,).
        """
        self.features = torch.tensor(features, dtype=torch.float32)
        self.labels = torch.tensor(labels, dtype=torch.float32)

    def __len__(self) -> int:
        """Return number of samples."""
        return len(self.features)

    def __getitem__(self, index: int) -> tuple[torch.Tensor, torch.Tensor]:
        """Return (features, label) for the given index.

        Args:
            index: Sample index.

        Returns:
            Tuple of (feature_vector, label).
        """
        return self.features[index], self.labels[index]


# Test with California Housing
housing_dataset = NumpyDataset(housing_X, housing_y)
print(f'Dataset length: {len(housing_dataset)}')
sample_x, sample_y = housing_dataset[0]
assert sample_x.shape == (8,), f'Expected (8,), got {sample_x.shape}'
assert sample_y.shape == (), f'Expected scalar, got {sample_y.shape}'
print(f'Sample features shape: {sample_x.shape}')
print(f'Sample label: {sample_y.item():.4f}')
print(f'Feature dtypes: {sample_x.dtype}, Label dtype: {sample_y.dtype}')
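
`NumpyDataset` converts its inputs with `torch.tensor`, which copies the data. The sketch below contrasts that with `torch.from_numpy`, which shares memory with the source array. The array `arr` is made up for illustration and is not part of the notebook's data.

```python
import numpy as np
import torch

arr = np.zeros(3, dtype=np.float32)  # illustrative array, not from the notebook

copied = torch.tensor(arr)      # allocates new memory and copies the values
shared = torch.from_numpy(arr)  # wraps the same buffer as the NumPy array

arr[0] = 7.0  # mutate the NumPy array in place

print(copied[0].item())  # the copy is unaffected
print(shared[0].item())  # the shared view sees the change
```

Copying, as `NumpyDataset` does, is usually the safer default for a Dataset: later edits to the source array cannot silently change the training data.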

### 1.2 Building a DataLoader from Scratch

The DataLoader's job is to:

1. Decide the order of samples (sequential or shuffled)
2. Group indices into batches
3. Fetch samples from the Dataset and collate them into tensors

Let's build a minimal version to understand the mechanics.

In [None]:
class SimpleDataLoader:
    """A minimal DataLoader implementation for understanding the mechanics.

    Supports batching, shuffling, and drop_last. Does not support
    multi-worker loading or pinned memory.

    Attributes:
        dataset: The source Dataset.
        batch_size: Number of samples per batch.
        shuffle: Whether to shuffle indices each epoch.
        drop_last: Whether to drop the last incomplete batch.
    """

    def __init__(
        self,
        dataset: Dataset,
        batch_size: int = 32,
        shuffle: bool = False,
        drop_last: bool = False,
    ) -> None:
        """Initialize the SimpleDataLoader.

        Args:
            dataset: Dataset to load from.
            batch_size: Samples per batch.
            shuffle: Randomize sample order each epoch.
            drop_last: Drop last batch if smaller than batch_size.
        """
        self.dataset = dataset
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.drop_last = drop_last

    def __len__(self) -> int:
        """Return number of batches."""
        n = len(self.dataset)
        if self.drop_last:
            return n // self.batch_size
        return (n + self.batch_size - 1) // self.batch_size

    def __iter__(self):
        """Yield batches of (features, labels) tensors."""
        n = len(self.dataset)
        indices = list(range(n))

        if self.shuffle:
            random.shuffle(indices)

        # Group into batches
        for start in range(0, n, self.batch_size):
            batch_indices = indices[start:start + self.batch_size]

            if self.drop_last and len(batch_indices) < self.batch_size:
                break

            # Fetch and collate
            batch_features = []
            batch_labels = []
            for idx in batch_indices:
                feat, lab = self.dataset[idx]
                batch_features.append(feat)
                batch_labels.append(lab)

            yield torch.stack(batch_features), torch.stack(batch_labels)


# Test our SimpleDataLoader
simple_loader = SimpleDataLoader(housing_dataset, batch_size=32, shuffle=True)
print(f'Number of batches: {len(simple_loader)}')

first_batch_x, first_batch_y = next(iter(simple_loader))
assert first_batch_x.shape == (32, 8), f'Expected (32, 8), got {first_batch_x.shape}'
assert first_batch_y.shape == (32,), f'Expected (32,), got {first_batch_y.shape}'
print(f'Batch features shape: {first_batch_x.shape}')
print(f'Batch labels shape: {first_batch_y.shape}')
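
The three steps that `SimpleDataLoader` performs map onto sampler classes in `torch.utils.data`. The following is a rough sketch of that decomposition on a ten-element toy dataset; the dataset `toy` and the seed `0` are illustrative choices, not values from the notebook.

```python
import torch
from torch.utils.data import (
    BatchSampler, DataLoader, RandomSampler, SequentialSampler, TensorDataset,
)

toy = TensorDataset(torch.arange(10, dtype=torch.float32))  # 10 toy samples

# Step 1: a sampler decides the order of indices.
sequential = SequentialSampler(toy)
shuffled = RandomSampler(toy, generator=torch.Generator().manual_seed(0))

# Step 2: a batch sampler groups those indices into batches.
batches_keep = list(BatchSampler(sequential, batch_size=4, drop_last=False))
batches_drop = list(BatchSampler(sequential, batch_size=4, drop_last=True))
print(batches_keep)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
print(batches_drop)  # [[0, 1, 2, 3], [4, 5, 6, 7]]

# Step 3: DataLoader fetches each index and collates the samples into tensors.
loader = DataLoader(
    toy, batch_sampler=BatchSampler(shuffled, batch_size=4, drop_last=False),
)
batch_shapes = [tuple(x.shape) for (x,) in loader]
print(batch_shapes)  # [(4,), (4,), (2,)]
```

With 10 samples and a batch size of 4, keeping the last batch gives ceil(10 / 4) = 3 batches and dropping it gives floor(10 / 4) = 2, which matches the two branches of `SimpleDataLoader.__len__`.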

### 1.3 Understanding Shuffling and Reproducibility

Shuffling is critical for training: it prevents the model from learning the order of samples. But we need **reproducible** shuffling for debugging. PyTorch's DataLoader accepts a `generator` argument (a `torch.Generator`) that controls the shuffle order independently of the global random state.

In [None]:
def demonstrate_shuffling_reproducibility() -> None:
    """Show how Generator objects make shuffled loading reproducible."""
    small_dataset = TensorDataset(
        torch.arange(10, dtype=torch.float32).unsqueeze(1),
        torch.arange(10, dtype=torch.float32),
    )

    # Without generator — different order each time
    loader_no_gen = DataLoader(small_dataset, batch_size=3, shuffle=True)
    run1 = [y.tolist() for _, y in loader_no_gen]
    run2 = [y.tolist() for _, y in loader_no_gen]
    print('Without Generator (may differ):')
    print(f'  Run 1 batches: {run1}')
    print(f'  Run 2 batches: {run2}')
    print(f'  Same order? {run1 == run2}')
    print()

    # With generator — same order when re-seeded
    gen = torch.Generator().manual_seed(SEED)
    loader_gen = DataLoader(small_dataset, batch_size=3, shuffle=True, generator=gen)
    run3 = [y.tolist() for _, y in loader_gen]

    gen = torch.Generator().manual_seed(SEED)  # Re-seed
    loader_gen2 = DataLoader(small_dataset, batch_size=3, shuffle=True, generator=gen)
    run4 = [y.tolist() for _, y in loader_gen2]

    print('With Generator (same seed → same order):')
    print(f'  Run 3 batches: {run3}')
    print(f'  Run 4 batches: {run4}')
    print(f'  Same order? {run3 == run4}')


demonstrate_shuffling_reproducibility()
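
A seeded generator does not freeze the order across epochs. The generator's state advances each time the loader is iterated, so consecutive epochs are still shuffled differently, while the full sequence of epochs repeats when the run is restarted with the same seed. A small sketch of this, using a hypothetical helper `two_epochs` and a toy dataset:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def two_epochs(seed: int) -> list[list[float]]:
    """Return the sample order of two consecutive epochs from one seeded loader."""
    data = TensorDataset(torch.arange(20, dtype=torch.float32))
    gen = torch.Generator().manual_seed(seed)
    # batch_size equals the dataset size, so each epoch is a single batch.
    loader = DataLoader(data, batch_size=20, shuffle=True, generator=gen)
    return [batch[0].tolist() for _ in range(2) for batch in loader]


run_a = two_epochs(seed=0)
run_b = two_epochs(seed=0)

print(run_a[0] == run_a[1])  # epochs within one run: expected to differ
print(run_a == run_b)        # whole runs with the same seed: identical
```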

### 1.4 Transforms: Composable Data Preprocessing

Transforms are functions applied to each sample when it's loaded. PyTorch uses `torchvision.transforms` for images, but the pattern is general.

Key transforms for images:

- `ToTensor()` — convert PIL Image/NumPy to `(C, H, W)` float tensor, scale to `[0, 1]`
- `Normalize(mean, std)` — channel-wise normalization
- `Compose([...])` — chain multiple transforms
- `RandomHorizontalFlip()`, `RandomCrop()` — data augmentation (training only)

In [None]:
def demonstrate_transforms() -> None:
    """Show how torchvision transforms work step by step."""
    # Load raw dataset (no transform) to show the conversion
    raw_dataset = torchvision.datasets.FashionMNIST(
        root=DATA_DIR, train=True, download=True,
        transform=None,  # Returns PIL Image
    )

    raw_img, raw_label = raw_dataset[0]
    print(f'Raw image type: {type(raw_img)}')
    print(f'Raw image size: {raw_img.size}')
    print(f'Raw label: {raw_label} ({CLASS_NAMES[raw_label]})')
    print()

    # Step 1: ToTensor — PIL → Tensor, scale 0-255 → 0.0-1.0
    to_tensor = T.ToTensor()
    tensor_img = to_tensor(raw_img)
    print(f'After ToTensor:')
    print(f'  Shape: {tensor_img.shape}  (C, H, W format)')
    print(f'  Range: [{tensor_img.min():.3f}, {tensor_img.max():.3f}]')
    print(f'  Dtype: {tensor_img.dtype}')
    print()

    # Step 2: Normalize — standardize channels
    normalize = T.Normalize(mean=[0.2860], std=[0.3530])  # FashionMNIST stats
    norm_img = normalize(tensor_img)
    print(f'After Normalize:')
    print(f'  Range: [{norm_img.min():.3f}, {norm_img.max():.3f}]')
    print(f'  Mean: {norm_img.mean():.3f}')
    print()

    # Step 3: Compose — chain transforms
    train_transform = T.Compose([
        T.RandomHorizontalFlip(p=0.5),
        T.RandomRotation(degrees=10),
        T.ToTensor(),
        T.Normalize(mean=[0.2860], std=[0.3530]),
    ])
    print('Composed transform pipeline:')
    for i, t in enumerate(train_transform.transforms):
        print(f'  {i+1}. {t}')

    # Visualize augmentation effects
    fig, axes = plt.subplots(2, 5, figsize=(12, 5))
    fig.suptitle('Data Augmentation: Same Image, 5 Random Augmentations', fontsize=12)

    # Top row: the original image; bottom row: a fresh random augmentation per column
    for idx in range(5):
        axes[0, idx].imshow(np.array(raw_img), cmap='gray')
        axes[0, idx].set_title('Original', fontsize=8)
        axes[0, idx].axis('off')

        aug_img = train_transform(raw_img)
        axes[1, idx].imshow(aug_img.squeeze(), cmap='gray')
        axes[1, idx].set_title(f'Augmented #{idx+1}', fontsize=8)
        axes[1, idx].axis('off')

    plt.tight_layout()
    plt.show()


demonstrate_transforms()
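
Channel-wise normalization computes `(x - mean) / std` per channel, with the statistics taken from the training set. The sketch below shows how such statistics could be computed and applied using plain tensor operations. It uses a synthetic stand-in for an image stack (an assumption for illustration), so the numbers will not match the FashionMNIST constants used above.

```python
import torch

torch.manual_seed(0)
# Synthetic stand-in for a stack of grayscale images: (N, C, H, W) in [0, 1].
images = torch.rand(256, 1, 28, 28)

# Per-channel statistics: reduce over every dimension except the channel axis.
mean = images.mean(dim=(0, 2, 3))
std = images.std(dim=(0, 2, 3))

# Apply (x - mean) / std channel-wise by broadcasting over shape (1, C, 1, 1).
normalized = (images - mean.view(1, -1, 1, 1)) / std.view(1, -1, 1, 1)

print(f'before: mean={images.mean():.3f}, std={images.std():.3f}')
print(f'after:  mean={normalized.mean():.3f}, std={normalized.std():.3f}')
```

After normalization the data has approximately zero mean and unit standard deviation over the set the statistics were computed on.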

### 1.5 Writing Custom Transforms

You can write your own transforms as callable classes. This is useful for domain-specific preprocessing that torchvision doesn't cover.

In [None]:
class AddGaussianNoise:
    """Add Gaussian noise to a tensor — a simple data augmentation.

    Attributes:
        mean: Mean of the Gaussian noise.
        std: Standard deviation of the Gaussian noise.
    """

    def __init__(self, mean: float = 0.0, std: float = 0.1) -> None:
        """Initialize noise parameters.

        Args:
            mean: Mean of the noise distribution.
            std: Standard deviation of the noise distribution.
        """
        self.mean = mean
        self.std = std

    def __call__(self, tensor: torch.Tensor) -> torch.Tensor:
        """Add noise to input tensor.

        Args:
            tensor: Input image tensor.

        Returns:
            Tensor with added Gaussian noise.
        """
        noise = torch.randn_like(tensor) * self.std + self.mean
        return tensor + noise


class MinMaxScale:
    """Scale tensor values to [0, 1] range."""

    def __call__(self, tensor: torch.Tensor) -> torch.Tensor:
        """Scale tensor to [0, 1].

        Args:
            tensor: Input tensor.

        Returns:
            Scaled tensor with values in [0, 1].
        """
        t_min = tensor.min()
        t_max = tensor.max()
        if t_max - t_min > 0:
            return (tensor - t_min) / (t_max - t_min)
        return tensor


# Test custom transforms
noisy_transform = T.Compose([
    T.ToTensor(),
    AddGaussianNoise(mean=0.0, std=0.2),
    MinMaxScale(),
])

raw_dataset = torchvision.datasets.FashionMNIST(
    root=DATA_DIR, train=True, download=True, transform=None,
)
raw_img, label = raw_dataset[0]

fig, axes = plt.subplots(1, 4, figsize=(12, 3))
axes[0].imshow(np.array(raw_img), cmap='gray')
axes[0].set_title('Original')
axes[0].axis('off')

for idx in range(1, 4):
    noisy_img = noisy_transform(raw_img)
    axes[idx].imshow(noisy_img.squeeze(), cmap='gray')
    axes[idx].set_title(f'Noisy #{idx}')
    axes[idx].axis('off')

plt.suptitle('Custom Transform: Gaussian Noise + MinMax Scaling', fontsize=12)
plt.tight_layout()
plt.show()
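
Custom transforms slot into a pipeline because a transform is nothing more than a callable. As a rough illustration, a chaining helper can be sketched in a few lines. `SimpleCompose` and `Clamp01` are hypothetical names for this sketch and are not torchvision classes.

```python
from typing import Callable

import torch


class SimpleCompose:
    """Hypothetical stand-in for a compose helper: apply callables in order."""

    def __init__(self, transforms: list[Callable[[torch.Tensor], torch.Tensor]]) -> None:
        self.transforms = transforms

    def __call__(self, x: torch.Tensor) -> torch.Tensor:
        for transform in self.transforms:
            x = transform(x)
        return x


class Clamp01:
    """Illustrative custom transform: clip values into [0, 1]."""

    def __call__(self, x: torch.Tensor) -> torch.Tensor:
        return x.clamp(0.0, 1.0)


pipeline_sketch = SimpleCompose([
    lambda x: x * 2.0,  # plain functions are valid transforms too
    Clamp01(),
])
out = pipeline_sketch(torch.tensor([-0.5, 0.25, 0.9]))
print(out)  # tensor([0.0000, 0.5000, 1.0000])
```

Order matters: here the doubling runs before the clamp, just as `AddGaussianNoise` runs before `MinMaxScale` in the cell above.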

### 1.6 Dataset for Tabular Data

Image datasets are common, but many ML tasks use tabular data. Let's build a Dataset that handles feature scaling and conversion from Pandas/NumPy.

In [None]:
class TabularDataset(Dataset):
    """Dataset for tabular data with optional feature scaling.

    Converts Pandas/NumPy data to tensors and applies optional StandardScaler.

    Attributes:
        features: Scaled feature tensor.
        labels: Label tensor.
        scaler: Fitted StandardScaler instance (or None).
    """

    def __init__(
        self,
        features: np.ndarray,
        labels: np.ndarray,
        scaler: StandardScaler | None = None,
        fit_scaler: bool = False,
    ) -> None:
        """Initialize tabular dataset.

        Args:
            features: Feature matrix of shape (n_samples, n_features).
            labels: Label array of shape (n_samples,).
            scaler: Pre-fitted scaler. If None and fit_scaler is True, fits a new one.
            fit_scaler: Whether to fit a new StandardScaler.
        """
        if fit_scaler and scaler is None:
            self.scaler = StandardScaler()
            features = self.scaler.fit_transform(features)
        elif scaler is not None:
            self.scaler = scaler
            features = self.scaler.transform(features)
        else:
            self.scaler = None

        self.features = torch.tensor(features, dtype=torch.float32)
        self.labels = torch.tensor(labels, dtype=torch.float32)

    def __len__(self) -> int:
        """Return number of samples."""
        return len(self.features)

    def __getitem__(self, index: int) -> tuple[torch.Tensor, torch.Tensor]:
        """Return (features, label) at index.

        Args:
            index: Sample index.

        Returns:
            Tuple of (feature_vector, label).
        """
        return self.features[index], self.labels[index]


# Split first, then create datasets (scaler fitted on train only)
X_train, X_test, y_train, y_test = train_test_split(
    housing_X, housing_y, test_size=0.2, random_state=SEED,
)

train_tab_ds = TabularDataset(X_train, y_train, fit_scaler=True)
test_tab_ds = TabularDataset(X_test, y_test, scaler=train_tab_ds.scaler)

print(f'Train dataset: {len(train_tab_ds)} samples')
print(f'Test dataset:  {len(test_tab_ds)} samples')
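sample_x, sample_y = train_tab_ds[0]
# Note: these statistics are taken across the 8 features of one sample,
# not per feature across the dataset, so they need not be close to 0 and 1.
print(f'First sample (scaled): mean={sample_x.mean():.3f}, std={sample_x.std():.3f}')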
print(f'Label: {sample_y.item():.4f}')
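
Fitting the scaler on the training split only means the training features end up with zero mean and unit variance, while the test features are shifted and scaled by the same training statistics and so land only near those values. A minimal sketch with synthetic data and plain tensor operations (the feature scales and the 800/200 split are arbitrary assumptions):

```python
import torch

torch.manual_seed(0)
# Synthetic tabular data: 1000 rows, 3 features with different scales.
scales = torch.tensor([1.0, 10.0, 100.0])
shifts = torch.tensor([0.0, 5.0, -50.0])
X = torch.randn(1000, 3) * scales + shifts
X_tr, X_te = X[:800], X[800:]

# Fit: statistics come from the training rows only.
mu = X_tr.mean(dim=0)
sigma = X_tr.std(dim=0, unbiased=False)  # population std, as StandardScaler uses

# Transform: the same mu and sigma are applied to both splits.
X_tr_scaled = (X_tr - mu) / sigma
X_te_scaled = (X_te - mu) / sigma

print('train mean per feature:', X_tr_scaled.mean(dim=0))
print('test mean per feature: ', X_te_scaled.mean(dim=0))
```

Computing `mu` and `sigma` on the full data instead would leak information about the test rows into the preprocessing step.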

---

## Part 2 — Putting It All Together: DataPipeline Class

Now we combine Dataset, transforms, and splitting into a reusable `DataPipeline` class that produces ready-to-use DataLoaders.

In [None]:
class DataPipeline:
    """Complete data pipeline: split → transform → DataLoader.

    Handles the full workflow from raw data to training-ready DataLoaders
    with proper splitting and reproducibility.

    Attributes:
        train_loader: DataLoader for training set.
        val_loader: DataLoader for validation set.
        test_loader: DataLoader for test set.
        train_set: Training Dataset.
        val_set: Validation Dataset.
        test_set: Test Dataset.
    """

    def __init__(
        self,
        dataset: Dataset,
        batch_size: int = 64,
        val_ratio: float = 0.1,
        test_ratio: float = 0.1,
        seed: int = SEED,
        num_workers: int = 0,
        pin_memory: bool = False,
    ) -> None:
        """Create train/val/test splits and DataLoaders.

        Args:
            dataset: Full dataset to split.
            batch_size: Samples per batch.
            val_ratio: Fraction for validation.
            test_ratio: Fraction for test.
            seed: Random seed for reproducible splits.
            num_workers: Number of data loading workers.
            pin_memory: Whether to pin memory for GPU transfer.
        """
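        # Integer sizes avoid float round-off when fractions do not sum to exactly 1.0.
        n_total = len(dataset)
        n_val = int(round(val_ratio * n_total))
        n_test = int(round(test_ratio * n_total))
        n_train = n_total - n_val - n_test  # remainder goes to the training split
        generator = torch.Generator().manual_seed(seed)

        self.train_set, self.val_set, self.test_set = random_split(
            dataset, [n_train, n_val, n_test], generator=generator,
        )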

        self.train_loader = DataLoader(
            self.train_set, batch_size=batch_size, shuffle=True,
            num_workers=num_workers, pin_memory=pin_memory,
        )
        self.val_loader = DataLoader(
            self.val_set, batch_size=batch_size, shuffle=False,
            num_workers=num_workers, pin_memory=pin_memory,
        )
        self.test_loader = DataLoader(
            self.test_set, batch_size=batch_size, shuffle=False,
            num_workers=num_workers, pin_memory=pin_memory,
        )

    def summary(self) -> pd.DataFrame:
        """Return a summary DataFrame of split sizes.

        Returns:
            DataFrame with split names, sizes, ratios, and batch counts.
        """
        total = len(self.train_set) + len(self.val_set) + len(self.test_set)
        rows = []
        for name, ds, loader in [
            ('Train', self.train_set, self.train_loader),
            ('Val', self.val_set, self.val_loader),
            ('Test', self.test_set, self.test_loader),
        ]:
            rows.append({
                'Split': name,
                'Samples': len(ds),
                'Ratio': f'{len(ds)/total:.1%}',
                'Batches': len(loader),
            })
        return pd.DataFrame(rows)


# Sanity check with FashionMNIST
pipeline = DataPipeline(
    fashion_train, batch_size=BATCH_SIZE,
    val_ratio=0.1, test_ratio=0.1,
    num_workers=NUM_WORKERS, pin_memory=PIN_MEMORY,
)

print('=== DataPipeline Summary ===')
print(pipeline.summary().to_string(index=False))
print()

# Verify batch shapes
batch_x, batch_y = next(iter(pipeline.train_loader))
assert batch_x.shape[0] == BATCH_SIZE, f'Expected batch size {BATCH_SIZE}, got {batch_x.shape[0]}'
assert batch_x.shape[1:] == (1, 28, 28), f'Expected image shape (1,28,28), got {batch_x.shape[1:]}'
print(f'Batch features shape: {batch_x.shape}')
print(f'Batch labels shape: {batch_y.shape}')
print(f'Batch features range: [{batch_x.min():.3f}, {batch_x.max():.3f}]')

Let's verify that splits are reproducible across different pipeline instances.

In [None]:
def verify_split_reproducibility() -> None:
    """Verify that DataPipeline produces identical splits with the same seed."""
    small_data = TensorDataset(
        torch.randn(100, 5),
        torch.randint(0, 3, (100,)),
    )

    pipe1 = DataPipeline(small_data, batch_size=16, seed=SEED)
    pipe2 = DataPipeline(small_data, batch_size=16, seed=SEED)
    pipe3 = DataPipeline(small_data, batch_size=16, seed=42)  # Different seed

    # Compare train indices
    indices1 = pipe1.train_set.indices
    indices2 = pipe2.train_set.indices
    indices3 = pipe3.train_set.indices

    print(f'Same seed (SEED vs SEED): indices match = {indices1 == indices2}')
    print(f'Diff seed (SEED vs 42):   indices match = {indices1 == indices3}')
    print(f'Train sizes: {len(indices1)}, {len(indices2)}, {len(indices3)}')


verify_split_reproducibility()
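
Beyond reproducibility, a split should also be a true partition: no index appears in two splits and none is lost. The sketch below checks this for a dataset whose size does not divide evenly. The size 103 and the seed are arbitrary choices for illustration.

```python
import torch
from torch.utils.data import TensorDataset, random_split

data = TensorDataset(torch.arange(103))  # 103 does not divide evenly by 10

n_total = len(data)
n_val = int(0.1 * n_total)           # 10
n_test = int(0.1 * n_total)          # 10
n_train = n_total - n_val - n_test   # remainder goes to train: 83

parts = random_split(
    data, [n_train, n_val, n_test],
    generator=torch.Generator().manual_seed(0),
)
index_sets = [set(p.indices) for p in parts]

# Pairwise disjoint and jointly exhaustive.
disjoint = all(
    index_sets[i].isdisjoint(index_sets[j])
    for i in range(3) for j in range(i + 1, 3)
)
covers_all = set().union(*index_sets) == set(range(n_total))
print([len(p) for p in parts], disjoint, covers_all)
```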

---

## Part 3 — Application: Real-World Data Loading Patterns

Now we apply these concepts to realistic scenarios that appear throughout the course.

### 3.1 TensorDataset: The Quick Path

When your data is already in tensors or NumPy arrays, `TensorDataset` is a one-liner alternative to writing a custom Dataset class.

In [None]:
def demonstrate_tensor_dataset() -> None:
    """Show TensorDataset as a quick alternative to custom Datasets."""
    # Convert housing data to tensors
    X_tensor = torch.tensor(housing_X, dtype=torch.float32)
    y_tensor = torch.tensor(housing_y, dtype=torch.float32)

    # One-liner Dataset
    tensor_ds = TensorDataset(X_tensor, y_tensor)
    print(f'TensorDataset length: {len(tensor_ds)}')

    sample = tensor_ds[0]
    print(f'Sample: {len(sample)} elements')
    print(f'  Features: shape={sample[0].shape}, dtype={sample[0].dtype}')
    print(f'  Label: {sample[1].item():.4f}')
    print()

    # Use with DataLoader directly
    loader = DataLoader(tensor_ds, batch_size=32, shuffle=True)
    batch_x, batch_y = next(iter(loader))
    print(f'Batch from TensorDataset:')
    print(f'  Features: {batch_x.shape}')
    print(f'  Labels: {batch_y.shape}')
    print()

    # Compare: NumpyDataset vs TensorDataset (identical behavior)
    manual_ds = NumpyDataset(housing_X, housing_y)
    manual_sample = manual_ds[0]
    tensor_sample = tensor_ds[0]
    print('NumpyDataset vs TensorDataset:')
    print(f'  Features match: {torch.allclose(manual_sample[0], tensor_sample[0])}')
    print(f'  Labels match: {torch.allclose(manual_sample[1], tensor_sample[1])}')


demonstrate_tensor_dataset()
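
`TensorDataset` indexes every tensor along its first dimension, so all tensors must have the same number of rows. A short sketch of what happens when they do not (the tensors are placeholders, and the check relies on Python assertions being enabled, which is the default):

```python
import torch
from torch.utils.data import TensorDataset

features = torch.zeros(5, 3)
labels_ok = torch.zeros(5)
labels_bad = torch.zeros(4)  # one label short

ds = TensorDataset(features, labels_ok)
print(len(ds))  # 5

try:
    TensorDataset(features, labels_bad)
    mismatch_detected = False
except AssertionError as err:
    mismatch_detected = True
    print(f'Rejected: {err}')
```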

### 3.2 Working with Official Train/Test Splits

Many datasets (CIFAR-10, MNIST, etc.) come with predefined train/test splits. The standard approach is to use the official splits and only split the training set into train/val.

In [None]:
def demonstrate_official_splits() -> None:
    """Show the standard pattern for datasets with official train/test splits."""
    # FashionMNIST has official train (60K) and test (10K) splits
    full_train = torchvision.datasets.FashionMNIST(
        root=DATA_DIR, train=True, download=True, transform=T.ToTensor(),
    )
    test_set = torchvision.datasets.FashionMNIST(
        root=DATA_DIR, train=False, download=True, transform=T.ToTensor(),
    )

    # Split official training set into train/val (90/10)
    generator = torch.Generator().manual_seed(SEED)
    train_size = int(0.9 * len(full_train))
    val_size = len(full_train) - train_size
    train_set, val_set = random_split(
        full_train, [train_size, val_size], generator=generator,
    )

    # Create DataLoaders
    train_loader = DataLoader(
        train_set, batch_size=BATCH_SIZE, shuffle=True,
        num_workers=NUM_WORKERS, pin_memory=PIN_MEMORY,
    )
    val_loader = DataLoader(
        val_set, batch_size=BATCH_SIZE, shuffle=False,
        num_workers=NUM_WORKERS, pin_memory=PIN_MEMORY,
    )
    test_loader = DataLoader(
        test_set, batch_size=BATCH_SIZE, shuffle=False,
        num_workers=NUM_WORKERS, pin_memory=PIN_MEMORY,
    )

    summary = pd.DataFrame({
        'Split': ['Train', 'Val', 'Test'],
        'Source': ['Official train (90%)', 'Official train (10%)', 'Official test'],
        'Samples': [len(train_set), len(val_set), len(test_set)],
        'Batches': [len(train_loader), len(val_loader), len(test_loader)],
    })
    print('=== Official Splits Pattern ===')
    print(summary.to_string(index=False))
    print()

    # Verify no overlap
    train_indices = set(train_set.indices)
    val_indices = set(val_set.indices)
    overlap = train_indices & val_indices
    print(f'Train/Val overlap: {len(overlap)} samples (should be 0)')


demonstrate_official_splits()
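
One consequence of this pattern: the subsets returned by `random_split` wrap the same parent dataset, so train and val share whatever transform the parent was built with. That is fine here because both use `ToTensor()`, but it matters once training-only augmentation is added. One possible workaround, sketched with a hypothetical `TransformedSubset` wrapper and toy tensors, is to apply the transform per split:

```python
from typing import Callable, Optional

import torch
from torch.utils.data import Dataset, TensorDataset, random_split


class TransformedSubset(Dataset):
    """Hypothetical wrapper that applies its own transform to one split."""

    def __init__(self, subset: Dataset, transform: Optional[Callable] = None) -> None:
        self.subset = subset
        self.transform = transform

    def __len__(self) -> int:
        return len(self.subset)

    def __getitem__(self, index: int):
        x, y = self.subset[index]
        if self.transform is not None:
            x = self.transform(x)
        return x, y


base = TensorDataset(torch.ones(10, 4), torch.zeros(10))
train_part, val_part = random_split(
    base, [8, 2], generator=torch.Generator().manual_seed(0),
)

# Noise augmentation for training only; validation data stays untouched.
train_aug = TransformedSubset(train_part, transform=lambda x: x + 0.1 * torch.randn_like(x))
val_plain = TransformedSubset(val_part)

print(train_aug[0][0])  # noisy copy of an all-ones row
print(val_plain[0][0])  # unchanged all-ones row
```

For this to work with an image dataset, the parent would be created without the augmenting transform so that each wrapper controls its own preprocessing.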

### 3.3 Subset, Concat, and Dataset Composition

PyTorch provides utilities for manipulating datasets without copying data:

- **`Subset`** — select a subset of indices (useful for debugging with small data)
- **`ConcatDataset`** — combine multiple datasets end-to-end

In [None]:
def demonstrate_dataset_composition() -> None:
    """Show Subset and ConcatDataset for dataset manipulation."""
    full_dataset = torchvision.datasets.FashionMNIST(
        root=DATA_DIR, train=True, download=True, transform=T.ToTensor(),
    )

    # Subset — quick debugging with small data
    debug_indices = list(range(100))
    debug_dataset = Subset(full_dataset, debug_indices)
    print(f'Full dataset: {len(full_dataset)} → Debug subset: {len(debug_dataset)}')

    # Class-specific subset (read labels from .targets to avoid decoding every image)
    class_0_indices = torch.where(full_dataset.targets == 0)[0].tolist()
    class_0_dataset = Subset(full_dataset, class_0_indices[:500])
    print(f'Class 0 (T-shirt) subset: {len(class_0_dataset)} samples')
    print()

    # ConcatDataset — combine train and test for cross-validation
    test_dataset = torchvision.datasets.FashionMNIST(
        root=DATA_DIR, train=False, download=True, transform=T.ToTensor(),
    )
    combined = ConcatDataset([full_dataset, test_dataset])
    print(f'Train ({len(full_dataset)}) + Test ({len(test_dataset)}) = Combined ({len(combined)})')

    # Verify access still works
    img, label = combined[0]
    print(f'First sample: shape={img.shape}, label={label}')
    img_from_test, label_from_test = combined[len(full_dataset)]
    print(f'First test sample (via concat): shape={img_from_test.shape}, label={label_from_test}')


demonstrate_dataset_composition()
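
`ConcatDataset` avoids copying by keeping the running totals of its members' lengths in `cumulative_sizes` and translating each global index into a (dataset, local index) pair. The lookup can be sketched as follows; `locate` is a hypothetical helper written for this illustration, and the two toy datasets are placeholders.

```python
import bisect

import torch
from torch.utils.data import ConcatDataset, TensorDataset

first = TensorDataset(torch.arange(0, 3))     # values 0, 1, 2
second = TensorDataset(torch.arange(10, 15))  # values 10..14
joined = ConcatDataset([first, second])

print(joined.cumulative_sizes)  # [3, 8]


def locate(global_index: int) -> tuple[int, int]:
    """Map a global index to (dataset_number, local_index)."""
    ds_idx = bisect.bisect_right(joined.cumulative_sizes, global_index)
    offset = 0 if ds_idx == 0 else joined.cumulative_sizes[ds_idx - 1]
    return ds_idx, global_index - offset


print(locate(2), joined[2][0].item())  # (0, 2) 2
print(locate(3), joined[3][0].item())  # (1, 0) 10
```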

### 3.4 Custom Collate Functions

The default `collate_fn` stacks samples into uniform batches. But sometimes samples have different sizes (e.g., variable-length text). A custom collate function handles this by padding sequences to the maximum length in the batch.

In [None]:
class VariableLengthDataset(Dataset):
    """Dataset with variable-length sequences (simulating text data).

    Attributes:
        sequences: List of 1D tensors with different lengths.
        labels: Tensor of integer labels.
    """

    def __init__(self, num_samples: int = 200, max_length: int = 50) -> None:
        """Generate random variable-length sequences.

        Args:
            num_samples: Number of sequences to generate.
            max_length: Maximum sequence length.
        """
        rng = np.random.RandomState(SEED)
        lengths = rng.randint(5, max_length + 1, size=num_samples)
        self.sequences = [
            # Cast NumPy integers to plain int before using them as a tensor size
            torch.randint(0, 100, (int(length),)) for length in lengths
        ]
        self.labels = torch.randint(0, 3, (num_samples,))

    def __len__(self) -> int:
        """Return number of sequences."""
        return len(self.sequences)

    def __getitem__(self, index: int) -> tuple[torch.Tensor, torch.Tensor]:
        """Return (sequence, label) at index.

        Args:
            index: Sample index.

        Returns:
            Tuple of (sequence_tensor, label_tensor).
        """
        return self.sequences[index], self.labels[index]


def pad_collate_fn(
    batch: list[tuple[torch.Tensor, torch.Tensor]],
) -> tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
    """Collate variable-length sequences by padding to max length.

    Args:
        batch: List of (sequence, label) tuples.

    Returns:
        Tuple of (padded_sequences, labels, lengths).
    """
    sequences, labels = zip(*batch)
    lengths = torch.tensor([len(seq) for seq in sequences])

    # Pad all sequences to the longest in this batch
    max_len = lengths.max().item()
    padded = torch.zeros(len(sequences), max_len, dtype=sequences[0].dtype)
    for i, seq in enumerate(sequences):
        padded[i, :len(seq)] = seq

    return padded, torch.stack(list(labels)), lengths


# Test variable-length loading
var_dataset = VariableLengthDataset(num_samples=200, max_length=50)
print(f'Sample lengths: {[len(var_dataset[i][0]) for i in range(5)]}')

var_loader = DataLoader(
    var_dataset, batch_size=8, shuffle=False, collate_fn=pad_collate_fn,
)

padded_batch, label_batch, length_batch = next(iter(var_loader))
print(f'\nPadded batch shape: {padded_batch.shape}')
print(f'Labels shape: {label_batch.shape}')
print(f'Actual lengths: {length_batch.tolist()}')
print(f'Max length in batch: {length_batch.max().item()}')
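
The manual padding loop in `pad_collate_fn` can also be written with PyTorch's built-in `torch.nn.utils.rnn.pad_sequence`. The sketch below is one possible equivalent, not a replacement for the cell above; the name `pad_collate_builtin` and the tiny demo batch are made up for illustration.

```python
import torch
from torch.nn.utils.rnn import pad_sequence


def pad_collate_builtin(batch):
    """Pad variable-length sequences with pad_sequence (hypothetical helper)."""
    sequences, labels = zip(*batch)
    lengths = torch.tensor([len(seq) for seq in sequences])
    # batch_first=True gives shape (batch, max_len), matching pad_collate_fn
    padded = pad_sequence(list(sequences), batch_first=True, padding_value=0)
    return padded, torch.stack(list(labels)), lengths


# Illustrative batch of three sequences with lengths 3, 2, and 1
demo_batch = [
    (torch.tensor([1, 2, 3]), torch.tensor(0)),
    (torch.tensor([4, 5]), torch.tensor(1)),
    (torch.tensor([6]), torch.tensor(2)),
]
demo_padded, demo_labels, demo_lengths = pad_collate_builtin(demo_batch)
print(demo_padded)
print(demo_lengths.tolist())
```

Returning the true lengths alongside the padded tensor lets downstream code mask out the padding positions.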

### 3.5 Library Comparison: Our Implementation vs PyTorch DataLoader

Let's verify that our `SimpleDataLoader` produces the same results as
PyTorch's official `DataLoader` and compare their performance.

In [None]:
def compare_dataloaders() -> None:
    """Compare SimpleDataLoader vs PyTorch DataLoader."""
    test_data = TensorDataset(
        torch.randn(1000, 8),
        torch.randint(0, 5, (1000,), dtype=torch.float32),
    )

    # Our implementation
    ours = SimpleDataLoader(test_data, batch_size=64, shuffle=False)

    # PyTorch's implementation
    official = DataLoader(test_data, batch_size=64, shuffle=False)

    # Compare outputs (no shuffling for deterministic comparison)
    all_match = True
    for (our_x, our_y), (off_x, off_y) in zip(ours, official):
        if not (torch.allclose(our_x, off_x) and torch.allclose(our_y, off_y)):
            all_match = False
            break

    print(f'Output match (no shuffle): {all_match}')
    print(f'Number of batches — ours: {len(ours)}, official: {len(official)}')
    print()

    # Performance comparison
    n_iters = 50

    start = time.perf_counter()
    for _ in range(n_iters):
        for batch in ours:
            _ = batch  # Consume batch (timing only)
    our_time = time.perf_counter() - start

    start = time.perf_counter()
    for _ in range(n_iters):
        for batch in official:
            _ = batch  # Consume batch (timing only)
    official_time = time.perf_counter() - start

    results = pd.DataFrame({
        'Implementation': ['SimpleDataLoader (ours)', 'PyTorch DataLoader'],
        f'Time ({n_iters} epochs)': [f'{our_time:.3f}s', f'{official_time:.3f}s'],
        'Per Epoch': [f'{our_time/n_iters*1000:.1f}ms', f'{official_time/n_iters*1000:.1f}ms'],
    })
    print('=== Performance Comparison ===')
    print(results.to_string(index=False))
    print()
    print('PyTorch DataLoader adds worker processes, pinned memory, and prefetching.')
    print('Always use it in practice — our SimpleDataLoader is for learning only.')


compare_dataloaders()

---

## Part 4 — Evaluation & Analysis

Let's analyze the DataLoader's behavior in detail: batch size effects,
`drop_last` behavior, and performance characteristics.

### 4.1 Batch Size Analysis

Batch size affects both training dynamics and performance. Let's measure
the loading overhead for different batch sizes.

In [None]:
def analyze_batch_sizes() -> None:
    """Analyze how batch size affects loading performance and memory."""
    dataset = torchvision.datasets.FashionMNIST(
        root=DATA_DIR, train=True, download=True, transform=T.ToTensor(),
    )

    batch_sizes = [16, 32, 64, 128, 256, 512]
    results = []

    for bs in batch_sizes:
        loader = DataLoader(dataset, batch_size=bs, shuffle=True, num_workers=0)
        num_batches = len(loader)

        # Measure one epoch loading time
        start = time.perf_counter()
        for batch_x, batch_y in loader:
            _ = batch_x  # Consume batch (timing only)
        elapsed = time.perf_counter() - start

        # Last batch size
        last_batch_size = len(dataset) % bs if len(dataset) % bs != 0 else bs

        results.append({
            'Batch Size': bs,
            'Num Batches': num_batches,
            'Last Batch': last_batch_size,
            'Load Time': f'{elapsed:.3f}s',
            'Per Batch': f'{elapsed/num_batches*1000:.2f}ms',
        })

    results_df = pd.DataFrame(results)
    print('=== Batch Size Analysis (FashionMNIST, 60K samples) ===')
    print(results_df.to_string(index=False))
    print()

    # Visualize
    fig, axes = plt.subplots(1, 2, figsize=(12, 4))

    load_times = [float(r['Load Time'].rstrip('s')) for r in results]
    axes[0].plot(batch_sizes, load_times, 'o-', color='#1E88E5', linewidth=2)
    axes[0].set_xlabel('Batch Size')
    axes[0].set_ylabel('Load Time (s)')
    axes[0].set_title('Epoch Loading Time vs Batch Size')
    axes[0].grid(True, alpha=0.3)
    axes[0].set_xscale('log', base=2)

    num_batches_list = [r['Num Batches'] for r in results]
    axes[1].bar(range(len(batch_sizes)), num_batches_list, color='#43A047', alpha=0.7)
    axes[1].set_xticks(range(len(batch_sizes)))
    axes[1].set_xticklabels(batch_sizes)
    axes[1].set_xlabel('Batch Size')
    axes[1].set_ylabel('Number of Batches')
    axes[1].set_title('Batches per Epoch vs Batch Size')
    for i, nb in enumerate(num_batches_list):
        axes[1].text(i, nb + 10, str(nb), ha='center', fontsize=8)

    plt.tight_layout()
    plt.show()


analyze_batch_sizes()
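
The "Num Batches" and "Last Batch" columns follow directly from integer arithmetic, so they can be checked without loading any data. The sketch below assumes `drop_last=False` and a non-empty dataset; `batch_layout` is a hypothetical helper name.

```python
import math


def batch_layout(n_samples: int, batch_size: int) -> tuple[int, int]:
    """Return (num_batches, last_batch_size), assuming drop_last=False."""
    num_batches = math.ceil(n_samples / batch_size)
    remainder = n_samples % batch_size
    # A remainder of 0 means the final batch is a full one
    last_batch = remainder if remainder else batch_size
    return num_batches, last_batch


# FashionMNIST training split: 60,000 samples
for bs in [16, 32, 64, 128, 256, 512]:
    print(bs, batch_layout(60_000, bs))
```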

### 4.2 drop_last Behavior

The `drop_last` parameter controls whether the last incomplete batch is
included or discarded. This matters for:

- **BatchNorm layers**: need consistent batch sizes to compute statistics
- **Distributed training**: all processes need the same number of batches

In [None]:
def analyze_drop_last() -> None:
    """Show the effect of drop_last on batch counts and data usage."""
    dataset_size = 100
    test_ds = TensorDataset(
        torch.randn(dataset_size, 5),
        torch.randint(0, 3, (dataset_size,)),
    )

    batch_sizes = [16, 32, 64]
    results = []

    for bs in batch_sizes:
        loader_keep = DataLoader(test_ds, batch_size=bs, drop_last=False)
        loader_drop = DataLoader(test_ds, batch_size=bs, drop_last=True)

        keep_sizes = [len(batch[0]) for batch in loader_keep]
        drop_sizes = [len(batch[0]) for batch in loader_drop]

        results.append({
            'Batch Size': bs,
            'drop_last=False': f'{len(keep_sizes)} batches, last={keep_sizes[-1]}',
            'Samples Used (keep)': sum(keep_sizes),
            'drop_last=True': f'{len(drop_sizes)} batches, all={bs}',
            'Samples Used (drop)': sum(drop_sizes),
            'Samples Lost': sum(keep_sizes) - sum(drop_sizes),
        })

    results_df = pd.DataFrame(results)
    print(f'=== drop_last Analysis (dataset_size={dataset_size}) ===')
    print(results_df.to_string(index=False))
    print()
    print('Rule of thumb:')
    print('  - Training: drop_last=True if using BatchNorm or distributed training')
    print('  - Validation/Test: always drop_last=False (evaluate every sample)')


analyze_drop_last()
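
To make the BatchNorm point concrete, the sketch below builds a deliberately awkward case: 65 samples with `batch_size=64`, so the final batch holds a single sample. In training mode `nn.BatchNorm1d` cannot compute batch statistics from one value per channel and raises a `ValueError`. The dataset size and the helper `run_epoch` are illustrative choices, not part of the course code.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(65, 5))  # 65 = 64 + 1 leftover sample
bn = nn.BatchNorm1d(5)
bn.train()  # batch statistics are only computed in training mode


def run_epoch(drop_last: bool):
    """Push every batch through BatchNorm; return (sizes seen, error or None)."""
    loader = DataLoader(dataset, batch_size=64, drop_last=drop_last)
    sizes = []
    try:
        for (batch_x,) in loader:
            bn(batch_x)
            sizes.append(batch_x.shape[0])
    except ValueError as err:
        return sizes, str(err)
    return sizes, None


sizes_keep, error_keep = run_epoch(drop_last=False)
sizes_drop, error_drop = run_epoch(drop_last=True)
print('drop_last=False:', sizes_keep, '| error:', error_keep)
print('drop_last=True: ', sizes_drop, '| error:', error_drop)
```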

### 4.3 PyTorch Dataset Ecosystem Reference

PyTorch provides hundreds of built-in datasets across three libraries.
Here's a reference of the ones used in this course.

In [None]:
def build_dataset_reference() -> None:
    """Create a comprehensive reference of PyTorch dataset ecosystem."""
    reference = pd.DataFrame({
        'Library': [
            'torchvision', 'torchvision', 'torchvision', 'torchvision',
            'torchvision', 'torchvision', 'torchvision',
            'torchtext', 'torchtext', 'torchtext',
            'torchaudio', 'sklearn',
        ],
        'Dataset': [
            'MNIST', 'FashionMNIST', 'CIFAR-10', 'CIFAR-100',
            'STL10', 'CelebA', 'VOCDetection',
            'AG_NEWS', 'WikiText-2', 'SST-2',
            'SPEECHCOMMANDS', 'California Housing',
        ],
        'Task': [
            'Digit classification', 'Clothing classification',
            'Image classification', 'Fine-grained classification',
            'Self-supervised learning', 'Face attributes',
            'Object detection',
            'News classification', 'Language modeling',
            'Sentiment analysis',
            'Audio classification', 'Housing price regression',
        ],
        'Samples': [
            '70K', '70K', '60K', '60K',
            '13K', '202K', '~17K',
            '120K', '~2M tokens', '68K',
            '105K', '20.6K',
        ],
        'Used In': [
            'Mod 3,5,11', 'Mod 1,5,6', 'Mod 6,9,11', 'Mod 9',
            'Mod 11', 'Mod 11', 'Mod 9',
            'Mod 7,10', 'Mod 7,8,17', 'Mod 10,13',
            'Mod 12,19', 'Mod 1,2,4,19',
        ],
    })
    print('=== PyTorch Dataset Ecosystem ===')
    print(reference.to_string(index=False))
    print()

    # DataLoader parameter reference
    params = pd.DataFrame({
        'Parameter': [
            'batch_size', 'shuffle', 'num_workers', 'pin_memory',
            'drop_last', 'collate_fn', 'generator', 'sampler',
        ],
        'Default': [
            '1', 'False', '0', 'False',
            'False', 'default', 'None', 'None',
        ],
        'Course Standard': [
            '64', 'True (train only)', '0 (Colab)', 'cuda.is_available()',
            'False', 'default', 'Generator(SEED)', 'None',
        ],
        'Note': [
            'Power of 2 recommended', 'Never shuffle val/test',
            '0 for Colab, 4+ for local', 'Faster GPU transfer',
            'True for BatchNorm', 'Custom for var-length',
            'For reproducible shuffling', 'For custom sampling',
        ],
    })
    print('=== DataLoader Parameters ===')
    print(params.to_string(index=False))


build_dataset_reference()
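
The `generator` row in the parameter table is the one that makes shuffling reproducible. As a minimal sketch, assuming a seed value of 42 (the notebook's actual `SEED` is defined elsewhere) and a hypothetical helper `first_epoch_order`, two loaders built with equally seeded generators visit the samples in the same order:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

DEMO_SEED = 42  # assumed value for illustration only
data = TensorDataset(torch.arange(10))


def first_epoch_order(seed: int) -> list[list[int]]:
    """Return the batches of sample values seen in the first epoch."""
    generator = torch.Generator().manual_seed(seed)
    loader = DataLoader(
        data,
        batch_size=4,
        shuffle=True,
        generator=generator,
        num_workers=0,
        pin_memory=torch.cuda.is_available(),
    )
    return [batch.tolist() for (batch,) in loader]


order_a = first_epoch_order(DEMO_SEED)
order_b = first_epoch_order(DEMO_SEED)
print(order_a)
print('Same order with same seed:', order_a == order_b)
```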

### 4.4 Error Analysis: Common Data Loading Mistakes

Data loading bugs are subtle and can silently degrade training. Let's
demonstrate the most common mistakes and how to detect them.

In [None]:
def demonstrate_common_mistakes() -> None:
    """Show common data loading mistakes and their consequences."""
    print('=== Common Data Loading Mistakes ===')
    print()

    # Mistake 1: Fitting scaler on test data (data leakage)
    print('1. DATA LEAKAGE: Fitting scaler on all data (including test)')
    scaler_wrong = StandardScaler()
    scaler_wrong.fit(housing_X)                             # BAD: statistics include test rows
    # Transform X_test itself; slicing the last rows of housing_X would only
    # match the test set if the split had been unshuffled.
    X_test_leaky = scaler_wrong.transform(X_test)

    scaler_right = StandardScaler()
    X_train_scaled = scaler_right.fit_transform(X_train)   # GOOD: train only
    X_test_scaled = scaler_right.transform(X_test)          # Transform test with train stats

    print(f'  Wrong (fit on all):   test mean = {X_test_leaky.mean():.4f}')
    print(f'  Right (fit on train): test mean = {X_test_scaled.mean():.4f}')
    print(f'  The "right" way has non-zero test mean — this is correct!')
    print()

    # Mistake 2: Forgetting to shuffle training data
    print('2. FORGETTING TO SHUFFLE: Model sees same order every epoch')
    sorted_data = TensorDataset(
        torch.randn(100, 5),
        torch.tensor([0]*50 + [1]*50, dtype=torch.long),  # Sorted labels
    )
    loader_noshuffle = DataLoader(sorted_data, batch_size=20, shuffle=False)
    first_batch_labels = next(iter(loader_noshuffle))[1]
    print(f'  No shuffle — first batch labels: {first_batch_labels.tolist()}')
    print(f'  All zeros! Model sees pure class-0 batches then pure class-1.')

    loader_shuffle = DataLoader(sorted_data, batch_size=20, shuffle=True)
    first_batch_labels_s = next(iter(loader_shuffle))[1]
    print(f'  Shuffled — first batch labels:   {first_batch_labels_s.tolist()}')
    print(f'  Mixed! Each batch is now a random sample drawn from both classes.')
    print()

    # Mistake 3: Wrong normalization values
    print('3. WRONG NORMALIZATION: Using ImageNet stats for non-ImageNet data')
    imagenet_mean, imagenet_std = [0.485, 0.456, 0.406], [0.229, 0.224, 0.225]
    fashion_mean, fashion_std = [0.2860], [0.3530]
    print(f'  ImageNet stats (3-channel RGB): mean={imagenet_mean}, std={imagenet_std}')
    print(f'  FashionMNIST stats (1-channel): mean={fashion_mean}, std={fashion_std}')
    print(f'  Using ImageNet stats on FashionMNIST would shift the distribution!')
    print()

    # Mistake 4: Augmenting validation/test data
    print('4. AUGMENTING VAL/TEST: Apply augmentation only to training data')
    train_tf = T.Compose([
        T.RandomHorizontalFlip(),
        T.RandomRotation(10),
        T.ToTensor(),
        T.Normalize(mean=[0.2860], std=[0.3530]),
    ])
    val_tf = T.Compose([
        T.ToTensor(),
        T.Normalize(mean=[0.2860], std=[0.3530]),
    ])
    print(f'  Train transforms: {len(train_tf.transforms)} steps (with augmentation)')
    print(f'  Val transforms:   {len(val_tf.transforms)} steps (no augmentation)')

    # Summary
    print()
    mistakes_df = pd.DataFrame({
        'Mistake': [
            'Fit scaler on all data',
            'Forget to shuffle training',
            'Wrong normalization stats',
            'Augment validation data',
            'Shuffle validation/test',
        ],
        'Consequence': [
            'Data leakage → inflated metrics',
            'Ordered batches → poor gradients',
            'Shifted distribution → slow convergence',
            'Noisy evaluation → unreliable metrics',
            'Non-reproducible evaluation',
        ],
        'Fix': [
            'Fit on train, transform test',
            'shuffle=True for train loader',
            'Compute stats from your dataset',
            'Separate train/val transforms',
            'shuffle=False for val/test',
        ],
    })
    print('=== Data Loading Mistakes Summary ===')
    print(mistakes_df.to_string(index=False))


demonstrate_common_mistakes()
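
One practical wrinkle with "separate train/val transforms": subsets produced by `random_split` share the parent dataset, so they also share its transform. A common workaround is a thin wrapper that applies a transform per split. The sketch below is one way to do this; `TransformedSubset` is a hypothetical class, and the `+ 100.0` lambda is only a stand-in for real augmentation.

```python
import torch
from torch.utils.data import Dataset, TensorDataset, random_split


class TransformedSubset(Dataset):
    """Wrap a subset and apply a split-specific transform to its inputs."""

    def __init__(self, subset, transform=None):
        self.subset = subset
        self.transform = transform

    def __len__(self):
        return len(self.subset)

    def __getitem__(self, index):
        x, y = self.subset[index]
        if self.transform is not None:
            x = self.transform(x)
        return x, y


base = TensorDataset(
    torch.arange(10, dtype=torch.float32).unsqueeze(1),
    torch.arange(10),
)
generator = torch.Generator().manual_seed(0)
train_part, val_part = random_split(base, [8, 2], generator=generator)

# Training view gets the (stand-in) augmentation; validation view stays untouched
train_view = TransformedSubset(train_part, transform=lambda x: x + 100.0)
val_view = TransformedSubset(val_part)

print(len(train_view), len(val_view))
print(train_view[0][0], val_view[0][0])
```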

### 4.5 Computing Dataset Statistics

To use `Normalize` correctly, you need to compute the mean and standard
deviation of your training data. Here's the standard approach.

In [None]:
def compute_dataset_statistics(dataset: Dataset) -> tuple[list[float], list[float]]:
    """Compute channel-wise mean and std for an image dataset.

    Accumulates per-channel sums and sums of squares batch by batch, so the
    full dataset never has to be held in memory, then derives the variance
    as E[x^2] - E[x]^2.

    Args:
        dataset: Image dataset where __getitem__ returns (image_tensor, label).

    Returns:
        Tuple of (channel_means, channel_stds).
    """
    loader = DataLoader(dataset, batch_size=256, shuffle=False, num_workers=0)

    n_channels = dataset[0][0].shape[0]
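    # float64 accumulators limit rounding error when summing tens of millions of pixels
    pixel_sum = torch.zeros(n_channels, dtype=torch.float64)
    pixel_sq_sum = torch.zeros(n_channels, dtype=torch.float64)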
    n_pixels = 0

    for images, _ in loader:
        batch_size = images.shape[0]
        n_per_image = images.shape[2] * images.shape[3]

        # Sum over batch, height, width — keep channels
        pixel_sum += images.sum(dim=[0, 2, 3])
        pixel_sq_sum += (images ** 2).sum(dim=[0, 2, 3])
        n_pixels += batch_size * n_per_image

    mean = pixel_sum / n_pixels
    std = torch.sqrt(pixel_sq_sum / n_pixels - mean ** 2)

    return mean.tolist(), std.tolist()


# Compute stats for FashionMNIST
raw_fashion = torchvision.datasets.FashionMNIST(
    root=DATA_DIR, train=True, download=True, transform=T.ToTensor(),
)

means, stds = compute_dataset_statistics(raw_fashion)
print(f'FashionMNIST statistics:')
print(f'  Mean: {[f"{m:.4f}" for m in means]}')
print(f'  Std:  {[f"{s:.4f}" for s in stds]}')
print()
print('Use these values in T.Normalize():')
print(f'  T.Normalize(mean={[round(m, 4) for m in means]}, std={[round(s, 4) for s in stds]})')
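
Accumulating raw sums and sums of squares is simple, but subtracting two large, nearly equal numbers can lose precision when the mean is large relative to the spread. A possible alternative is to merge per-batch statistics with the pairwise update of Chan et al., which is a batched form of Welford's algorithm. The sketch below assumes batches shaped `(N, C, H, W)`; `channel_stats_merged` is a hypothetical helper, and the random tensor stands in for real images.

```python
import torch


def channel_stats_merged(batches):
    """Per-channel mean and (population) std via pairwise merging of batch stats."""
    count = 0
    mean = None
    m2 = None  # running sum of squared deviations from the mean
    for images in batches:
        images = images.to(torch.float64)
        n_b = images.shape[0] * images.shape[2] * images.shape[3]
        mean_b = images.mean(dim=(0, 2, 3))
        m2_b = ((images - mean_b.view(1, -1, 1, 1)) ** 2).sum(dim=(0, 2, 3))
        if count == 0:
            count, mean, m2 = n_b, mean_b, m2_b
            continue
        # Merge the running statistics with this batch's statistics
        delta = mean_b - mean
        total = count + n_b
        mean = mean + delta * n_b / total
        m2 = m2 + m2_b + delta ** 2 * count * n_b / total
        count = total
    return mean, torch.sqrt(m2 / count)


# Stand-in data: 10 random "images" with 3 channels, processed in chunks of 4
demo_gen = torch.Generator().manual_seed(0)
demo_images = torch.rand(10, 3, 4, 4, generator=demo_gen)
merged_mean, merged_std = channel_stats_merged(demo_images.split(4))
print(merged_mean, merged_std)
```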

### 4.6 End-to-End Training Demo

Let's use our DataPipeline to train a simple model, demonstrating the
complete workflow from data loading to training curves.

In [None]:
def end_to_end_training_demo() -> None:
    """Demonstrate complete pipeline from DataLoader to training curves."""
    # Prepare tabular data pipeline
    full_ds = TensorDataset(
        torch.tensor(housing_X, dtype=torch.float32),
        torch.tensor(housing_y, dtype=torch.float32),
    )

    generator = torch.Generator().manual_seed(SEED)
    train_ds, val_ds, test_ds = random_split(
        full_ds, [0.8, 0.1, 0.1], generator=generator,
    )

    train_loader = DataLoader(train_ds, batch_size=64, shuffle=True, num_workers=0)
    val_loader = DataLoader(val_ds, batch_size=64, shuffle=False, num_workers=0)
    test_loader = DataLoader(test_ds, batch_size=64, shuffle=False, num_workers=0)

    print(f'Train: {len(train_ds)}, Val: {len(val_ds)}, Test: {len(test_ds)}')

    # Small MLP regressor (two hidden layers with ReLU)
    model = nn.Sequential(
        nn.Linear(8, 32),
        nn.ReLU(),
        nn.Linear(32, 16),
        nn.ReLU(),
        nn.Linear(16, 1),
    )
    criterion = nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    # Training loop
    num_epochs = 20
    train_losses: list[float] = []
    val_losses: list[float] = []

    for epoch in range(num_epochs):
        # Train
        model.train()
        epoch_loss = 0.0
        n_samples = 0
        for batch_x, batch_y in train_loader:
            optimizer.zero_grad()
            predictions = model(batch_x).squeeze(-1)  # keep the batch dim even for a batch of 1
            loss = criterion(predictions, batch_y)
            loss.backward()
            optimizer.step()
            epoch_loss += loss.item() * batch_x.size(0)
            n_samples += batch_x.size(0)
        train_losses.append(epoch_loss / n_samples)

        # Validate
        model.eval()
        val_loss = 0.0
        val_samples = 0
        with torch.no_grad():
            for batch_x, batch_y in val_loader:
                predictions = model(batch_x).squeeze(-1)  # keep the batch dim even for a batch of 1
                loss = criterion(predictions, batch_y)
                val_loss += loss.item() * batch_x.size(0)
                val_samples += batch_x.size(0)
        val_losses.append(val_loss / val_samples)

        if (epoch + 1) % 5 == 0:
            print(f'Epoch {epoch+1}/{num_epochs} | '
                  f'Train Loss: {train_losses[-1]:.4f} | '
                  f'Val Loss: {val_losses[-1]:.4f}')

    # Plot training curves
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.plot(range(1, num_epochs + 1), train_losses, label='Train Loss', linewidth=1.5)
    ax.plot(range(1, num_epochs + 1), val_losses, label='Val Loss', linewidth=1.5)
    best_epoch = np.argmin(val_losses) + 1
    ax.axvline(best_epoch, color='gray', linestyle='--', alpha=0.5,
               label=f'Best: epoch {best_epoch}')
    ax.set_xlabel('Epoch')
    ax.set_ylabel('MSE Loss')
    ax.set_title('Training Curves — DataLoader → Model → Loss')
    ax.legend()
    ax.grid(True, alpha=0.3)
    plt.tight_layout()
    plt.show()

    # Test evaluation
    model.eval()
    test_loss = 0.0
    test_samples = 0
    with torch.no_grad():
        for batch_x, batch_y in test_loader:
            predictions = model(batch_x).squeeze(-1)  # keep the batch dim even for a batch of 1
            loss = criterion(predictions, batch_y)
            test_loss += loss.item() * batch_x.size(0)
            test_samples += batch_x.size(0)
    test_mse = test_loss / test_samples
    print(f'\nTest MSE: {test_mse:.4f}')
    print(f'Test RMSE: {np.sqrt(test_mse):.4f}')


end_to_end_training_demo()
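
The loops above multiply each batch loss by `batch_x.size(0)` before summing. The reason is that `nn.MSELoss` returns a per-batch mean, and averaging those means directly over-weights a small final batch. The numbers below are made up purely to show the arithmetic.

```python
# (mean loss of the batch, number of samples in the batch), illustrative values
batch_losses = [(2.0, 64), (2.0, 64), (10.0, 2)]

# Naive: average of batch means, treats the 2-sample batch like a full one
naive_mean = sum(loss for loss, _ in batch_losses) / len(batch_losses)

# Weighted: recover per-batch loss sums, then divide by the total sample count
total_loss = sum(loss * n for loss, n in batch_losses)
total_samples = sum(n for _, n in batch_losses)
weighted_mean = total_loss / total_samples

print(f'Naive mean of batch means: {naive_mean:.4f}')
print(f'Sample-weighted mean:      {weighted_mean:.4f}')
```

The weighted value is the true per-sample average, which is what makes losses comparable across different batch sizes.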

---

## Part 5 — Summary & Lessons Learned

### Key Takeaways

1. **Dataset defines data access, DataLoader handles batching.** For custom data, subclass
   `Dataset` with `__len__` + `__getitem__`, then wrap it in a `DataLoader` for
   automatic batching, shuffling, and parallel loading.
2. **Transforms are composable preprocessing.** Use `T.Compose([...])` to chain
   transforms. Apply augmentation only to training data, not validation/test.
3. **Always shuffle training, never shuffle evaluation.** Shuffling prevents the
   model from learning data order. Use `Generator().manual_seed(SEED)` for
   reproducible shuffling.
4. **Fit preprocessing on training data only.** Scalers, normalizers, and
   statistics must be computed from training data and applied to val/test.
   Fitting on all data causes data leakage.
5. **Use official splits when available.** Datasets like CIFAR-10 and FashionMNIST
   have standard train/test splits. Split only the training portion into train/val.

### What's Next

→ **01-06 (Linear Algebra for Machine Learning)** covers eigendecomposition, SVD,
  and low-rank approximation: the mathematical tools behind PCA, dimensionality
  reduction, and matrix factorization methods.

### Going Further

- [PyTorch Data Loading Tutorial](https://pytorch.org/tutorials/beginner/data_loading_tutorial.html):
  Official tutorial with more advanced patterns
- [Writing Custom Datasets](https://pytorch.org/tutorials/beginner/basics/data_tutorial.html):
  PyTorch basics on Dataset and DataLoader