# Convolutional Neural Networks

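A convolutional neural network (CNN) is a class of deep learning model designed for data with a grid structure; its most typical applications are in image processing and computer vision. Loosely inspired by how the visual cortex responds to local regions of the visual field, a CNN learns to extract useful features directly from raw data and uses them for tasks such as classification, detection, and segmentation.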

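The core building blocks of a CNN are convolutional layers, activation functions, pooling layers, and fully connected layers, usually stacked as follows:
* Convolutional layer:
The convolutional layer is the heart of a CNN. A set of learnable kernels (filters) slides over the input, and each kernel extracts features from a local receptive field, detecting one kind of local pattern such as an edge or a texture. Convolution preserves the spatial structure of the input, and because the same kernel weights are reused at every position (parameter sharing), the layer needs far fewer parameters than a fully connected one.

* Activation function:
A convolutional layer is usually followed by a nonlinear activation function such as ReLU (Rectified Linear Unit). The nonlinearity lets the network learn more complex patterns than a stack of purely linear operations could.

* Pooling layer:
A pooling layer downsamples the features produced by the convolutional layers. This reduces the size of the feature maps and the amount of computation, and makes the features more robust to small translations. Common choices are max pooling and average pooling.

* Fully connected layer:
At the end of the network, one or more fully connected layers map the high-level features extracted earlier to the final output space, for example the classes of a classification task. A fully connected layer is the same kind of layer used in a traditional neural network.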
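To put a number on the parameter-sharing point in the list above, the following sketch counts the parameters of a 3x3 convolution with 32 output channels on a 28x28 single-channel image and compares them with a fully connected layer that produces an output of the same size. The helper names `conv_params` and `dense_params` and the layer sizes are assumptions chosen for illustration.

```python
def conv_params(in_channels, out_channels, kernel_h, kernel_w):
    """Weights plus one bias per output channel for a standard 2-D convolution."""
    return out_channels * (in_channels * kernel_h * kernel_w) + out_channels


def dense_params(in_features, out_features):
    """Weights plus one bias per output unit for a fully connected layer."""
    return in_features * out_features + out_features


# Illustrative sizes: a 28x28 grayscale image mapped to 32 feature maps of 28x28.
conv = conv_params(in_channels=1, out_channels=32, kernel_h=3, kernel_w=3)
dense = dense_params(in_features=28 * 28, out_features=32 * 28 * 28)

print(conv)   # 320
print(dense)  # 19694080
```

The convolution's parameter count does not depend on the image size at all, whereas the dense layer's grows with the product of input and output sizes.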

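Compared with a fully connected network, a CNN adds Convolution layers and Pooling layers. The layers are connected in the order "Convolution - ReLU - (Pooling)" (the Pooling layer is sometimes omitted). This can be read as replacing the earlier "Affine - ReLU" connection with a "Convolution - ReLU - (Pooling)" connection.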

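## Convolutional Layers
* **Kernel** / **Filter**: a small weight matrix that the convolutional layer slides over the input to extract local features. Each kernel acts as a feature detector for one specific pattern (an edge, a texture, and so on).
* **Stride**: the step size with which the kernel moves over the input. A larger stride visits fewer positions and therefore produces a smaller output feature map.
* **Padding**: extra pixels (usually zeros) added around the border of the input to control the spatial size of the output and to preserve information at the edges. Common modes are "same" padding (with stride 1, the output has the same size as the input) and "valid" padding (no padding, so the output shrinks).
* **Receptive field**: the region of the input that a given neuron can "see". It grows with network depth and determines how much context the neuron can capture.
* **Channel**: the depth dimension of an input or output feature map. A color image, for example, has 3 channels (RGB). In a standard convolution the kernel depth equals the number of input channels, so that each kernel operates across all channels.
* **Feature map**: the output of a convolution, also called an activation map. It shows where in the input the pattern detected by a kernel occurs.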
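The growth of the receptive field with depth can be computed layer by layer. The sketch below assumes square kernels without dilation and uses the standard recurrence: each layer widens the field by (kernel_size - 1) times the current spacing between neighbouring outputs, measured in input pixels. The function name `receptive_field` and the example stack are illustrative assumptions.

```python
def receptive_field(layers):
    """Return the receptive field (in input pixels) after each layer.

    layers is a list of (kernel_size, stride) pairs, applied in order.
    """
    size = 1   # a single input pixel sees only itself
    jump = 1   # spacing, in input pixels, between neighbouring outputs
    sizes = []
    for kernel_size, stride in layers:
        size += (kernel_size - 1) * jump
        jump *= stride
        sizes.append(size)
    return sizes


# Illustrative stack: two 3x3 convolutions (stride 1), then 2x2 pooling (stride 2).
print(receptive_field([(3, 1), (3, 1), (2, 2)]))  # [3, 5, 6]
```

Padding changes the output size but not the receptive field, which is why it does not appear in the recurrence.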

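In a convolution, the spatial size (height and width) of the output feature map is determined jointly by the input size, the filter size, the padding, and the stride. Given:
* Input size: $H \times W$
* Filter size: $F_H \times F_W$
* Padding (pixels added on each side): $P$
* Stride: $S$

the output size $(O_H, O_W)$ is: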

$$
O_H = \left\lfloor \frac{H + 2P - F_H}{S} \right\rfloor + 1
$$

$$
O_W = \left\lfloor \frac{W + 2P - F_W}{S} \right\rfloor + 1
$$


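Derivation:
1. Padding: a padding of $P$ adds $P$ rows at the top and bottom and $P$ columns at the left and right, so the effective input size becomes $H + 2P$ by $W + 2P$.
2. Number of valid positions: the filter slides over the padded input. It has size $F_H \times F_W$ and moves $S$ pixels per step, until it can no longer be placed entirely inside the padded input.
    * Along the height, the number of positions is $\dfrac{(H + 2P) - F_H}{S} + 1$;
    * Along the width, the number of positions is $\dfrac{(W + 2P) - F_W}{S} + 1$.
3. Rounding: the number of positions must be an integer, so the quotient is rounded down (floor). A fractional remainder corresponds to a final step that would push the filter past the border, and that position is discarded.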
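The role of the floor in step 3 is easiest to see by enumerating the filter positions directly and comparing the count with the closed form. This is a minimal sketch along one axis; the names `output_size` and `window_starts` are assumptions made for this example.

```python
def output_size(n, f, p, s):
    """Closed-form output size along one axis: floor((n + 2p - f) / s) + 1."""
    return (n + 2 * p - f) // s + 1


def window_starts(n, f, p, s):
    """List every start index at which the filter fits inside the padded input."""
    padded = n + 2 * p
    return [start for start in range(0, padded, s) if start + f <= padded]


# n=8, f=3, p=0, s=2: (8 - 3) / 2 = 2.5, so the floor discards the half step.
starts = window_starts(8, 3, 0, 2)
print(starts)                   # [0, 2, 4]
print(output_size(8, 3, 0, 2))  # 3
```

In this case the last input column (index 7) is never covered by the filter, which is exactly the position that the floor drops.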


Worked example:

Suppose:
* Input size $H = 32, W = 32$
* Filter size $F_H = 5, F_W = 5$
* Padding $P = 2$
* Stride $S = 1$

The output size is then:

$$
O_H = \left\lfloor \frac{32 + 2 \times 2 - 5}{1} \right\rfloor + 1 = \left\lfloor \frac{32 + 4 - 5}{1} \right\rfloor + 1 = \left\lfloor 31 \right\rfloor + 1 = 31 + 1 = 32
$$

$$
O_W = \left\lfloor \frac{32 + 2 \times 2 - 5}{1} \right\rfloor + 1 = 32
$$
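The example above is an instance of "same" padding: with stride 1 and an odd filter size $F$, choosing $P = (F - 1)/2$ leaves the spatial size unchanged. The sketch below checks this for a few filter sizes; `conv_output_size` and `same_padding` are hypothetical helper names, not functions from any library.

```python
def conv_output_size(n, f, p, s):
    """Output size along one axis for input n, filter f, padding p, stride s."""
    return (n + 2 * p - f) // s + 1


def same_padding(f):
    """Padding per side that preserves the size for stride 1 and an odd filter size."""
    if f % 2 == 0:
        raise ValueError("symmetric 'same' padding needs an odd filter size")
    return (f - 1) // 2


for f in (1, 3, 5, 7):
    p = same_padding(f)
    # Prints the filter size, the padding, and the resulting output size (always 32).
    print(f, p, conv_output_size(32, f, p, 1))
```

An even filter size would need unequal padding on the two sides, which is why the helper rejects it.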

## im2col

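im2col ("image to column") is a widely used helper for computing convolutions efficiently in convolutional neural networks. It unrolls each local block of the input (a region the size of the kernel) into a column vector. After this unrolling, the convolution becomes a matrix multiplication, which can use optimized BLAS routines.

Below is fairly general pseudocode for im2col. It assumes the following inputs:
* Input: shape (N, C, H, W)
    * N is the batch size
    * C is the number of channels
    * H, W are the height and width of the image
* Kernel: kernel size (kH, kW)
* stride: the stride
* padding: the amount of padding (the same on all four sides)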


function IM2COL(input, kernel_h, kernel_w, stride, pad):
    # 1. Read the input size (N, C, H, W)
    N, C, H, W = input.shape

    # 2. Compute the output height and width
    out_h = (H + 2*pad - kernel_h) // stride + 1
    out_w = (W + 2*pad - kernel_w) // stride + 1

    # 3. Zero-pad the input (a no-op when pad == 0)
    #    padded_input has shape (N, C, H + 2*pad, W + 2*pad)
    padded_input = zero_pad(input, pad)

    # 4. Allocate the im2col result
    #    Each receptive field holds C * kernel_h * kernel_w values (one column),
    #    and there is one column per output position: out_h * out_w columns.
    #    The kernel offsets keep their own axes for now, so that the final rows
    #    come out in (C, kernel_h, kernel_w) order. That is the order obtained by
    #    flattening a weight tensor of shape (out_channels, C, kernel_h, kernel_w).
    col = zeros((N, C, kernel_h, kernel_w, out_h * out_w))

    # 5. Loop over every kernel offset (dy, dx)
    for dy in range(kernel_h):
        for dx in range(kernel_w):
            # Rows dy, dy + stride, ... and columns dx, dx + stride, ... are the
            # input pixels under offset (dy, dx) for each output position.
            # patch shape: (N, C, out_h, out_w)
            patch = padded_input[:,
                                 :,
                                 dy : dy + stride*out_h : stride,
                                 dx : dx + stride*out_w : stride]

            # Flatten the spatial axes: (N, C, out_h, out_w) -> (N, C, out_h*out_w)
            col[:, :, dy, dx, :] = reshape(patch, (N, C, -1))  # -1 is inferred

    # 6. Merge (C, kernel_h, kernel_w) into a single row axis
    return reshape(col, (N, C * kernel_h * kernel_w, out_h * out_w))

In [10]:
import numpy as np

def zero_pad(x, pad):
    """
    Zero-pad x of shape (N, C, H, W) along the H and W axes.
    pad is an int: pad rows/columns are added on each side of both axes.
    Returns x unchanged when pad == 0.
    """
    if pad == 0:
        return x
    return np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)), mode='constant')

def im2col(input_data, kernel_h, kernel_w, stride=1, pad=0):
    """
    input_data: array of shape (N, C, H, W)
    kernel_h, kernel_w: kernel height and width
    stride: stride
    pad: zero padding added on the top, bottom, left and right

    Returns col of shape (N, C*kernel_h*kernel_w, out_h*out_w).
    Rows are ordered as (C, kernel_h, kernel_w), so they line up with
    w.reshape(out_channels, -1) for weights of shape (out_channels, C, kH, kW).
    """
    N, C, H, W = input_data.shape

    # Output height and width
    out_h = (H + 2*pad - kernel_h) // stride + 1
    out_w = (W + 2*pad - kernel_w) // stride + 1

    # Pad the input
    padded_input = zero_pad(input_data, pad)

    # Allocate the output. The kernel offsets keep their own axes until the end;
    # writing them as blocks of C rows instead would order the rows as
    # (kernel_h, kernel_w, C) and mismatch the flattened weights when C > 1.
    col = np.zeros((N, C, kernel_h, kernel_w, out_h * out_w), dtype=input_data.dtype)

    # Visit every kernel offset (dy, dx)
    for dy in range(kernel_h):
        for dx in range(kernel_w):
            # Strided slice of padded_input, shape (N, C, out_h, out_w)
            patch = padded_input[:,
                                 :,
                                 dy : dy + stride*out_h : stride,
                                 dx : dx + stride*out_w : stride]

            # (N, C, out_h, out_w) -> (N, C, out_h*out_w)
            col[:, :, dy, dx, :] = patch.reshape(N, C, -1)

    # (N, C, kH, kW, out_h*out_w) -> (N, C*kH*kW, out_h*out_w)
    return col.reshape(N, C * kernel_h * kernel_w, out_h * out_w)


# ================ Test code ================
if __name__ == "__main__":
    # 1. Build a simple input
    x = np.arange(16).reshape(1, 1, 4, 4).astype(np.float32)
    # x has shape (1, 1, 4, 4) and contains:
    # [[ 0,  1,  2,  3],
    #  [ 4,  5,  6,  7],
    #  [ 8,  9, 10, 11],
    #  [12, 13, 14, 15]]

    # 2. Build a 2x2 kernel (out_channels=1, in_channels=1, kH=2, kW=2)
    #    The values are arbitrary and only serve the demonstration.
    w = np.array([[[[1, 2],
                    [3, 4]]]], dtype=np.float32)
    # w.shape = (1, 1, 2, 2)

    # Bias b, one value per output channel (out_channels=1)
    b = np.array([1.0], dtype=np.float32)

    # 3. Run im2col
    col_x = im2col(x, kernel_h=2, kernel_w=2, stride=1, pad=0)
    # col_x has shape (N=1, C*kH*kW=4, out_h*out_w=9), i.e. (1, 4, 9)

    # 4. Flatten the kernel to (out_channels, in_channels*kH*kW) = (1, 4)
    #    out_channels=1, in_channels=1, kH=2, kW=2 => 1*4 = 4
    w_flat = w.reshape(1, -1)  # (1, 4)

    # 5. Multiply. col_x has shape (N, 4, 9), so each sample is handled separately:
    #        out = w_flat (1, 4) @ col_x[i] (4, 9) -> (1, 9)
    #    Then the bias is added and the result is reshaped to
    #    (1, 1, out_h, out_w) = (1, 1, 3, 3).

    N, _, out_HW = col_x.shape  # N=1, _=4, out_HW=9
    # The square root only works because this output is square (3x3);
    # in general, compute out_h and out_w from the output-size formula.
    out_h = out_w = int(np.sqrt(out_HW))  # = 3

    conv_out = np.zeros((N, 1, out_h, out_w), dtype=np.float32)  # (1, 1, 3, 3)
    for i in range(N):
        # im2col matrix of sample i: shape (4, 9)
        sample_col = col_x[i]
        # Convolution as a matrix product: (1, 4) @ (4, 9) = (1, 9)
        out_vec = w_flat @ sample_col
        # Add the bias; out_vec keeps the shape (1, 9)
        out_vec += b[0]
        # Reshape to (1, 3, 3)
        out_2d = out_vec.reshape(1, out_h, out_w)
        conv_out[i] = out_2d

    print("im2col result, col_x.shape:", col_x.shape)
    print(col_x)

    print("\nKernel w:\n", w)
    print("Bias b:", b)

    print("\nAfter the matrix product, conv_out.shape:", conv_out.shape)
    print(conv_out[0, 0])  # show the (3, 3) result

im2col result, col_x.shape: (1, 4, 9)
[[[ 0.  1.  2.  4.  5.  6.  8.  9. 10.]
  [ 1.  2.  3.  5.  6.  7.  9. 10. 11.]
  [ 4.  5.  6.  8.  9. 10. 12. 13. 14.]
  [ 5.  6.  7.  9. 10. 11. 13. 14. 15.]]]

Kernel w:
 [[[[1. 2.]
   [3. 4.]]]]
Bias b: [1.]

After the matrix product, conv_out.shape: (1, 1, 3, 3)
[[ 35.  45.  55.]
 [ 75.  85.  95.]
 [115. 125. 135.]]
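One way to gain confidence in an im2col-based convolution is to compare it with a direct, loop-based implementation. The sketch below is such a reference, assuming the same (N, C, H, W) layout and the same input, kernel, and bias as the test above; `conv2d_naive` is a name made up for this example. Like most deep learning "convolutions", it computes a cross-correlation (the kernel is not flipped).

```python
import numpy as np


def conv2d_naive(x, w, b, stride=1, pad=0):
    """Direct cross-correlation.

    x: (N, C, H, W), w: (O, C, kH, kW), b: (O,). Returns (N, O, out_h, out_w).
    """
    N, C, H, W = x.shape
    O, _, kH, kW = w.shape
    out_h = (H + 2 * pad - kH) // stride + 1
    out_w = (W + 2 * pad - kW) // stride + 1
    xp = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)), mode="constant")

    out = np.zeros((N, O, out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            # Window under the kernel for output position (i, j): (N, C, kH, kW)
            window = xp[:, :, i * stride : i * stride + kH, j * stride : j * stride + kW]
            # Contract over (C, kH, kW) for every sample and output channel: (N, O)
            out[:, :, i, j] = np.tensordot(window, w, axes=([1, 2, 3], [1, 2, 3])) + b
    return out


x = np.arange(16, dtype=np.float32).reshape(1, 1, 4, 4)
w = np.array([[[[1, 2], [3, 4]]]], dtype=np.float32)
b = np.array([1.0], dtype=np.float32)
print(conv2d_naive(x, w, b)[0, 0])
```

For this input the result matches the 3x3 matrix printed above. Repeating the comparison with more than one input channel is a useful check that the row order of the im2col matrix agrees with the flattened weights.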


In [12]:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms

# ============ 1. Data preparation ============

def load_mnist_data(batch_size=64):
    """
    Return DataLoaders for the MNIST training and test sets.
    """
    transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))  # MNIST mean / standard deviation
    ])

    train_dataset = torchvision.datasets.MNIST(
        root='./data', train=True, download=True, transform=transform)
    test_dataset = torchvision.datasets.MNIST(
        root='./data', train=False, download=True, transform=transform)

    train_loader = torch.utils.data.DataLoader(
        train_dataset, batch_size=batch_size, shuffle=True, num_workers=2)
    test_loader = torch.utils.data.DataLoader(
        test_dataset, batch_size=batch_size, shuffle=False, num_workers=2)

    return train_loader, test_loader


# ============ 2. Network definition ============

class SimpleCNN(nn.Module):
    def __init__(self, num_classes=10):
        super(SimpleCNN, self).__init__()
        
        # (1) Convolution 1: in_channels=1, out_channels=32, kernel=3, padding=1
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=32,
                               kernel_size=3, stride=1, padding=1)
        self.relu1 = nn.ReLU()

        # (2) Convolution 2: in_channels=32, out_channels=64, kernel=3, padding=1
        self.conv2 = nn.Conv2d(in_channels=32, out_channels=64,
                               kernel_size=3, stride=1, padding=1)
        self.relu2 = nn.ReLU()

        # (3) Pooling: kernel=2, stride=2 -> (28x28)->(14x14)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)

        # Feature-map size after pooling: 28x28 input, 14x14 after the single pooling step.
        # With 64 channels, the flattened size is 64 * 14 * 14 = 12544.
        self.fc1 = nn.Linear(64 * 14 * 14, 128)
        self.relu_fc1 = nn.ReLU()

        # Final classification layer
        self.fc2 = nn.Linear(128, num_classes)  # 10
        
    def forward(self, x):
        # Input x: shape = (N, 1, 28, 28)
        out = self.conv1(x)   # (N,32,28,28)
        out = self.relu1(out)

        out = self.conv2(out) # (N,64,28,28)
        out = self.relu2(out)

        out = self.pool(out)  # (N,64,14,14)

        # Flatten
        out = out.view(out.size(0), -1)  # (N,64*14*14) = (N,12544)
        
        out = self.fc1(out)   # (N,128)
        out = self.relu_fc1(out)
        
        out = self.fc2(out)   # (N,10)
        return out


# ============ 3. Training and testing ============

def train_and_test(epochs=3, batch_size=64, lr=0.001):
    # Load the data
    train_loader, test_loader = load_mnist_data(batch_size)

    # Build the network
    model = SimpleCNN(num_classes=10)

    # Loss function and optimizer
    criterion = nn.CrossEntropyLoss()  # cross-entropy loss
    optimizer = optim.Adam(model.parameters(), lr=lr)

    # ============= Training =============
    model.train()
    for epoch in range(1, epochs+1):
        running_loss = 0.0
        for i, (images, labels) in enumerate(train_loader):
            # Forward pass
            outputs = model(images)
            loss = criterion(outputs, labels)

            # Backward pass and parameter update
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            
            running_loss += loss.item()
            
            if (i+1) % 100 == 0:
                print(f"Epoch [{epoch}/{epochs}], Step [{i+1}/{len(train_loader)}], "
                      f"Loss: {running_loss/100:.4f}")
                running_loss = 0.0

    # ============= Testing =============
    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for images, labels in test_loader:
            outputs = model(images)               # (batch_size,10)
            _, predicted = torch.max(outputs, 1)  # index of the largest logit = predicted class
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    
    print(f"Test Accuracy: {100 * correct/total:.2f}%")


# ============ 4. Entry point ============

if __name__ == "__main__":
    train_and_test(epochs=3, batch_size=64, lr=0.001)

Epoch [1/3], Step [100/938], Loss: 0.4625
Epoch [1/3], Step [200/938], Loss: 0.1340
Epoch [1/3], Step [300/938], Loss: 0.0946
Epoch [1/3], Step [400/938], Loss: 0.0837
Epoch [1/3], Step [500/938], Loss: 0.0822
Epoch [1/3], Step [600/938], Loss: 0.0602
Epoch [1/3], Step [700/938], Loss: 0.0605
Epoch [1/3], Step [800/938], Loss: 0.0503
Epoch [1/3], Step [900/938], Loss: 0.0633
Epoch [2/3], Step [100/938], Loss: 0.0303
Epoch [2/3], Step [200/938], Loss: 0.0376
Epoch [2/3], Step [300/938], Loss: 0.0373
Epoch [2/3], Step [400/938], Loss: 0.0369
Epoch [2/3], Step [500/938], Loss: 0.0399
Epoch [2/3], Step [600/938], Loss: 0.0367
Epoch [2/3], Step [700/938], Loss: 0.0322
Epoch [2/3], Step [800/938], Loss: 0.0342
Epoch [2/3], Step [900/938], Loss: 0.0366
Epoch [3/3], Step [100/938], Loss: 0.0162
Epoch [3/3], Step [200/938], Loss: 0.0208
Epoch [3/3], Step [300/938], Loss: 0.0235
Epoch [3/3], Step [400/938], Loss: 0.0214
Epoch [3/3], Step [500/938], Loss: 0.0231
Epoch [3/3], Step [600/938], Loss:
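The shape comments in SimpleCNN can be checked against the output-size formula from the convolutional-layer section. The following sketch traces one spatial axis through conv1, conv2, and the pooling layer, and derives the input width of fc1. The helper name `out_size` is an assumption; the layer settings are copied from the class above.

```python
def out_size(n, kernel, stride, pad):
    """Output size along one axis: floor((n + 2*pad - kernel) / stride) + 1."""
    return (n + 2 * pad - kernel) // stride + 1


size = 28                                          # MNIST images are 28x28
size = out_size(size, kernel=3, stride=1, pad=1)   # conv1 keeps 28
size = out_size(size, kernel=3, stride=1, pad=1)   # conv2 keeps 28
size = out_size(size, kernel=2, stride=2, pad=0)   # 2x2 max pooling halves it to 14
flat_features = 64 * size * size                   # 64 channels flattened for fc1

print(size, flat_features)  # 14 12544
```

The same helper shows that a 3x3 convolution with stride 2 and padding 1 also maps 28 to 14, which is how a strided convolution can stand in for the pooling layer.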

1. With a pooling layer, without a validation split: cnn_with_pool_no_val.py

In [16]:
"""
cnn_with_pool_no_val.py

Train a CNN with a pooling layer on MNIST, without a validation split.
Prints output similar to:
Epoch [1/3], Step [100/938], Loss: 0.4625
...
Test Accuracy: 98.83%
"""

import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms

# ------------------------------
# 1. Load MNIST (no validation split)
# ------------------------------
def load_mnist_data(batch_size=64):
    transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))
    ])

    # Training set: 60k samples, all used for training
    train_dataset = torchvision.datasets.MNIST(
        root='./data', train=True, download=True, transform=transform
    )
    train_loader = torch.utils.data.DataLoader(
        train_dataset, batch_size=batch_size, shuffle=True, num_workers=2
    )

    # Test set: 10k samples
    test_dataset = torchvision.datasets.MNIST(
        root='./data', train=False, download=True, transform=transform
    )
    test_loader = torch.utils.data.DataLoader(
        test_dataset, batch_size=batch_size, shuffle=False, num_workers=2
    )
    
    return train_loader, test_loader

# ------------------------------
# 2. CNN with a pooling layer
# ------------------------------
class CNNWithPool(nn.Module):
    def __init__(self, num_classes=10):
        super(CNNWithPool, self).__init__()
        
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=32,
                               kernel_size=3, stride=1, padding=1)
        self.relu1 = nn.ReLU()
        
        self.conv2 = nn.Conv2d(in_channels=32, out_channels=64,
                               kernel_size=3, stride=1, padding=1)
        self.relu2 = nn.ReLU()

        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)  # 28->14
        
        self.fc1 = nn.Linear(64 * 14 * 14, 128)
        self.relu_fc1 = nn.ReLU()
        self.fc2 = nn.Linear(128, num_classes)
        
    def forward(self, x):
        x = self.conv1(x)
        x = self.relu1(x)
        x = self.conv2(x)
        x = self.relu2(x)
        
        x = self.pool(x)  # (N,64,14,14)
        
        x = x.view(x.size(0), -1)  # (N,64*14*14)
        x = self.fc1(x)
        x = self.relu_fc1(x)
        x = self.fc2(x)
        return x

# ------------------------------
# 3. Training and testing
# ------------------------------
def evaluate_accuracy(model, data_loader):
    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for images, labels in data_loader:
            outputs = model(images)
            _, predicted = torch.max(outputs, dim=1)
            correct += (predicted == labels).sum().item()
            total += labels.size(0)
    return 100.0 * correct / total

def train_test_model(epochs=3, batch_size=64, lr=1e-3):
    train_loader, test_loader = load_mnist_data(batch_size)
    model = CNNWithPool(num_classes=10)
    
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=lr)
    
    total_steps = len(train_loader)  # number of batches per epoch
    print(f"Total steps per epoch: {total_steps}")  # 938 = ceil(60000 / 64)

    # Training
    for epoch in range(1, epochs+1):
        model.train()
        running_loss = 0.0
        
        for i, (images, labels) in enumerate(train_loader):
            outputs = model(images)
            loss = criterion(outputs, labels)
            
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            
            running_loss += loss.item()
            
            # Report the average loss every 100 steps
            if (i+1) % 100 == 0:
                avg_loss = running_loss / 100
                print(f"Epoch [{epoch}/{epochs}], Step [{i+1}/{total_steps}], Loss: {avg_loss:.4f}")
                running_loss = 0.0

    # Testing
    test_acc = evaluate_accuracy(model, test_loader)
    print(f"Test Accuracy: {test_acc:.2f}%")

# ------------------------------
# 4. Entry point
# ------------------------------
if __name__ == "__main__":
    train_test_model(epochs=3, batch_size=64, lr=1e-3)

Total steps per epoch: 938
Epoch [1/3], Step [100/938], Loss: 0.4233
Epoch [1/3], Step [200/938], Loss: 0.1228
Epoch [1/3], Step [300/938], Loss: 0.1022
Epoch [1/3], Step [400/938], Loss: 0.0763
Epoch [1/3], Step [500/938], Loss: 0.0766
Epoch [1/3], Step [600/938], Loss: 0.0583
Epoch [1/3], Step [700/938], Loss: 0.0643
Epoch [1/3], Step [800/938], Loss: 0.0583
Epoch [1/3], Step [900/938], Loss: 0.0546
Epoch [2/3], Step [100/938], Loss: 0.0297
Epoch [2/3], Step [200/938], Loss: 0.0362
Epoch [2/3], Step [300/938], Loss: 0.0319
Epoch [2/3], Step [400/938], Loss: 0.0391
Epoch [2/3], Step [500/938], Loss: 0.0373
Epoch [2/3], Step [600/938], Loss: 0.0395
Epoch [2/3], Step [700/938], Loss: 0.0408
Epoch [2/3], Step [800/938], Loss: 0.0343
Epoch [2/3], Step [900/938], Loss: 0.0429
Epoch [3/3], Step [100/938], Loss: 0.0159
Epoch [3/3], Step [200/938], Loss: 0.0219
Epoch [3/3], Step [300/938], Loss: 0.0184
Epoch [3/3], Step [400/938], Loss: 0.0185
Epoch [3/3], Step [500/938], Loss: 0.0243
Epoch [

2. Without a pooling layer, with a validation split: cnn_no_pool_with_val.py

In [20]:
"""
cnn_no_pool_with_val.py

Train a CNN without a pooling layer on MNIST, holding out a validation set (5k).
Prints output similar to:
Epoch [1/3], Step [100/860], Loss: 0.4625
...
[Val] Accuracy: 97.32%
Test Accuracy: 98.83%
"""

import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms

# ------------------------------
# 1. Load MNIST and split off a validation set
# ------------------------------
def load_mnist_data(batch_size=64):
    transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))
    ])
    
    # Full training set of 60k -> 55k for training + 5k for validation
    full_train_dataset = torchvision.datasets.MNIST(
        root='./data', train=True, download=True, transform=transform
    )
    train_size = 55000
    val_size   = 5000
    train_dataset, val_dataset = torch.utils.data.random_split(
        full_train_dataset, [train_size, val_size]
    )
    
    train_loader = torch.utils.data.DataLoader(
        train_dataset, batch_size=batch_size, shuffle=True, num_workers=2
    )
    val_loader = torch.utils.data.DataLoader(
        val_dataset, batch_size=batch_size, shuffle=False, num_workers=2
    )
    
    # Test set: 10k samples
    test_dataset = torchvision.datasets.MNIST(
        root='./data', train=False, download=True, transform=transform
    )
    test_loader = torch.utils.data.DataLoader(
        test_dataset, batch_size=batch_size, shuffle=False, num_workers=2
    )
    
    return train_loader, val_loader, test_loader


# ------------------------------
# 2. CNN without a pooling layer (downsamples with stride=2)
# ------------------------------
class CNNNoPool(nn.Module):
    def __init__(self, num_classes=10):
        super(CNNNoPool, self).__init__()

        # conv1: stride=1, spatial size unchanged
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=32,
                               kernel_size=3, stride=1, padding=1)
        self.relu1 = nn.ReLU()

        # conv2: stride=2 acts as downsampling => 28->14
        self.conv2 = nn.Conv2d(in_channels=32, out_channels=64,
                               kernel_size=3, stride=2, padding=1)
        self.relu2 = nn.ReLU()

        # Spatial size is now 14x14 with 64 channels => 64*14*14=12544
        self.fc1 = nn.Linear(64*14*14, 128)
        self.relu_fc1 = nn.ReLU()
        self.fc2 = nn.Linear(128, num_classes)
        
    def forward(self, x):
        x = self.conv1(x)
        x = self.relu1(x)
        
        x = self.conv2(x)
        x = self.relu2(x)
        
        x = x.view(x.size(0), -1)  # (N,12544)
        x = self.fc1(x)
        x = self.relu_fc1(x)
        x = self.fc2(x)
        return x

# ------------------------------
# 3. Training, validation, and testing
# ------------------------------
def evaluate_accuracy(model, data_loader):
    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for images, labels in data_loader:
            outputs = model(images)
            _, predicted = torch.max(outputs, dim=1)
            correct += (predicted == labels).sum().item()
            total += labels.size(0)
    return 100.0 * correct / total

def train_test_model(epochs=3, batch_size=64, lr=1e-3):
    train_loader, val_loader, test_loader = load_mnist_data(batch_size)
    model = CNNNoPool(num_classes=10)
    
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=lr)
    
    total_steps = len(train_loader)
    print(f"Total steps per epoch: {total_steps}")
    
    # Training
    for epoch in range(1, epochs+1):
        model.train()
        running_loss = 0.0
        
        for i, (images, labels) in enumerate(train_loader):
            outputs = model(images)
            loss = criterion(outputs, labels)
            
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            
            running_loss += loss.item()
            
            if (i+1) % 100 == 0:
                avg_loss = running_loss / 100
                print(f"Epoch [{epoch}/{epochs}], Step [{i+1}/{total_steps}], Loss: {avg_loss:.4f}")
                running_loss = 0.0
        
        # Measure validation accuracy at the end of every epoch
        val_acc = evaluate_accuracy(model, val_loader)
        print(f"[Val] Accuracy: {val_acc:.2f}%")

    # Testing
    test_acc = evaluate_accuracy(model, test_loader)
    print(f"Test Accuracy: {test_acc:.2f}%")

# ------------------------------
# 4. Entry point
# ------------------------------
if __name__ == "__main__":
    train_test_model(epochs=3, batch_size=64, lr=1e-3)

Total steps per epoch: 860
Epoch [1/3], Step [100/860], Loss: 0.4814
Epoch [1/3], Step [200/860], Loss: 0.1527
Epoch [1/3], Step [300/860], Loss: 0.1111
Epoch [1/3], Step [400/860], Loss: 0.0819
Epoch [1/3], Step [500/860], Loss: 0.0839
Epoch [1/3], Step [600/860], Loss: 0.0761
Epoch [1/3], Step [700/860], Loss: 0.0748
Epoch [1/3], Step [800/860], Loss: 0.0608
[Val] Accuracy: 97.80%
Epoch [2/3], Step [100/860], Loss: 0.0365
Epoch [2/3], Step [200/860], Loss: 0.0441
Epoch [2/3], Step [300/860], Loss: 0.0379
Epoch [2/3], Step [400/860], Loss: 0.0435
Epoch [2/3], Step [500/860], Loss: 0.0391
Epoch [2/3], Step [600/860], Loss: 0.0432
Epoch [2/3], Step [700/860], Loss: 0.0356
Epoch [2/3], Step [800/860], Loss: 0.0458
[Val] Accuracy: 98.12%
Epoch [3/3], Step [100/860], Loss: 0.0165
Epoch [3/3], Step [200/860], Loss: 0.0186
Epoch [3/3], Step [300/860], Loss: 0.0181
Epoch [3/3], Step [400/860], Loss: 0.0260
Epoch [3/3], Step [500/860], Loss: 0.0255
Epoch [3/3], Step [600/860], Loss: 0.0211
Epo
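The step counts in the logs (938 when all 60,000 training images are used, 860 after 5,000 are held out) follow from the batch size of 64. PyTorch's DataLoader keeps the final partial batch unless drop_last=True is passed, so the count is a ceiling division. The helper name `steps_per_epoch` is an assumption made for this sketch.

```python
import math


def steps_per_epoch(num_samples, batch_size, drop_last=False):
    """Number of batches a loader yields in one pass over num_samples items."""
    if drop_last:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)


print(steps_per_epoch(60000, 64))  # 938
print(steps_per_epoch(55000, 64))  # 860
print(steps_per_epoch(5000, 64))   # 79
```

Because the training loops print only every 100 steps, the last 38 or 60 batches of each epoch contribute to the weights but never appear in a printed loss average.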

3. With a pooling layer, with a validation split: cnn_with_pool_with_val.py

In [21]:
"""
cnn_with_pool_with_val.py

Train a CNN with a pooling layer on MNIST, holding out a validation set (5k).
Prints output similar to:
Epoch [1/3], Step [100/860], Loss: 0.4625
...
[Val] Accuracy: 97.32%
Test Accuracy: 98.83%
"""

import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms

# ------------------------------
# 1. Load MNIST and split off a validation set
# ------------------------------
def load_mnist_data(batch_size=64):
    transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))
    ])
    
    full_train_dataset = torchvision.datasets.MNIST(
        root='./data', train=True, download=True, transform=transform
    )
    train_size = 55000
    val_size   = 5000
    train_dataset, val_dataset = torch.utils.data.random_split(
        full_train_dataset, [train_size, val_size]
    )
    
    train_loader = torch.utils.data.DataLoader(
        train_dataset, batch_size=batch_size, shuffle=True, num_workers=2
    )
    val_loader = torch.utils.data.DataLoader(
        val_dataset, batch_size=batch_size, shuffle=False, num_workers=2
    )
    
    test_dataset = torchvision.datasets.MNIST(
        root='./data', train=False, download=True, transform=transform
    )
    test_loader = torch.utils.data.DataLoader(
        test_dataset, batch_size=batch_size, shuffle=False, num_workers=2
    )
    
    return train_loader, val_loader, test_loader

# ------------------------------
# 2. CNN with a pooling layer
# ------------------------------
class CNNWithPool(nn.Module):
    def __init__(self, num_classes=10):
        super(CNNWithPool, self).__init__()
        
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=32,
                               kernel_size=3, stride=1, padding=1)
        self.relu1 = nn.ReLU()
        
        self.conv2 = nn.Conv2d(in_channels=32, out_channels=64,
                               kernel_size=3, stride=1, padding=1)
        self.relu2 = nn.ReLU()

        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)  # 28->14
        
        self.fc1 = nn.Linear(64 * 14 * 14, 128)
        self.relu_fc1 = nn.ReLU()
        self.fc2 = nn.Linear(128, num_classes)
        
    def forward(self, x):
        x = self.conv1(x)
        x = self.relu1(x)
        x = self.conv2(x)
        x = self.relu2(x)
        
        x = self.pool(x)
        
        x = x.view(x.size(0), -1)
        x = self.fc1(x)
        x = self.relu_fc1(x)
        x = self.fc2(x)
        return x

# ------------------------------
# 3. Training, validation, and testing
# ------------------------------
def evaluate_accuracy(model, data_loader):
    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for images, labels in data_loader:
            outputs = model(images)
            _, predicted = torch.max(outputs, dim=1)
            correct += (predicted == labels).sum().item()
            total += labels.size(0)
    return 100.0 * correct / total

def train_test_model(epochs=3, batch_size=64, lr=1e-3):
    train_loader, val_loader, test_loader = load_mnist_data(batch_size)
    
    model = CNNWithPool(num_classes=10)
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=lr)
    
    total_steps = len(train_loader)
    print(f"Total steps per epoch: {total_steps}")
    
    # Training
    for epoch in range(1, epochs+1):
        model.train()
        running_loss = 0.0
        
        for i, (images, labels) in enumerate(train_loader):
            outputs = model(images)
            loss = criterion(outputs, labels)
            
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            
            running_loss += loss.item()
            
            # Report the average loss every 100 steps
            if (i+1) % 100 == 0:
                avg_loss = running_loss / 100
                print(f"Epoch [{epoch}/{epochs}], Step [{i+1}/{total_steps}], Loss: {avg_loss:.4f}")
                running_loss = 0.0

        # Validation set
        val_acc = evaluate_accuracy(model, val_loader)
        print(f"[Val] Accuracy: {val_acc:.2f}%")

    # Test set
    test_acc = evaluate_accuracy(model, test_loader)
    print(f"Test Accuracy: {test_acc:.2f}%")

# ------------------------------
# 4. Entry point
# ------------------------------
if __name__ == "__main__":
    train_test_model(epochs=3, batch_size=64, lr=1e-3)

Total steps per epoch: 860
Epoch [1/3], Step [100/860], Loss: 0.4421
Epoch [1/3], Step [200/860], Loss: 0.1316
Epoch [1/3], Step [300/860], Loss: 0.1127
Epoch [1/3], Step [400/860], Loss: 0.0921
Epoch [1/3], Step [500/860], Loss: 0.0723
Epoch [1/3], Step [600/860], Loss: 0.0720
Epoch [1/3], Step [700/860], Loss: 0.0717
Epoch [1/3], Step [800/860], Loss: 0.0597
[Val] Accuracy: 98.60%
Epoch [2/3], Step [100/860], Loss: 0.0386
Epoch [2/3], Step [200/860], Loss: 0.0468
Epoch [2/3], Step [300/860], Loss: 0.0385
Epoch [2/3], Step [400/860], Loss: 0.0385
Epoch [2/3], Step [500/860], Loss: 0.0420
Epoch [2/3], Step [600/860], Loss: 0.0364
Epoch [2/3], Step [700/860], Loss: 0.0426
Epoch [2/3], Step [800/860], Loss: 0.0332
[Val] Accuracy: 98.58%
Epoch [3/3], Step [100/860], Loss: 0.0178
Epoch [3/3], Step [200/860], Loss: 0.0196
Epoch [3/3], Step [300/860], Loss: 0.0201
Epoch [3/3], Step [400/860], Loss: 0.0217
Epoch [3/3], Step [500/860], Loss: 0.0245
Epoch [3/3], Step [600/860], Loss: 0.0344
Epo

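A network that combines pooling layers with a validation split tends to reach higher accuracy on MNIST

1. Pooling layers help curb overfitting and extract robust features
* A pooling layer lowers the resolution of the feature maps. It has no learnable parameters of its own, but the smaller feature maps shrink the fully connected layer that follows, and fewer parameters leave less room to overfit;
* It also provides a degree of translation invariance, making the network more robust to small shifts in the input;
* For small images such as MNIST, a pooling operation such as MaxPool2d(kernel_size=2, stride=2) usually extracts the key information quickly and stably, which improves generalization.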
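The translation-invariance point can be illustrated with a tiny hand-written 2x2 max pooling. This is a pure-Python sketch on made-up 4x4 images, not the PyTorch layer; the function name `max_pool_2x2` is an assumption.

```python
def max_pool_2x2(image):
    """2x2 max pooling with stride 2 on a list-of-lists image of even height and width."""
    return [
        [
            max(image[r][c], image[r][c + 1], image[r + 1][c], image[r + 1][c + 1])
            for c in range(0, len(image[0]), 2)
        ]
        for r in range(0, len(image), 2)
    ]


original = [
    [9, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
shifted = [        # bright pixel moved one step down and right, same pooling window
    [0, 0, 0, 0],
    [0, 9, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
shifted_far = [    # bright pixel moved across a pooling-window boundary
    [0, 0, 9, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

print(max_pool_2x2(original))     # [[9, 0], [0, 0]]
print(max_pool_2x2(shifted))      # [[9, 0], [0, 0]]
print(max_pool_2x2(shifted_far))  # [[0, 9], [0, 0]]
```

The first two outputs are identical, while the third differs: the invariance holds only for shifts that stay inside one pooling window, which is why the text speaks of "a degree of" invariance.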

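2. A validation set guides hyperparameter tuning and further improves generalization
* A validation set lets you evaluate the model during training, so that you can:
    * monitor for overfitting;
    * adjust hyperparameters such as the learning rate, the number of epochs, and the regularization strength;
    * apply early stopping or similar strategies, which keeps the model from overfitting to the training set itself.
* Without a validation set, all training samples are typically used for training. The model sees more data, but there is no timely signal of how well it generalizes to new data, so hyperparameters are chosen less precisely and overfitting or underfitting is harder to catch and correct.
* With a validation set, suitable hyperparameters can be found more precisely, which usually leads to higher accuracy on the final test set.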

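3. Combining the two: less overfitting plus principled tuning

When pooling layers are used together with validation-based tuning:

* In the network architecture, pooling gives the CNN a simple and effective way to extract features;
* In the training procedure, the validation set helps control training and tuning, so that the model can be refined toward its best state;
* As a result, on a relatively simple dataset such as MNIST it becomes easier to reach 99% accuracy or higher.

On larger or more complex tasks, the same combination (a sound network architecture plus validation-based tuning) remains a common and effective strategy in deep learning projects.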
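The scripts above only print the validation accuracy; acting on it is what turns the validation set into a tuning tool. As one possible approach, the sketch below implements a simple early-stopping rule. The class `EarlyStopping`, its `patience` and `min_delta` parameters, and the accuracy values are all assumptions made for illustration, not part of the original scripts or of PyTorch.

```python
class EarlyStopping:
    """Track the best validation accuracy and signal when it stops improving."""

    def __init__(self, patience=2, min_delta=0.0):
        self.patience = patience      # epochs to wait after the last improvement
        self.min_delta = min_delta    # smallest gain that counts as an improvement
        self.best = float("-inf")
        self.best_epoch = None
        self.bad_epochs = 0

    def step(self, epoch, val_acc):
        """Record one epoch; return True when training should stop."""
        if val_acc > self.best + self.min_delta:
            self.best = val_acc
            self.best_epoch = epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Made-up validation accuracies, for illustration only.
history = [97.80, 98.12, 98.60, 98.58, 98.55, 98.70]
stopper = EarlyStopping(patience=2)
for epoch, acc in enumerate(history, start=1):
    if stopper.step(epoch, acc):
        print(f"stop after epoch {epoch}; best was epoch {stopper.best_epoch} ({stopper.best:.2f}%)")
        break
```

In a training loop like the ones above, `stopper.step(epoch, val_acc)` would be called right after `evaluate_accuracy(model, val_loader)`. Keeping a copy of the model weights from the best epoch would be a natural extension.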

cnn_with_pool_with_val_compare.py: GPU vs. CPU

In [None]:
"""
cnn_with_pool_with_val_compare.py

Train a CNN with a pooling layer on MNIST, holding out a validation set (5k).
Compare training time and accuracy on Apple Silicon when using the CPU vs. MPS (GPU).

Example output:
Epoch [1/3], Step [100/860], Loss: 0.4625
...
[Val] Accuracy: 97.32%
Test Accuracy: 98.83%

The training time is also reported:
Training completed in XX.XX seconds on device=mps/cpu
"""

import time
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms

# ------------------------------
# 1. Load MNIST and split off a validation set
# ------------------------------
def load_mnist_data(batch_size=64):
    transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))
    ])
    
    # Full set of 60k -> 55k (training) + 5k (validation)
    full_train_dataset = torchvision.datasets.MNIST(
        root='./data', train=True, download=True, transform=transform
    )
    train_size = 55000
    val_size   = 5000
    train_dataset, val_dataset = torch.utils.data.random_split(
        full_train_dataset, [train_size, val_size]
    )
    
    train_loader = torch.utils.data.DataLoader(
        train_dataset, batch_size=batch_size, shuffle=True, num_workers=2
    )
    val_loader = torch.utils.data.DataLoader(
        val_dataset, batch_size=batch_size, shuffle=False, num_workers=2
    )
    
    test_dataset = torchvision.datasets.MNIST(
        root='./data', train=False, download=True, transform=transform
    )
    test_loader = torch.utils.data.DataLoader(
        test_dataset, batch_size=batch_size, shuffle=False, num_workers=2
    )
    
    return train_loader, val_loader, test_loader

# ------------------------------
# 2. CNN with a pooling layer
# ------------------------------
class CNNWithPool(nn.Module):
    def __init__(self, num_classes=10):
        super(CNNWithPool, self).__init__()
        
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=32,
                               kernel_size=3, stride=1, padding=1)
        self.relu1 = nn.ReLU()
        
        self.conv2 = nn.Conv2d(in_channels=32, out_channels=64,
                               kernel_size=3, stride=1, padding=1)
        self.relu2 = nn.ReLU()

        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)  # 28->14
        
        self.fc1 = nn.Linear(64 * 14 * 14, 128)
        self.relu_fc1 = nn.ReLU()
        self.fc2 = nn.Linear(128, num_classes)
        
    def forward(self, x):
        x = self.conv1(x)   # (N,32,28,28)
        x = self.relu1(x)
        x = self.conv2(x)   # (N,64,28,28)
        x = self.relu2(x)
        
        x = self.pool(x)    # (N,64,14,14)
        
        x = x.view(x.size(0), -1)  # (N,64*14*14)
        x = self.fc1(x)            # (N,128)
        x = self.relu_fc1(x)
        x = self.fc2(x)            # (N,10)
        return x
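
# ------------------------------
# Shape check (illustrative sketch, not part of the original script)
# ------------------------------
# A minimal sketch, assuming the output-size formula from the convolution
# section of this document: floor((size + 2 * padding - kernel) / stride) + 1.
# The helper name `conv_output_size` is hypothetical; it only shows where the
# 64 * 14 * 14 input size of fc1 comes from.
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a conv/pool layer along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

# 3x3 convolution with stride 1 and padding 1 keeps the 28-pixel side length.
_after_conv = conv_output_size(28, kernel=3, stride=1, padding=1)
# 2x2 max pooling with stride 2 halves it.
_after_pool = conv_output_size(_after_conv, kernel=2, stride=2)
assert (_after_conv, _after_pool) == (28, 14)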

# ------------------------------
# 3. Training, validation, and test routines
# ------------------------------
def evaluate_accuracy(model, data_loader, device):
    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for images, labels in data_loader:
            images = images.to(device)
            labels = labels.to(device)
            
            outputs = model(images)
            _, predicted = torch.max(outputs, dim=1)
            correct += (predicted == labels).sum().item()
            total += labels.size(0)
    return 100.0 * correct / total

def train_test_model(device="cpu", epochs=3, batch_size=64, lr=1e-3):
    """
    Train and test on the given device:
      - device: "cpu" or "mps" (Apple GPU)
      - epochs, batch_size, lr: hyperparameters
    """
    # Load the data
    train_loader, val_loader, test_loader = load_mnist_data(batch_size)
    
    # Move the model to the target device. Models and data live on the CPU
    # by default; .to(device) switches them explicitly.
    model = CNNWithPool(num_classes=10).to(device)
    
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=lr)
    
    total_steps = len(train_loader)
    print(f"\nUsing device: {device}, total steps per epoch = {total_steps}")
    
    # Training loop
    start_time = time.time()
    for epoch in range(1, epochs+1):
        model.train()
        running_loss = 0.0
        
        for i, (images, labels) in enumerate(train_loader):
            # Move the batch to the device
            images = images.to(device)
            labels = labels.to(device)
            
            outputs = model(images)
            loss = criterion(outputs, labels)
            
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            
            running_loss += loss.item()
            
            # Print the average loss every 100 steps
            if (i+1) % 100 == 0:
                avg_loss = running_loss / 100
                print(f"Epoch [{epoch}/{epochs}], Step [{i+1}/{total_steps}], Loss: {avg_loss:.4f}")
                running_loss = 0.0
        
        # Validation accuracy after each epoch
        val_acc = evaluate_accuracy(model, val_loader, device)
        print(f"[Val] Accuracy: {val_acc:.2f}%")
    end_time = time.time()
    
    # Elapsed training time (includes the per-epoch validation passes)
    elapsed_time = end_time - start_time
    print(f"Training completed in {elapsed_time:.2f} seconds on device={device}.")

    # Test accuracy
    test_acc = evaluate_accuracy(model, test_loader, device)
    print(f"Test Accuracy on {device}: {test_acc:.2f}%")

# ------------------------------
# 4. Main: compare runs on mps and cpu
# ------------------------------
if __name__ == "__main__":
    # Check whether MPS is available (macOS on Apple silicon, e.g. M1/M2)
    can_use_mps = torch.backends.mps.is_available()
    
    # If MPS is available, train on MPS first and then on CPU; otherwise train on CPU only
    if can_use_mps:
        train_test_model(device="mps", epochs=3, batch_size=64, lr=1e-3)
    
    # CPU training (for comparison)
    train_test_model(device="cpu", epochs=3, batch_size=64, lr=1e-3)


Using device: mps, total steps per epoch = 860
Epoch [1/3], Step [100/860], Loss: 0.4981
Epoch [1/3], Step [200/860], Loss: 0.1493
Epoch [1/3], Step [300/860], Loss: 0.1068
Epoch [1/3], Step [400/860], Loss: 0.0846
Epoch [1/3], Step [500/860], Loss: 0.0771
Epoch [1/3], Step [600/860], Loss: 0.0673
Epoch [1/3], Step [700/860], Loss: 0.0601
Epoch [1/3], Step [800/860], Loss: 0.0625
[Val] Accuracy: 98.36%
Epoch [2/3], Step [100/860], Loss: 0.0375
Epoch [2/3], Step [200/860], Loss: 0.0379
Epoch [2/3], Step [300/860], Loss: 0.0389
Epoch [2/3], Step [400/860], Loss: 0.0393
Epoch [2/3], Step [500/860], Loss: 0.0392
Epoch [2/3], Step [600/860], Loss: 0.0384
Epoch [2/3], Step [700/860], Loss: 0.0418
Epoch [2/3], Step [800/860], Loss: 0.0415
[Val] Accuracy: 98.60%
Epoch [3/3], Step [100/860], Loss: 0.0189
Epoch [3/3], Step [200/860], Loss: 0.0228
Epoch [3/3], Step [300/860], Loss: 0.0257
Epoch [3/3], Step [400/860], Loss: 0.0213
Epoch [3/3], Step [500/860], Loss: 0.0221
Epoch [3/3], Step [600/8