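# Modern Convolutional Neural Networks
## Deep Convolutional Neural Networks
### Learning Representations
- Representation learning is the process by which a machine automatically discovers and extracts, from raw data, the features (i.e., representations) that are useful for solving a task such as classification.
- How representations are learned (layer by layer):
    - the lower layers first learn simple, local features such as edges, corners and colors;
    - the middle layers then combine these low-level features into more complex textures, patterns and parts;
    - finally, the higher layers combine the mid-level features into complete, highly abstract objects or concepts.
### $\mathrm{AlexNet}$ (8 layers: 5 convolutional + 3 fully connected)
![AlexNet](../image/AlexNet.jpg)
- Uses $\mathrm{ReLU}$ instead of sigmoid as the activation function, and dropout in the fully connected layers.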
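The switch to ReLU can be motivated by gradients. The sketch below (plain PyTorch autograd, not part of the original notebook) compares the derivatives of sigmoid and ReLU at a few points: the sigmoid derivative never exceeds 0.25 and almost vanishes for large |x|, while the ReLU derivative is exactly 1 for every positive input.

```python
import torch

# A few inputs: strongly negative, around zero, strongly positive
x = torch.tensor([-6.0, -1.0, 0.0, 1.0, 6.0], requires_grad=True)

def grad_of(activation):
    """Element-wise derivative of `activation` at x, computed by autograd."""
    (g,) = torch.autograd.grad(activation(x).sum(), x)
    return g

sigmoid_grad = grad_of(torch.sigmoid)
relu_grad = grad_of(torch.relu)
print(sigmoid_grad)  # peaks at 0.25 for x = 0, about 0.0025 at |x| = 6
print(relu_grad)     # tensor([0., 0., 0., 1., 1.])
```

Stacking many sigmoid layers multiplies factors of at most 0.25, which is one reason deep sigmoid networks train slowly.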

In [1]:
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import transforms, datasets


class AlexNet(nn.Module):
    def __init__(self, num_classes=10):
        super(AlexNet, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 96, kernel_size=11, stride=4),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(96, 256, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(256, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        
        # Adaptive pooling: always produce a 6x6 map, whatever the input size
        self.adaptive_pool = nn.AdaptiveAvgPool2d((6, 6))
        
        self.classifier = nn.Sequential(
            nn.Dropout(0.5),
            nn.Linear(256 * 6 * 6, 4096),
            nn.ReLU(inplace=True),
            
            nn.Dropout(0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            
            nn.Linear(4096, num_classes),
        )
        
    def forward(self, x):
        x = self.features(x)
        x = self.adaptive_pool(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return x

# Test the network
X = torch.rand(1, 1, 224, 224)
net = AlexNet(num_classes=10)

print("Network structure:")
print(net)

print("\nOutput shape of each layer:")
print("Input shape:\t", X.shape)

# Walk through the layers manually and print each output shape
with torch.no_grad():
    x = X.clone()
    for name, layer in net.named_children():
        if name == 'features':
            for i, sub_layer in enumerate(layer):
                x = sub_layer(x)
                print(f'{name}[{i}] ({sub_layer.__class__.__name__}) output shape:\t', x.shape)
        elif name == 'adaptive_pool':
            x = layer(x)
            print(f'{name} output shape:\t', x.shape)
        elif name == 'classifier':
            x = torch.flatten(x, 1)  # flatten, exactly as in forward()
            print('Flattened shape:\t', x.shape)
            for i, sub_layer in enumerate(layer):
                x = sub_layer(x)
                print(f'{name}[{i}] ({sub_layer.__class__.__name__}) output shape:\t', x.shape)

Network structure:
AlexNet(
  (features): Sequential(
    (0): Conv2d(1, 96, kernel_size=(11, 11), stride=(4, 4))
    (1): ReLU(inplace=True)
    (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(96, 256, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (4): ReLU(inplace=True)
    (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): Conv2d(256, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): ReLU(inplace=True)
    (8): Conv2d(384, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (9): ReLU(inplace=True)
    (10): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (adaptive_pool): AdaptiveAvgPool2d(output_size=(6, 6))
  (classifier): Sequential(
    (0): Dropout(p=0.5, inplace=False)
    (1): Linear(in_features=9216, out_features=4096, bias=True)
    (
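
The `adaptive_pool` layer is what makes `nn.Linear(256 * 6 * 6, 4096)` fit here. A quick size calculation, assuming PyTorch's default floor rounding $\lfloor (n - k + 2p)/s \rfloor + 1$, shows that with a 224×224 input and no padding in the first convolution, `features` ends at 5×5 rather than 6×6; a 227×227 input would land exactly on 6×6. The helper names `conv_out` and `feature_sizes` are made up for this illustration.

```python
import torch
from torch import nn

def conv_out(n, k, s=1, p=0):
    """Spatial output size of a conv/pool layer (PyTorch default floor mode)."""
    return (n - k + 2 * p) // s + 1

# (kernel_size, stride, padding) of every layer in AlexNet.features that changes H/W
spatial_layers = [
    (11, 4, 0), (3, 2, 0),             # conv1, pool1
    (5, 1, 2), (3, 2, 0),              # conv2, pool2
    (3, 1, 1), (3, 1, 1), (3, 1, 1),   # conv3-conv5
    (3, 2, 0),                         # pool3
]

def feature_sizes(n):
    """Height/width after each layer in spatial_layers, for an n x n input."""
    sizes = []
    for k, s, p in spatial_layers:
        n = conv_out(n, k, s, p)
        sizes.append(n)
    return sizes

print(feature_sizes(224))  # [54, 26, 26, 12, 12, 12, 12, 5]
print(feature_sizes(227))  # [55, 27, 27, 13, 13, 13, 13, 6]

# Cross-check with real layers; channel counts do not affect height/width
probe = nn.Sequential(
    nn.Conv2d(1, 1, 11, stride=4), nn.MaxPool2d(3, 2),
    nn.Conv2d(1, 1, 5, padding=2), nn.MaxPool2d(3, 2),
    nn.Conv2d(1, 1, 3, padding=1), nn.Conv2d(1, 1, 3, padding=1),
    nn.Conv2d(1, 1, 3, padding=1), nn.MaxPool2d(3, 2),
)
with torch.no_grad():
    feat = probe(torch.zeros(1, 1, 224, 224))
    pooled = nn.AdaptiveAvgPool2d((6, 6))(feat)
print(feat.shape, pooled.shape)  # torch.Size([1, 1, 5, 5]) torch.Size([1, 1, 6, 6])
```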

In [4]:
batch_size = 128
resize = 224

transform = transforms.Compose([
    transforms.Resize(resize),
    transforms.ToTensor()
])

# Load the dataset
train_dataset = datasets.FashionMNIST(
    root="./data", train=True, download=True, transform=transform
)
test_dataset = datasets.FashionMNIST(
    root="./data", train=False, download=True, transform=transform
)

# Build the DataLoaders
train_iter = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_iter = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

In [6]:
lr, num_epochs = 0.01, 10
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

net = AlexNet(num_classes=10).to(device)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(net.parameters(), lr=lr)

for epoch in range(num_epochs):
    net.train()
    total_loss, correct, total = 0, 0, 0
    
    for X, y in train_iter:
        X, y = X.to(device), y.to(device)
        
        y_hat = net(X)
        loss = criterion(y_hat, y)
        
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
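        # loss is the batch *mean*, so summing it and dividing by the number
        # of samples below gives roughly (mean per-sample loss) / batch_size
        total_loss += loss.item()
        _, predicted = torch.max(y_hat, 1)
        correct += (predicted == y).sum().item()
        total += y.size(0)
        
    train_loss = total_loss / total  # ≈ mean per-sample loss / batch_size
    train_acc = correct / total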
    
    net.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for X, y in test_iter:
            X, y = X.to(device), y.to(device)
            y_hat = net(X)
            _, predicted = torch.max(y_hat, 1)
            correct += (predicted == y).sum().item()
            total += y.size(0)
            
    test_acc = correct / total
    print(f"Epoch {epoch+1}: "
          f"train_loss={train_loss:.4f}, train_acc={train_acc:.4f}, test_acc={test_acc:.4f}")

Epoch 1: train_loss=0.0180, train_acc=0.1076, test_acc=0.1379
Epoch 2: train_loss=0.0179, train_acc=0.1088, test_acc=0.1001
Epoch 3: train_loss=0.0106, train_acc=0.4617, test_acc=0.5778
Epoch 4: train_loss=0.0075, train_acc=0.6217, test_acc=0.6464
Epoch 5: train_loss=0.0063, train_acc=0.6897, test_acc=0.7263
Epoch 6: train_loss=0.0056, train_acc=0.7305, test_acc=0.7569
Epoch 7: train_loss=0.0051, train_acc=0.7575, test_acc=0.7686
Epoch 8: train_loss=0.0046, train_acc=0.7772, test_acc=0.7828
Epoch 9: train_loss=0.0043, train_acc=0.7940, test_acc=0.8053
Epoch 10: train_loss=0.0040, train_acc=0.8098, test_acc=0.8082
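
Two things in the log above are easier to read with a reference point. The logged `train_loss` divides the sum of batch-*mean* losses by the number of samples, so it is roughly the per-sample loss divided by `batch_size`. And a classifier that spreads its probability evenly over 10 classes has a cross-entropy of $\ln 10 \approx 2.3026$. The sketch below, using synthetic labels, shows that $2.3026 / 128 \approx 0.0180$, the value logged in epochs 1 and 2: the network was still at chance level there (train_acc ≈ 0.10) before it started learning in epoch 3.

```python
import math
import torch
from torch import nn

num_classes, batch_size = 10, 128
logits = torch.zeros(batch_size, num_classes)          # uniform prediction over 10 classes
labels = torch.randint(0, num_classes, (batch_size,))  # synthetic labels
loss = nn.CrossEntropyLoss()(logits, labels)           # batch-mean loss

print(loss.item(), math.log(num_classes))  # both ≈ 2.3026
print(loss.item() / batch_size)            # ≈ 0.0180
```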


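## Networks Using Blocks ($\mathrm{VGG}$)
- Stacks small ($3\times3$) convolution kernels instead of using large ones, and groups them into reusable blocks.
### $\mathrm{VGG}$ Block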
![VGG](../image/VGG.jpg)
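
Why small kernels: two stacked 3×3 convolutions see the same 5×5 receptive field as one 5×5 convolution, but need fewer weights ($2 \cdot 9C^2 = 18C^2$ versus $25C^2$) and add an extra nonlinearity. The sketch below checks this for an illustrative $C = 64$, ignoring biases.

```python
import torch
from torch import nn

C = 64  # input and output channels (illustrative)

# One 5x5 convolution
big = nn.Conv2d(C, C, kernel_size=5, padding=2, bias=False)
# Two stacked 3x3 convolutions: same 5x5 receptive field
small = nn.Sequential(
    nn.Conv2d(C, C, kernel_size=3, padding=1, bias=False),
    nn.ReLU(),
    nn.Conv2d(C, C, kernel_size=3, padding=1, bias=False),
)

def num_params(m):
    """Total number of trainable values in a module."""
    return sum(p.numel() for p in m.parameters())

x = torch.randn(1, C, 32, 32)
print(big(x).shape, small(x).shape)        # both torch.Size([1, 64, 32, 32])
print(num_params(big), num_params(small))  # 102400 73728
```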

In [8]:
def vgg_block(num_convs, in_channels, out_channels):
    layers = []
    for _ in range(num_convs):
        layers.append(nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1))
        layers.append(nn.ReLU(inplace=True))
        in_channels = out_channels
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*layers)

In [9]:
conv_arch = ((1, 64), (1, 128), (2, 256), (2, 512), (2, 512))
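
`conv_arch` describes 8 convolutional layers; together with the three `nn.Linear` layers of `fc_part` in `vgg` below, that gives the 11-layer VGG-11 configuration. Each of the five blocks ends with a 2×2, stride-2 max pool, so a 224×224 input shrinks by $2^5 = 32$ to 7×7, which is where `out_channels * 7 * 7` comes from. A small arithmetic check:

```python
conv_arch = ((1, 64), (1, 128), (2, 256), (2, 512), (2, 512))

num_conv_layers = sum(num_convs for num_convs, _ in conv_arch)
num_fc_layers = 3  # the three nn.Linear layers in fc_part
print(num_conv_layers + num_fc_layers)  # 11 -> VGG-11

size = 224
for _ in conv_arch:  # every block ends with a 2x2, stride-2 max pool
    size //= 2
print(size)  # 7
```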

In [16]:
def vgg(conv_arch, in_channels=1, num_classes=10):
    layers = []
    out_channels = None
    for (num_convs, out_channels) in conv_arch:
        layers.append(vgg_block(num_convs, in_channels, out_channels))
        in_channels = out_channels
        
    conv_part = nn.Sequential(*layers)
    
    fc_part = nn.Sequential(
        nn.Flatten(),
        nn.Linear(out_channels * 7 * 7, 4096),
        nn.ReLU(inplace=True),
        nn.Dropout(0.5),
        nn.Linear(4096, 4096),
        nn.ReLU(inplace=True),
        nn.Dropout(0.5),
        nn.Linear(4096, num_classes),
    )
    
    net = nn.Sequential(conv_part, fc_part)
    return net

net = vgg(conv_arch, in_channels=1, num_classes=10)

In [17]:
X = torch.randn(1, 1, 224, 224)

for _, Blk in net.named_children():
    for name, blk in Blk.named_children():
        X = blk(X)
        print(f"{name} output shape:\t", X.shape)

0 output shape:	 torch.Size([1, 64, 112, 112])
1 output shape:	 torch.Size([1, 128, 56, 56])
2 output shape:	 torch.Size([1, 256, 28, 28])
3 output shape:	 torch.Size([1, 512, 14, 14])
4 output shape:	 torch.Size([1, 512, 7, 7])
0 output shape:	 torch.Size([1, 25088])
1 output shape:	 torch.Size([1, 4096])
2 output shape:	 torch.Size([1, 4096])
3 output shape:	 torch.Size([1, 4096])
4 output shape:	 torch.Size([1, 4096])
5 output shape:	 torch.Size([1, 4096])
6 output shape:	 torch.Size([1, 4096])
7 output shape:	 torch.Size([1, 10])


In [18]:
ratio = 4
small_conv_arch = [(pair[0], pair[1] // ratio) for pair in conv_arch]
net = vgg(small_conv_arch)

In [20]:
lr, num_epochs, batch_size = 0.05, 10, 128
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

transform = transforms.Compose([
    transforms.Resize(224),
    transforms.ToTensor()
])

train_dataset = datasets.FashionMNIST(root="./data", train=True, download=True, transform=transform)
test_dataset = datasets.FashionMNIST(root="./data", train=False, download=True, transform=transform)

train_iter = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_iter = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

net.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(net.parameters(), lr=lr)

for epoch in range(num_epochs):
    net.train()
    total_loss, correct, total = 0, 0, 0
    for X, y in train_iter:
        X, y = X.to(device), y.to(device)
        
        y_hat = net(X)
        loss = criterion(y_hat, y)
        
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        total_loss += loss.item()  # batch-mean loss
        _, predicted = y_hat.max(1)
        correct += (predicted == y).sum().item()
        total += y.size(0)
        
    # Summed batch means / number of samples ≈ mean per-sample loss / batch_size
    train_loss = total_loss / total
    train_acc = correct / total
    
    net.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for X, y in test_iter:
            X, y = X.to(device), y.to(device)
            y_hat = net(X)
            _, predicted = y_hat.max(1)
            correct += (predicted == y).sum().item()
            total += y.size(0)
    test_acc = correct / total

    print(f"Epoch {epoch+1}: "
          f"train_loss={train_loss:.4f}, train_acc={train_acc:.4f}, test_acc={test_acc:.4f}")

Epoch 1: train_loss=0.0180, train_acc=0.0996, test_acc=0.1000
Epoch 2: train_loss=0.0180, train_acc=0.0969, test_acc=0.1000
Epoch 3: train_loss=0.0180, train_acc=0.1011, test_acc=0.1000
Epoch 4: train_loss=0.0177, train_acc=0.1845, test_acc=0.3522
Epoch 5: train_loss=0.0069, train_acc=0.6548, test_acc=0.7322
Epoch 6: train_loss=0.0038, train_acc=0.8178, test_acc=0.8519
Epoch 7: train_loss=0.0029, train_acc=0.8631, test_acc=0.8683
Epoch 8: train_loss=0.0025, train_acc=0.8846, test_acc=0.8552
Epoch 9: train_loss=0.0022, train_acc=0.8949, test_acc=0.8829
Epoch 10: train_loss=0.0020, train_acc=0.9045, test_acc=0.8870


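## Network in Network ($\mathrm{NiN}$)
- Replaces the linear convolution filter with a **small multilayer perceptron** applied at every pixel (implemented with $1\times1$ convolutions), adding nonlinearity and increasing expressive power; at the end, global average pooling replaces the fully connected layers.
### $\mathrm{NiN}$ Block
- A small module: an ordinary convolution followed by two $1\times1$ convolutions, each followed by ReLU.

![NiN block](../image/NiN.jpg)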
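
A $1\times1$ convolution is exactly a fully connected layer applied independently at every pixel, with the channels as features; this is what lets NiN place a small MLP at each location. The sketch below copies the weights of an `nn.Conv2d(8, 4, kernel_size=1)` into an `nn.Linear(8, 4)` and checks that both give the same result; the sizes are arbitrary.

```python
import torch
from torch import nn

torch.manual_seed(0)
conv1x1 = nn.Conv2d(8, 4, kernel_size=1)
linear = nn.Linear(8, 4)
with torch.no_grad():
    linear.weight.copy_(conv1x1.weight.view(4, 8))  # (4, 8, 1, 1) -> (4, 8)
    linear.bias.copy_(conv1x1.bias)

x = torch.randn(2, 8, 5, 5)  # (batch, channels, height, width)
out_conv = conv1x1(x)
# Move channels last, apply the linear layer at every pixel, move channels back
out_linear = linear(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)

same = torch.allclose(out_conv, out_linear, atol=1e-5)
print(same)  # True
```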

In [13]:
def nin_block(in_channels, out_channels, kernel_size, stride, padding, use_relu=True):
    layers = [
        nn.Conv2d(in_channels, out_channels, kernel_size=kernel_size,
                  stride=stride, padding=padding),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_channels, out_channels, kernel_size=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_channels, out_channels, kernel_size=1)
    ]
    if use_relu:
        layers.append(nn.ReLU(inplace=True))
    return nn.Sequential(*layers)

### $\mathrm{NiN}$ Model

In [10]:
net = nn.Sequential(
    nin_block(1, 96, kernel_size=11, stride=4, padding=0),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nin_block(96, 256, kernel_size=5, stride=1, padding=2),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nin_block(256, 384, kernel_size=3, stride=1, padding=1),
    nn.MaxPool2d(kernel_size=3, stride=2),
    # Dropout is left out for now; add it back once the model converges
    nin_block(384, 10, kernel_size=3, stride=1, padding=1, use_relu=False),  # no ReLU in the last block: its outputs are the class scores
    nn.AdaptiveAvgPool2d((1, 1)),
    nn.Flatten(),
)
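
The model above has no fully connected head: the last NiN block outputs one channel per class, and `nn.AdaptiveAvgPool2d((1, 1))` followed by `nn.Flatten()` turns each channel map into a single score by averaging it (global average pooling). The sketch below, on random 5×5 maps (the spatial size the last block produces for 224×224 inputs), confirms that this is just a mean over height and width and that it has no parameters.

```python
import torch
from torch import nn

torch.manual_seed(0)
feature_maps = torch.randn(2, 10, 5, 5)  # stand-in for the last NiN block's output

head = nn.Sequential(nn.AdaptiveAvgPool2d((1, 1)), nn.Flatten())
logits = head(feature_maps)

matches = torch.allclose(logits, feature_maps.mean(dim=(2, 3)), atol=1e-6)
num_head_params = sum(p.numel() for p in head.parameters())
print(logits.shape)     # torch.Size([2, 10])
print(matches)          # True
print(num_head_params)  # 0
```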

In [14]:
lr, num_epochs, batch_size = 0.01, 10, 128
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
net = net.to(device)

In [15]:
transform = transforms.Compose([
    transforms.Resize(224),
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))  # map [0, 1] to [-1, 1]; centered inputs speed up convergence
])

train_dataset = datasets.FashionMNIST(root="./data", train=True, download=True, transform=transform)
test_dataset = datasets.FashionMNIST(root="./data", train=False, download=True, transform=transform)

train_iter = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=2)
test_iter = DataLoader(test_dataset, batch_size=batch_size, shuffle=False, num_workers=2)


criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(net.parameters(), lr=lr, momentum=0.9)

def train(net, train_iter, test_iter, num_epochs, device, optimizer, criterion):
    for epoch in range(num_epochs):
        net.train()
        train_loss, train_correct, total = 0, 0, 0
        for X, y in train_iter:
            X, y = X.to(device), y.to(device)
            optimizer.zero_grad()
            y_hat = net(X)
            loss = criterion(y_hat, y)
            loss.backward()
            optimizer.step()

            train_loss += loss.item() * y.size(0)
            train_correct += (y_hat.argmax(1) == y).sum().item()
            total += y.size(0)

        net.eval()
        test_correct, test_total = 0, 0
        with torch.no_grad():
            for X, y in test_iter:
                X, y = X.to(device), y.to(device)
                y_hat = net(X)
                test_correct += (y_hat.argmax(1) == y).sum().item()
                test_total += y.size(0)

        print(f"Epoch {epoch+1}: "
              f"train loss {train_loss/total:.4f}, "
              f"train acc {train_correct/total:.3f}, "
              f"test acc {test_correct/test_total:.3f}")

train(net, train_iter, test_iter, num_epochs, device, optimizer, criterion)

Epoch 1: train loss 2.3030, train acc 0.099, test acc 0.172
Epoch 2: train loss 2.3027, train acc 0.100, test acc 0.100
Epoch 3: train loss 2.2459, train acc 0.138, test acc 0.267
Epoch 4: train loss 1.2608, train acc 0.502, test acc 0.710
Epoch 5: train loss 0.7279, train acc 0.733, test acc 0.736
Epoch 6: train loss 0.6055, train acc 0.773, test acc 0.780
Epoch 7: train loss 0.5368, train acc 0.801, test acc 0.817
Epoch 8: train loss 0.4836, train acc 0.822, test acc 0.811
Epoch 9: train loss 0.4461, train acc 0.836, test acc 0.843
Epoch 10: train loss 0.4152, train acc 0.848, test acc 0.857


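## Networks with Parallel Concatenations ($\mathrm{GoogLeNet}$)
- Runs several convolutions in parallel and concatenates their outputs, using $1\times1$ convolutions to reduce the number of channels.
### $\mathrm{Inception}$ Block
- Four parallel branches ($1\times1$, $3\times3$ and $5\times5$ convolutions, plus $3\times3$ max pooling) whose outputs are concatenated along the channel dimension.
- The code below additionally places batch normalization after every convolution, which the original GoogLeNet did not have.

![Inception block](../image/Inception块.jpg)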

In [2]:
class Inception(nn.Module):
    def __init__(self, in_channels, c1, c2, c3, c4):
        super(Inception, self).__init__()
        # Branch 1: 1x1 convolution
        self.p1 = nn.Sequential(
            nn.Conv2d(in_channels, c1, kernel_size=1),
            nn.BatchNorm2d(c1),
            nn.ReLU(inplace=True)
        )
        # Branch 2: 1x1 convolution, then 3x3 convolution
        self.p2 = nn.Sequential(
            nn.Conv2d(in_channels, c2[0], kernel_size=1),
            nn.BatchNorm2d(c2[0]),
            nn.ReLU(inplace=True),
            nn.Conv2d(c2[0], c2[1], kernel_size=3, padding=1),
            nn.BatchNorm2d(c2[1]),
            nn.ReLU(inplace=True)
        )
        # Branch 3: 1x1 convolution, then 5x5 convolution
        self.p3 = nn.Sequential(
            nn.Conv2d(in_channels, c3[0], kernel_size=1),
            nn.BatchNorm2d(c3[0]),
            nn.ReLU(inplace=True),
            nn.Conv2d(c3[0], c3[1], kernel_size=5, padding=2),
            nn.BatchNorm2d(c3[1]),
            nn.ReLU(inplace=True)
        )
        # Branch 4: 3x3 max pooling, then 1x1 convolution
        self.p4 = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_channels, c4, kernel_size=1),
            nn.BatchNorm2d(c4),
            nn.ReLU(inplace=True)
        )

    def forward(self, x):
        return torch.cat([self.p1(x), self.p2(x), self.p3(x), self.p4(x)], dim=1)
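
The $1\times1$ convolutions in branches 2 and 3 reduce the number of channels before the expensive 3×3 and 5×5 convolutions. For the first block of `MiniGoogLeNet` below, `Inception(192, 64, (96, 128), (16, 32), 32)`, branch 3 goes from 192 to 16 to 32 channels; the sketch compares its weight count with a direct 5×5 convolution from 192 to 32 channels (biases and batch normalization ignored).

```python
from torch import nn

def num_params(m):
    """Total number of trainable values in a module."""
    return sum(p.numel() for p in m.parameters())

# Direct 5x5 convolution: 192 -> 32 channels
direct = nn.Conv2d(192, 32, kernel_size=5, padding=2, bias=False)
# Branch 3 style: 1x1 reduction to 16 channels, then 5x5 to 32 channels
bottleneck = nn.Sequential(
    nn.Conv2d(192, 16, kernel_size=1, bias=False),
    nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=5, padding=2, bias=False),
)
print(num_params(direct), num_params(bottleneck))  # 153600 15872
```

The reduction cuts the weight count by roughly a factor of 10 here.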

### $\mathrm{GoogLeNet}$ Model

![GoogLeNet](../image/GoogLeNet模型.jpg)

In [3]:
class MiniGoogLeNet(nn.Module):
    def __init__(self, num_classes=10):
        super(MiniGoogLeNet, self).__init__()
        self.b1 = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(3,2,1)
        )
        self.b2 = nn.Sequential(
            nn.Conv2d(64, 64, kernel_size=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 192, kernel_size=3, padding=1),
            nn.BatchNorm2d(192),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(3,2,1)
        )
        self.b3 = nn.Sequential(
            Inception(192, 64, (96, 128), (16, 32), 32),
            Inception(256, 128, (128, 192), (32, 96), 64),
            nn.MaxPool2d(3,2,1)
        )
        self.b4 = nn.Sequential(
            Inception(480, 192, (96, 208), (16, 48), 64),
            Inception(512, 160, (112, 224), (24, 64), 64),
            Inception(512, 128, (128, 256), (24, 64), 64),   
            Inception(512, 112, (144, 288), (32, 64), 64),
            Inception(528, 256, (160, 320), (32, 128), 128),
            nn.MaxPool2d(3,2,1)
        )
        
        self.b5 = nn.Sequential(
            Inception(832, 256, (160, 320), (32, 128), 128),
            Inception(832, 384, (192, 384), (48, 128), 128),
            nn.AdaptiveAvgPool2d((1,1))
        )
        self.fc = nn.Linear(1024, num_classes)

    def forward(self, x):
        x = self.b1(x)
        x = self.b2(x)
        x = self.b3(x)
        x = self.b4(x)
        x = self.b5(x)
        x = torch.flatten(x,1)
        x = self.fc(x)
        return x
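
Because an Inception block concatenates its four branches, its output channel count is `c1 + c2[1] + c3[1] + c4`, and each block's `in_channels` must equal the previous block's output (max pooling does not change the channel count). The arithmetic below, with a hypothetical helper `inception_out_channels`, checks the values used in `b3` to `b5` and shows where `nn.Linear(1024, num_classes)` gets its 1024.

```python
def inception_out_channels(c1, c2, c3, c4):
    """Channels produced by an Inception block: sum over the four branches."""
    return c1 + c2[1] + c3[1] + c4

# (in_channels, c1, c2, c3, c4) as used in MiniGoogLeNet b3-b5
blocks = [
    (192, 64, (96, 128), (16, 32), 32),
    (256, 128, (128, 192), (32, 96), 64),
    (480, 192, (96, 208), (16, 48), 64),
    (512, 160, (112, 224), (24, 64), 64),
    (512, 128, (128, 256), (24, 64), 64),
    (512, 112, (144, 288), (32, 64), 64),
    (528, 256, (160, 320), (32, 128), 128),
    (832, 256, (160, 320), (32, 128), 128),
    (832, 384, (192, 384), (48, 128), 128),
]

outs = [inception_out_channels(*b[1:]) for b in blocks]
print(outs)  # [256, 480, 512, 512, 512, 528, 832, 832, 1024]

# Each block's in_channels must match the previous block's output
for (cin, *_), prev_out in zip(blocks[1:], outs[:-1]):
    assert cin == prev_out
```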

In [4]:
batch_size = 128
transform = transforms.Compose([
    transforms.Resize(96),
    transforms.ToTensor()
])
train_dataset = datasets.FashionMNIST(root='./data', train=True, download=True, transform=transform)
test_dataset  = datasets.FashionMNIST(root='./data', train=False, download=True, transform=transform)
train_iter = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_iter  = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

In [5]:
# ---------- Training ----------
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
net = MiniGoogLeNet().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters(), lr=0.001)

num_epochs = 10

for epoch in range(num_epochs):
    net.train()
    train_loss, train_correct, total = 0.0,0,0
    for X, y in train_iter:
        X, y = X.to(device), y.to(device)
        optimizer.zero_grad()
        y_hat = net(X)
        loss = criterion(y_hat, y)
        loss.backward()
        optimizer.step()
        train_loss += loss.item() * y.size(0)
        train_correct += (y_hat.argmax(1) == y).sum().item()
        total += y.size(0)
    # Evaluation
    net.eval()
    test_correct, test_total = 0,0
    with torch.no_grad():
        for X, y in test_iter:
            X, y = X.to(device), y.to(device)
            y_hat = net(X)
            test_correct += (y_hat.argmax(1) == y).sum().item()
            test_total += y.size(0)
    net.train()
    print(f"Epoch {epoch+1}: train loss {train_loss/total:.4f}, "
          f"train acc {train_correct/total:.3f}, "
          f"test acc {test_correct/test_total:.3f}")

Epoch 1: train loss 0.4132, train acc 0.848, test acc 0.846
Epoch 2: train loss 0.2689, train acc 0.901, test acc 0.901
Epoch 3: train loss 0.2293, train acc 0.917, test acc 0.907
Epoch 4: train loss 0.2060, train acc 0.924, test acc 0.896
Epoch 5: train loss 0.1859, train acc 0.932, test acc 0.905
Epoch 6: train loss 0.1707, train acc 0.937, test acc 0.909
Epoch 7: train loss 0.1550, train acc 0.943, test acc 0.927
Epoch 8: train loss 0.1423, train acc 0.948, test acc 0.928
Epoch 9: train loss 0.1307, train acc 0.952, test acc 0.910
Epoch 10: train loss 0.1198, train acc 0.956, test acc 0.909


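## Batch Normalization
### Training Deep Networks
- Batch normalization (BN) standardizes the inputs of a layer with the statistics of the current mini-batch (subtract the mean, divide by the standard deviation). This keeps the distribution of intermediate features more stable, which speeds up convergence and helps against vanishing or exploding gradients. A learnable scale and shift are then applied so that the network keeps its expressive power. In a $\mathrm{CNN}$, $\mathrm{BN}$ normalizes each channel separately, computing the statistics over the batch and both spatial dimensions.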
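
In formulas, for a mini-batch $\mathcal{B}$ (in a CNN: per channel, over the batch and all spatial positions), the code below computes:

$$
\hat{\boldsymbol{\mu}}_\mathcal{B} = \frac{1}{|\mathcal{B}|}\sum_{\mathbf{x} \in \mathcal{B}} \mathbf{x}, \qquad
\hat{\boldsymbol{\sigma}}_\mathcal{B}^2 = \frac{1}{|\mathcal{B}|}\sum_{\mathbf{x} \in \mathcal{B}} \left(\mathbf{x} - \hat{\boldsymbol{\mu}}_\mathcal{B}\right)^2,
$$

$$
\mathrm{BN}(\mathbf{x}) = \boldsymbol{\gamma} \odot \frac{\mathbf{x} - \hat{\boldsymbol{\mu}}_\mathcal{B}}{\sqrt{\hat{\boldsymbol{\sigma}}_\mathcal{B}^2 + \epsilon}} + \boldsymbol{\beta},
$$

where $\epsilon > 0$ avoids division by zero and $\boldsymbol{\gamma}, \boldsymbol{\beta}$ are the learnable scale and shift. The running statistics used at inference time are exponential moving averages with the code's `momentum` $m$ (0.9 in the `BatchNorm` class below):

$$
\hat{\boldsymbol{\mu}} \leftarrow m\,\hat{\boldsymbol{\mu}} + (1 - m)\,\hat{\boldsymbol{\mu}}_\mathcal{B}, \qquad
\hat{\boldsymbol{\sigma}}^2 \leftarrow m\,\hat{\boldsymbol{\sigma}}^2 + (1 - m)\,\hat{\boldsymbol{\sigma}}_\mathcal{B}^2.
$$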
### Batch Normalization Layer

In [3]:
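def batch_norm(X, gamma, beta, moving_mean, moving_var, eps, momentum):
    """Batch normalization from scratch; returns (Y, moving_mean, moving_var)."""
    # Inference mode when autograd is disabled (e.g. inside torch.no_grad()).
    # Checking X.requires_grad instead would misclassify raw inputs during training.
    if not torch.is_grad_enabled():  # inference mode: use running statistics
        X_hat = (X - moving_mean) / torch.sqrt(moving_var + eps)
    else:  # training mode: use statistics of the current mini-batch
        assert X.dim() in (2, 4)
        if X.dim() == 2:
            # Fully connected layer: (batch, features)
            mean = X.mean(dim=0, keepdim=True)
            var = ((X - mean) ** 2).mean(dim=0, keepdim=True)
        else:
            # Convolutional layer: (batch, channels, height, width), per channel
            mean = X.mean(dim=(0, 2, 3), keepdim=True)
            var = ((X - mean) ** 2).mean(dim=(0, 2, 3), keepdim=True)

        X_hat = (X - mean) / torch.sqrt(var + eps)

        # Update the running averages in place, outside the autograd graph
        with torch.no_grad():
            moving_mean[:] = momentum * moving_mean + (1.0 - momentum) * mean
            moving_var[:] = momentum * moving_var + (1.0 - momentum) * var

    Y = gamma * X_hat + beta  # learnable scale and shift
    return Y, moving_mean, moving_var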
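
To sanity-check the training-mode formulas, the sketch below compares a manual per-channel normalization with PyTorch's built-in `nn.BatchNorm2d` (with `affine=False`, i.e. $\gamma = 1, \beta = 0$). Two conventions differ from the code above: PyTorch's `momentum` is the weight of the *new* batch statistics (default 0.1, equivalent to `momentum=0.9` here), and its `running_var` is updated with the unbiased variance, which is why only `running_mean` is compared.

```python
import torch
from torch import nn

torch.manual_seed(0)
X = torch.randn(8, 3, 4, 4)  # (batch, channels, height, width)
eps = 1e-5

# Manual training-mode BN: per-channel statistics over batch and spatial dims
mean = X.mean(dim=(0, 2, 3), keepdim=True)
var = ((X - mean) ** 2).mean(dim=(0, 2, 3), keepdim=True)  # biased variance
X_hat = (X - mean) / torch.sqrt(var + eps)

bn = nn.BatchNorm2d(3, eps=eps, momentum=0.1, affine=False)
bn.train()
Y = bn(X)

normalized_ok = torch.allclose(X_hat, Y, atol=1e-5)
# running_mean starts at 0: 0.9 * 0 + 0.1 * batch_mean
running_ok = torch.allclose(bn.running_mean, 0.1 * mean.view(-1), atol=1e-6)
print(normalized_ok, running_ok)  # True True
```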

In [11]:
class BatchNorm(nn.Module):
    def __init__(self, num_features, num_dims, eps=1e-12, momentum=0.9):
        super(BatchNorm, self).__init__()
        if num_dims == 2:  # fully connected layer
            shape = (1, num_features)
        else:  # convolutional layer
            shape = (1, num_features, 1, 1)

        # Learnable parameters: scale gamma and shift beta
        self.gamma = nn.Parameter(torch.ones(shape))
        self.beta = nn.Parameter(torch.zeros(shape))

        # Buffers (not updated by gradients): running mean and variance
        self.register_buffer("moving_mean", torch.zeros(shape))
        self.register_buffer("moving_var", torch.ones(shape))

        self.eps = eps
        self.momentum = momentum

    def forward(self, X):
        if self.training:  # training mode: mini-batch statistics
            if X.dim() == 2:  # fully connected layer
                mean = X.mean(dim=0, keepdim=True)
                var = ((X - mean) ** 2).mean(dim=0, keepdim=True)
            else:  # convolutional layer (N, C, H, W)
                mean = X.mean(dim=(0, 2, 3), keepdim=True)
                var = ((X - mean) ** 2).mean(dim=(0, 2, 3), keepdim=True)

            X_hat = (X - mean) / torch.sqrt(var + self.eps)

            # Update the running averages
            with torch.no_grad():
                self.moving_mean[:] = self.momentum * self.moving_mean + (1.0 - self.momentum) * mean
                self.moving_var[:] = self.momentum * self.moving_var + (1.0 - self.momentum) * var

        else:  # inference mode: running statistics
            X_hat = (X - self.moving_mean) / torch.sqrt(self.moving_var + self.eps)

        Y = self.gamma * X_hat + self.beta
        return Y

In [12]:
net = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5),
    BatchNorm(6, num_dims=4),
    nn.Sigmoid(),
    nn.AvgPool2d(kernel_size=2, stride=2),

    nn.Conv2d(6, 16, kernel_size=5),
    BatchNorm(16, num_dims=4),
    nn.Sigmoid(),
    nn.AvgPool2d(kernel_size=2, stride=2),

    nn.Flatten(),
    BatchNorm(256, num_dims=2),
    nn.Sigmoid(),

    nn.Linear(256, 120),
    BatchNorm(120, num_dims=2),
    nn.Sigmoid(),

    nn.Linear(120, 84),
    BatchNorm(84, num_dims=2),
    nn.Sigmoid(),

    nn.Linear(84, 10)
)


In [13]:
lr, num_epochs, batch_size = 1.0, 10, 256
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

transform = transforms.ToTensor()
train_dataset = datasets.FashionMNIST(root="./data", train=True, transform=transform, download=True)
test_dataset = datasets.FashionMNIST(root="./data", train=False, transform=transform, download=True)

train_iter = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_iter = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

net.to(device)

loss = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(net.parameters(), lr=lr)

def train(net, train_iter, test_iter, loss, num_epochs, optimizer, device):
    for epoch in range(num_epochs):
        net.train()
        train_loss, train_acc, total = 0.0, 0.0, 0
        for X, y in train_iter:
            X, y = X.to(device), y.to(device)
            optimizer.zero_grad()
            y_hat = net(X)
            l = loss(y_hat, y)
            l.backward()
            optimizer.step()

            train_loss += l.item() * y.size(0)
            train_acc += (y_hat.argmax(dim=1) == y).sum().item()
            total += y.size(0)

        test_acc = evaluate_accuracy(net, test_iter, device)
        print(f"epoch {epoch+1}, loss {train_loss/total:.4f}, "
              f"train acc {train_acc/total:.3f}, test acc {test_acc:.3f}")

def evaluate_accuracy(net, data_iter, device):
    net.eval()
    acc, total = 0, 0
    with torch.no_grad():
        for X, y in data_iter:
            X, y = X.to(device), y.to(device)
            y_hat = net(X)
            acc += (y_hat.argmax(dim=1) == y).sum().item()
            total += y.size(0)
    return acc / total

train(net, train_iter, test_iter, loss, num_epochs, optimizer, device)

epoch 1, loss 0.7894, train acc 0.713, test acc 0.682
epoch 2, loss 0.4780, train acc 0.825, test acc 0.779
epoch 3, loss 0.4012, train acc 0.853, test acc 0.821
epoch 4, loss 0.3566, train acc 0.870, test acc 0.828
epoch 5, loss 0.3313, train acc 0.879, test acc 0.799
epoch 6, loss 0.3088, train acc 0.887, test acc 0.786
epoch 7, loss 0.2973, train acc 0.891, test acc 0.827
epoch 8, loss 0.2802, train acc 0.897, test acc 0.831
epoch 9, loss 0.2745, train acc 0.899, test acc 0.859
epoch 10, loss 0.2644, train acc 0.903, test acc 0.833


In [15]:
gamma = net[1].gamma.view(-1)
beta = net[1].beta.view(-1)

print(gamma, beta)

tensor([2.4113, 3.1100, 2.2536, 3.0580, 3.1238, 3.3374], device='cuda:0',
       grad_fn=<ViewBackward0>) tensor([ 2.6976, -2.1039, -2.5220,  2.5038,  1.1402,  1.2099], device='cuda:0',
       grad_fn=<ViewBackward0>)


In [17]:
net = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5),  # assumes single-channel (grayscale) input
    nn.BatchNorm2d(6),
    nn.Sigmoid(),
    nn.AvgPool2d(kernel_size=2, stride=2),

    nn.Conv2d(6, 16, kernel_size=5),
    nn.BatchNorm2d(16),
    nn.Sigmoid(),
    nn.AvgPool2d(kernel_size=2, stride=2),

    nn.Flatten(),

    nn.Linear(16*4*4, 120),  # input size from the conv/pool stack: 28 -> 24 -> 12 -> 8 -> 4, so 16*4*4
    nn.BatchNorm1d(120),
    nn.Sigmoid(),

    nn.Linear(120, 84),
    nn.BatchNorm1d(84),
    nn.Sigmoid(),

    nn.Linear(84, 10)
)
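
`nn.BatchNorm1d` and `nn.BatchNorm2d` behave differently in training and evaluation mode, which the training loop handles via `net.train()` and `net.eval()`. The sketch below shows the consequence on a small `BatchNorm1d` layer: in training mode each feature is normalized with batch statistics, so a batch of a single sample is rejected; in evaluation mode the running statistics are used and a single sample works.

```python
import torch
from torch import nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(4)

# Training mode: every feature is normalized with this batch's mean and variance
bn.train()
out = bn(torch.randn(16, 4) * 3 + 5)
print(out.mean(dim=0))  # close to 0 for each of the 4 features

# One sample per feature gives no variance estimate, so training mode rejects it
rejected = False
try:
    bn(torch.randn(1, 4))
except ValueError as err:
    rejected = True
    print("train mode:", err)

# Evaluation mode normalizes with the running statistics instead
bn.eval()
single = bn(torch.randn(1, 4))
print(single.shape)  # torch.Size([1, 4])
```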

In [18]:
lr, num_epochs, batch_size = 1.0, 10, 256
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

transform = transforms.ToTensor()
train_dataset = datasets.FashionMNIST(root="./data", train=True, transform=transform, download=True)
test_dataset = datasets.FashionMNIST(root="./data", train=False, transform=transform, download=True)

train_iter = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_iter = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

net = net.to(device)

loss = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(net.parameters(), lr=lr)

def evaluate_accuracy(net, data_iter, device):
    net.eval()
    acc, total = 0, 0
    with torch.no_grad():
        for X, y in data_iter:
            X, y = X.to(device), y.to(device)
            y_hat = net(X)
            acc += (y_hat.argmax(dim=1) == y).sum().item()
            total += y.size(0)
    return acc / total

def train(net, train_iter, test_iter, loss, num_epochs, optimizer, device):
    for epoch in range(num_epochs):
        net.train()
        train_loss, train_acc, total = 0.0, 0.0, 0
        for X, y in train_iter:
            X, y = X.to(device), y.to(device)
            optimizer.zero_grad()
            y_hat = net(X)
            l = loss(y_hat, y)
            l.backward()
            optimizer.step()

            train_loss += l.item() * y.size(0)
            train_acc += (y_hat.argmax(dim=1) == y).sum().item()
            total += y.size(0)

        test_acc = evaluate_accuracy(net, test_iter, device)
        print(f"epoch {epoch+1}, loss {train_loss/total:.4f}, "
              f"train acc {train_acc/total:.3f}, test acc {test_acc:.3f}")

train(net, train_iter, test_iter, loss, num_epochs, optimizer, device)

epoch 1, loss 0.7853, train acc 0.715, test acc 0.601
epoch 2, loss 0.4869, train acc 0.822, test acc 0.729
epoch 3, loss 0.4149, train acc 0.848, test acc 0.806
epoch 4, loss 0.3747, train acc 0.862, test acc 0.695
epoch 5, loss 0.3535, train acc 0.869, test acc 0.847
epoch 6, loss 0.3292, train acc 0.878, test acc 0.858
epoch 7, loss 0.3129, train acc 0.884, test acc 0.797
epoch 8, loss 0.3022, train acc 0.890, test acc 0.826
epoch 9, loss 0.2889, train acc 0.894, test acc 0.844
epoch 10, loss 0.2767, train acc 0.899, test acc 0.865


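## Residual Networks ($\mathrm{ResNet}$)
- ResNet addresses the degradation problem: beyond a certain depth, making a plain network deeper makes even its training loss go up. The remedy is the residual block.
- **Function classes**
    - Adding layers only reliably helps if the larger function class contains the smaller one; this holds when the new layers can easily represent the identity mapping $f(\mathbf{x}) = \mathbf{x}$.
    - Each residual block therefore learns the residual $g(\mathbf{x}) = f(\mathbf{x}) - \mathbf{x}$ instead of $f(\mathbf{x})$ directly; pushing $g$ to zero recovers the identity.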
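
The identity argument can be made concrete. In the toy sketch below (not the `ResidualBlock` defined later), a convolution with all weights and biases set to zero makes a plain layer output zeros. Used instead as the residual branch $g$, the same convolution gives $\mathrm{ReLU}(g(\mathbf{x}) + \mathbf{x}) = \mathbf{x}$ for a non-negative input, so the residual block reduces to the identity when its branch outputs zero.

```python
import torch
from torch import nn

torch.manual_seed(0)
x = torch.relu(torch.randn(1, 4, 8, 8))  # non-negative input, e.g. after a previous ReLU

conv = nn.Conv2d(4, 4, kernel_size=3, padding=1)
nn.init.zeros_(conv.weight)
nn.init.zeros_(conv.bias)

plain_out = torch.relu(conv(x))         # plain layer: ReLU(conv(x))
residual_out = torch.relu(conv(x) + x)  # residual block: ReLU(g(x) + x)

print(torch.equal(residual_out, x))   # True: zero residual -> identity
print(plain_out.abs().sum().item())   # 0.0: the plain layer loses the input
```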
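### Residual Block
- Main path: convolutional layers, batch normalization layers, activation functions, etc.
- Skip connection: adds the block's input directly to the output of the main path.

![Residual block](../image/残差块.jpg)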

In [158]:
import torch.nn.functional as F

# Residual block: two 3x3 convolutions plus a skip connection
class ResidualBlock(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1):
        super(ResidualBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=stride, padding=1)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=1, padding=1)
        self.bn2 = nn.BatchNorm2d(out_channels)
        
        # Shortcut: identity by default; 1x1 conv + BN when the shape changes
        self.shortcut = nn.Sequential()
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride),
                nn.BatchNorm2d(out_channels)
            )
    
    def forward(self, x):
        residual = self.shortcut(x)
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out += residual  # add the skip connection
        out = F.relu(out)
        return out

### $\mathrm{ResNet}$ Model

![ResNet model](../image/ResNet.jpg)

In [171]:
class SimpleResNet(nn.Module):
    def __init__(self, num_classes=10):
        super(SimpleResNet, self).__init__()
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU()
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        
        # Four stages of residual blocks; stages 2-4 halve the resolution and double the channels
        self.layer1 = self._make_layer(64, 64, 2)
        self.layer2 = self._make_layer(64, 128, 2, stride=2)
        self.layer3 = self._make_layer(128, 256, 2, stride=2)
        self.layer4 = self._make_layer(256, 512, 2, stride=2)

        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512, num_classes)

    @staticmethod
    def _make_layer(in_channels, out_channels, num_blocks, stride=1):
        layers = [ResidualBlock(in_channels, out_channels, stride)]
        for _ in range(1, num_blocks):
            layers.append(ResidualBlock(out_channels, out_channels))
        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.relu(self.bn1(self.conv1(x)))
        x = self.maxpool(x)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.fc(x)
        return x

In [172]:
# Print each top-level layer's output shape via forward hooks.
# PyTorch calls a forward hook as hook(module, inputs, output).
def print_shape_hook(module, inputs, output):
    print(f"Layer: {module.__class__.__name__}, Output shape: {output.shape}")

# Register the hook on every direct child of the model
def register_hooks(model):
    hooks = []
    for layer in model.children():
        hook = layer.register_forward_hook(print_shape_hook)
        hooks.append(hook)
    return hooks

model = SimpleResNet(num_classes=10)
hooks = register_hooks(model)

input_tensor = torch.randn(1, 3, 224, 224)

output = model(input_tensor)

# Remove the hooks so they stop firing and no longer hold references to the modules
for hook in hooks:
    hook.remove()

Layer: Conv2d, Output shape: torch.Size([1, 64, 112, 112])
Layer: BatchNorm2d, Output shape: torch.Size([1, 64, 112, 112])
Layer: ReLU, Output shape: torch.Size([1, 64, 112, 112])
Layer: MaxPool2d, Output shape: torch.Size([1, 64, 56, 56])
Layer: Sequential, Output shape: torch.Size([1, 64, 56, 56])
Layer: Sequential, Output shape: torch.Size([1, 128, 28, 28])
Layer: Sequential, Output shape: torch.Size([1, 256, 14, 14])
Layer: Sequential, Output shape: torch.Size([1, 512, 7, 7])
Layer: AdaptiveAvgPool2d, Output shape: torch.Size([1, 512, 1, 1])
Layer: Linear, Output shape: torch.Size([1, 10])
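The spatial sizes printed above follow the usual conv/pool size formula $\lfloor (H + 2p - k)/s \rfloor + 1$. The sketch below reproduces them. It assumes that the stem of `SimpleResNet` (defined earlier, not shown in this section) is the standard ResNet one: a 7×7 conv with stride 2 and padding 3, followed by a 3×3 max-pool with stride 2 and padding 1. It also assumes that each strided stage downsamples with a 3×3, stride-2, padding-1 conv. Both assumptions are consistent with the printed shapes. The helper names are hypothetical.

```python
def out_size(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a conv/pool layer (floor mode, dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def resnet_spatial_trace(size: int = 224) -> list:
    """Height/width after the stem conv, stem max-pool, and layer1..layer4."""
    sizes = []
    size = out_size(size, kernel=7, stride=2, padding=3)  # stem 7x7 conv, stride 2
    sizes.append(size)
    size = out_size(size, kernel=3, stride=2, padding=1)  # stem 3x3 max-pool, stride 2
    sizes.append(size)
    sizes.append(size)  # layer1: stride 1, size unchanged
    for _ in range(3):  # layer2..layer4: first block uses stride 2
        size = out_size(size, kernel=3, stride=2, padding=1)
        sizes.append(size)
    return sizes


print(resnet_spatial_trace(224))  # [112, 56, 56, 28, 14, 7]
```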


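## Densely Connected Networks (DenseNet)
- Each layer in a dense network takes as input the concatenation of the outputs of all preceding layers.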
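The same idea can be written in the notation of the DenseNet paper. $H_\ell$ is the composite BN-ReLU-Conv function of layer $\ell$, $[\cdot]$ denotes channel-wise concatenation, $k_0$ is the number of input channels and $k$ is the growth rate. The input to layer $\ell$ then has $k_0 + k(\ell - 1)$ channels, which corresponds to `num_input_features + i * growth_rate` in `_DenseBlock` with $i = \ell - 1$. The ResNet update is included for contrast:

$$
\text{ResNet: } \mathbf{x}_\ell = H_\ell(\mathbf{x}_{\ell-1}) + \mathbf{x}_{\ell-1},
\qquad
\text{DenseNet: } \mathbf{x}_\ell = H_\ell\big([\mathbf{x}_0, \mathbf{x}_1, \ldots, \mathbf{x}_{\ell-1}]\big),
\qquad
C^{\text{in}}_\ell = k_0 + k(\ell - 1)
$$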
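### From ResNet to DenseNet
- ResNet adds a layer's output to its input element-wise. DenseNet instead concatenates them along the channel dimension.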
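The difference comes down to a single tensor operation. In this sketch, random tensors stand in for a layer's input and output. Addition requires identical shapes and keeps the channel count. Concatenation only requires matching batch, height and width, and the channel counts add up:

```python
import torch

x = torch.randn(1, 64, 8, 8)   # input feature map with 64 channels
fx = torch.randn(1, 64, 8, 8)  # stand-in for a layer's output f(x)

residual = x + fx                  # ResNet: element-wise sum, shapes must match
dense = torch.cat([x, fx], dim=1)  # DenseNet: concatenate along channels

g = torch.randn(1, 32, 8, 8)           # concat only needs matching N, H, W
dense_mixed = torch.cat([x, g], dim=1)

print(residual.shape)     # torch.Size([1, 64, 8, 8])
print(dense.shape)        # torch.Size([1, 128, 8, 8])
print(dense_mixed.shape)  # torch.Size([1, 96, 8, 8])
```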
### Dense Blocks

In [25]:
import torch.nn.functional as F

class _DenseLayer(nn.Module):
    def __init__(self, num_input_features, growth_rate, bn_size, drop_rate):
        super(_DenseLayer, self).__init__()
        self.norm1 = nn.BatchNorm2d(num_input_features)
        self.relu1 = nn.ReLU(inplace=True)
        self.conv1 = nn.Conv2d(num_input_features, bn_size * growth_rate,
                               kernel_size=1, stride=1, bias=False)
        
        self.norm2 = nn.BatchNorm2d(bn_size * growth_rate)
        self.relu2 = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(bn_size * growth_rate, growth_rate,
                               kernel_size=3, stride=1, padding=1, bias=False)
        
        self.drop_rate = drop_rate

    def forward(self, x):
        # Bottleneck composite function: BN -> ReLU -> 1x1 Conv, then BN -> ReLU -> 3x3 Conv
        out = self.conv1(self.relu1(self.norm1(x)))
        out = self.conv2(self.relu2(self.norm2(out)))
        
        if self.drop_rate > 0:
            out = F.dropout(out, p=self.drop_rate, training=self.training)
        
        # Concatenate input and output along the channel dimension
        out = torch.cat([x, out], 1)
        return out

In [26]:
class _DenseBlock(nn.Module):
    def __init__(self, num_layers, num_input_features, bn_size, growth_rate, drop_rate):
        super(_DenseBlock, self).__init__()
        layers = []
        for i in range(num_layers):
            layer = _DenseLayer(
                num_input_features + i * growth_rate,
                growth_rate,
                bn_size,
                drop_rate
            )
            layers.append(layer)
        self.layers = nn.ModuleList(layers)

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x
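Each dense layer appends exactly `growth_rate` channels, so a block with `num_layers` layers maps `c` input channels to `c + num_layers * growth_rate` output channels and leaves height and width unchanged. The sketch below checks this with a deliberately simplified dense layer. `TinyDenseLayer` and `tiny_dense_block` are hypothetical names, and the layer omits the 1×1 bottleneck and dropout of `_DenseLayer`:

```python
import torch
from torch import nn


class TinyDenseLayer(nn.Module):
    """Hypothetical simplified dense layer: BN -> ReLU -> 3x3 conv, then concat."""

    def __init__(self, in_channels: int, growth_rate: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.BatchNorm2d(in_channels),
            nn.ReLU(),
            nn.Conv2d(in_channels, growth_rate, kernel_size=3, padding=1, bias=False),
        )

    def forward(self, x):
        # Keep the input and append growth_rate new channels
        return torch.cat([x, self.net(x)], dim=1)


def tiny_dense_block(num_layers: int, in_channels: int, growth_rate: int) -> nn.Sequential:
    # Layer i receives in_channels + i * growth_rate channels, as in _DenseBlock
    return nn.Sequential(*[
        TinyDenseLayer(in_channels + i * growth_rate, growth_rate)
        for i in range(num_layers)
    ])


block = tiny_dense_block(num_layers=4, in_channels=3, growth_rate=10)
x = torch.randn(2, 3, 8, 8)
y = block(x)
print(y.shape)  # torch.Size([2, 43, 8, 8]): 3 + 4 * 10 channels
```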

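### Transition Layers
- Each dense block increases the number of channels, so stacking several blocks would make the model too complex. A transition layer keeps this in check. A $1\times 1$ convolution reduces the number of channels, and an average pooling layer with stride 2 halves the height and width, which further reduces model complexity.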

In [27]:
class _Transition(nn.Module):
    def __init__(self, num_input_features, num_output_features):
        super(_Transition, self).__init__()
        self.norm = nn.BatchNorm2d(num_input_features)
        self.relu = nn.ReLU(inplace=True)
        self.conv = nn.Conv2d(num_input_features, num_output_features,
                             kernel_size=1, stride=1, bias=False)
        self.pool = nn.AvgPool2d(kernel_size=2, stride=2)

    def forward(self, x):
        x = self.conv(self.relu(self.norm(x)))
        x = self.pool(x)
        return x
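The sketch below applies the same BN, ReLU, 1×1 conv and 2×2 average-pool (stride 2) sequence as `_Transition` to a feature map shaped like the output of DenseNet-121's first dense block for a 224×224 input (256 channels at 56×56). The helper name `make_transition` is hypothetical. Odd spatial sizes are rounded down by the pooling layer:

```python
import torch
from torch import nn


def make_transition(in_channels: int, out_channels: int) -> nn.Sequential:
    # Same layer order as _Transition: BN -> ReLU -> 1x1 conv -> 2x2 avg pool, stride 2
    return nn.Sequential(
        nn.BatchNorm2d(in_channels),
        nn.ReLU(),
        nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False),
        nn.AvgPool2d(kernel_size=2, stride=2),
    )


trans = make_transition(256, 128)  # halve the channels, as DenseNet does

x = torch.randn(2, 256, 56, 56)
y = trans(x)
print(y.shape)  # torch.Size([2, 128, 28, 28])

x_odd = torch.randn(2, 256, 7, 7)
y_odd = trans(x_odd)
print(y_odd.shape)  # torch.Size([2, 128, 3, 3]): floor((7 - 2) / 2) + 1 = 3
```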

In [29]:
class DenseNet(nn.Module):
    def __init__(self, growth_rate=32, block_config=(6, 12, 24, 16),
                 num_init_features=64, bn_size=4, drop_rate=0, num_classes=1000):
        super(DenseNet, self).__init__()
        
        # Stem: 7x7 convolution followed by max-pooling
        self.features = nn.Sequential(
            nn.Conv2d(3, num_init_features, kernel_size=7, stride=2, padding=3, bias=False),
            nn.BatchNorm2d(num_init_features),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        )
        
        # Dense blocks, each followed by a transition layer except the last
        num_features = num_init_features
        self.dense_blocks = nn.ModuleList()
        self.transitions = nn.ModuleList()
        
        for i, num_layers in enumerate(block_config):
            block = _DenseBlock(
                num_layers=num_layers,
                num_input_features=num_features,
                bn_size=bn_size,
                growth_rate=growth_rate,
                drop_rate=drop_rate
            )
            self.dense_blocks.append(block)
            num_features = num_features + num_layers * growth_rate
            
            if i != len(block_config) - 1:
                trans = _Transition(
                    num_input_features=num_features,
                    num_output_features=num_features // 2
                )
                self.transitions.append(trans)
                num_features = num_features // 2
        
        # Final BN and ReLU
        self.norm = nn.BatchNorm2d(num_features)
        self.relu = nn.ReLU(inplace=True)
        
        # Classifier
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.classifier = nn.Linear(num_features, num_classes)
        
        # Weight initialization
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight)
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.constant_(m.bias, 0)

    def forward(self, x):
        x = self.features(x)
        
        for i in range(len(self.dense_blocks)):
            x = self.dense_blocks[i](x)
            if i < len(self.transitions):
                x = self.transitions[i](x)
        
        x = self.relu(self.norm(x))
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return x

In [30]:
# Build a DenseNet-121 model
def densenet121(num_classes=1000):
    return DenseNet(
        growth_rate=32,
        block_config=(6, 12, 24, 16),
        num_init_features=64,
        num_classes=num_classes
    )

# Build a DenseNet-169 model
def densenet169(num_classes=1000):
    return DenseNet(
        growth_rate=32,
        block_config=(6, 12, 32, 32),
        num_init_features=64,
        num_classes=num_classes
    )
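The channel bookkeeping in `DenseNet.__init__` can be traced by hand. The sketch below mirrors that loop: each block adds `num_layers * growth_rate` channels and each transition halves the count. The final value is the `num_features` fed to `self.norm` and `self.classifier`. It also counts weighted layers using the usual convention behind the names 121 and 169: 1 stem conv, 2 convs per dense layer, 1 conv per transition and 1 fully connected layer. The function names are hypothetical:

```python
def densenet_channel_trace(num_init_features, growth_rate, block_config):
    """Channel count after each dense block and transition, mirroring DenseNet.__init__."""
    c = num_init_features
    trace = []
    for i, num_layers in enumerate(block_config):
        c += num_layers * growth_rate        # each dense layer adds growth_rate channels
        trace.append(c)
        if i != len(block_config) - 1:       # every block except the last is followed by a transition
            c //= 2
            trace.append(c)
    return trace


def densenet_depth(block_config):
    """Stem conv + 2 convs per dense layer + 1 conv per transition + final FC layer."""
    return 1 + 2 * sum(block_config) + (len(block_config) - 1) + 1


print(densenet_channel_trace(64, 32, (6, 12, 24, 16)))  # [256, 128, 512, 256, 1024, 512, 1024]
print(densenet_channel_trace(64, 32, (6, 12, 32, 32)))  # [256, 128, 512, 256, 1280, 640, 1664]
print(densenet_depth((6, 12, 24, 16)), densenet_depth((6, 12, 32, 32)))  # 121 169
```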

In [33]:
if __name__ == "__main__":
    # Build the model
    model = densenet121(num_classes=10)
    
    # Random test input
    x = torch.randn(2, 3, 224, 224)
    output = model(x)
    print(f"Output shape: {output.shape}")

Output shape: torch.Size([2, 10])
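The test above uses 224×224 images, the ImageNet resolution for which the 7×7 stride-2 stem and max-pool were designed. The training script below feeds 32×32 CIFAR-10 images into the same `densenet121`. The sketch below uses the stem and `_Transition` hyperparameters from the code above to trace the spatial size before each dense block, and shows that the last block then works on 1×1 feature maps. The model still runs because `AdaptiveAvgPool2d((1, 1))` accepts any input size. For comparison, the DenseNet paper's CIFAR models use a single 3×3 convolution as the stem, which this sketch does not implement. The helper names are hypothetical:

```python
def out_size(size, kernel, stride=1, padding=0):
    # floor((H + 2p - k) / s) + 1, as for Conv2d / MaxPool2d / AvgPool2d with default settings
    return (size + 2 * padding - kernel) // stride + 1


def densenet_spatial_trace(size, num_blocks=4):
    """Input size, then the size after the stem conv, the stem pool and each transition."""
    trace = [size]
    size = out_size(size, kernel=7, stride=2, padding=3)  # stem 7x7 conv, stride 2
    trace.append(size)
    size = out_size(size, kernel=3, stride=2, padding=1)  # stem 3x3 max-pool, stride 2
    trace.append(size)
    for _ in range(num_blocks - 1):  # dense blocks keep H, W; each transition halves them
        size = out_size(size, kernel=2, stride=2)
        trace.append(size)
    return trace


print(densenet_spatial_trace(224))  # [224, 112, 56, 28, 14, 7]
print(densenet_spatial_trace(32))   # [32, 16, 8, 4, 2, 1]
```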


In [39]:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
import time
import os
from tqdm import tqdm

"""Data preprocessing: augmentation + normalization for training, normalization only for testing"""
def get_data_transforms():
    train_transform = transforms.Compose([
        transforms.RandomCrop(32, padding=4),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
    ])
    
    test_transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
    ])
    
    return train_transform, test_transform

"""Load the CIFAR-10 datasets"""
def load_datasets(train_transform, test_transform):
    train_dataset = torchvision.datasets.CIFAR10(
        root='./data', 
        train=True, 
        download=True, 
        transform=train_transform
    )
    
    test_dataset = torchvision.datasets.CIFAR10(
        root='./data', 
        train=False, 
        download=True, 
        transform=test_transform
    )
    
    return train_dataset, test_dataset

"""Train for one epoch"""
def train_model(model, train_loader, criterion, optimizer, device, epoch):
    model.train()
    running_loss = 0.0
    correct = 0
    total = 0
    
    progress_bar = tqdm(train_loader, desc=f'Epoch {epoch+1}')
    
    for batch_idx, (inputs, targets) in enumerate(progress_bar):
        inputs, targets = inputs.to(device), targets.to(device)
        
        # Clear old gradients, then run the forward pass
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, targets)
        
        # Backward pass and parameter update
        loss.backward()
        optimizer.step()
        
        # Accumulate running statistics
        running_loss += loss.item()
        _, predicted = outputs.max(1)
        total += targets.size(0)
        correct += predicted.eq(targets).sum().item()
        
        # Update the progress bar
        progress_bar.set_postfix({
            'Loss': f'{running_loss/(batch_idx+1):.3f}',
            'Acc': f'{100.*correct/total:.2f}%'
        })
    
    train_loss = running_loss / len(train_loader)
    train_acc = 100. * correct / total
    
    return train_loss, train_acc

"""Evaluate on the test set"""
def validate_model(model, test_loader, criterion, device):
    model.eval()
    running_loss = 0.0
    correct = 0
    total = 0
    
    with torch.no_grad():
        for inputs, targets in test_loader:
            inputs, targets = inputs.to(device), targets.to(device)
            
            outputs = model(inputs)
            loss = criterion(outputs, targets)
            
            running_loss += loss.item()
            _, predicted = outputs.max(1)
            total += targets.size(0)
            correct += predicted.eq(targets).sum().item()
    
    test_loss = running_loss / len(test_loader)
    test_acc = 100. * correct / total
    
    return test_loss, test_acc

"""Save a checkpoint"""
def save_checkpoint(model, optimizer, scheduler, epoch, acc, path):
    state = {
        'epoch': epoch,
        'state_dict': model.state_dict(),
        'optimizer': optimizer.state_dict(),
        'scheduler': scheduler.state_dict(),
        'accuracy': acc
    }
    torch.save(state, path)
    print(f'Checkpoint saved: {path}')

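"""Load a checkpoint"""
def load_checkpoint(model, optimizer, scheduler, checkpoint_path):
    if os.path.isfile(checkpoint_path):
        print(f'Loading checkpoint: {checkpoint_path}')
        # map_location='cpu' lets a checkpoint saved on a GPU load on any machine;
        # load_state_dict then copies the values onto the model's/optimizer's devices
        checkpoint = torch.load(checkpoint_path, map_location='cpu')
        model.load_state_dict(checkpoint['state_dict'])
        optimizer.load_state_dict(checkpoint['optimizer'])
        scheduler.load_state_dict(checkpoint['scheduler'])
        start_epoch = checkpoint['epoch'] + 1
        best_acc = checkpoint['accuracy']
        # Epochs are 0-based internally but printed 1-based in the training loop
        print(f'Resuming training from epoch {start_epoch + 1}')
        return start_epoch, best_acc
    return 0, 0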

def main():
    """Hyperparameters"""
    batch_size = 128
    learning_rate = 0.01
    momentum = 0.9
    weight_decay = 1e-4
    num_epochs = 10
    num_classes = 10
    
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    print(f'Using device: {device}')
    
    model = densenet121(num_classes=num_classes)
    model = model.to(device)
    
    train_transform, test_transform = get_data_transforms()
    train_dataset, test_dataset = load_datasets(train_transform, test_transform)
    
    train_loader = DataLoader(
        train_dataset, batch_size=batch_size, shuffle=True, num_workers=4
    )
    test_loader = DataLoader(
        test_dataset, batch_size=batch_size, shuffle=False, num_workers=4
    )
    
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(
        model.parameters(), 
        lr=learning_rate, 
        momentum=momentum, 
        weight_decay=weight_decay
    )
    
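    """Learning-rate scheduler"""
    # Multiplies the LR by 0.1 at epochs 100 and 150. With num_epochs = 10 these
    # milestones are never reached, so the LR stays at 0.01 for the whole run.
    scheduler = optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=[100, 150], gamma=0.1
    )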
    
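    """Checkpoint settings"""
    checkpoint_dir = 'checkpoints'
    os.makedirs(checkpoint_dir, exist_ok=True)
    # Only the best-so-far state is saved here; if training is interrupted it can
    # resume from this checkpoint instead of starting from scratch
    checkpoint_path = os.path.join(checkpoint_dir, 'densenet121_best.pth')
    
    """Load the checkpoint if one exists"""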
    start_epoch, best_acc = load_checkpoint(model, optimizer, scheduler, checkpoint_path)
    
    """Training log"""
    log_file = open('training_log.txt', 'a', encoding='utf-8')
    log_file.write('Start training DenseNet-121\n')
    log_file.write(f'超参数: batch_size={batch_size}, lr={learning_rate}, epochs={num_epochs}\n')
    log_file.flush()
    
    for epoch in range(start_epoch, num_epochs):
        start_time = time.time()
        
        train_loss, train_acc = train_model(
            model, train_loader, criterion, optimizer, device, epoch
        )
        
        test_loss, test_acc = validate_model(model, test_loader, criterion, device)
        
        scheduler.step()
        
        epoch_time = time.time() - start_time
        
        print(f'Epoch [{epoch+1}/{num_epochs}] | '
              f'Time: {epoch_time:.2f}s | '
              f'Train Loss: {train_loss:.4f} | Train Acc: {train_acc:.2f}% | '
              f'Test Loss: {test_loss:.4f} | Test Acc: {test_acc:.2f}% | '
              f'LR: {scheduler.get_last_lr()[0]:.6f}')
        
        """Append this epoch's results to the log file"""
        log_file.write(f'Epoch {epoch+1}: '
                       f'Train Loss: {train_loss:.4f}, Train Acc: {train_acc:.2f}%, '
                       f'Test Loss: {test_loss:.4f}, Test Acc: {test_acc:.2f}%, '
                       f'Time: {epoch_time:.2f}s\n')
        log_file.flush()
        
        """Save a checkpoint whenever test accuracy improves"""
        if test_acc > best_acc:
            best_acc = test_acc
            save_checkpoint(model, optimizer, scheduler, epoch, best_acc, checkpoint_path)
    
    print('Training finished, running final evaluation...')
    final_test_loss, final_test_acc = validate_model(model, test_loader, criterion, device)
    print(f'Final test results: Loss: {final_test_loss:.4f}, Acc: {final_test_acc:.2f}%')
    
    log_file.write(f'Final test results: Loss: {final_test_loss:.4f}, Acc: {final_test_acc:.2f}%\n')
    log_file.close()
    
    # Save the final (last-epoch) model weights
    final_model_path = os.path.join(checkpoint_dir, 'densenet121_final.pth')
    torch.save(model.state_dict(), final_model_path)
    print(f'Final model saved: {final_model_path}')

if __name__ == '__main__':
    main()

Using device: cuda


Epoch 1: 100%|██████████| 391/391 [01:06<00:00,  5.90it/s, Loss=1.682, Acc=38.37%]


Epoch [1/10] | Time: 90.52s | Train Loss: 1.6823 | Train Acc: 38.37% | Test Loss: 1.4262 | Test Acc: 47.62% | LR: 0.010000
Checkpoint saved: checkpoints\densenet121_best.pth


Epoch 2: 100%|██████████| 391/391 [01:06<00:00,  5.84it/s, Loss=1.351, Acc=50.73%]


Epoch [2/10] | Time: 91.41s | Train Loss: 1.3513 | Train Acc: 50.73% | Test Loss: 1.2571 | Test Acc: 54.86% | LR: 0.010000
Checkpoint saved: checkpoints\densenet121_best.pth


Epoch 3:  49%|████▉     | 193/391 [00:44<00:21,  9.10it/s, Loss=1.209, Acc=56.40%]Exception ignored in: <function _MultiProcessingDataLoaderIter.__del__ at 0x0000016E87A33740>
Traceback (most recent call last):
  File "D:\Anaconda3\envs\DL\Lib\site-packages\torch\utils\data\dataloader.py", line 1664, in __del__
    self._shutdown_workers()
  File "D:\Anaconda3\envs\DL\Lib\site-packages\torch\utils\data\dataloader.py", line 1622, in _shutdown_workers
    if self._persistent_workers or self._workers_status[worker_id]:
                                   ^^^^^^^^^^^^^^^^^^^^
AttributeError: '_MultiProcessingDataLoaderIter' object has no attribute '_workers_status'
Exception ignored in: <function _MultiProcessingDataLoaderIter.__del__ at 0x0000016E87A33740>
Traceback (most recent call last):
  File "D:\Anaconda3\envs\DL\Lib\site-packages\torch\utils\data\dataloader.py", line 1664, in __del__
    self._shutdown_workers()
  File "D:\Anaconda3\envs\DL\Lib\site-packages\torch\utils\data\dataloa

Epoch [3/10] | Time: 101.17s | Train Loss: 1.1893 | Train Acc: 57.17% | Test Loss: 1.1596 | Test Acc: 59.09% | LR: 0.010000
Checkpoint saved: checkpoints\densenet121_best.pth


Epoch 4: 100%|██████████| 391/391 [01:13<00:00,  5.33it/s, Loss=1.072, Acc=61.57%]


Epoch [4/10] | Time: 97.96s | Train Loss: 1.0717 | Train Acc: 61.57% | Test Loss: 1.0095 | Test Acc: 63.59% | LR: 0.010000
Checkpoint saved: checkpoints\densenet121_best.pth


Epoch 5: 100%|██████████| 391/391 [01:07<00:00,  5.80it/s, Loss=0.978, Acc=65.19%]


Epoch [5/10] | Time: 91.65s | Train Loss: 0.9778 | Train Acc: 65.19% | Test Loss: 1.0126 | Test Acc: 65.56% | LR: 0.010000
Checkpoint saved: checkpoints\densenet121_best.pth


Epoch 6: 100%|██████████| 391/391 [01:07<00:00,  5.83it/s, Loss=0.906, Acc=67.82%]


Epoch [6/10] | Time: 93.24s | Train Loss: 0.9055 | Train Acc: 67.82% | Test Loss: 0.9372 | Test Acc: 67.67% | LR: 0.010000
Checkpoint saved: checkpoints\densenet121_best.pth


Epoch 7: 100%|██████████| 391/391 [01:17<00:00,  5.08it/s, Loss=0.849, Acc=69.65%]


Epoch [7/10] | Time: 106.03s | Train Loss: 0.8488 | Train Acc: 69.65% | Test Loss: 0.8635 | Test Acc: 69.90% | LR: 0.010000
Checkpoint saved: checkpoints\densenet121_best.pth


Epoch 8: 100%|██████████| 391/391 [01:18<00:00,  5.00it/s, Loss=0.799, Acc=71.45%]


Epoch [8/10] | Time: 106.74s | Train Loss: 0.7993 | Train Acc: 71.45% | Test Loss: 0.8494 | Test Acc: 70.42% | LR: 0.010000
Checkpoint saved: checkpoints\densenet121_best.pth


Epoch 9: 100%|██████████| 391/391 [01:14<00:00,  5.23it/s, Loss=0.752, Acc=73.50%]


Epoch [9/10] | Time: 101.26s | Train Loss: 0.7517 | Train Acc: 73.50% | Test Loss: 0.8236 | Test Acc: 71.36% | LR: 0.010000
Checkpoint saved: checkpoints\densenet121_best.pth


Epoch 10: 100%|██████████| 391/391 [01:11<00:00,  5.45it/s, Loss=0.717, Acc=74.81%]


Epoch [10/10] | Time: 97.52s | Train Loss: 0.7169 | Train Acc: 74.81% | Test Loss: 0.7905 | Test Acc: 72.46% | LR: 0.010000
Checkpoint saved: checkpoints\densenet121_best.pth
Training finished, running final evaluation...
Final test results: Loss: 0.7905, Acc: 72.46%
Final model saved: checkpoints\densenet121_final.pth
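In the log above, the learning rate stays at 0.010000 for all 10 epochs because the `MultiStepLR` milestones `[100, 150]` are never reached. For the decay to take effect in a short run, the milestones must be smaller than `num_epochs`. The sketch below uses a tiny stand-in model and hypothetical milestones `[5, 8]`, and records the learning rate in effect during each epoch when `scheduler.step()` is called once per epoch, as in `main()`:

```python
import torch
from torch import nn, optim

# Hypothetical setup: a tiny model and milestones that fall inside a 10-epoch run
model = nn.Linear(4, 2)
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[5, 8], gamma=0.1)

lrs = []
for epoch in range(10):
    lrs.append(optimizer.param_groups[0]['lr'])  # LR used during this epoch
    optimizer.step()   # a real loop would run the training batches here
    scheduler.step()   # advance the schedule once per epoch

# Expected: 0.01 for epochs 0-4, about 0.001 for epochs 5-7, about 0.0001 for epochs 8-9
print(lrs)
```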
