*Accompanying code examples of the book "Introduction to Artificial Neural Networks and Deep Learning: A Practical Guide with Applications in Python" by [Sebastian Raschka](https://sebastianraschka.com). All code examples are released under the [MIT license](https://github.com/rasbt/deep-learning-book/blob/master/LICENSE). If you find this content useful, please consider supporting the work by buying a [copy of the book](https://leanpub.com/ann-and-deeplearning).*
  
Other code examples and content are available on [GitHub](https://github.com/rasbt/deep-learning-book). The PDF and ebook versions of the book are available through [Leanpub](https://leanpub.com/ann-and-deeplearning).

In [1]:
%load_ext watermark
%watermark -a 'Sebastian Raschka' -v -p torch

Author: Sebastian Raschka

Python implementation: CPython
Python version       : 3.11.11
IPython version      : 9.0.2

torch: 2.6.0+cu126



- Runs on CPU or GPU (if available)

# Model Zoo -- Convolutional ResNet and Residual Blocks

Note that this example does not implement a very deep ResNet as described in the literature; instead, it illustrates how the residual blocks described in He et al. [1] can be implemented in PyTorch.

- [1] He, Kaiming, et al. "Deep residual learning for image recognition." *Proceedings of the IEEE conference on computer vision and pattern recognition*. 2016.  

## Imports

In [2]:
import time
import numpy as np
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision import transforms


if torch.cuda.is_available():
    torch.backends.cudnn.deterministic = True
    torch.cuda.set_per_process_memory_fraction(0.5, device=0)

## Settings and Dataset

In [3]:
##########################
### SETTINGS
##########################

# Device
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")  # use the GPU if one is available, otherwise the CPU

# Hyperparameters
random_seed = 123
learning_rate = 0.01
num_epochs = 10
batch_size = 128

# Architecture
num_classes = 10  # MNIST has 10 digit classes


##########################
### MNIST DATASET
##########################

# Note: transforms.ToTensor() scales input images to the 0-1 range
train_dataset = datasets.MNIST(root='data', 
                               train=True,  # training split
                               transform=transforms.ToTensor(),  # convert images to tensors
                               download=True)  # download the dataset if it is not present

test_dataset = datasets.MNIST(root='data', 
                              train=False,  # test split
                              transform=transforms.ToTensor())  # convert images to tensors


# Data loader for the training set
train_loader = DataLoader(dataset=train_dataset, 
                          batch_size=batch_size,
                          shuffle=True)  # reshuffle the training data every epoch

# Data loader for the test set
test_loader = DataLoader(dataset=test_dataset, 
                         batch_size=batch_size,
                         shuffle=False)  # the test set does not need shuffling


# Check the dimensions of one batch
for images, labels in train_loader:  
    print('Image batch dimensions:', images.shape)
    print('Image label dimensions:', labels.shape)
    break  # one batch is enough

Image batch dimensions: torch.Size([128, 1, 28, 28])
Image label dimensions: torch.Size([128])


## ResNet with identity blocks

The following code implements residual blocks with skip connections in which the input passed via the shortcut already has the same dimensions as the main path's output. Because the two tensors can be added directly, the block can easily learn an identity function.

Such a residual block is illustrated below:

![](../images/resnets/resnet-ex-1-1.png)
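
For an identity shortcut to work, the main path has to return a tensor with exactly the same shape as its input. The following is a minimal sketch that checks this for the layer settings used in this section (a 1x1 convolution without padding followed by a 3x3 convolution with padding 1). Batch normalization is left out for brevity, and the names `conv_a` and `conv_b` are placeholders chosen for this illustration only.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Same kernel sizes, strides, and padding as the first residual block in this section
conv_a = torch.nn.Conv2d(1, 4, kernel_size=1, stride=1, padding=0)
conv_b = torch.nn.Conv2d(4, 1, kernel_size=3, stride=1, padding=1)

x = torch.rand(2, 1, 28, 28)        # dummy batch of two MNIST-sized images
main = conv_b(F.relu(conv_a(x)))    # main path (batch norm omitted in this sketch)

assert main.shape == x.shape        # shapes agree, so the element-wise addition is valid
out = F.relu(main + x)              # identity shortcut, then ReLU
print(out.shape)
```

If the main path changed the number of channels or the spatial size, the addition would fail, which is the case the convolutional shortcut later in this notebook addresses.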

In [4]:
##########################
### MODEL
##########################


class ConvNet(torch.nn.Module):

    def __init__(self, num_classes):
        super(ConvNet, self).__init__()
        
        #########################
        ### 1st residual block
        #########################
        # 28x28x1 => 28x28x4
        self.conv_1 = torch.nn.Conv2d(in_channels=1,
                                      out_channels=4,
                                      kernel_size=(1, 1),
                                      stride=(1, 1),
                                      padding=0)
        self.conv_1_bn = torch.nn.BatchNorm2d(4)
        
        # 28x28x4 => 28x28x1
        self.conv_2 = torch.nn.Conv2d(in_channels=4,
                                      out_channels=1,
                                      kernel_size=(3, 3),
                                      stride=(1, 1),
                                      padding=1)   
        self.conv_2_bn = torch.nn.BatchNorm2d(1)
        
        
        #########################
        ### 2nd residual block
        #########################
        # 28x28x1 => 28x28x4
        self.conv_3 = torch.nn.Conv2d(in_channels=1,
                                      out_channels=4,
                                      kernel_size=(1, 1),
                                      stride=(1, 1),
                                      padding=0)
        self.conv_3_bn = torch.nn.BatchNorm2d(4)
        
        # 28x28x4 => 28x28x1
        self.conv_4 = torch.nn.Conv2d(in_channels=4,
                                      out_channels=1,
                                      kernel_size=(3, 3),
                                      stride=(1, 1),
                                      padding=1)   
        self.conv_4_bn = torch.nn.BatchNorm2d(1)

        #########################
        ### Fully connected
        #########################        
        self.linear_1 = torch.nn.Linear(28*28*1, num_classes)  # flattened 28x28x1 input -> class scores


    def forward(self, x):
        
        #########################
        ### 1st residual block
        #########################
        shortcut = x  # keep the input for the skip connection
        
        out = self.conv_1(x)
        out = self.conv_1_bn(out)
        out = F.relu(out)

        out = self.conv_2(out)
        out = self.conv_2_bn(out)
        
        out += shortcut  # skip connection
        out = F.relu(out)
        
        #########################
        ### 2nd residual block
        #########################
        
        shortcut = out  # keep the block input for the skip connection
        
        out = self.conv_3(out)
        out = self.conv_3_bn(out)
        out = F.relu(out)

        out = self.conv_4(out)
        out = self.conv_4_bn(out)
        
        out += shortcut  # skip connection
        out = F.relu(out)
        
        #########################
        ### Fully connected
        #########################   
        logits = self.linear_1(out.view(-1, 28*28*1))  # flatten, then apply the linear layer
        probas = F.softmax(logits, dim=1)  # class-membership probabilities
        return logits, probas


# Set the random seed for reproducibility
torch.manual_seed(random_seed)

# Initialize the model
model = ConvNet(num_classes=num_classes)
model = model.to(device)  # move the model to the selected device

# Set up the optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)


### Training

In [5]:
def compute_accuracy(model, data_loader):
    correct_pred, num_examples = 0, 0  # running counts of correct predictions and examples
    for i, (features, targets) in enumerate(data_loader):
        features = features.to(device)
        targets = targets.to(device)
        logits, probas = model(features)  # forward pass
        _, predicted_labels = torch.max(probas, 1)  # class with the highest probability
        num_examples += targets.size(0)
        correct_pred += (predicted_labels == targets).sum()
    return correct_pred.float()/num_examples * 100  # accuracy in percent


start_time = time.time()
for epoch in range(num_epochs):
    model = model.train()  # training mode
    for batch_idx, (features, targets) in enumerate(train_loader):
        features = features.to(device)
        targets = targets.to(device)
        
        ### FORWARD AND BACK PROP
        logits, probas = model(features)
        cost = F.cross_entropy(logits, targets)
        optimizer.zero_grad()  # clear gradients from the previous step
        
        cost.backward()  # compute gradients
        
        ### UPDATE MODEL PARAMETERS
        optimizer.step()
        
        ### LOGGING
        if not batch_idx % 50:  # log every 50 batches
            print ('Epoch: %03d/%03d | Batch %03d/%03d | Cost: %.4f' 
                   %(epoch+1, num_epochs, batch_idx, 
                     len(train_loader), cost))

    model = model.eval()  # evaluation mode: batch norm uses its running statistics instead of updating them
    with torch.set_grad_enabled(False):  # no gradients are needed for inference, which saves memory
        print('Epoch: %03d/%03d training accuracy: %.2f%%' % (
              epoch+1, num_epochs, 
              compute_accuracy(model, train_loader)))

    print('Time elapsed: %.2f min' % ((time.time() - start_time)/60))
    
print('Total Training Time: %.2f min' % ((time.time() - start_time)/60))

Epoch: 001/010 | Batch 000/469 | Cost: 2.6800
Epoch: 001/010 | Batch 050/469 | Cost: 0.2592
Epoch: 001/010 | Batch 100/469 | Cost: 0.3554
Epoch: 001/010 | Batch 150/469 | Cost: 0.2801
Epoch: 001/010 | Batch 200/469 | Cost: 0.4034
Epoch: 001/010 | Batch 250/469 | Cost: 0.3138
Epoch: 001/010 | Batch 300/469 | Cost: 0.4060
Epoch: 001/010 | Batch 350/469 | Cost: 0.2429
Epoch: 001/010 | Batch 400/469 | Cost: 0.2464
Epoch: 001/010 | Batch 450/469 | Cost: 0.3419
Epoch: 001/010 training accuracy: 91.41%
Time elapsed: 0.06 min
Epoch: 002/010 | Batch 000/469 | Cost: 0.3225
Epoch: 002/010 | Batch 050/469 | Cost: 0.2185
Epoch: 002/010 | Batch 100/469 | Cost: 0.3148
Epoch: 002/010 | Batch 150/469 | Cost: 0.2088
Epoch: 002/010 | Batch 200/469 | Cost: 0.3212
Epoch: 002/010 | Batch 250/469 | Cost: 0.2088
Epoch: 002/010 | Batch 300/469 | Cost: 0.2894
Epoch: 002/010 | Batch 350/469 | Cost: 0.3921
Epoch: 002/010 | Batch 400/469 | Cost: 0.2781
Epoch: 002/010 | Batch 450/469 | Cost: 0.2877
Epoch: 002/010 t

### Evaluation

In [6]:
print('Test accuracy: %.2f%%' % (compute_accuracy(model, test_loader)))

Test accuracy: 92.03%
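
The `compute_accuracy` function above picks the class with the highest softmax probability. Because softmax preserves the ordering of the scores within each row, taking the argmax of the raw logits should give the same labels. The sketch below illustrates this on random dummy scores; the tensor shapes are assumptions chosen for the example, not values from the notebook.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

logits = torch.randn(5, 10)              # dummy scores: 5 examples, 10 classes
probas = F.softmax(logits, dim=1)        # each row sums to 1

labels_from_probas = torch.argmax(probas, dim=1)
labels_from_logits = torch.argmax(logits, dim=1)

# Softmax is monotonic per row, so both argmax calls pick the same class
print(torch.equal(labels_from_probas, labels_from_logits))
```

The probabilities are still useful when a confidence value is needed, but they are not required for computing accuracy.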


## ResNet with convolutional blocks for resizing

The following code implements residual blocks with skip connections in which the input passed via the shortcut is resized to match the dimensions of the main path's output.

Such a residual block is illustrated below:

![](../images/resnets/resnet-ex-1-2.png)
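
To see why the shortcut needs its own strided 1x1 convolution, it helps to compute the spatial sizes on both paths. The sketch below uses the standard convolution output-size formula, floor((n + 2p - k) / s) + 1 for dilation 1, with the kernel sizes, strides, and padding used in this section. The helper name `conv_out_size` is hypothetical and only serves this illustration.

```python
import torch


def conv_out_size(n, kernel, stride, padding):
    """Spatial output size of a convolution with dilation 1."""
    return (n + 2 * padding - kernel) // stride + 1


# Main path: 3x3 conv (stride 2, padding 1) followed by 1x1 conv (stride 1, no padding)
main_size = conv_out_size(conv_out_size(28, 3, 2, 1), 1, 1, 0)
# Shortcut: 1x1 conv (stride 2, no padding)
shortcut_size = conv_out_size(28, 1, 2, 0)
print(main_size, shortcut_size)

# Cross-check the formula against PyTorch on a dummy batch
x = torch.rand(2, 1, 28, 28)
main = torch.nn.Conv2d(4, 8, kernel_size=1)(
    torch.nn.Conv2d(1, 4, kernel_size=3, stride=2, padding=1)(x))
shortcut = torch.nn.Conv2d(1, 8, kernel_size=1, stride=2)(x)
assert main.shape == shortcut.shape  # both paths can now be added element-wise
print(main.shape)
```

The same calculation applied to a 14x14 input gives 7x7 on both paths, which matches the `7*7*32` input size of the final linear layer.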

In [7]:
##########################
### MODEL
##########################



class ConvNet(torch.nn.Module):

    def __init__(self, num_classes):
        super(ConvNet, self).__init__()
        
        #########################
        ### 1st residual block
        #########################
        # 28x28x1 => 14x14x4 
        self.conv_1 = torch.nn.Conv2d(in_channels=1,
                                      out_channels=4,
                                      kernel_size=(3, 3),
                                      stride=(2, 2),
                                      padding=1)
        self.conv_1_bn = torch.nn.BatchNorm2d(4)
        
        # 14x14x4 => 14x14x8
        self.conv_2 = torch.nn.Conv2d(in_channels=4,
                                      out_channels=8,
                                      kernel_size=(1, 1),
                                      stride=(1, 1),
                                      padding=0)   
        self.conv_2_bn = torch.nn.BatchNorm2d(8)
        
        # 28x28x1 => 14x14x8
        self.conv_shortcut_1 = torch.nn.Conv2d(in_channels=1,
                                               out_channels=8,
                                               kernel_size=(1, 1),
                                               stride=(2, 2),
                                               padding=0)   
        self.conv_shortcut_1_bn = torch.nn.BatchNorm2d(8)
        
        #########################
        ### 2nd residual block
        #########################
        # 14x14x8 => 7x7x16 
        self.conv_3 = torch.nn.Conv2d(in_channels=8,
                                      out_channels=16,
                                      kernel_size=(3, 3),
                                      stride=(2, 2),
                                      padding=1)
        self.conv_3_bn = torch.nn.BatchNorm2d(16)
        
        # 7x7x16 => 7x7x32
        self.conv_4 = torch.nn.Conv2d(in_channels=16,
                                      out_channels=32,
                                      kernel_size=(1, 1),
                                      stride=(1, 1),
                                      padding=0)   
        self.conv_4_bn = torch.nn.BatchNorm2d(32)
        
        # 14x14x8 => 7x7x32 
        self.conv_shortcut_2 = torch.nn.Conv2d(in_channels=8,
                                               out_channels=32,
                                               kernel_size=(1, 1),
                                               stride=(2, 2),
                                               padding=0)   
        self.conv_shortcut_2_bn = torch.nn.BatchNorm2d(32)

        #########################
        ### Fully connected
        #########################        
        self.linear_1 = torch.nn.Linear(7*7*32, num_classes)  # flattened 7x7x32 features -> class scores
        
        
    def forward(self, x):
        
        #########################
        ### 1st residual block
        #########################
        shortcut = x  # keep the input for the skip connection
        
        out = self.conv_1(x) # 28x28x1 => 14x14x4 
        out = self.conv_1_bn(out)
        out = F.relu(out)

        out = self.conv_2(out) # 14x14x4 => 14x14x8
        out = self.conv_2_bn(out)
        
        # match the shortcut dimensions with a linear projection (no ReLU)
        shortcut = self.conv_shortcut_1(shortcut)
        shortcut = self.conv_shortcut_1_bn(shortcut)
        
        out += shortcut  # residual connection
        out = F.relu(out)
        
        #########################
        ### 2nd residual block
        #########################
        
        shortcut = out  # keep the block input for the skip connection
        
        out = self.conv_3(out) # 14x14x8 => 7x7x16 
        out = self.conv_3_bn(out)
        out = F.relu(out)

        out = self.conv_4(out) # 7x7x16 => 7x7x32
        out = self.conv_4_bn(out)
        
        # match the shortcut dimensions with a linear projection (no ReLU)
        shortcut = self.conv_shortcut_2(shortcut)
        shortcut = self.conv_shortcut_2_bn(shortcut)
        
        out += shortcut  # residual connection
        out = F.relu(out)
        
        #########################
        ### Fully connected
        #########################   
        logits = self.linear_1(out.view(-1, 7*7*32))  # flatten, then apply the linear layer
        probas = F.softmax(logits, dim=1)  # class-membership probabilities
        return logits, probas


torch.manual_seed(random_seed)  # set the random seed for reproducibility
model = ConvNet(num_classes=num_classes)
model = model.to(device)  # move the model to the selected device

optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)


### Training

In [8]:
def compute_accuracy(model, data_loader):
    correct_pred, num_examples = 0, 0  # running counts of correct predictions and examples
    for i, (features, targets) in enumerate(data_loader):
        features = features.to(device)
        targets = targets.to(device)
        logits, probas = model(features)  # forward pass
        _, predicted_labels = torch.max(probas, 1)  # class with the highest probability
        num_examples += targets.size(0)
        correct_pred += (predicted_labels == targets).sum()
    return correct_pred.float()/num_examples * 100  # accuracy in percent


for epoch in range(num_epochs):
    model = model.train()  # training mode
    for batch_idx, (features, targets) in enumerate(train_loader):
        
        features = features.to(device)
        targets = targets.to(device)
            
        ### FORWARD AND BACK PROP
        logits, probas = model(features)
        cost = F.cross_entropy(logits, targets)
        optimizer.zero_grad()  # clear gradients from the previous step
        
        cost.backward()  # compute gradients
        
        ### UPDATE MODEL PARAMETERS
        optimizer.step()
        
        ### LOGGING
        if not batch_idx % 50:  # log every 50 batches
            print('Epoch: %03d/%03d | Batch %03d/%03d | Cost: %.4f' 
                  % (epoch+1, num_epochs, batch_idx, 
                     len(train_loader), cost))

    model = model.eval()  # evaluation mode: batch norm uses its running statistics instead of updating them
    with torch.set_grad_enabled(False):  # no gradients are needed for inference, which saves memory
        print('Epoch: %03d/%03d training accuracy: %.2f%%' % (
              epoch+1, num_epochs, 
              compute_accuracy(model, train_loader)))


Epoch: 001/010 | Batch 000/469 | Cost: 2.3534
Epoch: 001/010 | Batch 050/469 | Cost: 0.2719
Epoch: 001/010 | Batch 100/469 | Cost: 0.2472
Epoch: 001/010 | Batch 150/469 | Cost: 0.1019
Epoch: 001/010 | Batch 200/469 | Cost: 0.0748
Epoch: 001/010 | Batch 250/469 | Cost: 0.1162
Epoch: 001/010 | Batch 300/469 | Cost: 0.2745
Epoch: 001/010 | Batch 350/469 | Cost: 0.1792
Epoch: 001/010 | Batch 400/469 | Cost: 0.0572
Epoch: 001/010 | Batch 450/469 | Cost: 0.1089
Epoch: 001/010 training accuracy: 97.42%
Epoch: 002/010 | Batch 000/469 | Cost: 0.0958
Epoch: 002/010 | Batch 050/469 | Cost: 0.0521
Epoch: 002/010 | Batch 100/469 | Cost: 0.1024
Epoch: 002/010 | Batch 150/469 | Cost: 0.1421
Epoch: 002/010 | Batch 200/469 | Cost: 0.0985
Epoch: 002/010 | Batch 250/469 | Cost: 0.0494
Epoch: 002/010 | Batch 300/469 | Cost: 0.0252
Epoch: 002/010 | Batch 350/469 | Cost: 0.0329
Epoch: 002/010 | Batch 400/469 | Cost: 0.0230
Epoch: 002/010 | Batch 450/469 | Cost: 0.1180
Epoch: 002/010 training accuracy: 98.21

### Evaluation

In [9]:
print('Test accuracy: %.2f%%' % (compute_accuracy(model, test_loader)))

Test accuracy: 98.24%
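
The training loops in this notebook switch to `model.eval()` before computing accuracy so that batch normalization stops updating its running statistics. The following is a minimal sketch of that behavior on a standalone `BatchNorm2d` layer; the tensor shapes and the scaling of the second input are arbitrary assumptions made for the illustration.

```python
import torch

torch.manual_seed(0)

bn = torch.nn.BatchNorm2d(4)
x = torch.rand(8, 4, 14, 14) + 1.0    # dummy activations with a clearly non-zero mean

bn.train()                             # training mode: running statistics are updated
bn(x)
mean_after_train = bn.running_mean.clone()

bn.eval()                              # evaluation mode: running statistics are frozen
bn(x * 3.0)                            # a differently scaled batch leaves them unchanged
mean_after_eval = bn.running_mean.clone()

print(mean_after_train)
print(torch.equal(mean_after_train, mean_after_eval))
```

In training mode the running mean moves away from its initial value of zero, while the forward pass in evaluation mode leaves it untouched.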


## ResNet with convolutional blocks for resizing (using a helper class)

This is the same network as above but uses a `ResidualBlock` helper class.

In [10]:
class ResidualBlock(torch.nn.Module):

    def __init__(self, channels):
        
        super(ResidualBlock, self).__init__()
        # 1st convolution: channels[0] input channels, channels[1] output channels
        self.conv_1 = torch.nn.Conv2d(in_channels=channels[0],
                                      out_channels=channels[1],
                                      kernel_size=(3, 3),
                                      stride=(2, 2),
                                      padding=1)
        self.conv_1_bn = torch.nn.BatchNorm2d(channels[1])
        
        # 2nd convolution: channels[1] input channels, channels[2] output channels
        self.conv_2 = torch.nn.Conv2d(in_channels=channels[1],
                                      out_channels=channels[2],
                                      kernel_size=(1, 1),
                                      stride=(1, 1),
                                      padding=0)   
        self.conv_2_bn = torch.nn.BatchNorm2d(channels[2])

        # Shortcut convolution: channels[0] input channels, channels[2] output channels
        self.conv_shortcut_1 = torch.nn.Conv2d(in_channels=channels[0],
                                               out_channels=channels[2],
                                               kernel_size=(1, 1),
                                               stride=(2, 2),
                                               padding=0)   
        self.conv_shortcut_1_bn = torch.nn.BatchNorm2d(channels[2])

    def forward(self, x):
        shortcut = x  # keep the input for the residual connection
        
        out = self.conv_1(x)
        out = self.conv_1_bn(out)
        out = F.relu(out)

        out = self.conv_2(out)
        out = self.conv_2_bn(out)
        
        # match the shortcut dimensions with a linear projection (no ReLU)
        shortcut = self.conv_shortcut_1(shortcut)
        shortcut = self.conv_shortcut_1_bn(shortcut)
        
        out += shortcut  # residual connection
        out = F.relu(out)

        return out


In [11]:
##########################
### MODEL
##########################


class ConvNet(torch.nn.Module):

    def __init__(self, num_classes):
        super(ConvNet, self).__init__()
        
        self.residual_block_1 = ResidualBlock(channels=[1, 4, 8])  # 1 input, 4 intermediate, 8 output channels
        self.residual_block_2 = ResidualBlock(channels=[8, 16, 32])  # 8 input, 16 intermediate, 32 output channels
    
        self.linear_1 = torch.nn.Linear(7*7*32, num_classes)  # flattened 7x7x32 features -> class scores

        
    def forward(self, x):

        out = self.residual_block_1(x)  # call the module itself rather than .forward() so that hooks run
        out = self.residual_block_2(out)
         
        logits = self.linear_1(out.view(-1, 7*7*32))  # flatten, then apply the linear layer
        probas = F.softmax(logits, dim=1)  # class-membership probabilities
        return logits, probas

    
torch.manual_seed(random_seed)  # set the random seed for reproducibility
model = ConvNet(num_classes=num_classes)

model.to(device)  # move the model to the selected device
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)


### Training

In [12]:
import torch
import torch.nn.functional as F

# Compute the accuracy (in percent) of the model on a data loader
def compute_accuracy(model, data_loader):
    correct_pred, num_examples = 0, 0
    for i, (features, targets) in enumerate(data_loader):
        # Move the batch to the selected device (GPU or CPU)
        features = features.to(device)
        targets = targets.to(device)
        
        # Forward pass
        logits, probas = model(features)
        
        # Predicted label = class with the highest probability
        _, predicted_labels = torch.max(probas, 1)
        
        # Accumulate the number of correct predictions and the number of examples
        num_examples += targets.size(0)
        correct_pred += (predicted_labels == targets).sum()
    
    return correct_pred.float() / num_examples * 100

# Training loop
for epoch in range(num_epochs):
    model = model.train()  # training mode: batch norm uses batch statistics
    
    for batch_idx, (features, targets) in enumerate(train_loader):
        features = features.to(device)
        targets = targets.to(device)
        
        # Forward and backward pass
        logits, probas = model(features)
        cost = F.cross_entropy(logits, targets)
        optimizer.zero_grad()  # clear gradients from the previous step
        cost.backward()  # compute gradients
        
        # Update model parameters
        optimizer.step()
        
        # Log every 50 batches
        # Note: the floor division below prints 468 as the batch total; len(train_loader)
        # would report 469 because it also counts the final, smaller batch.
        if batch_idx % 50 == 0:
            print('Epoch: %03d/%03d | Batch %03d/%03d | Cost: %.4f' 
                  % (epoch+1, num_epochs, batch_idx, len(train_dataset) // batch_size, cost.item()))

    # Evaluation mode: batch norm uses its running statistics instead of updating them
    model = model.eval()  
    with torch.set_grad_enabled(False):  # no gradients are needed for inference, which saves memory
        # Print the accuracy on the training set
        print('Epoch: %03d/%03d training accuracy: %.2f%%' % (
              epoch+1, num_epochs, 
              compute_accuracy(model, train_loader)))


Epoch: 001/010 | Batch 000/468 | Cost: 2.3534
Epoch: 001/010 | Batch 050/468 | Cost: 0.2719
Epoch: 001/010 | Batch 100/468 | Cost: 0.2472
Epoch: 001/010 | Batch 150/468 | Cost: 0.1019
Epoch: 001/010 | Batch 200/468 | Cost: 0.0748
Epoch: 001/010 | Batch 250/468 | Cost: 0.1162
Epoch: 001/010 | Batch 300/468 | Cost: 0.2745
Epoch: 001/010 | Batch 350/468 | Cost: 0.1792
Epoch: 001/010 | Batch 400/468 | Cost: 0.0572
Epoch: 001/010 | Batch 450/468 | Cost: 0.1089
Epoch: 001/010 training accuracy: 97.42%
Epoch: 002/010 | Batch 000/468 | Cost: 0.0958
Epoch: 002/010 | Batch 050/468 | Cost: 0.0521
Epoch: 002/010 | Batch 100/468 | Cost: 0.1024
Epoch: 002/010 | Batch 150/468 | Cost: 0.1421
Epoch: 002/010 | Batch 200/468 | Cost: 0.0985
Epoch: 002/010 | Batch 250/468 | Cost: 0.0494
Epoch: 002/010 | Batch 300/468 | Cost: 0.0252
Epoch: 002/010 | Batch 350/468 | Cost: 0.0329
Epoch: 002/010 | Batch 400/468 | Cost: 0.0230
Epoch: 002/010 | Batch 450/468 | Cost: 0.1180
Epoch: 002/010 training accuracy: 98.21

### Evaluation

In [13]:
print('Test accuracy: %.2f%%' % (compute_accuracy(model, test_loader)))

Test accuracy: 98.24%
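
The evaluation cells in this notebook call `compute_accuracy` directly and rely on the model still being in evaluation mode from the last training epoch. One possible way to make this explicit is a wrapper that sets the mode and disables gradient tracking itself. The sketch below is an assumption-laden variant, not the notebook's own function: `evaluate` and `DummyModel` are hypothetical names, and the hand-made batch only serves to make the example self-contained.

```python
import torch


def evaluate(model, data_loader, device):
    """Accuracy in percent for a model whose forward pass returns (logits, probas)."""
    model.eval()                      # batch norm uses its running statistics
    correct, total = 0, 0
    with torch.no_grad():             # no autograd graph is built during evaluation
        for features, targets in data_loader:
            features = features.to(device)
            targets = targets.to(device)
            logits, _ = model(features)
            correct += (logits.argmax(dim=1) == targets).sum().item()
            total += targets.size(0)
    return 100.0 * correct / total


class DummyModel(torch.nn.Module):
    """Hypothetical stand-in that treats its input as the logits."""

    def forward(self, x):
        return x, torch.softmax(x, dim=1)


# A single hand-made batch: the first prediction is right, the second is wrong
batch = (torch.tensor([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]), torch.tensor([0, 2]))
accuracy = evaluate(DummyModel(), [batch], torch.device("cpu"))
print(accuracy)
```

With the networks defined in this notebook, the call would look like `evaluate(model, test_loader, device)`.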


In [14]:
%watermark -iv

numpy      : 1.26.4
torch      : 2.6.0+cu126
torchvision: 0.21.0+cu126

