*Accompanying code examples of the book "Introduction to Artificial Neural Networks and Deep Learning: A Practical Guide with Applications in Python" by [Sebastian Raschka](https://sebastianraschka.com). All code examples are released under the [MIT license](https://github.com/rasbt/deep-learning-book/blob/master/LICENSE). If you find this content useful, please consider supporting the work by buying a [copy of the book](https://leanpub.com/ann-and-deeplearning).*
  
Other code examples and content are available on [GitHub](https://github.com/rasbt/deep-learning-book). The PDF and ebook versions of the book are available through [Leanpub](https://leanpub.com/ann-and-deeplearning).

In [1]:
%load_ext watermark
%watermark -a 'Sebastian Raschka' -v -p torch

Author: Sebastian Raschka

Python implementation: CPython
Python version       : 3.11.11
IPython version      : 9.0.2

torch: 2.6.0+cu126



- Runs on CPU or GPU (if available)

# Model Zoo -- All-Convolutional Neural Network

A simple convolutional neural network that uses stride=2 in every second convolutional layer, instead of max pooling, to downsample the feature maps. Loosely based on

- Springenberg, Jost Tobias, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. "Striving for simplicity: The all convolutional net." arXiv preprint arXiv:1412.6806 (2014).
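
Both downsampling options shrink a 28x28 input to 14x14. The sketch below illustrates this with the standard output-size formula o = floor((w - k + 2p) / s) + 1. It assumes PyTorch's floor rounding for both convolution and pooling layers. The helper `conv_output_size` is a hypothetical name for illustration and is not part of the book's code.

```python
import math

def conv_output_size(w, k, s, p):
    """Hypothetical helper: spatial output size of a conv or pooling layer,
    assuming floor rounding: floor((w - k + 2p) / s) + 1."""
    return math.floor((w - k + 2 * p) / s) + 1

# 3x3 convolution with stride 2 and padding 1 (the downsampling used here)
strided_conv = conv_output_size(28, k=3, s=2, p=1)

# 2x2 max pooling with stride 2 and no padding (the layer it replaces)
max_pool = conv_output_size(28, k=2, s=2, p=0)

print(strided_conv, max_pool)  # 14 14
```

The spatial result is the same. The difference is that the strided convolution learns its 3x3 weights, whereas max pooling is a fixed operation without parameters.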

## Imports

In [2]:
import time
import numpy as np
import torch
import torch.nn.functional as F
from torchvision import datasets
from torchvision import transforms
from torch.utils.data import DataLoader


if torch.cuda.is_available():
    torch.backends.cudnn.deterministic = True

## Settings and Dataset

In [5]:
##########################
### SETTINGS
##########################

# Device: use cuda:0 if a GPU is available, otherwise fall back to the CPU
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Hyperparameters
random_seed = 1          # random seed for reproducibility
learning_rate = 0.001    # learning rate
num_epochs = 15          # number of training epochs
batch_size = 256         # number of examples per batch

# Number of output classes (MNIST has 10 classes: digits 0-9)
num_classes = 10


##########################
### MNIST DATASET
##########################

# Note: transforms.ToTensor() scales input images to the 0-1 range

# Training set
train_dataset = datasets.MNIST(root='data', 
                               train=True, 
                               transform=transforms.ToTensor(),
                               download=True)

# Test set
test_dataset = datasets.MNIST(root='data', 
                              train=False, 
                              transform=transforms.ToTensor())

# DataLoader for the training set; shuffle the examples for training
train_loader = DataLoader(dataset=train_dataset, 
                          batch_size=batch_size, 
                          shuffle=True)

# DataLoader for the test set; keep the original order
test_loader = DataLoader(dataset=test_dataset, 
                         batch_size=batch_size, 
                         shuffle=False)

# Check the dimensions of one training batch
for images, labels in train_loader:  
    print('Image batch dimensions:', images.shape)    # e.g. [256, 1, 28, 28]
    print('Image label dimensions:', labels.shape)    # e.g. [256]
    break


Image batch dimensions: torch.Size([256, 1, 28, 28])
Image label dimensions: torch.Size([256])
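
Two simple calculations explain the batch shape above and the `Batch 000/235` counter in the training log. This minimal sketch makes three assumptions:

- The MNIST training split has 60,000 images.
- The `DataLoader` uses its default `drop_last=False`, so the last, smaller batch is kept.
- `ToTensor()` divides 8-bit pixel values by 255.

The `scale_pixel` helper is hypothetical and only mimics that division.

```python
import math

num_train_images = 60000  # assumed size of the MNIST training split
batch_size = 256

def scale_pixel(value):
    """Hypothetical helper: map an 8-bit pixel value (0..255) to 0.0..1.0."""
    return value / 255.0

# With drop_last=False the final partial batch is kept, hence the ceiling
batches_per_epoch = math.ceil(num_train_images / batch_size)
last_batch_size = num_train_images - (batches_per_epoch - 1) * batch_size

print(scale_pixel(0), scale_pixel(255))    # 0.0 1.0
print(batches_per_epoch, last_batch_size)  # 235 96
```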


## Model

In [6]:
##########################
### MODEL
##########################

class ConvNet(torch.nn.Module):

    def __init__(self, num_classes):
        super(ConvNet, self).__init__()
        
        self.num_classes = num_classes

        # Formula for computing "same" padding:
        # (w - k + 2*p) / s + 1 = o
        # Solving for p: p = (s(o-1) - w + k) / 2

        # Input image: 28x28x1 => output: 28x28x4
        self.conv_1 = torch.nn.Conv2d(in_channels=1,
                                      out_channels=4,
                                      kernel_size=(3, 3),
                                      stride=(1, 1),
                                      padding=1)  # padding=1 keeps the 28x28 size unchanged

        # Downsample by half (stride 2): 28x28x4 => 14x14x4
        self.conv_2 = torch.nn.Conv2d(in_channels=4,
                                      out_channels=4,
                                      kernel_size=(3, 3),
                                      stride=(2, 2),
                                      padding=1)

        # Increase the number of channels: 14x14x4 => 14x14x8
        self.conv_3 = torch.nn.Conv2d(in_channels=4,
                                      out_channels=8,
                                      kernel_size=(3, 3),
                                      stride=(1, 1),
                                      padding=1)

        # Downsample again: 14x14x8 => 7x7x8
        self.conv_4 = torch.nn.Conv2d(in_channels=8,
                                      out_channels=8,
                                      kernel_size=(3, 3),
                                      stride=(2, 2),
                                      padding=1)

        # Increase the number of channels: 7x7x8 => 7x7x16
        self.conv_5 = torch.nn.Conv2d(in_channels=8,
                                      out_channels=16,
                                      kernel_size=(3, 3),
                                      stride=(1, 1),
                                      padding=1)

        # Downsample: 7x7x16 => 4x4x16
        self.conv_6 = torch.nn.Conv2d(in_channels=16,
                                      out_channels=16,
                                      kernel_size=(3, 3),
                                      stride=(2, 2),
                                      padding=1)

        # Final convolution for classification: 4x4x16 => 4x4xnum_classes (one feature map per class)
        self.conv_7 = torch.nn.Conv2d(in_channels=16,
                                      out_channels=self.num_classes,
                                      kernel_size=(3, 3),
                                      stride=(1, 1),
                                      padding=1)


    def forward(self, x):
        out = self.conv_1(x)
        out = F.relu(out)
        
        out = self.conv_2(out)
        out = F.relu(out)

        out = self.conv_3(out)
        out = F.relu(out)

        out = self.conv_4(out)
        out = F.relu(out)
        
        out = self.conv_5(out)
        out = F.relu(out)
        
        out = self.conv_6(out)
        out = F.relu(out)
        
        out = self.conv_7(out)
        out = F.relu(out)
        
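        # Global average pooling over each channel: 4x4 => 1x1
        logits = F.adaptive_avg_pool2d(out, 1)

        # Drop the width dimension (last dim)
        logits.squeeze_(-1)

        # Drop the height dimension (originally second-to-last, now the last dim)
        logits.squeeze_(-1)

        # Softmax over the num_classes outputs of each example gives class probabilities
        probas = torch.softmax(logits, dim=1)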

        return logits, probas


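# Set the random seed so that weight initialization is reproducible
torch.manual_seed(random_seed)

# Instantiate the model
model = ConvNet(num_classes=num_classes)

# Move the model to the selected device (GPU or CPU)
model = model.to(device)

# Adam optimizer with the learning rate defined above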
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
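
The shape comments in `ConvNet` can be checked by hand. The sketch below does this with two hypothetical helpers that are not part of the book's code:

- `conv_output_size` traces the spatial size through `conv_1` to `conv_7`.
- `required_padding` evaluates the padding formula from the comments.

It assumes `num_classes=10` and PyTorch's floor rounding in the output-size formula.

```python
def conv_output_size(w, k, s, p):
    """Hypothetical helper: floor((w - k + 2p) / s) + 1 for non-negative ints."""
    return (w - k + 2 * p) // s + 1

def required_padding(w, k, s, o):
    """Hypothetical helper: p = (s(o - 1) - w + k) / 2, solved for a target size o."""
    return (s * (o - 1) - w + k) / 2

# (kernel, stride, padding, out_channels) for conv_1 ... conv_7, num_classes=10
layers = [(3, 1, 1, 4), (3, 2, 1, 4), (3, 1, 1, 8), (3, 2, 1, 8),
          (3, 1, 1, 16), (3, 2, 1, 16), (3, 1, 1, 10)]

w = 28  # MNIST input width/height
shapes = []
for k, s, p, c in layers:
    w = conv_output_size(w, k, s, p)
    shapes.append((w, w, c))

for i, shape in enumerate(shapes, start=1):
    print(f"conv_{i}: {shape}")

# Global average pooling reduces each 4x4 map to 1x1, one value per class
print("global avg pool:", (1, 1, shapes[-1][2]))

print(required_padding(28, k=3, s=1, o=28))  # 1.0
print(required_padding(28, k=3, s=2, o=14))  # 0.5
```

The stride-1 layers get the exact integer padding 1. For the stride-2 layers the formula gives 0.5. With `padding=1`, the floor in the output-size formula discards the extra half pixel and still yields 14x14.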


## Training

In [7]:
# Compute a model's classification accuracy (in percent) on a data loader (training or test set)
def compute_accuracy(model, data_loader):
    correct_pred, num_examples = 0, 0  # running counts: correct predictions, total examples
    with torch.no_grad():  # gradients are not needed for evaluation
        for features, targets in data_loader:
            features = features.to(device)
            targets = targets.to(device)

            logits, probas = model(features)  # forward pass
            _, predicted_labels = torch.max(probas, 1)  # index of the most probable class

            num_examples += targets.size(0)  # count the examples
            correct_pred += (predicted_labels == targets).sum()  # count correct predictions

    # Return the accuracy as a percentage
    return correct_pred.float() / num_examples * 100


# =============================
# Main training loop
# =============================

start_time = time.time()  # record the start time

for epoch in range(num_epochs):
    model = model.train()  # training mode (affects layers such as Dropout and BatchNorm)

    for batch_idx, (features, targets) in enumerate(train_loader):
        
        features = features.to(device)
        targets = targets.to(device)

        ### FORWARD AND BACK PROP
        logits, probas = model(features)             # forward pass
        cost = F.cross_entropy(logits, targets)      # cross-entropy loss (expects raw logits)
        optimizer.zero_grad()                        # reset gradients from the previous step
        cost.backward()                              # backpropagate to compute gradients

        ### UPDATE MODEL PARAMETERS
        optimizer.step()                             # update the parameters

        ### LOGGING (every 50 batches)
        if not batch_idx % 50:
            print('Epoch: %03d/%03d | Batch %03d/%03d | Cost: %.4f' 
                  % (epoch+1, num_epochs, batch_idx, 
                     len(train_loader), cost))
    
    model = model.eval()  # evaluation mode (affects layers such as Dropout and BatchNorm)

    # Training-set accuracy after each epoch
    print('Epoch: %03d/%03d training accuracy: %.2f%%' % (
          epoch+1, num_epochs, 
          compute_accuracy(model, train_loader)))

    # Cumulative time elapsed since the start of training
    print('Time elapsed: %.2f min' % ((time.time() - start_time)/60))


# =============================
# Total training time
# =============================
print('Total Training Time: %.2f min' % ((time.time() - start_time)/60))


Epoch: 001/015 | Batch 000/235 | Cost: 2.3033
Epoch: 001/015 | Batch 050/235 | Cost: 2.2850
Epoch: 001/015 | Batch 100/235 | Cost: 1.9876
Epoch: 001/015 | Batch 150/235 | Cost: 1.2791
Epoch: 001/015 | Batch 200/235 | Cost: 0.8693
Epoch: 001/015 training accuracy: 76.40%
Time elapsed: 0.05 min
Epoch: 002/015 | Batch 000/235 | Cost: 0.7913
Epoch: 002/015 | Batch 050/235 | Cost: 0.6480
Epoch: 002/015 | Batch 100/235 | Cost: 0.5416
Epoch: 002/015 | Batch 150/235 | Cost: 0.5962
Epoch: 002/015 | Batch 200/235 | Cost: 0.5097
Epoch: 002/015 training accuracy: 85.66%
Time elapsed: 0.10 min
Epoch: 003/015 | Batch 000/235 | Cost: 0.4559
Epoch: 003/015 | Batch 050/235 | Cost: 0.3937
Epoch: 003/015 | Batch 100/235 | Cost: 0.4870
Epoch: 003/015 | Batch 150/235 | Cost: 0.3309
Epoch: 003/015 | Batch 200/235 | Cost: 0.3666
Epoch: 003/015 training accuracy: 89.89%
Time elapsed: 0.14 min
Epoch: 004/015 | Batch 000/235 | Cost: 0.4202
Epoch: 004/015 | Batch 050/235 | Cost: 0.2765
Epoch: 004/015 | Batch 100/235 | Cost: 0.3189
Epoch: 004/015 | Batch 150/235 | Cost
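
The first logged cost of 2.3033 is close to ln(10) ≈ 2.3026, which is the cross-entropy when all ten classes receive equal scores. The pure-Python sketch below is a hypothetical single-example re-implementation for illustration, not PyTorch's actual code. It follows the documented behavior of `F.cross_entropy` on raw logits, which combines log-softmax with the negative log-likelihood. It also shows why `compute_accuracy` could take the argmax of either `probas` or `logits`.

```python
import math

def softmax(logits):
    """Hypothetical helper: numerically stable softmax over a list of scores."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    """Hypothetical helper: negative log-probability of the true class."""
    return -math.log(softmax(logits)[target])

# Equal logits for all 10 classes (roughly the situation right after
# initialization) give a loss of ln(10)
uniform_loss = cross_entropy([0.0] * 10, target=3)
print(round(uniform_loss, 4))  # 2.3026

# Softmax is monotonic, so the argmax over probabilities equals the argmax over logits
logits = [0.5, 2.0, -1.0, 0.3]
probas = softmax(logits)
print(probas.index(max(probas)) == logits.index(max(logits)))  # True
```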

## Evaluation

In [8]:
print('Test accuracy: %.2f%%' % (compute_accuracy(model, test_loader)))

Test accuracy: 95.95%


In [9]:
%watermark -iv

torch      : 2.6.0+cu126
numpy      : 1.26.4
torchvision: 0.21.0+cu126

