### CNN Learning

    Author: 彭日骏
    Time: 2025/10/4

An attempt to write a CNN entirely on my own, train it on the MNIST dataset, and test how well the model performs.

---

In [1]:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader

is_cuda = torch.cuda.is_available()
is_cuda

True

In [2]:
# With more than one GPU, a specific card can be selected, e.g.:
# device = torch.device("cuda:0")

# Fall back to the CPU when CUDA is unavailable
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

In [3]:
# Hyperparameters
num_epochs = 10
batch_size = 100
learning_rate = 0.001

---

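#### Notes:

**Why normalize the data beforehand:**


`transforms.Normalize((mean,), (std,))`

normalizes each channel: output = (input - mean) / std

Benefits:
1. Faster convergence: gradient descent reaches a good solution more quickly and more stably
2. Usually better accuracy, and less risk of vanishing or exploding gradients

Note: for the MNIST training set, mean ≈ 0.1307 and std ≈ 0.3081

These values can be computed with the following cell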

In [4]:
transforms_for_calc = transforms.ToTensor()
train_dataset_for_calc = torchvision.datasets.MNIST(root='./data', 
                                                    train=True, 
                                                    transform=transforms_for_calc,
                                                    download=True)
loader_for_calc = DataLoader(train_dataset_for_calc, 
                             batch_size=len(train_dataset_for_calc),
                             shuffle=False)
images, labels = next(iter(loader_for_calc))
mean = images.mean()
std = images.std()

In [5]:
print(f'\nmean = {mean:.6f}, std = {std:.6f}\n')


mean = 0.130660, std = 0.308108
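
As a sanity check on the formula output = (input - mean) / std, here is a minimal plain-Python sketch (no PyTorch). The pixel values and the helper name `normalize` are made up for illustration. It standardizes a short list with its own mean and population standard deviation, after which the mean should be close to 0 and the standard deviation close to 1. This is the same effect `transforms.Normalize((0.1307,), (0.3081,))` is meant to have on the MNIST training set as a whole.

```python
import statistics


def normalize(values, mean, std):
    """Apply output = (input - mean) / std to every value."""
    return [(v - mean) / std for v in values]


# Hypothetical pixel intensities in [0, 1], the range ToTensor() produces.
pixels = [0.0, 0.0, 0.0, 0.2, 0.5, 1.0]

mean = statistics.fmean(pixels)
std = statistics.pstdev(pixels)  # population standard deviation

normalized = normalize(pixels, mean, std)

print(f"before: mean={mean:.4f}, std={std:.4f}")
print(f"after:  mean={statistics.fmean(normalized):.4f}, "
      f"std={statistics.pstdev(normalized):.4f}")
```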



---

In [6]:
# Named `transform` so that it does not shadow the imported `transforms` module
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
])

train_dataset = torchvision.datasets.MNIST(root='./data',
                                           train=True,
                                           transform=transform,
                                           download=True)

test_dataset = torchvision.datasets.MNIST(root='./data',
                                          train=False,
                                          transform=transform)

train_loader = DataLoader(dataset=train_dataset,
                          batch_size=batch_size,
                          shuffle=True)

test_loader = DataLoader(dataset=test_dataset,
                         batch_size=batch_size,
                         shuffle=False)

In [7]:
# define CNN model
class My_MNIST_CNN(nn.Module):
    def __init__(self):
        '''
        Build the network structure and define all layers
        '''
        # Conv output size:    Length(L+1) = Floor[(Pooling(L) - Kernel + 2*Padding) / Stride] + 1
        # Pooling output size: Pooling(L+1) = Length(L+1) / Pool_size
        # MNIST images are 28*28

        super(My_MNIST_CNN, self).__init__()
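        # First conv block. Conv2d arguments: in_channels (grayscale image = 1), out_channels, kernel_size, stride, padding (zero padding)
        # Length(1) = Floor[(28 - 5 + 2*2) / 1] + 1 = 28
        # Pooling(1) = 28 / 2 = 14
        self.conv1 = nn.Conv2d(1, 16, kernel_size=5, stride=1, padding=2) # Conv1d slides the kernel along one axis (sequences); Conv2d slides it along height and width (images)
        # Batch normalization: normalizes each channel over the batch, then applies a learnable scale and shift, which tends to stabilize and speed up training
        self.bn1 = nn.BatchNorm2d(16)
        # Activation
        self.relu1 = nn.ReLU()
        # Max pooling
        self.pool1 = nn.MaxPool2d(kernel_size=2, stride=2)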

        # Second conv block
        # Length(2) = Floor[(14 - 5 + 2*2) / 1] + 1 = 14
        # Pooling(2) = 14 / 2 = 7
        self.conv2 = nn.Conv2d(16, 32, kernel_size=5, stride=1, padding=2)
        self.bn2 = nn.BatchNorm2d(32)
        self.relu2 = nn.ReLU()
        self.pool2 = nn.MaxPool2d(kernel_size=2, stride=2)

        # Fully connected layer
        # The feature map has shape 7*7*32 (32 channels)
        # 10 output classes, one per digit 0-9
        self.fc = nn.Linear(7*7*32, 10)
        
    def forward(self, x):
        '''
        Forward pass
        '''
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu1(out)
        out = self.pool1(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu2(out)
        out = self.pool2(out)

        # Flatten to [batch_size, 7*7*32] before the fully connected layer
        out = out.reshape(out.size(0), -1)

        out = self.fc(out)
        return out

# Move the model to the selected device
model = My_MNIST_CNN().to(device)
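
The size arithmetic in the comments of the model above can be checked with a small helper. This is a plain-Python sketch; the function names `conv_out` and `pool_out` are illustrative and not part of PyTorch. It applies the output-size formula to the two conv/pool blocks and confirms that a 28*28 input ends up as a 7*7 feature map, so the fully connected layer receives 7*7*32 = 1568 features.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel, stride):
    """Output length of max pooling (no padding) along one spatial dimension."""
    return (size - kernel) // stride + 1


size = 28  # MNIST images are 28x28
for out_channels in (16, 32):
    size = conv_out(size, kernel=5, stride=1, padding=2)  # padding=2 keeps the size
    size = pool_out(size, kernel=2, stride=2)             # 2x2 pooling halves it
    print(f"channels={out_channels}, spatial size={size}x{size}")

flattened = size * size * 32
print("fc input features:", flattened)  # 1568
```

If the kernel size, padding, or pooling is changed, the first argument of `nn.Linear` has to be updated to match the new flattened size.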

In [8]:
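# Train Model

# Cross-entropy loss
criterion = nn.CrossEntropyLoss()

# Handy: autograd tracks gradients automatically
# Use the Adam optimizer for the updates (a gradient descent variant with per-parameter adaptive step sizes; I have not studied it in detail yet)
optimizer = optim.Adam(model.parameters(), lr=learning_rate)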

total_step = len(train_loader)
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
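        # Move the batch to the training device
        images = images.to(device)
        labels = labels.to(device)

        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)

        # Backward pass and parameter update
        optimizer.zero_grad() # clear gradients left over from the previous step
        loss.backward()       # compute the gradients of the current loss
        optimizer.step()      # update the model parameters from those gradients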

        if (i + 1) % 100 == 0:
            print(f'Epoch [{epoch+1}/{num_epochs}], Step [{i+1}/{total_step}], Loss: {loss.item():.4f}')

print("\nTraining finished!\n")

Epoch [1/10], Step [100/600], Loss: 0.0975
Epoch [1/10], Step [200/600], Loss: 0.1193
Epoch [1/10], Step [300/600], Loss: 0.0397
Epoch [1/10], Step [400/600], Loss: 0.0704
Epoch [1/10], Step [500/600], Loss: 0.0282
Epoch [1/10], Step [600/600], Loss: 0.0270
Epoch [2/10], Step [100/600], Loss: 0.0343
Epoch [2/10], Step [200/600], Loss: 0.0555
Epoch [2/10], Step [300/600], Loss: 0.0411
Epoch [2/10], Step [400/600], Loss: 0.0857
Epoch [2/10], Step [500/600], Loss: 0.0476
Epoch [2/10], Step [600/600], Loss: 0.0265
Epoch [3/10], Step [100/600], Loss: 0.1388
Epoch [3/10], Step [200/600], Loss: 0.0449
Epoch [3/10], Step [300/600], Loss: 0.1353
Epoch [3/10], Step [400/600], Loss: 0.0175
Epoch [3/10], Step [500/600], Loss: 0.0124
Epoch [3/10], Step [600/600], Loss: 0.0107
Epoch [4/10], Step [100/600], Loss: 0.0368
Epoch [4/10], Step [200/600], Loss: 0.0938
Epoch [4/10], Step [300/600], Loss: 0.0304
Epoch [4/10], Step [400/600], Loss: 0.0147
Epoch [4/10], Step [500/600], Loss: 0.0278
Epoch [4/10

---

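#### Notes:

**About outputs:**

`outputs` has shape [batch_size, num_classes]. Each row corresponds to one image in the batch and each column holds the logit (raw score) for one class; the class with the largest logit is the model's prediction.

`torch.max(outputs, dim=1)`

reduces over the second dimension (dim=1): for each row of the 100*10 matrix it returns the largest of the 10 values together with its index.

This maximum is a raw score, not a probability.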
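
To make the row-wise reduction concrete, here is a plain-Python sketch of what `torch.max(outputs, dim=1)` returns. The helper `max_along_rows` and the logit values are hypothetical, chosen only for illustration; the real call operates on tensors. Note that the returned values can exceed 1, which is one way to see that they are scores and not probabilities.

```python
def max_along_rows(rows):
    """Return (values, indices) of the per-row maximum, like torch.max(x, dim=1)."""
    values, indices = [], []
    for row in rows:
        best = max(range(len(row)), key=row.__getitem__)  # index of the largest entry
        values.append(row[best])
        indices.append(best)
    return values, indices


# Hypothetical logits for a batch of 2 images and 3 classes.
logits = [
    [1.2, -0.3, 4.1],
    [0.5, 2.0, -1.0],
]

values, predicted = max_along_rows(logits)
print(values)     # [4.1, 2.0]
print(predicted)  # [2, 1]
```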

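**About the cross-entropy loss:**

`criterion = nn.CrossEntropyLoss()`

For numerical stability and efficiency, it combines two steps internally:
1. `LogSoftmax`: applied to the raw model outputs (logits). `Softmax` turns the raw scores into a probability distribution, and `LogSoftmax` takes the logarithm of those probabilities.
2. `NLLLoss` (Negative Log Likelihood Loss): the negative log-likelihood between the `LogSoftmax` output and the true labels.

Because `nn.CrossEntropyLoss()` already includes the `Softmax` and logarithm steps, the model should output raw logits, with no `Softmax` layer of its own.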
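
The two-step decomposition can be written out by hand. The following plain-Python sketch is an illustration of the math only, not PyTorch's actual implementation. The function names and the example logits are assumptions. It computes a log-softmax per row, takes the negative log-probability of the true class, and averages over the batch, which matches the default `mean` reduction of `nn.CrossEntropyLoss`.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax of one row of logits."""
    m = max(logits)  # subtracting the max avoids overflow in exp()
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]


def nll(log_probs, target):
    """Negative log-likelihood of the true class."""
    return -log_probs[target]


def cross_entropy(batch_logits, targets):
    """Mean of NLL(LogSoftmax(logits), target) over the batch."""
    losses = [nll(log_softmax(row), t) for row, t in zip(batch_logits, targets)]
    return sum(losses) / len(losses)


# Hypothetical logits for 2 samples and 3 classes, with their true labels.
batch_logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
targets = [0, 2]

loss = cross_entropy(batch_logits, targets)
print(f"loss = {loss:.4f}")
```

A quick check of the sketch: with all logits equal, every class gets probability 1/C, so the loss is log(C) regardless of the label.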

To inspect the predicted probabilities, `Softmax` can be applied explicitly, for example:

```python
probabilities = torch.softmax(outputs, dim=1) 
prob_values, prob_indices = torch.max(probabilities, 1)
```
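
Softmax is monotonic, so it never changes which class is largest. That is why predictions can be taken directly from the logits and the softmax call is only needed when the probabilities themselves are of interest. A small plain-Python sketch with made-up logits (the helpers `softmax` and `argmax` are illustrative):

```python
import math


def softmax(logits):
    """Numerically stable softmax of one row of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(xs):
    """Index of the largest element."""
    return max(range(len(xs)), key=xs.__getitem__)


logits = [1.2, -0.3, 4.1]  # hypothetical raw scores for 3 classes
probs = softmax(logits)

print([round(p, 4) for p in probs])
print("sum of probabilities:", round(sum(probs), 6))
print("argmax of logits:", argmax(logits), "argmax of probs:", argmax(probs))
```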


---

In [9]:
# Evaluate Model

# Switch to evaluation mode (BatchNorm uses its running statistics)
model.eval()

with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in test_loader:
        # Move the batch to the evaluation device
        images = images.to(device)
        labels = labels.to(device)

        # Forward pass
        outputs = model(images)

        # The predicted probabilities can be inspected with:
        # probabilities = torch.softmax(outputs, dim=1) # shape [100, 10], each row sums to 1
        # prob_values, prob_indices = torch.max(probabilities, 1)

        _, predicted = torch.max(outputs, 1) # reduce over dim 1 (the 10 classes); predicted has shape [100]
        total += labels.size(0) # labels is a 1-D tensor of shape [100], so size(0) is the batch size
        correct += (predicted == labels).sum().item()

    print(f'\nAccuracy: {100 * correct / total:.2f}%\n')


Accuracy: 99.10%

