Deep Learning Models -- A collection of various deep learning architectures, models, and tips for TensorFlow and PyTorch in Jupyter Notebooks.
- Author: Sebastian Raschka
- GitHub Repository: https://github.com/rasbt/deeplearning-models

In [1]:
%load_ext watermark
%watermark -a 'Sebastian Raschka' -v -p torch

Author: Sebastian Raschka

Python implementation: CPython
Python version       : 3.11.11
IPython version      : 8.30.0

torch: 2.6.0+cu126



- Runs on CPU or GPU (if available)

# Model Zoo -- Standardizing Images

This notebook provides an example for working with standardized images, that is, images whose pixel values have been rescaled so that each channel has zero mean and unit variance across the training set.

The general equation for z-score standardization is

$$x_i' = \frac{x_i - \mu}{\sigma}$$

where $\mu$ is the mean and $\sigma$ is the standard deviation of the training set. Here, $x_i'$ is the scaled feature value, and $x_i$ is the original feature value.

The statistics are computed per channel: for grayscale images, we would obtain 1 mean and 1 standard deviation. For RGB images (3 color channels), we would obtain 3 mean values and 3 standard deviations.
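As a minimal sketch of the per-channel idea, the snippet below standardizes a small synthetic batch with NumPy. The random data and the array names (`images`, `mu`, `sigma`) are placeholders chosen for illustration and are not part of the notebook's pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical toy "training set": 100 RGB images of size 8x8, layout (N, C, H, W)
images = rng.uniform(0.0, 1.0, size=(100, 3, 8, 8))

# One mean and one standard deviation per channel (reduce over N, H, W)
mu = images.mean(axis=(0, 2, 3))    # shape (3,)
sigma = images.std(axis=(0, 2, 3))  # shape (3,)

# Broadcast the per-channel statistics over the (N, C, H, W) array
standardized = (images - mu[None, :, None, None]) / sigma[None, :, None, None]

print(mu.shape, sigma.shape)
print(standardized.mean(axis=(0, 2, 3)))  # close to 0 for every channel
print(standardized.std(axis=(0, 2, 3)))   # close to 1 for every channel
```

For a grayscale dataset the same code applies with a channel axis of length 1, which yields a single mean and a single standard deviation.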

## Imports

In [2]:
import time
import numpy as np
import torch
import torch.nn.functional as F
from torchvision import datasets
from torchvision import transforms
from torch.utils.data import DataLoader


if torch.cuda.is_available():
    torch.backends.cudnn.deterministic = True

## Settings and Dataset

In [3]:
##########################
### SETTINGS
##########################

# Device
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")  # use the GPU if available, otherwise the CPU

# Hyperparameters
random_seed = 1
learning_rate = 0.05
num_epochs = 10  # number of training epochs
batch_size = 128  # number of examples per batch

# Architecture
num_classes = 10  # number of classes (MNIST has 10 digit classes)

### Compute the Mean and Standard Deviation for Normalization

First, we need to determine the mean and standard deviation for each color channel in the training set.

Since we assume the entire dataset does not fit into the computer memory all at once, we do this in an incremental fashion, as shown below.

In [4]:
##############################
### PRELIMINARY DATALOADER
##############################


# Load the MNIST training set
train_dataset = datasets.MNIST(root='data', 
                               train=True,  # training split
                               transform=transforms.ToTensor(),  # convert images to tensors
                               download=True)  # download the dataset if it is not present

# Create the data loader
train_loader = DataLoader(dataset=train_dataset, 
                          batch_size=batch_size,
                          shuffle=False)  # no shuffling needed for computing statistics

# Lists collecting the per-batch means and standard deviations
train_mean = []
train_std = []

# Compute the per-batch statistics
for i, image in enumerate(train_loader, 0):
    numpy_image = image[0].numpy()  # image[0] holds the image batch, image[1] the labels
    
    # Reduce over the batch, height, and width axes (0, 2, 3),
    # which leaves one value per channel (axis 1)
    batch_mean = np.mean(numpy_image, axis=(0, 2, 3))
    batch_std = np.std(numpy_image, axis=(0, 2, 3))
    
    train_mean.append(batch_mean)
    train_std.append(batch_std)

# Average the per-batch statistics over all batches.
# Note: averaging per-batch standard deviations only approximates
# the standard deviation of the full training set.
train_mean = torch.tensor(np.mean(train_mean, axis=0))
train_std = torch.tensor(np.mean(train_std, axis=0))

print('Mean:', train_mean)
print('Std Dev:', train_std)




Mean: tensor([0.1307])
Std Dev: tensor([0.3077])


**Note that**

- For RGB images (3 color channels), we would get 3 means and 3 standard deviations.

- The `transforms.ToTensor()` method converts images to the [0, 1] range, which is why the mean and standard deviation values are below 1.
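The loop above averages the per-batch standard deviations, which approximates but does not exactly equal the standard deviation of the full training set (and the last, smaller batch receives the same weight as the full ones). If exact statistics are wanted while still reading the data batch by batch, one option is to accumulate running sums of $x$ and $x^2$. The following is a sketch under that assumption; the function name `streaming_channel_stats` and the synthetic data are hypothetical and not part of the notebook.

```python
import numpy as np

def streaming_channel_stats(batches):
    """Exact per-channel mean and population std from (N, C, H, W) batches."""
    count = 0
    total = None
    total_sq = None
    for batch in batches:
        batch = np.asarray(batch, dtype=np.float64)
        if total is None:
            total = np.zeros(batch.shape[1])
            total_sq = np.zeros(batch.shape[1])
        # Number of pixels per channel contributed by this batch
        count += batch.shape[0] * batch.shape[2] * batch.shape[3]
        total += batch.sum(axis=(0, 2, 3))
        total_sq += (batch ** 2).sum(axis=(0, 2, 3))
    mean = total / count
    # Var[x] = E[x^2] - E[x]^2; clip tiny negative values caused by rounding
    var = np.maximum(total_sq / count - mean ** 2, 0.0)
    return mean, np.sqrt(var)

# Synthetic stand-in for a dataset that is read in batches of 128
rng = np.random.default_rng(1)
data = rng.uniform(0.0, 1.0, size=(1000, 1, 28, 28))
batches = (data[i:i + 128] for i in range(0, len(data), 128))

mean, std = streaming_channel_stats(batches)
print('Mean:', mean)
print('Std Dev:', std)
```

Only three small accumulators are kept in memory, so the approach has the same memory footprint as the per-batch averaging while matching the statistics computed on the full array.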

### Standardized Dataset Loader

Now we can use a custom transform function to standardize the dataset according to the mean and standard deviation we computed above.

In [5]:
custom_transform = transforms.Compose([transforms.ToTensor(),
                                       transforms.Normalize(mean=train_mean, std=train_std)])
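For intuition, `transforms.Normalize` subtracts the given mean from each channel and divides by the given standard deviation. The sketch below writes that operation out by hand in plain PyTorch; `normalize_by_hand` is a hypothetical helper for illustration, not a torchvision function, and the constants are the values printed earlier in this notebook.

```python
import torch

def normalize_by_hand(img, mean, std):
    """Standardize a (C, H, W) image tensor channel by channel."""
    # Reshape the statistics to (C, 1, 1) so they broadcast over height and width
    mean = torch.as_tensor(mean, dtype=img.dtype).view(-1, 1, 1)
    std = torch.as_tensor(std, dtype=img.dtype).view(-1, 1, 1)
    return (img - mean) / std

# A constant image whose pixels all equal the training mean maps to all zeros
img = torch.full((1, 28, 28), 0.1307)
out = normalize_by_hand(img, mean=[0.1307], std=[0.3077])
print(out.abs().max())

# A fully white pixel (1.0 after ToTensor) ends up above zero
white = normalize_by_hand(torch.ones(1, 1, 1), mean=[0.1307], std=[0.3077])
print(white)
```

Because `ToTensor()` comes first in the `Compose` pipeline, the statistics passed to `Normalize` must be computed on the [0, 1] scale, as was done above.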

In [6]:
##########################
### MNIST DATASET
##########################

# Note: transforms.ToTensor() scales input images to the 0-1 range
train_dataset = datasets.MNIST(root='data', 
                               train=True,  # training split
                               transform=custom_transform,  # ToTensor followed by Normalize
                               download=True)  # download the dataset if it is not present

test_dataset = datasets.MNIST(root='data', 
                              train=False,  # test split
                              transform=custom_transform)  # same training-set statistics as above


# Training data loader
train_loader = DataLoader(dataset=train_dataset, 
                          batch_size=batch_size,
                          shuffle=True)  # shuffle the training data

# Test data loader
test_loader = DataLoader(dataset=test_dataset, 
                         batch_size=batch_size,
                         shuffle=False)  # keep the test data in order


Check that the dataset can be loaded:

In [7]:
# Checking the dataset
for images, labels in train_loader:  
    print('Image batch dimensions:', images.shape)
    print('Image label dimensions:', labels.shape)
    break

Image batch dimensions: torch.Size([128, 1, 28, 28])
Image label dimensions: torch.Size([128])


For the given batch, check that the channel means and standard deviations are roughly 0 and 1, respectively:

In [8]:
print('Channel mean:', torch.mean(images[:, 0, :, :]))
print('Channel std:', torch.std(images[:, 0, :, :]))

Channel mean: tensor(-0.0112)
Channel std: tensor(0.9893)
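The small deviations from 0 and 1 in the output above are expected: the transform uses training-set statistics, while the check looks at a single shuffled batch of 128 images. The sketch below illustrates the effect on synthetic data with NumPy. The uniform random pixels are an assumption made for illustration; real image pixels are correlated, so per-batch fluctuations on MNIST are larger than in this toy setting.

```python
import numpy as np

rng = np.random.default_rng(123)
# Hypothetical stand-in for an image dataset, layout (N, C, H, W)
data = rng.uniform(0.0, 1.0, size=(2048, 1, 28, 28))

# Standardize with dataset-wide statistics
data = (data - data.mean()) / data.std()

# Split the shuffled dataset into batches of 128 and record each batch mean
perm = rng.permutation(len(data))
batch_means = np.array([data[perm[i:i + 128]].mean()
                        for i in range(0, len(data), 128)])

print('Dataset mean:', data.mean())
print('First batch mean:', batch_means[0])
print('Largest |batch mean|:', np.abs(batch_means).max())
```

The dataset-wide mean is zero up to floating-point error, whereas individual batch means scatter around zero.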


## Model

In [9]:
##########################
### MODEL
##########################


class ConvNet(torch.nn.Module):

    def __init__(self, num_classes):
        super(ConvNet, self).__init__()
        
        # Calculate same padding:
        # (w - k + 2*p)/s + 1 = o
        # => p = (s(o-1) - w + k)/2
        
        # 28x28x1 => 28x28x4
        self.conv_1 = torch.nn.Conv2d(in_channels=1,  # 1 input channel (grayscale)
                                      out_channels=4,
                                      kernel_size=(3, 3),
                                      stride=(1, 1),
                                      padding=1)  # same padding: (1(28-1) - 28 + 3) / 2 = 1
        # 28x28x4 => 14x14x4
        self.pool_1 = torch.nn.MaxPool2d(kernel_size=(2, 2),
                                         stride=(2, 2),
                                         padding=0)  # (2(14-1) - 28 + 2) / 2 = 0
                                       
        # 14x14x4 => 14x14x8
        self.conv_2 = torch.nn.Conv2d(in_channels=4,
                                      out_channels=8,
                                      kernel_size=(3, 3),
                                      stride=(1, 1),
                                      padding=1)  # same padding: (1(14-1) - 14 + 3) / 2 = 1
                 
        # 14x14x8 => 7x7x8
        self.pool_2 = torch.nn.MaxPool2d(kernel_size=(2, 2),
                                         stride=(2, 2),
                                         padding=0)  # (2(7-1) - 14 + 2) / 2 = 0
        
        # Fully connected layer mapping the 7*7*8 features to num_classes outputs
        self.linear_1 = torch.nn.Linear(7*7*8, num_classes)

        
    def forward(self, x):
        out = self.conv_1(x)  # first convolution
        out = F.relu(out)
        out = self.pool_1(out)  # first max pooling

        out = self.conv_2(out)  # second convolution
        out = F.relu(out)
        out = self.pool_2(out)  # second max pooling
        
        logits = self.linear_1(out.view(-1, 7*7*8))  # flatten, then apply the linear layer
        probas = F.softmax(logits, dim=1)  # class-membership probabilities
        return logits, probas


# Seed the random number generator for reproducible weight initialization
torch.manual_seed(random_seed)

# Initialize the model
model = ConvNet(num_classes=num_classes)

# Move the model to the selected device (CPU or GPU)
model = model.to(device)

# Optimizer: stochastic gradient descent (SGD)
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
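The padding comments in `ConvNet` follow from the output-size equation `(w - k + 2*p)/s + 1 = o`. As a quick sanity check, the sketch below traces the spatial size through the four layers in plain Python; the helper names `conv_output_size` and `same_padding` are made up for this illustration.

```python
def conv_output_size(w, k, p, s):
    """Output width o = floor((w - k + 2*p) / s) + 1."""
    return (w - k + 2 * p) // s + 1

def same_padding(w, k, s, o):
    """Padding p = (s*(o - 1) - w + k) / 2, the equation above solved for p."""
    return (s * (o - 1) - w + k) / 2

# Trace the spatial size of a 28x28 input through the network
size = 28
size = conv_output_size(size, k=3, p=1, s=1)  # conv_1: 28 -> 28
size = conv_output_size(size, k=2, p=0, s=2)  # pool_1: 28 -> 14
size = conv_output_size(size, k=3, p=1, s=1)  # conv_2: 14 -> 14
size = conv_output_size(size, k=2, p=0, s=2)  # pool_2: 14 -> 7

print('Final feature map:', size, 'x', size)
print('Flattened features:', size * size * 8)  # 8 output channels from conv_2
print('Padding for conv_1:', same_padding(w=28, k=3, s=1, o=28))
```

The flattened feature count matches the `7*7*8` input size of `linear_1`.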


## Training

In [10]:
def compute_accuracy(model, data_loader):
    correct_pred, num_examples = 0, 0
    with torch.no_grad():  # gradients are not needed for evaluation
        for features, targets in data_loader:
            features = features.to(device)
            targets = targets.to(device)
            logits, probas = model(features)
            _, predicted_labels = torch.max(probas, 1)
            num_examples += targets.size(0)
            correct_pred += (predicted_labels == targets).sum()
    return correct_pred.float()/num_examples * 100
    

start_time = time.time()
for epoch in range(num_epochs):
    model = model.train()
    for batch_idx, (features, targets) in enumerate(train_loader):
        
        features = features.to(device)
        targets = targets.to(device)

        ### FORWARD AND BACK PROP
        logits, probas = model(features)
        cost = F.cross_entropy(logits, targets)
        optimizer.zero_grad()
        
        cost.backward()
        
        ### UPDATE MODEL PARAMETERS
        optimizer.step()
        
        ### LOGGING
        if not batch_idx % 50:
            print('Epoch: %03d/%03d | Batch %03d/%03d | Cost: %.4f' 
                   %(epoch+1, num_epochs, batch_idx, 
                     len(train_loader), cost))
    
    model = model.eval()
    print('Epoch: %03d/%03d training accuracy: %.2f%%' % (
          epoch+1, num_epochs, 
          compute_accuracy(model, train_loader)))
    
    print('Time elapsed: %.2f min' % ((time.time() - start_time)/60))
    
print('Total Training Time: %.2f min' % ((time.time() - start_time)/60))

Epoch: 001/010 | Batch 000/469 | Cost: 2.3226
Epoch: 001/010 | Batch 050/469 | Cost: 0.6868
Epoch: 001/010 | Batch 100/469 | Cost: 0.3672
Epoch: 001/010 | Batch 150/469 | Cost: 0.1709
Epoch: 001/010 | Batch 200/469 | Cost: 0.1694
Epoch: 001/010 | Batch 250/469 | Cost: 0.1217
Epoch: 001/010 | Batch 300/469 | Cost: 0.1377
Epoch: 001/010 | Batch 350/469 | Cost: 0.1646
Epoch: 001/010 | Batch 400/469 | Cost: 0.1692
Epoch: 001/010 | Batch 450/469 | Cost: 0.1302
Epoch: 001/010 training accuracy: 93.54%
Time elapsed: 0.09 min
Epoch: 002/010 | Batch 000/469 | Cost: 0.3378
Epoch: 002/010 | Batch 050/469 | Cost: 0.0870
Epoch: 002/010 | Batch 100/469 | Cost: 0.1784
Epoch: 002/010 | Batch 150/469 | Cost: 0.1351
Epoch: 002/010 | Batch 200/469 | Cost: 0.1303
Epoch: 002/010 | Batch 250/469 | Cost: 0.1283
Epoch: 002/010 | Batch 300/469 | Cost: 0.0808
Epoch: 002/010 | Batch 350/469 | Cost: 0.1514
Epoch: 002/010 | Batch 400/469 | Cost: 0.1341
Epoch: 002/010 | Batch 450/469 | Cost: 0.0904
Epoch: 002/010 t

## Evaluation

In [11]:
print('Test accuracy: %.2f%%' % (compute_accuracy(model, test_loader)))

Test accuracy: 98.27%
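`compute_accuracy` picks the predicted label as the index of the largest softmax probability. Since softmax preserves the ordering of the values within each row, taking the argmax of the raw logits gives the same labels. The NumPy sketch below illustrates this on random numbers; the `softmax` helper and the random logits and targets are placeholders, not outputs of the trained model.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with max subtraction for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
logits = rng.normal(size=(5, 10))       # 5 examples, 10 classes
targets = rng.integers(0, 10, size=5)   # random placeholder labels

probas = softmax(logits)
pred_from_probas = probas.argmax(axis=1)
pred_from_logits = logits.argmax(axis=1)

# Accuracy in percent, computed the same way as in compute_accuracy
accuracy = (pred_from_probas == targets).mean() * 100

print('Same predictions:', (pred_from_probas == pred_from_logits).all())
print('Accuracy: %.2f%%' % accuracy)
```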


In [12]:
%watermark -iv

torch      : 2.6.0+cu126
torchvision: 0.21.0+cu126
numpy      : 2.1.2

