<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Model-Zoo----Convolutional-Neural-Network-(VGG16)" data-toc-modified-id="Model-Zoo----Convolutional-Neural-Network-(VGG16)-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Model Zoo -- Convolutional Neural Network (VGG16)</a></span><ul class="toc-item"><li><span><a href="#Imports" data-toc-modified-id="Imports-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Imports</a></span></li><li><span><a href="#Settings-and-Dataset" data-toc-modified-id="Settings-and-Dataset-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Settings and Dataset</a></span></li><li><span><a href="#Model" data-toc-modified-id="Model-1.3"><span class="toc-item-num">1.3&nbsp;&nbsp;</span>Model</a></span></li><li><span><a href="#Training" data-toc-modified-id="Training-1.4"><span class="toc-item-num">1.4&nbsp;&nbsp;</span>Training</a></span></li><li><span><a href="#Evaluation" data-toc-modified-id="Evaluation-1.5"><span class="toc-item-num">1.5&nbsp;&nbsp;</span>Evaluation</a></span></li></ul></li></ul></div>

Deep Learning Models -- A collection of various deep learning architectures, models, and tips for TensorFlow and PyTorch in Jupyter Notebooks.
- Author: Sebastian Raschka
- GitHub Repository: https://github.com/rasbt/deeplearning-models

In [1]:
%load_ext watermark
%watermark -a 'Sebastian Raschka' -v -p torch

Author: Sebastian Raschka

Python implementation: CPython
Python version       : 3.11.11
IPython version      : 9.0.2

torch: 2.6.0+cu126



- Runs on CPU (not recommended here) or GPU (if available)

# Model Zoo -- Convolutional Neural Network (VGG16)

## Imports

In [2]:
import time
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import datasets
from torchvision import transforms
from torch.utils.data import DataLoader


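if torch.cuda.is_available():
    # Prefer deterministic cuDNN kernels for more reproducible results
    torch.backends.cudnn.deterministic = True
    # Cap this process at 50% of the memory on GPU 0
    torch.cuda.set_per_process_memory_fraction(0.5, device=0)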

## Settings and Dataset

In [3]:
##########################
### SETTINGS
##########################

# Device: use the GPU if one is available, otherwise fall back to the CPU
DEVICE = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print('Device:', DEVICE)

# Hyperparameters
random_seed = 1            # random seed
learning_rate = 0.001      # learning rate
num_epochs = 10            # number of training epochs
batch_size = 512           # number of examples per mini-batch

# Architecture
num_features = 784         # number of input features; passed to VGG16 below but unused there
                           # (784 = 28x28 is the MNIST pixel count; a CIFAR-10 image has 3x32x32 = 3072 values)
num_classes = 10           # number of classes (CIFAR-10 has 10)

##########################
### CIFAR-10 DATASET
##########################

# Note: transforms.ToTensor() scales the input pixel values to the range [0, 1]
train_dataset = datasets.CIFAR10(root='data',  # where the data is stored
                                 train=True,   # load the training set
                                 transform=transforms.ToTensor(),  # apply the ToTensor transform
                                 download=True)  # download the dataset if it is not present yet

test_dataset = datasets.CIFAR10(root='data',   # where the data is stored
                                train=False,  # load the test set
                                transform=transforms.ToTensor())  # apply the ToTensor transform

# Training DataLoader: reads the training set in mini-batches
train_loader = DataLoader(dataset=train_dataset,
                          batch_size=batch_size,  # examples per batch
                          shuffle=True)           # reshuffle the data every epoch

# Test DataLoader: reads the test set in mini-batches
test_loader = DataLoader(dataset=test_dataset,
                         batch_size=batch_size,  # examples per batch
                         shuffle=False)          # keep the original order

# Check the dataset dimensions by inspecting one batch
for images, labels in train_loader:
    print('Image batch dimensions:', images.shape)  # (batch_size, channels, height, width)
    print('Image label dimensions:', labels.shape)  # (batch_size,)
    break  # only look at the first batch


Device: cuda:0
Image batch dimensions: torch.Size([512, 3, 32, 32])
Image label dimensions: torch.Size([512])
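The training log below reports batch indices out of 0098, which follows directly from the dataset and batch sizes. A minimal sketch, assuming the standard CIFAR-10 split of 50,000 training and 10,000 test images and the `DataLoader` default `drop_last=False` (so the final, smaller batch is kept); the helper `num_batches` is hypothetical, not part of the notebook:

```python
import math


def num_batches(num_examples, batch_size, drop_last=False):
    """Number of mini-batches a DataLoader yields per pass over the data."""
    if drop_last:
        return num_examples // batch_size
    return math.ceil(num_examples / batch_size)


train_batches = num_batches(50_000, 512)
# Size of the final, partial training batch
last_batch = 50_000 - (train_batches - 1) * 512
test_batches = num_batches(10_000, 512)

print(train_batches, last_batch, test_batches)  # 98 336 20
```

So `len(train_loader)` is 98, `batch_idx` runs from 0 to 97, and each epoch ends with a batch of 336 images rather than 512.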


## Model

In [4]:
##########################
### MODEL
##########################


class VGG16(torch.nn.Module):

    def __init__(self, num_features, num_classes):
        super(VGG16, self).__init__()
        
        # calculate same padding:
        # (w - k + 2*p)/s + 1 = o
        # => p = (s(o-1) - w + k)/2
        
        self.block_1 = nn.Sequential(
                nn.Conv2d(in_channels=3,
                          out_channels=64,
                          kernel_size=(3, 3),
                          stride=(1, 1),
                          # (1(32-1)- 32 + 3)/2 = 1
                          padding=1), 
                nn.ReLU(),
                nn.Conv2d(in_channels=64,
                          out_channels=64,
                          kernel_size=(3, 3),
                          stride=(1, 1),
                          padding=1),
                nn.ReLU(),
                nn.MaxPool2d(kernel_size=(2, 2),
                             stride=(2, 2))
        )
        
        self.block_2 = nn.Sequential(
                nn.Conv2d(in_channels=64,
                          out_channels=128,
                          kernel_size=(3, 3),
                          stride=(1, 1),
                          padding=1),
                nn.ReLU(),
                nn.Conv2d(in_channels=128,
                          out_channels=128,
                          kernel_size=(3, 3),
                          stride=(1, 1),
                          padding=1),
                nn.ReLU(),
                nn.MaxPool2d(kernel_size=(2, 2),
                             stride=(2, 2))
        )
        
        self.block_3 = nn.Sequential(        
                nn.Conv2d(in_channels=128,
                          out_channels=256,
                          kernel_size=(3, 3),
                          stride=(1, 1),
                          padding=1),
                nn.ReLU(),
                nn.Conv2d(in_channels=256,
                          out_channels=256,
                          kernel_size=(3, 3),
                          stride=(1, 1),
                          padding=1),
                nn.ReLU(),        
                nn.Conv2d(in_channels=256,
                          out_channels=256,
                          kernel_size=(3, 3),
                          stride=(1, 1),
                          padding=1),
                nn.ReLU(),
                nn.MaxPool2d(kernel_size=(2, 2),
                             stride=(2, 2))
        )
        
          
        self.block_4 = nn.Sequential(   
                nn.Conv2d(in_channels=256,
                          out_channels=512,
                          kernel_size=(3, 3),
                          stride=(1, 1),
                          padding=1),
                nn.ReLU(),        
                nn.Conv2d(in_channels=512,
                          out_channels=512,
                          kernel_size=(3, 3),
                          stride=(1, 1),
                          padding=1),
                nn.ReLU(),        
                nn.Conv2d(in_channels=512,
                          out_channels=512,
                          kernel_size=(3, 3),
                          stride=(1, 1),
                          padding=1),
                nn.ReLU(),            
                nn.MaxPool2d(kernel_size=(2, 2),
                             stride=(2, 2))
        )
        
        self.block_5 = nn.Sequential(
                nn.Conv2d(in_channels=512,
                          out_channels=512,
                          kernel_size=(3, 3),
                          stride=(1, 1),
                          padding=1),
                nn.ReLU(),            
                nn.Conv2d(in_channels=512,
                          out_channels=512,
                          kernel_size=(3, 3),
                          stride=(1, 1),
                          padding=1),
                nn.ReLU(),            
                nn.Conv2d(in_channels=512,
                          out_channels=512,
                          kernel_size=(3, 3),
                          stride=(1, 1),
                          padding=1),
                nn.ReLU(),    
                nn.MaxPool2d(kernel_size=(2, 2),
                             stride=(2, 2))             
        )
            
        self.classifier = nn.Sequential(
            nn.Linear(512, 4096),
            nn.ReLU(True),
            #nn.Dropout(p=0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(True),
            #nn.Dropout(p=0.5),
            nn.Linear(4096, num_classes),
        )
            
        for m in self.modules():
            if isinstance(m, torch.nn.Conv2d) or isinstance(m, torch.nn.Linear):
                nn.init.kaiming_uniform_(m.weight, mode='fan_in', nonlinearity='relu')
                if m.bias is not None:
                    m.bias.detach().zero_()
                    
        #self.avgpool = nn.AdaptiveAvgPool2d((7, 7))
        
        
    def forward(self, x):

        x = self.block_1(x)
        x = self.block_2(x)
        x = self.block_3(x)
        x = self.block_4(x)
        x = self.block_5(x)
        #x = self.avgpool(x)
        x = x.view(x.size(0), -1)
        logits = self.classifier(x)
        probas = F.softmax(logits, dim=1)

        return logits, probas

    
torch.manual_seed(random_seed)
model = VGG16(num_features=num_features,
              num_classes=num_classes)

model = model.to(DEVICE)
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)  
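Because every convolution uses a 3x3 kernel with stride 1 and padding 1, only the five 2x2 max-pooling layers change the spatial size, halving it each time: 32 → 16 → 8 → 4 → 2 → 1. That is why the first linear layer expects 512 = 512 x 1 x 1 input features. The plain-Python sketch below applies the output-size formula from the code comments and also counts the weights and biases of this CIFAR-10 variant; the helper names `conv_out` and `same_padding` are illustrative and not part of the notebook:

```python
def conv_out(w, k, s=1, p=0):
    """Output width of a conv/pool layer: (w - k + 2p) // s + 1."""
    return (w - k + 2 * p) // s + 1


def same_padding(w, k, s=1):
    """Padding that keeps the output width equal to w: p = (s(w-1) - w + k) / 2."""
    return (s * (w - 1) - w + k) // 2


# (in_channels, out_channels) of the 13 conv layers, grouped into 5 blocks
blocks = [
    [(3, 64), (64, 64)],
    [(64, 128), (128, 128)],
    [(128, 256), (256, 256), (256, 256)],
    [(256, 512), (512, 512), (512, 512)],
    [(512, 512), (512, 512), (512, 512)],
]

size, params = 32, 0
for block in blocks:
    for c_in, c_out in block:
        size = conv_out(size, k=3, s=1, p=1)    # 3x3 "same" conv keeps the size
        params += c_in * c_out * 3 * 3 + c_out  # weights + biases
    size = conv_out(size, k=2, s=2)             # 2x2 max pool halves the size

# Classifier: three fully connected layers
flat = 512 * size * size
for n_in, n_out in [(flat, 4096), (4096, 4096), (4096, 10)]:
    params += n_in * n_out + n_out

print(size, flat, params)  # 1 512 33638218
```

On the model defined above, `sum(p.numel() for p in model.parameters())` should report the same total; more than half of the roughly 33.6 million parameters sit in the two 4096-unit linear layers.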

## Training

In [5]:
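# Compute the classification accuracy (in %) of a model on a data loader
def compute_accuracy(model, data_loader):
    model.eval()  # evaluation mode (disables dropout, etc.)
    correct_pred, num_examples = 0, 0
    with torch.no_grad():  # no gradients are needed for evaluation
        # Iterate over all mini-batches in the data loader
        for features, targets in data_loader:

            features = features.to(DEVICE)  # move the features to the device (GPU or CPU)
            targets = targets.to(DEVICE)    # move the labels to the device

            logits, probas = model(features)  # model outputs: logits and class probabilities
            _, predicted_labels = torch.max(probas, 1)  # predicted label = index of the largest value along dim 1 (classes)

            num_examples += targets.size(0)  # running count of examples
            correct_pred += (predicted_labels == targets).sum()  # running count of correct predictions

    # Accuracy as a percentage
    return correct_pred.float() / num_examples * 100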


# Compute the average loss per example of a model on a data loader
def compute_epoch_loss(model, data_loader):
    model.eval()  # evaluation mode
    curr_loss, num_examples = 0., 0
    with torch.no_grad():  # disable gradient tracking to save memory
        # Iterate over all mini-batches in the data loader
        for features, targets in data_loader:
            features = features.to(DEVICE)  # move the features to the device
            targets = targets.to(DEVICE)    # move the labels to the device
            logits, probas = model(features)  # model outputs
            loss = F.cross_entropy(logits, targets, reduction='sum')  # summed (not averaged) cross-entropy over the batch
            num_examples += targets.size(0)  # running count of examples
            curr_loss += loss  # running sum of losses

        curr_loss = curr_loss / num_examples  # average loss per example
        return curr_loss
    

# Training
start_time = time.time()  # record the start time
for epoch in range(num_epochs):

    model.train()  # training mode
    # Iterate over all mini-batches of the training set
    for batch_idx, (features, targets) in enumerate(train_loader):

        features = features.to(DEVICE)  # move the features to the device
        targets = targets.to(DEVICE)    # move the labels to the device

        ### FORWARD AND BACK PROPAGATION
        logits, probas = model(features)  # model outputs
        cost = F.cross_entropy(logits, targets)  # mean cross-entropy loss over the batch
        optimizer.zero_grad()  # clear the gradients from the previous step

        cost.backward()  # backpropagate to compute the gradients

        ### UPDATE MODEL PARAMETERS
        optimizer.step()  # update the model parameters

        ### LOGGING
        if not batch_idx % 50:  # log every 50 batches
            print('Epoch: %03d/%03d | Batch %04d/%04d | Cost: %.4f'
                  % (epoch+1, num_epochs, batch_idx,
                     len(train_loader), cost))  # epoch, batch, and current loss

    model.eval()  # evaluation mode
    with torch.set_grad_enabled(False):  # disable gradient tracking to save memory
        # Report training-set accuracy and loss
        print('Epoch: %03d/%03d | Train: %.3f%% |  Loss: %.3f' % (
              epoch+1, num_epochs,
              compute_accuracy(model, train_loader),     # training accuracy
              compute_epoch_loss(model, train_loader)))  # average training loss


    # Elapsed time after each epoch
    print('Time elapsed: %.2f min' % ((time.time() - start_time) / 60))

# Total training time
print('Total Training Time: %.2f min' % ((time.time() - start_time) / 60))

Epoch: 001/010 | Batch 0000/0098 | Cost: 2.3856
Epoch: 001/010 | Batch 0050/0098 | Cost: 2.2934
Epoch: 001/010 | Train: 16.854% |  Loss: 2.162
Time elapsed: 0.34 min
Epoch: 002/010 | Batch 0000/0098 | Cost: 2.1523
Epoch: 002/010 | Batch 0050/0098 | Cost: 1.8502
Epoch: 002/010 | Train: 35.606% |  Loss: 1.631
Time elapsed: 0.67 min
Epoch: 003/010 | Batch 0000/0098 | Cost: 1.5630
Epoch: 003/010 | Batch 0050/0098 | Cost: 1.5667
Epoch: 003/010 | Train: 40.448% |  Loss: 1.565
Time elapsed: 1.00 min
Epoch: 004/010 | Batch 0000/0098 | Cost: 1.6272
Epoch: 004/010 | Batch 0050/0098 | Cost: 1.4153
Epoch: 004/010 | Train: 54.202% |  Loss: 1.234
Time elapsed: 1.34 min
Epoch: 005/010 | Batch 0000/0098 | Cost: 1.2143
Epoch: 005/010 | Batch 0050/0098 | Cost: 1.2371
Epoch: 005/010 | Train: 62.090% |  Loss: 1.037
Time elapsed: 1.67 min
Epoch: 006/010 | Batch 0000/0098 | Cost: 1.0721
Epoch: 006/010 | Batch 0050/0098 | Cost: 1.0059
Epoch: 006/010 | Train: 70.060% |  Loss: 0.844
Time elapsed: 2.00 min
Epoc
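Two sanity checks on the log above. First, with 10 classes, a network whose predictions are close to uniform has a cross-entropy of about ln(10) ≈ 2.303, close to the first logged cost of 2.3856. Second, `compute_epoch_loss` sums per-example losses (`reduction='sum'`) and divides by the total number of examples instead of averaging per-batch means, which matters because the last training batch is smaller than the others. The plain-Python sketch below illustrates both points; the per-example losses are made-up numbers chosen only for illustration, and `cross_entropy` is a hypothetical helper rather than `F.cross_entropy` itself:

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy of one example, computed as -log_softmax(logits)[target]."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


# 1) Equal logits over 10 classes give a loss of ln(10).
uniform_loss = cross_entropy([0.0] * 10, target=3)

# 2) Hypothetical per-example losses split into two batches of unequal size.
batches = [[1.0, 1.0, 1.0, 1.0], [3.0]]

# Mean over all examples (the approach used by compute_epoch_loss)
total = sum(sum(b) for b in batches)
count = sum(len(b) for b in batches)
mean_of_examples = total / count  # 7 / 5

# Mean of per-batch means (biased toward the small batch)
mean_of_batches = sum(sum(b) / len(b) for b in batches) / len(batches)  # (1 + 3) / 2

print(round(uniform_loss, 4), mean_of_examples, mean_of_batches)  # 2.3026 1.4 2.0
```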

## Evaluation

In [6]:
with torch.set_grad_enabled(False): # save memory during inference
    print('Test accuracy: %.2f%%' % (compute_accuracy(model, test_loader)))

Test accuracy: 72.92%
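`compute_accuracy` takes the arg-max of `probas`, but since softmax is strictly increasing in each logit, the arg-max of the raw logits yields the same predicted labels; the softmax in `forward` is only needed when actual probabilities are wanted. A small plain-Python check of this, using hypothetical logits and labels (the helpers below are illustrative, not part of the notebook):

```python
import math


def softmax(logits):
    """Numerically stable softmax of a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def accuracy(predicted, targets):
    """Percentage of predictions that match the targets, as in compute_accuracy."""
    correct = sum(p == t for p, t in zip(predicted, targets))
    return correct / len(targets) * 100


# Hypothetical logits for 4 examples over 3 classes, plus true labels
batch_logits = [[2.0, 0.5, -1.0], [0.1, 0.3, 0.2], [-2.0, 1.0, 3.0], [0.0, 4.0, 1.0]]
targets = [0, 1, 1, 1]

from_probas = [argmax(softmax(row)) for row in batch_logits]
from_logits = [argmax(row) for row in batch_logits]

print(from_probas, from_logits, accuracy(from_logits, targets))  # [0, 1, 2, 1] [0, 1, 2, 1] 75.0
```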


In [7]:
%watermark -iv

numpy      : 1.26.4
torchvision: 0.21.0+cu126
torch      : 2.6.0+cu126

