Deep Learning Models -- A collection of various deep learning architectures, models, and tips for TensorFlow and PyTorch in Jupyter Notebooks.
- Author: Sebastian Raschka
- GitHub Repository: https://github.com/rasbt/deeplearning-models

In [1]:
%load_ext watermark
%watermark -a 'Sebastian Raschka' -v -p torch

Author: Sebastian Raschka

Python implementation: CPython
Python version       : 3.11.11
IPython version      : 9.0.2

torch: 2.6.0+cu126



# ResNet-101 (on CIFAR-10)

### Network Architecture

The network in this notebook is an implementation of the ResNet-101 [1] architecture on the CelebA face dataset [2] to train a gender classifier.  


References
    
- [1] He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770-778). ([CVPR Link](https://www.cv-foundation.org/openaccess/content_cvpr_2016/html/He_Deep_Residual_Learning_CVPR_2016_paper.html))

- [2] Zhang, K., Tan, L., Li, Z., & Qiao, Y. (2016). Gender and smile classification using deep convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (pp. 34-38).

The ResNet-101 architecture is similar to the ResNet-50 architecture, which is in turn similar to the ResNet-34 architecture shown below (from [1]) except that the ResNet 101 is using a Bootleneck block (compared to ResNet-34) and more layers than ResNet-50 (figure shows a screenshot from [1]):


![](../images/resnets/resnet101/resnet101-arch-1.png)


The following figure illustrates residual blocks with skip connections such that the input passed via the shortcut matches the dimensions of the main path's output, which allows the network to learn identity functions.

![](../images/resnets/resnet-ex-1-1.png)


The ResNet-34 architecture actually uses residual blocks with modified skip connections such that the input passed via the shortcut matches is resized to dimensions of the main path's output. Such a residual block is illustrated below:

![](../images/resnets/resnet-ex-1-2.png)

The ResNet-50/101/151 then uses a bottleneck as shown below:

![](../images/resnets/resnet-ex-1-3.png)

For a more detailed explanation see the other notebook, [resnet-ex-1.ipynb](resnet-ex-1.ipynb).

## Imports

In [2]:
import os
import time

import numpy as np
import pandas as pd

import torch
import torch.nn as nn
import torch.nn.functional as F

from torch.utils.data import Dataset
from torch.utils.data import DataLoader
from torch.utils.data.dataset import Subset

from torchvision import datasets
from torchvision import transforms

import time

import matplotlib.pyplot as plt
from PIL import Image


if torch.cuda.is_available():
    torch.backends.cudnn.deterministic = True
    torch.cuda.set_per_process_memory_fraction(0.5, device=0)

## Settings

In [3]:
##########################
### 设置参数
##########################

# 超参数（Hyperparameters）
RANDOM_SEED = 1
LEARNING_RATE = 0.01
NUM_EPOCHS = 50

# 网络结构相关参数（Architecture）
NUM_CLASSES = 10
BATCH_SIZE = 128
DEVICE = torch.device('cuda:0')
GRAYSCALE = False

## Dataset

In [4]:
##########################
### CIFAR-10 数据集
##########################

# 注意：transforms.ToTensor() 会将输入图像缩放到 0-1 范围

# 划分训练集和验证集的索引
train_indices = torch.arange(0, 49000)       # 前49,000张作为训练集
valid_indices = torch.arange(49000, 50000)   # 剩余1,000张作为验证集

# 下载并加载 CIFAR-10 数据集（训练集）
train_and_valid = datasets.CIFAR10(root='data', 
                                   train=True, 
                                   transform=transforms.ToTensor(),  # 转换为Tensor并归一化
                                   download=True)

# 根据索引划分训练集和验证集
train_dataset = Subset(train_and_valid, train_indices)
valid_dataset = Subset(train_and_valid, valid_indices)

# 加载测试集
test_dataset = datasets.CIFAR10(root='data', 
                                train=False, 
                                transform=transforms.ToTensor())

#####################################################
### 数据加载器（Data Loaders）
#####################################################

train_loader = DataLoader(dataset=train_dataset, 
                          batch_size=BATCH_SIZE,   # 每批次样本数量
                          num_workers=8,           # 使用8个子线程加速加载数据
                          shuffle=True)            # 打乱训练数据

valid_loader = DataLoader(dataset=valid_dataset, 
                          batch_size=BATCH_SIZE,
                          num_workers=8,
                          shuffle=False)           # 验证集不需要打乱顺序

test_loader = DataLoader(dataset=test_dataset, 
                         batch_size=BATCH_SIZE,
                         num_workers=8,
                         shuffle=False)            # 测试集同样不需要打乱

#####################################################

# 检查训练集数据维度
for images, labels in train_loader:  
    print('训练集图像批次维度:', images.shape)
    print('训练集标签批次维度:', labels.shape)
    break

# 检查测试集数据维度
for images, labels in test_loader:  
    print('测试集图像批次维度:', images.shape)
    print('测试集标签批次维度:', labels.shape)
    break

# 检查验证集数据维度
for images, labels in valid_loader:  
    print('验证集图像批次维度:', images.shape)
    print('验证集标签批次维度:', labels.shape)
    break


训练集图像批次维度: torch.Size([128, 3, 32, 32])
训练集标签批次维度: torch.Size([128])
测试集图像批次维度: torch.Size([128, 3, 32, 32])
测试集标签批次维度: torch.Size([128])
验证集图像批次维度: torch.Size([128, 3, 32, 32])
验证集标签批次维度: torch.Size([128])


## Model

The following code cell that implements the ResNet-34 architecture is a derivative of the code provided at https://pytorch.org/docs/0.4.0/_modules/torchvision/models/resnet.html.

In [5]:
##########################
### 模型定义
##########################


def conv3x3(in_planes, out_planes, stride=1):
    """带填充的3x3卷积层"""
    return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
                     padding=1, bias=False)


class Bottleneck(nn.Module):
    expansion = 4  # Bottleneck模块的输出通道扩展倍数

    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super(Bottleneck, self).__init__()
        # 第1层：1x1卷积用于降维
        self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        
        # 第2层：3x3卷积用于提取特征
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride,
                               padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        
        # 第3层：1x1卷积用于升维
        self.conv3 = nn.Conv2d(planes, planes * 4, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(planes * 4)
        
        self.relu = nn.ReLU(inplace=True)  # 激活函数
        self.downsample = downsample       # 是否需要下采样
        self.stride = stride               # 步幅

    def forward(self, x):
        residual = x  # 残差连接分支

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        # 如果维度不同或需要下采样，则对残差进行变换
        if self.downsample is not None:
            residual = self.downsample(x)

        out += residual  # 残差连接
        out = self.relu(out)

        return out


class ResNet(nn.Module):

    def __init__(self, block, layers, num_classes, grayscale):
        self.inplanes = 64  # 初始通道数
        in_dim = 1 if grayscale else 3  # 灰度图像输入通道为1，RGB图像为3

        super(ResNet, self).__init__()
        
        # 输入图像的初始卷积层（7x7卷积 + BN + ReLU）
        self.conv1 = nn.Conv2d(in_dim, 64, kernel_size=7, stride=2, padding=3,
                               bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

        # 构建4个残差层，每层包含多个Bottleneck块
        self.layer1 = self._make_layer(block, 64, layers[0])     # 输出通道256
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)  # 输出通道512
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2)  # 输出通道1024
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2)  # 输出通道2048

        # 平均池化层（默认尺寸为7x7，加了padding=2是为了适配小图）
        self.avgpool = nn.AvgPool2d(7, stride=1, padding=2)
        
        # 全连接层输出分类结果
        # self.fc = nn.Linear(2048 * block.expansion, num_classes)
        self.fc = nn.Linear(2048, num_classes)

        # 初始化所有卷积层和BatchNorm层的参数
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, (2. / n)**.5)
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()

    def _make_layer(self, block, planes, blocks, stride=1):
        """
        构建每个残差层，由多个Bottleneck组成
        """
        downsample = None
        # 如果输入输出维度不一致，或步幅不为1，则需下采样
        if stride != 1 or self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.inplanes, planes * block.expansion,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(planes * block.expansion),
            )

        layers = []
        # 添加第一个Bottleneck（带下采样）
        layers.append(block(self.inplanes, planes, stride, downsample))
        self.inplanes = planes * block.expansion
        # 添加后续不变尺寸的Bottleneck模块
        for i in range(1, blocks):
            layers.append(block(self.inplanes, planes))

        return nn.Sequential(*layers)

    def forward(self, x):
        # 输入图像先通过初始卷积层和池化层
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        # 通过4个残差层
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        # x = self.avgpool(x)  # 可启用平均池化
        x = x.view(x.size(0), -1)  # 展平成一维向量
        logits = self.fc(x)        # 全连接层输出
        probas = F.softmax(logits, dim=1)  # 计算类别概率
        return logits, probas


def resnet101(num_classes, grayscale):
    """构建 ResNet-101 模型"""
    model = ResNet(block=Bottleneck, 
                   layers=[3, 4, 23, 3],   # 每个阶段的Bottleneck模块数量
                   num_classes=NUM_CLASSES,
                   grayscale=grayscale)
    return model

In [6]:
torch.manual_seed(RANDOM_SEED)

##########################
### 损失函数和优化器
##########################

model = resnet101(NUM_CLASSES, GRAYSCALE)
model.to(DEVICE)
 
optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)  

## Training

In [7]:
def compute_accuracy(model, data_loader, device):
    correct_pred, num_examples = 0, 0
    # 遍历数据加载器中的每一个批次
    for i, (features, targets) in enumerate(data_loader):
        # 将数据转移到GPU/CPU上
        features = features.to(device)
        targets = targets.to(device)

        # 模型前向传播，获取输出结果
        logits, probas = model(features)
        # 取概率最大的预测类别
        _, predicted_labels = torch.max(probas, 1)
        # 统计总样本数
        num_examples += targets.size(0)
        # 累加预测正确的数量
        correct_pred += (predicted_labels == targets).sum()
    # 返回百分比形式的准确率
    return correct_pred.float() / num_examples * 100


start_time = time.time()

# 设置随机种子，使每次运行的结果一致（这里影响批次打乱）
torch.manual_seed(RANDOM_SEED)

for epoch in range(NUM_EPOCHS):
    
    model.train()  # 设置为训练模式
    
    for batch_idx, (features, targets) in enumerate(train_loader):
    
        ### 准备小批量数据
        features = features.to(DEVICE)
        targets = targets.to(DEVICE)
            
        ### 前向传播与反向传播
        logits, probas = model(features)                 # 前向传播
        cost = F.cross_entropy(logits, targets)         # 计算交叉熵损失
        optimizer.zero_grad()                           # 梯度清零
        cost.backward()                                 # 反向传播计算梯度
        
        ### 更新模型参数
        optimizer.step()                                # 梯度下降更新权重
        
        ### 日志打印
        if not batch_idx % 120:  # 每隔120个批次打印一次损失信息
            print(f'Epoch: {epoch+1:03d}/{NUM_EPOCHS:03d} | '
                  f'Batch {batch_idx:03d}/{len(train_loader):03d} |' 
                  f' Cost: {cost:.4f}')

    # 评估模式下，不需要构建反向传播的计算图，节省资源
    with torch.set_grad_enabled(False):
        train_acc = compute_accuracy(model, train_loader, device=DEVICE)
        valid_acc = compute_accuracy(model, valid_loader, device=DEVICE)
        print(f'Epoch: {epoch+1:03d}/{NUM_EPOCHS:03d} Train Acc.: {train_acc:.2f}%'
              f' | Validation Acc.: {valid_acc:.2f}%')
        
    # 每轮结束后打印已用时间
    elapsed = (time.time() - start_time) / 60
    print(f'Time elapsed: {elapsed:.2f} min')
  
# 训练全部完成后，打印总耗时
elapsed = (time.time() - start_time) / 60
print(f'Total Training Time: {elapsed:.2f} min')


Epoch: 001/050 | Batch 000/383 | Cost: 2.7590
Epoch: 001/050 | Batch 120/383 | Cost: 2.2662
Epoch: 001/050 | Batch 240/383 | Cost: 2.0029
Epoch: 001/050 | Batch 360/383 | Cost: 1.9167
Epoch: 001/050 Train Acc.: 26.79% | Validation Acc.: 28.40%
Time elapsed: 0.56 min
Epoch: 002/050 | Batch 000/383 | Cost: 1.8444
Epoch: 002/050 | Batch 120/383 | Cost: 1.7140
Epoch: 002/050 | Batch 240/383 | Cost: 1.8181
Epoch: 002/050 | Batch 360/383 | Cost: 1.6982
Epoch: 002/050 Train Acc.: 38.64% | Validation Acc.: 38.70%
Time elapsed: 1.11 min
Epoch: 003/050 | Batch 000/383 | Cost: 1.5503
Epoch: 003/050 | Batch 120/383 | Cost: 1.6675
Epoch: 003/050 | Batch 240/383 | Cost: 1.5916
Epoch: 003/050 | Batch 360/383 | Cost: 1.5134
Epoch: 003/050 Train Acc.: 46.71% | Validation Acc.: 48.30%
Time elapsed: 1.65 min
Epoch: 004/050 | Batch 000/383 | Cost: 1.2814
Epoch: 004/050 | Batch 120/383 | Cost: 1.4845
Epoch: 004/050 | Batch 240/383 | Cost: 1.3911
Epoch: 004/050 | Batch 360/383 | Cost: 1.2463
Epoch: 004/050 

## Evaluation

In [8]:
with torch.set_grad_enabled(False): # 在推理过程中禁用梯度计算，节省内存
    print('Test accuracy: %.2f%%' % (compute_accuracy(model, test_loader, device=DEVICE)))

Test accuracy: 74.83%


In [9]:
%watermark -iv

numpy      : 1.26.4
pandas     : 2.2.3
PIL        : 11.1.0
matplotlib : 3.10.1
torchvision: 0.21.0+cu126
torch      : 2.6.0+cu126

