# DenseNet

---

### Background and Problems Addressed

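- **New challenges after ResNet**:  
  ResNet eases the vanishing-gradient problem with residual connections. However, each block only adds its output to the input it received from the preceding layer, which limits how efficiently features are reused. In deep networks, information passed along layer by layer can gradually become diluted.
- **Parameter redundancy**:  
  Conventional CNNs (e.g., VGG, ResNet) improve accuracy by stacking more layers, but they use parameters inefficiently. Many layers re-extract similar features, which drives up computational cost.
- **Remaining risk of vanishing gradients**:  
  Although ResNet improves gradient propagation, gradient attenuation is not completely eliminated in extremely deep networks (e.g., 1,000+ layers).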

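**DenseNet's goal**:  
Maximize feature reuse through dense cross-layer connections, reduce parameter redundancy, and further improve gradient flow. DenseNet was proposed by Gao Huang et al. in *Densely Connected Convolutional Networks* (CVPR 2017).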

---

![DenseNet architecture](resources/densenet_arch.png)

## Key Innovations

### 密集连接（Dense Connectivity）
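- **Dense block design**:  
  Within a dense block, each layer takes the outputs of all preceding layers as its input: layer $\ell$ computes $x_\ell = H_\ell([x_0, x_1, \ldots, x_{\ell-1}])$, where $x_0$ is the block input, $x_i$ ($i \ge 1$) is the feature map produced by layer $i$, and $[\cdot]$ denotes **concatenation along the channel dimension**. ResNet instead uses element-wise addition, $x_\ell = H_\ell(x_{\ell-1}) + x_{\ell-1}$.
- **Feature reuse and diversity**:  
  Every layer can access the features of all preceding layers, so it does not need to re-extract them. This encourages each layer to learn new features that add to the network's "collective knowledge".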
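To make the channel-wise concatenation concrete, here is a minimal PyTorch sketch of a dense block. It is an illustration, not the `hdd` package's implementation: each $H_\ell$ uses the basic BN-ReLU-3×3 conv composite function without a bottleneck, and the class name `DenseBlockSketch` and the sizes in the example are assumptions chosen for readability.

```python
import torch
import torch.nn as nn


class DenseBlockSketch(nn.Module):
    """Minimal dense block: every layer sees the concatenation of all earlier outputs.

    Each H_l is BN -> ReLU -> 3x3 conv producing `growth_rate` (k) channels.
    """

    def __init__(self, in_channels: int, growth_rate: int, num_layers: int):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(num_layers):
            # Layer i sees the block input plus i earlier outputs of k channels each
            layer_in = in_channels + i * growth_rate
            self.layers.append(
                nn.Sequential(
                    nn.BatchNorm2d(layer_in),
                    nn.ReLU(inplace=True),
                    nn.Conv2d(layer_in, growth_rate, kernel_size=3, padding=1, bias=False),
                )
            )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        features = [x]  # x_0: the block input
        for layer in self.layers:
            # x_l = H_l([x_0, x_1, ..., x_{l-1}]), concatenated along channels (dim=1)
            features.append(layer(torch.cat(features, dim=1)))
        # Block output: the input concatenated with every layer's new feature maps
        return torch.cat(features, dim=1)


block = DenseBlockSketch(in_channels=16, growth_rate=12, num_layers=4)
out = block(torch.randn(2, 16, 32, 32))
print(out.shape)  # torch.Size([2, 64, 32, 32]) because 16 + 4 * 12 = 64
```

Replacing `torch.cat(...)` with a sum would turn this into a residual-style block, and the channel count would then stay fixed instead of growing by $k$ per layer.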

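### Key Components
- **Growth rate ($k$)**:  
  Controls how many feature maps each layer produces (e.g., $k=32$). Every layer in a dense block outputs a fixed $k$ channels. Because outputs are concatenated, layer $\ell$ receives $k_0 + k(\ell-1)$ input channels, where $k_0$ is the number of channels entering the block, so the input width grows linearly with depth.
- **Transition layer**:  
  Placed between dense blocks, consisting of:
  - batch normalization followed by a 1×1 convolution (compresses the channel count; in DenseNet-BC a block output with $m$ channels is reduced to $\lfloor\theta m\rfloor$ channels, with compression factor $\theta = 0.5$)
  - 2×2 average pooling (downsampling)
- **Bottleneck layer**:  
  Inside a dense block, each layer first applies a 1×1 convolution (producing $4k$ feature maps in the paper) to reduce the input width and the computational cost, then a 3×3 convolution that outputs $k$ feature maps.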
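The channel bookkeeping implied by the growth rate and the transition compression can be traced in plain Python. This is an illustrative sketch: the helper name `densenet_bc_channels` is hypothetical, and the example uses the DenseNet-121 configuration reported in the paper for ImageNet (64 stem channels, $k=32$, dense blocks of 6/12/24/16 layers, $\theta=0.5$).

```python
import math


def densenet_bc_channels(num_init_features, growth_rate, block_config, theta=0.5):
    """Trace channel widths through a DenseNet-BC (bottleneck + compression).

    Returns a list of (stage_name, output_channels) pairs.
    """
    channels = num_init_features
    trace = [("stem", channels)]
    for b, num_layers in enumerate(block_config, start=1):
        # Layer l of the block (1-indexed) receives channels + (l - 1) * k inputs;
        # its 1x1 bottleneck maps them to 4k, and its 3x3 conv emits k new maps.
        # Concatenating every layer's output adds num_layers * k channels in total.
        channels += num_layers * growth_rate
        trace.append((f"dense_block{b}", channels))
        if b < len(block_config):
            # Transition: 1x1 conv compresses m channels to floor(theta * m),
            # then 2x2 average pooling halves the spatial resolution.
            channels = math.floor(theta * channels)
            trace.append((f"transition{b}", channels))
    return trace


for name, c in densenet_bc_channels(64, 32, (6, 12, 24, 16)):
    print(f"{name}: {c}")
```

The trace ends at 1024 channels after the fourth dense block, which matches the input width of the classifier in torchvision's `densenet121`. Without compression ($\theta = 1$), the widths would keep accumulating across blocks.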

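### Advantages
- **Parameter efficiency**:  
  On ImageNet, DenseNet-201 (about 20M parameters) reaches validation error similar to ResNet-101 (over 40M parameters).
- **Stronger gradient flow**:  
  Every layer is directly connected to all subsequent layers in its block, so during backpropagation it has a short path to the loss gradient. This substantially alleviates the vanishing-gradient problem, though it does not eliminate it outright.
- **Implicit deep supervision**:  
  Through these short connections, each layer receives additional supervision from the loss function, an effect similar to adding intermediate supervision signals.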

![DenseNet loss surface](resources/densenet_loss_surface.png)

[Visualizing the Loss Landscape of Neural Nets](https://arxiv.org/pdf/1712.09913) also finds that DenseNet's loss landscape is notably smooth, which the authors associate with easier training and convergence.

![DenseNet architecture details](resources/densenet_detail.png)


```python
# Automatically reload external modules so code changes take effect without re-importing
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2

from hdd.device.utils import get_device

import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms

# Root directory of the training data
DATA_ROOT = "~/workspace/hands-dirty-on-dl/dataset"
# Root directory for TensorBoard logs
TENSORBOARD_ROOT = "~/workspace/hands-dirty-on-dl/dataset"
# Directory for pretrained model weights
TORCH_HUB_PATH = "~/workspace/hands-dirty-on-dl/pretrained_models"
torch.hub.set_dir(TORCH_HUB_PATH)
# Pick the most suitable available training device
DEVICE = get_device(["cuda", "cpu"])
print("Use device: ", DEVICE)
```

```
Use device:  cuda
```


## Experiment on CIFAR-10

```python
# Per-channel mean and std used for normalization, precomputed offline.
# Note: these values match the commonly cited CIFAR-100 training-set statistics;
# the CIFAR-10 training set is closer to mean (0.4914, 0.4822, 0.4465), std (0.2470, 0.2435, 0.2616).
TRAIN_MEAN = [0.50707516, 0.48654887, 0.44091784]
TRAIN_STD = [0.26733429, 0.25643846, 0.27615047]

# Training augmentation: pad + small rotation + random 32x32 crop + horizontal flip
train_dataset_transforms = transforms.Compose(
    [
        transforms.Pad(4),
        transforms.RandomRotation(3),
        transforms.RandomCrop(32),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize(mean=TRAIN_MEAN, std=TRAIN_STD),
    ]
)
# Load the datasets
train_dataset = datasets.CIFAR10(
    root=DATA_ROOT,
    train=True,
    transform=train_dataset_transforms,
    download=True,
)
val_dataset = datasets.CIFAR10(
    root=DATA_ROOT,
    train=False,
    transform=transforms.Compose(
        [transforms.ToTensor(), transforms.Normalize(TRAIN_MEAN, TRAIN_STD)]
    ),
    download=True,
)
print("Basic Info of train dataset: \n", train_dataset)
print("Basic Info of test dataset: \n", val_dataset)
BATCH_SIZE = 64
train_dataloader = torch.utils.data.DataLoader(
    train_dataset,
    batch_size=BATCH_SIZE,
    shuffle=True,
    num_workers=4,
)
val_dataloader = torch.utils.data.DataLoader(
    val_dataset,
    batch_size=BATCH_SIZE,
    shuffle=False,
    num_workers=4,
)
```

```
Files already downloaded and verified
Files already downloaded and verified
Basic Info of train dataset: 
 Dataset CIFAR10
    Number of datapoints: 50000
    Root location: /home/tf/workspace/hands-dirty-on-dl/dataset
    Split: Train
    StandardTransform
Transform: Compose(
               Pad(padding=4, fill=0, padding_mode=constant)
               RandomRotation(degrees=[-3.0, 3.0], interpolation=nearest, expand=False, fill=0)
               RandomCrop(size=(32, 32), padding=None)
               RandomHorizontalFlip(p=0.5)
               ToTensor()
               Normalize(mean=[0.50707516, 0.48654887, 0.44091784], std=[0.26733429, 0.25643846, 0.27615047])
           )
Basic Info of test dataset: 
 Dataset CIFAR10
    Number of datapoints: 10000
    Root location: /home/tf/workspace/hands-dirty-on-dl/dataset
    Split: Test
    StandardTransform
Transform: Compose(
               ToTensor()
               Normalize(mean=[0.50707516, 0.48654887, 0.44091784], std=[0.26733429, 0.25643
```
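The normalization constants are per-channel statistics over all training pixels. A minimal sketch of how such statistics can be computed, assuming the dataset yields `ToTensor()` outputs in $[0, 1]$ without augmentation or normalization, is shown below. The helper name `compute_channel_stats` is hypothetical.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def compute_channel_stats(dataset, batch_size=1000):
    """Per-channel mean and population std over every pixel of a dataset of (C, H, W) tensors.

    Example (not run here):
    compute_channel_stats(datasets.CIFAR10(root, train=True, transform=transforms.ToTensor()))
    """
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=False)
    total, sum_, sum_sq = 0, None, None
    for images, *_ in loader:
        # Move channels first and flatten the rest: (C, N * H * W), in float64 to limit round-off
        pixels = images.transpose(0, 1).reshape(images.size(1), -1).double()
        s, sq = pixels.sum(dim=1), (pixels**2).sum(dim=1)
        sum_ = s if sum_ is None else sum_ + s
        sum_sq = sq if sum_sq is None else sum_sq + sq
        total += pixels.size(1)
    mean = sum_ / total
    # Population std via sqrt(E[x^2] - E[x]^2)
    std = (sum_sq / total - mean**2).sqrt()
    return mean.tolist(), std.tolist()


# Tiny synthetic check: half the pixels are 0.2, half are 0.6
x = torch.stack([torch.full((3, 2, 2), 0.2), torch.full((3, 2, 2), 0.6)])
print(compute_channel_stats(TensorDataset(x), batch_size=1))
```

Accumulating sums rather than averaging per-batch standard deviations gives the exact population statistics regardless of batch size.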

```python
from hdd.models.cnn.densenet import (
    DenseNetSmall40,
)
from hdd.train.classification_utils import (
    naive_train_classification_model,
    eval_image_classifier,
)
from hdd.models.nn_utils import count_trainable_parameter


def train_net(
    net,
    train_dataloader,
    val_dataloader,
    lr,
    weight_decay,
    step_size=30,
    gamma=0.1,
    max_epochs=130,
) -> dict[str, list[float]]:
    """Train `net` with SGD (momentum 0.9) and a StepLR schedule; return the training stats."""
    criteria = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(
        net.parameters(), lr=lr, momentum=0.9, weight_decay=weight_decay
    )
    # Multiply the learning rate by `gamma` every `step_size` scheduler steps
    scheduler = torch.optim.lr_scheduler.StepLR(
        optimizer, step_size=step_size, gamma=gamma, last_epoch=-1
    )
    training_stats = naive_train_classification_model(
        net,
        criteria,
        max_epochs,
        train_dataloader,
        val_dataloader,
        DEVICE,
        optimizer,
        scheduler,
        verbose=True,
    )
    return training_stats


# 40-layer DenseNet for CIFAR-10 with growth rate k=12 and dropout 0.2
net = DenseNetSmall40(num_classes=10, dropout=0.2, growth_rate=12).to(DEVICE)
dense40_stats = train_net(
    net,
    train_dataloader,
    val_dataloader,
    lr=0.1,
    weight_decay=1e-4,
)

# Evaluate on the validation set and report top-1 accuracy
eval_result = eval_image_classifier(net, val_dataloader.dataset, DEVICE)
ss = [result.gt_label == result.predicted_label for result in eval_result]
print(f"#Parameter: {count_trainable_parameter(net)} Accuracy: {sum(ss) / len(ss)}")
```

```
Epoch: 1/130 Train Loss: 1.6259 Accuracy: 0.4048 Time: 17.50240  | Val Loss: 2.0252 Accuracy: 0.3826
Epoch: 2/130 Train Loss: 1.1345 Accuracy: 0.5932 Time: 17.42568  | Val Loss: 1.0998 Accuracy: 0.6200
Epoch: 3/130 Train Loss: 0.9300 Accuracy: 0.6678 Time: 18.33759  | Val Loss: 1.2027 Accuracy: 0.6291
Epoch: 4/130 Train Loss: 0.7972 Accuracy: 0.7187 Time: 18.79990  | Val Loss: 0.7491 Accuracy: 0.7463
Epoch: 5/130 Train Loss: 0.7043 Accuracy: 0.7551 Time: 17.04572  | Val Loss: 0.8644 Accuracy: 0.7237
Epoch: 6/130 Train Loss: 0.6399 Accuracy: 0.7790 Time: 19.28985  | Val Loss: 0.5805 Accuracy: 0.8041
Epoch: 7/130 Train Loss: 0.5905 Accuracy: 0.7961 Time: 17.71463  | Val Loss: 0.7037 Accuracy: 0.7720
Epoch: 8/130 Train Loss: 0.5575 Accuracy: 0.8059 Time: 18.67479  | Val Loss: 0.5547 Accuracy: 0.8204
Epoch: 9/130 Train Loss: 0.5324 Accuracy: 0.8165 Time: 17.52033  | Val Loss: 0.5621 Accuracy: 0.8162
Epoch: 10/130 Train Loss: 0.5121 Accuracy: 0.8241 Time: 19.09249  | Val Loss: 0.6853 Accura
```
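`train_net` decays the learning rate with `StepLR(step_size=30, gamma=0.1)` over 130 epochs. Assuming `naive_train_classification_model` calls `scheduler.step()` once per epoch (its internals are not shown here), the resulting schedule can be checked with a dummy optimizer:

```python
import torch

# A dummy parameter and optimizer: only the learning-rate bookkeeping matters here
param = torch.nn.Parameter(torch.zeros(1))
optimizer = torch.optim.SGD([param], lr=0.1, momentum=0.9, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

lrs = []
for epoch in range(130):
    # Learning rate in effect during this epoch
    lrs.append(optimizer.param_groups[0]["lr"])
    optimizer.step()  # no gradients exist, so SGD leaves `param` untouched
    scheduler.step()  # assumed to be called once per epoch

# Print the epochs (0-indexed) at which the learning rate changes
for epoch in range(1, len(lrs)):
    if lrs[epoch] != lrs[epoch - 1]:
        print(f"epoch {epoch}: lr -> {lrs[epoch]:.0e}")
```

Under that assumption, the final 10 epochs run at $10^{-5}$. For comparison, the DenseNet paper trains its CIFAR models for 300 epochs and divides the learning rate by 10 at 50% and 75% of training.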

## Experiment on Imagenette

```python
from hdd.dataset.imagenette_in_memory import ImagenetteInMemory
from hdd.data_util.transforms import RandomResize
from torch.utils.data import DataLoader

# Per-channel normalization statistics for Imagenette
TRAIN_MEAN = [0.4625, 0.4580, 0.4295]
TRAIN_STD = [0.2452, 0.2390, 0.2469]
train_dataset_transforms = transforms.Compose(
    [
        RandomResize([256, 296, 384]),  # randomly pick one of the three sizes to resize to
        transforms.RandomRotation(10),
        transforms.RandomCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize(mean=TRAIN_MEAN, std=TRAIN_STD),
    ]
)
val_dataset_transforms = transforms.Compose(
    [
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=TRAIN_MEAN, std=TRAIN_STD),
    ]
)
train_dataset = ImagenetteInMemory(
    root=DATA_ROOT,
    split="train",
    size="full",
    download=True,
    transform=train_dataset_transforms,
)
val_dataset = ImagenetteInMemory(
    root=DATA_ROOT,
    split="val",
    size="full",
    download=True,
    transform=val_dataset_transforms,
)


def build_dataloader(batch_size, train_dataset, val_dataset):
    """Build shuffled training and ordered validation dataloaders."""
    train_dataloader = DataLoader(
        train_dataset, batch_size=batch_size, shuffle=True, num_workers=8
    )
    val_dataloader = DataLoader(
        val_dataset, batch_size=batch_size, shuffle=False, num_workers=8
    )
    return train_dataloader, val_dataloader


train_dataloader, val_dataloader = build_dataloader(32, train_dataset, val_dataset)
```

```python
from hdd.models.cnn.densenet import (
    DenseNetBC121,
)

# DenseNet-BC-121 on Imagenette with k=12 (the paper's ImageNet models use k=32)
net = DenseNetBC121(num_classes=10, dropout=0.0, growth_rate=12).to(DEVICE)
densenet121_stats = train_net(
    net,
    train_dataloader,
    val_dataloader,
    lr=0.1,
    weight_decay=1e-4,
)

# Evaluate on the validation set and report top-1 accuracy
eval_result = eval_image_classifier(net, val_dataloader.dataset, DEVICE)
ss = [result.gt_label == result.predicted_label for result in eval_result]
print(f"#Parameter: {count_trainable_parameter(net)} Accuracy: {sum(ss) / len(ss)}")
```

```
Epoch: 1/130 Train Loss: 2.0372 Accuracy: 0.3088 Time: 9.17116  | Val Loss: 1.6869 Accuracy: 0.4594
Epoch: 2/130 Train Loss: 1.5700 Accuracy: 0.4777 Time: 8.80793  | Val Loss: 1.5644 Accuracy: 0.4973
Epoch: 3/130 Train Loss: 1.3361 Accuracy: 0.5636 Time: 8.83791  | Val Loss: 1.2912 Accuracy: 0.5987
Epoch: 4/130 Train Loss: 1.2103 Accuracy: 0.6054 Time: 8.78721  | Val Loss: 0.9844 Accuracy: 0.6884
Epoch: 5/130 Train Loss: 1.0998 Accuracy: 0.6436 Time: 9.02652  | Val Loss: 0.9751 Accuracy: 0.6736
Epoch: 6/130 Train Loss: 1.0129 Accuracy: 0.6675 Time: 8.73941  | Val Loss: 0.9953 Accuracy: 0.6879
Epoch: 7/130 Train Loss: 0.9695 Accuracy: 0.6869 Time: 8.75780  | Val Loss: 0.7781 Accuracy: 0.7483
Epoch: 8/130 Train Loss: 0.9142 Accuracy: 0.7035 Time: 8.95806  | Val Loss: 0.9456 Accuracy: 0.6935
Epoch: 9/130 Train Loss: 0.8782 Accuracy: 0.7177 Time: 8.93327  | Val Loss: 0.8223 Accuracy: 0.7376
Epoch: 10/130 Train Loss: 0.8503 Accuracy: 0.7263 Time: 8.92591  | Val Loss: 0.8020 Accuracy: 0.7442
```