# CondenseNet 关键观察与创新点总结

**CondenseNet** 是一种高效的卷积神经网络架构，结合了 **密集连接（DenseNet）** 和 **可学习分组卷积（Learnable Group Convolution, LGC）** 的思想，旨在实现高性能和低计算成本的图像分类任务。以下是其主要的关键观察与创新点。

---

## 🔍 关键观察（Key Observations）

1. **特征复用是高效模型设计的核心**
   - DenseNet 中通过密集连接实现了特征复用，减少了冗余信息的学习，提升了模型效率。
   - CondenseNet 延续了这一理念，并进一步优化了特征通道的选择机制。

2. **冗余通道对性能提升有限**
   - 实验发现，许多通道在推理过程中贡献较小，去除这些“非必要”通道对精度影响极小，却能显著降低计算量。

3. **训练阶段压缩通道可实现推理加速**
   - 在训练过程中逐步淘汰不重要的通道（即“condense”），使网络在推理时仅保留必要通道，从而减少FLOPs和参数量。

4. **结构稀疏性优于随机稀疏性**
   - 相比于随机剪枝或正则化方法，CondenseNet 采用结构化的通道剪枝方式，更易于部署和加速。

---

## 🧩 创新点（Innovations）

### 1. ✅ 可学习分组卷积（Learnable Group Convolution, LGC）

- 在训练过程中，将标准卷积分解为多个组卷积。
- 每个输入通道只连接到一组输出通道，且这种连接关系在训练中通过反向传播自动学习。
- 推理时，只保留每组中响应最强的输出通道，实现通道压缩。

### 2. 🚀 分阶段训练策略（Progressive Training）

- 将训练分为多个阶段，在每个阶段逐渐增加压缩程度。
- 类似课程学习的方式，让网络逐步适应稀疏结构，避免一次性压缩带来的性能下降。

  <img src="resources/condensenet_condensing_procedure.png" alt="drawing" width="60%"/>

### 3. 📉 结构化稀疏 + 高效推理

- 所有压缩操作都是结构化的，可以在通用硬件（如GPU）上直接加速，无需特殊硬件支持。
- 模型大小和计算量显著降低，同时保持了较高的准确率。

### 4. 🔁 灵活适配不同压缩率

- 可以根据实际需求调整压缩阶段数和每组的通道数，灵活平衡精度与速度。

---

<img src="resources/condensenet_blocks.png" alt="drawing" width="50%"/>

## ✅ 总结

CondenseNet 提出了一种新颖的、结构化稀疏化方法，结合了 DenseNet 的高效特征复用和可学习分组卷积的思想。其核心在于通过训练过程中的“压缩”机制，引导网络在推理阶段仅保留必要的通道连接，从而实现高精度与高效率的统一。

如果你正在寻找一个 **轻量级但准确率高的网络架构**，CondenseNet 是一个非常值得尝试的选择！

> 💡 **提示**：如果你需要我继续补充代码示例、训练技巧或 PyTorch 实现要点，也可以告诉我！

In [1]:
# 自动重新加载外部module，使得修改代码之后无需重新import
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2

from hdd.device.utils import get_device

import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms

# 设置训练数据的路径
DATA_ROOT = "~/workspace/hands-dirty-on-dl/dataset"
# 设置TensorBoard的路径
TENSORBOARD_ROOT = "~/workspace/hands-dirty-on-dl/dataset"
# 设置预训练模型参数路径
TORCH_HUB_PATH = "~/workspace/hands-dirty-on-dl/pretrained_models"
torch.hub.set_dir(TORCH_HUB_PATH)
# 挑选最合适的训练设备
DEVICE = get_device(["cuda", "cpu"])
print("Device: ", DEVICE)

Device:  cuda


In [2]:
from hdd.data_util.auto_augmentation import CIFAR10Policy

# 训练超参数和数据增强来自 https://github.com/omihub777/ViT-CIFAR
CIFAR_10_MEAN = [0.4914, 0.4822, 0.4465]
CIFAR_10_STD = [0.2470, 0.2435, 0.2616]
BATCH_SIZE = 128

val_transform = transforms.Compose(
    [
        transforms.ToTensor(),
        transforms.Normalize(CIFAR_10_MEAN, CIFAR_10_STD),
    ]
)

val_dataloader = torch.utils.data.DataLoader(
    datasets.CIFAR10(
        root=DATA_ROOT, train=False, download=True, transform=val_transform
    ),
    batch_size=BATCH_SIZE,
    shuffle=False,
    num_workers=8,
    pin_memory=True,
)

train_transform = transforms.Compose(
    [
        transforms.RandomCrop(size=32, padding=4),
        transforms.RandomHorizontalFlip(),
        CIFAR10Policy(),
        transforms.ToTensor(),
        transforms.Normalize(CIFAR_10_MEAN, CIFAR_10_STD),
    ]
)

train_dataloader = torch.utils.data.DataLoader(
    datasets.CIFAR10(
        root=DATA_ROOT, train=True, download=True, transform=train_transform
    ),
    batch_size=BATCH_SIZE,
    shuffle=True,
    num_workers=8,
    pin_memory=True,
)

Files already downloaded and verified
Files already downloaded and verified


In [3]:
import time
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data.dataset import Dataset
from hdd.train.classification_utils import _eval_classifier_naive
from hdd.train.early_stopping import EarlyStoppingInterface


def train_classifier_one_epoch(
    net: nn.Module,
    criteria: nn.CrossEntropyLoss,
    optimizer: optim.Optimizer,
    train_loader: torch.utils.data.DataLoader,
    device: torch.device,
    epoch: int,
    max_epochs: int,
    lasso_lambda: float,
) -> Tuple[float, float]:
    """Naive training procedure to train classifier for one epoch.

    Args:
        net: network instance.
        criteria: Loss function. Typically nn.CrossEntropyLoss
        optimizer: optimizer.
        train_loader: train data
        device: device to run the training.

    Returns:
        avg train loss and train accuracy.
    """

    train_loss = 0.0
    correct_items = 0
    total_items = 0
    net.train()
    progress = epoch / max_epochs
    learned_module_list = []
    for m in net.modules():
        if m.__str__().startswith("LearnedGroupConv"):
            learned_module_list.append(m)

    for i, [Xs, ys] in enumerate(train_loader):
        Xs, ys = Xs.to(device), ys.to(device)
        optimizer.zero_grad()
        progress = float(epoch * len(train_loader) + i) / (
            max_epochs * len(train_loader)
        )
        logits = net(Xs, progress)
        loss = criteria(logits, ys)
        lasso_loss = 0
        for m in learned_module_list:
            lasso_loss = lasso_loss + m.lasso_loss
            loss = loss + lasso_lambda * lasso_loss

        loss.backward()
        optimizer.step()
        train_loss += loss.item()
        correct_items += torch.sum(torch.argmax(logits, dim=1) == ys).item()
        total_items += Xs.shape[0]

    avg_train_loss = train_loss / len(train_loader)
    accuracy = correct_items / total_items
    return avg_train_loss, accuracy


def train_condensenet_model(
    net: nn.Module,
    criteria,
    max_epochs: int,
    lasso_lambda: float,
    train_loader: torch.utils.data.DataLoader,
    val_loader: torch.utils.data.DataLoader,
    device: torch.device,
    optimizer: optim.Optimizer,
    scheduler: Optional[torch.optim.lr_scheduler.LRScheduler] = None,
    early_stopper: Optional[EarlyStoppingInterface] = None,
    verbose: bool = True,
    eval_classifier: Callable[
        [
            nn.Module,
            nn.CrossEntropyLoss,
            torch.utils.data.DataLoader,
            torch.device,
        ],
        Tuple[float, float],
    ] = _eval_classifier_naive,
) -> dict[str, list[float]]:
    """Naive classifier training procedure.

    Args:
        net: classification model.
        criteria: loss function.
        max_epochs: maximum number of epochs.
        train_loader: train dataloader.
        val_loader: validation dataloader.
        device: network device.
        optimizer: optimizer
        scheduler: learning rate scheduler.
        early_stopper: early stopper.
        verbose: Print anything or not. Defaults to True.
        eval_classifier: Function to eval the classifier for one epoch.
    Returns:
        training statistics.
    """
    result = {
        "train_loss": [],
        "val_loss": [],
        "train_accuracy": [],
        "val_accuracy": [],
    }
    for epoch in range(1, max_epochs + 1):
        t0 = time.time()
        avg_train_loss, train_accuracy = train_classifier_one_epoch(
            net,
            criteria,
            optimizer,
            train_loader,
            device,
            epoch,
            max_epochs,
            lasso_lambda,
        )
        t1 = time.time()
        if scheduler is not None:
            scheduler.step()
        avg_val_loss, val_accuracy = eval_classifier(
            net,
            criteria,
            val_loader,
            device,
        )
        if verbose:
            print(
                f"Epoch: {epoch}/{max_epochs} "
                f"Train Loss: {avg_train_loss:0.4f} "
                f"Accuracy: {train_accuracy:0.4f} "
                f"Time: {t1 - t0:0.5f} "
                f" | Val Loss: {avg_val_loss:0.4f} "
                f"Accuracy: {val_accuracy:0.4f}"
            )
        result["train_loss"].append(avg_train_loss)
        result["val_loss"].append(avg_val_loss)
        result["train_accuracy"].append(train_accuracy)
        result["val_accuracy"].append(val_accuracy)
        if early_stopper is not None:
            if early_stopper(val_loss=avg_val_loss, model=net):
                print(f"Early stop at epoch {epoch}!")
                early_stopper.load_best_model(net)
                return result

    return result

In [4]:
from hdd.train.warmup_scheduler import GradualWarmupScheduler
from hdd.models.cnn.condensenet import CondenseNet
from hdd.models.nn_utils import count_trainable_parameter


net = CondenseNet(
    num_classes=10,
    group_1x1=4,
    group_3x3=4,
    bottleneck=4,
    condense_factor=4,
    dropout_rate=0,
    stages=[14, 14, 14],
    growth=[8, 16, 32],
    data="cifar10",
).to(DEVICE)
print(f"#Parameter: {count_trainable_parameter(net)}")
criteria = nn.CrossEntropyLoss(label_smoothing=0.1)
optimizer = torch.optim.Adam(
    net.parameters(), lr=1e-3, betas=(0.9, 0.999), weight_decay=1e-5
)
max_epochs = 100
base_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, max_epochs, eta_min=1e-5
)
scheduler = GradualWarmupScheduler(
    optimizer,
    multiplier=1.0,
    total_epoch=10,
    after_scheduler=base_scheduler,
)
lasso_lambda = 1e-5
random_setting = train_condensenet_model(
    net,
    criteria,
    max_epochs,
    lasso_lambda,
    train_dataloader,
    val_dataloader,
    DEVICE,
    optimizer,
    scheduler,
    verbose=True,
)

#Parameter: 1451594
Epoch: 1/100 Train Loss: 6.5175 Accuracy: 0.0970 Time: 20.42050  | Val Loss: 2.3529 Accuracy: 0.0939
Epoch: 2/100 Train Loss: 6.1064 Accuracy: 0.2585 Time: 19.66456  | Val Loss: 1.8150 Accuracy: 0.3684
Epoch: 3/100 Train Loss: 5.5055 Accuracy: 0.3847 Time: 20.18755  | Val Loss: 1.5877 Accuracy: 0.5052
Epoch: 4/100 Train Loss: 4.9899 Accuracy: 0.4660 Time: 19.55599  | Val Loss: 1.4525 Accuracy: 0.5785
Epoch: 5/100 Train Loss: 4.5329 Accuracy: 0.5312 Time: 19.74438  | Val Loss: 1.3364 Accuracy: 0.6381
Epoch: 6/100 Train Loss: 4.1239 Accuracy: 0.5744 Time: 19.97007  | Val Loss: 1.2596 Accuracy: 0.6717
Epoch: 7/100 Train Loss: 3.7005 Accuracy: 0.6214 Time: 19.78455  | Val Loss: 1.2155 Accuracy: 0.6907
Epoch: 8/100 Train Loss: 3.3005 Accuracy: 0.6590 Time: 19.96895  | Val Loss: 1.1969 Accuracy: 0.7079
Epoch: 9/100 Train Loss: 2.9242 Accuracy: 0.6899 Time: 19.68217  | Val Loss: 1.0810 Accuracy: 0.7526
Epoch: 10/100 Train Loss: 2.5992 Accuracy: 0.7066 Time: 19.76372  | Val

In [5]:
from torchsummary import summary

summary(net, (3, 32, 32))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1           [-1, 16, 32, 32]             432
       BatchNorm2d-2           [-1, 16, 32, 32]              32
              ReLU-3           [-1, 16, 32, 32]               0
  LearnedGroupConv-4           [-1, 32, 32, 32]               0
       BatchNorm2d-5           [-1, 32, 32, 32]              64
              ReLU-6           [-1, 32, 32, 32]               0
            Conv2d-7            [-1, 8, 32, 32]             576
       _DenseLayer-8           [-1, 24, 32, 32]               0
       BatchNorm2d-9           [-1, 24, 32, 32]              48
             ReLU-10           [-1, 24, 32, 32]               0
 LearnedGroupConv-11           [-1, 32, 32, 32]               0
      BatchNorm2d-12           [-1, 32, 32, 32]              64
             ReLU-13           [-1, 32, 32, 32]               0
           Conv2d-14            [-1, 8,

In [6]:
from hdd.models.cnn.condensenet import convert_model

convert_model(net)
net = net.to(DEVICE)
summary(net, (3, 32, 32))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1           [-1, 16, 32, 32]             432
       BatchNorm2d-2           [-1, 16, 32, 32]              32
              ReLU-3           [-1, 16, 32, 32]               0
            Conv2d-4           [-1, 32, 32, 32]             128
    CondensingConv-5           [-1, 32, 32, 32]               0
       BatchNorm2d-6           [-1, 32, 32, 32]              64
              ReLU-7           [-1, 32, 32, 32]               0
            Conv2d-8            [-1, 8, 32, 32]             576
       _DenseLayer-9           [-1, 24, 32, 32]               0
      BatchNorm2d-10           [-1, 24, 32, 32]              48
             ReLU-11           [-1, 24, 32, 32]               0
           Conv2d-12           [-1, 32, 32, 32]             192
   CondensingConv-13           [-1, 32, 32, 32]               0
      BatchNorm2d-14           [-1, 32,

In [7]:
from hdd.train.classification_utils import naive_train_classification_model

print(f"#Parameter: {count_trainable_parameter(net)}")
criteria = nn.CrossEntropyLoss(label_smoothing=0.1)
optimizer = torch.optim.Adam(
    net.parameters(),
    lr=1e-4,
    betas=(0.9, 0.99),
)
max_epochs = 15
base_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, max_epochs, eta_min=1e-5
)
scheduler = GradualWarmupScheduler(
    optimizer,
    multiplier=1.0,
    total_epoch=5,
    after_scheduler=base_scheduler,
)
patch_4 = naive_train_classification_model(
    net,
    criteria,
    max_epochs,
    train_dataloader,
    val_dataloader,
    DEVICE,
    optimizer,
    scheduler,
    verbose=True,
)

#Parameter: 516202
Epoch: 1/15 Train Loss: 0.6438 Accuracy: 0.9385 Time: 29.19615  | Val Loss: 0.6686 Accuracy: 0.9314
Epoch: 2/15 Train Loss: 0.6432 Accuracy: 0.9380 Time: 29.12915  | Val Loss: 0.6667 Accuracy: 0.9315
Epoch: 3/15 Train Loss: 0.6434 Accuracy: 0.9386 Time: 29.01056  | Val Loss: 0.6673 Accuracy: 0.9332
Epoch: 4/15 Train Loss: 0.6435 Accuracy: 0.9378 Time: 29.19215  | Val Loss: 0.6679 Accuracy: 0.9325
Epoch: 5/15 Train Loss: 0.6448 Accuracy: 0.9387 Time: 28.78944  | Val Loss: 0.6654 Accuracy: 0.9346
Epoch: 6/15 Train Loss: 0.6432 Accuracy: 0.9382 Time: 29.05139  | Val Loss: 0.6687 Accuracy: 0.9322
Epoch: 7/15 Train Loss: 0.6479 Accuracy: 0.9369 Time: 29.09117  | Val Loss: 0.6674 Accuracy: 0.9304
Epoch: 8/15 Train Loss: 0.6482 Accuracy: 0.9362 Time: 29.19816  | Val Loss: 0.6694 Accuracy: 0.9312
Epoch: 9/15 Train Loss: 0.6432 Accuracy: 0.9387 Time: 29.14522  | Val Loss: 0.6682 Accuracy: 0.9334
Epoch: 10/15 Train Loss: 0.6448 Accuracy: 0.9366 Time: 29.21176  | Val Loss: 0.67