## Summary of ResNeXt's Key Innovations

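ResNeXt is an image classification network presented at CVPR 2017 by Saining Xie, Kaiming He, and colleagues. It improves on ResNet, and its core innovation is **aggregated residual transformations**, which raise accuracy and efficiency through the following techniques: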

---

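#### 1. **Cardinality**
- **Definition**: Cardinality is the number of parallel branches in a block, each resembling the residual branch of ResNet. It is a new hyperparameter dimension, distinct from depth and width.
- **Effect**: Increasing cardinality, rather than only making the network deeper or wider, improves accuracy more efficiently while keeping the parameter count under control.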
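
As a compact way to read the definition above, the aggregated transformation can be written in the notation of the ResNeXt paper, where $C$ is the cardinality and every $\mathcal{T}_i$ is a branch with the same topology:

$$
\mathbf{y} = \mathbf{x} + \sum_{i=1}^{C} \mathcal{T}_i(\mathbf{x})
$$

Setting $C = 1$ recovers an ordinary residual block. The name `ResNextNet50_32_4` used later in this notebook presumably denotes $C = 32$ branches with a bottleneck width of 4 channels each; that reading is an assumption based on the usual "32x4d" naming.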

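#### 2. **Grouped Convolution**
- **How it works**: The input channels are split into several groups, each group is convolved independently, and the outputs are concatenated.
- **Advantages**:
  - Fewer computations and parameters than a standard convolution with the same input and output channels.
  - More diverse features, which can improve robustness.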
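
The saving can be checked directly with PyTorch's `groups` argument of `nn.Conv2d`. The sketch below is only an illustration: the channel count (128) and group count (32) are chosen for the example, not taken from the notebook's model code.

```python
import torch
import torch.nn as nn


def count_params(module: nn.Module) -> int:
    """Number of trainable parameters in a module."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)


# Standard 3x3 convolution: every output channel sees all 128 input channels.
standard = nn.Conv2d(128, 128, kernel_size=3, padding=1, bias=False)
# Grouped 3x3 convolution: 32 groups, each mapping 4 input channels to 4 output channels.
grouped = nn.Conv2d(128, 128, kernel_size=3, padding=1, groups=32, bias=False)

x = torch.randn(1, 128, 56, 56)
assert standard(x).shape == grouped(x).shape  # same output shape either way

print(count_params(standard))  # 128 * 128 * 9 = 147456
print(count_params(grouped))   # 128 * (128 / 32) * 9 = 4608
```

With `groups=g`, the weight count of the layer drops by a factor of `g`, which is what lets ResNeXt widen the 3x3 stage without increasing the overall budget.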

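#### 3. **Aggregated Residual Block**
- **Structure**: Each block aggregates **parallel branches that share the same topology** (in practice implemented as a grouped convolution), in place of the single-path three-layer bottleneck block of ResNet.
- **Advantages**:
  - Simplifies network design and reduces the number of hyperparameters.
  - Strengthens representational power by fusing features from multiple paths.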
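
A minimal sketch of such a block is shown below, using the grouped-convolution form. The class name `ResNeXtBottleneckSketch` and its arguments are hypothetical and chosen for illustration; this is not the `hdd.models.cnn.resnext` implementation, whose details are not shown in this notebook.

```python
import torch
import torch.nn as nn


class ResNeXtBottleneckSketch(nn.Module):
    """Illustrative ResNeXt bottleneck block (hypothetical, not the hdd code)."""

    def __init__(self, in_channels, out_channels, cardinality=32, base_width=4, stride=1):
        super().__init__()
        width = cardinality * base_width  # total channels across all branches
        self.body = nn.Sequential(
            # 1x1 conv: reduce to the aggregated bottleneck width
            nn.Conv2d(in_channels, width, kernel_size=1, bias=False),
            nn.BatchNorm2d(width),
            nn.ReLU(inplace=True),
            # 3x3 grouped conv: one group per branch
            nn.Conv2d(width, width, kernel_size=3, stride=stride, padding=1,
                      groups=cardinality, bias=False),
            nn.BatchNorm2d(width),
            nn.ReLU(inplace=True),
            # 1x1 conv: merge the branches and restore the output width
            nn.Conv2d(width, out_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_channels),
        )
        if stride != 1 or in_channels != out_channels:
            # Projection shortcut when the shape changes
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels),
            )
        else:
            self.shortcut = nn.Identity()
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # Residual connection: aggregated transformation plus shortcut
        return self.relu(self.body(x) + self.shortcut(x))


block = ResNeXtBottleneckSketch(256, 256)
out = block(torch.randn(2, 256, 56, 56))
print(out.shape)  # torch.Size([2, 256, 56, 56])
```

The only structural difference from a ResNet bottleneck in this sketch is the `groups=cardinality` argument on the 3x3 convolution and the wider intermediate `width`.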

![alt text](resources/resnext_comparison.png "Title")


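### Summary
The core idea of ResNeXt is **grouped convolution + multi-path aggregation**, which improves accuracy while keeping computational complexity comparable to that of ResNet. Its design philosophy emphasizes cardinality as a new dimension, an idea that also informs later lightweight models such as MobileNet and ShuffleNet.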

![alt text](resources/resnext_block.png "Title")

In [4]:
# Automatically reload external modules so that code changes take effect without re-importing
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2

from hdd.device.utils import get_device

import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms

# Path to the training data
DATA_ROOT = "~/workspace/hands-dirty-on-dl/dataset"
# Path for TensorBoard logs
TENSORBOARD_ROOT = "~/workspace/hands-dirty-on-dl/dataset"
# Path for pretrained model weights
TORCH_HUB_PATH = "~/workspace/hands-dirty-on-dl/pretrained_models"
torch.hub.set_dir(TORCH_HUB_PATH)
# Pick the best available training device
DEVICE = get_device(["cuda", "cpu"])


The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [2]:
from hdd.dataset.imagenette_in_memory import ImagenetteInMemory
from hdd.data_util.auto_augmentation import ImageNetPolicy

from hdd.data_util.transforms import RandomResize
from torch.utils.data import DataLoader

TRAIN_MEAN = [0.4625, 0.4580, 0.4295]
TRAIN_STD = [0.2452, 0.2390, 0.2469]
train_dataset_transforms = transforms.Compose(
    [
        RandomResize([256, 296, 384]),  # resize to one of the three sizes, chosen at random
        transforms.RandomCrop(224),
        transforms.RandomHorizontalFlip(),
        ImageNetPolicy(),
        transforms.ToTensor(),
        transforms.Normalize(mean=TRAIN_MEAN, std=TRAIN_STD),
    ]
)
val_dataset_transforms = transforms.Compose(
    [
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=TRAIN_MEAN, std=TRAIN_STD),
    ]
)
train_dataset = ImagenetteInMemory(
    root=DATA_ROOT,
    split="train",
    size="full",
    download=True,
    transform=train_dataset_transforms,
)
val_dataset = ImagenetteInMemory(
    root=DATA_ROOT,
    split="val",
    size="full",
    download=True,
    transform=val_dataset_transforms,
)


def build_dataloader(batch_size, train_dataset, val_dataset):
    train_dataloader = DataLoader(
        train_dataset, batch_size=batch_size, shuffle=True, num_workers=8
    )
    val_dataloader = DataLoader(
        val_dataset, batch_size=batch_size, shuffle=False, num_workers=8
    )
    return train_dataloader, val_dataloader

In [3]:
from hdd.models.cnn.resnext import ResNextNet50_32_4, ResNextNet
from hdd.train.classification_utils import (
    naive_train_classification_model,
    eval_image_classifier,
)
from hdd.models.nn_utils import count_trainable_parameter


def train_net(
    train_dataloader,
    val_dataloader,
    lr=1e-3,
    weight_decay=1e-3,
    max_epochs=200,
) -> tuple[ResNextNet, dict[str, list[float]]]:
    net = ResNextNet50_32_4(num_classes=10, dropout=0.5).to(DEVICE)
    print(f"#Parameter: {count_trainable_parameter(net)}")
    criteria = nn.CrossEntropyLoss(label_smoothing=0.1)
    optimizer = torch.optim.AdamW(net.parameters(), lr=lr, weight_decay=weight_decay)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
        optimizer, max_epochs, eta_min=lr / 100
    )
    training_stats = naive_train_classification_model(
        net,
        criteria,
        max_epochs,
        train_dataloader,
        val_dataloader,
        DEVICE,
        optimizer,
        scheduler,
        verbose=True,
    )
    return net, training_stats


train_dataloader, val_dataloader = build_dataloader(64, train_dataset, val_dataset)

net, width_multiplier_1 = train_net(
    train_dataloader,
    val_dataloader,
    lr=0.001,
    weight_decay=0,
)

eval_result = eval_image_classifier(net, val_dataloader.dataset, DEVICE)
ss = [result.gt_label == result.predicted_label for result in eval_result]
print(f"#Parameter: {count_trainable_parameter(net)} Accuracy: {sum(ss) / len(ss)}")

#Parameter: 23000394
Epoch: 1/200 Train Loss: 2.3253 Accuracy: 0.2187 Time: 21.82687  | Val Loss: 2.3011 Accuracy: 0.3302
Epoch: 2/200 Train Loss: 2.0201 Accuracy: 0.3379 Time: 21.39554  | Val Loss: 2.3912 Accuracy: 0.3990
Epoch: 3/200 Train Loss: 1.8613 Accuracy: 0.4157 Time: 21.45586  | Val Loss: 2.1527 Accuracy: 0.4986
Epoch: 4/200 Train Loss: 1.7678 Accuracy: 0.4559 Time: 21.57703  | Val Loss: 1.5627 Accuracy: 0.5689
Epoch: 5/200 Train Loss: 1.6942 Accuracy: 0.4941 Time: 21.31908  | Val Loss: 2.0545 Accuracy: 0.5531
Epoch: 6/200 Train Loss: 1.6133 Accuracy: 0.5270 Time: 21.71740  | Val Loss: 1.5808 Accuracy: 0.5870
Epoch: 7/200 Train Loss: 1.5650 Accuracy: 0.5518 Time: 21.32236  | Val Loss: 1.3356 Accuracy: 0.6622
Epoch: 8/200 Train Loss: 1.5192 Accuracy: 0.5744 Time: 21.96170  | Val Loss: 1.3602 Accuracy: 0.6441
Epoch: 9/200 Train Loss: 1.4673 Accuracy: 0.5930 Time: 21.74933  | Val Loss: 1.2467 Accuracy: 0.7019
Epoch: 10/200 Train Loss: 1.4190 Accuracy: 0.6169 Time: 21.05814  | Va