# MobileNet V1 Model Summary

## Key Innovations

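### 1. Depthwise Separable Convolution
- Factorizes a standard convolution into two independent operations:
  - **Depthwise Convolution**: applies a separate spatial filter to each input channel.
  - **Pointwise Convolution**: uses a `1x1` convolution to mix information across channels.
- Greatly reduces computation and parameter count, which makes it suitable for mobile and embedded devices. This split, spatial filtering within each channel followed by mixing across channels, closely resembles the token-mixing and channel-mixing layers of the more recent MLP-Mixer.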
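
To make the saving concrete, here is a minimal PyTorch sketch comparing one standard 3x3 convolution with its depthwise separable counterpart. The channel sizes (64 in, 128 out) and the helper `count_params` are illustrative choices, not taken from the notebook. For a `k x k` kernel and `N` output channels the weight ratio works out to `1/N + 1/k²`, the same reduction factor the MobileNet paper gives for computation.

```python
import torch
import torch.nn as nn


def count_params(module: nn.Module) -> int:
    """Number of trainable parameters in a module."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)


in_ch, out_ch, k = 64, 128, 3  # illustrative sizes

# Standard 3x3 convolution: every output channel looks at every input channel.
standard = nn.Conv2d(in_ch, out_ch, kernel_size=k, padding=1, bias=False)

# Depthwise separable version: a per-channel 3x3 conv (groups=in_ch)
# followed by a 1x1 conv that mixes channels.
separable = nn.Sequential(
    nn.Conv2d(in_ch, in_ch, kernel_size=k, padding=1, groups=in_ch, bias=False),
    nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False),
)

x = torch.randn(1, in_ch, 56, 56)
assert standard(x).shape == separable(x).shape  # both produce the same output shape

n_std = count_params(standard)   # k*k*in_ch*out_ch
n_sep = count_params(separable)  # k*k*in_ch + in_ch*out_ch
print(n_std, n_sep, n_sep / n_std)
```

With these sizes the separable version needs roughly 12% of the weights of the standard convolution.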

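### 2. Lightweight Design
- Achieves efficient inference by removing redundant computation while keeping accuracy competitive.
- The model is far smaller than conventional convolutional networks such as VGG and Inception: the full MobileNet V1 has about 4.2M parameters, versus roughly 138M for VGG16.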

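### 3. Width Multiplier
- Introduces a hyperparameter `α ∈ (0, 1]` that scales the number of input and output channels of every layer uniformly, compressing the model further. Computation and parameter count shrink roughly in proportion to `α²`.
- Lets you trade accuracy for speed, which makes deployment more flexible.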
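
The quadratic effect of `α` can be checked with a few lines of arithmetic. The sketch below counts the weights of one depthwise + pointwise pair; the helper names and the rounding rule (`int(channels * alpha)`) are assumptions for illustration, and the notebook's `MobileNetV1` may round channels differently. The last lines compare against the parameter counts printed by the training cells further down (3228170 for `α = 1`, 1832458 for `α = 0.75`).

```python
def scaled(channels: int, alpha: float) -> int:
    """Channel count after applying the width multiplier (assumed rounding)."""
    return max(1, int(channels * alpha))


def separable_params(m: int, n: int, alpha: float, k: int = 3) -> int:
    """Weights of one depthwise (k x k) + pointwise (1 x 1) pair, without biases."""
    m_a, n_a = scaled(m, alpha), scaled(n, alpha)
    return k * k * m_a + m_a * n_a


base = separable_params(512, 512, alpha=1.0)
for alpha in (1.0, 0.75, 0.5, 0.25):
    p = separable_params(512, 512, alpha)
    print(f"alpha={alpha}: {p} weights, {p / base:.3f} of the alpha=1 layer")

# Whole-model parameter counts reported in this notebook's training output.
observed_ratio = 1832458 / 3228170
print(f"observed alpha=0.75 ratio: {observed_ratio:.3f} (0.75**2 = {0.75**2:.4f})")
```

The observed whole-model ratio sits close to `0.75²`, as expected when the pointwise layers dominate the parameter count.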

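### 4. Resolution Multiplier
- Introduces a second hyperparameter `ρ ∈ (0, 1]` that scales the input image resolution, and with it every internal feature map, as another knob on model complexity.
- Lets you match the input size to the device's capability: computation falls roughly in proportion to `ρ²`, while the parameter count stays the same.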
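
As a rough worked example, the sketch below counts multiply-adds for one depthwise + pointwise pair with 512 channels at several input resolutions. It assumes this layer sees the input downsampled by a total stride of 16 (14x14 feature maps for a 224x224 input, as in the paper's architecture table); the function name is illustrative.

```python
def separable_mult_adds(m: int, n: int, feature_size: int, k: int = 3) -> int:
    """Multiply-adds of one depthwise (k x k) + pointwise (1 x 1) pair."""
    depthwise = k * k * m * feature_size * feature_size
    pointwise = m * n * feature_size * feature_size
    return depthwise + pointwise


reference = separable_mult_adds(512, 512, 224 // 16)
for resolution in (224, 192, 160, 128):
    feature_size = resolution // 16  # assumed total stride of 16 before this layer
    cost = separable_mult_adds(512, 512, feature_size)
    rho = resolution / 224
    print(
        f"input {resolution}: {cost} mult-adds, "
        f"{cost / reference:.3f} of the 224 cost (rho**2 = {rho**2:.3f})"
    )
```

The weights of the layer are identical at every resolution; only the number of positions they are applied to changes.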

### 5. Modular Structure
- The whole network is a stack of depthwise separable convolution blocks, which makes it easy to reuse and extend.
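
The stacking can be sketched in a few dozen lines of PyTorch. This is not the `hdd.models.cnn.mobilenet_v1.MobileNetV1` used later in the notebook, whose source is not shown here. It is a minimal stand-in that follows the layer table of the MobileNet paper, with assumed details: the class name `TinyMobileNetV1`, plain ReLU activations, bias-free convolutions, and a floor of 8 channels when applying the width multiplier.

```python
import torch
import torch.nn as nn


def conv_bn_relu(in_ch, out_ch, kernel_size, stride, groups=1):
    """Conv -> BatchNorm -> ReLU, the basic unit used throughout this sketch."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size, stride,
                  padding=kernel_size // 2, groups=groups, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )


def separable_block(in_ch, out_ch, stride):
    """3x3 depthwise conv (groups=in_ch) followed by a 1x1 pointwise conv."""
    return nn.Sequential(
        conv_bn_relu(in_ch, in_ch, 3, stride, groups=in_ch),
        conv_bn_relu(in_ch, out_ch, 1, 1),
    )


# (output channels, stride) of each separable block, following the paper's table.
CONFIG = [(64, 1), (128, 2), (128, 1), (256, 2), (256, 1), (512, 2),
          (512, 1), (512, 1), (512, 1), (512, 1), (512, 1), (1024, 2), (1024, 1)]


class TinyMobileNetV1(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, num_classes=10, width_multiplier=1.0):
        super().__init__()

        def ch(c):
            # Assumed rounding rule: scale, truncate, keep at least 8 channels.
            return max(8, int(c * width_multiplier))

        in_ch = ch(32)
        layers = [conv_bn_relu(3, in_ch, 3, 2)]  # the only standard convolution
        for out_ch, stride in CONFIG:
            layers.append(separable_block(in_ch, ch(out_ch), stride))
            in_ch = ch(out_ch)
        self.features = nn.Sequential(*layers)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(in_ch, num_classes)

    def forward(self, x):
        x = self.pool(self.features(x)).flatten(1)
        return self.fc(x)


model = TinyMobileNetV1(num_classes=10, width_multiplier=1.0).eval()
with torch.no_grad():
    logits = model(torch.randn(2, 3, 224, 224))
print(logits.shape)  # torch.Size([2, 10])
```

Because every stage is the same block with different channel counts and strides, changing the architecture amounts to editing `CONFIG`.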

---

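## Drawbacks and Limitations

### 1. Slightly Lower Accuracy than Standard Models
- On the same dataset, Top-1 accuracy is slightly lower than that of larger models such as ResNet and Inception.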

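### 2. Limited Receptive Field
- The network relies heavily on small kernels (such as 3x3) and depthwise convolutions, which may limit the receptive field available for feature extraction.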

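### 3. Reliance on Manual Design
- The architecture is hand-designed. Unlike later work such as MobileNetV3 and NASNet, it does not use neural architecture search (NAS) to optimize performance.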

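### 4. Less Efficient Information Flow
- Depthwise separable convolutions may limit representational capacity. On high-level semantic tasks in particular, the network can underperform densely connected networks.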

---

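## Summary

MobileNet V1 is a pioneering lightweight convolutional neural network. Its core idea, the **depthwise separable convolution**, laid the groundwork for later lightweight models. Although it gives up some accuracy and representational capacity, it remains highly valuable for mobile devices and edge computing.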

In [1]:
# Automatically reload external modules so that code changes take effect without re-importing
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2

from hdd.device.utils import get_device

import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms

# Path to the training data
DATA_ROOT = "~/workspace/hands-dirty-on-dl/dataset"
# Path for TensorBoard logs
TENSORBOARD_ROOT = "~/workspace/hands-dirty-on-dl/dataset"
# Path for pretrained model weights
TORCH_HUB_PATH = "~/workspace/hands-dirty-on-dl/pretrained_models"
torch.hub.set_dir(TORCH_HUB_PATH)
# Pick the most suitable training device
DEVICE = get_device(["cuda", "cpu"])
print("Use device: ", DEVICE)

Use device:  cuda
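
`get_device` comes from the author's own `hdd` package, whose source is not shown in this notebook. For readers without that package, a hypothetical stand-in is sketched below; the name `pick_device` and its behavior (return the first available device from an ordered preference list) are assumptions, and the real helper may differ.

```python
import torch


def pick_device(preferences):
    """Return the first usable device from an ordered list of device names.

    Hypothetical stand-in for `hdd.device.utils.get_device`.
    """
    for name in preferences:
        if name == "cuda" and torch.cuda.is_available():
            return torch.device("cuda")
        if name == "mps" and torch.backends.mps.is_available():
            return torch.device("mps")
        if name == "cpu":
            return torch.device("cpu")
    raise RuntimeError(f"none of {preferences} is available")


device = pick_device(["cuda", "cpu"])
print("Use device: ", device)
```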


In [2]:
from hdd.dataset.imagenette_in_memory import ImagenetteInMemory
from hdd.data_util.transforms import RandomResize
from torch.utils.data import DataLoader

TRAIN_MEAN = [0.4625, 0.4580, 0.4295]
TRAIN_STD = [0.2452, 0.2390, 0.2469]
train_dataset_transforms = transforms.Compose(
    [
        RandomResize([256, 296, 384]),  # randomly pick one of the three sizes for the resize
        transforms.RandomRotation(10),
        transforms.RandomCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize(mean=TRAIN_MEAN, std=TRAIN_STD),
    ]
)
val_dataset_transforms = transforms.Compose(
    [
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=TRAIN_MEAN, std=TRAIN_STD),
    ]
)
train_dataset = ImagenetteInMemory(
    root=DATA_ROOT,
    split="train",
    size="full",
    download=True,
    transform=train_dataset_transforms,
)
val_dataset = ImagenetteInMemory(
    root=DATA_ROOT,
    split="val",
    size="full",
    download=True,
    transform=val_dataset_transforms,
)


def build_dataloader(batch_size, train_dataset, val_dataset):
    train_dataloader = DataLoader(
        train_dataset, batch_size=batch_size, shuffle=True, num_workers=8
    )
    val_dataloader = DataLoader(
        val_dataset, batch_size=batch_size, shuffle=False, num_workers=8
    )
    return train_dataloader, val_dataloader

In [None]:
from hdd.models.cnn.mobilenet_v1 import MobileNetV1
from hdd.train.classification_utils import (
    naive_train_classification_model,
    eval_image_classifier,
)
from hdd.models.nn_utils import count_trainable_parameter


def train_net(
    train_dataloader,
    val_dataloader,
    width_multiplier,
    lr=1e-3,
    weight_decay=1e-5,
    max_epochs=150,
) -> tuple[MobileNetV1, dict[str, list[float]]]:
    net = MobileNetV1(num_classes=10, width_multiplier=width_multiplier).to(DEVICE)
    print(f"#Parameter: {count_trainable_parameter(net)}")
    criteria = nn.CrossEntropyLoss(label_smoothing=0.1)
    optimizer = torch.optim.SGD(
        net.parameters(), lr=lr, momentum=0.9, weight_decay=weight_decay
    )

    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
        optimizer, max_epochs, eta_min=lr / 100
    )
    training_stats = naive_train_classification_model(
        net,
        criteria,
        max_epochs,
        train_dataloader,
        val_dataloader,
        DEVICE,
        optimizer,
        scheduler,
        verbose=True,
    )
    return net, training_stats


train_dataloader, val_dataloader = build_dataloader(128, train_dataset, val_dataset)

net, width_multiplier_1 = train_net(
    train_dataloader,
    val_dataloader,
    width_multiplier=1,
    lr=0.01,
    weight_decay=0,
    max_epochs=150,
)

eval_result = eval_image_classifier(net, val_dataloader.dataset, DEVICE)
ss = [result.gt_label == result.predicted_label for result in eval_result]
print(f"#Parameter: {count_trainable_parameter(net)} Accuracy: {sum(ss) / len(ss)}")

#Parameter: 3228170
Epoch: 1/150 Train Loss: 2.1544 Accuracy: 0.1938 Time: 7.30377  | Val Loss: 2.6199 Accuracy: 0.1995
Epoch: 2/150 Train Loss: 1.8218 Accuracy: 0.3549 Time: 7.14844  | Val Loss: 2.4292 Accuracy: 0.3078
Epoch: 3/150 Train Loss: 1.5882 Accuracy: 0.4516 Time: 7.38312  | Val Loss: 1.9080 Accuracy: 0.3748
Epoch: 4/150 Train Loss: 1.4126 Accuracy: 0.5257 Time: 7.18147  | Val Loss: 1.5030 Accuracy: 0.5200
Epoch: 5/150 Train Loss: 1.3219 Accuracy: 0.5570 Time: 7.21082  | Val Loss: 1.3730 Accuracy: 0.5457
Epoch: 6/150 Train Loss: 1.2387 Accuracy: 0.5925 Time: 7.25243  | Val Loss: 1.1495 Accuracy: 0.6227
Epoch: 7/150 Train Loss: 1.1511 Accuracy: 0.6235 Time: 7.22659  | Val Loss: 1.1468 Accuracy: 0.6252
Epoch: 8/150 Train Loss: 1.0970 Accuracy: 0.6419 Time: 7.17840  | Val Loss: 1.0378 Accuracy: 0.6650
Epoch: 9/150 Train Loss: 1.0533 Accuracy: 0.6596 Time: 7.22944  | Val Loss: 0.9100 Accuracy: 0.6968
Epoch: 10/150 Train Loss: 1.0110 Accuracy: 0.6720 Time: 7.20997  | Val Loss: 0.8
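
For reference, the learning-rate schedule configured in `train_net` can be written in closed form. The sketch below evaluates the cosine annealing formula for the settings used in this run (`lr=0.01`, `max_epochs=150`, `eta_min=lr / 100`). It assumes `naive_train_classification_model` steps the scheduler once per epoch, which the notebook does not show; the function name `cosine_lr` is illustrative.

```python
import math


def cosine_lr(epoch, base_lr=0.01, max_epochs=150, eta_min=None):
    """Closed-form cosine annealing value after `epoch` scheduler steps."""
    if eta_min is None:
        eta_min = base_lr / 100
    cos = math.cos(math.pi * epoch / max_epochs)
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + cos)


for epoch in (0, 10, 75, 140, 150):
    print(f"epoch {epoch:3d}: lr = {cosine_lr(epoch):.6f}")
```

The rate starts at `0.01`, passes the midpoint of `0.00505` at epoch 75, and ends at `0.0001`.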

In [4]:
net, width_multiplier_75 = train_net(
    train_dataloader,
    val_dataloader,
    width_multiplier=0.75,
    lr=0.01,
    weight_decay=0,
    max_epochs=150,
)

eval_result = eval_image_classifier(net, val_dataloader.dataset, DEVICE)
ss = [result.gt_label == result.predicted_label for result in eval_result]
print(f"#Parameter: {count_trainable_parameter(net)} Accuracy: {sum(ss) / len(ss)}")

#Parameter: 1832458
Epoch: 1/150 Train Loss: 2.1558 Accuracy: 0.1941 Time: 6.08100  | Val Loss: 3.2276 Accuracy: 0.1692
Epoch: 2/150 Train Loss: 1.9888 Accuracy: 0.2768 Time: 6.06480  | Val Loss: 2.0031 Accuracy: 0.2790
Epoch: 3/150 Train Loss: 1.7359 Accuracy: 0.4074 Time: 6.06622  | Val Loss: 2.0154 Accuracy: 0.3761
Epoch: 4/150 Train Loss: 1.5375 Accuracy: 0.4765 Time: 6.06678  | Val Loss: 1.5029 Accuracy: 0.5011
Epoch: 5/150 Train Loss: 1.4244 Accuracy: 0.5238 Time: 6.08187  | Val Loss: 1.3914 Accuracy: 0.5287
Epoch: 6/150 Train Loss: 1.3517 Accuracy: 0.5481 Time: 6.12345  | Val Loss: 1.2934 Accuracy: 0.5600
Epoch: 7/150 Train Loss: 1.2755 Accuracy: 0.5813 Time: 6.03408  | Val Loss: 1.2936 Accuracy: 0.5778
Epoch: 8/150 Train Loss: 1.2102 Accuracy: 0.5985 Time: 6.11045  | Val Loss: 1.0763 Accuracy: 0.6479
Epoch: 9/150 Train Loss: 1.1512 Accuracy: 0.6188 Time: 6.04981  | Val Loss: 1.0718 Accuracy: 0.6420
Epoch: 10/150 Train Loss: 1.1019 Accuracy: 0.6343 Time: 6.09722  | Val Loss: 1.1

Exception ignored in: <function _releaseLock at 0x756bacfac4a0>
Traceback (most recent call last):
  File "/home/tf/anaconda3/envs/pytorch-cu124/lib/python3.11/logging/__init__.py", line 237, in _releaseLock
    def _releaseLock():
    
KeyboardInterrupt: 


RuntimeError: DataLoader worker (pid(s) 202067, 202099, 202131, 202163) exited unexpectedly

In [None]:
net, width_multiplier_50 = train_net(
    train_dataloader,
    val_dataloader,
    width_multiplier=0.5,
    lr=0.01,
    weight_decay=0,
    max_epochs=150,
)

eval_result = eval_image_classifier(net, val_dataloader.dataset, DEVICE)
ss = [result.gt_label == result.predicted_label for result in eval_result]
print(f"#Parameter: {count_trainable_parameter(net)} Accuracy: {sum(ss) / len(ss)}")

In [None]:
net, width_multiplier_25 = train_net(
    train_dataloader,
    val_dataloader,
    width_multiplier=0.25,
    lr=0.01,
    weight_decay=0,
    max_epochs=150,
)

eval_result = eval_image_classifier(net, val_dataloader.dataset, DEVICE)
ss = [result.gt_label == result.predicted_label for result in eval_result]
print(f"#Parameter: {count_trainable_parameter(net)} Accuracy: {sum(ss) / len(ss)}")