# VGGNet
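VGGNet was proposed in 2014 by the Visual Geometry Group (VGG) at the University of Oxford and is an important milestone in the early development of deep learning. Its central goal was to study how model depth relates to performance in image recognition. Before VGGNet, AlexNet had won the 2012 ImageNet competition with an 8-layer network, but network depth and architecture design had not yet been explored systematically. The prevailing view at the time was that adding depth would cause vanishing gradients and a sharp rise in computational cost, making networks hard to train. How can a deeper network improve feature representation while avoiding parameter explosion and training difficulties? By using small 3×3 convolution kernels throughout and a standardized layer structure, VGGNet successfully built deep networks of 16 to 19 layers and validated the hypothesis that depth improves performance.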

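The model has the following characteristics:
* Stacked small convolution kernels: several 3×3 convolutions replace a single large kernel such as 5×5 or 7×7 (AlexNet, for comparison, used 11×11 and 5×5 kernels in its first layers). Two stacked 3×3 convolutions have the same receptive field as one 5×5 convolution, yet use 28% fewer parameters (2×3×3 = 18 versus 5×5 = 25 weights per input/output channel pair) and add an extra nonlinearity (ReLU), which strengthens feature learning.
* Standardized architecture: the network is divided into 5 stages, each consisting of a few convolutional layers (2 to 3 in VGG16, up to 4 in VGG19) followed by one max-pooling layer (2×2, stride 2). This "convolutional block" pattern simplifies the structure and makes it easy to scale.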
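
The receptive-field and parameter claims in the first bullet can be checked with a few lines of arithmetic. The sketch below is only illustrative: it assumes stride-1 convolutions with the same number of input and output channels and ignores biases, and the helper names and the channel width `C` are hypothetical, not part of the `hdd` package.

```python
# Compare one 5x5 convolution with two stacked 3x3 convolutions.
# Assumptions: stride 1, C input and C output channels, biases ignored.


def conv_weights(kernel_size: int, channels: int) -> int:
    """Weight count of a single k x k convolution mapping C -> C channels."""
    return kernel_size * kernel_size * channels * channels


def stacked_receptive_field(kernel_size: int, num_layers: int) -> int:
    """Receptive field of num_layers stacked stride-1 k x k convolutions."""
    return 1 + num_layers * (kernel_size - 1)


C = 64  # illustrative channel width; the ratio does not depend on it
single_5x5 = conv_weights(5, C)
double_3x3 = 2 * conv_weights(3, C)
saving = 1 - double_3x3 / single_5x5

print(stacked_receptive_field(3, 2))  # 5, same as a single 5x5 kernel
print(f"{saving:.0%}")  # 28%
```

The same formula gives a 7×7 receptive field for three stacked 3×3 layers, which is why deeper stacks can stand in for even larger kernels.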

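VGGNet demonstrated the key role of depth in performance and directly inspired deeper designs such as ResNet and DenseNet. For example, ResNet uses residual connections to address the gradient problems that VGG-style networks run into when they are made very deep. VGGNet's feature extractors are also widely used for transfer learning.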

The VGGNet architecture is shown in the figure below:

![VGGNet architecture](resources/vgg_arch.png "VGGNet architecture")
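
As a rough companion to the figure, the five-stage layout can be written down as plain data. The per-stage convolution counts below follow the configuration table of the VGG paper for the four variants trained in this notebook (A, B, D, E); the names `CONVS_PER_STAGE`, `weight_layers`, and `spatial_sizes` are hypothetical, and the `cfgs` dictionary in `hdd.models.cnn.vggnet` may encode the same information differently.

```python
# Conv layers per stage for configurations A (VGG11), B (VGG13),
# D (VGG16) and E (VGG19), as listed in the VGG paper.
CONVS_PER_STAGE = {
    "A": [1, 1, 2, 2, 2],
    "B": [2, 2, 2, 2, 2],
    "D": [2, 2, 3, 3, 3],
    "E": [2, 2, 4, 4, 4],
}
STAGE_CHANNELS = [64, 128, 256, 512, 512]  # output channels of each stage
NUM_FC_LAYERS = 3  # fully connected layers after the last pooling layer


def weight_layers(config: str) -> int:
    """Number of layers with weights: all conv layers plus the FC layers."""
    return sum(CONVS_PER_STAGE[config]) + NUM_FC_LAYERS


def spatial_sizes(input_size: int = 224) -> list:
    """Feature-map side length after each stage.

    3x3 convolutions with padding 1 keep the size; each 2x2, stride-2
    max-pooling layer halves it.
    """
    sizes = [input_size]
    for _ in STAGE_CHANNELS:
        sizes.append(sizes[-1] // 2)
    return sizes


for name in CONVS_PER_STAGE:
    print(name, weight_layers(name))
print(spatial_sizes())
```

For a 224×224 input the feature map shrinks to 7×7 after the fifth stage, which is the size the fully connected layers operate on.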

In [1]:
# Automatically reload external modules so that code changes take effect without re-importing
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2

import time

from hdd.device.utils import get_device
from hdd.dataset.imagenette_in_memory import ImagenetteInMemory, get_mean_and_std

import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms

# Set the path to the training data
DATA_ROOT = "~/workspace/hands-dirty-on-dl/dataset"
DEVICE = get_device(["cuda", "cpu"])
TENSORBOARD_ROOT = "~/workspace/hands-dirty-on-dl/dataset"
print("Use device: ", DEVICE)

Use device:  cuda


### Load the Imagenette dataset

In [2]:
from hdd.data_util.transforms import RandomResize

TRAIN_MEAN = [0.4625, 0.4580, 0.4295]
TRAIN_STD = [0.2452, 0.2390, 0.2469]
train_dataset_transforms = transforms.Compose(
    [
        RandomResize([256, 296, 384]),  # randomly pick one of the three sizes for the resize
        transforms.RandomRotation(10),
        transforms.RandomCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize(mean=TRAIN_MEAN, std=TRAIN_STD),
    ]
)
val_dataset_transforms = transforms.Compose(
    [
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=TRAIN_MEAN, std=TRAIN_STD),
    ]
)

BATCH_SIZE = 32
train_dataloader = torch.utils.data.DataLoader(
    ImagenetteInMemory(
        root=DATA_ROOT,
        split="train",
        size="full",
        download=True,
        transform=train_dataset_transforms,
    ),
    batch_size=BATCH_SIZE,
    shuffle=True,
    num_workers=8,
    pin_memory=True,
)
val_dataloader = torch.utils.data.DataLoader(
    ImagenetteInMemory(
        root=DATA_ROOT,
        split="val",
        size="full",
        download=True,
        transform=val_dataset_transforms,
    ),
    batch_size=BATCH_SIZE,
    shuffle=False,
    num_workers=8,
    pin_memory=True,
)
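
For readers unfamiliar with `transforms.Normalize`, the per-channel operation it applies is `(x - mean) / std` on values that `ToTensor` has already scaled to `[0, 1]`. The pure-Python sketch below mirrors that formula for a single RGB pixel; `normalize_pixel` is a hypothetical helper for illustration only and is not how torchvision implements the transform.

```python
TRAIN_MEAN = [0.4625, 0.4580, 0.4295]
TRAIN_STD = [0.2452, 0.2390, 0.2469]


def normalize_pixel(rgb, mean=TRAIN_MEAN, std=TRAIN_STD):
    """Apply (x - mean) / std per channel to one RGB pixel in [0, 1]."""
    return [(x - m) / s for x, m, s in zip(rgb, mean, std)]


# A pixel equal to the dataset mean maps to zero in every channel.
print(normalize_pixel(TRAIN_MEAN))
# A pure white pixel ends up a couple of standard deviations above zero.
print(normalize_pixel([1.0, 1.0, 1.0]))
```

Using the training-set statistics for both the training and validation transforms keeps the two input distributions on the same scale.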

In [3]:
from hdd.models.cnn.vggnet import VGGNet, cfgs
from hdd.train.early_stopping import EarlyStoppingInMem
from hdd.train.classification_utils import naive_train_classification_model


net = VGGNet(cfgs["E"], num_classes=10, dropout=0.2).to(DEVICE)
criteria = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(net.parameters(), lr=0.005, momentum=0.9)
scheduler = torch.optim.lr_scheduler.StepLR(
    optimizer, step_size=20, gamma=0.5, last_epoch=-1
)
early_stopper = EarlyStoppingInMem(patience=25, verbose=False)
max_epochs = 150
_ = naive_train_classification_model(
    net,
    criteria,
    max_epochs,
    train_dataloader,
    val_dataloader,
    DEVICE,
    optimizer,
    scheduler,
    early_stopper,
    verbose=True,
)

Epoch: 1/150 Train Loss: 2.0089 Accuracy: 0.2974 Time: 37.23322  | Val Loss: 1.7045 Accuracy: 0.4201
Epoch: 2/150 Train Loss: 1.6748 Accuracy: 0.4314 Time: 37.05103  | Val Loss: 1.5062 Accuracy: 0.5065
Epoch: 3/150 Train Loss: 1.4549 Accuracy: 0.5201 Time: 37.16928  | Val Loss: 1.2228 Accuracy: 0.5944
Epoch: 4/150 Train Loss: 1.3202 Accuracy: 0.5711 Time: 37.15570  | Val Loss: 1.0789 Accuracy: 0.6499
Epoch: 5/150 Train Loss: 1.2176 Accuracy: 0.6096 Time: 35.94672  | Val Loss: 1.0751 Accuracy: 0.6487
Epoch: 6/150 Train Loss: 1.1335 Accuracy: 0.6326 Time: 36.24680  | Val Loss: 0.9804 Accuracy: 0.6805
Epoch: 7/150 Train Loss: 1.0783 Accuracy: 0.6545 Time: 35.61917  | Val Loss: 1.2549 Accuracy: 0.6275
Epoch: 8/150 Train Loss: 1.0210 Accuracy: 0.6760 Time: 36.25680  | Val Loss: 0.8412 Accuracy: 0.7343
Epoch: 9/150 Train Loss: 0.9532 Accuracy: 0.6976 Time: 35.60282  | Val Loss: 0.9126 Accuracy: 0.7139
Epoch: 10/150 Train Loss: 0.9195 Accuracy: 0.7073 Time: 35.72305  | Val Loss: 0.7800 Accura
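
The learning-rate schedule configured above can be tabulated by hand. `StepLR` multiplies the learning rate by `gamma` every `step_size` scheduler steps; the sketch below assumes that `naive_train_classification_model` calls `scheduler.step()` once per epoch, which is not shown in this notebook. `step_lr` is a hypothetical helper, not part of PyTorch.

```python
def step_lr(base_lr: float, epoch: int, step_size: int = 20, gamma: float = 0.5) -> float:
    """Closed form of a StepLR schedule: base_lr * gamma ** (epoch // step_size)."""
    return base_lr * gamma ** (epoch // step_size)


BASE_LR = 0.005  # matches the SGD learning rate used for configuration E

# Print the learning rate at the start of each 20-epoch segment.
for epoch in range(0, 150, 20):
    print(f"epoch {epoch:3d}: lr = {step_lr(BASE_LR, epoch):.8f}")
```

Under that assumption the rate halves every 20 epochs, so training would finish near 4e-5 if it ran for all 150 epochs without early stopping.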

In [4]:
from hdd.train.classification_utils import eval_image_classifier

eval_result = eval_image_classifier(net, val_dataloader.dataset, DEVICE)
ss = [result.gt_label == result.predicted_label for result in eval_result]
print(f"Accuracy: {sum(ss) / len(ss)}")

Accuracy: 0.9156687898089172


On this **small** dataset, we tested the VGG A, B, D, and E configurations:

| VGG configuration | Val accuracy | Learning rate |
| :---: | :---: | :---: |
| A | 90.29% | 0.01 |
| B | 91.16% | 0.01 |
| D | 91.49% | 0.005 |
| E | 91.57% | 0.005 |