[**VGG Network (Simonyan and Zisserman, 2014)**](https://arxiv.org/abs/1409.1556) is a convolutional neural network built from a small number of repeating blocks. It is known for its simplicity and for its deep stacks of small 3×3 convolution filters.

<div style="background-color: white; padding: 10px; display: inline-block; width:400px;">
    <img src="imgs/VGG.png" alt="VGG Architecture">
</div>

[VGG Architecture](https://datahacker.rs/deep-learning-vgg-16-vs-vgg-19/)

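The VGG architecture has a simple yet effective design. It stacks small 3×3 convolutional filters (stride 1, padding 1, so height and width are preserved) into VGG blocks. Each block ends with a 2×2 max pooling layer with stride 2, which halves the height and width. As the spatial dimensions shrink from block to block, the convolutions increase the number of channels, so the feature maps become progressively smaller but deeper.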
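The payoff of stacking small filters can be checked with a little arithmetic. The sketch below computes the receptive field of `n` stacked stride-1 3×3 convolutions and compares their weight count with a single convolution covering the same region. It assumes `C` input and `C` output channels in every layer and ignores biases; the helper names are illustrative, not part of any library.

```python
def stacked_receptive_field(num_layers, kernel_size=3):
    """Receptive field (in pixels) of num_layers stacked stride-1 convolutions."""
    # Each additional stride-1 layer widens the field by (kernel_size - 1) pixels
    return 1 + num_layers * (kernel_size - 1)


def conv_weights(kernel_size, channels):
    """Weights in one kernel_size x kernel_size conv with C in and C out channels (no bias)."""
    return kernel_size * kernel_size * channels * channels


C = 256  # assumed channel count, chosen only for illustration
for n in (2, 3):
    rf = stacked_receptive_field(n)
    stacked = n * conv_weights(3, C)
    single = conv_weights(rf, C)
    print(f"{n} x 3x3 -> {rf}x{rf} field: {stacked:,} weights vs {single:,} for one {rf}x{rf} conv")
```

Three stacked 3×3 layers see a 7×7 region with 27C² weights instead of 49C², and they interleave three ReLUs instead of one. This is the argument the VGG paper gives for using small filters.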

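The network consists of five convolutional blocks. The number of filters starts at 64 and doubles after each pooling operation until it reaches 512, giving 64, 128, 256, 512, and 512 channels in the five blocks. After the convolutional layers, the feature maps are flattened and passed through three fully connected (FC) layers. The last FC layer outputs one score per class, and a softmax converts these scores into class probabilities. In the PyTorch code below, the softmax is not a separate layer: the model returns raw logits, and `nn.CrossEntropyLoss` applies the log-softmax internally.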
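The shape bookkeeping behind this description can be traced without a deep learning library. The sketch below applies the standard output-size formula ⌊(n + 2p − k)/s⌋ + 1 to every layer of the VGG-16 configuration (the same list used in `VGG_CONFIGS` later in this notebook) and reports the channels and spatial size after each pooling stage. The function names are hypothetical helpers, and a square input is assumed.

```python
def conv_out(size, kernel_size, stride=1, padding=0):
    """Output height/width of a conv or pooling layer: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel_size) // stride + 1


def trace_vgg_shapes(config, input_size=224, in_channels=3):
    """Return (channels, height) after each max-pool stage for a square input."""
    channels, size, stages = in_channels, input_size, []
    for v in config:
        if v == 'M':
            size = conv_out(size, kernel_size=2, stride=2)  # 2x2 max pool, stride 2: halves the size
            stages.append((channels, size))
        else:
            size = conv_out(size, kernel_size=3, padding=1)  # 3x3 conv, padding 1: size unchanged
            channels = v
    return stages


VGG16 = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M',
         512, 512, 512, 'M', 512, 512, 512, 'M']
stages = trace_vgg_shapes(VGG16)
print(stages)  # [(64, 112), (128, 56), (256, 28), (512, 14), (512, 7)]
channels, size = stages[-1]
print("flattened features:", channels * size * size)  # 25088 = 512 * 7 * 7
print(trace_vgg_shapes(VGG16, input_size=32)[-1])  # (512, 1) for a raw 32x32 image
```

The 224×224 input shrinks to 7×7 after five pools, which is where the `512 * 7 * 7` input size of the first FC layer comes from. A raw 32×32 CIFAR-10 image would shrink to 1×1, which is why the data is resized to 224×224 before training.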

VGG comes in several variants: VGG-11, VGG-13, VGG-16, and VGG-19. The number counts the weight layers (convolutional plus fully connected). Because every variant uses the same three FC layers, the variants differ only in the number of convolutional layers. VGG-16 and VGG-19 are the most commonly used. VGG achieves high accuracy but is computationally expensive. VGG-16 has about 138 million parameters, most of them in the fully connected layers, which makes it memory-intensive. Its convolutional features nonetheless remain a popular choice for feature extraction and transfer learning.
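The ~138 million figure can be reproduced by hand. The sketch below counts weights and biases for the VGG-16 configuration under the classic ImageNet assumptions: 224×224 RGB input, 1000 classes, and no batch normalization. `count_vgg_params` is a hypothetical helper written for this illustration. It mirrors the layer sizes used in the `VGG` class below.

```python
VGG16 = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M',
         512, 512, 512, 'M', 512, 512, 512, 'M']


def count_vgg_params(config, num_classes=1000, in_channels=3):
    """Return (conv_params, fc_params) for a VGG without batch norm and a 224x224 input."""
    conv_params = 0
    for v in config:
        if v != 'M':  # 'M' (max pooling) has no parameters
            conv_params += in_channels * v * 3 * 3 + v  # 3x3 kernel weights + one bias per filter
            in_channels = v
    # Three FC layers: 512*7*7 -> 4096 -> 4096 -> num_classes (weights + biases)
    fc_sizes = [512 * 7 * 7, 4096, 4096, num_classes]
    fc_params = sum(a * b + b for a, b in zip(fc_sizes, fc_sizes[1:]))
    return conv_params, fc_params


conv, fc = count_vgg_params(VGG16)
print(f"conv: {conv:,}  fc: {fc:,}  total: {conv + fc:,}")
# conv: 14,714,688  fc: 123,642,856  total: 138,357,544
```

Roughly 89% of the parameters sit in the three FC layers, and the first one (25088 → 4096) alone accounts for about 103 million. With `num_classes=10`, as in the CIFAR-10 experiment below, the total drops to 134,301,514 because only the last layer shrinks.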

```python
import torch
import torch.nn as nn
import torch.optim as optim
import utils  # local helper module (layer_summary, data loaders, training steps); not shown here
```

```python
# Integers are the output channels of a 3x3 convolution; 'M' is a 2x2 max pool
VGG_CONFIGS = {
    'VGG11': [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'VGG13': [64, 64, 'M', 128, 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'VGG16': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M'],
    'VGG19': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M', 512, 512, 512, 512, 'M', 512, 512, 512, 512, 'M'],
}
```
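Each config list encodes a whole variant. Adding the three fully connected layers to the number of convolutions recovers the depth in the variant's name. The snippet below repeats the dictionary so that it runs on its own. `weight_layers` is a hypothetical helper used only for this check.

```python
VGG_CONFIGS = {
    'VGG11': [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'VGG13': [64, 64, 'M', 128, 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'VGG16': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M'],
    'VGG19': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M', 512, 512, 512, 512, 'M', 512, 512, 512, 512, 'M'],
}


def weight_layers(config, num_fc=3):
    """Conv layers (integer entries) plus the fully connected layers."""
    # 'M' entries are max pooling layers, which have no weights
    num_conv = sum(1 for v in config if v != 'M')
    return num_conv + num_fc


for name, config in VGG_CONFIGS.items():
    # Prints 11, 13, 16, 19 weight layers; every variant has 5 pooling layers
    print(name, weight_layers(config), "weight layers,", config.count('M'), "pooling layers")
```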

```python
class VGG(nn.Module):
    def __init__(self, config, num_classes=10, batch_norm=False):
        super().__init__()
        self.net = nn.Sequential(
            # Convolutional feature extractor built from the config list
            self._make_layers(config, batch_norm),
            nn.Flatten(),
            # 512 * 7 * 7 assumes a 224x224 input (five 2x2 pools: 224 -> 7)
            nn.Linear(512 * 7 * 7, 4096), nn.ReLU(inplace=True), nn.Dropout(p=0.5),
            nn.Linear(4096, 4096), nn.ReLU(inplace=True), nn.Dropout(p=0.5),
            # Raw logits; the softmax is applied inside nn.CrossEntropyLoss
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        return self.net(x)

    def _make_layers(self, config, batch_norm):
        layers = []
        in_channels = 3  # RGB input
        for v in config:
            if v == 'M':
                # 2x2 max pooling with stride 2 halves height and width
                layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
            else:
                # 3x3 convolution with padding 1 keeps height and width unchanged
                conv2d = nn.Conv2d(in_channels, v, kernel_size=3, padding=1)
                if batch_norm:
                    layers.extend([conv2d, nn.BatchNorm2d(v), nn.ReLU(inplace=True)])
                else:
                    layers.extend([conv2d, nn.ReLU(inplace=True)])
                in_channels = v

        return nn.Sequential(*layers)
```
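To see which modules `_make_layers` produces for a given config, one can replay its loop without building any PyTorch layers. `describe_layers` below is a hypothetical, dependency-free mirror of that loop that only records module names. Its order should match the sequence printed by the layer summary in the next cell.

```python
def describe_layers(config, batch_norm=False, in_channels=3):
    """List the module names _make_layers would create, without building them."""
    names = []
    for v in config:
        if v == 'M':
            names.append('MaxPool2d')
        else:
            names.append(f'Conv2d({in_channels}, {v})')
            if batch_norm:
                names.append(f'BatchNorm2d({v})')
            names.append('ReLU')
            in_channels = v  # next conv takes this layer's output channels
    return names


VGG11 = [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M']
layers = describe_layers(VGG11, batch_norm=True)
print(len(layers))  # 29 = 8 convs x (Conv2d, BatchNorm2d, ReLU) + 5 pools
print(layers[:5])   # ['Conv2d(3, 64)', 'BatchNorm2d(64)', 'ReLU', 'MaxPool2d', 'Conv2d(64, 128)']
```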

```python
utils.layer_summary(VGG(VGG_CONFIGS['VGG11'], num_classes=10, batch_norm=True), (1, 3, 224, 224))
```

```text
Input shape: (1, 3, 224, 224)
----------------------------------------
Conv2d          output shape: (1, 64, 224, 224)
BatchNorm2d     output shape: (1, 64, 224, 224)
ReLU            output shape: (1, 64, 224, 224)
MaxPool2d       output shape: (1, 64, 112, 112)
Conv2d          output shape: (1, 128, 112, 112)
BatchNorm2d     output shape: (1, 128, 112, 112)
ReLU            output shape: (1, 128, 112, 112)
MaxPool2d       output shape: (1, 128, 56, 56)
Conv2d          output shape: (1, 256, 56, 56)
BatchNorm2d     output shape: (1, 256, 56, 56)
ReLU            output shape: (1, 256, 56, 56)
Conv2d          output shape: (1, 256, 56, 56)
BatchNorm2d     output shape: (1, 256, 56, 56)
ReLU            output shape: (1, 256, 56, 56)
MaxPool2d       output shape: (1, 256, 28, 28)
Conv2d          output shape: (1, 512, 28, 28)
BatchNorm2d     output shape: (1, 512, 28, 28)
ReLU            output shape: (1, 512, 28, 28)
Conv2d          output shape: (1, 512, 28, 28)
BatchNorm2d     output sha
```

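```python
# CIFAR-10 images are 32x32; resize them to 224x224 to match the 512 * 7 * 7 classifier input
data = utils.CIFAR10DataLoader(batch_size=64, resize=(224, 224))
train_loader = data.get_train_loader()
test_loader = data.get_test_loader()
```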



```python
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = VGG(VGG_CONFIGS['VGG11'], num_classes=10, batch_norm=True)
model.apply(utils.init_kaiming).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

epochs = 10
for epoch in range(epochs):
    train_loss, train_acc = utils.train_step(train_loader, model, criterion, optimizer, device)
    test_loss, test_acc = utils.eval_step(test_loader, model, criterion, device)
    print(f"Epoch {epoch + 1:>{len(str(epochs))}}/{epochs} | "
          f"Train Loss: {train_loss:.4f} | "
          f"Test Loss: {test_loss:.4f} | "
          f"Test Acc: {test_acc:.4f}")
```

```text
Epoch  1/10 | Train Loss: 2.7673 | Test Loss: 2.3030 | Test Acc: 0.0999
Epoch  2/10 | Train Loss: 2.3016 | Test Loss: 2.3025 | Test Acc: 0.1002
Epoch  3/10 | Train Loss: 2.2883 | Test Loss: 2.2068 | Test Acc: 0.1547
Epoch  4/10 | Train Loss: 2.1872 | Test Loss: 2.0891 | Test Acc: 0.2200
Epoch  5/10 | Train Loss: 1.9889 | Test Loss: 1.8015 | Test Acc: 0.3026
Epoch  6/10 | Train Loss: 1.7663 | Test Loss: 1.5292 | Test Acc: 0.4344
Epoch  7/10 | Train Loss: 1.5529 | Test Loss: 1.4135 | Test Acc: 0.4889
Epoch  8/10 | Train Loss: 1.3069 | Test Loss: 1.1412 | Test Acc: 0.5998
Epoch  9/10 | Train Loss: 1.1188 | Test Loss: 0.9823 | Test Acc: 0.6610
Epoch 10/10 | Train Loss: 0.9599 | Test Loss: 0.8385 | Test Acc: 0.7132
```
