[**VGG Network (Simonyan and Zisserman, 2014)**](https://arxiv.org/abs/1409.1556) is a convolutional network built from repeated blocks of layers. It is known for its simplicity and for its deep stacks of small 3×3 convolution filters.

<div style="background-color: white; padding: 10px; display: inline-block; width:600px;">
    <img src="imgs/VGG.png" alt="VGG Architecture">
</div>

> Image Source: [DATAHACKER](https://datahacker.rs/deep-learning-vgg-16-vs-vgg-19/)

The VGG architecture has a simple yet effective design. It stacks small 3×3 convolutional filters into VGG blocks, and each block ends with a 2×2 max pooling layer that halves the spatial dimensions, while the number of channels grows from block to block.

The network consists of five convolutional blocks. The number of filters doubles from one block to the next, starting at 64 and reaching 512 in the fourth block; the fifth block stays at 512. After the convolutional layers, the feature maps are flattened and passed through three fully connected (FC) layers, and a softmax is applied to the output of the final FC layer for classification.
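
To make the shape bookkeeping concrete, the following sketch traces the feature-map shape through the five stages. It assumes a 224×224 RGB input and the channel widths of the original paper; the function name `vgg_stage_shapes` is made up for this illustration and is not part of the notebook's code.

```python
# Trace (channels, height, width) through the five VGG stages.
# Assumptions: 224x224 input, widths 64 -> 512 as in the original paper.

def vgg_stage_shapes(input_size=224, first_width=64, max_width=512, num_stages=5):
    """Return the (channels, height, width) shape after each stage's max pooling."""
    shapes = []
    size = input_size
    for stage in range(num_stages):
        # The width doubles from stage to stage and is capped at max_width.
        channels = min(first_width * 2 ** stage, max_width)
        # 3x3 convolutions with padding 1 keep the size; the 2x2 pool halves it.
        size //= 2
        shapes.append((channels, size, size))
    return shapes

shapes = vgg_stage_shapes()
for i, (c, h, w) in enumerate(shapes, start=1):
    print(f"stage {i}: {c} x {h} x {w}")

channels, height, width = shapes[-1]
flattened = channels * height * width
print("flattened features:", flattened)
```

Under these assumptions the last stage yields a 512 × 7 × 7 tensor, so the first fully connected layer receives 25,088 input features.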

VGG is available in several versions (VGG-11, VGG-13, VGG-16, and VGG-19) that differ in the number of convolutional layers. Among these, VGG-16 and VGG-19 are the most commonly used. Despite achieving high accuracy, VGG is computationally expensive and memory-intensive because of its large number of parameters (~138 million in VGG-16). It nevertheless remains highly effective for feature extraction and transfer learning.
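
The ~138 million figure can be checked with a few lines of arithmetic. The sketch below assumes the standard VGG-16 layout (configuration D in the paper), a 224×224 input, two 4096-wide hidden FC layers, and 1000 ImageNet classes; `VGG16_CFG` and `count_vgg_params` are names invented for this example.

```python
# Numbers are output channels of 3x3 convolutions; "M" marks a 2x2 max pool.
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]

def count_vgg_params(cfg, in_channels=3, input_size=224, fc_widths=(4096, 4096, 1000)):
    """Count weights and biases of the conv layers and of the FC layers."""
    conv_params = 0
    size = input_size
    for item in cfg:
        if item == "M":
            size //= 2  # pooling has no parameters, it only halves the size
        else:
            conv_params += in_channels * item * 3 * 3 + item  # weights + biases
            in_channels = item

    fc_params = 0
    in_features = in_channels * size * size  # flattened feature count
    for width in fc_widths:
        fc_params += in_features * width + width
        in_features = width
    return conv_params, fc_params

conv_params, fc_params = count_vgg_params(VGG16_CFG)
total_params = conv_params + fc_params
print(f"conv: {conv_params:,}  fc: {fc_params:,}  total: {total_params:,}")
```

With these assumptions the convolutional layers account for roughly 14.7 million parameters and the fully connected layers for roughly 123.6 million, so most of the memory cost sits in the classifier head rather than in the convolutions.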

In this notebook, we implement a lightweight version of the VGG network.

In [12]:
import torch
import torch.nn as nn
import torch.optim as optim
import utils

In [13]:
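def vgg_block(num_convs, in_channels, out_channels):
    """Build one VGG block: num_convs (3x3 conv + ReLU) pairs, then a 2x2 max pool."""
    layers = []
    for _ in range(num_convs):
        # padding=1 keeps the spatial size unchanged for a 3x3 kernel
        layers.append(nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1))
        layers.append(nn.ReLU(inplace=True))
        # every convolution after the first maps out_channels -> out_channels
        in_channels = out_channels
    # halve the height and width
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*layers)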
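
The effect of a block on the spatial size follows from the usual output-size formula for convolution and pooling layers, `floor((n + 2 * padding - kernel_size) / stride) + 1` (no dilation). The sketch below applies it with the hyperparameters used in `vgg_block`; the helper names `out_size` and `vgg_block_out_size` are assumptions made for this illustration only.

```python
def out_size(n, kernel_size, stride=1, padding=0):
    """Spatial output size of a conv or pooling layer (floor mode, no dilation)."""
    return (n + 2 * padding - kernel_size) // stride + 1

def vgg_block_out_size(n, num_convs):
    """Spatial size after num_convs 3x3 convs (padding 1) and one 2x2 max pool."""
    for _ in range(num_convs):
        # matches nn.Conv2d(..., kernel_size=3, padding=1): size is preserved
        n = out_size(n, kernel_size=3, stride=1, padding=1)
    # matches nn.MaxPool2d(kernel_size=2, stride=2): size is halved (floored)
    return out_size(n, kernel_size=2, stride=2)

print(vgg_block_out_size(224, num_convs=2))  # 112
print(vgg_block_out_size(7, num_convs=1))    # 3, odd sizes are rounded down
```

The number of convolutions in a block does not change the output size; only the pooling layer does, which is why every block halves the height and width regardless of `num_convs`.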

In [24]:
class VGG(nn.Module):
    def __init__(self, arch, num_classes=10):
        super().__init__()

        conv_blks = []
        for (num_convs, in_channels, out_channels) in arch:
            conv_blks.append(vgg_block(num_convs, in_channels, out_channels))
        
        self.net = nn.Sequential(
            *conv_blks,
            nn.Flatten(),
            # LazyLinear infers its input size from the first batch it sees
            nn.LazyLinear(128), nn.ReLU(inplace=True), nn.Dropout(0.5),
            nn.Linear(128, 128), nn.ReLU(inplace=True), nn.Dropout(0.5),
            nn.Linear(128, num_classes)
        )

        self.net.apply(utils.init_cnn)

    def forward(self, x):
        return self.net(x)

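    def layer_summary(self, X_shape):
        """Print the output shape of each top-level layer for a random input."""
        X = torch.randn(*X_shape)
        print(f"{'Layer':<25} {'Output Shape':<20}")
        print("=" * 50)

        # pass the input through the layers one at a time and report each shape
        for layer in self.net:
            X = layer(X)
            print(f"{layer.__class__.__name__:<25} {str(tuple(X.shape)):<20}")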

In [26]:
tiny_arch = (
    (1, 3, 16), (1, 16, 32), (1, 32, 64),
    (2, 64, 64), (2, 64, 128)
)

VGG(tiny_arch).layer_summary((1, 3, 224, 224))

Layer                     Output Shape        
==================================================
Sequential                (1, 16, 112, 112)   
Sequential                (1, 32, 56, 56)     
Sequential                (1, 64, 28, 28)     
Sequential                (1, 64, 14, 14)     
Sequential                (1, 128, 7, 7)      
Flatten                   (1, 6272)           
Linear                    (1, 128)            
ReLU                      (1, 128)            
Dropout                   (1, 128)            
Linear                    (1, 128)            
ReLU                      (1, 128)            
Dropout                   (1, 128)            
Linear                    (1, 10)             
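
As a rough check on how lightweight this variant is, the sketch below counts its parameters by hand. It assumes the same `tiny_arch`, a 224×224 input, 128 hidden units, and 10 classes, as in the summary above; `count_tiny_vgg_params` is a hypothetical helper, not part of `utils`.

```python
tiny_arch = (
    (1, 3, 16), (1, 16, 32), (1, 32, 64),
    (2, 64, 64), (2, 64, 128)
)

def count_tiny_vgg_params(arch, input_size=224, hidden=128, num_classes=10):
    """Return (flattened features, conv parameters, FC parameters)."""
    conv_params = 0
    size = input_size
    for num_convs, in_channels, out_channels in arch:
        for _ in range(num_convs):
            conv_params += in_channels * out_channels * 3 * 3 + out_channels
            in_channels = out_channels
        size //= 2  # each block ends with a 2x2 max pool
    flat = out_channels * size * size  # input size seen by the first FC layer
    fc_params = (
        (flat * hidden + hidden)
        + (hidden * hidden + hidden)
        + (hidden * num_classes + num_classes)
    )
    return flat, conv_params, fc_params

flat, conv_params, fc_params = count_tiny_vgg_params(tiny_arch)
total_params = conv_params + fc_params
print(f"flattened: {flat}  conv: {conv_params:,}  fc: {fc_params:,}  total: {total_params:,}")
```

The flattened size of 6272 matches the `Flatten` row in the summary, and the total comes to about 1.14 million parameters, roughly two orders of magnitude fewer than VGG-16.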


In [27]:
data = utils.CIFAR10DataLoader(batch_size=64, resize=(224, 224))
train_loader = data.get_train_loader()
test_loader = data.get_test_loader()

In [28]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = VGG(tiny_arch, num_classes=10).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

epochs = 10
for epoch in range(epochs):
    train_loss, train_acc = utils.train_step(train_loader, model, criterion, optimizer, device)
    test_loss, test_acc = utils.eval_step(test_loader, model, criterion, device)
    print(f"Epoch {epoch + 1}/{epochs}: Train Loss={train_loss}, Test Loss={test_loss}, Test Accuracy={test_acc}")

Epoch 1/10: Train Loss=1.9924904658361469, Test Loss=1.5887233595939199, Test Accuracy=0.424
Epoch 2/10: Train Loss=1.563642341310106, Test Loss=1.3630237419893787, Test Accuracy=0.5012
Epoch 3/10: Train Loss=1.361630715753721, Test Loss=1.17039374978679, Test Accuracy=0.5786
Epoch 4/10: Train Loss=1.2238055544588573, Test Loss=1.0759059509653954, Test Accuracy=0.6181
Epoch 5/10: Train Loss=1.1266484316200247, Test Loss=0.9788733986532612, Test Accuracy=0.6625
Epoch 6/10: Train Loss=1.0417423496008529, Test Loss=0.9685866798564886, Test Accuracy=0.6659
Epoch 7/10: Train Loss=0.9745808902299008, Test Loss=0.8944820943911365, Test Accuracy=0.6924
Epoch 8/10: Train Loss=0.9239050871728326, Test Loss=0.8756664566173675, Test Accuracy=0.6957
Epoch 9/10: Train Loss=0.8870430824244419, Test Loss=0.8646834974835633, Test Accuracy=0.7023
Epoch 10/10: Train Loss=0.8431263513805921, Test Loss=0.8366192784279015, Test Accuracy=0.7159
