# Deep Learning

# Tutorial 16: GoogLeNet architecture

In this tutorial, we will cover:

- The GoogLeNet (Inception v1, 2014) architecture for deep neural networks

Prerequisites:

- Python, Tensor basics, PyTorch

My contact:

- Niklas Beuter (niklas.beuter@th-luebeck.de)

Course:

- Slides and notebooks will be available at https://lernraum.th-luebeck.de/course/view.php?id=5383

## Expected Outcomes
* Understand the basic components of neural networks: layers, neurons, weights, biases, activations, and loss functions.
* Gain hands-on experience with the computational aspects of setting up neural networks, including training and usage.
* Learn how to add layers with correct sizes to a deep neural network.

# Introduction to GoogLeNet

[GoogLeNet](https://arxiv.org/abs/1409.4842), also known as Inception v1, is a deep convolutional neural network architecture introduced by researchers at Google as their entry to the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14). It is deeper and wider than its predecessors yet needs far fewer parameters: the paper reports about 12 times fewer than AlexNet, the 2012 winner.

## Architecture Overview

GoogLeNet is built from multiple "Inception" modules stacked on top of each other. An Inception module captures information at several scales by applying convolutional filters of different sizes (1x1, 3x3, 5x5) and a max-pooling operation to the same input in parallel, then concatenating the results along the channel dimension.

![Inception Architecture](https://pic4.zhimg.com/v2-2baf3210c834b17682d676144ec770e2_1440w.jpg?source=172ae18b)

The key idea behind the Inception module is to avoid an increase in computational cost by strategically sizing the convolutions. The architecture uses 1x1 convolutions to perform dimensionality reduction before applying larger convolutions such as 3x3 and 5x5. This helps in reducing the computational burden significantly.

## Key Features of GoogLeNet

- **Depth and Width**: The network is 22 layers deep (counting only layers with parameters) and also wide, because each Inception module runs several branches in parallel.
- **Reduced Overfitting**: GoogLeNet employs techniques such as dropout and extensive data augmentation to control overfitting despite its depth.
- **Computational Efficiency**: The use of 1x1 convolutions as dimension reduction modules before more expensive 3x3 and 5x5 convolutions reduces the computational cost drastically.

## Formula Representation

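The 1x1 convolutions reduce only the channel dimension; height and width stay unchanged. With a reduction ratio $R$, the number of channels after the reduction is:

$$
C_{\text{reduced}} = \frac{C_{\text{in}}}{R}
$$

For example, the 5x5 branch of the first Inception block in this tutorial reduces 192 input channels to 16, so $R = 12$.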
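To make the saving concrete, the following sketch counts the learnable parameters of a 5x5 convolution with and without a preceding 1x1 reduction. The channel sizes (192, 16, 32) are taken from the first Inception block defined later in this tutorial; the helper `conv_params` is a hypothetical name introduced only for this illustration.

```python
def conv_params(in_ch, out_ch, kernel_size, bias=True):
    """Number of learnable parameters in a 2D convolution layer."""
    weights = in_ch * out_ch * kernel_size * kernel_size
    return weights + (out_ch if bias else 0)

# 5x5 branch of the first Inception block (3a):
# 192 input channels, reduced to 16 by a 1x1 conv, then mapped to 32 by the 5x5 conv.
c_in, c_red, c_out = 192, 16, 32

direct = conv_params(c_in, c_out, 5)                                  # 5x5 applied directly
reduced = conv_params(c_in, c_red, 1) + conv_params(c_red, c_out, 5)  # 1x1 followed by 5x5

print(f"reduction ratio R = {c_in // c_red}")
print(f"direct 5x5:  {direct} parameters")
print(f"1x1 -> 5x5:  {reduced} parameters")
print(f"saving:      {direct / reduced:.1f}x")
```

The multiply-accumulate count per output position scales the same way, so the saving applies to compute as well as to parameters.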

## Training

Training such a deep network requires a careful choice of initialization and optimization algorithm. GoogLeNet attaches two auxiliary classifiers to intermediate layers to combat the vanishing gradient problem: during backpropagation, their losses inject additional gradient signal into the earlier layers of the network. The auxiliary classifiers are used only during training and are discarded at inference time.
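As a rough sketch of how auxiliary losses enter training: in the paper, the auxiliary losses are added to the main loss with a weight of 0.3. The function `loss_with_aux` and the random stand-in logits below are assumptions made for illustration; the simplified model built later in this tutorial does not include auxiliary classifiers.

```python
import torch
import torch.nn.functional as F

def loss_with_aux(main_logits, aux_logits, labels, aux_weight=0.3):
    """Main cross-entropy loss plus down-weighted auxiliary losses."""
    loss = F.cross_entropy(main_logits, labels)
    for logits in aux_logits:
        loss = loss + aux_weight * F.cross_entropy(logits, labels)
    return loss

# Random stand-in logits for a batch of 4 samples and 10 classes
torch.manual_seed(0)
labels = torch.randint(0, 10, (4,))
main_logits = torch.randn(4, 10, requires_grad=True)
aux_logits = [torch.randn(4, 10, requires_grad=True) for _ in range(2)]

total = loss_with_aux(main_logits, aux_logits, labels)
total.backward()  # gradients reach the main head and both auxiliary heads
print(f"combined loss: {total.item():.4f}")
```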

## Impact and Legacy

GoogLeNet won the classification task of ILSVRC 2014 with a top-5 error rate of 6.67%, ahead of the runner-up VGG (7.32%) and roughly 40% lower in relative terms than the previous year's best result. This victory underscored the efficiency and effectiveness of the Inception modules, influencing numerous subsequent works in the field of deep learning.

GoogLeNet's design principles, particularly multi-scale processing via Inception modules, have inspired several improvements and variations, leading to more advanced architectures such as Inception v3 and Inception v4.


In [2]:
!pip install torch torchvision

Collecting torch
  Downloading torch-2.3.0-cp311-cp311-manylinux1_x86_64.whl.metadata (26 kB)
Collecting torchvision
  Downloading torchvision-0.18.0-cp311-cp311-manylinux1_x86_64.whl.metadata (6.6 kB)
Collecting filelock (from torch)
  Downloading filelock-3.14.0-py3-none-any.whl.metadata (2.8 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.1.105 (from torch)
  Downloading nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.1.105 (from torch)
  Downloading nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-cupti-cu12==12.1.105 (from torch)
  Downloading nvidia_cuda_cupti_cu12-12.1.105-py3-none-manylinux1_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cudnn-cu12==8.9.2.26 (from torch)
  Downloading nvidia_cudnn_cu12-8.9.2.26-py3-none-manylinux1_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cublas-cu12==12.1.3.1 (from torch)
  Downloading nvidia_cublas_cu12-12.1.3.1

In [3]:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
import torchvision.models as models

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)

cuda


In [4]:
# Loading CIFAR10 dataset
transform = transforms.Compose([
    #transforms.Resize(224),             # Resize images to fit GoogleNet's input dimensions
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=32, shuffle=True)

testset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=32, shuffle=False)

Files already downloaded and verified
Files already downloaded and verified


## Channel dimension of the inception block

Consider an input tensor of shape (N, C, H, W), where N is the batch size, C is the number of channels, and H and W are the height and width.

- **1x1 Convolution Branch**:
  - Input: (N, C, H, W)
  - Output: (N, C1, H, W)

- **1x1 -> 3x3 Convolution Branch**:
  - Input: (N, C, H, W)
  - Output: (N, C2, H, W)

- **1x1 -> 5x5 Convolution Branch**:
  - Input: (N, C, H, W)
  - Output: (N, C3, H, W)

- **3x3 Max Pooling -> 1x1 Convolution Branch**:
  - Input: (N, C, H, W)
  - Output: (N, C4, H, W)

After concatenation along the channel dimension, the final output of the Inception block has the same height and width as the input, with the number of channels equal to the sum of the output channels from each branch:
- Final Output: (N, C1 + C2 + C3 + C4, H, W)
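The branches can only be concatenated if they all keep the same height and width. The sketch below checks this with the standard output-size formula for convolution and pooling layers (floor mode, no dilation), using the kernel sizes, strides and paddings of the Inception class defined in the next cell. The helper `conv_out_size` is a hypothetical name used only here.

```python
def conv_out_size(size, kernel_size, stride=1, padding=0):
    """Output height/width of a convolution or pooling layer (floor mode)."""
    return (size + 2 * padding - kernel_size) // stride + 1

h = 32  # CIFAR10 images are 32x32

# (kernel_size, stride, padding) of the spatially relevant layer in each branch
branches = {
    "1x1 conv": (1, 1, 0),
    "3x3 conv": (3, 1, 1),
    "5x5 conv": (5, 1, 2),
    "3x3 max pool": (3, 1, 1),
}
sizes = {name: conv_out_size(h, k, s, p) for name, (k, s, p) in branches.items()}
print(sizes)  # every branch keeps the input size

# Channels add up under concatenation, here for the first block (C1..C4)
c1, c2, c3, c4 = 64, 128, 32, 32
print(c1 + c2 + c3 + c4)

# The stride-2 max pool between stages halves the spatial size instead
after_pool = conv_out_size(h, kernel_size=3, stride=2, padding=1)
print(after_pool)
```

Choosing padding `(kernel_size - 1) // 2` with stride 1 is what keeps H and W unchanged in every branch.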


In [5]:
# Define the GoogleNet model
class Inception(nn.Module):
    def __init__(self, in_channels, ch1x1, ch3x3red, ch3x3, ch5x5red, ch5x5, pool_proj):
        super(Inception, self).__init__()
        self.branch1 = nn.Sequential(
            nn.Conv2d(in_channels, ch1x1, kernel_size=1),
            nn.ReLU(True),
        )

        self.branch2 = nn.Sequential(
            nn.Conv2d(in_channels, ch3x3red, kernel_size=1),
            nn.ReLU(True),
            nn.Conv2d(ch3x3red, ch3x3, kernel_size=3, padding=1),
            nn.ReLU(True),
        )

        self.branch3 = nn.Sequential(
            nn.Conv2d(in_channels, ch5x5red, kernel_size=1),
            nn.ReLU(True),
            nn.Conv2d(ch5x5red, ch5x5, kernel_size=5, padding=2),
            nn.ReLU(True),
        )

        self.branch4 = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_channels, pool_proj, kernel_size=1),
            nn.ReLU(True),
        )

    def forward(self, x):
        # Concatenate the branch outputs along the channel dimension (dim=1).
        # Height and width are unchanged because every branch preserves them.
        outputs = [self.branch1(x), self.branch2(x), self.branch3(x), self.branch4(x)]
        return torch.cat(outputs, 1)

class GoogleNet(nn.Module):
    def __init__(self):
        super(GoogleNet, self).__init__()
        # Simplified stem: two of the original Conv2d layers are skipped because CIFAR10 images are already small (32x32)
        self.pre_layers = nn.Sequential(
            nn.Conv2d(3, 192, kernel_size=3, padding=1),
            nn.ReLU(True),
        )

        self.a3 = Inception(192,  64,  96, 128, 16, 32, 32)
        self.b3 = Inception(256, 128, 128, 192, 32, 96, 64)

        self.maxpool = nn.MaxPool2d(3, 2, 1)

        self.a4 = Inception(480, 192,  96, 208, 16, 48, 64)
        self.b4 = Inception(512, 160, 112, 224, 24, 64, 64)
        self.c4 = Inception(512, 128, 128, 256, 24, 64, 64)
        self.d4 = Inception(512, 112, 144, 288, 32, 64, 64)
        self.e4 = Inception(528, 256, 160, 320, 32, 128, 128)

        self.a5 = Inception(832, 256, 160, 320, 32, 128, 128)
        self.b5 = Inception(832, 384, 192, 384, 48, 128, 128)

        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.dropout = nn.Dropout(0.4)
        self.fc = nn.Linear(1024, 10)

    def forward(self, x):
        x = self.pre_layers(x)
        x = self.a3(x)
        x = self.b3(x)
        x = self.maxpool(x)
        x = self.a4(x)
        x = self.b4(x)
        x = self.c4(x)
        x = self.d4(x)
        x = self.e4(x)
        x = self.maxpool(x)
        x = self.a5(x)
        x = self.b5(x)
        x = self.avgpool(x)
        x = x.view(x.size(0), -1)
        x = self.dropout(x)
        x = self.fc(x)
        return x
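The `in_channels` of each Inception block must equal the summed branch outputs of the block before it (the max-pooling layers in between do not change the channel count). A small sanity check of the configuration above, written as plain Python so that it runs without building the model; the dictionary `blocks` and the helper `out_channels` are introduced here for illustration only.

```python
# (in_channels, ch1x1, ch3x3red, ch3x3, ch5x5red, ch5x5, pool_proj)
blocks = {
    "a3": (192, 64, 96, 128, 16, 32, 32),
    "b3": (256, 128, 128, 192, 32, 96, 64),
    "a4": (480, 192, 96, 208, 16, 48, 64),
    "b4": (512, 160, 112, 224, 24, 64, 64),
    "c4": (512, 128, 128, 256, 24, 64, 64),
    "d4": (512, 112, 144, 288, 32, 64, 64),
    "e4": (528, 256, 160, 320, 32, 128, 128),
    "a5": (832, 256, 160, 320, 32, 128, 128),
    "b5": (832, 384, 192, 384, 48, 128, 128),
}

def out_channels(cfg):
    """Channels after concatenation: only the final conv of each branch counts."""
    _, ch1x1, _, ch3x3, _, ch5x5, pool_proj = cfg
    return ch1x1 + ch3x3 + ch5x5 + pool_proj

names = list(blocks)
for prev, nxt in zip(names, names[1:]):
    # Output of one block must match the declared input of the next
    assert out_channels(blocks[prev]) == blocks[nxt][0], (prev, nxt)

final_channels = out_channels(blocks[names[-1]])
print(final_channels)  # must match the in_features of the final nn.Linear
```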

In [6]:
# Training loop (simplified)
def train(model, device, train_loader, criterion, optimizer, epochs):
    model.train() # set model in training mode
    for epoch in range(epochs):
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            loss = criterion(outputs, labels)
            
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        
        print(f'Epoch [{epoch+1}/{epochs}], Loss: {loss.item():.4f}')
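Note that the loop above prints the loss of the last mini-batch of each epoch only, so the reported values can fluctuate from epoch to epoch. One possible alternative, sketched here under the assumption that a per-epoch average is preferred, accumulates a sample-weighted mean. The function name and the tiny synthetic data are hypothetical.

```python
import torch
import torch.nn as nn

def train_epoch_mean_loss(model, device, loader, criterion, optimizer):
    """Run one training epoch and return the sample-weighted mean loss."""
    model.train()
    running_loss, n_samples = 0.0, 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item() * labels.size(0)
        n_samples += labels.size(0)
    return running_loss / n_samples

# Tiny synthetic check: 3 batches of random "images" with 10 classes
torch.manual_seed(0)
toy_loader = [(torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))) for _ in range(3)]
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
toy_opt = torch.optim.SGD(toy_model.parameters(), lr=0.01)

mean_loss = train_epoch_mean_loss(
    toy_model, torch.device("cpu"), toy_loader, nn.CrossEntropyLoss(), toy_opt
)
print(f"mean loss: {mean_loss:.4f}")
```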

In [7]:
def initialize_weights(m):
    if isinstance(m, nn.Conv2d):
        torch.nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
    elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
        torch.nn.init.constant_(m.weight, 1)
        torch.nn.init.constant_(m.bias, 0)
    elif isinstance(m, nn.Linear):
        torch.nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
        torch.nn.init.constant_(m.bias, 0)

In [8]:
def evaluate_model(model, test_loader):
    model.eval()  # Set model to evaluation mode (disables dropout)
    device = next(model.parameters()).device  # Evaluate on the model's own device
    correct = 0
    total = 0
    with torch.no_grad():  # No gradients are needed for evaluation
        for images, labels in test_loader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            predicted = outputs.argmax(dim=1)  # Class with the highest score
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    accuracy = 100 * correct / total
    print(f'Accuracy: {accuracy:.2f}%')

In [9]:
# Define the model
net = GoogleNet()

# Initialize weights
net.apply(initialize_weights)
net.to(device)  # Move the model to the appropriate device

# Define loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.0001, momentum=0.9)

In [10]:
# Train the model
train(net, device, trainloader, criterion, optimizer, epochs=8)
evaluate_model(net, testloader)


Epoch [1/8], Loss: 1.6876
Epoch [2/8], Loss: 1.7288
Epoch [3/8], Loss: 1.5096
Epoch [4/8], Loss: 1.7305
Epoch [5/8], Loss: 1.0392
Epoch [6/8], Loss: 1.0454
Epoch [7/8], Loss: 1.0008
Epoch [8/8], Loss: 1.4881
Accuracy: 55.56%


## Standard PyTorch Model Implementation of GoogLeNet

In [17]:
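# Load GoogLeNet with ImageNet-pretrained weights
# (the `weights` argument replaces the deprecated `pretrained=True`; requires torchvision >= 0.13)
net = models.googlenet(weights=models.GoogLeNet_Weights.IMAGENET1K_V1)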

# Modify the final layer to fit CIFAR10 (10 classes)
net.fc = nn.Linear(net.fc.in_features, 10)
net.to(device)  # Move the model to the appropriate device

# Define the loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.01, momentum=0.9)

In [18]:
# Train the model
train(net, device, trainloader, criterion, optimizer, epochs=2)
evaluate_model(net, testloader)

Epoch [1/2], Loss: 1.5163
Epoch [2/2], Loss: 0.8412
Accuracy: 71.39%


In [19]:
# Freeze all parameters in the model
for param in net.parameters():
    param.requires_grad = False

# Unfreeze the parameters in the last fully connected layer
for param in net.fc.parameters():
    param.requires_grad = True

net.to(device)  # Move the model to the appropriate device

optimizer = torch.optim.SGD(net.fc.parameters(), lr=0.01)
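To verify that freezing worked, one can count trainable versus total parameters. The sketch below uses a tiny stand-in network (`TinyNet`, a hypothetical class with an `fc` head like the torchvision model) so that it runs without downloading weights; the same `count_parameters` helper could be applied to `net`.

```python
import torch.nn as nn

class TinyNet(nn.Module):
    """Stand-in model: a small feature extractor followed by an `fc` head."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, 1)),
            nn.Flatten(),
        )
        self.fc = nn.Linear(8, 10)

    def forward(self, x):
        return self.fc(self.features(x))

def count_parameters(model):
    """Return (trainable, total) parameter counts."""
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    return trainable, total

toy = TinyNet()
for param in toy.parameters():      # freeze everything
    param.requires_grad = False
for param in toy.fc.parameters():   # unfreeze only the final layer
    param.requires_grad = True

trainable, total = count_parameters(toy)
print(f"trainable: {trainable} / {total}")
```

Only the parameters of `fc` should be reported as trainable; everything in the feature extractor stays fixed during the subsequent training run.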

In [20]:
# Train the model
train(net, device, trainloader, criterion, optimizer, epochs=2)
evaluate_model(net, testloader)

Epoch [1/2], Loss: 0.8620
Epoch [2/2], Loss: 0.6555
Accuracy: 75.67%


# References

This notebook uses the following sources:
* [Dive into deep learning](https://d2l.ai/)