# Inception Net using PyTorch

This tutorial shows how to implement the original Inception Net, proposed in [Going deeper with convolutions](https://arxiv.org/abs/1409.4842) by Szegedy _et al._, using PyTorch. The Inception Net architecture won the ImageNet classification challenge in 2014. If you are new to PyTorch, I recommend going through this [tutorial](https://github.com/Vinaypnaidu/deep-learning-essentials) first. It covers some Deep Learning basics and explains how to build an end-to-end image classification project using the CIFAR-10 dataset.<br>

The architecture of Inception Net differs slightly from that of a typical ConvNet. It uses the **Inception module** with feature concatenation and has two **auxiliary classifiers** branching out from the main stem. This project therefore gives you solid experience in converting a paper to code, as well as a good understanding of building models with PyTorch. I suggest referring to the original paper (particularly Table 1) while going through this tutorial.

![](./images/google-net.png)

Fun fact: This first version of Inception Net is also called GoogLeNet as an homage to the original LeNet 5 architecture by LeCun _et al._

In [38]:
# import necessary libraries
import torch
import numpy as np
import torch.nn as nn
import torch.nn.functional as F

## Building the Inception Module

The Inception module has 4 branches containing conv and pool operations. The number of input and output channels of each layer is given by the parameter `block_params`. Padding is applied where necessary to preserve the spatial dimensions, so that the branch outputs can be concatenated along the channel dimension.
1. Branch 1: `1x1` conv
2. Branch 2: `1x1` conv (aka reduction) followed by `3x3` conv
3. Branch 3: `1x1` conv (aka reduction) followed by `5x5` conv
4. Branch 4: `3x3` maxpool, followed by `1x1` conv

<div style="width: 25%; height: auto;">
    <img src="./images/inception.png" alt="Inception Block Architecture" style="width: 100%; height: 100%;">
</div>
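
The padding used in each branch follows from the usual output-size formula for conv and pool layers, `floor((size + 2 * padding - kernel) / stride) + 1`. Below is a minimal sketch to check it. The helper `out_size` and the 28x28 input size are introduced here for illustration only, and the formula assumes a dilation of 1.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a conv/pool layer (dilation 1, floor rounding)."""
    return (size + 2 * padding - kernel) // stride + 1

size = 28  # example spatial size, chosen for illustration

# (kernel, padding) for each op used in the Inception module, all with stride 1
branch_ops = {
    "1x1 conv": (1, 0),
    "3x3 conv": (3, 1),
    "5x5 conv": (5, 2),
    "3x3 maxpool": (3, 1),
}

sizes = {name: out_size(size, k, padding=p) for name, (k, p) in branch_ops.items()}
print(sizes)
```

Every op maps the input size back to itself, which is what allows the four branch outputs to be concatenated along the channel dimension.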

The `Inception_Block` class subclasses `nn.Module`, the neural network module provided by PyTorch. It is a powerful and flexible API for building neural network architectures. We need to implement the following methods:
1. `__init__`: Responsible for initializing the layers in the neural network or module. These include conv, pool and other operations. 
2. `forward`: Responsible for implementing the forward propagation of the network. 

In [23]:
class Inception_Block(nn.Module):
    """
    Implements the Inception block from the original paper. Takes a tuple
    block_params = (in_ch, out1, red2, out2, red3, out3, out4) to initialize
    the block. Outputs a volume (N, C, H, W) based on block_params.
    """
    def __init__(self, block_params):
        super().__init__()
        in_ch, out1, red2, out2, red3, out3, out4 = block_params

        # branch 1: 1x1 conv
        self.conv1 = nn.Conv2d(in_channels=in_ch, out_channels=out1, kernel_size=1)

        # branch 2: reduce - 3x3 conv
        self.reduce2 = nn.Conv2d(in_channels=in_ch, out_channels=red2, kernel_size=1)
        self.conv2 = nn.Conv2d(in_channels=red2, out_channels=out2, kernel_size=3, padding=1)

        # branch 3: reduce - 5x5 conv
        self.reduce3 = nn.Conv2d(in_channels=in_ch, out_channels=red3, kernel_size=1)
        self.conv3 = nn.Conv2d(in_channels=red3, out_channels=out3, kernel_size=5, padding=2)

        # branch 4: maxpool - 1x1 conv
        self.maxpool4 = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)
        self.conv4 = nn.Conv2d(in_channels=in_ch, out_channels=out4, kernel_size=1)

    def forward(self, x):
        conv1_out = F.relu(self.conv1(x))
        conv2_out = F.relu(self.conv2(F.relu(self.reduce2(x))))
        conv3_out = F.relu(self.conv3(F.relu(self.reduce3(x))))
        conv4_out = F.relu(self.conv4(self.maxpool4(x)))
        output = (conv1_out, conv2_out, conv3_out, conv4_out)
        output = torch.cat(output, dim=1)  # concatenate along the channel dimension
        return output

In [24]:
# verify output shape for inception (3a) block in the paper
in_ch, out1, red2, out2, red3, out3, out4 = 192, 64, 96, 128, 16, 32, 32
out_ch = out1 + out2 + out3 + out4

block_params = (in_ch, out1, red2, out2, red3, out3, out4)
block = Inception_Block(block_params)

x = torch.zeros((64, in_ch, 32, 32)) # (N, C, H, W)
print(block(x).shape)
print(block(x).shape[1] == out_ch)

torch.Size([64, 256, 32, 32])
True
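
To see why the `1x1` reductions are worthwhile, one can compare parameter counts for the `5x5` branch of inception (3a) with and without the reduction. This is a rough sketch: `count_params`, `direct` and `reduced` are names made up for this illustration, and the comparison looks only at learnable parameters, not at compute.

```python
import torch.nn as nn

def count_params(module):
    """Total number of learnable parameters in a module."""
    return sum(p.numel() for p in module.parameters())

in_ch, red3, out3 = 192, 16, 32  # 5x5 branch of inception (3a)

# 5x5 conv applied directly to the 192-channel input
direct = nn.Conv2d(in_ch, out3, kernel_size=5, padding=2)

# 1x1 reduction to 16 channels, followed by the 5x5 conv
reduced = nn.Sequential(
    nn.Conv2d(in_ch, red3, kernel_size=1),
    nn.ReLU(),
    nn.Conv2d(red3, out3, kernel_size=5, padding=2),
)

direct_params = count_params(direct)
reduced_params = count_params(reduced)
print(direct_params, reduced_params)
```

With these channel sizes the reduced branch needs roughly a tenth of the parameters of the direct `5x5` conv.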


## Building the Auxiliary Classifier

The GoogLeNet architecture features two auxiliary classifiers, which branch off the main network. They are small feed-forward networks with the following layers:
1. `5x5` avg pool
2. `1x1` conv
3. Fully connected layer with 1024 units
4. Dropout layer that keeps 30% of the units
5. Fully connected layer with `num_classes` units
<br>

<div style="width: 40%; height: auto;">
    <img src="./images/auxiliary.png" alt="Auxiliary Classifier" style="width: 100%; height: 100%;">
</div>
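
The input size of the first fully connected layer depends on the shape left after pooling and the `1x1` conv. The sketch below traces a single 14x14 feature map with 512 channels (the output shape of inception 4a in Table 1) through those two layers. The stride of 3 for the average pool is taken from the paper, and the variable names are illustrative.

```python
import torch
import torch.nn as nn

x = torch.zeros((1, 512, 14, 14))  # shaped like the output of inception 4a

pooled = nn.AvgPool2d(kernel_size=5, stride=3)(x)  # 14x14 -> 4x4
reduced = nn.Conv2d(in_channels=512, out_channels=128, kernel_size=1)(pooled)
flat = reduced.reshape(reduced.shape[0], -1)  # flatten everything except the batch dim

print(pooled.shape, reduced.shape, flat.shape)
```

The flattened vector has `4*4*128 = 2048` features, which is the input size the first fully connected layer has to accept.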

In [25]:
class Auxiliary_Classifier(nn.Module):
    """
    Implements the auxiliary classifier from the original paper.
    """
    def __init__(self, in_ch, num_classes):
        super().__init__()
        self.avgpool1 = nn.AvgPool2d(kernel_size=5, stride=3)
        self.conv2 = nn.Conv2d(in_channels=in_ch, out_channels=128, kernel_size=1)
        self.fc3 = nn.Linear(4*4*128, 1024)
        self.dropout4 = nn.Dropout(p=0.7)
        self.fc5 = nn.Linear(1024, num_classes)

    def forward(self, x):
        x = self.avgpool1(x)
        x = F.relu(self.conv2(x))
        x = F.relu(self.fc3(x.reshape(x.shape[0], -1)))
        x = self.dropout4(x)
        scores = self.fc5(x)
        return scores

In [26]:
in_ch, num_classes = 512, 1000
aux_classifier = Auxiliary_Classifier(in_ch, num_classes)
x = torch.zeros((64, 512, 14, 14)) # from inception 4a (Table 1)
print(aux_classifier(x).shape)

torch.Size([64, 1000])
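
Note that `p` in `nn.Dropout` is the probability of dropping a unit, so keeping 30% of the units corresponds to `p=0.7`. The small sketch below illustrates this, and also shows that dropout is only active in training mode. The vector of ones and the seed are arbitrary choices for the demonstration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
dropout = nn.Dropout(p=0.7)  # p is the probability of zeroing a unit
x = torch.ones(10000)

dropout.train()  # training mode: units are dropped at random
train_out = dropout(x)
kept_fraction = (train_out != 0).float().mean().item()

dropout.eval()  # evaluation mode: dropout is the identity
eval_out = dropout(x)

print(kept_fraction)               # fraction of units kept in training mode
print(train_out.max().item())      # kept units are scaled by 1 / (1 - p)
print(torch.equal(eval_out, x))    # whether eval mode leaves the input unchanged
```

This is why `model.eval()` should be called before running inference with a model that contains dropout layers.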


## Building the Inception Net Model

Similar to `Inception_Block` and `Auxiliary_Classifier`, `Inception_Net` subclasses `nn.Module`. 

1. The `__init__` method will initialize the entire architecture. We store the `block_params` for each Inception block in `self.inception_params`. 
2. The `forward` method computes the scores from the main and auxiliary classifiers. We keep the outputs of Inception blocks `inception4a` and `inception4d` to compute the outputs of the auxiliary classifiers. Finally, we return the scores of all 3 classifiers. You can verify the implementation against Table 1 of the original paper.
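
The channel counts from Table 1 can be sanity-checked without building the model: the concatenated output of each Inception block should match the input channels of the next one, since the max pools in between do not change the channel count. This is a small sketch; the helper `out_channels` is a name introduced here, and the dictionary repeats the values used in the model.

```python
# block_params = (in_ch, out1, red2, out2, red3, out3, out4), from Table 1
inception_params = {
    'inception3a': (192, 64, 96, 128, 16, 32, 32),
    'inception3b': (256, 128, 128, 192, 32, 96, 64),
    'inception4a': (480, 192, 96, 208, 16, 48, 64),
    'inception4b': (512, 160, 112, 224, 24, 64, 64),
    'inception4c': (512, 128, 128, 256, 24, 64, 64),
    'inception4d': (512, 112, 144, 288, 32, 64, 64),
    'inception4e': (528, 256, 160, 320, 32, 128, 128),
    'inception5a': (832, 256, 160, 320, 32, 128, 128),
    'inception5b': (832, 384, 192, 384, 48, 128, 128),
}

def out_channels(block_params):
    """Channels after concatenating the four branch outputs."""
    _, out1, _, out2, _, out3, out4 = block_params
    return out1 + out2 + out3 + out4

# each block's output channels must equal the next block's input channels
names = list(inception_params)
for prev, nxt in zip(names, names[1:]):
    produced = out_channels(inception_params[prev])
    expected = inception_params[nxt][0]
    assert produced == expected, (prev, nxt, produced, expected)

final_channels = out_channels(inception_params['inception5b'])
print(final_channels)
```

The final channel count is the input size of the last linear layer, and the outputs of `inception4a` and `inception4d` give the input channels of the two auxiliary classifiers.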

In [27]:
class Inception_Net(nn.Module):
    """
    Implements the original Inception net from the paper. Expected input shape
    is (N, 3, 224, 224). Outputs a tuple of three class-score tensors
    (auxiliary 1, auxiliary 2, main), each of shape (N, num_classes).
    """
    def __init__(self, num_classes):
        super().__init__()

        # from Table 1 in the paper
        self.inception_params = {
            'inception3a': (192, 64, 96, 128, 16, 32, 32),
            'inception3b': (256, 128, 128, 192, 32, 96, 64),
            'inception4a': (480, 192, 96, 208, 16, 48, 64),
            'inception4b': (512, 160, 112, 224, 24, 64, 64),
            'inception4c': (512, 128, 128, 256, 24, 64, 64),
            'inception4d': (512, 112, 144, 288, 32, 64, 64),
            'inception4e': (528, 256, 160, 320, 32, 128, 128),
            'inception5a': (832, 256, 160, 320, 32, 128, 128),
            'inception5b': (832, 384, 192, 384, 48, 128, 128),
        }

        self.conv1 = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=7, stride=2, padding=3)
        self.maxpool1 = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.reduce2 = nn.Conv2d(in_channels=64, out_channels=64, kernel_size=1)
        self.conv2 = nn.Conv2d(in_channels=64, out_channels=192, kernel_size=3, padding=1)
        self.maxpool2 = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

        self.inception3a = Inception_Block(self.inception_params['inception3a'])
        self.inception3b = Inception_Block(self.inception_params['inception3b'])
        self.maxpool3 = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

        self.inception4a = Inception_Block(self.inception_params['inception4a'])
        self.inception4b = Inception_Block(self.inception_params['inception4b'])
        self.inception4c = Inception_Block(self.inception_params['inception4c'])
        self.inception4d = Inception_Block(self.inception_params['inception4d'])
        self.inception4e = Inception_Block(self.inception_params['inception4e'])
        self.maxpool4 = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

        self.inception5a = Inception_Block(self.inception_params['inception5a'])
        self.inception5b = Inception_Block(self.inception_params['inception5b'])
        self.avgpool5 = nn.AvgPool2d(kernel_size=7, stride=1)
        self.dropout5 = nn.Dropout(p=0.4)
        self.linear5 = nn.Linear(1024, num_classes)

        # Auxiliary classifiers
        self.aux_classifier1 = Auxiliary_Classifier(512, num_classes)
        self.aux_classifier2 = Auxiliary_Classifier(528, num_classes)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = self.maxpool1(x)
        x = F.relu(self.conv2(F.relu(self.reduce2(x))))
        x = self.maxpool2(x)
        x = self.inception3a(x)
        x = self.inception3b(x)
        x = self.maxpool3(x)
        x = self.inception4a(x)
        aux0 = x.clone() # for auxiliary branch connected to 4a
        x = self.inception4b(x)
        x = self.inception4c(x)
        x = self.inception4d(x)
        aux1 = x.clone() # for auxiliary branch connected to 4d
        x = self.inception4e(x)
        x = self.maxpool4(x)
        x = self.inception5a(x)
        x = self.inception5b(x)
        x = self.avgpool5(x)
        x = self.dropout5(x.reshape(x.shape[0], -1))
        # main branch scores
        scores2 = self.linear5(x)
        # auxiliary classifiers
        scores0 = self.aux_classifier1(aux0)
        scores1 = self.aux_classifier2(aux1)
        return (scores0, scores1, scores2)

In [28]:
# check output shape
x = torch.zeros((64, 3, 224, 224))
model = Inception_Net(num_classes=1000)
for scores in model(x):
    print(scores.shape) # should see [64, 1000]

torch.Size([64, 1000])
torch.Size([64, 1000])
torch.Size([64, 1000])


## Loss computation 

In the original paper, the auxiliary classifiers also contribute to the overall loss during training. The total loss is a weighted sum of the main loss and the two auxiliary losses (each weighted by 0.3), which is then backpropagated through the entire network.

In [34]:
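# y holds the integer class labels, shape (N,); random labels are only a placeholder here
y = torch.randint(0, 1000, (x.shape[0],))
scores0, scores1, scores2 = model(x)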
aux_loss1 = F.cross_entropy(scores0, y) # auxiliary loss 1
aux_loss2 = F.cross_entropy(scores1, y) # auxiliary loss 2
main_loss = F.cross_entropy(scores2, y) # main loss

# calculate weighted loss according to paper
total_loss = main_loss + (0.3 * aux_loss1) + (0.3 * aux_loss2)
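
For context, here is how the weighted loss might fit into a training loop. This is only a sketch under stated assumptions: `ThreeHeadToy` is a hypothetical stand-in that returns three score tensors in the same order as `Inception_Net` (so the example runs quickly), the data is random, and the SGD settings are placeholders rather than the paper's training setup.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ThreeHeadToy(nn.Module):
    """Hypothetical stand-in returning (aux1, aux2, main) scores."""
    def __init__(self, in_features, num_classes):
        super().__init__()
        self.aux1 = nn.Linear(in_features, num_classes)
        self.aux2 = nn.Linear(in_features, num_classes)
        self.main = nn.Linear(in_features, num_classes)

    def forward(self, x):
        return self.aux1(x), self.aux2(x), self.main(x)

torch.manual_seed(0)
model = ThreeHeadToy(in_features=8, num_classes=4)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # placeholder settings

x = torch.randn(16, 8)           # dummy inputs
y = torch.randint(0, 4, (16,))   # dummy integer class labels

model.train()
losses = []
for step in range(20):
    optimizer.zero_grad()
    scores0, scores1, scores2 = model(x)
    total_loss = (F.cross_entropy(scores2, y)
                  + 0.3 * F.cross_entropy(scores0, y)
                  + 0.3 * F.cross_entropy(scores1, y))
    total_loss.backward()  # gradients flow through all three heads
    optimizer.step()
    losses.append(total_loss.item())

# at inference time only the main scores are used
model.eval()
with torch.no_grad():
    _, _, main_scores = model(x)
    predictions = main_scores.argmax(dim=1)

print(losses[0], losses[-1], predictions.shape)
```

The paper discards the auxiliary classifiers at inference time, which is why only the main scores are used for predictions here.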

Advice: If you run into errors while building the model or a part of it, print the shape of the output tensor after each layer and verify that it matches the expected shape.
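
One way to collect these shapes without editing `forward` is to use forward hooks. The sketch below records the output shape of each layer in a small stand-in network that mirrors the first layers of the stem. The names `record_shape` and `shapes` are illustrative, and the same idea should carry over to the submodules of the full model.

```python
import torch
import torch.nn as nn

# small stand-in network mirroring the first layers of the stem
model = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
)

shapes = []

def record_shape(module, inputs, output):
    """Forward hook: store the layer type and its output shape."""
    shapes.append((module.__class__.__name__, tuple(output.shape)))

# register the hook on every layer and keep the handles for cleanup
handles = [layer.register_forward_hook(record_shape) for layer in model]

with torch.no_grad():
    model(torch.zeros((1, 3, 224, 224)))

for name, shape in shapes:
    print(name, shape)

for handle in handles:
    handle.remove()  # detach the hooks once debugging is done
```

Hooks only fire for `nn.Module` layers, so activations applied through `F.relu` inside `forward` will not show up as separate entries.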

That is it. You now know how to implement the original Inception Net paper using PyTorch. I would suggest exploring and implementing the later versions of Inception Net as a fun exercise :)