# Inception Network
<hr>

**Multiple Filters in One Layer:** Instead of choosing a single filter size (such as 1x1, 3x3, or 5x5) or a pooling layer for a given CNN layer, the Inception network applies all of them in parallel within the same layer and concatenates their outputs. This building block is known as the **Inception module.**

<br>

<div style="text-align:center">
    <img src="media/inception.png">
</div>


**Naive Version of Inception Module:**
Consider, for example, the following inception module that takes an input volume of 28x28x192 and applies various filters and pooling in parallel.

<div style="text-align:center">
    <img src="media/inception1.png" width=600>
</div>


Each path uses different filter sizes or pooling, and their outputs are concatenated, resulting in an output volume of 28x28x256.

- **Computational Cost Problem**
    - A naive Inception module can be computationally expensive, especially with larger filter sizes like 5x5.
    - For example, applying 32 filters of size 5x5 (with same padding) to a 28x28x192 input produces a 28x28x32 volume. The number of multiplications is computed as follows:
        - Each output value is the result of applying one 5x5x192 filter to the input volume, which takes $5 \times 5 \times 192$ multiplications.
        - With $28 \times 28 \times 32$ output values, the total is $(5 \times 5 \times 192) \times (28 \times 28 \times 32) \approx 120M$ multiplications.


- **Bottleneck Layer to Reduce Cost**
    - To mitigate computational costs, 1 x 1 convolutions, known as bottleneck layers, are used before larger convolutions.
    - These layers reduce the depth of the input volume before applying the expensive 5x5 convolutions, dramatically cutting down the number of multiplications needed.


<div style="text-align:center">
    <img src="media/bottleneck.png" width=700>
</div>


- **Reduced Computational Cost with 1 x 1 Convolutions**
    - The bottleneck architecture starts with a 1 x 1 convolution that reduces the depth from 192 to 16.
    - A subsequent 5 x 5 convolution produces the desired 28x28x32 output.
    - Multiplications needed are:
        - For the 1 x 1 bottleneck layer, there are $(1 \times 1 \times 192) \times (28 \times 28 \times 16) \approx 2.4M$ multiplications.
        - For the 5 x 5 convolution that follows, there are $(5 \times 5 \times 16) \times (28 \times 28 \times 32) \approx 10M$ multiplications.
        - In total, there are $\approx 12.4M$ multiplications, roughly $\frac{1}{10}$ of the previous 120M.


- **Efficiency and Performance**
    - Perhaps surprisingly, shrinking the representation in the bottleneck layer does not seem to hurt network performance significantly, as long as the reduction is kept within reason.
    - The inception module allows for a deep and wide architecture without an excessive computational burden.
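
The multiplication counts above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic in these notes: the helper name `conv_multiplications` is made up for illustration, and it assumes same padding with stride 1, so the output keeps the 28x28 spatial size.

```python
def conv_multiplications(filter_size, in_channels, out_height, out_width, out_channels):
    """Multiplications for one convolution layer (assumes same padding, stride 1).

    Each output value needs filter_size * filter_size * in_channels
    multiplications, and there are out_height * out_width * out_channels
    output values.
    """
    per_output_value = filter_size * filter_size * in_channels
    num_output_values = out_height * out_width * out_channels
    return per_output_value * num_output_values


# Naive path: 5x5 convolution applied directly to the 28x28x192 input.
naive = conv_multiplications(5, 192, 28, 28, 32)

# Bottleneck path: 1x1 convolution down to 16 channels, then the 5x5 convolution.
reduce_cost = conv_multiplications(1, 192, 28, 28, 16)
conv5_cost = conv_multiplications(5, 16, 28, 28, 32)
bottleneck = reduce_cost + conv5_cost

print(f"naive 5x5:       {naive:,}")        # 120,422,400
print(f"1x1 bottleneck:  {reduce_cost:,}")  # 2,408,448
print(f"5x5 after 1x1:   {conv5_cost:,}")   # 10,035,200
print(f"bottleneck path: {bottleneck:,}")   # 12,443,648
print(f"ratio:           {bottleneck / naive:.3f}")
```

Note that the count covers multiplications only; additions and bias terms are ignored, as in the estimate above.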

# Inception Module
<hr>

An inception module, continuing from the previous example, could look like this:

<div style="text-align:center">
    <img src="media/inception_module.png" width=700>
    <caption><font color="red"><u>An Inception Module</u></font></caption>
</div>

Here, channel concatenation stacks the outputs of all branches along the channel dimension, giving $64 + 128 + 32 + 32 = 256$ output channels.
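
As a rough check of the shapes involved, the sketch below traces the output size of each branch and sums the channels. The mapping of channel counts to branches (64 for 1x1, 128 for 3x3, 32 for 5x5, 32 for the pooling branch) is an assumption based on the usual form of this example, and the function names are hypothetical. All branches are assumed to use same padding with stride 1.

```python
def same_padding_output_size(n, f, stride=1):
    """Output size for an odd filter size f with 'same' padding p = (f - 1) / 2."""
    pad = (f - 1) // 2
    return (n + 2 * pad - f) // stride + 1


def inception_output_shape(height, width, branches):
    """Return (height, width, channels) after concatenating all branches.

    `branches` is a list of (name, filter_size, out_channels) tuples.
    Concatenation along channels only works if every branch produces
    the same spatial size, so that is checked explicitly.
    """
    spatial_sizes = set()
    total_channels = 0
    for _name, f, out_channels in branches:
        out_h = same_padding_output_size(height, f)
        out_w = same_padding_output_size(width, f)
        spatial_sizes.add((out_h, out_w))
        total_channels += out_channels
    if len(spatial_sizes) != 1:
        raise ValueError("branches disagree on spatial size; cannot concatenate")
    (out_h, out_w), = spatial_sizes
    return out_h, out_w, total_channels


# Assumed branch layout for the 28x28x192 example (not confirmed by the notes).
branches = [
    ("1x1 conv", 1, 64),
    ("1x1 conv -> 3x3 conv", 3, 128),
    ("1x1 conv -> 5x5 conv", 5, 32),
    ("3x3 max-pool -> 1x1 conv", 3, 32),
]

shape = inception_output_shape(28, 28, branches)
print(shape)  # (28, 28, 256)
```

Because every branch preserves the 28x28 spatial size, the only dimension that grows is the channel dimension.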

# GoogLeNet 
<hr>

**Inception Network (GoogLeNet)**
- GoogLeNet consists of a sequence of Inception modules stacked one after another.

<div style="text-align:center">
    <img src="media/inception_network.png">
    <caption><font color="red"><u>Inception Network (GoogLeNet)</u></font></caption>
</div>

**Model Architecture**
- The full model is built from multiple Inception modules stacked together.
- Max-pooling layers are occasionally inserted before Inception modules to reduce the spatial dimensions (height and width) of the volume.


**Softmax Branches**
- The network includes three softmax branches: the final classifier plus two auxiliary classifiers attached at intermediate depths. The auxiliary classifiers provide intermediate supervision, pushing the network towards the correct output earlier during training.
- Softmax0 and Softmax1, the classifiers in the intermediate layers, also impart a regularization effect, which can improve generalization and prevent overfitting.
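
One way to picture how the auxiliary classifiers enter training is as extra terms in the loss. The sketch below assumes a weighted sum in which each auxiliary loss is scaled by 0.3, the weight reported in the GoogLeNet paper; that weight and the function names are not part of these notes. The probabilities are made-up numbers for a single three-class example.

```python
import math


def cross_entropy(probs, true_class):
    """Cross-entropy loss for one example, given softmax probabilities."""
    return -math.log(probs[true_class])


def total_training_loss(main_probs, aux_probs_list, true_class, aux_weight=0.3):
    """Main loss plus down-weighted auxiliary losses (aux_weight is an assumption)."""
    main_loss = cross_entropy(main_probs, true_class)
    aux_loss = sum(cross_entropy(p, true_class) for p in aux_probs_list)
    return main_loss + aux_weight * aux_loss


# Hypothetical softmax outputs for one example whose true class is 0.
softmax2 = [0.7, 0.2, 0.1]  # final classifier
softmax0 = [0.5, 0.3, 0.2]  # first auxiliary classifier
softmax1 = [0.6, 0.3, 0.1]  # second auxiliary classifier

loss = total_training_loss(softmax2, [softmax0, softmax1], true_class=0)
print(f"total loss: {loss:.4f}")
```

In the paper, the auxiliary branches are used only during training and are discarded at inference time, so only the final softmax produces predictions.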


**Evolution of Inception Networks**
- Following the original Inception module, several iterations and improvements have been made, resulting in Inception v2, v3, and v4.
- These newer versions incorporate advanced techniques like batch normalization, factorized convolutions, and label smoothing for enhanced performance.
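
To give a feel for why factorized convolutions help, the sketch below compares a single 5x5 convolution with two stacked 3x3 convolutions, which cover the same 5x5 receptive field. This is one common factorization from the later Inception papers, not the only one. The channel count of 64 and the helper name are assumptions for illustration, and the number of channels is assumed to stay the same through every layer.

```python
def multiplications_per_position(filter_sizes, channels):
    """Multiplications per spatial position for a stack of convolutions.

    Assumes every layer has `channels` input and output channels, so a
    layer with filter size f costs f * f * channels * channels.
    """
    return sum(f * f * channels * channels for f in filter_sizes)


channels = 64  # hypothetical channel count

single_5x5 = multiplications_per_position([5], channels)
stacked_3x3 = multiplications_per_position([3, 3], channels)

print(f"one 5x5 conv:  {single_5x5:,}")   # 102,400
print(f"two 3x3 convs: {stacked_3x3:,}")  # 73,728
print(f"ratio:         {stacked_3x3 / single_5x5:.2f}")  # 0.72
```

The ratio $\frac{9 + 9}{25} = 0.72$ does not depend on the channel count under these assumptions, so the factorized version needs about 28% fewer multiplications.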


**Combination with ResNet**
- There are also architectures that combine the concepts of the Inception module with Residual Networks (ResNets), leveraging the strengths of both designs.

- The architecture's name is inspired by the film "Inception", reflecting the network's depth-within-depth structure. The "we need to go deeper" meme from the film, shown below, is cited in the original [paper](https://arxiv.org/pdf/1409.4842v1.pdf).

<div style="text-align:center">
    <img src="media/meme.jpg">
</div>