### Why Was the Inception Architecture Used?


Objects of the same category can appear at very different sizes in images. For instance, an image of a dog can look like any of the examples shown below, and the area occupied by the dog is different in each one.

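Because of this large variation in the size and location of the relevant information, choosing the right kernel size for the convolution is hard. A larger kernel suits information that is distributed more globally, and a smaller kernel suits information that is distributed more locally.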

![image.png](attachment:image.png)

### Inception Architecture

The Inception module was designed to overcome this problem.

In an Inception module, filters of multiple sizes operate on the same level. This makes the network wider rather than deeper.

The image below shows the “naive” Inception module. It convolves the input with filters of 3 different sizes (1x1, 3x3, 5x5) and also applies max pooling, all in parallel. The outputs are concatenated and sent to the next Inception module.

All branch outputs have the same height and width because "same" padding is applied at every convolution and at the max pooling layer. This lets the filter concatenation stack them along the channel (depth) dimension.
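A small shape calculation makes the "same padding" point concrete. The sketch below is pure Python. The input size (28x28x192) and the filter counts (64, 128, 32) are hypothetical values chosen for illustration. It assumes stride 1 everywhere and padding (k - 1) / 2 for a k x k window.

```python
def conv_out_size(n, k, stride=1, pad=0):
    """Output length along one spatial axis for a k-wide conv/pooling window."""
    return (n + 2 * pad - k) // stride + 1

def naive_inception_shape(h, w, c_in, n1x1, n3x3, n5x5):
    """Output (H, W, C) of a naive Inception module with 'same' padding, stride 1."""
    branches = []
    for k, c_out in ((1, n1x1), (3, n3x3), (5, n5x5)):
        pad = (k - 1) // 2  # 'same' padding for an odd kernel at stride 1
        branches.append((conv_out_size(h, k, 1, pad), conv_out_size(w, k, 1, pad), c_out))
    # 3x3 max pooling, stride 1, padding 1: keeps H and W, and keeps all c_in channels
    branches.append((conv_out_size(h, 3, 1, 1), conv_out_size(w, 3, 1, 1), c_in))
    # Concatenation along channels only works if every branch has the same H and W
    assert len({(bh, bw) for bh, bw, _ in branches}) == 1
    out_h, out_w, _ = branches[0]
    return out_h, out_w, sum(c for _, _, c in branches)

print(naive_inception_shape(28, 28, 192, 64, 128, 32))  # (28, 28, 416)
```

The pooling branch passes all 192 input channels straight through. As a result, in the naive module the channel count can only grow from one module to the next.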

![image.png](attachment:image.png)

#### Inception module with Dimension reductions

Deep neural networks are computationally expensive. To make the module cheaper, the authors limit the number of input channels by adding an extra 1x1 convolution before the 3x3 and 5x5 convolutions. Adding an extra operation may seem counterintuitive. However, 1x1 convolutions are far cheaper than 5x5 convolutions, and the reduced number of input channels also helps. Note that in the pooling branch, the 1x1 convolution is introduced after the max pooling layer rather than before it.
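Counting multiplications shows why the 1x1 reduction pays off. The sketch below assumes a hypothetical 28x28x192 input, 32 output filters of size 5x5, and a 1x1 bottleneck of 16 channels. These numbers are illustrative and are not taken from a specific GoogLeNet layer. Bias terms are ignored.

```python
def conv_mults(h_out, w_out, c_in, c_out, k):
    """Multiplications for a k x k convolution producing an h_out x w_out x c_out map."""
    return h_out * w_out * c_out * k * k * c_in

# Direct 5x5 convolution on all 192 input channels
direct = conv_mults(28, 28, 192, 32, 5)
# 1x1 bottleneck down to 16 channels, then the 5x5 convolution on those 16 channels
reduced = conv_mults(28, 28, 192, 16, 1) + conv_mults(28, 28, 16, 32, 5)

print(direct, reduced, round(direct / reduced, 1))  # 120422400 12443648 9.7
```

Under these assumptions, the bottlenecked version needs roughly a tenth of the multiplications, while the output shape stays the same.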

![image.png](attachment:image.png)

#### GoogLeNet (Inception v1), Built from These Inception Modules

GoogLeNet has 9 such inception modules stacked linearly. It is 22 layers deep (27, including the pooling layers). It uses global average pooling at the end of the last inception module.
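Global average pooling replaces each channel of the final feature map with its mean, so a 7x7x1024 map becomes a 1024-dimensional vector that is fed to the classifier. The sketch below is a plain-Python illustration. `global_average_pool` is a hypothetical helper, and it assumes a channels-first nested-list layout.

```python
def global_average_pool(feature_map):
    """feature_map: list of channels, each an H x W list of lists.
    Returns one value per channel: the mean over all H * W positions."""
    return [sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
            for channel in feature_map]

# Two 2x2 channels
print(global_average_pool([[[1, 2], [3, 4]], [[0, 0], [0, 8]]]))  # [2.5, 2.0]
```

Because the pooling has no weights, this step adds no parameters, unlike a large fully connected layer on the flattened map.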

![image-3.png](attachment:image-3.png)

#### Auxiliary Classifiers to Counter the Vanishing Gradient Problem

Because the network is very deep, it is prone to the vanishing gradient problem.

To prevent the middle part of the network from “dying out”, the authors introduced two auxiliary classifiers (the purple boxes in the image). They essentially applied softmax to the outputs of two of the Inception modules and computed an auxiliary loss over the same labels. The total loss is the real loss plus a weighted sum of the two auxiliary losses, with a weight of 0.3 for each auxiliary loss in the paper. The auxiliary classifiers are used only during training and are discarded at inference time.
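The weighted loss can be written out directly. This is a minimal sketch that assumes plain softmax cross-entropy for every head. `googlenet_loss` and `aux_weight` are hypothetical names; only the 0.3 weight comes from the text above.

```python
import math

def cross_entropy(logits, label):
    """Softmax cross-entropy for one example, computed stably via log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

def googlenet_loss(main_logits, aux1_logits, aux2_logits, label, aux_weight=0.3):
    """Training loss: main loss plus each auxiliary loss scaled by aux_weight."""
    return (cross_entropy(main_logits, label)
            + aux_weight * cross_entropy(aux1_logits, label)
            + aux_weight * cross_entropy(aux2_logits, label))
```

At inference time, only `main_logits` would be used.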

### Inception v2 and v3

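The Inception v2 architecture increased accuracy and reduced computational complexity by factorizing convolutions.

A 5x5 convolution is factorized into two stacked 3x3 convolutions, which together cover the same 5x5 receptive field. A 5x5 convolution is 25/9 ≈ 2.78 times more expensive than a 3x3 convolution. Stacking two 3x3 convolutions (2 × 9 = 18 weights per input/output channel pair instead of 25) therefore leads to a boost in performance.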

![image.png](attachment:image.png)

### Further Factorization into Asymmetric Convolutions

Moreover, they factorize convolutions of filter size nxn into a combination of 1xn and nx1 convolutions. For example, a 3x3 convolution can be replaced by a 1x3 convolution followed by a 3x1 convolution on its output, which together cover the same 3x3 receptive field. They found this method to be 33% cheaper than the single 3x3 convolution (3 + 3 = 6 weights instead of 9). This is illustrated in the image below.
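The sketch below shows the exact case of this idea. When a 3x3 kernel is the outer product of a 3x1 column and a 1x3 row, applying the 1x3 kernel and then the 3x1 kernel gives the same result as the 3x3 kernel. The kernel values and the test image are hypothetical. In the network itself, the 1x3 and 3x1 filters are learned directly, so the pair is not required to reproduce any particular 3x3 kernel.

```python
def correlate2d_valid(image, kernel):
    """Unpadded ('valid') 2D cross-correlation, as in CNN convolution layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(kernel[i][j] * image[r + i][c + j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]

row = [1, 2, 1]    # 1x3 kernel (hypothetical values)
col = [1, 0, -1]   # 3x1 kernel (hypothetical values)
full = [[a * b for b in row] for a in col]  # 3x3 kernel: full[i][j] = col[i] * row[j]

image = [[(3 * r + 5 * c) % 7 for c in range(6)] for r in range(6)]

two_step = correlate2d_valid(correlate2d_valid(image, [row]), [[a] for a in col])
one_step = correlate2d_valid(image, full)
print(two_step == one_step)  # True
print(len(row) + len(col), len(full) * len(full[0]))  # 6 9
```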

![image.png](attachment:image.png)

#### Making the Module Wider Instead of Deeper

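If the factorized module above were made deeper, the dimensions would be reduced too drastically and information would be lost (a representational bottleneck). To avoid this, the filter banks in the module were expanded, making the module wider instead of deeper.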

![image.png](attachment:image.png)

### Inception v2 Architecture

![image-2.png](attachment:image-2.png)

### Inception v3 Architecture

Inception v3 contains all the features of Inception v2, with the following changes and additions:

1. The RMSProp optimizer.
2. Batch normalization in the fully connected layer of the auxiliary classifier.
3. Factorized 7x7 convolutions.
4. Label smoothing regularization: a method that regularizes the classifier by estimating the effect of label dropout during training. It prevents the classifier from predicting a class too confidently. Adding label smoothing gave a 0.2% improvement in the error rate.
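Label smoothing mixes the one-hot target with a uniform distribution: the true class gets (1 - ε) + ε/K and every other class gets ε/K, where K is the number of classes. The sketch below assumes ε = 0.1 and a softmax cross-entropy loss. The helper names are hypothetical. It compares a moderately confident prediction with an extremely confident one for the same correct class.

```python
import math

def smooth_labels(label, num_classes, epsilon=0.1):
    """Mix the one-hot target for `label` with a uniform distribution over num_classes."""
    uniform = epsilon / num_classes
    return [1.0 - epsilon + uniform if k == label else uniform
            for k in range(num_classes)]

def soft_cross_entropy(logits, target):
    """Cross-entropy between a target distribution and softmax(logits)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return -sum(t * (z - log_z) for t, z in zip(target, logits))

target = smooth_labels(label=2, num_classes=4)
moderate = soft_cross_entropy([0, 0, 10, 0], target)
extreme = soft_cross_entropy([0, 0, 30, 0], target)
print(round(moderate, 3), round(extreme, 3))  # 0.75 2.25
```

With a plain one-hot target, the extreme logits would get the lower loss. With the smoothed target, they are penalized, which is the overconfidence effect described above.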

![image.png](attachment:image.png)

### Inception v4 

The “stem” of Inception v4 was modified. The stem refers to the initial set of operations performed before the Inception blocks.

![image-2.png](attachment:image-2.png)

### Architecture of Inception v4

![image.png](attachment:image.png)

Inception v4 has three main Inception modules, named A, B and C (unlike in Inception v2, these modules are in fact named A, B and C). They look very similar to their Inception v2 (or v3) counterparts.

#### Inception v4 Contains 3 Inception Modules (A, B, C)

![image.png](attachment:image.png)

### Inception v4 contains 2 Reduction Blocks

Inception v4 introduced specialized “Reduction Blocks”, which change the width and height of the grid. The earlier versions did not have explicit reduction blocks, but they still implemented the same functionality.
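A reduction block shrinks the grid by ending every parallel branch in a stride-2 operation, then concatenates the branches. The sketch below assumes every branch ends in an unpadded ("valid") 3x3 window with stride 2, so the branch outputs line up for concatenation. It also assumes the pooling branch keeps the input channel count. The grid sizes 35, 17 and 8 are the ones used by the A, B and C stages of Inception v4. The channel counts (384 in, 384 and 256 from the convolutional branches) are assumptions for illustration and should be checked against the paper's tables.

```python
def conv_out_size(n, k, stride=1, pad=0):
    """Output length along one spatial axis for a k-wide conv/pooling window."""
    return (n + 2 * pad - k) // stride + 1

def reduction_block_shape(size, c_in, conv_branch_channels, k=3, stride=2):
    """(H, W, C) after a reduction block whose branches all end in an unpadded
    k x k, stride-2 window; the max-pooling branch passes c_in channels through."""
    out = conv_out_size(size, k, stride, pad=0)
    return out, out, c_in + sum(conv_branch_channels)

print(reduction_block_shape(35, 384, [384, 256]))  # (17, 17, 1024)
print(conv_out_size(17, 3, 2))  # 8
```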

![image.png](attachment:image.png)

### Inception ResNet V1 and V2


#### Inception-ResNet Architecture

![image.png](attachment:image.png)

#### Stem

![image.png](attachment:image.png)

#### Inception Modules

Both sub-versions have the same structure for the modules A, B, C and the reduction blocks. The only difference is the hyper-parameter settings.

![image.png](attachment:image.png)

#### Reduction Blocks

![image.png](attachment:image.png)

1. Inception-ResNet v1 has a computational cost that is similar to that of Inception v3.
2. Inception-ResNet v2 has a computational cost that is similar to that of Inception v4.
3. They have different stems, as illustrated in the Inception v4 section.
4. Both sub-versions have the same structure for the modules A, B, C and the reduction blocks. The only difference is the hyper-parameter settings. This section focuses only on the structure; refer to the paper for the exact hyper-parameter settings (the images are of Inception-ResNet v1).