### Evolution of CNN Architectures

#### **LeNet** 

LeNet was developed for handwritten digit recognition (MNIST dataset).

**Architecture**

`Input`: 32x32 grayscale image.

`Layers`:
 - Three convolutional layers used for feature extraction; the first two are each followed by average pooling.
   - Layer 1 uses six `(5,5)` kernels with a stride of 1, so `Output Shape = (32-5+1)*(32-5+1)*6 = (28*28*6)`. The first pooling layer then reduces the spatial dimensions (height and width) from `(28*28*6)` to `(14*14*6)` using `2*2` average pooling with a stride of `2`. The output is then passed through the `Hyperbolic Tangent` (tanh) activation function.
  
   - Layer 2 uses sixteen `(5,5)` kernels with a stride of 1, so `Output Shape = (14-5+1)*(14-5+1)*16 = (10*10*16)`. The second pooling layer then reduces the spatial dimensions from `(10*10*16)` to `(5*5*16)` using `2*2` average pooling with a stride of `2`. The output is then passed through the `Hyperbolic Tangent` (tanh) activation function.

   - Layer 3 uses `120` `(5,5)` kernels with a stride of 1. Because the kernel is the same size as the incoming `(5*5*16)` feature map, the output is `(5-5+1, 5-5+1, 120) = (1,1,120)`. This layer therefore acts as a flatten step: its output is a `120*1` vector that is passed to the fully connected layers.

<img src='../Notes_Images/DL8.png'>

 - Fully connected layers at the end, used for classification. They transform the `120-element vector` (input) into an `84-element vector` (hidden/dense layer), and the output layer has one neuron per class, i.e. `10 Neurons`. The outputs of these `10 Neurons` are turned into a probability distribution over the classes (in modern implementations, with a softmax).
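The shape arithmetic in the layer list above follows one rule, `output = (W - K + 2P) // S + 1` for input size `W`, kernel/window `K`, padding `P` and stride `S`, which applies to both convolutions and pooling windows. The sketch below traces the LeNet shapes with it; the helper names `conv_out` and `lenet_shapes` are illustrative, not from any library.

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output height/width of a convolution or pooling window: (W - K + 2P) // S + 1."""
    return (size - kernel + 2 * padding) // stride + 1


def lenet_shapes(size: int = 32):
    """Trace (height, width, channels) through the LeNet layers described above."""
    shapes = []
    h = conv_out(size, kernel=5)          # conv1: six 5x5 kernels, stride 1
    shapes.append(("conv1", (h, h, 6)))
    h = conv_out(h, kernel=2, stride=2)   # 2x2 average pooling, stride 2
    shapes.append(("pool1", (h, h, 6)))
    h = conv_out(h, kernel=5)             # conv2: sixteen 5x5 kernels
    shapes.append(("conv2", (h, h, 16)))
    h = conv_out(h, kernel=2, stride=2)   # 2x2 average pooling, stride 2
    shapes.append(("pool2", (h, h, 16)))
    h = conv_out(h, kernel=5)             # conv3: 120 5x5 kernels -> 1x1x120
    shapes.append(("conv3", (h, h, 120)))
    shapes.append(("fc1", (84,)))         # fully connected 120 -> 84
    shapes.append(("output", (10,)))      # fully connected 84 -> 10 classes
    return shapes


for name, shape in lenet_shapes():
    print(f"{name:7s} -> {shape}")
# conv1   -> (28, 28, 6)
# pool1   -> (14, 14, 6)
# conv2   -> (10, 10, 16)
# pool2   -> (5, 5, 16)
# conv3   -> (1, 1, 120)
# fc1     -> (84,)
# output  -> (10,)
```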

`Advantages`:

 - Reduced the need for manual feature extraction, e.g. hand-crafting features with OpenCV.

 - Demonstrated the effectiveness of CNNs for image recognition.

<style>
.bg{
	background-color: white;
}
.mar{
	text-align: center;
}
</style>

<figure>
<img src='../Notes_Images/DL9.svg' class="bg">
<figcaption class="mar">LeNet Architecture</figcaption>
</figure>

#### **AlexNet** 

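AlexNet marked the start of the deep learning revolution. It was developed in 2012 by Alex Krizhevsky, Ilya Sutskever and Geoffrey Hinton and won the ImageNet (ILSVRC) 2012 challenge by a wide margin using a deeper and wider network than earlier CNNs: 8 learned layers (5 convolutional and 3 fully connected). It popularized the `ReLU` activation function and was one of the first deep learning architectures trained on GPUs at scale (the network was split across two GPUs).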

`AlexNet` also used data augmentation techniques such as random cropping, horizontal flipping and color alterations to artificially enlarge the training dataset, and it used `Dropout` as a `Regularization` technique for better `Generalization`. It also stacked convolutional layers directly on top of one another (conv3 to conv5 have no pooling between them); a stack of convolutional layers helps the model learn more complex features.
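A minimal NumPy sketch of two of the regularization ideas above, assuming `(H, W, C)` image arrays. AlexNet took random 224x224 crops (and their horizontal mirrors) from 256x256 images; the color (PCA) augmentation is omitted here. The dropout below is *inverted* dropout, the variant most frameworks use today, which rescales during training instead of scaling outputs at test time as the original paper did; the two agree in expectation. Function names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(seed=0)


def random_crop_and_flip(image: np.ndarray, crop: int = 224) -> np.ndarray:
    """Cut a random crop x crop patch from an (H, W, C) image and mirror it
    left-to-right with probability 0.5."""
    h, w, _ = image.shape
    top = rng.integers(0, h - crop + 1)
    left = rng.integers(0, w - crop + 1)
    patch = image[top:top + crop, left:left + crop]
    if rng.random() < 0.5:
        patch = patch[:, ::-1]  # horizontal flip: reverse the width axis
    return patch


def dropout(activations: np.ndarray, p: float = 0.5, training: bool = True) -> np.ndarray:
    """Inverted dropout: zero each unit with probability p during training and
    rescale the survivors by 1 / (1 - p); do nothing at inference time."""
    if not training or p == 0.0:
        return activations
    keep = rng.random(activations.shape) >= p
    return activations * keep / (1.0 - p)


image = rng.random((256, 256, 3))
print(random_crop_and_flip(image).shape)  # (224, 224, 3)
print(dropout(np.ones(8)))                # each entry is either 0.0 or 2.0
```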

**Links for the Working Architecture of AlexNet**

[Medium Article](https://medium.com/@siddheshb008/alexnet-architecture-explained-b6240c528bd5)

[Second Article](https://learnopencv.com/understanding-alexnet/)

[AlexNet Visualization](https://tensorspace.org/html/playground/alexnet.html)

#### Working of Activation Functions in Depth

Here, we'll explore how activation functions work under the hood. Why are they used? How do they introduce non-linearity into the network? Why do we need them? How does activation work in a `Convolutional Layer` and in a `Fully Connected Layer`? Why do we need to downsample the image using filters? What are the computational limitations of neural networks? How is a neural network represented in `GPU RAM`?
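One way to see why activations are needed: without them, any stack of linear layers collapses into a single linear layer, so extra depth adds no expressive power. The NumPy sketch below checks this for two small random layers (the sizes are arbitrary) and then shows that ReLU is not additive, which is what stops the layers from being merged.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)   # layer 1: 3 -> 4
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)   # layer 2: 4 -> 2
x = rng.normal(size=3)

# Two linear layers with no activation in between...
two_layers = W2 @ (W1 @ x + b1) + b2
# ...equal one linear layer with W = W2 W1 and b = W2 b1 + b2.
W, b = W2 @ W1, W2 @ b1 + b2
one_layer = W @ x + b
print(np.allclose(two_layers, one_layer))  # True


def relu(z):
    return np.maximum(0.0, z)


# ReLU breaks additivity, so it cannot be folded into a single matrix:
print(relu(1.0) + relu(-1.0), relu(1.0 + -1.0))  # 1.0 0.0
```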

#### How Is an Image Dataset Used to Train a Neural Network?

Here, we'll understand how the dataset is split into batches and why we need batches, and how one batch of the dataset is trained in parallel on a `GPU` using tensors. What is a tensor? How does it work? How is it represented on the GPU?
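A minimal sketch of mini-batching, assuming the whole dataset is held as one 4-D tensor in `(N, C, H, W)` layout (the layout PyTorch uses). The 1000 all-zero images are placeholder data and `iterate_minibatches` is a hypothetical helper; each yielded batch is itself a tensor that a GPU can process in parallel.

```python
import numpy as np


def iterate_minibatches(images, labels, batch_size, shuffle=True, seed=0):
    """Yield (images, labels) mini-batches; the last batch may be smaller."""
    n = images.shape[0]
    order = np.random.default_rng(seed).permutation(n) if shuffle else np.arange(n)
    for start in range(0, n, batch_size):
        idx = order[start:start + batch_size]
        yield images[idx], labels[idx]


# Placeholder dataset: 1000 grayscale 28x28 images stored as one tensor (N, C, H, W).
images = np.zeros((1000, 1, 28, 28), dtype=np.float32)
labels = np.zeros(1000, dtype=np.int64)

batches = list(iterate_minibatches(images, labels, batch_size=32))
print(len(batches), batches[0][0].shape, batches[-1][0].shape)
# 32 (32, 1, 28, 28) (8, 1, 28, 28)
```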

**Resources**

[Understanding Tensor](https://www.youtube.com/watch?v=f5liqUk0ZTw&t=735s)


#### **VGG (Visual Geometry Group)** 

If AlexNet is considered the grandfather of modern Convolutional Neural Networks, `VGG` is considered the father of modern CNN architecture design. At the 2014 `ImageNet Large Scale Visual Recognition Challenge` (ILSVRC), it took first place in the localization task and second place in the classification task.

Before VGG, most neural networks mixed a variety of kernel sizes and layer types, which made them hard to scale and generalize. `VGG` instead uses only small `3*3` kernels throughout. Stacking them gives the same receptive field as a larger kernel (two `3*3` layers cover `5*5`, three cover `7*7`) with fewer parameters, which keeps very deep networks computationally manageable. VGG is also much deeper, with 16 to 19 weight layers compared with AlexNet's 8.
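The parameter saving from stacked `3*3` kernels can be checked with simple arithmetic. The sketch assumes each layer maps `C` channels to `C` channels and ignores biases; `C = 256` is just an example value. Stacked `3*3` layers cost `18C^2` (for a `5*5` field) and `27C^2` (for a `7*7` field), versus `25C^2` and `49C^2` for a single large kernel.

```python
def conv_weights(kernel: int, channels: int) -> int:
    """Weights in one kernel x kernel conv layer mapping channels -> channels (bias ignored)."""
    return kernel * kernel * channels * channels


def receptive_field(num_3x3_layers: int) -> int:
    """Receptive field of n stacked 3x3, stride-1 convolutions: each layer adds 2 pixels."""
    return 1 + 2 * num_3x3_layers


C = 256  # example channel count
print(receptive_field(2), conv_weights(5, C), 2 * conv_weights(3, C))  # 5 1638400 1179648
print(receptive_field(3), conv_weights(7, C), 3 * conv_weights(3, C))  # 7 3211264 1769472
```

Besides saving parameters, each extra `3*3` layer adds another non-linearity, which is part of why deeper stacks of small kernels tend to work well.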

This deep architecture paved the way for the development of architectures like `ResNet` and `DenseNet`.

VGG models are also widely used for transfer learning, since pre-trained ImageNet weights are readily available.

The two most widely used versions are `VGG16` (16 weight layers) and `VGG19` (19 weight layers).
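Where the "16" and "19" come from: the sketch below encodes the VGG16 and VGG19 layer layouts (configurations D and E in the VGG paper) as lists, where a number is the output channel count of a `3*3` convolution (padding 1, so height and width are kept) and `"M"` is a `2*2` max-pool with stride 2. The dictionary and helper names are illustrative; only convolutions and the three fully connected layers count as weight layers.

```python
# "M" marks a 2x2 max-pooling layer (stride 2); numbers are output channels of 3x3 convs.
VGG_CONFIGS = {
    "VGG16": [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
              512, 512, 512, "M", 512, 512, 512, "M"],
    "VGG19": [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
              512, 512, 512, 512, "M", 512, 512, 512, 512, "M"],
}


def summarize(cfg, input_size=224):
    conv_layers = sum(1 for v in cfg if v != "M")
    size, channels = input_size, 3
    for v in cfg:
        if v == "M":
            size //= 2      # each max-pool halves height and width
        else:
            channels = v    # 3x3 conv with padding 1 keeps height and width
    weight_layers = conv_layers + 3  # plus three fully connected layers (4096, 4096, 1000)
    return conv_layers, weight_layers, (size, size, channels)


for name, cfg in VGG_CONFIGS.items():
    print(name, summarize(cfg))
# VGG16 (13, 16, (7, 7, 512))
# VGG19 (16, 19, (7, 7, 512))
```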
 
**Resources**

[Understand VGG Architecture](https://medium.com/@siddheshb008/vgg-net-architecture-explained-71179310050f)


#### **ResNet (Residual Network)**

ResNet (Residual Network) is a CNN architecture introduced in 2015 by Kaiming He et al. in their paper "Deep Residual Learning for Image Recognition." It is a groundbreaking architecture designed to address the **vanishing gradient problem**, which occurs in deep networks.

**What Is the Vanishing Gradient Problem?**

The vanishing gradient problem arises during the training of deep neural networks, where gradients (used to update the network's weights) become exceedingly small as they are backpropagated through many layers. This causes the earlier layers of the network (closer to the input) to learn very slowly or stop learning altogether, leading to poor overall performance.
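A toy illustration of the effect, assuming a chain of scalar layers `a <- sigmoid(w * a)` with `w = 1` (a deliberately simplified network, not a real CNN). By the chain rule the input gradient is a product of local derivatives `w * sigmoid'(w * a)`, and since `sigmoid'` never exceeds 0.25, the product shrinks at least as fast as `0.25 ** depth`.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def input_gradient_plain(depth: int, w: float = 1.0, x: float = 0.5) -> float:
    """d(output)/d(input) for a chain of scalar layers a <- sigmoid(w * a)."""
    a, grad = x, 1.0
    for _ in range(depth):
        s = sigmoid(w * a)
        grad *= w * s * (1.0 - s)  # chain rule: local derivative w * sigmoid'(w * a)
        a = s
    return grad


for depth in (1, 5, 10, 20):
    print(depth, input_gradient_plain(depth))
```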

<img src='../Notes_Images/DL10.png'>

<img src='../Notes_Images/DL11.png'>

<img src='../Notes_Images/DL12.png'>

ResNet addresses this problem using `Skip Connections`.

<img src='../Notes_Images/DL13.png'>

**Key Idea Behind ResNet**

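`ResNet` introduces the concept of `residual learning` by using "skip connections" (or "shortcuts"). Instead of learning a mapping `H(x)` directly, a block of layers learns the residual `F(x) = H(x) - x` and outputs `F(x) + x`, so the input is carried forward unchanged along the shortcut and gradients get a direct path back to earlier layers.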
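A toy comparison of plain and residual stacks, using scalar layers whose learned branch is assumed to be `F(a) = sigmoid(w * a)` with a small weight `w = 0.1` (a simplification for illustration, not ResNet's actual conv/batch-norm blocks). A residual layer outputs `a + F(a)`, so its local derivative is `1 + F'(a)` and the product of derivatives cannot collapse toward zero the way the plain stack's does.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def input_gradient(depth: int, residual: bool, w: float = 0.1, x: float = 0.5) -> float:
    """d(output)/d(input) through `depth` scalar layers whose learned branch is
    F(a) = sigmoid(w * a). A plain layer outputs F(a); a residual layer outputs a + F(a)."""
    a, grad = x, 1.0
    for _ in range(depth):
        s = sigmoid(w * a)
        local = w * s * (1.0 - s)   # F'(a)
        if residual:
            grad *= 1.0 + local     # d/da [a + F(a)] = 1 + F'(a)
            a = a + s
        else:
            grad *= local           # d/da F(a) = F'(a)
            a = s
    return grad


print(input_gradient(20, residual=False))  # vanishingly small
print(input_gradient(20, residual=True))   # stays above 1
```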

[Skip Connection Explanation](https://theaisummer.com/skip-connections/#:~:text=ResNet%3A%20skip%20connections%20via%20addition&text=Then%20the%20gradient%20would%20simply,function%20to%20preserve%20the%20gradient.)

[Back Propagation Article1](https://medium.com/@juanc.olamendy/backpropagation-in-deep-learning-the-key-to-optimizing-neural-networks-7c063a03f677)

[Back Propagation Article2](https://builtin.com/machine-learning/backpropagation-neural-network)

**Versions**

 - `ResNet50`
 - `ResNet101`
 - `ResNet1001` (a 1001-layer variant trained on CIFAR-10)
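The number in a ResNet name counts its weight layers. For the ImageNet variants built from bottleneck blocks (`1*1`, `3*3`, `1*1` convolutions), the per-stage block counts below are taken from Table 1 of the ResNet paper; adding the initial `7*7` convolution and the final fully connected layer gives the name. The `1*1` projection convolutions on some shortcuts are conventionally not counted. Names in the sketch are illustrative.

```python
# Bottleneck blocks per stage (conv2_x ... conv5_x), from Table 1 of the ResNet paper.
BLOCKS_PER_STAGE = {
    "ResNet50": [3, 4, 6, 3],
    "ResNet101": [3, 4, 23, 3],
    "ResNet152": [3, 8, 36, 3],
}


def weight_layers(blocks):
    # each bottleneck block = 1x1, 3x3, 1x1 convolutions (3 layers);
    # plus the initial 7x7 conv and the final fully connected layer
    return 3 * sum(blocks) + 2


for name, blocks in BLOCKS_PER_STAGE.items():
    print(name, weight_layers(blocks))
# ResNet50 50
# ResNet101 101
# ResNet152 152
```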

#### **MobileNet**

MobileNet was created to make it practical to run deep neural networks on mobile and embedded devices. The `MobileNet` family has three architectures: `V1`, `V2` and `V3`.

**V1**: It uses depthwise separable convolutions, which split a traditional `Convolutional` layer into two steps: a `depthwise` convolution (one filter per input channel) followed by a `pointwise` (`1*1`) convolution that combines the channels. This requires far fewer computations and parameters.
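The saving can be quantified with the cost model from the MobileNet paper: for kernel size `dk`, `m` input channels, `n` output channels and a `df x df` output map, a standard convolution costs `dk*dk*m*n*df*df` multiply-adds, while the depthwise plus pointwise pair costs `dk*dk*m*df*df + m*n*df*df`, a ratio of `1/n + 1/dk^2`. The layer sizes below (`3*3` kernel, 512 to 512 channels, 14x14 map) are example values chosen for illustration.

```python
def standard_conv_cost(dk: int, m: int, n: int, df: int) -> int:
    """Multiply-adds of a dk x dk convolution, m -> n channels, on a df x df output map."""
    return dk * dk * m * n * df * df


def depthwise_separable_cost(dk: int, m: int, n: int, df: int) -> int:
    depthwise = dk * dk * m * df * df   # one dk x dk filter per input channel
    pointwise = m * n * df * df         # 1x1 convolution mixing m -> n channels
    return depthwise + pointwise


dk, m, n, df = 3, 512, 512, 14
std = standard_conv_cost(dk, m, n, df)
sep = depthwise_separable_cost(dk, m, n, df)
print(std, sep, round(sep / std, 4))  # 462422016 52283392 0.1131
print(1 / n + 1 / dk**2)              # the same ratio in closed form
```

With `3*3` kernels the cost drops to roughly one ninth, since the `1/n` term is tiny for wide layers.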

**V3**: The latest of the three, tuned (partly using neural architecture search) for IoT and embedded devices.

#### **EfficientNet**

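EfficientNet is another architecture suited to low-end devices. Rather than scaling only one dimension, it scales network depth, width and input resolution together using a single compound coefficient, which gives strong accuracy for a given computational budget.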
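A sketch of the compound scaling rule, using the constants reported in the EfficientNet paper: depth is multiplied by `alpha ** phi`, width by `beta ** phi` and resolution by `gamma ** phi`, with `alpha * beta**2 * gamma**2` close to 2 so that each step of `phi` roughly doubles the FLOPs. It shows the rule only; it does not reproduce the exact settings of the released EfficientNet models, and `compound_scale` is an illustrative name.

```python
# Constants reported in the EfficientNet paper: alpha (depth), beta (width), gamma (resolution).
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15


def compound_scale(phi: int):
    depth = ALPHA ** phi
    width = BETA ** phi
    resolution = GAMMA ** phi
    # FLOPs grow roughly with depth * width^2 * resolution^2
    flops = depth * width ** 2 * resolution ** 2
    return depth, width, resolution, flops


print(ALPHA * BETA**2 * GAMMA**2)  # about 1.92, i.e. roughly 2x FLOPs per step of phi
for phi in range(4):
    d, w, r, f = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution x{r:.2f}, FLOPs x{f:.2f}")
```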