ResNet introduced the idea of residual blocks to address the vanishing/exploding gradient problem. The network uses a technique known as skip connections. A skip connection bypasses a few layers and feeds the activations of one layer directly to a later layer. This forms a residual block, and ResNets are built by stacking these residual blocks.
The strategy behind this network is to let the layers fit a residual mapping rather than learn the underlying mapping directly. That is, instead of learning the original mapping H(x), the network is made to fit

**F(x) := H(x) - x which gives H(x) := F(x) + x.**
![Skip%20connection.webp](attachment:Skip%20connection.webp)

The benefit of this kind of skip connection is that if a layer hurts the performance of the architecture, regularisation can push its residual F(x) toward zero, so the block reduces to the identity and the layer is effectively skipped. As a result, an extremely deep neural network can be trained without running into vanishing or exploding gradients. The authors of the paper experimented with networks of 100 to 1000 layers on the CIFAR-10 dataset.
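
The relation H(x) = F(x) + x can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions: the helper names (`relu`, `linear`, `residual_block`) are hypothetical, the residual branch uses two small dense layers instead of convolutions, and the activation after the addition is left out.

```python
def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, a) for a in v]

def linear(w, v):
    """Multiply a square weight matrix (list of rows) by a vector."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]

def residual_block(x, w1, w2):
    """Compute H(x) = F(x) + x, where F(x) = w2 * relu(w1 * x)."""
    f = linear(w2, relu(linear(w1, x)))        # residual branch F(x)
    return [fi + xi for fi, xi in zip(f, x)]   # skip connection adds x back

x = [1.0, -2.0, 3.0]
zeros = [[0.0] * 3 for _ in range(3)]
# With all weights at zero, F(x) = 0 and the block passes x through unchanged.
print(residual_block(x, zeros, zeros))
```

When the weights of the residual branch are zero, the block behaves as an identity mapping, which is the sense in which a layer can be "skipped".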

### Why do we need ResNet?
* To solve a complex problem, we often stack additional layers in a deep neural network, which tends to improve accuracy and performance. The intuition is that as more layers are added, they progressively learn more complex features. For instance, when recognising images, the first layer may learn to detect edges, the second textures, the third objects, and so on. However, it has been found that the conventional convolutional neural network model has a maximum depth threshold. The figure below plots the error percentage on training and test data for a 20-layer network and a 56-layer network.
* In both the training and the test case, the error percentage of the 56-layer network is higher than that of the 20-layer network. This suggests that a network's performance degrades as more layers are added on top of it. This might be attributed to the initialization of the network, the optimization function, and, most significantly, the vanishing gradient problem. You might suspect that overfitting is also to blame. However, the 56-layer network has the worse error percentage on both training and test data, which is not what happens when a model overfits: an overfitted model would show a lower training error.
![Comparison%20of%2026-layer%20vs%2056-layer%20architecture.webp](attachment:Comparison%20of%2026-layer%20vs%2056-layer%20architecture.webp)
* After AlexNet, the first CNN-based architecture to win the ImageNet competition (in 2012), each subsequent winning architecture used more layers in a deep neural network to lower the error rate. This works for a smaller number of layers, but as more layers are added, a common deep learning problem known as the vanishing/exploding gradient arises. The gradient either shrinks towards zero or becomes too large. As a result, both the training and the test error rate increase as the number of layers grows.
* The figure above shows that a 20-layer CNN architecture performs better on both the training and the test dataset than a 56-layer CNN architecture. On further analysis, the higher error rate is attributed to the vanishing/exploding gradient problem.
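
A toy calculation can make the effect of depth on the gradient more concrete. The sketch below treats every layer as a scalar function and assumes, purely for illustration, that each layer has the same small local derivative (0.05); the function name `chain_gradient` is hypothetical. By the chain rule, a plain stack multiplies the local derivatives together, while a residual stack multiplies (derivative + 1) because of the added identity path.

```python
def chain_gradient(local_derivs, residual):
    """Derivative of the last layer's output w.r.t. the first layer's input.

    Plain layer:    y = f(x)      ->  dy/dx = f'(x)
    Residual layer: y = f(x) + x  ->  dy/dx = f'(x) + 1
    """
    grad = 1.0
    for d in local_derivs:
        grad *= (d + 1.0) if residual else d
    return grad

for depth in (20, 56):
    derivs = [0.05] * depth  # hypothetical small local derivative per layer
    plain = chain_gradient(derivs, residual=False)
    res = chain_gradient(derivs, residual=True)
    print(f"depth={depth}: plain={plain:.3e}, residual={res:.3e}")
```

In this simplified setting the plain product collapses towards zero as depth grows, while the identity term keeps the residual product from vanishing. Real networks have matrix-valued derivatives and normalization layers, so this is an intuition aid rather than a faithful model.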

### ResNet Architecture
ResNet starts from a 34-layer plain network architecture inspired by VGG-19, to which shortcut connections are then added. These shortcut connections convert the architecture into a residual network, as shown in the following figure:
![Restnet.png](attachment:Restnet.png)

### Steps of ResNet
**1. Initial Convolution and Pooling Layers:**
* The network begins with a standard convolutional layer using a 7x7 kernel with stride 2 and padding, followed by batch normalization and a ReLU activation function. This is followed by a max pooling layer with a 3x3 kernel and stride 2.
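
The effect of these two layers on the spatial resolution can be worked out with the usual output-size formula for convolution and pooling. The sketch below assumes a 224x224 input, a padding of 3 for the 7x7 convolution, and a padding of 1 for the max pooling layer; these values are common choices but are not stated above, and `conv_out` is a hypothetical helper.

```python
def conv_out(size, kernel, stride, padding):
    """Output length along one spatial axis: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

size = 224                                                        # assumed input resolution
after_conv = conv_out(size, kernel=7, stride=2, padding=3)        # assumed padding of 3
after_pool = conv_out(after_conv, kernel=3, stride=2, padding=1)  # assumed padding of 1
print(after_conv, after_pool)  # prints: 112 56
```

Under these assumptions each of the two layers halves the resolution, so the residual blocks start from feature maps that are a quarter of the input size along each axis.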

**2. Residual Blocks:**
* The architecture then stacks multiple residual blocks, each consisting of two or three convolutional layers. Each block has a shortcut connection that skips the convolutional layers and adds the input directly to the output.
* These blocks are arranged in groups (also called layers) with increasing numbers of filters: typically, 64, 128, 256, and 512 filters. The number of blocks in each group depends on the specific ResNet variant (e.g., ResNet-34, ResNet-50).
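
The variant names can be related to the number of blocks per group with some simple arithmetic. The block counts below are the commonly used configurations for each variant, listed here as an assumption since they are not given above; `weighted_layers` is a hypothetical helper that counts the initial convolution, every convolution inside the residual blocks, and the final fully connected layer (convolutions on the shortcut path are not counted).

```python
# (blocks per group, convolutions per block) for common variants.
VARIANTS = {
    "ResNet-18": ([2, 2, 2, 2], 2),
    "ResNet-34": ([3, 4, 6, 3], 2),
    "ResNet-50": ([3, 4, 6, 3], 3),
    "ResNet-101": ([3, 4, 23, 3], 3),
    "ResNet-152": ([3, 8, 36, 3], 3),
}

def weighted_layers(blocks_per_group, convs_per_block):
    """Initial conv + all convs inside the blocks + final fully connected layer."""
    return 1 + sum(blocks_per_group) * convs_per_block + 1

for name, (blocks, convs) in VARIANTS.items():
    print(name, weighted_layers(blocks, convs))
```

Note that ResNet-34 and ResNet-50 use the same number of blocks per group; the difference in depth comes from using two versus three convolutions per block.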

**3. Fully Connected Layer:**
* After the series of residual blocks, the network applies a global average pooling layer, which reduces each feature map to a single value.
* Finally, the output is passed through a fully connected layer (typically with 1000 units for ImageNet classification) with a softmax activation function to produce the final class probabilities.
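
The classification head described above can be sketched in plain Python. This is a minimal illustration with toy sizes (two 2x2 feature maps and three output classes instead of 1000); the function names and the weight values are made up for the example.

```python
import math

def global_avg_pool(feature_maps):
    """Reduce each 2-D feature map (list of rows) to its mean value."""
    return [sum(map(sum, fm)) / (len(fm) * len(fm[0])) for fm in feature_maps]

def fully_connected(weights, bias, v):
    """One output per weight row: dot(row, v) + bias."""
    return [sum(w * a for w, a in zip(row, v)) + b for row, b in zip(weights, bias)]

def softmax(logits):
    """Turn logits into probabilities that sum to 1."""
    m = max(logits)                            # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

maps = [[[1.0, 3.0], [5.0, 7.0]], [[0.0, 0.0], [2.0, 2.0]]]  # 2 channels of size 2x2
pooled = global_avg_pool(maps)                 # one value per channel
weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]] # toy weights: 3 classes x 2 channels
probs = softmax(fully_connected(weights, [0.0] * 3, pooled))
print(pooled, probs)
```

Because global average pooling leaves one value per channel, the size of the fully connected layer depends only on the number of channels and classes, not on the spatial size of the feature maps.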

### Activation Function Used in ResNet
**ReLU (Rectified Linear Unit):**
* ReLU is used after the convolutional layers within the residual blocks. In the original design, the last ReLU of a block is applied after the shortcut addition.
* Formula: ReLU(x)=max(0,x)
* Advantages: ReLU introduces non-linearity, mitigates the vanishing gradient problem, and accelerates convergence.
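
As a small illustration of the formula and of why ReLU helps with vanishing gradients, the sketch below defines ReLU together with its derivative. The name `relu_grad` is hypothetical, and the derivative at exactly zero is set to 0 by convention.

```python
def relu(x):
    """ReLU(x) = max(0, x)."""
    return max(0.0, x)

def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise (0 chosen at x = 0)."""
    return 1.0 if x > 0 else 0.0

for x in (-2.0, 0.0, 3.5):
    print(x, relu(x), relu_grad(x))
```

For positive inputs the derivative is exactly 1, so the gradient passes through the activation without being scaled down, unlike saturating activations such as the sigmoid.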

### Methods to Avoid Overfitting in ResNet
**1. Batch Normalization:**
* Applied after each convolutional layer to normalize the activations and stabilize training.
* Has a mild regularizing effect that helps to reduce overfitting; it was originally motivated as a way of reducing internal covariate shift.
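
The normalization step can be sketched for a single channel as follows. This is a simplified training-time version: it normalizes one list of activations with the batch mean and variance, then applies a scale `gamma` and shift `beta`. The running statistics used at inference time are omitted, and the default values are assumptions for the example.

```python
import math

def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one channel's activations across a batch, then scale and shift."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)  # biased variance
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]

out = batch_norm([2.0, 4.0, 6.0, 8.0])
print(out)
```

With `gamma=1` and `beta=0` the output has approximately zero mean and unit variance; in a trained network these two parameters are learned per channel.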

**2. Residual Connections:**
* Facilitate the training of deeper networks, which improves generalization and reduces overfitting.

**3. Data Augmentation:**
* Techniques such as random cropping, flipping, and color jittering are used during training to increase the diversity of the training data.
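
Two of these augmentations, random cropping and horizontal flipping, can be sketched on a toy image stored as a nested list. The function names and the 4x4 image are made up for the example; colour jittering is left out because it needs a multi-channel image.

```python
import random

def random_crop(image, size, rng):
    """Cut a size x size window at a random position from a 2-D image."""
    top = rng.randint(0, len(image) - size)
    left = rng.randint(0, len(image[0]) - size)
    return [row[left:left + size] for row in image[top:top + size]]

def random_hflip(image, rng, p=0.5):
    """Mirror the image left-to-right with probability p."""
    return [row[::-1] for row in image] if rng.random() < p else image

rng = random.Random(0)                                      # fixed seed for repeatability
image = [[r * 4 + c for c in range(4)] for r in range(4)]   # toy 4x4 "image"
augmented = random_hflip(random_crop(image, 3, rng), rng)
print(augmented)
```

Each call produces a slightly different view of the same image, so the network rarely sees exactly the same input twice during training.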

### Advantages of ResNet
**1. Improved Training of Deep Networks:**
* Residual connections allow very deep networks to be trained effectively by mitigating the vanishing gradient problem.

**2. State-of-the-Art Performance:**
* Achieved excellent results on various benchmarks and competitions, demonstrating its effectiveness for image recognition tasks.

**3. Scalability:**
* ResNet can be scaled to very deep networks (e.g., ResNet-101, ResNet-152) while maintaining good performance.

### Disadvantages of ResNet
**1. Increased Computational Cost:**
* Deeper networks require significant computational resources for training and inference.

**2. Complexity:**
* The architecture is more complex compared to simpler models, requiring careful implementation and tuning.