## 5.2 Going deeper with ResNet Model

⚠️⚠️⚠️ *Please open this notebook in Google Colab* by clicking the link below ⚠️⚠️⚠️<br><br>
<a href="https://colab.research.google.com/github/Muhammad-Yunus/Belajar-Image-Classification/blob/main/Pertemuan%205/5.2%20going_deeper_with_resnet_model.ipynb" target="_blank"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a><br><br><br>
- Click the `Connect` button at the top right of the Google Colab notebook,<br>
<img src="resource/cl-connect-gpu.png" width="250px">
- Once the connection is established, it will look something like this<br>
<img src="resource/cl-connect-gpu-success.png" width="250px">

- Check that the GPU attached to the Colab environment is active

In [None]:
!nvidia-smi

#### 5.2.1 Deeper Network vs Model Performance
- In the previous experiment, we achieved good enough performance for image classification by simply stacking several layers into the network, <br>
<img src="resource/small-net.png" width="100%"> <br>
- What if we keep adding extra layers and make the network bigger and bigger?
    - Deeper convolutional neural networks are sometimes beneficial, giving the model the ability to learn better.
- But only up to a point: eventually the train and validation accuracy become <font color="orange">saturated (stop changing)</font>, or even get worse.<br>
<img src="resource/deep-network.png" width="100%"><br>
- This problem is related to network <font color="orange">degradation</font>.
    - Adding extra layers brings no benefit and the accuracy remains the same.
    - The network experiences <font color="orange">vanishing/exploding gradients</font>, making it hard for the optimizer to <font color="orange">converge</font>. <br>
    <img src="resource/vanish-exploding-grad.png" width="500px"><br>
- This explains why stacking more layers in a deep neural network does not always make the model learn better.
- In that situation, an added layer will just <font color="orange">learn to do nothing</font> and the result is <font color="orange">unchanged</font>.
    - The layer now acts like an <font color="orange">Identity Function</font>, <br>
    <img src="resource/Identity-Function.png" width="550px"><br>
- In the illustration above, we can verify that <font color="orange">with or without the additional layer</font>, the result is <font color="orange">unchanged</font>, so it makes sense to just <font color="orange">skip it</font> (a.k.a. <font color="cyan">Skip Connection</font>).
    - This ensures the network <font color="orange">won’t degrade</font>.
- A <font color="cyan">Skip Connection</font> allows the input to <font color="orange">skip</font> a layer and get <font color="orange">added</font> to the output of the next layer.
    - This effectively means the network learns both the <font color="orange">transformation</font> and an <font color="orange">identity mapping</font>.<br>
    <img src="resource/residual-block-cat.png" width="600px"><br>
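
The skip-connection idea above can be sketched in a few lines of PyTorch. This is a minimal illustration, not the notebook's model: `SkipConnection` is a hypothetical helper name, and zeroing the convolution weights is only a stand-in for a layer that has "learned to do nothing".

```python
import torch
import torch.nn as nn

class SkipConnection(nn.Module):
    """Hypothetical helper: wraps a shape-preserving layer so the output is layer(x) + x."""
    def __init__(self, layer):
        super().__init__()
        self.layer = layer

    def forward(self, x):
        return self.layer(x) + x

# A 3x3 convolution that keeps the shape (same channel count, padding=1)
conv = nn.Conv2d(3, 3, kernel_size=3, padding=1, bias=False)
nn.init.zeros_(conv.weight)  # assumption: simulate a layer that contributes nothing

x = torch.randn(1, 3, 8, 8)
plain_out = conv(x)                 # without skip: the signal is wiped out
skip_out = SkipConnection(conv)(x)  # with skip: the input passes through unchanged

print(torch.count_nonzero(plain_out).item())  # 0
print(torch.equal(skip_out, x))               # True
```

With the skip connection, a layer that contributes nothing leaves the input intact, so adding it cannot make the result worse than not having it.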

#### 5.2.2 Residual Block
- Based on the above idea, Kaiming He et al., in their paper *['Deep Residual Learning for Image Recognition' - arxiv.org](https://arxiv.org/abs/1512.03385)*, proposed the <font color="orange">Residual Block</font> to handle the degradation problem in very deep neural networks. <br>
<img src="resource/residual-block.png" width="500px"><br><i>regular block (left) vs Residual Block (right) - source [[link](https://d2l.ai/chapter_convolutional-modern/resnet.html)]</i><br><br>
- It's called <font color="orange">"residual"</font> because it represents the difference between the <font color="orange">original signal</font> (input : $x$) and the <font color="orange">modified signal</font> (output : $F(x)$).
    - In the context of neural networks, a residual image captures what <font color="orange">remains after subtracting</font> the original signal from the modified one ($G(x) = F(x) - x$).
    - It’s like a <font color="orange">visual residue</font> of the changes made.
    - This concept helps the model focus on learning the changes or details that improve performance, rather than relearning everything from scratch.
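
Written out with the symbols used above ($x$ for the input, $F(x)$ for the block output, $G(x)$ for the residual), the relationship can be sketched as follows. This ignores the final ReLU applied after the addition, and $I$ (the identity matrix) is notation introduced here for illustration.

$$
\begin{aligned}
G(x) &= F(x) - x && \text{(residual: what the stacked layers have to learn)} \\
F(x) &= G(x) + x && \text{(block output: residual plus skip connection)} \\
\frac{\partial F}{\partial x} &= \frac{\partial G}{\partial x} + I && \text{(the identity term gives gradients a direct path)}
\end{aligned}
$$

If the best thing the block can do is nothing, the layers only need to push $G(x)$ toward zero, which gives $F(x) = x$.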


- Implementation of the <font color="orange">Residual Block</font> in PyTorch

In [None]:
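import cv2
import numpy as np                # used for the uint8 conversion before plotting
import matplotlib.pyplot as plt   # used to plot the images
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.transforms as transforms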

- Define the <font color="orange">Residual Block</font> following this structure, <br>
<img src="resource/residual-block-2.png" width="700px">

In [None]:
class ResidualBlock(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1):
        super(ResidualBlock, self).__init__()
        # Define first convolutional layer
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=stride, padding=1, bias=False)
        # Define first batch normalization layer
        self.bn1 = nn.BatchNorm2d(out_channels)
        # Define second convolutional layer
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=1, padding=1, bias=False)
        # Define second batch normalization layer
        self.bn2 = nn.BatchNorm2d(out_channels)
        # Define the shortcut (skip connection): an empty Sequential acts as identity
        self.shortcut = nn.Sequential()
        # If the output shape differs from the input shape, project x with a 1x1 convolution
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels),
            )

    def forward(self, x):
        # Apply first convolution -> batch normalization -> ReLU activation
        out = F.relu(self.bn1(self.conv1(x)))
        # Apply second convolution -> batch normalization
        out = self.bn2(self.conv2(out))
        # Add the (possibly projected) input to the output
        out += self.shortcut(x)
        # Apply final ReLU activation
        out = F.relu(out)
        return out
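
The addition in `forward` only works when `self.shortcut(x)` has the same shape as the output of the two convolutions. When a block changes the stride or the number of channels, a common way to restore the match is a 1x1 convolution on the shortcut path. The sketch below shows that shape bookkeeping in isolation; the names `main_path` and `projection`, and the sizes 3, 8, and 32, are chosen for illustration only.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)  # dummy batch: 1 image, 3 channels, 32x32 pixels

# Main path that halves the resolution and widens the channels (3 -> 8)
main_path = nn.Conv2d(3, 8, kernel_size=3, stride=2, padding=1, bias=False)

# Projection shortcut: a 1x1 convolution with the same stride and output channels
projection = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=1, stride=2, bias=False),
    nn.BatchNorm2d(8),
)

main_out = main_path(x)
shortcut_out = projection(x)
print(main_out.shape)      # torch.Size([1, 8, 16, 16])
print(shortcut_out.shape)  # torch.Size([1, 8, 16, 16])

# The shapes match, so the element-wise addition is valid
out = main_out + shortcut_out
```

When `stride=1` and the channel counts are equal, no projection is needed and the input can be added directly.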

In [None]:
# Load and preprocess an image
image = cv2.imread("cat.jpg")  # Load image 'cat.jpg' using OpenCV
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # Convert BGR to RGB
image = cv2.resize(image, (128, 128))

# Convert image to tensor
image = image.transpose((2, 0, 1))  # Change the order of dimensions from (H, W, C) to (C, H, W)
input_image = torch.tensor(image, dtype=torch.float32).unsqueeze(0)  # Add batch dimension

In [None]:
# Create a residual block and process the image
residual_block = ResidualBlock(in_channels=3, out_channels=3)
residual_image = residual_block(input_image)

In [None]:
# Calculate the residual (absolute difference between input and output)
visual_residual = torch.abs(input_image - residual_image)

In [None]:
# Convert tensors to NumPy arrays for visualization
# (clip to the valid 0-255 range first so out-of-range values do not overflow in the uint8 cast)
input_image_np = input_image.squeeze().permute(1, 2, 0).detach().numpy().clip(0, 255).astype(np.uint8)
residual_image_np = residual_image.squeeze().permute(1, 2, 0).detach().numpy().clip(0, 255).astype(np.uint8)
visual_residual_np = visual_residual.squeeze().permute(1, 2, 0).detach().numpy().clip(0, 255).astype(np.uint8)

In [None]:
# Plot the images
fig, ax = plt.subplots(1, 3, figsize=(15, 5))
ax[0].imshow(input_image_np)
ax[0].set_title('Input Image')
ax[0].axis('off')

ax[1].imshow(residual_image_np)
ax[1].set_title('Output Image (After Residual Block)')
ax[1].axis('off')

ax[2].imshow(visual_residual_np)
ax[2].set_title('Visual Residual')
ax[2].axis('off')

plt.show()



<img src="resource/Resnet-18-Model.png" width="900px"><br><i>ResNet18 - source [[link](https://d2l.ai/chapter_convolutional-modern/resnet.html)]</i><br><br>
