## Deeper Network

### Skip Connection

**Degradation problem**
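While training deep neural networks, the accuracy of the model saturates and then drops as the depth of the architecture increases. This is not simply overfitting, since the training error also increases. Very deep plain networks are harder to optimize, and vanishing/exploding gradients are one commonly cited cause.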

**Skip connections**
A skip connection bypasses one or more layers and feeds the output of an earlier layer directly into a later layer, where it is combined with the output of the skipped layers.

<img src="./rsc/skip_connection.png" width="600" height="800">

There are two fundamental ways to use skip connections:
* **Addition** as in residual architecture
* **Concatenation** as in densely connected architecture
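A minimal sketch of the two ways to merge a skip connection, using random tensors in place of real layer outputs (the tensor names are illustrative only). Addition needs both tensors to have exactly the same shape; concatenation only needs matching batch and spatial sizes and stacks the channels.

```python
import torch

# Hypothetical feature maps: x is the input to a block, fx is the output of
# the layers that the skip connection bypasses (same shape as x).
x = torch.randn(1, 64, 32, 32)
fx = torch.randn(1, 64, 32, 32)

# Residual style (ResNet): element-wise addition, shapes must match exactly
residual_out = fx + x
print(residual_out.shape)  # torch.Size([1, 64, 32, 32])

# Dense style (DenseNet): concatenation along the channel axis (dim=1),
# so the channel counts add up
new_features = torch.randn(1, 32, 32, 32)  # e.g. 32 new feature maps from one layer
dense_out = torch.cat([x, new_features], dim=1)
print(dense_out.shape)  # torch.Size([1, 96, 32, 32])
```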

*further reading*
* [Skip Connection and Explanation of ResNet](https://chautuankien.medium.com/skip-connection-and-explanation-of-resnet-afabe792346c)

### Residual Block
The figure below shows the architecture of the normal residual block (left) and the pre-activation residual block (right).
According to [Identity Mappings in Deep Residual Networks](https://arxiv.org/abs/1603.05027), the pre-activation residual block makes training easier and improves generalization.
<img src="./rsc/residual_block.png" width="600" height="800">

```python
import torch
from torch import nn
from torch.nn import functional as F
```

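```python
# normal (post-activation) residual block: conv-BN-ReLU-conv-BN, add shortcut, then ReLU
class ResidualBlock(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1, padding=1):
        super().__init__()
        # first conv may downsample (stride > 1) and change the number of channels
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding, bias=False)
        # second conv keeps the shape; padding = kernel_size // 2 preserves H and W for odd kernels
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size, stride=1, padding=kernel_size // 2, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.bn2 = nn.BatchNorm2d(out_channels)

        # identity shortcut by default (an empty nn.Sequential returns its input unchanged)
        self.shortcut = nn.Sequential()

        # projection shortcut (1x1 conv + BN) when the output shape differs from the input shape
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride, padding=0, bias=False),
                nn.BatchNorm2d(out_channels)
            )

    def forward(self, X):
        out = F.relu(self.bn1(self.conv1(X)))
        out = self.bn2(self.conv2(out))
        out += self.shortcut(X)  # skip connection by addition
        out = F.relu(out)        # ReLU is applied after the addition
        return out
```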

```python
model = ResidualBlock(3, 64)
X = torch.randn((1, 3, 32, 32))
Z = model(X)
Z.shape
```

```
torch.Size([1, 64, 32, 32])
```

```python
# pre-activated residual block: BN-ReLU-conv, BN-ReLU-conv, then add the shortcut
# (no ReLU after the addition)
class PreActBlock(nn.Module):
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(in_ch)
        self.conv1 = nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, kernel_size=3, stride=1, padding=1, bias=False)

        self.shortcut = nn.Sequential()

        # 1x1 projection shortcut when the shape changes
        if stride != 1 or in_ch != out_ch:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, kernel_size=1, stride=stride, bias=False)
            )

    def forward(self, X):
        out = self.conv1(F.relu(self.bn1(X)))
        out = self.conv2(F.relu(self.bn2(out)))
        out += self.shortcut(X)
        return out
```

```python
preact = PreActBlock(3, 64, stride=1)
X = torch.randn((1, 3, 32, 32))
Z = preact(X)
Z.shape
```

```
torch.Size([1, 64, 32, 32])
```


### Pointwise Convolution

>Pointwise Convolution is a type of convolution that uses a 1x1 kernel: a kernel that iterates through every single point. This kernel has a depth of however many channels the input image has.

A regular convolution filter can change the spatial size (H x W) and the number of channels at the same time. A pointwise (1x1) convolution leaves H x W unchanged and only combines values along the channel axis, so it is a cheap way to set the number of channels to any desired size (reduction or expansion).
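A small sketch (arbitrary sizes chosen for illustration) showing that a 1x1 convolution keeps H and W while changing the channel count, and that it computes the same thing as a linear layer applied independently at every pixel. The weight copy between `nn.Conv2d` and `nn.Linear` is only there to demonstrate the equivalence.

```python
import torch
from torch import nn

torch.manual_seed(0)
x = torch.randn(1, 3, 8, 8)  # (batch, channels, H, W)

# 1x1 convolution: 3 input channels -> 16 output channels, H and W unchanged
pointwise = nn.Conv2d(3, 16, kernel_size=1)
y = pointwise(x)
print(y.shape)  # torch.Size([1, 16, 8, 8])

# Same operation as a Linear(3, 16) applied at every pixel:
# move channels last, apply the linear layer, move channels back
linear = nn.Linear(3, 16)
with torch.no_grad():
    linear.weight.copy_(pointwise.weight.view(16, 3))  # (16, 3, 1, 1) -> (16, 3)
    linear.bias.copy_(pointwise.bias)
y_linear = linear(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)
print(torch.allclose(y, y_linear, atol=1e-5))  # True
```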

Normal convolution:<br>

The input is an RGB image of size 12x12x3. A 5x5 convolution with stride 1 and no padding produces an 8x8 output, since (12 - 5) / 1 + 1 = 8. Because the image has three channels, the kernel is really 5x5x3: each output value is the sum of 75 products (25 pixels x 3 channels) between the kernel weights and the input values under it.
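The same setup in PyTorch, as a quick shape check with one random image and a single filter (bias disabled so only the kernel weights are counted):

```python
import torch
from torch import nn

image = torch.randn(1, 3, 12, 12)  # one RGB image, 12x12

# one 5x5 filter, stride 1, no padding; the filter spans all 3 input channels
conv = nn.Conv2d(in_channels=3, out_channels=1, kernel_size=5, stride=1, padding=0, bias=False)
out = conv(image)

print(conv.weight.shape)    # torch.Size([1, 3, 5, 5]) -> one 5x5x3 kernel
print(conv.weight.numel())  # 75 weights = 25 pixels * 3 channels
print(out.shape)            # torch.Size([1, 1, 8, 8]) since (12 - 5) / 1 + 1 = 8
```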
 
<img src="./rsc/normal_convolution.png" width="600" height="800">

<img src="./rsc/normal_convolution2.png" width="600" height="800">
Depthwise convolution:<br>

The input is again an RGB image of size 12x12x3. Depthwise convolution applies a separate 5x5x1 kernel to each channel (stride 1, no padding), so each output value combines only 25 values from a single channel. Each channel yields an 8x8 map, and stacking the three maps gives an 8x8x3 output.
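In PyTorch a depthwise convolution can be expressed with the `groups` argument of `nn.Conv2d`: setting `groups` equal to the number of input channels gives each channel its own kernel. The last check below is only illustrative, confirming that output channel 0 depends on input channel 0 alone.

```python
import torch
from torch import nn
from torch.nn import functional as F

image = torch.randn(1, 3, 12, 12)

# groups=3 splits the input into 3 groups of 1 channel each, so every channel
# gets its own 5x5x1 kernel and channels are never mixed
depthwise = nn.Conv2d(3, 3, kernel_size=5, stride=1, padding=0, groups=3, bias=False)
out = depthwise(image)

print(depthwise.weight.shape)  # torch.Size([3, 1, 5, 5]) -> three 5x5x1 kernels
print(out.shape)               # torch.Size([1, 3, 8, 8]) -> an 8x8x3 output

# output channel 0 equals a plain convolution of input channel 0 with kernel 0
single = F.conv2d(image[:, :1], depthwise.weight[:1])
print(torch.allclose(out[:, :1], single, atol=1e-6))  # True
```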

<img src="./rsc/depthwise_convolution.png" width="600" height="800">

Pointwise convolution - reduction:<br>

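A pointwise convolution uses a 1x1 kernel whose depth equals the number of input channels. Here the input is the 8x8x3 image produced by the depthwise step. At each pixel, a 1x1x3 kernel multiplies the 3 channel values by its 3 weights and sums them into a single number, producing an 8x8x1 image.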

<img src="./rsc/pointwise_convolution.png" width="600" height="800">

Pointwise convolution - expansion: <br>

Using 256 such 1x1x3 kernels, each producing one 8x8x1 map, and stacking their outputs gives a final image of shape 8x8x256.
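Both pointwise cases as a shape check, assuming an 8x8x3 input such as the depthwise output above (random values, bias disabled):

```python
import torch
from torch import nn

x = torch.randn(1, 3, 8, 8)  # e.g. the 8x8x3 output of the depthwise step

# reduction: a single 1x1x3 kernel -> 8x8x1
reduce_conv = nn.Conv2d(3, 1, kernel_size=1, bias=False)
print(reduce_conv(x).shape)     # torch.Size([1, 1, 8, 8])

# expansion: 256 kernels of size 1x1x3 -> 8x8x256
expand_conv = nn.Conv2d(3, 256, kernel_size=1, bias=False)
print(expand_conv.weight.shape)  # torch.Size([256, 3, 1, 1])
print(expand_conv(x).shape)      # torch.Size([1, 256, 8, 8])
```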

<img src="./rsc/pointwise_convolution2.png" width="600" height="800">

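**Advantages**:
* **Reduced computational load**: at each pixel, a 1x1 kernel needs one multiplication per input channel, instead of K x K per input channel for a KxK kernel.
* **Parameter efficiency**: a 1x1 convolution from C_in to C_out channels has only C_in x C_out weights (plus biases).
* **Preserving spatial information**: height and width are unchanged; only the channel dimension is modified.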
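To make the parameter-efficiency point concrete, here is a small comparison for one arbitrary channel mapping (256 to 64 channels, chosen only for illustration; bias disabled so only the kernels are counted). `n_params` is a helper defined here, not a PyTorch function.

```python
from torch import nn

def n_params(module):
    """Total number of learnable parameters in a module (helper defined for this example)."""
    return sum(p.numel() for p in module.parameters())

# map 256 channels to 64 channels with a 1x1 and with a 3x3 kernel
conv1x1 = nn.Conv2d(256, 64, kernel_size=1, bias=False)
conv3x3 = nn.Conv2d(256, 64, kernel_size=3, padding=1, bias=False)

print(n_params(conv1x1))  # 16384  = 256 * 64 * 1 * 1
print(n_params(conv3x3))  # 147456 = 256 * 64 * 3 * 3
```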

*further reading*:
* [A Basic Introduction to Separable Convolutions](https://towardsdatascience.com/a-basic-introduction-to-separable-convolutions-b99ec3102728)
* [Exploring Pointwise Convolution in CNNs: Replacing Fully Connected Layers](https://www.analyticsvidhya.com/blog/2023/11/exploring-pointwise-convolution-in-cnns-replacing-fully-connected-layers/)

### Bottleneck

<img src="./rsc/bottleneck.png" width="300" height="400">

```python
class Bottleneck(nn.Module):
    # the last 1x1 conv expands the channels by this factor (4 in ResNet-50/101/152)
    expansion_factor = 4

    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        # 1x1 conv: reduce the number of channels
        self.conv1 = nn.Conv2d(in_ch, out_ch, kernel_size=1, stride=1, padding=0, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)

        # 3x3 conv on the reduced channels (may downsample with stride)
        self.conv2 = nn.Conv2d(out_ch, out_ch, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)

        # 1x1 conv: expand the channels to out_ch * expansion_factor
        self.conv3 = nn.Conv2d(out_ch, out_ch * self.expansion_factor, kernel_size=1, stride=1, padding=0, bias=False)
        self.bn3 = nn.BatchNorm2d(out_ch * self.expansion_factor)

        self.shortcut = nn.Sequential()

        if stride != 1 or in_ch != out_ch * self.expansion_factor:
            self.shortcut = nn.Sequential(
                # bias is redundant right before BatchNorm
                nn.Conv2d(in_ch, out_ch * self.expansion_factor, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch * self.expansion_factor)
            )

    def forward(self, X):
        out = F.relu(self.bn1(self.conv1(X)))
        out = F.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        out += self.shortcut(X)
        out = F.relu(out)
        return out
```

```python
model = Bottleneck(256, 64, stride=1)
X = torch.randn((1, 256, 32, 32))
Z = model(X)
Z.shape
```

```
torch.Size([1, 256, 32, 32])
```

### ResNet Architecture

Residual blocks (the original post-activation block or the pre-activation variant) are the main components of a residual network (ResNet). A ResNet is built by stacking these residual blocks together.

ResNet-34 architecture:
<img src='./rsc/ResNet.png' width="600" height="800" title="ResNET">

Common variants are ResNet-18, ResNet-34, ResNet-50, ResNet-101, and ResNet-152. ResNet-18 and ResNet-34 use the basic two-layer residual block, while ResNets with 50 or more layers use the bottleneck block.
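A minimal sketch of how blocks are stacked into a ResNet-18-style network: a stem, four stages of blocks (64, 128, 256, 512 channels, with stride 2 at the start of stages 2 to 4), global average pooling, and a linear classifier. Assumptions: the stem is a single 3x3 conv sized for 32x32 inputs (not the 7x7 conv + max pool used for ImageNet), `num_classes=10` is arbitrary, and `SmallResNet`/`BasicBlock` are illustrative names. Passing `(3, 4, 6, 3)` gives the ResNet-34 block layout.

```python
import torch
from torch import nn
from torch.nn import functional as F

class BasicBlock(nn.Module):
    """Post-activation residual block with two 3x3 convs."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride, 1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, 1, 1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.shortcut = nn.Sequential()
        if stride != 1 or in_ch != out_ch:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride, bias=False),
                nn.BatchNorm2d(out_ch),
            )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + self.shortcut(x))

class SmallResNet(nn.Module):
    """ResNet-18-style network for 32x32 inputs (hypothetical example)."""
    def __init__(self, blocks_per_stage=(2, 2, 2, 2), num_classes=10):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(3, 64, 3, 1, 1, bias=False),
            nn.BatchNorm2d(64),
            nn.ReLU(),
        )
        layers = []
        in_ch = 64
        for i, n_blocks in enumerate(blocks_per_stage):
            out_ch = 64 * 2 ** i                 # 64, 128, 256, 512
            first_stride = 1 if i == 0 else 2    # halve H and W at the start of stages 2-4
            for j in range(n_blocks):
                layers.append(BasicBlock(in_ch, out_ch, first_stride if j == 0 else 1))
                in_ch = out_ch
        self.stages = nn.Sequential(*layers)
        self.pool = nn.AdaptiveAvgPool2d(1)      # global average pooling
        self.fc = nn.Linear(in_ch, num_classes)

    def forward(self, x):
        out = self.stages(self.stem(x))
        out = torch.flatten(self.pool(out), 1)
        return self.fc(out)

model = SmallResNet()
logits = model(torch.randn(2, 3, 32, 32))
print(logits.shape)  # torch.Size([2, 10])
```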

*further reading*
* [Understanding ResNet Architecture: A Deep Dive into Residual Neural Network](https://medium.com/@ibtedaazeem/understanding-resnet-architecture-a-deep-dive-into-residual-neural-network-2c792e6537a9)


### Inception

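An Inception module runs four branches in parallel on the same input: a 1x1 convolution; a 1x1 convolution for dimensionality reduction followed by a 3x3 convolution; a 1x1 reduction followed by a 5x5 convolution; and a 3x3 max pooling followed by a 1x1 convolution. The branch outputs are concatenated along the channel axis into one output.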


<img src="./rsc/inception.png" width="600" height="800">

Image from [Going Deeper with Convolutions](https://arxiv.org/abs/1409.4842) by C. Szegedy et al.

##### Implementing Inception from scratch

```python
class Inception(nn.Module):
    # Pre-activation style (BN -> ReLU -> conv) Inception module.
    # Simplification: each 1x1 reduction outputs the same number of channels as the
    # conv that follows it (out_ch2 for the 3x3 branch, out_ch3 for the 5x5 branch);
    # GoogLeNet uses separate channel counts for the reductions.

    def __init__(self, in_ch, out_ch1, out_ch2, out_ch3, out_ch_pool):
        super().__init__()

        # branch 1: 1x1 conv
        self.bn1 = nn.BatchNorm2d(in_ch)
        self.conv1 = nn.Conv2d(in_ch, out_ch1, kernel_size=1, stride=1)

        # branch 2: 1x1 pointwise reduction
        self.bn2 = nn.BatchNorm2d(in_ch)
        self.conv2 = nn.Conv2d(in_ch, out_ch2, kernel_size=1, stride=1, padding=0, bias=False)
        # branch 2: 3x3 conv (padding=1 keeps H and W)
        self.bn3 = nn.BatchNorm2d(out_ch2)
        self.conv3 = nn.Conv2d(out_ch2, out_ch2, kernel_size=3, stride=1, padding=1)

        # branch 3: 1x1 pointwise reduction
        self.bn4 = nn.BatchNorm2d(in_ch)
        self.conv4 = nn.Conv2d(in_ch, out_ch3, kernel_size=1, stride=1, padding=0, bias=False)
        # branch 3: 5x5 conv (padding=2 keeps H and W)
        self.bn5 = nn.BatchNorm2d(out_ch3)
        self.conv5 = nn.Conv2d(out_ch3, out_ch3, kernel_size=5, stride=1, padding=2)

        # branch 4: 3x3 max pool (stride 1, padding 1 keeps H and W), then 1x1 projection
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)
        self.bn6 = nn.BatchNorm2d(in_ch)
        self.conv6 = nn.Conv2d(in_ch, out_ch_pool, kernel_size=1, stride=1, padding=0, bias=False)

    def forward(self, X):
        # 1x1 branch
        out1 = self.conv1(F.relu(self.bn1(X)))

        # 3x3 branch
        out2 = self.conv2(F.relu(self.bn2(X)))
        out2 = self.conv3(F.relu(self.bn3(out2)))

        # 5x5 branch
        out3 = self.conv4(F.relu(self.bn4(X)))
        out3 = self.conv5(F.relu(self.bn5(out3)))

        # max-pool branch
        out4 = self.maxpool(X)
        out4 = self.conv6(F.relu(self.bn6(out4)))

        # concatenate along channels: out_ch1 + out_ch2 + out_ch3 + out_ch_pool
        return torch.cat([out1, out2, out3, out4], dim=1)
```

```python
# using nn.Sequential improves readability
class InceptionModule(nn.Module):

    def __init__(self, in_ch, out_ch1, out_ch3, out_ch5, out_ch_pool):
        super().__init__()

        # 1x1
        self.branch1 = nn.Sequential(
            nn.Conv2d(in_ch, out_ch1, kernel_size=1),
            nn.BatchNorm2d(out_ch1),
            nn.ReLU(),
        )

        # 3x3
        self.branch2 = nn.Sequential(
            # pointwise
            nn.Conv2d(in_ch, out_ch3, kernel_size=1),
            nn.BatchNorm2d(out_ch3),
            nn.ReLU(),
            # 3x3 conv
            nn.Conv2d(out_ch3, out_ch3, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch3),
            nn.ReLU(),
        )

        # 5x5
        self.branch3 = nn.Sequential(
            # pointwise
            nn.Conv2d(in_ch, out_ch5, kernel_size=1),
            nn.BatchNorm2d(out_ch5),
            nn.ReLU(),
            # 5x5 conv
            nn.Conv2d(out_ch5, out_ch5, kernel_size=5, padding=2),
            nn.BatchNorm2d(out_ch5),
            nn.ReLU(),
        )

        # Max pooling
        self.branch4 = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, out_ch_pool, kernel_size=1),
            nn.BatchNorm2d(out_ch_pool),
            nn.ReLU(),
        )

    def forward(self, X):
        branch1 = self.branch1(X)
        branch2 = self.branch2(X)
        branch3 = self.branch3(X)
        branch4 = self.branch4(X)

        # concatenate the branch outputs along the channel axis
        return torch.cat([branch1, branch2, branch3, branch4], dim=1)
```

```python
inception = Inception(192, 64, 128, 32, 32)
X = torch.randn((16, 192, 28, 28))
Z = inception(X)
Z.shape  # 64 + 128 + 32 + 32 = 256 output channels
```

```
torch.Size([16, 256, 28, 28])
```

```python
inception = InceptionModule(192, 64, 128, 32, 32)
X = torch.randn((16, 192, 28, 28))
Z = inception(X)
Z.shape
```

```
torch.Size([16, 256, 28, 28])
```

### Depthwise Separable Convolution

Depthwise separable convolution is a depthwise convolution followed by a pointwise (1x1) convolution.

This type of convolution is widely used because it has:
* far fewer parameters than a standard convolution, which can reduce overfitting
* far fewer multiplications, which reduces computational cost
  
<img src="./rsc/normal_depthwise_convolution.png" width="600" height="800">

Well-known architectures built on depthwise separable convolutions include MobileNet and Xception.
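A minimal sketch of a depthwise separable convolution as a PyTorch module, compared with a standard 3x3 convolution of the same input and output shape. `DepthwiseSeparableConv` is a hypothetical name; MobileNet additionally places BatchNorm and ReLU after each of the two steps, which is omitted here to keep the parameter counts easy to read.

```python
import torch
from torch import nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise conv (one kxk kernel per channel) followed by a 1x1 pointwise conv."""
    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1, padding=1):
        super().__init__()
        # groups=in_ch: each input channel is filtered by its own kernel
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size, stride, padding,
                                   groups=in_ch, bias=False)
        # 1x1 conv mixes the channels and sets the output channel count
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

def n_params(module):
    return sum(p.numel() for p in module.parameters())

standard = nn.Conv2d(64, 128, kernel_size=3, padding=1, bias=False)
separable = DepthwiseSeparableConv(64, 128)

x = torch.randn(1, 64, 32, 32)
print(standard(x).shape, separable(x).shape)  # both torch.Size([1, 128, 32, 32])
print(n_params(standard))   # 73728 = 64 * 128 * 3 * 3
print(n_params(separable))  # 8768  = 64 * 3 * 3 + 64 * 128
```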

*further reading*:
* [Depth wise Separable Convolutional Neural Networks](https://www.geeksforgeeks.org/depth-wise-separable-convolutional-neural-networks/)

#### Operation cost

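For a standard convolution, the operation cost is the number of multiplications: each output position in each output channel needs one multiplication per filter weight, and each filter has $C_{in} \cdot K_h \cdot K_w$ weights.

$$\text{Number of multiplications} = C_{in} \cdot H \cdot W \cdot K_h \cdot K_w \cdot C_{out}$$

Here $C_{in}$ is the number of input channels, $H \times W$ is the spatial size of the output feature map (the same as the input's for stride 1 and same padding), $K_h \times K_w$ is the filter size, and $C_{out}$ is the number of output channels. The number of parameters is only $C_{in} \cdot K_h \cdot K_w \cdot C_{out}$, because the same filter weights are reused at every position.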
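Applying the same counting to the two steps of a depthwise separable convolution gives the comparison below. Notation: $C_{in}$ input channels, $H \times W$ output spatial size, $K_h \times K_w$ filter size, $C_{out}$ output channels. The depthwise step uses one $K_h \times K_w$ kernel per input channel; the pointwise step uses $C_{out}$ kernels of size $1 \times 1 \times C_{in}$.

$$
\begin{aligned}
\text{standard} &= C_{in} \cdot H \cdot W \cdot K_h \cdot K_w \cdot C_{out} \\
\text{depthwise} &= C_{in} \cdot H \cdot W \cdot K_h \cdot K_w \\
\text{pointwise} &= C_{in} \cdot H \cdot W \cdot C_{out} \\
\frac{\text{depthwise} + \text{pointwise}}{\text{standard}} &= \frac{1}{C_{out}} + \frac{1}{K_h K_w}
\end{aligned}
$$

For the 12x12x3 example with 5x5 filters, 256 output channels, and an 8x8 output:

$$
\begin{aligned}
\text{standard} &= 3 \cdot 8 \cdot 8 \cdot 5 \cdot 5 \cdot 256 = 1{,}228{,}800 \\
\text{depthwise} + \text{pointwise} &= 3 \cdot 8 \cdot 8 \cdot 5 \cdot 5 + 3 \cdot 8 \cdot 8 \cdot 256 = 4{,}800 + 49{,}152 = 53{,}952 \\
\frac{53{,}952}{1{,}228{,}800} &= \frac{1}{256} + \frac{1}{25} \approx 0.044
\end{aligned}
$$

So the separable version needs roughly 4.4% of the multiplications of the standard convolution in this case.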