# <font style="color:#0015FF">Advanced Convolutional Neural Networks</font> 

by: **Amr Abdelhamed**

**[Linkedin profile](https://www.linkedin.com/in/amrabdelhamed69/)
[Github repo](https://github.com/Amrabdelhamed611/ML_Implementation/tree/main/DL)**

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Residual-Block" data-toc-modified-id="Residual-Block-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Residual Block</a></span></li><li><span><a href="#Bottleneck-Residual-Block" data-toc-modified-id="Bottleneck-Residual-Block-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Bottleneck Residual Block</a></span></li><li><span><a href="#Linear-BottleNecks" data-toc-modified-id="Linear-BottleNecks-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Linear BottleNecks</a></span></li><li><span><a href="#Inverted-Residual-Block" data-toc-modified-id="Inverted-Residual-Block-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Inverted Residual Block</a></span></li><li><span><a href="#Depth-Wise-Separable-Convolution" data-toc-modified-id="Depth-Wise-Separable-Convolution-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Depth-Wise Separable Convolution</a></span></li><li><span><a href="#Squeeze-and-Excitation-Block" data-toc-modified-id="Squeeze-and-Excitation-Block-6"><span class="toc-item-num">6&nbsp;&nbsp;</span>Squeeze and Excitation Block</a></span></li><li><span><a href="#MBConv" data-toc-modified-id="MBConv-7"><span class="toc-item-num">7&nbsp;&nbsp;</span>MBConv</a></span></li><li><span><a href="#Fused-Inverted-Residual-(Fused-MBConv)-:" data-toc-modified-id="Fused-Inverted-Residual-(Fused-MBConv)-:-8"><span class="toc-item-num">8&nbsp;&nbsp;</span>Fused Inverted Residual (Fused MBConv) :</a></span></li><li><span><a href="#Conclusion" data-toc-modified-id="Conclusion-9"><span class="toc-item-num">9&nbsp;&nbsp;</span>Conclusion</a></span></li><li><span><a href="#Resources" data-toc-modified-id="Resources-10"><span class="toc-item-num">10&nbsp;&nbsp;</span>Resources</a></span></li></ul></div>

In [1]:
import numpy as np
import pandas as pd
import torch ,torchinfo
import torch.nn as nn
import torch.nn.functional as F
from functools import partial 
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

In [2]:
class ConvBnAct(nn.Module):
    """
    A block of three layers: convolution, batch normalization, and activation function.
    
    """
    def __init__(self,in_channels,
                 out_channels,
                 kernel_size= 3,
                 conv_kwargs={},
                 Bn = nn.BatchNorm2d,
                 actvtion = nn.SiLU,
                 actvtion_kwargs={}):
        """
        Constructs the three layers.
        
        Parameters
        ----------
            in_channels : int 
                block input channels.
            out_channels : int
                block output channels.
            kernel_size : int (default is 3)
                kernel size for the conv layer.
            conv_kwargs : dictionary (default is an empty dictionary)
                custom keyword arguments passed to nn.Conv2d.
            Bn : nn.Module class (default is nn.BatchNorm2d)
                batch normalization layer to add.
            actvtion : nn.Module class (default is nn.SiLU)
                activation function.
            actvtion_kwargs : dictionary (default is an empty dictionary)
                custom keyword arguments passed to the activation function.
        """
        
        super(ConvBnAct, self).__init__()
    
        self.Conv_BN_Act = nn.Sequential(
            nn.Conv2d(in_channels, out_channels,bias=False,kernel_size= kernel_size,**conv_kwargs),
            Bn(out_channels),
            actvtion(**actvtion_kwargs))
            # bias set to  false as we use BN
    def forward(self, x):
        return self.Conv_BN_Act(x)
#--------------------------------------------------------------------
#C_kwargs={'stride':1,'kernel_size':1,'padding' :0,'groups': 1,'padding_mode':'same'}
#A_kwargs={'inplace':True}
Conv1X1BnAct = partial(ConvBnAct,kernel_size= 1)
Conv3X3BnAct  = partial(ConvBnAct,kernel_size= 3)

x = torch.randn((1, 16, 6, 6))
Conv3X3BnAct(16, 16)(x).shape

torch.Size([1, 16, 4, 4])
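
The spatial size drops from 6 to 4 because the convolution above uses no padding. As a quick check, the usual output-size formula can be sketched in plain Python. The helper name `conv_out_size` is an illustrative assumption and not part of this notebook.

```python
def conv_out_size(size, kernel_size, stride=1, padding=0, dilation=1):
    """Output size of a convolution along one spatial axis (PyTorch convention)."""
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

print(conv_out_size(6, kernel_size=3))             # 4: no padding, as in the cell above
print(conv_out_size(6, kernel_size=3, padding=1))  # 6: padding of 1 keeps the size
print(conv_out_size(6, kernel_size=1))             # 6: a 1x1 conv never shrinks the map
```

This is why the later blocks pass `'padding': 'same'` to the `3x3` convolutions: the skip connection needs the spatial size to stay unchanged.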

## Residual Block
A residual block uses a direct connection (called a skip connection) to bypass one or more layers, so the input $x$ is passed deeper into the network without any transformation applied to it.

Without a skip connection, the output of a layer is $H(x) = F(wx+b)$, where $F$ is an activation function.
The skip connection is an element-wise addition of the Residual Block input $x$ and the output $H(x)$.
 
Before adding, we must check the shapes of the residual $x$ and the output $H(x)$:
* If the shapes are equal, we carry out the addition directly.
* If they are not equal, we can use a conv layer on the shortcut to reduce the shape of the residual $x$, or pad $H(x)$ to increase its shape.
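
The second case can be sketched with plain `nn.Conv2d` layers. This is a minimal illustration under assumed sizes (8 input channels, 16 output channels, stride 2), not the notebook's `ResBlock` class: a `1x1` convolution on the shortcut projects $x$ to the shape of $H(x)$ so the addition is valid.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 8, 6, 6)

# Main path: changes both the channel count (8 -> 16) and the spatial size (6 -> 3).
body = nn.Conv2d(8, 16, kernel_size=3, stride=2, padding=1)
# Shortcut path: a 1x1 conv with the same stride projects x to the same shape.
projection = nn.Conv2d(8, 16, kernel_size=1, stride=2)

h = body(x)
shortcut = projection(x)
out = h + shortcut  # element-wise addition is now well defined

print(tuple(h.shape), tuple(shortcut.shape), tuple(out.shape))
```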

<img src="./attachments/cnn_layers_image_1.jpg" alt="Residual Block" style="width:300px;height:300px;">

In [3]:
class ResBlock(nn.Module):
    """
    A class to implement a residual block by adding skip for given module.
    
    """
    def __init__(self, block,shortcutblock = None):
        """
        Constructs the residual block.
        
        Parameters
        ----------
            block : nn.Module 
                block to wrap with a skip connection from its input to its output.
            shortcutblock : nn.Module
                a layer (conv, pool) applied on the shortcut to match the shape of the residuals to the outputs.

        """
        super(ResBlock, self).__init__()
        self.block = block
        self.shortcut= shortcutblock
    def forward(self, x):
        xres=x
        x = self.block(x)
        if x.shape ==xres.shape:
            x = x+xres
        else:
            assert isinstance(self.shortcut,nn.Module),'shortcutblock must be an nn.Module (e.g. a conv layer) that matches the shape of the residuals to the outputs'
            x= self.shortcut(xres)+x
        return x
    
x = torch.rand((1,1,3,3))
nnconv = nn.Conv2d(in_channels=1,out_channels=1,kernel_size= 1,stride= 1)
resconv= ResBlock(nnconv)

print(f'input shape: {x.shape}')
print(f'output shape of conv block: {nnconv(x).shape}')
print(f'output shape of res block: {resconv(x).shape}')

input shape: torch.Size([1, 1, 3, 3])
output shape of conv block: torch.Size([1, 1, 3, 3])
output shape of res block: torch.Size([1, 1, 3, 3])


##  Bottleneck Residual Block
A bottleneck layer has fewer neurons than the layers before and after it. Such a layer encourages the network to compress feature representations to best fit in the available space.

A Bottleneck Residual Block takes an input of shape `HxWxC`, reduces it to `HxWxC/r` using a `1x1` conv, applies a `3x3` conv, and finally scales the output back up to the input's feature dimension `HxWxC` using another `1x1` conv. A skip connection then adds the block input to the bottleneck output.

In other words, a Bottleneck Residual Block combines a Bottleneck Block and a Residual Block by adding a skip connection around the bottleneck block.
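
To see why the reduction pays off, here is a rough weight count in plain Python. The channel width `C = 256` and ratio `r = 4` are assumed values for illustration, and biases and batch-norm parameters are ignored.

```python
def conv_weights(in_channels, out_channels, kernel_size):
    """Number of weights in a bias-free 2D convolution."""
    return in_channels * out_channels * kernel_size * kernel_size

C, r = 256, 4      # assumed channel width and reduction ratio
reduced = C // r

# A single 3x3 conv at full width.
plain = conv_weights(C, C, 3)
# Bottleneck: 1x1 reduce -> 3x3 at reduced width -> 1x1 restore.
bottleneck = (conv_weights(C, reduced, 1)
              + conv_weights(reduced, reduced, 3)
              + conv_weights(reduced, C, 1))

print(plain)       # 589824
print(bottleneck)  # 69632
```

Under these assumptions the three-layer bottleneck needs roughly 8x fewer weights than one full-width `3x3` convolution.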

<img src="./attachments/cnn_layers_image_2.jpg" alt="Residual Block" style="width:600px;height:300px;">

In [4]:
class BottleneckBlock(nn.Module):
    """
    A class to implement a Bottleneck Block.
    
    """
    def __init__(self, in_channels, out_channels, compersion_ratio = 4):
        """
        Constructs the three layers.
        
        Parameters
        ----------
            in_channels : int 
                block input channels.
            out_channels : int
                block output channels.
            compersion_ratio : int (default is 4)
                feature compression ratio.
        """
        super(BottleneckBlock, self).__init__()
        reduced_channels = out_channels// compersion_ratio
        C_kwargs={'stride':1,'padding' :'same'}
        A_kwargs={'inplace':True}
        block = nn.Sequential(
                        Conv1X1BnAct(in_channels, reduced_channels,
                                     conv_kwargs=C_kwargs,actvtion_kwargs=A_kwargs),
                        Conv3X3BnAct(reduced_channels, reduced_channels,
                                     conv_kwargs=C_kwargs,actvtion_kwargs=A_kwargs),
                        Conv1X1BnAct(reduced_channels, out_channels ,
                                     conv_kwargs=C_kwargs,actvtion_kwargs=A_kwargs))
        shortcut = Conv1X1BnAct(in_channels, out_channels,
                                conv_kwargs=C_kwargs,actvtion_kwargs=A_kwargs)  if in_channels != out_channels else None
        self.BottleNeck = ResBlock(block ,shortcut)
        
    def forward(self, x):
        x= self.BottleNeck(x)
        return x
    
x = torch.randn((1, 5, 4,4))  
BBn =BottleneckBlock(5,4)    
print(f'input shape: {x.shape}')
print(f'output shape: {BBn(x).shape}')
    

input shape: torch.Size([1, 5, 4, 4])
output shape: torch.Size([1, 4, 4, 4])


## Linear BottleNecks
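Linear BottleNecks were introduced in MobileNetV2. A Linear BottleNeck Block is a BottleNeck Block without the last activation. The authors argue that a non-linearity such as ReLU loses information when applied to a low-dimensional representation (the dying ReLU problem), which hurts performance, so the final projection is kept linear.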
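
A tiny illustration of that information loss, using ReLU on a hand-picked tensor (the values are arbitrary; MobileNetV2 uses ReLU6, and the blocks in this notebook default to SiLU): distinct negative inputs collapse to the same output, while a linear (identity) output keeps them apart.

```python
import torch
import torch.nn as nn

x = torch.tensor([-2.0, -1.0, 0.5, 2.0])

relu_out = nn.ReLU()(x)        # every negative input maps to 0
linear_out = nn.Identity()(x)  # a linear output keeps every value distinct

print(relu_out.tolist())    # [0.0, 0.0, 0.5, 2.0]
print(linear_out.tolist())  # [-2.0, -1.0, 0.5, 2.0]
```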

In [5]:
class LinearBottleneckBlock(nn.Module):
    """
    A class to implement a Linear Bottleneck Block.
    
    """
    def __init__(self, in_channels, out_channels, compersion_ratio = 4):
        """
        Constructs the three layers.
        
        Parameters
        ----------
            in_channels : int 
                block input channels.
            out_channels : int
                block output channels.
            compersion_ratio : int (default is 4)
                feature compression ratio.
        """
        super(LinearBottleneckBlock, self).__init__()
        reduced_channels = out_channels// compersion_ratio
        C_kwargs={'stride':1,'padding' :'same'}
        A_kwargs={'inplace':True}
        
        block = nn.Sequential(
                        Conv1X1BnAct(in_channels, reduced_channels,
                                     conv_kwargs=C_kwargs,actvtion_kwargs=A_kwargs),
                        Conv3X3BnAct(reduced_channels, reduced_channels,
                                     conv_kwargs=C_kwargs,actvtion_kwargs=A_kwargs),
                        Conv1X1BnAct(reduced_channels, out_channels, actvtion=nn.Identity ,
                                     conv_kwargs=C_kwargs,actvtion_kwargs=A_kwargs ))
        shortcut = Conv1X1BnAct(in_channels, out_channels,
                                    conv_kwargs=C_kwargs,actvtion_kwargs=A_kwargs)  if in_channels != out_channels else None
        self.BottleNeck = ResBlock(block ,shortcut)
        
    def forward(self, x):
        
        x= self.BottleNeck(x)
        return x
    
x = torch.randn((1, 4, 4,4))  
BBn =LinearBottleneckBlock(4,4)    
print(f'input shape: {x.shape}')
print(f'output shape: {BBn(x).shape}')
    

input shape: torch.Size([1, 4, 4, 4])
output shape: torch.Size([1, 4, 4, 4])


## Inverted Residual Block
Think of an inverted residual as a Bottleneck Residual Block that expands the feature maps instead of reducing them.

An Inverted Residual Block takes an input of shape `HxWxC`, expands it to `HxWxC*e` using a `1x1` conv, applies a `3x3` conv, and finally projects the output back down to the input's feature dimension `HxWxC` using another `1x1` conv. A skip connection then adds the block input to the block output.
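
The contrast with the bottleneck block is easiest to see by listing the channel width after each stage. The two helper functions below are hypothetical names used only for this sketch, with an assumed width of 64 channels and a ratio of 4.

```python
def bottleneck_widths(channels, reduction=4):
    """Channel widths: input -> 1x1 reduce -> 3x3 -> 1x1 restore."""
    reduced = channels // reduction
    return [channels, reduced, reduced, channels]

def inverted_widths(channels, expansion=4):
    """Channel widths: input -> 1x1 expand -> 3x3 -> 1x1 project."""
    expanded = channels * expansion
    return [channels, expanded, expanded, channels]

print(bottleneck_widths(64))  # [64, 16, 16, 64]   wide -> narrow -> wide
print(inverted_widths(64))    # [64, 256, 256, 64] narrow -> wide -> narrow
```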

<img src="./attachments/cnn_layers_image_3.jpg" alt="Residual Block" style="width:600px;height:300px;">

In [6]:
class InvertedResidualBlock(nn.Module):
    """
    A class to implement an Inverted Residual Block.
    
    """
    def __init__(self, in_channels, out_channels, expansion_ratio = 4):
        """
        Constructs the three layers.
        
        Parameters
        ----------
            in_channels : int 
                block input channels.
            out_channels : int
                block output channels.
            expansion_ratio : int (default is 4)
                feature expansion ratio.
        """
        super(InvertedResidualBlock, self).__init__()
        expanded_channels = int(in_channels*expansion_ratio)
        C_kwargs={'stride':1,'padding' :'same'}
        A_kwargs={'inplace':True}
        block = nn.Sequential(
                        Conv1X1BnAct(in_channels, expanded_channels,
                                     conv_kwargs=C_kwargs,actvtion_kwargs=A_kwargs),
                        Conv3X3BnAct(expanded_channels, expanded_channels,
                                     conv_kwargs=C_kwargs,actvtion_kwargs=A_kwargs),
                        Conv1X1BnAct(expanded_channels, out_channels,
                                     conv_kwargs=C_kwargs,actvtion_kwargs=A_kwargs))
        shortcut = Conv1X1BnAct(in_channels, out_channels,
                                conv_kwargs=C_kwargs,actvtion_kwargs=A_kwargs)  if in_channels != out_channels else None
        self.Block = ResBlock(block ,shortcut)
        
    def forward(self, x):
        x= self.Block(x)
        return x
    
x = torch.randn((1, 4, 3,3))  

IR =InvertedResidualBlock(4,5)    

print(f'input shape: {x.shape}')
print(f'output shape: {IR(x).shape}')    

input shape: torch.Size([1, 4, 3, 3])
output shape: torch.Size([1, 5, 3, 3])


## Depth-Wise Separable Convolution

The idea is that the spatial dimensions `HxW` of a filter can be separated from its depth `C`: the filter is applied only along the spatial dimensions of each channel, so the output has the same depth as the input.

For an input of shape `HxWxC`, a Depth-Wise Separable Convolution applies a single `3x3` filter to each input channel (`C` filters in total), then `N` filters of shape `1x1xC` across all channels, so the output has shape `HxWxN`. The spatial dimensions `HxW` may still change depending on the stride and padding used.

<img src="./attachments/cnn_layers_image_4.jpg" alt="Residual Block" style="width:600px;height:300px;">

Here, the `3x3` filter is called depth-wise and the `1x1` filter is called point-wise.
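
In PyTorch, the depth-wise step is expressed by setting `groups` equal to the number of input channels. The sketch below uses assumed sizes (`C = 8`, `N = 16`) and plain `nn.Conv2d` layers, and prints the weight shapes to show that each depth-wise filter sees only one channel.

```python
import torch
import torch.nn as nn

C, N = 8, 16  # assumed input and output channel counts

# Depth-wise: one 3x3 filter per input channel (groups=C).
depthwise = nn.Conv2d(C, C, kernel_size=3, padding=1, groups=C, bias=False)
# Point-wise: N filters of shape 1x1xC that mix the channels.
pointwise = nn.Conv2d(C, N, kernel_size=1, bias=False)

print(tuple(depthwise.weight.shape))  # (8, 1, 3, 3)
print(tuple(pointwise.weight.shape))  # (16, 8, 1, 1)

x = torch.randn(1, C, 5, 5)
y = pointwise(depthwise(x))
print(tuple(y.shape))                 # (1, 16, 5, 5)
```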

In [7]:
class DWSConv(nn.Module):
    """
    A class to implement a  Depth-Wise Separable Convolution.
    
    """
    def __init__(self, in_channels, out_channels):
        """
        Constructs the depth-wise and point-wise layers.
        
        Parameters
        ----------
            in_channels : int 
                block input channels.
            out_channels : int
                block output channels.
        """
        super(DWSConv, self).__init__()
        C1_kwargs={'stride':1,'padding' :'same','groups': in_channels}
        C2_kwargs={'stride':1,'padding' :'same'}
        A_kwargs={'inplace':True}
        self.Block = nn.Sequential(
                        Conv3X3BnAct(in_channels,in_channels,
                                     conv_kwargs=C1_kwargs,actvtion_kwargs=A_kwargs),
                        Conv1X1BnAct(in_channels, out_channels,
                                     conv_kwargs=C2_kwargs,actvtion_kwargs=A_kwargs))
    
    def forward(self, x):
        x= self.Block(x)
        return x
    
x = torch.randn((1, 3, 5,5))  
DWS =DWSConv(3,3)   
tconv =nn.Conv2d(3,3,kernel_size=3,stride=1,padding=1)
#print(x.view(3,5,5))
#print(DWS(x))

print(f'''shape of Depth-Wise Separable Convolution output: {list(DWS(x).shape)} 
shape of traditional Convolution output: {list(tconv (x).shape)} 
parameter count of Depth-Wise Separable Convolution : {sum(p.numel() for p in DWS.parameters() if p.requires_grad)} 
parameter count of traditional Convolution :  {sum(p.numel() for p in tconv.parameters() if p.requires_grad)} 
parameter count of Depth-Wise Separable Convolution with more channels : {sum(p.numel() for p in DWSConv(32,64).parameters() if p.requires_grad)}
parameter count of traditional Convolution with more channels :  {sum(p.numel() for p in nn.Conv2d(32, 64, kernel_size=3).parameters() if p.requires_grad)}
''')

shape of Depth-Wise Separable Convolution output: [1, 3, 5, 5] 
shape of traditional Convolution output: [1, 3, 5, 5] 
parameter count of Depth-Wise Separable Convolution : 48 
parameter count of traditional Convolution :  84 
parameter count of Depth-Wise Separable Convolution with more channels : 2528
parameter count of traditional Convolution with more channels :  18496



There is a clear difference in parameter count between traditional convolutions and Depth-Wise Separable Convolutions, especially as the number of channels grows, which makes Depth-Wise Separable Convolutions more efficient.
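
The gap follows directly from the weight counts: a standard `k x k` convolution needs `k*k*C_in*C_out` weights, while the separable version needs `k*k*C_in + C_in*C_out`. The sketch below counts convolution weights only, so the numbers are slightly lower than the cell output above, which also includes biases and batch-norm parameters. The function names are illustrative assumptions.

```python
def standard_conv_weights(c_in, c_out, k=3):
    """Weights of a bias-free k x k convolution."""
    return k * k * c_in * c_out

def separable_conv_weights(c_in, c_out, k=3):
    """Weights of a k x k depth-wise conv plus a 1x1 point-wise conv (bias-free)."""
    return k * k * c_in + c_in * c_out

for c_in, c_out in [(3, 3), (32, 64), (256, 256)]:
    std = standard_conv_weights(c_in, c_out)
    sep = separable_conv_weights(c_in, c_out)
    # The ratio equals 1/c_out + 1/k**2, so it shrinks as c_out grows.
    print(c_in, c_out, std, sep, round(sep / std, 3))
```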

## Squeeze and Excitation Block
Squeeze and Excitation lets the network model dependencies between channels while giving it access to global information (information from the entire spatial extent of each feature map, not only a local receptive field).
 
**Squeeze :**
Used to extract the global information from each channel of the feature map `BxCxHxW` (where `B` is the batch size).

Convolution is a local operation on parts of the feature map, so it is beneficial to get a global understanding of the feature map by using global pooling to reduce the spatial dimensions from `BxCxHxW` to `BxCx1x1`, with either max pooling or average pooling (the paper's authors found that average pooling gives a lower error).

**Excitation :**
With the feature map reduced to `BxCx1x1`, the excitation operation uses fully connected layers with a bottleneck structure to adaptively generate a weight for each channel of the feature map.
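
The two steps can be traced shape by shape. This is a minimal sketch under assumed sizes (`B=2`, `C=16`, `H=W=5`, ratio `r=4`), using `nn.Linear` layers and ReLU for the excitation; the notebook's own `SE` class expresses the same idea with `1x1` convolutions and SiLU.

```python
import torch
import torch.nn as nn

B, C, H, W = 2, 16, 5, 5   # assumed sizes
r = 4                      # assumed reduction ratio
x = torch.randn(B, C, H, W)

# Squeeze: global average pooling over H and W gives one value per channel.
s = x.mean(dim=(2, 3))                 # shape (B, C)

# Excitation: bottleneck MLP followed by a sigmoid gate.
excite = nn.Sequential(
    nn.Linear(C, C // r),
    nn.ReLU(),
    nn.Linear(C // r, C),
    nn.Sigmoid(),
)
w = excite(s)                          # shape (B, C), values between 0 and 1

# Scale: broadcast the per-channel weights over the spatial dimensions.
out = x * w.view(B, C, 1, 1)           # shape (B, C, H, W)

print(tuple(s.shape), tuple(w.shape), tuple(out.shape))
```

Each sample gets its own set of channel weights, computed from that sample's pooled features.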

<img src="./attachments/cnn_layers_image_5.jpg" alt="Residual Block" style="width:800px;height:200px;">

In [8]:
class SE(nn.Module):
    """
    A class to implement a Squeeze and Excitation Block.
    
    """
    def __init__(self, channels, squeeze_ratio=8):
        """
        Constructs the squeeze and excitation layers.
        
        Parameters
        ----------
            channels : int 
                block input channels (the output has the same number of channels).
            squeeze_ratio : int (default is 8)
                feature reduction ratio.
        """
        super(SE, self).__init__()
        squeezed_channels = channels//squeeze_ratio
        self.SEBlock =  nn.Sequential(
                                nn.AdaptiveAvgPool2d(1),
                                nn.Conv2d(channels,squeezed_channels,kernel_size=1),
                                nn.SiLU(inplace=True),
                                nn.Conv2d(squeezed_channels, channels,kernel_size =1),
                                nn.Sigmoid())
    
    def forward(self, x):
        return x * self.SEBlock(x)
    
x = torch.randn((1, 3, 5,5))  
se=SE(3,3)   

print(f'input shape: {x.shape}')
print(f'output shape: {se(x).shape}')   

input shape: torch.Size([1, 3, 5, 5])
output shape: torch.Size([1, 3, 5, 5])


## MBConv 
MBConv is a key component of EfficientNet.
It differs slightly from the Inverted Residual Block:
* normalization is applied to both the depth-wise and point-wise convolutions.
* non-linearity is applied only in the depth-wise convolution.
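
One practical detail, stated here as an assumption about common MobileNetV2 and EfficientNet style implementations rather than something this notebook enforces: the skip connection is usually applied only when the block keeps the tensor shape, that is, when the stride is 1 and the input and output channel counts match. The helper name `use_skip` is hypothetical.

```python
def use_skip(in_channels, out_channels, stride=1):
    """Return True when input and output shapes match, so x can be added directly."""
    return stride == 1 and in_channels == out_channels

print(use_skip(4, 4))            # True
print(use_skip(4, 8))            # False: the channel count changes
print(use_skip(4, 4, stride=2))  # False: the spatial size changes
```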

<img src="./attachments/cnn_layers_image_6.jpg" alt="Residual Block" style="width:600px;height:400px;">

In [9]:
class MBConv(nn.Module):
    """
    A class to implement an MBConv block: an Inverted Residual Block with a depth-wise convolution and Squeeze and Excitation.
    
    """
    def __init__(self, in_channels, out_channels, expansion_ratio = 4,squeeze_ratio=4):
        """
        Constructs the block layers.
        
        Parameters
        ----------
            in_channels : int 
                block input channels.
            out_channels : int
                block output channels.
            expansion_ratio : int (default is 4)
                feature expansion ratio.
            squeeze_ratio : int (default is 4)
                feature reduction ratio of the SE block.
        """
        super(MBConv, self).__init__()
        expanded_channels = int(in_channels*expansion_ratio)
        C_kwargs={'stride':1,'padding' :'same'}
        C1_kwargs={'stride':1,'padding' :'same','groups': expanded_channels}
        A_kwargs={'inplace':True}
        block = nn.Sequential(
                        Conv1X1BnAct(in_channels, expanded_channels,
                                     conv_kwargs=C_kwargs,actvtion_kwargs=A_kwargs),
                        Conv3X3BnAct(expanded_channels, expanded_channels,
                                     conv_kwargs=C1_kwargs,actvtion_kwargs=A_kwargs),
                        SE(expanded_channels,squeeze_ratio),
                        Conv1X1BnAct(expanded_channels, out_channels,actvtion=nn.Identity,
                                     conv_kwargs=C_kwargs,actvtion_kwargs=A_kwargs))
       
        self.Block = nn.Sequential( ResBlock(block) ,nn.SiLU(inplace=True))
        
    def forward(self, x):
        x= self.Block(x)
        return x
    
x = torch.randn((1, 4, 3,3))  

mbconv =MBConv(4,4)    
#print(x.view(4,3,3))
#print(mbconv(x).view(4,3,3))

print(f'input shape: {x.shape}')
print(f'output shape: {mbconv(x).shape}')   

input shape: torch.Size([1, 4, 3, 3])
output shape: torch.Size([1, 4, 3, 3])


## Fused Inverted Residual (Fused MBConv) :

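Fused Inverted Residuals (Fused MBConv) are used in EfficientNetV2 (*EfficientNetV2: Smaller Models and Faster Training*) to make MBConv faster. The block replaces the `1x1` expansion conv and the `3x3` depth-wise conv of MBConv with a single regular `3x3` conv.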
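
Fusing trades parameters for speed. The rough count below (bias-free weights only, with an assumed input width of 24 channels and expansion ratio 4) compares the first part of each block. The function names are illustrative assumptions.

```python
def mbconv_front_weights(c_in, expansion=4, k=3):
    """1x1 expansion conv followed by a k x k depth-wise conv (bias-free)."""
    c_exp = c_in * expansion
    return c_in * c_exp + k * k * c_exp

def fused_front_weights(c_in, expansion=4, k=3):
    """Single regular k x k conv that expands the channels directly (bias-free)."""
    c_exp = c_in * expansion
    return k * k * c_in * c_exp

print(mbconv_front_weights(24))  # 3168
print(fused_front_weights(24))   # 20736
```

The fused variant carries more weights, but a single dense convolution tends to map better onto modern accelerators than a depth-wise one, which is the motivation the EfficientNetV2 paper gives for using it in the early stages.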


In [10]:
class FusedMBConv(nn.Module):
    """
    A class to implement the Fused MBConv (Fused Inverted Residual) block used in EfficientNetV2.
    
    """
    def __init__(self, in_channels, out_channels, expansion_ratio = 4,squeeze_ratio=4):
        """
        Constructs the block layers.
        
        Parameters
        ----------
            in_channels : int 
                block input channels.
            out_channels : int
                block output channels.
            expansion_ratio : int (default is 4)
                feature expansion ratio.
            squeeze_ratio : int (default is 4)
                feature reduction ratio of the SE block.
        """
        super(FusedMBConv, self).__init__()
        expanded_channels = int(in_channels*expansion_ratio)
        C_kwargs={'stride':1,'padding' :'same'}
        A_kwargs={'inplace':True}
        block = nn.Sequential(
                        Conv3X3BnAct(in_channels, expanded_channels,
                                     conv_kwargs=C_kwargs,actvtion_kwargs=A_kwargs),
                        SE(expanded_channels,squeeze_ratio),
                        Conv1X1BnAct(expanded_channels, out_channels,actvtion=nn.Identity,
                                     conv_kwargs=C_kwargs,actvtion_kwargs=A_kwargs),)
       
        self.Block = nn.Sequential( ResBlock(block) ,nn.SiLU(inplace=True))
        
    def forward(self, x):
        x= self.Block(x)
        return x
    
x = torch.randn((1, 4, 3,3))  

fusedmbconv =FusedMBConv(4,4)    
#print(x.view(4,3,3))
#print(fusedmbconv(x).view(4,3,3))
print(f'input shape: {x.shape}')
print(f'output shape: {fusedmbconv(x).shape}')   

input shape: torch.Size([1, 4, 3, 3])
output shape: torch.Size([1, 4, 3, 3])


## Conclusion
We now have the building blocks of many **convolutional neural network architectures** such as `Res-Nets`, `SE-Net`, `Mobile-Nets`, and `Efficient-Nets`, so we can read, understand, and implement the various papers that use the blocks we built.

## Resources
* [Brief introduction of mobilenetv1 / V2 / V3 lightweight network ](https://developpaper.com/brief-introduction-of-mobilenetv1-v2-v3-lightweight-network/)
* [Inverted Residual Block paper](https://paperswithcode.com/method/inverted-residual-block)
* [Squeeze and Excitation Networks Explained ](https://amaarora.github.io/2020/07/24/SeNet.html)
* [Squeeze and Excitation Networks By Nikhil Tomar](https://medium.com/analytics-vidhya/squeeze-and-excitation-networks-idiot-developer-17de2fd02596?utm_source=pocket_mylist)
* [EfficientNetV2 paper ](https://paperswithcode.com/method/inverted-residual-block)
