<b> Quick Tip </b>

To apply a depthwise convolution, we only need one more parameter of the Conv2d function.
The <b> groups </b> parameter controls how the kernels are applied. Its default value is 1, 
so until now we did not set anything explicitly: all the kernels in a filter were applied together, since they 
form a single group. If we set groups = 2, the input channels (and the kernels in a filter) would be split into 2 groups, and each group would be applied separately. 
Both the number of input channels and the number of output channels must be divisible by groups.
Since a depthwise convolution applies every kernel on its own (so each kernel forms a group by itself),
we need to set the groups parameter to the number of input channels. 

For example, an input image with 3 channels needs a filter with 3 kernels. We set groups = 3
to divide these 3 kernels into 3 groups, which gives 1 kernel per group, so each kernel is applied individually and produces one feature map.
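
As a rough illustration of the grouping rule described above, the sketch below computes how many kernels each filter holds for a given `groups` value. The helper `conv2d_weight_shape` is a hypothetical name introduced here. It does not call PyTorch; it only mirrors the weight layout that PyTorch documents for `nn.Conv2d`, namely `(out_channels, in_channels // groups, kernel_height, kernel_width)`.

```python
def conv2d_weight_shape(in_channels, out_channels, kernel_size, groups=1):
    """Hypothetical helper: weight shape of a 2D convolution with square kernels.

    Mirrors the layout documented for nn.Conv2d:
    (out_channels, in_channels // groups, kernel_size, kernel_size).
    """
    if in_channels % groups != 0 or out_channels % groups != 0:
        raise ValueError("in_channels and out_channels must be divisible by groups")
    return (out_channels, in_channels // groups, kernel_size, kernel_size)


# groups=1 (default): every filter holds one kernel per input channel.
standard_shape = conv2d_weight_shape(3, 32, 3, groups=1)

# groups=3 with 3 input channels: every filter holds a single kernel,
# so each input channel is convolved on its own (depthwise convolution).
depthwise_shape = conv2d_weight_shape(3, 3, 3, groups=3)

print(standard_shape)   # (32, 3, 3, 3)
print(depthwise_shape)  # (3, 1, 3, 3)
```

The second dimension of the shape is the number of kernels per filter: 3 for the standard convolution, 1 for the depthwise one.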

In [1]:
from torch import nn
import torch
import torchvision.transforms as transforms
from torchsummary import summary
import cv2

""" standard convolution """

conv = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3)
params = sum(p.numel() for p in conv.parameters() if p.requires_grad)

""" depthwise separable convolution"""

depth_conv = nn.Conv2d(in_channels=3, out_channels=3, kernel_size=3, groups=3) # depthwise
point_conv = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=1) # pointwise

depthwise_separable_conv = nn.Sequential(depth_conv, point_conv)
params_depthwise = sum(p.numel() for p in depthwise_separable_conv.parameters() if p.requires_grad)


print(f"The standard convolution uses {params} parameters.")
print(f"The depthwise separable convolution uses {params_depthwise} parameters.")




The standard convolution uses 896 parameters.
The depthwise separable convolution uses 158 parameters.
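
The two numbers printed above can be reproduced by hand. A convolution with bias has `kernel_size * kernel_size * (in_channels / groups) * out_channels` weights plus `out_channels` biases. The sketch below applies that formula to the three layers used in the cell above; the function name `conv2d_param_count` is an assumption made for illustration.

```python
def conv2d_param_count(in_channels, out_channels, kernel_size, groups=1, bias=True):
    """Hypothetical helper: trainable parameters of a 2D convolution with square kernels."""
    weights = kernel_size * kernel_size * (in_channels // groups) * out_channels
    return weights + (out_channels if bias else 0)


standard = conv2d_param_count(3, 32, 3)             # 3*3*3*32 + 32 = 896
depthwise = conv2d_param_count(3, 3, 3, groups=3)   # 3*3*1*3 + 3   = 30
pointwise = conv2d_param_count(3, 32, 1)            # 1*1*3*32 + 32 = 128
separable = depthwise + pointwise                   # 30 + 128      = 158

print(standard, separable)  # 896 158
```

For this layer the separable version needs roughly 5.7 times fewer parameters (896 / 158).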


Apply depthwise separable convolution to the input image

In [2]:
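img = cv2.imread("data_flowers/daisy/100080576_f52e8ee070_n.jpg")  # OpenCV loads images in BGR channel order
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # convert to RGB before handing the array to ToPILImage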

transform = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize((224,224)),
    transforms.ToTensor()
])  

img = transform(img)
img = img.unsqueeze(0)

out = depthwise_separable_conv(img)
print(out.shape)

torch.Size([1, 32, 222, 222])
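
The 222 x 222 spatial size follows from the usual output-size formula for a convolution without dilation, `floor((size + 2 * padding - kernel_size) / stride) + 1`. The snippet below is a small sketch of that arithmetic; `conv_output_size` is a hypothetical helper, not a PyTorch function.

```python
def conv_output_size(size, kernel_size, stride=1, padding=0):
    """Hypothetical helper: output length along one spatial axis (no dilation)."""
    return (size + 2 * padding - kernel_size) // stride + 1


# 3x3 depthwise convolution, no padding, stride 1: 224 -> 222
after_depthwise = conv_output_size(224, kernel_size=3)

# 1x1 pointwise convolution leaves the spatial size unchanged: 222 -> 222
after_pointwise = conv_output_size(after_depthwise, kernel_size=1)

print(after_depthwise, after_pointwise)  # 222 222
```

With `padding=1` on the 3x3 depthwise layer, the same formula gives 224, so the output would keep the input resolution.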


After learning how to create a depthwise separable convolution in PyTorch, obtaining the MobileNetV1 architecture 
is nothing more than stacking the necessary convolutions sequentially.

Note that in the following table:

<b> dw </b> means depthwise convolution, <br>
<b> s1 </b> means stride = 1 <br>
<b> s2 </b> means stride = 2 <br>


![Screenshot%20from%202023-02-06%2016-18-59.png](attachment:Screenshot%20from%202023-02-06%2016-18-59.png)


Note that MobileNetV1 applies Batch Normalization and a ReLU activation after both the depthwise and the pointwise convolutions.

![Screenshot%20from%202023-02-06%2018-51-05.png](attachment:Screenshot%20from%202023-02-06%2018-51-05.png)
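
A minimal sketch of one such block, with Batch Normalization and ReLU after both convolutions, could look like the following. The function name `depthwise_separable_block` and the choice of `bias=False` (a convolution bias is redundant when a BatchNorm layer follows directly) are assumptions made for this illustration, not part of the original notebook.

```python
import torch
from torch import nn


def depthwise_separable_block(in_channels, out_channels, stride=1):
    """Hypothetical helper: depthwise conv -> BN -> ReLU -> pointwise conv -> BN -> ReLU."""
    return nn.Sequential(
        # depthwise: one 3x3 kernel per input channel (groups == in_channels)
        nn.Conv2d(in_channels, in_channels, kernel_size=3, stride=stride,
                  padding=1, groups=in_channels, bias=False),
        nn.BatchNorm2d(in_channels),
        nn.ReLU(),
        # pointwise: 1x1 convolution that mixes channels
        nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False),
        nn.BatchNorm2d(out_channels),
        nn.ReLU(),
    )


block = depthwise_separable_block(32, 64, stride=2)
block_out = block(torch.randn(1, 32, 112, 112))
print(block_out.shape)  # torch.Size([1, 64, 56, 56])
```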

<b> Quick Tip </b> <br>
The code below implements the first 5 convolutional layers of MobileNetV1 with a width multiplier.
Note that it applies Batch Normalization and ReLU only after each pointwise convolution, not after the depthwise ones.
The channel arguments passed to any PyTorch layer constructor must be integers. Therefore 
we use int() conversion to avoid the following error: <u> "empty() received an invalid combination of arguments - got (float, dtype=NoneType, device=NoneType), but expected one of:" </u>
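
To see why the conversion is needed: multiplying an integer channel count by a float width multiplier always yields a float in Python, even when the value is whole. The sketch below uses a hypothetical helper `scaled_channels` to show the conversion. Note that `int()` truncates toward zero rather than rounding.

```python
def scaled_channels(base_channels, wm):
    """Hypothetical helper: channel count after applying the width multiplier wm."""
    return int(base_channels * wm)


print(type(32 * 0.5))             # <class 'float'>, even though the value is 16.0
print(scaled_channels(32, 0.5))   # 16
print(scaled_channels(32, 0.75))  # 24
print(scaled_channels(30, 0.25))  # 7 (7.5 is truncated, not rounded)
```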


In [3]:
class MobilenetV1(nn.Module):
     
    def __init__(self, wm=1.0):
        """
        Inputs:
            wm = width multiplier
        """
        super().__init__()

        self.net = nn.Sequential(
            nn.Conv2d(3, int(32*wm), kernel_size=3, padding=1, stride=2), # standard convolution, stride 2 halves the spatial size
            nn.BatchNorm2d(int(32*wm)),
            nn.ReLU(),
            nn.Conv2d(int(32*wm), int(32*wm), kernel_size=3, groups=int(32*wm), padding="same", stride=1), # depthwise
            nn.Conv2d(int(32*wm), int(64*wm), kernel_size=1, stride=1), # pointwise
            nn.BatchNorm2d(int(64*wm)),
            nn.ReLU(),
            nn.Conv2d(int(64*wm), int(64*wm), kernel_size=3, groups=int(64*wm), padding=1, stride=2), # depthwise
            nn.Conv2d(int(64*wm), int(128*wm), kernel_size=1, stride=1), # pointwise
            nn.BatchNorm2d(int(128*wm)),
            nn.ReLU(),
        )

   
    def forward(self, x):
        x = self.net(x)
        return x
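
The parameter count of this layer stack can be estimated by hand for any width multiplier. The sketch below is pure-Python arithmetic under the same layer configuration as the class above. The helper names `conv_params`, `bn_params`, and `mobilenet_head_params` are hypothetical, and a BatchNorm layer is counted as two trainable parameters per channel (scale and shift).

```python
def conv_params(in_ch, out_ch, k, groups=1):
    """Weights plus biases of a 2D convolution with square k x k kernels."""
    return k * k * (in_ch // groups) * out_ch + out_ch


def bn_params(ch):
    """Trainable scale and shift of a BatchNorm2d layer."""
    return 2 * ch


def mobilenet_head_params(wm=1.0):
    """Total trainable parameters of the layer stack above for width multiplier wm."""
    c32, c64, c128 = int(32 * wm), int(64 * wm), int(128 * wm)
    return sum([
        conv_params(3, c32, 3),                # standard 3x3 convolution
        bn_params(c32),
        conv_params(c32, c32, 3, groups=c32),  # depthwise
        conv_params(c32, c64, 1),              # pointwise
        bn_params(c64),
        conv_params(c64, c64, 3, groups=c64),  # depthwise
        conv_params(c64, c128, 1),             # pointwise
        bn_params(c128),
    ])


print(mobilenet_head_params(1.0))  # 12736
print(mobilenet_head_params(0.5))  # 3808
```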

In [4]:
net = MobilenetV1()

summary(net, input_size = (3, 224, 224), batch_size = 1)


----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1          [1, 32, 112, 112]             896
       BatchNorm2d-2          [1, 32, 112, 112]              64
              ReLU-3          [1, 32, 112, 112]               0
            Conv2d-4          [1, 32, 112, 112]             320
            Conv2d-5          [1, 64, 112, 112]           2,112
       BatchNorm2d-6          [1, 64, 112, 112]             128
              ReLU-7          [1, 64, 112, 112]               0
            Conv2d-8            [1, 64, 56, 56]             640
            Conv2d-9           [1, 128, 56, 56]           8,320
      BatchNorm2d-10           [1, 128, 56, 56]             256
             ReLU-11           [1, 128, 56, 56]               0
Total params: 12,736
Trainable params: 12,736
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.57
Forward/ba

In [5]:
net = MobilenetV1(wm=0.5)

summary(net, input_size = (3, 224, 224), batch_size = 1)


----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1          [1, 16, 112, 112]             448
       BatchNorm2d-2          [1, 16, 112, 112]              32
              ReLU-3          [1, 16, 112, 112]               0
            Conv2d-4          [1, 16, 112, 112]             160
            Conv2d-5          [1, 32, 112, 112]             544
       BatchNorm2d-6          [1, 32, 112, 112]              64
              ReLU-7          [1, 32, 112, 112]               0
            Conv2d-8            [1, 32, 56, 56]             320
            Conv2d-9            [1, 64, 56, 56]           2,112
      BatchNorm2d-10            [1, 64, 56, 56]             128
             ReLU-11            [1, 64, 56, 56]               0
Total params: 3,808
Trainable params: 3,808
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.57
Forward/back