# What is MobileNet?

As the name implies, the MobileNet model is designed to be used in mobile applications, and it is TensorFlow’s first mobile computer vision model. MobileNet uses depthwise separable convolutions, which significantly reduce the number of parameters compared to a network of the same depth built from regular convolutions. The result is a lightweight deep neural network. It was proposed by Andrew G. Howard et al. (Google) in “MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications”, 2017.

In practice, this means MobileNet first convolves each input channel (for example, each colour channel) separately with its own filter, instead of having every filter combine all three channels at once.

<img src="./Sources/6.png" width="70%">

To understand what a depthwise separable convolution really is, let’s compare it to a normal convolution between a 12x12x3 input and 256 kernels of size 5x5x3.

 
# Normal convolution

* In a normal convolution, **all channels** of a kernel are used together to produce a single feature map. To increase the number of channels to 256, we just have to use **256 filters of size 5x5x3**.

<img src="./Sources/1.png" width="70%">
<img src="./Sources/3.png" width="70%">
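To make the shapes concrete, here is a minimal NumPy sketch of the normal convolution from the example. It assumes a stride of 1 and no padding (so the 12x12 input shrinks to 8x8) and uses random values; the function name `conv2d_valid` is made up for this illustration.

```python
import numpy as np

def conv2d_valid(image, kernels):
    """Standard convolution with stride 1 and no padding.

    image:   (H, W, C_in)
    kernels: (N, k, k, C_in), one k x k x C_in kernel per output channel
    returns: (H - k + 1, W - k + 1, N)
    """
    h, w, _ = image.shape
    n, k, _, _ = kernels.shape
    out = np.zeros((h - k + 1, w - k + 1, n))
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            patch = image[i:i + k, j:j + k, :]  # (k, k, C_in)
            # Every kernel sums over height, width AND all input channels.
            out[i, j, :] = np.tensordot(kernels, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out

rng = np.random.default_rng(0)
image = rng.standard_normal((12, 12, 3))       # 12x12x3 input
kernels = rng.standard_normal((256, 5, 5, 3))  # 256 kernels of size 5x5x3
features = conv2d_valid(image, kernels)
print(features.shape)  # (8, 8, 256)
```

Each of the 256 output channels costs a full 5x5x3 multiply-and-sum at every one of the 8x8 positions.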

# Depthwise separable convolution

A depthwise separable convolution is divided into 2 parts:

    (1) Depthwise convolution   +  (2) Pointwise convolution
                    
In the depthwise convolution, **each channel** of the input is convolved with its own 5x5x1 kernel, which produces one feature map per channel (8x8x3 in our example). The pointwise convolution then increases the number of channels to 256: we just have to use **256 filters of size 1x1x3.**

<img src="./Sources/2.png" width="70%">
<img src="./Sources/4.png" width="70%">
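The two steps can be sketched in NumPy as follows. This is only an illustration under the same assumptions as the example (stride 1, no padding, random values); the helper names `depthwise_conv` and `pointwise_conv` are invented here.

```python
import numpy as np

def depthwise_conv(image, dw_kernels):
    """Depthwise step: one k x k filter per channel, no mixing across channels.

    image:      (H, W, C)
    dw_kernels: (k, k, C)
    returns:    (H - k + 1, W - k + 1, C)
    """
    h, w, c = image.shape
    k = dw_kernels.shape[0]
    out = np.zeros((h - k + 1, w - k + 1, c))
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            patch = image[i:i + k, j:j + k, :]
            # Sum over height and width only; the channel axis is kept.
            out[i, j, :] = np.sum(patch * dw_kernels, axis=(0, 1))
    return out

def pointwise_conv(features, pw_kernels):
    """Pointwise step: a 1x1 convolution is a per-pixel linear map over channels.

    features:   (H, W, C)
    pw_kernels: (C, N), i.e. N filters of size 1x1xC
    returns:    (H, W, N)
    """
    return features @ pw_kernels

rng = np.random.default_rng(0)
image = rng.standard_normal((12, 12, 3))
dw_kernels = rng.standard_normal((5, 5, 3))  # three 5x5x1 kernels
pw_kernels = rng.standard_normal((3, 256))   # 256 filters of size 1x1x3

depthwise_out = depthwise_conv(image, dw_kernels)
separable_out = pointwise_conv(depthwise_out, pw_kernels)
print(depthwise_out.shape)  # (8, 8, 3)
print(separable_out.shape)  # (8, 8, 256)
```

The output has the same shape as that of a normal convolution with 256 kernels of size 5x5x3, but the expensive 5x5 spatial filtering is done only once per input channel.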


# What is the difference?

The main difference is the number of computations. With a stride of 1 and no padding, a 5x5 kernel applied to a 12x12 input produces an 8x8 output (12 - 5 + 1 = 8). In our example:
* For a normal convolution, we have ((8x8x5x5)x3)x256 = **1,228,800** operations.
* For a depthwise separable convolution, we have 4,800 + 49,152 = **53,952** operations:
    * in depthwise convolution: (8x8x5x5)x3 = 4,800 operations.
    * in pointwise convolution: ((8x8x1x1)x3)x256 = 49,152 operations.

We can clearly see that a depthwise separable convolution is far less expensive than a normal convolution: it needs **about 22.8 times fewer operations** (roughly 4.4% of the cost).
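The counts above can be reproduced with a few lines of plain Python. The function names are chosen for this sketch only, and an "operation" is taken to mean one multiplication, as in the figures above.

```python
def standard_conv_ops(out_size, kernel_size, c_in, n_filters):
    """Multiplications for a normal convolution on an out_size x out_size output."""
    return out_size * out_size * kernel_size * kernel_size * c_in * n_filters

def depthwise_separable_ops(out_size, kernel_size, c_in, n_filters):
    """Multiplications for the depthwise step and the pointwise step."""
    depthwise = out_size * out_size * kernel_size * kernel_size * c_in
    pointwise = out_size * out_size * c_in * n_filters
    return depthwise, pointwise

out_size = 12 - 5 + 1  # 12x12 input, 5x5 kernel, stride 1, no padding
standard = standard_conv_ops(out_size, 5, 3, 256)
depthwise, pointwise = depthwise_separable_ops(out_size, 5, 3, 256)
separable = depthwise + pointwise

print(standard)                        # 1228800
print(depthwise, pointwise)            # 4800 49152
print(separable)                       # 53952
print(round(standard / separable, 1))  # 22.8
```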

The reason is that a normal convolution transforms the image 256 times, whereas a depthwise separable convolution transforms the image once and then expands the result 256 times along the channel axis.
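The same saving can be written in general form. The symbols below are notation introduced for this note and do not appear elsewhere in the text: $D_K$ is the kernel size, $M$ the number of input channels, $N$ the number of output channels, and $D_F$ the output feature map size. The cost of the depthwise separable convolution relative to the normal one is then:

$$
\frac{D_K^2 \, M \, D_F^2 + M \, N \, D_F^2}{D_K^2 \, M \, N \, D_F^2} = \frac{1}{N} + \frac{1}{D_K^2}
$$

For our example ($D_K = 5$, $M = 3$, $N = 256$, $D_F = 8$) this gives $\frac{1}{256} + \frac{1}{25} \approx 0.0439$, which matches $53{,}952 / 1{,}228{,}800$.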

    from tensorflow.keras.layers import Input, DepthwiseConv2D, Conv2D, BatchNormalization, ReLU, AvgPool2D, Flatten, Dense
    from tensorflow.keras import Model

<img src="./Sources/9.png" width="90%">

    def DSConv(x, filters, strides):
        # Depthwise step: one 3x3 filter per input channel
        x = DepthwiseConv2D(kernel_size=3, strides=strides, padding='same')(x)
        x = BatchNormalization()(x)
        x = ReLU()(x)

        # Pointwise step: 1x1 convolution that mixes channels and sets the output depth
        x = Conv2D(filters=filters, kernel_size=1, strides=1, padding='same')(x)
        x = BatchNormalization()(x)
        x = ReLU()(x)
        return x
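To see how much one such block saves in parameters, here is a small sketch in plain Python. It compares the two convolutions of the first block (32 input channels, 64 output channels, 3x3 depthwise kernel) with a hypothetical regular 3x3 convolution of the same width. Biases are counted because the Keras layers above use them by default; batch normalization parameters are left out. The helper names are made up for this example.

```python
def conv2d_params(kernel_size, c_in, c_out):
    """Weights plus biases of a regular Conv2D layer."""
    return kernel_size * kernel_size * c_in * c_out + c_out

def depthwise_conv2d_params(kernel_size, c_in):
    """Weights plus biases of a DepthwiseConv2D layer (one filter per channel)."""
    return kernel_size * kernel_size * c_in + c_in

c_in, c_out = 32, 64  # channel counts of the first DSConv block

regular = conv2d_params(3, c_in, c_out)
separable = depthwise_conv2d_params(3, c_in) + conv2d_params(1, c_in, c_out)

print(regular)    # 18496
print(separable)  # 2432
print(round(regular / separable, 1))  # 7.6
```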

<img src="./Sources/8.png" width="110%">

    def mobile_net(in_shape):
        # Named `inputs` to avoid shadowing the built-in `input`
        inputs = Input(in_shape)

        # Initial standard convolution
        x = Conv2D(filters=32, kernel_size=3, strides=2, padding='same')(inputs)
        x = BatchNormalization()(x)
        x = ReLU()(x)

        # DSConv1
        x = DSConv(x, filters=64, strides=1)

        # DSConv2 & DSConv3
        x = DSConv(x, filters=128, strides=2)
        x = DSConv(x, filters=128, strides=1)

        # DSConv4 & DSConv5
        x = DSConv(x, filters=256, strides=2)
        x = DSConv(x, filters=256, strides=1)

        # DSConv6
        x = DSConv(x, filters=512, strides=2)

        # DSConv7 (repeated 5 times)
        for _ in range(5):
            x = DSConv(x, filters=512, strides=1)

        # DSConv8 & DSConv9
        x = DSConv(x, filters=1024, strides=2)
        x = DSConv(x, filters=1024, strides=1)

        # Global average pooling: the feature map is 7x7 here for a 224x224 input
        x = AvgPool2D(pool_size=7, strides=1)(x)

        # Flatten (None, 1, 1, 1024) to (None, 1024) so that Dense outputs (None, 1000)
        x = Flatten()(x)

        output = Dense(units=1000, activation='softmax')(x)

        model = Model(inputs=inputs, outputs=output)

        return model
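The pooling window of 7 is tied to the 224x224 input size. The following sketch traces the spatial size through the strides used in `mobile_net`, assuming the usual rule for `padding='same'` (output size = ceil(input size / stride)). The helper name `same_padding_out` is invented for this illustration.

```python
import math

def same_padding_out(size, stride):
    """Spatial output size of a convolution with padding='same'."""
    return math.ceil(size / stride)

# Strides in order: initial Conv2D, DSConv1..DSConv6,
# the five repeated DSConv7 blocks, then DSConv8 and DSConv9.
strides = [2, 1, 2, 1, 2, 1, 2] + [1] * 5 + [2, 1]

size = 224
sizes = []
for stride in strides:
    size = same_padding_out(size, stride)
    sizes.append(size)

print(sizes)
# [112, 112, 56, 56, 28, 28, 14, 14, 14, 14, 14, 14, 7, 7]
```

The last feature map is 7x7, so an average pooling window of size 7 reduces it to 1x1. With a different input size, the pooling window would have to change accordingly.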

    model = mobile_net(in_shape=(224, 224, 3))
    model.summary()

The summary begins as follows (output truncated):

    Model: "model_1"
    _________________________________________________________________
    Layer (type)                 Output Shape              Param #
    input_3 (InputLayer)         [(None, 224, 224, 3)]     0
    _________________________________________________________________
    conv2d_15 (Conv2D)           (None, 112, 112, 32)      896
    _________________________________________________________________
    batch_normalization_28 (Batc (None, 112, 112, 32)      128
    _________________________________________________________________
    re_lu_28 (ReLU)              (None, 112, 112, 32)      0
    _________________________________________________________________
    depthwise_conv2d_13 (Depthwi (None, 112, 112, 32)      320
    _________________________________________________________________
    batch_normalization_29 (Batc (None, 112, 112, 32)      128
    _________________________________________________________________
    re_lu_29 (ReLU)              (None, 112, 112, 32)      0
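The `Param #` column can be checked by hand. The sketch below recomputes the values shown for the first layers, assuming each convolution has one bias per output channel and that batch normalization stores four values per channel (scale, offset, moving mean, moving variance). The function names are made up for this check.

```python
def conv2d_param_count(kernel_size, c_in, c_out):
    """k x k x c_in weights per filter, c_out filters, plus one bias per filter."""
    return kernel_size * kernel_size * c_in * c_out + c_out

def depthwise_param_count(kernel_size, channels):
    """One k x k filter and one bias per channel."""
    return kernel_size * kernel_size * channels + channels

def batchnorm_param_count(channels):
    """Scale, offset, moving mean and moving variance for each channel."""
    return 4 * channels

print(conv2d_param_count(3, 3, 32))   # 896, matches conv2d_15
print(batchnorm_param_count(32))      # 128, matches batch_normalization_28
print(depthwise_param_count(3, 32))   # 320, matches depthwise_conv2d_13
```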

A PyTorch implementation is available here: https://hackmd.io/@bouteille/rk-MSuYFU