<a href="https://colab.research.google.com/github/NarendraPatwardhan/quicksilver/blob/master/Efficient_Filter_Usage_for_CNNs.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Efficient Filter Usage in Convolutional Neural Networks


This notebook compares the traditional approach of using **n x n** convolution filters with using consecutive **n x 1** and **1 x n** filters instead, which significantly reduces the number of parameters while maintaining a similar level of accuracy.

A convolutional layer is defined by stacking *m* filters of size *$n_1$ x $n_2$*. When such a layer receives an input with *c* channels, the number of parameters *$P$* is given by:

>$P = W + B$

where *W* is the number of weights and *B* is the number of biases in the convolutional layer, obtained from the following equations:

>$W = n_1n_2 mc$ 

>$B = m$ 

Thus, when using the traditional approach with $n_1 = n_2 = n$:

>$P_0 = m(n^2c + 1)$
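As a quick sanity check of the formula above, here is a minimal sketch in plain Python. The helper name `conv_params` is our own and not part of any library; it simply evaluates $P = n_1 n_2 mc + m$ for the three 3 x 3 layers of the network used later in this notebook.

```python
def conv_params(n1, n2, m, c):
    """Parameter count P = W + B for m filters of size n1 x n2 on c input channels.

    Hypothetical helper for illustration; it only evaluates the formula above.
    """
    weights = n1 * n2 * m * c  # W = n1 * n2 * m * c
    biases = m                 # B = m, one bias per filter
    return weights + biases


# (m, c) for the three 3 x 3 convolutional layers of the MNIST network below.
layer_counts = [conv_params(3, 3, m, c) for m, c in [(32, 1), (64, 32), (64, 64)]]
print(layer_counts)
```

The three values (320, 18496 and 36928) match the `Param #` column that `v0.summary()` reports for the convolutional layers.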

Replacing this layer with a combination of two separate layers, each having *m* filters, with filter sizes *(n x 1)* and *(1 x n)* respectively, changes the number of parameters to:

>$P_1 = W_1 + B_1 + W_2 + B_2$

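Using the base equations, and noting that the second layer receives the *m* output channels of the first layer as its input,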

>$W_1 = nmc$

>$W_2 = nm^2$

>$B_1 = B_2 = m$

Thus,

>$P_1 = m(nc + nm + 2)$
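To make the two counts concrete, the sketch below evaluates both formulas for a few filter sizes. The function names `square_params` and `factorized_params` are illustrative choices of ours, and the setting $m = c = 64$ is just an example.

```python
def square_params(n, m, c):
    """P_0 = m(n^2 c + 1): one layer of m filters of size n x n."""
    return m * (n * n * c + 1)


def factorized_params(n, m, c):
    """P_1 = m(nc + nm + 2): an (n x 1) layer followed by a (1 x n) layer."""
    first = n * m * c + m    # W_1 + B_1: the first layer sees c input channels
    second = n * m * m + m   # W_2 + B_2: the second layer sees m input channels
    return first + second


# Compare both counts for m = c = 64 and growing filter size n.
for n in (3, 5, 7):
    p0 = square_params(n, 64, 64)
    p1 = factorized_params(n, 64, 64)
    print(f"n={n}: P0={p0}, P1={p1}, P1/P0={p1 / p0:.2f}")
```

For $n = 3$ this gives 36928 against 24704 parameters, and the gap widens as $n$ grows, because $P_0$ scales with $n^2$ while $P_1$ scales with $n$.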


Therefore, replacing a layer is only advantageous when:
>$P_0 \geq P_1$

>$m(n^2c + 1) \geq m(nc + nm + 2)$

>$n \geq 1 + m/c + 1/(cn)$

Since $1/n \ll m$ in most cases, the last term is negligible next to $m/c$, which gives an approximate rule of thumb for replacement:

>$n > (1 + m/c)$ 
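The rule of thumb can be checked against the exact comparison $P_0 \geq P_1$ by brute force. The sketch below does this over a small grid of positive integers; the function names and the grid bounds are assumptions made for illustration only.

```python
def should_replace(n, m, c):
    """Rule of thumb n > 1 + m/c, written as n*c > c + m to avoid floating point."""
    return n * c > c + m


def saves_parameters(n, m, c):
    """Exact condition P_0 >= P_1 from the two parameter-count formulas."""
    p0 = m * (n * n * c + 1)
    p1 = m * (n * c + n * m + 2)
    return p0 >= p1


# Collect every (n, m, c) on the grid where the two tests disagree.
mismatches = [
    (n, m, c)
    for n in range(1, 12)
    for m in range(1, 65)
    for c in range(1, 65)
    if should_replace(n, m, c) != saves_parameters(n, m, c)
]
print(len(mismatches))
```

On this grid the two tests never disagree, so for integer layer sizes like these the rule of thumb appears to lose nothing compared with the exact inequality.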


In [0]:
import tensorflow as tf
from tensorflow.keras import datasets, layers, models

In [0]:
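# Load MNIST: 60,000 training and 10,000 test images of 28 x 28 grayscale digits.
(train_images, train_labels), (test_images, test_labels) = datasets.mnist.load_data()

# Add a trailing channel axis so that each image has shape (28, 28, 1).
train_images = train_images.reshape((60000, 28, 28, 1))
test_images = test_images.reshape((10000, 28, 28, 1))

# Scale pixel values from [0, 255] to [0, 1].
train_images, test_images = train_images / 255.0, test_images / 255.0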

## The Conventional Model

In [0]:
v0 = models.Sequential()
v0.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
v0.add(layers.MaxPooling2D((2, 2)))
v0.add(layers.Conv2D(64, (3, 3), activation='relu'))
v0.add(layers.MaxPooling2D((2, 2)))
v0.add(layers.Conv2D(64, (3, 3), activation='relu'))
v0.add(layers.Flatten())
v0.add(layers.Dense(64, activation='relu'))
v0.add(layers.Dense(10, activation='softmax'))



In [0]:
v0.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 26, 26, 32)        320       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 13, 13, 32)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 11, 11, 64)        18496     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 5, 5, 64)          0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 3, 3, 64)          36928     
_________________________________________________________________
flatten (Flatten)            (None, 576)               0         
_________________________________________________________________
dense (Dense)                (None, 64)                36928     
__________

In [0]:
v0.compile(optimizer='adam',
           loss='sparse_categorical_crossentropy',
           metrics=['accuracy'])

_ = v0.fit(train_images, train_labels, epochs=5)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [0]:
_, test_acc = v0.evaluate(test_images, test_labels)

print(test_acc)

0.9897


## The Efficient Filter Model

For the three convolutional layers of the network above, the following table applies the rule of thumb to decide whether each layer should be replaced.

| Layer | n | m  | c  | m/c | n > 1 + m/c | replace? |
|-------|---|----|----|-----|-------------|----------|
| 1     | 3 | 32 | 1  | 32  | False       | No       |
| 2     | 3 | 64 | 32 | 2   | False       | No       |
| 3     | 3 | 64 | 64 | 1   | True        | Yes      |
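The decisions in the table can be reproduced with a few lines of plain Python. This is only a sketch: `layers_spec` and `should_replace` are names introduced here for illustration, and the counts come from the formulas for $P_0$ and $P_1$ derived earlier.

```python
# (layer, n, m, c) for the three convolutional layers of the conventional model.
layers_spec = [
    ("1", 3, 32, 1),
    ("2", 3, 64, 32),
    ("3", 3, 64, 64),
]


def should_replace(n, m, c):
    """Rule of thumb n > 1 + m/c, written with integers as n*c > c + m."""
    return n * c > c + m


decisions = []
for name, n, m, c in layers_spec:
    p0 = m * (n * n * c + 1)      # conventional n x n layer
    p1 = m * (n * c + n * m + 2)  # (n x 1) layer followed by (1 x n) layer
    decision = should_replace(n, m, c)
    decisions.append(decision)
    print(f"layer {name}: P0={p0:6d}  P1={p1:6d}  replace={decision}")
```

Layer 2 sits right at the boundary: with $n = 1 + m/c = 3$, the factorized version would need 18560 parameters against 18496 for the 3 x 3 layer, so replacing it would not help.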

In [0]:
v1 = models.Sequential()
v1.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
v1.add(layers.MaxPooling2D((2, 2)))
v1.add(layers.Conv2D(64, (3, 3), activation='relu'))
v1.add(layers.MaxPooling2D((2, 2)))
v1.add(layers.Conv2D(64, (3, 1), activation='relu'))
v1.add(layers.Conv2D(64, (1, 3), activation='relu'))
v1.add(layers.Flatten())
v1.add(layers.Dense(64, activation='relu'))
v1.add(layers.Dense(10, activation='softmax'))

In [0]:
v1.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_3 (Conv2D)            (None, 26, 26, 32)        320       
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 13, 13, 32)        0         
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 11, 11, 64)        18496     
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 5, 5, 64)          0         
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 3, 5, 64)          12352     
_________________________________________________________________
conv2d_6 (Conv2D)            (None, 3, 3, 64)          12352     
_________________________________________________________________
flatten_1 (Flatten)          (None, 576)               0         
__________

Even in the simple case of 3 x 3 filters with only one convolutional layer replaced, the total parameter count drops from 93,322 to 81,098, a reduction of roughly 13%. This advantage grows in larger networks and with larger filter sizes.
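The overall saving can be tallied directly from the layer definitions without building either model. The sketch below assumes the standard counts for convolutional and dense layers (weights plus one bias per output unit); `conv_params` and `dense_params` are hypothetical helper names.

```python
def conv_params(n1, n2, m, c):
    """m filters of size n1 x n2 on c input channels, one bias per filter."""
    return m * (n1 * n2 * c + 1)


def dense_params(inputs, units):
    """Fully connected layer: one weight per input-unit pair plus one bias per unit."""
    return inputs * units + units


# Layers common to both models: the first two conv layers and the two dense layers.
shared = (
    conv_params(3, 3, 32, 1)
    + conv_params(3, 3, 64, 32)
    + dense_params(576, 64)  # Flatten yields 3 * 3 * 64 = 576 features in both models
    + dense_params(64, 10)
)

v0_total = shared + conv_params(3, 3, 64, 64)                             # one 3 x 3 layer
v1_total = shared + conv_params(3, 1, 64, 64) + conv_params(1, 3, 64, 64)  # 3 x 1 then 1 x 3
reduction = 1 - v1_total / v0_total

print(v0_total, v1_total, f"{reduction:.1%}")
```

The two totals come out at 93322 and 81098 parameters, so the single replacement removes 12224 parameters, about 13.1% of the conventional model.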

In [0]:
v1.compile(optimizer='adam',
           loss='sparse_categorical_crossentropy',
           metrics=['accuracy'])

_ = v1.fit(train_images, train_labels, epochs=5)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [0]:
_, test_acc = v1.evaluate(test_images, test_labels)

print(test_acc)

0.9914


Even with the reduced number of parameters, the efficient filter model attains a similar level of accuracy (0.9914, against 0.9897 for the conventional model).