**Implementing Convolutional Layers with Keras:**

In [1]:
from sklearn.datasets import load_sample_images
import tensorflow as tf

In [15]:
import numpy as np

In [2]:
images = load_sample_images()["images"]
images = tf.keras.layers.CenterCrop(height=70, width=120)(images)
images = tf.keras.layers.Rescaling(scale=1/255)(images)

In [3]:
images.shape

TensorShape([2, 70, 120, 3])

In [4]:
conv_layer = tf.keras.layers.Conv2D(filters=32, kernel_size=7)
fmaps = conv_layer(images)

In [5]:
fmaps.shape

TensorShape([2, 64, 114, 32])

In [6]:
conv_layer = tf.keras.layers.Conv2D(filters=32, kernel_size=7, padding="same")

In [7]:
fmaps = conv_layer(images)

In [8]:
fmaps.shape

TensorShape([2, 70, 120, 32])
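The two output shapes above follow from the standard output-size arithmetic for convolutions. Here is a minimal sketch in plain Python; `conv_output_size` is a hypothetical helper written for this note, not part of Keras, and it assumes the same kernel size and stride along both spatial dimensions.

```python
import math

def conv_output_size(input_size, kernel_size, stride=1, padding="valid"):
    """Spatial output size of a convolution along one dimension."""
    if padding == "valid":
        # No padding: the kernel must fit entirely inside the input.
        return (input_size - kernel_size) // stride + 1
    if padding == "same":
        # Zero padding is added so that output = ceil(input / stride).
        return math.ceil(input_size / stride)
    raise ValueError(f"unknown padding: {padding!r}")

# 70x120 images, 7x7 kernel, stride 1 (the Conv2D defaults used above)
valid_shape = (conv_output_size(70, 7), conv_output_size(120, 7))
same_shape = (conv_output_size(70, 7, padding="same"),
              conv_output_size(120, 7, padding="same"))
print(valid_shape)  # (64, 114)
print(same_shape)   # (70, 120)
```

With `"valid"` padding each spatial dimension shrinks by `kernel_size - 1` (here 6 pixels), which is why 70x120 becomes 64x114.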

In [9]:
kernels, biases = conv_layer.get_weights()

In [11]:
kernels.shape

(7, 7, 3, 32)

In [12]:
biases.shape

(32,)
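The weight shapes above determine the layer's parameter count: one 7x7x3 kernel per filter, plus one bias per filter. A small sketch of that arithmetic (`conv2d_param_count` is a hypothetical helper, not a Keras function):

```python
def conv2d_param_count(kernel_size, in_channels, filters, use_bias=True):
    """Number of trainable parameters in a 2D conv layer with square kernels."""
    weights = kernel_size * kernel_size * in_channels * filters
    return weights + (filters if use_bias else 0)

# kernels.shape == (7, 7, 3, 32) and biases.shape == (32,)
n_params = conv2d_param_count(kernel_size=7, in_channels=3, filters=32)
print(n_params)  # 4736
```

This should match what `conv_layer.count_params()` reports once the layer has been built.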

**Pooling Layers:**

In [13]:
max_pool = tf.keras.layers.MaxPool2D(pool_size=2)
# strides defaults to pool_size, so this layer uses a stride of 2
# padding defaults to "valid", i.e. no padding
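To make the effect of a 2x2 max pool with stride 2 concrete, here is a rough NumPy sketch on a single-channel 4x4 input. It is an illustration of the operation only, not how Keras implements it, and it assumes the height and width are divisible by the pool size.

```python
import numpy as np

x = np.arange(16).reshape(4, 4)
# Split each spatial axis into (blocks, 2) so every 2x2 window gets its own
# pair of axes, then take the max inside each window.
pooled = x.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)
# [[ 5  7]
#  [13 15]]
```

Each output value is the maximum of one non-overlapping 2x2 window, so both spatial dimensions are halved.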

Max pooling offers stronger translation invariance than average pooling. Pooling can also be applied along the depth (channel) dimension instead of the spatial dimensions, although this is not as common. Depthwise max pooling lets the CNN learn to be invariant to various features: for example, it could learn multiple filters, each detecting a different rotation of the same pattern, and the depthwise max pooling layer would ensure the output is the same regardless of the rotation. The CNN could similarly learn to be invariant to other things: thickness, brightness, skew, color, etc.

In [31]:
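# custom depthwise max pooling layer: takes the max over groups of
# pool_size consecutive channels (channel count must be divisible by pool_size)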

class DepthPool(tf.keras.layers.Layer):
    def __init__(self, pool_size=2, **kwargs):
        super().__init__(**kwargs)
        self.pool_size = pool_size
    
    def call(self, inputs):
        shape = tf.shape(inputs) # shape[-1] is the number of channels
        groups = shape[-1] // self.pool_size # number of channel groups
        new_shape = tf.concat([shape[:-1],[groups, self.pool_size]], axis=0)
        return tf.reduce_max(tf.reshape(inputs, new_shape), axis=-1)
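The reshape-then-reduce trick used by `DepthPool` can be checked on a tiny array. The sketch below mirrors the same logic in NumPy rather than TensorFlow; `depth_max_pool` is a hypothetical stand-in for the layer's `call()` method and assumes the channel count is divisible by `pool_size`.

```python
import numpy as np

def depth_max_pool(inputs, pool_size=2):
    """Max over groups of `pool_size` consecutive channels (last axis)."""
    *leading, channels = inputs.shape
    groups = channels // pool_size
    # Split the channel axis into (groups, pool_size), then reduce the last axis.
    grouped = inputs.reshape(*leading, groups, pool_size)
    return grouped.max(axis=-1)

# One 1x1 "image" with 8 channels holding the values 0..7
x = np.arange(8, dtype=np.float32).reshape(1, 1, 1, 8)
pooled = depth_max_pool(x, pool_size=2)
print(pooled.shape)    # (1, 1, 1, 4)
print(pooled.ravel())  # [1. 3. 5. 7.]
```

The spatial dimensions are untouched; only the number of channels is divided by `pool_size`.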

In [33]:
global_avg_pool = tf.keras.layers.GlobalAvgPool2D()
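Global average pooling reduces each feature map to a single number: its mean over the height and width axes. A minimal NumPy sketch of that reduction, assuming channels-last inputs of shape `[batch, height, width, channels]` (the array contents here are arbitrary example data):

```python
import numpy as np

# 2 images, 3x4 spatial grid, 5 feature maps
feature_maps = np.arange(2 * 3 * 4 * 5, dtype=np.float64).reshape(2, 3, 4, 5)
# Average over the spatial axes only; batch and channel axes are kept.
pooled = feature_maps.mean(axis=(1, 2))
print(pooled.shape)  # (2, 5)
```

The output has one value per image per feature map, which is why this layer is often used just before the output layer.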

A common mistake is to use convolution kernels that are too large. For example, instead of using a conv layer with a 5x5 kernel, stack two layers with 3x3 kernels: this uses fewer parameters, requires fewer computations, and usually performs better.
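The parameter saving can be checked with simple arithmetic. The sketch below assumes 64 input channels and 64 filters in every layer (an arbitrary choice for illustration); `conv_params` is a hypothetical helper. Two stacked 3x3 layers with stride 1 cover the same 5x5 receptive field as a single 5x5 layer.

```python
def conv_params(kernel_size, in_channels, filters):
    """Weights plus biases of a 2D conv layer with square kernels."""
    return kernel_size * kernel_size * in_channels * filters + filters

channels = 64  # assumed for illustration: same channel count in and out
one_5x5 = conv_params(5, channels, channels)
two_3x3 = 2 * conv_params(3, channels, channels)
print(one_5x5)  # 102464
print(two_3x3)  # 73856
```

Under these assumptions the stacked version needs roughly 28% fewer parameters, and it adds an extra nonlinearity between the two layers.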

**Basic CNN to tackle the Fashion MNIST problem:**

In [34]:
from functools import partial

In [35]:
DefaultConv2D = partial(
    tf.keras.layers.Conv2D,
    kernel_size=3,
    padding="same",
    activation="relu",
    kernel_initializer="he_normal",
)

In [36]:
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=[28,28,1]),
    DefaultConv2D(filters=64, kernel_size=7),
    tf.keras.layers.MaxPool2D(),
    DefaultConv2D(filters=128),
    DefaultConv2D(filters=128),
    tf.keras.layers.MaxPool2D(),
    DefaultConv2D(filters=256),
    DefaultConv2D(filters=256),
    tf.keras.layers.MaxPool2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(units=128, activation="relu", kernel_initializer="he_normal"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(units=64, activation="relu", kernel_initializer="he_normal"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(units=10, activation="softmax")
])
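To see how the feature maps shrink through this model, the shapes can be traced by hand. This is a plain-Python sketch, not a call into Keras: it assumes every conv layer uses `"same"` padding with stride 1 (so height and width are unchanged) and every `MaxPool2D()` uses the default 2x2 pool with stride 2 and `"valid"` padding (so height and width are halved, rounding down).

```python
height, width, channels = 28, 28, 1  # Fashion MNIST input

# (layer kind, number of filters) in the same order as the model above
layers = [
    ("conv", 64), ("pool", None),
    ("conv", 128), ("conv", 128), ("pool", None),
    ("conv", 256), ("conv", 256), ("pool", None),
]

for kind, filters in layers:
    if kind == "conv":
        channels = filters  # spatial size unchanged with "same" padding
    else:
        height, width = height // 2, width // 2  # 2x2 max pool, stride 2
    print(f"{kind:4s} -> {height}x{width}x{channels}")

flat_units = height * width * channels
print(flat_units)  # 2304
```

The spatial size goes 28 → 14 → 7 → 3 while the depth grows to 256, so `Flatten` feeds 2,304 values into the first dense layer. `model.summary()` should show the same shapes.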

**LeNet-5**