# A Keras 'Bare-Bones' Implementation of ResNet-34

First we'll create the residual unit. It consists of two convolutional layers (no pooling layers) with 3x3 kernels, each followed by Batch Normalization, with ReLU activations. By default the spatial dimensions are preserved: with a stride of 1 and "same" padding, the input is zero-padded so that the output has the same height and width.

> Note: The bias term shifts the inputs passed to ReLU, so a filter can activate even in the absence of strong input signals. It can be helpful to think of this as the general "brightness" of the filter. However, a BatchNormalization layer sitting between the convolution and the activation function subtracts the batch mean, which cancels any constant bias, and then rescales and shifts the result with its own trainable parameters (gamma and beta). Beta already provides the offset, which makes the convolution's bias term redundant, so we set `use_bias=False` when constructing the conv layers.
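The redundancy described in the note can be checked numerically. The sketch below uses plain Python rather than Keras, and a hypothetical helper `standardize` that mimics only the normalization step of batch normalization (subtract the batch mean, divide by the batch standard deviation); gamma, beta, and the moving averages are left out, and the input numbers are made up. Under those assumptions, adding a constant bias to every pre-activation leaves the standardized values unchanged.

```python
import math
import statistics

def standardize(values, eps=1e-3):
    """Normalization step of batch norm only: zero mean, unit variance.

    Hypothetical helper for illustration; the gamma/beta transform and the
    moving averages used at inference time are deliberately left out.
    """
    mean = statistics.fmean(values)
    var = statistics.pvariance(values, mu=mean)
    return [(v - mean) / math.sqrt(var + eps) for v in values]

# Pre-activations of one filter across a small batch (made-up numbers).
conv_outputs = [0.5, -1.2, 3.0, 0.1]
bias = 2.5

without_bias = standardize(conv_outputs)
with_bias = standardize([v + bias for v in conv_outputs])

# The bias is subtracted out together with the batch mean.
bias_is_redundant = all(
    math.isclose(a, b, abs_tol=1e-9) for a, b in zip(without_bias, with_bias)
)
print(bias_is_redundant)
```

Because the bias only moves the batch mean, any offset the network needs has to come from beta, which is why the convolutions can skip their own bias parameters.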

In [1]:
import tensorflow as tf
from functools import partial

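# Conv2D with this notebook's defaults: 3x3 kernel, stride 1, "same" padding,
# He-normal initialization, and no bias (Batch Normalization follows each conv)
DefaultConv2D = partial(
    tf.keras.layers.Conv2D,
    kernel_size=3, strides=1, padding="same", kernel_initializer="he_normal", use_bias=False)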

class ResUnit(tf.keras.layers.Layer):
    def __init__(self, filters, strides=1, activation="relu", **kwargs):
        super().__init__(**kwargs)
        self.activation = tf.keras.layers.Activation(activation)
        self.main_layers = [
            DefaultConv2D(filters, strides=strides),
            tf.keras.layers.BatchNormalization(),
            self.activation,
            DefaultConv2D(filters),
            tf.keras.layers.BatchNormalization()
        ]
        self.skip_layers = []
        if strides > 1:
            self.skip_layers = [
                DefaultConv2D(filters, kernel_size=1, strides=strides),
                tf.keras.layers.BatchNormalization()
            ]

    def call(self, inputs):
        main_output = inputs
        for layer in self.main_layers:
            main_output = layer(main_output)
        skip_output = inputs
        for layer in self.skip_layers:
            skip_output = layer(skip_output)
        return self.activation(main_output + skip_output)

When the number of feature maps is doubled and the spatial dimensions are halved (using a convolutional layer with a stride of 2), the input and output shapes of the residual unit differ, so the input can no longer be added directly to the output of the main path. To make the element-wise addition possible, the skip connection passes through a 1x1 convolutional layer with the same stride and number of feature maps as the main path's first convolutional layer, followed by Batch Normalization. In the code above, these are the `skip_layers`, which are only created when `strides > 1`.

![resnet-block.svg](./resnet-block.svg)
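To make the shape mismatch concrete, here is a minimal sketch in plain Python (no Keras required). The helper `conv_output_shape` is hypothetical and only applies the rule for `padding="same"`, where each spatial dimension becomes `ceil(size / strides)`; the 56x56x64 input is an assumed example corresponding to the first 128-filter unit.

```python
import math

def conv_output_shape(input_shape, filters, strides):
    """Output (height, width, channels) of a conv layer with padding="same".

    Hypothetical helper: with "same" padding the kernel size does not affect
    the spatial size, so each dimension becomes ceil(size / strides).
    """
    height, width, _ = input_shape
    return (math.ceil(height / strides), math.ceil(width / strides), filters)

unit_input = (56, 56, 64)   # assumed input to the first 128-filter unit
filters, strides = 128, 2

# Main path: strided 3x3 conv followed by a stride-1 3x3 conv.
main = conv_output_shape(unit_input, filters, strides)
main = conv_output_shape(main, filters, 1)

# Skip path through the 1x1 conv (same filters and stride as the main path).
skip = conv_output_shape(unit_input, filters, strides)

print(main, skip)            # both (28, 28, 128)
print(unit_input == main)    # False: the raw input could not be added directly
```

Without the 1x1 convolution the skip path would still carry a 56x56x64 tensor, and adding it to the 28x28x128 main output would fail.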

In [2]:
model = tf.keras.Sequential([
    DefaultConv2D(64, kernel_size=7, strides=2, input_shape=[224, 224, 3]),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Activation("relu"),
    tf.keras.layers.MaxPool2D(pool_size=3, strides=2, padding="same"),
])

# count of filters for each stage in ResNet-34
filter_configuration = [64] * 3 + [128] * 4 + [256] * 6 + [512] * 3
previous_filters = 64

for filters in filter_configuration:
    # downsample with stride 2 whenever the number of filters changes; otherwise keep stride 1
    strides = 1 if filters == previous_filters else 2
    model.add(ResUnit(filters, strides=strides))
    previous_filters = filters

# GlobalAvgPool2D already returns a flat (batch, channels) tensor, so no Flatten layer is needed
model.add(tf.keras.layers.GlobalAvgPool2D())
model.add(tf.keras.layers.Dense(10, activation="softmax"))

As can be seen, the architecture of a ResNet-34 includes:
- 3 RUs with 64 feature maps
- 4 RUs with 128 feature maps
- 6 RUs with 256 feature maps
- 3 RUs with 512 feature maps
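As a sanity check on this configuration, the following sketch traces the spatial size of the feature maps through the stages and counts the weighted layers, using plain Python only. It assumes the 224x224 input and "same" padding used above, and the usual counting convention in which the 1x1 convolutions on the skip path are not counted toward the "34".

```python
import math

filter_configuration = [64] * 3 + [128] * 4 + [256] * 6 + [512] * 3

# Stem: 7x7 conv with stride 2, then 3x3 max pooling with stride 2.
size = math.ceil(224 / 2)      # after the stem convolution
size = math.ceil(size / 2)     # after max pooling

weighted_layers = 1            # the stem convolution
previous_filters = 64
stage_sizes = {}

for filters in filter_configuration:
    strides = 1 if filters == previous_filters else 2
    size = math.ceil(size / strides)
    stage_sizes[filters] = size
    weighted_layers += 2       # two 3x3 convolutions per residual unit
    previous_filters = filters

weighted_layers += 1           # the final dense layer

print(stage_sizes)       # {64: 56, 128: 28, 256: 14, 512: 7}
print(weighted_layers)   # 34
```

The 16 residual units contribute 32 convolutions; together with the stem convolution and the dense output layer this gives the 34 layers in the model's name.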