# Add Deconvolution Pre-Stem to ResNet50

### Background

The ResNet50 architecture does not learn well (or at all) on small images, such as those in CIFAR-10 and CIFAR-100, which are 32 x 32. The reason is that the feature maps are downsampled too early in the architecture and shrink to 1 x 1 (a single pixel) before reaching the bottleneck layer prior to the classifier.

ResNet50 was designed for 224 x 224 inputs, but it also works well at 128 x 128.
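
The downsampling described above can be checked with simple arithmetic. The sketch below assumes the standard ResNet50 layout, in which five stages each halve the spatial size (the stem convolution, the stem max pooling, and the first block of each of the last three residual groups). The function name `final_feature_map_size` is hypothetical and only used for illustration.

```python
import math


def final_feature_map_size(input_size: int, num_halvings: int = 5) -> int:
    """Spatial size (height or width) left after the stride-2 stages.

    Assumption: each of the five downsampling stages halves the size,
    rounding up, which is how the standard ResNet50 layout behaves.
    """
    size = input_size
    for _ in range(num_halvings):
        size = math.ceil(size / 2)
    return size


# Compare the CIFAR size with the two sizes mentioned in the text
for input_size in (32, 128, 224):
    print(input_size, "->", final_feature_map_size(input_size))
```

A 32 x 32 input is reduced to a single pixel, while a 128 x 128 input still leaves a 4 x 4 feature map for the last residual group to work with.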

### Solution

We could upsample the CIFAR-10 images upstream of the model from 32 x 32 to 128 x 128 using an interpolation algorithm such as bicubic, but this 'hardwired' interpolation may not be the best choice and may introduce artifacts. Additionally, because it runs upstream of the model, it is generally an inefficient method.

Instead, we will add a pre-stem group at the bottom (input) of a stock ResNet50 to learn the best upsampling using deconvolution. Additionally, the pre-stem becomes part of the model graph.
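
To see why a deconvolution can stand in for a fixed interpolation, here is a toy 1D transposed convolution in NumPy. It is a minimal sketch, not the Keras implementation: `transposed_conv1d` is a hypothetical helper, the kernels are hand-picked rather than learned, and the cropping only mirrors the output length that `padding='same'` produces.

```python
import numpy as np


def transposed_conv1d(x, kernel, stride=2):
    """Toy 1D transposed convolution (hypothetical helper, not a Keras API).

    Each input value scatters a kernel-weighted copy of itself into the
    output, with consecutive copies placed `stride` positions apart.
    Assumes len(kernel) >= stride so the cropped output is fully covered.
    """
    x = np.asarray(x, dtype=np.float64)
    kernel = np.asarray(kernel, dtype=np.float64)
    full = np.zeros((len(x) - 1) * stride + len(kernel))
    for i, value in enumerate(x):
        start = i * stride
        full[start:start + len(kernel)] += value * kernel
    # Crop to len(x) * stride, the length a 'same'-padded layer would output
    return full[:len(x) * stride]


signal = [1.0, 2.0, 3.0]
# This kernel repeats each input value, like nearest-neighbor upsampling
print(transposed_conv1d(signal, [1.0, 1.0, 0.0]))
# This kernel blends neighboring inputs, similar to linear interpolation
print(transposed_conv1d(signal, [0.5, 1.0, 0.5]))
```

Both fixed interpolations are just particular kernel values. In the pre-stem the kernel values are trainable weights, so training is free to settle on whatever upsampling helps the classifier most.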

### Step 1

We start with a stock `ResNet50` without a classifier and set the input shape to (128, 128, 3); we name this the `base` model.

Next, we add the classifier as a Dense layer with 10 nodes; we name the result the `resnet` model.

In [None]:
from tensorflow.keras import Sequential, Model
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.layers import Dense, Conv2DTranspose
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.layers import BatchNormalization

# Get a pre-built ResNet50 w/o the top layer (classifier) and input shape configured for 128 x 128
base = ResNet50(include_top=False, input_shape=(128, 128, 3), pooling='max')

# Add a new classifier (top) layer for the 10 classes in CIFAR-10
outputs = Dense(10, activation='softmax')(base.output)

# Rebuild the model with the new classifier
resnet = Model(base.input, outputs)
resnet.summary()

### Step 2

We construct a pre-stem group using two deconvolutions (also called transposed convolutions) and attach it to the model:

    1. The first deconvolution takes (32, 32, 3) as input and upsamples to (64, 64, 3).
    2. The second deconvolution upsamples to (128, 128, 3).
    3. We use the add() method to attach the resnet model after the pre-stem.
    
Essentially, the pre-stem takes the (32, 32, 3) CIFAR-10 inputs and outputs (128, 128, 3) tensors, which then become the input to the resnet model.
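
The shapes in the steps above follow from the usual rule that a transposed convolution with `padding='same'` multiplies the height and width by the stride. The sketch below traces the two layers in plain Python and also counts their weights. The helper names are hypothetical, and the batch normalization parameters are left out of the count.

```python
def conv2d_transpose_same_shape(input_shape, filters, stride):
    """Output shape of a transposed convolution with 'same' padding."""
    height, width, _ = input_shape
    return (height * stride, width * stride, filters)


def conv2d_transpose_params(in_channels, filters, kernel_size):
    """Weight count: one kernel per (input, output) channel pair, plus biases."""
    kernel_h, kernel_w = kernel_size
    return kernel_h * kernel_w * in_channels * filters + filters


shape = (32, 32, 3)      # CIFAR-10 input
total_params = 0
for _ in range(2):       # the two deconvolutions of the pre-stem
    total_params += conv2d_transpose_params(shape[-1], filters=3, kernel_size=(3, 3))
    shape = conv2d_transpose_same_shape(shape, filters=3, stride=2)
    print(shape)

print("pre-stem deconvolution parameters:", total_params)
```

With 3 filters and 3 x 3 kernels, the two deconvolutions add very few weights compared with the ResNet50 that follows, so the learned upsampling comes at little cost in model size.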

In [None]:
# Create the pre-stem as a Sequential model
model = Sequential()

# This is the first deconvolution, which takes the (32, 32, 3) CIFAR-10 input and outputs (64, 64, 3)
model.add(Conv2DTranspose(3, (3, 3), strides=2, padding='same', activation='relu', input_shape=(32,32,3)))
model.add(BatchNormalization())

# This is the second deconvolution, which outputs (128, 128, 3) to match the input of our ResNet50 model
model.add(Conv2DTranspose(3, (3, 3), strides=2, padding='same', activation='relu'))
model.add(BatchNormalization())

# Add the ResNet50 model as the remaining layers and rebuild
model.add(resnet)
model.compile(loss='categorical_crossentropy', optimizer=Adam(learning_rate=0.001), metrics=['acc'])

model.summary()

### Train the Model

Let's partially train the model to demonstrate how a pre-stem works. For ResNet50, I find that a reliable choice is the Adam optimizer with a learning rate of 0.001. While batch normalization should make higher learning rates possible, I find that with higher rates on ResNet50 the per-epoch loss does not converge.

We will use the built-in CIFAR-10 dataset, normalizing the image data and one-hot encoding the labels upstream of the model.

We will then use the fit() method for a small number of epochs (5), setting aside 10% of the training data as the per-epoch validation data.
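
As a small illustration of what this preparation does to the arrays, the sketch below applies the same three steps (scaling to [0, 1], one-hot encoding, and a 10% hold-out) to a tiny made-up stand-in for CIFAR-10. The random data and the `one_hot` helper are assumptions for illustration only; the notebook itself uses `to_categorical` on the real dataset.

```python
import numpy as np


def one_hot(labels, num_classes):
    """Turn integer class labels into one-hot rows (hypothetical helper)."""
    labels = np.asarray(labels).ravel()
    return np.eye(num_classes, dtype=np.float32)[labels]


# Made-up stand-in for CIFAR-10: 20 random uint8 "images" with labels 0..9
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(20, 32, 32, 3), dtype=np.uint8)
labels = rng.integers(0, 10, size=(20, 1))

# Scale pixel values from [0, 255] to [0, 1] and one-hot encode the labels
images = (images / 255.0).astype(np.float32)
targets = one_hot(labels, num_classes=10)

# Hold out the first 10% of the samples for validation
pivot = int(len(images) * 0.1)
x_val, y_val = images[:pivot], targets[:pivot]
x_train, y_train = images[pivot:], targets[pivot:]

print(x_train.shape, y_train.shape, x_val.shape, y_val.shape)
```

On the real training set of 50,000 images, the same split leaves 45,000 images for training and 5,000 for validation.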

From my test run, I got:

    Epoch 1: 27.7%
    Epoch 2: 33.8%
    Epoch 3: 42.9%
    Epoch 4: 35.9%  -- dropped into a worse local minimum
    Epoch 5: 49.1%  -- found a better local minimum to descend into

In [None]:
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.utils import to_categorical
import numpy as np

(x_train, y_train), (x_test, y_test) = cifar10.load_data()
x_train = (x_train / 255.0).astype(np.float32)
x_test  = (x_test  / 255.0).astype(np.float32)

y_train = to_categorical(y_train)
y_test  = to_categorical(y_test)

# Hold out the first 10% of the training data for per-epoch validation
pivot = int(len(x_train) * 0.1)
x_val = x_train[:pivot]
y_val = y_train[:pivot]
x_train = x_train[pivot:]
y_train = y_train[pivot:]

print(x_train.shape)

In [None]:
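# Validate on the 10% held out above rather than splitting the training data a second time
model.fit(x_train, y_train, epochs=5, batch_size=32, verbose=1, validation_data=(x_val, y_val))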