In [1]:
import tensorflow as tf

## Rebuilding other architectures
Maybe we can borrow some smart tricks from prize-winning architectures. If you have read the book, you have seen several of them. We can build their small units by hand as custom layers, and stack those like any other layer. Maybe these ideas can help us out. An Inception unit would look like this:

<img src=https://images.deepai.org/django-summernote/2019-06-18/2cec735b-2347-4ded-ae2b-e8a8384f7b46.png width=600/>

In [2]:
class InceptionUnit(tf.keras.layers.Layer):
    def __init__(self, conv1_filters, conv3_filters, conv5_filters, conv1_max_filters, pre_conv3_filters, 
                 pre_conv5_filters, **kwargs):
        super().__init__(**kwargs)
        self.conv1 = tf.keras.layers.Conv2D(filters=conv1_filters, kernel_size=1, strides=1, padding='same', activation='relu')
        self.conv3 = tf.keras.layers.Conv2D(filters=conv3_filters, kernel_size=3, strides=1, padding='same', activation='relu')
        self.conv5 = tf.keras.layers.Conv2D(filters=conv5_filters, kernel_size=5, strides=1, padding='same', activation='relu')
        self.conv1_max = tf.keras.layers.Conv2D(filters=conv1_max_filters, kernel_size=1, strides=1, padding='same', activation='relu')
        self.pre_conv3 = tf.keras.layers.Conv2D(filters=pre_conv3_filters, kernel_size=1, strides=1, padding='same', activation='relu')
        self.pre_conv5 = tf.keras.layers.Conv2D(filters=pre_conv5_filters, kernel_size=1, strides=1, padding='same', activation='relu')
        self.pre_conv1 = tf.keras.layers.MaxPool2D(pool_size=(3,3), strides=1, padding='same')
    
    def call(self, inputs):
        c1 = self.conv1(inputs)
        pre_c3 = self.pre_conv3(inputs)
        c3 = self.conv3(pre_c3)
        pre_c5 = self.pre_conv5(inputs)
        c5 = self.conv5(pre_c5)
        pre_c1m = self.pre_conv1(inputs)
        c1m = self.conv1_max(pre_c1m)
        out = tf.concat([c1, c3, c5, c1m], axis=3)
        return out

Even though your model does not have to look as complex as this, you might still get some inspiration from this architecture. For example, note how the same input is fed through several parallel branches with filters of different sizes, and how the results are concatenated along the channel axis. That might be an interesting idea, even in a simplified model.
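
One detail in the unit above that is easy to miss is the role of the 1x1 `pre_conv3` and `pre_conv5` layers: they shrink the number of channels before the more expensive 3x3 and 5x5 convolutions. A minimal sketch of the arithmetic, where the helper `conv2d_params` and the channel counts are assumptions chosen for illustration and are not taken from this notebook:

```python
def conv2d_params(kernel_size, in_channels, filters):
    """Weights plus biases of a Conv2D layer with a square kernel."""
    return kernel_size * kernel_size * in_channels * filters + filters

# Illustrative channel counts (assumed, not from the notebook)
in_channels, pre_filters, out_filters = 192, 16, 32

# A 5x5 convolution applied directly to the input
direct = conv2d_params(5, in_channels, out_filters)

# A 1x1 convolution first, then the 5x5 convolution on fewer channels
bottleneck = (conv2d_params(1, in_channels, pre_filters)
              + conv2d_params(5, pre_filters, out_filters))

print(f"5x5 directly on the input: {direct:,} parameters")
print(f"1x1 followed by 5x5:       {bottleneck:,} parameters")
```

With these numbers the branch with the 1x1 layer needs roughly a tenth of the parameters, while still producing the same number of output channels.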

Another architecture is ResNet. A ResNet module would be built like this, with a skip connection that adds the input of the block to its output.


<img src=http://d2l.ai/_images/resnet-block.svg 
     width=700/>

In [3]:
class ResidualUnit(tf.keras.layers.Layer):
    # when the layer is initialized, we create its sublayers
    def __init__(self, filters, strides=1, activation="relu", **kwargs):
        super().__init__(**kwargs)
        # keep the constructor arguments, so get_config can return them
        self.filters = filters
        self.strides = strides
        # a general activation
        self.activation = tf.keras.activations.get(activation)
        # the main layers with 3x3 kernels
        self.main_layers = [
            tf.keras.layers.Conv2D(filters, kernel_size=3, strides=strides, padding='same'),
            tf.keras.layers.BatchNormalization(),
            self.activation,
            tf.keras.layers.Conv2D(filters, kernel_size=3, strides=1, padding='same'),
            tf.keras.layers.BatchNormalization()]
        # and the skip layers: a 1x1 Conv2D, only used when the main path downsamples
        self.skip_layers = []
        if strides > 1:
            self.skip_layers = [
                tf.keras.layers.Conv2D(filters, kernel_size=1, strides=strides),
                tf.keras.layers.BatchNormalization()]

    # the main architecture.
    # we walk through the main layers, and add the skip path at the end.
    def call(self, inputs):
        Z = inputs
        for layer in self.main_layers:
            Z = layer(Z)
        skip_Z = inputs
        for layer in self.skip_layers:
            skip_Z = layer(skip_Z)
        return self.activation(Z + skip_Z)

    # this is not essential, but can sometimes give errors if left out when saving the model.
    # it returns the arguments that were passed on initialization, in a serializable form,
    # so the layer can be recreated from its config.
    def get_config(self):
        config = super().get_config()
        config.update({
            'filters': self.filters,
            'strides': self.strides,
            'activation': tf.keras.activations.serialize(self.activation),
        })
        return config

In [5]:
import sys
sys.path.insert(0, "..")
from pathlib import Path
from src.data import make_dataset
data_dir = Path("../data/raw")
make_dataset.get_raw_data(data_dir)

2021-12-21 13:20:06.836 | INFO     | src.data.make_dataset:get_raw_data:18 - found flowers in ../data/raw, not downloading again


In [6]:
from pathlib import Path
flowers_dir = Path("../data/raw/flower_photos")
flowers_dir.exists()

True

In [13]:
targetsize = (150, 150)
train, valid = make_dataset.dataset_from_dir(flowers_dir, targetsize=targetsize)

Found 3670 files belonging to 5 classes.
Using 2936 files for training.
Found 3670 files belonging to 5 classes.
Using 734 files for validation.


Let's rebuild the complete architecture from [ResNet34](https://arxiv.org/pdf/1512.03385.pdf).
From their paper:
>We use SGD with a mini-batch size of 256. The learning rate starts from 0.1 and is divided by 10 when the error plateaus, and the models are trained for up to $60\times 10^4$ iterations. We use a weight decay of 0.0001 and a momentum of 0.9. We do not use dropout

And the table from their paper with architectures for different depths:
<img src=https://miro.medium.com/max/1400/1*aq0q7gCvuNUqnMHh4cpnIw.png width=600/>

Let us implement the 34-layer ResNet.

In [18]:
from tensorflow.keras.layers import (
    Activation,
    Input,
    Conv2D,
    BatchNormalization, 
    MaxPool2D, 
    GlobalAveragePooling2D,
    Dense,
)

inputs = Input(shape=targetsize + (3,))
x = Conv2D(64, 7, strides=2, padding='same')(inputs)
x = BatchNormalization()(x)
x = Activation('relu')(x)
x = MaxPool2D(pool_size=3, strides=2, padding='same')(x)
prev = 64
for filters in [64] * 3 + [128] * 4 + [256] * 6 + [512] * 3:
    # downsample with strides=2 whenever the number of filters changes
    strides = 1 if filters == prev else 2
    x = ResidualUnit(filters, strides=strides)(x)
    prev = filters

x = GlobalAveragePooling2D()(x)
output = Dense(5)(x)

model = tf.keras.Model(inputs=[inputs], outputs=[output])
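
To see where the "34" comes from, we can count the layers implied by the filter list. A small sketch in plain Python, assuming the usual counting convention in which only the stem convolution, the two 3x3 convolutions per residual unit and the final Dense layer are counted (the 1x1 convolutions on the skip path are left out):

```python
stage_filters = [64] * 3 + [128] * 4 + [256] * 6 + [512] * 3

# Same stride rule as in the model: stride 2 whenever the filter count changes
prev = 64
strides = []
for filters in stage_filters:
    strides.append(1 if filters == prev else 2)
    prev = filters

n_units = len(stage_filters)
# stem Conv2D + two Conv2D layers per residual unit + the output Dense layer
weighted_layers = 1 + 2 * n_units + 1
# positions of the residual units that halve the spatial size
downsampling_units = [i for i, s in enumerate(strides) if s == 2]

print(n_units, weighted_layers, downsampling_units)
```

So there are 16 residual units, and only the first unit of the 128, 256 and 512 stages uses `strides=2`.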

In [24]:
from tensorflow.keras.optimizers import SGD, Adam
optimizer = SGD(learning_rate=1e-3, momentum=0.9)

model.compile(
  optimizer=optimizer,
  loss=tf.losses.SparseCategoricalCrossentropy(from_logits=True), 
  metrics=['accuracy'])

In [25]:
model.summary()

Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_3 (InputLayer)        [(None, 150, 150, 3)]     0         
                                                                 
 conv2d_37 (Conv2D)          (None, 75, 75, 64)        9472      
                                                                 
 batch_normalization_36 (Bat  (None, 75, 75, 64)       256       
 chNormalization)                                                
                                                                 
 activation_1 (Activation)   (None, 75, 75, 64)        0         
                                                                 
 max_pooling2d_1 (MaxPooling  (None, 38, 38, 64)       0         
 2D)                                                             
                                                                 
 residual_unit_16 (ResidualU  (None, 38, 38, 64)       74368 

Note how the image is reduced to 5x5 feature maps (7x7 for the 224x224 inputs used in the paper), while the number of channels grows to 512 different features! Also, the hidden Dense layers are completely removed! We have just one Dense layer as the output, and the rest of the model is a full stack of Conv2D layers.
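
The spatial sizes can be checked by hand. A minimal sketch, assuming that every downsampling step uses `strides=2` with `padding='same'`, so that the output size is the input size divided by two and rounded up (`same_out` and `trace` are helper names made up for this example):

```python
import math

def same_out(size, stride):
    """Output size of a layer with padding='same'."""
    return math.ceil(size / stride)

def trace(size):
    """Spatial size after the stem, the max pool and the three downsampling stages."""
    sizes = [size]
    size = same_out(size, 2)   # stem Conv2D with strides=2
    sizes.append(size)
    size = same_out(size, 2)   # MaxPool2D with strides=2
    sizes.append(size)
    for _ in range(3):         # first unit of the 128, 256 and 512 stages
        size = same_out(size, 2)
        sizes.append(size)
    return sizes

print(trace(150))  # our flower images
print(trace(224))  # the input size used in the paper
```

For a 150x150 input this gives 150, 75, 38, 19, 10, 5, and for a 224x224 input it gives 224, 112, 56, 28, 14, 7.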

In [26]:
from tensorflow.keras.callbacks import ReduceLROnPlateau, TensorBoard
logdir = Path("logs/resnet")
# min_lr has to be below the starting learning rate of 1e-3, otherwise the rate is never reduced
cb = ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=4, verbose=1, min_lr=1e-5)
tb = TensorBoard(logdir, histogram_freq=1)

model.fit(
  train,
  validation_data=valid,
  callbacks=[cb, tb],
  epochs=10
)

Epoch 1/10
 5/92 [>.............................] - ETA: 3:56 - loss: 2.1248 - accuracy: 0.2313

KeyboardInterrupt: 
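
The recipe quoted from the paper divides the learning rate by 10 when the error plateaus, which is what the `ReduceLROnPlateau` callback is meant to do. A simplified sketch of that rule in plain Python; this is not Keras's actual implementation, and `plateau_schedule` and the loss values are made up for illustration:

```python
def plateau_schedule(val_losses, lr, factor=0.1, patience=4, min_lr=1e-5):
    """Return the learning rate after each epoch under a simple plateau rule."""
    best = float("inf")
    wait = 0
    history = []
    for loss in val_losses:
        if loss < best:
            # improvement: remember it and reset the counter
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                # no improvement for `patience` epochs: shrink the rate,
                # but never below min_lr
                lr = max(lr * factor, min_lr)
                wait = 0
        history.append(lr)
    return history

# three improving epochs, followed by eight epochs without improvement
losses = [1.0, 0.9, 0.8] + [0.8] * 8
rates = plateau_schedule(losses, lr=1e-3)
print(rates)
```

The rate stays at 1e-3 while the loss improves, drops to about 1e-4 after four stagnant epochs, and to about 1e-5 after four more. Note the role of `min_lr`: it is a floor, so it only makes sense when it is smaller than the starting rate.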

Training for 100 epochs (the cell above only asks for 10) will take over an hour, even with a GPU on Colab. Note that you can't leave your model unattended on Colab; if your laptop goes to sleep, Colab might shut down your machine.
But after 80 epochs, I got over 80% accuracy!

If you try this on the hub (which does not have enough RAM), or on a laptop with less than 16GB of RAM, the Jupyter notebook will crash because of the memory the model needs.

Yet on Colab it does train, and the accuracy will get above 80% while still slowly increasing. And if you have a machine where you can keep this running unattended, an hour is no problem at all.

Also, realize that the ResNet they used to win a prize was trained for up to $60\times 10^4$ iterations, that is 600,000 mini-batches of 256 images each!! Again, that is a completely different order of magnitude for training your model...
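
To put the numbers from the paper next to ours, we can convert between iterations and epochs. A back-of-the-envelope sketch: the ImageNet training set size of 1.28 million images is the figure reported in the paper, the 92 steps per epoch come from the progress bar above, and the variable names are just for this example:

```python
# ResNet paper: up to 60 x 10^4 iterations with mini-batches of 256 images
paper_iterations = 60 * 10**4
paper_batch_size = 256
imagenet_train_images = 1_280_000  # "1.28 million training images" in the paper

paper_images_seen = paper_iterations * paper_batch_size
paper_epochs = paper_images_seen / imagenet_train_images

# Our flowers run: 92 steps per epoch, so 100 epochs would be
flowers_iterations = 100 * 92

print(f"paper:   {paper_iterations:,} iterations, about {paper_epochs:.0f} epochs")
print(f"flowers: {flowers_iterations:,} iterations for 100 epochs")
```

So the paper's schedule amounts to roughly 120 passes over ImageNet, and about 65 times as many weight updates as 100 epochs on our flowers.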