In [None]:
# Copyright 2019 Google LLC
# 
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

<a target="_blank" href="https://colab.research.google.com/github/GoogleCloudPlatform/keras-idiomatic-programmer/blob/master/community-labs/Community Lab - Encoders for CNN.ipynb">
<img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>

For best performance in Colab, once the notebook is launched, select **Runtime -> Change Runtime Type** from the dropdown menu, and select **GPU** for **Hardware Accelerator**.

# Composable "Design Pattern" for AutoML friendly models

## Community Lab 1: Training Encoder for CNN

### Objective

To replace a traditional "stem convolution group" of higher input dimensionality with a lower dimensionality encoding, learned by first training an autoencoder on the dataset. The goal is that by using a lower dimensionality encoding, one can substantially reduce the training time of a model.

*Question*: Can one achieve the same accuracy as using the original input image?

*Question*: How much can we speed up training?

### Approach

We will use the composable design pattern, and prebuilt units from the Google Cloud AI Developer Relations repo: [Model Zoo](https://github.com/GoogleCloudPlatform/keras-idiomatic-programmer/tree/master/zoo)

If you are not familiar with the Composable design pattern, we recommend you review the [ResNet](https://github.com/GoogleCloudPlatform/keras-idiomatic-programmer/tree/master/zoo/resnet) model in our zoo. Then review the [AutoEncoder](https://github.com/GoogleCloudPlatform/keras-idiomatic-programmer/tree/master/zoo/autoencoder) model.

We recommend a constant set of hyperparameters, where batch_size is 32 and the initial learning rate is 0.001 -- but you may use any hyperparameter values you prefer.

We will use the metaparameters feature in the composable design pattern for the macro architecture search -- sort of a 'human assisted AutoML'.

We recommend using warmup training to numerically stabilize the weights before full training.

### Reporting Findings

You can share your findings with us via the Twitter account: @andrewferlitsch

### Dataset

In this notebook, we use the CIFAR-10 dataset, which consists of 32x32x3 images in 10 classes -- but you may use any dataset you prefer.

### Steps

1. Build and Train an AutoEncoder for CIFAR10 (or your dataset).

2. Extract the pretrained Encoder network from the trained AutoEncoder.

3. Preprocess the training and test data with the Encoder.

4. Build a composable model for CIFAR10 using the Encoder embedding.

5. Use warmup to initialize the weights on the model.

6. Train the model with the encoded training set.

7. Evaluate the model with the encoded test set.

8. Repeat, making macro architecture modifications to the AutoEncoder and/or model.

## Lab

### Imports

In [None]:
import tensorflow as tf
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Conv2D, Flatten, Conv2DTranspose, ReLU, Add, Dense, Dropout, Activation
from tensorflow.keras.layers import BatchNormalization, GlobalAveragePooling2D, ZeroPadding2D, MaxPooling2D
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.regularizers import l2
from tensorflow.keras.callbacks import LearningRateScheduler
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.utils import to_categorical
import numpy as np

### Import Composable class

Composable is a super (base) class that is inherited by our models which are coded using the Composable design pattern. It provides many abstracted functions in the construction and training of the models. Don't concern yourself about the details; it's not necessary to know how the underlying base works for the purpose of this lab.
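
As a rough illustration of the idea only (this is not the Composable class itself), the sketch below shows how class-level defaults, constructor arguments, and per-call hyperparameters can override one another. The names `Base` and `resolve` are hypothetical and exist only for this example.

```python
class Base:
    """Hypothetical stand-in for a Composable-style base class."""
    init_weights = 'he_normal'   # class-level default
    reg = None                   # class-level default

    def __init__(self, init_weights=None, reg=None):
        # Constructor arguments override the class defaults only when given
        if init_weights is not None:
            self.init_weights = init_weights
        if reg is not None:
            self.reg = reg

    def resolve(self, **hyperparameters):
        """Return the settings a single layer would end up using."""
        return {
            # Per-call values win; otherwise fall back to the instance/class value
            'init_weights': hyperparameters.get('init_weights', self.init_weights),
            'reg': hyperparameters.get('reg', self.reg),
        }


model = Base(reg='l2')
defaults = model.resolve()
overridden = model.resolve(init_weights='glorot_uniform')
print(defaults)
print(overridden)
```

The per-call override is what lets a single layer deviate from the model-wide settings without changing the model's defaults.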

In [None]:
# from models_c.py
class Composable(object):
    ''' Composable base (super) class for Models '''
    init_weights = 'he_normal'  # weight initialization
    reg          = None         # kernel regularizer
    relu         = None         # ReLU max value
    bias         = True         # whether to use bias in dense/conv layers

    def __init__(self, init_weights=None, reg=None, relu=None, bias=True):
        """ Constructor
            init_weights : kernel initializer
            reg          : kernel regularizer
            relu         : clip value for ReLU
            bias         : whether to use bias
        """
        if init_weights is not None:
            self.init_weights = init_weights
        if reg is not None:
            self.reg = reg
        if relu is not None:
            self.relu = relu
        if bias is not None:
            self.bias = bias

        # Feature maps encoding at the bottleneck layer in classifier (high dimensionality)
        self._encoding = None
        # Pooled and flattened encodings at the bottleneck layer (low dimensionality)
        self._embedding = None
        # Pre-activation conditional probabilities for classifier
        self._probabilities = None
        # Post-activation conditional probabilities for classifier
        self._softmax = None

        self._model = None

    @property
    def model(self):
        return self._model

    @model.setter
    def model(self, _model):
        self._model = _model

    @property
    def encoding(self):
        return self._encoding

    @encoding.setter
    def encoding(self, layer):
        self._encoding = layer

    @property
    def embedding(self):
        return self._embedding

    @embedding.setter
    def embedding(self, layer):
        self._embedding = layer

    @property
    def probabilities(self):
        return self._probabilities

    @probabilities.setter
    def probabilities(self, layer):
        self._probabilities = layer

    def classifier(self, x, n_classes, **metaparameters):
      """ Construct the Classifier Group 
          x         : input to the classifier
          n_classes : number of output classes
          pooling   : type of feature map pooling
      """
      if 'pooling' in metaparameters:
          pooling = metaparameters['pooling']
      else:
          pooling = GlobalAveragePooling2D
      if 'dropout' in metaparameters:
          dropout = metaparameters['dropout']
      else:
          dropout = None

      if pooling is not None:
          # Save the encoding layer (high dimensionality)
          self.encoding = x

          # Pooling at the end of all the convolutional groups
          x = pooling()(x)

          # Save the embedding layer (low dimensionality)
          self.embedding = x

      if dropout is not None:
          x = Dropout(dropout)(x)

      # Final Dense Outputting Layer for the outputs
      x = self.Dense(x, n_classes, use_bias=True, **metaparameters)
      
      # Save the pre-activation probabilities layer
      self.probabilities = x
      outputs = Activation('softmax')(x)
      # Save the post-activation probabilities layer
      self.softmax = outputs
      return outputs

    def Dense(self, x, units, activation=None, use_bias=True, **hyperparameters):
        """ Construct Dense Layer
            x           : input to layer
            activation  : activation function
            use_bias    : whether to use bias
            init_weights: kernel initializer
            reg         : kernel regularizer
        """
        if 'reg' in hyperparameters:
            reg = hyperparameters['reg']
        else:
            reg = self.reg
        if 'init_weights' in hyperparameters:
            init_weights = hyperparameters['init_weights']
        else:
            init_weights = self.init_weights
            
        x = Dense(units, activation, use_bias=use_bias,
                  kernel_initializer=init_weights, kernel_regularizer=reg)(x)
        return x

    def Conv2D(self, x, n_filters, kernel_size, strides=(1, 1), padding='valid', activation=None, **hyperparameters):
        """ Construct a Conv2D layer
            x           : input to layer
            n_filters   : number of filters
            kernel_size : kernel (filter) size
            strides     : strides
            padding     : how to pad when filter overlaps the edge
            activation  : activation function
            use_bias    : whether to include the bias
            init_weights: kernel initializer
            reg         : kernel regularizer
        """
        if 'reg' in hyperparameters:
            reg = hyperparameters['reg']
        else:
            reg = self.reg
        if 'init_weights' in hyperparameters:
            init_weights = hyperparameters['init_weights']
        else:
            init_weights = self.init_weights
        if 'bias' in hyperparameters:
            bias = hyperparameters['bias']
        else:
            bias = self.bias

        x = Conv2D(n_filters, kernel_size, strides=strides, padding=padding, activation=activation,
                   use_bias=bias, kernel_initializer=init_weights, kernel_regularizer=reg)(x)
        return x

    def Conv2DTranspose(self, x, n_filters, kernel_size, strides=(1, 1), padding='valid', activation=None, **hyperparameters):
        """ Construct a Conv2DTranspose layer
            x           : input to layer
            n_filters   : number of filters
            kernel_size : kernel (filter) size
            strides     : strides
            padding     : how to pad when filter overlaps the edge
            activation  : activation function
            use_bias    : whether to include the bias
            init_weights: kernel initializer
            reg         : kernel regularizer
        """
        if 'reg' in hyperparameters:
            reg = hyperparameters['reg']
        else:
            reg = self.reg
        if 'init_weights' in hyperparameters:
            init_weights = hyperparameters['init_weights']
        else:
            init_weights = self.init_weights 
        if 'bias' in hyperparameters:
            bias = hyperparameters['bias']
        else:
            bias = self.bias

        x = Conv2DTranspose(n_filters, kernel_size, strides=strides, padding=padding, activation=activation, 
                            use_bias=bias, kernel_initializer=init_weights, kernel_regularizer=reg)(x)
        return x



    def ReLU(self, x):
        """ Construct ReLU activation function
            x  : input to activation function
        """
        x = ReLU(self.relu)(x)
        return x


    def BatchNormalization(self, x, **params):
        """ Construct a Batch Normalization function
            x : input to function
        """
        x = BatchNormalization(epsilon=1.001e-5, **params)(x)
        return x

    ###
    # Preprocessing
    ###

    def normalization(self, x_train, x_test=None, centered=False):
        """ Normalize the input
            x_train : training images
            x_test  : test images
            centered: if True, scale to [-1, 1]; otherwise scale to [0, 1]
        """
        if x_train.dtype == np.uint8:
            if centered:
                # Divide first, then shift: subtracting from uint8 would wrap around
                x_train = (x_train / 127.5 - 1).astype(np.float32)
                if x_test is not None:
                    x_test  = (x_test  / 127.5 - 1).astype(np.float32)
            else:
                x_train = (x_train / 255.0).astype(np.float32)
                if x_test is not None:
                    x_test  = (x_test  / 255.0).astype(np.float32)
        return x_train, x_test

    def standardization(self, x_train, x_test):
        """ Standardize the input
            x_train : training images
            x_test  : test images
        """
        self.mean = np.mean(x_train)
        self.std  = np.std(x_train)
        x_train = ((x_train - self.mean) / self.std).astype(np.float32)
        x_test  = ((x_test  - self.mean) / self.std).astype(np.float32)
        return x_train, x_test

    def label_smoothing(self, y_train, n_classes, factor=0.1):
        """ Convert a matrix of one-hot row-vector labels into smoothed versions. 
            y_train  : training labels
            n_classes: number of classes
            factor   : smoothing factor (between 0 and 1)
        """
        if 0 <= factor <= 1:
            # label smoothing ref: https://www.robots.ox.ac.uk/~vgg/rg/papers/reinception.pdf
            y_train *= 1 - factor
            y_train += factor / n_classes
        else:
            raise Exception('Invalid label smoothing factor: ' + str(factor))
        return y_train

    ###
    # Training
    ###

    def compile(self, loss='categorical_crossentropy', optimizer=Adam(lr=0.001, decay=1e-5), metrics=['acc']):
        """ Compile the model for training
            loss     : the loss function
            optimizer: the optimizer
            metrics  : metrics to report
        """
        self.model.compile(loss=loss, optimizer=optimizer, metrics=metrics)
        
    def warmup_scheduler(self, epoch, lr):
        """ learning rate scheduler for warmup training
            epoch : current epoch iteration
            lr    : current learning rate
        """
        if epoch == 0:
           return lr
        return epoch * self.w_lr / self.w_epochs

    def warmup(self, x_train, y_train, epochs=5, s_lr=1e-6, e_lr=0.001):
        """ Warmup for numerical stability
            x_train : training images
            y_train : training labels
            epochs  : number of epochs for warmup
            s_lr    : start warmup learning rate
            e_lr    : end warmup learning rate
        """
        print("*** Warmup")
        # Setup learning rate scheduler
        self.compile(optimizer=Adam(s_lr))
        lrate = LearningRateScheduler(self.warmup_scheduler, verbose=1)
        self.w_epochs = epochs
        self.w_lr     = e_lr - s_lr

        # Train the model
        self.model.fit(x_train, y_train, epochs=epochs, batch_size=32, verbose=1,
                       callbacks=[lrate])
        
    def cosine_decay(self, epoch, lr, alpha=0.0):
        """ Cosine Decay
        """
        cosine_decay = 0.5 * (1 + np.cos(np.pi * (self.e_steps * epoch) / self.t_steps))
        decayed = (1 - alpha) * cosine_decay + alpha
        return lr * decayed

    def training_scheduler(self, epoch, lr):
        """ Learning Rate scheduler for full-training
            epoch : epoch number
            lr    : current learning rate
        """
        # First epoch (not started) - do nothing
        if epoch == 0:
            return lr

        # Decay the learning rate
        if self.t_decay > 0:
            lr -= self.t_decay
            self.t_decay *= 0.9 # decrease the decay
        else:
            lr = self.cosine_decay(epoch, lr)
        return lr

    def training(self, x_train, y_train, epochs=10, batch_size=32, lr=0.001, decay=0):
        """ Full Training of the Model
            x_train    : training images
            y_train    : training labels
            epochs     : number of epochs
            batch_size : size of batch
            lr         : learning rate
            decay      : step-wise learning rate decay
        """

        # Check for hidden dropout layer in classifier
        for layer in self.model.layers:
            if isinstance(layer, Dropout):
                self.hidden_dropout = layer
                break    

        self.t_decay = decay
        self.e_steps = x_train.shape[0] // batch_size
        self.t_steps = self.e_steps * epochs
        self.compile(optimizer=Adam(lr=lr, decay=decay))

        lrate = LearningRateScheduler(self.training_scheduler, verbose=1)
        self.model.fit(x_train, y_train, epochs=epochs, batch_size=batch_size, validation_split=0.1, verbose=1,
                       callbacks=[lrate])

### Get the Dataset

Load the dataset into memory as numpy arrays, and then normalize the image data (preprocessing).
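
To make the preprocessing concrete, here is a small sketch on a made-up array rather than CIFAR-10. It scales `uint8` pixel values into the range [0, 1] and one-hot encodes integer labels with plain NumPy, which should give the same result as `to_categorical` for this case. The array contents and variable names are assumptions for illustration.

```python
import numpy as np

# Made-up stand-in for image data: 2 "images" of 2x2 pixels with 3 channels
images = np.array([[[[0, 128, 255]] * 2] * 2] * 2, dtype=np.uint8)

# Scale pixel values from [0, 255] to [0, 1] and store them as float32
scaled = (images / 255.0).astype(np.float32)

# CIFAR-10 style labels have shape (n, 1); flatten before indexing
labels = np.array([[3], [0]])
one_hot = np.eye(10, dtype=np.float32)[labels.reshape(-1)]

print(scaled.shape, scaled.dtype, scaled.min(), scaled.max())
print(one_hot)
```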

In [None]:
from tensorflow.keras.datasets import cifar10
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
x_train = (x_train / 255.0).astype(np.float32)
x_test  = (x_test / 255.0).astype(np.float32)
print(x_train.shape)

y_train = to_categorical(y_train, 10)
y_test  = to_categorical(y_test, 10)

### Build the AutoEncoder for CIFAR-10

Now, let's build the AutoEncoder for the dataset.

In our example, the dimensionality of the input (32 x 32 x 3 = 3072 values) is reduced down to 512 (4 x 4 x 32) at the bottleneck layer (ReLU (None, 4, 4, 32)).
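
The arithmetic behind those numbers can be checked with a short sketch. It assumes what the encoder below does: each layer is a convolution with stride 2 and `'same'` padding, which halves the height and width (rounding up) and sets the channel count to the layer's filter count. The helper name `encoder_output_shape` is hypothetical.

```python
import math


def encoder_output_shape(input_shape, layers):
    """Shape after a stack of stride-2, 'same'-padded convolutions."""
    height, width, channels = input_shape
    for layer in layers:
        # With 'same' padding, output size = ceil(input size / stride)
        height = math.ceil(height / 2)
        width = math.ceil(width / 2)
        channels = layer['n_filters']
    return (height, width, channels)


input_shape = (32, 32, 3)
layers = [{'n_filters': 64}, {'n_filters': 32}, {'n_filters': 32}]

encoding_shape = encoder_output_shape(input_shape, layers)
input_size = math.prod(input_shape)
encoding_size = math.prod(encoding_shape)

print(encoding_shape)              # (4, 4, 32)
print(input_size, encoding_size)   # 3072 512
print(input_size / encoding_size)  # 6.0
```

Under these assumptions the encoding holds one sixth as many values as the original image.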

In [None]:
# from autoencoder/autoencoder_c.py

class AutoEncoder(Composable):
    ''' Construct an AutoEncoder '''
    # metaparameter: number of filters per layer
    layers = [ {'n_filters': 64 }, { 'n_filters': 32 }, { 'n_filters': 16 } ]

    def __init__(self, layers=None, input_shape=(32, 32, 3),
                 init_weights='he_normal', reg=None, relu=None, bias=True):
        ''' Construct an AutoEncoder
            input_shape : input shape to the autoencoder
            layers      : the number of filters per layer
            init_weights: kernel initializer
            reg         : kernel regularizer
            relu        : clip value for ReLU
            bias        : whether to use bias
        '''
        # Configure base (super) class
        super().__init__(init_weights=init_weights, reg=reg, relu=relu, bias=bias)

        if layers is None:
           layers = self.layers

        # remember the layers
        self.layers = layers

        # remember the input shape
        self.input_shape = input_shape

        inputs = Input(input_shape)
        encoder = self.encoder(inputs, layers=layers)
        outputs = self.decoder(encoder, layers=layers)
        self._model = Model(inputs, outputs)

    def encoder(self, x, **metaparameters):
        ''' Construct the Encoder 
            x     : input to the encoder
            layers: number of filters per layer
        '''
        layers = metaparameters['layers']

        # Progressive Feature Pooling
        for layer in layers:
            n_filters = layer['n_filters']
            x = self.Conv2D(x, n_filters, (3, 3), strides=2, padding='same')
            x = self.BatchNormalization(x)
            x = self.ReLU(x)

        # The Encoding
        return x

    def decoder(self, x, init_weights=None, **metaparameters):
        ''' Construct the Decoder
            x     : input to the decoder
            layers: filters per layer
        '''
        layers = metaparameters['layers']

        # Progressive Feature Unpooling
        for i in range(len(layers)-1, 0, -1):
            n_filters = layers[i]['n_filters']
            x = self.Conv2DTranspose(x, n_filters, (3, 3), strides=2, padding='same')
            x = self.BatchNormalization(x)
            x = self.ReLU(x)

        # Last unpooling and match shape to input
        x = self.Conv2DTranspose(x, 3, (3, 3), strides=2, padding='same')
        x = self.BatchNormalization(x)
        x = self.ReLU(x)

        # The decoded image
        return x

    def compile(self, optimizer='adam'):
        ''' Compile the model using Mean Square Error loss '''
        self._model.compile(loss='mse', optimizer=optimizer, metrics=['accuracy'])

    def extract(self):
        ''' Extract the pretrained encoder
        '''
        # Get the trained weights from the autoencoder
        weights = self._model.get_weights()

        # Extract out the weights for just the encoder  (6 sets per layer)
        encoder_weights = weights[0 : int((6 * len(self.layers)))]
  
        # Construct a copy of the encoder
        inputs = Input(self.input_shape)
        outputs = self.encoder(inputs, layers=self.layers)
        encoder = Model(inputs, outputs)

        # Initialize the encoder with the pretrained weights
        encoder.set_weights(encoder_weights)

        return encoder

In [None]:
autoencoder = AutoEncoder(input_shape=(32, 32, 3), layers=[{'n_filters': 64}, {'n_filters': 32}, {'n_filters': 32}])
autoencoder.model.summary()

### Warmup Training for AutoEncoder

Let's numerically stabilize the weights (which are initialized by a random draw from a distribution, i.e., he_normal) using warmup.

We will start with a very low learning rate (1e-6) and over five epochs incrementally step it up toward our target learning rate (0.001).
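
To see which learning rates this produces, the sketch below reproduces the arithmetic of `warmup_scheduler` from the Composable class above in plain Python, without TensorFlow. The function name `warmup_rates` is hypothetical; the formula follows the class as written, where epoch 0 keeps the starting rate and later epochs use `epoch * (e_lr - s_lr) / epochs`.

```python
def warmup_rates(epochs=5, s_lr=1e-6, e_lr=0.001):
    """Learning rate per epoch, mirroring Composable.warmup_scheduler."""
    ramp = e_lr - s_lr
    rates = []
    for epoch in range(epochs):
        if epoch == 0:
            # First epoch keeps the rate the optimizer was compiled with
            rates.append(s_lr)
        else:
            rates.append(epoch * ramp / epochs)
    return rates


rates = warmup_rates()
for epoch, rate in enumerate(rates):
    print(f"epoch {epoch}: lr = {rate:.7f}")
```

With these formulas the last warmup epoch runs at four fifths of the ramp, so the rate approaches the target without reaching it during warmup.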

In [None]:
autoencoder.warmup(x_train, x_train, epochs=5, s_lr=1e-6, e_lr=0.001)

### Train the AutoEncoder

Let's now fully train the autoencoder on our image data for 20 epochs -- but you may choose to use more.

*When using Colab with runtime=GPU, this takes about 4 minutes.*

*You should see a validation accuracy of ~80%.*
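
The `training` method of the Composable class above attaches a learning rate scheduler that, when `decay` is 0, applies a cosine factor each epoch. Because the Keras `LearningRateScheduler` callback passes the current learning rate into the schedule function, that factor is applied to the already reduced rate, so the reductions compound. The following sketch simulates this without TensorFlow; `cosine_factor` and `simulate_rates` are hypothetical names, and the ratio `e_steps * epoch / t_steps` from the class is simplified to the equivalent `epoch / epochs`.

```python
import math


def cosine_factor(epoch, epochs, alpha=0.0):
    """Cosine multiplier for a given epoch, as in Composable.cosine_decay."""
    cosine = 0.5 * (1 + math.cos(math.pi * epoch / epochs))
    return (1 - alpha) * cosine + alpha


def simulate_rates(epochs=20, lr=0.001):
    """Feed each epoch's result back in as the next epoch's current rate."""
    rates = []
    for epoch in range(epochs):
        if epoch > 0:   # epoch 0 leaves the rate unchanged
            lr = lr * cosine_factor(epoch, epochs)
        rates.append(lr)
    return rates


sim_rates = simulate_rates()
for epoch, rate in enumerate(sim_rates):
    print(f"epoch {epoch:2d}: lr = {rate:.3e}")
```

If the rate falls faster than you want, raising `alpha` in the sketch shows how a floor on the multiplier changes the curve.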

In [None]:
autoencoder.compile(optimizer='adam')
autoencoder.training(x_train, x_train, epochs=20, batch_size=32)

Let's see what the accuracy is on the test (holdout) data.

In [None]:
autoencoder.model.evaluate(x_test, x_test)

### Extract the pre-trained Encoder

Next, we will extract the pretrained encoder from our trained autoencoder.
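
The `extract` method relies on the order of the autoencoder's weight list: the encoder's arrays come first, six per layer. A rough count, assuming each encoder layer is one Conv2D with a bias followed by one BatchNormalization, is sketched below. The helper name `encoder_weight_arrays` is hypothetical.

```python
def encoder_weight_arrays(n_layers, use_bias=True):
    """Number of weight arrays belonging to the encoder layers."""
    conv_arrays = 2 if use_bias else 1  # kernel, plus bias when enabled
    bn_arrays = 4                       # gamma, beta, moving mean, moving variance
    return n_layers * (conv_arrays + bn_arrays)


layers = [{'n_filters': 64}, {'n_filters': 32}, {'n_filters': 32}]
n_arrays = encoder_weight_arrays(len(layers))
print(n_arrays)  # 18, i.e. 6 * 3

# Without a bias each layer contributes only 5 arrays, so a slice of
# 6 * len(layers) would reach into the decoder's weights.
print(encoder_weight_arrays(len(layers), use_bias=False))  # 15
```

This is why the slice in `extract` fits the default `bias=True` configuration used in this lab.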

In [None]:
encoder = autoencoder.extract()

### Encode the CIFAR-10 Training Data

Next, we will encode the higher dimensional training data (*x_train*) into the lower dimensional encoding (*e_train*).

In [None]:
e_train = encoder.predict(x_train)

### Build mini-ResNet with Encoding as input (no stem convolution)

Let's now use the composable design pattern for ResNet to build a mini-resnet model (*e_resnet*).

In [None]:
class ResNetV2(Composable):
    """ Construct a Residual Convolutional Neural Network V2 """
    # Meta-parameter: list of groups: number of filters and number of blocks
    groups = { 50 : [ { 'n_filters' : 64, 'n_blocks': 3 },
                      { 'n_filters': 128, 'n_blocks': 4 },
                      { 'n_filters': 256, 'n_blocks': 6 },
                      { 'n_filters': 512, 'n_blocks': 3 } ],            # ResNet50
               101: [ { 'n_filters' : 64, 'n_blocks': 3 },
                      { 'n_filters': 128, 'n_blocks': 4 },
                      { 'n_filters': 256, 'n_blocks': 23 },
                      { 'n_filters': 512, 'n_blocks': 3 } ],            # ResNet101
               152: [ { 'n_filters' : 64, 'n_blocks': 3 },
                      { 'n_filters': 128, 'n_blocks': 8 },
                      { 'n_filters': 256, 'n_blocks': 36 },
                      { 'n_filters': 512, 'n_blocks': 3 } ]             # ResNet152
             }

    def __init__(self, n_layers, input_shape=(224, 224, 3), n_classes=1000, 
                 reg=l2(0.001), relu=None, init_weights='he_normal', bias=False):
        """ Construct a Residual Convolutional Neural Network V2
            n_layers    : number of layers
            input_shape : input shape
            n_classes   : number of output classes
            reg         : kernel regularizer
            init_weights: kernel initializer
            relu        : max value for ReLU
            bias        : whether to include a bias in the dense/conv layers
        """
        # Configure base (super) class
        super().__init__(reg=reg, init_weights=init_weights, relu=relu, bias=bias)

        # predefined
        if isinstance(n_layers, int):
            if n_layers not in [50, 101, 152]:
                raise Exception("ResNet: Invalid value for n_layers")
            groups = self.groups[n_layers]
        # user defined
        else:
            groups = n_layers

        # The input tensor
        inputs = Input(input_shape)

        # The stem convolutional group
        x = self.stem(inputs)

        # The learner
        x = self.learner(x, groups=groups)

        # The classifier 
        # Add hidden dropout for training-time regularization
        outputs = self.classifier(x, n_classes, dropout=0.0)

        # Instantiate the Model
        self._model = Model(inputs, outputs)

    def stem(self, inputs):
        """ Construct the Stem Convolutional Group 
            inputs : the input vector
        """
        # The 224x224 images are zero padded (black - no signal) to be 230x230 images prior to the first convolution
        x = ZeroPadding2D(padding=(3, 3))(inputs)
    
        # First Convolutional layer uses large (coarse) filter
        x = self.Conv2D(x, 64, (7, 7), strides=(2, 2), padding='valid')
        x = self.BatchNormalization(x)
        x = self.ReLU(x)
    
        # Pooled feature maps will be reduced by 75%
        x = ZeroPadding2D(padding=(1, 1))(x)
        x = MaxPooling2D((3, 3), strides=(2, 2))(x)
        return x

    def learner(self, x, **metaparameters):
        """ Construct the Learner
            x     : input to the learner
            groups: list of groups: number of filters and blocks
        """
        groups = metaparameters['groups']

        # First Residual Block Group (not strided)
        # Index instead of pop() so the caller's list and the class-level
        # presets are not mutated
        x = self.group(x, strides=(1, 1), **groups[0])

        # Remaining Residual Block Groups (strided)
        for group in groups[1:]:
            x = self.group(x, **group)
        return x
    
    def group(self, x, strides=(2, 2), **metaparameters):
        """ Construct a Residual Group
            x         : input into the group
            strides   : whether the projection block is a strided convolution
            n_blocks  : number of residual blocks with identity link
        """
        n_blocks  = metaparameters['n_blocks']

        # Double the size of filters to fit the first Residual Block
        x = self.projection_block(x, strides=strides, **metaparameters)

        # Identity residual blocks
        for _ in range(n_blocks):
            x = self.identity_block(x, **metaparameters)
        return x

    def identity_block(self, x, **metaparameters):
        """ Construct a Bottleneck Residual Block with Identity Link
            x        : input into the block
            n_filters: number of filters
        """
        n_filters = metaparameters['n_filters']
        del metaparameters['n_filters']
    
        # Save input vector (feature maps) for the identity link
        shortcut = x
    
        ## Construct the 1x1, 3x3, 1x1 convolution block
    
        # Dimensionality reduction
        x = self.BatchNormalization(x)
        x = self.ReLU(x)
        x = self.Conv2D(x, n_filters, (1, 1), strides=(1, 1), **metaparameters)

        # Bottleneck layer
        x = self.BatchNormalization(x)
        x = self.ReLU(x)
        x = self.Conv2D(x, n_filters, (3, 3), strides=(1, 1), padding="same", **metaparameters)

        # Dimensionality restoration - increase the number of output filters by 4X
        x = self.BatchNormalization(x)
        x = self.ReLU(x)
        x = self.Conv2D(x, n_filters * 4, (1, 1), strides=(1, 1), **metaparameters)

        # Add the identity link (input) to the output of the residual block
        x = Add()([shortcut, x])
        return x

    def projection_block(self, x, strides=(2,2), **metaparameters):
        """ Construct a Bottleneck Residual Block of Convolutions with Projection Shortcut
            Increase the number of filters by 4X
            x        : input into the block
            strides  : whether the first convolution is strided
            n_filters: number of filters
            reg      : kernel regularizer
        """
        n_filters = metaparameters['n_filters']
        del metaparameters['n_filters']

        # Construct the projection shortcut
        # Increase filters by 4X to match shape when added to output of block
        shortcut = self.BatchNormalization(x)
        shortcut = self.Conv2D(shortcut, 4 * n_filters, (1, 1), strides=strides, **metaparameters)

        ## Construct the 1x1, 3x3, 1x1 convolution block
    
        # Dimensionality reduction
        x = self.BatchNormalization(x)
        x = self.ReLU(x)
        x = self.Conv2D(x, n_filters, (1, 1), strides=(1,1), **metaparameters)

        # Bottleneck layer
        # Feature pooling when strides=(2, 2)
        x = self.BatchNormalization(x)
        x = self.ReLU(x)
        x = self.Conv2D(x, n_filters, (3, 3), strides=strides, padding='same', **metaparameters)

        # Dimensionality restoration - increase the number of filters by 4X
        x = self.BatchNormalization(x)
        x = self.ReLU(x)
        x = self.Conv2D(x, 4 * n_filters, (1, 1), strides=(1, 1), **metaparameters)

        # Add the projection shortcut to the output of the residual block
        x = Add()([x, shortcut])
        return x
    
groups = [ { 'n_filters' : 64, 'n_blocks': 1 },
           { 'n_filters': 128, 'n_blocks': 2 },
           { 'n_filters': 256, 'n_blocks': 2 }]
e_resnet = ResNetV2(groups, input_shape=(4, 4, 32), n_classes=10)
e_resnet.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['acc'])
e_resnet.model.summary()

### Train the Model

Let's now train our mini-resnet model (*e_resnet*) with the encoded training data (*e_train*).

*When using Colab with runtime=GPU, this takes about 4 minutes.*

In [None]:
e_resnet.training(e_train, y_train, epochs=20, batch_size=32)

### Evaluate the Model

Let's convert our test (holdout) data into an encoding (*e_test*) using our pretrained encoder (*encoder*), and evaluate our model (*e_resnet*).

In [None]:
e_test = encoder.predict(x_test)
e_resnet.model.evaluate(e_test, y_test)

## Next

If you followed this lab as-is, our encoded model overfits the encoded training data, and plateaus on accuracy on the encoded test data at ~61% (50% with V2).

Think about how you can modify this experiment to meet the objectives.