In [None]:
# Copyright 2019 Google LLC
# 
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Composable "Design Pattern" for AutoML-friendly models

## Community Lab 1: Training Encoder for CNN

### Objective

To replace a traditional "stem convolution group", which operates on the high-dimensional input image, with a lower-dimensional encoding learned by first training an autoencoder on the dataset. The goal is that, by using a lower-dimensional encoding, one can substantially reduce the training time of a model.

*Question*: Can one achieve the same accuracy as when training on the original input images?

*Question*: How fast can we speed up training?

### Approach

We will use the composable design pattern and prebuilt units from the Google Cloud AI Developer Relations repo: [Model Zoo](https://github.com/GoogleCloudPlatform/keras-idiomatic-programmer/tree/master/zoo)

If you are not familiar with the composable design pattern, we recommend you review the [ResNet](https://github.com/GoogleCloudPlatform/keras-idiomatic-programmer/tree/master/zoo/resnet) model in our zoo. Then review the [AutoEncoder](https://github.com/GoogleCloudPlatform/keras-idiomatic-programmer/tree/master/zoo/autoencoder) model.

We recommend a constant set of hyperparameters, where batch_size is 32 and the initial learning rate is 0.001 -- but you may use any hyperparameter values you prefer.

We will use the metaparameters feature in the composable design pattern for the macro-architecture search -- a sort of 'human-assisted AutoML'.
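As a minimal sketch of what this 'human-assisted' search could look like, the snippet below enumerates candidate values for the autoencoder's `layers` metaparameter, in the same list-of-dicts format the `AutoEncoder` class in this lab accepts. The candidate filter counts and the "non-increasing filters" rule are illustrative assumptions, not part of the lab; each configuration would be passed as `AutoEncoder(layers=...)` and compared by hand.

```python
from itertools import product

# Illustrative candidate filter counts per encoder layer (assumed, not prescribed by the lab)
CANDIDATE_FILTERS = [16, 32, 64]

def candidate_layers(n_layers=3, filters=CANDIDATE_FILTERS):
    """Yield `layers` metaparameter lists in the list-of-dicts format used by AutoEncoder."""
    for combo in product(filters, repeat=n_layers):
        # Assumed rule: later layers are never wider than earlier ones
        if all(a >= b for a, b in zip(combo, combo[1:])):
            yield [{'n_filters': n} for n in combo]

configs = list(candidate_layers())
print(len(configs))
print(configs[0])
```

The lab's own choice, `[{'n_filters': 64}, {'n_filters': 32}, {'n_filters': 32}]`, is one of the generated configurations.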

We recommend using warmup training to find a good initialization of the weights.

### Reporting Findings

You can share your findings with us on Twitter: @andrewferlitsch

### Dataset

In this notebook, we use the CIFAR-10 dataset, which consists of 32x32x3 images in 10 classes -- but you may use any dataset you prefer.

### Steps

1. Build and Train an AutoEncoder for CIFAR10 (or your dataset).

2. Extract the pretrained Encoder network from the trained AutoEncoder.

3. Preprocess the training and test data with the Encoder.

4. Build a composable model for CIFAR10 using the Encoder embedding.

5. Use warmup to initialize the weights on the model.

6. Train the model with the encoded training set.

7. Evaluate the model with the encoded test set.

8. Repeat, making macro-architecture modifications to the AutoEncoder and/or the model.

## Lab

### Imports

In [None]:
import tensorflow as tf
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Conv2D, Flatten, Conv2DTranspose, ReLU, Add, Dense, Dropout
from tensorflow.keras.layers import BatchNormalization, GlobalAveragePooling2D
# Used by ResNetV2.stem()
from tensorflow.keras.layers import ZeroPadding2D, MaxPooling2D
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.regularizers import l2
from tensorflow.keras.datasets import cifar10
import numpy as np

### Get the Dataset

Load the dataset into memory as numpy arrays, and then normalize the image data (preprocessing).

In [None]:
from tensorflow.keras.datasets import cifar10
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
x_train = (x_train / 255.0).astype(np.float32)
x_test  = (x_test / 255.0).astype(np.float32)
print(x_train.shape)

### Build the AutoEncoder for CIFAR-10

Now, let's build the AutoEncoder for the dataset.

In our example, the dimensionality of the input (3072 values: 32x32 pixels x 3 channels) is reduced to 512 (4x4x32) at the bottleneck layer, the encoder's final ReLU with output shape (None, 4, 4, 32).
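The bottleneck shape follows from the encoder: each `Conv2D` uses `strides=2` with `padding='same'`, which yields `ceil(n / 2)` outputs per spatial dimension, and the last layer's filter count sets the channel depth. A small pure-Python sketch that reproduces the numbers above (the helper name `encoding_shape` is hypothetical, not part of the zoo):

```python
import math

def encoding_shape(input_shape, layers):
    """Return the encoder output shape for stride-2, 'same'-padded Conv2D layers."""
    height, width, _ = input_shape
    for _ in layers:
        # 'same' padding with stride 2 gives ceil(n / 2) outputs per spatial dimension
        height, width = math.ceil(height / 2), math.ceil(width / 2)
    return (height, width, layers[-1]['n_filters'])

layers = [{'n_filters': 64}, {'n_filters': 32}, {'n_filters': 32}]
shape = encoding_shape((32, 32, 3), layers)
input_size = 32 * 32 * 3
encoded_size = math.prod(shape)
print(shape, input_size, encoded_size, input_size / encoded_size)
```

With three stride-2 layers, the input is compressed by a factor of 6.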

In [None]:
# from autoencoder/autoencoder_c.py

class AutoEncoder(object):
    ''' Construct an AutoEncoder '''
    # metaparameter: number of filters per layer
    layers = [ {'n_filters': 64 }, { 'n_filters': 32 }, { 'n_filters': 16 } ]

    input_shape=(32, 32, 3)

    _model = None
    init_weights = 'he_normal'
    reg = None

    def __init__(self, layers=None, input_shape=(32, 32, 3)):
        ''' Construct an AutoEncoder
            input_shape : input shape to the autoencoder
            layers      : the number of filters per layer
        '''
        if layers is None:
           layers = AutoEncoder.layers

        # remember the layers
        self.layers = layers

        # remember the input shape
        self.input_shape = input_shape

        inputs = Input(input_shape)
        encoder = AutoEncoder.encoder(inputs, layers=layers)
        outputs = AutoEncoder.decoder(encoder, layers=layers)
        self._model = Model(inputs, outputs)

    @property
    def model(self):
        return self._model

    @model.setter
    def model(self, _model):
        self._model = _model

    @staticmethod
    def encoder(x, init_weights=None, **metaparameters):
        ''' Construct the Encoder 
            x     : input to the encoder
            layers: number of filters per layer
        '''
        layers = metaparameters['layers']

        if init_weights is None:
            init_weights = AutoEncoder.init_weights

        # Progressive Feature Pooling
        for layer in layers:
            n_filters = layer['n_filters']
            x = Conv2D(n_filters, (3, 3), strides=2, padding='same', kernel_initializer=init_weights,
                       kernel_regularizer=AutoEncoder.reg)(x)
            x = BatchNormalization()(x)
            x = ReLU()(x)

        # The Encoding
        return x

    @staticmethod
    def decoder(x, init_weights=None, **metaparameters):
        ''' Construct the Decoder
            x     : input to the decoder
            layers: number of filters per layer
        '''
        layers = metaparameters['layers']

        if init_weights is None:
            init_weights = AutoEncoder.init_weights

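        # Progressive Feature Unpooling: walk the encoder layers in reverse, skipping the first;
        # the final Conv2DTranspose below restores the 3 image channels
        for i in range(len(layers)-1, 0, -1):
            n_filters = layers[i]['n_filters']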
            x = Conv2DTranspose(n_filters, (3, 3), strides=2, padding='same', kernel_initializer=init_weights,
                                kernel_regularizer=AutoEncoder.reg)(x)
            x = BatchNormalization()(x)
            x = ReLU()(x)

        # Last unpooling and match shape to input
        x = Conv2DTranspose(3, (3, 3), strides=2, padding='same', kernel_initializer=init_weights,
                            kernel_regularizer=AutoEncoder.reg)(x)
        x = BatchNormalization()(x)
        x = ReLU()(x)

        # The decoded image
        return x

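    def compile(self, optimizer='adam'):
        ''' Compile the model using Mean Squared Error loss '''
        # 'accuracy' is only a rough indicator for pixel reconstruction (a regression task);
        # the MSE loss is the more meaningful measure of reconstruction quality
        self._model.compile(loss='mse', optimizer=optimizer, metrics=['accuracy'])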

    def extract(self):
        ''' Extract the pretrained encoder
        '''
        # Get the trained weights from the autoencoder
        weights = self._model.get_weights()

        # Extract the weights for just the encoder: 6 arrays per layer
        # (Conv2D kernel and bias; BatchNormalization gamma, beta, moving mean and moving variance)
        encoder_weights = weights[0 : 6 * len(self.layers)]

        # Construct a copy of the encoder
        inputs = Input(self.input_shape)
        outputs = self.encoder(inputs, layers=self.layers)
        encoder = Model(inputs, outputs)

        # Initialize the encoder with the pretrained weights
        encoder.set_weights(encoder_weights)

        return encoder
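The slice in `extract()` relies on `get_weights()` listing the encoder's arrays first, in layer order, with six arrays per encoder layer: the `Conv2D` kernel and bias, then the `BatchNormalization` gamma, beta, moving mean, and moving variance. The pure-Python sketch below mimics that ordering with placeholder names (no TensorFlow needed) to show which arrays the slice keeps; the names and the `placeholder_weights` helper are illustrative only.

```python
# Arrays each layer contributes, in the order extract() assumes:
# Conv2D kernel and bias, then BatchNormalization gamma, beta, moving mean, moving variance
PER_LAYER = ['kernel', 'bias', 'gamma', 'beta', 'moving_mean', 'moving_variance']

def placeholder_weights(n_encoder_layers, n_decoder_layers):
    """Return placeholder names standing in for autoencoder.get_weights(), in layer order."""
    names = []
    for i in range(n_encoder_layers):
        names += [f'encoder{i}/{name}' for name in PER_LAYER]
    # Each decoder layer (Conv2DTranspose + BatchNormalization) also contributes 6 arrays
    for i in range(n_decoder_layers):
        names += [f'decoder{i}/{name}' for name in PER_LAYER]
    return names

n_layers = 3  # len(self.layers) for the lab's [64, 32, 32] configuration
weights = placeholder_weights(n_encoder_layers=n_layers, n_decoder_layers=n_layers)
encoder_weights = weights[0 : 6 * n_layers]
print(len(weights), len(encoder_weights), encoder_weights[-1])
```

If you change the encoder's building block (for example, drop `BatchNormalization` or set `use_bias=False`), the per-layer count of 6 in `extract()` must change with it.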

In [None]:
autoencoder = AutoEncoder(input_shape=(32, 32, 3), layers=[{'n_filters': 64}, {'n_filters': 32}, {'n_filters': 32}])
autoencoder.model.summary()

### Warmup Training for AutoEncoder

Now let's find the best initialization of the autoencoder. We will do five separate draws from a random He normal distribution -- but you can use more draws if you want to.

For each draw, we will use a small subset of the training data (100 batches of 32, i.e., 3,200 images), a very low learning rate of 0.00001, and three epochs.
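Note that `fit(..., validation_split=0.1)` holds out the last 10% of this subset, so each warmup epoch actually trains on fewer than 100 batches. A quick arithmetic check, assuming Keras rounds the training portion down when it splits:

```python
import math

BATCH_SIZE = 32
N_BATCHES = 100
VALIDATION_SPLIT = 0.1

n_subset = BATCH_SIZE * N_BATCHES                 # images in the warmup subset
n_train = int(n_subset * (1 - VALIDATION_SPLIT))  # images actually trained on
n_val = n_subset - n_train                        # images held out for validation
steps_per_epoch = math.ceil(n_train / BATCH_SIZE)
print(n_subset, n_train, n_val, steps_per_epoch)
```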

In [None]:
WARMUP_LR = 0.00001  # The warmup learning rate

# A small subset of the training data: 100 batches of 32 images
w_train = x_train[0:32 * 100]

models = []
#  We will warmup train 5 instances of the model.
for _ in range(5):
    # Each new instance draws fresh He normal initial weights when it is constructed.
    warmup = AutoEncoder(input_shape=(32, 32, 3), layers=[{'n_filters': 64}, {'n_filters': 32}, {'n_filters': 32}])

    #  Compile the model with a very low learning rate.
    warmup.compile(optimizer=Adam(learning_rate=WARMUP_LR))

    #  Do a brief warmup training.
    history = warmup.model.fit(w_train, w_train, epochs=3, verbose=1, batch_size=32, validation_split=0.1)
    models.append((warmup, history))


### Pick best initialized model

When training completes, review the warmup history for each model instance and use your judgement to pick the draw (model instance) you expect to give the best training result (i.e., 'the winning ticket').

In [None]:
# Replace 2 with the index of the draw you picked
autoencoder = models[2][0]
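If you prefer a repeatable rule over eyeballing the curves, one option is to pick the draw with the lowest final validation loss. The sketch below assumes each entry of `models` is a `(warmup, history)` pair as built above, where `history.history['val_loss']` is the per-epoch validation loss recorded by Keras; the `pick_best_draw` helper and the stand-in histories are hypothetical.

```python
from types import SimpleNamespace

def pick_best_draw(models):
    """Return the index of the (model, history) pair with the lowest final val_loss."""
    final_losses = [history.history['val_loss'][-1] for _, history in models]
    return min(range(len(final_losses)), key=final_losses.__getitem__)

# Stand-in histories so the sketch runs without training; in the lab, use the real `models` list
fake_models = [
    (None, SimpleNamespace(history={'val_loss': [0.090, 0.080, 0.075]})),
    (None, SimpleNamespace(history={'val_loss': [0.085, 0.070, 0.060]})),
    (None, SimpleNamespace(history={'val_loss': [0.095, 0.088, 0.081]})),
]
best = pick_best_draw(fake_models)
print(best)
```

In the lab this would become `autoencoder = models[pick_best_draw(models)][0]`. The final epoch's loss is only one heuristic; the slope of the curve may matter as much.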

### Train the AutoEncoder

Let's now fully train the autoencoder on our image data for 20 epochs -- but you may choose to use more.

*When using Colab with runtime=GPU, this takes about 4 minutes.*

*You should see a validation accuracy of ~80%.*

In [None]:
autoencoder.compile(optimizer='adam')
autoencoder.model.fit(x_train, x_train, epochs=20, batch_size=32, validation_split=0.1, verbose=1)

Let's see what the accuracy is on the test (holdout) data.

In [None]:
autoencoder.model.evaluate(x_test, x_test)

### Extract the pre-trained Encoder

Next, we will extract the pretrained encoder from our trained autoencoder.

In [None]:
encoder = autoencoder.extract()

### Encode the CIFAR-10 Training Data

Next, we will encode the higher dimensional training data (*x_train*) into the lower dimensional encoding (*e_train*).

In [None]:
e_train = encoder.predict(x_train)
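To put the reduction in concrete terms, here is the memory footprint of the float32 training images versus their encodings, assuming the 50,000-image CIFAR-10 training set and the `(4, 4, 32)` float32 encoding produced by the encoder above:

```python
N_TRAIN = 50_000          # CIFAR-10 training images
BYTES_PER_FLOAT32 = 4

image_values = 32 * 32 * 3     # values per normalized input image
encoding_values = 4 * 4 * 32   # values per encoding

x_train_mb = N_TRAIN * image_values * BYTES_PER_FLOAT32 / 1e6
e_train_mb = N_TRAIN * encoding_values * BYTES_PER_FLOAT32 / 1e6
print(f'x_train: {x_train_mb:.1f} MB, e_train: {e_train_mb:.1f} MB')
```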

### Build mini-ResNet with Encoding as input (no stem convolution)

Let's now use the composable design pattern for ResNet to build a mini-ResNet model (*e_resnet*) that takes the encoding directly as its input.

In [None]:
# from resnet/resnet_v2_c.py

class ResNetV2(object):
    """ Construct a Residual Convolutional Neural Network V2 """
    # Meta-parameter: list of groups: number of filters and number of blocks
    groups = { 50 : [ { 'n_filters' : 64, 'n_blocks': 3 },
                      { 'n_filters': 128, 'n_blocks': 4 },
                      { 'n_filters': 256, 'n_blocks': 6 },
                      { 'n_filters': 512, 'n_blocks': 3 } ],            # ResNet50
               101: [ { 'n_filters' : 64, 'n_blocks': 3 },
                      { 'n_filters': 128, 'n_blocks': 4 },
                      { 'n_filters': 256, 'n_blocks': 23 },
                      { 'n_filters': 512, 'n_blocks': 3 } ],            # ResNet101
               152: [ { 'n_filters' : 64, 'n_blocks': 3 },
                      { 'n_filters': 128, 'n_blocks': 8 },
                      { 'n_filters': 256, 'n_blocks': 36 },
                      { 'n_filters': 512, 'n_blocks': 3 } ]             # ResNet152
             }
    init_weights = 'he_normal'
    reg=l2(0.001)
    _model = None

    def __init__(self, n_layers, input_shape=(224, 224, 3), n_classes=1000):
        """ Construct a Residual Convolutional Neural Network V2
            n_layers   : number of layers
            input_shape: input shape
            n_classes  : number of output classes
        """
        # predefined
        if isinstance(n_layers, int):
            if n_layers not in [50, 101, 152]:
                raise Exception("ResNet: Invalid value for n_layers")
            groups = self.groups[n_layers]
        # user defined
        else:
            groups = n_layers

        # The input tensor
        inputs = Input(input_shape)

        # The stem convolutional group
        x = self.stem(inputs)

        # The learner
        x = self.learner(x, groups=groups)

        # The classifier 
        outputs = self.classifier(x, n_classes)

        # Instantiate the Model
        self._model = Model(inputs, outputs)

    @property
    def model(self):
        return self._model

    @model.setter
    def model(self, _model):
        self._model = _model

    def stem(self, inputs):
        """ Construct the Stem Convolutional Group 
            inputs : the input vector
        """
        # The 224x224 images are zero padded (black - no signal) to be 230x230 images prior to the first convolution
        x = ZeroPadding2D(padding=(3, 3))(inputs)
    
        # First Convolutional layer uses large (coarse) filter
        x = Conv2D(64, (7, 7), strides=(2, 2), padding='valid', use_bias=False, 
                   kernel_initializer=self.init_weights, kernel_regularizer=self.reg)(x)
        x = BatchNormalization()(x)
        x = ReLU()(x)
    
        # Pooled feature maps will be reduced by 75%
        x = ZeroPadding2D(padding=(1, 1))(x)
        x = MaxPooling2D((3, 3), strides=(2, 2))(x)
        return x

    def learner(self, x, **metaparameters):
        """ Construct the Learner
            x     : input to the learner
            groups: list of groups: number of filters and blocks
        """
        groups = metaparameters['groups']

        # First Residual Block Group (not strided)
        # (index instead of pop() so the shared class-level groups list is not mutated)
        x = ResNetV2.group(x, strides=(1, 1), **groups[0])

        # Remaining Residual Block Groups (strided)
        for group in groups[1:]:
            x = ResNetV2.group(x, **group)
        return x
    
    @staticmethod
    def group(x, strides=(2, 2), init_weights=None, **metaparameters):
        """ Construct a Residual Group
            x         : input into the group
            strides   : whether the projection block is a strided convolution
            n_filters : number of filters for the group
            n_blocks  : number of residual blocks with identity link
        """
        n_blocks  = metaparameters['n_blocks']

        # Projection block: increases the number of filters by 4X (and downsamples when strided)
        x = ResNetV2.projection_block(x, strides=strides, init_weights=init_weights, **metaparameters)

        # Identity residual blocks
        for _ in range(n_blocks):
            x = ResNetV2.identity_block(x, init_weights=init_weights, **metaparameters)
        return x

    @staticmethod
    def identity_block(x, init_weights=None, **metaparameters):
        """ Construct a Bottleneck Residual Block with Identity Link
            x        : input into the block
            n_filters: number of filters
            reg      : kernel regularizer
        """
        n_filters = metaparameters['n_filters']
        if 'reg' in metaparameters:
            reg = metaparameters['reg']
        else:
            reg = ResNetV2.reg

        if init_weights is None:
            init_weights = ResNetV2.init_weights
    
        # Save input vector (feature maps) for the identity link
        shortcut = x
    
        ## Construct the 1x1, 3x3, 1x1 convolution block
    
        # Dimensionality reduction
        x = BatchNormalization()(x)
        x = ReLU()(x)
        x = Conv2D(n_filters, (1, 1), strides=(1, 1), use_bias=False, 
                   kernel_initializer=init_weights, kernel_regularizer=reg)(x)

        # Bottleneck layer
        x = BatchNormalization()(x)
        x = ReLU()(x)
        x = Conv2D(n_filters, (3, 3), strides=(1, 1), padding="same", use_bias=False, 
                   kernel_initializer=init_weights, kernel_regularizer=reg)(x)

        # Dimensionality restoration - increase the number of output filters by 4X
        x = BatchNormalization()(x)
        x = ReLU()(x)
        x = Conv2D(n_filters * 4, (1, 1), strides=(1, 1), use_bias=False, 
                   kernel_initializer=init_weights, kernel_regularizer=reg)(x)

        # Add the identity link (input) to the output of the residual block
        x = Add()([shortcut, x])
        return x

    @staticmethod
    def projection_block(x, strides=(2,2), init_weights=None, **metaparameters):
        """ Construct a Bottleneck Residual Block of Convolutions with Projection Shortcut
            Increase the number of filters by 4X
            x        : input into the block
            strides  : whether the first convolution is strided
            n_filters: number of filters
            reg      : kernel regularizer
        """
        n_filters = metaparameters['n_filters']
        if 'reg' in metaparameters:
            reg = metaparameters['reg']
        else:
            reg = ResNetV2.reg

        if init_weights is None:
            init_weights = ResNetV2.init_weights

        # Construct the projection shortcut
        # Increase filters by 4X to match shape when added to output of block
        shortcut = BatchNormalization()(x)
        shortcut = Conv2D(4 * n_filters, (1, 1), strides=strides, use_bias=False, 
                          kernel_initializer=init_weights, kernel_regularizer=reg)(shortcut)

        ## Construct the 1x1, 3x3, 1x1 convolution block
    
        # Dimensionality reduction
        x = BatchNormalization()(x)
        x = ReLU()(x)
        x = Conv2D(n_filters, (1, 1), strides=(1,1), use_bias=False, 
                   kernel_initializer=init_weights, kernel_regularizer=reg)(x)

        # Bottleneck layer
        # Feature pooling when strides=(2, 2)
        x = BatchNormalization()(x)
        x = ReLU()(x)
        x = Conv2D(n_filters, (3, 3), strides=strides, padding='same', use_bias=False, 
                   kernel_initializer=init_weights, kernel_regularizer=reg)(x)

        # Dimensionality restoration - increase the number of filters by 4X
        x = BatchNormalization()(x)
        x = ReLU()(x)
        x = Conv2D(4 * n_filters, (1, 1), strides=(1, 1), use_bias=False, 
                   kernel_initializer=init_weights, kernel_regularizer=reg)(x)

        # Add the projection shortcut to the output of the residual block
        x = Add()([x, shortcut])
        return x

    def classifier(self, x, n_classes):
        """ Construct the Classifier Group 
            x         : input to the classifier
            n_classes : number of output classes
        """
        # Pool at the end of all the convolutional residual blocks
        x = GlobalAveragePooling2D()(x)

        # Final Dense Outputting Layer for the outputs
        outputs = Dense(n_classes, activation='softmax', 
                        kernel_initializer=self.init_weights, kernel_regularizer=self.reg)(x)
        return outputs

    
# Encoded Input (no stem)
inputs = Input((4, 4, 32))

# Learner
# Residual group: projection block + 2 identity blocks, 64 filters (256 output channels)
# Residual group: projection block + 1 identity block, 128 filters (512 output channels)
x = ResNetV2.group(inputs, n_blocks=2, n_filters=64)
x = ResNetV2.group(x, n_blocks=1, n_filters=128)

# Classifier
x = GlobalAveragePooling2D()(x)
outputs = Dense(10, activation='softmax')(x)
e_resnet = Model(inputs, outputs)
e_resnet.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['acc'])
e_resnet.summary()
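Because the encoding is only 4x4, the strided projection blocks shrink it quickly. A pure-Python trace of the shapes, assuming (as in the code above) that `group()` defaults to `strides=(2, 2)`, that each projection block outputs `4 * n_filters` channels, and that identity blocks keep the shape unchanged; `trace_groups` is a hypothetical helper:

```python
import math

def trace_groups(input_shape, groups, strides=2):
    """Return (height, width, channels) after each residual group.

    Assumes every group opens with a strided projection block giving ceil(n / strides)
    outputs per spatial dimension and 4 * n_filters channels.
    """
    height, width, _ = input_shape
    shapes = []
    for group in groups:
        height, width = math.ceil(height / strides), math.ceil(width / strides)
        shapes.append((height, width, 4 * group['n_filters']))
    return shapes

groups = [{'n_filters': 64, 'n_blocks': 2}, {'n_filters': 128, 'n_blocks': 1}]
print(trace_groups((4, 4, 32), groups))
```

The feature maps reach 1x1 after the second group, so `GlobalAveragePooling2D` averages over a single position; this is worth keeping in mind when modifying the macro-architecture.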

### Train the Model

Let's now train our mini-resnet model (*e_resnet*) with the encoded training data (*e_train*).

*When using Colab with runtime=GPU, this takes about 4 minutes.*

In [None]:
e_resnet.fit(e_train, y_train, epochs=20, batch_size=32, verbose=1, validation_split=0.1)

### Evaluate the Model

Let's convert our test (holdout) data into an encoding (*e_test*) using our pretrained encoder (*encoder*), and evaluate our model (*e_resnet*).

In [None]:
e_test = encoder.predict(x_test)
e_resnet.evaluate(e_test, y_test)

## Next

If you followed this lab as-is, our encoded model overfits the encoded training data, and its accuracy on the encoded test data plateaus at ~61% (~50% with V1).

Think about how you can modify this experiment to meet the objectives.