Pretraining Autoencoder for Downstream Task
=====

## Overview 

In this notebook, we train a neural network with mini VGG layers as a baseline for the experimental mini VGG-based autoencoder + neural network.
## Setup 

We setup our dependencies.

In [1]:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

__author__ = 'Abien Fred Agarap'
__version__ = '1.0.0'

import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from tensorflow.keras.datasets import mnist

Set the memory growth of GPU.

In [2]:
tf.config.experimental.set_memory_growth(
    tf.config.experimental.list_physical_devices('GPU')[0],
    True
)

Set the random number generator seed value.

In [3]:
SEED = 42
tf.random.set_seed(SEED)
np.random.seed(SEED)

We set the batch size and epochs.

In [4]:
batch_size = 512
epochs = 100

## Dataset

We load the MNIST classification dataset.

In [5]:
(train_features, train_labels), (test_features, test_labels) = mnist.load_data()

We preprocess the MNIST dataset.

In [6]:
train_features = train_features.reshape(-1, 28, 28, 1)

train_features = train_features.astype('float32')
train_features = train_features / 255.

test_features = test_features.reshape(-1, 28, 28, 1)

test_features = test_features.astype('float32')
test_features = test_features / 255.

train_labels = tf.one_hot(train_labels, len(np.unique(train_labels)))
test_labels = tf.one_hot(test_labels, len(np.unique(test_labels)))

Create the `tf.data.Dataset` object for training and evaluation.

In [7]:
train_dataset = tf.data.Dataset.from_tensor_slices((train_features, train_labels))
train_dataset = train_dataset.batch(batch_size)
train_dataset = train_dataset.prefetch(batch_size * 4)
train_dataset = train_dataset.shuffle(train_features.shape[0])

test_dataset = tf.data.Dataset.from_tensor_slices((test_features, test_labels))
test_dataset = train_dataset.batch(batch_size)
test_dataset = train_dataset.prefetch(batch_size * 4)
test_dataset = train_dataset.shuffle(test_features.shape[0])

Build a mini VGG neural network.

In [8]:
class NN(tf.keras.Model):
    def __init__(self, **kwargs):
        super(NN, self).__init__()
        self.input_layer = tf.keras.layers.InputLayer(input_shape=kwargs["input_shape"])
        self.conv_1_layer_1 = tf.keras.layers.Conv2D(
            filters=32, kernel_size=(3, 3), activation=tf.nn.relu
        )
        self.conv_1_layer_2 = tf.keras.layers.Conv2D(
            filters=32, kernel_size=(3, 3), activation=tf.nn.relu
        )
        self.pool_layer_1 = tf.keras.layers.MaxPooling2D((2, 2))
        self.conv_2_layer_1 = tf.keras.layers.Conv2D(
            filters=64, kernel_size=(3, 3), activation=tf.nn.relu
        )
        self.conv_2_layer_2 = tf.keras.layers.Conv2D(
            filters=64, kernel_size=(3, 3), activation=tf.nn.sigmoid
        )
        self.pool_layer_2 = tf.keras.layers.MaxPooling2D((2, 2))
        self.flatten = tf.keras.layers.Flatten()
        self.dense_layer = tf.keras.layers.Dense(units=512, activation=tf.nn.relu)
        self.dropout = tf.keras.layers.Dropout(rate=2e-1)
        self.output_layer = tf.keras.layers.Dense(units=10, activation=tf.nn.softmax)
        
    def call(self, features):
        features = self.input_layer(features)
        activation = self.conv_1_layer_1(features)
        activation = self.conv_1_layer_2(activation)
        activation = self.pool_layer_1(activation)
        activation = self.conv_2_layer_1(activation)
        activation = self.conv_2_layer_2(activation)
        activation = self.pool_layer_2(activation)
        activation = self.flatten(activation)
        activation = self.dense_layer(activation)
        activation = self.dropout(activation)
        outputs = self.output_layer(activation)
        return outputs

Instantiate the neural network.

In [9]:
clf = NN(input_shape=(28, 28, 1))

Compile the neural network for training and inference.

In [10]:
clf.compile(loss=tf.losses.categorical_crossentropy,
            optimizer=tf.optimizers.SGD(
                learning_rate=1e-2, momentum=9e-1, decay=1e-6
                ),
            metrics=['accuracy'])

Train the model.

In [11]:
clf.fit(train_dataset, epochs=epochs, verbose=2)

Epoch 1/100
118/118 - 12s - loss: 2.1805 - accuracy: 0.2315
Epoch 2/100
118/118 - 14s - loss: 0.4376 - accuracy: 0.8733
Epoch 3/100
118/118 - 20s - loss: 0.2118 - accuracy: 0.9375
Epoch 4/100
118/118 - 19s - loss: 0.1544 - accuracy: 0.9546
Epoch 5/100
118/118 - 21s - loss: 0.1256 - accuracy: 0.9632
Epoch 6/100
118/118 - 19s - loss: 0.1061 - accuracy: 0.9686
Epoch 7/100
118/118 - 18s - loss: 0.0938 - accuracy: 0.9712
Epoch 8/100
118/118 - 18s - loss: 0.0839 - accuracy: 0.9750
Epoch 9/100
118/118 - 18s - loss: 0.0774 - accuracy: 0.9769
Epoch 10/100
118/118 - 18s - loss: 0.0705 - accuracy: 0.9789
Epoch 11/100
118/118 - 18s - loss: 0.0661 - accuracy: 0.9805
Epoch 12/100
118/118 - 18s - loss: 0.0608 - accuracy: 0.9819
Epoch 13/100
118/118 - 18s - loss: 0.0576 - accuracy: 0.9830
Epoch 14/100
118/118 - 18s - loss: 0.0546 - accuracy: 0.9841
Epoch 15/100
118/118 - 18s - loss: 0.0511 - accuracy: 0.9850
Epoch 16/100
118/118 - 18s - loss: 0.0488 - accuracy: 0.9855
Epoch 17/100
118/118 - 18s - loss

<tensorflow.python.keras.callbacks.History at 0x7fb7901b5cf8>

Evaluate the performance.

In [12]:
clf.evaluate(test_dataset, verbose=2)

118/118 - 6s - loss: 0.0033 - accuracy: 0.9997


[0.003284570782236218, 0.99965]

Perturb the test data with noise from a Normal distribution having a standard deviation of `5e-2`.

In [13]:
test_features += tf.random.normal(stddev=5e-2, shape=test_features.shape)

Evaluate the model on the perturbed test data.

In [14]:
clf.evaluate(test_features, test_labels, batch_size=512, verbose=2)

10000/1 - 2s - loss: 0.0432 - accuracy: 0.9905


[0.030686995440721513, 0.9905]

Reload the test features, and increase the standard deviation of the Normal distribution from which we shall get the noise from for perturbing the test data.

In [15]:
_, (test_features, _) = mnist.load_data()

test_features = test_features.astype('float32') / 255.
test_features = test_features.reshape(-1, 28, 28, 1)
test_features += tf.random.normal(stddev=5e-1, shape=test_features.shape)

Evaluate on the new perturbed test data.

In [16]:
clf.evaluate(test_features, test_labels, batch_size=batch_size, verbose=2)

10000/1 - 1s - loss: 0.3376 - accuracy: 0.9146


[0.2891769985198975, 0.9146]

In [18]:
clf.save_weights('../assets/export/baseline/mnist/100_epochs', save_format='tf')