Pretraining Autoencoder for Downstream Task
=====

## Overview 

In this notebook, we train a neural network with mini VGG layers as a baseline for the experimental mini VGG-based autoencoder + neural network.
## Setup 

We setup our dependencies.

In [1]:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

__author__ = 'Abien Fred Agarap'
__version__ = '1.0.0'

import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tf.vgg_ae import CAE

Set the memory growth of GPU.

In [2]:
tf.config.experimental.set_memory_growth(
    tf.config.experimental.list_physical_devices('GPU')[0],
    True
)

Set the random number generator seed value.

In [3]:
SEED = 42
tf.random.set_seed(SEED)
np.random.seed(SEED)

We set the batch size and epochs.

In [4]:
batch_size = 512
epochs = 100

## Dataset

We load the MNIST classification dataset.

In [5]:
(train_features, train_labels), (test_features, test_labels) = mnist.load_data()

We further split the test data into validation and test.

In [6]:
validation_features = test_features[:5000]
validation_labels = test_labels[:5000]

test_features = test_features[5000:]
test_labels = test_labels[5000:]

We preprocess the MNIST dataset.

In [7]:
train_features = train_features.reshape(-1, 28, 28, 1)
train_features = train_features.astype('float32')
train_features = train_features / 255.

validation_features = validation_features.reshape(-1, 28, 28, 1)
validation_features = validation_features.astype('float32')
validation_features = validation_features / 255.

test_features = test_features.reshape(-1, 28, 28, 1)
test_features = test_features.astype('float32')
test_features = test_features / 255.

train_labels = tf.one_hot(train_labels, len(np.unique(train_labels)))
validation_labels = tf.one_hot(validation_labels, len(np.unique(validation_labels)))
test_labels = tf.one_hot(test_labels, len(np.unique(test_labels)))

Create the `tf.data.Dataset` object for training and evaluation.

In [8]:
train_dataset = tf.data.Dataset.from_tensor_slices((train_features, train_labels))
train_dataset = train_dataset.batch(batch_size)
train_dataset = train_dataset.prefetch(batch_size * 4)
train_dataset = train_dataset.shuffle(train_features.shape[0])

validation_dataset = tf.data.Dataset.from_tensor_slices((validation_features, validation_labels))
validation_dataset = validation_dataset.batch(batch_size)
validation_dataset = validation_dataset.prefetch(batch_size * 4)
validation_dataset = validation_dataset.shuffle(validation_features.shape[0])

test_dataset = tf.data.Dataset.from_tensor_slices((test_features, test_labels))
test_dataset = train_dataset.batch(batch_size)
test_dataset = train_dataset.prefetch(batch_size * 4)
test_dataset = train_dataset.shuffle(test_features.shape[0])

Instantiate the mini VGG-based autoencoder.

In [9]:
model = CAE(input_shape=(28, 28, 1))

Get the encoder component.

In [10]:
encoder = model.encoder

Build a mini VGG neural network.

In [11]:
class NN(tf.keras.Model):
    def __init__(self, **kwargs):
        super(NN, self).__init__()
        self.encoder = encoder
        self.flatten = tf.keras.layers.Flatten()
        self.dense_layer = tf.keras.layers.Dense(units=512, activation=tf.nn.relu)
        self.output_layer = tf.keras.layers.Dense(units=10, activation=tf.nn.softmax)
        
    def call(self, features):
        activation = self.encoder(features)
        activation = self.flatten(activation)
        activation = self.dense_layer(activation)
        outputs = self.output_layer(activation)
        return outputs

Instantiate the neural network.

In [12]:
clf = NN()

Let's set an early stopping.

In [13]:
early_stop_callback = tf.keras.callbacks.EarlyStopping(
    monitor="val_accuracy",
    min_delta=1e-4,
    patience=10
)

Compile the neural network for training and inference.

In [14]:
clf.compile(loss=tf.losses.categorical_crossentropy,
            optimizer=tf.optimizers.Adam(learning_rate=1e-3),
            metrics=['accuracy'])

Train the model.

In [15]:
history = clf.fit(train_dataset,
                  epochs=epochs,
                  validation_data=validation_dataset,
                  callbacks=[early_stop_callback],
                  verbose=2)

Epoch 1/100
118/118 - 24s - loss: 2.2643 - accuracy: 0.7297 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 2/100
118/118 - 21s - loss: 0.2332 - accuracy: 0.9301 - val_loss: 0.1908 - val_accuracy: 0.9358
Epoch 3/100
118/118 - 21s - loss: 0.1148 - accuracy: 0.9652 - val_loss: 0.0975 - val_accuracy: 0.9706
Epoch 4/100
118/118 - 21s - loss: 0.0692 - accuracy: 0.9797 - val_loss: 0.0788 - val_accuracy: 0.9732
Epoch 5/100
118/118 - 21s - loss: 0.0505 - accuracy: 0.9844 - val_loss: 0.0743 - val_accuracy: 0.9750
Epoch 6/100
118/118 - 21s - loss: 0.0380 - accuracy: 0.9882 - val_loss: 0.0627 - val_accuracy: 0.9804
Epoch 7/100
118/118 - 21s - loss: 0.0315 - accuracy: 0.9907 - val_loss: 0.0561 - val_accuracy: 0.9814
Epoch 8/100
118/118 - 21s - loss: 0.0230 - accuracy: 0.9932 - val_loss: 0.0559 - val_accuracy: 0.9806
Epoch 9/100
118/118 - 21s - loss: 0.0187 - accuracy: 0.9945 - val_loss: 0.0548 - val_accuracy: 0.9822
Epoch 10/100
118/118 - 21s - loss: 0.0139 - accuracy: 0.9963 - val_loss: 0

Evaluate the performance.

In [16]:
clf.evaluate(test_dataset, verbose=2)

118/118 - 7s - loss: 2.4605e-04 - accuracy: 1.0000


[0.0002460519570346243, 1.0]

Perturb the test data with noise from a Normal distribution having a standard deviation of `5e-2`.

In [17]:
test_features += tf.random.normal(stddev=5e-2, shape=test_features.shape)

Evaluate the model on the perturbed test data.

In [18]:
clf.evaluate(test_features, test_labels, batch_size=512, verbose=2)

5000/1 - 1s - loss: 0.0625 - accuracy: 0.9914


[0.031710679519176485, 0.9914]

Reload the test features, and increase the standard deviation of the Normal distribution from which we shall get the noise from for perturbing the test data.

In [19]:
_, (test_features, _) = mnist.load_data()

test_features = test_features.astype('float32') / 255.
test_features = test_features.reshape(-1, 28, 28, 1)
test_features += tf.random.normal(stddev=5e-1, shape=test_features.shape)

Get the last 5000 test features.

In [20]:
test_features = test_features[5000:]

Evaluate on the new perturbed test data.

In [21]:
clf.evaluate(test_features, test_labels, batch_size=batch_size, verbose=2)

5000/1 - 1s - loss: 1.5057 - accuracy: 0.7390


[1.1073272378921508, 0.739]

In [22]:
clf.save_weights('../assets/export/baseline/mnist/100_epochs', save_format='tf')

Save the training history.

In [23]:
with open("../assets/export/baseline/mnist/history.txt", "w") as file:
    file.write(repr(history.history))

We can load the saved history later by,

In [None]:
with open("../assets/export/baseline/mnist/history.txt", "r") as file:
    data = file.read()
history = eval(data)