In [1]:
import matplotlib.pyplot as plt
import pandas as pd
import tensorflow as tf
import numpy as np
from tensorflow import keras

# Autoencoders
Autoencoders are ANNs that are trained to reproduce their inputs. They usually do this by first compressing the inputs, so their reconstructions are often lossy. Autoencoders are trained without supervision (the training data isn't labeled). They can also be used to generate outputs that resemble their training data rather than just replicating their inputs. So, autoencoders can be used for compression and for generating new outputs similar to the training data, though these outputs tend to be low quality. **Generative Adversarial Networks (GANs)**, by contrast, are a more recent family of generative models (related to autoencoders, but not autoencoders themselves), and they generate amazingly detailed images. A GAN has a generator and a discriminator. The discriminator is a typical classifier network that learns to recognize training data, whereas the generator produces outputs that look like training data but are not, and so tries to trick the discriminator into believing that its outputs are training data. So, the discriminator and generator are engaged in an arms race that is propelled forward by their ability to learn, respectively, how to not be fooled and how to fool.

Autoencoders are less sophisticated, in that they simply learn to copy their inputs to their outputs. However, this process is made more difficult, and the outputs more interesting, by imposing constraints on the autoencoder. For example, we may require that the representations the autoencoder learns are of a limited size, so that the autoencoder is forced to compress the data. Thus, autoencoders can be thought of as ANNs that learn the identity function under various constraints. The representations of the input data that autoencoders learn are called **codings** or **latent representations**.

Autoencoders always consist of two components: an encoder (aka recognition network) and a decoder. The encoder learns to map the input to the latent representation, and the decoder learns to map the latent representation back to an output. The hope is that the autoencoder discovers a description of the data that takes less space than simply recapitulating the input, say in the form of a look-up table. Further, the output layer of an autoencoder always has the same number of dimensions (nodes) as its input. So, for example, an autoencoder with 3 inputs also has 3 outputs. The simple example given in the text is a single-hidden-layer MLP with 3 inputs, 2 hidden units, and 3 outputs. Because the hidden layer has fewer dimensions than the input, the autoencoder cannot simply copy the input data; it is forced to learn a latent representation and a way of interpreting it. So, the hope is that the autoencoder discovers the two most important dimensions of the input data and then reconstructs the output from those two dimensions.
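
To make the 3-to-2-to-3 bottleneck concrete, here is a minimal pure-Python sketch, not the text's example. It assumes a hand-picked pair of orthonormal directions `u` and `v` (hypothetical values) instead of learned weights: the "encoder" projects a 3D point onto them and the "decoder" maps the 2D coding back to 3D. Points lying in the plane are reconstructed exactly, while anything outside the plane is lost.

```python
import math

# Two orthonormal directions spanning a plane in 3D.
# These are hand-picked, hypothetical values standing in for learned weights.
u = (1 / math.sqrt(2), 1 / math.sqrt(2), 0.0)
v = (0.0, 0.0, 1.0)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def encode(x):
    """3D input -> 2D coding (coordinates along u and v)."""
    return (dot(x, u), dot(x, v))

def decode(z):
    """2D coding -> 3D reconstruction."""
    return tuple(z[0] * ui + z[1] * vi for ui, vi in zip(u, v))

def reconstruction_mse(x):
    """Mean squared error between x and its reconstruction."""
    x_hat = decode(encode(x))
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

on_plane = (2.0, 2.0, 5.0)    # lies in the plane spanned by u and v
off_plane = (3.0, 1.0, 5.0)   # has a component orthogonal to that plane

print(reconstruction_mse(on_plane))   # essentially zero
print(reconstruction_mse(off_plane))  # clearly positive: information was lost
```

A trained autoencoder has to find good directions by itself; the sketch only shows why a 2D coding can be enough when the data is (nearly) two-dimensional.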

In [5]:
# We can implement PCA with a very simple autoencoder.
encoder = keras.models.Sequential([keras.layers.Dense(2, input_shape=[3])])
decoder = keras.models.Sequential([keras.layers.Dense(3, input_shape=[2])])
PCA_autoencoder = keras.models.Sequential([encoder, decoder])   # We can use full models as layers in other models
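# So, this is an autoencoder that takes in 3D input and encodes it into a 2D coding.
# Because both layers are linear (no activation) and we train with MSE loss, the coding
# ends up being a projection onto the plane spanned by the two leading principal components.
# The decoder then maps the 2D coding back to a 3D reconstruction of the input.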

In [7]:
# `lr` is a deprecated alias; `learning_rate` is the supported argument name.
PCA_autoencoder.compile(loss='mse', optimizer=keras.optimizers.SGD(learning_rate=0.1))

In [None]:
# Notice that X_train is both the input and the target!
# (X_train must be an array of 3D points here, i.e. of shape [n_samples, 3].)
history = PCA_autoencoder.fit(X_train, X_train, epochs=20)
codings = encoder.predict(X_train)
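
The cell above needs a 3D `X_train`, which these notes never define. One possible way to build one is sketched below; it is an assumption for illustration, not the dataset used in the text. It samples points near a 2D plane embedded in 3D (the `basis` matrix, noise level, and sample count are all hypothetical) and centers them, since PCA assumes zero-mean data.

```python
import numpy as np

rng = np.random.default_rng(42)
m = 200  # number of samples (arbitrary choice)

# Two latent coordinates per sample.
latent = rng.normal(size=(m, 2))

# Hypothetical 2x3 matrix that embeds the latent plane in 3D space.
basis = np.array([[1.0, 0.5, 0.0],
                  [0.0, 0.5, 1.0]])

# Points on the plane plus a little noise in all three directions.
X_train = latent @ basis + 0.05 * rng.normal(size=(m, 3))

# Center the data and use float32, the default dtype of Keras layers.
X_train = (X_train - X_train.mean(axis=0)).astype(np.float32)

print(X_train.shape, X_train.dtype)
```

With data like this, the 2D codings should capture almost all of the variance, because the third direction contains only the small added noise.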

# Generative Adversarial Networks (GANs)
GANs consist of two components: a generator and a discriminator, which are trained in alternating, disjoint steps. Thus, we can conceive of a GAN's training process as a two-part cycle, namely (1) training the discriminator and (2) training the generator. The reason we can think of these parts as being disjoint is that the weights of the generator are not updated during the first part and the weights of the discriminator are not updated during the second part.

So, what does the training actually consist of? First, the discriminator is trained to classify images. The images it sees come from two sources, namely a training set of "real" images (labeled 1) and a set of "fake" images (labeled 0) made by the generator. In the first step, the discriminator is trained to recognize whether images are real or fake, and only the discriminator has its weights updated, using the binary cross-entropy loss function. In the second step, the discriminator is exposed only to images made by the generator, and every such image is labeled as real (i.e., labeled 1); the generator is rewarded when the discriminator predicts that the images are real and punished when it predicts that they are fake. Only the weights of the generator are updated during this step. Eventually, this allows the generator to get really good at making images that look realistic.
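
The two phases can be illustrated with a toy one-dimensional GAN in plain Python. This is only a sketch under strong simplifying assumptions, none of which come from the text: the "generator" is a single number `theta` (it ignores noise and always outputs `theta`), the "discriminator" is a logistic unit with parameters `w` and `b`, and gradients are written out by hand. It shows which parameters each phase touches and how labeling fakes as 1 pushes the generator toward fooling the discriminator.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def bce(label, prob):
    """Binary cross-entropy for a single prediction."""
    return -(label * math.log(prob) + (1.0 - label) * math.log(1.0 - prob))

def discriminator_step(w, b, theta, x_real, lr=0.1):
    """Phase 1: update only (w, b). theta is read but never changed."""
    grad_w = grad_b = 0.0
    # One real sample labeled 1 and one fake sample labeled 0.
    for x, label in ((x_real, 1.0), (theta, 0.0)):
        err = sigmoid(w * x + b) - label  # d(BCE)/d(logit) = prediction - label
        grad_w += err * x / 2             # averaged over the two samples
        grad_b += err / 2
    return w - lr * grad_w, b - lr * grad_b

def generator_step(w, b, theta, lr=0.1):
    """Phase 2: update only theta. The fake sample is labeled 1 (real)."""
    err = sigmoid(w * theta + b) - 1.0    # d(BCE)/d(logit) with label 1
    grad_theta = err * w                  # chain rule through logit = w * theta + b
    return theta - lr * grad_theta

w, b, theta, x_real = 0.5, 0.0, 0.0, 2.0

w, b = discriminator_step(w, b, theta, x_real)         # phase 1
gen_loss_before = bce(1.0, sigmoid(w * theta + b))
theta = generator_step(w, b, theta)                    # phase 2
gen_loss_after = bce(1.0, sigmoid(w * theta + b))

print(gen_loss_before, gen_loss_after)  # the generator's loss goes down
```

In the Keras code that follows, the same separation is achieved by freezing the discriminator inside the combined `gan` model rather than by writing separate update functions.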

In [2]:
codings_size = 30

generator = keras.models.Sequential([keras.layers.Dense(100, activation='selu', input_shape=[codings_size]),
                                    keras.layers.Dense(150, activation='selu'),
                                    keras.layers.Dense(28*28, activation='sigmoid'),
                                    keras.layers.Reshape([28, 28])])

discriminator = keras.models.Sequential([keras.layers.Flatten(input_shape=[28, 28]),
                                        keras.layers.Dense(150, activation='selu'),
                                        keras.layers.Dense(100, activation='selu'),
                                        keras.layers.Dense(1, activation='sigmoid')])

gan = keras.models.Sequential([generator, discriminator])

In [3]:
# Now, we compile the models and make the discriminator not trainable for the second phase

discriminator.compile(loss='binary_crossentropy', optimizer='rmsprop')
discriminator.trainable = False
gan.compile(loss='binary_crossentropy', optimizer='rmsprop')

"Note: The trainable attribute is taken into account by Keras only when compiling a model, so after running this code, the discriminator *is* trainable if we call its fit() method or its train_on_batch() method (which we will be using), while it is *not* trainable when we call these methods on the gan model." So, setting the trainable attribute to False before compiling gan just ensures that the discriminator is not trained during the second step, but changes nothing for the first step, which is just what we need.

In [7]:
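(X_train, y_train), (X_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
# The pixels are loaded as uint8 in [0, 255]. Convert to float32 in [0, 1] so they can be
# concatenated with the generator's float32 output (which a sigmoid keeps in [0, 1]).
X_train = X_train.astype(np.float32) / 255
X_test = X_test.astype(np.float32) / 255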

In [8]:
batch_size = 32
dataset = tf.data.Dataset.from_tensor_slices(X_train).shuffle(1000)
dataset = dataset.batch(batch_size, drop_remainder=True).prefetch(1)

In [10]:
# We have to write a special function for training the gan:
def train_gan(gan, dataset, batch_size, codings_size, n_epochs = 50): 
    generator, discriminator = gan.layers 
    for epoch in range(n_epochs): 
        for X_batch in dataset: 
            # phase 1 - training the discriminator
            noise = tf.random.normal(shape = [batch_size, codings_size]) 
            generated_images = generator(noise) 
            X_fake_and_real = tf.concat([generated_images, X_batch], axis = 0) 
            y1 = tf.constant([[ 0.]] * batch_size + [[ 1.]] * batch_size) 
            discriminator.trainable = True 
            discriminator.train_on_batch(X_fake_and_real, y1) 
            # phase 2 - training the generator 
            noise = tf.random.normal(shape = [batch_size, codings_size])
            y2 = tf.constant([[ 1.]] * batch_size)
            discriminator.trainable = False
            gan.train_on_batch(noise, y2)

In [11]:
train_gan(gan, dataset, batch_size, codings_size)

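InvalidArgumentError: cannot compute ConcatV2 as input #1(zero-based) was expected to be a float tensor but is a uint8 tensor [Op:ConcatV2] name: concat

This error is raised when X_train is still the raw uint8 array returned by load_data(): tf.concat cannot combine the generator's float32 images with uint8 pixels. Converting the images to float32 (e.g. X_train.astype(np.float32) / 255) before building the dataset resolves it.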