# Tutorial: Generative Adversarial Networks - Advanced Techniques
This tutorial is about Generative Models and **Generative Adversarial Networks** (**GANs**).
In this tutorial we will implement different types of GANs, which were proposed recently:
- Vanilla GAN - https://arxiv.org/abs/1406.2661
- Conditional GAN (auxiliary classifier variant, AC-GAN) - https://arxiv.org/abs/1610.09585
- Wasserstein GAN (WGAN-GP) - https://arxiv.org/abs/1704.00028
- Spectral Normalization SNGAN - https://arxiv.org/abs/1802.05957

and learn about further techniques to stabilize GAN training (DCGAN-style architectures, conditioning of the generator, ...).
We will look at three data sets (one from computer vision, two from physics):
- CIFAR10, learn more: https://www.cs.toronto.edu/~kriz/cifar.html
- Footprints of Air Showers, learn more: https://git.rwth-aachen.de/DavidWalz/airshower
- Calorimeter Images, learn more: https://doi.org/10.1007/s41781-018-0019-7

As our framework, we use TensorFlow:
- TensorFlow Keras (API shipped with TensorFlow) [learn more >>](https://keras.io/)
- TensorFlow-GAN (lightweight library for training GANs) [learn more >>](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/gan)

## Basics
### Generative models
Before we jump into the practical implementation of GANs, we need to introduce _Generative Models_.
Let us assume we have a collection of images that form the distribution of real images $P_{r}$.
In our case, the distribution consists of several classes: horses, airplanes, frogs, cars, etc.
Instead of training a classifier to label our data, we now would like to generate samples that are very
similar to samples drawn from $P_{r}$.
 
![CIFAR 10 Image](images/CIFAR10_collection.png)
Mathematically, we would like to approximate the real distribution $P_{r}$ with a model $P_{\theta}$.
With this *generative* model, we can then draw new samples from our approximation, $x \sim P_{\theta}$.
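A deliberately simple way to illustrate "approximate $P_{r}$ with $P_{\theta}$, then sample" is a one-dimensional toy example in NumPy. The "real" data and the Gaussian model family below are assumptions made purely for illustration; the parameters $\theta = (\mu, \sigma)$ are fitted by maximum likelihood. A GAN does not fit an explicit density like this, but the goal, new samples that look as if they came from $P_{r}$, is the same.

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in for P_r: hypothetical 1D "real" data (not from this tutorial).
x_real = rng.normal(loc=3.0, scale=0.5, size=10_000)

# Model P_theta: a Gaussian with theta = (mu, sigma), fitted by maximum likelihood
# (sample mean and the ddof=0 standard deviation are the ML estimates).
mu_hat, sigma_hat = x_real.mean(), x_real.std()

# Generate new samples x ~ P_theta.
x_new = rng.normal(loc=mu_hat, scale=sigma_hat, size=5)
print(mu_hat, sigma_hat, x_new)
```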

### Generative Adversarial Networks
The basic idea of generative adversarial networks is to train a **generator network** to learn the underlying distribution.<sup>[1](#myfootnote1)</sup>
In other words, we would like to design a generator machine that we can feed with noise and that outputs realistic samples following
the distribution of real images $P_{r}$, but which are not part of the training dataset.
So in our case, we would like to generate new samples of airplanes, cars, dogs, etc.
 
![Generator Machine](images/generator_machine.png)

The generator network $G(z)$ takes as input a noise vector $z$ sampled from a multidimensional noise distribution, $z \sim p(z)$.
This space of $z$ is often called the latent space. The generator then maps the noise vector $z$ into the data space (the space where our
real data samples lie): $\tilde{x} = G(z)$.
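To make the mapping $z \mapsto \tilde{x}$ concrete, here is a minimal NumPy sketch of an untrained generator. The single dense layer, the weight scale and the batch size of 16 are illustrative assumptions, not the architecture used later; only the latent dimension of 64 and the 28x28x1 image shape match the code of this tutorial.

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT_DIM = 64            # dimensionality of the latent space
IMAGE_SHAPE = (28, 28, 1)  # shape of one image in the data space
n_pixels = int(np.prod(IMAGE_SHAPE))

# Hypothetical, untrained generator weights: one dense layer followed by a sigmoid.
W = rng.normal(scale=0.1, size=(LATENT_DIM, n_pixels))
b = np.zeros(n_pixels)

def G(z):
    """Map latent vectors z of shape (batch, LATENT_DIM) into the data space (batch, 28, 28, 1)."""
    x = 1.0 / (1.0 + np.exp(-(z @ W + b)))  # sigmoid keeps pixel values in (0, 1)
    return x.reshape((-1,) + IMAGE_SHAPE)

z = rng.standard_normal((16, LATENT_DIM))  # z ~ p(z) = N(0, I)
x_tilde = G(z)
print(x_tilde.shape)
```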

To train the generator network, we need feedback on whether the generated samples are of good or bad quality.
Because a classical supervised loss cannot provide such feedback (there is no target image for a given noise vector), the generator is trained in an unsupervised manner.
So instead of using "mean squared error" or similar metrics, the performance measure is given by a **second** _adversarial_ neural network, called the discriminator.
This is the fascinating idea of _adversarial training_.

#### Adversarial training

Our adversarial framework consists of two networks:
- the generator network $G$ (learns the mapping from noise to images)
- the discriminator network $D$ (measures the image quality by discriminating whether images are real or fake)


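Figuratively speaking, the two networks play a game against each other: the generator tries to fool the discriminator with fake images, while the discriminator tries to tell real images from fake ones. This is the fascinating idea of GANs.
Formally, the discriminator $D_w$ (a network with weights $w$) is trained to maximize the objective $\mathcal{L}$ below, while the generator $G$ is trained to minimize it: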

 
$$\mathcal{L} = \mathbb{E}_{\mathbf{x} \sim p_{data}(\mathbf{x})} \left[\log D_w(\mathbf{x})\right] + \mathbb{E}_{\mathbf{z} \sim p_z(\mathbf{z})} \left[\log\left(1-D_w(G(\mathbf{z}))\right)\right]$$
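The objective can be estimated from a batch of discriminator outputs. The following minimal NumPy sketch assumes that $D_w$ outputs probabilities (e.g. via a sigmoid) and replaces the expectations by batch means. At the optimum of this game, as shown in the original GAN paper, the discriminator outputs $1/2$ everywhere and $\mathcal{L} = -\log 4$, which the second call reproduces.

```python
import numpy as np

def gan_value(d_real, d_fake, eps=1e-12):
    """Monte Carlo estimate of L = E_x[log D(x)] + E_z[log(1 - D(G(z)))].

    d_real: discriminator probabilities on real samples x ~ p_data
    d_fake: discriminator probabilities on generated samples G(z), z ~ p_z
    """
    d_real = np.clip(d_real, eps, 1 - eps)  # avoid log(0)
    d_fake = np.clip(d_fake, eps, 1 - eps)
    return np.mean(np.log(d_real)) + np.mean(np.log(1 - d_fake))

# A confident, correct discriminator: L is close to its maximum of 0.
print(gan_value(np.array([0.99, 0.98]), np.array([0.01, 0.02])))
# A discriminator that cannot tell real from fake (D = 1/2): L equals -log 4.
print(gan_value(np.full(8, 0.5), np.full(8, 0.5)), -np.log(4))
```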


<a name="myfootnote1">1</a>: In contrast to, e.g., _Variational Autoencoders_ (VAEs), the idea is not to fit the distribution explicitly but to train a generator that approximates the real distribution directly.
Remember that in VAEs we learn a mapping into a latent space where we can "fit" a Gaussian. Therefore, after training we can
generate new samples by sampling from the latent space using the Gaussian prior and decoding them.
Because most problems are too complex for a Gaussian to capture all modes, VAE-generated samples tend to be blurry, which is
a well-known problem of VAEs.


In [1]:
import numpy as np
import tensorflow as tf
from plotting import plot_images
import tutorial
layers = tf.layers  # TF 1.x API: tf.layers and tf.contrib were removed in TensorFlow 2.x
print("TensorFLow version", tf.__version__)


('TensorFLow version', '1.13.1')


Let's start by building a data pipeline.
First, we need to define a Python data generator (not to be confused with the generator network).
It yields pairs of noise vectors (input for the generator network) and real samples (input for the discriminator); in this example the real samples are MNIST digits, loaded via `tf.keras.datasets.mnist`.
The variable `LATENT_DIM` defines the dimensionality of the generator's latent space,
i.e. of the noise distribution we sample from.

In [5]:
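def generator(latent_dim):
    """Endless data generator yielding (noise, image) pairs.

    noise: input for the generator network; image: real MNIST sample for the discriminator.
    """
    (x_train, _), _ = tf.keras.datasets.mnist.load_data()  # load once, not in every epoch
    images = np.expand_dims(x_train, axis=-1).astype(np.float32) / 255.  # (60000, 28, 28, 1), values in [0, 1]
    n = len(images)
    while True:
        noise = np.random.randn(n, latent_dim).astype(np.float32)  # fresh Gaussian noise each epoch
        images = images[np.random.permutation(n)]  # reshuffle the real images each epoch
        for i in range(n):
            yield noise[i], images[i]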

Let us now check whether our data generator works:

In [6]:
noise, image = next(generator(64))  # draw a single (noise, image) pair
noise.shape, image.shape

((64,), (28, 28, 1))

To train our estimator, we create a `tf.data.Dataset` from our data generator.
The following function returns batches of this dataset.

In [7]:
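def batch_dataset(BATCH_SIZE, LATENT_DIM, data_generator):
    """Wrap the Python data generator in a tf.data.Dataset yielding batches of (noise, images)."""
    dataset = tf.data.Dataset.from_generator(
        lambda: data_generator(LATENT_DIM), output_types=(tf.float32, tf.float32),
        output_shapes=(tf.TensorShape((LATENT_DIM,)), tf.TensorShape((28, 28, 1))))
    # drop_remainder=True gives every batch a static size of BATCH_SIZE
    return dataset.batch(BATCH_SIZE, drop_remainder=True)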

For training GANs, we further need to define our generator and discriminator networks.
We start with the generator network, which maps from our noise space into the space of our images (LATENT_DIM --> IMAGE_DIM).

In [15]:
def generator_fn(x):
    """Generator network: maps noise (batch, LATENT_DIM) to images (batch, 28, 28, 1)."""
    x = layers.Dense(7 * 7 * 128, activation='relu')(x)
    x = tf.reshape(x, shape=[-1, 7, 7, 128])  # -1 infers the batch size instead of relying on a global
    x = layers.Conv2DTranspose(128, (5, 5), strides=(2, 2), padding='same', activation='relu')(x)  # 7x7 -> 14x14
    x = layers.Conv2DTranspose(64, (5, 5), strides=(2, 2), padding='same', activation='relu')(x)   # 14x14 -> 28x28
    x = layers.Conv2D(1, (5, 5), padding='same', activation='sigmoid')(x)  # pixel values in [0, 1], like the real images
    return x
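To see why the generator ends at 28x28x1 images, we can trace the spatial size through the layers of `generator_fn`. This small plain-Python helper (our own, not part of TensorFlow) assumes the Keras rule that a transposed convolution with `padding='same'` multiplies the spatial size by the stride, while the final stride-1 convolution keeps it unchanged.

```python
def trace_generator_shapes(start=(7, 7, 128)):
    """Return (height, width, channels) after each layer of generator_fn, assuming padding='same'."""
    h, w, _ = start
    shapes = [start]
    # (filters, stride) of Conv2DTranspose, Conv2DTranspose and the final Conv2D in generator_fn
    for filters, stride in [(128, 2), (64, 2), (1, 1)]:
        h, w = h * stride, w * stride  # 'same' padding: output size = input size * stride
        shapes.append((h, w, filters))
    return shapes

print(trace_generator_shapes())
```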

After defining the generator network, we now implement the discriminator.
The task of the discriminator is to measure the similarity between the fake images (output of the generator) and the real images.
So the network maps from the image space to a single scalar per image, which lets us measure the 'distance' between the distributions of real and generated images (IMAGE_DIM --> 1).

In [9]:
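def discriminator_fn(x, generator_inputs):
    """Discriminator (critic) network: maps images (batch, 28, 28, 1) to one unbounded score per image.

    TF-GAN calls discriminator_fn(data, generator_inputs); the generator inputs are unused here.
    """
    # No activation inside the Conv2D layers: a ReLU before the leaky ReLU would zero all
    # negative values and make the leaky ReLU ineffective.
    x = layers.Conv2D(32, (5, 5), padding='same', strides=(2, 2))(x)
    x = tf.nn.leaky_relu(x, 0.2)
    x = layers.Conv2D(64, (5, 5), padding='same', strides=(2, 2))(x)
    x = tf.nn.leaky_relu(x, 0.2)
    x = layers.Conv2D(128, (5, 5), padding='same', strides=(2, 2))(x)
    x = tf.nn.leaky_relu(x, 0.2)
    x = layers.Flatten()(x)
    x = layers.Dense(256)(x)
    x = tf.nn.leaky_relu(x, 0.2)
    x = layers.Dense(1)(x)  # linear output: the Wasserstein critic uses no sigmoid
    return x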
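Analogously, the shapes inside `discriminator_fn` follow from the rule that a convolution with `padding='same'` produces `ceil(size / stride)` outputs per spatial dimension. The plain-Python helper below (our own illustration) also gives the number of features entering the `Dense(256)` layer after `Flatten`.

```python
import math

def trace_discriminator_shapes(start=(28, 28, 1)):
    """Return (height, width, channels) after each strided Conv2D of discriminator_fn."""
    h, w, _ = start
    shapes = [start]
    for filters in (32, 64, 128):  # three Conv2D layers, each with strides=(2, 2)
        h, w = math.ceil(h / 2), math.ceil(w / 2)  # 'same' padding: ceil(size / stride)
        shapes.append((h, w, filters))
    return shapes

shapes = trace_discriminator_shapes()
flat = shapes[-1][0] * shapes[-1][1] * shapes[-1][2]  # features after Flatten
print(shapes, flat)
```

Note that 7 / 2 is rounded up to 4, so the last convolution operates on a zero-padded 7x7 input.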

Since the Wasserstein-1 distance remains a meaningful distance measure even for distributions with disjoint supports, let's use it as the objective of our GAN training.
We can easily make use of the losses predefined in `tf.contrib.gan`.
There are two common ways to enforce the constraint needed to construct the Wasserstein distance:
- clip the weights
- penalize the gradient

(Intuitively: the discriminator needs a constraint so that it can be trained to convergence. Otherwise, it could focus on a single feature that differs between real and fake samples and would not converge.)
Weight clipping, as used in the original WGAN, heavily reduces the capacity of the discriminator, which is unfavourable.
So let us use the gradient penalty (https://arxiv.org/abs/1704.00028):
by penalizing gradient norms of the discriminator that are larger than 1, we enforce the 1-Lipschitz constraint needed to construct the Wasserstein distance via the Kantorovich-Rubinstein duality (https://cedricvillani.org/wp-content/uploads/2012/08/preprint-1.pdf). The penalty is evaluated at random interpolates between real and generated samples.
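The gradient penalty can be sketched in NumPy. To avoid automatic differentiation, we assume a hypothetical one-hidden-layer critic $D(x) = v^\top \tanh(W^\top x)$ whose input gradient we write out by hand; in the tutorial, TF-GAN computes this gradient for the real discriminator network. The interpolation and the one-sided penalty follow the description above, while the toy data and weights are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

def critic(x, W, v):
    """Hypothetical toy critic D(x) = tanh(x W) . v, used only to illustrate the penalty."""
    return np.tanh(x @ W) @ v

def critic_input_gradient(x, W, v):
    """Analytic gradient of the toy critic with respect to its input, one row per sample."""
    h = np.tanh(x @ W)
    return ((1.0 - h ** 2) * v) @ W.T

def gradient_penalty(x_real, x_fake, W, v, one_sided=True, epsilon=1e-10):
    """WGAN-GP style penalty on random interpolates between real and fake samples."""
    alpha = rng.uniform(size=(x_real.shape[0], 1))         # one mixing factor per sample
    x_hat = x_real + alpha * (x_fake - x_real)             # points between real and fake samples
    grads = critic_input_gradient(x_hat, W, v)
    norms = np.sqrt(np.sum(grads ** 2, axis=1) + epsilon)  # ||grad_x D(x_hat)||_2
    excess = norms - 1.0
    if one_sided:
        excess = np.maximum(0.0, excess)                    # only norms above 1 are penalized
    return np.mean(excess ** 2)

# Usage with random toy data (8 samples, 4 features) and random critic weights.
x_real = rng.normal(size=(8, 4))
x_fake = rng.normal(loc=2.0, size=(8, 4))
W, v = rng.normal(size=(4, 3)), rng.normal(size=3)
GP = 10  # penalty weight, same role as GP in this tutorial
critic_loss = (critic(x_fake, W, v).mean() - critic(x_real, W, v).mean()
               + GP * gradient_penalty(x_real, x_fake, W, v))
print(critic_loss)
```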

In [10]:
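def discrimintator_loss(model, add_summaries=True):
    """WGAN-GP discriminator (critic) loss: Wasserstein loss plus a weighted gradient penalty."""
    # mean critic score on generated samples minus mean critic score on real samples
    loss = tf.contrib.gan.losses.wasserstein_discriminator_loss(model, add_summaries=add_summaries)
    # one-sided penalty on gradient norms above 1; GP is the weight defined in the parameter cell
    gp_loss = GP * tf.contrib.gan.losses.wasserstein_gradient_penalty(model, epsilon=1e-10, one_sided=True, add_summaries=add_summaries)
    loss += gp_loss

    if add_summaries:
        tf.summary.scalar('discriminator_loss', loss)

    return loss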
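Written out, the loss assembled in `discrimintator_loss`, together with the generator loss `tfgan.losses.wasserstein_generator_loss` used in the estimator, should correspond to the following, with $\lambda$ = `GP` and the notation of the GAN objective above. This is our reading of the WGAN-GP paper and the TF-GAN loss functions; as far as we know, TF-GAN adds the small `epsilon` inside the square root of the norm for numerical stability.

$$\mathcal{L}_D = \mathbb{E}_{\mathbf{z} \sim p_z(\mathbf{z})}\left[D_w(G(\mathbf{z}))\right] - \mathbb{E}_{\mathbf{x} \sim p_{data}(\mathbf{x})}\left[D_w(\mathbf{x})\right] + \lambda\, \mathbb{E}_{\hat{\mathbf{x}}}\left[\max\left(0, \left\|\nabla_{\hat{\mathbf{x}}} D_w(\hat{\mathbf{x}})\right\|_2 - 1\right)^2\right]$$

$$\hat{\mathbf{x}} = \mathbf{x} + \alpha\left(G(\mathbf{z}) - \mathbf{x}\right), \qquad \alpha \sim U[0, 1]$$

$$\mathcal{L}_G = -\mathbb{E}_{\mathbf{z} \sim p_z(\mathbf{z})}\left[D_w(G(\mathbf{z}))\right]$$

The one-sided $\max(0, \cdot)$ corresponds to `one_sided=True`; the original WGAN-GP paper uses the two-sided penalty $\left(\|\nabla_{\hat{\mathbf{x}}} D_w(\hat{\mathbf{x}})\|_2 - 1\right)^2$.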

After defining our loss, we can choose the training parameters:

In [11]:
BATCH_SIZE = 32  # number of samples fed into the framework in each iteration
LATENT_DIM = 64  # dimension of the generator's latent space
GEN_LR = 0.001   # learning rate of the generator
DIS_LR = 0.0001  # learning rate of the discriminator
ITER = 1000      # number of training steps of the framework
LOG_DIR = "."    # directory of the estimator (to save the graph and checkpoints)
dir = tutorial.make_dir(LOG_DIR, "WGAN_GP")
GP = 10          # weight of the gradient penalty (larger values enforce the Lipschitz constraint more strongly)
N_CRIT = 5       # number of critic (discriminator) iterations per generator iteration

Now we can easily implement our framework as an estimator using TF-GAN (`tf.contrib.gan`).
This greatly simplifies the training procedure.

In [16]:
tfgan = tf.contrib.gan
gan_estimator = tfgan.estimator.GANEstimator(
    dir,
    generator_fn=generator_fn,
    discriminator_fn=discriminator_fn,
    generator_loss_fn=tfgan.losses.wasserstein_generator_loss,
    discriminator_loss_fn=discrimintator_loss,
    generator_optimizer=tf.train.AdamOptimizer(GEN_LR, 0.5),
    discriminator_optimizer=tf.train.AdamOptimizer(DIS_LR, 0.5),
    get_hooks_fn=tfgan.get_sequential_train_hooks(tfgan.GANTrainSteps(1, N_CRIT)),
    config=tf.estimator.RunConfig(save_summary_steps=10, keep_checkpoint_max=1, save_checkpoints_steps=200),
    use_loss_summaries=True)

INFO:tensorflow:Using config: {'_save_checkpoints_secs': None, '_num_ps_replicas': 0, '_keep_checkpoint_max': 1, '_task_type': 'worker', '_global_id_in_cluster': 0, '_is_chief': True, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7feb89959890>, '_model_dir': './WGAN_GP_train_2019-04-04_17:03:06', '_protocol': None, '_save_checkpoints_steps': 200, '_keep_checkpoint_every_n_hours': 10000, '_service': None, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_tf_random_seed': None, '_save_summary_steps': 10, '_device_fn': None, '_experimental_distribute': None, '_num_worker_replicas': 1, '_task_id': 0, '_log_step_count_steps': 100, '_evaluation_master': '', '_eval_distribute': None, '_train_distribute': None, '_master': ''}
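`GANTrainSteps(1, N_CRIT)` requests one generator update and `N_CRIT` discriminator updates per training step. The plain-Python sketch below, a hypothetical stand-in for TF-GAN's sequential train hooks, only enumerates the schedule: assuming each of the `ITER` estimator steps runs one such round, training performs `ITER` generator and `ITER * N_CRIT` critic updates. The critic-first order follows Algorithm 1 of the WGAN-GP paper; the exact order inside TF-GAN's hooks may differ.

```python
from collections import Counter

def sequential_schedule(num_steps, gen_steps=1, dis_steps=5):
    """Yield (network, step) pairs of a sequential GAN update schedule (illustrative sketch)."""
    for step in range(num_steps):
        for _ in range(dis_steps):  # critic updates first, as in WGAN-GP's Algorithm 1
            yield ("discriminator", step)
        for _ in range(gen_steps):  # then the generator update(s)
            yield ("generator", step)

ITER, N_CRIT = 1000, 5
counts = Counter(network for network, _ in sequential_schedule(ITER, gen_steps=1, dis_steps=N_CRIT))
print(counts)
```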


Let us train our framework using `gan_estimator` and our data pipeline:

In [21]:
gan_estimator.train(lambda: batch_dataset(BATCH_SIZE, LATENT_DIM, generator), steps=ITER)

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into ./WGAN_GP_train_2019-04-04_17:03:06/model.ckpt.
INFO:tensorflow:loss = 0.041437, step = 1
INFO:tensorflow:global_step/sec: 0.434456
INFO:tensorflow:loss = -5.485588, step = 101 (230.182 sec)
INFO:tensorflow:Saving checkpoints for 200 into ./WGAN_GP_train_2019-04-04_17:03:06/model.ckpt.
INFO:tensorflow:global_step/sec: 0.427456
INFO:tensorflow:loss = 1.0505079, step = 201 (233.934 sec)
INFO:tensorflow:global_step/sec: 0.388586
INFO:tensorflow:loss = 0.48207843, step = 301 (257.343 sec)
INFO:tensorflow:Saving checkpoints for 400 into

<tensorflow.contrib.gan.python.estimator.python.gan_estimator_impl.GANEstimator at 0x7feb89959e10>