
![Portrait of Edmond de Belamy](edmond.png "Portrait of Edmond de Belamy")

# Programming Imagination: An introduction to Generative Adversarial Networks with Python

Christie’s auctioneers in New York City recently made international headlines for selling a portrait of Edmond de Belamy, who appears to be of the French aristocracy, for a sum of $432,500. This isn't an unusual feat for an auction house that caters specifically to those of more discerning tastes. What makes this portrait stand out isn't just the painting style or its masterful use of watercolors; the artist that created this masterpiece is a neural network developed by a team in France. It's almost as if this network has an imagination of its own, so that's what we'll dip into today: building imaginary works and features with GANs.

Generative Adversarial Networks (referred to as GANs for brevity) are one of the hottest topics in machine learning, due to what they can accomplish with training data and what they can output. Rather than being a purely discriminative network that maps features onto labels after it learns the underlying patterns of the data, a GAN works in the opposite direction: it can generate *possible* features for a given label.

## The method behind the madness of GANs

At a high level, this type of neural network architecture is all about generating new data and validating that newly generated data. Let's go over the parts of this system and their specific duties:

- The **generator network** learns the features of the training data and generates new instances that resemble it. In practice it starts from random noise and only learns about the real data through the discriminator's feedback
    - If we were using the Titanic data set, it would create a whole new passenger who was never on the Titanic
    - If we used the MNIST data set, it would generate new handwritten digits that do not appear anywhere in the data set
    - This newly created data is then fed into the discriminator network along with samples from the actual data set

- The **discriminator network** is a more traditional neural network that takes the outputs the generator network created, together with real samples, and assigns each input a probability of being real or fake (a value between 0 and 1)

This is the equivalent of having an art critic and a counterfeit art maker in competition with each other.
The counterfeit art maker (the generator) initially only has access to the real paintings and attempts to make paintings that are as close to those as possible, with a few changes for originality (imagine someone "finding" a Rembrandt that doesn't appear in his known catalogue and attempting to put it up for auction).

The art critic (the discriminator) appraises the paintings, both real and fake, that are submitted to an auction house and, through expertise built up from studying many samples of work, attempts to make sure that fakes don't make it to auction.
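In code terms, the two roles described above might look roughly like the sketch below. It uses NumPy only, and the "networks" are hypothetical, untrained stand-ins with random weights (the names `generator`, `discriminator`, `G_weights` and `D_weights` are illustrative assumptions, not part of any library). The point is only to show what goes in and what comes out of each side.

```python
import numpy as np

rng = np.random.default_rng(0)

latent_dim, data_dim, batch_size = 8, 4, 5

# Hypothetical, untrained stand-ins: random weights instead of learned ones.
G_weights = rng.normal(size=(latent_dim, data_dim))
D_weights = rng.normal(size=(data_dim, 1))

def generator(noise):
    """Map latent noise vectors to fake samples with values in [-1, 1]."""
    return np.tanh(noise @ G_weights)

def discriminator(samples):
    """Map samples to a probability of being real, between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-(samples @ D_weights)))

# The counterfeiter: turns random noise into fake samples.
noise = rng.normal(size=(batch_size, latent_dim))
fake_samples = generator(noise)

# Placeholder "real" data, made up purely for illustration.
real_samples = rng.uniform(-1, 1, size=(batch_size, data_dim))

# The critic: sees both kinds of sample, with targets 1 (real) and 0 (fake).
batch = np.concatenate([real_samples, fake_samples])
targets = np.concatenate([np.ones((batch_size, 1)), np.zeros((batch_size, 1))])
scores = discriminator(batch)

print(scores.shape, targets.shape)
```

Because nothing is trained here, the scores are meaningless as judgments; training is what pushes them toward 1 for real samples and toward 0 for fakes.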

Over a series of training epochs, both networks get better at their respective jobs, each minimizing its own loss at the other's expense until a happy medium is reached.

The mathematical formula for this is the following:



![alt text](minmaxDG.png "generator discriminator formula")



D is the discriminator and G is the generator. This formula describes how the loss functions of each contribute to the training of the network.
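For readers who cannot see the image, the objective can be written out as follows, assuming the figure shows the standard minimax value function from the original GAN formulation (the symbols `p_data` for the real data distribution and `p_z` for the noise distribution are that formulation's notation, not something defined elsewhere in this post):

```latex
\min_G \max_D V(D, G) =
    \mathbb{E}_{x \sim p_{\text{data}}(x)} \left[ \log D(x) \right]
  + \mathbb{E}_{z \sim p_z(z)} \left[ \log \left( 1 - D(G(z)) \right) \right]
```

The first term rewards the discriminator for scoring real samples close to 1, and the second rewards it for scoring generated samples close to 0. The generator only influences the second term, and it tries to make that term as small as possible by pushing D(G(z)) toward 1.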

## Best Practices in handling data and training GANs

Luckily for us Python users, we have access to Keras's Sequential models for building and training our neural networks. This is important because the generative and discriminator networks need to be trained one after another inside a for-loop, generally in the same function or part of the script (it's good practice to place these in an OO-style class for simplicity, but those working in languages such as Scala might be used to a more functional approach). The reason for this is that each network should train against a static adversary: while one network is being updated, the other is held fixed, so the inputs and outputs being trained on are valid for the single loop iteration in which the two are pitted against each other.

The order of operations that has yielded consistent results is to train the discriminator first, so the generative network can better capture the gradient that the discriminator uses to classify results as fake or real.
This allows the generative network to learn not just how to generate better data but also how to "fool" the discriminator.

Just like any other neural network architecture, one must be careful of learning rates that are too high or too low, but in this case this must be accounted for on both networks. For example, too high a learning rate for the generative network combined with too low a learning rate for the discriminator network would result in the generator far outclassing the discriminator, to the point where the discriminator returns false negatives: anything outside of what it "knows" from the training data will be labeled as fake. This is the equivalent of having an almost perfect counterfeit in the style of a painter, but with a single stroke out of place from what is deemed normal for that painter, and therefore being automatically labeled as fake.
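To make the alternating schedule and the two learning rates concrete, here is a minimal sketch of a one-dimensional toy GAN written with NumPy and hand-derived gradients. Everything in it is an assumption made for illustration: the function name `train_toy_gan`, the "real" data drawn from a normal distribution centred on 4, the linear generator `G(z) = a * z + b`, and the logistic discriminator `D(x) = sigmoid(w * x + c)`. It is not how Keras trains the models in the listing further down; it only shows the discriminator being updated first while the generator is frozen, then the generator being updated while the discriminator is frozen, each with its own learning rate.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def train_toy_gan(steps=200, batch_size=64, lr_d=0.05, lr_g=0.05, seed=0):
    """Alternate discriminator and generator updates on a 1-D toy problem."""
    rng = np.random.default_rng(seed)
    a, b = 1.0, 0.0   # generator parameters: G(z) = a * z + b
    w, c = 0.0, 0.0   # discriminator parameters: D(x) = sigmoid(w * x + c)
    history = []

    for _ in range(steps):
        # 1) Discriminator step: a and b are frozen, only w and c change.
        real = rng.normal(4.0, 1.0, batch_size)   # illustrative "real" data
        z = rng.normal(0.0, 1.0, batch_size)
        fake = a * z + b
        d_real = sigmoid(w * real + c)
        d_fake = sigmoid(w * fake + c)
        # Gradients of -log D(real) - log(1 - D(fake)) with respect to w and c.
        grad_w = np.mean((d_real - 1.0) * real) + np.mean(d_fake * fake)
        grad_c = np.mean(d_real - 1.0) + np.mean(d_fake)
        w -= lr_d * grad_w
        c -= lr_d * grad_c

        # 2) Generator step: w and c are frozen, only a and b change.
        z = rng.normal(0.0, 1.0, batch_size)
        fake = a * z + b
        d_fake = sigmoid(w * fake + c)
        # Gradients of -log D(G(z)), i.e. "make the critic say real".
        grad_a = np.mean((d_fake - 1.0) * w * z)
        grad_b = np.mean((d_fake - 1.0) * w)
        a -= lr_g * grad_a
        b -= lr_g * grad_b

        history.append((a, b, w, c))

    return np.array(history)

history = train_toy_gan()
print("final generator (a, b):", history[-1, :2])
print("final discriminator (w, c):", history[-1, 2:])
```

The `lr_d` and `lr_g` arguments can be changed independently, which makes this a convenient place to experiment with mismatched learning rates before trying the same thing on a full-sized network.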


The following is a basic example of GAN architecture taken from the GitHub of Erik Linder-Norén, an up-and-coming research scientist out of Stockholm, Sweden. As discussed earlier, the usual program design is that of a Python class that handles building the networks, training them, and saving the sample images.

```python
from __future__ import print_function, division

from keras.datasets import mnist
from keras.layers import Input, Dense, Reshape, Flatten, Dropout
from keras.layers import BatchNormalization, Activation, ZeroPadding2D
from keras.layers.advanced_activations import LeakyReLU
from keras.layers.convolutional import UpSampling2D, Conv2D
from keras.models import Sequential, Model
from keras.optimizers import Adam

import matplotlib.pyplot as plt

import os
import sys

import numpy as np

class GAN():
    def __init__(self):
        self.img_rows = 28
        self.img_cols = 28
        self.channels = 1
        self.img_shape = (self.img_rows, self.img_cols, self.channels)
        self.latent_dim = 100

        optimizer = Adam(0.0002, 0.5)

        # Build and compile the discriminator
        self.discriminator = self.build_discriminator()
        self.discriminator.compile(loss='binary_crossentropy',
            optimizer=optimizer,
            metrics=['accuracy'])

        # Build the generator
        self.generator = self.build_generator()

        # The generator takes noise as input and generates imgs
        z = Input(shape=(self.latent_dim,))
        img = self.generator(z)

        # For the combined model we will only train the generator
        self.discriminator.trainable = False

        # The discriminator takes generated images as input and determines validity
        validity = self.discriminator(img)

        # The combined model  (stacked generator and discriminator)
        # Trains the generator to fool the discriminator
        self.combined = Model(z, validity)
        self.combined.compile(loss='binary_crossentropy', optimizer=optimizer)


    def build_generator(self):

        model = Sequential()

        model.add(Dense(256, input_dim=self.latent_dim))
        model.add(LeakyReLU(alpha=0.2))
        model.add(BatchNormalization(momentum=0.8))
        model.add(Dense(512))
        model.add(LeakyReLU(alpha=0.2))
        model.add(BatchNormalization(momentum=0.8))
        model.add(Dense(1024))
        model.add(LeakyReLU(alpha=0.2))
        model.add(BatchNormalization(momentum=0.8))
        model.add(Dense(np.prod(self.img_shape), activation='tanh'))
        model.add(Reshape(self.img_shape))

        model.summary()

        noise = Input(shape=(self.latent_dim,))
        img = model(noise)

        return Model(noise, img)

    def build_discriminator(self):

        model = Sequential()

        model.add(Flatten(input_shape=self.img_shape))
        model.add(Dense(512))
        model.add(LeakyReLU(alpha=0.2))
        model.add(Dense(256))
        model.add(LeakyReLU(alpha=0.2))
        model.add(Dense(1, activation='sigmoid'))
        model.summary()

        img = Input(shape=self.img_shape)
        validity = model(img)

        return Model(img, validity)

    def train(self, epochs, batch_size=128, sample_interval=50):

        # Load the dataset
        (X_train, _), (_, _) = mnist.load_data()

        # Rescale -1 to 1
        X_train = X_train / 127.5 - 1.
        X_train = np.expand_dims(X_train, axis=3)

        # Adversarial ground truths
        valid = np.ones((batch_size, 1))
        fake = np.zeros((batch_size, 1))

        for epoch in range(epochs):

            # ---------------------
            #  Train Discriminator
            # ---------------------

            # Select a random batch of images
            idx = np.random.randint(0, X_train.shape[0], batch_size)
            imgs = X_train[idx]

            noise = np.random.normal(0, 1, (batch_size, self.latent_dim))

            # Generate a batch of new images
            gen_imgs = self.generator.predict(noise)

            # Train the discriminator
            d_loss_real = self.discriminator.train_on_batch(imgs, valid)
            d_loss_fake = self.discriminator.train_on_batch(gen_imgs, fake)
            d_loss = 0.5 * np.add(d_loss_real, d_loss_fake)

            # ---------------------
            #  Train Generator
            # ---------------------

            noise = np.random.normal(0, 1, (batch_size, self.latent_dim))

            # Train the generator (to have the discriminator label samples as valid)
            g_loss = self.combined.train_on_batch(noise, valid)

            # Plot the progress
            print("%d [D loss: %f, acc.: %.2f%%] [G loss: %f]" % (epoch, d_loss[0], 100*d_loss[1], g_loss))

            # If at save interval => save generated image samples
            if epoch % sample_interval == 0:
                self.sample_images(epoch)

    def sample_images(self, epoch):
        r, c = 5, 5
        noise = np.random.normal(0, 1, (r * c, self.latent_dim))
        gen_imgs = self.generator.predict(noise)

        # Rescale images 0 - 1
        gen_imgs = 0.5 * gen_imgs + 0.5

        fig, axs = plt.subplots(r, c)
        cnt = 0
        for i in range(r):
            for j in range(c):
                axs[i,j].imshow(gen_imgs[cnt, :,:,0], cmap='gray')
                axs[i,j].axis('off')
                cnt += 1

        # Added here: create the output folder if needed, otherwise savefig fails
        if not os.path.isdir("images"):
            os.makedirs("images")
        fig.savefig("images/%d.png" % epoch)
        plt.close()


if __name__ == '__main__':
    gan = GAN()
    gan.train(epochs=30000, batch_size=32, sample_interval=200)
```
 

The run shown here was interrupted manually, so only the start of the output, the discriminator's layer summary, appears:

```
Using TensorFlow backend.

_________________________________________________________________
Layer (type)                 Output Shape              Param #
flatten_1 (Flatten)          (None, 784)               0
_________________________________________________________________
dense_1 (Dense)              (None, 512)               401920
_________________________________________________________________
leaky_re_lu_1 (LeakyReLU)    (None, 512)               0
_________________________________________________________________
dense_2 (Dense)              (None, 256)               131328
_________________________________________________________________
leaky_re_lu_2 (LeakyReLU)    (None, 256)               0
_________________________________________________________________
dense_3 (Dense)              (None, 1)                 257
Total params: 533,505
Trainable params: 533,505
Non-trainable params: 0
_________________________________________________________________
```
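Both models in the listing are compiled with `binary_crossentropy`, and the `train` method feeds them the `valid` (all ones) and `fake` (all zeros) targets. The sketch below reproduces that loss arithmetic by hand with the standard library only. The helper name `binary_crossentropy` and the discriminator outputs are made up for illustration and are not taken from a real training run; Keras's own implementation may differ in details such as the clipping constant.

```python
import math

def binary_crossentropy(targets, predictions, eps=1e-7):
    """Mean binary cross-entropy between 0/1 targets and predicted probabilities."""
    total = 0.0
    for t, p in zip(targets, predictions):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(targets)

# Made-up discriminator outputs, for illustration only.
d_on_real = [0.9, 0.8, 0.7]   # D(x) for real images
d_on_fake = [0.2, 0.1, 0.4]   # D(G(z)) for generated images

# Discriminator: real images should score 1, generated images should score 0.
d_loss_real = binary_crossentropy([1, 1, 1], d_on_real)
d_loss_fake = binary_crossentropy([0, 0, 0], d_on_fake)
d_loss = 0.5 * (d_loss_real + d_loss_fake)  # same averaging as in the listing

# Generator: trained with "valid" targets on its own fakes, so it is
# penalised whenever the discriminator scores them close to 0.
g_loss = binary_crossentropy([1, 1, 1], d_on_fake)

print("d_loss:", d_loss)
print("g_loss:", g_loss)
```

With these numbers the discriminator is doing well, so its loss is small while the generator's is large. In the listing, `d_loss` also carries an accuracy value because the discriminator is compiled with `metrics=['accuracy']`; this sketch covers the loss part only.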