<a href="https://colab.research.google.com/github/Aladoro/tutorials/blob/master/GenerativeAdversarialNetworks/GAN_tutorial_notebook.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Generative Adversarial Networks - Tutorial 1

Purpose: we want to obtain a *distributional representation* of a given dataset, in the form of a sampler.

**Input:** The dataset we want to represent.

**Output:** A (neural network) model (*the generator*) that can generate new samples based on the structural consistencies of our dataset samples.



Current capabilities of GANs:

https://www.youtube.com/watch?v=6E1_dgYlifc

## A step backwards, **the original paper**:



![alt text](https://drive.google.com/uc?id=1VQtjYYJ95V7flzsik368_Nu25lrEU_XA)

![alt text](https://drive.google.com/uc?id=1RgfNOgDF-6qLNBC_lJcNt3wsxu0SX7Sv)

## Key equation:

![alt text](https://drive.google.com/uc?id=11Tc_OAZEbvhFag7fh0SAADTzxXY_RjV8)
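
For reference, the value function from the original paper (Goodfellow et al. 2014) can be written out in LaTeX as below. Here $p_{data}$ denotes the real data distribution and $p_z$ the fixed noise distribution that the generator samples from:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]$$

The first term rewards **D** for scoring real samples highly. The second term rewards **D** for scoring generated samples low, and it is the only term that depends on **G**.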

### **Main concept of GANs:**

We train 2 Networks:


1.   A Generator **G**:


*   Takes as input random variables **z** from a fixed noise distribution **P(z)**
*   Outputs generated samples **G(z)**, which we want to be close to our real data distribution **p_data(x)** (from our dataset)

---

2.   A Discriminator **D**:


*   Takes as input samples either from our dataset, **x_real**, or generated by **G**, **x_gen**
*   Outputs a probability **D(x)** for each sample, ideally the probability that its input **x** came from our dataset rather than from **G**


> **D** is trained as a classifier with the cross-entropy loss, to assign a high score to real samples and a low score to generated samples.


> **G** is trained adversarially to fool the discriminator, i.e. to obtain higher scores for its generated samples.

Hence, 2 neural networks are pitted against each other: a classifier (the discriminator) and a generator.
The classifier improves by learning to tell real samples from generated ones; the generator improves by learning to bring its samples closer to the real samples (in order to fool the classifier).
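
The two training signals described above can be sketched in plain NumPy, operating directly on discriminator outputs. This is a minimal illustration rather than the implementation used later in this notebook: the function names, the `eps` stabiliser and the example probabilities are all assumptions made for the sketch.

```python
import numpy as np

def discriminator_loss(d_real, d_fake, eps=1e-8):
    """Binary cross-entropy: real samples are labelled 1, generated samples 0."""
    d_real = np.asarray(d_real, dtype=np.float64)
    d_fake = np.asarray(d_fake, dtype=np.float64)
    return -(np.mean(np.log(d_real + eps)) + np.mean(np.log(1.0 - d_fake + eps)))

def generator_minimax_loss(d_fake, eps=1e-8):
    """Original generator objective: minimise log(1 - D(G(z)))."""
    d_fake = np.asarray(d_fake, dtype=np.float64)
    return np.mean(np.log(1.0 - d_fake + eps))

# Hypothetical discriminator outputs, made up for illustration.
confident_d = discriminator_loss(d_real=[0.9, 0.95], d_fake=[0.1, 0.05])
fooled_d = discriminator_loss(d_real=[0.5, 0.5], d_fake=[0.5, 0.5])
print('confident discriminator loss:', confident_d)
print('fooled discriminator loss:', fooled_d)
```

A discriminator that separates the two sets well gets a low loss, while one that outputs 0.5 everywhere sits at about 2·log 2. The generator loss decreases as the discriminator's scores on generated samples increase.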












![alt text](https://drive.google.com/uc?id=1w4DPCzn1AfUsbp4x_AeWAmIP2hgGPYpm)

## Algorithm

![alt text](https://drive.google.com/uc?id=1HppTTWvwQHa_9BV1zy42fjdT-RVH5s6d)

Samples produced by the original GAN work were good for the time (2014), but are nowhere near comparable to the ones produced by today's models.




![alt text](https://drive.google.com/uc?id=12ydeEfQuGlJ5DlVb0j_v0anMy9W3-f4r)

## Further major breakthrough: **DCGAN** (2015)



![alt text](https://drive.google.com/uc?id=1d2CMtD48BP-7SjmCli4I8D0V45cgArzc)

This work introduced and/or popularized many practices that became standard in GAN training for later works:



> 1. **Discriminator** and **Generator** architectural practices:




![alt text](https://drive.google.com/uc?id=1_a4uL1nSqLqs8Rnp7da4r_8eZ5JORdDT)



> 2. Non-saturating **Generator** loss:






![alt text](https://drive.google.com/uc?id=1Wm2MAlU-2VK5VV6bGyihB9_aWOEhroWr)

### Different **Generator** objectives visualizations:

Minimax = the originally proposed objective

![alt text](https://drive.google.com/uc?id=1oJGt5vWmSrVIy4LEX1sjw-WPXiS-3HMm)
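
To make the difference between the two generator objectives concrete, the sketch below compares their gradients with respect to the discriminator's pre-sigmoid output (its logit) on a generated sample. It assumes the discriminator ends in a sigmoid; the helper names are hypothetical and chosen only for this example.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def minimax_grad(logit):
    """Gradient of log(1 - sigmoid(a)) with respect to a, which is -sigmoid(a)."""
    return -sigmoid(logit)

def non_saturating_grad(logit):
    """Gradient of -log(sigmoid(a)) with respect to a, which is -(1 - sigmoid(a))."""
    return -(1.0 - sigmoid(logit))

# Discriminator logits on generated samples: very negative = confidently "fake".
logits = np.array([-6.0, 0.0, 6.0])
for a in logits:
    print('D(G(z))={:.4f}  minimax grad={:+.4f}  non-saturating grad={:+.4f}'.format(
        sigmoid(a), minimax_grad(a), non_saturating_grad(a)))
```

When the discriminator confidently rejects a generated sample (D(G(z)) close to 0), the minimax gradient vanishes while the non-saturating one stays close to -1. That is the situation early in training, when the generator most needs a useful signal.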

### Comparison on *MNIST* dataset:

![alt text](https://drive.google.com/uc?id=1uA4RsuTpzy_6no_0iNYhS0KbzBRCSFWJ)

## Implementation on MNIST

### Train utils

In [0]:
import matplotlib.pyplot as plt
import numpy as np

In [0]:
def display_samples(samples, images_per_row=5):
  float_samples = samples
  number_samples = samples.shape[0]
  grid_rows = int(np.ceil(number_samples/images_per_row))
  plt.figure(figsize=(images_per_row, grid_rows))
  for i in range(number_samples):
    plt.subplot(grid_rows, images_per_row, i+1)
    plt.imshow(float_samples[i])
    plt.axis('off')
  plt.tight_layout()
  plt.show()

def get_random_batch(dataset, batch_size):
  indices = np.random.randint(dataset.shape[0], size=batch_size)
  return dataset[indices]

def train_epoch(model, dataset, batch_size, updates_per_epoch=1000,
                disc_steps_per_update=1, gen_steps_per_update=1, log_every=100,
                lr_to_decay=None):
  gen_losses = []
  disc_losses = []
  for i in range(updates_per_epoch):
    for step in range(disc_steps_per_update):
      batch = get_random_batch(dataset, batch_size)
      disc_loss = model.step_discriminator(batch, batch_size)
      disc_losses.append(disc_loss)
    for step in range(gen_steps_per_update):
      gen_loss = model.step_generator(batch_size)
      gen_losses.append(gen_loss)
    if lr_to_decay is not None:
      for lr in lr_to_decay:
        lr.decay_lr()
    if i % log_every == 0:
      print('Update {}/{}'.format(i+1, updates_per_epoch))
      print('Discriminator loss: {}; Generator loss: {}'.format(
          disc_loss, gen_loss
      ))
      if lr_to_decay is not None:
        for j, lr in enumerate(lr_to_decay):
          print('Current learning rate {}: {}'.format(j, lr()))
  return gen_losses, disc_losses

def train_gan(model, data, batch_size, epochs, samples_to_display=0, 
              disc_steps_per_update=1, gen_steps_per_update=1, 
              log_every=100, tanh_outputs=False, lr_to_decay=None):
  # Run one forward pass so all weights are built before printing the summary.
  _ = model(1)
  model.summary()
  gen_losses = []
  disc_losses = []
  size = data.shape[0]
  updates_per_epoch = size // batch_size
  for e in range(epochs):
    print('Training epoch {}/{}'.format(e+1, epochs))
    e_gen_losses, e_disc_losses = train_epoch(
        model, data, batch_size, updates_per_epoch, disc_steps_per_update, 
        gen_steps_per_update, log_every, lr_to_decay)
    
    gen_losses += e_gen_losses
    disc_losses += e_disc_losses

    if samples_to_display > 0:
      samples = model.generate(samples_to_display)
      if samples.shape[-1] == 1:
        samples = np.reshape(samples, samples.shape[:-1])
      if tanh_outputs:
        samples = (samples + 1)/2
      display_samples(samples)

  return gen_losses, disc_losses
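
The `tanh_outputs` flag in `train_gan` maps samples from [-1, 1] back to [0, 1] before display, which suggests it is meant for a generator ending in a `tanh` activation. In that setup the training images would presumably need the matching scaling as well. A minimal sketch of both directions, with hypothetical helper names that are not part of this notebook:

```python
import numpy as np

def to_tanh_range(images):
    """Map pixel values from [0, 1] to [-1, 1]."""
    return images * 2.0 - 1.0

def from_tanh_range(samples):
    """Inverse mapping, the same arithmetic as the tanh_outputs branch: (samples + 1) / 2."""
    return (samples + 1.0) / 2.0

pixels = np.array([0.0, 0.25, 1.0], dtype=np.float32)
scaled = to_tanh_range(pixels)
restored = from_tanh_range(scaled)
print(scaled, restored)
```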

### Model Implementation

In [0]:
%tensorflow_version 2

import tensorflow as tf
import numpy as np
import tensorflow_probability as tfp

tfl = tf.keras.layers
tfo = tf.keras.optimizers
tfm = tf.keras.models

In [0]:
def run_layers(inputs, layers):
  out = inputs
  for layer in layers:
    out = layer(out)
  return out

class Generator(tfl.Layer):
  def __init__(self, layers):
    super(Generator, self).__init__()
    self._gen_layers = layers
  def call(self, inputs):
    return run_layers(inputs, self._gen_layers)

class Discriminator(tfl.Layer):
  def __init__(self, layers):
    super(Discriminator, self).__init__()
    self._disc_layers = layers
  def call(self, inputs):
    return run_layers(inputs, self._disc_layers)

class SimpleGAN(tfm.Model):
  def __init__(self, noise_dims, generator, discriminator, non_saturating=False,
               gen_optimizer=tfo.Adam(1e-3),
               disc_optimizer=tfo.Adam(1e-3)):
    super(SimpleGAN, self).__init__()
    self._input_distribution = tfp.distributions.MultivariateNormalDiag(
        loc=tf.zeros([noise_dims]), scale_diag=tf.ones([noise_dims]))
    self._gen = generator
    self._disc = discriminator
    self._gen_opt = gen_optimizer
    self._disc_opt = disc_optimizer
    self.step_discriminator = self._make_discriminator_trainining_op()
    self.step_generator = self._make_generator_training_op(non_saturating)

  def call(self, batch_size):
    out = {'gen': self.generate(batch_size)}
    out['dis'] = self._disc(out['gen'])
    return out

  def generate(self, batch_size):
    # batch_size: number of datapoints we want to generate
    raise NotImplementedError

  def discriminator_loss(self, real_data, fake_data_n):
    # real_data: batch of datapoints, from the dataset we want to represent
    # fake_data_n: number of samples to generate and evaluate
    raise NotImplementedError

  def generator_loss(self, fake_data_n):
    # fake_data_n: number of samples to generate and evaluate
    raise NotImplementedError

  def non_saturating_generator_loss(self, fake_data_n):
    # fake_data_n: number of samples to generate and evaluate
    raise NotImplementedError

  def _make_discriminator_trainining_op(self,):
    def train(real_data, fake_data_n):
      with tf.GradientTape() as tape:
        disc_loss = self.discriminator_loss(real_data, fake_data_n)
      gradients = tape.gradient(disc_loss, self._disc.trainable_weights)
      self._disc_opt.apply_gradients(zip(gradients, 
                                         self._disc.trainable_weights))
      return disc_loss
    return tf.function(train)
  
  def _make_generator_training_op(self, non_saturating=False):
    if non_saturating:
      generator_loss = self.non_saturating_generator_loss
    else:
      generator_loss = self.generator_loss
    def train(fake_data_n):
      with tf.GradientTape() as tape:
        gen_loss = generator_loss(fake_data_n)
      gradients = tape.gradient(gen_loss, self._gen.trainable_weights)
      self._gen_opt.apply_gradients(zip(gradients, self._gen.trainable_weights))
      return gen_loss
    return tf.function(train)

## Training on MNIST

In [0]:
from tensorflow.keras.datasets import mnist

(x_train, y_train), (_, _) = mnist.load_data()
x_train = np.expand_dims(x_train.astype('float32')/255, axis=-1)

random_data = np.squeeze(x_train[np.random.randint(x_train.shape[0], size=10)])
display_samples(random_data)

#### Custom models based on DCGAN practices

In [0]:
class MnistGenerator(Generator):
  def __init__(self,):
    layers = [
          tfl.Dense(1024),  # input size is inferred from the noise batch on first call
          tfl.BatchNormalization(axis=-1),
          tfl.Activation('relu'),
          tfl.Dense(7*7*128),
          tfl.BatchNormalization(axis=-1),
          tfl.Activation('relu'),
          tfl.Reshape((7, 7, 128)),
          tfl.Conv2DTranspose(64, 4, strides=(2, 2), padding='same'),
          tfl.BatchNormalization(axis=-1),
          tfl.Activation('relu'),
          tfl.Conv2DTranspose(1, 4, strides=(2, 2), padding='same'),
          tfl.Activation('sigmoid'),
      ]
    super(MnistGenerator, self).__init__(layers=layers)

class MnistDiscriminator(Discriminator):
  def __init__(self,):
    layers = [
          tfl.Conv2D(64, 4, strides=(2, 2), padding='same'),
          tfl.LeakyReLU(),
          tfl.Conv2D(128, 4, strides=(2, 2), padding='same'),
          tfl.BatchNormalization(axis=-1),
          tfl.LeakyReLU(),
          tfl.Reshape((7*7*128,)),
          tfl.Dense(1024),
          tfl.BatchNormalization(axis=-1),
          tfl.LeakyReLU(),
          tfl.Dense(1, activation='sigmoid')
      ]
    super(MnistDiscriminator, self).__init__(layers=layers)
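
The spatial sizes in the two models above mirror each other. The sketch below works through the arithmetic for `padding='same'`, where a strided convolution divides the size by the stride (rounding up) and a strided transposed convolution multiplies it. The helper names are hypothetical and only serve this example.

```python
import math

def conv_same_out(size, stride):
    """Spatial output size of a strided convolution with padding='same'."""
    return math.ceil(size / stride)

def conv_transpose_same_out(size, stride):
    """Spatial output size of a strided transposed convolution with padding='same'."""
    return size * stride

# Generator: reshape to 7x7, then two stride-2 transposed convolutions.
gen_sizes = [7]
for _ in range(2):
    gen_sizes.append(conv_transpose_same_out(gen_sizes[-1], stride=2))

# Discriminator: 28x28 input, then two stride-2 convolutions.
disc_sizes = [28]
for _ in range(2):
    disc_sizes.append(conv_same_out(disc_sizes[-1], stride=2))

print(gen_sizes)   # [7, 14, 28]
print(disc_sizes)  # [28, 14, 7]
```

This is why the discriminator flattens to `7*7*128` features after its second convolution, and why the generator starts from a `(7, 7, 128)` tensor to end up at the 28x28 MNIST resolution.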


In [0]:
generator = MnistGenerator()
discriminator = MnistDiscriminator()

gan = SimpleGAN(noise_dims=128,
                generator=generator,
                discriminator=discriminator,
                non_saturating=True,
                gen_optimizer=tfo.Adam(2e-4),
                disc_optimizer=tfo.Adam(2e-4))

# 1 = training mode, so BatchNormalization layers use batch statistics.
tf.keras.backend.set_learning_phase(1)
gen_losses, disc_losses = train_gan(gan, x_train,
                                    batch_size=128,
                                    epochs=20, 
                                    samples_to_display=10, 
                                    disc_steps_per_update=1, 
                                    gen_steps_per_update=1, 
                                    log_every=200, 
                                    tanh_outputs=False)

### Plot losses

In [0]:
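# Plot each loss against its own step count: the two lists differ in length
# whenever disc_steps_per_update != gen_steps_per_update.
plt.plot(np.arange(len(gen_losses)), gen_losses, label='generator loss')
plt.plot(np.arange(len(disc_losses)), disc_losses, label='discriminator loss')
plt.xlabel('training step')
plt.ylabel('loss')
plt.legend()
plt.show()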

### References

Made by Edoardo Cetin (edoardo.cetin@kcl.ac.uk), using figures from:

*Generative Adversarial Nets (Goodfellow et al. 2014)*

*Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks (Radford et al. 2016)*

*NIPS 2016 Tutorial: Generative Adversarial Networks (Goodfellow 2017)*