# Conditional GAN
---
*Responsible:* Robert Currie (<rob.currie@ed.ac.uk>)

## What is the target of this workshop?

### cGAN (paper - https://arxiv.org/abs/1411.1784)

The goal of this workshop is to construct and train a cGAN model from scratch using the MNIST dataset.

This will make use of the TensorFlow functional API to build a Generator and a Discriminator model separately and then combine them to train a complete cGAN model.

### FID (Fréchet Inception Distance)

Using the Inception v3 model from Google (paper - http://arxiv.org/abs/1512.00567) it's possible to calculate the FID metric for our trained model.

This lets us score how closely the images produced by our model match our real (training) dataset.

$ FID = d^2 = \|\mu_1 - \mu_2\|^2 + Tr\left(\sigma_1 + \sigma_2 - 2\sqrt{ \sigma_1 \cdot \sigma_2 }\right) $

Here $ \mu_1, \mu_2 $ are the mean vectors and $ \sigma_1, \sigma_2 $ the covariance matrices of the Inception activations for the two sets of images being compared.


## Mark Scheme

As with previous ML notebooks the sections marked **## FINISH_ME ##** are to be completed by you.

Marks for the different parts are shown below.

* Sections are intended to be tackled in order, i.e. 1->10
* In this notebook different sections can be tackled independently
* There are bonus problems at the end to tackle but the maximum mark is 10/10

| <p align='left'> Title                         | <p align='left'> Parts | <p align='left'> Number of marks |
| ------------------------------------- | ----- | --- |
| <p align='left'> 1. Load the dataset and normalise to $ \left[-1, 1\right] $ | <p align='left'>  1  | <p align='left'> 1 |
| <p align='left'> 2. Complete the Generator class | <p align='left'>  1  | <p align='left'> 1 |
| <p align='left'> 3. Use the Generator to generate some 'pseudo-numbers' with the correct shape | <p align='left'>  1  | <p align='left'> 1 |
| <p align='left'> 4. Complete the Discriminator class and use the Discriminator to decide if an image is real or fake | <p align='left'>  1  | <p align='left'> 1 |
| <p align='left'> 5. Complete the training methods for the cGAN class | <p align='left'>  1  | <p align='left'> 1 |
| <p align='left'> 6. Why are 2 separate optimizers needed for this cGAN model? | <p align='left'>  1  | <p align='left'> 1 |
| <p align='left'> 7. Write a method to generate&plot 10x10 images with numbers 0-9 left to right | <p align='left'>  1  | <p align='left'> 1 |
| <p align='left'> 8. Train the cGAN for 20 epochs | <p align='left'>  1  | <p align='left'> 1 |
| <p align='left'> 9. Complete the code required to calculate the FID | <p align='left'>  1  | <p align='left'> 1 |
| <p align='left'> 10. Calculate the FID for your trained model | <p align='left'>  1  | <p align='left'> 1 |
| <p align='left'> (Optional) 11. Plot how the FID for models varies vs training epochs | <p align='left'>  1  | <p align='left'> 1 |
| <p align='left'> **Total** | | <p align='left'> max **10** |



# First Imports and Fixing Reproducibility

TF on a GPU will use different algorithms behind the scenes than on a CPU.

Some of these algorithms make assumptions or sacrifices that give up, among other things, exact numerical reproducibility in favour of speed/performance.

In addition to this, we should always try to fix the seeds for the AI/ML framework and NumPy.

In [None]:
import os
import random
#os.environ["CUDA_VISIBLE_DEVICES"] = '-1'
os.environ['TF_ENABLE_ONEDNN_OPTS'] = '0'
_FIXED_SEED=5432
os.environ["PYTHONHASHSEED"]=str(_FIXED_SEED)
random.seed(_FIXED_SEED)

import numpy as np
np.random.seed(_FIXED_SEED)

import tensorflow as tf
tf.random.set_seed(_FIXED_SEED)
tf.keras.utils.set_random_seed(_FIXED_SEED)  # sets seeds for base-python, numpy and tf
tf.config.experimental.enable_op_determinism()

# Importing the rest of the tools needed for the workshop

This workshop will focus on using the MNIST dataset again.

One of the reasons for this is that this dataset has well-defined objects and features, which means that we can recover the different digit classes with relatively short training on a CPU.
If you have access to a GPU you can try running the same algorithms on more complex datasets to see the output. One of these is the fashion_mnist dataset.

In [None]:
# Used for building/training a cGAN
from numpy.random import randint
from matplotlib import pyplot
import matplotlib.pyplot as plt

from tensorflow.keras.models import load_model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.models import Model
from tensorflow.keras import layers as L
from tensorflow.keras.losses import BinaryCrossentropy

#from tensorflow.keras.datasets.fashion_mnist import load_data
from tensorflow.keras.datasets.mnist import load_data

# Used for calculating the FID of the model

from tensorflow.keras.applications.inception_v3 import InceptionV3, preprocess_input
from scipy.linalg import sqrtm

# Global constants

In [None]:
# Target size
height_px, width_px, n_channels = (28, 28, 1)
BATCH_SIZE = 128
N_CLASSES = 10

SAVE_RESULT = "results"

LATENT_DIM = 100

# Set the number of epochs for training.
epochs = 20

# Load data

In [None]:
# Load train data
(x_train, y_train), (x_test, y_test) = load_data()

## Normalise the dataset to have values between $\left[-1, 1\right]$

We also want to build a TensorFlow dataset using `tf.data.Dataset.from_tensor_slices`.

It's perhaps more of a convention that when training a GAN we normalise the dataset to be between [-1, 1] rather than [0, 1]. This should have little impact on the final result other than changing your Generator's final Activation layer.
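As a minimal sketch of the scaling described above, assuming the raw pixels are `uint8` values in [0, 255] (which is what `load_data` returns for MNIST), subtracting and then dividing by 127.5 maps them onto [-1, 1]. The array `fake_batch` and both helper names are hypothetical stand-ins used only for illustration; they are not part of the notebook.

```python
import numpy as np

# Hypothetical stand-in for a few MNIST pixels: uint8 values in [0, 255]
fake_batch = np.array([[0, 64, 128, 255]], dtype=np.uint8)


def normalise_to_pm1(images):
    """Map uint8 pixel values in [0, 255] to float32 values in [-1, 1]."""
    return (images.astype(np.float32) - 127.5) / 127.5


def to_unit_interval(images):
    """Inverse-style mapping from [-1, 1] to [0, 1], handy for plotting."""
    return (images + 1.0) / 2.0


scaled = normalise_to_pm1(fake_batch)
plot_ready = to_unit_interval(scaled)
print(scaled.min(), scaled.max())
```

With this convention, a final activation whose range is [-1, 1] (`tanh` is one common choice) matches the data, whereas a [0, 1] normalisation would pair with something like `sigmoid`.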

In [None]:

## FINISH_ME ##


# Define models

## Generator model

The GeneratorClass is where we want to collect everything together for the Generation of new pseudo-number images.

The intention is that we should be able to call an instance of this class to generate a new image, such as:

```
generator_instance = GeneratorClass(...)
output_image = generator_instance(labels=label, noise=noise)
```

In [None]:
class GeneratorClass(tf.Module):
    def __init__(self, out_dim, n_classes=10, h_low=7, w_low=7):
        tf.Module.__init__(self)

        # 7x7 chosen to give us a 28x28 generated output
        n_nodes = h_low * w_low

        # Labels inputs
        # We need a sequential set of layers that takes our label data, passes it through an 'Embedding' layer and then a Dense layer.
        # The output of this layer set must have the shape: h_low x w_low x 1

        self.label_embedding = ## FINISH_ME ##

        # Noise inputs
        # We need a sequential set of layers to go from the latent-space dim to a low-resolution image that can be up-scaled
        # The output of this layer set must have the shape: h_low x w_low x 64

        self.noise_sampler = ## FINISH_ME ##


        # Model layers
        # Building on the CNN model in the last workshop we need to build a sequential model to upscale using:
        # Conv2DTranspose 4x4 stride (2,2)
        # Conv2DTranspose 4x4 stride (2,2)
        # Conv2D 4x4 stride (1,1)
        # This model should take a 7x7 input and output a 28x28 image

        self.upscale_model = ## FINISH_ME ##

        # merge layer for optim
        self.merge = L.Concatenate()

    @tf.function
    def __call__(self, labels, noise, training=False):

        # Embed the labels so they can be concatenated with the noise feature maps
        labels = self.label_embedding(labels, training=training)
        # Project the noise 'seed' into low-resolution feature maps
        noise = self.noise_sampler(noise, training=training)

        # Merge the noise and labels
        x = self.merge([noise, labels])

        # Generate an output image from 'noise' seed using input label
        output_img = self.upscale_model(x, training=training)

        return output_img
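
The layer sizes listed in the comments of `GeneratorClass` can be checked with a little arithmetic. The sketch below assumes `padding='same'` on every layer (the comments do not state the padding, so this is an assumption): a transposed convolution then multiplies the spatial size by its stride, and an ordinary convolution divides it by its stride, rounding up. The helper names are hypothetical.

```python
import math


def conv_transpose_same(size, stride):
    """Spatial output size of a transposed convolution with 'same' padding."""
    return size * stride


def conv_same(size, stride):
    """Spatial output size of an ordinary convolution with 'same' padding."""
    return math.ceil(size / stride)


h = 7                           # h_low: size of the merged noise + label maps
h = conv_transpose_same(h, 2)   # first Conv2DTranspose, stride (2,2)
h = conv_transpose_same(h, 2)   # second Conv2DTranspose, stride (2,2)
h = conv_same(h, 1)             # final Conv2D, stride (1,1)

# Concatenating along the channel axis stacks the 64 noise channels
# with the single label channel
merged_channels = 64 + 1

print(h, merged_channels)
```

This is why a 7x7 starting grid is used: two stride-2 up-sampling steps take it to 28x28, the size of an MNIST image.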

## Now Construct a Generator instance and use it to 'generate' an image with a given label

We're going to use the g_model later, but we should check that it works as expected before we move on.

In [None]:
# Construct a Generator instance
g_model = GeneratorClass(n_channels, n_classes=N_CLASSES)
# This line is needed to help define our generator for later
call = g_model.__call__.get_concrete_function(tf.TensorSpec((1, 1), tf.int32, name='label'), tf.TensorSpec((1, LATENT_DIM), tf.float32, name='noise'))


## FINISH_ME ##
# Generate a random sample in the latent space with a random label and use this to seed the generator model

## Discriminator model

This is our discriminator model, which tells us how 'real' or 'fake' an image is as a single score in the range [0, 1].

As with the generator we intend to be able to call an instance of this class to perform an evaluation on an input for us.

In [None]:
class DiscriminatorClass(tf.Module):
    def __init__(self, in_shape, out_dim, n_classes=10):
        tf.Module.__init__(self)

        n_nodes = in_shape[0]*in_shape[1]*in_shape[2]

        # Labels inputs
        # Build a short sequential model which embeds the label data into a layer with the same dim as the input
        self.label_embedding = ## FINISH_ME ##

        # Model layers 
        # Building on the CNN model in the last workshop we need to build a sequential model to downscale the input image into the latent-space
        self.downsample_model = ## FINISH_ME ##

        # merge layer for optim
        self.merge = L.Concatenate()


    @tf.function
    def __call__(self, labels, images, training=False):
        
        # Embed the labels so they can be concatenated with the input images
        labels = self.label_embedding(labels, training=training)

        # Merge labels and images
        x = self.merge([images, labels])
        
        # Discriminate input and produce a vector representation of the input in latent-space
        latent_rep = self.downsample_model(x, training=training)

        return latent_rep

## Construct a Discriminator instance and use it to make a decision if an image is real ('1') or fake ('0')

We want to test the functionality again here. We haven't trained the model yet, so don't expect the output to be correct.

In [None]:
d_model = DiscriminatorClass(in_shape=(height_px, width_px, n_channels), out_dim=1, n_classes=N_CLASSES)

## FINISH_ME ##
# Test the discriminator model and see if it classifies an image as real or fake

# Define whole cGAN class which controls the training process

The cGAN class is intended to capture everything to do with training the GAN within the `train_step` method.


In [None]:
class CGAN_Class(tf.Module):

    def __init__(self, discriminator, generator, latent_dim):
        tf.Module.__init__(self)
        self.discriminator = discriminator
        self.generator = generator
        self.latent_dim = latent_dim
        # Instantiate the optimizer for both networks
        # (learning_rate=0.0002, beta_1=0.5 are recommended)
        self.d_optimizer = ## FINISH_ME ##
        self.g_optimizer = ## FINISH_ME ##
        self.bce = BinaryCrossentropy()


    def train_step(self, real_images, real_labels):

        # Get the batch size
        batch_size = tf.shape(real_images)[0]


        ## First train the Discriminator
        #################################

        # Get the latent vector
        random_latent_vectors = tf.random.normal(shape=(batch_size, self.latent_dim))

        with tf.GradientTape() as tape:
            d_loss = self.discriminator_train_step(real_images, real_labels, random_latent_vectors)

        # Get the gradients w.r.t the discriminator loss
        d_gradient = tape.gradient(d_loss, self.discriminator.trainable_variables)
        # Update the weights of the discriminator using the discriminator optimizer
        self.d_optimizer.apply_gradients(zip(d_gradient, self.discriminator.trainable_variables))


        ## Now train the Generator
        #################################

        # Get the latent vector
        random_latent_vectors = tf.random.normal(shape=(batch_size, self.latent_dim))
        random_labels = tf.random.uniform([batch_size], minval=0, maxval=N_CLASSES, dtype=tf.int32)

        # Train the generator on a fresh batch of latent vectors and random labels
        with tf.GradientTape() as tape:
            g_loss = self.generator_train_step(random_latent_vectors, random_labels)

        # Get the gradients w.r.t the generator loss
        gen_gradient = tape.gradient(g_loss, self.generator.trainable_variables)
        # Update the weights of the generator using the generator optimizer
        self.g_optimizer.apply_gradients(zip(gen_gradient, self.generator.trainable_variables))   

        
        ## Return the Discriminator and the Generator loss
        return {"d_loss": d_loss, "g_loss": g_loss}


    def discriminator_train_step(self, real_images, real_labels, random_latent_vectors):

        # Generate fake images from the latent vector
        fake_images = self.generator(labels=real_labels, noise=random_latent_vectors, training=False)
        # Get the logits for the fake images
        fake_logits = self.discriminator(labels=real_labels, images=fake_images, training=True)
        # Get the logits for the real images
        real_logits = self.discriminator(labels=real_labels, images=real_images, training=True)


        ## Calculate the BCE for the Real and Fake images
        ## Hint: `tf.ones_like` and `tf.zeros_like` gives us a Tensor with dimensions similar to input to use in BCE
        real_loss = self.bce( ## FINISH_ME ## )
        fake_loss = self.bce( ## FINISH_ME ## )
        return (fake_loss + real_loss) * 0.5


    def generator_train_step(self, random_latent_vectors, random_labels):
        # Train the generator

        # Generate fake images using the generator
        generated_images = self.generator(labels=random_labels, noise=random_latent_vectors, training=True)
        # Get the discriminator logits for fake images
        gen_img_logits = self.discriminator(labels=random_labels, images=generated_images, training=False)

        ## Calculate the generator loss
        ## Hint: we're training the generator to 'trick' the Discriminator
        g_loss = self.bce( ## FINISH_ME ## )
        
        return g_loss
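
To see why the choice of target matters in the BCE hints above, here is a minimal pure-Python sketch of binary cross-entropy for a single discriminator score. It is an illustration under stated assumptions, not the Keras implementation: the function name and the clipping constant `eps` are hypothetical, and the score is assumed to be a probability in [0, 1].

```python
import math


def binary_cross_entropy(target, score, eps=1e-7):
    """BCE for one example; `score` is a probability in [0, 1]."""
    # Clip to avoid log(0)
    score = min(max(score, eps), 1.0 - eps)
    return -(target * math.log(score) + (1.0 - target) * math.log(1.0 - score))


score = 0.9  # the discriminator is fairly sure this image is real

loss_if_target_real = binary_cross_entropy(1.0, score)
loss_if_target_fake = binary_cross_entropy(0.0, score)

print(loss_if_target_real, loss_if_target_fake)
```

The same score gives a small loss against a target of 1 and a large loss against a target of 0, so the target tensor passed to the loss decides which direction a network is pushed in.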


In [None]:
# Instantiate model.
cgan_model = CGAN_Class(discriminator=d_model, generator=g_model, latent_dim=LATENT_DIM)

## Can you say why this cGAN model is using 2 separate optimizers for training?

`## FINISH_ME ##`

# Build a method to generate and plot 10x10 images going 0-9, left-right

This method should take a model and use the Generator to generate 100 images, arranged in a 10x10 grid so that each column holds one digit: zeros in the first column, ones in the second, and so on from left to right.
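
One way to lay out the inputs for such a grid is sketched below with NumPy only; it prepares the labels and noise but does not call the Generator or do any plotting. The constants mirror the notebook's globals, and `N_ROWS` is a hypothetical name introduced for this example.

```python
import numpy as np

N_CLASSES = 10
LATENT_DIM = 100
N_ROWS = 10  # hypothetical: number of rows in the grid

# Repeat 0..9 once per row; filling a grid row by row then puts
# a single digit in each column
labels = np.tile(np.arange(N_CLASSES), N_ROWS)

# One latent vector per image
noise = np.random.normal(size=(N_ROWS * N_CLASSES, LATENT_DIM)).astype(np.float32)

grid = labels.reshape(N_ROWS, N_CLASSES)
print(grid[0], grid[:, 3])
```

Image `i` then belongs at row `i // 10`, column `i % 10` of the plot.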

In [None]:
def plot_numbers(model, latent_dim):

    ## FINISH_ME ##


# Run training

We now want to train our cGAN.

We want to report on the progress of our training every n steps and every epoch to make sure that the training is converging correctly.

Training the cGAN on CPLab machines should take ~1hr for 20 epochs. You should expect to see number-like output from the model after 5-6 epochs, and if you're short of time you can consider training for just 10 epochs.

In [None]:
gen_loss, disc_loss = [], []

for ep in range(epochs):
    
    # Shuffle the dataset at the start of each epoch
    _dataset = train_dataset.shuffle(60000)
    # Create an iterable over the dataset
    train_iter = iter(_dataset)
    for i, (real_images, real_labels) in enumerate(train_iter):

        data_losses = cgan_model.train_step(real_images=real_images, real_labels=real_labels)

        plot_numbers( ## FINISH_ME ## )

        ## FINISH_ME ##

        # Report on the progress of the training to make sure the training doesn't go wrong
        # At the start of each epoch, generate 10x10 new images and display them to see if the trained generator is improving


# Plot loss graphics

We want to see the loss functions of the Generator and the Discriminator as the models are trained against each other.

In [None]:
def get_smoothed_values(data_list, decay=0.1):

    ## FINISH_ME ##
    # This is technically an optional step, but plotting an averaged loss function is less intensive than throwing all of the raw values into pyplot
    
    return final_values_list
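
One common way to smooth a noisy loss curve is an exponential moving average. The sketch below is only one possible approach, and it assumes that `decay` is the weight given to each new value (so smaller values smooth more heavily); the notebook does not define what `decay` means, and the function name is hypothetical.

```python
def exponential_moving_average(values, decay=0.1):
    """Return a smoothed copy of `values`.

    Assumption: `decay` is the weight of each new value, and
    (1 - decay) is the weight of the running average.
    """
    smoothed = []
    running = None
    for value in values:
        if running is None:
            running = value  # start from the first observation
        else:
            running = (1.0 - decay) * running + decay * value
        smoothed.append(running)
    return smoothed


raw_losses = [0.0, 10.0, 10.0, 10.0]
smooth_losses = exponential_moving_average(raw_losses, decay=0.1)
print(smooth_losses)
```

The smoothed curve has the same length as the input, so it can be plotted against the same step axis.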

In [None]:
plt.figure(figsize=(8,8))
plt.plot(get_smoothed_values(gen_loss, decay=0.2))
plt.plot(get_smoothed_values(disc_loss, decay=0.2))
#plt.plot(gen_loss)
#plt.plot(disc_loss)

plt.legend(['gen loss', 'disc loss'])
plt.show()

# Save final model

In [None]:
# Needed to allow us to save our model correctly and be able to call it after we load it from disk
call = g_model.__call__.get_concrete_function(tf.TensorSpec((1, 1), tf.int32, name='label'), tf.TensorSpec((1, LATENT_DIM), tf.float32, name='noise'))

In [None]:
tf.saved_model.save(g_model, 'final_model_{}'.format(epochs), signatures=call)

### Zip model in order to download it

In [None]:
!zip -r 'final_model.zip' 'final_model_{epochs}'

To download the final model, click the link below.

<h1><a href="final_model.zip"> Download trained generator </a></h1>

# Test saved model 

In [None]:
model = tf.saved_model.load('final_model_{}'.format(epochs))

In [None]:
random_latent_vectors = np.random.normal(size=(100, LATENT_DIM))
random_labels = np.asarray([min(x, N_CLASSES-1)  for _ in range(10) for x in range(10)])
#generated_images = model(label_i=random_labels, noise_i=random_latent_vectors)
generated_images = model(random_labels, random_latent_vectors)
# scale from [-1,1] to [0,1]
generated_images = ((generated_images + 1) / 2.0).numpy()

fig, axes = plt.subplots(nrows=3, ncols=6, figsize=(15,10), subplot_kw={'xticks':[], 'yticks':[]})
indx = 0
for i, ax in enumerate(axes.flat):
    img = generated_images[i]#.reshape(height_px, width_px)
    ax.imshow(img, cmap='gray')
    label_leg = random_labels[i]
    ax.set_title(label_leg)
    
plt.show()

# Calculate FID metric

This calculates the FID distance between 2 distributions as defined at the top of the notebook.

This uses the InceptionV3 model from Google to calculate 'activations' based on the presented images. We then effectively want to compare the distributions of 'real' and 'fake' images in this activation-space.

If the distribution of generated images matched the training data exactly, this distance would be zero. Hence the smaller the distance between the 2 distributions of images, the closer the generator is to producing new instances that look like the input dataset. Large numbers here suggest the generator may be generating new datapoints which are not realistic or physical.
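
To build some intuition for the formula, consider the one-dimensional special case, where each covariance matrix reduces to a single variance and the matrix square root becomes an ordinary square root. The sketch below is a toy illustration only: `fid_1d` is a hypothetical helper, and the real calculation works on high-dimensional activation vectors instead.

```python
import math


def fid_1d(mean1, var1, mean2, var2):
    """Frechet distance between two 1-D Gaussians.

    Equivalent to (mean1 - mean2)^2 + (std1 - std2)^2.
    """
    return (mean1 - mean2) ** 2 + var1 + var2 - 2.0 * math.sqrt(var1 * var2)


identical = fid_1d(0.0, 1.0, 0.0, 1.0)  # same mean, same spread
shifted = fid_1d(0.0, 1.0, 3.0, 1.0)    # mean moved by 3
wider = fid_1d(0.0, 1.0, 0.0, 4.0)      # standard deviation doubled

print(identical, shifted, wider)
```

Identical distributions score zero, while either a shift in the mean or a change in the spread increases the distance.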

In [None]:
N_DATA = 60000
N_BATCH = 30

#x_test, y_test = shuffle(x_test, y_test)

generated_imgs_list = []
for label in y_test[:N_DATA]:

    ## GENERATE N_DATA 'pseudo-datapoints' from the same labels as the real _training_ data ##
    

In [None]:
# Test plot
plt.imshow(generated_imgs_list[400][0], cmap='gray'), y_test[400]

In [None]:
# Meet inception model minimum size and channels, 75x75x3
inception_model_input = (height_px*3, width_px*3, n_channels*3)

In [None]:
# scale an array of images to a new size
def scale_images(images, new_shape=inception_model_input):
    images_list = list()

    ## FINISH_ME ##

    ## IMAGES FROM OUR GENERATOR NEED TO BE SCALED TO A SIZE WE CAN USE THEM WITH InceptionV3 ##
    ## HINT: tf.image.resize exists for such problems ##

    return np.asarray(images_list)

In [None]:
def take_prediction(model, images1, images2):
    # calculate activations
    act1 = model.predict(images1)
    act2 = model.predict(images2)
    return act1, act2

In [None]:
def collect_prediction(model, images1, images2, bs=N_BATCH):
    """
    Collect predictions on the `images1`/`images2` datasets with `model`
    using batch size equal to `bs`
    """
    final_act1_l = []
    final_act2_l = []
    
    batch_i1 = []
    batch_i2 = []

    ## FINISH_ME ##

    ## We need to:
    ## 1) Iterate through all of our input and generated data
    ## 2) Scale all images to the correct size so we can use this model
    ## 3) Convert the images from B&W to RGB using `tf.image.grayscale_to_rgb` or equivalent (again just for compatibility with the model)
    ## 4) Calculate and store the predictions for the real and fake data


    final_act1_l = np.concatenate(final_act1_l, axis=0)
    final_act2_l = np.concatenate(final_act2_l, axis=0)
    return final_act1_l, final_act2_l

In [None]:
# calculate frechet inception distance
def calculate_fid(model, images1, images2, bs=N_BATCH):

    # Calculate activations for real and fake data
    # The returned list/vector of activations should be the same length
    act1, act2 = collect_prediction(model, images1, images2, bs)

    # calculate mean and covariance-matrix over all analyzed images in 'activation-space'
    # You should have a 2D NumPy array holding one activation vector per image
    # With that in mind you should be able to use the numpy built-in methods to extract the means and covariance
    mu1, sigma1 = ## FINISH_ME ##
    mu2, sigma2 = ## FINISH_ME ##

    # difference between means ^2
    mu_diff = np.sum((mu1 - mu2)**2.0)
    # SQRT of dot-product of sigmas
    covmean = sqrtm(sigma1.dot(sigma2))
    # Only take real values
    if np.iscomplexobj(covmean):
        covmean = covmean.real

    # calculate distance
    fid = mu_diff + np.trace(sigma1 + sigma2 - 2.0 * covmean)

    return fid

In [None]:
# Prepare the inception v3 model
classifier_model = InceptionV3(include_top=False, pooling='avg', input_shape=inception_model_input)
# Take parts of datasets
images1 = generated_imgs_list[:N_DATA]
images2 = x_test[:N_DATA]

In [None]:
# Calculate fid between images1 and images2
fid = calculate_fid(classifier_model, images1, images2, BATCH_SIZE)

In [None]:
print('FID (different): %.3f' % fid)

## Show how FID changes as models improve

I've provided a few examples of the cGAN model which have been trained on a GPU.

As a short 'free-form' exercise, load each of these models and use them to calculate how the FID varies as the model is trained for longer.