# Activity 7 | Making a GAN

We're not here to teach the fundamentals of neural networks or ML, but we think GANs are a pretty neat demo. GANs (Generative Adversarial Networks) have two entirely separate networks (models) that work together/compete against each other to generate something.

Their overarching goal is to generate new data that is somewhat similar to some of the data they were trained with.

IMAGE OF GAN ARCHITECTURE HERE?
    
Basically, the **generator** generates fake images that are then used by the **discriminator** to see if they're real. Working together, they both get cleverer and cleverer, until the discriminator cannot distinguish the difference between generator-generated images, and the real thing.

## Imports

We need to import `Datasets`, so we can use the MNIST data, `Foundation` so we can use the Swift types, `TensorFlow`, so we can use the machine learning bits and pieces, and `ModelSupport`, which helps us work with existing datasets and files. We also include a file `GANSupport.swift` which is a collection of convenience methods and helpers to write/read files, and such.

In [12]:
import Datasets
import Foundation
import ModelSupport
import TensorFlow
%include "GANSupport.swift"

## Parameters

Our parameters are as follows:

* `epochCount` is how many epochs it should train for. 10 is a good number to get a reasonable GAN in this case.
* `batchSize` is the size of a batch that we're going to ask the MNIST dataset for.
* `outputFolder` defines the output folder where we'll be writing things on the file system.
* `imageHeight` and `imageWidth`, together with `imageSize` define the output imagesize that the Generator will make, as well as (naturally) the input image size the Discriminator will take.
* `latentSize` defines the latent representation size used by the Generator to generate.
* `testImageGridSize` defines the size of the grid of images that we'll generate to look at the result of the GAN.

In [13]:
let epochCount = 10
let batchSize = 32
let outputFolder = "./MNIST_GAN_Output/"
let imageHeight = 28
let imageWidth = 28
let imageSize = imageHeight * imageWidth
let latentSize = 64
let testImageGridSize = 4

# Generator Model

Our `Generator` is a `Struct` adhering to the  [`Layer` Protocol](https://www.tensorflow.org/swift/api_docs/Protocols/Layer) (which is part of Swift For TensorFlow's API). The Generator has the following layers:

* `dense1`, a `Dense` layer (a [densely-connected layer](https://www.tensorflow.org/swift/api_docs/Structs/Dense)) that takes an `inputSize` of `latentSize` (defined earlier), and an `outputSize` of `latentSize*2`. The `activation` function determines the output shape of each node in the layer. There are many available activations, but [ReLU](https://www.tensorflow.org/swift/api_docs/Functions#leakyrelu_:alpha:) is common for hidden layers.

* `dense2` is likewise, but with an `inputSize` of `latentSize*2` (taking the output of the previous layer), and an `outputSize` of `latestSize*4`.

* `dense3` is likewise, taking the previous output as input, and outputting it larger.

* `dense4` is, again, the same, but has an `outputSize` of `imageSize` instead (our final desired image size). It uses [tanh](https://www.tensorflow.org/swift/api_docs/Functions#tanh_:) as its activation, tanh (hyperbolic tangent) is sigmoidal (s-shaped) and outputs values that range from -1 to 1.

* three [`BatchNorm`]() layers, `batchnorm1`, `batchnorm2`, `batchnorm3`, that normalise the activations of the previous layer at each batch by applying transformations that maintain the mean activation close to 0 and the activation standard deviation close to 1. `featureCount` is the number of features.
    
Finally, we have our `callAsFunction()` method, which sequences through the `Dense` layers, using the `BatchNorm` layers to normalise, before finally returning the output of the fourth and final `Dense` layer.



    

In [14]:
struct Generator: Layer {
    var dense1 = Dense<Float>(
        inputSize: latentSize, outputSize: latentSize * 2,
        activation: { leakyRelu($0) })

    var dense2 = Dense<Float>(
        inputSize: latentSize * 2, outputSize: latentSize * 4,
        activation: { leakyRelu($0) })

    var dense3 = Dense<Float>(
        inputSize: latentSize * 4, outputSize: latentSize * 8,
        activation: { leakyRelu($0) })

    var dense4 = Dense<Float>(
        inputSize: latentSize * 8, outputSize: imageSize,
        activation: tanh)

    var batchnorm1 = BatchNorm<Float>(featureCount: latentSize * 2)
    var batchnorm2 = BatchNorm<Float>(featureCount: latentSize * 4)
    var batchnorm3 = BatchNorm<Float>(featureCount: latentSize * 8)

    @differentiable
    func callAsFunction(_ input: Tensor<Float>) -> Tensor<Float> {
        let x1 = batchnorm1(dense1(input))
        let x2 = batchnorm2(dense2(x1))
        let x3 = batchnorm3(dense3(x2))
        return dense4(x3)
    }
}

## Discriminator Model

Our `Discriminator` is a `Struct` adhering to the `Layer` Protocol. The `Discriminator` has the following layers:

* `dense1`, a `Dense` layer, taking an `inputSize` of `imageSize`, outputting an `outputSize` of 256. It also uses ReLU for activation.

* `dense2` and `dense3`, which take an `inputSize` and `outputSize` of 256 and 64, and 64 and 16, respectively, also using ReLU.

* `dense4`, which takes the `inputSize` of 16, and has an `outputSize` of 1, and using `identity` as the activation (just linear).

Finally, we have our `callAsFunction()` method, which just sequences the input through the four (`Dense`) layers.

In [15]:
struct Discriminator: Layer {
    var dense1 = Dense<Float>(
        inputSize: imageSize, outputSize: 256,
        activation: { leakyRelu($0) })

    var dense2 = Dense<Float>(
        inputSize: 256, outputSize: 64,
        activation: { leakyRelu($0) })

    var dense3 = Dense<Float>(
        inputSize: 64, outputSize: 16,
        activation: { leakyRelu($0) })

    var dense4 = Dense<Float>(
        inputSize: 16, outputSize: 1,
        activation: identity)

    @differentiable
    func callAsFunction(_ input: Tensor<Float>) -> Tensor<Float> {
        input.sequenced(through: dense1, dense2, dense3, dense4)
    }
}

## Loss functions

### Discriminator Loss Function

Our `discriminatorLoss()` function, which takes both the real and fake [logits](https://datascience.stackexchange.com/a/31045), and returns the `realLoss` and `fakeLoss`, via the `sigmoidCrossEntropy()` function. That's it!

In [16]:
@differentiable
func discriminatorLoss(realLogits: Tensor<Float>, fakeLogits: Tensor<Float>) -> Tensor<Float> {
    let realLoss = sigmoidCrossEntropy(
        logits: realLogits,
        labels: Tensor(ones: realLogits.shape))
    let fakeLoss = sigmoidCrossEntropy(
        logits: fakeLogits,
        labels: Tensor(zeros: fakeLogits.shape))
    return realLoss + fakeLoss
}

### Generator Loss Function

Our `generatorLoss()` function takes the fake logits, and calculates the `sigmoidCrossEntropy()`.

In [17]:
@differentiable
func generatorLoss(fakeLogits: Tensor<Float>) -> Tensor<Float> {
    sigmoidCrossEntropy(
        logits: fakeLogits,
        labels: Tensor(ones: fakeLogits.shape))
}

### Random Samples

Our `sampleVector()` function returns random stuff, that we use for both the Discriminator and Generator later on.

In [18]:
/// Returns `size` samples of noise vector.
func sampleVector(size: Int) -> Tensor<Float> {
    Tensor(randomNormal: [size, latentSize])
}

## Setting up to train

### Getting a dataset

We're going to use the "Hello, world!" of machine learning, MNIST, as our dataset. This comes from some of the helper libraries we've provided for this session (which, in turn, are largely drawn from deep in the bowels of the TensorFlow project):

In [19]:
let dataset = MNIST(batchSize: batchSize, flattening: true, normalizing: true)

Loading resource: train-images-idx3-ubyte
Loading local data at: /notebooks/TFWorld_2019_Finished_Examples/train-images-idx3-ubyte
Succesfully loaded resource: train-images-idx3-ubyte
Loading resource: train-labels-idx1-ubyte
Loading local data at: /notebooks/TFWorld_2019_Finished_Examples/train-labels-idx1-ubyte
Succesfully loaded resource: train-labels-idx1-ubyte
Loading resource: t10k-images-idx3-ubyte
Loading local data at: /notebooks/TFWorld_2019_Finished_Examples/t10k-images-idx3-ubyte
Succesfully loaded resource: t10k-images-idx3-ubyte
Loading resource: t10k-labels-idx1-ubyte
Loading local data at: /notebooks/TFWorld_2019_Finished_Examples/t10k-labels-idx1-ubyte
Succesfully loaded resource: t10k-labels-idx1-ubyte


### Creating a generator and a discriminator

In [20]:
var generator = Generator()
var discriminator = Discriminator()

### Creating optimisers for the generator and the discriminator

We need an optimization algorithm for both the models. In each case, we'll use the Adam optimisation algorithm. It's a popular choice!

In [21]:
let optG = Adam(for: generator, learningRate: 2e-4, beta1: 0.5)
let optD = Adam(for: discriminator, learningRate: 2e-4, beta1: 0.5)

### Creating a function to save a grid of images

Our `saveImageGrid()` function generates a nice grid of images to look at the output of the GAN.

In [22]:
func saveImageGrid(_ testImage: Tensor<Float>, name: String) throws {
    var gridImage = testImage.reshaped(
        to: [
            testImageGridSize, testImageGridSize,
            imageHeight, imageWidth,
        ])
    // Add padding.
    gridImage = gridImage.padded(forSizes: [(0, 0), (0, 0), (1, 1), (1, 1)], with: 1)
    // Transpose to create single image.
    gridImage = gridImage.transposed(withPermutations: [0, 2, 1, 3])
    gridImage = gridImage.reshaped(
        to: [
            (imageHeight + 2) * testImageGridSize,
            (imageWidth + 2) * testImageGridSize,
        ])
    // Convert [-1, 1] range to [0, 1] range.
    gridImage = (gridImage + 1) / 2

    try saveImage(
        gridImage, size: (gridImage.shape[0], gridImage.shape[1]), directory: outputFolder,
        name: name)
}

## Training and Inference

To train, we iterate through to our desired `epochCount`, runs training using both the Generator and the Discriminator, and then runs an inference to generate a grid of images and print out the current epoch, and the generator's loss.

In [23]:
print("Start training...")

// Start training loop.
for epoch in 1...epochCount {
    // Start training phase.
    Context.local.learningPhase = .training
    for i in 0 ..< dataset.trainingSize / batchSize {
        // Perform alternative update.
        // Update generator.
        let vec1 = sampleVector(size: batchSize)

        let 𝛁generator = generator.gradient { generator -> Tensor<Float> in
            let fakeImages = generator(vec1)
            let fakeLogits = discriminator(fakeImages)
            let loss = generatorLoss(fakeLogits: fakeLogits)
            return loss
        }
        optG.update(&generator, along: 𝛁generator)

        // Update discriminator.
        let realImages = dataset.trainingImages.minibatch(at: i, batchSize: batchSize)
        let vec2 = sampleVector(size: batchSize)
        let fakeImages = generator(vec2)

        let 𝛁discriminator = discriminator.gradient { discriminator -> Tensor<Float> in
            let realLogits = discriminator(realImages)
            let fakeLogits = discriminator(fakeImages)
            let loss = discriminatorLoss(realLogits: realLogits, fakeLogits: fakeLogits)
            return loss
        }
        optD.update(&discriminator, along: 𝛁discriminator)
    }

    // Start inference phase.
    Context.local.learningPhase = .inference
    let testImage = generator(sampleVector(size: testImageGridSize * testImageGridSize))

    do {
        try saveImageGrid(testImage, name: "epoch-\(epoch)-output")
    } catch {
        print("Could not save image grid with error: \(error)")
    }

    let lossG = generatorLoss(fakeLogits: testImage)
    print("[Epoch: \(epoch)] Loss-G: \(lossG)")
}

Start training...
[Epoch: 1] Loss-G: 1.127014
[Epoch: 2] Loss-G: 1.1459562
[Epoch: 3] Loss-G: 1.1477773
[Epoch: 4] Loss-G: 1.1610749
[Epoch: 5] Loss-G: 1.1275166
[Epoch: 6] Loss-G: 1.1709677
[Epoch: 7] Loss-G: 1.1521204
[Epoch: 8] Loss-G: 1.1473638
[Epoch: 9] Loss-G: 1.131887
[Epoch: 10] Loss-G: 1.1520165
