#### GENERATIVE ADVERSARIAL NETWORKS
## J. Steiner

#### BACKGROUND

Machine learning and neural networks are commonly used for supervised learning, that is, building a predictive model from a training data set. Common applications of supervised learning include computer vision (think depositing checks with a cell phone), sentiment analysis (does an email sound happy or sad), and countless more. One application of these techniques that had previously been possible, but did not always work as well as one would like, is generative modeling. Generative models are computer programs that take in a dataset and create fabricated data that resembles it. For example, if I had a dataset of pictures of dogs, a generative model would output a picture of a dog that looked real, but did not actually exist and was not part of the dataset.

Ian Goodfellow et al. proposed Generative Adversarial Networks in 2014 with the goal of using supervised learning techniques to create a generative model. The idea was to use not one neural network to create the data, but two. The first is the generator, which takes a vector of random numbers and outputs fabricated data. The second is the discriminator, which takes either real data or fabricated data and outputs a confidence level that the data is real as opposed to fake. The accuracy of this confidence level allows the generator to be trained with supervised learning (specifically gradient descent): we train the generator to minimize the success of the discriminator, and it can only reach an absolute minimum by generating data that is indistinguishable from the real data.
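
For reference, the original paper writes this two-player game as a single minimax objective. In the notation below (taken from the paper, not from anything defined in this notebook), $D$ is the discriminator, $G$ is the generator, $x$ is a real sample, and $z$ is a noise vector:

$$
\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]
$$

The discriminator tries to push $D(x)$ toward 1 and $D(G(z))$ toward 0, while the generator tries to push $D(G(z))$ toward 1. The training code later in this notebook uses the common variant, also suggested in the paper, where the generator maximizes $\log D(G(z))$ instead of minimizing $\log(1 - D(G(z)))$.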

#### LOADING DEPENDENCIES

The first step is to load in the library that will be used for the model implementation. This way we don't have to code neural network layers from scratch and can focus our energy on the model itself instead of re-inventing the wheel, so to speak.

In [4]:
# imports pytorch dependencies that will be used for the math behind model implementation
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision.datasets as datasets
import torchvision.transforms as transforms
import torchvision.utils

The first thing that we need to do is create our generator and discriminator. We'll start with the generator. As a reminder, the generator takes a vector of noise (random numbers) and outputs a vector that is the same size as our fabricated data. The dataset I'll be using to illustrate this first example is called MNIST; it consists of 28x28 pixel grayscale images of handwritten numbers and was used in the original paper as an application of the framework. We'll be using a basic feed-forward fully connected network to take an input of the size of our noise vector, and output the size of our desired fabricated data.

In [39]:
# generator class definition, being a subclass of nn.Module is required for a pytorch implementation
class Generator(nn.Module):

    # class constructor, runs automatically upon object creation
    def __init__(self, noise_dims, output_dims):

        # noise_dims  - the dimensionality of the input
        # output_dims - the dimensionality of the output

        # runs the class constructor for nn.Module, required for a pytorch implementation
        super(Generator, self).__init__()

        # we're defining the layers of our model here: three hidden blocks and an output layer.
        # We take a noise vector and transform it to the size of our output vector. The generator
        # block is defined below, but is just a fully connected layer with a ReLU activation
        # function (to avoid the vanishing gradient problem)
        self.block1 = self.generator_block(noise_dims, 256)
        self.block2 = self.generator_block(256, 512)
        self.block3 = self.generator_block(512, 1024)
        self.output = nn.Sequential( nn.Linear(1024, output_dims), nn.Tanh())

    # defines a layer of the generator, a fully connected layer with a ReLU activation
    def generator_block(self, in_dims, out_dims):

        # in_dims  - the dimensionality of the vector inputted into the layer
        # out_dims - the dimensionality of the vector the layer will output

        # returns a layer with the specified dimensionality
        return nn.Sequential( nn.Linear(in_dims, out_dims),
                              nn.ReLU() )

    # a forward pass through the network
    def forward(self, x):
        
        # pushes x through the network linearly without reshaping
        # or normalization
        x = self.block1(x)
        x = self.block2(x)
        x = self.block3(x)

        # the final tanh activation scales the output to the range (-1.0, 1.0), matching the
        # normalized real images; each value is the intensity of a pixel (how white a pixel is)
        # when the vector is converted to an image
        x = self.output(x)

        # returns the output of the network
        return x
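
As a quick sanity check on the shapes described above, here is a minimal sketch that pushes a batch of noise through a stand-in network with the same layer sizes. The name `sketch_gen`, the batch size of 16, and the noise size of 100 are my own choices for illustration, and the network is untrained, so the output is only useful for checking dimensions and value range.

```python
import torch
import torch.nn as nn

# stand-in with the same layer sizes as the Generator class above (hypothetical name)
sketch_gen = nn.Sequential(
    nn.Linear(100, 256), nn.ReLU(),
    nn.Linear(256, 512), nn.ReLU(),
    nn.Linear(512, 1024), nn.ReLU(),
    nn.Linear(1024, 28 * 28), nn.Tanh(),
)

# a batch of 16 noise vectors, each with 100 random numbers
noise = torch.randn(16, 100)

# no gradients are needed just to inspect the output
with torch.no_grad():
    fake = sketch_gen(noise)

# each output vector can be reshaped into a 1 channel 28x28 image
fake_images = fake.reshape(16, 1, 28, 28)

print(fake.shape, fake_images.shape)
print(float(fake.min()), float(fake.max()))
```

Every output value falls between -1 and 1 because of the final tanh, which is the same range the real images will have after normalization.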

The next thing we'll do is create our discriminator. The discriminator will take an input the size of our real/fabricated data, and output a single value. This value is the confidence that the vector inputted was real data (1.0 means that it is 100% confident that the vector was real data and 0.0 means that the discriminator is 0% confident that the vector is real data AKA 100% confident the vector is fabricated).

In [33]:
# class definition
class Discriminator(nn.Module):

    # class constructor
    def __init__(self, input_dims):
        
        # input_dims - the dimensionality of the input vector (a flattened image)

        # super class constructor
        super(Discriminator, self).__init__()

        # we define three layers of our model here 
        self.block1 = self.discriminator_block(input_dims, 512)
        self.block2 = self.discriminator_block(512, 128)
        self.block3 = self.discriminator_block(128, 64)
        self.output = nn.Sequential( nn.Linear(64, 1), nn.Sigmoid() )

    # defines the layer we repeat for the discriminator
    def discriminator_block(self, in_dims, out_dims):

        # in_dims  - the dimensionality of the vector inputted into the layer
        # out_dims - the dimensionality of the vector outputted by the layer

        # returns a fully connected layer, followed by a leaky ReLU activation
        # and a dropout (randomly 0s out 30% of the output to control for overfitting)
        return nn.Sequential( nn.Linear(in_dims, out_dims),
                              nn.LeakyReLU(0.2),
                              nn.Dropout(0.3) )

    # a forward pass through the discriminator
    def forward(self, x):

        # pushes x through the network linearly without reshaping
        # or normalization
        x = self.block1(x)
        x = self.block2(x)
        x = self.block3(x)
        x = self.output(x)

        # returns the output of the network
        return x

The code below loads in the MNIST data that will be used for the first example. MNIST is such a common dataset that code to download it exists within the torchvision library that ships alongside pytorch.

In [50]:
# converts each image to a torch tensor (a tensor is a multi-dimensional array) and
# rescales the pixel values from the range [0, 1] to the range [-1, 1]
transform = transforms.Compose([ transforms.ToTensor(), 
                                 transforms.Normalize([ 0.5 ], [ 0.5 ]) ] )

# loads in the dataset, it is so common that it can be loaded by the pytorch library
dataset = datasets.MNIST(root = "./dataset/", train = True, transform = transform, download = True)

# batches the data and wraps it into an iterable object
loader = torch.utils.data.DataLoader(dataset, batch_size=100, shuffle=True)

The next cell defines the training function. After creating objects for the generator and discriminator models, the optimizers, and the error function, we run through the GAN pseudocode.

GAN Pseudocode:
1. generate a batch of noise and get a batch of real data
2. get the output of the noise from the generator
3. train the output of the discriminator when given real data to be 1 and when given fake data to be 0
4. train the generator to make the discriminator output 1
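
Steps 3 and 4 use the same error function with opposite targets. The sketch below shows this with binary cross entropy on a made-up batch of discriminator outputs; the four confidence values are hypothetical and do not come from a trained model.

```python
import torch
import torch.nn as nn

# binary cross entropy, the error function used for both networks
criterion = nn.BCELoss()

# hypothetical discriminator confidences for a batch of four fake samples
dis_out_fake = torch.tensor([[0.1], [0.2], [0.6], [0.9]])

# step 3: the discriminator wants its output on fake data to be 0
loss_dis_fake = criterion(dis_out_fake, torch.zeros_like(dis_out_fake))

# step 4: the generator wants the discriminator's output on the same data to be 1
loss_gen = criterion(dis_out_fake, torch.ones_like(dis_out_fake))

print(loss_dis_fake.item(), loss_gen.item())
```

Samples that fool the discriminator (confidence near 1) cost the discriminator a lot and the generator very little, and the reverse holds for samples the discriminator catches.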

Also to note: every 10 epochs I take 16 real training examples and the output of 16 fixed noise vectors and make them into a 4x8 grid of images. This way we see the generator's progress throughout training.

In [51]:
# defines the function to train the data
def train(epochs, learning_rate, noise_dims):

    # creates object for the generator and discriminator
    gen = Generator(noise_dims, 28*28)
    dis = Discriminator(28*28)

    # creates the optimizers for the generator and the discriminator
    gen_optim = optim.Adam(gen.parameters(), lr=learning_rate, betas=(0.5, 0.999))
    dis_optim = optim.Adam(dis.parameters(), lr=learning_rate, betas=(0.5, 0.999))

    # defines the loss function for training, this stands for binary cross entropy
    criterion = nn.BCELoss()

    # defines a fixed batch of noise that will be run through the generator at given intervals
    # this will show how the generator reacts to the same input, improving over time
    fixed_noise_batch = torch.randn(16, noise_dims)

    # loops through each epoch
    for epoch in range(epochs):

        # loops through each batch, taking the real data and discarding the label
        for batch_idx, (real_batch, _) in enumerate(iter(loader)):
            
            # flattens out the image so it can be viewed like a vector
            real_batch = real_batch.reshape(real_batch.shape[0], 28*28)

            # zeros out model gradients
            gen.zero_grad()
            dis.zero_grad()

            # creates a batch of noise with the same batch size as the real data
            noise_batch = torch.randn(real_batch.shape[0], noise_dims)

            # a forward pass through the generator, a vector of fabricated data
            fake_data_batch = gen(noise_batch)
            # a forward pass through the discriminator, the confidence that the
            # real batch is real for the line below, and the confidence that the
            # fake data is real for the line below that; the fake batch is detached
            # so the discriminator's loss does not send gradients into the generator
            dis_out_real = dis(real_batch)
            dis_out_fake = dis(fake_data_batch.detach())

            # computes the error of the discriminator's confidence on the real data
            loss_dis_real = criterion(dis_out_real, torch.ones_like(dis_out_real))
            # computes the error of the discriminator's confidence on the fake data
            loss_dis_fake = criterion(dis_out_fake, torch.zeros_like(dis_out_fake))
            # computes the total error of the discriminator
            loss_dis_total = loss_dis_real + loss_dis_fake

            # runs a backward pass through the discriminator, calculating model gradients
            loss_dis_total.backward()
            # makes the discriminator learn
            dis_optim.step()

            # calculates the error and model gradients for what the generator would have to do
            # to get the updated discriminator to think the fake data was real
            new_out  = dis(fake_data_batch)
            loss_gen = criterion(new_out, torch.ones_like(new_out))
            loss_gen.backward()
            # makes the generator learn
            gen_optim.step()

            # every 100 batches
            if batch_idx % 100 == 0:
                
                # print the progress
                print(f"Epoch [{epoch}/{epochs}] Batch [{batch_idx}/{len(loader)}]")

        # every 10 epochs
        if epoch % 10 == 0:
            # without model gradients
            with torch.no_grad():

                # generates new data based on fixed noise; the output lies in (-1, 1)
                # like the normalized real images, and make_grid(normalize=True) below
                # rescales everything to (0, 1) before saving
                fake = gen(fixed_noise_batch)

                real_imgs = real_batch[:16].reshape(16, 1, 28, 28)
                fake_imgs = fake[:16].reshape(16, 1, 28, 28)
                grid_imgs = torch.cat((real_imgs, fake_imgs), dim = 0)

                # creates an image grid of the outputted images, both real and fake
                img_grid = torchvision.utils.make_grid(grid_imgs, normalize=True)

                # saves the image grid
                torchvision.utils.save_image(img_grid, './img' + str(epoch) + '.png')

In [52]:
# trains the model
train(100, 0.001, 100)

Epoch [0/100] Batch [0/600]
Epoch [0/100] Batch [100/600]
Epoch [0/100] Batch [200/600]
Epoch [0/100] Batch [300/600]
Epoch [0/100] Batch [400/600]
Epoch [0/100] Batch [500/600]
Epoch [1/100] Batch [0/600]
Epoch [1/100] Batch [100/600]
Epoch [1/100] Batch [200/600]
Epoch [1/100] Batch [300/600]
Epoch [1/100] Batch [400/600]
Epoch [1/100] Batch [500/600]
Epoch [2/100] Batch [0/600]
Epoch [2/100] Batch [100/600]
Epoch [2/100] Batch [200/600]
Epoch [2/100] Batch [300/600]
Epoch [2/100] Batch [400/600]
Epoch [2/100] Batch [500/600]
Epoch [3/100] Batch [0/600]
Epoch [3/100] Batch [100/600]
Epoch [3/100] Batch [200/600]
Epoch [3/100] Batch [300/600]
Epoch [3/100] Batch [400/600]
Epoch [3/100] Batch [500/600]
Epoch [4/100] Batch [0/600]
Epoch [4/100] Batch [100/600]
Epoch [4/100] Batch [200/600]
Epoch [4/100] Batch [300/600]
Epoch [4/100] Batch [400/600]
Epoch [4/100] Batch [500/600]
Epoch [5/100] Batch [0/600]
Epoch [5/100] Batch [100/600]
Epoch [5/100] Batch [200/600]
Epoch [5/100] Batch [3

In [65]:
# generator class definition
class Generator(nn.Module):

    # generator class constructor
    def __init__(self, noise_dims):
        
        # noise_dims - the dimensionality of the noise inputted

        # runs the super class constructor
        super(Generator, self).__init__()

        # transforms a noise vector into a vector that can be reshaped into a tensor
        # of size (256, 4, 4), which is 256 channels of a 4x4 image
        self.linear = nn.Linear(noise_dims, 256*4*4)

        # runs through convolution transpose blocks to scale from 256 channels to 64 channels,
        # doubling the image size each time (4x4 to 8x8 to 16x16)
        self.block1 = self.generator_block(256, 128)
        self.block2 = self.generator_block(128, 64)

        # runs through one more convolution transpose and outputs a 32x32 3 channel (rgb) image
        self.output = nn.Sequential( nn.ConvTranspose2d(64, 3, 4, 2, 1), nn.Tanh() )

    # defines the generator block
    def generator_block(self, in_channels, out_channels):

        # a convolutional transpose layer, with a batch normalization and a ReLU activation
        return nn.Sequential( nn.ConvTranspose2d(in_channels, out_channels, 4, 2, 1),
                              nn.BatchNorm2d(out_channels),
                              nn.ReLU() )

    # a forward pass through the network
    def forward(self, x):

        # transforms a noise vector to something that can be reshaped into an image
        x = self.linear(x.view(x.shape[0], -1))

        # reshapes x into a 4x4 image with 256 channels
        x = x.reshape(x.shape[0], 256, 4, 4)

        # runs through 2 generator blocks
        x = self.block1(x)
        x = self.block2(x)

        # outputs to a 32x32 3 channel image
        x = self.output(x)

        # returns the output of the network
        return x
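
Every transposed convolution in this generator uses a kernel size of 4, a stride of 2, and a padding of 1, which doubles the height and width of its input. The sketch below checks that arithmetic; the helper `conv_transpose_out` is a name I made up for illustration, and its formula assumes no output padding and no dilation.

```python
import torch
import torch.nn as nn

# output size of a transposed convolution (assuming no output padding or dilation)
def conv_transpose_out(size, kernel=4, stride=2, padding=1):
    return (size - 1) * stride - 2 * padding + kernel

# starts from the 4x4 image and applies the three transposed convolutions
sizes = [4]
for _ in range(3):
    sizes.append(conv_transpose_out(sizes[-1]))

print(sizes)  # [4, 8, 16, 32]

# checks the formula against a real layer with the first block's settings
layer = nn.ConvTranspose2d(256, 128, 4, 2, 1)
x = torch.randn(2, 256, 4, 4)
out = layer(x)

print(out.shape)  # torch.Size([2, 128, 8, 8])
```

Three doublings take the 4x4 starting image to 32x32, which matches the size the CIFAR10 images are loaded at.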

In [70]:
# defines the discriminator class
class Discriminator(nn.Module):

    # class constructor
    def __init__(self):

        # runs the super class constructor
        super(Discriminator, self).__init__()

        # the first convolutional layer, downscales the inputted 32x32 3 channel image to 16x16
        # this is not a block because we do not want to batch normalize on the first layer
        self.conv   = nn.Sequential( nn.Conv2d(3, 64, 4, 2, 1),
                                     nn.LeakyReLU(0.2) )

        # runs through 2 discriminator blocks, downscaling to 8x8 and then 4x4
        self.block1 = self.discriminator_block(64, 128)
        self.block2 = self.discriminator_block(128, 256)

        # downscales the 4x4 image to a single 1x1 value with 1 channel
        self.conv2  = nn.Conv2d(256, 1, 4, 2, 0)

        # outputs a single value, the confidence that the inputted image was real
        self.output = nn.Sequential( nn.Flatten(), nn.Sigmoid() )

    # defines a discriminator block
    def discriminator_block(self, in_channels, out_channels):

        # consists of a convolutional layer, batch normalization, and leaky ReLU activation
        return nn.Sequential( nn.Conv2d(in_channels, out_channels, 4, 2, 1),
                              nn.BatchNorm2d(out_channels),
                              nn.LeakyReLU(0.2))

    # forward pass through the network
    def forward(self, x):

        # first convolutional layer
        x = self.conv(x)

        # 2 discriminator blocks
        x = self.block1(x)
        x = self.block2(x)

        # downscales the image with a final convolution
        x = self.conv2(x)

        # flattens the 1x1 image to shape (batch size, 1) and squashes it to a confidence
        x = self.output(x)
        
        # returns the output of the network
        return x

In [71]:
# initializes the weights of the model parameters
def init_weights(model):

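    # loops through each layer in the model; convolution weights are drawn from a normal
    # distribution with a mean of 0.0 and a standard deviation of 0.02, while batch norm
    # scales are centered on 1.0 (a common DCGAN practice) so activations are not zeroed out
    for m in model.modules():
        if isinstance(m, (nn.Conv2d, nn.ConvTranspose2d)):
            nn.init.normal_(m.weight.data, 0.0, 0.02)
        elif isinstance(m, nn.BatchNorm2d):
            nn.init.normal_(m.weight.data, 1.0, 0.02)
            nn.init.constant_(m.bias.data, 0.0)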


# defines a transformation pipeline for the dataset we load in
transform = transforms.Compose(
        [ transforms.Resize((32, 32)),
          transforms.ToTensor(),
          transforms.Normalize([ 0.5 for _ in range(3) ], [ 0.5 for _ in range(3) ]) ] )

# loads in the CIFAR10 dataset, full color 32x32 images of animals and objects
dataset = torchvision.datasets.CIFAR10(root = "./dataset/", train = True, transform = transform, download = True)

# creates a data loader object
loader = torch.utils.data.DataLoader(dataset, batch_size=100, shuffle=True)

Files already downloaded and verified


In [72]:
# defines the function to train the data
def train(epochs, learning_rate, noise_dims):

    # creates object for the generator and discriminator
    gen = Generator(noise_dims)
    dis = Discriminator()

    # creates the optimizers for the generator and the discriminator
    gen_optim = optim.Adam(gen.parameters(), lr=learning_rate, betas=(0.5, 0.999))
    dis_optim = optim.Adam(dis.parameters(), lr=learning_rate, betas=(0.5, 0.999))

    # defines the loss function for training, this stands for binary cross entropy
    criterion = nn.BCELoss()

    # defines a fixed batch of noise that will be run through the generator at given intervals
    # this will show how the generator reacts to the same input, improving over time
    fixed_noise_batch = torch.randn(16, noise_dims)

    # loops through each epoch
    for epoch in range(epochs):

        # loops through each batch, taking the real data and discarding the label
        for batch_idx, (real_batch, _) in enumerate(iter(loader)):

            # zeros out model gradients
            gen.zero_grad()
            dis.zero_grad()

            # creates a batch of noise with the same batch size as the real data
            noise_batch = torch.randn(real_batch.shape[0], noise_dims)

            # a forward pass through the generator, a batch of fabricated images
            fake_data_batch = gen(noise_batch)
            # a forward pass through the discriminator, the confidence that the
            # real batch is real for the line below, and the confidence that the
            # fake data is real for the line below that; the fake batch is detached
            # so the discriminator's loss does not send gradients into the generator
            dis_out_real = dis(real_batch)
            dis_out_fake = dis(fake_data_batch.detach())

            # computes the error of the discriminator's confidence on the real data
            loss_dis_real = criterion(dis_out_real, torch.ones_like(dis_out_real))
            # computes the error of the discriminator's confidence on the fake data
            loss_dis_fake = criterion(dis_out_fake, torch.zeros_like(dis_out_fake))
            # computes the total error of the discriminator
            loss_dis_total = loss_dis_real + loss_dis_fake

            # runs a backward pass through the discriminator, calculating model gradients
            loss_dis_total.backward()
            # makes the discriminator learn
            dis_optim.step()

            # calculates the error and model gradients for what the generator would have to do
            # to get the updated discriminator to think the fake data was real
            new_out  = dis(fake_data_batch)
            loss_gen = criterion(new_out, torch.ones_like(new_out))
            loss_gen.backward()
            # makes the generator learn
            gen_optim.step()

            # every 100 batches
            if batch_idx % 100 == 0:
                
                # print the progress
                print(f"Epoch [{epoch}/{epochs}] Batch [{batch_idx}/{len(loader)}]")

        # every 10 epochs
        if epoch % 10 == 0:
            # without model gradients
            with torch.no_grad():

                # generates new data based on fixed noise; the output lies in (-1, 1)
                # like the normalized real images, and make_grid(normalize=True) below
                # rescales everything to (0, 1) before saving
                fake = gen(fixed_noise_batch)

                grid_imgs = torch.cat((real_batch[:16], fake[:16]), dim = 0)

                # creates an image grid of the outputted images, both real and fake
                img_grid = torchvision.utils.make_grid(grid_imgs, normalize=True)

                # saves the image grid
                torchvision.utils.save_image(img_grid, './img' + str(epoch) + '.png')

I'm not going to spend the weeks that it would take to train this here.

Another interesting application that could combine conditional GANs and DCGANs was done in the DeepArt paper. It takes two input images, a real-life photograph and a famous art piece, along with some noise. The generator then outputs the photograph, stylized according to the famous art piece. This shows some of the exciting places that conditional GANs can go.

A next step for GANs is to apply them to natural language processing. The current gold standard architecture for NLP is the Transformer. Introduced by Google researchers in 2017, the Transformer improves on the recurrent neural networks that had previously been used for NLP problems. The TransGAN paper uses transformers for image generation tasks; however, it would be theoretically possible to do the same for NLP GAN tasks. I've been working on something to do just that and have seen moderate success. That's a little outside the scope of this notebook because of just how involved it is. It's also a little too resource intensive for me to realistically run on my personal computer.