# Generative Adversarial Networks
So far, all the models we have worked with except the VAE have been discriminative models. This means that they are simply trying to predit something about our existing dataset. Sometimes, we would not like to discriminate but generate new examples of the data as in the case of video or image generation. Technically, this is the problem of modelling a probability distribution which we have samples of.

One approach which implicitly models the distribution, the work of Ian Goodfellow, has be enjoying great success - often producing images indistinguishable from the examples on which it was trained. GANs take a game-theoretic approach, pitting two networks against eachother - the discriminator and the generator. The job of the generator is to produce images which are indistinguishable from the training set from latent variables while the job of the discriminator is to catch out the generator and discriminate real data from generated data. Initially they will both be terrible at their jobs but as the discriminator gets better, the generator is forced to get better to fool it and vice-versa. This loop continues until they are both excellent at their jobs and the generator can now be used to produce very realistic data points.

![](images/GAN.png)

An analogy often used to describe this is the detective and the forger. The generator is like a forger who is trying to produce paintings indistinguishable from other famous paintings by an artists while the discriminator is like a detective who is trying to catch the forger out. As the detective gets better at catching the generator out, the generator is forced to improve to fool the detective.

Here's the architecture for DCGAN (Deep Convolutional GAN), which we will implement today.

![](images/dcgan_architecture.png)

## Implementation
We will be training a GAN on the fashion MNIST dataset so we will be able to produce images of items of clothing which look like they came from the original dataset.

We begin by importing the appropriate libraries.

In [1]:
import torch
from torch import nn
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

We load our dataset into a pytorch dataloader which we will use later to produce random batches of samples from our dataset.

In [3]:
batch_size=64

train_data = datasets.MNIST(
    train=True,
    download=True,
    root='./MNIST-data',
    transform=transforms.ToTensor()
)

train_loader = DataLoader(train_data, shuffle=True, batch_size=batch_size)

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ./MNIST-data/MNIST/raw/train-images-idx3-ubyte.gz
100.1%Extracting ./MNIST-data/MNIST/raw/train-images-idx3-ubyte.gz to ./MNIST-data/MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to ./MNIST-data/MNIST/raw/train-labels-idx1-ubyte.gz
113.5%Extracting ./MNIST-data/MNIST/raw/train-labels-idx1-ubyte.gz to ./MNIST-data/MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to ./MNIST-data/MNIST/raw/t10k-images-idx3-ubyte.gz
100.4%Extracting ./MNIST-data/MNIST/raw/t10k-images-idx3-ubyte.gz to ./MNIST-data/MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to ./MNIST-data/MNIST/raw/t10k-labels-idx1-ubyte.gz
180.4%Extracting ./MNIST-data/MNIST/raw/t10k-labels-idx1-ubyte.gz to ./MNIST-data/MNIST/raw
Processing...
Done!


Define the NN model we will be using for our discriminator. It takes in the 28x28 image and performs convolutions followed by one fully connected layer to output the probability of the data point being real and not generated.

In [4]:
class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Conv2d(1, 16, 3, 2),
            nn.LeakyReLU(),
            nn.BatchNorm2d(16),
            nn.Conv2d(16, 64, 3, 2),
            nn.LeakyReLU(),
            nn.BatchNorm2d(64),
            nn.Flatten(),
            nn.Linear(2304, 256),
            nn.LeakyReLU(),
            nn.Linear(256, 1),
            nn.Sigmoid()
        )
    
    def forward(self, x):
        return self.layers(x)

d = Discriminator()

x, _ = train_data[0]
x = x.unsqueeze(0)
print(d(x))

tensor([[0.4996]], grad_fn=<SigmoidBackward>)


Define the NN model for the generator. This takes in a latent vector of size 128 and performs fully connected layers followed by upconvolution to output us a 28x28 image.

One layer that we'd like to use in a generator would be one that unflattens the input from a vector into a tensor of given dimensions. So the first thing we do in the cell below is define that custom pytorch layer. Look how easy it is to do that!

In [5]:
class Unflatten(nn.Module):
    def __init__(self, out_shape):
        super().__init__()
        self.out_shape = out_shape

    def forward(self, x):
        return x.view(-1, *self.out_shape)

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(100, 256),
            nn.LeakyReLU(),
            nn.Linear(256, 1152),
            # nn.LeakyReLU(),
            # nn.Linear(1024, 1152),
            nn.LeakyReLU(),
            Unflatten((128, 3, 3)),
            nn.ConvTranspose2d(128, 64, 3, 3),
            nn.LeakyReLU(),
            nn.ConvTranspose2d(64, 32, 3, 3),
            nn.LeakyReLU(),
            nn.ConvTranspose2d(32, 1, 2, 1),
            nn.Sigmoid(),
        )

    def forward(self, z):
        return self.layers(z)

g = Generator()
latent_vec_size = 100
ran_batch = torch.rand(batch_size, latent_vec_size)
fake = g(ran_batch)
print(fake.shape)

#%%
def show(x):
    pass

    

torch.Size([64, 1, 28, 28])


Now that we have the generator, we're almost ready to code up the training loop. However, before we do that let's make a function to randomly sample a vector of noise from the latent space and generate a sample from it.

In [7]:
def sample(writer=None, device='cpu'):
    if writer is None:
        writer = SummaryWriter(log_dir=f'runs/DCGAN/{time()}')
    z = torch.randn(batch_size, latent_vec_size).to(device)
    for img in G(z):
        writer.add_image(f'test', img)

We define the training loop. For every batch of training data that we look at, we get the generator to produce an equally sized batch of generated images. We then get the discriminator to make predictions on both sets and calulate the cost for both networks before calculating the gradients and training each one in turn.

You should run tensorboard to see the evolution of the generated images and visualise the loss curves.

In [8]:
from torch.utils.tensorboard import SummaryWriter
from time import time

toPILImg = transforms.ToPILImage()

criterion = nn.BCELoss()

device = 'cuda:0' if torch.cuda.is_available() else 'cpu'

def train(G, D, epochs=1):
    optimiser_d = torch.optim.Adam(D.parameters(), lr=0.00001)
    optimiser_g = torch.optim.Adam(G.parameters(), lr=0.0001)
    writer = SummaryWriter(log_dir=f'runs/DCGAN/{time()}')
    G = G.to(device)
    D = D.to(device)
    batch_idx = 0
    
    for epoch in range(epochs):
        for idx, (x, _) in enumerate(train_loader):
            x = x.to(device)
            z = torch.randn(batch_size, latent_vec_size)
            z = z.to(device)

            # GENERATOR UPDATE
            optimiser_g.zero_grad()
            labels = torch.ones(batch_size).to(device)
            G_loss = criterion(D(G(z)), labels)
            # G_loss = - torch.log(1 - D(G(z)))
            # G_loss = torch.mean(G_loss)
            G_loss.backward(retain_graph=True)
            optimiser_g.step()

            # DISCRIMINATOR UPDATE
            optimiser_d.zero_grad()
            labels = torch.zeros(batch_size).to(device)
            D_loss = criterion(D(G(z)), labels) # loss on fake examples
            # D_loss = - (torch.log(D(x)) + torch.log(1 - D(G(z))))
            # D_loss = torch.mean(D_loss)
            D_loss.backward()
            labels = torch.ones(x.shape[0]).to(device)
            D_loss = criterion(D(x), labels)
            D_loss.backward()
            optimiser_d.step()
            
            writer.add_scalar('Loss/G', G_loss.item(), batch_idx)
            writer.add_scalar('Loss/D', D_loss.item(), batch_idx)
            batch_idx += 1
            print(
                'Epoch:', epoch ,
                'Batch:', idx,
                'Loss G:', G_loss.item(),
                'Loss D:', D_loss.item()
            )
            if idx % 100 == 0:
                print('sampling')
                sample(writer, device)

G = Generator()
D = Discriminator()
train(G, D, epochs=10)

89855957
Epoch: 0 Batch: 128 Loss G: 0.2680076062679291 Loss D: 0.4070155918598175
Epoch: 0 Batch: 129 Loss G: 0.26785269379615784 Loss D: 0.3963998854160309
Epoch: 0 Batch: 130 Loss G: 0.2697426676750183 Loss D: 0.40959662199020386
Epoch: 0 Batch: 131 Loss G: 0.2649030387401581 Loss D: 0.4023708701133728
Epoch: 0 Batch: 132 Loss G: 0.26895198225975037 Loss D: 0.4059301018714905
Epoch: 0 Batch: 133 Loss G: 0.2685399651527405 Loss D: 0.38539648056030273
Epoch: 0 Batch: 134 Loss G: 0.26647239923477173 Loss D: 0.4242362976074219
Epoch: 0 Batch: 135 Loss G: 0.2660166025161743 Loss D: 0.392152339220047
Epoch: 0 Batch: 136 Loss G: 0.26944154500961304 Loss D: 0.3905590772628784
Epoch: 0 Batch: 137 Loss G: 0.26714906096458435 Loss D: 0.42616626620292664
Epoch: 0 Batch: 138 Loss G: 0.26920539140701294 Loss D: 0.4115142822265625
Epoch: 0 Batch: 139 Loss G: 0.2729398012161255 Loss D: 0.3930966854095459
Epoch: 0 Batch: 140 Loss G: 0.2719000577926636 Loss D: 0.4031646251678467
Epoch: 0 Batch: 141 L

KeyboardInterrupt: 

## GANs are hard to train!

Well that's a first version done

We can see that it doesn't produce perfect results, even after many epochs.

Let's try and implement these ["GAN hacks"](https://github.com/soumith/ganhacks) which have been shown to stabilise the training.