# Generative Adversarial Networks

This is an implementation of a [Generative Adversarial Network (GAN)](https://arxiv.org/abs/1406.2661). The networks used here are based on the [Deep Convolutional GAN](https://arxiv.org/abs/1511.06434) architecture.

We apply GANs here to image generation. To train a GAN for this task, we need a starting set of images, which we will call the 'original images' to distinguish them from the 'generated images.'

## Generative Adversarial Network
### Basic Idea
Roughly speaking, the goal of a GAN is to generate images similar to the original images. To do this, a GAN uses two competing networks: a generator and a discriminator. The generator is trained to produce images that resemble the original images, while the discriminator is trained to tell generated images apart from original images. Training typically alternates between the two networks. First, we train the discriminator:
1. We pass generated images along with original images to the discriminator.
1. For each image, the discriminator assigns a probability that the image is original. A score of $0$ means the discriminator considers the image generated, and a score of $1$ means it considers the image original. Scores between these two reflect uncertainty about the source.
1. We score the *discriminator* based on how accurate these probabilities were, and update the discriminator to improve its ability to assign the correct probability.

We then train the generator on a competing goal:
1. We pass generated images to the discriminator.
1. The discriminator assigns each generated image a probability that it is original.
1. We score the *generator* based on how convincing the generated images were. We update the generator so that generated images are more likely to be labeled as original by the discriminator.
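The two procedures above can be sketched end to end on a deliberately tiny, hypothetical 1-D problem rather than the CIFAR-10 networks used below: the 'original images' are numbers drawn from $\mathcal{N}(3, 1)$, the generator is $G(z) = z + b$, and the discriminator is the logistic function $D(x) = \sigma(wx + c)$, which keeps the gradients short enough to write by hand. The names `d_loss`, `g_loss`, and `train_step`, the learning rate, and the random seeds are illustrative assumptions. The generator is scored with $-\log D(G(z))$, the variant discussed under "Alternate Generator Score Function."

```python
import math
import random


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


def d_loss(w, c, real, fake):
    """Discriminator score: binary cross-entropy, label 1 for original, 0 for generated."""
    real_term = -sum(math.log(sigmoid(w * x + c)) for x in real) / len(real)
    fake_term = -sum(math.log(1.0 - sigmoid(w * g + c)) for g in fake) / len(fake)
    return real_term + fake_term


def g_loss(w, c, fake):
    """Generator score: low when the discriminator labels generated samples as original."""
    return -sum(math.log(sigmoid(w * g + c)) for g in fake) / len(fake)


def train_step(params, real, noise, lr=0.05):
    """One round of the alternating procedure on the toy 1-D model (hypothetical sketch)."""
    w, c, b = params["w"], params["c"], params["b"]
    fake = [z + b for z in noise]  # generated samples G(z) = z + b

    # Discriminator update: analytic gradient of d_loss w.r.t. w and c, then one descent step.
    grad_w = (sum(-(1.0 - sigmoid(w * x + c)) * x for x in real) / len(real)
              + sum(sigmoid(w * g + c) * g for g in fake) / len(fake))
    grad_c = (sum(-(1.0 - sigmoid(w * x + c)) for x in real) / len(real)
              + sum(sigmoid(w * g + c) for g in fake) / len(fake))
    w, c = w - lr * grad_w, c - lr * grad_c

    # Generator update: analytic gradient of g_loss w.r.t. b, using the updated discriminator.
    grad_b = sum(-(1.0 - sigmoid(w * g + c)) * w for g in fake) / len(fake)
    b = b - lr * grad_b
    return {"w": w, "c": c, "b": b}


rng = random.Random(0)
params = {"w": 0.1, "c": 0.0, "b": 0.0}
for _ in range(200):
    real = [rng.gauss(3.0, 1.0) for _ in range(64)]   # stand-in "original" samples
    noise = [rng.gauss(0.0, 1.0) for _ in range(64)]  # z drawn from the noise distribution
    params = train_step(params, real, noise)
print(params)
```

Each call to `train_step` runs the discriminator steps (score with binary cross-entropy, take one gradient step on $w$ and $c$) and then the generator steps (score how convincing the generated samples were, take one gradient step on $b$). Whether $b$ settles near the data mean depends on the learning rate and the number of steps; even toy GANs like this one can oscillate rather than converge.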

### Score Function
Mathematically, if $D$ is the discriminator and $G$ is the generator, then we can characterize this as a two-player minimax game, $\min_G \max_D V(D, G)$: the discriminator tries to maximize the value function $V$, while the generator tries to minimize it. The value function is

$$V(D, G) = \mathbb{E}_{x\sim p_{\text{data}}}[\log D(x)]+\mathbb{E}_{z\sim p_{d}}[\log (1 - D(G(z)))].$$

In this equation, the first term scores the original images, while the second term scores the generated images. (Quick math breakdown: $\mathbb{E}$ means 'expected value,' or the average we expect under the given distribution. $x\sim p_{\text{data}}$ is shorthand for '$x$ drawn from the data distribution,' which can be thought of as '$x$ is an original image.' Similarly, $z\sim p_{d}$ is shorthand for '$z$ drawn from a random noise distribution,' with $G(z)$ being the image generated from that noise.)
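Because $V(D, G)$ is defined through expectations, it can be estimated by averaging over samples. The sketch below does this for hypothetical 1-D stand-ins: original data drawn from $\mathcal{N}(3, 1)$, a toy generator $G(z) = z + 1$ with $z \sim \mathcal{N}(0, 1)$, and a logistic discriminator. The function `estimate_value`, the toy `D` and `G`, and the sample size are illustrative choices, not part of this notebook.

```python
import math
import random


def estimate_value(D, G, sample_data, sample_noise, n=10_000):
    """Monte Carlo estimate of V(D, G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))]."""
    real_term = sum(math.log(D(sample_data())) for _ in range(n)) / n
    fake_term = sum(math.log(1.0 - D(G(sample_noise()))) for _ in range(n)) / n
    return real_term + fake_term


rng = random.Random(0)


def sample_data():
    return rng.gauss(3.0, 1.0)  # hypothetical stand-in for drawing an original image


def sample_noise():
    return rng.gauss(0.0, 1.0)  # z drawn from the noise distribution p_d


def G(z):
    return z + 1.0  # toy generator


def D(x):
    return 1.0 / (1.0 + math.exp(-(x - 2.0)))  # toy discriminator: larger x looks "original"


def D_unsure(x):
    return 0.5  # a discriminator that cannot tell the two sources apart


print(estimate_value(D, G, sample_data, sample_noise))
print(estimate_value(D_unsure, G, sample_data, sample_noise))
```

The unsure discriminator gives $2\log\frac{1}{2} = -\log 4 \approx -1.386$ (up to floating-point rounding) no matter what the generator does, while the discriminator that leans toward larger values scores a higher $V$ in this setup, which is the direction the discriminator's update aims for.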

When we update the discriminator, we update with the intent of increasing the value function. This is accomplished by updating $D$ so that $D(x)$ is closer to $1$ (which increases the first term), and $D(G(z))$ is closer to $0$ (which increases the second term). That is, improving the score of the discriminator on this value function improves the ability of the discriminator to tell apart generated and original images.

In contrast, we update $G$ with the intent of lowering the value function. This is why these are adversarial networks: they have competing goals. Because the first term does not depend on $G$, the updates only affect the second term: $G$ is updated so that $D(G(z))$ is pushed closer to $1$.

It's worth noting that this value function has a critical issue in practice. Early in training, when the generator is still poor, the discriminator can reject generated images with high confidence, so $D(G(z))$ is very close to $0$. In this regime $\log(1-D(G(z)))$ saturates: its gradient with respect to the generator's parameters becomes vanishingly small, so $G$ receives too little signal to improve. For this reason, it is suggested to use a different function for updating the generator.
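The saturation can be made concrete with the chain rule. Assume, as in the networks in this notebook, that the discriminator ends in a sigmoid, and write $a$ for its pre-sigmoid output (logit) on a generated image, so that $D(G(z)) = \sigma(a)$ with $\sigma(a) = 1/(1 + e^{-a})$. Using $\sigma'(a) = \sigma(a)\bigl(1 - \sigma(a)\bigr)$:

$$\frac{\partial}{\partial a}\log\bigl(1 - \sigma(a)\bigr) = \frac{-\sigma(a)\bigl(1 - \sigma(a)\bigr)}{1 - \sigma(a)} = -\sigma(a) = -D(G(z)).$$

Every gradient that reaches $G$ through this term is multiplied by this factor, so when the discriminator confidently rejects generated images ($D(G(z)) \to 0$), the signal for updating $G$ vanishes.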

### Alternate Generator Score Function
To alleviate the problem of vanishing gradients, it is recommended to train $G$ to *maximize* the value function

$$\mathbb{E}_{z\sim p_{d}}[\log D(G(z))].$$

Maximizing this function pushes $D(G(z))$ toward $1$, which is the same direction the generator moves when minimizing the second term of the original value function, $\log(1 - D(G(z)))$. Unlike that term, however, it still provides strong gradients when the generator is low quality and $D(G(z))$ is close to $0$.
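The difference between the two generator objectives can be checked numerically. This sketch treats $a$ as the discriminator's logit on a generated image, so $D(G(z)) = \sigma(a)$, and compares central finite-difference gradients of the original generator term $\log(1 - D(G(z)))$ (to be minimized) with $-\log D(G(z))$, which is how maximizing $\log D(G(z))$ is commonly written as a loss to minimize. The function names are illustrative.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def saturating_loss(a):
    # Original minimax term for the generator: minimize log(1 - D(G(z))).
    return math.log(1.0 - sigmoid(a))


def non_saturating_loss(a):
    # Alternate objective: maximize log D(G(z)), i.e. minimize -log D(G(z)).
    return -math.log(sigmoid(a))


def grad(f, a, h=1e-6):
    # Central finite-difference estimate of df/da.
    return (f(a + h) - f(a - h)) / (2 * h)


for a in (-6.0, -2.0, 0.0):
    print(f"D(G(z))={sigmoid(a):.4f}  "
          f"saturating grad={grad(saturating_loss, a):+.4f}  "
          f"non-saturating grad={grad(non_saturating_loss, a):+.4f}")
```

Both gradients are negative, so a descent step on either loss raises $a$ and therefore $D(G(z))$. Their ratio is $(1 - \sigma(a))/\sigma(a) = e^{-a}$, so for a confidently rejected sample at $a = -6$ the non-saturating gradient is about $e^{6} \approx 403$ times larger.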

```python
import torch as t
import torch.nn as nn
```

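```python
channel_factor = 16  # base number of feature maps; deeper layers use 2x and 4x this
z_size = 100         # length of the noise vector z (assumed: not defined in the original cell; DCGAN uses 100)

# Discriminator: (N, 3, 32, 32) CIFAR-10 images -> (N, 1, 1, 1) probabilities of being original.
# Kernel 4, stride 2, padding 1 halves the spatial size: 32 -> 16 -> 8 -> 4.
discriminator = nn.Sequential(
    nn.Conv2d(3, channel_factor, 4, 2, 1, bias=False),
    nn.BatchNorm2d(channel_factor),
    nn.LeakyReLU(0.2),
    nn.Conv2d(channel_factor, 2*channel_factor, 4, 2, 1, bias=False),
    nn.BatchNorm2d(2*channel_factor),
    nn.LeakyReLU(0.2),
    nn.Conv2d(2*channel_factor, 4*channel_factor, 4, 2, 1, bias=False),
    nn.BatchNorm2d(4*channel_factor),
    nn.LeakyReLU(0.2),
    nn.Conv2d(4*channel_factor, 1, 4, 1, 0, bias=False),  # 4 x 4 -> 1 x 1 score
    nn.Sigmoid()
)

# Generator: (N, z_size, 1, 1) noise -> (N, 3, 32, 32) images in [-1, 1].
# Spatial size: 1 -> 4 -> 8 -> 16 -> 32.
generator = nn.Sequential(
    nn.ConvTranspose2d(z_size, 4*channel_factor, 4, 1, 0, bias=False),  # 1 x 1 -> 4 x 4
    nn.BatchNorm2d(4*channel_factor),
    nn.ReLU(),
    nn.ConvTranspose2d(4*channel_factor, 2*channel_factor, 4, 2, 1, bias=False),
    nn.BatchNorm2d(2*channel_factor),
    nn.ReLU(),
    nn.ConvTranspose2d(2*channel_factor, channel_factor, 4, 2, 1, bias=False),
    nn.BatchNorm2d(channel_factor),
    nn.ReLU(),
    nn.ConvTranspose2d(channel_factor, 3, 4, 2, 1, bias=False),
    nn.Tanh()  # output range [-1, 1], matching the normalized training images
)
```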
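Which layer settings turn a 32 × 32 CIFAR-10 image into a single score, and a 1 × 1 noise input into a 32 × 32 image, can be traced by hand with the output-size formulas from the PyTorch documentation for `Conv2d` and `ConvTranspose2d`. The sketch below (plain Python, assuming dilation 1 and no `output_padding`; the helper names are illustrative) compares a stack of unpadded kernel-4, stride-2 layers with kernel-4, stride-2, padding-1 layers plus one stride-1, padding-0 layer at the narrow end.

```python
def conv_out(size, kernel, stride, padding=0):
    """Output height/width of nn.Conv2d with dilation=1."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_transpose_out(size, kernel, stride, padding=0):
    """Output height/width of nn.ConvTranspose2d with dilation=1, output_padding=0."""
    return (size - 1) * stride - 2 * padding + kernel


def trace(size, layers, fn):
    """Apply fn layer by layer and record every intermediate spatial size."""
    sizes = [size]
    for kernel, stride, padding in layers:
        size = fn(size, kernel, stride, padding)
        sizes.append(size)
    return sizes


unpadded = [(4, 2, 0)] * 4                   # (kernel, stride, padding) for four layers
padded_d = [(4, 2, 1)] * 3 + [(4, 1, 0)]     # halve three times, then 4 x 4 -> 1 x 1
padded_g = [(4, 1, 0)] + [(4, 2, 1)] * 3     # 1 x 1 -> 4 x 4, then double three times

print(trace(32, unpadded, conv_out))            # [32, 15, 6, 2, 0]
print(trace(32, padded_d, conv_out))            # [32, 16, 8, 4, 1]
print(trace(1, unpadded, conv_transpose_out))   # [1, 4, 10, 22, 46]
print(trace(1, padded_g, conv_transpose_out))   # [1, 4, 8, 16, 32]
```

Without padding, the last convolution would need a 4 × 4 kernel on a 2 × 2 input, which PyTorch rejects, and the transposed stack grows to 46 × 46 instead of 32 × 32. With padding 1, both directions line up with CIFAR-10's 32 × 32 images.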

```python
import torch.utils.data as data
import torchvision.datasets as datasets
from torchvision.transforms import Compose, Normalize, ToTensor
from torchvision.utils import make_grid, save_image

# ToTensor scales pixels to [0, 1]; Normalize then maps each RGB channel to [-1, 1],
# the same range as the generator's Tanh output.
transforms = Compose([ToTensor(), Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
training_set = datasets.CIFAR10(root='./data/', train=True, download=True, transform=transforms)
batch_size = 100
training_loader = data.DataLoader(dataset=training_set, batch_size=batch_size, shuffle=True)
```
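`Normalize` computes `(pixel - mean) / std` per channel, so with mean and std of 0.5 it maps `ToTensor`'s [0, 1] range onto [-1, 1]. Generated images come out of `Tanh` in that same [-1, 1] range, so they are usually mapped back to [0, 1] before saving or viewing. A minimal plain-Python sketch of both directions (function names hypothetical), plus the number of batches per epoch for CIFAR-10's 50,000 training images at `batch_size = 100`:

```python
import math


def to_model_range(pixel, mean=0.5, std=0.5):
    # What Normalize does to one channel value after ToTensor: [0, 1] -> [-1, 1].
    return (pixel - mean) / std


def to_display_range(value, mean=0.5, std=0.5):
    # Inverse mapping, e.g. a Tanh output in [-1, 1] -> [0, 1].
    return value * std + mean


print([to_model_range(p) for p in (0.0, 0.5, 1.0)])     # [-1.0, 0.0, 1.0]
print([to_display_range(v) for v in (-1.0, 0.0, 1.0)])  # [0.0, 0.5, 1.0]

num_train_images = 50_000  # size of the CIFAR-10 training split
batch_size = 100
print(math.ceil(num_train_images / batch_size))  # 500 batches per epoch
```

Since `DataLoader` keeps the last partial batch by default, `len(training_loader)` follows the same ceiling division.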