# Generative Adversarial Networks

The premise behind _generative adversarial networks_ is that we can turn the process of learning a probability distribution into a game. The game has two players: a _generator_ and a _discriminator_. The starting point of the game is a _data distribution_ $p(x)$ that we're trying to learn. The discriminator is a function $D$ that takes an input that may or may not be from the data distribution and decides whether or not the input is from $p(x)$. The generator is a function $G$ that takes random noise as its input and produces an output that can "fool" $D$ into classifying it as a real sample from $p(x)$.

Both $D$ and $G$ are implemented as neural networks. The training process for these neural nets alternates between optimizing $D$ and optimizing $G$. At the end of the game, the hope is that $G$ has learned to produce such good outputs that $D$ can't distinguish them from real samples. This notebook walks through that process on a simple example.

In [None]:
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
%pylab inline

To demonstrate how GANs work, we'll use the MNIST dataset. Our generator will learn how to generate small pictures of handwritten digits, and our discriminator will try to distinguish between real images and fakes. TensorFlow ships a helper that downloads and loads MNIST for us. The only thing worth noting in the next cell is that we use the "one hot" encoding for our labels. That means that if the label for our image is 4, then the representation of the label is a vector of length 10 that contains zeros everywhere except at index 4 (the fifth position, since the digit 0 occupies the first position).
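As a small illustration of the encoding described above, here is a NumPy-only sketch. The helper name `one_hot` is hypothetical and is not part of the MNIST loader; it only shows what the label vector for the digit 4 looks like.

```python
import numpy as np

def one_hot(label, n_classes=10):
    """Return a length-n_classes vector with a 1.0 at index `label` (illustrative helper)."""
    vec = np.zeros(n_classes)
    vec[label] = 1.0
    return vec

# The label 4 puts the single 1.0 at index 4, i.e. the fifth position.
print(one_hot(4))
```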

In [None]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

We have two parameters that directly impact training time and model quality: `n_train_steps` and `batch_size`. You should play with these numbers depending on how much time you have to spend waiting for training to complete.

Reasonable values for `batch_size` are small powers of 2, such as 64, 128, or 256.

In [None]:
n_train_steps = 20000
batch_size = 256
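To get a feel for what these two settings imply, the arithmetic below estimates how many real images the discriminator sees over a full run. It assumes the loader's default training split of 55,000 images; adjust `n_train_images` if your split differs.

```python
# Assumption: the default training split of the MNIST loader holds 55,000 images.
n_train_images = 55000
n_train_steps = 20000
batch_size = 256

# One batch of real images is drawn per training step.
images_seen = n_train_steps * batch_size
epochs = images_seen / float(n_train_images)
print("images seen: %d, approx. passes over the training set: %.1f" % (images_seen, epochs))
```

Halving either number halves the amount of data processed, which is a quick way to shorten a trial run.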

### The Generator

The goal of our _generator_ network is to take random noise as its input and to produce an image that resembles a handwritten digit.

Our generator will have a very simple structure. The input will be a 100-dimensional noise vector $z$, and we'll use just two layers. As an equation, the network looks like:

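$G(z) = \sigma(\mbox{relu}(z W_1 + b_1) W_2 + b_2)$,

where $z W_1$ denotes a matrix product, $\mbox{relu}(x) = \max(x, 0)$, and $\sigma(x) = \frac{1}{1 + e^{-x}}$. Both $\mbox{relu}$ and $\sigma$ are applied component-wise.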
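The equation can be checked for shapes with a NumPy-only sketch. The layer sizes (100, 512, 784) follow the TensorFlow cell in this section; the function name `generator_forward` and the random weights are illustrative assumptions, not the trained model.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def generator_forward(z, W1, b1, W2, b2):
    """Two-layer forward pass: relu hidden layer, sigmoid output."""
    hidden = np.maximum(z.dot(W1) + b1, 0.0)  # relu(z W1 + b1)
    return sigmoid(hidden.dot(W2) + b2)       # sigma(hidden W2 + b2)

rng = np.random.RandomState(0)
z_batch = rng.uniform(-1.0, 1.0, size=(3, 100))  # three noise vectors
W1 = rng.normal(scale=0.1, size=(100, 512))
b1 = np.zeros(512)
W2 = rng.normal(scale=0.1, size=(512, 784))
b2 = np.zeros(784)

out = generator_forward(z_batch, W1, b1, W2, b2)
print(out.shape)  # one flattened 28x28 image per noise vector
```

Because the last step is a sigmoid, every output value lies between 0 and 1, which matches the pixel range of the MNIST images.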

In [None]:
z = tf.placeholder(tf.float32, shape=(None, 100))
g_w1 = tf.get_variable("g_w1", [100,512], initializer=tf.contrib.layers.xavier_initializer())
g_b1 = tf.get_variable("g_b1", [512], initializer=tf.constant_initializer(0.0))
g_w2 = tf.get_variable("g_w2", [512,784], initializer=tf.contrib.layers.xavier_initializer())
g_b2 = tf.get_variable("g_b2", [784], initializer=tf.constant_initializer(0.0))

g_params = [g_w1, g_b1, g_w2, g_b2]

def generator(z):
    g_y1 = tf.nn.relu(tf.matmul(z, g_w1) + g_b1)
    g = tf.matmul(g_y1, g_w2) + g_b2
    G = tf.nn.sigmoid(g)
    return G

In [None]:
def noise_prior(batch_size, dim):
    return np.random.uniform(-1.0, 1.0, size=(batch_size, dim))

### The Discriminator

The architecture of the discriminator is similar to that of the generator: two layers with a relu in between. The main difference is that instead of producing an entire image as output, the discriminator produces a single number that estimates the probability that the input $x$ came from the data distribution. It also uses a smaller hidden layer (256 units rather than 512).
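A NumPy-only sketch of the same forward pass makes the output shape concrete. The layer sizes (784, 256, 1) follow the TensorFlow cell in this section; the function name `discriminator_forward` and the random weights are illustrative assumptions.

```python
import numpy as np

def discriminator_forward(x, W1, b1, W2, b2):
    """Return (probability, logit) for each row of x."""
    hidden = np.maximum(x.dot(W1) + b1, 0.0)  # relu hidden layer
    logit = hidden.dot(W2) + b2               # one raw score per input
    prob = 1.0 / (1.0 + np.exp(-logit))       # squash the score into (0, 1)
    return prob, logit

rng = np.random.RandomState(0)
x_batch = rng.uniform(0.0, 1.0, size=(5, 784))  # five stand-in "images"
W1 = rng.normal(scale=0.05, size=(784, 256))
b1 = np.zeros(256)
W2 = rng.normal(scale=0.05, size=(256, 1))
b2 = np.zeros(1)

prob, logit = discriminator_forward(x_batch, W1, b1, W2, b2)
print(prob.shape)  # one probability per input image
```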

In [None]:
x = tf.placeholder(tf.float32, shape=(None, 784))
d_w1 = tf.get_variable("d_w1", [784,256], initializer=tf.contrib.layers.xavier_initializer())
d_b1 = tf.get_variable("d_b1", [256], initializer=tf.constant_initializer(0.0))
d_w2 = tf.get_variable("d_w2", [256,1], initializer=tf.contrib.layers.xavier_initializer())
d_b2 = tf.get_variable("d_b2", [1], initializer=tf.constant_initializer(0.0))

d_params = [d_w1, d_b1, d_w2, d_b2]

def discriminator(x):
    d_y1 = tf.nn.relu(tf.matmul(x, d_w1) + d_b1)
    d_logit = tf.matmul(d_y1, d_w2) + d_b2
    d_prob = tf.nn.sigmoid(d_logit)
    return d_prob, d_logit


In [None]:
G = generator(z)
D_real, D_logit_real = discriminator(x)
D_fake, D_logit_fake = discriminator(G)

### The Training Objectives

We have an objective for $D$ and an objective for $G$. The objective for $D$ consists of a term that rewards correctly classifying actual data samples, and a term that rewards correctly picking out the fakes generated by $G$.

The objective for $G$ is designed to reward it for fooling $D$, that is, for producing samples that $D$ classifies as real. Both objectives are written as losses to be minimized, and we optimize them with the same optimizer and learning rate.
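Written out for a batch of $m$ real samples $x_i$ and $m$ noise vectors $z_i$, the two quantities minimized in this notebook, as read from the code in this section, are:

$\mbox{obj}_D = -\frac{1}{m}\sum_{i=1}^{m}\left[\log D(x_i) + \log(1 - D(G(z_i)))\right]$,

$\mbox{obj}_G = -\frac{1}{m}\sum_{i=1}^{m}\log D(G(z_i))$.

Note that $\mbox{obj}_G$ maximizes $\log D(G(z))$ rather than minimizing $\log(1 - D(G(z)))$. This is the variant commonly called the non-saturating generator loss. If $D$ outputs $\frac{1}{2}$ for every input, then $\mbox{obj}_D = 2\log 2 \approx 1.386$ and $\mbox{obj}_G = \log 2 \approx 0.693$.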

In [None]:
obj_d = -tf.reduce_mean(tf.log(D_real) + tf.log(1-D_fake))
obj_g = -tf.reduce_mean(tf.log(D_fake))
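Taking `log` of a sigmoid output can produce `-inf` when the probability underflows to 0 or rounds to 1. One common alternative is to compute the same quantities directly from the logits, which is what `tf.nn.sigmoid_cross_entropy_with_logits` is designed for. The NumPy sketch below shows the idea only; the helper names are hypothetical, and it is not a drop-in replacement for the TensorFlow cell above.

```python
import numpy as np

def softplus(a):
    # log(1 + exp(a)), arranged so that large |a| does not overflow
    return np.maximum(a, 0.0) + np.log1p(np.exp(-np.abs(a)))

def objectives_from_logits(logit_real, logit_fake):
    # Identities used: -log(sigmoid(l)) == softplus(-l)
    #                  -log(1 - sigmoid(l)) == softplus(l)
    obj_d = np.mean(softplus(-logit_real) + softplus(logit_fake))
    obj_g = np.mean(softplus(-logit_fake))
    return obj_d, obj_g

def objectives_from_probs(d_real, d_fake):
    # Same formulas as the notebook cell, written with NumPy
    obj_d = -np.mean(np.log(d_real) + np.log(1.0 - d_fake))
    obj_g = -np.mean(np.log(d_fake))
    return obj_d, obj_g

logit_real = np.array([[2.0], [-1.0]])
logit_fake = np.array([[0.5], [-3.0]])
sig = lambda l: 1.0 / (1.0 + np.exp(-l))

stable = objectives_from_logits(logit_real, logit_fake)
naive = objectives_from_probs(sig(logit_real), sig(logit_fake))
print(stable, naive)  # the two pairs agree for moderate logits
```

For moderate logits the two versions agree; for very large logits the logit-based version stays finite where the probability-based one may not.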

In [None]:
opt_d = tf.train.AdamOptimizer(2e-5).minimize(obj_d, var_list=d_params)
opt_g = tf.train.AdamOptimizer(2e-5).minimize(obj_g, var_list=g_params)

### The Training Process
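Each training step in this section runs `k_d` discriminator updates followed by `k_g` generator updates. The sketch below, using a hypothetical helper `update_schedule`, lists the order of updates that this produces. It involves no TensorFlow and only illustrates the alternation.

```python
def update_schedule(n_steps, k_d, k_g):
    """List the order of optimizer updates for n_steps training steps."""
    schedule = []
    for step in range(n_steps):
        schedule.extend(["D"] * k_d)  # discriminator updates come first
        schedule.extend(["G"] * k_g)  # then the generator updates
    return schedule

print(update_schedule(2, k_d=1, k_g=2))
# ['D', 'G', 'G', 'D', 'G', 'G']
```

With `k_d = 1` and `k_g = 2`, the generator is updated twice as often as the discriminator.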

In [None]:
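sess = tf.InteractiveSession()
# initialize_all_variables() is deprecated; this is its replacement in TensorFlow 0.12 and later 1.x releases
tf.global_variables_initializer().run()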

In [None]:
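k_d = 1  # discriminator updates per training step
k_g = 2  # generator updates per training step
histd, histg = np.zeros(n_train_steps), np.zeros(n_train_steps)
report_every = max(1, n_train_steps // 10)  # avoid a zero modulus for very short runs
for i in range(n_train_steps):
    # The one-hot labels in t_data are not used by this unconditional GAN.
    x_data, t_data = mnist.train.next_batch(batch_size)
    for j in range(k_d):
        noise = noise_prior(batch_size, 100)
        histd[i], _ = sess.run([obj_d, opt_d], {x : x_data, z : noise})
    for j in range(k_g):
        noise = noise_prior(batch_size, 100)
        # Only the loss from the last generator update of the step is kept.
        histg[i], _ = sess.run([obj_g, opt_g], {z : noise})

    if i % report_every == 0:
        # Formatted print works under both Python 2 and Python 3.
        print("%d %f %f" % (i, histd[i], histg[i]))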


In [None]:
plt.plot(range(n_train_steps), histd, label='obj_d')
plt.plot(range(n_train_steps), histg, label='obj_g')
plt.legend()

### Visualizing The Result

In [None]:
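# Generate one image from a single noise vector and show it as a 28x28 grayscale picture.
out_im = sess.run(G, {z : noise_prior(1, 100)})
plt.imshow(out_im.reshape(28, 28), cmap='gray')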