# Generative Adversarial Nets: A Summary

## A summary and implementation of the GAN paper

A GAN obtains a generative model through an adversarial process.
A GAN trains two models simultaneously: a Generator and a Discriminator.
- The Generator learns to produce data that the Discriminator cannot detect as generated.
- The Discriminator learns to classify real data as real and generated data as generated.

Combining both goals into a single expression gives the following.

$$\min_{G}\max_{D}V(D,G)=\mathbb{E}_{\textbf{x}\sim p_{data}(\textbf{x})}[\log{(D(\textbf{x}))}]+\mathbb{E}_{\textbf{z}\sim p_{z}(\textbf{z})}[\log{(1-D(G(\textbf{z})))}] \cdots (1)$$

$G$ is a **differentiable function** that takes as input a noise variable $\textbf{z}$ drawn from the prior $p_{z}(\textbf{z})$ and produces a data sample $x$.

$$G(z; \theta_{g})$$

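$D$ is a **differentiable function** that takes $x$ as input and outputs a value in $[0,1]$, the probability that $x$ came from the data rather than from the generator.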

$$D(x; \theta_{d})$$

These two functions (models) update $\theta_{g}$ and $\theta_{d}$ respectively: $G$ minimizes objective $(2)$, while $D$ maximizes objective $(3)$.

$$G \rightarrow \min_{G}\mathbb{E}_{\textbf{z}\sim p_{z}(\textbf{z})}[\log{(1-D(G(\textbf{z})))}] \cdots (2)$$

$$D \rightarrow \max_{D}\mathbb{E}_{\textbf{x}\sim p_{data}(\textbf{x})}[\log{D(\textbf{x})}]+\mathbb{E}_{\textbf{z}\sim p_{z}(\textbf{z})}[\log{(1-D(G(\textbf{z})))}] \cdots (3)$$
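Objective $(3)$ maps directly onto binary cross-entropy (BCE): maximizing it is the same as minimizing the BCE of $D$ on real samples with target 1 plus the BCE on generated samples with target 0. The following is a minimal pure-Python sketch of that equivalence. The helper names `bce` and `value_estimate` and the numbers in `d_real` and `d_fake` are made up for illustration and do not come from a trained model.

```python
import math

def bce(preds, targets):
    """Mean binary cross-entropy: -mean(t*log(p) + (1-t)*log(1-p))."""
    total = 0.0
    for p, t in zip(preds, targets):
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(preds)

def value_estimate(d_real, d_fake):
    """Sample-average estimate of V(D, G) from discriminator outputs."""
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

# Hypothetical discriminator outputs D(x) on real data and D(G(z)) on fakes.
d_real = [0.9, 0.8, 0.95, 0.7]
d_fake = [0.1, 0.3, 0.2, 0.05]

# Discriminator loss as a sum of two BCE terms (real -> 1, fake -> 0).
d_loss = bce(d_real, [1.0] * len(d_real)) + bce(d_fake, [0.0] * len(d_fake))
v_hat = value_estimate(d_real, d_fake)

# The two printed numbers agree: minimizing d_loss maximizes the estimate of V.
print(d_loss, -v_hat)
```

Under these assumptions, a gradient step that lowers `d_loss` is a step that raises the sample estimate of $V(D,G)$.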

The function $G$ in fact induces a distribution over its outputs $G(z)$ when $z \sim p_{z}$. Call this distribution $p_{g}$.
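To make the idea of an induced distribution concrete, here is a small sketch that pushes Gaussian noise through a toy generator. The function `toy_generator` is a hypothetical stand-in (an affine map), not the network used in this notebook. For $z \sim \mathcal{N}(0,1)$ the map $2z+1$ yields $p_{g} = \mathcal{N}(1, 2^2)$, which the sample statistics should approximate.

```python
import random
import statistics

def toy_generator(z):
    """Hypothetical generator: an affine map of the noise variable."""
    return 2.0 * z + 1.0

rng = random.Random(0)  # fixed seed so the run is repeatable

# Draw z ~ p_z = N(0, 1) and push each sample through the generator.
samples = [toy_generator(rng.gauss(0.0, 1.0)) for _ in range(20000)]

# Empirical mean and standard deviation of the induced distribution p_g.
sample_mean = statistics.fmean(samples)
sample_std = statistics.pstdev(samples)
print(sample_mean, sample_std)
```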

![GAN_DIAGRAM](gan_image.png)

Intuitively, $G$ should be a good generator when the distribution $p_{g}$ it produces matches $p_{data}$.

Let us prove this.

### Proof that $p_{g} = p_{data}$ is the global optimum

#### Proposition 1. For a fixed $G$, the optimal $D$ is
#### $$D_{G}^{*}(x) = \frac{p_{data}(x)}{p_{data}(x) + p_{g}(x)}$$

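In equation $(2)$, sampling $z\sim p_{z}(\textbf{z})$ and computing $G(z)$ is the same as sampling $x\sim p_{g}(\textbf{x})$, so $G(z)$ can be replaced by $x$.

Rewriting the expectation accordingly gives

$$\mathbb{E}_{\textbf{x}\sim p_{g}(\textbf{x})}[\log{(1-D(\textbf{x}))}]$$

Therefore,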

$$V(D,G)=\int_{x} \!\! p_{data}(x)\log{(D(x))}+p_{g}(x)\log{(1-D(x))}\, dx$$

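The maximizer is easy to find. For any $a, b > 0$, the function $y \mapsto a\log{y} + b\log{(1-y)}$ on $(0,1)$ attains its maximum at $y = \frac{a}{a+b}$, so the integrand can be maximized pointwise in $x$ with $a = p_{data}(x)$ and $b = p_{g}(x)$.

That is, for any fixed Generator, the Discriminator

$$D_{G}^{*}(x) = \frac{p_{data}(x)}{p_{data}(x) + p_{g}(x)}$$

is the optimal solution.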
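The pointwise maximization behind Proposition 1 can be checked numerically. The sketch below searches a grid of candidate discriminator outputs $y$ for the one that maximizes $a\log y + b\log(1-y)$ and compares it with $a/(a+b)$. The values of `a` and `b` are hypothetical density values at a single point $x$, chosen only for illustration.

```python
import math

def pointwise_objective(y, a, b):
    """Integrand of V(D, G) at one x, with a = p_data(x), b = p_g(x), y = D(x)."""
    return a * math.log(y) + b * math.log(1.0 - y)

def argmax_on_grid(a, b, steps=9999):
    """Return the grid point in (0, 1) that maximizes the pointwise objective."""
    grid = [(i + 1) / (steps + 1) for i in range(steps)]
    return max(grid, key=lambda y: pointwise_objective(y, a, b))

# Hypothetical density values p_data(x) and p_g(x) at a single point x.
a, b = 0.3, 0.1

y_best = argmax_on_grid(a, b)
y_closed_form = a / (a + b)

# The grid maximizer and the closed form should agree up to grid resolution.
print(y_best, y_closed_form)
```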

#### Theorem 1. $V(D_{G}^{*},G)$ attains its global minimum if and only if $p_{g}=p_{data}$. At that point the global minimum is $-\log4$.

In other words, the global minimum of $V(D_{G}^{*}, G)$ is $-\log4$, and it is attained only at $p_{g}=p_{data}$.

First, when $p_{g}=p_{data}$ we have $D_{G}^{*}=\frac{1}{2}$, so

$$V(D_{G}^{*},G)|_{p_{g}=p_{data}}=\int_{x} \!\! p_{data}(x)\log\frac{1}{2}+p_{g}(x)\log\frac{1}{2}\, dx=-\log 4$$

Now let us prove that this value is the unique global minimum.

$$V(D_{G}^{*},G)=\int_{x} \!\! p_{data}(x)\log(D_{G}^{*}(x))+p_{g}(x)\log (1-D_{G}^{*}(x))\, dx$$

$$=-\log 4 + \log 4 + \int_{x} \!\! p_{data}(x)\log(D_{G}^{*}(x))+p_{g}(x)\log (1-D_{G}^{*}(x))\, dx$$

$$=-\log 4 + \int_{x} \!\! p_{data}(x)\log{\frac{2p_{data}(x)}{p_{data}(x) + p_{g}(x)}}+p_{g}(x)\log{\frac{2p_{g}(x)}{p_{data}(x) + p_{g}(x)}}\, dx$$

$$=-\log{4} + KL(p_{data}\|\frac{p_{data}+p_{g}}{2}) + KL(p_{g}\|\frac{p_{data}+p_{g}}{2})$$

$$=-\log{4} + 2\cdot JSD(p_{data}\|p_{g})$$

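Here $KL$ and $JSD$ denote the Kullback-Leibler and Jensen-Shannon divergences, written $D_{KL}$ and $D_{JS}$ below:

$$D_{KL}(P\|Q)=\int_{-\infty}^{\infty} \!\! p(x)\log{\frac{p(x)}{q(x)}}\, dx$$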

$$D_{JS}(P\|Q)=\frac{1}{2}D_{KL}(P\|M)+\frac{1}{2}D_{KL}(Q\|M),\quad M=\frac{P+Q}{2}$$

$D_{JS}$ is known to be nonnegative, with $D_{JS}=0$ if and only if $P=Q$. Hence $V(D_{G}^{*},G) \geq -\log 4$, with equality exactly when $p_{g}=p_{data}$.
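The identity $V(D_{G}^{*},G) = -\log 4 + 2\cdot D_{JS}(p_{data}\|p_{g})$ can be sanity-checked on small discrete distributions, where the integrals become sums. This is a minimal sketch: the two three-point distributions are hypothetical and are assumed to be strictly positive so that every logarithm is defined.

```python
import math

def kl(p, q):
    """Discrete KL divergence sum_i p_i * log(p_i / q_i)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jsd(p, q):
    """Discrete Jensen-Shannon divergence with mixture M = (P + Q) / 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def value_at_optimal_d(p_data, p_g):
    """V(D*, G) with D*(x) = p_data(x) / (p_data(x) + p_g(x))."""
    total = 0.0
    for pd, pg in zip(p_data, p_g):
        d_star = pd / (pd + pg)
        total += pd * math.log(d_star) + pg * math.log(1.0 - d_star)
    return total

# Hypothetical strictly positive distributions over three outcomes.
p_data = [0.5, 0.3, 0.2]
p_g = [0.2, 0.3, 0.5]

lhs = value_at_optimal_d(p_data, p_g)
rhs = -math.log(4) + 2 * jsd(p_data, p_g)
at_optimum = value_at_optimal_d(p_data, p_data)

# lhs and rhs agree; at_optimum equals -log 4, and lhs lies above it.
print(lhs, rhs, at_optimum, -math.log(4))
```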

![PROBABILITY](probability_image.png)

import os
import matplotlib.pyplot as plt
import itertools
import pickle
import imageio
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from torch.autograd import Variable

# G(z)
class generator(nn.Module):
    # initializers
    def __init__(self, input_size=32, n_class = 10):
        super(generator, self).__init__()
        self.fc1 = nn.Linear(input_size, 256)
        self.fc2 = nn.Linear(self.fc1.out_features, 512)
        self.fc3 = nn.Linear(self.fc2.out_features, 1024)
        self.fc4 = nn.Linear(self.fc3.out_features, n_class)

    # forward method
    def forward(self, input):
        x = F.leaky_relu(self.fc1(input), 0.2)
        x = F.leaky_relu(self.fc2(x), 0.2)
        x = F.leaky_relu(self.fc3(x), 0.2)
        # tanh keeps outputs in [-1, 1], matching the normalized MNIST pixels
        x = torch.tanh(self.fc4(x))

        return x

class discriminator(nn.Module):
    # initializers
    def __init__(self, input_size=32, n_class=10):
        super(discriminator, self).__init__()
        self.fc1 = nn.Linear(input_size, 1024)
        self.fc2 = nn.Linear(self.fc1.out_features, 512)
        self.fc3 = nn.Linear(self.fc2.out_features, 256)
        self.fc4 = nn.Linear(self.fc3.out_features, n_class)

    # forward method
    def forward(self, input):
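        x = F.leaky_relu(self.fc1(input), 0.2)
        # pass training=self.training so dropout is disabled under D.eval()
        x = F.dropout(x, 0.3, training=self.training)
        x = F.leaky_relu(self.fc2(x), 0.2)
        x = F.dropout(x, 0.3, training=self.training)
        x = F.leaky_relu(self.fc3(x), 0.2)
        x = F.dropout(x, 0.3, training=self.training)
        # sigmoid maps the final logit to a probability in [0, 1]
        x = torch.sigmoid(self.fc4(x))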

        return x

fixed_z_ = torch.randn((5 * 5, 100))    # fixed noise, reused every epoch
# fixed_z_ = fixed_z_.cuda()
def show_result(num_epoch, show = False, save = False, path = 'result.png', isFix=False):
    z_ = torch.randn((5*5, 100))
    # z_ = z_.cuda()

    G.eval()
    # torch.no_grad() replaces the removed `volatile=True` flag
    with torch.no_grad():
        if isFix:
            test_images = G(fixed_z_)
        else:
            test_images = G(z_)
    G.train()

    size_figure_grid = 5
    fig, ax = plt.subplots(size_figure_grid, size_figure_grid, figsize=(5, 5))
    for i, j in itertools.product(range(size_figure_grid), range(size_figure_grid)):
        ax[i, j].get_xaxis().set_visible(False)
        ax[i, j].get_yaxis().set_visible(False)

    for k in range(5*5):
        i = k // 5
        j = k % 5
        ax[i, j].cla()
        ax[i, j].imshow(test_images[k, :].cpu().data.view(28, 28).numpy(), cmap='gray')

    label = 'Epoch {0}'.format(num_epoch)
    fig.text(0.5, 0.04, label, ha='center')
    # honor the `save` flag instead of always writing the figure
    if save:
        plt.savefig(path)

    if show:
        plt.show()
    else:
        plt.close()

def show_train_hist(hist, show = False, save = False, path = 'Train_hist.png'):
    x = range(len(hist['D_losses']))

    y1 = hist['D_losses']
    y2 = hist['G_losses']

    plt.plot(x, y1, label='D_loss')
    plt.plot(x, y2, label='G_loss')

    plt.xlabel('Epoch')
    plt.ylabel('Loss')

    plt.legend(loc=4)
    plt.grid(True)
    plt.tight_layout()

    if save:
        plt.savefig(path)

    if show:
        plt.show()
    else:
        plt.close()

# training parameters
batch_size = 128
lr = 0.0002
train_epoch = 100

# data_loader
transform = transforms.Compose([
        transforms.ToTensor(),
        # MNIST is single-channel, so mean and std take one value each
        transforms.Normalize(mean=(0.5,), std=(0.5,))
])
train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('data', train=True, download=True, transform=transform),
    batch_size=batch_size, shuffle=True)

# network
G = generator(input_size=100, n_class=28*28)
D = discriminator(input_size=28*28, n_class=1)
# G.cuda()
# D.cuda()

# Binary Cross Entropy loss
BCE_loss = nn.BCELoss()

# Adam optimizer
G_optimizer = optim.Adam(G.parameters(), lr=lr)
D_optimizer = optim.Adam(D.parameters(), lr=lr)

# results save folder
if not os.path.isdir('MNIST_GAN_results'):
    os.mkdir('MNIST_GAN_results')
if not os.path.isdir('MNIST_GAN_results/Random_results'):
    os.mkdir('MNIST_GAN_results/Random_results')
if not os.path.isdir('MNIST_GAN_results/Fixed_results'):
    os.mkdir('MNIST_GAN_results/Fixed_results')

train_hist = {}
train_hist['D_losses'] = []
train_hist['G_losses'] = []
for epoch in range(train_epoch):
    D_losses = []
    G_losses = []
    for x_, _ in train_loader:
        # train discriminator D
        D.zero_grad()

        # flatten each 28x28 image into a 784-dimensional vector
        x_ = x_.view(-1, 28 * 28)

        mini_batch = x_.size()[0]

        # targets are shaped (N, 1) to match the discriminator output
        y_real_ = torch.ones(mini_batch, 1)
        y_fake_ = torch.zeros(mini_batch, 1)

        # x_, y_real_, y_fake_ = x_.cuda(), y_real_.cuda(), y_fake_.cuda()
        D_result = D(x_)
        D_real_loss = BCE_loss(D_result, y_real_)
        D_real_score = D_result

        z_ = torch.randn((mini_batch, 100))
        # z_ = z_.cuda()
        # detach so the discriminator step does not backpropagate into G
        G_result = G(z_).detach()

        D_result = D(G_result)
        D_fake_loss = BCE_loss(D_result, y_fake_)
        D_fake_score = D_result

        D_train_loss = D_real_loss + D_fake_loss

        D_train_loss.backward()
        D_optimizer.step()

        # .item() extracts the Python float from a 0-dim tensor
        D_losses.append(D_train_loss.item())

        # train generator G
        G.zero_grad()

        z_ = torch.randn((mini_batch, 100))
        y_ = torch.ones(mini_batch, 1)

        # z_, y_ = z_.cuda(), y_.cuda()
        G_result = G(z_)
        D_result = D(G_result)
        # targets of 1: G is rewarded when D labels its samples as real
        G_train_loss = BCE_loss(D_result, y_)
        G_train_loss.backward()
        G_optimizer.step()

        G_losses.append(G_train_loss.item())

    print('[%d/%d]: loss_d: %.3f, loss_g: %.3f' % (
        (epoch + 1), train_epoch, torch.mean(torch.FloatTensor(D_losses)), torch.mean(torch.FloatTensor(G_losses))))
    p = 'MNIST_GAN_results/Random_results/MNIST_GAN_' + str(epoch + 1) + '.png'
    fixed_p = 'MNIST_GAN_results/Fixed_results/MNIST_GAN_' + str(epoch + 1) + '.png'
    show_result((epoch+1), save=True, path=p, isFix=False)
    show_result((epoch+1), save=True, path=fixed_p, isFix=True)
    # store plain floats so the history pickles and plots cleanly
    train_hist['D_losses'].append(torch.mean(torch.FloatTensor(D_losses)).item())
    train_hist['G_losses'].append(torch.mean(torch.FloatTensor(G_losses)).item())


print("Training finish!... save training results")
torch.save(G.state_dict(), "MNIST_GAN_results/generator_param.pkl")
torch.save(D.state_dict(), "MNIST_GAN_results/discriminator_param.pkl")
with open('MNIST_GAN_results/train_hist.pkl', 'wb') as f:
    pickle.dump(train_hist, f)

show_train_hist(train_hist, save=True, path='MNIST_GAN_results/MNIST_GAN_train_hist.png')

images = []
for e in range(train_epoch):
    img_name = 'MNIST_GAN_results/Fixed_results/MNIST_GAN_' + str(e + 1) + '.png'
    images.append(imageio.imread(img_name))
imageio.mimsave('MNIST_GAN_results/generation_animation.gif', images, fps=5)

![GAN_histogram](MNIST_GAN_results/MNIST_GAN_train_hist.png)

![GAN_animation](MNIST_GAN_results/generation_animation.gif)
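Note that the training loop above computes the generator loss as BCE against all-ones targets, which minimizes $-\log D(G(z))$ rather than objective $(2)$. The GAN paper suggests this alternative because $\log(1-D(G(z)))$ saturates early in training, when $D$ rejects generated samples with high confidence. The sketch below compares the gradients of both losses with respect to the discriminator's pre-sigmoid logit. The value of `logit` is a hypothetical example and the helper names are made up for illustration.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def grad_saturating(a):
    """d/da of log(1 - sigmoid(a)), the generator term in objective (2)."""
    return -sigmoid(a)

def grad_non_saturating(a):
    """d/da of -log(sigmoid(a)), i.e. BCE against a target of 1."""
    return sigmoid(a) - 1.0

# Hypothetical logit: D is confident the generated sample is fake.
logit = -6.0

g_sat = grad_saturating(logit)
g_ns = grad_non_saturating(logit)

# The saturating gradient is close to zero here, while the
# non-saturating one stays close to -1.
print(g_sat, g_ns)
```

Both losses push the logit in the same direction (toward "real"), but only the second gives the generator a usable signal when $D(G(z))$ is near zero.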

### Problems with GAN

### Improved GAN variants

cGAN

renderGAN

SGAN

...

What is an adversarial process?
No Markov chains or unrolled approximate inference networks are needed. → What are these two?