In [None]:
# For tips on running notebooks in Google Colab, see
# https://pytorch.org/tutorials/beginner/colab
%matplotlib inline

### DCGAN Tutorial

**Author**: [Nathan Inkawhich](https://github.com/inkawhich)


### Introduction

This tutorial introduces DCGANs (Deep Convolutional Generative
Adversarial Networks) through an example. We will train a GAN
(generative adversarial network) to generate new celebrities after
showing it pictures of many real celebrities. Most of the code here
comes from the DCGAN implementation in
[pytorch/examples](https://github.com/pytorch/examples). This document
gives a thorough explanation of the implementation and sheds light on
how and why this model works. Don't worry: no prior knowledge of GANs is
required, but a first-timer may need to spend some time reasoning about
what is actually happening under the hood. Also, to save time, it helps
to have a GPU or two. Let's get started.

### Generative Adversarial Networks

#### What is a GAN?

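A GAN is a framework for teaching a deep learning model to capture the
training data distribution so that it can generate new data from that
same distribution. GANs were invented by Ian Goodfellow in 2014 and
first described in the paper [Generative Adversarial Nets](https://papers.nips.cc/paper/5423-generative-adversarial-nets.pdf).
A GAN is made of two distinct models, a *generator* and a
*discriminator*.

The job of the generator is to produce 'fake' images that look like the
training images, and the job of the discriminator is to look at an image
and output whether it is a real training image or a fake image from the
generator.

During training, the generator constantly tries to outsmart the
discriminator by generating better and better fakes, while the
discriminator works to become a better detective and correctly classify
the real and fake images.

The equilibrium of this game is reached when the generator produces
perfect fakes that look as if they came directly from the training data,
and the discriminator is left to always guess at 50% confidence whether
the generator output is real or fake.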

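Now, let's define some notation to be used throughout the tutorial,
starting with the discriminator. Let $x$ be data representing an image.
$D(x)$ is the discriminator network, which outputs the (scalar)
probability that $x$ came from the training data rather than from the
generator. Here, since we are dealing with images, the input to $D(x)$
is an image of CHW size 3x64x64. Intuitively, $D(x)$ should be HIGH when
$x$ comes from the training data and LOW when $x$ comes from the
generator. $D(x)$ can also be thought of as a traditional binary
classifier.

For the generator's notation, let $z$ be a latent space vector sampled
from a standard normal distribution. $G(z)$ is the generator function,
which maps the latent vector $z$ to data space. The goal of $G$ is to
estimate the distribution that the training data comes from
($p_{data}$) so that it can generate fake samples from that estimated
distribution ($p_g$).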

So, $D(G(z))$ is the probability (scalar) that the output of the
generator $G$ is a real image. As described in [Goodfellow's paper](https://papers.nips.cc/paper/5423-generative-adversarial-nets.pdf),
$D$ and $G$ play a minimax game in which $D$ tries to maximize the
probability that it correctly classifies reals and fakes ($\log D(x)$),
and $G$ tries to minimize the probability that $D$ will predict its
outputs are fake ($\log(1 - D(G(z)))$). From the paper, the GAN loss
function is:

$$
\underset{G}{\text{min}} \underset{D}{\text{max}} V(D, G) = \mathbb{E}_{x \sim p_{data}(x)} \big[ \log D(x) \big] + \mathbb{E}_{z \sim p_{z}(z)} \big[ \log(1 - D(G(z))) \big]
$$

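In theory, the solution to this minimax game is where $p_g = p_{data}$,
and the discriminator guesses randomly whether its inputs are real or
fake. However, the convergence theory of GANs is still being actively
researched, and in practice models do not always train to this point.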
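
To make the value function concrete, here is a minimal sketch that estimates $V(D, G)$ from a handful of discriminator outputs. The helper name `value_function` and the probability lists are hypothetical, chosen only for illustration; no network is involved.

```python
import math

def value_function(d_real, d_fake):
    """Empirical estimate of V(D, G) from discriminator outputs.

    d_real: values of D(x) on real samples.
    d_fake: values of D(G(z)) on generated samples.
    Both are made-up lists of probabilities strictly between 0 and 1.
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

# A confident, accurate discriminator pushes V towards its maximum of 0.
v_confident = value_function([0.99, 0.98], [0.01, 0.02])

# At the theoretical equilibrium D outputs 0.5 everywhere, so V = -log(4).
v_equilibrium = value_function([0.5, 0.5], [0.5, 0.5])

print(v_confident, v_equilibrium)
```

The discriminator wants the first kind of value (close to 0), while the generator tries to drag $V$ down by making $D(G(z))$ large.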

#### What is a DCGAN?

A DCGAN is a direct extension of the GAN described above, except that it
explicitly uses convolutional and convolutional-transpose layers in the
discriminator and generator, respectively. It was first described by
Radford et al. in the paper [Unsupervised Representation Learning With Deep Convolutional Generative Adversarial Networks](https://arxiv.org/pdf/1511.06434.pdf).

The discriminator is made up of strided
[convolution](https://pytorch.org/docs/stable/nn.html#torch.nn.Conv2d)
layers, [batch norm](https://pytorch.org/docs/stable/nn.html#torch.nn.BatchNorm2d)
layers, and
[LeakyReLU](https://pytorch.org/docs/stable/nn.html#torch.nn.LeakyReLU)
activations. The input is a 3x64x64 image and the output is a scalar
probability that the input came from the real data distribution.

The generator is composed of
[convolutional-transpose](https://pytorch.org/docs/stable/nn.html#torch.nn.ConvTranspose2d)
layers, batch norm layers, and
[ReLU](https://pytorch.org/docs/stable/nn.html#relu) activations. The
input is a latent vector, $z$, drawn from a standard normal
distribution, and the output is a 3x64x64 RGB image. The strided
conv-transpose layers allow the latent vector to be transformed into a
volume with the same shape as an image.

In the paper, the authors also give tips on how to set up the
optimizers, how to calculate the loss functions, and how to initialize
the model weights, all of which are explained in the coming sections.


In [2]:
#%matplotlib inline
import argparse
import os
import random
import torch
import torch.nn as nn
import torch.nn.parallel
import torch.optim as optim
import torch.utils.data
import torchvision.datasets as dset
import torchvision.transforms as transforms
import torchvision.utils as vutils
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
from IPython.display import HTML

# Set random seed for reproducibility
manualSeed = 999  # fix the seed at 999
# manualSeed = random.randint(1, 10000)  # use this instead if you want new results
print("Random Seed: ", manualSeed)
random.seed(manualSeed)  # seed Python's random module
torch.manual_seed(manualSeed)  # seed PyTorch
torch.use_deterministic_algorithms(True)  # request deterministic algorithms for reproducible results

Random Seed:  999


Inputs
======

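We define the following inputs for the run:

- `dataroot`: the path to the root of the dataset folder. We will talk more about the dataset in the next section.
- `workers`: the number of worker processes used to load the data with the `DataLoader`.
- `batch_size`: the batch size used in training. The DCGAN paper uses a batch size of 128.
- `image_size`: the spatial size of the images used for training. This implementation defaults to 64x64. If another size is desired, the structures of D and G must be changed. See [here](https://github.com/pytorch/examples/issues/70) for more details.
- `nc`: number of color channels in the input images. For color images this is 3.
- `nz`: length of the latent vector.
- `ngf`: relates to the depth of the feature maps carried through the generator.
- `ndf`: sets the depth of the feature maps propagated through the discriminator.
- `num_epochs`: number of training epochs to run. Training for longer will probably lead to better results but will also take much longer.
- `lr`: learning rate for training. As described in the DCGAN paper, this number should be 0.0002.
- `beta1`: beta1 hyperparameter for the Adam optimizers. As described in the paper, this number should be 0.5.
- `ngpu`: number of GPUs available. If this is 0, the code runs in CPU mode. If it is greater than 0, the code runs on that number of GPUs.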


In [None]:
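# Root directory for the dataset
dataroot = "data/celeba"

# Number of workers for the dataloader
workers = 2

# Batch size during training
batch_size = 128

# Spatial size of training images. All images will be resized to this
# size using a transformer.
image_size = 64

# Number of channels in the training images. For color images this is 3
nc = 3

# Size of z latent vector (i.e. size of generator input)
nz = 100

# Size of feature maps in generator
ngf = 64

# Size of feature maps in discriminator
ndf = 64

# Number of training epochs
num_epochs = 100

# Learning rate for optimizers
lr = 0.0002

# Beta1 hyperparameter for Adam optimizers
beta1 = 0.5

# Number of GPUs available. Use 0 for CPU mode.
ngpu = 1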

Data
====

In this tutorial we will use the [Celeb-A Faces
dataset](http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html) which can be
downloaded at the linked site, or in [Google
Drive](https://drive.google.com/drive/folders/0B7EVK8r0v71pTUZsaXdaSnZBZzg).
The dataset will download as a file named `img_align_celeba.zip`. Once
downloaded, create a directory named `celeba` and extract the zip file
into that directory. Then, set the `dataroot` input for this notebook to
the `celeba` directory you just created. The resulting directory
structure should be:

``` {.sourceCode .sh}
/path/to/celeba
    -> img_align_celeba  
        -> 188242.jpg
        -> 173822.jpg
        -> 284702.jpg
        -> 537394.jpg
           ...
```

This is an important step because we will be using the `ImageFolder`
dataset class, which requires there to be subdirectories in the dataset
root folder. Now, we can create the dataset, create the dataloader, set
the device to run on, and finally visualize some of the training data.
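
Because a flat folder of images is an easy mistake to make, a quick sanity check of the layout can save a confusing error later. The sketch below uses only the standard library; `has_image_subfolder` is a hypothetical helper (not part of torchvision) and the demonstration runs on a throwaway temporary directory rather than the real dataset.

```python
import os
import tempfile

def has_image_subfolder(root, extensions=(".jpg", ".jpeg", ".png")):
    """Return True if `root` has at least one subdirectory holding an image file."""
    for entry in os.scandir(root):
        if entry.is_dir():
            for name in os.listdir(entry.path):
                if name.lower().endswith(extensions):
                    return True
    return False

# Demonstration on a temporary directory tree (not the real dataset).
with tempfile.TemporaryDirectory() as tmp:
    flat_ok = has_image_subfolder(tmp)          # no subfolder yet
    sub = os.path.join(tmp, "img_align_celeba")
    os.makedirs(sub)
    open(os.path.join(sub, "000001.jpg"), "wb").close()  # empty placeholder file
    nested_ok = has_image_subfolder(tmp)        # subfolder with an image

print(flat_ok, nested_ok)
```

Calling `has_image_subfolder(dataroot)` before building the dataset would be one way to confirm that the extracted folder sits one level below the root.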


In [None]:
# We can use an image folder dataset the way we have it set up.
# Create the dataset
dataset = dset.ImageFolder(root=dataroot,
                           transform=transforms.Compose([
                               transforms.Resize(image_size),
                               transforms.CenterCrop(image_size),
                               transforms.ToTensor(),
                               transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
                           ]))
# Create the dataloader
dataloader = torch.utils.data.DataLoader(dataset, batch_size=batch_size,
                                         shuffle=True, num_workers=workers)

# Decide which device we want to run on
device = torch.device("cuda:0" if (torch.cuda.is_available() and ngpu > 0) else "cpu")

# Plot some training images
real_batch = next(iter(dataloader))
plt.figure(figsize=(8,8))
plt.axis("off")
plt.title("Training Images")
plt.imshow(np.transpose(vutils.make_grid(real_batch[0].to(device)[:64], padding=2, normalize=True).cpu(),(1,2,0)))
plt.show()

Implementation
==============

With our input parameters set and the dataset prepared, we can now get
into the implementation. We will start with the weight initialization
strategy, then talk about the generator, discriminator, loss functions,
and training loop in detail.

Weight Initialization
---------------------

From the DCGAN paper, the authors specify that all model weights shall
be randomly initialized from a Normal distribution with `mean=0`,
`stdev=0.02`. The `weights_init` function takes an initialized model as
input and reinitializes all convolutional, convolutional-transpose, and
batch normalization layers to meet these criteria (batch normalization
scale weights are drawn around `mean=1` with the same `stdev=0.02`, and
their biases are set to 0). This function is applied to the models
immediately after initialization.
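
As a small illustration of what this initialization does, the sketch below applies the same `nn.init` calls to two standalone layers and inspects the resulting statistics. The layer shapes are hypothetical and were picked only to give enough weights for the sample statistics to be meaningful.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical standalone layers, not the tutorial's networks.
conv = nn.Conv2d(3, 64, kernel_size=4, bias=False)
bn = nn.BatchNorm2d(64)

# Convolution weights: Normal(mean=0, std=0.02).
nn.init.normal_(conv.weight, mean=0.0, std=0.02)
# Batch norm scale: Normal(mean=1, std=0.02); batch norm bias: 0.
nn.init.normal_(bn.weight, mean=1.0, std=0.02)
nn.init.constant_(bn.bias, 0.0)

conv_mean = conv.weight.mean().item()
conv_std = conv.weight.std().item()
bn_mean = bn.weight.mean().item()
print(conv_mean, conv_std, bn_mean)
```

The printed values should sit close to 0, 0.02, and 1 respectively, up to sampling noise.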


In [None]:
# custom weights initialization called on ``netG`` and ``netD``
def weights_init(m):
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:
        nn.init.normal_(m.weight.data, 0.0, 0.02)
    elif classname.find('BatchNorm') != -1:
        nn.init.normal_(m.weight.data, 1.0, 0.02)
        nn.init.constant_(m.bias.data, 0)

Generator
=========

The generator, $G$, is designed to map the latent space vector ($z$) to
data-space. Since our data are images, converting $z$ to data-space
means ultimately creating an RGB image with the same size as the
training images (i.e. 3x64x64). In practice, this is accomplished
through a series of strided two-dimensional convolutional-transpose
layers, each paired with a 2D batch norm layer and a ReLU activation.
The output of the generator is fed through a tanh function to return it
to the input data range of $[-1,1]$. It is worth noting the existence of
the batch norm functions after the conv-transpose layers, as this is a
critical contribution of the DCGAN paper. These layers help with the
flow of gradients during training. An image of the generator from the
DCGAN paper is shown below.

![](https://pytorch.org/tutorials/_static/img/dcgan_generator.png)
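
To see how a 1x1 latent "pixel" grows into a 64x64 image, the spatial size after each transposed convolution can be worked out by hand. The sketch below assumes dilation 1 and no output padding, and uses the (kernel, stride, padding) triples from this tutorial's generator: `(4, 1, 0)` for the first layer and `(4, 2, 1)` for the remaining four. The helper name `conv_transpose_out` is made up for illustration.

```python
def conv_transpose_out(size, kernel, stride, padding):
    """Output length of a transposed convolution along one spatial axis
    (assumes dilation=1 and output_padding=0)."""
    return (size - 1) * stride - 2 * padding + kernel

# (kernel, stride, padding) for the five ConvTranspose2d layers.
g_layers = [(4, 1, 0), (4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 2, 1)]

g_sizes = [1]  # the latent vector enters as an nz x 1 x 1 volume
for kernel, stride, padding in g_layers:
    g_sizes.append(conv_transpose_out(g_sizes[-1], kernel, stride, padding))

print(g_sizes)  # [1, 4, 8, 16, 32, 64]
```

Each stride-2 layer doubles the height and width, which is why four of them after the initial 4x4 volume reach 64x64.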

Notice how the inputs we set in the input section (`nz`, `ngf`, and
`nc`) influence the generator architecture in code. `nz` is the length
of the z input vector, `ngf` relates to the size of the feature maps
that are propagated through the generator, and `nc` is the number of
channels in the output image (set to 3 for RGB images). Below is the
code for the generator.


In [None]:
# Generator Code

class Generator(nn.Module):
    def __init__(self, ngpu):
        super(Generator, self).__init__()
        self.ngpu = ngpu
        self.main = nn.Sequential(
            # input is Z, going into a convolution
            nn.ConvTranspose2d( nz, ngf * 8, 4, 1, 0, bias=False),
            nn.BatchNorm2d(ngf * 8),
            nn.ReLU(True),
            # state size. ``(ngf*8) x 4 x 4``
            nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 4),
            nn.ReLU(True),
            # state size. ``(ngf*4) x 8 x 8``
            nn.ConvTranspose2d( ngf * 4, ngf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 2),
            nn.ReLU(True),
            # state size. ``(ngf*2) x 16 x 16``
            nn.ConvTranspose2d( ngf * 2, ngf, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf),
            nn.ReLU(True),
            # state size. ``(ngf) x 32 x 32``
            nn.ConvTranspose2d( ngf, nc, 4, 2, 1, bias=False),
            nn.Tanh()
            # state size. ``(nc) x 64 x 64``
        )

    def forward(self, input):
        return self.main(input)

Now, we can instantiate the generator and apply the `weights_init`
function. Check out the printed model to see how the generator object is
structured.


In [None]:
# Create the generator
netG = Generator(ngpu).to(device)

# Handle multi-GPU if desired
if (device.type == 'cuda') and (ngpu > 1):
    netG = nn.DataParallel(netG, list(range(ngpu)))

# Apply the ``weights_init`` function to randomly initialize all weights
#  to ``mean=0``, ``stdev=0.02``.
netG.apply(weights_init)

# Print the model
print(netG)

Discriminator
=============

As mentioned, the discriminator, $D$, is a binary classification network
that takes an image as input and outputs a scalar probability that the
input image is real (as opposed to fake). Here, $D$ takes a 3x64x64
input image, processes it through a series of Conv2d, BatchNorm2d, and
LeakyReLU layers, and outputs the final probability through a Sigmoid
activation function. This architecture can be extended with more layers
if necessary for the problem, but there is significance to the use of
the strided convolution, BatchNorm, and LeakyReLUs. The DCGAN paper
mentions it is a good practice to use strided convolution rather than
pooling to downsample because it lets the network learn its own pooling
function. Also, batch norm and LeakyReLU functions promote healthy
gradient flow, which is critical for the learning process of both $G$
and $D$.
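
The downsampling performed by the strided convolutions can be checked with the standard convolution output-size formula. This is a sketch assuming dilation 1; the (kernel, stride, padding) triples are the ones used by this tutorial's discriminator, and `conv_out` is a hypothetical helper name.

```python
def conv_out(size, kernel, stride, padding):
    """Output length of a convolution along one spatial axis (dilation=1)."""
    return (size + 2 * padding - kernel) // stride + 1

# Four stride-2 layers followed by a final 4x4 convolution with no padding.
d_layers = [(4, 2, 1)] * 4 + [(4, 1, 0)]

d_sizes = [64]  # input images are 64 x 64
for kernel, stride, padding in d_layers:
    d_sizes.append(conv_out(d_sizes[-1], kernel, stride, padding))

print(d_sizes)  # [64, 32, 16, 8, 4, 1]
```

The final 1x1 output is what the Sigmoid turns into a single probability per image.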


Discriminator Code


In [None]:
class Discriminator(nn.Module):
    def __init__(self, ngpu):
        super(Discriminator, self).__init__()
        self.ngpu = ngpu
        self.main = nn.Sequential(
            # input is ``(nc) x 64 x 64``
            nn.Conv2d(nc, ndf, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. ``(ndf) x 32 x 32``
            nn.Conv2d(ndf, ndf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 2),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. ``(ndf*2) x 16 x 16``
            nn.Conv2d(ndf * 2, ndf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 4),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. ``(ndf*4) x 8 x 8``
            nn.Conv2d(ndf * 4, ndf * 8, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 8),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. ``(ndf*8) x 4 x 4``
            nn.Conv2d(ndf * 8, 1, 4, 1, 0, bias=False),
            nn.Sigmoid()
        )

    def forward(self, input):
        return self.main(input)

Now, as with the generator, we can create the discriminator, apply the
`weights_init` function, and print the model's structure.


In [None]:
# Create the Discriminator
netD = Discriminator(ngpu).to(device)

# Handle multi-GPU if desired
if (device.type == 'cuda') and (ngpu > 1):
    netD = nn.DataParallel(netD, list(range(ngpu)))
    
# Apply the ``weights_init`` function to randomly initialize all weights
#  to ``mean=0``, ``stdev=0.02``.
netD.apply(weights_init)

# Print the model
print(netD)

Loss Functions and Optimizers
=============================

With $D$ and $G$ set up, we can specify how they learn through the loss
functions and optimizers. We will use the Binary Cross Entropy loss
([BCELoss](https://pytorch.org/docs/stable/generated/torch.nn.BCELoss.html#torch.nn.BCELoss))
function which is defined in PyTorch as:

$$\ell(x, y) = L = \{l_1,\dots,l_N\}^\top, \quad l_n = - \left[ y_n \cdot \log x_n + (1 - y_n) \cdot \log (1 - x_n) \right]$$

Notice how this function provides the calculation of both log components
in the objective function (i.e. $\log(D(x))$ and $\log(1-D(G(z)))$). We
can specify what part of the BCE equation to use with the $y$ input.
This is accomplished in the training loop which is coming up soon, but
it is important to understand how we can choose which component we wish
to calculate just by changing $y$ (i.e. the ground-truth, or GT, labels).
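
A tiny worked example of the formula above shows how the label selects the term. It uses plain Python rather than `nn.BCELoss`, and the discriminator output of 0.8 is a made-up value.

```python
import math

def bce(x, y):
    """Per-element binary cross entropy for prediction x and label y."""
    return -(y * math.log(x) + (1 - y) * math.log(1 - x))

d_out = 0.8  # hypothetical discriminator output for one image

loss_if_real = bce(d_out, 1.0)  # only the -log(x) term survives
loss_if_fake = bce(d_out, 0.0)  # only the -log(1 - x) term survives

print(loss_if_real, loss_if_fake)
```

With label 1 the loss is small because 0.8 is close to "real"; with label 0 the same output is penalized much more heavily.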

Next, we define our real label as 1 and the fake label as 0. These
labels will be used when calculating the losses of $D$ and $G$, and this
is also the convention used in the original GAN paper. Finally, we set
up two separate optimizers, one for $D$ and one for $G$. As specified in
the DCGAN paper, both are Adam optimizers with learning rate 0.0002 and
Beta1 = 0.5. To keep track of the generator's learning progression, we
will generate a fixed batch of latent vectors drawn from a Gaussian
distribution (i.e. `fixed_noise`). In the training loop, we will
periodically input this `fixed_noise` into $G$, and over the iterations
we will see images form out of the noise.


In [None]:
# Initialize the ``BCELoss`` function
criterion = nn.BCELoss()

# Create batch of latent vectors that we will use to visualize
#  the progression of the generator
fixed_noise = torch.randn(64, nz, 1, 1, device=device)

# Establish convention for real and fake labels during training
real_label = 1.
fake_label = 0.

# Setup Adam optimizers for both G and D
optimizerD = optim.Adam(netD.parameters(), lr=lr, betas=(beta1, 0.999))
optimizerG = optim.Adam(netG.parameters(), lr=lr, betas=(beta1, 0.999))

Training
========

Finally, now that we have all of the parts of the GAN framework defined,
we can train it. Be mindful that training GANs is somewhat of an art
form, as incorrect hyperparameter settings can lead to mode collapse
with little explanation of what went wrong. Here, we will closely follow
Algorithm 1 from [Goodfellow's
paper](https://papers.nips.cc/paper/5423-generative-adversarial-nets.pdf),
while abiding by some of the best practices shown in
[ganhacks](https://github.com/soumith/ganhacks). Namely, we will
"construct different mini-batches for real and fake" images, and also
adjust G's objective function to maximize $\log(D(G(z)))$. Training is
split up into two main parts. Part 1 updates the Discriminator and Part
2 updates the Generator.

**Part 1 - Train the Discriminator**

Recall, the goal of training the discriminator is to maximize the
probability of correctly classifying a given input as real or fake. In
Goodfellow's terms, we wish to "update the discriminator by ascending
its stochastic gradient". Practically, we want to maximize
$\log(D(x)) + \log(1-D(G(z)))$, which is the same as minimizing the
corresponding BCE losses. Due to the separate mini-batch suggestion
from [ganhacks](https://github.com/soumith/ganhacks), we will calculate
this in two steps. First, we will construct a batch of real samples from
the training set, forward pass through $D$, calculate the loss
($-\log(D(x))$), then calculate the gradients in a backward pass.
Secondly, we will construct a batch of fake samples with the current
generator, forward pass this batch through $D$, calculate the loss
($-\log(1-D(G(z)))$), and *accumulate* the gradients with a backward
pass. Now, with the gradients accumulated from both the all-real and
all-fake batches, we call a step of the Discriminator's optimizer.
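
The accumulation step relies on the fact that PyTorch sums gradients into `.grad` across successive `backward()` calls until they are zeroed. The sketch below checks this on a toy stand-in for $D$ (a single linear layer plus sigmoid, an assumption made only to keep the example small) with random tensors in place of real and fake batches.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for D: one linear layer followed by a sigmoid.
toy_d = nn.Sequential(nn.Linear(8, 1), nn.Sigmoid())
criterion = nn.BCELoss()

real = torch.randn(4, 8)   # stand-in for a batch of real samples
fake = torch.randn(4, 8)   # stand-in for a detached batch of G(z)
ones = torch.ones(4)       # real labels
zeros = torch.zeros(4)     # fake labels

# Two separate backward passes: the second adds to the first in .grad.
toy_d.zero_grad()
criterion(toy_d(real).view(-1), ones).backward()
criterion(toy_d(fake).view(-1), zeros).backward()
accumulated = toy_d[0].weight.grad.clone()

# One backward pass on the summed loss, for comparison.
toy_d.zero_grad()
total = criterion(toy_d(real).view(-1), ones) + criterion(toy_d(fake).view(-1), zeros)
total.backward()
combined = toy_d[0].weight.grad.clone()

print(torch.allclose(accumulated, combined, atol=1e-6))
```

Both routes give the same gradient, so a single optimizer step after the two backward passes updates $D$ on the sum of the real and fake losses.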

**Part 2 - Train the Generator**

As stated in the original paper, we want to train the Generator by
minimizing $\log(1-D(G(z)))$ in an effort to generate better fakes. As
mentioned, this was shown by Goodfellow not to provide sufficient
gradients, especially early in the learning process. As a fix, we
instead wish to maximize $\log(D(G(z)))$. In the code we accomplish this
by: classifying the Generator output from Part 1 with the Discriminator,
computing G's loss *using real labels as GT*, computing G's gradients in
a backward pass, and finally updating G's parameters with an optimizer
step. It may seem counter-intuitive to use the real labels as GT labels
for the loss function, but this allows us to use the $\log(x)$ part of
the `BCELoss` (rather than the $\log(1-x)$ part) which is exactly what we
want.
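
A short calculation illustrates why the original objective gives weak gradients early on. Assume $D(G(z)) = \sigma(a)$, where $a$ is the discriminator's pre-sigmoid score for a fake image; the value $a = -5$ below is a made-up example of a discriminator that confidently rejects the fake. The function names are hypothetical.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def grad_saturating(a):
    # d/da log(1 - sigmoid(a)) = -sigmoid(a)
    return -sigmoid(a)

def grad_non_saturating(a):
    # d/da [-log(sigmoid(a))] = -(1 - sigmoid(a))
    return -(1.0 - sigmoid(a))

a = -5.0  # D is confident the sample is fake: D(G(z)) is about 0.0067

g_sat = grad_saturating(a)
g_non_sat = grad_non_saturating(a)
print(abs(g_sat), abs(g_non_sat))
```

When the discriminator is winning, the gradient of $\log(1-D(G(z)))$ with respect to the score is nearly zero, while the gradient of $-\log(D(G(z)))$ stays close to 1 in magnitude, so the generator keeps receiving a useful learning signal.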

Finally, we will do some statistic reporting, and every 500 iterations
(plus the final iteration of the last epoch) we will push our
`fixed_noise` batch through the generator to visually track the progress
of G's training. The training statistics reported are:

-   **Loss\_D** - discriminator loss calculated as the sum of the BCE
    losses for the all real and all fake batches
    ($-\log(D(x)) - \log(1 - D(G(z)))$, each term averaged over its batch).
-   **Loss\_G** - generator loss calculated as $-\log(D(G(z)))$, averaged
    over the batch.
-   **D(x)** - the average output (across the batch) of the
    discriminator for the all real batch. This should start close to 1
    then theoretically converge to 0.5 when G gets better. Think about
    why this is.
-   **D(G(z))** - average discriminator outputs for the all fake batch.
    The first number is before D is updated and the second number is
    after D is updated. These numbers should start near 0 and converge
    to 0.5 as G gets better. Think about why this is.
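
One way to see why both numbers drift towards 0.5 is the optimal-discriminator result from the original GAN paper. For a fixed generator $G$, the discriminator that maximizes $V(D, G)$ is

$$
D^{*}_{G}(x) = \frac{p_{data}(x)}{p_{data}(x) + p_g(x)}
$$

If the generator matches the data distribution exactly, $p_g = p_{data}$, this reduces to

$$
D^{*}_{G}(x) = \frac{1}{2}, \qquad V(D^{*}_{G}, G) = \log\tfrac{1}{2} + \log\tfrac{1}{2} = -\log 4
$$

This is an idealized statement about the optimum; a trained model may only approach it, so the reported values should be read as a trend rather than a guarantee.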

**Note:** This step might take a while, depending on how many epochs you
run and if you removed some data from the dataset.


In [None]:
# Training Loop

# Lists to keep track of progress
img_list = []
G_losses = []
D_losses = []
iters = 0

print("Starting Training Loop...")
# For each epoch
for epoch in range(num_epochs):
    # For each batch in the dataloader
    for i, data in enumerate(dataloader, 0):
        
        ############################
        # (1) Update D network: maximize log(D(x)) + log(1 - D(G(z)))
        ###########################
        ## Train with all-real batch
        netD.zero_grad()
        # Format batch
        real_cpu = data[0].to(device)
        b_size = real_cpu.size(0)
        label = torch.full((b_size,), real_label, dtype=torch.float, device=device)
        # Forward pass real batch through D
        output = netD(real_cpu).view(-1)
        # Calculate loss on all-real batch
        errD_real = criterion(output, label)
        # Calculate gradients for D in backward pass
        errD_real.backward()
        D_x = output.mean().item()

        ## Train with all-fake batch
        # Generate batch of latent vectors
        noise = torch.randn(b_size, nz, 1, 1, device=device)
        # Generate fake image batch with G
        fake = netG(noise)
        label.fill_(fake_label)
        # Classify all fake batch with D
        output = netD(fake.detach()).view(-1)
        # Calculate D's loss on the all-fake batch
        errD_fake = criterion(output, label)
        # Calculate the gradients for this batch, accumulated (summed) with previous gradients
        errD_fake.backward()
        D_G_z1 = output.mean().item()
        # Compute error of D as sum over the fake and the real batches
        errD = errD_real + errD_fake
        # Update D
        optimizerD.step()

        ############################
        # (2) Update G network: maximize log(D(G(z)))
        ###########################
        netG.zero_grad()
        label.fill_(real_label)  # fake labels are real for generator cost
        # Since we just updated D, perform another forward pass of all-fake batch through D
        output = netD(fake).view(-1)
        # Calculate G's loss based on this output
        errG = criterion(output, label)
        # Calculate gradients for G
        errG.backward()
        D_G_z2 = output.mean().item()
        # Update G
        optimizerG.step()
        
        # Output training stats
        if i % 50 == 0:
            print('[%d/%d][%d/%d]\tLoss_D: %.4f\tLoss_G: %.4f\tD(x): %.4f\tD(G(z)): %.4f / %.4f'
                  % (epoch, num_epochs, i, len(dataloader),
                     errD.item(), errG.item(), D_x, D_G_z1, D_G_z2))
        
        # Save Losses for plotting later
        G_losses.append(errG.item())
        D_losses.append(errD.item())
        
        # Check how the generator is doing by saving G's output on fixed_noise
        if (iters % 500 == 0) or ((epoch == num_epochs-1) and (i == len(dataloader)-1)):
            with torch.no_grad():
                fake = netG(fixed_noise).detach().cpu()
            img_list.append(vutils.make_grid(fake, padding=2, normalize=True))
            
        iters += 1

Results
=======

Finally, let's check out how we did. Here, we will look at three
different results. First, we will see how D and G's losses changed
during training. Second, we will visualize G's output on the
`fixed_noise` batch as training progressed. And third, we will look at a
batch of real data next to a batch of fake data from G.

**Loss versus training iteration**

Below is a plot of D & G's losses versus training iterations.


In [None]:
plt.figure(figsize=(10,5))
plt.title("Generator and Discriminator Loss During Training")
plt.plot(G_losses,label="G")
plt.plot(D_losses,label="D")
plt.xlabel("iterations")
plt.ylabel("Loss")
plt.legend()
plt.show()

**Visualization of G's progression**

Remember how we saved the generator's output on the `fixed_noise` batch
every 500 iterations during training. Now, we can visualize the training
progression of G with an animation. Press the play button to start the
animation.


In [None]:
fig = plt.figure(figsize=(8,8))
plt.axis("off")
ims = [[plt.imshow(np.transpose(i,(1,2,0)), animated=True)] for i in img_list]
ani = animation.ArtistAnimation(fig, ims, interval=1000, repeat_delay=1000, blit=True)

HTML(ani.to_jshtml())

**Real Images vs. Fake Images**

Finally, let's take a look at some real images and fake images side by
side.


In [None]:
# Grab a batch of real images from the dataloader
real_batch = next(iter(dataloader))

# Plot the real images
plt.figure(figsize=(15,15))
plt.subplot(1,2,1)
plt.axis("off")
plt.title("Real Images")
plt.imshow(np.transpose(vutils.make_grid(real_batch[0].to(device)[:64], padding=5, normalize=True).cpu(),(1,2,0)))

# Plot the fake images from the last epoch
plt.subplot(1,2,2)
plt.axis("off")
plt.title("Fake Images")
plt.imshow(np.transpose(img_list[-1],(1,2,0)))
plt.show()

Where to Go Next
================

We have reached the end of our journey, but there are several places you
could go from here. You could:

-   Train for longer to see how good the results get
-   Modify this model to take a different dataset and possibly change
    the size of the images and the model architecture
-   Check out some other cool GAN projects
    [here](https://github.com/nashory/gans-awesome-applications)
-   Create GANs that generate
    [music](https://www.deepmind.com/blog/wavenet-a-generative-model-for-raw-audio/)
