In this tutorial, we generate images with generative adversarial networks (GANs). GANs are a kind of deep neural network used for generative modeling and are often applied to image generation. GAN-based models are also used in PaintsChainer, an automatic colorization service.
In this tutorial, you will learn the following things:
- Generative Adversarial Networks (GAN)
- Implementation of DCGAN in Chainer
As explained in the GAN tutorial at NIPS 2016 [1], generative models can be classified into the categories shown in the following figure:

(Figure: taxonomy of generative models, cited from [1])

Besides GANs, other well-known generative models include fully visible belief networks (FVBNs) and variational autoencoders (VAEs). Unlike FVBNs and VAEs, GANs do not explicitly model the probability distribution $p(\mathbf{x})$ that describes the training data; instead, they learn to draw samples from it.

The advantages of GANs are their low sampling cost and their state-of-the-art performance in image generation. The disadvantage is that we cannot evaluate the likelihood $p_{\mathrm{model}}(\mathbf{x})$, precisely because the distribution is not modeled explicitly.
As explained above, GANs use two models: the generator and the discriminator. When training the networks, we should match the data distribution $p(\mathbf{x})$ with the distribution $p_{\mathrm{model}}(\mathbf{x})$ of the samples produced by the generator.

The generator $G$ learns the target distribution and, ideally, eventually reaches a Nash equilibrium [2] in the game-theoretic sense. Concretely, the generator $G$ is trained so that the discriminator $D$ makes mistakes, while $D$ is trained at the same time to avoid them.
As an intuitive example, the relationship between counterfeiters of banknotes and the police is frequently used. The counterfeiters try to make counterfeit notes that look like real banknotes. The police try to distinguish real banknotes from counterfeit ones. As the ability of the police gradually improves, real and counterfeit banknotes can be told apart well. Then the counterfeiters can no longer pass their counterfeit banknotes, so they create counterfeits that appear more realistic. As the police improve their skill further, they can again distinguish real and counterfeit notes, and so on. Eventually, the counterfeiters will be able to produce counterfeit banknotes that look as real as genuine ones.
The training process can be described by the following mathematical expressions. First, since the discriminator $D(\mathbf{x})$ is the probability that a sample $\mathbf{x}$ comes from the data distribution rather than from the model, the optimal discriminator can be written as

$$D(\mathbf{x}) = \frac{p(\mathbf{x})}{p(\mathbf{x}) + p_{\mathrm{model}}(\mathbf{x})}$$

Then, when we match the data distribution $p(\mathbf{x})$ and the model distribution $p_{\mathrm{model}}(\mathbf{x})$, we should minimize the dissimilarity between the two distributions, which is commonly measured by the Jensen-Shannon divergence $D_{\mathrm{JS}}$ [3].

The $D_{\mathrm{JS}}$ of $p_{\mathrm{model}}$ and $p$ can be written as follows using $D(\mathbf{x})$:

$$
\begin{aligned}
D_{\mathrm{JS}}(p_{\mathrm{model}} \| p)
&= \frac{1}{2}\left( D_{\mathrm{KL}}(p_{\mathrm{model}} \| \bar{p}) + D_{\mathrm{KL}}(p \| \bar{p}) \right) \\
&= \frac{1}{2}\left( \mathbb{E}_{\mathbf{x} \sim p_{\mathrm{model}}} \log\bigl(1 - D(\mathbf{x})\bigr) + \mathbb{E}_{\mathbf{x} \sim p} \log D(\mathbf{x}) \right) + \log 2
\end{aligned}
$$

where $\bar{p} = (p + p_{\mathrm{model}})/2$. The discriminator $D$ maximizes $D_{\mathrm{JS}}$, while the generator $G$ (that is, $p_{\mathrm{model}}$) minimizes it, which yields the min-max problem

$$G^{*} = \mathop{\mathrm{argmin}}_{G} \max_{D} V(G, D), \qquad
V(G, D) = \mathbb{E}_{\mathbf{x} \sim p} \log D(\mathbf{x}) + \mathbb{E}_{\mathbf{x} \sim p_{\mathrm{model}}} \log\bigl(1 - D(\mathbf{x})\bigr)$$

where $V(G, D)$ is called the value function. When we actually train the model, the above min-max problem is solved by alternately updating the discriminator $D(\mathbf{x})$ and the generator $G(\mathbf{z})$ [4].
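Since the dissimilarity being minimized here is the Jensen-Shannon divergence, it can help to compute it numerically. The following is a minimal numpy sketch for two discrete distributions; the distributions and variable names are illustrative, not taken from the example code:

```python
import numpy as np

def kl(p, q):
    """Kullback-Leibler divergence between discrete distributions p and q."""
    return float(np.sum(p * np.log(p / q)))

def js(p, q):
    """Jensen-Shannon divergence: symmetrized KL against the mixture m."""
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p = np.array([0.5, 0.5])   # "data" distribution
q = np.array([0.9, 0.1])   # "model" distribution
print(js(p, p))            # 0.0 -- identical distributions
print(js(p, q))            # positive, bounded above by log 2
```

Unlike the KL divergence, the JS divergence is symmetric and bounded, which is one reason it appears in the GAN analysis.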
In this section, we will introduce the model called DCGAN (Deep Convolutional GAN) proposed by Radford et al. [5]. As its name suggests, it is a model that uses CNNs (convolutional neural networks).

(Figure: DCGAN generator architecture, cited from [5])

In addition, although GANs are known for being difficult to train, the paper introduces various techniques for successful training:
- Convert max-pooling layers to convolution layers with larger or fractional strides
- Convert fully connected layers to global average pooling layers in the discriminator
- Use batch normalization layers in the generator and the discriminator
- Use leaky ReLU activation functions in the discriminator
There is an example of DCGAN in the official repository of Chainer, so we will explain how to implement DCGAN based on this: chainer/examples/dcgan
First, let's define a network for the generator.
../../../examples/dcgan/net.py
When we make a network in Chainer, there are some conventions:

- Define a network class which inherits `~chainer.Chain`.
- Make `chainer.links` instances in the `init_scope():` of the initializer `__init__`.
- Define network connections in the `__call__` operator by using the `chainer.links` instances and `chainer.functions`.

If you are not familiar with constructing a new network, please refer to the tutorial <creating_models>.
As we can see from the initializer `__init__`, the `Generator` uses deconvolution layers `~chainer.links.Deconvolution2D` and batch normalization layers `~chainer.links.BatchNormalization`. In `__call__`, each layer is called and followed by `~chainer.functions.relu`, except the last layer.

Because the first argument of `L.Deconvolution2D` is the input channel size and the second is the output channel size, we can see that each layer halves the channel size. When we construct a `Generator` with `ch=1024`, the network is the same as the one in the above image.
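While the channel size shrinks, the spatial size of the feature map grows. The output side of a deconvolution layer follows the standard formula `stride * (in - 1) + ksize - 2 * pad`; the kernel, stride, and padding values below are the ones typically used in DCGAN-style generators and are assumptions, not values read from the example file:

```python
def deconv_out_size(in_size, ksize=4, stride=2, pad=1):
    # Output spatial size of a 2-D deconvolution (transposed convolution).
    return stride * (in_size - 1) + ksize - 2 * pad

# Starting from a 4x4 feature map, three stride-2 deconvolutions
# double the side each time: 4 -> 8 -> 16 -> 32.
sizes = [4]
for _ in range(3):
    sizes.append(deconv_out_size(sizes[-1]))
print(sizes)  # [4, 8, 16, 32]
```

With these settings, a 4x4 bottom feature map reaches the 32x32 resolution of the target images after three layers.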
Note
Be careful when passing the output of a fully connected layer to a convolution layer: the convolution layer needs additional dimensions for its input. As we can see in the first line of `__call__`, the output of the fully connected layer is reshaped by `~chainer.functions.reshape` to add the channel, width, and height dimensions of the images.
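That reshape turns the flat output of the fully connected layer into a 4-D `(batch, channel, height, width)` array. A numpy equivalent of the step, with illustrative sizes:

```python
import numpy as np

batch, ch, bottom_width = 2, 512, 4
# Output of a fully connected layer: one flat vector per example.
h = np.zeros((batch, ch * bottom_width * bottom_width), dtype=np.float32)
# Add channel, height, and width dimensions before the first deconvolution.
h = h.reshape(batch, ch, bottom_width, bottom_width)
print(h.shape)  # (2, 512, 4, 4)
```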
In addition, let's define the network for the discriminator.
../../../examples/dcgan/net.py
The `Discriminator` network is almost a mirror of the `Generator` network. However, there are minor differences:

- It uses `~chainer.functions.leaky_relu` as its activation functions.
- It is deeper than the `Generator`.
- It adds some noise to every intermediate output before passing it to the next layer.
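Both tweaks are easy to express outside the framework. Below is a numpy sketch of a leaky ReLU (the 0.2 slope matches Chainer's default for `leaky_relu`) and of adding Gaussian noise to an intermediate output; the noise scale is an illustrative assumption:

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    # Pass positives through; scale negatives by `slope` instead of zeroing them.
    return np.where(x >= 0, x, slope * x)

def add_noise(h, sigma=0.2):
    # Perturb an intermediate activation before feeding it to the next layer.
    return h + sigma * np.random.randn(*h.shape)

x = np.array([-1.0, 0.0, 2.0])
print(leaky_relu(x))  # [-0.2  0.   2. ]
```

The small negative slope keeps gradients flowing for negative inputs, which is one of the training stabilizers listed above.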
../../../examples/dcgan/net.py
Let's retrieve the CIFAR-10 dataset with Chainer's dataset utility function `~chainer.datasets.get_cifar10`. CIFAR-10 is a set of small natural images, each an RGB color image of size 32x32. In the original images, each of the R, G, and B channels of a pixel is represented by a one-byte unsigned integer (i.e., from 0 to 255). This function rescales the pixel values to floats in `[0, scale]`.
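The rescaling amounts to dividing the byte values by 255 and multiplying by `scale`. A numpy sketch with `scale=1.0`; the array here is synthetic, not an actual CIFAR-10 image:

```python
import numpy as np

scale = 1.0
# A fake 1x1x3 "image" with one-byte unsigned integer channels.
img_uint8 = np.array([[[0, 128, 255]]], dtype=np.uint8)
# Map {0, ..., 255} into [0, scale] as floats.
img = img_uint8.astype(np.float32) * np.float32(scale) / 255.0
print(img.min(), img.max())  # 0.0 1.0
```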
../../../examples/dcgan/train_dcgan.py
../../../examples/dcgan/train_dcgan.py
Let's make the instances of the generator and the discriminator.
../../../examples/dcgan/train_dcgan.py
Next, let's make optimizers for the models created above.
../../../examples/dcgan/train_dcgan.py
GANs need two models: the generator and the discriminator. The default updaters pre-defined in Chainer usually take only one model, so we need to define a custom updater for GAN training.
The definition of `DCGANUpdater` is a little complicated, but it just minimizes the loss of the discriminator and that of the generator alternately.

As you can see in the class definition, `DCGANUpdater` inherits `~chainer.training.updaters.StandardUpdater`. Since almost all the necessary functionality is already defined in `~chainer.training.updaters.StandardUpdater`, we only override `__init__` and `update_core`.
Note
We do not need to define `loss_dis` and `loss_gen` as methods of `DCGANUpdater`, because these functions are called only from `update_core`. Defining them separately aims at improving readability.
../../../examples/dcgan/updater.py
In the initializer `__init__`, an additional keyword argument `models` is required, as you can see in the code below. We also use the keyword arguments `iterator`, `optimizer`, and `device`. Note that the `optimizer` argument takes a dictionary: the two different models require two different optimizers, so we specify them by passing the dictionary `{'gen': opt_gen, 'dis': opt_dis}` as the `optimizer` argument. In `DCGANUpdater`, you can access the iterator with `self.get_iterator('main')`, and the optimizers with `self.get_optimizer('gen')` and `self.get_optimizer('dis')`.
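That bookkeeping can be sketched without Chainer. Below is a minimal stand-in class that accepts the `models` keyword and an `optimizer` dictionary and exposes the same accessors; it is a schematic of the calling convention only, not the real `StandardUpdater`:

```python
class MiniUpdater:
    """Schematic stand-in for DCGANUpdater's constructor bookkeeping."""

    def __init__(self, *, models, iterator, optimizer, device=-1):
        self.gen, self.dis = models          # the generator and discriminator
        self._iterators = {'main': iterator}
        self._optimizers = optimizer         # {'gen': ..., 'dis': ...}
        self.device = device

    def get_iterator(self, name):
        return self._iterators[name]

    def get_optimizer(self, name):
        return self._optimizers[name]

# Usage with placeholder objects standing in for real models and optimizers.
updater = MiniUpdater(models=('G', 'D'), iterator=iter([1, 2]),
                      optimizer={'gen': 'opt_gen', 'dis': 'opt_dis'})
print(updater.get_optimizer('gen'))  # opt_gen
```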
In `update_core`, the two loss functions `loss_dis` and `loss_gen` are minimized by the optimizers. In the first two lines, we access the optimizers. Then we create the next minibatch of training data with `self.get_iterator('main').next()`, copy `batch` to the device with `self.converter`, and turn it into a `Variable` object. After that, we minimize the loss functions with the optimizers.
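The Chainer example builds both losses from the softplus of the discriminator outputs, a numerically convenient form of the sigmoid cross-entropy in the minimax objective. A numpy sketch of those formulas; `y_real` and `y_fake` below are made-up discriminator logits, not values from a trained model:

```python
import numpy as np

def softplus(x):
    # softplus(x) = log(1 + exp(x)), a smooth approximation of max(0, x).
    return np.log1p(np.exp(x))

y_real = np.array([2.0, 1.5])    # discriminator logits on real images
y_fake = np.array([-1.0, 0.5])   # discriminator logits on generated images

# Discriminator loss: push y_real up and y_fake down.
loss_dis = np.mean(softplus(-y_real)) + np.mean(softplus(y_fake))
# Generator loss: push y_fake up, i.e. fool the discriminator.
loss_gen = np.mean(softplus(-y_fake))
print(loss_dis > 0.0, loss_gen > 0.0)  # True True
```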
Note
When defining `update_core`, we may want to manipulate the underlying `array` of a `Variable` with the `numpy` or `cupy` library. Note that the type of arrays on the CPU is `numpy.ndarray`, while the type on the GPU is `cupy.ndarray`. However, we do not need to write an explicit `if` condition, because the appropriate array module can be obtained with `xp = chainer.backends.cuda.get_array_module(variable.array)`: if `variable` is on the GPU, `cupy` is assigned to `xp`; otherwise, `numpy` is assigned to `xp`.
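The same dispatch pattern can be reproduced with a small helper that falls back to `numpy` when `cupy` is not installed. This mirrors the behavior of `chainer.backends.cuda.get_array_module` but is a hand-rolled sketch, not the Chainer function itself:

```python
import numpy as np

try:
    import cupy  # present only in GPU-enabled environments
    def get_array_module(x):
        # cupy can classify both numpy and cupy arrays.
        return cupy.get_array_module(x)
except ImportError:
    def get_array_module(x):
        # Without cupy, every array is a numpy array.
        return np

xp = get_array_module(np.zeros(3))
print(xp.__name__)  # numpy
```

Code written against `xp` then runs unchanged on CPU and GPU arrays.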
../../../examples/dcgan/train_dcgan.py
../../../examples/dcgan/train_dcgan.py
../../../examples/dcgan/train_dcgan.py
We can run the example as follows.
$ pwd
/root2chainer/chainer/examples/dcgan
$ python train_dcgan.py --gpu 0
GPU: 0
# Minibatch-size: 50
# n_hidden: 100
# epoch: 1000
epoch       iteration   gen/loss    dis/loss
0 100 1.2292 1.76914
total [..................................................] 0.02%
this epoch [#########.........................................] 19.00%
190 iter, 0 epoch / 1000 epochs
10.121 iters/sec. Estimated time to finish: 1 day, 3:26:26.372445.
The results will be saved in the directory `/root2chainer/chainer/examples/dcgan/result/`. The image is generated by the generator trained for 1000 epochs, and the GIF image at the top of this page shows images generated after every 10 epochs.