# 8.5 Introduction to generative adversarial networks

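A GAN consists of two networks:

* Generator network -- takes a random vector (a point in the latent space of images) and generates an image
* Discriminator network (adversary) -- takes an image and decides whether it is real or was produced by the generator

Characteristics

* The loss landscape of a GAN changes at every step, because training seeks an equilibrium between two opposing forces rather than a fixed minimum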
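
One way to see why the loss keeps shifting is the standard minimax objective for GANs. It is not quoted in these notes, and the symbols $G$ (generator), $D$ (discriminator, read here as the probability that an input is real), $p_{\text{data}}$ and $p_z$ are notation introduced only for this sketch:

$$
\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
$$

Every update to $D$ changes the function that $G$ is minimizing, and every update to $G$ changes what $D$ is maximizing, so neither network ever descends a fixed loss surface. The code in this section uses the opposite label convention (1 for generated images, 0 for real ones), which does not change the idea.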

## A bag of tricks

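GANs are hard to train, so a number of training tricks (heuristics) are in common use.

* Use `tanh` rather than `sigmoid` as the last activation of the generator
* Sample points from the latent space using a Gaussian distribution rather than a uniform distribution
* Use dropout in the discriminator, and add random noise to the labels it is trained on -- the added randomness makes training more robust
* Sparse gradients (gradient vectors whose components are mostly 0) hinder GAN training, even though sparsity is usually desirable
    * Use strided convolutions instead of max pooling
    * Use LeakyReLU (an activation that still passes a small gradient for negative inputs) instead of ReLU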
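* With strided `Conv2D` or `Conv2DTranspose` layers, choose a kernel size that is divisible by the stride; otherwise pixels are covered unequally, which tends to produce mottled patterns called *checkerboard artifacts*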
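
The checkerboard trick can be illustrated without any deep learning library. The sketch below is a simplified 1-D model, not the notebook's code: it counts how many kernel taps of a transposed convolution land on each output position, and the helper name `overlap_counts` is made up for this illustration. When the kernel size is not divisible by the stride, the counts alternate, which is the uneven coverage behind the artifacts.

```python
import numpy as np

def overlap_counts(n_inputs, kernel_size, stride):
    """Count how many kernel taps land on each output position of a
    1-D transposed convolution (no padding, no cropping)."""
    out_len = (n_inputs - 1) * stride + kernel_size
    counts = np.zeros(out_len, dtype=int)
    for i in range(n_inputs):
        # Input position i writes to a window of kernel_size outputs.
        counts[i * stride: i * stride + kernel_size] += 1
    return counts

# Kernel size 3 is not divisible by stride 2: coverage alternates.
uneven = overlap_counts(n_inputs=6, kernel_size=3, stride=2)
# Kernel size 4 is divisible by stride 2: interior coverage is uniform.
even = overlap_counts(n_inputs=6, kernel_size=4, stride=2)

print(uneven)  # [1 1 2 1 2 1 2 1 2 1 2 1 1]
print(even)    # [1 1 2 2 2 2 2 2 2 2 2 2 1 1]
```

The generator in this section uses a kernel size of 4 with a stride of 2 in its `Conv2DTranspose` layer, which matches the uniform case.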

In [1]:
import keras
import numpy as np
from keras.layers import Dense, LeakyReLU, Reshape, Conv2D, Conv2DTranspose

latent_dim = 32
height = 32
width = 32
channels = 3

generator_input = keras.Input(shape=(latent_dim, ))

# Transform the input into a 16x16 feature map with 128 channels
x = Dense(128 * 16 * 16)(generator_input)
x = LeakyReLU()(x)
x = Reshape((16, 16, 128))(x)

# Add a convolution layer
x = Conv2D(256, 5, padding='same')(x)
x = LeakyReLU()(x)

# Upsample to 32x32
x = Conv2DTranspose(256, 4, strides=2, padding='same')(x)
x = LeakyReLU()(x)

x = Conv2D(256, 5, padding='same')(x)
x = LeakyReLU()(x)
x = Conv2D(256, 5, padding='same')(x)
x = LeakyReLU()(x)

x = Conv2D(channels, 7, activation='tanh', padding='same')(x)
generator = keras.models.Model(generator_input, x)

generator.summary()


_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         (None, 32)                0         
_________________________________________________________________
dense_1 (Dense)              (None, 32768)             1081344   
_________________________________________________________________
leaky_re_lu_1 (LeakyReLU)    (None, 32768)             0         
_________________________________________________________________
reshape_1 (Reshape)          (None, 16, 16, 128)       0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 16, 16, 256)       819456    
_________________________________________________________________
leaky_re_lu_2 (LeakyReLU)    (None, 16, 16, 256)       0         
_________________________________________________________________
conv2d_transpose_1 (Conv2DTr (None, 32, 32, 256)       1048832   
__________
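
The `Param #` column of the generator summary can be checked by hand. This is a minimal sketch, assuming the usual weights-plus-biases count for dense and convolutional layers; the helper names `dense_params` and `conv_params` are hypothetical and are not Keras functions.

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out

def conv_params(kernel, c_in, c_out):
    """Square kernel weights (kernel * kernel * c_in * c_out) plus one bias
    per output channel. The same count applies to Conv2DTranspose."""
    return kernel * kernel * c_in * c_out + c_out

dense_1 = dense_params(32, 128 * 16 * 16)    # latent_dim -> 32768 units
conv2d_1 = conv_params(5, 128, 256)          # Conv2D(256, 5) on 128 channels
conv2d_transpose_1 = conv_params(4, 256, 256)  # Conv2DTranspose(256, 4)

print(dense_1)             # 1081344
print(conv2d_1)            # 819456
print(conv2d_transpose_1)  # 1048832
```

The three values agree with the rows for `dense_1`, `conv2d_1` and `conv2d_transpose_1` in the summary above.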

In [2]:
from keras.layers import Input, Flatten, Dropout, Dense

discriminator_input = Input(shape=(height, width, channels))

x = Conv2D(128, 3)(discriminator_input)
x = LeakyReLU()(x)
x = Conv2D(128, 4, strides=2)(x)
x = LeakyReLU()(x)
x = Conv2D(128, 4, strides=2)(x)
x = LeakyReLU()(x)
x = Conv2D(128, 4, strides=2)(x)
x = LeakyReLU()(x)
x = Flatten()(x)

x = Dropout(0.4)(x)
x = Dense(1, activation='sigmoid')(x)

discriminator = keras.models.Model(discriminator_input, x)
discriminator.summary()

discriminator_optimizer = keras.optimizers.RMSprop(lr=0.0008, clipvalue=1.0, decay=1e-8)
discriminator.compile(optimizer=discriminator_optimizer, loss='binary_crossentropy')

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_2 (InputLayer)         (None, 32, 32, 3)         0         
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 30, 30, 128)       3584      
_________________________________________________________________
leaky_re_lu_6 (LeakyReLU)    (None, 30, 30, 128)       0         
_________________________________________________________________
conv2d_6 (Conv2D)            (None, 14, 14, 128)       262272    
_________________________________________________________________
leaky_re_lu_7 (LeakyReLU)    (None, 14, 14, 128)       0         
_________________________________________________________________
conv2d_7 (Conv2D)            (None, 6, 6, 128)         262272    
_________________________________________________________________
leaky_re_lu_8 (LeakyReLU)    (None, 6, 6, 128)         0         
__________
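
The shrinking feature maps in the discriminator summary follow from Keras's default `'valid'` padding. The sketch below recomputes the spatial sizes; the helper name `conv_output_size` is hypothetical, and the final size of 2 is derived from the layer definitions rather than read from the (truncated) summary.

```python
def conv_output_size(size, kernel, stride=1):
    """Spatial output size of a convolution with 'valid' padding."""
    return (size - kernel) // stride + 1

# (kernel, stride) for the four Conv2D layers of the discriminator
layers = [(3, 1), (4, 2), (4, 2), (4, 2)]

size = 32  # input height and width
sizes = []
for kernel, stride in layers:
    size = conv_output_size(size, kernel, stride)
    sizes.append(size)

print(sizes)  # [30, 14, 6, 2]

# Number of features fed to Dropout/Dense after Flatten (128 channels)
flattened = sizes[-1] * sizes[-1] * 128
print(flattened)  # 512
```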

In [3]:
# While the gan model is being trained, the discriminator is frozen
# (it is still updated when it is trained on its own)
discriminator.trainable = False

gan_input = Input(shape=(latent_dim, ))
gan_output = discriminator(generator(gan_input))
gan = keras.models.Model(gan_input, gan_output)

gan_optimizer = keras.optimizers.RMSprop(lr=0.0004, clipvalue=1.0, decay=1e-8)
gan.compile(optimizer=gan_optimizer, loss='binary_crossentropy')

In [None]:
import os
from keras.preprocessing import image

# Load CIFAR10 data
(x_train, y_train), (_, _) = keras.datasets.cifar10.load_data()

# Select frog images (class 6)
x_train = x_train[y_train.flatten() == 6]

# Normalize data
x_train = x_train.reshape(
    (x_train.shape[0],) + (height, width, channels)).astype('float32') / 255.

iterations = 10000
batch_size = 20
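save_dir = "./gan"
# Create the output directory if needed so that img.save() does not fail
os.makedirs(save_dir, exist_ok=True)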

# Start training loop
start = 0
for step in range(iterations):
    # Sample random points in the latent space
    random_latent_vectors = np.random.normal(size=(batch_size, latent_dim))

    # Decode them to fake images
    generated_images = generator.predict(random_latent_vectors)

    # Combine them with real images
    stop = start + batch_size
    real_images = x_train[start: stop]
    combined_images = np.concatenate([generated_images, real_images])

    # Assemble labels discriminating real from fake images
    labels = np.concatenate([np.ones((batch_size, 1)),
                             np.zeros((batch_size, 1))])
    # Add random noise to the labels - important trick!
    labels += 0.05 * np.random.random(labels.shape)

    # Train the discriminator
    d_loss = discriminator.train_on_batch(combined_images, labels)

    # sample random points in the latent space
    random_latent_vectors = np.random.normal(size=(batch_size, latent_dim))

    # Assemble labels that say "all real images"
    misleading_targets = np.zeros((batch_size, 1))

    # Train the generator (via the gan model,
    # where the discriminator weights are frozen)
    a_loss = gan.train_on_batch(random_latent_vectors, misleading_targets)
    
    start += batch_size
    if start > len(x_train) - batch_size:
        start = 0

    # Occasionally save / plot
    if step % 100 == 0:
        # Save model weights
        gan.save_weights('gan.h5')

        # Print metrics
        print('discriminator loss at step %s: %s' % (step, d_loss))
        print('adversarial loss at step %s: %s' % (step, a_loss))

        # Save one generated image
        img = image.array_to_img(generated_images[0] * 255., scale=False)
        img.save(os.path.join(save_dir, 'generated_frog' + str(step) + '.png'))

        # Save one real image, for comparison
        img = image.array_to_img(real_images[0] * 255., scale=False)
        img.save(os.path.join(save_dir, 'real_frog' + str(step) + '.png'))


discriminator loss at step 0: 0.678688
adversarial loss at step 0: 0.6700016
discriminator loss at step 100: 0.7150573
adversarial loss at step 100: 0.6359584
discriminator loss at step 200: 0.67022604
adversarial loss at step 200: 0.96320057
discriminator loss at step 300: 0.69792265
adversarial loss at step 300: 0.75299823
discriminator loss at step 400: 0.6507484
adversarial loss at step 400: 0.78786325
discriminator loss at step 500: 0.67616665
adversarial loss at step 500: 1.0092065
discriminator loss at step 600: 0.71469915
adversarial loss at step 600: 0.73844594
discriminator loss at step 700: 0.7051999
adversarial loss at step 700: 1.8121433
discriminator loss at step 800: 0.681502
adversarial loss at step 800: 1.2028092
discriminator loss at step 900: 0.69740593
adversarial loss at step 900: 0.74776447
discriminator loss at step 1000: 0.6836648
adversarial loss at step 1000: 0.76006424
discriminator loss at step 1100: 0.674837
adversarial loss at step 1100: 0.73602116
discrim
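
The label-noise step in the training loop can be looked at in isolation. This is a small NumPy sketch that mirrors the loop's label construction (1 for generated images, 0 for real ones); the seeded generator `rng` is an addition for reproducibility and is not part of the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only for reproducibility
batch_size = 20

# Generated images come first (label 1), real images second (label 0),
# matching the order used in np.concatenate([generated_images, real_images]).
labels = np.concatenate([np.ones((batch_size, 1)),
                         np.zeros((batch_size, 1))])

# One-sided uniform noise in [0, 0.05), as in the training loop
noisy_labels = labels + 0.05 * rng.random(labels.shape)

fake_part = noisy_labels[:batch_size]
real_part = noisy_labels[batch_size:]
print(fake_part.min(), fake_part.max())
print(real_part.min(), real_part.max())
```

The noise only pushes labels upward, so targets for generated images land slightly above 1 and targets for real images land slightly above 0.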

## Wrapping up

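* A GAN consists of a generator network coupled with a discriminator network. Remarkably, the generator never sees images from the training set directly; it learns how to generate images only from the feedback it receives from the discriminator.
* Because the loss being optimized changes over the course of training, GANs are very hard to train, and many tricks exist to make training work.
* GANs can potentially generate high-quality images, but unlike VAEs, their latent space has no clear structure. For this reason they are considered ill-suited to applications such as editing images by manipulating latent vectors.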