# GAN 네트워크란
적대적 생성 신경망이라는 뜻으로 가짜 데이터를 생성하는 모델입니다. 이 이름이 붙여진 이유는 두 모델이 서로 적대적으로 학습되기 때문입니다. 네트워크 중 하나인 판별모델 Discriminator는 표본이 가짜 표본에 속하는지 진짜 표본에 속하는 지를 결정합니다. 반면 생성모델 Generator는 실제 데이터의 분포에 가깝도록 데이터를 만들어내게끔 학습됩니다. 원 논문에서는 이를 경찰과 위조 지폐범에 비유합니다. 생성모델은 위조 지폐를 만들어내어야 하는 위조범이 되고 판별 모델은 위조 지폐범에게 위조지폐를 받아 위조 지폐를 탐지 해내는 역할로 적대적인 관계를 갖게 됩니다. 이 경쟁은 위조지폐인지 아닌지 판단하기 어려울 때까지 지속됩니다.
![](https://www.researchgate.net/publication/351558593/figure/fig1/AS:1023239748845568@1620970775250/A-standard-GAN-architecture-and-loss-function.png)
## Loss 함수
원본 데이터 분포에서 샘플 x를 뽑아 logD(x)의 기댓값 계산합니다.
D(x)의 출력은 0~1사이의 확률로 1에 가까울 수록 진짜에 가깝다고 할 수 있습니다
$$
E_{x\char`\~p_{data(x)}}[logD(x)]
$$

노이즈를 하나 샘플링해서 가짜 이미지를 만든 다음에 $G(z)$에 판별자에 넣으면($D((G(z)))$) 가짜 이미지가 진짜인지
아닌지에 대한 확률이 나오게됩니다. 1.0에서 해당 확률을 빼주게 되면 가짜 이미지가 들어왔을 경우 제대로 판별했다면 식의 값이
0이 됩니다.
$$
E_{z \char`\~ p_{z}(z)}[log(1 - D(G(z)))]
$$

전체 식이 아래와 같을 때 
$$
\underset{G}{Min}\underset{D}{Max} V(D, G) = E_{x\char`\~p_{data(x)}}[logD(x)] + E_{z \char`\~ p_{z}(z)}[log(1 - D(G(z)))]
$$

### 판별자 D의 입장
$x$는 실제 데이터 샘플이라 할 때 $z$는 잠재 공간의 무작위 샘플입니다. 판별자는 $D(x) = 1$, 생성된 데이터에 대해 $D(G(z))$ = 0 이 되록 학습합니다.
$$
\underset{D}{Max} = E_{x\char`\~p_{data}(x)}[logD(x)] + E_{z \char`\~ p_{z}(z)}[log(1 - D(G(z)))]
$$0

### 생성자 G의 입장
반면에 생성자는 판별자와 반대로 아래 수식이 즉 판별자의 손실값이 최소가 되게끔 학습을 하려고 합니다.
$$
\underset{G}{Min} = E_{x\char`\~p_{data}(x)}[logD(x)] + E_{z \char`\~ p_{z}(z)}[log(1 - D(G(z)))]
$$
생성자에서 D(x)는 등장하지 않으니 정확히는 아래 수식을 최소화(0에 가깝게) 되도록 학습을 합니다. 판별자를 속여야하므로 생성된
이미지가 진짜같아야 하니, 판별자의 손실이 최소화 되어야 하는 것 입니다.
$$
\underset{G}{Min} = E_{z \char`\~ p_{z}(z)}[log(1 - D(G(z)))]
$$


# 수식 증명
먼저 Generator G를 고정 시키면 위 수식을 아래와 같이 쓸 수 있게 됩니다.
아래 수식을 기댓값을 적분 형태로 나타낸 뒤, g(z)가 x로 매핑되는 과정이므로 하나의 적분에 합쳐 쓸 수 있게 됩니다.
$$
V(G,D) = \underset{G}{Min}\underset{D}{Max} V(D, G) = E_{x\char`\~p_{data(x)}}[logD(x)] + E_{z \char`\~ p_{z}(z)}[log(1 - D(G(z)))] \\
= \int_{x}p_{data}(x)log(D(x))dx + \int_{z}p_{z}(z)log(1 - D(g(z)))dz \\
= \int_{x}p_{data}(x)log(D(x)) + p_{x}(x)log(1 - D(x))dx
$$

$y = alog(y) + blog(1-y) $ 형태일 때 [0, 1] 사이에서 $\frac{a}{a+b}$값이 최대값을 가지므로 G가 고정되었을 때 위 적분식은
아래 수식에서 최댓값을 가지게 됩니다
$$
D_{G}(x) = \frac{p_{data}(x)}{p_{data}(x) + p_{g}(x)} 
$$

그럼 G가 고정되었을 때 최대 D(x)의 최대값을 알았으니
$$
V(G,D) = \underset{D}{Max} V(D, G) = E_{x\char`\~p_{data(x)}}[logD(x)] + E_{z \char`\~ p_{z}(z)}[log(1 - D(G(z)))] \\
= E_{x\char`\~p_{data(x)}}[log\frac{p_{data}(x)}{p_{data}(x) + p_{g}(x)}] + E_{z \char`\~ p_{z}(z)}[log\frac{p_{g}(x)}{p_{data}(x) + p_{g}(x)})] \\
= E_{x\char`\~p_{data(x)}}[log\frac{2 * p_{data}(x)}{p_{data}(x) + p_{g}(x)}] + E_{z \char`\~ p_{z}(z)}[log\frac{2 * p_{g}(x)}{p_{data}(x) + p_{g}(x)})] - log(4) \\
= KL(p_{data} || \frac{p_{data}(x) + p_{g}(x)}{2}) + KL(p_{g} || \frac{p_{data}(x) + p_{g}(x)}{2}) - log(4) \\
= 2 * JSD(p_{data} || p_{g}) - log(4)
$$

이렇게 나오게 된다. JSD는 data와 g의 분포가 동일할 때 0이 나오므로 학습이 잘 된다는 것은 Generator가 원본데이터와 동일한 데이터를 내뱉을 때 최적값을 갖게 됩니다

# 캐글 데이터 셋 준비
https://www.kaggle.com/datasets/joosthazelzet/lego-brick-images

In [1]:
import keras
from keras import layers, Model, utils
import tensorflow as tf
import numpy as np
from matplotlib.pylab import plt

2024-10-25 17:00:31.053051: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-10-25 17:00:31.177150: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-10-25 17:00:31.234666: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-10-25 17:00:31.249766: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-10-25 17:00:31.360718: I tensorflow/core/platform/cpu_feature_guar

In [2]:
train_data = utils.image_dataset_from_directory(
    "../dataset",
    labels = None,
    color_mode="grayscale",
    image_size=(64, 64),
    batch_size=128,
    shuffle=True,
    interpolation="bilinear"
)

Found 40000 files.


I0000 00:00:1729843240.622966  130640 cuda_executor.cc:1001] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
I0000 00:00:1729843240.845811  130640 cuda_executor.cc:1001] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
I0000 00:00:1729843240.845945  130640 cuda_executor.cc:1001] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
I0000 00:00:1729843240.849345  130640 cuda_executor.cc:1001] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
I0000 00:00:1729843240.849408  130640 cuda_executor.cc:1001] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
I0000 00:0

In [3]:
def preprocess(img):
    """
    tanh를 활성화 함수로 쓰기 위해 -1 ~ 1 사이로 조정해서 tanh 함수의 출력값과 범위를 맞춤
    """
    img = (tf.cast(img, "float32") -127.5) / 127.5
    return img

train = train_data.map(lambda x: preprocess(x))     

In [4]:
discriminator_input = layers.Input(shape=(64, 64, 1))
x = layers.Conv2D(64, 4, strides=2, padding="same", use_bias=False)(discriminator_input)
x = layers.LeakyReLU(0.2)(x)
x = layers.Dropout(0.3)(x)
x = layers.Conv2D(128, kernel_size=4, strides=2, padding="same", use_bias=False)(x)
x = layers.BatchNormalization(momentum=0.9)(x)
x = layers.LeakyReLU(0.2)(x)
x = layers.Dropout(0.3)(x)
x = layers.Conv2D(256, kernel_size=4, strides=2, padding="same", use_bias=False)(x)
x = layers.BatchNormalization(momentum=0.9)(x)
x = layers.LeakyReLU(0.2)(x)
x = layers.Dropout(0.3)(x)
x = layers.Conv2D(512, kernel_size=4, strides=2, padding="same", use_bias=False)(x)
x = layers.BatchNormalization(momentum=0.9)(x)
x = layers.LeakyReLU(0.2)(x)
x = layers.Dropout(0.3)(x)
x = layers.Conv2D(1, kernel_size=4, strides=1, padding="valid", use_bias=False, activation="sigmoid")(x)
discriminator_output = layers.Flatten()(x)
discriminator = Model(discriminator_input, discriminator_output)


In [5]:
generator_input = layers.Input(shape=(100,))
x = layers.Reshape((1, 1, 100))(generator_input)
x = layers.Conv2DTranspose(512, kernel_size=4, strides=1, padding="valid", use_bias = False)(x)
x = layers.BatchNormalization(momentum=0.9)(x)
x = layers.LeakyReLU(0.2)(x)
x = layers.Conv2DTranspose(256, kernel_size=4, strides=2, padding="same", use_bias=False)(x)
x = layers.BatchNormalization(momentum=0.9)(x)
x = layers.LeakyReLU(0.2)(x)
x = layers.Conv2DTranspose(128, kernel_size=4, strides=2, padding="same", use_bias=False)(x)
x = layers.BatchNormalization(momentum=0.9)(x)
x = layers.LeakyReLU(0.2)(x)
x = layers.Conv2DTranspose(64, kernel_size=4, strides=2, padding="same", use_bias=False)(x)
x = layers.BatchNormalization(momentum=0.9)(x)
x = layers.LeakyReLU(0.2)(x)
generator_output = layers.Conv2DTranspose(1, kernel_size=4, strides=2, padding="same", use_bias=False, activation="tanh")(x)
generator = Model(generator_input, generator_output)

In [None]:
class DCGAN(Model):
    def __init__(self, discriminator, generator, latent_dim):
        super(DCGAN, self).__init__()
        self.discriminator = discriminator
        self.generator = generator
        self.latent_dim = latent_dim

    def compile(self, d_optimizer, g_optimizer):
        super(DCGAN, self).compile()
        self.loss_function = keras.losses.BinaryCrossentropy()
        self.d_optimizer = d_optimizer
        self.g_optimizer = g_optimizer
        self.d_loss_metric = keras.metrics.Mean(name="d_loss")
        self.g_loss_metric = keras.metrics.Mean(name="g_loss")

    @property
    def metrics(self):
        return [self.d_loss_metric, self.g_loss_metric]
    
    def train_step(self, real_images):
        batch_size = tf.shape(real_images)[0]
        random_latent_vectors = tf.random.normal(shape=(batch_size, self.latent_dim))

        with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
            generated_images = self.generator(random_latent_vectors, training=True)
            real_predictions = self.discriminator(real_images, training=True)
            fake_predictions = self.discriminator(generated_images, training=True)
            real_labels = tf.ones_like(real_predictions)
            # 생성자가 단순히 특정 패턴만 반복하지 않고 다양한 출력을 만들어내게 함
            real_nosiy_labels = real_labels + 0.1 + tf.random.uniform(tf.shape(real_predictions))
            fake_lables = tf.zeros_like(fake_predictions)
            fake_noisy_labels = fake_lables - 0.1 * tf.random.uniform(tf.shape(fake_predictions))

            d_real_loss = self.loss_function(real_nosiy_labels, real_predictions)
            d_fake_loss = self.loss_function(fake_noisy_labels, fake_predictions)
            d_loss = (d_real_loss + d_fake_loss) / 2.0

            g_loss = self.loss_function(real_labels, fake_predictions)
        
        gradients_discriminator = d_tape.gradient(d_loss, self.discriminator.trainable_variables)
        gradients_generator = g_tape.gradient(g_loss, self.generator.trainable_variables)

        self.d_optimizer.apply_gradients(zip(gradients_discriminator, self.discriminator.trainable_variables))
        self.g_optimizer.apply_gradients(zip(gradients_generator, self.generator.trainable_variables))
        
        self.d_loss_metric.update_state(d_loss)
        self.g_loss_metric.update_state(g_loss)

        return {m.name: m.result() for m in self.metrics}
    
dcgan = DCGAN(discriminator=discriminator, generator=generator, latent_dim=100)
dcgan.compile(d_optimizer=keras.optimizers.Adam(learning_rate=0.0002, beta_1=0.4, beta_2=0.999),
              g_optimizer=keras.optimizers.Adam(learning_rate=0.0002, beta_1=0.4, beta_2=0.999))

dcgan.fit(train, epochs=300)


Epoch 1/300


I0000 00:00:1729843255.100653  131025 service.cc:146] XLA service 0x7f1b3c016690 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
I0000 00:00:1729843255.100800  131025 service.cc:154]   StreamExecutor device (0): NVIDIA GeForce RTX 4050 Laptop GPU, Compute Capability 8.9
2024-10-25 17:00:55.285009: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:268] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
2024-10-25 17:00:58.052753: I external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:531] Loaded cuDNN version 8902
I0000 00:00:1729843271.874257  131025 device_compiler.h:188] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.


[1m313/313[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m49s[0m 91ms/step - d_loss: -2.7561 - g_loss: 5.5216
Epoch 2/300
[1m313/313[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m18s[0m 56ms/step - d_loss: -3.1847 - g_loss: 5.6293
Epoch 3/300
[1m313/313[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m18s[0m 57ms/step - d_loss: -3.2380 - g_loss: 6.7544
Epoch 4/300
[1m313/313[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m18s[0m 57ms/step - d_loss: -3.1343 - g_loss: 6.3329
Epoch 5/300
[1m313/313[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m18s[0m 57ms/step - d_loss: -2.9876 - g_loss: 6.4529
Epoch 6/300
[1m313/313[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m18s[0m 57ms/step - d_loss: -3.3221 - g_loss: 6.6650
Epoch 7/300
[1m313/313[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m18s[0m 57ms/step - d_loss: -2.9999 - g_loss: 6.4531
Epoch 8/300
[1m313/313[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m18s[0m 57ms/step - d_loss: -3.3368 - g_loss: 6.5430
Epoch 9/300
[1m313/

In [None]:
grid_width, grid_height = (10, 3)
z_sample = np.random.normal(size=(grid_width * grid_height, 100))
reconstructions = generator.predict(z_sample)
fig = plt.figure(figsize=(18, 5))
fig.subplots_adjust(hspace=0.4, wspace=0.4)

for i in range(grid_width * grid_height):
    ax = fig.add_subplot(grid_height, grid_width, i + 1)
    ax.axis("off")
    ax.imshow(reconstructions[i, :, :], cmap="Greys")