# Introduction
## Overview of the Competition
The "GAN Getting Started" competition on Kaggle is designed to introduce participants to Generative Adversarial Networks (GANs). Participants are tasked with generating new images in the style of a provided dataset, here a collection of Monet paintings. The challenge assesses how well a GAN can produce high-quality images that are difficult to distinguish from the original data. It gives data scientists, machine learning engineers, and enthusiasts a platform to explore generative models, particularly GANs.


## Objective of the Notebook
This Jupyter Notebook aims to guide the reader through the process of developing a GAN model tailored to the competition's requirements. The primary objectives of this notebook are to:


*   Provide a comprehensive understanding of the data and problem statement.
*   Explore and implement data preprocessing techniques suitable for training GANs.
*   Design, build, and train a GAN model from scratch.
*   Evaluate the performance of the generated images against the competition's criteria.
*   Share insights, challenges, and potential improvements for GAN models.

## Brief Introduction to Generative Adversarial Networks (GANs)
Generative Adversarial Networks (GANs) are a class of artificial intelligence algorithms used in unsupervised machine learning, implemented by a system of two neural networks contesting with each other in a zero-sum game framework. This technique was introduced by Ian Goodfellow and his colleagues in 2014 and has since been an area of active research and development.

A GAN consists of two main parts: the generator and the discriminator. The generator's role is to create images (or other types of data) that resemble the real data as closely as possible. The discriminator's role is to distinguish the generator's fake images from real images in the dataset. During training the two networks compete: the generator tries to produce increasingly realistic images, while the discriminator gets better at telling them apart from real ones. The end goal is a generator whose images the discriminator can no longer reliably distinguish from real images.
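
The tug-of-war described above can be illustrated with a small numeric sketch. It assumes the classic single-sample GAN value `log D(x) + log(1 - D(G(z)))`, where `D` outputs a probability; the function name `discriminator_value` is hypothetical and not part of this notebook.

```python
import math

def discriminator_value(d_real, d_fake):
    """Value the discriminator tries to maximise and the generator tries to minimise.

    d_real: probability the discriminator assigns to a real image being real.
    d_fake: probability it assigns to a generated image being real.
    """
    return math.log(d_real) + math.log(1.0 - d_fake)

# A discriminator that is confident and correct scores a high value.
confident = discriminator_value(d_real=0.9, d_fake=0.1)

# When the generator fools it completely, it can only guess (0.5 for both).
fooled = discriminator_value(d_real=0.5, d_fake=0.5)

print(f"confident discriminator: {confident:.4f}")
print(f"fooled discriminator:    {fooled:.4f}")
```

The value drops as the generator improves, which is the sense in which the two networks play a zero-sum game.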

GANs have been applied to image generation, photorealistic image synthesis, style transfer, and image super-resolution, among other tasks. Their ability to generate new samples that resemble existing data makes them a powerful tool in artificial intelligence and data science.

# Preparations & Installations

In [None]:
import tensorflow as tf
import tensorflow_datasets as tfds

import os
import time
import matplotlib.pyplot as plt
from IPython.display import clear_output
from kaggle_datasets import KaggleDatasets

AUTOTUNE = tf.data.AUTOTUNE

# Loading dataset & Preprocessing

In [None]:
GCS_PATH = KaggleDatasets().get_gcs_path()

In [None]:
monet_files= tf.io.gfile.glob(str(GCS_PATH + '/monet_tfrec/*.tfrec'))
photo_files= tf.io.gfile.glob(str(GCS_PATH + '/photo_tfrec/*.tfrec'))

In [None]:
IMAGE_SIZE= [256,256]                                            
def decode_img(image):                                           
    image= tf.image.decode_jpeg(image,channels= 3)
    image= (tf.cast(image, tf.float32)/255)*2 -1
    image= tf.reshape(image, shape= [*IMAGE_SIZE,3])
    return image

def read_tfrec(example):
    tfrec_format= {
        'image_name': tf.io.FixedLenFeature([], tf.string),
        'image': tf.io.FixedLenFeature([], tf.string),
        'target': tf.io.FixedLenFeature([], tf.string)
    }
    example= tf.io.parse_single_example(example, tfrec_format)
    image= decode_img(example['image'])
    return image
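
As a quick check of the scaling used in `decode_img`, the sketch below applies the same arithmetic to single pixel values in plain Python, together with the `* 0.5 + 0.5` mapping that the plotting cells use to bring values back to `[0, 1]`. The helper names `to_model_range` and `to_display_range` are hypothetical.

```python
def to_model_range(pixel):
    """Map an 8-bit pixel value in [0, 255] to [-1, 1], as decode_img does."""
    return (pixel / 255) * 2 - 1

def to_display_range(value):
    """Map a value in [-1, 1] to [0, 1] so that it can be plotted."""
    return value * 0.5 + 0.5

for pixel in (0, 128, 255):
    scaled = to_model_range(pixel)
    print(pixel, round(scaled, 4), round(to_display_range(scaled), 4))
```

The `[-1, 1]` range matches the `tanh` activation on the generator's last layer, so real and generated images share the same scale.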

In [None]:
def load_data(files):
    data= tf.data.TFRecordDataset(files)
    data= data.map(read_tfrec)
    return data

In [None]:
monet_data= load_data(monet_files).batch(1)
photo_data= load_data(photo_files).batch(1)

In [None]:
train_monet = monet_data
train_photo = photo_data.take(300)
test_photo = photo_data.skip(300)

In [None]:
BUFFER_SIZE = 300
BATCH_SIZE = 1
IMG_WIDTH = 256
IMG_HEIGHT = 256

In [None]:
sample_photo = next(iter(train_photo))
sample_monet = next(iter(train_monet))

In [None]:
plt.subplot(121)
plt.title('Photo')
plt.imshow(sample_photo[0] * 0.5 + 0.5)

plt.subplot(122)
plt.title('Monet')
plt.imshow(sample_monet[0] * 0.5 + 0.5)

plt.show()

# Creating models

This code is copied from: https://github.com/tensorflow/examples/blob/master/tensorflow_examples/models/pix2pix/pix2pix.py and modified.
Please refer to the original paper: https://arxiv.org/pdf/1703.10593.pdf

# Cycle-Consistency Loss in CycleGAN

The concept of cycle-consistency loss is a fundamental component of the CycleGAN architecture, which is designed for image-to-image translation tasks where paired examples are not available. This method enables the translation of images from one domain to another (e.g., horses to zebras, summer to winter scenes) without the need for corresponding image pairs in both domains.

## Components of CycleGAN

CycleGAN consists of two main types of networks:

### Generators (G and F)

- **G**: Maps from domain X to domain Y ($G: X \rightarrow Y$).
- **F**: Maps from domain Y to domain X ($F: Y \rightarrow X$).

These generators are responsible for translating images from one domain to the other.

### Discriminators (Dx and Dy)

- **Dx**: Discriminates between images from domain X and translated images $F(Y)$.
- **Dy**: Discriminates between images from domain Y and translated images $G(X)$.

The discriminators aim to distinguish real images from translated ones, helping to refine the generators.

## Cycle-Consistency Loss

The cycle-consistency loss ensures that an image can undergo a round-trip translation (domain X to Y and back to X, or Y to X and back to Y) ending up similar to the original image. This is critical for learning meaningful translations without paired examples.

### Forward Cycle

1. An image from domain X is translated to domain Y using generator G to get $\hat{Y}$.
2. $\hat{Y}$ is then translated back to domain X using generator F to get $\tilde{X}$.
3. The goal is for $\tilde{X}$ to closely resemble the original image X.

### Backward Cycle

1. An image from domain Y is translated to domain X using generator F to get $\hat{X}$.
2. $\hat{X}$ is then translated back to domain Y using generator G to get $\tilde{Y}$.
3. The goal is for $\tilde{Y}$ to closely resemble the original image Y.
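
The two cycles can be sketched with toy stand-ins for the generators. Here `g` and `f` are hypothetical functions on short lists of numbers rather than neural networks; `f` is deliberately an imperfect inverse of `g`, so the round trips leave a measurable error.

```python
def g(x):
    """Toy generator X -> Y: doubles every value."""
    return [2.0 * v for v in x]

def f(y):
    """Toy generator Y -> X: halves every value and adds a small bias."""
    return [0.5 * v + 0.1 for v in y]

def mean_abs_diff(a, b):
    """Mean absolute difference between two equally long lists."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)

x = [0.2, -0.4, 0.6]   # stands in for an image from domain X
y = [1.0, 0.0, -1.0]   # stands in for an image from domain Y

forward_error = mean_abs_diff(x, f(g(x)))    # X -> Y -> X
backward_error = mean_abs_diff(y, g(f(y)))   # Y -> X -> Y

print(forward_error, backward_error)
```

Training pushes both errors towards zero, which forces each generator to keep enough of the input's content for the other one to undo the translation.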

### Loss Calculation

The cycle-consistency loss combines losses from both the forward and backward cycles, aiming to minimize the difference between the original and cycled images. This encourages the network to learn transformations that preserve content while changing domain-specific attributes.
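
Written out in the notation of the CycleGAN paper linked above, with the L1 norm measuring the difference between the original and the cycled image, the loss is:

$$
\mathcal{L}_{cyc}(G, F) = \mathbb{E}_{x \sim X}\left[\lVert F(G(x)) - x \rVert_1\right] + \mathbb{E}_{y \sim Y}\left[\lVert G(F(y)) - y \rVert_1\right]
$$

The first term corresponds to the forward cycle and the second to the backward cycle.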

This approach has been widely used for various image-to-image translation tasks, showcasing its effectiveness and versatility in unsupervised learning scenarios.

![cycle_loss](https://www.tensorflow.org/static/tutorials/generative/images/cycle_loss.png)

In [None]:
class InstanceNormalization(tf.keras.layers.Layer):
  """Instance Normalization Layer (https://arxiv.org/abs/1607.08022)."""

  def __init__(self, epsilon=5e-5):
    super(InstanceNormalization, self).__init__()
    self.epsilon = epsilon

  def build(self, input_shape):
    self.scale = self.add_weight(
        name='scale',
        shape=input_shape[-1:],
        initializer=tf.random_normal_initializer(1., 0.05),
        trainable=True)

    self.offset = self.add_weight(
        name='offset',
        shape=input_shape[-1:],
        initializer='zeros',
        trainable=True)

  def call(self, x):
    mean, variance = tf.nn.moments(x, axes=[1, 2], keepdims=True)
    inv = tf.math.rsqrt(variance + self.epsilon)
    normalized = (x - mean) * inv
    return self.scale * normalized + self.offset
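
To make the arithmetic in `call` concrete, here is a minimal pure-Python sketch of the same normalization applied to one channel of one image, flattened to a list. The function name `instance_norm_channel` is hypothetical; `scale` and `offset` play the role of the layer's learned per-channel weights.

```python
import math

def instance_norm_channel(values, scale=1.0, offset=0.0, epsilon=5e-5):
    """Normalise the H*W values of a single channel of a single image."""
    mean = sum(values) / len(values)
    # Population variance, matching what tf.nn.moments computes.
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    inv = 1.0 / math.sqrt(variance + epsilon)
    return [scale * (v - mean) * inv + offset for v in values]

channel = [1.0, 2.0, 3.0, 4.0]
normalized = instance_norm_channel(channel)
print([round(v, 4) for v in normalized])
```

Unlike batch normalization, the statistics come from one image only, so the result does not depend on the other images in the batch. That matters here because the batch size is 1.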

In [None]:
def downsample(filters, size, norm_type='batchnorm', apply_norm=True):
  """Downsamples an input.

  Conv2D => Batchnorm => LeakyRelu

  Args:
    filters: number of filters
    size: filter size
    norm_type: Normalization type; either 'batchnorm' or 'instancenorm'.
    apply_norm: If True, adds the normalization layer selected by norm_type

  Returns:
    Downsample Sequential Model
  """
  initializer = tf.random_normal_initializer(0., 0.05)

  result = tf.keras.Sequential()
  result.add(
      tf.keras.layers.Conv2D(filters, size, strides=2, padding='same',
                             kernel_initializer=initializer, use_bias=False))

  if apply_norm:
    if norm_type.lower() == 'batchnorm':
      result.add(tf.keras.layers.BatchNormalization())
    elif norm_type.lower() == 'instancenorm':
      result.add(InstanceNormalization())

  result.add(tf.keras.layers.LeakyReLU())

  return result

In [None]:
def upsample(filters, size, norm_type='batchnorm', apply_dropout=False):
  """Upsamples an input.

  Conv2DTranspose => Batchnorm => Dropout => Relu

  Args:
    filters: number of filters
    size: filter size
    norm_type: Normalization type; either 'batchnorm' or 'instancenorm'.
    apply_dropout: If True, adds the dropout layer

  Returns:
    Upsample Sequential Model
  """

  initializer = tf.random_normal_initializer(0., 0.05)

  result = tf.keras.Sequential()
  result.add(
      tf.keras.layers.Conv2DTranspose(filters, size, strides=2,
                                      padding='same',
                                      kernel_initializer=initializer,
                                      use_bias=False))

  if norm_type.lower() == 'batchnorm':
    result.add(tf.keras.layers.BatchNormalization())
  elif norm_type.lower() == 'instancenorm':
    result.add(InstanceNormalization())

  if apply_dropout:
    result.add(tf.keras.layers.Dropout(0.6))

  result.add(tf.keras.layers.ReLU())

  return result

In [None]:
def Discriminator(norm_type='batchnorm', target=True):
  """PatchGan discriminator model (https://arxiv.org/abs/1611.07004).

  Args:
    norm_type: Type of normalization. Either 'batchnorm' or 'instancenorm'.
    target: Bool, indicating whether target image is an input or not.

  Returns:
    Discriminator model
  """

  initializer = tf.random_normal_initializer(0., 0.05)

  inp = tf.keras.layers.Input(shape=[None, None, 3], name='input_image')
  x = inp

  if target:
    tar = tf.keras.layers.Input(shape=[None, None, 3], name='target_image')
    x = tf.keras.layers.concatenate([inp, tar])  # (bs, 256, 256, channels*2)

  down1 = downsample(64, 4, norm_type, False)(x)  # (bs, 128, 128, 64)
  down2 = downsample(128, 4, norm_type)(down1)  # (bs, 64, 64, 128)
  down3 = downsample(256, 4, norm_type)(down2)  # (bs, 32, 32, 256)

  zero_pad1 = tf.keras.layers.ZeroPadding2D()(down3)  # (bs, 34, 34, 256)
  conv = tf.keras.layers.Conv2D(
      512, 4, strides=1, kernel_initializer=initializer,
      use_bias=False)(zero_pad1)  # (bs, 31, 31, 512)

  if norm_type.lower() == 'batchnorm':
    norm1 = tf.keras.layers.BatchNormalization()(conv)
  elif norm_type.lower() == 'instancenorm':
    norm1 = InstanceNormalization()(conv)

  leaky_relu = tf.keras.layers.LeakyReLU()(norm1)

  zero_pad2 = tf.keras.layers.ZeroPadding2D()(leaky_relu)  # (bs, 33, 33, 512)

  last = tf.keras.layers.Conv2D(
      1, 4, strides=1,
      kernel_initializer=initializer)(zero_pad2)  # (bs, 30, 30, 1)

  if target:
    return tf.keras.Model(inputs=[inp, tar], outputs=last)
  else:
    return tf.keras.Model(inputs=inp, outputs=last)
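
The shape comments in the discriminator can be checked with a little arithmetic. The sketch below traces only the spatial size of a 256 x 256 input through the layers, assuming the Keras defaults used above (`ZeroPadding2D()` pads one pixel per side, `Conv2D` without `padding=` uses `'valid'`). The helper names are hypothetical.

```python
import math

def same_stride2(size):
    """Output size of a stride-2 convolution with padding='same'."""
    return math.ceil(size / 2)

def valid_conv(size, kernel=4, stride=1):
    """Output size of a convolution with padding='valid'."""
    return (size - kernel) // stride + 1

def zero_pad(size, pad=1):
    """Output size after padding `pad` pixels on each side."""
    return size + 2 * pad

def patchgan_output_size(size):
    for _ in range(3):                   # down1, down2, down3
        size = same_stride2(size)
    size = valid_conv(zero_pad(size))    # zero_pad1 followed by conv
    size = valid_conv(zero_pad(size))    # zero_pad2 followed by last
    return size

print(patchgan_output_size(256))
```

Each value in the resulting 30 x 30 grid scores one patch of the input, which is why this design is called a PatchGAN.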

In [None]:
def unet_generator(output_channels, norm_type='batchnorm'):
  """Modified u-net generator model (https://arxiv.org/abs/1611.07004).

  Args:
    output_channels: Output channels
    norm_type: Type of normalization. Either 'batchnorm' or 'instancenorm'.

  Returns:
    Generator model
  """

  down_stack = [
      downsample(64, 4, norm_type, apply_norm=False),  # (bs, 128, 128, 64)
      downsample(128, 4, norm_type),  # (bs, 64, 64, 128)
      downsample(256, 4, norm_type),  # (bs, 32, 32, 256)
      downsample(512, 4, norm_type),  # (bs, 16, 16, 512)
      downsample(512, 4, norm_type),  # (bs, 8, 8, 512)
      downsample(512, 4, norm_type),  # (bs, 4, 4, 512)
      downsample(512, 4, norm_type),  # (bs, 2, 2, 512)
      downsample(512, 4, norm_type),  # (bs, 1, 1, 512)
  ]

  up_stack = [
      upsample(512, 4, norm_type, apply_dropout=True),  # (bs, 2, 2, 1024)
      upsample(512, 4, norm_type, apply_dropout=True),  # (bs, 4, 4, 1024)
      upsample(512, 4, norm_type, apply_dropout=True),  # (bs, 8, 8, 1024)
      upsample(512, 4, norm_type),  # (bs, 16, 16, 1024)
      upsample(256, 4, norm_type),  # (bs, 32, 32, 512)
      upsample(128, 4, norm_type),  # (bs, 64, 64, 256)
      upsample(64, 4, norm_type),  # (bs, 128, 128, 128)
  ]

  initializer = tf.random_normal_initializer(0., 0.02)
  last = tf.keras.layers.Conv2DTranspose(
      output_channels, 4, strides=2,
      padding='same', kernel_initializer=initializer,
      activation='tanh')  # (bs, 256, 256, 3)

  concat = tf.keras.layers.Concatenate()

  inputs = tf.keras.layers.Input(shape=[None, None, 3])
  x = inputs

  # Downsampling through the model
  skips = []
  for down in down_stack:
    x = down(x)
    skips.append(x)

  skips = reversed(skips[:-1])

  # Upsampling and establishing the skip connections
  for up, skip in zip(up_stack, skips):
    x = up(x)
    x = concat([x, skip])

  x = last(x)

  return tf.keras.Model(inputs=inputs, outputs=x)
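
The shape comments in `unet_generator` follow from two rules: each stride-2 block halves or doubles the spatial size, and each skip connection concatenates channels. The sketch below replays that bookkeeping for a 256 x 256 input in plain Python; it is a sanity check on the comments rather than part of the model.

```python
down_filters = [64, 128, 256, 512, 512, 512, 512, 512]
up_filters = [512, 512, 512, 512, 256, 128, 64]

# Encoder: every downsample block halves the spatial size.
size = 256
skips = []
for filters in down_filters:
    size //= 2
    skips.append((size, filters))

# Decoder: every upsample block doubles the size, then the matching
# encoder output is concatenated along the channel axis.
trace = []
for filters, (skip_size, skip_channels) in zip(up_filters, reversed(skips[:-1])):
    size *= 2
    assert size == skip_size  # skip connections need equal spatial sizes
    trace.append((size, filters + skip_channels))

final_size = size * 2  # the last Conv2DTranspose doubles the size once more

for step in trace:
    print(step)
print(final_size)
```

The bottleneck output `skips[-1]` is excluded from the skip list because it is the decoder's input rather than something to concatenate.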

In [None]:
OUTPUT_CHANNELS = 3
# generator_g - takes a photo and tries to generate a monet
generator_g = unet_generator(OUTPUT_CHANNELS, norm_type='instancenorm')
# generator_f - takes a monet and tries to generate a photo
generator_f = unet_generator(OUTPUT_CHANNELS, norm_type='instancenorm')

# discriminator_x - judges whether a photo is real or produced by generator_f, giving generator_f feedback to improve
discriminator_x = Discriminator(norm_type='instancenorm', target=False)
# discriminator_y - judges whether a Monet is real or produced by generator_g, giving generator_g feedback to improve
discriminator_y = Discriminator(norm_type='instancenorm', target=False)

In [None]:
# initial generation results (before training)

to_monet = generator_g(sample_photo)
plt.figure(figsize=(8, 8))
contrast = 8

imgs = [sample_photo, to_monet]
title = ['Photo', 'To Monet']

for i in range(len(imgs)):
  plt.subplot(2, 2, i+1)
  plt.title(title[i])
  if i % 2 == 0:
    plt.imshow(imgs[i][0] * 0.5 + 0.5)
  else:
    plt.imshow(imgs[i][0] * 0.5 * contrast + 0.5)
plt.show()

In [None]:
LAMBDA = 10

In [None]:
loss_obj = tf.keras.losses.BinaryCrossentropy(from_logits=True)

In [None]:
def discriminator_loss(real, generated):
  real_loss = loss_obj(tf.ones_like(real), real)

  generated_loss = loss_obj(tf.zeros_like(generated), generated)

  total_disc_loss = real_loss + generated_loss

  return total_disc_loss * 0.5

In [None]:
def generator_loss(generated):
  return loss_obj(tf.ones_like(generated), generated)
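
Because `loss_obj` is created with `from_logits=True`, the discriminators output raw scores rather than probabilities. The sketch below reproduces the two losses for a single scalar logit, using the numerically stable form of sigmoid cross-entropy, `max(x, 0) - x * z + log(1 + exp(-|x|))`. The function names are hypothetical, and the real losses additionally average over every patch in the discriminator output.

```python
import math

def bce_from_logit(label, logit):
    """Numerically stable binary cross-entropy on a raw logit."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))

def discriminator_loss_scalar(real_logit, fake_logit):
    """Real samples should be labelled 1, generated samples 0."""
    return 0.5 * (bce_from_logit(1.0, real_logit) + bce_from_logit(0.0, fake_logit))

def generator_loss_scalar(fake_logit):
    """The generator wants its output to be labelled 1."""
    return bce_from_logit(1.0, fake_logit)

# An undecided discriminator (logit 0) versus a confident, correct one.
print(discriminator_loss_scalar(0.0, 0.0))
print(discriminator_loss_scalar(3.0, -3.0))

# The generator's loss falls as the discriminator is fooled.
print(generator_loss_scalar(-3.0), generator_loss_scalar(3.0))
```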

In [None]:
def calc_cycle_loss(real_image, cycled_image):
  loss1 = tf.reduce_mean(tf.abs(real_image - cycled_image))

  return LAMBDA * loss1

In [None]:
def identity_loss(real_image, same_image):
  loss = tf.reduce_mean(tf.abs(real_image - same_image))
  return LAMBDA * 0.5 * loss
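
The weighting of the loss terms can be seen in a small numeric sketch. It assumes a generator's total loss is the adversarial term plus the weighted cycle and identity terms, as in the training step later on; the function `total_generator_loss` and the input numbers are purely illustrative.

```python
LAMBDA = 10

def total_generator_loss(adversarial, cycle_l1, identity_l1, lam=LAMBDA):
    """Combine the terms with the weights used by calc_cycle_loss and identity_loss.

    cycle_l1 and identity_l1 are unweighted mean absolute errors.
    """
    return adversarial + lam * cycle_l1 + 0.5 * lam * identity_l1

# Illustrative values only: even small reconstruction errors carry real weight.
loss = total_generator_loss(adversarial=0.7, cycle_l1=0.05, identity_l1=0.02)
print(loss)
```

With `LAMBDA = 10`, the cycle term is weighted ten times and the identity term five times relative to the adversarial term, so preserving content counts heavily against merely fooling the discriminator.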

In [None]:
generator_g_optimizer = tf.keras.optimizers.Adam(2e-4, beta_1=0.5)
generator_f_optimizer = tf.keras.optimizers.Adam(2e-4, beta_1=0.5)

discriminator_x_optimizer = tf.keras.optimizers.Adam(2e-4, beta_1=0.5)
discriminator_y_optimizer = tf.keras.optimizers.Adam(2e-4, beta_1=0.5)

# Training

In [None]:
EPOCHS = 100

In [None]:
# for dynamic visualization
def generate_images(model, test_input):
  prediction = model(test_input)

  plt.figure(figsize=(12, 12))

  display_list = [test_input[0], prediction[0]]
  title = ['Input Image', 'Predicted Image']

  for i in range(2):
    plt.subplot(1, 2, i+1)
    plt.title(title[i])
    # getting the pixel values between [0, 1] to plot it.
    plt.imshow(display_list[i] * 0.5 + 0.5)
    plt.axis('off')
  plt.show()

In [None]:
@tf.function
def train_step(real_x, real_y):
  # persistent is set to True because the tape is used more than
  # once to calculate the gradients.
  with tf.GradientTape(persistent=True) as tape:
    # Generator G translates X -> Y
    # Generator F translates Y -> X.

    fake_y = generator_g(real_x, training=True)
    cycled_x = generator_f(fake_y, training=True)

    fake_x = generator_f(real_y, training=True)
    cycled_y = generator_g(fake_x, training=True)

    # same_x and same_y are used for identity loss.
    same_x = generator_f(real_x, training=True)
    same_y = generator_g(real_y, training=True)

    disc_real_x = discriminator_x(real_x, training=True)
    disc_real_y = discriminator_y(real_y, training=True)

    disc_fake_x = discriminator_x(fake_x, training=True)
    disc_fake_y = discriminator_y(fake_y, training=True)

    # calculate the loss
    gen_g_loss = generator_loss(disc_fake_y)
    gen_f_loss = generator_loss(disc_fake_x)

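    # Sum the forward (X -> Y -> X) and backward (Y -> X -> Y) cycle losses
    total_cycle_loss = calc_cycle_loss(real_x, cycled_x) + calc_cycle_loss(real_y, cycled_y)

    # Total generator loss = adversarial loss + cycle loss + identity loss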
    total_gen_g_loss = gen_g_loss + total_cycle_loss + identity_loss(real_y, same_y)
    total_gen_f_loss = gen_f_loss + total_cycle_loss + identity_loss(real_x, same_x)

    disc_x_loss = discriminator_loss(disc_real_x, disc_fake_x)
    disc_y_loss = discriminator_loss(disc_real_y, disc_fake_y)

  # Calculate the gradients for the generators and discriminators.
  # This happens outside the tape context so the tape does not record these ops.
  generator_g_gradients = tape.gradient(total_gen_g_loss,
                                        generator_g.trainable_variables)
  generator_f_gradients = tape.gradient(total_gen_f_loss,
                                        generator_f.trainable_variables)

  discriminator_x_gradients = tape.gradient(disc_x_loss,
                                            discriminator_x.trainable_variables)
  discriminator_y_gradients = tape.gradient(disc_y_loss,
                                            discriminator_y.trainable_variables)

  # Apply the gradients with each model's optimizer
  generator_g_optimizer.apply_gradients(zip(generator_g_gradients,
                                            generator_g.trainable_variables))

  generator_f_optimizer.apply_gradients(zip(generator_f_gradients,
                                            generator_f.trainable_variables))

  discriminator_x_optimizer.apply_gradients(zip(discriminator_x_gradients,
                                                discriminator_x.trainable_variables))

  discriminator_y_optimizer.apply_gradients(zip(discriminator_y_gradients,
                                                discriminator_y.trainable_variables))

Let's train the CycleGAN. To visualize each epoch we use the same sample photo, so that the generator's progress is easy to follow.

In [None]:
for epoch in range(EPOCHS):
  start = time.time()

  n = 0
  for image_x, image_y in tf.data.Dataset.zip((train_photo, train_monet)):
    train_step(image_x, image_y)
    if n % 10 == 0:
      print ('.', end='')
    n += 1

  clear_output(wait=True)
  generate_images(generator_g, sample_photo)

  print (f'Time taken for epoch {(epoch + 1)} is {(time.time()-start)} sec\n')

# Visualizing results

In [None]:
# Run the trained model on the test dataset
for inp in test_photo.skip(5).take(5):
  generate_images(generator_g, inp)

# Saving results

In [None]:
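import numpy as np
import PIL.Image  # import the submodule explicitly; `import PIL` alone does not guarantee PIL.Image is loaded

# -p avoids an error if the directory already exists when the cell is re-run
!mkdir -p /kaggle/working/images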

In [None]:
i = 1
for image in photo_data:
    pred = generator_g(image, training=False)[0].numpy()
    pred = (pred*127.5 + 127.5).astype(np.uint8)
    im = PIL.Image.fromarray(pred)
    im.save("/kaggle/working/images/generated_" + str(i) + ".jpg")
    i += 1

In [None]:
import shutil
shutil.make_archive("/kaggle/working/images", 'zip', "/kaggle/working/images")