# Introduction and Setup

This notebook was made for the course Machine Learning in Practice at Radboud University in Nijmegen. It is based on the tutorial notebook by Amy Jang, with modifications. Code from Erik Linder-Norén is used to convert the CycleGAN into a DualGAN.

This notebook uses a DualGAN architecture to apply a Monet style to photos. The data is read from the TFRecord dataset. Import the following packages and change the accelerator to TPU.

The imports below combine those needed by the tutorial notebook by Amy Jang and those needed by the code from Erik Linder-Norén.

Tutorial notebook by Amy Jang: https://www.kaggle.com/amyjang/monet-cyclegan-tutorial

Erik Linder-Norén's GitHub repository: https://github.com/eriklindernoren/Keras-GAN

In [None]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import tensorflow_addons as tfa
import tensorflow_datasets as tfds

from kaggle_datasets import KaggleDatasets
import matplotlib.pyplot as plt
import numpy as np

# Only Input and Model are used by name later on (in the DUALGAN class).
# They are imported from tf.keras so that they are not mixed with the
# standalone keras package.
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model

try:
    tpu = tf.distribute.cluster_resolver.TPUClusterResolver()
    print('Device:', tpu.master())
    tf.config.experimental_connect_to_cluster(tpu)
    tf.tpu.experimental.initialize_tpu_system(tpu)
    strategy = tf.distribute.experimental.TPUStrategy(tpu)
except Exception:
    # No TPU available: fall back to the default (CPU/GPU) strategy.
    strategy = tf.distribute.get_strategy()
print('Number of replicas:', strategy.num_replicas_in_sync)

AUTOTUNE = tf.data.experimental.AUTOTUNE
    
print(tf.__version__)

# Load in the data

The following section contains code that has been taken from the tutorial notebook by Amy Jang. 

We want to keep our photo dataset and our Monet dataset separate. First, load in the filenames of the TFRecords.

In [None]:
GCS_PATH = KaggleDatasets().get_gcs_path()

In [None]:
MONET_FILENAMES = tf.io.gfile.glob(str(GCS_PATH + '/monet_tfrec/*.tfrec'))
print('Monet TFRecord Files:', len(MONET_FILENAMES))

PHOTO_FILENAMES = tf.io.gfile.glob(str(GCS_PATH + '/photo_tfrec/*.tfrec'))
print('Photo TFRecord Files:', len(PHOTO_FILENAMES))

All the images for the competition are already sized to 256x256. As these are RGB images, set the number of channels to 3. Additionally, we need to scale the pixel values to the range [-1, 1]. Because we are building a generative model, we don't need the labels or the image id, so we only return the image from the TFRecord.
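
The scaling to [-1, 1] can be checked in isolation with NumPy. This is a minimal sketch, not part of the original notebook: the helper names `to_model_range` and `to_uint8` are hypothetical, and the inverse rounds to the nearest integer, whereas the visualization cells in this notebook simply cast to `uint8`.

```python
import numpy as np

def to_model_range(pixels_uint8):
    """Map uint8 pixel values in [0, 255] to floats in [-1, 1]."""
    return pixels_uint8.astype(np.float32) / 127.5 - 1.0

def to_uint8(scaled):
    """Map floats in [-1, 1] back to uint8 in [0, 255], rounding to nearest."""
    restored = np.rint(scaled * 127.5 + 127.5)
    return np.clip(restored, 0, 255).astype(np.uint8)

pixels = np.arange(256, dtype=np.uint8)   # every possible pixel value
scaled = to_model_range(pixels)           # 0 -> -1.0, 255 -> 1.0
restored = to_uint8(scaled)               # round trip back to 0..255
```

The value 127.5 is half of 255, so dividing by it stretches the range to [0, 2] and subtracting 1 centers it around zero, which matches the `tanh` output of the generator.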

In [None]:
IMAGE_SIZE = [256, 256]

def decode_image(image):
    image = tf.image.decode_jpeg(image, channels=3)
    image = (tf.cast(image, tf.float32) / 127.5) - 1
    image = tf.reshape(image, [*IMAGE_SIZE, 3])
    return image

def read_tfrecord(example):
    tfrecord_format = {
        "image_name": tf.io.FixedLenFeature([], tf.string),
        "image": tf.io.FixedLenFeature([], tf.string),
        "target": tf.io.FixedLenFeature([], tf.string)
    }
    example = tf.io.parse_single_example(example, tfrecord_format)
    image = decode_image(example['image'])
    return image

Define the function to extract the image from the files.

In [None]:
def load_dataset(filenames, labeled=True, ordered=False):
    dataset = tf.data.TFRecordDataset(filenames)
    dataset = dataset.map(read_tfrecord, num_parallel_calls=AUTOTUNE)
    return dataset

Let's load in our datasets.

In [None]:
monet_ds = load_dataset(MONET_FILENAMES, labeled=True).batch(1)
photo_ds = load_dataset(PHOTO_FILENAMES, labeled=True).batch(1)

In [None]:
monet_iter = iter(monet_ds)
photo_iter = iter(photo_ds)

for _ in range(4):
    example_monet = next(monet_iter)
    example_photo = next(photo_iter)

Let's visualize a photo example and a Monet example.

In [None]:
plt.subplot(221)
plt.title('Photo')
plt.imshow(example_photo[0] * 0.5 + 0.5) 

plt.subplot(223)
plt.imshow(example_photo[0])

plt.subplot(222)
plt.title('Monet')
plt.imshow(example_monet[0] * 0.5 + 0.5)

plt.subplot(224)
plt.imshow(example_monet[0])
plt.show()


# Build the generator

The code in this section has been taken from the notebook by Amy Jang. 

We'll be using a U-Net architecture for the generators of our DualGAN. To build a generator, let's first define our `downsample` and `upsample` functions.

The `downsample` function, as the name suggests, reduces the 2D dimensions of the image, its width and height, by the stride. The stride is the length of the step the filter takes. Since the stride is 2, the filter is applied to every other pixel, which halves both the width and the height.
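
The effect of the stride on the image size can be traced with plain arithmetic. The sketch below assumes `'same'` padding, as used in the `downsample` function, for which the output size is the input size divided by the stride, rounded up. The helper name `same_padding_output_size` is hypothetical.

```python
import math

def same_padding_output_size(input_size, stride):
    """Spatial output size of a strided convolution with 'same' padding."""
    return math.ceil(input_size / stride)

size = 256
sizes = [size]
for _ in range(8):  # the generator stacks eight downsample blocks
    size = same_padding_output_size(size, stride=2)
    sizes.append(size)
print(sizes)
```

Each block halves the width and height, so eight blocks reduce a 256x256 image to a single spatial position.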

We'll be using instance normalization instead of batch normalization. As instance normalization is not part of the standard TensorFlow API, we'll use the layer from TensorFlow Addons.
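
To make the difference with batch normalization concrete, here is a minimal NumPy sketch of what instance normalization computes. It is an illustration only, not the TensorFlow Addons implementation: the learnable scale and offset are left out, and the function name `instance_norm` and the value of `eps` are assumptions.

```python
import numpy as np

def instance_norm(x, eps=1e-5):
    """Normalize each (sample, channel) feature map over its height and width.

    x has shape (batch, height, width, channels). Batch normalization would
    instead compute the statistics over the batch axis as well.
    """
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(2, 8, 8, 4))
y = instance_norm(x)
```

Because the statistics are computed per image, the result does not depend on the batch size, which is convenient here since the datasets are batched with a batch size of 1.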

In [None]:
def downsample(filters, kernel_size, apply_instancenorm=True):
    initializer = tf.random_normal_initializer(0., 0.02)
    gamma_init = keras.initializers.RandomNormal(mean=0.0, stddev=0.02)

    result = keras.Sequential()
    result.add(layers.Conv2D(filters, kernel_size, strides=2, padding='same',
                             kernel_initializer=initializer, use_bias=False))

    if apply_instancenorm:
        result.add(tfa.layers.InstanceNormalization(gamma_initializer=gamma_init))

    result.add(layers.LeakyReLU())

    return result

`upsample` does the opposite of `downsample` and increases the dimensions of the image. `Conv2DTranspose` does basically the opposite of a `Conv2D` layer.
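
How a transposed convolution enlarges its input is easier to see in one dimension. The following NumPy sketch is an illustration under simplifying assumptions (one channel, no padding handling); the function name `conv_transpose_1d` is hypothetical and this is not how Keras implements the layer.

```python
import numpy as np

def conv_transpose_1d(x, kernel, stride):
    """Full 1-D transposed convolution: each input value stamps a scaled
    copy of the kernel into the output, `stride` positions apart."""
    out = np.zeros((len(x) - 1) * stride + len(kernel))
    for i, value in enumerate(x):
        out[i * stride:i * stride + len(kernel)] += value * kernel
    return out

x = np.array([1.0, 2.0, 3.0])
kernel = np.ones(4)
full = conv_transpose_1d(x, kernel, stride=2)
print(full)
```

The untrimmed output has `(len(x) - 1) * stride + len(kernel)` samples. With `padding='same'`, as in `upsample`, the layer trims the result to `len(x) * stride` samples, which is why each `upsample` block doubles the width and height.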

In [None]:
def upsample(filters, kernel_size, apply_dropout=False):
    initializer = tf.random_normal_initializer(0., 0.02)
    gamma_init = keras.initializers.RandomNormal(mean=0.0, stddev=0.02)

    result = keras.Sequential()
    result.add(layers.Conv2DTranspose(filters, kernel_size, strides=2,
                                      padding='same',
                                      kernel_initializer=initializer,
                                      use_bias=False))

    result.add(tfa.layers.InstanceNormalization(gamma_initializer=gamma_init))

    if apply_dropout:
        result.add(layers.Dropout(0.5))

    result.add(layers.ReLU())

    return result

Let's build our generator!

The generator first downsamples the input image and then upsamples it while establishing long skip connections. Skip connections are a way to help bypass the vanishing gradient problem by concatenating the output of a layer to multiple layers instead of only one. Here we concatenate the output of each downsample layer to the matching upsample layer in a symmetrical fashion.
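
The symmetrical pairing of downsample and upsample layers can be traced without building the model. The sketch below only does shape bookkeeping in plain Python, using the filter counts of the generator in this notebook; the variable names are chosen for illustration.

```python
down_filters = [64, 128, 256, 512, 512, 512, 512, 512]
up_filters = [512, 512, 512, 512, 256, 128, 64]

# Downsampling path: halve the spatial size at every block.
size = 256
skips = []
for filters in down_filters:
    size //= 2
    skips.append((size, filters))

# The bottleneck (last entry) is not reused as a skip connection.
skip_shapes = list(reversed(skips[:-1]))

# Upsampling path: double the size, then concatenate the matching skip.
size, channels = skips[-1]
after_concat = []
for filters, (skip_size, skip_channels) in zip(up_filters, skip_shapes):
    size *= 2
    assert size == skip_size  # concatenation needs matching height and width
    channels = filters + skip_channels
    after_concat.append((size, channels))
print(after_concat)
```

Concatenation adds the channels of the skip connection to those of the upsample output, which is why the channel counts after each upsampling step are twice the number of filters of that step.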

In [None]:
def Generator(output_channels = 3):
    inputs = layers.Input(shape=[256,256,3])

    # bs = batch size
    down_stack = [
        downsample(64, 4, apply_instancenorm=False), # (bs, 128, 128, 64)
        downsample(128, 4), # (bs, 64, 64, 128)
        downsample(256, 4), # (bs, 32, 32, 256)
        downsample(512, 4), # (bs, 16, 16, 512)
        downsample(512, 4), # (bs, 8, 8, 512)
        downsample(512, 4), # (bs, 4, 4, 512)
        downsample(512, 4), # (bs, 2, 2, 512)
        downsample(512, 4), # (bs, 1, 1, 512)
    ]
    up_stack = [
        upsample(512, 4, apply_dropout=True), # (bs, 2, 2, 1024)
        upsample(512, 4, apply_dropout=True), # (bs, 4, 4, 1024)
        upsample(512, 4, apply_dropout=True), # (bs, 8, 8, 1024)
        upsample(512, 4), # (bs, 16, 16, 1024)
        upsample(256, 4), # (bs, 32, 32, 512)
        upsample(128, 4), # (bs, 64, 64, 256)
        upsample(64, 4), # (bs, 128, 128, 128)
    ]

    initializer = tf.random_normal_initializer(0., 0.02)
    last = layers.Conv2DTranspose(output_channels, 4,
                                  strides=2,
                                  padding='same',
                                  kernel_initializer=initializer,
                                  activation='tanh') # (bs, 256, 256, 3)
    

    x = inputs

    skips = []
    for down in down_stack:
        x = down(x)
        skips.append(x)

    skips = reversed(skips[:-1]) 

    # Upsampling and establishing the skip connections
    for up, skip in zip(up_stack, skips):
        x = up(x)
        # concatenates output of upsample layer and skip connection
        x = layers.Concatenate()([x, skip])
        
    prediction = last(x)

    return keras.Model(inputs=inputs, outputs=prediction)

In [None]:
# Added by our team to check the architecture of the Generator. 
test_gen = Generator()
test_gen.summary()

# Build the discriminator

The code in this section has been taken from the notebook by Amy Jang. Some modifications have been made by our team. 

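The discriminator takes in the input image and classifies it as real or fake (generated). In the tutorial, the discriminator outputs a smaller 2D image instead of a single node, with higher pixel values indicating a real classification and lower values indicating a fake classification. In our modified version, the last convolutional layer is replaced by a `Flatten` layer followed by a single `Dense` unit, so the discriminator outputs one score per image.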

In [None]:
def Discriminator():
    initializer = tf.random_normal_initializer(0., 0.02)
    gamma_init = keras.initializers.RandomNormal(mean=0.0, stddev=0.02)

    inp = layers.Input(shape=[256, 256, 3], name='input_image')

    x = inp

    down1 = downsample(64, 4, False)(x) # (bs, 128, 128, 64)
    down2 = downsample(128, 4)(down1) # (bs, 64, 64, 128)
    down3 = downsample(256, 4)(down2) # (bs, 32, 32, 256)

    zero_pad1 = layers.ZeroPadding2D()(down3) # (bs, 34, 34, 256)
    conv = layers.Conv2D(512, 4, strides=1,
                         kernel_initializer=initializer,
                         use_bias=False)(zero_pad1) # (bs, 31, 31, 512)

    norm1 = tfa.layers.InstanceNormalization(gamma_initializer=gamma_init)(conv)

    leaky_relu = layers.LeakyReLU()(norm1)

    zero_pad2 = layers.ZeroPadding2D()(leaky_relu) # (bs, 33, 33, 512)

    #Modification by our team: removing the last layer. 
    #last = layers.Conv2D(1, 4, strides=1,
    #                     kernel_initializer=initializer)(zero_pad2) # (bs, 30, 30, 1)
    
    flatten = layers.Flatten()(zero_pad2)
    
    last = layers.Dense(1)(flatten)

    return tf.keras.Model(inputs=inp, outputs=last)
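
The shape comments in the discriminator can be verified with a few lines of arithmetic. This is a sketch in plain Python; the helper name `conv_valid` is hypothetical, and it relies on `ZeroPadding2D` adding one pixel on each side by default.

```python
def conv_valid(size, kernel, stride=1):
    """Output size of a convolution without padding ('valid')."""
    return (size - kernel) // stride + 1

size = 256
for _ in range(3):          # three stride-2 downsample blocks
    size //= 2              # 256 -> 128 -> 64 -> 32
size += 2                   # ZeroPadding2D adds one pixel per side: 34
size = conv_valid(size, 4)  # 4x4 convolution with stride 1: 31
size += 2                   # second ZeroPadding2D: 33
flattened = size * size * 512
print(size, flattened)
```

The `Flatten` layer therefore feeds 33 * 33 * 512 values into the final `Dense(1)` layer, so that layer alone holds more than half a million weights.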

In [None]:
# Added by our team to check the architecture of the Discriminator. 
test_disc = Discriminator()
test_disc.summary()

# Training procedure 

The code in this section has been taken from Erik Linder-Norén's GitHub repository. Several modifications have been made by our team.

In [None]:
class DUALGAN(keras.Model):
    def __init__(self,
                 monet_generator, 
                 photo_generator,
                 monet_discriminator, 
                 photo_discriminator,
                 optimizer, 
                 disc_loss, 
                 gen_loss):
        
        super(DUALGAN, self).__init__()
        # Modification made by our team: initialisation outside the class. 
        self.gen_loss = gen_loss
        self.disc_loss = disc_loss
        self.optimizer = optimizer
        self.img_rows = 256
        self.img_cols = 256
        self.img_channels = 3
        self.img_dim = self.img_rows*self.img_cols
        
        #Monet discriminator
        self.D_A = monet_discriminator 
        self.D_A.compile(loss=self.disc_loss, optimizer=self.optimizer, metrics=['accuracy'])

        #photo discriminator
        self.D_B = photo_discriminator 
        self.D_B.compile(loss=self.disc_loss, optimizer=self.optimizer, metrics=['accuracy'])

        #-------------------------
        # Construct Computational
        #   Graph of Generators
        #-------------------------

        # Build the generators
        self.G_AB = photo_generator #monet to photo
        self.G_BA = monet_generator #photo to monet

        # For the combined model we will only train the generators
        self.D_A.trainable = False
        self.D_B.trainable = False

        # The generator takes images from their respective domains as inputs
        imgs_A = Input(shape=[self.img_rows, self.img_cols, self.img_channels]) 
        imgs_B = Input(shape=[self.img_rows, self.img_cols, self.img_channels])

        # Generators translates the images to the opposite domain
        fake_B = self.G_AB(imgs_A)
        fake_A = self.G_BA(imgs_B)

        # The discriminators determines validity of translated images
        valid_A = self.D_A(fake_A)
        valid_B = self.D_B(fake_B)

        # Generators translate the images back to their original domain
        recov_A = self.G_BA(fake_B)
        recov_B = self.G_AB(fake_A)

        # Modification made by our team: different loss-weights.
        # The combined model  (stacked generators and discriminators)
        self.combined = Model(inputs=[imgs_A, imgs_B], outputs=[valid_A, valid_B, recov_A, recov_B])
        self.combined.compile(loss=[self.gen_loss, self.gen_loss, 'mae', 'mae'],
                                optimizer=self.optimizer,
                                loss_weights=[1, 1, 500, 500])

    

    def sample_generator_input(self, X, n_samples, same=False):
        #Samples a number of images from X given n_samples. If same=True,
        #the sampled images are the images at range [0,n_samples). 
        if same:
            idx = np.arange(n_samples)
        else:
            idx = np.random.randint(0, X.shape[0], n_samples)
        return X[idx]

    def train(self, X_A, X_B, epochs, batch_size=128, sample_interval=10, plot = True, clip_value = 0.01, n_critic = 3):

        valid = np.ones((batch_size, 1))
        fake = np.zeros((batch_size, 1))

        for epoch in range(epochs):
            # Train the discriminator for n_critic iterations
            for _ in range(n_critic):

                # ----------------------
                #  Train Discriminators
                # ----------------------

                # Sample generator inputs
                imgs_A = self.sample_generator_input(X_A, batch_size) #monet
                imgs_B = self.sample_generator_input(X_B, batch_size)

                # Translate images to their opposite domain
                fake_B = self.G_AB.predict(imgs_A)
                fake_A = self.G_BA.predict(imgs_B)

                # Train the discriminators # TODO: ensure that the losses are (lists of) scalars?
                D_A_loss_real = self.D_A.train_on_batch(imgs_A, valid)
                D_A_loss_fake = self.D_A.train_on_batch(fake_A, fake)

                D_B_loss_real = self.D_B.train_on_batch(imgs_B, valid)
                D_B_loss_fake = self.D_B.train_on_batch(fake_B, fake)

                D_A_loss = np.add(D_A_loss_real, D_A_loss_fake)
                D_B_loss = np.add(D_B_loss_real, D_B_loss_fake)

                # Clip discriminator weights
                for d in [self.D_A, self.D_B]:
                    for l in d.layers:
                        weights = l.get_weights()
                        weights = [np.clip(w, -clip_value, clip_value) for w in weights]
                        l.set_weights(weights)

            # ------------------
            #  Train Generators
            # ------------------

            # Train the generators once per epoch, on the last sampled batch
            g_loss = self.combined.train_on_batch([imgs_A, imgs_B], [valid, valid, imgs_A, imgs_B])
   
            # Print the progress
            print ("%d [D1 loss: %f] [D2 loss: %f] [G loss: %f]" \
                       % (epoch, D_A_loss[0], D_B_loss[0], g_loss[0]))
            
            if plot and epoch % sample_interval == 0:
                self.plot_predictions(X_A, X_B)

            
    def get_predictions(self, X_A, X_B, n_samples, same):
        imgs_A = self.sample_generator_input(X_A, n_samples, same)
        imgs_B = self.sample_generator_input(X_B, n_samples, same)

        # Images translated to their opposite domain
        fake_B = self.G_AB.predict(imgs_A)
        fake_A = self.G_BA.predict(imgs_B)
            
        return imgs_A, imgs_B, fake_A, fake_B
    
    def plot_predictions(self, monet, photo, n_samples = 4, figsize=(10,15), same = True, scale = 0.5):
        
        real_monet, real_picture, fake_monet, fake_picture = self.get_predictions(monet, photo, n_samples, same)
        
        # show picture -> Monet
        plt.figure(figsize=figsize)
        for i in range(n_samples):
            plt.subplot(n_samples,2,1+i*2)
            plt.imshow(real_picture[i] * scale + scale)
            plt.title("picture")

            plt.subplot(n_samples,2,2+i*2)
            plt.imshow(fake_monet[i] * scale + scale)
            plt.title("picture to Monet")
        plt.show()

        # show Monet -> picture
        plt.figure(figsize=figsize)
        for i in range(n_samples):
            plt.subplot(n_samples,2,1+i*2)
            plt.imshow(real_monet[i] * scale + scale)
            plt.title("Monet")

            plt.subplot(n_samples,2,2+i*2)
            plt.imshow(fake_picture[i] * scale + scale)
            plt.title("Monet to picture")
        plt.show()
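
Two details of the class above are worth isolating: the weighted sum that the combined model minimizes (`loss_weights=[1, 1, 500, 500]`) and the clipping of the discriminator weights after each discriminator update. The sketch below reproduces both with NumPy. The function names are hypothetical and the loss values are made up purely to show the scale of each term.

```python
import numpy as np

def generator_objective(adv_a, adv_b, cycle_a, cycle_b,
                        weights=(1, 1, 500, 500)):
    """Weighted sum of the four losses of the combined model: two
    adversarial terms and two reconstruction (MAE) terms."""
    terms = (adv_a, adv_b, cycle_a, cycle_b)
    return sum(w * t for w, t in zip(weights, terms))

def clip_weights(weights, clip_value=0.01):
    """Clamp every weight array to [-clip_value, clip_value]."""
    return [np.clip(w, -clip_value, clip_value) for w in weights]

# Hypothetical loss values, not measured results.
total = generator_objective(adv_a=0.5, adv_b=0.25, cycle_a=0.01, cycle_b=0.02)

clipped = clip_weights([np.array([-0.5, 0.004, 0.3])])
```

With these weights, even a small reconstruction error dominates the objective, so the generators are pushed strongly towards translations that can be mapped back to the original image.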

In [None]:
# Modification by our team: here the BatchDataset objects are processed
# such that they can be used when training.
monet = np.array([m.squeeze() for m in monet_ds.as_numpy_iterator()])
photo = np.array([p.squeeze() for p in photo_ds.as_numpy_iterator()])
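
The conversion above turns a sequence of batches of size 1 into a single array of images. A minimal sketch with stand-in data (small 4x4 arrays instead of the real 256x256 images, an assumption made to keep the example light):

```python
import numpy as np

# Stand-in for a dataset batched with batch size 1: five arrays of
# shape (1, 4, 4, 3).
batches = [np.zeros((1, 4, 4, 3), dtype=np.float32) for _ in range(5)]

# squeeze() drops the batch axis of length 1; np.array stacks the results.
images = np.array([b.squeeze() for b in batches])
print(images.shape)
```

Note that `squeeze()` without arguments removes every axis of length 1. That is harmless for 256x256 RGB images, but `squeeze(axis=0)` would be the stricter choice if only the batch axis should be removed.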

In [None]:
#Modification by our team: Data augmentation

def hor_flip(image):
    image = tf.image.flip_left_right(image)
    return image

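def augment_data(original_data):
    # Returns the original images followed by their horizontally flipped copies.
    data = original_data.copy()
    # Keep the dtype of the input instead of NumPy's default float64.
    aug_data = np.zeros((data.shape[0]*2, data.shape[1], data.shape[2], data.shape[3]),
                        dtype=data.dtype)

    for i in range(0, data.shape[0]):
        aug_data[i] = data[i]
        aug_data[i+data.shape[0]] = hor_flip(data[i])

    return aug_data

augmented_monet = augment_data(monet)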
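
The same augmentation can be expressed with NumPy slicing alone, which avoids the Python loop. This is an alternative sketch, not the code used for the results in this notebook; the function name `augment_with_flips` is hypothetical.

```python
import numpy as np

def augment_with_flips(images):
    """Return the images followed by their left-right mirrored copies.

    images has shape (n, height, width, channels); reversing axis 2
    mirrors each image horizontally.
    """
    return np.concatenate([images, images[:, :, ::-1, :]], axis=0)

# Two tiny 2x3 single-channel "images" with distinct values.
images = np.arange(2 * 2 * 3 * 1, dtype=np.float32).reshape(2, 2, 3, 1)
augmented = augment_with_flips(images)
print(augmented.shape)
```

As in `augment_data`, the first half of the result holds the originals and the second half the flipped copies, so the dataset size doubles.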

# Initializing and training

The code in this section is inspired by the notebook by Amy Jang.

In [None]:
with strategy.scope():
    monet_generator = Generator() # transforms photos to Monet-esque paintings (G_BA in DUALGAN)
    photo_generator = Generator() # transforms Monet paintings to be more like photos (G_AB in DUALGAN)

    monet_discriminator = Discriminator() # differentiates real Monet paintings and generated Monet paintings (D_A in DUALGAN)
    photo_discriminator = Discriminator() # differentiates real photos and generated photos (D_B in DUALGAN)

In [None]:
with strategy.scope():
    optimizer = tf.keras.optimizers.Adam(2e-4, beta_1=0.5)

In [None]:
#Modification by our team: Different loss functions.
with strategy.scope():

    # Keras calls a loss function as loss(y_true, y_pred).
    def disc_loss(y_true, y_pred):
        mse = tf.keras.losses.MeanSquaredError(reduction=tf.keras.losses.Reduction.NONE)
        return mse(y_true, y_pred)
    
    def gen_loss(y_true, y_pred):
        mse = tf.keras.losses.MeanSquaredError(reduction=tf.keras.losses.Reduction.NONE)
        return mse(y_true, y_pred)
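
With `Reduction.NONE`, the mean squared error is averaged over the last axis only, which leaves one loss value per sample. The NumPy sketch below mirrors that computation for discriminator-style outputs of shape `(batch, 1)`. The function name `per_sample_mse` and the prediction values are made up for illustration.

```python
import numpy as np

def per_sample_mse(y_true, y_pred):
    """Mean squared error over the last axis, one value per sample."""
    return np.mean(np.square(y_true - y_pred), axis=-1)

y_true = np.ones((3, 1))                     # "real" labels
y_pred = np.array([[1.0], [0.5], [-1.0]])    # hypothetical discriminator outputs
losses = per_sample_mse(y_true, y_pred)
print(losses)
```

The squared error is symmetric in its two arguments, so swapping the labels and the predictions gives the same loss values.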



In [None]:
with strategy.scope():
    gan = DUALGAN(monet_generator,
                  photo_generator, 
                  monet_discriminator, 
                  photo_discriminator, 
                  optimizer, 
                  disc_loss, 
                  gen_loss)

In [None]:
with strategy.scope():
    gan.train(augmented_monet, photo, epochs=111, batch_size=32)

# Visualize our Monet-esque photos

The code in this section is taken from the notebook by Amy Jang.

In [None]:
_, ax = plt.subplots(4, 2, figsize=(12, 12))  # one row per photo taken below
for i, img in enumerate(photo_ds.take(4)):
    prediction = monet_generator(img, training=False)[0].numpy()
    prediction = (prediction * 127.5 + 127.5).astype(np.uint8)
    img = (img[0] * 127.5 + 127.5).numpy().astype(np.uint8)

    ax[i, 0].imshow(img)
    ax[i, 1].imshow(prediction)
    ax[i, 0].set_title("Input Photo")
    ax[i, 1].set_title("Monet-esque")
    ax[i, 0].axis("off")
    ax[i, 1].axis("off")
plt.show()

# Create submission file

The code in this section has been taken from the notebook by Amy Jang.

In [None]:
import PIL
! mkdir ../images

In [None]:
i = 1
for img in photo_ds:
    prediction = monet_generator(img, training=False)[0].numpy()
    prediction = (prediction * 127.5 + 127.5).astype(np.uint8)
    im = PIL.Image.fromarray(prediction)
    im.save("../images/" + str(i) + ".jpg")
    i += 1

In [None]:
import shutil
shutil.make_archive("/kaggle/working/images", 'zip', "/kaggle/images")
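
How `shutil.make_archive` lays out the zip file can be checked on a temporary directory before running it on the real images. The sketch below uses only the standard library; the placeholder files and paths are assumptions standing in for the generated JPEGs.

```python
import os
import shutil
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    image_dir = os.path.join(tmp, "images")
    os.makedirs(image_dir)
    # Placeholder files standing in for the generated JPEGs.
    for i in range(1, 4):
        with open(os.path.join(image_dir, f"{i}.jpg"), "wb") as f:
            f.write(b"placeholder")

    # base_name has no extension; make_archive appends ".zip" itself.
    archive_path = shutil.make_archive(os.path.join(tmp, "images"), "zip", image_dir)

    with zipfile.ZipFile(archive_path) as zf:
        names = sorted(zf.namelist())

print(names)
```

The third argument is the root directory of the archive, so the images end up at the top level of the zip file rather than inside an `images/` folder.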