## PA4.2  CycleGAN with TensorFlow (65 marks)
### In this notebook we use [CycleGAN](https://arxiv.org/abs/1703.10593) to generate images of horses from zebras and vice versa.




[All code blocks should already have been run and the outputs should be visible in order to be graded]


Roll number:

# Important Instructions

- You are allowed (and encouraged) to use either Colab or Kaggle notebooks.
- Please do not do this on your local machine, as it will take unnecessarily long to run and you will have to deal with storing the models.
- Some code blocks have been partially implemented. In such blocks you will find "..." placeholders, which you are supposed to replace with your own code.
- In other code blocks you have to implement the whole block; please do so in the space provided.

The following link: https://colab.research.google.com/github/junyanz/pytorch-CycleGAN-and-pix2pix/blob/master/CycleGAN.ipynb should give you a high-level picture of what you will be doing; the only difference is that you will be coding the CycleGAN on your own.


Now let's get started.


# Import libraries

Do not include extra imports. You should be able to complete this assignment with the following imports alone; if you think you need more, please reach out via Slack.

In [4]:
import tensorflow as tf
import tensorflow_datasets as tfds
from tensorflow.keras import layers, Model, Input
import tensorflow_addons as tfa
import matplotlib.pyplot as plt
from IPython.display import clear_output
import time

AUTOTUNE = tf.data.AUTOTUNE

## Load dataset (2.5 marks)

For this assignment we will be using the horse2zebra dataset. You can find this dataset and similar ones [here](https://www.tensorflow.org/datasets/catalog/cycle_gan).

You can apply any preprocessing you want to the images, and data augmentation works too. Since this exact image translation task is also done in the paper, you may want to refer to the [paper](https://arxiv.org/abs/1703.10593) itself for some help.

In [1]:
# Start off by loading the dataset itself, and then divide the dataset into training and testing sets.

In [6]:
# feel free to use these variables to change the size of the images (these are default values as per the paper)
BUFFER_SIZE = 1000
BATCH_SIZE = 1
IMG_WIDTH = 256
IMG_HEIGHT = 256

# Preprocessing (2.5 marks)

In [7]:
# Use this code block to perform the necessary preprocessing for your data. 
# You are encouraged to make your code modular by defining functions for preprocessing, and then applying these functions in subsequent code blocks.

# Normalise the images (2.5 marks)

In [8]:
# implement a function for image normalization
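
A common choice, assumed here rather than required by this notebook, is to scale 8-bit pixel values from [0, 255] to [-1, 1] so that inputs match the range of the generator's final `tanh` activation. The helper names below are hypothetical, and plain Python lists stand in for image tensors so that the arithmetic is easy to check; on a `tf.float32` tensor the same expression `(image / 127.5) - 1` would apply.

```python
def normalize_pixels(pixels):
    """Scale 8-bit pixel values from [0, 255] to [-1.0, 1.0]."""
    return [(p / 127.5) - 1.0 for p in pixels]


def denormalize_pixels(pixels):
    """Inverse mapping, handy when plotting generated images."""
    return [(p + 1.0) * 127.5 for p in pixels]


print(normalize_pixels([0, 127.5, 255]))  # [-1.0, 0.0, 1.0]
```

The inverse helper is useful later, since `matplotlib` expects float images in [0, 1] or integer images in [0, 255] rather than [-1, 1].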

Ideally you would create iterator objects to traverse the images, but you can go with your own implementation and deviate from the template if you want to.

In [None]:
# create iter object here 

# Visualize the images

Visualize the images and check whether the preprocessing is being done correctly.

# The architecture

Now comes the point where we build our own architecture. The paper itself employs a basic CycleGAN architecture featuring 2 generators and 2 discriminators, and the implementation uses the generator and discriminator architectures from the pix2pix GAN model. We will not be doing this; we will be coding our own architectures. There will still be 2 generators and 2 discriminators, but not the ones used in pix2pix.

# Sampling (2.5 marks)

Before we get to the model architectures, we need to think about sampling: we need to make sure the images are interpreted correctly by the neural networks. This means implementing sampling functions that do the following:

1. Compress images to extract essential features; let's call this the downsampling function.
2. Reconstruct images from these features in the target domain, enabling unpaired image-to-image translation; let's call this the upsampling function.

Both of these functions will be used in your neural networks; they have been partially implemented below.

YOU HAVE TO FILL IN THE "..." IN THE CODE BLOCKS BELOW with whatever you feel fits best; once again, the paper might be of help here.

In [None]:
def downsample(filters, size, apply_norm=True, norm_type='instancenorm'):

    # Initialize weights with a mean and standard deviation of your choosing  
    initializer = ...

    # Create a sequential model to stack layers
    result = tf.keras.Sequential()

    # Add a 2D Convolution layer for downsampling, with specified filters, size, and stride, without bias
    result.add(layers.Conv2D(filters, size, strides=..., padding='same', kernel_initializer=initializer, use_bias=False))

    # Conditionally apply normalization
    if apply_norm:
        if norm_type == 'instancenorm':  # Check if normalization type is instance normalization
            # Add instance normalization layer
            result.add(tfa.layers.InstanceNormalization())
    
    # Add LeakyReLU or ReLU (your choice to make) for non-linearity; LeakyReLU lets some gradient flow for negative values
    result.add(...)
    return result


def upsample(filters, size, apply_dropout=False):

    # Initialize weights with a mean and standard deviation of your choosing  
    initializer = ...

    # Create a sequential model to stack layers
    result = tf.keras.Sequential()

    # Add a 2D Transposed Convolution layer for upsampling, with specified filters, size, and stride, without bias
    result.add(layers.Conv2DTranspose(filters, size, strides=..., padding='same', kernel_initializer=initializer, use_bias=False))

    # Add instance normalization layer
    result.add(tfa.layers.InstanceNormalization())

    # Conditionally apply dropout for regularization
    if apply_dropout:
        # Add dropout layer with a dropout rate of 0.5
        result.add(layers.Dropout(0.5))

    # Add LeakyReLU or ReLU (your choice to make) for non-linearity (plain ReLU keeps activations non-negative)
    result.add(...)
    return result
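
To see how the stride chosen in the two helpers above changes the image size, the sketch below traces spatial dimensions through stacked layers. It assumes `padding='same'` (as in the template) and a stride of 2, which is borrowed from the final `Conv2DTranspose` layer of the generator template and is an assumption, not a required value. The function names are hypothetical.

```python
import math


def conv_same_out(size, stride):
    """Output size of a Conv2D layer with padding='same'."""
    return math.ceil(size / stride)


def conv_transpose_same_out(size, stride):
    """Output size of a Conv2DTranspose layer with padding='same'."""
    return size * stride


def trace_sizes(size, n_layers, step):
    """Apply `step` repeatedly and record every intermediate size."""
    sizes = [size]
    for _ in range(n_layers):
        size = step(size)
        sizes.append(size)
    return sizes


down = trace_sizes(256, 8, lambda s: conv_same_out(s, 2))
up = trace_sizes(down[-1], 8, lambda s: conv_transpose_same_out(s, 2))
print(down)  # [256, 128, 64, 32, 16, 8, 4, 2, 1]
print(up)    # [1, 2, 4, 8, 16, 32, 64, 128, 256]
```

Under these assumptions, eight stride-2 downsampling steps reduce a 256 x 256 image to 1 x 1, so a network for this image size cannot usefully stack more than eight of them.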


# Generator Architecture (10 marks)

You will implement a U-Net generator, a popular architecture for image-to-image translation tasks. The architecture is as follows:

1. Downsampling: The generator first downsamples the input image and extracts features from it.
2. Bottleneck: The features are then passed through a bottleneck layer.
3. Upsampling: The features are then upsampled to generate the output image.
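
The three stages above can be followed by tracking tensor shapes only. The sketch below is a bookkeeping aid, not a model: it assumes stride-2 layers, an input size divisible by 2 for every downsampling step, and skip connections that concatenate each decoder output with the matching encoder output along the channel axis. The function name and the filter counts in the example call are hypothetical.

```python
def unet_shape_trace(size, in_channels, down_filters, up_filters, out_channels):
    """Trace (height, width, channels) through a U-Net built from stride-2 layers."""
    shape = (size, size, in_channels)
    skips = []
    for f in down_filters:  # encoder: halve the size, set the channel count
        shape = (shape[0] // 2, shape[1] // 2, f)
        skips.append(shape)
    trace = list(skips)
    # Skip connections use every encoder output except the bottleneck, deepest first
    for f, skip in zip(up_filters, reversed(skips[:-1])):
        shape = (shape[0] * 2, shape[1] * 2, f)  # decoder: double the size
        assert shape[:2] == skip[:2], "skip and decoder sizes must match"
        shape = (shape[0], shape[1], f + skip[2])  # concatenate along channels
        trace.append(shape)
    # A final transposed convolution restores the input resolution
    trace.append((shape[0] * 2, shape[1] * 2, out_channels))
    return trace


for s in unet_shape_trace(256, 3, [64, 128, 256], [128, 64], 3):
    print(s)
```

Note that the up-stack is one layer shorter than the down-stack, because the last doubling is done by the final output layer.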


In [2]:
def unet_generator(output_channels, norm_type='instancenorm'):

    # 1. Define the input layer to accept images of shape (Height, Width, 3)
    inputs = Input(shape=[None, None, 3])

    # 2. Construct the downsampling stack 
    down_stack = [
        downsample(64, 4, apply_norm=False), # Starting without normalization for the first layer
        # Add additional downsample layers with increasing filters and normalization
    ]

    # 3. Construct the upsampling stack  
    up_stack = [
        upsample(512, 4, apply_dropout=True), # Starting with dropout layers for regularization
        # Add additional upsample layers with decreasing filters
    ]

    # 4. Define the final layer with a Conv2DTranspose to map to the desired output_channels with 'tanh' activation
    initializer = tf.random_normal_initializer(... , ...) # Initialize with the same mean and std dev you used above
    last = layers.Conv2DTranspose(output_channels, 4, strides=2, padding='same', kernel_initializer=initializer, activation='tanh')

    # Implement the forward pass: downsampling, skip connections, and upsampling
    # Remember to concatenate the corresponding downsample layer's output with the upsample layer's output for the skip connections
    
    return Model(inputs=inputs, outputs= ...)  # Return the model instance

# Discriminator Architecture (10 marks)

The discriminator is a PatchGAN discriminator, a popular discriminator architecture for image-to-image translation tasks that scores patches of the image as real or fake instead of producing a single score for the whole image. The architecture is as follows:

1. The discriminator downsamples the input image and extracts features from it.
2. The features are then passed through a bottleneck layer.
3. The features are then passed through a classification layer.
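
As a rough illustration of why this is called a patch discriminator, the sketch below computes the size of the output grid and the receptive field of one output value. The layer list is an assumption: three stride-2 4x4 convolutions with `'same'` padding, followed by two stride-1 4x4 convolutions that are each preceded by one pixel of zero padding. This is a common PatchGAN layout, not something this notebook prescribes, and the names are hypothetical.

```python
import math

# (kernel size, stride, padding mode) for each assumed layer
ASSUMED_LAYERS = [
    (4, 2, "same"),
    (4, 2, "same"),
    (4, 2, "same"),
    (4, 1, "pad1_valid"),  # ZeroPadding2D(1) followed by a 'valid' convolution
    (4, 1, "pad1_valid"),
]


def output_size(size, layers):
    """Spatial size of the discriminator output for a square input."""
    for kernel, stride, mode in layers:
        if mode == "same":
            size = math.ceil(size / stride)
        else:  # one pixel of padding on each side, then a 'valid' convolution
            size = (size + 2 - kernel) // stride + 1
    return size


def receptive_field(layers):
    """Input pixels (per side) that influence a single output value."""
    rf = 1
    for kernel, stride, _ in reversed(layers):
        rf = (rf - 1) * stride + kernel
    return rf


print(output_size(256, ASSUMED_LAYERS))  # 30
print(receptive_field(ASSUMED_LAYERS))   # 70
```

With these assumed layers, a 256 x 256 input yields a 30 x 30 grid of logits, and each logit judges a 70 x 70 patch of the input.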



In [42]:
def discriminator(norm_type='instancenorm', target=False):
    # 1. Initialize weights and define input layer for input images
    initializer = tf.random_normal_initializer(..., ...) # Initialize with the same mean and std dev you used above
    inp = Input(shape=[None, None, 3], name='input_image')

    x = inp

    # 2. Optionally concatenate target image if discriminator is conditional
    if target:
        tar = Input(shape=[None, None, 3], name='target_image')
        x = layers.concatenate([inp, tar])  # Combine input and target for conditional GAN

    # 3. Add downsampling layers to process the input (or combined input and target) images
    x = downsample(64, 4, False, norm_type)(x)  # Start with downsampling without normalization
    x = downsample(128, 4, True, norm_type)(x)  # Subsequent layers with normalization
    x = downsample(256, 4, True, norm_type)(x)


    # 4. Apply convolution with zero padding -> Conv2d -> normalization layers to refine the features before the final layer
    zero_pad1 =  ...
    conv =  ...

    if norm_type == 'instancenorm':
        norm1 =  ...
        leaky_relu = layers.LeakyReLU()(norm1)


    # 5. Add a final convolution layer without normalization
        
    zero_pad2 = ...
    last = ...

    # Return model based on whether target image is part of the input
    if target:
        return Model(inputs=[inp, tar], outputs=last)
    else:
        return Model(inputs=inp, outputs=last)


In [43]:
OUTPUT_CHANNELS = 3  # don't change this unless you made the images grayscale in preprocessing (ideally you did not)

# declare the generators
generator_g = unet_generator(OUTPUT_CHANNELS, norm_type='instancenorm')
generator_f = unet_generator(OUTPUT_CHANNELS, norm_type='instancenorm')

# declare the discriminators
discriminator_x = discriminator(norm_type='instancenorm', target=False)
discriminator_y = discriminator(norm_type='instancenorm', target=False)


# Visualize (again)

Visualize the latent space representation of the images to check whether the models are doing some sort of transformation (completely optional).

## Loss functions (5 marks)

In CycleGAN there is no paired data to train on, so there is no guarantee that an input and a target image form a meaningful pair during training. To enforce that the network learns the correct mapping, the authors propose the cycle consistency loss: translating an image to the other domain and back should reproduce the original image.

You are going to have to implement the following loss functions:

1. Discriminator loss
2. Generator loss
3. Identity loss
4. Cycle consistency loss
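
For reference, the two L1-based losses in the list above can be written in the paper's notation. The mapping of notebook names to symbols is an assumption: `generator_g` is taken to be $G: X \to Y$ and `generator_f` to be $F: Y \to X$, with `discriminator_x` and `discriminator_y` as $D_X$ and $D_Y$.

```latex
\mathcal{L}_{\text{cyc}}(G, F) =
    \mathbb{E}_{x}\big[\lVert F(G(x)) - x \rVert_1\big]
  + \mathbb{E}_{y}\big[\lVert G(F(y)) - y \rVert_1\big]

\mathcal{L}_{\text{identity}}(G, F) =
    \mathbb{E}_{y}\big[\lVert G(y) - y \rVert_1\big]
  + \mathbb{E}_{x}\big[\lVert F(x) - x \rVert_1\big]

\mathcal{L}(G, F, D_X, D_Y) =
    \mathcal{L}_{\text{GAN}}(G, D_Y, X, Y)
  + \mathcal{L}_{\text{GAN}}(F, D_X, Y, X)
  + \lambda \, \mathcal{L}_{\text{cyc}}(G, F)
```

The paper sets $\lambda = 10$ in its experiments, which is where the suggested scalar factor for the cycle loss comes from.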

In [47]:
# This line initializes a loss_obj object as an instance of TensorFlow's BinaryCrossentropy class, configured to operate on logits.
loss_obj = tf.keras.losses.BinaryCrossentropy(from_logits=True)
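
For intuition about what a binary cross-entropy on logits computes, here is a plain-Python sketch of the per-element formula in its numerically stable form. The function name `bce_with_logits` is hypothetical and this is not TensorFlow's implementation, only the same mathematical quantity for a single logit; the Keras loss object additionally averages over all elements.

```python
import math


def bce_with_logits(logit, label):
    """Binary cross-entropy for one raw logit and a 0/1 label."""
    # Stable form of -[label*log(sigmoid(x)) + (1 - label)*log(1 - sigmoid(x))]
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


print(round(bce_with_logits(0.0, 1.0), 4))  # 0.6931, i.e. log(2) for an undecided output
print(bce_with_logits(5.0, 1.0) < bce_with_logits(-5.0, 1.0))  # True
```

A confident, correct logit gives a loss near zero, while a confident, wrong one is penalized roughly in proportion to its magnitude.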

In [48]:
def discriminator_loss(real, generated):
  
  # Calculate the real loss from the discriminator's output on real images and the generated loss
  # from its output on generated images, using the loss object

  real_loss = ...
  generated_loss = ...
  total_disc_loss = real_loss + generated_loss

  return total_disc_loss * 0.5

In [49]:
def generator_loss(generated):
  
  # Calculate the loss between the generated images and a matrix of ones, using the loss object
  
  return ...

In [50]:
def calc_cycle_loss(real_image, cycled_image):

  # Calculate the cycle consistency loss using the L1 norm (tf.reduce_mean and tf.abs will be useful here). Also consider multiplying
  # the loss by some scalar factor (like 10) to give it more weight compared to the other losses.
  
  loss1 = ...
  final_loss = ...
  return final_loss

In [51]:
def identity_loss(real_image, same_image):
  # Calculate the identity loss using the L1 norm (tf.reduce_mean and tf.abs will be useful here). Also consider multiplying
  # the loss by some scalar factor (like 5) to give it less weight than the cycle consistency loss.
  
  loss2 = ...
  final_loss2 = ...
  return final_loss2
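
Both L1-based losses above reduce to a mean absolute difference multiplied by a weight. The following worked example shows that arithmetic on tiny lists of pixel values; the function name `weighted_l1` and the numbers are made up for illustration, and in the notebook the same computation would be done on tensors.

```python
def weighted_l1(image_a, image_b, weight):
    """Mean absolute difference between two equally sized pixel lists, scaled by weight."""
    if len(image_a) != len(image_b):
        raise ValueError("images must have the same number of values")
    mean_abs = sum(abs(a - b) for a, b in zip(image_a, image_b)) / len(image_a)
    return weight * mean_abs


real = [0.5, -0.5, 1.0, 0.0]
cycled = [0.25, -0.5, 0.5, 0.25]
print(weighted_l1(real, cycled, 10))  # 2.5
```

The absolute differences are 0.25, 0, 0.5 and 0.25, so their mean is 0.25 and the weighted loss is 2.5; a perfect reconstruction would give 0.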

Initialize the optimizers for all the generators and the discriminators. 

We will use the Adam optimizer; keep the learning rate and beta values as they are set below.

In [52]:
generator_g_optimizer = tf.keras.optimizers.Adam(2e-4, beta_1=0.5)
generator_f_optimizer = tf.keras.optimizers.Adam(2e-4, beta_1=0.5)

discriminator_x_optimizer = tf.keras.optimizers.Adam(2e-4, beta_1=0.5)
discriminator_y_optimizer = tf.keras.optimizers.Adam(2e-4, beta_1=0.5)

## Checkpoints (optional but highly recommended)

You can save model checkpoints at regular intervals so that you can resume training in case of a runtime failure. You can also use these checkpoints to test the model on new images without having to retrain it. The point is that you should save checkpoints, because training GANs takes quite a long time.


## Training the model (15 marks)

Now that we are done with the model and everything else, let's train it. You can train the model for as many epochs as you want. The paper trains for 200 epochs, which is not feasible here, so you are recommended to train the model until you see some sort of translation in the images.

In most computationally restricted implementations, the model is trained for 5 to 10 epochs and does show some signs of image translation, but this also depends on your architecture.

In [54]:
EPOCHS = 10

In [3]:
# Ideally, implement a function now to generate images during training.
def generate_images(model, test_input):
  ...

The training loop consists of four main steps:

* Get the predictions (essentially a forward pass).
* Calculate the loss.
* Calculate the gradients using backpropagation.
* Apply the gradients to the model's trainable variables using the optimizer.
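
The four steps can be seen in miniature on a one-parameter toy model. This sketch is deliberately not a GAN: it assumes the model `prediction = w * x`, a squared-error loss, a gradient derived by hand in place of `tf.GradientTape`, and a plain gradient-descent update in place of Adam. All names are hypothetical.

```python
def toy_train_step(w, x, target, lr):
    """One gradient-descent step for the toy model prediction = w * x."""
    prediction = w * x                     # 1. forward pass
    loss = (prediction - target) ** 2      # 2. loss
    grad = 2 * (prediction - target) * x   # 3. gradient d(loss)/dw, derived by hand
    new_w = w - lr * grad                  # 4. update the parameter
    return new_w, loss


w = 0.0
losses = []
for _ in range(20):
    w, loss = toy_train_step(w, x=2.0, target=6.0, lr=0.05)
    losses.append(loss)

print(w, losses[0], losses[-1])
```

The parameter moves towards 3.0 (since 3.0 * 2.0 equals the target 6.0) and the loss shrinks from step to step. The real `train_step` follows the same pattern, with four models, four losses and four optimizers.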

In [56]:
@tf.function
def train_step(real_x, real_y):

  # The setup for TensorFlow's automatic differentiation context has already been done for you; do not change it
  
  with tf.GradientTape(persistent=True) as tape:

    # Translate from domain X to Y and back to X, and from Y to X and back to Y, for cycle consistency.
    # Call the generators on the real and the generated (fake) images, as per the function prototype.
    
    # code here



    # Generate images in the same domain to enforce identity preservation.
    # Call the generators on the real images, as per the function prototype.
     
    # code here




    # Use the discriminators to score which images are real and which are fake.
    # Call the discriminators on the real and the generated (fake) images, as per the function prototype.
     
    # code here




    # Calculate the losses for the generators and the discriminators.
    # For your convenience, the loss names are given below.

    gen_g_loss = ...
    gen_f_loss = ...

    total_cycle_loss = ...  
    total_gen_g_loss = ...
    total_gen_f_loss = ...

    disc_x_loss =  ...
    disc_y_loss =  ...


  # Compute the gradients of the losses and apply them to optimize the models accordingly (use tape.gradient and zip).
  # An example has been done below for generator_g.

  generator_g_gradients = tape.gradient(total_gen_g_loss, generator_g.trainable_variables)
  generator_g_optimizer.apply_gradients(zip(generator_g_gradients, generator_g.trainable_variables))
 


In [5]:
EPOCHS = 10  # number of training epochs; must be greater than 0 for the loop below to run

In [6]:
for epoch in range(EPOCHS):
  # code for executing the loop should go here
  ...


## Generation time


In [7]:
# Run the trained model on a couple of images in the test dataset and simply display them.

# Analytical questions (15 marks)

Q1) What would happen if you reduced the dataset to 1/4 of the number of images and then trained the GAN? How would the results differ, and why? Please do the following:

1. Train the GAN with 1/4 of the dataset.
2. Train for 10 epochs.
3. Generate images using this newly trained model.
4. Along with a couple of images, describe the results and give your reasoning for them.

Ans:

Q2) What would happen if you changed the loss_obj object, which is currently an instance of TensorFlow's BinaryCrossentropy class, and then trained the GAN? How would the results differ, and why? Please do the following:

1. Train the GAN with 1/4 of the dataset.
2. Change the BCE loss to some other relevant loss.
3. Train for 10 epochs.
4. Generate images using this newly trained model.
5. Along with a couple of images, describe the results, give your reasoning for them, and explain how they differ from your answer to Q1.

Ans:

Q3) What would happen if you added noise drawn from a random noise distribution to all horse images and then trained the GAN? How would the results differ, and why? Please do the following:

1. Take 1/4 of the dataset, and add noise to the horse images only.
2. Train the GAN with this 1/4 of the dataset.
3. Train for 10 epochs.
4. Generate images using this newly trained model.
5. Along with a couple of images, describe the results, give your reasoning for them, and explain how they differ from your answer to Q1.

Ans:
