# Lab 06 - Generative adversarial networks

## Introduction

In this laboratory we will implement what the literature calls a CycleGAN (https://arxiv.org/pdf/1703.10593.pdf). These are powerful models that can learn to translate images from one class to another without having paired samples to learn from.

A Generative Adversarial Network can generate novel data that is similar to the data found in the training set (that is, drawn from the same distribution). As you've seen in the lecture, classic models take random noise as input and learn to generate such data. There are also models that, in addition to the random noise, take another input which they are meant to translate. For instance, this paper https://arxiv.org/pdf/1611.07004.pdf showcases translating images that contain only outlines into fully fledged objects.

![](./edge_to_obj_.png)

In such cases, there are pairs of desired inputs and outputs. However, it is rarely the case that we can create or benefit from such well structured datasets. The general case consists of samples gathered separately from the two classes that we wish to translate between. For instance, you can have several images of horses and several images of zebras, with no way of pairing them.

![](./h2z.png)

A CycleGAN can learn to translate from one class to another given such a scenario.

## CycleGAN

A CycleGAN primarily consists of two GAN models: one that translates from the first class to the second, and one that translates from the second class to the first. Let's name those classes X and Y.

There will be a generator that takes an image from the X class as input and outputs an image from the Y class, a discriminator that has to distinguish real Y images from fake ones, a generator that translates from Y to X, and a final discriminator that classifies real and fake X images.

To prevent the generators from outputting a single image from the opposite class regardless of the input, there is an additional factor that the model has to optimize, and it is the most important aspect of the architecture: the translations have to be cycle consistent. This means that if we take an X image and translate it into a Y image, and then translate it back into an X image, we should end up with the same image we started with. This is the fundamental idea of this model.

![](./cycle.png)

Additionally, we will introduce another loss term. Given an X image, we want the generator that translates from Y to X to reproduce it exactly (identity loss), and similarly the other way around.
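Written as formulas, the two extra terms described above could look as follows. This is only a sketch in the notation of the CycleGAN paper, which is an assumption here: $G$ denotes the generator from X to Y and $F$ the generator from Y to X. The norm is left generic, since its choice (the paper uses L1, the notes later in this lab use L2) is an implementation decision.

```latex
\mathcal{L}_{\text{cyc}}(G, F) =
    \mathbb{E}_{x \sim X}\big[\lVert F(G(x)) - x \rVert\big]
  + \mathbb{E}_{y \sim Y}\big[\lVert G(F(y)) - y \rVert\big]

\mathcal{L}_{\text{id}}(G, F) =
    \mathbb{E}_{y \sim Y}\big[\lVert G(y) - y \rVert\big]
  + \mathbb{E}_{x \sim X}\big[\lVert F(x) - x \rVert\big]
```

Both terms are zero exactly when a round trip, or a translation of an image that already belongs to the target class, returns the input unchanged.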

## Implementation

With the help of your supervisor you can implement a CycleGAN starting from this code skeleton. The featured example translates apples to oranges, so that a model trained for only a few iterations can already provide visually interesting results. The same model can then be applied to the horses to zebras dataset, provided you train it for a larger number of iterations: all you have to do is replace the 'apple2orange' dataset name with 'horse2zebra'. The script will load that dataset and train the model on it without any other changes required.

First, we import all the necessary tools and load the dataset.

In [1]:

import cv2
import numpy as np
import os
import time
import torch
from torchvision.transforms import transforms
import glob

# fall back to the CPU when no GPU is available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

In [2]:
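class Apple2OrangeDataset(torch.utils.data.Dataset):
    def __init__(self, apple_path, orange_path):
        super().__init__()
        # sorted() makes the file order reproducible across runs
        self.apple_path = sorted(glob.glob(apple_path))
        self.orange_path = sorted(glob.glob(orange_path))
        # the two folders are unpaired and may differ in size,
        # so only index as far as the smaller one
        self.n_samples = min(len(self.apple_path), len(self.orange_path))
        # ToTensor converts HWC uint8 images to CHW floats in [0, 1]
        self.trans = transforms.ToTensor()

    def _load(self, path):
        # OpenCV loads images as BGR; convert to RGB before making a tensor
        image = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2RGB)
        return self.trans(image)

    def __getitem__(self, sample_n):
        apple = self._load(self.apple_path[sample_n])
        orange = self._load(self.orange_path[sample_n])
        return apple, orange

    def __len__(self):
        return self.n_samples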

In [30]:
apple_path = "../datasets/apple2orange/trainA/*"
orange_path = "../datasets/apple2orange/trainB/*"
train_ds = Apple2OrangeDataset(apple_path, orange_path)
train_dl = torch.utils.data.DataLoader(
    train_ds,
    batch_size=32,
    shuffle=True,
    num_workers=4
)

In [31]:
for x,y in train_dl:
    print(x.shape)
    print(y.shape)
    break

torch.Size([32, 3, 256, 256])
torch.Size([32, 3, 256, 256])


Now, we will define our generator architecture. For this class we will use a simple U-Net generator. The idea is to concatenate the convolutional features in the following manner.

![](./unet.jpg)

For instance, if your first layer has 32 filters, the output of your last layer (excluding the final output layer), which also has 32 filters, will be concatenated with the output of the first one, giving 64 channels as input for what follows. As another example, suppose you have an architecture with the following number of filters: input -> 32 -> 64 -> 128 -> 64 -> 32 -> 3 (final output). Then the output of the final 32-filter convolution will be concatenated with the output of the first 32-filter convolution, and the output of the last 64-filter convolution will be concatenated with the output of the first 64-filter convolution. The reason for this is that the generators have to reconstruct pretty much the same visual structure they received as input, so instead of forcing the network to memorize that structure, we simply hand the structure back to it as it advances in its reconstruction.
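The channel bookkeeping of the second example can be checked with a toy network. The following is a minimal sketch, not the lab's generator: the class `TinySkipNet` and its layer names are made up for illustration, and the spatial size is kept constant so that only the channel counts matter.

```python
import torch


class TinySkipNet(torch.nn.Module):
    """Toy network: input -> 32 -> 64 -> 128 -> 64 -> 32 -> 3 with skip concatenation."""

    def __init__(self):
        super().__init__()
        self.enc32 = torch.nn.Conv2d(3, 32, kernel_size=3, padding=1)
        self.enc64 = torch.nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.mid128 = torch.nn.Conv2d(64, 128, kernel_size=3, padding=1)
        self.dec64 = torch.nn.Conv2d(128, 64, kernel_size=3, padding=1)
        # input: dec64 output (64 channels) concatenated with enc64 output (64 channels)
        self.dec32 = torch.nn.Conv2d(64 + 64, 32, kernel_size=3, padding=1)
        # input: dec32 output (32 channels) concatenated with enc32 output (32 channels)
        self.out = torch.nn.Conv2d(32 + 32, 3, kernel_size=3, padding=1)

    def forward(self, x):
        e32 = torch.relu(self.enc32(x))
        e64 = torch.relu(self.enc64(e32))
        m = torch.relu(self.mid128(e64))
        d64 = torch.relu(self.dec64(m))
        d64 = torch.cat([d64, e64], dim=1)  # 128 channels
        d32 = torch.relu(self.dec32(d64))
        d32 = torch.cat([d32, e32], dim=1)  # 64 channels
        return torch.sigmoid(self.out(d32))


net = TinySkipNet()
y = net(torch.rand(1, 3, 16, 16))
print(y.shape)
```

Note that every layer that follows a concatenation has to declare twice as many input channels as the layer before it produced.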

Our final layer will have a sigmoid activation, so that the output lies in [0, 1], the same range as the input images.

In [32]:
class Generator(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # define a list of convolutional layers with
        # 64, 128, 256 and 512 filters respectively
        # relu activations, same padding and a kernel of size 3
        self.down = torch.nn.ModuleList([
            torch.nn.Sequential(
                torch.nn.Conv2d(3, 64, kernel_size=3, padding='same'),
                torch.nn.ReLU(),
            ),
            torch.nn.Sequential(
                torch.nn.Conv2d(64, 128, kernel_size=3, padding='same'),
                torch.nn.ReLU(),
            ),
            torch.nn.Sequential(
                torch.nn.Conv2d(128, 256, kernel_size=3, padding='same'),
                torch.nn.ReLU(),
            ),
            torch.nn.Sequential(
                torch.nn.Conv2d(256, 512, kernel_size=3, padding='same'),
                torch.nn.ReLU(),
            ),
        ])
        # define a 2x2 max pooling layer
        self.pool = torch.nn.MaxPool2d(kernel_size=(2, 2))
        # define a list of transposed convolutions with
        # 256, 128 and 64 filters respectively,
        # relu activations, a kernel of size 3 and a stride of size 2.
        # ConvTranspose2d does not accept padding='same'; padding=1 together
        # with output_padding=1 doubles the spatial size exactly.
        # From the second layer on, the input channels are doubled by the
        # skip concatenation done in forward().
        self.up = torch.nn.ModuleList([
            torch.nn.Sequential(
                torch.nn.ConvTranspose2d(512, 256, kernel_size=3, stride=2, padding=1, output_padding=1),
                torch.nn.ReLU(),
            ),
            torch.nn.Sequential(
                torch.nn.ConvTranspose2d(512, 128, kernel_size=3, stride=2, padding=1, output_padding=1),
                torch.nn.ReLU(),
            ),
            torch.nn.Sequential(
                torch.nn.ConvTranspose2d(256, 64, kernel_size=3, stride=2, padding=1, output_padding=1),
                torch.nn.ReLU(),
            ),
        ])
        # define a final deconvolution with 3 filters, kernel size 3, stride 2
        # and sigmoid activation; its input has 64 + 64 channels
        self.deconv = torch.nn.Sequential(
            torch.nn.ConvTranspose2d(128, 3, kernel_size=3, stride=2, padding=1, output_padding=1),
            torch.nn.Sigmoid(),
        )

    def forward(self, x):
        # sequentially apply the convolutions and pooling layer
        # retain all max pooling outputs in a list
        list_outputs = []
        for layer in self.down:
            x = layer(x)
            x = self.pool(x)
            list_outputs.append(x)

        # the deepest output is x itself, so drop it and walk the
        # remaining outputs from deepest to shallowest
        list_outputs = list_outputs[-2::-1]
        # sequentially apply all deconvolutions (except for the output layer)
        # concatenate each output with its corresponding convolutional output
        for i, layer in enumerate(self.up):
            x = layer(x)
            x = torch.cat([x, list_outputs[i]], dim=1)

        # apply the final deconvolution and return the output
        out = self.deconv(x)
        return out
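A note on the transposed convolutions: `torch.nn.ConvTranspose2d` expects integer padding and does not accept `padding='same'`. The following is a minimal sketch of one way to get an exact 2x upsampling with a kernel of size 3 and a stride of 2, namely `padding=1` together with `output_padding=1`. The helper `deconv_out_size` is hypothetical; it only mirrors the output-size formula given in the PyTorch documentation, and the channel counts are kept small so the check runs quickly.

```python
import torch


def deconv_out_size(size, kernel_size, stride, padding, output_padding, dilation=1):
    """Spatial output size of a transposed convolution (formula from the PyTorch docs)."""
    return (size - 1) * stride - 2 * padding + dilation * (kernel_size - 1) + output_padding + 1


# kernel 3, stride 2, padding 1, output_padding 1: 16x16 becomes 32x32
up = torch.nn.ConvTranspose2d(8, 4, kernel_size=3, stride=2, padding=1, output_padding=1)
x = torch.rand(1, 8, 16, 16)
y = up(x)
print(y.shape, deconv_out_size(16, kernel_size=3, stride=2, padding=1, output_padding=1))
```

Without `output_padding=1` the same layer would produce 31x31 feature maps, which could no longer be concatenated with the 32x32 maps coming from the encoder.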

For the discriminator, since it will not have a complicated architecture, it will suffice to define a small module built from two sequential blocks.

In [33]:

class Discriminator(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Sequential(
            torch.nn.Conv2d(3, 64, kernel_size=3, padding='same'),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(kernel_size=(2, 2)),
            torch.nn.Conv2d(64, 128, kernel_size=3, padding='same'),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(kernel_size=(2, 2)),
            torch.nn.Conv2d(128, 256, kernel_size=3, padding='same'),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(kernel_size=(2, 2)),
            torch.nn.Conv2d(256, 512, kernel_size=3, padding='same'),
            torch.nn.ReLU(),
        )
        # three 2x2 poolings reduce a 256x256 input to 32x32,
        # so the flattened feature map has 512 * 32 * 32 values
        self.linear = torch.nn.Sequential(
            torch.nn.Flatten(),
            torch.nn.Linear(512 * 32 * 32, 128),
            torch.nn.ReLU(),
            # no sigmoid here: the output is a raw logit
            torch.nn.Linear(128, 1),
        )

    def forward(self, x):
        x = self.layer(x)
        out = self.linear(x)
        return out
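The input size of the first `Linear` layer has to match the number of features that come out of the convolutional stack, and that number depends on the image size. Below is a minimal sketch of one way to measure it with a dummy forward pass instead of computing it by hand. The helper `flattened_size` is hypothetical, and the stack uses the same pooling pattern as the discriminator but fewer channels, so that the check runs quickly.

```python
import torch


def flattened_size(features, input_shape=(3, 256, 256)):
    """Number of features per sample after `features` followed by a flatten."""
    with torch.no_grad():
        dummy = torch.zeros(1, *input_shape)
        return features(dummy).flatten(start_dim=1).shape[1]


# three 2x2 poolings turn a 256x256 image into a 32x32 feature map
features = torch.nn.Sequential(
    torch.nn.Conv2d(3, 4, kernel_size=3, padding='same'), torch.nn.ReLU(),
    torch.nn.MaxPool2d(2),
    torch.nn.Conv2d(4, 8, kernel_size=3, padding='same'), torch.nn.ReLU(),
    torch.nn.MaxPool2d(2),
    torch.nn.Conv2d(8, 16, kernel_size=3, padding='same'), torch.nn.ReLU(),
    torch.nn.MaxPool2d(2),
    torch.nn.Conv2d(16, 32, kernel_size=3, padding='same'), torch.nn.ReLU(),
)

n_features = flattened_size(features)
print(n_features)  # 32 channels * 32 * 32 = 32768
head = torch.nn.Linear(n_features, 1)
```

With 512 channels in the last convolution, the same reasoning gives 512 * 32 * 32 input features for 256x256 images.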

We will instantiate all models and define their individual optimizers.

In [34]:
gen_x_y = Generator()
gen_y_x = Generator()
disc_x = Discriminator()
disc_y = Discriminator()
gen_x_y, gen_y_x, disc_x, disc_y = gen_x_y.to(device),gen_y_x.to(device),disc_x.to(device),disc_y.to(device)

opt_gen_x_y = torch.optim.Adam(gen_x_y.parameters(), lr=2e-4,  betas=(0.5, 0.999))
opt_gen_y_x = torch.optim.Adam(gen_y_x.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_disc_x = torch.optim.Adam(disc_x.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_disc_y = torch.optim.Adam(disc_y.parameters(), lr=2e-4,betas=(0.5, 0.999))
opts = {
    'opt_gen_x_y': opt_gen_x_y,
    'opt_gen_y_x': opt_gen_y_x,
    'opt_disc_x':opt_disc_x,
    'opt_disc_y': opt_disc_y
}

#### The GAN losses themselves

take a horse image --- horse_to_zebra --> zebra_fake

take a zebra image --- zebra_to_horse --> horse_fake

pass horse and horse_fake through the horse discriminator

 -> loss_discriminator_horse = predict 1 for horse and 0 for horse_fake

 -> loss for generator_horse = the discriminator should predict 1 for horse_fake

the same goes for zebra
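The two adversarial losses can be sketched as follows. This is a minimal sketch under stated assumptions: the discriminator outputs a raw logit (no sigmoid), so `BCEWithLogitsLoss` is used, and the function names `discriminator_loss` and `generator_adv_loss` as well as the toy models are made up for illustration.

```python
import torch

bce = torch.nn.BCEWithLogitsLoss()


def discriminator_loss(disc, real, fake):
    """The discriminator should predict 1 for real images and 0 for fake ones."""
    real_logits = disc(real)
    # detach: this loss must not push gradients into the generator
    fake_logits = disc(fake.detach())
    return (bce(real_logits, torch.ones_like(real_logits))
            + bce(fake_logits, torch.zeros_like(fake_logits)))


def generator_adv_loss(disc, fake):
    """The generator wants the discriminator to predict 1 for its fakes."""
    fake_logits = disc(fake)
    return bce(fake_logits, torch.ones_like(fake_logits))


# toy stand-ins for the horse discriminator and the zebra -> horse generator
disc_horse = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 8 * 8, 1))
zebra_to_horse = torch.nn.Conv2d(3, 3, kernel_size=3, padding=1)

horse = torch.rand(4, 3, 8, 8)
zebra = torch.rand(4, 3, 8, 8)
fake_horse = zebra_to_horse(zebra)

d_loss = discriminator_loss(disc_horse, horse, fake_horse)
g_loss = generator_adv_loss(disc_horse, fake_horse)
print(d_loss.item(), g_loss.item())
```

The zebra side is symmetric: swap the roles of the two generators and use the zebra discriminator.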

#### Cycle loss


zebra_fake --- zebra_to_horse ---> horse_cyclic --> L2 loss between horse and horse_cyclic

the same goes for zebra

#### Identity loss

zebra --- horse_to_zebra --> the same zebra (L2 loss between input and output)

the same goes for horse
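The cycle and identity terms can be sketched in a few lines. This is a minimal sketch that follows the L2 choice of the notes above (`MSELoss`); the function names and the single-convolution toy generators are assumptions made for illustration only.

```python
import torch

mse = torch.nn.MSELoss()  # L2 loss, as in the notes above


def cycle_loss(gen_forward, gen_backward, real):
    """Translate to the other class and back, then compare with the input."""
    return mse(gen_backward(gen_forward(real)), real)


def identity_loss(gen_to_domain, real_from_domain):
    """A generator fed an image of its own target class should leave it unchanged."""
    return mse(gen_to_domain(real_from_domain), real_from_domain)


# toy stand-ins for the two generators
horse_to_zebra = torch.nn.Conv2d(3, 3, kernel_size=3, padding=1)
zebra_to_horse = torch.nn.Conv2d(3, 3, kernel_size=3, padding=1)

horse = torch.rand(2, 3, 8, 8)
zebra = torch.rand(2, 3, 8, 8)

cyc = (cycle_loss(horse_to_zebra, zebra_to_horse, horse)
       + cycle_loss(zebra_to_horse, horse_to_zebra, zebra))
idt = identity_loss(horse_to_zebra, zebra) + identity_loss(zebra_to_horse, horse)
print(cyc.item(), idt.item())
```

Both terms depend on both generators, which is why each of them shows up in the summed loss of either generator.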

compute all the losses -> 8 ------> 4 losses, one for each model, each being the sum of the losses that involve
 that model

there are 4 optimizers holding the parameters of the 4 models

take the 4 losses in turn:
    - loss.backward()
    - optimizer of interest.step()
    - all_models.zero_grad()
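One training step built from these notes could look like the sketch below. It is a sketch under assumptions, not the reference solution: the models are tiny stand-ins, all names (`g_xy`, `toy_fit_iteration`, ...) are made up, and it deviates from the recipe above in one respect. The four losses share one computation graph, and an optimizer step modifies parameters in place, so calling `backward()` on the next loss after a step typically makes PyTorch raise an error. The sketch therefore computes the gradient of each loss with respect to its own model first, and only then steps all four optimizers.

```python
import torch


def tiny_gen():
    return torch.nn.Sequential(torch.nn.Conv2d(3, 3, kernel_size=3, padding=1), torch.nn.Sigmoid())


def tiny_disc():
    return torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 8 * 8, 1))


# toy stand-ins for the four models, one optimizer per model
g_xy, g_yx, d_x, d_y = tiny_gen(), tiny_gen(), tiny_disc(), tiny_disc()
models = {'g_xy': g_xy, 'g_yx': g_yx, 'd_x': d_x, 'd_y': d_y}
optimizers = {name: torch.optim.Adam(m.parameters(), lr=2e-4, betas=(0.5, 0.999))
              for name, m in models.items()}

bce = torch.nn.BCEWithLogitsLoss()
mse = torch.nn.MSELoss()


def adv(logits, target_is_real):
    target = torch.ones_like(logits) if target_is_real else torch.zeros_like(logits)
    return bce(logits, target)


def toy_fit_iteration(x, y):
    fake_y, fake_x = g_xy(x), g_yx(y)
    cycle = mse(g_yx(fake_y), x) + mse(g_xy(fake_x), y)
    # 8 loss terms grouped into 4 losses, one per model
    losses = {
        'g_xy': adv(d_y(fake_y), True) + cycle + mse(g_xy(y), y),
        'g_yx': adv(d_x(fake_x), True) + cycle + mse(g_yx(x), x),
        'd_x': adv(d_x(x), True) + adv(d_x(fake_x.detach()), False),
        'd_y': adv(d_y(y), True) + adv(d_y(fake_y.detach()), False),
    }
    # gradient of each loss with respect to its own model only
    for name, loss in losses.items():
        params = list(models[name].parameters())
        grads = torch.autograd.grad(loss, params, retain_graph=True)
        for p, g in zip(params, grads):
            p.grad = g
    # step only after every gradient has been computed
    for opt in optimizers.values():
        opt.step()
        opt.zero_grad()
    return {name: loss.item() for name, loss in losses.items()}


step_losses = toy_fit_iteration(torch.rand(2, 3, 8, 8), torch.rand(2, 3, 8, 8))
print(step_losses)
```

Using `torch.autograd.grad` restricted to one model's parameters plays the role of the "zero_grad on all models" step in the notes: a loss never leaves gradients behind on a model it is not meant to update.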

In [36]:
# Smoke test: push one batch through the X -> Y generator
# and check that the output has the same shape as the input.
# X apples
# Y oranges
for apple, orange in train_dl:
    apple = apple.to(device)
    orange = orange.to(device)
    fake_orange = gen_x_y(apple)
    print(apple.shape)
    print(fake_orange.shape)
    break


In [9]:
fit(gen_x_y,gen_y_x, disc_x, disc_y, train_dl)


The rest of the implementation is standard: it trains the model for a few iterations and plots a few translations applied to test images.

In [None]:
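checkpoint_path = "./checkpoints/train"
os.makedirs(checkpoint_path, exist_ok=True)

# store the state_dict of every model in a single file
# (the file name "cyclegan.pt" is arbitrary)
def save_checkpoint(path=os.path.join(checkpoint_path, "cyclegan.pt")):
    torch.save({
        'gen_x_y': gen_x_y.state_dict(),
        'gen_y_x': gen_y_x.state_dict(),
        'disc_x': disc_x.state_dict(),
        'disc_y': disc_y.state_dict(),
    }, path)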

In [None]:
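# fit_iteration(x_batch, y_batch) is the training step
# you implement from the notes above
n_iterations = 500
batches = iter(train_dl)
for i in range(n_iterations):
    try:
        apple, orange = next(batches)
    except StopIteration:
        # the loader is exhausted: start a new pass over the data
        batches = iter(train_dl)
        apple, orange = next(batches)
    fit_iteration(apple.to(device), orange.to(device))
    print(i / n_iterations, end='\r')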

Additionally, you can add Instance Normalization (https://arxiv.org/pdf/1607.08022.pdf) and regularization techniques to your model in order to further improve its performance.
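As a starting point, a convolutional block with instance normalization could look like the sketch below. It uses `torch.nn.InstanceNorm2d` from PyTorch; the helper name `conv_block` and the placement of the normalization between the convolution and the activation are assumptions, not something the lab prescribes.

```python
import torch


def conv_block(in_channels, out_channels):
    """Convolution -> instance normalization -> ReLU."""
    return torch.nn.Sequential(
        torch.nn.Conv2d(in_channels, out_channels, kernel_size=3, padding='same'),
        torch.nn.InstanceNorm2d(out_channels),
        torch.nn.ReLU(),
    )


block = conv_block(3, 64)
x = torch.rand(2, 3, 32, 32)
y = block(x)
print(y.shape)

# instance normalization on its own: every feature map of every sample
# is normalized separately to zero mean and unit variance
normed = torch.nn.InstanceNorm2d(3)(x)
per_map_mean = normed.mean(dim=(2, 3))
per_map_var = normed.var(dim=(2, 3), unbiased=False)
print(per_map_mean.abs().max().item(), per_map_var.mean().item())
```

Unlike batch normalization, the statistics are computed per sample, so the layer behaves the same for any batch size.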

Even without those, provided the implementation has been successful, the following block should yield semi-realistic images, even after training for as few as 500 iterations with a batch size of 1. Try it out!

In [None]:
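import matplotlib.pyplot as plt

# assumption: the test images live in testA/testB next to trainA/trainB
test_ds = Apple2OrangeDataset(
    "../datasets/apple2orange/testA/*",
    "../datasets/apple2orange/testB/*",
)
test_dl = torch.utils.data.DataLoader(test_ds, batch_size=1, shuffle=True)

gen_x_y.eval()
with torch.no_grad():
    for _, (real_x, _) in zip(range(6), test_dl):
        fake_y = gen_x_y(real_x.to(device)).cpu()
        # put the input and its translation side by side (along the width)
        pair = torch.cat([real_x, fake_y], dim=3)[0]
        # matplotlib expects height x width x channels
        plt.imshow(pair.permute(1, 2, 0).numpy())
        plt.show()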