<table>
<td height="150px">
<img src='https://stergioc.github.io/assets/img/logos.png' />
</td>
</table>

# FDL DSBA 2023-2024 Individual Assignment
### **Deadline:** 03/01/2024 @ 23:59
This is an individual assignment. It is graded out of 100 points, and two optional questions are worth up to 20 extra points in total.

Please fill in the blanks between the `****START CODE****` and `****END CODE****` lines in the code cells, and answer the questions in Markdown wherever you see `ANSWER HERE`.

In order to submit, please rename the file to `FDL_Assignment_<first_name>_<last_name>.ipynb` and upload your solution to Edunao after zipping it.

## Section 1. Getting started **[10 pts]**
Answer these theoretical questions in the `ANSWER HERE` cells.

1.1. What is a metric? What is a loss? What is the difference between the two, and how are they used in the training process?

`ANSWER HERE`

1.2. Briefly explain the concept of gradient descent, and how it is used in the training process.

`ANSWER HERE`

1.3. Explain the bias-variance tradeoff problem, and how it is linked with the concept of overfitting and underfitting.

`ANSWER HERE`

1.4. What is an activation function, and why is it necessary in order to stack multiple layers?

`ANSWER HERE`

1.5. Briefly explain the CNN architecture, and why it is better suited to images than a standard MLP.

`ANSWER HERE`

1.6. What is the difference between deep learning and classical machine learning? What are the main advantages of deep learning over more classical techniques?

`ANSWER HERE`

1.7. Discuss the ethical considerations and potential biases that may arise during the training of a deep learning model. How can they be taken into account?

`ANSWER HERE`

1.8. Discuss the ethical implications of deploying deep learning models in critical processes.

`ANSWER HERE`

1.9. List a couple of methods that you can use to help interpret the classifications of your neural network, and provide a brief explanation of how they work.

`ANSWER HERE`

## Section 2. Training a CNN **[40 pts]**
In this section, you will train a CNN on a dataset of [histopathology patches](https://en.wikipedia.org/wiki/Histopathology). This data corresponds to digitized microscopic analysis of tumor tissue, which has been divided into patches. The objective is to classify the patches into the ones containing tumor tissue, and ones not containing any tumor tissue. We will use the [PCAM dataset](https://github.com/basveeling/pcam) which consists of 96x96 pixel patches. We will only use the validation set (which contains 32768 patches and which should take about 0.8 GB of storage) in order to make the training faster.
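
As a rough sanity check on the storage figure above, the size of the uncompressed image array can be estimated from the patch count and patch shape. This sketch assumes that each RGB channel value is stored as a single byte (`uint8`), which is an assumption and not something stated in the assignment; the constant names are illustrative only.

```python
# Back-of-the-envelope size of the uncompressed image array.
# Assumption: each pixel channel is stored as one unsigned byte (uint8).
N_PATCHES = 32768      # patches in the validation split (from the text)
HEIGHT = WIDTH = 96    # patch side in pixels (from the text)
CHANNELS = 3           # RGB

n_bytes = N_PATCHES * HEIGHT * WIDTH * CHANNELS
size_gib = n_bytes / 2**30

print(f"{n_bytes} bytes = {size_gib:.2f} GiB")  # 905969664 bytes = 0.84 GiB
```

Under that assumption the estimate agrees with the "about 0.8 GB" quoted above; the downloaded `.gz` archive will be smaller because it is compressed.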

In [None]:
import h5py
import random
import numpy as np
import torch
import torch.nn as nn
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
from torch.utils.data import Dataset, DataLoader
from torchvision.datasets.utils import download_file_from_google_drive, _decompress

2.1. Download the dataset which is stored in a `.h5` file.
The images can be downloaded from [here](https://drive.google.com/uc?export=download&id=1hgshYGWK8V-eGRy8LToWJJgDU_rXWVJ3), and the labels from [here](https://drive.google.com/uc?export=download&id=1bH8ZRbhSVAhScTS0p9-ZzGnX91cHT3uO). Please then unzip the files and write the paths below. **[1 pt]**

In [None]:
from torchvision.datasets.utils import download_file_from_google_drive, _decompress

# You can run the following cell to download the files on colab
base_folder = "./"
archive_name = "camelyonpatch_level_2_split_valid_x.h5.gz"
download_file_from_google_drive("1hgshYGWK8V-eGRy8LToWJJgDU_rXWVJ3", base_folder, filename=archive_name, md5="d5b63470df7cfa627aeec8b9dc0c066e")
_decompress(base_folder + archive_name)

archive_name = "camelyonpatch_level_2_split_valid_y.h5.gz"
download_file_from_google_drive("1bH8ZRbhSVAhScTS0p9-ZzGnX91cHT3uO", base_folder, filename=archive_name, md5="2b85f58b927af9964a4c15b8f7e8f179")
_decompress(base_folder + archive_name)

In [None]:
# ****START CODE****
IMAGES_PATH =
LABELS_PATH =
# ****END CODE****

In [None]:
images = np.array(h5py.File(IMAGES_PATH)['x'])
labels = np.array([y.item() for y in h5py.File(LABELS_PATH)['y']])

2.2. Now that we have the data, we will want to split it into a training and a validation set. For this, we will write a function which takes in as input the size of the dataset, and which will return the indices of the training set and the indices of the validation set. **[1 pt]**

In [None]:
random.seed(0)

In [None]:
def get_split_indices(dataset_length, train_ratio=0.7):
    """
    Function which splits the data into training and validation sets.
    arguments:
        dataset_length [int]: number of elements in the dataset
        train_ratio [float]: ratio of the dataset in the training set
    returns:
        train_indices [list]: list of indices in the training set (of size dataset_length*train_ratio)
        val_indices [list]: list of indices in the validation set (of size dataset_length*(1-train_ratio))
    """
    # ****START CODE****

    # ****END CODE****

In [None]:
train_indices, val_indices = get_split_indices(len(labels))

2.3. Write the dataset class. Feel free to add any type of data augmentation that you like. Please note that PyTorch (through torchvision) already provides a PCAM dataset class, but we ask you to code this one from scratch. **[2 pt]**

In [None]:
class PCAMDataset(Dataset):
    def __init__(self, data, labels, train):
        """
        Dataset class for the PCAM dataset.
        arguments:
            data [numpy.array]: all RGB 96x96 images
            labels [numpy.array]: corresponding labels
            train [bool]: whether the dataset is training or validation
        """
        super(PCAMDataset, self).__init__()
        self.data = data
        self.labels = labels
        self.train = train

        if self.train:
            # ****START CODE****
            self.augmentation = transforms.Compose([])
            # ****END CODE****

    def __len__(self):
        # ****START CODE****

        # ****END CODE****

    def __getitem__(self, idx):
        # ****START CODE****

        # ****END CODE****

In [None]:
# ****START CODE****
BATCH_SIZE =
# ****END CODE****

In [None]:
train_dataset = PCAMDataset(images[train_indices], labels[train_indices], train=True)
val_dataset = PCAMDataset(images[val_indices], labels[val_indices], train=False)
train_dataloader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True)
val_dataloader = DataLoader(val_dataset, batch_size=BATCH_SIZE, shuffle=False)

2.4. Display a random sample of images that have a label of 0 (not containing any tumor tissue) and 1 (containing tumor tissue). **[2 pt]**

Can you identify the features in a particular image which cause it to be classified as having tumor tissue or not?

(Extra: See if you can display a random sample of images without looking at the label, and then try to classify it as containing tumor tissue or not - then check your answer afterwards)

2.5. Plot the distribution of class labels in the training and validation datasets, to see how well the classes are balanced. **[1 pt]**

`ANSWER HERE`

2.6. Write your model architecture; you can be creative here! Here is a non-exhaustive list of documentation pages you may find useful: [Conv2d](https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html), [Linear](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html), [LayerNorm](https://pytorch.org/docs/stable/generated/torch.nn.LayerNorm.html), [activation functions](https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity). Justify your choice of architecture. **[5 pts]**

In [None]:
class ConvNet(nn.Module):
    def __init__(self):
        super(ConvNet, self).__init__()
        # ****START CODE****

        # ****END CODE****

    def forward(self, x):
        # ****START CODE****

        # ****END CODE****
        return x

In [None]:
model = ConvNet()

2.7. Initialize the training hyperparameters (optimizer, criterion, ...). Code the whole training loop, where the model is validated after each epoch, and where the essential information is output (training and validation loss and metric). For the metric you may want to use the [torchmetrics library](https://lightning.ai/docs/torchmetrics/stable/). **[5 pts]**

In [None]:
# ****START CODE****
lr =
num_epochs =
optimizer =
criterion =
metric =
# ****END CODE****

Train the model and validate it after each epoch. Feel free to use a GPU if you're training on Colab to speed up your training.

In [None]:
# ****START CODE****

# ****END CODE****

2.8. Validate your model, show that it is not overfitting. Justify your choice of metric. Answer this either with code or Markdown (or both). **[3 pts]**

`ANSWER HERE`

2.9. Try to optimize three hyperparameters (the learning rate, the batch size and the number of layers in your CNN model). Does it improve the performance of your model? Answer with a graph and comment on the result. Answer this with code and Markdown. **[8 pts]**

To do so, use Bayesian optimization to find the best set of hyperparameters using the library `scikit-optimize`.

`ANSWER HERE`

In [None]:
!pip install scikit-optimize # Run this cell to install the library in Colab

In [None]:
from skopt import gp_minimize
from skopt.utils import use_named_args

In [None]:
# Retrieve the best set of hyperparameters using bayesian optimization.
# Declare search space for your set of hyperparameters (you may take a look here: https://scikit-optimize.github.io/stable/modules/space.html#space)
# ****START CODE****
dimensions = [] # list of your search spaces
parameters_default_values = [] # default value for each parameter for initialization
# ****END CODE****

In [None]:
# Create a function that takes your set of hyperparameters as input and returns a score to be minimized (choose your scoring function wisely)

@use_named_args(dimensions=dimensions)
def fit_opt():
    # ****START CODE****

    # ****END CODE****
    return score

In [None]:
# Use gp_minimize to retrieve the optimal values (you may take a look here: https://scikit-optimize.github.io/stable/modules/generated/skopt.gp_minimize.html?highlight=gp_minimize#skopt.gp_minimize)

gp_result = gp_minimize(
    # ****START CODE****

    # ****END CODE****
    )

print(f"Optimal set of parameters found at iteration {np.argmin(gp_result.func_vals)}")
print(gp_result.x)

2.10. OPTIONAL QUESTION. Implement a ViT and compare the results obtained in the previous section in a table. **[10 extra pts]**

In [None]:
# ****START CODE****

# ****END CODE****

2.11. Excluding saliency maps, use one other interpretability method that you listed in question 1.9 to investigate how your model made its classifications. **[12 pts]**

How does your chosen method probe the classifications of your model? Do the results make sense?

With respect to the code block below, saliency maps are useful in interpreting the decisions of CNNs. However, they have some limitations. After completing and running the code block below, list some of these limitations, given the results you observe on applying saliency maps to your images.

`ANSWER HERE`

In [None]:
## Code block to use saliency maps

### START CODE
# Choose a particular image and corresponding label in which to investigate the classifications of the network
image =
label =

preprocess = transforms.Compose([
              ]) ### Here put the transforms to be applied

input_tensor = preprocess(image).unsqueeze(0)  # Add batch dimension
### END CODE

# Set the model to evaluation mode
model.eval()

# Set the requires_grad attribute of the input tensor to True for gradients
input_tensor.requires_grad_(True)

# Forward pass to get the model prediction
### START CODE
output =
### END CODE

# Choose the class index for which you want to visualize the saliency map
class_index = torch.argmax(output)

model.zero_grad()

# Backward pass to get the gradients of the output w.r.t the input
output[0, class_index].backward()

# Get the gradients from the input tensor
saliency_map = input_tensor.grad.squeeze(0).abs().cpu().numpy()

# Normalize the saliency map for visualization (optional)
saliency_map = saliency_map / saliency_map.max()

normalized_saliency_map = (saliency_map - saliency_map.min()) / (saliency_map.max() - saliency_map.min())

# Convert the saliency map back to a uint8 image format (0-255)
saliency_map_image = np.uint8(255 * normalized_saliency_map)

# Aggregate across the channels
aggregate_saliency = saliency_map.sum(axis=0)

# Plot the input image and its corresponding saliency map side by side
fig, axes = plt.subplots(1, 2, figsize=(10, 5))

# Plot the input image
axes[0].imshow(image)
axes[0].set_title('Input Image')
axes[0].axis('off')

# Plot the saliency map
axes[1].imshow(aggregate_saliency, cmap='jet', alpha=0.7)  # Overlay saliency map on the input image
axes[1].imshow(image, alpha=0.3)  # Overlay input image for comparison
axes[1].set_title('Saliency Map')
axes[1].axis('off')

## Section 3. Implementing a CycleGAN **[50 pt]**

In this part, we will implement CycleGAN using PyTorch. We will train a model to translate images of apples to oranges and vice versa.

The CycleGAN model is composed of two generators and two discriminators. The generators are responsible for translating images from one domain to another, while the discriminators are responsible for distinguishing between translated images and real images. The generators and discriminators are trained in an adversarial manner, where the generators try to fool the discriminators and the discriminators try to distinguish between real and fake images. You can see an overview of the CycleGAN model in the figure below:
<img src="https://junyanz.github.io/CycleGAN/images/cyclegan_blogs.jpg">

You can refer to the [CycleGAN paper](https://arxiv.org/pdf/1703.10593.pdf) for more information.

You can first retrieve the data by executing the following cells:

In [None]:
!wget https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix/blob/master/datasets/download_cyclegan_dataset.sh?raw=true

In [None]:
!mkdir datasets
!bash ./download_cyclegan_dataset.sh?raw=true apple2orange

### 1) Dataset creation

In [None]:
import os
import gc
import random
import torch
import torch.utils.data as data
import torch.nn as nn
from torchvision import transforms
import matplotlib.pyplot as plt
import numpy as np
import torch.optim as optim
from PIL import Image
from torch.optim import lr_scheduler
from IPython.display import clear_output
from torch.utils.data import Subset

Some useful functions that we will use later:

In [None]:
def denormalize(images, std=0.5, mean=0.5):
    # Undo Normalize(mean, std) so that values are back in [0, 1] for plotting
    images = (images * std) + mean
    return images

def deprocess(input_tensor):
    # Move channels last for matplotlib:
    # (C, H, W) -> (H, W, C) or (N, C, H, W) -> (N, H, W, C)
    images = denormalize(input_tensor.detach().cpu())
    if len(input_tensor.shape) == 3:
        return images.permute(1, 2, 0).numpy()
    elif len(input_tensor.shape) == 4:
        return images.permute(0, 2, 3, 1).numpy()
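
To see why `denormalize` uses `std=0.5` and `mean=0.5`, it helps to check the arithmetic on a few scalar values. The sketch below mirrors, for a single channel value, what `transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))` computes in this section and how `denormalize` inverts it. The helper names `normalize_value` and `denormalize_value` are hypothetical and exist only for this illustration.

```python
def normalize_value(value, mean=0.5, std=0.5):
    # Same arithmetic as transforms.Normalize for one channel value in [0, 1]
    return (value - mean) / std

def denormalize_value(value, std=0.5, mean=0.5):
    # Scalar version of the notebook's denormalize(): undo the normalization
    return value * std + mean

pixels = [0.0, 0.25, 0.5, 1.0]
normalized = [normalize_value(p) for p in pixels]
restored = [denormalize_value(n) for n in normalized]

print(normalized)  # [-1.0, -0.5, 0.0, 1.0]
print(restored)    # [0.0, 0.25, 0.5, 1.0]
```

The normalized range [-1, 1] also matches the output range of the `Tanh` layer at the end of the generator, so generated images can be plotted with the same `deprocess` helper.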

3.1.1. You will now implement a simple dataset class in order to load the images. The dataset class should load the images from the dataset folder and apply the input transformations **[1 pt]**:

In [None]:
class GeneratorDataset(data.Dataset):

    def __init__(self, root_dir, transform=None):
        # ****START CODE****

        # ****END CODE****

    def __len__(self):
        # ****START CODE****

        # ****END CODE****

    def __getitem__(self, idx):
        # ****START CODE****

        # ****END CODE****

We will now create the dataset objects for the training and testing sets:

In [None]:
DATASET = 'apple2orange'
DATASET_PATH = os.path.join("datasets", DATASET) # Dataset path
OUTPUT_PATH = 'outputs'
base_logdir = os.path.join("logs", 'pytorch') # Sets up a log directory.
RESIZE_SHAPE = 128 # Resized image size for faster training
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Create the dataset objects
preprocess_train_transformations = transforms.Compose([
                               transforms.Resize(RESIZE_SHAPE),
                               transforms.RandomHorizontalFlip(p=0.5),
                               transforms.ToTensor(),
                               transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
                           ])

preprocess_test_transformations = transforms.Compose([
                               transforms.Resize(RESIZE_SHAPE),
                               transforms.ToTensor(),
                               transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
                           ])

train_data_X = GeneratorDataset(root_dir=os.path.join(DATASET_PATH, "trainA"),
                           transform=preprocess_train_transformations)

train_data_Y = GeneratorDataset(root_dir=os.path.join(DATASET_PATH, "trainB"),
                           transform=preprocess_train_transformations)

test_data_X = GeneratorDataset(root_dir=os.path.join(DATASET_PATH, "testA"),
                           transform=preprocess_test_transformations)

test_data_Y = GeneratorDataset(root_dir=os.path.join(DATASET_PATH, "testB"),
                           transform=preprocess_test_transformations)

print("Found {} images in {}".format(len(train_data_X), 'trainA'))
print("Found {} images in {}".format(len(train_data_Y), 'trainB'))
print("Found {} images in {}".format(len(test_data_X), 'testA'))
print("Found {} images in {}".format(len(test_data_Y), 'testB'))

In order to speed up the training process, we will use a subset of the training data:

In [None]:
random.seed(2)

N_IMAGES_TO_SAMPLE = 400

indices_X = random.sample(range(len(train_data_X)), N_IMAGES_TO_SAMPLE)
indices_Y = random.sample(range(len(train_data_Y)), N_IMAGES_TO_SAMPLE)

train_data_X = Subset(train_data_X, indices_X)
train_data_Y = Subset(train_data_Y, indices_Y)

### 2) Generator and Discriminator Models

3.2.1. We will now implement the backbone for the generator and the discriminator. You are asked to complete the code for the ResidualBlock of the generator backbone **[4 pt]**:

In [None]:
##############################
#           RESNET
##############################


class ResidualBlock(nn.Module):
    def __init__(self, in_features):
        # ****START CODE****

        # ****END CODE****

    def forward(self, x):
        # ****START CODE****

        # ****END CODE****


class GeneratorResNet(nn.Module):
    def __init__(self, input_channel, n_blocks, filters, output_channel):
        super(GeneratorResNet, self).__init__()

        # Initial convolution block
        model = [
            nn.ReflectionPad2d(3),  # pad by 3 (kernel_size // 2) so the 7x7 conv keeps the spatial size
            nn.Conv2d(input_channel, filters, 7),
            nn.InstanceNorm2d(filters),
            nn.ReLU(inplace=True),
        ]
        in_features = filters

        # Downsampling
        for _ in range(2):
            filters *= 2
            model += [
                nn.Conv2d(in_features, filters, 3, stride=2, padding=1),
                nn.InstanceNorm2d(filters),
                nn.ReLU(inplace=True),
            ]
            in_features = filters

        # Residual blocks
        for _ in range(n_blocks):
            model += [ResidualBlock(filters)]

        # Upsampling
        for _ in range(2):
            filters //= 2
            model += [
                nn.Upsample(scale_factor=2),
                nn.Conv2d(in_features, filters, 3, stride=1, padding=1),
                nn.InstanceNorm2d(filters),
                nn.ReLU(inplace=True),
            ]
            in_features = filters

        # Output layer
        model += [nn.ReflectionPad2d(3), nn.Conv2d(filters, output_channel, 7), nn.Tanh()]  # pad 3 matches the 7x7 kernel

        self.model = nn.Sequential(*model)

    def forward(self, x):
        return self.model(x)


##############################
#        Discriminator
##############################


class Discriminator(nn.Module):
    def __init__(self, input_channel, filters):
        super(Discriminator, self).__init__()

        def discriminator_block(in_filters, out_filters, normalize=True):
            """Returns downsampling layers of each discriminator block"""
            layers = [nn.Conv2d(in_filters, out_filters, 4, stride=2, padding=1)]
            if normalize:
                layers.append(nn.InstanceNorm2d(out_filters))
            layers.append(nn.LeakyReLU(0.2, inplace=True))
            return layers

        self.model = nn.Sequential(
            *discriminator_block(input_channel, filters, normalize=False),
            *discriminator_block(filters, filters * 2),
            *discriminator_block(filters * 2, filters * 4),
            *discriminator_block(filters * 4, filters *8),
            nn.ZeroPad2d((1, 0, 1, 0)),
            nn.Conv2d(filters *8, 1, 4, padding=1)
        )

    def forward(self, img):
        return self.model(img)

We will now instantiate the generator and discriminator models:

In [None]:
G_XtoY = GeneratorResNet(input_channel=3, output_channel=3, filters=64, n_blocks=9).to(device)
G_YtoX = GeneratorResNet(input_channel=3, output_channel=3, filters=64, n_blocks=9).to(device)

In [None]:
Dx = Discriminator(input_channel=3, filters=64).to(device)
Dy = Discriminator(input_channel=3, filters=64).to(device)
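
It can be useful to know the spatial size of what these models return before writing the losses. The sketch below applies the standard convolution output-size formula to the layers defined above, without instantiating any model. It assumes square 128x128 inputs (`RESIZE_SHAPE`) and that each `ResidualBlock` preserves the spatial size, which depends on how you implement it; `conv_out` is a hypothetical helper written for this illustration.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of a convolution along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1

INPUT_SIZE = 128  # assumption: square inputs of side RESIZE_SHAPE

# Discriminator: four stride-2 blocks, an asymmetric zero pad, then a final conv
disc_size = INPUT_SIZE
for _ in range(4):
    disc_size = conv_out(disc_size, kernel=4, stride=2, padding=1)
disc_size += 1  # nn.ZeroPad2d((1, 0, 1, 0)) adds one pixel on the left and on the top
disc_size = conv_out(disc_size, kernel=4, stride=1, padding=1)

# Generator: 7x7 conv after reflection padding of 3, two stride-2 convs,
# (size-preserving residual blocks), two 2x upsamplings, final padded 7x7 conv
gen_size = conv_out(INPUT_SIZE + 2 * 3, kernel=7)
for _ in range(2):
    gen_size = conv_out(gen_size, kernel=3, stride=2, padding=1)
for _ in range(2):
    gen_size = conv_out(gen_size * 2, kernel=3, stride=1, padding=1)
gen_size = conv_out(gen_size + 2 * 3, kernel=7)

print(disc_size, gen_size)  # 8 128
```

Under these assumptions each discriminator returns an 8x8 grid of scores per image, one per image patch, so the real and fake labels used in the losses should have (or broadcast to) that shape, while the generators return images of the same size as their input.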

3.2.2. You will now implement a function to randomly initialize the weights for all the convolutional layers in the generators and the discriminators with values sampled from a normal distribution (mean=0.0, std=0.02), and initialize their bias to 0.0. You need to complete the code of the *weights_init_normal* function and apply it to the models **[5 pt]**:

In [None]:
def weights_init_normal(m):
    # ****START CODE****

    # ****END CODE****

# ============================
# Initialize the values of the models
# ============================
# Initialize the values of the two generators
# ****START CODE****

# ****END CODE****
# Initialize the values of the two discriminators
# ****START CODE****

# ****END CODE****

### 3) Training

We will define the hyperparameters used for training our CycleGAN model. The model should run on the T4 GPU provided by Google Colab. You may need to adjust the batch size to fit the model on other GPUs:

In [None]:
BATCH_SIZE = 10
EPOCHs = 30
SAVE_EVERY_N_EPOCH = 5
LR = 0.0002
BETAS = (0.5, 0.999)

We will now define the data loaders for the training and testing sets:

In [None]:
train_image_loader_X = torch.utils.data.DataLoader(train_data_X, batch_size=BATCH_SIZE,
                                                    shuffle=True, num_workers=0)
train_image_loader_Y = torch.utils.data.DataLoader(train_data_Y, batch_size=BATCH_SIZE,
                                         shuffle=True, num_workers=0)
test_image_loader_X = torch.utils.data.DataLoader(test_data_X, batch_size=BATCH_SIZE,
                                         shuffle=False, num_workers=0)
test_image_loader_Y = torch.utils.data.DataLoader(test_data_Y, batch_size=BATCH_SIZE,
                                         shuffle=False, num_workers=0)

We will now extract some images from the test set to visualize the model's performance during training:

In [None]:
id_sample_X = np.where(test_data_X.filenames == "n07740461_11391.jpg")[0][0]
id_sample_Y = np.where(test_data_Y.filenames == "n07749192_10081.jpg")[0][0]

sample_X = test_data_X[id_sample_X]
sample_Y = test_data_Y[id_sample_Y]

In [None]:
plt.subplot(121)
plt.title('X')
plt.imshow(deprocess(sample_X))

In [None]:
plt.subplot(121)
plt.title('Y')
plt.imshow(deprocess(sample_Y))

3.3.1. You will now define the optimizers and schedulers for the generator and discriminator models **[3 pts]**:

In [None]:
# ****START CODE****

# ****END CODE****

We will now implement the different loss functions used in CycleGANs:

In [None]:
SOFT_FAKE_LABEL_RANGE = [0.0, 0.3] # Soft labels for fake images are sampled uniformly within this range.
SOFT_REAL_LABEL_RANGE = [0.7, 1.2] # Soft labels for real images are sampled uniformly within this range.

The discriminator loss is defined by:
\begin{equation}
\mathcal{L}_{D} = \frac{1}{2} (\mathbb{E}_{y \sim p_{data}(y)}[(D_Y(y) - r_2)^2] + \mathbb{E}_{x \sim p_{data}(x)}[(D_Y(G_{XY}(x))-r_1)^2]) + \frac{1}{2} (\mathbb{E}_{x \sim p_{data}(x)}[(D_X(x) - r_2)^2] + \mathbb{E}_{y \sim p_{data}(y)}[(D_X(G_{YX}(y))-r_1)^2])
\end{equation}
with $p_{data}(x)$ being the distribution of images from the first domain, $p_{data}(y)$ being the distribution of images from the second domain, $G_{XY}$ and $G_{YX}$ being the two generators, $D_X$ and $D_Y$ the two discriminators, and $r_1$ and $r_2$ being the soft fake and real labels chosen from a uniform distribution within the ranges $[0.0, 0.3]$ and $[0.7, 1.2]$ respectively.
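
To make the formula concrete, here is a minimal pure-Python sketch of one bracketed term of $\mathcal{L}_{D}$ (the part for a single discriminator), evaluated on a handful of toy scores instead of tensors. The function name `lsgan_discriminator_term` and the toy scores are hypothetical. The sketch draws one soft label pair per call; drawing a separate label for every sample is an equally plausible reading of the definition.

```python
import random

SOFT_FAKE_LABEL_RANGE = [0.0, 0.3]
SOFT_REAL_LABEL_RANGE = [0.7, 1.2]

def lsgan_discriminator_term(real_scores, fake_scores, rng):
    """Least-squares term for one discriminator, on plain lists of scores."""
    r1 = rng.uniform(*SOFT_FAKE_LABEL_RANGE)  # soft fake label
    r2 = rng.uniform(*SOFT_REAL_LABEL_RANGE)  # soft real label
    # Empirical means stand in for the two expectations
    real_term = sum((s - r2) ** 2 for s in real_scores) / len(real_scores)
    fake_term = sum((s - r1) ** 2 for s in fake_scores) / len(fake_scores)
    return 0.5 * (real_term + fake_term)

rng = random.Random(0)  # seeded only to make the toy example reproducible
# Toy scores: D is fairly confident on real inputs and on generated inputs
loss = lsgan_discriminator_term([0.9, 1.0, 0.8], [0.1, 0.2, 0.0], rng)
print(loss)
```

The loss is small when the scores on real images are close to $r_2$ and the scores on generated images are close to $r_1$. The soft labels keep the discriminator from being pushed towards exactly 0 and 1.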

3.3.2. You will now implement the discriminator loss function **[4 pts]**:

In [None]:
def discriminator_loss(real_image, generated_image):
    # ****START CODE****

    # ****END CODE****
    return loss

The generator loss is defined by:
\begin{equation}
\mathcal{L}_{G} = \mathbb{E}_{x \sim p_{data}(x)}[(D_Y(G_{XY}(x)) - r_2)^2] + \mathbb{E}_{y \sim p_{data}(y)}[(D_X(G_{YX}(y)) - r_2)^2]
\end{equation}
with $p_{data}(x)$ being the distribution of images from the first domain, $p_{data}(y)$ being the distribution of images from the second domain, $G_{XY}$ and $G_{YX}$ being the two generators, $D_X$ and $D_Y$ the two discriminators, and $r_2$ being the soft real label chosen from a uniform distribution within the range $[0.7, 1.2]$.

3.3.3. You will now implement the generator loss function for one domain **[4 pts]**:

In [None]:
def generator_loss(generated_image):
    # ****START CODE****

    # ****END CODE****
    return loss

In addition to the traditional loss functions used in GANs, CycleGANs also use two additional loss functions: cycle consistency loss and identity loss. We will use the same $\lambda$ for the two losses.

In [None]:
LAMBDA = 10

The cycle consistency loss is defined by:
\begin{equation}
\mathcal{L}_{cyc} = \lambda\mathbb{E}_{x \sim p_{data}(x)}[||x - G_{YX}(G_{XY}(x))||_1] + \lambda\mathbb{E}_{y \sim p_{data}(y)}[||y - G_{XY}(G_{YX}(y))||_1]
\end{equation}
with $p_{data}(x)$ being the distribution of images from the first domain, $p_{data}(y)$ being the distribution of images from the second domain, $G_{XY}$ and $G_{YX}$ being the two generators and $\lambda$ being the weight for the cycle consistency loss.
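
A small worked example may help to check the arithmetic of this loss. The sketch below works on flattened toy "images" stored as Python lists. It uses the mean absolute difference, which is the L1 norm divided by the number of pixels (the default behaviour of `nn.L1Loss`); that averaging is an assumption, and `l1_mean` and the toy values are hypothetical.

```python
LAMBDA = 10  # same weight as in the notebook

def l1_mean(a, b):
    # Mean absolute difference between two equally sized flat lists of pixel values
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)

x = [0.0, 0.5, -0.5, 1.0]           # toy real image from domain X, flattened
x_cycled = [0.25, 0.5, -0.25, 0.5]  # toy reconstruction G_YX(G_XY(x))
y = [1.0, -1.0]                     # toy real image from domain Y
y_cycled = [1.0, -1.0]              # perfect reconstruction G_XY(G_YX(y))

cycle_loss = LAMBDA * l1_mean(x, x_cycled) + LAMBDA * l1_mean(y, y_cycled)
print(cycle_loss)  # 2.5
```

Only the imperfect reconstruction of `x` contributes here: its mean absolute error is 0.25, which is scaled by $\lambda = 10$, while the perfectly reconstructed `y` adds nothing.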

3.3.4. You will now implement the cycle consistency loss function **[4 pts]**:

In [None]:
def cycle_consistency_loss(real_image, cycled_image):
    # ****START CODE****

    # ****END CODE****
    return loss

The identity loss is defined by:
\begin{equation}
\mathcal{L}_{id} = \frac{1}{2}\lambda\mathbb{E}_{x \sim p_{data}(x)}[||G_{YX}(x) - x||_1] + \frac{1}{2}\lambda\mathbb{E}_{y \sim p_{data}(y)}[||G_{XY}(y) - y||_1]
\end{equation}
with $p_{data}(x)$ being the distribution of images from the first domain, $p_{data}(y)$ being the distribution of images from the second domain, $G_{XY}$ and $G_{YX}$ being the two generators, and $\lambda$ being the weight for the identity loss.

3.3.5. You will now implement the identity loss function **[4 pts]**:

In [None]:
def identity_loss(real_image, generated_image):
    # ****START CODE****

    # ****END CODE****
    return loss

The total generator loss is defined by:
\begin{equation}
\mathcal{L}_{G_{tot}} = \mathcal{L}_{G} + \mathcal{L}_{cyc} + \mathcal{L}_{id}
\end{equation}
with $\mathcal{L}_{G}$ being the adversarial generator loss summed over the two domains, $\mathcal{L}_{cyc}$ being the cycle consistency loss and $\mathcal{L}_{id}$ being the identity loss.

We will now set the checkpoint path for saving the model:

In [None]:
checkpoint_path = os.path.join("checkpoints", 'pytorch', DATASET, )

if not os.path.exists(checkpoint_path):
    os.makedirs(checkpoint_path)

def save_training_checkpoint(epoch):
    state_dict = {
    'G_XtoY':G_XtoY.state_dict(),
    'G_YtoX':G_YtoX.state_dict(),
    'Dx':Dx.state_dict(),
    'Dy':Dy.state_dict(),
    'G_XtoY_optimizer':G_XtoY_optimizer.state_dict(),
    'G_YtoX_optimizer':G_YtoX_optimizer.state_dict(),
    'Dx_optimizer':Dx_optimizer.state_dict(),
    'Dy_optimizer':Dy_optimizer.state_dict(),
    'epoch': epoch
    }

    save_path = os.path.join(checkpoint_path, 'training-checkpoint')
    torch.save(state_dict, save_path)

# if a checkpoint exists, restore the latest checkpoint.
if os.path.isfile(os.path.join(checkpoint_path, 'training-checkpoint')):
    checkpoint = torch.load(os.path.join(checkpoint_path, 'training-checkpoint'))
    G_XtoY.load_state_dict(checkpoint['G_XtoY'])
    G_YtoX.load_state_dict(checkpoint['G_YtoX'])
    Dx.load_state_dict(checkpoint['Dx'])
    Dy.load_state_dict(checkpoint['Dy'])
    G_XtoY_optimizer.load_state_dict(checkpoint['G_XtoY_optimizer'])
    G_YtoX_optimizer.load_state_dict(checkpoint['G_YtoX_optimizer'])
    Dx_optimizer.load_state_dict(checkpoint['Dx_optimizer'])
    Dy_optimizer.load_state_dict(checkpoint['Dy_optimizer'])
    CURRENT_EPOCH = checkpoint['epoch']
    print ('Latest checkpoint of epoch {} restored!!'.format(CURRENT_EPOCH))

3.3.6. You will now implement a function that uses a trained generator to translate a given *test_input* image into the other domain **[2 pts]**:

In [None]:
def generate_images(model, test_input):
    # ****START CODE****

    # ****END CODE****

3.3.7. You will now complete the following code to perform the training process **[10 pts]**:

In [None]:
import time

training_steps = np.ceil((min(len(train_data_X), len(train_data_Y)) / BATCH_SIZE)).astype(int)

for epoch in range(1, EPOCHs + 1):
    start = time.time()
    print('Start of epoch %d' % (epoch,))
    # Reset dataloader
    iter_train_image_X = iter(train_image_loader_X)
    iter_train_image_Y = iter(train_image_loader_Y)
    # Initialize losses
    G_XtoY_loss_mean = 0
    G_YtoX_loss_mean = 0
    Dx_loss_mean = 0
    Dy_loss_mean = 0
    for step in range(training_steps):

        real_image_X = next(iter_train_image_X).to(device)
        real_image_Y = next(iter_train_image_Y).to(device)

        # ============================
        # Compute the discriminator loss
        # ============================
        # Generate fake images for discriminators
        # ****START CODE****

        # ****END CODE****


        # Compute the discriminator loss using the latest fake images
        # ****START CODE****

        # ****END CODE****

        # ============================
        # Update discriminators
        # ============================
        # ****START CODE****

        # ****END CODE****

        # ============================
        # Compute the generator loss
        # ============================
        # Generate fake images for generators
        # ****START CODE****

        # ****END CODE****

        # Compute the generator loss using the latest fake images
        # ****START CODE****

        # ****END CODE****

        # ============================
        # Compute the cycle consistency loss
        # ============================
        # Generate cycled images using the latest fake images
        # ****START CODE****

        # ****END CODE****

        # Compute the cycle consistency loss using the latest cycled images
        # ****START CODE****

        # ****END CODE****

        # ============================
        # Compute the identity loss
        # ============================
        # Generate identity images by feeding each generator a real image from its target domain
        # ****START CODE****

        # ****END CODE****
        # Compute the identity loss using the latest identity images
        # ****START CODE****

        # ****END CODE****

        # ============================
        # Combine all generator losses
        # ============================
        # ****START CODE****

        # ****END CODE****
        # ============================
        # Update generators
        # ============================
        # ****START CODE****

        # ****END CODE****

        # Add losses
        # ****START CODE****

        # ****END CODE****

        if step % 10 == 0:
            print ('.', end='')

    clear_output(wait=True)
    # ============================
    # Print loss values at the end of an epoch
    # ============================
    # ****START CODE****

    # ****END CODE****

    # ============================
    # Using consistent images (sample_X and sample_Y), plot the progress of the training using both generators
    # ============================
    # ****START CODE****

    # ****END CODE****

    # ============================
    # Save the checkpoint for every SAVE_EVERY_N_EPOCH epoch
    # ============================
    if epoch % SAVE_EVERY_N_EPOCH == 0:
        # ****START CODE****

        # ****END CODE****
        print ('Saving checkpoint for epoch {} at {}'.format(epoch,
                                                             checkpoint_path))

    print ('Time taken for epoch {} is {} sec\n'.format(epoch,
                                                      time.time()-start))
    gc.collect()

### 4) Testing

3.4.1. You will now generate images using the trained models. What do you observe in the generated images? Discuss any anomalies and suggest possible solutions **[3 pts]**.

In [None]:
# ****START CODE****

# ****END CODE****

### 5) Theory Questions **[6 pts]** (2 pts each):
5.1. What are the key differences between a traditional GAN and a CycleGAN?



`ANSWER HERE`

5.2. How does the architecture of a CycleGAN ensure the preservation of key features in the image translation process?

`ANSWER HERE`

5.3. Discuss the role of cycle consistency loss in CycleGAN. Why is it important?

`ANSWER HERE`

### 6) Optional Question
6.1. Implement an additional feature in the CycleGAN model or play with the parameters to improve the performance of the model. **[10 extra pts]**

It may include:
*   Changing the backbone architecture for the generator and/or the discriminator (Unet style, ...)
*   Changing the hyperparameters
*   Adding new losses

Illustrate the results you get with these improvements compared to the original implementation. Every implementation choice must be properly justified (in a few lines), or no points will be awarded.

In [None]:
# ****START CODE****

# ****END CODE****