# Notebook MLA Project - Fader Networks

**Authors:** Adrien PETARD, Robin LEVEQUE, Théo MAGOUDI, Eliot CHRISTON

**Group:** 11


# Table of Contents
1. [Introduction](#1-introduction)
2. [Imports](#2-imports)
3. [Annotations](#3-annotations)
    - [Identity](#identity)
    - [Attributes](#attributes)
    - [Bounding Boxes](#bounding-boxes)
    - [Landmarks](#landmarks)
4. [Evaluation](#4-evaluation)
5. [Images](#5-images)
6. [Annexes](#6-annexes)
    - [6.1 How to connect the jupyter server and Vscode, and how to use/understand it:](#61-how-to-connect-the-jupyter-server-and-vscode-and-how-to-useunderstand-it)
    - [6.2 Our point of comparison is the Fader Networks paper (https://arxiv.org/pdf/1706.00409.pdf)](#62-our-point-of-comparison-is-the-fader-networks-paper-httpsarxivorgpdf170600409pdf)
    - [6.3 Train your own models](#63-train-your-own-models)
        - [6.3.1 Train a classifier](#631-train-a-classifier)
        - [6.3.2 Train a Fader Network](#632-train-a-fader-network)


## 1. Introduction

The goal of this notebook is to explore the CelebA dataset and to understand how it is structured.

Here is the link to the dataset: [https://mmlab.ie.cuhk.edu.hk/projects/CelebA.html](https://mmlab.ie.cuhk.edu.hk/projects/CelebA.html)

The data is divided into 3 folders:
- <u>**Anno** (`annotation`):</u> contains the annotations of the dataset
- <u>**Eval** (`evaluation`):</u> contains the evaluation files of the dataset
- <u>**Img** (`images`):</u> contains the images of the dataset; here we use the aligned and cropped images


___
___
## 2. Imports

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

___
___
## 3. Annotations

___
#### Identity

The identity is a number representing the person in the image. Each person has a unique identity number but can appear in multiple images.

In [2]:
%cd ../..
identity = pd.read_csv('data/Anno/identity_CelebA.txt', sep=" ", header=None, index_col=0)
identity.columns = ["identity_id"]
identity.index.name = "image_id"

print("The shape of the identity dataframe is: ", identity.shape)
display(identity.head())


In [None]:
# here we sort the identities according to their frequency
identity_counts = identity.groupby("identity_id").size().reset_index(name="count") # group by identity_id and count the number of occurrences
identity_counts = identity_counts.sort_values(by="count", ascending=False) # sort by count in descending order
identity_counts = identity_counts.reset_index(drop=True) # reset the index

print("Number of identities: {}".format(len(identity_counts)), '\n')

print("Most frequent identities: ")
display(identity_counts.head()) # first 5 (default) most frequent identities
print("Least frequent identities: ")
display(identity_counts.tail()) # last 5 (default) least frequent identities

In [None]:
# plot the distribution of appearances of identities

identity_counts["count"].plot.hist(
    bins=100, 
    figsize=(10, 5), 
    title="Distribution of appearances of identities",
    xlabel="Number of appearances",
    ylabel="Number of identities",
    grid=True,
)

The most common number of appearances per identity in the dataset is about 30.
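As a rough illustration of the counting step above, the following sketch applies `value_counts` to a tiny made-up identity table and reads off the most common number of appearances. The file names and identity numbers are invented for the example; they are not taken from CelebA.

```python
import pandas as pd

# Toy stand-in for identity_CelebA.txt: image file name -> identity number (made-up values)
toy_identity = pd.DataFrame(
    {"identity_id": [7, 7, 7, 3, 3, 9, 9, 5]},
    index=pd.Index([f"{i:06d}.jpg" for i in range(1, 9)], name="image_id"),
)

# Number of images per identity, most frequent first
appearances = toy_identity["identity_id"].value_counts()

# How many identities share each appearance count (this is what the histogram shows)
count_distribution = appearances.value_counts().sort_index()

# Most common number of appearances per identity
typical_count = int(count_distribution.idxmax())

print(appearances.to_dict())
print(count_distribution.to_dict())
print(typical_count)
```

On the real dataframe, the same three steps can be applied to `identity["identity_id"]`.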

___
#### Attributes

In [None]:
attributes = pd.read_csv('data/Anno/list_attr_celeba.txt', sep=" ", header=1, index_col=0)
attributes.index.name = "image_id"
print("The shape of the attributes dataframe is: ", attributes.shape)
print("Number of attributes: {}".format(len(attributes.columns)))
display(attributes.head()) # first 5 (default) attributes

In [None]:
binary_attributes = attributes.replace(to_replace=-1, value=0) # replace -1 with 0
attribute_counts = binary_attributes.sum(axis=0).sort_values(ascending=False) # sum each column and sort


%matplotlib inline

plt.figure(figsize=(10, 5))
plt.bar(range(len(attribute_counts)), attribute_counts, color="orange", alpha=0.5)
plt.xlabel("Attribute")
plt.ylabel("Count")
# on the x axis, we need to label the bars with the attribute names
plt.xticks(range(len(attribute_counts)), attribute_counts.index, rotation=90)
plt.grid(True, axis="y", alpha=0.5)
plt.title("Attribute Counts")
plt.show()
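The cell above converts the -1 / 1 labels to 0 / 1 before summing them. As a small sketch of the same idea on made-up data (the three attribute columns and their values are invented for the example), the mean of the 0 / 1 columns gives the share of images that carry each attribute:

```python
import pandas as pd

# Toy attribute table using the same -1 / 1 encoding (made-up values)
toy_attributes = pd.DataFrame(
    {
        "Male": [1, -1, -1, 1],
        "Smiling": [1, 1, -1, 1],
        "Eyeglasses": [-1, -1, -1, 1],
    },
    index=pd.Index([f"{i:06d}.jpg" for i in range(1, 5)], name="image_id"),
)

# Replace -1 with 0 so that each column becomes a 0 / 1 indicator
toy_binary = toy_attributes.replace(-1, 0)

# The mean of a 0 / 1 column is the fraction of images with that attribute
positive_rate = toy_binary.mean().sort_values(ascending=False)

print(positive_rate.to_dict())
```

Rates are easier to compare than raw counts when the number of images changes, for example between the train and test partitions.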

___
#### Bounding Boxes

The bounding boxes are the coordinates of the face in the image. They are represented by 4 numbers: x, y, width and height.

In [None]:
bounding_boxes = pd.read_csv('data/Anno/list_bbox_celeba.txt', sep=" ", header=1, index_col=0)
bounding_boxes.index.name = "image_id"
print("The shape of the bounding boxes dataframe is: ", bounding_boxes.shape)
display(bounding_boxes.head()) # first 5 (default) bounding boxes
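To make the (x, y, width, height) representation more concrete, here is a minimal sketch that converts a box to its corner coordinates. It assumes that (x, y) is the top-left corner of the box, in pixels; the `BoundingBox` type, the helper names and the values are illustrative and not part of the dataset tools.

```python
from typing import NamedTuple


class BoundingBox(NamedTuple):
    x: int       # left edge, in pixels (assumed to be the top-left corner)
    y: int       # top edge, in pixels
    width: int
    height: int


def to_corners(box):
    """Return (left, top, right, bottom) for an (x, y, width, height) box."""
    return box.x, box.y, box.x + box.width, box.y + box.height


def box_area(box):
    """Area of the box in pixels."""
    return box.width * box.height


box = BoundingBox(x=95, y=71, width=226, height=313)  # illustrative values
corners = to_corners(box)
area = box_area(box)

print(corners)
print(area)
```

The corner form is the one expected by most cropping functions, which usually take (left, top, right, bottom).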

___
#### Landmarks

The landmarks are the coordinates of 5 points on the face: left eye, right eye, nose, left corner of the mouth and right corner of the mouth.

There are 2 files for the landmarks: `list_landmarks_align_celeba.txt` and `list_landmarks_celeba.txt`. The first one contains the landmarks for the aligned images and the second one for the non-aligned ("in the wild") images.

In [None]:
landmarks_aligned = pd.read_csv('data/Anno/list_landmarks_align_celeba.txt', sep=" ", header=1, index_col=0)
landmarks_aligned.index.name = "image_id"
print("The shape of the landmarks_aligned dataframe is: ", landmarks_aligned.shape)
display(landmarks_aligned.head()) # first 5 (default) landmarks

In [None]:
landmarks_wild = pd.read_csv('data/Anno/list_landmarks_celeba.txt', sep=" ", header=1, index_col=0)
landmarks_wild.index.name = "image_id"
print("The shape of the landmarks_wild dataframe is: ", landmarks_wild.shape)
display(landmarks_wild.head()) # first 5 (default) landmarks
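As a small sketch of how a landmark row can be used, the code below splits a flat row of 10 values into five named (x, y) points and measures the distance between the eyes. It assumes that the values are stored as interleaved x, y pairs in the order listed above (even positions for x, odd positions for y); the point names, the helper and the coordinates are made up for the example.

```python
import math

# Assumed order of the five points in a landmark row
POINT_NAMES = ["left eye", "right eye", "nose", "left mouth corner", "right mouth corner"]


def split_landmarks(row):
    """Turn a flat [x0, y0, x1, y1, ...] row into a {name: (x, y)} dictionary."""
    xs = row[0::2]  # even positions: x coordinates
    ys = row[1::2]  # odd positions: y coordinates
    return dict(zip(POINT_NAMES, zip(xs, ys)))


row = [69, 109, 106, 113, 77, 142, 73, 152, 108, 154]  # illustrative values
points = split_landmarks(row)

# Euclidean distance between the two eyes, in pixels
eye_distance = math.dist(points["left eye"], points["right eye"])

print(points)
print(round(eye_distance, 2))
```

The inter-eye distance is a common way to normalize other facial measurements so that they do not depend on the image scale.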

___
## 4. Evaluation

The dataset is divided into 3 parts: train, validation and test. The train set is used to fit the model, the validation set to monitor and tune it during training, and the test set to evaluate the final model.

The split is defined in the `list_eval_partition.txt` file. Each line contains an image file name and its partition number (0 for train, 1 for validation and 2 for test).

In [None]:
# the file has no header row, so header=None keeps the first image in the dataframe
list_eval_partition = pd.read_csv('data/Eval/list_eval_partition.txt', sep=" ", header=None, index_col=0)
list_eval_partition.index.name = "image_id"
list_eval_partition.columns = ["partition"]
print("The shape of the list_eval_partition dataframe is: ", list_eval_partition.shape)
display(list_eval_partition.head()) # first 5 (default) partitions

In [None]:
n_train = len(list_eval_partition[list_eval_partition["partition"] == 0])
n_val = len(list_eval_partition[list_eval_partition["partition"] == 1])
n_test = len(list_eval_partition[list_eval_partition["partition"] == 2])

print("Number of training images: {}".format(n_train))
print("Number of validation images: {}".format(n_val))
print("Number of test images: {}".format(n_test))

# plot a pie chart of the partition sizes
plt.figure(figsize=(5, 5))
plt.pie(
    [n_train, n_val, n_test], 
    labels=["Train", "Validation", "Test"], 
    autopct='%1.1f%%',
    startangle=90,
    textprops={'fontsize': 14}
)
plt.title("Partition Distribution")
plt.show()
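The partition codes can also be turned into readable split names and used to separate the image lists. The following is a minimal sketch on a made-up table; the file names, the `PARTITION_NAMES` mapping and the `split` column are introduced for the example only.

```python
import pandas as pd

# Meaning of the partition codes, as described above
PARTITION_NAMES = {0: "train", 1: "validation", 2: "test"}

# Toy stand-in for list_eval_partition.txt (made-up values)
toy_partition = pd.DataFrame(
    {"partition": [0, 0, 0, 0, 1, 2]},
    index=pd.Index([f"{i:06d}.jpg" for i in range(1, 7)], name="image_id"),
)

# Add a readable split name next to the numeric code
toy_partition["split"] = toy_partition["partition"].map(PARTITION_NAMES)

# List of image file names per split
splits = {name: group.index.tolist() for name, group in toy_partition.groupby("split")}

# Number of images per split
split_sizes = toy_partition["split"].value_counts().to_dict()

print(splits)
print(split_sizes)
```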

___
## 5. Images

The Img folder contains all the images of the dataset. Let's take a look at some of them.

In [None]:
nb_images = 10
fig, axes = plt.subplots(3, nb_images, figsize=(20, 8))
list_random_images = list_eval_partition.sample(nb_images).index
for i, image_id in enumerate(list_random_images):
    img = plt.imread("data/Img/" + image_id)
    axes[0][i].imshow(img)
    axes[0][i].axis("off")
    axes[0][i].set_title(image_id)
    
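    # get the attributes of the image and display them
    # (a separate name is used so that the `attributes` dataframe is not overwritten)
    image_attributes = binary_attributes.loc[image_id]
    list_attributes = image_attributes[image_attributes == 1].index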
    
    axes[1][i].axis("off")
    axes[1][i].text(0, 0.5, "\n".join(list_attributes), fontsize=8, ha="left", va="center")
    
    # adding identity
    identity_id = identity.loc[image_id].values[0]
    axes[2][i].imshow(img)
    axes[2][i].axis("off")
    axes[2][i].set_title("identity_id: {}".format(identity_id))
    
    # adding landmarks (positional slicing: even positions are x, odd positions are y)
    landmarks = landmarks_aligned.loc[image_id]
    axes[2][i].scatter(landmarks.iloc[0::2], landmarks.iloc[1::2], marker="+", c="green")
    
    
plt.show()

___
## 6. Annexes

### 6.1 How to connect the jupyter server and Vscode, and how to use/understand it:

The full instructions for connecting a notebook to the university's server are available here: https://code.visualstudio.com/docs/datascience/jupyter-notebooks

In summary, the steps are:
- Click on the kernel currently in use, in the top right corner of the VS Code window
- Choose "Select another kernel"
- Enter the required connection information:

URL:    https://sdi.ppi.ingenierie.upmc.fr/gpu11

ID:     notregroupe

PWD:    notregroupe


When you create a notebook in VS Code, you can choose the server it connects to. Connecting to the university server gives access to the images without downloading them, while still working in our preferred environment. We can then create notebooks, or Python files that the notebook executes, which is instructive and makes it easier to share information and progress.


In [None]:
# Using the % or ! in front of a command that we would usually use in the terminal, we can execute it directly in the notebook:
%cd
!ls
%ls

# Now we are going to get the files we are interested in:
%cd tests/MLA_Projet_2023/
%ls

### 6.2 Our point of comparison is the Fader Networks paper (https://arxiv.org/pdf/1706.00409.pdf)

In [None]:
# Move to the Fader Networks repository and inspect its data folder before preprocessing the images:

%cd
%ll
print("\nstart point:")
%cd tests/MLA_Projet_2023/FaderNetworks_NIPS2017/
# !pip install -r requirements.txt  #Uncomment this line if you don't have the required packages installed

print("\nthe data we are going to use for this part:")
%cd data
%ls

# UNCOMMENT THIS PART IF YOU NEED TO GENERATE THE PREPROCESSED DATA
# !chmod +x preprocess.py # make the script executable
# !./preprocess.py

# The preprocess.py script (visible in VS Code) preprocesses the images:

# It resizes the images and creates 2 files: images_256_256.pth and attributes.pth.
# The first one contains a tensor of size (202599, 3, 256, 256), the concatenation of all resized images. Note that you can change the image size in preprocess.py to work with different resolutions.
# The second file is a preprocessed version of the attributes.

In [None]:
%cd
%cd tests/MLA_Projet_2023/FaderNetworks_NIPS2017/models
%ls

# UNCOMMENT THIS PART IF YOU WANT TO DOWNLOAD THE PRETRAINED MODELS
# !chmod +x download.sh
# !./download.sh

/home/notregroupe
/home/notregroupe/tests/MLA_Projet_2023/FaderNetworks_NIPS2017/models
autoencoder_12_16_01-09-43.pt  discriminator_12_16_01-09-43.pt  narrow_eyes.pth
classifier128.pth              download.sh                      pointy_nose.pth
classifier256.pth              eyeglasses.pth                   young.pth
default/                       male.pth


Given a trained model, you can use it to swap attributes of images in the dataset. Below are examples using the pretrained models:

In [None]:
%cd
%cd tests/MLA_Projet_2023/FaderNetworks_NIPS2017/
if input("Do you want to download the pretrained models? (yes/no)") == "yes":
    # download.sh is located in the models/ folder (see the listing above)
    !cd models && chmod +x download.sh && ./download.sh

if input("Do you want to generate the interpolation images? (yes/no)") == "yes":
    # Narrow Eyes
    !python interpolate.py --model_path models/narrow_eyes.pth --n_images 10 --n_interpolations 10 --alpha_min 10.0 --alpha_max 10.0 --output_path narrow_eyes.png

    # Eyeglasses
    !python interpolate.py --model_path models/eyeglasses.pth --n_images 10 --n_interpolations 10 --alpha_min 2.0 --alpha_max 2.0 --output_path eyeglasses.png

    # Age
    !python interpolate.py --model_path models/young.pth --n_images 10 --n_interpolations 10 --alpha_min 10.0 --alpha_max 10.0 --output_path young.png

    # Gender
    !python interpolate.py --model_path models/male.pth --n_images 10 --n_interpolations 10 --alpha_min 2.0 --alpha_max 2.0 --output_path male.png

    # Pointy nose
    !python interpolate.py --model_path models/pointy_nose.pth --n_images 10 --n_interpolations 10 --alpha_min 10.0 --alpha_max 10.0 --output_path pointy_nose.png

/home/notregroupe
/home/notregroupe/tests/MLA_Projet_2023/FaderNetworks_NIPS2017


In [None]:
%cd
%cd tests/MLA_Projet_2023/FaderNetworks_NIPS2017/
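from IPython.display import Image, display

# Only the last expression of a cell is rendered automatically,
# so each image is passed to display() explicitly

# Narrow Eyes
display(Image(filename='narrow_eyes.png'))

# Eyeglasses
display(Image(filename='eyeglasses.png'))

# Age
display(Image(filename='young.png'))

# Male
display(Image(filename='male.png'))

# Pointy nose
display(Image(filename='pointy_nose.png'))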

These commands generate images made of 10 rows and 12 columns. The first column is the original image, the second is the reconstructed image (without alteration of the attribute), and the remaining ones are the interpolated images. `alpha_min` and `alpha_max` set the range of the interpolation. Values greater than 1 produce generations beyond the True / False range of the boolean attribute in the model. Note that the variations of some attributes may only be noticeable for high values of alpha. For instance, for the "eyeglasses" or "gender" attributes, `alpha_max=2` is usually enough, while for the "age" or "narrow eyes" attributes, it is better to go up to `alpha_max=10`.
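To get a feeling for what the interpolation range means, here is a minimal sketch of how the attribute values could be spread over one row. It assumes that the values are evenly spaced from `1 - alpha_min` to `alpha_max`, with 0 and 1 standing for the False and True values of the attribute; this convention is an assumption made for the example and should be checked against `interpolate.py`. The helper name is hypothetical.

```python
def interpolation_values(alpha_min, alpha_max, n_interpolations):
    """Evenly spaced attribute values from 1 - alpha_min to alpha_max (assumed convention)."""
    start, stop = 1.0 - alpha_min, alpha_max
    if n_interpolations == 1:
        return [start]
    step = (stop - start) / (n_interpolations - 1)
    return [start + i * step for i in range(n_interpolations)]


# Same settings as the "eyeglasses" and "gender" commands above
alphas = interpolation_values(alpha_min=2.0, alpha_max=2.0, n_interpolations=10)

# One column for the original image, one for the reconstruction, then the interpolations
n_columns = 2 + len(alphas)

print([round(a, 2) for a in alphas])
print(n_columns)
```

Under this assumption, `alpha_min=2` and `alpha_max=2` sweep the attribute from -1 to 2, that is, one unit beyond each end of the False / True range.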

### 6.3 Train your own models

#### 6.3.1 Train a classifier
To train your own model, you first need to train a classifier that is used to evaluate the swap quality during training. Training a good classifier is relatively simple for most attributes, and a good model can be trained in a few minutes. A trained classifier for all attributes is provided in `models/classifier256.pth`. Note that the classifier does not need to be state-of-the-art: it is not used in the training process itself, only to monitor the swap quality. If you want to train your own classifier, you can run `classifier.py` with the following parameters:

In [None]:
# Answer "yes" at the prompt to train the classifier
if input("Do you want to train the classifier? (yes/no)") == "yes":
    %cd
    %cd tests/MLA_Projet_2023/FaderNetworks_NIPS2017/
    !chmod +x classifier.py
    !python classifier.py

In [None]:
'''
# Main parameters
--img_sz 256                  # image size
--img_fm 3                    # number of feature maps
--attr "*"                    # attributes list. "*" for all attributes

# Network architecture
--init_fm 32                  # number of feature maps in the first layer
--max_fm 512                  # maximum number of feature maps
--hid_dim 512                 # hidden layer size

# Training parameters
--v_flip False                # randomly flip images vertically (data augmentation)
--h_flip True                 # randomly flip images horizontally (data augmentation)
--batch_size 32               # batch size
--optimizer "adam,lr=0.0002"  # optimizer
--clip_grad_norm 5            # clip gradient L2 norm
--n_epochs 1000               # number of epochs
--epoch_size 50000            # number of images per epoch

# Reload
--reload ""                   # reload a trained classifier
--debug False                 # debug mode (if True, load a small subset of the dataset)
'''
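To change some of the parameters listed above without typing the whole command by hand, the arguments can be assembled in Python. The sketch below only builds and prints the command; the `build_command` helper and the chosen values are illustrative, and only flags that appear in the list above are used.

```python
import shlex


def build_command(script, overrides):
    """Build an argument list such as ['python', 'classifier.py', '--img_sz', '128']."""
    command = ["python", script]
    for name, value in overrides.items():
        command += [f"--{name}", str(value)]
    return command


# Illustrative values: only the parameters that differ from the defaults are listed
overrides = {"img_sz": 128, "attr": "Male", "batch_size": 64, "n_epochs": 10}
command = build_command("classifier.py", overrides)

# Shell-ready version of the command
command_line = shlex.join(command)
print(command_line)
```

The resulting list can be passed to `subprocess.run(command)` from the repository folder, or the printed line can be pasted after a `!` in a notebook cell.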


#### 6.3.2 Train a Fader Network
You can train a Fader Network with train.py. The autoencoder can receive feedback from:
- The image reconstruction loss
- The latent discriminator loss
- The PatchGAN discriminator loss
- The classifier loss

In the paper, only the first two losses are used, but the other two could improve the results further. You can tune the impact of each of these losses with the `lambda_ae`, `lambda_lat_dis`, `lambda_ptc_dis`, and `lambda_clf_dis` coefficients. Below is a complete list of all parameters:

In [None]:
# Answer "yes" at the prompt to train the model
if input("Do you want to train the model? (yes/no)") == "yes":
    %cd
    %cd tests/MLA_Projet_2023/FaderNetworks_NIPS2017/
    !chmod +x train.py
    !python train.py

'''
# Main parameters
--img_sz 256                      # image size
--img_fm 3                        # number of feature maps
--attr "Male"                     # attributes list. "*" for all attributes

# Networks architecture
--instance_norm False             # use instance normalization instead of batch normalization
--init_fm 32                      # number of feature maps in the first layer
--max_fm 512                      # maximum number of feature maps
--n_layers 6                      # number of layers in the encoder / decoder
--n_skip 0                        # number of skip connections
--deconv_method "convtranspose"   # deconvolution method
--hid_dim 512                     # hidden layer size
--dec_dropout 0                   # dropout in the decoder
--lat_dis_dropout 0.3             # dropout in the latent discriminator

# Training parameters
--n_lat_dis 1                     # number of latent discriminator training steps
--n_ptc_dis 0                     # number of PatchGAN discriminator training steps
--n_clf_dis 0                     # number of classifier training steps
--smooth_label 0.2                # smooth discriminator labels
--lambda_ae 1                     # autoencoder loss coefficient
--lambda_lat_dis 0.0001           # latent discriminator loss coefficient
--lambda_ptc_dis 0                # PatchGAN discriminator loss coefficient
--lambda_clf_dis 0                # classifier loss coefficient
--lambda_schedule 500000          # lambda scheduling (0 to disable)
--v_flip False                    # randomly flip images vertically (data augmentation)
--h_flip True                     # randomly flip images horizontally (data augmentation)
--batch_size 32                   # batch size
--ae_optimizer "adam,lr=0.0002"   # autoencoder optimizer
--dis_optimizer "adam,lr=0.0002"  # discriminator optimizer
--clip_grad_norm 5                # clip gradient L2 norm
--n_epochs 1000                   # number of epochs
--epoch_size 50000                # number of images per epoch

# Reload
--ae_reload ""                    # reload pretrained autoencoder
--lat_dis_reload ""               # reload pretrained latent discriminator
--ptc_dis_reload ""               # reload pretrained PatchGAN discriminator
--clf_dis_reload ""               # reload pretrained classifier
--eval_clf ""                     # evaluation classifier (trained with classifier.py)
--debug False                     # debug mode (if True, load a small subset of the dataset)

'''
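To illustrate how the four `lambda_*` coefficients combine the feedback terms, here is a minimal sketch of a weighted objective. It assumes that the discriminator and classifier coefficients are ramped linearly from 0 to their target value over `lambda_schedule` iterations, and that the reconstruction coefficient is not scheduled; the function names and the loss values are made up for the example, and this is not the code of `train.py`.

```python
def scheduled_coefficient(target, n_iter, lambda_schedule):
    """Ramp a coefficient linearly from 0 to `target` over `lambda_schedule` iterations."""
    if lambda_schedule == 0:  # 0 disables the schedule
        return target
    return target * min(n_iter / lambda_schedule, 1.0)


def autoencoder_objective(losses, n_iter, lambda_ae=1.0, lambda_lat_dis=0.0001,
                          lambda_ptc_dis=0.0, lambda_clf_dis=0.0, lambda_schedule=500000):
    """Weighted sum of the four feedback terms; `losses` maps a term name to its value."""
    def ramp(coefficient):
        return scheduled_coefficient(coefficient, n_iter, lambda_schedule)

    return (
        lambda_ae * losses["reconstruction"]
        + ramp(lambda_lat_dis) * losses["latent_discriminator"]
        + ramp(lambda_ptc_dis) * losses["patchgan_discriminator"]
        + ramp(lambda_clf_dis) * losses["classifier"]
    )


# Made-up loss values for one training step
losses = {
    "reconstruction": 0.02,
    "latent_discriminator": 0.7,
    "patchgan_discriminator": 0.5,
    "classifier": 0.3,
}

for n_iter in (0, 250000, 1000000):
    print(n_iter, autoencoder_objective(losses, n_iter))
```

With the default coefficients listed above, only the reconstruction and latent discriminator terms contribute, which matches the setting described for the paper.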


