# Deep Convolutional Generative Adversarial Network (DCGAN)
# DL Lecture with Pr. Kelly Joly

### Aymeric MILLAN & Arthur VIENS

In this notebook we present our DCGAN. Its purpose is to generate fake images
that look like real ones after training on a particular dataset. We chose GANs
because we wanted to dive into the details of training one: for many other deep
learning architectures, training a network is fairly straightforward, but that
is not the case with GANs.

Here is an example of three generated pictures at a resolution of `128x128`.
This training was done with a [landscapes dataset](google.com). The rendering is
far from perfect and could not fool a human discriminator, but we can see that
it is starting to _look like_ a landscape.

![Example of generated images](fig/sluggy_landscapes.png)

We faced many difficulties while implementing this GAN, which we present in
this notebook. The problems we encountered mainly came from two sides:
- Architecture: these are the "inner" problems, directly related to the network
(e.g., which convolutional layers to use, how to avoid vanishing gradients,
what upsampling size to use, etc.)
- Training: these are the "outer" problems, which are not directly related to
the architecture of our GAN but to how it is trained. For example, setting the
right learning rate, or choosing how often to train the generator relative to
the discriminator.

Our training ran on NVIDIA's latest-generation `A100` GPUs. Even with such
computational power, training our network took quite a long time. To _start_
seeing results and decide how to adjust our network, we needed at least 10-12
hours, and that is for `128x128` images. We will discuss scaling up our GAN
later.

# Data Loaders and Datasets

The first part of the project was to load our datasets correctly. For this we
used PyTorch utility classes such as `Dataset` and `DataLoader` from
`torch.utils.data`. With these classes, we only have to implement the methods
that retrieve an item from the dataset and return its total length, and we can
then use all the machinery built on top of them: shuffling, multiprocess
loading, batch prefetching, dataset weighting and much more.
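As an illustration of the two methods mentioned above, here is a minimal sketch of a custom dataset. The class name `RandomImageDataset` is hypothetical, and it returns random tensors instead of reading image files, so it is a stand-in rather than our actual loader.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class RandomImageDataset(Dataset):
    """Hypothetical stand-in for an image dataset: returns random tensors
    instead of reading files, so the example runs without data on disk."""

    def __init__(self, num_images=16, image_size=128):
        self.num_images = num_images
        self.image_size = image_size

    def __len__(self):
        # Total number of items; the DataLoader uses it to build its sampler.
        return self.num_images

    def __getitem__(self, index):
        # A real implementation would load and transform the image at `index`.
        generator = torch.Generator().manual_seed(index)
        return torch.rand(3, self.image_size, self.image_size, generator=generator)


dataset = RandomImageDataset()
# Shuffling and batching come for free once the two methods exist.
loader = DataLoader(dataset, batch_size=4, shuffle=True)
batch = next(iter(loader))
print(batch.shape)
```

The batch has one extra leading dimension (the batch size) compared to a single item.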

Moreover, we used several image transforms from `torchvision.transforms`, such
as resizing, cropping, horizontal flipping and transform composition. These
transformations were really useful for data augmentation: the images drawn from
the dataset are not always exactly the same, which makes it harder for the
network to overfit the training set.

[TODO: image comparing an original picture with its flipped or enlarged version, etc.]

As the landscapes dataset is more than 10 GB in size, it cannot fit in memory
all at once. Hence, it is useful to load images on the fly and compute the
transformations at the same time. To achieve this, we tuned the
`prefetch_factor` and the number of workers so that the available GPUs are used
at full capacity.
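The sketch below shows where these two settings live on a `DataLoader`. The helper `make_loader` and the default values are assumptions for illustration, not the values we actually used. It is called here with `num_workers=0` only so that it runs anywhere.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def make_loader(dataset, batch_size=32, num_workers=4, prefetch_factor=2):
    """Hypothetical helper: build a DataLoader with prefetching when workers exist."""
    kwargs = {
        "batch_size": batch_size,
        "shuffle": True,
        "drop_last": True,
        # Pinned host memory speeds up host-to-GPU copies when a GPU is present.
        "pin_memory": torch.cuda.is_available(),
    }
    if num_workers > 0:
        # prefetch_factor is the number of batches loaded in advance by each
        # worker; PyTorch only accepts it when num_workers > 0.
        kwargs.update(
            num_workers=num_workers,
            prefetch_factor=prefetch_factor,
            persistent_workers=True,
        )
    return DataLoader(dataset, **kwargs)


# Tiny in-memory stand-in for the image dataset.
images = TensorDataset(torch.rand(64, 3, 32, 32))
loader = make_loader(images, batch_size=16, num_workers=0)
(first_batch,) = next(iter(loader))
print(first_batch.shape)
```

In practice, the number of workers and the prefetch factor are raised until the GPU no longer waits for data.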

In our actual code, the data loading pipeline is:

1. Select a random image
2. Read the image as a 3D matrix
3. Resize the image
4. Crop the image to the size we want to generate
5. Convert it to a PyTorch tensor
6. Flip the image horizontally with 50% probability



# Architecture

For our GAN project, we started off very simple. Our first network was a fully
connected GAN creating `28x28` black-and-white images of digits, as we first
trained on the MNIST dataset. We then increased the complexity, replacing
fully connected linear layers with convolutional layers, transposed
convolutions and upsampling layers. It is fairly easy for a GAN to generate
small images such as MNIST digits, but as the resolution increases, it becomes
harder and harder to get satisfactory results. Scaling up a GAN is not a
trivial task.

Our final architecture is composed of many _ResBlocks_ and _ResUpBlocks_, in
order to keep the gradient flowing through the layers via the skip connections.
Our networks are pretty deep, and it is important for the gradient to flow;
otherwise the training fails or takes an extremely long time. Our ResBlock is
built as follows: __TODO: add a diagram__

Our ResUpBlock, the counterpart of the ResBlock that scales the images up
instead of down, is composed like this: __TODO: add a diagram__

We found different ways to implement the ResBlock in the literature. We chose
the one we thought suited us best and then engineered it further to fit our
needs.
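To make the idea of a skip connection concrete, here is one possible layout for a downsampling and an upsampling residual block. This is a generic sketch and not our exact blocks: the layer choices (average pooling, nearest-neighbour upsampling, batch normalization, `1x1` convolution on the skip path) are all assumptions.

```python
import torch
from torch import nn


class ResBlock(nn.Module):
    """Residual block that halves the spatial size (assumed layout)."""

    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1),
            nn.AvgPool2d(2),
        )
        # 1x1 convolution and pooling so the skip path matches the body's shape.
        self.skip = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=1),
            nn.AvgPool2d(2),
        )

    def forward(self, x):
        # The sum gives the gradient a short path around the body.
        return self.body(x) + self.skip(x)


class ResUpBlock(nn.Module):
    """Residual block that doubles the spatial size (assumed layout)."""

    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="nearest"),
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(),
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1),
        )
        self.skip = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="nearest"),
            nn.Conv2d(in_channels, out_channels, kernel_size=1),
        )

    def forward(self, x):
        return self.body(x) + self.skip(x)


x = torch.randn(2, 16, 32, 32)
down = ResBlock(16, 32)(x)   # spatial size 32 -> 16
up = ResUpBlock(16, 8)(x)    # spatial size 32 -> 64
print(down.shape, up.shape)
```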

Our whole architecture depends on the size of the images we want to generate.
Of course, a network needs more parameters to generate `256x256` images than
`128x128` images, so the architecture stays similar but we add extra layers
for `256x256` generation. TODO: diagram?
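One way to see why a larger output needs extra layers: if each upsampling block doubles the resolution, the number of blocks grows with the logarithm of the image size. The helper below is a hypothetical illustration, and the `4x4` starting feature map is an assumption, not a description of our generator.

```python
def num_up_blocks(image_size, base_size=4):
    """Number of x2 upsampling blocks needed to go from base_size to image_size."""
    ratio, remainder = divmod(image_size, base_size)
    is_power_of_two = ratio >= 1 and (ratio & (ratio - 1)) == 0
    if remainder != 0 or not is_power_of_two:
        raise ValueError("image_size must be base_size times a power of two")
    # For a power of two, bit_length() - 1 is its base-2 logarithm.
    return ratio.bit_length() - 1


for size in (128, 256):
    print(size, num_up_blocks(size))
```

Under these assumptions, going from `128x128` to `256x256` adds exactly one upsampling block.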


# Training strategies

Training a GAN is not straightforward. The process is a zero-sum game between
the generator and the discriminator. Because of this, they train each other as
they take turns winning the game. A delicate equilibrium is needed in order to
successfully generate images that feel real to the human eye. If one of them is
too strong for the other, it generally makes the other one collapse, and
training fails. It is of paramount importance to train them in such a way that
they stay at approximately the same level.

GANs are trained in the following fashion. In each training loop, we first
sample a batch of points from the latent space and feed it to the generator,
which outputs a batch of fake images. We feed them to the discriminator, which
has to label them as real or fake; its goal is to label the generator's images
as fake. We then feed a batch of images from the dataset to the discriminator,
which has to label them as well. Once the training step of the discriminator is
finished, we train the generator. After sampling a batch of latent points and
feeding them to the generator, we freeze the discriminator weights and ask it
to label the fake images. Here, the goal of the generator is to have its images
labeled as real, and we compute the loss accordingly.
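The loop described above can be sketched as a single training step. The two networks here are tiny fully connected stand-ins, and `LATENT_DIM`, the binary cross-entropy loss and the Adam settings are assumptions chosen for illustration rather than our actual configuration.

```python
import torch
from torch import nn

LATENT_DIM = 16  # assumed latent size for this toy example

# Toy stand-ins for the real convolutional networks.
generator = nn.Sequential(
    nn.Linear(LATENT_DIM, 64), nn.ReLU(), nn.Linear(64, 32), nn.Tanh()
)
discriminator = nn.Sequential(
    nn.Linear(32, 64), nn.LeakyReLU(0.2), nn.Linear(64, 1)
)

criterion = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4, betas=(0.5, 0.999))


def train_step(real_batch):
    batch_size = real_batch.size(0)
    real_labels = torch.ones(batch_size, 1)
    fake_labels = torch.zeros(batch_size, 1)

    # Discriminator step: fake images should be labeled 0, real images 1.
    z = torch.randn(batch_size, LATENT_DIM)
    fake_batch = generator(z).detach()  # no gradient flows into the generator
    loss_d = criterion(discriminator(fake_batch), fake_labels)
    loss_d = loss_d + criterion(discriminator(real_batch), real_labels)
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # Generator step: the generator wants its images labeled as real.
    # Only opt_g.step() is called, so the discriminator weights stay fixed.
    z = torch.randn(batch_size, LATENT_DIM)
    loss_g = criterion(discriminator(generator(z)), real_labels)
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()

    return loss_d.item(), loss_g.item()


loss_d, loss_g = train_step(torch.randn(8, 32))
print(loss_d, loss_g)
```

In this sketch the discriminator is "frozen" during the generator step simply because its optimizer is not stepped; its stale gradients are cleared by `opt_d.zero_grad()` at the next discriminator step.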

There are different variants of this training procedure, and different ways to
tweak it. For example, we can:
- Train the discriminator k times for each generator step (k is typically
low: 2 or 3)
- Use a different learning rate for each network
- Add noise to the real and generated images
- Change the latent space dimension (not trivial, as the whole network
architecture needs to be able to convert it to an image)
- Perform "label smoothing", for example by setting the label of real images to
0.9 instead of 1, in order to regularize the discriminator by making it less
overconfident in its predictions
- Tune the different hyperparameters
- Regularize the networks with L1 or L2 penalties (weight decay)
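Several of the tweaks listed above fit in a few lines each. The sketch below is illustrative only: the helper names are hypothetical, and the noise level, learning rates and weight decay are assumed values, not the ones we settled on.

```python
import torch
from torch import nn

criterion = nn.BCEWithLogitsLoss()


def smoothed_real_labels(batch_size, value=0.9):
    """Targets for real images with label smoothing (0.9 instead of 1)."""
    return torch.full((batch_size, 1), value)


def add_noise(images, std=0.05):
    """Add Gaussian noise to a batch of real or generated images."""
    return images + std * torch.randn_like(images)


def update_schedule(num_generator_steps, k=2):
    """Order of updates: k discriminator steps for each generator step."""
    steps = []
    for _ in range(num_generator_steps):
        steps.extend(["discriminator"] * k)
        steps.append("generator")
    return steps


# Toy networks, only there to show per-network optimizer settings.
generator = nn.Linear(16, 32)
discriminator = nn.Linear(32, 1)
# Different learning rates; weight_decay adds an L2 penalty on the weights.
opt_g = torch.optim.Adam(generator.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=4e-4, weight_decay=1e-5)

# With smoothed targets, an overconfident logit costs more than a moderate one.
targets = smoothed_real_labels(1)
overconfident = criterion(torch.tensor([[10.0]]), targets)
moderate = criterion(torch.tensor([[2.2]]), targets)
print(overconfident.item() > moderate.item())
print(update_schedule(2, k=2))
```

The comparison at the end shows the regularizing effect of label smoothing: with a target of 0.9, the loss is lowest when the discriminator outputs a probability of 0.9, so pushing the logit higher is penalized.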

# Images Size
# Latent Space
# Architecture
Discuss the "internal" parameters of the network and the "external" ones (learning rate, etc.)
# Training strategies
Train the discriminator or the generator only every other step
# Environment setup
(options --midsave, --resume; automate and professionalize the whole setup, otherwise further improvement is no longer possible)
# Artifacts

# Show images at different epochs

# Discuss the BigGAN paper
