# Preliminary Research

##### Authors:
1. Ovidio Manteiga Moar
1. Carlos Villar Martínez

### Introduction

The main goal of this work is to investigate Generative Adversarial Networks (GANs) and compare them with Variational Autoencoders (VAEs). To do this, we will study these deep learning generative models and train them to compare the results they obtain. We will use the _CelebA dataset_, which contains more than 2×10⁵ face images.


### Variational Autoencoders (VAEs)

A VAE is an autoencoder whose latent space has properties that allow us to generate new data (an autoencoder is an encoder followed by a decoder). One might think that, if we take a random point from the latent space of any autoencoder, we could decode it to generate new content. Whether this works depends on three factors:

1. Distribution of data.
1. Dimensions of the latent space.
1. Architecture of the AE.

The goal of an autoencoder is to encode and decode with minimum loss, not to organize the latent space in any particular way. If we want to use AEs to generate data, we have to ensure that the latent space is well organized, by adding a regularization term that penalizes a disorganized latent space.

In order to use an autoencoder for generation, the latent space must be regular enough. This is the idea behind variational autoencoders: the training is regularized to avoid overfitting and to ensure that the latent space has good properties.

In variational AE, instead of encoding the input as a single point, it is encoded as a distribution over a latent space. The training process is:

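1. The input is encoded as a distribution over the latent space.
1. A point is sampled from this distribution.
1. The point is decoded and the loss is computed: the reconstruction error plus a regularization term (the Kullback-Leibler divergence between the encoded distribution and a standard normal prior).
1. The loss is backpropagated through the network.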
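
The sampling step and the loss can be written out explicitly. The following is a standard formulation, assuming a Gaussian encoder that outputs a mean $\mu(x)$ and a standard deviation $\sigma(x)$ for each of the $d$ latent dimensions, a standard normal prior, and a squared-error reconstruction term (other reconstruction losses, such as binary cross-entropy, are also common); the notation is ours, not taken from a specific implementation:

```math
\begin{aligned}
z &= \mu(x) + \sigma(x) \odot \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, I), \qquad \hat{x} = \mathrm{decoder}(z) \\
\mathcal{L}(x) &= \underbrace{\lVert x - \hat{x} \rVert^2}_{\text{reconstruction}}
  + \underbrace{D_{\mathrm{KL}}\big(\mathcal{N}(\mu, \operatorname{diag}(\sigma^2)) \,\Vert\, \mathcal{N}(0, I)\big)}_{\text{regularization}} \\
D_{\mathrm{KL}} &= -\tfrac{1}{2} \sum_{j=1}^{d} \left(1 + \log \sigma_j^2 - \mu_j^2 - \sigma_j^2\right)
\end{aligned}
```

Writing the sample as a deterministic function of $\mu$, $\sigma$ and independent noise $\varepsilon$ (the reparameterization trick) is what allows the loss to be backpropagated through the sampling step.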

##### Implementation of a variational autoencoder

When using VAEs, before defining the encoder and the decoder we must define the bottleneck layer of the architecture (the sampling layer). Next, we define the encoder, which takes the inputs and maps them to the parameters of their latent distribution, from which the sampling layer draws a latent point. Then the decoder is defined. Both models are combined and the training procedure is defined: the complete variational architecture is built as a class that inherits from _keras.Model_. As a last step, the dataset is loaded and the VAE is trained.
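
The structure described above (sampling layer, encoder, decoder, and a model class that combines them) can be sketched without any framework. This is a hypothetical, dependency-free illustration, not the Keras code used in this work: the class names are ours, the encoder and decoder are one-dimensional linear maps standing in for the real convolutional networks, and the sketch only computes the forward pass and the loss (no gradient updates).

```python
import math
import random


class Sampling:
    """Bottleneck layer: draws z ~ N(mean, exp(log_var)) via the reparameterization trick."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def __call__(self, mean, log_var):
        eps = self.rng.gauss(0.0, 1.0)
        return mean + math.exp(0.5 * log_var) * eps


class Encoder:
    """Maps an input to the parameters (mean, log-variance) of its latent distribution."""

    def __init__(self, w_mean=0.5, w_log_var=0.0, b_log_var=-1.0):
        self.w_mean, self.w_log_var, self.b_log_var = w_mean, w_log_var, b_log_var

    def __call__(self, x):
        return self.w_mean * x, self.w_log_var * x + self.b_log_var


class Decoder:
    """Maps a latent point back to data space."""

    def __init__(self, w=2.0):
        self.w = w

    def __call__(self, z):
        return self.w * z


class VAE:
    """Combines encoder, sampling layer and decoder, and computes the training loss."""

    def __init__(self, encoder, sampling, decoder):
        self.encoder, self.sampling, self.decoder = encoder, sampling, decoder

    def loss(self, x):
        mean, log_var = self.encoder(x)
        z = self.sampling(mean, log_var)
        x_hat = self.decoder(z)
        reconstruction = (x - x_hat) ** 2
        # Closed-form KL divergence between N(mean, exp(log_var)) and N(0, 1).
        kl = -0.5 * (1.0 + log_var - mean ** 2 - math.exp(log_var))
        return reconstruction + kl, reconstruction, kl


vae = VAE(Encoder(), Sampling(seed=42), Decoder())
total, rec, kl = vae.loss(1.0)
print(f"total={total:.3f} reconstruction={rec:.3f} kl={kl:.3f}")
```

In the Keras version described above, these roles would be played by a custom sampling layer, separate encoder and decoder models, and a subclass of `keras.Model` whose training step adds the two loss terms and applies the gradients.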


##### Application as Generative models

VAEs have a wide range of applications, from molecular science to security. Since in this work we use a dataset of face images, we will focus on the application of VAEs to generating new faces.

VAEs allow us to learn a low-dimensional latent space of representations in which any point can be mapped to a realistic-looking sample. The process is as follows:

Training data --> Learning process --> Latent space of images --> Generator/Decoder --> Artificial image.
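
Once the VAE is trained, generation only needs the decoder: points are drawn from the prior over the latent space (a standard normal distribution in the usual formulation) and decoded. The sketch below is hypothetical: `decode` is a placeholder for the trained decoder, and the latent dimension of 4 is arbitrary.

```python
import random


def decode(z):
    """Placeholder for a trained decoder; here it just rescales the latent vector."""
    return [2.0 * v for v in z]


def generate(n_samples, latent_dim=4, seed=0):
    """Draw latent points from N(0, I) and decode each one into an artificial sample."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
        samples.append(decode(z))
    return samples


faces = generate(3)
print(len(faces), len(faces[0]))  # 3 4
```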

### Generative Adversarial Networks (GANs)

A GAN is a deep neural network (DNN) architecture used to generate new, synthetic data samples that are similar to a training dataset. It consists of two networks: a generator network, which produces synthetic data samples, and a discriminator network, which tries to distinguish the synthetic samples from real samples drawn from the training dataset.

The two networks are trained together in an adversarial process: the generator tries to produce samples that the discriminator cannot distinguish from real samples, while the discriminator tries to correctly classify the synthetic and real samples. Through this process, the generator learns to generate samples that are similar to the training data, and the discriminator learns to effectively distinguish between synthetic and real samples. When training begins, the generator produces obviously fake data, and the discriminator quickly learns to tell that it is fake. Both the generator and the discriminator are neural networks, and the generator's output is directly connected to the discriminator's input. Through backpropagation, the discriminator's classification provides a signal that the generator uses to update its weights.
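
The adversarial game described above is usually formalized with the minimax objective of the original GAN formulation, where $D(x)$ is the discriminator's estimated probability that $x$ is real, $G(z)$ is the generator's output for a noise vector $z$, and $p_z$ is the noise distribution:

```math
\min_G \max_D \; V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
```

The discriminator increases $V$ by assigning high probability to real samples and low probability to generated ones; the generator decreases $V$ by making $D(G(z))$ large. In practice the generator is often trained to maximize $\log D(G(z))$ instead, which gives stronger gradients early in training, when the discriminator easily rejects its samples.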

The discriminator in a GAN is simply a classifier: it tries to distinguish real data from the data created by the generator, and it can use any network architecture appropriate to the type of data it is classifying. The generator learns to create fake data by incorporating feedback from the discriminator: it learns to make the discriminator classify its output as real. Generator training therefore requires tighter integration between the generator and the discriminator than discriminator training does, because the generator's loss is computed through the discriminator.
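
Both losses can be written as binary cross-entropy on the discriminator's outputs. This is a minimal sketch, assuming the discriminator outputs probabilities (frameworks usually work with logits instead, for numerical stability); the function names are illustrative. The generator loss is computed from the discriminator's output on generated samples, which is why the discriminator must be in the loop during generator training, even though only the generator's weights are updated in that step.

```python
import math


def bce(p, target, eps=1e-7):
    """Binary cross-entropy for a single predicted probability p and a 0/1 target."""
    p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
    return -(target * math.log(p) + (1 - target) * math.log(1.0 - p))


def discriminator_loss(d_real, d_fake):
    """Real samples should be classified as 1, generated samples as 0."""
    real = sum(bce(p, 1) for p in d_real) / len(d_real)
    fake = sum(bce(p, 0) for p in d_fake) / len(d_fake)
    return real + fake


def generator_loss(d_fake):
    """Non-saturating generator loss: the generator wants D(G(z)) to be classified as 1."""
    return sum(bce(p, 1) for p in d_fake) / len(d_fake)


# Early in training the discriminator easily spots fakes (low D(G(z))) ...
print(round(discriminator_loss([0.9, 0.95], [0.1, 0.05]), 3))
# ... so the generator's loss is high.
print(round(generator_loss([0.1, 0.05]), 3))
```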


##### Implementation of a GAN

Generator: The generator is responsible for producing synthetic data samples. It takes random noise as input and transforms it into output data that should resemble the real data. Typically, the generator consists of a deep neural network that learns to map the input noise to the desired output data distribution.

Discriminator: The discriminator acts as a binary classifier that distinguishes between real and fake data samples. It takes both real data from the training set and generated data from the generator as input. The discriminator learns to classify whether the input data is real or fake by optimizing its parameters to improve its discrimination ability.

Adversarial Training: The generator and discriminator are trained together in an adversarial manner. During training, the generator aims to generate data samples that the discriminator cannot distinguish from real ones, while the discriminator aims to correctly classify real and fake samples. This adversarial process creates a feedback loop that drives both components to improve over time.

Loss Functions: GANs use specific loss functions to guide the training process. The generator's loss is typically based on the discriminator's feedback. It aims to minimize the discriminator's ability to differentiate real and fake samples, encouraging the generator to generate more realistic data. The discriminator's loss is designed to maximize its accuracy in distinguishing real and fake samples.

Training Process: The GAN training process is an iterative optimization procedure. In each iteration, the generator and the discriminator are updated based on their respective loss functions. Ideally, this back-and-forth training continues until the generator produces synthetic samples that are indistinguishable from real data and the discriminator can no longer discriminate between real and fake samples effectively. In practice, this equilibrium is rarely reached exactly, and training is usually stopped after a fixed number of iterations or when the quality of the generated samples stops improving.

Architecture Variations: GANs have various architectural variations, such as Deep Convolutional GANs (DCGANs), Wasserstein GANs (WGANs), and Conditional GANs (cGANs). These variations introduce additional architectural elements or modifications to address specific challenges or improve performance in different domains.

Mode Collapse: One of the challenges in GAN training is mode collapse, where the generator fails to explore the entire data distribution and only generates a limited variety of samples. Researchers have proposed several techniques, such as regularization methods and alternative loss functions, to mitigate this issue.

Applications: GANs have found applications in various domains, including image synthesis, style transfer, data augmentation, super-resolution, text generation, and more. They have been instrumental in generating highly realistic and novel content.
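
The components and the alternating training procedure described above can be illustrated with a deliberately tiny, hypothetical example: one-dimensional "data" drawn from N(4, 1), a linear generator G(z) = a·z + b, and a logistic-regression discriminator D(x) = sigmoid(w·x + c), trained with hand-derived gradients of the standard discriminator loss and the non-saturating generator loss. It is a sketch of the update pattern, not a practical GAN: with such a weak discriminator the generator can at best match the mean of the data, and the dynamics may oscillate around that point rather than settle.

```python
import math
import random


def sigmoid(u):
    """Numerically stable logistic function."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def train_toy_gan(steps=2000, batch=64, lr=0.05, seed=0):
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # generator parameters: G(z) = a*z + b
    w, c = 0.0, 0.0  # discriminator parameters: D(x) = sigmoid(w*x + c)
    for _ in range(steps):
        real = [rng.gauss(4.0, 1.0) for _ in range(batch)]
        z = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        fake = [a * zi + b for zi in z]

        # Discriminator step: minimize mean(-log D(real)) + mean(-log(1 - D(fake))).
        gw = gc = 0.0
        for x in real:
            s = sigmoid(w * x + c)
            gw -= (1.0 - s) * x
            gc -= 1.0 - s
        for x in fake:
            s = sigmoid(w * x + c)
            gw += s * x
            gc += s
        w -= lr * gw / batch
        c -= lr * gc / batch

        # Generator step (discriminator held fixed): minimize mean(-log D(G(z))).
        ga = gb = 0.0
        for zi in z:
            s = sigmoid(w * (a * zi + b) + c)
            dx = -(1.0 - s) * w  # derivative of -log sigmoid(w*x + c) w.r.t. x
            ga += dx * zi
            gb += dx
        a -= lr * ga / batch
        b -= lr * gb / batch
    return a, b, w, c


a, b, w, c = train_toy_gan()
print(f"generator offset b={b:.2f} (data mean 4.0), scale a={a:.2f}")
```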


##### DCGANs and WGANs


Deep Convolutional GANs (DCGANs) are a variant of GANs that specifically focus on generating realistic images. They leverage convolutional neural networks (CNNs) as the core architecture for both the generator and discriminator components. DCGANs have been successful in generating high-quality images with richer details compared to traditional GAN architectures.

Key features of DCGANs include:

1. Convolutional Layers: DCGANs use convolutional layers instead of fully connected layers in the generator and discriminator networks. Convolutional layers enable the networks to learn spatial hierarchies and capture local patterns effectively, making them well-suited for image data.
1. Strided Convolutions and Transposed Convolutions: DCGANs replace pooling layers with strided convolutions in the discriminator to downsample the image, allowing it to learn hierarchical representations. Transposed convolutions (sometimes called fractionally-strided convolutions or, loosely, deconvolutions) are used in the generator to upsample the initial noise input into a full-size image.
1. Batch Normalization: DCGANs apply batch normalization to normalize the inputs to the layers of both networks (in the original paper, except the generator's output layer and the discriminator's input layer). This helps stabilize and speed up training; the original batch normalization paper attributed this effect to reducing internal covariate shift.
1. LeakyReLU Activation: Instead of regular ReLU activations, DCGANs typically use LeakyReLU activations in the discriminator to alleviate the problem of dead neurons and improve the flow of gradients during training.
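
The effect of strided and transposed convolutions on spatial size can be checked with the standard output-size formulas. The sketch below assumes the kernel size 4, stride 2 and padding 1 configuration common in DCGAN-style implementations (an assumption, not something the architecture fixes), a 4×4 starting feature map in the generator and 64×64 images; the LeakyReLU slope of 0.2 is the value used in the DCGAN paper, and the batch normalization helper omits the learnable scale and shift.

```python
def conv_out(n, kernel=4, stride=2, padding=1):
    """Spatial size after a strided convolution (discriminator, downsampling)."""
    return (n + 2 * padding - kernel) // stride + 1


def conv_transpose_out(n, kernel=4, stride=2, padding=1):
    """Spatial size after a transposed convolution (generator, upsampling)."""
    return (n - 1) * stride - 2 * padding + kernel


def leaky_relu(x, slope=0.2):
    """LeakyReLU keeps a small gradient for negative inputs instead of zeroing them."""
    return x if x > 0 else slope * x


def batch_norm(values, eps=1e-5):
    """Normalize a batch to zero mean and unit variance (learnable scale/shift omitted)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / (var + eps) ** 0.5 for v in values]


# Generator: 4x4 feature map upsampled to a 64x64 image.
gen_sizes = [4]
while gen_sizes[-1] < 64:
    gen_sizes.append(conv_transpose_out(gen_sizes[-1]))
print("generator:", gen_sizes)  # [4, 8, 16, 32, 64]

# Discriminator: 64x64 image downsampled back to 4x4.
disc_sizes = [64]
while disc_sizes[-1] > 4:
    disc_sizes.append(conv_out(disc_sizes[-1]))
print("discriminator:", disc_sizes)  # [64, 32, 16, 8, 4]
```

Each stride-2 layer halves (discriminator) or doubles (generator) the spatial resolution, which is how DCGANs replace pooling and explicit upsampling layers.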

DCGANs have significantly contributed to the advancement of image synthesis tasks, enabling the generation of visually appealing and realistic images across various domains.


Wasserstein GANs (WGANs) are another variant of GANs, whose training objective is based on the Wasserstein distance (also known as the Earth Mover's distance, or W-distance). WGANs aim to improve training stability and to mitigate the mode collapse often encountered in traditional GANs.

Key features of WGANs include:

1. Wasserstein Distance: WGANs minimize an estimate of the Wasserstein distance between the distribution of real data and that of generated data. This distance measures the minimum cost of transforming one distribution into the other and provides a more meaningful and stable training signal than traditional GAN loss functions, since it remains informative even when the two distributions barely overlap.
1. Lipschitz Constraint: WGANs enforce a 1-Lipschitz constraint on the discriminator (called the critic in WGANs), which is required for the critic to yield a valid estimate of the Wasserstein distance. The original WGAN achieves this by weight clipping; later variants use a gradient penalty.
1. Gradient Penalty: Instead of weight clipping, WGAN-GP (Wasserstein GAN with Gradient Penalty) adds a gradient penalty term to the critic's loss. This term penalizes the norm of the critic's gradient with respect to its input for deviating from 1, evaluated at points interpolated between real and generated samples, and tends to give better convergence than weight clipping.
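
The pieces listed above can be sketched for one-dimensional samples. This is an illustrative, hypothetical sketch: `critic` stands for any scalar-valued critic network, its input gradient is estimated by finite differences (a real implementation would use automatic differentiation), the clipping threshold 0.01 and penalty weight 10 are the defaults reported in the WGAN and WGAN-GP papers, and the empirical Wasserstein-1 distance relies on the fact that, for two equal-size 1D samples, it equals the mean absolute difference of the sorted values.

```python
import random


def wasserstein_1d(xs, ys):
    """Empirical W1 distance between two equal-size 1D samples."""
    assert len(xs) == len(ys)
    return sum(abs(x - y) for x, y in zip(sorted(xs), sorted(ys))) / len(xs)


def critic_loss(critic, real, fake):
    """The critic maximizes E[f(real)] - E[f(fake)], i.e. minimizes the negative."""
    return sum(critic(x) for x in fake) / len(fake) - sum(critic(x) for x in real) / len(real)


def generator_loss(critic, fake):
    """The generator tries to increase the critic's score on generated samples."""
    return -sum(critic(x) for x in fake) / len(fake)


def clip_weights(weights, c=0.01):
    """Original WGAN: keep every critic weight in [-c, c]."""
    return [max(-c, min(c, w)) for w in weights]


def gradient_penalty(critic, real, fake, lam=10.0, h=1e-4, seed=0):
    """WGAN-GP: penalize (|f'(x_hat)| - 1)^2 at random interpolates x_hat."""
    rng = random.Random(seed)
    total = 0.0
    for xr, xf in zip(real, fake):
        t = rng.random()
        x_hat = t * xr + (1 - t) * xf
        grad = (critic(x_hat + h) - critic(x_hat - h)) / (2 * h)  # central difference
        total += (abs(grad) - 1.0) ** 2
    return lam * total / len(real)


def identity(x):
    """A 1-Lipschitz critic, used here only for illustration."""
    return x


real, fake = [3.0, 4.0, 5.0], [0.0, 1.0, 2.0]
print(wasserstein_1d(real, fake))           # 3.0
print(-critic_loss(identity, real, fake))   # 3.0
print(clip_weights([0.5, -0.003, -2.0]))    # [0.01, -0.003, -0.01]
```

For these shifted samples the 1-Lipschitz critic f(x) = x already attains the Wasserstein distance, which illustrates why the critic's score difference can serve as a distance estimate once the Lipschitz constraint holds.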

WGANs have been reported to offer improved training stability, less mode collapse, and better convergence than traditional GANs, which makes them attractive in settings where mode collapse is a significant issue, such as generating diverse images or modeling complex distributions.

Both DCGANs and WGANs are influential architectural variations of GANs, each addressing specific challenges and pushing the boundaries of generative modeling in their respective domains.