# Generating Images of Faces with Generative Adversarial Networks
by Manuel Herold and Alexander Lercher

## Project Overview
Our goal was to generate images of humans with artificial neural networks (ANNs). Images of whole human bodies turned out to be extremely hard to generate: most research papers currently focus on faces only, and training data for whole-body postures is rare and often not aligned.
We had already identified this risk in the proposal and chose images of faces as our fallback plan.

For our project we used a Generative Adversarial Network (GAN) to create images of faces. A GAN connects two ANNs so that they try to outsmart each other during training: the _generator_ creates new images of faces while the _discriminator_ tries to distinguish them from real ones [1].

Both networks and the training process were implemented from scratch in Python using the TensorFlow machine learning framework [2].

The tasks for the project were split as follows:

| Task                       | Team member       | Description                     |
|----------------------------|-------------------|---------------------------------|
| Training Data Collection   | Manuel; Alexander | Collection of raw images        |
| Training Data Alignment    | Manuel            | Training data preparation       |
| GAN Training Pipeline      | Alexander         | Training process implementation |
| GAN Reference Architecture | Alexander         | Image generation with [3, 4]    |
| GAN VAE Noise Input        | Manuel            | Image generation with [5]       |

The remainder of this report covers my contributions, including the underlying theory. Where Manuel's contributions are needed to understand mine, they are explicitly marked as his.

[1] https://developers.google.com/machine-learning/gan

[2] https://www.tensorflow.org/

[3] https://arxiv.org/pdf/1511.06434.pdf 

[4] https://arxiv.org/pdf/1711.06491.pdf

[5] Guoqiang Zhong, Wei Gao, Yongbin Liu, Youzhao Yang: Generative Adversarial Networks with Decoder-Encoder Output Noise (2018), arXiv:1807.03923

## Training Data
We needed real images of faces for the training process so that the GAN can learn what newly generated images should look like.

First, we implemented a web crawling script to download images from public websites featuring models. Unfortunately, most of these images had to be discarded during alignment.

Next, we downloaded multiple professional datasets from research papers or Kaggle competitions. In the end, the training process was done with the [CelebA](http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html) dataset, which contains 200,000 images of celebrity faces.

### Preparation
The dataset was prepared so that:

- images are converted to greyscale (1 channel),
- the image dimensions match our desired size (256x256), and
- faces are in the middle of the cropped image.

Methods for face detection and cropping were implemented by Manuel.
Furthermore, the images were combined into batches of size 64 and stored on disk as large NumPy arrays with the resulting shape (64, 256, 256, 1).
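
The batching step described above could look roughly like the following sketch. It is an illustration under assumptions, not the project's actual preprocessing code: the helper names `to_batches` and `save_batches`, the file naming scheme, and the decision to drop an incomplete final batch are all hypothetical, and the input images are assumed to be already cropped, resized, and converted to greyscale.

```python
import os
import tempfile
from typing import Iterable, Iterator, List

import numpy as np

BATCH_SIZE = 64   # batch size stated in the report
IMAGE_SIZE = 256  # image edge length stated in the report


def to_batches(images: Iterable[np.ndarray],
               batch_size: int = BATCH_SIZE) -> Iterator[np.ndarray]:
    """Group (H, W) greyscale images into arrays of shape (batch_size, H, W, 1)."""
    current: List[np.ndarray] = []
    for image in images:
        # Add the trailing channel axis: (H, W) -> (H, W, 1).
        current.append(image[..., np.newaxis])
        if len(current) == batch_size:
            yield np.stack(current)
            current = []
    # Assumption: an incomplete final batch is dropped so all files share one shape.


def save_batches(batches: Iterable[np.ndarray], target_dir: str) -> int:
    """Store every batch as its own .npy file and return the number of files written."""
    os.makedirs(target_dir, exist_ok=True)
    count = 0
    for count, batch in enumerate(batches, start=1):
        # Hypothetical naming scheme: batch_00001.npy, batch_00002.npy, ...
        np.save(os.path.join(target_dir, f"batch_{count:05d}.npy"), batch)
    return count


if __name__ == "__main__":
    # Dummy data standing in for 130 preprocessed greyscale face images.
    dummy_images = (np.zeros((IMAGE_SIZE, IMAGE_SIZE), dtype=np.uint8) for _ in range(130))
    with tempfile.TemporaryDirectory() as target_dir:
        # 130 images yield two full batches of 64; the remaining 2 images are dropped.
        print(save_batches(to_batches(dummy_images), target_dir))
```

Storing one file per batch keeps memory usage low during training, because only a single array of 64 images has to be loaded at a time.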

The following method from `training_images/training_data_provider.py` is used to retrieve the prepared image batches as NumPy arrays during training.

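```python
import os
import random
from typing import Iterable

import numpy as np


# Method of the training data provider class; self.npy_data_path is the folder with the stored batches.
def get_all_training_images_in_batches_from_disk(self) -> Iterable[np.ndarray]:
    '''
    Loads preprocessed image arrays from files. Memory load is not that high, as only individual batches are read in.
    The batch size is fixed to 64.

    :returns: same as self.get_all_training_images_in_batches() but slightly faster.
    '''
    if not os.path.exists(self.npy_data_path):
        raise IOError(f"The training arrays folder {self.npy_data_path} does not exist.")

    for root, _, files in os.walk(self.npy_data_path):
        # Shuffle the files so that batches arrive in a different order on every pass.
        random.shuffle(files)
        for file_ in files:
            # Join with root so that files in subfolders are resolved correctly.
            yield np.load(os.path.join(root, file_), allow_pickle=True)
```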

### Considerations
If we could not detect a face, the image was discarded from the training set. For all other images, the face was moved to the center.

This aids the training process, in which the discriminator must learn what a valid face looks like. As all faces are in the center, the discriminator can learn to focus on this region.
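
To illustrate the centering, the following sketch computes a square crop window around a detected face. The face detection and cropping used in the project are Manuel's part and are not reproduced here; the bounding-box format `(x, y, width, height)`, the function name `centered_crop_window`, the `margin` factor, and the choice to skip images whose window does not fit are assumptions made for illustration only.

```python
from typing import Optional, Tuple


def centered_crop_window(image_width: int,
                         image_height: int,
                         face_box: Tuple[int, int, int, int],
                         margin: float = 1.5) -> Optional[Tuple[int, int, int, int]]:
    """Return (left, top, right, bottom) of a square crop centered on the face.

    face_box is (x, y, width, height) of the detected face (assumed format).
    Returns None if the window would leave the image.
    """
    x, y, width, height = face_box
    # Enlarge the face box by the margin so the crop also contains hair and chin.
    side = int(round(max(width, height) * margin))
    center_x = x + width / 2
    center_y = y + height / 2
    left = int(round(center_x - side / 2))
    top = int(round(center_y - side / 2))
    right = left + side
    bottom = top + side
    # Assumption: windows reaching outside the image are skipped instead of padded.
    if left < 0 or top < 0 or right > image_width or bottom > image_height:
        return None
    return left, top, right, bottom
```

The resulting window can then be cut out and resized to the target size, so that the face center coincides with the image center in every training sample.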

## GAN Basics
As explained above, the GAN consists of two ANNs: the _generator_ creates new images of faces, while the _discriminator_ tries to distinguish them from real ones [1].
The basic architecture is shown in Figure 1.

![Basic Architecture](https://developers.google.com/machine-learning/gan/images/gan_diagram.svg)
_Figure 1: Basic GAN Architecture_

### Discriminator Goal
The discriminator's goal is to correctly classify input images into two classes: fake and real. Real images are taken from the CelebA dataset; fake images are produced by the generator.
The loss function for the discriminator is the following [6]:

\begin{equation*}
L(D) = -\frac{1}{2} \log (D(x)) -\frac{1}{2} \log (1 - D(G(z)))
\end{equation*}

Intuitively, the first term covers real data x, which should be classified as real with D(x)=1. The second term covers fake data from the generator, which should be classified as fake with D(G(z))=0. The loss is therefore smallest when both classifications are correct.
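
The discriminator loss can be evaluated numerically with a few lines of plain Python. This is a minimal sketch for building intuition, not the project's TensorFlow implementation: the function name `discriminator_loss`, the averaging over a batch, and the small constant `EPSILON` are assumptions.

```python
import math
from typing import Sequence

# Assumption: a small constant keeps log() finite when a prediction is exactly 0 or 1.
EPSILON = 1e-7


def discriminator_loss(d_real: Sequence[float], d_fake: Sequence[float]) -> float:
    """L(D) = -1/2 log(D(x)) - 1/2 log(1 - D(G(z))), averaged over the batch.

    d_real holds the discriminator outputs D(x) for real images,
    d_fake holds the outputs D(G(z)) for generated images (both in [0, 1]).
    """
    real_term = sum(math.log(p + EPSILON) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p + EPSILON) for p in d_fake) / len(d_fake)
    return -0.5 * real_term - 0.5 * fake_term


if __name__ == "__main__":
    # An undecided discriminator outputs 0.5 everywhere; a good one separates the classes.
    print(discriminator_loss([0.5, 0.5], [0.5, 0.5]))
    print(discriminator_loss([0.99, 0.98], [0.01, 0.02]))
```

An undecided discriminator that outputs 0.5 for every image has a loss of about log 2, while a discriminator that separates the two classes well approaches a loss of 0.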

### Generator Goal
The generator's goal is to generate realistic images that resemble the real ones. Its objective is therefore opposed to the discriminator's:

\begin{equation*}
L(G) = - \log (D(G(z)))
\end{equation*}

Here, the generator's output should be classified as real with D(G(z))=1, where _z_ is random input for the generator.
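
The generator loss can be sketched in the same way. Again, this is only an illustration under assumptions and not the project's TensorFlow code: `generator_loss`, the batch averaging, and `EPSILON` are hypothetical choices.

```python
import math
from typing import Sequence

# Assumption: a small constant keeps log() finite when a prediction is exactly 0.
EPSILON = 1e-7


def generator_loss(d_fake: Sequence[float]) -> float:
    """L(G) = -log(D(G(z))), averaged over the batch.

    d_fake holds the discriminator outputs D(G(z)) for generated images.
    """
    return -sum(math.log(p + EPSILON) for p in d_fake) / len(d_fake)


if __name__ == "__main__":
    # The loss shrinks as the discriminator is fooled more often.
    print(generator_loss([0.1, 0.2]))  # discriminator rejects the fakes
    print(generator_loss([0.9, 0.8]))  # discriminator accepts the fakes
```

The loss decreases as D(G(z)) approaches 1, which is exactly the situation the discriminator loss penalizes. Both losses are instances of binary cross-entropy, so in TensorFlow they could also be expressed with `tf.keras.losses.BinaryCrossentropy` using labels of 1 for "real" and 0 for "fake".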

[6] https://arxiv.org/pdf/1711.06491.pdf


## Additional Sources for Report

- [explanation of transposed convolution _Conv2DTranspose_ to apply for generator (increase image size)](https://towardsdatascience.com/types-of-convolutions-in-deep-learning-717013397f4d)

- https://datagrid.co.jp/en/all/release/386/