# Face Generation

![](https://images.unsplash.com/photo-1499824643098-62967ac87503?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&auto=format&fit=crop&w=1047&q=80)

Picture by [Kevin Bluer](https://unsplash.com/photos/e6XqFP4kCxM)

In this exercise, you will redo what was done in the lectures, this time with faces. We will use the Olivetti faces dataset.

The first thing to do is to load the dataset, using `fetch_olivetti_faces` from scikit-learn.

In [1]:
### TODO: load the dataset

You already know it, but feel free to display some faces, and get familiar with the dimensions.

In [2]:
### TODO: Display some images and get the shapes
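
As a hint, here is one possible way to load the data and check its dimensions. This is only a sketch: `load_faces` and `summarize` are helper names made up for this example, and `load_faces` downloads the dataset the first time it is called.

```python
import numpy as np


def load_faces():
    """Fetch the Olivetti faces (downloaded on first use) as (images, targets)."""
    # Imported lazily so that `summarize` below works without scikit-learn.
    from sklearn.datasets import fetch_olivetti_faces

    faces = fetch_olivetti_faces()
    return faces.images, faces.target


def summarize(images, targets):
    """Collect the basic facts worth checking before building a model."""
    return {
        "n_images": images.shape[0],
        "image_shape": images.shape[1:],
        "n_people": int(np.unique(targets).size),
        "pixel_range": (float(images.min()), float(images.max())),
    }
```

Calling `images, targets = load_faces()` followed by `summarize(images, targets)` should report 400 images of 64 x 64 pixels for 40 different people, with pixel values already scaled to the interval [0, 1].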

We will now create a CVAE that will allow us to generate new faces.

As a reminder, a CVAE architecture looks like this:

<p align="center">
<img src="https://drive.google.com/uc?export=view&id=1zUwFVcL0MrrpaQv2oND9yMxj6FwfW7xI">
</p>


Except that here it won't be digits, but faces of people.

Let's define the encoder & decoder parts now.

In [None]:
import tensorflow as tf
from tensorflow.keras import layers

class Sampling(layers.Layer):
    """Uses (z_mean, z_log_var) to sample z, the vector encoding a face."""

    def call(self, inputs):
        z_mean, z_log_var = inputs
        batch = tf.shape(z_mean)[0]
        dim = tf.shape(z_mean)[1]
        epsilon = tf.keras.backend.random_normal(shape=(batch, dim))
        return z_mean + tf.exp(0.5 * z_log_var) * epsilon
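
To see what the `Sampling` layer computes, here is a NumPy-only sketch of the same formula, `z = z_mean + exp(0.5 * z_log_var) * epsilon`. The function name `sample_z` and the numbers used below are illustrative assumptions, not part of the exercise.

```python
import numpy as np


def sample_z(z_mean, z_log_var, rng):
    """NumPy version of the Sampling layer: z = mean + std * epsilon."""
    epsilon = rng.standard_normal(size=z_mean.shape)  # epsilon ~ N(0, 1)
    std = np.exp(0.5 * z_log_var)  # log-variance -> standard deviation
    return z_mean + std * epsilon


rng = np.random.default_rng(0)
z_mean = np.full((10000, 2), 3.0)
z_log_var = np.full((10000, 2), np.log(0.25))  # variance 0.25, i.e. std 0.5
z = sample_z(z_mean, z_log_var, rng)

# The samples should be centred near 3 with a spread near 0.5.
print(z.mean(axis=0), z.std(axis=0))
```

Because the randomness lives entirely in `epsilon`, the output stays a differentiable function of `z_mean` and `z_log_var`, which is what lets gradients flow back into the encoder during training.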

In [3]:
### TODO: Build the encoder and decoder parts

Once you have built your model, compile it and train it on the data!

Do not forget to one-hot-encode your target.
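
One-hot encoding can be done in a couple of lines of NumPy, as in the sketch below. The helper name `one_hot` is made up for this example, and `n_classes=40` assumes one class per person in the Olivetti dataset.

```python
import numpy as np


def one_hot(targets, n_classes):
    """Return an array of shape (len(targets), n_classes) with a single 1 per row."""
    targets = np.asarray(targets, dtype=int)
    # Row i of the identity matrix is the one-hot vector for class i.
    return np.eye(n_classes, dtype=np.float32)[targets]


labels = one_hot([0, 2, 39], n_classes=40)
print(labels.shape)  # (3, 40)
```

`tf.keras.utils.to_categorical` is a ready-made alternative if you prefer to stay within Keras.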

In [None]:
# The following code will help you create the VAE.
# You can use it like any other Keras model.
# You just need to instantiate it with your previously defined encoder & decoder.

import tensorflow as tf
from tensorflow import keras

class VAE(keras.Model):
    def __init__(self, encoder, decoder, **kwargs):
        super().__init__(**kwargs)
        self.encoder = encoder
        self.decoder = decoder

    def train_step(self, data):
        # `data` is passed to the encoder as is; `data[0][0]` below is
        # expected to be the batch of images to reconstruct.
        with tf.GradientTape() as tape:
            # Use the sub-models stored on the instance, not global variables
            z_mean, z_log_var, z = self.encoder(data)
            reconstruction = self.decoder(z)
            # Mean binary cross-entropy, scaled by the number of pixels
            reconstruction_loss = tf.reduce_mean(
                keras.losses.binary_crossentropy(data[0][0], reconstruction)
            )
            reconstruction_loss *= 64 * 64
            # KL divergence between the encoded distribution and N(0, 1)
            kl_loss = 1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var)
            kl_loss = tf.reduce_mean(kl_loss)
            kl_loss *= -0.5
            total_loss = reconstruction_loss + kl_loss

        # Backpropagate the total loss through both encoder and decoder
        grads = tape.gradient(total_loss, self.trainable_weights)
        self.optimizer.apply_gradients(zip(grads, self.trainable_weights))

        return {
            "loss": total_loss,
            "reconstruction_loss": reconstruction_loss,
            "kl_loss": kl_loss,
        }
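
To get a feeling for the two terms of the loss in `train_step`, here is a NumPy sketch of the same formulas. The function names `kl_term` and `reconstruction_term` are made up for this example, and the clipping constant `eps` is an assumption added to avoid `log(0)`.

```python
import numpy as np


def kl_term(z_mean, z_log_var):
    """KL term as in train_step: -0.5 * mean(1 + log_var - mean^2 - exp(log_var))."""
    return -0.5 * np.mean(1.0 + z_log_var - np.square(z_mean) - np.exp(z_log_var))


def reconstruction_term(x, x_hat, eps=1e-7):
    """Mean binary cross-entropy over all pixels, scaled by 64 * 64."""
    x_hat = np.clip(x_hat, eps, 1.0 - eps)  # avoid log(0)
    bce = -(x * np.log(x_hat) + (1.0 - x) * np.log(1.0 - x_hat))
    return bce.mean() * 64 * 64


# Encoded distribution equal to N(0, 1): the KL term vanishes.
kl_at_prior = kl_term(np.zeros((4, 2)), np.zeros((4, 2)))
# Mean shifted to 1 with unit variance: the KL term becomes positive.
kl_shifted = kl_term(np.ones((4, 2)), np.zeros((4, 2)))
print(kl_at_prior, kl_shifted)
```

The KL term pulls the latent codes toward a standard normal distribution, while the reconstruction term rewards faithful images; the total loss is simply their sum.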

In [4]:
### TODO: Instantiate, compile, and train your model

Now, using the encoder part only, have a look at the latent space: display the values of the latent variables in a scatter plot.

In [5]:
### TODO: Display the latent variables using the encoder side

What is the range that the latent variables can take? (i.e. what are the minimum and maximum values?)

Now use that range to generate new faces of a given person.

In [6]:
### TODO: Generate and display faces
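
One way to organize the generation step is sketched below, assuming a 2-dimensional latent space and a decoder that takes both the latent point and the one-hot label of the person. The names `latent_grid`, `generate_faces`, `decode_fn` and `fake_decoder` are hypothetical; how the label actually reaches your decoder depends on how you built it.

```python
import numpy as np


def latent_grid(z_min, z_max, n_steps):
    """All points of a regular n_steps x n_steps grid over [z_min, z_max]^2."""
    axis = np.linspace(z_min, z_max, n_steps)
    z1, z2 = np.meshgrid(axis, axis)
    return np.column_stack([z1.ravel(), z2.ravel()])


def generate_faces(decode_fn, z_points, person, n_people=40):
    """Decode every latent point for one person and return 64x64 images."""
    labels = np.zeros((len(z_points), n_people), dtype=np.float32)
    labels[:, person] = 1.0  # same one-hot label for every latent point
    decoded = np.asarray(decode_fn(z_points, labels))
    return decoded.reshape(len(z_points), 64, 64)


# Stand-in decoder so the sketch runs on its own (assumption: replace it
# with a call to your trained decoder).
def fake_decoder(z, labels):
    return np.zeros((len(z), 64 * 64), dtype=np.float32)


z_points = latent_grid(-2.0, 2.0, n_steps=5)
faces = generate_faces(fake_decoder, z_points, person=3)
print(z_points.shape, faces.shape)  # (25, 2) (25, 64, 64)
```

Replace the bounds `-2.0` and `2.0` with the minimum and maximum you observed in the scatter plot, then display the resulting images in a grid to see how the face changes across the latent space.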

How do you interpret the latent variables in that case?