<table align="center">
  <td align="center">
    <a target="_blank" href="http://inspiredk.org">
    <img align="center" src="https://i.ibb.co/Z6HZPSbH/Inspired-K-org-Logo-No-Whitespace-Extra-Small.png">InspiredK.org Website</a>
  </td>
  
  <td align="center">
    <a target="_blank" href="https://colab.research.google.com/github/InspiredK-organization/MITintrotodeeplearning/blob/master/lab2/solutions/Lab2.2 - Facial Recognition with VAEs, Debiasing, and TensorFlow Solution.ipynb">
    <img align="center" src="https://i.ibb.co/2P3SLwK/colab.png"/>Run in Google Colab</a>
  </td>
</table>

# Copyright Information

In [None]:
# Copyright 2025 MIT Introduction to Deep Learning. All Rights Reserved.
#
# Licensed under the MIT License. You may not use this file except in compliance
# with the License. Use and/or modification of this code outside of MIT Introduction
# to Deep Learning must reference:
#
# © MIT Introduction to Deep Learning
# http://introtodeeplearning.com
#
# Original lab is adopted from http://introtodeeplearning.com
# Lab is edited by http://InspiredK.org

# Laboratory 2: Computer Vision

# Part 2: Debiasing Facial Detection Systems

In the second portion of the lab, we'll explore two prominent aspects of applied deep learning: facial detection and algorithmic bias.

Deploying fair, unbiased AI systems is critical to their long-term acceptance. Consider the task of facial detection: given an image, is it an image of a face? This seemingly simple but extremely important task is subject to significant algorithmic bias against certain demographics.

In this lab, we'll investigate [one recently published approach](http://introtodeeplearning.com/AAAI_MitigatingAlgorithmicBias.pdf) to addressing algorithmic bias. We'll build a facial detection model that learns the *latent variables* underlying face image datasets and uses them to adaptively re-sample the training data, thus mitigating any biases that may be present, in order to train a *debiased* model.


Run the next code block for a short video from Google that explores how and why it's important to consider bias when thinking about machine learning:

In [None]:
import IPython # For special displays and control of code outputs.
IPython.display.YouTubeVideo('59bMh59JQDo')

Before starting this lab, make sure that you change your runtime to a tensor processing unit (TPU) by navigating to Runtime > Change runtime type > Hardware accelerator > v2-8 TPU. This matters because some of the code is incompatible with the free Google Colab T4 GPU, where it will return errors and not function correctly.

If you are unable to connect to a TPU or you reach the usage limit, you can run the lab on a CPU instead. However, note that the code will take about twice as long to run, so only fall back to a CPU once you are sure that you cannot connect to a TPU.

Now that we are using a TPU, let's download the course repository, install dependencies, and import the relevant packages we'll need for this lab.

In [None]:
# Install TensorFlow 2.x.
!pip install tensorflow --quiet
import tensorflow as tf

# Install opencv-python (cv2) as a dependency for the MIT package.
!pip install opencv-python --quiet # OpenCV is a framework that supports computer vision tasks like ours.

# Download and import the MIT Introduction to Deep Learning package.
!pip install mitdeeplearning --quiet
import mitdeeplearning as mdl

# Import all remaining packages.
import functools # Dependency for functions that output other functions.
import matplotlib.pyplot as plt # For graphical visualizations of different statistics.
import numpy as np # For nparrays.
from tqdm import tqdm # For textual progress bars.

## 2.1 Datasets

We'll be using three datasets in this lab. In order to train our facial detection models, we'll need a dataset of positive examples (i.e., of faces) and a dataset of negative examples (i.e., of things that are not faces). We'll use these data to train our models to classify images as either faces or not faces. Finally, we'll need a test dataset of face images. Since we're concerned about the potential *bias* of our learned models against certain demographics, it's important that the test dataset we use has equal representation across the demographics or features of interest. In this lab, we'll consider skin tone and gender.

1.   **Positive training data**: [CelebA Dataset](http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html). A large-scale, highly varied dataset containing faces of many different celebrities.   
2.   **Negative training data**: [ImageNet](http://www.image-net.org/). A dataset containing images across many different categories. We take our negative examples from some of the non-human categories.
3.   **Test data**: A set of face images labeled using the [Fitzpatrick Scale](https://en.wikipedia.org/wiki/Fitzpatrick_scale), a skin type classification system that we use to label each image as "Lighter" or "Darker".

Let's begin by importing these datasets. We've written a class that does a bit of data pre-processing to import the training data in a usable format.

In [None]:
# Get the training data containing images from both CelebA and ImageNet.
path_to_training_data = tf.keras.utils.get_file('train_face.h5', 'https://www.dropbox.com/s/hlz8atheyozp1yx/train_face.h5?dl=1')
loader = mdl.lab2.TrainingDatasetLoader(path_to_training_data) # Create a TrainingDatasetLoader from MIT's package for the downloaded dataset.

We can look at the size of the training dataset and grab a batch of size 100:

In [None]:
print("Number of training images: {}".format(loader.get_train_size())) # Number of images that are in our training data.
(images, labels) = loader.get_batch(100) # Get a random batch containing 100 images and their labels (face or not).

Play around with displaying images to get a sense of what the training data actually looks like!

In [None]:
#@title Change the sliders to look at different training examples! { run: "auto" }

face_images = images[np.where(labels == 1)[0]] # Get the images that are faces.
not_face_images = images[np.where(labels == 0)[0]] # Get the images that are not faces.

idx_face = 0 #@param {type:"slider", min:0, max:50, step:1}
idx_not_face = 0 #@param {type:"slider", min:0, max:50, step:1}

# Show the image of the face at a given index.
plt.figure(figsize=(5,5))
plt.subplot(1, 2, 1)
plt.imshow(face_images[idx_face])
plt.title("Face"); plt.grid(False)

# Show the image of the not-face at a given index.
plt.subplot(1, 2, 2)
plt.imshow(not_face_images[idx_not_face])
plt.title("Not Face"); plt.grid(False)

### Thinking about bias

Remember we'll be training our facial detection classifiers on the large, well-curated CelebA dataset (and ImageNet), and then evaluating their accuracy by testing them on an independent test dataset. Our goal is to build a model that trains on CelebA *and* achieves high classification accuracy on the test dataset across all demographics, and to thus show that this model does not suffer from any hidden bias.

What exactly do we mean when we say a classifier is biased? In order to formalize this, we'll need to think about [*latent variables*](https://en.wikipedia.org/wiki/Latent_variable), variables that define a dataset but are not strictly observed. As defined in the generative modeling lecture, we'll use the term *latent space* to refer to the probability distributions of the aforementioned latent variables. Putting these ideas together, we consider a classifier *biased* if its classification decision changes after it sees some additional latent features. This notion of bias may be helpful to keep in mind throughout the rest of the lab.

## 2.2 CNN for facial detection

First, we'll define and train a CNN on the facial classification task, and evaluate its accuracy. Later, we'll evaluate the performance of our debiased models against this baseline CNN. The CNN model has a relatively standard architecture: a series of convolutional layers with batch normalization, a flatten operation that turns the convolution output into a vector, and two fully connected layers that generate a class prediction.
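
To get a feel for how the image shrinks as it passes through the convolutional layers, here is a small sketch in plain Python. It assumes a 64 x 64 input, four convolutions with `strides=2` and `padding='same'`, and filter counts of 1, 2, 4, and 6 times `n_filters = 12`, which mirrors the model defined in the next code cell. The helper `same_padding_output_size` is a hypothetical name used only for this illustration.

```python
import math

def same_padding_output_size(input_size, stride):
    """Spatial output size of a convolution that uses padding='same'."""
    return math.ceil(input_size / stride)

size = 64                          # assumed input height and width in pixels
n_filters = 12                     # assumed base number of filters
filter_multipliers = [1, 2, 4, 6]  # assumed growth in feature maps per layer

for multiplier in filter_multipliers:
    size = same_padding_output_size(size, stride=2)
    channels = multiplier * n_filters
    print(f"{size} x {size} x {channels}")

# Length of the vector that the flatten operation hands to the dense layers.
flattened_length = size * size * channels
print("Flattened length:", flattened_length)
```

Each stride-2 layer halves the height and width, so under these assumptions the dense layers receive a vector of 4 * 4 * 72 = 1152 values.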

### Define and train the CNN model

Like we did in the first part of the lab, we'll define our CNN model, and then train on the CelebA and ImageNet datasets using the `tf.GradientTape` class and the `tf.GradientTape.gradient` method.

In [None]:
n_filters = 12 # Factor for the number of convolutional filters/feature maps in our model.

def make_cnn_standard_classifier(n_outputs=1):
  '''Function to define a standard CNN model'''

  # Use functools.partial (if needed) to take a built-in function and set some parameters to have a different default value.
  Conv2D = functools.partial(tf.keras.layers.Conv2D, padding='same', activation='relu') # The convolutional layer that performs convolutions on its inputs.
  BatchNormalization = tf.keras.layers.BatchNormalization # The batch normalization layer (instead of pooling) that puts all values at a similar scale.
  Flatten = tf.keras.layers.Flatten # The flatten layer that turns its multi-dimensional input (height x width x channels) into a 1-dimensional output.
  Dense = functools.partial(tf.keras.layers.Dense, activation='relu') # The dense (fully-connected) layer for final computations and classification.

  model = tf.keras.Sequential([
    # All input images have a shape of 64 x 64 in RGB format.

    # First convolutional layer with 12 feature maps and 5x5 kernels.
    Conv2D(filters=1*n_filters, kernel_size=5,  strides=2),
    BatchNormalization(),

    # Second convolutional layer with 24 feature maps and 5x5 kernels.
    Conv2D(filters=2*n_filters, kernel_size=5,  strides=2),
    BatchNormalization(),

    # Third convolutional layer with 48 feature maps and 3x3 kernels.
    Conv2D(filters=4*n_filters, kernel_size=3,  strides=2),
    BatchNormalization(),

    # Final convolutional layer with 72 feature maps and 3x3 kernels.
    Conv2D(filters=6*n_filters, kernel_size=3,  strides=2),
    BatchNormalization(),

    # All convolutional layers used the ReLU non-linear activation function and a batch normalization.

    Flatten(), # Take the final convolutional output and turn it into one dimension.
    Dense(512), # Pass the vector through a dense layer with 512 neurons.
    Dense(n_outputs, activation=None), # Use one neuron to output the classification as a binary value.
  ])
  return model

standard_cnn_classifier = make_cnn_standard_classifier()
standard_cnn_classifier.predict(images[[0]]) # Initialize the model by passing some data through it.
print(standard_cnn_classifier.summary()) # Print the model summary for all of our model's layers.

Now let's train the standard CNN that we just defined. Remember to rerun the above code cell before retraining with new parameters to reinitialize the model. If you do not do this, the new training you perform will be added onto the previous training.

In [None]:
# Training hyperparameters - experiment to find what reaches the highest accuracy.
params = dict(
  batch_size = 64, # Number of random training examples fed in at one time.
  learning_rate = 5e-3, # Learning rate for the optimizer.
)
# Find other optimizers to try at https://www.tensorflow.org/api_docs/python/tf/keras/optimizers.
optimizer = tf.keras.optimizers.Adagrad(params["learning_rate"]) # Define our optimizer with our set learning rate.

loss_history = mdl.util.LossHistory(smoothing_factor=0.99) # Create a list to track our loss during the training process.
plotter = mdl.util.PeriodicPlotter(sec=2, xlabel='Iterations', ylabel='Loss') # Create a graph to plot the loss during the training process.
if hasattr(tqdm, '_instances'): tqdm._instances.clear() # Clear any previous progress bars if they exist.

@tf.function
def standard_train_step(x, y):
  with tf.GradientTape() as tape:
    # Feed the given images into the model.
    logits = standard_cnn_classifier(x)
    # Compute the loss based on these predictions and the actual expected outputs.
    loss = tf.nn.sigmoid_cross_entropy_with_logits(labels=y, logits=logits)

  # Backpropagation by computing the gradient on our loss and applying it.
  grads = tape.gradient(loss, standard_cnn_classifier.trainable_variables) # Compute the gradient of the loss with respect to the model's trainable variables (its weights).
  optimizer.apply_gradients(zip(grads, standard_cnn_classifier.trainable_variables)) # Use the optimizer to update each trainable variable with its gradient.

  return loss # Return our loss for graphing purposes.

for _ in tqdm(range(loader.get_train_size() // params["batch_size"])): # Compute number of possible batches that can be made from our dataset and train for every possible batch.
  x, y = loader.get_batch(params["batch_size"]) # Get a batch of training data (images and labels).
  loss = standard_train_step(x, y) # Compute the loss for these inputs.

  loss_history.append(loss.numpy().mean()) # Add the loss to the loss_history list.
  plotter.plot(loss_history.get()) # Graph the new loss_history list for progress tracking.

print('\nFinal loss after training:', loss.numpy().mean())

### Evaluate performance of the standard CNN

Next, let's evaluate the classification performance of our CelebA-trained standard CNN on the training dataset.


In [None]:
(batch_x, batch_y) = loader.get_batch(5000) # Get a batch of 5,000 images and labels to see the training accuracy of our model.
y_pred_standard = tf.round(tf.nn.sigmoid(standard_cnn_classifier.predict(batch_x))) # Getting predictions and converting them to simple binary format.
acc_standard = tf.reduce_mean(tf.cast(tf.equal(batch_y, y_pred_standard), tf.float32)) # Accuracy is the fraction of predictions that match the expected labels.

print("Standard CNN accuracy on the training set: {:.4f}".format(acc_standard.numpy()))

We will also evaluate our networks on an independent test dataset containing faces that were not seen during training. For the test data, we'll look at the classification accuracy across four different demographics, based on the Fitzpatrick skin scale and sex-based labels: dark-skinned male, dark-skinned female, light-skinned male, and light-skinned female.

Let's take a look at some sample faces in the test set.

In [None]:
test_faces = mdl.lab2.get_test_faces() # Get 5 faces from each of the below keys.
keys = ["Light Females", "Light Males", "Dark Females", "Dark Males"]
for group, key in zip(test_faces, keys): # For each key and its images, display them together.
  plt.figure(figsize=(5,5))
  plt.imshow(np.hstack(group))
  plt.title(key, fontsize=15)

Now, let's evaluate the probability of each of these face demographics being classified as a face using the standard CNN classifier we've just trained.

In [None]:
standard_cnn_classifier_logits = [standard_cnn_classifier(np.array(x, dtype=np.float32)) for x in test_faces] # Get the predictions for these faces from our model.
standard_cnn_classifier_probs = tf.squeeze(tf.sigmoid(standard_cnn_classifier_logits)) # Turn the logits into probabilities of being a face.

# Plot the mean predicted face probability for each demographic.
xx = range(len(keys))
yy = standard_cnn_classifier_probs.numpy().mean(1)
plt.bar(xx, yy)
plt.xticks(xx, keys)
plt.ylim(max(0, yy.min() - np.ptp(yy) / 2.), yy.max() + np.ptp(yy) / 2.)
plt.title("Standard CNN Classifier Predictions")

Take a look at the accuracies for this first model across these four groups. What do you observe? Would you consider this model biased or unbiased? What are some reasons why a trained model may have biased accuracies?
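
One simple way to put a number on what you see in the bar chart is to compare the best and worst scoring groups. The sketch below is only an illustration: `accuracy_gap` is a hypothetical helper, and the scores are made-up placeholders, not results from this lab.

```python
def accuracy_gap(group_scores):
    """Return the best group, the worst group, and the gap between their scores."""
    best = max(group_scores, key=group_scores.get)
    worst = min(group_scores, key=group_scores.get)
    return best, worst, group_scores[best] - group_scores[worst]

# Made-up placeholder scores for four anonymous groups (not measured values).
example_scores = {"Group A": 0.90, "Group B": 0.85, "Group C": 0.70, "Group D": 0.80}

best, worst, gap = accuracy_gap(example_scores)
print(f"Best: {best}, worst: {worst}, gap: {gap:.2f}")
```

A gap close to zero suggests the model treats the groups similarly, while a large gap is a sign that it may be biased. You could pass in a dictionary built from `keys` and the plotted values to try this on your own model.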

## 2.3 Mitigating algorithmic bias

Imbalances in the training data can result in unwanted algorithmic bias. For example, the majority of faces in CelebA (our training set) are those of light-skinned females. As a result, a classifier trained on CelebA will be better suited at recognizing and classifying faces with features similar to these, and will thus be biased.

How could we overcome this? A naive solution -- and one that is being adopted by many companies and organizations -- would be to annotate different subclasses (i.e., light-skinned females, males with hats, etc.) within the training data, and then manually even out the data with respect to these groups.

But this approach has two major disadvantages. First, it requires annotating massive amounts of data, which is not scalable. Second, it requires that we know what potential biases (e.g., race, gender, pose, occlusion, hats, glasses, etc.) to look for in the data. As a result, manual annotation may not capture all the different features that are imbalanced within the training data.

Instead, let's actually **learn** these features in an unbiased, unsupervised manner, without the need for any annotation, and then train a classifier fairly with respect to these features. In the rest of this lab, we'll do exactly that.

## 2.4 Variational autoencoder (VAE) for learning latent structure

As you saw, the accuracy of the CNN varies across the four demographics we looked at. To think about why this may be, consider the dataset the model was trained on, CelebA. If certain features, such as dark skin or hats, are *rare* in CelebA, the model may end up biased against these as a result of training with a biased dataset. That is to say, its classification accuracy will be worse on faces that have under-represented features, such as dark-skinned faces or faces with hats, relative to faces with features well-represented in the training data! This is a problem.

Our goal is to train a *debiased* version of this classifier -- one that accounts for potential disparities in feature representation within the training data. Specifically, to build a debiased facial classifier, we'll train a model that **learns a representation of the underlying latent space** of the face training data. The model then uses this information to mitigate unwanted biases by sampling faces with rare features, like dark skin or hats, *more frequently* during training. The key design requirement for our model is that it can learn an *encoding* of the latent features in the face data in an entirely *unsupervised* way. To achieve this, we'll turn to variational autoencoders (VAEs).

![The concept of a VAE](https://i.ibb.co/3s4S6Gc/vae.jpg)

As shown in the schematic above and in Lecture 4, VAEs rely on an encoder-decoder structure to learn a latent representation of the input data. In the context of computer vision, the encoder network takes in input images, encodes them into a series of variables defined by a mean and standard deviation, and then draws from the distributions defined by these parameters to generate a set of sampled latent variables. The decoder network then "decodes" these variables to generate a reconstruction of the original image, which is used during training to help the model identify which latent variables are important to learn.

Let's formalize two key aspects of the VAE model and define relevant functions for each.


### Understanding VAEs: loss function

In practice, how can we train a VAE? In learning the latent space, we constrain the means and standard deviations to approximately follow a unit Gaussian. Recall that these are learned parameters, and therefore must factor into the loss computation, and that the decoder portion of the VAE is using these parameters to output a reconstruction that should closely match the input image, which also must factor into the loss. What this means is that we'll have two terms in our VAE loss function:

1.  **Latent loss ($L_{KL}$)**: measures how closely the learned latent variables match a unit Gaussian and is defined by the Kullback-Leibler (KL) divergence.
2.   **Reconstruction loss ($L_{x}{(x,\hat{x})}$)**: measures how accurately the reconstructed outputs match the input and is given by the $L^1$ norm of the difference between the input image and its reconstructed output.

The equation for the latent loss is provided by:

$$L_{KL}(\mu, \sigma) = \frac{1}{2}\sum_{j=0}^{k-1} (\sigma_j + \mu_j^2 - 1 - \log{\sigma_j})$$

The equation for the reconstruction loss is provided by:

$$L_{x}{(x,\hat{x})} = ||x-\hat{x}||_1$$

Thus for the VAE loss we have:

$$L_{VAE} = c\cdot L_{KL} + L_{x}{(x,\hat{x})}$$

where $c$ is a weighting coefficient used for regularization. Now we're ready to define our VAE loss function:

In [None]:
def vae_loss_function(x, x_recon, mu, logsigma, kl_weight=0.0005):
  '''Inputs:
  * an input - x,
  * reconstructed output - x_recon,
  * encoded means - mu,
  * encoded log of standard deviation - logsigma,
  * weight parameter for the latent loss - kl_weight

  Outputs:
  * loss value - vae_loss
  '''

  # TODO: Define the latent loss. Note this is given in the first equation in the text block above.
  latent_loss = 0.5 * tf.reduce_sum(tf.exp(logsigma) + tf.square(mu) - 1.0 - logsigma, axis=1)
  # latent_loss = # TODO

  # TODO: Define the reconstruction loss as the mean absolute pixel-wise
  # difference between the input and reconstruction. Hint: you'll need to
  # use tf.reduce_mean, and supply an axis argument which specifies which
  # dimensions to reduce over. For example, reconstruction loss needs to
  # average over the height, width, and channel image dimensions.
  # https://www.tensorflow.org/api_docs/python/tf/math/reduce_mean
  reconstruction_loss = tf.reduce_mean(tf.abs(x - x_recon), axis=(1, 2, 3))
  # reconstruction_loss = # TODO

  # TODO: Define the VAE loss. Note this is given in the equation for L_VAE in the text block directly above.
  vae_loss = kl_weight * latent_loss + reconstruction_loss
  # vae_loss = # TODO

  return vae_loss

Great! Now that we have a more concrete sense of how VAEs work, let's explore how we can leverage this network structure to train a *debiased* facial classifier.
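
Before moving on, it can help to check the two VAE loss terms by hand on tiny inputs. The following sketch uses plain Python lists instead of tensors and treats a three-value list as a stand-in for an image, which is an assumption made only to keep the example small. It follows the same formulas as `vae_loss_function`, with $\sigma_j$ computed as `exp(logsigma_j)` and the reconstruction loss taken as a mean absolute difference.

```python
import math

def latent_loss(mu, logsigma):
    """Latent (KL) loss from the formula above, with sigma_j = exp(logsigma_j)."""
    return 0.5 * sum(math.exp(ls) + m ** 2 - 1.0 - ls for m, ls in zip(mu, logsigma))

def reconstruction_loss(x, x_recon):
    """Mean absolute difference between the input and its reconstruction."""
    return sum(abs(a - b) for a, b in zip(x, x_recon)) / len(x)

def vae_loss(x, x_recon, mu, logsigma, kl_weight=0.0005):
    """Weighted sum of the latent loss and the reconstruction loss."""
    return kl_weight * latent_loss(mu, logsigma) + reconstruction_loss(x, x_recon)

# A latent distribution that is exactly a unit Gaussian has zero latent loss.
print(latent_loss([0.0, 0.0], [0.0, 0.0]))

# Moving the means away from zero is penalized.
print(latent_loss([1.0, -1.0], [0.0, 0.0]))

# Made-up "image" values: a small reconstruction error plus a small latent penalty.
print(vae_loss([0.0, 0.5, 1.0], [0.1, 0.5, 0.8], [1.0, -1.0], [0.0, 0.0]))
```

Because `kl_weight` is small, the reconstruction term dominates in this example, which is the trade-off that the coefficient $c$ controls.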

### Understanding VAEs: reparameterization

As you may recall from lecture, VAEs use a "reparameterization trick" for sampling learned latent variables. Instead of the VAE encoder generating a single vector of real numbers for each latent variable, it generates a vector of means and a vector of standard deviations that are constrained to roughly follow Gaussian distributions. We use Gaussian distributions because they lead to computationally efficient training and provide a good reference distribution that the model should aim to stay close to. Here is an example of the standard Gaussian distribution, which we will use for our model:

<img src="https://i.ibb.co/chb3CwFx/Normal-Gaussian-Distribution-Example-for-Lab-2-2.png">

We then sample noise $\epsilon$ from a standard Gaussian, scale it by the standard deviations, and add the mean to obtain our sampled latent vector. Formalizing this for a latent variable $z$ where we sample $\epsilon \sim \mathcal{N}(0, I)$, we have:

$$z = \mu + e^{\left(\frac{1}{2} \cdot \log{\Sigma}\right)}\circ \epsilon$$

where $\mu$ is the mean and $\Sigma$ is the covariance matrix. This is useful because it lets us neatly define the loss function for the VAE, generate randomly sampled latent variables, achieve improved network generalization, and make our complete VAE network differentiable so that it can be trained via backpropagation. Quite powerful!
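
As a rough check of the formula for $z$, the sketch below draws many reparameterized samples in plain Python and compares their empirical mean and spread with the values the formula predicts. The two-dimensional latent space, the specific numbers, and the helper name `sample_latent` are all assumptions chosen for illustration. Note that the log term plays the role of $\log{\Sigma}$, so the scale applied to the noise is `exp(0.5 * logsigma)`.

```python
import math
import random

def sample_latent(z_mean, z_logsigma, rng):
    """Reparameterized sample: z = mu + exp(0.5 * logsigma) * epsilon."""
    z = []
    for mu, logsigma in zip(z_mean, z_logsigma):
        epsilon = rng.gauss(0.0, 1.0)  # noise drawn independently of mu and logsigma
        z.append(mu + math.exp(0.5 * logsigma) * epsilon)
    return z

rng = random.Random(0)             # fixed seed so the run is repeatable
z_mean = [2.0, -1.0]               # made-up means
z_logsigma = [0.0, math.log(4.0)]  # made-up values; the noise scales are 1.0 and 2.0

samples = [sample_latent(z_mean, z_logsigma, rng) for _ in range(20000)]

empirical_means, empirical_scales = [], []
for j in range(len(z_mean)):
    values = [sample[j] for sample in samples]
    mean = sum(values) / len(values)
    scale = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    empirical_means.append(mean)
    empirical_scales.append(scale)

print("Empirical means:", [round(m, 2) for m in empirical_means])
print("Empirical scales:", [round(s, 2) for s in empirical_scales])
```

The randomness lives entirely in `epsilon`, so `z` is a deterministic function of the mean and log term once the noise is drawn. That is what allows gradients to flow back to the encoder outputs.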

Let's define a function to implement the VAE sampling operation:

In [None]:
def sampling(z_mean, z_logsigma):
  '''Performing the reparameterization trick by sampling from a standard Gaussian distribution.

  Inputs:
  * mean of latent distribution - z_mean,
  * log of standard deviation of latent distribution - z_logsigma

  Outputs:
  * sampled latent vector - z
  '''

  batch, latent_dim = z_mean.shape # Get the batch size and latent space dimensionality from the mean.
  epsilon = tf.random.normal(shape=(batch, latent_dim)) # Use these sizes to create a standard Gaussian distribution for sampling.

  # TODO: Define the reparameterization computation. Note the equation is given in the text block immediately above.
  z = z_mean + tf.math.exp(0.5 * z_logsigma) * epsilon
  # z = # TODO
  return z

## 2.5 Debiasing variational autoencoder (DB-VAE)

Now, we'll use the general idea behind the VAE architecture to build a model, termed a [*debiasing variational autoencoder*](https://introtodeeplearning.com/AAAI_MitigatingAlgorithmicBias.pdf) or DB-VAE, to mitigate (potentially) unknown biases present within the training data. We'll train our DB-VAE model on the facial detection task, run the debiasing operation during training, evaluate on the test dataset, and compare its accuracy to our original, biased CNN model.

### The DB-VAE model

The key idea behind this debiasing approach is to use the latent variables learned via a VAE to adaptively re-sample the CelebA data during training. Specifically, we will alter the probability that a given image is used during training based on how often its latent features appear in the dataset. So, faces with rarer features (like dark skin, sunglasses, or hats) should become more likely to be sampled during training, while the sampling probability for faces with features that are over-represented in the training dataset should decrease (relative to uniform random sampling across the training data).
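
To illustrate the re-sampling idea, here is a minimal sketch for a single latent variable. It is one possible way to turn "rarer features get sampled more often" into numbers, assuming a histogram-based density estimate and a small smoothing constant. The function name, the bin count, and the latent values are all hypothetical and are not taken from the lab's implementation.

```python
def resampling_probabilities(latent_values, n_bins=5, smoothing=0.01):
    """Give images in sparsely populated histogram bins a higher sampling probability."""
    lowest, highest = min(latent_values), max(latent_values)
    bin_width = (highest - lowest) / n_bins
    if bin_width == 0:
        bin_width = 1.0  # all values identical: everything lands in the first bin

    def bin_index(value):
        # Clamp so that the largest value falls into the last bin.
        return min(int((value - lowest) / bin_width), n_bins - 1)

    counts = [0] * n_bins
    for value in latent_values:
        counts[bin_index(value)] += 1

    n_images = len(latent_values)
    # Weight each image by the inverse of its (smoothed) bin frequency.
    weights = [1.0 / (counts[bin_index(v)] / n_images + smoothing) for v in latent_values]
    total = sum(weights)
    return [w / total for w in weights]

# Made-up latent values: seven images with a common feature value, one with a rare value.
latent_values = [0.00, 0.10, 0.05, 0.12, 0.08, 0.02, 0.11, 0.90]
probabilities = resampling_probabilities(latent_values)

for value, probability in zip(latent_values, probabilities):
    print(f"latent value {value:.2f} -> sampling probability {probability:.3f}")
```

In this toy example the single image with the rare latent value receives a much higher sampling probability than each of the seven common ones, while the probabilities still sum to one. The smoothing constant keeps the weights finite and limits how aggressively rare images are up-weighted.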

A general schematic of the DB-VAE approach is shown here:

![DB-VAE](https://raw.githubusercontent.com/MITDeepLearning/introtodeeplearning/2019/lab2/img/DB-VAE.png)

Recall that we want to apply our DB-VAE to a *supervised classification* problem -- the facial detection task. Importantly, note how the encoder portion in the DB-VAE architecture also outputs a single supervised variable, $z_o$, corresponding to the class prediction -- face or not face. Usually, VAEs are not trained to output any supervised variables (such as a class prediction)! This is another key distinction between the DB-VAE and a traditional VAE.

Keep in mind that we only want to learn the latent representation of *faces*, as that's what we're ultimately debiasing against, even though we are training a model on a binary classification problem. We'll need to ensure that, **for faces**, our DB-VAE model both learns a representation of the unsupervised latent variables, captured by the distribution $q_\phi(z|x)$, **and** outputs a supervised class prediction $z_o$, but that, **for negative examples**, it only outputs a class prediction $z_o$.

### Defining the DB-VAE loss function

This means we'll need to be a bit clever about the loss function for the DB-VAE. The form of the loss will depend on whether it's a face image or a non-face image that's being considered.

For **face images**, our loss function will have two components:


1.   **VAE loss ($L_{VAE}$)**: consists of the latent loss and the reconstruction loss.
2.   **Classification loss ($L_y(y,\hat{y})$)**: standard cross-entropy loss for a binary classification problem.

In contrast, for images of **non-faces**, our loss function is solely the classification loss.

We can write a single expression for the loss by defining an indicator variable ${I}_f$ which reflects which training data are images of faces (${I}_f(y) = 1$) and which are images of non-faces (${I}_f(y) = 0$). Using this, we obtain:

$$L_{total} = L_y(y,\hat{y}) + {I}_f(y)\Big[L_{VAE}\Big]$$
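
As a rough numeric illustration of how the indicator masks the VAE term, consider the plain-Python sketch below. The per-image VAE losses are made-up numbers, the helper names are hypothetical, and averaging over the batch is an assumption of this sketch. The cross-entropy uses the standard numerically stable form for a label in {0, 1} and a raw logit.

```python
import math

def sigmoid_cross_entropy(label, logit):
    """Numerically stable binary cross-entropy computed directly from a logit."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))

def total_loss(labels, logits, vae_losses):
    """Average of classification loss plus VAE loss, where the VAE term counts only for faces."""
    per_example = []
    for y, logit, l_vae in zip(labels, logits, vae_losses):
        face_indicator = 1.0 if y == 1 else 0.0
        per_example.append(sigmoid_cross_entropy(y, logit) + face_indicator * l_vae)
    return sum(per_example) / len(per_example)

labels = [1, 0]          # one face, one non-face
logits = [0.0, 0.0]      # an undecided classifier: probability 0.5 for both
vae_losses = [0.3, 0.7]  # made-up VAE losses; the second is ignored because y = 0

loss = total_loss(labels, logits, vae_losses)
print(loss)
```

Both images contribute a classification loss of $\log 2$, but only the face contributes its VAE loss of 0.3, so the batch average is $\log 2 + 0.15$.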

Let's write a function to define the DB-VAE loss function:


In [None]:
def debiasing_loss_function(x, x_pred, y, y_logit, mu, logsigma):
  '''Loss function for the DB-VAE model.

  Inputs:
  * true input - x
  * reconstructed input - x_pred
  * true label (face or not face) - y
  * predicted labels - y_logit
  * mean of latent distribution - mu
  * log of standard deviation of latent distribution - logsigma

  Outputs:
  * total loss of the DB-VAE - total_loss
  * classification loss of the DB-VAE - classification_loss
  '''

  # TODO: Call the relevant function with the correct inputs to get the VAE loss.
  vae_loss = vae_loss_function(x, x_pred, mu, logsigma)
  # vae_loss = vae_loss_function('''TODO''') # TODO

  # TODO: Define the classification loss with sigmoid_cross_entropy.
  # https://www.tensorflow.org/api_docs/python/tf/nn/sigmoid_cross_entropy_with_logits
  # Sigmoid cross entropy is perfect for our binary classification task as the sigmoid squashes all values to be between 0 and 1.
  classification_loss = tf.nn.sigmoid_cross_entropy_with_logits(labels=y, logits=y_logit)
  # classification_loss = # TODO

  # Use the training data labels to create an indicator that marks the images that are faces.
  face_indicator = tf.cast(tf.equal(y, 1), tf.float32)

  # TODO: Define the DB-VAE's total loss. Use tf.reduce_mean to average over all data samples.
  total_loss = tf.reduce_mean(
      classification_loss + # The classification loss
      face_indicator * vae_loss # The VAE loss, but only for the images of faces.
  )
  # total_loss = # TODO

  return total_loss, classification_loss

### DB-VAE architecture

Now we're ready to define the DB-VAE architecture. To build the DB-VAE, we will use the standard CNN classifier from above as our encoder, and then define a decoder network. We will create and initialize the two models, and then construct the end-to-end VAE. We will use a latent space with 100 latent variables.

The decoder network will take as input the sampled latent variables, run them through a series of deconvolutional layers, and output a reconstruction of the original input image.
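
The shapes inside the decoder can be traced with a few lines of plain Python. This sketch assumes that each transposed convolution uses `strides=2` with `padding='same'`, so it doubles the height and width, and that the feature-map counts follow the decoder defined in the next code cell.

```python
n_filters = 12  # assumed base number of filters

# The dense layer's output is reshaped into 4 x 4 feature maps.
height = width = 4
channels = 6 * n_filters
dense_units = height * width * channels

# Assumed number of feature maps after each transposed convolution.
layer_filters = [4 * n_filters, 2 * n_filters, 1 * n_filters, 3]

shapes = [(height, width, channels)]
for filters in layer_filters:
    height, width = height * 2, width * 2  # stride 2 with 'same' padding doubles the size
    shapes.append((height, width, filters))

print("Dense units:", dense_units)
for shape in shapes:
    print(shape)
```

Under these assumptions the decoder mirrors the encoder: it starts from 4 x 4 x 72 and ends at 64 x 64 x 3, the same shape as the input images.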

In [None]:
n_filters = 12 # Factor for the number of convolutional filters/feature maps in our model.
latent_dim = 100 # Dimensionality of the latent space, represents the amount of compression our images will go through.

def make_face_decoder_network():
  # Use functools.partial (if needed) to take a built-in function and set some parameters to have a different default value.
  Conv2DTranspose = functools.partial(tf.keras.layers.Conv2DTranspose, padding='same', activation='relu') # The convolutional transposition layer that performs deconvolutions on its inputs.
  BatchNormalization = tf.keras.layers.BatchNormalization # The batch normalization layer (instead of pooling) that puts all values at a similar scale.
  Flatten = tf.keras.layers.Flatten # The flatten layer that turns its multi-dimensional input (height x width x channels) into a 1-dimensional output.
  Dense = functools.partial(tf.keras.layers.Dense, activation='relu') # The dense (fully-connected) layer for final computations and classification.
  Reshape = tf.keras.layers.Reshape # The reshape layer that converts its inputs to a different shape or dimensionality.

  # All convolutional transposition layers use the ReLU non-linear activation function.
  decoder = tf.keras.Sequential([
    Dense(units=4*4*6*n_filters), # Expand the 100-dimensional latent space into a more representative 1-dimensional feature map.
    Reshape(target_shape=(4, 4, 6*n_filters)), # Change the shape of those outputs to set each feature map at size 4 x 4.

    # Using deconvolution to decrease the number of feature maps and reach an actual generated output.
    Conv2DTranspose(filters=4*n_filters, kernel_size=3, strides=2), # Scale to 8 x 8 size with 48 feature maps and kernel size 3 x 3.
    Conv2DTranspose(filters=2*n_filters, kernel_size=3, strides=2), # Scale to 16 x 16 size with 24 feature maps and kernel size 3 x 3.
    Conv2DTranspose(filters=1*n_filters, kernel_size=5, strides=2), # Scale to 32 x 32 size with 12 feature maps and kernel size 5 x 5 (capture more structure).
    Conv2DTranspose(filters=3, kernel_size=5, strides=2) # Scale to 64 x 64 size with 3 feature maps (RGB) and kernel size 5 x 5.
  ])

  return decoder

Now, we will put this decoder together with the standard CNN classifier as our encoder to define the DB-VAE. Note that at this point, there is nothing special about how we put the model together that makes it a "debiasing" model -- that will come when we define the training operation. Here, we will define the core VAE architecture by subclassing the `Model` class; defining encoding, reparameterization, and decoding operations; and calling the network end-to-end.
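
The encoder in the class below produces `2*latent_dim + 1` values per image, which are then split into a classification logit, the latent means, and the latent log standard deviations. Here is a minimal sketch of that split on a plain Python list, assuming a tiny `latent_dim` of 3 and made-up output values so the slices are easy to read (the lab itself uses 100 latent variables).

```python
latent_dim = 3  # small illustrative value, not the lab's setting

# Made-up encoder output for one image: 1 logit + 3 means + 3 log standard deviations.
encoder_output = [0.7, 0.1, 0.2, 0.3, -1.0, -2.0, -3.0]
assert len(encoder_output) == 2 * latent_dim + 1

y_logit = encoder_output[0]                  # first value: face / not-face logit
z_mean = encoder_output[1:latent_dim + 1]    # next latent_dim values: means
z_logsigma = encoder_output[latent_dim + 1:] # remaining latent_dim values: log standard deviations

print("y_logit:", y_logit)
print("z_mean:", z_mean)
print("z_logsigma:", z_logsigma)
```

The same slicing pattern applied along the second axis of a batch of encoder outputs gives one logit and two `latent_dim`-sized vectors per image.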

In [None]:
class DB_VAE(tf.keras.Model):
  def __init__(self, latent_dim):
    super(DB_VAE, self).__init__()
    self.latent_dim = latent_dim # Take an input of latent_dim to define the model's latent dimensionality.

    # Define the number of outputs for the encoder: a mean and a log-sigma for each of the latent_dim latent variables, plus one classification logit.
    num_encoder_dims = 2*self.latent_dim + 1

    self.encoder = make_cnn_standard_classifier(num_encoder_dims) # Define our encoder with the set dimensionality using the standard CNN.
    self.decoder = make_face_decoder_network() # Define our decoder using the new function we just created.

  def encode(self, x):
    # Get the output of the encoder based on its inputs.
    encoder_output = self.encoder(x)
    # Get the actual classifications based on those outputs.
    y_logit = tf.expand_dims(encoder_output[:, 0], -1)

    # Get the mean and log of standard deviation for the latent space.
    z_mean = encoder_output[:, 1:self.latent_dim+1]
    z_logsigma = encoder_output[:, self.latent_dim+1:]

    return y_logit, z_mean, z_logsigma

  def reparameterize(self, z_mean, z_logsigma):
    # TODO: Call the sampling function defined previously.
    z = sampling(z_mean, z_logsigma) # Given the latent space's mean and log of standard deviation, sample some latent variables.
    # z = # TODO

    return z

  def decode(self, z):
    # TODO: Use the decoder to take the reparameterized samples and output the reconstruction.
    reconstruction = self.decoder(z) # Pass the samples through the decoder, get the reconstruction.
    # reconstruction = # TODO

    return reconstruction

  # The call function passes an input of x all the way through the DB-VAE.
  def call(self, x):
    # Encode the input to the latent space and get a prediction.
    y_logit, z_mean, z_logsigma = self.encode(x)

    # TODO: Perform reparameterization on the latent space.
    z = self.reparameterize(z_mean, z_logsigma)
    # z = # TODO

    # TODO: Find the reconstruction based on the reparameterized samples.
    recon = self.decode(z)
    # recon = # TODO
    return y_logit, z_mean, z_logsigma, recon # Return the predictions, latent space, and reconstruction.

  # Based on an input of x, predict if it is a face or not.
  def predict(self, x):
    y_logit, z_mean, z_logsigma = self.encode(x) # Use the encoder just like a CNN.
    return y_logit # Return only the prediction without the latent space.

dbvae = DB_VAE(latent_dim) # Define our DB-VAE with the set latent dimensionality from before (should be 100).

As stated, the encoder architecture is identical to the CNN from earlier in this lab. Note the outputs of our constructed DB_VAE model in the `call` function: `y_logit, z_mean, z_logsigma, recon`. Think carefully about why each of these is returned and what its significance is to the problem at hand.
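
To make the layout of the encoder output concrete, here is a minimal NumPy sketch of the slicing performed in `encode`. The toy `latent_dim`, the batch size, and the stand-in `encoder_output` array are assumptions chosen for illustration; only the slicing pattern mirrors the class above.

```python
import numpy as np

latent_dim = 3  # toy value for illustration; the lab uses a much larger latent space
batch = 2

# Stand-in for the encoder output: one row per image, 2*latent_dim + 1 columns.
encoder_output = np.arange(batch * (2 * latent_dim + 1), dtype=np.float32).reshape(batch, -1)

# Column 0 is the classification logit; slicing with 0:1 keeps a trailing axis.
y_logit = encoder_output[:, 0:1]
# Columns 1..latent_dim hold the latent means.
z_mean = encoder_output[:, 1:latent_dim + 1]
# The remaining latent_dim columns hold the latent log-sigma terms.
z_logsigma = encoder_output[:, latent_dim + 1:]

print(y_logit.shape, z_mean.shape, z_logsigma.shape)  # (2, 1) (2, 3) (2, 3)
```

The three slices are disjoint and together cover every column, which is why the encoder needs exactly `2*latent_dim + 1` outputs.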



### Adaptive resampling for automated debiasing with DB-VAE

So, how can we actually use DB-VAE to train a debiased facial detection classifier?

Recall the DB-VAE architecture: as input images are fed through the network, the encoder learns an estimate ${Q}(z|X)$ of the latent space. We want to increase the relative frequency of rare data by increased sampling of under-represented regions of the latent space. We can approximate ${Q}(z|X)$ using the frequency distributions of each of the learned latent variables, and then define the probability distribution of selecting a given datapoint $x$ based on this approximation. These probability distributions will be used during training to re-sample the data.

You'll write a function to execute this update of the sampling probabilities, and then call this function within the DB-VAE training loop to actually debias the model.
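
Before writing the full function, it may help to see the idea on a single latent variable. The sketch below uses synthetic one-dimensional "latent means" (90 common samples and 10 rare ones, an assumption made purely for illustration) and assigns each sample a probability inversely proportional to the frequency of its histogram bin.

```python
import numpy as np

# Synthetic 1-D latent means: 90 samples near 0 (common) and 10 near 5 (rare).
rng = np.random.default_rng(0)
latent = np.concatenate([rng.normal(0.0, 0.3, 90), rng.normal(5.0, 0.3, 10)])

# Frequency distribution of the latent variable.
counts, edges = np.histogram(latent, bins=5)
freq = counts / counts.sum()

# Zero-based bin index of every sample (clip so the maximum value lands in the last bin).
bin_idx = np.clip(np.digitize(latent, edges) - 1, 0, len(counts) - 1)

# Sampling probability inversely proportional to how common each sample's bin is.
p = 1.0 / freq[bin_idx]
p = p / p.sum()

print("mean p, common samples:", p[:90].mean())
print("mean p, rare samples:  ", p[90:].mean())
```

Samples from the sparse region end up with a larger selection probability than samples from the dense region, which is the behavior the re-sampling step relies on.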

First, we've defined a short helper function `get_latent_mu` that returns the latent variable means produced by the encoder for a set of images, which it passes through the network in batches:

In [None]:
# Get the mean of the latent space.
def get_latent_mu(images, dbvae, latent_dim, batch_size=1024):
  '''Inputs:
  * group of images - images
  * model to pass images through - dbvae
  * latent space dimensionality - latent_dim
  * number of images randomly selected at one time - batch_size

  Outputs:
  * mean of the latent space for our given images - mu
  '''

  N = images.shape[0] # Find the number of images provided.
  mu = np.zeros((N, latent_dim)) # Initialize the mean as a multi-dimensional array of zeros.

  for start_ind in range(0, N, batch_size): # Loop through every possible batch in our provided images.
    end_ind = min(start_ind+batch_size, N) # The loop provides the starting index; cap the ending index at N so the last batch may be shorter.
    batch = (images[start_ind:end_ind]).astype(np.float32)/255. # Convert the batch to float32 and scale pixel values from [0, 255] to [0, 1].
    _, batch_mu, _ = dbvae.encode(batch) # Find the latent space mean for our batch by using our DB-VAE encoder.
    mu[start_ind:end_ind] = batch_mu # Add the batch mean to our overall mean variable.

  return mu # Return the entire latent space mean after all batches are done.
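
The batching pattern used in this helper can be checked in isolation. The sketch below is plain Python; `batch_bounds` is a hypothetical name introduced only for this illustration.

```python
def batch_bounds(n_items, batch_size):
    """Return (start, end) index pairs that cover range(n_items) in order."""
    return [(start, min(start + batch_size, n_items))
            for start in range(0, n_items, batch_size)]

# Ten items in batches of four: the final batch is simply shorter.
print(batch_bounds(10, 4))  # [(0, 4), (4, 8), (8, 10)]
```

Because each `end` equals the next `start`, every image is encoded exactly once.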

Now, let's define the actual resampling algorithm `get_training_sample_probabilities`. Importantly, note how the histogram is treated before it is inverted. A smoothing constant (`smoothing_fac`) added to the histogram would tune the degree of debiasing; the implementation below adds none, which corresponds to `smoothing_fac=0`: the re-sampled training set will tend towards falling uniformly over the latent space, i.e., the most extreme debiasing.
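
To see how a smoothing constant changes the degree of debiasing, here is a small sketch under stated assumptions: `inverse_frequency_weights` is a hypothetical helper that is not part of the lab code, the smoothing is assumed to be a constant added to the normalized histogram, and the bin counts are synthetic.

```python
import numpy as np

def inverse_frequency_weights(counts, smoothing_fac):
    """Per-bin sampling weight, inversely proportional to the smoothed bin frequency."""
    density = np.asarray(counts, dtype=np.float64)
    density = density / density.sum()      # bin frequencies
    smoothed = density + smoothing_fac     # additive smoothing (assumed form)
    smoothed = smoothed / smoothed.sum()
    weights = 1.0 / smoothed
    return weights / weights.sum()

counts = [90, 9, 1]  # synthetic histogram: one common bin and two rare bins
for fac in (0.0, 0.01, 1.0):
    print(fac, np.round(inverse_frequency_weights(counts, fac), 3))
```

With no smoothing, a sample from the rarest bin is weighted 90 times more heavily than one from the most common bin, so every bin receives the same total sampling mass. As the constant grows, the weights approach one another and the re-sampling approaches ordinary uniform sampling over images.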

In [None]:
def get_training_sample_probabilities(images, dbvae, latent_dim, bins=10):
  '''Function that recomputes the latent space sampling probabilities for the images within a given batch based on their feature distribution in the training data.

  Inputs:
  * group of images - images
  * model to pass images through - dbvae
  * latent space dimensionality - latent_dim
  * number of histogram bins used for each latent variable - bins

  Outputs:
  * new sampling probabilities - training_sample_p
  '''

  print("Recomputing the sampling probabilities")

  # TODO: Get the mean of the latent space for our given images.
  mu = get_latent_mu(images, dbvae, latent_dim) # Remember to use the correct inputs for the function.
  # mu = get_latent_mu('''TODO''') # TODO

  training_sample_p = np.zeros(mu.shape[0]) # Initialize the sampling probabilities with one entry per image.

  for i in range(latent_dim): # Loop through every latent variable in the latent space.
      latent_distribution = mu[:,i] # Get the means for all of the images at our given latent variable.

      hist_density, bin_edges = np.histogram(latent_distribution, density=True, bins=bins) # Create a histogram using the latent distribution and our set number of bins.

      # Extend the range of our histogram to be infinite.
      bin_edges[0] = -float('inf')
      bin_edges[-1] = float('inf')

      # TODO: Use the np.digitize function to find the bins in the latent distribution that every data sample falls into.
      # https://numpy.org/doc/stable/reference/generated/numpy.digitize.html
      bin_idx = np.digitize(latent_distribution, bin_edges)
      # bin_idx = np.digitize('''TODO''', '''TODO''') # TODO

      # Normalize the histogram so that it becomes a probability distribution that sums to 1.
      hist_smoothed_density = hist_density / np.sum(hist_density)

      # Invert the normalized histogram so that samples in sparse bins receive larger weights.
      p = 1.0 / (hist_smoothed_density[bin_idx-1])

      # TODO: Normalize all the inverted probabilities, just as we normalized the histogram above.
      p = p / np.sum(p)
      # p = # TODO

      # TODO: Update the sampling probabilities by comparing the new probability distribution to the existing one to find which is larger.
      training_sample_p = np.maximum(p, training_sample_p)
      # training_sample_p = # TODO

  # After every latent variable has been processed, normalize one final time so that the sampling probabilities sum to 1.
  training_sample_p = training_sample_p / np.sum(training_sample_p)

  return training_sample_p
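
Two NumPy details in this function are easy to miss: why the outer bin edges are set to infinity before calling `np.digitize`, and how `np.maximum` combines the per-variable probabilities. The values below are synthetic and chosen only to illustrate both steps.

```python
import numpy as np

values = np.array([-2.0, 0.3, 0.5, 0.9, 3.0])
bin_edges = np.array([0.0, 0.25, 0.5, 0.75, 1.0])  # 4 bins

# With finite edges, out-of-range values get the indices 0 and len(bin_edges).
closed_idx = np.digitize(values, bin_edges)
print(closed_idx)  # [0 2 3 4 5]

# With infinite outer edges, every value falls into one of the 4 bins (numbered 1..4).
open_edges = bin_edges.copy()
open_edges[0] = -np.inf
open_edges[-1] = np.inf
bin_idx = np.digitize(values, open_edges)
print(bin_idx)  # [1 2 3 4 4]

# Combining two latent variables: keep the larger probability per sample, then renormalize.
p_dim0 = np.array([0.1, 0.2, 0.7])
p_dim1 = np.array([0.6, 0.3, 0.1])
combined = np.maximum(p_dim0, p_dim1)
combined = combined / combined.sum()
print(combined)
```

Since the resulting bin numbers start at 1, the histogram is indexed with `bin_idx-1`. Taking the element-wise maximum means a sample is up-weighted if it is rare along any single latent variable.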

Now that we've defined the resampling update, we can train our DB-VAE model on the CelebA/ImageNet training data, and run the above operation to re-weight the importance of particular data points as we train the model. Remember again that we only want to debias for features relevant to *faces*, not the set of negative examples. Complete the code block below to execute the training loop!

In [None]:
# Training hyperparameters - experiment to find what reaches the highest accuracy
params = dict(
  batch_size = 32, # Number of random training examples fed in at one time.
  learning_rate = 3e-1, # Learning rate for the optimizer.
  latent_dim = 150 # Number of dimensions in the latent space.
)
# Find other optimizers to try at https://www.tensorflow.org/api_docs/python/tf/keras/optimizers
optimizer = tf.keras.optimizers.Adadelta(params["learning_rate"])

loss_history = mdl.util.LossHistory(smoothing_factor=0.99) # Create a list to track our loss during the training process.
plotter = mdl.util.PeriodicPlotter(sec=2, xlabel='Iterations', ylabel='Loss') # Create a graph to plot the loss during the training process.
if hasattr(tqdm, '_instances'): tqdm._instances.clear() # Clear any previous progress bars if they exist.

# Create a new DB-VAE model with our set latent space dimensionality.
dbvae = DB_VAE(params["latent_dim"])

@tf.function
def debiasing_train_step(x, y):

  with tf.GradientTape() as tape:
    # Feed the given images into the model. Note that this automatically uses the DB-VAE call function.
    y_logit, z_mean, z_logsigma, x_recon = dbvae(x)

    '''TODO: Call the DB-VAE loss function to compute the loss'''
    loss, class_loss = debiasing_loss_function(x, x_recon, y, y_logit, z_mean, z_logsigma) # Compute the loss using the inputs, outputs, expected outputs, and latent space.
    # loss, class_loss = debiasing_loss_function('''TODO arguments''') # TODO

  # Backpropagation by computing the gradient on our loss and applying it.
  '''TODO: Use the tape.gradient method to compute the gradients.
     Hint: this is with respect to the trainable_variables of the dbvae.'''
  grads = tape.gradient(loss, dbvae.trainable_variables) # Compute the gradient of the loss with respect to the model's trainable variables.
  # grads = tape.gradient('''TODO''', '''TODO''') # TODO
  optimizer.apply_gradients(zip(grads, dbvae.trainable_variables)) # Apply the gradients to the trainable variables using our optimizer.

  return loss

# Get all of the training images that contain faces to train our model on them.
all_faces = loader.get_all_train_faces()

'''TODO: Recompute the sampling probabilities for debiasing'''
p_faces = get_training_sample_probabilities(all_faces, dbvae, params["latent_dim"]) # Use our previously defined function with the correct parameters.
# p_faces = get_training_sample_probabilities('''TODO''', '''TODO''') # TODO

for _ in tqdm(range(loader.get_train_size() // params["batch_size"])): # Compute number of possible batches that can be made from our dataset and train for every possible batch.
  # load a batch of data
  x, y = loader.get_batch(params["batch_size"], p_pos=p_faces) # Get a batch of training data using the debiasing sampling probabilities.

  # loss optimization
  loss = debiasing_train_step(x, y) # Run one training step on this batch and get the resulting loss.

  # Record the loss and plot the evolution of the loss as a function of training
  loss_history.append(loss.numpy().mean()) # Add the loss to the loss_history list.
  plotter.plot(loss_history.get()) # Graph the new loss_history list for progress tracking.

print('\nFinal loss after training:', loss.numpy().mean())
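
How `loader.get_batch` uses `p_pos` internally is not shown in this lab. Assuming it draws face indices in proportion to these probabilities, the effect on batch composition can be sketched with NumPy alone. The dataset size and the weights below are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

n_faces = 1000
# Synthetic sampling probabilities: the last 100 images are 9x more likely to be drawn.
p_faces = np.ones(n_faces)
p_faces[-100:] = 9.0
p_faces = p_faces / p_faces.sum()

# Draw many batches and count how often the up-weighted images appear.
batch_size = 32
n_batches = 500
hits = 0
for _ in range(n_batches):
    idx = rng.choice(n_faces, size=batch_size, replace=False, p=p_faces)
    hits += int(np.sum(idx >= n_faces - 100))

share = hits / (n_batches * batch_size)
print("share of up-weighted images in batches:", share)
```

The up-weighted images make up 10% of this synthetic dataset but close to half of each batch on average, which is how rare faces get more training exposure without altering the dataset itself.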

Wonderful! Now we should have a trained and debiased facial classification model, ready for evaluation!

## 2.6 Evaluation of DB-VAE on Test Dataset

Finally, let's test our DB-VAE model on the test dataset, looking specifically at its accuracy on each of the "Dark Male", "Dark Female", "Light Male", and "Light Female" demographics. We will compare the performance of this debiased model against the (potentially biased) standard CNN from earlier in the lab.

In [None]:
dbvae_logits = [dbvae.predict(np.array(x, dtype=np.float32)) for x in test_faces] # Get the predictions for the testing face images we previously defined.
dbvae_probs = tf.squeeze(tf.sigmoid(dbvae_logits)) # Convert the logits into probabilities that each image contains a face.

# Plot the mean predicted face probability for each demographic from both our CNN and the new DB-VAE for comparison.
xx = np.arange(len(keys))
plt.bar(xx, standard_cnn_classifier_probs.numpy().mean(1), width=0.2, label="Standard CNN")
plt.bar(xx+0.2, dbvae_probs.numpy().mean(1), width=0.2, label="DB-VAE")
plt.xticks(xx, keys)
plt.title("Network predictions on test dataset")
plt.ylabel("Probability"); plt.legend(bbox_to_anchor=(1.04, 1), loc="upper left")
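
Note that the bars show the mean predicted probability per group (`.mean(1)`), which is related to accuracy but not identical to it. The sketch below contrasts the two on synthetic probabilities for two hypothetical groups; the 0.5 decision threshold is an assumption.

```python
import numpy as np

# Synthetic predicted face probabilities: 2 hypothetical groups x 5 face images each.
probs = np.array([[0.95, 0.90, 0.85, 0.80, 0.60],
                  [0.90, 0.55, 0.45, 0.40, 0.95]])

mean_prob = probs.mean(axis=1)         # the quantity plotted in the bar chart
accuracy = (probs > 0.5).mean(axis=1)  # fraction of faces classified as faces at threshold 0.5

print("mean probability:", mean_prob)
print("accuracy:        ", accuracy)
```

Every image here is a face, so accuracy is the fraction of probabilities above the threshold. A group can have a moderate mean probability while still being classified correctly most of the time, or the reverse, so it can be worth checking both.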

## 2.7 Conclusion

Now that you have created your debiased model, try to optimize it to achieve improved performance. We encourage you to think about and address some questions raised by the approach and results outlined here:

*  How does the accuracy of the DB-VAE across the four demographics compare to that of the standard CNN? Do you find this result surprising in any way?
*  How can the performance of the DB-VAE classifier be improved even further? We purposely did not optimize hyperparameters to leave this up to you!
*  In which applications (either related to facial detection or not!) would debiasing in this way be desired? Are there applications where you may not want to debias your model?
* Do you think it should be necessary for companies to demonstrate that their models, particularly in the context of tasks like facial detection, are not biased? If so, do you have thoughts on how this could be standardized and implemented?
* Do you have ideas for other ways to address issues of bias, particularly in terms of the training data?

Hopefully this lab has shed some light on a few concepts like vision-based tasks, VAEs, and algorithmic bias. We definitely think it has, but we're biased 😉.

<img src="https://i.ibb.co/BjLSRMM/ezgif-2-253dfd3f9097.gif" />