<a href="https://colab.research.google.com/github/Antony-gitau/probabilistic_AI_playgraound/blob/main/Variational_autoencoder.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Notes:

Generative models aim to learn the probability distribution $p(x)$ that best describes the training data, so that new data can be generated by sampling from that distribution.

 - sample generation: 
 - density estimation:

Different strategies to achieve this goal:
1. explicit density model - 
2. implicit density model - 

latent variable models:
1. variational autoencoder
2. Generative Adversarial Network

**Autoencoder**

- unsupervised learning approach
- dimensionality of latent space 
- representation learning 
- self encoding or auto-encoding

**variational autoencoder**
- probabilistic twist on the autoencoder
- loss function: 
encoder: f



generation is the process of computing a data point $x$ from the latent variable $z$. 

maximum likelihood estimation - a technique for estimating the parameters of a probability distribution (for example the mean or standard deviation) so that the distribution fits the observed data.

likelihood - how probable the observed measurements are under a given choice of parameters. Maximum likelihood picks the mean (or standard deviation) that makes this value largest.
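
As a small illustration of maximum likelihood estimation, the sketch below fits a Gaussian to a handful of made-up measurements. For a Gaussian the estimates have a closed form: the sample mean and the (biased) sample standard deviation. The function names `gaussian_mle` and `gaussian_log_likelihood` and the data are assumptions for illustration only.

```python
import math

def gaussian_mle(samples):
    """Closed-form maximum likelihood estimates for a 1-D Gaussian."""
    n = len(samples)
    mean = sum(samples) / n
    # The MLE of the variance divides by n (not n - 1).
    variance = sum((s - mean) ** 2 for s in samples) / n
    return mean, math.sqrt(variance)

def gaussian_log_likelihood(samples, mean, std):
    """Log-likelihood of the samples under N(mean, std^2)."""
    return sum(
        -0.5 * math.log(2 * math.pi * std ** 2) - (s - mean) ** 2 / (2 * std ** 2)
        for s in samples
    )

measurements = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical data
mean_hat, std_hat = gaussian_mle(measurements)
```

Evaluating `gaussian_log_likelihood` at `(mean_hat, std_hat)` and at any other pair of parameters shows that the fitted values give the larger log-likelihood.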

approximation methods:
1. markov chain monte carlo
2. variational inference


amortized variational inference - train an external neural network to predict the variational parameters instead of optimizing ELBO per data point.
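
For reference, the ELBO (evidence lower bound) mentioned above can be written as follows. Here $q_\phi(z \mid x)$ is the approximate posterior predicted by the inference network, $p_\theta(x \mid z)$ is the decoder likelihood, and $p(z)$ is the prior. The symbols $\phi$ and $\theta$ are notation introduced here, not taken from the notes.

$$
\log p_\theta(x) \;\ge\; \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] \;-\; D_{\mathrm{KL}}\big(q_\phi(z \mid x)\,\|\,p(z)\big)
$$

The first term rewards good reconstructions and the second keeps the approximate posterior close to the prior.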

reparameterization - rewriting the expectation so that the distribution it is taken over no longer depends on the parameter $Θ$; the randomness is moved into a separate noise variable.

**GAN**
- conditional GAN
- CycleGAN

**Diffusion models**

[good read](https://theaisummer.com/latent-variable-models/)

[mit lecture](https://youtu.be/3G5hWM6jqPk?list=PLtBw6njQRU-rwp5__7C0oIVt26ZgjG9NI)

In this notebook we will go into detail on the Variational Autoencoder, with inspiration from [MIT's Lab 2 on face detection](https://colab.research.google.com/drive/1j2W42n7R2DeBuO0xQZr7kDmTY2Z4Sexw#scrollTo=nLemS7dqECsI)

![The concept of a VAE](https://i.ibb.co/3s4S6Gc/vae.jpg) This is the architectural design of a Variational Autoencoder (VAE).

**encoder** transforms the inputs into the parameters of a distribution over latent variables, for example a mean and a standard deviation, and then draws from that distribution to generate a set of sampled latent variables.

**decoder** reconstructs the original input from the sampled latent variables. In doing so, the network learns which latent variables are important.

**goal of VAE** is therefore to train a model that learns a representation of the underlying latent space of the training data.

**Loss function** 
- the encoder needs to learn a latent distribution that stays close to a prior (here a unit Gaussian). This is the latent loss.

- the decoder needs to reconstruct the input closely from the latent variables. This is the reconstruction loss.

- so the total loss of a variational autoencoder is the (weighted) sum of the latent and reconstruction losses.
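
The latent loss is commonly taken to be the KL divergence between the encoder's diagonal Gaussian and a unit Gaussian prior, which has a closed form. Below is a minimal pure-Python sketch for one data point, assuming the encoder outputs a mean and a log-variance per latent dimension. The function name `kl_to_unit_gaussian` is hypothetical.

```python
import math

def kl_to_unit_gaussian(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ) for one data point."""
    return 0.5 * sum(
        math.exp(lv) + m ** 2 - 1.0 - lv
        for m, lv in zip(mu, logvar)
    )

# The divergence is zero when the encoder matches the prior exactly ...
kl_match = kl_to_unit_gaussian([0.0, 0.0], [0.0, 0.0])
# ... and grows as the mean drifts away from zero.
kl_shifted = kl_to_unit_gaussian([1.0, -2.0], [0.0, 0.0])
```

With unit variance, only the squared means contribute, so `kl_shifted` is `0.5 * (1 + 4) = 2.5`.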


In [1]:
import tensorflow as tf

In [2]:
def vae_loss_function(x, x_reconstructed, mu, logsigma, kl_weight = 0.005):
  '''
  x and x_reconstructed are an input batch and its reconstruction.
  mu and logsigma are the mean and log-variance of the latent distribution
  predicted by the encoder (the formula below treats exp(logsigma) as the variance).
  kl_weight is the weight of the latent loss, which acts as a regularizer.
  '''
  # KL divergence between the encoded Gaussian and a unit Gaussian prior
  latent_loss = 0.5 * tf.reduce_sum(tf.exp(logsigma) + tf.square(mu) - 1.0 - logsigma, axis=-1)
  # mean absolute error per image (averaged over height, width and channels)
  abs_diff = tf.abs(x - x_reconstructed)
  reconstruction_loss = tf.reduce_mean(abs_diff, axis=[1,2,3])
  vae_loss = kl_weight * latent_loss + reconstruction_loss

  return vae_loss

Sampling and reparameterization

In [3]:
def sampling(z_mean, z_logsigma):
  '''
  z_mean and z_logsigma are the mean and log-variance of the learned latent distribution.
  We draw epsilon from a standard normal distribution with the same shape as z_mean
  (batch size x latent dimensions) and apply the reparameterization trick.
  '''
  # tf.shape also works when the batch size is not known statically
  epsilon = tf.random.normal(shape=tf.shape(z_mean))
  # exp(0.5 * log-variance) is the standard deviation
  z = z_mean + tf.exp(0.5 * z_logsigma) * epsilon

  return z
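
To check that the reparameterization trick still produces samples from the intended distribution, the sketch below draws many values of z for a single latent dimension using only the standard library. The values of `z_mean` and `z_logvar` are made up, the helper `sample_latent` is hypothetical, and the seed is fixed only to make the run repeatable.

```python
import math
import random

def sample_latent(z_mean, z_logvar, rng):
    """z = mean + std * epsilon, with epsilon ~ N(0, 1)."""
    epsilon = rng.gauss(0.0, 1.0)  # the randomness lives here, not in the parameters
    return z_mean + math.exp(0.5 * z_logvar) * epsilon

rng = random.Random(0)
z_mean, z_logvar = 1.5, math.log(0.25)  # hypothetical values: variance 0.25, std 0.5
draws = [sample_latent(z_mean, z_logvar, rng) for _ in range(20000)]

empirical_mean = sum(draws) / len(draws)
empirical_std = math.sqrt(sum((d - empirical_mean) ** 2 for d in draws) / len(draws))
```

The empirical mean and standard deviation should land close to 1.5 and 0.5. Because `epsilon` is the only random quantity, gradients can flow through `z_mean` and `z_logvar` during training.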



Semi-supervised variational autoencoder (SS-VAE)

- the motivation is to use a variational autoencoder to uncover bias in face recognition.
- notice that there is now a supervised classification problem.
- notice also that the VAE we talked about earlier did not output supervised variables.

- let's work out a loss function

In [5]:
def ss_vae_loss_function(x, x_pred, y, y_logit, mu, logsigma):
  '''
  x is an input batch and x_pred its reconstruction
  y is the true label and y_logit its prediction
  (the raw output before applying the sigmoid function)
  mu and logsigma are the mean and log-variance of the learned latent distribution
  '''
  # we call the vae_loss function we defined earlier (one value per example)
  vae_loss = vae_loss_function(x, x_pred, mu, logsigma, kl_weight = 0.005)

  # flatten labels and logits to shape (batch,) so they line up with vae_loss
  y = tf.cast(tf.reshape(y, [-1]), tf.float32)
  y_logit = tf.reshape(y_logit, [-1])

  # per-example classification loss; keyword arguments avoid swapping labels and logits
  per_example_classification = tf.nn.sigmoid_cross_entropy_with_logits(labels=y, logits=y_logit)
  classification_loss = tf.reduce_mean(per_example_classification)

  # we create an indicator for which training data are images of faces
  face_indicator = tf.cast(tf.equal(y, 1), tf.float32)

  # total ss_vae loss: every example is classified, but only faces contribute
  # the VAE term (assumed intent: the VAE should model faces only)
  total_loss = tf.reduce_mean(per_example_classification + face_indicator * vae_loss)

  return total_loss, classification_loss, vae_loss
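
An indicator like `face_indicator` acts as a per-example mask: multiplying a per-example loss term by it zeroes that term for every non-face image. The sketch below shows the masking arithmetic with made-up numbers. It assumes the masked term is the VAE loss, so that every image is classified but only faces are reconstructed. The helper `masked_total_loss` is hypothetical.

```python
def masked_total_loss(classification_losses, vae_losses, labels):
    """Batch mean of classification + indicator * vae, computed per example."""
    totals = []
    for cls, vae, y in zip(classification_losses, vae_losses, labels):
        face_indicator = 1.0 if y == 1 else 0.0
        totals.append(cls + face_indicator * vae)
    return sum(totals) / len(totals)

# Hypothetical per-example losses for a batch of three images.
cls_losses = [0.2, 0.4, 0.6]
vae_losses = [1.0, 2.0, 3.0]
labels = [1, 0, 1]  # 1 = face, 0 = not a face
total = masked_total_loss(cls_losses, vae_losses, labels)
```

The second image is not a face, so its VAE loss of 2.0 is dropped and the batch mean is `(1.2 + 0.4 + 3.6) / 3`.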


We define the SS-VAE architecture:

- we first define the encoder. Its output holds the supervised classification output together with the parameters of the latent variables.

- then we define the decoder. This is the part of the architecture that outputs the reconstruction.


In [None]:
### Define the CNN model ###
import functools
n_filters = 12 # base number of convolutional filters

'''Function to define a standard CNN model'''
def make_standard_classifier(n_outputs=1):
  Conv2D = functools.partial(tf.keras.layers.Conv2D, padding='same', activation='relu')
  BatchNormalization = tf.keras.layers.BatchNormalization
  Flatten = tf.keras.layers.Flatten
  Dense = functools.partial(tf.keras.layers.Dense, activation='relu')

  model = tf.keras.Sequential([ 
    Conv2D(filters=1*n_filters, kernel_size=5,  strides=2),
    BatchNormalization(),
    
    Conv2D(filters=2*n_filters, kernel_size=5,  strides=2),
    BatchNormalization(),

    Conv2D(filters=4*n_filters, kernel_size=3,  strides=2),
    BatchNormalization(),

    Conv2D(filters=6*n_filters, kernel_size=3,  strides=2),
    BatchNormalization(),

    Flatten(),
    Dense(512),
    Dense(n_outputs, activation=None),
  ])
  return model

standard_classifier = make_standard_classifier()

In [None]:
### Define the decoder portion of the SS-VAE ###

def make_face_decoder_network(n_filters=12):

  # Functionally define the different layer types we will use
  Conv2DTranspose = functools.partial(tf.keras.layers.Conv2DTranspose, padding='same', activation='relu')
  BatchNormalization = tf.keras.layers.BatchNormalization
  Flatten = tf.keras.layers.Flatten
  Dense = functools.partial(tf.keras.layers.Dense, activation='relu')
  Reshape = tf.keras.layers.Reshape

  # Build the decoder network using the Sequential API
  decoder = tf.keras.Sequential([
    # Transform to pre-convolutional generation
    Dense(units=4*4*6*n_filters),  # 4x4 feature maps (with 6N occurrences)
    Reshape(target_shape=(4, 4, 6*n_filters)),

    # Upscaling convolutions (inverse of encoder)
    Conv2DTranspose(filters=4*n_filters, kernel_size=3,  strides=2),
    Conv2DTranspose(filters=2*n_filters, kernel_size=3,  strides=2),
    Conv2DTranspose(filters=1*n_filters, kernel_size=5,  strides=2),
    Conv2DTranspose(filters=3, kernel_size=5,  strides=2),
  ])

  return decoder
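
The decoder mirrors the encoder's spatial resolution. With `padding='same'`, a stride-2 convolution halves each spatial dimension (rounding up) and a stride-2 transposed convolution doubles it. The sketch below traces the sizes through both networks, assuming a 64x64 input image, which the notes do not state. The helper names are hypothetical.

```python
import math

def conv_same_out(size, stride):
    """Spatial size after a strided convolution with padding='same'."""
    return math.ceil(size / stride)

def conv_transpose_same_out(size, stride):
    """Spatial size after a strided transposed convolution with padding='same'."""
    return size * stride

input_size = 64  # assumed input resolution, not stated in the notes
encoder_sizes = [input_size]
for _ in range(4):  # four stride-2 Conv2D layers in the classifier/encoder
    encoder_sizes.append(conv_same_out(encoder_sizes[-1], 2))

decoder_sizes = [4]  # the decoder reshapes its dense output to 4x4 feature maps
for _ in range(4):  # four stride-2 Conv2DTranspose layers
    decoder_sizes.append(conv_transpose_same_out(decoder_sizes[-1], 2))
```

Under that assumption the encoder goes 64, 32, 16, 8, 4 and the decoder goes 4, 8, 16, 32, 64, so the reconstruction has the same height and width as the input.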

In [None]:
# we inherit from the keras Model class to define the ss_vae model
class SS_VAE(tf.keras.Model):

  # initialize the class SS_VAE and take the latent dim argument
  def __init__(self, latent_dim):
    super(SS_VAE, self).__init__()
    self.latent_dim = latent_dim

    # the encoder outputs a mean and a log-variance for every latent variable,
    # plus one supervised classification output
    num_encoder_dims = 2 * self.latent_dim + 1

    # we create instances of the encoder and decoder networks
    self.encoder = make_standard_classifier(num_encoder_dims)
    self.decoder = make_face_decoder_network()

  # we now define the function that feeds the input through the encoder and
  # splits its output into the classification logit and the latent parameters.
  # it is named encode (not encoder) so it does not clash with self.encoder
  def encode(self, x):
    encoder_output = self.encoder(x)

    y_logit = tf.expand_dims(encoder_output[:, 0], -1)
    z_mean = encoder_output[:, 1:self.latent_dim+1]
    z_logsigma = encoder_output[:, self.latent_dim+1:]

    return y_logit, z_mean, z_logsigma

  # we will then define decoding function
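
The slicing of the encoder output can be checked on a plain list. With `latent_dim = 3` (an arbitrary value chosen for illustration), the encoder emits `2 * 3 + 1 = 7` numbers per image: one logit, then three means, then three log-variance values. The helper `split_encoder_output` and the numbers are hypothetical.

```python
def split_encoder_output(row, latent_dim):
    """Split one row of encoder output into (y_logit, z_mean, z_logsigma)."""
    assert len(row) == 2 * latent_dim + 1
    y_logit = row[0]
    z_mean = row[1:latent_dim + 1]
    z_logsigma = row[latent_dim + 1:]
    return y_logit, z_mean, z_logsigma

row = [0.7, 0.1, 0.2, 0.3, -1.0, -2.0, -3.0]  # hypothetical encoder output
y_logit, z_mean, z_logsigma = split_encoder_output(row, latent_dim=3)
```

The two latent slices have the same length, which is what the sampling step needs in order to combine them elementwise.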



  

