<a href="https://colab.research.google.com/github/Antony-gitau/probabilistic_AI_playgraound/blob/main/Variational_autoencoder.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Notes:

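Generative models aim to learn the probability distribution $p(x)$ (its probability density) that best describes the training data, so that new data can be generated by sampling from that distribution.

 - sample generation: produce new samples that look as if they came from the same distribution as the training data.
 - density estimation: learn an estimate of the probability distribution that the training data came from.

Different strategies to achieve this goal:
1. explicit density model - define $p(x)$ explicitly and fit it to the data, for example by maximizing the likelihood or a lower bound on it (VAEs maximize such a bound).
2. implicit density model - learn to generate samples without ever writing down $p(x)$ (GANs are the standard example).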

latent variable models:
1. variational autoencoder
2. Generative Adversarial Network

**Autoencoder**

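- unsupervised learning approach: no labels are needed
- dimensionality of the latent space: the encoder compresses the input into a lower-dimensional latent space (a bottleneck); the smaller the latent space, the more the data must be compressed and the harder it is to reconstruct
- representation learning: the latent space is a learned, compact representation of the data
- self-encoding or auto-encoding: the model is trained to reconstruct its own input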

**variational autoencoder**
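- probabilistic twist on the autoencoder: the encoder outputs a distribution over the latent variables (for example a mean and a standard deviation per dimension) instead of a single point
- loss function: a reconstruction loss plus a latent (KL divergence) loss that keeps the encoder's distribution close to a unit Gaussian prior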
encoder: f



Generation is the process of computing a data point $x$ from a latent variable $z$: sample $z$ (in a VAE, from the prior $p(z)$) and pass it through the decoder.
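A minimal sketch of generation, assuming a standard normal prior $p(z) = \mathcal{N}(0, I)$ and a hypothetical linear `decoder` that stands in for a trained network: first sample $z$, then map it to a data point $x$.

```python
import random

def decoder(z, weights, bias):
    """Hypothetical stand-in for a trained decoder: a linear map from z to x."""
    return [sum(w * zj for w, zj in zip(row, z)) + b for row, b in zip(weights, bias)]

def generate(weights, bias, latent_dim, rng):
    # 1. sample a latent vector from the prior N(0, I)
    z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
    # 2. compute the data point x from z with the decoder
    return decoder(z, weights, bias)

rng = random.Random(0)
# toy decoder: 2-dimensional latent space -> 3-dimensional data space
weights = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
bias = [0.0, 0.0, 1.0]
x = generate(weights, bias, latent_dim=2, rng=rng)
print(x)
```

Every call draws a fresh $z$, so every call generates a different data point; in a real VAE the decoder is a neural network rather than a fixed linear map.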

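Maximum likelihood estimation (MLE): a technique for estimating the parameters of a probability distribution (for example its mean or standard deviation) so that the distribution best fits the observed data.

Likelihood: the probability (density) of the observed measurements, viewed as a function of the distribution's parameters. Maximizing it over the parameters gives the optimal values of the mean (or standard deviation) for a given set of observed measurements.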
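For a Gaussian, the maximum likelihood estimates have a closed form: the mean is the sample mean, and the standard deviation is the root mean squared deviation from it (dividing by $n$, not $n - 1$). A minimal sketch with made-up measurements, which also checks that moving either parameter away from its estimate lowers the log-likelihood:

```python
import math

def gaussian_log_likelihood(data, mu, sigma):
    """Sum of log N(x | mu, sigma^2) over the observed measurements."""
    n = len(data)
    return (-0.5 * n * math.log(2 * math.pi * sigma ** 2)
            - sum((x - mu) ** 2 for x in data) / (2 * sigma ** 2))

def gaussian_mle(data):
    """Closed-form maximum likelihood estimates of the mean and standard deviation."""
    n = len(data)
    mu = sum(data) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / n)  # divide by n, not n - 1
    return mu, sigma

measurements = [2.1, 1.9, 2.4, 2.0, 1.6]  # illustrative values
mu_hat, sigma_hat = gaussian_mle(measurements)
best = gaussian_log_likelihood(measurements, mu_hat, sigma_hat)
for d in (-0.1, 0.1):
    # nudging either parameter away from the MLE lowers the likelihood
    assert gaussian_log_likelihood(measurements, mu_hat + d, sigma_hat) < best
    assert gaussian_log_likelihood(measurements, mu_hat, sigma_hat + d) < best
print(mu_hat, sigma_hat)
```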

Approximation methods (for posteriors $p(z \mid x)$ that are intractable to compute exactly):
1. Markov chain Monte Carlo (MCMC)
2. Variational inference (VI)
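As an illustration of the first approach, here is a minimal random-walk Metropolis sampler, one simple MCMC algorithm, in plain Python. It only needs the target density up to a normalizing constant, which is exactly the situation with a posterior $p(z \mid x) \propto p(x \mid z)\,p(z)$. The target used here, an unnormalized Gaussian with mean 3, is an assumption chosen so the result is easy to check.

```python
import math
import random

def metropolis(log_target, z0, n_samples, step, rng):
    """Random-walk Metropolis: propose z + step * noise, accept with probability min(1, p(proposal) / p(z))."""
    z, samples = z0, []
    for _ in range(n_samples):
        proposal = z + step * rng.gauss(0.0, 1.0)
        # compare log-probabilities; the unknown normalizing constant cancels in the ratio
        if math.log(1.0 - rng.random()) < log_target(proposal) - log_target(z):
            z = proposal
        samples.append(z)  # on rejection, the current state is repeated
    return samples

def log_target(z):
    # unnormalized log-density of N(3, 1): the constant -0.5 * log(2 * pi) is dropped
    return -0.5 * (z - 3.0) ** 2

rng = random.Random(42)
samples = metropolis(log_target, z0=0.0, n_samples=20000, step=1.0, rng=rng)
kept = samples[2000:]  # discard the burn-in period
print(sum(kept) / len(kept))
```

The samples are correlated and the chain needs a burn-in period, which is part of why MCMC is slow for large models; variational inference trades this for an optimization problem.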


Amortized variational inference: instead of optimizing the ELBO (evidence lower bound) separately for every data point, train a single neural network (in a VAE, the encoder) that predicts the variational parameters directly from the data point.
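A toy sketch of the difference, under assumptions chosen so the answer is known in closed form: prior $z \sim \mathcal{N}(0, 1)$, likelihood $x \mid z \sim \mathcal{N}(z, 1)$, and a Gaussian $q(z) = \mathcal{N}(m, 1/2)$ whose variance is held at its optimal value. The ELBO then reduces to $-\tfrac{1}{2}(x - m)^2 - \tfrac{1}{2}m^2 + \text{const}$, maximized at the exact posterior mean $m = x/2$. Per-data-point inference runs gradient ascent on a separate $m$ for every $x$; amortized inference learns one shared map $m = w x$ (a one-parameter stand-in for the encoder network) from all data points at once.

```python
def elbo_grad_m(x, m):
    """dELBO/dm for the toy model, where ELBO(m) = -0.5 * (x - m)**2 - 0.5 * m**2 + const."""
    return x - 2.0 * m

def per_point_inference(xs, lr=0.1, steps=200):
    """Optimize a separate variational mean m for every data point."""
    means = []
    for x in xs:
        m = 0.0
        for _ in range(steps):
            m += lr * elbo_grad_m(x, m)  # gradient ascent on this point's ELBO
        means.append(m)
    return means

def amortized_inference(xs, lr=0.1, steps=200):
    """Learn one shared map m = w * x by maximizing the average ELBO over all data points."""
    w = 0.0
    for _ in range(steps):
        # chain rule: dELBO/dw = dELBO/dm * dm/dw, and dm/dw = x
        grad = sum(elbo_grad_m(x, w * x) * x for x in xs) / len(xs)
        w += lr * grad
    return w

xs = [-2.0, -0.5, 0.3, 1.2, 2.5]  # illustrative data points
print(per_point_inference(xs))  # one optimization per point; each mean approaches x / 2
w = amortized_inference(xs)
print(w, w * 4.0)  # w approaches 0.5, so a new point x = 4.0 gets m close to 2.0 with no extra optimization
```

The amortized version pays the optimization cost once, during training; afterwards, inference for a new data point is a single function evaluation.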

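Reparameterization: rewriting the expectation so that the distribution it is taken over no longer depends on the parameters $\Theta$. For a Gaussian with $\Theta = (\mu, \sigma)$, write $z = \mu + \sigma\epsilon$ with $\epsilon \sim \mathcal{N}(0, 1)$; the randomness now sits in $\epsilon$, so gradients can flow through $\mu$ and $\sigma$.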
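A minimal Monte Carlo sketch in plain Python, assuming $f(z) = z^2$ and $z \sim \mathcal{N}(\mu, \sigma^2)$, where the exact gradients are known: $\mathbb{E}[z^2] = \mu^2 + \sigma^2$, so the gradients are $2\mu$ and $2\sigma$. Because $z = \mu + \sigma\epsilon$, the gradient can be moved inside the expectation and estimated by averaging over samples of $\epsilon$.

```python
import random

def reparam_gradients(mu, sigma, n_samples, rng):
    """Estimate d/dmu and d/dsigma of E[z**2] with z = mu + sigma * eps, eps ~ N(0, 1)."""
    grad_mu = grad_sigma = 0.0
    for _ in range(n_samples):
        eps = rng.gauss(0.0, 1.0)  # the noise does not depend on mu or sigma
        z = mu + sigma * eps
        # f(z) = z**2, so df/dz = 2z; dz/dmu = 1 and dz/dsigma = eps
        grad_mu += 2.0 * z
        grad_sigma += 2.0 * z * eps
    return grad_mu / n_samples, grad_sigma / n_samples

g_mu, g_sigma = reparam_gradients(mu=1.5, sigma=0.8, n_samples=100000, rng=random.Random(0))
print(g_mu, g_sigma)  # close to the exact values 2 * 1.5 = 3.0 and 2 * 0.8 = 1.6
```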

**GAN**
- conditional GAN
- CycleGAN

**Diffusion models**

[Good read](https://theaisummer.com/latent-variable-models/)

[MIT lecture](https://youtu.be/3G5hWM6jqPk?list=PLtBw6njQRU-rwp5__7C0oIVt26ZgjG9NI)

In this notebook we go into the details of the variational autoencoder, taking inspiration from [MIT's Lab 2 on face detection](https://colab.research.google.com/drive/1j2W42n7R2DeBuO0xQZr7kDmTY2Z4Sexw#scrollTo=nLemS7dqECsI).

![The concept of a VAE](https://i.ibb.co/3s4S6Gc/vae.jpg) This is the architecture of a variational autoencoder (VAE).

**Encoder**: maps each input to the parameters of a distribution over the latent variables (for example a mean and a standard deviation per latent dimension), then draws from that distribution to obtain a set of sampled latent variables.

**Decoder**: maps the sampled latent variables back to the form of the original input, reconstructing it. In doing so, it learns which latent variables are important for describing the data.

**Goal of a VAE**: train a model that learns a representation of the latent space underlying the training data.

**Loss function** 
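- Latent loss: the distribution the encoder produces for each input should stay close to a prior, here a unit Gaussian $\mathcal{N}(0, I)$. This is measured with the KL divergence between the two distributions and regularizes the latent space.

- Reconstruction loss: the decoder's output should closely match the original input. Here it is the mean absolute (L1) difference between the input and its reconstruction.

- The total loss of a variational autoencoder is the sum of the latent and reconstruction losses, with the latent loss scaled by a weight (`kl_weight` in the code).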
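Written out for an encoder that outputs a mean $\mu_j$ and a log variance $\log\sigma_j^2$ for each of $k$ latent dimensions (this matches how `logsigma` is used in `vae_loss_function` and `sampling`), with $D$ input values per example (pixels times channels) and KL weight $c$, the three bullet points become:

$$\mathcal{L}_{\text{latent}}(\mu, \sigma) = \frac{1}{2}\sum_{j=1}^{k}\left(\sigma_j^2 + \mu_j^2 - 1 - \log\sigma_j^2\right)$$

$$\mathcal{L}_{\text{rec}}(x, \hat{x}) = \frac{1}{D}\sum_{i=1}^{D}\left|x_i - \hat{x}_i\right|$$

$$\mathcal{L}_{\text{VAE}} = c\,\mathcal{L}_{\text{latent}}(\mu, \sigma) + \mathcal{L}_{\text{rec}}(x, \hat{x})$$

The first line is the closed-form KL divergence $D_{KL}\left(\mathcal{N}(\mu, \sigma^2)\,\|\,\mathcal{N}(0, 1)\right)$ summed over the latent dimensions; it is zero exactly when $\mu_j = 0$ and $\sigma_j = 1$ for every $j$.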


In [1]:
import tensorflow as tf

In [2]:
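def vae_loss_function(x, x_reconstructed, mu, logsigma, kl_weight = 0.005):
  '''
  x: batch of inputs; x_reconstructed: their reconstructions from the decoder
  mu, logsigma: mean and log of the variance of the encoder's latent distribution
  kl_weight: weight of the latent (KL) loss, used for regularization
  '''
  # KL divergence between N(mu, exp(logsigma)) and the unit Gaussian prior N(0, I)
  latent_loss = 0.5 * tf.reduce_sum(tf.exp(logsigma) + tf.square(mu) - 1.0 - logsigma, axis=-1)
  # L1 reconstruction loss: mean absolute difference over height, width and channels
  # (assumes image batches of shape (batch, height, width, channels))
  abs_diff = tf.abs(x - x_reconstructed)
  reconstruction_loss = tf.reduce_mean(abs_diff, axis=[1,2,3])
  # per-example VAE loss
  vae_loss = kl_weight * latent_loss + reconstruction_loss

  return vae_loss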
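To sanity-check the closed-form latent loss, a plain-Python sketch (independent of TensorFlow) compares it, for a single latent dimension, against a Monte Carlo estimate of $\mathbb{E}_{q}[\log q(z) - \log p(z)]$ with $q = \mathcal{N}(\mu, \sigma^2)$ and $p = \mathcal{N}(0, 1)$. As in `vae_loss_function`, `logsigma` is taken to be the log variance; the test values are illustrative.

```python
import math
import random

def kl_closed_form(mu, logsigma):
    """KL(N(mu, exp(logsigma)) || N(0, 1)) for one latent dimension, as in vae_loss_function."""
    return 0.5 * (math.exp(logsigma) + mu ** 2 - 1.0 - logsigma)

def log_normal(z, mean, var):
    """Log-density of N(mean, var) at z."""
    return -0.5 * (math.log(2 * math.pi * var) + (z - mean) ** 2 / var)

def kl_monte_carlo(mu, logsigma, n_samples, rng):
    """Average of log q(z) - log p(z) over samples z ~ q."""
    var = math.exp(logsigma)
    total = 0.0
    for _ in range(n_samples):
        z = mu + math.sqrt(var) * rng.gauss(0.0, 1.0)  # z ~ q
        total += log_normal(z, mu, var) - log_normal(z, 0.0, 1.0)
    return total / n_samples

rng = random.Random(1)
print(kl_closed_form(0.0, 0.0))  # 0.0: q equals the prior
print(kl_closed_form(1.0, math.log(0.5)), kl_monte_carlo(1.0, math.log(0.5), 100000, rng))
```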

Sampling and reparameterization

In [3]:
def sampling(z_mean, z_logsigma):
  '''
  z_mean, z_logsigma: mean and logarithm of the variance of the learned latent distribution,
  both of shape (batch, latent_dim)
  returns latent samples z computed with the reparameterization trick
  '''
  # epsilon ~ N(0, I) with the same shape as z_mean; tf.shape also works when the batch size is unknown
  epsilon = tf.random.normal(shape = tf.shape(z_mean))
  # z = mu + sigma * epsilon, where sigma = exp(0.5 * log variance)
  z = z_mean + tf.exp(0.5 * z_logsigma) * epsilon

  return z
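The `exp(0.5 * z_logsigma)` factor turns a log variance into a standard deviation, since $\exp(\tfrac{1}{2}\log\sigma^2) = \sigma$. A plain-Python sketch (one latent dimension, illustrative values) that draws many reparameterized samples and checks that their mean and variance come out as $\mu$ and $\exp(\text{logsigma})$:

```python
import math
import random

def sample_reparameterized(z_mean, z_logsigma, rng):
    """One draw of z = mu + exp(0.5 * log_var) * eps, mirroring sampling() for a single dimension."""
    eps = rng.gauss(0.0, 1.0)
    return z_mean + math.exp(0.5 * z_logsigma) * eps

rng = random.Random(0)
mu, log_var = -1.0, math.log(4.0)  # target: mean -1, variance 4 (standard deviation 2)
zs = [sample_reparameterized(mu, log_var, rng) for _ in range(100000)]
mean = sum(zs) / len(zs)
var = sum((z - mean) ** 2 for z in zs) / len(zs)
print(mean, var)  # close to -1.0 and 4.0
```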



Semi-supervised variational autoencoder (SS-VAE)

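- The motivation is to use a variational autoencoder to uncover bias in face detection.
- Notice that there is now a supervised classification problem: besides the latent variables, the model outputs a prediction (a logit) of whether the input is a face.
- Notice also that the VAE described earlier has no supervised outputs; it only reconstructs its input.

- Let's work out a loss function.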

In [None]:
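def ss_vae_loss_function(x, x_pred, y, y_logit, mu, logsigma):
  '''Classification loss on every example plus the VAE loss on face examples (y == 1); returns (total, classification) losses.'''
  vae_loss = vae_loss_function(x, x_pred, mu, logsigma, kl_weight = 0.005)
  # pass labels/logits by keyword: the positional order is (labels, logits); assumes one label per example
  classification_loss = tf.reshape(tf.nn.sigmoid_cross_entropy_with_logits(labels=y, logits=y_logit), [-1])
  # assumption (following MIT's Lab 2): the VAE only learns the latent structure of faces
  face_indicator = tf.reshape(tf.cast(tf.equal(y, 1), tf.float32), [-1])
  total_loss = tf.reduce_mean(classification_loss + face_indicator * vae_loss)
  return total_loss, classification_loss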
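A plain-Python sketch of one way the two terms can be combined per example. It assumes, following the debiasing loss in MIT's Lab 2, that the VAE loss only counts for examples labelled $y = 1$ (faces). The cross-entropy uses the numerically stable form $\max(l, 0) - l\,y + \log(1 + e^{-|l|})$ for logit $l$ and label $y$, the form given in the TensorFlow documentation for `tf.nn.sigmoid_cross_entropy_with_logits`. The per-example VAE losses below are made-up placeholders for the output of `vae_loss_function`.

```python
import math

def sigmoid_cross_entropy_with_logits(label, logit):
    """Numerically stable binary cross-entropy: max(l, 0) - l * y + log(1 + exp(-|l|))."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))

def ss_vae_loss(labels, logits, vae_losses):
    """Batch mean of classification loss + VAE loss, where the VAE loss only counts when label == 1."""
    total = 0.0
    for y, l, v in zip(labels, logits, vae_losses):
        face_indicator = 1.0 if y == 1 else 0.0  # assumption: only faces contribute the VAE term
        total += sigmoid_cross_entropy_with_logits(y, l) + face_indicator * v
    return total / len(labels)

labels = [1.0, 0.0, 1.0]
logits = [2.0, -1.0, 0.0]
vae_losses = [0.3, 5.0, 0.4]  # placeholder per-example VAE losses
print(ss_vae_loss(labels, logits, vae_losses))
```

Because of the face indicator, the large VAE loss of the second (non-face) example has no effect on the total; only its classification loss is counted.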