# Normalizing Flows

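Normalizing flows are a family of models for powerful distribution approximation: they turn a simple distribution into a complex one through a sequence of invertible transformations.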

For more detail, see [this post by Lilian Weng](https://lilianweng.github.io/posts/2018-10-13-flow-models/) and especially [this tutorial by Eric Jang](https://blog.evjang.com/2018/01/nf1.html).


## Definition

The name “normalizing flow” can be interpreted as follows:

1. “Normalizing” means that the change of variables gives a normalized density after applying an invertible transformation.
2. “Flow” means that the invertible transformations can be composed with each other to create more complex invertible transformations.

In normalizing flows, we wish to map simple distributions (easy to sample from and easy to evaluate densities for) to complex ones (learned from data) by applying a sequence of invertible transformation functions. Flowing through the chain of transformations, we repeatedly substitute the variable with the new one according to the change of variables theorem and eventually obtain the probability distribution of the final target variable.

<img src="notebook-images/normalizing-flow.png" alt="A chain of invertible transformations mapping a simple distribution to a complex target distribution" width="900"/>


*Figure 1: A representation of normalizing flows, illustrating the transformation of a simple, easy-to-sample distribution into a complex target distribution. This is achieved by applying a sequence of invertible transformations, where each step leverages the change of variables theorem to update the density of the transformed variable progressively toward the desired target distribution.*
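
Writing the chain out explicitly may help. As a sketch, assume a base variable $\mathbf{z}_0 \sim p_0$ and $K$ invertible maps $f_1, \dots, f_K$ (these symbols are introduced here only for illustration), with $\mathbf{z}_i = f_i(\mathbf{z}_{i-1})$ and $\mathbf{x} = \mathbf{z}_K$. Applying the change of variables theorem at every step gives:

$$
\log p(\mathbf{x}) = \log p_0(\mathbf{z}_0) - \sum_{i=1}^{K} \log \left|\det\left(\frac{\partial f_i(\mathbf{z}_{i-1})}{\partial \mathbf{z}_{i-1}}\right)\right|
$$

Each transformation contributes one log-determinant term, so the density of the final variable stays tractable as long as every individual term is cheap to compute.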




## Change of Variables Formula

The change of variables formula describes how to evaluate the density of a random variable that is a deterministic transformation of another random variable.

**Change of Variables:** Let $Z$ and $X$ be random variables that are related by an invertible mapping $f: \mathbb{R}^n \rightarrow \mathbb{R}^n$ such that $X = f(Z)$ and $Z = f^{-1}(X)$. Then:

$$
p_X(\mathbf{x}) = p_Z\left(f^{-1}(\mathbf{x})\right) \left|\det\left(\frac{\partial f^{-1}(\mathbf{x})}{\partial \mathbf{x}}\right)\right|
$$

For this formula to be usable in a normalizing flow, three requirements must hold:

*  The input and output dimensions must be the same.
*  The transformation must be invertible.
*  Computing the determinant of the Jacobian needs to be efficient and differentiable.
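
A one-dimensional case makes the formula easy to check by hand. The sketch below assumes an affine map $x = a z + b$ with $z \sim \mathcal{N}(0, 1)$; the constants `a` and `b` and the helper names are made up for illustration. In this case $x$ is known to follow $\mathcal{N}(b, |a|)$, so the density from the formula can be compared against the closed form.

```python
import math

def normal_pdf(v, mean=0.0, std=1.0):
    """Density of a univariate normal distribution."""
    u = (v - mean) / std
    return math.exp(-0.5 * u * u) / (std * math.sqrt(2.0 * math.pi))

# Assumed example transformation: x = f(z) = a * z + b with z ~ N(0, 1).
a, b = 2.0, 1.0

def f_inverse(x):
    """Inverse of the affine map, z = (x - b) / a."""
    return (x - b) / a

def density_via_change_of_variables(x):
    # In one dimension |det(d f^{-1} / dx)| reduces to |1 / a|.
    abs_det_jacobian = abs(1.0 / a)
    return normal_pdf(f_inverse(x)) * abs_det_jacobian

x = 2.5
p_formula = density_via_change_of_variables(x)
p_closed_form = normal_pdf(x, mean=b, std=abs(a))  # x ~ N(b, |a|)
print(p_formula, p_closed_form)
```

Both numbers should agree up to floating-point error, since the Jacobian factor $|1/a|$ is exactly what rescales the base density so that it still integrates to one.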


### Golden Quote
**Determinants are nothing more than the amount (and direction) of volume distortion of a linear transformation, generalized to any number of dimensions or the local, linearized rate of volume change of a transformation.**
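
To make the quote concrete, here is a small sketch that pushes the unit square through two linear maps and compares the signed area of the image with the determinant. The two matrices and the helper functions are assumptions chosen for illustration.

```python
def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def apply(m, p):
    """Apply the linear map m to the 2-D point p."""
    return (m[0][0] * p[0] + m[0][1] * p[1],
            m[1][0] * p[0] + m[1][1] * p[1])

def signed_area(polygon):
    """Signed polygon area via the shoelace formula (counter-clockwise is positive)."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(polygon, polygon[1:] + polygon[:1]):
        total += x0 * y1 - x1 * y0
    return 0.5 * total

# Unit square with area 1, listed counter-clockwise.
unit_square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]

stretch = [[2.0, 1.0], [0.0, 3.0]]  # assumed example: shear plus scaling
flip = [[0.0, 1.0], [1.0, 0.0]]     # assumed example: swaps the two axes

stretched_area = signed_area([apply(stretch, p) for p in unit_square])
flipped_area = signed_area([apply(flip, p) for p in unit_square])

print(stretched_area, det2(stretch))
print(flipped_area, det2(flip))
```

The magnitude of the determinant is the factor by which area changes, and a negative sign indicates that the map reverses orientation. The change of variables formula only needs the magnitude, which is why it takes the absolute value.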

## Training Criterion

With normalizing flows in our toolbox, the exact log-likelihood of the input data, $\log p(\mathbf{x})$, becomes tractable. As a result, the training criterion of a flow-based generative model is simply the negative log-likelihood (NLL) over the training dataset $\mathcal{D}$:
$$\mathcal{L}(\mathcal{D}) = - \frac{1}{\vert\mathcal{D}\vert}\sum_{\mathbf{x} \in \mathcal{D}} \log p(\mathbf{x})$$
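
As a rough illustration of this criterion, the sketch below evaluates the NLL of a one-dimensional affine flow $x = \text{scale} \cdot z + \text{shift}$ with $z \sim \mathcal{N}(0, 1)$. The flow, the function names, and the tiny dataset are all made up for the example; a real model would use learned, higher-dimensional transformations and minimize this quantity by gradient descent.

```python
import math

def standard_normal_log_pdf(z):
    """Log-density of N(0, 1)."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)

def flow_log_prob(x, scale, shift):
    """log p(x) for the assumed flow x = scale * z + shift, z ~ N(0, 1)."""
    z = (x - shift) / scale
    # Base log-density plus the log-determinant of the inverse Jacobian.
    return standard_normal_log_pdf(z) - math.log(abs(scale))

def negative_log_likelihood(dataset, scale, shift):
    """Average NLL over the dataset, following the training criterion above."""
    return -sum(flow_log_prob(x, scale, shift) for x in dataset) / len(dataset)

dataset = [0.8, 1.9, 3.1, 2.2, 1.0]  # made-up observations, for illustration only

nll_poor_fit = negative_log_likelihood(dataset, scale=1.0, shift=-3.0)
nll_better_fit = negative_log_likelihood(dataset, scale=1.0, shift=1.8)
print(nll_poor_fit, nll_better_fit)
```

Parameters that place more probability mass on the observed data yield a lower NLL, which is the signal that training follows.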

## Models of Normalizing Flows
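Well-known flow models include [RealNVP](https://arxiv.org/abs/1605.08803) and [Glow](https://arxiv.org/abs/1807.03039), along with [MADE](https://arxiv.org/abs/1502.03509), an autoregressive density estimator that serves as a building block for autoregressive flows.

TensorFlow has a nice [set of functions](https://arxiv.org/pdf/1711.10604) that makes it easy to build flows.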

## Transformed Distributions in TensorFlow
TensorFlow has an elegant API for transforming distributions. A `TransformedDistribution` is specified by a **base distribution** object that we will transform, and a **Bijector** object that implements:

1. A forward transformation $y = f(x)$, where $f: \mathbb{R}^d \rightarrow \mathbb{R}^d$
2. Its inverse transformation $x = f^{-1}(y)$
3. The inverse log determinant of the Jacobian $\log |\det J(f^{-1}(y))|$

For the rest of this post, I will abbreviate this quantity as **ILDJ**.



Under this abstraction, forward sampling is trivial:

In [None]:
bijector.forward(base_dist.sample())

To evaluate the log-density of the transformed distribution at a point `y`:


In [None]:
base_dist.log_prob(bijector.inverse(y)) + bijector.inverse_log_det_jacobian(y)
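
To see the two one-liners run end to end without TensorFlow, here is a minimal pure-Python sketch of the same abstraction. `AffineBijector` and `StandardNormal` are hypothetical toy classes written for this example; they only mimic the three methods listed above and are not the TensorFlow classes.

```python
import math
import random

class AffineBijector:
    """Hypothetical toy bijector y = scale * x + shift (not the TensorFlow class)."""

    def __init__(self, scale, shift):
        self.scale = scale
        self.shift = shift

    def forward(self, x):
        return self.scale * x + self.shift

    def inverse(self, y):
        return (y - self.shift) / self.scale

    def inverse_log_det_jacobian(self, y):
        # d f^{-1} / dy = 1 / scale, which is constant for an affine map.
        return -math.log(abs(self.scale))


class StandardNormal:
    """Hypothetical stand-in for a base distribution object, here N(0, 1)."""

    def sample(self):
        return random.gauss(0.0, 1.0)

    def log_prob(self, x):
        return -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)


base_dist = StandardNormal()
bijector = AffineBijector(scale=3.0, shift=-1.0)

# Forward sampling: draw from the base distribution, then push through f.
sample = bijector.forward(base_dist.sample())

# Log-density: pull y back through f^{-1} and add the ILDJ.
y = 2.0
log_density = base_dist.log_prob(bijector.inverse(y)) + bijector.inverse_log_det_jacobian(y)
print(sample, log_density)
```

With these assumed parameters the transformed variable follows $\mathcal{N}(-1, 3)$, so `log_density` can be checked against the normal log-density evaluated at `y`.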


### Building a basic Normalizing Flow in TensorFlow

TF [Distributions](https://www.tensorflow.org/probability/api_docs/python/tfp/distributions/Distribution) is the general API for manipulating distributions in TF.

TF [Bijector](https://www.tensorflow.org/probability/api_docs/python/tfp/bijectors) is the general API for creating operators on distributions.

In [7]:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions
tfb = tfp.bijectors 

We are trying to model the distribution $p(x_1, x_2) = \mathcal{N}(x_1 \mid \mu = \frac{1}{4} x_2^2, \sigma = 1) \cdot \mathcal{N}(x_2 \mid \mu = 0, \sigma = 4)$. We can generate samples from the target distribution using the following code snippet (we generate them in TensorFlow to avoid having to copy samples from the CPU to the GPU on each minibatch):


In [8]:
batch_size = 512
# x2 ~ N(0, 4)
x2_dist = tfd.Normal(loc=0., scale=4.)
x2_samples = x2_dist.sample(batch_size)
# x1 | x2 ~ N(x2^2 / 4, 1)
x1 = tfd.Normal(loc=.25 * tf.square(x2_samples),
                scale=tf.ones(batch_size, dtype=tf.float32))
x1_samples = x1.sample()
# Stack into a [batch_size, 2] tensor with columns (x1, x2)
x_samples = tf.stack([x1_samples, x2_samples], axis=1)