Helpful resource:
- [Lilian Weng's blog on Flow-based deep generative models](https://lilianweng.github.io/posts/2018-10-13-flow-models/)

## **Why Do We Need Flow‑Based Models?**

Imagine you have a simple "source" distribution (like a basic Gaussian) and you want to "sculpt" it into a much more complex shape to match real data (e.g., the distribution of natural images). Flow‑based models achieve this by "flowing" the simple distribution through a series of invertible transformations. Think of it like molding clay: you start with a basic block and then apply a series of well‑designed moves until you get the shape you want.

Key Advantages Compared to Other Deep Generative Models:

- **Exact Likelihood Computation:** Unlike GANs (which define a density only implicitly and cannot evaluate it) and VAEs (which optimize a lower bound on the likelihood rather than the likelihood itself), flow‑based models let you compute the exact probability density of any given data point. They can therefore be trained directly by maximum likelihood and evaluated by exact log-likelihood.
- **Efficient Sampling and Inversion:** Because the transformations are invertible, you can easily switch between the latent (simple) space and the data (complex) space. This is useful for both generating new data and for tasks like density estimation.
- **Interpretability and Flexibility:** Each step in the model (each "flow") can be examined and understood, making it easier to see how the model transforms data step by step.

## **Some jargon**

- **Transformation** is a function that maps data from one space to another. In flow‑based models, transformations are invertible, meaning after you've transformed the data, you can still go back to the original space.
- **Density estimation** is the problem of reconstructing the probability density function using a set of given data points.
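- In probability theory, a **probability density function (PDF)**, **density function**, or **density**, all mean the same thing: a function whose value at a point describes how likely the random variable is to fall near that point, relative to other points. For a continuous random variable the probability of any single exact value is zero; probabilities are obtained by integrating the density over a region.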

## **Flows for Continuous Random Variables**



### **Big Picture**

Think of a flow as a sequence of "waterfalls" that progressively reshape a simple stream (e.g., a Gaussian distribution) into a complex river (the data distribution). Each waterfall is an invertible transformation that's carefully designed so that you can "read back" the exact change made to the probability density.


**The Change-of-Variables Principle:**  
At the heart of flow‑based models is the change-of-variables formula from probability. If you have a random variable $\mathbf{z}$ with a known density $p_{\mathbf{z}}(\mathbf{z})$ and you transform it using an invertible function $f$ such that

$$\mathbf{x} = f(\mathbf{z})$$

then the density $p_{\mathbf{x}}(\mathbf{x})$ is given by:

$$p_{\mathbf{x}}(\mathbf{x}) = p_{\mathbf{z}}(f^{-1}(\mathbf{x})) \left| \det \left( \frac{\partial f^{-1}(\mathbf{x})}{\partial \mathbf{x}} \right) \right|$$

This formula ensures that probability mass is conserved through the transformation.
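As a quick sanity check of the formula, here is a minimal sketch for the simplest possible flow: a one-dimensional affine map $x = a z + b$ applied to a standard Gaussian. The values of `a` and `b` are arbitrary choices for illustration. In this case the answer is known in closed form, since $x \sim \mathcal{N}(b, a^2)$, so the two densities should agree.

```python
import math

def standard_normal_pdf(z):
    """Density of the source distribution p_z (standard Gaussian)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_pdf(x, mean, std):
    """Closed-form Gaussian density, used only as a reference."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))

# Illustrative affine flow x = f(z) = a * z + b (a and b are arbitrary).
a, b = 2.0, 1.0

def f_inverse(x):
    return (x - b) / a

def flow_density(x):
    # Change of variables in 1D: the Jacobian of f^{-1} is just 1 / a.
    return standard_normal_pdf(f_inverse(x)) * abs(1.0 / a)

for x in (-3.0, 0.0, 1.0, 4.5):
    print(x, flow_density(x), normal_pdf(x, mean=b, std=abs(a)))
```

Both columns should match up to floating-point error, which is the change-of-variables formula at work in one dimension.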

## **Deep Dive**

The above just provided you with an intuition of how flow-based models work. Now, let's dive into the details.

We start with the goal of transforming a simple probability distribution into a more complex one using an invertible function. The mathematical backbone is the **change‑of‑variables formula**. Let's systematically break it down.

### **Probability Conservation**

**Concept:**  
Imagine you have a bucket of water. No matter how you pour or reshape the water (without spilling any), the total amount remains constant. In probability, this "water" is the probability mass.

- **Mathematically:**  
  If you have a random variable $z$ with density $p_z(z)$, then for any region $A$ in the space, the probability that $z$ falls in $A$ is:
  
  $$P(z \in A) = \int_A p_z(z) \, dz$$
  
  Now, if we transform $z$ via an invertible function $f$ such that
  
  $$x = f(z)$$

  then every set $A$ in the $z$-space corresponds to a set $f(A)$ in the $x$-space. Since no probability is lost:

  $$\int_A p_z(z) \, dz = \int_{f(A)} p_x(x) \, dx$$
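Probability conservation can be checked empirically. The sketch below assumes a hypothetical invertible map $f(z) = z^3 + z$ and an arbitrary interval $A = [-0.5, 1.0]$; neither comes from a specific model. It draws samples of $z$, pushes them through $f$, and compares the fraction of $z$ samples in $A$ with the fraction of $x$ samples in $f(A)$.

```python
import math
import random

def f(z):
    # Hypothetical invertible map: strictly increasing on the real line.
    return z ** 3 + z

def standard_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

random.seed(0)
z_samples = [random.gauss(0.0, 1.0) for _ in range(100_000)]
x_samples = [f(z) for z in z_samples]

# Region A in z-space and its image f(A) in x-space.
# Because f is increasing, f maps the interval [a_lo, a_hi] to [f(a_lo), f(a_hi)].
a_lo, a_hi = -0.5, 1.0
fa_lo, fa_hi = f(a_lo), f(a_hi)

frac_z_in_A = sum(a_lo <= z <= a_hi for z in z_samples) / len(z_samples)
frac_x_in_fA = sum(fa_lo <= x <= fa_hi for x in x_samples) / len(x_samples)
exact = standard_normal_cdf(a_hi) - standard_normal_cdf(a_lo)

print(frac_z_in_A, frac_x_in_fA, exact)
```

The two sample fractions agree because a sample lies in $A$ exactly when its image lies in $f(A)$, and both approximate the exact Gaussian mass of $A$.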

### **Change of Variables in Integrals**

**Why change variables?**  
When we change the variable of integration (from $z$ to $x$), the size of a small "chunk" of space (an infinitesimal volume) changes. In calculus, this is handled by the **Jacobian determinant**.

- **Substitution:**  
  Given $x = f(z)$, an infinitesimal volume element $dz$ around $z$ is mapped to a volume element $dx$ around $x$. The two are related by:
  
  $$dx = \Bigl|\det\left(\frac{\partial f(z)}{\partial z}\right)\Bigr| dz$$

  where $\frac{\partial f(z)}{\partial z}$ is the Jacobian matrix of partial derivatives, and the absolute value of its determinant tells you how much a small volume element is scaled by $f$.
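To make the volume-scaling factor concrete, the sketch below estimates a $2 \times 2$ Jacobian with central finite differences and compares its determinant with the analytic value. The map $f(z_1, z_2) = (z_1, z_2 e^{z_1})$ and the evaluation point are assumptions chosen for illustration; for this map the Jacobian is triangular and its determinant is $e^{z_1}$.

```python
import math

def f(z1, z2):
    # Illustrative 2D map: keeps z1, rescales z2 by exp(z1).
    return (z1, z2 * math.exp(z1))

def numerical_jacobian(func, z1, z2, h=1e-6):
    """Central-difference estimate of the 2x2 Jacobian of func at (z1, z2)."""
    x_p1 = func(z1 + h, z2)
    x_m1 = func(z1 - h, z2)
    x_p2 = func(z1, z2 + h)
    x_m2 = func(z1, z2 - h)
    return [
        [(x_p1[0] - x_m1[0]) / (2 * h), (x_p2[0] - x_m2[0]) / (2 * h)],
        [(x_p1[1] - x_m1[1]) / (2 * h), (x_p2[1] - x_m2[1]) / (2 * h)],
    ]

def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

z1, z2 = 0.3, -1.2
jac = numerical_jacobian(f, z1, z2)
numeric_scale = abs(det2(jac))   # local volume scaling estimated numerically
analytic_scale = math.exp(z1)    # det of [[1, 0], [z2*e^z1, e^z1]]

print(numeric_scale, analytic_scale)
```

A scaling factor above 1 means $f$ locally stretches volume, so the density must drop by the same factor to keep total probability equal to 1.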

### **Step-by-Step Derivation**

Now, let's go through the derivation:

1. **Start with Probability Conservation:**  
   For any measurable set $A$,
   $$\int_A p_z(z) \, dz = \int_{f(A)} p_x(x) \, dx$$
   This equation means the probability mass in $A$ under $z$ is exactly the mass in $f(A)$ under $x$.

2. **Apply Change of Variables:**  
   Substitute $x = f(z)$ in the right-hand integral. The volume elements are related by:
   
   $$dx = \Bigl|\det\left(\frac{\partial f(z)}{\partial z}\right)\Bigr| dz$$
   
   so the right-hand side becomes an integral over $A$ in $z$-space:

   $$\int_A p_x(f(z)) \Bigl|\det\left(\frac{\partial f(z)}{\partial z}\right)\Bigr| dz$$

3. **Set the Integrands Equal:**  
   Because the above equality holds for every set $A$, the integrands themselves must be equal almost everywhere. Hence,
   
   $$p_x(f(z)) \Bigl|\det\left(\frac{\partial f(z)}{\partial z}\right)\Bigr| = p_z(z)$$

4. **Solve for $p_x(x)$:**  
   
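   Replace $z$ with $f^{-1}(x)$ (since $f$ is invertible) and divide both sides by the Jacobian determinant of $f$, evaluated at $z = f^{-1}(x)$:

   $$p_x(x) = p_z(f^{-1}(x)) \Bigl|\det\left(\frac{\partial f(z)}{\partial z}\right)\Bigr|^{-1}$$

   By the inverse function theorem, the Jacobian of $f^{-1}$ at $x$ is the matrix inverse of the Jacobian of $f$ at $z = f^{-1}(x)$, and $\det(M^{-1}) = 1/\det(M)$. Therefore:
   
   $$p_x(x) = p_z(f^{-1}(x)) \Bigl|\det\left(\frac{\partial f^{-1}(x)}{\partial x}\right)\Bigr|$$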

   This is the **change‑of‑variables formula** that tells us how to compute the density $p_x(x)$ after applying the invertible transformation $f$.
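The derived formula can be tested on a nonlinear example. The sketch below assumes the map $x = e^z$ with a standard Gaussian $z$ (an illustrative choice), so that $f^{-1}(x) = \log x$ and $\frac{\partial f^{-1}}{\partial x} = 1/x$. It integrates the resulting $p_x$ over an interval with a simple trapezoid rule and compares the result with the mass of the corresponding interval in $z$-space.

```python
import math

def standard_normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def standard_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def p_x(x):
    # Change of variables for x = exp(z): f^{-1}(x) = log(x), Jacobian = 1 / x.
    return standard_normal_pdf(math.log(x)) * abs(1.0 / x)

def trapezoid(func, lo, hi, n=10_000):
    """Composite trapezoid rule on [lo, hi] with n sub-intervals."""
    h = (hi - lo) / n
    total = 0.5 * (func(lo) + func(hi))
    for i in range(1, n):
        total += func(lo + i * h)
    return total * h

# Interval in x-space (arbitrary) and its preimage [log(lo), log(hi)] in z-space.
lo, hi = 0.5, 3.0
mass_from_density = trapezoid(p_x, lo, hi)
mass_from_z = standard_normal_cdf(math.log(hi)) - standard_normal_cdf(math.log(lo))

print(mass_from_density, mass_from_z)
```

The two numbers agree up to integration error, which ties the final formula back to the probability-conservation equation in step 1.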

### **Summary**

- **Probability Conservation:**  
  It means that if you "reshape" the distribution by a transformation, the total probability (mass) stays the same.

- **Change of Variables in Integrals:**  
  In integration, when you switch from one variable to another, you must adjust the "size" of the differential element by the Jacobian determinant. This step is essential to ensure that areas (or volumes) are properly scaled.

- **Jacobian Determinant:**  
  For a function $f:\mathbb{R}^n \to \mathbb{R}^n$, the Jacobian matrix $J_f(z)$ is the square matrix of first-order partial derivatives. The absolute value of its determinant, $|\det(J_f(z))|$, tells us by what factor the transformation scales a small volume around $z$.

With the above concepts in mind, we can implement the RealNVP model, a popular flow-based model for continuous random variables. You can check it out in the [`realnvp.ipynb`](https://github.com/aryaman1802/model-implementations/blob/main/generative-deep-learning/4_flow_based_models/realnvp.ipynb) notebook in this directory.
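To connect the formula to RealNVP before opening the notebook, here is a minimal sketch of a single two-dimensional affine coupling step in plain Python. It is not the notebook's implementation: the functions `s` and `t` are hypothetical fixed stand-ins for the scale and shift neural networks, and the helper names are chosen for illustration only. The point is that the Jacobian is triangular, so its log-determinant is simply `s(z1)`.

```python
import math

def s(z1):
    # Hypothetical log-scale function; a neural network in an actual model.
    return math.tanh(z1)

def t(z1):
    # Hypothetical shift function; also a neural network in an actual model.
    return 0.5 * z1

def forward(z1, z2):
    """Map latent (z1, z2) to data (x1, x2) and return log|det J_f|."""
    x1 = z1                              # first coordinate passes through unchanged
    x2 = z2 * math.exp(s(z1)) + t(z1)    # second coordinate is scaled and shifted
    log_det = s(z1)                      # triangular Jacobian with diagonal (1, exp(s(z1)))
    return x1, x2, log_det

def inverse(x1, x2):
    """Map data (x1, x2) back to latent (z1, z2)."""
    z1 = x1
    z2 = (x2 - t(x1)) * math.exp(-s(x1))
    return z1, z2

def log_standard_normal_2d(z1, z2):
    return -0.5 * (z1 * z1 + z2 * z2) - math.log(2.0 * math.pi)

def log_density(x1, x2):
    # log p_x(x) = log p_z(f^{-1}(x)) + log|det J_{f^{-1}}(x)| = log p_z(z) - s(z1)
    z1, z2 = inverse(x1, x2)
    return log_standard_normal_2d(z1, z2) - s(z1)

x1, x2, log_det = forward(0.7, -0.4)
print(inverse(x1, x2))        # recovers (0.7, -0.4) up to rounding
print(log_density(x1, x2))
```

Because only `z1` is fed into `s` and `t`, the inverse never needs to invert those functions, which is why they can be arbitrarily complex networks in practice.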

## **Flows for Discrete Random Variables**

