# Families of Related Generative Modeling Techniques

This section organizes major **generative modeling paradigms** by the *core difficulty they address* and the *mathematical idea they exploit*.  
All of them attempt to solve the same fundamental problem:

> **How can we transform a simple distribution into the data distribution using learnable stochastic or deterministic processes?**

What differs is **how inference, sampling, and learning are approximated**.

---

## 1. Learning Generative Models via Approximate Inference

### Core idea
When **exact posterior inference is intractable**, learning must rely on approximations.

### Wake–Sleep family
Let the generative model be
$$
p_\theta(x,z) = p_\theta(z)\,p_\theta(x \mid z)
$$

and let the approximate inference model be
$$
q_\phi(z \mid x)
$$

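- **Wake phase**: draw $x$ from the data and $z \sim q_\phi(z \mid x)$, then update $\theta$ to increase $\log p_\theta(x, z)$
- **Sleep phase**: draw $(x, z) \sim p_\theta(x, z)$ from the model, then update $\phi$ to increase $\log q_\phi(z \mid x)$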

### Reweighted Wake–Sleep (RWS)
Draws several samples $z_i \sim q_\phi(z \mid x)$ and uses **importance weighting** to reduce the bias of the wake–sleep updates:
$$
w_i = \frac{p_\theta(x,z_i)}{q_\phi(z_i \mid x)}
$$

With self-normalized weights $\tilde w_i = w_i / \sum_j w_j$, the learning signal approximates the true gradient of
$$
\log p_\theta(x)
$$
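
To make the weights concrete, here is a minimal sketch on a toy model that is not from the text: $z \sim \mathcal{N}(0,1)$, $x \mid z \sim \mathcal{N}(z,1)$, so the marginal is known to be $\mathcal{N}(0,2)$ and the estimate can be checked. The proposal parameters `q_mean` and `q_var` are hypothetical and deliberately not the exact posterior.

```python
import numpy as np

def log_normal(x, mean, var):
    """Log density of a univariate Gaussian N(mean, var)."""
    return -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def importance_weighted_log_px(x, q_mean, q_var, n_samples=5000, seed=0):
    """Estimate log p(x) for the toy model z ~ N(0,1), x|z ~ N(z,1)."""
    rng = np.random.default_rng(seed)
    # z_i ~ q(z | x), here an assumed Gaussian proposal
    z = q_mean + np.sqrt(q_var) * rng.standard_normal(n_samples)
    # log w_i = log p(x, z_i) - log q(z_i | x)
    log_w = (log_normal(z, 0.0, 1.0) + log_normal(x, z, 1.0)
             - log_normal(z, q_mean, q_var))
    m = log_w.max()                               # stabilize the exponentials
    log_px_hat = m + np.log(np.mean(np.exp(log_w - m)))
    w_tilde = np.exp(log_w - m)
    w_tilde /= w_tilde.sum()                      # self-normalized weights
    return log_px_hat, w_tilde

x_obs = 1.3
exact = log_normal(x_obs, 0.0, 2.0)               # known marginal N(0, 2)
estimate, weights = importance_weighted_log_px(x_obs, q_mean=0.4, q_var=0.8)
print(f"exact {exact:.4f}  estimate {estimate:.4f}")
```

In an RWS-style update, the normalized `weights` would multiply per-sample gradients; this sketch only checks that the weighted estimate tracks $\log p_\theta(x)$.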

### Goal
Train latent-variable models **without exact inference**, but with improved gradient estimates.

---

## 2. Markov Chain–Based Generative Models

### Generative Stochastic Networks (GSNs)

Instead of learning a static distribution, learn a **transition operator**:
$$
x_{t+1} \sim p_\theta(x_{t+1} \mid x_t)
$$

The chain is trained so that, as it runs, the marginal distribution of $x_t$ converges to the data distribution:
$$
\lim_{t\to\infty} p(x_t) = p_{\text{data}}(x)
$$

### Key properties
- No explicit latent variables required
- Sampling = **run the Markov chain**
- The model learns **local transitions**, not a global density

### Interpretation
The generative model is defined implicitly by its **stationary distribution**.
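
As a small illustration of "the model is its stationary distribution", the sketch below uses a hand-written 3-state transition matrix `P` (a hypothetical stand-in for a learned operator, not a GSN) and compares three views of the same object: the eigenvector solution, the distribution after many steps, and the empirical frequencies from running the chain.

```python
import numpy as np

# Hypothetical transition operator: row i holds p(x_{t+1} | x_t = i).
P = np.array([[0.8, 0.2, 0.0],
              [0.1, 0.7, 0.2],
              [0.0, 0.3, 0.7]])

def stationary_distribution(P):
    """Left eigenvector of P for eigenvalue 1, normalized to sum to 1."""
    vals, vecs = np.linalg.eig(P.T)
    v = np.real(vecs[:, np.argmax(np.real(vals))])
    return v / v.sum()

def run_chain(P, n_steps, seed=0):
    """Sample by running the chain and recording visit frequencies."""
    rng = np.random.default_rng(seed)
    x = 0
    counts = np.zeros(len(P))
    for _ in range(n_steps):
        x = rng.choice(len(P), p=P[x])   # one local transition
        counts[x] += 1
    return counts / n_steps

pi = stationary_distribution(P)
after_many_steps = np.linalg.matrix_power(P, 200)[0]  # start in state 0
empirical = run_chain(P, n_steps=50_000)
print(pi, after_many_steps, empirical)
```

Only the local transition rows are specified; the global distribution `pi` is never written down and emerges from running the chain.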

---

## 3. Autoregressive Density Models

### Fundamental factorization
Any joint distribution can be written as:
$$
p(x_1,\dots,x_d) = \prod_{i=1}^d p(x_i \mid x_{<i})
$$

### Neural Autoregressive Distribution Estimator (NADE)
Each conditional is modeled with a neural network:
$$
p(x_i \mid x_{<i}; \theta)
$$

### Extensions
- **Recurrent NADE**: sequences
- **Deep NADE**: higher expressivity
- **PixelCNN / WaveNet**: convolutional autoregressive structure

### Properties
- Exact likelihood
- Exact sampling
- No latent variables
- Sampling is sequential: one sample needs $d$ conditional evaluations, so cost grows with dimensionality
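
The factorization and the properties above can be sketched with a very small model over binary vectors. This is an assumed toy parameterization (logistic conditionals with strictly lower-triangular weights), not NADE itself; the class name `TinyAutoregressiveModel` is made up for illustration.

```python
import itertools
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

class TinyAutoregressiveModel:
    """p(x_i = 1 | x_{<i}) = sigmoid(b_i + sum_{j<i} W[i, j] x_j), binary x."""

    def __init__(self, d, seed=0):
        rng = np.random.default_rng(seed)
        # Strictly lower-triangular weights: x_i only depends on x_{<i}.
        self.W = np.tril(rng.standard_normal((d, d)), k=-1)
        self.b = rng.standard_normal(d)
        self.d = d

    def log_prob(self, x):
        """Exact log-likelihood: sum of the d conditional log-probabilities."""
        x = np.asarray(x, dtype=float)
        p = sigmoid(self.b + self.W @ x)
        return float(np.sum(x * np.log(p) + (1 - x) * np.log(1 - p)))

    def sample(self, rng):
        """Ancestral sampling: one sequential step per dimension."""
        x = np.zeros(self.d)
        for i in range(self.d):
            p_i = sigmoid(self.b[i] + self.W[i] @ x)
            x[i] = float(rng.random() < p_i)
        return x

model = TinyAutoregressiveModel(d=4)
# Summing the exact likelihood over all 2^d configurations should give 1.
total = sum(np.exp(model.log_prob(x))
            for x in itertools.product([0, 1], repeat=4))
draw = model.sample(np.random.default_rng(1))
print(total, draw)
```

The likelihood is evaluated in one pass, while `sample` needs a loop over dimensions, which is the asymmetry that makes sampling the expensive direction.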

---

## 4. Adversarial Learning

### Generative Adversarial Networks (GANs)

Two-player game:
- Generator: $x = G_\theta(z)$
- Discriminator: $D_\phi(x) \in [0,1]$

Objective:
$$
\min_G \max_D \;
\mathbb{E}_{x\sim p_{\text{data}}}[\log D(x)]
+ \mathbb{E}_{z\sim p(z)}[\log(1-D(G(z)))]
$$
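
To see what the objective measures, the following sketch evaluates it on a finite support, where the expectation over $z$ is rewritten as an expectation over the generator's output distribution $p_g$. The distributions are invented for illustration. It uses the known result that, for a fixed generator, the optimal discriminator is $D^*(x) = p_{\text{data}}(x) / (p_{\text{data}}(x) + p_g(x))$.

```python
import numpy as np

def gan_value(p_data, p_gen, D):
    """V(D, G) = E_{p_data}[log D(x)] + E_{p_gen}[log(1 - D(x))]."""
    return float(np.sum(p_data * np.log(D))
                 + np.sum(p_gen * np.log(1.0 - D)))

def optimal_discriminator(p_data, p_gen):
    """Maximizer of V over D for a fixed generator distribution."""
    return p_data / (p_data + p_gen)

p_data = np.array([0.5, 0.3, 0.2])
p_bad = np.array([0.2, 0.2, 0.6])    # hypothetical mismatched generator

v_bad = gan_value(p_data, p_bad, optimal_discriminator(p_data, p_bad))
v_match = gan_value(p_data, p_data, optimal_discriminator(p_data, p_data))
print(v_bad, v_match, -np.log(4))
```

When the generator matches the data, the best discriminator outputs 1/2 everywhere and the value reaches its minimum of $-\log 4$; any mismatch leaves the value strictly higher.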

### Characteristics
- No explicit likelihood
- Training is **game-theoretic** (a minimax game between two networks) rather than likelihood-based
- Implicit density model

### Conceptual ancestry
Earlier ideas aimed at:
- Learning inverse mappings
- Factorized latent representations
- Two-way encoder–decoder structures

---

## 5. Invertible / Bijective Generative Models

### Core structure
A deterministic invertible map:
$$
x = f_\theta(z), \quad z = f_\theta^{-1}(x)
$$

with simple base distribution:
$$
z \sim p(z)
$$

### Exact likelihood (change of variables)
$$
\log p_\theta(x) = \log p\big(f_\theta^{-1}(x)\big) + \log\left|\det \frac{\partial f_\theta^{-1}(x)}{\partial x}\right|
$$
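
A one-dimensional check of this formula, assuming the simplest invertible map $x = a z + b$ with a standard normal base (the values of `a` and `b` are arbitrary): the flow density should coincide with the density of $\mathcal{N}(b, a^2)$.

```python
import numpy as np

def standard_normal_logpdf(z):
    return -0.5 * (np.log(2 * np.pi) + z ** 2)

def affine_flow_logpdf(x, a, b):
    """log p(x) for x = f(z) = a * z + b with z ~ N(0, 1)."""
    z = (x - b) / a                      # z = f^{-1}(x)
    log_det_inv = -np.log(np.abs(a))     # log |d f^{-1} / dx|
    return standard_normal_logpdf(z) + log_det_inv

def gaussian_logpdf(x, mean, std):
    return -0.5 * (np.log(2 * np.pi * std ** 2) + ((x - mean) / std) ** 2)

x = np.linspace(-3.0, 5.0, 9)
flow_lp = affine_flow_logpdf(x, a=2.0, b=1.0)
direct_lp = gaussian_logpdf(x, mean=1.0, std=2.0)
print(np.max(np.abs(flow_lp - direct_lp)))
```

Practical flows stack many such maps and choose them so that the Jacobian determinant stays cheap to compute; the bookkeeping is the same as in this scalar case.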

### Properties
- Exact likelihood
- Exact sampling
- Latents are recovered exactly from data via $z = f_\theta^{-1}(x)$, so no approximate inference is needed

These ideas were later unified under the name **normalizing flows**.

---

## 6. Learning Inverses of Bayesian Networks

### Motivation
Forward sampling in Bayesian networks:
$$
z \rightarrow x
$$
is easy, but inference:
$$
x \rightarrow z
$$
is hard.

### Solution
Learn a **stochastic inverse**:
$$
q_\phi(z \mid x)
$$

### Use cases
- Approximate inference
- Amortized inference
- Efficient posterior sampling
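
A minimal sketch of learning such an inverse, under assumptions not in the text: the forward model is $z \sim \mathcal{N}(0,1)$, $x \mid z \sim \mathcal{N}(z,1)$, and $q_\phi(z \mid x)$ is restricted to a Gaussian with a mean linear in $x$. The inverse is fit purely from forward samples, and for this model the exact posterior $\mathcal{N}(x/2, 1/2)$ is available for comparison.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Forward sampling z -> x is easy.
z = rng.standard_normal(n)            # z ~ p(z) = N(0, 1)
x = z + rng.standard_normal(n)        # x | z ~ N(z, 1)

# Fit q(z | x) = N(w * x + c, s2) by least squares on the (x, z) pairs,
# which is maximum likelihood for this Gaussian family.
A = np.stack([x, np.ones(n)], axis=1)
(w, c), *_ = np.linalg.lstsq(A, z, rcond=None)
s2 = np.mean((z - (w * x + c)) ** 2)

print(f"w={w:.3f} c={c:.3f} s2={s2:.3f}  (exact posterior: 0.5, 0, 0.5)")
```

Once fit, inference for a new $x$ is a single function evaluation rather than a per-datapoint optimization, which is what "amortized" refers to.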

This idea is closely tied to **amortized variational inference** and the encoder networks used in VAEs.

---

## 7. Conditional Gaussian Mixture Models

### Mixtures of Conditional Gaussian Scale Mixtures (MCGSMs)

Model local conditional densities:
$$
p(x_i \mid \text{context}) = \sum_k \pi_k(\text{context})\;
\mathcal{N}\big(x_i \mid \mu_k(\text{context}), \sigma_k^2\big)
$$

### Characteristics
- Parameters depend on local neighborhoods
- Strong inductive bias for images
- Among the strongest density models of natural images before deep learning approaches took over
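
The sketch below evaluates a simplified conditional Gaussian mixture of this kind. It is not the full MCGSM parameterization: it assumes softmax gates and means that are linear in the context, and all weights (`gate_W`, `mean_W`, `sigmas`) are random placeholders rather than learned values.

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def conditional_mixture_pdf(x, context, gate_W, mean_W, sigmas):
    """p(x | context) = sum_k pi_k(context) N(x | mu_k(context), sigma_k^2)."""
    pi = softmax(gate_W @ context)        # context-dependent mixing weights
    mu = mean_W @ context                 # context-dependent means
    x = np.atleast_1d(x)[:, None]         # shape (N, 1) against (K,) components
    comp = (np.exp(-0.5 * ((x - mu) / sigmas) ** 2)
            / (np.sqrt(2 * np.pi) * sigmas))
    return comp @ pi

rng = np.random.default_rng(0)
K, C = 3, 4                               # components, context size
gate_W = rng.standard_normal((K, C))
mean_W = 0.5 * rng.standard_normal((K, C))
sigmas = np.array([0.3, 0.6, 1.0])
context = rng.standard_normal(C)          # e.g. neighbouring pixel values

grid = np.linspace(-15.0, 15.0, 6001)
density = conditional_mixture_pdf(grid, context, gate_W, mean_W, sigmas)
mass = float(np.sum(density) * (grid[1] - grid[0]))
print(mass)
```

Changing `context` reshapes the whole conditional density, which is how a model of this type adapts its prediction for a pixel to the local neighbourhood.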

### Interpretation
Early structured probabilistic image models with learned local dependencies.

---

## 8. Early Neural Network Generative Models

### Key ideas introduced
- Neural networks as **generative mappings**
- Latent manifolds
- Stochastic decoding

Form:
$$
x = f_\theta(z) + \epsilon
$$

### Contribution
These works established:
- Latent-variable thinking
- Continuous data manifolds
- Learned stochastic decoders

They are conceptual ancestors of **VAEs** and **diffusion models**.

---

## 9. Physics-Inspired Sampling & Learning

### Annealed Importance Sampling (AIS)

Interpolate between a tractable distribution $p_0$ and the target $p_T$ through a sequence of intermediate distributions:
$$
p_0 \rightarrow p_1 \rightarrow \dots \rightarrow p_T
$$

Estimate the ratio of normalizing constants:
$$
\frac{Z_T}{Z_0}
$$

Used for:
- Partition function estimation
- Energy-based models
- Model comparison
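
A compact AIS sketch, under assumptions chosen so the answer is known: unnormalized 1D Gaussians $f_0(x) = e^{-x^2/2}$ and $f_T(x) = e^{-(x-1)^2 / (2 \cdot 0.5^2)}$, for which $Z_T / Z_0 = 0.5$. The geometric path, the single Metropolis step per temperature, and all hyperparameters are illustrative choices.

```python
import numpy as np

def log_f0(x):
    return -0.5 * x ** 2                      # unnormalized N(0, 1)

def log_fT(x):
    return -0.5 * ((x - 1.0) / 0.5) ** 2      # unnormalized N(1, 0.5^2)

def log_f(x, beta):
    """Geometric path: f_beta = f_0^(1 - beta) * f_T^beta."""
    return (1.0 - beta) * log_f0(x) + beta * log_fT(x)

def ais_log_ratio(n_chains=2000, n_temps=100, step=0.5, seed=0):
    rng = np.random.default_rng(seed)
    betas = np.linspace(0.0, 1.0, n_temps + 1)
    x = rng.standard_normal(n_chains)         # exact samples from p_0
    log_w = np.zeros(n_chains)
    for b_prev, b in zip(betas[:-1], betas[1:]):
        # Accumulate the importance weight at the current state.
        log_w += log_f(x, b) - log_f(x, b_prev)
        # One Metropolis step that leaves f_b invariant.
        prop = x + step * rng.standard_normal(n_chains)
        log_u = np.log(rng.random(n_chains))
        accept = log_u < log_f(prop, b) - log_f(x, b)
        x = np.where(accept, prop, x)
    m = log_w.max()
    return m + np.log(np.mean(np.exp(log_w - m)))   # log of the mean weight

log_ratio_hat = ais_log_ratio()
print(log_ratio_hat, np.log(0.5))
```

More intermediate temperatures reduce the variance of the weights at the cost of more transitions, which is the main tuning knob in practice.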

---

### Langevin Dynamics

Stochastic differential equation:
$$
dx_t = \nabla_x \log p(x_t)\,dt + \sqrt{2}\,dW_t
$$

Discrete form (Euler-Maruyama with step size $\epsilon$ and $\xi_t \sim \mathcal{N}(0, I)$):
$$
x_{t+1} = x_t + \epsilon \nabla_x \log p(x_t) + \sqrt{2\epsilon}\,\xi_t
$$

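The continuous-time SDE defines a diffusion whose stationary distribution is $p(x)$; the discretized chain matches it only approximately, with a bias that shrinks as $\epsilon \to 0$.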
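
The discrete update can be sketched directly. The target here is an assumed $\mathcal{N}(2, 1)$, chosen because its score $-(x - 2)$ is available in closed form; with a learned model the `score` function would be replaced by a network.

```python
import numpy as np

def score(x, mean=2.0, std=1.0):
    """Score of N(mean, std^2): grad_x log p(x) = -(x - mean) / std^2."""
    return -(x - mean) / std ** 2

def langevin_sample(n_chains=5000, n_steps=1000, eps=0.01, seed=0):
    rng = np.random.default_rng(seed)
    x = np.zeros(n_chains)                        # arbitrary initialization
    for _ in range(n_steps):
        xi = rng.standard_normal(n_chains)        # xi_t ~ N(0, 1)
        x = x + eps * score(x) + np.sqrt(2 * eps) * xi
    return x

samples = langevin_sample()
print(samples.mean(), samples.std())
```

Only the score is needed, never the normalizing constant of $p$, which is why Langevin sampling pairs naturally with energy-based and score-based models.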

---

### Fokker–Planck Equation

Evolution of the density $p_t(x)$ of $x_t$ under the Langevin diffusion with target $p(x)$:
$$
\frac{\partial p_t(x)}{\partial t}
=
-\nabla \cdot \big(p_t(x)\, \nabla \log p(x)\big)
+ \Delta p_t(x)
$$

Connects:
- Stochastic dynamics
- Density evolution
- Thermodynamics
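
As a short consistency check, write $p_t$ for the evolving density and $p$ for the fixed target whose score drives the Langevin drift. Substituting $p_t = p$ into the right-hand side and using $p \nabla \log p = \nabla p$ gives

$$
-\nabla \cdot (p \nabla \log p) + \Delta p
= -\nabla \cdot (\nabla p) + \Delta p
= -\Delta p + \Delta p
= 0
$$

so the target density does not change in time, which is the stationarity property of Langevin dynamics.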

---

### Forward & Reverse Diffusion

Forward process:
$$
dx = \sqrt{\beta(t)}\,dW_t
$$

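A reverse-time process exists. With $p_t$ the marginal density of the forward process at time $t$:
$$
dx = -\beta(t)\,\nabla_x \log p_t(x)\,dt + \sqrt{\beta(t)}\,d\bar W_t
$$
Here $dt$ is a negative time increment (time runs backward from $T$ to $0$) and $\bar W_t$ is a reverse-time Wiener process.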

This theoretical symmetry is the **foundation of diffusion generative models**.
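
A toy simulation of the reversal, under assumptions that make the score exact: data $\mathcal{N}(m, s^2)$ and constant $\beta$, so the forward marginal is $p_t = \mathcal{N}(m, s^2 + \beta t)$. The sketch uses the standard reverse-time form $dx = -\beta\,\nabla_x \log p_t(x)\,dt + \sqrt{\beta}\,d\bar W_t$ with time running backward, discretized with Euler-Maruyama. In a real diffusion model the closed-form `score` would be replaced by a learned network.

```python
import numpy as np

def reverse_diffusion(m=1.0, s=0.5, beta=1.0, T=1.0,
                      n_steps=1000, n=20_000, seed=0):
    """Run the reverse process from t = T down to t = 0."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    # Start from the forward marginal at time T, known in closed form here.
    x = m + np.sqrt(s ** 2 + beta * T) * rng.standard_normal(n)
    for k in range(n_steps, 0, -1):
        t = k * dt
        score = -(x - m) / (s ** 2 + beta * t)    # exact grad_x log p_t(x)
        # Backward step t -> t - dt: drift follows the score, plus fresh noise.
        noise = rng.standard_normal(n)
        x = x + beta * score * dt + np.sqrt(beta * dt) * noise
    return x

x0 = reverse_diffusion()
print(x0.mean(), x0.std())    # should be close to m = 1.0 and s = 0.5
```

The noisy samples at time $T$ have standard deviation about 1.12, and the reverse process contracts them back to the data scale of 0.5, guided only by the score of the intermediate marginals.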

---

## Big Picture Synthesis

All these methods differ in *mechanism*, not in *purpose*.

They answer the same question:

> **How do we map a simple distribution into the data distribution?**

Using:
- Explicit likelihoods (autoregressive, flows)
- Implicit distributions (GANs, GSNs)
- Approximate inference (VAEs, wake–sleep)
- Stochastic dynamics (diffusion, Langevin)

Generative modeling is not a collection of tricks;  
it is a single problem explored through multiple mathematical lenses.
