# 1. Flow-Based Generative Models (Normalizing Flows)

Flow-based models learn an **invertible mapping** between:

- a simple base distribution (Gaussian)  
- a complex data distribution  

using invertible transformations.

---

## 1.1 Invertible Transformation

$$
x = f_{\theta}(z), \qquad z = f_{\theta}^{-1}(x)
$$

Where:

- \(z\) is the latent variable (e.g., \(N(0,I)\))  
- \(x\) is data  
- \(f_{\theta}\) is invertible and differentiable  

---

## 1.2 Change of Variables Formula (Core of Flow Models)

This is the central equation of flows:

$$
p_X(x) = p_Z(z)\,\left|\det\frac{\partial f_{\theta}^{-1}(x)}{\partial x}\right|
$$

Equivalently:

$$
p_X(x) = p_Z(f_{\theta}^{-1}(x))\,
\left|\det J_{f_{\theta}^{-1}}(x)\right|
$$

Where \(J_{f_{\theta}^{-1}}(x)\) is the Jacobian matrix of the inverse map \(f_{\theta}^{-1}\), evaluated at \(x\).
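
As a sanity check on the change of variables formula, here is a minimal sketch for a one-dimensional affine flow \(x = a z + b\). The flow, the parameter names `a` and `b`, and the test point are illustrative assumptions; the result is compared against the known density of \(N(b, a^2)\).

```python
import math

def standard_normal_pdf(z):
    """Density of the base distribution N(0, 1)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def affine_flow_density(x, a, b):
    """p_X(x) for x = f(z) = a*z + b with z ~ N(0, 1), via change of variables."""
    z = (x - b) / a            # z = f^{-1}(x)
    inv_jacobian = 1.0 / a     # derivative of f^{-1} with respect to x
    return standard_normal_pdf(z) * abs(inv_jacobian)

def normal_pdf(x, mean, std):
    """Reference density of N(mean, std^2), used only for comparison."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))

flow_value = affine_flow_density(1.3, a=2.0, b=0.5)
reference_value = normal_pdf(1.3, mean=0.5, std=2.0)
print(flow_value, reference_value)
```

The two printed values should agree, since pushing \(N(0,1)\) through \(a z + b\) gives \(N(b, a^2)\).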

---

## 1.3 Log-Likelihood of Flow Models

$$
\log p_X(x) = \log p_Z(z) + \log\left|\det J_{f_{\theta}^{-1}}(x)\right|
$$

Training is by maximum log-likelihood:

$$
\max_{\theta}\sum_n \log p_X(x^{(n)})
$$
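
The objective can be illustrated with the same kind of one-dimensional affine flow \(x = a z + b\). This is only a sketch: the data values are made up, and the closed-form maximizer used here (sample mean and standard deviation) is specific to this toy flow, whereas real flows are trained by gradient ascent.

```python
import math

def flow_log_likelihood(x, a, b):
    """log p_X(x) = log p_Z(z) + log|det J_{f^{-1}}(x)| for f(z) = a*z + b."""
    z = (x - b) / a
    log_base = -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)
    log_det_inverse = -math.log(abs(a))
    return log_base + log_det_inverse

def total_log_likelihood(data, a, b):
    """Sum of per-example log-likelihoods, the quantity being maximized."""
    return sum(flow_log_likelihood(x, a, b) for x in data)

data = [0.8, 1.9, 2.4, 3.1, 1.8]   # made-up observations

# For this affine flow the maximizer is known in closed form:
# b is the sample mean and a is the (biased) sample standard deviation.
b_mle = sum(data) / len(data)
a_mle = math.sqrt(sum((x - b_mle) ** 2 for x in data) / len(data))

ll_mle = total_log_likelihood(data, a_mle, b_mle)
ll_other = total_log_likelihood(data, 1.0, 0.0)
print(ll_mle, ll_other)
```

Any other choice of `a` and `b` should give a total log-likelihood no larger than `ll_mle`.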

---

## 1.4 Composition of Flows

Flows compose multiple invertible transformations:

$$
f_{\theta} = f_K \circ f_{K-1} \circ \cdots \circ f_1
$$

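The log-determinant of the inverse then decomposes into a sum over layers, with each layer's Jacobian evaluated at the corresponding intermediate point: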

$$
\log\left|\det J_{f_{\theta}^{-1}}\right|
= \sum_{k=1}^{K} \log\left|\det J_{f_k^{-1}}\right|
$$
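
A small sketch of this additivity, assuming a stack of one-dimensional affine layers \(f_k(h) = a_k h + b_k\) with arbitrary parameters. Affine layers have constant Jacobians, so the evaluation point does not matter here; for nonlinear layers it would.

```python
import math

# Each layer is (a, b), representing f_k(h) = a*h + b. Values are arbitrary.
layers = [(2.0, 0.5), (0.5, -1.0), (3.0, 0.2)]

def forward(z, layers):
    """x = f_K(... f_1(z)), applying layers in order."""
    h = z
    for a, b in layers:
        h = a * h + b
    return h

def inverse_with_log_det(x, layers):
    """Invert layers in reverse order, accumulating log|det J_{f_k^{-1}}|."""
    h = x
    log_det = 0.0
    for a, b in reversed(layers):
        h = (h - b) / a
        log_det += -math.log(abs(a))
    return h, log_det

x = forward(0.7, layers)
z_recovered, log_det_sum = inverse_with_log_det(x, layers)

# The composed map is itself affine, with slope equal to the product of slopes.
total_slope = math.prod(a for a, _ in layers)
print(z_recovered, log_det_sum, -math.log(abs(total_slope)))
```

The summed per-layer term should match the log-determinant computed directly from the composed slope.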

---

## 1.5 Sampling

Sample \(z \sim N(0,I)\), then transform:

$$
x = f_{\theta}(z)
$$

---

# 2. Score-Based Diffusion Models (Score Matching + SDEs)

Score-based models learn the **score**:

$$
s_{\theta}(x,t) = \nabla_x \log p_t(x)
$$

which is the gradient of the log-density.

They combine:

- forward diffusion (noise addition)  
- reverse-time SDE (denoising)  

---

## 2.1 Forward Diffusion (Discrete)

$$
q(x_t \mid x_{t-1}) =
N(x_t; \sqrt{1-\beta_t}\,x_{t-1},\,\beta_t I)
$$

Closed form:

$$
x_t = \sqrt{\bar{\alpha}_t}\,x_0 +
\sqrt{1-\bar{\alpha}_t}\,\epsilon,\quad \epsilon\sim N(0,I)
$$

Where:

$$
\bar{\alpha}_t = \prod_{s=1}^t (1 - \beta_s)
$$
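
The closed form can be checked numerically. The sketch below assumes a linear \(\beta\) schedule with illustrative endpoints (the source does not specify a schedule) and scalar data; it compares the mean coefficient and variance obtained by iterating the one-step kernel against \(\sqrt{\bar{\alpha}_t}\) and \(1-\bar{\alpha}_t\).

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """beta_1..beta_T on a linear grid (illustrative values, an assumption)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + step * i for i in range(num_steps)]

def alpha_bar(betas):
    """Cumulative products: alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    result, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        result.append(running)
    return result

def q_sample(x0, t, alpha_bars, rng):
    """Draw x_t ~ q(x_t | x_0) in one shot; t is 1-indexed."""
    a_bar = alpha_bars[t - 1]
    eps = rng.gauss(0.0, 1.0)
    return math.sqrt(a_bar) * x0 + math.sqrt(1.0 - a_bar) * eps

betas = linear_beta_schedule(1000)
alpha_bars = alpha_bar(betas)

# Propagate the conditional mean coefficient and variance step by step.
mean_coeff, variance = 1.0, 0.0
for beta in betas:
    mean_coeff *= math.sqrt(1.0 - beta)
    variance = (1.0 - beta) * variance + beta

rng = random.Random(0)
print(q_sample(1.5, 500, alpha_bars, rng))
print(mean_coeff, math.sqrt(alpha_bars[-1]))
print(variance, 1.0 - alpha_bars[-1])
```

The last two lines should each print a matching pair, which is what allows \(x_t\) to be sampled directly from \(x_0\) without simulating every intermediate step.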

---

## 2.2 Forward SDE

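The general forward SDE has drift \(f(x,t)\) and diffusion coefficient \(g(t)\):

$$
dx = f(x,t)\,dt + g(t)\,dw
$$

Variance-exploding (\(f = 0\)):

$$
dx = g(t)\,dw
$$

Variance-preserving (\(f = -\tfrac12\beta(t)\,x\), \(g = \sqrt{\beta(t)}\)):

$$
dx = -\tfrac12 \beta(t)\,x\,dt + \sqrt{\beta(t)}\,dw
$$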

---

# 3. Score Function

$$
s(x,t)= \nabla_x \log p_t(x)
$$

The model learns:

$$
s_{\theta}(x,t) \approx \nabla_x \log p_t(x)
$$

---

# 4. Score-Matching Loss

$$
L(\theta) =
\mathbb{E}_{t,x_t,x_0}
\left[
\lambda(t)
\left\|
s_{\theta}(x_t,t) -
\nabla_x \log q(x_t\mid x_0)
\right\|^2
\right]
$$

For Gaussian forward noise:

$$
\nabla_x \log q(x_t\mid x_0)
=
-\frac{x_t - \sqrt{\bar{\alpha}_t}\,x_0}
{1-\bar{\alpha}_t}
=
-\frac{\epsilon}{\sqrt{1-\bar{\alpha}_t}}
$$
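
The score of a Gaussian \(N(m, v I)\) is \(-(x-m)/v\), so for \(q(x_t\mid x_0)\) the denominator is the variance \(1-\bar{\alpha}_t\). The following sketch checks this against a finite-difference gradient of the log-density in one dimension; the numeric values are arbitrary.

```python
import math

def log_q(x_t, x0, a_bar):
    """log N(x_t; sqrt(a_bar)*x0, 1 - a_bar) in one dimension."""
    mean = math.sqrt(a_bar) * x0
    var = 1.0 - a_bar
    return -0.5 * (x_t - mean) ** 2 / var - 0.5 * math.log(2.0 * math.pi * var)

def conditional_score(x_t, x0, a_bar):
    """Analytic gradient of log_q with respect to x_t: -(x_t - mean) / variance."""
    return -(x_t - math.sqrt(a_bar) * x0) / (1.0 - a_bar)

x0, a_bar, eps = 1.2, 0.6, -0.4          # arbitrary illustrative values
x_t = math.sqrt(a_bar) * x0 + math.sqrt(1.0 - a_bar) * eps

h = 1e-5
finite_difference = (log_q(x_t + h, x0, a_bar) - log_q(x_t - h, x0, a_bar)) / (2.0 * h)
analytic = conditional_score(x_t, x0, a_bar)
noise_form = -eps / math.sqrt(1.0 - a_bar)   # same score written in terms of eps
print(finite_difference, analytic, noise_form)
```

All three numbers should coincide, which also shows why predicting the noise \(\epsilon\) is equivalent to predicting the score up to the factor \(-1/\sqrt{1-\bar{\alpha}_t}\).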

---

# 5. Reverse-Time SDE (Key Sampling Equation)

$$
dx =
\left[
f(x,t)
-
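g(t)^2\,
\nabla_x \log p_t(x)
\right] dt
+
g(t)\,d\bar{w}
$$

Here \(f\) and \(g\) are the drift and diffusion coefficients of the forward SDE, \(\bar{w}\) is a Brownian motion running backward in time, and \(dt\) is a negative time increment. Replacing the score with the trained network gives:

$$
dx =
\left[
f(x,t)
-
g(t)^2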
\,s_{\theta}(x,t)
\right] dt
+
g(t)\,d\bar{w}
$$

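Integrating this SDE backward from \(t=T\) to \(t=0\), starting from a sample of the prior noise distribution, generates data samples.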
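
A minimal Euler-Maruyama sketch of reverse-time sampling, under strong simplifying assumptions: a one-dimensional variance-preserving SDE with constant \(\beta\), toy data distributed as \(N(\mu_0, \sigma_0^2)\), and the exact score of \(p_t\) used in place of a trained network. The constants and function names are hypothetical.

```python
import math
import random

BETA, T = 1.0, 10.0          # constant beta(t) and time horizon (assumptions)
MU0, SIGMA0 = 2.0, 0.5       # toy data distribution N(MU0, SIGMA0^2)

def marginal(t):
    """Mean and variance of p_t under the VP SDE started from N(MU0, SIGMA0^2)."""
    decay = math.exp(-BETA * t)
    return MU0 * math.sqrt(decay), SIGMA0 ** 2 * decay + 1.0 - decay

def score(x, t):
    """Exact score of p_t; a trained network s_theta would replace this."""
    mean, var = marginal(t)
    return -(x - mean) / var

def reverse_sde_sample(rng, num_steps=400):
    """Integrate the reverse-time SDE from t = T down to t = 0."""
    dt = T / num_steps
    x = rng.gauss(0.0, 1.0)                             # prior sample at t = T
    for i in range(num_steps):
        t = T - i * dt
        drift = -0.5 * BETA * x - BETA * score(x, t)    # f - g^2 * score
        x = x - drift * dt + math.sqrt(BETA * dt) * rng.gauss(0.0, 1.0)
    return x

rng = random.Random(0)
samples = [reverse_sde_sample(rng) for _ in range(1000)]
sample_mean = sum(samples) / len(samples)
sample_std = math.sqrt(sum((s - sample_mean) ** 2 for s in samples) / len(samples))
print(sample_mean, sample_std)   # should land near MU0 and SIGMA0
```

Up to discretization and Monte Carlo error, the sample mean and standard deviation should be close to those of the toy data distribution.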

---

# 6. Probability-Flow ODE (Deterministic Sampling)

$$
\frac{dx}{dt}
=
f(x,t)
-
\frac{g(t)^2}{2}
s_{\theta}(x,t)
$$

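This ODE has the same marginals \(p_t(x)\) as the reverse-time SDE but contains no noise term, so integrating it from \(t=T\) down to \(t=0\) gives a deterministic sampler.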
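
A sketch of deterministic sampling with plain Euler steps, under the same kind of toy assumptions as elsewhere in these notes: a one-dimensional variance-preserving SDE with constant \(\beta\), Gaussian data, and the exact score standing in for \(s_{\theta}\). For Gaussian marginals the ODE preserves the standardized value \((x - m_t)/\sqrt{v_t}\), which gives a closed-form answer to compare against.

```python
import math

BETA, T = 1.0, 10.0          # constant beta(t) and time horizon (assumptions)
MU0, SIGMA0 = 2.0, 0.5       # toy data distribution N(MU0, SIGMA0^2)

def marginal(t):
    """Mean and variance of p_t under the VP SDE started from N(MU0, SIGMA0^2)."""
    decay = math.exp(-BETA * t)
    return MU0 * math.sqrt(decay), SIGMA0 ** 2 * decay + 1.0 - decay

def score(x, t):
    """Exact score of p_t; a trained network s_theta would replace this."""
    mean, var = marginal(t)
    return -(x - mean) / var

def probability_flow_sample(x_T, num_steps=2000):
    """Euler integration of dx/dt = f - 0.5 * g^2 * score from t = T down to 0."""
    dt = T / num_steps
    x = x_T
    for i in range(num_steps):
        t = T - i * dt
        velocity = -0.5 * BETA * x - 0.5 * BETA * score(x, t)
        x = x - velocity * dt          # step backward in time
    return x

x_T = 0.8
x_0 = probability_flow_sample(x_T)

# Closed-form image of x_T under the flow for this Gaussian toy problem.
mean_T, var_T = marginal(T)
expected = MU0 + SIGMA0 * (x_T - mean_T) / math.sqrt(var_T)
print(x_0, expected)
```

The same starting point always maps to the same output, which is the practical difference from the stochastic sampler.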

---

# 7. DDPM Sampling (Discrete Reverse Diffusion)

$$
x_{t-1}
=
\frac{1}{\sqrt{\alpha_t}}
\left(
x_t
-
\frac{1-\alpha_t}{\sqrt{1-\bar{\alpha}_t}}
\epsilon_{\theta}(x_t,t)
\right)
+
\sigma_t z,
\qquad z\sim N(0,I)
$$

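Where:

- \(\alpha_t = 1-\beta_t\) and \(\bar{\alpha}_t = \prod_{s=1}^{t}\alpha_s\), as in the forward process  
- \(\epsilon_{\theta}(x_t,t)\) is the noise predictor  
- \(\sigma_t\) controls sampling noise (a common choice is \(\sigma_t^2 = \beta_t\)); no noise is added at the final step \(t=1\)  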
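
The update rule can be sketched as a sampling loop. Everything specific here is an assumption for illustration: scalar data, a linear \(\beta\) schedule, \(\sigma_t^2 = \beta_t\), and an oracle noise predictor for a toy dataset containing a single point, used in place of a trained \(\epsilon_{\theta}\).

```python
import math
import random

def make_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear betas with alpha_t = 1 - beta_t and cumulative alpha_bar_t."""
    betas = [beta_start + (beta_end - beta_start) * i / (num_steps - 1)
             for i in range(num_steps)]
    alphas = [1.0 - b for b in betas]
    alpha_bars, running = [], 1.0
    for a in alphas:
        running *= a
        alpha_bars.append(running)
    return betas, alphas, alpha_bars

X0_TARGET = 1.5   # toy "dataset" containing a single point (an assumption)

def oracle_eps(x_t, t, alpha_bars):
    """Stand-in for eps_theta: the exact noise when all data equals X0_TARGET."""
    a_bar = alpha_bars[t - 1]
    return (x_t - math.sqrt(a_bar) * X0_TARGET) / math.sqrt(1.0 - a_bar)

def ddpm_sample(num_steps, rng):
    """Ancestral sampling: apply the reverse update for t = T, ..., 1."""
    betas, alphas, alpha_bars = make_schedule(num_steps)
    x = rng.gauss(0.0, 1.0)                          # x_T ~ N(0, 1)
    for t in range(num_steps, 0, -1):
        alpha, a_bar = alphas[t - 1], alpha_bars[t - 1]
        eps = oracle_eps(x, t, alpha_bars)
        mean = (x - (1.0 - alpha) / math.sqrt(1.0 - a_bar) * eps) / math.sqrt(alpha)
        sigma = math.sqrt(betas[t - 1]) if t > 1 else 0.0   # no noise on last step
        x = mean + sigma * rng.gauss(0.0, 1.0)
    return x

sample = ddpm_sample(1000, random.Random(0))
print(sample)
```

With this oracle the chain ends at the single data point; with a trained network the same loop would produce samples from the learned distribution.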

---

# Summary Table — Flow vs Score Models

| Concept | Flow Models | Score/Diffusion Models |
|--------|-------------|------------------------|
| Main object | invertible mapping \(x=f_{\theta}(z)\) | score \(s(x,t)=\nabla_x \log p_t(x)\) |
| Training | max log-likelihood | score matching |
| Generative process | invertible transform | reverse diffusion / SDE |
| Probability | exact density via Jacobian | implicit density via score |
| Sampling | one-shot \(x=f(z)\) | iterative denoising |
| Base distribution | explicit \(N(0,I)\) | Gaussian prior reached through the noise schedule |
| Key formula | change of variables | reverse SDE |



# Gaussian Distributions — Full Formulas Using Euler’s Number \(e\)


---

# 1. Univariate Gaussian (1-D Normal Distribution)

The probability density of a single real variable \(x\):

$$
p(x)=\frac{1}{\sqrt{2\pi\sigma^{2}}}\; e^{-\frac{(x-\mu)^2}{2\sigma^{2}}}
$$

Where:

- \(\mu\) = mean  
- \(\sigma^2\) = variance  
- \(\sigma\) = standard deviation  
- \(e = 2.718281828459\ldots\) is Euler’s number  
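
A direct transcription of the density into code, as a quick check. The function name `gaussian_pdf` is hypothetical; `statistics.NormalDist` from the Python standard library is used only as a reference.

```python
import math
from statistics import NormalDist

def gaussian_pdf(x, mu, sigma):
    """Univariate Gaussian density with mean mu and standard deviation sigma."""
    variance = sigma ** 2
    exponent = -((x - mu) ** 2) / (2.0 * variance)
    return math.exp(exponent) / math.sqrt(2.0 * math.pi * variance)

value = gaussian_pdf(1.0, mu=0.5, sigma=2.0)
reference = NormalDist(mu=0.5, sigma=2.0).pdf(1.0)
peak = gaussian_pdf(0.0, mu=0.0, sigma=1.0)   # equals 1 / sqrt(2*pi)
print(value, reference, peak)
```

Here `math.exp` plays the role of \(e^{(\cdot)}\) in the formula.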

---

# 2. Multivariate Gaussian (d-Dimensional Normal Distribution)

Used extensively in deep learning (VAEs, diffusion models, flows):

$$
p(x)=\frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}}\;
e^{-\frac{1}{2}(x-\mu)^{T}\Sigma^{-1}(x-\mu)}
$$

Where:

- \(x \in \mathbb{R}^d\)  
- \(\mu \in \mathbb{R}^d\) (mean vector)  
- \(\Sigma \in \mathbb{R}^{d\times d}\) (covariance matrix)  
- \(|\Sigma|\) = determinant  
- \(\Sigma^{-1}\) = inverse covariance  
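
A minimal sketch of the formula for \(d = 2\), where the determinant and inverse of \(\Sigma\) can be written out by hand. The function names and numeric values are illustrative assumptions. With a diagonal covariance the joint density should factorize into a product of one-dimensional densities, which serves as the check.

```python
import math

def gaussian_pdf_2d(x, mu, cov):
    """Density of N(mu, cov) for d = 2, with cov given as [[a, b], [b, c]]."""
    (a, b), (_, c) = cov
    det = a * c - b * b                                   # |Sigma|
    inv = [[c / det, -b / det], [-b / det, a / det]]      # Sigma^{-1}
    dx = [x[0] - mu[0], x[1] - mu[1]]
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu)
    quad = (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
            + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))
    d = 2
    return math.exp(-0.5 * quad) / ((2.0 * math.pi) ** (d / 2) * math.sqrt(det))

def gaussian_pdf_1d(x, mu, var):
    """Univariate Gaussian density parameterized by its variance."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

x, mu = [0.3, -1.2], [0.0, 1.0]
joint = gaussian_pdf_2d(x, mu, [[4.0, 0.0], [0.0, 0.25]])
product = gaussian_pdf_1d(0.3, 0.0, 4.0) * gaussian_pdf_1d(-1.2, 1.0, 0.25)
print(joint, product)
```

In higher dimensions one would typically use a linear algebra library and a Cholesky factorization rather than an explicit inverse.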

---

# 3. Standard Multivariate Gaussian — The AI Favorite

This distribution, sometimes rescaled, is the usual prior or noise source in:

- diffusion models  
- VAEs  
- normalizing flows  
- energy-based models  
- weight initialization  

Assume:

- mean \(= 0\)  
- covariance \(= I\)  

Then:

$$
p(x)=\frac{1}{(2\pi)^{d/2}}\;
e^{-\frac{1}{2}\|x\|^{2}}
$$

Where:

$$
\|x\|^{2}=x^{T}x
$$
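
In practice the log of this density is what gets evaluated, for example as \(\log p_Z(z)\) in a flow. A small sketch, with a hypothetical function name and an arbitrary test vector:

```python
import math

def standard_gaussian_log_density(x):
    """log p(x) = -(d/2) log(2*pi) - 0.5 * ||x||^2 for x in R^d."""
    d = len(x)
    squared_norm = sum(v * v for v in x)      # ||x||^2 = x^T x
    return -0.5 * d * math.log(2.0 * math.pi) - 0.5 * squared_norm

x = [0.5, -1.0, 2.0]
log_p = standard_gaussian_log_density(x)
log_p_at_origin = standard_gaussian_log_density([0.0, 0.0, 0.0])
print(log_p, log_p_at_origin)
```

The two values differ by exactly \(\tfrac12\|x\|^2\), since the normalizing constant depends only on the dimension \(d\).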

