# Multivariate Normal (Gaussian) Distribution

The multivariate normal distribution is one of the most important distributions in probability and statistics, with numerous applications in machine learning, signal processing, finance, and more. Below, we discuss the key properties of the multivariate normal distribution.

---

## 1. Definition

A random vector \( \mathbf{X} = [X_1, X_2, \ldots, X_d]^T \in \mathbb{R}^d \) follows a multivariate normal distribution if its probability density function (PDF) is given by:

$$
p(\mathbf{x}) = \frac{1}{(2 \pi)^{d/2} |\Sigma|^{1/2}} \exp\left( - \frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right),
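$$

where \( \boldsymbol{\mu} \in \mathbb{R}^d \) is the mean vector, \( \Sigma \in \mathbb{R}^{d \times d} \) is the covariance matrix (symmetric and positive definite), and \( |\Sigma| \) denotes the determinant of \( \Sigma \). We write \( \mathbf{X} \sim \mathcal{N}(\boldsymbol{\mu}, \Sigma) \).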

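A minimal NumPy sketch of the density, assuming \( \Sigma \) is positive definite. Rather than forming \( \Sigma^{-1} \) and \( |\Sigma| \) explicitly, it works with a Cholesky factor \( \Sigma = L L^T \), a common choice for numerical stability: \( \log|\Sigma| = 2\sum_i \log L_{ii} \), and the quadratic form equals \( \lVert L^{-1}(\mathbf{x} - \boldsymbol{\mu}) \rVert^2 \). The names `mvn_logpdf` and `mvn_pdf` are our own, not a library API.

```python
import numpy as np

def mvn_logpdf(x, mu, Sigma):
    """Log-density of N(mu, Sigma) at a single point x (Sigma positive definite)."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    d = mu.shape[0]
    L = np.linalg.cholesky(Sigma)                 # Sigma = L L^T
    z = np.linalg.solve(L, x - mu)                # z = L^{-1} (x - mu)
    quad = z @ z                                  # (x - mu)^T Sigma^{-1} (x - mu)
    log_det = 2.0 * np.sum(np.log(np.diag(L)))    # log |Sigma|
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + quad)

def mvn_pdf(x, mu, Sigma):
    """Density p(x) from the formula above."""
    return np.exp(mvn_logpdf(x, mu, Sigma))

# With mu = 0 and Sigma = I in d = 2, the density at the mean is 1 / (2 pi), about 0.159.
print(mvn_pdf([0.0, 0.0], [0.0, 0.0], np.eye(2)))
```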
---
## 2. Marginal Distributions

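If \( \mathbf{X} \sim \mathcal{N}(\boldsymbol{\mu}, \Sigma) \), then any subset of the components of \( \mathbf{X} \) is also jointly Gaussian. Specifically, after reordering the components if necessary, partition \( \mathbf{X} = [\mathbf{X}_1^T, \mathbf{X}_2^T]^T \) with \( \mathbf{X}_1 \in \mathbb{R}^{d_1} \), \( \mathbf{X}_2 \in \mathbb{R}^{d_2} \), and \( d_1 + d_2 = d \), and partition the mean and covariance conformably:

$$
\boldsymbol{\mu} = \begin{bmatrix} \boldsymbol{\mu}_1 \\ \boldsymbol{\mu}_2 \end{bmatrix}, \qquad
\Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}, \qquad
\Sigma_{21} = \Sigma_{12}^T.
$$

The marginal distributions are then:

- \( \mathbf{X}_1 \sim \mathcal{N}(\boldsymbol{\mu}_1, \Sigma_{11}) \),
- \( \mathbf{X}_2 \sim \mathcal{N}(\boldsymbol{\mu}_2, \Sigma_{22}) \).

That is, a marginal is obtained by keeping the corresponding entries of \( \boldsymbol{\mu} \) and the corresponding rows and columns of \( \Sigma \); no integration is required.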

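In code, marginalization is an indexing operation. The helper `marginal` below (a name of our own) returns the parameters for any list of component indices; a Monte Carlo check with arbitrary example parameters compares them with the selected columns of joint samples drawn by NumPy's `Generator.multivariate_normal`.

```python
import numpy as np

def marginal(mu, Sigma, idx):
    """Mean and covariance of the components listed in idx."""
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    idx = np.asarray(idx)
    return mu[idx], Sigma[np.ix_(idx, idx)]    # keep entries, rows, and columns in idx

mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.3, 0.5],
                  [0.3, 1.0, 0.2],
                  [0.5, 0.2, 1.5]])

# Marginal of the first and third components: mean (1.0, 0.5), covariance [[2.0, 0.5], [0.5, 1.5]]
mu_13, Sigma_13 = marginal(mu, Sigma, [0, 2])

# Monte Carlo check: sample the full vector, then keep columns 0 and 2
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(mu, Sigma, size=200_000)[:, [0, 2]]
print(mu_13, samples.mean(axis=0))
print(Sigma_13)
print(np.cov(samples, rowvar=False))
```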
---

## 3. Conditional Distributions

The conditional distribution of \( \mathbf{X}_1 \) given \( \mathbf{X}_2 = \mathbf{x}_2 \) is also Gaussian:

$$
\mathbf{X}_1 \mid \mathbf{X}_2 = \mathbf{x}_2 \sim \mathcal{N}(\boldsymbol{\mu}_{1 \mid 2}, \Sigma_{1 \mid 2}),
$$

where (assuming \( \Sigma_{22} \) is invertible, which holds whenever \( \Sigma \) is positive definite):

- Conditional mean:
  $$
  \boldsymbol{\mu}_{1 \mid 2} = \boldsymbol{\mu}_1 + \Sigma_{12} \Sigma_{22}^{-1} (\mathbf{x}_2 - \boldsymbol{\mu}_2),
  $$
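- Conditional covariance (the Schur complement of \( \Sigma_{22} \) in \( \Sigma \), which does not depend on \( \mathbf{x}_2 \)):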
  $$
  \Sigma_{1 \mid 2} = \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21}.
  $$

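A minimal NumPy sketch of the two formulas, assuming \( \Sigma_{22} \) is invertible. It solves a linear system instead of forming \( \Sigma_{22}^{-1} \) explicitly; the function name `condition` and its index-list interface are illustrative choices, not a standard API. In the bivariate example, \( \mu_{1 \mid 2} = 0 + \tfrac{1}{2}(1 - 0) = 0.5 \) and \( \Sigma_{1 \mid 2} = 2 - \tfrac{1 \cdot 1}{2} = 1.5 \).

```python
import numpy as np

def condition(mu, Sigma, idx1, idx2, x2):
    """Mean and covariance of X[idx1] given X[idx2] = x2, for jointly Gaussian X."""
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    S11 = Sigma[np.ix_(idx1, idx1)]
    S12 = Sigma[np.ix_(idx1, idx2)]
    S22 = Sigma[np.ix_(idx2, idx2)]
    # K = Sigma_12 Sigma_22^{-1}, obtained by solving Sigma_22 K^T = Sigma_21
    K = np.linalg.solve(S22, S12.T).T
    mu_cond = mu[idx1] + K @ (np.asarray(x2, dtype=float) - mu[idx2])
    Sigma_cond = S11 - K @ S12.T           # Sigma_11 - Sigma_12 Sigma_22^{-1} Sigma_21
    return mu_cond, Sigma_cond

mu = np.array([0.0, 0.0])
Sigma = np.array([[2.0, 1.0],
                  [1.0, 2.0]])
m, S = condition(mu, Sigma, [0], [1], [1.0])
print(m, S)   # conditional mean 0.5, conditional variance 1.5
```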
---

## 4. Linear Transformations

If \( \mathbf{X} \sim \mathcal{N}(\boldsymbol{\mu}, \Sigma) \), then any linear (more generally, affine) transformation of \( \mathbf{X} \) is also Gaussian. Specifically, for \( A \in \mathbb{R}^{m \times d} \) and \( \mathbf{b} \in \mathbb{R}^m \):

$$
\mathbf{Y} = A \mathbf{X} + \mathbf{b} \sim \mathcal{N}(A \boldsymbol{\mu} + \mathbf{b}, A \Sigma A^T).
$$

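The transformation rule can be checked numerically. The sketch below, with arbitrarily chosen \( \boldsymbol{\mu} \), \( \Sigma \), \( A \), and \( \mathbf{b} \), maps samples of \( \mathbf{X} \) through \( A\mathbf{x} + \mathbf{b} \) and compares the empirical mean and covariance of \( \mathbf{Y} \) with \( A\boldsymbol{\mu} + \mathbf{b} \) and \( A \Sigma A^T \). A Monte Carlo check of this kind only confirms the first two moments; that \( \mathbf{Y} \) is Gaussian follows from the theory.

```python
import numpy as np

rng = np.random.default_rng(1)

mu = np.array([1.0, 2.0, -1.0])
Sigma = np.array([[1.0, 0.4, 0.0],
                  [0.4, 2.0, 0.3],
                  [0.0, 0.3, 0.5]])
A = np.array([[1.0, -1.0, 0.0],
              [0.5, 0.5, 2.0]])          # maps R^3 to R^2 (m = 2, d = 3)
b = np.array([3.0, -1.0])

# Closed-form parameters of Y = A X + b
mu_Y = A @ mu + b
Sigma_Y = A @ Sigma @ A.T

# Empirical check: each sample row x is mapped to A x + b
X = rng.multivariate_normal(mu, Sigma, size=200_000)
Y = X @ A.T + b
print(mu_Y, Y.mean(axis=0))
print(Sigma_Y)
print(np.cov(Y, rowvar=False))
```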
---

## 5. Independence

Two subsets of variables \( \mathbf{X}_1 \) and \( \mathbf{X}_2 \) in a multivariate Gaussian distribution are independent if and only if their cross-covariance matrix is zero:

$$
\Sigma_{12} = \Sigma_{21}^T = 0.
$$

For example, if \( \Sigma \) is diagonal, all components of \( \mathbf{X} \) are independent.

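When \( \Sigma_{12} = 0 \) the covariance is block diagonal, and the joint density factorizes into the product of the two marginal densities, which is what independence means. The sketch below checks this at an arbitrary point with arbitrary parameters, using a small Cholesky-based log-density helper (a name of our own, defined here so the block is self-contained), and shows that a nonzero cross-covariance breaks the factorization.

```python
import numpy as np

def mvn_logpdf(x, mu, Sigma):
    """Log-density of N(mu, Sigma) at x, for positive definite Sigma."""
    L = np.linalg.cholesky(Sigma)
    z = np.linalg.solve(L, np.asarray(x, float) - np.asarray(mu, float))
    d = len(z)
    return -0.5 * (d * np.log(2 * np.pi) + 2 * np.sum(np.log(np.diag(L))) + z @ z)

mu = np.array([0.5, -1.0, 2.0])
S11 = np.array([[1.0, 0.6],
                [0.6, 2.0]])              # covariance of X_1 (first two components)
S22 = np.array([[0.7]])                   # covariance of X_2 (third component)
Sigma = np.block([[S11, np.zeros((2, 1))],
                  [np.zeros((1, 2)), S22]])   # Sigma_12 = 0

x = np.array([1.0, 0.0, 1.5])
joint = mvn_logpdf(x, mu, Sigma)
product = mvn_logpdf(x[:2], mu[:2], S11) + mvn_logpdf(x[2:], mu[2:], S22)
print(joint, product)                     # equal up to rounding: log p(x) = log p(x_1) + log p(x_2)

# With a nonzero cross-covariance the joint log-density no longer splits this way
Sigma_c = Sigma.copy()
Sigma_c[0, 2] = Sigma_c[2, 0] = 0.4
print(mvn_logpdf(x, mu, Sigma_c), product)
```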
---

## 6. Affine Properties

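Because Section 4 allows an arbitrary shift \( \mathbf{b} \), the multivariate normal family is closed under affine transformations, not only linear ones: if \( \mathbf{X} \sim \mathcal{N}(\boldsymbol{\mu}, \Sigma) \), then

$$
\mathbf{Y} = A \mathbf{X} + \mathbf{b} \sim \mathcal{N}(A \boldsymbol{\mu} + \mathbf{b}, A \Sigma A^T).
$$

Conversely, every Gaussian is an affine image of a standard normal vector: if \( \Sigma = L L^T \) for some matrix \( L \) and \( \mathbf{Z} \sim \mathcal{N}(\mathbf{0}, I) \), then \( \boldsymbol{\mu} + L \mathbf{Z} \sim \mathcal{N}(\boldsymbol{\mu}, \Sigma) \). This is the basis of the sampling method in Section 14.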

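As one concrete affine map, the sketch below standardizes (whitens) samples with \( A = L^{-1} \) and \( \mathbf{b} = -L^{-1}\boldsymbol{\mu} \), where \( \Sigma = L L^T \) is a Cholesky factorization (assuming \( \Sigma \) is positive definite; the parameters are arbitrary). By the affine rule the result should have mean \( \mathbf{0} \) and covariance \( L^{-1} \Sigma L^{-T} = L^{-1} L L^T L^{-T} = I \).

```python
import numpy as np

rng = np.random.default_rng(2)
mu = np.array([3.0, -1.0])
Sigma = np.array([[4.0, 1.2],
                  [1.2, 1.0]])
X = rng.multivariate_normal(mu, Sigma, size=100_000)

L = np.linalg.cholesky(Sigma)                 # Sigma = L L^T
# Z = L^{-1} (x - mu) for every sample row; solve instead of inverting L
Z = np.linalg.solve(L, (X - mu).T).T

print(Z.mean(axis=0))                         # close to [0, 0]
print(np.cov(Z, rowvar=False))                # close to the 2x2 identity
```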
---

## 7. Uncorrelated Implies Independence

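If \( \mathbf{X} \) is jointly Gaussian, components (or subvectors) that are uncorrelated are also independent; this is the zero cross-covariance criterion of Section 5. Joint Gaussianity is essential: two variables that are each normally distributed but not jointly Gaussian can be uncorrelated and still dependent. For general distributions, uncorrelatedness does not imply independence.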

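A standard counterexample shows why joint Gaussianity matters. Let \( X \sim \mathcal{N}(0, 1) \) and \( Y = W X \), where \( W = \pm 1 \) with probability \( 1/2 \) each, independent of \( X \). Then \( Y \sim \mathcal{N}(0, 1) \) and \( \operatorname{Cov}(X, Y) = \mathbb{E}[W]\,\mathbb{E}[X^2] = 0 \), yet \( |Y| = |X| \), so the two are dependent; the pair \( (X, Y) \) is not jointly Gaussian. The simulation below illustrates this.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
X = rng.standard_normal(n)
W = rng.choice([-1.0, 1.0], size=n)        # random sign, independent of X
Y = W * X                                  # Y is also N(0, 1)

corr_xy = np.corrcoef(X, Y)[0, 1]          # close to 0: X and Y are uncorrelated
corr_sq = np.corrcoef(X**2, Y**2)[0, 1]    # 1 up to rounding, since X**2 == Y**2
print(corr_xy, corr_sq)
```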
---

## 8. Quadratic Form

The quadratic form \( (\mathbf{x} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) \) appears in the exponent of the PDF. This term represents the squared Mahalanobis distance between \( \mathbf{x} \) and \( \boldsymbol{\mu} \), accounting for the shape of the distribution via the covariance matrix \( \Sigma \).

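A sketch of the squared Mahalanobis distance computed through a Cholesky solve (the function name is our own). As a sanity check it relies on a known fact: for \( \mathbf{x} \sim \mathcal{N}(\boldsymbol{\mu}, \Sigma) \) the squared distance follows a chi-squared distribution with \( d \) degrees of freedom, so its sample average should be close to \( d \). It also shows that two displacements of equal Euclidean length can have different Mahalanobis distances; each equals the corresponding diagonal entry of \( \Sigma^{-1} \).

```python
import numpy as np

def mahalanobis_sq(x, mu, Sigma):
    """Squared Mahalanobis distance(s) from mu of a point or of the rows of x."""
    L = np.linalg.cholesky(Sigma)                                 # Sigma = L L^T
    diff = np.atleast_2d(np.asarray(x, float) - np.asarray(mu, float))
    z = np.linalg.solve(L, diff.T)                                # columns are L^{-1}(x - mu)
    return np.sum(z**2, axis=0)

mu = np.array([0.0, 1.0, -1.0])
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 0.8]])

rng = np.random.default_rng(4)
X = rng.multivariate_normal(mu, Sigma, size=100_000)
d2 = mahalanobis_sq(X, mu, Sigma)
print(d2.mean())   # close to d = 3

# Unit steps along the first and third axes give different distances
print(mahalanobis_sq(mu + [1.0, 0.0, 0.0], mu, Sigma),
      mahalanobis_sq(mu + [0.0, 0.0, 1.0], mu, Sigma))
```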
---

## 9. Level Sets (Contours)

The level sets of the multivariate Gaussian PDF are ellipsoids centered at \( \boldsymbol{\mu} \). These ellipsoids are determined by the eigenvalues and eigenvectors of \( \Sigma \):

- The axes of the ellipsoid align with the eigenvectors of \( \Sigma \).
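- The semi-axis lengths are proportional to the square roots of the eigenvalues of \( \Sigma \): if \( \lambda_i \) and \( \mathbf{v}_i \) are the eigenvalues and eigenvectors of \( \Sigma \), the contour \( (\mathbf{x} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) = c^2 \) has semi-axes of length \( c\sqrt{\lambda_i} \) along \( \mathbf{v}_i \).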

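The axes can be read off a symmetric eigendecomposition. The sketch below (2-D, with an arbitrary \( \Sigma \) and an illustrative contour level \( c \)) computes the semi-axes \( c\sqrt{\lambda_i}\,\mathbf{v}_i \) and checks that each axis endpoint lies on the contour: \( (c\sqrt{\lambda_i}\mathbf{v}_i)^T \Sigma^{-1} (c\sqrt{\lambda_i}\mathbf{v}_i) = c^2 \lambda_i \cdot \lambda_i^{-1} = c^2 \).

```python
import numpy as np

mu = np.array([1.0, -0.5])
Sigma = np.array([[3.0, 1.0],
                  [1.0, 2.0]])
c = 2.0                                      # contour level (Mahalanobis distance c)

lam, V = np.linalg.eigh(Sigma)               # eigenvalues ascending, eigenvectors in columns
semi_axes = c * np.sqrt(lam)                 # semi-axis lengths
Sigma_inv = np.linalg.inv(Sigma)

for length, v in zip(semi_axes, V.T):
    endpoint = mu + length * v               # tip of one semi-axis
    diff = endpoint - mu
    print(length, diff @ Sigma_inv @ diff)   # second value equals c**2 = 4 up to rounding
```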
---

## 10. Entropy

The differential entropy of a multivariate Gaussian distribution is:

$$
H(\mathbf{X}) = \frac{1}{2} \log\left((2 \pi e)^d |\Sigma|\right).
$$

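This measures the uncertainty or "spread" of the distribution. With the natural logarithm, \( H \) is measured in nats. It depends on \( \Sigma \) only through \( |\Sigma| \) and does not depend on \( \boldsymbol{\mu} \), since shifting a distribution does not change its entropy.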

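A sketch that evaluates the closed form with `np.linalg.slogdet` (natural log, so the result is in nats) and compares it with a Monte Carlo estimate of \( -\mathbb{E}[\log p(\mathbf{X})] \), the definition of differential entropy. The parameters are arbitrary and the function name is our own.

```python
import numpy as np

def gaussian_entropy(Sigma):
    """Differential entropy in nats of N(mu, Sigma); it does not depend on mu."""
    Sigma = np.asarray(Sigma, dtype=float)
    d = Sigma.shape[0]
    sign, logdet = np.linalg.slogdet(Sigma)      # log|Sigma| without overflow
    if sign <= 0:
        raise ValueError("determinant of Sigma must be positive")
    return 0.5 * (d * np.log(2 * np.pi * np.e) + logdet)

mu = np.array([0.0, 2.0])
Sigma = np.array([[1.5, 0.4],
                  [0.4, 0.9]])
d = len(mu)

# Monte Carlo estimate of H = -E[log p(X)], with log p from the Section 1 formula
rng = np.random.default_rng(5)
X = rng.multivariate_normal(mu, Sigma, size=200_000)
diff = X - mu
quad = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(Sigma), diff)
logp = -0.5 * (d * np.log(2 * np.pi) + np.linalg.slogdet(Sigma)[1] + quad)

print(gaussian_entropy(Sigma), -logp.mean())     # the two values should be close
```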
---

## 11. Maximum Entropy Property

Among all distributions with a fixed mean vector \( \boldsymbol{\mu} \) and covariance matrix \( \Sigma \), the multivariate Gaussian distribution has the maximum entropy. This makes it the "least informative" distribution consistent with the given constraints.

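A one-dimensional worked check, using standard closed-form entropies (in nats) of three distributions that all have mean 0 and variance 1: the uniform distribution on \( [-\sqrt{3}, \sqrt{3}] \) has entropy \( \log(2\sqrt{3}) \), the Laplace distribution with scale \( b = 1/\sqrt{2} \) (variance \( 2b^2 = 1 \)) has entropy \( 1 + \log(2b) \), and the standard normal has \( \tfrac{1}{2}\log(2\pi e) \). The Gaussian value is the largest, as the property predicts.

```python
import numpy as np

# Entropies (nats) of zero-mean, unit-variance distributions in 1-D
h_uniform = np.log(2 * np.sqrt(3))          # Uniform(-sqrt(3), sqrt(3))
b = 1 / np.sqrt(2)                          # Laplace scale giving variance 2 b^2 = 1
h_laplace = 1 + np.log(2 * b)
h_gauss = 0.5 * np.log(2 * np.pi * np.e)    # N(0, 1)

print(f"uniform {h_uniform:.4f}, laplace {h_laplace:.4f}, gaussian {h_gauss:.4f}")
# uniform 1.2425, laplace 1.3466, gaussian 1.4189
```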
---

## 12. Sum of Independent Gaussians

If \( \mathbf{X}_1 \sim \mathcal{N}(\boldsymbol{\mu}_1, \Sigma_1) \) and \( \mathbf{X}_2 \sim \mathcal{N}(\boldsymbol{\mu}_2, \Sigma_2) \) are independent, then their sum is also Gaussian:

$$
\mathbf{X}_1 + \mathbf{X}_2 \sim \mathcal{N}(\boldsymbol{\mu}_1 + \boldsymbol{\mu}_2, \Sigma_1 + \Sigma_2).
$$

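A Monte Carlo sketch of the rule with arbitrary parameters: independent draws of \( \mathbf{X}_1 \) and \( \mathbf{X}_2 \) are added, and the sample mean and covariance of the sum are compared with \( \boldsymbol{\mu}_1 + \boldsymbol{\mu}_2 \) and \( \Sigma_1 + \Sigma_2 \). Independence is what removes the cross-covariance terms \( \operatorname{Cov}(\mathbf{X}_1, \mathbf{X}_2) \) and \( \operatorname{Cov}(\mathbf{X}_2, \mathbf{X}_1) \) from the covariance of the sum.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200_000
mu1, Sigma1 = np.array([1.0, 0.0]), np.array([[1.0, 0.3], [0.3, 0.5]])
mu2, Sigma2 = np.array([-2.0, 4.0]), np.array([[0.8, -0.2], [-0.2, 1.2]])

X1 = rng.multivariate_normal(mu1, Sigma1, size=n)   # drawn independently of X2
X2 = rng.multivariate_normal(mu2, Sigma2, size=n)
S = X1 + X2

print(S.mean(axis=0), mu1 + mu2)
print(np.cov(S, rowvar=False))
print(Sigma1 + Sigma2)
```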
---

## 13. Degenerate Case

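If the covariance matrix \( \Sigma \) is singular (positive semidefinite but not positive definite), the distribution is degenerate: it is concentrated on the affine subspace \( \boldsymbol{\mu} + \operatorname{range}(\Sigma) \), whose dimension \( \operatorname{rank}(\Sigma) \) is less than \( d \). In this case there is no density with respect to Lebesgue measure on \( \mathbb{R}^d \) (the formula in Section 1 needs \( \Sigma^{-1} \) and divides by \( |\Sigma| = 0 \)), but the distribution can still be described on that subspace using generalized inverses, for example by replacing \( \Sigma^{-1} \) with the Moore-Penrose pseudo-inverse and \( |\Sigma| \) with the product of the nonzero eigenvalues.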

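A sketch of the degenerate case with \( d = 2 \) and \( X_2 = 2X_1 \), so \( \Sigma \) has rank 1. Since \( \Sigma \) is not positive definite, a Cholesky factorization may fail (NumPy's `np.linalg.cholesky` can raise `LinAlgError` here), so this sketch instead factors \( \Sigma = A A^T \) with a symmetric eigendecomposition, treats eigenvalues below a relative cutoff (an arbitrary choice) as exactly zero, and samples \( \mathbf{x} = \boldsymbol{\mu} + A\mathbf{z} \). Every sample lands on the line \( x_2 = 2x_1 \).

```python
import numpy as np

mu = np.zeros(2)
Sigma = np.array([[1.0, 2.0],
                  [2.0, 4.0]])                  # rank 1: X_2 = 2 X_1

# Sigma = A A^T via the eigendecomposition; works for any PSD Sigma
lam, V = np.linalg.eigh(Sigma)
tol = 1e-12 * lam.max()                         # relative cutoff (an arbitrary choice)
lam = np.where(lam > tol, lam, 0.0)             # zero out round-off-level eigenvalues
A = V * np.sqrt(lam)                            # scales column i of V by sqrt(lam_i)

rng = np.random.default_rng(7)
Z = rng.standard_normal((100_000, 2))
X = mu + Z @ A.T                                # each row is mu + A z

print(np.linalg.matrix_rank(Sigma))             # 1
print(np.max(np.abs(X[:, 1] - 2 * X[:, 0])))    # round-off level: samples lie on a line
print(np.cov(X, rowvar=False))                  # close to Sigma
```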
---

## 14. Sampling from a Multivariate Normal Distribution

To sample from \( \mathbf{X} \sim \mathcal{N}(\boldsymbol{\mu}, \Sigma) \):

1. Compute the Cholesky decomposition of \( \Sigma \): \( \Sigma = L L^T \), where \( L \) is a lower triangular matrix (this requires \( \Sigma \) to be positive definite).
2. Generate \( \mathbf{z} \sim \mathcal{N}(0, I) \), a standard normal vector.
3. Compute \( \mathbf{x} = \boldsymbol{\mu} + L \mathbf{z} \).

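By the linear transformation property (Section 4), \( \mathbf{x} \) is Gaussian with mean \( \boldsymbol{\mu} \) and covariance \( L I L^T = \Sigma \), so \( \mathbf{x} \sim \mathcal{N}(\boldsymbol{\mu}, \Sigma) \).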

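The three steps translate directly into NumPy, assuming \( \Sigma \) is positive definite (otherwise `np.linalg.cholesky` raises `LinAlgError`). The function name `sample_mvn` and the example parameters are our own; NumPy's `Generator.multivariate_normal` is a ready-made alternative, which by default factors \( \Sigma \) with an SVD rather than a Cholesky decomposition.

```python
import numpy as np

def sample_mvn(mu, Sigma, n, rng):
    """Draw n samples from N(mu, Sigma) using the Cholesky factor of Sigma."""
    mu = np.asarray(mu, dtype=float)
    L = np.linalg.cholesky(Sigma)                # step 1: Sigma = L L^T, L lower triangular
    z = rng.standard_normal((n, mu.shape[0]))    # step 2: each row is z ~ N(0, I)
    return mu + z @ L.T                          # step 3: each row is mu + L z

rng = np.random.default_rng(8)
mu = np.array([1.0, -1.0, 0.0])
Sigma = np.array([[1.0, 0.8, 0.2],
                  [0.8, 2.0, 0.5],
                  [0.2, 0.5, 0.6]])
x = sample_mvn(mu, Sigma, 200_000, rng)
print(x.mean(axis=0))                 # close to mu
print(np.cov(x, rowvar=False))        # close to Sigma
```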
---

## Summary of Key Properties

1. Marginals are Gaussian.  
2. Conditionals are Gaussian.  
3. Linear transformations preserve Gaussianity.  
4. Uncorrelated implies independence.  
5. Level sets are ellipsoids.  
6. Maximum entropy property.  
7. Closed under addition of independent Gaussians.  
8. Degenerate case exists when \( \Sigma \) is singular.  
