```python
import numpy as np
import matplotlib.pyplot as plt
```

Much of this chapter is familiar, so I'm going to skip around quite a bit.

## The Multivariate Gaussian

For a $D$-dimensional vector $\mathbf{x}$, the multivariate Gaussian distribution takes the form:
$$\mathcal{N}(\mathbf{x}|\mathbf{\mu}, \mathbf{\Sigma}) = \frac{1}{(2\pi)^{D/2}}\frac{1}{|\mathbf{\Sigma}|^{1/2}} \exp \bigg\{-\frac{1}{2}(\mathbf{x} - \mathbf{\mu})^\intercal \mathbf{\Sigma}^{-1} (\mathbf{x} - \mathbf{\mu}) \bigg\}$$
where $\mathbf{\mu}$ is the $D$-dimensional mean vector, $\mathbf{\Sigma}$ is the $D \times D$ covariance matrix, and $|\mathbf{\Sigma}|$ denotes the determinant of $\mathbf{\Sigma}$.
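To make the density concrete, here is a minimal NumPy sketch that evaluates $\mathcal{N}(\mathbf{x}|\mathbf{\mu}, \mathbf{\Sigma})$ term by term from the formula above. The function name `gaussian_pdf` is my own, and it assumes $\mathbf{\Sigma}$ is symmetric positive definite; for real work `scipy.stats.multivariate_normal` covers the same ground more robustly.

```python
import numpy as np

def gaussian_pdf(x, mu, Sigma):
    """Evaluate N(x | mu, Sigma) for a single D-dimensional point x.

    Sketch only: assumes Sigma is symmetric positive definite.
    """
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    D = x.shape[0]
    diff = x - mu
    # Quadratic term Delta^2; solve Sigma @ z = diff instead of forming Sigma^{-1}
    delta_sq = diff @ np.linalg.solve(Sigma, diff)
    # Normalising constant 1 / ((2 pi)^{D/2} |Sigma|^{1/2})
    norm_const = 1.0 / ((2 * np.pi) ** (D / 2) * np.sqrt(np.linalg.det(Sigma)))
    return norm_const * np.exp(-0.5 * delta_sq)

# 1-D check: the standard normal density at 0 is 1/sqrt(2*pi), about 0.3989
print(gaussian_pdf([0.0], [0.0], [[1.0]]))
# 2-D check with Sigma = I: the density at the mean is 1/(2*pi), about 0.1592
print(gaussian_pdf([0.0, 0.0], [0.0, 0.0], np.eye(2)))
```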

***Central Limit Theorem:*** The sum of a set of random variables (which is itself a random variable) has a distribution that becomes increasingly Gaussian as the number of terms in the sum increases.
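- Subject to certain mild conditions (for example, independent, identically distributed terms with finite variance), this holds regardless of the distribution(s) from which the individual random variables are drawn.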
- One consequence is that the binomial distribution, being the distribution of a sum of $N$ binary (Bernoulli) random variables, tends towards a Gaussian as $N \rightarrow \infty$.
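A quick simulation makes the theorem tangible. This sketch sums $N$ independent uniform variables and checks the result against Gaussian benchmarks, then measures how far the binomial pmf is from a Gaussian density with the same mean and variance as $N$ grows. The sample sizes, seed, $p = 0.3$, and the helper `max_binomial_gaussian_gap` are arbitrary choices for illustration.

```python
import numpy as np
from math import comb

rng = np.random.default_rng(0)

# Sum N independent Uniform(0, 1) variables, many times over.
# Each term has mean 1/2 and variance 1/12, so the sum has mean N/2 and variance N/12.
N, n_samples = 10, 100_000
sums = rng.uniform(0.0, 1.0, size=(n_samples, N)).sum(axis=1)
print(f"sample mean {sums.mean():.3f} (theory {N / 2})")
print(f"sample var  {sums.var():.3f} (theory {N / 12:.3f})")

# For a Gaussian, about 68.3% of the mass lies within one standard deviation of the mean.
std = np.sqrt(N / 12)
frac_within_1sd = np.mean(np.abs(sums - N / 2) < std)
print(f"fraction within 1 sd: {frac_within_1sd:.3f}")

def max_binomial_gaussian_gap(n_trials, p=0.3):
    """Largest absolute gap between the Binomial(n_trials, p) pmf and a Gaussian
    density with matching mean n*p and variance n*p*(1-p), over m = 0..n."""
    m = np.arange(n_trials + 1)
    pmf = np.array([comb(n_trials, k) * p**k * (1 - p) ** (n_trials - k) for k in m])
    mean, var = n_trials * p, n_trials * p * (1 - p)
    gauss = np.exp(-((m - mean) ** 2) / (2 * var)) / np.sqrt(2 * np.pi * var)
    return np.max(np.abs(pmf - gauss))

for n_trials in (5, 50, 500):
    print(n_trials, max_binomial_gaussian_gap(n_trials))
```

The gap should shrink as the number of trials grows, which is the binomial case of the theorem.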

### Geometry of the Gaussian

The functional dependence of the Gaussian on $\mathbf{x}$ is through the quadratic term in its exponent: $$\Delta^2 = (\mathbf{x} - \mathbf{\mu})^\intercal \mathbf{\Sigma}^{-1} (\mathbf{x} - \mathbf{\mu})$$

The quantity $\Delta$ (the square root of this term) is called the ***Mahalanobis distance*** from $\mathbf{\mu}$ to $\mathbf{x}$. It reduces to the Euclidean distance when $\mathbf{\Sigma}$ is the identity matrix.

Here's a quick demo showing that, when $\mathbf{\Sigma}$ is the identity matrix, $\Delta^2$ equals the squared Euclidean distance. Recall the formula for Euclidean distance:
$$d(\mathbf{x}, \mathbf{y}) = \sqrt{\sum(x_i - y_i)^2}$$
This is the square root of the sum of the elementwise squares of $(\mathbf{x} - \mathbf{y})$, or equivalently, the $L_2$ norm of that vector.

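```python
# Two random 10-dimensional integer vectors
x = np.random.randint(0, 10, 10)
y = np.random.randint(0, 10, 10)

# Squared Euclidean distance (take the square root, then square it back)
dist_1 = np.sqrt(((x - y)**2).sum())**2

# Squared Mahalanobis distance with Sigma = I
dist_2 = (x - y).T @ np.linalg.inv(np.identity(10)) @ (x - y)

# Inner product of the difference vector with itself (.T is a no-op on 1-D arrays)
dist_3 = (x - y).T @ (x - y)

print(f"d1: {dist_1}, d2: {dist_2}, d3: {dist_3}")
```

```
d1: 196.0, d2: 196.0, d3: 196
```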


Note that $(\mathbf{x} - \mathbf{\mu})^\intercal (\mathbf{x} - \mathbf{\mu})$ is simply the sum of the squared elements of $(\mathbf{x} - \mathbf{\mu})$, and the inverse of the identity matrix is the identity matrix itself, so with $\mathbf{\Sigma} = \mathbf{I}$ we get $\Delta^2 = \sum_i (x_i - \mu_i)^2$, the squared Euclidean distance.
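The identity case hides what makes the Mahalanobis distance useful: with a non-identity $\mathbf{\Sigma}$, distance is measured in units of standard deviation along each direction. Here is a small sketch with an assumed diagonal covariance; the helper `mahalanobis_sq` is hypothetical, not a library function.

```python
import numpy as np

def mahalanobis_sq(x, mu, Sigma):
    """Squared Mahalanobis distance Delta^2 = (x - mu)^T Sigma^{-1} (x - mu)."""
    diff = np.asarray(x, dtype=float) - np.asarray(mu, dtype=float)
    # np.linalg.solve avoids explicitly inverting Sigma
    return float(diff @ np.linalg.solve(Sigma, diff))

mu = np.zeros(2)
Sigma = np.diag([4.0, 1.0])  # std 2 along the first axis, std 1 along the second

# Both points are at Euclidean distance 2 from mu ...
a = np.array([2.0, 0.0])
b = np.array([0.0, 2.0])
print(np.linalg.norm(a - mu), np.linalg.norm(b - mu))  # 2.0 2.0
# ... but b is two standard deviations away, while a is only one
print(mahalanobis_sq(a, mu, Sigma), mahalanobis_sq(b, mu, Sigma))  # 1.0 4.0
```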

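Note also that $\mathbf{\Sigma}$ can be taken to be symmetric without loss of generality: any antisymmetric component of $\mathbf{\Sigma}^{-1}$ would cancel out of the quadratic form in the exponent (and $\mathbf{\Sigma}$ is symmetric exactly when $\mathbf{\Sigma}^{-1}$ is).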
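A quick numerical check of why symmetry costs nothing: any square matrix $\mathbf{M}$ splits into a symmetric part $\tfrac{1}{2}(\mathbf{M} + \mathbf{M}^\intercal)$ and an antisymmetric part $\tfrac{1}{2}(\mathbf{M} - \mathbf{M}^\intercal)$, and the antisymmetric part contributes nothing to a quadratic form $\mathbf{z}^\intercal \mathbf{M} \mathbf{z}$. In this sketch a random $\mathbf{M}$ stands in for the matrix in the exponent and a random $\mathbf{z}$ stands in for $(\mathbf{x} - \mathbf{\mu})$; both are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
D = 4
M = rng.normal(size=(D, D))  # arbitrary, non-symmetric matrix (stand-in for Sigma^{-1})
S = 0.5 * (M + M.T)          # symmetric part
A = 0.5 * (M - M.T)          # antisymmetric part: A.T == -A
z = rng.normal(size=D)       # stand-in for (x - mu)

# z^T A z is a scalar equal to its own transpose z^T A^T z = -z^T A z, so it is 0
print(z @ A @ z)                          # ~0 up to floating-point round-off
print(np.isclose(z @ M @ z, z @ S @ z))   # True: only the symmetric part matters
```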