https://mmuratarat.github.io/2019-10-05/univariate-multivariate_gaussian
https://gregorygundersen.com/blog/2020/12/29/multivariate-skew-normal/
http://www.statsathome.com/2018/10/27/sampling-from-the-singular-normal/#fnref2
https://rinterested.github.io/statistics/multivariate_gaussian.html
https://cocalc.com/share/public_paths/7557a5ac1c870f1ec8f01271959b16b49df9d087/05-Multivariate-Gaussians.ipynb

https://github.com/peterroelants/peterroelants.github.io/blob/main/notebooks/misc/multivariate-normal-primer.ipynb
https://gestalt.ink/gaussians#sources-and-other-materials

The key to understanding the function is to see how each line of code maps onto a mathematical operation, in particular how it applies $\Sigma^{-1}$ in a numerically stable way without forming the inverse explicitly.

In the multivariate normal (Gaussian) distribution, the probability density function (PDF) is given by:

$$
f(x) = \frac{1}{\sqrt{(2\pi)^d |\Sigma|}} \exp\left( -\frac{1}{2} (x - \mu)^\top \Sigma^{-1} (x - \mu) \right)
$$

Here, $(x - \mu)^\top \Sigma^{-1} (x - \mu)$ is the squared Mahalanobis distance, which measures how far the point $x$ lies from the mean $\mu$ relative to the spread described by the covariance matrix $\Sigma$.
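
As a quick sanity check on the formula, consider the one-dimensional case. Writing $\Sigma = \sigma^2$ (a symbol introduced here only for illustration) and $d = 1$, the expression should reduce to the familiar univariate normal density:

$$
f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)
$$

In this case the quadratic form in the exponent is just $\left( (x - \mu)/\sigma \right)^2$, the squared number of standard deviations between $x$ and $\mu$.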

In your Python function, you have:

```python
x_m = x - mean
exponent = np.linalg.solve(covariance, x_m).T.dot(x_m)
result = (1. / (np.sqrt((2 * np.pi)**d * np.linalg.det(covariance)))
          * np.exp(-exponent / 2))
```

Here's the breakdown:

1. **Computing $(x - \mu)$:**
   ```python
   x_m = x - mean
   ```
   This computes the difference vector $x - \mu$.

2. **Computing $\Sigma^{-1} (x - \mu)$ using `np.linalg.solve`:**
   ```python
   inv_cov_x_m = np.linalg.solve(covariance, x_m)
   ```
   The `np.linalg.solve` function solves the linear system $\Sigma \cdot y = (x - \mu)$ for $y$, which effectively computes $y = \Sigma^{-1} (x - \mu)$ without explicitly calculating the inverse of $\Sigma$. This approach is numerically more stable and efficient than computing the inverse directly.

3. **Computing $(x - \mu)^\top \Sigma^{-1} (x - \mu)$:**
   ```python
   exponent = inv_cov_x_m.T.dot(x_m)
   ```
   This computes the dot product $[\Sigma^{-1} (x - \mu)]^\top (x - \mu)$. Because $\Sigma$ is symmetric, so is $\Sigma^{-1}$, and the expression equals $(x - \mu)^\top \Sigma^{-1} (x - \mu)$. For a 1-D NumPy array `.T` is a no-op, so the line is simply the dot product of `inv_cov_x_m` with `x_m`.

So although it might look different at first glance, the code computes the quadratic form $(x - \mu)^\top \Sigma^{-1} (x - \mu)$, the squared Mahalanobis distance, using `np.linalg.solve` for numerical stability.
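
The equivalence can be checked numerically. The sketch below uses made-up random inputs (the seed, dimension, and the way the covariance is built are assumptions for illustration) and compares the `solve`-based quadratic form with one computed from an explicit inverse:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4

# Build a random symmetric positive-definite covariance (illustrative data).
A = rng.standard_normal((d, d))
covariance = A @ A.T + d * np.eye(d)
mean = rng.standard_normal(d)
x = rng.standard_normal(d)

x_m = x - mean

# Quadratic form via a linear solve, as in the function being discussed.
quad_solve = np.linalg.solve(covariance, x_m).T.dot(x_m)

# Same quantity via an explicit inverse, for comparison only.
quad_inv = x_m @ np.linalg.inv(covariance) @ x_m

print(quad_solve, quad_inv)
```

For a well-conditioned matrix like this one, the two values should agree to within floating-point rounding.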

**Why use `np.linalg.solve` instead of directly computing the inverse?**

Computing the inverse of a matrix explicitly (especially large or ill-conditioned matrices) can lead to numerical inaccuracies and is computationally inefficient. `np.linalg.solve` avoids these issues by solving the system of equations directly, which is more stable and usually faster.
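
To get a feel for the difference, one can compare residuals on a deliberately ill-conditioned system. The sketch below uses a Hilbert matrix as a stand-in for a badly conditioned covariance; the matrix choice, the size `n = 8`, and the `relative_residual` helper are assumptions made for illustration:

```python
import numpy as np

n = 8
i = np.arange(n)
# Hilbert matrix: symmetric positive-definite but badly conditioned.
H = 1.0 / (i[:, None] + i[None, :] + 1.0)
b = H @ np.ones(n)

y_solve = np.linalg.solve(H, b)   # linear solve
y_inv = np.linalg.inv(H) @ b      # explicit inverse, then multiply


def relative_residual(A, y, rhs):
    """Backward-error style measure: ||A y - rhs|| / (||A|| ||y||)."""
    return np.linalg.norm(A @ y - rhs) / (np.linalg.norm(A, 2) * np.linalg.norm(y))


res_solve = relative_residual(H, y_solve, b)
res_inv = relative_residual(H, y_inv, b)

print(f"cond(H)        = {np.linalg.cond(H):.2e}")
print(f"solve residual = {res_solve:.2e}")
print(f"inv residual   = {res_inv:.2e}")
```

The residual from `np.linalg.solve` should sit near machine precision. The residual from the explicit inverse is typically larger on a problem like this, although the exact figures depend on the underlying LAPACK build.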

**Conclusion:**

The function computes the multivariate normal PDF correctly. `np.linalg.solve(covariance, x_m)` returns $\Sigma^{-1}(x - \mu)$, so the dot product with `x_m` gives the quadratic form $(x - \mu)^\top \Sigma^{-1} (x - \mu)$ (the squared Mahalanobis distance), and `np.exp(-exponent / 2)` supplies the factor $-\frac{1}{2}$ from the standard formula. The only difference from a literal transcription of the formula is that the inverse is applied through a linear solve, which is the more numerically stable choice.

Computing the determinant of a covariance matrix can be numerically challenging for high-dimensional data or ill-conditioned matrices. The value returned by `np.linalg.det` can underflow to zero or overflow to infinity in floating point, even when the logarithm of the determinant is perfectly representable. For a stable computation in the context of the multivariate Gaussian, compute the log-determinant from the **Cholesky decomposition** or the **Singular Value Decomposition (SVD)**.
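
A small experiment makes the problem concrete. The covariance below (a scaled identity in 200 dimensions) is an assumed toy example, and `np.linalg.slogdet` is used here only as a convenient reference for the log-determinant:

```python
import numpy as np

d = 200
covariance = 0.01 * np.eye(d)   # true determinant is 0.01**200 = 1e-400

# The determinant itself is too small for a float64 and underflows.
det_direct = np.linalg.det(covariance)

# The log-determinant is an ordinary number (200 * log(0.01)).
sign, log_det = np.linalg.slogdet(covariance)

print(det_direct)
print(sign, log_det)
```

Plugging `det_direct` into the PDF formula would divide by zero, whereas the log-determinant can be used directly in a log-density.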

Here's how you can proceed:

---

### **1. Use Cholesky Decomposition**

#### **Why Cholesky Decomposition?**

- **Suitability**: The covariance matrix $\Sigma$ is symmetric and positive-definite (assuming the data isn't degenerate), which makes it ideal for Cholesky decomposition.
- **Numerical Stability**: Cholesky decomposition is numerically stable and efficient for positive-definite matrices.
- **Efficient Determinant Computation**: The determinant of $\Sigma$ can be easily computed from the Cholesky factor.

#### **How to Compute the Determinant Using Cholesky Decomposition**

The Cholesky decomposition factorizes $\Sigma$ into:

$$
\Sigma = LL^\top
$$

where $L$ is a lower triangular matrix.

**Steps:**

1. **Compute the Cholesky Decomposition:**

   ```python
   L = np.linalg.cholesky(covariance_matrix)
   ```
   
2. **Compute the Log Determinant:**

   The determinant of $\Sigma$ is the square of the product of the diagonal elements of $L$:

   $$
   |\Sigma| = \left( \prod_{i=1}^{d} L_{ii} \right)^2
   $$

   To avoid numerical underflow/overflow, compute the logarithm of the determinant:

   ```python
   log_det_cov = 2 * np.sum(np.log(np.diag(L)))
   ```
   
3. **Use the Log Determinant in the PDF:**

   When computing the PDF, it is often better to work in the log domain to maintain numerical stability:

   ```python
   exponent = -0.5 * (x_m.T @ np.linalg.solve(covariance_matrix, x_m))
   log_pdf = -0.5 * d * np.log(2 * np.pi) - 0.5 * log_det_cov + exponent
   pdf = np.exp(log_pdf)
   ```
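
The three steps can be run end to end on a small example. The mean, covariance, and evaluation point below are assumed values chosen for illustration; the final lines cross-check the result against the textbook formula with an explicit determinant, which is safe here because the matrix is tiny and well conditioned:

```python
import numpy as np

# Illustrative inputs (assumed for this sketch).
mean = np.array([1.0, -1.0])
covariance_matrix = np.array([[2.0, 0.6],
                              [0.6, 1.0]])
x = np.array([0.5, 0.0])
d = x.shape[0]
x_m = x - mean

# Step 1: Cholesky factor.
L = np.linalg.cholesky(covariance_matrix)

# Step 2: log-determinant from the diagonal of L.
log_det_cov = 2 * np.sum(np.log(np.diag(L)))

# Step 3: log-density, then exponentiate.
exponent = -0.5 * (x_m.T @ np.linalg.solve(covariance_matrix, x_m))
log_pdf = -0.5 * d * np.log(2 * np.pi) - 0.5 * log_det_cov + exponent
pdf = np.exp(log_pdf)

# Cross-check against the direct formula.
det = np.linalg.det(covariance_matrix)
pdf_direct = np.exp(exponent) / np.sqrt((2 * np.pi) ** d * det)

print(pdf, pdf_direct)
```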

#### **Complete Function Example with Cholesky Decomposition**

```python
import numpy as np

def multivariate_normal_pdf(x, mean, covariance):
    """Compute the PDF of the multivariate normal distribution."""
    x_m = x - mean
    d = x.shape[0]  # Dimensionality

    # Compute the Cholesky decomposition
    L = np.linalg.cholesky(covariance)

    # Solve L y = x_m, so that y = L^{-1} (x - mean)
    y = np.linalg.solve(L, x_m)

    # Squared Mahalanobis distance: y . y = x_m^T Sigma^{-1} x_m
    mahalanobis_distance = np.dot(y, y)

    # Compute log determinant
    log_det_cov = 2 * np.sum(np.log(np.diag(L)))

    # Compute the log of the PDF
    log_pdf = -0.5 * (d * np.log(2 * np.pi) + log_det_cov + mahalanobis_distance)

    # If needed, exponentiate to get the PDF
    pdf = np.exp(log_pdf)
    return pdf
```
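
One way to gain confidence in a function like this is to test it on a case with a known answer. With a diagonal covariance the components are independent, so the joint density must equal the product of univariate normal densities. The sketch below restates the function compactly so that it runs on its own; the names `mvn_pdf` and `univariate_pdf` and the test values are assumptions for illustration:

```python
import numpy as np


def mvn_pdf(x, mean, covariance):
    """Compact restatement of the Cholesky-based PDF, for a self-contained check."""
    x_m = x - mean
    d = x.shape[0]
    L = np.linalg.cholesky(covariance)
    y = np.linalg.solve(L, x_m)
    log_det_cov = 2 * np.sum(np.log(np.diag(L)))
    return np.exp(-0.5 * (d * np.log(2 * np.pi) + log_det_cov + y @ y))


def univariate_pdf(x, mu, var):
    """Density of a 1-D normal with mean mu and variance var."""
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)


mean = np.array([0.0, 1.0, -2.0])
variances = np.array([1.0, 4.0, 0.25])
x = np.array([0.3, 0.0, -1.5])

joint = mvn_pdf(x, mean, np.diag(variances))
product = np.prod(univariate_pdf(x, mean, variances))

print(joint, product)
```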

---

### **2. Use Singular Value Decomposition (SVD) (Alternative Method)**

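If Cholesky fails because the covariance matrix is only positive semi-definite or is nearly singular, SVD offers an alternative. Note that for an exactly singular covariance matrix the determinant is zero and the density above is not defined on the full $d$-dimensional space, so the SVD route is mainly useful for near-singular or regularized matrices.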

#### **How to Compute the Determinant Using SVD**

The SVD of $\Sigma$ is:

$$
\Sigma = U \Sigma_{\text{diag}} V^\top
$$

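where $\Sigma_{\text{diag}}$ is a diagonal matrix containing the singular values. Because a covariance matrix is symmetric positive semi-definite, the factors can be chosen with $U = V$, and the singular values coincide with the eigenvalues.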

**Steps:**

1. **Compute the SVD:**

   ```python
   U, S, Vt = np.linalg.svd(covariance_matrix)
   ```
   
2. **Compute the Log Determinant:**

   For a symmetric positive semi-definite matrix, the determinant is the product of the singular values:

   $$
   |\Sigma| = \prod_{i=1}^{d} S_i
   $$

   Compute the log determinant:

   ```python
   log_det_cov = np.sum(np.log(S))
   ```

3. **Use the Log Determinant in the PDF (Similar to the Cholesky method).**
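
The claims in this section can be checked on a random symmetric positive-definite matrix. The matrix construction and seed below are assumptions for illustration, and `np.linalg.slogdet` serves only as a reference value:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
covariance_matrix = A @ A.T + 0.5 * np.eye(5)   # illustrative SPD matrix

U, S, Vt = np.linalg.svd(covariance_matrix)
eigenvalues = np.linalg.eigvalsh(covariance_matrix)   # returned in ascending order

# Log-determinant from the singular values, and a reference value.
log_det_svd = np.sum(np.log(S))
sign, log_det_ref = np.linalg.slogdet(covariance_matrix)

# For an SPD matrix the singular values are the eigenvalues.
print(np.allclose(np.sort(S), eigenvalues))
print(log_det_svd, log_det_ref)
```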

---

### **3. General Tips for Numerical Stability**

- **Avoid Explicit Matrix Inversion**: Instead of computing $\Sigma^{-1}$ directly, solve the linear system $\Sigma y = x - \mu$ with `np.linalg.solve`, or factor $\Sigma$ with `np.linalg.cholesky` and solve against the triangular factor, to obtain the quadratic form in the exponent.

- **Work in the Log Domain**: When dealing with products of probabilities or very small numbers, compute logarithms to prevent underflow/overflow.

- **Check for Positive-Definiteness**: Ensure that the covariance matrix is positive-definite before attempting Cholesky decomposition. If it's not, consider adding a small multiple of the identity matrix (regularization):

  ```python
  epsilon = 1e-10  # or a value appropriate for your problem
  # Create a new array; += would modify the original matrix in place
  covariance_matrix = covariance_matrix + epsilon * np.eye(covariance_matrix.shape[0])
  ```

- **Handling Singular Covariance Matrices**: If the covariance matrix is singular or near-singular, consider using techniques like **Principal Component Analysis (PCA)** to reduce dimensionality or regularize the covariance matrix.
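
The positive-definiteness check and the regularization tip can be combined into one small routine. The sketch below is a hypothetical helper (`safe_cholesky`, its parameters, and the tenfold escalation are all assumptions, not part of NumPy) that retries the factorization with a growing diagonal jitter and never modifies its input:

```python
import numpy as np


def safe_cholesky(covariance, jitter=1e-10, max_tries=8):
    """Hypothetical helper: retry Cholesky with growing diagonal jitter.

    Returns the factor and the jitter that was finally added (0.0 if none).
    The input array is left untouched.
    """
    covariance = np.asarray(covariance, dtype=float)
    identity = np.eye(covariance.shape[0])
    added = 0.0
    for _ in range(max_tries):
        try:
            return np.linalg.cholesky(covariance + added * identity), added
        except np.linalg.LinAlgError:
            # Not numerically positive-definite: increase the jitter and retry.
            added = jitter if added == 0.0 else added * 10.0
    raise np.linalg.LinAlgError("covariance could not be regularized")


# Rank-1 covariance: positive semi-definite but singular.
singular = np.ones((2, 2))
L, added = safe_cholesky(singular)

print("jitter added:", added)
print(np.allclose(L @ L.T, singular, atol=1e-6))
```

Starting from a tiny jitter and escalating keeps the perturbation as small as the matrix allows, at the cost of a few extra factorization attempts in the failing case.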

---

### **Example Code with Safety Checks**

Here's an enhanced version of the multivariate normal PDF function with checks and comments:

```python
import numpy as np

def multivariate_normal_pdf(x, mean, covariance):
    """Compute the PDF of the multivariate normal distribution."""
    x = np.atleast_1d(x)
    mean = np.atleast_1d(mean)
    x_m = x - mean
    d = x.shape[0]  # Dimensionality

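    # Regularization to prevent numerical issues (if necessary).
    # Build a new array rather than using +=, which would modify the caller's matrix.
    epsilon = 1e-10
    covariance = np.asarray(covariance, dtype=float) + epsilon * np.eye(d)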

    # Try Cholesky decomposition
    try:
        L = np.linalg.cholesky(covariance)
        # Solve for y in L * y = x_m
        y = np.linalg.solve(L, x_m)
        mahalanobis_distance = np.dot(y, y)
        log_det_cov = 2 * np.sum(np.log(np.diag(L)))
    except np.linalg.LinAlgError:
        # Use SVD if Cholesky fails
        U, S, Vt = np.linalg.svd(covariance)
        # Compute the pseudo-inverse
        covariance_inv = Vt.T @ np.diag(1.0 / S) @ U.T
        mahalanobis_distance = x_m.T @ covariance_inv @ x_m
        log_det_cov = np.sum(np.log(S + epsilon))

    # Compute log PDF
    log_pdf = -0.5 * (d * np.log(2 * np.pi) + log_det_cov + mahalanobis_distance)
    pdf = np.exp(log_pdf)
    return pdf
```

---

### **Understanding the Computation**

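- **Mahalanobis Distance**: The quadratic form $(x - \mu)^\top \Sigma^{-1} (x - \mu)$ is the squared Mahalanobis distance. Its square root generalizes the z-score: it measures how many standard deviations $x$ lies from the mean, taking the covariance into account.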

- **Log Determinant**: Using the log determinant avoids numerical issues with very small or large determinants and is essential in high-dimensional spaces.

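- **Regularization**: Adding a small $\epsilon$ times the identity matrix shifts every eigenvalue of the covariance matrix up by $\epsilon$, which makes a positive semi-definite matrix positive-definite and helps the Cholesky decomposition succeed. If $\epsilon$ is too small relative to rounding error, the factorization can still fail, so the value may need tuning.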
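
The "standard deviations from the mean" reading of the Mahalanobis distance is easiest to see in one dimension. The helper below is an illustrative sketch (the name `squared_mahalanobis` and the inputs are assumptions); it whitens the residual with the Cholesky factor and returns the squared distance:

```python
import numpy as np


def squared_mahalanobis(x, mean, covariance):
    """Squared Mahalanobis distance via the Cholesky factor (illustrative helper)."""
    x_m = np.atleast_1d(x) - np.atleast_1d(mean)
    L = np.linalg.cholesky(np.atleast_2d(covariance))
    y = np.linalg.solve(L, x_m)   # whitened residual, y = L^{-1} (x - mean)
    return y @ y


# 1-D case: mean 1, variance 4 (standard deviation 2), evaluated at x = 5.
# The point is (5 - 1) / 2 = 2 standard deviations from the mean.
d2 = squared_mahalanobis(5.0, 1.0, 4.0)
print(np.sqrt(d2))
```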

---


By using Cholesky decomposition or SVD and computing in the log domain, you can effectively and stably compute the determinant of the covariance matrix for the multivariate normal distribution.

### **Conditional Normal**

To solve the problem, we need to find the conditional probability density function (PDF) of $Y = (X_1, X_2, \dots, X_{d-1})$ given that $X_d = c$, where $X$ is a $d$-dimensional Gaussian random vector with mean vector $\mu$ and covariance matrix $\Sigma$.

**1. Partition the Mean Vector and Covariance Matrix:**

First, partition the mean vector $\mu$ and the covariance matrix $\Sigma$ as follows:

- Mean vector:
  $$
  \mu = \begin{bmatrix} \mu_Y \\ \mu_d \end{bmatrix}
  $$
  where $\mu_Y$ is a $(d-1)$-dimensional vector (mean of $Y$) and $\mu_d$ is a scalar (mean of $X_d$).

- Covariance matrix:
  $$
  \Sigma = \begin{bmatrix} \Sigma_{YY} & \Sigma_{Yd} \\ \Sigma_{dY} & \Sigma_{dd} \end{bmatrix}
  $$
  where:
  - $\Sigma_{YY}$ is the $(d-1) \times (d-1)$ covariance matrix of $Y$.
  - $\Sigma_{Yd}$ is the $(d-1) \times 1$ covariance vector between $Y$ and $X_d$.
  - $\Sigma_{dd}$ is the scalar variance of $X_d$.

**2. Use the Conditional Distribution Formula for Multivariate Gaussians:**

The conditional distribution of $Y$ given $X_d = c$ is also Gaussian, with:

- **Conditional Mean:**
  $$
  \mu_{Y|d} = \mu_Y + \Sigma_{Yd} \Sigma_{dd}^{-1} (c - \mu_d)
  $$
- **Conditional Covariance:**
  $$
  \Sigma_{Y|d} = \Sigma_{YY} - \Sigma_{Yd} \Sigma_{dd}^{-1} \Sigma_{dY}
  $$
  Note that $\Sigma_{dY} = \Sigma_{Yd}^\top$.
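
These two formulas translate almost line for line into NumPy. The sketch below assumes the conditioning variable is the last component, as in the partition above; the function name `condition_on_last` and the bivariate test values are illustrative:

```python
import numpy as np


def condition_on_last(mean, covariance, c):
    """Parameters of Y | X_d = c, where Y is every component except the last.

    Function and argument names are illustrative.
    """
    mean = np.asarray(mean, dtype=float)
    covariance = np.asarray(covariance, dtype=float)
    mu_Y, mu_d = mean[:-1], mean[-1]
    S_YY = covariance[:-1, :-1]
    S_Yd = covariance[:-1, -1]   # shape (d-1,)
    S_dd = covariance[-1, -1]    # scalar variance of X_d

    cond_mean = mu_Y + S_Yd * (c - mu_d) / S_dd
    cond_cov = S_YY - np.outer(S_Yd, S_Yd) / S_dd
    return cond_mean, cond_cov


# Standard bivariate normal with correlation rho, conditioned on X_2 = 2.
rho = 0.5
m, C = condition_on_last([0.0, 0.0], [[1.0, rho], [rho, 1.0]], c=2.0)
print(m, C)
```

For this bivariate case the result should match the well-known closed form: conditional mean $\rho c$ and conditional variance $1 - \rho^2$.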

**3. Write the Conditional PDF:**

The conditional PDF of $Y$ given $X_d = c$ is:

$$
f_{Y \mid X_d}(y \mid c) = \frac{1}{(2\pi)^{(d-1)/2} \left| \Sigma_{Y|d} \right|^{1/2}} \exp\left( -\frac{1}{2} (y - \mu_{Y|d})^\top \Sigma_{Y|d}^{-1} (y - \mu_{Y|d}) \right)
$$
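
A useful check on the conditional PDF is the defining identity: the conditional density equals the joint density divided by the marginal density of $X_d$. The sketch below verifies this in the log domain for an assumed 3-dimensional example; the helper `log_mvn_pdf` and all numerical values are illustrative:

```python
import numpy as np


def log_mvn_pdf(x, mean, covariance):
    """Log-density of N(mean, covariance) at x, via Cholesky (illustrative helper)."""
    x_m = np.atleast_1d(x) - np.atleast_1d(mean)
    L = np.linalg.cholesky(np.atleast_2d(covariance))
    y = np.linalg.solve(L, x_m)
    d = x_m.shape[0]
    return -0.5 * (d * np.log(2 * np.pi) + 2 * np.sum(np.log(np.diag(L))) + y @ y)


# Illustrative 3-D Gaussian; the last coordinate plays the role of X_d.
mu = np.array([1.0, -0.5, 2.0])
Sigma = np.array([[2.0, 0.3, 0.5],
                  [0.3, 1.0, -0.2],
                  [0.5, -0.2, 1.5]])
y_val, c = np.array([0.7, 0.1]), 1.2

# Conditional parameters from the formulas above.
S_Yd, S_dd = Sigma[:-1, -1], Sigma[-1, -1]
mu_cond = mu[:-1] + S_Yd * (c - mu[-1]) / S_dd
Sigma_cond = Sigma[:-1, :-1] - np.outer(S_Yd, S_Yd) / S_dd

log_conditional = log_mvn_pdf(y_val, mu_cond, Sigma_cond)
log_joint = log_mvn_pdf(np.append(y_val, c), mu, Sigma)
log_marginal = log_mvn_pdf(c, mu[-1], S_dd)

# The two printed values should agree: log f(y | c) = log f(y, c) - log f(c).
print(log_conditional, log_joint - log_marginal)
```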

**Summary:**

The conditional distribution $Y \mid X_d = c$ is a multivariate Gaussian. Its mean shifts linearly with the observed value $c$, while its covariance $\Sigma_{Y|d}$ does not depend on $c$ at all.

---

**Answer:**

The conditional distribution $Y \mid X_d = c$ is multivariate normal with mean $\mu_Y + \Sigma_{Yd} \Sigma_{dd}^{-1} (c - \mu_d)$ and covariance $\Sigma_{YY} - \Sigma_{Yd} \Sigma_{dd}^{-1} \Sigma_{Yd}^\top$.