# Conditional Gaussian distributions

## Definition
Suppose $\mathbf{x}$ is a $D$-dimensional vector with Gaussian distribution $\mathcal{N}(\mathbf{x}|\mathbf{\mu},\Sigma)$ and that we partition $\mathbf{x}$ into two disjoint subsets $\mathbf{x}_a$ and $\mathbf{x}_b$. Without loss of generality, we can take $\mathbf{x}_a$ to form the first $M$ components of $\mathbf{x}$, with $\mathbf{x}_b$ comprising the remaining $D-M$ components, so that
$$\mathbf{x}=\left( \begin{matrix}
\mathbf{x}_a\\ 
\mathbf{x}_b
\end{matrix}\right )$$
We define the corresponding partitions of the mean vector $\mathbf{\mu}$ given by
$$\mathbf{\mu}=\left( \begin{matrix}
\mathbf{\mu}_a\\ 
\mathbf{\mu}_b
\end{matrix}\right )$$
and of the covariance matrix $\Sigma$ given by
$$\Sigma=\left( \begin{matrix}
\Sigma_{aa} &\Sigma_{ab} \\ 
\Sigma_{ba} & \Sigma_{bb}
\end{matrix}\right )$$
Note that the symmetry $\Sigma^T=\Sigma$ of the covariance matrix implies that $\Sigma_{aa}$ and $\Sigma_{bb}$ are symmetric, while $\Sigma_{ba}=\Sigma_{ab}^T$.
The *precision matrix* is defined by
$$\Lambda\equiv\Sigma^{-1}$$
The partitioned form of the precision matrix is
$$\Lambda=\left( \begin{matrix}
\Lambda_{aa} &\Lambda_{ab} \\ 
\Lambda_{ba} & \Lambda_{bb}
\end{matrix}\right )$$
where $\Lambda_{aa}$ and $\Lambda_{bb}$ are symmetric, while $\Lambda_{ab}^T=\Lambda_{ba}$. Note that $\Lambda_{aa}$ is not, in general, the inverse of $\Sigma_{aa}$.
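
The partitioning above can be sketched numerically. This is a minimal illustration, assuming NumPy and a made-up 3-dimensional example with $M=1$; the numbers and variable names are hypothetical and not part of the derivation.

```python
import numpy as np

# Hypothetical example: D = 3, M = 1, so x_a is the first component
# and x_b holds the remaining two components.
mu = np.array([1.0, 2.0, 3.0])
Sigma = np.array([[4.0, 1.0, 0.5],
                  [1.0, 3.0, 0.2],
                  [0.5, 0.2, 2.0]])  # symmetric positive definite
M = 1

# Partition the mean vector and the covariance matrix.
mu_a, mu_b = mu[:M], mu[M:]
Sigma_aa, Sigma_ab = Sigma[:M, :M], Sigma[:M, M:]
Sigma_ba, Sigma_bb = Sigma[M:, :M], Sigma[M:, M:]

# Precision matrix and its partition.
Lambda = np.linalg.inv(Sigma)
Lambda_aa, Lambda_ab = Lambda[:M, :M], Lambda[:M, M:]
Lambda_ba, Lambda_bb = Lambda[M:, :M], Lambda[M:, M:]

# The off-diagonal blocks are transposes of each other.
print(np.allclose(Lambda_ab.T, Lambda_ba))
# Lambda_aa differs from the inverse of Sigma_aa in this example.
print(np.allclose(Lambda_aa, np.linalg.inv(Sigma_aa)))
```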

---------------------
## Evaluate the conditional Gaussian distribution parameters
### Conditional distribution exponent
The conditional distribution $p(\mathbf{x}_a|\mathbf{x}_b)$ is again a Gaussian distribution. It can be evaluated from the joint distribution $p(\mathbf{x})=p(\mathbf{x}_a,\mathbf{x}_b)$ simply by fixing $\mathbf{x}_b$ to the observed value and normalizing the resulting expression to obtain a valid probability distribution over $\mathbf{x}_a$. <font color='red'>Instead of performing this normalization explicitly, we can obtain the solution more efficiently by considering the quadratic form in the exponent of the Gaussian distribution, and then reinstating the normalization coefficient at the end of the calculation.</font> Using the partitioned precision matrix, the exponent of the joint distribution expands as
$$\begin{align*}
-\frac{1}{2}(\mathbf{x}-\mathbf{\mu})^T\Sigma^{-1}(\mathbf{x}-\mathbf{\mu})
&=-\frac{1}{2}
\left[\left( \begin{matrix}\mathbf{x}_a\\ \mathbf{x}_b\end{matrix}\right )
-\left( \begin{matrix}\mathbf{\mu}_a\\ \mathbf{\mu}_b\end{matrix}\right )\right]^T
\left( \begin{matrix}\Lambda_{aa} &\Lambda_{ab} \\ \Lambda_{ba} & \Lambda_{bb}\end{matrix}\right )
\left[\left( \begin{matrix}\mathbf{x}_a\\ \mathbf{x}_b\end{matrix}\right )
-\left( \begin{matrix}\mathbf{\mu}_a\\ \mathbf{\mu}_b\end{matrix}\right )\right]\\
&=-\frac{1}{2}(\mathbf{x}_a-\mathbf{\mu}_a)^T\Lambda_{aa}(\mathbf{x}_a-\mathbf{\mu}_a)
-\frac{1}{2}(\mathbf{x}_a-\mathbf{\mu}_a)^T\Lambda_{ab}(\mathbf{x}_b-\mathbf{\mu}_b)\\
&\quad -\frac{1}{2}(\mathbf{x}_b-\mathbf{\mu}_b)^T\Lambda_{ba}(\mathbf{x}_a-\mathbf{\mu}_a)
-\frac{1}{2}(\mathbf{x}_b-\mathbf{\mu}_b)^T\Lambda_{bb}(\mathbf{x}_b-\mathbf{\mu}_b)\\
\end{align*}
$$
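
The expansion into four block terms can be checked numerically. The following is a rough sketch, assuming NumPy and randomly generated example values (the dimensions, seed, and variable names are arbitrary choices for illustration).

```python
import numpy as np

rng = np.random.default_rng(0)
D, M = 4, 2

# Build a symmetric positive definite covariance so the inverse exists.
G = rng.standard_normal((D, D))
Sigma = G @ G.T + D * np.eye(D)
mu = rng.standard_normal(D)
x = rng.standard_normal(D)

Lambda = np.linalg.inv(Sigma)
L_aa, L_ab = Lambda[:M, :M], Lambda[:M, M:]
L_ba, L_bb = Lambda[M:, :M], Lambda[M:, M:]

# Deviations from the mean, split into the a and b parts.
d = x - mu
da, db = d[:M], d[M:]

# Left-hand side: the full quadratic form.
full = -0.5 * d @ Lambda @ d
# Right-hand side: the sum of the four block terms.
blocks = -0.5 * (da @ L_aa @ da + da @ L_ab @ db
                 + db @ L_ba @ da + db @ L_bb @ db)

print(np.isclose(full, blocks))
```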

### General distribution exponent
Now return to the exponent of a general Gaussian distribution $\mathcal{N}(\mathbf{x}|\mathbf{\mu},\Sigma)$, which can be expanded as
$$\begin{align*}-\frac{1}{2}(\mathbf{x}-\mathbf{\mu})^T\Sigma^{-1}(\mathbf{x}-\mathbf{\mu})
&=-\frac{1}{2}\mathbf{x}^T\Sigma^{-1}\mathbf{x}+\frac{1}{2}\mathbf{x}^T\Sigma^{-1}\mathbf{\mu}+\frac{1}{2}\mathbf{\mu}^T\Sigma^{-1}\mathbf{x}-\frac{1}{2}\mathbf{\mu}^T\Sigma^{-1}\mathbf{\mu}\\
&=-\frac{1}{2}\mathbf{x}^T\Sigma^{-1}\mathbf{x}+\frac{1}{2}\mathbf{x}^T\Sigma^{-1}\mathbf{\mu}+\frac{1}{2}\mathbf{\mu}^T\Sigma^{-1}(\mathbf{x}^{T})^{T}-\frac{1}{2}\mathbf{\mu}^T\Sigma^{-1}\mathbf{\mu}\\
&=-\frac{1}{2}\mathbf{x}^T\Sigma^{-1}\mathbf{x}+\frac{1}{2}\mathbf{x}^T\Sigma^{-1}\mathbf{\mu}+\frac{1}{2}(\mathbf{x}^T\Sigma^{-1}\mathbf{\mu})^{T}-\frac{1}{2}\mathbf{\mu}^T\Sigma^{-1}\mathbf{\mu}\\
&=-\frac{1}{2}\mathbf{x}^T\Sigma^{-1}\mathbf{x}+\mathbf{x}^T\Sigma^{-1}\mathbf{\mu}-\frac{1}{2}\mathbf{\mu}^T\Sigma^{-1}\mathbf{\mu}
\qquad \text{(since } \mathbf{x}^T\Sigma^{-1}\mathbf{\mu}=c \text{ is a scalar, } c^T=c\text{)}
\end{align*}$$
In this equation, 
- $-\frac{1}{2}\mathbf{x}^T\Sigma^{-1}\mathbf{x}$ is the second order term.
- $\mathbf{x}^T\Sigma^{-1}\mathbf{\mu}$ is the first order term.
- $-\frac{1}{2}\mathbf{\mu}^T\Sigma^{-1}\mathbf{\mu}$ is the constant term.  
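
Reading the parameters off these terms can be sketched in code. The helper below is hypothetical (its name and arguments are illustrative assumptions): given an exponent of the form $-\frac{1}{2}\mathbf{x}^TA\mathbf{x}+\mathbf{x}^T\mathbf{b}+\text{const}$, it identifies $A=\Sigma^{-1}$ and $\mathbf{b}=\Sigma^{-1}\mathbf{\mu}$.

```python
import numpy as np

def gaussian_from_quadratic(A, b):
    """Read off (mu, Sigma) from the exponent -0.5 x^T A x + x^T b + const."""
    Sigma = np.linalg.inv(A)    # second-order coefficient matrix is Sigma^{-1}
    mu = np.linalg.solve(A, b)  # first-order coefficient vector is Sigma^{-1} mu
    return mu, Sigma

# Hypothetical example values.
Sigma_true = np.array([[2.0, 0.5],
                       [0.5, 1.0]])
mu_true = np.array([1.0, -1.0])

# Build the coefficients of the exponent, then recover the parameters.
A = np.linalg.inv(Sigma_true)
b = A @ mu_true
mu_rec, Sigma_rec = gaussian_from_quadratic(A, b)

print(np.allclose(mu_rec, mu_true), np.allclose(Sigma_rec, Sigma_true))
```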

### Solve the conditional distribution $\Sigma$ and $\mathbf{\mu}$
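In other words, if we can bring the conditional distribution exponent into this general form as a function of $\mathbf{x}_a$ (with $\mathbf{x}_b$ held fixed), we can read off its $\Sigma$ and $\mathbf{\mu}$.
Collecting the terms that are second order in $\mathbf{x}_a$ gives $-\frac{1}{2}\mathbf{x}_a^T\Lambda_{aa}\mathbf{x}_a$. Thus, we can conclude that the covariance of $p(\mathbf{x}_a|\mathbf{x}_b)$ is given by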
$$\Sigma_{a|b}=\Lambda_{aa}^{-1}$$
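Next, collecting the terms that are linear in $\mathbf{x}_a$ (using $\Lambda_{ba}^T=\Lambda_{ab}$) gives $\mathbf{x}_a^T\{\Lambda_{aa}\mathbf{\mu}_a-\Lambda_{ab}(\mathbf{x}_b-\mathbf{\mu}_b)\}$. Matching this with the first-order term $\mathbf{x}_a^T\Sigma_{a|b}^{-1}\mathbf{\mu}_{a|b}$ of the general form, the mean is given by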
$$\begin{align*}\mathbf{\mu}_{a|b}
&=\Sigma_{a|b}\{\Lambda_{aa}\mathbf{\mu}_a-\Lambda_{ab}(\mathbf{x}_b-\mathbf{\mu}_b)\}\\
&=\mathbf{\mu}_a-\Lambda_{aa}^{-1}\Lambda_{ab}(\mathbf{x}_b-\mathbf{\mu}_b)
\end{align*}$$
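
The precision-matrix form of the result can be sketched as a small function. This is an illustrative sketch assuming NumPy; the function name `conditional_from_precision` and the example numbers are hypothetical.

```python
import numpy as np

def conditional_from_precision(mu, Lambda, M, x_b):
    """Mean and covariance of p(x_a | x_b) from the partitioned precision matrix."""
    mu_a, mu_b = mu[:M], mu[M:]
    L_aa, L_ab = Lambda[:M, :M], Lambda[:M, M:]
    Sigma_cond = np.linalg.inv(L_aa)
    # mu_a - L_aa^{-1} L_ab (x_b - mu_b), using a linear solve for L_aa^{-1}.
    mu_cond = mu_a - np.linalg.solve(L_aa, L_ab @ (x_b - mu_b))
    return mu_cond, Sigma_cond

# Hypothetical 2-dimensional example with M = 1.
mu = np.array([0.0, 0.0])
Sigma = np.array([[2.0, 1.0],
                  [1.0, 2.0]])
Lambda = np.linalg.inv(Sigma)

mu_cond, Sigma_cond = conditional_from_precision(mu, Lambda, 1, np.array([1.0]))
print(mu_cond, Sigma_cond)  # mean 0.5 and variance 1.5, up to rounding
```

For this example $\Lambda_{aa}=2/3$ and $\Lambda_{ab}=-1/3$, so $\Sigma_{a|b}=3/2$ and $\mathbf{\mu}_{a|b}=0-\frac{3}{2}\cdot(-\frac{1}{3})\cdot 1=\frac{1}{2}$.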

-----------------
#### Inverse of a general $2\times 2$ partitioned matrix
$$\left( \begin{matrix}
A &B \\ 
C & D
\end{matrix}\right )^{-1}
=\left( \begin{matrix}
M &-MBD^{-1} \\ 
-D^{-1}CM & D^{-1}+D^{-1}CMBD^{-1}
\end{matrix}\right )$$
where we have defined
$$M=(A-BD^{-1}C)^{-1}$$
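
The identity can be verified numerically. The sketch below assumes NumPy and uses a randomly generated symmetric positive definite matrix, chosen only so that $D$ and $A-BD^{-1}C$ are guaranteed to be invertible; the identity itself does not require symmetry.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 2, 3

# Symmetric positive definite test matrix, partitioned into blocks.
G = rng.standard_normal((m + n, m + n))
full = G @ G.T + (m + n) * np.eye(m + n)
A, B = full[:m, :m], full[:m, m:]
C, D = full[m:, :m], full[m:, m:]

D_inv = np.linalg.inv(D)
M_mat = np.linalg.inv(A - B @ D_inv @ C)  # M = (A - B D^{-1} C)^{-1}

# Assemble the inverse block by block, following the identity.
inv_blocks = np.block([
    [M_mat,              -M_mat @ B @ D_inv],
    [-D_inv @ C @ M_mat, D_inv + D_inv @ C @ M_mat @ B @ D_inv],
])

print(np.allclose(inv_blocks, np.linalg.inv(full)))
```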

---------------
Applying this identity to the partitioned covariance matrix, whose inverse is the partitioned precision matrix, we have
$$\begin{align*}\Lambda_{aa}&=(\Sigma_{aa}-\Sigma_{ab}\Sigma_{bb}^{-1}\Sigma_{ba})^{-1}\\
\Lambda_{ab}&=-(\Sigma_{aa}-\Sigma_{ab}\Sigma_{bb}^{-1}\Sigma_{ba})^{-1}\Sigma_{ab}\Sigma_{bb}^{-1}
\end{align*}$$
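Substituting these into $\Sigma_{a|b}=\Lambda_{aa}^{-1}$ and $\mathbf{\mu}_{a|b}=\mathbf{\mu}_a-\Lambda_{aa}^{-1}\Lambda_{ab}(\mathbf{x}_b-\mathbf{\mu}_b)$ gives the mean and covariance of the conditional distribution $p(\mathbf{x}_a|\mathbf{x}_b)$ in terms of the partitioned covariance matrix: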
<font color='red'>$$\begin{align*}
\mathbf{\mu}_{a|b}&=\mathbf{\mu}_a+\Sigma_{ab}\Sigma_{bb}^{-1}(\mathbf{x}_b-\mathbf{\mu}_b)\\
\Sigma_{a|b}&=\Sigma_{aa}-\Sigma_{ab}\Sigma_{bb}^{-1}\Sigma_{ba}
\end{align*}$$</font>
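
The covariance form can be sketched in code and compared against the precision form. This is a minimal illustration assuming NumPy; the function name `conditional_gaussian` and the example values are hypothetical.

```python
import numpy as np

def conditional_gaussian(mu, Sigma, M, x_b):
    """Mean and covariance of p(x_a | x_b) from the partitioned covariance matrix."""
    mu_a, mu_b = mu[:M], mu[M:]
    S_aa, S_ab = Sigma[:M, :M], Sigma[:M, M:]
    S_ba, S_bb = Sigma[M:, :M], Sigma[M:, M:]
    # Solve linear systems with Sigma_bb rather than forming its inverse.
    mu_cond = mu_a + S_ab @ np.linalg.solve(S_bb, x_b - mu_b)
    Sigma_cond = S_aa - S_ab @ np.linalg.solve(S_bb, S_ba)
    return mu_cond, Sigma_cond

# Hypothetical example: D = 3, M = 1.
mu = np.array([1.0, 2.0, 3.0])
Sigma = np.array([[4.0, 1.0, 0.5],
                  [1.0, 3.0, 0.2],
                  [0.5, 0.2, 2.0]])
M = 1
x_b = np.array([2.5, 2.0])

mu_cond, Sigma_cond = conditional_gaussian(mu, Sigma, M, x_b)

# Cross-check against the precision-matrix form of the same result.
Lambda = np.linalg.inv(Sigma)
L_aa, L_ab = Lambda[:M, :M], Lambda[:M, M:]
print(np.allclose(Sigma_cond, np.linalg.inv(L_aa)))
print(np.allclose(mu_cond, mu[:M] - np.linalg.solve(L_aa, L_ab @ (x_b - mu[M:]))))
```

Using `np.linalg.solve` instead of an explicit inverse of $\Sigma_{bb}$ is a common numerical choice, not something the derivation requires.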

### Conclusion
<font color='red'>$$\begin{align*}
p(\mathbf{x}_a|\mathbf{x}_b) &= \mathcal{N}(\mathbf{x}_a|\mathbf{\mu}_{a|b},\Lambda_{aa}^{-1})\\
\mathbf{\mu}_{a|b}&=\mathbf{\mu}_a-\Lambda_{aa}^{-1}\Lambda_{ab}(\mathbf{x}_b-\mathbf{\mu}_b)
\end{align*}$$</font>
where $\mathbf{\mu}_{a|b}$ is a linear function of $\mathbf{x}_b$, while the covariance $\Lambda_{aa}^{-1}$ is independent of $\mathbf{x}_b$.
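
A worked special case may help make these formulas concrete. As an illustrative assumption, take $D=2$ and $M=1$, with standard deviations $\sigma_a$, $\sigma_b$ and correlation $\rho$ (symbols introduced only for this example), so that

$$\Sigma=\left( \begin{matrix}
\sigma_a^2 & \rho\sigma_a\sigma_b \\ 
\rho\sigma_a\sigma_b & \sigma_b^2
\end{matrix}\right ),\qquad
\Lambda=\frac{1}{1-\rho^2}\left( \begin{matrix}
\frac{1}{\sigma_a^2} & -\frac{\rho}{\sigma_a\sigma_b} \\ 
-\frac{\rho}{\sigma_a\sigma_b} & \frac{1}{\sigma_b^2}
\end{matrix}\right )$$

Substituting $\Lambda_{aa}$ and $\Lambda_{ab}$ gives

$$\begin{align*}
\Sigma_{a|b}&=\Lambda_{aa}^{-1}=\sigma_a^2(1-\rho^2)\\
\mu_{a|b}&=\mu_a-\Lambda_{aa}^{-1}\Lambda_{ab}(x_b-\mu_b)=\mu_a+\rho\frac{\sigma_a}{\sigma_b}(x_b-\mu_b)
\end{align*}$$

In this case the conditional mean shifts linearly with $x_b$, and the conditional variance shrinks as $|\rho|$ approaches 1.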