# Gaussian Probability Distributions

The Gaussian, or normal, distribution is used to model continuous variables. 

For a single variable $x$, the Gaussian distribution is given by
$$\color{green}{ N(x|\mu,\sigma^{2}) =  \dfrac{1}{(2\pi\sigma^{2})^{1/2}} \exp\{-\dfrac{1}{2\sigma^{2}} (x - \mu)^2\}  }$$
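
As a minimal sketch, the density above can be evaluated directly with the Python standard library. The function name `gaussian_pdf` and the argument name `sigma2` (the variance $\sigma^2$) are illustrative choices, not part of the text.

```python
import math

def gaussian_pdf(x, mu, sigma2):
    """Density N(x | mu, sigma2) of a univariate Gaussian with variance sigma2."""
    if sigma2 <= 0:
        raise ValueError("sigma2 must be positive")
    norm = 1.0 / math.sqrt(2.0 * math.pi * sigma2)
    return norm * math.exp(-((x - mu) ** 2) / (2.0 * sigma2))

# Peak of the standard normal, which equals 1 / sqrt(2 * pi)
peak = gaussian_pdf(0.0, 0.0, 1.0)
```

The density is symmetric about `mu`, so `gaussian_pdf(mu + d, mu, sigma2)` and `gaussian_pdf(mu - d, mu, sigma2)` agree for any offset `d`.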

For a $D$-dimensional vector $\vec{x}$, the multivariate Gaussian distribution takes the form
$$\color{green}{ N(\vec{x}|\vec{\mu},\mathbf{\Sigma}) =  \dfrac{1}{(2\pi)^{D/2}} \dfrac{1}{|\mathbf{\Sigma}|^{1/2}} \exp\{-\dfrac{1}{2} (\vec{x} - \vec{\mu})^T \mathbf{\Sigma}^{-1} (\vec{x} - \vec{\mu})\} }
$$

where $\vec{\mu}$ is a $D$-dimensional mean vector, $\mathbf{\Sigma}$ is a $D \times D$ covariance matrix, and $|\mathbf{\Sigma}|$ denotes the determinant of $\mathbf{\Sigma}$.
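
To make the multivariate formula concrete, here is a small sketch restricted to $D = 2$, where the determinant and inverse of $\mathbf{\Sigma}$ can be written out by hand. The function name `gaussian_pdf_2d` and the nested-list representation of the covariance are assumptions made for illustration; in practice a linear-algebra library would handle general $D$.

```python
import math

def gaussian_pdf_2d(x, mu, cov):
    """Density N(x | mu, cov) for D = 2, with cov = [[s11, s12], [s21, s22]]."""
    (s11, s12), (s21, s22) = cov
    det = s11 * s22 - s12 * s21
    if s11 <= 0 or det <= 0:
        raise ValueError("cov must be positive definite")
    # Inverse of the 2x2 covariance, written out explicitly
    p11, p12, p21, p22 = s22 / det, -s12 / det, -s21 / det, s11 / det
    d1, d2 = x[0] - mu[0], x[1] - mu[1]
    # Quadratic form (x - mu)^T cov^{-1} (x - mu)
    quad = d1 * (p11 * d1 + p12 * d2) + d2 * (p21 * d1 + p22 * d2)
    norm = 1.0 / (2.0 * math.pi * math.sqrt(det))  # (2*pi)^{D/2} |cov|^{1/2}, D = 2
    return norm * math.exp(-0.5 * quad)

# With identity covariance, the density at the mean is 1 / (2 * pi)
at_mean = gaussian_pdf_2d([0.0, 0.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

When the covariance is diagonal, the quadratic form separates and the result factorizes into a product of two univariate densities.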

***

## Conditional Gaussian Distributions

An important property of the multivariate Gaussian distribution is that if two sets of variables are jointly Gaussian, then the conditional distribution of one set conditioned on the other is again Gaussian.

Suppose $\vec{x}$ is a D-dimensional vector with Gaussian distribution $N(\vec{x}|\vec{\mu}, \mathbf{\Sigma})$ and that we partition $\vec{x}$ into two disjoint subsets $\vec{x_a}$ and $\vec{x_b}$. 
Without loss of generality, we can take $\vec{x_a}$ to form the first $M$ components of $\vec{x}$, with $\vec{x_b}$ comprising the remaining $D - M$ components, so that
$$\vec{x} = \begin{pmatrix} \vec{x_a} \\ \vec{x_b} \end{pmatrix}$$
We also define corresponding partitions of the mean vector $\vec{\mu}$ given by
$$\vec{\mu} = \begin{pmatrix} \vec{\mu_a} \\ \vec{\mu_b} \end{pmatrix}$$

and the covariance matrix given by
$$ \mathbf{\Sigma}  = \begin{bmatrix} \mathbf{\Sigma_{aa}} & \mathbf{\Sigma_{ab}} \\  \mathbf{\Sigma_{ba}} & \mathbf{\Sigma_{bb}} \end{bmatrix}$$

It is important to note that since the covariance matrix is symmetric ($\mathbf{\Sigma} = \mathbf{\Sigma}^T$), we have that $\mathbf{\Sigma_{aa}}$ and $\mathbf{\Sigma_{bb}}$ are symmetric and $\mathbf{\Sigma_{ba}}  = \mathbf{\Sigma_{ab}}^T$.

However, in many situations it is more convenient to work with the inverse of the covariance matrix, which is known as the *precision matrix*
$$ \mathbf{\Lambda} = \mathbf{\Sigma}^{-1}$$

We shall also introduce a partitioned form of the precision matrix
$$ \mathbf{\Lambda}  = \begin{bmatrix} \mathbf{\Lambda_{aa}} & \mathbf{\Lambda_{ab}} \\  \mathbf{\Lambda_{ba}} & \mathbf{\Lambda_{bb}} \end{bmatrix}$$
It is important to note that $\mathbf{\Lambda_{aa}}$ is not simply the inverse of $\mathbf{\Sigma_{aa}}$, and we shall examine their relationship soon. 
We must also note that because the inverse of a symmetric matrix is symmetric, we have that $\mathbf{\Lambda_{aa}}$ and $\mathbf{\Lambda_{bb}}$ are symmetric and $\mathbf{\Lambda_{ba}}  = \mathbf{\Lambda_{ab}}^T$.
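
The warning that $\mathbf{\Lambda_{aa}} \neq \mathbf{\Sigma_{aa}}^{-1}$ is easy to see numerically. The sketch below uses a hypothetical $2 \times 2$ covariance with scalar blocks, chosen only for illustration; `inverse_2x2` is a helper written for this example.

```python
def inverse_2x2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

sigma = [[2.0, 1.0], [1.0, 2.0]]   # example covariance (an assumption)
lam = inverse_2x2(sigma)           # precision matrix Lambda = Sigma^{-1}

lam_aa = lam[0][0]                 # block Lambda_aa of the precision matrix
naive = 1.0 / sigma[0][0]          # inverse of the block Sigma_aa alone
```

Here `lam_aa` is $2/3$ while `naive` is $1/2$; the two agree only when the off-diagonal block $\mathbf{\Sigma_{ab}}$ vanishes. The precision matrix is symmetric, as stated above.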

Let us begin by finding an expression for the conditional distribution $p(\vec{x_a}|\vec{x_b})$.

From the product rule of probability, we see that this conditional distribution can be evaluated from the joint distribution $p(\vec{x}) = p(\vec{x_a},\vec{x_b})$ simply by fixing $\vec{x_b}$ to the observed value and normalizing the resulting expression to obtain a valid probability distribution over $\vec{x_a}$.

Instead of performing this normalization explicitly, we can obtain the solution more efficiently by considering the quadratic form in the exponent of the Gaussian distribution and then reinstating the normalization coefficient at the end of the calculation.

The quadratic form in the exponent of the Gaussian distribution is given by
$$\color{green}{-\dfrac{1}{2} (\vec{x} - \vec{\mu})^T \mathbf{\Sigma}^{-1} (\vec{x} - \vec{\mu})}$$

Making use of the partitioning, we obtain
$$ = \color{red}{ - \dfrac{1}{2} (\vec{x_{a}} - \vec{\mu_{a}})^T \mathbf{\Lambda}_{aa} (\vec{x_{a}} - \vec{\mu_{a}})
- \dfrac{1}{2} (\vec{x_{a}} - \vec{\mu_{a}})^T \mathbf{\Lambda}_{ab} (\vec{x_{b}} - \vec{\mu_{b}})
- \dfrac{1}{2} (\vec{x_{b}} - \vec{\mu_{b}})^T \mathbf{\Lambda}_{ba} (\vec{x_{a}} - \vec{\mu_{a}})
- \dfrac{1}{2} (\vec{x_{b}} - \vec{\mu_{b}})^T \mathbf{\Lambda}_{bb} (\vec{x_{b}} - \vec{\mu_{b}})}$$

As we can see, this is a quadratic form in $\vec{x_a}$ once $\vec{x_b}$ is held fixed. Thus we know that the corresponding conditional distribution $p(\vec{x_a}|\vec{x_b})$ will be Gaussian.

Because this distribution is completely characterized by its mean and its covariance, our goal will be to identify expressions for the mean and covariance by inspection of the partitioned equation above.

Such problems can be solved straightforwardly by noting that the exponent in a general Gaussian distribution $N(\vec{x}|\vec{\mu},\mathbf{\Sigma})$ can be written 
$$\color{green}{-\dfrac{1}{2} (\vec{x} - \vec{\mu})^T \mathbf{\Sigma}^{-1} (\vec{x} - \vec{\mu})} =
\color{green}{-\dfrac{1}{2} \vec{x}^T \mathbf{\Sigma}^{-1}\vec{x} + \vec{x}^T\mathbf{\Sigma}^{-1}\vec{\mu} + const.}$$

If we can bring our partitioned equation into this form as a function of $\vec{x_a}$, then we can read off the mean and covariance of the conditional distribution by matching the second-order and linear terms.

Now let us focus on the $\vec{x_a}$ terms
$$ - \dfrac{1}{2} (\vec{x_{a}} - \vec{\mu_{a}})^T \mathbf{\Lambda}_{aa} (\vec{x_{a}} - \vec{\mu_{a}})
- \dfrac{1}{2} \vec{x_{a}}^T \mathbf{\Lambda}_{ab} (\vec{x_{b}} - \vec{\mu_{b}})
- \dfrac{1}{2} (\vec{x_{b}} - \vec{\mu_{b}})^T \mathbf{\Lambda}_{ba} \vec{x_{a}}$$

Applying the symmetry properties listed above and some transpose arithmetic we can rewrite the equation
$$ - \dfrac{1}{2} (\vec{x_{a}} - \vec{\mu_{a}})^T \mathbf{\Lambda}_{aa} (\vec{x_{a}} - \vec{\mu_{a}})
-  \vec{x_{a}}^T \mathbf{\Lambda}_{ab} (\vec{x_{b}} - \vec{\mu_{b}})$$

Expanding the square for the first part, the expression becomes
$$ -  \dfrac{1}{2}\vec{x_a}^T\mathbf{\Lambda}_{aa}\vec{x_a}
+ \vec{x_a}^T\mathbf{\Lambda}_{aa}\vec{\mu_a} + const. - \ \vec{x_{a}}^T \mathbf{\Lambda}_{ab} (\vec{x_{b}} - \vec{\mu_{b}})$$


Collecting the terms linear in $\vec{x_a}$, this simplifies to

$$\color{red}{ - \dfrac{1}{2}\vec{x_a}^T\mathbf{\Lambda}_{aa}\vec{x_a}
+  \vec{x_a}^T [\mathbf{\Lambda}_{aa}\vec{\mu_a} - \mathbf{\Lambda}_{ab}\vec{x_b} + \mathbf{\Lambda}_{ab}\vec{\mu_b}] + const.}$$

Let $\vec{m} =[\mathbf{\Lambda}_{aa}\vec{\mu_a} - \mathbf{\Lambda}_{ab}\vec{x_b} + \mathbf{\Lambda}_{ab}\vec{\mu_b}]$ so that our expression becomes
$$\color{red}{ - \dfrac{1}{2}\vec{x_a}^T\mathbf{\Lambda}_{aa}\vec{x_a} + \vec{x_a}^T\vec{m} + const.}$$

Now our goal is to complete the square so that we can draw conclusions from the general quadratic form of the Gaussian distribution. We will denote the mean and covariance of this distribution by $\vec{\mu_{a|b}}$ and $\mathbf{\Sigma}_{a|b}$, respectively. The general form is given by
$$\color{red}{-\dfrac{1}{2} (\vec{x_a} - \vec{\mu_{a|b}})^T \mathbf{\Sigma}_{a|b}^{-1} (\vec{x_a} - \vec{\mu_{a|b}})}$$

Let us begin completing the square
$$ - \dfrac{1}{2}\vec{x_a}^T\mathbf{\Lambda}_{aa}\vec{x_a} + \vec{x_a}^T\mathbf{\Lambda}_{aa}(\mathbf{\Lambda}_{aa}^{-1}\vec{m})$$
$$ - \dfrac{1}{2}\vec{x_a}^T\mathbf{\Lambda}_{aa}\vec{x_a} + \vec{x_a}^T\mathbf{\Lambda}_{aa}(\mathbf{\Lambda}_{aa}^{-1}\vec{m}) - \dfrac{1}{2}\vec{m}^T\mathbf{\Lambda}_{aa}^{-1}\vec{m} + \dfrac{1}{2}\vec{m}^T\mathbf{\Lambda}_{aa}^{-1}\vec{m}$$

Thus we have
$$- \dfrac{1}{2} (\vec{x_a} - \mathbf{\Lambda}_{aa}^{-1}\vec{m})^T \mathbf{\Lambda}_{aa} (\vec{x_a} - \mathbf{\Lambda}_{aa}^{-1}\vec{m}) + \dfrac{1}{2}\vec{m}^T\mathbf{\Lambda}_{aa}^{-1}\vec{m}$$

Noting the last term is not dependent on $\vec{x_a}$ we can drop it so that we have
$$\color{red}{- \dfrac{1}{2} (\vec{x_a} - \mathbf{\Lambda}_{aa}^{-1}\vec{m})^T \mathbf{\Lambda}_{aa} (\vec{x_a} - \mathbf{\Lambda}_{aa}^{-1}\vec{m}) }$$
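
The completing-the-square step can be checked numerically in the scalar case, where $\mathbf{\Lambda}_{aa}$ and $\vec{m}$ reduce to numbers. This is only a sketch: the function names and the test values of `lam` and `m` are arbitrary assumptions, and the comparison is made before the $\vec{x_a}$-independent constant is dropped.

```python
def exponent_expanded(x, lam, m):
    """-1/2 * lam * x^2 + x * m, the scalar analogue of the expanded form."""
    return -0.5 * lam * x * x + x * m

def exponent_completed(x, lam, m):
    """-1/2 * lam * (x - m/lam)^2 + m^2 / (2 * lam), after completing the square."""
    return -0.5 * lam * (x - m / lam) ** 2 + 0.5 * m * m / lam

lam, m = 1.5, -0.7   # arbitrary illustrative values
diffs = [exponent_expanded(x, lam, m) - exponent_completed(x, lam, m)
         for x in (-2.0, 0.0, 0.3, 4.0)]
```

Every entry of `diffs` is zero up to floating-point rounding, which confirms that the two forms describe the same function of `x`.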

from which we can immediately conclude that the covariance of $p(\vec{x_a}|\vec{x_b})$ is given by $$\mathbf{\Sigma}_{a|b} = \mathbf{\Lambda}_{aa}^{-1}$$

The mean of $p(\vec{x_a}|\vec{x_b})$ is given by
$$\vec{\mu}_{a|b} = \mathbf{\Lambda}_{aa}^{-1}[\mathbf{\Lambda}_{aa}\vec{\mu_a} - \mathbf{\Lambda}_{ab}\vec{x_b} + \mathbf{\Lambda}_{ab}\vec{\mu_b}]$$
$$  = \vec{\mu}_a -  \mathbf{\Lambda}_{aa}^{-1}\mathbf{\Lambda}_{ab}(\vec{x_b}-\vec{\mu_b})$$

To summarize our results
$$\color{red}{\mathbf{\Sigma}_{a|b} = \mathbf{\Lambda}_{aa}^{-1}}$$
$$\color{red}{\vec{\mu}_{a|b} = \vec{\mu}_a -  \mathbf{\Lambda}_{aa}^{-1}\mathbf{\Lambda}_{ab}(\vec{x_b}-\vec{\mu_b})}$$

We now want to examine how the blocks of $\mathbf{\Sigma}$ and $\mathbf{\Lambda}$ relate. In other words, we want to use an identity for the inverse of a partitioned matrix to solve
$$\begin{bmatrix} \mathbf{\Sigma}_{aa} & \mathbf{\Sigma}_{ab} \\  \mathbf{\Sigma}_{ba} & \mathbf{\Sigma}_{bb} \end{bmatrix}^{-1} = \begin{bmatrix} \mathbf{\Lambda}_{aa} & \mathbf{\Lambda}_{ab} \\  \mathbf{\Lambda}_{ba} & \mathbf{\Lambda}_{bb} \end{bmatrix}$$

Using the proof below, and noticing we only need $\mathbf{\Lambda}_{aa}$ and $\mathbf{\Lambda}_{ab}$ to satisfy our results, we have
$$ \mathbf{\Lambda}_{aa} = (\mathbf{\Sigma}_{aa} - \mathbf{\Sigma}_{ab}\mathbf{\Sigma}_{bb}^{-1}\mathbf{\Sigma}_{ba} )^{-1}$$

$$  \mathbf{\Lambda}_{ab} = -\mathbf{\Lambda}_{aa}\mathbf{\Sigma}_{ab}\mathbf{\Sigma}_{bb}^{-1}$$
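
As a quick sanity check of these two identities, consider a hypothetical two-dimensional case with scalar blocks (the numbers are chosen purely for illustration):

$$\mathbf{\Sigma} = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}, \qquad \mathbf{\Sigma}^{-1} = \dfrac{1}{3}\begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix}$$

Evaluating the block formulas with $\mathbf{\Sigma}_{aa} = \mathbf{\Sigma}_{bb} = 2$ and $\mathbf{\Sigma}_{ab} = \mathbf{\Sigma}_{ba} = 1$ gives

$$\mathbf{\Lambda}_{aa} = \left(2 - 1 \cdot \tfrac{1}{2} \cdot 1\right)^{-1} = \tfrac{2}{3}, \qquad \mathbf{\Lambda}_{ab} = -\tfrac{2}{3} \cdot 1 \cdot \tfrac{1}{2} = -\tfrac{1}{3}$$

which matches the first row of the directly computed inverse.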

While substituting $\mathbf{\Lambda}_{aa}$ into the conditional covariance is immediate, we can substitute $\mathbf{\Lambda}_{ab}$ into the conditional mean to get 
$$\vec{\mu}_{a|b} = \vec{\mu}_{a} - \mathbf{\Lambda}_{aa}^{-1}(-\mathbf{\Lambda}_{aa}\mathbf{\Sigma}_{ab}\mathbf{\Sigma}_{bb}^{-1})(\vec{x_b}-\vec{\mu_b})$$
$$=\vec{\mu}_{a} + \mathbf{\Sigma}_{ab}\mathbf{\Sigma}_{bb}^{-1}(\vec{x_b}-\vec{\mu_b})$$

Finally, we have the following expressions for the mean and covariance of the conditional distribution $p(\vec{x_a}|\vec{x_b})$
$$\color{red}{\mathbf{\Sigma}_{a|b} = \mathbf{\Sigma}_{aa} - \mathbf{\Sigma}_{ab}\mathbf{\Sigma}_{bb}^{-1}\mathbf{\Sigma}_{ba} }$$
$$\color{red}{\vec{\mu}_{a|b} =\vec{\mu}_{a} + \mathbf{\Sigma}_{ab}\mathbf{\Sigma}_{bb}^{-1}(\vec{x_b}-\vec{\mu_b})}$$

Note that the mean of the conditional distribution, $\vec{\mu}_{a|b}$, is a linear function of $\vec{x_b}$ and that the covariance is independent of $\vec{x_b}$. This represents an example of a linear-Gaussian model.
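
The two equivalent parameterizations of the conditional distribution can be compared in a small sketch. It assumes $D = 2$ with scalar blocks so that no matrix library is needed; the function names and the numerical values are illustrative and not taken from the text.

```python
def conditional_from_covariance(mu, sigma, x_b):
    """Mean and variance of p(x_a | x_b) using the blocks of Sigma."""
    mu_a, mu_b = mu
    (s_aa, s_ab), (s_ba, s_bb) = sigma
    mean = mu_a + s_ab / s_bb * (x_b - mu_b)
    var = s_aa - s_ab / s_bb * s_ba
    return mean, var

def conditional_from_precision(mu, sigma, x_b):
    """Same quantities using the blocks of Lambda = Sigma^{-1}."""
    mu_a, mu_b = mu
    (s_aa, s_ab), (s_ba, s_bb) = sigma
    det = s_aa * s_bb - s_ab * s_ba
    l_aa, l_ab = s_bb / det, -s_ab / det   # first row of the 2x2 inverse
    mean = mu_a - l_ab / l_aa * (x_b - mu_b)
    var = 1.0 / l_aa
    return mean, var

mu = [1.0, -1.0]                    # illustrative values
sigma = [[2.0, 1.0], [1.0, 2.0]]
mean_cov, var_cov = conditional_from_covariance(mu, sigma, x_b=1.0)
mean_prec, var_prec = conditional_from_precision(mu, sigma, x_b=1.0)
```

Both routes give a conditional mean of 2 and a conditional variance of 1.5 for these inputs. Changing `x_b` shifts the mean linearly and leaves the variance unchanged.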

---------------------------------------------------------------------------------------------

## Marginal Gaussian Distributions

We have seen that the conditional distribution $p(\vec{x_a}|\vec{x_b})$ is Gaussian whenever the joint distribution $p(\vec{x_a},\vec{x_b})$ is Gaussian.
Now we turn to the *marginal distribution* given by $$\color{green}{p(\vec{x_a}) = \int p(\vec{x_a},\vec{x_b}) \ d\vec{x_b}}$$
which we will see is also Gaussian.

It should be noted that before, in finding the conditional distribution, we focused on the quadratic form in the exponent of the conditional distribution itself.
Now, our strategy for evaluating this distribution will be to focus on the quadratic form in the exponent of the *joint* distribution and thereby to identify the mean and covariance of the marginal distribution $p(\vec{x_a})$. 

As before, the quadratic form in the exponent of the joint Gaussian distribution is given by
$$\color{green}{-\dfrac{1}{2} (\vec{x} - \vec{\mu})^T \mathbf{\Sigma}^{-1} (\vec{x} - \vec{\mu})}$$

and making use of the partitioning we obtain 
$$ - \dfrac{1}{2} (\vec{x_{a}} - \vec{\mu_{a}})^T \mathbf{\Lambda}_{aa} (\vec{x_{a}} - \vec{\mu_{a}})
- \dfrac{1}{2} (\vec{x_{b}} - \vec{\mu_{b}})^T \mathbf{\Lambda}_{ba} (\vec{x_{a}} - \vec{\mu_{a}})
- \dfrac{1}{2} (\vec{x_{a}} - \vec{\mu_{a}})^T \mathbf{\Lambda}_{ab} (\vec{x_{b}} - \vec{\mu_{b}})
- \dfrac{1}{2} (\vec{x_{b}} - \vec{\mu_{b}})^T \mathbf{\Lambda}_{bb} (\vec{x_{b}} - \vec{\mu_{b}})$$

Because our goal is to integrate out $\vec{x_b}$, this is most easily achieved by first considering the terms involving $\vec{x_b}$ and then completing the square in order to facilitate integration. Picking out just those terms that involve $\vec{x_b}$, we have

$$
- \dfrac{1}{2} (\vec{x_{b}} - \vec{\mu_{b}})^T \mathbf{\Lambda}_{bb} (\vec{x_{b}} - \vec{\mu_{b}})
- \dfrac{1}{2} \vec{x_{b}}^T \mathbf{\Lambda}_{ba} (\vec{x_{a}} - \vec{\mu_{a}})
- \dfrac{1}{2} (\vec{x_{a}} - \vec{\mu_{a}})^T \mathbf{\Lambda}_{ab} \vec{x_{b}}
$$

Applying the symmetry properties listed above and some transpose arithmetic we can rewrite the equation
$$
- \dfrac{1}{2} (\vec{x_{b}} - \vec{\mu_{b}})^T \mathbf{\Lambda}_{bb} (\vec{x_{b}} - \vec{\mu_{b}})
- \vec{x_{b}}^T \mathbf{\Lambda}_{ba} (\vec{x_{a}} - \vec{\mu_{a}})
$$

Expanding the square for the first part, we have
$$ - \dfrac{1}{2}\vec{x_{b}}^T\mathbf{\Lambda}_{bb}\vec{x_{b}} + \vec{x_{b}}^T\mathbf{\Lambda}_{bb}\vec{\mu_{b}} + const.  -  \ \vec{x_{b}}^T \mathbf{\Lambda}_{ba} (\vec{x_{a}} - \vec{\mu_{a}})$$

Combining like terms and dropping the const. term, we have
$$\color{red}{ - \dfrac{1}{2}\vec{x_{b}}^T\mathbf{\Lambda}_{bb}\vec{x_{b}} + \vec{x_{b}}^T[\mathbf{\Lambda}_{bb}\vec{\mu_{b}} -\mathbf{\Lambda}_{ba}(\vec{x_{a}}-\vec{\mu_{a}})]}$$

Now, let us define $\vec{m} = [\mathbf{\Lambda}_{bb}\vec{\mu_{b}} -\mathbf{\Lambda}_{ba}(\vec{x_{a}}-\vec{\mu_{a}})]$ so that we can rewrite the expression as 
$$\color{red}{ - \dfrac{1}{2}[\vec{x_{b}}^T\mathbf{\Lambda}_{bb}\vec{x_{b}} - 2\vec{x_{b}}^T\vec{m}]}$$

Now we want to complete the square. First, we insert the identity $\mathbf{\Lambda}_{bb}\mathbf{\Lambda}_{bb}^{-1}$ in front of $\vec{m}$
$$ - \dfrac{1}{2}[\vec{x_{b}}^T\mathbf{\Lambda}_{bb}\vec{x_{b}} - 2\vec{x_{b}}^T\mathbf{\Lambda}_{bb}(\mathbf{\Lambda}_{bb}^{-1}\vec{m})]$$

And now let us add and subtract the term needed to complete the square
$$- \dfrac{1}{2}[\vec{x_{b}}^T\mathbf{\Lambda}_{bb}\vec{x_{b}} - 2\vec{x_{b}}^T\mathbf{\Lambda}_{bb}(\mathbf{\Lambda}^{-1}_{bb}\vec{m}) + \vec{m}^T\mathbf{\Lambda}_{bb}^{-1}\vec{m}]+\dfrac{1}{2}\vec{m}^T\mathbf{\Lambda}_{bb}^{-1}\vec{m}$$

Thus we have
$$\color{red}{- \dfrac{1}{2}(\vec{x_{b}}-\mathbf{\Lambda}_{bb}^{-1}\vec{m})^T\mathbf{\Lambda}_{bb}(\vec{x_{b}}-\mathbf{\Lambda}_{bb}^{-1}\vec{m})+\dfrac{1}{2}\vec{m}^T\mathbf{\Lambda}_{bb}^{-1}\vec{m}}$$

Note that although the last term does not depend on $\vec{x_b}$, it does depend on $\vec{x_a}$ through the definition of $\vec{m}$.

Focusing only on the terms dependent on $\vec{x_b}$, we are left with
$$\color{red}{- \dfrac{1}{2}(\vec{x_{b}}-\mathbf{\Lambda}_{bb}^{-1}\vec{m})^T\mathbf{\Lambda}_{bb}(\vec{x_{b}}-\mathbf{\Lambda}_{bb}^{-1}\vec{m})}$$

Integrating the exponential of this quadratic form over $\vec{x_b}$ gives the normalization constant of a Gaussian with precision $\mathbf{\Lambda}_{bb}$, which does not depend on $\vec{x_a}$. What remains are the terms dependent on $\vec{x_a}$ in the exponent of the joint distribution $p(\vec{x_a},\vec{x_b})$ that we have set aside so far
$$
\dfrac{1}{2}\vec{m}^T\mathbf{\Lambda}_{bb}^{-1}\vec{m}
- \dfrac{1}{2} (\vec{x_{a}} - \vec{\mu_{a}})^T \mathbf{\Lambda}_{aa} (\vec{x_{a}} - \vec{\mu_{a}})
- \vec{x_{a}}^T \mathbf{\Lambda}_{ab} (- \vec{\mu_{b}})
$$

Expanding the square for the middle term, dropping the const. term, and combining like terms we have
$$
\dfrac{1}{2}\vec{m}^T\mathbf{\Lambda}_{bb}^{-1}\vec{m}
- \dfrac{1}{2}\vec{x_{a}}^T\mathbf{\Lambda}_{aa}\vec{x_{a}} + \vec{x_{a}}^T[\mathbf{\Lambda}_{aa}\vec{\mu_{a}} + \mathbf{\Lambda}_{ab}\vec{\mu_{b}}]$$

Let us take apart the first term
$$\dfrac{1}{2}[\mathbf{\Lambda}_{bb}\vec{\mu_{b}} -\mathbf{\Lambda}_{ba}(\vec{x_{a}}-\vec{\mu_{a}})]^T\mathbf{\Lambda}_{bb}^{-1}[\mathbf{\Lambda}_{bb}\vec{\mu_{b}} -\mathbf{\Lambda}_{ba}(\vec{x_{a}}-\vec{\mu_{a}})]$$

$$\dfrac{1}{2}(\mathbf{\Lambda}_{ba}(\vec{x_{a}}-\vec{\mu_{a}}))^T
\mathbf{\Lambda}_{bb}^{-1}
(\mathbf{\Lambda}_{ba}(\vec{x_{a}}-\vec{\mu_{a}}))
-(\mathbf{\Lambda}_{ba}(\vec{x_{a}}-\vec{\mu_{a}}))^T
\mathbf{\Lambda}_{bb}^{-1}
(\mathbf{\Lambda}_{bb}\vec{\mu_{b}})
+ const.
$$

$$
\dfrac{1}{2}(\vec{x_{a}}-\vec{\mu_{a}})^T
\mathbf{\Lambda}_{ab}\mathbf{\Lambda}_{bb}^{-1}\mathbf{\Lambda}_{ba}
(\vec{x_{a}}-\vec{\mu_{a}})
- \vec{x_a}^T
\mathbf{\Lambda}_{ab}
\vec{\mu_b} + const.
$$

$$
\dfrac{1}{2}\vec{x_{a}}^T
\mathbf{\Lambda}_{ab}\mathbf{\Lambda}_{bb}^{-1}\mathbf{\Lambda}_{ba}
\vec{x_{a}}
- \vec{x_{a}}^T
\mathbf{\Lambda}_{ab}\mathbf{\Lambda}_{bb}^{-1}\mathbf{\Lambda}_{ba}
\vec{\mu_a}
- \vec{x_a}^T
\mathbf{\Lambda}_{ab}
\vec{\mu_b}
 + const.
$$

Returning to our expression for all the other terms dependent on $\vec{x_a}$ and independent of $\vec{x_b}$, and dropping the const., we have
$$\dfrac{1}{2}\vec{x_{a}}^T
(\mathbf{\Lambda}_{ab}\mathbf{\Lambda}_{bb}^{-1}\mathbf{\Lambda}_{ba})
\vec{x_{a}}
- \vec{x_{a}}^T
(\mathbf{\Lambda}_{ab}\mathbf{\Lambda}_{bb}^{-1}\mathbf{\Lambda}_{ba})
\vec{\mu_a}
-\vec{x_a}^T
\mathbf{\Lambda}_{ab}
\vec{\mu_b}
- \dfrac{1}{2}\vec{x_{a}}^T\mathbf{\Lambda}_{aa}\vec{x_{a}} + \vec{x_{a}}^T[\mathbf{\Lambda}_{aa}\vec{\mu_{a}} + \mathbf{\Lambda}_{ab}\vec{\mu_{b}}]$$

Notice that the $\vec{\mu_b}$ term cancels out so we have 
$$\dfrac{1}{2}\vec{x_{a}}^T
(\mathbf{\Lambda}_{ab}\mathbf{\Lambda}_{bb}^{-1}\mathbf{\Lambda}_{ba})
\vec{x_{a}}
- \vec{x_{a}}^T
(\mathbf{\Lambda}_{ab}\mathbf{\Lambda}_{bb}^{-1}\mathbf{\Lambda}_{ba})
\vec{\mu_a}- \dfrac{1}{2}\vec{x_{a}}^T\mathbf{\Lambda}_{aa}\vec{x_{a}} + \vec{x_{a}}^T\mathbf{\Lambda}_{aa}\vec{\mu_{a}}$$

Grouping by like terms gives us
$$-\dfrac{1}{2}\vec{x_{a}}^T
(\mathbf{\Lambda}_{aa} - \mathbf{\Lambda}_{ab}\mathbf{\Lambda}_{bb}^{-1}\mathbf{\Lambda}_{ba})
\vec{x_{a}}
+
\vec{x_{a}}^T
(\mathbf{\Lambda}_{aa} - \mathbf{\Lambda}_{ab}\mathbf{\Lambda}_{bb}^{-1}\mathbf{\Lambda}_{ba})
\vec{\mu_{a}}
$$

Finally, completing the square, and dropping the constant we have 
$$\color{red}{ -\dfrac{1}{2} (\vec{x_{a}}-\vec{\mu_{a}})^T
[\mathbf{\Lambda}_{aa} - \mathbf{\Lambda}_{ab}\mathbf{\Lambda}_{bb}^{-1}\mathbf{\Lambda}_{ba}]
(\vec{x_{a}}-\vec{\mu_{a}})}$$

From this we can conclude that the marginal distribution $p(\vec{x_a})$ has covariance and mean
$$\color{red}{\mathbf{\Sigma}_{a} = [\mathbf{\Lambda}_{aa} - \mathbf{\Lambda}_{ab}\mathbf{\Lambda}_{bb}^{-1}\mathbf{\Lambda}_{ba}]^{-1} }$$
$$\color{red}{ E[\vec{x_{a}}] = \vec{\mu_{a}}} $$

Again, we want to rewrite this in terms of the corresponding partitioning of the covariance matrix. Using the proof below and inverting it, notice that 
$$ \mathbf{\Sigma}_{aa} = (\mathbf{\Lambda}_{aa} - \mathbf{\Lambda}_{ab}\mathbf{\Lambda}_{bb}^{-1}\mathbf{\Lambda}_{ba}) ^{-1}$$
Thus
$$\color{red}{\mathbf{\Sigma}_{a} = \mathbf{\Sigma}_{aa}}$$
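
The marginalization result can be checked by brute force: integrate the joint density over $\vec{x_b}$ numerically and compare with $N(\vec{x_a}|\vec{\mu_a}, \mathbf{\Sigma}_{aa})$. The sketch below assumes $D = 2$ with scalar blocks and uses a plain trapezoidal rule; the function names, the integration window `half_width`, the step count, and the numerical values are all illustrative assumptions.

```python
import math

def joint_pdf(x_a, x_b, mu, sigma):
    """Bivariate Gaussian density with scalar blocks x_a and x_b."""
    (s_aa, s_ab), (s_ba, s_bb) = sigma
    det = s_aa * s_bb - s_ab * s_ba
    d_a, d_b = x_a - mu[0], x_b - mu[1]
    # (x - mu)^T Sigma^{-1} (x - mu), using the explicit 2x2 inverse
    quad = (s_bb * d_a * d_a - (s_ab + s_ba) * d_a * d_b + s_aa * d_b * d_b) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))

def marginal_by_integration(x_a, mu, sigma, half_width=12.0, steps=4000):
    """Trapezoidal approximation of the integral of p(x_a, x_b) over x_b."""
    lo = mu[1] - half_width
    hi = mu[1] + half_width
    h = (hi - lo) / steps
    total = 0.5 * (joint_pdf(x_a, lo, mu, sigma) + joint_pdf(x_a, hi, mu, sigma))
    for i in range(1, steps):
        total += joint_pdf(x_a, lo + i * h, mu, sigma)
    return total * h

def marginal_closed_form(x_a, mu, sigma):
    """N(x_a | mu_a, Sigma_aa), the closed-form marginal."""
    s_aa = sigma[0][0]
    return math.exp(-0.5 * (x_a - mu[0]) ** 2 / s_aa) / math.sqrt(2.0 * math.pi * s_aa)

mu = [1.0, -1.0]                    # illustrative values
sigma = [[2.0, 1.0], [1.0, 2.0]]
numeric = marginal_by_integration(0.5, mu, sigma)
exact = marginal_closed_form(0.5, mu, sigma)
```

For these inputs `numeric` and `exact` agree to well within the accuracy of the quadrature, even though the off-diagonal covariance is nonzero.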

---

### Conditional and Marginal Gaussian Distributions Summary

Given a joint Gaussian distribution $N(\vec{x}|\vec{\mu},\mathbf{\Sigma})$ with $\mathbf{\Lambda} = \mathbf{\Sigma}^{-1}$ and
$$\vec{x} = \begin{pmatrix} \vec{x_a} \\ \vec{x_b} \end{pmatrix},\vec{\mu} = \begin{pmatrix} \vec{\mu_a} \\ \vec{\mu_b} \end{pmatrix},$$

$$ \mathbf{\Sigma}  = \begin{bmatrix} \mathbf{\Sigma_{aa}} & \mathbf{\Sigma_{ab}} \\  \mathbf{\Sigma_{ba}} & \mathbf{\Sigma_{bb}} \end{bmatrix} , 
\mathbf{\Lambda}  = \begin{bmatrix} \mathbf{\Lambda_{aa}} & \mathbf{\Lambda_{ab}} \\  \mathbf{\Lambda_{ba}} & \mathbf{\Lambda_{bb}} \end{bmatrix} $$

#### Conditional Distribution
$$p(\vec{x_a}|\vec{x_b}) = N(\vec{x_a}|\vec{\mu}_{a} + \mathbf{\Sigma}_{ab}\mathbf{\Sigma}_{bb}^{-1}(\vec{x_b}-\vec{\mu_b}), \ \mathbf{\Sigma}_{aa} - \mathbf{\Sigma}_{ab}\mathbf{\Sigma}_{bb}^{-1}\mathbf{\Sigma}_{ba}) $$
Or 
$$p(\vec{x_a}|\vec{x_b}) = N(\vec{x_a}| \vec{\mu}_a -  \mathbf{\Lambda}_{aa}^{-1}\mathbf{\Lambda}_{ab}(\vec{x_b}-\vec{\mu_b}) , \ \mathbf{\Lambda}_{aa}^{-1})$$


#### Marginal Distribution
- If $k = a$, then $l=b$
- If $k=b$, then $l = a$

$$p(\vec{x_k}) = N(\vec{x_k}|\vec{\mu_k}, \mathbf{\Sigma}_{kk}) $$
Or 
$$p(\vec{x_k}) = N(\vec{x_k}|\vec{\mu_k}, \ (\mathbf{\Lambda}_{kk} - \mathbf{\Lambda}_{kl}\mathbf{\Lambda}_{ll}^{-1}\mathbf{\Lambda}_{lk})^{-1} ) $$

---

## Bayes' Theorem for Gaussian Variables

Suppose that we are given a Gaussian marginal distribution $p(\vec{x})$ and a Gaussian conditional distribution $p(\vec{y}|\vec{x})$. We shall take the distributions to be 
$$p(\vec{x}) = N(\vec{x}|\vec{\mu}, \mathbf{\Lambda}^{-1} ) $$
$$p(\vec{y}|\vec{x}) = N(\vec{y}|\mathbf{A}\vec{x}+\vec{b}, \mathbf{L}^{-1} ) $$

Looking at the conditional Gaussian distribution formula, note we have that $\vec{\mu}_{a|b}$ is a linear function of $\vec{x_b}$. Here we shall suppose that the Gaussian conditional distribution we are given, $p(\vec{y}|\vec{x})$, has a mean that is a linear function of $\vec{x}$, and a covariance which is independent of $\vec{x}$.

First we find an expression for the joint distribution over $\vec{x}$ and $\vec{y}$. To do this, we define
$$ \vec{z} = \begin{pmatrix}  \vec{x} \\  \vec{y} \end{pmatrix} $$

Then taking the log of the joint distribution we have
$$ \ln \ p(\vec{z}) = \ln \ p(\vec{x}) + \ln \ p(\vec{y}|\vec{x}) = $$
$$
-\dfrac{1}{2}(\vec{x}-\vec{\mu})^T \mathbf{\Lambda} (\vec{x}-\vec{\mu})
-\dfrac{1}{2}(\vec{y}-\mathbf{A}\vec{x}-\vec{b})^T \mathbf{L} (\vec{y}-\mathbf{A}\vec{x}-\vec{b})
+ const.
$$

Expanding the squares, we obtain the following second-order terms
$$
-\dfrac{1}{2}\vec{x}^T \mathbf{\Lambda} \vec{x}
- \dfrac{1}{2}\vec{y}^T \mathbf{L} \vec{y}
+ \dfrac{1}{2}\vec{y}^T  \mathbf{L}\mathbf{A} \vec{x}
+ \dfrac{1}{2}\vec{x}^T \mathbf{A}^T\mathbf{L} \vec{y}
- \dfrac{1}{2}\vec{x}^T \mathbf{A}^T\mathbf{L}\mathbf{A} \vec{x}
$$

Combining the last term with the first term we have
$$
-\dfrac{1}{2}\vec{x}^T (\mathbf{\Lambda} + \mathbf{A}^T\mathbf{L}\mathbf{A}) \vec{x}
- \dfrac{1}{2}\vec{y}^T \mathbf{L} \vec{y}
+ \dfrac{1}{2}\vec{y}^T  \mathbf{L}\mathbf{A} \vec{x}
+ \dfrac{1}{2}\vec{x}^T \mathbf{A}^T\mathbf{L} \vec{y}
$$

So that we can rewrite the second order terms as
$$
-\dfrac{1}{2}
\begin{pmatrix} \vec{x} \\ \vec{y} \end{pmatrix}^T
\begin{pmatrix} \mathbf{\Lambda} + \mathbf{A}^T\mathbf{L}\mathbf{A} & -\mathbf{A}^T\mathbf{L} \\ -\mathbf{L}\mathbf{A} & \mathbf{L} \end{pmatrix}
\begin{pmatrix} \vec{x} \\ \vec{y} \end{pmatrix} = -\dfrac{1}{2}\vec{z}^T\mathbf{R}\vec{z} $$

And so the covariance of $\vec{z}$ is given by the inverse of the precision $\mathbf{R}$
$$ cov[\vec{z}] = \mathbf{R}^{-1} =\begin{pmatrix} \mathbf{\Lambda} + \mathbf{A}^T\mathbf{L}\mathbf{A} & -\mathbf{A}^T\mathbf{L} \\ -\mathbf{L}\mathbf{A} & \mathbf{L} \end{pmatrix}^ {-1} $$

Using the identity for the inverse of a partitioned matrix from below we have
$$\color{red}{ cov[\vec{z}]  = \begin{pmatrix} \mathbf{\Lambda}^{-1} & \mathbf{\Lambda}^{-1}\mathbf{A}^T \\ \mathbf{A}\mathbf{\Lambda}^{-1} & \mathbf{L}^{-1} + \mathbf{A}\mathbf{\Lambda}^{-1}\mathbf{A}^T  \end{pmatrix}   }$$
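
One way to gain confidence in this inverse is to multiply it by $\mathbf{R}$ and confirm that the identity comes out. The sketch below does so for scalar $\vec{x}$ and $\vec{y}$, where $\mathbf{\Lambda}$, $\mathbf{A}$ and $\mathbf{L}$ are plain numbers; the names `lam`, `a`, `ell` and the chosen values are assumptions made for illustration.

```python
def joint_precision(lam, a, ell):
    """R for scalar x and y: [[lam + a*ell*a, -a*ell], [-ell*a, ell]]."""
    return [[lam + a * ell * a, -a * ell], [-ell * a, ell]]

def joint_covariance(lam, a, ell):
    """Claimed inverse of R: [[1/lam, a/lam], [a/lam, 1/ell + a*a/lam]]."""
    return [[1.0 / lam, a / lam], [a / lam, 1.0 / ell + a * a / lam]]

def matmul_2x2(p, q):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

lam, a, ell = 2.0, 3.0, 0.5   # illustrative values
product = matmul_2x2(joint_precision(lam, a, ell), joint_covariance(lam, a, ell))
```

For these values `product` is the $2 \times 2$ identity up to rounding, as expected if the second matrix is the inverse of the first.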

Similarly, we can find the mean of the Gaussian distribution over $\vec{z}$ by identifying the linear terms


$$
\vec{x}^T \mathbf{\Lambda}\vec{\mu}- \vec{x}^T \mathbf{A}^T\mathbf{L} \vec{b} + \vec{y}^T \mathbf{L} \vec{b}
$$

and rewrite so we have the linear terms in the form
$$\begin{pmatrix} \vec{x} \\ \vec{y} \end{pmatrix}^T
\begin{pmatrix} \mathbf{\Lambda}\vec{\mu} - \mathbf{A}^T\mathbf{L} \vec{b} \\ \mathbf{L} \vec{b} \end{pmatrix} $$

Hence the mean is given by
$$ E[\vec{z}] = cov[\vec{z}]\begin{pmatrix} \mathbf{\Lambda}\vec{\mu} - \mathbf{A}^T\mathbf{L} \vec{b} \\ \mathbf{L} \vec{b} \end{pmatrix} = \begin{pmatrix} \mathbf{\Lambda}^{-1} & \mathbf{\Lambda}^{-1}\mathbf{A}^T \\ \mathbf{A}\mathbf{\Lambda}^{-1} & \mathbf{L}^{-1} + \mathbf{A}\mathbf{\Lambda}^{-1}\mathbf{A}^T  \end{pmatrix}   \begin{pmatrix} \mathbf{\Lambda}\vec{\mu} - \mathbf{A}^T\mathbf{L} \vec{b} \\ \mathbf{L} \vec{b} \end{pmatrix} $$

$$ \color{red}{ E[\vec{z}] = \begin{pmatrix} \vec{\mu} \\  \mathbf{A}\vec{\mu} + \vec{b} \end{pmatrix} }$$
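
For readers who want the intermediate algebra, the block multiplication can be written out row by row. The first row gives

$$\mathbf{\Lambda}^{-1}(\mathbf{\Lambda}\vec{\mu} - \mathbf{A}^T\mathbf{L}\vec{b}) + \mathbf{\Lambda}^{-1}\mathbf{A}^T\mathbf{L}\vec{b} = \vec{\mu} - \mathbf{\Lambda}^{-1}\mathbf{A}^T\mathbf{L}\vec{b} + \mathbf{\Lambda}^{-1}\mathbf{A}^T\mathbf{L}\vec{b} = \vec{\mu}$$

and the second row gives

$$\mathbf{A}\mathbf{\Lambda}^{-1}(\mathbf{\Lambda}\vec{\mu} - \mathbf{A}^T\mathbf{L}\vec{b}) + (\mathbf{L}^{-1} + \mathbf{A}\mathbf{\Lambda}^{-1}\mathbf{A}^T)\mathbf{L}\vec{b} = \mathbf{A}\vec{\mu} - \mathbf{A}\mathbf{\Lambda}^{-1}\mathbf{A}^T\mathbf{L}\vec{b} + \vec{b} + \mathbf{A}\mathbf{\Lambda}^{-1}\mathbf{A}^T\mathbf{L}\vec{b} = \mathbf{A}\vec{\mu} + \vec{b}$$

In both rows the terms involving $\mathbf{A}^T\mathbf{L}\vec{b}$ cancel.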

We can now move towards finding an expression for the marginal distribution $p(\vec{y})$ and the conditional distribution $p(\vec{x}|\vec{y})$.

Let us start with the marginal distribution 
$$p(\vec{y}) = N(\vec{y}|E[\vec{y}], cov [\vec{y}])$$
We can see from the expectation of $\vec{z}$ we have
$$\color{red}{ E[\vec{y}] =  \mathbf{A}\vec{\mu} + \vec{b}}$$
For the covariance, recall that the marginal distribution formula gave $\mathbf{\Sigma}_k = \mathbf{\Sigma}_{kk}$ for $\vec{x_k}$.
Hence
$$\color{red}{ cov [\vec{y}] =  cov[\vec{z}]_{bb} = \mathbf{L}^{-1} + \mathbf{A}\mathbf{\Lambda}^{-1}\mathbf{A}^T }$$

Let us now move on to the conditional distribution.
$$p(\vec{x}|\vec{y}) = N(\vec{x}|E[\vec{x}|\vec{y}], cov[\vec{x}|\vec{y}])$$

Using the conditional Gaussian distribution formula, with $\mathbf{R} = cov[\vec{z}]^{-1}$ playing the role of the partitioned precision matrix, we have 
$$
\color{red}{cov[\vec{x}|\vec{y}] = \mathbf{R}_{aa}^{-1} =  (\mathbf{\Lambda} + \mathbf{A}^T\mathbf{L}\mathbf{A})^{-1}}
$$

and for the mean we have
$$
E[\vec{x}|\vec{y}] 
= E[\vec{x}] - cov[\vec{x}|\vec{y}] \, \mathbf{R}_{ab}(\vec{y} - E[\vec{y}])
$$

$$
= \vec{\mu} -   (\mathbf{\Lambda} + \mathbf{A}^T\mathbf{L}\mathbf{A})^{-1} (-\mathbf{A}^T\mathbf{L} ) [\vec{y} - \mathbf{A}\vec{\mu} - \vec{b}]
$$

$$
= \vec{\mu} +   (\mathbf{\Lambda} + \mathbf{A}^T\mathbf{L}\mathbf{A})^{-1} (\mathbf{A}^T\mathbf{L} ) [\vec{y} - \mathbf{A}\vec{\mu} - \vec{b}]
$$

$$
= (\mathbf{\Lambda} + \mathbf{A}^T\mathbf{L}\mathbf{A})^{-1} [\mathbf{A}^T\mathbf{L} (\vec{y} - \mathbf{A}\vec{\mu} - \vec{b}) + (\mathbf{\Lambda} + \mathbf{A}^T\mathbf{L}\mathbf{A})\vec{\mu}]
$$

$$
= (\mathbf{\Lambda} + \mathbf{A}^T\mathbf{L}\mathbf{A})^{-1}  [\mathbf{A}^T\mathbf{L}(\vec{y} - \vec{b}) - \mathbf{A}^T\mathbf{L}\mathbf{A}\vec{\mu} + (\mathbf{\Lambda} + \mathbf{A}^T\mathbf{L}\mathbf{A})\vec{\mu}]
$$

$$
\color{red}{E[\vec{x}|\vec{y}] = (\mathbf{\Lambda} + \mathbf{A}^T\mathbf{L}\mathbf{A})^{-1}  [\mathbf{A}^T\mathbf{L}(\vec{y} - \vec{b}) + \mathbf{\Lambda}\vec{\mu}]}
$$

### Bayes' Theorem for Gaussian Variables Summary
Given 
$$p(\vec{x}) = N(\vec{x}|\vec{\mu}, \mathbf{\Lambda}^{-1} ) $$
$$p(\vec{y}|\vec{x}) = N(\vec{y}|\mathbf{A}\vec{x}+\vec{b}, \mathbf{L}^{-1} ) $$
we have
$$p(\vec{y}) = N(\vec{y}|\mathbf{A}\vec{\mu} + \vec{b}, \ \mathbf{L}^{-1} + \mathbf{A}\mathbf{\Lambda}^{-1}\mathbf{A}^T )$$
and
$$p(\vec{x}|\vec{y}) =  N(\vec{x}|\mathbf{\Sigma}  [\mathbf{A}^T\mathbf{L}(\vec{y} - \vec{b}) + \mathbf{\Lambda}\vec{\mu}], \ \mathbf{\Sigma})$$
where $\mathbf{\Sigma} = (\mathbf{\Lambda} + \mathbf{A}^T\mathbf{L}\mathbf{A})^{-1}$.
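
The summary formulas can be exercised in the scalar case, where every matrix reduces to a number. The following is a sketch under that assumption; the function name `linear_gaussian_posterior`, the argument names, and the example numbers are illustrative and not part of the derivation.

```python
def linear_gaussian_posterior(mu, lam, a, b, ell, y):
    """Scalar case of the summary: returns (E[y], var[y], E[x|y], var[x|y]).

    Prior p(x) = N(x | mu, 1/lam); likelihood p(y|x) = N(y | a*x + b, 1/ell).
    """
    marg_mean = a * mu + b
    marg_var = 1.0 / ell + a * a / lam
    post_var = 1.0 / (lam + a * ell * a)
    post_mean = post_var * (a * ell * (y - b) + lam * mu)
    return marg_mean, marg_var, post_mean, post_var

# Standard normal prior, y = x + unit-variance noise, observed y = 2
marg_mean, marg_var, post_mean, post_var = linear_gaussian_posterior(
    mu=0.0, lam=1.0, a=1.0, b=0.0, ell=1.0, y=2.0)
```

With equal prior and noise precision, the posterior mean lands halfway between the prior mean 0 and the observation 2, and the posterior variance is half the prior variance. The marginal variance of `y` is the sum of the prior and noise variances.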

---