---
layout: proof
mathjax: true

author: "Joram Soch"
affiliation: "BCCN Berlin"
e_mail: "joram.soch@bccn-berlin.de"
date: 2024-01-19 00:51:30 -0800

title: "Posterior distribution for Bayesian linear regression with known covariance"
chapter: "Statistical Models"
section: "Univariate normal data"
topic: "Bayesian linear regression with known covariance"
theorem: "Posterior distribution"

sources:
  - authors: "Bishop CM"
    year: 2006
    title: "Bayesian linear regression"
    in: "Pattern Recognition and Machine Learning"
    pages: "pp. 152-161, eqs. 3.49-3.51, ex. 3.7"
    url: ""
  - authors: "Penny WD"
    year: 2012
    title: "Comparing Dynamic Causal Models using AIC, BIC and Free Energy"
    in: "NeuroImage"
    pages: "vol. 59, iss. 2, pp. 319-330, eq. 27"
    url: ""
    doi: "10.1016/j.neuroimage.2011.07.039"

proof_id: "P433"
shortcut: "blrkc-post"
username: "JoramSoch"
---
**Theorem:** Let
$$ \label{eq:GLM}
y = X \beta + \varepsilon, \; \varepsilon \sim \mathcal{N}(0, \Sigma)
$$
be a linear regression model with measured $n \times 1$ data vector $y$, known $n \times p$ design matrix $X$ and known $n \times n$ covariance matrix $\Sigma$ as well as unknown $p \times 1$ regression coefficients $\beta$. Moreover, assume a multivariate normal distribution over the model parameter $\beta$:
$$ \label{eq:GLM-N-prior}
p(\beta) = \mathcal{N}(\beta; \mu_0, \Sigma_0) \; .
$$
Then, the posterior distribution is also a multivariate normal distribution
$$ \label{eq:GLM-N-post}
p(\beta|y) = \mathcal{N}(\beta; \mu_n, \Sigma_n)
$$
and the posterior hyperparameters are given by
$$ \label{eq:GLM-N-post-par}
\begin{split}
\mu_n &= \Sigma_n (X^\mathrm{T} \Sigma^{-1} y + \Sigma_0^{-1} \mu_0) \\
\Sigma_n &= \left( X^\mathrm{T} \Sigma^{-1} X + \Sigma_0^{-1} \right)^{-1} \; .
\end{split}
$$
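
As a minimal numerical sketch, the posterior hyperparameters \eqref{eq:GLM-N-post-par} can be computed directly with NumPy; the dimensions, design matrix, noise covariance and prior values below are arbitrary illustrative choices, not taken from the sources above:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 20, 3

X      = rng.standard_normal((n, p))             # known design matrix
Sigma  = 0.5 * np.eye(n)                         # known noise covariance
mu0    = np.zeros(p)                             # prior mean
Sigma0 = 10.0 * np.eye(p)                        # prior covariance
beta   = np.array([1.0, -0.5, 2.0])              # ground-truth coefficients
y      = X @ beta + rng.multivariate_normal(np.zeros(n), Sigma)

P, P0   = np.linalg.inv(Sigma), np.linalg.inv(Sigma0)    # precision matrices
Sigma_n = np.linalg.inv(X.T @ P @ X + P0)                # posterior covariance
mu_n    = Sigma_n @ (X.T @ P @ y + P0 @ mu0)             # posterior mean
print(mu_n)   # concentrates around beta as n grows
```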
**Proof:** According to Bayes' theorem, the posterior distribution is given by
$$ \label{eq:GLM-N-BT}
p(\beta|y) = \frac{p(y|\beta) \, p(\beta)}{p(y)} \; .
$$
Since $p(y)$ is just a normalization factor, the posterior is proportional to the numerator:
$$ \label{eq:GLM-N-post-JL}
p(\beta|y) \propto p(y|\beta) \, p(\beta) = p(y,\beta) \; .
$$
Equation \eqref{eq:GLM} implies the following likelihood function:
$$ \label{eq:GLM-LF}
p(y|\beta) = \mathcal{N}(y; X \beta, \Sigma) = \sqrt{\frac{1}{(2 \pi)^n |\Sigma|}} \, \exp\left[ -\frac{1}{2} (y-X\beta)^\mathrm{T} \Sigma^{-1} (y-X\beta) \right] \; .
$$
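
This is simply the multivariate normal density evaluated at $y$ with mean $X\beta$; a self-contained sketch checking the explicit formula above against SciPy's `scipy.stats.multivariate_normal` (all inputs are arbitrary examples):

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
n, p = 5, 2
X     = rng.standard_normal((n, p))
Sigma = np.diag(rng.uniform(0.5, 1.5, size=n))   # known noise covariance
beta  = rng.standard_normal(p)
y     = rng.standard_normal(n)

# hand-written density from the equation above
r = y - X @ beta
manual = (np.sqrt(1.0 / ((2 * np.pi) ** n * np.linalg.det(Sigma)))
          * np.exp(-0.5 * r @ np.linalg.inv(Sigma) @ r))
print(np.isclose(manual, multivariate_normal.pdf(y, mean=X @ beta, cov=Sigma)))  # True
```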
Combining the likelihood function \eqref{eq:GLM-LF} with the prior distribution \eqref{eq:GLM-N-prior} using the probability density function of the multivariate normal distribution, the joint likelihood of the model is given by
$$ \label{eq:GLM-N-JL-s1}
\begin{split}
p(y,\beta) = \; & p(y|\beta) \, p(\beta) \\
= \; & \sqrt{\frac{1}{(2 \pi)^n |\Sigma|}} \, \exp\left[ -\frac{1}{2} (y-X\beta)^\mathrm{T} \Sigma^{-1} (y-X\beta) \right] \cdot \\
\; & \sqrt{\frac{1}{(2 \pi)^p |\Sigma_0|}} \, \exp\left[ -\frac{1}{2} (\beta-\mu_0)^\mathrm{T} \Sigma_0^{-1} (\beta-\mu_0) \right] \; .
\end{split}
$$
Collecting identical variables gives:
$$ \label{eq:GLM-N-JL-s2}
\begin{split}
p(y,\beta) = \; & \sqrt{\frac{1}{(2 \pi)^{n+p} |\Sigma| |\Sigma_0|}} \cdot \\
& \exp\left[ -\frac{1}{2} \left( (y-X\beta)^\mathrm{T} \Sigma^{-1} (y-X\beta) + (\beta-\mu_0)^\mathrm{T} \Sigma_0^{-1} (\beta-\mu_0) \right) \right] \; .
\end{split}
$$
Expanding the products in the exponent gives:
$$ \label{eq:GLM-N-JL-s3}
\begin{split}
p(y,\beta) = \; & \sqrt{\frac{1}{(2 \pi)^{n+p} |\Sigma| |\Sigma_0|}} \cdot \\
& \exp\left[ -\frac{1}{2} \left( y^\mathrm{T} \Sigma^{-1} y - y^\mathrm{T} \Sigma^{-1} X \beta - \beta^\mathrm{T} X^\mathrm{T} \Sigma^{-1} y + \beta^\mathrm{T} X^\mathrm{T} \Sigma^{-1} X \beta + \right. \right. \\
& \hphantom{\exp \left[ -\frac{1}{2} \right.} \; \left. \left. \beta^\mathrm{T} \Sigma_0^{-1} \beta - \beta^\mathrm{T} \Sigma_0^{-1} \mu_0 - \mu_0^\mathrm{T} \Sigma_0^{-1} \beta + \mu_0^\mathrm{T} \Sigma_0^{-1} \mu_0 \right) \right] \; .
\end{split}
$$
Regrouping the terms in the exponent gives:
$$ \label{eq:GLM-N-JL-s4}
\begin{split}
p(y,\beta) = \; & \sqrt{\frac{1}{(2 \pi)^{n+p} |\Sigma| |\Sigma_0|}} \cdot \\
& \exp\left[ -\frac{1}{2} \left( \beta^\mathrm{T} [ X^\mathrm{T} \Sigma^{-1} X + \Sigma_0^{-1} ] \beta - 2 \beta^\mathrm{T} [X^\mathrm{T} \Sigma^{-1} y + \Sigma_0^{-1} \mu_0] + \right. \right. \\
& \hphantom{\exp \left[ -\frac{1}{2} \right.} \; \left. \left. y^\mathrm{T} \Sigma^{-1} y + \mu_0^\mathrm{T} \Sigma_0^{-1} \mu_0 \right) \right] \; .
\end{split}
$$
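Abbreviating $A = X^\mathrm{T} \Sigma^{-1} X + \Sigma_0^{-1}$ and $b = X^\mathrm{T} \Sigma^{-1} y + \Sigma_0^{-1} \mu_0$ (shorthands used only in this step), the $\beta$-dependent part of the exponent has the form $\beta^\mathrm{T} A \beta - 2 \beta^\mathrm{T} b$, so the standard identity for symmetric invertible $A$
$$
\beta^\mathrm{T} A \beta - 2 \beta^\mathrm{T} b = (\beta - A^{-1} b)^\mathrm{T} A \, (\beta - A^{-1} b) - b^\mathrm{T} A^{-1} b
$$
applies with $\Sigma_n = A^{-1}$ and $\mu_n = \Sigma_n b$, where $b^\mathrm{T} A^{-1} b = \mu_n^\mathrm{T} \Sigma_n^{-1} \mu_n$.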
Completing the square over $\beta$, we finally have
$$ \label{eq:GLM-N-JL-s5}
\begin{split}
p(y,\beta) = \; & \sqrt{\frac{1}{(2 \pi)^{n+p} |\Sigma| |\Sigma_0|}} \cdot \\
& \exp\left[ -\frac{1}{2} \left( (\beta-\mu_n)^\mathrm{T} \Sigma_n^{-1} (\beta-\mu_n) + (y^\mathrm{T} \Sigma^{-1} y + \mu_0^\mathrm{T} \Sigma_0^{-1} \mu_0 - \mu_n^\mathrm{T} \Sigma_n^{-1} \mu_n) \right) \right]
\end{split}
$$
with the posterior hyperparameters
$$ \label{eq:GLM-N-post-par-qed}
\begin{split}
\mu_n &= \Sigma_n (X^\mathrm{T} \Sigma^{-1} y + \Sigma_0^{-1} \mu_0) \\
\Sigma_n &= \left( X^\mathrm{T} \Sigma^{-1} X + \Sigma_0^{-1} \right)^{-1} \; .
\end{split}
$$
Ergo, the joint likelihood is proportional to
$$ \label{eq:GLM-N-JL-s6}
p(y,\beta) \propto \exp\left[ -\frac{1}{2} (\beta-\mu_n)^\mathrm{T} \Sigma_n^{-1} (\beta-\mu_n) \right] \; ,
$$
such that the posterior distribution over $\beta$ is given by
$$ \label{eq:GLM-N-post-qed}
p(\beta|y) = \mathcal{N}(\beta; \mu_n, \Sigma_n)
$$
with the posterior hyperparameters given in \eqref{eq:GLM-N-post-par-qed}.
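
Since \eqref{eq:GLM-N-BT} implies $\log p(y|\beta) + \log p(\beta) - \log p(\beta|y) = \log p(y)$ for every $\beta$, the whole result can also be checked numerically: with the hyperparameters \eqref{eq:GLM-N-post-par-qed}, this difference must not depend on $\beta$. A sketch of this check under arbitrary simulated inputs:

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(2)
n, p = 10, 3
X      = rng.standard_normal((n, p))
Sigma  = np.cov(rng.standard_normal((n, 3 * n))) + np.eye(n)   # known SPD covariance
Sigma0 = np.cov(rng.standard_normal((p, 3 * p))) + np.eye(p)   # prior covariance
mu0    = rng.standard_normal(p)                                # prior mean
y      = X @ rng.standard_normal(p) + rng.multivariate_normal(np.zeros(n), Sigma)

P, P0   = np.linalg.inv(Sigma), np.linalg.inv(Sigma0)
Sigma_n = np.linalg.inv(X.T @ P @ X + P0)                      # posterior covariance
mu_n    = Sigma_n @ (X.T @ P @ y + P0 @ mu0)                   # posterior mean

def log_py(beta):
    """log p(y|beta) + log p(beta) - log p(beta|y), which equals log p(y)."""
    return (multivariate_normal.logpdf(y, X @ beta, Sigma)
            + multivariate_normal.logpdf(beta, mu0, Sigma0)
            - multivariate_normal.logpdf(beta, mu_n, Sigma_n))

b1, b2 = rng.standard_normal(p), rng.standard_normal(p)
print(np.isclose(log_py(b1), log_py(b2)))   # True: the value is independent of beta
```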