Merged
50 changes: 50 additions & 0 deletions D/dev.md
@@ -0,0 +1,50 @@
---
layout: definition
mathjax: true

author: "Joram Soch"
affiliation: "BCCN Berlin"
e_mail: "joram.soch@bccn-berlin.de"
date: 2022-03-01 07:48:00

title: "Deviance"
chapter: "Model Selection"
section: "Classical information criteria"
topic: "Deviance information criterion"
definition: "Deviance"

sources:
  - authors: "Spiegelhalter DJ, Best NG, Carlin BP, Van Der Linde A"
    year: 2002
    title: "Bayesian measures of model complexity and fit"
    in: "Journal of the Royal Statistical Society, Series B: Statistical Methodology"
    pages: "vol. 64, iss. 4, pp. 583-639"
    url: "https://rss.onlinelibrary.wiley.com/doi/10.1111/1467-9868.00353"
    doi: "10.1111/1467-9868.00353"
  - authors: "Soch J, Allefeld C"
    year: 2018
    title: "MACS – a new SPM toolbox for model assessment, comparison and selection"
    in: "Journal of Neuroscience Methods"
    pages: "vol. 306, pp. 19-31, eqs. 10-12"
    url: "https://www.sciencedirect.com/science/article/pii/S0165027018301468"
    doi: "10.1016/j.jneumeth.2018.05.017"
  - authors: "Wikipedia"
    year: 2022
    title: "Deviance information criterion"
    in: "Wikipedia, the free encyclopedia"
    pages: "retrieved on 2022-03-01"
    url: "https://en.wikipedia.org/wiki/Deviance_information_criterion#Definition"

def_id: "D172"
shortcut: "dev"
username: "JoramSoch"
---


**Definition:** Let there be a [generative model](/D/gm) $m$ describing measured data $y$ using model parameters $\theta$. Then, the deviance of $m$ is a function of $\theta$, obtained by multiplying the [log-likelihood function](/D/llf) by $-2$:

$$ \label{eq:dev}
D(\theta) = -2 \log p(y|\theta,m) \; .
$$

The deviance function is used in the definition of the [deviance information criterion](/D/dic).
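
The definition can be illustrated with a minimal sketch for a simple model that is not part of the original text: i.i.d. Bernoulli observations with success probability $\theta$. The function name `bernoulli_deviance` and the data are hypothetical and serve only to show how $D(\theta) = -2 \log p(y|\theta,m)$ is evaluated.

```python
import math

def bernoulli_deviance(y, theta):
    """Deviance D(theta) = -2 * log p(y | theta) for i.i.d. Bernoulli data."""
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must lie strictly between 0 and 1")
    # Log-likelihood: sum of log(theta) for successes, log(1 - theta) for failures.
    log_lik = sum(
        math.log(theta) if y_i == 1 else math.log(1.0 - theta)
        for y_i in y
    )
    return -2.0 * log_lik

y = [1, 0, 1, 1, 0, 1, 1, 1]      # hypothetical measured data
theta_hat = sum(y) / len(y)       # maximum likelihood estimate (6/8 = 0.75)

for theta in (0.25, 0.50, theta_hat):
    print(f"theta = {theta:.2f}: D(theta) = {bernoulli_deviance(y, theta):.4f}")
```

Because the deviance is a decreasing function of the log-likelihood, it is smallest at the maximum likelihood estimate, here `theta_hat`.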
13 changes: 8 additions & 5 deletions I/ToC.md
@@ -548,16 +548,18 @@ title: "Table of Contents"
&emsp;&ensp; 1.4.14. **[Weighted least squares](/P/mlr-wls2)** (2) <br>
&emsp;&ensp; 1.4.15. **[Maximum likelihood estimation](/P/mlr-mle)** <br>
&emsp;&ensp; 1.4.16. **[Maximum log-likelihood](/P/mlr-mll)** <br>
- &emsp;&ensp; 1.4.17. **[Akaike information criterion](/P/mlr-aic)** <br>
- &emsp;&ensp; 1.4.18. **[Bayesian information criterion](/P/mlr-bic)** <br>
- &emsp;&ensp; 1.4.19. **[Corrected Akaike information criterion](/P/mlr-aicc)** <br>
+ &emsp;&ensp; 1.4.17. **[Deviance function](/P/mlr-dev)** <br>
+ &emsp;&ensp; 1.4.18. **[Akaike information criterion](/P/mlr-aic)** <br>
+ &emsp;&ensp; 1.4.19. **[Bayesian information criterion](/P/mlr-bic)** <br>
+ &emsp;&ensp; 1.4.20. **[Corrected Akaike information criterion](/P/mlr-aicc)** <br>

1.5. Bayesian linear regression <br>
&emsp;&ensp; 1.5.1. **[Conjugate prior distribution](/P/blr-prior)** <br>
&emsp;&ensp; 1.5.2. **[Posterior distribution](/P/blr-post)** <br>
&emsp;&ensp; 1.5.3. **[Log model evidence](/P/blr-lme)** <br>
- &emsp;&ensp; 1.5.4. **[Posterior probability of alternative hypothesis](/P/blr-pp)** <br>
- &emsp;&ensp; 1.5.5. **[Posterior credibility region excluding null hypothesis](/P/blr-pcr)** <br>
+ &emsp;&ensp; 1.5.4. **[Deviance information criterion](/P/blr-dic)** <br>
+ &emsp;&ensp; 1.5.5. **[Posterior probability of alternative hypothesis](/P/blr-pp)** <br>
+ &emsp;&ensp; 1.5.6. **[Posterior credibility region excluding null hypothesis](/P/blr-pcr)** <br>

2. Multivariate normal data

@@ -664,6 +666,7 @@ title: "Table of Contents"

2.3. Deviance information criterion <br>
&emsp;&ensp; 2.3.1. *[Definition](/D/dic)* <br>
+ &emsp;&ensp; 2.3.2. *[Deviance](/D/dev)* <br>

3. Bayesian model selection

166 changes: 166 additions & 0 deletions P/blr-dic.md
@@ -0,0 +1,166 @@
---
layout: proof
mathjax: true

author: "Joram Soch"
affiliation: "BCCN Berlin"
e_mail: "joram.soch@bccn-berlin.de"
date: 2022-03-01 12:10:00

title: "Deviance information criterion for multiple linear regression"
chapter: "Statistical Models"
section: "Univariate normal data"
topic: "Bayesian linear regression"
theorem: "Deviance information criterion"

sources:

proof_id: "P313"
shortcut: "blr-dic"
username: "JoramSoch"
---


**Theorem:** Consider a [linear regression model](/D/mlr) $m$

$$ \label{eq:mlr}
m: \; y = X\beta + \varepsilon, \; \varepsilon \sim \mathcal{N}(0, \sigma^2 V), \; \sigma^2 V = (\tau P)^{-1}
$$

with a [normal-gamma prior distribution](/P/blr-prior)

$$ \label{eq:blr-prior}
p(\beta,\tau) = \mathcal{N}(\beta; \mu_0, (\tau \Lambda_0)^{-1}) \cdot \mathrm{Gam}(\tau; a_0, b_0) \; .
$$

Then, the [deviance information criterion](/D/dic) for this model is

$$ \label{eq:mlr-dic}
\begin{split}
\mathrm{DIC}(m) &= n \cdot \log(2\pi) - n \left[ 2 \psi(a_n) - \log(a_n) - \log(b_n) \right] - \log|P| \\
&+ \frac{a_n}{b_n} (y - X\mu_n)^\mathrm{T} P (y - X\mu_n) + 2 \, \mathrm{tr}\left( X^\mathrm{T} P X \Lambda_n^{-1} \right)
\end{split}
$$

where $n$ is the number of observations, $\psi(\cdot)$ is the digamma function, and $\mu_n$, $\Lambda_n$, $a_n$ and $b_n$ are the [posterior parameters](/D/post) describing the [posterior distribution in Bayesian linear regression](/P/blr-post).


**Proof:** The [deviance for multiple linear regression](/P/mlr-dev) is

$$ \label{eq:mlr-dev-s1}
D(\beta,\sigma^2) = n \cdot \log(2\pi) + n \cdot \log(\sigma^2) + \log|V| + \frac{1}{\sigma^2} (y - X\beta)^\mathrm{T} V^{-1} (y - X\beta)
$$

which, applying the equalities $\tau = 1/\sigma^2$ and $P = V^{-1}$, becomes

$$ \label{eq:mlr-dev-s2}
D(\beta,\tau) = n \cdot \log(2\pi) - n \cdot \log(\tau) - \log|P| + \tau \cdot (y - X\beta)^\mathrm{T} P (y - X\beta) \; .
$$

The [deviance information criterion](/D/dic) (DIC) is defined as

$$ \label{eq:dic}
\mathrm{DIC}(m) = -2 \log p(y|\left\langle \beta \right\rangle, \left\langle \tau \right\rangle, m) + 2 \, p_D
$$

where $\log p(y|\left\langle \beta \right\rangle, \left\langle \tau \right\rangle, m)$ is the [log-likelihood function](/D/llf) at the posterior [expectations](/D/mean) and the "effective number of parameters" $p_D$ is the [difference between the expectation of the deviance and the deviance at the expectation](/D/dic):

$$ \label{eq:dic-pD}
p_D = \left\langle D(\beta,\tau) \right\rangle - D(\left\langle \beta \right\rangle, \left\langle \tau \right\rangle) \; .
$$

With that, the DIC for multiple linear regression becomes:

$$ \label{eq:mlr-dic-s1}
\begin{split}
\mathrm{DIC}(m) &= -2 \log p(y|\left\langle \beta \right\rangle, \left\langle \tau \right\rangle, m) + 2 \, p_D \\
&= D(\left\langle \beta \right\rangle, \left\langle \tau \right\rangle) + 2 \left[ \left\langle D(\beta,\tau) \right\rangle - D(\left\langle \beta \right\rangle, \left\langle \tau \right\rangle) \right] \\
&= 2 \left\langle D(\beta,\tau) \right\rangle - D(\left\langle \beta \right\rangle, \left\langle \tau \right\rangle) \; .
\end{split}
$$

The [posterior distribution for multiple linear regression](/P/blr-post) is

$$ \label{eq:blr-post}
p(\beta,\tau|y) = \mathcal{N}(\beta; \mu_n, (\tau \Lambda_n)^{-1}) \cdot \mathrm{Gam}(\tau; a_n, b_n)
$$

where the [posterior hyperparameters](/D/post) are given by

$$ \label{eq:blr-post-par}
\begin{split}
\mu_n &= \Lambda_n^{-1} (X^\mathrm{T} P y + \Lambda_0 \mu_0) \\
\Lambda_n &= X^\mathrm{T} P X + \Lambda_0 \\
a_n &= a_0 + \frac{n}{2} \\
b_n &= b_0 + \frac{1}{2} (y^\mathrm{T} P y + \mu_0^\mathrm{T} \Lambda_0 \mu_0 - \mu_n^\mathrm{T} \Lambda_n \mu_n) \; .
\end{split}
$$

Thus, we have the following posterior expectations:

$$ \label{eq:blr-post-beta}
\left\langle \beta \right\rangle_{\beta,\tau|y} = \mu_n
$$

$$ \label{eq:blr-post-tau}
\left\langle \tau \right\rangle_{\beta,\tau|y} = \frac{a_n}{b_n}
$$

$$ \label{eq:blr-post-log-tau}
\left\langle \log \tau \right\rangle_{\beta,\tau|y} = \psi(a_n) - \log(b_n)
$$

$$ \label{eq:blr-post-beta-qf}
\begin{split}
\left\langle \beta^\mathrm{T} A \beta \right\rangle_{\beta|\tau,y} &= \mu_n^\mathrm{T} A \mu_n + \mathrm{tr}\left( A (\tau \Lambda_n)^{-1} \right) \\
&= \mu_n^\mathrm{T} A \mu_n + \frac{1}{\tau} \mathrm{tr}\left( A \Lambda_n^{-1} \right) \; .
\end{split}
$$

In these identities, we have used the [mean of the multivariate normal distribution](/P/mvn-mean), the [mean of the gamma distribution](/P/gam-mean), the [logarithmic expectation of the gamma distribution](/P/gam-logmean), the [expectation of a quadratic form](/P/mean-qf) and the [covariance of the multivariate normal distribution](/P/mvn-cov).

With that, the deviance at the expectation is:

$$ \label{eq:mlr-dev-exp}
\begin{split}
D(\left\langle \beta \right\rangle, \left\langle \tau \right\rangle) &\overset{\eqref{eq:mlr-dev-s2}}{=} n \cdot \log(2\pi) - n \cdot \log(\left\langle \tau \right\rangle) - \log|P| + \left\langle \tau \right\rangle \cdot (y - X\left\langle \beta \right\rangle)^\mathrm{T} P (y - X\left\langle \beta \right\rangle) \\
&\overset{\eqref{eq:blr-post-beta}}{=} n \cdot \log(2\pi) - n \cdot \log(\left\langle \tau \right\rangle) - \log|P| + \left\langle \tau \right\rangle \cdot (y - X\mu_n)^\mathrm{T} P (y - X\mu_n) \\
&\overset{\eqref{eq:blr-post-tau}}{=} n \cdot \log(2\pi) - n \cdot \log\left(\frac{a_n}{b_n}\right) - \log|P| + \frac{a_n}{b_n} \cdot (y - X\mu_n)^\mathrm{T} P (y - X\mu_n) \; .
\end{split}
$$

Moreover, the expectation of the deviance is:

$$ \label{eq:mlr-exp-dev}
\begin{split}
\left\langle D(\beta,\tau) \right\rangle &\overset{\eqref{eq:mlr-dev-s2}}{=} \left\langle n \cdot \log(2\pi) - n \cdot \log(\tau) - \log|P| + \tau \cdot (y - X\beta)^\mathrm{T} P (y - X\beta) \right\rangle \\
&= n \cdot \log(2\pi) - n \cdot \left\langle \log(\tau) \right\rangle - \log|P| + \left\langle \tau \cdot (y - X\beta)^\mathrm{T} P (y - X\beta) \right\rangle \\
&\overset{\eqref{eq:blr-post-log-tau}}{=} n \cdot \log(2\pi) - n \cdot \left[ \psi(a_n) - \log(b_n) \right] - \log|P| \\
&+ \left\langle \tau \cdot \left\langle (y - X\beta)^\mathrm{T} P (y - X\beta) \right\rangle_{\beta|\tau,y} \right\rangle_{\tau|y} \\
&= n \cdot \log(2\pi) - n \cdot \left[ \psi(a_n) - \log(b_n) \right] - \log|P| \\
&+ \left\langle \tau \cdot \left\langle y^\mathrm{T} P y - y^\mathrm{T} P X\beta - \beta^\mathrm{T} X^\mathrm{T} P y + \beta^\mathrm{T} X^\mathrm{T} P X \beta \right\rangle_{\beta|\tau,y} \right\rangle_{\tau|y} \\
&\overset{\eqref{eq:blr-post-beta-qf}}{=} n \cdot \log(2\pi) - n \cdot \left[ \psi(a_n) - \log(b_n) \right] - \log|P| \\
&+ \left\langle \tau \cdot \left[ y^\mathrm{T} P y - y^\mathrm{T} P X\mu_n - \mu_n^\mathrm{T} X^\mathrm{T} P y + \mu_n^\mathrm{T} X^\mathrm{T} P X \mu_n + \frac{1}{\tau} \mathrm{tr}\left( X^\mathrm{T} P X \Lambda_n^{-1} \right) \right] \right\rangle_{\tau|y} \\
&= n \cdot \log(2\pi) - n \cdot \left[ \psi(a_n) - \log(b_n) \right] - \log|P| \\
&+ \left\langle \tau \cdot (y - X\mu_n)^\mathrm{T} P (y - X\mu_n) \right\rangle_{\tau|y} + \mathrm{tr}\left( X^\mathrm{T} P X \Lambda_n^{-1} \right) \\
&\overset{\eqref{eq:blr-post-tau}}{=} n \cdot \log(2\pi) - n \cdot \left[ \psi(a_n) - \log(b_n) \right] - \log|P| \\
&+ \frac{a_n}{b_n} \cdot (y - X\mu_n)^\mathrm{T} P (y - X\mu_n) + \mathrm{tr}\left( X^\mathrm{T} P X \Lambda_n^{-1} \right) \; .
\end{split}
$$
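
As an aside that is not stated in the original derivation, subtracting \eqref{eq:mlr-dev-exp} from \eqref{eq:mlr-exp-dev} gives the effective number of parameters from \eqref{eq:dic-pD} in closed form. The label `eq:mlr-pD` is introduced here only for illustration:

$$ \label{eq:mlr-pD}
\begin{split}
p_D &= \left\langle D(\beta,\tau) \right\rangle - D(\left\langle \beta \right\rangle, \left\langle \tau \right\rangle) \\
&= - n \cdot \left[ \psi(a_n) - \log(b_n) \right] + n \cdot \log\left(\frac{a_n}{b_n}\right) + \mathrm{tr}\left( X^\mathrm{T} P X \Lambda_n^{-1} \right) \\
&= n \cdot \left[ \log(a_n) - \psi(a_n) \right] + \mathrm{tr}\left( X^\mathrm{T} P X \Lambda_n^{-1} \right) \; .
\end{split}
$$

The terms $n \cdot \log(2\pi)$, $\log \lvert P \rvert$ and the quadratic form in $y - X\mu_n$ cancel, and the first remaining term is positive because $\log(a) > \psi(a)$ for all $a > 0$.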

Finally, combining the two terms, we have:

$$ \label{eq:mlr-dic-s2}
\begin{split}
\mathrm{DIC}(m) &\overset{\eqref{eq:mlr-dic-s1}}{=} 2 \left\langle D(\beta,\tau) \right\rangle - D(\left\langle \beta \right\rangle, \left\langle \tau \right\rangle) \\
&\overset{\eqref{eq:mlr-exp-dev}}{=} 2 \left[ n \cdot \log(2\pi) - n \cdot \left[ \psi(a_n) - \log(b_n) \right] - \log|P| \right. \\
&+ \left. \frac{a_n}{b_n} \cdot (y - X\mu_n)^\mathrm{T} P (y - X\mu_n) + \mathrm{tr}\left( X^\mathrm{T} P X \Lambda_n^{-1} \right) \right] \\
&\overset{\eqref{eq:mlr-dev-exp}}{-} \left[ n \cdot \log(2\pi) - n \cdot \log\left(\frac{a_n}{b_n}\right) - \log|P| + \frac{a_n}{b_n} \cdot (y - X\mu_n)^\mathrm{T} P (y - X\mu_n) \right] \\
&= n \cdot \log(2\pi) - 2 n \psi(a_n) + 2 n \log(b_n) + n \log(a_n) - n \log(b_n) - \log|P| \\
&+ \frac{a_n}{b_n} (y - X\mu_n)^\mathrm{T} P (y - X\mu_n) + 2 \, \mathrm{tr}\left( X^\mathrm{T} P X \Lambda_n^{-1} \right) \\
&= n \cdot \log(2\pi) - n \left[ 2 \psi(a_n) - \log(a_n) - \log(b_n) \right] - \log|P| \\
&+ \frac{a_n}{b_n} (y - X\mu_n)^\mathrm{T} P (y - X\mu_n) + 2 \, \mathrm{tr}\left( X^\mathrm{T} P X \Lambda_n^{-1} \right) \; .
\end{split}
$$

This conforms to equation \eqref{eq:mlr-dic}.
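
The building blocks of the proof can be checked numerically. The following is a minimal sketch under simplifying assumptions that are not in the original: a single regressor (so $\beta$, $\mu_n$ and $\Lambda_n$ are scalars), $P = I_n$, hypothetical data and prior hyperparameters, and a hand-written `digamma` helper using only the Python standard library. It evaluates the deviance at the expectation and the expected deviance in closed form, combines them as in \eqref{eq:mlr-dic-s1}, and compares the expected deviance against a Monte Carlo estimate from posterior samples.

```python
import math
import random

def digamma(x):
    """Digamma function psi(x) for x > 0 (recurrence plus asymptotic series)."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x              # psi(x) = psi(x + 1) - 1/x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - inv2 * (1/12 - inv2 * (1/120 - inv2 / 252))

# Hypothetical data: n = 6 observations, one regressor, P = I_n.
x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8]
n = len(y)
mu0, lam0, a0, b0 = 0.0, 1.0, 1.0, 1.0   # assumed prior hyperparameters

Sxx = sum(xi * xi for xi in x)
Sxy = sum(xi * yi for xi, yi in zip(x, y))
Syy = sum(yi * yi for yi in y)

# Posterior hyperparameters (scalar versions of the update equations).
lam_n = Sxx + lam0
mu_n = (Sxy + lam0 * mu0) / lam_n
a_n = a0 + n / 2
b_n = b0 + 0.5 * (Syy + mu0 * lam0 * mu0 - mu_n * lam_n * mu_n)

def rss(beta):
    """(y - x*beta)^T (y - x*beta), expanded in sufficient statistics."""
    return Syy - 2.0 * beta * Sxy + beta * beta * Sxx

def deviance(beta, tau):
    """D(beta, tau) with P = I_n, so that log|P| = 0."""
    return n * math.log(2 * math.pi) - n * math.log(tau) + tau * rss(beta)

# Deviance at the posterior expectations and expected deviance in closed form.
dev_at_mean = deviance(mu_n, a_n / b_n)
exp_dev = (n * math.log(2 * math.pi) - n * (digamma(a_n) - math.log(b_n))
           + (a_n / b_n) * rss(mu_n) + Sxx / lam_n)   # trace term is Sxx / lam_n
p_D = exp_dev - dev_at_mean
dic = dev_at_mean + 2.0 * p_D

# Monte Carlo estimate of the expected deviance from posterior samples.
rng = random.Random(0)
n_samples = 200_000
total = 0.0
for _ in range(n_samples):
    tau = rng.gammavariate(a_n, 1.0 / b_n)            # shape a_n, scale 1/b_n
    beta = rng.gauss(mu_n, 1.0 / math.sqrt(tau * lam_n))
    total += deviance(beta, tau)
exp_dev_mc = total / n_samples

print(f"p_D = {p_D:.4f}, DIC = {dic:.4f}")
print(f"<D> closed form = {exp_dev:.4f}, Monte Carlo = {exp_dev_mc:.4f}")
```

The two printed values of the expected deviance should agree up to Monte Carlo error. Note that `random.gammavariate` takes a scale parameter, so the rate $b_n$ enters as `1.0 / b_n`.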
104 changes: 104 additions & 0 deletions P/mlr-dev.md
@@ -0,0 +1,104 @@
---
layout: proof
mathjax: true

author: "Joram Soch"
affiliation: "BCCN Berlin"
e_mail: "joram.soch@bccn-berlin.de"
date: 2022-03-01 08:42:00

title: "Deviance for multiple linear regression"
chapter: "Statistical Models"
section: "Univariate normal data"
topic: "Multiple linear regression"
theorem: "Deviance function"

sources:

proof_id: "P312"
shortcut: "mlr-dev"
username: "JoramSoch"
---


**Theorem:** Consider a [linear regression model](/D/mlr) $m$ with [correlation structure](/D/corrmat) $V$

$$ \label{eq:mlr}
m: \; y = X\beta + \varepsilon, \; \varepsilon \sim \mathcal{N}(0, \sigma^2 V) \; .
$$

Then, the [deviance](/D/dev) for this model is

$$ \label{eq:mlr-dev-v1}
D(\beta,\sigma^2) = \mathrm{RSS}/\sigma^2 + n \cdot \left[ \log(\sigma^2) + \log(2\pi) \right]
$$

under [uncorrelated observations](/D/mlr), i.e. if $V = I_n$, and

$$ \label{eq:mlr-dev-v2}
D(\beta,\sigma^2) = \mathrm{wRSS}/\sigma^2 + n \cdot \left[ \log(\sigma^2) + \log(2\pi) \right] + \log|V| \; ,
$$

in the general case, i.e. if $V \neq I_n$, where $\mathrm{RSS}$ is the [residual sum of squares](/D/rss) and $\mathrm{wRSS}$ is the [weighted residual sum of squares](/P/mlr-wls2).


**Proof:** The [likelihood function](/D/lf) for multiple linear regression [is given by](/P/mlr-mle)

$$ \label{eq:mlr-lf}
\begin{split}
p(y|\beta,\sigma^2) &= \mathcal{N}(y; X\beta, \sigma^2 V) \\
&= \sqrt{\frac{1}{(2\pi)^n |\sigma^2 V|}} \cdot \exp\left[ -\frac{1}{2} (y - X\beta)^\mathrm{T} (\sigma^2 V)^{-1} (y - X\beta) \right] \; ,
\end{split}
$$

such that, with $\lvert \sigma^2 V \rvert = (\sigma^2)^n \lvert V \rvert$, the [log-likelihood function](/D/llf) for this model [becomes](/P/mlr-mle)

$$ \label{eq:mlr-llf}
\begin{split}
\mathrm{LL}(\beta,\sigma^2) = &\log p(y|\beta,\sigma^2) \\
= &- \frac{n}{2} \log(2\pi) - \frac{n}{2} \log (\sigma^2) - \frac{1}{2} \log |V| - \frac{1}{2 \sigma^2} (y - X\beta)^\mathrm{T} V^{-1} (y - X\beta) \; .
\end{split}
$$


The last term can be expressed in terms of the (weighted) [residual sum of squares](/D/rss) as

$$ \label{eq:mll-rss}
\begin{split}
- \frac{1}{2 \sigma^2} (y - X\beta)^\mathrm{T} V^{-1} (y - X\beta) &= - \frac{1}{2 \sigma^2} (Wy-WX\beta)^\mathrm{T} (Wy-WX\beta) \\
&= - \frac{1}{2 \sigma^2} \left( \sum_{i=1}^{n} (W\varepsilon)_i^2 \right) = - \frac{\mathrm{wRSS}}{2 \sigma^2}
\end{split}
$$

where $W = V^{-1/2}$. Plugging \eqref{eq:mll-rss} into \eqref{eq:mlr-llf} and multiplying by $-2$, we obtain the [deviance](/D/dev) as

$$ \label{eq:mlr-dev-v2-qed}
\begin{split}
D(\beta,\sigma^2) &= -2 \, \mathrm{LL}(\beta,\sigma^2) \\
&= -2 \left( - \frac{\mathrm{wRSS}}{2 \sigma^2} - \frac{n}{2} \log (\sigma^2) - \frac{n}{2} \log(2\pi) - \frac{1}{2} \log |V| \right) \\
&= \mathrm{wRSS}/\sigma^2 + n \cdot \left[ \log(\sigma^2) + \log(2\pi) \right] + \log|V|
\end{split}
$$

which proves the result in \eqref{eq:mlr-dev-v2}. Assuming $V = I_n$, we have

$$ \label{eq:mll-rss-iid}
\begin{split}
- \frac{1}{2 \sigma^2} (y - X\beta)^\mathrm{T} V^{-1} (y - X\beta) &= - \frac{1}{2 \sigma^2} (y - X\beta)^\mathrm{T} (y - X\beta) \\
&= - \frac{1}{2 \sigma^2} \left( \sum_{i=1}^{n} \varepsilon_i^2 \right) = - \frac{\mathrm{RSS}}{2 \sigma^2}
\end{split}
$$

and

$$ \label{eq:mlr-logdet-V-iid}
\frac{1}{2} \log|V| = \frac{1}{2} \log|I_n| = \frac{1}{2} \log 1 = 0 \; ,
$$

such that

$$ \label{eq:mlr-mll-v1-qed}
D(\beta,\sigma^2) = \mathrm{RSS}/\sigma^2 + n \cdot \left[ \log(\sigma^2) + \log(2\pi) \right]
$$

which proves the result in \eqref{eq:mlr-dev-v1}. This completes the proof.
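
The closed-form deviance can be compared against a direct evaluation of $-2 \log p(y|\beta,\sigma^2)$. The sketch below assumes a diagonal $V$ (so that $W = V^{-1/2}$ and $\log \lvert V \rvert$ are trivial to compute) and uses hypothetical data; the function names `mlr_deviance` and `minus_two_log_lik` are illustrative and not part of the original.

```python
import math

def predict(x_i, beta):
    """Linear predictor x_i^T beta for one row of the design matrix."""
    return sum(x_ij * b_j for x_ij, b_j in zip(x_i, beta))

def mlr_deviance(y, X, beta, sigma2, v_diag):
    """wRSS/sigma2 + n*[log(sigma2) + log(2*pi)] + log|V| for diagonal V."""
    n = len(y)
    resid = [y_i - predict(x_i, beta) for y_i, x_i in zip(y, X)]
    # With diagonal W = V^(-1/2), (W eps)_i^2 = eps_i^2 / v_i.
    wrss = sum(e * e / v for e, v in zip(resid, v_diag))
    log_det_v = sum(math.log(v) for v in v_diag)
    return wrss / sigma2 + n * (math.log(sigma2) + math.log(2 * math.pi)) + log_det_v

def minus_two_log_lik(y, X, beta, sigma2, v_diag):
    """Direct evaluation: -2 times the sum of univariate normal log-densities."""
    total = 0.0
    for y_i, x_i, v in zip(y, X, v_diag):
        var = sigma2 * v
        total += -0.5 * math.log(2 * math.pi * var) - (y_i - predict(x_i, beta)) ** 2 / (2 * var)
    return -2.0 * total

# Hypothetical data: intercept plus one regressor, n = 4 observations.
y = [1.2, 1.9, 3.2, 3.8]
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
beta = [1.0, 0.9]
sigma2 = 0.5

v_general = [1.0, 2.0, 0.5, 1.5]   # general case, V != I_n
v_identity = [1.0] * len(y)        # uncorrelated case, V = I_n

d_general = mlr_deviance(y, X, beta, sigma2, v_general)
d_identity = mlr_deviance(y, X, beta, sigma2, v_identity)
print(d_general, minus_two_log_lik(y, X, beta, sigma2, v_general))
print(d_identity, minus_two_log_lik(y, X, beta, sigma2, v_identity))
```

In both cases the two printed numbers should coincide up to floating-point error; for `v_identity` the $\log \lvert V \rvert$ term vanishes and the weighted residual sum of squares reduces to the ordinary one.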