StatProofBook · JoramSoch · Mar 1, 2022
diff --git a/D/dev.md b/D/dev.md
@@ -0,0 +1,50 @@
+---
+layout: definition
+mathjax: true
+
+author: "Joram Soch"
+affiliation: "BCCN Berlin"
+e_mail: "joram.soch@bccn-berlin.de"
+date: 2022-03-01 07:48:00
+
+title: "Deviance information criterion"
+chapter: "Model Selection"
+section: "Classical information criteria"
+topic: "Deviance information criterion"
+definition: "Deviance"
+
+sources:
+  - authors: "Spiegelhalter DJ, Best NG, Carlin BP, Van Der Linde A"
+    year: 2002
+    title: "Bayesian measures of model complexity and fit"
+    in: "Journal of the Royal Statistical Society, Series B: Statistical Methodology"
+    pages: "vol. 64, iss. 4, pp. 583-639"
+    url: "https://rss.onlinelibrary.wiley.com/doi/10.1111/1467-9868.00353"
+    doi: "10.1111/1467-9868.00353"
+  - authors: "Soch J, Allefeld C"
+    year: 2018
+    title: "MACS – a new SPM toolbox for model assessment, comparison and selection"
+    in: "Journal of Neuroscience Methods"
+    pages: "vol. 306, pp. 19-31, eqs. 10-12"
+    url: "https://www.sciencedirect.com/science/article/pii/S0165027018301468"
+    doi: "10.1016/j.jneumeth.2018.05.017"
+  - authors: "Wikipedia"
+    year: 2022
+    title: "Deviance information criterion"
+    in: "Wikipedia, the free encyclopedia"
+    pages: "retrieved on 2022-03-01"
+    url: "https://en.wikipedia.org/wiki/Deviance_information_criterion#Definition"
+
+def_id: "D172"
+shortcut: "dev"
+username: "JoramSoch"
+---
+
+
+**Definition:** Let there be a [generative model](/D/gm) $m$ describing measured data $y$ using model parameters $\theta$. Then, the deviance of $m$ is a function of $\theta$ which multiplies the [log-likelihood function](/D/llf) by $-2$:
+
+$$ \label{eq:dev}
+D(\theta) = -2 \log p(y|\theta,m) \; .
+$$
+
+The deviance function is used to define the [deviance information criterion](/D/dic).
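
As a quick numerical illustration of the definition in `D/dev.md` (a sketch outside the patch hunks, not part of the page itself): the snippet below wraps an arbitrary log-likelihood function and multiplies it by $-2$. The Bernoulli model, the data `y` and the helper names `deviance` and `bernoulli_log_lik` are assumptions chosen only for illustration.

```python
import math


def deviance(log_likelihood, theta):
    """Deviance D(theta) = -2 * log p(y | theta, m) for a given log-likelihood."""
    return -2.0 * log_likelihood(theta)


def bernoulli_log_lik(y):
    """Return the log-likelihood function of i.i.d. Bernoulli data y (illustrative model)."""
    def log_lik(theta):
        return sum(math.log(theta) if yi == 1 else math.log(1.0 - theta) for yi in y)
    return log_lik


y = [1, 0, 1, 1]                       # hypothetical data
ll = bernoulli_log_lik(y)
d_half = deviance(ll, 0.5)             # deviance at theta = 0.5
d_mle = deviance(ll, sum(y) / len(y))  # deviance at the sample mean (the MLE)
print(d_half, d_mle)
```

Because the deviance is just a rescaled negative log-likelihood, it is smallest at the maximum likelihood estimate, which is why `d_mle` comes out below `d_half` here.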
diff --git a/I/ToC.md b/I/ToC.md
@@ -548,16 +548,18 @@ title: "Table of Contents"
    &emsp;&ensp; 1.4.14. **[Weighted least squares](/P/mlr-wls2)** (2) <br>
    &emsp;&ensp; 1.4.15. **[Maximum likelihood estimation](/P/mlr-mle)** <br>
    &emsp;&ensp; 1.4.16. **[Maximum log-likelihood](/P/mlr-mll)** <br>
-   &emsp;&ensp; 1.4.17. **[Akaike information criterion](/P/mlr-aic)** <br>
-   &emsp;&ensp; 1.4.18. **[Bayesian information criterion](/P/mlr-bic)** <br>
-   &emsp;&ensp; 1.4.19. **[Corrected Akaike information criterion](/P/mlr-aicc)** <br>
+   &emsp;&ensp; 1.4.17. **[Deviance function](/P/mlr-dev)** <br>
+   &emsp;&ensp; 1.4.18. **[Akaike information criterion](/P/mlr-aic)** <br>
+   &emsp;&ensp; 1.4.19. **[Bayesian information criterion](/P/mlr-bic)** <br>
+   &emsp;&ensp; 1.4.20. **[Corrected Akaike information criterion](/P/mlr-aicc)** <br>
 
    1.5. Bayesian linear regression <br>
    &emsp;&ensp; 1.5.1. **[Conjugate prior distribution](/P/blr-prior)** <br>
    &emsp;&ensp; 1.5.2. **[Posterior distribution](/P/blr-post)** <br>
    &emsp;&ensp; 1.5.3. **[Log model evidence](/P/blr-lme)** <br>
-   &emsp;&ensp; 1.5.4. **[Posterior probability of alternative hypothesis](/P/blr-pp)** <br>
-   &emsp;&ensp; 1.5.5. **[Posterior credibility region excluding null hypothesis](/P/blr-pcr)** <br>
+   &emsp;&ensp; 1.5.4. **[Deviance information criterion](/P/blr-dic)** <br>
+   &emsp;&ensp; 1.5.5. **[Posterior probability of alternative hypothesis](/P/blr-pp)** <br>
+   &emsp;&ensp; 1.5.6. **[Posterior credibility region excluding null hypothesis](/P/blr-pcr)** <br>
 
 2. Multivariate normal data
 
@@ -664,6 +666,7 @@ title: "Table of Contents"
 
    2.3. Deviance information criterion <br>
    &emsp;&ensp; 2.3.1. *[Definition](/D/dic)* <br>
+   &emsp;&ensp; 2.3.2. *[Deviance](/D/dev)* <br>
 
 3. Bayesian model selection
 

diff --git a/P/blr-dic.md b/P/blr-dic.md
@@ -0,0 +1,166 @@
+---
+layout: proof
+mathjax: true
+
+author: "Joram Soch"
+affiliation: "BCCN Berlin"
+e_mail: "joram.soch@bccn-berlin.de"
+date: 2022-03-01 12:10:00
+
+title: "Deviance information criterion for multiple linear regression"
+chapter: "Statistical Models"
+section: "Univariate normal data"
+topic: "Bayesian linear regression"
+theorem: "Deviance information criterion"
+
+sources:
+
+proof_id: "P313"
+shortcut: "blr-dic"
+username: "JoramSoch"
+---
+
+
+**Theorem:** Consider a [linear regression model](/D/mlr) $m$
+
+$$ \label{eq:mlr}
+m: \; y = X\beta + \varepsilon, \; \varepsilon \sim \mathcal{N}(0, \sigma^2 V), \; \sigma^2 V = (\tau P)^{-1}
+$$
+
+with a [normal-gamma prior distribution](/P/blr-prior)
+
+$$ \label{eq:blr-prior}
+p(\beta,\tau) = \mathcal{N}(\beta; \mu_0, (\tau \Lambda_0)^{-1}) \cdot \mathrm{Gam}(\tau; a_0, b_0) \; .
+$$
+
+Then, the [deviance information criterion](/D/dic) for this model is
+
+$$ \label{eq:mlr-dic}
+\begin{split}
+\mathrm{DIC}(m) &= n \cdot \log(2\pi) - n \left[ 2 \psi(a_n) - \log(a_n) - \log(b_n) \right] - \log|P| \\
+&+ \frac{a_n}{b_n} (y - X\mu_n)^\mathrm{T} P (y - X\mu_n) + 2 \, \mathrm{tr}\left( X^\mathrm{T} P X \Lambda_n^{-1} \right)
+\end{split}
+$$
+
+where $\mu_n$ and $\Lambda_n$ as well as $a_n$ and $b_n$ are [posterior parameters](/D/post) describing the [posterior distribution in Bayesian linear regression](/P/blr-post).
+
+
+**Proof:** The [deviance for multiple linear regression](/P/mlr-dev) is
+
+$$ \label{eq:mlr-dev-s1}
+D(\beta,\sigma^2) = n \cdot \log(2\pi) + n \cdot \log(\sigma^2) + \log|V| + \frac{1}{\sigma^2} (y - X\beta)^\mathrm{T} V^{-1} (y - X\beta)
+$$
+
+which, applying the equalities $\tau = 1/\sigma^2$ and $P = V^{-1}$, becomes
+
+$$ \label{eq:mlr-dev-s2}
+D(\beta,\tau) = n \cdot \log(2\pi) - n \cdot \log(\tau) - \log|P| + \tau \cdot (y - X\beta)^\mathrm{T} P (y - X\beta) \; .
+$$
+
+The [deviance information criterion](/D/dic) (DIC) is defined as
+
+$$ \label{eq:dic}
+\mathrm{DIC}(m) = -2 \log p(y|\left\langle \beta \right\rangle, \left\langle \tau \right\rangle, m) + 2 \, p_D
+$$
+
+where $\log p(y|\left\langle \beta \right\rangle, \left\langle \tau \right\rangle, m)$ is the [log-likelihood function](/D/llf) at the posterior [expectations](/D/mean) and the "effective number of parameters" $p_D$ is the [difference between the expectation of the deviance and the deviance at the expectation](/D/dic):
+
+$$ \label{eq:dic-pD}
+p_D = \left\langle D(\beta,\tau) \right\rangle - D(\left\langle \beta \right\rangle, \left\langle \tau \right\rangle) \; .
+$$
+
+With that, the DIC for multiple linear regression becomes:
+
+$$ \label{eq:mlr-dic-s1}
+\begin{split}
+\mathrm{DIC}(m) &= -2 \log p(y|\left\langle \beta \right\rangle, \left\langle \tau \right\rangle, m) + 2 \, p_D \\
+&= D(\left\langle \beta \right\rangle, \left\langle \tau \right\rangle) + 2 \left[ \left\langle D(\beta,\tau) \right\rangle - D(\left\langle \beta \right\rangle, \left\langle \tau \right\rangle) \right] \\
+&= 2 \left\langle D(\beta,\tau) \right\rangle - D(\left\langle \beta \right\rangle, \left\langle \tau \right\rangle) \; .
+\end{split}
+$$
+
+The [posterior distribution for Bayesian linear regression](/P/blr-post) is
+
+$$ \label{eq:blr-post}
+p(\beta,\tau|y) = \mathcal{N}(\beta; \mu_n, (\tau \Lambda_n)^{-1}) \cdot \mathrm{Gam}(\tau; a_n, b_n)
+$$
+
+where the [posterior hyperparameters](/D/post) are given by
+
+$$ \label{eq:blr-post-par}
+\begin{split}
+\mu_n &= \Lambda_n^{-1} (X^\mathrm{T} P y + \Lambda_0 \mu_0) \\
+\Lambda_n &= X^\mathrm{T} P X + \Lambda_0 \\
+a_n &= a_0 + \frac{n}{2} \\
+b_n &= b_0 + \frac{1}{2} (y^\mathrm{T} P y + \mu_0^\mathrm{T} \Lambda_0 \mu_0 - \mu_n^\mathrm{T} \Lambda_n \mu_n) \; .
+\end{split}
+$$
+
+Thus, we have the following posterior expectations:
+
+$$ \label{eq:blr-post-beta}
+\left\langle \beta \right\rangle_{\beta,\tau|y} = \mu_n
+$$
+
+$$ \label{eq:blr-post-tau}
+\left\langle \tau \right\rangle_{\beta,\tau|y} = \frac{a_n}{b_n}
+$$
+
+$$ \label{eq:blr-post-log-tau}
+\left\langle \log \tau \right\rangle_{\beta,\tau|y} = \psi(a_n) - \log(b_n)
+$$
+
+$$ \label{eq:blr-post-beta-qf}
+\begin{split}
+\left\langle \beta^\mathrm{T} A \beta \right\rangle_{\beta|\tau,y} &= \mu_n^\mathrm{T} A \mu_n + \mathrm{tr}\left( A (\tau \Lambda_n)^{-1} \right) \\
+&= \mu_n^\mathrm{T} A \mu_n + \frac{1}{\tau} \mathrm{tr}\left( A \Lambda_n^{-1} \right) \; .
+\end{split}
+$$
+
+In these identities, we have used the [mean of the multivariate normal distribution](/P/mvn-mean), the [mean of the gamma distribution](/P/gam-mean), the [logarithmic expectation of the gamma distribution](/P/gam-logmean), the [expectation of a quadratic form](/P/mean-qf) and the [covariance of the multivariate normal distribution](/P/mvn-cov).
+
+With that, the deviance at the expectation is:
+
+$$ \label{eq:mlr-dev-exp}
+\begin{split}
+D(\left\langle \beta \right\rangle, \left\langle \tau \right\rangle) &\overset{\eqref{eq:mlr-dev-s2}}{=} n \cdot \log(2\pi) - n \cdot \log(\left\langle \tau \right\rangle) - \log|P| + \left\langle \tau \right\rangle \cdot (y - X\left\langle \beta \right\rangle)^\mathrm{T} P (y - X\left\langle \beta \right\rangle) \\
+&\overset{\eqref{eq:blr-post-beta}}{=} n \cdot \log(2\pi) - n \cdot \log(\left\langle \tau \right\rangle) - \log|P| + \left\langle \tau \right\rangle \cdot (y - X\mu_n)^\mathrm{T} P (y - X\mu_n) \\
+&\overset{\eqref{eq:blr-post-tau}}{=} n \cdot \log(2\pi) - n \cdot \log\left(\frac{a_n}{b_n}\right) - \log|P| + \frac{a_n}{b_n} \cdot (y - X\mu_n)^\mathrm{T} P (y - X\mu_n) \; .
+\end{split}
+$$
+
+Moreover, the expectation of the deviance is:
+
+$$ \label{eq:mlr-exp-dev}
+\begin{split}
+\left\langle D(\beta,\tau) \right\rangle &\overset{\eqref{eq:mlr-dev-s2}}{=} \left\langle n \cdot \log(2\pi) - n \cdot \log(\tau) - \log|P| + \tau \cdot (y - X\beta)^\mathrm{T} P (y - X\beta) \right\rangle \\
+&= n \cdot \log(2\pi) - n \cdot \left\langle \log(\tau) \right\rangle - \log|P| + \left\langle \tau \cdot (y - X\beta)^\mathrm{T} P (y - X\beta) \right\rangle \\
+&\overset{\eqref{eq:blr-post-log-tau}}{=} n \cdot \log(2\pi) - n \cdot \left[ \psi(a_n) - \log(b_n) \right] - \log|P| \\
+&+ \left\langle \tau \cdot \left\langle (y - X\beta)^\mathrm{T} P (y - X\beta) \right\rangle_{\beta|\tau,y} \right\rangle_{\tau|y} \\
+&= n \cdot \log(2\pi) - n \cdot \left[ \psi(a_n) - \log(b_n) \right] - \log|P| \\
+&+ \left\langle \tau \cdot \left\langle y^\mathrm{T} P y - y^\mathrm{T} P X\beta - \beta^\mathrm{T} X^\mathrm{T} P y + \beta^\mathrm{T} X^\mathrm{T} P X \beta \right\rangle_{\beta|\tau,y} \right\rangle_{\tau|y} \\
+&\overset{\eqref{eq:blr-post-beta-qf}}{=} n \cdot \log(2\pi) - n \cdot \left[ \psi(a_n) - \log(b_n) \right] - \log|P| \\
+&+ \left\langle \tau \cdot \left[ y^\mathrm{T} P y - y^\mathrm{T} P X\mu_n - \mu_n^\mathrm{T} X^\mathrm{T} P y + \mu_n^\mathrm{T} X^\mathrm{T} P X \mu_n  + \frac{1}{\tau} \mathrm{tr}\left( X^\mathrm{T} P X \Lambda_n^{-1} \right) \right] \right\rangle_{\tau|y} \\
+&= n \cdot \log(2\pi) - n \cdot \left[ \psi(a_n) - \log(b_n) \right] - \log|P| \\
+&+ \left\langle \tau \cdot (y - X\mu_n)^\mathrm{T} P (y - X\mu_n) \right\rangle_{\tau|y} + \mathrm{tr}\left( X^\mathrm{T} P X \Lambda_n^{-1} \right) \\
+&\overset{\eqref{eq:blr-post-tau}}{=} n \cdot \log(2\pi) - n \cdot \left[ \psi(a_n) - \log(b_n) \right] - \log|P| \\
+&+ \frac{a_n}{b_n} \cdot (y - X\mu_n)^\mathrm{T} P (y - X\mu_n) + \mathrm{tr}\left( X^\mathrm{T} P X \Lambda_n^{-1} \right) \; .
+\end{split}
+$$
+
+Finally, combining the two terms, we have:
+
+$$ \label{eq:mlr-dic-s2}
+\begin{split}
+\mathrm{DIC}(m) &\overset{\eqref{eq:mlr-dic-s1}}{=} 2 \left\langle D(\beta,\tau) \right\rangle - D(\left\langle \beta \right\rangle, \left\langle \tau \right\rangle) \\
+&\overset{\eqref{eq:mlr-exp-dev}}{=} 2 \left[ n \cdot \log(2\pi) - n \cdot \left[ \psi(a_n) - \log(b_n) \right] - \log|P| \right. \\
+&+ \left. \frac{a_n}{b_n} \cdot (y - X\mu_n)^\mathrm{T} P (y - X\mu_n) + \mathrm{tr}\left( X^\mathrm{T} P X \Lambda_n^{-1} \right) \right] \\
+&\overset{\eqref{eq:mlr-dev-exp}}{-} \left[ n \cdot \log(2\pi) - n \cdot \log\left(\frac{a_n}{b_n}\right) - \log|P| + \frac{a_n}{b_n} \cdot (y - X\mu_n)^\mathrm{T} P (y - X\mu_n) \right] \\
+&= n \cdot \log(2\pi) - 2 n \psi(a_n) + 2 n \log(b_n) + n \log(a_n) - n \log(b_n) - \log|P| \\
+&+ \frac{a_n}{b_n} (y - X\mu_n)^\mathrm{T} P (y - X\mu_n) + 2 \, \mathrm{tr}\left( X^\mathrm{T} P X \Lambda_n^{-1} \right) \\
+&= n \cdot \log(2\pi) - n \left[ 2 \psi(a_n) - \log(a_n) - \log(b_n) \right] - \log|P| \\
+&+ \frac{a_n}{b_n} (y - X\mu_n)^\mathrm{T} P (y - X\mu_n) + 2 \, \mathrm{tr}\left( X^\mathrm{T} P X \Lambda_n^{-1} \right) \; .
+\end{split}
+$$
+
+This conforms to equation \eqref{eq:mlr-dic}.
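
The two building blocks of the proof in `P/blr-dic.md`, the deviance at the posterior expectations and the posterior expectation of the deviance, can be evaluated numerically. The following is a minimal sketch outside the patch hunks, under simplifying assumptions that are not in the page: a single regressor (so $\beta$, $\mu_n$ and $\Lambda_n$ are scalars), $P = I_n$, made-up data and prior values, and a hand-written `digamma` approximation because the Python standard library has none. The helper name `blr_dic_scalar` is hypothetical.

```python
import math


def digamma(x):
    """Digamma function psi(x) for x > 0 (recurrence plus asymptotic series)."""
    shift = 0.0
    while x < 6.0:                      # psi(x) = psi(x + 1) - 1/x
        shift -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return shift + series


def blr_dic_scalar(x, y, mu0, lam0, a0, b0):
    """DIC pieces for y_i = x_i * beta + eps_i with P = I_n (illustrative helper)."""
    n = len(y)
    sxx = sum(xi * xi for xi in x)                  # X^T P X
    sxy = sum(xi * yi for xi, yi in zip(x, y))      # X^T P y
    syy = sum(yi * yi for yi in y)                  # y^T P y
    # posterior hyperparameters of the normal-gamma posterior
    lam_n = sxx + lam0
    mu_n = (sxy + lam0 * mu0) / lam_n
    a_n = a0 + n / 2
    b_n = b0 + 0.5 * (syy + lam0 * mu0 ** 2 - lam_n * mu_n ** 2)
    rss = sum((yi - xi * mu_n) ** 2 for xi, yi in zip(x, y))
    const = n * math.log(2 * math.pi)               # log|P| = 0 because P = I_n
    tau_mean = a_n / b_n                            # posterior mean of tau
    # deviance at the posterior expectations
    dev_exp = const - n * math.log(tau_mean) + tau_mean * rss
    # posterior expectation of the deviance (trace term reduces to sxx / lam_n)
    exp_dev = (const - n * (digamma(a_n) - math.log(b_n))
               + tau_mean * rss + sxx / lam_n)
    p_d = exp_dev - dev_exp                         # effective number of parameters
    return {"dev_exp": dev_exp, "exp_dev": exp_dev, "p_d": p_d,
            "dic": dev_exp + 2 * p_d}


result = blr_dic_scalar([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.9],
                        mu0=0.0, lam0=1.0, a0=1.0, b0=1.0)
print(result)
```

In this scalar setting $p_D$ reduces to $n \left[ \log(a_n) - \psi(a_n) \right] + X^\mathrm{T} X / \Lambda_n$, which is positive because $\log(a) > \psi(a)$ for $a > 0$.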
diff --git a/P/mlr-dev.md b/P/mlr-dev.md
@@ -0,0 +1,104 @@
+---
+layout: proof
+mathjax: true
+
+author: "Joram Soch"
+affiliation: "BCCN Berlin"
+e_mail: "joram.soch@bccn-berlin.de"
+date: 2022-03-01 08:42:00
+
+title: "Deviance for multiple linear regression"
+chapter: "Statistical Models"
+section: "Univariate normal data"
+topic: "Multiple linear regression"
+theorem: "Deviance function"
+
+sources:
+
+proof_id: "P312"
+shortcut: "mlr-dev"
+username: "JoramSoch"
+---
+
+
+**Theorem:** Consider a [linear regression model](/D/mlr) $m$ with [correlation structure](/D/corrmat) $V$
+
+$$ \label{eq:mlr}
+m: \; y = X\beta + \varepsilon, \; \varepsilon \sim \mathcal{N}(0, \sigma^2 V) \; .
+$$
+
+Then, the [deviance](/D/dev) for this model is
+
+$$ \label{eq:mlr-dev-v1}
+D(\beta,\sigma^2) = \mathrm{RSS}/\sigma^2 + n \cdot \left[ \log(\sigma^2) + \log(2\pi) \right]
+$$
+
+under [uncorrelated observations](/D/mlr), i.e. if $V = I_n$, and
+
+$$ \label{eq:mlr-dev-v2}
+D(\beta,\sigma^2) = \mathrm{wRSS}/\sigma^2 + n \cdot \left[ \log(\sigma^2) + \log(2\pi) \right] + \log|V| \; ,
+$$
+
+in the general case, i.e. if $V \neq I_n$, where $\mathrm{RSS}$ is the [residual sum of squares](/D/rss) and $\mathrm{wRSS}$ is the [weighted residual sum of squares](/P/mlr-wls2).
+
+
+**Proof:** The [likelihood function](/D/lf) for multiple linear regression [is given by](/P/mlr-mle)
+
+$$ \label{eq:mlr-lf}
+\begin{split}
+p(y|\beta,\sigma^2) &= \mathcal{N}(y; X\beta, \sigma^2 V) \\
+&= \sqrt{\frac{1}{(2\pi)^n |\sigma^2 V|}} \cdot \exp\left[ -\frac{1}{2} (y - X\beta)^\mathrm{T} (\sigma^2 V)^{-1} (y - X\beta) \right] \; ,
+\end{split}
+$$
+
+such that, with $\lvert \sigma^2 V \rvert = (\sigma^2)^n \lvert V \rvert$, the [log-likelihood function](/D/llf) for this model [becomes](/P/mlr-mle)
+
+$$ \label{eq:mlr-llf}
+\begin{split}
+\mathrm{LL}(\beta,\sigma^2) = &\log p(y|\beta,\sigma^2) \\
+= &- \frac{n}{2} \log(2\pi) - \frac{n}{2} \log (\sigma^2) - \frac{1}{2} \log |V| - \frac{1}{2 \sigma^2} (y - X\beta)^\mathrm{T} V^{-1} (y - X\beta) \; .
+\end{split}
+$$
+
+
+The last term can be expressed in terms of the (weighted) [residual sum of squares](/D/rss) as
+
+$$ \label{eq:mll-rss}
+\begin{split}
+- \frac{1}{2 \sigma^2} (y - X\beta)^\mathrm{T} V^{-1} (y - X\beta) &= - \frac{1}{2 \sigma^2} (Wy-WX\beta)^\mathrm{T} (Wy-WX\beta) \\
+&= - \frac{1}{2 \sigma^2} \sum_{i=1}^{n} (W\varepsilon)_i^2 = - \frac{\mathrm{wRSS}}{2 \sigma^2}
+\end{split}
+$$
+
+where $W = V^{-1/2}$. Plugging \eqref{eq:mll-rss} into \eqref{eq:mlr-llf} and multiplying with $-2$, we obtain the [deviance](/D/dev) as
+
+$$ \label{eq:mlr-dev-v2-qed}
+\begin{split}
+D(\beta,\sigma^2) &= -2 \, \mathrm{LL}(\beta,\sigma^2) \\
+&= -2 \left( - \frac{\mathrm{wRSS}}{2 \sigma^2} - \frac{n}{2} \log (\sigma^2) - \frac{n}{2} \log(2\pi) - \frac{1}{2} \log |V| \right) \\
+&= \mathrm{wRSS}/\sigma^2 + n \cdot \left[ \log(\sigma^2) + \log(2\pi) \right] + \log|V|
+\end{split}
+$$
+
+which proves the result in \eqref{eq:mlr-dev-v2}. Assuming $V = I_n$, we have
+
+$$ \label{eq:mll-rss-iid}
+\begin{split}
+- \frac{1}{2 \sigma^2} (y - X\beta)^\mathrm{T} V^{-1} (y - X\beta) &= - \frac{1}{2 \sigma^2} (y - X\beta)^\mathrm{T} (y - X\beta) \\
+&= - \frac{1}{2 \sigma^2} \sum_{i=1}^{n} \varepsilon_i^2 = - \frac{\mathrm{RSS}}{2 \sigma^2}
+\end{split}
+$$
+
+and
+
+$$ \label{eq:mlr-logdet-V-iid}
+\frac{1}{2} \log|V| = \frac{1}{2} \log|I_n| = \frac{1}{2} \log 1 = 0 \; ,
+$$
+
+such that
+
+$$ \label{eq:mlr-mll-v1-qed}
+D(\beta,\sigma^2) = \mathrm{RSS}/\sigma^2 + n \cdot \left[ \log(\sigma^2) + \log(2\pi) \right]
+$$
+
+which proves the result in \eqref{eq:mlr-dev-v1}. This completes the proof.
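
The closed form in `P/mlr-dev.md` can be checked against the definition $D = -2 \log p(y|\beta,\sigma^2)$. The sketch below (outside the patch hunks) assumes a diagonal $V$, so that observations are independent and the log-likelihood is a plain sum of univariate normal log-densities. The data, the parameter values and the function names `mlr_deviance` and `mlr_deviance_direct` are illustrative assumptions.

```python
import math


def residuals(y, X, beta):
    """Residuals y - X beta, with X given as a list of rows."""
    return [yi - sum(xij * bj for xij, bj in zip(xi, beta)) for yi, xi in zip(y, X)]


def mlr_deviance(y, X, beta, sigma2, v=None):
    """Closed form wRSS/sigma^2 + n [log(sigma^2) + log(2 pi)] + log|V| for V = diag(v)."""
    n = len(y)
    v = [1.0] * n if v is None else v               # v = None means V = I_n
    wrss = sum(r * r / vi for r, vi in zip(residuals(y, X, beta), v))
    log_det_v = sum(math.log(vi) for vi in v)       # log|V| for diagonal V
    return wrss / sigma2 + n * (math.log(sigma2) + math.log(2 * math.pi)) + log_det_v


def mlr_deviance_direct(y, X, beta, sigma2, v=None):
    """-2 times the summed normal log-densities N(y_i; x_i beta, sigma^2 v_i)."""
    n = len(y)
    v = [1.0] * n if v is None else v
    log_lik = 0.0
    for r, vi in zip(residuals(y, X, beta), v):
        var = sigma2 * vi
        log_lik += -0.5 * math.log(2 * math.pi * var) - r * r / (2 * var)
    return -2.0 * log_lik


# hypothetical data: intercept plus one regressor
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [0.9, 2.1, 2.9, 4.2]
beta, sigma2 = [1.0, 1.0], 0.5
v = [1.0, 2.0, 0.5, 1.5]

print(mlr_deviance(y, X, beta, sigma2), mlr_deviance_direct(y, X, beta, sigma2))
print(mlr_deviance(y, X, beta, sigma2, v), mlr_deviance_direct(y, X, beta, sigma2, v))
```

With `v=None` the weighted residual sum of squares equals the ordinary one and $\log|V| = 0$, which reproduces the uncorrelated case of the theorem.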