diff --git a/D/dev.md b/D/dev.md new file mode 100644 index 00000000..644b5646 --- /dev/null +++ b/D/dev.md @@ -0,0 +1,50 @@ +--- +layout: definition +mathjax: true + +author: "Joram Soch" +affiliation: "BCCN Berlin" +e_mail: "joram.soch@bccn-berlin.de" +date: 2022-03-01 07:48:00 + +title: "Deviance information criterion" +chapter: "Model Selection" +section: "Classical information criteria" +topic: "Deviance information criterion" +definition: "Deviance" + +sources: + - authors: "Spiegelhalter DJ, Best NG, Carlin BP, Van Der Linde A" + year: 2002 + title: "Bayesian measures of model complexity and fit" + in: "Journal of the Royal Statistical Society, Series B: Statistical Methodology" + pages: "vol. 64, iss. 4, pp. 583-639" + url: "https://rss.onlinelibrary.wiley.com/doi/10.1111/1467-9868.00353" + doi: "10.1111/1467-9868.00353" + - authors: "Soch J, Allefeld C" + year: 2018 + title: "MACS – a new SPM toolbox for model assessment, comparison and selection" + in: "Journal of Neuroscience Methods" + pages: "vol. 306, pp. 19-31, eqs. 10-12" + url: "https://www.sciencedirect.com/science/article/pii/S0165027018301468" + doi: "10.1016/j.jneumeth.2018.05.017" + - authors: "Wikipedia" + year: 2022 + title: "Deviance information criterion" + in: "Wikipedia, the free encyclopedia" + pages: "retrieved on 2022-03-01" + url: "https://en.wikipedia.org/wiki/Deviance_information_criterion#Definition" + +def_id: "D172" +shortcut: "dev" +username: "JoramSoch" +--- + + +**Definition:** Let there be a [generative model](/D/gm) $m$ describing measured data $y$ using model parameters $\theta$. Then, the deviance of $m$ is a function of $\theta$ which multiplies the [log-likelihood function](/D/llf) by $-2$: + +$$ \label{eq:dev} +D(\theta) = -2 \log p(y|\theta,m) \; . +$$ + +The deviance function is used in the definition of the [deviance information criterion](/D/dic). 
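As a small illustration of the definition $D(\theta) = -2 \log p(y|\theta,m)$, the following sketch evaluates the deviance of a Bernoulli model. The Bernoulli model, the data `y` and the function name `bernoulli_deviance` are assumptions chosen only for this example; they are not part of the definition itself.

```python
import math

def bernoulli_deviance(theta, y):
    """Deviance D(theta) = -2 log p(y | theta) for i.i.d. Bernoulli data (illustrative model)."""
    loglik = sum(math.log(theta) if yi == 1 else math.log(1.0 - theta) for yi in y)
    return -2.0 * loglik

# Hypothetical binary data, used only for illustration
y = [1, 0, 1, 1, 0, 1, 1, 0]
theta_hat = sum(y) / len(y)          # maximum likelihood estimate of theta

d_mle = bernoulli_deviance(theta_hat, y)
d_half = bernoulli_deviance(0.5, y)
print(d_mle, d_half)
```

Because the deviance is a decreasing transformation of the log-likelihood, it is smallest at the maximum likelihood estimate, so `d_mle` is below `d_half` here.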
\ No newline at end of file diff --git a/I/ToC.md b/I/ToC.md index cf44eb52..c202ac1b 100644 --- a/I/ToC.md +++ b/I/ToC.md @@ -548,16 +548,18 @@ title: "Table of Contents"    1.4.14. **[Weighted least squares](/P/mlr-wls2)** (2)
   1.4.15. **[Maximum likelihood estimation](/P/mlr-mle)**
   1.4.16. **[Maximum log-likelihood](/P/mlr-mll)**
-    1.4.17. **[Akaike information criterion](/P/mlr-aic)**
-    1.4.18. **[Bayesian information criterion](/P/mlr-bic)**
-    1.4.19. **[Corrected Akaike information criterion](/P/mlr-aicc)**
+    1.4.17. **[Deviance function](/P/mlr-dev)**
+    1.4.18. **[Akaike information criterion](/P/mlr-aic)**
+    1.4.19. **[Bayesian information criterion](/P/mlr-bic)**
+    1.4.20. **[Corrected Akaike information criterion](/P/mlr-aicc)**
1.5. Bayesian linear regression
   1.5.1. **[Conjugate prior distribution](/P/blr-prior)**
   1.5.2. **[Posterior distribution](/P/blr-post)**
   1.5.3. **[Log model evidence](/P/blr-lme)**
-    1.5.4. **[Posterior probability of alternative hypothesis](/P/blr-pp)**
-    1.5.5. **[Posterior credibility region excluding null hypothesis](/P/blr-pcr)**
+    1.5.4. **[Deviance information criterion](/P/blr-dic)**
+    1.5.5. **[Posterior probability of alternative hypothesis](/P/blr-pp)**
+    1.5.6. **[Posterior credibility region excluding null hypothesis](/P/blr-pcr)**
2. Multivariate normal data @@ -664,6 +666,7 @@ title: "Table of Contents" 2.3. Deviance information criterion
   2.3.1. *[Definition](/D/dic)*
+    2.3.2. *[Deviance](/D/dev)*
3. Bayesian model selection diff --git a/P/blr-dic.md b/P/blr-dic.md new file mode 100644 index 00000000..83f1e179 --- /dev/null +++ b/P/blr-dic.md @@ -0,0 +1,166 @@ +--- +layout: proof +mathjax: true + +author: "Joram Soch" +affiliation: "BCCN Berlin" +e_mail: "joram.soch@bccn-berlin.de" +date: 2022-03-01 12:10:00 + +title: "Deviance information criterion for multiple linear regression" +chapter: "Statistical Models" +section: "Univariate normal data" +topic: "Bayesian linear regression" +theorem: "Deviance information criterion" + +sources: + +proof_id: "P313" +shortcut: "blr-dic" +username: "JoramSoch" +--- + + +**Theorem:** Consider a [linear regression model](/D/mlr) $m$ + +$$ \label{eq:mlr} +m: \; y = X\beta + \varepsilon, \; \varepsilon \sim \mathcal{N}(0, \sigma^2 V), \; \sigma^2 V = (\tau P)^{-1} +$$ + +with a [normal-gamma prior distribution](/P/blr-prior) + +$$ \label{eq:blr-prior} +p(\beta,\tau) = \mathcal{N}(\beta; \mu_0, (\tau \Lambda_0)^{-1}) \cdot \mathrm{Gam}(\tau; a_0, b_0) \; . +$$ + +Then, the [deviance information criterion](/D/dic) for this model is + +$$ \label{eq:mlr-dic} +\begin{split} +\mathrm{DIC}(m) &= n \cdot \log(2\pi) - n \left[ 2 \psi(a_n) - \log(a_n) - \log(b_n) \right] - \log|P| \\ +&+ \frac{a_n}{b_n} (y - X\mu_n)^\mathrm{T} P (y - X\mu_n) + 2 \, \mathrm{tr}\left( X^\mathrm{T} P X \Lambda_n^{-1} \right) +\end{split} +$$ + +where $\mu_n$ and $\Lambda_n$ as well as $a_n$ and $b_n$ are [posterior parameters](/D/post) describing the [posterior distribution in Bayesian linear regression](/P/blr-post). 
+ + +**Proof:** The [deviance for multiple linear regression](/P/mlr-dev) is + +$$ \label{eq:mlr-dev-s1} +D(\beta,\sigma^2) = n \cdot \log(2\pi) + n \cdot \log(\sigma^2) + \log|V| + \frac{1}{\sigma^2} (y - X\beta)^\mathrm{T} V^{-1} (y - X\beta) +$$ + +which, applying the equalities $\tau = 1/\sigma^2$ and $P = V^{-1}$, becomes + +$$ \label{eq:mlr-dev-s2} +D(\beta,\tau) = n \cdot \log(2\pi) - n \cdot \log(\tau) - \log|P| + \tau \cdot (y - X\beta)^\mathrm{T} P (y - X\beta) \; . +$$ + +The [deviance information criterion](/D/dic) (DIC) is defined as + +$$ \label{eq:dic} +\mathrm{DIC}(m) = -2 \log p(y|\left\langle \beta \right\rangle, \left\langle \tau \right\rangle, m) + 2 \, p_D +$$ + +where $\log p(y|\left\langle \beta \right\rangle, \left\langle \tau \right\rangle, m)$ is the [log-likelihood function](/D/llf) at the posterior [expectations](/D/mean) and the "effective number of parameters" $p_D$ is the [difference between the expectation of the deviance and the deviance at the expectation](/D/dic): + +$$ \label{eq:dic-pD} +p_D = \left\langle D(\beta,\tau) \right\rangle - D(\left\langle \beta \right\rangle, \left\langle \tau \right\rangle) \; . +$$ + +With that, the DIC for multiple linear regression becomes: + +$$ \label{eq:mlr-dic-s1} +\begin{split} +\mathrm{DIC}(m) &= -2 \log p(y|\left\langle \beta \right\rangle, \left\langle \tau \right\rangle, m) + 2 \, p_D \\ +&= D(\left\langle \beta \right\rangle, \left\langle \tau \right\rangle) + 2 \left[ \left\langle D(\beta,\tau) \right\rangle - D(\left\langle \beta \right\rangle, \left\langle \tau \right\rangle) \right] \\ +&= 2 \left\langle D(\beta,\tau) \right\rangle - D(\left\langle \beta \right\rangle, \left\langle \tau \right\rangle) \; . 
+\end{split} +$$ + +The [posterior distribution for multiple linear regression](/P/blr-post) is + +$$ \label{eq:blr-post} +p(\beta,\tau|y) = \mathcal{N}(\beta; \mu_n, (\tau \Lambda_n)^{-1}) \cdot \mathrm{Gam}(\tau; a_n, b_n) +$$ + +where the [posterior hyperparameters](/D/post) are given by + +$$ \label{eq:blr-post-par} +\begin{split} +\mu_n &= \Lambda_n^{-1} (X^\mathrm{T} P y + \Lambda_0 \mu_0) \\ +\Lambda_n &= X^\mathrm{T} P X + \Lambda_0 \\ +a_n &= a_0 + \frac{n}{2} \\ +b_n &= b_0 + \frac{1}{2} (y^\mathrm{T} P y + \mu_0^\mathrm{T} \Lambda_0 \mu_0 - \mu_n^\mathrm{T} \Lambda_n \mu_n) \; . +\end{split} +$$ + +Thus, we have the following posterior expectations: + +$$ \label{eq:blr-post-beta} +\left\langle \beta \right\rangle_{\beta,\tau|y} = \mu_n +$$ + +$$ \label{eq:blr-post-tau} +\left\langle \tau \right\rangle_{\beta,\tau|y} = \frac{a_n}{b_n} +$$ + +$$ \label{eq:blr-post-log-tau} +\left\langle \log \tau \right\rangle_{\beta,\tau|y} = \psi(a_n) - \log(b_n) +$$ + +$$ \label{eq:blr-post-beta-qf} +\begin{split} +\left\langle \beta^\mathrm{T} A \beta \right\rangle_{\beta|\tau,y} &= \mu_n^\mathrm{T} A \mu_n + \mathrm{tr}\left( A (\tau \Lambda_n)^{-1} \right) \\ +&= \mu_n^\mathrm{T} A \mu_n + \frac{1}{\tau} \mathrm{tr}\left( A \Lambda_n^{-1} \right) \; . +\end{split} +$$ + +In these identities, we have used the [mean of the multivariate normal distribution](/P/mvn-mean), the [mean of the gamma distribution](/P/gam-mean), the [logarithmic expectation of the gamma distribution](/P/gam-logmean), the [expectation of a quadratic form](/P/mean-qf) and the [covariance of the multivariate normal distribution](/P/mvn-cov). 
+ +With that, the deviance at the expectation is: + +$$ \label{eq:mlr-dev-exp} +\begin{split} +D(\left\langle \beta \right\rangle, \left\langle \tau \right\rangle) &\overset{\eqref{eq:mlr-dev-s2}}{=} n \cdot \log(2\pi) - n \cdot \log(\left\langle \tau \right\rangle) - \log|P| + \left\langle \tau \right\rangle \cdot (y - X\left\langle \beta \right\rangle)^\mathrm{T} P (y - X\left\langle \beta \right\rangle) \\ +&\overset{\eqref{eq:blr-post-beta}}{=} n \cdot \log(2\pi) - n \cdot \log(\left\langle \tau \right\rangle) - \log|P| + \left\langle \tau \right\rangle \cdot (y - X\mu_n)^\mathrm{T} P (y - X\mu_n) \\ +&\overset{\eqref{eq:blr-post-tau}}{=} n \cdot \log(2\pi) - n \cdot \log\left(\frac{a_n}{b_n}\right) - \log|P| + \frac{a_n}{b_n} \cdot (y - X\mu_n)^\mathrm{T} P (y - X\mu_n) \; . +\end{split} +$$ + +Moreover, the expectation of the deviance is: + +$$ \label{eq:mlr-exp-dev} +\begin{split} +\left\langle D(\beta,\tau) \right\rangle &\overset{\eqref{eq:mlr-dev-s2}}{=} \left\langle n \cdot \log(2\pi) - n \cdot \log(\tau) - \log|P| + \tau \cdot (y - X\beta)^\mathrm{T} P (y - X\beta) \right\rangle \\ +&= n \cdot \log(2\pi) - n \cdot \left\langle \log(\tau) \right\rangle - \log|P| + \left\langle \tau \cdot (y - X\beta)^\mathrm{T} P (y - X\beta) \right\rangle \\ +&\overset{\eqref{eq:blr-post-log-tau}}{=} n \cdot \log(2\pi) - n \cdot \left[ \psi(a_n) - \log(b_n) \right] - \log|P| \\ +&+ \left\langle \tau \cdot \left\langle (y - X\beta)^\mathrm{T} P (y - X\beta) \right\rangle_{\beta|\tau,y} \right\rangle_{\tau|y} \\ +&= n \cdot \log(2\pi) - n \cdot \left[ \psi(a_n) - \log(b_n) \right] - \log|P| \\ +&+ \left\langle \tau \cdot \left\langle y^\mathrm{T} P y - y^\mathrm{T} P X\beta - \beta^\mathrm{T} X^\mathrm{T} P y + \beta^\mathrm{T} X^\mathrm{T} P X \beta \right\rangle_{\beta|\tau,y} \right\rangle_{\tau|y} \\ +&\overset{\eqref{eq:blr-post-beta-qf}}{=} n \cdot \log(2\pi) - n \cdot \left[ \psi(a_n) - \log(b_n) \right] - \log|P| \\ +&+ \left\langle \tau \cdot \left[ y^\mathrm{T} P y - y^\mathrm{T} P X\mu_n - \mu_n^\mathrm{T} 
X^\mathrm{T} P y + \mu_n^\mathrm{T} X^\mathrm{T} P X \mu_n + \frac{1}{\tau} \mathrm{tr}\left( X^\mathrm{T} P X \Lambda_n^{-1} \right) \right] \right\rangle_{\tau|y} \\ +&= n \cdot \log(2\pi) - n \cdot \left[ \psi(a_n) - \log(b_n) \right] - \log|P| \\ +&+ \left\langle \tau \cdot (y - X\mu_n)^\mathrm{T} P (y - X\mu_n) \right\rangle_{\tau|y} + \mathrm{tr}\left( X^\mathrm{T} P X \Lambda_n^{-1} \right) \\ +&\overset{\eqref{eq:blr-post-tau}}{=} n \cdot \log(2\pi) - n \cdot \left[ \psi(a_n) - \log(b_n) \right] - \log|P| \\ +&+ \frac{a_n}{b_n} \cdot (y - X\mu_n)^\mathrm{T} P (y - X\mu_n) + \mathrm{tr}\left( X^\mathrm{T} P X \Lambda_n^{-1} \right) \; . +\end{split} +$$ + +Finally, combining the two terms, we have: + +$$ \label{eq:mlr-dic-s2} +\begin{split} +\mathrm{DIC}(m) &\overset{\eqref{eq:mlr-dic-s1}}{=} 2 \left\langle D(\beta,\tau) \right\rangle - D(\left\langle \beta \right\rangle, \left\langle \tau \right\rangle) \\ +&\overset{\eqref{eq:mlr-exp-dev}}{=} 2 \left[ n \cdot \log(2\pi) - n \cdot \left[ \psi(a_n) - \log(b_n) \right] - \log|P| \right. \\ +&+ \left. \frac{a_n}{b_n} \cdot (y - X\mu_n)^\mathrm{T} P (y - X\mu_n) + \mathrm{tr}\left( X^\mathrm{T} P X \Lambda_n^{-1} \right) \right] \\ +&\overset{\eqref{eq:mlr-dev-exp}}{-} \left[ n \cdot \log(2\pi) - n \cdot \log\left(\frac{a_n}{b_n}\right) - \log|P| + \frac{a_n}{b_n} \cdot (y - X\mu_n)^\mathrm{T} P (y - X\mu_n) \right] \\ +&= n \cdot \log(2\pi) - 2 n \psi(a_n) + 2 n \log(b_n) + n \log(a_n) - n \log(b_n) - \log|P| \\ +&+ \frac{a_n}{b_n} (y - X\mu_n)^\mathrm{T} P (y - X\mu_n) + 2 \, \mathrm{tr}\left( X^\mathrm{T} P X \Lambda_n^{-1} \right) \\ +&= n \cdot \log(2\pi) - n \left[ 2 \psi(a_n) - \log(a_n) - \log(b_n) \right] - \log|P| \\ +&+ \frac{a_n}{b_n} (y - X\mu_n)^\mathrm{T} P (y - X\mu_n) + 2 \, \mathrm{tr}\left( X^\mathrm{T} P X \Lambda_n^{-1} \right) \; . +\end{split} +$$ + +This conforms to equation \eqref{eq:mlr-dic}. 
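The two building blocks of this proof, the expectation of the deviance \eqref{eq:mlr-exp-dev} and the deviance at the expectation \eqref{eq:mlr-dev-exp}, can be checked numerically. The following is a minimal sketch, not part of the proof: the simulated data, the prior hyperparameters, the sample size `S` and the helper `digamma` (a simple series approximation of $\psi$, written here to avoid extra dependencies) are all assumptions made for illustration.

```python
import math
import numpy as np

def digamma(x):
    """Approximate psi(x) for x > 0 via recurrence plus an asymptotic series (illustrative helper)."""
    shift = 0.0
    while x < 6.0:                       # psi(x) = psi(x + 1) - 1/x
        shift -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    return shift + math.log(x) - 0.5 / x - f * (1 / 12 - f * (1 / 120 - f / 252))

# Simulated data and hypothetical prior hyperparameters
rng = np.random.default_rng(0)
n, p = 20, 3
X = rng.normal(size=(n, p))
P = np.eye(n)                            # precision structure P = V^{-1}
y = X @ np.array([1.0, -0.5, 2.0]) + rng.normal(size=n)
mu0, L0, a0, b0 = np.zeros(p), np.eye(p), 1.0, 1.0

# Posterior hyperparameters of the normal-gamma posterior
Ln = X.T @ P @ X + L0
mun = np.linalg.solve(Ln, X.T @ P @ y + L0 @ mu0)
an = a0 + n / 2
bn = b0 + 0.5 * (y @ P @ y + mu0 @ L0 @ mu0 - mun @ Ln @ mun)

# Closed-form deviance at the expectation and expectation of the deviance
const = n * math.log(2 * math.pi) - np.linalg.slogdet(P)[1]
res = y - X @ mun
q = res @ P @ res
tr = np.trace(X.T @ P @ X @ np.linalg.inv(Ln))
dev_at_exp = const - n * math.log(an / bn) + an / bn * q
exp_dev = const - n * (digamma(an) - math.log(bn)) + an / bn * q + tr

# Monte Carlo estimate of the expected deviance under the posterior
S = 100_000
tau = rng.gamma(shape=an, scale=1.0 / bn, size=S)        # Gam(a_n, b_n) with rate b_n
C = np.linalg.cholesky(np.linalg.inv(Ln))
beta = mun + (rng.normal(size=(S, p)) @ C.T) / np.sqrt(tau)[:, None]
R = y - beta @ X.T
quad = np.sum((R @ P) * R, axis=1)
mc_exp_dev = np.mean(const - n * np.log(tau) + tau * quad)

# Effective number of parameters and DIC, following the definitions used in the proof
p_D = exp_dev - dev_at_exp
dic = dev_at_exp + 2 * p_D
print(exp_dev, mc_exp_dev, p_D, dic)
```

The sampled mean `mc_exp_dev` should agree with the closed-form `exp_dev` up to Monte Carlo error, and `p_D` is positive because $\log(a_n) > \psi(a_n)$.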
\ No newline at end of file diff --git a/P/mlr-dev.md b/P/mlr-dev.md new file mode 100644 index 00000000..eb3fe4d8 --- /dev/null +++ b/P/mlr-dev.md @@ -0,0 +1,104 @@ +--- +layout: proof +mathjax: true + +author: "Joram Soch" +affiliation: "BCCN Berlin" +e_mail: "joram.soch@bccn-berlin.de" +date: 2022-03-01 08:42:00 + +title: "Deviance for multiple linear regression" +chapter: "Statistical Models" +section: "Univariate normal data" +topic: "Multiple linear regression" +theorem: "Deviance function" + +sources: + +proof_id: "P312" +shortcut: "mlr-dev" +username: "JoramSoch" +--- + + +**Theorem:** Consider a [linear regression model](/D/mlr) $m$ with [correlation structure](/D/corrmat) $V$ + +$$ \label{eq:mlr} +m: \; y = X\beta + \varepsilon, \; \varepsilon \sim \mathcal{N}(0, \sigma^2 V) \; . +$$ + +Then, the [deviance](/D/dev) for this model is + +$$ \label{eq:mlr-dev-v1} +D(\beta,\sigma^2) = \mathrm{RSS}/\sigma^2 + n \cdot \left[ \log(\sigma^2) + \log(2\pi) \right] +$$ + +under [uncorrelated observations](/D/mlr), i.e. if $V = I_n$, and + +$$ \label{eq:mlr-dev-v2} +D(\beta,\sigma^2) = \mathrm{wRSS}/\sigma^2 + n \cdot \left[ \log(\sigma^2) + \log(2\pi) \right] + \log|V| \; , +$$ + +in the general case, i.e. if $V \neq I_n$, where $\mathrm{RSS}$ is the [residual sum of squares](/D/rss) and $\mathrm{wRSS}$ is the [weighted residual sum of squares](/P/mlr-wls2). 
+ + +**Proof:** The [likelihood function](/D/lf) for multiple linear regression [is given by](/P/mlr-mle) + +$$ \label{eq:mlr-lf} +\begin{split} +p(y|\beta,\sigma^2) &= \mathcal{N}(y; X\beta, \sigma^2 V) \\ +&= \sqrt{\frac{1}{(2\pi)^n |\sigma^2 V|}} \cdot \exp\left[ -\frac{1}{2} (y - X\beta)^\mathrm{T} (\sigma^2 V)^{-1} (y - X\beta) \right] \; , +\end{split} +$$ + +such that, with $\lvert \sigma^2 V \rvert = (\sigma^2)^n \lvert V \rvert$, the [log-likelihood function](/D/llf) for this model [becomes](/P/mlr-mle) + +$$ \label{eq:mlr-llf} +\begin{split} +\mathrm{LL}(\beta,\sigma^2) = &\log p(y|\beta,\sigma^2) \\ += &- \frac{n}{2} \log(2\pi) - \frac{n}{2} \log (\sigma^2) - \frac{1}{2} \log |V| - \frac{1}{2 \sigma^2} (y - X\beta)^\mathrm{T} V^{-1} (y - X\beta) \; . +\end{split} +$$ + + +The last term can be expressed in terms of the (weighted) [residual sum of squares](/D/rss) as + +$$ \label{eq:mll-rss} +\begin{split} +- \frac{1}{2 \sigma^2} (y - X\beta)^\mathrm{T} V^{-1} (y - X\beta) &= - \frac{1}{2 \sigma^2} (Wy-WX\beta)^\mathrm{T} (Wy-WX\beta) \\ +&= - \frac{1}{2 \sigma^2} \sum_{i=1}^{n} (W\varepsilon)_i^2 = - \frac{\mathrm{wRSS}}{2 \sigma^2} +\end{split} +$$ + +where $W = V^{-1/2}$. Plugging \eqref{eq:mll-rss} into \eqref{eq:mlr-llf} and multiplying by $-2$, we obtain the [deviance](/D/dev) as + +$$ \label{eq:mlr-dev-v2-qed} +\begin{split} +D(\beta,\sigma^2) &= -2 \, \mathrm{LL}(\beta,\sigma^2) \\ +&= -2 \left( - \frac{\mathrm{wRSS}}{2 \sigma^2} - \frac{n}{2} \log (\sigma^2) - \frac{n}{2} \log(2\pi) - \frac{1}{2} \log |V| \right) \\ +&= \mathrm{wRSS}/\sigma^2 + n \cdot \left[ \log(\sigma^2) + \log(2\pi) \right] + \log|V| +\end{split} +$$ + +which proves the result in \eqref{eq:mlr-dev-v2}. 
Assuming $V = I_n$, we have + +$$ \label{eq:mll-rss-iid} +\begin{split} +- \frac{1}{2 \sigma^2} (y - X\beta)^\mathrm{T} V^{-1} (y - X\beta) &= - \frac{1}{2 \sigma^2} (y - X\beta)^\mathrm{T} (y - X\beta) \\ +&= - \frac{1}{2 \sigma^2} \sum_{i=1}^{n} \varepsilon_i^2 = - \frac{\mathrm{RSS}}{2 \sigma^2} +\end{split} +$$ + +and + +$$ \label{eq:mlr-logdet-V-iid} +\frac{1}{2} \log|V| = \frac{1}{2} \log|I_n| = \frac{1}{2} \log 1 = 0 \; , +$$ + +such that + +$$ \label{eq:mlr-mll-v1-qed} +D(\beta,\sigma^2) = \mathrm{RSS}/\sigma^2 + n \cdot \left[ \log(\sigma^2) + \log(2\pi) \right] +$$ + +which proves the result in \eqref{eq:mlr-dev-v1}. This completes the proof. \ No newline at end of file
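The deviance formulas \eqref{eq:mlr-dev-v1} and \eqref{eq:mlr-dev-v2} can be compared against a direct evaluation of $-2 \log \mathcal{N}(y; X\beta, \sigma^2 V)$. The sketch below is only a numerical sanity check under assumed inputs: the function names `deviance_mlr` and `minus_two_loglik`, the design matrix, the AR(1)-like correlation matrix `V` and the parameter values are hypothetical and chosen for illustration.

```python
import math
import numpy as np

def deviance_mlr(y, X, beta, sigma2, V):
    """D(beta, sigma^2) = wRSS/sigma^2 + n [log(sigma^2) + log(2 pi)] + log|V|."""
    n = y.size
    # W = V^{-1/2} from the eigendecomposition of the symmetric positive definite V
    evals, evecs = np.linalg.eigh(V)
    W = evecs @ np.diag(evals ** -0.5) @ evecs.T
    # wRSS = (Wy - WX beta)^T (Wy - WX beta)
    wrss = float(np.sum((W @ (y - X @ beta)) ** 2))
    logdet_V = np.linalg.slogdet(V)[1]
    return wrss / sigma2 + n * (math.log(sigma2) + math.log(2 * math.pi)) + logdet_V

def minus_two_loglik(y, X, beta, sigma2, V):
    """-2 log N(y; X beta, sigma^2 V), computed independently via a Cholesky factor."""
    n = y.size
    L = np.linalg.cholesky(sigma2 * V)
    z = np.linalg.solve(L, y - X @ beta)
    return float(z @ z) + 2 * np.sum(np.log(np.diag(L))) + n * math.log(2 * math.pi)

# Hypothetical example inputs
rng = np.random.default_rng(1)
n = 8
X = np.column_stack([np.ones(n), np.arange(n)])
beta, sigma2 = np.array([0.5, 1.5]), 2.0
idx = np.arange(n)
V = 0.5 ** np.abs(idx[:, None] - idx[None, :])   # AR(1)-like correlation matrix
y = rng.normal(size=n)

d_formula = deviance_mlr(y, X, beta, sigma2, V)
d_direct = minus_two_loglik(y, X, beta, sigma2, V)

# Special case V = I_n: the weighted RSS reduces to the ordinary RSS and log|V| = 0
rss = float(np.sum((y - X @ beta) ** 2))
d_iid = deviance_mlr(y, X, beta, sigma2, np.eye(n))
d_iid_expected = rss / sigma2 + n * (math.log(sigma2) + math.log(2 * math.pi))
print(d_formula, d_direct, d_iid, d_iid_expected)
```

The two routes use different matrix factorizations, so agreement between `d_formula` and `d_direct` is a reasonable check that the closed form matches the likelihood.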