---
layout: proof
mathjax: true

author: "Joram Soch"
affiliation: "BCCN Berlin"
e_mail: "joram.soch@bccn-berlin.de"
date: 2023-12-08 07:14:47 -0800

title: "Maximum-a-posteriori estimation for multinomial observations"
chapter: "Statistical Models"
section: "Count data"
topic: "Multinomial observations"
theorem: "Maximum-a-posteriori estimation"

sources:

proof_id: "P428"
shortcut: "mult-map"
username: "JoramSoch"
---

Theorem: Let $y = [y_1, \ldots, y_k]$ be the number of observations in $k$ categories resulting from $n$ independent trials with unknown category probabilities $p = [p_1, \ldots, p_k]$, such that $y$ follows a multinomial distribution:

$$ \label{eq:Mult} y \sim \mathrm{Mult}(n,p) \; . $$

Moreover, assume a Dirichlet prior distribution over the model parameter $p$:

$$ \label{eq:Mult-prior} \mathrm{p}(p) = \mathrm{Dir}(p; \alpha_0) \; . $$

Then, the maximum-a-posteriori estimates of $p$ are

$$ \label{eq:Mult-MAP} \hat{p}_\mathrm{MAP} = \frac{\alpha_0+y-1}{\sum_{j=1}^k \alpha_{0j} + n - k} \; . $$

Proof: Given the prior distribution in \eqref{eq:Mult-prior}, the posterior distribution for multinomial observations is also a Dirichlet distribution

$$ \label{eq:Mult-post} \mathrm{p}(p|y) = \mathrm{Dir}(p; \alpha_n) $$

where the posterior hyperparameters are equal to

$$ \label{eq:Mult-post-par} \alpha_{nj} = \alpha_{0j} + y_j, \; j = 1,\ldots,k \; . $$
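
The update in \eqref{eq:Mult-post-par} simply adds the observed counts to the prior concentration parameters. A minimal Python sketch of this step follows; the function name `posterior_hyperparameters` and the example numbers are illustrative assumptions, not part of the proof.

```python
def posterior_hyperparameters(alpha0, y):
    """Return alpha_n with alpha_nj = alpha_0j + y_j for j = 1, ..., k."""
    if len(alpha0) != len(y):
        raise ValueError("alpha0 and y must have the same number of categories")
    return [a + c for a, c in zip(alpha0, y)]

# Hypothetical example: k = 3 categories, n = 10 trials
alpha0 = [2, 2, 2]
y = [3, 5, 2]
alpha_n = posterior_hyperparameters(alpha0, y)
print(alpha_n)  # [5, 7, 4]
```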

Provided that all $\alpha_i > 1$, the mode of the Dirichlet distribution is given by:

$$ \label{eq:Dir-mode} X \sim \mathrm{Dir}(\alpha) \quad \Rightarrow \quad \mathrm{mode}(X_i) = \frac{\alpha_i-1}{\sum_j \alpha_j - k} \; . $$
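
The mode formula in \eqref{eq:Dir-mode} can be checked numerically. The sketch below assumes that every $\alpha_i > 1$ (the formula needs this for an interior mode) and compares the log-density at the claimed mode with the log-density at random points on the simplex. The helper names `dirichlet_logpdf` and `dirichlet_mode` and the example values are hypothetical; only the Python standard library is used.

```python
import math
import random

def dirichlet_logpdf(x, alpha):
    """Log density of Dir(alpha) at a point x in the open simplex."""
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return log_norm + sum((a - 1) * math.log(xi) for a, xi in zip(alpha, x))

def dirichlet_mode(alpha):
    """Mode (alpha_i - 1) / (sum_j alpha_j - k); assumes every alpha_i > 1."""
    if any(a <= 1 for a in alpha):
        raise ValueError("formula requires all alpha_i > 1")
    k = len(alpha)
    denom = sum(alpha) - k
    return [(a - 1) / denom for a in alpha]

alpha = [5.0, 7.0, 4.0]  # hypothetical example values
mode = dirichlet_mode(alpha)

# Compare the density at the mode with random points drawn uniformly on the simplex
rng = random.Random(0)
best_random = -math.inf
for _ in range(10000):
    g = [rng.expovariate(1.0) for _ in alpha]
    if min(g) == 0.0:
        continue  # skip boundary points, where the log density is undefined
    s = sum(g)
    x = [gi / s for gi in g]
    best_random = max(best_random, dirichlet_logpdf(x, alpha))

# True if no sampled point has a higher density than the claimed mode
print(dirichlet_logpdf(mode, alpha) >= best_random)
```

A random search like this cannot prove the formula, but it would expose a wrong mode quickly.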

Applying \eqref{eq:Dir-mode} to \eqref{eq:Mult-post} with \eqref{eq:Mult-post-par}, the maximum-a-posteriori estimates of $p$ follow as

$$ \label{eq:Mult-MAP-s1}
\begin{split}
\hat{p}_{i,\mathrm{MAP}} &= \frac{\alpha_{ni} - 1}{\sum_j \alpha_{nj} - k} \\
&\overset{\eqref{eq:Mult-post-par}}{=} \frac{\alpha_{0i} + y_i - 1}{\sum_j (\alpha_{0j} + y_j) - k} \\
&= \frac{\alpha_{0i} + y_i - 1}{\sum_j \alpha_{0j} + \sum_j y_j - k} \; .
\end{split}
$$

Since $y_1 + \ldots + y_k = n$ by definition, this becomes

$$ \label{eq:Mult-MAP-s2} \hat{p}_{i,\mathrm{MAP}} = \frac{\alpha_{0i} + y_i - 1}{\sum_j \alpha_{0j} + n - k} $$

which, using the $1 \times k$ vectors $y$, $p$ and $\alpha_0$, can be written as:

$$ \label{eq:Mult-MAP-qed} \hat{p}_\mathrm{MAP} = \frac{\alpha_0+y-1}{\sum_{j=1}^k \alpha_{0j} + n - k} \; . $$
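
As an illustration of the final result, the following Python sketch evaluates the vector formula elementwise with exact rational arithmetic. The function name `multinomial_map` and the example data are assumptions made for this illustration. The check that every $\alpha_{0i} + y_i > 1$ is likewise an assumption, reflecting the condition under which the Dirichlet mode lies inside the simplex.

```python
from fractions import Fraction

def multinomial_map(y, alpha0):
    """MAP estimate (alpha_0 + y - 1) / (sum_j alpha_0j + n - k), elementwise."""
    if len(y) != len(alpha0):
        raise ValueError("y and alpha0 must have the same number of categories")
    k = len(y)   # number of categories
    n = sum(y)   # number of trials, since y_1 + ... + y_k = n
    # Assumption: all posterior hyperparameters exceed 1
    if any(a + c <= 1 for a, c in zip(alpha0, y)):
        raise ValueError("requires alpha_0i + y_i > 1 for every category")
    denom = sum(Fraction(a) for a in alpha0) + n - k
    return [(Fraction(a) + c - 1) / denom for a, c in zip(alpha0, y)]

# Hypothetical example: k = 3, n = 10, symmetric prior alpha_0 = [2, 2, 2]
p_map = multinomial_map([3, 5, 2], [2, 2, 2])
print(p_map)       # [Fraction(4, 13), Fraction(6, 13), Fraction(3, 13)]
print(sum(p_map))  # 1
```

With the flat prior `alpha0 = [1, 1, 1]` and the same counts, the function returns `[3/10, 1/2, 1/5]`, which are the relative frequencies $y/n$.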