# Problem Set 10, Part Two: Due Tuesday, April 22 by 8am Eastern Daylight Time

## Name: David Millard

**Show your work on all problems!** Be sure to give credit to any
collaborators, or outside sources used in solving the problems. Note
that if using an outside source to do a calculation, you should use it
as a reference for the method, and actually carry out the calculation
yourself; it’s not sufficient to quote the results of a calculation
contained in an outside source.

Fill in your solutions in the notebook below, inserting markdown and/or code cells as needed.  Try to do reasonably well with the typesetting, but don't feel compelled to replicate my formatting exactly.  **You do NOT need to make random variables blue!**

In [None]:
%matplotlib inline

In [None]:
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = (8.0,5.0)
plt.rcParams['font.size'] = 14

## Bayesian Approach

From a Bayesian perspective, the question of whether the row and/or
column totals are held fixed when calculating a $p$-value is an
irrelevant one, because Bayesian probabilities concern statements about
a model given the actual observed data, not statements about what data
might have been observed in a hypothetical repeated experiment. So for a
$2\times 2$ contingency table, if we write $P(H_0|\{O_{ij}\})$ as the
posterior probability of $H_0$ being true given the observations, it
doesn’t matter if we also condition on the row or column totals, since
they are automatically given by the values in the table itself, e.g.,
$P(H_0|N,\{O_{ij}\})=P(H_0|\{r_i\},\{O_{ij}\})=P(H_0|\{O_{ij}\})$. But it
turns out that the context of the experiment does still matter, because
it defines the meaning of the hypotheses. One standard quantity in
Bayesian hypothesis testing is the *Bayes factor*
$p({{\mathbf{x}}}|H_a)/p({{\mathbf{x}}}|H_0)$ which measures how
strongly the data favor the alternative hypothesis $H_a$ over $H_0$. The
“evidence” $p({{\mathbf{x}}}|H)$ associated with a hypothesis $H$ is like a
sampling distribution, but it is appropriately averaged over possible
parameter values according to a prescription included in $H$. Suppose
the categorical observations ${{\mathbf{x}}}=\{(x_I,y_I)|I=1,\ldots,N\}$
are independent (which rules out the “lady tasting tea” scenario) so
that the sampling distribution for the sequence of observations (which
eliminates combinatorial factors which would cancel out anyway) is
$$p({{\mathbf{x}}}|\{p_{ij}\}) = p_{11}^{O_{11}}p_{12}^{O_{12}}p_{21}^{O_{21}}p_{22}^{O_{22}}$$
Evaluate the following, both for general $\{O_{ij}\}$ (use $\{r_i\}$,
$\{c_j\}$, and $N$ as appropriate to simplify your answer), and for the
example considered in class, where $O_{11}=1$, $O_{12}=6$, $O_{21}=8$,
and $O_{22}=2$.

\begin{align}
r_1 &= O_{11} + O_{12} = 1 + 6 = 7 \\
r_2 &= O_{21} + O_{22} = 8 + 2 = 10 \\
c_1 &= O_{11} + O_{21} = 1 + 8 = 9 \\
c_2 &= O_{12} + O_{22} = 6 + 2 = 8 \\
N &= r_1 + r_2 = 17
\end{align}
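
As a quick check on the arithmetic above, the marginal totals can be recomputed directly from the table. This is a minimal sketch in plain Python; the names `O`, `r`, `c`, and `N` are chosen here to mirror the notation and are not part of the assignment.

```python
# Observed 2x2 contingency table from the class example (rows i, columns j)
O = [[1, 6],
     [8, 2]]

r = [sum(row) for row in O]        # row totals r_1, r_2
c = [sum(col) for col in zip(*O)]  # column totals c_1, c_2
N = sum(r)                         # total number of observations

print("r =", r, " c =", c, " N =", N)
```

The row totals and the column totals must each add up to the same $N$, which is a cheap sanity check on any table entered by hand.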

**(a)** The evidence $$p({{\mathbf{x}}}|H_0)
  = \int_0^1\int_0^1 p({{\mathbf{x}}}|\{p_{ij}\})\,dp_{1\bullet}\,dp_{\bullet 1}$$
for a model $H_0$ in which the probability for an observation to land in
row $i$ and column $j$ is $p_{ij}=p_{i\bullet}\,p_{\bullet j}$, where
$p_{2\bullet}=1-p_{1\bullet}$ and $p_{\bullet 2}=1-p_{\bullet 1}$, and
the model assigns a uniform distribution to the parameters
$p_{1\bullet}$ and $p_{\bullet 1}$. You may find the Beta function
identity $\int_0^1 u^k(1-u)^\ell\,du=\frac{k!\ell!}{(k+\ell+1)!}$
useful.
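
The Beta function identity can be confirmed exactly for the exponents that occur in this problem. The sketch below expands $(1-u)^\ell$ with the binomial theorem and integrates term by term using rational arithmetic; the helper names `beta_integral_exact` and `beta_identity` are illustrative and not from the problem set.

```python
from fractions import Fraction
from math import comb, factorial

def beta_integral_exact(k, l):
    """Integrate u^k (1-u)^l over [0, 1] exactly by expanding (1-u)^l."""
    # (1-u)^l = sum_j C(l, j) (-u)^j, and the integral of u^(k+j) on [0, 1] is 1/(k+j+1)
    return sum(Fraction((-1) ** j * comb(l, j), k + j + 1) for j in range(l + 1))

def beta_identity(k, l):
    """Right-hand side of the identity: k! l! / (k + l + 1)!"""
    return Fraction(factorial(k) * factorial(l), factorial(k + l + 1))

# Exponent pairs that appear for the class example: (r1, r2), (c1, c2), and each row
for k, l in [(7, 10), (9, 8), (1, 6), (8, 2)]:
    assert beta_integral_exact(k, l) == beta_identity(k, l)
```

Because both sides are computed as exact fractions, agreement here is not subject to floating-point rounding.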

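\begin{align}
p(\mathbf{x} \mid H_0) 
&= \int_0^1 \int_0^1 p(\mathbf{x} \mid \{p_{ij}\}) \, dp_{1\bullet} \, dp_{\bullet 1} \\
&= \int_0^1 \int_0^1 
(p_{1\bullet} \, p_{\bullet 1})^{O_{11}} 
(p_{1\bullet} (1 - p_{\bullet 1}))^{O_{12}} 
((1 - p_{1\bullet}) \, p_{\bullet 1})^{O_{21}} 
((1 - p_{1\bullet})(1 - p_{\bullet 1}))^{O_{22}} \, 
dp_{1\bullet} \, dp_{\bullet 1} \\
&= \int_0^1 p_{1\bullet}^{r_1} (1 - p_{1\bullet})^{r_2} \, dp_{1\bullet} \cdot \int_0^1 p_{\bullet 1}^{c_1} (1 - p_{\bullet 1})^{c_2} \, dp_{\bullet 1} \\
&= \frac{r_1! \, r_2!}{(N+1)!} \cdot \frac{c_1! \, c_2!}{(N+1)!} \\
&= \frac{r_1! \, r_2! \, c_1! \, c_2!}{[(N+1)!]^2}
\end{align}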

\begin{align}
p(\mathbf{x} \mid H_0) 
&= \frac{7! \cdot 10! \cdot 9! \cdot 8!}{(18!)^2}
\end{align}
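
The evidence for $H_0$ is a very small number, so it helps to evaluate it both exactly and in log space. The following is a sketch under the assumption that the closed form $r_1!\,r_2!\,c_1!\,c_2!/[(N+1)!]^2$ derived above is the quantity of interest; the function names are hypothetical.

```python
from fractions import Fraction
from math import factorial, lgamma, exp

def evidence_H0(O11, O12, O21, O22):
    """Exact evidence for the independence model with uniform priors."""
    r1, r2 = O11 + O12, O21 + O22
    c1, c2 = O11 + O21, O12 + O22
    N = r1 + r2
    numerator = factorial(r1) * factorial(r2) * factorial(c1) * factorial(c2)
    return Fraction(numerator, factorial(N + 1) ** 2)

def log_evidence_H0(O11, O12, O21, O22):
    """Same quantity in log space, using lgamma(n + 1) = log(n!)."""
    r1, r2 = O11 + O12, O21 + O22
    c1, c2 = O11 + O21, O12 + O22
    N = r1 + r2
    return (lgamma(r1 + 1) + lgamma(r2 + 1) + lgamma(c1 + 1) + lgamma(c2 + 1)
            - 2.0 * lgamma(N + 2))

p_H0 = evidence_H0(1, 6, 8, 2)
log_p_H0 = log_evidence_H0(1, 6, 8, 2)
print(float(p_H0), exp(log_p_H0))
```

The log form is the safer choice for larger tables, where the factorials would overflow a float long before the ratio itself becomes unrepresentable.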

**(b)** The evidence $$p({{\mathbf{x}}}|H_1)
  = \int_0^1\int_0^1\int_0^1 p({{\mathbf{x}}}|\{p_{ij}\})\,dp_{1\bullet}\,dp^{(1)}_1\,dp^{(2)}_1$$
for a model $H_1$ in which the probability for an observation to land in
row $i$ and column $j$ is $p_{ij}=p_{i\bullet}\,p^{(i)}_j$, where
$p_{2\bullet}=1-p_{1\bullet}$, $p^{(i)}_2=1-p^{(i)}_1$, and the model
assigns a uniform distribution to the parameters
$p_{1\bullet}$, $p^{(1)}_1$, and $p^{(2)}_1$.

\begin{align}
p(\mathbf{x} \mid H_1) 
&= \int_0^1 \int_0^1 \int_0^1 p(\mathbf{x} \mid \{p_{ij}\}) \, dp_{1\bullet} \, dp^{(1)}_1 \, dp^{(2)}_1 \\
&= \int_0^1 \int_0^1 \int_0^1 
(p_{1\bullet} p^{(1)}_1)^{O_{11}} 
(p_{1\bullet} (1 - p^{(1)}_1))^{O_{12}} 
((1 - p_{1\bullet}) p^{(2)}_1)^{O_{21}} 
((1 - p_{1\bullet})(1 - p^{(2)}_1))^{O_{22}} \, 
dp_{1\bullet} \, dp^{(1)}_1 \, dp^{(2)}_1 \\
&= \int_0^1 \int_0^1 \int_0^1 
p_{1\bullet}^{O_{11} + O_{12}} 
(1 - p_{1\bullet})^{O_{21} + O_{22}} 
(p^{(1)}_1)^{O_{11}} (1 - p^{(1)}_1)^{O_{12}} 
(p^{(2)}_1)^{O_{21}} (1 - p^{(2)}_1)^{O_{22}} \,
dp_{1\bullet} \, dp^{(1)}_1 \, dp^{(2)}_1 \\
&= \left[ \int_0^1 p_{1\bullet}^{r_1} (1 - p_{1\bullet})^{r_2} \, dp_{1\bullet} \right]
\left[ \int_0^1 (p^{(1)}_1)^{O_{11}} (1 - p^{(1)}_1)^{O_{12}} \, dp^{(1)}_1 \right]
\left[ \int_0^1 (p^{(2)}_1)^{O_{21}} (1 - p^{(2)}_1)^{O_{22}} \, dp^{(2)}_1 \right] \\
&= \frac{r_1! \, r_2!}{(N+1)!} \cdot \frac{O_{11}! \, O_{12}!}{(O_{11} + O_{12} + 1)!} \cdot \frac{O_{21}! \, O_{22}!}{(O_{21} + O_{22} + 1)!}
\end{align}

\begin{align}
p(\mathbf{x} \mid H_1)
&= \frac{7! \cdot 10!}{18!} \cdot \frac{1! \cdot 6!}{(1 + 6 + 1)!} \cdot \frac{8! \cdot 2!}{(8 + 2 + 1)!} \\
&= \frac{7! \cdot 10! \cdot 1! \cdot 6! \cdot 8! \cdot 2!}{18! \cdot 8! \cdot 11!}
\end{align}
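
Since the $H_1$ integrand factorizes into three independent Beta integrals (one for the row probability and one for the column probability within each row), the evidence can be evaluated as a product of three calls to a single helper. This is a minimal sketch; `beta_int` is an illustrative name, not something defined in the problem set.

```python
from fractions import Fraction
from math import factorial

def beta_int(k, l):
    """Exact value of the Beta integral of u^k (1-u)^l on [0, 1]: k! l! / (k + l + 1)!"""
    return Fraction(factorial(k) * factorial(l), factorial(k + l + 1))

O11, O12, O21, O22 = 1, 6, 8, 2      # class example
r1, r2 = O11 + O12, O21 + O22        # row totals

# Row split, then the column split within row 1 and within row 2
p_H1 = beta_int(r1, r2) * beta_int(O11, O12) * beta_int(O21, O22)
print(float(p_H1))
```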

**(c)** The evidence $$p({{\mathbf{x}}}|H_2)
  = 6
  \int_0^{1-p_{11}-p_{12}}\int_0^{1-p_{11}}\int_0^1 p({{\mathbf{x}}}|\{p_{ij}\})
  \,dp_{11}\,dp_{12}\,dp_{21}$$ for a model $H_2$ in which any set of
non-negative probabilities satisfying $p_{11}+p_{12}+p_{21}+p_{22}=1$ is
equally likely. You may find the identity\
$\int_0^{1-u-v}\int_0^{1-u}\int_0^1 u^k v^\ell w^m
  (1-u-v-w)^n\,du\,dv\,dw=\frac{k!\ell!m!n!}{(k+\ell+m+n+3)!}$ useful.

\begin{align}
p(\mathbf{x} \mid H_2)
&= 6 \int_0^{1 - p_{11} - p_{12}} \int_0^{1 - p_{11}} \int_0^1 
p_{11}^{O_{11}} p_{12}^{O_{12}} p_{21}^{O_{21}} (1 - p_{11} - p_{12} - p_{21})^{O_{22}} 
\, dp_{11} \, dp_{12} \, dp_{21} \\
&= 6 \cdot \frac{O_{11}! \, O_{12}! \, O_{21}! \, O_{22}!}{(O_{11} + O_{12} + O_{21} + O_{22} + 3)!} \\
&= 6 \cdot \frac{O_{11}! \, O_{12}! \, O_{21}! \, O_{22}!}{(N + 3)!}
\end{align}

\begin{align}
p(\mathbf{x} \mid H_2)
&= 6 \cdot \frac{1! \cdot 6! \cdot 8! \cdot 2!}{20!}
\end{align}
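
The prefactor of 6 is $3!$, the normalization of a uniform density on the three-dimensional probability simplex, so $H_2$ corresponds to a flat Dirichlet prior with all four concentration parameters equal to 1. The sketch below compares the closed form above with the general Dirichlet evidence written in terms of log-gamma functions; the names `alpha` and `log_multivariate_beta` are introduced here for illustration.

```python
from math import lgamma, factorial, exp, prod

O = [1, 6, 8, 2]              # O11, O12, O21, O22 from the class example
N = sum(O)
alpha = [1.0, 1.0, 1.0, 1.0]  # flat Dirichlet prior (assumed equivalent to H2)

def log_multivariate_beta(a):
    """log of B(a) = prod_i Gamma(a_i) / Gamma(sum_i a_i)."""
    return sum(lgamma(x) for x in a) - lgamma(sum(a))

# Evidence as a ratio of multivariate Beta functions: B(alpha + O) / B(alpha)
posterior_params = [a + n for a, n in zip(alpha, O)]
log_p_H2 = log_multivariate_beta(posterior_params) - log_multivariate_beta(alpha)

# Closed form from the derivation: 6 * O11! O12! O21! O22! / (N + 3)!
p_H2_closed = 6 * prod(factorial(n) for n in O) / factorial(N + 3)

print(exp(log_p_H2), p_H2_closed)
```

Writing the evidence this way also makes it easy to try a non-uniform prior by changing `alpha`, although the problem as posed only asks for the uniform case.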

**(d)** The Bayes factor $p({{\mathbf{x}}}|H_1)/p({{\mathbf{x}}}|H_0)$, which is
a measure of how much the data favor a model with row-dependent column
probabilities over one with row-independent column probabilities.

\begin{align}
B_{10}
&= \frac{p(\mathbf{x} \mid H_1)}{p(\mathbf{x} \mid H_0)} \\
&= \frac{
    \dfrac{r_1! \, r_2!}{(N+1)!} \cdot \dfrac{O_{11}! \, O_{12}!}{(r_1 + 1)!} \cdot \dfrac{O_{21}! \, O_{22}!}{(r_2 + 1)!}
}{
    \dfrac{r_1! \, r_2! \, c_1! \, c_2!}{[(N+1)!]^2}
} \\
&= \frac{
    (N+1)! \, O_{11}! \, O_{12}! \, O_{21}! \, O_{22}!
}{
    (r_1 + 1)! \, (r_2 + 1)! \, c_1! \, c_2!
}
\end{align}

\begin{align}
B_{10}
&= \frac{
    18! \cdot 1! \cdot 6! \cdot 8! \cdot 2!
}{
    8! \cdot 11! \cdot 9! \cdot 8!
}
= \frac{221}{14} \approx 15.8
\end{align}
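
The Bayes factor can be evaluated exactly by dividing the part (b) evidence by the part (a) evidence. This sketch uses rational arithmetic so that no rounding enters; `beta_int` is a hypothetical helper for the Beta function identity.

```python
from fractions import Fraction
from math import factorial

def beta_int(k, l):
    """Exact value of the Beta integral k! l! / (k + l + 1)!"""
    return Fraction(factorial(k) * factorial(l), factorial(k + l + 1))

O11, O12, O21, O22 = 1, 6, 8, 2
r1, r2 = O11 + O12, O21 + O22
c1, c2 = O11 + O21, O12 + O22

p_H0 = beta_int(r1, r2) * beta_int(c1, c2)                         # part (a)
p_H1 = beta_int(r1, r2) * beta_int(O11, O12) * beta_int(O21, O22)  # part (b)

B10 = p_H1 / p_H0
print(B10, float(B10))  # 221/14, roughly 15.79
```

A value well above 1 means the data favor row-dependent column probabilities over the independence model.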


**(e)** The Bayes factor $p({{\mathbf{x}}}|H_2)/p({{\mathbf{x}}}|H_0)$, which is
a measure of how much the data favor a model of correlated categorical
data over one of uncorrelated data.

\begin{align}
B_{20}
&= \frac{p(\mathbf{x} \mid H_2)}{p(\mathbf{x} \mid H_0)} \\
&= \frac{
    \dfrac{6 \cdot O_{11}! \, O_{12}! \, O_{21}! \, O_{22}!}{(N+3)!}
}{
    \dfrac{r_1! \, r_2! \, c_1! \, c_2!}{[(N+1)!]^2}
} \\
&= \frac{
    6 \cdot [(N+1)!]^2 \cdot O_{11}! \, O_{12}! \, O_{21}! \, O_{22}!
}{
    (N+3)! \cdot r_1! \, r_2! \, c_1! \, c_2!
} \\
&= \frac{
    6 \cdot (N+1)! \cdot O_{11}! \, O_{12}! \, O_{21}! \, O_{22}!
}{
    (N+2)(N+3) \cdot r_1! \, r_2! \, c_1! \, c_2!
}
\end{align}

\begin{align}
B_{20}
&= \frac{
    6 \cdot (18!)^2 \cdot 1! \cdot 6! \cdot 8! \cdot 2!
}{
    20! \cdot 7! \cdot 10! \cdot 9! \cdot 8!
}
= \frac{14586}{665} \approx 21.9
\end{align}
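
As with the previous Bayes factor, this ratio can be checked exactly by dividing the part (c) evidence by the part (a) evidence. The sketch below assumes the closed forms from those parts; the variable names are illustrative.

```python
from fractions import Fraction
from math import factorial, prod

O = (1, 6, 8, 2)                 # O11, O12, O21, O22
O11, O12, O21, O22 = O
r1, r2 = O11 + O12, O21 + O22
c1, c2 = O11 + O21, O12 + O22
N = sum(O)

# Part (a): independence model with uniform priors on the two marginal probabilities
p_H0 = Fraction(factorial(r1) * factorial(r2) * factorial(c1) * factorial(c2),
                factorial(N + 1) ** 2)

# Part (c): uniform prior over the whole probability simplex
p_H2 = Fraction(6 * prod(factorial(n) for n in O), factorial(N + 3))

B20 = p_H2 / p_H0
print(B20, float(B20))  # 14586/665, roughly 21.93
```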