# Supplementary Exercises 4.8

```{warning}
This page contains solutions! We recommend attempting each problem before peeking.
```


```{note}
The old question 1 was a duplicate of [4.3 question 16](https://areding.github.io/6420-pymc/unit4/SupplementaryExercises43.html#counts-of-alpha), so we're going to start numbering with 2 in order to maintain consistency with the old materials.
```

## 2. Mosaic Virus

A single leaf is taken from each of 8 different tobacco plants. Each leaf is then divided in half, and each half is given one of two preparations of mosaic virus. Researchers wanted to examine whether there is a difference in the mean number of lesions from the two preparations. Here is the raw data:

$$
\begin{array}{ccc}
\text{Plant} & \text{Prep 1} & \text{Prep 2} \\
1 & 38 & 29 \\
2 & 40 & 35 \\
3 & 26 & 31 \\
4 & 33 & 31 \\
5 & 21 & 14 \\
6 & 27 & 37 \\
7 & 41 & 22 \\
8 & 36 & 25 \\
\end{array}
$$

Assume a normal distribution for the differences between the paired samples. Using a PPL, find:

1. the 95% credible set for $\mu_1 - \mu_2$, and
2. the posterior probability of the hypothesis $H_1: \mu_1 - \mu_2 \geq 0$.

Use noninformative priors.

Hint: Since this is a paired two sample problem, a single model should be placed on the difference.

```{admonition} Solution
:class: tip, dropdown


```


In [None]:
import pymc as pm
import numpy as np
import arviz as az

prep1 = np.array([38, 40, 26, 33, 21, 27, 41, 36])
prep2 = np.array([29, 35, 31, 31, 14, 37, 22, 25])
diff = prep1 - prep2

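with pm.Model() as m:
    # vague priors standing in for a noninformative choice
    tau = pm.Gamma("precision", 0.001, 0.001)
    mu = pm.Normal("mean", 0, tau=0.0001)
    sigma2 = pm.Deterministic("variance", 1 / tau)

    # a single normal model placed on the paired differences
    pm.Normal("likelihood", mu, tau=tau, observed=diff)

    trace = pm.sample(3000)

# part 2: fraction of posterior draws with mu_1 - mu_2 >= 0
print((trace.posterior["mean"] >= 0).mean().item())

# part 1: the 95% HDI reported for "mean" is the credible set for mu_1 - mu_2
az.summary(trace, hdi_prob=0.95)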
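As a rough cross-check on the sampler output, the same two quantities can be approximated without a PPL. The sketch below assumes the standard noninformative prior $\pi(\mu, \sigma^2) \propto 1/\sigma^2$ rather than the vague Gamma and Normal priors in the PyMC model, so the numbers should be close but not identical. Under that assumption, $(n-1)s^2/\sigma^2 \mid d \sim \chi^2_{n-1}$ and $\mu \mid \sigma^2, d \sim N(\bar{d}, \sigma^2/n)$, which can be sampled directly. The names `d_bar`, `s2`, `draws`, and `prob_h1` are illustrative only.

```python
import math
import random
import statistics

prep1 = [38, 40, 26, 33, 21, 27, 41, 36]
prep2 = [29, 35, 31, 31, 14, 37, 22, 25]
d = [a - b for a, b in zip(prep1, prep2)]  # paired differences

n = len(d)
d_bar = statistics.mean(d)
s2 = statistics.variance(d)  # sample variance with n - 1 in the denominator

rng = random.Random(0)
draws = []
for _ in range(100_000):
    # chi-square with n - 1 degrees of freedom is Gamma(shape=(n-1)/2, scale=2)
    chi2 = rng.gammavariate((n - 1) / 2, 2.0)
    sigma2 = (n - 1) * s2 / chi2                           # draw of sigma^2 | d
    draws.append(rng.gauss(d_bar, math.sqrt(sigma2 / n)))  # draw of mu | sigma^2, d

draws.sort()
lower = draws[int(0.025 * len(draws))]  # equal-tailed 95% credible set
upper = draws[int(0.975 * len(draws))]
prob_h1 = sum(mu >= 0 for mu in draws) / len(draws)

print(f"95% credible set: ({lower:.2f}, {upper:.2f})")
print(f"P(mu_1 - mu_2 >= 0 | data) = {prob_h1:.3f}")
```

Because the marginal posterior of $\mu$ is symmetric under this prior, the equal-tailed set computed here should roughly agree with the HDI reported by `az.summary`.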

## 3. FIGO



```{admonition} Solution
:class: tip, dropdown


```

## 4. Histocompatibility

A patient who is waiting for an organ transplant needs a histocompatible donor who matches the patient’s human leukocyte antigen (HLA) type.

For a given patient, the number of matching donors per 1000 National Blood Bank records is modeled as Poisson with unknown rate $\lambda$. If a randomly selected group of 1000 records showed exactly one match, estimate $\lambda$ in Bayesian fashion.

For $\lambda$ assume:

1. Gamma $\text{Ga}(\alpha=2, \beta=1)$ prior;
2. flat prior $\pi(\lambda) = 1$, for $\lambda > 0$;
3. invariance prior $\pi(\lambda) = \frac{1}{\lambda}$, for $\lambda > 0$;
4. Jeffreys prior $\pi(\lambda) = \sqrt{\frac{1}{\lambda}}$, for $\lambda > 0$.


```{admonition} Solution
:class: tip, dropdown

1. Gamma $\text{Ga}(\alpha=2, \beta=1)$ prior;

Poisson PMF $\propto e^{-\lambda}\lambda^k$

To keep things general, let's work with $n$ independent datapoints $k_1, \ldots, k_n$, even though this problem has only a single datapoint, $k_1 = 1$. The likelihood is then proportional to

$\prod_{i=1}^{n} e^{-\lambda}\lambda^{k_i} = e^{- n\lambda}\lambda^{\sum_{i=1}^{n} k_i}$

Gamma PDF $\propto x^{\alpha -1}e^{-\beta x}$

$$
\begin{align*}
\pi(\lambda \mid k) &\propto \left(e^{- n\lambda}\lambda^{\sum_{i=1}^{n} k_i}\right) \left(\lambda^{\alpha -1}e^{-\beta \lambda}\right) \\
&\propto \lambda^{(\alpha -1) + \sum_{i=1}^{n} k_i}e^{-\beta \lambda - n\lambda} \\
&\propto \lambda^{(\alpha + \sum_{i=1}^{n} k_i) - 1}e^{-(\beta + n)\lambda} \\
&= Ga(\alpha + \sum_{i=1}^{n} k_i, \beta + n)
\end{align*}
$$

We recognize the $Ga(\alpha + \sum_{i=1}^{n} k_i, \beta + n)$ posterior, which comes out to $Ga(3, 2)$ in this case. Our Bayes estimate is then the mean of the posterior, which is $\frac{\alpha}{\beta} = \frac{3}{2}$.

2. flat prior $\pi(\lambda) = 1$, for $\lambda > 0$;

The same procedure applies:

$$
\begin{align*}
\pi(\lambda \mid k) &\propto \left(e^{- n\lambda}\lambda^{\sum_{i=1}^{n} k_i}\right) \mathbf{1}(\lambda > 0) \\
&\propto \lambda^{\left(\sum_{i=1}^{n} k_i\right) + 1 - 1} e^{- n\lambda} \mathbf{1}(\lambda > 0) \\
&= Ga(1 + \sum_{i=1}^{n} k_i, n)
\end{align*}
$$

In our case this is $Ga(2, 1)$, with a mean of 2.

3. invariance prior $\pi(\lambda) = \frac{1}{\lambda}$, for $\lambda > 0$;

$$
\begin{align*}
\pi(\lambda \mid k) & \propto \left(e^{- n\lambda}\lambda^{\sum_{i=1}^{n} k_i}\right) \frac{1}{\lambda}\mathbf{1}(\lambda > 0) \\
&\propto e^{-n\lambda} \lambda^{-1 + \sum_{i=1}^{n} k_i} \mathbf{1}(\lambda > 0) \\
& = Ga(\sum_{i=1}^{n} k_i, n) 
\end{align*}
$$


We identify the $Ga(1, 1)$ distribution (equivalently, the $Exp(1)$ distribution), which has a mean of 1.

4. Jeffreys prior $\pi(\lambda) = \sqrt{\frac{1}{\lambda}}$, for $\lambda > 0$.

$$
\begin{align*}
\pi(\lambda \mid k) &\propto \left(e^{- n\lambda}\lambda^{\sum_{i=1}^{n} k_i}\right) \times \sqrt{\frac{1}{\lambda}}\mathbf{1}(\lambda > 0) \\
&\propto \lambda^{- 1/2 + \sum_{i=1}^{n} k_i} e^{-n\lambda} \mathbf{1}(\lambda > 0) \\
&\propto \lambda^{- 1 + \left(1/2 + \sum_{i=1}^{n} k_i\right)} e^{-n\lambda} \mathbf{1}(\lambda > 0) \\
&= Ga(1/2 + \sum_{i=1}^{n} k_i, n)
\end{align*}
$$

In our case the posterior is $Ga(3/2, 1)$ with a mean of $3/2$.

Note that the priors in (2)-(4) are not proper densities (their integrals are not finite); however, the resulting posteriors are proper.

```
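The four conjugate updates above follow one pattern, which can be sketched in a few lines. This is only an illustration: it treats each improper prior as the kernel $\lambda^{a-1}$ of a limiting "$Ga(a, 0)$" (an assumption made for bookkeeping, not a proper distribution), and the helper name `gamma_posterior` is hypothetical.

```python
from fractions import Fraction


def gamma_posterior(prior_shape, prior_rate, counts):
    """Poisson-Gamma update: Ga(a, b) prior -> Ga(a + sum(k_i), b + n) posterior."""
    return prior_shape + sum(counts), prior_rate + len(counts)


counts = [1]  # a single batch of 1000 records with exactly one match

# (shape, rate) of each prior kernel lambda^(shape - 1) * exp(-rate * lambda)
priors = {
    "Ga(2, 1)": (Fraction(2), Fraction(1)),
    "flat": (Fraction(1), Fraction(0)),
    "invariance": (Fraction(0), Fraction(0)),
    "Jeffreys": (Fraction(1, 2), Fraction(0)),
}

posterior_means = {}
for name, (a, b) in priors.items():
    shape, rate = gamma_posterior(a, b, counts)
    posterior_means[name] = shape / rate  # mean of Ga(shape, rate)
    print(f"{name}: posterior Ga({shape}, {rate}), mean {posterior_means[name]}")
```

Changing `counts` to a longer list shows how the general $n$-datapoint formulas behave beyond this single-observation problem.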

## 5. Neurons Fire in Potter's Lab



```{admonition} Solution
:class: tip, dropdown


```

## 6. Elicit Inverse Gamma Prior



```{admonition} Solution
:class: tip, dropdown


```


## 7. Derive Jeffreys’ Priors for Poisson $\lambda$, Bernoulli $p$, and Geometric $p$

Recall that Jeffreys’ prior for parameter $\theta$ in the likelihood $f(x | \theta)$ is defined as

$$\pi(\theta) \propto \left| \text{det}(I(\theta)) \right|^{1/2}$$

where, for univariate parameters,

$$I(\theta) = E \left[ \left( \frac{d \log f(x | \theta)}{d\theta} \right)^2 \right] = -E \left[ \frac{d^2 \log f(x | \theta)}{d\theta^2} \right]$$

and expectation is taken with respect to the random variable $X \sim f(x | \theta)$.

**(a) Show that Jeffreys’ prior for Poisson distribution** $f(x | \lambda) = \frac{\lambda^x}{x!} e^{-\lambda}$, $\lambda \geq 0$, **is** $\pi(\lambda) = \sqrt{\frac{1}{\lambda}}$.

**(b) Show that Jeffreys’ prior for Bernoulli distribution** $f(x | p) = p^x (1 - p)^{1-x}$, $0 \leq p \leq 1$, **is** $\pi(p) \propto \frac{1}{\sqrt{p(1-p)}}$, which is the beta $\text{Be}(1/2, 1/2)$ distribution (or Arcsin distribution).

**(c) Show that Jeffreys’ prior for Geometric distribution** $f(x | p) = (1 - p)^{x-1} p$, $x = 1, 2, \ldots$ ; $0 \leq p \leq 1$, **is** $\pi(p) \propto \frac{1}{p \sqrt{1-p}}$.

---


**(a)**
For the Poisson distribution with likelihood function:
$$f(x | \lambda) = \frac{\lambda^x e^{-\lambda}}{x!}$$

First, we differentiate the log-likelihood with respect to $\lambda$:

$$\frac{d}{d\lambda} \log \left( \frac{\lambda^x e^{-\lambda}}{x!} \right) = \frac{d}{d\lambda} (x \log \lambda - \lambda)$$

Which gives:

$$\frac{x}{\lambda} - 1$$

Now, the Fisher Information $I(\lambda)$ is given by:

$$I(\lambda) = E\left[ \left( \frac{x}{\lambda} - 1 \right)^2 \right] = \frac{E[x^2]}{\lambda^2} - \frac{2E[x]}{\lambda} + 1$$

Given that $E[x^2] = Var(x) + (E[x])^2$ and for a Poisson distribution, $E[x] = \lambda$ and $Var(x) = \lambda$:

$$E[x^2] = \lambda + \lambda^2$$

Substituting this in, we get:

$$I(\lambda) = \frac{\lambda + \lambda^2}{\lambda^2} - \frac{2\lambda}{\lambda} + 1 = \frac{1}{\lambda}$$

Thus, the Jeffreys’ prior is:

$$\pi(\lambda) \propto \sqrt{\frac{1}{\lambda}}$$

**(b)**
For the Bernoulli distribution:
$$f(x | p) = p^x (1-p)^{1-x}$$

The log-likelihood is:

$$L = x \log(p) + (1 - x) \log(1 - p)$$

Differentiating $L$ with respect to $p$ we get:

$$\frac{\partial L}{\partial p} = \frac{x}{p} - \frac{1-x}{1-p}$$

And the second derivative is:

$$\frac{\partial^2 L}{\partial p^2} = -\frac{x}{p^2} - \frac{1-x}{(1-p)^2}$$

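For a Bernoulli distribution, $E[x] = p$, so taking the negative expectation of the second derivative gives the Fisher Information $I(p)$:

$$I(p) = -E\left[\frac{\partial^2 L}{\partial p^2}\right] = \frac{p}{p^2} + \frac{1-p}{(1-p)^2} = \frac{1}{p} + \frac{1}{1-p} = \frac{1}{p(1-p)}$$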

So, the Jeffreys’ prior is:

$$\pi(p) \propto \frac{1}{\sqrt{p(1-p)}}$$

**(c)**
For the Geometric distribution:

$$f(x | p) = (1-p)^{x-1} p$$

The log-likelihood is:

$$L = (x-1) \log(1-p) + \log(p)$$

Differentiating $L$ with respect to $p$:

$$\frac{\partial L}{\partial p} = \frac{1}{p} - \frac{x-1}{1-p}$$

And the second derivative is:

$$\frac{\partial^2 L}{\partial p^2} = -\frac{1}{p^2} - \frac{x-1}{(1-p)^2}$$

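For a Geometric distribution, $E[x] = \frac{1}{p}$, so taking the negative expectation of the second derivative gives the Fisher Information $I(p)$:

$$I(p) = -E\left[\frac{\partial^2 L}{\partial p^2}\right] = \frac{1}{p^2} + \frac{1/p - 1}{(1-p)^2} = \frac{1}{p^2} + \frac{1}{p(1-p)} = \frac{1}{p^2(1-p)}$$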

So, the Jeffreys’ prior is:

$$\pi(p) \propto \frac{1}{p \sqrt{1-p}}$$
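The three Fisher information results can be sanity-checked numerically by computing $E[(d \log f / d\theta)^2]$ as an explicit sum over the support. The sketch below is only a check at one arbitrary parameter value per model (`lam = 2.5` and `p = 0.3` are assumptions chosen for illustration), with the infinite supports truncated where the remaining mass is negligible. The helper `fisher_information` is hypothetical.

```python
import math


def fisher_information(pmf, score, support):
    """E[score(x)^2] under pmf, summed over the given (possibly truncated) support."""
    return sum(pmf(x) * score(x) ** 2 for x in support)


lam, p = 2.5, 0.3

# Poisson: score is x / lambda - 1
pois = fisher_information(
    pmf=lambda x: math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1)),
    score=lambda x: x / lam - 1,
    support=range(200),
)

# Bernoulli: score is x / p - (1 - x) / (1 - p)
bern = fisher_information(
    pmf=lambda x: p**x * (1 - p) ** (1 - x),
    score=lambda x: x / p - (1 - x) / (1 - p),
    support=(0, 1),
)

# Geometric on x = 1, 2, ...: score is 1 / p - (x - 1) / (1 - p)
geom = fisher_information(
    pmf=lambda x: (1 - p) ** (x - 1) * p,
    score=lambda x: 1 / p - (x - 1) / (1 - p),
    support=range(1, 500),
)

print(pois, 1 / lam)                # both should be close to 1 / lambda
print(bern, 1 / (p * (1 - p)))      # both should be close to 1 / (p (1 - p))
print(geom, 1 / (p**2 * (1 - p)))   # both should be close to 1 / (p^2 (1 - p))
```

Taking the square root of each value gives the unnormalized Jeffreys prior density at that parameter value.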

```{admonition} Solution
:class: tip, dropdown


```

## 8. Two Scenarios for the Probability of Success



```{admonition} Solution
:class: tip, dropdown


```

Possible typo in question 8(b): the posterior is given as $\text{Be}(1, 21/2)$, but the posterior expectation is given as $2/21$. A $\text{Be}(1, 19/2)$ posterior gives $\frac{\alpha}{\alpha + \beta} = \frac{2}{21}$, which matches the stated expectation.
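The arithmetic behind this note can be checked with exact fractions. This is only a sketch: `beta_mean` is a hypothetical helper, and it checks the mean formula $\alpha / (\alpha + \beta)$, not which posterior is actually correct for question 8(b).

```python
from fractions import Fraction


def beta_mean(alpha, beta):
    """Mean of a Be(alpha, beta) distribution."""
    return alpha / (alpha + beta)


mean_21_half = beta_mean(Fraction(1), Fraction(21, 2))
mean_19_half = beta_mean(Fraction(1), Fraction(19, 2))

print("Be(1, 21/2) mean:", mean_21_half)
print("Be(1, 19/2) mean:", mean_19_half)
```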

## 9. Jeffreys' Prior for Normal Precision



```{admonition} Solution
:class: tip, dropdown


```

## 10. Derive Jeffreys' Prior for Maxwell's $\theta$



```{admonition} Solution
:class: tip, dropdown


```

## 11. "Quasi" Jeffreys' Priors



```{admonition} Solution
:class: tip, dropdown


```

## 12. Haldane Prior for Binomial p




```{admonition} Solution
:class: tip, dropdown


```

## 13. Eliciting a Normal Prior




```{admonition} Solution
:class: tip, dropdown


```

## 14. Jigsaw

An experiment with a sample of 18 nursery-school children involved the elapsed time required to put together a small jigsaw puzzle. The times were: 


```{admonition} Solution
:class: tip, dropdown


```

https://colab.research.google.com/drive/19OEW7E1wOfcvdpytEiLQ--VF4YlsRTcD

In [21]:
data = [
    3.1,
    3.2,
    3.4,
    3.6,
    3.7,
    4.2,
    4.3,
    4.5,
    4.7,
    5.2,
    5.6,
    6.0,
    6.1,
    6.6,
    7.3,
    8.2,
    10.8,
    13.6,
]

Assume that data are coming from a normal distribution $N (\mu, \sigma^2)$ with $\sigma^2 = 8$. For parameter $\mu$, set a normal prior with mean 5 and variance 6.

------------------

(a) Find the Bayes estimator and 95% credible set for population mean $\mu$.

We can define our model like this:
$$
\begin{align}
x_i|\mu &\sim N(\mu, 8) \\
\mu & \sim N(5, 6)
\end{align}
$$
This is the Normal-Normal conjugate pair for a fixed variance and random mean, with $n=18$ and $\bar{X} \approx 5.78333$. Below, $\mu_0 = 5$ and $\tau^2 = 6$ are the prior mean and variance, and $\sigma^2 = 8$ is the known data variance.

Our posterior is then:
$$
\begin{align}
\mu \mid x &\sim N\left(\frac{\tau^2}{\tau^2 +\sigma^2/n}\bar{X} + \frac{\sigma^2/n}{\tau^2 + \sigma^2/n}\mu_0, \frac{\tau^2\sigma^2/n}{\tau^2+ \sigma^2/n}\right) \\
&= N\left(\frac{6}{6 + 8/18}\bar{X} + \frac{8/18}{6 + 8/18}(5), \frac{6(8/18)}{6+8/18}\right) \\
&= N(5.72931, 0.41379)
\end{align}
$$
Our Bayes estimator will be the posterior mean, 5.72931.
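The posterior mean and variance quoted above can be reproduced directly from the data. This is a minimal sketch of the conjugate update using only the standard library. The variable names (`mu0`, `tau2`, `weight`, and so on) are illustrative.

```python
from statistics import fmean

data = [
    3.1, 3.2, 3.4, 3.6, 3.7, 4.2, 4.3, 4.5, 4.7,
    5.2, 5.6, 6.0, 6.1, 6.6, 7.3, 8.2, 10.8, 13.6,
]

sigma2 = 8.0           # known data variance
mu0, tau2 = 5.0, 6.0   # prior mean and variance for mu

n = len(data)
x_bar = fmean(data)

# weight on the sample mean; the prior mean gets the remaining weight
weight = tau2 / (tau2 + sigma2 / n)
post_mean = weight * x_bar + (1 - weight) * mu0
post_var = tau2 * (sigma2 / n) / (tau2 + sigma2 / n)

print(f"n = {n}, x_bar = {x_bar:.5f}")
print(f"posterior: N({post_mean:.5f}, {post_var:.5f})")
```

With 18 observations the weight on the sample mean is about 0.93, so the posterior mean sits much closer to $\bar{X}$ than to the prior mean of 5.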

The 95% equitailed credible set:

In [16]:
import numpy as np
import scipy.stats as ss

alpha = 0.05
mean = 5.72931
var = 0.41379

post = ss.norm(loc=mean, scale=var**0.5)

post.ppf(alpha / 2), post.ppf(1 - alpha / 2)

(4.468533554544652, 6.990086445455348)

The HPD credible set will be the same for the normal distribution because of symmetry:

In [18]:
from scipy.optimize import fsolve


def conditions(x, post, alpha):
    lwr, upr = x

    cond_1 = post.pdf(upr) - post.pdf(lwr)
    cond_2 = post.cdf(upr) - post.cdf(lwr) - (1 - alpha)

    return cond_1, cond_2


fsolve(conditions, (4.5, 7.0), args=(post, alpha))

array([4.46853355, 6.99008645])

(b) Find the posterior probability of hypothesis $H_0 : \mu \leq 5$.



In [20]:
post.cdf(5)

0.12844704607549606

(c) What is your prediction for a single future observation?

Since we only want a single future observation, we can use this from Greg's U4L13 notes (linked under the lecture videos):

$$
\hat{X}_{n+1} = \int_\theta \mu(\theta) \pi(\theta \mid x_i) d\theta
$$

where $\mu(\theta) = \mathbb{E}[X \mid \theta] = \int x f(x \mid \theta) dx$ is the mean of the original likelihood (the distribution of $X|\theta$).

We can use the [`.expect()`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.rv_continuous.expect.html) method on our posterior to calculate this single value, since it takes the expectation of a function with respect to our posterior. For the normal likelihood the mean is the parameter itself, $\mu(\theta) = \theta$, so the function to integrate is the identity, which is the default for `.expect()`.

As the original hints for the solution said, this will equal the posterior mean.

In [45]:
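# E[X | mu] = mu for the normal likelihood, so the prediction is the posterior mean;
# .expect() with no arguments integrates the identity function against the posterior
post.expect()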

5.729310000000003
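The point prediction does not show how uncertain a single future observation is. As a hedged supplement, the sketch below uses the standard Normal-Normal result that the posterior predictive distribution is $N(\mu_n, \tau_n^2 + \sigma^2)$, where $\mu_n$ and $\tau_n^2$ are the posterior mean and variance found in part (a). The 95% interval is an addition for illustration and is not part of the original question.

```python
from statistics import NormalDist

post_mean, post_var = 5.72931, 0.41379  # posterior of mu from part (a)
sigma2 = 8.0                            # known data variance

# predictive variance = posterior uncertainty about mu + sampling variance
predictive = NormalDist(mu=post_mean, sigma=(post_var + sigma2) ** 0.5)

pred_lower = predictive.inv_cdf(0.025)
pred_upper = predictive.inv_cdf(0.975)

print(f"predictive mean: {predictive.mean:.5f}")
print(f"95% predictive interval: ({pred_lower:.3f}, {pred_upper:.3f})")
```

The predictive mean equals the posterior mean, while the interval is far wider than the credible set for $\mu$ because it includes the variability of an individual child's time.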

A PyMC solution:

In [59]:
import pymc as pm
import arviz as az

with pm.Model() as m:
    mu_prior = pm.Normal("mu", 5, sigma=6**0.5)

    likelihood = pm.Normal("lik", mu_prior, sigma=8**0.5, observed=data)

    trace = pm.sample(5000)
    pm.sample_posterior_predictive(trace, extend_inferencedata=True)

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [mu]


Sampling 4 chains for 1_000 tune and 5_000 draw iterations (4_000 + 20_000 draws total) took 1 seconds.
Sampling: [lik]


In [60]:
az.summary(trace, hdi_prob=0.95, kind="stats")

| | mean | sd | hdi_2.5% | hdi_97.5% |
|---|---|---|---|---|
| mu | 5.721 | 0.647 | 4.503 | 7.033 |


In [91]:
print(trace.posterior_predictive.to_array().mean())

<xarray.DataArray ()>
array(5.72300883)


## 15. Jeremy and Poisson



```{admonition} Solution
:class: tip, dropdown


```

## 16. NPEB for *p* in the Geometric Distribution



```{admonition} Solution
:class: tip, dropdown


```

## 17. Lifetimes and Predictive Distribution



```{admonition} Solution
:class: tip, dropdown


```

## 18. Normal Likelihood with Improper Priors



```{admonition} Solution
:class: tip, dropdown


```