# Supplementary Exercises 4.8

## 7. Derive Jeffreys’ Priors for Poisson $\lambda$, Bernoulli $p$, and Geometric $p$.

Recall that Jeffreys’ prior for parameter $\theta$ in the likelihood $f(x | \theta)$ is defined as

$$\pi(\theta) \propto \left| \text{det}(I(\theta)) \right|^{1/2}$$

where, for univariate parameters,

$$I(\theta) = E \left[ \left( \frac{d \log f(x | \theta)}{d\theta} \right)^2 \right] = -E \left[ \frac{d^2 \log f(x | \theta)}{d\theta^2} \right]$$

and expectation is taken with respect to the random variable $X \sim f(x | \theta)$.

**(a) Show that Jeffreys’ prior for the Poisson distribution** $f(x | \lambda) = \frac{\lambda^x}{x!} e^{-\lambda}$, $\lambda \geq 0$, **is** $\pi(\lambda) \propto \sqrt{\frac{1}{\lambda}}$.

**(b) Show that Jeffreys’ prior for the Bernoulli distribution** $f(x | p) = p^x (1 - p)^{1-x}$, $0 \leq p \leq 1$, **is** $\pi(p) \propto \frac{1}{\sqrt{p(1-p)}}$, which is the kernel of the beta $\text{Be}(1/2, 1/2)$ distribution (the arcsine distribution).

**(c) Show that Jeffreys’ prior for the Geometric distribution** $f(x | p) = (1 - p)^{x-1} p$, $x = 1, 2, \ldots$; $0 \leq p \leq 1$, **is** $\pi(p) \propto \frac{1}{p \sqrt{1-p}}$.

---


**(a)**
For the Poisson distribution with likelihood function:
$$f(x | \lambda) = \frac{\lambda^x e^{-\lambda}}{x!}$$

The log-likelihood is $\log f(x | \lambda) = x \log \lambda - \lambda - \log x!$. The term $\log x!$ does not depend on $\lambda$, so differentiating with respect to $\lambda$ gives:

$$\frac{d}{d\lambda} \log \left( \frac{\lambda^x e^{-\lambda}}{x!} \right) = \frac{d}{d\lambda} (x \log \lambda - \lambda) = \frac{x}{\lambda} - 1$$

Now, the Fisher Information $I(\lambda)$ is given by:

$$I(\lambda) = E\left[ \left( \frac{x}{\lambda} - 1 \right)^2 \right] = \frac{E[x^2]}{\lambda^2} - \frac{2E[x]}{\lambda} + 1$$

Given that $E[x^2] = Var(x) + (E[x])^2$ and for a Poisson distribution, $E[x] = \lambda$ and $Var(x) = \lambda$:

$$E[x^2] = \lambda + \lambda^2$$

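Substituting these moments, we get:

$$I(\lambda) = \frac{\lambda + \lambda^2}{\lambda^2} - \frac{2\lambda}{\lambda} + 1 = \frac{1}{\lambda} + 1 - 2 + 1 = \frac{1}{\lambda}$$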

Thus, the Jeffreys’ prior is:

$$\pi(\lambda) \propto \sqrt{\frac{1}{\lambda}}$$
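As a numerical sanity check (a sketch, not part of the original solution), we can approximate $I(\lambda) = E[(X/\lambda - 1)^2]$ by summing over the Poisson pmf on a truncated support and compare the result with $1/\lambda$. The helper name `poisson_fisher_info` and the truncation point `x_max` are illustrative assumptions; `x_max` only needs to be large enough that the neglected tail mass is negligible for the $\lambda$ values tried.

```python
import math


def poisson_fisher_info(lam: float, x_max: int = 200) -> float:
    """Approximate E[(X/lam - 1)^2] for X ~ Poisson(lam) with a truncated sum."""
    total = 0.0
    pmf = math.exp(-lam)  # P(X = 0)
    for x in range(x_max + 1):
        score = x / lam - 1.0  # d/dlam log f(x | lam)
        total += pmf * score**2
        pmf *= lam / (x + 1)  # recurrence: P(X = x + 1) = P(X = x) * lam / (x + 1)
    return total


for lam in (0.5, 2.0, 10.0):
    info = poisson_fisher_info(lam)
    print(f"lambda={lam}: I={info:.6f}, 1/lambda={1 / lam:.6f}, "
          f"Jeffreys kernel sqrt(I)={math.sqrt(info):.6f}")
```

Using the pmf recurrence instead of `lam**x / math.factorial(x)` avoids overflow for large `x`.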

**(b)**
For the Bernoulli distribution:
$$f(x | p) = p^x (1-p)^{1-x}$$

The log-likelihood is:

$$L = x \log(p) + (1 - x) \log(1 - p)$$

Differentiating $L$ with respect to $p$ we get:

$$\frac{\partial L}{\partial p} = \frac{x}{p} - \frac{1-x}{1-p}$$

And the second derivative is:

$$\frac{\partial^2 L}{\partial p^2} = -\frac{x}{p^2} - \frac{1-x}{(1-p)^2}$$

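For a Bernoulli distribution, $E[x] = p$. Taking the negative expectation of the second derivative, the Fisher information $I(p)$ is:

$$I(p) = -E\left[\frac{\partial^2 L}{\partial p^2}\right] = \frac{E[x]}{p^2} + \frac{1 - E[x]}{(1-p)^2} = \frac{1}{p} + \frac{1}{1-p} = \frac{1}{p(1-p)}$$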

So, the Jeffreys’ prior is:

$$\pi(p) \propto \frac{1}{\sqrt{p(1-p)}}$$
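Because the Bernoulli support is just $\{0, 1\}$, the expectation of the second derivative can be checked exactly. The sketch below (helper name `bernoulli_fisher_info` is illustrative) also computes the Beta function $B(1/2, 1/2) = \Gamma(1/2)^2/\Gamma(1) = \pi$, the constant that turns the kernel $1/\sqrt{p(1-p)}$ into the proper $\text{Be}(1/2, 1/2)$ density $\frac{1}{\pi\sqrt{p(1-p)}}$.

```python
import math


def bernoulli_fisher_info(p: float) -> float:
    """I(p) = -E[d^2/dp^2 log f(X | p)] for X ~ Bernoulli(p), exact over x in {0, 1}."""
    def second_derivative(x: int) -> float:
        # d^2/dp^2 [x log p + (1 - x) log(1 - p)]
        return -x / p**2 - (1 - x) / (1 - p) ** 2

    pmf = {0: 1 - p, 1: p}
    return -sum(pmf[x] * second_derivative(x) for x in (0, 1))


# Normalizing constant of the kernel 1/sqrt(p(1-p)): the Beta function B(1/2, 1/2)
normalizer = math.gamma(0.5) ** 2 / math.gamma(1.0)

for p in (0.1, 0.5, 0.9):
    info = bernoulli_fisher_info(p)
    print(f"p={p}: I={info:.6f}, 1/(p(1-p))={1 / (p * (1 - p)):.6f}")
print(f"B(1/2, 1/2) = {normalizer:.6f}, pi = {math.pi:.6f}")
```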

**(c)**
For the Geometric distribution:

$$f(x | p) = (1-p)^{x-1} p$$

The log-likelihood is:

$$L = (x-1) \log(1-p) + \log(p)$$

Differentiating $L$ with respect to $p$:

$$\frac{\partial L}{\partial p} = \frac{1}{p} - \frac{x-1}{1-p}$$

And the second derivative is:

$$\frac{\partial^2 L}{\partial p^2} = -\frac{1}{p^2} - \frac{x-1}{(1-p)^2}$$

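For a Geometric distribution, $E[x] = \frac{1}{p}$, so $E[x] - 1 = \frac{1-p}{p}$. The Fisher information $I(p)$ is:

$$I(p) = -E\left[\frac{\partial^2 L}{\partial p^2}\right] = \frac{1}{p^2} + \frac{E[x] - 1}{(1-p)^2} = \frac{1}{p^2} + \frac{1}{p(1-p)} = \frac{1}{p^2(1-p)}$$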

So, the Jeffreys’ prior is:

$$\pi(p) \propto \frac{1}{p \sqrt{1-p}}$$
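The same kind of numerical check works for the Geometric case by summing over $x = 1, 2, \ldots$ up to a truncation point. This is a sketch: the helper name `geometric_fisher_info` and the cutoff `x_max = 5000` are assumptions, chosen so that $(1-p)^{x_{\max}}$ is negligible for the values of $p$ tried.

```python
import math


def geometric_fisher_info(p: float, x_max: int = 5000) -> float:
    """Approximate -E[d^2/dp^2 log f(X | p)] for X ~ Geometric(p), X = 1, 2, ..."""
    total = 0.0
    pmf = p  # P(X = 1) = p
    for x in range(1, x_max + 1):
        # second derivative of (x - 1) log(1 - p) + log p
        second = -1.0 / p**2 - (x - 1) / (1 - p) ** 2
        total -= pmf * second
        pmf *= 1 - p  # P(X = x + 1) = P(X = x) * (1 - p)
    return total


for p in (0.1, 0.5, 0.8):
    info = geometric_fisher_info(p)
    exact = 1 / (p**2 * (1 - p))
    print(f"p={p}: I={info:.6f}, 1/(p^2(1-p))={exact:.6f}, "
          f"sqrt(I)={math.sqrt(info):.6f} vs 1/(p sqrt(1-p))={1 / (p * math.sqrt(1 - p)):.6f}")
```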

## 14. Jigsaw

An experiment with a sample of 18 nursery-school children involved the elapsed time required to put together a small jigsaw puzzle. The times were: 

```python
data = [3.1, 3.2, 3.4, 3.6, 3.7, 4.2, 4.3, 4.5, 4.7,
        5.2, 5.6, 6.0, 6.1, 6.6, 7.3, 8.2, 10.8, 13.6]
```

Assume that the data come from a normal distribution $N(\mu, \sigma^2)$ with $\sigma^2 = 8$. For the parameter $\mu$, use a normal prior with mean 5 and variance 6.

------------------

(a) Find the Bayes estimator and 95% credible set for population mean $\mu$.

We can define our model like this:
$$
\begin{align}
x_i|\mu &\sim N(\mu, 8) \\
\mu & \sim N(5, 6)
\end{align}
$$
This is the normal-normal conjugate model with known variance $\sigma^2 = 8$ and unknown mean, with prior mean $\mu_0 = 5$, prior variance $\tau^2 = 6$, $n = 18$, and $\bar{X} \approx 5.78333$.

Our posterior is then:
$$
\begin{align}
\mu \mid x &\sim N\left(\frac{\tau^2}{\tau^2 +\sigma^2/n}\bar{X} + \frac{\sigma^2/n}{\tau^2 + \sigma^2/n}\mu_0,\ \frac{\tau^2\sigma^2/n}{\tau^2+ \sigma^2/n}\right) \\
& \sim N\left(\frac{6}{6 + 8/18}\bar{X} + \frac{8/18}{6 + 8/18}(5),\ \frac{6(8/18)}{6+8/18}\right) \\
& \sim N(5.72931,\ 0.41379)
\end{align}
$$
Under squared-error loss, our Bayes estimator is the posterior mean, 5.72931.
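The conjugate update can also be computed directly from the raw times, which avoids rounding $\bar{X}$ by hand. This is a minimal standard-library sketch; the function name `normal_normal_posterior` is illustrative and simply implements the weighted-average formula above, with weight $w = \tau^2/(\tau^2 + \sigma^2/n)$ on the data.

```python
from statistics import fmean

# Data and hyperparameters from the problem statement
data = [3.1, 3.2, 3.4, 3.6, 3.7, 4.2, 4.3, 4.5, 4.7,
        5.2, 5.6, 6.0, 6.1, 6.6, 7.3, 8.2, 10.8, 13.6]
sigma2 = 8.0          # known likelihood variance
mu0, tau2 = 5.0, 6.0  # prior mean and prior variance


def normal_normal_posterior(xs, sigma2, mu0, tau2):
    """Return (mean, variance) of mu | x for x_i ~ N(mu, sigma2), mu ~ N(mu0, tau2)."""
    n = len(xs)
    xbar = fmean(xs)
    se2 = sigma2 / n          # variance of the sample mean
    w = tau2 / (tau2 + se2)   # weight on the data
    post_mean = w * xbar + (1 - w) * mu0
    post_var = tau2 * se2 / (tau2 + se2)
    return post_mean, post_var


post_mean, post_var = normal_normal_posterior(data, sigma2, mu0, tau2)
print(post_mean, post_var)
```

The posterior variance works out exactly to $12/29 \approx 0.41379$, since it does not depend on the data.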

The 95% equal-tailed credible set:

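```python
import numpy as np
import scipy.stats as ss

alpha = .05
mean = 5.72931  # posterior mean from above
var = .41379    # posterior variance from above

# scipy's `scale` is the standard deviation, not the variance
post = ss.norm(loc=mean, scale=var**.5)

# Cut off alpha/2 posterior probability in each tail
post.ppf(alpha/2), post.ppf(1 - alpha/2)
```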

```
(4.468533554544652, 6.990086445455348)
```

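Because the normal posterior is symmetric and unimodal, the HPD credible set coincides with the equal-tailed one. Solving the two HPD conditions numerically (equal density at both endpoints and posterior probability $1 - \alpha$ between them) confirms this: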

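```python
from scipy.optimize import fsolve

def conditions(x, post, alpha):
    lwr, upr = x

    # HPD endpoints have equal posterior density...
    cond_1 = post.pdf(upr) - post.pdf(lwr)
    # ...and the interval has posterior probability 1 - alpha
    cond_2 = post.cdf(upr) - post.cdf(lwr) - (1 - alpha)

    return cond_1, cond_2

# Start the solver near the equal-tailed interval
fsolve(conditions, (4.5, 7.0), args=(post, alpha))
```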

```
array([4.46853355, 6.99008645])
```
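When only posterior samples are available (as with MCMC output), the HPD interval is commonly estimated as the narrowest interval that contains 95% of the sorted draws. The sketch below applies that idea to standard-library draws from the analytic posterior; the helper `shortest_interval`, the seed, and the number of draws are illustrative choices, not part of the original solution. Up to Monte Carlo error, it should land close to the interval found above.

```python
import random


def shortest_interval(samples, prob=0.95):
    """Narrowest interval spanning a fraction `prob` of the sorted samples."""
    xs = sorted(samples)
    n = len(xs)
    k = int(round(prob * n))  # number of sorted points the interval must cover
    # Slide a window of k consecutive points and keep the narrowest one
    best = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[best], xs[best + k - 1]


rng = random.Random(42)
draws = [rng.gauss(5.72931, 0.41379 ** 0.5) for _ in range(200_000)]
lwr, upr = shortest_interval(draws)
print(lwr, upr)
```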

(b) Find the posterior probability of hypothesis $H_0 : \mu \leq 5$.
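
The posterior probability of $H_0$ is $P(\mu \leq 5 \mid x)$, i.e. the posterior CDF evaluated at 5:
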
```python
post.cdf(5)
```

```
0.12844704607549606
```
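For context, this can be compared with the prior probability of $H_0$, which is exactly $1/2$ because the prior $N(5, 6)$ is centered at 5. The ratio of posterior odds to prior odds is the Bayes factor $B_{01}$ in favor of $H_0$. The sketch below uses the standard library's `statistics.NormalDist` instead of scipy; the variable names are illustrative.

```python
from statistics import NormalDist

prior = NormalDist(mu=5, sigma=6 ** 0.5)
post = NormalDist(mu=5.72931, sigma=0.41379 ** 0.5)

p0_prior = prior.cdf(5)  # P(mu <= 5) before seeing the data
p0_post = post.cdf(5)    # P(mu <= 5 | x)

prior_odds = p0_prior / (1 - p0_prior)
post_odds = p0_post / (1 - p0_post)
bayes_factor_01 = post_odds / prior_odds  # evidence for H0 relative to H1

print(p0_prior, p0_post, bayes_factor_01)
```

Since the prior odds equal 1 here, $B_{01}$ equals the posterior odds, about 0.147.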

(c) What is your prediction for a single future observation?

Since we only want a single future observation, we can use this from Greg's U4L13 notes (linked under the lecture videos):

$$
\hat{X}_{n+1} = \int_\theta \mu(\theta) \pi(\theta \mid x_i) d\theta
$$

where $\mu(\theta) = \mathbb{E}[X \mid \theta] = \int x f(x \mid \theta)\, dx$ is the mean of the original likelihood (the distribution of $X \mid \theta$). For the normal likelihood, $\mu(\theta) = \theta$, so the integral reduces to $\int_\theta \theta\, \pi(\theta \mid x_i)\, d\theta$, the posterior mean. As the original hints for the solution said, the prediction therefore equals the posterior mean.

We can compute this value with the [`.expect()`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.rv_continuous.expect.html) method on our posterior, which takes the expectation of a function with respect to that distribution. Called with no function argument, it integrates the identity, so it returns the posterior mean.
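Beyond the point prediction, the full posterior predictive distribution in this normal-normal model is also normal: $X_{n+1} = \mu + \varepsilon$ with $\mu \mid x \sim N(5.72931, 0.41379)$ and independent $\varepsilon \sim N(0, 8)$, so $X_{n+1} \mid x \sim N(5.72931,\ 8 + 0.41379)$. The standard-library sketch below reports the point prediction together with a 95% predictive interval; the interval goes beyond what the question asks and is included only for illustration.

```python
from statistics import NormalDist

sigma2 = 8.0         # known likelihood variance
post_mean = 5.72931  # posterior mean of mu from part (a)
post_var = 0.41379   # posterior variance of mu from part (a)

# Predictive variance = likelihood variance + posterior variance of mu
predictive = NormalDist(mu=post_mean, sigma=(sigma2 + post_var) ** 0.5)

point_prediction = predictive.mean
alpha = 0.05
interval = (predictive.inv_cdf(alpha / 2), predictive.inv_cdf(1 - alpha / 2))
print(point_prediction, interval)
```

The predictive interval is much wider than the credible set for $\mu$ because it also carries the observation noise $\sigma^2 = 8$.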

```python
# E[mu | x]: expectation of the identity under the posterior
post.expect()
```

```
5.729310000000003
```

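A PyMC solution, which samples the posterior of $\mu$ with NUTS and then draws from the posterior predictive distribution: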

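```python
import pymc as pm
import arviz as az

with pm.Model() as m:
    # Prior: mu ~ N(5, 6); PyMC's Normal takes the standard deviation as sigma
    mu_prior = pm.Normal("mu", 5, sigma=6**.5)

    # Likelihood with known variance 8, conditioned on the observed times
    likelihood = pm.Normal("lik", mu_prior, sigma=8**.5, observed=data)

    # 5000 posterior draws per chain, after the default tuning steps
    trace = pm.sample(5000)
    # Store posterior predictive draws of "lik" in the same InferenceData object
    pm.sample_posterior_predictive(trace, extend_inferencedata=True)
```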

```
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [mu]
Sampling 4 chains for 1_000 tune and 5_000 draw iterations (4_000 + 20_000 draws total) took 1 seconds.
Sampling: [lik]
```

```python
az.summary(trace, hdi_prob=.95, kind="stats")
```

| | mean | sd | hdi_2.5% | hdi_97.5% |
|---|---|---|---|---|
| mu | 5.721 | 0.647 | 4.503 | 7.033 |


```python
print(trace.posterior_predictive.to_array().mean())
```

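```
<xarray.DataArray ()>
array(5.72300883)
```

The MCMC posterior mean (5.721), 95% HDI (4.503, 7.033), and posterior predictive mean (5.723) agree with the analytic values 5.72931 and (4.469, 6.990) up to Monte Carlo error.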
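To illustrate what posterior predictive sampling does conceptually, here is a standard-library Monte Carlo sketch that uses the analytic posterior from part (a) in place of the NUTS draws: each predictive draw first samples $\mu$ from the posterior, then a new observation from $N(\mu, 8)$. The seed and the number of draws are arbitrary choices. The sample mean should be close to 5.729 and the sample standard deviation close to $\sqrt{8.41379} \approx 2.90$.

```python
import random
from statistics import fmean, stdev

rng = random.Random(0)
sigma = 8 ** 0.5                              # known likelihood standard deviation
post_mean, post_sd = 5.72931, 0.41379 ** 0.5  # analytic posterior from part (a)

# Two-stage sampling: mu from the posterior, then a new X from the likelihood given that mu
mu_draws = [rng.gauss(post_mean, post_sd) for _ in range(200_000)]
x_new = [rng.gauss(mu, sigma) for mu in mu_draws]

print(fmean(x_new), stdev(x_new))
```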
