
# SBP Example with Jeffreys Prior (Normal mean & variance unknown)

**Goal.** Use the Jeffreys prior for a Normal model with unknown mean and variance to answer two questions about systolic blood pressure (SBP):

1) What is the posterior probability that the population mean SBP exceeds 130 mmHg?  
2) What is the posterior *predictive* probability that a new patient’s SBP exceeds 130 mmHg?

**Data (n = 10):**
$$
y = (128,\;132,\;121,\;135,\;126,\;130,\;129,\;138,\;125,\;131)
$$

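We assume $ y_i \mid \mu,\sigma^2 \stackrel{iid}{\sim} \mathcal{N}(\mu,\sigma^2) $.  
The joint Jeffreys prior is derived below in the precision parameterization $\tau^2 = 1/\sigma^2$, where it is $ \pi_J(\mu,\tau^2) \propto (\tau^2)^{-1/2} $; in the variance parameterization this corresponds to $ \pi_J(\mu,\sigma^2) \propto (\sigma^2)^{-3/2} $. The prior $ \pi(\mu,\sigma^2) \propto 1/\sigma^2 $ used in many texts is the independence (reference) prior, not the joint Jeffreys prior.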



## Jeffreys Prior Derivation (Precision Parameterization)

We work with the precision parameterization:
$$
y_i \mid \mu,\tau^2 \stackrel{\text{iid}}{\sim} \mathcal{N}(\mu,\;\tau^{-2}),
\qquad \tau^2 = \sigma^{-2}, \quad i=1,\ldots,10.
$$

**Interpretation.**
- $\mu$: population mean SBP (scientific target)  
- $\tau^2$: precision (inverse variance); larger $\tau^2$ = less between-patient variability  
- $\sigma^2 = 1/\tau^2$: variance (often treated as a nuisance parameter)

**Log-likelihood:**
$$
\ell(\mu,\tau^2) = \frac{n}{2}\log \tau^2 - \frac{\tau^2}{2}\sum_{i=1}^n (y_i-\mu)^2 + \text{const.}
$$
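
As a quick sanity check on the log-likelihood above, the sketch below compares the kernel $\frac{n}{2}\log\tau^2 - \frac{\tau^2}{2}\sum_i (y_i-\mu)^2$ with the exact Normal log-likelihood from `scipy.stats.norm.logpdf`. The helper names and the three $(\mu,\tau^2)$ evaluation points are illustrative assumptions, not part of the original analysis; the gap between the two functions should equal the constant $-\frac{n}{2}\log(2\pi)$ at every point.

```python
import numpy as np
from scipy.stats import norm

y = np.array([128, 132, 121, 135, 126, 130, 129, 138, 125, 131], dtype=float)
n = len(y)

def loglik_kernel(mu, tau2):
    """Log-likelihood in precision form, dropping the additive constant."""
    return 0.5 * n * np.log(tau2) - 0.5 * tau2 * np.sum((y - mu) ** 2)

def loglik_exact(mu, tau2):
    """Exact Normal log-likelihood with variance 1 / tau2."""
    return norm.logpdf(y, loc=mu, scale=np.sqrt(1.0 / tau2)).sum()

# Hypothetical evaluation points (mu, tau2), chosen only for illustration
points = [(125.0, 0.02), (129.5, 0.04), (133.0, 0.10)]

const = -0.5 * n * np.log(2 * np.pi)   # the dropped constant
gaps = [loglik_exact(m, t2) - loglik_kernel(m, t2) for m, t2 in points]
print(const, gaps)
```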

**Scores and Fisher information.** Each entry of the Fisher information is the negative expected second derivative of $\ell$.
$$
\frac{\partial \ell}{\partial \mu} = \tau^2 \sum_{i=1}^n (y_i-\mu),
\quad
\frac{\partial^2 \ell}{\partial \mu^2} = - n \tau^2
\;\Rightarrow\;
I_{\mu\mu}(\mu,\tau^2)= n \tau^2.
$$

$$
\frac{\partial \ell}{\partial \tau^2} = \frac{n}{2\tau^2} - \frac{1}{2}\sum_{i=1}^n (y_i-\mu)^2,
\quad
\frac{\partial^2 \ell}{\partial (\tau^2)^2} = -\frac{n}{2(\tau^2)^2}
\;\Rightarrow\;
I_{\tau^2\tau^2}(\mu,\tau^2)= \frac{n}{2(\tau^2)^2}.
$$

$$
\frac{\partial^2 \ell}{\partial \mu\,\partial \tau^2}
= \sum_{i=1}^n (y_i-\mu),
\qquad
\mathbb{E}\!\left[\sum_{i=1}^n (y_i-\mu)\right] = 0
\;\Rightarrow\;
I_{\mu,\tau^2}(\mu,\tau^2)=0.
$$

Hence
$$
I(\mu,\tau^2)=
\begin{pmatrix}
n\tau^2 & 0\\[2pt]
0 & \dfrac{n}{2(\tau^2)^2}
\end{pmatrix},
\qquad
\det I(\mu,\tau^2)=\frac{n^2}{2}\cdot \frac{1}{\tau^2}.
$$
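
The Fisher information matrix above can be checked by simulation, using the identity that the Fisher information equals the expected outer product of the score vector. The following is a minimal sketch for a single observation ($n=1$); the "true" values of $\mu$ and $\tau^2$, the seed, and the sample size are arbitrary assumptions made only for this check.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, tau2 = 129.5, 0.04        # hypothetical true values (sd = 5 mmHg)
N = 400_000                   # number of simulated observations

y_sim = rng.normal(loc=mu, scale=np.sqrt(1.0 / tau2), size=N)

# Per-observation scores, i.e. the derivatives above with n = 1
score_mu = tau2 * (y_sim - mu)
score_tau2 = 0.5 / tau2 - 0.5 * (y_sim - mu) ** 2

scores = np.vstack([score_mu, score_tau2])
fisher_mc = scores @ scores.T / N               # Monte Carlo E[score score^T]
fisher_exact = np.diag([tau2, 0.5 / tau2**2])   # diag(tau^2, 1 / (2 tau^4))

print(fisher_mc)
print(fisher_exact)
```

The diagonal entries should agree up to Monte Carlo error, and the off-diagonal entry should be close to zero relative to the diagonal.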

**Jeffreys prior:**
$$
\pi_J(\mu,\tau^2)\ \propto\ \sqrt{\det I(\mu,\tau^2)}
\ \propto\ \frac{1}{\sqrt{\tau^2}}.
$$

> *Remark.* Many texts instead use the (reference/independence) prior $\pi(\mu,\sigma^2)\propto 1/\sigma^2$, which, under the reparameterization $\tau^2=1/\sigma^2$, becomes $\pi(\mu,\tau^2)\propto 1/\tau^2$. Both choices give a Student-$t$ marginal for $\mu$, but not the same one: the joint Jeffreys prior yields $n$ degrees of freedom and squared scale $S/n^2$, whereas the reference prior yields $n-1$ degrees of freedom and squared scale $s^2/n$, where $S=\sum_{i=1}^n (y_i-\bar y)^2$ and $s^2=S/(n-1)$.
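
The change of variables behind these statements is short enough to write out. A density on $\tau^2$ maps to a density on $\sigma^2 = 1/\tau^2$ through the Jacobian:

$$
\pi(\mu,\sigma^2) = \pi(\mu,\tau^2)\,\left|\frac{d\tau^2}{d\sigma^2}\right|,
\qquad
\left|\frac{d\tau^2}{d\sigma^2}\right| = (\sigma^2)^{-2}.
$$

Applying this to the two priors gives

$$
(\tau^2)^{-1/2} \;\longmapsto\; (\sigma^2)^{1/2}\,(\sigma^2)^{-2} = (\sigma^2)^{-3/2},
\qquad
(\tau^2)^{-1} \;\longmapsto\; \sigma^2\,(\sigma^2)^{-2} = (\sigma^2)^{-1}.
$$

So the joint Jeffreys prior corresponds to $(\sigma^2)^{-3/2}$ on the variance scale, and the reference prior $1/\sigma^2$ corresponds to $1/\tau^2$ on the precision scale.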



## Marginal Posterior Distribution of $\mu$
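
The conditional and marginal posteriors used in this section follow from multiplying the likelihood by $\pi_J(\mu,\tau^2)\propto(\tau^2)^{-1/2}$. A sketch of the steps, writing $S=\sum_{i=1}^n (y_i-\bar y)^2$ and using the decomposition $\sum_{i=1}^n (y_i-\mu)^2 = S + n(\mu-\bar y)^2$:

$$
p(\mu,\tau^2 \mid y)
\;\propto\; (\tau^2)^{-1/2}\,(\tau^2)^{n/2}
\exp\!\left\{-\frac{\tau^2}{2}\Big[S + n(\mu-\bar y)^2\Big]\right\}
$$

$$
= \underbrace{(\tau^2)^{1/2}\exp\!\left\{-\frac{n\tau^2}{2}(\mu-\bar y)^2\right\}}_{\propto\ \mathcal{N}\left(\mu;\ \bar y,\ (n\tau^2)^{-1}\right)}
\;\times\;
\underbrace{(\tau^2)^{n/2-1}\exp\!\left\{-\frac{S}{2}\tau^2\right\}}_{\propto\ \mathrm{Gamma}\left(\tau^2;\ n/2,\ S/2\right)}.
$$

The first factor is a normalized-in-$\mu$ Normal kernel, so integrating over $\mu$ leaves the second factor, a Gamma kernel in shape–rate form.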

- Under $\pi_J(\mu,\tau^2)\propto(\tau^2)^{-1/2}$, the posterior factorizes as $ \mu \mid \tau^2, y \sim \mathcal{N}(\bar y, (n\tau^2)^{-1}) $ and $ \tau^2 \mid y \sim \mathrm{Gamma}(n/2, S/2) $ (shape–rate).  
- Define $ \kappa = S \tau^2/n $. Then $ \kappa \mid y \sim \mathrm{Gamma}(n/2, n/2) $.  
- Note: $(n\tau^2)^{-1} = \tfrac{S}{n^2} \kappa^{-1}$. Thus
  $$ \mu \mid \kappa,y \sim \mathcal{N}\!\left(\bar y, \tfrac{S}{n^2}\kappa^{-1}\right). $$
- By the scale mixture of Normals representation, we obtain:
$$
\boxed{
\mu \mid y \sim t_{n}\!\left(\bar y,\; S/n^2\right)
= t_{n}\!\left(\bar y,\; \left(1-\tfrac{1}{n}\right)\tfrac{s^2}{n}\right)
}
$$
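where $ S = \sum_{i=1}^n (y_i - \bar y)^2 $, $ s^2 = S/(n-1) $, and $t_\nu(m, c)$ denotes a Student-$t$ distribution with $\nu$ degrees of freedom, location $m$, and squared scale $c$.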
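
The scale-mixture step can be verified numerically without any sampling. The sketch below integrates the Normal density $\mathcal{N}(x;\bar y, \tfrac{S}{n^2}\kappa^{-1})$ against the $\mathrm{Gamma}(n/2, n/2)$ density of $\kappa$ using `scipy.integrate.quad` and compares the result with the $t_n$ density. The function name `mixture_pdf` and the evaluation grid are illustrative assumptions; the summaries $n$, $\bar y$, $S$ are those of the SBP data.

```python
import numpy as np
from scipy import integrate
from scipy.stats import gamma, norm, t

n, ybar, S = 10, 129.5, 218.5     # summaries of the SBP data
c = S / n**2                      # squared scale of the t marginal for mu

def mixture_pdf(x):
    """Integrate N(x; ybar, c / kappa) over kappa ~ Gamma(n/2, rate=n/2)."""
    def integrand(k):
        return (norm.pdf(x, loc=ybar, scale=np.sqrt(c / k))
                * gamma.pdf(k, a=n / 2, scale=2 / n))   # scale = 1 / rate
    value, _ = integrate.quad(integrand, 0, np.inf)
    return value

grid = np.array([125.0, 128.0, 129.5, 130.0, 134.0])    # arbitrary test points
pdf_mix = np.array([mixture_pdf(x) for x in grid])
pdf_t = t.pdf(grid, df=n, loc=ybar, scale=np.sqrt(c))   # scipy takes the scale, not its square

max_abs_diff = np.max(np.abs(pdf_mix - pdf_t))
print(max_abs_diff)
```

Note that `scipy.stats.t` is parameterized by the scale, so the squared scale $S/n^2$ enters through its square root.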



## Posterior Predictive Distribution

Let $ \tilde y \mid \mu,\tau^2 \sim \mathcal{N}(\mu, \tau^{-2}) $.  

The joint posterior is  
$$
\mu \mid \tau^2,y \sim \mathcal{N}(\bar y, (n\tau^2)^{-1}),
\qquad \tau^2 \mid y \sim \mathrm{Gamma}(n/2, S/2).
$$

1. Integrate out $\mu$ given $\tau^2$; the sampling variance $\tau^{-2}$ and the posterior variance $(n\tau^2)^{-1}$ of $\mu$ add:  
   $$
   \tilde y \mid \tau^2 \sim \mathcal{N}\!\left(\bar y,\; (1+n^{-1})(\tau^2)^{-1}\right).
   $$

2. Then integrate out $\tau^2$:  
   $$
   \boxed{
   \tilde y \mid y \sim t_{n}\!\left(\bar y,\; \left(1+\tfrac{1}{n}\right)\tfrac{S}{n}\right)
   = t_{n}\!\left(\bar y,\; \left(1-\tfrac{1}{n^2}\right) s^2\right)
   }
   $$
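
As a worked check of the two boxed results with the SBP data ($n=10$, $S=218.5$, $s^2=S/9\approx 24.278$), the variance in step 1 is

$$
\mathrm{Var}(\tilde y \mid \tau^2, y) = \tau^{-2} + (n\tau^2)^{-1} = \left(1+\tfrac{1}{n}\right)\tau^{-2},
$$

and the two equivalent forms of each squared scale agree numerically:

$$
\frac{S}{n^2} = \frac{218.5}{100} = 2.185
= \left(1-\tfrac{1}{10}\right)\frac{24.278}{10},
\qquad
\left(1+\tfrac{1}{10}\right)\frac{218.5}{10} = 1.1 \times 21.85 = 24.035
= \left(1-\tfrac{1}{100}\right) 24.278 .
$$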


In [1]:

import numpy as np

y = np.array([128, 132, 121, 135, 126, 130, 129, 138, 125, 131], dtype=float)
n = len(y)
ybar = y.mean()
S = ((y - ybar)**2).sum()          # Sum of squares about the mean
s2 = S/(n-1)                       # Unbiased sample variance

n, ybar, s2, S


(10, np.float64(129.5), np.float64(24.27777777777778), np.float64(218.5))


## Marginal Posterior of $\mu$ and Posterior Predictive (df $= n$) under $\pi(\mu,\tau^2)\propto (\tau^2)^{-1/2}$

From the precision-parameterized derivation:

- $ \mu \mid \tau^2, y \sim \mathcal{N}\!\big(\bar y,\; (n\tau^2)^{-1}\big) $,  
  $ \tau^2 \mid y \sim \mathrm{Gamma}\!\left(\frac{n}{2}, \frac{S}{2}\right) $ (shape–rate).

Let $ \kappa = S\tau^2/n $. Then $ \kappa \mid y \sim \mathrm{Gamma}\!\left(\frac{n}{2}, \frac{n}{2}\right) $.  
Also $ (n\tau^2)^{-1} = \frac{S}{n^2}\kappa^{-1} $, hence
$$
\mu \mid \kappa, y \sim \mathcal{N}\!\left(\bar y, \frac{S}{n^2}\kappa^{-1}\right).
$$

By the Normal–Gamma scale-mixture representation,
$$
\boxed{\;\mu \mid y \sim t_{n}\!\left(\bar y,\; \frac{S}{n^2}\right) = t_{n}\!\left(\bar y,\; \left(1-\frac{1}{n}\right)\frac{s^2}{n}\right)\;}
$$
with $ S=\sum_{i=1}^n (y_i - \bar y)^2 $ and $ s^2=S/(n-1) $.

For the posterior predictive, integrating out $\mu$ and then $\tau^2$ yields
$$
\boxed{\;\tilde y \mid y \sim t_{n}\!\left(\bar y,\; \left(1+\frac{1}{n}\right)\frac{S}{n}\right)
= t_{n}\!\left(\bar y,\; \left(1 - \frac{1}{n^2}\right) s^2\right)\;}
$$
where $t_{\nu}(m, c)$ is a Student-$t$ distribution with $\nu$ degrees of freedom, location $m$, and squared scale $c$ (the code below therefore passes $\sqrt{c}$ as the scale).


In [2]:

from scipy.stats import t   # Student-t distribution; n, ybar, S come from In [1]
import numpy as np

# Using df = n and the scales derived above
nu_alt = n  # degrees of freedom
threshold = 130.0

# Posterior for mu: df=n, scale = sqrt(S/n^2)
scale_mu_alt = np.sqrt(S/(n**2))
post_mu_prob_alt = 1 - t.cdf((threshold - ybar)/scale_mu_alt, df=nu_alt)

# Posterior predictive: df=n, scale = sqrt((1 + 1/n) * S / n)
scale_pred_alt = np.sqrt((1 + 1/n) * S / n)
post_pred_prob_alt = 1 - t.cdf((threshold - ybar)/scale_pred_alt, df=nu_alt)

post_mu_prob_alt, post_pred_prob_alt


(np.float64(0.3710826258510036), np.float64(0.46039124016169886))


### Monte Carlo check for the $t_n$ results

Under $\pi(\mu,\tau^2)\propto (\tau^2)^{-1/2}$:  
- Sample $ \tau^2 \mid y \sim \mathrm{Gamma}\!\left(\frac{n}{2}, \frac{S}{2}\right) $ (shape–rate).  
- Then $ \mu \mid \tau^2, y \sim \mathcal{N}\!\big(\bar y,\; (n\tau^2)^{-1}\big) $.  
- Predictive $ \tilde y \mid \mu,\tau^2 \sim \mathcal{N}(\mu,\tau^{-2}) $.


In [3]:

rng = np.random.default_rng(2025)
B = 500_000

# tau^2 | y ~ Gamma(shape=n/2, rate=S/2)
shape = n/2
rate = S/2
tau2 = rng.gamma(shape=shape, scale=1/rate, size=B)

# mu | tau^2, y
mu_alt = rng.normal(loc=ybar, scale=np.sqrt(1/(n*tau2)))

# predictive
ynew_alt = rng.normal(loc=mu_alt, scale=np.sqrt(1/tau2))

mc_mu_alt = (mu_alt > threshold).mean()
mc_pred_alt = (ynew_alt > threshold).mean()

(post_mu_prob_alt, mc_mu_alt, post_pred_prob_alt, mc_pred_alt)


(np.float64(0.3710826258510036),
 np.float64(0.370792),
 np.float64(0.46039124016169886),
 np.float64(0.461002))
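
To judge whether the analytic and Monte Carlo values agree, the differences can be compared with the binomial Monte Carlo standard error $\sqrt{p(1-p)/B}$. This is a small sketch that reuses the four numbers printed above; the dictionary layout and variable names are illustrative assumptions.

```python
import numpy as np

B = 500_000   # number of Monte Carlo draws used above

# (analytic, Monte Carlo) pairs copied from the output above
results = {
    "P(mu > 130 | y)": (0.3710826258510036, 0.370792),
    "P(y_new > 130 | y)": (0.46039124016169886, 0.461002),
}

z_scores = {}
for label, (p_exact, p_mc) in results.items():
    se = np.sqrt(p_exact * (1 - p_exact) / B)   # binomial standard error
    z_scores[label] = (p_mc - p_exact) / se
    print(f"{label}: SE = {se:.5f}, z = {z_scores[label]:+.2f}")
```

Both differences are smaller than one standard error, which is consistent with the $t_n$ formulas.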

In [4]:

print("=== Using df = n (Jeffreys prior in precision form π(μ,τ^2) ∝ (τ^2)^(-1/2)) ===")
print(f"n = {n}, ȳ = {ybar:.3f}, s^2 = {s2:.3f}, S = {S:.3f}")
print(f"P(μ > 130 | y) [analytic]     = {post_mu_prob_alt:.4f}")
print(f"P(μ > 130 | y) [MC]           = {mc_mu_alt:.4f}")
print(f"P(y_new > 130 | y) [analytic] = {post_pred_prob_alt:.4f}")
print(f"P(y_new > 130 | y) [MC]       = {mc_pred_alt:.4f}")

print("\nFor reference, the common reference prior π(μ,σ^2) ∝ 1/σ^2 yields df = n-1 instead.")
print("Both priors produce Student-t forms; the df and scales differ slightly due to the prior choice.")


=== Using df = n (Jeffreys prior in precision form π(μ,τ^2) ∝ (τ^2)^(-1/2)) ===
n = 10, ȳ = 129.500, s^2 = 24.278, S = 218.500
P(μ > 130 | y) [analytic]     = 0.3711
P(μ > 130 | y) [MC]           = 0.3708
P(y_new > 130 | y) [analytic] = 0.4604
P(y_new > 130 | y) [MC]       = 0.4610

For reference, the common reference prior π(μ,σ^2) ∝ 1/σ^2 yields df = n-1 instead.
Both priors produce Student-t forms; the df and scales differ slightly due to the prior choice.
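
### Comparison with the reference prior (df $= n-1$)

For comparison, the same two probabilities can be computed under the reference prior $\pi(\mu,\sigma^2)\propto 1/\sigma^2$. The sketch below relies on the standard results for that prior, $\mu \mid y \sim t_{n-1}(\bar y,\; s^2/n)$ and $\tilde y \mid y \sim t_{n-1}(\bar y,\; (1+\tfrac{1}{n})s^2)$, with the second argument again a squared scale. The `_ref` variable names are illustrative and do not appear in the cells above.

```python
import numpy as np
from scipy.stats import t

y = np.array([128, 132, 121, 135, 126, 130, 129, 138, 125, 131], dtype=float)
n = len(y)
ybar = y.mean()
s2 = y.var(ddof=1)        # unbiased sample variance
threshold = 130.0

nu_ref = n - 1            # degrees of freedom under the reference prior

# Posterior for mu: squared scale s^2 / n
scale_mu_ref = np.sqrt(s2 / n)
post_mu_prob_ref = t.sf((threshold - ybar) / scale_mu_ref, df=nu_ref)

# Posterior predictive: squared scale (1 + 1/n) * s^2
scale_pred_ref = np.sqrt((1 + 1 / n) * s2)
post_pred_prob_ref = t.sf((threshold - ybar) / scale_pred_ref, df=nu_ref)

print(f"P(mu > 130 | y)    [reference prior] = {post_mu_prob_ref:.4f}")
print(f"P(y_new > 130 | y) [reference prior] = {post_pred_prob_ref:.4f}")
```

Here `t.sf` is the survival function, equivalent to `1 - t.cdf`. With only $n=10$ observations the two priors give slightly different tail probabilities; the gap shrinks as $n$ grows.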
