
# Clinical Trial with Dirichlet–Multinomial (3 outcomes)

We consider three patient outcomes for a new treatment:

- $1$: Improved  
- $2$: No change  
- $3$: Worsened  

**Data (n = 20)**:  
$$(y_1, y_2, y_3) = (12, 6, 2).$$

**Prior**:  
$$\theta = (\theta_1,\theta_2,\theta_3) \sim \mathrm{Dirichlet}(2,2,2).$$

**Posterior** (by conjugacy, the observed counts are added to the prior parameters):  
$$\theta \mid y \sim \mathrm{Dirichlet}(2+12,\;2+6,\;2+2) = \mathrm{Dirichlet}(14,8,4).$$
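
The conjugate update can be seen by multiplying the multinomial likelihood by the Dirichlet prior density. A brief sketch of that step, writing $\alpha = (2,2,2)$ for the prior parameters (notation introduced here for illustration):
$$p(\theta \mid y) \;\propto\; \prod_{k=1}^{3}\theta_k^{y_k}\;\prod_{k=1}^{3}\theta_k^{\alpha_k-1} \;=\; \prod_{k=1}^{3}\theta_k^{\alpha_k+y_k-1},$$
which is the kernel of a $\mathrm{Dirichlet}(\alpha_1+y_1,\;\alpha_2+y_2,\;\alpha_3+y_3)$ density.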

We will answer the following questions:

1. $P(\theta_1 > 0.5 \mid y)$  
2. $P(\theta_3 < 0.2 \mid y)$  
3. $P(\theta_1 > \theta_3 \mid y)$  
4. Predictive for $m=10$ new patients: $P(z_1 \ge 7 \mid y)$ where $(z_1,z_2,z_3)\mid \theta \sim \mathrm{Multinomial}(m,\theta)$.


In [None]:
import numpy as np
from scipy.stats import beta
from scipy.special import gammaln

# Data and prior
y = np.array([12, 6, 2], dtype=int)
alpha0 = np.array([2, 2, 2], dtype=float)
n = int(y.sum())

# Posterior parameters
alpha_post = alpha0 + y  # [14, 8, 4]
alpha_post

array([14.,  8.,  4.])


## Questions 1 & 2 via Beta marginals

For a Dirichlet$(\alpha_1,\alpha_2,\alpha_3)$ with $\alpha_0=\alpha_1+\alpha_2+\alpha_3$, the marginal distribution of each component is $\theta_i \sim \mathrm{Beta}(\alpha_i,\;\alpha_0-\alpha_i)$; for example, $\theta_1 \sim \mathrm{Beta}(\alpha_1, \alpha_2+\alpha_3)$ and $\theta_3 \sim \mathrm{Beta}(\alpha_3, \alpha_1+\alpha_2)$.  

Thus, with $\alpha'=(14,8,4)$:
- $\theta_1 \mid y \sim \mathrm{Beta}(14,\;8+4=12)$
- $\theta_3 \mid y \sim \mathrm{Beta}(4,\;14+8=22)$

We compute:

- $P(\theta_1 > 0.5 \mid y) = 1 - F_{\mathrm{Beta}(14,12)}(0.5)$  
- $P(\theta_3 < 0.2 \mid y) = F_{\mathrm{Beta}(4,22)}(0.2)$
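
The marginal rule can be wrapped in a small helper so that the same call answers Question 1 (the first component, "Improved") and Question 2 (the third component, "Worsened"). This is a minimal sketch: the name `dirichlet_marginal` and the variable `alpha_check` are hypothetical and not part of the notebook.

```python
import numpy as np
from scipy.stats import beta

def dirichlet_marginal(alpha, i):
    """Frozen Beta(alpha_i, alpha_0 - alpha_i) marginal of component i (0-based)."""
    alpha = np.asarray(alpha, dtype=float)
    return beta(a=alpha[i], b=alpha.sum() - alpha[i])

alpha_check = [14, 8, 4]  # posterior parameters alpha'

p_improved = dirichlet_marginal(alpha_check, 0).sf(0.5)   # P(theta_1 > 0.5 | y)
p_worsened = dirichlet_marginal(alpha_check, 2).cdf(0.2)  # P(theta_3 < 0.2 | y)
print(p_improved, p_worsened)
```

Using the survival function `sf` instead of `1 - cdf` avoids a subtraction and is slightly more accurate when the tail probability is small.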


In [None]:

alpha1, alpha2, alpha3 = alpha_post
# Q1
p_q1 = 1 - beta.cdf(0.5, a=alpha1, b=alpha2+alpha3)

# Q2
p_q2 = beta.cdf(0.2, a=alpha3, b=alpha1+alpha2)

p_q1, p_q2


(np.float64(0.6549810171127319), np.float64(0.7660067407738087))


## Question 3: $P(\theta_1 > \theta_3 \mid y)$

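A Dirichlet vector can be represented by normalizing independent Gamma variables: with $(\alpha_1,\alpha_2,\alpha_3)$ the posterior parameters, if $G_i \sim \mathrm{Gamma}(\alpha_i,1)$ independently, then $\theta_i = G_i/(G_1+G_2+G_3)$. Writing $X = G_1$ and $Y = G_3$, we have
$$\frac{X}{X+Y} \sim \mathrm{Beta}(\alpha_1,\alpha_3).$$
Because $\theta_1$ and $\theta_3$ share the same positive denominator, $\theta_1 > \theta_3 \iff X > Y \iff \frac{X}{X+Y} > \tfrac{1}{2}.$  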
Therefore,
$$P(\theta_1 > \theta_3 \mid y) = P\!\left(\mathrm{Beta}(\alpha_1,\alpha_3) > \tfrac{1}{2}\right)
= 1 - F_{\mathrm{Beta}(\alpha_1,\alpha_3)}\!\left(\tfrac{1}{2}\right).$$
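
The Gamma argument above can be checked by simulation: draw independent Gamma variables, compare them directly, and set the result against the Beta tail probability. This is only an illustrative sketch; the seed, the number of draws, and the variable names are arbitrary choices rather than part of the notebook.

```python
import numpy as np
from scipy.stats import beta

rng_gamma = np.random.default_rng(0)  # arbitrary seed for this sketch
a1, a3 = 14.0, 4.0                    # posterior alpha'_1 and alpha'_3
n_draws = 200_000

x = rng_gamma.gamma(shape=a1, scale=1.0, size=n_draws)    # X ~ Gamma(alpha_1, 1)
y_g = rng_gamma.gamma(shape=a3, scale=1.0, size=n_draws)  # Y ~ Gamma(alpha_3, 1)

p_sim = np.mean(x > y_g)           # simulated P(X > Y)
p_beta = beta.sf(0.5, a=a1, b=a3)  # 1 - F_{Beta(alpha_1, alpha_3)}(1/2)
print(p_sim, p_beta)
```

The two numbers should agree up to Monte Carlo error, which is on the order of $10^{-4}$ for this many draws.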


In [None]:

p_q3 = 1 - beta.cdf(0.5, a=alpha1, b=alpha3)
p_q3


np.float64(0.9936370849609375)


## Question 4: Predictive probability $P(z_1 \ge 7 \mid y)$ for $m=10$

Given $\theta \mid y \sim \mathrm{Dirichlet}(\alpha')$ with $\alpha'=(14,8,4)$ and $m=10$, the posterior predictive for counts $(z_1,z_2,z_3)$ is Dirichlet–Multinomial:
$$
P(z \mid m,\alpha') = \frac{m!}{z_1!\,z_2!\,z_3!}\,
\frac{\Gamma(\alpha'_1+z_1)\,\Gamma(\alpha'_2+z_2)\,\Gamma(\alpha'_3+z_3)}{\Gamma(\alpha'_1)\,\Gamma(\alpha'_2)\,\Gamma(\alpha'_3)}\,
\frac{\Gamma(\alpha'_0)}{\Gamma(\alpha'_0 + m)},
$$
where $\alpha'_0 = \alpha'_1+\alpha'_2+\alpha'_3$.

We compute
$$P(z_1 \ge 7 \mid y) = \sum_{z_1=7}^{10}\;\sum_{z_2=0}^{10-z_1} P\big((z_1,z_2,10-z_1-z_2)\mid m=10,\alpha'\big).$$
For numerical stability, we evaluate each pmf term on the log scale (via `gammaln`), exponentiate it, and add the resulting probabilities.


In [None]:

def log_dirichlet_multinomial_pmf(z, alpha):
    # z: array of counts summing to m; alpha: Dirichlet parameters
    z = np.asarray(z, dtype=int)
    alpha = np.asarray(alpha, dtype=float)
    m = int(z.sum())
    alpha0 = alpha.sum()
    # log multinomial coefficient
    log_coef = (gammaln(m+1) - (gammaln(z[0]+1) + gammaln(z[1]+1) + gammaln(z[2]+1)))
    # Dirichlet ratio
    log_dir_ratio = (gammaln(alpha[0]+z[0]) + gammaln(alpha[1]+z[1]) + gammaln(alpha[2]+z[2])
                     - (gammaln(alpha[0]) + gammaln(alpha[1]) + gammaln(alpha[2])))
    # Gamma(alpha0)/Gamma(alpha0+m)
    log_norm = (gammaln(alpha0) - gammaln(alpha0 + m))
    return log_coef + log_dir_ratio + log_norm

m = 10
alpha_prime = alpha_post.copy()

acc = 0.0
for z1 in range(7, m+1):
    for z2 in range(0, m - z1 + 1):
        z3 = m - z1 - z2
        lp = log_dirichlet_multinomial_pmf([z1, z2, z3], alpha_prime)
        acc += float(np.exp(lp))

p_q4_exact = acc
p_q4_exact


0.2789125964876829
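
As a cross-check on the double sum, the count $z_1$ alone follows a Beta-Binomial distribution with parameters $(m,\ \alpha'_1,\ \alpha'_2+\alpha'_3)$, because "No change" and "Worsened" can be merged into a single category. The sketch below assumes a SciPy version that provides `scipy.stats.betabinom`; the variable names are illustrative.

```python
from scipy.stats import betabinom

m_new = 10
a_improved, a_rest = 14.0, 8.0 + 4.0  # alpha'_1 and alpha'_2 + alpha'_3

# z1 | y ~ BetaBinomial(m, alpha'_1, alpha'_2 + alpha'_3); sf(6) = P(z1 >= 7)
p_q4_betabinom = betabinom.sf(6, m_new, a_improved, a_rest)
print(p_q4_betabinom)
```

This should reproduce the value of the exact Dirichlet–Multinomial sum, since both compute the same marginal probability.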


## Monte Carlo checks

We will also approximate the same quantities via Monte Carlo:
- Sample $\theta^{(b)} \sim \mathrm{Dirichlet}(\alpha')$.
- Compute indicators for questions 1–3.
- For the predictive, draw $(z_1,z_2,z_3)^{(b)} \sim \mathrm{Multinomial}(m,\theta^{(b)})$ and check $\mathbf{1}\{z_1 \ge 7\}$.
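
Since Question 4 only involves $z_1$, and $z_1 \mid \theta \sim \mathrm{Binomial}(m, \theta_1)$, the predictive step can also be simulated without a per-row loop. The following is a sketch of that alternative, not the approach used in the cell below; the seed and sample size are arbitrary, and it draws different random numbers from the notebook's own run.

```python
import numpy as np

rng_vec = np.random.default_rng(1)        # arbitrary seed for this sketch
alpha_vec = np.array([14.0, 8.0, 4.0])    # posterior parameters alpha'
m_vec, n_samples = 10, 200_000

theta_vec = rng_vec.dirichlet(alpha_vec, size=n_samples)

# z1 | theta ~ Binomial(m, theta_1); binomial() broadcasts over the array of p
z1_vec = rng_vec.binomial(m_vec, theta_vec[:, 0])

mc_q4_vec = np.mean(z1_vec >= 7)  # Monte Carlo estimate of P(z1 >= 7 | y)
print(mc_q4_vec)
```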


In [None]:

rng = np.random.default_rng(2025)
B = 500_000

theta_samp = rng.dirichlet(alpha_prime, size=B)
th1, th2, th3 = theta_samp.T  # one array per outcome

mc_q1 = np.mean(th1 > 0.5)
mc_q2 = np.mean(th3 < 0.2)
mc_q3 = np.mean(th1 > th3)

# Predictive sampling
z_samp = rng.multinomial(m, theta_samp[0], size=1)  # placeholder, overwritten below (it still advances the RNG state)
# Per-row multinomial draw (a plain Python loop, one call per posterior sample):
def row_multinomial(m, p):
    # Draw one sample per row of p
    out = np.empty((p.shape[0], p.shape[1]), dtype=int)
    for i, pi in enumerate(p):
        out[i] = rng.multinomial(m, pi)
    return out

z_samp = row_multinomial(m, theta_samp)
mc_q4 = np.mean(z_samp[:,0] >= 7)

mc_q1, mc_q2, mc_q3, mc_q4


(np.float64(0.654862),
 np.float64(0.766638),
 np.float64(0.993444),
 np.float64(0.279426))

In [None]:

print("Posterior parameters alpha' = (α1, α2, α3):", tuple(alpha_prime))
print("\nQ1: P(θ1 > 0.5 | y)")
print("  Analytic (Beta marginal) :", f"{p_q1:.6f}")
print("  Monte Carlo               :", f"{mc_q1:.6f}")

print("\nQ2: P(θ3 < 0.2 | y)")
print("  Analytic (Beta marginal) :", f"{p_q2:.6f}")
print("  Monte Carlo               :", f"{mc_q2:.6f}")

print("\nQ3: P(θ1 > θ3 | y)")
print("  Analytic (Beta(α1,α3))   :", f"{p_q3:.6f}")
print("  Monte Carlo               :", f"{mc_q3:.6f}")

print("\nQ4: Predictive P(z1 ≥ 7 | y), m=10")
print("  Exact (Dirichlet–Multinomial sum) :", f"{p_q4_exact:.6f}")
print("  Monte Carlo                         :", f"{mc_q4:.6f}")


Posterior parameters alpha' = (α1, α2, α3): (np.float64(14.0), np.float64(8.0), np.float64(4.0))

Q1: P(θ1 > 0.5 | y)
  Analytic (Beta marginal) : 0.654981
  Monte Carlo               : 0.654862

Q2: P(θ3 < 0.2 | y)
  Analytic (Beta marginal) : 0.766007
  Monte Carlo               : 0.766638

Q3: P(θ1 > θ3 | y)
  Analytic (Beta(α1,α3))   : 0.993637
  Monte Carlo               : 0.993444

Q4: Predictive P(z1 ≥ 7 | y), m=10
  Exact (Dirichlet–Multinomial sum) : 0.278913
  Monte Carlo                         : 0.279426
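
To judge whether the analytic and Monte Carlo values agree, each difference can be compared with the binomial standard error $\sqrt{p(1-p)/B}$ of an indicator average over $B = 500{,}000$ draws. The sketch below hard-codes the rounded values printed above; the dictionary layout and variable names are illustrative.

```python
import math

B_mc = 500_000  # number of Monte Carlo draws used above

# question: (analytic value, Monte Carlo estimate), as printed in the summary
results = {
    "Q1": (0.654981, 0.654862),
    "Q2": (0.766007, 0.766638),
    "Q3": (0.993637, 0.993444),
    "Q4": (0.278913, 0.279426),
}

z_scores = {}
for name, (exact, mc) in results.items():
    se = math.sqrt(exact * (1.0 - exact) / B_mc)  # standard error of an indicator mean
    z_scores[name] = (mc - exact) / se
    print(f"{name}: diff = {mc - exact:+.6f}, SE = {se:.6f}, z = {z_scores[name]:+.2f}")
```

All four differences fall within about two standard errors, which is consistent with the Monte Carlo estimates targeting the analytic values.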
