In [1]:
from scipy.stats import nbinom

Kidney cancer is a rare disease: on average, 46.5 deaths from kidney cancer are registered per year per 1,000,000 people. Thus, if $\theta$ is the yearly death rate for kidney cancer, then $$\mathbb{E}(\theta)=4.65\times 10^{-5}.$$

Let $Y_j$ be the number of deaths from kidney cancer over 10 years in a county with $n_j$ inhabitants. We can model this by $$Y_j|\theta_j\sim\textsf{Poisson}(10n_j\theta_j)$$ and use the prior $\theta_j\sim\textsf{Gamma}(20,430000)$ for every county. Note that $\mathbb{E}(\theta_j)=20/430000\approx 4.65\times 10^{-5}$.

Since the gamma distribution is conjugate to the Poisson likelihood, we have $$\theta_j|Y_j\sim\textsf{Gamma}(20+Y_j, 430000+10n_j),$$ and therefore $$\mathbb{E}(\theta_j|Y_j)=\frac{20+Y_j}{430000+10n_j}.$$
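
The conjugate update can be checked directly. As a short derivation sketch, write $\alpha=20$ and $\beta=430000$ for the prior shape and rate (a shorthand assumed here for readability) and multiply the Poisson likelihood by the gamma prior, dropping every factor that does not depend on $\theta_j$:

$$p(\theta_j|Y_j)\propto(10n_j\theta_j)^{Y_j}e^{-10n_j\theta_j}\,\theta_j^{\alpha-1}e^{-\beta\theta_j}\propto\theta_j^{\alpha+Y_j-1}e^{-(\beta+10n_j)\theta_j}.$$

This is the kernel of a $\textsf{Gamma}(\alpha+Y_j,\beta+10n_j)$ density, whose mean is $\frac{\alpha+Y_j}{\beta+10n_j}$.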

Furthermore, averaging over the prior $\theta_j\sim\textsf{Gamma}(\alpha,\beta)$, with $\alpha=20$ and $\beta=430000$ taken from historical information, the number of registered deaths in 10 years follows $$Y_j\sim\textsf{NegativeBinomial}\left(\alpha,\frac{\beta}{10n_j}\right).$$ Therefore, the expected number of deaths in 10 years is $$\mathbb{E}(Y_j)=10n_j\frac{\alpha}{\beta}.$$

In [2]:
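# Gamma prior parameters: theta_j ~ Gamma(ALPHA, BETA)
ALPHA = 20
BETA  = 430000

def posterior_mean(n, y):
    # E(theta_j | Y_j = y) for a county of n inhabitants observed for 10 years
    return (ALPHA+y) / (BETA+10*n)

def prior_predictive_mean(n):
    # E(Y_j) = 10*n*ALPHA/BETA: expected deaths in 10 years before seeing data
    beta_pred = BETA/(10*n)
    return ALPHA / beta_pred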

Consider a small town of 1000 people.

In [3]:
N=10**3

# posterior mean with y=0 and y=1
print(posterior_mean(N, 0))
print(posterior_mean(N, 1))

# expected number of deaths due to kidney cancer during the next 10 years
print(prior_predictive_mean(N))

4.545454545454545e-05
4.772727272727273e-05
0.46511627906976744


Note that with zero deaths the observed rate is zero, while with just one death the observed rate is $1/(10\times 1000)=10^{-4}$, more than double the national mean!

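Furthermore, with such a small population, the inference is dominated by the prior distribution: both posterior means, $4.55\times 10^{-5}$ and $4.77\times 10^{-5}$, stay close to the prior mean $4.65\times 10^{-5}$.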

In [4]:
# probability of 0, 1, 2 and 3 deaths during the next 10 years

beta_pred = BETA/(10*N)
# scipy's nbinom is parameterized by (n, p); here p = beta_pred/(beta_pred+1)
p_success = beta_pred/(beta_pred+1)

p0 = nbinom.pmf(0, n=ALPHA, p=p_success)
p1 = nbinom.pmf(1, n=ALPHA, p=p_success)
p2 = nbinom.pmf(2, n=ALPHA, p=p_success)
p3 = nbinom.pmf(3, n=ALPHA, p=p_success)

In [5]:
round(p0,3), round(p1,3), round(p2,3), round(p3,3)

(0.631, 0.287, 0.068, 0.011)
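
The success probability `beta_pred/(beta_pred+1)` passed to `nbinom.pmf` comes from rewriting the negative binomial with parameters $\alpha$ and $b=\beta/(10n_j)$ as $$P(Y_j=k)=\frac{\Gamma(\alpha+k)}{\Gamma(\alpha)\,k!}\left(\frac{b}{b+1}\right)^{\alpha}\left(\frac{1}{b+1}\right)^{k}.$$ As a minimal cross-check, the sketch below evaluates this formula using only the standard library. The helper name `prior_predictive_pmf` and its `years` argument are illustrative assumptions, not part of the original notebook.

```python
from math import exp, lgamma, log

ALPHA = 20       # prior shape, same value as in the cells above
BETA = 430000    # prior rate, same value as in the cells above

def prior_predictive_pmf(k, n, years=10):
    """P(Y = k) for a county of n inhabitants (hypothetical helper)."""
    b = BETA / (years * n)  # the second negative binomial parameter, beta / (10 n)
    # Work on the log scale so the gamma functions do not overflow.
    log_p = (lgamma(ALPHA + k) - lgamma(ALPHA) - lgamma(k + 1)
             + ALPHA * log(b / (b + 1))
             - k * log(b + 1))
    return exp(log_p)

# Probabilities of 0, 1, 2 and 3 deaths in a town of 1000 people.
probs = [prior_predictive_pmf(k, n=10**3) for k in range(4)]
print([round(p, 3) for p in probs])
```

For the town of 1000 people, the rounded values agree with the `nbinom.pmf` results shown above.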

Consider now a city of 1,000,000 people.

In [6]:
N=10**6

# posterior mean with y=393 and y=545
print(posterior_mean(N, 393))
print(posterior_mean(N, 545))

# expected number of deaths due to kidney cancer during the next 10 years
print(prior_predictive_mean(N))

3.959731543624161e-05
5.417066155321189e-05
465.1162790697675


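In this large city the data dominate the prior distribution: the posterior means, $3.96\times 10^{-5}$ and $5.42\times 10^{-5}$, stay close to the observed rates $393/10^7=3.93\times 10^{-5}$ and $545/10^7=5.45\times 10^{-5}$.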
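
One way to see why population size decides which source dominates is to rewrite the posterior mean as a weighted average of the prior mean and the observed rate, $$\frac{\alpha+Y_j}{\beta+10n_j}=w_j\frac{\alpha}{\beta}+(1-w_j)\frac{Y_j}{10n_j},\qquad w_j=\frac{\beta}{\beta+10n_j}.$$ The sketch below computes the prior weight $w_j$ for the two populations considered here. The function names `prior_weight` and `posterior_mean_as_blend` are illustrative assumptions, not part of the original notebook.

```python
ALPHA = 20       # prior shape, same value as in the cells above
BETA = 430000    # prior rate, same value as in the cells above

def prior_weight(n, years=10):
    """Weight w given to the prior mean for a county of n inhabitants."""
    return BETA / (BETA + years * n)

def posterior_mean_as_blend(n, y, years=10):
    """Posterior mean written as w * prior mean + (1 - w) * observed rate."""
    w = prior_weight(n, years)
    prior_mean = ALPHA / BETA
    observed_rate = y / (years * n)
    return w * prior_mean + (1 - w) * observed_rate

# Prior weight for the small town and for the large city.
for n in (10**3, 10**6):
    print(n, round(prior_weight(n), 3))
```

The prior weight is about 0.977 for the town of 1000 people and about 0.041 for the city of 1,000,000 people, so the prior carries almost all the weight in the first case and very little in the second.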

### Constructing the prior distribution

To construct the prior distribution we can use historical information about the number of deaths in the counties. Since $$Y_j\sim\textsf{NegativeBinomial}\left(\alpha,\frac{\beta}{10n_j}\right),$$ we have $$\mathbb{E}\left(\frac{Y_j}{10n_j}\right)=\frac{\alpha}{\beta}$$ and $$\mathbb{V}\left(\frac{Y_j}{10n_j}\right)=\frac{1}{10n_j}\frac{\alpha}{\beta}+\frac{\alpha}{\beta^2}.$$

We then match moments: we set $\frac{\alpha}{\beta}$ equal to the sample mean of the rates $\frac{Y_j}{10n_j}$, and set $\mathbb{E}\left(\frac{1}{10n_j}\right)\frac{\alpha}{\beta}+\frac{\alpha}{\beta^2}$ equal to their sample variance, using the sample average of the values $\frac{1}{10n_j}$ in place of $\mathbb{E}\left(\frac{1}{10n_j}\right)$.
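
Writing $m$ for the sample mean of the rates, $v$ for their sample variance and $c$ for the sample average of $\frac{1}{10n_j}$, the two moment equations give $\beta=\frac{m}{v-cm}$ and $\alpha=m\beta$. The sketch below is one possible way to carry this out. The function names and the county counts in the usage example are made-up illustrations, not the historical data behind $\alpha=20$ and $\beta=430000$.

```python
from statistics import mean, variance

def solve_alpha_beta(m, v, c):
    """Solve alpha/beta = m and c*m + alpha/beta**2 = v for (alpha, beta)."""
    excess = v - c * m  # part of the variance attributed to the prior
    if excess <= 0:
        raise ValueError("sample variance too small to match a gamma prior")
    beta = m / excess
    alpha = m * beta
    return alpha, beta

def fit_gamma_prior(deaths, populations, years=10):
    """Moment-matching estimate of (alpha, beta) from county-level counts."""
    exposures = [years * n for n in populations]
    rates = [y / e for y, e in zip(deaths, exposures)]
    m = mean(rates)
    v = variance(rates)                   # sample variance (n - 1 denominator)
    c = mean([1 / e for e in exposures])  # stands in for E(1 / (10 n_j))
    return solve_alpha_beta(m, v, c)

# Made-up toy counts for four counties of 1,000,000 inhabitants each.
alpha_hat, beta_hat = fit_gamma_prior([300, 400, 500, 600], [10**6] * 4)
print(alpha_hat, beta_hat)
```

When the sample variance of the rates does not exceed $cm$, for instance with few counties or very small populations, the equations have no positive solution; this sketch raises an error in that case rather than returning a prior.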