In [None]:
import matplotlib.pyplot as plt
import numpy as np

(a) At the start of Section 4.1 it was shown that the likelihood for a set of $n$ independent Bernoulli trials $x_1, \ldots, x_n$, each with success probability $\theta$, is given by:

\begin{align}
L(\theta) & = \prod_{i=1}^{n} \theta^{x_i} (1-\theta)^{1-x_i} \\
& = \theta^{\sum_{i=1}^{n} x_i} (1 - \theta)^{n - \sum_{i=1}^{n} x_i}
\end{align}

or in terms of the log-likelihood:
\begin{align}
\log L(\theta) & = \log \theta \sum_{i=1}^{n} x_i + \log (1-\theta) \Bigg( {n- \sum_{i=1}^{n} x_i} \Bigg)
\end{align}
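The collapse from a product over trials to a function of $n$ and $\sum x_i$ can be checked numerically. The following is a minimal sketch on made-up binary outcomes; `x_demo` and `theta_demo` are hypothetical values chosen only for illustration and are not part of the exercise data.

```python
import numpy as np

# Hypothetical binary outcomes, used only to illustrate the identity.
x_demo = np.array([1, 0, 1, 1, 0, 1, 0, 1])
theta_demo = 0.4

# Per-trial form: sum over i of log(theta^x_i * (1 - theta)^(1 - x_i)).
per_trial = np.sum(
    x_demo * np.log(theta_demo) + (1 - x_demo) * np.log(1 - theta_demo)
)

# Collapsed form: depends on the data only through n and sum(x_i).
n_demo = x_demo.size
s_demo = x_demo.sum()
collapsed = s_demo * np.log(theta_demo) + (n_demo - s_demo) * np.log(1 - theta_demo)

print(np.isclose(per_trial, collapsed))
```

Both expressions agree, which is why only the number of trials and the number of successes are needed to evaluate the log-likelihood.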

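(b) The log-likelihood of $\theta$ is evaluated below from the pooled counts, where `k` is the number of successes in a family and `x` holds the number of families (of either size) with that many successes: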

In [None]:
k = np.arange(7)
n_k_2 = np.array([42860, 89213, 47819, 0, 0, 0, 0])
n_k_6 = np.array([1096, 6233, 15700, 22221, 17332, 7908, 1579])
x = n_k_2 + n_k_6
print(repr(k))
print(repr(x))

In [None]:
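def log_likelihood(k: np.ndarray, x: np.ndarray, theta: np.ndarray) -> np.ndarray:
    # Total number of Bernoulli trials: each family of size 2 contributes 2
    # trials and each family of size 6 contributes 6 (uses the counts defined above).
    n = 2 * np.sum(n_k_2) + 6 * np.sum(n_k_6)
    # Total number of successes: k successes in each of the x[k] families.
    sum_x = np.sum(k * x)
    log_like = np.log(theta) * sum_x + np.log(1 - theta) * (n - sum_x)
    #like = np.exp(log_like - np.max(log_like))  # tiny values
    return log_like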

In [None]:
theta = np.linspace(0.01, 0.99, num=100)
log_like = log_likelihood(k, x, theta)
plt.plot(theta, log_like)
plt.xlabel(r'$\theta$')
plt.ylabel('Log likelihood')
plt.title(r'Log likelihood of $\theta$');

The MLE of $\theta$ is the solution to the score equation $\partial \log L(\theta) / \partial \theta = 0$, which gives:

$$
\hat{\theta} = \frac{1}{n} \sum_{i=1}^{n} x_i
$$
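The estimate above follows from differentiating the log-likelihood in part (a) with respect to $\theta$ and setting the derivative to zero. A short worked version of that step, using the same symbols:

\begin{align}
\frac{\partial \log L(\theta)}{\partial \theta} & = \frac{1}{\theta} \sum_{i=1}^{n} x_i - \frac{1}{1-\theta} \Bigg( n - \sum_{i=1}^{n} x_i \Bigg) = 0 \\
(1 - \hat{\theta}) \sum_{i=1}^{n} x_i & = \hat{\theta} \Bigg( n - \sum_{i=1}^{n} x_i \Bigg) \\
\hat{\theta} & = \frac{1}{n} \sum_{i=1}^{n} x_i
\end{align}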

In [None]:
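def mle(k: np.ndarray, x: np.ndarray) -> float:
    # Successes divided by trials: sum(k * x) successes out of
    # 2 * (families of size 2) + 6 * (families of size 6) trials.
    n = 2 * np.sum(n_k_2) + 6 * np.sum(n_k_6)
    return np.sum(k * x) / n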

In [None]:
theta_hat = mle(k, x)
print(f'MLE of theta = {np.round(theta_hat, 2)}')
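As a sanity check, the closed-form estimate (total successes over total trials) can be compared with a brute-force maximisation of the log-likelihood over a fine grid. This is a minimal sketch that assumes each family of size 2 contributes 2 trials and each family of size 6 contributes 6; the names `successes`, `trials`, `closed_form` and `grid_max` are introduced here for illustration.

```python
import numpy as np

# Counts copied from the data cell above.
k = np.arange(7)
n_k_2 = np.array([42860, 89213, 47819, 0, 0, 0, 0])
n_k_6 = np.array([1096, 6233, 15700, 22221, 17332, 7908, 1579])

# Assumption: family sizes 2 and 6 give the number of trials per family.
successes = np.sum(k * (n_k_2 + n_k_6))
trials = 2 * n_k_2.sum() + 6 * n_k_6.sum()
closed_form = successes / trials

# Grid search with a step of 1e-4 over the same range used for the plot.
grid = np.linspace(0.01, 0.99, num=9801)
log_like = successes * np.log(grid) + (trials - successes) * np.log(1 - grid)
grid_max = grid[np.argmax(log_like)]

print(f"closed form = {closed_form:.4f}, grid maximiser = {grid_max:.4f}")
```

The two values should agree to within the grid spacing, since the log-likelihood is concave in $\theta$.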

The standard error of $\hat{\theta}$ is given by:

\begin{align}
\mathrm{se}(\hat{\theta}) & = \sqrt{\mathrm{var} \Bigg(\frac{1}{n} \sum_{i=1}^{n} x_i  \Bigg)} \\
& = \sqrt{\frac{\hat{\theta} (1 - \hat{\theta})}{n}}
\end{align}
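The second line follows because the $x_i$ are independent, each with $\mathrm{var}(x_i) = \theta (1 - \theta)$. A short worked version of that step is given here; the unknown $\theta$ is replaced by $\hat{\theta}$ at the end, which is the usual plug-in estimate:

\begin{align}
\mathrm{var} \Bigg( \frac{1}{n} \sum_{i=1}^{n} x_i \Bigg) & = \frac{1}{n^2} \sum_{i=1}^{n} \mathrm{var}(x_i) \\
& = \frac{n \theta (1 - \theta)}{n^2} \\
& = \frac{\theta (1 - \theta)}{n}
\end{align}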

In [None]:
n_trials = 2 * np.sum(n_k_2) + 6 * np.sum(n_k_6)  # total number of Bernoulli trials
se_theta_hat = np.sqrt((theta_hat * (1 - theta_hat)) / n_trials)
np.round(se_theta_hat, 4)

(c) The binomial model is not a good fit to the data. A single $\theta$ assumes that every family has the same success probability, but there is not an equal sampling probability amongst the families. Extra-binomial variance is observed since the sampling is done from a mixed population of families of size 2 and families of size 6.
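One way to probe the extra-binomial variance is to compare the observed variance of the per-family success count with the variance implied by a binomial model. The following is a minimal sketch restricted to the families of size 6, so that pooling two family sizes does not enter the comparison; the names `theta_6`, `var_obs`, `var_binom` and `dispersion` are introduced here for illustration.

```python
import numpy as np

# Counts copied from the data cell: families of size 6 with k successes.
k = np.arange(7)
n_k_6 = np.array([1096, 6233, 15700, 22221, 17332, 7908, 1579])
size = 6

families = n_k_6.sum()
# MLE of theta using only the families of size 6.
theta_6 = np.sum(k * n_k_6) / (size * families)

# Observed mean and variance of the number of successes per family.
mean_k = np.sum(k * n_k_6) / families
var_obs = np.sum(n_k_6 * (k - mean_k) ** 2) / families

# Variance implied by a Binomial(6, theta_6) model.
var_binom = size * theta_6 * (1 - theta_6)
dispersion = var_obs / var_binom

print(f"theta_6 = {theta_6:.4f}")
print(f"observed variance = {var_obs:.3f}, binomial variance = {var_binom:.3f}")
print(f"dispersion ratio = {dispersion:.3f}")
```

A dispersion ratio above 1 indicates more variability between families than a single binomial model with a common $\theta$ would produce.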