In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

plt.style.use('fivethirtyeight')
%matplotlib inline

## 1. Efficiency of the Poisson MLE

Let $X_1, X_2, \ldots, X_n$ be i.i.d. Poisson $(\mu)$.

**(a)** For $1 \le i \le n$, find $\ell_1(\mu; X_i)$ and the Fisher information $I(\mu)$.

**(b)** Find $\hat{\mu}_n$, the MLE of $\mu$.

**(c)** You know that $\hat{\mu}_n$ is asymptotically unbiased and efficient. Determine whether $\hat{\mu}_n$ is unbiased and efficient for fixed $n$.

---

## 2. Exponential Parametrized by the Mean

A common way to parametrize the exponential is by its mean instead of its rate. Let $X_1, X_2, \ldots, X_n$ be i.i.d. exponential with mean $\theta$. Then the density of $X_i$ is given by
$$
f_\theta(x) = \frac{1}{\theta} e^{-\frac{1}{\theta}x}, ~~~~~ x > 0
$$

**(a)** Find $\hat{\theta}_n$, the MLE of $\theta$.

**(b)** Find $E(\hat{\theta}_n)$. Is $\hat{\theta}_n$ unbiased?

**(c)** Find the Fisher information $I(\theta)$.

**(d)** Find $Var(\hat{\theta}_n)$. Is $\hat{\theta}_n$ efficient?

---

## 3. Exponential Parametrized by the Rate

Fix $n > 2$ and let $X_1, X_2, \ldots, X_n$ be i.i.d. exponential with rate $\lambda$. 

You know that the MLE of $\lambda$ is $\hat{\lambda}_n = \displaystyle \frac{1}{\bar{X}_n}$ where $\bar{X}_n$ is the mean of the sample. You also know that the MLE is asymptotically unbiased and efficient. In Lecture 4 you showed that $I(\lambda) = \displaystyle \frac{1}{\lambda^2}$. Hence for large $n$, the variance of $\hat{\lambda}_n$ is approximately $\displaystyle \frac{\lambda^2}{n}$.

Now investigate whether $\hat{\lambda}_n$ is unbiased or efficient for each fixed $n$. You will need the following facts about the gamma function and gamma integrals. 

- For positive parameters $r$ and $\lambda$, $f(x) = \displaystyle \frac{\lambda^r}{\Gamma(r)} x^{r-1}e^{-\lambda x}$ is a density over positive $x$. Hence $\displaystyle \int_0^\infty x^{r-1}e^{-\lambda x}dx = \frac{\Gamma(r)}{\lambda^r}$.
- $\Gamma(k) = (k-1)!$ for positive integer $k$.
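
Both gamma facts can be sanity-checked numerically before you use them. The sketch below is only an illustration: the values of `r` and `lam`, the integration cutoff, and the helper name `gamma_integral_midpoint` are arbitrary choices, not part of the problem.

```python
import math
import numpy as np

def gamma_integral_midpoint(r, lam, upper=60.0, num=600_000):
    """Midpoint-rule approximation of the integral of x^(r-1) * exp(-lam*x) over (0, upper)."""
    dx = upper / num
    x = (np.arange(num) + 0.5) * dx          # midpoints of the subintervals
    return float(np.sum(x ** (r - 1) * np.exp(-lam * x)) * dx)

r, lam = 3.5, 2.0                            # arbitrary illustrative values
numeric = gamma_integral_midpoint(r, lam)
closed_form = math.gamma(r) / lam ** r       # Gamma(r) / lambda^r
print(numeric, closed_form)

# Gamma(k) = (k - 1)! for the first few positive integers k
factorial_check = all(
    math.isclose(math.gamma(k), math.factorial(k - 1)) for k in range(1, 11)
)
print(factorial_check)
```

The cutoff at `upper=60` is harmless here because the integrand decays like $e^{-\lambda x}$; for much smaller $\lambda$ the cutoff would need to grow.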

**(a)** What is the distribution of $\bar{X}_n$? Explain.

**(b)** Use the answer to Part **a** and the gamma facts above to find $E(\hat{\lambda}_n)$. Is $\hat{\lambda}_n$ unbiased? If not, is it an overestimate of $\lambda$, or is it an underestimate? Either way, is the answer consistent with $\hat{\lambda}_n$ being asymptotically unbiased?

**(c)** Use the same approach as in Part **b** to find $Var(\hat{\lambda}_n)$. Is $\hat{\lambda}_n$ efficient? If not, is its variance consistent with asymptotic efficiency?
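
A Monte Carlo check can accompany the derivations in Parts **b** and **c**. The following is a minimal sketch, assuming an arbitrary true rate `lam = 2.0`, an arbitrary seed, and 50,000 replications per sample size. It compares the empirical mean and variance of $\hat{\lambda}_n = 1/\bar{X}_n$ with $\lambda$ and with the large-$n$ approximation $\lambda^2/n$.

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 2.0          # true rate (arbitrary choice)
reps = 50_000      # Monte Carlo replications per sample size

results = {}
for n in (10, 50, 200):
    # NumPy parametrizes the exponential by its scale, i.e. the mean 1/lam
    samples = rng.exponential(scale=1 / lam, size=(reps, n))
    lam_hat = 1 / samples.mean(axis=1)       # MLE of the rate in each replication
    # (empirical mean, empirical variance, asymptotic variance lam^2 / n)
    results[n] = (lam_hat.mean(), lam_hat.var(), lam ** 2 / n)
    print(n, results[n])
```

Comparing the first entry of each tuple with `lam`, and the second with the third, shows how far the fixed-$n$ behavior is from the asymptotic statements as $n$ grows.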

---

## 4. The German Tank Problem

Let $X_1, \ldots, X_n \stackrel{\text{iid}}{\sim} \text{Unif}[0, \theta]$ for an unknown parameter $\theta > 0$.

This setup is sometimes called the **German tank problem**, referring to a famous application during World War II. Allied statisticians estimated the total number of German tanks by analyzing serial numbers on captured or destroyed tanks. If tanks are numbered $1, 2, \ldots, N$ and we observe a random sample of serial numbers, we can view this as (approximately) sampling from $\text{Unif}[0, N]$. The Allies' statistical estimates turned out to be far more accurate than traditional intelligence estimates!

---
### (a) Method of Moments Estimator

Consider the estimator $\tilde{\theta}_n = 2\bar{X}_n$, where $\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i$.

1. Show that $\tilde{\theta}_n$ is an unbiased estimator of $\theta$ for every $n$.
2. Show that $\tilde{\theta}_n$ is consistent.
3. Find the asymptotic distribution of $\sqrt{n}(\tilde{\theta}_n - \theta)$.

---
### (b) The Likelihood Function

Write down the likelihood function $L(\theta)$ for the data $X_1, \ldots, X_n$. Show that $L(\theta)$ depends on the data only through $X_{(n)} = \max(X_1, \ldots, X_n)$.

**Insight: $X_{(n)}$ carries all of the information about $\theta$ contained in the sample,** at least if we assume the model is correct.

---
### (c) The Maximum Likelihood Estimator

1. Find the MLE $\hat{\theta}_n$ for $\theta$.
2. Is $\hat{\theta}_n$ unbiased? If not, compute its bias.

*Hint: Start by calculating the CDF of $X_{(n)}$.*

---
### (d) Simulation: Comparing the Estimators

Simulate the sampling distributions of $\tilde{\theta}_n = 2\bar{X}_n$ and $\hat{\theta}_n = X_{(n)}$ for $n = 20$ and $n = 200$, using $\theta = 1$. 

Create a figure with two panels (one for each sample size), overlaying the histograms of both estimators on the same axes. Which estimator appears to perform better?

Make a table comparing the squared bias, variance, and mean squared error of the two estimators, (a) as explicit functions of $n$ and $\theta$, and (b) as real numbers for $n=20$ and $n=200$, still using $\theta=1$.
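
One possible starting point for the numerical half of this part is sketched below. The seed, the number of replications, and the helper name `summarize` are assumptions made for illustration. The arrays `mom` and `mle` can be passed to `plt.hist` to build the two-panel figure.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 1.0
reps = 20_000      # Monte Carlo replications (arbitrary choice)

def summarize(estimates, theta):
    """Return the empirical (squared bias, variance, MSE) of an array of estimates."""
    bias_sq = (estimates.mean() - theta) ** 2
    var = estimates.var()
    mse = np.mean((estimates - theta) ** 2)
    return bias_sq, var, mse

table = {}
for n in (20, 200):
    x = rng.uniform(0, theta, size=(reps, n))
    mom = 2 * x.mean(axis=1)     # method-of-moments estimator, 2 * sample mean
    mle = x.max(axis=1)          # sample maximum X_(n)
    table[n] = {"mom": summarize(mom, theta), "mle": summarize(mle, theta)}
    print(n, table[n])
```

Because `estimates.var()` uses the population formula (`ddof=0`), squared bias plus variance equals the MSE exactly in each row, which is a useful check on the table.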

---
### (e) Bias Correction

Find a constant $c_n$ (depending on $n$) such that $c_n \hat{\theta}_n$ is an unbiased estimator of $\theta$.

This is the estimator that the Allied statisticians actually used to estimate German tank production!

---
### (f) Asymptotic Distribution of the MLE

Unlike "regular" MLEs, $\hat{\theta}_n = X_{(n)}$ is not asymptotically normal. Instead, it converges to $\theta$ at rate $n$ (not $\sqrt{n}$!).

Show that $n(\theta - \hat{\theta}_n)$ converges in distribution to an $\text{Exp}(1/\theta)$ random variable (i.e., an exponential with rate $1/\theta$ and mean $\theta$), and make a plot showing (a) a histogram of the actual sampling distribution for $n=200$ and $\theta = 1$, and (b) the exponential approximation.

- *Hint 1: Let $Y_n = n(\theta - X_{(n)})$. Find the CDF $P(Y_n \le y)$ for $y > 0$, and show it converges to the CDF of an exponential distribution.*
- *Hint 2: Remember that $(1+a/n)^n$ converges to $e^a$ as $n\to\infty$.*

**Insight: Why doesn't our regular theory work here?** Note that the uniform distribution has a "hard boundary" at $\theta$, which gives the likelihood function a discontinuity at $X_{(n)}$. As a result, our quadratic Taylor expansion makes no sense, so it's not too shocking that we could see qualitatively different behavior than we derived in class.

In particular, note that:
- The MLE error shrinks like $1/n$, much faster than the typical $1/\sqrt{n}$ rate!
- The asymptotic distribution is exponential, not normal.
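
The exponential limit can be checked by simulation before plotting. This is a rough sketch with an arbitrary seed, replication count, and grid of evaluation points. It compares the empirical CDF of $Y_n = n(\theta - X_{(n)})$ with the CDF of an exponential with mean $\theta$.

```python
import numpy as np

rng = np.random.default_rng(0)
theta, n, reps = 1.0, 200, 20_000

x_max = rng.uniform(0, theta, size=(reps, n)).max(axis=1)
y = n * (theta - x_max)                      # rescaled error Y_n

grid = np.array([0.25, 0.5, 1.0, 2.0, 4.0])  # points at which to compare CDFs
empirical_cdf = np.array([(y <= t).mean() for t in grid])
exp_cdf = 1 - np.exp(-grid / theta)          # CDF of an exponential with mean theta
max_gap = np.max(np.abs(empirical_cdf - exp_cdf))
print(empirical_cdf, exp_cdf, max_gap)
```

The array `y` is also what the requested histogram would be built from, with the exponential density drawn on top.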

---

## 5. Scale Families and Equivariant Estimation

In this problem, we study **scale families** of distributions and a surprising fact: for scale families, unbiased scale-equivariant estimators are always inadmissible under squared-error loss!

---
### (a) Scale Families: CDF and PDF

A **scale family** is a collection of distributions indexed by a scale parameter $\sigma > 0$. We say $X \sim F_\sigma$ belongs to a scale family if $X = \sigma Z$ where $Z \sim F_1$ (the "standard" member of the family).

1. Express the CDF $F_\sigma(x)$ in terms of $\sigma$ and $F_1$.
2. Assuming $X$ has a PDF $f_\sigma$, express $f_\sigma(x)$ in terms of $\sigma$ and $f_1$.

---
### (b) Scale Families and Location Families

Show that if $X$ comes from a scale family (with $X > 0$), then $\log X$ comes from a **location family**.

*Recall: A location family is a collection of distributions where $Y \sim G_\mu$ means $Y = \mu + W$ for $W \sim G_0$.*

---
### (c) Fisher Information for Scale Families

Consider $n$ i.i.d. observations $X_1, \ldots, X_n$ from a scale family with parameter $\sigma$. Let $I_n(\sigma)$ denote the Fisher information for $\sigma$ based on all $n$ observations.

Show that $I_n(\sigma) = \frac{n \cdot I_1(1)}{\sigma^2}$, where $I_1(1)$ is the Fisher information for a single observation when $\sigma = 1$.

---
### (d) Scale-Equivariant Estimators

An estimator $\hat{\sigma}(X_1, \ldots, X_n)$ is called **scale-equivariant** if for any constant $c > 0$:
$$\hat{\sigma}(cX_1, \ldots, cX_n) = c \cdot \hat{\sigma}(X_1, \ldots, X_n)$$

In other words, if we rescale all the data by $c$, the estimate also rescales by $c$. This is a natural property: if we change units (e.g., from meters to centimeters), the estimate should change units accordingly.
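
The definition is easy to test numerically on a single sample. The snippet below is an illustrative check, with an arbitrary sample and an arbitrary factor `c`. It tests the sample mean and sample maximum, and includes the sample variance as a contrasting statistic that is not scale-equivariant.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=20)   # any positive sample works here
c = 100.0                                 # e.g. meters -> centimeters

estimators = {
    "sample mean": np.mean,
    "sample maximum": np.max,
    "sample variance": np.var,
}
# An estimator is scale-equivariant if est(c * x) equals c * est(x)
equivariant = {
    name: bool(np.isclose(est(c * x), c * est(x)))
    for name, est in estimators.items()
}
print(equivariant)
```

The sample variance fails the check because it scales by $c^2$ rather than $c$; its square root, the sample standard deviation, would pass.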

For any scale-equivariant estimator $\hat{\sigma}$, show that:
$$\text{MSE}_\sigma(\hat{\sigma}) = \sigma^2 \cdot \text{MSE}_1(\hat{\sigma})$$

where $\text{MSE}_\sigma$ denotes the MSE when the true parameter is $\sigma$.
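
A simulation can make the scaling identity concrete. The sketch below assumes the exponential family with mean $\sigma$ and the sample mean as the equivariant estimator; the choices $\sigma = 3$, $n = 10$, the seed, and the replication count are arbitrary. The ratio of the two Monte Carlo MSEs should be close to $\sigma^2 = 9$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 10, 200_000

def mse_of_mean(sigma):
    """Monte Carlo MSE of the sample mean for exponential data with mean (scale) sigma."""
    x = rng.exponential(scale=sigma, size=(reps, n))
    return float(np.mean((x.mean(axis=1) - sigma) ** 2))

mse_1 = mse_of_mean(1.0)     # MSE when the true scale is 1
mse_3 = mse_of_mean(3.0)     # MSE when the true scale is 3
ratio = mse_3 / mse_1        # compare with sigma^2 = 9
print(mse_1, mse_3, ratio)
```

The two MSEs are computed from independent draws, so the ratio matches $\sigma^2$ only up to Monte Carlo error.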

---
### (e) Inadmissibility of Unbiased Equivariant Estimators

Consider a scale-equivariant estimator $\hat{\sigma}$ and its rescaled version $c\hat{\sigma}$ for some constant $c > 0$.

Define $h(c) = \text{MSE}_1(c\hat{\sigma})$, the MSE of $c\hat{\sigma}$ when $\sigma = 1$.

Let $m = E_1[\hat{\sigma}]$ and $v = \text{Var}_1(\hat{\sigma})$.

1. Show that $h(c) = (cm - 1)^2 + c^2 v$.

2. Show that if $\hat{\sigma}$ is unbiased (i.e., $E_\sigma[\hat{\sigma}] = \sigma$ for all $\sigma$) and non-constant (i.e., $\text{Var}_1(\hat{\sigma}) > 0$), then $h'(1) > 0$.

3. Conclude that any unbiased, non-constant, scale-equivariant estimator is **inadmissible**: there exists a $c < 1$ such that $c\hat{\sigma}$ has strictly smaller MSE than $\hat{\sigma}$ for all $\sigma > 0$.
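
The formula for $h(c)$ is a quadratic in $c$, so it can be explored directly. In the sketch below, the helper `c_star` is a hypothetical name for the root of $h'(c) = 2m(cm - 1) + 2cv$, and the values `m = 1.0`, `v = 0.05` are illustrative only.

```python
def h(c, m, v):
    """MSE at sigma = 1 of the rescaled estimator c * sigma_hat."""
    return (c * m - 1) ** 2 + c ** 2 * v

def c_star(m, v):
    """Minimizer of h, obtained by solving h'(c) = 2*m*(c*m - 1) + 2*c*v = 0."""
    return m / (m ** 2 + v)

m, v = 1.0, 0.05             # illustrative: mean 1 (unbiased at sigma = 1), variance 0.05
c_opt = c_star(m, v)

# Finite-difference slope of h at c = 1, to compare with the sign claimed in item 2
eps = 1e-6
slope_at_1 = (h(1 + eps, m, v) - h(1 - eps, m, v)) / (2 * eps)

print(c_opt, h(1.0, m, v), h(c_opt, m, v), slope_at_1)
```

Trying a few other positive values of `v` with `m = 1.0` shows how the amount of shrinkage depends on the variance.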

---
### (f) Simulation: Optimal Shrinkage for MLEs

Consider two scale families we've seen:

1. **Exponential mean**: $X_1, \ldots, X_n$ are i.i.d. exponential with mean $\mu$ (so $\mu$ is the scale parameter). The MLE is $\hat{\mu} = \bar{X}_n$.

2. **German tank problem**: $X_1, \ldots, X_n$ are i.i.d. $\text{Unif}[0, \theta]$. The MLE is $\hat{\theta} = X_{(n)} = \max(X_i)$.

Verify that each estimator is scale-equivariant, and use your $h(c)$ formula, along with what you know about the mean and variance of each estimator, to find the optimal $c_n^*$ as a function of $n$ only (it should not depend on $\mu$ or $\theta$).

For each estimator with $n = 20$:

1. Use Monte Carlo simulation to estimate $h(c) = \text{MSE}_1(c\hat{\sigma})$ for $c \in [0.8, 1.2]$.
2. Plot $h(c)$ and mark $c = 1$ (the MLE) and (for the German tank problem) $c_n = (n+1)/n$, which gives the unbiased version.
3. Mark the optimal $c_n^*$ that minimizes $h(c)$ on the same plot.
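
The Monte Carlo step might be organized as in the sketch below. The seed, the replication count, the grid resolution, and the helper name `h_curve` are assumptions made for illustration, and plotting is left out. Both families are simulated with scale parameter 1, matching the definition of $h$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 20, 100_000
c_grid = np.linspace(0.8, 1.2, 81)       # candidate multipliers, step 0.005

def h_curve(estimates):
    """Monte Carlo estimate of h(c) = E_1[(c * sigma_hat - 1)^2] on c_grid."""
    return np.array([np.mean((c * estimates - 1.0) ** 2) for c in c_grid])

# MLEs computed on data generated with scale parameter 1
exp_mle = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)
unif_mle = rng.uniform(0.0, 1.0, size=(reps, n)).max(axis=1)

h_exp, h_unif = h_curve(exp_mle), h_curve(unif_mle)
c_best_exp = c_grid[np.argmin(h_exp)]    # grid minimizer, exponential mean
c_best_unif = c_grid[np.argmin(h_unif)]  # grid minimizer, German tank
print(c_best_exp, c_best_unif)
```

With the notebook's `matplotlib` import, `plt.plot(c_grid, h_exp)` and `plt.axvline(1.0)` give the requested curves and reference lines. The grid minimizers can then be compared with the $c_n^*$ you derive analytically.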