
I encourage you to work through this notebook with a partner so that you can discuss your answers. You should meet over an application such as Discord or Zoom. One person can share their screen with this notebook open.

In [None]:
# This is a code cell.
# To run the code in this cell, click on it and press the "Play" button.
!pip install -q symbulate
from symbulate import *
import matplotlib.pyplot as plt

# Example 2 Revisited

Recall Example 2, where we wanted to estimate the rate parameter $\lambda$ of a Poisson process by which radioactive particles reach a Geiger counter.

We recorded the number of particles in 1-second intervals for 10 seconds:
$$ 0, 3, 1, 0, 0, 1, 0, 2, 0, 4. $$
These are independent observations from a $\text{Poisson}(\mu=\lambda)$ distribution.


Previously, we estimated $\lambda$ by the sample mean 
$$ \bar X = \frac{1}{n} \sum_{i=1}^{n} X_i = \frac{0 + 3 + 1 + 0 + 0 + 1 + 0 + 2 + 0 + 4}{10} = 1.1. $$
We showed that the sample mean is an unbiased estimator of $\lambda$.

What if we estimate $\lambda$ by the sample variance?

\begin{align*}
S^2 &= \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar X)^2 \\
&= \frac{1}{9} \Big((0 - 1.1)^2 + (3 - 1.1)^2 + \dots + (4 - 1.1)^2 \Big) \\
&= 2.1 
\end{align*}

We get a very different estimate of the rate of the Poisson process. The next question asks you to show that this is not a completely ridiculous estimator.
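
The two estimates above can be checked numerically. The following is a minimal sketch using NumPy (not Symbulate); the variable names `counts`, `x_bar`, and `s_sq` are chosen here for illustration. Note that `ddof=1` tells NumPy to divide by $n-1$ rather than $n$.

```python
import numpy as np

# Particle counts from the ten 1-second intervals in Example 2
counts = np.array([0, 3, 1, 0, 0, 1, 0, 2, 0, 4])

x_bar = counts.mean()       # sample mean
s_sq = counts.var(ddof=1)   # sample variance; ddof=1 divides by n - 1

print(x_bar, s_sq)
```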

**Question 1.** Show that $S^2$ is also unbiased for estimating $\lambda$. 

(_Hint:_ Use the result from Notebook 3, that $S^2$ is unbiased for estimating $\sigma^2 = \text{Var}[X_1]$.)

_YOUR MATH HERE_

# Choosing between Estimators

Now we have two estimators, $\bar X$ and $S^2$, both of which are unbiased for estimating $\lambda$. They give different estimates. How do we choose between them? Let's try to simulate their distributions.

To do this, we have to generate i.i.d. observations $X_1, ..., X_n$ from a Poisson distribution.
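
As a minimal sketch of what generating i.i.d. Poisson observations looks like outside of Symbulate, the block below uses NumPy's random generator. The seed, the rate `lam = 1.6`, and the number of simulated data sets are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily, for reproducibility
lam = 1.6                       # an arbitrary rate for illustration
n = 10                          # size of each data set

# One simulated data set: n independent draws from Poisson(lam)
sample = rng.poisson(lam=lam, size=n)

# Many data sets at once: each row is one data set of size n
samples = rng.poisson(lam=lam, size=(10000, n))

print(sample)
print(samples.shape)
```

Each row of `samples` plays the role of one realization of $X_1, ..., X_n$, so an estimator can be evaluated once per row.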

**Question 2.** Let $\lambda = 1.6$. (This was chosen arbitrarily.) Simulate the distributions of $\bar X$ and $S^2$ by simulating many data sets of size $n=10$ and calculating the sample mean and variance. 

We know that both $\bar X$ and $S^2$ are unbiased; that is, both distributions should be centered at $\lambda=1.6$. Looking at the simulated distributions, which estimator would you prefer if you were trying to estimate $\lambda$ as accurately as possible?

In [None]:
lam = 1.6

# define an RV which is the sample mean of 10 independent draws from a Poisson
X_bar = RV(Poisson(lam) ** 10, mean)

# TODO: define an RV which is the sample variance of 10 independent draws from a Poisson
def sample_var(data):
  n = len(data)
  x_bar = mean(data)
  return 0 # TODO
S_sq = RV(Poisson(lam) ** 10, sample_var)

# simulate the distributions and compare
X_bar.sim(10000).plot(type="bar")
S_sq.sim(10000).plot(type="bar")

_YOUR EXPLANATION HERE_

# The Variance Function

If two estimators are unbiased, we prefer the one with the smaller variance, since its estimates tend to fall closer to the true value of the parameter.
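
One way to make this notion of closeness precise, assuming we measure it by the mean squared error (a criterion not named in this notebook), is the standard decomposition for any estimator $\hat\lambda$ of $\lambda$:
$$ E\big[(\hat\lambda - \lambda)^2\big] = \text{Var}[\hat\lambda] + \big(E[\hat\lambda] - \lambda\big)^2. $$
For an unbiased estimator, $E[\hat\lambda] = \lambda$, so the second term is zero and the mean squared error equals the variance. Under this criterion, comparing two unbiased estimators reduces to comparing their variances.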

However, our simulation in Question 2 only shows that $\bar X$ has a smaller variance when $\lambda = 1.6$.


**Question 3.** Try several values of $\lambda$. Can you find any value of $\lambda$ where $S^2$ has a smaller variance than $\bar X$?

In [None]:
# YOUR CODE HERE

After simulating and comparing the distributions for several values of $\lambda$, you might start to suspect that $\bar X$ has a smaller variance than $S^2$ for every value of $\lambda$. But no simulation can prove this, since there are infinitely many possible values of $\lambda$. To prove this, we need math.

**Question 4.** Let $X_1, ..., X_{10}$ be i.i.d. $\text{Poisson}(\lambda)$. 

1. Calculate $\text{Var}[\bar X]$. Your answer should be a function of $\lambda$.
2. It can be shown that for i.i.d. observations $X_1, ..., X_n$ from any distribution,
$$\text{Var}[S^2] = \frac{E[(X_1 - E[X_1])^4] - \frac{n-3}{n-1} \text{Var}[X_1]^2}{n}.$$ 
Apply this formula to the random sample $X_1, ..., X_{10}$ from a Poisson distribution. If you did it correctly, your answer will be a quadratic function of $\lambda$. (_Hint:_ Set up LOTUS and plug it into [Wolfram Alpha](https://www.wolframalpha.com/).)
3. Graph $\text{Var}[\bar X]$ and $\text{Var}[S^2]$ as functions of $\lambda$. Based on your graph, which estimator seems better?
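
As a sanity check on the general formula in part 2, here is a minimal sketch that compares it with a simulation for a standard normal sample, not the Poisson sample the question asks about. It uses NumPy rather than Symbulate; the seed, the number of repetitions, and the variable names are assumptions made for illustration. For a standard normal distribution, $\text{Var}[X_1] = 1$ and $E[(X_1 - E[X_1])^4] = 3$.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility
n = 10                          # sample size
reps = 100_000                  # number of simulated data sets

# Standard normal: variance 1, fourth central moment 3
mu4 = 3.0
sigma_sq = 1.0
formula = (mu4 - (n - 3) / (n - 1) * sigma_sq ** 2) / n

# Simulate reps data sets, compute S^2 for each, then take the variance of the S^2 values
data = rng.normal(size=(reps, n))
s_sq = data.var(axis=1, ddof=1)
simulated = s_sq.var()

print(formula, simulated)
```

The two printed numbers should be close to each other. The same kind of comparison can be run for the Poisson case once you have worked out its fourth central moment.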

_YOUR EXPLANATION HERE_

In [None]:
# Define a grid of lambda values
import numpy as np
lams = np.arange(0, 5, step=0.01)

# You can graph functions of lambda as follows.
plt.plot(lams, lams / 2, '-')
plt.plot(lams, lams ** 2 + lams / 2, '-')

# Submission Instructions

1. If you worked with a different partner on this notebook than on the previous notebooks, [go here](https://canvas.calpoly.edu/courses/25458/groups) and add both you and your partner (if applicable) to one of the STAT 305 Groups.
2. Export this Colab notebook to PDF. The easiest way is File > Print > Save as PDF.
3. Double check that the PDF rendered properly (i.e., nothing is cut off).
4. Upload the PDF [to Canvas](https://canvas.calpoly.edu/courses/25458/assignments/112160). Only one of you needs to upload the PDF.