<a href="https://colab.research.google.com/github/dlsun/Stat305-S20/blob/master/colabs/notebooks/STAT_305_Notebook_5_Mean_Squared_Error.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

I encourage you to work through this notebook with a partner so that you can discuss your answers. You should meet over an application such as Discord or Zoom. One person can share their screen with this notebook open.

In [None]:
# This is a code cell.
# To run the code in this cell, click on it and press the "Play" button.
!pip install -q symbulate
from symbulate import *

import numpy as np
import matplotlib.pyplot as plt

# Mean-Squared Error

In the last notebook, we saw that among _unbiased_ estimators, the one with the smallest variance is preferred, since it will be closest to the parameter on average. 

But what if the estimators are biased? For example, in Example 1, we considered two estimators of a binomial proportion $p_1$:

\begin{align*}
\hat p_1 &= \frac{X}{n} & \tilde p_1 &= \frac{X + 2}{n + 4}.
\end{align*}

We saw that $\hat p_1$ is unbiased, but $\tilde p_1$ is biased. How do we decide which of these two estimators is better?

If we want the estimator that comes closest to $\theta$ on average, we can measure that directly. If $\hat\theta$ represents the estimator and $\theta$ the parameter being estimated, then the **mean-squared error** (or MSE, for short) is defined as:

$$ \text{MSE}[\hat\theta] = E[ (\hat\theta - \theta)^2 ]. $$

It measures the average squared distance between the estimator $\hat\theta$ and the true value of $\theta$.

The MSE is not the same as the variance. The variance, $E[(\hat\theta - E[\hat\theta])^2]$, measures spread around the estimator's own mean $E[\hat\theta]$, which may not equal $\theta$ (that is, $\hat\theta$ may be biased).
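
To make the definition concrete, here is a minimal sketch (not part of the original notebook) that computes the expectation in the MSE exactly for a binomial count, by summing the squared error over the binomial probabilities. The helper name `exact_mse` is hypothetical, and the values $n = 50$ and $p_1 = 0.1$ are chosen only for illustration.

```python
from math import comb

def exact_mse(estimator, n, p):
    """Return E[(estimator(X) - p)^2] for X ~ Binomial(n, p).

    The expectation is computed exactly by summing over the pmf of X.
    """
    total = 0.0
    for k in range(n + 1):
        pmf = comb(n, k) * p**k * (1 - p) ** (n - k)  # P(X = k)
        total += (estimator(k) - p) ** 2 * pmf
    return total

n, p1 = 50, 0.1  # illustrative values (assumed)

mse_hat = exact_mse(lambda x: x / n, n, p1)                # MSE of X / n
mse_tilde = exact_mse(lambda x: (x + 2) / (n + 4), n, p1)  # MSE of (X + 2) / (n + 4)
print(mse_hat, mse_tilde)
```

Because this sums over all $n + 1$ possible values of $X$, it has no simulation error, so it can serve as a reference when you estimate the MSE by simulation.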

**Question 1.** Derive a relation between the MSE and the bias and variance. (You may want to review the definitions of bias and variance.)

\begin{align}
\text{MSE}[\hat\theta] = E[(\hat\theta - \theta)^2] &= E\Big[\big((\hat\theta - E[\hat\theta]) + (E[\hat\theta] - \theta)\big)^2 \Big] & \text{(add and subtract $E[\hat\theta]$)} \\
&= ... & \text{(expand the square)} \\
&= ... & \text{(middle term is zero)} \\
&= ... & \text{(definition of variance and bias)}
\end{align}

You can check your answer by simulating the bias, variance, and MSE, and seeing if the relation you derived holds.

In [None]:
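n = 50     # number of trials
p1 = 0.1   # true value of the proportion

# X counts the successes in n trials
X = RV(Binomial(n, p1))

# The biased estimator from Example 1
p1_tilde = (X + 2) / (n + 4)

# Simulate 10,000 values of the estimator
ps = p1_tilde.sim(10000)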
(ps.mean() - p1,      # bias
 ps.var(),            # variance
 mean((ps - p1) ** 2) # MSE
)

**Question 1b.** Fill in the blanks below:

If an estimator $\hat\theta$ is unbiased, then the MSE is just the -----. The estimator with the smallest MSE is just the estimator with the smallest -----.

**Question 2.** In the setting of Example 1, calculate the bias, variance, and MSE of $\hat p_1$ and $\tilde p_1$ for estimating $p_1$.

Graph the bias, variance, and MSE as functions of $p_1$. Is there a clear winner between $\hat p_1$ and $\tilde p_1$?

_YOUR MATH HERE_

In [None]:
# Some code to graph functions is provided below.

# Define a grid of p1 values
p1s = np.arange(0, 1, step=0.01)

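# You can graph functions of p1 as follows.
# (The two curves below are placeholders. Replace them with your own formulas.)
plt.plot(p1s, p1s, '-')
plt.plot(p1s, p1s + p1s ** 2, '-')
plt.legend([r"$\hat p_1$", r"$\tilde p_1$"])
plt.xlabel(r"$p_1$")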
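
If you would like to check your graphs against simulation, one possible approach is sketched below using plain NumPy rather than Symbulate. The function name `simulated_mse`, the choice $n = 50$, the number of repetitions, and the seed are all assumptions made for illustration.

```python
import numpy as np

def simulated_mse(estimator, n, p1s, reps=10_000, seed=0):
    """Estimate the MSE of `estimator` at each value in `p1s` by simulation."""
    rng = np.random.default_rng(seed)
    # One row of simulated binomial counts per value of p1: shape (len(p1s), reps)
    X = rng.binomial(n, p1s[:, None], size=(len(p1s), reps))
    # Average the squared error across the repetitions in each row
    return ((estimator(X) - p1s[:, None]) ** 2).mean(axis=1)

n = 50  # assumed number of trials
p1s = np.arange(0, 1, step=0.01)

mse_hat = simulated_mse(lambda X: X / n, n, p1s)
mse_tilde = simulated_mse(lambda X: (X + 2) / (n + 4), n, p1s)
```

The returned arrays can be passed to `plt.plot(p1s, ...)` alongside the curves from your formulas. Expect the simulated curves to be slightly noisy.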

_YOUR EXPLANATION HERE_

# The MSE of the Variance Estimator

In the notebook "Estimating the Variance", you showed that if $X_1, \ldots, X_n$ are i.i.d. from some distribution, then the sample variance 
$$ S^2 = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar X)^2 $$
is unbiased for estimating $\sigma^2 = \text{Var}[X_i]$. However, the estimator 
$$ S_0^2 = \frac{1}{n} \sum_{i=1}^n (X_i - \bar X)^2 $$
is biased for estimating $\sigma^2$. Let's see which of $S^2$ and $S_0^2$ has the smaller MSE.
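
In NumPy, the two estimators differ only in the `ddof` argument of `np.var`, which sets the divisor to `n - ddof`. Here is a small illustration with made-up data:

```python
import numpy as np

x = np.array([2, 4, 4, 4, 5, 5, 7, 9])  # made-up data, for illustration only
n = len(x)

s_sq = np.var(x, ddof=1)   # S^2: divides by n - 1
s0_sq = np.var(x, ddof=0)  # S_0^2: divides by n (NumPy's default)

print(s_sq, s0_sq)
```

Note that `np.var` uses `ddof=0` by default, so you need to pass `ddof=1` explicitly to get $S^2$.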

**Question 3.** Let's revisit the setting of Example 2. We have 10 measurements of the number of radioactive particles that hit a Geiger counter over 1-second intervals, and the goal is to estimate the rate $\lambda$ of the Poisson process. That is, we observe $X_1, X_2, \ldots, X_{10}$, i.i.d. from a $\text{Poisson}(\lambda)$ distribution.

In the notebook "Comparing Unbiased Estimators", you derived $\text{Var}[S^2]$ as a quadratic function of $\lambda$.

1. Use previous results to calculate $\text{MSE}[S^2]$.
2. Calculate the bias and variance of $S_0^2$. Use these to calculate $\text{MSE}[S_0^2]$.

    (Hint: The bias and variance are easy to calculate once you observe that $S_0^2 = \frac{n-1}{n} S^2$. Use properties of expected value and variance to calculate $E[S_0^2]$ and $\text{Var}[S_0^2]$ from $E[S^2]$ and $\text{Var}[S^2]$, which you already know.)

3. Graph the MSE of $S^2$ and $S_0^2$ as functions of $\lambda$. Is there a clear winner? (The answer may surprise you!)
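
Once you have derived your formulas, a simulation at a single value of $\lambda$ can serve as a sanity check. The sketch below uses plain NumPy and relies on the fact that $\sigma^2 = \lambda$ for a Poisson distribution. The rate `lam = 3.0`, the number of repetitions, and the seed are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, lam, reps = 10, 3.0, 100_000  # lam = 3.0 is an arbitrary illustrative rate

# Each row is one simulated data set X_1, ..., X_n
samples = rng.poisson(lam, size=(reps, n))

s_sq = samples.var(axis=1, ddof=1)   # S^2 for each data set
s0_sq = samples.var(axis=1, ddof=0)  # S_0^2 for each data set

# The parameter being estimated is sigma^2, which equals lam for a Poisson
mse_s_sq = np.mean((s_sq - lam) ** 2)
mse_s0_sq = np.mean((s0_sq - lam) ** 2)
print(mse_s_sq, mse_s0_sq)
```

Try a few different values of `lam` and compare the printed values with what your formulas predict.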

_YOUR MATH HERE_

In [None]:
# YOUR CODE HERE (for making the graphs)

_YOUR EXPLANATION HERE_

# Submission Instructions

This notebook does not need to be turned in, since you'll be using these answers on your final project!