
In [0]:
#@title Imports
!pip install -q symbulate
from symbulate import *

# Review of the First Lecture

![](https://github.com/dlsun/Stat425F19/blob/master/notes/img/prob_stat.png?raw=1)

## Example 1

A coin is tossed 100 times.

### Probability Question 

Suppose the coin has probability $0.5$ of coming up heads. What is the probability of observing exactly 60 heads in the 100 tosses?

In [0]:
# 1 = heads, 0 = tails
model = BoxModel([0, 1], size=100, replace=True)
model.sim(5)

In [0]:
# Adding up the 0s and 1s gives the number of 1s (i.e., heads)
X = RV(model, sum)
X.sim(5)

In [0]:
# Simulate many instances and count how many are equal to 60.
X.sim(10000).count_eq(60) / 10000
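
As a cross-check on the simulation, the same probability can be computed exactly, because the number of heads in 100 tosses of a fair coin is $\text{Binomial}(n=100, p=0.5)$. The sketch below uses only the Python standard library rather than Symbulate; the helper name `binomial_pmf` is made up here for illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p). Hypothetical helper for this note."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Exact probability of exactly 60 heads in 100 tosses of a fair coin
exact_prob = binomial_pmf(60, 100, 0.5)
print(exact_prob)
```

The simulated proportion should land close to this value, which is about $0.011$.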

### Statistics Question

The coin may or may not be fair; it has some probability $p$ of coming up heads. But we observed 60 heads in 100 tosses. Based on this data, how do we estimate $p$?

Intuitively, $\hat p = 60 / 100 = 0.6$. Is this estimate good or not? It's hard to say for certain whether an individual estimate is good. After all, even a fair coin ($p = 0.5$) could come up heads 100 / 100 times, in which case the estimate would be $\hat p = 100 / 100 = 1.0$, which is a terrible estimate.

Even though we cannot evaluate individual estimates, we can evaluate the _procedure_ for coming up with the estimate, given data. This procedure is called the **estimator**.

The procedure that we followed to come up with $\hat p = 0.6$ is this: take the number of heads in the data and divide by the number of tosses. Let's evaluate this estimator by simulation.

In [0]:
# Suppose the coin is fair (p = 0.5)
model = BoxModel([0, 1], size=100, replace=True)

# Define the estimator
def estimator(data):
    # number of heads divided by the number of tosses
    return sum(data) / len(data)
p_hat = RV(model, estimator)

# Now simulate many estimates.
p_hat.sim(5)

In [0]:
# Make a plot of these estimates.
p_hat.sim(10000).plot()

We simulated the data from a model where the true probability of heads was $p = 0.5$. We see that the estimated probability is not always $0.5$ exactly. It is sometimes more, sometimes less. But in expectation, it is equal to $0.5$. Let's check this.

In [0]:
p_hat.sim(1000).mean()

The difference between this expectation and the truth is called the **bias** of the estimator.

$$ \text{bias} = E[\text{estimate} ] - \text{truth} $$

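The simulation suggests that the bias of $\hat p$ is approximately $0$, at least when the true value of $p$ is $0.5$. A simulation can only approximate the expectation, so the simulated bias will not be exactly $0$.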
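
To judge how close to $0$ a simulated bias ought to be, it helps to report a Monte Carlo standard error alongside it. The following is a minimal sketch using only the standard library instead of Symbulate; the helper `simulate_p_hat`, the seed, and the number of repetitions are all choices made here for illustration.

```python
import random
from statistics import mean, stdev

random.seed(425)  # arbitrary seed, only for reproducibility

def simulate_p_hat(p, n_tosses=100):
    """Toss a coin with P(heads) = p n_tosses times; return the proportion of heads."""
    heads = sum(random.random() < p for _ in range(n_tosses))
    return heads / n_tosses

true_p = 0.5
estimates = [simulate_p_hat(true_p) for _ in range(10000)]

# bias = E[estimate] - truth, with E[estimate] approximated by the sample mean
bias_estimate = mean(estimates) - true_p
# Monte Carlo standard error of that sample mean
std_error = stdev(estimates) / len(estimates) ** 0.5
print(bias_estimate, std_error)
```

A simulated bias within a few standard errors of $0$ is consistent with an unbiased estimator, although a simulation at a single value of $p$ cannot prove unbiasedness.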

# The Bias of an Estimator

In general, the bias of an estimator $\hat\theta$ is defined as 

$$ \text{bias}(\theta) = E_\theta[\hat\theta] - \theta. $$

Note that the bias may be different for different values of $\theta$, so it is a function of $\theta$.

## Example 1 Revisited

Let's calculate the bias of $\hat p$ for _all_ values of $p$, not just $p = 0.5$.

Recall that $\hat p = X / 100$, where $X$, the number of heads, is a $\text{Binomial}(n=100, p)$ random variable with $E_p[X] = 100 \cdot p$.

To calculate the bias, we first need to calculate $E_p[\hat p]$:
$$ E_p[\hat p] = E_p[X/100] = \frac{1}{100} E_p[X] = \frac{1}{100} (100 \cdot p) = p. $$

Therefore, the bias of $\hat p$ (as a function of $p$) is 
\begin{align}
\text{bias} &= E_p[\hat p] - p \\
&= p - p \\
&= 0
\end{align}
for all values of $p$.

An estimator with a bias of $0$ for all values of the parameter is said to be **unbiased**.
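
The algebra above can also be checked numerically by summing $\frac{x}{100} \cdot P(X = x)$ over the binomial probabilities for several values of $p$. This is only a sketch using the standard library; the helper name `expected_p_hat` and the grid of $p$ values are arbitrary choices made here.

```python
from math import comb

def expected_p_hat(p, n=100):
    """E_p[X / n] for X ~ Binomial(n, p), summed directly over the pmf."""
    return sum((k / n) * comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n + 1))

# Bias at a few arbitrary values of p; each should be 0 up to rounding error
biases = {p: expected_p_hat(p) - p for p in [0.1, 0.3, 0.5, 0.7, 0.9]}
for p, bias in biases.items():
    print(p, bias)
```

Unlike the simulation, this computation involves no randomness, so any departure from $0$ is floating-point rounding error.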

## Example 2

Let $X_1, \ldots, X_n$ be i.i.d. $\text{Normal}(\mu, \sigma=1)$, where $\mu$ is the parameter to be estimated. We know that the MLE of $\mu$ is 
$$ \hat\mu = \frac{1}{n} \sum_{i=1}^n X_i. $$
Calculate the bias of $\hat\mu$ (as a function of $\mu$). Is it unbiased?

First, we calculate the expected value:
\begin{align}
E_\mu[\hat\mu] &= E_\mu\left[\frac{1}{n} \sum_{i=1}^n X_i \right] \\
&= \frac{1}{n} E_\mu\left[ \sum_{i=1}^n X_i \right] \\
&= \frac{1}{n} \sum_{i=1}^n E_\mu[X_i] & \text{(linearity of expectation)} \\
&= \frac{1}{n} \sum_{i=1}^n \mu \\
&= \frac{1}{n} n \mu \\
&= \mu.
\end{align}

Therefore, the bias is 
$$ \text{bias} = \mu - \mu = 0 $$
for all values of $\mu$, so $\hat\mu$ is unbiased.
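
As in Example 1, the calculation can be sanity-checked by simulation. The sketch below uses only the standard library; the helper `mu_hat`, the sample size $n = 10$, the seed, and the three values of $\mu$ are assumptions chosen for illustration, not part of the example.

```python
import random

random.seed(425)  # arbitrary seed, only for reproducibility

def mu_hat(mu, n=10):
    """Draw n values from Normal(mu, sigma=1) and return their sample mean."""
    return sum(random.gauss(mu, 1) for _ in range(n)) / n

simulated_bias = {}
for mu in [-2.0, 0.0, 3.5]:
    estimates = [mu_hat(mu) for _ in range(5000)]
    # bias = E[estimate] - truth, with the expectation approximated by the mean
    simulated_bias[mu] = sum(estimates) / len(estimates) - mu
print(simulated_bias)
```

Each simulated bias should be close to $0$, whichever value of $\mu$ generated the data.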

## Example 3

Let $X$ be a $\text{NegativeBinomial}(r=1, p)$ random variable, that is, the number of trials up to and including the first success. Suppose we observe $X = 5$. The MLE of $p$ is $\hat p = 1/5$. What is the bias of this estimator?

First, we cannot judge whether $1/5$ is a good estimate of $p$ or not. We can only judge whether the procedure for estimating $p$ is good or not.

The "procedure", or estimator, is $\hat p = 1 / X$. To calculate its bias, we first evaluate its expected value for different values of $p$:
\begin{align}
E_p[\hat p] &= E_p[1 / X] \\
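&= \sum_{x=1}^\infty \frac{1}{x} \cdot (1-p)^{x-1} p \\
&= \frac{p}{1-p} \sum_{x=1}^\infty \frac{(1-p)^x}{x} \\
&= -\frac{p\log p}{1-p} & \text{(since $\sum_{x=1}^\infty q^x / x = -\log(1 - q)$ for $|q| < 1$)}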
\end{align}
So the bias is 
$$ \text{bias} = -\frac{p\log p}{1-p} - p. $$

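This estimator is biased, and its bias depends on the value of $p$. For $0 < p < 1$ the bias is strictly positive, because $E_p[1/X] > 1/E_p[X] = p$ by Jensen's inequality, so $\hat p = 1/X$ tends to overestimate $p$.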
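
The closed form for $E_p[1/X]$ can be checked against a partial sum of the series, and the bias evaluated at a few values of $p$. This is a sketch with the standard library; the helper names `closed_form` and `series`, the cutoff of 5000 terms, and the values of $p$ are choices made here for illustration. Here $\log$ is the natural logarithm.

```python
import math

def closed_form(p):
    """-p log(p) / (1 - p), the value of E_p[1 / X] derived above."""
    return -p * math.log(p) / (1 - p)

def series(p, terms=5000):
    """Partial sum of sum_{x >= 1} (1/x) (1 - p)^(x - 1) p, cut off at `terms`."""
    return sum((1 / x) * (1 - p) ** (x - 1) * p for x in range(1, terms + 1))

for p in [0.05, 0.2, 0.5, 0.9]:
    bias = closed_form(p) - p
    print(p, series(p), closed_form(p), bias)
```

At each of these values of $p$ the partial sum and the closed form should agree to many decimal places, and the bias should come out positive.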