
In [0]:
#@title Import Symbulate
!pip install -q symbulate
from symbulate import *

In [0]:
#@title Define Plotting Functions
import matplotlib.pyplot as plt
import numpy as np  # explicit import; np.linspace is used below

def plot_continuous_function(f, xlim=(0, 1),
                             xlabel=r"$\theta$",
                             ylabel=r"Log Likelihood $\ell$"):
  xs = np.linspace(xlim[0], xlim[1], 1000)
  ys = [f(x) for x in xs]
  plt.plot(xs, ys, "-")
  plt.xlabel(xlabel, fontsize=18)
  plt.ylabel(ylabel, fontsize=18)
  plt.xlim(*xlim)

# The Fisher Information

$
\def\Var{\text{Var}}
\def\Cov{\text{Cov}}
$

In [0]:
# Generate some data
X = RV(Normal(0, 1) ** 10)
x = X.draw()
x

In [0]:
#@title Plot Log Likelihoods

# Define the log-likelihood of all 10 observations
def log_likelihood(mu):
  n = len(x)
  return -n * log(sqrt(2 * pi)) - sum((x - mu) ** 2 / 2)

# Define the log-likelihood of just the first 2 observations
def log_likelihood_2(mu):
  n = 2
  return -n * log(sqrt(2 * pi)) - sum((x[:2] - mu) ** 2 / 2)

# Plot the two log-likelihoods
plot_continuous_function(log_likelihood, xlim=(-3, 3))
plot_continuous_function(log_likelihood_2, xlim=(-3, 3))

plt.legend(["$n = 10$", "$n = 2$"])

How would you compare the log-likelihood based on $n=10$ observations to the log-likelihood based on $n=2$ observations? (The legend identifies each curve; the exact colors depend on your matplotlib version.)

The log-likelihood with more data is more "curved": it falls off more steeply on either side of its maximum. The more curved the log-likelihood is, the more precise the MLE is.

In mathematics, curvature is measured using the second derivative. That is, $\ell''$ contains information about the precision of our estimator.
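To make the link between curvature and $\ell''$ concrete, here is a minimal sketch that approximates the second derivative of a normal log-likelihood with a central finite difference. It uses only the standard library rather than Symbulate, and the names `normal_log_likelihood`, `second_derivative`, and `sample`, as well as the seed and step size, are illustrative choices rather than part of the notebook above.

```python
import math
import random

def normal_log_likelihood(mu, data):
    """Log-likelihood of i.i.d. Normal(mu, 1) observations."""
    n = len(data)
    return -n * math.log(math.sqrt(2 * math.pi)) - sum((xi - mu) ** 2 / 2 for xi in data)

def second_derivative(f, theta, h=1e-3):
    """Central finite-difference approximation to f''(theta)."""
    return (f(theta + h) - 2 * f(theta) + f(theta - h)) / h ** 2

rng = random.Random(0)  # arbitrary fixed seed, for reproducibility only
sample = [rng.gauss(0, 1) for _ in range(10)]

# Curvature at mu = 0 using all 10 observations, then only the first 2
curv_10 = second_derivative(lambda mu: normal_log_likelihood(mu, sample), 0.0)
curv_2 = second_derivative(lambda mu: normal_log_likelihood(mu, sample[:2]), 0.0)

print(curv_10, curv_2)  # close to -10 and -2, up to floating-point error
```

Because this log-likelihood is quadratic in $\mu$, the second derivative is the same at every $\mu$ and depends only on the number of observations, not on their values.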

## Fisher Information

The **Fisher information** measures the amount of information that the data $X$ contain about the parameter $\theta$. It is defined as 

$$ I(\theta) \overset{\text{def}}{=} E[-\ell''(\theta)]. $$

Note the negative sign. This is because the log-likelihood $\ell$ is usually concave down, so $\ell''$ is negative. To make the measure of curvature positive, we tack on a negative sign in the definition. The expectation is taken over the data $X$, since $\ell''(\theta)$ depends on $X$.
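As a quick worked case, consider the model behind the simulated data at the top of this notebook: $n$ i.i.d. $\text{Normal}(\mu, 1)$ observations $x_1, \ldots, x_n$ (the unit variance is the assumption built into the plotting code). Differentiating the log-likelihood twice gives
\begin{align}
\ell(\mu) &= -n \log\sqrt{2\pi} - \sum_{i=1}^n \frac{(x_i - \mu)^2}{2} \\
\ell'(\mu) &= \sum_{i=1}^n (x_i - \mu) \\
\ell''(\mu) &= -n,
\end{align}
so $I(\mu) = E[-\ell''(\mu)] = n$. Each observation contributes one unit of information, which is consistent with the $n = 10$ curve being more curved than the $n = 2$ curve.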

## Cramer-Rao Inequality (a.k.a. Information Inequality)

We have said that among unbiased estimators, we prefer the one with the smallest variance. The **Cramer-Rao Inequality** provides a lower bound on the variance of _any_ unbiased estimator.

Let $T(X)$ be any unbiased estimator of $\theta$. Then, under mild regularity conditions on the model, 
$$ \Var_\theta[T(X)] \geq \frac{1}{I(\theta)}. $$

We will prove this theorem next time. Today, we will focus on how to apply this theorem.

If we can find an unbiased estimator that has $\frac{1}{I(\theta)}$ as its variance, then we can be certain that we have found the best possible unbiased estimator. Such an estimator is called **efficient**.
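The following simulation is a rough sketch of what efficiency looks like in practice. It assumes $n$ i.i.d. $\text{Normal}(\mu, 1)$ observations, for which $\ell''(\mu) = -n$ and hence $I(\mu) = n$, and compares two unbiased estimators of $\mu$: the sample mean and the sample median. The sample size, number of replications, and seed are arbitrary illustrative values, and the code uses the standard library rather than Symbulate.

```python
import random
import statistics

rng = random.Random(425)  # arbitrary fixed seed, for reproducibility only
n_obs, n_reps, mu_true = 10, 20000, 0.0  # illustrative values

means, medians = [], []
for _ in range(n_reps):
    draws = [rng.gauss(mu_true, 1) for _ in range(n_obs)]
    means.append(statistics.fmean(draws))
    medians.append(statistics.median(draws))

# Simulated variances of the two estimators across replications
var_mean = statistics.pvariance(means)
var_median = statistics.pvariance(medians)

# Cramer-Rao bound 1 / I(mu), using I(mu) = n for Normal(mu, 1) data
bound = 1 / n_obs

print(var_mean, var_median, bound)
```

In runs like this, the variance of the sample mean should land close to the bound, while the variance of the sample median should be noticeably larger. Both estimators are unbiased here, but only the mean attains the bound.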

## Example

Let $X$ be $\text{Binomial}(n, p)$. Show that there is no unbiased estimator of $p$ that is better than the MLE $\hat p = X / n$.

The log-likelihood is 
$$ \ell(p) = \log\binom{n}{X} + X \log p + (n - X) \log (1 - p). $$
The score is 
$$ \ell'(p) = \frac{X}{p} - \frac{n-X}{1-p}. $$
Taking another derivative, we get 
$$ \ell''(p) = -\frac{X}{p^2} - \frac{n-X}{(1-p)^2}. $$ 
The Fisher information is 
\begin{align}
I(p) = E[-\ell''(p)] &= \frac{E[X]}{p^2} + \frac{n-E[X]}{(1-p)^2} \\
&= \frac{np}{p^2} + \frac{n-np}{(1-p)^2} \\
&= \frac{n}{p} + \frac{n}{1-p} \\
&= \frac{n}{p(1-p)}.
\end{align}
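The derivation above can be checked numerically. The sketch below computes $E[-\ell''(p)]$ exactly by summing over every possible value of $X$, weighted by the binomial pmf, and compares the result with $\frac{n}{p(1-p)}$. The helper names and the values $n = 20$, $p = 0.3$ are illustrative assumptions, not part of the example.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def neg_second_derivative(k, n, p):
    """-l''(p) when the observed count is k."""
    return k / p ** 2 + (n - k) / (1 - p) ** 2

def fisher_information_exact(n, p):
    """E[-l''(p)], summing over all possible values of X."""
    return sum(neg_second_derivative(k, n, p) * binomial_pmf(k, n, p)
               for k in range(n + 1))

n_trials, p_true = 20, 0.3  # illustrative values
fisher_sum = fisher_information_exact(n_trials, p_true)
fisher_formula = n_trials / (p_true * (1 - p_true))

print(fisher_sum, fisher_formula)  # the two should agree up to rounding
```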

The Cramer-Rao inequality says that no unbiased estimator can have a variance lower than 
$$ \frac{1}{I(p)} = \frac{p(1-p)}{n}. $$

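Notice that $\hat p$ is unbiased, since $E[\hat p] = \frac{E[X]}{n} = \frac{np}{n} = p$, with variance equal to 
$$ \Var[\hat p] = \Var\left[\frac{X}{n}\right] = \frac{1}{n^2} \Var[X] = \frac{1}{n^2} np(1-p) = \frac{p(1-p)}{n}, $$
so it actually achieves the minimum possible variance. In other words, $\hat p$ is efficient.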
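As a sanity check on this conclusion, the sketch below computes the mean and variance of $\hat p = X/n$ exactly from the binomial pmf and compares the variance with the Cramer-Rao bound $\frac{p(1-p)}{n}$. The values $n = 20$ and $p = 0.3$ and the variable names are illustrative assumptions.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

n_trials, p_true = 20, 0.3  # illustrative values
support = range(n_trials + 1)

# Exact mean and variance of p_hat = X / n under Binomial(n, p)
mean_p_hat = sum((k / n_trials) * binomial_pmf(k, n_trials, p_true)
                 for k in support)
var_p_hat = sum((k / n_trials - mean_p_hat) ** 2 * binomial_pmf(k, n_trials, p_true)
                for k in support)

# Cramer-Rao lower bound 1 / I(p)
cramer_rao_bound = p_true * (1 - p_true) / n_trials

print(mean_p_hat, var_p_hat, cramer_rao_bound)
```

The mean should match `p_true` (unbiasedness) and the variance should match the bound, both up to floating-point rounding.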