In [0]:
#@title Imports
!pip install -q "symbulate==0.5.5"
from symbulate import *

import numpy as np  # used by the plotting helpers below
import matplotlib.pyplot as plt
%matplotlib inline

In [0]:
#@title Define Plotting Functions

def plot_discrete_function(f, xlim=(0, 10), xlabel=r"$\theta$", ylabel="Likelihood"):
  xs = np.arange(np.ceil(xlim[0]), np.floor(xlim[1]) + 1, dtype=int)
  ys = [f(x) for x in xs]
  plt.plot(xs, ys, "ko-")
  plt.xlabel(xlabel, fontsize=18)
  plt.ylabel(ylabel, fontsize=18)
  plt.xlim(*xlim)

def plot_continuous_function(f, xlim=(0, 1), xlabel=r"$\theta$", ylabel="Likelihood"):
  xs = np.linspace(xlim[0], xlim[1], 1000)
  ys = [f(x) for x in xs]
  plt.plot(xs, ys, "k-")
  plt.xlabel(xlabel, fontsize=18)
  plt.ylabel(ylabel, fontsize=18)
  plt.xlim(*xlim)

# Maximum Likelihood Principle

_Probability Question:_ Suppose we roll a fair die 20 times. What is the probability that six comes up exactly 5 times?

The number of sixes follows a $\text{Binomial}(n=20, p=1/6)$ distribution, so the p.m.f. is 
\begin{align}
p_{n=20, p=1/6}(x) &= \binom{20}{x} (1/6)^x (1-1/6)^{20-x}.
\end{align}
Plugging $x=5$ into this formula, we see that the probability is
$$p_{n=20, p=1/6}(5) \approx 0.1294.$$
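
As a quick sanity check, the formula above can be evaluated term by term using only the standard library. This is a minimal sketch that assumes Python 3.8+ (for `math.comb`); the variable names are chosen here for illustration.

```python
from math import comb

# Binomial(n=20, p=1/6) p.m.f. evaluated at x = 5, written out term by term.
n, p, x = 20, 1 / 6, 5
prob = comb(n, x) * p**x * (1 - p) ** (n - x)
print(round(prob, 4))
```

The Symbulate cell that follows should report the same value.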

In [0]:
Binomial(n=20, p=1 / 6).plot()
Binomial(n=20, p=1 / 6).pmf(5)

_Statistics Question:_ Suppose we roll a _loaded_ die 20 times, and six comes up 5 times. What is our estimate of $p$, the probability of a six?

Let's plug everything we know into the binomial p.m.f.:
$$ p_{n=20, p}(5) = \binom{20}{5} p^5 (1-p)^{20-5}. $$

Although we are working with the same binomial p.m.f. as before, our perspective has shifted. Before, we were thinking of the probabilities as a function of $x$; now, we are thinking of the probabilities as a function of $p$. We introduce new terminology for this new perspective.

The **likelihood** $L$ is the probability as a function of the parameter (e.g., $p, \mu, N$), which we denote by $\theta$ in general:
$$ L_{x}(\theta) \overset{\text{def}}{=} p_{\theta}(x). $$

Run the cell below to produce a plot that emphasizes the shift in perspective in going from the p.m.f. $p_\theta$ to the likelihood $L_x$.

In [0]:
#@title From PMF to Likelihood
ps = [1/6, 0.2, 0.23, 0.29] #@param {type:"raw"}

fig, axes = plt.subplots(1, 2, figsize=(12, 6))
ylim = (0, 0.25)
axes[0].set_ylim(ylim)

# draw plots
for p in ps:
  Binomial(n=20, p=p).plot(ax=axes[0])
  axes[1].plot([p], Binomial(n=20, p=p).pmf(5), 'o')
  
# make the right plot match the left plot
axes[1].set_xlim(0, 1)
axes[1].set_ylim(*axes[0].get_ylim())
axes[1].spines["bottom"].set_position("zero")
  
# draw vertical strip at x=5
axes[0].vlines(5, ylim[0], ylim[1], alpha=.3, linewidth=4)
  
# set legend
axes[0].legend(["$p=%s$" % p for p in ps])
axes[1].legend(["$p=%s$" % p for p in ps])

# set labels
axes[0].set_xlabel("$x$", fontsize=18)
axes[0].set_ylabel("Probability", fontsize=18)
axes[1].set_xlabel("$p$", fontsize=18)
axes[1].set_ylabel("Likelihood", fontsize=18);
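
Each colored point in the right panel is just the binomial p.m.f. evaluated at $x=5$ for one value of $p$. The sketch below computes those heights without Symbulate, assuming Python 3.8+ for `math.comb`; the helper name `likelihood_at_5` is hypothetical.

```python
from math import comb

def likelihood_at_5(p, n=20):
    """Binomial p.m.f. at x = 5, viewed as a function of p."""
    return comb(n, 5) * p**5 * (1 - p) ** (n - 5)

# Same candidate values of p as in the plot above.
ps = [1 / 6, 0.2, 0.23, 0.29]
values = {p: likelihood_at_5(p) for p in ps}
for p, value in values.items():
    print(f"p = {p:.4f}  ->  L(p) = {value:.4f}")

# Which of the four candidates makes the observed data most probable?
best_p = max(values, key=values.get)
print("largest likelihood among candidates at p =", best_p)
```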

Now let's plot the entire likelihood function when $X=5$, i.e., $L_{X=5}(p)$.

In [0]:
def likelihood(p):
  return Binomial(n=20, p=p).pmf(5)

plot_continuous_function(likelihood, xlim=(0, 1),
                         xlabel="$p$", ylabel="Likelihood when $X=5$")

# Add the colored points from before.
for p in ps:
  plt.plot([p], Binomial(n=20, p=p).pmf(5), 'o')

How do we use the likelihood to come up with an estimate of $p$? We can choose the value of $p$ that maximizes $L(p)$. This is called the **maximum likelihood estimator**. The maximum likelihood estimate (MLE) when $X = 5$ turns out to be 
$$ \hat p_{\text{MLE}} = 0.25, $$
which agrees with our intuition that the answer should be $5 / 20$.
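
One rough way to locate this maximum numerically is a grid search: evaluate the likelihood on a fine grid of $p$ values and keep the largest. The sketch below assumes a grid spacing of 0.001, which is an arbitrary choice, and uses only the standard library; the helper name is hypothetical.

```python
from math import comb

def likelihood_when_x_is_5(p):
    """L(p) = C(20, 5) p^5 (1 - p)^15 for the observed X = 5."""
    return comb(20, 5) * p**5 * (1 - p) ** 15

# Candidate values 0.000, 0.001, ..., 1.000.
grid = [i / 1000 for i in range(1001)]

# The grid point with the largest likelihood approximates the MLE.
p_hat = max(grid, key=likelihood_when_x_is_5)
print(p_hat)
```

A grid search only finds the maximum up to the grid spacing, so it complements the plot rather than replacing an exact derivation.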

What if we had observed $X = 8$ sixes instead? What would the maximum likelihood estimate be then?

In [0]:
# YOUR CODE HERE

## Maximum Likelihood by Calculus

One way to obtain MLEs is to plot the likelihood and eyeball where the maximum occurs. A better way is to use the trick you learned in calculus for maximizing and minimizing functions: take the derivative, set it equal to zero, and solve the equation.

That is, to find the $p$ that maximizes
$$ L_{X=5}(p) = \binom{20}{5} p^5 (1-p)^{15}, $$
we first take the derivative (don't forget the [product rule](http://tutorial.math.lamar.edu/Classes/CalcI/ProductQuotientRule.aspx) and remember that $p$ is the variable here):
$$ L'(p) = \binom{20}{5} \left( 5 p^4 \cdot (1 - p)^{15} - p^5 \cdot 15 (1 - p)^{14} \right). $$

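Now, we know that the $p$ that maximizes this function (the MLE $\hat p$) satisfies $L'(\hat p) = 0$. Solving this equation allows us to obtain the _exact_ value of $\hat p$. (The endpoints $p = 0$ and $p = 1$ give $L = 0$, so we may assume $0 < \hat p < 1$; this lets us divide both sides by $\hat p^4 (1 - \hat p)^{14}$ to reach the third line below.)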
\begin{align}
\binom{20}{5} \left( 5 \hat p^4 \cdot (1 - \hat p)^{15} - \hat p^5 \cdot 15 (1 - \hat p)^{14} \right) &= 0 \\
5 \hat p^4 \cdot (1 - \hat p)^{15} &= \hat p^5 \cdot 15 (1 - \hat p)^{14} \\
5 (1 - \hat p) &= 15 \hat p \\
\hat p &= \frac{5}{20} = 0.25.
\end{align}

(Technically, we should also check that $L''(\hat p) < 0$, so that we are sure that this is a maximum, rather than a minimum.)
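
A quick numerical check of this point is sketched below. Instead of computing $L''$, it uses the first derivative test: if $L'$ is positive just to the left of $\hat p$ and negative just to the right, then $\hat p$ is a local maximum. The step size of 0.01 and the function name `dL` are choices made here for illustration.

```python
from math import comb

def dL(p):
    """L'(p) for L(p) = C(20, 5) p^5 (1 - p)^15, from the product rule."""
    return comb(20, 5) * (5 * p**4 * (1 - p) ** 15 - p**5 * 15 * (1 - p) ** 14)

p_hat = 0.25
slope_left = dL(p_hat - 0.01)   # slope just below the candidate
slope_at = dL(p_hat)            # slope at the candidate (should be ~0)
slope_right = dL(p_hat + 0.01)  # slope just above the candidate

print(slope_left, slope_at, slope_right)
```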

# Maximum Likelihood for a Discrete Parameter

A professor would like to know how many students in her 30-person class have started studying for next week's exam. She asks 7 students at random, and 5 say they have started studying. How should she estimate the number of students in her 30-person class who have started studying?

The number of people in the sample who have started studying is a $\text{Hypergeometric}(n=7, N_1, N_0=30 - N_1)$ random variable. Let us first plot the p.m.f. for different values of $N_1$.

In [0]:
#@title From PMF to Likelihood
N1s = [12, 16, 20, 24] #@param {type:"raw"}

fig, axes = plt.subplots(1, 2, figsize=(12, 6))
ylim = (0, 0.45)
axes[0].set_ylim(ylim)

# draw plots
for N1 in N1s:
  dist = Hypergeometric(n=7, N1=N1, N0=30 - N1)
  dist.plot(ax=axes[0])
  axes[1].plot([N1], dist.pmf(5), 'o')
  
# make the right plot match the left plot
axes[1].set_xlim(0, 30)
axes[1].set_ylim(*axes[0].get_ylim())
axes[1].spines["bottom"].set_position("zero")
  
# draw vertical strip at x=5
axes[0].vlines(5, ylim[0], ylim[1], alpha=.3, linewidth=4)
  
# set legend
axes[0].legend(["$N_1=%s$" % N1 for N1 in N1s])
axes[1].legend(["$N_1=%s$" % N1 for N1 in N1s])

# set labels
axes[0].set_xlabel("$x$", fontsize=18)
axes[0].set_ylabel("Probability", fontsize=18)
axes[1].set_xlabel("$N_1$", fontsize=18)
axes[1].set_ylabel("Likelihood", fontsize=18);

Here is a general formula for the likelihood in terms of $N_1$:
$$ L_{X=5}(N_1) = p_{N_1}(5) = \frac{\displaystyle\binom{N_1}{5} \binom{30 - N_1}{7 - 5}}{\displaystyle\binom{30}{7}}. $$

Let's plot this likelihood. Remember that $N_1$ must be an integer, so this is a _discrete_ function.

In [0]:
def likelihood(N1):
  return Hypergeometric(n=7, N1=N1, N0=30 - N1).pmf(5)

plot_discrete_function(likelihood, xlim=(5, 30),
                       xlabel="$N_1$", ylabel="Likelihood")

We see that this likelihood is maximized when $N_1 = 22$, so the MLE is $\hat N_1 = 22$. (Note that this is different from the "naive" estimate $5/7 \cdot 30 \approx 21.43$, even when you round to the nearest integer.)
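
Because $N_1$ can only take finitely many values, the maximum can also be found by brute force. The sketch below evaluates the hypergeometric likelihood from the formula above for every $N_1$ from 0 to 30, assuming Python 3.8+ (where `math.comb(n, k)` returns 0 when $k > n$, so impossible values of $N_1$ simply get likelihood 0). The function name is hypothetical.

```python
from math import comb

def hypergeom_likelihood(N1, x=5, n=7, N=30):
    """P(X = x) when N1 of the N students have started studying."""
    return comb(N1, x) * comb(N - N1, n - x) / comb(N, n)

# Evaluate the likelihood at every possible value of N1 and keep the best.
candidates = range(0, 31)
N1_hat = max(candidates, key=hypergeom_likelihood)

# The "naive" estimate scales the sample proportion up to the class size.
naive = 5 / 7 * 30

print("MLE:", N1_hat)
print("naive estimate:", naive, "rounds to", round(naive))
```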

## Maximum Likelihood by Algebra (optional)

Is there a way we could have quickly determined where the likelihood was maximized, without evaluating the likelihood at every possible value of $N_1$? Because the likelihood is not defined when $N_1$ is not an integer, we cannot take its derivative. In other words, the calculus approach from the previous section will not work here.

Here is another approach. Consider the ratio of the likelihood at successive values of $N_1$, $L(N_1) / L(N_1 - 1)$. By comparing this ratio to $1$, we can determine where $L$ is increasing and where $L$ is decreasing. 

First, let's come up with a simple expression for the ratio.
\begin{align}
\frac{L(N_1)}{L(N_1 - 1)} &= \frac{\binom{N_1}{5} \binom{30 - N_1}{2} \big/ \binom{30}{7}}{\binom{N_1 - 1}{5} \binom{30 - (N_1 - 1)}{2} \big/ \binom{30}{7}} \\
&= \frac{\frac{N_1!}{5! (N_1 - 5)!} \cdot \frac{(30 - N_1)!}{2! (28 - N_1)!}}{\frac{(N_1 - 1)!}{5! (N_1 - 6)!} \cdot \frac{(31 - N_1)!}{2! (29 - N_1)!}} \\
&= \frac{N_1 (29 - N_1)}{(N_1 - 5)(31 - N_1)}
\end{align}

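For $6 \le N_1 \le 28$ (the values where both likelihoods are positive), the denominator is positive, so this ratio is $> 1$ if and only if the numerator is greater than the denominator.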
\begin{align} 
N_1 (29 - N_1) &> (N_1 - 5)(31 - N_1) \\
29 N_1 - N_1^2 &> - N_1^2 + 36 N_1 - 155 \\
155 &> 7 N_1 \\
N_1 &< 155/7 \approx 22.14.
\end{align}
The likelihood therefore keeps increasing up to $N_1 = 22$ and decreases afterward, so the maximum is achieved at $\hat N_1 = 22$.
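
The algebra above can be double-checked with exact rational arithmetic. The sketch below compares the simplified ratio against the ratio computed directly from the binomial coefficients, then lists the values of $N_1$ where the ratio exceeds 1. It uses only the standard library (`fractions.Fraction` and `math.comb`); the function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def L(N1):
    """Exact hypergeometric likelihood of X = 5 in a sample of 7 from 30."""
    return Fraction(comb(N1, 5) * comb(30 - N1, 2), comb(30, 7))

def ratio_formula(N1):
    """Simplified expression for L(N1) / L(N1 - 1)."""
    return Fraction(N1 * (29 - N1), (N1 - 5) * (31 - N1))

# Values of N1 for which both L(N1) and L(N1 - 1) are positive.
valid = range(6, 29)

# The simplified formula should agree exactly with the direct ratio.
formula_matches = all(L(N1) / L(N1 - 1) == ratio_formula(N1) for N1 in valid)

# L is increasing at N1 exactly when the ratio exceeds 1.
increasing = [N1 for N1 in valid if ratio_formula(N1) > 1]

print("formula matches direct ratio:", formula_matches)
print("last N1 where L still increases:", max(increasing))
```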