In [1]:
from datascience import *
import numpy as np
from math import *
from scipy import stats
import matplotlib.pyplot as plt
%matplotlib inline

## Lesson 29: Maximum Likelihood Estimation

Last lesson, we studied method of moments estimators. These estimators are obtained by setting the moments of a distribution equal to the sample moments computed from an independent random sample, and then solving for the parameters of interest. As we saw, method of moments estimators are relatively easy to find, but don't always make sense (as in the case of $X\sim \textsf{Unif}(0,b)$, where the estimate $2\bar{X}$ can fall below the largest observation).

Another way to estimate a parameter is to maximize the likelihood function. The likelihood function, $L(\theta \mid \textbf{x})$, treats the observed data $\textbf{x}$ as fixed and is a function of $\theta$; it is larger for values of $\theta$ under which the observed data are more probable. The value of $\theta$ that maximizes this function is the maximum likelihood estimator, $\hat{\theta}_{ML}$.

Let $X_1,X_2,...,X_n$ be a sequence of iid random variables with mass or density function $f(x;\theta)$. The likelihood function is given by:

$$
L(\theta\mid \textbf{x}) = \prod_{i=1}^n f(x_i;\theta)
$$

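Often, it is easier to deal with the log of the likelihood function. This is because the log of a product is the sum of individual logs, which is often analytically "nicer". Because the log is a strictly increasing function, the likelihood and the log-likelihood are maximized at the same value of $\theta$. The log-likelihood function is denoted as $l(\theta \mid \textbf{x})$ and is given by: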

$$
l(\theta\mid\textbf{x})=\log \prod_{i=1}^n f(x_i;\theta) = \sum_{i=1}^n \log f(x_i;\theta)
$$
 

*Class note: **x** refers to the* vector *x.*
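
The two definitions above can be sketched in a few lines of Python. This is only an illustration: the helper names `likelihood`, `log_likelihood`, and `exp_pdf`, the exponential density, the rate of 2, and the sample sizes are all assumptions chosen for the example, not part of the lesson. It also hints at a practical reason to prefer the log: for a large sample the raw product underflows to zero in floating point, while the sum of logs stays finite.

```python
import numpy as np
from scipy import stats

def likelihood(theta, x, pdf):
    """Product of the density evaluated at each observation."""
    return np.prod(pdf(x, theta))

def log_likelihood(theta, x, pdf):
    """Sum of the log-density evaluated at each observation."""
    return np.sum(np.log(pdf(x, theta)))

def exp_pdf(x, rate):
    """Hypothetical example density: exponential with the given rate."""
    return stats.expon.pdf(x, scale=1 / rate)

rng = np.random.default_rng(0)
x_small = rng.exponential(scale=1 / 2.0, size=20)    # assumed true rate of 2
x_large = rng.exponential(scale=1 / 2.0, size=5000)

# Small sample: both quantities are usable and agree after taking the log.
print(likelihood(2.0, x_small, exp_pdf), log_likelihood(2.0, x_small, exp_pdf))

# Large sample: the product is too small for a float, the log-likelihood is not.
print(likelihood(2.0, x_large, exp_pdf), log_likelihood(2.0, x_large, exp_pdf))
```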

### Example 1: Exponential Distribution

Suppose $X_1,X_2,...,X_n$ is an iid sequence of random variables from the exponential distribution with unknown parameter $\lambda$. I would like to obtain $\hat{\lambda}_{ML}$, the maximum likelihood estimate of $\lambda$. 

Recall that if $X\sim \textsf{Exp}(\lambda)$, then $f(x)=\lambda e^{-\lambda x}$. So,

$$
L(\lambda\mid \textbf{x}) = \prod_{i=1}^n f(x_i;\lambda) = \prod_{i=1}^n \lambda e^{-\lambda x_i} = \lambda^n e^{-\lambda \sum x_i}
$$

Maximizing this through differentiation looks difficult. Let's consider the log-likelihood instead: 

$$
l(\lambda\mid \textbf{x}) = n \log \lambda - \lambda \sum x_i
$$

This looks easier. Take the derivative with respect to $\lambda$ and set to 0. Then solve for $\lambda$. I leave this next step to you. How does your answer compare to $\hat{\lambda}_{MoM}$? 

$$
\begin{align}
l(\lambda\mid \textbf{x}) &= n \log \lambda - \lambda \sum x_i\\
\frac{d}{d\lambda}l(\lambda\mid\textbf{x}) &= \frac{d}{d\lambda}(n\log\lambda) - \frac{d}{d\lambda}\left(\lambda\sum x_i\right) = \frac{n}{\lambda} - \sum x_i\\
\text{The maximum of }L\text{ over }&\lambda\text{ is at the same place as the maximum of }l\text{, so}\\
0&=\frac{n}{\hat{\lambda}_{ML}} - \sum x_i\\
\sum x_i &= \frac{n}{\hat{\lambda}_{ML}}\\
\hat{\lambda}_{ML}&=\frac{n}{ \sum x_i } = \frac{1}{ \bar{X} } = \hat{\lambda}_{MoM}
\end{align}
$$

$\frac{d^2}{d\lambda^2}l(\lambda\mid\textbf{x})=\frac{-n}{\lambda^2}<0$ for all $\lambda>0$, so $l$ is concave and its only critical point is a maximum, not a minimum.
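
As a rough numerical check of this result, the sketch below simulates exponential data and compares $n/\sum x_i$ with the value found by minimizing the negative log-likelihood directly. The true rate of 3, the sample size, the seed, and the search bounds are assumptions made for illustration only.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
true_rate = 3.0                      # assumed for illustration
x = rng.exponential(scale=1 / true_rate, size=1000)

def neg_log_lik(lam):
    # Negative of l(lambda | x) = n log(lambda) - lambda * sum(x)
    return -(len(x) * np.log(lam) - lam * np.sum(x))

closed_form = 1 / np.mean(x)         # n / sum(x_i)

# Bounded search over an assumed plausible range of rates.
numeric = minimize_scalar(neg_log_lik, bounds=(1e-6, 50), method="bounded").x

print(closed_form, numeric)
```

The two printed values should agree to several decimal places, and both should be near the rate used to simulate the data.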

### Example 2: Uniform Distribution

Suppose $X_1,X_2,...,X_n$ is an iid sequence of random variables from the continuous uniform distribution on $0 \leq X \leq b$ with unknown parameter $b$. I would like to obtain $\hat{b}_{ML}$, the maximum likelihood estimate of $b$. 

This one is trickier since the domain of $X$ depends on the parameter we are trying to estimate. So I will start you off with a hint. The pdf of $X$ is $f(x)=\frac{1}{b}$ where $0\leq x \leq b$ and 0 otherwise. Another way to write this is with indicator functions:

$$
f(x)={1\over b}I(x\leq b)
$$

where $I(x\leq b)$ is equal to 1 if $x \leq b$ and 0 otherwise. (The lower bound $x \geq 0$ holds for every observation, so it is left out of the indicator.)

$$
\begin{align}
L(b\mid \textbf{x})&=\prod_{i=1}^n f(x_i;b)\\
&=\prod_{i=1}^n \left(\frac{1}{b} I(x_i\leq b) \right)\\
&=\left(\prod_{i=1}^n \frac{1}{b}\right)\left(\prod_{i=1}^n I(x_i\leq b) \right)\\
&=\left(\frac{1}{b}\right)^n I(x_1\leq b \text{ and } x_2\leq b \text{ and...and } x_n\leq b)\\
L(b\mid \textbf{x})&=(b^{-n}) I(\text{max}(\textbf{x})\leq b)\\
\frac{d}{db}L(b\mid \textbf{x})&=\frac{d}{db} (b^{-n}) = -n b^{-n-1} \text{ where }b>\text{max}(\textbf{x})
\end{align}
$$

*Since $\frac{-n}{b^{n+1}}<0$ for all $b>0$, the derivative is never 0, so the likelihood function has no relative maximum; it is strictly decreasing for $b\geq\text{max}(\textbf{x})$ and equal to 0 for $b<\text{max}(\textbf{x})$. Therefore, the maximum likelihood must be at an endpoint of the region $b\geq\text{max}(\textbf{x})$:*

$$
\begin{align}
\lim_{b\rightarrow\infty} L(b\mid \textbf{x})&=0\\
L(\text{max}(\textbf{x})\mid\textbf{x})&=\frac{1}{\text{max}(\textbf{x})^n}>0
\end{align}
$$

*Therefore the maximum likelihood estimate is $\hat{b}_{ML}=\text{max}(\textbf{x})$.*
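
The shape of this likelihood can be checked numerically. The sketch below evaluates $b^{-n}I(\text{max}(\textbf{x})\leq b)$ on a grid of candidate values and picks the largest; the true value $b=5$, the sample size, the seed, the grid, and the helper name `uniform_likelihood` are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(2)
true_b = 5.0                          # assumed for illustration
x = rng.uniform(0, true_b, size=50)

def uniform_likelihood(b, x):
    # b^{-n} when every observation is <= b, and 0 otherwise
    return b ** (-len(x)) if np.max(x) <= b else 0.0

# Candidate values of b on both sides of the sample maximum.
b_grid = np.linspace(0.5 * np.max(x), 2 * np.max(x), 2001)
lik = np.array([uniform_likelihood(b, x) for b in b_grid])
b_best = b_grid[np.argmax(lik)]

print(np.max(x), b_best)
```

Because the grid need not contain the sample maximum exactly, `b_best` is the first grid point at or above `np.max(x)`; a finer grid moves it closer.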


### Example 3: Binomial Distribution

Suppose $X_1,X_2,...,X_n$ is an iid sequence of random variables with the binomial distribution with 20 trials and unknown probability of success $\pi$. Find the maximum likelihood estimate of $\pi$. 

$$
\begin{align}
L(\pi\mid \textbf{x})&=\prod_{i=1}^n f(x_i;20,\pi)\\
&=\prod_{i=1}^n {20\choose x_i} \pi^{x_i} (1-\pi)^{20-x_i}\\
&=\prod_{i=1}^n \frac{20!}{(20-x_i)! x_i!} \pi^{x_i} (1-\pi)^{20-x_i}\\
L(\pi\mid \textbf{x})&= (20!)^n \left(\prod_{i=1}^n\frac{1}{(20-x_i)! x_i!}\right) \pi^{\left(\sum_{i=1}^n x_i\right)} (1-\pi)^{\left(\sum_{i=1}^n (20-x_i)\right)}\\
l(\pi\mid \textbf{x})&=n\log(20!) + \sum_{i=1}^n \log\left(\frac{1}{(20-x_i)! x_i!}\right) + \log\pi\sum_{i=1}^n x_i + \log(1-\pi)\sum_{i=1}^n (20-x_i)\\
l(\pi\mid \textbf{x})&=n\log(20!) + \sum_{i=1}^n \log\left(\frac{1}{(20-x_i)! x_i!}\right) + n\bar{X}\log\pi + (20n - n\bar{X}) \log(1-\pi)\\
\frac{d}{d\pi}l(\pi\mid \textbf{x})&= 0 + 0 + \frac{n\bar{X}}{\pi} - \frac{20n - n\bar{X}}{1-\pi}\\
0&=\frac{n\bar{X}(1-\hat{\pi}_{ML})-(20n - n\bar{X})\hat{\pi}_{ML}}{\hat{\pi}_{ML}(1-\hat{\pi}_{ML})}\\
0&=\bar{X}-20\hat{\pi}_{ML}\\
\hat{\pi}_{ML}&=\frac{\bar{X}}{20}
\end{align}
$$
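
A quick numerical sanity check for this example: since $\sum_{i=1}^n (20-x_i) = 20n - n\bar{X}$, setting the derivative of the log-likelihood to zero gives $\bar{X}/20$, and the sketch below compares that value with a direct numerical maximizer of the binomial log-likelihood. The success probability of 0.3, the sample size, the seed, and the search bounds are assumptions chosen for illustration.

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(3)
true_pi = 0.3                         # assumed for illustration
x = rng.binomial(n=20, p=true_pi, size=200)

def neg_log_lik(p):
    # Negative sum of log binomial pmf values with 20 trials each
    return -np.sum(stats.binom.logpmf(x, 20, p))

# Search the open interval (0, 1), avoiding the endpoints where the log is infinite.
numeric = minimize_scalar(neg_log_lik, bounds=(1e-6, 1 - 1e-6), method="bounded").x
closed_form = np.mean(x) / 20

print(closed_form, numeric)
```

An estimate of a probability should land between 0 and 1, which is a useful check on any closed form derived by hand.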