<a href="https://colab.research.google.com/github/dlsun/Stat305-S20/blob/master/colabs/notebooks/STAT_305_Notebook_8_Maximum_Likelihood_for_Continuous_Random_Variables.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

I encourage you to work through this notebook with a partner so that you can discuss your answers. You should meet over an application such as Discord or Zoom. One person can share their screen with this notebook open.

In [None]:
!pip install -q symbulate
from symbulate import *

import matplotlib.pyplot as plt

# Maximum Likelihood for Continuous Random Variables

So far, we have defined the likelihood for _discrete_ data as the p.m.f., viewed as a function of the parameter:

$$ L_X(\theta) = f_\theta(X). $$

Note that $X$ here may denote multiple observations, in which case $f_\theta$ is the joint p.m.f.

But what if the data $X$ is continuous?

For continuous data, we simply define the likelihood as the p.d.f., viewed as a function of the parameter:

$$ L_X(\theta) = f_\theta(X). $$

Notice that our notation does not distinguish between p.m.f.s and p.d.f.s. In terms of maximum likelihood estimation, we handle p.m.f.s and p.d.f.s in _exactly the same_ way.
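As a minimal sketch of what "the p.d.f., viewed as a function of the parameter" means, the block below fixes one observation and evaluates a normal density at several candidate means. The observation `x = 2.0`, the known `sigma = 1.0`, and the candidate list are illustrative assumptions, not values from this notebook. `NormalDist` comes from the Python standard library.

```python
from statistics import NormalDist

x = 2.0        # a single hypothetical observation (assumed for illustration)
sigma = 1.0    # standard deviation, assumed known

def likelihood(mu):
    """Normal(mu, sigma) density evaluated at the fixed observation x."""
    return NormalDist(mu, sigma).pdf(x)

candidates = [0.0, 1.0, 2.0, 3.0]   # hypothetical candidate values of mu
values = {mu: likelihood(mu) for mu in candidates}
best_mu = max(candidates, key=likelihood)

for mu, val in values.items():
    print(f"mu = {mu}: likelihood = {val:.4f}")
print("candidate with the largest likelihood:", best_mu)
```

The data stay fixed while the parameter varies, which is the same recipe used for a p.m.f.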

# Example 3 Revisited

The true weight of NB10 is an unknown number $\mu$. It is very close to 10 grams (between 9.999 and 10.000 grams), so we will report all of our measurements in _micrograms below 10 grams_.

The 100 measurements of the weight of NB10 produced the following data (in micrograms below 10 grams).

In [None]:
data = [409,400,406,399,402,406,401,403,401,403,398,403,407,402,401,399,400,401,405,402,408,399,399,402,399,397,407,401,399,401,403,400,410,401,407,423,406,406,402,405,405,409,399,402,407,406,413,409,404,402,404,406,407,405,411,410,410,410,401,402,404,405,392,407,406,404,403,408,404,407,412,406,409,400,408,404,401,404,408,406,408,406,401,412,393,437,418,415,404,401,401,407,412,375,409,406,398,406,403,404]

We assumed that our data $X_1, ..., X_{100}$ were i.i.d. random variables from a $\text{Normal}(\mu, \sigma)$ distribution. How would we estimate $\mu$?

Although we do not know $\sigma$, let's pretend that it is known. This would be a problem if the MLE of $\mu$ turned out to depend on $\sigma$, but as we will see, it does not.

The likelihood is just the joint p.d.f. of the 100 observations, which we simplify as much as possible:

\begin{align*}
L_X(\mu) &= f_\mu(409, 400, ..., 404) \\
&= \frac{1}{\sigma\sqrt{2\pi}} e^{-(409 - \mu)^2/2\sigma^2} \cdot \frac{1}{\sigma\sqrt{2\pi}} e^{-(400 - \mu)^2/2\sigma^2} \cdot ... \cdot \frac{1}{\sigma\sqrt{2\pi}} e^{-(404 - \mu)^2/2\sigma^2} \\
&= \frac{1}{\sigma^{100} (2\pi)^{50}} e^{-((409 - \mu)^2 + (400 - \mu)^2 + ... + (404 - \mu)^2) / 2\sigma^2}.
\end{align*}
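The simplification above can be checked numerically. This sketch uses only the first three measurements and assumes `sigma = 5` and a trial value `mu = 404`; both numbers are arbitrary choices for illustration.

```python
import math
from statistics import NormalDist

sample = [409, 400, 406]   # first three NB10 measurements
mu, sigma = 404.0, 5.0     # trial values, assumed for illustration
n = len(sample)

# Product of the individual normal densities (second line of the derivation).
product_form = math.prod(NormalDist(mu, sigma).pdf(x) for x in sample)

# Simplified form: exp(-sum of squares / (2 sigma^2)) / (sigma^n (2 pi)^(n/2)).
sum_sq = sum((x - mu) ** 2 for x in sample)
closed_form = math.exp(-sum_sq / (2 * sigma ** 2)) / (
    sigma ** n * (2 * math.pi) ** (n / 2)
)

print(product_form, closed_form)   # the two values should agree
```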

To make this easier to optimize, we calculate the log-likelihood.

$$ \ell(\mu) = \log L_X(\mu) = \log \frac{1}{\sigma^{100} (2\pi)^{50}} - \frac{1}{2\sigma^2} ((409 - \mu)^2 + (400 - \mu)^2 + ... + (404 - \mu)^2). $$
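Working on the log scale also helps numerically. The sketch below evaluates the formula above and, for comparison, the raw product of the 100 densities. It assumes $\sigma = 5$ and uses $\mu = 350$, a value far from the data chosen only for illustration. The function names are hypothetical helpers, not part of Symbulate.

```python
import math

data = [409,400,406,399,402,406,401,403,401,403,398,403,407,402,401,399,400,401,405,402,408,399,399,402,399,397,407,401,399,401,403,400,410,401,407,423,406,406,402,405,405,409,399,402,407,406,413,409,404,402,404,406,407,405,411,410,410,410,401,402,404,405,392,407,406,404,403,408,404,407,412,406,409,400,408,404,401,404,408,406,408,406,401,412,393,437,418,415,404,401,401,407,412,375,409,406,398,406,403,404]
sigma = 5   # assumed known for this illustration

def log_likelihood(mu):
    """Log-likelihood of mu for Normal(mu, sigma) data, from the formula above."""
    n = len(data)
    constant = -(n * math.log(sigma) + (n / 2) * math.log(2 * math.pi))
    return constant - sum((x - mu) ** 2 for x in data) / (2 * sigma ** 2)

def likelihood_as_product(mu):
    """Raw likelihood: the product of the individual normal densities."""
    result = 1.0
    for x in data:
        density = math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
        result *= density
    return result

mu = 350   # far from the data, chosen for illustration
print(log_likelihood(mu))          # a finite, large negative number
print(likelihood_as_product(mu))   # underflows to 0.0 in double precision
```

The product is too small for a floating-point number to represent, while the sum of logs is computed without trouble.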

Let's graph the log-likelihood function for this data, assuming $\sigma = 5$. Notice that the log-likelihood is in the negative hundreds at its peak and in the negative thousands over most of this range, so the likelihood is on the order of $e^{-1000}$ across much of the plot, which is an unbelievably small number.


In [None]:
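import numpy as np

# Candidate values of mu and the assumed (known) standard deviation.
mus = np.arange(350, 450)
sigma = 5

# Log-likelihood: constant term minus the scaled sum of squared deviations.
log_lik = [-(100 * np.log(sigma) + 50 * np.log(2 * np.pi))
           - sum((x - mu) ** 2 for x in data) / (2 * sigma ** 2)
           for mu in mus]

plt.plot(mus, log_lik, "-")
plt.xlabel("mu")
plt.ylabel("log-likelihood")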

At this point, we could start taking derivatives, but we can simplify further first. The first term does not depend on $\mu$, so it does not affect where the log-likelihood is maximized. The factor $\frac{1}{2\sigma^2}$ is a positive constant, so it also does not affect where the log-likelihood is maximized, but the negative sign in front of it matters.

So to calculate the MLE, we only need to maximize a part of that expression.

$$ \hat\mu = \underset{\mu}{\arg\max} -((409 - \mu)^2 + (400 - \mu)^2 + ... + (404 - \mu)^2). $$

Since we are maximizing a negative, we can equivalently _minimize_ the positive expression.

$$ \hat\mu = \underset{\mu}{\arg\min}\  (409 - \mu)^2 + (400 - \mu)^2 + ... + (404 - \mu)^2. $$

We already know the value of $\mu$ that minimizes this expression. In Question 1 from Notebook 3 on "Estimating the Variance", you showed that the value of $\mu$ that minimizes the sum of squared distances to a set of numbers is the (sample) mean of those numbers. So the MLE is

$$ \hat\mu = \frac{409 + 400 + ... + 404}{100} = 404.59. $$
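As a rough numerical check of this result, the sketch below searches a grid of candidate values of $\mu$ and compares three answers: the minimizer of the sum of squares, and the maximizer of the full log-likelihood for two different assumed values of $\sigma$. The grid (400.00 to 410.00 in steps of 0.01) and the choices $\sigma = 5$ and $\sigma = 50$ are assumptions made for illustration.

```python
import math

data = [409,400,406,399,402,406,401,403,401,403,398,403,407,402,401,399,400,401,405,402,408,399,399,402,399,397,407,401,399,401,403,400,410,401,407,423,406,406,402,405,405,409,399,402,407,406,413,409,404,402,404,406,407,405,411,410,410,410,401,402,404,405,392,407,406,404,403,408,404,407,412,406,409,400,408,404,401,404,408,406,408,406,401,412,393,437,418,415,404,401,401,407,412,375,409,406,398,406,403,404]

def sum_sq(mu):
    """Sum of squared distances from mu to the measurements."""
    return sum((x - mu) ** 2 for x in data)

def log_likelihood(mu, sigma):
    """Full log-likelihood, including the terms that do not depend on mu."""
    n = len(data)
    constant = -(n * math.log(sigma) + (n / 2) * math.log(2 * math.pi))
    return constant - sum_sq(mu) / (2 * sigma ** 2)

# Candidate values 400.00, 400.01, ..., 410.00 (an assumed search range).
grid = [400 + 0.01 * k for k in range(1001)]

argmin_ss = min(grid, key=sum_sq)
argmax_ll_5 = max(grid, key=lambda mu: log_likelihood(mu, 5))
argmax_ll_50 = max(grid, key=lambda mu: log_likelihood(mu, 50))
sample_mean = sum(data) / len(data)

print(argmin_ss, argmax_ll_5, argmax_ll_50, sample_mean)
```

All three grid searches land on the same grid point, the one closest to the sample mean, which illustrates that the value of $\sigma$ does not change the maximizer.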

**Question 1.** Generalize the calculation above, and derive an expression for the MLE of $\mu$, in terms of $X_1, ..., X_n$.

_YOUR ANSWER HERE_

**Question 2.** Now let's calculate the MLE of $\sigma$. We just showed that no matter the value of $\sigma$, the likelihood is maximized at $\mu = \bar X = \frac{1}{n} \sum_{i=1}^n X_i$, so plug this in for $\mu$ and maximize the resulting expression over $\sigma$. Does the result correspond to the biased or the unbiased estimator of $\sigma^2$?

_YOUR ANSWER HERE_

# Example 6: Two Gauges

You have two pressure gauges for checking tire pressure. One gauge is more accurate than the other. The more accurate gauge produces readings (in psi) from a $\text{Normal}(\mu, \sigma=1)$ distribution, where $\mu$ is the true tire pressure. The less accurate gauge produces readings (in psi) from a $\text{Normal}(\mu, \sigma=2)$ distribution.

You take one measurement $X$ from the more accurate gauge and an independent measurement $Y$ from the less accurate one. $X = 27$, while $Y = 29$. How should we use these measurements to estimate the true tire pressure? Should we throw away the reading from the less accurate gauge?

**Question 3.** Write down the likelihood of $\mu$ for this data. (The likelihood is just the joint p.d.f. of $X$ and $Y$.) Calculate the MLE of $\mu$.



_YOUR ANSWER HERE_

**Question 4.** Calculate the MSE of the MLE for estimating $\mu$. How does it compare with the MSE of just using $X$?

_YOUR ANSWER HERE_
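One way to sanity-check an answer to Question 4 is simulation. The sketch below estimates the MSE of any estimator that takes the two readings as arguments. Only the baseline "use $X$ alone" is filled in. The true pressure `true_mu = 28.0`, the seed, and the number of simulations are hypothetical choices made for illustration.

```python
import random

random.seed(0)     # fixed seed so the run is reproducible
true_mu = 28.0     # hypothetical true pressure; unknown in practice
n_sims = 20000     # number of simulated pairs of readings

def estimator_x_only(x, y):
    """Baseline estimator: ignore the less accurate gauge."""
    return x

def simulated_mse(estimator):
    """Average squared error of estimator(x, y) over simulated gauge readings."""
    total = 0.0
    for _ in range(n_sims):
        x = random.gauss(true_mu, 1)   # more accurate gauge, sigma = 1
        y = random.gauss(true_mu, 2)   # less accurate gauge, sigma = 2
        total += (estimator(x, y) - true_mu) ** 2
    return total / n_sims

print(simulated_mse(estimator_x_only))   # should be near Var(X) = 1
```

To check your own answer, define a function of the same form that returns your MLE and pass it to `simulated_mse`.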

# Submission Instructions

1. [Go here](https://canvas.calpoly.edu/courses/25458/groups), and add yourself and your partner (if applicable) to one of the STAT 305 Groups.
2. Export this Colab notebook to PDF. The easiest way is File > Print > Save as PDF.
3. Double check that the PDF rendered properly (i.e., nothing is cut off).
4. Upload the PDF [to Canvas](https://canvas.calpoly.edu/courses/25458/assignments/114030). Only one of you needs to upload the PDF.