# Maximum Likelihood

Maximum likelihood is a fundamental method in statistics for estimating the parameters of a statistical model: it selects the parameter values that maximize the likelihood function. The likelihood function measures how well the model fits a sample of data for given values of the unknown parameters.

Maximum likelihood estimation (MLE) plays a key role in statistical inference, and many well-known estimators in statistics can be derived as maximum likelihood estimators. For example, one may be interested in the heights of adult female penguins but be unable to measure every penguin in the population because of cost or time constraints. Assuming that the heights are normally (Gaussian) distributed, the mean and variance of the population can be estimated by MLE from the heights of only a sample of penguins.


## Normal Distribution

The normal distribution, also known as the Gaussian distribution, is a continuous probability distribution that has a bell-shaped probability density function, known as the Gaussian function or informally as the bell curve.

In the case of a single variable, the distribution is specified by two parameters: the mean ($\mu$) and the variance ($\sigma^2$). If a random variable $X$ follows the normal distribution, we write:

$$ X \sim N(\mu,\sigma^2) $$

The probability density of the normal distribution is

$$ f(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{ -\frac{(x-\mu)^2}{2\sigma^2} } $$

where:

- $\mu$ is the mean or expectation of the distribution (and also its median and mode),
- $\sigma$ is the standard deviation, and
- $\sigma^2$ is the variance.
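
To make the density formula concrete, here is a minimal Python sketch that evaluates it directly. The function name `normal_pdf` and its argument names are illustrative choices, not part of any library; note that the third argument is the variance $\sigma^2$, not the standard deviation.

```python
import math


def normal_pdf(x: float, mu: float, sigma2: float) -> float:
    """Density of N(mu, sigma2) evaluated at x; sigma2 is the variance."""
    if sigma2 <= 0:
        raise ValueError("sigma2 must be positive")
    # Normalizing constant 1 / sqrt(2 * pi * sigma^2)
    norm_const = 1.0 / math.sqrt(2.0 * math.pi * sigma2)
    # Exponent -(x - mu)^2 / (2 * sigma^2)
    return norm_const * math.exp(-((x - mu) ** 2) / (2.0 * sigma2))


# The density peaks at the mean and is symmetric around it.
peak = normal_pdf(0.0, mu=0.0, sigma2=1.0)
left = normal_pdf(-1.5, mu=0.0, sigma2=1.0)
right = normal_pdf(1.5, mu=0.0, sigma2=1.0)
print(peak, left, right)
```

For the standard normal ($\mu = 0$, $\sigma^2 = 1$), the value at the mean is $1/\sqrt{2\pi}$, and the two points at equal distance from the mean have equal density.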

## Maximum Likelihood Estimation (MLE)

Suppose we have a sample of $n$ observations $\{x_1, x_2, \ldots, x_n\}$ that are independent and identically distributed (i.i.d.) as $N(\mu, \sigma^2)$, and we want to estimate the parameters $\mu$ and $\sigma^2$.

The likelihood function is given by:

$$ L(\mu, \sigma^2 \mid x) = f(x_1, x_2, \ldots, x_n \mid \mu, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} e^{ -\frac{(x_i-\mu)^2}{2\sigma^2} } $$
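
Evaluating this product directly tends to underflow for even moderately large $n$, so in practice its logarithm is usually computed instead. The following is a minimal Python sketch of that idea; the function name `normal_log_likelihood` and the sample values are made up for illustration.

```python
import math


def normal_log_likelihood(xs, mu, sigma2):
    """Log of the normal likelihood L(mu, sigma2 | xs) for an i.i.d. sample."""
    n = len(xs)
    squared_error = sum((x - mu) ** 2 for x in xs)
    # The log turns the product of n densities into a sum of n log-densities.
    return -0.5 * n * math.log(2.0 * math.pi * sigma2) - squared_error / (2.0 * sigma2)


sample = [4.8, 5.1, 5.3, 4.9, 5.4]  # hypothetical data, for illustration only

# Parameters close to the data score higher than parameters far from it.
ll_near = normal_log_likelihood(sample, mu=5.1, sigma2=0.05)
ll_far = normal_log_likelihood(sample, mu=7.0, sigma2=0.05)
print(ll_near > ll_far)
```

Comparing the two values shows the likelihood doing its job: a mean near the observations gives a much larger log-likelihood than a mean far from them.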

The maximum likelihood estimates (MLEs) are the values of $\mu$ and $\sigma^2$ that maximize the likelihood function.
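
Because the logarithm is strictly increasing, maximizing $L$ is equivalent to maximizing the log-likelihood $\ell = \log L$, which is easier to differentiate. As a sketch of the standard calculation, taking the logarithm of the product above gives

$$ \ell(\mu, \sigma^2) = -\frac{n}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2 $$

Setting the partial derivatives with respect to $\mu$ and $\sigma^2$ to zero gives the first-order conditions

$$ \frac{\partial \ell}{\partial \mu} = \frac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \mu) = 0 $$

$$ \frac{\partial \ell}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^{n} (x_i - \mu)^2 = 0 $$

The first condition does not involve $\sigma^2$ once the factor $1/\sigma^2$ is cancelled, so it can be solved for $\mu$ on its own; substituting that solution into the second condition determines $\sigma^2$.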

Solving the first-order conditions of the log-likelihood gives the closed-form MLEs:

$$ \hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} x_i $$

and

$$ \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{\mu})^2 $$

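which are the sample mean and the uncorrected sample variance, respectively. Because $\hat{\sigma}^2$ divides by $n$ rather than $n - 1$, it is a biased estimator of $\sigma^2$ in finite samples, with expectation $\frac{n-1}{n}\sigma^2$; the bias vanishes as $n$ grows.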
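
The two estimators can be computed in a few lines. The sketch below uses only the Python standard library; the helper name `normal_mle` and the height values are invented for illustration and are not real penguin measurements. It also compares the MLE of the variance (divisor $n$) with the more familiar $n - 1$ version.

```python
import statistics


def normal_mle(xs):
    """Return (mu_hat, sigma2_hat), the normal MLEs for the sample xs."""
    n = len(xs)
    mu_hat = sum(xs) / n
    # The MLE of the variance divides by n, not n - 1.
    sigma2_hat = sum((x - mu_hat) ** 2 for x in xs) / n
    return mu_hat, sigma2_hat


heights_cm = [61.0, 64.5, 59.2, 66.1, 62.7, 63.4]  # hypothetical sample

mu_hat, sigma2_hat = normal_mle(heights_cm)

# statistics.pvariance divides by n; statistics.variance divides by n - 1.
population_style = statistics.pvariance(heights_cm)
unbiased = statistics.variance(heights_cm)

print(mu_hat, sigma2_hat, population_style, unbiased)
```

Here `sigma2_hat` agrees with `statistics.pvariance` and is smaller than `statistics.variance` by the factor $(n - 1)/n$.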
