**Maximum Likelihood Estimation (MLE)** is a statistical method for estimating the parameters of a probability distribution from observed data. It finds the parameter values that maximize the likelihood function, which measures how probable the observed data are under the assumed distribution (for continuous distributions, this is the value of the probability density at the data).

**Algorithm**:
1. Define a probability distribution with unknown parameters.
2. Collect a set of observed data.
3. Construct the likelihood function, which calculates the probability of observing the data for different parameter values.
4. Maximize the likelihood function by finding the parameter values that yield the highest probability of observing the data.
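
The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not a general-purpose estimator: the data values are made up, the model is assumed Gaussian with $\sigma$ known and fixed at 1 to keep the search one-dimensional, and the maximization is a crude grid search.

```python
import math

# Step 1: assume a Gaussian model with unknown mean mu.
# sigma is fixed at 1.0 here purely to keep the sketch one-dimensional.
def gaussian_pdf(x, mu, sigma=1.0):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (math.sqrt(2 * math.pi) * sigma)

# Step 2: observed data (illustrative values, not from the text).
data = [4.8, 5.1, 5.3, 4.9, 5.4]

# Step 3: the likelihood is the product of the densities of all observations.
def likelihood(mu):
    return math.prod(gaussian_pdf(x, mu) for x in data)

# Step 4: pick the candidate value with the highest likelihood.
candidates = [i / 100 for i in range(400, 601)]  # 4.00, 4.01, ..., 6.00
mu_hat = max(candidates, key=likelihood)

print(mu_hat)                 # grid point with the highest likelihood
print(sum(data) / len(data))  # sample mean, for comparison
```

The grid point that wins should coincide with the sample mean, which is what the analytical derivation in the next section predicts.
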

# MLE for Gaussian population

Suppose we have $n$ samples $\textbf{X} = (X_1,X_2,\dots,X_n)$ from a Gaussian distribution with mean $\mu$ and variance $\sigma^2$. This means that $X_i \overset{\text{i.i.d.}}{\sim} N(\mu,\sigma^2)$.

If we want the MLE for $\mu$ and $\sigma$, the first step is to define the likelihood. If both $\mu$ and $\sigma$ are unknown, the likelihood $L$ is a function of these two parameters. Because the samples are independent, their joint density factorizes into a product of the individual densities. For a realization of $\textbf{X}$, given by $\boldsymbol{x}=(x_1,x_2,\dots,x_n)$:
$$\large\begin{aligned}
L(\mu,\sigma; \boldsymbol{x}) &= \prod_{i=1}^n f_{X_i}(x_i) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi}\sigma } e^{-\frac{1}{2}\frac{(x_i-\mu)^2}{\sigma^2}} \\
&= \frac{1}{\left(\sqrt{2\pi}\right)^n\sigma^n }e^{-\frac{1}{2}\frac{\sum_{i=1}^n (x_i-\mu)^2}{\sigma^2}}
\end{aligned}$$

Now all we have to do is find the values of $\mu$ and $\sigma$ that maximize the likelihood $L(\mu,\sigma;\boldsymbol{x})$.
To do this analytically, we take the partial derivatives of the likelihood with respect to $\mu$ and $\sigma$ and set them to 0. The values of $\mu$ and $\sigma$ at which both derivatives vanish are the critical points. In our case they turn out to be a maximum.

Differentiating the likelihood directly is awkward because of all the products involved, but we can work with its logarithm instead. The logarithm is a strictly increasing function, so any values that maximize $L(\mu, \sigma; \boldsymbol{x})$ also maximize its logarithm.

This is the **Log-Likelihood**:

$$\ell(\mu,\sigma) = \log\left(L(\mu,\sigma;\boldsymbol{x})\right)$$
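
The claim that the likelihood and its logarithm share the same maximizer can be checked numerically. The sketch below uses made-up data and holds $\sigma$ fixed at 1 (both are assumptions for illustration), then compares the best grid point under each function.

```python
import math

data = [2.0, 3.5, 3.0, 4.5]  # illustrative values, not from the text
sigma = 1.0                  # held fixed to keep the search one-dimensional

def likelihood(mu):
    # Product of Gaussian densities evaluated at each observation.
    return math.prod(
        math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (math.sqrt(2 * math.pi) * sigma)
        for x in data
    )

def log_likelihood(mu):
    # Logarithm of the likelihood, as in the definition above.
    return math.log(likelihood(mu))

grid = [i / 100 for i in range(0, 601)]  # 0.00, 0.01, ..., 6.00
best_by_likelihood = max(grid, key=likelihood)
best_by_log_likelihood = max(grid, key=log_likelihood)

print(best_by_likelihood, best_by_log_likelihood)
```

Both searches should return the same grid point, because applying a strictly increasing function does not change where the maximum is.
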

An important property of the logarithm is that it turns products into sums: $\log(a \cdot b) = \log(a) + \log(b)$. This property simplifies taking derivatives a lot. There are a few more properties of the logarithm (here $\log$ is the natural logarithm) that we need in order to get a convenient expression for the **Log-Likelihood for a Gaussian population**:
$$\begin{align}
\log\left(\frac{1}{a}\right) &= - \log(a) \\
\log(a^k) &= k \cdot \log(a) \\
\log(e^{a}) &= a
\end{align}$$

$$
\begin{align}\ell(\mu,\sigma) = \log\left( \frac{1}{\left(\sqrt{2\pi}\right)^n\sigma^n }e^{-\frac{1}{2}\frac{\sum_{i=1}^n (x_i-\mu)^2}{\sigma^2}}\right)
\end{align}
$$
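
Spelling out the intermediate step: the logarithm of the product splits into a sum of three logarithms, and the natural logarithm cancels the exponential:

$$
\begin{align}
\ell(\mu,\sigma) &= \log\left(\frac{1}{\left(\sqrt{2\pi}\right)^n}\right) + \log\left(\frac{1}{\sigma^n}\right) + \log\left(e^{-\frac{1}{2}\frac{\sum_{i=1}^n (x_i-\mu)^2}{\sigma^2}}\right) \\
&= -n\log\left(\sqrt{2\pi}\right) - n\log(\sigma) - \frac{1}{2}\frac{\sum_{i=1}^n (x_i-\mu)^2}{\sigma^2}
\end{align}
$$

Since $\log\left(\sqrt{2\pi}\right) = \frac{1}{2}\log(2\pi)$, the first term becomes $-\frac{n}{2}\log(2\pi)$,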
... which we can further simplify to:
***
**Log-Likelihood for a Gaussian population:**

$$
 \large \ell(\mu,\sigma)= -\frac{n}{2}\log(2\pi) - n\log(\sigma) - \frac{1}{2}\frac{\sum_{i=1}^n (x_i-\mu)^2}{\sigma^2}
$$
***
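
As a sanity check on the simplification, the closed-form log-likelihood can be compared with the sum of the logs of the individual densities. The function names, data values, and parameter values below are illustrative assumptions, not part of the derivation.

```python
import math

def gaussian_log_likelihood(mu, sigma, xs):
    """Closed-form Gaussian log-likelihood, term by term as in the formula above."""
    n = len(xs)
    sum_sq = sum((x - mu) ** 2 for x in xs)
    return -0.5 * n * math.log(2 * math.pi) - n * math.log(sigma) - 0.5 * sum_sq / sigma**2

def log_likelihood_from_densities(mu, sigma, xs):
    """Same quantity, computed as the sum of the log of each Gaussian density."""
    return sum(
        math.log(math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (math.sqrt(2 * math.pi) * sigma))
        for x in xs
    )

xs = [1.2, 0.7, 2.3, 1.9]  # illustrative values
closed_form = gaussian_log_likelihood(1.5, 0.8, xs)
from_densities = log_likelihood_from_densities(1.5, 0.8, xs)

print(closed_form, from_densities)
print(math.isclose(closed_form, from_densities))
```

The two values should agree up to floating-point rounding for any valid choice of $\mu$ and $\sigma > 0$.
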
Now, to find the MLE for $\mu$ and $\sigma$, we compute the partial derivatives of $\ell$ and set them to zero.
***
**Derivative w.r.t. $\LARGE \mu$:**
$$
\frac{\partial}{\partial\mu}\ell(\mu,\sigma) = -\frac{1}{2}\frac{\sum_{i=1}^n2(x_i-\mu)}{\sigma^2}(-1)
$$
$$
\large= \frac{1}{\sigma^2}\left(\sum_{i=1}^n x_i-\sum_{i=1}^n\mu\right) = \frac{1}{\sigma^2}\left(\sum_{i=1}^n x_i - n\mu\right)  
$$
***
**Derivative w.r.t. $\LARGE \sigma$:**

$$
\frac{\partial}{\partial\sigma}\ell(\mu,\sigma) = -\frac{n}{\sigma}-\frac{1}{2}\left(\sum_{i=1}^n(x_i-\mu)^2\right)(-2)\frac{1}{\sigma^3}
$$
$$
\large = -\frac{n}{\sigma}+\left(\sum_{i=1}^n(x_i-\mu)^2\right)\frac{1}{\sigma^3}
$$
***
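
The two partial derivatives can be checked against central finite differences of the log-likelihood. This is only a numerical sanity check under assumed values: the data, the evaluation point, and the step size `h` are arbitrary choices for illustration.

```python
import math

def log_likelihood(mu, sigma, xs):
    n = len(xs)
    return (-0.5 * n * math.log(2 * math.pi) - n * math.log(sigma)
            - 0.5 * sum((x - mu) ** 2 for x in xs) / sigma**2)

def d_mu(mu, sigma, xs):
    # Analytical derivative with respect to mu.
    return (sum(xs) - len(xs) * mu) / sigma**2

def d_sigma(mu, sigma, xs):
    # Analytical derivative with respect to sigma.
    return -len(xs) / sigma + sum((x - mu) ** 2 for x in xs) / sigma**3

xs = [1.2, 0.7, 2.3, 1.9]        # illustrative values
mu, sigma, h = 1.0, 1.5, 1e-6    # arbitrary evaluation point and step size

# Central differences approximate each partial derivative numerically.
num_d_mu = (log_likelihood(mu + h, sigma, xs) - log_likelihood(mu - h, sigma, xs)) / (2 * h)
num_d_sigma = (log_likelihood(mu, sigma + h, xs) - log_likelihood(mu, sigma - h, xs)) / (2 * h)

print(d_mu(mu, sigma, xs), num_d_mu)
print(d_sigma(mu, sigma, xs), num_d_sigma)
```

Each printed pair should agree to several decimal places.
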
Now we set both partial derivatives to $0$ to find the estimates for $\mu$ and $\sigma$.

## MLE for $\large \mu$:
$$
\frac{\partial}{\partial\mu}\ell(\mu,\sigma) =\frac{1}{\sigma^2}\left(\sum_{i=1}^n x_i - n\mu\right) = 0
$$
Since $\large \sigma > 0$, the factor $\frac{1}{\sigma^2}$ is never zero, so the derivative vanishes only when $\sum_{i=1}^n x_i - n\mu = 0$. Solving for $\mu$ gives


$$
\Large \hat \mu = \frac{\sum_{i=1}^nx_i}{n}=\bar x,\text{ which is the sample mean}
$$
***
## MLE for $\large \sigma$:
$$
\frac{\partial}{\partial\sigma}\ell(\mu,\sigma) = -\frac{n}{\sigma}+\left(\sum_{i=1}^n(x_i-\mu)^2\right)\frac{1}{\sigma^3} = 0
$$
Since $\large \sigma > 0$, we can multiply both sides by $\sigma$ without changing the solutions, which simplifies the equation to
$$
-n+\left(\sum_{i=1}^n(x_i-\mu)^2\right)\frac{1}{\sigma^2} = 0
$$
We can replace $\large \mu$ with its estimate $\large \hat \mu = \bar x$, because both partial derivatives must equal $0$ at the same time. Now we get:
$$
-n+\left(\sum_{i=1}^n(x_i-\bar x)^2\right)\frac{1}{\sigma^2} = 0
$$
Solving for $\sigma^2$ gives us
$$
\hat\sigma^2 = \frac{\sum_{i=1}^n(x_i - \bar x)^2}{n}
$$
$$
\large \hat\sigma=\sqrt{\frac{\sum_{i=1}^n(x_i - \bar x)^2}{n}}
$$
***
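
The two closed-form estimates translate directly into code. The sketch below uses a hypothetical helper `gaussian_mle` and made-up data. It also compares the result with Python's `statistics.pstdev`, which divides by $n$ like the MLE, and with `statistics.stdev`, which divides by $n-1$.

```python
import math
import statistics

def gaussian_mle(xs):
    """Return (mu_hat, sigma_hat) for a Gaussian sample, dividing by n."""
    n = len(xs)
    mu_hat = sum(xs) / n
    sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in xs) / n)
    return mu_hat, sigma_hat

xs = [1.2, 0.7, 2.3, 1.9]  # illustrative values
mu_hat, sigma_hat = gaussian_mle(xs)

print(mu_hat, sigma_hat)
# The MLE of sigma matches the population standard deviation (divide by n) ...
print(math.isclose(sigma_hat, statistics.pstdev(xs)))
# ... and is smaller than the sample standard deviation (divide by n - 1).
print(sigma_hat < statistics.stdev(xs))
```

The division by $n$ rather than $n-1$ is what distinguishes the maximum likelihood estimate of $\sigma$ from the usual sample standard deviation.
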
## Example

Suppose we have a sample of heights of teenagers in the UK, and we want to estimate the mean and standard deviation of the heights of all UK teenagers (the population).

$\text{Data} = 66.75, 70.24, 67.19, 67.09, 63.65, 64.64, 69.81, 69.79, 73.52, 71.74$

Each measurement is assumed to come from a Gaussian (Normal) distribution with unknown $\large \mu, \sigma$.
The maximum likelihood estimates of the parameters for this sample are:

$$
\hat \mu = \frac{66.75+70.24+67.19+67.09+63.65+64.64+69.81+69.79+73.52+71.74}{10} = 68.442
$$


$$
\begin{aligned}
\hat \sigma^2 &= \frac{1}{10}\Big((66.75-68.442)^2+(70.24-68.442)^2+(67.19-68.442)^2+(67.09-68.442)^2+(63.65-68.442)^2 \\
&\qquad +(64.64-68.442)^2+(69.81-68.442)^2+(69.79-68.442)^2+(73.52-68.442)^2+(71.74-68.442)^2\Big) \approx 8.726 \\
\hat \sigma &= \sqrt{8.726} \approx 2.954
\end{aligned}
$$
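
The hand computation can be reproduced with Python's standard library. This is a small check using the ten data values from the example. The variable names are arbitrary, and `statistics.pstdev` is used because it divides by $n$, matching the MLE formula for $\sigma$.

```python
import statistics

heights = [66.75, 70.24, 67.19, 67.09, 63.65, 64.64, 69.81, 69.79, 73.52, 71.74]

mu_hat = statistics.fmean(heights)      # sample mean
sigma_hat = statistics.pstdev(heights)  # square root of the mean squared deviation (divide by n)

print(round(mu_hat, 3), round(sigma_hat, 3))  # 68.442 2.954
```
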
