# Understanding Maximum Likelihood Estimation (MLE)

Maximum Likelihood Estimation (MLE) is a method for estimating the parameters of a statistical model. Given a dataset and an assumed probability distribution, MLE finds the parameter values (e.g., the mean $\mu$ and standard deviation $\sigma$ of a normal distribution) that maximize the likelihood of the observed data.

## Key Concepts
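- **Likelihood Function**: The likelihood function is the joint probability (or joint probability density, for continuous data) of the observed data, treated as a function of the parameters.
- **Log-Likelihood**: To simplify calculations, we often work with the log-likelihood because it converts products into sums and is easier to differentiate. Since the logarithm is strictly increasing, the likelihood and the log-likelihood are maximized by the same parameter values.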
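
The practical benefit of the log-likelihood can be seen in a small numerical sketch. It uses only the Python standard library; the sample size, seed, and parameter values ($\mu = 5$, $\sigma = 2$) are arbitrary choices made for illustration, and `normal_pdf` is a helper defined here, not a library function.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    """Density of the normal distribution N(mu, sigma^2) evaluated at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

# Illustrative data: 2000 draws from N(5, 2^2) (assumed values, fixed seed).
random.seed(0)
data = [random.gauss(5.0, 2.0) for _ in range(2000)]

# Likelihood: the product of the individual densities.
# With sigma = 2 every factor is below 0.2, so the product underflows to 0.0.
likelihood = 1.0
for x in data:
    likelihood *= normal_pdf(x, 5.0, 2.0)

# Log-likelihood: the sum of the log-densities, which stays finite.
log_likelihood = sum(math.log(normal_pdf(x, 5.0, 2.0)) for x in data)

print(likelihood)       # 0.0 because of floating-point underflow
print(log_likelihood)   # a finite negative number
```

The product form is mathematically fine but cannot be represented in floating point for even moderately large datasets, which is one reason the sum form is used in practice.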

## Likelihood Function for a Normal Distribution
$$
L(\mu, \sigma \mid x) = \prod_{i=1}^{n} f(x_i \mid \mu, \sigma)
$$

where:
$$
f(x_i \mid \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x_i - \mu)^2}{2\sigma^2}}
$$

## Log-Likelihood Function
Taking the logarithm:
$$
\ell(\mu, \sigma \mid x) = -\frac{n}{2} \log (2\pi) - \frac{n}{2} \log (\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2
$$



![likelihood vs log likelihood](image1.png)


**Image taken from the StatQuest YouTube channel.**

# Finding the Maximum Likelihood Estimates (MLEs)

To find the MLEs analytically, we differentiate the log-likelihood function with respect to the parameters $\mu$ and $\sigma$, set the derivatives to zero, and solve.

Given a dataset $x_1, x_2, \dots, x_n$ sampled independently from a normal distribution $\mathcal{N}(\mu, \sigma^2)$, the likelihood function is:

$$
L(\mu, \sigma | x) = \prod_{i=1}^{n} f(x_i | \mu, \sigma)
$$

where the probability density function (PDF) of the normal distribution is:

$$
f(x_i \mid \mu, \sigma) = \frac{1}{\sqrt{2\pi \sigma^2}} e^{-\frac{(x_i - \mu)^2}{2\sigma^2}}
$$

Taking the **log-likelihood**:

$$
\ell(\mu, \sigma \mid x) = \sum_{i=1}^{n} \log f(x_i \mid \mu, \sigma)
$$

Expanding the terms:

$$
\ell(\mu, \sigma \mid x) = -\frac{n}{2} \log (2\pi) - n \log \sigma - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2
$$
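
As a quick sanity check, the expanded expression can be compared against the direct sum of log-densities. The snippet below is a minimal sketch using only the standard library; the five data points and the parameter values are made up for illustration, and the function names are hypothetical helpers.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

def log_likelihood_sum(data, mu, sigma):
    """Log-likelihood as the sum of log f(x_i | mu, sigma)."""
    return sum(math.log(normal_pdf(x, mu, sigma)) for x in data)

def log_likelihood_expanded(data, mu, sigma):
    """Log-likelihood using the expanded closed-form expression."""
    n = len(data)
    squared_error = sum((x - mu) ** 2 for x in data)
    return (-n / 2 * math.log(2 * math.pi)
            - n * math.log(sigma)
            - squared_error / (2 * sigma ** 2))

# Made-up sample and parameter values, for illustration only.
data = [2.1, 3.4, 1.9, 4.0, 2.8]
by_sum = log_likelihood_sum(data, 3.0, 1.5)
by_formula = log_likelihood_expanded(data, 3.0, 1.5)

print(math.isclose(by_sum, by_formula))  # True: both forms agree up to rounding
```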

---


![image2](image2.png)

---

## Finding the MLE for $\mu$
To find the MLE for $\mu$, we take the derivative of the log-likelihood with respect to $\mu$ and set it to zero:

$$
\frac{\partial \ell}{\partial \mu} = \frac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \mu) = 0
$$

Solving for $\mu$:

$$
\sum_{i=1}^{n} (x_i - \mu) = 0
$$

$$
\mu = \frac{1}{n} \sum_{i=1}^{n} x_i
$$

Thus, the **MLE for $\mu$** is:

$$
\hat{\mu}_{MLE} = \frac{1}{n} \sum_{i=1}^{n} x_i
$$

which is simply the **sample mean**.
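
The result can be checked numerically: the derivative $\partial \ell / \partial \mu$ should vanish at the sample mean, be positive for smaller $\mu$, and be negative for larger $\mu$. The following is a small sketch under assumed values (a made-up five-point sample and $\sigma = 1$); `score_mu` is a helper name chosen here.

```python
def score_mu(data, mu, sigma):
    """Partial derivative of the normal log-likelihood with respect to mu."""
    return sum(x - mu for x in data) / sigma ** 2

# Made-up sample, for illustration only.
data = [2.1, 3.4, 1.9, 4.0, 2.8]
mu_hat = sum(data) / len(data)  # the sample mean

print(score_mu(data, mu_hat, 1.0))        # approximately 0 (up to rounding)
print(score_mu(data, mu_hat - 0.5, 1.0))  # positive: log-likelihood still increasing
print(score_mu(data, mu_hat + 0.5, 1.0))  # negative: log-likelihood now decreasing
```

The sign change from positive to negative confirms that the stationary point is a maximum in $\mu$, and the value of $\sigma$ only rescales the derivative without moving its root.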

---

## Finding the MLE for $\sigma$
To find the MLE for $\sigma$, we differentiate the log-likelihood with respect to $\sigma$ and set the derivative to zero:

$$
\frac{\partial \ell}{\partial \sigma} = -\frac{n}{\sigma} + \frac{1}{\sigma^3} \sum_{i=1}^{n} (x_i - \mu)^2 = 0
$$

Multiplying through by $\sigma^3 / n$ and rearranging:

$$
\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2
$$

Taking the square root and substituting $\hat{\mu}_{MLE}$ for $\mu$:

$$
\hat{\sigma}_{MLE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{\mu}_{MLE})^2}
$$

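which is the standard deviation of the sample computed with a divisor of $n$, not the $n-1$ divisor used for the unbiased variance estimate. As a result, $\hat{\sigma}_{MLE}$ is always slightly smaller than the $n-1$ version on the same data.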
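
Both closed-form estimates can be computed and checked with the Python standard library. In the sketch below, the data are made up for illustration and `log_likelihood` is a helper written here; `statistics.pstdev` uses the divisor $n$ and `statistics.stdev` uses $n-1$.

```python
import math
import statistics

def log_likelihood(data, mu, sigma):
    """Normal log-likelihood in its expanded form."""
    n = len(data)
    squared_error = sum((x - mu) ** 2 for x in data)
    return (-n / 2 * math.log(2 * math.pi)
            - n * math.log(sigma)
            - squared_error / (2 * sigma ** 2))

# Made-up sample, for illustration only.
data = [2.1, 3.4, 1.9, 4.0, 2.8]
n = len(data)

# Closed-form MLEs: sample mean and divisor-n standard deviation.
mu_hat = sum(data) / n
sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in data) / n)

print(math.isclose(sigma_hat, statistics.pstdev(data)))  # True: pstdev divides by n
print(statistics.stdev(data) > sigma_hat)                # True: stdev divides by n - 1

# Nudging either parameter away from its MLE lowers the log-likelihood.
best = log_likelihood(data, mu_hat, sigma_hat)
nearby = [
    log_likelihood(data, mu_hat + 0.1, sigma_hat),
    log_likelihood(data, mu_hat - 0.1, sigma_hat),
    log_likelihood(data, mu_hat, sigma_hat + 0.1),
    log_likelihood(data, mu_hat, sigma_hat - 0.1),
]
print(all(value < best for value in nearby))  # True
```

The perturbation check is only a spot test around the optimum, not a proof, but it is a cheap way to catch algebra mistakes in a derivation like the one above.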

---

For a more detailed derivation, see [MLE in case of normal distribution](https://medium.com/@lorenzojcducv/maximum-likelihood-for-the-normal-distribution-966df16fd031).
