# Maximum Likelihood Estimation (MLE)
Maximum Likelihood Estimation (MLE) is a statistical method for estimating the parameters of a statistical model. It chooses the parameter values that make the observed data most likely. The "likelihood" is the probability (for discrete data) or probability density (for continuous data) of the observed data under a specific model, viewed as a function of the model parameters with the data held fixed. MLE seeks the parameter values that maximize this likelihood, hence the name "maximum likelihood."
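To make the idea concrete, here is a minimal Python sketch that maximizes a log-likelihood by brute-force search over a grid of candidate parameter values. The helper `grid_mle`, the coin-flip data, and the grid spacing are illustrative assumptions, not part of any library; in practice a closed-form derivation or a numerical optimizer would replace the grid.

```python
import math


def grid_mle(log_likelihood, candidates):
    """Return the candidate value with the highest log-likelihood.

    Hypothetical helper for illustration: a brute-force search over a finite grid.
    """
    return max(candidates, key=log_likelihood)


# Illustrative data: 1 = success, 0 = failure, modeled as i.i.d. Bernoulli(p)
flips = [1, 0, 1, 1, 0, 1, 1, 1]  # 6 successes in 8 trials


def bernoulli_log_likelihood(p):
    # Sum of the log-probabilities of each observation under parameter p
    return sum(math.log(p) if x == 1 else math.log(1 - p) for x in flips)


# Candidates strictly inside (0, 1) so that log(p) and log(1 - p) are defined
grid = [i / 100 for i in range(1, 100)]
p_best = grid_mle(bernoulli_log_likelihood, grid)
print(p_best)  # 0.75, the observed proportion 6 / 8
```

Maximizing the log-likelihood instead of the likelihood itself gives the same answer, because the logarithm is strictly increasing; sums of logs are also numerically safer than long products of small probabilities.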

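The sections below derive the MLE step by step for two common models: the normal distribution (unknown mean, known variance) and the binomial distribution.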




### Computing the MLE for a Normal Distribution

Suppose you have a dataset of $n$ independent observations $x_1, x_2, \ldots, x_n$ from a normal distribution with unknown mean $\mu$ and known variance $\sigma^2$. Because the observations are independent, the likelihood is the product of the individual normal densities:

$
L(\mu; x_1, x_2, ..., x_n) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x_i - \mu)^2}{2\sigma^2}}
$

The log-likelihood function is:

$
\log L(\mu; x_1, x_2, ..., x_n) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2
$
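The closed-form log-likelihood above can be checked against the log of the original product of densities. This sketch uses only Python's standard `math` module; the function names and the data values are illustrative.

```python
import math


def normal_log_likelihood(mu, xs, sigma2):
    """Closed-form log-likelihood of i.i.d. N(mu, sigma2) data."""
    n = len(xs)
    return (-n / 2 * math.log(2 * math.pi * sigma2)
            - sum((x - mu) ** 2 for x in xs) / (2 * sigma2))


def normal_log_likelihood_direct(mu, xs, sigma2):
    """Log of the product of normal densities, computed term by term."""
    likelihood = 1.0
    for x in xs:
        likelihood *= math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)
    return math.log(likelihood)


xs = [2.0, 3.0, 5.0, 6.0, 7.0]  # illustrative data
print(normal_log_likelihood(4.0, xs, 4.0), normal_log_likelihood_direct(4.0, xs, 4.0))
```

Both functions should agree up to floating-point rounding. For large $n$ the direct product of densities can underflow to zero, which is one practical reason to work with the log-likelihood.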

Maximizing this log-likelihood with respect to $\mu$ (by taking the derivative and setting it to zero) leads to the MLE of $\mu$, which is the sample mean:

$
\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} x_i
$

This result shows that, under the model assumptions, the sample mean is the value of $\mu$ that makes the observed data most likely.
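Spelling out the calculus step: differentiating the log-likelihood with respect to $\mu$ and setting the result to zero gives

$
\frac{d}{d\mu}\log L(\mu; x_1, \ldots, x_n) = \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \mu) = \frac{1}{\sigma^2}\left(\sum_{i=1}^{n} x_i - n\mu\right) = 0,
$

which is solved by the sample mean. The second derivative is negative for every $\mu$, so this stationary point is the maximum:

$
\frac{d^2}{d\mu^2}\log L(\mu; x_1, \ldots, x_n) = -\frac{n}{\sigma^2} < 0
$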



The binomial distribution is a discrete probability distribution that describes the number of successes in a fixed number of independent Bernoulli trials, each with the same probability of success. The probability mass function (PMF) of the binomial distribution, for obtaining $k$ successes out of $n$ trials, is given by:

$
P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}
$

where:
- $X$ is the random variable representing the number of successes,
- $k = 0, 1, 2, \ldots, n$ is the number of successes,
- $n$ is the number of trials,
- $p$ is the probability of success on any given trial, and
- $\binom{n}{k}$ is the binomial coefficient, which calculates the number of ways to choose $k$ successes from $n$ trials, and is given by $\binom{n}{k} = \frac{n!}{k!(n-k)!}$.
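A direct transcription of the PMF is a quick sanity check. This sketch assumes Python 3.8 or later, where `math.comb` computes the binomial coefficient; the parameter values are illustrative.

```python
import math


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), following the PMF formula."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 10, 0.7  # illustrative parameters
probs = [binomial_pmf(k, n, p) for k in range(n + 1)]
print(sum(probs))        # 1 up to floating-point rounding
print(math.comb(10, 7))  # number of ways to place 7 successes among 10 trials
```

Summing the PMF over $k = 0, 1, \ldots, n$ should give 1, since these are all the possible outcomes.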

### Computing the MLE for a Binomial Distribution

To compute the maximum likelihood estimate (MLE) of the binomial parameter $p$, given $k$ observed successes in $n$ trials, follow these steps:

1. **Likelihood Function**: The likelihood function for the binomial distribution, given the observed data, is the probability of observing $k$ successes in $n$ trials, which is:

   $
   L(p; k, n) = \binom{n}{k} p^k (1 - p)^{n - k}
   $

   Since $\binom{n}{k}$ does not depend on $p$, it can be treated as a constant in the maximization process.

2. **Log-Likelihood Function**: Taking the natural logarithm turns the product of factors into a sum, which makes the function easier to differentiate. Because the logarithm is strictly increasing, the log-likelihood is maximized at the same $p$ as the likelihood:

   $
   \log L(p; k, n) = \log\binom{n}{k} + k\log(p) + (n - k)\log(1 - p)
   $

3. **Maximize the Log-Likelihood**: To find the MLE of $p$, you take the derivative of the log-likelihood function with respect to $p$, set it to zero, and solve for $p$. This gives:

   $
   \frac{d}{dp}\log L(p; k, n) = \frac{k}{p} - \frac{n - k}{1 - p} = 0
   $

4. **Solving for $p$**: Multiplying both sides of the equation by $p(1 - p)$ gives $k(1 - p) = (n - k)p$, which simplifies to $k = np$. Hence the MLE of $p$ is:

   $
   \hat{p} = \frac{k}{n}
   $
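To confirm that this stationary point is a maximum rather than a minimum, differentiate once more:

$
\frac{d^2}{dp^2}\log L(p; k, n) = -\frac{k}{p^2} - \frac{n - k}{(1 - p)^2} < 0
$

This is negative for every $p$ in $(0, 1)$, so the log-likelihood is concave there. The derivation above assumes $0 < k < n$; when $k = 0$ or $k = n$ the log-likelihood is monotone in $p$ and the maximum sits at the boundary $p = 0$ or $p = 1$, which still equals $k/n$.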

This result is intuitive: the MLE of the success probability $p$ in a binomial distribution is the observed proportion of successes, that is, the number of successes $k$ divided by the number of trials $n$.
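A numerical check of the same result: the sketch below evaluates the binomial log-likelihood on a fine grid of candidate $p$ values and also evaluates the derivative from step 3 at $k/n$. The function names, the grid, and the counts $k = 7$, $n = 10$ are illustrative assumptions; `math.comb` requires Python 3.8 or later.

```python
import math


def binomial_log_likelihood(p, k, n):
    """log L(p; k, n) = log C(n, k) + k log p + (n - k) log(1 - p), for 0 < p < 1."""
    return math.log(math.comb(n, k)) + k * math.log(p) + (n - k) * math.log(1 - p)


def score(p, k, n):
    """Derivative of the log-likelihood with respect to p."""
    return k / p - (n - k) / (1 - p)


k, n = 7, 10  # illustrative counts
grid = [i / 1000 for i in range(1, 1000)]  # candidate p values in (0, 1)
p_grid = max(grid, key=lambda p: binomial_log_likelihood(p, k, n))

print(p_grid)               # grid maximizer; equals k / n = 0.7 on this grid
print(score(k / n, k, n))   # derivative at k / n: zero up to floating-point rounding
```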



The following numerical examples apply Maximum Likelihood Estimation (MLE) to both the normal and the binomial distributions.



## Examples
### Normal Distribution Example

Suppose we have a small dataset of five measurements: $X = \{2, 3, 5, 6, 7\}$, and we know these measurements come from a normal distribution with an unknown mean $\mu$ and a known variance $\sigma^2 = 4$ (hence, $\sigma = 2$).

#### Task: Estimate $\mu$ using MLE.

1. **Model Specification**: Normal distribution with unknown $\mu$ and known $\sigma^2 = 4$.
2. **Likelihood Function**: Not explicitly needed for the calculation since the MLE for $\mu$ in a normal distribution with known variance is the sample mean.
3. **Maximize the Likelihood**: The MLE for $\mu$, $\hat{\mu}$, is the average of the sample:

   $
   \hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} x_i
   $

4. **Calculation**:

   $
   \hat{\mu} = \frac{2 + 3 + 5 + 6 + 7}{5} = \frac{23}{5} = 4.6
   $
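The same estimate can be recovered without the closed form by maximizing the log-likelihood numerically. This sketch is a brute-force grid search over candidate means (an illustrative choice, not how one would fit a model in practice), using the five measurements and the known variance $\sigma^2 = 4$ from the example.

```python
import math

X = [2, 3, 5, 6, 7]  # the five measurements
SIGMA2 = 4.0         # known variance


def log_likelihood(mu):
    # Normal log-likelihood with known variance
    n = len(X)
    return -n / 2 * math.log(2 * math.pi * SIGMA2) - sum((x - mu) ** 2 for x in X) / (2 * SIGMA2)


grid = [i / 100 for i in range(0, 1001)]  # candidate means 0.00, 0.01, ..., 10.00
mu_grid = max(grid, key=log_likelihood)
print(mu_grid)  # agrees with the sample mean 23 / 5
```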

### Binomial Distribution Example

Imagine we have conducted 10 trials of a binary experiment (e.g., flipping a coin), and we observed 7 successes (e.g., getting heads).

#### Task: Estimate the probability of success $p$ using MLE.

1. **Model Specification**: Binomial distribution with $n = 10$ trials and unknown success probability $p$; we observed $k = 7$ successes.
2. **Likelihood and Log-Likelihood Function**: Not explicitly needed for direct calculation as the MLE for $p$ is the ratio of successes to trials.
3. **Maximize the Likelihood**: The MLE for $p$, $\hat{p}$, is given by:

   $
   \hat{p} = \frac{k}{n}
   $

4. **Calculation**:

   $
   \hat{p} = \frac{7}{10} = 0.7
   $
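To see that $p = 0.7$ makes the observed data most likely, the sketch below evaluates the binomial likelihood of 7 successes in 10 trials at a few candidate values of $p$. The candidate values are arbitrary illustrative choices; `math.comb` requires Python 3.8 or later.

```python
import math

n, k = 10, 7  # trials and observed successes


def likelihood(p):
    # Binomial likelihood L(p; k, n) = C(n, k) p^k (1 - p)^(n - k)
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


for p in (0.5, 0.7, 0.9):
    print(p, round(likelihood(p), 4))
```

The likelihood is about 0.117 at $p = 0.5$, about 0.267 at $p = 0.7$, and about 0.057 at $p = 0.9$, so among these candidates $p = 0.7$ gives the observed data the highest probability.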


For the **normal distribution example**, the MLE for the mean $\mu$ is $\hat{\mu} = 4.6$. In other words, among all possible means, $\mu = 4.6$ is the one under which the observed sample is most likely.

For the **binomial distribution example**, the MLE for the probability of success $p$ is $\hat{p} = 0.7$. Of all possible values of $p$, this is the one that makes observing 7 successes in 10 trials most likely.

These examples illustrate how MLE estimates distribution parameters from observed data. In both cases the MLE coincides with a familiar sample statistic: the sample mean for the normal mean and the sample proportion for the binomial success probability.



```python
# Normal distribution example: the MLE of mu is the sample mean
X = [2, 3, 5, 6, 7]  # data points
mu_hat = sum(X) / len(X)  # MLE for mu

# Binomial distribution example: the MLE of p is the sample proportion
n = 10  # number of trials
k = 7  # number of successes
p_hat = k / n  # MLE for p

print((mu_hat, p_hat))  # (4.6, 0.7)
```