# Mod2/L1 Maximum Likelihood Estimation (MLE)

## Introduction
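In this module, we explore Maximum Likelihood Estimation (MLE), a fundamental method for estimating the parameters of a population distribution from sample data. MLE is widely used in practice because the same recipe applies to almost any parametric model: write down the probability of the observed data as a function of the unknown parameters, then choose the parameter values that maximize it.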

## Motivation
MLE aims to find the parameter value that makes the observed data most likely. Given a population whose distribution depends on an unknown parameter $\theta$, we take a random sample $X_1, X_2, \ldots, X_n$ and use this sample to estimate $\theta$.
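The same idea in symbols is shown below. This uses standard notation introduced here for illustration. It assumes the sample is independent and identically distributed, and it takes $f(x;\theta)$ to be the probability mass (or density) function of a single observation. The likelihood is the joint probability of the sample viewed as a function of $\theta$, and the MLE is the value of $\theta$ that maximizes it:

$$
L(\theta) = \prod_{i=1}^{n} f(X_i;\theta), \qquad \hat{\theta}_{\text{MLE}} = \arg\max_{\theta} L(\theta)
$$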

## Example: Coin Flipping

### Scenario
Suppose we have a biased coin that comes up heads with probability $p$ and tails with probability $1 - p$. The parameter $p$ is unknown, and we want to estimate it. For simplicity, assume $p$ can take only three possible values: 0.2, 0.3, or 0.8.

### Observed Data
We flip the coin 20 times and observe the following sequence:
Heads, Heads, Tails, Tails, Heads, Heads, Heads, Heads, Tails, Heads, Heads, Heads, Heads, Heads, Tails, Tails, Heads, Heads, Heads, Tails

### Likelihood Calculation
To determine which value of $p$ makes the observed data most likely, we calculate the likelihood of the data under each candidate value of $p$.
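As a worked check, we assume the 20 flips are independent. The sequence contains 14 heads and 6 tails, so its likelihood depends only on those counts. Evaluating it at the three candidates gives the values below, rounded to three significant figures:

$$
\begin{aligned}
L(p) &= p^{14}(1-p)^{6},\\
L(0.2) &= 0.2^{14} \cdot 0.8^{6} \approx 4.29 \times 10^{-11},\\
L(0.3) &= 0.3^{14} \cdot 0.7^{6} \approx 5.63 \times 10^{-9},\\
L(0.8) &= 0.8^{14} \cdot 0.2^{6} \approx 2.81 \times 10^{-6}.
\end{aligned}
$$

$L(0.8)$ is roughly 500 times larger than $L(0.3)$, so $p = 0.8$ is the maximum likelihood choice among the three candidates.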

#### Example in R
```r
# Possible values of p
p_values <- c(0.2, 0.3, 0.8)

# Observed data: 1 for heads, 0 for tails
observed_data <- c(1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0)

# Function to calculate likelihood
calculate_likelihood <- function(p, data) {
  n_heads <- sum(data)
  n_tails <- length(data) - n_heads
  likelihood <- p^n_heads * (1 - p)^n_tails
  return(likelihood)
}

# Calculate likelihood for each p value
likelihoods <- sapply(p_values, calculate_likelihood, data = observed_data)
names(likelihoods) <- p_values
likelihoods

# Determine the p value with the highest likelihood
best_p <- p_values[which.max(likelihoods)]
cat(sprintf("The most likely value of p is: %.1f\n", best_p))
```

The most likely value of p is: 0.8
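In practice, likelihoods are usually compared on the log scale. With many observations, the raw product of probabilities can underflow to 0 in floating point. Summing log-probabilities avoids this, and because the logarithm is increasing, the maximizing value of $p$ does not change. The sketch below redoes the comparison this way, using base R's `dbinom()` with `size = 1` (the Bernoulli case) and `log = TRUE`. The function name `log_likelihood` is a hypothetical choice for this illustration.

```r
# Observed data from above: 1 = heads, 0 = tails
observed_data <- c(1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0)
p_values <- c(0.2, 0.3, 0.8)

# Hypothetical helper: Bernoulli log-likelihood, i.e. the sum of
# log P(X_i = x_i) computed with dbinom(size = 1, log = TRUE)
log_likelihood <- function(p, data) {
  sum(dbinom(data, size = 1, prob = p, log = TRUE))
}

# Log-likelihood for each candidate p
log_liks <- sapply(p_values, log_likelihood, data = observed_data)
names(log_liks) <- p_values
print(log_liks)

# The log is increasing, so the argmax matches the raw-likelihood result
best_p_log <- p_values[which.max(log_liks)]
cat(sprintf("The most likely value of p (log scale) is: %.1f\n", best_p_log))
```

Exponentiating any entry of `log_liks` recovers the corresponding raw likelihood from the previous cell.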


### Simplified Example with Two Flips
Now suppose we flip the coin only twice, and consider three possible observed sequences:

- Sequence 1: Heads, Heads
- Sequence 2: Heads, Tails
- Sequence 3: Tails, Tails
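With two independent flips, each sequence has a simple likelihood. The table below evaluates it at the three candidate values. In each row, the largest entry is the most likely value of $p$ for that sequence:

$$
\begin{array}{lccc}
\text{Sequence, } L(p) & L(0.2) & L(0.3) & L(0.8) \\
\hline
\text{HH: } p^2 & 0.04 & 0.09 & 0.64 \\
\text{HT: } p(1-p) & 0.16 & 0.21 & 0.16 \\
\text{TT: } (1-p)^2 & 0.64 & 0.49 & 0.04
\end{array}
$$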

#### Likelihood Calculation for Two Flips

```r
# Reuses p_values and calculate_likelihood() from the previous example
# Observed data for two flips
observed_data_1 <- c(1, 1)  # Heads, Heads
observed_data_2 <- c(1, 0)  # Heads, Tails
observed_data_3 <- c(0, 0)  # Tails, Tails

# Calculate likelihood for each sequence
likelihoods_1 <- sapply(p_values, calculate_likelihood, data = observed_data_1)
likelihoods_2 <- sapply(p_values, calculate_likelihood, data = observed_data_2)
likelihoods_3 <- sapply(p_values, calculate_likelihood, data = observed_data_3)

# Determine the p value with the highest likelihood for each sequence
best_p_1 <- p_values[which.max(likelihoods_1)]
best_p_2 <- p_values[which.max(likelihoods_2)]
best_p_3 <- p_values[which.max(likelihoods_3)]

cat(sprintf("The most likely value of p for sequence 1 (Heads, Heads) is: %.1f\n", best_p_1))
cat(sprintf("The most likely value of p for sequence 2 (Heads, Tails) is: %.1f\n", best_p_2))
cat(sprintf("The most likely value of p for sequence 3 (Tails, Tails) is: %.1f\n", best_p_3))
```

The most likely value of p for sequence 1 (Heads, Heads) is: 0.8

The most likely value of p for sequence 2 (Heads, Tails) is: 0.3

The most likely value of p for sequence 3 (Tails, Tails) is: 0.2


## Conclusion
- When the data are (0, 0), the most likely value of $p$ is 0.2.
- When the data are (0, 1) or (1, 0), the most likely value of $p$ is 0.3.
- When the data are (1, 1), the most likely value of $p$ is 0.8.
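The mixed case can look surprising at first, and a short sketch shows why the answer is 0.3. For one head and one tail in either order, the likelihood is $p(1-p)$. This curve is symmetric about $p = 0.5$ and peaks there. Among the three allowed values, 0.3 is the closest to 0.5, while 0.2 and 0.8 tie at 0.16. The function name `lik_ht` is a hypothetical choice for this illustration.

```r
# Hypothetical helper: likelihood of one head and one tail, p * (1 - p)
lik_ht <- function(p) p * (1 - p)

# Restricted to the lesson's three candidates: values 0.16, 0.21, 0.16
candidates <- c(0.2, 0.3, 0.8)
print(setNames(lik_ht(candidates), candidates))
best_candidate <- candidates[which.max(lik_ht(candidates))]

# Over a fine grid on [0, 1], the same curve peaks at p = 0.5
grid <- seq(0, 1, by = 0.01)
grid_peak <- grid[which.max(lik_ht(grid))]
cat(sprintf("Best candidate: %.1f, unrestricted grid peak: %.2f\n",
            best_candidate, grid_peak))
```

So the restriction to three candidate values, not the data alone, is what pushes the answer to 0.3 rather than 0.5.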

## Summary
Maximum Likelihood Estimation (MLE) is a powerful method for estimating parameters of a population distribution. By calculating the likelihood of observed data for different parameter values, we can identify the parameter value that makes the data most likely. This method is widely applicable and forms the basis for many statistical analyses.
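The recipe in the summary can be packaged as a small grid search: evaluate the log-likelihood at every candidate value and return the maximizer. This is a minimal sketch that assumes a one-parameter model and independent observations. The names `grid_mle` and `bernoulli_log_density` are hypothetical and chosen only for this illustration. The function works for the lesson's three candidates and also for a fine grid. On the 20-flip data, the fine grid lands at 0.7, which is 14/20, the proportion of heads in the sample.

```r
# Hypothetical helper: grid-search MLE for a one-parameter model.
# log_density(x, theta) must return the log-probability of each observation.
grid_mle <- function(data, candidates, log_density) {
  log_liks <- sapply(candidates, function(theta) sum(log_density(data, theta)))
  candidates[which.max(log_liks)]
}

# Hypothetical Bernoulli log-density for 0/1 coin flips
bernoulli_log_density <- function(x, p) dbinom(x, size = 1, prob = p, log = TRUE)

observed_data <- c(1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0)

# Restricted to the three candidates used in this lesson
best_three <- grid_mle(observed_data, c(0.2, 0.3, 0.8), bernoulli_log_density)

# Over a fine grid of p in (0, 1)
best_grid <- grid_mle(observed_data, seq(0.01, 0.99, by = 0.01), bernoulli_log_density)

cat(sprintf("Three candidates: %.1f, fine grid: %.2f\n", best_three, best_grid))
```

A grid keeps the example close to the lesson's "try each value" approach. It trades precision for simplicity, because the answer can only be as fine as the grid spacing.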

This concludes the introduction to MLE. In the next lessons, starting with [mod2_summarytranscript_L2_Formalizing_MLEs.ipynb](mod2_summarytranscript_L2_Formalizing_MLEs.ipynb), we will explore more advanced topics and applications of MLE in statistical inference.