# Mod2/L2 Formalizing Maximum Likelihood Estimators (MLEs)

## Introduction
In this lesson, we formalize the concept of Maximum Likelihood Estimators (MLEs). We aim to estimate a parameter $\theta$ from a random sample $X_1, X_2, \ldots, X_n$ that is independent and identically distributed (iid).

## Concept
The idea behind MLE is to find the value of $\theta$ that makes the observed data most likely. For discrete data, we maximize the joint probability of the observed data. For continuous data, we maximize the joint probability density function (PDF) evaluated at the observed data.
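As a small illustration of "most likely", the sketch below compares a few candidate parameter values on made-up 0/1 data. The data vector, the candidate values, and the helper name `joint_prob` are all assumptions chosen for illustration; they are not part of the lesson.

```r
# Hypothetical data: 7 successes in 10 independent 0/1 trials
x <- c(1, 1, 0, 1, 1, 0, 1, 1, 0, 1)

# Joint probability of the observed sequence for a candidate success probability p
joint_prob <- function(p, x) prod(p^x * (1 - p)^(1 - x))

# Evaluate a handful of candidate values of p
candidates <- c(0.3, 0.5, 0.7, 0.9)
probs <- sapply(candidates, joint_prob, x = x)
data.frame(p = candidates, joint_prob = probs)

# Candidate under which the observed data are most probable
best_p <- candidates[which.max(probs)]
best_p
```

Among these four candidates, the joint probability is largest at the one equal to the observed proportion of successes, which is the intuition the formal steps below make precise.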

## Notation
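- **Probability Density Function (PDF)**: $f(x; \theta)$ (for discrete data, read this as the probability mass function)
- **Joint PDF for the sample**: $f(X_1, X_2, \ldots, X_n; \theta) = \prod_{i=1}^n f(X_i; \theta)$, where the product form follows from independence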

## Steps to Find MLE

### 1. Define the Likelihood Function
The likelihood function $L(\theta)$ is the joint PDF viewed as a function of $\theta$:
$$ L(\theta) = f(X_1, X_2, \ldots, X_n; \theta) = \prod_{i=1}^n f(X_i; \theta) $$

### 2. Log-Likelihood Function
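To simplify the maximization, we take the natural logarithm of the likelihood function. Because the logarithm is strictly increasing, $L(\theta)$ and $\log L(\theta)$ are maximized at the same value of $\theta$, and the product becomes a sum: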
$$ \log L(\theta) = \sum_{i=1}^n \log f(X_i; \theta) $$
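Besides simplifying the calculus, working on the log scale also helps numerically. The sketch below is an illustration under assumed settings (2000 simulated Bernoulli draws, evaluated at $p = 0.6$): the raw product of probabilities underflows to zero in double precision, while the sum of logs stays finite.

```r
# Assumed setup for illustration: 2000 Bernoulli(0.6) draws
set.seed(1)
x <- rbinom(2000, size = 1, prob = 0.6)
p <- 0.6

# Likelihood as a raw product of probabilities: every factor is below 1,
# so the product of 2000 of them is too small to represent as a double
likelihood <- prod(dbinom(x, size = 1, prob = p))

# Log-likelihood as a sum of log-probabilities
log_lik <- sum(dbinom(x, size = 1, prob = p, log = TRUE))

likelihood  # 0 (underflow)
log_lik     # a finite negative number
```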

### 3. Maximize the Log-Likelihood
Find the value of $\theta$ that maximizes the log-likelihood function. This value, denoted $\hat{\theta}$, is the MLE.
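When the log-likelihood is differentiable and its maximum lies in the interior of the parameter space (an assumption that can fail, for example when the maximum sits on a boundary), this step usually amounts to solving the first-order condition

$$ \frac{d}{d\theta} \log L(\theta) = \sum_{i=1}^n \frac{\partial}{\partial \theta} \log f(X_i; \theta) = 0 $$

and then confirming that the solution $\hat{\theta}$ is a maximum rather than a minimum, for instance by checking

$$ \left. \frac{d^2}{d\theta^2} \log L(\theta) \right|_{\theta = \hat{\theta}} < 0 $$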

## Example: Bernoulli Distribution
Suppose we have a random sample from a Bernoulli distribution with parameter $p$.

### Likelihood Function
$$ L(p) = \prod_{i=1}^n p^{x_i} (1-p)^{1-x_i} $$

### Log-Likelihood Function
$$ \log L(p) = \sum_{i=1}^n \left[ x_i \log p + (1 - x_i) \log (1 - p) \right] $$

### Maximizing the Log-Likelihood
To find the MLE for $p$, take the derivative of the log-likelihood with respect to $p$ and set it to zero:
$$ \frac{d}{dp} \log L(p) = \sum_{i=1}^n \left[ \frac{x_i}{p} - \frac{1 - x_i}{1 - p} \right] = 0 $$

Solving for $p$, we get:
$$ \hat{p} = \frac{\sum_{i=1}^n x_i}{n} $$
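The algebra between the two displays above can be spelled out as follows. This is a sketch that assumes $0 < \sum_{i=1}^n x_i < n$, so that the solution lies strictly between 0 and 1:

$$ \frac{\sum_{i=1}^n x_i}{p} = \frac{n - \sum_{i=1}^n x_i}{1 - p} \;\Longrightarrow\; (1 - p) \sum_{i=1}^n x_i = p \left( n - \sum_{i=1}^n x_i \right) \;\Longrightarrow\; \sum_{i=1}^n x_i = n p $$

The second derivative is negative for every $p$ in $(0, 1)$, so this critical point is a maximum:

$$ \frac{d^2}{dp^2} \log L(p) = -\frac{\sum_{i=1}^n x_i}{p^2} - \frac{n - \sum_{i=1}^n x_i}{(1 - p)^2} < 0 $$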

### Example in R

```r
# Generate a random sample from a Bernoulli distribution
set.seed(123)
n <- 100
p_true <- 0.6
sample_data <- rbinom(n, size = 1, prob = p_true)

# Log-likelihood of p for 0/1 data
log_likelihood <- function(p, data) {
  sum(data * log(p) + (1 - data) * log(1 - p))
}

# Starting value for p (optim() requires one; the closed-form MLE is used here)
initial_p <- mean(sample_data)

# Maximize the log-likelihood; fnscale = -1 turns optim()'s minimization into maximization.
# The bounds stay strictly inside (0, 1) because log(0) is -Inf.
eps <- 1e-6
optim_result <- optim(initial_p, log_likelihood, data = sample_data,
                      method = "Brent", lower = eps, upper = 1 - eps,
                      control = list(fnscale = -1))
optim_result$par
```
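
As a sanity check, the numerical maximizer can be compared with the closed-form estimate $\hat{p} = \frac{1}{n} \sum_{i=1}^n x_i$. The sketch below regenerates the same simulated data so that it runs on its own, and it swaps in base R's `optimize()` in place of `optim()` purely for brevity; the variable names are illustrative.

```r
# Same simulated data as in the example above
set.seed(123)
sample_data <- rbinom(100, size = 1, prob = 0.6)

# Bernoulli log-likelihood
log_likelihood <- function(p, data) {
  sum(data * log(p) + (1 - data) * log(1 - p))
}

# One-dimensional maximization over an interval strictly inside (0, 1)
numeric_fit <- optimize(log_likelihood, interval = c(1e-6, 1 - 1e-6),
                        data = sample_data, maximum = TRUE)

p_hat_numeric <- numeric_fit$maximum
p_hat_closed  <- mean(sample_data)   # closed-form MLE: the sample proportion

c(numeric = p_hat_numeric, closed_form = p_hat_closed)
abs(p_hat_numeric - p_hat_closed)    # small, limited by the optimizer's tolerance
```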

## Example: Exponential Distribution
Suppose we have a random sample from the exponential distribution with rate $\lambda$.

### Likelihood Function
$$ L(\lambda) = \prod_{i=1}^n \lambda e^{-\lambda X_i} = \lambda^n e^{-\lambda \sum_{i=1}^n X_i} $$

### Log-Likelihood Function
$$ \log L(\lambda) = n \log \lambda - \lambda \sum_{i=1}^n X_i $$

### Maximizing the Log-Likelihood
To find the MLE for $\lambda$, take the derivative of the log-likelihood with respect to $\lambda$ and set it to zero:
$$ \frac{d}{d\lambda} \log L(\lambda) = \frac{n}{\lambda} - \sum_{i=1}^n X_i = 0 $$

Solving for $\lambda$, we get:
$$ \hat{\lambda} = \frac{n}{\sum_{i=1}^n X_i} $$
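
A quick check that this critical point is a maximum: the second derivative of the log-likelihood is negative for every $\lambda > 0$,

$$ \frac{d^2}{d\lambda^2} \log L(\lambda) = -\frac{n}{\lambda^2} < 0 $$

Writing $\bar{X} = \frac{1}{n} \sum_{i=1}^n X_i$ for the sample mean, the estimator can equivalently be expressed as

$$ \hat{\lambda} = \frac{1}{\bar{X}} $$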

### Example in R

```r
# Generate a random sample from an exponential distribution
set.seed(123)
n <- 100
lambda_true <- 2
sample_data <- rexp(n, rate = lambda_true)

# Log-likelihood of the rate lambda
log_likelihood <- function(lambda, data) {
  n <- length(data)
  n * log(lambda) - lambda * sum(data)
}

# Starting value for lambda (optim() requires one; the closed-form MLE is used here)
initial_lambda <- 1 / mean(sample_data)

# Maximize the log-likelihood; fnscale = -1 turns optim()'s minimization into maximization.
# The lower bound is slightly above 0 because log(0) is -Inf.
optim_result <- optim(initial_lambda, log_likelihood, data = sample_data,
                      method = "Brent", lower = 1e-6, upper = 10,
                      control = list(fnscale = -1))
optim_result$par
```
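
The closed-form estimate $\hat{\lambda} = n / \sum_{i=1}^n X_i$ can be checked in two ways: the derivative of the log-likelihood (the score) should be approximately zero there, and a numerical maximizer should land close to it. The sketch below regenerates the same simulated data so that it runs on its own; the helper name `score` and the use of `optimize()` are illustrative choices, not part of the lesson's code.

```r
# Same simulated data as in the example above
set.seed(123)
sample_data <- rexp(100, rate = 2)

# Exponential log-likelihood and its derivative with respect to lambda
log_likelihood <- function(lambda, data) {
  length(data) * log(lambda) - lambda * sum(data)
}
score <- function(lambda, data) length(data) / lambda - sum(data)

# Closed-form MLE: n / sum(x_i), i.e. one over the sample mean
lambda_hat_closed <- 1 / mean(sample_data)

# Numerical maximization over an assumed search interval
numeric_fit <- optimize(log_likelihood, interval = c(1e-6, 10),
                        data = sample_data, maximum = TRUE)
lambda_hat_numeric <- numeric_fit$maximum

c(numeric = lambda_hat_numeric, closed_form = lambda_hat_closed)
score(lambda_hat_closed, sample_data)  # approximately 0 at the MLE
```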

## Conclusion
In this lesson, we formalized the concept of Maximum Likelihood Estimators (MLEs) by defining the likelihood function, taking the log-likelihood, and maximizing it to find the parameter estimates. For the Bernoulli and exponential examples, the maximization has a closed-form solution: $\hat{p} = \frac{1}{n} \sum_{i=1}^n x_i$ and $\hat{\lambda} = n / \sum_{i=1}^n X_i$. The same three steps apply to other parametric models, which is why the method underlies many statistical analyses.

In the next lessons (refer to [mod2_summarytranscript_L3_AdvancedMLE.ipynb](mod2_summarytranscript_L3_AdvancedMLE.ipynb)), we will explore more advanced topics and applications of MLE in statistical inference.