# Estimating Regression Coefficients Using Maximum Likelihood for Logistic Regression

## Theory

In logistic regression, we use the logistic function to model the probability of a binary outcome. With a single predictor $X$, the model is:

$$p(X) = \frac{1}{1 + e^{-(b_0 + b_1X)}}$$

where $p(X)$ represents the probability of the positive class (usually denoted as 1) given the input $X$. The logistic function maps the input values to a probability between 0 and 1.

The coefficients $b_0$ and $b_1$ determine the position and shape of the sigmoid curve:
- $b_0$ is the intercept term. It shifts the curve horizontally: the curve crosses $p(X) = 0.5$ at $X = -b_0 / b_1$.
- $b_1$ is the slope term. Its magnitude sets the steepness of the curve, and its sign sets whether the probability rises or falls as $X$ increases.
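
To make the roles of the two coefficients concrete, the following sketch evaluates the curve for a few hand-picked coefficient pairs. The helper name `predict_proba` and the coefficient values are illustrative assumptions, not part of the implementation later in this section. It checks two facts that follow from the formula: the curve passes through 0.5 at $X = -b_0 / b_1$, and its slope at that point is $b_1 / 4$.

```python
import numpy as np

def predict_proba(x, b0, b1):
    """Logistic curve p(X) = 1 / (1 + exp(-(b0 + b1 * X)))."""
    return 1.0 / (1.0 + np.exp(-(b0 + b1 * x)))

# Hypothetical coefficient pairs chosen only for illustration.
for b0, b1 in [(-1.0, 0.5), (-3.0, 0.5), (-1.0, 2.0)]:
    midpoint = -b0 / b1  # input value at which p(X) = 0.5
    h = 1e-5
    # Central-difference estimate of the slope at the midpoint.
    slope = (predict_proba(midpoint + h, b0, b1)
             - predict_proba(midpoint - h, b0, b1)) / (2 * h)
    print(f"b0={b0}, b1={b1}: midpoint={midpoint:.2f}, "
          f"p(midpoint)={predict_proba(midpoint, b0, b1):.2f}, slope={slope:.3f}")
```

Changing $b_0$ from -1 to -3 moves the midpoint from 2 to 6 without changing the slope, while raising $b_1$ from 0.5 to 2 steepens the slope at the midpoint from 0.125 to 0.5.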

Treating the $n$ observations as independent Bernoulli trials, the likelihood of the coefficients is:

$$L(b_0, b_1) = \prod_{i=1}^{n} p(X_i)^{Y_i}(1 - p(X_i))^{1 - Y_i}$$

where $Y_i$ is the observed class label (0 or 1) for the $i$-th data point, and $p(X_i)$ is the predicted probability of the positive class for the $i$-th data point.
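
As a small worked example, the sketch below evaluates this product on a made-up four-point data set for two candidate coefficient pairs. The function name `likelihood`, the toy data, and the coefficient values are assumptions introduced here for illustration.

```python
import numpy as np

def likelihood(X, Y, b0, b1):
    """Product of Bernoulli terms p^Y * (1 - p)^(1 - Y) over all data points."""
    p = 1.0 / (1.0 + np.exp(-(b0 + b1 * X)))
    return np.prod(p**Y * (1 - p)**(1 - Y))

# Made-up toy data: the label switches from 0 to 1 as X grows.
X_toy = np.array([0.0, 1.0, 2.0, 3.0])
Y_toy = np.array([0, 0, 1, 1])

flat = likelihood(X_toy, Y_toy, 0.0, 0.0)     # p = 0.5 for every point
fitted = likelihood(X_toy, Y_toy, -1.5, 1.0)  # curve rising through X = 1.5

print(f"L(0, 0)      = {flat:.4f}")
print(f"L(-1.5, 1.0) = {fitted:.4f}")
```

With $b_0 = b_1 = 0$ every point gets probability 0.5, so the likelihood is $0.5^4 = 0.0625$. The second pair assigns higher probability to the observed labels and yields a likelihood of roughly 0.26, which is why maximum likelihood would prefer it.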

The log-likelihood function is obtained by taking the logarithm of the likelihood function:

$$\ell(b_0, b_1) = \sum_{i=1}^{n} [Y_i \log(p(X_i)) + (1 - Y_i) \log(1 - p(X_i))]$$

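The log-likelihood is usually maximized instead of the likelihood. Because the logarithm is strictly increasing, both functions reach their maximum at the same coefficients, but the sum is easier to differentiate and does not underflow to zero the way a product of many probabilities below 1 can.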
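
The numerical point can be illustrated with a short sketch. It assumes 2000 observations that each receive a predicted probability of 0.5; both numbers are arbitrary choices made for the demonstration.

```python
import numpy as np

n = 2000
p = np.full(n, 0.5)  # hypothetical per-observation probabilities

product = np.prod(p)         # 0.5**2000 is smaller than any positive float64
log_sum = np.sum(np.log(p))  # equals 2000 * log(0.5), easily representable

print(product)  # 0.0
print(log_sum)
```

The product underflows to exactly zero in double precision, which would leave an optimizer with nothing to compare, whereas the sum of logarithms stays a finite number near -1386.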

The log-likelihood has no closed-form maximizer in $b_0$ and $b_1$, so we rely on numerical optimization methods such as gradient ascent. The gradients of the log-likelihood with respect to $b_0$ and $b_1$ are:

$$\frac{\partial \ell}{\partial b_0} = \sum_{i=1}^{n} (Y_i - p(X_i))$$
$$\frac{\partial \ell}{\partial b_1} = \sum_{i=1}^{n} (Y_i - p(X_i))X_i$$
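
These expressions follow from the chain rule. As a sketch of the derivation, write $z_i = b_0 + b_1 X_i$ and $p_i = p(X_i)$ (shorthand introduced here for brevity). The logistic function satisfies:

$$\frac{dp_i}{dz_i} = p_i(1 - p_i)$$

Differentiating the $i$-th term of the log-likelihood with respect to $z_i$ then gives:

$$\frac{\partial \ell}{\partial z_i} = \frac{Y_i}{p_i}\,p_i(1 - p_i) - \frac{1 - Y_i}{1 - p_i}\,p_i(1 - p_i) = Y_i(1 - p_i) - (1 - Y_i)\,p_i = Y_i - p_i$$

Since $\partial z_i / \partial b_0 = 1$ and $\partial z_i / \partial b_1 = X_i$, summing over $i$ yields the two gradients above.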

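These gradients are used to update the coefficients iteratively. Each step moves a coefficient in the direction of its gradient, $b_j \leftarrow b_j + \eta \, \partial \ell / \partial b_j$, where $\eta$ is the learning rate, and the process repeats until the updates become negligible or a specified number of steps is reached.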
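
One way to gain confidence in the analytic gradients is to compare them with central finite differences of the log-likelihood. The sketch below does this on made-up data at an arbitrary point; the function `analytic_gradient`, the toy arrays, and the step size `h` are assumptions for illustration, and `log_likelihood` is restated only so that the block runs on its own.

```python
import numpy as np

def log_likelihood(X, Y, b0, b1):
    p = 1.0 / (1.0 + np.exp(-(b0 + b1 * X)))
    return np.sum(Y * np.log(p) + (1 - Y) * np.log(1 - p))

def analytic_gradient(X, Y, b0, b1):
    """Gradients from the formulas: sum(Y - p) and sum((Y - p) * X)."""
    p = 1.0 / (1.0 + np.exp(-(b0 + b1 * X)))
    return np.sum(Y - p), np.sum((Y - p) * X)

# Made-up data and an arbitrary evaluation point, for illustration only.
X_toy = np.array([0.5, 1.0, 2.5, 4.0, 6.0])
Y_toy = np.array([0, 0, 1, 0, 1])
b0, b1, h = -0.3, 0.2, 1e-6

g0, g1 = analytic_gradient(X_toy, Y_toy, b0, b1)

# Central differences: perturb one coefficient at a time by +/- h.
num_g0 = (log_likelihood(X_toy, Y_toy, b0 + h, b1)
          - log_likelihood(X_toy, Y_toy, b0 - h, b1)) / (2 * h)
num_g1 = (log_likelihood(X_toy, Y_toy, b0, b1 + h)
          - log_likelihood(X_toy, Y_toy, b0, b1 - h)) / (2 * h)

print(f"d/db0: analytic={g0:.6f}, numeric={num_g0:.6f}")
print(f"d/db1: analytic={g1:.6f}, numeric={num_g1:.6f}")
```

If the two columns agree to several decimal places, the gradient formulas and the log-likelihood are consistent with each other.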

## Python Implementation

```python

import numpy as np

def logistic(x):
    return 1 / (1 + np.exp(-x))

def log_likelihood(X, Y, b0, b1):
    """Bernoulli log-likelihood of (b0, b1) for inputs X and 0/1 labels Y."""
    p = logistic(b0 + b1 * X)  # predicted probability of class 1
    return np.sum(Y * np.log(p) + (1 - Y) * np.log(1 - p))

# Gradient ascent settings
steps = 100
learning_rate = 0.01
b0, b1 = 0.0, 0.0  # Initial guesses

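# Simulate binary labels from a known model (true b0 = -1, true b1 = 0.5).
# No random seed is set, so the labels and the estimates differ between runs.
X_logistic = np.linspace(0, 10, num=100)
Y_logistic = np.random.binomial(1, logistic(0.5 * X_logistic - 1))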

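# Gradient ascent: repeatedly step uphill on the log-likelihood
for step in range(steps):
    logits = b0 + b1 * X_logistic
    predictions = logistic(logits)
    # Partial derivatives of the log-likelihood with respect to b0 and b1
    gradient_b0 = np.sum(Y_logistic - predictions)
    gradient_b1 = np.sum((Y_logistic - predictions) * X_logistic)
    # Move each coefficient in the direction that increases the log-likelihood
    b0 += learning_rate * gradient_b0
    b1 += learning_rate * gradient_b1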

print(f"Estimated coefficients (Maximum Likelihood): b0 = {b0}, b1 = {b1}")

# Output from one run (values vary between runs because the data are random):
# Estimated coefficients (Maximum Likelihood): b0 = -1.7479414557849138, b1 = 0.4330671386606252
