#  Logistic Regression
---
## Intuition

> **Logistic Regression = Linear Regression + Sigmoid + Log Loss**
---
## Equation
$$
\hat{y} = \frac{1}{1 + e^{-(XW + b)}}
$$

---
## Sigmoid Function

![Sigmoid Function](https://upload.wikimedia.org/wikipedia/commons/8/88/Logistic-curve.svg)


---

##  Problem Setup

We are given:

- **X** → Input features (shape: m × n, i.e. m samples and n features)
- **y** → Target labels (0 or 1), shape: m × 1
- **W** → Weights (shape: n × 1)
- **b** → Bias (scalar)

### Goal:
Predict the probability:

$$
P(y = 1 \mid X)
$$

---

##  Linear Model (Same as Linear Regression)

$$
z = XW + b
$$

z can take any value from **−∞ to +∞**
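
As a small illustration of the shapes involved, the sketch below evaluates the linear model on made-up numbers (the values of `X`, `W` and `b` are hypothetical, chosen only to keep the arithmetic easy to follow):

```python
import numpy as np

# Hypothetical toy data: m = 3 samples, n = 2 features.
X = np.array([[1.0, 2.0],
              [0.0, -1.0],
              [3.0, 0.5]])
W = np.array([[0.5],
              [-1.0]])   # one weight per feature, shape (n, 1)
b = 0.25                 # scalar bias, broadcast across all rows

z = X @ W + b            # one unbounded score per sample, shape (m, 1)
print(z.shape)
print(z.ravel())
```

Each row of `z` is an unbounded score for one sample; nothing yet restricts it to a probability.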

---

##  Sigmoid Function (Key Difference)

To convert `z` into probability, we apply **Sigmoid**:

$$
\sigma(z) = \frac{1}{1 + e^{-z}}
$$

$$
\hat{y} = \sigma(z)
$$

### Properties:
- Output range → **(0, 1)**
- Large positive `z` → probability ≈ **1**
- Large negative `z` → probability ≈ **0**
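
The properties above can be checked numerically. This is a minimal sketch using a plain NumPy sigmoid (the sample logits are arbitrary):

```python
import numpy as np

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z)), applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])   # arbitrary sample logits
p = sigmoid(z)

for zi, pi in zip(z, p):
    print(f"z = {zi:6.1f} -> sigmoid(z) = {pi:.6f}")
```

The output stays strictly between 0 and 1, equals 0.5 at `z = 0`, and increases with `z`.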

---

##  Decision Rule (Classification)

$$
\text{Prediction} =
\begin{cases}
1 & \text{if } \hat{y} \ge 0.5 \\
0 & \text{if } \hat{y} < 0.5
\end{cases}
$$

The default threshold is `0.5`, but it can be tuned.
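
A short sketch of the decision rule on made-up probabilities; the stricter threshold of `0.9` is only an illustration of tuning, not a recommended value:

```python
import numpy as np

# Hypothetical predicted probabilities for four samples.
y_hat = np.array([0.10, 0.50, 0.49, 0.93])

# Default rule: predict 1 when the probability is at least 0.5.
default_pred = (y_hat >= 0.5).astype(int)

# A stricter, illustrative threshold yields fewer positive predictions.
strict_pred = (y_hat >= 0.9).astype(int)

print(default_pred)
print(strict_pred)
```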

---

##  Why NOT Mean Squared Error (MSE)?

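Using MSE on top of the sigmoid leads to:
- Slow convergence: the gradient carries an extra factor `ŷ(1 − ŷ)`, which shrinks toward 0 whenever the sigmoid saturates
- A non-convex cost in `W` and `b`

Logistic Regression uses **Log Loss** instead.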
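
To make the convergence point concrete, the sketch below compares the gradient with respect to `z` under both losses for a single, confidently wrong sample. The logit `z = -10` and the unscaled squared error `(ŷ − y)²` are illustrative choices:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

y = 1.0      # true label
z = -10.0    # confidently wrong logit (illustrative value)
y_hat = sigmoid(z)

# Squared error (y_hat - y)^2: the chain rule adds the sigmoid slope
# y_hat * (1 - y_hat), which is close to 0 when the sigmoid saturates.
grad_mse = 2.0 * (y_hat - y) * y_hat * (1.0 - y_hat)

# Log loss: the sigmoid slope cancels, leaving only the prediction error.
grad_log = y_hat - y

print(f"dL/dz with MSE:      {grad_mse:.2e}")
print(f"dL/dz with log loss: {grad_log:.2e}")
```

Although the prediction is badly wrong, the MSE gradient is tiny, while the log-loss gradient stays close to −1 and keeps pushing the parameters in the right direction.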

---

## Loss Function (Binary Cross-Entropy)

For a **single data point**:

$$
L(y, \hat{y}) =
-\left[
y \log(\hat{y}) +
(1 - y)\log(1 - \hat{y})
\right]
$$

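This loss penalizes **confident wrong predictions heavily**: as `ŷ` approaches 0 while `y = 1` (or approaches 1 while `y = 0`), the loss grows without bound.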
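
A quick numeric illustration for a sample whose true label is 1 (the three probabilities are made up):

```python
import numpy as np

def log_loss_single(y, y_hat):
    """Binary cross-entropy for one sample."""
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

confident_right = log_loss_single(1, 0.99)   # about 0.01
unsure = log_loss_single(1, 0.50)            # about 0.69
confident_wrong = log_loss_single(1, 0.01)   # about 4.61

print(confident_right, unsure, confident_wrong)
```

The confident wrong prediction costs several hundred times more than the confident correct one.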

---

## Cost Function (Entire Dataset)

$$
J(W, b) =
-\frac{1}{m}
\sum_{i=1}^{m}
\left[
y^{(i)}\log(\hat{y}^{(i)}) +
(1 - y^{(i)})\log(1 - \hat{y}^{(i)})
\right]
$$

This function is **convex** in `W` and `b` → any local minimum is also a global minimum, so gradient descent cannot get trapped in a worse one.
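
Convexity can be spot-checked numerically. The sketch below is a sanity check and not a proof: it draws random parameter pairs on a synthetic dataset and tests that the cost at the midpoint never exceeds the average of the two endpoint costs. The data, the seed and the stacked parameter vector `theta` are all assumptions made for this example.

```python
import numpy as np

def cost(theta, X, y, eps=1e-15):
    """Mean log loss; theta stacks the weights, with the bias as its last entry."""
    z = X @ theta[:-1] + theta[-1]
    y_hat = np.clip(1.0 / (1.0 + np.exp(-z)), eps, 1 - eps)
    return float(-np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat)))

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))        # synthetic features
y = rng.integers(0, 2, size=50)     # synthetic 0/1 labels

violations = 0
for _ in range(1000):
    theta_a = rng.normal(size=4)
    theta_b = rng.normal(size=4)
    midpoint_cost = cost((theta_a + theta_b) / 2, X, y)
    average_cost = (cost(theta_a, X, y) + cost(theta_b, X, y)) / 2
    if midpoint_cost > average_cost + 1e-12:   # small tolerance for rounding
        violations += 1

print("midpoint-convexity violations:", violations)
```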

---

##  Gradient Descent (Training)

### Gradients:

$$
\frac{\partial J}{\partial W} =
\frac{1}{m} X^T (\hat{y} - y)
$$

$$
\frac{\partial J}{\partial b} =
\frac{1}{m} \sum (\hat{y} - y)
$$
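
One way to gain confidence in these formulas is a finite-difference check. The sketch below compares the analytic gradients with central differences on synthetic data; the helper names, the seed and the step size `h` are assumptions made for this example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(X, y, W, b):
    y_hat = sigmoid(X @ W + b)
    return float(-np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat)))

def gradients(X, y, W, b):
    """Analytic gradients: dW = X^T (y_hat - y) / m, db = sum(y_hat - y) / m."""
    m = X.shape[0]
    error = sigmoid(X @ W + b) - y
    return X.T @ error / m, float(np.sum(error) / m)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = rng.integers(0, 2, size=(20, 1)).astype(float)
W = rng.normal(size=(3, 1))
b = 0.1

dW, db = gradients(X, y, W, b)

# Central differences: perturb one parameter at a time by +/- h.
h = 1e-6
dW_numeric = np.zeros_like(W)
for j in range(W.shape[0]):
    step = np.zeros_like(W)
    step[j, 0] = h
    dW_numeric[j, 0] = (cost(X, y, W + step, b) - cost(X, y, W - step, b)) / (2 * h)
db_numeric = (cost(X, y, W, b + h) - cost(X, y, W, b - h)) / (2 * h)

print(np.max(np.abs(dW - dW_numeric)), abs(db - db_numeric))
```

Both differences should be close to zero, limited only by floating-point rounding.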

---

## Parameter Update Rules

$$
W := W - \alpha \frac{\partial J}{\partial W}
$$

$$
b := b - \alpha \frac{\partial J}{\partial b}
$$

Where:
- **α** → Learning rate
- Training runs for multiple **epochs**
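
A single update step, worked on a tiny made-up dataset with one feature, starting from `W = 0`, `b = 0` and an illustrative learning rate of `0.1`:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(X, y, W, b):
    y_hat = sigmoid(X @ W + b)
    return float(-np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat)))

# Hypothetical data: larger feature values belong to class 1.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([[0.0], [0.0], [1.0], [1.0]])
W = np.zeros((1, 1))
b = 0.0
alpha = 0.1

loss_before = cost(X, y, W, b)

# Gradients at the current parameters.
error = sigmoid(X @ W + b) - y
dW = X.T @ error / X.shape[0]
db = float(np.sum(error)) / X.shape[0]

# One gradient descent step.
W = W - alpha * dW
b = b - alpha * db

loss_after = cost(X, y, W, b)
print(W.ravel(), b, loss_before, loss_after)
```

With every prediction starting at 0.5, the weight gradient is −0.5 and the bias gradient is 0, so the step moves `W` to 0.05 and the loss drops below its starting value of ln 2.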

---

## Complete Training Flow

1. Initialize W, b
2. Repeat for each epoch:
   1. z = XW + b
   2. y_hat = sigmoid(z)
   3. Compute log loss
   4. Compute gradients
   5. Update W, b


In [0]:
import pandas as pd
import numpy as np
import math

In [0]:
def initialize_parameters(n_features: int) -> tuple[np.ndarray, float]:
    """
    Initialize parameters for Logistic Regression.

    Parameters:
    n_features (int): Number of input features (columns in X)

    Returns:
    W (numpy.ndarray): Weight vector of shape (n_features, 1)
    b (float): Bias term initialized to 0.0
    """
    # Initialize weight vector (one weight per feature)
    # Shape: (n_features, 1)
    W = np.random.rand(n_features, 1)

    # Initialize bias term (scalar)
    b = 0.0

    return W, b


In [0]:
def linear_forward(X: np.ndarray, W: np.ndarray, b: float) -> np.ndarray:
    """
    Compute the linear forward step for Logistic Regression.

    Parameters
    ----------
    X : np.ndarray
        Input feature matrix of shape (m, n),
        where m is the number of samples and n is the number of features.
    W : np.ndarray
        Weight vector of shape (n, 1).
    b : float
        Bias term (scalar).

    Returns
    -------
    np.ndarray
        Linear output Z of shape (m, 1).
    """
    # Validate dimensions
    if X.ndim != 2:
        raise ValueError("X must be a 2D array of shape (m, n)")
    if W.ndim != 2 or W.shape[1] != 1:
        raise ValueError("W must be a 2D column vector of shape (n, 1)")
    if X.shape[1] != W.shape[0]:
        raise ValueError("Number of features in X must match size of W")

    Z = X.dot(W) + b
    return Z


In [0]:
def sigmoid(Z: np.ndarray) -> np.ndarray:
    """
    Compute the sigmoid activation function in a numerically stable way.

    Parameters
    ----------
    Z : np.ndarray
        Linear input (logits).

    Returns
    -------
    np.ndarray
        Sigmoid output with values in the range (0, 1).
    """
    # Clip the logits so that np.exp never overflows. The sigmoid is already
    # saturated at |Z| = 500, so clipping does not change the result meaningfully.
    Z_clipped = np.clip(Z, -500, 500)

    mu = 1.0 / (1.0 + np.exp(-Z_clipped))
    return mu


In [0]:
def forward_propagation(X: np.ndarray, W: np.ndarray, b: float) -> np.ndarray:
    """
    Perform forward propagation for Logistic Regression.

    Parameters
    ----------
    X : np.ndarray
        Input feature matrix of shape (m, n).
    W : np.ndarray
        Weight vector of shape (n, 1).
    b : float
        Bias term (scalar).

    Returns
    -------
    np.ndarray
        Predicted probabilities y_hat of shape (m, 1).
    """
    Z = linear_forward(X, W, b)
    y_hat = sigmoid(Z)

    return y_hat


In [0]:
def compute_loss(y: np.ndarray, y_hat: np.ndarray) -> float:
    """
    Compute Binary Cross-Entropy (Log Loss) for Logistic Regression.

    Parameters
    ----------
    y : np.ndarray
        True labels of shape (m, 1).
    y_hat : np.ndarray
        Predicted probabilities of shape (m, 1).

    Returns
    -------
    float
        Scalar loss value.
    """
    if y.shape != y_hat.shape:
        raise ValueError("y and y_hat must have the same shape")

    m = y.shape[0]

    # Numerical stability: avoid log(0)
    epsilon = 1e-15
    y_hat_clipped = np.clip(y_hat, epsilon, 1 - epsilon)

    loss = - (1 / m) * np.sum(
        y * np.log(y_hat_clipped) +
        (1 - y) * np.log(1 - y_hat_clipped)
    )

    return loss


In [0]:
def compute_gradients(
    X: np.ndarray,
    y: np.ndarray,
    y_hat: np.ndarray
) -> tuple[np.ndarray, float]:
    """
    Compute gradients of the loss with respect to weights and bias.

    Parameters
    ----------
    X : np.ndarray
        Input feature matrix of shape (m, n).
    y : np.ndarray
        True labels of shape (m, 1).
    y_hat : np.ndarray
        Predicted probabilities of shape (m, 1).

    Returns
    -------
    tuple[np.ndarray, float]
        dW : Gradient w.r.t weights of shape (n, 1)
        db : Gradient w.r.t bias (scalar)
    """
    if y.shape != y_hat.shape:
        raise ValueError("y and y_hat must have the same shape")

    m = y.shape[0]

    error = y_hat - y  # (m, 1)

    dW = (1 / m) * X.T.dot(error)   # (n, 1)
    db = (1 / m) * np.sum(error)

    return dW, db


In [0]:
def update_parameters(
    W: np.ndarray,
    b: float,
    dW: np.ndarray,
    db: float,
    learning_rate: float
) -> tuple[np.ndarray, float]:
    """
    Update weights and bias using Gradient Descent.

    Parameters
    ----------
    W : np.ndarray
        Current weight vector of shape (n, 1).
    b : float
        Current bias term.
    dW : np.ndarray
        Gradient w.r.t weights of shape (n, 1).
    db : float
        Gradient w.r.t bias.
    learning_rate : float
        Learning rate (alpha).

    Returns
    -------
    tuple[np.ndarray, float]
        Updated weights and bias.
    """
    if learning_rate <= 0:
        raise ValueError("learning_rate must be positive")

    W = W - learning_rate * dW
    b = b - learning_rate * db

    return W, b


In [0]:
def train(
    X_df: pd.DataFrame,
    y_df: pd.DataFrame,
    epochs: int,
    learning_rate: float
) -> tuple[np.ndarray, float]:
    """
    Train Logistic Regression model using Gradient Descent.

    Parameters
    ----------
    X_df : pd.DataFrame
        Input features of shape (m, n).
    y_df : pd.DataFrame
        True labels (0 or 1) in a single column, shape (m, 1).
    epochs : int
        Number of training iterations.
    learning_rate : float
        Learning rate for Gradient Descent.

    Returns
    -------
    tuple[np.ndarray, float]
        Trained weights and bias.
    """

    X = X_df.values
    y = y_df.values

    if X.ndim != 2:
        raise ValueError("X must be a 2D array")
    if y.ndim != 2 or y.shape[1] != 1:
        raise ValueError("y must be a column vector of shape (m, 1)")

    n_features = X.shape[1]
    W, b = initialize_parameters(n_features)

    for epoch in range(epochs):
        # Forward propagation
        y_hat = forward_propagation(X, W, b)

        # Compute loss
        loss = compute_loss(y, y_hat)

        # Compute gradients
        dW, db = compute_gradients(X, y, y_hat)

        # Update parameters
        W, b = update_parameters(W, b, dW, db, learning_rate)

        # Optional: monitor training
        if epoch % 100 == 0:
            print(f"Epoch {epoch}, Loss: {loss:.6f}")

    return W, b


In [0]:

def predict(
    X_df: pd.DataFrame,
    W: np.ndarray,
    b: float,
    threshold: float = 0.5
) -> np.ndarray:
    """
    Predict class labels using trained Logistic Regression parameters.

    Parameters
    ----------
    X_df : pd.DataFrame
        Input features of shape (m, n).
    W : np.ndarray
        Trained weight vector of shape (n, 1).
    b : float
        Trained bias term.
    threshold : float, optional
        Decision threshold for classification (default is 0.5).

    Returns
    -------
    np.ndarray
        Predicted class labels of shape (m, 1).
    """
    if not isinstance(X_df, pd.DataFrame):
        raise TypeError("X_df must be a pandas DataFrame")

    X = X_df.values

    if not 0 < threshold < 1:
        raise ValueError("threshold must be between 0 and 1")

    # Get predicted probabilities
    y_hat = forward_propagation(X, W, b)

    # Apply threshold
    y_pred = (y_hat >= threshold).astype(int)

    return y_pred


In [0]:
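def accuracy(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Return the fraction of predictions that match the true labels (same shape expected)."""
    return float(np.mean(y_true == y_pred))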


In [0]:
import pandas as pd

df = pd.DataFrame({
    "age": [22, 25, 47, 52],
    "salary": [20000, 25000, 80000, 110000],
    "label": [0, 0, 1, 1]
})

X_df = df[["age", "salary"]]
y_df = df[["label"]]
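
The two features above differ by several orders of magnitude (ages in the tens, salaries in the tens of thousands). With unscaled inputs like these, the sigmoid can saturate from the first epoch and gradient descent may oscillate instead of converging. A common remedy, shown here as an optional sketch and not as part of the original pipeline, is to standardize each column to zero mean and unit variance before training:

```python
import numpy as np

# Same feature values as the DataFrame above: columns are age and salary.
X_raw = np.array([[22, 20000],
                  [25, 25000],
                  [47, 80000],
                  [52, 110000]], dtype=float)

# Standardize column-wise: subtract the mean, divide by the standard deviation.
mean = X_raw.mean(axis=0)
std = X_raw.std(axis=0)
X_scaled = (X_raw - mean) / std

print(X_scaled)
```

The scaled array can be wrapped again with `pd.DataFrame(X_scaled, columns=["age", "salary"])` before being passed to `train`. The same `mean` and `std` would then need to be applied to any data passed to `predict`.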


In [0]:
W, b = train(
    X_df,
    y_df,
    epochs=1000,
    learning_rate=0.01
)


In [0]:
predictions = predict(X_df, W, b)
print(predictions)


In [0]:
acc = accuracy(y_df.values, predictions)
print("Accuracy:", acc)
