# Logistic Regression

## 1. Definition
Logistic regression is a supervised classification model derived from linear regression. Its main goal is to estimate the probability that a data instance belongs to a particular class. Where linear regression predicts an unbounded numerical value, logistic regression passes the linear output through the sigmoid function (the inverse of the logit function) so that it lies between 0 and 1, making it suitable for classification tasks.


## 2. Core Idea
Logistic regression assumes a linear relationship between the input features and the log-odds of the output. In other words, the input features relate linearly to the logit of the target. It also assumes no multicollinearity, which means predictors should not be highly correlated with each other.


## 3. Mechanism
Logistic regression maps the linear score $z = w^\top x + b$ to a probability between 0 and 1 through the sigmoid function:
$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

* If the output $\hat{p} = \sigma(z) \geq 0.5$, the model classifies the instance as class 1; otherwise, as class 0.

* Log-Odds: The linear component $z$ is equal to the log-odds, that is, the logarithm of the odds of an event happening versus not happening.

$$ \text{odds} = \frac{p}{1-p}$$

$$ \text{log-odds} = \log\left(\frac{p}{1-p}\right)$$
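
The relationship between the linear score, the sigmoid, and the log-odds can be checked numerically. The following is a minimal sketch using NumPy; the weights, bias, and input instance are hypothetical values chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    """Map a real-valued score z to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def log_odds(p):
    """Logit of a probability p in (0, 1); the inverse of the sigmoid."""
    return np.log(p / (1.0 - p))

# Hypothetical weights, bias, and a single two-feature instance.
w = np.array([1.5, -0.5])
b = 0.25
x = np.array([2.0, 1.0])

z = w @ x + b              # linear component: 3.0 - 0.5 + 0.25 = 2.75
p_hat = sigmoid(z)         # predicted probability of class 1
label = int(p_hat >= 0.5)  # threshold at 0.5

print(p_hat, label)
print(np.isclose(log_odds(p_hat), z))  # taking the log-odds recovers z
```

Because the sigmoid and the logit are inverses, thresholding $\hat{p}$ at 0.5 is the same as thresholding $z$ at 0.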


## 4. Mathematical Details
The goal of training is to find the optimal weights $w$ and bias $b$ that minimize the difference between the predicted probabilities $\hat{y}$ (the $\hat{p}$ of the previous section) and the actual class labels $y$.
* Loss function: Logistic regression uses the Binary Cross-Entropy Loss (or Log Loss) because it penalizes confident incorrect predictions heavily, which suits a model whose outputs are probabilities. For a single example:

$$L(\hat{y}, y) = -[y\log{(\hat{y})} + (1-y)\log{(1-\hat{y})}] $$


The total cost $J$ is the average of this loss over all $m$ examples:

$$ J(w,b) = \frac{1}{m}\sum_{i=1}^{m}L(\hat{y}^{(i)}, y^{(i)})$$
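
To see how the loss treats confident mistakes, here is a small NumPy sketch of the cost above. The function name `binary_cross_entropy` and the `eps` clipping constant are illustrative choices, not part of the notebook's `src` module; the clipping is only there to avoid `log(0)`.

```python
import numpy as np

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Average log loss over all examples.

    eps is an assumed safeguard that keeps probabilities away from 0 and 1.
    """
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    losses = -(y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob))
    return losses.mean()

y_true = np.array([1.0, 0.0, 1.0])
confident_right = np.array([0.9, 0.1, 0.9])  # high probability on the true class
confident_wrong = np.array([0.1, 0.9, 0.1])  # high probability on the wrong class

loss_right = binary_cross_entropy(y_true, confident_right)
loss_wrong = binary_cross_entropy(y_true, confident_wrong)
print(loss_right, loss_wrong)
```

The second loss is much larger than the first, which is the "heavy penalty for confident incorrect predictions" in practice.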


* Optimization: The cost function is convex, so any local minimum is also a global minimum. The weights are updated iteratively using an optimization algorithm such as Gradient Descent or one of its variants (e.g. SGD, Adam). The gradient points in the direction of steepest increase of the loss, so each step moves the weights in the opposite direction.

The update rules are derived by taking the **partial derivatives** of the cost function $J$ with respect to $w$ and $b$.



$$ \frac{\partial J}{\partial w_j} = \frac{1}{m} \sum_{i=1}^{m} (\hat{y}^{(i)} - y^{(i)})x_j^{(i)} $$

$$ \frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} (\hat{y}^{(i)} - y^{(i)}) $$
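
These two gradients are all that batch gradient descent needs. The sketch below is one possible NumPy implementation under stated assumptions: the function name `fit_logistic`, the learning rate, the iteration count, and the synthetic data are all illustrative, and this is not the `LogisticRegression` class imported from `src` later in the notebook.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, n_iters=1000):
    """Batch gradient descent on the binary cross-entropy cost."""
    m, n = X.shape
    w = np.zeros(n)
    b = 0.0
    for _ in range(n_iters):
        y_hat = sigmoid(X @ w + b)   # predicted probabilities
        error = y_hat - y            # (y_hat - y) term shared by both gradients
        grad_w = X.T @ error / m     # dJ/dw_j for every j at once
        grad_b = error.mean()        # dJ/db
        w -= lr * grad_w             # step against the gradient
        b -= lr * grad_b
    return w, b

# Synthetic, well-separated two-class data (assumed for illustration).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(2.0, 1.0, size=(50, 2)),
               rng.normal(-2.0, 1.0, size=(50, 2))])
y = np.concatenate([np.ones(50), np.zeros(50)])

w, b = fit_logistic(X, y)
pred = (sigmoid(X @ w + b) >= 0.5).astype(int)
accuracy = (pred == y).mean()
print(w, b, accuracy)
```

The vectorized `X.T @ error / m` computes the per-feature sum in the $w_j$ formula for all features in one matrix product.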

## 5. Pros and Cons
* Pros:
    * Works well on linearly separable datasets.
    * Highly interpretable (weights indicate feature importance).
    * Fast to train.
    * Often provides well-calibrated probabilities.
* Cons:
    * Cannot handle complex, non-linear decision boundaries.
    * Performs poorly with highly correlated features unless regularized.
    * Decision boundary is strictly linear in the input features.
    * Not ideal when classes are highly imbalanced.



## 6. Production Considerations
* The model is lightweight with very low inference latency, which makes it a good fit for real-time systems.
* Monitor for feature drift, because linear models degrade quickly when input distributions shift.
* Coefficients must be immutable when used in a live scoring pipeline.
* Ensure feature scaling is consistent between training and inference.
* Often used in ensembles or as a first-stage model.
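
One common way to keep feature scaling consistent is to compute the scaling statistics once on the training data and reuse exactly those values at inference time. The following is a minimal NumPy sketch of that idea; the helper names `fit_scaler` and `apply_scaler` and the toy data are hypothetical.

```python
import numpy as np

def fit_scaler(X_train):
    """Compute per-feature mean and standard deviation on training data only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    return mean, std

def apply_scaler(X, mean, std):
    """Standardize X with statistics saved from training."""
    return (X - mean) / std

# Toy training data with features on very different scales.
X_train = np.array([[1.0, 100.0],
                    [2.0, 200.0],
                    [3.0, 300.0]])
mean, std = fit_scaler(X_train)

# At inference time, reuse the stored statistics; never refit on live data.
X_live = np.array([[2.0, 200.0]])
X_live_scaled = apply_scaler(X_live, mean, std)
print(X_live_scaled)  # all zeros, since this row equals the training mean
```

In a real pipeline the `mean` and `std` arrays would be persisted alongside the model coefficients so that both are versioned together.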


## 7. Other Variants

* Multinomial Logistic Regression: Used for classification tasks with more than two classes. It replaces the sigmoid function with its generalization, the Softmax function.
* L1/L2 Regularized Logistic Regression: Used to prevent overfitting by adding a penalty term to the loss function.
* Feature engineering: Performance can be improved significantly by manually creating non-linear features (e.g., polynomial features or interaction terms).
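
As a rough illustration of the first two variants, the sketch below shows a numerically stable softmax and how an L2 penalty changes the weight gradient. The function names, the penalty form $\frac{\lambda}{2}\lVert w \rVert^2$, and the example values are assumptions made for this sketch.

```python
import numpy as np

def softmax(z):
    """Softmax over the last axis; subtracting the max avoids overflow."""
    z = z - z.max(axis=-1, keepdims=True)
    exp_z = np.exp(z)
    return exp_z / exp_z.sum(axis=-1, keepdims=True)

def l2_penalized_gradient(grad_w, w, lam):
    """Gradient of J + (lam / 2) * ||w||^2 with respect to w."""
    return grad_w + lam * w

# Three-class scores for one instance (hypothetical values).
scores = np.array([[2.0, 1.0, 0.1]])
probs = softmax(scores)
print(probs, probs.sum())  # probabilities over the classes, summing to 1

# With two classes and scores (z, 0), softmax reduces to the sigmoid of z.
z = 1.3
two_class = softmax(np.array([z, 0.0]))
print(np.isclose(two_class[0], 1.0 / (1.0 + np.exp(-z))))

# The L2 penalty adds lam * w to the gradient, shrinking weights toward zero.
print(l2_penalized_gradient(np.array([1.0, 2.0]), np.array([0.5, -0.5]), lam=0.1))
```

An L1 penalty would instead add a term proportional to the sign of each weight, which tends to drive some weights exactly to zero.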

In [1]:
import sys, os
root = os.path.abspath("..")
sys.path.append(root)



from src.logistic_regression import LogisticRegression
import numpy as np

In [2]:
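def create_dataset():
    # Fix the seed so the generated points are reproducible.
    np.random.seed(42)
    # Two Gaussian clusters of 50 points each, centered at (2, 2) and (-2, 2).
    X_pos = np.random.randn(50,2) + np.array([2,2])
    X_neg = np.random.randn(50,2) + np.array([-2,2])

    # Stack the clusters and label them 1 (positive) and 0 (negative).
    X = np.vstack((X_pos, X_neg))
    y = np.hstack((np.ones(50),np.zeros(50)))
    return X, y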



X, y = create_dataset()
print(X.shape, y.shape)

(100, 2) (100,)


In [3]:
model = LogisticRegression()
model.fit(X, y)
pred = model.predict(X)

In [4]:
from collections import Counter
Counter(pred)

Counter({np.int64(0): 51, np.int64(1): 49})