# 📘 Logistic Regression – Math Formulas & Intuition

---

## 🧠 1. Linear Combination (Logit)

$$
z = w_1x_1 + w_2x_2 + \dots + w_nx_n + b = \mathbf{w}^T \mathbf{x} + b
$$

- \( \mathbf{x} \): feature vector  
- \( \mathbf{w} \): weight vector  
- \( b \): bias (intercept)  
- \( z \): linear score (logit)
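
As a minimal sketch of the formula above, the snippet below computes the logit both as an explicit sum and as a dot product. The feature, weight, and bias values are hypothetical, chosen only so the arithmetic is easy to follow.

```python
import numpy as np

# Hypothetical values chosen only for illustration
x = np.array([2.0, -1.0, 0.5])   # feature vector
w = np.array([0.4, 0.3, -0.2])   # weight vector
b = 0.1                          # bias (intercept)

# Explicit sum: w_1*x_1 + w_2*x_2 + w_3*x_3 + b
z_sum = sum(w_j * x_j for w_j, x_j in zip(w, x)) + b

# Equivalent dot-product form: w^T x + b
z_dot = w @ x + b

print(z_sum, z_dot)
```

Both forms give the same score; the dot-product form is the one that vectorizes to a whole data matrix as `X @ w + b`.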

---

## 📈 2. Sigmoid Function (Logistic Function)

$$
\sigma(z) = \frac{1}{1 + e^{-z}} = \hat{y}
$$

- Maps \( z \) to a value in (0, 1)  
- Represents the **predicted probability** of class 1
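
The squashing behavior can be checked numerically. This is a small sketch using only the standard library; the three logit values are arbitrary examples.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function for a single real-valued logit."""
    return 1.0 / (1.0 + math.exp(-z))

# Negative logits map below 0.5, zero maps to exactly 0.5, positive logits above 0.5
for z in (-4.0, 0.0, 4.0):
    print(f"z = {z:+.1f}  ->  sigmoid(z) = {sigmoid(z):.4f}")
```

Note the symmetry \( \sigma(-z) = 1 - \sigma(z) \): the predicted probability of class 0 is the sigmoid of the negated logit.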

---

## ✅ 3. Prediction Rule

$$
\text{Predict } y =
\begin{cases}
1 & \text{if } \hat{y} \geq 0.5 \\
0 & \text{if } \hat{y} < 0.5
\end{cases}
$$
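
Because the sigmoid is monotonic, the 0.5 threshold on \( \hat{y} \) can be rewritten as a condition on the logit itself. A short derivation, using only the definitions from sections 1 and 2:

$$
\hat{y} \geq 0.5
\iff \frac{1}{1 + e^{-z}} \geq \frac{1}{2}
\iff e^{-z} \leq 1
\iff z \geq 0
\iff \mathbf{w}^T \mathbf{x} + b \geq 0
$$

So the decision boundary is the hyperplane \( \mathbf{w}^T \mathbf{x} + b = 0 \), which is why logistic regression is a linear classifier.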

---

## ❌ 4. Loss Function: Binary Cross-Entropy

$$
\mathcal{L}(\hat{y}, y) = - \left[ y \cdot \log(\hat{y}) + (1 - y) \cdot \log(1 - \hat{y}) \right]
$$

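- Penalizes confident wrong predictions most heavily: the loss grows without bound as \( \hat{y} \) approaches the wrong extreme  
- Applies to binary labels \( y \in \{0, 1\} \); the multiclass analogue is categorical cross-entropy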
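
To see how the loss reacts to confidence, the sketch below evaluates the per-example formula for a true label of 1 at three hypothetical predicted probabilities.

```python
import math

def bce(y: int, y_hat: float) -> float:
    """Binary cross-entropy for one example (assumes 0 < y_hat < 1)."""
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

# True label is 1 in every case; only the predicted probability changes
for y_hat in (0.9, 0.6, 0.1):
    print(f"y = 1, y_hat = {y_hat:.1f}  ->  loss = {bce(1, y_hat):.4f}")
```

A confident correct prediction (0.9) costs about 0.105, while a confident wrong one (0.1) costs about 2.303, roughly 22 times more.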

### Average Loss Over All Samples:

$$
J(w, b) = \frac{1}{m} \sum_{i=1}^{m} \mathcal{L}(\hat{y}^{(i)}, y^{(i)})
$$

---

## 🔁 5. Gradient Descent (Optimization)

### Partial Derivatives:

$$
\frac{\partial J}{\partial w_j} = \frac{1}{m} \sum_{i=1}^{m} (\hat{y}^{(i)} - y^{(i)}) x_j^{(i)}
$$

$$
\frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} (\hat{y}^{(i)} - y^{(i)})
$$

### Update Rule:

$$
w_j := w_j - \alpha \cdot \frac{\partial J}{\partial w_j}
$$

$$
b := b - \alpha \cdot \frac{\partial J}{\partial b}
$$

- \( \alpha \): learning rate  
- \( m \): number of training examples
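
One way to build trust in the gradient formulas is to compare them against finite differences and then take a single update step. The sketch below does both on hypothetical toy data; the data, starting parameters, and learning rate are illustrative assumptions, not values from this notebook.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(w, b, X, y):
    """Mean binary cross-entropy J(w, b)."""
    y_hat = sigmoid(X @ w + b)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def gradients(w, b, X, y):
    """Analytic gradients: (1/m) X^T (y_hat - y) and (1/m) sum(y_hat - y)."""
    m = len(y)
    err = sigmoid(X @ w + b) - y
    return X.T @ err / m, err.sum() / m

# Hypothetical toy data, for illustration only
X = np.array([[0.5, 1.2], [-1.0, 0.3], [1.5, -0.7], [-0.2, -1.1]])
y = np.array([1.0, 0.0, 1.0, 0.0])
w, b, alpha = np.array([0.1, -0.2]), 0.05, 0.1

dw, db = gradients(w, b, X, y)

# Central finite differences as an independent check on dw
eps = 1e-6
dw_num = np.array([
    (cost(w + eps * e, b, X, y) - cost(w - eps * e, b, X, y)) / (2 * eps)
    for e in np.eye(len(w))
])
print("analytic:", dw, " numeric:", dw_num)

# One gradient-descent update with learning rate alpha
w_new, b_new = w - alpha * dw, b - alpha * db
print("J before:", cost(w, b, X, y), " J after:", cost(w_new, b_new, X, y))
```

The analytic and numeric gradients should agree to several decimal places, and the cost should drop after the step.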

---

## 📦 6. Regularization (Optional)

### L2 Regularization (Ridge):

$$
J_{\text{reg}}(w, b) = J(w, b) + \frac{\lambda}{2m} \sum_{j=1}^{n} w_j^2
$$

- Helps prevent overfitting  
- \( \lambda \): regularization strength (tunable hyperparameter)
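
Differentiating the penalty term gives the regularized gradients. The bias is not included in the penalty above, so its gradient is unchanged:

$$
\frac{\partial J_{\text{reg}}}{\partial w_j}
= \frac{\partial J}{\partial w_j} + \frac{\lambda}{m} w_j,
\qquad
\frac{\partial J_{\text{reg}}}{\partial b} = \frac{\partial J}{\partial b}
$$

The extra \( \frac{\lambda}{m} w_j \) term shrinks each weight toward zero on every update, which is why L2 regularization is also called weight decay.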

---

## 🧮 Summary Table


| Component        | Formula                             | Description                         |
|------------------|--------------------------------------|-------------------------------------|
| Logit            | z = w^T x + b                        | Linear combination                  |
| Sigmoid          | y_hat = 1 / (1 + exp(-z))            | Converts logit to probability       |
| Prediction       | y = 1 if y_hat >= 0.5 else 0         | Binary threshold decision           |
| Loss             | -[y * log(y_hat) + (1 - y) * log(1 - y_hat)] | Binary cross-entropy loss   |
| Gradients        | grad = (1/m) * X^T(y_hat - y)        | Update weights via gradient descent |
| Regularized Loss | J + (λ / 2m) * sum(w^2)              | L2 penalty to avoid overfitting     |

---

> 🧠 **Tip for interviews**: Practice walking through each formula **intuitively**, not just mathematically. For example:  
> “We use sigmoid to squash the logit into a probability between 0 and 1 — this allows us to interpret it as a classification confidence.”


In [1]:
"""
Logistic Regression (binary) – NumPy‑only implementation
=======================================================

•  Vectorised forward & backward pass
•  Gradient‑descent optimiser
•  L2 regularisation (optional)
•  Accuracy + loss monitoring
"""

import numpy as np

# -------------------------- helper functions --------------------------

def sigmoid(z: np.ndarray) -> np.ndarray:
    """Numerically stable sigmoid."""
    # Clip to avoid overflow when z is very large / small
    z = np.clip(z, -500, 500)
    return 1.0 / (1.0 + np.exp(-z))

def compute_loss(y: np.ndarray, y_hat: np.ndarray, w: np.ndarray, lam: float = 0.0) -> float:
    """Binary cross‑entropy + (optional) L2 penalty."""
    m = len(y)
    eps = 1e-15                      # for log(0) protection
    loss = (
        - np.mean(y * np.log(y_hat + eps) + (1 - y) * np.log(1 - y_hat + eps))
        + lam / (2 * m) * np.sum(w**2)
    )
    return loss

# --------------------------- model class ------------------------------

class LogisticRegressionScratch:
    def __init__(self, lr: float = 0.1, n_iter: int = 1000, lam: float = 0.0):
        self.lr, self.n_iter, self.lam = lr, n_iter, lam
        self.w, self.b = None, None       # weights & bias will be np.ndarray / float

    # ---- training ----
    def fit(self, X: np.ndarray, y: np.ndarray, verbose: bool = False) -> None:
        m, n = X.shape
        self.w = np.zeros(n)
        self.b = 0.0

        for i in range(self.n_iter):
            z = X @ self.w + self.b          # (m,)  vector
            y_hat = sigmoid(z)               # (m,)

            # gradients (vectorised)
            dw = (1 / m) * (X.T @ (y_hat - y)) + (self.lam / m) * self.w
            db = (1 / m) * np.sum(y_hat - y)

            # parameter update
            self.w -= self.lr * dw
            self.b -= self.lr * db

            # optional monitoring
            # max(1, ...) avoids a modulo-by-zero error when n_iter < 10
            if verbose and (i % max(1, self.n_iter // 10) == 0 or i == self.n_iter - 1):
                loss = compute_loss(y, y_hat, self.w, self.lam)
                print(f"iter {i:4d}  |  loss = {loss:.4f}")

    # ---- inference ----
    def predict_proba(self, X: np.ndarray) -> np.ndarray:
        return sigmoid(X @ self.w + self.b)

    def predict(self, X: np.ndarray, threshold: float = 0.5) -> np.ndarray:
        return (self.predict_proba(X) >= threshold).astype(int)

    # ---- evaluation ----
    @staticmethod
    def accuracy(y_true: np.ndarray, y_pred: np.ndarray) -> float:
        return np.mean(y_true == y_pred)

# ----------------------- quick demonstration --------------------------

if __name__ == "__main__":
    # Tiny synthetic set: two classes, two informative features
    from sklearn.datasets import make_classification
    X, y = make_classification(n_samples=200, n_features=2, n_redundant=0,
                               n_informative=2, random_state=42)

    # Standardise features (helps convergence)
    X = (X - X.mean(axis=0)) / X.std(axis=0)

    # Train / test split
    from sklearn.model_selection import train_test_split
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

    # Train our scratch model
    clf = LogisticRegressionScratch(lr=0.1, n_iter=2000, lam=0.01)
    clf.fit(X_tr, y_tr, verbose=True)

    # Evaluate
    acc_train = clf.accuracy(y_tr, clf.predict(X_tr))
    acc_test  = clf.accuracy(y_te, clf.predict(X_te))
    print(f"Train accuracy: {acc_train:.3f}   Test accuracy: {acc_test:.3f}")


iter    0  |  loss = 0.6931
iter  200  |  loss = 0.3750
iter  400  |  loss = 0.3639
iter  600  |  loss = 0.3619
iter  800  |  loss = 0.3614
iter 1000  |  loss = 0.3613
iter 1200  |  loss = 0.3613
iter 1400  |  loss = 0.3613
iter 1600  |  loss = 0.3613
iter 1800  |  loss = 0.3613
iter 1999  |  loss = 0.3613
Train accuracy: 0.864   Test accuracy: 0.867


## Logistic Regression using Scikit-Learn

In [3]:
import pandas as pd
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
import seaborn as sns


### Load and Preprocess the Titanic Dataset

In [18]:
# Load Titanic dataset from seaborn
df = sns.load_dataset('titanic')

In [19]:
df.head()

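|   | survived | pclass | sex    | age  | sibsp | parch | fare    | embarked | class | who   | adult_male | deck | embark_town | alive | alone |
|---|----------|--------|--------|------|-------|-------|---------|----------|-------|-------|------------|------|-------------|-------|-------|
| 0 | 0        | 3      | male   | 22.0 | 1     | 0     | 7.25    | S        | Third | man   | True       | NaN  | Southampton | no    | False |
| 1 | 1        | 1      | female | 38.0 | 1     | 0     | 71.2833 | C        | First | woman | False      | C    | Cherbourg   | yes   | False |
| 2 | 1        | 3      | female | 26.0 | 0     | 0     | 7.925   | S        | Third | woman | False      | NaN  | Southampton | yes   | True  |
| 3 | 1        | 1      | female | 35.0 | 1     | 0     | 53.1    | S        | First | woman | False      | C    | Southampton | yes   | False |
| 4 | 0        | 3      | male   | 35.0 | 0     | 0     | 8.05    | S        | Third | man   | True       | NaN  | Southampton | no    | True  |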


In [20]:
# Select relevant features
df = df[['survived', 'sex', 'age', 'pclass']].dropna()

# Encode categorical column
df['sex'] = df['sex'].map({'male': 0, 'female': 1})


In [22]:
df.isnull().sum()

survived    0
sex         0
age         0
pclass      0
dtype: int64

In [23]:
# Features and target
X = df[['sex', 'age', 'pclass']]
y = df['survived']

### Train/Test Split

In [26]:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

### Feature Scaling

In [27]:
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

In [28]:
X_train_scaled

array([[ 1.32606716e+00,  6.68025202e-02,  9.28167783e-01],
       [-7.54109620e-01, -2.72964097e-01,  9.28167783e-01],
       [-7.54109620e-01, -1.15080322e-03,  9.28167783e-01],
       ...,
       [ 1.32606716e+00,  7.46335754e-01, -1.45914665e+00],
       [-7.54109620e-01,  2.02709167e-01,  9.28167783e-01],
       [-7.54109620e-01,  2.02709167e-01,  9.28167783e-01]],
      shape=(571, 3))

### Model Training

In [29]:
# Create and train the model
model = LogisticRegression()
model.fit(X_train_scaled, y_train)



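| Parameter         | Value   |
|-------------------|---------|
| penalty           | 'l2'    |
| dual              | False   |
| tol               | 0.0001  |
| C                 | 1.0     |
| fit_intercept     | True    |
| intercept_scaling | 1       |
| class_weight      | None    |
| random_state      | None    |
| solver            | 'lbfgs' |
| max_iter          | 100     |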


### Evaluate the Model

In [31]:
# Predict
y_pred = model.predict(X_test_scaled)

# Accuracy
print("Accuracy", accuracy_score(y_test, y_pred))

Accuracy 0.7412587412587412


In [32]:
# confusion matrix
print("confusion_matrix", confusion_matrix(y_test, y_pred))

confusion_matrix [[68 19]
 [18 38]]


In [33]:
# classification report
print("classification_report", classification_report(y_test, y_pred))

classification_report               precision    recall  f1-score   support

           0       0.79      0.78      0.79        87
           1       0.67      0.68      0.67        56

    accuracy                           0.74       143
   macro avg       0.73      0.73      0.73       143
weighted avg       0.74      0.74      0.74       143
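
The accuracy and the class 1 row of the report can be recomputed by hand from the confusion matrix printed earlier. This sketch hard-codes the four counts from that output and relies on scikit-learn's convention that rows are true classes and columns are predicted classes.

```python
# Confusion matrix from the output above: rows = true class, columns = predicted class
tn, fp = 68, 19   # true class 0: correctly rejected, falsely flagged as survived
fn, tp = 18, 38   # true class 1: missed survivors, correctly found survivors

accuracy  = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)   # of predicted survivors, how many actually survived
recall    = tp / (tp + fn)   # of actual survivors, how many the model found
f1        = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.4f}  precision={precision:.2f}  recall={recall:.2f}  f1={f1:.2f}")
```

These values agree with the reported accuracy (0.7413) and the class 1 precision, recall, and F1 (0.67, 0.68, 0.67).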



### Check Model Coefficients

In [37]:
# Intercept and coefficients
print("Intercept:", model.intercept_)
print("Coefficients:", model.coef_)

# Feature importance
features = X.columns
for feature, coef in zip(features, model.coef_[0]):
    print(f"{feature}: {coef:.4f}")


Intercept: [-0.49972956]
Coefficients: [[ 1.26839647 -0.6226701  -1.06246075]]
sex: 1.2684
age: -0.6227
pclass: -1.0625
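
Coefficients are on the log-odds scale, so exponentiating them gives odds ratios, which are often easier to explain. The sketch below uses the rounded coefficients printed above; because the features were standardized, each ratio is per one standard deviation increase in that feature, not per raw unit.

```python
import math

# Coefficients copied from the output above (features were standardized)
coefs = {"sex": 1.2684, "age": -0.6227, "pclass": -1.0625}

for name, coef in coefs.items():
    odds_ratio = math.exp(coef)   # multiplicative change in the odds of survival
    print(f"{name:>6}: coef = {coef:+.4f}  ->  odds ratio = {odds_ratio:.3f}")
```

An odds ratio above 1 (sex, where female is encoded as 1) raises the odds of survival; ratios below 1 (age, pclass) lower them.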


| Component                    | Explanation                                                                      |
| ---------------------------- | -------------------------------------------------------------------------------- |
| **Why Logistic Regression?** | Simple, interpretable model for binary classification                            |
| **Sigmoid?**                 | Converts logit (linear output) to probability                                    |
| **Loss Function?**           | Binary Cross Entropy                                                             |
| **Interpretation?**          | Each coefficient is the change in log-odds per one-unit increase in its feature; a positive sign raises the probability of class 1 |
| **Regularization?**          | scikit-learn uses L2 regularization by default to prevent overfitting            |


# 🎯 Logistic Regression Interview Prep (Markdown Version)

---

## 🧠 Flashcards – Quick Q\&A

### Q1: What is logistic regression?

> A classification algorithm that models the probability that a given input belongs to a certain class using a sigmoid function.

### Q2: Why not use linear regression for classification?

> Linear regression outputs unbounded values; logistic regression bounds the output between 0 and 1 using sigmoid.

### Q3: What is the sigmoid function?

> $\sigma(z) = \frac{1}{1 + e^{-z}}$, used to convert any real value into a probability.

### Q4: What is the loss function used?

> Binary Cross-Entropy Loss:
> $L = -[y \cdot \log(\hat{y}) + (1 - y) \cdot \log(1 - \hat{y})]$

### Q5: What is regularization?

> A technique to avoid overfitting by penalizing large weights (L1: Lasso, L2: Ridge).

### Q6: Can logistic regression handle multiple classes?

> Yes, using One-vs-Rest or Softmax (Multinomial Logistic Regression).
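
A minimal softmax sketch, using only the standard library, shows how the multinomial case generalizes the sigmoid. The class scores are hypothetical.

```python
import math

def softmax(scores):
    """Convert a list of real-valued class scores into probabilities."""
    shift = max(scores)                        # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])               # hypothetical scores for 3 classes
print([round(p, 3) for p in probs])

# With two classes and scores (z, 0), softmax reduces to the sigmoid of z
z = 1.5
two_class = softmax([z, 0.0])
print(two_class[0], 1.0 / (1.0 + math.exp(-z)))
```

The last line prints the same number twice, which is the sense in which binary logistic regression is the two-class special case of softmax regression.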

---

## 🧭 Mind Map – Concept Breakdown

```mermaid
graph TD
    A[Logistic Regression]
    A --> B[Sigmoid Function]
    A --> C[Binary Classification]
    A --> D[Loss Function]
    A --> E[Training: Gradient Descent]
    A --> F[Evaluation Metrics]
    A --> G[Regularization]
    A --> H[Multiclass Extension]

    D --> D1[Binary Cross-Entropy]
    E --> E1[Compute Gradients]
    E --> E2[Update Weights]
    F --> F1[Accuracy, Precision]
    F --> F2[Recall, F1, ROC-AUC]
    G --> G1["L1 (Lasso)"]
    G --> G2["L2 (Ridge)"]
    H --> H1[One-vs-Rest]
    H --> H2[Softmax/Multinomial]
```

---

## 🎤 Mock Interview Q\&A Sheet

### 🧩 Question 1: Explain Logistic Regression

**You:** Logistic regression is a classification algorithm that models the probability of class membership using the sigmoid function. The output ranges from 0 to 1 and is interpreted as a probability.

### 🧩 Question 2: How does logistic regression learn parameters?

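**You:** It minimizes the binary cross-entropy loss, which is equivalent to maximizing the likelihood. A from-scratch implementation typically uses gradient descent, updating the weights with the partial derivatives of the loss; scikit-learn defaults to the `lbfgs` solver.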

### 🧩 Question 3: What's the intuition behind the sigmoid function?

**You:** It transforms linear output (logit) into a probability between 0 and 1, which makes it interpretable for classification.

### 🧩 Question 4: How do you interpret the coefficients?

**You:** Each coefficient represents the change in log-odds for a one-unit increase in the feature, holding others constant.

### 🧩 Question 5: What if your dataset is imbalanced?

**You:** Use class weights, oversampling/undersampling, or SMOTE. Also, focus on metrics like precision, recall, and F1-score.
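
As a rough sketch of what class weighting does, the function below applies the heuristic that scikit-learn documents for `class_weight="balanced"`: `n_samples / (n_classes * class_count)`. The helper name `balanced_class_weights` and the label counts are hypothetical, used only for illustration.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}

# Hypothetical imbalanced labels: 90 negatives, 10 positives
y = [0] * 90 + [1] * 10
weights = balanced_class_weights(y)
print(weights)
```

With these weights each class contributes the same total weight to the loss (90 × 0.556 ≈ 10 × 5.0 = 50), so errors on the minority class are no longer drowned out.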

### 🧩 Question 6: How is logistic regression different from SVM?

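**You:** Logistic regression minimizes log loss and outputs probabilities. An SVM minimizes hinge loss to maximize the margin between classes, so its boundary depends only on the support vectors; this can make it less affected by points far from the boundary, but it does not produce probabilities directly.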

### 🧩 Question 7: What are its limitations?

**You:** It assumes a linear decision boundary (linear in log-odds space), so it is not suitable for highly non-linear patterns without feature engineering, and it is sensitive to outliers.

---

