# **3. Logistic Regression**

---

## **1. Introduction**

* **Definition:**
  Logistic Regression is a **supervised learning algorithm** used to predict **categorical outcomes** (binary or multi-class) based on one or more independent variables.

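* Despite the name, it is used for **classification**: the model regresses the log-odds of a class on the features, and the resulting probability is thresholded into a class label.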

* **Example:**

  * Predict whether an email is **spam (1)** or **not spam (0)**.
  * Predict whether a patient has **diabetes (1)** or **no diabetes (0)**.

---

## **2. Working Principle**

* Idea: Instead of predicting a continuous value (like linear regression), we **predict the probability** of a class.
* Uses the **sigmoid function** to map any real number to a value strictly between 0 and 1.

$$
P(Y=1|X) = \sigma(z) = \frac{1}{1 + e^{-z}}
$$

Where:

* $z = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n$

* $\sigma(z)$ = sigmoid function → outputs probability between 0 and 1

* Decision rule (for binary classification):

$$
\hat{Y} =
\begin{cases}
1 & \text{if } P(Y=1|X) \ge 0.5 \\
0 & \text{if } P(Y=1|X) < 0.5
\end{cases}
$$
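
The sigmoid and the decision rule above can be sketched in a few lines of plain Python. This is a minimal illustration, not a library implementation: the function names and the coefficient values are hypothetical, chosen only so the arithmetic is easy to follow.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function 1 / (1 + e^(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the equivalent form e^z / (1 + e^z).
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(x, beta0, betas):
    """P(Y=1|X) for one sample x, given intercept beta0 and coefficients betas."""
    z = beta0 + sum(b * xi for b, xi in zip(betas, x))
    return sigmoid(z)

def predict_class(x, beta0, betas, threshold=0.5):
    """Decision rule: 1 if P(Y=1|X) >= threshold, else 0."""
    return 1 if predict_proba(x, beta0, betas) >= threshold else 0

# Hypothetical coefficients, for illustration only.
beta0, betas = -1.0, [2.0, -0.5]
p = predict_proba([1.0, 2.0], beta0, betas)       # z = -1 + 2 - 1 = 0, so p = 0.5
label = predict_class([1.0, 2.0], beta0, betas)   # 0.5 >= 0.5, so label = 1
```

The threshold is left as a parameter because 0.5 is only the default choice; a different cut-off may suit problems where false positives and false negatives have unequal costs.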

---

## **3. Mathematical Intuition**

* **Linear combination:** $z = \beta_0 + \beta_1 X_1 + \dots + \beta_n X_n$
* **Sigmoid transformation:** Converts linear output to probability.
* **Cost Function:** Uses **Log Loss / Cross-Entropy** instead of MSE, because log loss is convex in $\beta$ while MSE applied to the sigmoid output is not:

$$
J(\beta) = - \frac{1}{m} \sum_{i=1}^m \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right]
$$

Where:

* $y_i$ = actual label (0 or 1)

* $\hat{y}_i$ = predicted probability

* $m$ = number of samples

* **Optimization:** Use **Gradient Descent** or advanced optimizers (Newton-Raphson, BFGS) to minimize log loss.
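
As a rough sketch of the cost function and its minimization, the snippet below implements log loss and plain batch gradient descent using the gradient $\frac{\partial J}{\partial \beta_j} = \frac{1}{m}\sum_{i}(\hat{y}_i - y_i)\,x_{ij}$. The data set is made up, and the learning rate and iteration count are arbitrary assumptions; in practice a library solver would normally be used.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(y_true, y_prob, eps=1e-12):
    """Mean cross-entropy; eps clips probabilities away from exactly 0 and 1."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)

def fit(X, y, lr=0.1, n_iter=2000):
    """Batch gradient descent on the log loss. Returns (beta0, betas)."""
    m, n = len(X), len(X[0])
    beta0, betas = 0.0, [0.0] * n
    for _ in range(n_iter):
        probs = [sigmoid(beta0 + sum(b * xi for b, xi in zip(betas, row)))
                 for row in X]
        errors = [p - yi for p, yi in zip(probs, y)]   # (y_hat - y) per sample
        beta0 -= lr * sum(errors) / m
        for j in range(n):
            betas[j] -= lr * sum(e * row[j] for e, row in zip(errors, X)) / m
    return beta0, betas

# Made-up toy data: one feature, overlapping classes so the fit stays finite.
X = [[0.5], [1.0], [1.5], [2.0], [2.5], [3.0]]
y = [0, 0, 1, 0, 1, 1]

beta0, betas = fit(X, y)
probs = [sigmoid(beta0 + betas[0] * row[0]) for row in X]
initial_loss = log_loss(y, [0.5] * len(y))   # loss with all coefficients at 0
final_loss = log_loss(y, probs)              # lower than initial_loss after training
```

With all coefficients at zero, every prediction is 0.5 and the loss equals $\log 2$, which gives a handy baseline for checking that training actually improves the fit.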

---

## **4. Assumptions of Logistic Regression**

1. **Linearity of log-odds:** The logit (log-odds) of the outcome is linearly related to predictors.
2. **Independent observations**
3. **Little or no multicollinearity** among predictors
4. **Large sample size** → ensures stable estimates
5. **Binary or categorical outcome** (for basic logistic regression)
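
The first assumption is a direct consequence of the model itself. Writing $p = P(Y=1|X)$ and inverting the sigmoid gives the log-odds as exactly the linear combination $z$:

$$
p = \frac{1}{1 + e^{-z}}
\;\Rightarrow\;
\frac{p}{1 - p} = e^{z}
\;\Rightarrow\;
\log\left(\frac{p}{1 - p}\right) = z = \beta_0 + \beta_1 X_1 + \dots + \beta_n X_n
$$

So a one-unit increase in $X_j$ changes the log-odds by $\beta_j$, or equivalently multiplies the odds by $e^{\beta_j}$, holding the other predictors fixed.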

---

## **5. Pros & Cons**

### ✅ Pros

* Simple and interpretable
* Outputs probabilities → useful for risk prediction
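* Works well when classes are **approximately linearly separable** (if the data are perfectly separable, unregularized coefficients grow without bound, so regularization is needed)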
* Can handle multiple predictors

### ❌ Cons

* Assumes **linear relationship between features and log-odds**
* Sensitive to outliers
* Cannot handle **complex non-linear decision boundaries** without feature engineering
* Requires large datasets for stable estimates

---

## **6. Variants**

1. **Binary Logistic Regression** → 2 classes (0/1)
2. **Multinomial Logistic Regression** → more than 2 classes (softmax function)
3. **Ordinal Logistic Regression** → ordered categories (e.g., low/medium/high)

---

## **7. Real-Life Applications**

* **Healthcare:** Predict disease presence (diabetes, cancer).
* **Finance:** Predict loan default (yes/no).
* **Marketing:** Predict if a customer will buy a product.
* **Email filtering:** Spam detection.
* **Customer churn prediction:** Will a customer leave a service?

---

## **8. Flowchart – Logistic Regression Workflow**

```
         Input Data (X, Y)
                 ↓
       Data Preprocessing (scaling, encoding)
                 ↓
    Compute linear combination: z = β0 + β1 X1 + ... + βn Xn
                 ↓
       Apply Sigmoid function: σ(z) → P(Y=1|X)
                 ↓
        Predict class based on threshold (0.5)
                 ↓
         Evaluate Model (Accuracy, F1, ROC-AUC)
```
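
The last two steps of the workflow (thresholding and evaluation) can be sketched as follows. The probabilities here are invented stand-ins for the output of a fitted model, and the helper names are hypothetical; ROC-AUC is left out to keep the example short.

```python
def classify(probs, threshold=0.5):
    """Turn predicted probabilities into 0/1 labels."""
    return [1 if p >= threshold else 0 for p in probs]

def evaluate(y_true, y_pred):
    """Accuracy, precision, recall and F1 from the confusion-matrix counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Made-up labels and probabilities, for illustration only.
y_true = [1, 0, 1, 1, 0, 0]
probs = [0.9, 0.4, 0.6, 0.3, 0.7, 0.1]

y_pred = classify(probs)            # [1, 0, 1, 0, 1, 0]
metrics = evaluate(y_true, y_pred)  # 2 TP, 2 TN, 1 FP, 1 FN
```

Accuracy alone can be misleading on imbalanced data, which is why the workflow lists F1 and ROC-AUC alongside it.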

---

## **9. Key Takeaways**

* Logistic Regression = classification algorithm, not regression.
* Uses **sigmoid function** to map predictions to probabilities.
* Cost function = **Log Loss**, minimized with Gradient Descent or a Newton-type optimizer.
* Simple, interpretable, widely used for **binary outcomes**.

---

# **Variants of Logistic Regression**

---

## **1. Binary Logistic Regression**

### **Definition**

* Used when the **dependent variable has only 2 classes** (Yes/No, 0/1, True/False).
* The most common type of logistic regression.

### **Equation**

$$
P(Y=1|X) = \frac{1}{1 + e^{-(\beta_0 + \beta_1X_1 + \dots + \beta_nX_n)}}
$$

Decision rule:

$$
\hat{Y} =
\begin{cases}
1 & \text{if } P \geq 0.5 \\
0 & \text{if } P < 0.5
\end{cases}
$$

### **Example Applications**

* Predict whether a student **passes/fails** an exam.
* Predict if a transaction is **fraudulent or not**.
* Predict if a customer will **churn (leave) or stay**.

---

## **2. Multinomial Logistic Regression (Softmax Regression)**

### **Definition**

* Used when the **dependent variable has more than two categories** with **no natural ordering**.
* Generalizes binary logistic regression by using the **softmax function**.

### **Equation**

For class $k$ (out of $K$ classes):

$$
P(Y=k|X) = \frac{e^{\beta_{0k} + \beta_{1k}X_1 + \dots + \beta_{nk}X_n}}{\sum_{j=1}^{K} e^{\beta_{0j} + \beta_{1j}X_1 + \dots + \beta_{nj}X_n}}
$$

* The denominator normalizes the scores so that the $K$ class probabilities sum to 1.
* Prediction = class with highest probability.
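
A minimal sketch of the softmax step, assuming one intercept and one coefficient vector per class as in the equation above. The parameter values are hypothetical and only chosen to make the scores easy to verify by hand.

```python
import math

def softmax(scores):
    """Convert K class scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the max avoids overflow; the result is unchanged
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def class_scores(x, intercepts, coefs):
    """Linear score beta_0k + beta_1k*x1 + ... + beta_nk*xn for each class k."""
    return [b0 + sum(b * xi for b, xi in zip(bk, x))
            for b0, bk in zip(intercepts, coefs)]

# Hypothetical parameters for K = 3 classes and 2 features.
intercepts = [0.0, 0.5, -0.5]
coefs = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
x = [2.0, 1.0]

scores = class_scores(x, intercepts, coefs)   # [2.0, 1.5, -3.5]
probs = softmax(scores)
predicted = max(range(len(probs)), key=probs.__getitem__)   # index of the largest probability
```

With $K = 2$ the softmax reduces to the sigmoid of the difference between the two class scores, which is how binary logistic regression appears as a special case.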

### **Example Applications**

* Predict type of **fruit** (apple, orange, banana).
* Predict **mode of transport** (car, bus, train, bicycle).
* Classifying **news articles** into topics (sports, politics, business).

---

## **3. Ordinal Logistic Regression**

### **Definition**

* Used when the **dependent variable has more than two categories with a natural order**.
* Example: Rating (Low, Medium, High), Education level (High School < Bachelor < Master < PhD).

### **Equation**

* Uses **cumulative logits** instead of simple logits:

$$
\log \left( \frac{P(Y \leq j)}{P(Y > j)} \right) = \theta_j - (\beta_1X_1 + \dots + \beta_nX_n)
$$

Where:

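* $j$ = index of the cut-point between adjacent ordered categories, $j = 1, \dots, K-1$ (e.g., the cut-off between Low and Medium).
* $\theta_j$ = threshold (intercept) for cut-point $j$, with $\theta_1 < \theta_2 < \dots < \theta_{K-1}$.
* The coefficients $\beta_1, \dots, \beta_n$ are shared across all cut-points (the proportional-odds assumption).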
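
To see how cumulative logits turn into per-category probabilities, the sketch below computes $P(Y \leq j) = \sigma(\theta_j - \beta^\top X)$ at each cut-point and takes successive differences. The thresholds and coefficient are hypothetical, and the function name is only illustrative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def ordinal_probs(x, thresholds, betas):
    """Category probabilities under the cumulative-logit model.

    thresholds must be increasing: theta_1 < ... < theta_{K-1}.
    Returns K probabilities, one per ordered category.
    """
    eta = sum(b * xi for b, xi in zip(betas, x))
    # P(Y <= j) at each cut-point, plus P(Y <= K) = 1 for the last category.
    cumulative = [sigmoid(t - eta) for t in thresholds] + [1.0]
    probs, prev = [], 0.0
    for c in cumulative:
        probs.append(c - prev)   # P(Y = j) = P(Y <= j) - P(Y <= j-1)
        prev = c
    return probs

# Hypothetical parameters: 3 categories (Low, Medium, High), 1 feature.
thresholds = [-1.0, 1.0]
betas = [0.8]

low_x = ordinal_probs([-2.0], thresholds, betas)    # most mass on Low
high_x = ordinal_probs([2.0], thresholds, betas)    # most mass on High
```

Because $\beta^\top X$ is subtracted from the threshold, a positive coefficient shifts probability toward the higher categories as the feature increases.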

### **Example Applications**

* Predicting **customer satisfaction** (Very Unsatisfied → Very Satisfied).
* Predicting **disease severity** (Mild, Moderate, Severe).
* Predicting **credit ratings** (AAA, AA, A, BBB...).

---

## **4. Quick Comparison Table**

| Variant                  | Outcome Variable         | Function Used    | Example                              |
| ------------------------ | ------------------------ | ---------------- | ------------------------------------ |
| **Binary Logistic**      | 2 categories (0/1)       | Sigmoid          | Spam vs Not Spam                     |
| **Multinomial Logistic** | >2 categories, unordered | Softmax          | Classify animals (dog, cat, bird)    |
| **Ordinal Logistic**     | >2 categories, ordered   | Cumulative Logit | Customer satisfaction (Low/Med/High) |

---

✅ **Key Takeaways:**

* **Binary Logistic Regression** → 2 classes.
* **Multinomial Logistic Regression** → multiple unordered classes (softmax).
* **Ordinal Logistic Regression** → multiple ordered classes (cumulative logit).

---