### 🎯 **Logistic Regression 🎯**

### 📌 **Definition:**

- Logistic Regression is used to predict a binary or categorical outcome (1 or 0, true or false) based on one or more input features.
- The goal is to predict the probability that an observation belongs to one of two categories.

### 📈 **Equation:**

**P(Y = 1) = 1 / (1 + e^-(β₀ + β₁X₁ + β₂X₂ + ... + βₙXₙ))**

→ P(Y = 1) = Probability of the target being 1

→ X = Features

→ β = Coefficients (β₀ is the intercept; β₁ ... βₙ are the weights of the features X₁ ... Xₙ)

→ e = Euler’s number (approx. 2.718)

### 🧠 **Key Assumptions (VERY IMPORTANT!)**

✅ **Linearity in the Logit** – Relationship between features and the log-odds of the target is linear.

✅ **Independence of observations** – Each observation is independent.

✅ **No multicollinearity** – Predictors should not be highly correlated.

✅ **Large Sample Size** – Logistic regression generally requires a larger sample size for reliable results.
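
The linearity-in-the-logit assumption can be made explicit by inverting the equation above. Writing p = P(Y = 1) (a shorthand introduced here for readability), a short derivation shows that the log-odds are a linear function of the features:

$$
\begin{aligned}
p &= \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \dots + \beta_n X_n)}} \\
\frac{p}{1 - p} &= e^{\beta_0 + \beta_1 X_1 + \dots + \beta_n X_n} \\
\ln\left(\frac{p}{1 - p}\right) &= \beta_0 + \beta_1 X_1 + \dots + \beta_n X_n
\end{aligned}
$$

Read this way, a one-unit increase in a feature Xⱼ changes the log-odds by βⱼ, which multiplies the odds by e^βⱼ.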

### 🧮 **Types:**

* **Binary Logistic Regression** → One dependent binary variable (e.g., yes/no, 0/1).
* **Multinomial Logistic Regression** → More than two categories for the dependent variable.

### 📊 **Evaluation Metrics:**

* **Accuracy** – Percentage of correct predictions.
* **Precision, Recall, F1-Score** – More informative than accuracy when the classes are imbalanced.
* **AUC-ROC Curve** – Evaluates classification performance at all thresholds.


---

### ✅ **Logistic Function (Sigmoid Function)**

The **logistic function**, also known as the **sigmoid function**, is a mathematical function used to model the **probability of a binary outcome**. It produces an **S-shaped curve** that maps any real-valued number to a value between **0 and 1**.

### 📈 **Formula:**

$$
f(x) = \frac{1}{1 + e^{-x}}
$$

Where:

* **x** = Linear combination (β₀ + β₁X₁ + β₂X₂ + ... + βₙXₙ)
* **e** = Euler’s number (\~2.718)

### 🔍 **Output Behavior:**

* **x → +∞** → **f(x) → 1**
* **x → -∞** → **f(x) → 0**
* **x = 0** → **f(x) = 0.5**
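
The limiting behavior listed above can be checked numerically. The following is a minimal sketch using only the standard library; the function name `sigmoid` and the sample inputs are choices made here for illustration, and the branch on the sign of `x` is one common way to avoid overflow, not the only one.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function f(x) = 1 / (1 + e^(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, use the algebraically equivalent form e^x / (1 + e^x)
    # so that math.exp never receives a large positive argument.
    z = math.exp(x)
    return z / (1.0 + z)


# Very negative inputs approach 0, very positive inputs approach 1,
# and x = 0 sits exactly in the middle.
for x in (-1000.0, -5.0, 0.0, 5.0, 1000.0):
    print(f"f({x:g}) = {sigmoid(x):.4f}")
```

The curve is also symmetric around 0.5, in the sense that f(-x) = 1 - f(x).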

### 📊 **Use in Logistic Regression:**

* Converts the linear output into a **probability**.
* Used to classify:

  * **f(x) ≥ 0.5** → Class 1
  * **f(x) < 0.5** → Class 0
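
The two steps above (linear output to probability, probability to class) can be sketched as follows. The coefficients `BETA_0` and `BETA_1` are hypothetical values picked for illustration, not fitted to any data, and the helper names `predict_proba` and `predict_class` are likewise assumptions of this sketch.

```python
import math

# Hypothetical coefficients, chosen only for illustration (not fitted to data).
BETA_0 = -10.0   # intercept
BETA_1 = 0.25    # coefficient for a single feature, e.g. age in years


def predict_proba(age: float) -> float:
    """P(Y = 1) for one observation: sigmoid of the linear combination."""
    linear_output = BETA_0 + BETA_1 * age
    return 1.0 / (1.0 + math.exp(-linear_output))


def predict_class(age: float, threshold: float = 0.5) -> int:
    """Class 1 if the probability reaches the threshold, otherwise class 0."""
    return 1 if predict_proba(age) >= threshold else 0


# With these coefficients the linear output is 0 at age 40,
# so that is where the probability crosses 0.5.
for age in (22, 40, 52):
    print(age, round(predict_proba(age), 4), predict_class(age))
```

The 0.5 cut-off is a convention rather than a requirement: raising `threshold` makes the classifier more conservative about predicting class 1. In scikit-learn, `LogisticRegression.predict_proba` returns the probabilities and `predict` returns the class labels directly.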

---



```python
# Logistic regression on a small toy dataset
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)
from sklearn.model_selection import train_test_split

# Build the DataFrame: one feature (Age) and a binary target (Purchase)
data = pd.DataFrame({
    "Age": [22, 25, 47, 52, 46, 56, 55, 60],
    "Purchase": [0, 0, 1, 1, 1, 1, 1, 1],
})

# Split the data, holding out 30% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(
    data[["Age"]], data["Purchase"], test_size=0.3, random_state=0
)

# Create and train the model
model = LogisticRegression()
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.4f}")

# Precision
precision = precision_score(y_test, y_pred)
print(f"Precision: {precision:.4f}")

# Recall
recall = recall_score(y_test, y_pred)
print(f"Recall: {recall:.4f}")

# F1 Score
f1 = f1_score(y_test, y_pred)
print(f"F1 Score: {f1:.4f}")

# Confusion matrix (rows = actual classes, columns = predicted classes)
conf_matrix = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:")
print(conf_matrix)
```

Output:

```text
Accuracy: 1.0000
Precision: 1.0000
Recall: 1.0000
F1 Score: 1.0000
Confusion Matrix:
[[1 0]
 [0 2]]
```




## **📊 Confusion Matrix & Goodness of Fit**

The **Confusion Matrix** is a powerful tool to evaluate the performance of a classification model. It compares the actual (observed) results with predicted outcomes.

### **Confusion Matrix Table:**

|                              | **Observed Positive (Y) ✅** | **Observed Negative (N) ❌** |
| ---------------------------- | --------------------------- | --------------------------- |
| **Predicted Positive (Y) ✅** | **a = TP (True Positive)**  | **b = FP (False Positive)** |
| **Predicted Negative (N) ❌** | **c = FN (False Negative)** | **d = TN (True Negative)**  |

* **True Positives (TP)** ✅: Correct predictions of the positive class.
* **True Negatives (TN)** ❌: Correct predictions of the negative class.
* **False Positives (FP)** 🚫: Incorrectly predicting the positive class.
* **False Negatives (FN)** ❌: Incorrectly predicting the negative class.
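
One practical caveat when mapping these four cells onto code: scikit-learn's `confusion_matrix` puts the actual classes on the rows and the predicted classes on the columns, with labels in ascending order, which is the transpose of the table above with the negative class listed first. A minimal sketch, using made-up labels, of how the four counts can be pulled out:

```python
from sklearn.metrics import confusion_matrix

# Made-up labels for illustration: 4 actual positives, 6 actual negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

# For 0/1 labels the returned matrix is laid out as
# [[TN, FP],
#  [FN, TP]]
cm = confusion_matrix(y_true, y_pred)

# Flattening row by row therefore yields TN, FP, FN, TP in that order.
tn, fp, fn, tp = cm.ravel()

print(cm)
print(f"TP={tp}, FP={fp}, FN={fn}, TN={tn}")
```

Unpacking by name like this avoids mixing up the off-diagonal cells when the matrix orientation differs between a textbook table and a library.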



### **Goodness of Fit**

The model fits well if:

* **True Positives (TP)** and **True Negatives (TN)** are **high** 💯.
* **False Positives (FP)** and **False Negatives (FN)** are **low** 🔴.



### **Key Metrics:**

1. **Sensitivity (Recall)**: Measures how well the model identifies positives.

   $$
   \text{Sensitivity} = \frac{a}{a + c}
   $$

2. **Specificity**: Measures how well the model identifies negatives.

   $$
   \text{Specificity} = \frac{d}{b + d}
   $$
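
As a worked example of the two formulas, the sketch below plugs in made-up cell counts using the a/b/c/d naming from the confusion matrix table. The helper names are assumptions of this sketch, and neither helper guards against a zero denominator.

```python
def sensitivity(a: int, c: int) -> float:
    """a / (a + c): share of observed positives that were predicted positive."""
    return a / (a + c)


def specificity(b: int, d: int) -> float:
    """d / (b + d): share of observed negatives that were predicted negative."""
    return d / (b + d)


# Made-up cell counts in the a/b/c/d layout: TP, FP, FN, TN.
a, b, c, d = 3, 2, 1, 4

print(f"Sensitivity: {sensitivity(a, c):.4f}")  # 3 / (3 + 1)
print(f"Specificity: {specificity(b, d):.4f}")  # 4 / (2 + 4)
```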

---


| **Metric**                  | **Definition**                                                                                        | **Code**                           | **Interpretation**                                                                                                      |
| --------------------------- | ----------------------------------------------------------------------------------------------------- | ---------------------------------- | ----------------------------------------------------------------------------------------------------------------------- |
| **Accuracy** ✔              | The proportion of correct predictions to total predictions.                                           | `accuracy_score(y_true, y_pred)`   | - **High Accuracy**: Model performs well overall ✅<br> - **Low Accuracy**: Model struggles with predictions ❌           |
| **Precision** 🎯            | The proportion of true positives to the total predicted positives.                                    | `precision_score(y_true, y_pred)`  | - **High Precision**: Few false positives, useful when false alarms are costly ⚖️<br> - **Low Precision**: Many predicted positives are actually negatives 🔴 |
| **Recall (Sensitivity)** 📉 | The proportion of true positives to the total actual positives.                                       | `recall_score(y_true, y_pred)`     | - **High Recall**: Few false negatives, good for detecting all positives 🔍<br> - **Low Recall**: Misses positives ⚠️   |
| **F1 Score** 🔥             | The harmonic mean of Precision and Recall, balancing both metrics.                                    | `f1_score(y_true, y_pred)`         | - **High F1**: Both Precision and Recall are high 🌟<br> - **Low F1**: Precision, Recall, or both are low 🔧           |
| **ROC AUC** 🔵              | Measures the area under the ROC curve, indicating the model's ability to distinguish between classes. Computed from predicted scores or probabilities (e.g. `predict_proba(X)[:, 1]`), not hard class labels. | `roc_auc_score(y_true, y_score)`   | - **High AUC** (near 1): Good at distinguishing classes 🔵<br> - **Low AUC** (near 0.5): No better than random guessing ⚠️ |
| **Confusion Matrix** 🔲     | A table showing actual vs. predicted classifications for each class.                                  | `confusion_matrix(y_true, y_pred)` | - **Diagonal elements**: Correct predictions<br> - **Off-diagonal elements**: Incorrect predictions 🛑                  |
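
A note on the ROC AUC row: `roc_auc_score` is meant to receive a continuous score for the positive class, such as a predicted probability, because the ROC curve is traced by sweeping the threshold. Passing hard 0/1 predictions evaluates only a single threshold and can give a different number. The sketch below uses made-up labels and probabilities to show the difference; with a fitted classifier the scores would typically come from `predict_proba(X)[:, 1]`.

```python
from sklearn.metrics import roc_auc_score

# Made-up values for illustration: true labels and predicted P(Y = 1).
y_true = [0, 0, 1, 1, 1, 0]
y_score = [0.10, 0.40, 0.35, 0.80, 0.45, 0.60]

# Hard labels obtained with the usual 0.5 threshold.
y_pred = [1 if p >= 0.5 else 0 for p in y_score]

# AUC over all thresholds (uses the ranking of the scores).
auc_from_scores = roc_auc_score(y_true, y_score)

# AUC from hard labels (only one operating point on the curve).
auc_from_labels = roc_auc_score(y_true, y_pred)

print(f"AUC from probabilities: {auc_from_scores:.4f}")
print(f"AUC from hard labels:   {auc_from_labels:.4f}")
```

On this toy input the two values differ (about 0.67 versus 0.50), which is why the probability-based call is the one to report.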
