## Logistic Regression – Overview
| Category       | Details                                                                                                                                                     |
| -------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Definition** | A **supervised learning algorithm** used for binary classification problems. It estimates the probability that a given input belongs to a certain class.    |
| **Key Idea**   | Unlike linear regression, which predicts a continuous output, it passes a linear combination of the features through a **sigmoid function** to produce a **probability** between 0 and 1. |
| **Use Cases**  | Email Spam Detection, Credit Card Fraud Detection, Customer Churn Prediction, Disease Diagnosis (yes/no)                                                    |


### 📈 Mathematical Foundation
| Component               | Explanation                                                                                                                                        |
| ----------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Hypothesis Function** | $h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$                                                                                                      |
| **Cost Function**       | Cross-Entropy Loss: <br> $J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log(h_\theta(x^{(i)})) + (1 - y^{(i)}) \log(1 - h_\theta(x^{(i)})) \right]$ |
| **Optimization**        | Gradient Descent / Stochastic Gradient Descent                                                                                                     |
| **Decision Boundary**   | Predict $\hat{y} = 1$ if $h_\theta(x) \geq 0.5$ (equivalently $\theta^T x \geq 0$), else $\hat{y} = 0$; the boundary itself is $\theta^T x = 0$     |
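
The four components in the table can be sketched from scratch in NumPy. This is a minimal illustration, not a production implementation: the function names, the learning rate `lr`, the iteration count `n_iters`, and the toy data are all assumptions chosen for the example.

```python
import numpy as np

def sigmoid(z):
    """Map any real-valued z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy(theta, X, y, eps=1e-12):
    """Mean cross-entropy loss J(theta); eps guards against log(0)."""
    h = np.clip(sigmoid(X @ theta), eps, 1.0 - eps)
    return -np.mean(y * np.log(h) + (1.0 - y) * np.log(1.0 - h))

def fit_gradient_descent(X, y, lr=0.1, n_iters=2000):
    """Batch gradient descent; the gradient of J is X^T (h - y) / m."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        grad = X.T @ (sigmoid(X @ theta) - y) / len(y)
        theta -= lr * grad
    return theta

def predict(theta, X, threshold=0.5):
    """Decision rule: label 1 when h_theta(x) >= threshold."""
    return (sigmoid(X @ theta) >= threshold).astype(int)

# Toy 1-D data, made up for illustration: two overlapping Gaussian clusters
rng = np.random.default_rng(0)
x1 = np.concatenate([rng.normal(-2.0, 1.0, 50), rng.normal(2.0, 1.0, 50)])
X_toy = np.column_stack([np.ones(100), x1])  # first column is the intercept
y_toy = np.concatenate([np.zeros(50), np.ones(50)])

theta_hat = fit_gradient_descent(X_toy, y_toy)
loss_before = cross_entropy(np.zeros(2), X_toy, y_toy)
loss_after = cross_entropy(theta_hat, X_toy, y_toy)
train_acc = np.mean(predict(theta_hat, X_toy) == y_toy)

print("theta:", theta_hat)
print("loss before / after:", loss_before, loss_after)
print("training accuracy:", train_acc)
```

With `theta` initialised to zero every prediction is 0.5, so the starting loss is $\log 2$; training should push it below that value.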


### ⚙️ Python Implementation Example (Using scikit-learn)


```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score

# Load dataset
iris = load_iris()
X = iris.data[:100, :2]  # Keep 2 features for simplicity
y = iris.target[:100]    # First 100 rows give a binary target (setosa vs versicolor)

# Split dataset into 80% train and 20% test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train logistic regression model
model = LogisticRegression()
model.fit(X_train, y_train)

# Predict and evaluate
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
```

Output:

```
Accuracy: 1.0
```
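
To connect the fitted model back to the sigmoid and the 0.5 decision rule, the sketch below repeats the same setup and compares `predict_proba` with a hand-computed sigmoid of the linear score. The names `manual_proba` and `labels_from_threshold` are illustrative, and the check assumes the binary case, where scikit-learn derives the class-1 probability from the sigmoid of the decision function.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Same binary subset and split as the example above
iris = load_iris()
X = iris.data[:100, :2]
y = iris.target[:100]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LogisticRegression().fit(X_train, y_train)

# Column 1 of predict_proba is the estimated P(y = 1 | x)
proba = model.predict_proba(X_test)[:, 1]

# Recompute the same probability by hand: sigmoid(theta^T x + intercept)
z = X_test @ model.coef_.ravel() + model.intercept_[0]
manual_proba = 1.0 / (1.0 + np.exp(-z))

# Thresholding the probability at 0.5 should reproduce model.predict
labels_from_threshold = (proba >= 0.5).astype(int)

print("probabilities match:", np.allclose(proba, manual_proba))
print("labels match:", np.array_equal(labels_from_threshold, model.predict(X_test)))
```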


### 📊 Evaluation Metrics for Classification
| Metric                   | Formula / Use                                                                             |
| ------------------------ | ----------------------------------------------------------------------------------------- |
| **Accuracy**             | $\frac{TP + TN}{TP + TN + FP + FN}$                                                       |
| **Precision**            | $\frac{TP}{TP + FP}$                                                                      |
| **Recall (Sensitivity)** | $\frac{TP}{TP + FN}$                                                                      |
| **F1-Score**             | $2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$ (harmonic mean of Precision and Recall) |
| **AUC-ROC**              | Probability that the model scores a randomly chosen positive instance higher than a randomly chosen negative one |
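
As a worked check of the formulas in the table, the sketch below computes each metric by hand from confusion-matrix counts and compares it with the corresponding `sklearn.metrics` function. The labels and scores are hypothetical values made up for the example, and the 0.5 threshold is an assumption.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)

# Hypothetical ground truth and predicted scores (illustration only)
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.35, 0.2, 0.1, 0.05]
y_hat = [int(s >= 0.5) for s in y_score]  # threshold at 0.5

# For binary labels {0, 1}, ravel() yields TN, FP, FN, TP in that order
tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

# AUC from its ranking definition: fraction of (positive, negative) pairs
# in which the positive gets the higher score (no ties in this data)
pos = [s for s, t in zip(y_score, y_true) if t == 1]
neg = [s for s, t in zip(y_score, y_true) if t == 0]
auc_pairs = sum(p > n for p in pos for n in neg) / (len(pos) * len(neg))

print("accuracy ", accuracy, accuracy_score(y_true, y_hat))
print("precision", precision, precision_score(y_true, y_hat))
print("recall   ", recall, recall_score(y_true, y_hat))
print("f1       ", f1, f1_score(y_true, y_hat))
print("auc      ", auc_pairs, roc_auc_score(y_true, y_score))
```

Note that AUC-ROC is computed from the raw scores, not the thresholded labels, which is why it does not depend on the 0.5 cutoff.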


### 🧠 Interview Questions – Logistic Regression
| Level        | Question                                                        | Expected Answer                                                                                |
| ------------ | --------------------------------------------------------------- | ---------------------------------------------------------------------------------------------- |
| Beginner     | What is logistic regression?                                    | A classification algorithm that uses the sigmoid function to estimate the probability of a binary outcome. |
| Intermediate | Why use sigmoid instead of linear output?                       | Because it maps any real-valued number into the range (0, 1) for probability interpretation.   |
| Intermediate | Explain the cost function in logistic regression.               | Uses cross-entropy (log loss), which penalizes confident wrong predictions heavily and is convex in the parameters, unlike squared error applied to a sigmoid output. |
| Advanced     | What are the assumptions of logistic regression?                | A linear relationship between the independent variables and the log-odds, independent observations, little or no multicollinearity, and a sufficiently large sample. |
| Advanced     | Can logistic regression be used for multi-class classification? | Yes, via One-vs-Rest (OvR) or Softmax (for multinomial logistic regression).                   |
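
The last answer can be illustrated on the full three-class iris dataset. This is a sketch under assumptions: it relies on recent scikit-learn versions fitting a multinomial (softmax) model when `LogisticRegression` with the default `lbfgs` solver sees more than two classes, and it uses `OneVsRestClassifier` to build the One-vs-Rest variant explicitly. `max_iter=1000` is an arbitrary choice to give the solver room to converge.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)  # 3 classes, 4 features

# Multinomial (softmax) logistic regression: one joint model over all classes
softmax_model = LogisticRegression(max_iter=1000).fit(X, y)

# One-vs-Rest: one independent binary logistic regression per class
ovr_model = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

# Each row of predict_proba is a distribution over the 3 classes
row_sums = softmax_model.predict_proba(X[:3]).sum(axis=1)

softmax_acc = softmax_model.score(X, y)
ovr_acc = ovr_model.score(X, y)

print("probability rows sum to:", row_sums)
print("binary models inside OvR:", len(ovr_model.estimators_))
print("training accuracy (softmax, OvR):", softmax_acc, ovr_acc)
```

These are training-set accuracies, shown only to confirm that both variants fit; a held-out split would be needed for a fair comparison.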

