1. What is Logistic Regression, and how does it differ from Linear Regression?
--> Logistic Regression is a classification algorithm that predicts the probability of a binary (or multiclass, via extensions) outcome, mapping inputs to values between 0 and 1. Instead of fitting a straight line to y, it fits an S‑shaped curve using the sigmoid (logistic) function. Linear Regression, by contrast, predicts a continuous numeric target and minimizes squared error, whereas Logistic Regression optimizes a likelihood-based cost for categorical decisions.

---

2. What is the mathematical equation of Logistic Regression?
--> For a single instance with feature vector x, the model computes

$$p(y = 1 \mid x) = \sigma(z) = \frac{1}{1 + e^{-z}}, \quad \text{where } z = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k$$

Here σ( ) is the sigmoid function, β are the learned coefficients, and the output p is the estimated probability of the positive class.
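
As a minimal sketch of the formula above, the prediction for one instance can be written in plain Python. The function names `sigmoid` and `predict_proba`, and the coefficient values, are illustrative assumptions rather than part of any particular library.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(x, beta0, betas):
    """p(y=1 | x) = sigmoid(beta0 + sum_j beta_j * x_j)."""
    z = beta0 + sum(b * xj for b, xj in zip(betas, x))
    return sigmoid(z)

# Made-up coefficients and features, chosen so that z is exactly 0.
p = predict_proba(x=[2.0, 1.0], beta0=-1.0, betas=[0.5, 0.0])
# z = -1.0 + 0.5 * 2.0 + 0.0 * 1.0 = 0.0, so p = sigmoid(0) = 0.5
```

A probability of 0.5 corresponds to z = 0, which is why the decision boundary of the model is the set of points where the linear combination equals zero.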

---

3. Why do we use the Sigmoid function in Logistic Regression?
--> The sigmoid squashes any real‑valued input to the open interval (0, 1), letting the linear combination of features be interpreted as a probability. Its smooth, differentiable shape enables gradient‑based optimization, and its log‑odds interpretation ($\log \frac{p}{1-p} = z$) makes coefficients easy to read as log‑odds changes.
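
The log‑odds relationship can be checked numerically. The sketch below assumes two helper names, `sigmoid` and `logit`, and simply confirms that applying the log‑odds to the sigmoid output recovers z.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def logit(p: float) -> float:
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

# logit is the inverse of sigmoid, so the round trip returns z.
round_trip = {z: logit(sigmoid(z)) for z in (-2.0, 0.0, 1.5)}
```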

---

4. What is the cost function of Logistic Regression?
--> Logistic Regression minimizes the negative log‑likelihood, often called binary cross‑entropy:

$$J(\beta) = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \right]$$

where $p_i = \sigma(z_i)$. Minimizing this convex loss finds the parameter set that maximizes the probability of observing the training labels.
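
To make the loss concrete, here is a small sketch that evaluates binary cross‑entropy on made‑up labels and probabilities. The function name and the `eps` clipping constant are assumptions added to keep `log` away from zero.

```python
import math

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Mean negative log-likelihood of binary labels under predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip so log() never sees 0
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)

y = [1, 0, 1, 0]
mostly_right = [0.9, 0.1, 0.8, 0.2]   # high probability on the true class
mostly_wrong = [0.1, 0.9, 0.2, 0.8]   # high probability on the wrong class

loss_good = binary_cross_entropy(y, mostly_right)
loss_bad = binary_cross_entropy(y, mostly_wrong)
loss_coin_flip = binary_cross_entropy([1, 0], [0.5, 0.5])  # equals log(2)
```

Confident wrong predictions are penalized much more heavily than confident correct ones are rewarded, which is what pushes the fitted probabilities toward the observed labels.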

---

5. What is Regularization in Logistic Regression and why is it needed?
--> Regularization adds a penalty term to the cost function to discourage overly large coefficients, thereby reducing variance and mitigating overfitting. By shrinking or constraining β, the model becomes simpler, more stable on unseen data, and less sensitive to multicollinearity.

---

6. Explain the difference between Lasso, Ridge, and Elastic Net regression.
--> Ridge (L2) adds $\lambda \sum_j \beta_j^2$, shrinking coefficients smoothly toward zero without eliminating any.
Lasso (L1) adds $\lambda \sum_j |\beta_j|$, which can drive some coefficients exactly to zero, performing built‑in feature selection.
Elastic Net blends both penalties: $\lambda \left[ \alpha \sum_j |\beta_j| + (1 - \alpha) \sum_j \beta_j^2 \right]$, balancing Ridge’s stability with Lasso’s sparsity.
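
The three penalty terms can be computed directly. This is a sketch with hypothetical function names and made‑up coefficients; it assumes the intercept is left out of the penalty, which is the usual convention.

```python
def ridge_penalty(betas, lam):
    """L2 penalty: lam * sum of squared coefficients."""
    return lam * sum(b * b for b in betas)

def lasso_penalty(betas, lam):
    """L1 penalty: lam * sum of absolute coefficients."""
    return lam * sum(abs(b) for b in betas)

def elastic_net_penalty(betas, lam, alpha):
    """Blend of L1 and L2; alpha=1 is pure Lasso, alpha=0 is pure Ridge."""
    l1 = sum(abs(b) for b in betas)
    l2 = sum(b * b for b in betas)
    return lam * (alpha * l1 + (1.0 - alpha) * l2)

betas = [0.5, -2.0, 0.0]  # made-up coefficients, intercept excluded
ridge = ridge_penalty(betas, lam=0.1)                      # 0.1 * 4.25
lasso = lasso_penalty(betas, lam=0.1)                      # 0.1 * 2.5
enet = elastic_net_penalty(betas, lam=0.1, alpha=0.5)      # 0.1 * (1.25 + 2.125)
```

Note how the large coefficient (-2.0) dominates the Ridge term because it is squared, while Lasso charges every unit of magnitude at the same rate.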

---

7. When should we use Elastic Net instead of Lasso or Ridge?
--> Elastic Net excels when you have many correlated predictors or expect only a subset to be truly influential. Ridge alone can’t drop redundant features, and Lasso may arbitrarily pick one among correlated variables; Elastic Net tends to share weights across correlated groups while still allowing others to shrink to zero, yielding a more reliable, interpretable model.

---

8. What is the impact of the regularization parameter (λ) in Logistic Regression?
--> λ controls the strength of the penalty: a larger λ enforces more shrinkage, increasing bias but lowering variance (risking underfitting), while a smaller λ allows coefficients to grow, reducing bias but heightening variance (risking overfitting). Model performance typically peaks at an intermediate λ found via cross‑validation.
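
The shrinkage effect can be seen on a toy problem. The sketch below fits a one‑feature model by plain gradient descent with an L2 penalty; the data, learning rate, step count, and the helper name `fit_l2` are all illustrative assumptions, and the intercept is left unpenalized.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_l2(xs, ys, lam, lr=0.1, steps=2000):
    """Gradient descent on mean cross-entropy + lam * b1**2."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(b0 + b1 * x) - y   # derivative of the loss w.r.t. z
            g0 += err / n
            g1 += err * x / n
        g1 += 2.0 * lam * b1                 # gradient of the L2 penalty
        b0 -= lr * g0
        b1 -= lr * g1
    return b0, b1

# Toy, deliberately non-separable data (two labels cross over near x = 0).
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 1, 0, 1, 1]

slopes = {lam: fit_l2(xs, ys, lam)[1] for lam in (0.0, 0.1, 1.0)}
for lam, b1 in slopes.items():
    print(f"lambda={lam}: slope={b1:.3f}")
```

The fitted slope shrinks toward zero as λ grows, which is the bias side of the trade‑off described above.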

---

9. What are the key assumptions of Logistic Regression?

--> Logistic Regression assumes that:

The log‑odds of the outcome are linearly related to the predictors.

Observations are independent of one another.

There is little or no multicollinearity among predictors.

The sample is large enough for maximum‑likelihood estimates to be reliable.

Unlike Linear Regression, it does not require homoscedasticity or normally distributed residuals.

---

10. What are some alternatives to Logistic Regression for classification tasks?
--> Common alternatives include Decision Trees, Random Forests, Gradient Boosting (e.g., XGBoost, LightGBM), Support Vector Machines, k‑Nearest Neighbors, Naïve Bayes, and neural networks (from simple MLPs to deep architectures like CNNs or Transformers). Choice depends on data size, feature types, interpretability needs, and non‑linearity of relationships.

---

11. What are Classification Evaluation Metrics?
--> Classification metrics quantify how well a model distinguishes between classes. Key metrics include Accuracy, Precision, Recall, F1-Score, and ROC-AUC. Accuracy is easy to understand but can be misleading when classes are imbalanced; the F1-score, the harmonic mean of precision and recall, is more informative in that setting.
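
As a rough illustration, the threshold‑based metrics can be computed from the confusion counts. The function name and the toy labels are assumptions; ROC-AUC is left out because it needs predicted scores rather than hard labels.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Imbalanced toy data: 2 positives out of 10, and the model finds only one.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
# accuracy = 0.9 and precision = 1.0, but recall is only 0.5
```

Here accuracy looks strong even though half of the positives are missed, which is why recall and F1 are worth reporting alongside it.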

---

12. How does class imbalance affect Logistic Regression?
--> Class imbalance biases predictions toward the majority class, reducing the model’s ability to detect the minority class. Accuracy can then look high while recall and F1-score for the minority class are poor. Common remedies include class weighting, resampling (over‑ or undersampling), adjusting the decision threshold, or, for extreme imbalance, anomaly detection methods.
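
One common weighting heuristic gives each class a weight inversely proportional to its frequency, `n_samples / (n_classes * class_count)`; to the best of my knowledge this is the formula scikit-learn documents for `class_weight="balanced"`. The helper name below is hypothetical and the labels are made up.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * cnt) for c, cnt in counts.items()}

labels = [0] * 90 + [1] * 10          # 90% majority, 10% minority
weights = balanced_class_weights(labels)
# minority weight: 100 / (2 * 10) = 5.0; majority weight: 100 / (2 * 90)

# With these weights both classes contribute the same total weight to the loss.
total_majority = 90 * weights[0]
total_minority = 10 * weights[1]
```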

---

13. What is Hyperparameter Tuning in Logistic Regression?
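--> Hyperparameter tuning means selecting values for settings that are not learned from the data, such as the regularization strength and the penalty type (L1/L2). In scikit-learn the strength is set through C, which is the inverse of the regularization strength, so a smaller C means stronger regularization. Tuning is typically done with grid search or randomized search under cross-validation, and it improves generalization by steering the model away from both underfitting and overfitting.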
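
A grid search over C might look like the sketch below, assuming scikit-learn is installed. The synthetic dataset, the grid values, the F1 scoring choice, and the 5‑fold setting are illustrative assumptions, not recommendations; only C is tuned here to keep the example short.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic data standing in for a real training set.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Candidate values for C (inverse regularization strength).
param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}

search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid,
    cv=5,            # 5-fold cross-validation
    scoring="f1",
)
search.fit(X, y)

best_c = search.best_params_["C"]
best_score = search.best_score_   # mean cross-validated F1 for best_c
```

After fitting, `search.best_estimator_` holds a model refit on all the data with the selected C.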

---

14. What are different solvers in Logistic Regression? Which one should be used?
--> Common solvers (as named in scikit-learn) include liblinear, saga, newton-cg, lbfgs, and sag.

liblinear is good for small datasets and supports both L1 and L2 regularization.

saga handles large datasets and supports L1, L2, and Elastic Net.

lbfgs is efficient for multiclass and dense data; like newton-cg and sag, it supports L2 (or no penalty) only.

Choose based on data size, sparsity, and the regularization you need.

---

15. How is Logistic Regression extended for multiclass classification?
--> Logistic Regression can be extended to multiclass problems using One-vs-Rest (OvR) or Softmax (Multinomial) strategies. OvR trains one binary classifier per class, while Softmax estimates probabilities across all classes simultaneously in a single model. Most libraries (like scikit-learn) support both.

---

16. What are the advantages and disadvantages of Logistic Regression?
--> Advantages: It’s easy to implement, interpretable, efficient on small datasets, and works well when classes are close to linearly separable.
Disadvantages: It struggles with non-linear patterns, is sensitive to outliers, assumes linearity in the log-odds, and doesn’t capture complex feature interactions unless they are explicitly modeled.

---

17. What are some use cases of Logistic Regression?
--> Logistic Regression is widely used in spam detection, credit scoring, disease diagnosis (e.g., diabetes prediction), customer churn prediction, and ad click prediction. Its interpretability makes it popular in regulated industries like healthcare and finance.

---

18. What is the difference between Softmax Regression and Logistic Regression?
--> Logistic Regression is binary and uses the sigmoid function to model two-class problems. Softmax Regression (Multinomial Logistic Regression) generalizes this to multiple classes using the softmax function, which ensures predicted class probabilities sum to 1 across all classes.
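
The connection between the two can be sketched in a few lines. The function name `softmax` and the example scores are assumptions; the code shows that the outputs sum to 1 and that, with two classes and one score fixed at 0, softmax reduces to the sigmoid.

```python
import math

def softmax(logits):
    """Convert a list of class scores into probabilities that sum to 1."""
    m = max(logits)                       # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])          # three-class example
prob_sum = sum(probs)

# Two classes with scores (z, 0): the first probability equals sigmoid(z).
z = 1.5
two_class = softmax([z, 0.0])[0]
sigmoid_z = 1.0 / (1.0 + math.exp(-z))
```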

---

19. How do we choose between One-vs-Rest (OvR) and Softmax for multiclass classification?
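--> OvR is simpler: it trains one binary model per class and works well when classes are well-separated, but each model is fit independently, so the raw per-class probabilities need not sum to 1. Softmax handles all classes in one model and produces a single probability distribution, which makes it the better choice when classes are mutually exclusive but interrelated or overlapping. For large, balanced datasets with interconnected classes, Softmax is usually preferred.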

---

20. How do we interpret coefficients in Logistic Regression?
--> Each coefficient represents the change in the log-odds of the target class for a one-unit increase in the predictor, keeping other variables constant. A positive coefficient increases the probability of the positive class; a negative one decreases it. Exponentiating the coefficient gives the odds ratio, useful for interpretation.
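
As a small worked example, odds ratios are obtained by exponentiating the coefficients. The feature names and coefficient values below are hypothetical.

```python
import math

# Hypothetical fitted coefficients (log-odds change per one-unit increase).
coefficients = {"age": 0.7, "income": -0.2}

odds_ratios = {name: math.exp(beta) for name, beta in coefficients.items()}
for name, ratio in odds_ratios.items():
    print(f"{name}: odds ratio = {ratio:.3f}")

# Check the interpretation: raising z by beta multiplies the odds by exp(beta).
z = 0.3                                   # arbitrary starting log-odds
odds_before = math.exp(z)
odds_after = math.exp(z + coefficients["age"])
ratio_check = odds_after / odds_before
```

An odds ratio above 1 means the odds of the positive class rise with the feature, and a ratio below 1 means they fall; with β = 0.7 the odds roughly double for each one‑unit increase.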