1. What is Logistic Regression, and how does it differ from Linear Regression?
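Ans: Logistic Regression is a classification method. It predicts the probability (between 0 and 1) that an example belongs to the positive class, and a threshold (typically 0.5) turns that probability into a class label of 0 or 1. Linear Regression predicts a continuous numeric value.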





2. What is the mathematical equation of Logistic Regression?
Ans:

$$P(y=1 \mid x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_n x_n)}}$$
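As a numerical check of this formula, the sketch below fits scikit-learn's `LogisticRegression` on synthetic data (generated with `make_classification`, an assumption made only for illustration). It then recomputes P(y=1 | x) by hand from `intercept_` (β0) and `coef_` (β1 ... βn) and compares the result with `predict_proba`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary data (illustrative only)
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
clf = LogisticRegression().fit(X, y)

beta0 = clf.intercept_[0]             # β0 (intercept)
beta = clf.coef_[0]                   # β1 ... βn (one coefficient per feature)
z = beta0 + X @ beta                  # β0 + β1 x1 + ... + βn xn for every row
p_manual = 1.0 / (1.0 + np.exp(-z))   # sigmoid of z = P(y=1 | x)

p_sklearn = clf.predict_proba(X)[:, 1]  # column 1 holds the probability of class 1
print("Manual and scikit-learn probabilities match:", np.allclose(p_manual, p_sklearn))
```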





3. Why do we use the Sigmoid function in Logistic Regression?
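Ans: The sigmoid function, σ(z) = 1 / (1 + e^(-z)), maps any real number to the open interval (0, 1). Its output can therefore be read as a probability, which is then thresholded (typically at 0.5) for binary classification.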
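A small sketch of the sigmoid itself shows this squashing behaviour: large negative inputs approach 0, large positive inputs approach 1, and z = 0 maps to exactly 0.5, the usual decision threshold. The helper name `sigmoid` is just for illustration.

```python
import math

def sigmoid(z: float) -> float:
    """Map any real number z to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

for z in [-10, -2, 0, 2, 10]:
    print(f"sigmoid({z:>3}) = {sigmoid(z):.4f}")
```

This plain-`math` version is fine for moderate inputs; for very negative z, `math.exp(-z)` can overflow, which is why numerical libraries use more careful implementations.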





5. What is Regularization in Logistic Regression? Why is it needed?
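Ans: Regularization adds a penalty on the size of the coefficients to the loss function (L1: sum of absolute values; L2: sum of squares). It is needed to reduce overfitting: without it, the model can grow large coefficients that fit noise in the training data, especially when there are many or correlated features.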




6. Explain the difference between Lasso, Ridge, and Elastic Net regression.
Ans:

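Lasso (L1): Can shrink some coefficients exactly to zero, which performs feature selection.

Ridge (L2): Shrinks all coefficients toward zero, but typically none exactly to zero.

Elastic Net: Combines the L1 and L2 penalties, mixed by a ratio (l1_ratio in scikit-learn).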
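In logistic regression these three options appear as penalty choices. The sketch below assumes a synthetic dataset with 20 features of which only 3 are informative, and a fairly strong penalty (C=0.05, chosen for illustration). It counts how many coefficients each penalty sets exactly to zero; the L1 model is expected to zero out some noise features while the L2 model keeps all of them non-zero.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# 20 features, only 3 informative: a setting where L1 can discard noise features
X, y = make_classification(n_samples=500, n_features=20, n_informative=3,
                           random_state=0)

models = {
    "L1 (Lasso-style)": LogisticRegression(penalty="l1", solver="liblinear", C=0.05),
    "L2 (Ridge-style)": LogisticRegression(penalty="l2", solver="lbfgs", C=0.05),
    # elasticnet requires the saga solver and an l1_ratio between 0 and 1
    "Elastic Net": LogisticRegression(penalty="elasticnet", solver="saga",
                                      l1_ratio=0.5, C=0.05, max_iter=5000),
}

zero_counts = {}
for name, clf in models.items():
    clf.fit(X, y)
    zero_counts[name] = int(np.sum(clf.coef_ == 0))
    print(f"{name}: {zero_counts[name]} of {clf.coef_.size} coefficients are exactly zero")
```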





7. When should we use Elastic Net instead of Lasso or Ridge?
Ans: When you have many correlated features or want a balance between feature selection and coefficient shrinkage.




8. What is the impact of the regularization parameter (λ or C)?
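Ans: In scikit-learn, C is the inverse of the regularization strength (C = 1/λ). A higher λ (lower C) means stronger regularization: coefficients shrink more, which reduces overfitting but can cause underfitting. A lower λ (higher C) weakens the penalty and lets the model fit the training data more closely.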
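The effect of C can be seen by fitting the same model for several values and measuring the size of the learned coefficient vector. This sketch assumes synthetic data and the default L2 penalty; as C grows (λ shrinks), the coefficient norm should grow.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=10, random_state=1)

Cs = [0.001, 0.01, 0.1, 1, 10]
norms = []
for C in Cs:
    clf = LogisticRegression(C=C, max_iter=1000).fit(X, y)
    # ||coef||: overall size of the learned coefficients (intercept excluded)
    norms.append(np.linalg.norm(clf.coef_))
    print(f"C={C:<6} lambda=1/C={1/C:<8g} ||coef||={norms[-1]:.4f}")
```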




9. What are the key assumptions of Logistic Regression?
Ans:

Linear relationship between features and the log-odds of the outcome

Little or no multicollinearity among features

Independence of observations

Sufficiently large sample size for stable coefficient estimates
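One common way to check the multicollinearity assumption is the variance inflation factor, VIF_j = 1 / (1 - R_j²), where R_j² comes from regressing feature j on the other features. The sketch below uses made-up data with one nearly duplicated feature and computes VIF through the equivalent identity "diagonal of the inverse correlation matrix". A frequently quoted rule of thumb treats VIF above about 5 or 10 as a warning sign.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)  # nearly a copy of x1: strong multicollinearity
x3 = rng.normal(size=n)                  # independent feature
X = np.column_stack([x1, x2, x3])

# VIF_j = 1 / (1 - R_j^2), equal to the j-th diagonal entry of the inverse correlation matrix
vif = np.diag(np.linalg.inv(np.corrcoef(X, rowvar=False)))
for name, v in zip(["x1", "x2", "x3"], vif):
    print(f"{name}: VIF = {v:.1f}")
```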






10. What are some alternatives to Logistic Regression for classification tasks?
Ans:

Decision Trees

Random Forest

Support Vector Machines (SVM)

k-Nearest Neighbors (k-NN)

Naive Bayes

Neural Networks
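A quick way to compare these alternatives with Logistic Regression is 5-fold cross-validation on the same data. This sketch assumes the built-in Iris dataset and mostly default hyperparameters; the scores only illustrate the workflow and say little about which model is better in general.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Scale-sensitive models get a StandardScaler; trees and Naive Bayes do not need one
models = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression()),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "k-NN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "Naive Bayes": GaussianNB(),
    "Neural Network": make_pipeline(StandardScaler(),
                                    MLPClassifier(max_iter=2000, random_state=0)),
}

scores = {}
for name, clf in models.items():
    scores[name] = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name:<20} mean CV accuracy = {scores[name]:.3f}")
```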



11. What are Classification Evaluation Metrics?
Ans: Accuracy, Precision, Recall, F1-Score, ROC-AUC, Confusion Matrix, etc.
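Several of these metrics come straight from the confusion-matrix counts. The sketch below uses a tiny made-up set of labels, computes precision, recall, F1, and accuracy by hand from TP, FP, FN, and TN, and can be checked against scikit-learn's own functions.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# Toy labels, made up for illustration
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

# For binary labels [0, 1], the matrix is [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
precision = tp / (tp + fp)  # of the predicted positives, how many are correct
recall = tp / (tp + fn)     # of the actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / (tp + tn + fp + fn)

print("TP, FP, FN, TN:", tp, fp, fn, tn)
print("Precision:", precision, "Recall:", recall, "F1:", f1, "Accuracy:", accuracy)
print("sklearn F1:", f1_score(y_true, y_pred))
```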




12. How does class imbalance affect Logistic Regression?
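Ans: It can bias the model toward the majority class, so minority-class recall suffers and plain accuracy becomes misleading. Common remedies are resampling (oversampling the minority class or undersampling the majority class), class weights (class_weight='balanced' in scikit-learn), and synthetic oversampling such as SMOTE.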
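The class-weight remedy can be made concrete. With class_weight='balanced', scikit-learn weights each class by n_samples / (n_classes * count_of_class), so errors on the rare class cost more in the loss. The sketch below uses made-up labels with a 90/10 split and checks `compute_class_weight` against that formula.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

y = np.array([0] * 90 + [1] * 10)  # toy labels: 90% majority, 10% minority
classes = np.unique(y)

weights = compute_class_weight(class_weight="balanced", classes=classes, y=y)
# "balanced" formula: n_samples / (n_classes * count_of_class)
manual = len(y) / (len(classes) * np.bincount(y))

print(dict(zip(classes.tolist(), weights.round(3).tolist())))
```

Here the minority class receives a weight nine times larger than the majority class, mirroring the 90/10 imbalance.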


13. What is Hyperparameter Tuning in Logistic Regression?
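Ans: The process of searching for the hyperparameter values (settings chosen before training, such as C, penalty, and solver) that give the best cross-validated performance, using methods like GridSearchCV or RandomizedSearchCV. Unlike the coefficients, hyperparameters are not learned from the data during fitting.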
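A minimal GridSearchCV sketch, assuming the built-in breast-cancer dataset and a small grid over C and penalty. The scaler sits inside a pipeline so it is refitted on each cross-validation training fold; the `logisticregression__` prefix is the step name that `make_pipeline` generates automatically.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# liblinear supports both the l1 and l2 penalties searched below
pipe = make_pipeline(StandardScaler(), LogisticRegression(solver="liblinear"))
param_grid = {
    "logisticregression__C": [0.01, 0.1, 1, 10],
    "logisticregression__penalty": ["l1", "l2"],
}
search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

print("Best params:", search.best_params_)
print("Test accuracy:", search.score(X_test, y_test))
```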




14. What are different solvers in Logistic Regression? Which one should be used?
Ans:

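liblinear: Good for small datasets; supports L1 and L2 (multiclass only via one-vs-rest)

saga: Suited to large datasets; supports L1, L2, and elasticnet

lbfgs: scikit-learn's default; fast and accurate, supports L2 (or no penalty)

newton-cg: Supports L2 (or no penalty)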



15. How is Logistic Regression extended for multiclass classification?
Ans: Using One-vs-Rest (OvR), which trains one binary classifier per class, or Softmax (multinomial) regression, which fits a single model over all classes at once.
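Both strategies are available in scikit-learn. This sketch assumes the 3-class Iris dataset: `OneVsRestClassifier` builds one binary logistic model per class, while a plain `LogisticRegression` with the default lbfgs solver fits a single multinomial (softmax) model for multiclass targets in recent scikit-learn versions.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)  # 3 classes, 4 features

# One-vs-Rest: one binary logistic model per class
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
# Softmax (multinomial): one joint model with a coefficient row per class
softmax = LogisticRegression(max_iter=1000).fit(X, y)

print("OvR binary models:", len(ovr.estimators_))
print("Softmax coef_ shape:", softmax.coef_.shape)  # (n_classes, n_features)
print("OvR accuracy:", ovr.score(X, y), "Softmax accuracy:", softmax.score(X, y))
```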




16. What are the advantages and disadvantages of Logistic Regression?
Ans:

✅ Simple, fast, interpretable

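❌ Assumes a linear relationship between features and log-odds (a linear decision boundary), so it struggles with complex, non-linear data unless features are engineered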



17. What are some use cases of Logistic Regression?
Ans:

Spam detection

Disease diagnosis

Credit scoring

Customer churn prediction




18. What is the difference between Softmax Regression and Logistic Regression?
Ans:

Logistic: Binary classification

Softmax: Multiclass classification with probabilities across multiple classes
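The two are closely related: softmax turns a vector of class scores into probabilities that sum to 1, and with exactly two classes it reduces to the sigmoid of the score difference, since e^z1 / (e^z0 + e^z1) = 1 / (1 + e^-(z1 - z0)). The sketch below checks this numerically; the function names and scores are illustrative.

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    """Softmax over the last axis; subtracting the max keeps exp() from overflowing."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

scores = np.array([2.0, 1.0, 0.1])  # raw class scores (logits) for one example, 3 classes
probs = softmax(scores)
print("Softmax probabilities:", probs, "sum:", probs.sum())

# With two classes, softmax([z0, z1])[1] equals sigmoid(z1 - z0)
z0, z1 = 0.3, 1.7
print(softmax(np.array([z0, z1]))[1], sigmoid(z1 - z0))
```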




19. How do we choose between One-vs-Rest (OvR) and Softmax for multiclass classification?
Ans:

Use OvR for simplicity and small datasets

Use Softmax for true multiclass problems with mutual exclusivity




20. How do we interpret coefficients in Logistic Regression?
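Ans: Each coefficient is the change in the log-odds of the positive class for a one-unit increase in that feature, holding the other features constant. Exponentiating a coefficient gives an odds ratio: exp(β) is the factor by which the odds are multiplied for a one-unit increase in that feature.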
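This interpretation can be verified on a fitted model. The sketch assumes synthetic data with three informative features. It increases feature 0 of one example by one unit and checks that the log-odds (`decision_function`) rise by exactly `coef_[0, 0]` and that the odds are multiplied by exp(`coef_[0, 0]`).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=3, n_informative=3,
                           n_redundant=0, random_state=0)
clf = LogisticRegression().fit(X, y)

x = X[:1].copy()      # one example (kept 2-D for scikit-learn)
x_plus = x.copy()
x_plus[0, 0] += 1.0   # increase feature 0 by one unit, others unchanged

# decision_function returns the log-odds β0 + β·x
delta_log_odds = clf.decision_function(x_plus)[0] - clf.decision_function(x)[0]
odds_ratio = np.exp(clf.coef_[0, 0])

p_before, p_after = clf.predict_proba(x)[0, 1], clf.predict_proba(x_plus)[0, 1]
odds_change = (p_after / (1 - p_after)) / (p_before / (1 - p_before))

print("coef[0]:", clf.coef_[0, 0])
print("change in log-odds:", delta_log_odds)
print("odds ratio exp(coef[0]):", odds_ratio, "observed odds change:", odds_change)
```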



# Practical

In [None]:
#1. RandomizedSearchCV for Tuning

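from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

# Assumes X_train, X_test, y_train, y_test already exist from an earlier train/test split
params = {'C': [0.01, 0.1, 1, 10], 'penalty': ['l1', 'l2'], 'solver': ['liblinear']}
# The grid has only 8 combinations, so sample fewer than that (the default n_iter=10 would warn)
rs = RandomizedSearchCV(LogisticRegression(), params, n_iter=5, cv=3, random_state=42)
rs.fit(X_train, y_train)
print("Best Params:", rs.best_params_)
print("Accuracy:", rs.score(X_test, y_test))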


In [None]:
#2. One-vs-One (OvO) Multiclass Logistic Regression

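from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier
# Trains one binary classifier for every pair of classes
ovo = OneVsOneClassifier(LogisticRegression(max_iter=1000))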
ovo.fit(X_train, y_train)
print("OvO Accuracy:", ovo.score(X_test, y_test))


In [None]:
#3. Confusion Matrix Visualization

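import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
model = LogisticRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
# Label the axes with the model's class names
ConfusionMatrixDisplay(cm, display_labels=model.classes_).plot()
plt.show()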


In [None]:
#4. Precision, Recall, F1-Score

from sklearn.metrics import precision_score, recall_score, f1_score
print("Precision:", precision_score(y_test, y_pred, average='macro'))
print("Recall:", recall_score(y_test, y_pred, average='macro'))
print("F1 Score:", f1_score(y_test, y_pred, average='macro'))


In [None]:
#5. Imbalanced Data with Class Weights

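from sklearn.metrics import balanced_accuracy_score
# 'balanced' weights each class inversely to its frequency in y_train
model = LogisticRegression(class_weight='balanced')
model.fit(X_train, y_train)
print("Accuracy (class_weight='balanced'):", model.score(X_test, y_test))
# Plain accuracy can hide poor minority-class performance; balanced accuracy averages per-class recall
print("Balanced Accuracy:", balanced_accuracy_score(y_test, model.predict(X_test)))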

In [None]:
#6. Titanic Dataset - Missing Values Handling

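import pandas as pd
df = pd.read_csv('titanic.csv')
# Assign the result back instead of calling fillna(..., inplace=True) on a column,
# which is deprecated chained assignment in recent pandas versions
df['Age'] = df['Age'].fillna(df['Age'].median())
df['Embarked'] = df['Embarked'].fillna(df['Embarked'].mode()[0])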

In [None]:
#7. Feature Scaling (Standardization)

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

In [None]:
#8. ROC-AUC Score

from sklearn.metrics import roc_auc_score
prob = model.predict_proba(X_test)
# multi_class='ovr' handles multiclass targets; for a binary target pass only prob[:, 1]
print("ROC-AUC:", roc_auc_score(y_test, prob, multi_class='ovr'))


In [None]:
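#9. Custom Regularization Strength (C=0.5)

# C is the inverse of the regularization strength (not a learning rate); smaller C means a stronger penalty
model = LogisticRegression(C=0.5)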
model.fit(X_train, y_train)
print("Accuracy:", model.score(X_test, y_test))


In [None]:
#10. Feature Importance

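import numpy as np
# Coefficient magnitudes are only comparable as importances when features share a scale (e.g. standardized)
print("Feature Importance:", np.abs(model.coef_))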


In [None]:
#11. Cohen’s Kappa Score

from sklearn.metrics import cohen_kappa_score
print("Cohen Kappa Score:", cohen_kappa_score(y_test, y_pred))


In [None]:
#12. Precision-Recall Curve

from sklearn.metrics import PrecisionRecallDisplay
PrecisionRecallDisplay.from_estimator(model, X_test, y_test)



In [None]:
#13. Solvers Comparison

for solver in ['liblinear', 'saga', 'lbfgs']:
    m = LogisticRegression(solver=solver, max_iter=200)
    m.fit(X_train, y_train)
    print(solver, "accuracy:", m.score(X_test, y_test))


In [None]:
#14. Matthews Correlation Coefficient (MCC)

from sklearn.metrics import matthews_corrcoef
print("MCC:", matthews_corrcoef(y_test, y_pred))


In [None]:
#15. Compare Raw vs Standardized Accuracy

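from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Split first, then fit the scaler on the training split only (avoids test-set leakage)
X_train_r, X_test_r, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
raw_model = LogisticRegression(max_iter=1000).fit(X_train_r, y_train)
print("Raw Accuracy:", raw_model.score(X_test_r, y_test))

scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train_r)
X_test_s = scaler.transform(X_test_r)
model = LogisticRegression(max_iter=1000)
model.fit(X_train_s, y_train)
print("Scaled Accuracy:", model.score(X_test_s, y_test))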


In [None]:
#16. Optimal C using Cross-Validation

from sklearn.model_selection import cross_val_score
for c in [0.01, 0.1, 1, 10]:
    score = cross_val_score(LogisticRegression(C=c), X, y, cv=5).mean()
    print("C=", c, "Score:", score)


In [None]:
#17. Save and Load Model with joblib

import joblib
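joblib.dump(model, 'logreg_model.pkl')
model_loaded = joblib.load('logreg_model.pkl')
# The saved model was trained on standardized features in #15, so score it on the matching scaled test split
print("Loaded model accuracy:", model_loaded.score(X_test_s, y_test))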


In [None]:
#18. Train Logistic Regression on the Titanic dataset, handle missing values, and evaluate performance

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load Titanic dataset
df = pd.read_csv("https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv")

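# Handle missing values: keep a few useful columns and drop rows with missing values
# (among these columns, only Age has missing entries)
df = df[['Survived', 'Pclass', 'Sex', 'Age', 'Fare']].dropna()
# Encode the categorical Sex column as 0/1
df['Sex'] = df['Sex'].map({'male': 0, 'female': 1})

X = df.drop('Survived', axis=1)
y = df['Survived']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Extra iterations help lbfgs converge on unscaled features such as Fare
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)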
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))


In [None]:
#19. Apply feature scaling (Standardization) before training a Logistic Regression model


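from sklearn.preprocessing import StandardScaler

# Split first, then fit the scaler on the training split only so no test-set statistics leak into training
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

model = LogisticRegression()
model.fit(X_train, y_train)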
print("Accuracy with scaling:", model.score(X_test, y_test))


In [None]:
#20. Evaluate performance using ROC-AUC score

from sklearn.metrics import roc_auc_score
probs = model.predict_proba(X_test)[:, 1]
print("ROC-AUC Score:", roc_auc_score(y_test, probs))


In [None]:
#21. Train with a custom regularization strength (C = 0.5; C is the inverse of regularization strength, not a learning rate) and evaluate

model = LogisticRegression(C=0.5)
model.fit(X_train, y_train)
print("Accuracy:", model.score(X_test, y_test))



In [None]:
#22. Identify important features based on model coefficients


# Coefficients are comparable here because the features were standardized in #19
for name, coef in zip(X.columns, model.coef_[0]):
    print(f"{name}: {coef:.4f}")


In [None]:
#23. Evaluate performance using Cohen’s Kappa Score

from sklearn.metrics import cohen_kappa_score
print("Cohen Kappa Score:", cohen_kappa_score(y_test, model.predict(X_test)))

In [None]:
#24. Visualize the Precision-Recall Curve for binary classification

from sklearn.metrics import PrecisionRecallDisplay
PrecisionRecallDisplay.from_estimator(model, X_test, y_test)


In [None]:
#25. Compare accuracy of different solvers: liblinear, saga, lbfgs


solvers = ['liblinear', 'saga', 'lbfgs']
for solver in solvers:
    model = LogisticRegression(solver=solver, max_iter=200)
    model.fit(X_train, y_train)
    print(f"{solver} accuracy:", model.score(X_test, y_test))
