
**Theoretical**

### **1. What is Logistic Regression, and how does it differ from Linear Regression?**  
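Logistic Regression is a classification algorithm used to predict categorical outcomes (e.g., spam vs. not spam). It passes a linear combination of the features through the **sigmoid function** to produce a probability between 0 and 1, which is then thresholded (commonly at 0.5) to assign a class. In contrast, Linear Regression predicts continuous values directly from the linear combination.  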

### **2. What is the mathematical equation of Logistic Regression?**  
The equation is:  
\[
P(Y=1) = \frac{1}{1 + e^{-(\beta_0 + \beta_1X_1 + \beta_2X_2 + ... + \beta_nX_n)}}
\]  
where **P(Y=1)** is the probability of the positive class, and **β** are the regression coefficients.  

### **3. Why do we use the Sigmoid function in Logistic Regression?**  
The sigmoid function maps any real-valued number to a probability between **0 and 1**, which helps in classification tasks by converting linear outputs into probabilities.  
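
As a rough illustration of the mapping described above, the sketch below applies a sigmoid to one linear score. The coefficient values and the observation are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import numpy as np

def sigmoid(z):
    """Map a real-valued score to the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical intercept, coefficients, and observation (illustration only).
beta_0 = -1.0
beta = np.array([0.8, -0.5])
x = np.array([2.0, 1.0])

z = beta_0 + beta @ x   # linear score (log-odds): -1.0 + 1.6 - 0.5 = 0.1
p = sigmoid(z)          # probability of the positive class
print(f"log-odds = {z:.2f}, P(Y=1) = {p:.3f}")
```

A score of 0 maps to exactly 0.5, large positive scores approach 1, and large negative scores approach 0.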

### **4. What is the cost function of Logistic Regression?**  
The cost function used is the **Log Loss (Binary Cross-Entropy)**:  
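\[
J(\beta) = - \frac{1}{m} \sum_{i=1}^{m} \left[ y_i \log(h_i) + (1 - y_i) \log(1 - h_i) \right]
\]  
where \( m \) is the number of observations, \( y_i \) is the true label (0 or 1), and \( h_i \) is the predicted probability \( P(Y=1) \) for observation \( i \). The loss grows without bound as a confident prediction turns out wrong, so confident mistakes are penalized most heavily.  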
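
The formula can be checked numerically. The sketch below computes the log loss by hand on a small made-up set of labels and predicted probabilities (both are assumptions for illustration) and compares it with scikit-learn's `log_loss`.

```python
import numpy as np
from sklearn.metrics import log_loss

# Made-up labels and predicted probabilities of the positive class.
y_true = np.array([1, 0, 1, 1, 0])
h = np.array([0.9, 0.2, 0.6, 0.3, 0.1])

# Binary cross-entropy, written term by term as in the formula above.
manual = -np.mean(y_true * np.log(h) + (1 - y_true) * np.log(1 - h))

# scikit-learn treats a 1-D probability array as P(Y=1) for binary labels.
reference = log_loss(y_true, h)

print(f"manual = {manual:.4f}, sklearn = {reference:.4f}")
```

The fourth observation (true label 1, predicted 0.3) contributes the largest term, which is the penalty for a wrong-leaning prediction.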

### **5. What is Regularization in Logistic Regression? Why is it needed?**  
Regularization prevents overfitting by adding a penalty to large coefficients. It ensures that the model generalizes well to unseen data. **L1 (Lasso) and L2 (Ridge) regularization** are commonly used.  

### **6. Explain the difference between Lasso, Ridge, and Elastic Net Regression.**  
- **Lasso (L1)**: Shrinks some coefficients to zero, performing feature selection.  
- **Ridge (L2)**: Shrinks coefficients but does not eliminate them.  
- **Elastic Net**: A mix of L1 and L2, useful when there are many correlated features.  
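
The difference between L1 and L2 shrinkage can be seen by counting exact zeros in the fitted coefficients. This is a minimal sketch on synthetic data; the dataset shape, the `C=0.05` setting, and the variable names are assumptions made for the demonstration, and Elastic Net is left out to keep it short.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data (illustration only): 3 informative features, 17 noise features.
X_demo, y_demo = make_classification(
    n_samples=300, n_features=20, n_informative=3, n_redundant=0, random_state=0
)
X_demo = StandardScaler().fit_transform(X_demo)

# Same regularization strength for both penalties so only the penalty type differs.
l1_model = LogisticRegression(penalty='l1', solver='liblinear', C=0.05).fit(X_demo, y_demo)
l2_model = LogisticRegression(penalty='l2', solver='liblinear', C=0.05).fit(X_demo, y_demo)

zeros_l1 = int(np.sum(l1_model.coef_ == 0))
zeros_l2 = int(np.sum(l2_model.coef_ == 0))
print(f"Coefficients set exactly to zero -> L1: {zeros_l1}, L2: {zeros_l2}")
```

With L1, many of the noise features should drop out entirely, while L2 keeps every coefficient small but non-zero.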

### **7. When should we use Elastic Net instead of Lasso or Ridge?**  
Elastic Net is preferred when **features are highly correlated**. It benefits from Lasso’s feature selection and Ridge’s ability to handle multicollinearity.  

### **8. What is the impact of the regularization parameter (λ) in Logistic Regression?**  
- **Higher λ** → More regularization → Simpler model, but may underfit.  
- **Lower λ** → Less regularization → More complex model, but may overfit.  
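
In scikit-learn the strength is set through `C`, the inverse of λ, so a small `C` means strong regularization. The sketch below, which assumes the built-in breast cancer dataset and three arbitrary `C` values, shows how the size of the coefficient vector grows as regularization is relaxed.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X_bc, y_bc = load_breast_cancer(return_X_y=True)
X_bc = StandardScaler().fit_transform(X_bc)  # put features on a common scale

coef_norms = {}
for C in [0.01, 1, 100]:  # small C = large lambda = strong regularization
    clf = LogisticRegression(C=C, max_iter=5000).fit(X_bc, y_bc)
    coef_norms[C] = float(np.linalg.norm(clf.coef_))
    print(f"C={C:<6} ||coef||_2 = {coef_norms[C]:.2f}")
```

Larger coefficient norms correspond to a more flexible model, which is where the risk of overfitting comes from.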

### **9. What are the key assumptions of Logistic Regression?**  
- **Little or no multicollinearity** among independent variables.  
- **Linear relationship** between the independent variables and the log-odds of the outcome.  
- **Observations are independent** of one another (e.g., no repeated measures or autocorrelation).  

### **10. What are some alternatives to Logistic Regression for classification tasks?**  
- Decision Trees  
- Random Forest  
- Support Vector Machines (SVM)  
- Neural Networks  
- Naïve Bayes  

### **11. What are Classification Evaluation Metrics?**  
- **Accuracy**: (TP + TN) / Total  
- **Precision**: TP / (TP + FP)  
- **Recall**: TP / (TP + FN)  
- **F1 Score**: Harmonic mean of precision and recall  
- **AUC-ROC**: Measures model discrimination ability  
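
The formulas above can be worked through on a tiny example. The labels and predictions below are made up for illustration; the hand-computed values are compared against the corresponding scikit-learn functions.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# Made-up ground truth and predictions: 4 positives, 6 negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_hat = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

# For binary labels, ravel() returns the counts in the order TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# prints: accuracy=0.80 precision=0.75 recall=0.75 f1=0.75
```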

### **12. How does class imbalance affect Logistic Regression?**  
Class imbalance skews predictions toward the majority class. Solutions include **resampling techniques (oversampling, undersampling)** and using **weighted loss functions**.  

### **13. What is Hyperparameter Tuning in Logistic Regression?**  
It involves adjusting parameters like **C (inverse of λ), solver, and penalty type** to improve model performance. Grid Search and Random Search are common techniques.  

### **14. What are different solvers in Logistic Regression? Which one should be used?**  
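- **lbfgs**: The default; efficient for small to medium datasets and supports L2 or no penalty.  
- **liblinear**: Works well for small datasets and supports L1 and L2 regularization, but only handles multiclass problems through one-vs-rest.  
- **saga**: Suited to large datasets and supports L1, L2, and Elastic Net; it converges fastest when features are on a similar scale.  

As a rule of thumb, start with **lbfgs** and switch to **saga** (or **liblinear** on small data) when L1 or Elastic Net is needed.  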

### **15. How is Logistic Regression extended for multiclass classification?**  
By using:  
- **One-vs-Rest (OvR)**: Trains multiple binary classifiers.  
- **Softmax Regression**: Generalizes logistic regression to multiple classes.  

### **16. What are the advantages and disadvantages of Logistic Regression?**  
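**Advantages**: Simple, fast to train, and interpretable (each coefficient maps to a change in log-odds); works well for small datasets and outputs probabilities.  
**Disadvantages**: Assumes a linear decision boundary, so it struggles with complex, non-linear relationships unless suitable features are engineered by hand.  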

### **17. What are some use cases of Logistic Regression?**  
- Spam detection  
- Credit scoring  
- Disease diagnosis  
- Fraud detection  

### **18. What is the difference between Softmax Regression and Logistic Regression?**  
- **Logistic Regression**: Used for **binary classification** (0 or 1).  
- **Softmax Regression**: Used for **multiclass classification**, assigning probabilities across multiple categories.  
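
One way to see the relationship is that softmax over two classes, with one score fixed at zero, gives the same probability as the sigmoid. The sketch below uses hypothetical class scores to show both the multiclass case and that two-class reduction.

```python
import numpy as np

def softmax(scores):
    """Turn a vector of class scores into probabilities that sum to 1."""
    shifted = scores - np.max(scores)  # subtract the max for numerical stability
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical scores for three classes.
class_scores = np.array([2.0, 1.0, 0.1])
probs = softmax(class_scores)
print("softmax probabilities:", np.round(probs, 3), "sum =", round(float(probs.sum()), 3))

# Two-class case: softmax([z, 0]) assigns the first class the probability sigmoid(z).
z = 0.7
two_class = softmax(np.array([z, 0.0]))
print(f"softmax: {two_class[0]:.4f}, sigmoid: {sigmoid(z):.4f}")
```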

### **19. How do we choose between One-vs-Rest (OvR) and Softmax for multiclass classification?**  
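- **OvR**: Trains one binary classifier per class; a reasonable choice when the solver only supports binary problems (e.g., liblinear) or when each class-vs-rest model should be inspected on its own.  
- **Softmax**: Fits all classes jointly so the predicted probabilities sum to 1; preferred when classes are **mutually exclusive** and the class probabilities themselves matter.  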

### **20. How do we interpret coefficients in Logistic Regression?**  
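Each coefficient is the change in the **log-odds** of the positive class for a one-unit increase in that predictor, holding the other predictors constant. Exponentiating a coefficient gives the **odds ratio**: a positive coefficient (\( e^{\beta_j} > 1 \)) raises the odds of the event, while a negative one (\( e^{\beta_j} < 1 \)) lowers them.  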
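
To make the log-odds interpretation concrete, the sketch below checks numerically that a one-unit increase in a predictor multiplies the odds by the exponential of its coefficient. The intercept and coefficient values are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical single-feature model (values chosen for illustration only).
beta_0, beta_1 = -2.0, 0.7

def odds(x):
    """Odds of the positive class, p / (1 - p), at feature value x."""
    p = sigmoid(beta_0 + beta_1 * x)
    return p / (1 - p)

# Ratio of odds after and before a one-unit increase in x.
odds_ratio = odds(3.0) / odds(2.0)
print(f"odds ratio for +1 unit: {odds_ratio:.3f}, exp(beta_1): {np.exp(beta_1):.3f}")
```

The ratio is the same whichever starting value of x is used, which is why the odds ratio is a convenient summary of a coefficient.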

---  


**Practical**

1. Write a Python program that loads a dataset, splits it into training and testing sets, applies Logistic Regression, and prints the model accuracy.

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score

data = load_iris()
X, y = data.data, data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(f"Model Accuracy: {accuracy_score(y_test, y_pred):.2f}")

2. Write a Python program to apply L1 regularization (Lasso) on a dataset using LogisticRegression(penalty='l1') and print the model accuracy.

In [None]:
model = LogisticRegression(penalty='l1', solver='liblinear', max_iter=200)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(f"L1 Regularization Model Accuracy: {accuracy_score(y_test, y_pred):.2f}")

3. Write a Python program to train Logistic Regression with L2 regularization (Ridge) using LogisticRegression(penalty='l2'). Print model accuracy and coefficients.

In [None]:
model = LogisticRegression(penalty='l2', solver='lbfgs', max_iter=200)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(f"L2 Regularization Model Accuracy: {accuracy_score(y_test, y_pred):.2f}")
print("Model Coefficients:", model.coef_)

4. Write a Python program to train Logistic Regression with Elastic Net Regularization (penalty='elasticnet').

In [None]:
model = LogisticRegression(penalty='elasticnet', solver='saga', l1_ratio=0.5, max_iter=200)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(f"Elastic Net Regularization Model Accuracy: {accuracy_score(y_test, y_pred):.2f}")

5. Write a Python program to train a Logistic Regression model for multiclass classification using multi_class='ovr'.

In [None]:
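# Note: the multi_class argument is deprecated in recent scikit-learn releases;
# OneVsRestClassifier(LogisticRegression(...)) is the suggested replacement there.
model = LogisticRegression(multi_class='ovr', solver='lbfgs', max_iter=200)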
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(f"Multiclass (OvR) Model Accuracy: {accuracy_score(y_test, y_pred):.2f}")

6. Write a Python program to apply GridSearchCV to tune the hyperparameters (C and penalty) of Logistic Regression. Print the best parameters and accuracy.

In [None]:
from sklearn.model_selection import GridSearchCV

param_grid = {'C': [0.01, 0.1, 1, 10], 'penalty': ['l1', 'l2'], 'solver': ['liblinear']}
grid_search = GridSearchCV(LogisticRegression(max_iter=200), param_grid, cv=5)
grid_search.fit(X_train, y_train)

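print(f"Best Parameters: {grid_search.best_params_}")
# best_score_ is the mean cross-validated accuracy, not a test-set score.
print(f"Best Cross-Validated Accuracy: {grid_search.best_score_:.2f}")
print(f"Test Accuracy of Best Model: {grid_search.score(X_test, y_test):.2f}")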

7. Write a Python program to evaluate Logistic Regression using Stratified K-Fold Cross-Validation. Print the average accuracy.

In [None]:
from sklearn.model_selection import StratifiedKFold, cross_val_score

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(LogisticRegression(max_iter=200), X, y, cv=skf, scoring='accuracy')
print(f"Average Accuracy using Stratified K-Fold: {scores.mean():.2f}")

8. Write a Python program to load a dataset from a CSV file, apply Logistic Regression, and evaluate its accuracy.

In [None]:
import pandas as pd

df = pd.read_csv('data.csv')
X, y = df.iloc[:, :-1], df.iloc[:, -1]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(f"Model Accuracy from CSV dataset: {accuracy_score(y_test, y_pred):.2f}")

9. Write a Python program to apply RandomizedSearchCV for tuning hyperparameters (C, penalty, solver) in Logistic Regression. Print the best parameters and accuracy.

In [None]:
from sklearn.model_selection import RandomizedSearchCV
import numpy as np

param_dist = {'C': np.logspace(-4, 4, 20), 'penalty': ['l1', 'l2'], 'solver': ['liblinear', 'saga']}
random_search = RandomizedSearchCV(LogisticRegression(max_iter=200), param_distributions=param_dist, n_iter=10, cv=5, random_state=42)
random_search.fit(X_train, y_train)

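print(f"Best Parameters: {random_search.best_params_}")
# best_score_ is the mean cross-validated accuracy, not a test-set score.
print(f"Best Cross-Validated Accuracy: {random_search.best_score_:.2f}")
print(f"Test Accuracy of Best Model: {random_search.score(X_test, y_test):.2f}")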

10. Write a Python program to implement One-vs-One (OvO) Multiclass Logistic Regression and print accuracy.

In [None]:
from sklearn.multiclass import OneVsOneClassifier

model = OneVsOneClassifier(LogisticRegression(max_iter=200))
model.fit(X_train, y_train)
print(f"One-vs-One (OvO) Model Accuracy: {model.score(X_test, y_test):.2f}")

11. Write a Python program to train a Logistic Regression model and visualize the confusion matrix for binary classification.

In [None]:
from sklearn.metrics import confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt

model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues")
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.title("Confusion Matrix")
plt.show()

12. Write a Python program to train a Logistic Regression model and evaluate its performance using Precision, Recall, and F1-Score.

In [None]:
from sklearn.metrics import precision_score, recall_score, f1_score

model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print(f"Precision: {precision_score(y_test, y_pred, average='weighted'):.2f}")
print(f"Recall: {recall_score(y_test, y_pred, average='weighted'):.2f}")
print(f"F1-Score: {f1_score(y_test, y_pred, average='weighted'):.2f}")

13. Write a Python program to train a Logistic Regression model on imbalanced data and apply class weights to improve model performance.

In [None]:
model = LogisticRegression(class_weight='balanced', max_iter=200)
model.fit(X_train, y_train)
print(f"Model Accuracy on Imbalanced Data: {model.score(X_test, y_test):.2f}")

14. Write a Python program to train Logistic Regression on the Titanic dataset, handle missing values, and evaluate performance.

In [None]:
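# Assumes the Kaggle Titanic column layout ('Survived' as the target, plus
# 'PassengerId', 'Name', 'Ticket', 'Cabin'); adjust the names if your CSV differs.
df = pd.read_csv('titanic.csv')
# Drop identifier-like text columns that would explode into thousands of dummies.
df = df.drop(columns=['PassengerId', 'Name', 'Ticket', 'Cabin'], errors='ignore')
# Fill numeric gaps with the column mean; numeric_only avoids errors on text columns.
df = df.fillna(df.mean(numeric_only=True))
# Fill remaining categorical gaps (e.g., 'Embarked') with the most frequent value.
df = df.fillna(df.mode().iloc[0])
df = pd.get_dummies(df, drop_first=True)
# Move the target to the last column so the positional split below picks it up.
df = df[[c for c in df.columns if c != 'Survived'] + ['Survived']]

X, y = df.iloc[:, :-1], df.iloc[:, -1]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
print(f"Titanic Dataset Model Accuracy: {model.score(X_test, y_test):.2f}")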

15. Write a Python program to apply feature scaling (Standardization) before training a Logistic Regression model. Evaluate its accuracy and compare results with and without scaling.

In [None]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)
accuracy_without_scaling = model.score(X_test, y_test)

model.fit(X_train_scaled, y_train)
accuracy_with_scaling = model.score(X_test_scaled, y_test)

print(f"Accuracy without Scaling: {accuracy_without_scaling:.2f}")
print(f"Accuracy with Scaling: {accuracy_with_scaling:.2f}")

16. Write a Python program to train Logistic Regression and evaluate its performance using ROC-AUC score.

In [None]:
from sklearn.metrics import roc_auc_score

model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)
# [:, 1] selects the positive-class probability, so this assumes a binary target.
y_pred_prob = model.predict_proba(X_test)[:, 1]

print(f"ROC-AUC Score: {roc_auc_score(y_test, y_pred_prob):.2f}")

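17. Write a Python program to train Logistic Regression using a custom regularization strength (C=0.5) and evaluate accuracy. Note that C is the inverse of the regularization strength λ, not a learning rate.

In [None]:
# Smaller C means stronger regularization (the default is C=1.0).
model = LogisticRegression(C=0.5, max_iter=200)
model.fit(X_train, y_train)
print(f"Custom C (C=0.5) Model Accuracy: {model.score(X_test, y_test):.2f}")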

18. Write a Python program to train Logistic Regression and identify important features based on model coefficients.

In [None]:
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

feature_importance = abs(model.coef_).sum(axis=0)
features = pd.DataFrame({'Feature': df.columns[:-1], 'Importance': feature_importance})
print(features.sort_values(by="Importance", ascending=False))

19. Write a Python program to train Logistic Regression and evaluate its performance using Cohen’s Kappa Score.

In [None]:
from sklearn.metrics import cohen_kappa_score

model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print(f"Cohen’s Kappa Score: {cohen_kappa_score(y_test, y_pred):.2f}")

20. Write a Python program to train Logistic Regression and visualize the Precision-Recall Curve for binary classification.

In [None]:
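from sklearn.metrics import precision_recall_curve

# Train on the current split; precision_recall_curve expects binary labels.
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)
y_scores = model.decision_function(X_test)
precision, recall, _ = precision_recall_curve(y_test, y_scores)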

plt.plot(recall, precision, marker='.')
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.title("Precision-Recall Curve")
plt.show()

21. Write a Python program to train Logistic Regression with different solvers (liblinear, saga, lbfgs) and compare their accuracy.

In [None]:
solvers = ['liblinear', 'saga', 'lbfgs']
for solver in solvers:
    model = LogisticRegression(solver=solver, max_iter=200)
    model.fit(X_train, y_train)
    print(f"Solver: {solver}, Accuracy: {model.score(X_test, y_test):.2f}")

22. Write a Python program to train Logistic Regression and evaluate its performance using Matthews Correlation Coefficient (MCC).

In [None]:
from sklearn.metrics import matthews_corrcoef

model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print(f"Matthews Correlation Coefficient: {matthews_corrcoef(y_test, y_pred):.2f}")

23. Write a Python program to train Logistic Regression on both raw and standardized data. Compare their accuracy to see the impact of feature scaling.

In [None]:
model.fit(X_train, y_train)
raw_accuracy = model.score(X_test, y_test)

model.fit(X_train_scaled, y_train)
scaled_accuracy = model.score(X_test_scaled, y_test)

print(f"Accuracy on Raw Data: {raw_accuracy:.2f}")
print(f"Accuracy on Standardized Data: {scaled_accuracy:.2f}")

24. Write a Python program to train Logistic Regression and find the optimal C (regularization strength) using cross-validation.

In [None]:
from sklearn.model_selection import cross_val_score

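C_values = [0.01, 0.1, 1, 10]
best_C, best_score = None, -1.0
for C in C_values:
    model = LogisticRegression(C=C, max_iter=200)
    mean_score = cross_val_score(model, X, y, cv=5).mean()
    print(f"C={C}, Accuracy: {mean_score:.2f}")
    # Keep the C with the highest mean cross-validated accuracy.
    if mean_score > best_score:
        best_C, best_score = C, mean_score
print(f"Optimal C: {best_C} (mean CV accuracy {best_score:.2f})")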

25. Write a Python program to train Logistic Regression, save the trained model using joblib, and load it again to make predictions.

In [None]:
import joblib

model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)
joblib.dump(model, "logistic_model.pkl")

loaded_model = joblib.load("logistic_model.pkl")
print(f"Loaded Model Accuracy: {loaded_model.score(X_test, y_test):.2f}")