___
# ML - Module 7 Logistic Regression Assignment
---

<div style="font-family: Verdana; font-size: 20px; font-weight: bold; color: black;"> Question 1: What is Logistic Regression, and how does it differ from Linear Regression </div> <hr> 
<div style="font-family: Verdana; font-size: 18px; line-height: 1.6;"> 
Logistic Regression is a statistical method used for binary classification problems, where the outcome is categorical (typically 0 or 1). It estimates the probability that a given input belongs to a particular class using the logistic (sigmoid) function. 

In contrast, Linear Regression predicts continuous numeric outcomes by fitting a linear relationship between input features and the target variable.

The key difference is that Logistic Regression models probabilities and outputs values between 0 and 1, suitable for classification, while Linear Regression models continuous outcomes and can produce any real number.

</div><hr> 

<div style="font-family: Verdana; font-size: 20px; font-weight: bold; color: black;"> Question 2: What is the mathematical equation of Logistic Regression </div> <hr> 
<div style="font-family: Verdana; font-size: 18px; line-height: 1.6;"> 
The logistic regression model predicts the probability $p$ of the positive class using:

$$
p = \sigma(z) = \frac{1}{1 + e^{-z}}
$$

where

$$
z = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n
$$

Here, $\beta_0$ is the intercept, $\beta_i$ are the coefficients, and $x_i$ are the input features. $\sigma(z)$ is the sigmoid function mapping any real value to the interval (0,1).

</div><hr> 
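
As a minimal sketch of the two formulas above, the snippet below computes $z$ and then $p$ for a single sample. The intercept, coefficients, and feature values are made up for illustration, and the helper names `sigmoid` and `predict_proba_one` are hypothetical rather than part of any library.

```python
import math

def sigmoid(z):
    """Map any real number z to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba_one(beta0, betas, xs):
    """Return p = sigmoid(beta0 + sum(beta_i * x_i)) for one sample."""
    z = beta0 + sum(b * x for b, x in zip(betas, xs))
    return sigmoid(z)

# Illustrative (made-up) intercept, coefficients, and feature values
beta0 = -1.0
betas = [0.8, -0.5]
xs = [2.0, 1.0]

# Here z = -1.0 + 0.8 * 2.0 - 0.5 * 1.0 = 0.1
p = predict_proba_one(beta0, betas, xs)
print(f"p = {p:.4f}")
```

Because $z$ is slightly above zero, the resulting probability sits just above 0.5.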

<div style="font-family: Verdana; font-size: 20px; font-weight: bold; color: black;"> Question 3: Why do we use the Sigmoid function in Logistic Regression </div> <hr> 
<div style="font-family: Verdana; font-size: 18px; line-height: 1.6;"> 
The sigmoid function converts the linear combination of inputs (which can be any real number) into a probability value between 0 and 1. This probabilistic output allows Logistic Regression to model the likelihood of the input belonging to the positive class, making it suitable for classification tasks.
</div><hr> 

<div style="font-family: Verdana; font-size: 20px; font-weight: bold; color: black;"> Question 4: What is the cost function of Logistic Regression </div> <hr> 
<div style="font-family: Verdana; font-size: 18px; line-height: 1.6;"> 
The cost function is the Negative Log-Likelihood (also known as Log Loss or Cross-Entropy Loss):

$$
J(\beta) = -\frac{1}{m} \sum_{i=1}^m \left[ y^{(i)} \log(\hat{y}^{(i)}) + (1 - y^{(i)}) \log(1 - \hat{y}^{(i)}) \right]
$$

where $m$ is the number of samples, $y^{(i)}$ is the true label, and $\hat{y}^{(i)}$ is the predicted probability for the positive class.

</div><hr> 
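
The cost function above can be sketched in a few lines of plain Python. The function name `log_loss` and the clipping constant `eps` are assumptions made for this example (clipping only guards against taking the log of zero), and the labels and probabilities are made-up values.

```python
import math

def log_loss(y_true, y_prob, eps=1e-15):
    """Average negative log-likelihood for binary labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)

y_true = [1, 0, 1, 0]
mostly_right = [0.9, 0.1, 0.8, 0.2]  # high probability on the true class
mostly_wrong = [0.1, 0.9, 0.2, 0.8]  # high probability on the wrong class

loss_right = log_loss(y_true, mostly_right)
loss_wrong = log_loss(y_true, mostly_wrong)
print("Loss for good predictions:", round(loss_right, 4))
print("Loss for bad predictions: ", round(loss_wrong, 4))
```

Confident predictions on the wrong class are penalized far more heavily than the same confidence on the right class, which is what pushes the model toward honest probabilities.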

<div style="font-family: Verdana; font-size: 20px; font-weight: bold; color: black;"> Question 5: What is Regularization in Logistic Regression? Why is it needed </div> <hr> 
<div style="font-family: Verdana; font-size: 18px; line-height: 1.6;"> 
Regularization is a technique to add a penalty term to the cost function to prevent overfitting by discouraging complex models (large coefficients). It helps improve generalization on unseen data by shrinking or eliminating less important feature coefficients.

Regularization is needed when the model is too complex, especially with high-dimensional data, to avoid overfitting and improve stability.

</div><hr> 

<div style="font-family: Verdana; font-size: 20px; font-weight: bold; color: black;"> Question 6: Explain the difference between Lasso, Ridge, and Elastic Net regression </div> <hr> 
<div style="font-family: Verdana; font-size: 18px; line-height: 1.6;"> 
    
- **Lasso (L1) Regularization:** Adds the sum of absolute values of coefficients as a penalty. It can shrink some coefficients exactly to zero, thus performing feature selection.

- **Ridge (L2) Regularization:** Adds the sum of squared coefficients as a penalty. It shrinks coefficients towards zero but does not eliminate them entirely.

- **Elastic Net:** Combines L1 and L2 penalties, balancing between Ridge and Lasso. It is useful when there are correlated features and you want both feature selection and coefficient shrinkage.

</div><hr> 
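
To make the three penalties concrete, here is a small sketch that evaluates each one on the same made-up coefficient vector. The function names and the simple `l1_ratio` mixing rule are assumptions for illustration; libraries differ in how they scale the two terms.

```python
def l1_penalty(coefs):
    """Lasso penalty: sum of absolute coefficient values."""
    return sum(abs(b) for b in coefs)

def l2_penalty(coefs):
    """Ridge penalty: sum of squared coefficient values."""
    return sum(b * b for b in coefs)

def elastic_net_penalty(coefs, l1_ratio):
    """Weighted mix of the L1 and L2 penalties (illustrative scaling)."""
    return l1_ratio * l1_penalty(coefs) + (1 - l1_ratio) * l2_penalty(coefs)

# Made-up coefficients, intercept excluded
coefs = [0.5, -2.0, 0.0, 1.5]

print("L1 penalty:", l1_penalty(coefs))
print("L2 penalty:", l2_penalty(coefs))
print("Elastic Net penalty (l1_ratio=0.5):", elastic_net_penalty(coefs, 0.5))
```

Setting `l1_ratio` to 1 recovers the pure L1 penalty and setting it to 0 recovers the pure L2 penalty.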

<div style="font-family: Verdana; font-size: 20px; font-weight: bold; color: black;"> Question 7: When should we use Elastic Net instead of Lasso or Ridge </div> <hr> 
<div style="font-family: Verdana; font-size: 18px; line-height: 1.6;"> 
Elastic Net is preferred when the dataset has many correlated features. Lasso tends to keep one feature from a correlated group, somewhat arbitrarily, and set the rest to zero, while Ridge keeps every feature and spreads the weight across the group without producing a sparse model. Elastic Net combines the strengths of both, allowing group selection and better handling of multicollinearity.
</div><hr> 

<div style="font-family: Verdana; font-size: 20px; font-weight: bold; color: black;"> Question 8: What is the impact of the regularization parameter (λ) in Logistic Regression </div> <hr> 
<div style="font-family: Verdana; font-size: 18px; line-height: 1.6;"> 
The regularization parameter $\lambda$ controls the strength of the penalty on coefficients:

* Large $\lambda$: Stronger penalty, more shrinkage, simpler model, possibly underfitting.
* Small $\lambda$: Weaker penalty, less shrinkage, model fits training data closely, possibly overfitting.

Choosing the right $\lambda$ balances bias and variance.

</div><hr> 
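
The shrinkage effect can be seen in a toy experiment. The sketch below fits a one-feature logistic regression by plain gradient descent with an L2 penalty of $\frac{\lambda}{2} w^2$ on the slope; the data, learning rate, step count, and the function name `fit_l2_logistic` are all assumptions chosen for illustration, not a production optimizer.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_l2_logistic(xs, ys, lam, lr=0.1, steps=5000):
    """Fit p = sigmoid(b + w * x) by gradient descent on log loss + (lam / 2) * w**2."""
    w, b = 0.0, 0.0
    m = len(xs)
    for _ in range(steps):
        errors = [sigmoid(b + w * x) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / m + lam * w  # penalty on w only
        grad_b = sum(errors) / m  # the intercept is not penalized
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Made-up 1-D data that is not perfectly separable
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 1, 0, 1, 1]

for lam in (0.0, 0.1, 1.0):
    w, _ = fit_l2_logistic(xs, ys, lam)
    print(f"lambda = {lam}: w = {w:.3f}")
```

The fitted slope gets smaller as $\lambda$ grows. Note that scikit-learn's `LogisticRegression` exposes the inverse of the regularization strength as `C`, so a small `C` plays the role of a large $\lambda$.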

<div style="font-family: Verdana; font-size: 20px; font-weight: bold; color: black;"> Question 9: What are the key assumptions of Logistic Regression </div> <hr> 
<div style="font-family: Verdana; font-size: 18px; line-height: 1.6;"> 
- The dependent variable is binary.
- Observations are independent.
- There is a linear relationship between the log-odds of the outcome and the independent variables.
- No or little multicollinearity among predictors.
- Large sample size for stable estimates.
</div><hr> 

<div style="font-family: Verdana; font-size: 20px; font-weight: bold; color: black;"> Question 10: What are some alternatives to Logistic Regression for classification tasks </div> <hr> 
<div style="font-family: Verdana; font-size: 18px; line-height: 1.6;"> 
Alternatives include:
- Decision Trees
- Random Forests
- Support Vector Machines (SVM)
- K-Nearest Neighbors (KNN)
- Neural Networks
- Gradient Boosting Machines (XGBoost, LightGBM)
</div><hr> 

<div style="font-family: Verdana; font-size: 20px; font-weight: bold; color: black;"> Question 11: What are Classification Evaluation Metrics </div> <hr> 
<div style="font-family: Verdana; font-size: 18px; line-height: 1.6;"> 
Metrics used to evaluate classification models include:
- Accuracy
- Precision
- Recall (Sensitivity)
- F1-Score
- ROC-AUC (Receiver Operating Characteristic - Area Under Curve)
- Confusion Matrix
- Log Loss
</div><hr> 
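
Several of these metrics follow directly from the four cells of a binary confusion matrix. The sketch below uses made-up counts and a hypothetical helper `classification_metrics` to show the arithmetic.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Made-up counts: 40 true positives, 10 false positives,
# 20 false negatives, 30 true negatives
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

ROC-AUC and Log Loss are not covered here because they need the predicted probabilities rather than hard class labels.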

<div style="font-family: Verdana; font-size: 20px; font-weight: bold; color: black;"> Question 12: How does class imbalance affect Logistic Regression </div> <hr> 
<div style="font-family: Verdana; font-size: 18px; line-height: 1.6;"> 
Class imbalance can cause Logistic Regression to be biased towards the majority class, resulting in poor predictive performance for the minority class. The model might have high accuracy but low recall/precision on the minority class, thus misleading evaluation.
</div><hr> 
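
A quick sketch of why accuracy misleads here: with a made-up 95/5 class split, a model that always predicts the majority class scores 95% accuracy while finding none of the minority samples. The helper `balanced_class_weights` is hypothetical; it follows the same heuristic that scikit-learn documents for `class_weight='balanced'`, namely `n_samples / (n_classes * count)`.

```python
from collections import Counter

# Made-up imbalanced labels: 95 negatives, 5 positives
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100  # a degenerate model that always predicts the majority class

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
true_positives = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
recall = true_positives / sum(1 for t in y_true if t == 1)

def balanced_class_weights(labels):
    """Weight each class inversely to its frequency."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

weights = balanced_class_weights(y_true)

print("Accuracy:", accuracy)
print("Minority-class recall:", recall)
print("Balanced class weights:", weights)
```

The minority class receives a much larger weight, so each of its errors contributes more to the loss during training.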

<div style="font-family: Verdana; font-size: 20px; font-weight: bold; color: black;"> Question 13: What is Hyperparameter Tuning in Logistic Regression </div> <hr> 
<div style="font-family: Verdana; font-size: 18px; line-height: 1.6;"> 
Hyperparameter tuning involves selecting the best values for parameters that are not learned during training, such as the regularization strength $\lambda$ (exposed in scikit-learn as its inverse, C), the type of regularization (L1, L2, or Elastic Net), and the solver, to optimize model performance on validation data.
</div><hr> 

<div style="font-family: Verdana; font-size: 20px; font-weight: bold; color: black;"> Question 14: What are different solvers in Logistic Regression? Which one should be used </div> <hr> 
<div style="font-family: Verdana; font-size: 18px; line-height: 1.6;"> 
Common solvers include:
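- **liblinear:** Good for small datasets; supports L1 and L2 regularization, and handles multiclass problems only through one-vs-rest.
- **newton-cg:** Supports L2 regularization and the multinomial loss.
- **lbfgs:** The scikit-learn default; supports L2 regularization and the multinomial loss, and works well on small to medium datasets.
- **sag/saga:** Fast on large datasets when features are on a similar scale; sag supports only L2, while saga also supports L1 and Elastic Net.

Choice depends on dataset size and regularization type: 'lbfgs' is a sensible default for L2, 'saga' suits very large datasets or L1/Elastic Net penalties, and 'liblinear' suits small binary problems.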

</div><hr> 

<div style="font-family: Verdana; font-size: 20px; font-weight: bold; color: black;"> Question 15: How is Logistic Regression extended for multiclass classification </div> <hr> 
<div style="font-family: Verdana; font-size: 18px; line-height: 1.6;"> 
It is extended via:
    
- **One-vs-Rest (OvR):** Train one binary classifier per class versus all others.
- **Softmax (Multinomial Logistic Regression):** Directly models probabilities for multiple classes using the softmax function.
</div><hr> 

<div style="font-family: Verdana; font-size: 20px; font-weight: bold; color: black;"> Question 16: What are the advantages and disadvantages of Logistic Regression </div> <hr> 
<div style="font-family: Verdana; font-size: 18px; line-height: 1.6;"> 
Advantages:
- Simple and interpretable.
- Efficient and fast to train.
- Outputs probabilities that are usually reasonably well calibrated.
- Works well when the relationship is linear in log-odds.

Disadvantages:

- Cannot capture complex nonlinear relationships.
- Sensitive to outliers and multicollinearity.
- Performance can degrade with high-dimensional sparse data without regularization.

</div><hr> 

<div style="font-family: Verdana; font-size: 20px; font-weight: bold; color: black;"> Question 17: What are some use cases of Logistic Regression </div> <hr> 
<div style="font-family: Verdana; font-size: 18px; line-height: 1.6;"> 
- Medical diagnosis (disease presence/absence)
- Credit scoring and risk assessment
- Spam detection
- Customer churn prediction
- Marketing response modeling
</div><hr> 

<div style="font-family: Verdana; font-size: 20px; font-weight: bold; color: black;"> Question 18: What is the difference between Softmax Regression and Logistic Regression </div> <hr> 
<div style="font-family: Verdana; font-size: 18px; line-height: 1.6;"> 
Softmax Regression (Multinomial Logistic Regression) generalizes Logistic Regression to multiple classes by predicting a probability distribution over multiple classes using the softmax function. Logistic Regression typically refers to binary classification using the sigmoid function.
</div><hr> 
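
The relationship between the two functions can be checked numerically. In this sketch (function names and score values are made up for illustration), softmax over three class scores yields a probability distribution, and softmax over the two scores `[z, 0]` gives the same positive-class probability as the sigmoid of `z`.

```python
import math

def softmax(scores):
    """Turn a list of real-valued class scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the max improves numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Three-class case with made-up scores
probs = softmax([2.0, 1.0, 0.1])
print("Three-class probabilities:", [round(p, 3) for p in probs])

# Two-class case: softmax([z, 0]) matches sigmoid(z) for the first class
z = 0.7
two_class = softmax([z, 0.0])
print("softmax:", round(two_class[0], 6), "sigmoid:", round(sigmoid(z), 6))
```

This is why binary logistic regression can be viewed as the two-class special case of softmax regression.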

<div style="font-family: Verdana; font-size: 20px; font-weight: bold; color: black;"> Question 19: How do we choose between One-vs-Rest (OvR) and Softmax for multiclass classification </div> <hr> 
<div style="font-family: Verdana; font-size: 18px; line-height: 1.6;"> 
- Use OvR when classes are imbalanced or when you want independent binary classifiers.
- Use Softmax (multinomial) when classes are mutually exclusive and balanced; it models all classes jointly, often yielding better calibrated probabilities.
</div><hr> 

<div style="font-family: Verdana; font-size: 20px; font-weight: bold; color: black;"> Question 20: How do we interpret coefficients in Logistic Regression? </div> <hr> 
<div style="font-family: Verdana; font-size: 18px; line-height: 1.6;"> 
Coefficients represent the change in the log-odds of the positive class for a one-unit increase in the predictor, holding others constant. Exponentiating a coefficient $\beta_i$ gives the odds ratio $e^{\beta_i}$, indicating how the odds change multiplicatively with a one-unit increase in the feature.
</div><hr> 
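
The odds-ratio interpretation can be verified with a short calculation. The intercept and coefficient below are made-up values for a hypothetical one-feature model; the check is that raising the feature by one unit multiplies the odds by $e^{\beta_1}$.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    """Convert a probability into odds p / (1 - p)."""
    return p / (1.0 - p)

# Made-up intercept and coefficient for a single feature
beta0, beta1 = -1.0, 0.7

p_before = sigmoid(beta0 + beta1 * 2.0)  # feature value x = 2
p_after = sigmoid(beta0 + beta1 * 3.0)   # feature value x = 3

odds_ratio = math.exp(beta1)
observed_ratio = odds(p_after) / odds(p_before)

print("exp(beta1):          ", round(odds_ratio, 4))
print("odds(x=3) / odds(x=2):", round(observed_ratio, 4))
```

The two numbers agree regardless of the starting feature value, whereas the change in probability itself depends on where on the sigmoid curve the sample sits.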


---
### PRACTICAL QUESTIONS:
---

### **1. Load dataset, train-test split, Logistic Regression, print accuracy**

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

data = load_breast_cancer()
X, y = data.data, data.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
```

---

### **2. Logistic Regression with L1 regularization (Lasso)**

```python
from sklearn.linear_model import LogisticRegression

model = LogisticRegression(penalty='l1', solver='liblinear', max_iter=1000)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print("L1 Regularization Accuracy:", accuracy_score(y_test, y_pred))
```

---

### **3. Logistic Regression with L2 regularization (Ridge), print coefficients**

```python
model = LogisticRegression(penalty='l2', solver='liblinear', max_iter=1000)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print("L2 Regularization Accuracy:", accuracy_score(y_test, y_pred))
print("Coefficients:", model.coef_)
```

---

### **4. Logistic Regression with Elastic Net Regularization**

```python
model = LogisticRegression(penalty='elasticnet', solver='saga', l1_ratio=0.5, max_iter=1000)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print("Elastic Net Accuracy:", accuracy_score(y_test, y_pred))
```

---

### **5. Multiclass Logistic Regression using `multi_class='ovr'`**

```python
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

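# Note: `multi_class` is deprecated since scikit-learn 1.5; wrapping the model in
# sklearn.multiclass.OneVsRestClassifier is the suggested replacement.
model = LogisticRegression(multi_class='ovr', solver='liblinear', max_iter=1000)
model.fit(X, y)

# Accuracy here is measured on the training data (no hold-out split)
y_pred = model.predict(X)
print("Multiclass OVR Training Accuracy:", accuracy_score(y, y_pred))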
```

---

### **6. Hyperparameter tuning using GridSearchCV**

```python
from sklearn.model_selection import GridSearchCV

param_grid = {
    'C': [0.01, 0.1, 1, 10],
    'penalty': ['l1', 'l2'],
    'solver': ['liblinear']  # liblinear supports both L1 and L2
}

grid = GridSearchCV(LogisticRegression(max_iter=1000), param_grid, cv=5, scoring='accuracy')
grid.fit(X, y)

print("Best Parameters:", grid.best_params_)
print("Best Mean CV Accuracy:", grid.best_score_)
```

---

### **7. Stratified K-Fold Cross-Validation with Logistic Regression**

```python
from sklearn.model_selection import StratifiedKFold, cross_val_score

skf = StratifiedKFold(n_splits=5)
model = LogisticRegression(max_iter=1000)

scores = cross_val_score(model, X, y, cv=skf, scoring='accuracy')
print("Stratified K-Fold Accuracy Scores:", scores)
print("Average Accuracy:", scores.mean())
```

---

### **8. Load dataset from CSV, apply Logistic Regression, evaluate accuracy**

```python
import pandas as pd

# Replace with your CSV file path and column names
df = pd.read_csv("your_dataset.csv")

X = df.drop("target", axis=1)
y = df["target"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print("CSV Dataset Accuracy:", accuracy_score(y_test, y_pred))
```

---


### **9. RandomizedSearchCV for Logistic Regression**

```python
from sklearn.model_selection import RandomizedSearchCV
from sklearn.linear_model import LogisticRegression
import numpy as np

param_dist = {
    'C': np.logspace(-4, 4, 20),
    'penalty': ['l1', 'l2'],
    'solver': ['liblinear', 'saga']
}

model = LogisticRegression(max_iter=1000)
rand_search = RandomizedSearchCV(model, param_dist, n_iter=10, scoring='accuracy', cv=5)
rand_search.fit(X, y)

print("Best Parameters:", rand_search.best_params_)
print("Best Mean CV Accuracy:", rand_search.best_score_)
```

---

### **10. One-vs-One (OvO) Multiclass Logistic Regression**

```python
from sklearn.multiclass import OneVsOneClassifier
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

model = OneVsOneClassifier(LogisticRegression(max_iter=1000))
model.fit(X, y)
print("OvO Accuracy:", model.score(X, y))
```

---

### **11. Confusion Matrix Visualization**

```python
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

cm = confusion_matrix(y_test, y_pred)
disp = ConfusionMatrixDisplay(confusion_matrix=cm)
disp.plot()
```

---

### **12. Precision, Recall, F1-Score**

```python
from sklearn.metrics import precision_score, recall_score, f1_score

print("Precision:", precision_score(y_test, y_pred))
print("Recall:", recall_score(y_test, y_pred))
print("F1-Score:", f1_score(y_test, y_pred))
```

---

### **13. Imbalanced Data with Class Weights**

```python
model = LogisticRegression(class_weight='balanced', max_iter=1000)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
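# This is plain accuracy of the class-weighted model, not the balanced-accuracy metric
print("Accuracy with balanced class weights:", accuracy_score(y_test, y_pred))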
```

---

### **14. Titanic Dataset + Missing Values Handling**

```python
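import pandas as pd
import seaborn as sns
from sklearn.impute import SimpleImputer

# Drop only the rows with a missing 'embarked' value; rows with a missing
# 'age' are kept so the imputer below has values to fill in.
df = sns.load_dataset('titanic')
df = df[['age', 'fare', 'embarked', 'sex', 'survived']].dropna(subset=['embarked'])

df['sex'] = df['sex'].map({'male': 0, 'female': 1})
# dtype=int keeps the dummy columns numeric rather than boolean
df = pd.get_dummies(df, columns=['embarked'], drop_first=True, dtype=int)

X = df.drop('survived', axis=1)
y = df['survived']

# Fill missing ages with the column mean. For a stricter evaluation, fit the
# imputer on the training split only (for example inside a Pipeline).
imp = SimpleImputer(strategy='mean')
X = imp.fit_transform(X)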

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

print("Titanic Accuracy:", model.score(X_test, y_test))
```

---

### **15. Feature Scaling (Standardization)**

```python
from sklearn.preprocessing import StandardScaler

# Scale the existing split so y_train/y_test stay aligned with X_train/X_test.
# The scaler is fit on the training data only, then applied to the test data.
scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)
X_test_s = scaler.transform(X_test)

model_scaled = LogisticRegression(max_iter=1000)
model_scaled.fit(X_train_s, y_train)

print("Accuracy with Scaling:", model_scaled.score(X_test_s, y_test))
```

---

### **16. ROC-AUC Score**

```python
from sklearn.metrics import roc_auc_score

probs = model.predict_proba(X_test)[:, 1]
print("ROC-AUC Score:", roc_auc_score(y_test, probs))
```

---

### **17. Logistic Regression with C = 0.5**

```python
model = LogisticRegression(C=0.5, max_iter=1000)
model.fit(X_train, y_train)

print("Custom C Accuracy:", model.score(X_test, y_test))
```

---

### **18. Identify Important Features**

```python
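import numpy as np

# Coefficient magnitudes are comparable only when features share a scale,
# so read them from the model trained on standardized features (step 15).
importance = model_scaled.coef_[0]

# List features from largest to smallest absolute coefficient
for i in np.argsort(-np.abs(importance)):
    print(f"Feature {i}: Coefficient = {importance[i]:.4f}")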
```

---

### **19. Cohen’s Kappa Score**

```python
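from sklearn.metrics import cohen_kappa_score

# Recompute predictions so they match the current model and test split
y_pred = model.predict(X_test)
print("Cohen’s Kappa:", cohen_kappa_score(y_test, y_pred))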
```

---

### **20. Precision-Recall Curve**

```python
from sklearn.metrics import precision_recall_curve, PrecisionRecallDisplay

probs = model.predict_proba(X_test)[:, 1]
precision, recall, _ = precision_recall_curve(y_test, probs)
disp = PrecisionRecallDisplay(precision=precision, recall=recall)
disp.plot()
```

---

### **21. Compare Solvers (liblinear, saga, lbfgs)**

```python
for solver in ['liblinear', 'saga', 'lbfgs']:
    model = LogisticRegression(solver=solver, max_iter=1000)
    model.fit(X_train, y_train)
    print(f"{solver} Accuracy:", model.score(X_test, y_test))
```

---

### **22. Matthews Correlation Coefficient (MCC)**

```python
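from sklearn.metrics import matthews_corrcoef

# Recompute predictions so they match the current model and test split
y_pred = model.predict(X_test)
print("Matthews Correlation Coefficient:", matthews_corrcoef(y_test, y_pred))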
```

---

### **23. Compare Accuracy on Raw vs Standardized Data**

```python
# Raw
model_raw = LogisticRegression(max_iter=1000)
model_raw.fit(X_train, y_train)
print("Raw Accuracy:", model_raw.score(X_test, y_test))

# Standardized
model_scaled = LogisticRegression(max_iter=1000)
model_scaled.fit(X_train_s, y_train)
print("Standardized Accuracy:", model_scaled.score(X_test_s, y_test))
```
