# <div align="center" style="color: brown"><strong>Logistic Regression: A Comprehensive Tutorial</strong></div>

## <div style="color: red"><strong>Part 1. Theoretical Background</strong></div>

### What is Logistic Regression?
Logistic Regression is a supervised machine learning algorithm used for classification tasks. Unlike linear regression, which predicts continuous values, logistic regression predicts the probability of a categorical outcome.

- **Binary Logistic Regression**: Used when the target variable has two classes (e.g., spam/not spam).
- **Multiclass Logistic Regression**: Used when the target variable has more than two classes (e.g., digit recognition).

### Mathematical Formulation
For binary classification, the model predicts the probability $p$ that $y=1$ given an input $x = (x_1, \dots, x_n)$:
$p = \sigma(z) = \frac{1}{1 + e^{-z}}$
where $z = \beta_0 + \beta_1 x_1 + \dots + \beta_n x_n$ is the log-odds, $z = \log\frac{p}{1-p}$.
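
As a minimal sketch of the formula above, the snippet below computes $z$ and $p$ with NumPy. The coefficient values and the input row are hypothetical and chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    """Map a real-valued score z to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical coefficients and one input row (illustration only)
beta_0 = -1.0                   # intercept
beta = np.array([0.8, -0.5])    # beta_1, beta_2
x = np.array([2.0, 1.0])        # x_1, x_2

z = beta_0 + beta @ x           # linear score, i.e. the log-odds
p = sigmoid(z)                  # P(y = 1 | x)
print(f"z = {z:.2f}, p = {p:.3f}")   # prints: z = 0.10, p = 0.525
```

A score of $z = 0$ corresponds to $p = 0.5$, which is why the decision boundary of the model is the set of points where $z = 0$.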

For multiclass, logistic regression can use either the one-vs-rest (OvR) or multinomial (softmax) approach.
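
For the multinomial case, the sigmoid is replaced by the softmax function, which turns one score per class into probabilities that sum to 1. The following is a small sketch; the three class scores are made-up values, not outputs of a fitted model.

```python
import numpy as np

def softmax(scores):
    """Convert a vector of class scores into probabilities that sum to 1."""
    shifted = scores - np.max(scores)   # subtract the max for numerical stability
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum()

scores = np.array([2.0, 1.0, 0.1])      # hypothetical scores for three classes
probs = softmax(scores)
print("Class probabilities:", probs)
print("Predicted class:", probs.argmax())
```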

### Loss Function
- **Binary cross-entropy (log-loss)** for binary classification
- **Multinomial cross-entropy** for multiclass
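
For $N$ samples with labels $y_i$ and predicted probabilities $p_i$, the binary cross-entropy is $-\frac{1}{N}\sum_{i}\left[y_i \log p_i + (1 - y_i)\log(1 - p_i)\right]$. The sketch below computes it by hand and compares the result with scikit-learn's `log_loss`. The labels and probabilities are invented for illustration, and the helper name `binary_cross_entropy` is an assumption of this example.

```python
import numpy as np
from sklearn.metrics import log_loss

def binary_cross_entropy(y_true, p_pred, eps=1e-15):
    """Average negative log-likelihood of the true labels."""
    p = np.clip(p_pred, eps, 1 - eps)   # avoid log(0)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

# Hypothetical labels and predicted probabilities of the positive class
y_true = np.array([1, 0, 1, 1])
p_pred = np.array([0.9, 0.2, 0.6, 0.4])

manual = binary_cross_entropy(y_true, p_pred)
reference = log_loss(y_true, p_pred)
print(f"manual = {manual:.4f}, sklearn = {reference:.4f}")
```

Confident wrong predictions are penalized heavily: the last sample (true label 1, predicted 0.4) contributes the largest term to the average.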

### Regularization
- **L1 (Lasso)**: Can shrink some coefficients to zero (feature selection)
- **L2 (Ridge)**: Shrinks coefficients toward zero but rarely sets them exactly to zero
- Regularization helps prevent overfitting
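
The difference between the two penalties can be seen by counting coefficients that are exactly zero. This is a rough sketch on a synthetic dataset with many uninformative features; the dataset parameters and the value `C=0.05` are assumptions chosen to make the effect visible, not recommended settings.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data: 20 features, only 3 of which carry signal (illustration only)
X_demo, y_demo = make_classification(
    n_samples=300, n_features=20, n_informative=3, n_redundant=0, random_state=0
)
X_demo = StandardScaler().fit_transform(X_demo)

# Same regularization strength, different penalty
l1_model = LogisticRegression(penalty='l1', C=0.05, solver='liblinear', random_state=0)
l2_model = LogisticRegression(penalty='l2', C=0.05, solver='liblinear', random_state=0)
l1_model.fit(X_demo, y_demo)
l2_model.fit(X_demo, y_demo)

n_zero_l1 = int(np.sum(l1_model.coef_ == 0))
n_zero_l2 = int(np.sum(l2_model.coef_ == 0))
print('Zero coefficients with L1:', n_zero_l1)
print('Zero coefficients with L2:', n_zero_l2)
```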

### Feature Scaling
- Not strictly required, but it helps the solver converge and ensures the regularization penalty treats all features on a comparable scale
- StandardScaler or MinMaxScaler are commonly used
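
A quick sketch of what the two scalers do, using a tiny made-up array whose columns are on very different scales:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

# Hypothetical data: column 0 is in single digits, column 1 in thousands
X_raw = np.array([[1.0, 1000.0],
                  [2.0, 3000.0],
                  [3.0, 5000.0]])

X_std = StandardScaler().fit_transform(X_raw)   # zero mean, unit variance per column
X_mm = MinMaxScaler().fit_transform(X_raw)      # each column rescaled to [0, 1]

print('Standardized:', X_std)
print('Min-max scaled:', X_mm)
```

In practice the scaler should be fit on the training split only and then applied to the test split, as the binary example in Part 2 does.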

### Assumptions
- The outcome is categorical
- Observations are independent
- Little or no multicollinearity among predictors
- Linear relationship between features and log-odds
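
One common, informal check for the multicollinearity assumption is the variance inflation factor (VIF). The sketch below is an illustration rather than a complete diagnostic: it builds a hypothetical three-feature dataset in which `x3` is nearly a copy of `x1`, and reads the VIFs off the diagonal of the inverse correlation matrix. The data, the variable names, and the rule of thumb in the comment are assumptions of this example.

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)                 # independent of x1
x3 = x1 + 0.1 * rng.normal(size=200)      # nearly a copy of x1
X_check = np.column_stack([x1, x2, x3])

# VIF of feature j equals the j-th diagonal entry of the inverse correlation matrix
corr = np.corrcoef(X_check, rowvar=False)
vif = np.diag(np.linalg.inv(corr))

# A frequently quoted rule of thumb treats VIF above roughly 5 to 10 as a warning sign
for name, value in zip(['x1', 'x2', 'x3'], vif):
    print(f'{name}: VIF = {value:.1f}')
```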

### Applications
- Medical diagnosis
- Email spam detection
- Credit scoring
- Image and text classification

### Limitations
- Assumes linearity in log-odds
- Not suitable for highly non-linear problems without feature engineering
- Sensitive to outliers


## <div style="color: red"><strong>Part 2. Binary Logistic Regression: Implementation and Evaluation</strong></div>

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import (
    accuracy_score, confusion_matrix, classification_report, roc_curve, auc,
    precision_recall_curve, average_precision_score
)

# 1. Generate a synthetic binary classification dataset
X, y = make_classification(
    n_samples=300, n_features=3, n_redundant=0, n_clusters_per_class=1,
    flip_y=0.1, class_sep=1.5, random_state=42
)

# 2. Split the dataset
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# 3. Feature scaling
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# 4. Train logistic regression with L2 regularization
model = LogisticRegression(penalty='l2', C=1.0, solver='lbfgs')
model.fit(X_train_scaled, y_train)

# 5. Predictions
y_pred = model.predict(X_test_scaled)
y_proba = model.predict_proba(X_test_scaled)[:, 1]

# 6. Evaluation
print('Accuracy:', accuracy_score(y_test, y_pred))
# Start multi-line outputs on a new line so rows and columns stay aligned
print('Confusion Matrix:\n', confusion_matrix(y_test, y_pred))
print('Classification Report:\n', classification_report(y_test, y_pred))

# 7. ROC Curve
fpr, tpr, _ = roc_curve(y_test, y_proba)
roc_auc = auc(fpr, tpr)
plt.figure(figsize=(7,5))
plt.plot(fpr, tpr, label=f'ROC curve (AUC = {roc_auc:.2f})')
plt.plot([0, 1], [0, 1], 'k--')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve')
plt.legend()
plt.grid(True)
plt.show()

# 8. Precision-Recall Curve
precision, recall, _ = precision_recall_curve(y_test, y_proba)
ap = average_precision_score(y_test, y_proba)
plt.figure(figsize=(7,5))
plt.plot(recall, precision, label=f'AP = {ap:.2f}')
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.title('Precision-Recall Curve')
plt.legend()
plt.grid(True)
plt.show()


### Visualizing the Confusion Matrix


In [None]:
cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Confusion Matrix')
plt.show()


### Interpreting Coefficients

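Each coefficient in logistic regression is the change in the log-odds of the positive class for a one-unit increase in that predictor, holding the other predictors fixed. Because the features were standardized, one unit here corresponds to one standard deviation of the original feature.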

- Positive coefficient: increases the log-odds (probability) of the positive class
- Negative coefficient: decreases the log-odds

Let's display the coefficients:

In [None]:
for i, coef in enumerate(model.coef_[0]):
    print(f'Feature {i+1}: {coef:.3f}')
print('Intercept:', model.intercept_[0])
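
Coefficients are often easier to read as odds ratios, obtained by exponentiating them: an odds ratio of 2 means the odds of the positive class double for a one-unit increase in that (scaled) feature. The sketch below is self-contained and therefore refits a model on freshly generated data; the names `X_or`, `y_or`, and `or_model` are placeholders for this example, not part of the cells above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data for illustration only
X_or, y_or = make_classification(
    n_samples=300, n_features=3, n_redundant=0, n_clusters_per_class=1,
    random_state=42
)
X_or = StandardScaler().fit_transform(X_or)

or_model = LogisticRegression().fit(X_or, y_or)
coefs = or_model.coef_[0]
odds_ratios = np.exp(coefs)     # multiplicative change in the odds per unit increase

for i, (coef, ratio) in enumerate(zip(coefs, odds_ratios)):
    print(f'Feature {i+1}: coefficient = {coef:.3f}, odds ratio = {ratio:.3f}')
```

An odds ratio above 1 matches a positive coefficient, and an odds ratio below 1 matches a negative one.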


## <div style="color: red"><strong>Part 3. Multiclass Logistic Regression</strong></div>

Logistic regression can be extended to multiclass problems using two main approaches:
- **One-vs-Rest (OvR)**: Fits one classifier per class
- **Multinomial (Softmax)**: Fits a single model with a softmax function

Let's demonstrate both on the Iris dataset.

In [None]:
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.metrics import classification_report, confusion_matrix

# Load Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Scale features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Note: for brevity, both models are fit and evaluated on the full dataset,
# so the reports below describe training performance, not generalization.

# One-vs-Rest (OvR): one binary classifier per class.
# The multi_class argument is deprecated since scikit-learn 1.5,
# so OvR is requested through the OneVsRestClassifier wrapper instead.
ovr_model = OneVsRestClassifier(LogisticRegression(solver='lbfgs', max_iter=200))
ovr_model.fit(X_scaled, y)
y_pred_ovr = ovr_model.predict(X_scaled)
print('One-vs-Rest Classification Report:\n', classification_report(y, y_pred_ovr))
print('Confusion Matrix (OvR):\n', confusion_matrix(y, y_pred_ovr))

# Multinomial (Softmax): with the lbfgs solver, LogisticRegression fits a
# multinomial model by default when there are more than two classes.
multi_model = LogisticRegression(solver='lbfgs', max_iter=200)
multi_model.fit(X_scaled, y)
y_pred_multi = multi_model.predict(X_scaled)
print('Multinomial Classification Report:\n', classification_report(y, y_pred_multi))
print('Confusion Matrix (Multinomial):\n', confusion_matrix(y, y_pred_multi))

## <div style="color: red"><strong>Part 4. Regularization and Hyperparameter Tuning</strong></div>

Regularization helps prevent overfitting. In scikit-learn, its strength is controlled by the parameter $C$, which is the inverse of the regularization strength: smaller values of $C$ mean stronger regularization.
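
To make the role of $C$ concrete, the sketch below fits the same L2-regularized model with three values of $C$ and prints the size of the coefficient vector. It is self-contained, so it regenerates a synthetic dataset; the names `X_c`, `y_c`, and `coef_norms` are placeholders for this example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data for illustration only
X_c, y_c = make_classification(
    n_samples=300, n_features=3, n_redundant=0, n_clusters_per_class=1,
    flip_y=0.1, class_sep=1.5, random_state=42
)
X_c = StandardScaler().fit_transform(X_c)

coef_norms = {}
for C in [0.01, 1, 100]:
    clf = LogisticRegression(C=C, max_iter=1000).fit(X_c, y_c)
    coef_norms[C] = float(np.linalg.norm(clf.coef_))   # Euclidean norm of the weights
    print(f'C = {C:>6}: ||coef|| = {coef_norms[C]:.3f}')
```

Small $C$ pulls the coefficients strongly toward zero, while large $C$ lets them grow toward the unregularized solution.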

Let's use GridSearchCV to find the best $C$ for our binary example.

In [None]:
from sklearn.model_selection import GridSearchCV
param_grid = {
    'C': [0.01, 0.1, 1, 10, 100],
    'penalty': ['l1', 'l2'],
    'solver': ['liblinear']  # liblinear supports both l1 and l2
}
grid = GridSearchCV(LogisticRegression(), param_grid, cv=5, scoring='accuracy')
grid.fit(X_train_scaled, y_train)
print('Best parameters:', grid.best_params_)
print('Best cross-validated accuracy:', grid.best_score_)


## <div style="color: red"><strong>Part 5. Best Practices, Tips, and Summary</strong></div>

- Always scale features when using regularization.
- Use cross-validation to tune hyperparameters (C, penalty).
- For imbalanced data, consider using class_weight='balanced' or resampling.
- Check for multicollinearity among features.
- Interpret coefficients in the context of scaled features.
- Use ROC and precision-recall curves for model evaluation, especially with imbalanced data.
- Logistic regression is a strong baseline for many classification problems.
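
Several of these tips can be combined in one place. The sketch below wraps scaling and the classifier in a pipeline so that the scaler is refit inside each cross-validation fold, uses `class_weight='balanced'`, and scores with average precision. The imbalanced dataset is synthetic, and its parameters and the names `X_imb`, `y_imb`, and `pipe` are assumptions of this example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic imbalanced data: roughly 90% negatives, 10% positives (illustration only)
X_imb, y_imb = make_classification(
    n_samples=500, n_features=5, weights=[0.9, 0.1], random_state=0
)

# Scaling lives inside the pipeline, so each fold is scaled using only its training part
pipe = make_pipeline(
    StandardScaler(),
    LogisticRegression(class_weight='balanced', max_iter=1000)
)

scores = cross_val_score(pipe, X_imb, y_imb, cv=5, scoring='average_precision')
print('Average precision per fold:', scores)
print('Mean average precision:', scores.mean())
```

The same pipeline object can be passed to GridSearchCV, with parameter names prefixed by the step name (for example `logisticregression__C`).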

### Further Reading
- [scikit-learn LogisticRegression documentation](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html)
- [Logistic Regression (Wikipedia)](https://en.wikipedia.org/wiki/Logistic_regression)
- [Interpreting Logistic Regression Coefficients](https://www.theanalysisfactor.com/interpret-coefficients-logistic-regression/)
