# <div align="center" style="color: brown"><strong>Logistic Regression Tutorial</strong></div>

## <div style="color: red"><strong>Part 1. Introduction to Logistic Regression</strong></div>

**Logistic Regression** is a fundamental classification algorithm in machine learning. Unlike linear regression, which predicts continuous values, logistic regression is used to predict the probability of a binary outcome (e.g., yes/no, 0/1, true/false).

### Key Concepts
- **Binary Classification**: Logistic regression is mainly used for problems where the target variable has two possible outcomes.
- **Sigmoid Function**: The core of logistic regression is the sigmoid (logistic) function, which maps any real-valued number into the (0, 1) interval, representing probability.
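- **Decision Boundary**: The model predicts class 1 if the predicted probability is above a threshold (commonly 0.5), otherwise class 0. At a threshold of 0.5 the boundary is the set of points where the linear combination of the features equals zero: a line in two dimensions, a hyperplane in general.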
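
As a minimal sketch of the sigmoid and threshold ideas above, the snippet below uses only the standard library. The helper names `sigmoid` and `predict_class` are illustrative and do not come from any library.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function 1 / (1 + e^(-z))."""
    # Branch on the sign of z so math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_class(z: float, threshold: float = 0.5) -> int:
    """Return 1 if the predicted probability is above the threshold, else 0."""
    return 1 if sigmoid(z) > threshold else 0

for z in (-4.0, 0.0, 4.0):
    print(f"z = {z:+.1f}  p = {sigmoid(z):.3f}  class = {predict_class(z)}")
```

With the default threshold of 0.5, predicting class 1 is the same as checking whether `z` is positive, because `sigmoid(0)` is exactly 0.5.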

### Mathematical Formulation
The logistic regression model predicts the probability $p$ that the target variable $y$ is 1 given input features $x_1, \dots, x_n$:

$p = \sigma(z) = \frac{1}{1 + e^{-z}}$
where
$z = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n$

- $\sigma(z)$ is the sigmoid function.
- $\beta_0$ is the intercept (bias).
- $\beta_1, \dots, \beta_n$ are the coefficients (weights).
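
The formulation can be checked numerically. The sketch below uses made-up values for `beta_0`, `betas` and the input `x` (none of them come from a fitted model) and confirms that the log-odds of the resulting probability equal the linear term $z$.

```python
import math

# Hypothetical parameters, chosen only for illustration.
beta_0 = -1.0            # intercept
betas = [0.8, -0.5]      # one coefficient per feature
x = [2.0, 1.0]           # a single input with two features

# Linear term: z = beta_0 + sum_j beta_j * x_j
z = beta_0 + sum(b * xj for b, xj in zip(betas, x))

# Probability of class 1 via the sigmoid
p = 1.0 / (1.0 + math.exp(-z))

# Log-odds (logit) of p; algebraically this is z again
log_odds = math.log(p / (1.0 - p))

print(f"z = {z:.4f}, p = {p:.4f}, log-odds = {log_odds:.4f}")
```

This is why the coefficients are said to act on the log-odds scale: the sigmoid and the logit are inverses of each other.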

### Loss Function
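Logistic regression uses the **log-loss** (binary cross-entropy) as its cost function, averaged over the $m$ training examples, where $y_i$ is the true label of example $i$ and $p_i$ is its predicted probability of class 1: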

$\text{Loss} = -\frac{1}{m} \sum_{i=1}^m [y_i \log(p_i) + (1 - y_i) \log(1 - p_i)]$
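
A small sketch of this formula in plain Python follows. The function name `log_loss`, the clipping constant `eps`, and the example labels and probabilities are all assumptions made for illustration; clipping is a common safeguard so that `log(0)` is never evaluated.

```python
import math

def log_loss(y_true, p_pred, eps=1e-15):
    """Mean binary cross-entropy over paired labels and probabilities."""
    if len(y_true) != len(p_pred):
        raise ValueError("y_true and p_pred must have the same length")
    total = 0.0
    for y, p in zip(y_true, p_pred):
        # Clip p away from 0 and 1 so log() never sees 0.
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)

# Made-up labels and predicted probabilities.
y_true = [1, 0, 1, 0]
confident_right = [0.9, 0.1, 0.8, 0.2]
confident_wrong = [0.1, 0.9, 0.2, 0.8]

print("good predictions:", round(log_loss(y_true, confident_right), 4))
print("bad predictions: ", round(log_loss(y_true, confident_wrong), 4))
```

The loss grows quickly when the model is confident and wrong, which is the behaviour that makes log-loss a useful training objective for probability estimates.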

### Assumptions
- The outcome is binary.
- Observations are independent.
- There is little or no multicollinearity among the independent variables.
- The relationship between the independent variables and the log-odds is linear.
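
One rough way to screen for the multicollinearity assumption is to look at pairwise correlations between features. The sketch below does this with a hand-written Pearson correlation; the feature names, the data, and the 0.9 cutoff are all hypothetical. Pairwise correlation cannot detect every form of multicollinearity (a feature may be a combination of several others), so treat it as a first check only.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical feature columns: 'height_in' is 'height_cm' in other units.
features = {
    "height_cm": [150.0, 160.0, 170.0, 180.0, 190.0],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "age": [31.0, 25.0, 40.0, 22.0, 35.0],
}

THRESHOLD = 0.9  # assumed cutoff for "highly correlated"
names = list(features)
flagged = []
for i, a in enumerate(names):
    for b in names[i + 1:]:
        r = pearson(features[a], features[b])
        if abs(r) > THRESHOLD:
            flagged.append((a, b))
        print(f"{a} vs {b}: r = {r:+.3f}")

print("Highly correlated pairs:", flagged)
```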

### Applications
- Medical diagnosis (disease vs. no disease)
- Email spam detection
- Credit scoring (default vs. no default)
- Marketing (buy vs. not buy)


## <div style="color: red"><strong>Part 2. Implementation in Python</strong></div>

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import (
    accuracy_score, confusion_matrix, classification_report, roc_curve, auc
)

# 1. Generate a synthetic binary classification dataset
X, y = make_classification(
    n_samples=200, n_features=2, n_redundant=0, n_clusters_per_class=1,
    flip_y=0.1, class_sep=1.5, random_state=42
)

# 2. Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# 3. Create and train the logistic regression model
model = LogisticRegression()
model.fit(X_train, y_train)

# 4. Make predictions (hard labels and class-1 probabilities)
y_pred = model.predict(X_test)
y_proba = model.predict_proba(X_test)[:, 1]

# 5. Evaluate the model
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred))

# 6. Plot the decision boundary
plt.figure(figsize=(8, 6))
plt.scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap='bwr', edgecolor='k', alpha=0.7, label='Test data')
x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5
y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 200), np.linspace(y_min, y_max, 200))
# Predict a class for every grid point, then shade the two regions
Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
plt.contourf(xx, yy, Z, alpha=0.2, cmap='bwr')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('Logistic Regression Decision Boundary')
plt.legend()
plt.show()

# 7. Plot ROC curve
fpr, tpr, thresholds = roc_curve(y_test, y_proba)
roc_auc = auc(fpr, tpr)
plt.figure(figsize=(8, 6))
plt.plot(fpr, tpr, color='darkorange', lw=2, label=f'ROC curve (AUC = {roc_auc:.2f})')
plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic (ROC) Curve')
plt.legend(loc="lower right")
plt.grid(True)
plt.show()
```
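
To make step 7 less of a black box, the sketch below computes the false and true positive rates at a few thresholds by hand, which is what each point on a ROC curve represents. The labels and scores are made up for illustration and are not the output of the model above; `rates_at` is a hypothetical helper, not a scikit-learn function.

```python
def rates_at(threshold, y_true, scores):
    """Return (FPR, TPR) when predicting class 1 for scores >= threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return fp / (fp + tn), tp / (tp + fn)

# Made-up labels and predicted class-1 probabilities.
y_true = [0, 0, 1, 0, 1, 1]
scores = [0.10, 0.35, 0.40, 0.60, 0.75, 0.90]

for t in (0.3, 0.5, 0.7):
    fpr, tpr = rates_at(t, y_true, scores)
    print(f"threshold = {t:.1f}  FPR = {fpr:.2f}  TPR = {tpr:.2f}")
```

Lowering the threshold catches more positives at the cost of more false alarms, so the choice of 0.5 is a default rather than a requirement.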


## <div style="color: red"><strong>Part 3. Interpretation and Best Practices</strong></div>

- **Interpretation of Coefficients**: Each coefficient is the change in the log-odds of the outcome for a one-unit increase in that predictor, holding the other predictors fixed. Exponentiating a coefficient gives the corresponding odds ratio.
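- **Feature Scaling**: Not strictly required for an unregularized model, but standardizing features helps the solver converge and matters once regularization is used, because the penalty treats all coefficients on the same scale. Note that scikit-learn's `LogisticRegression` applies L2 regularization by default.
- **Regularization**: Use L1 (Lasso) or L2 (Ridge) regularization to reduce overfitting, especially with many features. L1 can drive some coefficients exactly to zero, which acts as feature selection. In scikit-learn the strength is set by `C`, the inverse of the regularization strength, so a smaller `C` means a stronger penalty.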
- **Multiclass Extension**: Logistic regression can be extended to multiclass problems using the "one-vs-rest" or "softmax" (multinomial) approach.
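
As a sketch of how coefficients are usually read, the snippet below converts log-odds coefficients into odds ratios. The feature names and coefficient values are hypothetical and are not taken from the model trained in Part 2.

```python
import math

# Hypothetical fitted coefficients on the log-odds scale.
coefficients = {"age_years": 0.05, "smoker": 0.90, "exercise_hours": -0.30}

# exp(beta) is the multiplicative change in the odds per one-unit increase.
odds_ratios = {name: math.exp(beta) for name, beta in coefficients.items()}

for name, beta in coefficients.items():
    ratio = odds_ratios[name]
    pct = (ratio - 1.0) * 100.0
    print(f"{name:>15}: beta = {beta:+.2f}, odds ratio = {ratio:.3f} ({pct:+.1f}% odds per unit)")
```

An odds ratio above 1 means the odds of class 1 rise with the feature and a ratio below 1 means they fall. The change applies to the odds, not to the probability itself.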

### Advantages
- Simple and fast to train
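- Outputs probabilities that are often reasonably well calibrated, since the model is fit by minimizing log-loss (strong regularization or a misspecified model can still distort them)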
- Interpretable coefficients

### Limitations
- Assumes linear relationship between features and log-odds
- Cannot capture non-linear decision boundaries unless non-linear features (e.g., polynomial or interaction terms) are added
- Sensitive to outliers
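
The limitation on non-linear problems can be illustrated with a toy one-dimensional dataset, sketched below under stated assumptions. With a single feature, a logistic regression's decision rule reduces to a threshold on that feature, so the hypothetical helper `best_threshold_accuracy` searches all such thresholds by brute force instead of fitting a model. The data are made up: class 1 means the value lies far from zero.

```python
def best_threshold_accuracy(values, labels):
    """Best accuracy of any rule 'predict 1 if value > t' or its mirror image."""
    candidates = sorted(set(values))
    # Candidate thresholds: one below the minimum, plus every observed value.
    thresholds = [candidates[0] - 1.0] + candidates
    best = 0
    for t in thresholds:
        correct = sum((v > t) == bool(y) for v, y in zip(values, labels))
        # Also allow the flipped rule 'predict 1 if value <= t'.
        best = max(best, correct, len(values) - correct)
    return best / len(values)

# Class 1 when |x| > 1: no single threshold on x separates the classes.
x = [-3.0, -2.0, -1.5, -0.5, 0.0, 0.5, 1.5, 2.0, 3.0]
y = [1 if abs(v) > 1.0 else 0 for v in x]

# Engineered feature: x squared turns the problem into a single threshold.
x_squared = [v * v for v in x]

print("raw feature:    ", round(best_threshold_accuracy(x, y), 3))
print("squared feature:", round(best_threshold_accuracy(x_squared, y), 3))
```

The squared feature makes the classes separable by one threshold, which is the kind of feature engineering that lets a linear model in the log-odds handle a non-linear pattern.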
