# **Logistic Regression**

Logistic regression is a statistical method that predicts the class of a new observation from patterns learned on previously observed data. Despite its name, it is used for classification problems.

In its most common form it predicts binary outcomes (0 or 1). It estimates the probability that an event occurs using the logistic function, which outputs values between 0 and 1. It is a **classification algorithm**, as opposed to linear regression, which predicts continuous values.

### Why "Logistic" Regression?

The term "logistic" comes from the logistic function, also known as the sigmoid function, which maps any input value into a range between 0 and 1.

---

## Logistic Function (Sigmoid)

The logistic function, also called the sigmoid function, is a mathematical function that maps ***real-valued input to a value between 0 and 1***. It is defined as:

$$
\sigma(z) = \frac{1}{1 + e^{-z}}
$$

Where:
- \( \sigma(z) \) is the sigmoid function.
- \( z \) is the linear combination of input features: \( z = w_0 + w_1x_1 + w_2x_2 + \cdots + w_nx_n \).

![Sigmoid Function.png](../images/sigmoid_function.png)

### Properties of Sigmoid:

- **Output Range**: The output is always between 0 and 1, making it suitable for probability estimation.
- **S-shaped curve**: The function has a characteristic "S" shape: it rises smoothly from near 0 for large negative inputs to near 1 for large positive inputs, passing through 0.5 at \( z = 0 \).

The sigmoid function can be interpreted as the probability of a certain class (e.g., class 1) given the input features.
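The properties above can be checked numerically. The following is a minimal sketch using only the Python standard library; the function name `sigmoid` and the branch on the sign of `z` (added to avoid overflow in `math.exp`) are illustrative choices, not part of any particular library.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic (sigmoid) function: 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the equivalent form e^z / (1 + e^z),
    # which cannot overflow because e^z is small.
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Outputs rise from near 0 to near 1 and pass through 0.5 at z = 0.
for z in (-5, -1, 0, 1, 5):
    print(f"sigmoid({z:+d}) = {sigmoid(z):.4f}")
```

Note the symmetry \( \sigma(-z) = 1 - \sigma(z) \), which is why the curve is centered on 0.5.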



---

## Model Representation

The logistic regression model can be represented as:

$$
P(y = 1|X) = \sigma(w_0 + w_1x_1 + w_2x_2 + \cdots + w_nx_n)
$$

Where:
- \( P(y=1|X) \) is the probability that the output \( y \) is 1, given the input features \( X \).
- \( \sigma \) is the sigmoid function.
- \( w_0 \) is the bias term (intercept).
- \( w_1, w_2, \ldots, w_n \) are the weights for the corresponding features.

The output of logistic regression is a probability, and typically, a threshold of 0.5 is used to decide the predicted class:
- If \( P(y = 1|X) \geq 0.5 \), predict class 1.
- If \( P(y = 1|X) < 0.5 \), predict class 0.
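As a rough sketch of how the model and the threshold rule fit together, the snippet below computes \( P(y = 1|X) \) for one example and turns it into a class label. The weights, bias, and input values are made up for illustration, and the helper names `predict_proba` and `predict` are assumptions of this sketch.

```python
import math

def predict_proba(x, w, b):
    """Return P(y=1 | x) = sigmoid(b + sum_j w_j * x_j)."""
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))

def predict(x, w, b, threshold=0.5):
    """Apply the decision rule: class 1 if the probability reaches the threshold."""
    return 1 if predict_proba(x, w, b) >= threshold else 0

# Hypothetical parameters and input (not learned from data)
w = [0.8, -0.4]   # w_1, w_2
b = -1.0          # w_0 (bias)
x = [3.0, 1.0]

p = predict_proba(x, w, b)   # z = -1.0 + 2.4 - 0.4 = 1.0
print(f"P(y=1|x) = {p:.4f}, predicted class = {predict(x, w, b)}")
```

Since \( \sigma(z) \geq 0.5 \) exactly when \( z \geq 0 \), a threshold of 0.5 is equivalent to checking the sign of the linear combination.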



---

## Cost Function

The goal of logistic regression is to find the parameters \( w_0, w_1, \dots, w_n \) that minimize the cost function. For logistic regression, we use the **log-loss** (or **binary cross-entropy**) cost function, defined as:

$$
J(w) = - \frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log(h_{\theta}(x^{(i)})) + (1 - y^{(i)}) \log(1 - h_{\theta}(x^{(i)})) \right]
$$

Where:
- \( m \) is the number of training examples.
- \( y^{(i)} \) is the true label for the \( i \)-th example.
- \( h_{\theta}(x^{(i)}) = \sigma(w_0 + w_1x_1^{(i)} + \cdots + w_nx_n^{(i)}) \) is the predicted probability for the \( i \)-th example, where \( \theta \) denotes the parameters \( w_0, \dots, w_n \).
- \( \log \) is the natural logarithm.

### Cost Function Intuition:

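- If the model predicts a probability close to the true label (e.g., 0.9 when the label is 1), the cost will be small.
- If the model predicts a probability far from the true label (e.g., 0.1 when the label is 1), the cost will be large, and it grows without bound as the prediction approaches the wrong extreme.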
- The goal is to minimize this cost using optimization techniques like **Gradient Descent**.
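The intuition above can be illustrated with a small computation. This is a minimal sketch of the log-loss formula in plain Python; the labels and predicted probabilities are invented for the example, and the `eps` clipping is an added safeguard against `log(0)` rather than part of the formula.

```python
import math

def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy over all examples."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

y_true = [1, 0, 1, 0]
confident_right = [0.9, 0.1, 0.8, 0.2]   # close to the true labels
confident_wrong = [0.1, 0.9, 0.2, 0.8]   # far from the true labels

good = log_loss(y_true, confident_right)
bad = log_loss(y_true, confident_wrong)
print(f"cost when predictions match labels:   {good:.4f}")
print(f"cost when predictions oppose labels:  {bad:.4f}")
```

The second cost is roughly twelve times the first, which is the behavior described in the bullets: confident mistakes are penalized heavily.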

---

## Training the Model

To train a logistic regression model, we need to adjust the model parameters (weights) to minimize the cost function. The most common approach is **Gradient Descent**.

### Gradient Descent:

- **Gradient Descent** is an iterative optimization algorithm used to minimize the cost function by updating the weights in the opposite direction of the gradient.

The update rule is:

$$
w_j = w_j - \alpha \cdot \frac{\partial J(w)}{\partial w_j}
$$

Where:
- \( \alpha \) is the learning rate.
- \( \frac{\partial J(w)}{\partial w_j} \) is the partial derivative of the cost function with respect to the parameter \( w_j \).

The weights are updated in each iteration until convergence, i.e., when the cost function stops decreasing significantly.
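The update rule can be sketched as a short batch gradient descent loop. For the log-loss, the partial derivative works out to \( \frac{\partial J(w)}{\partial w_j} = \frac{1}{m} \sum_{i=1}^{m} (h_{\theta}(x^{(i)}) - y^{(i)}) x_j^{(i)} \), with the bias using \( x_0^{(i)} = 1 \). The code below is an illustrative pure-Python implementation under assumptions of its own: the tiny dataset, the learning rate, the fixed iteration count, and all function names are made up for the example.

```python
import math

def sigmoid(z):
    # Numerically stable form of 1 / (1 + exp(-z))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(x, w, b):
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

def cost(X, y, w, b, eps=1e-15):
    # Mean log-loss J(w) over the training set
    total = 0.0
    for x, yi in zip(X, y):
        p = min(max(predict_proba(x, w, b), eps), 1 - eps)
        total += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return -total / len(X)

def train(X, y, alpha=0.1, n_iters=5000):
    m, n = len(X), len(X[0])
    w, b = [0.0] * n, 0.0  # b plays the role of w_0
    for _ in range(n_iters):
        # Prediction errors h(x_i) - y_i for every example
        errors = [predict_proba(x, w, b) - yi for x, yi in zip(X, y)]
        grad_w = [sum(e * x[j] for e, x in zip(errors, X)) / m for j in range(n)]
        grad_b = sum(errors) / m
        # Step against the gradient: w_j = w_j - alpha * dJ/dw_j
        w = [wj - alpha * g for wj, g in zip(w, grad_w)]
        b -= alpha * grad_b
    return w, b

# Illustrative data (hypothetical): hours studied -> passed (1) or failed (0)
X = [[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]]
y = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]

initial_cost = cost(X, y, [0.0], 0.0)
w, b = train(X, y)
final_cost = cost(X, y, w, b)
print(f"cost before training: {initial_cost:.4f}, after: {final_cost:.4f}")
print(f"w_1 = {w[0]:.3f}, w_0 = {b:.3f}")
```

This sketch stops after a fixed number of iterations rather than testing for convergence. On linearly separable data such as this, the unregularized cost keeps shrinking as the weights grow, so a practical implementation would typically add a tolerance check or regularization.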



---

## Regularization

Regularization is a technique used to prevent overfitting, especially when the model has many parameters or when there is multicollinearity among the features. Two types of regularization are commonly used with logistic regression:

### 1. **L1 Regularization (Lasso)**

L1 regularization adds the sum of the absolute values of the coefficients, scaled by \( \lambda \), to the cost function:

$$
J(w) = - \frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log(h_{\theta}(x^{(i)})) + (1 - y^{(i)}) \log(1 - h_{\theta}(x^{(i)})) \right] + \lambda \sum_{j=1}^{n} |w_j|
$$

Where:
- \( \lambda \) is the regularization parameter that controls the strength of regularization.

### 2. **L2 Regularization (Ridge)**

L2 regularization adds the sum of the squared coefficients, scaled by \( \lambda \), to the cost function:

$$
J(w) = - \frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log(h_{\theta}(x^{(i)})) + (1 - y^{(i)}) \log(1 - h_{\theta}(x^{(i)})) \right] + \lambda \sum_{j=1}^{n} w_j^2
$$

Where:
- \( \lambda \) is the regularization parameter.
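To make the two penalty terms concrete, here is a small sketch that evaluates them for one weight vector. The weights and the value of \( \lambda \) are arbitrary illustrative numbers, and the helper names are hypothetical. As in the formulas above, the sums start at \( j = 1 \), so the bias \( w_0 \) is not penalized.

```python
def l1_penalty(w, lam):
    """lambda * sum_j |w_j| over the non-bias weights."""
    return lam * sum(abs(wj) for wj in w)

def l2_penalty(w, lam):
    """lambda * sum_j w_j^2 over the non-bias weights."""
    return lam * sum(wj ** 2 for wj in w)

def regularized_cost(data_loss, w, lam, kind="l2"):
    """Add the chosen penalty to an already computed log-loss value."""
    penalty = l1_penalty(w, lam) if kind == "l1" else l2_penalty(w, lam)
    return data_loss + penalty

w = [0.5, -2.0, 0.0, 3.0]   # w_1..w_4 (bias excluded)
lam = 0.1

print(f"L1 penalty: {l1_penalty(w, lam):.3f}")   # 0.1 * (0.5 + 2.0 + 0.0 + 3.0)
print(f"L2 penalty: {l2_penalty(w, lam):.3f}")   # 0.1 * (0.25 + 4.0 + 0.0 + 9.0)
```

Because the L2 term squares each weight, the largest coefficient (3.0) dominates that penalty, while the L1 term charges every unit of weight equally.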



### Python Implementation

- #### Problem Statement
Prevent overfitting on a dataset with many features using L1 (Lasso) and L2 (Ridge) regularization.

- #### Implementation
```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# L2 Regularization (default)
model_l2 = LogisticRegression(penalty="l2", solver="lbfgs", max_iter=200)
model_l2.fit(X_train, y_train)
print("L2 Regularization Accuracy:", model_l2.score(X_test, y_test))

# L1 Regularization
model_l1 = LogisticRegression(penalty="l1", solver="liblinear", max_iter=200)
model_l1.fit(X_train, y_train)
print("L1 Regularization Accuracy:", model_l1.score(X_test, y_test))
```

### Key Concepts in Regularization
- **L1 Regularization**: Encourages sparsity by adding the absolute value of coefficients to the loss function.
- **L2 Regularization**: Penalizes large coefficients by adding their squared values to the loss function.

---

## Evaluation Metrics for Logistic Regression

When evaluating a logistic regression model, [these](../Concepts/07%20-%20models_evaluation_methods.ipynb) metrics are typically used:

### Implementation Example

```python
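from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_curve, auc
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

# Fit a model on synthetic data so that this example runs on its own
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LogisticRegression(max_iter=200).fit(X_train, y_train)
y_pred = model.predict(X_test)

# Confusion Matrix (rows: true class, columns: predicted class)
cm = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:\n", cm)

# ROC Curve, computed from the predicted probability of class 1
fpr, tpr, thresholds = roc_curve(y_test, model.predict_proba(X_test)[:, 1])
roc_auc = auc(fpr, tpr)

plt.figure()
plt.plot(fpr, tpr, label=f"AUC = {roc_auc:.2f}")
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("ROC Curve")
plt.legend(loc="lower right")
plt.show()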
```

### Key Metrics

- **Confusion Matrix**: Shows TP, FP, TN, FN counts.
- **ROC Curve**: Plots TPR against FPR at various thresholds.
- **AUC (Area Under Curve)**: Summarizes the ROC curve into a single value.
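To show what these metrics count, the sketch below computes the confusion-matrix entries and a single ROC point by hand, without any library. The labels and probabilities are invented for illustration, and the function names `confusion_counts` and `roc_point` are assumptions of this example.

```python
def confusion_counts(y_true, y_prob, threshold=0.5):
    """Return (TP, FP, TN, FN) after thresholding the probabilities."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def roc_point(y_true, y_prob, threshold):
    """Return (FPR, TPR) for one threshold; one point on the ROC curve."""
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return fpr, tpr

# Hypothetical labels and predicted probabilities of class 1
y_true = [0, 0, 1, 1, 0, 1]
y_prob = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9]

print("TP, FP, TN, FN at 0.5:", confusion_counts(y_true, y_prob, 0.5))
for t in (0.3, 0.5, 0.75):
    fpr, tpr = roc_point(y_true, y_prob, t)
    print(f"threshold={t:.2f}  FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Lowering the threshold raises both TPR and FPR. Sweeping it across all values traces out the ROC curve, and the area under that curve is the AUC.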

---

## Binary Logistic Regression

- #### Problem Statement
We aim to predict whether a student will pass or fail based on their study hours.

- #### Dataset and Implementation
```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report

# Sample dataset
data = {
    "Hours_Studied": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "Passed": [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
}

# Create a DataFrame
df = pd.DataFrame(data)

# Features and target
X = df[["Hours_Studied"]]
y = df["Passed"]

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Logistic Regression model
model = LogisticRegression()
model.fit(X_train, y_train)

# Predictions
y_pred = model.predict(X_test)

# Evaluate
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred))
```

### Key Concepts in Binary Logistic Regression
- Use of a sigmoid function to model probabilities.
- Threshold-based decision-making (e.g., \( P(y=1|X) \geq 0.5 \)).


[Implement Binary Logistic Regression Model](04%20-%20Implement%20Logistic%20Regression%20Model.ipynb)

---

## Multiclass Logistic Regression

Logistic regression can be extended to handle multiple classes using one of two common strategies:
- **One-vs-All (OvA)**: Train one binary classifier per class, where each classifier distinguishes between a specific class and the rest.
- **Softmax Regression**: This generalizes logistic regression to multiclass problems, where the probability of each class is modeled using a softmax function.

The softmax function is defined as:

$$
P(y = k | X) = \frac{e^{z_k}}{\sum_{j=1}^{K} e^{z_j}}
$$

Where \( z_k \) is the score for class \( k \), and \( K \) is the total number of classes.
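A minimal sketch of the softmax function in plain Python follows. The class scores are arbitrary example values, and subtracting the maximum score before exponentiating is a common numerical-stability trick added here; it does not change the result because the factor cancels in the ratio.

```python
import math

def softmax(scores):
    """Convert a list of class scores z_1..z_K into probabilities."""
    m = max(scores)                          # shift for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 1.0, 0.1]                     # hypothetical scores for K = 3 classes
probs = softmax(scores)
predicted_class = probs.index(max(probs))

print("probabilities:", [round(p, 3) for p in probs])
print("sum:", round(sum(probs), 6), "predicted class index:", predicted_class)
```

With \( K = 2 \) and scores \( (z, 0) \), the first softmax output equals \( \frac{1}{1 + e^{-z}} \), so binary logistic regression is the two-class special case of softmax regression.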



### Python Implementation

- #### Problem Statement
Predict the species of flowers based on their features using the Iris dataset.

- #### Implementation
```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Load Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

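# Multiclass Logistic Regression model
# With the lbfgs solver, scikit-learn fits a multinomial (softmax) model for
# multiclass targets; the explicit multi_class argument is deprecated in recent releases.
model = LogisticRegression(solver="lbfgs", max_iter=200)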
model.fit(X_train, y_train)

# Predictions
y_pred = model.predict(X_test)

# Evaluate
print("Classification Report:\n", classification_report(y_test, y_pred))
```



### Key Concepts in Multiclass Logistic Regression
- **One-vs-All (OvA)**: Train one binary classifier per class.
- **Softmax Regression**: Generalization of logistic regression for multiple classes.

[Implement Multiclass Logistic Regression Model](04%20-%20Implement%20Logistic%20Regression%20Model.ipynb)

---

## Applications of Logistic Regression

- **Medical Diagnosis**: Predicting whether a patient has a certain disease based on medical data.
- **Email Spam Detection**: Classifying emails as spam or not spam.
- **Credit Scoring**: Predicting whether a customer will default on a loan.
- **Customer Churn Prediction**: Predicting whether a customer will cancel a subscription.

---

## Advantages and Disadvantages

### Advantages:
- Simple to implement and understand.
- Outputs probabilities, making it interpretable.
- Works well for binary classification problems.
- Can be regularized to avoid overfitting.

### Disadvantages:
- Assumes a linear relationship between input features and log-odds.
- Can underfit when the relationship between the features and the outcome is more complex (e.g., non-linear).
- Sensitive to outliers in the data.
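
The linearity assumption in the first point follows directly from the model definition. Writing \( p = P(y = 1|X) = \sigma(z) \) and solving for \( z \) gives the log-odds (logit):

$$
p = \frac{1}{1 + e^{-z}} \;\Longrightarrow\; \frac{p}{1 - p} = e^{z} \;\Longrightarrow\; \log\frac{p}{1 - p} = z = w_0 + w_1x_1 + \cdots + w_nx_n
$$

So each weight \( w_j \) is the change in the log-odds of class 1 per unit increase in \( x_j \), with the other features held fixed.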

---

## Conclusion


Logistic regression is a powerful and interpretable tool for binary classification. By understanding the logistic function, cost function, and evaluation metrics, one can effectively apply logistic regression to real-world classification tasks. Regularization and extension to multiclass problems make logistic regression versatile for many applications.