# Logistic Regression 

### **Logistic Regression for Beginners**

Logistic regression is a **supervised learning algorithm** used for **binary classification problems** (where the target variable has only two classes, like 0 or 1, yes or no, true or false).

---

### **Why Logistic Regression?**
- Linear regression predicts continuous values (e.g., prices, weights).
- Logistic regression predicts probabilities that are **bounded between 0 and 1**, and converts these probabilities into classes (0 or 1).

---

### **Core Idea of Logistic Regression**
The goal is to model the probability \( P(Y = 1 | X) \), where:
- \( Y \) is the target variable (e.g., 1 for success, 0 for failure).
- \( X \) represents the input features.

The probability is modeled using the **sigmoid (logistic) function** applied to a linear combination of the input features.

---

### **1. Sigmoid Function:**
The sigmoid function maps any real number to a value strictly between 0 and 1. It is defined as:

```
P(Y = 1 | X) = 1 / (1 + e^(-z))
```

Where:
- `z = w0 + w1*x1 + w2*x2 + ... + wn*xn`
  - `w0`: Bias term.
  - `w1, w2, ..., wn`: Coefficients (weights) for the features.
  - `x1, x2, ..., xn`: Feature values of the data point.

**Explanation:**
- `z` is the linear equation (similar to linear regression).
- The sigmoid function transforms `z` into a probability between 0 and 1.
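
As a rough illustration of the two steps described here (linear score, then sigmoid), the sketch below computes `z` and the resulting probability for one data point. The weights and feature values are made-up numbers chosen only for the example; they do not come from a trained model.

```python
import math

def sigmoid(z: float) -> float:
    """Map a real-valued score z to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical weights and feature values, chosen only for illustration
w0, w1, w2 = -1.0, 0.5, 2.0
x1, x2 = 2.0, 0.25

z = w0 + w1 * x1 + w2 * x2   # linear part: -1 + 1 + 0.5 = 0.5
p = sigmoid(z)               # modeled probability that Y = 1

print(f"z = {z:.2f}, P(Y = 1 | X) = {p:.3f}")

# Large negative scores approach 0, large positive scores approach 1
for score in (-5, 0, 5):
    print(score, round(sigmoid(score), 3))
```

A score of exactly 0 gives a probability of 0.5, which is why the sign of `z` decides which side of an even split a data point falls on.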

---

### **2. Decision Rule:**
The model predicts:

```
Y = 1 if P(Y = 1 | X) >= 0.5
Y = 0 if P(Y = 1 | X) < 0.5
```
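
The rule can be written as a one-line function. This is only a sketch: the `classify` name and the adjustable `threshold` parameter are assumptions added for illustration, while the text itself fixes the cut-off at 0.5.

```python
def classify(probability: float, threshold: float = 0.5) -> int:
    """Return 1 when the probability reaches the threshold, otherwise 0."""
    return 1 if probability >= threshold else 0

for p in (0.2, 0.5, 0.9):
    print(p, "->", classify(p))
```

Exposing the threshold makes it easy to see that 0.5 is a convention: raising it makes the model more conservative about predicting class 1.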

---

### **3. Loss Function:**
Logistic regression uses the **log-loss (cross-entropy loss)** to optimize the weights:

```
L = -(1/N) * SUM [ yi * log(yi_hat) + (1 - yi) * log(1 - yi_hat) ]
```

Where:
- `N`: Number of data points.
- `yi`: Actual label of data point i (0 or 1).
- `yi_hat`: Predicted probability for data point i.
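
To make the loss concrete, here is a minimal sketch that averages the cross-entropy over `N` labels `yi` and predicted probabilities `yi_hat`. The `log_loss` name, the `eps` clipping value, and the sample numbers are assumptions for illustration only.

```python
import math

def log_loss(y_true, y_prob, eps=1e-15):
    """Average cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip so log(0) never occurs
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

labels = [1, 0, 1, 0]
confident_right = [0.9, 0.1, 0.8, 0.2]  # probabilities close to the labels
confident_wrong = [0.1, 0.9, 0.2, 0.8]  # probabilities far from the labels

print("good predictions:", log_loss(labels, confident_right))
print("bad predictions: ", log_loss(labels, confident_wrong))
```

The second call returns a much larger value, which shows how the loss penalizes confident but wrong predictions.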

---

### **4. Optimization:**
To find the best weights (`w0, w1, ..., wn`), logistic regression uses optimization techniques like **Gradient Descent**.

---

### **5. Example (Intuition):**
Suppose we want to predict whether a student passes an exam (1 = pass, 0 = fail) based on the number of hours studied.

1. Fit the sigmoid function:

```
P(Pass | Hours Studied) = 1 / (1 + e^(-(w0 + w1 * Hours Studied)))
```

2. Use the decision rule to classify:
   - If the probability `P >= 0.5`, predict **Pass**.
   - Otherwise, predict **Fail**.

---

### **Advantages:**
- Simple and interpretable.
- Outputs probabilities, useful in many applications.
  
### **Disadvantages:**
- Assumes a linear relationship between features and log-odds.
- Struggles with complex decision boundaries.
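
The first disadvantage can be checked numerically: taking the log-odds of the sigmoid output recovers the linear score `z`. The sketch below assumes a single feature with illustrative values for `w0` and `w1`.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_odds(p):
    """Logit: the natural log of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))

w0, w1 = -3.0, 0.8  # illustrative bias and weight, not fitted values

for x in (1.0, 2.0, 3.0):
    z = w0 + w1 * x
    p = sigmoid(z)
    # The log-odds equal z, so they grow by w1 for each unit increase in x
    print(f"x={x}: z={z:.2f}, p={p:.3f}, log-odds={log_odds(p):.2f}")
```

Because the log-odds are a straight line in the features, the decision boundary is linear as well, which is the reason the model struggles with more complex boundaries.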

---


## **Logistic Regression: How It Works Step by Step**

---

#### **1. Input Features (\(X\))**
- Logistic regression starts with the input data, which consists of features (predictors) organized as a matrix \(X\):
```
X = [x1, x2, ..., xn]
```
- Each \(x_i\) represents a feature.
- If there are \(m\) samples and \(n\) features, \(X\) is an \(m \times n\) matrix.

---

#### **2. Model Parameters (\(w\) and \(b\))**
- The model learns the parameters:
```
w = [w1, w2, ..., wn]  # weights for each feature
b                     # bias or intercept
```

---

#### **3. Linear Combination**
- A weighted sum of the features is computed for each sample:
```
z = Xw + b
```
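
In NumPy the weighted sum for all samples is a single matrix-vector product. The sketch below uses a made-up 3 x 2 feature matrix and arbitrary parameters to show the shapes involved.

```python
import numpy as np

# Toy data: m = 3 samples, n = 2 features (values are made up)
X = np.array([[1.0, 2.0],
              [0.5, -1.0],
              [3.0, 0.0]])
w = np.array([0.4, -0.2])  # one weight per feature, shape (n,)
b = 0.1                    # scalar bias, broadcast to every sample

z = X @ w + b              # shape (m,): one score per sample
print(z.shape)
print(z)
```

Each entry of `z` is the dot product of one row of `X` with `w`, plus the bias.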

---

#### **4. Apply the Sigmoid Function**
- The sigmoid function maps the linear output (\(z\)) to a probability (\(\hat{y}\)) between 0 and 1:
$$
\hat{y} = \text{sigmoid}(z) = \frac{1}{1 + \exp(-z)}
$$

- If $z$ is large and positive, $\hat{y} \to 1$.
- If $z$ is large and negative, $\hat{y} \to 0$.

---

#### **5. Prediction**
- Predicted probabilities $\hat{y}$ are interpreted as the likelihood of belonging to the positive class ($y = 1$).
- To make binary predictions:
```
Predicted Class =
    1, if y_hat >= 0.5
    0, if y_hat < 0.5
```

---

#### **6. Compute the Loss (Cross-Entropy Loss)**
- Logistic regression minimizes the **cross-entropy loss**, which is defined as:
```
L = -(1/N) * SUM [ yi * log(y_hat_i) + (1 - yi) * log(1 - y_hat_i) ]
```
$$
L = -\frac{1}{N} \sum_{i=1}^N \left[ y_i \cdot \log(\hat{y}_i) + (1 - y_i) \cdot \log(1 - \hat{y}_i) \right]
$$

Where:
- `N`: Number of samples.  
- `yi`: True label (\(1\) for positive class, \(0\) for negative class).  
- `y_hat_i`: Predicted probability for the \(i\)-th sample.

---

#### **7. Update Parameters (Gradient Descent)**
- Parameters \(w\) and \(b\) are updated to minimize the loss using **gradient descent**:
```
w := w - alpha * (partial derivative of L w.r.t w)
b := b - alpha * (partial derivative of L w.r.t b)
```
$$
w := w - \alpha \cdot \frac{\partial L}{\partial w}
$$

$$
b := b - \alpha \cdot \frac{\partial L}{\partial b}
$$

Where:
- `alpha`: Learning rate.  
- Gradients `partial derivative of L w.r.t w` and `partial derivative of L w.r.t b` measure how the loss changes with respect to \(w\) and \(b\).
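
For the cross-entropy loss with a sigmoid output, the gradients take a simple form: the derivative with respect to a weight is the mean of `(y_hat - y) * x`, and the derivative with respect to the bias is the mean of `(y_hat - y)`. The sketch below assumes a single-feature model and made-up data, and compares these analytic gradients against central finite differences as a sanity check.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, b, xs, ys):
    """Mean cross-entropy for a single-feature model."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(xs)

def gradients(w, b, xs, ys):
    """Analytic gradients: dL/dw = mean((p - y) * x), dL/db = mean(p - y)."""
    n = len(xs)
    errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
    dw = sum(e * x for e, x in zip(errors, xs)) / n
    db = sum(errors) / n
    return dw, db

xs = [0.5, 1.5, 2.5, 3.5]   # made-up feature values
ys = [0, 0, 1, 1]           # made-up labels
w, b, h = 0.3, -0.2, 1e-6   # arbitrary parameters and finite-difference step

dw, db = gradients(w, b, xs, ys)
# Central finite differences as an independent check
dw_num = (loss(w + h, b, xs, ys) - loss(w - h, b, xs, ys)) / (2 * h)
db_num = (loss(w, b + h, xs, ys) - loss(w, b - h, xs, ys)) / (2 * h)

print("dL/dw analytic vs numeric:", dw, dw_num)
print("dL/db analytic vs numeric:", db, db_num)
```

The two estimates should agree to several decimal places, which is a quick way to catch sign or scaling mistakes before running a full training loop.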

---

#### **8. Repeat Until Convergence**
- Steps 3–7 are repeated until either:
  - The loss converges (stops decreasing significantly), or
  - The maximum number of iterations is reached.
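
Both stopping conditions can be sketched in a small training loop. The `train` function, the tolerance `tol`, and the toy data below are assumptions for illustration; a single feature is used to keep the code short.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(xs, ys, alpha=0.1, tol=1e-6, max_iter=10_000):
    """Gradient descent that stops on a small loss change or at max_iter."""
    w, b = 0.0, 0.0
    prev_loss = float("inf")
    n = len(xs)
    for step in range(1, max_iter + 1):
        probs = [sigmoid(w * x + b) for x in xs]
        loss = -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                    for y, p in zip(ys, probs)) / n
        if abs(prev_loss - loss) < tol:   # loss has stopped improving
            break
        prev_loss = loss
        dw = sum((p - y) * x for p, y, x in zip(probs, ys, xs)) / n
        db = sum(p - y for p, y in zip(probs, ys)) / n
        w -= alpha * dw
        b -= alpha * db
    return w, b, loss, step

# Made-up, deliberately non-separable data so the weights stay finite
xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 1, 0, 1, 1]

w, b, final_loss, steps = train(xs, ys)
print(f"w={w:.3f}, b={b:.3f}, loss={final_loss:.4f}, steps={steps}")
```

The tolerance check compares consecutive loss values, so the loop exits early once further updates no longer change the loss meaningfully.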

---

### **Summary**
1. **Initialize parameters** (\(w\) and \(b\)).  
2. Compute the **linear combination**:
```
z = Xw + b
```
3. Apply the **sigmoid function**:
```
y_hat = 1 / (1 + exp(-z))
```
$$
\hat{y} = \frac{1}{1 + \exp(-z)}
$$

4. Compute the **cross-entropy loss**:
```
L = -(1/N) * SUM [ yi * log(y_hat_i) + (1 - yi) * log(1 - y_hat_i) ]
```
$$
L = -\frac{1}{N} \sum_{i=1}^N \left[ y_i \cdot \log(\hat{y}_i) + (1 - y_i) \cdot \log(1 - \hat{y}_i) \right]
$$

5. **Update parameters**:
```
w := w - alpha * (partial derivative of L w.r.t w)
b := b - alpha * (partial derivative of L w.r.t b)
```
$$
w := w - \alpha \cdot \frac{\partial L}{\partial w}
$$

$$
b := b - \alpha \cdot \frac{\partial L}{\partial b}
$$
6. **Repeat until the model converges**.

---


### **Question**

Suppose we have a logistic regression model that predicts whether a student passes an exam (1 = Pass, 0 = Fail) based on the number of hours studied. The sigmoid function for the model is given by:


`P(Pass | Hours Studied) = 1 / (1 + e^(-(w0 + w1 * Hours Studied)))`

Where:
- $w_0 = -3$ (bias term)
- $w_1 = 0.8$ (weight for hours studied)

1. Write a Python function to compute the probability of passing for a given number of hours studied.
2. Use the decision rule:
   - If $P \geq 0.5$, predict "Pass".
   - Otherwise, predict "Fail".
3. Test your function with the following cases:
   - Hours Studied = 5
   - Hours Studied = 6
   - Hours Studied = 3

---

### **Answer (Python Code)**

In [7]:

import math

# Compute the pass probability with the sigmoid, then apply the decision rule
def predict_pass(hours_studied, w0=-3, w1=0.8):
    # Compute the linear combination (z)
    z = w0 + w1 * hours_studied
    # Apply the sigmoid function
    probability = 1 / (1 + math.exp(-z))
    # Decision rule
    prediction = "Pass" if probability >= 0.5 else "Fail"
    return probability, prediction

# Test cases
hours_1 = 5
hours_2 = 6
hours_3 = 3

# Predictions
probability_1, prediction_1 = predict_pass(hours_1)
probability_2, prediction_2 = predict_pass(hours_2)
probability_3, prediction_3 = predict_pass(hours_3)

# Output results
print(f"Hours Studied: {hours_1} | Probability: {probability_1:.2f} | Prediction: {prediction_1}")
print(f"Hours Studied: {hours_2} | Probability: {probability_2:.2f} | Prediction: {prediction_2}")
print(f"Hours Studied: {hours_3} | Probability: {probability_3:.2f} | Prediction: {prediction_3}")

Hours Studied: 5 | Probability: 0.73 | Prediction: Pass
Hours Studied: 6 | Probability: 0.86 | Prediction: Pass
Hours Studied: 3 | Probability: 0.35 | Prediction: Fail


# Logistic Regression Implementation

## Binary Classification

In [10]:
import pandas as pd 
import numpy as np 
import matplotlib.pyplot as plt 
import seaborn as sns
%matplotlib inline

In [11]:
from sklearn.datasets import make_classification

## Create the dataset
X, y = make_classification(n_samples=1000, n_features=10, n_classes=2, random_state=42)

In [12]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42)

In [13]:
## Model training
from sklearn.linear_model import LogisticRegression

logistic = LogisticRegression()

In [14]:
logistic.fit(X_train, y_train)

In [15]:
y_pred = logistic.predict(X_test)
print(y_pred)

[0 1 0 1 0 1 0 0 0 0 0 1 0 1 0 0 1 1 1 1 0 0 1 0 0 1 0 0 1 1 1 1 0 0 0 0 1
 1 1 1 0 1 1 0 0 0 1 1 1 1 0 1 0 0 1 0 1 0 1 0 1 0 0 1 1 1 0 0 1 1 1 1 1 0
 1 0 0 1 0 1 0 0 1 0 1 0 0 0 0 1 1 1 1 1 1 1 0 0 1 0 1 0 1 0 0 1 0 1 1 1 1
 1 1 1 1 0 0 1 0 0 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 0 0 0 1 1 0 0 0 0 0 1 0
 0 0 1 0 0 1 0 0 1 0 1 1 0 1 0 0 0 0 0 0 0 1 1 0 0 0 1 1 0 0 1 1 1 1 1 1 1
 0 0 0 0 1 0 0 0 0 1 0 0 1 1 1 0 1 0 0 0 1 1 1 1 1 0 0 0 0 0 1 0 1 0 1 1 0
 0 1 1 1 0 1 1 0 0 0 0 0 0 0 0 0 1 1 0 1 0 1 1 0 1 0 1 0 1 1 1 0 0 1 1 1 1
 0 1 0 1 1 0 0 0 1 1 0 1 1 0 0 1 0 0 0 0 1 1 0 1 0 1 1 1 0 0 1 0 1 1 0 1 1
 1 1 1 0]


In [16]:
## Performance metrics
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

In [17]:
score = accuracy_score(y_test, y_pred)
print(score)
cm = confusion_matrix(y_test, y_pred)
print(cm)
print(classification_report(y_test, y_pred))

0.8466666666666667
[[118  17]
 [ 29 136]]
              precision    recall  f1-score   support

           0       0.80      0.87      0.84       135
           1       0.89      0.82      0.86       165

    accuracy                           0.85       300
   macro avg       0.85      0.85      0.85       300
weighted avg       0.85      0.85      0.85       300



# Logistic Regression from scratch

In [19]:
import numpy as np

class LogisticRegression:
    """
    Logistic Regression implementation from scratch using gradient descent.
    
    Attributes:
        learning_rate (float): The step size for gradient descent updates.
        n_iterations (int): The number of iterations to run gradient descent.
        weights (np.ndarray): The weights learned by the model.
        bias (float): The bias term learned by the model.
    """
    def __init__(self, learning_rate=0.01, n_iterations=1000):
        """
        Initializes the Logistic Regression model.

        Parameters:
            learning_rate (float): The step size for gradient descent updates.
            n_iterations (int): The number of iterations to run gradient descent.
        """
        self.learning_rate = learning_rate
        self.n_iterations = n_iterations
        self.weights = None
        self.bias = None

    def sigmoid(self, z):
        """
        Computes the sigmoid function.

        Parameters:
            z (np.ndarray): The input linear combination of features and weights.

        Returns:
            np.ndarray: The sigmoid output, values between 0 and 1.
        """
        return 1 / (1 + np.exp(-z))

    def fit(self, X, y):
        """
        Trains the logistic regression model using gradient descent.

        Parameters:
            X (np.ndarray): Input features of shape (n_samples, n_features).
            y (np.ndarray): Target labels of shape (n_samples,).
        """
        n_samples, n_features = X.shape

        # Initialize weights and bias to zero
        self.weights = np.zeros(n_features)
        self.bias = 0

        for _ in range(self.n_iterations):
            # Compute the linear combination z = Xw + b
            z = np.dot(X, self.weights) + self.bias
            # Apply the sigmoid function to compute predictions
            y_pred = self.sigmoid(z)

            # Compute the gradients for weights and bias
            dw = (1 / n_samples) * np.dot(X.T, (y_pred - y))  # Gradient for weights
            db = (1 / n_samples) * np.sum(y_pred - y)         # Gradient for bias

            # Update weights and bias using gradient descent
            self.weights -= self.learning_rate * dw
            self.bias -= self.learning_rate * db

    def predict_proba(self, X):
        """
        Predicts probabilities for the positive class (y = 1).

        Parameters:
            X (np.ndarray): Input features of shape (n_samples, n_features).

        Returns:
            np.ndarray: Predicted probabilities for each sample.
        """
        z = np.dot(X, self.weights) + self.bias
        return self.sigmoid(z)

    def predict(self, X):
        """
        Predicts binary class labels (0 or 1) for the input data.

        Parameters:
            X (np.ndarray): Input features of shape (n_samples, n_features).

        Returns:
            np.ndarray: Predicted class labels (0 or 1) for each sample.
        """
        probabilities = self.predict_proba(X)
        return np.where(probabilities >= 0.5, 1, 0)

# Example usage with comments:
if __name__ == "__main__":
    # Generate some synthetic binary classification data
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score

    # Create synthetic dataset
    X, y = make_classification(
        n_samples=100, n_features=2, n_classes=2, n_informative=2, n_redundant=0, random_state=42
    )

    # Split the data into training and test sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Initialize the logistic regression model
    model = LogisticRegression(learning_rate=0.1, n_iterations=1000)

    # Train the model
    model.fit(X_train, y_train)

    # Make predictions
    y_pred = model.predict(X_test)

    # Evaluate the model
    print("Accuracy:", accuracy_score(y_test, y_pred))

Accuracy: 0.95
