<a href="https://colab.research.google.com/github/boodie04/Supervised_ML/blob/main/LogisticRegressionScratch.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

imports

In [3]:
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score, f1_score



class LogisticRegressionScratch

In [5]:
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score, f1_score

# Logistic Regression Class Implementation
class LogisticRegressionScratch:

    # Constructor to initialize hyperparameters and other variables
    def __init__(self, learning_rate=0.01, epochs=1000, optimizer='gd', batch_size=32):
        self.learning_rate = learning_rate  # Learning rate for gradient descent
        self.epochs = epochs  # Number of training iterations (epochs)
        self.optimizer = optimizer  # Type of optimizer: 'gd', 'sgd', 'mbgd'
        self.batch_size = batch_size  # Size of the mini-batches for MBGD
        self.losses = []  # To store the loss value at each epoch for monitoring training progress

    # Sigmoid function to transform the linear output to a probability
    def sigmoid(self, z):
        z = np.clip(z, -500, 500)  # Keep np.exp from overflowing for large-magnitude inputs
        return 1 / (1 + np.exp(-z))  # Sigmoid function, which outputs a value between 0 and 1

    # Initialize weights (W) to zeros and bias (b) to zero
    def initialize_weights(self, n_features):
        self.W = np.zeros(n_features)  # Initialize weights to zero, one for each feature
        self.b = 0.  # Initialize bias to zero

    # Compute the loss using binary cross-entropy (logistic loss function)
    def compute_loss(self, y, y_pred):
        # Clip predicted values to prevent log(0) error (because log(0) is undefined)
        epsilon = 1e-15  # Small value to avoid log(0)
        y_pred = np.clip(y_pred, epsilon, 1 - epsilon)  # Clip the predicted values
        # Binary cross-entropy loss formula
        return -np.mean(y * np.log(y_pred) + (1 - y) * np.log(1 - y_pred))  # Average loss over all samples

    # Predict probabilities using the sigmoid function
    def predict_proba(self, X):
        return self.sigmoid(np.dot(X, self.W) + self.b)  # Linear combination of inputs + bias, passed through sigmoid

    # Predict the class labels (0 or 1) based on the probability threshold of 0.5
    def predict(self, X):
        return (self.predict_proba(X) >= 0.5).astype(int)  # If probability >= 0.5, classify as 1, else 0

    # Update weights using the gradient descent method
    def update_weights(self, X, y):
        y_pred = self.predict_proba(X)  # Get the predicted probabilities
        error = y_pred - y  # Calculate the error (difference between predicted and true labels)
        # Compute gradients (derivatives of the loss w.r.t. weights and bias)
        dw = np.dot(X.T, error) / len(y)  # Gradient w.r.t. weights (X.T is the transpose of X)
        db = np.sum(error) / len(y)  # Gradient w.r.t. bias
        # Update weights and bias using the gradients and learning rate
        self.W -= self.learning_rate * dw  # Update weights
        self.b -= self.learning_rate * db  # Update bias

    # Training function to fit the model to the data using different optimizers
    def fit(self, X, y):
        self.initialize_weights(X.shape[1])  # Initialize weights based on the number of features
        self.losses = []  # Reset the loss history so repeated fit() calls do not accumulate

        # Iterate through each epoch (training cycle)
        for epoch in range(self.epochs):
            # Gradient Descent (GD)
            if self.optimizer == 'gd':
                self.update_weights(X, y)

            # Stochastic Gradient Descent (SGD) - one sample at a time
            elif self.optimizer == 'sgd':
                for i in range(len(y)):
                    self.update_weights(X[i:i+1], y[i:i+1])  # Update weights for each individual sample

            # Mini-Batch Gradient Descent (MBGD) - using small batches of data
            elif self.optimizer == 'mbgd':
                indices = np.arange(len(y))  # Create an array of indices from 0 to len(y)-1
                np.random.shuffle(indices)  # Shuffle indices to introduce randomness
                # Iterate over the shuffled indices in mini-batches
                for start in range(0, len(y), self.batch_size):
                    end = start + self.batch_size
                    batch_indices = indices[start:end]  # Select the batch of data
                    self.update_weights(X[batch_indices], y[batch_indices])  # Update weights using the mini-batch

            # Compute the loss after each epoch and store it
            y_pred = self.predict_proba(X)  # Get predicted probabilities
            loss = self.compute_loss(y, y_pred)  # Compute the loss
            self.losses.append(loss)  # Append loss to the list for monitoring
            # Print the current epoch and loss for tracking
            print(f"Epoch {epoch + 1}/{self.epochs} - Loss: {loss:.4f}")

    # Evaluation function to calculate various performance metrics
    def evaluate(self, X, y_true):
        y_pred = self.predict(X)  # Get predicted class labels (0 or 1)
        print("\nConfusion Matrix:")
        print(confusion_matrix(y_true, y_pred))  # Print confusion matrix
        # Print other classification metrics
        print(f"Accuracy : {accuracy_score(y_true, y_pred):.4f}")
        print(f"Precision: {precision_score(y_true, y_pred):.4f}")
        print(f"Recall   : {recall_score(y_true, y_pred):.4f}")
        print(f"F1 Score : {f1_score(y_true, y_pred):.4f}")

# ========== Run Logistic Regression with Comparison ==========

# Generate synthetic binary classification data (X: features, y: labels)
X, y = make_classification(n_samples=1000, n_features=5, n_classes=2, random_state=42)

# Split data into training and test sets (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# List of optimizers to test
optimizers = ['gd', 'sgd', 'mbgd']

# Train and evaluate the model using each optimizer
for opt in optimizers:
    print(f"\n========== Training with {opt.upper()} ==========")
    model = LogisticRegressionScratch(learning_rate=0.01, epochs=50, optimizer=opt, batch_size=32)
    model.fit(X_train, y_train)  # Train the model
    model.evaluate(X_test, y_test)  # Evaluate the model on the test set



Epoch 1/50 - Loss: 0.6895
Epoch 2/50 - Loss: 0.6859
Epoch 3/50 - Loss: 0.6824
Epoch 4/50 - Loss: 0.6790
Epoch 5/50 - Loss: 0.6756
Epoch 6/50 - Loss: 0.6722
Epoch 7/50 - Loss: 0.6689
Epoch 8/50 - Loss: 0.6656
Epoch 9/50 - Loss: 0.6624
Epoch 10/50 - Loss: 0.6593
Epoch 11/50 - Loss: 0.6562
Epoch 12/50 - Loss: 0.6531
Epoch 13/50 - Loss: 0.6501
Epoch 14/50 - Loss: 0.6472
Epoch 15/50 - Loss: 0.6443
Epoch 16/50 - Loss: 0.6414
Epoch 17/50 - Loss: 0.6386
Epoch 18/50 - Loss: 0.6358
Epoch 19/50 - Loss: 0.6331
Epoch 20/50 - Loss: 0.6304
Epoch 21/50 - Loss: 0.6277
Epoch 22/50 - Loss: 0.6251
Epoch 23/50 - Loss: 0.6225
Epoch 24/50 - Loss: 0.6200
Epoch 25/50 - Loss: 0.6175
Epoch 26/50 - Loss: 0.6150
Epoch 27/50 - Loss: 0.6126
Epoch 28/50 - Loss: 0.6102
Epoch 29/50 - Loss: 0.6079
Epoch 30/50 - Loss: 0.6055
Epoch 31/50 - Loss: 0.6032
Epoch 32/50 - Loss: 0.6010
Epoch 33/50 - Loss: 0.5988
Epoch 34/50 - Loss: 0.5966
Epoch 35/50 - Loss: 0.5944
Epoch 36/50 - Loss: 0.5923
Epoch 37/50 - Loss: 0.5902
Epoch 38/

📌 The Binary Cross-Entropy Loss Function
This is the line we’re focusing on:

return -np.mean(y * np.log(y_pred) + (1 - y) * np.log(1 - y_pred))
This loss function calculates the error between the actual labels y and the predicted probabilities y_pred. We want to minimize this error during training.
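Written as a formula, the same line reads as follows. The symbols N (number of samples), y_i (true label of sample i) and \hat{y}_i (predicted probability for sample i) are notation introduced here for illustration; they correspond to len(y), y and y_pred in the code.

$$
L = -\frac{1}{N}\sum_{i=1}^{N}\left[\, y_i \log \hat{y}_i + (1 - y_i)\log\left(1 - \hat{y}_i\right) \right]
$$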

🧠 How It Works:
Element-wise Operations: The terms y * np.log(y_pred) and (1 - y) * np.log(1 - y_pred) are computed for each data point. Because y is either 0 or 1, only one of the two terms is non-zero for any given example, and that term is the example's individual (not yet negated) loss.
What do these parts do?

y * np.log(y_pred):
When y = 1, this is the only active term, and it reduces to np.log(y_pred).
If y_pred is close to 1, the log is close to 0 (low loss). If y_pred is close to 0, the log is large and negative (high loss once negated).
(1 - y) * np.log(1 - y_pred):
When y = 0, this is the only active term, and it reduces to np.log(1 - y_pred).
If y_pred is close to 0, the log is close to 0 (low loss). If y_pred is close to 1, the log is large and negative (high loss once negated).
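To make the "low loss" versus "high loss" behaviour concrete, the short sketch below prints the negated log term for a few predicted probabilities, assuming the true label is 1. The helper name `positive_class_loss` and the chosen probabilities are illustrative and not part of the notebook.

```python
import math

def positive_class_loss(p):
    """Per-sample loss when the true label is 1: -log(p)."""
    return -math.log(p)

# Illustrative probabilities, from confident-and-right to confident-and-wrong
probabilities = [0.99, 0.9, 0.5, 0.1, 0.01]
losses = [positive_class_loss(p) for p in probabilities]

for p, loss in zip(probabilities, losses):
    print(f"y_pred = {p:<4}  loss = {loss:.4f}")
```

By symmetry, the y = 0 case gives the same numbers with y_pred replaced by 1 - y_pred.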
Collecting the Individual Losses: After the element-wise operations, you get an array with one value per data point.
Example:

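import numpy as np

y = [1, 0, 1]
y_pred = [0.9, 0.2, 0.8]

# Per-example log terms (negative here; the minus sign is applied after averaging)
loss_1 = 1 * np.log(0.9)  # First example (y = 1): only the y term survives
loss_2 = 0 * np.log(0.2) + 1 * np.log(0.8)  # Second example (y = 0): only the (1 - y) term survives
loss_3 = 1 * np.log(0.8)  # Third example (y = 1): only the y term survives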
After calculating each loss, the result would look like this:

losses = [loss_1, loss_2, loss_3]
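The same three values can be computed in one vectorized step, which is what the expression inside np.mean() does. This is a sketch using the example arrays above; the variable names `log_terms` and `per_sample_loss` are introduced here for illustration.

```python
import numpy as np

y = np.array([1, 0, 1])
y_pred = np.array([0.9, 0.2, 0.8])

# Element-wise log terms: one (negative) value per example
log_terms = y * np.log(y_pred) + (1 - y) * np.log(1 - y_pred)

# Negating gives the positive per-sample losses
per_sample_loss = -log_terms

print(np.round(per_sample_loss, 4))  # [0.1054 0.2231 0.2231]
```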
Computing the Average Loss:
np.mean() calculates the average of these individual losses across all samples.
This gives the final average loss for the entire dataset.
If we had, for example:

losses = [0.1054, 0.2231, 0.1405]  # Example per-sample losses, shown already negated (positive)
np.mean(losses)
This would compute:

$$\frac{0.1054 + 0.2231 + 0.1405}{3} \approx 0.1563$$
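Negative Sign: The minus sign - in front of np.mean() makes the loss positive. After clipping, y_pred lies strictly between 0 and 1, so both np.log(y_pred) and np.log(1 - y_pred) are negative and their mean is negative too. Negating it yields a positive loss that shrinks toward 0 as the predictions improve.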
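Putting the pieces together, here is a minimal standalone sketch of the loss that mirrors compute_loss from the class above, including the clipping step. The function name `binary_cross_entropy` and the sample arrays are assumptions made for this illustration.

```python
import numpy as np

def binary_cross_entropy(y, y_pred, epsilon=1e-15):
    """Mean binary cross-entropy, with clipping to keep log() finite."""
    y = np.asarray(y, dtype=float)
    y_pred = np.clip(np.asarray(y_pred, dtype=float), epsilon, 1 - epsilon)
    mean_log_term = np.mean(y * np.log(y_pred) + (1 - y) * np.log(1 - y_pred))
    return -mean_log_term  # mean_log_term is negative, so the loss is positive

good = binary_cross_entropy([1, 0, 1], [0.9, 0.2, 0.8])  # Confident and correct
bad = binary_cross_entropy([1, 0, 1], [0.1, 0.8, 0.2])   # Confident and wrong
edge = binary_cross_entropy([1], [0.0])                  # Finite thanks to clipping

print(f"good: {good:.4f}, bad: {bad:.4f}, edge: {edge:.4f}")
```

Confidently wrong predictions produce a much larger loss than confidently correct ones, and the clipped edge case stays finite instead of returning inf.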
