🔴 Task 25-> Logistic Regression from scratch
Building logistic regression from scratch means implementing a binary classifier by hand: the sigmoid (logistic) function, the cost function, and the optimizer. The sigmoid function maps a linear score to a probability between 0 and 1. A cost function (here, cross-entropy) measures how far those probabilities are from the true labels. Gradient descent then repeatedly adjusts the model weights to reduce the cost. Working through these pieces shows the core mechanics that built-in library implementations hide.
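
For reference, the three pieces named above can be written compactly as follows. The notation is an assumption chosen to line up with the code in the later steps: $m$ samples, feature matrix $X$ (including an intercept column), weight vector $w$, labels $y_i \in \{0, 1\}$, and learning rate $\alpha$.

```latex
\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad h = \sigma(Xw)

J(w) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y_i \log h_i + (1 - y_i) \log(1 - h_i) \right]

\nabla J(w) = \frac{1}{m} X^{\top} (h - y), \qquad w \leftarrow w - \alpha \, \nabla J(w)
```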


Step 1: Define the Sigmoid Function

In [1]:
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))
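
A quick sanity check of the sigmoid can be sketched as below. It also shows one possible overflow-safe variant; `stable_sigmoid` is a hypothetical helper that is not part of the original notebook, and it relies on the identity sigmoid(z) = exp(-log(1 + exp(-z))).

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def stable_sigmoid(z):
    # Hypothetical alternative (an assumption, not the notebook's method):
    # np.logaddexp(0, -z) computes log(1 + exp(-z)) without overflowing.
    return np.exp(-np.logaddexp(0, -z))

z = np.array([-2.0, 0.0, 2.0])
probs = sigmoid(z)

print(probs[1])                      # sigmoid(0) is exactly 0.5
print(probs + sigmoid(-z))           # sigmoid(z) + sigmoid(-z) is 1 for every z
print(stable_sigmoid(np.array([-1000.0, 1000.0])))  # stays finite at extreme inputs
```

For the moderate scores produced by standardized features, the plain version is usually sufficient; the stable variant mainly avoids overflow warnings for very negative inputs.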


Step 2: Implement the Cost Function (Cross-Entropy Loss)

In [2]:
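def compute_cost(X, y, weights):
    m = len(y)
    h = sigmoid(X @ weights)
    # Clip probabilities away from 0 and 1 so np.log never receives 0
    eps = 1e-15
    h = np.clip(h, eps, 1 - eps)
    # Mean binary cross-entropy over the m samples
    cost = (-1/m) * (y.T @ np.log(h) + (1 - y).T @ np.log(1 - h))
    return cost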
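
As a small worked check of the cost function: with all-zero weights every predicted probability is 0.5, so the cross-entropy equals ln 2 (about 0.693) regardless of the labels. The sketch below verifies this on a made-up four-sample dataset; `X_toy`, `y_toy`, and the second weight vector are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def compute_cost(X, y, weights):
    m = len(y)
    h = sigmoid(X @ weights)
    return (-1 / m) * (y @ np.log(h) + (1 - y) @ np.log(1 - h))

# Toy data (made up for illustration): intercept column plus one feature
X_toy = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y_toy = np.array([0, 0, 1, 1])

# Zero weights: every probability is 0.5, so the cost is ln(2)
cost_at_zero = compute_cost(X_toy, y_toy, np.zeros(2))
# Weights that point the right way give a lower cost
cost_good = compute_cost(X_toy, y_toy, np.array([0.0, 2.0]))

print(cost_at_zero, np.log(2))
print(cost_good)
```

This gives a useful baseline: a cost that stays near 0.693 during training suggests the model is not learning anything beyond a coin flip.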

Step 3: Implement the Gradient Descent

In [3]:
def gradient_descent(X, y, weights, learning_rate, iterations):
    m = len(y)
    # Cost after each update, useful for checking convergence
    cost_history = np.zeros(iterations)

    for i in range(iterations):
        # Gradient of the cross-entropy cost: (1/m) * X^T (h - y)
        gradient = (X.T @ (sigmoid(X @ weights) - y)) / m
        weights = weights - learning_rate * gradient
        cost_history[i] = compute_cost(X, y, weights)

    return weights, cost_history
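
One way to gain confidence in the update rule is to compare its gradient expression, X^T (h - y) / m, against a finite-difference approximation of the cost. The sketch below does this on random data; `analytic_gradient`, `numeric_gradient`, and the toy arrays are hypothetical names introduced only for this check.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def compute_cost(X, y, weights):
    m = len(y)
    h = sigmoid(X @ weights)
    return (-1 / m) * (y @ np.log(h) + (1 - y) @ np.log(1 - h))

def analytic_gradient(X, y, weights):
    # The same expression the update rule scales by the learning rate
    return X.T @ (sigmoid(X @ weights) - y) / len(y)

def numeric_gradient(X, y, weights, eps=1e-6):
    # Central differences: perturb one weight at a time
    grad = np.zeros_like(weights)
    for j in range(len(weights)):
        step = np.zeros_like(weights)
        step[j] = eps
        grad[j] = (compute_cost(X, y, weights + step)
                   - compute_cost(X, y, weights - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
X_toy = rng.normal(size=(20, 3))
y_toy = (rng.random(20) < 0.5).astype(float)
w_toy = rng.normal(size=3)

max_diff = np.max(np.abs(analytic_gradient(X_toy, y_toy, w_toy)
                         - numeric_gradient(X_toy, y_toy, w_toy)))
print(max_diff)  # should be very close to zero
```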

Step 4: Build Logistic Regression Model

In [4]:
class LogisticRegression:
    def __init__(self, learning_rate=0.01, iterations=1000):
        self.learning_rate = learning_rate
        self.iterations = iterations

    def fit(self, X, y):
        self.m, self.n = X.shape
        # Start from all-zero weights; X must already include the intercept column
        self.weights = np.zeros(self.n)

        # gradient_descent returns the fitted weights and the per-iteration cost
        self.weights, self.cost_history = gradient_descent(X, y, self.weights, self.learning_rate, self.iterations)

    def predict_prob(self, X):
        return sigmoid(X @ self.weights)

    def predict(self, X):
        # Threshold at 0.5 and return integer labels (0 or 1) rather than booleans
        return (self.predict_prob(X) >= 0.5).astype(int)
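
The 0.5 probability threshold used in `predict` is equivalent to checking whether the linear score `X @ weights` is non-negative, because sigmoid(0) = 0.5. A minimal sketch with made-up weights and inputs (both are assumptions for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Hypothetical weights: intercept of -1 and a single feature weight of 2
weights = np.array([-1.0, 2.0])
# Three samples with an intercept column; their scores are -1, 0 and 1
X_demo = np.array([[1.0, 0.0], [1.0, 0.5], [1.0, 1.0]])

scores = X_demo @ weights
labels_from_probs = sigmoid(scores) >= 0.5
labels_from_scores = scores >= 0

print(labels_from_probs)   # [False  True  True]
print(labels_from_scores)  # [False  True  True]
```

The decision boundary is therefore the set of points where the linear score is zero, which is why logistic regression is a linear classifier.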


Step 5: Train the Model

In [5]:
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Generate a synthetic binary classification dataset
X, y = make_classification(n_samples=1000, n_features=10, n_classes=2, random_state=42)

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Feature scaling
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Add intercept term to X (bias)
X_train = np.hstack((np.ones((X_train.shape[0], 1)), X_train))
X_test = np.hstack((np.ones((X_test.shape[0], 1)), X_test))

# Train the model
model = LogisticRegression(learning_rate=0.01, iterations=1000)
model.fit(X_train, y_train)

# Predictions
predictions = model.predict(X_test)
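
A common way to check that training behaved as expected is to inspect the cost history, which should fall steadily when the learning rate is small enough. The sketch below is self-contained and uses NumPy-generated data rather than the notebook's dataset; `true_w`, the sample sizes, and the learning rate of 0.1 are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def compute_cost(X, y, weights):
    m = len(y)
    h = sigmoid(X @ weights)
    return (-1 / m) * (y @ np.log(h) + (1 - y) @ np.log(1 - h))

def gradient_descent(X, y, weights, learning_rate, iterations):
    m = len(y)
    cost_history = np.zeros(iterations)
    for i in range(iterations):
        weights = weights - (learning_rate / m) * (X.T @ (sigmoid(X @ weights) - y))
        cost_history[i] = compute_cost(X, y, weights)
    return weights, cost_history

# Synthetic data: 200 samples, 3 features, plus an intercept column
rng = np.random.default_rng(0)
X_raw = rng.normal(size=(200, 3))
X_syn = np.hstack((np.ones((200, 1)), X_raw))
true_w = np.array([0.5, 2.0, -1.0, 0.0])  # made-up generating weights
y_syn = (rng.random(200) < sigmoid(X_syn @ true_w)).astype(float)

fitted_w, cost_history = gradient_descent(X_syn, y_syn, np.zeros(4), 0.1, 200)

print(cost_history[0], cost_history[-1])  # the cost should have decreased
print(np.all(np.diff(cost_history) <= 0))  # and done so at every step
```

If the cost oscillates or grows instead, the learning rate is likely too large for the scale of the features.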


Step 6: Evaluate the Model

In [6]:
from sklearn.metrics import accuracy_score

accuracy = accuracy_score(y_test, predictions)
print(f"Accuracy: {accuracy * 100:.2f}%")


Accuracy: 83.00%
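
Accuracy alone does not show which kind of errors the model makes. As a sketch, the counts behind accuracy, precision, and recall can be computed directly with NumPy. The labels below are made up for illustration and are not the notebook's test set.

```python
import numpy as np

# Made-up labels for illustration only
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

tp = np.sum((y_true == 1) & (y_pred == 1))  # true positives
tn = np.sum((y_true == 0) & (y_pred == 0))  # true negatives
fp = np.sum((y_true == 0) & (y_pred == 1))  # false positives
fn = np.sum((y_true == 1) & (y_pred == 0))  # false negatives

accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)
recall = tp / (tp + fn)

print(tp, tn, fp, fn)               # 3 3 1 1
print(accuracy, precision, recall)  # 0.75 0.75 0.75
```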
