# Logistic Regression for Binary Classification

**Overview**
- This project implements a Logistic Regression algorithm from scratch to classify a dataset with two classes.
- The algorithm uses gradient descent to learn optimal weights and then evaluates its performance using accuracy and a classification report.

In [None]:
import math
import numpy as np
from sklearn.metrics import accuracy_score, classification_report
np.random.seed(42)

### Generate Synthetic Data

In [None]:
def generate_binary_synthetic_data(nFeatures=4, nSamples=40, test_ratio=0.2):
    K = 2 # number of classes
    # Generate random feature values between 0 and 1
    X = np.round(np.random.rand(nSamples, nFeatures),3)
    # print(X)

    # Split the samples into two halves, one per class (no noise is added; the halves are only scaled)
    N = nSamples
    mid_N = int(N/2)
    # The first half of the samples are scaled by 2, and the second half by 5, to introduce a distinction between two classes
    X[:mid_N, :] = X[:mid_N, :] * 2
    X[mid_N:N, :] = X[mid_N:N, :] * 5

    test_set_size = int(nSamples * test_ratio / 2)
    X_test_class1 = X[:test_set_size, :]
    X_test_class2 = X[mid_N : mid_N+test_set_size, :]

    # np.delete(array, obj, axis) function removes elements from an array along a specified axis.
    X = np.delete(X, np.s_[0:test_set_size], 0)
    # The second class's test rows now start at mid_N - test_set_size, because deleting
    # the first test_set_size rows shifted every later index down by test_set_size
    X_train = np.delete(X, np.s_[mid_N -test_set_size : mid_N], 0)

    X_test = np.concatenate([X_test_class1, X_test_class2])

    N_train = X_train.shape[0]
    N_test = X_test.shape[0]

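    # Labels R: the first half of each set is class 1, the second half is class 0.
    # Integer division is required because np.repeat expects an integer repeat count.
    R_train = np.repeat([1, 0], N_train // K, axis=0)
    R_test = np.repeat([1, 0], N_test // K, axis=0)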

    return X_train, R_train, X_test, R_test
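The index arithmetic in the split above can be checked on a toy array. The following is a minimal sketch and not part of the original notebook; the array contents and the names `mid`, `k`, and `mask` are illustrative assumptions. It also shows a boolean-mask variant that avoids reasoning about shifted indices.

```python
import numpy as np

# Toy stand-in for X: 10 rows whose value equals their original row index
X = np.arange(10).reshape(10, 1)
mid, k = 5, 2  # hypothetical midpoint and per-class test size

# Test rows: the first k rows of each half
test = np.concatenate([X[:k], X[mid:mid + k]])

# Two-step deletion, mirroring the function above
X_rest = np.delete(X, np.s_[0:k], 0)              # drops original rows 0..k-1
train = np.delete(X_rest, np.s_[mid - k:mid], 0)  # original rows mid..mid+k-1 now sit at mid-k..mid-1

# Equivalent split with a boolean mask over the original indices
mask = np.zeros(len(X), dtype=bool)
mask[:k] = True
mask[mid:mid + k] = True
train_masked = X[~mask]

print(test.ravel())   # [0 1 5 6]
print(train.ravel())  # [2 3 4 7 8 9]
```

Both approaches keep the same training rows; the mask version selects against the original indices, so no offset correction is needed.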

In [None]:
X_train, y_train, X_test, y_test = generate_binary_synthetic_data(nFeatures=4, nSamples=40, test_ratio=0.2)
# Add y_test as last column to X_test and print the Test Set with its labels
print(np.c_[X_test,y_test])

[[0.75  1.902 1.464 1.198 1.   ]
 [0.312 0.312 0.116 1.732 1.   ]
 [1.202 1.416 0.042 1.94  1.   ]
 [1.664 0.424 0.364 0.366 1.   ]
 [4.315 3.115 1.655 0.32  0.   ]
 [1.555 1.625 3.65  3.19  0.   ]
 [4.435 2.36  0.6   3.565 0.   ]
 [3.805 2.805 3.855 2.47  0.   ]]


### Logistic Regression Algorithm

Implement logistic regression using Gradient Descent
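
For reference, the update performed in the next cell can be written out as follows. This is the standard cross-entropy formulation, stated here as an assumption about what the code intends; the symbols $x^t$, $r^t$, $y^t$ (input, label, and predicted probability for sample $t$) and $\eta$ (for `step_size`) are notation introduced for this note.

```latex
y^t = \frac{1}{1 + \exp\left(-\sum_{j=0}^{d} w_j x_j^t\right)}, \qquad x_0^t = 1

E(w) = -\sum_{t} \left[ r^t \log y^t + (1 - r^t) \log (1 - y^t) \right]

\frac{\partial E}{\partial w_j} = -\sum_{t} (r^t - y^t)\, x_j^t

w_j \leftarrow w_j + \eta \sum_{t} (r^t - y^t)\, x_j^t
```

The quantity accumulated in `deriv_w[j]` corresponds to the sum in the last line, which is the negative of the gradient, so adding it to `w[j]` moves the weights downhill on $E$.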

In [None]:
def log_regression(x,r,d,step_size=0.1,iterations=10):
    x0 = np.repeat(1, len(x))
    # A column of ones is added to x to account for the bias term, w_0
    new_x = np.c_[x0, x]
    # print(new_x)
    w = []  # d + 1 weights: one bias term plus one weight per feature
    for j in range(0,d+1):
        # Weights are randomly initialized close to zero.
        w.append(np.random.uniform(-0.01, 0.01))
    # print(w)

    for _ in range(iterations):
        deriv_w = []
        for j in range(0, d+1):
            deriv_w.append(0)
        for i in range(0, len(new_x)):
            o = 0
            for j in range(0, d+1):
                o += w[j]*new_x[i][j]
            # Compute the logistic function
            y = 1/(1+math.exp(-o))
            for j in range(0, d+1):
                # Accumulate the negative gradient of the cross-entropy loss with respect to w_j
                deriv_w[j] += (r[i]-y) * new_x[i][j]
        for j in range(d+1):
            # Update the weights
            w[j] += step_size * deriv_w[j]
    # Return the final learned weights after training.
    return np.round(w, 3)
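
The nested loops above can also be expressed with matrix operations. The following is a minimal sketch under stated assumptions, not the notebook's implementation: the name `log_regression_vectorized`, the `seed` parameter, and the toy data are hypothetical, and it draws its initial weights from its own generator, so its results will not match `log_regression` digit for digit.

```python
import numpy as np

def log_regression_vectorized(x, r, step_size=0.1, iterations=10, seed=0):
    """Batch gradient descent for logistic regression using matrix operations."""
    rng = np.random.default_rng(seed)  # assumption: local generator instead of the global seed
    # Prepend a column of ones for the bias term w_0
    new_x = np.c_[np.ones(len(x)), x]
    # Small random initial weights, one per column of new_x
    w = rng.uniform(-0.01, 0.01, size=new_x.shape[1])
    for _ in range(iterations):
        # Sigmoid of the linear scores for all samples at once
        y = 1.0 / (1.0 + np.exp(-(new_x @ w)))
        # Sum of (r - y) * x over all samples, i.e. the negative gradient
        w += step_size * (new_x.T @ (r - y))
    return w

# Tiny separable example: small values are class 1, large values are class 0
x_demo = np.array([[0.0], [1.0], [3.0], [4.0]])
r_demo = np.array([1, 1, 0, 0])
w_demo = log_regression_vectorized(x_demo, r_demo, step_size=0.1, iterations=5000)
scores = np.c_[np.ones(len(x_demo)), x_demo] @ w_demo
print((scores >= 0).astype(int))
```

The number of weights is taken from the shape of the input, so no separate feature-count argument is needed in this variant.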

### Training the Logistic Regression model

In [None]:
# Train on a generated dataset with 4 features, using 5000 iterations and a learning rate (step_size) of 0.01.
# This call draws new random samples, so its test set differs from the one printed earlier.
n_features = 4
X_train, y_train, X_test, y_test = generate_binary_synthetic_data(nFeatures=n_features, nSamples=40, test_ratio=0.2)
w_array = log_regression(X_train, y_train, n_features, step_size=0.01,iterations=5000)
print(w_array)

[14.996 -4.524 -3.352  1.558 -3.278]


### Make Predictions
- The logistic regression model above learns a set of weights (`w_array`) that define a linear decision boundary. These weights can be used to predict class labels for new samples.

In [None]:
def predict(X, w):
    x0 = np.ones((X.shape[0], 1))  # Add bias term
    X_new = np.c_[x0, X]

    # Compute probability using sigmoid function
    probs = 1 / (1 + np.exp(-np.dot(X_new, w)))

    # Convert probabilities to binary labels (threshold at 0.5)
    predictions = (probs >= 0.5).astype(int)
    return predictions
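
To make the thresholding step concrete, here is a small sketch that applies the same computation by hand. The weights are copied from the printed training output and the two input rows from the printed test set earlier in the notebook; they serve only as sample inputs, and the names `X_demo`, `scores`, and `labels_from_scores` are introduced for illustration.

```python
import numpy as np

# Weights copied from the printed training output (bias first)
w = np.array([14.996, -4.524, -3.352, 1.558, -3.278])
# Two rows copied from the printed test set, used only as sample inputs
X_demo = np.array([
    [0.75, 1.902, 1.464, 1.198],
    [4.315, 3.115, 1.655, 0.32],
])

X_new = np.c_[np.ones((X_demo.shape[0], 1)), X_demo]
scores = X_new @ w                 # linear scores w^T x
probs = 1 / (1 + np.exp(-scores))  # sigmoid probabilities

# sigmoid(score) >= 0.5 exactly when score >= 0
labels_from_probs = (probs >= 0.5).astype(int)
labels_from_scores = (scores >= 0).astype(int)
print(scores, probs, labels_from_probs, labels_from_scores)
```

Because the sigmoid crosses 0.5 at a score of zero, the decision boundary is the hyperplane where the linear score equals zero, and thresholding the probability or the score gives the same labels.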

In [None]:
# Get predictions on test set
y_pred = predict(X_test, w_array)
print(np.c_[y_test, y_pred])

[[1 1]
 [1 1]
 [1 1]
 [1 1]
 [0 0]
 [0 0]
 [0 0]
 [0 0]]


In [None]:
# Compute accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.4f}")

Accuracy: 1.0000


The model achieved 100% accuracy on the test set: all 8 test samples were correctly classified.

In [None]:
# Print classification report
print("\nClassification Report:")
print(classification_report(y_test, y_pred))


Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00         4
           1       1.00      1.00      1.00         4

    accuracy                           1.00         8
   macro avg       1.00      1.00      1.00         8
weighted avg       1.00      1.00      1.00         8
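
The figures in the report can be reproduced from raw counts. The sketch below is a hand-rolled illustration, not how `classification_report` is implemented; the helper name `per_class_metrics` is hypothetical, and returning 0.0 when a denominator is zero is an assumption made for this example. The labels are copied from the printed comparison of `y_test` and `y_pred`.

```python
import numpy as np

# Labels and predictions copied from the printed comparison above
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0])
y_hat = np.array([1, 1, 1, 1, 0, 0, 0, 0])

def per_class_metrics(y_true, y_hat, cls):
    """Precision, recall, F1 and support for one class, from raw counts."""
    tp = np.sum((y_hat == cls) & (y_true == cls))  # predicted cls, actually cls
    fp = np.sum((y_hat == cls) & (y_true != cls))  # predicted cls, actually other
    fn = np.sum((y_hat != cls) & (y_true == cls))  # predicted other, actually cls
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1, int(tp + fn)

accuracy = np.mean(y_true == y_hat)
for cls in (0, 1):
    print(cls, per_class_metrics(y_true, y_hat, cls))
print("accuracy", accuracy)
```

With no false positives or false negatives in either class, precision, recall, and F1 all equal 1.0, and the support of 4 per class matches the report.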



- The model classified all 8 test samples correctly.
- On this synthetic dataset, the logistic regression algorithm learned a decision boundary that separates the two classes.