### Logistic Regression on Breast Cancer Data

This notebook uses the breast cancer dataset from scikit-learn to build a binary classifier with logistic regression, implemented from scratch with explicit Python loops over the training examples instead of fully vectorized NumPy code. As a first practical ML project, it covers data loading, basic feature standardization, a train/test split, hand-written cost, gradient, and prediction functions, gradient-descent training, and evaluation of train/test accuracy.
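For reference, here are the quantities involved, written for $m$ examples with features $\mathbf{x}^{(i)}$ and labels $y^{(i)} \in \{0, 1\}$. The gradient is exactly what `compute_gradient` accumulates. The cost function is not shown in this export, so the standard unregularized binary cross-entropy below is an assumption (it is the cost whose gradient matches that function).

$$
f_{\mathbf{w},b}(\mathbf{x}^{(i)}) = \sigma\left(\mathbf{w}\cdot\mathbf{x}^{(i)} + b\right), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}
$$

$$
J(\mathbf{w},b) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log f_{\mathbf{w},b}(\mathbf{x}^{(i)}) + \left(1 - y^{(i)}\right)\log\left(1 - f_{\mathbf{w},b}(\mathbf{x}^{(i)})\right)\right]
$$

$$
\frac{\partial J}{\partial w_j} = \frac{1}{m}\sum_{i=1}^{m}\left(f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}\right)x_j^{(i)}, \qquad
\frac{\partial J}{\partial b} = \frac{1}{m}\sum_{i=1}^{m}\left(f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}\right)
$$

$$
w_j \leftarrow w_j - \alpha\,\frac{\partial J}{\partial w_j}, \qquad b \leftarrow b - \alpha\,\frac{\partial J}{\partial b}
$$

In the code, `err` corresponds to $f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}$ and `alpha` to $\alpha$.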

### Load the dataset (scikit-learn)

In [12]:
import numpy as np
from sklearn.datasets import load_breast_cancer

In [13]:
cancer = load_breast_cancer()
x = cancer.data    # feature matrix: 569 samples x 30 real-valued features
y = cancer.target  # labels: 0 = malignant, 1 = benign
print("shape of x:", x.shape)
print("shape of y:", y.shape)


shape of x: (569, 30)
shape of y: (569,)


### Standardization of data

In [14]:
# simple standardization: (x - mean) / std
x_mean = np.mean(x, axis=0)
x_std  = np.std(x, axis=0)
x_norm = (x - x_mean) / x_std

print("x_norm mean (approx):", np.mean(x_norm, axis=0)[:5])
print("x_norm std (approx):", np.std(x_norm, axis=0)[:5])


x_norm mean (approx): [-3.16286735e-15 -6.53060890e-15 -7.07889127e-16 -8.79983452e-16
  6.13217737e-15]
x_norm std (approx): [1. 1. 1. 1. 1.]
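Because the mean and standard deviation above are computed from all rows and then discarded, they cannot be reused to scale held-out or new samples consistently. A minimal sketch that keeps the statistics around, assuming a fit-then-apply pattern; `fit_standardizer` and `apply_standardizer` are hypothetical helper names, and the zero-variance guard is an added safeguard that is likely unnecessary for this dataset.

```python
import numpy as np

def fit_standardizer(x):
    """Return per-feature mean and std computed from x (rows = samples)."""
    mean = np.mean(x, axis=0)
    std = np.std(x, axis=0)
    # guard against constant features: leave them unscaled instead of dividing by 0
    std = np.where(std == 0, 1.0, std)
    return mean, std

def apply_standardizer(x, mean, std):
    """Scale x with previously fitted statistics: (x - mean) / std."""
    return (x - mean) / std

# toy example: the second column is constant
x_demo = np.array([[1.0, 5.0], [3.0, 5.0], [5.0, 5.0]])
mean, std = fit_standardizer(x_demo)
z = apply_standardizer(x_demo, mean, std)
print(z)
```

In the notebook, the statistics would be fitted once (ideally on training rows only) and the same `mean, std` pair applied to every array that is scaled afterwards.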


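### Train/test split

The first 80% of the rows, in the dataset's original order (no shuffling), are used for training and the remaining 20% for testing.

In [16]: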
m = x_norm.shape[0]
train_size = int(0.8 * m)

x_train = x_norm[:train_size]
y_train = y[:train_size]

x_test = x_norm[train_size:]
y_test = y[train_size:]

print("Train shape:", x_train.shape, y_train.shape)
print("Test shape:", x_test.shape, y_test.shape)


Train shape: (455, 30) (455,)
Test shape: (114, 30) (114,)
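Slicing the first 80% of rows carries any ordering of the dataset into the split, and the scaling statistics were computed before splitting, so the test rows influenced them. A minimal sketch of an alternative, assuming a shuffled split with NumPy's `default_rng` (the seed and the helper name `shuffled_split` are arbitrary, illustrative choices), followed by standardization with training-set statistics only. Synthetic data of the same shape stands in for the raw features so the cell runs on its own.

```python
import numpy as np

def shuffled_split(x, y, train_frac=0.8, seed=0):
    """Shuffle row indices once, then cut them into train and test parts."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(x.shape[0])
    n_train = int(train_frac * x.shape[0])
    train_idx, test_idx = idx[:n_train], idx[n_train:]
    return x[train_idx], y[train_idx], x[test_idx], y[test_idx]

# synthetic stand-in for the raw (unscaled) features and labels
rng = np.random.default_rng(42)
x_raw = rng.normal(loc=10.0, scale=3.0, size=(569, 30))
y_raw = rng.integers(0, 2, size=569)

x_tr, y_tr, x_te, y_te = shuffled_split(x_raw, y_raw)

# fit the scaling on the training rows only, then reuse it for the test rows
mu = x_tr.mean(axis=0)
sigma = x_tr.std(axis=0)
x_tr_norm = (x_tr - mu) / sigma
x_te_norm = (x_te - mu) / sigma

print("Train shape:", x_tr_norm.shape, y_tr.shape)
print("Test shape:", x_te_norm.shape, y_te.shape)
```

With the real data, `x` (not `x_norm`) and `y` would be passed to `shuffled_split`. scikit-learn's `sklearn.model_selection.train_test_split(x, y, test_size=0.2, random_state=0, stratify=y)` is an off-the-shelf alternative that also keeps the class proportions equal in both parts.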


### Implementing Logistic Regression (Loop-Based)

In [28]:
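def sigmoid(z):
    # logistic function: maps any real z into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))


def compute_gradient(x, y, w, b):
    """Gradient of the logistic-regression cost, looping over examples and features."""
    m, n = x.shape
    dj_dw = np.zeros(n)   # dJ/dw, one entry per feature (always float)
    dj_db = 0.0           # dJ/db
    for i in range(m):
        z_wb = np.dot(x[i], w) + b
        f_wb = sigmoid(z_wb)   # predicted probability for example i
        err = f_wb - y[i]
        dj_db += err
        for j in range(n):
            dj_dw[j] += err * x[i, j]
    dj_db /= m
    dj_dw /= m
    return dj_db, dj_dw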
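The training and evaluation cells call `compute_cost` and `predict`, but the cells defining them are not part of this export. A minimal loop-based sketch, assuming the unregularized binary cross-entropy cost (the cost whose gradient `compute_gradient` computes) and a 0.5 decision threshold; the `eps` clipping is an added assumption that keeps `log` finite. `sigmoid` is repeated so the cell runs on its own.

```python
import numpy as np

def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def compute_cost(x, y, w, b, eps=1e-15):
    """Average binary cross-entropy over the m examples, computed with a loop."""
    m = x.shape[0]
    cost = 0.0
    for i in range(m):
        f_wb = sigmoid(np.dot(x[i], w) + b)
        # clip so that log(0) can never occur
        f_wb = min(max(f_wb, eps), 1.0 - eps)
        cost += -y[i] * np.log(f_wb) - (1 - y[i]) * np.log(1.0 - f_wb)
    return cost / m

def predict(x, w, b):
    """Predict class 1 when sigmoid(w . x + b) >= 0.5, else class 0."""
    m = x.shape[0]
    p = np.zeros(m, dtype=int)
    for i in range(m):
        f_wb = sigmoid(np.dot(x[i], w) + b)
        p[i] = 1 if f_wb >= 0.5 else 0
    return p

# tiny check: with w = 0 and b = 0 every probability is 0.5, so the cost is log(2)
x_demo = np.array([[1.0, 2.0], [-1.0, 0.5], [0.0, -3.0]])
y_demo = np.array([1, 0, 1])
print(compute_cost(x_demo, y_demo, np.zeros(2), 0.0))  # ≈ 0.6931
print(predict(x_demo, np.array([1.0, 0.0]), 0.0))
```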


In [31]:
w, b = np.zeros(x_train.shape[1]), 0.0  # initial parameters (reset again in the training cell)
dj_db, dj_dw = compute_gradient(x_train, y_train, w, b)
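A single gradient evaluation like this is a natural place to check hand-written derivatives against a finite-difference estimate. A minimal sketch, assuming central differences with step `eps = 1e-6`; `numerical_gradient` is a hypothetical helper that accepts any cost function of the form `cost_fn(x, y, w, b)`. The demo pairs it with a small vectorized cross-entropy cost and gradient on random data so the cell runs on its own.

```python
import numpy as np

def numerical_gradient(cost_fn, x, y, w, b, eps=1e-6):
    """Central-difference estimate of (dJ/db, dJ/dw), in compute_gradient's return order."""
    w = np.asarray(w, dtype=float)
    dj_dw = np.zeros_like(w)
    for j in range(w.size):
        w_plus, w_minus = w.copy(), w.copy()
        w_plus[j] += eps
        w_minus[j] -= eps
        dj_dw[j] = (cost_fn(x, y, w_plus, b) - cost_fn(x, y, w_minus, b)) / (2 * eps)
    dj_db = (cost_fn(x, y, w, b + eps) - cost_fn(x, y, w, b - eps)) / (2 * eps)
    return dj_db, dj_dw

# vectorized cross-entropy cost and its analytic gradient, used only for this demo
def demo_cost(x, y, w, b):
    f = 1.0 / (1.0 + np.exp(-(x @ w + b)))
    return np.mean(-y * np.log(f) - (1 - y) * np.log(1 - f))

def demo_gradient(x, y, w, b):
    f = 1.0 / (1.0 + np.exp(-(x @ w + b)))
    err = f - y
    return np.mean(err), x.T @ err / x.shape[0]

rng = np.random.default_rng(0)
x_demo = rng.normal(size=(20, 4))
y_demo = rng.integers(0, 2, size=20)
w_demo = rng.normal(size=4)
b_demo = 0.3

num_db, num_dw = numerical_gradient(demo_cost, x_demo, y_demo, w_demo, b_demo)
ana_db, ana_dw = demo_gradient(x_demo, y_demo, w_demo, b_demo)
print("max |difference|:", max(abs(num_db - ana_db), np.max(np.abs(num_dw - ana_dw))))
```

In the notebook, `numerical_gradient(compute_cost, x_train[:50], y_train[:50], w, b)` could be compared with `compute_gradient` on the same 50 rows. A maximum difference several orders of magnitude below the gradient entries suggests the loops are correct; the subset keeps the 62 extra cost evaluations (two per weight plus two for `b`) cheap.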


### Training with gradient descent

In [32]:
print(x_train.shape, y_train.shape)

# training loop
m, n = x_train.shape
w = np.zeros(n)
b = 0.0

alpha = 0.1
num_iters = 1000

for it in range(num_iters):
    dj_db, dj_dw = compute_gradient(x_train, y_train, w, b)
    w = w - alpha * dj_dw
    b = b - alpha * dj_db
    if it % 100 == 0:
        cost = compute_cost(x_train, y_train, w, b)
        print(f"Iter {it:4d}: cost {cost:.4f}")


(455, 30) (455,)
Iter    0: cost 0.5174
Iter  100: cost 0.0985
Iter  200: cost 0.0813
Iter  300: cost 0.0739
Iter  400: cost 0.0696
Iter  500: cost 0.0667
Iter  600: cost 0.0646
Iter  700: cost 0.0629
Iter  800: cost 0.0615
Iter  900: cost 0.0604
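The loop above runs the per-example loops on every iteration and records the cost only every 100 steps. For comparison, a minimal sketch of the same batch gradient descent using vectorized NumPy expressions instead of loops (a deliberate swap of technique), returning a cost value for every iteration. The name `gradient_descent` and the synthetic demo data are illustrative assumptions, and the cost is assumed to be the unregularized cross-entropy.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(x, y, w, b, alpha=0.1, num_iters=1000):
    """Batch gradient descent for logistic regression; returns w, b and the cost per iteration."""
    m = x.shape[0]
    history = []
    for _ in range(num_iters):
        err = sigmoid(x @ w + b) - y       # shape (m,)
        w = w - alpha * (x.T @ err) / m    # same quantity as dj_dw from compute_gradient
        b = b - alpha * np.mean(err)       # same quantity as dj_db
        f = np.clip(sigmoid(x @ w + b), 1e-15, 1 - 1e-15)
        history.append(np.mean(-y * np.log(f) - (1 - y) * np.log(1 - f)))
    return w, b, history

# synthetic, linearly separable data standing in for x_train / y_train
rng = np.random.default_rng(1)
x_demo = rng.normal(size=(200, 3))
y_demo = (x_demo @ np.array([2.0, -1.0, 0.5]) > 0).astype(float)

w_fit, b_fit, history = gradient_descent(x_demo, y_demo, np.zeros(3), 0.0)
print(f"cost: first {history[0]:.4f}, last {history[-1]:.4f}")
```

With the notebook's data this would be called as `gradient_descent(x_train, y_train, np.zeros(x_train.shape[1]), 0.0)`; like the original loop, each recorded cost is taken after that iteration's update, so `history[::100]` corresponds to the iterations printed above, assuming `compute_cost` is the unregularized cross-entropy.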


### Evaluating Accuracy

In [33]:
p_train = predict(x_train, w, b)
train_acc = np.mean(p_train == y_train) * 100
print(f"Train accuracy: {train_acc:.2f}%")


Train accuracy: 98.24%
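Only the training accuracy is computed here; the held-out rows in `x_test`, `y_test` give a less optimistic estimate. A minimal sketch of an evaluation helper, assuming label 1 is treated as the positive class (in scikit-learn's encoding 1 = benign); `evaluate` is a hypothetical name, and it reports confusion-matrix counts alongside accuracy because accuracy alone does not show which class the errors fall in.

```python
import numpy as np

def evaluate(y_true, y_pred):
    """Accuracy (%) plus confusion-matrix counts, treating label 1 as the positive class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    tn = int(np.sum((y_pred == 0) & (y_true == 0)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    acc = 100.0 * (tp + tn) / len(y_true)
    return {"accuracy_%": acc, "tp": tp, "tn": tn, "fp": fp, "fn": fn}

# small hand-made example
print(evaluate([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]))
```

For the notebook: `p_test = predict(x_test, w, b)` followed by `evaluate(y_test, p_test)`.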


### The End