## Logistic Regression Model

Logistic regression keeps the **linear score**:

$$
z = w \cdot x + b
$$

but passes it through the sigmoid function:

$$
\sigma(z) = \frac{1}{1 + e^{-z}}
$$

The final model is:

$$
f_{w,b}(x) = \sigma(w \cdot x + b)
$$

The sigmoid maps any real-valued score $z$ into the interval $(0, 1)$,
so the output can be interpreted as the probability that $y = 1$.

In [147]:
import numpy as np

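#### Why the Sigmoid Function?

The sigmoid is smooth and strictly increasing, with $\sigma(0) = 0.5$, $\sigma(z) \to 1$ as $z \to \infty$, and $\sigma(z) \to 0$ as $z \to -\infty$.

Every real-valued score therefore lands in $(0, 1)$, and a larger score always means a higher predicted probability.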

In [148]:
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))
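
The definition above can emit overflow warnings from `np.exp(-z)` when `z` is a large negative number. A minimal sketch of one common workaround follows, assuming array inputs; the name `sigmoid_stable` is hypothetical and not part of the notebook.

```python
import numpy as np

def sigmoid_stable(z):
    """Sigmoid that keeps every exponent non-positive (hypothetical helper)."""
    z = np.asarray(z, dtype=float)
    out = np.empty_like(z)
    pos = z >= 0
    # For z >= 0, exp(-z) <= 1, so the standard form is safe.
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))
    # For z < 0, use the equivalent form exp(z) / (1 + exp(z)).
    ez = np.exp(z[~pos])
    out[~pos] = ez / (1.0 + ez)
    return out

# In floating point, extreme scores saturate to exactly 0.0 or 1.0.
vals = sigmoid_stable(np.array([-1000.0, 0.0, 1000.0]))
print(vals)
```

Both forms are algebraically identical, so this is a drop-in replacement only in the numerical sense; the model itself is unchanged.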

In [149]:
def zscore_normalize_features(X):
    mu = np.mean(X, axis=0)
    sigma = np.std(X, axis=0)
    
    sigma[sigma == 0] = 1.0
    
    X_norm = (X - mu) / sigma
    return X_norm
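
If the same scaling later needs to be applied to new data, the mean and standard deviation have to be kept. The sketch below is one possible variant under that assumption; `zscore_fit`, `zscore_apply`, and the toy matrix `X_demo` are hypothetical names used only for illustration.

```python
import numpy as np

def zscore_fit(X):
    """Compute per-feature mean and standard deviation."""
    mu = np.mean(X, axis=0)
    sigma = np.std(X, axis=0)
    sigma[sigma == 0] = 1.0  # constant features: avoid division by zero
    return mu, sigma

def zscore_apply(X, mu, sigma):
    """Scale X with previously computed statistics."""
    return (X - mu) / sigma

# Toy data: the third column is constant
X_demo = np.array([[1.0, 10.0, 5.0],
                   [2.0, 20.0, 5.0],
                   [3.0, 30.0, 5.0]])
mu, sigma = zscore_fit(X_demo)
X_demo_norm = zscore_apply(X_demo, mu, sigma)
print(X_demo_norm.mean(axis=0), X_demo_norm.std(axis=0))
```

After scaling, each non-constant column has mean 0 and standard deviation 1, while the constant column becomes all zeros.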

#### Decision Boundary

Predictions are made using the rule:

$$
\hat{y} =
\begin{cases}
1 & \text{if } f_{w,b}(x) \ge 0.5 \\
0 & \text{otherwise}
\end{cases}
$$

Since:

$$
\sigma(z) \ge 0.5 \iff z \ge 0
$$

the decision boundary is defined by:

$$
w \cdot x + b = 0
$$
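
The equivalence can be checked numerically. The sketch below uses made-up parameters (`w_demo`, `b_demo`) and points (`X_demo`) that are not from the notebook, and compares thresholding the probability at 0.5 with thresholding the score at 0.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical 2-feature parameters, chosen only for illustration
w_demo = np.array([1.0, -2.0])
b_demo = 0.5
X_demo = np.array([[0.0, 0.0],
                   [1.0, 1.0],
                   [3.0, 1.0],
                   [-0.5, 0.0],   # lies exactly on the boundary (z = 0)
                   [0.0, 2.0]])

z = X_demo @ w_demo + b_demo
pred_from_prob = (sigmoid(z) >= 0.5).astype(int)   # threshold the probability
pred_from_score = (z >= 0).astype(int)             # threshold the linear score
print(pred_from_prob, pred_from_score)
```

Because both rules agree, the sigmoid can be skipped entirely when only the class label is needed.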

#### Cross-Entropy Loss

Logistic regression models the target as a Bernoulli random variable with $P(y = 1 \mid x) = \hat{y} = f_{w,b}(x)$.

The cost function is the average negative log-likelihood under this model:

$$
J(w,b) =
-\frac{1}{m}
\sum_{i=1}^{m}
\left[
y^{(i)} \log(\hat{y}^{(i)}) +
(1 - y^{(i)}) \log(1 - \hat{y}^{(i)})
\right]
$$

This function measures how badly the model's predicted probabilities disagree with the actual outcomes, heavily penalizing confident wrong predictions.
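
To see the penalty grow, the per-example term can be evaluated for a positive example ($y = 1$) at a few predicted probabilities. This is a small illustrative sketch; `example_loss` is a hypothetical helper, not a function from the notebook.

```python
import numpy as np

def example_loss(y, y_hat):
    """Cross-entropy loss for a single example."""
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

y_true = 1.0
for p in (0.9, 0.5, 0.1, 0.01):
    print(f"y=1, y_hat={p}: loss = {example_loss(y_true, p):.4f}")
# y=1, y_hat=0.9: loss = 0.1054
# y=1, y_hat=0.5: loss = 0.6931
# y=1, y_hat=0.1: loss = 2.3026
# y=1, y_hat=0.01: loss = 4.6052
```

The loss grows without bound as the predicted probability of the true class approaches 0.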

In [160]:
def compute_cost(X, y, w, b):
    """Average cross-entropy cost J(w, b) over the m examples in X."""
    # Predicted probabilities for all examples at once
    y_pred = sigmoid(X @ w + b)

    # Clip to avoid log(0)
    y_pred = np.clip(y_pred, 1e-15, 1 - 1e-15)

    return np.mean(-y * np.log(y_pred) - (1 - y) * np.log(1 - y_pred))


#### Gradients

The gradients of the loss function are:

$$
\frac{\partial J}{\partial w}
=
\frac{1}{m} X^T (\hat{y} - y)
$$

$$
\frac{\partial J}{\partial b}
=
\frac{1}{m} \sum_{i=1}^{m} (\hat{y}^{(i)} - y^{(i)})
$$

These expressions have the same form as the linear regression gradients.
The only difference is that $\hat{y} = \sigma(w \cdot x + b)$ is a probability rather than a raw linear prediction.
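
One way to gain confidence in these formulas is a finite-difference check. The sketch below is self-contained and uses small random data; the helper names (`cost`, `analytic_gradient`, `numeric_gradient`) and the `_chk` variables are assumptions made for this illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(X, y, w, b):
    y_hat = np.clip(sigmoid(X @ w + b), 1e-15, 1 - 1e-15)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def analytic_gradient(X, y, w, b):
    """Gradients from the closed-form expressions."""
    error = sigmoid(X @ w + b) - y
    return X.T @ error / X.shape[0], np.mean(error)

def numeric_gradient(X, y, w, b, eps=1e-6):
    """Central finite differences of the cost."""
    dw = np.zeros_like(w)
    for j in range(w.size):
        step = np.zeros_like(w)
        step[j] = eps
        dw[j] = (cost(X, y, w + step, b) - cost(X, y, w - step, b)) / (2 * eps)
    db = (cost(X, y, w, b + eps) - cost(X, y, w, b - eps)) / (2 * eps)
    return dw, db

rng = np.random.default_rng(0)
X_chk = rng.normal(size=(20, 3))
y_chk = (rng.random(20) > 0.5).astype(float)
w_chk = rng.normal(size=3)
b_chk = 0.1

dw_a, db_a = analytic_gradient(X_chk, y_chk, w_chk, b_chk)
dw_n, db_n = numeric_gradient(X_chk, y_chk, w_chk, b_chk)
print("max |dw difference|:", np.max(np.abs(dw_a - dw_n)))
print("|db difference|:", abs(db_a - db_n))
```

The two estimates should agree to several decimal places; a large gap usually points to a sign or scaling error in the analytic gradient.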


In [161]:
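def compute_gradient(X, y, w, b):
    """Gradients of the cross-entropy cost with respect to b and w."""
    m = X.shape[0]
    y_hat = sigmoid(X @ w + b)       # predicted probabilities, shape (m,)
    error = y_hat - y                # per-example error
    dj_dw = (1/m) * (X.T @ error)    # shape (n,)
    dj_db = (1/m) * np.sum(error)    # scalar
    return dj_db, dj_dw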

In [152]:
def gradient_descent(X, y, w_in, b_in, compute_gradient, alpha, num_iters):
    """Run batch gradient descent for num_iters steps with learning rate alpha."""
    w, b = w_in, b_in
    for _ in range(num_iters):
        dj_db, dj_dw = compute_gradient(X, y, w, b)

        # Step against the gradient
        w = w - alpha * dj_dw
        b = b - alpha * dj_db

    return w, b
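
To check that training is converging, it can help to record the cost at every iteration. The following is a self-contained sketch on synthetic data, not the notebook's own loop; `gradient_descent_with_history` and the `_toy` variables are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent_with_history(X, y, w, b, alpha, num_iters):
    """Batch gradient descent that also returns the cost at each iteration."""
    m = X.shape[0]
    history = []
    for _ in range(num_iters):
        y_hat = sigmoid(X @ w + b)
        error = y_hat - y
        # Cost at the current parameters, recorded before the update
        y_clip = np.clip(y_hat, 1e-15, 1 - 1e-15)
        history.append(-np.mean(y * np.log(y_clip) + (1 - y) * np.log(1 - y_clip)))
        w = w - alpha * (X.T @ error) / m
        b = b - alpha * np.mean(error)
    return w, b, history

# Synthetic, linearly separable toy problem
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(200, 2))
y_toy = (X_toy[:, 0] + X_toy[:, 1] > 0).astype(float)

w_toy, b_toy, hist = gradient_descent_with_history(
    X_toy, y_toy, np.zeros(2), 0.0, alpha=0.1, num_iters=200)
print(f"cost: {hist[0]:.4f} -> {hist[-1]:.4f}")
```

With zero-initialized parameters the first recorded cost is $\log 2 \approx 0.6931$, since every prediction starts at 0.5. A cost that rises or oscillates usually means `alpha` is too large.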

In [153]:
def predict(X, w, b):
    """Return 0/1 predictions using the 0.5 probability threshold."""
    f_wb = sigmoid(X @ w + b)
    return (f_wb >= 0.5).astype(float)

In [154]:
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
X = data.data
y = data.target.astype(float)  

print("X shape:", X.shape, "y shape:", y.shape, "positive rate:", y.mean())

X_train = zscore_normalize_features(X)


X shape: (569, 30) y shape: (569,) positive rate: 0.6274165202108963


In [155]:
n = X_train.shape[1]
w0 = np.zeros(n, dtype=float)
b0 = 0.0

w_fit, b_fit = gradient_descent(
    X_train, y, w0, b0, compute_gradient, alpha=0.1, num_iters=2000)

In [156]:
yhat_train = predict(X_train, w_fit, b_fit)

print("Train accuracy:", (yhat_train == y).mean())


Train accuracy: 0.9876977152899824


In [157]:
from sklearn.metrics import confusion_matrix, classification_report

cm = confusion_matrix(y, yhat_train)
print("Confusion matrix:\n", cm)
print("\nClassification report:\n", classification_report(y, yhat_train, digits=4))

Confusion matrix:
 [[207   5]
 [  2 355]]

Classification report:
               precision    recall  f1-score   support

         0.0     0.9904    0.9764    0.9834       212
         1.0     0.9861    0.9944    0.9902       357

    accuracy                         0.9877       569
   macro avg     0.9883    0.9854    0.9868       569
weighted avg     0.9877    0.9877    0.9877       569



Precision: fraction of samples predicted as a class that truly belong to it.  
Recall: fraction of samples truly in a class that the model identifies.  
F1-score: harmonic mean of precision and recall.  
Support: number of true samples per class.
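
These definitions can be checked by hand against the confusion matrix printed above. The sketch below recomputes the class-1 metrics; `cm_demo` is a hypothetical name holding a copy of those four counts, with rows as true classes and columns as predicted classes.

```python
import numpy as np

# Counts copied from the confusion matrix output above
cm_demo = np.array([[207, 5],
                    [2, 355]])
tn, fp, fn, tp = cm_demo.ravel()

precision = tp / (tp + fp)   # of all predicted positives, how many are correct
recall = tp / (tp + fn)      # of all actual positives, how many are found
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / cm_demo.sum()

print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f} accuracy={accuracy:.4f}")
# precision=0.9861 recall=0.9944 f1=0.9902 accuracy=0.9877
```

The values match the class `1.0` row and the accuracy row of the classification report.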

In [158]:
from sklearn.linear_model import LogisticRegression

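# Note: LogisticRegression applies L2 regularization by default (C=1.0),
# so it fits a slightly different objective than the unregularized cost above.
sk = LogisticRegression(max_iter=2000)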
sk.fit(X_train, y)

print("sklearn Train accuracy:", sk.score(X_train, y))

sklearn Train accuracy: 0.9876977152899824
