3) Implement a program to train a binary logistic regression model using mini-batch SGD. Use the logistic
regression model we derived in class, corresponding to Equation (4.90) from the textbook, and where
the feature transformation φ is the identity function.
The program should include the following hyperparameters:


*   Batch size
*   Fixed learning rate
*   Maximum number of iterations
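
For reference, the quantities used by the implementation can be written out as follows. This is a sketch that assumes Equation (4.90) is the cross-entropy error function for logistic regression; the batch symbol $\mathcal{B}$ and the learning rate symbol $\eta$ are notation introduced here, not taken from the problem statement. With the identity feature map, $\boldsymbol{\phi}_n = \mathbf{x}_n$.

```latex
\begin{align}
y_n &= \sigma(\mathbf{w}^\top \boldsymbol{\phi}_n), \qquad
\sigma(a) = \frac{1}{1 + e^{-a}}, \qquad
\boldsymbol{\phi}_n = \mathbf{x}_n \\
E(\mathbf{w}) &= -\sum_{n=1}^{N} \left\{ t_n \ln y_n + (1 - t_n) \ln (1 - y_n) \right\} \\
\nabla E(\mathbf{w}) &= \sum_{n=1}^{N} (y_n - t_n)\, \boldsymbol{\phi}_n \\
\mathbf{w} &\leftarrow \mathbf{w} - \frac{\eta}{|\mathcal{B}|} \sum_{n \in \mathcal{B}} (y_n - t_n)\, \mathbf{x}_n
\end{align}
```

The last line is the mini-batch update: the gradient is summed over the samples in one batch and divided by the batch size, so the step size does not depend on how many samples the batch holds.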

In [1]:
import numpy as np

def sigmoid(z):
    """
    Compute the sigmoid function for input z.

    Parameters:
    z (array-like): Input array or scalar.

    Returns:
    array-like: The sigmoid function applied to each element of z.
    """

    z = np.clip(z, -500, 500)  # Clip to avoid overflow in exp
    return 1 / (1 + np.exp(-z))

def compute_gradients(X, y, t):
    """
    Compute the gradients for the mini-batch SGD.

    Parameters:
    X (array-like): Input feature matrix with shape (n_samples, n_features).
    y (array-like): Predicted probabilities for each sample.
    t (array-like): True binary labels for each sample.

    Returns:
    array-like: The gradient of the loss with respect to weights.
    """

    m = X.shape[0]
    error = y - t  # Difference between predicted probabilities and true labels
    gradients = np.dot(X.T, error) / m
    return gradients

def train_logistic_regression(X, t, learning_rate=0.01, batch_size=30, max_iters=1000):
    """
    Train a logistic regression model using mini-batch stochastic gradient descent.

    Parameters:
    X (array-like): Input feature matrix with shape (n_samples, n_features).
    t (array-like): Target binary labels (0 or 1).
    learning_rate (float): Learning rate for weight updates.
    batch_size (int): Number of samples per mini-batch.
    max_iters (int): Maximum number of training iterations (each iteration is one full pass over the shuffled data).

    Returns:
    array-like: Trained weight vector.
    """

    n_samples, n_features = X.shape
    w = np.random.normal(loc=0.0, scale=1.0, size=n_features)  # Initialize random weights from Gaussian distribution

    for i in range(max_iters):
        # Reshuffle the data at the start of each iteration (one full pass over the data)
        indices = np.random.permutation(n_samples)
        X_shuffled = X[indices]
        t_shuffled = t[indices]

        # Mini-batch SGD
        for start in range(0, n_samples, batch_size):
            end = min(start + batch_size, n_samples)
            X_batch = X_shuffled[start:end]
            t_batch = t_shuffled[start:end]

            # Compute the predicted probabilities
            y_batch = sigmoid(np.dot(X_batch, w))

            # Compute gradients and update weights
            gradients = compute_gradients(X_batch, y_batch, t_batch)
            w -= learning_rate * gradients

    return w

def predict_probability(X, w):
    """
    Predict probabilities for each sample in X using trained weights.

    Parameters:
    X (array-like): Input feature matrix with shape (n_samples, n_features).
    w (array-like): Trained weight vector.

    Returns:
    array-like: Predicted probabilities for each sample.
    """
    return sigmoid(np.dot(X, w))

def predict(X, w, threshold=0.5):
    """
    Predict binary class labels based on a probability threshold.

    Parameters:
    X (array-like): Input feature matrix.
    w (array-like): Trained weight vector.
    threshold (float): Threshold to classify as 1 if probability >= threshold.

    Returns:
    array-like: Predicted binary labels (0 or 1) for each sample.
    """

    return (predict_probability(X, w) >= threshold).astype(int)

# Testing the function
if __name__ == "__main__":
    # Generate some random binary classification data for testing
    np.random.seed(42)
    X = np.random.randn(1000, 3)
    true_w = np.array([1.5, -2.0, 1.0])
    t = (np.dot(X, true_w) + np.random.randn(1000) > 0).astype(int)

    # Hyperparameters
    learning_rate = 0.01
    batch_size = 32
    max_iters = 1000

    # Train model
    trained_w = train_logistic_regression(X, t, learning_rate, batch_size, max_iters)

    # Predict class probabilities and labels for new data
    X_new = np.random.randn(5, 3)
    probabilities = predict_probability(X_new, trained_w)
    predictions = predict(X_new, trained_w)

    print("Predicted probabilities:", probabilities)
    print("Predicted class labels:", predictions)


Predicted probabilities: [0.01470187 0.43718152 0.80765623 0.43988754 0.70155581]
Predicted class labels: [0 0 1 0 1]
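
One way to gain confidence in the gradient formula is to compare it against a finite-difference estimate. The sketch below repeats `sigmoid` and `compute_gradients` so that it runs on its own; `mean_cross_entropy` and `numerical_gradient` are hypothetical helpers introduced for this check and are not part of the cell above. It assumes the loss being minimized is the mean cross-entropy over the batch.

```python
import numpy as np

def sigmoid(z):
    z = np.clip(z, -500, 500)
    return 1 / (1 + np.exp(-z))

def compute_gradients(X, y, t):
    # Same formula as in the training code: X^T (y - t) / m
    return np.dot(X.T, y - t) / X.shape[0]

def mean_cross_entropy(X, t, w, eps=1e-12):
    """Mean cross-entropy loss (hypothetical helper for the check)."""
    y = np.clip(sigmoid(np.dot(X, w)), eps, 1 - eps)
    return -np.mean(t * np.log(y) + (1 - t) * np.log(1 - y))

def numerical_gradient(X, t, w, h=1e-6):
    """Central finite differences, perturbing one weight at a time."""
    grad = np.zeros_like(w)
    for j in range(w.size):
        step = np.zeros_like(w)
        step[j] = h
        loss_plus = mean_cross_entropy(X, t, w + step)
        loss_minus = mean_cross_entropy(X, t, w - step)
        grad[j] = (loss_plus - loss_minus) / (2 * h)
    return grad

# Small random problem used only for the check
rng = np.random.default_rng(0)
X_check = rng.normal(size=(50, 3))
t_check = rng.integers(0, 2, size=50)
w_check = rng.normal(size=3)

analytic = compute_gradients(X_check, sigmoid(np.dot(X_check, w_check)), t_check)
numeric = numerical_gradient(X_check, t_check, w_check)
max_abs_diff = np.max(np.abs(analytic - numeric))
print("Max absolute difference:", max_abs_diff)
```

If the two gradients agree to within a small tolerance, the analytic expression matches the loss it is supposed to differentiate.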


4. In this problem, you will run a logistic regression model for classification on a breast cancer dataset.

(a) Download the Wisconsin Breast Cancer dataset from the UCI Machine Learning Repository or
scikit-learn’s built-in datasets.

In [4]:
from sklearn.datasets import load_breast_cancer


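# Load the breast cancer dataset
data = load_breast_cancer()
X = data.data  # Features
y = data.target  # Target (0: malignant, 1: benign)

# Report the dimensions of the feature matrix and label vector
print(f"Features shape: {X.shape}")
print(f"Labels shape: {y.shape}")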

Features shape: (569, 30)
Labels shape: (569,)


(b) Split the dataset into train, validation, and test sets.

In [5]:
from sklearn.model_selection import train_test_split

# Split the dataset into 80% train+validation and 20% test
X_train_val, X_test, y_train_val, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Further split train+validation: test_size=0.125 of the remaining 80% gives 10% validation and 70% train overall
X_train, X_val, y_train, y_val = train_test_split(X_train_val, y_train_val, test_size=0.125, random_state=42)

print(f"Training set size: {X_train.shape}")
print(f"Validation set size: {X_val.shape}")
print(f"Test set size: {X_test.shape}")

Training set size: (398, 30)
Validation set size: (57, 30)
Test set size: (114, 30)
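
The split sizes above can be reproduced by hand. The sketch below assumes that the held-out count in each split is rounded up, which is consistent with the printed sizes; the variable names are introduced here for illustration.

```python
import math

n_samples = 569  # size of the full dataset, from part (a)

# First split: test_size=0.2 of the full dataset
n_test = math.ceil(0.2 * n_samples)
n_train_val = n_samples - n_test

# Second split: test_size=0.125 of the remaining train+validation samples
n_val = math.ceil(0.125 * n_train_val)
n_train = n_train_val - n_val

print("train / validation / test:", n_train, n_val, n_test)
print("fractions of the full dataset:",
      round(n_train / n_samples, 3),
      round(n_val / n_samples, 3),
      round(n_test / n_samples, 3))
```

Because 0.125 × 0.8 = 0.1, the second split yields roughly 70% train and 10% validation relative to the full dataset.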


(c) Report the size of each class in your training (+ validation) set.

In [6]:
# Combine train and validation sets for class size reporting
y_train_val_combined = np.concatenate([y_train, y_val])

# Report the size of each class
unique, counts = np.unique(y_train_val_combined, return_counts=True)
class_distribution = dict(zip(unique, counts))

print("Class distribution in training + validation set:")
print(f"Malignant (0): {class_distribution[0]}")
print(f"Benign (1): {class_distribution[1]}")

Class distribution in training + validation set:
Malignant (0): 169
Benign (1): 286
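
These counts also give a simple reference point for later results. The sketch below, which copies the counts from the output above, computes the class proportions and the accuracy that a classifier always predicting the majority class would reach on this subset; `baseline_accuracy` is a name introduced here.

```python
# Class counts copied from the output above (0: malignant, 1: benign)
counts = {0: 169, 1: 286}

total = sum(counts.values())
fractions = {label: n / total for label, n in counts.items()}

# A classifier that always predicts the most frequent class
majority_label = max(counts, key=counts.get)
baseline_accuracy = counts[majority_label] / total

print("Total samples:", total)
print("Class fractions:", {k: round(v, 4) for k, v in fractions.items()})
print("Majority-class baseline accuracy:", round(baseline_accuracy, 4))
```

The classes are moderately imbalanced (about 63% benign), which is one reason to report precision, recall, and F1-score alongside accuracy.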


(d) Train a binary logistic regression model using your implementation from problem 3. Initialize
the model weights randomly, sampling from a standard Gaussian distribution. Experiment with
different choices of fixed learning rate and batch size.

In [7]:
# Hyperparameters
learning_rate = 0.01
batch_size = 32
max_iters = 1000

# Train the model
trained_weights = train_logistic_regression(X_train, y_train, learning_rate, batch_size, max_iters)

trained_weights


array([ 10.26091871,  -9.76076151,  42.17274633,   6.7202822 ,
        -1.02326443,  -1.77819733,  -0.48800564,  -2.21783767,
         1.13994331,   1.35521215,  -0.36709308,   0.56698365,
        -0.78796832, -17.21830295,  -0.18170578,   0.66718687,
        -0.41033344,   1.15142781,  -0.59790463,   0.15251699,
        10.59269902, -21.31083455,  28.09641288, -12.03489545,
        -0.56938357,  -1.97846294,  -2.10656458,  -0.34556878,
        -0.76861342,  -0.44749083])
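
Several of the weights above are large in magnitude, which may reflect the very different scales of the raw features. Part (d) also asks for experiments with several learning rates and batch sizes. The sketch below illustrates one possible approach: standardize the features using training-set statistics only, then compare hyperparameter settings by validation accuracy. It runs on synthetic stand-in data so that it is self-contained; `standardize`, `fit`, and `accuracy` are hypothetical helpers, and the grid values are arbitrary choices rather than recommendations.

```python
import itertools
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))

def standardize(train, *others):
    """Scale every split with the training mean and standard deviation only."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std[std == 0] = 1.0  # guard against constant features
    return [(a - mean) / std for a in (train, *others)]

def fit(X, t, learning_rate, batch_size, max_iters, rng):
    """Compact mini-batch SGD trainer using the same update rule as problem 3."""
    w = rng.normal(size=X.shape[1])
    for _ in range(max_iters):
        order = rng.permutation(len(X))
        for start in range(0, len(X), batch_size):
            idx = order[start:start + batch_size]
            y = sigmoid(X[idx] @ w)
            grad = X[idx].T @ (y - t[idx]) / len(idx)
            w -= learning_rate * grad
    return w

def accuracy(X, t, w):
    return np.mean((sigmoid(X @ w) >= 0.5).astype(int) == t)

# Synthetic stand-in data with badly scaled features (for illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5)) * np.array([1.0, 10.0, 100.0, 0.1, 1000.0])
true_w = np.array([1.0, -0.2, 0.03, 5.0, -0.002])
t = (X @ true_w + rng.normal(size=400) > 0).astype(int)
X_tr, t_tr, X_va, t_va = X[:300], t[:300], X[300:], t[300:]

X_tr_s, X_va_s = standardize(X_tr, X_va)

# Evaluate every (learning rate, batch size) pair on the validation split
results = {}
for lr, bs in itertools.product([0.001, 0.01, 0.1], [16, 32, 64]):
    w = fit(X_tr_s, t_tr, lr, bs, max_iters=50, rng=np.random.default_rng(1))
    results[(lr, bs)] = accuracy(X_va_s, t_va, w)

best = max(results, key=results.get)
print("Best (learning_rate, batch_size):", best)
print("Validation accuracy:", results[best])
```

On the real dataset, the same loop could be run with `X_train`, `y_train`, `X_val`, and `y_val` in place of the synthetic arrays, keeping the test set untouched until a final setting has been chosen.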

(e) Use the trained model to report the performance of the model on the test set. For evaluation
metrics, use accuracy, precision, recall, and F1-score.

In [8]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Predict on the test set
y_test_pred = predict(X_test, trained_weights)

# Evaluate performance
accuracy = accuracy_score(y_test, y_test_pred)
precision = precision_score(y_test, y_test_pred)
recall = recall_score(y_test, y_test_pred)
f1 = f1_score(y_test, y_test_pred)

print(f"Accuracy: {accuracy}")
print(f"Precision: {precision}")
print(f"Recall: {recall}")
print(f"F1-score: {f1}")

Accuracy: 0.9473684210526315
Precision: 0.922077922077922
Recall: 1.0
F1-score: 0.9594594594594594
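
The four metrics can be traced back to confusion-matrix counts. The counts below are not printed by the notebook; they are a reconstruction inferred from the reported metrics and the test set size of 114, with benign (1) as the positive class. The sketch recomputes each metric from them and adds the recall for the malignant class, a quantity introduced here for illustration.

```python
# Reconstructed counts (inferred, not printed by the notebook); positive class = benign (1)
tp, fp, fn, tn = 71, 6, 0, 37

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

# Recall for the malignant class: share of malignant samples predicted as malignant
malignant_recall = tn / (tn + fp)

print(f"Accuracy: {accuracy:.4f}")
print(f"Precision: {precision:.4f}")
print(f"Recall: {recall:.4f}")
print(f"F1-score: {f1:.4f}")
print(f"Malignant-class recall: {malignant_recall:.4f}")
```

Under this reconstruction, all six errors are malignant samples predicted as benign, which the benign-as-positive metrics report only indirectly through precision.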


(f) Summarize your findings.

The model correctly classified about 94.7% of the test samples. Because the dataset encodes benign as 1, the positive class in these metrics is benign. Of all the samples the model predicted as benign, 92.2% were actually benign; the remaining 7.8% were malignant samples misclassified as benign, which is the more costly kind of error in a diagnostic setting. Recall is 1.0, so the model identified every benign sample correctly and produced no false negatives. The F1-score, the harmonic mean of precision and recall, is 95.9%, which indicates a good trade-off between the two measures.

