# Logistic Regression: Cost Function and Gradient Implementation

In this coding exercise, you will implement the cost function and the gradient function for logistic regression. The cost function measures the error between the predicted probabilities and the actual labels, while the gradient function computes the partial derivatives of the cost function with respect to the model parameters. You will test your code on a synthetic dataset.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

In [None]:
# Let's simulate some data
np.random.seed(0)

# Generate two clouds of points from normal distributions
n_samples = 1000

# Generate points for the first group
mean1 = [1, -1]
cov1 = [[1, 0], [0, 1]]
cloud1 = np.random.multivariate_normal(mean1, cov1, n_samples)

# Generate points for the second group
mean2 = [-1, 1]
cov2 = [[1, 0], [0, 1]]
cloud2 = np.random.multivariate_normal(mean2, cov2, n_samples)

# Combine the two groups to create the feature matrix X
X = np.vstack((cloud1, cloud2))

# Generate the target variable y
y = np.concatenate((np.zeros(n_samples), np.ones(n_samples)))

# Split the data into training and testing sets (80% train, 20% test)
test_size = 0.2
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size, random_state=0)

# Reshape the target variables. What is the -1 used for?
y_train = y_train.reshape((-1, 1))
y_test = y_test.reshape((-1, 1))

# Print the shapes of the training and testing sets
print("X_train shape:", X_train.shape)
print("y_train shape:", y_train.shape)
print()
print("X_test shape:", X_test.shape)
print("y_test shape:", y_test.shape)

In [None]:
X_train

In [None]:
# Visualize the dataset
plt.figure(figsize=(8, 4))
plt.scatter(X[:, 0], X[:, 1], c=y, cmap='viridis')
plt.title('Dataset Visualization')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.colorbar(ticks=[0, 1], label='Class')
plt.grid()

# TODO: Plot train data
plt.scatter(# TODO[:, 0], # TODO[:, 1], c=y_train, cmap='viridis', edgecolor='black', linewidth=1, marker='s', label='Train Data')

# TODO: Plot test data
plt.scatter(# TODO[:, 0], # TODO[:, 1], c=y_test, cmap='viridis', edgecolor='black', linewidth=1, marker='^', label='Test Data')

# Set legend
plt.legend()

# Show the plot
plt.show()

# Logistic Regression Formulation

Logistic regression is a statistical method for binary classification. Its output is the probability that a given input point belongs to the positive class (label 1).

## Hypothesis Function

The hypothesis function in logistic regression is defined as:

$$
h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}
$$

Where:
- $h_\theta(x)$ is the predicted probability that the output is 1.
- $x$ is the feature vector.
- $\theta$ is the parameter vector.
- $e$ is the base of the natural logarithm.

This is the sigmoid (logistic) function applied to the linear score $\theta^T x$. It maps any real value into the interval (0, 1), which makes its output suitable as a probability estimate.
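The mapping into (0, 1) can be checked numerically. The snippet below is a minimal sketch, not the reference solution for `predict`: the helper name `stable_sigmoid` and the test scores are assumptions made for illustration. It splits the computation by sign so that `np.exp` never receives a large positive argument, which is one common way to avoid overflow warnings.

```python
import numpy as np

def stable_sigmoid(z):
    """Sigmoid evaluated without overflow (hypothetical helper, not part of the exercise)."""
    z = np.asarray(z, dtype=float)
    out = np.empty_like(z)
    pos = z >= 0
    # For z >= 0, exp(-z) is at most 1, so it cannot overflow.
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))
    # For z < 0, use the equivalent form exp(z) / (1 + exp(z)).
    exp_z = np.exp(z[~pos])
    out[~pos] = exp_z / (1.0 + exp_z)
    return out

# Illustrative scores, including extreme values
scores = np.array([-1000.0, -2.0, 0.0, 2.0, 1000.0])
probs = stable_sigmoid(scores)
print(probs)
```

A score of 0 gives a probability of exactly 0.5, and the values for `-2` and `2` sum to 1, reflecting the symmetry $\sigma(-z) = 1 - \sigma(z)$.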

In [None]:
# TODO: Implement the predict function for logistic regression
def predict(X, theta):
    """
    Predict the target variable using the logistic regression model.

    Parameters:
    X (numpy.ndarray): Feature matrix of shape (n, p), where n is the number of samples and p is the number of features.
    theta (numpy.ndarray): Model parameters of shape (p, 1).

    Returns:
    probabilities (numpy.ndarray): Predicted probabilities of shape (n, 1).
    """

    # TODO: Define the probabilities of the logistic regression
    probabilities = # TODO

    return probabilities

# Mathematical Formulation for Logistic Regression

## Cost Function

The cost function in logistic regression, also known as the logistic loss, is defined as:

$$
J(\theta) = -\frac{1}{n} \sum_{i=1}^{n} [y^{(i)} \log(h_\theta(x^{(i)})) + (1 - y^{(i)}) \log(1 - h_\theta(x^{(i)}))]
$$

Where:
- $n$ is the number of training examples.
- $x^{(i)}$ represents the feature vector of the $i$-th training example.
- $y^{(i)}$ is the actual label of the $i$-th training example.
- $h_\theta(x)$ is the hypothesis function for logistic regression, defined as $h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$.
- $\theta$ represents the model parameters.
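To get a feel for the formula, it can help to evaluate it on a few hand-picked probabilities. The following is a minimal sketch under stated assumptions: the function name `log_loss_sketch`, the clipping constant `eps`, and the toy arrays are all made up for illustration, and the function takes probabilities directly rather than `X` and `theta`, so it is not a drop-in answer for `cost_function`.

```python
import numpy as np

def log_loss_sketch(probabilities, y, eps=1e-12):
    """Average logistic loss for given probabilities (illustrative helper)."""
    # Clipping is an assumed safeguard so that log(0) never occurs.
    p = np.clip(probabilities, eps, 1.0 - eps)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

# Toy labels and two sets of predicted probabilities, all of shape (4, 1)
y_toy = np.array([[1.0], [0.0], [1.0], [0.0]])
p_uninformed = np.full((4, 1), 0.5)
p_confident = np.array([[0.9], [0.1], [0.8], [0.2]])

cost_uninformed = log_loss_sketch(p_uninformed, y_toy)
cost_confident = log_loss_sketch(p_confident, y_toy)
print(cost_uninformed, cost_confident)
```

Predicting 0.5 everywhere gives a cost of $\log 2 \approx 0.693$. Since $\theta = 0$ yields $h_\theta(x) = 0.5$ for every input, this value is a handy sanity check for the first cost recorded when training starts from zeros. Confident and correct predictions give a lower cost.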

## Gradient Function

The gradient of the cost function for logistic regression is a vector where each element is the partial derivative of the cost function with respect to the corresponding parameter:

$$
\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{n} \sum_{i=1}^{n} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}
$$

For vectorized implementation:

$$
\nabla_\theta J(\theta) = \frac{1}{n} X^T (h_\theta(X) - \mathbf{y})
$$

Where:
- $X$ is the matrix of input features.
- $\mathbf{y}$ is the vector of actual labels.
- $\nabla_\theta J(\theta)$ is the gradient of the cost function with respect to $\theta$.
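One way to gain confidence in a gradient formula is to compare it against central finite differences of the cost. The sketch below does this on random toy data. All names (`sigmoid`, `cost`, `analytic_gradient`, `numerical_gradient`), the step size, and the data are assumptions for illustration only, not the notebook's reference implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(X, y, theta):
    # Logistic loss J(theta) averaged over the n rows of X
    h = sigmoid(X @ theta)
    return float(-np.mean(y * np.log(h) + (1 - y) * np.log(1 - h)))

def analytic_gradient(X, y, theta):
    # Vectorized formula: (1/n) X^T (h - y)
    n = X.shape[0]
    return X.T @ (sigmoid(X @ theta) - y) / n

def numerical_gradient(X, y, theta, step=1e-6):
    # Central differences, one parameter at a time (step is an assumed value)
    grad = np.zeros_like(theta)
    for j in range(theta.shape[0]):
        shift = np.zeros_like(theta)
        shift[j, 0] = step
        grad[j, 0] = (cost(X, y, theta + shift) - cost(X, y, theta - shift)) / (2 * step)
    return grad

# Small random problem: 50 samples, 3 features
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(50, 3))
y_toy = (rng.random((50, 1)) < 0.5).astype(float)
theta_toy = rng.normal(size=(3, 1))

grad_analytic = analytic_gradient(X_toy, y_toy, theta_toy)
grad_numeric = numerical_gradient(X_toy, y_toy, theta_toy)
max_abs_diff = np.max(np.abs(grad_analytic - grad_numeric))
print(max_abs_diff)
```

If the two gradients agree to many decimal places, the vectorized expression is very likely implemented correctly. The same check can be pointed at your own `cost_function` and `gradient_function`.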


In [None]:
# TODO: Implement the cost function for logistic regression
def cost_function(X, y, theta):
    """
    Compute the cost function for logistic regression.

    Parameters:
    X (numpy.ndarray): Feature matrix of shape (n, p), where n is the number of samples and p is the number of features.
    y (numpy.ndarray): Target values of shape (n, 1).
    theta (numpy.ndarray): Model parameters of shape (p, 1).

    Returns:
    cost (float): Cost value corresponding to the logistic loss.
    """

    n = len(y)

    # TODO: Calculate probabilities
    probabilities = # TODO

    # TODO: Compute the cost function (- log-likelihood)
    cost = # TODO

    return cost


# TODO: Implement the gradient function for logistic regression
def gradient_function(X, y, theta):
    """
    Compute the gradient of the cost function for logistic regression.

    Parameters:
    X (numpy.ndarray): Feature matrix of shape (n, p), where n is the number of samples and p is the number of features.
    y (numpy.ndarray): Target values of shape (n, 1).
    theta (numpy.ndarray): Model parameters of shape (p, 1).

    Returns:
    gradient (numpy.ndarray): Gradient vector of shape (p, 1).
    """

    n = len(y)

    # TODO: Calculate probabilities
    probabilities = # TODO

    # TODO: Compute the gradient of the cost function
    gradient = # TODO

    return gradient


# TODO: Implement the train function to learn the weights of the logistic regression model using gradient descent
def train_model(X_train, y_train, learning_rate, num_iterations):
    """
    Train the logistic regression model using gradient descent optimization.

    Parameters:
    X_train (numpy.ndarray): Feature matrix of shape (n, p) for training data.
    y_train (numpy.ndarray): Target values of shape (n, 1) for training data.
    learning_rate (float): Learning rate for gradient descent.
    num_iterations (int): Number of iterations for training.

    Returns:
    theta (numpy.ndarray): Model parameters of shape (p, 1).
    costs_train (list): List of training costs at each iteration.
    """

    n, p = X_train.shape
    theta = np.zeros((p, 1))
    costs_train = []

    # TODO: Implement the optimization part
    for _ in range(num_iterations):
        gradient = # TODO
        theta -= # TODO

        cost_train = # TODO
        costs_train.append(cost_train[0, 0])

    return theta, costs_train

In [None]:
# Fix the random seed for reproducibility
np.random.seed(0)

# TODO: Train the logistic regression model
learning_rate = # TODO
num_iterations = # TODO
theta_hat, costs_train = # TODO

# TODO: Make predictions on the training and test data
probability_threshold = 1/2

probabilities_train = # TODO
probabilities_test = # TODO

y_train_pred = (# TODO).astype(int)
y_test_pred = (# TODO).astype(int)

In [None]:
# TODO: Plot the training costs
plt.plot(# TODO)
plt.xlabel('Iteration')
plt.ylabel('Cost')
plt.title('Training Cost over Iterations')
plt.show()

In [None]:
# TODO: Define the precision function
def precision(y_true, y_pred):
    """
    Compute the precision score for binary classification.

    Parameters:
    y_true (numpy.ndarray): True target values of shape (n,) or (n, 1).
    y_pred (numpy.ndarray): Predicted target values with the same shape as y_true.

    Returns:
    precision (float): Precision score.
    """
    
    # TODO: Compute the number of true positives
    true_positives = # TODO

    # TODO: Compute the number of false positives
    false_positives = # TODO
    
    # TODO: Compute the precision
    precision = # TODO
    
    return precision

# TODO: Define the accuracy function
def accuracy(y_true, y_pred):
    """
    Compute the accuracy score for binary classification.

    Parameters:
    y_true (numpy.ndarray): True target values of shape (n,) or (n, 1).
    y_pred (numpy.ndarray): Predicted target values with the same shape as y_true.

    Returns:
    accuracy (float): Accuracy score.
    """
    
    # TODO: Compute the number of correct predictions
    correct_predictions = # TODO
    
    # TODO: Compute the number of total predictions
    total_predictions = # TODO
    
    # TODO: Compute the accuracy
    accuracy = # TODO
    
    return accuracy

In [None]:
# TODO: Compute precision for the training dataset
train_precision = # TODO

# TODO: Compute precision for the test dataset
test_precision = # TODO

# TODO: Compute accuracy for the training dataset
train_accuracy = # TODO

# TODO: Compute accuracy for the test dataset
test_accuracy = # TODO

# Print the results
print("Training Precision:", train_precision)
print("Training Accuracy:", train_accuracy)
print("Test Precision:", test_precision)
print("Test Accuracy:", test_accuracy)

# What are your conclusions?

In [None]:
def calculate_confusion_matrix(y_true, y_pred):
    """
    Calculate the confusion matrix for binary classification.

    Parameters:
    y_true (numpy.ndarray): True target values of shape (n,) or (n, 1).
    y_pred (numpy.ndarray): Predicted target values with the same shape as y_true.

    Returns:
    confusion_matrix (numpy.ndarray): Confusion matrix of shape (2, 2).
    """
    true_positive = np.sum(np.logical_and(y_true == 1, y_pred == 1))
    true_negative = np.sum(np.logical_and(y_true == 0, y_pred == 0))
    false_positive = np.sum(np.logical_and(y_true == 0, y_pred == 1))
    false_negative = np.sum(np.logical_and(y_true == 1, y_pred == 0))
    
    confusion_matrix = np.array([[true_negative, false_positive], [false_negative, true_positive]])
    return confusion_matrix


def plot_confusion_matrix(cm, classes, title):
    """
    Plot the confusion matrix.
    
    Parameters:
    cm (numpy.ndarray): Confusion matrix.
    classes (list): List of class labels.
    title (str): Title of the plot.
    """
    plt.imshow(cm, interpolation='nearest', cmap=plt.cm.Blues)
    plt.title(title)
    plt.colorbar(shrink=0.37)
    
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)
    
    thresh = cm.max() / 2
    
    for i in range(cm.shape[0]):
        for j in range(cm.shape[1]):
            plt.text(j, i, cm[i, j], ha="center", va="center", color="white" if cm[i, j] > thresh else "black")
    
    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label')


In [None]:
# Compute the confusion matrix for the training dataset
train_cm = calculate_confusion_matrix(y_train, y_train_pred)

# Compute the confusion matrix for the test dataset
test_cm = calculate_confusion_matrix(y_test, y_test_pred)

# Plot the confusion matrix for the training dataset
plt.subplot(1, 2, 1)
plot_confusion_matrix(train_cm, classes=['Class 0', 'Class 1'], title='Confusion Matrix - Training Dataset')

# Plot the confusion matrix for the test dataset
plt.subplot(1, 2, 2)
plot_confusion_matrix(test_cm, classes=['Class 0', 'Class 1'], title='Confusion Matrix - Test Dataset')

# Adjust the layout and display the plot
plt.tight_layout()
plt.show()

In [None]:
# Alternatively, use scikit-learn's implementation
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
# Compute the confusion matrix for the training dataset
train_cm = confusion_matrix(y_train, y_train_pred)

# Compute the confusion matrix for the test dataset
test_cm = confusion_matrix(y_test, y_test_pred)

# Plot the confusion matrix for the training dataset
print("Training Confusion Matrix")
ConfusionMatrixDisplay(train_cm, display_labels=['Class 0', 'Class 1']).plot()

# Plot the confusion matrix for the test dataset
print("Test Confusion Matrix")
ConfusionMatrixDisplay(test_cm, display_labels=['Class 0', 'Class 1']).plot()

# Logistic Regression: Breast Cancer Case Study

In this exercise, we will fit a logistic regression model using our own implementation and compare it with the one provided by scikit-learn. We will use the Breast Cancer Wisconsin dataset, split it into training and testing sets, and then train both models and evaluate their performance using precision and accuracy.
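Before comparing the two models, it may be useful to see how precision and accuracy differ on a tiny hand-made example, and how array shapes can silently distort the comparison. The sketch below uses made-up labels (`y_true_toy`, `y_pred_toy`) and is only an illustration, not the notebook's reference solution. The shape check matters here because the targets in this notebook are reshaped to `(n, 1)`, while scikit-learn's `predict` returns an array of shape `(n,)`.

```python
import numpy as np

# Made-up labels for illustration: 3 positives, 5 negatives
y_true_toy = np.array([1, 1, 1, 0, 0, 0, 0, 0])
y_pred_toy = np.array([1, 1, 0, 1, 0, 0, 0, 0])

# Precision: among the predicted positives, the fraction that are truly positive
tp = np.sum((y_true_toy == 1) & (y_pred_toy == 1))  # 2 true positives
fp = np.sum((y_true_toy == 0) & (y_pred_toy == 1))  # 1 false positive
precision_toy = tp / (tp + fp)                      # 2 / 3

# Accuracy: the fraction of all predictions that are correct
accuracy_toy = np.mean(y_true_toy == y_pred_toy)    # 6 / 8 = 0.75

print(precision_toy, accuracy_toy)

# Shape pitfall: comparing a column vector (n, 1) with a flat vector (n,)
# broadcasts to an (n, n) matrix instead of an element-wise comparison.
column = y_true_toy.reshape(-1, 1)
broadcast_shape = (column == y_pred_toy).shape          # (8, 8)
aligned_shape = (column.ravel() == y_pred_toy).shape    # (8,)
print(broadcast_shape, aligned_shape)
```

Flattening both arrays with `ravel()` (or reshaping both to `(n, 1)`) before computing the metrics is one simple way to keep the comparison element-wise.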

In [None]:
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Load the breast cancer dataset
data = load_breast_cancer()
X, y = data.data, data.target

# TODO: Split the data into training and testing sets
X_train, X_test, y_train, y_test = # TODO

# TODO: Scale the features using StandardScaler
scaler = # TODO
X_train_scaled = # TODO
X_test_scaled = # TODO

# Reshape the target variables
y_train = y_train.reshape((-1, 1))
y_test = y_test.reshape((-1, 1))

In [None]:
# What is the dataset about? Hint: print(data.DESCR)

In [None]:
# TODO: Train the logistic regression using custom implementation
learning_rate = # TODO
num_iterations = # TODO
theta_hat, costs_train = # TODO

# TODO: Create a logistic regression object
lr_sklearn = # TODO

# TODO: Train the model on the scaled training data

# TODO: Compute the predictions for both models
y_pred_custom = (# TODO).astype(int)
y_pred_sklearn = # TODO

# TODO: Compute the precision and accuracy for both models
precision_custom = # TODO
accuracy_custom = # TODO

precision_sklearn = # TODO
accuracy_sklearn = # TODO

# Print the results
print("Custom Logistic Regression:")
print(f"Precision: {precision_custom}")
print(f"Accuracy: {accuracy_custom}")

print("\nScikit-learn Logistic Regression:")
print(f"Precision: {precision_sklearn}")
print(f"Accuracy: {accuracy_sklearn}")