## 1. Understanding the Classification Task

### Classification Problem Formulation:
Classification is a supervised learning problem, where we are given a training dataset of the form:

$$D = \{(x_i, y_i) | x_i \in \mathbb{R}^d, y_i \in \{1, 2, ..., K\}\}_{i=1}^{n}$$

where:
- $x_i = [x_{i1}, x_{i2}, ..., x_{id}]$ is the feature vector for the $i^{th}$ sample.
- $y_i$ is the target class label for the $i^{th}$ sample, belonging to one of $K$ distinct classes.
- $d$ is the number of features in each input vector.
- $K$ is the number of classes in the classification problem.

The goal is to learn a hypothesis function that maps input features to discrete class labels:

$$f : \mathbb{R}^d \rightarrow \{1, 2, ..., K\}, f \in M(\text{Model Class})$$
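
To make the notation concrete, here is a minimal sketch of how such a dataset might be held in NumPy arrays. The values, the names `X_toy`, `y_toy`, and the hand-written rule `f_toy` are all illustrative assumptions, not part of the worksheet; a real hypothesis would be learned from data rather than written by hand.

```python
import numpy as np

# Toy dataset (made-up values): n = 4 samples, d = 2 features, K = 3 classes labelled 1..K
X_toy = np.array([[0.1, 1.2], [2.3, 0.4], [1.5, 1.5], [3.0, 0.2]])
y_toy = np.array([1, 2, 3, 2])

n, d = X_toy.shape           # number of samples, number of features
K = len(np.unique(y_toy))    # number of distinct classes

def f_toy(x):
    """Hypothetical hypothesis f: R^d -> {1, ..., K}, using only the first feature."""
    if x[0] < 1.0:
        return 1
    return 2 if x[0] >= 2.0 else 3

# Apply the hypothesis to every feature vector x_i
preds = np.array([f_toy(x) for x in X_toy])
print(n, d, K, preds)
```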

## 2. Necessary Imports

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

## 3. Implementation of Sigmoid Regression from Scratch

### 3.1 Building and Testing Helper Functions

#### Task 1: Implementing Sigmoid Function

The sigmoid function is given by:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

where: $x \in \mathbb{R}$

In [None]:
def logistic_function(x):
    """
    Computes the logistic function applied to any value of x.
    Arguments:
    x: scalar or numpy array of any size.
    Returns:
    y: logistic function applied to x.
    """
    # np is imported at the top of the notebook; np.exp works element-wise on arrays
    y = 1 / (1 + np.exp(-x))
    return y
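
One practical caveat with the direct formula is that `np.exp(-x)` can overflow for large negative `x` (NumPy then emits a runtime warning, although the result still rounds to 0). A minimal sketch of a numerically safer variant splits the computation by sign. The helper name `stable_sigmoid` is an assumption, not part of the original worksheet.

```python
import numpy as np

def stable_sigmoid(x):
    """Sketch of a sigmoid that never exponentiates a large positive number."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    out = np.empty_like(x)
    pos = x >= 0
    # For x >= 0, exp(-x) <= 1, so the usual formula is safe
    out[pos] = 1.0 / (1.0 + np.exp(-x[pos]))
    # For x < 0, use the equivalent form exp(x) / (1 + exp(x)), where exp(x) < 1
    exp_x = np.exp(x[~pos])
    out[~pos] = exp_x / (1.0 + exp_x)
    return out

print(stable_sigmoid([-1000.0, 0.0, 1000.0]))
```

Note that this sketch always returns an array (scalars come back with shape `(1,)`), which differs from `logistic_function`.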

#### Test Case for Logistic Function

In [None]:
import numpy as np
def test_logistic_function():
    """
    Test cases for the logistic_function.
    """
    # Test with scalar input
    x_scalar = 0
    expected_output_scalar = round(1 / (1 + np.exp(0)), 3) # Expected output: 0.5
    assert round(logistic_function(x_scalar), 3) == expected_output_scalar, "Test failed for scalar input"
    
    # Test with positive scalar input
    x_pos = 2
    expected_output_pos = round(1 / (1 + np.exp(-2)), 3) # Expected output: ~0.881
    assert round(logistic_function(x_pos), 3) == expected_output_pos, "Test failed for positive scalar input"
    
    # Test with negative scalar input
    x_neg = -3
    expected_output_neg = round(1 / (1 + np.exp(3)), 3) # Expected output: ~0.047
    assert round(logistic_function(x_neg), 3) == expected_output_neg, "Test failed for negative scalar input"
    
    # Test with numpy array input
    x_array = np.array([0, 2, -3])
    expected_output_array = np.array([0.5, 0.881, 0.047]) # Expected values rounded to 3 decimals
    # Compare with a tolerance rather than exact float equality after rounding
    assert np.allclose(logistic_function(x_array), expected_output_array, atol=5e-4), "Test failed for numpy array input"
    print("All tests passed!")

# Run the test case
test_logistic_function()

#### Task 2: Implementing Log Loss Function

For Sigmoid Regression and Binary Classification we use log-loss given by:

$$L(y, \hat{y}) = -y \log(\hat{y}) - (1 - y) \log(1 - \hat{y})$$

where:
- $y \in \{0, 1\}$: true target value
- $\hat{y} = P(y = 1|x)$: predicted probability that the sample belongs to class 1
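
Because $y$ is either 0 or 1, only one of the two terms is active for any given sample, so the same loss can be read case by case:

$$L(y, \hat{y}) = \begin{cases} -\log(\hat{y}) & \text{if } y = 1 \\ -\log(1 - \hat{y}) & \text{if } y = 0 \end{cases}$$

For instance, with $y = 1$ and the natural logarithm, a prediction of $\hat{y} = 0.9$ costs $-\log(0.9) \approx 0.105$, while $\hat{y} = 0.1$ costs $-\log(0.1) \approx 2.303$.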

In [None]:
def log_loss(y_true, y_pred):
    """
    Computes log loss for a true target value y in {0, 1} and a predicted probability y' in [0, 1].
    Arguments:
    y_true (scalar or ndarray): true target value(s), each 0 or 1.
    y_pred (scalar or ndarray): predicted probability (or probabilities) in [0, 1].
    Returns:
    loss (float): loss/error value
    """
    import numpy as np
    # Ensure y_pred is clipped to avoid log(0)
    y_pred = np.clip(y_pred, 1e-10, 1 - 1e-10)
    loss = -y_true * np.log(y_pred) - (1 - y_true) * np.log(1 - y_pred)
    return loss

#### Verifying the Intuition

The basic intuition behind the log-loss function is: **the loss should be small when the predicted probability is close to the true target value, and large when it is far from it.**

In [None]:
# Test function:
y_true, y_pred = 0, 0.1
print(f'log loss({y_true}, {y_pred}) ==> {log_loss(y_true, y_pred)}')
print("+++++++++++++--------------------------++++++++++++++++++++++++")
y_true, y_pred = 1, 0.9
print(f'log loss({y_true}, {y_pred}) ==> {log_loss(y_true, y_pred)}')

#### Test Case for Log Loss Function

In [None]:
def test_log_loss():
    """
    Test cases for the log_loss function.
    """
    import numpy as np
    # Test case 1: Perfect prediction (y_true = 1, y_pred = 1)
    y_true = 1
    y_pred = 1
    expected_loss = 0.0 # Log loss is 0 for perfect prediction
    assert np.isclose(log_loss(y_true, y_pred), expected_loss), "Test failed for perfect prediction (y_true=1, y_pred=1)"
    
    # Test case 2: Perfect prediction (y_true = 0, y_pred = 0)
    y_true = 0
    y_pred = 0
    expected_loss = 0.0 # Log loss is 0 for perfect prediction
    assert np.isclose(log_loss(y_true, y_pred), expected_loss), "Test failed for perfect prediction (y_true=0, y_pred=0)"
    
    # Test case 3: Completely wrong prediction (y_true = 1, y_pred = 0)
    # log_loss clips y_pred to [1e-10, 1 - 1e-10], so no error is raised;
    # the loss is large but finite: -log(1e-10) ~ 23.03
    y_true = 1
    y_pred = 0
    loss = log_loss(y_true, y_pred)
    assert np.isfinite(loss) and loss > 20, "Test failed for wrong prediction (y_true=1, y_pred=0)"
    
    # Test case 4: Completely wrong prediction (y_true = 0, y_pred = 1)
    y_true = 0
    y_pred = 1
    loss = log_loss(y_true, y_pred)
    assert np.isfinite(loss) and loss > 20, "Test failed for wrong prediction (y_true=0, y_pred=1)"
    
    # Test case 5: Partially correct prediction
    y_true = 1
    y_pred = 0.8
    expected_loss = -(1 * np.log(0.8)) - (0 * np.log(0.2)) # ~0.2231
    assert np.isclose(log_loss(y_true, y_pred), expected_loss, atol=1e-6), "Test failed for partially correct prediction (y_true=1, y_pred=0.8)"
    
    y_true = 0
    y_pred = 0.2
    expected_loss = -(0 * np.log(0.2)) - (1 * np.log(0.8)) # ~0.2231
    assert np.isclose(log_loss(y_true, y_pred), expected_loss, atol=1e-6), "Test failed for partially correct prediction (y_true=0, y_pred=0.2)"
    
    print("All tests passed!")

# Run the test case
test_log_loss()

#### Task 3: Implementing Cost Function

The cost function is the average of loss function values calculated for each observation/data-point.

$$\text{cost}(y, \hat{y}) = \frac{1}{n} \cdot \sum_{i=1}^{n} L(y_i, \hat{y}_i)$$

where:
- $n$: number of observations/data-points.

In [None]:
def cost_function(y_true, y_pred):
    """
    Computes the cost (average log loss) for true values (0 or 1) and predicted probabilities (between 0 and 1)
    Args:
    y_true (array_like, shape (n,)): array of true values (0 or 1)
    y_pred (array_like, shape (n,)): array of predicted values (predicted probability that y is 1)
    Returns:
    cost (float): nonnegative cost corresponding to y_true and y_pred
    """
    assert len(y_true) == len(y_pred), "Length of true values and length of predicted values do not match"
    n = len(y_true)
    loss_vec = log_loss(y_true, y_pred)
    cost = np.sum(loss_vec) / n
    return cost

#### Testing the Cost Function

In [None]:
import numpy as np
def test_cost_function():
    # Test case 1: Simple example with known expected cost
    y_true = np.array([1, 0, 1])
    y_pred = np.array([0.9, 0.1, 0.8])
    # Expected output: Manually calculate cost for these values
    # log_loss(y_true, y_pred) for each example
    expected_cost = (-(1 * np.log(0.9)) - (1 - 1) * np.log(1 - 0.9) +
                    -(0 * np.log(0.1)) - (1 - 0) * np.log(1 - 0.1) +
                    -(1 * np.log(0.8)) - (1 - 1) * np.log(1 - 0.8)) / 3
    
    # Call the cost_function to get the result
    result = cost_function(y_true, y_pred)
    # Assert that the result is close to the expected cost with a tolerance of 1e-6
    assert np.isclose(result, expected_cost, atol=1e-6), f"Test failed: {result} != {expected_cost}"
    print("Test passed for simple case!")

# Run the test case
test_cost_function()

#### Task 4: Cost Function for Sigmoid Regression

We are estimating the following function:

$$\hat{y} = \sigma(x \cdot w^T + b) = \frac{1}{1 + e^{-(x \cdot w^T + b)}}$$

The cost function $L(w, b)$ computes the average of the log loss for each training example:

$$L(w, b) := C(y, \hat{y} | X, w, b) = \frac{1}{n} \sum_{i=1}^{n} L\left(y_i, \frac{1}{1 + e^{-(x_i \cdot w^T + b)}}\right)$$

In [None]:
# Function to compute cost function in terms of model parameters - using vectorization
def costfunction_logreg(X, y, w, b):
    """
    Computes the cost function, given data and model parameters.
    Args:
    X (ndarray, shape (n,d)): data on features, n observations with d features.
    y (array_like, shape (n,)): array of true values of target (0 or 1).
    w (array_like, shape (d,)): weight parameters of the model.
    b (float): bias parameter of the model.
    Returns:
    cost (float): nonnegative cost corresponding to y and y_pred.
    """
    n, d = X.shape
    assert len(y) == n, "Number of feature observations and number of target observations do not match."
    assert len(w) == d, "Number of features and number of weight parameters do not match."
    # Compute z using np.dot
    z = np.dot(X, w) + b # Matrix-vector multiplication and adding bias
    # Compute predictions using logistic function (sigmoid)
    y_pred = logistic_function(z)
    # Compute the cost using the cost function
    cost = cost_function(y, y_pred)
    return cost

#### Testing the Function

In [None]:
X, y, w, b = np.array([[10, 20], [-10, 10]]), np.array([1, 0]), np.array([0.5, 1.5]), 1
print(f"cost for logistic regression(X = {X}, y = {y}, w = {w}, b = {b}) = {costfunction_logreg(X, y, w, b)}")

#### Task 5: Implementing Gradient Descent for Training Sigmoid Regression

##### Computing the Gradient

The gradients of the binary cross-entropy log loss are:

$$\frac{\partial \text{Log Loss}}{\partial w} = -\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i) x_i$$

$$\frac{\partial \text{Log Loss}}{\partial b} = -\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)$$

The weights $w$ and bias $b$ are updated as:

$$w \leftarrow w - \alpha \frac{\partial \text{Log Loss}}{\partial w}, \quad b \leftarrow b - \alpha \frac{\partial \text{Log Loss}}{\partial b}$$
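
The gradient expressions above are stated without derivation. One way to reach them is the chain rule applied to a single sample, sketched here with the shorthand $z_i = x_i \cdot w^T + b$ (a symbol introduced only for this derivation), so that $\hat{y}_i = \sigma(z_i)$:

$$\frac{\partial L}{\partial \hat{y}_i} = -\frac{y_i}{\hat{y}_i} + \frac{1 - y_i}{1 - \hat{y}_i}, \qquad \frac{\partial \hat{y}_i}{\partial z_i} = \hat{y}_i (1 - \hat{y}_i)$$

$$\frac{\partial L}{\partial z_i} = -y_i (1 - \hat{y}_i) + (1 - y_i) \hat{y}_i = -(y_i - \hat{y}_i)$$

Since $\partial z_i / \partial w = x_i$ and $\partial z_i / \partial b = 1$, averaging over the $n$ samples gives the two gradient formulas for $w$ and $b$.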

In [None]:
def compute_gradient(X, y, w, b):
    """
    Computes gradients of the cost function with respect to model parameters.
    Args:
    X (ndarray, shape (n,d)): Input data, n observations with d features
    y (array_like, shape (n,)): True labels (0 or 1)
    w (array_like, shape (d,)): Weight parameters of the model
    b (float): Bias parameter of the model
    Returns:
    grad_w (array_like, shape (d,)): Gradients of the cost function with respect to the weight parameters
    grad_b (float): Gradient of the cost function with respect to the bias parameter
    """
    n, d = X.shape # X has shape (n, d)
    assert len(y) == n, f"Expected y to have {n} elements, but got {len(y)}"
    assert len(w) == d, f"Expected w to have {d} elements, but got {len(w)}"
    # Compute predictions using logistic function (sigmoid)
    y_pred = 1 / (1 + np.exp(-(np.dot(X, w) + b)))
    # Compute gradients
    grad_w = -(1 / n) * np.dot(X.T, (y - y_pred)) # Gradient w.r.t weights, shape (d,)
    grad_b = -(1 / n) * np.sum(y - y_pred) # Gradient w.r.t bias, scalar
    return grad_w, grad_b
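
A common sanity check for analytic gradients is to compare them against central finite differences of the cost. The sketch below is self-contained and re-implements the sigmoid, cost, and gradient under hypothetical helper names (`_sigmoid`, `_cost`, `_analytic_grad`, `numerical_grad`); the random data and the step size `eps` are assumptions chosen for illustration.

```python
import numpy as np

def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def _cost(X, y, w, b):
    # Average log loss, with clipping to avoid log(0)
    p = np.clip(_sigmoid(X @ w + b), 1e-10, 1 - 1e-10)
    return np.mean(-y * np.log(p) - (1 - y) * np.log(1 - p))

def _analytic_grad(X, y, w, b):
    # Same formulas as in the text: -(1/n) * sum((y - y_hat) * x) and -(1/n) * sum(y - y_hat)
    err = _sigmoid(X @ w + b) - y
    return X.T @ err / len(y), np.mean(err)

def numerical_grad(X, y, w, b, eps=1e-6):
    # Central differences, one parameter at a time
    grad_w = np.zeros_like(w, dtype=float)
    for j in range(len(w)):
        step = np.zeros_like(w, dtype=float)
        step[j] = eps
        grad_w[j] = (_cost(X, y, w + step, b) - _cost(X, y, w - step, b)) / (2 * eps)
    grad_b = (_cost(X, y, w, b + eps) - _cost(X, y, w, b - eps)) / (2 * eps)
    return grad_w, grad_b

rng = np.random.default_rng(0)
X_chk = rng.normal(size=(5, 3))
y_chk = np.array([1, 0, 1, 1, 0])
w_chk = rng.normal(size=3)
b_chk = 0.3

gw_a, gb_a = _analytic_grad(X_chk, y_chk, w_chk, b_chk)
gw_n, gb_n = numerical_grad(X_chk, y_chk, w_chk, b_chk)
print("max |diff| in grad_w:", np.max(np.abs(gw_a - gw_n)))
print("|diff| in grad_b:", abs(gb_a - gb_n))
```

If the two gradients disagree by much more than roughly `eps`, the analytic formula or its implementation is the first place to look.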

#### Simple Test for Compute Gradient Function

In [None]:
# Simple test case
X = np.array([[10, 20], [-10, 10]]) # shape (2, 2)
y = np.array([1, 0]) # shape (2,)
w = np.array([0.5, 1.5]) # shape (2,)
b = 1 # scalar
# Assertion tests
try:
    grad_w, grad_b = compute_gradient(X, y, w, b)
    print("Gradients computed successfully.")
    print(f"grad_w: {grad_w}")
    print(f"grad_b: {grad_b}")
except AssertionError as e:
    print(f"Assertion error: {e}")

#### Task 6: Gradient Descent for Sigmoid Regression

In [None]:
def gradient_descent(X, y, w, b, alpha, n_iter, show_cost=False, show_params=True):
    """
    Implements batch gradient descent to optimize logistic regression parameters.
    Args:
    X (ndarray, shape (n,d)): Data on features, n observations with d features
    y (array_like, shape (n,)): True values of target (0 or 1)
    w (array_like, shape (d,)): Initial weight parameters
    b (float): Initial bias parameter
    alpha (float): Learning rate
    n_iter (int): Number of iterations
    show_cost (bool): If True, displays cost every 100 iterations
    show_params (bool): If True, displays parameters every 100 iterations
    Returns:
    w (array_like, shape (d,)): Optimized weight parameters
    b (float): Optimized bias parameter
    cost_history (list): List of cost values over iterations
    params_history (list): List of parameters (w, b) over iterations
    """
    n, d = X.shape
    assert len(y) == n, "Number of observations in X and y do not match"
    assert len(w) == d, "Number of features in X and w do not match"
    cost_history = []
    params_history = []
    for i in range(n_iter):
        # Compute gradients
        grad_w, grad_b = compute_gradient(X, y, w, b)
        # Update weights and bias (out-of-place, so the caller's initial w is not modified)
        w = w - alpha * grad_w
        b = b - alpha * grad_b
        # Compute cost
        cost = costfunction_logreg(X, y, w, b)
        # Store cost and parameters
        cost_history.append(cost)
        params_history.append((w.copy(), b))
        # Optionally print cost and parameters
        if show_cost and (i % 100 == 0 or i == n_iter - 1):
            print(f"Iteration {i}: Cost = {cost:.6f}")
        if show_params and (i % 100 == 0 or i == n_iter - 1):
            print(f"Iteration {i}: w = {w}, b = {b:.6f}")
    return w, b, cost_history, params_history
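
The function above always runs for exactly `n_iter` iterations. A common alternative stops once the cost improvement between consecutive iterations falls below a tolerance. The sketch below is a self-contained illustration of that idea; the helper name `gd_with_tolerance`, the `tol` parameter, and the toy data are assumptions, not part of the worksheet.

```python
import numpy as np

def gd_with_tolerance(X, y, alpha=0.1, max_iter=10_000, tol=1e-8):
    """Hypothetical variant of batch gradient descent with an early-stopping rule."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    prev_cost = np.inf
    n_used = 0
    for n_used in range(1, max_iter + 1):
        y_hat = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        err = y_hat - y                       # equals -(y - y_hat)
        w = w - alpha * (X.T @ err) / n
        b = b - alpha * np.mean(err)
        # Cost after the update, clipped to avoid log(0)
        y_hat = np.clip(1.0 / (1.0 + np.exp(-(X @ w + b))), 1e-10, 1 - 1e-10)
        cost = np.mean(-y * np.log(y_hat) - (1 - y) * np.log(1 - y_hat))
        if prev_cost - cost < tol:            # improvement too small: stop early
            break
        prev_cost = cost
    return w, b, cost, n_used

# Toy, non-separable data (made up) so that a finite optimum exists
X_demo = np.array([[0.0], [1.0], [2.0], [3.0]])
y_demo = np.array([0, 1, 0, 1])
w_demo, b_demo, cost_demo, n_used = gd_with_tolerance(X_demo, y_demo)
print(f"stopped after {n_used} iterations, cost = {cost_demo:.4f}")
```

The tolerance trades accuracy for run time: a larger `tol` stops sooner but may leave the parameters further from the optimum.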

#### Testing the Gradient Descent Function

In [None]:
# Test the gradient_descent function with sample data
X = np.array([[0.1, 0.2], [-0.1, 0.1]]) # Shape (2, 2)
y = np.array([1, 0]) # Shape (2,)
w = np.zeros(X.shape[1]) # Shape (2,) - same as number of features
b = 0.0 # Scalar
alpha = 0.1 # Learning rate
n_iter = 100000 # Number of iterations
# Perform gradient descent
w_out, b_out, cost_history, params_history = gradient_descent(X, y, w, b, alpha, n_iter, show_cost=True, show_params=False)
# Print final parameters and cost
print("\nFinal parameters:")
print(f"w: {w_out}, b: {b_out}")
print(f"Final cost: {cost_history[-1]:.6f}")

#### Simple Assertion Test for Gradient Descent

In [None]:
# Simple assertion test for gradient_descent
def test_gradient_descent():
    X = np.array([[0.1, 0.2], [-0.1, 0.1]]) # Shape (2, 2)
    y = np.array([1, 0]) # Shape (2,)
    w = np.zeros(X.shape[1]) # Shape (2,)
    b = 0.0 # Scalar
    alpha = 0.1 # Learning rate
    n_iter = 100 # Number of iterations
    # Run gradient descent
    w_out, b_out, cost_history, _ = gradient_descent(X, y, w, b, alpha, n_iter, show_cost=False, show_params=False)
    # Assertions
    assert len(cost_history) == n_iter, "Cost history length does not match the number of iterations"
    assert w_out.shape == w.shape, "Shape of output weights does not match the initial weights"
    assert isinstance(b_out, float), "Bias output is not a float"
    assert cost_history[-1] < cost_history[0], "Cost did not decrease over iterations"
    print("All tests passed!")

# Run the test
test_gradient_descent()

#### Visualizing Convergence of Cost During Gradient Descent

This plot tracks how the cost decreases over iterations, providing insight into the convergence of the gradient descent algorithm.

In [None]:
# Plotting cost over iteration
plt.figure(figsize = (9, 6))
plt.plot(cost_history)
plt.xlabel("Iteration", fontsize = 14)
plt.ylabel("Cost", fontsize = 14)
plt.title("Cost vs Iteration", fontsize = 14)
plt.tight_layout()
plt.show()

#### Task 7: Decision/Prediction Function for Binary Classification

We perform two tasks:
1. **Prediction**: Using the trained weights and bias, calculate the probability for each sample.
2. **Decision Boundary**: Convert predicted probability to binary class label using threshold $\tau = 0.5$:

$$\hat{y} = \begin{cases} 1 & \text{if } y_{prob} \geq \tau \\ 0 & \text{if } y_{prob} < \tau \end{cases}$$

In [None]:
import numpy as np
def prediction(X, w, b, threshold=0.5):
    """
    Predicts binary outcomes for given input features based on logistic regression parameters.
    Arguments:
    X (ndarray, shape (n,d)): Array of test independent variables (features) with n samples and d features.
    w (ndarray, shape (d,)): Array of weights learned via gradient descent.
    b (float): Bias learned via gradient descent.
    threshold (float, optional): Classification threshold for predicting class labels. Default is 0.5.
    Returns:
    y_pred (ndarray, shape (n,)): Array of predicted dependent variable (binary class labels: 0 or 1).
    """
    # Compute the predicted probabilities using the logistic function
    z = np.dot(X, w) + b
    y_test_prob = 1/(1 + np.exp(-z)) # z = wx + b
    # Classify based on the threshold
    y_pred = np.where(y_test_prob >= threshold, 1, 0)
    return y_pred
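
To see how the threshold $\tau$ changes the decision, the short sketch below applies three thresholds to the same probabilities. The helper `apply_threshold` and the probability values are illustrative assumptions, not part of the worksheet.

```python
import numpy as np

def apply_threshold(probs, tau):
    """Label a sample 1 when its predicted probability is at least tau, else 0."""
    return (np.asarray(probs) >= tau).astype(int)

probs = np.array([0.2, 0.45, 0.55, 0.9])   # made-up predicted probabilities
for tau in (0.3, 0.5, 0.7):
    print(tau, apply_threshold(probs, tau))
# 0.3 [0 1 1 1]
# 0.5 [0 0 1 1]
# 0.7 [0 0 0 1]
```

Raising the threshold makes the classifier more conservative about predicting class 1, which typically trades recall for precision.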

#### Test Case for Prediction Function

In [None]:
def test_prediction():
    X_test = np.array([[0.5, 1.0], [1.5, -0.5], [-0.5, -1.0]]) # Shape (3, 2)
    w_test = np.array([1.0, -1.0]) # Shape (2,)
    b_test = 0.0 # Scalar bias
    threshold = 0.5 # Default threshold
    # Expected output: z = [-0.5, 2.0, 0.5], so sigmoid(z) >= 0.5 only for the last two samples
    expected_output = np.array([0, 1, 1])
    # Call the prediction function
    y_pred = prediction(X_test, w_test, b_test, threshold)
    # Assert that the output matches the expected output
    assert np.array_equal(y_pred, expected_output), f"Expected {expected_output}, but got {y_pred}"
    print("Test passed!")

test_prediction()

#### Task 8: Evaluating Classifier

This function computes the confusion matrix, precision, recall, and F1-score from scratch, based on the predictions and the ground truth.
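
For reference, the standard definitions of these metrics in terms of true positives (TP), false positives (FP), and false negatives (FN) are:

$$\text{Precision} = \frac{TP}{TP + FP}, \quad \text{Recall} = \frac{TP}{TP + FN}, \quad F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$

As a quick worked case, $TP = 2$, $FP = 0$, $FN = 1$ gives a precision of $1.0$, a recall of $2/3 \approx 0.667$, and $F_1 = 0.8$.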

In [None]:
def evaluate_classification(y_true, y_pred):
    """
    Computes the confusion matrix, precision, recall, and F1-score for binary classification.
    Arguments:
    y_true (ndarray, shape (n,)): Ground truth binary labels (0 or 1).
    y_pred (ndarray, shape (n,)): Predicted binary labels (0 or 1).
    Returns:
    metrics (dict): A dictionary containing confusion matrix, precision, recall, and F1-score.
    """
    # Initialize confusion matrix components
    TP = np.sum((y_true == 1) & (y_pred == 1)) # True Positives
    TN = np.sum((y_true == 0) & (y_pred == 0)) # True Negatives
    FP = np.sum((y_true == 0) & (y_pred == 1)) # False Positives
    FN = np.sum((y_true == 1) & (y_pred == 0)) # False Negatives
    # Confusion matrix
    confusion_matrix = np.array([[TN, FP],
                                 [FN, TP]])
    # Precision, recall, and F1-score
    precision = TP / (TP + FP) if (TP + FP) > 0.0 else 0.0
    recall = TP / (TP + FN) if (TP + FN) > 0.0 else 0.0
    f1_score = 2 * (precision * recall) / (precision + recall) if (precision + recall) > 0.0 else 0.0
    # Metrics dictionary
    metrics = {
        "confusion_matrix": confusion_matrix,
        "precision": precision,
        "recall": recall,
        "f1_score": f1_score
    }
    return metrics

#### Testing Evaluation Function

In [None]:
import numpy as np

y_true = np.array([1, 0, 1, 0, 1])
y_pred = np.array([1, 0, 0, 0, 1])

metrics = evaluate_classification(y_true, y_pred)

print("Confusion Matrix:\n", metrics["confusion_matrix"])
print("Precision:", metrics["precision"])
print("Recall:", metrics["recall"])
print("F1-score:", metrics["f1_score"])

## 4. Putting Helper Functions to Action - Sigmoid Regression on Real Dataset

### Dataset Used: "pima-indians-diabetes.data.csv"

### 4.1 Loading and Basic Data Operations

In [None]:
# Load dataset
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt

url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv"
columns = ['Pregnancies', 'Glucose', 'BloodPressure', 'SkinThickness', 'Insulin', 'BMI', 'DiabetesPedigreeFunction', 'Age', 'Outcome']
data_pima_diabetes = pd.read_csv(url, names=columns)

### 4.2 Basic Data Cleaning

In [None]:
# Data cleaning
columns_to_clean = ['Glucose', 'BloodPressure', 'SkinThickness', 'Insulin', 'BMI']
data_pima_diabetes[columns_to_clean] = data_pima_diabetes[columns_to_clean].replace(0, np.nan)
data_pima_diabetes.fillna(data_pima_diabetes.median(), inplace=True)
data_pima_diabetes.info()

### 4.3 Summary Statistics

In [None]:
data_pima_diabetes.describe()

### 4.4 Train-Test Split and Standard Scaling

In [None]:
# Train-test split
X = data_pima_diabetes.drop(columns=['Outcome']).values
y = data_pima_diabetes['Outcome'].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

# Standardize features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
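
For intuition, the scaling step can be sketched with plain NumPy: the mean and standard deviation are estimated on the training set only and then reused for the test set, so no test-set information enters the fit. The helper names `standardize_fit` and `standardize_apply` and the toy arrays are assumptions for illustration; this is a simplified sketch, not scikit-learn's implementation.

```python
import numpy as np

def standardize_fit(X_train):
    """Estimate per-feature mean and standard deviation from the training data."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)            # population std (ddof=0)
    std = np.where(std == 0, 1.0, std)   # leave constant features unscaled
    return mean, std

def standardize_apply(X, mean, std):
    """Scale any dataset using statistics learned from the training data."""
    return (X - mean) / std

X_tr = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])   # made-up training data
X_te = np.array([[2.0, 40.0]])                             # made-up test data

mean, std = standardize_fit(X_tr)
X_tr_s = standardize_apply(X_tr, mean, std)
X_te_s = standardize_apply(X_te, mean, std)
print(X_tr_s.mean(axis=0), X_tr_s.std(axis=0))
print(X_te_s)
```

Standardized features keep gradient descent well conditioned, which is why a learning rate such as 0.1 tends to work without further tuning.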

### 4.5 Training the Sigmoid Regression Model

In [None]:
# Initialize parameters
w = np.zeros(X_train_scaled.shape[1])
b = 0.0
alpha = 0.1
n_iter = 1000

# Train model
print("\nTraining Logistic Regression Model:")
w, b, cost_history, params_history = gradient_descent(X_train_scaled, y_train, w, b, alpha, n_iter, show_cost=True, show_params=False)

# Plot cost history
plt.figure(figsize=(9, 6))
plt.plot(cost_history)
plt.xlabel("Iteration", fontsize=14)
plt.ylabel("Cost", fontsize=14)
plt.title("Cost vs Iteration", fontsize=14)
plt.tight_layout()
plt.show()

### 4.6 Checking for Overfitting or Underfitting

In [None]:
# Test model
y_train_pred = prediction(X_train_scaled, w, b)
y_test_pred = prediction(X_test_scaled, w, b)

# Evaluate train and test performance
train_cost = costfunction_logreg(X_train_scaled, y_train, w, b)
test_cost = costfunction_logreg(X_test_scaled, y_test, w, b)
print(f"\nTrain Loss (Cost): {train_cost:.4f}")
print(f"Test Loss (Cost): {test_cost:.4f}")

### 4.7 Model Performance Evaluation

In [None]:
# Accuracy on test data
test_accuracy = np.mean(y_test_pred == y_test) * 100
print(f"\nTest Accuracy: {test_accuracy:.2f}%")

# Evaluation
metrics = evaluate_classification(y_test, y_test_pred)
confusion_matrix = metrics["confusion_matrix"]
precision = metrics["precision"]
recall = metrics["recall"]
f1_score = metrics["f1_score"]

print(f"\nConfusion Matrix:\n{confusion_matrix}")
print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
print(f"F1-Score: {f1_score:.2f}")

### 4.8 Visualizing the Confusion Matrix (Optional)

In [None]:
# Visualizing Confusion Matrix
fig, ax = plt.subplots(figsize=(6, 6))
ax.imshow(confusion_matrix, cmap='Blues')
ax.grid(False)
ax.xaxis.set(ticks=(0, 1), ticklabels=('Predicted 0s', 'Predicted 1s'))
ax.yaxis.set(ticks=(0, 1), ticklabels=('Actual 0s', 'Actual 1s'))
ax.set_ylim(1.5, -0.5)
for i in range(2):
    for j in range(2):
        ax.text(j, i, confusion_matrix[i, j], ha='center', va='center', color='white', fontsize=20)
plt.title('Confusion Matrix', fontsize=16)
plt.tight_layout()
plt.show()

## Conclusion

In this worksheet, we implemented Sigmoid/Logistic Regression from scratch using NumPy for the model itself; pandas and scikit-learn were used only for loading, splitting, and scaling the data. We:

1. Built helper functions including sigmoid function, log loss, cost function
2. Implemented gradient computation and gradient descent optimization
3. Created prediction and evaluation functions
4. Applied our implementation to the Pima Indians Diabetes dataset
5. Evaluated model performance using various metrics

This hands-on implementation helps understand the mathematics and mechanics behind logistic regression, which is a fundamental algorithm in machine learning.