<a href="https://colab.research.google.com/github/Apoorvmittal11/23-CS-072-ML-LAB-EXPERIMENT/blob/main/23-CS-072%20EXPERIMENT9/Experiment_9_Implementing_a_Neural_Network_and_Backpropagation_from_Scratch.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Experiment 9 : Implementing a Neural Network and Backpropagation from Scratch**

**1. Learning Objectives**

* Upon successful completion of this assignment, students will be able to:
Understand and articulate the mathematical foundations of a feedforward neural network.

* Implement the core components of an ANN, including parameter initialization, activation
functions (ReLU, Sigmoid), and their derivatives.

* Implement the Forward Propagation algorithm to generate predictions from network
inputs.

* Implement the Backpropagation algorithm from scratch to calculate gradients for all
network parameters.

* Implement various loss functions (Binary Cross-Entropy, Mean Squared Error) and their
derivatives.

* Implement the Gradient Descent algorithm to update network weights and biases.

* Build a complete, modular MyANNClassifier class using only NumPy.

* Train the "from scratch" classifier on a real-world dataset and evaluate its performance.

* Compare the custom-built classifier's performance and behavior against
sklearn.neural_network.MLPClassifier .

* Analyze the impact of different loss functions and network architectures on model
training and final performance.

**2. Introduction**
This assignment is designed to demystify the "black box" of neural networks. You will move
beyond high-level libraries and implement the core engine of a simple, fully-connected neural
network using only NumPy. Your primary task is to build a classifier by implementing the two most critical components: Forward Propagation (for making predictions) and
Backpropagation (for learning from errors).

You will use the well-known Wisconsin Breast Cancer dataset for a binary classification
task. After building your network, you will experiment with different loss functions (BCE vs.
MSE) and architectures. Finally, you will compare your "from scratch" model to scikit-learn's
MLPClassifier to benchmark your work and appreciate the optimizations provided by modern
libraries.

**3. Prerequisites**

Ensure your Python environment has the following libraries installed:

In [2]:
pip install numpy pandas scikit-learn matplotlib seaborn



**4. Experiment Tasks**

You are required to build a complete neural network pipeline. Follow the structured tasks
below.

**Task 1: Data Loading and Preprocessing (15 Marks)**
1. Load Data: Load the Breast Cancer Wisconsin dataset directly from scikit-learn.

2. Inspect Data: Print the shapes of X and y and the feature names to understand the data.This is a binary classification problem.

3. Create Hold-Out Set: Perform a single 70/30 split on the data.

* X_train , y_train (70% of the data)

* X_val , y_val (30% of the data)

* Use train_test_split with random_state=42 for reproducibility.

4. Standardize Features: This is critical for neural networks.

* Fit a StandardScaler from sklearn.preprocessing on X_train only.

* Transform both X_train and X_val using the fitted scaler.

* X_train_scaled will be used for training, and X_val_scaled for all final evaluations.

In [4]:
# Import necessary libraries
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# 1.Load Data
data = load_breast_cancer()
X = data.data          # features
y = data.target        # labels

# 2.Inspect Data
print("Feature matrix shape (X):", X.shape)
print("Target vector shape (y):", y.shape)
print("\nFeature names:\n", data.feature_names)
print("\nTarget names:", data.target_names)

# 3.Create Hold-Out Set (70% Train, 30% Validation)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=42
)

print("\nTraining set size:", X_train.shape)
print("Validation set size:", X_val.shape)

# 4.Standardize Features
scaler = StandardScaler()

# Fit only on training data
scaler.fit(X_train)

# Transform both train and validation sets
X_train_scaled = scaler.transform(X_train)
X_val_scaled = scaler.transform(X_val)

print("\nAfter scaling:")
print("Mean of training features (approx 0):", np.round(X_train_scaled.mean(), 2))
print("Std of training features (approx 1):", np.round(X_train_scaled.std(), 2))


Feature matrix shape (X): (569, 30)
Target vector shape (y): (569,)

Feature names:
 ['mean radius' 'mean texture' 'mean perimeter' 'mean area'
 'mean smoothness' 'mean compactness' 'mean concavity'
 'mean concave points' 'mean symmetry' 'mean fractal dimension'
 'radius error' 'texture error' 'perimeter error' 'area error'
 'smoothness error' 'compactness error' 'concavity error'
 'concave points error' 'symmetry error' 'fractal dimension error'
 'worst radius' 'worst texture' 'worst perimeter' 'worst area'
 'worst smoothness' 'worst compactness' 'worst concavity'
 'worst concave points' 'worst symmetry' 'worst fractal dimension']

Target names: ['malignant' 'benign']

Training set size: (398, 30)
Validation set size: (171, 30)

After scaling:
Mean of training features (approx 0): -0.0
Std of training features (approx 1): 1.0


**Task 2: 'From Scratch' Utilities (NumPy) (20 Marks)**

Implement the following helper functions using only NumPy.

1. Activation Functions:

* sigmoid(Z) : Computes the sigmoid.

* relu(Z) : Computes the Rectified Linear Unit ( np.maximum(0, Z) ).

2. Activation Derivatives: These are crucial for backpropagation.

* sigmoid_derivative(A) : Where A = sigmoid(Z) . The derivative is A * (1 - A) .

* relu_derivative(Z) : The derivative is 1 if Z > 0 , and 0 otherwise.

3. Loss Functions:

* compute_bce_loss(Y, Y_hat) : Computes the Binary Cross-Entropy (BCE) loss. (Add a
small epsilon=1e-15 for numerical stability to avoid log(0) ).

* compute_mse_loss(Y, Y_hat) : Computes the Mean Squared Error (MSE) loss.

In [6]:
import numpy as np
# 1.Activation Functions
def sigmoid(Z):
    """Compute the Sigmoid activation function."""
    return 1 / (1 + np.exp(-Z))


def relu(Z):
    """Compute the ReLU activation function."""
    return np.maximum(0, Z)

# 2.Activation Derivatives
def sigmoid_derivative(A):
    """
    Derivative of Sigmoid.
    A = sigmoid(Z)
    Derivative = A * (1 - A)
    """
    return A * (1 - A)


def relu_derivative(Z):
    """
    Derivative of ReLU.
    1 if Z > 0 else 0
    """
    return (Z > 0).astype(float)

# 3.Loss Functions
def compute_bce_loss(Y, Y_hat):
    """
    Compute Binary Cross-Entropy (BCE) Loss.
    Y: True labels (0 or 1)
    Y_hat: Predicted probabilities (sigmoid outputs)
    """
    epsilon = 1e-15  # for numerical stability
    Y_hat = np.clip(Y_hat, epsilon, 1 - epsilon)
    m = Y.shape[0]
    loss = - (1 / m) * np.sum(Y * np.log(Y_hat) + (1 - Y) * np.log(1 - Y_hat))
    return loss


def compute_mse_loss(Y, Y_hat):
    """
    Compute Mean Squared Error (MSE) Loss.
    """
    m = Y.shape[0]
    loss = (1 / (2 * m)) * np.sum((Y - Y_hat) ** 2)
    return loss

if __name__ == "__main__":
    Z = np.array([-1, 0, 1, 2])
    A_sig = sigmoid(Z)
    A_relu = relu(Z)

    print("Sigmoid(Z):", A_sig)
    print("ReLU(Z):", A_relu)
    print("Sigmoid Derivative:", sigmoid_derivative(A_sig))
    print("ReLU Derivative:", relu_derivative(Z))

    Y = np.array([1, 0, 1, 0])
    Y_hat = np.array([0.9, 0.1, 0.8, 0.3])
    print("BCE Loss:", compute_bce_loss(Y, Y_hat))
    print("MSE Loss:", compute_mse_loss(Y, Y_hat))


Sigmoid(Z): [0.26894142 0.5        0.73105858 0.88079708]
ReLU(Z): [0 0 1 2]
Sigmoid Derivative: [0.19661193 0.25       0.19661193 0.10499359]
ReLU Derivative: [0. 0. 1. 1.]
BCE Loss: 0.19763488164214868
MSE Loss: 0.018749999999999996


**Task 3: 'From Scratch' ANN Classifier (40 Marks)**

Implement a MyANNClassifier class. This class will orchestrate the entire learning process.

1. Class Structure ( __init__ ):

* __init__(self, layer_dims, learning_rate=0.01, n_iterations=1000, loss='bce') :

  * layer_dims : A list specifying the number of units in each layer. e.g., [n_x, 10, 5, 1] , where n_x is the number of input features (30 for the breast cancer dataset).

  * Store learning_rate , n_iterations , and loss (either 'bce' or 'mse').

  * self.parameters_ : A dictionary to store weights ( W1 , W2 , ...) and biases ( b1 , b2 ,...).

  * self.costs_ : A list to store the loss at each iteration (for plotting).

2. Parameter Initialization ( _initialize_parameters ):

* Create a helper method that iterates through layer_dims .
* Initialize weights W with small random values ( np.random.randn(...) * 0.01 ) to break
symmetry.
* Initialize biases b as zeros ( np.zeros(...) ).
* Store them in self.parameters_ (e.g., self.parameters_['W1'] , self.parameters_['b1'] ).
3. Forward Propagation ( _forward_propagation ):

* Create a method _forward_propagation(self, X) .

* A_prev = X .

* Loop from layer 1 to L:

  * The hidden layers (1 to L-1) must use the ReLU activation.

  * The output layer (L) must use the Sigmoid activation (for binary classification).

  * Calculate Z = W @ A_prev + b .

  * Calculate A = activation(Z) .

  * Store all A (activations) and Z (linear results) in a cache (e.g., a list of tuples(A, Z) ). This cache is essential for backpropagation.

* Return the final activation A_L (which is Y_hat ) and the cache .

4. Backward Propagation ( _backward_propagation ):

* Create a method _backward_propagation(self, Y, Y_hat, cache) . This is the most complex
task.

* Y is the true labels, Y_hat is the prediction ( A_L ) from the forward pass.

* Initialize Backprop:
  * Calculate dA_L (the derivative of the loss function w.r.t. Y_hat ).
    * If self.loss == 'bce' : dA_L = -(np.divide(Y, Y_hat) - np.divide(1 - Y, 1 - Y_hat))
    * If self.loss == 'mse' : dA_L = 2 * (Y_hat - Y)
  * Output Layer (Sigmoid):
    * Get A_L and Z_L from the cache .
    * dZ_L = dA_L * sigmoid_derivative(A_L)
    * Calculate dW_L and db_L using dZ_L and the corresponding A_prev from the cache.
  * Loop Backwards (Hidden Layers - ReLU):
    * Iterate from layer L-1 down to 1.
    * Calculate dA_prev = W.T @ dZ (using W and dZ from the current layer).
    * dZ_prev = dA_prev * relu_derivative(Z_prev) (using Z_prev from the cache).
    * Calculate dW and db for this layer.
  * Store all gradients ( dW1 , db1 , dW2 , db2 , ...) in a grads dictionary.
5. Parameter Update ( _update_parameters ):

* Create a method _update_parameters(self, grads) .

* Iterate through all parameters in self.parameters_ .

* Update them using gradient descent:

  * W = W - self.learning_rate * dW

  * b = b - self.learning_rate * db

6. Fit and Predict Methods:

* fit(self, X, y) :

  * Reshape y to be (1, n_samples) .
  * Reshape X to be (n_features, n_samples) .
  * Call _initialize_parameters .
  * Loop for n_iterations :
      1. Y_hat, cache = _forward_propagation(X)
      2. loss = compute_bce_loss(y, Y_hat) (or mse based on self.loss )
      3. grads = _backward_propagation(y, Y_hat, cache)
      4. _update_parameters(grads)
      5. Store the loss in self.costs_ .
* predict(self, X) :

  * Reshape X to (n_features, n_samples) .
  * Run _forward_propagation(X) to get Y_hat .
  * Convert probabilities to binary predictions: predictions = (Y_hat > 0.5).astype(int) .
  * Return the flattened 1D array of predictions.

In [8]:
import numpy as np
class MyANNClassifier:
    def __init__(self, layer_dims, learning_rate=0.01, n_iterations=1000, loss='bce'):
        """
        Initialize the ANN model.
        layer_dims: list of layer sizes [n_input, n_hidden1, ..., n_output]
        """
        self.layer_dims = layer_dims
        self.learning_rate = learning_rate
        self.n_iterations = n_iterations
        self.loss = loss
        self.parameters_ = {}
        self.costs_ = []

    # 1.Parameter Initialization
    def _initialize_parameters(self):
        np.random.seed(42)  # for reproducibility
        L = len(self.layer_dims)

        for l in range(1, L):
            self.parameters_[f"W{l}"] = np.random.randn(
                self.layer_dims[l], self.layer_dims[l-1]
            ) * 0.01
            self.parameters_[f"b{l}"] = np.zeros((self.layer_dims[l], 1))

    # 2.Forward Propagation
    def _forward_propagation(self, X):
        cache = []  # to store (A, Z)
        A = X
        L = len(self.layer_dims) - 1

        for l in range(1, L + 1):
            W = self.parameters_[f"W{l}"]
            b = self.parameters_[f"b{l}"]
            Z = np.dot(W, A) + b

            if l < L:
                A = relu(Z)
            else:
                A = sigmoid(Z)

            cache.append((A, Z))

        return A, cache  # A = Y_hat

    # 3.Backward Propagation
    def _backward_propagation(self, Y, Y_hat, cache):
        grads = {}
        L = len(cache)
        m = Y.shape[1]

        # Compute dA for output layer
        if self.loss == 'bce':
            dA = -(np.divide(Y, Y_hat + 1e-15) - np.divide(1 - Y, 1 - Y_hat + 1e-15))
        else:  # mse
            dA = 2 * (Y_hat - Y)

        # Output layer (Sigmoid)
        A_L, Z_L = cache[-1]
        dZ = dA * sigmoid_derivative(A_L)
        A_prev = cache[-2][0] if L > 1 else None
        grads[f"dW{L}"] = (1/m) * np.dot(dZ, A_prev.T if A_prev is not None else Y.T)
        grads[f"db{L}"] = (1/m) * np.sum(dZ, axis=1, keepdims=True)

        # Hidden layers (ReLU)
        for l in reversed(range(1, L)):
            A, Z = cache[l-1]
            W_next = self.parameters_[f"W{l+1}"]
            dA_prev = np.dot(W_next.T, dZ)
            dZ = dA_prev * relu_derivative(Z)
            A_prev = cache[l-2][0] if l > 1 else None
            grads[f"dW{l}"] = (1/m) * np.dot(dZ, A_prev.T if A_prev is not None else Y.T)
            grads[f"db{l}"] = (1/m) * np.sum(dZ, axis=1, keepdims=True)

        return grads

    # 4.Update Parameters
    def _update_parameters(self, grads):
        L = len(self.layer_dims) - 1
        for l in range(1, L + 1):
            self.parameters_[f"W{l}"] -= self.learning_rate * grads[f"dW{l}"]
            self.parameters_[f"b{l}"] -= self.learning_rate * grads[f"db{l}"]

    # 5.Fit Model
    def fit(self, X, y):
        X = X.T  # shape: (n_features, n_samples)
        y = y.reshape(1, -1)
        self._initialize_parameters()

        for i in range(self.n_iterations):
            Y_hat, cache = self._forward_propagation(X)

            # Compute loss
            if self.loss == 'bce':
                cost = compute_bce_loss(y.flatten(), Y_hat.flatten())
            else:
                cost = compute_mse_loss(y.flatten(), Y_hat.flatten())

            grads = self._backward_propagation(y, Y_hat, cache)
            self._update_parameters(grads)
            self.costs_.append(cost)

            if i % 100 == 0:
                print(f"Iteration {i}, Loss: {cost:.6f}")

    # 6.Predict
    def predict(self, X):
        X = X.T
        Y_hat, _ = self._forward_propagation(X)
        predictions = (Y_hat > 0.5).astype(int)
        return predictions.flatten()


In [9]:
model = MyANNClassifier(layer_dims=[30, 10, 5, 1], learning_rate=0.01, n_iterations=1000, loss='bce')
model.fit(X_train_scaled, y_train)

y_pred = model.predict(X_val_scaled)

from sklearn.metrics import accuracy_score
print("\nValidation Accuracy:", accuracy_score(y_val, y_pred))

Iteration 0, Loss: 0.693145
Iteration 100, Loss: 0.680709
Iteration 200, Loss: 0.673160
Iteration 300, Loss: 0.668563
Iteration 400, Loss: 0.665753
Iteration 500, Loss: 0.664029
Iteration 600, Loss: 0.662968
Iteration 700, Loss: 0.662312
Iteration 800, Loss: 0.661907
Iteration 900, Loss: 0.661655

Validation Accuracy: 0.631578947368421


**Task 4: Training and Experimentation (15 Marks)**

Use your scaled training and validation sets ( X_train_scaled , y_train , X_val_scaled , y_val ).

1. Model 1 (BCE Loss):

* Define your layer_dims . Start with one hidden layer (e.g., [30, 10, 1] ).

* Instantiate MyANNClassifier with loss='bce' , learning_rate=0.001 , and n_iterations=5000 .

* fit the model on X_train_scaled and y_train .

* predict on X_val_scaled .

* Print the classification_report for this model.

2. Model 2 (MSE Loss):

* Instantiate a new model with the exact same parameters as Model 1, but set
loss='mse' .

* fit and predict as before.

* Print the classification_report for this model.

3. Model 3 (Deeper Architecture):

* Instantiate a new model with loss='bce' but a deeper architecture (e.g., [30, 10, 5, 1] ).

* fit and predict .

* Print the classification_report for this model.

In [11]:
from sklearn.metrics import classification_report
# Model 1 — BCE Loss (Single Hidden Layer)
print("Model 1: BCE Loss (Single Hidden Layer)")

layer_dims_1 = [30, 10, 1]  # input layer = 30 features, 1 hidden layer (10 neurons), output = 1
model1 = MyANNClassifier(layer_dims=layer_dims_1,
                         learning_rate=0.001,
                         n_iterations=5000,
                         loss='bce')

model1.fit(X_train_scaled, y_train)
y_pred1 = model1.predict(X_val_scaled)

print("\nClassification Report — Model 1 (BCE):")
print(classification_report(y_val, y_pred1))

# Model 2 — MSE Loss (Single Hidden Layer)
print("\nModel 2: MSE Loss (Single Hidden Layer)")

layer_dims_2 = [30, 10, 1]
model2 = MyANNClassifier(layer_dims=layer_dims_2,
                         learning_rate=0.001,
                         n_iterations=5000,
                         loss='mse')

model2.fit(X_train_scaled, y_train)
y_pred2 = model2.predict(X_val_scaled)

print("\nClassification Report — Model 2 (MSE):")
print(classification_report(y_val, y_pred2))

# Model 3 — BCE Loss (Deeper Architecture)
print("\nModel 3: BCE Loss (Deeper Architecture)")

layer_dims_3 = [30, 10, 5, 1]  # deeper network with 2 hidden layers
model3 = MyANNClassifier(layer_dims=layer_dims_3,
                         learning_rate=0.001,
                         n_iterations=5000,
                         loss='bce')

model3.fit(X_train_scaled, y_train)
y_pred3 = model3.predict(X_val_scaled)

print("\nClassification Report — Model 3 (Deeper BCE):")
print(classification_report(y_val, y_pred3))


Model 1: BCE Loss (Single Hidden Layer)
Iteration 0, Loss: 0.693180
Iteration 100, Loss: 0.691634
Iteration 200, Loss: 0.690164
Iteration 300, Loss: 0.688766
Iteration 400, Loss: 0.687436
Iteration 500, Loss: 0.686172
Iteration 600, Loss: 0.684969
Iteration 700, Loss: 0.683826
Iteration 800, Loss: 0.682738
Iteration 900, Loss: 0.681702
Iteration 1000, Loss: 0.680716
Iteration 1100, Loss: 0.679778
Iteration 1200, Loss: 0.678886
Iteration 1300, Loss: 0.678037
Iteration 1400, Loss: 0.677230
Iteration 1500, Loss: 0.676462
Iteration 1600, Loss: 0.675731
Iteration 1700, Loss: 0.675035
Iteration 1800, Loss: 0.674373
Iteration 1900, Loss: 0.673743
Iteration 2000, Loss: 0.673144
Iteration 2100, Loss: 0.672574
Iteration 2200, Loss: 0.672032
Iteration 2300, Loss: 0.671516
Iteration 2400, Loss: 0.671025
Iteration 2500, Loss: 0.670557
Iteration 2600, Loss: 0.670111
Iteration 2700, Loss: 0.669687
Iteration 2800, Loss: 0.669284
Iteration 2900, Loss: 0.668901
Iteration 3000, Loss: 0.668536
Iteration 3

  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))


Iteration 700, Loss: 0.123737
Iteration 800, Loss: 0.123573
Iteration 900, Loss: 0.123413
Iteration 1000, Loss: 0.123257
Iteration 1100, Loss: 0.123105
Iteration 1200, Loss: 0.122957
Iteration 1300, Loss: 0.122813
Iteration 1400, Loss: 0.122672
Iteration 1500, Loss: 0.122535
Iteration 1600, Loss: 0.122401
Iteration 1700, Loss: 0.122270
Iteration 1800, Loss: 0.122143
Iteration 1900, Loss: 0.122019
Iteration 2000, Loss: 0.121898
Iteration 2100, Loss: 0.121780
Iteration 2200, Loss: 0.121665
Iteration 2300, Loss: 0.121553
Iteration 2400, Loss: 0.121443
Iteration 2500, Loss: 0.121337
Iteration 2600, Loss: 0.121233
Iteration 2700, Loss: 0.121132
Iteration 2800, Loss: 0.121033
Iteration 2900, Loss: 0.120937
Iteration 3000, Loss: 0.120843
Iteration 3100, Loss: 0.120751
Iteration 3200, Loss: 0.120662
Iteration 3300, Loss: 0.120575
Iteration 3400, Loss: 0.120490
Iteration 3500, Loss: 0.120408
Iteration 3600, Loss: 0.120327
Iteration 3700, Loss: 0.120248
Iteration 3800, Loss: 0.120172
Iteration 3

  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))


Iteration 500, Loss: 0.686160
Iteration 600, Loss: 0.684961
Iteration 700, Loss: 0.683820
Iteration 800, Loss: 0.682734
Iteration 900, Loss: 0.681702
Iteration 1000, Loss: 0.680720
Iteration 1100, Loss: 0.679785
Iteration 1200, Loss: 0.678896
Iteration 1300, Loss: 0.678050
Iteration 1400, Loss: 0.677246
Iteration 1500, Loss: 0.676480
Iteration 1600, Loss: 0.675751
Iteration 1700, Loss: 0.675058
Iteration 1800, Loss: 0.674398
Iteration 1900, Loss: 0.673771
Iteration 2000, Loss: 0.673173
Iteration 2100, Loss: 0.672605
Iteration 2200, Loss: 0.672064
Iteration 2300, Loss: 0.671549
Iteration 2400, Loss: 0.671059
Iteration 2500, Loss: 0.670592
Iteration 2600, Loss: 0.670148
Iteration 2700, Loss: 0.669725
Iteration 2800, Loss: 0.669323
Iteration 2900, Loss: 0.668940
Iteration 3000, Loss: 0.668575
Iteration 3100, Loss: 0.668228
Iteration 3200, Loss: 0.667898
Iteration 3300, Loss: 0.667583
Iteration 3400, Loss: 0.667283
Iteration 3500, Loss: 0.666998
Iteration 3600, Loss: 0.666726
Iteration 370

  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))


**Task 5: Comparison with scikit-learn (10 Marks)**

1. Train MLPClassifier :

* Import from sklearn.neural_network import MLPClassifier .

* Instantiate MLPClassifier with parameters that roughly match your best "from scratch"
model.

* Example: MLPClassifier(hidden_layer_sizes=(10,), activation='relu', solver='adam', max_iter=1000,
learning_rate_init=0.001, random_state=42) .

* fit the MLPClassifier on X_train_scaled and y_train .

2. Evaluate MLPClassifier :

* predict on X_val_scaled .

* Print the classification_report for the sklearn model.

In [19]:
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import classification_report

print("Model 4: Scikit-Learn MLPClassifier")

# Match parameters similar to your best model (say, Model 1 or 3)
mlp_model = MLPClassifier(
    hidden_layer_sizes=(10,),
    activation='relu',
    solver='adam',
    max_iter=1000,
    learning_rate_init=0.001,
    random_state=42
)

# Train
mlp_model.fit(X_train_scaled, y_train)

# Predict
y_pred_mlp = mlp_model.predict(X_val_scaled)

# Evaluate
print("\nClassification Report — Model 4 (scikit-learn MLPClassifier):")
print(classification_report(y_val, y_pred_mlp))


Model 4: Scikit-Learn MLPClassifier

Classification Report — Model 4 (scikit-learn MLPClassifier):
              precision    recall  f1-score   support

           0       0.98      0.98      0.98        63
           1       0.99      0.99      0.99       108

    accuracy                           0.99       171
   macro avg       0.99      0.99      0.99       171
weighted avg       0.99      0.99      0.99       171

