<a href="https://colab.research.google.com/github/Shaunak-Mukherjee/ECE570-Artificial-Intelligence/blob/main/ECE570_F2024_Assignment_02_Exercise_Shaunak_Mukherjee.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# ECE 57000 Assignment 2 Exercises



Name: Shaunak Mukherjee

# Important submission information

1. Follow the instructions in the provided "uploader.ipynb" to convert your ipynb file into PDF format.
2. Please make sure to select the corresponding pages for each exercise when you submitting your PDF to Gradescope. Make sure to include both the **output** and the **code** when selecting pages. (You do not need to include the instruction for the exercises)


**We may assess a 20% penalty for those who do not correctly follow these steps.**

# 1. Task description & Background
## 1-1. Task description

In this assignment, students will implement Stochastic Gradient Descent (SGD) for logistic regression and apply backpropagation for gradient descent/SGD on neural networks. You are only allowed to use basic functions or equivalent operations of NumPy package. The dataset from Assignment 1 will be reused.

For the first part (logistic regression), students will define the model, loss function, compute gradients, and implement the SGD algorithm. In the second part, students will implement GD/SGD for a three-layer neural network, focusing on the forward pass and backpropagation.

## 1-2. Background on dataset
In this assignment, we will explore the application of logistic regression to a binary classification problem in the field of medical diagnostics similar to the first assignment. The objective is to predict whether a breast tumor is benign or malignant based on features extracted from digitized images of fine needle aspirate (FNA) of breast mass.

The dataset used is the Breast Cancer dataset from the UCI Machine Learning Repository, incorporated into scikit-learn as `load_breast_cancer`. This dataset includes measurements from 569 instances of breast tumors, with each instance described by 30 numeric attributes. These features include things like the texture, perimeter, smoothness, and symmetry of the tumor cells.

You will split the data into training and test sets, with 80% of the data used for training and the remaining 20% for testing. This setup tests the model’s ability to generalize to new, unseen data. We set the `random_state` as 42 to ensure reproducibility. The logistic regression model, initialized with the 'liblinear' solver, will be trained on the training set.



# 2. Loading and preprocessing data from the previous assignment


You can load the Breast Cancer dataset by using [this function](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_breast_cancer.html) from the `sklearn.datasets` module (we have imported the function for you). Refer to the official documentation to understand more about this function.

**Implement the Following:**
1.  `data`: Use the built-in function to load the dataset and store it in this variable.
2.  `X`: This should store the feature matrix from the dataset.
3.  `y`: This should store the target vector, which includes the labels indicating whether the tumor is benign or malignant.

`X_train, X_test, y_train, y_test`: Split `X` and `y` into training and testing sets.
   - Set `test_size` to 0.2, allocating 20% of the data for testing.
   - Use `random_state=42` to ensure that your results are reproducible.


In [None]:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

data = load_breast_cancer()
X, y = data.data, data.target
print(f'The data has a shape of {X.shape}, and the target has a shape of {y.shape}')

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(f'The training set has {X_train.shape[0]} datapoints and the test set has {X_test.shape[0]} datapoints.')

scaler = MinMaxScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
print(f'The max of training data is {X_train.max():.2f} and the min is {X_train.min():.2f}.')

The data has a shape of (569, 30), and the target has a shape of (569,)
The training set has 455 datapoints and the test set has 114 datapoints.
The max of training data is 1.00 and the min is 0.00.


# 3. Initialize and train the logistic regression model with SGD (60/100 points)


You will initialize and train a logistic regression model.


## 3-1. Defining sigmoid function and binary cross entropy function (10/100 points)
**Implement the Following:**
1. Sigmoid function: Implement the sigmoid function, which takes in a scalar or vector and returns the sigmoid of the input.
2. Binary Cross-Entropy Loss: Implement the binary cross-entropy loss function, which takes in the predictions and the true labels and returns the loss value. It is formulated as $\ell(y,\hat{y})=-\frac{1}{N} \sum_{n=1}^{N} \left[ y_n( \log \hat{y}_n ) + (1-y_n) \log (1-\hat{y}_n) \right]$.

Please implement by using basic functions in numpy.
Ensure your code is placed between the comments `<Your code>` and `<end code>`. This structure is intended to keep your implementation organized and straightforward.


In [None]:
import numpy as np
# initialize numpy random seed
np.random.seed(29)

# Sigmoid function for logistic regression
def sigmoid(z):
    # <Your code>

    """
    Computes sigmoid function on Z (scalar or vector)
    parameter z: scalar or vector
    return: sigmoid of z
    """

    return 1 / (1 + np.exp(-z))

    # <end code>

# Binary Cross-Entropy Loss
def binary_cross_entropy(y_true, y_pred):
    # Avoid log(0) by clipping predictions

    # <Your code>

    """
    Computes binary cross-entropy loss between true labels and predictions
    parameter y_true: true label
    parameter y_pred: prediction
    return: binary cross-entropy loss
    Clipped predicted values to avoid log(0) numerical instability issues in loss calculation
    and ensures y_pred is never exactly 0 or 1, preventing undefined log operations
    """

    epsilon = 1e-15
    y_pred = np.clip(y_pred, epsilon, 1 - epsilon) # This code avoids log(0)

    # Computation of binary cross-entropy loss
    loss = -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    return loss

    # <end code>

## 3-2. Defining a class `LogisticRegression` based on SGD (50/100 points)

**Implement the Following:**
1. Initialize Parameters: Implement `initialize_weights` function. Use zero initialization for `weights` and `bias`.

2. predict function: Implement the predict function, which takes in the feature matrix `X` and returns the predicted value. Assuming $W \in \mathbb{R}^{D}$, $X \in \mathbb{R}^{N \times D}$, $b \in \mathbb{R}^{1}$, the linear model is defined as $\sigma(XW + b)$, where $\sigma$ is sigmoid function.

3. fit function: Implement the fit function, which trains the logistic regression model using SGD. The function should take in the feature matrix `X`, the true labels `y`, the learning rate `lr`, and the number of epochs `n_epochs`. In specific, first, at every epoch, you may shuffle indices of `n_samples` and reorganize the order of `X` and `y` to make sure that the order is randomized per epoch. Second, make a for loop for SGD. You may want to make a small batch data like `X_batch` and `y_batch`. Third, inside of the for loop for SGD, make a prediction by using the `predict` function you implemented. Fourth, compute the gradient with respect to `weights` and with respect to `bias`. Fifth, use the gradient to update `weights` and `bias`. In other words, implement the SGD algorithm $w^{(1)}=w^{(0)}-\alpha \nabla_w (\text{BCE} (y, \hat{y} ) )$ and $b^{(1)}=b^{(0)}-\alpha \nabla_b ( \text{BCE} ( y,\hat{y} ))$, where $\alpha$ is a learning rate and $\hat{y}$ is the prediction $\sigma(XW + b)$. BCE indicates binary cross entropy loss.

You are encouraged to experiment with different architectures and learning rates to see how they affect the performance of the model.   
Make sure you get accuracy greater than **0.85** on the test set.

In [None]:
class LogisticRegression_SGD:
    def __init__(self, learning_rate=0.01, epochs=100, batch_size=32):
        self.learning_rate = learning_rate
        self.epochs = epochs
        self.batch_size = batch_size

    # Initialize weights
    def initialize_weights(self, n_features):
        """
        Initializes weights and bias to zero.
        :param n_features: Number of input features
        """
        # <Your code>

        # Zero initialization for weights and bias
        self.weights = np.zeros(n_features)
        self.bias = 0

        # <end code>


    # Prediction function
    def predict(self, X):
        # <Your code>

        """
        Predicts class labels for the input data.
        parameter X: Input data
        return: Predicted class labels
        """
        # Prediction function using the sigmoid on the linear
        # model XW + b provided above

        linear_model = np.dot(X, self.weights) + self.bias
        predictions = sigmoid(linear_model)
        return predictions

        # <end code>

    # Training function using mini-batch SGD
    def fit(self, X, y, Trigger=True):
        n_samples, n_features = X.shape
        self.initialize_weights(n_features)

        for epoch in range(self.epochs):
            # Shuffle the data
            indices = np.arange(n_samples)
            np.random.shuffle(indices)
            X = X[indices]
            y = y[indices]

            if epoch == 0:
              loss = binary_cross_entropy(y, self.predict(X))
              if Trigger:
                print("SGD loss")
                print(f"Epoch {epoch + 1}/{self.epochs}, Loss: {loss:.4f}")

            for i in range(0, n_samples, self.batch_size):
                X_batch = X[i:i + self.batch_size]
                y_batch = y[i:i + self.batch_size]

                # <Your code>

                # Predictions
                y_pred = self.predict(X_batch)

                # Compute gradients
                dw = np.dot(X_batch.T, (y_pred - y_batch)) / self.batch_size
                db = np.sum(y_pred - y_batch) / self.batch_size

                # Update weights
                self.weights = self.weights - self.learning_rate * dw

                # Update bias
                self.bias = self.bias - self.learning_rate * db


                # <end code>

            # Calculate loss for monitoring
            loss = binary_cross_entropy(y, self.predict(X))
            if Trigger:
              if (epoch + 1) % 10 == 0:
                print(f"Epoch {epoch + 1}/{self.epochs}, Loss: {loss:.4f}")

In [None]:
# You are encouraged to experiment with different architectures and learning rates to see how they affect the performance of the model.
# Training the model
model_SGD = LogisticRegression_SGD(learning_rate=0.1, epochs=100, batch_size=16)
model_SGD.fit(X_train, y_train)

SGD loss
Epoch 1/100, Loss: 0.6931
Epoch 10/100, Loss: 0.3551
Epoch 20/100, Loss: 0.2728
Epoch 30/100, Loss: 0.2340
Epoch 40/100, Loss: 0.2106
Epoch 50/100, Loss: 0.1943
Epoch 60/100, Loss: 0.1822
Epoch 70/100, Loss: 0.1726
Epoch 80/100, Loss: 0.1647
Epoch 90/100, Loss: 0.1582
Epoch 100/100, Loss: 0.1525


In [None]:
# Code to check accuracy of your implementation
from sklearn.metrics import accuracy_score

predictions = model_SGD.predict(X_test)
predictions = (predictions > 0.5).astype(int)
print(predictions)
print(y_test)
accuracy = accuracy_score(y_test, predictions)

print(f'The accuracy is {accuracy:.4f}')

[1 0 0 1 1 0 0 0 0 1 1 0 1 0 1 0 1 1 1 0 1 1 0 1 1 1 1 1 1 0 1 1 1 1 1 1 0
 1 0 1 1 0 1 1 1 1 1 1 1 1 0 0 1 1 1 1 1 0 0 1 1 0 0 1 1 1 0 0 1 1 0 0 1 0
 1 1 1 1 1 1 0 1 1 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 1 0 0 1 0 0 1 1 1 0 1 1 0
 1 1 0]
[1 0 0 1 1 0 0 0 1 1 1 0 1 0 1 0 1 1 1 0 0 1 0 1 1 1 1 1 1 0 1 1 1 1 1 1 0
 1 0 1 1 0 1 1 1 1 1 1 1 1 0 0 1 1 1 1 1 0 0 1 1 0 0 1 1 1 0 0 1 1 0 0 1 0
 1 1 1 0 1 1 0 1 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 1 0 0 1 0 0 1 1 1 0 1 1 0
 1 1 0]
The accuracy is 0.9649


### Experimenting with hyperparameters tuning used earlier to improve accuracy.

In [23]:
from sklearn.metrics import accuracy_score
from tqdm import tqdm

"""
Below is an experiment with varying hyperparameters such as epochs, learning rates,
and batch sizes. The idea here is to find the best combination of hyperparameters
that yields the highest accuracy.
"""

# List of hyperparameters to vary
epochs_list = [100, 200, 300, 400]  # Varying epochs
learning_rates = [0.001, 0.01, 0.05, 0.1, 0.5, 1]  # Varying learning rates
batches = [16, 32, 64]  # Varying batch sizes
accuracy_results = []  # Empty list to store computed accuracies

# Initialize variables
best_accuracy = 0
best_epochs = 0
best_lr = 0
best_batch = 0

# Total iterations for tqdm
total_iterations = len(epochs_list) * len(learning_rates) * len(batches)

"""
Implement tqdm progress bar for the entire process (Reference: https://github.com/tqdm/tqdm)
"""

with tqdm(total=total_iterations, desc='Hyperparameter Tuning Experimentation', unit='iteration') as progress_bar:
    # Nested loops to iterate over epochs, learning rates, and batch sizes
    for epochs in epochs_list:
        for lr in learning_rates:
            for batch in batches:
                # Initialize and fit the model for each iteration
                model_SGD = LogisticRegression_SGD(learning_rate=lr, epochs=epochs, batch_size=batch)
                model_SGD.fit(X_train, y_train, Trigger=False)

                # Make predictions and calculate accuracy
                predictions = model_SGD.predict(X_test)
                predictions = (predictions > 0.5).astype(int)
                accuracy = accuracy_score(y_test, predictions)

                # Store
                accuracy_results.append((epochs, lr, batch, accuracy))

                # Print results
                print(f'For Epochs: {epochs}, Learning Rate: {lr}, Batch Size: {batch}, the corresponding Accuracy is: {accuracy:.4f}')

                # Check if this accuracy is better than the best found so far
                if accuracy > best_accuracy:
                    best_accuracy = accuracy
                    best_epochs = epochs
                    best_lr = lr
                    best_batch = batch

                # Update progress bar
                progress_bar.update(1)

# Print the best hyper-parameter settings and corresponding accuracy
print("\nThe Best Accuracy hyper-parameter Settings:")
print(f'For Epochs: {best_epochs}, Learning Rate: {best_lr} and Batch Size: {best_batch}, we can get best accuracy: {best_accuracy:.4f}')

Hyperparameter Tuning Experimentation:   1%|▏         | 1/72 [00:00<00:11,  6.32iteration/s]

For Epochs: 100, Learning Rate: 0.001, Batch Size: 16, the corresponding Accuracy is: 0.8246


Hyperparameter Tuning Experimentation:   4%|▍         | 3/72 [00:00<00:06, 10.22iteration/s]

For Epochs: 100, Learning Rate: 0.001, Batch Size: 32, the corresponding Accuracy is: 0.7544
For Epochs: 100, Learning Rate: 0.001, Batch Size: 64, the corresponding Accuracy is: 0.7105


Hyperparameter Tuning Experimentation:   7%|▋         | 5/72 [00:00<00:08,  7.82iteration/s]

For Epochs: 100, Learning Rate: 0.01, Batch Size: 16, the corresponding Accuracy is: 0.9474
For Epochs: 100, Learning Rate: 0.01, Batch Size: 32, the corresponding Accuracy is: 0.9386
For Epochs: 100, Learning Rate: 0.01, Batch Size: 64, the corresponding Accuracy is: 0.8947


Hyperparameter Tuning Experimentation:  11%|█         | 8/72 [00:00<00:07,  8.57iteration/s]

For Epochs: 100, Learning Rate: 0.05, Batch Size: 16, the corresponding Accuracy is: 0.9561
For Epochs: 100, Learning Rate: 0.05, Batch Size: 32, the corresponding Accuracy is: 0.9386
For Epochs: 100, Learning Rate: 0.05, Batch Size: 64, the corresponding Accuracy is: 0.9474


Hyperparameter Tuning Experimentation:  17%|█▋        | 12/72 [00:01<00:05, 10.04iteration/s]

For Epochs: 100, Learning Rate: 0.1, Batch Size: 16, the corresponding Accuracy is: 0.9649
For Epochs: 100, Learning Rate: 0.1, Batch Size: 32, the corresponding Accuracy is: 0.9561
For Epochs: 100, Learning Rate: 0.1, Batch Size: 64, the corresponding Accuracy is: 0.9386


Hyperparameter Tuning Experimentation:  19%|█▉        | 14/72 [00:01<00:06,  8.36iteration/s]

For Epochs: 100, Learning Rate: 0.5, Batch Size: 16, the corresponding Accuracy is: 0.9737
For Epochs: 100, Learning Rate: 0.5, Batch Size: 32, the corresponding Accuracy is: 0.9737
For Epochs: 100, Learning Rate: 0.5, Batch Size: 64, the corresponding Accuracy is: 0.9649


Hyperparameter Tuning Experimentation:  25%|██▌       | 18/72 [00:02<00:05,  9.54iteration/s]

For Epochs: 100, Learning Rate: 1, Batch Size: 16, the corresponding Accuracy is: 0.9737
For Epochs: 100, Learning Rate: 1, Batch Size: 32, the corresponding Accuracy is: 0.9737
For Epochs: 100, Learning Rate: 1, Batch Size: 64, the corresponding Accuracy is: 0.9649
For Epochs: 200, Learning Rate: 0.001, Batch Size: 16, the corresponding Accuracy is: 0.8772


Hyperparameter Tuning Experimentation:  29%|██▉       | 21/72 [00:02<00:07,  6.75iteration/s]

For Epochs: 200, Learning Rate: 0.001, Batch Size: 32, the corresponding Accuracy is: 0.8246
For Epochs: 200, Learning Rate: 0.001, Batch Size: 64, the corresponding Accuracy is: 0.7544


Hyperparameter Tuning Experimentation:  31%|███       | 22/72 [00:02<00:09,  5.50iteration/s]

For Epochs: 200, Learning Rate: 0.01, Batch Size: 16, the corresponding Accuracy is: 0.9474


Hyperparameter Tuning Experimentation:  32%|███▏      | 23/72 [00:03<00:09,  4.99iteration/s]

For Epochs: 200, Learning Rate: 0.01, Batch Size: 32, the corresponding Accuracy is: 0.9474


Hyperparameter Tuning Experimentation:  33%|███▎      | 24/72 [00:03<00:10,  4.38iteration/s]

For Epochs: 200, Learning Rate: 0.01, Batch Size: 64, the corresponding Accuracy is: 0.9386


Hyperparameter Tuning Experimentation:  35%|███▍      | 25/72 [00:04<00:16,  2.93iteration/s]

For Epochs: 200, Learning Rate: 0.05, Batch Size: 16, the corresponding Accuracy is: 0.9649


Hyperparameter Tuning Experimentation:  36%|███▌      | 26/72 [00:04<00:15,  2.91iteration/s]

For Epochs: 200, Learning Rate: 0.05, Batch Size: 32, the corresponding Accuracy is: 0.9561


Hyperparameter Tuning Experimentation:  38%|███▊      | 27/72 [00:04<00:15,  2.98iteration/s]

For Epochs: 200, Learning Rate: 0.05, Batch Size: 64, the corresponding Accuracy is: 0.9386


Hyperparameter Tuning Experimentation:  39%|███▉      | 28/72 [00:05<00:18,  2.34iteration/s]

For Epochs: 200, Learning Rate: 0.1, Batch Size: 16, the corresponding Accuracy is: 0.9649


Hyperparameter Tuning Experimentation:  40%|████      | 29/72 [00:05<00:18,  2.36iteration/s]

For Epochs: 200, Learning Rate: 0.1, Batch Size: 32, the corresponding Accuracy is: 0.9649


Hyperparameter Tuning Experimentation:  42%|████▏     | 30/72 [00:06<00:15,  2.63iteration/s]

For Epochs: 200, Learning Rate: 0.1, Batch Size: 64, the corresponding Accuracy is: 0.9561


Hyperparameter Tuning Experimentation:  43%|████▎     | 31/72 [00:06<00:19,  2.13iteration/s]

For Epochs: 200, Learning Rate: 0.5, Batch Size: 16, the corresponding Accuracy is: 0.9737


Hyperparameter Tuning Experimentation:  46%|████▌     | 33/72 [00:07<00:13,  2.95iteration/s]

For Epochs: 200, Learning Rate: 0.5, Batch Size: 32, the corresponding Accuracy is: 0.9737
For Epochs: 200, Learning Rate: 0.5, Batch Size: 64, the corresponding Accuracy is: 0.9649


Hyperparameter Tuning Experimentation:  47%|████▋     | 34/72 [00:07<00:12,  3.01iteration/s]

For Epochs: 200, Learning Rate: 1, Batch Size: 16, the corresponding Accuracy is: 0.9737


Hyperparameter Tuning Experimentation:  50%|█████     | 36/72 [00:08<00:08,  4.07iteration/s]

For Epochs: 200, Learning Rate: 1, Batch Size: 32, the corresponding Accuracy is: 0.9737
For Epochs: 200, Learning Rate: 1, Batch Size: 64, the corresponding Accuracy is: 0.9737


Hyperparameter Tuning Experimentation:  51%|█████▏    | 37/72 [00:08<00:10,  3.21iteration/s]

For Epochs: 300, Learning Rate: 0.001, Batch Size: 16, the corresponding Accuracy is: 0.9123


Hyperparameter Tuning Experimentation:  53%|█████▎    | 38/72 [00:08<00:10,  3.31iteration/s]

For Epochs: 300, Learning Rate: 0.001, Batch Size: 32, the corresponding Accuracy is: 0.8684


Hyperparameter Tuning Experimentation:  54%|█████▍    | 39/72 [00:08<00:09,  3.66iteration/s]

For Epochs: 300, Learning Rate: 0.001, Batch Size: 64, the corresponding Accuracy is: 0.7807


Hyperparameter Tuning Experimentation:  56%|█████▌    | 40/72 [00:09<00:10,  3.03iteration/s]

For Epochs: 300, Learning Rate: 0.01, Batch Size: 16, the corresponding Accuracy is: 0.9386


Hyperparameter Tuning Experimentation:  57%|█████▋    | 41/72 [00:09<00:09,  3.18iteration/s]

For Epochs: 300, Learning Rate: 0.01, Batch Size: 32, the corresponding Accuracy is: 0.9474


Hyperparameter Tuning Experimentation:  58%|█████▊    | 42/72 [00:09<00:08,  3.50iteration/s]

For Epochs: 300, Learning Rate: 0.01, Batch Size: 64, the corresponding Accuracy is: 0.9474


Hyperparameter Tuning Experimentation:  60%|█████▉    | 43/72 [00:10<00:09,  2.95iteration/s]

For Epochs: 300, Learning Rate: 0.05, Batch Size: 16, the corresponding Accuracy is: 0.9649


Hyperparameter Tuning Experimentation:  62%|██████▎   | 45/72 [00:10<00:07,  3.51iteration/s]

For Epochs: 300, Learning Rate: 0.05, Batch Size: 32, the corresponding Accuracy is: 0.9649
For Epochs: 300, Learning Rate: 0.05, Batch Size: 64, the corresponding Accuracy is: 0.9561


Hyperparameter Tuning Experimentation:  64%|██████▍   | 46/72 [00:11<00:08,  2.94iteration/s]

For Epochs: 300, Learning Rate: 0.1, Batch Size: 16, the corresponding Accuracy is: 0.9737


Hyperparameter Tuning Experimentation:  67%|██████▋   | 48/72 [00:11<00:06,  3.58iteration/s]

For Epochs: 300, Learning Rate: 0.1, Batch Size: 32, the corresponding Accuracy is: 0.9649
For Epochs: 300, Learning Rate: 0.1, Batch Size: 64, the corresponding Accuracy is: 0.9649


Hyperparameter Tuning Experimentation:  68%|██████▊   | 49/72 [00:12<00:07,  2.90iteration/s]

For Epochs: 300, Learning Rate: 0.5, Batch Size: 16, the corresponding Accuracy is: 0.9737


Hyperparameter Tuning Experimentation:  71%|███████   | 51/72 [00:12<00:05,  3.57iteration/s]

For Epochs: 300, Learning Rate: 0.5, Batch Size: 32, the corresponding Accuracy is: 0.9737
For Epochs: 300, Learning Rate: 0.5, Batch Size: 64, the corresponding Accuracy is: 0.9737


Hyperparameter Tuning Experimentation:  72%|███████▏  | 52/72 [00:13<00:06,  2.90iteration/s]

For Epochs: 300, Learning Rate: 1, Batch Size: 16, the corresponding Accuracy is: 0.9737


Hyperparameter Tuning Experimentation:  75%|███████▌  | 54/72 [00:13<00:05,  3.55iteration/s]

For Epochs: 300, Learning Rate: 1, Batch Size: 32, the corresponding Accuracy is: 0.9737
For Epochs: 300, Learning Rate: 1, Batch Size: 64, the corresponding Accuracy is: 0.9737


Hyperparameter Tuning Experimentation:  76%|███████▋  | 55/72 [00:14<00:06,  2.58iteration/s]

For Epochs: 400, Learning Rate: 0.001, Batch Size: 16, the corresponding Accuracy is: 0.9386


Hyperparameter Tuning Experimentation:  78%|███████▊  | 56/72 [00:14<00:06,  2.63iteration/s]

For Epochs: 400, Learning Rate: 0.001, Batch Size: 32, the corresponding Accuracy is: 0.8772


Hyperparameter Tuning Experimentation:  79%|███████▉  | 57/72 [00:14<00:05,  2.98iteration/s]

For Epochs: 400, Learning Rate: 0.001, Batch Size: 64, the corresponding Accuracy is: 0.8246


Hyperparameter Tuning Experimentation:  81%|████████  | 58/72 [00:15<00:05,  2.34iteration/s]

For Epochs: 400, Learning Rate: 0.01, Batch Size: 16, the corresponding Accuracy is: 0.9561


Hyperparameter Tuning Experimentation:  82%|████████▏ | 59/72 [00:15<00:05,  2.45iteration/s]

For Epochs: 400, Learning Rate: 0.01, Batch Size: 32, the corresponding Accuracy is: 0.9474


Hyperparameter Tuning Experimentation:  83%|████████▎ | 60/72 [00:16<00:04,  2.73iteration/s]

For Epochs: 400, Learning Rate: 0.01, Batch Size: 64, the corresponding Accuracy is: 0.9474


Hyperparameter Tuning Experimentation:  85%|████████▍ | 61/72 [00:16<00:04,  2.22iteration/s]

For Epochs: 400, Learning Rate: 0.05, Batch Size: 16, the corresponding Accuracy is: 0.9649


Hyperparameter Tuning Experimentation:  86%|████████▌ | 62/72 [00:17<00:04,  2.18iteration/s]

For Epochs: 400, Learning Rate: 0.05, Batch Size: 32, the corresponding Accuracy is: 0.9649


Hyperparameter Tuning Experimentation:  88%|████████▊ | 63/72 [00:17<00:04,  1.97iteration/s]

For Epochs: 400, Learning Rate: 0.05, Batch Size: 64, the corresponding Accuracy is: 0.9561


Hyperparameter Tuning Experimentation:  89%|████████▉ | 64/72 [00:19<00:05,  1.42iteration/s]

For Epochs: 400, Learning Rate: 0.1, Batch Size: 16, the corresponding Accuracy is: 0.9737


Hyperparameter Tuning Experimentation:  90%|█████████ | 65/72 [00:19<00:05,  1.37iteration/s]

For Epochs: 400, Learning Rate: 0.1, Batch Size: 32, the corresponding Accuracy is: 0.9649


Hyperparameter Tuning Experimentation:  92%|█████████▏| 66/72 [00:20<00:04,  1.48iteration/s]

For Epochs: 400, Learning Rate: 0.1, Batch Size: 64, the corresponding Accuracy is: 0.9649


Hyperparameter Tuning Experimentation:  93%|█████████▎| 67/72 [00:21<00:04,  1.20iteration/s]

For Epochs: 400, Learning Rate: 0.5, Batch Size: 16, the corresponding Accuracy is: 0.9737


Hyperparameter Tuning Experimentation:  94%|█████████▍| 68/72 [00:22<00:02,  1.44iteration/s]

For Epochs: 400, Learning Rate: 0.5, Batch Size: 32, the corresponding Accuracy is: 0.9737


Hyperparameter Tuning Experimentation:  96%|█████████▌| 69/72 [00:22<00:01,  1.78iteration/s]

For Epochs: 400, Learning Rate: 0.5, Batch Size: 64, the corresponding Accuracy is: 0.9737


Hyperparameter Tuning Experimentation:  97%|█████████▋| 70/72 [00:22<00:01,  1.69iteration/s]

For Epochs: 400, Learning Rate: 1, Batch Size: 16, the corresponding Accuracy is: 0.9737


Hyperparameter Tuning Experimentation:  99%|█████████▊| 71/72 [00:23<00:00,  1.88iteration/s]

For Epochs: 400, Learning Rate: 1, Batch Size: 32, the corresponding Accuracy is: 0.9737


Hyperparameter Tuning Experimentation: 100%|██████████| 72/72 [00:23<00:00,  3.05iteration/s]

For Epochs: 400, Learning Rate: 1, Batch Size: 64, the corresponding Accuracy is: 0.9737

The Best Accuracy hyper-parameter Settings:
For Epochs: 100, Learning Rate: 0.5 and Batch Size: 16, we can get best accuracy: 0.9737





# 4. 3-Layer Neural Network with SGD (40/100 points)


Now, we extend our 1-layer neural network to 3-layers neural network.

**Implement the Following:**

Ensure your code is placed between the comments `<Your code>` and `<end code>`. This structure is intended to keep your implementation organized and straightforward.




## 4-1. Defining activation functions and the derivative (10/100 points)

**Implement the Following:**
1. relu function: Implement the ReLU activation function, which takes in a scalar or vector and returns the ReLU of the input.
2. relu_derivative function: Implement the derivative of the ReLU activation function, which takes in a scalar or vector and returns the derivative of the ReLU of the input.
3. sigmoid function: Implement the sigmoid activation function, which takes in a scalar or vector and returns the sigmoid of the input.

In [None]:
def relu(z):
    """ReLU activation function."""
    # <Your code>

    return np.maximum(0.0, z)

    # <end code>

def relu_derivative(z):
    """Derivative of ReLU activation function."""
    # <Your code>
    return (z > 0).astype(int)

    # <end code>

def sigmoid(z):
    """Sigmoid activation function."""
    # <Your code>
    return 1 / (1 + np.exp(-z))

    # <end code>


## 4-2. Defining 3-layer Neural Network (30/100 points)

**Implement the Following:**

1. Initialize Parameters: Implement `initialize_weights` function. Implement Kaiming initialization to initialize `weights`. Use zero initialization for `bias`.

2. forward function: Compute the pre-activation for each layer by multiplying inputs or previous activations with weights and adding biases. Apply the ReLU activation function for hidden layers and the Sigmoid function for the output layer. Finally, return the activated output of the network. The formulation of forward function can be defined as:

$$\sigma(\text{relu}(\text{relu}(XW_1+b_1)W_2+b_2)W_3+b_3)$$

3. backward function:
    1. **Compute Gradient of Loss**: Calculate the gradient of the loss with respect to the network's output.
    2. **Compute Gradients for Weights and Biases**: Use the gradients from the output to compute the gradients of weights and biases at each layer, applying the activation function's derivative where needed.
    3. **Propagate Gradients Backward**: Continue to backpropagate the gradients through the network, adjusting calculations as you move from one layer to the previous.
    4. **Update Parameters**: Update all weights and biases using the calculated gradients and learning rate.

    This backpropagation adjusts the model parameters to minimize the loss.

4. predict function: Implement the predict function, which takes in the feature matrix `X` and returns the predicted class (0 or 1).

You are encouraged to experiment with different architectures and learning rates to see how they affect the performance of the model.

Make sure you get accuracy greater than **0.75** on the test set.

In [None]:
# Neural Network Model with an additional hidden layer
class NeuralNetwork:
    def __init__(self, input_size, hidden_size1, hidden_size2, output_size, learning_rate=0.01, epochs=100, batch_size=32):
        """
        Initialize the Neural Network with given parameters.
        :param input_size: Number of input features
        :param hidden_size1: Number of neurons in the first hidden layer
        :param hidden_size2: Number of neurons in the second hidden layer
        :param output_size: Number of output neurons (1 for binary classification)
        :param learning_rate: Learning rate for weight updates
        :param epochs: Number of training iterations
        :param batch_size: Size of mini-batches for SGD
        """
        self.input_size = input_size
        self.hidden_size1 = hidden_size1
        self.hidden_size2 = hidden_size2
        self.output_size = output_size
        self.learning_rate = learning_rate
        self.epochs = epochs
        self.batch_size = batch_size
        self.initialize_weights()
        self.loss_history = []

    def initialize_weights(self):
        """Initialize weights and biases using Kaiming initialization."""

        # <Your code>

        # Kaiming initialization for the first layer weights
        self.W1 = np.random.randn(self.input_size, self.hidden_size1) * np.sqrt(2 / self.input_size)
        self.b1 = np.zeros(self.hidden_size1)


        # Kaiming initialization for the second layer weights
        self.W2 = np.random.randn(self.hidden_size1, self.hidden_size2) * np.sqrt(2 / self.hidden_size1)
        self.b2 = np.zeros(self.hidden_size2)


        # Kaiming initialization for the third layer weights
        self.W3 = np.random.randn(self.hidden_size2, self.output_size) * np.sqrt(2 / self.hidden_size2)
        self.b3 = np.zeros(self.output_size)


        # <end code>


    def forward(self, X):
        """
        Forward pass through the network.
        :param X: Input data
        :return: Activated output of the network
        """
        # <Your code>

        # Forward pass for the first hidden layer
        self.Z1 = np.dot(X, self.W1) + self.b1
        self.A1 = relu(self.Z1)

        # Forward pass for the second hidden layer
        self.Z2 = np.dot(self.A1, self.W2) + self.b2
        self.A2 = relu(self.Z2)

        # Forward pass for the output layer
        self.Z3 = np.dot(self.A2, self.W3) + self.b3
        A3 = sigmoid(self.Z3)

        return A3
        # <end code>

    def backward(self, X, y, output):
        """
        Backpropagation to compute gradients and update weights.
        :param X: Input data
        :param y: True labels
        :param output: Predicted output from forward pass
        """
        m = X.shape[0]

        # Gradient of loss w.r.t. output (binary cross-entropy with sigmoid activation)
        dZ3 = output - y[:,None]  # Gradient wrt Z3 when using sigmoid activation at output


        # <Your code>

        # Gradients for the third layer (output layer)
        dW3 = (1/m) * np.dot(self.A2.T, dZ3)
        db3 = (1/m) * np.sum(dZ3, axis=0)


        # Gradients for the second hidden layer
        dZ2 = np.dot(dZ3, self.W3.T) * relu_derivative(self.Z2)


        # Gradients for the first hidden layer
        dZ1 = np.dot(dZ2, self.W2.T) * relu_derivative(self.Z1)



        # Update weights and biases using gradients
        self.W3 -= self.learning_rate * dW3
        self.b3 -= self.learning_rate * db3

        # <end code>

    def fit(self, X, y, Trigger =True):
        """
        Train the neural network using mini-batch SGD.
        :param X: Training data
        :param y: True labels
        """
        loss = binary_cross_entropy(y, self.forward(X))
        if Trigger:
          print(f"Epoch 0/{self.epochs}, Loss: {loss:.4f}")

        for epoch in range(self.epochs):
            indices = np.arange(X.shape[0])
            np.random.shuffle(indices)
            X = X[indices]
            y = y[indices]

            for i in range(0, X.shape[0], self.batch_size):
                X_batch = X[i:i + self.batch_size]
                y_batch = y[i:i + self.batch_size]

                # Forward and backward pass
                output = self.forward(X_batch)
                self.backward(X_batch, y_batch, output)

            # Calculate and print loss for monitoring

                loss = binary_cross_entropy(y, self.forward(X))
                self.loss_history.append(loss)  # Store loss
                if Trigger and (epoch + 1) % 100 == 0:
                    print(f"Epoch {epoch + 1}/{self.epochs}, Loss: {loss:.4f}")

    def get_loss_history(self):
            """Return the loss history."""
            return self.loss_history


    def predict(self, X):
        """
        Predict using the trained neural network.
        :param X: Input data
        :return: Predicted labels
        """

        # <Your code>
        predictions = self.forward(X)
        return (predictions > 0.5).astype(int)  # Convert probabilities to binary predictions
        # <end code>

In [None]:
# You are encouraged to experiment with different architectures and learning rates to see how they affect the performance of the model.
nn_network = NeuralNetwork(input_size=X_train.shape[1], hidden_size1=8, hidden_size2=4, output_size=1, learning_rate=0.007, epochs=1000, batch_size=32)
nn_network.fit(X_train, y_train)

Epoch 0/1000, Loss: 0.8212
Epoch 100/1000, Loss: 0.6648
Epoch 100/1000, Loss: 0.6648
Epoch 100/1000, Loss: 0.6648
Epoch 100/1000, Loss: 0.6648
Epoch 100/1000, Loss: 0.6648
Epoch 100/1000, Loss: 0.6648
Epoch 100/1000, Loss: 0.6648
Epoch 100/1000, Loss: 0.6648
Epoch 100/1000, Loss: 0.6648
Epoch 100/1000, Loss: 0.6648
Epoch 100/1000, Loss: 0.6648
Epoch 100/1000, Loss: 0.6648
Epoch 100/1000, Loss: 0.6648
Epoch 100/1000, Loss: 0.6648
Epoch 100/1000, Loss: 0.6649
Epoch 200/1000, Loss: 0.6665
Epoch 200/1000, Loss: 0.6665
Epoch 200/1000, Loss: 0.6665
Epoch 200/1000, Loss: 0.6665
Epoch 200/1000, Loss: 0.6665
Epoch 200/1000, Loss: 0.6665
Epoch 200/1000, Loss: 0.6665
Epoch 200/1000, Loss: 0.6665
Epoch 200/1000, Loss: 0.6665
Epoch 200/1000, Loss: 0.6665
Epoch 200/1000, Loss: 0.6665
Epoch 200/1000, Loss: 0.6665
Epoch 200/1000, Loss: 0.6665
Epoch 200/1000, Loss: 0.6665
Epoch 200/1000, Loss: 0.6665
Epoch 300/1000, Loss: 0.6691
Epoch 300/1000, Loss: 0.6691
Epoch 300/1000, Loss: 0.6690
Epoch 300/1000, 

In [None]:
# Code to check accuracy of your implementation
predictions = nn_network.predict(X_test)
print(predictions.reshape(-1))
print(y_test)
accuracy = accuracy_score(y_test, predictions)

print(f'The accuracy is {accuracy:.4f}')

[1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 0 1 1 1 1 0 1 0 1 1 1 1 1 1 1
 1 1 1 1 0 1 1 1 1 1 1 1 1 0 0 1 0 1 1 1 0 1 1 1 1 1 1 1 1 0 0 1 1 0 1 1 1
 1 1 1 1 1 1 0 1 0 1 0 1 0 1 1 1 1 1 1 1 1 1 0 0 1 0 0 1 1 1 1 1 1 1 1 1 1
 1 0 0]
[1 0 0 1 1 0 0 0 1 1 1 0 1 0 1 0 1 1 1 0 0 1 0 1 1 1 1 1 1 0 1 1 1 1 1 1 0
 1 0 1 1 0 1 1 1 1 1 1 1 1 0 0 1 1 1 1 1 0 0 1 1 0 0 1 1 1 0 0 1 1 0 0 1 0
 1 1 1 0 1 1 0 1 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 1 0 0 1 0 0 1 1 1 0 1 1 0
 1 1 0]
The accuracy is 0.7719


hence greater than 0.75 achieved.

## Experimenting with hyperparameters tuning used earlier to improve accuracy

In [22]:
from sklearn.metrics import accuracy_score
from tqdm import tqdm
import numpy as np

# List of hyperparameters to vary
epochs_list = [500, 1000, 1500, 2000]  # Varying epochs
learning_rates = [0.001, 0.005, 0.0100, 0.1000]  # Varying learning rates
batches = [16, 32, 64]  # Varying batch sizes
accuracy_results = []  # Empty list to store computed accuracies

# Initialize variables
best_accuracy = 0
best_epochs = 0
best_lr = 0
best_batch = 0

# Total iterations for tqdm
total_iterations = len(epochs_list) * len(learning_rates) * len(batches)

# Create a single tqdm progress bar for the entire process
with tqdm(total=total_iterations, desc='Hyperparameter Tuning', unit='iteration') as progress_bar:
    # Nested loops to iterate over epochs, learning rates, and batch sizes
    for epochs in epochs_list:
        for lr in learning_rates:
            for batch in batches:
                # Fit the model for each iteration
                nn_model = NeuralNetwork(input_size=X_train.shape[1],
                                         hidden_size1=8,
                                         hidden_size2=4,
                                         output_size=1,
                                         learning_rate=lr,
                                         epochs=epochs,
                                         batch_size=batch)

                nn_model.fit(X=X_train, y=y_train, Trigger=False)


                # Predictions and accuracy calculation
                predictions = nn_model.predict(X_test)
                predictions = predictions.reshape(-1)
                accuracy = accuracy_score(y_test, predictions)
                loss_history = nn_model.get_loss_history()

                # Store the accuracy result
                accuracy_results.append((epochs, lr, batch, accuracy))

                # Print results
                print(f'For Epochs: {epochs}, Learning Rate: {lr}, Batch Size: {batch}, the corresponding Loss is {loss_history[-1]:.4f} and Accuracy is: {accuracy:.4f}')

                # Check if this accuracy is better than previous
                if accuracy > best_accuracy:
                    best_accuracy = accuracy
                    best_epochs = epochs
                    best_lr = lr
                    best_batch = batch

                # Update tqdm progress bar
                progress_bar.update(1)

# Print the best hyper-parameters and accuracy
print("\nBest Accuracy Settings:")
print(f'For Epochs: {best_epochs}, Learning Rate: {best_lr}, and Batch Size: {best_batch}, the Best Accuracy is: {best_accuracy:.4f}')

Hyperparameter Tuning:   2%|▏         | 1/48 [00:17<13:42, 17.51s/iteration]

For Epochs: 500, Learning Rate: 0.001, Batch Size: 16, the corresponding Loss is 0.6778 and Accuracy is: 0.8070


Hyperparameter Tuning:   4%|▍         | 2/48 [00:29<10:50, 14.15s/iteration]

For Epochs: 500, Learning Rate: 0.001, Batch Size: 32, the corresponding Loss is 0.6771 and Accuracy is: 0.7982


Hyperparameter Tuning:   6%|▋         | 3/48 [00:34<07:22,  9.84s/iteration]

For Epochs: 500, Learning Rate: 0.001, Batch Size: 64, the corresponding Loss is 0.6749 and Accuracy is: 0.7456


Hyperparameter Tuning:   8%|▊         | 4/48 [00:52<09:38, 13.15s/iteration]

For Epochs: 500, Learning Rate: 0.005, Batch Size: 16, the corresponding Loss is 0.6598 and Accuracy is: 0.6228


Hyperparameter Tuning:  10%|█         | 5/48 [01:01<08:21, 11.65s/iteration]

For Epochs: 500, Learning Rate: 0.005, Batch Size: 32, the corresponding Loss is 0.7527 and Accuracy is: 0.8596


Hyperparameter Tuning:  12%|█▎        | 6/48 [01:06<06:41,  9.56s/iteration]

For Epochs: 500, Learning Rate: 0.005, Batch Size: 64, the corresponding Loss is 0.6598 and Accuracy is: 0.6228


Hyperparameter Tuning:  15%|█▍        | 7/48 [01:31<09:51, 14.42s/iteration]

For Epochs: 500, Learning Rate: 0.01, Batch Size: 16, the corresponding Loss is 0.7124 and Accuracy is: 0.8246


Hyperparameter Tuning:  17%|█▋        | 8/48 [01:42<08:58, 13.46s/iteration]

For Epochs: 500, Learning Rate: 0.01, Batch Size: 32, the corresponding Loss is 0.6757 and Accuracy is: 0.7018


Hyperparameter Tuning:  19%|█▉        | 9/48 [01:48<07:07, 10.97s/iteration]

For Epochs: 500, Learning Rate: 0.01, Batch Size: 64, the corresponding Loss is 0.6744 and Accuracy is: 0.7456


Hyperparameter Tuning:  21%|██        | 10/48 [02:09<09:03, 14.31s/iteration]

For Epochs: 500, Learning Rate: 0.1, Batch Size: 16, the corresponding Loss is 0.9729 and Accuracy is: 0.8333


Hyperparameter Tuning:  23%|██▎       | 11/48 [02:19<07:55, 12.85s/iteration]

For Epochs: 500, Learning Rate: 0.1, Batch Size: 32, the corresponding Loss is 0.9513 and Accuracy is: 0.7544


Hyperparameter Tuning:  25%|██▌       | 12/48 [02:24<06:17, 10.48s/iteration]

For Epochs: 500, Learning Rate: 0.1, Batch Size: 64, the corresponding Loss is 1.0138 and Accuracy is: 0.8421


Hyperparameter Tuning:  27%|██▋       | 13/48 [02:59<10:32, 18.07s/iteration]

For Epochs: 1000, Learning Rate: 0.001, Batch Size: 16, the corresponding Loss is 0.6732 and Accuracy is: 0.7105


Hyperparameter Tuning:  29%|██▉       | 14/48 [03:19<10:25, 18.41s/iteration]

For Epochs: 1000, Learning Rate: 0.001, Batch Size: 32, the corresponding Loss is 0.6641 and Accuracy is: 0.6579


Hyperparameter Tuning:  31%|███▏      | 15/48 [03:29<08:42, 15.85s/iteration]

For Epochs: 1000, Learning Rate: 0.001, Batch Size: 64, the corresponding Loss is 0.6665 and Accuracy is: 0.6316


Hyperparameter Tuning:  33%|███▎      | 16/48 [04:05<11:42, 21.96s/iteration]

For Epochs: 1000, Learning Rate: 0.005, Batch Size: 16, the corresponding Loss is 0.9854 and Accuracy is: 0.8772


Hyperparameter Tuning:  35%|███▌      | 17/48 [04:23<10:48, 20.91s/iteration]

For Epochs: 1000, Learning Rate: 0.005, Batch Size: 32, the corresponding Loss is 0.7175 and Accuracy is: 0.8684


Hyperparameter Tuning:  38%|███▊      | 18/48 [04:33<08:51, 17.70s/iteration]

For Epochs: 1000, Learning Rate: 0.005, Batch Size: 64, the corresponding Loss is 0.6801 and Accuracy is: 0.7807


Hyperparameter Tuning:  40%|███▉      | 19/48 [05:10<11:15, 23.30s/iteration]

For Epochs: 1000, Learning Rate: 0.01, Batch Size: 16, the corresponding Loss is 0.9894 and Accuracy is: 0.9298


Hyperparameter Tuning:  42%|████▏     | 20/48 [05:28<10:13, 21.90s/iteration]

For Epochs: 1000, Learning Rate: 0.01, Batch Size: 32, the corresponding Loss is 0.6597 and Accuracy is: 0.6228


Hyperparameter Tuning:  44%|████▍     | 21/48 [05:39<08:18, 18.45s/iteration]

For Epochs: 1000, Learning Rate: 0.01, Batch Size: 64, the corresponding Loss is 0.7158 and Accuracy is: 0.8158


Hyperparameter Tuning:  46%|████▌     | 22/48 [06:15<10:15, 23.66s/iteration]

For Epochs: 1000, Learning Rate: 0.1, Batch Size: 16, the corresponding Loss is 0.8039 and Accuracy is: 0.6930


Hyperparameter Tuning:  48%|████▊     | 23/48 [06:34<09:21, 22.44s/iteration]

For Epochs: 1000, Learning Rate: 0.1, Batch Size: 32, the corresponding Loss is 1.2117 and Accuracy is: 0.8684


Hyperparameter Tuning:  50%|█████     | 24/48 [06:45<07:33, 18.91s/iteration]

For Epochs: 1000, Learning Rate: 0.1, Batch Size: 64, the corresponding Loss is 0.8576 and Accuracy is: 0.7982


Hyperparameter Tuning:  52%|█████▏    | 25/48 [07:39<11:18, 29.50s/iteration]

For Epochs: 1500, Learning Rate: 0.001, Batch Size: 16, the corresponding Loss is 0.7662 and Accuracy is: 0.8947


Hyperparameter Tuning:  54%|█████▍    | 26/48 [08:08<10:42, 29.20s/iteration]

For Epochs: 1500, Learning Rate: 0.001, Batch Size: 32, the corresponding Loss is 0.6831 and Accuracy is: 0.7368


Hyperparameter Tuning:  56%|█████▋    | 27/48 [08:22<08:42, 24.87s/iteration]

For Epochs: 1500, Learning Rate: 0.001, Batch Size: 64, the corresponding Loss is 0.6703 and Accuracy is: 0.6930


Hyperparameter Tuning:  58%|█████▊    | 28/48 [09:16<11:09, 33.49s/iteration]

For Epochs: 1500, Learning Rate: 0.005, Batch Size: 16, the corresponding Loss is 1.0452 and Accuracy is: 0.9298


Hyperparameter Tuning:  60%|██████    | 29/48 [09:44<10:05, 31.89s/iteration]

For Epochs: 1500, Learning Rate: 0.005, Batch Size: 32, the corresponding Loss is 0.6598 and Accuracy is: 0.6228


Hyperparameter Tuning:  62%|██████▎   | 30/48 [09:59<08:01, 26.73s/iteration]

For Epochs: 1500, Learning Rate: 0.005, Batch Size: 64, the corresponding Loss is 0.6764 and Accuracy is: 0.7368


Hyperparameter Tuning:  65%|██████▍   | 31/48 [10:52<09:51, 34.79s/iteration]

For Epochs: 1500, Learning Rate: 0.01, Batch Size: 16, the corresponding Loss is 0.9310 and Accuracy is: 0.7982


Hyperparameter Tuning:  67%|██████▋   | 32/48 [11:21<08:45, 32.86s/iteration]

For Epochs: 1500, Learning Rate: 0.01, Batch Size: 32, the corresponding Loss is 0.9938 and Accuracy is: 0.9211


Hyperparameter Tuning:  69%|██████▉   | 33/48 [11:35<06:51, 27.41s/iteration]

For Epochs: 1500, Learning Rate: 0.01, Batch Size: 64, the corresponding Loss is 0.7405 and Accuracy is: 0.8246


Hyperparameter Tuning:  71%|███████   | 34/48 [12:29<08:14, 35.34s/iteration]

For Epochs: 1500, Learning Rate: 0.1, Batch Size: 16, the corresponding Loss is 0.7445 and Accuracy is: 0.7281


Hyperparameter Tuning:  73%|███████▎  | 35/48 [12:58<07:11, 33.22s/iteration]

For Epochs: 1500, Learning Rate: 0.1, Batch Size: 32, the corresponding Loss is 1.0723 and Accuracy is: 0.8333


Hyperparameter Tuning:  75%|███████▌  | 36/48 [13:12<05:32, 27.67s/iteration]

For Epochs: 1500, Learning Rate: 0.1, Batch Size: 64, the corresponding Loss is 0.6762 and Accuracy is: 0.6140


Hyperparameter Tuning:  77%|███████▋  | 37/48 [14:23<07:27, 40.65s/iteration]

For Epochs: 2000, Learning Rate: 0.001, Batch Size: 16, the corresponding Loss is 0.6674 and Accuracy is: 0.6579


Hyperparameter Tuning:  79%|███████▉  | 38/48 [15:01<06:36, 39.69s/iteration]

For Epochs: 2000, Learning Rate: 0.001, Batch Size: 32, the corresponding Loss is 0.7190 and Accuracy is: 0.8947


Hyperparameter Tuning:  81%|████████▏ | 39/48 [15:22<05:06, 34.11s/iteration]

For Epochs: 2000, Learning Rate: 0.001, Batch Size: 64, the corresponding Loss is 0.6652 and Accuracy is: 0.6491


Hyperparameter Tuning:  83%|████████▎ | 40/48 [16:32<05:59, 44.94s/iteration]

For Epochs: 2000, Learning Rate: 0.005, Batch Size: 16, the corresponding Loss is 0.7392 and Accuracy is: 0.8070


Hyperparameter Tuning:  85%|████████▌ | 41/48 [17:10<04:59, 42.83s/iteration]

For Epochs: 2000, Learning Rate: 0.005, Batch Size: 32, the corresponding Loss is 0.7342 and Accuracy is: 0.8772


Hyperparameter Tuning:  88%|████████▊ | 42/48 [17:29<03:34, 35.71s/iteration]

For Epochs: 2000, Learning Rate: 0.005, Batch Size: 64, the corresponding Loss is 0.6610 and Accuracy is: 0.6228


Hyperparameter Tuning:  90%|████████▉ | 43/48 [18:40<03:51, 46.33s/iteration]

For Epochs: 2000, Learning Rate: 0.01, Batch Size: 16, the corresponding Loss is 0.7230 and Accuracy is: 0.7105


Hyperparameter Tuning:  92%|█████████▏| 44/48 [19:18<02:55, 43.76s/iteration]

For Epochs: 2000, Learning Rate: 0.01, Batch Size: 32, the corresponding Loss is 0.7759 and Accuracy is: 0.8596


Hyperparameter Tuning:  94%|█████████▍| 45/48 [19:38<01:50, 36.70s/iteration]

For Epochs: 2000, Learning Rate: 0.01, Batch Size: 64, the corresponding Loss is 0.7814 and Accuracy is: 0.8772


Hyperparameter Tuning:  96%|█████████▌| 46/48 [20:49<01:33, 46.88s/iteration]

For Epochs: 2000, Learning Rate: 0.1, Batch Size: 16, the corresponding Loss is 1.5798 and Accuracy is: 0.9123


Hyperparameter Tuning:  98%|█████████▊| 47/48 [21:27<00:44, 44.18s/iteration]

For Epochs: 2000, Learning Rate: 0.1, Batch Size: 32, the corresponding Loss is 1.5078 and Accuracy is: 0.9211


Hyperparameter Tuning: 100%|██████████| 48/48 [21:46<00:00, 27.22s/iteration]

For Epochs: 2000, Learning Rate: 0.1, Batch Size: 64, the corresponding Loss is 1.1712 and Accuracy is: 0.8684

Best Accuracy Settings:
For Epochs: 1000, Learning Rate: 0.01, and Batch Size: 16, the Best Accuracy is: 0.9298





End of Assignment 2