Find the optimal depth and width (number of hidden neurons per layer) for the neural network designed in Questions 1 and 2.

#### Plot the mean cross-validation accuracies on the final epoch for at least 8 different combinations of depth (limited to 1-3 layers) and width (limited to 64, 128 or 256 neurons) using a scatter plot. Continue using 5-fold cross-validation on the training dataset. Select the optimal combination of depth and width. State the rationale for your selection. Plot the train and test accuracies against training epochs with the optimal combination using a line plot. [optional + 2 marks] Implement an alternative approach that searches through these combinations, significantly reducing the computational time while achieving similar search results, without enumerating all the possibilities.



This might take a while to run, approximately 30 - 60 min, so plan your time carefully.

1. First, we import the relevant libraries.

In [1]:
import tqdm
import time
import random
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import torch
from torch import nn
from torch.utils.data import Dataset
from torch.utils.data import DataLoader

from scipy.io import wavfile as wav

from sklearn import preprocessing
from sklearn.model_selection import KFold
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score, precision_score, recall_score, confusion_matrix
from common_utils import set_seed

# setting seed
set_seed()

2. To reduce repeated code, place your

- network (MLP defined in QA1)
- torch datasets (CustomDataset defined in QA1)
- loss function (loss_fn defined in QA1)

in a separate file called **common_utils.py**

Import them into this file. You will not be penalised again for any error in QA1 here, as the code in QA1 will not be re-marked.

The following code cell will not be marked.

In [2]:
# YOUR CODE HERE
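from common_utils import MLP, split_dataset, preprocess_dataset, CustomDataset, loss_fn, preprocess, EarlyStopper

# Redefine MLP so that the number of hidden layers (depth) is configurable.
# The subclass deliberately reuses the name MLP and shadows the imported class.

class MLP(MLP):
    def __init__(self, no_features, num_neurons, depth, no_labels):
        # Initialise the base class, then overwrite its mlp_stack below
        super().__init__(no_features, num_neurons, no_labels)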
        layers = []
        layers.append(nn.Linear(no_features, num_neurons))
        layers.append(nn.ReLU())
        layers.append(nn.Dropout(0.2))

        for _ in range(depth - 1):
            layers.append(nn.Linear(num_neurons, num_neurons))
            layers.append(nn.ReLU())
            layers.append(nn.Dropout(0.2))

        layers.append(nn.Linear(num_neurons, no_labels))
        layers.append(nn.Sigmoid())

        self.mlp_stack = nn.Sequential(*layers)

def train_loop(dataloader, model, loss_fn, optimizer):
    model.train()
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    train_loss, correct = 0, 0
    for batch, (X, y) in enumerate(dataloader):
        # FP
        pred = model(X)
        loss = loss_fn(pred, y)
        train_loss += loss.item()
        correct += ((pred > 0.5).type(torch.float) == y).type(torch.float).sum().item()

        # BP
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if batch % 100 == 0:
            loss, current = loss.item(), (batch + 1) * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")
    
    train_loss /= num_batches
    correct /= size
    print(f"Train Error: \n Accuracy: {(correct*100):>0.1f}%, Avg loss: {train_loss:>8f} \n")
    
    return train_loss, correct

def test_loop(dataloader, model, loss_fn):
    model.eval()
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    test_loss, correct = 0, 0

    with torch.no_grad():
        for X, y in dataloader:
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += ((pred > 0.5).type(torch.float) == y).type(torch.float).sum().item()

    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
    
    return test_loss, correct

df = pd.read_csv('simplified.csv')
df['label'] = df['filename'].str.split('_').str[-2]

X_train, y_train, X_test, y_test = preprocess(df)

train_data = CustomDataset(X_train, y_train)
test_data = CustomDataset(X_test, y_test)

3. Perform hyperparameter tuning over the different depths and widths with 5-fold cross-validation.

In [3]:
def train(model, X_train_scaled, y_train2, X_val_scaled, y_val2, batch_size):

    # YOUR CODE HERE
    train_data = CustomDataset(X_train_scaled, y_train2)
    val_data = CustomDataset(X_val_scaled, y_val2)

    train_dataloader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
    # Validation data does not need shuffling; a fixed order keeps evaluation deterministic
    val_dataloader = DataLoader(val_data, batch_size=batch_size, shuffle=False)

    loss_fn = nn.BCELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
    early_stopper = EarlyStopper(patience=3, min_delta=0)

    train_accuracies = []
    train_losses = []
    test_accuracies = []
    test_losses = []

    times = []

    for t in range(100):
        print(f"Epoch {t+1}\n-------------------------------")
        
        # Train
        start = time.time()
        train_loss, train_acc = train_loop(train_dataloader, model, loss_fn, optimizer)

        # Log train
        train_accuracies.append(train_acc)
        train_losses.append(train_loss)

        # Validation
        test_loss, test_acc = test_loop(val_dataloader, model, loss_fn)
        end = time.time()

        # Log test (the recorded time covers both training and validation for this epoch)
        times.append(end - start)
        test_accuracies.append(test_acc)
        test_losses.append(test_loss)

        if early_stopper.early_stop(test_loss):
            print("Early stopping")
            break
    return train_accuracies, train_losses, test_accuracies, test_losses, times

In [None]:
def find_optimal_hyperparameter(X_train, y_train, parameters, mode, batch_size):
    # YOUR CODE HERE
    kf = KFold(n_splits=5, shuffle=True, random_state=0)
    
    cross_validation_accuracies = []
    cross_validation_times = []

    for parameter in parameters:
        print(f"Parameter {parameter}")
        accuracies = []
        times = []
        depth, num_neuron = parameter
        for train_index, val_index in kf.split(X_train):
            X_train2, X_val2 = X_train[train_index], X_train[val_index]
            y_train2, y_val2 = y_train[train_index], y_train[val_index]

            # Build a fresh model for every fold; input size is taken from the data
            model = MLP(X_train.shape[1], num_neuron, depth, 1)

            # Named epoch_times so the imported time module is not shadowed
            _, _, test_accuracies, _, epoch_times = train(model, X_train2, y_train2, X_val2, y_val2, batch_size)
            
            # Save the accuracy for each fold at the last epoch
            accuracies.append(test_accuracies[-1])
            
            # Save the duration of the last epoch (training plus validation) for each fold
            times.append(epoch_times[-1])
            
        # Mean Accuracy for the Number of Neurons (Mean of Accuracy at Last Epoch for each Fold)
        cross_validation_accuracies.append(np.mean(accuracies))

        # Mean Time taken for the Number of Neurons (Mean of Time at Last Epoch for each Fold)
        cross_validation_times.append(np.mean(times))

    return cross_validation_accuracies, cross_validation_times

'''
optimal_bs = 0. Fill your optimal batch size in the following code.
'''
# YOUR CODE HERE
optimal_bs = 512
num_neurons = [64,128,256]
depths = [1,2,3]
parameters = [(d, n) for d in depths for n in num_neurons]
cross_validation_accuracies, cross_validation_times = find_optimal_hyperparameter(X_train, y_train, parameters, 'num_neurons', optimal_bs)

In [None]:
print(f"The number of considered combinations is {len(parameters)}")
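
As a quick illustration of what the 5-fold split does, the sketch below runs the same `KFold` configuration on a hypothetical toy array of 10 samples (`toy_X` is a stand-in, not the assignment data; `fold_sizes` and `seen_val_indices` are names introduced only for this example). Each sample should end up in the validation set exactly once.

```python
import numpy as np
from sklearn.model_selection import KFold

# Hypothetical toy data: 10 samples with 2 features (stand-in for X_train)
toy_X = np.arange(20).reshape(10, 2)

# Same splitter settings as in find_optimal_hyperparameter
kf = KFold(n_splits=5, shuffle=True, random_state=0)

fold_sizes = []
seen_val_indices = []
for fold, (train_index, val_index) in enumerate(kf.split(toy_X)):
    # Record how many samples go to training and validation in this fold
    fold_sizes.append((len(train_index), len(val_index)))
    seen_val_indices.extend(val_index.tolist())
    print(f"fold {fold}: train={len(train_index)} val={len(val_index)}")
```

With 10 samples and 5 splits, every fold trains on 8 samples and validates on 2.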

4. Plot the mean cross-validation accuracies on the final epoch for at least 8 different combinations of depth (limited to 1-3 layers) and width (limited to 64, 128 or 256 neurons) using a scatter plot.

In [None]:
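# YOUR CODE HERE
# One x-value per (depth, width) combination so x and y have the same length
widths = [n for _, n in parameters]
plt.scatter(widths, cross_validation_accuracies, marker='x')
plt.xticks(num_neurons)
plt.xlabel('Number of neurons')
plt.ylabel('Mean CV accuracy (final epoch)')
plt.title('Accuracy vs Number of neurons')
plt.show()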

In [None]:
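# YOUR CODE HERE
# One x-value per (depth, width) combination so x and y have the same length
combo_depths = [d for d, _ in parameters]
plt.scatter(combo_depths, cross_validation_accuracies, marker='x')
plt.xticks(depths)
plt.xlabel('Depth (number of hidden layers)')
plt.ylabel('Mean CV accuracy (final epoch)')
plt.title('Accuracy vs Depth')
plt.show()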
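
Since each accuracy belongs to a (depth, width) pair, it may be easier to read all combinations on a single scatter plot. The following is a minimal sketch, assuming `parameters` is the list of `(depth, width)` tuples and the accuracies are in the same order; `plot_cv_scatter` is a hypothetical helper, not part of the assignment template.

```python
import matplotlib.pyplot as plt

def plot_cv_scatter(parameters, accuracies):
    """Scatter the mean CV accuracy of every (depth, width) pair on one set of axes."""
    fig, ax = plt.subplots()
    depths_seen = sorted({d for d, _ in parameters})
    for depth in depths_seen:
        # Pick out the points for this depth so each depth gets its own legend entry
        widths = [n for (d, n) in parameters if d == depth]
        accs = [a for (d, _), a in zip(parameters, accuracies) if d == depth]
        ax.scatter(widths, accs, marker='x', label=f"depth {depth}")
    ax.set_xticks(sorted({n for _, n in parameters}))
    ax.set_xlabel('Number of neurons per hidden layer')
    ax.set_ylabel('Mean CV accuracy (final epoch)')
    ax.set_title('Accuracy vs width, grouped by depth')
    ax.legend()
    return ax
```

In the notebook this could be called as `plot_cv_scatter(parameters, cross_validation_accuracies)` followed by `plt.show()`.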

5. Select the optimal combination for the depth and width. State the rationale for your selection.

In [3]:
optimal_combination = [3,256]
reason = "Depth 3 with 256 neurons gives the highest mean cross-validation accuracy of the combinations tried. Depth 1 appears insufficient to capture the complexity of the data, which suggests underfitting. Depths 2 and 3 perform similarly, but depth 3 has the higher accuracy. Among the widths, 256 neurons gives the highest accuracy."
# YOUR CODE HERE
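
When several combinations score similarly, model size is one possible tie-breaker for the rationale. The sketch below counts trainable parameters for each combination, assuming 77 input features and 1 output (as in the `MLP(77, num_neuron, depth, 1)` call above) and the layer layout of the redefined `MLP`, where only the `nn.Linear` layers carry weights and biases. `count_mlp_parameters` and `param_counts` are hypothetical names introduced for illustration.

```python
def count_mlp_parameters(no_features, num_neurons, depth, no_labels):
    """Weights plus biases of an MLP with `depth` equal-width hidden layers."""
    # First hidden layer: input -> width
    first = no_features * num_neurons + num_neurons
    # Remaining hidden layers: width -> width
    hidden = (depth - 1) * (num_neurons * num_neurons + num_neurons)
    # Output layer: width -> labels
    output = num_neurons * no_labels + no_labels
    return first + hidden + output

# Assumed sizes: 77 features in, 1 label out
param_counts = {
    (d, n): count_mlp_parameters(77, n, d, 1)
    for d in [1, 2, 3] for n in [64, 128, 256]
}
for (d, n), count in param_counts.items():
    print(f"depth={d} width={n}: {count} parameters")
```

Under these assumptions the smallest model (depth 1, width 64) has 5,057 parameters and the largest (depth 3, width 256) has 151,809, so a small accuracy gain from the largest model comes with roughly 30 times more parameters.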

6. Plot the train and test accuracies against training epochs with the optimal number of neurons using a line plot.


In [5]:
# YOUR CODE HERE

df = pd.read_csv('simplified.csv')
df['label'] = df['filename'].str.split('_').str[-2]

X_train, y_train, X_test, y_test = preprocess(df)

input_features = X_train.shape[1]
no_labels = 1
final_model = MLP(input_features,optimal_combination[1],optimal_combination[0],no_labels)
optimal_bs = 256  # note: the cross-validation search above ran with a batch size of 512

train_accuracies, train_losses, test_accuracies, test_losses, times = train(final_model, X_train, y_train, X_test, y_test, optimal_bs)

# save the model
torch.save(final_model.state_dict(), 'model.pt')
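
The cell above trains the final model but does not draw the requested line plot. A minimal sketch of that plot follows, assuming the per-epoch lists returned by `train`; `plot_accuracy_curves` is a hypothetical helper name.

```python
import matplotlib.pyplot as plt

def plot_accuracy_curves(train_accuracies, test_accuracies):
    """Line plot of per-epoch train and test accuracy."""
    # Epochs are numbered from 1 to match the training log
    epochs = range(1, len(train_accuracies) + 1)
    fig, ax = plt.subplots()
    ax.plot(epochs, train_accuracies, label='Train accuracy')
    ax.plot(epochs, test_accuracies, label='Test accuracy')
    ax.set_xlabel('Epoch')
    ax.set_ylabel('Accuracy')
    ax.set_title('Train and test accuracy vs epoch')
    ax.legend()
    return ax
```

In the notebook this could be called as `plot_accuracy_curves(train_accuracies, test_accuracies)` followed by `plt.show()`.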

7. As you've astutely observed, enumerating all possible combinations of widths and depths and searching over them is a significant challenge. Could you explore and implement a more efficient search method that significantly reduces the computational time but achieves similar search results?
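
One possible approach, sketched here under stated assumptions, is random search over the same grid: only `n_trials` randomly chosen combinations are trained instead of all of them. `random_search`, `evaluate`, and `n_trials` are hypothetical names; `evaluate` is assumed to be any callable that maps a `(depth, width)` tuple to a score where higher is better.

```python
import random

def random_search(parameters, evaluate, n_trials, seed=0):
    """Evaluate a random subset of `parameters` and return the best one found.

    `evaluate` maps a (depth, width) tuple to a score, higher is better
    (for example, the mean cross-validation accuracy).
    """
    rng = random.Random(seed)
    # Sample without replacement so no combination is trained twice
    candidates = rng.sample(list(parameters), k=min(n_trials, len(parameters)))
    scores = {candidate: evaluate(candidate) for candidate in candidates}
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical usage with the notebook's helpers (not executed here):
# best, scores = random_search(
#     parameters,
#     lambda p: find_optimal_hyperparameter(X_train, y_train, [p], 'num_neurons', optimal_bs)[0][0],
#     n_trials=4,
# )
```

With `n_trials=4` on the 9-combination grid, this would train fewer than half of the models. The trade-off is that the best combination is only found if it happens to be sampled.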

In [None]:
# YOUR CODE HERE
answer = """
The naive approach is grid search, which trains every combination and is therefore the most
expensive option. A cheaper alternative is random search, which evaluates only a random sample
of the combinations and often finds a comparably good configuration with far fewer training runs.

A further option is Bayesian optimization, a sequential model-based optimization technique
that uses the results of past evaluations to choose the next hyperparameters to evaluate.
It typically needs fewer evaluations than grid search or random search to reach a similar result.
"""