# Find the optimal architecture for the Neural Network

I wrote this code to find the optimal architecture for the third model, a neural network. Because training a neural network takes a long time, I chose the Bayesian Optimization (BO) method: it helps to find a model close to the optimum with a minimal number of attempts in a space of about 188k possible network variants.
Through some experiments outside the scope of this document, I found that for training the network on a test dataset of 30 features and 6000 samples, I prefer a fully connected model with no more than 4 hidden layers. An additional series of simple experiments showed that the preferred activation functions for the hidden layers are
$$ ReLU - Sigmoid - Sigmoid - Sigmoid $$ 

The architecture search therefore reduces to finding the optimal number of neurons for each hidden layer.
The criterion for the optimal architecture is chosen somewhat arbitrarily: the best accuracy of the model on the test samples after 100,000 training epochs with a learning rate of 0.002 on the training data set. The accuracy of the model is measured by the L2 loss (mean squared error) of the predicted 20-day future Moving Average, so lower values are better. I also store the accuracy of predicting the market direction on the test data for further analysis. I assume that if the model shows a good result under these conditions, then it will be close to optimal and suitable for the final modeling in production.

To limit the search space, I introduce a couple more restrictions:
1) The number of network parameters should not exceed half the number of samples; otherwise the network will simply memorize them and will be unsuitable because of high variance on the test data.
2) In my experience of building well-functioning networks, the first layers usually have more neurons than the subsequent ones, so I require the layer sizes to be non-increasing.

These two conservative conditions allow me to narrow the parameter space somewhat and simplify a difficult task. Bayesian Optimization is implemented via a Gaussian Process Regressor with a Radial Basis Function (RBF) kernel. I made extensive use of the GPU, running each training step on the whole training set as one single batch. It took about 8 hours on a powerful machine to train and compare ~250 models. Here is how:

In [1]:
# Load required packages
import numpy as np
import pandas as pd
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from matplotlib import pyplot as plt
from sklearn.model_selection import train_test_split
import torch
import torch.nn as nn
import torch.nn.functional as F
import time
import torch.optim as optim
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import confusion_matrix, accuracy_score

### Define a few important global constants

*As a seasoned C programmer I don't like any global variables, but I will deviate from this good rule for the sake of task resolution speed*

In [2]:
Verbose = True # True: print more run-time info, False: less 

In [3]:
MaxNNParameters = 3500 # The maximum allowed number of network parameters. It should be less than the number of samples in the dataset

In [4]:
NumFeatures = 30 # The number of input features

In [5]:
LearningRate = 0.002 # The standard learning rate for NN test
WeightDecay = 1e-8 # The weight decay for the Adam algorithm to improve convergence
NumEpochs = 100000 # The number of epochs to train NN for evaluation

In [6]:
# standard precision
np.set_printoptions(precision=8)

### Load a dataset for NN training

In [7]:
ds = pd.read_excel('dataset.xlsx', parse_dates=[0], index_col=0)

In [9]:
# The last two columns in data set are the real future data to be predicted
X_ = ds.iloc[:,:-2].values
# The second-to-last column is the target; note this is a regression rather than a classification problem
y_ = ds.iloc[:,-2].values * 100 # Scale the target by a factor of 100

In [10]:
#Split the data into training and testing sets, I use 20% test sample
X_train, X_test, y_train, y_test = train_test_split(X_, y_, test_size=0.2)

In [11]:
# Standardize the features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

In [12]:
# With GPU Nvidia 4090 it works much faster:
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
#device = torch.device('cpu')
if Verbose: print(device)

cuda:0


In [13]:
# Convert numpy arrays to PyTorch tensors
X_train = torch.tensor(X_train_scaled, dtype=torch.float32).to(device)
y_train = torch.tensor(y_train, dtype=torch.float32).unsqueeze(1).to(device)
X_test = torch.tensor(X_test_scaled, dtype=torch.float32).to(device)
y_test = torch.tensor(y_test, dtype=torch.float32).unsqueeze(1).to(device)

### Now build the search space

In [14]:
space = []
for x1 in range(1,130):
    for x2 in range(1,100):
        for x3 in range(1,100):
            for x4 in range(1,100):
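                # Weights and biases of the four hidden layers (the x4 + 1 parameters of the output layer are not counted here)
                if NumFeatures*x1 + x1 + x1*x2 + x2 + x2*x3 + x3 + x3*x4 + x4 > MaxNNParameters : continue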
                if x1 < x2 or x2 < x3 or x3 < x4 : continue
                space.append([x1, x2, x3, x4])

In [15]:
space = np.array(space)
len(space)

188547
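As a quick cross-check of the parameter budget, here is a minimal sketch in plain Python. The helper names `count_parameters` and `hidden_only_budget` are illustrative and not part of the notebook. The first counts weights and biases of every layer including the single-output layer; the second repeats the expression used in the search-space filter, which leaves the output layer out.

```python
def count_parameters(layer_sizes):
    """Weights plus biases of a fully connected net, e.g. [30, 26, 25, 17, 17, 1]."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


def hidden_only_budget(x1, x2, x3, x4, num_features=30):
    """Same expression as in the search-space filter (output layer not counted)."""
    return num_features * x1 + x1 + x1 * x2 + x2 + x2 * x3 + x3 + x3 * x4 + x4


full = count_parameters([30, 26, 25, 17, 17, 1])
partial = hidden_only_budget(26, 25, 17, 17)
print(full, partial, full - partial)  # the difference is x4 + 1
```

For the 26-25-17-17 network the full count agrees with the "total params" figure printed in the training log, and the two counts differ by only `x4 + 1`, so the filter is marginally more permissive than a strict limit on all parameters.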

### Define a class for the 4-layer NN

In [16]:
class Net(nn.Module):
    def __init__(self, x1=50, x2=20, x3=10, x4=2, name=None):
        super().__init__()
        # Fall back to a name built from the layer sizes so that self.name always exists
        self.name = name if name else f'NN {x1}-{x2}-{x3}-{x4}'
        self.x1 = x1
        self.x2 = x2
        self.x3 = x3
        self.x4 = x4
        # Define network layers here:
        self.fc1 = nn.Linear(NumFeatures, x1)
        self.fc2 = nn.Linear(x1, x2)
        self.fc3 = nn.Linear(x2, x3)
        self.fc4 = nn.Linear(x3, x4)
        self.fc5 = nn.Linear(x4, 1)    # This is a regression problem, so 1 output is enough

        # Compute the total number of trainable parameters
        total_params = sum(p.numel() for p in self.parameters() if p.requires_grad)
        if Verbose: print(self.name + ': total params:', total_params)

    def forward(self, x):
        # ReLU - Sigmoid - Sigmoid - Sigmoid, followed by a linear output
        x = F.relu(self.fc1(x))
        x = torch.sigmoid(self.fc2(x))   # torch.sigmoid replaces the deprecated F.sigmoid
        x = torch.sigmoid(self.fc3(x))
        x = torch.sigmoid(self.fc4(x))
        x = self.fc5(x)
        return x

In [17]:
# Define the function that trains the NN
def train_model(model, X, y, criterion, optimizer, num_epochs=25, verbose=False):
    start = time.time()
    best_loss = float('inf')
    best_state = None
    log_every = max(1, num_epochs // 5)   # avoids a modulo by zero when num_epochs < 5

    model.train()
    for epoch in range(num_epochs):

        outputs = model(X)
        loss = criterion(outputs, y)
        optimizer.zero_grad()
        loss.backward()

        # The loss belongs to the weights before optimizer.step(), so snapshot them here.
        # A clone is required: "best_model = model" would only store a reference.
        if loss.item() < best_loss:
            best_loss = loss.item()
            best_state = {k: v.detach().clone() for k, v in model.state_dict().items()}

        optimizer.step()

        if verbose and (epoch+1) % log_every == 0:
            print(f'Epoch [{epoch+1:6d}/{num_epochs}], Loss: {loss.item():.8f}')

    # Restore the weights that produced the lowest training loss
    if best_state is not None:
        model.load_state_dict(best_state)
    model.eval()
    end = time.time()
    if verbose : print(f'Training accuracy:  {best_loss:.8f}')
    #if verbose: print('training time sec', end - start)
    return best_loss

In [18]:
# Define the accuracy function for NN
# It should be able to handle tensors that live on the GPU
def get_accuracy(net, X, y):
    net.eval()
    with torch.no_grad():
        y_pred = net(X).cpu()
    # Calculate accuracy of direction
    accuracy = accuracy_score(np.sign(y.cpu().numpy()).ravel(), np.sign(y_pred.numpy()).ravel())*100
    if Verbose: print(f'Direction accuracy: {accuracy:.2f}%')
    
#    Calculate confusion matrix
#    conf_matrix = confusion_matrix(np.sign(y.cpu()), np.sign(y_pred))
#    if Verbose: print('Confusion Matrix:')
#    if Verbose: print(conf_matrix)
    return accuracy
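The two metrics tracked in this notebook answer different questions, which a tiny worked example makes concrete. This is a minimal sketch on four made-up target and prediction values (not taken from the dataset), computed with plain NumPy rather than scikit-learn's `accuracy_score`.

```python
import numpy as np

# Hypothetical targets and predictions, for illustration only
y_true = np.array([0.5, -1.2, 0.3, -0.1])
y_pred = np.array([0.2, -0.4, -0.6, -0.3])

# L2 criterion: mean squared error between prediction and target
mse = float(np.mean((y_pred - y_true) ** 2))

# Direction accuracy: share of samples where the predicted sign matches the true sign
direction_accuracy = float(np.mean(np.sign(y_pred) == np.sign(y_true)) * 100)

print(f'MSE: {mse:.3f}, direction accuracy: {direction_accuracy:.1f}%')
```

Three of the four signs match here, while the squared error is dominated by the two largest misses. A model can therefore rank differently under the two measures, which is why both are stored.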

In [19]:
def evaluate(par, X, y, device, verbose=False):
    # Build, train and score one architecture; par holds the four hidden layer sizes
    name = 'NN ' + str(par[0]) + '-' + str(par[1]) + '-' + str(par[2]) + '-' + str(par[3])
    model = Net(name=name, x1=par[0], x2=par[1], x3=par[2], x4=par[3]).to(device)
    criterion = nn.MSELoss()
    optimizer = optim.Adam(model.parameters(), lr=LearningRate, weight_decay=WeightDecay)
    error_l2 = train_model(model, X, y, criterion, optimizer, num_epochs=NumEpochs, verbose=verbose)
    # "accuracy" is the MSE on the global test tensors X_test, y_test (lower is better)
    with torch.no_grad():
        accuracy = criterion(model(X_test), y_test).item()
    if verbose : print(f'Testing  accuracy:  {accuracy:.8f}')
    return accuracy, error_l2, model.cpu()

In [20]:
# Define evaluation lists
accuracy_rates = []
dir_accuracy = []
error_rates = []
models = []

### Initiate first samples

In [21]:
init_idx = np.random.choice(len(space), size=2, replace=False)

In [22]:
hyperparams = np.concatenate((space[init_idx], np.array([[50, 20, 10, 2]])), axis=0)

In [23]:
hyperparams

array([[26, 25, 17, 17],
       [28, 27, 20, 10],
       [50, 20, 10,  2]])

In [24]:
# Run a preliminary loop to collect a few initial points
for par in hyperparams:
    accuracy, error, model = evaluate(par, X_train, y_train, device, verbose=True)
    accuracy_rates.append(accuracy)
    error_rates.append(error)
    models.append(model)
    dir_accuracy.append(get_accuracy(model.to(device), X_test, y_test))

NN 26-25-17-17: total params: 2247
Epoch [ 20000/100000], Loss: 0.18826523
Epoch [ 40000/100000], Loss: 0.13508458
Epoch [ 60000/100000], Loss: 0.11558980
Epoch [ 80000/100000], Loss: 0.10436151
Epoch [100000/100000], Loss: 0.09589542
Training accuracy:  0.09580736
Testing  accuracy:  0.87984765
Direction accuracy: 93.14%
NN 28-27-20-10: total params: 2432
Epoch [ 20000/100000], Loss: 0.20629238
Epoch [ 40000/100000], Loss: 0.14392425
Epoch [ 60000/100000], Loss: 0.11703361
Epoch [ 80000/100000], Loss: 0.09935556
Epoch [100000/100000], Loss: 0.09302331
Training accuracy:  0.09181925
Testing  accuracy:  1.07597053
Direction accuracy: 92.47%
NN 50-20-10-2: total params: 2805
Epoch [ 20000/100000], Loss: 0.12951216
Epoch [ 40000/100000], Loss: 0.10056552
Epoch [ 60000/100000], Loss: 0.09416066
Epoch [ 80000/100000], Loss: 0.08965769
Epoch [100000/100000], Loss: 0.08563521
Training accuracy:  0.08544391
Testing  accuracy:  2.08913016
Direction accuracy: 91.12%


## Use Gaussian Process Regressor with Radial Basis Function

In [25]:
# The code to restore earlier results from the Excel file
#df = pd.read_excel('hyperparameters.xlsx')
#hyperparams = df[['x1','x2','x3','x4']].values
#accuracy_rates = list(df['TestAcc'].values)
#error_rates = list(df['TrainAcc'].values)
#dir_accuracy = list(df['DirAcc'].values)

In [26]:
# Apply a workaround with a custom optimizer to get rid of the max_iter warning
from scipy.optimize import fmin_l_bfgs_b

# Custom optimizer with maxiter set
def custom_optimizer(obj_func, initial_theta, bounds):
    optimized_result = fmin_l_bfgs_b(
        obj_func,
        initial_theta,
        bounds=bounds,
        maxiter=50000  # Increase maxiter as needed
    )
    # fmin_l_bfgs_b returns (optimized_params, min_value, info_dict)
    return optimized_result[0], optimized_result[1]

kernel = RBF(length_scale = 1., length_scale_bounds=(1e-10 , 1e10))

# Create the GaussianProcessRegressor with the custom optimizer
bo_model = GaussianProcessRegressor(
    kernel=kernel,
    alpha=1e-8,
    n_restarts_optimizer=1,
    normalize_y=True,
    optimizer=custom_optimizer
)
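The selection rule used in the optimization loop (fit a GP, then evaluate the grid point that minimizes posterior mean minus one standard deviation) can be tried on a toy problem first. The sketch below is not the notebook's code. It swaps scikit-learn's regressor for a few lines of plain NumPy with a fixed RBF length scale, uses a hypothetical one-dimensional `toy_loss` in place of network training, and additionally masks points that were already evaluated, which the notebook's loop does not do.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0):
    """RBF (squared-exponential) kernel between two sets of 1-D points."""
    diff = a.reshape(-1, 1) - b.reshape(1, -1)
    return np.exp(-0.5 * (diff / length_scale) ** 2)


def gp_posterior(x_obs, y_obs, x_grid, length_scale=1.0, jitter=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP on x_grid."""
    K = rbf_kernel(x_obs, x_obs, length_scale) + jitter * np.eye(len(x_obs))
    K_s = rbf_kernel(x_obs, x_grid, length_scale)
    mean = K_s.T @ np.linalg.solve(K, y_obs)
    var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s), axis=0)
    return mean, np.sqrt(np.clip(var, 0.0, None))


def toy_loss(x):
    """Hypothetical stand-in for the test loss of a trained network."""
    return (x - 2.0) ** 2


grid = np.linspace(-5.0, 5.0, 41)      # discrete search space
visited = [4, 20, 38]                  # indices of the initial samples
losses = [float(toy_loss(grid[i])) for i in visited]
initial_best = min(losses)

for _ in range(10):
    y = np.array(losses)
    y_norm = (y - y.mean()) / y.std()  # same idea as normalize_y=True
    mean, std = gp_posterior(grid[visited], y_norm, grid)
    lcb = mean - 1.0 * std             # lower confidence bound acquisition
    lcb[visited] = np.inf              # assumption: never re-evaluate a point
    idx = int(np.argmin(lcb))
    visited.append(idx)
    losses.append(float(toy_loss(grid[idx])))

best_idx = visited[int(np.argmin(losses))]
print('best x:', grid[best_idx], 'loss:', min(losses))
```

The factor in front of `std` trades exploration against exploitation: a larger value pulls the search toward regions with few observations, a smaller one toward the current best estimate.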

In [27]:
# Run a fixed number of BO iterations
for it in range(1, 250):

    bo_model.fit(hyperparams, accuracy_rates)
    # Predict the posterior mean and standard deviation over the whole grid
    post_mean, post_std = bo_model.predict(space, return_std=True)

    # Acquisition function: lower confidence bound, because the test loss is minimized
    a_fun = post_mean - 1 * post_std
    # Index of the minimum of the acquisition function
    # (named idx so that it does not shadow the loop counter)
    idx = np.argmin(a_fun)
    print(f'The min of acquiring = {a_fun[idx]:.8f}, post_mean = {post_mean[idx]:.8f}, post_std = {post_std[idx]:.8f}, index {idx}')
    accuracy, error, model = evaluate(space[idx], X_train, y_train, device, verbose=True)
    hyperparams = np.concatenate((hyperparams, space[idx].reshape(1,4)), axis=0)
    accuracy_rates.append(accuracy)
    dir_accuracy.append(get_accuracy(model.to(device), X_test, y_test))
    error_rates.append(error)
    models.append(model)

The min of acquiring = 0.64285871, post_mean = 1.06417563, post_std = 0.42131691, index 23226
NN 26-25-17-16: total params: 2228
Epoch [ 20000/100000], Loss: 0.21014476
Epoch [ 40000/100000], Loss: 0.15238474
Epoch [ 60000/100000], Loss: 0.12571113
Epoch [ 80000/100000], Loss: 0.11170570
Epoch [100000/100000], Loss: 0.10280173
Training accuracy:  0.10278814
Testing  accuracy:  0.74630129
Direction accuracy: 93.63%
The min of acquiring = 0.36124104, post_mean = 0.70609292, post_std = 0.34485187, index 19967
NN 25-24-15-13: total params: 1996
Epoch [ 20000/100000], Loss: 0.19587722
Epoch [ 40000/100000], Loss: 0.15128170
Epoch [ 60000/100000], Loss: 0.13372713
Epoch [ 80000/100000], Loss: 0.12316079
Epoch [100000/100000], Loss: 0.11355854
Training accuracy:  0.11351389
Testing  accuracy:  0.77089673
Direction accuracy: 93.02%
The min of acquiring = 0.39421265, post_mean = 0.73231638, post_std = 0.33810374, index 17434
NN 24-24-19-14: total params: 2114
Epoch [ 20000/100000], Loss: 0.3383

Epoch [100000/100000], Loss: 0.09012498
Training accuracy:  0.09012498
Testing  accuracy:  1.06896448
Direction accuracy: 93.14%
The min of acquiring = 0.62462397, post_mean = 0.62465507, post_std = 0.00003110, index 25960
NN 27-23-19-15: total params: 2253
Epoch [ 20000/100000], Loss: 0.18618529
Epoch [ 40000/100000], Loss: 0.12158580
Epoch [ 60000/100000], Loss: 0.09879699
Epoch [ 80000/100000], Loss: 0.09004293
Epoch [100000/100000], Loss: 0.08498426
Training accuracy:  0.08435856
Testing  accuracy:  0.75639546
Direction accuracy: 94.37%
The min of acquiring = 0.58126661, post_mean = 0.87803479, post_std = 0.29676819, index 17417
NN 24-24-18-15: total params: 2095
Epoch [ 20000/100000], Loss: 0.27621436
Epoch [ 40000/100000], Loss: 0.19339354
Epoch [ 60000/100000], Loss: 0.16564730
Epoch [ 80000/100000], Loss: 0.15012564
Epoch [100000/100000], Loss: 0.12777597
Training accuracy:  0.12763107
Testing  accuracy:  0.66390544
Direction accuracy: 93.51%
The min of acquiring = 0.46478676, 

Epoch [ 20000/100000], Loss: 0.34411129
Epoch [ 40000/100000], Loss: 0.25449860
Epoch [ 60000/100000], Loss: 0.20314196
Epoch [ 80000/100000], Loss: 0.16928619
Epoch [100000/100000], Loss: 0.14541654
Training accuracy:  0.14506358
Testing  accuracy:  0.76885456
Direction accuracy: 93.57%
The min of acquiring = -0.03334550, post_mean = 0.88993054, post_std = 0.92327604, index 14893
NN 23-23-21-10: total params: 2000
Epoch [ 20000/100000], Loss: 0.20308349
Epoch [ 40000/100000], Loss: 0.13695635
Epoch [ 60000/100000], Loss: 0.11828724
Epoch [ 80000/100000], Loss: 0.10680693
Epoch [100000/100000], Loss: 0.10189636
Training accuracy:  0.10185643
Testing  accuracy:  1.05449474
Direction accuracy: 92.28%
The min of acquiring = -0.00873540, post_mean = 0.87546021, post_std = 0.88419561, index 20414
NN 25-25-23-12: total params: 2324
Epoch [ 20000/100000], Loss: 0.22448795
Epoch [ 40000/100000], Loss: 0.16355373
Epoch [ 60000/100000], Loss: 0.13605358
Epoch [ 80000/100000], Loss: 0.11832711
Ep

The min of acquiring = 0.04081824, post_mean = 0.80652907, post_std = 0.76571083, index 40290
NN 30-29-23-19: total params: 2995
Epoch [ 20000/100000], Loss: 0.14815998
Epoch [ 40000/100000], Loss: 0.10695746
Epoch [ 60000/100000], Loss: 0.08754351
Epoch [ 80000/100000], Loss: 0.07342092
Epoch [100000/100000], Loss: 0.06631831
Training accuracy:  0.06628232
Testing  accuracy:  0.88697112
Direction accuracy: 94.18%
The min of acquiring = 0.04612628, post_mean = 0.72534124, post_std = 0.67921495, index 46059
NN 31-31-21-18: total params: 3040
Epoch [ 20000/100000], Loss: 0.14375438
Epoch [ 40000/100000], Loss: 0.10288635
Epoch [ 60000/100000], Loss: 0.08228409
Epoch [ 80000/100000], Loss: 0.06992324
Epoch [100000/100000], Loss: 0.06131219
Training accuracy:  0.06129716
Testing  accuracy:  0.92361987
Direction accuracy: 93.75%
The min of acquiring = -0.07174241, post_mean = 0.61624979, post_std = 0.68799220, index 35751
NN 29-29-21-17: total params: 2791
Epoch [ 20000/100000], Loss: 0.226

Epoch [ 80000/100000], Loss: 0.12348095
Epoch [100000/100000], Loss: 0.11072013
Training accuracy:  0.11055759
Testing  accuracy:  0.82037944
Direction accuracy: 93.45%
The min of acquiring = 0.06229228, post_mean = 0.68683840, post_std = 0.62454612, index 40676
NN 30-30-21-16: total params: 2880
Epoch [ 20000/100000], Loss: 0.25961244
Epoch [ 40000/100000], Loss: 0.19713272
Epoch [ 60000/100000], Loss: 0.15995000
Epoch [ 80000/100000], Loss: 0.14068633
Epoch [100000/100000], Loss: 0.12424610
Training accuracy:  0.12366804
Testing  accuracy:  0.87034428
Direction accuracy: 93.02%
The min of acquiring = 0.05041467, post_mean = 0.68268868, post_std = 0.63227402, index 23688
NN 26-26-24-13: total params: 2495
Epoch [ 20000/100000], Loss: 0.21264262
Epoch [ 40000/100000], Loss: 0.13957143
Epoch [ 60000/100000], Loss: 0.10755284
Epoch [ 80000/100000], Loss: 0.09000540
Epoch [100000/100000], Loss: 0.08696630
Training accuracy:  0.07768955
Testing  accuracy:  0.77069581
Direction accuracy: 93

Epoch [ 20000/100000], Loss: 0.14118427
Epoch [ 40000/100000], Loss: 0.09328926
Epoch [ 60000/100000], Loss: 0.07711130
Epoch [ 80000/100000], Loss: 0.06843081
Epoch [100000/100000], Loss: 0.06293277
Training accuracy:  0.06288820
Testing  accuracy:  0.84731716
Direction accuracy: 93.20%
The min of acquiring = 0.10545643, post_mean = 0.55662657, post_std = 0.45117014, index 40266
NN 30-29-22-17: total params: 2898
Epoch [ 20000/100000], Loss: 0.15294188
Epoch [ 40000/100000], Loss: 0.10667431
Epoch [ 60000/100000], Loss: 0.08766961
Epoch [ 80000/100000], Loss: 0.08002463
Epoch [100000/100000], Loss: 0.07255930
Training accuracy:  0.07254028
Testing  accuracy:  1.18178654
Direction accuracy: 93.14%
The min of acquiring = -0.10584097, post_mean = 0.39548881, post_std = 0.50132978, index 39527
NN 30-27-24-17: total params: 2882
Epoch [ 20000/100000], Loss: 0.21398808
Epoch [ 40000/100000], Loss: 0.15867853
Epoch [ 60000/100000], Loss: 0.12926134
Epoch [ 80000/100000], Loss: 0.11433223
Epo

The min of acquiring = 0.17033871, post_mean = 0.69656880, post_std = 0.52623009, index 35370
NN 29-28-22-21: total params: 2882
Epoch [ 20000/100000], Loss: 0.17634745
Epoch [ 40000/100000], Loss: 0.12569194
Epoch [ 60000/100000], Loss: 0.09873657
Epoch [ 80000/100000], Loss: 0.08429199
Epoch [100000/100000], Loss: 0.07427087
Training accuracy:  0.07425689
Testing  accuracy:  0.60112172
Direction accuracy: 95.28%
The min of acquiring = 0.17839343, post_mean = 0.72760474, post_std = 0.54921131, index 44762
NN 31-28-20-18: total params: 2834
Epoch [ 20000/100000], Loss: 0.18018389
Epoch [ 40000/100000], Loss: 0.12244226
Epoch [ 60000/100000], Loss: 0.09504306
Epoch [ 80000/100000], Loss: 0.09387525
Epoch [100000/100000], Loss: 0.07216077
Training accuracy:  0.07190157
Testing  accuracy:  0.57893211
Direction accuracy: 93.75%
The min of acquiring = 0.17468193, post_mean = 0.72335416, post_std = 0.54867224, index 45169
NN 31-29-20-19: total params: 2908
Epoch [ 20000/100000], Loss: 0.2188

Epoch [100000/100000], Loss: 0.11407211
Training accuracy:  0.11279431
Testing  accuracy:  0.60497838
Direction accuracy: 94.06%
The min of acquiring = 0.23315733, post_mean = 0.69793904, post_std = 0.46478172, index 39207
NN 30-26-25-24: total params: 3060
Epoch [ 20000/100000], Loss: 0.22008008
Epoch [ 40000/100000], Loss: 0.16568092
Epoch [ 60000/100000], Loss: 0.13728671
Epoch [ 80000/100000], Loss: 0.11257025
Epoch [100000/100000], Loss: 0.09772085
Training accuracy:  0.09755754
Testing  accuracy:  0.68966347
Direction accuracy: 94.06%
The min of acquiring = 0.23530941, post_mean = 0.76003520, post_std = 0.52472579, index 20371
NN 25-25-21-12: total params: 2248
Epoch [ 20000/100000], Loss: 0.16088808
Epoch [ 40000/100000], Loss: 0.12298328
Epoch [ 60000/100000], Loss: 0.11174440
Epoch [ 80000/100000], Loss: 0.09798068
Epoch [100000/100000], Loss: 0.09211494
Training accuracy:  0.09203050
Testing  accuracy:  1.24070501
Direction accuracy: 91.92%
The min of acquiring = 0.20531588, 

Epoch [ 20000/100000], Loss: 0.17793778
Epoch [ 40000/100000], Loss: 0.11834237
Epoch [ 60000/100000], Loss: 0.09770463
Epoch [ 80000/100000], Loss: 0.08267520
Epoch [100000/100000], Loss: 0.07458474
Training accuracy:  0.07172592
Testing  accuracy:  0.89431113
Direction accuracy: 94.61%
The min of acquiring = 0.26656365, post_mean = 0.72945428, post_std = 0.46289063, index 39801
NN 30-28-19-18: total params: 2728
Epoch [ 20000/100000], Loss: 0.17985199
Epoch [ 40000/100000], Loss: 0.14224212
Epoch [ 60000/100000], Loss: 0.10766196
Epoch [ 80000/100000], Loss: 0.09213943
Epoch [100000/100000], Loss: 0.08442388
Training accuracy:  0.08438874
Testing  accuracy:  0.79448956
Direction accuracy: 92.71%
The min of acquiring = 0.26939748, post_mean = 0.72654515, post_std = 0.45714767, index 35344
NN 29-28-21-16: total params: 2717
Epoch [ 20000/100000], Loss: 0.20156160
Epoch [ 40000/100000], Loss: 0.15500827
Epoch [ 60000/100000], Loss: 0.13447231
Epoch [ 80000/100000], Loss: 0.11342297
Epoc

The min of acquiring = 0.28144241, post_mean = 0.70409802, post_std = 0.42265561, index 44784
NN 31-28-21-20: total params: 2927
Epoch [ 20000/100000], Loss: 0.17915255
Epoch [ 40000/100000], Loss: 0.12552762
Epoch [ 60000/100000], Loss: 0.10213874
Epoch [ 80000/100000], Loss: 0.08819110
Epoch [100000/100000], Loss: 0.07867582
Training accuracy:  0.07854411
Testing  accuracy:  0.73559260
Direction accuracy: 94.43%
The min of acquiring = 0.28471262, post_mean = 0.45054512, post_std = 0.16583250, index 34992
NN 29-27-22-21: total params: 2830
Epoch [ 20000/100000], Loss: 0.11570671
Epoch [ 40000/100000], Loss: 0.08357563
Epoch [ 60000/100000], Loss: 0.07022254
Epoch [ 80000/100000], Loss: 0.06228460
Epoch [100000/100000], Loss: 0.05695362
Training accuracy:  0.05619498
Testing  accuracy:  1.01140177
Direction accuracy: 94.79%
The min of acquiring = 0.15295181, post_mean = 0.56315071, post_std = 0.41019890, index 31289
NN 28-28-21-21: total params: 2773
Epoch [ 20000/100000], Loss: 0.1812

Epoch [100000/100000], Loss: 0.07291102
Training accuracy:  0.07245655
Testing  accuracy:  0.56879234
Direction accuracy: 94.73%
The min of acquiring = 0.26412076, post_mean = 0.69874107, post_std = 0.43462031, index 35755
NN 29-29-21-21: total params: 2883
Epoch [ 20000/100000], Loss: 0.22252718
Epoch [ 40000/100000], Loss: 0.16119528
Epoch [ 60000/100000], Loss: 0.13106546
Epoch [ 80000/100000], Loss: 0.11085549
Epoch [100000/100000], Loss: 0.09544574
Training accuracy:  0.09111571
Testing  accuracy:  0.62893504
Direction accuracy: 93.39%
The min of acquiring = 0.28071552, post_mean = 0.68094941, post_std = 0.40023389, index 44405
NN 31-27-21-19: total params: 2851
Epoch [ 20000/100000], Loss: 0.21142927
Epoch [ 40000/100000], Loss: 0.14616616
Epoch [ 60000/100000], Loss: 0.11348261
Epoch [ 80000/100000], Loss: 0.09638990
Epoch [100000/100000], Loss: 0.09003468
Training accuracy:  0.08548211
Testing  accuracy:  0.61732697
Direction accuracy: 93.45%
The min of acquiring = 0.27446524, 

Epoch [ 20000/100000], Loss: 0.27274320
Epoch [ 40000/100000], Loss: 0.18147811
Epoch [ 60000/100000], Loss: 0.14484052
Epoch [ 80000/100000], Loss: 0.12077396
Epoch [100000/100000], Loss: 0.10329588
Training accuracy:  0.10318696
Testing  accuracy:  0.61178011
Direction accuracy: 94.12%
The min of acquiring = 0.30152617, post_mean = 0.73738985, post_std = 0.43586368, index 35288
NN 29-28-18-17: total params: 2602
Epoch [ 20000/100000], Loss: 0.18140060
Epoch [ 40000/100000], Loss: 0.11496015
Epoch [ 60000/100000], Loss: 0.09161600
Epoch [ 80000/100000], Loss: 0.07313238
Epoch [100000/100000], Loss: 0.06478114
Training accuracy:  0.06467756
Testing  accuracy:  0.69416153
Direction accuracy: 94.73%
The min of acquiring = 0.29331424, post_mean = 0.60229920, post_std = 0.30898496, index 35799
NN 29-29-23-22: total params: 3010
Epoch [ 20000/100000], Loss: 0.19062513
Epoch [ 40000/100000], Loss: 0.13893990
Epoch [ 60000/100000], Loss: 0.11163542
Epoch [ 80000/100000], Loss: 0.09096125
Epoc

The min of acquiring = 0.33028170, post_mean = 0.77289031, post_std = 0.44260861, index 20124
NN 25-24-23-22: total params: 2525
Epoch [ 20000/100000], Loss: 0.29337591
Epoch [ 40000/100000], Loss: 0.20208165
Epoch [ 60000/100000], Loss: 0.16257933
Epoch [ 80000/100000], Loss: 0.14067017
Epoch [100000/100000], Loss: 0.12612669
Training accuracy:  0.12607324
Testing  accuracy:  1.08470416
Direction accuracy: 93.39%
The min of acquiring = 0.33462970, post_mean = 0.76009732, post_std = 0.42546762, index 39233
NN 30-26-26-25: total params: 3139
Epoch [ 20000/100000], Loss: 0.17529130
Epoch [ 40000/100000], Loss: 0.11353855
Epoch [ 60000/100000], Loss: 0.09019426
Epoch [ 80000/100000], Loss: 0.07484923
Epoch [100000/100000], Loss: 0.06555535
Training accuracy:  0.06553199
Testing  accuracy:  0.71181637
Direction accuracy: 94.18%
The min of acquiring = 0.32832641, post_mean = 0.61477638, post_std = 0.28644997, index 20375
NN 25-25-21-16: total params: 2340
Epoch [ 20000/100000], Loss: 0.2120

In [28]:
df = pd.DataFrame(hyperparams)
df.columns=['x1','x2','x3','x4']
df['TrainAcc'] = error_rates
df['TestAcc'] = accuracy_rates
df['DirAcc'] = dir_accuracy
# Sort the DataFrame by the "TestAcc" column in ascending order (lowest test loss first)
df = df.sort_values(by='TestAcc', ascending=True)

In [29]:
df.head(10)

| | x1 | x2 | x3 | x4 | TrainAcc | TestAcc | DirAcc |
|---|---|---|---|---|---|---|---|
| 60 | 30 | 30 | 22 | 17 | 0.095148 | 0.488462 | 94.182486 |
| 41 | 24 | 24 | 21 | 12 | 0.111062 | 0.4895 | 94.672382 |
| 100 | 29 | 27 | 22 | 20 | 0.07941 | 0.494213 | 94.611145 |
| 149 | 26 | 26 | 22 | 21 | 0.112939 | 0.503866 | 94.427434 |
| 173 | 30 | 28 | 22 | 20 | 0.111859 | 0.505518 | 94.30496 |
| 211 | 29 | 29 | 23 | 23 | 0.105824 | 0.508156 | 94.182486 |
| 115 | 29 | 27 | 21 | 21 | 0.068041 | 0.52322 | 94.611145 |
| 108 | 29 | 27 | 23 | 18 | 0.1031 | 0.523859 | 94.366197 |
| 120 | 31 | 31 | 22 | 17 | 0.088129 | 0.539657 | 93.570116 |
| 113 | 30 | 30 | 23 | 17 | 0.062956 | 0.556739 | 94.549908 |


In [30]:
df.describe()

| | x1 | x2 | x3 | x4 | TrainAcc | TestAcc | DirAcc |
|---|---|---|---|---|---|---|---|
| count | 252.0 | 252.0 | 252.0 | 252.0 | 252.0 | 252.0 | 252.0 |
| mean | 27.900794 | 25.928571 | 20.452381 | 17.551587 | 0.123772 | 0.834504 | 93.534881 |
| std | 3.141117 | 2.838771 | 3.002276 | 3.708679 | 0.377154 | 0.458629 | 1.21313 |
| min | 1.0 | 1.0 | 1.0 | 1.0 | 0.056081 | 0.488462 | 77.770974 |
| 25% | 26.0 | 24.0 | 19.0 | 15.0 | 0.082123 | 0.666219 | 93.141457 |
| 50% | 28.0 | 26.0 | 21.0 | 17.0 | 0.097525 | 0.769775 | 93.631353 |
| 75% | 30.0 | 28.0 | 22.0 | 21.0 | 0.114861 | 0.903833 | 94.075321 |
| max | 50.0 | 31.0 | 26.0 | 25.0 | 6.075071 | 7.290603 | 95.345989 |


In [31]:
df.to_excel('hyperparameters.xlsx')

In [33]:
# Check the evaluation of the best network
accuracy, error, model = evaluate([30,30,22,17], X_train, y_train, device, verbose=True)

NN 30-30-22-17: total params: 2951
Epoch [ 20000/100000], Loss: 0.12997872
Epoch [ 40000/100000], Loss: 0.08670329
Epoch [ 60000/100000], Loss: 0.07215597
Epoch [ 80000/100000], Loss: 0.06438527
Epoch [100000/100000], Loss: 0.05819779
Training accuracy:  0.05744401
Testing  accuracy:  0.71489292


In [34]:
get_accuracy(model.to(device), X_test, y_test)

Direction accuracy: 93.75%


93.75382731169627

### Conclusion
The test accuracy of 0.7149 for the 30-30-22-17 model differs from the value obtained in the optimization run (0.488462). This is because of the complexity of NN models: training starts from random weights and does not always converge to the same local minimum with the same accuracy. The table of best models should therefore be treated as a guideline only. That is why, for the production NN, I take the average layer sizes of the 8 best models instead of the single top-performing model.

In [57]:
df.head(8).mean()

x1          28.250000
x2          27.250000
x3          22.000000
x4          19.000000
TrainAcc     0.098423
TestAcc      0.504599
DirAcc      94.419780
dtype: float64

The production neural network will therefore have 28-27-22-19 neurons in its four hidden layers.
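The averaging step can be reproduced without pandas. The sketch below copies the layer sizes of the eight best rows from the results table, averages them column by column, rounds to whole neurons, and then checks the two search-space restrictions (non-increasing layer sizes and the 3500-parameter limit). The variable names are illustrative only.

```python
# Layer sizes (x1, x2, x3, x4) of the eight best rows in the results table
top8 = [
    (30, 30, 22, 17),
    (24, 24, 21, 12),
    (29, 27, 22, 20),
    (26, 26, 22, 21),
    (30, 28, 22, 20),
    (29, 29, 23, 23),
    (29, 27, 21, 21),
    (29, 27, 23, 18),
]

# Column-wise mean, then round to whole neurons
means = [sum(col) / len(col) for col in zip(*top8)]
arch = [round(m) for m in means]

# Check the restrictions used when the search space was built
sizes = [30] + arch + [1]                      # 30 inputs, 1 regression output
n_params = sum(a * b + b for a, b in zip(sizes, sizes[1:]))
is_monotone = all(a >= b for a, b in zip(arch, arch[1:]))

print(means, arch, n_params, is_monotone)
```

The rounded architecture still has non-increasing layer sizes and stays well below the parameter limit, so it is itself a valid point of the search space.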