# **Introduction:**

This notebook performs a further grid search to determine the best hyperparameters for an ANN model used in multi-robot task allocation, trained by regression on FIS-generated data. The ANN will be compared against an ANFIS to determine which better approximates the FIS, using the coefficient of determination ($R^{2}$), root mean squared error (RMSE), and mean absolute error (MAE).
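
As a point of reference for the three comparison metrics, the sketch below computes $R^{2}$, RMSE, and MAE from their standard definitions in plain Python. The function name `regression_metrics` and the sample values are illustrative assumptions and are not part of the notebook, which relies on the Keras metric classes instead.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (r2, rmse, mae) for two equal-length sequences of numbers."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    # residual and total sums of squares:
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    return r2, rmse, mae

# made-up targets and predictions, for illustration only:
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]
r2, rmse, mae = regression_metrics(y_true, y_pred)
print(f"R2 = {r2:.4f}, RMSE = {rmse:.4f}, MAE = {mae:.4f}")
```

Higher is better for $R^{2}$, while lower is better for RMSE and MAE, which is why the sorts later in this notebook use different directions for these columns.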

**Date Created:** 17/12/2024

**Date Modified:** 18/12/2024

# **Import Packages:**

This section imports all necessary packages for the ANN implementation.

In [1]:
# import packages:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Dense, BatchNormalization, Dropout, Input
from tensorflow.keras.optimizers import Adam
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import pandas as pd
import matplotlib.pyplot as plt
import os
import time
import json

# **Data Loading:**

This section loads the data that was generated from the FIS. Minimal discovery is performed here, as the bulk of the data discovery was performed within the first grid search.

In [2]:
# get the path to the data CSV (expected in the current working directory):
data_path = os.path.join(os.getcwd(), 'V3_Data.csv')

# load the CSV as a pandas dataframe:
df = pd.read_csv(data_path)
print("data successfully loaded")

data successfully loaded


# **Data Pre-Processing:**

This section standardizes the features and splits the data into training, validation, and testing sets.

In [3]:
# get feature and label dataframes:
x_data = df.drop(['Suitability'], axis = 1)
y_data = df['Suitability']

The feature values are standardized first:

In [4]:
# define scaler:
scaler = StandardScaler()
x_data_scaled = scaler.fit_transform(x_data)
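
The cell above fits the scaler on the full dataset before it is split, so the validation and testing rows contribute to the mean and standard deviation used for scaling. A common alternative is to fit on the training rows only and reuse those statistics for the other splits. The sketch below shows that idea for a single column in plain Python. The helper names and sample values are hypothetical, and the population standard deviation is used because that is what `StandardScaler` computes.

```python
from statistics import fmean, pstdev

def fit_standardizer(column):
    """Return (mean, std) of one feature column, using the population std."""
    mean = fmean(column)
    std = pstdev(column)
    # guard against constant columns to avoid dividing by zero:
    return mean, (std if std > 0 else 1.0)

def standardize(column, mean, std):
    """Apply previously fitted statistics to any split of the same feature."""
    return [(value - mean) / std for value in column]

# made-up values for one feature, for illustration only:
train_col = [2.0, 4.0, 6.0, 8.0]
val_col = [5.0, 10.0]

mean, std = fit_standardizer(train_col)            # statistics from training rows only
train_scaled = standardize(train_col, mean, std)
val_scaled = standardize(val_col, mean, std)       # reuses the training statistics
print(train_scaled, val_scaled)
```

With scikit-learn, the same pattern would correspond to calling `fit_transform` on the training features and `transform` on the validation and testing features.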

Split data into training, validation, and testing:

In [5]:
# split dataset:
x_train, x_test, y_train, y_test = train_test_split(x_data_scaled, y_data, test_size = 0.2)
x_val, x_test, y_val, y_test = train_test_split(x_test, y_test, test_size = 0.5)

# get split results:
print(f"there are {x_train.shape[0]} training examples")
print(f"there are {x_val.shape[0]} validation examples")
print(f"there are {x_test.shape[0]} testing examples")

# get input shape:
INPUT_SHAPE = x_data.shape[1]

there are 8000 training examples
there are 1000 validation examples
there are 1000 testing examples
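
The split above does not pass `random_state`, so each run of the notebook draws a different 80/10/10 partition and the reported metrics can vary between runs. The sketch below shows the same two-step split with a fixed seed on synthetic arrays. The value of `SEED` and the synthetic data are assumptions made for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

SEED = 42  # hypothetical seed; any fixed integer makes the split repeatable

# synthetic stand-in for the 10,000-row dataset used in this notebook:
x = np.arange(20000, dtype=float).reshape(10000, 2)
y = np.arange(10000, dtype=float)

# 80% training, 20% held out:
x_train, x_hold, y_train, y_hold = train_test_split(
    x, y, test_size=0.2, random_state=SEED)

# split the held-out 20% evenly into validation and testing:
x_val, x_test, y_val, y_test = train_test_split(
    x_hold, y_hold, test_size=0.5, random_state=SEED)

sizes = (len(x_train), len(x_val), len(x_test))
print(sizes)
```

Fixing the split seed does not make training itself deterministic, since weight initialization and dropout also draw random numbers.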


# **Model Exploration:**

Within this section, a function is defined for instantiating models using the Keras API, for use in performing a hyperparameter search to determine the best combination of hyperparameters. The hyperparameters that are being considered are:

* number of hidden layers
* number of hidden neurons
* batch size
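
The three hyperparameters above form a full Cartesian grid. As a minimal sketch, the combinations can be enumerated with `itertools.product`, using the candidate values from the search cell in this notebook. The notebook itself uses nested `for` loops, which visit the combinations in the same order.

```python
from itertools import product

# candidate values, copied from the grid search cell in this notebook:
num_hidden_layers = [2, 3, 4]
num_neurons = [64, 80, 96, 112, 128]
batch_sizes = [128, 160, 192, 224, 256]

# every (layers, neurons, batch size) combination, in nested-loop order:
grid = list(product(num_hidden_layers, num_neurons, batch_sizes))

print(f"{len(grid)} combinations")
print("first:", grid[0], "last:", grid[-1])
```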

In [6]:
# query the user for whether they want to grid search or not:
grid_search = input('perform grid search? (True/False)')

if grid_search.strip().lower() == 'true':
    grid_search = True

    num_hidden_layers = [2,3,4]
    num_neurons = [64, 80, 96, 112, 128]
    batch_sizes = [128, 160, 192, 224, 256]

    combinations = len(num_hidden_layers) * len(num_neurons) * len(batch_sizes)
    LOSS_FUNCTION = 'mse'
    METRICS = ['mae', keras.metrics.RootMeanSquaredError(), keras.metrics.R2Score]
else:
    # assign (rather than compare) so that later cells see a boolean:
    grid_search = False

Define model generation function:

In [7]:
def make_model(layers, neurons, rate, norm, drop):
    # instantiate model:
    model = keras.Sequential()

    # add hidden layers:
    for i in range(layers):
        if i == 0:
            # the input layer is declared once, before the first hidden layer:
            model.add(Input(shape = (INPUT_SHAPE, )))
        model.add(Dense(neurons, activation = 'relu', name = f'hidden_layer_{i+1}'))

        # optional batch normalization after each hidden layer:
        if norm:
            model.add(BatchNormalization(name = f'batch_norm_layer_{i+1}'))

        # optional dropout (rate 0.2) after each hidden layer:
        if drop:
            model.add(Dropout(0.2, name = f'dropout_layer_{i+1}'))
        
    # add output layer:
    model.add(Dense(1, activation = 'linear', name = 'output_layer'))

    # compile the model:
    model.compile(optimizer = Adam(learning_rate = rate),
                  loss = LOSS_FUNCTION,
                  metrics = METRICS)
    
    return model
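
To get a feel for how model size grows across the grid, the sketch below counts the parameters of the architecture that `make_model()` builds, without needing TensorFlow. It assumes the Keras defaults: each `Dense` layer has a bias term, each `BatchNormalization` layer holds four values per neuron (two trainable, two moving statistics), and `Dropout` has none. The function name and the feature count of 4 are hypothetical, since the real input width comes from the dataset.

```python
def count_parameters(n_features, layers, neurons, norm=True):
    """Total parameter count for the stacked Dense (+ BatchNorm) architecture."""
    total = 0
    fan_in = n_features
    for _ in range(layers):
        total += fan_in * neurons + neurons   # Dense weights + biases
        if norm:
            total += 4 * neurons              # gamma, beta, moving mean, moving variance
        fan_in = neurons
    total += fan_in + 1                       # single linear output neuron
    return total

# hypothetical feature count, for illustration only:
N_FEATURES = 4
for layers in (2, 3, 4):
    print(layers, "hidden layers, 128 neurons:",
          count_parameters(N_FEATURES, layers, 128))
```

The result should match the total that `model.summary()` reports for the same settings, provided the actual number of input features is substituted.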

Next, the grid search is performed. For each combination of hyperparameters, this entails:

* Creating a directory to save the search results in
* Creating a model using the aforementioned "make_model()" function
* Saving the parameters used to create the model in a dictionary called "training_params"
* Training the model and saving the final-epoch metrics in a dictionary called "training_results"
* Combining the training parameters and the training results into a single JSON file
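
The bookkeeping part of this process can be sketched without training anything. The example below reduces a Keras-style history dictionary to its final-epoch values and writes it next to the parameters as JSON, mirroring the file layout used in this notebook. The helper name `save_run` and the fake history values are assumptions made for illustration.

```python
import json
import os
import tempfile

def save_run(output_dir, params, history):
    """Write parameters and final-epoch metrics to params_results.json."""
    # keep only the last recorded value of every metric:
    results = {name: values[-1] for name, values in history.items()}
    os.makedirs(output_dir, exist_ok=True)
    path = os.path.join(output_dir, "params_results.json")
    with open(path, "w") as f:
        json.dump({"parameters": params, "results": results}, f, indent=4)
    return path

# fake per-epoch history standing in for history.history:
fake_history = {"loss": [0.5, 0.2, 0.1], "val_loss": [0.6, 0.3, 0.15]}
params = {"num_layers": 3, "num_neurons": 128, "batch_size": 192}

with tempfile.TemporaryDirectory() as tmp:
    path = save_run(os.path.join(tmp, "3_128_192"), params, fake_history)
    with open(path) as f:
        loaded = json.load(f)

print(loaded)
```

Because only the last epoch is stored, a model whose validation loss was lower at an earlier epoch is still recorded by its final value.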

In [8]:
if grid_search == True:
    j = 1
    # set up the grid search:
    for layer in num_hidden_layers:
        for neurons in num_neurons:
            for batch in batch_sizes:
                # update the user:
                print(f"examining model {j}/{combinations}", end = '\r')
                j+=1

                # make directory to save into:
                output_dir = os.path.join(os.getcwd(), 'ann_search_results', f"{layer}_{neurons}_{batch}")
                os.makedirs(output_dir, exist_ok = True)

                # build a model:
                tf.keras.backend.clear_session()
                model = make_model(layer, neurons, 0.001, True, True)

                # save training parameters into a dictionary:
                training_params = {
                    'num_layers' : layer,
                    'num_neurons' : neurons,
                    'num_epochs' : 500,
                    'learning_rate' : 0.001,
                    'use_batch_norm' : True,
                    'use_dropout' : True,
                    'batch_size' : batch
                }

                # train model:
                train_start = time.time()
                history = model.fit(x_train, y_train,
                                    epochs = 500,
                                    batch_size = batch,
                                    validation_data = (x_val, y_val),
                                    verbose = 0)
                
                train_time = time.time() - train_start

                # store training results:
                training_results = {}
                for i in history.history.keys():
                    training_results[i] = history.history[i][-1]
                training_results['train_time'] = train_time

                # save both results to a directory:
                params_path = os.path.join(output_dir, 'params_results.json')
                with open(params_path, "w") as f:
                    json.dump({'parameters': training_params, 'results': training_results}, f, indent = 4)         


examining model 75/75

# **Examine Hyperparameter Search Results:**

This section examines the results collected during the hyperparameter grid search. For each combination of hyperparameters, the training parameters and training results were saved together in a JSON file. The code below iterates through each test folder and amalgamates the results into a Pandas DataFrame for further analysis:

In [9]:
# initialize results list:
results = []
grid_search_folder = os.path.join(os.getcwd(), 'ann_search_results')

for folder in os.listdir(grid_search_folder):
    folder_path = os.path.join(grid_search_folder, folder)
    if os.path.isdir(folder_path):
        params_file = os.path.join(folder_path, 'params_results.json')

        if os.path.exists(params_file):
            with open(params_file, 'r') as f:
                data = json.load(f)

                # flatten the JSON:
                extracted_data = {
                    # training params:
                    'num_layers' : data['parameters']['num_layers'],
                    'num_neurons' : data['parameters']['num_neurons'],
                    'num_epochs' : data['parameters']['num_epochs'],
                    'learning_rate' : data['parameters']['learning_rate'],
                    'batch_size' : data['parameters']['batch_size'],

                    # training results:
                    'train_loss' : data['results']['loss'],
                    'train_mae' : data['results']['mae'],
                    'train_rmse' : data['results']['root_mean_squared_error'],
                    'training_r2' : data['results']['r2_score'],
                    'val_loss' : data['results']['val_loss'],
                    'val_mae' : data['results']['val_mae'],
                    'val_rmse' : data['results']['val_root_mean_squared_error'],
                    'val_r2' : data['results']['val_r2_score'],
                    'training_time' : data['results']['train_time']
                }

                results.append(extracted_data)

# turn the results into a dataframe:
results_df = pd.DataFrame(results)

# insert an identifier for models (numbering follows the order returned by os.listdir):
results_df.insert(0, 'model_name', [f'model {i + 1}' for i in range(len(results_df))])

# save consolidated data into a CSV file:
results_df.to_csv('consolidated_results.csv', index = False)

The best hyperparameter combination must now be determined from these results, which have been consolidated into a single DataFrame. The DataFrame is sorted by each metric in turn. The metrics examined are:

* Training MSE (loss)
* Validation MSE (loss)
* Validation MAE
* Validation RMSE
* Validation $R^{2}$

Sort the consolidated results by the lowest training loss:

In [10]:
results_df.sort_values(by = 'train_loss', ascending = True).head(15)

Unnamed: 0,model_name,num_layers,num_neurons,num_epochs,learning_rate,batch_size,train_loss,train_mae,train_rmse,training_r2,val_loss,val_mae,val_rmse,val_r2,training_time
32,model 33,3,128,500,0.001,192,0.058504,0.187342,0.241875,0.976088,0.006083,0.058063,0.077992,0.9975,43.640987
7,model 8,2,128,500,0.001,192,0.05862,0.187942,0.242116,0.97604,0.010781,0.079529,0.103834,0.99557,36.66288
34,model 35,3,128,500,0.001,256,0.064306,0.196699,0.253586,0.973716,0.011624,0.076821,0.107816,0.995223,41.573914
30,model 31,3,128,500,0.001,128,0.064453,0.19674,0.253876,0.973656,0.006091,0.056632,0.078048,0.997497,50.680075
28,model 29,3,112,500,0.001,224,0.06471,0.196105,0.254381,0.973551,0.007304,0.061179,0.085461,0.996999,39.758711
3,model 4,2,112,500,0.001,224,0.066382,0.199701,0.257647,0.972867,0.01226,0.084722,0.110723,0.994962,33.373857
58,model 59,4,128,500,0.001,224,0.066464,0.198475,0.257807,0.972834,0.009236,0.073543,0.096104,0.996205,50.13314
26,model 27,3,112,500,0.001,160,0.06654,0.200216,0.257953,0.972803,0.02016,0.11903,0.141988,0.991716,42.276719
6,model 7,2,128,500,0.001,160,0.066688,0.201223,0.258239,0.972743,0.008808,0.069035,0.093854,0.99638,36.855689
5,model 6,2,128,500,0.001,128,0.066884,0.202266,0.25862,0.972662,0.008009,0.066157,0.089494,0.996709,41.811762


Sort the consolidated results by lowest validation loss:

In [11]:
results_df.sort_values(by = 'val_loss', ascending = True).head(15)

Unnamed: 0,model_name,num_layers,num_neurons,num_epochs,learning_rate,batch_size,train_loss,train_mae,train_rmse,training_r2,val_loss,val_mae,val_rmse,val_r2,training_time
32,model 33,3,128,500,0.001,192,0.058504,0.187342,0.241875,0.976088,0.006083,0.058063,0.077992,0.9975,43.640987
30,model 31,3,128,500,0.001,128,0.064453,0.19674,0.253876,0.973656,0.006091,0.056632,0.078048,0.997497,50.680075
28,model 29,3,112,500,0.001,224,0.06471,0.196105,0.254381,0.973551,0.007304,0.061179,0.085461,0.996999,39.758711
29,model 30,3,112,500,0.001,256,0.071197,0.206337,0.266828,0.970899,0.007785,0.065483,0.08823,0.996801,38.954036
59,model 60,4,128,500,0.001,256,0.069241,0.20391,0.263137,0.971699,0.007786,0.064601,0.088236,0.996801,49.148077
51,model 52,4,112,500,0.001,160,0.072719,0.209997,0.269665,0.970277,0.007892,0.066958,0.088839,0.996757,47.082975
5,model 6,2,128,500,0.001,128,0.066884,0.202266,0.25862,0.972662,0.008009,0.066157,0.089494,0.996709,41.811762
31,model 32,3,128,500,0.001,160,0.068121,0.203978,0.261001,0.972156,0.0084,0.069874,0.09165,0.996548,43.757855
6,model 7,2,128,500,0.001,160,0.066688,0.201223,0.258239,0.972743,0.008808,0.069035,0.093854,0.99638,36.855689
48,model 49,3,96,500,0.001,224,0.070837,0.205411,0.266152,0.971047,0.009177,0.070226,0.095796,0.996229,37.626962


Sort the consolidated results by the lowest validation MAE:

In [12]:
results_df.sort_values(by = 'val_mae', ascending = True).head(5)

Unnamed: 0,model_name,num_layers,num_neurons,num_epochs,learning_rate,batch_size,train_loss,train_mae,train_rmse,training_r2,val_loss,val_mae,val_rmse,val_r2,training_time
30,model 31,3,128,500,0.001,128,0.064453,0.19674,0.253876,0.973656,0.006091,0.056632,0.078048,0.997497,50.680075
32,model 33,3,128,500,0.001,192,0.058504,0.187342,0.241875,0.976088,0.006083,0.058063,0.077992,0.9975,43.640987
28,model 29,3,112,500,0.001,224,0.06471,0.196105,0.254381,0.973551,0.007304,0.061179,0.085461,0.996999,39.758711
59,model 60,4,128,500,0.001,256,0.069241,0.20391,0.263137,0.971699,0.007786,0.064601,0.088236,0.996801,49.148077
29,model 30,3,112,500,0.001,256,0.071197,0.206337,0.266828,0.970899,0.007785,0.065483,0.08823,0.996801,38.954036


Sort the consolidated results by the lowest validation RMSE:

In [13]:
results_df.sort_values(by = 'val_rmse', ascending = True).head(5)

Unnamed: 0,model_name,num_layers,num_neurons,num_epochs,learning_rate,batch_size,train_loss,train_mae,train_rmse,training_r2,val_loss,val_mae,val_rmse,val_r2,training_time
32,model 33,3,128,500,0.001,192,0.058504,0.187342,0.241875,0.976088,0.006083,0.058063,0.077992,0.9975,43.640987
30,model 31,3,128,500,0.001,128,0.064453,0.19674,0.253876,0.973656,0.006091,0.056632,0.078048,0.997497,50.680075
28,model 29,3,112,500,0.001,224,0.06471,0.196105,0.254381,0.973551,0.007304,0.061179,0.085461,0.996999,39.758711
29,model 30,3,112,500,0.001,256,0.071197,0.206337,0.266828,0.970899,0.007785,0.065483,0.08823,0.996801,38.954036
59,model 60,4,128,500,0.001,256,0.069241,0.20391,0.263137,0.971699,0.007786,0.064601,0.088236,0.996801,49.148077


Sort the consolidated results by the highest validation $R^{2}$:

In [14]:
results_df.sort_values(by = 'val_r2', ascending = False).head(5)

Unnamed: 0,model_name,num_layers,num_neurons,num_epochs,learning_rate,batch_size,train_loss,train_mae,train_rmse,training_r2,val_loss,val_mae,val_rmse,val_r2,training_time
32,model 33,3,128,500,0.001,192,0.058504,0.187342,0.241875,0.976088,0.006083,0.058063,0.077992,0.9975,43.640987
30,model 31,3,128,500,0.001,128,0.064453,0.19674,0.253876,0.973656,0.006091,0.056632,0.078048,0.997497,50.680075
28,model 29,3,112,500,0.001,224,0.06471,0.196105,0.254381,0.973551,0.007304,0.061179,0.085461,0.996999,39.758711
29,model 30,3,112,500,0.001,256,0.071197,0.206337,0.266828,0.970899,0.007785,0.065483,0.08823,0.996801,38.954036
59,model 60,4,128,500,0.001,256,0.069241,0.20391,0.263137,0.971699,0.007786,0.064601,0.088236,0.996801,49.148077


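Sort the consolidated results by lowest training loss, breaking ties by lowest validation loss, validation MAE, validation RMSE, and highest validation $R^{2}$ (a multi-key sort is lexicographic, so the later keys only matter when the earlier ones are equal):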

In [15]:
results_df.sort_values(by = ['train_loss', 'val_loss', 'val_mae', 'val_rmse', 'val_r2'], ascending = [True, True, True, True, False]).head(3)

Unnamed: 0,model_name,num_layers,num_neurons,num_epochs,learning_rate,batch_size,train_loss,train_mae,train_rmse,training_r2,val_loss,val_mae,val_rmse,val_r2,training_time
32,model 33,3,128,500,0.001,192,0.058504,0.187342,0.241875,0.976088,0.006083,0.058063,0.077992,0.9975,43.640987
7,model 8,2,128,500,0.001,192,0.05862,0.187942,0.242116,0.97604,0.010781,0.079529,0.103834,0.99557,36.66288
34,model 35,3,128,500,0.001,256,0.064306,0.196699,0.253586,0.973716,0.011624,0.076821,0.107816,0.995223,41.573914
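
Because a multi-key sort is effectively decided by its first key, one alternative way to combine several metrics is to rank the models on each metric separately and average the ranks. This is a swapped-in aggregation shown only as a sketch, not the method used in this notebook. The four rows are copied from the tables above, and ties are not given special treatment.

```python
# validation metrics copied from the result tables above:
rows = {
    "model 33": {"val_loss": 0.006083, "val_mae": 0.058063, "val_rmse": 0.077992, "val_r2": 0.9975},
    "model 31": {"val_loss": 0.006091, "val_mae": 0.056632, "val_rmse": 0.078048, "val_r2": 0.997497},
    "model 29": {"val_loss": 0.007304, "val_mae": 0.061179, "val_rmse": 0.085461, "val_r2": 0.996999},
    "model 8":  {"val_loss": 0.010781, "val_mae": 0.079529, "val_rmse": 0.103834, "val_r2": 0.99557},
}

# True means lower values are better for that metric:
LOWER_IS_BETTER = {"val_loss": True, "val_mae": True, "val_rmse": True, "val_r2": False}

def mean_ranks(rows, directions):
    """Average each model's rank (1 = best) over all metrics."""
    totals = {name: 0 for name in rows}
    for metric, lower_is_better in directions.items():
        ordered = sorted(rows, key=lambda name: rows[name][metric],
                         reverse=not lower_is_better)
        for rank, name in enumerate(ordered, start=1):
            totals[name] += rank
    return {name: total / len(directions) for name, total in totals.items()}

ranks = mean_ranks(rows, LOWER_IS_BETTER)
best = min(ranks, key=ranks.get)
print(ranks, best)
```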


# **Results & Conclusions - Grid Search V2:**

From this analysis, the best performing model was ***model 59***, which had four hidden layers of 128 neurons each and was trained for 500 epochs using a learning rate of 0.001 and a batch size of 224. This model had the lowest training MSE, meaning that it performed quite well on the dataset provided, and since its validation loss was lower than its training loss, it was able to generalize to the unseen data.

The best models from the two grid searches can be compared metric by metric:

* **Training MSE:** it was found that the best model from the first grid search had a training MSE of 0.068534, whereas the best model from the second grid search had a training MSE of 0.043347, resulting in a **reduction of 36.75%**.

* **Validation MSE:** it was found that the best model from the first grid search had a validation MSE of 0.011547, whereas the best model from the second grid search had a validation MSE of 0.004875, resulting in a **reduction of 57.78%**.

* **Validation MAE:** it was found that the best model from the first grid search had a validation MAE of 0.079588, whereas the best model from the second grid search had a validation MAE of 0.050125, resulting in a **reduction of 37.02%**.

* **Validation RMSE:** it was found that the best model from the first grid search had a validation RMSE of 0.107457, whereas the best model from the second grid search had a validation RMSE of 0.069818, resulting in a **reduction of 35.03%**.

* **Validation $R^{2}$:** it was found that the best model from the first grid search had a validation $R^{2}$ value of 0.994909, whereas the best model from the second grid search had a validation $R^{2}$ value of 0.997825, resulting in a difference of $2.916 \times 10^{-3}$.
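
The percentage reductions above are relative changes, computed as (first search value minus second search value) divided by the first search value. The sketch below reproduces that arithmetic from the figures quoted in the list. The helper name `relative_reduction` is an assumption made for illustration.

```python
def relative_reduction(before, after):
    """Percentage by which `after` is lower than `before`."""
    return (before - after) / before * 100.0

# (first grid search, second grid search) values quoted in the list above:
comparisons = {
    "training MSE": (0.068534, 0.043347),
    "validation MSE": (0.011547, 0.004875),
    "validation MAE": (0.079588, 0.050125),
    "validation RMSE": (0.107457, 0.069818),
}

reductions = {name: round(relative_reduction(before, after), 2)
              for name, (before, after) in comparisons.items()}

# R^2 is compared as an absolute difference rather than a percentage:
r2_gain = round(0.997825 - 0.994909, 6)

for name, value in reductions.items():
    print(f"{name}: {value}% reduction")
print(f"validation R2 difference: {r2_gain}")
```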

Therefore, the best model that has been determined thus far is ***model 59***, which boasts impressive reductions in MSE, MAE, and RMSE compared to its predecessors. These hyperparameters will be chosen for the ANN trained on FIS data:

* 4 layers
* 128 neurons per layer
* trained for 500 epochs
* trained with a learning rate of 0.001
* trained with a batch size of 224