# Neural Networks

One of the models we implemented for lung cancer classification was a multilayer perceptron (MLP), a type of feedforward neural network.  
MLPs are well suited to classification problems on tabular data, such as the features CSV file we are working with.
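
To make the architecture concrete, here is a minimal pure-Python sketch of the forward pass of an MLP with one hidden layer (ReLU) and a two-class softmax output. The function names and the toy weights are illustrative assumptions; the notebook itself builds the model with Keras.

```python
import math


def relu(values):
    """Element-wise ReLU: negative activations are clipped to zero."""
    return [max(0.0, v) for v in values]


def softmax(values):
    """Numerically stable softmax: subtract the max before exponentiating."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dense(inputs, weights, bias):
    """Fully connected layer; `weights` holds one row per output unit."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, bias)
    ]


def mlp_forward(inputs, w_hidden, b_hidden, w_out, b_out):
    """Input -> hidden layer (ReLU) -> output layer (softmax probabilities)."""
    hidden = relu(dense(inputs, w_hidden, b_hidden))
    return softmax(dense(hidden, w_out, b_out))


# Toy example: 3 input features, 2 hidden units, 2 output classes.
# All numbers below are made up for illustration.
x = [1.0, -2.0, 0.5]
w_hidden = [[0.2, -0.1, 0.4], [-0.3, 0.8, 0.1]]
b_hidden = [0.0, 0.1]
w_out = [[1.0, -1.0], [-1.0, 1.0]]
b_out = [0.0, 0.0]

probs = mlp_forward(x, w_hidden, b_hidden, w_out, b_out)
print(probs)  # two class probabilities that sum to 1
```

Information flows in one direction only, from inputs to outputs, which is what "feedforward" refers to.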

We start by importing the relevant libraries and dropping the `id` column, which we do not use, from our CSV.

In [1]:
from kfold_and_metrics import *

import tensorflow as tf
import pandas as pd
import numpy as np

tf.keras.backend.clear_session()

In [2]:
df = pd.read_csv("final.csv")
df = df.drop(columns=['id'])
df.head()

Unnamed: 0,patient_id,diagnostics_Image-original_Mean,diagnostics_Mask-original_VoxelNum,diagnostics_Mask-original_VolumeNum,diagnostics_Image-interpolated_Mean,diagnostics_Image-interpolated_Minimum,diagnostics_Image-interpolated_Maximum,diagnostics_Mask-interpolated_VoxelNum,diagnostics_Mask-interpolated_VolumeNum,diagnostics_Mask-interpolated_Maximum,...,diagnostics_Mask-interpolated_BoundingBox_2,diagnostics_Mask-interpolated_BoundingBox_3,diagnostics_Mask-interpolated_BoundingBox_4,diagnostics_Mask-interpolated_BoundingBox_5,diagnostics_Mask-interpolated_CenterOfMassIndex_0,diagnostics_Mask-interpolated_CenterOfMassIndex_1,diagnostics_Mask-interpolated_CenterOfMassIndex_2,diagnostics_Mask-interpolated_CenterOfMass_0,diagnostics_Mask-interpolated_CenterOfMass_1,diagnostics_Mask-interpolated_CenterOfMass_2
0,LIDC-IDRI-0001,-826.943929,5905,1,-417.494203,-990.291016,1038.270874,909,2,237.087921,...,0.0,13.0,11.0,10.0,17.041265,16.108666,4.184319,128.652843,34.787644,-229.881362
1,LIDC-IDRI-0001,-826.943929,4613,1,-405.581777,-982.456726,949.768005,699,1,221.953705,...,0.0,13.0,11.0,10.0,17.041265,16.108666,4.184319,128.652843,34.787644,-229.881362
2,LIDC-IDRI-0001,-826.943929,4955,1,-410.236759,-990.291016,1038.270874,772,1,237.087921,...,0.0,13.0,11.0,10.0,17.041265,16.108666,4.184319,128.652843,34.787644,-229.881362
3,LIDC-IDRI-0001,-826.943929,5498,1,-416.576321,-990.291016,1038.270874,841,2,237.087921,...,0.0,13.0,11.0,10.0,17.041265,16.108666,4.184319,128.652843,34.787644,-229.881362
4,LIDC-IDRI-0002,-826.943929,10351,1,-546.359139,-1007.657349,1020.174988,749,1,160.687653,...,0.0,13.0,11.0,10.0,17.041265,16.108666,4.184319,128.652843,34.787644,-229.881362


### Hyperparameter Tuning

A common rule of thumb in machine learning is that a single hidden layer is sufficient to build a good model for many problems.

Taking this into consideration, we still needed to decide how many nodes that layer should hold while avoiding both underfitting and overfitting. To do so, we tested values up to 2/3 of the number of input nodes, one of several rules of thumb for sizing a hidden layer.
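
As a rough illustration of the 2/3 rule, the sketch below lists candidate hidden-layer sizes for a given number of inputs. The helper name `candidate_hidden_sizes` and the choice to start at `step` (so that every candidate has at least one unit) are assumptions made for this example, not the notebook's exact search range.

```python
def candidate_hidden_sizes(n_inputs, step=10):
    """Return hidden-layer sizes from `step` up to 2/3 of the input count."""
    upper = n_inputs * 2 // 3  # integer form of the 2/3 rule of thumb
    # Start at `step` rather than 0 so every candidate has at least one unit.
    return list(range(step, upper + 1, step))


sizes = candidate_hidden_sizes(50)
print(sizes)  # [10, 20, 30]
```

With 50 inputs the upper bound is 33, so a step of 10 yields three candidates; a smaller step gives a finer search at the cost of more training runs.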

Additionally, we tested different learning rates for our stochastic gradient descent (SGD) optimizer and different activation functions for the hidden layer.

In [4]:
best_auc = 0
best = {}

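# input_nodes = len(df.columns)-2
input_nodes = 50  # model input size; equals the pca_components value used below
max_n_nodes = input_nodes * 2//3  # 2/3 rule of thumb: 33 for 50 inputs

for hidden_layer_act in ["softmax", "relu", "sigmoid"]:
    # Note: the range starts at 0, so the first candidate is a hidden layer with zero units
    for n_nodes in range(0, max_n_nodes, 10):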
        for l_rate in [0.001, 0.003, 0.005, 0.007, 0.01]:
            params = {'hidden_layer_nodes': n_nodes, 'hidden_layer_activation': hidden_layer_act, 'learning_rate': l_rate}
            print("Current parameter combination:")
            for parameter, value in params.items():
                print(f"\t{parameter}: {value}")
            print()

            nn_model = tf.keras.models.Sequential([
                tf.keras.layers.Input((50,), name="input"),
                tf.keras.layers.Dense(n_nodes,activation=hidden_layer_act),
                tf.keras.layers.Dense(2,activation='softmax')
            ])
            nn_model.compile(
                optimizer=tf.keras.optimizers.SGD(l_rate), 
                loss=tf.keras.losses.SparseCategoricalCrossentropy(), 
                metrics=[tf.keras.metrics.SparseCategoricalAccuracy()]
            )

            score = k_fold_cv_keras(compiled_model=nn_model, df=df, pca_components=50)
            results = mean_std_results_k_fold_CV(score)

            auc_avg = results.iloc[2,1]  # row 2 = roc_auc_score, column 1 = mean
            if auc_avg > best_auc:
                best_auc = auc_avg
                best = params

Current parameter combination:
	hidden_layer_nodes: 0
	hidden_layer_activation: softmax
	learning_rate: 0.001

Current parameter combination:
	hidden_layer_nodes: 0
	hidden_layer_activation: softmax
	learning_rate: 0.003

Current parameter combination:
	hidden_layer_nodes: 0
	hidden_layer_activation: softmax
	learning_rate: 0.005

Current parameter combination:
	hidden_layer_nodes: 0
	hidden_layer_activation: softmax
	learning_rate: 0.007

Current parameter combination:
	hidden_layer_nodes: 0
	hidden_layer_activation: softmax
	learning_rate: 0.01

Current parameter combination:
	hidden_layer_nodes: 10
	hidden_layer_activation: softmax
	learning_rate: 0.001

Current parameter combination:
	hidden_layer_nodes: 10
	hidden_layer_activation: softmax
	learning_rate: 0.003

Current parameter combination:
	hidden_layer_nodes: 10
	hidden_layer_activation: softmax
	learning_rate: 0.005

Current parameter combination:
	hidden_layer_nodes: 10
	hidden_layer_activation: softmax
	learning_rate: 0.007
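
The three nested loops above amount to a grid search. One way to express the same enumeration more compactly is `itertools.product`, as in the sketch below. The names `iter_param_grid`, `select_best`, and `score_fn` are hypothetical, and the dummy scoring function stands in for the cross-validated AUC; no model is trained here.

```python
from itertools import product


def iter_param_grid(activations, node_counts, learning_rates):
    """Yield one parameter dict per combination, in nested-loop order."""
    for act, n_nodes, l_rate in product(activations, node_counts, learning_rates):
        yield {
            "hidden_layer_nodes": n_nodes,
            "hidden_layer_activation": act,
            "learning_rate": l_rate,
        }


def select_best(grid, score_fn):
    """Return the combination with the highest score (first one wins ties)."""
    return max(grid, key=score_fn)


activations = ["softmax", "relu", "sigmoid"]
node_counts = [10, 20, 30]  # illustrative values, not the notebook's exact range
learning_rates = [0.001, 0.003, 0.005, 0.007, 0.01]

grid = list(iter_param_grid(activations, node_counts, learning_rates))
print(len(grid))  # 45 combinations: 3 activations x 3 sizes x 5 learning rates


# Dummy score for demonstration only: it simply favours larger learning rates.
def dummy_score(params):
    return params["learning_rate"]


print(select_best(grid, dummy_score))
```

In practice `score_fn` would build, compile, and cross-validate a fresh model for each combination and return its mean AUC.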

After running the grid search, we obtained the following best parameters:

In [None]:
print("Results of the grid search hyperparameter tuning:")
for parameter, value in best.items():
    print(f"\t{parameter}: {value}")

Results of the grid search hyperparameter tuning:
	hidden_layer_nodes: 25
	hidden_layer_activation: relu
	learning_rate: 0.01


In [6]:
best = {
    'hidden_layer_nodes': 25,
    'hidden_layer_activation': "relu",
    'learning_rate': 0.01
}

nn_model = tf.keras.models.Sequential([
    tf.keras.layers.Input((50,), name="input"),
    tf.keras.layers.Dense(best['hidden_layer_nodes'], activation=best['hidden_layer_activation']),
    tf.keras.layers.Dense(2,activation='softmax')
])

nn_model.compile(
    optimizer=tf.keras.optimizers.SGD(best['learning_rate']), 
    loss=tf.keras.losses.SparseCategoricalCrossentropy(), 
    metrics=[tf.keras.metrics.SparseCategoricalAccuracy()]
)

score = k_fold_cv_keras(compiled_model=nn_model, df=df, pca_components=50)
results = mean_std_results_k_fold_CV(score)
results

|   | metric         | mean     | std      |
|---|----------------|----------|----------|
| 0 | f1_score       | 0.510362 | 0.094239 |
| 1 | accuracy_score | 0.693754 | 0.027423 |
| 2 | roc_auc_score  | 0.638943 | 0.047586 |
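
The mean and standard deviation reported above summarize the per-fold scores of the cross-validation. The implementation of `mean_std_results_k_fold_CV` is not shown in this notebook, so the following is only a sketch of how such an aggregation might look. The function name `summarize_folds`, the use of the population standard deviation, and the fold scores are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def summarize_folds(fold_scores):
    """Aggregate per-fold metrics into {metric: (mean, std)}.

    `fold_scores` is a list with one dict per fold, mapping metric name to value.
    The population standard deviation is an assumption; the real helper may
    use the sample standard deviation instead.
    """
    metrics = fold_scores[0].keys()
    summary = {}
    for metric in metrics:
        values = [fold[metric] for fold in fold_scores]
        summary[metric] = (mean(values), pstdev(values))
    return summary


# Made-up scores for two folds, not results from this notebook.
example_folds = [
    {"accuracy_score": 0.5, "roc_auc_score": 0.6},
    {"accuracy_score": 0.7, "roc_auc_score": 0.8},
]

summary = summarize_folds(example_folds)
for metric, (avg, std) in summary.items():
    print(f"{metric}: mean={avg:.3f}, std={std:.3f}")
```

A large standard deviation relative to the mean suggests that performance depends strongly on which samples land in each fold.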


The choice of a single hidden layer and the 2/3 rule of thumb are discussed in the following resources: