# Task

1. Import necessary packages, including TensorFlow, and access your Google Drive.
2. Load the top 30 selected features and standardize them.
3. Prepare the training and test datasets, along with their labels, for ANN training.
4. Perform a grid search for ANN hyperparameters:
  - In this tutorial, the target hyperparameters are 'activation functions', 'number of hidden layers', and 'learning rate'.
  - Set up to 20 different hyperparameter combinations.
  - Train each model for up to 100 epochs.
  - Use up to 30 neurons for each hidden layer.
  - Create temporary ANN models within the loop using the provided '*ANN_model*' function.
  - Store the performance (diagnostic accuracy) of each temporary model in a DataFrame (Accuracy_df).
5. Evaluate the best ANN model based on the confusion matrix and other evaluation metrics:
  - Consider the first row of the sorted 'Accuracy_df' as the best case.
  - Load and use the best model to make predictions on the test dataset.
  - Calculate and display the confusion matrix and evaluation metrics (accuracy, precision, recall, and F1 score) for the best model.

- *Refer to ML6_Code1 and ML7_Code1*

## Prepare Data and Labels for ANN



## Grid search for Artificial Neural Network (ANN) hyperparameters

### [Main hyperparameters of ANN]

1. **Number of hidden layers**: The number of hidden layers in an ANN determines the depth of the network. A deeper network can learn more complex patterns and representations of the data. However, increasing the number of hidden layers can also make the network more prone to overfitting and increase the computational cost.

.

2. **Number of neurons per hidden layer**: The number of neurons in each hidden layer determines the width of the network. A wider network can learn more complex representations of the data, but it also has more trainable parameters, which raises the computational cost and the risk of overfitting.
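
To make the effect of depth and width on model size concrete, the sketch below counts the trainable parameters of a fully connected network. It assumes 30 input features (the top 30 selected features), 30 neurons per hidden layer, and 2 output units; `count_dense_params` is a hypothetical helper written for this illustration, not part of the tutorial code.

```python
def count_dense_params(n_inputs, n_hidden_layers, n_neurons, n_outputs):
    """Count the weights and biases of a fully connected (Dense) network."""
    sizes = [n_inputs] + [n_neurons] * n_hidden_layers + [n_outputs]
    # Each Dense layer holds fan_in * fan_out weights plus fan_out biases.
    return sum(fan_in * fan_out + fan_out
               for fan_in, fan_out in zip(sizes[:-1], sizes[1:]))

# Assumed sizes: 30 inputs, 30 neurons per hidden layer, 2 outputs.
for depth in (2, 3, 5, 10):
    print(depth, "hidden layers:", count_dense_params(30, depth, 30, 2), "parameters")
```

With these assumed sizes, each extra hidden layer adds 930 parameters (30 x 30 weights plus 30 biases), so the count grows linearly with depth and roughly quadratically with width.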

.

3. **Activation functions**: Activation functions introduce non-linearity into the network, allowing it to learn complex patterns and representations. Common activation functions include:

  - Sigmoid: $f(x) = \frac{1}{1 + e^{-x}}$
  - Hyperbolic Tangent (tanh): $f(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$
  - Rectified Linear Unit (ReLU): $f(x) = \max(0, x)$
  - Leaky ReLU: $f(x) = \max(\alpha x, x)$, where $\alpha$ is a small constant (e.g., 0.01)
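
The four formulas above can be written directly in plain Python. This is only a scalar sketch for building intuition; in the tutorial itself, Keras applies these functions when you pass a name such as `'relu'` to a layer.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))

def relu(x):
    return max(0.0, x)

def leaky_relu(x, alpha=0.01):
    # For 0 < alpha < 1, max(alpha * x, x) equals x when x >= 0 and alpha * x otherwise.
    return max(alpha * x, x)

for x in (-2.0, 0.0, 2.0):
    print(x, round(sigmoid(x), 4), round(tanh(x), 4), relu(x), leaky_relu(x))
```

Note how sigmoid and tanh saturate for large |x|, while ReLU passes positive inputs through unchanged and Leaky ReLU keeps a small slope for negative inputs.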

.

4. **Loss function**: The loss function measures the difference between the predicted output and the actual output (target) for each data point. The goal of training an ANN is to minimize the loss function. Common loss functions for classification tasks include:

  - *(Categorical) Cross-Entropy Loss*: For multi-class classification problems.
  - *Binary Cross-Entropy Loss*: For binary classification problems.
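
As a rough illustration of how cross-entropy penalizes predictions, the sketch below computes both losses for a single sample in plain Python. The function names and the clipping constant `eps` are assumptions made for this example; Keras provides its own implementations in `keras.losses`.

```python
import math

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Loss for one sample: y_true is one-hot, y_pred holds class probabilities."""
    # Clip probabilities away from 0 so that log() stays finite.
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

def binary_cross_entropy(y_true, p, eps=1e-12):
    """Loss for one sample: y_true is 0 or 1, p is the predicted probability of class 1."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))

# The true class is index 1 in both cases.
confident_right = categorical_cross_entropy([0, 1], [0.1, 0.9])
confident_wrong = categorical_cross_entropy([0, 1], [0.9, 0.1])
print(round(confident_right, 4), round(confident_wrong, 4))
```

A confident correct prediction gives a small loss, while a confident wrong prediction gives a much larger one. For two classes, the categorical form with one-hot labels and the binary form give the same value.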

.

5. **Optimizer**: The optimizer is an algorithm used to update the weights of the network during training to minimize the loss function. Common optimizers include:

  - *Stochastic Gradient Descent (SGD)*: The simplest optimizer. It updates the weights using the gradient from a single data point (or, in practice, a small mini-batch) and can be slow to converge.
  - *Momentum*: An extension of SGD with a momentum term that accumulates past gradients, which speeds up progress along consistent directions and dampens oscillations.
  - *RMSProp*: An improvement over AdaGrad that scales each update by a moving average of squared gradients, which avoids AdaGrad's continually shrinking learning rate. It is suitable for non-stationary optimization problems.
  - *Adam*: Combines the benefits of momentum and RMSProp by keeping an adaptive learning rate for each weight. It is a popular default in deep learning.
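
The update rules behind these optimizers can be sketched on a one-parameter toy problem. The code below uses the textbook form of each rule to minimize $f(w) = w^2$; the starting point, step count, and hyperparameter values are assumptions chosen for illustration, and the Keras implementations differ in detail.

```python
import math

def grad(w):
    # Gradient of the toy loss f(w) = w**2, whose minimum is at w = 0.
    return 2.0 * w

def run(step_fn, w0=5.0, steps=200):
    """Apply one optimizer's update rule repeatedly and return the final weight."""
    w, state = w0, {}
    for t in range(1, steps + 1):
        w = step_fn(w, grad(w), state, t)
    return w

def sgd(w, g, state, t, lr=0.1):
    return w - lr * g

def momentum(w, g, state, t, lr=0.1, beta=0.9):
    state["v"] = beta * state.get("v", 0.0) + g  # velocity: decaying sum of past gradients
    return w - lr * state["v"]

def rmsprop(w, g, state, t, lr=0.1, rho=0.9, eps=1e-8):
    state["s"] = rho * state.get("s", 0.0) + (1 - rho) * g * g  # moving average of g^2
    return w - lr * g / (math.sqrt(state["s"]) + eps)

def adam(w, g, state, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    state["m"] = b1 * state.get("m", 0.0) + (1 - b1) * g      # first moment (momentum)
    state["v"] = b2 * state.get("v", 0.0) + (1 - b2) * g * g  # second moment (RMSProp-like)
    m_hat = state["m"] / (1 - b1 ** t)  # bias correction for the zero initialization
    v_hat = state["v"] / (1 - b2 ** t)
    return w - lr * m_hat / (math.sqrt(v_hat) + eps)

results = {fn.__name__: run(fn) for fn in (sgd, momentum, rmsprop, adam)}
for name, w in results.items():
    print(f"{name:9s} final w = {w:.6f}")
```

All four rules move `w` from 5.0 toward the minimum at 0. How quickly and how smoothly they get there depends on the assumed settings, so this toy comparison should not be read as a ranking of the optimizers.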

.

6. **Learning rate**: The learning rate is a hyperparameter that controls the step size of the weight updates during training. A smaller learning rate will lead to slower convergence, while a larger learning rate may cause the model to overshoot the optimal weights and not converge at all.
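
The trade-off can be seen with plain gradient descent on the toy loss $f(w) = w^2$. The three learning rates and the starting point below are assumed values picked to show slow convergence, good convergence, and divergence; they are not recommendations for the ANN in this tutorial.

```python
def gradient_descent(lr, w0=5.0, steps=50):
    """Minimize f(w) = w**2 with a fixed learning rate and return the final weight."""
    w = w0
    for _ in range(steps):
        w = w - lr * 2.0 * w  # the gradient of w**2 is 2w
    return w

final = {lr: gradient_descent(lr) for lr in (0.01, 0.1, 1.1)}
for lr, w in final.items():
    print(f"lr={lr}: final w = {w:.6g}")
```

With `lr=0.01` the weight is still far from 0 after 50 steps, with `lr=0.1` it has essentially converged, and with `lr=1.1` every step overshoots the minimum by more than it corrects, so `|w|` grows instead of shrinking.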

.

7. **Epochs**: The number of epochs is the number of times the entire training dataset is passed through the network during training. Too few epochs can lead to underfitting, while too many epochs can lead to overfitting.

### Prepare lists of hyperparameters for grid search

In [None]:
# Hyperparameters for grid search
param_ActFn = [] # activation function      (e.g., ['relu', 'tanh', 'sigmoid'])
                 # you can just put names of activation functions
                 # Refer to names: https://keras.io/api/layers/activations/
param_Layer = [] # number of hidden layers  (e.g., [2, 3, 5, 10])
param_Lrate = [] # learning rate            (e.g., [0.0001, 0.001, 0.01])

# Fixed hyperparameters
noOfNeuron = 
Epoch      = 

In [None]:
# Define a function that builds and compiles an ANN for one hyperparameter combination
from tensorflow import keras  # harmless if keras was already imported in an earlier cell

def ANN_model(input_data, noOfNeuron, temp_actfn, temp_layer, temp_lrate):
    """Build a fully connected classifier with `temp_layer` hidden layers of `noOfNeuron` units each."""
    keras.backend.clear_session()  # Reset Keras global state left over from previously built models

    model = keras.Sequential()
    model.add(keras.Input(shape=(input_data.shape[1],)))  # Input: one value per feature column

    for i in range(temp_layer):
        model.add(keras.layers.Dense(units=noOfNeuron, activation=temp_actfn, name=f'Hidden{i+1}'))  # Hidden layer

    model.add(keras.layers.Dense(units=2, activation='softmax', name='Output'))  # Output layer: 2 class probabilities

    # CategoricalCrossentropy expects one-hot encoded labels of shape (n_samples, 2)
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=temp_lrate),
                  loss=keras.losses.CategoricalCrossentropy(),
                  metrics=['accuracy'])
    return model
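
Because the output layer has two softmax units and the loss is categorical cross-entropy, the labels passed to training need to be one-hot encoded. The plain-Python sketch below shows what that encoding looks like; `to_one_hot` is a hypothetical helper for illustration, and in a Keras workflow `keras.utils.to_categorical` is the usual way to do this.

```python
def to_one_hot(labels, num_classes=2):
    """Turn integer class labels (0, 1, ...) into one-hot rows."""
    return [[1.0 if k == label else 0.0 for k in range(num_classes)]
            for label in labels]

def from_one_hot(rows):
    """Recover the class index as the position of the largest value (argmax)."""
    return [max(range(len(row)), key=row.__getitem__) for row in rows]

example_labels = [0, 1, 1, 0]  # made-up labels for illustration
encoded = to_one_hot(example_labels)
print(encoded)
print(from_one_hot(encoded))
```

The same argmax step converts the model's softmax outputs back into predicted class indices before computing a confusion matrix.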

### Train the ANN models with different combinations of hyperparameters and save them
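
Before writing the training loop, it can help to check how many models the grid will produce, since the task allows at most 20 combinations. The sketch below enumerates a grid with `itertools.product`; the `example_*` lists are illustrative values only, not the intended answer for `param_ActFn`, `param_Layer`, and `param_Lrate`.

```python
from itertools import product

# Illustrative values only (assumptions for this sketch)
example_ActFn = ['relu', 'tanh']
example_Layer = [2, 3, 5]
example_Lrate = [0.0001, 0.001, 0.01]

# Every (activation, layers, learning rate) triple is one model to train.
combos = list(product(example_ActFn, example_Layer, example_Lrate))
print(len(combos), "combinations")  # 2 * 3 * 3 = 18

for actfn, layer, lrate in combos[:3]:
    print(actfn, layer, lrate)
```

Three nested `for` loops over the same lists visit the combinations in the same order. The grid size is the product of the list lengths, so adding one more value to any list can quickly push the total past the limit.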

In [None]:









Accuracy_df = 

In [None]:
Accuracy_df_sorted = 

### Confusion matrix and evaluation metrics for the best ANN model
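
As a reminder of how the evaluation metrics follow from the confusion matrix, the sketch below computes them by hand for a binary problem. The labels and predictions are made-up values for illustration, and the helper names are hypothetical; in the notebook, the predictions would come from the best model, and library functions such as those in `sklearn.metrics` can do the same calculation.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn

def metrics(tp, fp, fn, tn):
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1

# Made-up example: 4 positive and 6 negative samples
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy, precision, recall, f1 = metrics(tp, fp, fn, tn)
print("TP, FP, FN, TN =", tp, fp, fn, tn)
print(accuracy, precision, recall, f1)
```

In this example one positive sample is missed and one negative sample is flagged, so precision and recall are both 3/4 while accuracy is 8/10.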