###  NN-Model based classification using Pytorch with a tabular vector borne disease dataset

#### TEAM MEMBERS:
* #### Nnamdi Ogbonnaya Odii     (11160333)
* #### Maryam Jahangir           (11159172)
* #### Niels

#### 1.0 The problem

The "Classification with a Tabular Vector Borne Disease Dataset" competition is part of Kaggle's Playground Series for the year 2023. The competition aims to develop robust neural networok model that can accurately classify instances of vector-borne diseases based on a tabular dataset containing various features. The goal is to create models that can generalize well to unseen data and provide accurate predictions on the test set.

Our task is to train the neural network model,evaluate its performance using the map@k and then predict likely top3 vector borne diseases, given some features.

#### 1.0 Introduction

In this report, we present a detailed analysis of a neural network model for disease classification. The aim of the study is to optimize the model's hyperparameters using Optuna, evaluate the performance of the model by using the mean average precision at K (MAPK) metric, and utilize the trained and optimized model to generate predictions for the test dataset. The dataset used for training and testing consists of various diseases and their corresponding symptoms.


#### 2.0 The Dataset

The dataset for this competition (both train and test) was generated from a deep learning model trained on the vector-borne-disease-prediction[1]. Our task is to predict the target prognosis of Vector-borne diseases.

Vectors[2] are living organisms that can transmit infectious pathogens between humans, or from animals to humans. Many of these vectors are bloodsucking insects, which ingest disease-producing microorganisms during a blood meal from an infected host (human or animal) and later transmit it into a new host, after the pathogen has replicated. Often, once a vector becomes infectious, they are capable of transmitting the pathogen for the rest of their life during each subsequent bite/blood meal.

Vector-borne diseases[3] are human illnesses caused by parasites, viruses and bacteria that are transmitted by vectors. Every year there are more than 700,000 deaths from diseases such as malaria, dengue, schistosomiasis, human African trypanosomiasis, leishmaniasis, Chagas disease, yellow fever, Japanese encephalitis and onchocerciasis.

Two datasets were provided for this task: the train dataset, labeled as train.csv and the test.csv, for prediction purpose.

#### 2.0.1 Train.csv
* The train dataset consists of 707x64 features(symptoms) and 707x1 labels(Prognosis). 
* The features are binary digits (0s and 1s) where 0 represents absence of a particular symptom and 1 represents its presence. Each feature column is depicted by a particular symptom. 
* The labels are string-values of different kinds of vector borne diseases.This target variable serves as the ground truth for training the classification model.

#### 2.0.2 Test.csv
* The Test.csv dataset consists of 303x64 features(symptoms).
* The test dataset is also a tabular dataset that follows a similar structure to the training dataset.
* However, unlike the training dataset, the test dataset does not contain the target variable.
* The purpose of the test dataset is to evaluate the performance and generalization ability of the Neural network model developed. 

#### 3.0   The Methods
#### 3.0.1 Data Preprocessing:

We did the following preprocessing activities on the data:
* Loaded dataset:The dataset is loaded from a CSV file and split into features (X) and labels (y).
* Encoding:We used the OrdinalEncoder class to encode the categorical label column into numerical representation.

#### 3.0.2 Feature Engingeering:
The 64 given features represent only one single symptom each. However, illnesses often come along with combinations of symptoms. Also, one single symptom may apply to a whole range of illnesses. The applied feature engineering intents to combine the most likely symptoms of an illness to get a "Fingerprint" of every disease. This fingerprint will be added as an additional column, resulting in 11 features more. Taking the four most likely symptoms, this results in 14 possible combinations of AND and OR, referring to 8 ways to arrange the four symptoms and 6 more by setting logical paranthesis. Out of these 14 combinations, the one that bests represents the certain disease is selected. For the validation, every combination is applied to the whole dataset. An ideal fingerprint combination of symptoms results in a logical True for all records that have the prognosis of the certain illness and results in a logical False for all records that have a different prognosis. However, an ideal fingerprint is not achievable. Hence, the number of false negatives (False although prognosis is positive) and false positives (True although prognosis was different illness) is counted. The algorithm intents to reduce the number of false negatives as much as possible. However, just validating on false negatives leads to the OR-OR-OR combination being always chosen and respectively a high number of false positives. As a consequence, the threshold of n times false negatives limits the number of acceptable false positives to a multiple of the counted false negatives. This validation can be seen as a method of feature reduction. The number of features increased by 11 from 64 to 75 respectively.

#### 3.0.3 Hyperparameter Tuning:

To optimize the model's performance, we employed Optuna, a hyperparameter tuning tool. Optuna was used to search for the best combination of hyperparameters, including the choice of optimizer, hidden size, learning rate, and number of epochs. By iterating over multiple trials, Optuna helped identify the hyperparameters that maximize the model's validation MAPK score.

* Optimizer: The optimizer determines the update rule for adjusting the weights and biases of the neural network during the training process. The three optimizers used in the code are:
*  a. Adam: Adam (Adaptive Moment Estimation) is an adaptive learning rate optimization algorithm that combines the benefits of both AdaGrad and RMSProp. It maintains adaptive learning rates for different parameters and performs well in a wide range of tasks.
*  b. SGD: SGD (Stochastic Gradient Descent) is a classic optimization algorithm that updates the model parameters by computing gradients on randomly selected small batches of data. It is a simple and widely used optimizer but can be sensitive to the learning rate.
*  c. RMSprop: RMSprop (Root Mean Square Propagation) is an adaptive learning rate method that divides the learning rate by the exponentially decaying average of squared gradients. It tends to converge faster and overcome some limitations of SGD.
* Hidden Size: The hidden size refers to the number of units or neurons in the hidden layer(s) of the neural network. It determines the model's capacity to learn complex representations and capture intricate relationships in the data. In the provided code, the hidden size can take values of 32, 64, or 128, offering different levels of model complexity.
* Learning Rate: The learning rate controls the step size of weight updates during training. It determines how quickly or slowly the model learns from the data. A high learning rate may cause the model to overshoot optimal weights, leading to instability, while a low learning rate may slow down the learning process. The provided code explores learning rates ranging from 1e-5 to 1e-1 on a logarithmic scale, covering a wide range of learning rates.
* Number of Epochs: An epoch is a complete pass through the entire training dataset during the training process. The number of epochs defines how many times the model goes through the entire dataset. It influences the amount of training time and the model's ability to converge to an optimal solution. In the provided code, the number of epochs ranges from 10 to 200 with a step of 10, allowing for exploration of different training durations.

The choice of optimizers, hidden size, learning rate, and number of epochs can significantly impact the model's ability to learn and generalize patterns from the data, achieve convergence, and avoid overfitting. 

#### 3.0.4 Neural Network Architecture:

The code implements a neural network model architecture specifically designed for disease classification.The architecture consists of three fully connected layers.
* Batch normalization is applied after the first and second hidden layers to improve training efficiency. It addresses the issue of internal covariate shift, ensuring stable activations during training and enhancing gradient propagation
* The ReLU activation function is used after each hidden layer. It introduces non-linearity, enabling the model to learn complex patterns and avoid the vanishing gradient problem.

#### 3.0.5 Model Training and Evaluation:

The performance of our trained and optimized neural network model was evaluated using the MAPK metric. The MAPK score measures the average precision at various levels of recall, providing an overall assessment of the model's ability to rank the predicted diseases correctly. In the code:
* The training data is divided into five folds using Stratified K-Fold cross-validation. For each fold, the model is trained using    the specified hyperparameters and evaluated on the validation set.
* Early stopping is implemented to prevent overfitting by monitoring the validation loss.
* The MAPK score is calculated for each fold and averaged to obtain the final validation MAPK.

#### 3.0.6 Best Model Selection and Training:

* The best hyperparameters and validation accuracy are determined based on the optimization results.
* The best model is created using the optimal hyperparameters and trained on the full dataset.
* The model is trained for the specified number of epochs using the Adam optimizer and cross-entropy loss.

#### 3.0.7 Test Set Evaluation:
To evaluate the model's performance on unseen data, a test dataset was utilized. The test dataset consists of symptom information for a set of patients with unknown diseases. The trained model was applied to make predictions on this dataset.

The test dataset was preprocessed in the same manner as the training data, including feature scaling using the MinMaxScaler. The scaled test features were converted to PyTorch tensors for compatibility with the model.

The best-trained model, determined by the optimization process, was used to generate predictions for the test dataset. The top 3 predicted diseases were extracted for each patient, based on the highest probability scores obtained from the model.

The predicted disease labels were then decoded using the label encoder to obtain the actual disease names. These predictions provide insights into potential diseases for each patient in the test dataset. Example Predictions: Predicted Diseases: 
['Zika' 'Yellow_Fever' 'Japanese_encephalitis'].

#### 4.0 Results
Our trained and optimized neural network model achieved a validation MAPK score of [0.34035394399493885]. This indicates that our model performs well in ranking and predicting the vector-borne diseases correctly, considering the given features.

During the training and hyperparameter tuning process, we explored various combinations of hyperparameters using Optuna. By leveraging Optuna's capabilities, we identified the hyperparameters that provided the best performance on the validation set.

* Optimizer: RMSprop
* Hidden Size: 64
* Learning Rate: 0.0004992938012543013
* Number of Epochs: 30


#### 5.0 The discussion

The obtained hyperparameters suggest that RMSprop is the preferred optimizer for this classification task. The hidden size of 64 indicates that a moderate number of hidden units is sufficient for effective disease classification. The chosen learning rate and number of epochs are suitable for achieving a good balance between model performance and training time.

The validation MAPK score of 0.34035394399493885 indicates that the model performs reasonably well in predicting diseases based on symptoms. However, further improvements can be explored by considering more complex architectures, ensembling methods, or increasing the size of the dataset.

A comparison of the results with and without Feature Engineering shows only slightly differences. Looking at the feature selection, there are a lot of false diagnosis from the chosen fingerprints, meaning that the prints are blurred to some extend. Higher fingerprint accuracy could be obtained by including more symptoms, e.g. by increasing to the 5 most likely ones or including the least likely ones.

A submission to Kaggle delivers the results shown in below picture:

![Submission](./Submission_Kaggle_NN.JPG)

_Image 1: Results of the Kaggle Submission_


#### 6.0 The conclusion 

In this study, we successfully optimized the hyperparameters of a neural network model for disease classification using Optuna. The best model achieved a validation MAPK score of 0.34035394399493885. The findings suggest that the chosen hyperparameters and model architecture are effective for the given task.

Additionally, the model demonstrated promising performance when making predictions on unseen test data. The top 3 predicted diseases for each patient provide valuable insights into potential diagnoses. Further research and evaluation can be conducted to assess the model's generalizability and reliability in real-world healthcare scenarios.

#### 7.0 The References:

The authors would like to thank Yuganshu Wadhwa for supporting the implementation of the Feature Engineering model in Python

[1] Dataset: Vector-borne-disease-prediction. https://www.kaggle.com/datasets/richardbernat/vector-borne-disease-

[2] dataset: vectors. https://www.who.int/news-room/fact-sheets/detail/vector-borne-diseases#:~:text=and%20community%20mobilisation.-,Vectors,-Vectors%20are%20living

[3] Dataset: Vector borne disease. https://www.who.int/news-room/fact-sheets/detail/vector-borne-diseases#:~:text=bite/blood%20meal.-,Vector%2Dborne%20diseases,-Vector%2Dborne%20diseases

[4] Optuna: A Hyperparameter Optimization Framework. https://optuna.org/

[5] Sklearn StratifiedKFold Documentation. https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.StratifiedKFold.html

[6] PyTorch Documentation. https://pytorch.org/docs/stable/index.html

[7] Pandas Documentation. https://pandas.pydata.org/docs/

In [1]:
features = 'yes'
#features = 'no'

if features == 'yes':
    initial_size = 75
    train_path = 'train_features.csv'
    test_path = 'test_features.csv'
else:
    initial_size = 64
    train_path = 'train.csv'
    test_path = 'test.csv'

#### 1.0 Import the necessary libraries

In [2]:
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import LabelEncoder, MinMaxScaler
import optuna
from spotPython.utils.metrics import mapk_score


#### 2.0 Load the Training dataset & Make neural network architecture

In [3]:
# Set a random seed for reproducibility
torch.manual_seed(42)
torch.cuda.manual_seed(42)
torch.cuda.manual_seed_all(42)  # Set random seed for all GPUs if using multiple GPUs
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

In [4]:

# Define the neural network architecture
#This block defines a neural network model architecture with three fully connected layers (fc1, fc2, and fc3). 
#The batch normalization layers (bn1 and bn2) are added after each fully connected layer to improve training stability

class NeuralNetwork(nn.Module):
    def __init__(self, input_size=64, num_classes=11, hidden_size=64):
        super(NeuralNetwork, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.bn1 = nn.BatchNorm1d(hidden_size)
        self.fc2 = nn.Linear(hidden_size, hidden_size)
        self.bn2 = nn.BatchNorm1d(hidden_size)
        self.fc3 = nn.Linear(hidden_size, num_classes)
        
#The forward method defines the flow of data through the layers of the neural network. The input x is passed through 
#the fully connected layers (fc1, fc2, and fc3) interspersed with batch normalization layers (bn1 and bn2). 
#ReLU activation is applied after each fully connected layer, except for the last layer. The method returns the final 
#output of the network.   
    
    def forward(self, x):
        out = self.fc1(x)
        out = self.bn1(out)
        out = nn.functional.relu(out)
        out = self.fc2(out)
        out = self.bn2(out)
        out = nn.functional.relu(out)
        out = self.fc3(out)
        return out

dataset = pd.read_csv(train_path,index_col='id')
print(dataset.head())

# Step 1: Split the dataset into features (X) and labels (y)
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values

label_encoder = LabelEncoder()
y_encoded = label_encoder.fit_transform(y)

    sudden_fever  headache  mouth_bleed  nose_bleed  muscle_pain  joint_pain  \
id                                                                             
0              1         1            0           1            1           1   
1              0         0            0           0            0           0   
2              0         1            1           1            0           1   
3              0         0            1           1            1           1   
4              0         0            0           0            0           0   

    vomiting  rash  diarrhea  hypotension  ...  Zika_print  \
id                                         ...               
0          1     0         1            1  ...           0   
1          1     0         1            0  ...           0   
2          1     1         1            1  ...           0   
3          0     1         0            1  ...           0   
4          0     0         1            0  ...           0   

    

#### 3.0 Hyperparameter Tuning with Optuna

In [5]:

num_folds = 5
# Initialize lists to store the training and validation accuracies
train_accuracies = []
val_accuracies = []

# Initialize lists to store the training and validation losses
train_losses = []
val_losses = []

mapk_folds = []
skf = StratifiedKFold(n_splits=num_folds)

def objective(trial):
    optimizer_name = trial.suggest_categorical('optimizer', ['Adam', 'SGD', 'RMSprop'])
    hidden_size = trial.suggest_categorical('hidden_size', [32, 64, 128])
    learning_rate = trial.suggest_float('learning_rate', 1e-5, 1e-1, log=True)
    num_epochs = trial.suggest_int('epochs', 10, 100, step=5)

    fold_num = 0  # Track the fold number

    for train_index, val_index in skf.split(X, y_encoded):
        # Split the data into training and validation sets for each fold
        X_train, X_val = X[train_index], X[val_index]
        y_train, y_val = y_encoded[train_index], y_encoded[val_index]

        # Convert data to PyTorch tensors(for compatibility with PyTorch's computational graph and tensor operations)
        X_train_tensor = torch.tensor(X_train, dtype=torch.float32)
        y_train_tensor = torch.tensor(y_train, dtype=torch.long)
        X_val_tensor = torch.tensor(X_val, dtype=torch.float32)
        y_val_tensor = torch.tensor(y_val, dtype=torch.long)

        # Define the neural network architecture
        model = NeuralNetwork(input_size=X.shape[1], num_classes=len(label_encoder.classes_), hidden_size=hidden_size)

        # Define the loss function and optimizer
        criterion = nn.CrossEntropyLoss()
        optimizer_class = getattr(optim, optimizer_name)
        optimizer = optimizer_class(model.parameters(), lr=learning_rate)
        
        # Initialize variables for early stopping
        best_val_loss = float('inf')#to keep track of the lowest validation loss observed during training
        epochs_without_improvement = 0
        patience = 10  #If the validation loss does not improve for 'patience' consecutive epochs, training will be stopped to prevent overfitting or wasting computational resources

        # Training loop
        for epoch in range(num_epochs):
            # Set the model to training mode
            model.train()

            # Forward pass
            outputs = model(X_train_tensor)
            loss = criterion(outputs, y_train_tensor)

            # Backward and optimize
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            
            # Evaluate the model on validation set
            model.eval()
            val_predictions = model(X_val_tensor)
            val_loss = criterion(val_predictions, y_val_tensor)

            # Check if validation loss has improved
            if val_loss < best_val_loss:
                best_val_loss = val_loss
                epochs_without_improvement = 0
            else:
                epochs_without_improvement += 1

            # Check early stopping condition
            if epochs_without_improvement >= patience:
                break

        # Evaluate the final model on validation set
        model.eval()
        val_predictions = model(X_val_tensor)
        val_predicted_probs = torch.softmax(val_predictions, dim=1)
        mapk_folds.append(mapk_score(y_val, val_predicted_probs.cpu().detach().numpy()))

        fold_num += 1  # Increment the fold number
        print(f"Fold: {fold_num}, Epoch [{epoch+1}/{num_epochs}], Val Loss: {val_loss.item():.4f}")

    mapk_final_val = np.average(mapk_folds)
    print(f'---------------------------------------------\n{num_folds}-fold Validation MAPK Val: {mapk_final_val}\n')

    return mapk_final_val

# Create the study and optimize the objective function
#A study is a container that manages the optimization process.The direction parameter is set to 'maximize', indicating
#that the objective function should be maximized during the optimization process.
#the optimize() method starts the optimization by repeatedly calling the objective function with different hyperparameter
#configurations.The goal is to find the hyperparameter configuration that maximizes the objective function,after n_trials
study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=10)

# Get the best trial
best_trial = study.best_trial

# Print the best hyperparameters and validation accuracy(objective function)
best_params = study.best_params
best_val_accuracy = study.best_value

# Create and load the best model using the best parameters
best_model = NeuralNetwork(input_size=X.shape[1], num_classes=len(label_encoder.classes_), hidden_size=best_params['hidden_size'])
# Train the best model on the full dataset
X_tensor = torch.tensor(X, dtype=torch.float32)
y_tensor = torch.tensor(y_encoded, dtype=torch.long)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(best_model.parameters(), lr=best_params['learning_rate'])#Consistent optimizer across different runs for reproducible & comparable results
num_epochs = best_params['epochs']
best_model.train()
for epoch in range(num_epochs):
    optimizer.zero_grad()#clears the gradients of all optimized parameters to prepare for the backward pass.
    outputs = best_model(X_tensor)#computes the forward pass of the best_model using the input features X_tensor to obtain the predicted outputs
    loss = criterion(outputs, y_tensor)#calculates the loss by comparing the predicted outputs with the ground truth labels y_tensor
    loss.backward()#computes the gradients of the model parameters with respect to the loss, performing backpropagation
    optimizer.step()#updates the model's parameters based on the computed gradients and the chosen optimization algorithm
    
# Set the best model to evaluation mode
#This mode disables techniques like dropout and batch normalization that are typically used during training. 
#Instead, the model will use its learned parameters to make predictions consistently.
best_model.eval()

print(f"Best Hyperparameters: {best_params}")
print(f"Best Validation MAPK: {best_val_accuracy}")

[I 2023-07-16 02:24:34,271] A new study created in memory with name: no-name-7dd7d348-50ed-4b0f-9348-f43ba466587b


Fold: 1, Epoch [45/45], Val Loss: 2.0675
Fold: 2, Epoch [45/45], Val Loss: 2.1136
Fold: 3, Epoch [45/45], Val Loss: 1.9370
Fold: 4, Epoch [45/45], Val Loss: 2.0728


[I 2023-07-16 02:24:34,718] Trial 0 finished with value: 0.33643991609229845 and parameters: {'optimizer': 'RMSprop', 'hidden_size': 32, 'learning_rate': 0.0006676307305247338, 'epochs': 45}. Best is trial 0 with value: 0.33643991609229845.


Fold: 5, Epoch [45/45], Val Loss: 2.0533
---------------------------------------------
5-fold Validation MAPK Val: 0.33643991609229845

Fold: 1, Epoch [11/80], Val Loss: 2.4119
Fold: 2, Epoch [26/80], Val Loss: 2.4048
Fold: 3, Epoch [11/80], Val Loss: 2.4143


[I 2023-07-16 02:24:34,877] Trial 1 finished with value: 0.23241767389205203 and parameters: {'optimizer': 'SGD', 'hidden_size': 32, 'learning_rate': 3.553870073566271e-05, 'epochs': 80}. Best is trial 0 with value: 0.33643991609229845.


Fold: 4, Epoch [27/80], Val Loss: 2.4055
Fold: 5, Epoch [15/80], Val Loss: 2.4253
---------------------------------------------
5-fold Validation MAPK Val: 0.23241767389205203

Fold: 1, Epoch [23/95], Val Loss: 2.8908
Fold: 2, Epoch [21/95], Val Loss: 2.7683
Fold: 3, Epoch [23/95], Val Loss: 2.7325
Fold: 4, Epoch [22/95], Val Loss: 2.9355


[I 2023-07-16 02:24:35,339] Trial 2 finished with value: 0.26339970476919833 and parameters: {'optimizer': 'Adam', 'hidden_size': 128, 'learning_rate': 0.008313469391907054, 'epochs': 95}. Best is trial 0 with value: 0.33643991609229845.


Fold: 5, Epoch [24/95], Val Loss: 2.8766
---------------------------------------------
5-fold Validation MAPK Val: 0.26339970476919833

Fold: 1, Epoch [65/65], Val Loss: 2.1603
Fold: 2, Epoch [65/65], Val Loss: 2.1960
Fold: 3, Epoch [65/65], Val Loss: 2.1174
Fold: 4, Epoch [65/65], Val Loss: 2.0944


[I 2023-07-16 02:24:36,482] Trial 3 finished with value: 0.2704516698298538 and parameters: {'optimizer': 'RMSprop', 'hidden_size': 128, 'learning_rate': 3.5522681768276785e-05, 'epochs': 65}. Best is trial 0 with value: 0.33643991609229845.
[I 2023-07-16 02:24:36,625] Trial 4 finished with value: 0.2497938933839443 and parameters: {'optimizer': 'SGD', 'hidden_size': 32, 'learning_rate': 1.8204018430281336e-05, 'epochs': 60}. Best is trial 0 with value: 0.33643991609229845.


Fold: 5, Epoch [65/65], Val Loss: 2.2029
---------------------------------------------
5-fold Validation MAPK Val: 0.2704516698298538

Fold: 1, Epoch [11/60], Val Loss: 2.4174
Fold: 2, Epoch [11/60], Val Loss: 2.4136
Fold: 3, Epoch [11/60], Val Loss: 2.4108
Fold: 4, Epoch [34/60], Val Loss: 2.4009
Fold: 5, Epoch [13/60], Val Loss: 2.4074
---------------------------------------------
5-fold Validation MAPK Val: 0.2497938933839443



[I 2023-07-16 02:24:36,844] Trial 5 finished with value: 0.23428700651505563 and parameters: {'optimizer': 'Adam', 'hidden_size': 64, 'learning_rate': 1.3598600169157745e-05, 'epochs': 35}. Best is trial 0 with value: 0.33643991609229845.


Fold: 1, Epoch [20/35], Val Loss: 2.4004
Fold: 2, Epoch [13/35], Val Loss: 2.4114
Fold: 3, Epoch [16/35], Val Loss: 2.4066
Fold: 4, Epoch [14/35], Val Loss: 2.4029
Fold: 5, Epoch [14/35], Val Loss: 2.3937
---------------------------------------------
5-fold Validation MAPK Val: 0.23428700651505563

Fold: 1, Epoch [43/95], Val Loss: 2.3735
Fold: 2, Epoch [11/95], Val Loss: 2.4153
Fold: 3, Epoch [13/95], Val Loss: 2.4036
Fold: 4, Epoch [21/95], Val Loss: 2.3886


[I 2023-07-16 02:24:37,201] Trial 6 finished with value: 0.22294166892608605 and parameters: {'optimizer': 'SGD', 'hidden_size': 128, 'learning_rate': 0.00012713335676495843, 'epochs': 95}. Best is trial 0 with value: 0.33643991609229845.
[I 2023-07-16 02:24:37,311] Trial 7 finished with value: 0.23577314953551093 and parameters: {'optimizer': 'RMSprop', 'hidden_size': 32, 'learning_rate': 0.0031670713130407813, 'epochs': 10}. Best is trial 0 with value: 0.33643991609229845.


Fold: 5, Epoch [18/95], Val Loss: 2.4133
---------------------------------------------
5-fold Validation MAPK Val: 0.22294166892608605

Fold: 1, Epoch [10/10], Val Loss: 2.1556
Fold: 2, Epoch [10/10], Val Loss: 2.1487
Fold: 3, Epoch [10/10], Val Loss: 2.0916
Fold: 4, Epoch [10/10], Val Loss: 2.2040
Fold: 5, Epoch [10/10], Val Loss: 2.1258
---------------------------------------------
5-fold Validation MAPK Val: 0.23577314953551093

Fold: 1, Epoch [11/35], Val Loss: 2.4054
Fold: 2, Epoch [13/35], Val Loss: 2.4058


[I 2023-07-16 02:24:37,513] Trial 8 finished with value: 0.22658335090659534 and parameters: {'optimizer': 'Adam', 'hidden_size': 64, 'learning_rate': 3.732371513541133e-05, 'epochs': 35}. Best is trial 0 with value: 0.33643991609229845.


Fold: 3, Epoch [11/35], Val Loss: 2.4034
Fold: 4, Epoch [24/35], Val Loss: 2.3964
Fold: 5, Epoch [11/35], Val Loss: 2.4109
---------------------------------------------
5-fold Validation MAPK Val: 0.22658335090659534

Fold: 1, Epoch [75/75], Val Loss: 2.2374
Fold: 2, Epoch [75/75], Val Loss: 2.1957
Fold: 3, Epoch [75/75], Val Loss: 2.2313
Fold: 4, Epoch [75/75], Val Loss: 2.2367


[I 2023-07-16 02:24:38,961] Trial 9 finished with value: 0.2319816535144674 and parameters: {'optimizer': 'Adam', 'hidden_size': 128, 'learning_rate': 4.998107030731937e-05, 'epochs': 75}. Best is trial 0 with value: 0.33643991609229845.


Fold: 5, Epoch [75/75], Val Loss: 2.2117
---------------------------------------------
5-fold Validation MAPK Val: 0.2319816535144674

Best Hyperparameters: {'optimizer': 'RMSprop', 'hidden_size': 32, 'learning_rate': 0.0006676307305247338, 'epochs': 45}
Best Validation MAPK: 0.33643991609229845


### 4.0 Top 3 Predictions on the Test.csv

#### 4.0.1 Load the Training dataset

In [6]:
# Load the new dataset
test_data = pd.read_csv(test_path, index_col='id')
print(test_data.head())

     sudden_fever  headache  mouth_bleed  nose_bleed  muscle_pain  joint_pain  \
id                                                                              
707             0         0            0           0            0           0   
708             1         1            0           1            0           1   
709             1         1            0           1            1           1   
710             0         1            0           0            0           1   
711             0         0            1           0            1           1   

     vomiting  rash  diarrhea  hypotension  ...  Tungiasis_print  Zika_print  \
id                                          ...                                
707         0     0         0            1  ...                0           1   
708         1     1         1            1  ...                0           0   
709         1     0         1            0  ...                1           0   
710         1     1         0   

#### 4.0.2 Generating Predictions for Test Data

In [8]:
test = test_data.values

# Convert the scaled features to PyTorch tensor
test_tensor = torch.tensor(test, dtype=torch.float32)

# Set the model to evaluation mode
best_model.eval()

# Generate predictions for the test data
probabilities = best_model(test_tensor)
top_k = 3
top_k_predictions = torch.topk(probabilities, k=top_k, dim=1).indices.numpy()

# Decode the predicted labels using the label encoder
predicted_labels = []
for sample_predictions in top_k_predictions:
    sample_classes = label_encoder.inverse_transform(sample_predictions)
    predicted_labels.append(sample_classes)

# Show the top 3 predictions 
print("Top 3 Predictions:")
for i, sample_predictions in enumerate(predicted_labels[:]):
    index = test_data.index[i]  # Get the index value from the test data
    print(index, ":", sample_predictions)
    #test_data[i, 'predictions'] = sample_predictions

Top 3 Predictions:
707 : ['Rift_Valley_fever' 'Tungiasis' 'Dengue']
708 : ['Chikungunya' 'Dengue' 'West_Nile_fever']
709 : ['West_Nile_fever' 'Lyme_disease' 'Zika']
710 : ['Tungiasis' 'Rift_Valley_fever' 'Lyme_disease']
711 : ['Zika' 'West_Nile_fever' 'Malaria']
712 : ['Yellow_Fever' 'Japanese_encephalitis' 'Malaria']
713 : ['Malaria' 'Plague' 'Japanese_encephalitis']
714 : ['Chikungunya' 'West_Nile_fever' 'Dengue']
715 : ['Lyme_disease' 'Japanese_encephalitis' 'Yellow_Fever']
716 : ['Malaria' 'Zika' 'Plague']
717 : ['West_Nile_fever' 'Japanese_encephalitis' 'Lyme_disease']
718 : ['West_Nile_fever' 'Dengue' 'Rift_Valley_fever']
719 : ['Rift_Valley_fever' 'Japanese_encephalitis' 'Dengue']
720 : ['Chikungunya' 'Dengue' 'West_Nile_fever']
721 : ['West_Nile_fever' 'Dengue' 'Japanese_encephalitis']
722 : ['Zika' 'Tungiasis' 'Yellow_Fever']
723 : ['West_Nile_fever' 'Malaria' 'Zika']
724 : ['Tungiasis' 'Rift_Valley_fever' 'Japanese_encephalitis']
725 : ['Chikungunya' 'West_Nile_fever' 'Rift_V