<a href="https://colab.research.google.com/github/YahyaEryani/quantum-model/blob/main/notebooks/05_TabNet_model_training_and_evaluation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Model Training, Tuning, and Evaluation

In this notebook, we train a TabNet model on the Higgs boson dataset that was preprocessed in the `01_data_exploration` notebook. We train and tune the model, monitoring its performance on the validation set, with the goal of obtaining the most accurate model possible.

## Installing and Importing Libraries
In this section, we will install and import the necessary libraries and packages that will be used throughout the notebook.

In [1]:
!pip install torch==1.10.0+cpu torchvision==0.11.1+cpu -f https://download.pytorch.org/whl/cpu/torch_stable.html

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Looking in links: https://download.pytorch.org/whl/cpu/torch_stable.html
Collecting torch==1.10.0+cpu
  Downloading https://download.pytorch.org/whl/cpu/torch-1.10.0%2Bcpu-cp39-cp39-linux_x86_64.whl (199.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 199.3/199.3 MB 5.0 MB/s eta 0:00:00
Collecting torchvision==0.11.1+cpu
  Downloading https://download.pytorch.org/whl/cpu/torchvision-0.11.1%2Bcpu-cp39-cp39-linux_x86_64.whl (16.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 16.1/16.1 MB 65.8 MB/s eta 0:00:00
Installing collected packages: torch, torchvision
  Attempting uninstall: torch
    Found existing installation: torch 2.0.0+cu118
    Uninstalling torch-2.0.0+cu118:
      Successfully uninstalled torch-2.0.0+cu118
  Attempting uninstall: torchvision
    Found existing installation: torchvisio

In [2]:
!pip install pytorch-tabnet

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting pytorch-tabnet
  Downloading pytorch_tabnet-4.0-py3-none-any.whl (41 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m41.8/41.8 kB[0m [31m2.5 MB/s[0m eta [36m0:00:00[0m
Installing collected packages: pytorch-tabnet
Successfully installed pytorch-tabnet-4.0


In [3]:
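# Upgrade torch to the latest release (2.0.0 in this run); this replaces the CPU-only 1.10.0 build installed in the first cell
!pip install torch -U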


Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting torch
  Downloading torch-2.0.0-cp39-cp39-manylinux1_x86_64.whl (619.9 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 619.9/619.9 MB 2.5 MB/s eta 0:00:00
Collecting nvidia-nccl-cu11==2.14.3
  Downloading nvidia_nccl_cu11-2.14.3-py3-none-manylinux1_x86_64.whl (177.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 177.1/177.1 MB 7.2 MB/s eta 0:00:00
Collecting nvidia-cudnn-cu11==8.5.0.96
  Downloading nvidia_cudnn_cu11-8.5.0.96-2-py3-none-manylinux1_x86_64.whl (557.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 557.1/557.1 MB 2.8 MB/s eta 0:00:00
Collecting nvidia-cuda-cupti-cu11==11.7.101
  Downloading nvidia_cuda_cupti_cu11-11.7.101-py3-none-manylinux1_x86_64.whl (11.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 11.8/

In [12]:
import torch
import numpy as np
import pandas as pd
from pytorch_tabnet.tab_model import TabNetClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score
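The first cell installs CPU-only builds of torch 1.10.0 and torchvision 0.11.1, and the third cell then upgrades torch to the latest release (2.0.0 in the logged run), so it is worth confirming which versions the runtime actually ends up with. The following is a minimal sketch using the standard library's `importlib.metadata`; the helper name `installed_versions` is hypothetical. It reports installed package metadata, which can differ from an already-imported module if packages were reinstalled without restarting the runtime.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Hypothetical usage: the packages this notebook depends on.
print(installed_versions(["torch", "torchvision", "pytorch-tabnet", "scikit-learn", "pandas"]))
```

Recording this output next to the results makes it easier to reproduce the run later.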

## Loading Data

This code cell mounts Google Drive and loads the training, validation, and test datasets that were saved there in pickle format.

In [5]:
# Mount Google Drive in Colab
from google.colab import drive
import pandas as pd

drive.mount('/content/drive')

# Load data from Google Drive
train_path = '/content/drive/MyDrive/Higgs_dataset/processed/training_data.pkl'
val_path   = '/content/drive/MyDrive/Higgs_dataset/processed/validation_data.pkl'
test_path  = '/content/drive/MyDrive/Higgs_dataset/processed/testing_data.pkl'

train_data = pd.read_pickle(train_path)
val_data = pd.read_pickle(val_path)
test_data = pd.read_pickle(test_path)

Mounted at /content/drive


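Before training, a quick sanity check on each split can catch loading problems and shows the class balance. The balance matters because it is the baseline for any accuracy figure: always predicting the majority class already scores the majority-class fraction. This is a minimal sketch assuming each pickle holds a pandas DataFrame with a `class_label` column, as the feature/label split in this notebook assumes; `summarize_split` is a hypothetical helper.

```python
import pandas as pd


def summarize_split(df: pd.DataFrame, label_col: str = "class_label") -> dict:
    """Return row count, feature count, and per-class fractions for one split."""
    if label_col not in df.columns:
        raise KeyError(f"missing label column {label_col!r}")
    return {
        "rows": len(df),
        "features": df.shape[1] - 1,  # every column except the label
        "class_fractions": df[label_col].value_counts(normalize=True).sort_index().to_dict(),
    }
```

In the notebook it could be called as `summarize_split(train_data)`, and likewise for `val_data` and `test_data`.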
## Preparing the Data for Training
This code separates the features from the class labels in the training, validation, and test datasets.

In [6]:
# Separate features and labels
y_train = train_data['class_label']
X_train = train_data.drop('class_label', axis=1)
y_val = val_data['class_label']
X_val = val_data.drop('class_label', axis=1)
y_test = test_data['class_label']
X_test = test_data.drop('class_label', axis=1)
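Because the model is later fit on `.values` (plain NumPy arrays), features are matched by position rather than by name, so the three splits must share the same columns in the same order. A minimal sketch of a helper that performs the same split and checks column alignment; both function names are hypothetical.

```python
import pandas as pd


def split_features_labels(df: pd.DataFrame, label_col: str = "class_label"):
    """Return (features, labels) for one split."""
    return df.drop(columns=label_col), df[label_col]


def check_same_features(*feature_frames: pd.DataFrame) -> None:
    """Raise ValueError if any split's feature columns differ from the first split's."""
    reference = list(feature_frames[0].columns)
    for i, frame in enumerate(feature_frames[1:], start=1):
        if list(frame.columns) != reference:
            raise ValueError(f"split {i} has different feature columns or column order")
```

With the notebook's variables, `check_same_features(X_train, X_val, X_test)` would confirm the splits line up before training.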

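## Training the TabNet Model
This code sets the TabNet hyperparameters and trains the model. `n_d` and `n_a` set the widths of the decision and attention layers (64 each), `n_steps=5` is the number of sequential decision steps, `gamma=1.3` is the relaxation factor that controls how much a feature can be reused across steps, `n_independent` and `n_shared` set the number of step-specific and shared GLU layers in each feature transformer, and `mask_type="entmax"` uses entmax to produce sparse feature-selection masks. The model is optimized with Adam at an initial learning rate of 0.02, and a `ReduceLROnPlateau` scheduler multiplies the learning rate by 0.9 whenever the monitored metric has not improved for more than 5 epochs (down to a minimum of 1e-5). Training runs for at most 100 epochs with a batch size of 256, using ghost batch normalization over virtual batches of 128, and stops early if the validation AUC does not improve for 10 consecutive epochs.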

In [7]:
# Set the TabNet hyperparameters
tabnet_params = dict(
    n_d=64,
    n_a=64,
    n_steps=5,
    gamma=1.3,
    n_independent=2,
    n_shared=2,
    seed=42,
    optimizer_fn=torch.optim.Adam,
    optimizer_params=dict(lr=0.02),
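    # pytorch-tabnet steps ReduceLROnPlateau with the last eval metric (val AUC here),
    # where higher is better, so mode="max" would match it; the logged run used "min"
    scheduler_params=dict(
        mode="min",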
        patience=5,
        min_lr=1e-5,
        factor=0.9,
    ),
    scheduler_fn=torch.optim.lr_scheduler.ReduceLROnPlateau,
    mask_type="entmax",
    verbose=1
)

# Train the TabNet model
tabnet_model = TabNetClassifier(**tabnet_params)
tabnet_model.fit(
    X_train=X_train.values, y_train=y_train.values,
    eval_set=[(X_val.values, y_val.values)],
    max_epochs=100,
    patience=10,
    batch_size=256,
    virtual_batch_size=128,
    num_workers=0,
    drop_last=False,
)



epoch 0  | loss: 0.62151 | val_0_auc: 0.76416 |  0:01:57s
epoch 1  | loss: 0.57629 | val_0_auc: 0.77859 |  0:03:54s
epoch 2  | loss: 0.56295 | val_0_auc: 0.78642 |  0:05:53s
epoch 3  | loss: 0.55643 | val_0_auc: 0.79139 |  0:08:00s
epoch 4  | loss: 0.54599 | val_0_auc: 0.80008 |  0:09:58s
epoch 5  | loss: 0.53801 | val_0_auc: 0.80774 |  0:11:57s
epoch 6  | loss: 0.53142 | val_0_auc: 0.80672 |  0:13:55s
epoch 7  | loss: 0.52765 | val_0_auc: 0.81273 |  0:15:54s
epoch 8  | loss: 0.52411 | val_0_auc: 0.81861 |  0:17:57s
epoch 9  | loss: 0.52227 | val_0_auc: 0.82037 |  0:19:56s
epoch 10 | loss: 0.51987 | val_0_auc: 0.81679 |  0:21:55s
epoch 11 | loss: 0.51796 | val_0_auc: 0.81806 |  0:23:53s
epoch 12 | loss: 0.51856 | val_0_auc: 0.82183 |  0:25:52s
epoch 13 | loss: 0.51542 | val_0_auc: 0.82313 |  0:27:57s
epoch 14 | loss: 0.51338 | val_0_auc: 0.82325 |  0:29:56s
epoch 15 | loss: 0.51177 | val_0_auc: 0.82552 |  0:31:55s
epoch 16 | loss: 0.51321 | val_0_auc: 0.82598 |  0:33:53s
epoch 17 | los
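The scheduler's `mode` decides what counts as an improvement. Assuming, as in pytorch-tabnet's learning-rate scheduler callback, that `ReduceLROnPlateau` is stepped with the last evaluation metric (here `val_0_auc`, where higher is better), the sketch below replays the logged validation AUC through a simplified re-implementation of the plateau rule to show how `mode="min"` and `mode="max"` behave. It is a hypothetical simulation that ignores PyTorch's relative threshold and cooldown, not the library's own code.

```python
# Validation AUC for epochs 0-16, copied from the training log above.
VAL_AUC = [0.76416, 0.77859, 0.78642, 0.79139, 0.80008, 0.80774, 0.80672,
           0.81273, 0.81861, 0.82037, 0.81679, 0.81806, 0.82183, 0.82313,
           0.82325, 0.82552, 0.82598]


def simulate_plateau(metrics, mode, lr=0.02, factor=0.9, patience=5, min_lr=1e-5):
    """Simplified ReduceLROnPlateau: strict improvement, no threshold or cooldown.

    Returns the learning rate in effect after each epoch's scheduler step.
    """
    best = float("inf") if mode == "min" else float("-inf")
    bad_epochs = 0
    lrs = []
    for value in metrics:
        improved = value < best if mode == "min" else value > best
        if improved:
            best = value
            bad_epochs = 0
        else:
            bad_epochs += 1
        # Like PyTorch, reduce only once the bad-epoch count exceeds `patience`.
        if bad_epochs > patience:
            lr = max(lr * factor, min_lr)
            bad_epochs = 0
        lrs.append(lr)
    return lrs


for mode in ("min", "max"):
    print(mode, [round(lr, 5) for lr in simulate_plateau(VAL_AUC, mode)])
```

Under `mode="min"` the simulated rate drops to 0.018 at epoch 6 and to 0.0162 at epoch 12 even though the AUC keeps rising; under `mode="max"` it stays at 0.02 for all 17 logged epochs.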



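To make the role of `gamma` concrete: as described in the TabNet paper, each step's feature mask is computed from the attentive transformer's output scaled by a prior, and the prior is then updated as P[i] = P[i-1] · (γ − M[i]), starting from all ones. With γ = 1 a feature fully used at one step is blocked afterwards; larger γ lets it be reused. The sketch below is a toy illustration under stated assumptions: a single sample, fixed hypothetical logits standing in for the learned attentive transformer output, and sparsemax instead of the entmax used in this notebook.

```python
import numpy as np


def sparsemax(z):
    """Sparsemax of a 1-D vector: Euclidean projection onto the probability simplex."""
    z_sorted = np.sort(z)[::-1]
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, z.size + 1)
    support = 1 + k * z_sorted > cumsum  # which sorted entries stay nonzero
    k_max = k[support].max()
    tau = (cumsum[k_max - 1] - 1) / k_max  # threshold subtracted from every entry
    return np.maximum(z - tau, 0.0)


def tabnet_masks(logits, gamma, n_steps):
    """Toy TabNet mask sequence with the prior-scale update P <- P * (gamma - M)."""
    prior = np.ones_like(logits)
    masks = []
    for _ in range(n_steps):
        mask = sparsemax(prior * logits)  # features already used get down-weighted
        masks.append(mask)
        prior = prior * (gamma - mask)
    return np.array(masks), prior


logits = np.array([3.0, 1.0, 0.5, 0.2])  # hypothetical attention logits for 4 features
for gamma in (1.0, 1.3):
    masks, _ = tabnet_masks(logits, gamma, n_steps=2)
    print(gamma, np.round(masks, 3))
```

In this toy case the first step selects only feature 0; at the second step feature 0 gets zero weight when γ = 1.0 but about 0.28 when γ = 1.3.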
In [14]:
# Calculate the accuracy on the training set
y_train_pred = tabnet_model.predict(X_train.values)
train_accuracy = accuracy_score(y_train, y_train_pred)
print(f"Training Accuracy: {train_accuracy * 100:.2f}%")

Training Accuracy: 77.69%


## Making Predictions on the Test Data and Evaluating Model Performance
This code uses the trained TabNet model to make predictions on the test data and computes the test accuracy.

In [18]:
# Make predictions on the test data
y_test_pred = tabnet_model.predict(X_test.values)

# Calculate the accuracy of the model on the test data
test_accuracy = accuracy_score(y_test, y_test_pred)
print(f"Test Accuracy: {test_accuracy * 100:.2f}%")

Test Accuracy: 74.82%
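Training was monitored with validation AUC while the final report uses accuracy, so computing both on the test set makes the numbers comparable. A minimal sketch using scikit-learn's `accuracy_score` and `roc_auc_score`; `evaluate_binary` is a hypothetical helper that takes the predicted probability of the positive class.

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score


def evaluate_binary(y_true, positive_proba, threshold=0.5):
    """Accuracy at a probability threshold plus threshold-free ROC AUC."""
    y_true = np.asarray(y_true)
    positive_proba = np.asarray(positive_proba)
    # Predict positive only when proba > threshold; at 0.5 this matches an
    # argmax over two class probabilities (ties go to class 0).
    y_pred = (positive_proba > threshold).astype(int)
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "auc": roc_auc_score(y_true, positive_proba),
    }
```

With the notebook's model this could be called as `evaluate_binary(y_test, tabnet_model.predict_proba(X_test.values)[:, 1])`, assuming the second probability column corresponds to label 1, which holds when the labels are 0 and 1.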
