### Comparison of the best Deep Learning models for tabular data using the *mushroom_cleaned* dataset

I'm going to use these models:
- #### TabNet
- #### TabTransformer
- #### SAINT
- #### TF-Transformer

For each model I will record the execution time, loss, test accuracy and parameters, aiming for a validation accuracy of at least 0.85.

Every model uses an 80-20 train-test split with random_state=42, the *Adam optimizer* and the *BCEWithLogitsLoss* loss function.
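`nn.BCEWithLogitsLoss` takes raw logits and applies the sigmoid internally in a numerically stable way. The NumPy sketch below shows the quantity it computes with its default `reduction='mean'`, checked against the naive "sigmoid, then binary cross-entropy" form; the function names are illustrative only, not part of the PyTorch API.

```python
import numpy as np

def bce_with_logits(logits, targets):
    """Mean binary cross-entropy computed directly from raw logits.

    Uses the stable form max(x, 0) - x*y + log(1 + exp(-|x|)), which is
    algebraically equal to -[y*log(sigmoid(x)) + (1-y)*log(1-sigmoid(x))].
    """
    x = np.asarray(logits, dtype=float)
    y = np.asarray(targets, dtype=float)
    per_sample = np.maximum(x, 0) - x * y + np.log1p(np.exp(-np.abs(x)))
    return per_sample.mean()

def naive_bce(logits, targets):
    """Same loss written as an explicit sigmoid followed by cross-entropy."""
    p = 1.0 / (1.0 + np.exp(-np.asarray(logits, dtype=float)))
    y = np.asarray(targets, dtype=float)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p)).mean()

logits = np.array([2.0, -1.0, 0.5, -3.0])
targets = np.array([1.0, 0.0, 0.0, 1.0])
print(bce_with_logits(logits, targets), naive_bce(logits, targets))
```

The stable form avoids computing `log(sigmoid(x))` for large negative logits, where the sigmoid underflows to 0.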

#### Imports

In [1]:
import numpy as np
import pandas as pd
import time
# Preprocessing 
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
#
import torch
import torch.nn as nn
# Models
from tab_transformer_pytorch import TabTransformer
# Metrics
from sklearn.metrics import accuracy_score, roc_auc_score, roc_curve

In [2]:
time_Model = {}

#### Load Data

In [3]:
dataset = pd.read_csv('mushroom_cleaned.csv', sep=',')
dataset.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 54035 entries, 0 to 54034
Data columns (total 9 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   cap-diameter     54035 non-null  int64  
 1   cap-shape        54035 non-null  int64  
 2   gill-attachment  54035 non-null  int64  
 3   gill-color       54035 non-null  int64  
 4   stem-height      54035 non-null  float64
 5   stem-width       54035 non-null  int64  
 6   stem-color       54035 non-null  int64  
 7   season           54035 non-null  float64
 8   class            54035 non-null  int64  
dtypes: float64(2), int64(7)
memory usage: 3.7 MB
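With 54035 rows, the 80-20 split gives the sizes below. This is a small worked check, assuming scikit-learn's current handling of a float `test_size`: the test count is rounded up and the remaining rows go to training.

```python
import math

n_samples = 54035          # rows reported by dataset.info()
test_size = 0.2            # the 80-20 split used throughout this notebook

# Test fraction rounded up; training gets the remainder
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test
print(n_train, n_test)     # 43228 10807
```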


##### Data preprocessing common to all models

In [4]:
categorical_cols = ['cap-diameter','cap-shape', 'gill-attachment', 'gill-color', 'stem-width', 'stem-color']
continuous_cols = ['stem-height', 'season']
label_col = 'class'
#
labelEncodersTabTransformer = {}
#
for col in categorical_cols: 
    le = LabelEncoder()
    dataset[col] = le.fit_transform(dataset[col])
    labelEncodersTabTransformer[col] = le
# Split data into 'data' and 'labels'
X = dataset.drop(label_col, axis=1)
y = dataset[label_col]
# Split into training and testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Separate categorical and continuous features
X_train_cat =   X_train[categorical_cols].values
X_train_cont =  X_train[continuous_cols].values
#
X_test_cat =    X_test[categorical_cols].values
X_test_cont =   X_test[continuous_cols].values
#
# Convert to PyTorch Tensors
X_train_cat_tensor =    torch.tensor(X_train_cat,       dtype=torch.long)
X_train_cont_tensor =   torch.tensor(X_train_cont,      dtype=torch.float)
y_train_tensor =        torch.tensor(y_train.values,    dtype=torch.float)
# 
X_test_cat_tensor =     torch.tensor(X_test_cat, dtype=torch.long)
X_test_cont_tensor =    torch.tensor(X_test_cont, dtype=torch.float)
y_test_tensor =         torch.tensor(y_test.values, dtype=torch.float)
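`LabelEncoder` maps each column's distinct values, in sorted order, to the integers 0..n-1, which is why the embedding size per column can later be taken from `nunique()`. The sketch below reproduces that mapping with `np.unique(..., return_inverse=True)` on a hypothetical toy column; it is an illustration, not the notebook's data.

```python
import numpy as np

# Hypothetical toy column standing in for one categorical feature
column = np.array([7, 2, 7, 11, 2, 5])

# np.unique returns the sorted distinct values; return_inverse gives,
# for every row, the index of its value inside that sorted array
classes, codes = np.unique(column, return_inverse=True)
print(classes.tolist())  # [2, 5, 7, 11]
print(codes.tolist())    # [2, 0, 2, 3, 0, 1]

# Codes always lie in 0 .. len(classes) - 1, so an embedding table with
# len(classes) rows (the column's nunique()) can index every value.
```

Because the encoders above are fitted on the full dataset before splitting, every value that appears in the test set already has a code.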

### TabTransformer

#### Define the model + Optimizer + Loss function

In [None]:
# Parameters
NUM_UNIQUE_CATEGORIES = [dataset[col].nunique() for col in categorical_cols]
NUM_CONTINOUS  = len(continuous_cols)
DIM = 32
DIM_OUT = 1
DEPTH = 6
HEADS = 8
ATTN_DROPOUT = .1
FF_DROPOUT = .1

In [None]:
modelTabTrans = TabTransformer(
    categories=NUM_UNIQUE_CATEGORIES,
    num_continuous=NUM_CONTINOUS,
    dim=DIM,
    dim_out=DIM_OUT,
    depth=DEPTH,
    heads=HEADS,
    attn_dropout=ATTN_DROPOUT,
    ff_dropout=FF_DROPOUT
)

In [None]:
# Optimizer + Loss function
optimizer = torch.optim.Adam(modelTabTrans.parameters(), lr=1e-3)
criterion = nn.BCEWithLogitsLoss()

#### TabTransformer Train

In [None]:
loss_TabTrans = [] # loss per epoch
ini_time = time.time()
modelTabTrans.train()
for epoch in range(15):
    optimizer.zero_grad()
    output = modelTabTrans(X_train_cat_tensor, X_train_cont_tensor)
    loss = criterion(output.squeeze(), y_train_tensor)
    loss.backward()
    optimizer.step()
    loss_TabTrans.append(loss.item())
    print(f'Epoch {epoch+1}, Loss: {loss.item()}')
time_Model['TabTransformer'] = time.time() - ini_time
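To record the execution time of every model in the same `time_Model` dictionary, the timing can be wrapped in a small helper. This is a sketch with a hypothetical `timed` function; it uses `time.perf_counter`, which is intended for measuring intervals, instead of `time.time`.

```python
import time

def timed(store, name, fn, *args, **kwargs):
    """Hypothetical helper: call fn(*args, **kwargs), record the elapsed
    wall-clock seconds in store[name], and return fn's result."""
    start = time.perf_counter()          # monotonic, high-resolution clock
    result = fn(*args, **kwargs)
    store[name] = time.perf_counter() - start
    return result

# Stand-in workload; in the notebook, `store` would be time_Model and `fn`
# a function wrapping one model's training loop.
timings = {}
total = timed(timings, 'toy_sum', sum, range(1_000_000))
print(total, timings)
```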

#### TabTransformer Evaluation

In [None]:
modelTabTrans.eval()
with torch.no_grad():
    # Test Loss
    test_output = modelTabTrans(X_test_cat_tensor, X_test_cont_tensor)
    test_loss = criterion(test_output.squeeze(), y_test_tensor)
    print(f'Test Loss: {test_loss.item()}')
    # Accuracy
    test_preds = torch.sigmoid(test_output).squeeze().round()
    accuracy = accuracy_score(y_test_tensor.numpy(), test_preds.numpy())
    print(f'Test Accuracy: {accuracy * 100:.4f}%')
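Accuracy must compare the thresholded predictions with the true labels; comparing the labels with themselves always yields 100%. A minimal NumPy sketch of the logits-to-accuracy step, with a hypothetical `accuracy_from_logits` helper and toy values:

```python
import numpy as np

def accuracy_from_logits(logits, labels, threshold=0.5):
    """Hypothetical helper: sigmoid -> threshold -> fraction of matches."""
    probs = 1.0 / (1.0 + np.exp(-np.asarray(logits, dtype=float)))
    preds = (probs > threshold).astype(int)   # sigmoid(x) > 0.5 iff x > 0
    return (preds == np.asarray(labels).astype(int)).mean()

logits = [2.0, -1.0, 0.3, -0.2]
labels = [1, 0, 0, 0]
print(accuracy_from_logits(logits, labels))   # 0.75
print(accuracy_from_logits(labels, labels))   # labels vs labels style mistake is not detected this way
```

The second call is only there to show that the helper needs logits, not labels, as its first argument.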

### TF-Transformer

There is no direct implementation of TF-Transformer in PyTorch or scikit-learn.

### SAINT

SAINT would be imported from our local library source; if it is not there, download the library with:
*git clone https://github.com/somepago/saint*

SAINT could not be tested: the repository is built to run training and testing of SAINT-style models from the command line with GPU acceleration, and I do not have enough CUDA cores for it to be truly effective.

### TabNet

In [13]:
from pytorch_tabnet.tab_model import TabNetClassifier

In [15]:
modelTabNet = TabNetClassifier(
    n_d= 32, n_a=32, n_steps=5, gamma=1.5, n_independent=2, n_shared=2,
    cat_idxs=[i for i in range(len(categorical_cols))],
    cat_dims=[dataset[col].nunique() for col in categorical_cols],
    cat_emb_dim=1,
    optimizer_fn=torch.optim.Adam,
    optimizer_params=dict(lr=2e-2),
    scheduler_params={"step_size":10, "gamma":0.9},
    scheduler_fn=torch.optim.lr_scheduler.StepLR,
    mask_type='sparsemax'
)
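`StepLR` multiplies the learning rate by `gamma` every `step_size` scheduler steps. Assuming pytorch_tabnet steps the scheduler once per epoch, the arithmetic below suggests that with `max_epochs=10` the learning rate stays at 2e-2 for the whole run, so these scheduler settings have little effect here. `step_lr` is a hypothetical helper, not a library function.

```python
def step_lr(epoch, base_lr=2e-2, step_size=10, gamma=0.9):
    """Learning rate after `epoch` StepLR steps: base_lr * gamma^(epoch // step_size)."""
    return base_lr * gamma ** (epoch // step_size)

for epoch in (0, 9, 10, 19, 20):
    print(epoch, step_lr(epoch))
```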



In [17]:
start_time = time.time()
modelTabNet.fit(
    X_train=np.hstack([X_train_cat, X_train_cont]),
    y_train=y_train.values,
    eval_set=[(np.hstack([X_test_cat, X_test_cont]), y_test.values)],
    eval_name=['test'],
    eval_metric=['auc'],
    max_epochs=10,
    patience=0,
    batch_size=1024,
    virtual_batch_size=128
)
training_time_tabnet = time.time() - start_time
time_Model['TabNet'] = training_time_tabnet

test_probs = modelTabNet.predict_proba(np.hstack([X_test_cat, X_test_cont]))[:, 1]
test_preds = (test_probs > 0.5).astype(int)
accuracy_tabnet = accuracy_score(y_test, test_preds)
print(f'Test Accuracy: {accuracy_tabnet * 100:.2f}%')

fpr_tabnet, tpr_tabnet, _ = roc_curve(y_test, test_probs)
roc_auc_tabnet = roc_auc_score(y_test, test_probs)
print(f'Test AUC: {roc_auc_tabnet:.2f}')



epoch 0  | loss: 0.70732 | test_auc: 0.63611 |  0:00:04s
epoch 1  | loss: 0.59771 | test_auc: 0.71279 |  0:00:08s
epoch 2  | loss: 0.55356 | test_auc: 0.79524 |  0:00:12s
epoch 3  | loss: 0.53332 | test_auc: 0.82095 |  0:00:16s
epoch 4  | loss: 0.48863 | test_auc: 0.86548 |  0:00:20s
epoch 5  | loss: 0.45256 | test_auc: 0.88555 |  0:00:24s
epoch 6  | loss: 0.42428 | test_auc: 0.90558 |  0:00:27s
epoch 7  | loss: 0.40643 | test_auc: 0.8989  |  0:00:33s
epoch 8  | loss: 0.39537 | test_auc: 0.91399 |  0:00:37s
epoch 9  | loss: 0.3986  | test_auc: 0.91976 |  0:00:43s
Test Accuracy: 83.11%
Test AUC: 0.92
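The AUC reported above can be read as the probability that a randomly chosen positive sample gets a higher score than a randomly chosen negative one. A small NumPy sketch of that pairwise definition (ties count as half), with a hypothetical `pairwise_auc` function; it builds an n_pos x n_neg matrix, so it is only meant for small inputs.

```python
import numpy as np

def pairwise_auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diff = pos[:, None] - neg[None, :]   # every positive vs every negative
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size

print(pairwise_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```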


### Conclusions

Since I could not use the four most commonly used methods, I will continue down the list so that the comparison covers at least four methods.

### LightGBM

LightGBM is a high-performance gradient boosting algorithm, known for its training efficiency.
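To illustrate the boosting idea, here is a minimal first-order gradient boosting sketch for binary log-loss using one-feature decision stumps. It is NOT LightGBM's algorithm (LightGBM grows leaf-wise trees with up to `num_leaves` leaves, uses histograms and second-order information); it only shows how each round fits the negative gradient `y - sigmoid(raw)` and adds a shrunken correction scaled by the learning rate (0.05, as in the configuration used here). All function names and the toy data are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_stump(x, residual):
    """Least-squares stump: split 1-D feature x at threshold t and predict
    the mean residual on each side."""
    best = None
    for t in np.unique(x)[:-1]:            # candidate thresholds
        left = x <= t
        lv, rv = residual[left].mean(), residual[~left].mean()
        err = ((residual - np.where(left, lv, rv)) ** 2).sum()
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    return best[1:]                        # (threshold, left value, right value)

def boost(x, y, n_rounds=100, learning_rate=0.05):
    """First-order gradient boosting for binary log-loss."""
    raw = np.zeros(len(x))                 # raw scores (log-odds), start at 0
    stumps = []
    for _ in range(n_rounds):
        residual = y - sigmoid(raw)        # negative gradient of log-loss
        t, lv, rv = fit_stump(x, residual)
        raw += learning_rate * np.where(x <= t, lv, rv)
        stumps.append((t, lv, rv))
    return stumps

def predict_proba(stumps, x, learning_rate=0.05):
    """Sum the shrunken stump outputs and map the raw score to a probability."""
    raw = np.zeros(len(x))
    for t, lv, rv in stumps:
        raw += learning_rate * np.where(x <= t, lv, rv)
    return sigmoid(raw)

# Toy data: the label flips from 0 to 1 between x=4 and x=5
x = np.arange(1.0, 9.0)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
stumps = boost(x, y)
print(np.round(predict_proba(stumps, x), 3))
```

A small learning rate means each tree corrects only part of the remaining error, which is why many boosting rounds (here `num_boost_round=100`) are typically used.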

In [5]:
import lightgbm as lgb

In [6]:
# Prepare the data for the model
train_data_gmb =    lgb.Dataset(X_train, label=y_train)
test_data_gmb =     lgb.Dataset(X_test, label=y_test, reference=train_data_gmb)

#### Model configuration

In [7]:
params = {
    'objective':        'binary',
    'metric':           'binary_logloss',
    'boosting_type':    'gbdt',
    'num_leaves':       31,
    'learning_rate':    0.05,
    'feature_fraction': 0.9
}

#### Model training

In [8]:
start_time = time.time()
modellGBM = lgb.train(params, train_data_gmb, num_boost_round=100, valid_sets=[test_data_gmb])
training_time_lgbm = time.time() - start_time
time_Model['LightGBM'] = training_time_lgbm


##### Prediction and evaluation

In [None]:
test_preds = modellGBM.predict(X_test, num_iteration=modellGBM.best_iteration)
test_preds_binary = (test_preds > 0.5).astype(int)
accuracy_lgbm = accuracy_score(y_test, test_preds_binary)
print(f'Test Accuracy: {accuracy_lgbm * 100:.2f}%')

fpr_lgbm, tpr_lgbm, _ = roc_curve(y_test, test_preds)
roc_auc_lgbm = roc_auc_score(y_test, test_preds)
print(f'Test AUC: {roc_auc_lgbm:.2f}')

### CatBoost

It is