## Comparación entre Deep Tabular Learning y otros métodos

#### Vamos a utilizar los siguientes algoritmos para la comparación:
- ##### TabTransformer
- ##### TabNet
- ##### CNN 
- ##### XGBoost

#### Para todas los modelos vamos a utilizar las métricas:
- ##### Accuracy
- ##### F1-Score

#### Los resultados los vamos a reflejar en las gráficas:
- ##### ROC-Curve
- ##### Training time
- ##### Training loss

### IMPORTS

In [12]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
import torch
import torch.nn as nn
from tab_transformer_pytorch import TabTransformer

### Carga + Preprocesado 

In [3]:
fPath = 'mushroom_cleaned.csv'
data = pd.read_csv(fPath)

In [4]:
data.head()

Unnamed: 0,cap-diameter,cap-shape,gill-attachment,gill-color,stem-height,stem-width,stem-color,season,class
0,1372,2,2,10,3.807467,1545,11,1.804273,1
1,1461,2,2,10,3.807467,1557,11,1.804273,1
2,1371,2,2,10,3.612496,1566,11,1.804273,1
3,1261,6,2,10,3.787572,1566,11,1.804273,1
4,1305,6,2,10,3.711971,1464,11,0.943195,1


In [5]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 54035 entries, 0 to 54034
Data columns (total 9 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   cap-diameter     54035 non-null  int64  
 1   cap-shape        54035 non-null  int64  
 2   gill-attachment  54035 non-null  int64  
 3   gill-color       54035 non-null  int64  
 4   stem-height      54035 non-null  float64
 5   stem-width       54035 non-null  int64  
 6   stem-color       54035 non-null  int64  
 7   season           54035 non-null  float64
 8   class            54035 non-null  int64  
dtypes: float64(2), int64(7)
memory usage: 3.7 MB


#### Separamos features-labels

In [8]:
X = data.drop('class', axis=1)
y = data['class']

#### Codificamos las etiquetas

In [9]:
labelEncoder = LabelEncoder()
y = labelEncoder.fit_transform(y)

#### Split Train-Test

In [11]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.2, random_state=123)
X_train.shape, X_test.shape, y_train.shape, y_test.shape

((43228, 8), (10807, 8), (43228,), (10807,))

#### Columnas categóricas y continuas

In [13]:
categorical_cols = ['cap-diameter','cap-shape', 'gill-attachment', 'gill-color', 'stem-width', 'stem-color']
continuous_cols = ['stem-height', 'season']

#### Variables para la representación

Los diccionarios que vamos a usar serán de la forma

    "nombreModelo" :    
    {
        "accuracy": accuracy,
        "f1-score": f1_score,
        "roc":      roc,
        "trainTime":training_time,
        "trainLoss":training_loss
    }  

In [14]:
diccionarioModelo = {}

### TabTransformer

#### Preprocesado adicional TabTransformer

Como trabajo adicional tenemos que sacar los parámetros del modelo así como los valores únicos en cata columna categórica 

In [21]:
# Parámetros
NUM_UNIQUE_CATEGORIES = [data[column].unique() for column in categorical_cols] # Sacamos los valores únicos de cada columna
NUM_CONTINOUS = len(continuous_cols)        # Con sacar el número de columnas contínuas el modelo funciona
DIM = 32                                    # Dimensión del transformer, default=32
DIM_OUT = 1                                 # Dimensión de salida, como es binario DIM_OUT=1
DEPTH = 6                                   # Numero de transformers
HEADS = 8                         # Dropout para el feed forward

In [None]:
# sacamos datos 

In [17]:
# Sacamos las columnas categóricas y contínuas
X_train_cat =   X_train[categorical_cols].values
X_train_cont =  X_train[continuous_cols].values
#
X_test_cat =    X_test[categorical_cols].values
X_test_cont =   X_test[continuous_cols].values
#
# Ahora lo pasamos todo a tensores PyTorch
X_train_cat_tensor =    torch.tensor(X_train_cat,   dtype=torch.long)
X_train_cont_tensor =   torch.tensor(X_train_cont,  dtype=torch.float)
y_train_tensor =        torch.tensor(y_train,       dtype=torch.float)
#
X_test_cat_tensor =     torch.tensor(X_test_cat,   dtype=torch.long)
X_test_cont_tensor =    torch.tensor(X_test_cont,  dtype=torch.float)
y_test_tensor =         torch.tensor(y_test,       dtype=torch.float)


In [20]:
diccionarioTabTrans = {}
tabTransformer = TabTransformer(
    categories=NUM_UNIQUE_CATEGORIES,
    num_continuous=NUM_CONTINOUS,
    dim=DIM,
    dim_out=DIM_OUT,
    depth=DEPTH,
    heads=HEADS
)

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

### TabNet

### XGBoost

### MLP

### Evaluación de modelos

### Discusión + Conclusiones