## TabNet Implementation for Tabular Data

TabNet was proposed in [this article](https://arxiv.org/abs/1908.07442) as a neural network architecture capable of learning a canonical representation of tabular data. It has been shown to perform well against gradient boosting models, the current gold standard for learning on tabular data.

TabNet uses a sequential attention mechanism to choose a subset of semantically meaningful features to process at each decision step. Instance-wise feature selection enables efficient learning, since the model capacity is fully used for the most salient features, and it also yields more interpretable decision making through visualization of the selection masks.
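The selection masks are produced by a sparse normalization (sparsemax in the TabNet paper) which, unlike softmax, can give a feature exactly zero weight for a given sample. The following is a minimal NumPy sketch of row-wise sparsemax, assuming a 2-D array of scores with one row per sample; it illustrates the technique and is not the pytorch-tabnet code.

```python
import numpy as np


def sparsemax(z: np.ndarray) -> np.ndarray:
    """Row-wise sparsemax: Euclidean projection of each row onto the probability simplex."""
    z = np.asarray(z, dtype=float)
    # Sort scores in descending order along the feature axis
    z_sorted = -np.sort(-z, axis=1)
    k = np.arange(1, z.shape[1] + 1)
    cumsum = np.cumsum(z_sorted, axis=1)
    # The k largest scores stay in the support while 1 + k * z_(k) > sum of the top k scores
    support = 1 + k * z_sorted > cumsum
    k_max = support.sum(axis=1)
    # Threshold tau makes the kept (shifted) scores sum to exactly 1
    tau = (cumsum[np.arange(z.shape[0]), k_max - 1] - 1) / k_max
    return np.maximum(z - tau[:, None], 0.0)


scores = np.array([[2.0, 1.0, 0.1, -1.0],
                   [0.5, 0.5, 0.5, 0.5]])
mask = sparsemax(scores)
# First row puts all weight on feature 0; second row spreads 0.25 over all four features
print(mask)
```

A dominant score wins outright, while tied scores share the mass, which is what makes the per-sample masks both sparse and easy to visualize.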


This implementation closely follows [the TabNet implementation in PyTorch linked here](https://github.com/dreamquark-ai/tabnet/tree/b6e1ebaf694f37ad40a6ba525aa016fd3cec15da). 

<img src="images/tabnet_schematic2.jpg" width="1000" height="800" align="center"/>


#### GLU Block

Gated Linear Units (GLUs) act as a simple attention (gating) mechanism: they take two dense-layer outputs, apply a sigmoid to one of them, and multiply the two element-wise.

The GLU block used here contains two dense layers, two ghost batch normalization layers, identity and sigmoid activation functions, and an element-wise multiplication.
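A NumPy sketch of the forward pass of this block in training mode. Assumptions: ghost batch normalization is reduced to standardizing each virtual batch of `virtual_batch_size` rows with its own mean and variance (no learned scale/shift, no running statistics), and the weight matrices are random placeholders rather than trained parameters.

```python
import numpy as np


def ghost_batch_norm(x, virtual_batch_size=128, eps=1e-5):
    """Standardize each virtual batch (chunk of rows) with its own statistics."""
    out = np.empty(x.shape, dtype=float)
    for start in range(0, x.shape[0], virtual_batch_size):
        chunk = x[start:start + virtual_batch_size]
        mean = chunk.mean(axis=0)
        var = chunk.var(axis=0)
        out[start:start + virtual_batch_size] = (chunk - mean) / np.sqrt(var + eps)
    return out


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def glu_block(x, w_linear, w_gate, virtual_batch_size=128):
    """GLU: identity branch multiplied element-wise by a sigmoid gate branch."""
    # Dense layer + ghost BN on each branch
    linear = ghost_batch_norm(x @ w_linear, virtual_batch_size)
    gate = ghost_batch_norm(x @ w_gate, virtual_batch_size)
    # Identity activation on one branch, sigmoid on the other, then multiply
    return linear * sigmoid(gate)


rng = np.random.default_rng(0)
x = rng.normal(size=(512, 10))       # 512 samples, 10 input features
w_linear = rng.normal(size=(10, 8))  # placeholder weights, output width 8
w_gate = rng.normal(size=(10, 8))
out = glu_block(x, w_linear, w_gate, virtual_batch_size=128)
print(out.shape)  # (512, 8)
```

Normalizing small virtual batches instead of the full batch is the point of ghost batch normalization: it lets TabNet train with large batches while keeping the regularizing noise of small-batch statistics.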


#### Feature Transformer Block

The feature transformer chains two GLU blocks, with a skip connection that adds the output of the first block to the output of the second.

<img src="images/tabnet_feature_transformer.jpg" width="700" height="500" align="center"/>
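A NumPy sketch of two chained GLU blocks with the skip connection. Assumptions: each GLU uses a single dense layer of width 2·d that is split in half (equivalent to two dense layers, since batch normalization acts per column), ghost BN is reduced to per-virtual-batch standardization, the weights are random placeholders, and the residual sum is scaled by sqrt(0.5), which is what the dreamquark implementation linked above appears to do to keep the variance stable.

```python
import numpy as np


def ghost_batch_norm(x, virtual_batch_size=128, eps=1e-5):
    """Standardize each virtual batch (chunk of rows) with its own statistics."""
    out = np.empty(x.shape, dtype=float)
    for start in range(0, x.shape[0], virtual_batch_size):
        chunk = x[start:start + virtual_batch_size]
        out[start:start + virtual_batch_size] = (
            (chunk - chunk.mean(axis=0)) / np.sqrt(chunk.var(axis=0) + eps)
        )
    return out


def glu(x, w, virtual_batch_size=128):
    """Dense to 2*d units, ghost BN, then first half gated by sigmoid of second half."""
    h = ghost_batch_norm(x @ w, virtual_batch_size)
    d = h.shape[1] // 2
    return h[:, :d] * (1.0 / (1.0 + np.exp(-h[:, d:])))


def feature_transformer(x, w1, w2, virtual_batch_size=128):
    """Two GLU blocks; the first block's output is added back after the second."""
    h1 = glu(x, w1, virtual_batch_size)
    h2 = glu(h1, w2, virtual_batch_size)
    # Skip connection, scaled by sqrt(0.5)
    return (h1 + h2) * np.sqrt(0.5)


rng = np.random.default_rng(0)
n_features, width = 10, 16
x = rng.normal(size=(256, n_features))
w1 = rng.normal(size=(n_features, 2 * width))  # placeholder weights
w2 = rng.normal(size=(width, 2 * width))
out = feature_transformer(x, w1, w2)
print(out.shape)  # (256, 16)
```

Both GLU blocks must share the output width so that the skip connection can be a plain element-wise sum.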

#### Attentive Transformer Block

The attentive transformer takes the TabNet prior as an input to the layer; updating the prior is left to the TabNet step layer.

> The prior encourages orthogonal feature selection across decision steps: it tells us what we know about each feature and how much it has already been used in previous steps.

<img src="images/tabnet_attentive_transformer.jpg" width="200" height="200" align="center"/>
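In the TabNet paper the mask at step i is `M[i] = sparsemax(P[i-1] * h_i(a[i-1]))` and the step layer then updates the prior as `P[i] = P[i-1] * (gamma - M[i])`, starting from `P[0] = 1`; with `gamma = 1` a feature fully used in one step cannot be selected again. A NumPy sketch of that loop, assuming `h_i` is a dense layer followed by batch standardization, random placeholder weights, and a fixed random array standing in for the processed features `a[i-1]` (in the real model these change every step):

```python
import numpy as np


def sparsemax(z):
    """Row-wise sparsemax (projection of each row onto the simplex)."""
    z_sorted = -np.sort(-z, axis=1)
    k = np.arange(1, z.shape[1] + 1)
    cumsum = np.cumsum(z_sorted, axis=1)
    k_max = (1 + k * z_sorted > cumsum).sum(axis=1)
    tau = (cumsum[np.arange(z.shape[0]), k_max - 1] - 1) / k_max
    return np.maximum(z - tau[:, None], 0.0)


def attentive_transformer(prior, processed, w):
    """Dense layer + batch standardization, scaled by the prior, then sparsemax."""
    h = processed @ w
    h = (h - h.mean(axis=0)) / np.sqrt(h.var(axis=0) + 1e-5)
    return sparsemax(prior * h)


rng = np.random.default_rng(0)
n_samples, n_features, n_a, gamma = 32, 6, 8, 1.3
prior = np.ones((n_samples, n_features))       # P[0]: every feature still available
processed = rng.normal(size=(n_samples, n_a))  # stand-in for a[i-1]
masks = []
for step in range(3):
    w = rng.normal(size=(n_a, n_features))     # placeholder weights for this step
    mask = attentive_transformer(prior, processed, w)
    masks.append(mask)
    # Prior update (done in the step layer): heavily used features are down-weighted later
    prior = prior * (gamma - mask)
```

After the loop, `prior` equals the running product of `gamma - M[j]` over the steps taken, which is exactly the quantity the next attentive transformer would receive.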

In [None]:
# ! pip install pytorch-tabnet

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import global_variables as gv
import utilities
from sklearn.preprocessing import StandardScaler, QuantileTransformer
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, f1_score, recall_score

from pytorch_tabnet.tab_model import TabNetClassifier
import torch

In [3]:
df = pd.read_csv('data/binary_full.csv')
pd.set_option('display.max_columns', None)
df.drop('Unnamed: 0', axis=1, inplace=True)
df.head()

Unnamed: 0,1319-0.0,1408-0.0,1329-0.0,1448-0.0,1538-0.0,6142-0.0,2050-0.0,1508-0.0,1339-0.0,30710-0.0,1349-0.0,30750-0.0,1468-0.0,20117-0.0,30740-0.0,1160-0.0,2090-0.0,31-0.0,1488-0.0,30850-0.0,4080-0.0,1369-0.0,21000-0.0,1200-0.0,1289-0.0,30790-0.0,845-0.0,48-0.0,30630-0.0,1299-0.0,1220-0.0,1548-0.0,1528-0.0,23099-0.0,49-0.0,30690-0.0,1389-0.0,2654-0.0,1249-0.0,1309-0.0,1379-0.0,1239-0.0,21003-0.0,30780-0.0,1438-0.0,30870-0.0,1359-0.0,30770-0.0,21001-0.0,1458-0.0,23100-0.0,6138-0.0,1418-0.0,1478-0.0,4079-0.0,30760-0.0,23101-0.0,2100-0.0,1428-0.0,30640-0.0,hypertension,outcome_cardiomyopathies,outcome_ischemic_heart_disease,outcome_heart_failure,outcome_peripheral_vascular_disease,outcome_cardiac_arrest,outcome_cerebral_infarction,outcome_arrhythmia,outcome_myocardial_infarction,CVD,age,sex,race,age-binned,race-binary
0,0.0,1.0,2.0,3.0,2.0,1.0,2.0,3.0,2.0,0.34,1.0,34.937,3.0,2.0,5.622,7.0,1.0,0.0,6.0,0.508,110.0,1.0,1001.0,3.0,6.0,54.4035,20.9,74.0,1.593,10.0,0.0,2.0,2.0,35.6,102.0,6.477,1.0,6.0,1.0,2.0,1.0,0.0,54.0,3.888,10.0,0.977,2.0,26.339,24.579,3.86,25.0,1.0,3.0,1.0,77.0,1.706,45.2,1.0,0.0,1.211,0,0,0,0,0,0,0,1,0,1,54.0,Female,British,50-59,1
1,0.0,3.0,2.0,1.0,0.0,1.0,1.0,2.0,2.0,3.94,4.0,40.9,5.0,2.0,5.052,9.0,0.0,1.0,2.0,13.088,166.0,2.0,1001.0,2.0,2.0,15.4,16.0,120.0,1.39,2.0,0.0,2.0,2.47,36.5,113.0,5.512,1.0,7.0,1.0,1.0,2.0,0.0,65.0,3.52,12.0,2.358,3.0,10.701,35.0861,7.0,42.9,3.0,2.0,1.0,91.0,1.173,74.6,0.0,1.0,1.019,1,0,1,0,0,0,0,0,1,0,65.0,Male,British,60-69,1
2,0.0,3.0,3.0,2.0,1.0,2.0,1.0,2.0,2.0,0.55,1.0,40.0,1.0,0.0,5.31,5.0,0.0,0.0,0.0,0.515,132.0,1.0,1001.0,3.0,2.0,32.1,16.0,66.0,2.005,4.0,0.0,1.0,1.0,29.5,88.0,7.079,1.0,7.0,3.0,4.0,2.0,0.0,69.0,4.227,8.0,0.655,2.0,10.693,19.3835,7.0,15.2,3.0,2.0,1.0,67.0,2.49,36.3,0.0,1.0,1.097,0,0,0,0,0,0,0,0,0,0,69.0,Female,British,60-69,1
3,3.0,3.0,3.0,3.0,0.0,2.0,1.0,2.0,2.0,0.45,2.0,37.3,4.0,2.0,4.449,7.0,0.0,1.0,5.0,4.675,178.0,2.0,1001.0,1.0,3.0,43.562,18.0,110.0,1.474,2.0,0.0,1.0,2.0,28.5,117.0,5.028,0.0,7.0,1.0,1.0,2.0,1.0,66.0,3.041,10.0,3.108,2.0,25.317,35.1281,7.0,31.7,3.0,2.0,1.0,84.0,1.169,79.6,0.0,3.0,0.923,0,0,0,0,0,0,0,0,0,0,66.0,Male,British,60-69,1
4,0.0,3.0,2.0,1.0,0.0,5.0,2.0,2.0,2.0,0.75,2.0,32.2,1.0,2.0,4.616,6.0,0.0,1.0,3.04,20.162,178.0,1.0,1001.0,3.0,1.0,71.11,22.38,94.0,2.149,1.0,0.0,2.0,2.0,24.8,100.0,7.958,1.0,7.0,2.0,1.0,1.0,0.0,48.0,4.983,8.0,1.173,1.0,26.523,25.8866,1.0,20.1,1.0,2.0,1.0,88.0,2.053,61.0,0.0,3.0,1.443,0,0,0,0,0,0,0,0,0,0,48.0,Male,British,40-49,1


### Test TabNet Binary Classifier out-of-the-box

In [4]:
X_train1, X_val1, X_test1, y_train1, y_val1, y_test1 = utilities.process_features(df, 'CVD', QuantileTransformer(output_distribution='uniform'), one_hot=False)
X_train1, y_train1= utilities.resample_data(X_train1, y_train1, 'under')

X_train= X_train1.to_numpy()
X_val= X_val1.to_numpy()
X_test= X_test1.to_numpy()

y_train= y_train1.to_numpy().astype(int)
y_val= y_val1.to_numpy().astype(int)
y_test= y_test1.to_numpy().astype(int)

In [4]:
y_test

array([1, 0, 0, ..., 1, 0, 0])

In [None]:
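import numpy as np
from pytorch_tabnet.metrics import Metric
from sklearn.metrics import recall_score

class my_recall(Metric):
    def __init__(self):
        self._name = "recall"
        self._maximize = True

    def __call__(self, y_true, y_score):
        # y_score holds class probabilities of shape (n_samples, n_classes);
        # recall_score needs hard labels, so take the most likely class
        y_pred = np.argmax(y_score, axis=1)
        return recall_score(y_true, y_pred)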

In [None]:
clf = TabNetClassifier()

clf.fit(
    X_train, y_train,
    eval_set=[(X_val, y_val)],
    eval_metric=["auc"],
)

preds = clf.predict(X_test)

In [None]:
df['CVD'].value_counts()

In [None]:
# plot losses
plt.plot(clf.history['loss'])
plt.title('Basic TabNet Loss CVD')
plt.savefig('charts/models/TabNet/basic_loss.png', bbox_inches='tight')

### Global Explainability

In [None]:
feat_weights = clf.feature_importances_

# zip to feature names
input_cols = X_train1.columns.to_list()
feat_dict = dict(zip(input_cols, feat_weights))
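As far as we can tell from the dreamquark source, `feature_importances_` is derived from the same aggregated masks that `explain` returns: the unnormalized per-sample importance matrix is summed over samples and normalized so that the importances sum to 1 (internals may differ between pytorch-tabnet versions). A NumPy sketch of that aggregation, using a random stand-in for the explain matrix rather than output from this model:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for an (n_samples, n_features) matrix of non-negative per-sample importances;
# column 2 is never selected, column 0 is selected most strongly
explain_matrix = rng.random((100, 5)) * np.array([5.0, 1.0, 0.0, 2.0, 0.5])

# Sum over samples, then normalize so the global importances sum to 1
totals = explain_matrix.sum(axis=0)
feature_importances = totals / totals.sum()
print(feature_importances.round(3))
```

A feature that is never selected by any mask ends up with an importance of exactly zero, which is the case the `value == 0` filter further down picks out.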

In [None]:
import operator
sorted_d = dict( sorted(feat_dict.items(), key=operator.itemgetter(1),reverse=True))

In [None]:
top = dict()
# Keep features whose global importance is at least 0.01
for key, value in sorted_d.items():
    if value >= 0.01:
        top[key] = value
print('Top Contributing Features:')
# Map raw field IDs to readable names where a mapping exists
replaced_list = [gv.input_mapping[x] if x in gv.input_mapping else x for x in top.keys()]
print(replaced_list)

In [None]:
no_contribution = dict()
# Keep features with zero global importance
for key, value in sorted_d.items():
    if value == 0:
        no_contribution[key] = value
print('Non-contributing Features:')
# Map raw field IDs to readable names where a mapping exists
replaced_list2 = [gv.input_mapping[x] if x in gv.input_mapping else x for x in no_contribution.keys()]
print(replaced_list2)

In [None]:
len(replaced_list2)

In [None]:
lists = sorted(feat_dict.items())  # sorted by key, returns a list of (feature, importance) tuples

x, y = zip(*lists)  # unpack the list of pairs into two tuples

frame1 = plt.gca()
plt.plot(x, y)
plt.title('Global Feature Importances')
frame1.axes.xaxis.set_ticklabels([])
# save before showing: plt.show() can close the figure (e.g. in Jupyter), leaving savefig a blank canvas
plt.savefig('charts/models/TabNet/basic_global_features.png', bbox_inches='tight')
plt.show()

### Local Explainability

In [None]:
preds = clf.predict_proba(X_test)
test_auc = roc_auc_score(y_score=preds[:,1], y_true=y_test)
test_auc

In [None]:
explain_matrix, masks = clf.explain(X_test)

fig, axs = plt.subplots(1, 3, figsize=(20,20))

for i in range(3):
    axs[i].imshow(masks[i][:50])
    axs[i].set_title(f"mask {i}")

plt.savefig('charts/models/TabNet/basic_masks.png')
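The `explain_matrix` returned next to the per-step `masks` aggregates them. Following the TabNet paper, each step's mask is weighted by that step's contribution to the decision, `eta[i] = sum_c ReLU(d_c[i])` where `d[i]` is the step's decision output, and the weighted sum is normalized per sample. A NumPy sketch with random stand-ins for the masks and step outputs (hypothetical arrays, not values from this model):

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features, n_steps, n_d = 4, 6, 3, 8

# Stand-ins: per-step masks (each row sums to 1) and per-step decision outputs d[i]
masks = {i: rng.dirichlet(np.ones(n_features), size=n_samples) for i in range(n_steps)}
step_outputs = {i: rng.normal(size=(n_samples, n_d)) for i in range(n_steps)}

aggregate = np.zeros((n_samples, n_features))
for i in range(n_steps):
    # eta[i]: how much step i contributes to the decision for each sample
    eta = np.maximum(step_outputs[i], 0).sum(axis=1, keepdims=True)
    aggregate += eta * masks[i]

# Normalize per sample so each row of importances sums to 1
aggregate /= aggregate.sum(axis=1, keepdims=True)
print(aggregate.round(3))
```

Weighting by `eta` means a step that barely affects the output contributes little to the explanation, even if its mask is sharply peaked.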

## Customize Model Parameters

#### Fit parameters

<ul>
  <li> <b>X_train</b> (np.array): Training Features </li>
  <li> <b>y_train</b> (np.array): Training Targets </li>
  <li> <b>eval_set</b> (list of tuples): list of (X, y) evaluation sets; the last one is used for early stopping </li>
  <li> <b>eval_name</b> (list of str): list of eval set names </li>
  <li> <b>eval_metric</b> (list of str): list of evaluation metrics; the last one is used for early stopping </li>
  <li> <b>max_epochs</b> (int=200): maximum number of training epochs </li>
  <li> <b>patience</b> (int=10): number of epochs without improvement before early stopping; if 0, no early stopping is performed </li>
  <li> <b>weights</b> (int or dict=0): only for TabNetClassifier; sampling parameter where 0 means no sampling, 1 means automated sampling with inverse class occurrences, and a dict gives a weight for each class </li>
  <li> <b>loss_fn</b> (torch.loss): loss function for training; for multi-task classification, a list with one loss per task can be passed </li>
  <li> <b>batch_size</b> (int=1024): number of examples per batch </li>
  <li> <b>virtual_batch_size</b> (int=128): size of the mini-batches used for ghost batch normalization </li>
  <li> <b>num_workers</b> (int=0): number of workers used in torch.utils.data.DataLoader </li>
  <li> <b>drop_last</b> (bool=False): whether to drop the last batch during training if it is incomplete </li>
  <li> <b>callbacks</b> (list of callback fn): list of custom callbacks </li>
  <li> <b>pretraining_ratio</b> (float): fraction of input features to mask during pretraining (TabNetPretrainer only) </li>
  <li> <b>warm_start</b> (bool=False): allows fitting the same model twice, starting from the previously learned weights </li>

</ul>
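With `weights=1`, each training sample is drawn with probability proportional to the inverse frequency of its class, so both classes are seen roughly equally often. A NumPy sketch of those per-sample weights on toy labels; that pytorch-tabnet then hands weights like these to `torch.utils.data.WeightedRandomSampler` is our reading of its source and should be treated as an assumption.

```python
import numpy as np

y_train = np.array([0, 0, 0, 0, 0, 0, 1, 1])  # toy imbalanced labels

# Weight each class by the inverse of its count...
classes, counts = np.unique(y_train, return_counts=True)
class_weight = dict(zip(classes, 1.0 / counts))
# ...and give every sample the weight of its class
sample_weight = np.array([class_weight[c] for c in y_train])

# Each class now carries the same total sampling mass
for c in classes:
    print(c, sample_weight[y_train == c].sum())
```

If the training set has already been undersampled to balanced classes, these weights are all equal and `weights=1` changes little.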

In [None]:
METRICS = [utilities.recall_m]

In [None]:
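# assumes `features` (column names of X_train), `categorical_cols` and
# `categorical_dims` (column -> number of distinct values) are defined beforehand
cat_idxs = [i for i, f in enumerate(features) if f in categorical_cols]

cat_dims = [categorical_dims[f] for i, f in enumerate(features) if f in categorical_cols]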
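`cat_idxs` and `cat_dims` expect the categorical columns to be label-encoded as integers `0..n-1`, with `categorical_dims` mapping each column to its number of distinct values. A pandas sketch of how these could be built; the toy frame is hypothetical, and treating every non-numeric column as categorical is an assumption of this sketch, not a rule from the notebook.

```python
import pandas as pd

df_toy = pd.DataFrame({
    "num_feat": [0.5, 1.2, 3.4, 2.2],
    "cat_a": ["a", "b", "a", "b"],
    "cat_b": ["u", "v", "w", "u"],
})

# Treat non-numeric columns as categorical (assumption for this sketch)
categorical_cols = [c for c in df_toy.columns if not pd.api.types.is_numeric_dtype(df_toy[c])]
categorical_dims = {}
for col in categorical_cols:
    # Encode categories as integers 0..n-1 in order of appearance and record the level count
    codes, uniques = pd.factorize(df_toy[col])
    df_toy[col] = codes
    categorical_dims[col] = len(uniques)

features = df_toy.columns.to_list()
cat_idxs = [i for i, f in enumerate(features) if f in categorical_cols]
cat_dims = [categorical_dims[f] for f in features if f in categorical_cols]
print(cat_idxs, cat_dims)  # [1, 2] [2, 3]
```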

In [None]:
tabnet_params = {"cat_idxs": cat_idxs,  # list of categorical feature indices
                 "cat_dims": cat_dims,  # number of modalities (unique values) of each categorical feature
                 "cat_emb_dim": 1,  # embedding size for each categorical feature (int or list)
                 "optimizer_fn": torch.optim.Adam,  # PyTorch optimizer class
                 "optimizer_params": dict(lr=2e-2),  # parameters passed to optimizer_fn
                 "scheduler_fn": torch.optim.lr_scheduler.StepLR,  # learning rate scheduler class
                 "scheduler_params": {"step_size": 50,  # parameters passed to scheduler_fn
                                      "gamma": 0.9},
                 "mask_type": 'entmax',  # masking function for feature selection: 'sparsemax' or 'entmax'
                 }
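`StepLR` multiplies the learning rate by `gamma` every `step_size` scheduler steps. Assuming the scheduler is stepped once per epoch, the settings above give `lr(e) = 0.02 * 0.9 ** (e // 50)`. A small pure-Python check of that formula (it does not call PyTorch):

```python
def step_lr(epoch, base_lr=2e-2, step_size=50, gamma=0.9):
    """Learning rate after `epoch` epochs under a StepLR schedule stepped once per epoch."""
    return base_lr * gamma ** (epoch // step_size)


for epoch in (0, 49, 50, 100, 199):
    print(epoch, round(step_lr(epoch), 6))
```

With `gamma=0.9` the decay is gentle: after 200 epochs the rate is still about 73% of its starting value.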


In [9]:
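import numpy as np
from keras import backend as K
from pytorch_tabnet.metrics import Metric
from sklearn.metrics import recall_score

# custom TabNet metric: y_score holds class probabilities, so convert to labels first
class Recall(Metric):
    def __init__(self):
        self._name = "recall"
        self._maximize = True

    def __call__(self, y_true, y_score):
        y_pred = np.argmax(y_score, axis=1)
        return recall_score(y_true, y_pred)


# Keras-backend versions of recall, precision and F1
def recall_m(y_true, y_pred):
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    possible_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
    return true_positives / (possible_positives + K.epsilon())


def precision_m(y_true, y_pred):
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    predicted_positives = K.sum(K.round(K.clip(y_pred, 0, 1)))
    # epsilon avoids division by zero when nothing is predicted positive
    return true_positives / (predicted_positives + K.epsilon())


def f1_m(y_true, y_pred):
    precision = precision_m(y_true, y_pred)
    recall = recall_m(y_true, y_pred)
    return 2 * ((precision * recall) / (precision + recall + K.epsilon()))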

In [5]:
# build model
clf = TabNetClassifier(
    scheduler_fn=torch.optim.lr_scheduler.ReduceLROnPlateau,
    scheduler_params={"mode":'max', # max because default eval metric for binary is AUC
                 "factor":0.1,
                 "patience":1}
)  


Device used : cpu
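With `mode='max'`, `factor=0.1` and `patience=1`, `ReduceLROnPlateau` cuts the learning rate tenfold once the monitored metric has failed to improve for more than one consecutive epoch. The pure-Python sketch below is a simplification that ignores PyTorch's `threshold`, `cooldown` and `min_lr` options, so it only approximates `torch.optim.lr_scheduler.ReduceLROnPlateau`; the AUC values are made up for illustration.

```python
def simulate_plateau(metrics, lr=2e-2, factor=0.1, patience=1):
    """Simplified ReduceLROnPlateau(mode='max'): returns the lr after each epoch."""
    best = float("-inf")
    bad_epochs = 0
    lrs = []
    for value in metrics:
        if value > best:
            best, bad_epochs = value, 0
        else:
            bad_epochs += 1
        # Reduce once the metric has not improved for more than `patience` epochs
        if bad_epochs > patience:
            lr *= factor
            bad_epochs = 0
        lrs.append(lr)
    return lrs


val_auc = [0.733, 0.750, 0.752, 0.752, 0.751, 0.753, 0.753, 0.752]  # illustrative values
lrs = simulate_plateau(val_auc)
for epoch, (auc, lr) in enumerate(zip(val_auc, lrs)):
    print(epoch, auc, lr)
```

With `patience=1` the rate can fall by orders of magnitude within a few epochs of a plateau, after which the weights, and therefore the metrics, change very little.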


In [6]:
# fit model
clf.fit(
    X_train=X_train, y_train=y_train,
    eval_set=[(X_train, y_train), (X_val, y_val)],
    eval_name=['train', 'val'],
    eval_metric=["auc", 'accuracy'],
#     eval_metric=[Recall],

    max_epochs=100 , patience=0,
    batch_size=400,
    virtual_batch_size=128,
    num_workers=0,
    weights=1,
    drop_last=False
) 

No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 0.64617 | train_auc: 0.72594 | train_accuracy: 0.67684 | val_auc: 0.73309 | val_accuracy: 0.67462 |  0:00:12s
epoch 1  | loss: 0.60559 | train_auc: 0.7412  | train_accuracy: 0.68213 | val_auc: 0.74964 | val_accuracy: 0.67636 |  0:00:28s
epoch 2  | loss: 0.60208 | train_auc: 0.74534 | train_accuracy: 0.68298 | val_auc: 0.75159 | val_accuracy: 0.63589 |  0:00:48s
epoch 3  | loss: 0.59891 | train_auc: 0.74656 | train_accuracy: 0.6882  | val_auc: 0.75322 | val_accuracy: 0.65836 |  0:01:08s
epoch 4  | loss: 0.59823 | train_auc: 0.75073 | train_accuracy: 0.69108 | val_auc: 0.75803 | val_accuracy: 0.69878 |  0:01:28s
epoch 5  | loss: 0.59646 | train_auc: 0.75112 | train_accuracy: 0.69166 | val_auc: 0.75827 | val_accuracy: 0.69876 |  0:01:47s
epoch 6  | loss: 0.59507 | train_auc: 0.75191 | train_accuracy: 0.69246 | val_auc: 0.75905 | val_accuracy: 0.69793 |  0:02:09s
epoch 7  | loss: 0.59351 | train_auc: 

epoch 64 | loss: 0.59236 | train_auc: 0.75215 | train_accuracy: 0.69252 | val_auc: 0.75904 | val_accuracy: 0.69246 |  0:21:16s
epoch 65 | loss: 0.59321 | train_auc: 0.7522  | train_accuracy: 0.69244 | val_auc: 0.759   | val_accuracy: 0.69206 |  0:21:33s
epoch 66 | loss: 0.59657 | train_auc: 0.75217 | train_accuracy: 0.69243 | val_auc: 0.75886 | val_accuracy: 0.69263 |  0:21:49s
epoch 67 | loss: 0.59478 | train_auc: 0.75215 | train_accuracy: 0.69244 | val_auc: 0.75881 | val_accuracy: 0.69085 |  0:22:06s
epoch 68 | loss: 0.5937  | train_auc: 0.75222 | train_accuracy: 0.69257 | val_auc: 0.75902 | val_accuracy: 0.69032 |  0:22:21s
epoch 69 | loss: 0.59324 | train_auc: 0.75214 | train_accuracy: 0.69249 | val_auc: 0.7589  | val_accuracy: 0.69094 |  0:22:39s
epoch 70 | loss: 0.59318 | train_auc: 0.75215 | train_accuracy: 0.69232 | val_auc: 0.75895 | val_accuracy: 0.69195 |  0:22:57s
epoch 71 | loss: 0.59173 | train_auc: 0.75217 | train_accuracy: 0.69252 | val_auc: 0.759   | val_accuracy: 0.69

In [16]:
preds = clf.predict(X_test)
test_recall = recall_score( y_true=y_test, y_pred=preds)
test_recall

0.6888554863646124

In [14]:
preds

array([0, 0, 0, ..., 0, 0, 1])

In [15]:
y_test

array([1, 0, 0, ..., 1, 0, 0])

### Save & Load TabNet Classifier

In [17]:
# save tabnet model
saving_path_name = "saved_models/binary_tabnet_model_1"
saved_filepath = clf.save_model(saving_path_name)

# define new model with basic parameters and load state dict weights
# loaded_clf = TabNetClassifier()
# loaded_clf.load_model(saved_filepath)

Successfully saved model at saved_models/binary_tabnet_model_1.zip
