## TabNet and Transformer-Based Models for Tabular Data - Part 06

+ TabNet and Transformer-based models are neural architectures designed to handle tabular data more effectively than standard fully connected networks.
+ They use attention mechanisms to weigh input features and capture complex feature interactions, which can improve performance on some tabular tasks, although results vary by dataset.

**1. TabNet: An Advanced Model for Tabular Data**

+ TabNet is a deep learning model specifically designed for tabular data.
+ Developed by researchers at Google Cloud AI, it uses sequential attention to select the most relevant features at each decision step, which provides interpretability while maintaining high accuracy. It is trained with supervised learning and can optionally be pretrained in a self-supervised way on unlabeled rows.

**Key Features of TabNet:**

+ `Attention Mechanism`: TabNet uses a sparse attention mechanism (based on sparsemax) that lets the model choose which features to focus on at each decision step, so different samples can rely on different features.
+ `Interpretable Model`: The attention masks show which features the model focused on when making a prediction, both per sample and aggregated over the dataset.
+ `Efficient Training`: Because each step attends to only a few features, the model spends its capacity on the most relevant inputs, which keeps learning efficient on tabular data.
+ `Combines Supervised and Self-Supervised Learning`: It supports supervised training as well as self-supervised pretraining on unlabeled data, making it flexible for various tasks.
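
To make the interpretability point concrete, here is a minimal sketch of how per-step feature masks could be combined into one importance score per feature. The mask values, the step weights, and the helper name `aggregate_masks` are hypothetical and chosen only for illustration; this is not the `pytorch_tabnet` implementation.

```python
def aggregate_masks(step_masks, step_weights):
    """Combine per-step masks into one normalized importance vector.

    step_masks: one mask per decision step; each mask is a list of
        non-negative feature weights that sums to 1.
    step_weights: one non-negative weight per step, standing in for how
        much that step contributed to the final decision (an assumption).
    """
    n_features = len(step_masks[0])
    totals = [0.0] * n_features
    for mask, weight in zip(step_masks, step_weights):
        for j, m in enumerate(mask):
            totals[j] += weight * m
    norm = sum(totals)
    # Normalize so the importances sum to 1 (if anything was selected at all).
    return [t / norm for t in totals] if norm > 0 else totals


# Hypothetical masks for one sample with 4 features and 3 decision steps.
masks = [
    [0.7, 0.3, 0.0, 0.0],
    [0.0, 0.5, 0.5, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
weights = [2.0, 1.0, 1.0]

importance = aggregate_masks(masks, weights)
print([round(v, 3) for v in importance])
```

The last feature is never selected, so its importance is exactly zero. With a fitted `TabNetClassifier`, the comparable information is available through the `feature_importances_` attribute and the `explain()` method.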

**How TabNet Works:**

+ `Feature Selection`: At each decision step, TabNet selects a subset of the input features using a learnable mask, which helps in focusing on the most important features for the prediction.
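+ `Decision Steps`: The model is composed of multiple decision steps. Each step's mask depends on how much every feature was already used in earlier steps, and the outputs of all steps are combined into the final prediction, allowing the model to capture complex patterns in the data.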
+ `Sparse Feature Masking`: Only a fraction of features is selected at each decision step, promoting interpretability and efficiency.
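
The three ideas above can be sketched in a few lines of plain Python. This is a simplified illustration, not the library code: in the real model the per-feature scores come from a learned network and change at every step, whereas here the `scores` values are made up and reused. The helper names `sparsemax` and `run_decision_steps` are chosen for this example; `gamma` plays the role of TabNet's relaxation parameter, which controls how freely a feature can be reused across steps.

```python
def sparsemax(z):
    """Map scores to weights that sum to 1, with low scores set exactly to 0."""
    z_sorted = sorted(z, reverse=True)
    cumulative = 0.0
    tau = 0.0
    for k, value in enumerate(z_sorted, start=1):
        cumulative += value
        # The k largest scores stay in the support while this holds.
        if 1 + k * value > cumulative:
            tau = (cumulative - 1) / k
    return [max(v - tau, 0.0) for v in z]


def run_decision_steps(step_scores, gamma=1.0):
    """Return one sparse mask per decision step.

    The prior starts at 1 for every feature and shrinks for features that
    earlier steps already used, which pushes later steps toward new features.
    """
    prior = [1.0] * len(step_scores[0])
    masks = []
    for scores in step_scores:
        mask = sparsemax([p * s for p, s in zip(prior, scores)])
        masks.append(mask)
        prior = [p * (gamma - m) for p, m in zip(prior, mask)]
    return masks


# Hypothetical scores for 4 features, identical at both steps.
scores = [2.0, 1.5, 0.2, 0.1]
masks = run_decision_steps([scores, scores], gamma=1.0)

print(masks[0])  # [0.75, 0.25, 0.0, 0.0]
print(masks[1])  # [0.1875, 0.8125, 0.0, 0.0]
```

Even though the raw scores are identical at both steps, the second mask shifts weight from the first feature to the second, because the first feature was already used heavily. The two weakest features are masked out entirely at both steps.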

In [5]:
## import required libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from pytorch_tabnet.tab_model import TabNetClassifier
from sklearn.metrics import classification_report

In [6]:
## load and process the data
# Load your dataset
data = pd.read_csv('no_missing_values_customer_data.csv')

# Convert the target variable 'Churn' to numeric
data['Churn'] = data['Churn'].map({'Yes': 1, 'No': 0})

# Encode categorical variables using Label Encoding
for col in data.select_dtypes(include=['object']).columns:
    if col != 'customerID':
        data[col] = LabelEncoder().fit_transform(data[col])

# Separate features and target
features = data.drop(['Churn', 'customerID'], axis=1).values
target = data['Churn'].values

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.2, random_state=42)

In [9]:
## model development
# Initialize TabNetClassifier
tabnet_clf = TabNetClassifier()

# Train TabNet model
tabnet_clf.fit(
    X_train, y_train,
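    # Caution: the test set doubles as the early-stopping set here, so the reported scores are optimistic;
    # a separate validation split would give a cleaner estimate.
    eval_set=[(X_test, y_test)],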
    eval_metric=['accuracy'],
    max_epochs=100,
    patience=10,
    batch_size=256,
    virtual_batch_size=128
)

# Make predictions
y_pred = tabnet_clf.predict(X_test)
print("Classification Report for TabNet:\n", classification_report(y_test, y_pred))



epoch 0  | loss: 0.55888 | val_0_accuracy: 0.72534 |  0:00:02s
epoch 1  | loss: 0.46961 | val_0_accuracy: 0.74592 |  0:00:04s
epoch 2  | loss: 0.45274 | val_0_accuracy: 0.78353 |  0:00:06s
epoch 3  | loss: 0.45175 | val_0_accuracy: 0.79063 |  0:00:07s
epoch 4  | loss: 0.43802 | val_0_accuracy: 0.78282 |  0:00:09s
epoch 5  | loss: 0.43967 | val_0_accuracy: 0.79063 |  0:00:11s
epoch 6  | loss: 0.43142 | val_0_accuracy: 0.78992 |  0:00:13s
epoch 7  | loss: 0.43401 | val_0_accuracy: 0.79702 |  0:00:15s
epoch 8  | loss: 0.43634 | val_0_accuracy: 0.79844 |  0:00:17s
epoch 9  | loss: 0.42908 | val_0_accuracy: 0.79915 |  0:00:19s
epoch 10 | loss: 0.4316  | val_0_accuracy: 0.79276 |  0:00:21s
epoch 11 | loss: 0.43058 | val_0_accuracy: 0.79418 |  0:00:22s
epoch 12 | loss: 0.42625 | val_0_accuracy: 0.78992 |  0:00:24s
epoch 13 | loss: 0.42809 | val_0_accuracy: 0.79844 |  0:00:26s
epoch 14 | loss: 0.42762 | val_0_accuracy: 0.79347 |  0:00:28s
epoch 15 | loss: 0.42729 | val_0_accuracy: 0.80412 |  0



Classification Report for TabNet:
               precision    recall  f1-score   support

           0       0.85      0.91      0.88      1036
           1       0.68      0.55      0.60       373

    accuracy                           0.81      1409
   macro avg       0.76      0.73      0.74      1409
weighted avg       0.80      0.81      0.80      1409
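
To put the 0.81 accuracy in context, it helps to compare it with a trivial model that always predicts the majority class (class 0, i.e. `Churn` = No). The short check below uses only numbers copied from the report above; the variable names are chosen for this example.

```python
# Support counts copied from the classification report above.
support = {0: 1036, 1: 373}
total = sum(support.values())

# Accuracy of a trivial model that always predicts the most frequent class.
majority_baseline = max(support.values()) / total

# Recall for class 1 (0.55) times its support approximates how many churners were caught.
churners_found = round(0.55 * support[1])

print(f"majority-class baseline accuracy: {majority_baseline:.3f}")
print(f"approx. churners correctly flagged: {churners_found} of {support[1]}")
```

The baseline comes out at roughly 0.735, so the model's 0.81 accuracy is a moderate gain, and the 0.55 recall on class 1 means that a little under half of the actual churners are still missed.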

