# TTML Basic Usage Example

This notebook demonstrates the basic usage of the Tabular Transformer (TTML) model using the Titanic dataset. We'll cover:

1. Loading and preprocessing data
2. Configuring and initializing the model
3. Training the model
4. Making predictions
5. Basic evaluation

In [1]:
import sys
import os
import numpy as np
import pandas as pd
import torch
from sklearn.metrics import accuracy_score, classification_report

# Import TTML modules
from tabular_transformer.models import TabularTransformer
from tabular_transformer.models.task_heads import ClassificationHead
from tabular_transformer.training import Trainer
from tabular_transformer.inference import predict
from tabular_transformer.utils.config import TransformerConfig
from tabular_transformer.data.dataset import TabularDataset

# Import data utilities
from data_utils import download_titanic_dataset, prepare_dataset

## 1. Load and Preprocess Data

First, we'll download the Titanic dataset and prepare it for training.

In [2]:
# Download Titanic dataset
df = download_titanic_dataset(save_csv=False)
print("Dataset shape:", df.shape)
print("\nFeature types:")
print(df.dtypes)

Dataset shape: (1309, 14)

Feature types:
pclass          int64
survived     category
name           object
sex          category
age           float64
sibsp           int64
parch           int64
ticket         object
fare          float64
cabin          object
embarked     category
boat           object
body          float64
home.dest      object
dtype: object
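
The listing mixes `object` columns (such as `name`) with `category` columns (such as `sex` and `embarked`). pandas treats these as distinct dtypes when filtering with `select_dtypes`. The sketch below illustrates this on a made-up three-column frame, not the Titanic data; the column values are placeholders.

```python
import pandas as pd

# Toy frame mirroring three of the dtypes printed above (values are made up).
toy = pd.DataFrame({
    "fare": pd.Series([7.25, 71.28], dtype="float64"),
    "name": pd.Series(["A", "B"], dtype="object"),
    "sex": pd.Series(["male", "female"], dtype="category"),
})

# Filtering on 'object' alone skips category-typed columns.
object_only = toy.select_dtypes(include=["object"]).columns.tolist()

# Listing both dtypes picks up the category column as well.
object_and_category = toy.select_dtypes(include=["object", "category"]).columns.tolist()

print(object_only)          # ['name']
print(object_and_category)  # ['name', 'sex']
```

Whether category columns should be passed to the model as categorical features is a modelling choice; the sketch only shows how the dtype filter behaves.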


In [3]:
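# Identify numeric and categorical columns by dtype.
# Note: 'category' columns (sex, embarked, survived) match neither filter,
# so sex and embarked are not included in the feature lists.
numeric_features = df.select_dtypes(include=['int64', 'float64']).columns.tolist()
categorical_features = df.select_dtypes(include=['object']).columns.tolist()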

# Remove target column from features
target_column = 'survived'
if target_column in numeric_features:
    numeric_features.remove(target_column)
if target_column in categorical_features:
    categorical_features.remove(target_column)

# Create train/test datasets
train_dataset, test_dataset, _ = TabularDataset.from_dataframe(
    dataframe=df,
    numeric_columns=numeric_features,
    categorical_columns=categorical_features,
    target_columns={'main': [target_column]},
    validation_split=0.2,
    random_state=42
)

2025-03-17 13:16:50,694 - tabular_transformer.FeaturePreprocessor - INFO - Fitted numeric scaler to 6 columns
2025-03-17 13:16:50,698 - tabular_transformer.FeaturePreprocessor - INFO - Column name: 1307 categories, embedding dim 50
2025-03-17 13:16:50,701 - tabular_transformer.FeaturePreprocessor - INFO - Column ticket: 929 categories, embedding dim 50
2025-03-17 13:16:50,704 - tabular_transformer.FeaturePreprocessor - INFO - Column cabin: 187 categories, embedding dim 50
2025-03-17 13:16:50,706 - tabular_transformer.FeaturePreprocessor - INFO - Column boat: 28 categories, embedding dim 14
2025-03-17 13:16:50,711 - tabular_transformer.FeaturePreprocessor - INFO - Column home.dest: 370 categories, embedding dim 50
2025-03-17 13:16:50,741 - tabular_transformer.TabularDataset - INFO - Created dataset with 1048 samples, 6 numeric features, 5 categorical features, 1 tasks
2025-03-17 13:16:50,756 - tabular_transformer.TabularDataset - INFO - Created dataset with 261 samples, 6 numeric featur
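
The log reports an embedding dimension for each categorical column: 50 for the four high-cardinality columns and 14 for `boat` with its 28 categories. The preprocessor's actual formula is not shown here. As a hedged guess, the function below (a hypothetical helper, not part of TTML) uses "half the cardinality, rounded up, capped at 50", which happens to reproduce all five logged values; other rules could fit equally well.

```python
def guess_embedding_dim(num_categories: int, cap: int = 50) -> int:
    """Hypothetical heuristic: half the cardinality (rounded up), capped at `cap`."""
    return min(cap, (num_categories + 1) // 2)


# (number of categories, logged embedding dim) copied from the log output above.
logged = {
    "name": (1307, 50),
    "ticket": (929, 50),
    "cabin": (187, 50),
    "boat": (28, 14),
    "home.dest": (370, 50),
}

for column, (num_categories, logged_dim) in logged.items():
    guess = guess_embedding_dim(num_categories)
    print(f"{column}: guessed {guess}, logged {logged_dim}")
```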

## 2. Configure and Initialize Model

Now we'll set up the TTML model with a classification head for the survival prediction task.

In [4]:
# Get feature dimensions from preprocessor
feature_dims = train_dataset.preprocessor.get_feature_dimensions()
numeric_dim = feature_dims['numeric_dim']
categorical_dims = feature_dims['categorical_dims']
categorical_embedding_dims = feature_dims['categorical_embedding_dims']

# Model configuration
config = TransformerConfig(
    embed_dim=64,
    num_heads=4,
    num_layers=2,
    dropout=0.1,
    variational=False
)

# Initialize transformer encoder
encoder = TabularTransformer(
    numeric_dim=numeric_dim,
    categorical_dims=categorical_dims,
    categorical_embedding_dims=categorical_embedding_dims,
    config=config
)

# Initialize classification head
task_head = ClassificationHead(
    name="main",  # Task name should match the key in target_columns
    input_dim=config.embed_dim,  # Keeps the head in sync with the encoder's embedding size (64 here)
    num_classes=2  # Binary classification for survival
)
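
In standard multi-head attention, the embedding is split evenly across heads, so `embed_dim` needs to be divisible by `num_heads`. The configuration above (64 and 4) satisfies this. Whether `TransformerConfig` validates it is not shown in this notebook, so the check below is a standalone sketch with a hypothetical helper name.

```python
def per_head_dim(embed_dim: int, num_heads: int) -> int:
    """Return the width of each attention head, or raise if the split is uneven."""
    if embed_dim % num_heads != 0:
        raise ValueError(
            f"embed_dim={embed_dim} is not divisible by num_heads={num_heads}"
        )
    return embed_dim // num_heads


print(per_head_dim(64, 4))  # 16
```

Running a check like this before building the encoder can surface a bad combination early, for example when experimenting with `num_heads`.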

## 3. Train the Model

We'll use the TTML Trainer to train our model.

In [None]:
# Create data loaders
train_loader = train_dataset.create_dataloader(batch_size=32, shuffle=True)
test_loader = test_dataset.create_dataloader(batch_size=32, shuffle=False)

# Initialize trainer
trainer = Trainer(
    encoder=encoder,
    task_head={'main': task_head},  # Map task head to task name
    optimizer=None,  # Will be created by trainer
    device=None  # Will use CUDA if available
)

# Train the model
history = trainer.train(
    train_loader=train_loader,
    val_loader=test_loader,  # The held-out split doubles as validation data, so final metrics are optimistic
    num_epochs=10,
    early_stopping_patience=3
)
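
The `early_stopping_patience=3` argument suggests that training halts once the validation metric stops improving for three consecutive epochs. The exact criterion used by the TTML `Trainer` is not shown here, so the following is a generic sketch of patience-based early stopping on validation loss, with made-up loss values and a hypothetical function name.

```python
def epochs_run(val_losses, patience):
    """Return how many epochs run before patience-based early stopping triggers."""
    best = float("inf")
    stale = 0  # consecutive epochs without a new best loss
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return len(val_losses)


# Made-up validation losses for 10 epochs: the best so far is reached at
# epoch 3, followed by three epochs without improvement.
losses = [0.62, 0.55, 0.51, 0.52, 0.53, 0.54, 0.50, 0.49, 0.48, 0.47]
print(epochs_run(losses, patience=3))  # 6
```

In this toy run, training stops at epoch 6 and never reaches the later, lower losses, which is the usual trade-off when choosing a small patience value.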

## 4. Make Predictions

Let's use our trained model to make predictions on the test set.

In [None]:
# Make predictions
predictions = trainer.predict(test_loader)

# Get predictions for the main task
y_pred = predictions['main']['predicted_class'].numpy()
y_test = test_dataset.targets['main']
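
A classification head typically produces one score (logit) per class, and the predicted class is the index of the largest score. The internals of `ClassificationHead` and the exact contents of `predictions['main']` are not shown in this notebook, so the sketch below is a generic illustration with made-up logits and hypothetical function names.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predicted_class(logits):
    """Return the index of the most probable class."""
    probs = softmax(logits)
    return max(range(len(probs)), key=probs.__getitem__)


# Made-up logits for three passengers: [score for class 0, score for class 1].
batch_logits = [[2.0, 0.5], [-1.0, 1.0], [0.1, 0.3]]
print([predicted_class(row) for row in batch_logits])  # [0, 1, 1]
```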

## 5. Evaluate Results

Finally, let's evaluate our model's performance.

In [None]:
# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.4f}")

# Display detailed classification report
print("\nClassification Report:")
print(classification_report(y_test, y_pred))
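
To make the reported numbers easier to interpret, the sketch below computes accuracy, precision, and recall by hand for the positive class. The labels are made up for illustration and are not results from this model; `binary_metrics` is a hypothetical helper, not a scikit-learn function.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Made-up labels: 1 = survived, 0 = did not survive.
toy_true = [0, 0, 1, 1, 1, 0, 1, 0]
toy_pred = [0, 1, 1, 0, 1, 0, 0, 0]

accuracy, precision, recall = binary_metrics(toy_true, toy_pred)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
# accuracy=0.625 precision=0.667 recall=0.500
```

Accuracy alone can hide an imbalance between the two kinds of error, which is why the per-class precision and recall in the classification report are worth reading alongside it.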

## Conclusion

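This notebook demonstrated the basic usage of the TTML model for a binary classification task using the Titanic dataset: preparing the data, configuring the encoder and classification head, training with early stopping, and evaluating predictions. Run the training and evaluation cells to obtain the accuracy and per-class metrics for your environment.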

For more advanced usage and different tasks, check out the other example notebooks:
- classification_examples.ipynb
- regression_examples.ipynb
- clustering_examples.ipynb
- survival_analysis.ipynb
- multi_task_examples.ipynb