# Titanic Survival Prediction: FT-Transformer with PyTorch Lightning & Optuna

This notebook contains an end-to-end workflow for the Kaggle Titanic competition. The main steps are:
1.  **Preprocessing**: Clean the data and engineer new features.
2.  **Data Module**: Set up a PyTorch Lightning `DataModule` to handle datasets and dataloaders.
3.  **Model Definition**: Define the FT-Transformer architecture using PyTorch Lightning.
4.  **Hyperparameter Tuning**: Use Optuna to find the best hyperparameters for the FT-Transformer and for two baselines (LightGBM and logistic regression).
5.  **Final Training & Prediction**: Train each tuned model on all training data and generate the submission files, including a majority-vote ensemble.

## Imports

In [27]:
import pandas as pd
import numpy as np
import torch
from torch import nn
from torch.utils.data import Dataset, DataLoader
import pytorch_lightning as pl
import torchmetrics
import optuna
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.linear_model import LogisticRegression
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.metrics import accuracy_score
import lightgbm as lgb
from pytorch_lightning.callbacks import EarlyStopping
import re
import warnings
import logging
import json
import os

# Suppress unnecessary warnings for cleaner output
warnings.filterwarnings("ignore", category=UserWarning)
warnings.filterwarnings("ignore", category=FutureWarning)
logging.getLogger("pytorch_lightning").setLevel(logging.ERROR)
optuna.logging.set_verbosity(optuna.logging.WARNING)

## 1. Preprocessing

We define a single function to handle all data preprocessing steps consistently.

In [None]:
def preprocess_data(train_path='data/train.csv', test_path='data/test.csv'):
    """
    Loads, cleans, and feature-engineers the Titanic dataset.

    This function handles:
    - Loading train and test CSVs.
    - Engineering features like Title, FamilySize, IsAlone, and Deck.
    - Imputing missing values for Age, Fare, and Embarked.
    - Returning separate, processed dataframes for training and prediction,
      along with metadata useful for modeling.

    Args:
        train_path (str): The file path for the training data.
        test_path (str): The file path for the test data.

    Returns:
        tuple: A tuple containing:
            - X_full (pd.DataFrame): The full processed training feature set.
            - y_full (pd.Series): The full training target variable.
            - X_predict (pd.DataFrame): The full processed test feature set.
            - test_passenger_ids (pd.Series): The PassengerIds for the test set.
            - numerical_features (list): A list of numerical column names.
            - categorical_features (list): A list of categorical column names.
    """
    print("--- Loading and Preprocessing Data ---")
    try:
        train_df_orig = pd.read_csv(train_path)
        test_df_orig = pd.read_csv(test_path)
    except FileNotFoundError as e:
        print(f"Error: Could not find {e.filename}. Please ensure train.csv and test.csv are in the correct directory.")
        raise

    test_passenger_ids = test_df_orig['PassengerId']
    y_full = train_df_orig['Survived'].copy()
    
    combined_df = pd.concat([train_df_orig.drop('Survived', axis=1), test_df_orig], ignore_index=True)

    # Feature Engineering
    def engineer_features(df):
        df['Title'] = df['Name'].str.extract(r' ([A-Za-z]+)\.', expand=False)
        common_titles = ['Mr', 'Miss', 'Mrs', 'Master']
        df['Title'] = df['Title'].apply(lambda x: x if x in common_titles else 'Other')
        df['FamilySize'] = df['SibSp'] + df['Parch'] + 1
        df['IsAlone'] = (df['FamilySize'] == 1).astype(int)
        df['Deck'] = df['Cabin'].apply(lambda s: s[0] if pd.notnull(s) else 'U')
        return df

    combined_df = engineer_features(combined_df)

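    # Imputation
    # Assign the result back instead of using a chained inplace fillna, which newer pandas versions may not apply
    combined_df['Embarked'] = combined_df['Embarked'].fillna(combined_df['Embarked'].mode()[0])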

    # First, calculate the grouped medians. This might have NaNs for groups where all ages are missing.
    age_median_map = combined_df.groupby(['Pclass', 'Title'])['Age'].median()
    # Then, calculate a global median to use as a fallback.
    global_age_median = combined_df['Age'].median()
    # Fill any NaNs in the median map itself with the global median.
    age_median_map.fillna(global_age_median, inplace=True)
    # Now, we can safely apply this robust map to fill NaNs in the main dataframe.
    combined_df['Age'] = combined_df.apply(
        lambda row: age_median_map.loc[row['Pclass'], row['Title']] if pd.isnull(row['Age']) else row['Age'],
        axis=1
    )

    # Robust Fare Imputation
    combined_df['Fare'] = combined_df.groupby('Pclass')['Fare'].transform(lambda x: x.fillna(x.median()))

    # Final Feature Selection 
    numerical_features = ['Age', 'Fare', 'FamilySize', 'IsAlone']
    categorical_features = ['Pclass', 'Sex', 'Embarked', 'Title', 'Deck']
    
    combined_df_processed = combined_df[numerical_features + categorical_features]

    # Final check for any remaining NaNs
    if combined_df_processed.isnull().sum().sum() > 0:
        print("Error: NaNs still present after preprocessing!")
        print(combined_df_processed.isnull().sum())
        raise ValueError("Preprocessing failed, NaNs remain in the data.")
    
    # Split back into final training and prediction sets 
    X_full = combined_df_processed.iloc[:len(train_df_orig)]
    X_predict = combined_df_processed.iloc[len(train_df_orig):]

    print("Preprocessing complete.")
    
    return X_full, y_full, X_predict, test_passenger_ids, numerical_features, categorical_features
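
The title extraction inside `engineer_features` can be checked in isolation. The sketch below applies the same regular expression and the same `common_titles` list to a handful of sample names in the dataset's `Surname, Title. Given names` format. The sample names are illustrative input only, and `Series.where` is used here in place of the `apply`/`lambda` from the function above.

```python
import pandas as pd

# A few names in the "Surname, Title. Given names" format used by the dataset
names = pd.Series([
    "Braund, Mr. Owen Harris",
    "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
    "Heikkinen, Miss. Laina",
    "Palsson, Master. Gosta Leonard",
    "Uruchurtu, Don. Manuel E",
])

# First run of letters that follows a space and ends with a period
titles = names.str.extract(r" ([A-Za-z]+)\.", expand=False)

# Collapse everything outside the four common titles into "Other"
common_titles = ["Mr", "Miss", "Mrs", "Master"]
titles = titles.where(titles.isin(common_titles), "Other")

print(titles.tolist())  # ['Mr', 'Mrs', 'Miss', 'Master', 'Other']
```

Grouping rare titles into `Other` keeps the cardinality of the `Title` feature small, which matters later when each category gets its own embedding row.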

In [19]:
print("Running preprocessing script as a standalone test...")
X_train, y_train, X_test, _, _, _ = preprocess_data()
print("\nShape of processed training features (X_full):", X_train.shape)
print("Shape of training labels (y_full):", y_train.shape)
print("Shape of processed test features (X_predict):", X_test.shape)
print("\nFirst 5 rows of processed training data:")
print(X_train.head())

Running preprocessing script as a standalone test...
--- Loading and Preprocessing Data ---
Preprocessing complete.

Shape of processed training features (X_full): (891, 9)
Shape of training labels (y_full): (891,)
Shape of processed test features (X_predict): (418, 9)

First 5 rows of processed training data:
    Age     Fare  FamilySize  IsAlone  Pclass     Sex Embarked Title Deck
0  22.0   7.2500           2        0       3    male        S    Mr    U
1  38.0  71.2833           2        0       1  female        C   Mrs    C
2  26.0   7.9250           1        1       3  female        S  Miss    U
3  35.0  53.1000           2        0       1  female        S   Mrs    C
4  35.0   8.0500           1        1       3    male        S    Mr    U
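
The age imputation above fills missing values with the median of the passenger's (`Pclass`, `Title`) group and falls back to the global median when a whole group has no known ages. A vectorized sketch of the same idea, on a small made-up frame, could look like the following. It uses `groupby(...).transform` instead of the row-wise `apply` in `preprocess_data`, which is an alternative formulation and not the notebook's own code.

```python
import numpy as np
import pandas as pd

# Toy data: the (2, "Other") group has no known ages at all
df = pd.DataFrame({
    "Pclass": [1, 1, 1, 3, 3, 2],
    "Title":  ["Mr", "Mr", "Mr", "Miss", "Miss", "Other"],
    "Age":    [40.0, 50.0, np.nan, 20.0, np.nan, np.nan],
})

# Global fallback, computed before any filling
global_median = df["Age"].median()

# Per-row group median; NaN where the whole group is missing
group_median = df.groupby(["Pclass", "Title"])["Age"].transform("median")

# Fill with the group median first, then with the global median
df["Age"] = df["Age"].fillna(group_median).fillna(global_median)

print(df["Age"].tolist())  # [40.0, 50.0, 45.0, 20.0, 20.0, 40.0]
```

The last row shows the fallback at work: its group has no ages, so it receives the global median of the known values (40, 50, 20).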


## 3. PyTorch Lightning DataModule

This module encapsulates the data handling logic: splitting the preprocessed data, scaling it, and creating the `DataLoader`s for each stage.

In [None]:
class TitanicDataset(Dataset):
    def __init__(self, X, y=None):
        # No trailing commas here: they would turn both attributes into one-element tuples
        self.X = torch.tensor(X.values, dtype=torch.float32)
        self.y = torch.tensor(y.values, dtype=torch.float32).unsqueeze(1) if y is not None else None

    def __len__(self):
        return len(self.X)

    def __getitem__(self, idx):
        if self.y is not None:
            return self.X[idx], self.y[idx]
        else:
            return self.X[idx]

class TitanicDataModule(pl.LightningDataModule):
    def __init__(self, X_full, y_full, X_predict, batch_size=64, num_workers=0):
        super().__init__()
        self.save_hyperparameters(ignore=['X_full', 'y_full', 'X_predict'])
        self.X_full = X_full
        self.y_full = y_full
        self.X_predict = X_predict
        self.scaler = StandardScaler()

    def setup(self, stage=None):
        # Split first so that the scaler never sees validation or test rows.
        # StandardScaler needs numeric input, so X_full and X_predict are assumed
        # to be fully numeric here (categorical columns already encoded).
        X_train, X_temp, y_train, y_temp = train_test_split(self.X_full, self.y_full, test_size=0.3, random_state=42, stratify=self.y_full)
        X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=42, stratify=y_temp)

        # Fit the scaler on the training split only, then apply it to every other split
        X_train = self.scaler.fit_transform(X_train)
        X_val = self.scaler.transform(X_val)
        X_test = self.scaler.transform(X_test)
        self.X_predict_scaled = self.scaler.transform(self.X_predict)

        self.train_ds = TitanicDataset(pd.DataFrame(X_train), y_train)
        self.val_ds = TitanicDataset(pd.DataFrame(X_val), y_val)
        self.test_ds = TitanicDataset(pd.DataFrame(X_test), y_test)
        self.predict_ds = TitanicDataset(pd.DataFrame(self.X_predict_scaled))

    def train_dataloader(self):
        return DataLoader(self.train_ds, batch_size=self.hparams.batch_size, shuffle=True, num_workers=self.hparams.num_workers)

    def val_dataloader(self):
        return DataLoader(self.val_ds, batch_size=self.hparams.batch_size, num_workers=self.hparams.num_workers)

    def test_dataloader(self):
        return DataLoader(self.test_ds, batch_size=self.hparams.batch_size, num_workers=self.hparams.num_workers)
    
    def predict_dataloader(self):
        return DataLoader(self.predict_ds, batch_size=self.hparams.batch_size, num_workers=self.hparams.num_workers)
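
Later cells construct a `TitanicFTDataset(numerical, categorical, y)` and the model unpacks each batch as `(x_num, x_cat, y)`, or `(x_num, x_cat)` at prediction time, but the class itself does not appear in this notebook. The following is a hypothetical reconstruction inferred from those call sites, not the author's original definition. It mirrors `TitanicDataset` but keeps numerical and categorical columns in separate tensors.

```python
import pandas as pd
import torch
from torch.utils.data import Dataset


class TitanicFTDataset(Dataset):
    """Hypothetical reconstruction: yields (x_num, x_cat[, y]) per row."""

    def __init__(self, X_num, X_cat, y=None):
        # Scaled numerical features as floats
        self.X_num = torch.tensor(X_num.values, dtype=torch.float32)
        # Integer category codes, as expected by nn.Embedding
        self.X_cat = torch.tensor(X_cat.values, dtype=torch.long)
        # Targets shaped (n, 1) to match the single-logit model output
        self.y = (
            torch.tensor(y.values, dtype=torch.float32).unsqueeze(1)
            if y is not None else None
        )

    def __len__(self):
        return len(self.X_num)

    def __getitem__(self, idx):
        if self.y is not None:
            return self.X_num[idx], self.X_cat[idx], self.y[idx]
        return self.X_num[idx], self.X_cat[idx]


# Tiny usage example with made-up values
X_num = pd.DataFrame({"Age": [0.5, -1.2], "Fare": [0.1, 2.0]})
X_cat = pd.DataFrame({"Sex": [0, 1], "Deck": [3, 7]})
y = pd.Series([0, 1])

ds = TitanicFTDataset(X_num, X_cat, y)
x_num, x_cat, target = ds[0]
print(x_num.shape, x_cat.dtype, target.shape)
```

Storing the target with shape `(n, 1)` matches what `nn.BCEWithLogitsLoss` expects when the model emits one logit per row.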

## 4. PyTorch Lightning Model (FT Transformer)

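Here we define our FT-Transformer model from four building blocks: a feature tokenizer, a Transformer block, an encoder that stacks those blocks, and a prediction head. The top-level class is a `LightningModule`, which organizes the model architecture, training logic, and optimizer configuration.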

In [None]:
class FeatureTokenizer(nn.Module):
    """
    Tokenizes numerical and categorical features into a sequence of embeddings.
    """
    def __init__(self, num_numerical_features, cat_cardinalities, embed_dim):
        super().__init__()
        self.num_linears = nn.ModuleList([nn.Linear(1, embed_dim) for _ in range(num_numerical_features)])
        self.cat_embeddings = nn.ModuleList([nn.Embedding(num_classes, embed_dim) for num_classes in cat_cardinalities])
        self.cls_token = nn.Parameter(torch.randn(1, 1, embed_dim))

    def forward(self, x_num, x_cat):
        batch_size = x_num.shape[0]
        
        # Tokenize numerical features
        num_token_list = []
        x_num_unsqueezed = x_num.unsqueeze(-1)
        for i, linear_layer in enumerate(self.num_linears):
            num_token_list.append(linear_layer(x_num_unsqueezed[:, i, :]))
        
        # Tokenize categorical features
        cat_token_list = []
        x_cat_long = x_cat.long()
        for i, embedding_layer in enumerate(self.cat_embeddings):
            cat_token_list.append(embedding_layer(x_cat_long[:, i]))
        
        num_tokens = torch.stack(num_token_list, dim=1)
        cat_tokens = torch.stack(cat_token_list, dim=1)
        
        # Concatenate all feature tokens
        feature_tokens = torch.cat([num_tokens, cat_tokens], dim=1)
        
        # Prepend the [CLS] token
        cls_tokens = self.cls_token.expand(batch_size, -1, -1)
        tokens = torch.cat([cls_tokens, feature_tokens], dim=1)
        
        return tokens

class TransformerBlock(nn.Module):
    """
    A standard Transformer block with Multi-Head Attention and a Feed-Forward Network.
    """
    def __init__(self, embed_dim, num_heads, ff_dim, dropout=0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, dropout=dropout, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(embed_dim, ff_dim),
            nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(ff_dim, embed_dim)
        )
        self.norm1 = nn.LayerNorm(embed_dim)
        self.norm2 = nn.LayerNorm(embed_dim)
        self.dropout = nn.Dropout(dropout)
        
    def forward(self, x):
        attn_out, _ = self.attn(x, x, x)
        x = self.norm1(x + self.dropout(attn_out))
        ffn_out = self.ffn(x)
        x = self.norm2(x + self.dropout(ffn_out))
        return x

class TransformerEncoder(nn.Module):
    """
    A stack of TransformerBlocks.
    """
    def __init__(self, embed_dim, num_heads, ff_dim, num_layers, dropout=0.1):
        super().__init__()
        self.layers = nn.ModuleList([
            TransformerBlock(embed_dim, num_heads, ff_dim, dropout) for _ in range(num_layers)
        ])
    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

class PredictionHead(nn.Module):
    """
    The final prediction head that takes the [CLS] token output.
    """
    def __init__(self, embed_dim, output_dim):
        super().__init__()
        self.fc = nn.Sequential(
            nn.LayerNorm(embed_dim),
            nn.ReLU(),
            nn.Linear(embed_dim, output_dim)
        )
    def forward(self, x):
        # Use the output of the CLS token (the first token) for prediction
        cls_token_output = x[:, 0]
        return self.fc(cls_token_output)

# Main LightningModule for the FT-Transformer 
class FTTransformerFromScratch(pl.LightningModule):
    """
    The main PyTorch Lightning module that combines all the building blocks
    and defines the training, validation, and prediction logic.
    """
    def __init__(self, num_numerical, cat_cardinalities, embed_dim, num_heads, ff_dim, num_layers, lr, weight_decay, dropout):
        super().__init__()
        self.save_hyperparameters()
        
        # Build the model from the blocks
        self.tokenizer = FeatureTokenizer(num_numerical, cat_cardinalities, embed_dim)
        self.encoder = TransformerEncoder(embed_dim, num_heads, ff_dim, num_layers, dropout)
        self.head = PredictionHead(embed_dim, 1) # Output dim is 1 for binary classification
        
        # Define loss function and metrics
        self.loss_fn = nn.BCEWithLogitsLoss()
        self.val_acc = torchmetrics.Accuracy(task="binary")

    def forward(self, x_num, x_cat):
        x = self.tokenizer(x_num, x_cat)
        x = self.encoder(x)
        output = self.head(x)
        return output
        
    def training_step(self, batch, batch_idx):
        x_num, x_cat, y = batch
        logits = self(x_num, x_cat)
        loss = self.loss_fn(logits, y)
        self.log("train_loss", loss, on_step=False, on_epoch=True)
        return loss

    def validation_step(self, batch, batch_idx):
        x_num, x_cat, y = batch
        logits = self(x_num, x_cat)
        loss = self.loss_fn(logits, y)
        self.val_acc(logits, y)
        self.log("val_loss", loss, on_step=False, on_epoch=True)
        self.log("val_acc", self.val_acc, on_step=False, on_epoch=True, prog_bar=True)
        return loss
    
    def predict_step(self, batch, batch_idx, dataloader_idx=0):
        x_num, x_cat = batch
        logits = self(x_num, x_cat)
        preds = torch.round(torch.sigmoid(logits))
        return preds

    def configure_optimizers(self):
        return torch.optim.AdamW(self.parameters(), lr=self.hparams.lr, weight_decay=self.hparams.weight_decay)
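
To make the tokenization step concrete, the sketch below repeats what `FeatureTokenizer.forward` does using plain tensors and prints the resulting shape. The batch size, embedding width, and cardinalities are arbitrary values chosen for illustration. Only the counts of four numerical and five categorical features follow the preprocessing section.

```python
import torch
from torch import nn

batch, n_num, embed_dim = 8, 4, 16
cardinalities = [3, 2, 3, 5, 9]  # illustrative sizes, one per categorical feature

# One Linear(1 -> embed_dim) per numerical feature, one Embedding per categorical
num_linears = nn.ModuleList([nn.Linear(1, embed_dim) for _ in range(n_num)])
cat_embeds = nn.ModuleList([nn.Embedding(c, embed_dim) for c in cardinalities])
cls = nn.Parameter(torch.randn(1, 1, embed_dim))

# Random inputs: floats for numerical features, valid codes for categorical ones
x_num = torch.randn(batch, n_num)
x_cat = torch.stack([torch.randint(0, c, (batch,)) for c in cardinalities], dim=1)

# Each feature becomes one token of width embed_dim
num_tokens = torch.stack(
    [lin(x_num[:, i:i + 1]) for i, lin in enumerate(num_linears)], dim=1
)
cat_tokens = torch.stack(
    [emb(x_cat[:, i]) for i, emb in enumerate(cat_embeds)], dim=1
)

# Prepend the learned [CLS] token: 1 + 4 + 5 = 10 tokens per row
tokens = torch.cat([cls.expand(batch, -1, -1), num_tokens, cat_tokens], dim=1)
print(tokens.shape)  # torch.Size([8, 10, 16])
```

The encoder preserves this shape, and the prediction head reads only position 0 of the sequence, which is why the `[CLS]` token is placed first.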

## 5. Hyperparameter Tuning with Optuna

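Here we define the Optuna objective functions for all three models (FT-Transformer, LightGBM, and logistic regression). Each objective trains on the training split and returns the validation accuracy, which Optuna maximizes to find the best hyperparameters.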

In [25]:
def objective_ft_transformer(trial, X_train, y_train, X_val, y_val, numerical_features, categorical_features):
    """Objective function for tuning the from-scratch FT-Transformer."""
    X_train_ft, X_val_ft = X_train.copy(), X_val.copy()
    scaler = StandardScaler()
    X_train_ft[numerical_features] = scaler.fit_transform(X_train_ft[numerical_features])
    X_val_ft[numerical_features] = scaler.transform(X_val_ft[numerical_features])
    
    cat_maps = {col: X_train_ft[col].astype('category').cat.categories for col in categorical_features}
    cardinalities = [len(cat_maps[col]) for col in categorical_features]
    
    for col in categorical_features:
        X_train_ft[col] = pd.Categorical(X_train_ft[col], categories=cat_maps[col]).codes
        X_val_ft[col] = pd.Categorical(X_val_ft[col], categories=cat_maps[col]).codes

    train_ds = TitanicFTDataset(X_train_ft[numerical_features], X_train_ft[categorical_features], y_train)
    val_ds = TitanicFTDataset(X_val_ft[numerical_features], X_val_ft[categorical_features], y_val)
    train_loader = DataLoader(train_ds, batch_size=32, shuffle=True)
    val_loader = DataLoader(val_ds, batch_size=32)

    # Hyperparameter suggestions for the FT-Transformer
    embed_dim = trial.suggest_categorical("embed_dim", [128, 192, 256])
    num_heads = trial.suggest_categorical("num_heads", [4, 8])
    if embed_dim % num_heads != 0:
        raise optuna.exceptions.TrialPruned("embed_dim must be divisible by num_heads.")
    
    ff_dim = embed_dim * trial.suggest_categorical("ff_dim_factor", [2, 4])
    num_layers = trial.suggest_int("num_layers", 2, 5)
    lr = trial.suggest_float("lr", 1e-5, 1e-3, log=True)
    weight_decay = trial.suggest_float("weight_decay", 1e-6, 1e-3, log=True)
    dropout = trial.suggest_float("dropout", 0.0, 0.4)

    model = FTTransformerFromScratch(
        num_numerical=len(numerical_features), 
        cat_cardinalities=cardinalities, 
        embed_dim=embed_dim, 
        num_heads=num_heads, 
        ff_dim=ff_dim, 
        num_layers=num_layers, 
        lr=lr, 
        weight_decay=weight_decay, 
        dropout=dropout
    )
    trainer = pl.Trainer(max_epochs=30, accelerator="auto", callbacks=[EarlyStopping(monitor="val_loss", patience=5)], logger=False, enable_progress_bar=False, enable_model_summary=False)
    
    try:
        trainer.fit(model, train_loader, val_loader)
        # float() works for both a 0-dim tensor and the plain 0.0 default
        return float(trainer.callback_metrics.get("val_acc", 0.0))
    except Exception as e:
        print(f"Trial failed for FT-Transformer with error: {e}")
        return 0.0 # Return a low score to Optuna


def objective_lgbm(trial, X_train, y_train, X_val, y_val):
    """Objective function for tuning LightGBM."""
    params = {
        'objective': 'binary', 'verbosity': -1,
        'n_estimators': trial.suggest_int('n_estimators', 100, 2000), 
        'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.3, log=True),
        'num_leaves': trial.suggest_int('num_leaves', 20, 300), 
        'max_depth': trial.suggest_int('max_depth', 3, 12),
        'reg_alpha': trial.suggest_float('reg_alpha', 1e-8, 10.0, log=True), 
        'reg_lambda': trial.suggest_float('reg_lambda', 1e-8, 10.0, log=True),
        'colsample_bytree': trial.suggest_float('colsample_bytree', 0.4, 1.0), 
        'subsample': trial.suggest_float('subsample', 0.4, 1.0),
        'min_child_samples': trial.suggest_int('min_child_samples', 5, 100)
    }
    
    X_train_lgbm, X_val_lgbm = X_train.copy(), X_val.copy()
    for col in X_train_lgbm.select_dtypes(include=['object', 'category']).columns:
        X_train_lgbm[col] = X_train_lgbm[col].astype('category')
        X_val_lgbm[col] = X_val_lgbm[col].astype('category')
    
    model = lgb.LGBMClassifier(**params)
    model.fit(X_train_lgbm, y_train, 
              eval_set=[(X_val_lgbm, y_val)], 
              eval_metric='accuracy', 
              callbacks=[lgb.early_stopping(100, verbose=False)])
    return accuracy_score(y_val, model.predict(X_val_lgbm))


def objective_logreg(trial, X_train, y_train, X_val, y_val, numerical_features, categorical_features):
    """Objective function for tuning Logistic Regression."""
    C = trial.suggest_float("C", 1e-4, 1e2, log=True)
    solver = trial.suggest_categorical("solver", ["liblinear", "lbfgs", "saga"])
    
    preprocessor = ColumnTransformer(transformers=[
        ('num', StandardScaler(), numerical_features),
        ('cat', OneHotEncoder(handle_unknown='ignore'), categorical_features)
    ])
    
    pipeline = Pipeline(steps=[
        ('preprocessor', preprocessor),
        ('classifier', LogisticRegression(random_state=42, C=C, solver=solver, max_iter=2000))
    ])
    
    pipeline.fit(X_train, y_train)
    return accuracy_score(y_val, pipeline.predict(X_val))
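
One detail of the FT-Transformer objective deserves care: category codes are derived from the training split only, so a value that appears only in the validation split is encoded as `-1`. `nn.Embedding` cannot index a negative code, and the `try`/`except` in the objective would then report such a trial as a score of 0.0. The sketch below reproduces the encoding on toy data and shows one possible guard, mapping unseen values to the code of the training mode, which is the approach the final training cell takes for the prediction set.

```python
import numpy as np
import pandas as pd

train = pd.Series(["S", "C", "S", "Q"])
val = pd.Series(["C", "S", "T"])  # "T" never appears in the training split

# Category order is fixed by the training data (sorted: C, Q, S)
categories = train.astype("category").cat.categories
train_codes = pd.Categorical(train, categories=categories).codes
val_codes = pd.Categorical(val, categories=categories).codes
print(val_codes.tolist())  # [0, 2, -1]

# Guard: replace the "unseen" marker with the code of the most frequent training value
mode_code = categories.get_loc(train.mode()[0])
val_codes_safe = np.where(val_codes == -1, mode_code, val_codes)
print(val_codes_safe.tolist())  # [0, 2, 2]
```

An alternative, not shown here, would be to reserve an extra embedding row for unknown values and shift all codes by one.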

In [26]:
# Get preprocessed data and split it for validation
X_full, y_full, _, _, numerical_features, categorical_features = preprocess_data()
X_train, X_val, y_train, y_val = train_test_split(
    X_full, y_full, test_size=0.2, random_state=42, stratify=y_full)

all_best_params = {}

# Tune All Models 
print("\n--- Tuning FT-Transformer ---")
study_ft = optuna.create_study(direction="maximize")
study_ft.optimize(lambda t: objective_ft_transformer(t, X_train, y_train, X_val, y_val, numerical_features, categorical_features), n_trials=30)
all_best_params['ft_transformer'] = study_ft.best_params
print(f"Best FT-Transformer Val Accuracy: {study_ft.best_value:.4f}")

print("\n--- Tuning LightGBM ---")
study_lgbm = optuna.create_study(direction="maximize")
study_lgbm.optimize(lambda t: objective_lgbm(t, X_train, y_train, X_val, y_val), n_trials=30)
all_best_params['lightgbm'] = study_lgbm.best_params
print(f"Best LightGBM Val Accuracy: {study_lgbm.best_value:.4f}")

print("\n--- Tuning Logistic Regression ---")
study_logreg = optuna.create_study(direction="maximize")
study_logreg.optimize(lambda t: objective_logreg(t, X_train, y_train, X_val, y_val, numerical_features, categorical_features), n_trials=30)
all_best_params['logistic_regression'] = study_logreg.best_params
print(f"Best Logistic Regression Val Accuracy: {study_logreg.best_value:.4f}")

--- Loading and Preprocessing Data ---
Preprocessing complete.

--- Tuning FT-Transformer ---
Best FT-Transformer Val Accuracy: 0.8380

--- Tuning LightGBM ---
Best LightGBM Val Accuracy: 0.8547

--- Tuning Logistic Regression ---
Best Logistic Regression Val Accuracy: 0.8547


In [29]:
# Save Best Hyperparameters to a file
output_path = 'logs'
os.makedirs(output_path, exist_ok=True)

with open(os.path.join(output_path ,'best_hyperparameters.json'), 'w') as f:
    json.dump(all_best_params, f, indent=4)
    
print("\n----------------------------------------------------")
print("Hyperparameter tuning complete for all models.")
print("Best parameters saved to 'best_hyperparameters.json'")
print("----------------------------------------------------")
print("\nBest Parameters Found:")
print(json.dumps(all_best_params, indent=4))


----------------------------------------------------
Hyperparameter tuning complete for all models.
Best parameters saved to 'best_hyperparameters.json'
----------------------------------------------------

Best Parameters Found:
{
    "ft_transformer": {
        "embed_dim": 192,
        "num_heads": 4,
        "ff_dim_factor": 4,
        "num_layers": 3,
        "lr": 6.633442318929932e-05,
        "weight_decay": 1.969683126228165e-05,
        "dropout": 0.2356785069572193
    },
    "lightgbm": {
        "n_estimators": 642,
        "learning_rate": 0.04515157839370836,
        "num_leaves": 36,
        "max_depth": 6,
        "reg_alpha": 2.722043594358596e-07,
        "reg_lambda": 2.0379171849455357e-05,
        "colsample_bytree": 0.6570997858667309,
        "subsample": 0.9980314167105858,
        "min_child_samples": 92
    },
    "logistic_regression": {
        "C": 2.4633756846663513,
        "solver": "saga"
    }
}
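
Note that the saved FT-Transformer entry stores `ff_dim_factor`, not `ff_dim`, because Optuna records only the values it suggested. Anything derived from them has to be recomputed when the model is rebuilt. A minimal sketch using the values printed above (`model_kwargs` is a hypothetical helper name):

```python
import json

# Subset of the saved parameters, copied from the output above
saved = json.loads("""
{"ft_transformer": {"embed_dim": 192, "num_heads": 4, "ff_dim_factor": 4, "num_layers": 3}}
""")
params = saved["ft_transformer"]

# Derived quantities that are not stored in the JSON file
ff_dim = params["embed_dim"] * params["ff_dim_factor"]
head_dim = params["embed_dim"] // params["num_heads"]

# Keyword arguments in the form the model constructor expects
model_kwargs = {k: v for k, v in params.items() if k != "ff_dim_factor"}
model_kwargs["ff_dim"] = ff_dim

print(ff_dim, head_dim)  # 768 48
```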


## 6. Training the Models with the Hyperparameters Found by Optuna

In [None]:
pl.seed_everything(42)

# Load Data and Best Hyperparameters
try:
    with open('logs/best_hyperparameters.json', 'r') as f:
        all_best_params = json.load(f)
    print("Successfully loaded 'best_hyperparameters.json'")
except FileNotFoundError:
    print("Error: 'logs/best_hyperparameters.json' not found.")
    print("Please run the hyperparameter tuning cells in section 5 first to generate it.")
    # Re-raise instead of calling exit(), which would shut down the notebook kernel
    raise

X_full, y_full, X_predict, test_passenger_ids, numerical_features, categorical_features = preprocess_data()

Seed set to 42


Successfully loaded 'best_hyperparameters.json'
--- Loading and Preprocessing Data ---
Preprocessing complete.


In [31]:
# --- Train Final Models and Predict ---
print("\n--- Training Final Models on Full Data and Generating Submissions ---")

# --- Model 1: FT-Transformer ---
print("\n1. Training and Predicting with final FT-Transformer...")
best_params_ft = all_best_params['ft_transformer']

# Data preparation specific to FT-Transformer
X_train_ft_final, X_predict_ft_final = X_full.copy(), X_predict.copy()
scaler_ft = StandardScaler()
X_train_ft_final[numerical_features] = scaler_ft.fit_transform(X_train_ft_final[numerical_features])
X_predict_ft_final[numerical_features] = scaler_ft.transform(X_predict_ft_final[numerical_features])

cat_maps = {col: X_train_ft_final[col].astype('category').cat.categories for col in categorical_features}
cardinalities = [len(cat_maps[col]) for col in categorical_features]

for col in categorical_features:
    X_train_ft_final[col] = pd.Categorical(X_train_ft_final[col], categories=cat_maps[col]).codes
    X_predict_ft_final[col] = pd.Categorical(X_predict_ft_final[col], categories=cat_maps[col]).codes
    # Handle any categories in predict set not seen in training
    if (X_predict_ft_final[col] == -1).any():
        mode_code = cat_maps[col].get_loc(X_full[col].mode()[0])
        X_predict_ft_final[col] = X_predict_ft_final[col].replace(-1, mode_code)
        
final_train_ds = TitanicFTDataset(X_train_ft_final[numerical_features], X_train_ft_final[categorical_features], y_full)
predict_ds = TitanicFTDataset(X_predict_ft_final[numerical_features], X_predict_ft_final[categorical_features])
final_train_loader = DataLoader(final_train_ds, batch_size=32, shuffle=True)
predict_loader = DataLoader(predict_ds, batch_size=32)

# Instantiate and train the final model
final_model_ft = FTTransformerFromScratch(
    num_numerical=len(numerical_features), 
    cat_cardinalities=cardinalities,
    embed_dim=best_params_ft['embed_dim'], 
    num_heads=best_params_ft['num_heads'],
    ff_dim=best_params_ft['embed_dim'] * best_params_ft['ff_dim_factor'],
    num_layers=best_params_ft['num_layers'], 
    lr=best_params_ft['lr'], 
    weight_decay=best_params_ft['weight_decay'], 
    dropout=best_params_ft['dropout']
)
final_trainer_ft = pl.Trainer(max_epochs=30, accelerator="auto", logger=False, enable_progress_bar=True, enable_model_summary=False)
final_trainer_ft.fit(final_model_ft, final_train_loader)

# Predict and save
preds_ft_final = torch.cat(final_trainer_ft.predict(final_model_ft, predict_loader)).flatten().cpu().numpy().astype(int)
pd.DataFrame({'PassengerId': test_passenger_ids, 'Survived': preds_ft_final}).to_csv('data/submission_ft_transformer_tuned.csv', index=False)
print("FT-Transformer submission file created.")

# --- Model 2: LightGBM ---
print("\n2. Training and Predicting with final LightGBM...")
best_params_lgbm = all_best_params['lightgbm']

X_full_lgbm, X_predict_lgbm = X_full.copy(), X_predict.copy()
for col in categorical_features:
    X_full_lgbm[col] = X_full_lgbm[col].astype('category')
    X_predict_lgbm[col] = X_predict_lgbm[col].astype('category')
    
final_lgbm = lgb.LGBMClassifier(objective='binary', **best_params_lgbm)
final_lgbm.fit(X_full_lgbm, y_full)
preds_lgbm = final_lgbm.predict(X_predict_lgbm)
pd.DataFrame({'PassengerId': test_passenger_ids, 'Survived': preds_lgbm}).to_csv('data/submission_lgbm_tuned.csv', index=False)
print("LightGBM submission file created.")

# --- Model 3: Logistic Regression ---
print("\n3. Training and Predicting with final Logistic Regression...")
best_params_logreg = all_best_params['logistic_regression']

preprocessor_final = ColumnTransformer(transformers=[
    ('num', StandardScaler(), numerical_features),
    ('cat', OneHotEncoder(handle_unknown='ignore'), categorical_features)
])

final_logreg = Pipeline(steps=[
    ('preprocessor', preprocessor_final),
    ('classifier', LogisticRegression(random_state=42, max_iter=2000, **best_params_logreg))
])
final_logreg.fit(X_full, y_full)
preds_lr = final_logreg.predict(X_predict)
pd.DataFrame({'PassengerId': test_passenger_ids, 'Survived': preds_lr}).to_csv('data/submission_logreg_tuned.csv', index=False)
print("Logistic Regression submission file created.")

# --- Final Ensemble ---
print("\n--- Creating Final Ensemble Submission ---")
stacked_preds = np.vstack([preds_ft_final, preds_lgbm, preds_lr]).T
from scipy.stats import mode
ensemble_preds, _ = mode(stacked_preds, axis=1)

submission_df = pd.DataFrame({
    'PassengerId': test_passenger_ids,
    'Survived': ensemble_preds.flatten()
})
submission_df.to_csv('data/ensemble_submission_tuned.csv', index=False)

print("\n" + "-" * 40)
print("All submission files created successfully!")
print("You can now upload them to the Kaggle competition.")
print("-" * 40)
print("Ensemble submission head:")
print(submission_df.head())


--- Training Final Models on Full Data and Generating Submissions ---

1. Training and Predicting with final FT-Transformer...


FT-Transformer submission file created.

2. Training and Predicting with final LightGBM...
LightGBM submission file created.

3. Training and Predicting with final Logistic Regression...
Logistic Regression submission file created.

--- Creating Final Ensemble Submission ---

----------------------------------------
All submission files created successfully!
You can now upload them to the Kaggle competition.
----------------------------------------
Ensemble submission head:
   PassengerId  Survived
0          892         0
1          893         0
2          894         0
3          895         0
4          896         1
