# Customized model base

For researchers and model base developers, a basic need is to compare their own models with the existing benchmarks in `tabensemb`. In this part, we build a model base within the framework, assuming that we want to integrate `TabNet` ([from the dreamquark-ai team](https://github.com/dreamquark-ai/tabnet)) into `tabensemb` for regression and classification tasks (`pytorch_tabular` and `pytorch_widedeep` have already done this).

**Remark**: For `PyTorch`-based models, we have implemented most requirements of the framework so that users can integrate `torch.nn.Module`s more conveniently.

## Example: Implement TabNet as a model base from scratch

In [1]:
import tabensemb
import numpy as np
import torch
import os
from tempfile import TemporaryDirectory

temp_path = TemporaryDirectory()
tabensemb.setting["default_output_path"] = os.path.join(temp_path.name, "output")
tabensemb.setting["default_config_path"] = os.path.join(temp_path.name, "configs")
tabensemb.setting["default_data_path"] = os.path.join(temp_path.name, "data")

device = "cuda" if torch.cuda.is_available() else "cpu"

All model bases inherit from `AbstractModel` and implement its required methods. If a required method is not implemented, `NotImplementedError` is raised when it is used.

In [2]:
from tabensemb.model import AbstractModel
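
The contract can be pictured with a toy stand-in. The sketch below is not `tabensemb`'s implementation; `ToyBase`, `IncompleteModelBase`, `CompleteModelBase`, and `describe` are hypothetical names used only to show how a missing hook surfaces as `NotImplementedError` when it is called, not when the class is defined.

```python
class ToyBase:
    """Hypothetical stand-in for an abstract model base (not tabensemb code)."""

    def _get_program_name(self):
        # Hooks raise until a subclass overrides them.
        raise NotImplementedError

    def _get_model_names(self):
        raise NotImplementedError

    def describe(self):
        # A public method that depends on both hooks.
        return f"{self._get_program_name()}: {self._get_model_names()}"


class IncompleteModelBase(ToyBase):
    # `_get_model_names` is intentionally left out.
    def _get_program_name(self):
        return "Incomplete"


class CompleteModelBase(IncompleteModelBase):
    def _get_model_names(self):
        return ["TabNet"]


try:
    IncompleteModelBase().describe()
    failed = False
except NotImplementedError:
    failed = True  # the missing hook is only detected at call time

description = CompleteModelBase().describe()
```

Defining `IncompleteModelBase` succeeds; the error appears only once `describe` reaches the missing hook, which is why an incomplete model base may seem fine until it is actually used.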

We use [`scikit-optimize`](https://github.com/scikit-optimize/scikit-optimize) for Bayesian hyperparameter optimization, so its search-space classes are imported.

In [3]:
from skopt.space import Integer, Real, Categorical

First, we define the initialization of the model base. Always remember to pass all args and kwargs to `__init__` of `AbstractModel`. You can do other things in `__init__`. All `*args` and `**kwargs` (including arguments like the `some_param` shown below) are recorded in `self.init_params`.

```python
class TabNetFromAbstract(AbstractModel):
    def __init__(self, *args, some_param=1.1, **kwargs):
        super(TabNetFromAbstract, self).__init__(*args, **kwargs)
        # Do something else here
        self.some_param = some_param
        print(self.init_params)
```

We should define the name of the model base and all available models in the model base.

```python
    def _get_program_name(self):
        return "TabNetFromAbstract"

    def _get_model_names(self):
        return ["TabNet"]
```

For each model in the model base, the program requests the initial hyperparameters of the model and their search spaces. They are defined as follows:

```python
    def _space(self, model_name):
        return [
            Integer(low=4, high=16, prior="uniform", name="n_d", dtype=int),
            Integer(low=4, high=16, prior="uniform", name="n_a", dtype=int),
            Integer(low=1, high=6, prior="uniform", name="n_steps", dtype=int),
            Real(low=1.0, high=1.5, prior="uniform", name="gamma"),
            Integer(
                low=1, high=4, prior="uniform", name="n_independent", dtype=int
            ),
            Integer(low=1, high=4, prior="uniform", name="n_shared", dtype=int),
        ] + self.trainer.SPACE

    def _initial_values(self, model_name):
        return {
            "n_d": 8,
            "n_a": 8,
            "n_steps": 3,
            "gamma": 1.3,
            "n_independent": 2,
            "n_shared": 2,
            "lr": self.trainer.args["lr"],
            "weight_decay": self.trainer.args["weight_decay"],
            "batch_size": self.trainer.args["batch_size"],
        }
```
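
When editing these two methods it is easy to let an initial value drift outside its search range. The following sketch checks the TabNet-specific entries only, using plain tuples instead of `skopt` objects; `bounds`, `initial_values`, and `out_of_range` are hypothetical names, and the assumption that each initial value should lie inside its range is ours, not a documented rule of the framework.

```python
# Bounds copied from the `_space` definition above (TabNet-specific entries only).
bounds = {
    "n_d": (4, 16),
    "n_a": (4, 16),
    "n_steps": (1, 6),
    "gamma": (1.0, 1.5),
    "n_independent": (1, 4),
    "n_shared": (1, 4),
}

# Values copied from `_initial_values` above.
initial_values = {
    "n_d": 8,
    "n_a": 8,
    "n_steps": 3,
    "gamma": 1.3,
    "n_independent": 2,
    "n_shared": 2,
}


def out_of_range(values, bounds):
    """Return the names whose value is missing or outside [low, high]."""
    problems = []
    for name, (low, high) in bounds.items():
        if name not in values or not (low <= values[name] <= high):
            problems.append(name)
    return problems


problems = out_of_range(initial_values, bounds)  # empty list when everything is consistent
```

The entries coming from `self.trainer.SPACE` are left out because their bounds are not shown in this part.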

Before training, each model base has its own way of processing the dataset.

`_train_data_preprocess` will return the processed dataset according to a given `Trainer` which provides all training information and data required. In this example, `X_train/X_val/X_test` represent training/validation/testing sets, and `y_train/y_val/y_test` represent corresponding labels.

**Remark**: The tabular dataset has gone through all processing stages defined in the `DataModule` inside the trainer **except scaling**. Call `self.trainer.datamodule.data_transform(df, scaler_only=True)` to scale it using the trained scaler if no scaling stage is defined internally in the model.

```python
    def _train_data_preprocess(self, model_name):
        data = self.trainer.datamodule
        all_feature_names = data.all_feature_names

        X_train = data.data_transform(data.X_train, scaler_only=True)[
            all_feature_names
        ].values.astype(np.float64)
        X_val = data.data_transform(data.X_val, scaler_only=True)[
            all_feature_names
        ].values.astype(np.float64)
        X_test = data.data_transform(data.X_test, scaler_only=True)[
            all_feature_names
        ].values.astype(np.float64)
        y_train = data.y_train.astype(np.float64)
        y_val = data.y_val.astype(np.float64)
        y_test = data.y_test.astype(np.float64)

        return {
            "X_train": X_train,
            "y_train": y_train,
            "X_val": X_val,
            "y_val": y_val,
            "X_test": X_test,
            "y_test": y_test,
        }
```
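
The idea of scaling with the *trained* scaler can be illustrated without `tabensemb`. The sketch below assumes standardization as the scaling method (the real scaler is whatever the configuration defines), and `fit_scaler` and `apply_scaler` are hypothetical helpers: the statistics are learned from the training set once and then reused unchanged for the other splits.

```python
from statistics import mean, pstdev


def fit_scaler(column):
    """Learn the mean and standard deviation from the training column only."""
    return mean(column), pstdev(column)


def apply_scaler(column, params):
    """Standardize a column with statistics learned elsewhere."""
    mu, sigma = params
    return [(v - mu) / sigma for v in column]


train = [10.0, 20.0, 30.0]  # made-up training values
val = [20.0, 40.0]          # made-up validation values

params = fit_scaler(train)               # statistics come from the training set
train_scaled = apply_scaler(train, params)
val_scaled = apply_scaler(val, params)   # validation data reuses the same statistics
```

Because the validation values are transformed with the training statistics, a validation value can fall outside the range seen during training, which is the expected behavior.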

Correspondingly, `_data_preprocess` will process an upcoming new dataset, including the tabular data `df` containing continuous features and categorical features, and unstacked derived data `derived_data` (multi-modal data or something else depending on the configuration introduced in "Using data functionalities"). The returned value should have the same structure as the `X_test` returned in `_train_data_preprocess`.

```python
    def _data_preprocess(self, df, derived_data, model_name):
        return self.trainer.datamodule.data_transform(df, scaler_only=True)[
            self.trainer.all_feature_names
        ].values.astype(np.float64)
```

The program will pass a selected set of hyperparameters as `kwargs` to initialize a model, train a model, and predict using the model. The returned `model` will be stored locally and reloaded for evaluation and inference, so make sure it contains all the information needed to make predictions.

Here we initialize the model using information contained in the `DataModule` instance, including the indices of categorical features `cat_idxs`, the number of categories of each categorical feature `cat_dims`, the current task `task` (possible values are "regression", "binary", and "multiclass"), the device to train the model `self.trainer.device`, and the hyperparameters `kwargs`. `model_name` is ignored because we only have one model in the model base. All model bases should at least follow the guidance of `self.trainer.device`, `self.trainer.datamodule.task`, `model_name`, and `kwargs` to make all models trained in a consistent way within the framework.

```python
    def _new_model(self, model_name, verbose, **kwargs):
        from pytorch_tabnet.tab_model import TabNetRegressor, TabNetClassifier

        datamodule = self.trainer.datamodule
        cat_idxs = [
            datamodule.all_feature_names.index(x)
            for x in datamodule.get_feature_names_by_type("Categorical")
        ]
        cat_dims = [
            datamodule.cat_num_unique[x]
            for x in datamodule.get_feature_idx_by_type("Categorical")
        ]
        self.task = datamodule.task
        init_kwargs = dict(
            verbose=tabensemb.setting["verbose_per_epoch"] if verbose else 0,
            optimizer_params={
                "lr": kwargs["lr"],
                "weight_decay": kwargs["weight_decay"],
            },
            cat_idxs=cat_idxs,
            cat_dims=cat_dims,
            cat_emb_dim=3,
            device_name=self.trainer.device,
        )
        if self.trainer.datamodule.task == "regression":
            model = TabNetRegressor(**init_kwargs)
        else:
            model = TabNetClassifier(**init_kwargs)

        model.set_params(
            **{
                "n_d": kwargs["n_d"],
                "n_a": kwargs["n_a"],
                "n_steps": kwargs["n_steps"],
                "gamma": kwargs["gamma"],
                "n_independent": kwargs["n_independent"],
                "n_shared": kwargs["n_shared"],
            }
        )
        return model
```

**Remark**: `kwargs` has all keys defined in `_initial_values`. If a parameter named `batch_size` is included, `kwargs` also contains a key named `original_batch_size`. The two values may differ if the program finds that the requested batch size would produce tiny mini-batches; the threshold is defined by `self.limit_batch_size` (defaults to 6). A tiny batch might break some models, so it is better to use the modified `batch_size` value.
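
To see why a batch size can produce a tiny mini-batch, consider the size of the last batch when no sample is dropped. The sketch below only detects the situation; how `tabensemb` then adjusts `batch_size` is not shown here, and `last_batch_size`, `has_tiny_batch`, and the strict `<` comparison are assumptions made for illustration.

```python
def last_batch_size(n_samples, batch_size):
    """Size of the final mini-batch when the data is split without dropping samples."""
    remainder = n_samples % batch_size
    return remainder if remainder else min(batch_size, n_samples)


def has_tiny_batch(n_samples, batch_size, limit_batch_size=6):
    """True if the final mini-batch has fewer samples than the threshold."""
    return last_batch_size(n_samples, batch_size) < limit_batch_size


# With 238 samples, a batch size of 64 leaves 46 samples in the last batch,
# while a batch size of 117 leaves only 4.
fine = has_tiny_batch(238, 64)
tiny = has_tiny_batch(238, 117)
```

A last batch of only a few samples can be a problem for layers such as batch normalization, which is one plausible reason for the safeguard.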

The framework passes `X_train`, `y_train`, `X_val`, and `y_val` from `_train_data_preprocess` to the following `_train_single_model` method, along with other arguments describing the current training stage. `epoch` is the number of epochs to train the model. `warm_start=True` means the passed model is already trained and should be fine-tuned on a new dataset. `in_bayes_opt=True` means that the passed `kwargs` were selected by a Bayesian hyperparameter optimization step and that a shortened training routine is needed to reduce optimization time, so we set `max_epochs` to the "bayes_epoch" value in the configuration.

**Remark**: `epoch` will be `self.trainer.args["bayes_epoch"]` if `in_bayes_opt=True`, and `self.trainer.args["epoch"]` otherwise.

```python
    def _train_single_model(
        self,
        model,
        epoch,
        X_train,
        y_train,
        X_val,
        y_val,
        verbose,
        warm_start,
        in_bayes_opt,
        **kwargs,
    ):
        eval_set = [(X_val, y_val if self.task == "regression" else y_val.flatten())]

        model.fit(
            X_train,
            y_train if self.task == "regression" else y_train.flatten(),
            eval_set=eval_set,
            max_epochs=epoch if not in_bayes_opt else self.trainer.args["bayes_epoch"],
            patience=self.trainer.args["patience"],
            loss_fn=torch.nn.MSELoss()
            if self.task == "regression"
            else torch.nn.CrossEntropyLoss(),
            eval_metric=["mse" if self.task == "regression" else "logloss"],
            batch_size=int(kwargs["batch_size"]),
            warm_start=warm_start,
            drop_last=False,
        )
```

To evaluate the model or make use of the model, `_pred_single_model` is defined, and `X_test` processed in `_train_data_preprocess` or `_data_preprocess` is passed as an argument. The returned value should always be a two-dimensional `np.ndarray`. For binary classification tasks, the output is the probability of the positive (1) class, and for multiclass classification, the output is the probability of each class. `AbstractModel` automatically deals with the probabilities for metrics and final outputs.

```python
    def _pred_single_model(self, model, X_test, verbose, **kwargs):
        if self.task == "regression":
            return model.predict(X_test).reshape(-1, 1)
        elif self.task == "binary":
            return model.predict_proba(X_test)[:, 1].reshape(-1, 1)
        else:
            return model.predict_proba(X_test)
```
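
The shape requirement can be checked on made-up arrays. In the sketch below the probability matrices and the flat regression vector are hypothetical stand-ins for what a fitted model might return; only the `NumPy` indexing and reshaping mirror `_pred_single_model`.

```python
import numpy as np

# Hypothetical class probabilities for three samples, one column per class.
binary_proba = np.array([[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]])
multiclass_proba = np.array([[0.7, 0.2, 0.1], [0.1, 0.3, 0.6], [0.2, 0.5, 0.3]])
# Hypothetical regression predictions as a flat vector.
regression_pred = np.array([21.5, 18.0, 30.2])

binary_out = binary_proba[:, 1].reshape(-1, 1)   # positive-class column -> (3, 1)
multiclass_out = multiclass_proba                # already (n_samples, n_classes)
regression_out = regression_pred.reshape(-1, 1)  # (3,) -> (3, 1)
```

In all three cases the result has two dimensions, with one row per sample.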

The full code is as follows:

In [4]:
class TabNetFromAbstract(AbstractModel):
    def __init__(self, *args, some_param=1.1, **kwargs):
        super(TabNetFromAbstract, self).__init__(*args, **kwargs)
        # Do something else here
        self.some_param = some_param
        print(self.init_params)

    def _get_program_name(self):
        return "TabNetFromAbstract"

    def _get_model_names(self):
        return ["TabNet"]

    def _space(self, model_name):
        return [
            Integer(low=4, high=16, prior="uniform", name="n_d", dtype=int),
            Integer(low=4, high=16, prior="uniform", name="n_a", dtype=int),
            Integer(low=1, high=6, prior="uniform", name="n_steps", dtype=int),
            Real(low=1.0, high=1.5, prior="uniform", name="gamma"),
            Integer(
                low=1, high=4, prior="uniform", name="n_independent", dtype=int
            ),
            Integer(low=1, high=4, prior="uniform", name="n_shared", dtype=int),
        ] + self.trainer.SPACE

    def _initial_values(self, model_name):
        return {
            "n_d": 8,
            "n_a": 8,
            "n_steps": 3,
            "gamma": 1.3,
            "n_independent": 2,
            "n_shared": 2,
            "lr": self.trainer.args["lr"],
            "weight_decay": self.trainer.args["weight_decay"],
            "batch_size": self.trainer.args["batch_size"],
        }

    def _train_data_preprocess(self, model_name):
        data = self.trainer.datamodule
        all_feature_names = data.all_feature_names

        X_train = data.data_transform(data.X_train, scaler_only=True)[
            all_feature_names
        ].values.astype(np.float64)
        X_val = data.data_transform(data.X_val, scaler_only=True)[
            all_feature_names
        ].values.astype(np.float64)
        X_test = data.data_transform(data.X_test, scaler_only=True)[
            all_feature_names
        ].values.astype(np.float64)
        y_train = data.y_train.astype(np.float64)
        y_val = data.y_val.astype(np.float64)
        y_test = data.y_test.astype(np.float64)

        return {
            "X_train": X_train,
            "y_train": y_train,
            "X_val": X_val,
            "y_val": y_val,
            "X_test": X_test,
            "y_test": y_test,
        }

    def _data_preprocess(self, df, derived_data, model_name):
        return self.trainer.datamodule.data_transform(df, scaler_only=True)[
            self.trainer.all_feature_names
        ].values.astype(np.float64)

    def _new_model(self, model_name, verbose, **kwargs):
        from pytorch_tabnet.tab_model import TabNetRegressor, TabNetClassifier

        datamodule = self.trainer.datamodule
        cat_idxs = [
            datamodule.all_feature_names.index(x)
            for x in datamodule.get_feature_names_by_type("Categorical")
        ]
        cat_dims = [
            datamodule.cat_num_unique[x]
            for x in datamodule.get_feature_idx_by_type("Categorical")
        ]
        self.task = datamodule.task
        init_kwargs = dict(
            verbose=tabensemb.setting["verbose_per_epoch"] if verbose else 0,
            optimizer_params={
                "lr": kwargs["lr"],
                "weight_decay": kwargs["weight_decay"],
            },
            cat_idxs=cat_idxs,
            cat_dims=cat_dims,
            cat_emb_dim=3,
            device_name=self.trainer.device,
        )
        if self.trainer.datamodule.task == "regression":
            model = TabNetRegressor(**init_kwargs)
        else:
            model = TabNetClassifier(**init_kwargs)

        model.set_params(
            **{
                "n_d": kwargs["n_d"],
                "n_a": kwargs["n_a"],
                "n_steps": kwargs["n_steps"],
                "gamma": kwargs["gamma"],
                "n_independent": kwargs["n_independent"],
                "n_shared": kwargs["n_shared"],
            }
        )
        return model

    def _train_single_model(
        self,
        model,
        epoch,
        X_train,
        y_train,
        X_val,
        y_val,
        verbose,
        warm_start,
        in_bayes_opt,
        **kwargs,
    ):
        eval_set = [(X_val, y_val if self.task == "regression" else y_val.flatten())]

        model.fit(
            X_train,
            y_train if self.task == "regression" else y_train.flatten(),
            eval_set=eval_set,
            max_epochs=epoch if not in_bayes_opt else self.trainer.args["bayes_epoch"],
            patience=self.trainer.args["patience"],
            loss_fn=torch.nn.MSELoss()
            if self.task == "regression"
            else torch.nn.CrossEntropyLoss(),
            eval_metric=["mse" if self.task == "regression" else "logloss"],
            batch_size=int(kwargs["batch_size"]),
            warm_start=warm_start,
            drop_last=False,
        )

    def _pred_single_model(self, model, X_test, verbose, **kwargs):
        if self.task == "regression":
            return model.predict(X_test).reshape(-1, 1)
        elif self.task == "binary":
            return model.predict_proba(X_test)[:, 1].reshape(-1, 1)
        else:
            return model.predict_proba(X_test)

## Example: Implement TabNet as a `PyTorch`-based model

The example above uses `TabNetRegressor` and `TabNetClassifier` from `pytorch_tabnet`, which already implement the training and evaluation procedures around the `torch.nn.Module` subclass called `TabNet`. We can also build a model base for `nn.Module`s directly, with less effort. Such model bases inherit from `TorchModel`, and the `nn.Module`s should inherit from `AbstractNN` (only a few lines need to change to migrate existing code into this framework).

In [5]:
from tabensemb.model import TorchModel, AbstractNN
from pytorch_tabnet.tab_network import TabNet
from typing import Dict

First, we implement an `AbstractNN` (which inherits from `pytorch_lightning.LightningModule`, which in turn inherits from `torch.nn.Module`).

We initialize the model in `__init__`. `kwargs` depends on the arguments passed from `_new_model`, which will be implemented later, but it should at least contain all keys defined in `_initial_values`, as described in the remark above.

Remember to call `super().__init__`. This is no more difficult than initializing a `LightningModule`.

You can use `self.hparams.some_param` to get a hyperparameter (equivalent to `kwargs["some_param"]`) if you call `super().__init__(datamodule, **kwargs)` instead of `super().__init__(datamodule)`, because `AbstractNN` uses the `LightningModule.save_hyperparameters` utility (which you should **not** call in your own `__init__`).

**Remark**: To migrate existing `nn.Module` code (Part 1)

* Change `class SomeModel(nn.Module)` to `class SomeModel(AbstractNN)`.
* Change the indices of categorical features to `[0, 1, ..., self.n_cat-1]` and the numbers of unique categories of categorical features to `self.cat_num_unique`.
* Change the number of input dimensions to `self.n_cont+self.n_cat` and the number of output dimensions to `self.n_outputs`.

```python
class TabNetNN(AbstractNN):
    def __init__(
        self,
        datamodule,
        **kwargs,
    ):
        super(TabNetNN, self).__init__(datamodule, **kwargs)
        self.network = TabNet(
            input_dim=self.n_cont+self.n_cat,
            output_dim=self.n_outputs,
            n_d=self.hparams.n_d,
            n_a=self.hparams.n_a,
            n_steps=self.hparams.n_steps,
            gamma=self.hparams.gamma,
            cat_idxs=list(range(self.n_cat)),
            cat_dims=self.cat_num_unique,
            cat_emb_dim=[3] * self.n_cat,
            n_independent=self.hparams.n_independent,
            n_shared=self.hparams.n_shared,
        )
```

Then we implement the computation step of the model. We should implement `_forward` instead of `forward`, which is already implemented by `AbstractNN` and automatically processes the inputs and outputs of `_forward`.

There are two input arguments for `_forward`: `x` and `derived_tensors`. `x` is a tensor of continuous features. `derived_tensors` is a dictionary containing contents in `datamodule.derived_data` (which is introduced in the last two sections of the "Using data functionalities" part), including categorical data (with the key "categorical" if there is any categorical feature), the signal for each data point representing whether it is an augmented one (with the key "augmented" if there is any augmented data point), and derived unstacked data (with the key `derived_name` specified in the configuration). This is how multimodal data is passed to a deep learning model in our framework.

In the following lines, we build the input of the neural network from the continuous features `x` and the categorical features `derived_tensors["categorical"]` by concatenation (that's why the indices of categorical features are set to `[0, 1, ..., self.n_cat-1]`), calculate the output of the network, and return the output.

**Remark**: The default loss function is `torch.nn.MSELoss` for regression, `torch.nn.BCEWithLogitsLoss` for binary classification, and `torch.nn.CrossEntropyLoss` for multiclass classification. To change this behavior, implement `self.loss_fn`. See the "Advanced customized model base" part for details.

**Remark**: For binary classification tasks, `self.n_outputs=1` so we expect the logits of the positive class (instead of a normalized probability). The output is then used to calculate `torch.nn.BCEWithLogitsLoss` by default. For multiclass classification tasks, `self.n_outputs` is the number of classes, so we expect the logits of these classes (instead of probabilities from `Softmax` or something else). The output is then used to calculate `torch.nn.CrossEntropyLoss` by default.
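
The reason for returning logits is that these loss functions apply the sigmoid or softmax themselves. The following pure-Python sketch (not `PyTorch` code; all function names are hypothetical) shows that a binary cross-entropy computed directly from a logit matches the one computed from the corresponding probability, and how softmax turns multiclass logits into probabilities.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bce_from_probability(p, y):
    """Binary cross-entropy computed from a probability."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def bce_with_logits(z, y):
    """The same loss computed directly from a logit, in a numerically stable form."""
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))


def softmax(logits):
    """Convert a list of logits into probabilities that sum to one."""
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]  # shift by the maximum for stability
    total = sum(exps)
    return [e / total for e in exps]


logit, label = 1.2, 1
loss_a = bce_with_logits(logit, label)
loss_b = bce_from_probability(sigmoid(logit), label)  # equal to loss_a up to rounding

class_probs = softmax([2.0, 0.5, -1.0])
```

If `_forward` returned probabilities, a loss that expects logits would effectively apply the sigmoid or softmax a second time and distort the training signal.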

**Remark**: To migrate existing `nn.Module` code (Part 2)

* Change `forward` to `_forward`
* Get categorical features from `derived_tensors`
* Get multimodal features from `derived_tensors` (and load multimodal features using data derivers)
* Return logits instead of probabilities

```python
    def _forward(
        self, x: torch.Tensor, derived_tensors: Dict[str, torch.Tensor]
    ) -> torch.Tensor:
        x_cont = x
        if "categorical" in derived_tensors.keys():
            x_cat = derived_tensors["categorical"]
            x_in = torch.concat([x_cat, x_cont], dim=-1)
        else:
            x_in = x_cont
        output, _ = self.network(x_in)
        return output
```
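
The link between the concatenation order and the categorical indices can be checked on small arrays. This sketch uses `NumPy` instead of `torch` and made-up values; it only illustrates that placing the categorical block first puts those features in columns `0` to `n_cat-1`.

```python
import numpy as np

n_cat, n_cont = 2, 3
# Hypothetical encoded categorical features for two samples.
x_cat = np.array([[0, 2], [1, 0]], dtype=np.float64)
# Hypothetical scaled continuous features for the same samples.
x_cont = np.array([[0.5, -1.0, 2.0], [1.5, 0.0, -0.5]])

# Categorical columns first, as in `_forward`, so they occupy columns 0..n_cat-1.
x_in = np.concatenate([x_cat, x_cont], axis=-1)
cat_idxs = list(range(n_cat))
```

Swapping the order of the two blocks would require changing `cat_idxs` to the last `n_cat` columns instead.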

The code is as follows:

In [6]:
class TabNetNN(AbstractNN):
    def __init__(
        self,
        datamodule,
        **kwargs,
    ):
        super(TabNetNN, self).__init__(datamodule, **kwargs)
        self.network = TabNet(
            input_dim=self.n_cont+self.n_cat,
            output_dim=self.n_outputs,
            n_d=self.hparams.n_d,
            n_a=self.hparams.n_a,
            n_steps=self.hparams.n_steps,
            gamma=self.hparams.gamma,
            cat_idxs=list(range(self.n_cat)),
            cat_dims=self.cat_num_unique,
            cat_emb_dim=[3] * self.n_cat,
            n_independent=self.hparams.n_independent,
            n_shared=self.hparams.n_shared,
        )

    def _forward(
        self, x: torch.Tensor, derived_tensors: Dict[str, torch.Tensor]
    ) -> torch.Tensor:
        x_cont = x
        if "categorical" in derived_tensors.keys():
            x_cat = derived_tensors["categorical"]
            x_in = torch.concat([x_cat, x_cont], dim=-1)
        else:
            x_in = x_cont
        output, _ = self.network(x_in)
        return output

Finally, we build the model base for the neural network. It inherits from `TorchModel`, which has implemented most required methods. The remaining methods for `TorchModel` can be written in the same way as for `TabNetFromAbstract`.

In the following implementation, `_new_model` passes the datamodule and hyperparameters to the neural network, which is what you saw above in `__init__`. You can also pass other arguments as you want.

In [7]:
class TabNetFromTorch(TorchModel):
    def _new_model(self, model_name, verbose, **kwargs):
        return TabNetNN(datamodule=self.trainer.datamodule, **kwargs)

    def _get_program_name(self):
        return "TabNetFromTorch"

    def _get_model_names(self):
        return ["TabNet"]

    def _space(self, model_name):
        return [
            Integer(low=4, high=16, prior="uniform", name="n_d", dtype=int),
            Integer(low=4, high=16, prior="uniform", name="n_a", dtype=int),
            Integer(low=1, high=6, prior="uniform", name="n_steps", dtype=int),
            Real(low=1.0, high=1.5, prior="uniform", name="gamma"),
            Integer(
                low=1, high=4, prior="uniform", name="n_independent", dtype=int
            ),
            Integer(low=1, high=4, prior="uniform", name="n_shared", dtype=int),
        ] + self.trainer.SPACE

    def _initial_values(self, model_name):
        return {
            "n_d": 8,
            "n_a": 8,
            "n_steps": 3,
            "gamma": 1.3,
            "n_independent": 2,
            "n_shared": 2,
            "lr": self.trainer.args["lr"],
            "weight_decay": self.trainer.args["weight_decay"],
            "batch_size": self.trainer.args["batch_size"],
        }

## Comparison of different implementations in other model bases

We can compare our models with TabNet implemented in the other two model bases. Note that because of different training routines and randomization, they perform differently. Let's try the models on a regression task first.

In [8]:
from tabensemb.trainer import Trainer
from tabensemb.config import UserConfig
from tabensemb.model import PytorchTabular, WideDeep

trainer = Trainer(device=device)
mpg_columns = [
    "mpg",
    "cylinders",
    "displacement",
    "horsepower",
    "weight",
    "acceleration",
    "model_year",
    "origin",
    "car_name",
]
cfg = UserConfig.from_uci("Auto MPG", column_names=mpg_columns, sep=r"\s+")
trainer.load_config(cfg)
trainer.load_data()
trainer.add_modelbases(
    [
        PytorchTabular(trainer, model_subset=["TabNet"]),
        WideDeep(trainer, model_subset=["TabNet"]),
        TabNetFromAbstract(trainer),
        TabNetFromTorch(trainer),
    ]
)
trainer.train(stderr_to_stdout=True)
trainer.get_leaderboard()

Downloading https://archive.ics.uci.edu/static/public/9/auto+mpg.zip to /tmp/tmpubgrjq4r/data/Auto MPG.zip
cylinders is Integer and will be treated as a continuous feature.
model_year is Integer and will be treated as a continuous feature.
origin is Integer and will be treated as a continuous feature.
Unknown values are detected in ['horsepower']. They will be treated as np.nan.
The project will be saved to /tmp/tmpubgrjq4r/output/auto-mpg/2023-09-12-11-22-40-0_UserInputConfig
Dataset size: 238 80 80
Data saved to /tmp/tmpubgrjq4r/output/auto-mpg/2023-09-12-11-22-40-0_UserInputConfig (data.csv and tabular_data.csv).
{'some_param': 1.1, 'program': None, 'model_subset': None, 'exclude_models': None, 'store_in_harddisk': True}

-------------Run PytorchTabular-------------

Training TabNet
Global seed set to 42
2023-09-12 11:22:40,814 - {pytorch_tabular.tabular_model:473} - INFO - Preparing the DataLoaders
2023-09-12 11:22:40,815 - {pytorch_tabular.tabular_datamodule:290} - INFO - Setting 

| | Program | Model | Training RMSE | Training MSE | Training MAE | Training MAPE | Training R2 | Training MEDIAN_ABSOLUTE_ERROR | Training EXPLAINED_VARIANCE_SCORE | Testing RMSE | ... | Testing R2 | Testing MEDIAN_ABSOLUTE_ERROR | Testing EXPLAINED_VARIANCE_SCORE | Validation RMSE | Validation MSE | Validation MAE | Validation MAPE | Validation R2 | Validation MEDIAN_ABSOLUTE_ERROR | Validation EXPLAINED_VARIANCE_SCORE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | TabNetFromAbstract | TabNet | 11.058018 | 122.279759 | 10.120124 | 0.425556 | -0.897031 | 9.511139 | 0.688216 | 10.589669 | ... | -1.085708 | 9.735932 | 0.711417 | 10.528652 | 110.852511 | 9.544754 | 0.417929 | -0.980271 | 9.028342 | 0.575915 |
| 1 | TabNetFromTorch | TabNet | 14.438922 | 208.482471 | 13.648272 | 0.578209 | -2.234367 | 12.755719 | 0.655482 | 14.299357 | ... | -2.802959 | 13.421765 | 0.60603 | 13.845871 | 191.708137 | 12.779896 | 0.562416 | -2.424678 | 11.474345 | 0.490396 |
| 2 | PytorchTabular | TabNet | 14.308127 | 204.722492 | 13.48649 | 0.570209 | -2.176035 | 12.484611 | 0.645709 | 14.527732 | ... | -2.925404 | 13.481362 | 0.597145 | 13.641444 | 186.088988 | 12.641969 | 0.558155 | -2.324297 | 11.443497 | 0.530719 |
| 3 | WideDeep | TabNet | 14.374912 | 206.638091 | 13.423684 | 0.562884 | -2.205754 | 12.262027 | 0.560759 | 14.584161 | ... | -2.955957 | 13.293868 | 0.569623 | 13.16692 | 173.367793 | 12.166225 | 0.534651 | -2.097046 | 11.464049 | 0.41124 |
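
The negative `R2` values in the leaderboard mean that the predictions are worse than always predicting the mean of the true targets. The sketch below reproduces this effect with made-up numbers that are not taken from the runs above; `rmse` and `r2_score` are hypothetical helpers written out from the usual definitions.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2_score(y_true, y_pred):
    """1 - SS_res / SS_tot; negative when the model is worse than predicting the mean."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [18.0, 24.0, 30.0]  # made-up targets
close = [19.0, 23.0, 29.0]   # small errors
offset = [8.0, 14.0, 20.0]   # right trend, but every value is 10 too low

r2_close = r2_score(y_true, close)    # close to 1
r2_offset = r2_score(y_true, offset)  # negative
rmse_offset = rmse(y_true, offset)
```

A constant offset like this keeps the trend intact, which is one possible, unverified reading of the fairly high explained-variance scores that appear next to negative `R2` values above.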


We can see that `TabNet` does not perform well with the current hyperparameters. We can use `trainer.args["bayes_opt"] = True` to activate Bayesian hyperparameter optimization to improve its performance. Alternatively, we can directly provide a set of hyperparameters (which is found by Bayesian optimization) in `AbstractModel.model_params`. As shown below, the performance significantly improves.

In [9]:
trainer = Trainer(device=device)
trainer.load_config(cfg)
trainer.load_data()
modelbase = PytorchTabular(trainer, model_subset=["TabNet"])
trainer.add_modelbases([modelbase])
modelbase.model_params["TabNet"] = {'n_d': 8, 'n_a': 15, 'n_steps': 1, 'gamma': 1.0, 'n_independent': 3, 'n_shared': 4, 'lr': 0.026917811078469658, 'weight_decay': 1e-09, 'batch_size': 64}
trainer.train(stderr_to_stdout=True)
trainer.get_leaderboard()

The project will be saved to /tmp/tmpubgrjq4r/output/auto-mpg/2023-09-12-11-23-17-0_UserInputConfig
Dataset size: 238 80 80
Data saved to /tmp/tmpubgrjq4r/output/auto-mpg/2023-09-12-11-23-17-0_UserInputConfig (data.csv and tabular_data.csv).

-------------Run PytorchTabular-------------

Training TabNet
Previous params loaded: {'n_d': 8, 'n_a': 15, 'n_steps': 1, 'gamma': 1.0, 'n_independent': 3, 'n_shared': 4, 'lr': 0.026917811078469658, 'weight_decay': 1e-09, 'batch_size': 64}
Global seed set to 42
2023-09-12 11:23:17,405 - {pytorch_tabular.tabular_model:473} - INFO - Preparing the DataLoaders
2023-09-12 11:23:17,405 - {pytorch_tabular.tabular_datamodule:290} - INFO - Setting up the datamodule for regression task
2023-09-12 11:23:17,413 - {pytorch_tabular.tabular_model:521} - INFO - Preparing the Model: TabNetModel
2023-09-12 11:23:17,423 - {pytorch_tabular.tabular_model:268} - INFO - Preparing the Trainer
Auto select gpus: [0]
GPU available: True (cuda), used

| | Program | Model | Training RMSE | Training MSE | Training MAE | Training MAPE | Training R2 | Training MEDIAN_ABSOLUTE_ERROR | Training EXPLAINED_VARIANCE_SCORE | Testing RMSE | ... | Testing R2 | Testing MEDIAN_ABSOLUTE_ERROR | Testing EXPLAINED_VARIANCE_SCORE | Validation RMSE | Validation MSE | Validation MAE | Validation MAPE | Validation R2 | Validation MEDIAN_ABSOLUTE_ERROR | Validation EXPLAINED_VARIANCE_SCORE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | PytorchTabular | TabNet | 2.210867 | 4.887931 | 1.642482 | 0.069627 | 0.924169 | 1.215234 | 0.924173 | 2.603684 | ... | 0.873915 | 1.517266 | 0.873926 | 2.90542 | 8.441465 | 2.17954 | 0.097734 | 0.849201 | 1.557569 | 0.849222 |


Then the binary classification task:

In [10]:
trainer = Trainer(device=device)
adult_columns = [
    "age",
    "workclass",
    "fnlwgt",
    "education",
    "education-num",
    "marital-status",
    "occupation",
    "relationship",
    "race",
    "sex",
    "capital-gain",
    "capital-loss",
    "hours-per-week",
    "native-country",
    "income",
]
cfg = UserConfig.from_uci("Adult", column_names=adult_columns, sep=", ")
trainer.load_config(cfg)
trainer.load_data()
trainer.add_modelbases(
    [
        PytorchTabular(trainer, model_subset=["TabNet"]),
        WideDeep(trainer, model_subset=["TabNet"]),
        TabNetFromAbstract(trainer),
        TabNetFromTorch(trainer),
    ]
)
trainer.train(stderr_to_stdout=True)
trainer.get_leaderboard()

Downloading https://archive.ics.uci.edu/static/public/2/adult.zip to /tmp/tmpubgrjq4r/data/Adult.zip


age is Integer and will be treated as a continuous feature.
fnlwgt is Integer and will be treated as a continuous feature.
education-num is Integer and will be treated as a continuous feature.
capital-gain is Integer and will be treated as a continuous feature.
capital-loss is Integer and will be treated as a continuous feature.
hours-per-week is Integer and will be treated as a continuous feature.
The project will be saved to /tmp/tmpubgrjq4r/output/adult/2023-09-12-11-23-33-0_UserInputConfig
Dataset size: 19536 6512 6513
Data saved to /tmp/tmpubgrjq4r/output/adult/2023-09-12-11-23-33-0_UserInputConfig (data.csv and tabular_data.csv).
{'some_param': 1.1, 'program': None, 'model_subset': None, 'exclude_models': None, 'store_in_harddisk': True}

-------------Run PytorchTabular-------------

Training TabNet
Global seed set to 42
2023-09-12 11:23:34,487 - {pytorch_tabular.tabular_model:473} - INFO - Preparing the DataLoaders
2023-09-12 11:23:34,488 - {pytorch_tabular.tabular_datamodule:29

| | Program | Model | Training F1_SCORE | Training PRECISION_SCORE | Training RECALL_SCORE | Training JACCARD_SCORE | Training ACCURACY_SCORE | Training BALANCED_ACCURACY_SCORE | Training COHEN_KAPPA_SCORE | Training HAMMING_LOSS | ... | Validation ACCURACY_SCORE | Validation BALANCED_ACCURACY_SCORE | Validation COHEN_KAPPA_SCORE | Validation HAMMING_LOSS | Validation MATTHEWS_CORRCOEF | Validation ZERO_ONE_LOSS | Validation ROC_AUC_SCORE | Validation LOG_LOSS | Validation BRIER_SCORE_LOSS | Validation AVERAGE_PRECISION_SCORE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | PytorchTabular | TabNet | 0.663081 | 0.757096 | 0.589836 | 0.495977 | 0.855702 | 0.764917 | 0.573066 | 0.144298 | ... | 0.847359 | 0.753237 | 0.548046 | 0.152641 | 0.555038 | 0.152641 | 0.895756 | 0.336181 | 0.106143 | 0.855486 |
| 1 | WideDeep | TabNet | 0.664649 | 0.734311 | 0.607059 | 0.497734 | 0.852529 | 0.768709 | 0.571219 | 0.147471 | ... | 0.845055 | 0.762183 | 0.55293 | 0.154945 | 0.556004 | 0.154945 | 0.89415 | 0.338528 | 0.107826 | 0.852035 |
| 2 | TabNetFromTorch | TabNet | 0.687457 | 0.739652 | 0.642143 | 0.52376 | 0.859439 | 0.785239 | 0.59737 | 0.140561 | ... | 0.848434 | 0.772037 | 0.56745 | 0.151566 | 0.569406 | 0.151566 | 0.901323 | 0.325557 | 0.104401 | 0.864685 |
| 3 | TabNetFromAbstract | TabNet | 0.693543 | 0.759231 | 0.638316 | 0.530858 | 0.864199 | 0.787067 | 0.607153 | 0.135801 | ... | 0.846284 | 0.76321 | 0.555862 | 0.153716 | 0.559121 | 0.153716 | 0.901369 | 0.328637 | 0.104109 | 0.860162 |
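
Several columns of the binary leaderboard are algebraically tied to one another, which allows a quick sanity check of a row. The sketch below assumes the metric names follow their usual definitions (F1 as the harmonic mean of precision and recall, Jaccard computed for the positive class, Hamming loss as the fraction of misclassified samples); it does not use or reproduce `tabensemb`'s own metric code. The input numbers are copied from the training columns of the `PytorchTabular` row.

```python
# Training metrics copied from the PytorchTabular row of the leaderboard above.
precision = 0.757096
recall = 0.589836
accuracy = 0.855702

# Assumed standard definitions (not taken from tabensemb's source code).
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of P and R
jaccard = f1 / (2 - f1)  # TP / (TP + FP + FN), rewritten in terms of F1
hamming = 1 - accuracy  # fraction of misclassified samples (single-label case)

print(f"F1_SCORE      recomputed {f1:.6f}, reported 0.663081")
print(f"JACCARD_SCORE recomputed {jaccard:.6f}, reported 0.495977")
print(f"HAMMING_LOSS  recomputed {hamming:.6f}, reported 0.144298")
```

Under these assumptions the recomputed values agree with the reported columns up to rounding, so these columns carry largely redundant information when comparing model bases.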


Finally, the multiclass classification task:

In [11]:
trainer = Trainer(device=device)
iris_columns = [
    "sepal length",
    "sepal width",
    "petal length",
    "petal width",
    "class",
]
cfg = UserConfig.from_uci("Iris", column_names=iris_columns, datafile_name="iris")
trainer.load_config(cfg)
trainer.load_data()
trainer.add_modelbases(
    [
        PytorchTabular(trainer, model_subset=["TabNet"]),
        WideDeep(trainer, model_subset=["TabNet"]),
        TabNetFromAbstract(trainer),
        TabNetFromTorch(trainer),
    ]
)
trainer.train(stderr_to_stdout=True)
trainer.get_leaderboard()

Downloading https://archive.ics.uci.edu/static/public/53/iris.zip to /tmp/tmpubgrjq4r/data/Iris.zip
The project will be saved to /tmp/tmpubgrjq4r/output/iris/2023-09-12-11-33-59-0_UserInputConfig
Dataset size: 90 30 30
Data saved to /tmp/tmpubgrjq4r/output/iris/2023-09-12-11-33-59-0_UserInputConfig (data.csv and tabular_data.csv).
{'some_param': 1.1, 'program': None, 'model_subset': None, 'exclude_models': None, 'store_in_harddisk': True}

-------------Run PytorchTabular-------------

Training TabNet
Global seed set to 42
2023-09-12 11:33:59,986 - {pytorch_tabular.tabular_model:473} - INFO - Preparing the DataLoaders
2023-09-12 11:33:59,987 - {pytorch_tabular.tabular_datamodule:290} - INFO - Setting up the datamodule for classification task
2023-09-12 11:33:59,993 - {pytorch_tabular.tabular_model:521} - INFO - Preparing the Model: TabNetModel
2023-09-12 11:34:00,004 - {pytorch_tabular.tabular_model:268} - INFO - Preparing the Trainer
  rank_zero_deprecation(
Auto select gpus: [0]
GPU a

| | Program | Model | Training ACCURACY_SCORE | Training BALANCED_ACCURACY_SCORE | Training COHEN_KAPPA_SCORE | Training HAMMING_LOSS | Training MATTHEWS_CORRCOEF | Training ZERO_ONE_LOSS | Training PRECISION_SCORE_MACRO | Training PRECISION_SCORE_MICRO | ... | Validation F1_SCORE_MICRO | Validation F1_SCORE_WEIGHTED | Validation JACCARD_SCORE_MACRO | Validation JACCARD_SCORE_MICRO | Validation JACCARD_SCORE_WEIGHTED | Validation TOP_K_ACCURACY_SCORE | Validation LOG_LOSS | Validation ROC_AUC_SCORE_OVR_MACRO | Validation ROC_AUC_SCORE_OVR_WEIGHTED | Validation ROC_AUC_SCORE_OVO |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | PytorchTabular | TabNet | 1.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | ... | 0.933333 | 0.934656 | 0.888889 | 0.875 | 0.88 | 1.0 | 0.303345 | 0.975059 | 0.966566 | 0.972956 |
| 1 | WideDeep | TabNet | 1.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | ... | 0.933333 | 0.934656 | 0.888889 | 0.875 | 0.88 | 1.0 | 0.297838 | 0.980985 | 0.975455 | 0.97994 |
| 2 | TabNetFromAbstract | TabNet | 1.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | ... | 0.9 | 0.901217 | 0.8375 | 0.818182 | 0.82625 | 1.0 | 0.17915 | 0.989874 | 0.988788 | 0.990417 |
| 3 | TabNetFromTorch | TabNet | 1.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | ... | 0.933333 | 0.934656 | 0.888889 | 0.875 | 0.88 | 1.0 | 0.296352 | 0.980985 | 0.975455 | 0.97994 |


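The results show that the models perform much worse on the validation set than on the testing set. With only 30 samples in each of these two sets, a single split gives a noisy estimate, so we recommend computing the leaderboard with cross-validation for more reliable results: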

```python
# trainer.train(stderr_to_stdout=True)  # No need to run `train`
trainer.get_leaderboard(cross_validation=5, split_type="cv")
```
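
To make the idea behind this recommendation concrete, here is a minimal, generic sketch of k-fold splitting in plain Python. It is not `tabensemb`'s implementation, and the helper name `kfold_indices` is hypothetical; it only illustrates that, with 5 folds over the 150 Iris samples, every sample is held out exactly once, so the averaged metrics do not depend on one particular 30-sample subset.

```python
import random


def kfold_indices(n_samples, n_folds, seed=0):
    """Yield (train_idx, heldout_idx) pairs; each sample is held out exactly once.

    Hypothetical helper for illustration only, not part of tabensemb.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    # Spread the remainder over the first folds so fold sizes differ by at most one.
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        heldout_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, heldout_idx
        start += size


# 150 samples matches the Iris dataset size reported above (90 + 30 + 30).
folds = list(kfold_indices(150, 5))
for i, (train_idx, heldout_idx) in enumerate(folds):
    print(f"fold {i}: {len(train_idx)} for fitting, {len(heldout_idx)} held out")
```

Averaging a metric over the five held-out parts uses all 150 samples for evaluation, which is why the cross-validated leaderboard is less sensitive to an unlucky split.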