# Bayesian hyperparameter optimization

Hyperparameters strongly affect the performance of machine learning models, and the optimal values usually differ from task to task. To search for a set of hyperparameters that performs better than the initial values, the configuration provides an entry that activates Bayesian hyperparameter optimization.

The functionality is built on `scikit-optimize` ([link](https://scikit-optimize.github.io/)).

In [1]:
import torch
from tabensemb.trainer import Trainer
from tabensemb.model import PytorchTabular
from tabensemb.config import UserConfig
import tabensemb
import os

device = "cuda" if torch.cuda.is_available() else "cpu"
print("Using {} device".format(device))

from tempfile import TemporaryDirectory

temp_path = TemporaryDirectory()
tabensemb.setting["default_output_path"] = os.path.join(temp_path.name, "output")
tabensemb.setting["default_config_path"] = os.path.join(temp_path.name, "configs")
tabensemb.setting["default_data_path"] = os.path.join(temp_path.name, "data")

trainer = Trainer(device=device)
mpg_columns = [
    "mpg",
    "cylinders",
    "displacement",
    "horsepower",
    "weight",
    "acceleration",
    "model_year",
    "origin",
    "car_name",
]
cfg = UserConfig.from_uci("Auto MPG", column_names=mpg_columns, sep=r"\s+")
trainer.load_config(cfg)
trainer.load_data()
models = [
    PytorchTabular(trainer, model_subset=["Category Embedding"]),
]
trainer.add_modelbases(models)

Using cuda device
Downloading https://archive.ics.uci.edu/static/public/9/auto+mpg.zip to /tmp/tmphxmeot_i/data/Auto MPG.zip
cylinders is Integer and will be treated as a continuous feature.
model_year is Integer and will be treated as a continuous feature.
origin is Integer and will be treated as a continuous feature.
Unknown values are detected in ['horsepower']. They will be treated as np.nan.
The project will be saved to /tmp/tmphxmeot_i/output/auto-mpg/2023-09-23-20-37-29-0_UserInputConfig
Dataset size: 238 80 80
Data saved to /tmp/tmphxmeot_i/output/auto-mpg/2023-09-23-20-37-29-0_UserInputConfig (data.csv and tabular_data.csv).


The initial hyperparameters of a model can be inspected with the following line.

In [2]:
models[0]._get_params("Category Embedding")

{'dropout': 0.0,
 'embedding_dropout': 0.1,
 'lr': 0.001,
 'weight_decay': 1e-09,
 'batch_size': 1024}

Let us first check the performance of the model with the initial hyperparameters.

In [3]:
trainer.train(stderr_to_stdout=True)
trainer.get_leaderboard()


-------------Run PytorchTabular-------------

Training Category Embedding
Global seed set to 42
2023-09-23 20:37:30,020 - {pytorch_tabular.tabular_model:473} - INFO - Preparing the DataLoaders
2023-09-23 20:37:30,021 - {pytorch_tabular.tabular_datamodule:290} - INFO - Setting up the datamodule for regression task
2023-09-23 20:37:30,029 - {pytorch_tabular.tabular_model:521} - INFO - Preparing the Model: CategoryEmbeddingModel
2023-09-23 20:37:30,041 - {pytorch_tabular.tabular_model:268} - INFO - Preparing the Trainer
  rank_zero_deprecation(
Auto select gpus: [0]
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
2023-09-23 20:37:31,045 - {pytorch_tabular.tabular_model:582} - INFO - Training Started
You are using a CUDA device ('NVIDIA GeForce RTX 3090') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will tr

| | Program | Model | Training RMSE | Training MSE | Training MAE | Training MAPE | Training R2 | Training MEDIAN_ABSOLUTE_ERROR | Training EXPLAINED_VARIANCE_SCORE | Testing RMSE | ... | Testing R2 | Testing MEDIAN_ABSOLUTE_ERROR | Testing EXPLAINED_VARIANCE_SCORE | Validation RMSE | Validation MSE | Validation MAE | Validation MAPE | Validation R2 | Validation MEDIAN_ABSOLUTE_ERROR | Validation EXPLAINED_VARIANCE_SCORE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | PytorchTabular | Category Embedding | 3.354362 | 11.251746 | 2.445915 | 0.101659 | 0.825442 | 1.775388 | 0.854523 | 2.799644 | ... | 0.854221 | 1.963455 | 0.888258 | 3.51671 | 12.36725 | 2.731159 | 0.125136 | 0.779071 | 2.375105 | 0.808039 |


To activate Bayesian hyperparameter optimization, set `bayes_opt` to `True` in the configuration file or later in `trainer.args`. The optimization minimizes the MSE loss (default for regression tasks) or the log loss (default for classification tasks) on the validation set, so the performance on the testing set might improve as well.

**Remark**: The improvement is not guaranteed, neither on the validation set nor on the testing set, for two reasons:

1. The number of epochs used in an optimization iteration (the configuration `bayes_epoch`) is smaller than that used in the final formal training, so the best hyperparameters under the short budget are not necessarily the best after full training.
2. The validation set and the testing set might come from different distributions (the so-called observational bias), so a lower validation loss does not necessarily carry over to the testing set.

The hyperparameters found by Bayesian hyperparameter optimization are saved and will be loaded and used automatically in later runs.
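
The first point of the remark above can be illustrated with a toy example. The sketch below does not use `tabensemb` at all: the two candidate settings, their names, and the exponential learning curves are made up for illustration. It only shows that the candidate with the lowest loss after a short budget (comparable to `bayes_epoch`) need not be the one with the lowest loss after a long training run.

```python
import math


def validation_loss(epochs, floor, rate, start=600.0):
    """Toy exponential learning curve that decays from `start` towards `floor`."""
    return floor + (start - floor) * math.exp(-rate * epochs)


# Two hypothetical hyperparameter settings (all numbers are invented).
candidates = {
    "fast_but_shallow": {"floor": 15.0, "rate": 0.30},  # converges quickly, plateaus high
    "slow_but_deep": {"floor": 8.0, "rate": 0.03},      # converges slowly, plateaus low
}


def best_candidate(epochs):
    """Return the name of the candidate with the lowest toy loss after `epochs`."""
    return min(
        candidates,
        key=lambda name: validation_loss(epochs, **candidates[name]),
    )


short_budget_winner = best_candidate(20)   # short budget, as in an optimization iteration
full_budget_winner = best_candidate(300)   # long budget, as in a full training run
print(short_budget_winner, full_budget_winner)
```

With these invented curves the short budget favors the fast-converging setting while the long budget favors the other one, which is why a larger `bayes_epoch` may give a more faithful ranking at the cost of a slower search.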

In [4]:
import warnings
trainer.args["bayes_opt"] = True
trainer.args["bayes_epoch"] = 20
trainer.args["bayes_calls"] = 30
with warnings.catch_warnings():
    warnings.filterwarnings("ignore", module="pytorch_lightning")
    trainer.train(stderr_to_stdout=True)
trainer.get_leaderboard()


-------------Run PytorchTabular-------------

Training Category Embedding
Bayes-opt 1/30, tot 0.62s, avg 0.62s/it: {'ls': 349.8274, 'param': [0.0, 0.1, 0.001, 0.0, 1024], 'min ls': 349.8274, 'min param': [0.0, 0.1, 0.001, 0.0, 1024], 'min at': 1}
Bayes-opt 2/30, tot 1.16s, avg 0.58s/it: {'ls': 38.2165, 'param': [0.29642, 0.42213, 0.02068, 0.00333, 512], 'min ls': 38.2165, 'min param': [0.29642, 0.42213, 0.02068, 0.00333, 512], 'min at': 2}
Bayes-opt 3/30, tot 1.81s, avg 0.60s/it: {'ls': 629.1395, 'param': [0.19219, 0.14877, 0.00014, 0.0, 256], 'min ls': 38.2165, 'min param': [0.29642, 0.42213, 0.02068, 0.00333, 512], 'min at': 2}
Bayes-opt 4/30, tot 2.26s, avg 0.56s/it: {'ls': 324.7059, 'param': [0.40608, 0.23999, 0.00115, 0.00273, 256], 'min ls': 38.2165, 'min param': [0.29642, 0.42213, 0.02068, 0.00333, 512], 'min at': 2}
Bayes-opt 5/30, tot 2.60s, avg 0.52s/it: {'ls': 30.744, 'param': [0.32409, 0.18412, 0.03831, 0.0, 1024], 'min ls': 30.744, 'min param': [0.32409, 0.18412, 0.03831,
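
Each log line above reports the loss (`ls`) of one optimization iteration together with the sampled values (`param`). A minimal sketch for reading these lines follows. It assumes that the `param` list uses the same order as the dictionary returned by `_get_params("Category Embedding")`; this is inferred from iteration 1, whose values match the initial hyperparameters (with `weight_decay` rounded to `0.0`), and is not confirmed by the library documentation. The helper `name_params` is hypothetical.

```python
# Names in the order of the initial hyperparameter dictionary shown earlier.
# ASSUMPTION: the logged `param` lists follow this same order.
param_names = ["dropout", "embedding_dropout", "lr", "weight_decay", "batch_size"]

# `ls` and `param` values copied from the five log lines shown above.
logged = {
    1: {"ls": 349.8274, "param": [0.0, 0.1, 0.001, 0.0, 1024]},
    2: {"ls": 38.2165, "param": [0.29642, 0.42213, 0.02068, 0.00333, 512]},
    3: {"ls": 629.1395, "param": [0.19219, 0.14877, 0.00014, 0.0, 256]},
    4: {"ls": 324.7059, "param": [0.40608, 0.23999, 0.00115, 0.00273, 256]},
    5: {"ls": 30.744, "param": [0.32409, 0.18412, 0.03831, 0.0, 1024]},
}


def name_params(values):
    """Pair each logged value with its (assumed) hyperparameter name."""
    return dict(zip(param_names, values))


# Iteration with the lowest loss among the lines shown (the log is truncated).
best_iter = min(logged, key=lambda i: logged[i]["ls"])
best_params = name_params(logged[best_iter]["param"])
print(best_iter, best_params)
```

Only the first five of the 30 iterations are visible here, so the result is the best of the shown iterations, not necessarily the final optimum.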

| | Program | Model | Training RMSE | Training MSE | Training MAE | Training MAPE | Training R2 | Training MEDIAN_ABSOLUTE_ERROR | Training EXPLAINED_VARIANCE_SCORE | Testing RMSE | ... | Testing R2 | Testing MEDIAN_ABSOLUTE_ERROR | Testing EXPLAINED_VARIANCE_SCORE | Validation RMSE | Validation MSE | Validation MAE | Validation MAPE | Validation R2 | Validation MEDIAN_ABSOLUTE_ERROR | Validation EXPLAINED_VARIANCE_SCORE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | PytorchTabular | Category Embedding | 2.304163 | 5.309167 | 1.631622 | 0.065262 | 0.917634 | 1.16303 | 0.919889 | 2.068042 | ... | 0.920456 | 1.294664 | 0.923491 | 2.860293 | 8.181274 | 2.009064 | 0.087603 | 0.85385 | 1.49169 | 0.854234 |
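
To compare the two leaderboards at a glance, the relative reduction of each error metric can be computed by hand. The sketch below copies a few values from the two leaderboards in this notebook; the helper `relative_reduction` is hypothetical and not part of `tabensemb`.

```python
# Metrics of the Category Embedding model, copied from the two leaderboards.
before = {  # initial hyperparameters
    "Training RMSE": 3.354362,
    "Testing RMSE": 2.799644,
    "Validation RMSE": 3.51671,
    "Validation MSE": 12.36725,
}
after = {  # after Bayesian hyperparameter optimization
    "Training RMSE": 2.304163,
    "Testing RMSE": 2.068042,
    "Validation RMSE": 2.860293,
    "Validation MSE": 8.181274,
}


def relative_reduction(old, new):
    """Fraction by which an error metric decreased (positive means improvement)."""
    return (old - new) / old


reductions = {name: relative_reduction(before[name], after[name]) for name in before}
for name, value in reductions.items():
    print(f"{name}: {value:.1%}")
```

In this run the testing RMSE drops by roughly 26% and the validation MSE by roughly 34%. These numbers come from a single run on a small dataset, so they should be read as an illustration, not as an expected gain.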
