
Hyperparameter Tuning #382

Closed
balazsgonczy opened this issue Apr 7, 2022 · 14 comments
@balazsgonczy

Hi,

I would like to know whether it's worth fine-tuning the hyperparameters of TabNet for a binary classification task.
And if it is, which approach would you suggest taking?

Best,

Balázs

@balazsgonczy balazsgonczy added the enhancement New feature or request label Apr 7, 2022
@Optimox
Collaborator

Optimox commented Apr 8, 2022

Hello @balazsgonczy,

Sure, it's worth trying to tune the parameters.

There are plenty of examples out there on how to tune TabNet.

Here is just one example with Optuna:
https://www.kaggle.com/code/neilgibbons/tuning-tabnet-with-optuna/notebook
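
For reference, here is a minimal sketch of what such an Optuna search can look like for binary classification (the search ranges, the `X_train`/`y_train`/`X_valid`/`y_valid` arrays, and the trial count are illustrative placeholders, not taken from the notebook above):

```python
# Minimal sketch: tuning TabNet with Optuna for binary classification.
# X_train, y_train, X_valid, y_valid are assumed to be numpy arrays you already have.
import optuna
import torch
from pytorch_tabnet.tab_model import TabNetClassifier
from sklearn.metrics import roc_auc_score

def objective(trial):
    clf = TabNetClassifier(
        n_d=trial.suggest_int("n_d", 8, 64),
        n_steps=trial.suggest_int("n_steps", 3, 7),
        gamma=trial.suggest_float("gamma", 1.0, 2.0),
        lambda_sparse=trial.suggest_float("lambda_sparse", 1e-6, 1e-3, log=True),
        mask_type=trial.suggest_categorical("mask_type", ["sparsemax", "entmax"]),
        optimizer_fn=torch.optim.Adam,
        optimizer_params=dict(lr=2e-2),
        verbose=0,
    )
    clf.fit(
        X_train, y_train,
        eval_set=[(X_valid, y_valid)],
        eval_metric=["auc"],
        max_epochs=100,
        patience=20,
    )
    # Maximize validation AUC.
    return roc_auc_score(y_valid, clf.predict_proba(X_valid)[:, 1])

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
print("Best parameters:", study.best_params)
```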

@balazsgonczy
Author

There are these 2 parameters called "cat_idxs" and "cat_dims". What do they represent? The documentation is not entirely clear to me. I have categorical variables, but I haven't specified these parameters and the model still performs nicely. Is this an issue? (I am doing binary classification.)

@Optimox
Collaborator

Optimox commented Apr 8, 2022

Those parameters are used for categorical embeddings, but you can't tune them (they are fixed parameters that depend on your dataset). If you don't specify them, the model won't create embeddings and will treat categorical variables as numerical.

If you specify them, then `cat_emb_dim` becomes a tunable parameter (1 should be fine if you don't have a huge number of categories).

You can have a look at how those parameters are used in the example notebooks of the repo.
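
For instance, here is a minimal sketch of how these parameters are typically derived from a pandas DataFrame (the column names and values are hypothetical):

```python
# Sketch: deriving cat_idxs / cat_dims from a hypothetical, already label-encoded DataFrame.
import pandas as pd
from pytorch_tabnet.tab_model import TabNetClassifier

df = pd.DataFrame({
    "age":    [25, 32, 47, 51],          # numerical
    "city":   [0, 1, 2, 1],              # categorical, label-encoded
    "income": [30e3, 45e3, 60e3, 52e3],  # numerical
})
categorical_columns = ["city"]
cat_idxs = [df.columns.get_loc(c) for c in categorical_columns]  # column positions -> [1]
cat_dims = [int(df[c].nunique()) for c in categorical_columns]   # unique values    -> [3]

clf = TabNetClassifier(cat_idxs=cat_idxs, cat_dims=cat_dims, cat_emb_dim=1)
```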

@balazsgonczy
Author

So let me ask it like this:

I have a table with categorical features in the 1st and 3rd columns. The 1st feature has 3 levels (0, 1, 2), and the 3rd one has 2 levels (0, 1).

- cat_idxs -> [1, 3]
- cat_dims -> [3, 2]
- cat_emb_dim -> [3, 2] # Here I am just guessing; I need further explanation on how to choose the cat_emb_dim list items.

Am I understanding the parameters correctly?

@Optimox
Collaborator

Optimox commented Apr 9, 2022

Yes, you are right, except that indexes in Python start at 0, so it should be [0, 2], not [1, 3].

For the embedding dimensions, with low modalities like this you can set them all to 1.
Embeddings can follow the rule of thumb n_emb = log(n_mod). You can search the internet for what the best dimensions are, but in my experience, below 10 or 50 modalities you can just set the embeddings to 1 or 2 and you'll get full power.
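
Concretely, for the table described above, that gives (a sketch, assuming both features are already label-encoded as integers starting at 0):

```python
import math
from pytorch_tabnet.tab_model import TabNetClassifier

cat_idxs = [0, 2]  # the 1st and 3rd columns, 0-indexed
cat_dims = [3, 2]  # 3 and 2 unique values respectively
# Rule of thumb n_emb ~ log(n_mod); with this few modalities, 1 per feature is enough.
cat_emb_dim = [max(1, round(math.log(d))) for d in cat_dims]  # -> [1, 1]

clf = TabNetClassifier(cat_idxs=cat_idxs, cat_dims=cat_dims, cat_emb_dim=cat_emb_dim)
```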

@balazsgonczy
Author

Last question: what do you mean by modality here? Do you mean the type of data source in the table, like images, text, etc.? Sorry, as it stands it tells me nothing. :D

@balazsgonczy
Author

I think I'll drop these, because I have already done the encoding and also had to log-scale a few of the variables because they are not normally distributed. But thank you very much! If you have time, please answer my last question, and then you can close this thread!
("What do you mean by modality here?")

@Optimox
Collaborator

Optimox commented Apr 10, 2022

What I mean by modalities is the number of unique values in one categorical column.
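
In pandas terms (reusing the hypothetical `df` from the sketch above):

```python
# "Modalities" = number of unique values in one categorical column.
n_modalities = df["city"].nunique()  # -> 3 for a column with values {0, 1, 2}
```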

@balazsgonczy
Author

Oh, so you meant the size of the set of unique values. Thanks! :)

Sorry, one more question:

I have run my Optuna fine-tuning algorithm and it returned something like this:

```
Best parameters: {'mask_type': 'entmax', 'n_d': 60, 'n_steps': 7, 'gamma': 1.0, 'n_independent': 2, 'n_shared': 5, 'momentum': 0.35000000000000003, 'lambda_sparse': 3.907960748444502e-06, 'optimizer_fn': , 'patienceScheduler': 9, 'patience': 25}
```

My question is: where should I put the patienceScheduler and patience parameters during model initialization? I think the latter goes somewhere here:

```python
TabNetClassifier(...
    scheduler_params=dict(mode="min", patience=25),
...)
```

But where does the patienceScheduler parameter go?

@Optimox
Collaborator

Optimox commented Apr 11, 2022

It does not make sense to me to try to optimize patience.

Patience is there to save you some time: if an experiment does not seem to be getting any better after 5 or 10 epochs (out of a total of 50 or 100), just move on to another experiment and don't waste your time on this one.

Trying to optimize patience completely defeats this purpose, so just have a look at a few training logs and decide whether, after 5 (or 10, or 50, whatever) epochs with no improvement, it's worth training any longer.

Moreover, there is the main patience, which performs early stopping (saving you time), and there is a patience parameter in some learning rate schedulers (which lowers the learning rate when things are not improving). You need the scheduler's patience to be lower than the early stopping patience, otherwise you'll never decay your learning rate at all.
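
In code, that means the two patiences live in different places; a sketch assuming a `ReduceLROnPlateau` scheduler (which is what a `mode="min"`/`patience` pair in `scheduler_params` suggests), reusing the placeholder arrays from the earlier sketches:

```python
import torch
from pytorch_tabnet.tab_model import TabNetClassifier

clf = TabNetClassifier(
    scheduler_fn=torch.optim.lr_scheduler.ReduceLROnPlateau,
    # The scheduler's own patience ("patienceScheduler"): decay the LR
    # after 9 epochs without improvement.
    scheduler_params=dict(mode="min", patience=9, factor=0.5),
)
clf.fit(
    X_train, y_train,
    eval_set=[(X_valid, y_valid)],
    max_epochs=100,
    patience=25,  # early-stopping patience: must be larger than the scheduler's
)
```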

Please also note that you can perfectly well train a TabNet model without either of those two patience parameters: use a OneCycle learning rate scheduler (like in this notebook: https://www.kaggle.com/code/optimo/the-beauty-of-tabnet-a-simple-baseline) and disable early stopping by setting patience to -1. The only important parameters then become the number of training epochs and the learning rate: try to minimize the number of epochs so you can run as many experiments as possible, as quickly as possible.
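
A sketch of that setup, using PyTorch's `OneCycleLR` (the numbers are placeholders, and this assumes a pytorch-tabnet version that supports the `is_batch_level` flag in `scheduler_params`):

```python
import torch
from pytorch_tabnet.tab_model import TabNetClassifier

batch_size, max_epochs = 1024, 50
clf = TabNetClassifier(
    optimizer_fn=torch.optim.Adam,
    optimizer_params=dict(lr=2e-2),
    scheduler_fn=torch.optim.lr_scheduler.OneCycleLR,
    scheduler_params=dict(
        max_lr=5e-2,
        epochs=max_epochs,
        steps_per_epoch=len(X_train) // batch_size + 1,
        is_batch_level=True,  # step the scheduler after every batch
    ),
)
clf.fit(
    X_train, y_train,
    eval_set=[(X_valid, y_valid)],
    max_epochs=max_epochs,
    batch_size=batch_size,
    patience=-1,  # disable early stopping, as suggested above
)
```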

In the end, I think you won't gain much from blind hyperparameter optimization; you need to understand what each parameter does and get a feeling for what's happening before doing any random grid search properly.

For example, without knowing your dataset or your problem, I'm almost certain that (though I might be wrong on your specific case):

  • n_shared is too big in your experiment
  • n_steps is too big
  • you do not need to change the momentum
  • switch to a OneCycle learning rate scheduler and you'll improve your scores

Good luck!

@balazsgonczy
Author

Could you please explain why you think these 4 recommendations should work (with citations if possible)?

"

  • n_shared is too big in your experiment
  • n_steps is too big
  • you do not need to change the momentum
  • switch to a OneCycle learning rate scheduler and you'll improve your scores

"

@Optimox
Collaborator

Optimox commented Apr 14, 2022

Not really; those are just my personal feelings, and I may be wrong. Nothing scientific here.

Please share the results of your experiments if you try those suggestions, so that we can all benefit from your results.

@Optimox Optimox closed this as completed Apr 14, 2022
@balazsgonczy
Author

Hi Optimox,

I would like to create a hyperparameter range for my thesis, and I wonder what the value range of "lambda_sparse" is. For me it starts from 0.01, with a step size of 0.000001. So I suppose the minimum value is around 0, and the max would be 1?

I look forward to your feedback!

Best,

Balázs

@Optimox
Collaborator

Optimox commented Jun 20, 2022

@balazsgonczy lambda_sparse is a multiplicative weight for the sparsity loss.
0 means that you don't add any constraint on sparsity. The maximum acceptable value for lambda_sparse depends on the loss function you are using for your problem. If your RMSE is around 10K and the sparsity loss has scores around 0.1, then you can set a high weight if you want; but if your average loss is around 1e-5, then a weight of 0.1 might already be too high.
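
So for a thesis-style search, a log-scale range (rather than a fixed additive step) is the usual choice; for example, with Optuna, inside an objective like the one sketched earlier (the bounds are illustrative):

```python
# Search lambda_sparse on a log scale: useful values span several orders of magnitude.
lambda_sparse = trial.suggest_float("lambda_sparse", 1e-6, 1e-1, log=True)
```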
