TabNet overfits (help wanted, not a bug) #522

Closed
micheusch opened this issue Oct 24, 2023 · 9 comments
Labels: help wanted (Extra attention is needed)

@micheusch

Model overfits severely; feature importance is limited to fewer than 10 features.

What is the current behavior?
I'm solving a binary classification problem on a dozen rolling-window monthly snapshots. My dataset has 70k rows and 100+ features.
With Random Forest or Gradient Boosting, feature importance is spread over a large number of features and remains consistent over time, with the boxplot of feature importances showing a limited range of variability.
With TabNet, each month's model puts non-zero feature importance on a small number of features (often fewer than 10), and these features vary wildly from month to month, which I assume comes down to overfitting.

I tried a few options to reduce possible causes of over-fitting:

  • set n_d=n_a=8
  • also tried n_steps=2
  • augmentations using ClassificationSMOTE

but none of these seemed to help (my setup is roughly as sketched below).
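
For reference, a minimal sketch of this kind of setup, assuming pytorch-tabnet's `TabNetClassifier`; the data arrays and the SMOTE probability are placeholders, not my exact script:

```python
# Minimal sketch of the setup described above (not the exact training code);
# X_train, y_train, X_valid, y_valid are assumed to be numpy arrays.
from pytorch_tabnet.tab_model import TabNetClassifier
from pytorch_tabnet.augmentations import ClassificationSMOTE

clf = TabNetClassifier(
    n_d=8,       # width of the decision prediction layer
    n_a=8,       # width of the attention embedding
    n_steps=2,   # number of sequential attention steps
)

clf.fit(
    X_train, y_train,
    eval_set=[(X_valid, y_valid)],
    eval_metric=["auc"],
    augmentations=ClassificationSMOTE(p=0.2),  # SMOTE-style batch augmentation
)
```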

Expected behavior
Month-on-month, the features used should be more consistent, and a larger number of features should be considered.
Is there anything obvious I may have done wrong to explain this behaviour?
Many thanks!

@micheusch micheusch added the bug Something isn't working label Oct 24, 2023
@Optimox Optimox added help wanted Extra attention is needed and removed bug Something isn't working labels Oct 26, 2023
@Optimox
Collaborator

Optimox commented Oct 26, 2023

There is not enough information to conclude that you are overfitting: what are your train vs valid scores? The fact that the important features change from month to month simply shows that you have data shifts over time; it does not mean it is unreasonable to rely on different features.

You can also set lambda_sparse to 0 to limit sparsity.

You can also limit the number of epochs to avoid overfitting.
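
For example, something along these lines (a rough sketch; the epoch and patience values are only illustrative):

```python
# Sketch of the two suggestions above; all values are only illustrative.
from pytorch_tabnet.tab_model import TabNetClassifier

clf = TabNetClassifier(
    n_d=8, n_a=8, n_steps=2,
    lambda_sparse=0.0,   # disable the sparsity regularization term
)

clf.fit(
    X_train, y_train,
    eval_set=[(X_valid, y_valid)],
    max_epochs=50,       # cap training length
    patience=20,         # or rely on early stopping on the validation set
)
```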

@micheusch
Author

Thanks, I will investigate further and let you know!

@micheusch
Author

Fair enough; after plotting the loss curves, it doesn't look like over-fitting per se.
See below for n_d=n_a=8, n_steps=2, lambda_sparse=1e-6.
[image: training vs validation loss curves]

The model is retrained monthly, and while there may be data shifts over time, they shouldn't be too dramatic, as only 2% of the time period changes from one month to the next.

There could be variation from the random customer base used for the validation set, but again this is only 10% of the data. It is not clear why this would be so much more erratic for TabNet than for XGBoost.
The boxplot below shows the feature importance variation across 12 consecutive models.
[image: boxplot of TabNet feature importances across 12 consecutive monthly models]

Looking at a few of the most unstable features, the feature importance changes look a bit erratic.
[image: feature importance over time for a few of the most unstable features]

Compare this to the same boxplot for XGBoost, where features are picked up much more consistently.
[image: boxplot of XGBoost feature importances across the same 12 models]

Any thoughts on what could be causing these variations?
Many thanks!

@Optimox
Collaborator

Optimox commented Nov 3, 2023

Are you computing feature importance on the training set or the monthly validation set?

Do you get better predictive scores with XGBoost or with TabNet?

Are you sure that all your tabnet models converge correctly before the end of training?

@micheusch
Author

micheusch commented Nov 3, 2023

Hi again @Optimox,

Feature importances are computed on the training set, as per here.
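
For reference, roughly what that amounts to (a sketch using pytorch-tabnet's standard attributes, not my exact code):

```python
# Sketch: global importances as exposed by pytorch-tabnet after fit();
# they are aggregated from the attention masks over the training data.
importances = clf.feature_importances_        # shape: (n_features,)

# Per-sample explanations can also be computed on any dataset:
explain_matrix, masks = clf.explain(X_train)  # local importances + per-step masks
```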

I have slightly better performance with XGBoost on most months.

I had max_epochs=100, patience=60.
Now that I've increased it to max_epochs=200, I get a mix of runs ending between 120 and 200 epochs, with a few going all the way to 200, although the loss trajectory looks very flat.

Many thanks again!

@Optimox
Collaborator

Optimox commented Nov 3, 2023

Are you using a learning rate scheduler?

IMO, the best way to train neural networks is to tweak the learning rate and number of epochs so that you don't need early stopping anymore. With a good decay schedule and number of epochs, your model should reach its best validation score at the last epoch (or very close to it). I do not know the size of your dataset here, but early stopping could be one explanation for the large differences you see on a monthly basis.
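
Something along these lines, for example (a sketch only; the learning rate, step size and gamma are placeholders to tune for your data):

```python
# Sketch: decay the learning rate so the model converges by the last epoch
# instead of relying on early stopping; all values here are placeholders.
import torch
from pytorch_tabnet.tab_model import TabNetClassifier

clf = TabNetClassifier(
    optimizer_fn=torch.optim.Adam,
    optimizer_params=dict(lr=2e-2),
    scheduler_fn=torch.optim.lr_scheduler.StepLR,
    scheduler_params=dict(step_size=20, gamma=0.5),  # halve the LR every 20 epochs
)

clf.fit(
    X_train, y_train,
    eval_set=[(X_valid, y_valid)],
    max_epochs=200,
    patience=0,   # patience <= 0: no early stopping, last training weights are kept
)
```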

@micheusch
Author

micheusch commented Nov 3, 2023 via email

@micheusch
Author

Hi again. I'm also getting this behaviour with a StepLR scheduler, training to 200 epochs, without early stopping.

@Optimox
Collaborator

Optimox commented Nov 10, 2023

Then I don't know. Are you sure you are always feeding your features in the same order and correctly matching each feature to its importance?
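
For example, a quick sanity check along these lines (just a sketch; `feature_cols`, `df_train` and `df_valid` are hypothetical names for however the columns are stored):

```python
# Sketch of a sanity check: keep an explicit, fixed column order and pair
# importances with names taken from that same order.
X_train = df_train[feature_cols].to_numpy()   # same explicit order every month
X_valid = df_valid[feature_cols].to_numpy()

clf.fit(X_train, y_train, eval_set=[(X_valid, y_valid)])

importance_by_name = dict(zip(feature_cols, clf.feature_importances_))
top10 = sorted(importance_by_name.items(), key=lambda kv: -kv[1])[:10]
for name, imp in top10:
    print(f"{name}: {imp:.4f}")
```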

@Optimox Optimox closed this as completed Dec 26, 2023