
Could you improve the state of the art once again? #1

Open
LifeIsStrange opened this issue Apr 7, 2022 · 1 comment
Comments


LifeIsStrange commented Apr 7, 2022

@dirkneuhaeuser Thanks for making the world a better place — your classifier is extremely helpful for natural language understanding.
Unfortunately, 91% accuracy is still not quite enough for widespread use. I actively follow the evolution of transformers.
Your use of BERT was a great choice at the time, since it is a strong baseline (I'll assume you already use BERT-large).
However, there are now significantly better transformers than BERT. They generally bring a few percentage points of accuracy, and that difference can be decisive for real-world use.
As such, I would love it if you could replace your BERT implementation with XLNet (one of the strongest transformers available) or with https://github.com/microsoft/MPNet. MPNet is an evolution of XLNet, although it can be significantly slower to train; XLNet, on the other hand, is roughly comparable to BERT in training time.

Another, lesser-known and complementary way to reach higher accuracy is to use a better activation function (Mish) and a better optimizer (RAdam), optionally wrapped with an auxiliary optimizer such as Lookahead, plus methods like gradient centralization. Each of these generally brings a ~1-2% accuracy gain.
cf https://github.com/lessw2020/Best-Deep-Learning-Optimizers
As for XLNet, it can in many cases bring a +5% accuracy gain over BERT-large.
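For reference, PyTorch now ships both pieces directly (`torch.nn.Mish` and `torch.optim.RAdam`), so adopting them is usually a one-line change each. The pure-Python sketch below just illustrates the Mish formula itself, `Mish(x) = x * tanh(softplus(x))`, with no framework dependency:

```python
import math

def softplus(x: float) -> float:
    # softplus(x) = ln(1 + e^x), written in a numerically stable form
    # so that large positive x does not overflow exp().
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def mish(x: float) -> float:
    # Mish(x) = x * tanh(softplus(x)).
    # For large positive x it approaches the identity; for large
    # negative x it decays smoothly toward 0 (unlike ReLU's hard cut).
    return x * math.tanh(softplus(x))
```

In a real model you would simply swap `nn.ReLU()` for `nn.Mish()` in the classifier head and construct the optimizer with `torch.optim.RAdam(model.parameters(), lr=...)`; Lookahead and gradient centralization are not in core PyTorch, so those would come from a third-party implementation such as the repository linked above.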

LifeIsStrange (Author) commented

Note that the Hugging Face transformers library could be used, and that RoBERTa would be an even lower-hanging fruit.
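With the transformers library, swapping the backbone is often just a matter of changing the checkpoint name, since the `Auto*` classes resolve the architecture from it. A minimal sketch of that idea (the helper function and the choice of checkpoints here are illustrative assumptions, not anything from the repository being discussed):

```python
# Published Hugging Face checkpoints for the backbones mentioned
# in this thread. Changing the key is the whole "swap".
CHECKPOINTS = {
    "bert": "bert-large-uncased",
    "roberta": "roberta-large",
    "xlnet": "xlnet-large-cased",
    "mpnet": "microsoft/mpnet-base",
}

def checkpoint_for(backbone: str) -> str:
    """Map a backbone name to its Hugging Face checkpoint identifier."""
    try:
        return CHECKPOINTS[backbone]
    except KeyError:
        raise ValueError(f"unknown backbone: {backbone!r}")

# In the actual training script this would feed straight into the
# Auto* classes (num_labels=2 is an example value for a binary task):
#
#   from transformers import AutoTokenizer, AutoModelForSequenceClassification
#   name = checkpoint_for("roberta")
#   tokenizer = AutoTokenizer.from_pretrained(name)
#   model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)
```

Because tokenization also comes from the same checkpoint name, the rest of the fine-tuning loop typically needs no changes when moving from BERT to RoBERTa.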
