@dirkneuhaeuser Thanks for making the world a better place; your classifier is extremely helpful for natural language understanding.
Unfortunately, 91% accuracy is still not great enough for widespread use. I actively follow the evolution of transformers.
Your use of BERT was a great choice at the time, since it is a strong baseline (I'll assume you already use BERT-large).
However, there are now significantly better transformers than BERT, which generally bring a few percentage points of accuracy gain, and that difference can be decisive for real-world use.
As such, I would love it if you could replace your BERT implementation with an XLNet one (the best transformer out there) or with https://github.com/microsoft/MPNet (MPNet is an evolution of XLNet, although it might be significantly slower to train; XLNet, on the other hand, is relatively comparable to BERT in training time).
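A minimal sketch of what the swap could look like, assuming the classifier is built on Hugging Face `transformers` (which BERT fine-tuning setups usually are). The tiny config, vocabulary size, and label count below are illustrative only, so the sketch runs without downloading weights; a real swap would instead load the pretrained `xlnet-large-cased` (or `microsoft/mpnet-base`) checkpoint:

```python
import torch
from transformers import XLNetConfig, XLNetForSequenceClassification

# Illustrative tiny config so the sketch runs offline. A real replacement would be:
#   XLNetForSequenceClassification.from_pretrained("xlnet-large-cased", num_labels=3)
config = XLNetConfig(
    vocab_size=100,  # hypothetical toy vocabulary
    d_model=32,
    n_layer=2,
    n_head=2,
    d_inner=64,
    num_labels=3,    # hypothetical number of classes
)
model = XLNetForSequenceClassification(config)

# Dummy batch: one sequence of 8 token ids
input_ids = torch.randint(0, config.vocab_size, (1, 8))
logits = model(input_ids).logits
print(logits.shape)  # torch.Size([1, 3]) — one score per class
```

Because `XLNetForSequenceClassification` exposes the same `from_pretrained`/`forward` interface as the BERT classes, the rest of a typical fine-tuning loop should carry over with only the model and tokenizer names changed.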
Another, lesser-known and complementary way to reach higher accuracy would be to use a better activation function (Mish), a better optimizer (RAdam), possibly combined with optimizer wrappers such as Lookahead, and methods such as gradient centralization. Each of these generally brings a ~1-2% accuracy gain.
cf. https://github.com/lessw2020/Best-Deep-Learning-Optimizers
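As a rough illustration of the Mish + RAdam part of that suggestion, using only what ships in recent PyTorch (`nn.Mish` and `torch.optim.RAdam`); the classification head, dimensions, and data below are made up, and Lookahead / gradient centralization would need a third-party implementation such as the repo linked above:

```python
import torch
from torch import nn

# Hypothetical classification head on top of pooled transformer features
head = nn.Sequential(
    nn.Linear(64, 32),
    nn.Mish(),         # Mish activation instead of ReLU/GELU
    nn.Linear(32, 3),  # 3 illustrative classes
)

# RAdam (rectified Adam) as the optimizer, as suggested above
optimizer = torch.optim.RAdam(head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One dummy training step on random "pooled features"
features = torch.randn(16, 64)
labels = torch.randint(0, 3, (16,))

optimizer.zero_grad()
loss = loss_fn(head(features), labels)
loss.backward()
optimizer.step()
print(float(loss))  # finite cross-entropy, roughly ln(3) before any training
```

Both pieces are drop-in changes, so they can be tested independently of the BERT-vs-XLNet question.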
As for XLNet, it can in many cases bring a +5% accuracy gain over BERT-large.