
ELECTRA: use gelu for pooled output of ELECTRA model #364

Merged

Conversation

stefan-it
Contributor

Hi,

this PR fixes the activation function for the pooled output of the ELECTRA model to match the original implementation.

gelu is now used, whereas e.g. BERT uses tanh as its activation function. See discussion in #362.
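
To illustrate the change, here is a minimal PyTorch sketch of a BERT-style pooler with a configurable activation. The class name `Pooler` and its signature are hypothetical, for illustration only, and not the actual FARM code:

```python
import torch
import torch.nn as nn

class Pooler(nn.Module):
    # Hypothetical BERT-style pooler: a dense layer plus an activation
    # applied to the hidden state of the first ([CLS]) token.
    def __init__(self, hidden_size: int, activation: str = "gelu"):
        super().__init__()
        self.dense = nn.Linear(hidden_size, hidden_size)
        # ELECTRA's original implementation applies gelu here,
        # whereas BERT applies tanh.
        self.activation = nn.GELU() if activation == "gelu" else nn.Tanh()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        first_token = hidden_states[:, 0]  # pool the [CLS] token
        return self.activation(self.dense(first_token))
```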

@stefan-it
Contributor Author

@brandenchan Would be interesting to see the performance diff :)

@brandenchan brandenchan self-requested a review May 14, 2020 15:36
@brandenchan
Contributor

Great! I tested this on an ELECTRA checkpoint and the performance is still good (in fact, gelu scored 0.2% higher than tanh, averaged over 3 GermEval tasks). Our CI is currently crashing, but when I tested the branch locally, all tests passed!

@brandenchan brandenchan merged commit b8c5299 into deepset-ai:master May 14, 2020
@stefan-it stefan-it deleted the electra-pooled-output-activation-fix branch May 14, 2020 18:36