
ELECTRA: use gelu for pooled output of ELECTRA model #364

Merged

Conversation

stefan-it
Contributor

Hi,

this PR fixes the activation function for the pooled output of the ELECTRA model to match the original implementation.

gelu is now used, whereas e.g. BERT uses tanh as its activation function. See discussion in #362.
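
To illustrate the change, here is a minimal PyTorch sketch of a BERT-style pooler with a configurable activation. The class name `Pooler` and its signature are hypothetical, for illustration only, and not the actual FARM code:

```python
import torch
import torch.nn as nn

class Pooler(nn.Module):
    # Hypothetical BERT-style pooler: a dense layer plus an activation
    # applied to the hidden state of the first ([CLS]) token.
    def __init__(self, hidden_size: int, activation: str = "gelu"):
        super().__init__()
        self.dense = nn.Linear(hidden_size, hidden_size)
        # ELECTRA's original implementation applies gelu here,
        # whereas BERT applies tanh.
        self.activation = nn.GELU() if activation == "gelu" else nn.Tanh()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        first_token = hidden_states[:, 0]  # pool the [CLS] token
        return self.activation(self.dense(first_token))
```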

@stefan-it
Contributor Author

@brandenchan Would be interesting to see the performance diff :)

@brandenchan brandenchan self-requested a review May 14, 2020 15:36
@brandenchan
Contributor

Great! I tested this on an ELECTRA checkpoint and the performance is still good (in fact, gelu scored 0.2% higher than tanh, averaged over 3 GermEval tasks). Our CI is currently crashing, but when I tested the branch locally, all tests passed!

@brandenchan brandenchan merged commit b8c5299 into deepset-ai:master May 14, 2020
@stefan-it stefan-it deleted the electra-pooled-output-activation-fix branch May 14, 2020 18:36