Fix random behavior when loading ELECTRA models #599

Merged
merged 1 commit into from Oct 21, 2020

@bogdankostic bogdankostic commented Oct 21, 2020

This PR should fix #519. @brandenchan @stefan-it @PhilipMay

Summary of the bug

After an ELECTRA model was trained, saved, and loaded again, the loaded model produced different (and much worse) predictions than the trained model.

Root of the bug

In this transformers PR, summary_use_proj was set to True by default, which means that a linear layer is stacked on top of the pooling. As a result, the weights of this linear layer are randomly initialized each time an ELECTRA model is loaded in FARM.

Fix

Set summary_use_proj to False so that the linear layer is not added.
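
To illustrate the effect described above, here is a minimal, self-contained sketch. It does not use the transformers or FARM code; ToyPooler and load_toy_model are hypothetical stand-ins, and the assumption is only that the pooling head is rebuilt at load time and, when a projection is enabled, gets fresh random weights. Under that assumption, two loads of the "same" model disagree with the projection and agree without it.

```python
import random


class ToyPooler:
    """Toy stand-in for a pooling head (hypothetical, not the transformers class)."""

    def __init__(self, hidden_size, use_proj):
        self.use_proj = use_proj
        # With use_proj=True a projection matrix is created with fresh
        # random weights every time the pooler is constructed.
        self.weights = None
        if use_proj:
            self.weights = [
                [random.gauss(0.0, 0.02) for _ in range(hidden_size)]
                for _ in range(hidden_size)
            ]

    def __call__(self, token_vectors):
        # Pool by taking the vector of the first token.
        pooled = token_vectors[0]
        if not self.use_proj:
            return list(pooled)
        # Apply the (randomly initialized) linear projection.
        return [sum(w * x for w, x in zip(row, pooled)) for row in self.weights]


def load_toy_model(use_proj):
    """Simulates a load in which the pooler is rebuilt rather than restored."""
    return ToyPooler(hidden_size=4, use_proj=use_proj)


# Fixed encoder output, standing in for the restored (deterministic) encoder.
hidden_states = [[0.5, -1.0, 0.25, 2.0], [1.0, 1.0, 1.0, 1.0]]

with_proj_a = load_toy_model(use_proj=True)(hidden_states)
with_proj_b = load_toy_model(use_proj=True)(hidden_states)
no_proj_a = load_toy_model(use_proj=False)(hidden_states)
no_proj_b = load_toy_model(use_proj=False)(hidden_states)

print("outputs differ with projection:", with_proj_a != with_proj_b)
print("outputs match without projection:", no_proj_a == no_proj_b)
```

Without the projection, the pooled output depends only on the restored encoder output, which is why disabling it removes the run-to-run variation in this toy setup.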

@Timoeller Timoeller self-requested a review October 21, 2020 15:12

@tholor tholor left a comment

Nice finding! This was a nasty one :)


@Timoeller Timoeller left a comment

Looking good. Thanks for the detailed description @bogdankostic

What a big headache this behaviour created, and what a small change was needed to fix it.

@Timoeller Timoeller merged commit dac388a into master Oct 21, 2020
Successfully merging this pull request may close these issues.

Getting different predictions on different runs with same ELECTRA model.