Fix random behavior when loading ELECTRA models #599

bogdankostic · 2020-10-21T14:45:18Z

This PR should fix #519. @brandenchan @stefan-it @PhilipMay

Summary of the bug

After training an ELECTRA model, saving it, and loading it again later, the loaded model produced different (much worse) predictions than the trained model.

Root of the bug

In this transformers PR, summary_use_proj was set to True by default, which means that a linear layer is stacked on top of the pooling step. As a result, each time an ELECTRA model is loaded in FARM, the weights of this linear layer are randomly initialized, so the loaded model's predictions differ from those of the trained model.

Fix

Set summary_use_proj to False so that the linear layer is not added.
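
To make the mechanism concrete, here is a minimal toy sketch in plain Python. It is not FARM's or transformers' code: `ToyPooler`, `load_model`, and the `use_proj` flag are hypothetical stand-ins for the pooling head and the `summary_use_proj` setting, and the sketch assumes that the projection is rebuilt with fresh random weights on every load.

```python
import random


class ToyPooler:
    """Hypothetical stand-in for a pooling head (not the transformers class)."""

    def __init__(self, hidden_size, use_proj):
        # With use_proj=True a projection matrix is created with fresh random
        # weights every time the pooler is constructed, i.e. on every load.
        self.proj = None
        if use_proj:
            self.proj = [
                [random.gauss(0.0, 0.02) for _ in range(hidden_size)]
                for _ in range(hidden_size)
            ]

    def __call__(self, hidden_states):
        # "First token" pooling: take the vector of the first token.
        pooled = hidden_states[0]
        if self.proj is None:
            return list(pooled)
        # Apply the (randomly initialized) linear projection.
        return [sum(w * x for w, x in zip(row, pooled)) for row in self.proj]


def load_model(use_proj):
    """Simulates a load in which the pooler is rebuilt from the config (assumption)."""
    return ToyPooler(hidden_size=4, use_proj=use_proj)


# Pretend these are the encoder outputs for one input; they are identical
# across loads because the encoder weights are restored from disk.
hidden_states = [[0.5, -1.0, 2.0, 0.25], [1.0, 1.0, 1.0, 1.0]]

# Two loads with the projection enabled: outputs are not reproducible.
with_proj_first = load_model(use_proj=True)(hidden_states)
with_proj_second = load_model(use_proj=True)(hidden_states)

# Two loads with the projection disabled: outputs are identical.
without_proj_first = load_model(use_proj=False)(hidden_states)
without_proj_second = load_model(use_proj=False)(hidden_states)

print("with projection, outputs match:", with_proj_first == with_proj_second)
print("without projection, outputs match:", without_proj_first == without_proj_second)
```

With the projection disabled, the pooled output depends only on the restored encoder output, which is the behaviour the fix in this PR aims for.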

tholor

Nice finding! This was a nasty one :)

Timoeller

Looking good. Thanks for the detailed description @bogdankostic

What a big headache this behaviour created, and what a small change was needed to fix it.

Set summary_use_proj to False

20ea44e

Timoeller self-requested a review October 21, 2020 15:12

tholor approved these changes Oct 21, 2020

Timoeller approved these changes Oct 21, 2020

Timoeller merged commit dac388a into master Oct 21, 2020

brandenchan mentioned this pull request Nov 3, 2020

Release German Electra #567

Closed
