PyTorch models #2

Closed
gerardb7 opened this issue Oct 8, 2020 · 3 comments

gerardb7 commented Oct 8, 2020

Hi,
It seems that support for PyTorch models is currently limited to bert and xlm. Would it be possible to add support for lighter models, e.g. DistilBERT or ALBERT?
Do you think that using these models would hurt the performance of the scorers significantly?

Thanks!

JulianSlzr (Contributor)

Thanks for the suggestion! I updated to Transformers 3.3.1 and added DistilBERT & ALBERT. The main thing is defining a *BERTForMaskedLMOptimized class for speed. You can follow my example to add support for other MLMs. Pull requests welcome 🙂.
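
For intuition, here is a minimal sketch of the pseudo-log-likelihood (PLL) scoring these classes accelerate, written against the plain Hugging Face Transformers API. The one-mask-at-a-time loop, the `pll` helper, and the example sentence are illustrative only; the repo's *ForMaskedLMOptimized classes compute the same quantity more efficiently.

```python
# Illustrative PLL scoring with a plain Hugging Face masked LM.
# Assumption: this mirrors what the scorer computes; the repo's optimized
# classes exist to make the same computation faster.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

name = "distilbert-base-cased"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForMaskedLM.from_pretrained(name)
model.eval()

def pll(sentence: str) -> float:
    """Sum log-probabilities of each token with that position masked out."""
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    score = 0.0
    for i in range(1, len(ids) - 1):          # skip [CLS]/[SEP]-style specials
        masked = ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0))[0][0, i]  # logits at position i
        score += torch.log_softmax(logits, dim=-1)[ids[i]].item()
    return score

print(pll("The keys to the cabinet are on the table."))
```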

I also ran the two on BLiMP. Though ALBERT improves over RoBERTa on downstream tasks like GLUE, its BLiMP scoring is on par with BERT's. Likewise, DistilBERT performs similarly to BERT on GLUE, but is much worse on BLiMP (78% vs. 84%).

Possible takeaways:

  • DistilBERT's knowledge-distillation (KD) objective and its initialization from alternate BERT layers have an effect. Performance on quantifiers and island effects degrades significantly (maybe the knowledge was encoded in the layers that were skipped? maybe the probabilities are now too soft?).
  • Having pre-training match evaluation is likely more important. We saw this with LibriSpeech in our paper. Here, ALBERT and BERT are trained on the same corpus, while RoBERTa is trained on a larger corpus that may cover BLiMP better.
# distilbert-base-cased
anaphor_agreement:  0.983
argument_structure:  0.7857777777777778
binding:  0.7335714285714285
control_raising:  0.7788
determiner:  0.970375
ellipsis:  0.915
filler_gap:  0.7464285714285716
irregular_forms:  0.9555
island_effects:  0.54925
npi_licensing:  0.7901428571428571
quantifiers:  0.5895
subject_verb_agreement:  0.8965000000000001
overall:  0.782955223880597

# albert-xxlarge-v2
anaphor_agreement:  0.956
argument_structure:  0.8375555555555555
binding:  0.7912857142857143
control_raising:  0.865
determiner:  0.9395
ellipsis:  0.8735
filler_gap:  0.8188571428571427
irregular_forms:  0.9255
island_effects:  0.74975
npi_licensing:  0.9115714285714285
quantifiers:  0.6739999999999999
subject_verb_agreement:  0.8808333333333334
overall:  0.8435820895522389

gerardb7 (Author)

Grand job, thanks a lot!

Ago3 commented Jan 22, 2021

Hi,

I'm extending the framework to include another PyTorch model. When using MLMScorerPT, we don't need to pass a vocab, do we? I couldn't find any function where it is actually used.

Thank you!

PS: Very cool work :)
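
For reference, a minimal sketch of how MLMScorerPT is typically constructed, following the usage pattern in the repository's examples; treat the exact function names, argument order, and model identifier here as assumptions rather than the definitive API.

```python
# Sketch of the scorer setup the question refers to (names and argument order
# are assumptions based on the repository's example usage and may differ).
import mxnet as mx
from mlm.models import get_pretrained
from mlm.scorers import MLMScorerPT

ctxs = [mx.cpu()]  # or, e.g., [mx.gpu(0)]

# get_pretrained returns (model, vocab, tokenizer); the vocab is then handed
# to the scorer at construction, even for the PyTorch-backed MLMScorerPT.
model, vocab, tokenizer = get_pretrained(ctxs, "distilbert-base-cased")
scorer = MLMScorerPT(model, vocab, tokenizer, ctxs)
print(scorer.score_sentences(["The keys to the cabinet are on the table."]))
```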
