PyTorch models #2

Closed
gerardb7 opened this issue Oct 8, 2020 · 3 comments

gerardb7 commented Oct 8, 2020

Hi,
It seems that support for PyTorch models is currently limited to bert and xlm. Would it be possible to add support for lighter models, e.g. DistilBERT or ALBERT?
Do you think that using these models would hurt the performance of the scorers significantly?

Thanks!

JulianSlzr (Contributor)

Thanks for the suggestion! I updated to Transformers 3.3.1 and added DistilBERT & ALBERT. The main thing is defining a *BERTForMaskedLMOptimized class for speed. You can follow my example to add support for other MLMs. Pull requests welcome 🙂.
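
For intuition, here is a minimal sketch of the pseudo-log-likelihood (PLL) scoring these classes accelerate, written against the plain Hugging Face Transformers API. The one-mask-at-a-time loop, the `pll` helper, and the example sentence are illustrative only; the repo's *ForMaskedLMOptimized classes compute the same quantity more efficiently.

```python
# Illustrative PLL scoring with a plain Hugging Face masked LM.
# Assumption: this mirrors what the scorer computes; the repo's optimized
# classes exist to make the same computation faster.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

name = "distilbert-base-cased"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForMaskedLM.from_pretrained(name)
model.eval()

def pll(sentence: str) -> float:
    """Sum log-probabilities of each token with that position masked out."""
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    score = 0.0
    for i in range(1, len(ids) - 1):          # skip [CLS]/[SEP]-style specials
        masked = ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0))[0][0, i]  # logits at position i
        score += torch.log_softmax(logits, dim=-1)[ids[i]].item()
    return score

print(pll("The keys to the cabinet are on the table."))
```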

I also ran the two on BLiMP. Though ALBERT improves over RoBERTa on downstream tasks like GLUE, its BLiMP scoring is on par with BERT's. Likewise, DistilBERT performs similarly to BERT on GLUE, but is much worse on BLiMP (78% vs. 84%).

Possible takeaways:

  • DistilBERT's knowledge-distillation (KD) objective and its initialization from alternate BERT layers have an effect. Performance on quantifiers and island effects degrades significantly (maybe the knowledge was encoded in the layers that were skipped? maybe the probabilities are now too soft?).
  • Having pre-training match evaluation is likely more important. We saw this with LibriSpeech in our paper. Here, ALBERT and BERT are trained on the same corpus, while RoBERTa is trained on a larger corpus that may cover BLiMP better.
# distilbert-base-cased
anaphor_agreement:  0.983
argument_structure:  0.7857777777777778
binding:  0.7335714285714285
control_raising:  0.7788
determiner:  0.970375
ellipsis:  0.915
filler_gap:  0.7464285714285716
irregular_forms:  0.9555
island_effects:  0.54925
npi_licensing:  0.7901428571428571
quantifiers:  0.5895
subject_verb_agreement:  0.8965000000000001
overall:  0.782955223880597

# albert-xxlarge-v2
anaphor_agreement:  0.956
argument_structure:  0.8375555555555555
binding:  0.7912857142857143
control_raising:  0.865
determiner:  0.9395
ellipsis:  0.8735
filler_gap:  0.8188571428571427
irregular_forms:  0.9255
island_effects:  0.74975
npi_licensing:  0.9115714285714285
quantifiers:  0.6739999999999999
subject_verb_agreement:  0.8808333333333334
overall:  0.8435820895522389

gerardb7 (Author)

Grand job, thanks a lot!

Ago3 commented Jan 22, 2021

Hi,

I'm extending the framework to include another PyTorch model. When using MLMScorerPT, we don't need to pass a vocab, do we? I couldn't find any function where it is actually used.

Thank you!

PS: Very cool work :)
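
For reference, a minimal sketch of how MLMScorerPT is typically constructed, following the usage pattern in the repository's examples; treat the exact function names, argument order, and model identifier here as assumptions rather than the definitive API.

```python
# Sketch of the scorer setup the question refers to (names and argument order
# are assumptions based on the repository's example usage and may differ).
import mxnet as mx
from mlm.models import get_pretrained
from mlm.scorers import MLMScorerPT

ctxs = [mx.cpu()]  # or, e.g., [mx.gpu(0)]

# get_pretrained returns (model, vocab, tokenizer); the vocab is then handed
# to the scorer at construction, even for the PyTorch-backed MLMScorerPT.
model, vocab, tokenizer = get_pretrained(ctxs, "distilbert-base-cased")
scorer = MLMScorerPT(model, vocab, tokenizer, ctxs)
print(scorer.score_sentences(["The keys to the cabinet are on the table."]))
```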
