
Mask-Filling with pretrained BORT #9

patrickvonplaten opened this issue Feb 8, 2021 · 3 comments

@patrickvonplaten

Hello, I am trying to get "mask-filling" to work correctly with BORT and don't seem to get good results. In short, I load the pretrained BORT model including the decoder (use_decoder=True) and then pass as an input something along the lines of "The weather is <mask> today." I would expect the pretrained BORT model with the decoder head to be able to predict a sensible word in this case, but I only get weird results (such as predicting the "<unk>" token) :-/

I made a short notebook that shows exactly how I use BORT with gluonnlp and mxnet; I am not able to get any good results for the mask-filling problem.

Here is the notebook: https://colab.research.google.com/drive/17qNu6g1s2KJEwuRl1s5c3ipk2-99dZfm?usp=sharing that:

a) loads a tokenizer + vocab
b) loads the pretrained model + decoder
c) runs a forward pass through the encoder and the decoder lm head
d) shows that the result is not as good as expected
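
For reference, a rough sketch of those four steps against a gluonnlp RoBERTa-style interface is below. It is not the notebook's exact code: the model and dataset names are placeholders (loading BORT itself is done as in the notebook), and the lowercase "<mask>" fallback and the output unpacking are assumptions that may differ across gluonnlp versions.

```python
# Rough sketch of steps a)-d) with a gluonnlp RoBERTa-style model.
# NOTE: the model/dataset names below are placeholders for illustration only;
# loading BORT itself is done as in the notebook above.
import mxnet as mx
import gluonnlp as nlp

# a) + b): tokenizer, vocab, and pretrained model with the decoder (LM) head
model, vocab = nlp.model.get_model(
    'roberta_12_768_12',                                    # placeholder, not BORT itself
    dataset_name='openwebtext_ccnews_stories_books_cased',
    pretrained=True,
    use_decoder=True,                                       # keep the masked-LM head
)
tokenizer = nlp.data.GPT2BPETokenizer()

# c) forward pass over a masked sentence, decoding only at the masked position
mask = getattr(vocab, 'mask_token', '<mask>')               # assumption: lowercase '<mask>'
tokens = ([vocab.bos_token] + tokenizer('The weather is')
          + [mask] + tokenizer(' today.') + [vocab.eos_token])
token_ids = mx.nd.array([vocab[tokens]])
masked_position = mx.nd.array([[tokens.index(mask)]])
outputs = model(token_ids, masked_positions=masked_position)
decoded = outputs[-1] if isinstance(outputs, (list, tuple)) else outputs

# d) inspect the top candidates proposed for the masked slot
top5 = mx.nd.topk(decoded[0, 0], k=5).asnumpy().astype(int)
print([vocab.idx_to_token[i] for i in top5])
```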

@adewynter I would be very grateful if you could take a look at possible errors I've made in the notebook that could explain the strange behavior.

Thank you very much!

cc @stefan-it

@adewynter
Contributor

Hi!

That's definitely a very odd error. Have you tried using the mask token implicitly? E.g., like here. The reason is that this particular version of Gluon is based on a script that did not originally support RoBERTa-style pretraining, so we kind of had to hack our way around that.

I'm not sure whether the actual mask token in Gluon + RoBERTa + this version is defined as "<MASK>", so perhaps you could change it to vocab.mask_token and try again? Indeed, note how the output token is actually lowercase ("<unk>") in the notebook you shared.
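
For concreteness, a hypothetical version of that change, reusing the vocab and tokenizer from the notebook above (the getattr fallback and the lowercase "<mask>" literal are assumptions about this particular vocab):

```python
# Use the vocab's own mask token instead of a hard-coded "<MASK>" string.
mask = getattr(vocab, 'mask_token', '<mask>')   # falls back to lowercase '<mask>' (assumed)
tokens = ([vocab.bos_token] + tokenizer('The weather is')
          + [mask] + tokenizer(' today.') + [vocab.eos_token])
# Sanity check: a wrong mask string silently maps to the unknown token ("<unk>").
assert vocab[mask] != vocab[vocab.unknown_token], 'mask token is not in the vocab'
```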

Sorry about that!

-Adrian

PS: We never did mask prediction because Bort was not designed for that -- indeed, we conjectured in the paper that MLM pre-training is not even needed. So you should get "a" token (you certainly shouldn't get the unknown token!), but I'm not super sure it will be a sensible token, as you say.

@adewynter
Contributor

Hey hey, did it work? I checked on this page and it looks like lowercasing the tokens was the way to go. Should I close this then?

@patrickvonplaten
Author

Sadly, @stefan-it and I didn't get it to work yet :-/. As you can see on this page, the proposed words to fill the mask don't seem to be sensible. I tried to make it work using gluonnlp and mxnet in the notebook posted above, without success. If you find some time to get mask-filling working in the gluonnlp library, that would be super helpful!
