
Mask-Filling with pretrained BORT #9

patrickvonplaten opened this issue Feb 8, 2021 · 3 comments

@patrickvonplaten

Hello, I am trying to get "mask-filling" to work correctly with BORT and don't seem to get good results. In short, I load the pretrained BORT model including the decoder (use_decoder=True) and then pass as an input something along the lines of "The weather is <mask> today." I would expect the pretrained BORT model with the decoder head to be able to predict a sensible word in this case, but I only get weird results (such as predicting the "<unk>" token) :-/

I made a short notebook that shows exactly how I use BORT with gluonnlp and mxnet; I am not able to get any good results for the mask-filling problem.

Here is the notebook: https://colab.research.google.com/drive/17qNu6g1s2KJEwuRl1s5c3ipk2-99dZfm?usp=sharing that:

a) loads a tokenizer + vocab
b) loads the pretrained model + decoder
c) runs a forward pass through the encoder and the decoder lm head
d) shows that the result is not as good as expected
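
For reference, a rough sketch of those four steps against a gluonnlp RoBERTa-style interface is below. It is not the notebook's exact code: the model and dataset names are placeholders (loading BORT itself is done as in the notebook), and the lowercase "<mask>" fallback and the output unpacking are assumptions that may differ across gluonnlp versions.

```python
# Rough sketch of steps a)-d) with a gluonnlp RoBERTa-style model.
# NOTE: the model/dataset names below are placeholders for illustration only;
# loading BORT itself is done as in the notebook above.
import mxnet as mx
import gluonnlp as nlp

# a) + b): tokenizer, vocab, and pretrained model with the decoder (LM) head
model, vocab = nlp.model.get_model(
    'roberta_12_768_12',                                    # placeholder, not BORT itself
    dataset_name='openwebtext_ccnews_stories_books_cased',
    pretrained=True,
    use_decoder=True,                                       # keep the masked-LM head
)
tokenizer = nlp.data.GPT2BPETokenizer()

# c) forward pass over a masked sentence, decoding only at the masked position
mask = getattr(vocab, 'mask_token', '<mask>')               # assumption: lowercase '<mask>'
tokens = ([vocab.bos_token] + tokenizer('The weather is')
          + [mask] + tokenizer(' today.') + [vocab.eos_token])
token_ids = mx.nd.array([vocab[tokens]])
masked_position = mx.nd.array([[tokens.index(mask)]])
outputs = model(token_ids, masked_positions=masked_position)
decoded = outputs[-1] if isinstance(outputs, (list, tuple)) else outputs

# d) inspect the top candidates proposed for the masked slot
top5 = mx.nd.topk(decoded[0, 0], k=5).asnumpy().astype(int)
print([vocab.idx_to_token[i] for i in top5])
```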

@adewynter I would be very grateful if you could take a look at possible errors I've made in the notebook that could explain the strange behavior.

Thank you very much!

cc @stefan-it

@adewynter
Contributor

Hi!

That's definitely a very odd error. Have you tried using the mask token implicitly? E.g., like here. The reason is that this particular version of Gluon is based on a script that did not originally support RoBERTa-style pretraining, so we kind of had to hack our way around that.

I'm not sure whether the actual mask token in Gluon + RoBERTa + this version is defined as "<MASK>", so perhaps you could change it to vocab.mask_token and try again? Indeed, note how the output token is actually lowercase ("<unk>") in the notebook you shared.
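
For concreteness, a hypothetical version of that change, reusing the vocab and tokenizer from the notebook above (the getattr fallback and the lowercase "<mask>" literal are assumptions about this particular vocab):

```python
# Use the vocab's own mask token instead of a hard-coded "<MASK>" string.
mask = getattr(vocab, 'mask_token', '<mask>')   # falls back to lowercase '<mask>' (assumed)
tokens = ([vocab.bos_token] + tokenizer('The weather is')
          + [mask] + tokenizer(' today.') + [vocab.eos_token])
# Sanity check: a wrong mask string silently maps to the unknown token ("<unk>").
assert vocab[mask] != vocab[vocab.unknown_token], 'mask token is not in the vocab'
```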

Sorry about that!

-Adrian

PS: We never did mask prediction because Bort was not designed for that -- indeed, we conjectured in the paper that MLM pre-training is not even needed. So you should get "a" token (you certainly shouldn't get the unknown token!), but I'm not super sure it will be a sensible token, as you say.

@adewynter
Contributor

Hey hey, did it work? I checked on this page and it looks like lowercasing the tokens was the way to go. Should I close this then?

@patrickvonplaten
Author

Sadly, @stefan-it and I didn't get it to work yet :-/. As you can see on this page, the proposed words to fill the mask don't seem to be sensible. I tried to make it work using gluonnlp and mxnet in the notebook posted above, without success. If you find some time to get mask-filling working in the gluonnlp library, that would be super helpful!
