
Figure out how to handle masking for RoBERTa #3359

Closed
matt-gardner opened this issue Oct 15, 2019 · 3 comments

Comments

@matt-gardner (Member) commented Oct 15, 2019

For some reason, the RoBERTa folks chose to use 0 for the [CLS] token instead of for padding, which breaks a bunch of assumptions that we make in our code. We need to figure out a way to get a mask for RoBERTa that doesn't require the model knowing whether it's using BERT or RoBERTa. I'm not really sure how to do this, other than to add a key to the token indexer output dictionary, or something (which I believe is also what some of the more complex indexers do already).
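
For illustration, a minimal sketch of the problem (the special token ids are the real ones for `bert-base-*` and `roberta-*`; the word-piece ids are made up):

    import torch

    # BERT: [CLS]=101, [SEP]=102, [PAD]=0, so (ids != 0) is a valid padding mask.
    bert_ids = torch.tensor([[101, 2023, 2003, 102, 0, 0]])
    print((bert_ids != 0).long())      # tensor([[1, 1, 1, 1, 0, 0]])

    # RoBERTa: <s>=0, </s>=2, <pad>=1, so the same rule masks <s> and keeps the padding.
    roberta_ids = torch.tensor([[0, 9226, 16, 2, 1, 1]])
    print((roberta_ids != 0).long())   # tensor([[0, 1, 1, 1, 1, 1]])  <- wrong mask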

@matt-gardner matt-gardner added this to the 1.0.0 milestone Oct 15, 2019
@ChunchuanLv commented Oct 21, 2019

We can decide all the special token ids in the constructor, then use them in the forward pass, instead of hard-coding 0 or 1 directly.

For example, in the token embedder's `__init__`, we have

        if "roberta" in model_name:
            self.transformer_model = RobertaModel.from_pretrained(model_name, output_hidden_states=True)
            self.pad_id = 1  #or call some function
        else:
            self.transformer_model = BertModel.from_pretrained(model_name, output_hidden_states=True)
            self.pad_id = 0

Then, in forward

        input_mask = (input_ids != self.pad_id).long()  # 1 for real tokens, 0 for padding

Similarly for the token indexer. Maybe I missed something?
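
The "look it up" part could, for instance, come from the tokenizer rather than a hard-coded constant (a sketch assuming the HuggingFace transformers tokenizer API):

    from transformers import AutoTokenizer

    # In the constructor: read the pad id from the pretrained tokenizer's config.
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    self.pad_id = tokenizer.pad_token_id   # 0 for bert-base-*, 1 for roberta-*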

@matt-gardner (Member, Author) commented Oct 27, 2019

This is the method that we use to get the mask: https://github.com/allenai/allennlp/blob/master/allennlp/nn/util.py#L555-L602. The issue isn't just when computing vectors, it's also when normalizing attentions, computing a loss over non-padded word indices, etc. We need the mask outside of the TokenEmbedder.
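
For concreteness, a minimal sketch (not AllenNLP's implementation) of one place a correct mask is needed outside the TokenEmbedder:

    import torch

    # Illustrative only: given a mask (1 = real token, 0 = padding), the model uses it
    # e.g. to normalize attention over real tokens rather than over padded positions.
    scores = torch.randn(2, 6)                        # attention scores: batch of 2, length 6
    mask = torch.tensor([[1, 1, 1, 1, 0, 0],
                         [1, 1, 1, 0, 0, 0]])
    masked_scores = scores.masked_fill(mask == 0, float("-inf"))
    attention = torch.softmax(masked_scores, dim=-1)  # padded positions get zero weight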

@matt-gardner (Member, Author) commented Jan 3, 2020

I believe this is fixed by #3560. If someone thinks it's not, please comment here or open a new issue.
