Number of tokens per batch mismatch - longformer vs roberta #248

Open
nbroad1881 opened this issue Dec 13, 2022 · 1 comment

nbroad1881 commented Dec 13, 2022

I see in your conversion notebook that you suggest the number of tokens per batch should be the same as RoBERTa's: 2^18 ≈ 262k.

When I look at the RoBERTa paper, it says it uses a sequence length of 512 and a batch size of 8k. This means each batch has 512 * 8k ≈ 4M tokens.

Am I missing something?
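
For reference, here is the back-of-the-envelope arithmetic behind the question (the 512 sequence length and 8k batch size are the figures reported in the RoBERTa paper; 2^18 is the value suggested in the conversion notebook):

```python
# Rough comparison of the two tokens-per-batch figures.
# Assumption: RoBERTa pretraining uses sequence length 512 and batch size ~8k,
# as reported in its paper; the notebook suggests 2^18 tokens per batch.
notebook_tokens_per_batch = 2 ** 18        # 262,144 (~262k)
roberta_tokens_per_batch = 512 * 8_000     # 4,096,000 (~4M)

print(f"notebook:      {notebook_tokens_per_batch:,}")
print(f"roberta paper: {roberta_tokens_per_batch:,}")
print(f"ratio:         {roberta_tokens_per_batch / notebook_tokens_per_batch:.1f}x")
# ratio is roughly 15.6x, hence the apparent mismatch
```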

nbroad1881 (Author) commented:

@ibeltagy would you be able to chime in here?
