I see in your conversion notebook that you suggest the number of tokens per batch should match RoBERTa's: 2^18 ≈ 262k.
When I look at the RoBERTa paper, though, it reports a sequence length of 512 and a batch size of 8k sequences, which works out to 512 × 8k ≈ 4M tokens per batch.
Am I missing something?
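
For reference, here is the arithmetic behind the discrepancy as a minimal sketch. The variable names are mine; the figures come from the RoBERTa paper and the notebook, not from the notebook's actual code:

```python
# Tokens per batch suggested in the conversion notebook.
notebook_tokens_per_batch = 2 ** 18  # 262,144 (~262k)

# Figures reported in the RoBERTa paper.
roberta_seq_len = 512      # max sequence length
roberta_batch_size = 8192  # 8k sequences per batch

roberta_tokens_per_batch = roberta_seq_len * roberta_batch_size

print(f"notebook:      {notebook_tokens_per_batch:,} tokens/batch")   # 262,144
print(f"RoBERTa paper: {roberta_tokens_per_batch:,} tokens/batch")    # 4,194,304
print(f"ratio: {roberta_tokens_per_batch // notebook_tokens_per_batch}x")  # 16x
```

So the paper's effective batch is 16× larger than the notebook's suggestion, which is what prompted the question.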
@ibeltagy would you be able to chime in here?