
Fix DPR training batch size #898

Merged
merged 5 commits into from
Mar 17, 2021
Conversation

@brandenchan (Contributor) commented Mar 17, 2021

We found that we could actually fit 16 samples per batch on a V100 GPU when training a DPR model with:

max_seq_len_query=64,
max_seq_len_passage=256

As pointed out in #896, the training batch size in the paper is actually 128, so we have set grad_acc_steps=8 to match the effective batch size of the original experiment. We have also updated the expected performance metrics after training.
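The arithmetic behind this change can be sketched as follows. This is an illustrative snippet, not the Haystack implementation: the names `BATCH_SIZE`, `GRAD_ACC_STEPS`, and `train_steps` are hypothetical, chosen only to show how accumulating gradients over 8 micro-batches of 16 reproduces the paper's batch size of 128 with one optimizer update per accumulation cycle.

```python
# Hypothetical sketch of gradient accumulation (not Haystack code).
BATCH_SIZE = 16      # samples that fit on one V100 at the given seq lengths
GRAD_ACC_STEPS = 8   # micro-batches accumulated before each optimizer step

# Effective batch size matches the DPR paper's 128.
effective_batch_size = BATCH_SIZE * GRAD_ACC_STEPS


def train_steps(num_samples: int) -> int:
    """Count optimizer updates: one update per GRAD_ACC_STEPS micro-batches."""
    micro_batches = num_samples // BATCH_SIZE
    return micro_batches // GRAD_ACC_STEPS


# 1280 samples -> 80 micro-batches -> 10 optimizer updates.
```

Because the loss gradients are summed (or averaged) over the 8 micro-batches before the weights are updated, each update sees the same number of samples as a single 128-sample batch would, without needing the memory to hold 128 samples at once.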


@Timoeller (Contributor) left a comment:

LG

@brandenchan brandenchan merged commit 24d0c4d into master Mar 17, 2021
@brandenchan brandenchan deleted the fix_dpr_bs2 branch March 17, 2021 17:34