Any benchmark results? #1
Thanks. Does 24 indicate the global batch size, i.e., 12 samples of length 128 on each Tesla P40 card (24 GB memory)?
24 is the batch size for each GPU; the global batch size is 24 * 2 = 48.
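For anyone skimming the thread, the per-GPU vs. global batch size relationship under data parallelism is just a product; a minimal sketch (the numbers are the ones from this thread):

```python
# Effective (global) batch size under data parallelism:
# each GPU processes its own micro-batch every step.
per_gpu_batch = 24   # samples per GPU per step, as stated above
num_gpus = 2         # two Tesla P40 cards
global_batch = per_gpu_batch * num_gpus
print(global_batch)  # → 48
```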
Oops, this is awesome. Your batch size is twice as large as the one reported in the BERT README (after scaling by memory size). How did you manage that? By using fp16 precision?
No fp16 yet, but I'm planning to support it.
Thanks for the clarification. A nice project, and I'm playing with it.
Feel free to open a new issue if you encounter any problems while playing.
fp16 is now available on a branch.
Thanks for your quick development. I currently don't have access to a Volta-architecture GPU, so I guess the fp16 performance would be much slower than its fp32 counterpart? E.g., on the GTX 1080 Ti and Tesla P40 (both Pascal architecture), fp16 throughput is only 1/64 of fp32 FLOPS.
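To put that 1/64 ratio in perspective, here is a back-of-envelope throughput estimate. The peak fp32 FLOPS figures below are my own approximate numbers for these cards, not values from the thread:

```python
# Rough theoretical throughput if fp16 runs at 1/64 the fp32 rate (Pascal).
# Spec numbers are approximate, supplied for illustration only.
specs_fp32_flops = {
    "GTX 1080 Ti": 11.3e12,  # ~11.3 TFLOPS fp32 (approx.)
    "Tesla P40":   12.0e12,  # ~12.0 TFLOPS fp32 (approx.)
}
FP16_RATIO = 1 / 64  # Pascal's slow fp16 path, as noted above

for gpu, fp32 in specs_fp32_flops.items():
    fp16 = fp32 * FP16_RATIO
    print(f"{gpu}: fp32 ~{fp32 / 1e12:.1f} TFLOPS, fp16 ~{fp16 / 1e12:.2f} TFLOPS")
```

So on Pascal, fp16 mainly buys memory savings (larger batches), not speed; the throughput win requires Volta-class Tensor Cores.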
I'm facing some strange issues with BERT-Large on my 11 GB GTX 1080 Ti. I guess the poor performance is due not to this repo but to the original BERT repo. Have you encountered any strange issues with BERT-Large like this?
I only did a simple test for fp16, and the training speed did not drop significantly (though I may have made a mistake). I will do more detailed and comprehensive testing later.
That is weird. Did the OOM occur when training the LARGE model with this repo? I don't recommend trying the LARGE model on an 11 GB GPU.
Yes, even a batch size of 1 leads to OOM. I'd better stick with BERT-Base. Thanks for your quick reply.
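One common workaround for this kind of OOM (not a feature the thread confirms this repo has) is gradient accumulation: run several small micro-batches and apply one optimizer step. A framework-free toy sketch showing the equivalence for plain SGD, with made-up numbers:

```python
# Toy illustration: accumulating gradients over 4 micro-batches of size 1
# is mathematically equivalent (for plain SGD) to one batch of size 4.
# Pure-Python sketch with a 1-parameter least-squares model; no ML framework.

def grad(w, x, y):
    # d/dw of 0.5 * (w*x - y)^2
    return (w * x - y) * x

data = [(1.0, 2.0), (2.0, 3.0), (3.0, 5.0), (4.0, 7.0)]
w, lr = 0.0, 0.1

# One step on the full batch of 4:
g_full = sum(grad(w, x, y) for x, y in data) / len(data)
w_full = w - lr * g_full

# Same step via 4 accumulated micro-batches of size 1:
acc = 0.0
for x, y in data:
    acc += grad(w, x, y)              # accumulate instead of stepping
w_accum = w - lr * (acc / len(data))  # single update after accumulation

assert abs(w_full - w_accum) < 1e-12
print("full-batch and accumulated updates match:", w_full)
```

Activation memory then scales with the micro-batch, so even 1-sample micro-batches can emulate a large effective batch, at the cost of more steps per update.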
What GPUs are you using? What's the maximum batch_size for 11/12/16/32 GB GPUs (e.g., GTX 1080 Ti, Titan X, V100), and what is the corresponding performance? Thanks.
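A rough way to see why BERT-Large OOMs on 11 GB cards even at batch size 1: the static memory alone is large before any activations. The sketch below uses the well-known ~340M parameter count for BERT-Large and the usual fp32 Adam layout (weights + gradients + two moment buffers); the byte accounting is my own approximation:

```python
# Back-of-envelope static GPU memory for BERT-Large trained with Adam (fp32).
# Parameter count is the commonly cited ~340M; accounting is approximate.
params = 340e6
bytes_per_tensor = 4          # fp32
copies = 4                    # weights + gradients + Adam m + Adam v
static_gb = params * bytes_per_tensor * copies / 2**30
print(f"weights + grads + Adam states: ~{static_gb:.1f} GB before activations")
```

That leaves only a few GB of an 11 GB card for activations, which for 24 layers at sequence length 128 is not enough, consistent with the OOM reported above.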