I'm pre-training BERT and looking for ways to speed it up.
I'd like to share my training throughput on an RTX 2080 Ti and have it sanity-checked.
The numbers look reasonable to me, but I'd be glad to hear about anything that could make training faster.
Spec:
- RTX 2080 Ti (11 GB memory)
- TensorFlow 1.14
- sequence length = 512
| # GPUs (w/ Horovod) | FP32, batch 4 (examples/sec) | AMP + XLA, batch 6 (examples/sec) |
|---|---|---|
| 1 | 13 | 32 |
| 2 | 18 | 43 |
| 4 | 59 | 77 |
Is there anything more I can do to speed this up?
Unfortunately, because of the long sequence length I can't raise the batch size to 8, which might let the Tensor Cores be used more effectively.
I'd also like to compare these numbers with other people's results.
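
In case it helps to compare setups, here is a minimal sketch of how AMP, XLA, and Horovod can be wired together in a TF 1.14 Session-based loop. The tiny model below is only a stand-in for the real BERT graph, and both the optimizer-wrapping order and the use of `tf.train.experimental.enable_mixed_precision_graph_rewrite` are assumptions on my part (the same rewrite can also be turned on with the `TF_ENABLE_AUTO_MIXED_PRECISION=1` environment variable).

```python
# Minimal sketch: Horovod + automatic mixed precision (AMP) + XLA on TF 1.14.
# The toy classifier below is only a placeholder for the real BERT graph.
import numpy as np
import tensorflow as tf
import horovod.tensorflow as hvd

hvd.init()

# Placeholder inputs standing in for BERT features (batch 6, seq len 512).
x = tf.placeholder(tf.float32, [None, 512, 768], name="features")
y = tf.placeholder(tf.int32, [None], name="labels")

# Toy model: mean-pool over the sequence, then a dense classifier.
pooled = tf.reduce_mean(x, axis=1)
logits = tf.layers.dense(pooled, 2)
loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits))

# Scale the learning rate by the number of workers, as Horovod suggests.
opt = tf.train.AdamOptimizer(1e-4 * hvd.size())

# AMP graph rewrite: casts eligible ops to fp16 and adds dynamic loss scaling.
opt = tf.train.experimental.enable_mixed_precision_graph_rewrite(opt)

# Horovod wrapper: all-reduces gradients across GPUs.
opt = hvd.DistributedOptimizer(opt)
train_op = opt.minimize(loss)

config = tf.ConfigProto()
# Pin each Horovod process to its own GPU.
config.gpu_options.visible_device_list = str(hvd.local_rank())
# Enable XLA JIT compilation for this session.
config.graph_options.optimizer_options.global_jit_level = tf.OptimizerOptions.ON_1

with tf.Session(config=config) as sess:
    sess.run(tf.global_variables_initializer())
    # Make sure every worker starts from the same initial weights.
    sess.run(hvd.broadcast_global_variables(0))
    batch_x = np.random.rand(6, 512, 768).astype(np.float32)
    batch_y = np.random.randint(0, 2, size=6)
    sess.run(train_op, feed_dict={x: batch_x, y: batch_y})
```

Launched with something like `horovodrun -np 4 python train.py`, this should roughly match the multi-GPU setup in the table; if your build doesn't pick up XLA from the session config, it can also be forced from outside with `TF_XLA_FLAGS=--tf_xla_auto_jit=2`.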