
fp8 transformer engine only brings 35% speed up? #396

Closed

FeixLiu opened this issue Jul 3, 2023 · 3 comments
FeixLiu commented Jul 3, 2023

Hi there,

I've used Megatron to train a 13B GPT model on an H100 machine.
Before enabling the fp8 transformer engine, training ran at about 0.34 s/step.
After enabling the fp8 transformer engine with the two arguments --fp8-hybrid and --transformer-impl "transformer_engine", training ran at about 0.24 s/step.
According to this blog, fp8 should give a 100% speed-up compared with bf16, but I only got a 35% speed-up with Megatron.
Is the 35% speed-up reasonable, or have I made a mistake in using the fp8 transformer engine?
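
For reference, this is roughly what the hybrid fp8 recipe corresponds to under the hood in Transformer Engine's PyTorch API. A minimal standalone sketch (the layer sizes are just illustrative, not from my actual run):

```python
# Standalone Transformer Engine sketch of the "hybrid" fp8 recipe
# (E4M3 in the forward pass, E5M2 for gradients). Sizes are illustrative.
import torch
import transformer_engine.pytorch as te
from transformer_engine.common.recipe import DelayedScaling, Format

fp8_recipe = DelayedScaling(fp8_format=Format.HYBRID)  # hybrid = e4m3 fwd / e5m2 bwd

layer = te.Linear(4096, 4096, bias=True).cuda()
inp = torch.randn(16, 4096, device="cuda")

# Only GEMMs inside the fp8_autocast region run in fp8.
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    out = layer(inp)
out.sum().backward()
```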

Thanks a lot for the reply.

@lmcafee-nvidia lmcafee-nvidia self-assigned this Jul 19, 2023

lmcafee-nvidia (Contributor) commented Jul 19, 2023

I assume you are referencing Figure 9 from the white paper linked from that blog? If so, that figure simply states that fp8 has 2x the computational throughput of bf16 when isolating arithmetic operations. The actual end-to-end speedup will be less than this, since you must account for other overheads like communication, memory bandwidth, and the optimizer step. The speedup will also vary greatly depending on your model size and micro batch size.
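
As a rough back-of-the-envelope (a sketch, not anything Megatron computes): if a fraction f of the bf16 step time is GEMM work that fp8 makes 2x faster, Amdahl's law bounds the end-to-end speedup at 1 / ((1 - f) + f/2). For example:

```python
# Amdahl's-law sketch (illustrative only, not Megatron code): only the
# GEMM fraction f of the bf16 step time gets the 2x fp8 boost.
def fp8_speedup(gemm_fraction: float, gemm_gain: float = 2.0) -> float:
    """End-to-end speedup when only GEMMs accelerate by gemm_gain."""
    return 1.0 / ((1.0 - gemm_fraction) + gemm_fraction / gemm_gain)

# Your 0.34 s/step -> 0.24 s/step (~1.42x) is consistent with roughly
# 60% of the bf16 step time being fp8-accelerated GEMM work:
print(fp8_speedup(0.59))  # ~1.42
```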

FeixLiu (Author) commented Jul 19, 2023

Got it, thanks for the reply!

exnx commented Jul 9, 2024

Should it be possible to use fp8 with pipeline parallelism? My training hangs when I try to use both. I can use fp8 with model parallelism fine, though.
