I've used Megatron to train a 13B GPT model on an H100 machine.
Before enabling the fp8 transformer engine, training speed was about 0.34 s/step.
After enabling the fp8 transformer engine with these two arguments, --fp8-hybrid and --transformer-impl "transformer_engine", training speed is about 0.24 s/step.
According to this blog, fp8 should give a 100% speedup over bf16, but I only got a 35% speedup on Megatron.
Is the 35% speedup reasonable, or have I made some mistake in using the fp8 transformer engine?
Thanks a lot for the reply.
I assume you are referencing Figure 9 from the white paper linked from that blog? If so, that figure is simply stating that fp8 is computationally 2x the throughput of bf16, when isolating arithmetic operations. The actual end-to-end speedup will be less than this, since you must account for other overheads like communication, memory bandwidth, and the optimizer step. The speedup will also vary greatly depending on your model size and micro batch size.
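The gap between the 2x arithmetic throughput and the observed end-to-end gain can be sketched with Amdahl's law. This is a minimal illustration, not a measurement: the 70% GEMM fraction below is a hypothetical split of step time, and the real fraction depends on model size, micro batch size, and parallelism configuration.

```python
# Amdahl's-law sketch: fp8 only accelerates the fp8-eligible GEMM portion
# of a training step; communication, memory-bound ops, and the optimizer
# step run at the same speed as before.

def end_to_end_speedup(gemm_fraction: float, gemm_speedup: float) -> float:
    """Whole-step speedup when only the GEMM fraction is accelerated."""
    return 1.0 / ((1.0 - gemm_fraction) + gemm_fraction / gemm_speedup)

# Hypothetical split: 70% of step time is fp8-eligible GEMMs, and fp8
# doubles their throughput relative to bf16.
print(round(end_to_end_speedup(0.70, 2.0), 2))  # -> 1.54, well below 2x

# For comparison, the reported times give 0.34 / 0.24 ~= 1.42x.
print(round(0.34 / 0.24, 2))  # -> 1.42
```

So a sub-2x end-to-end gain is expected; the smaller the fraction of the step spent in fp8-eligible GEMMs, the smaller the overall speedup.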