
FP8 mixed precision via NVIDIA's Transformer Engine #17172

Closed
carmocca opened this issue Mar 22, 2023 · 6 comments · Fixed by #17597 or #18459
Labels: fabric, feature, performance, pl, plugin
Milestone: future

carmocca (Member) commented Mar 22, 2023

Description & Motivation

Support FP8 mixed precision via NVIDIA's Transformer Engine: https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/index.html

Pitch

Write a precision plugin using the library above that is enabled via:

  • `precision="transformer-engine"` — see the sketch below
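
For reference, the core of what such a plugin would wrap is small. A minimal sketch against Transformer Engine's documented API (the recipe settings and layer sizes here are illustrative, not a proposed default):

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# FP8 "delayed scaling" recipe from the Transformer Engine user guide;
# HYBRID uses E4M3 for forward tensors and E5M2 for gradients.
fp8_recipe = recipe.DelayedScaling(fp8_format=recipe.Format.HYBRID)

# Transformer Engine ships its own layer implementations; FP8 execution
# requires dimensions that are multiples of 16.
layer = te.Linear(768, 768, bias=True).cuda()
x = torch.randn(32, 768, device="cuda")

# Forward passes inside this context run the layer in FP8.
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)
```

The plugin would own the recipe and the context manager, so user code only passes the `precision` flag.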

Alternatives

Don't implement this until it's vendored by PyTorch, if that ever happens.


cc @Borda @carmocca @justusschock @awaelchli

carmocca added the feature, plugin, and performance labels on Mar 22, 2023
carmocca added this to the future milestone on Mar 22, 2023
carmocca added the fabric and pl labels on Mar 22, 2023
awaelchli (Member) commented May 10, 2023

> The library only requires enabling an autocast context manager

There is one more thing: the user needs to replace their layers with the custom ones from the library. What's the plan here? Will the plugin implement the `module_init_context()` manager? On the other hand, one might not want to replace all layers. If this is left to the user, there is a lot less value in adding the plugin.
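
To make the concern concrete, this is roughly the manual rewrite a user would otherwise have to do by hand (a sketch; `te.Linear` and `te.LayerNorm` are Transformer Engine's drop-in counterparts to the `torch.nn` layers):

```python
import torch.nn as nn
import transformer_engine.pytorch as te

# Plain PyTorch definition ...
mlp = nn.Sequential(
    nn.LayerNorm(768),
    nn.Linear(768, 3072),
    nn.GELU(),
    nn.Linear(3072, 768),
)

# ... versus what the user must write today to benefit from FP8:
te_mlp = nn.Sequential(
    te.LayerNorm(768),
    te.Linear(768, 3072),
    nn.GELU(),
    te.Linear(3072, 768),
)
```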

carmocca (Member Author) commented May 10, 2023

Yes, we'll need to implement a replacement mechanism. The plugin can have a flag to disable it if necessary.

This also means that we'll have it in Fabric first, as these APIs do not exist in the trainer yet.

carmocca (Member Author) commented:

Actually, `convert_module` might be a better fit than `init_context` if we prefer replacing existing layers over patching the `torch.nn` classes.
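
A `convert_module`-style hook could do the swap recursively on an already-instantiated model. A hypothetical sketch (the helper name and weight-copy details are assumptions, not the merged implementation):

```python
import torch
import torch.nn as nn
import transformer_engine.pytorch as te

def _replace_linears(module: nn.Module) -> nn.Module:
    # Recursively swap nn.Linear for te.Linear, carrying weights over.
    # A real implementation would also cover LayerNorm, respect FP8's
    # dims-divisible-by-16 constraint, and handle device/dtype placement.
    for name, child in module.named_children():
        if isinstance(child, nn.Linear):
            replacement = te.Linear(
                child.in_features, child.out_features, bias=child.bias is not None
            )
            with torch.no_grad():
                replacement.weight.copy_(child.weight)
                if child.bias is not None:
                    replacement.bias.copy_(child.bias)
            setattr(module, name, replacement)
        else:
            _replace_linears(child)
    return module
```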

nanand2 commented Jun 19, 2023

Any update on support for this?

carmocca (Member Author) commented:

@nanand2 Our access to H100s is very limited, so we haven't merged this yet. However, the branch https://github.com/Lightning-AI/lightning/tree/carmocca/transformer-engine should be usable if you want to play with it right now.

nanand2 commented Jun 19, 2023

Great, thanks!
