Raise autocast usage error #93

Merged
merged 4 commits into NVIDIA:main on Mar 13, 2023

Conversation

ksivaman
Member

Without this change, the following use case fails with an unhelpful error message:

import torch
import transformer_engine.pytorch as te

model = te.Linear(512, 512)
inp = torch.rand((128, 512), device="cuda")
epochs = 5

def train():
    # Incorrect usage: the model is run in FP8 multiple times
    # under the same autocast region with amax reduction turned on.
    with te.fp8_autocast(enabled=True):
        for _ in range(epochs):
            activation = model(inp)

train()
train() # Error!

This PR catches the misuse and raises the error the first time train() is called in the above script. The correct usage is below:

import torch
import transformer_engine.pytorch as te

model = te.Linear(512, 512)
inp = torch.rand((128, 512), device="cuda")
epochs = 5

def train():
    for _ in range(epochs):
        with te.fp8_autocast(enabled=True):
            activation = model(inp)

train()
train()
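
As a rough illustration of how such a check can work (a hypothetical sketch, not TransformerEngine's actual implementation; fp8_autocast_sketch, FP8GuardedModule, and the tracking globals below are made-up names), the autocast region can track which modules have already run and assert when one is invoked a second time while amax reduction is enabled:

# Hypothetical sketch only: none of these names are TransformerEngine's API.
from contextlib import contextmanager

_autocast_depth = 0
_modules_seen = set()            # modules already run in the current region
_amax_reduction_enabled = True   # assume amax reduction is on for this sketch

@contextmanager
def fp8_autocast_sketch(enabled=True):
    global _autocast_depth
    _autocast_depth += 1
    try:
        yield
    finally:
        _autocast_depth -= 1
        if _autocast_depth == 0:
            _modules_seen.clear()  # a new region starts with a clean slate

class FP8GuardedModule:
    def __call__(self, inp):
        if _autocast_depth > 0 and _amax_reduction_enabled:
            assert id(self) not in _modules_seen, (
                "Same module is being invoked more than once inside an "
                "`fp8_autocast` region when using FP8 with amax reduction."
            )
            _modules_seen.add(id(self))
        return inp  # a real module would run its FP8 GEMM here

With this sketch, the incorrect pattern above (looping over the model inside one autocast region) trips the assertion on the second iteration, while the correct pattern (one region per iteration) runs cleanly.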

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
@ksivaman ksivaman requested a review from ptrendx March 11, 2023 00:50
@ksivaman
Member Author

/te-ci

@ksivaman ksivaman requested a review from timmoon10 March 13, 2023 16:48
Collaborator

@timmoon10 timmoon10 left a comment


LGTM

For my own understanding: we expect this assert error to trigger if we perform multiple forward or backward passes within an FP8 context, or if we perform a partial forward or backward pass within an FP8 context (e.g. if we've frozen most of the model to fine-tune a specific section) (edit: partial forward or backward passes should run fine). It should also run fine if the number of forward and backward passes doesn't match.

@ksivaman
Member Author

We don't want to run any portion of the model twice under the same autocast call when using amax reduction with FP8 training.
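
To illustrate the intuition (a toy sketch under assumed bookkeeping, not TransformerEngine's actual delayed-scaling implementation; record_amax, end_of_region_reduce, and the dictionaries below are invented for this example): with amax reduction, each FP8 module is expected to contribute exactly one amax value per autocast region, and those values are reduced when the region ends. Running a module twice in one region would produce two values for a single reduction slot.

# Toy illustration only, not TransformerEngine internals.
import torch

amax_history = []   # one reduced amax per completed autocast region
pending_amax = {}   # module id -> amax recorded in the current region

def record_amax(module_id, tensor):
    # Each module should contribute exactly one amax per region.
    if module_id in pending_amax:
        raise AssertionError(
            "module invoked twice in one region: two amax values "
            "would map to a single reduction slot"
        )
    pending_amax[module_id] = tensor.abs().max().item()

def end_of_region_reduce():
    # In distributed training, this is roughly where amaxes would be
    # reduced across ranks before updating the FP8 scaling factors.
    amax_history.append(max(pending_amax.values()))
    pending_amax.clear()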

@ksivaman ksivaman merged commit 6605597 into NVIDIA:main Mar 13, 2023
nzmora-nvidia pushed a commit to nzmora-nvidia/TransformerEngine that referenced this pull request Mar 16, 2023
* catch incorrect usage of fp8_autocast

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

* catch error on first time double execution

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

---------

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
cyanguwa pushed a commit to cyanguwa/TransformerEngine that referenced this pull request Mar 31, 2023
cyanguwa pushed a commit to cyanguwa/TransformerEngine that referenced this pull request Apr 1, 2023
@erlebach

The following code gives the error:
AssertionError: Same module is being invoked more than once inside an `fp8_autocast` region when using FP8 with amax reduction. This behavior is currently unsupported. For more details and correct usage, please see https://github.com/NVIDIA/TransformerEngine/pull/93.

Here is the Python code:

import transformer_engine.pytorch as te
from transformer_engine.common.recipe import Format, DelayedScaling

fp8_format = Format.HYBRID  # E4M3 during forward pass, E5M2 during backward pass
fp8_recipe = DelayedScaling(fp8_format=fp8_format, amax_history_len=16, amax_compute_algo="max")
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
device = torch.device("cuda:0")
torch.manual_seed(12345)
my_linear = te.Linear(768, 768, bias=True).to(device)

inp = torch.rand((1024, 768)).to(device)
#inp = torch.rand((1024, 768)).cuda()

with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    out_fp8 = my_linear(inp)

loss_fp8 = out_fp8.mean()
loss_fp8.backward()  # This backward pass uses FP8, since out_fp8 was calculated inside fp8_autocast

out_fp32 = my_linear(inp)

The code runs fine without the last line. Since the last line runs outside the fp8_autocast region, why does this error occur?

Thanks for any insight. I am running with CUDA 11.8 for this library and CUDA 12.0 on the H100.

@ksivaman
Member Author

ksivaman commented May 2, 2023

@erlebach This was a bug that has now been fixed in main (#187).
