[PyTorch] Fix CP A2A F16 when NVTE_FP8_DPA_BWD=1 by cyanguwa · Pull Request #2917 · NVIDIA/TransformerEngine

cyanguwa · 2026-04-22T19:03:36Z

Description

This PR fixes an issue with context parallelism (CP) using all-to-all (A2A) communication when NVTE_FP8_DPA_BWD=1 (the default) but fp8_dpa=False. The bug was introduced by the CP refactoring in #2719. The unit tests did not catch it because every test explicitly sets NVTE_FP8_DPA_BWD to an appropriate value (0 or 1), whereas in practice users may leave it unset. We will not change that test logic, but we will consider adding more tests for recipe control in the future.

The bug fix itself is a one-line change in A2A; this PR also cleans up a few other places in context_parallel.py.
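The testing gap described above comes down to one input the tests never used: leaving NVTE_FP8_DPA_BWD unset. The following is a minimal sketch, assuming the variable is parsed as int(os.getenv("NVTE_FP8_DPA_BWD", "1")) (an assumption based on the "=1 (default)" wording, not verified against the TransformerEngine source). read_fp8_dpa_bwd and env_var are hypothetical helpers. Sweeping unset, "0" and "1" shows that the unset case behaves exactly like an explicit "1".

```python
import os
from contextlib import contextmanager
from typing import Optional


# Assumption: the variable is parsed as an int with default "1".
# Hypothetical helper, not a TransformerEngine API.
def read_fp8_dpa_bwd() -> int:
    return int(os.getenv("NVTE_FP8_DPA_BWD", "1"))


@contextmanager
def env_var(name: str, value: Optional[str]):
    """Temporarily set an env var (value is a str) or unset it (value is None)."""
    saved = os.environ.get(name)
    if value is None:
        os.environ.pop(name, None)
    else:
        os.environ[name] = value
    try:
        yield
    finally:
        # Restore whatever was there before.
        if saved is None:
            os.environ.pop(name, None)
        else:
            os.environ[name] = saved


# Sweep the case the tests never hit (unset) alongside the explicit values.
results = {}
for setting in (None, "0", "1"):
    with env_var("NVTE_FP8_DPA_BWD", setting):
        results[setting] = read_fp8_dpa_bwd()

print(results)  # {None: 1, '0': 0, '1': 1}
```

A test sweep that includes the unset case would exercise the default path that real users hit.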

Type of change

Documentation change (change only to the documentation, either a fix or a new content)
Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Infra/Build change
Code refactoring

Changes

See Description.

Checklist:

I have read and followed the contributing guidelines
The functionality is complete
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

cyanguwa · 2026-04-22T19:15:01Z

/te-ci torch L1

greptile-apps · 2026-04-22T19:16:44Z

Greptile Summary

This PR fixes a bug in CP A2A (AttnFuncWithCPAndQKVOA2A) where is_bwd_fp8 was set to the raw NVTE_FP8_DPA_BWD env-var integer (default 1) without being gated on the fp8 flag. When fp8=False but the env var defaulted to 1, bwd_requires_o_fp8 would be True and bwd_requires_o_f16 would be False, causing out_part to be quantized to FP8 and saved as an F16 tensor in the backward context, which corrupts the gradient computation. The fix (is_bwd_fp8 = fp8 and _use_fp8_dpa_bwd) gates the env var on the fp8 flag in all three CP variants (P2P, AllGather, A2A), and the follow-on cleanups to the ctx.fp8 and out_fp8 conditions are consistent and correct.
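The one-line fix described in this summary can be sketched as a before/after comparison. This is a simplified, hypothetical model: bwd_flags is not a TransformerEngine function, and it assumes bwd_requires_o_fp8 and bwd_requires_o_f16 follow directly from is_bwd_fp8, as the summary and flowchart describe. The real code in context_parallel.py may involve additional conditions.

```python
def bwd_flags(fp8: bool, use_fp8_dpa_bwd: int, fixed: bool) -> dict:
    """Hypothetical model of how the backward-precision flags are derived."""
    if fixed:
        # After the fix: the env var only matters when FP8 is enabled.
        is_bwd_fp8 = fp8 and use_fp8_dpa_bwd
    else:
        # Before the fix: the raw env-var int (default 1) is used directly.
        is_bwd_fp8 = use_fp8_dpa_bwd
    # Simplified assumption: the output is kept in FP8 iff backward runs in FP8.
    return {
        "is_bwd_fp8": bool(is_bwd_fp8),
        "bwd_requires_o_fp8": bool(is_bwd_fp8),
        "bwd_requires_o_f16": not is_bwd_fp8,
    }


# The failing configuration: fp8=False, NVTE_FP8_DPA_BWD left at its default of 1.
before = bwd_flags(fp8=False, use_fp8_dpa_bwd=1, fixed=False)
after = bwd_flags(fp8=False, use_fp8_dpa_bwd=1, fixed=True)
print(before)  # {'is_bwd_fp8': True, 'bwd_requires_o_fp8': True, 'bwd_requires_o_f16': False}
print(after)   # {'is_bwd_fp8': False, 'bwd_requires_o_fp8': False, 'bwd_requires_o_f16': True}
```

Python's and returns its first falsy operand, or else the last operand, so fp8 and _use_fp8_dpa_bwd evaluates to False whenever fp8 is False and to the env-var int otherwise; either way it works as a truthiness flag.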

Confidence Score: 5/5

This PR is safe to merge; the fix is targeted, correct, and the cleanup changes are logically consistent.

The root cause is well-diagnosed: raw env-var int (1) used as a boolean without the fp8 gate caused FP8 quantization of the output tensor in a non-FP8 path, leading to corrupt backward tensors. The one-line fix and consistent cleanup across all three CP classes are correct. No new edge cases are introduced.

No files require special attention.

Important Files Changed

Filename	Overview
transformer_engine/pytorch/attention/dot_product_attention/context_parallel.py	Bug fix: is_bwd_fp8 now correctly includes the fp8 flag across all three CP classes (P2P, AllGather, A2A); also cleans up ctx.fp8 and out_fp8 condition redundancies introduced in #2719.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["CP Forward (A2A/P2P/AllGather)"] --> B{"fp8?"}
    B -- No --> C["is_bwd_fp8 = False\n(BEFORE: 1 from env var)"]
    B -- Yes --> D{"NVTE_FP8_DPA_BWD?"}
    D -- "0" --> E["is_bwd_fp8 = False"]
    D -- "1 (default)" --> F["is_bwd_fp8 = True"]
    C --> G["bwd_requires_o_fp8 = False\nbwd_requires_o_f16 = True\nout_part = F16 tensor ✅"]
    E --> G
    F --> H["bwd_requires_o_fp8 = True\nout_part = FP8 tensor ✅"]
    G --> I["ctx.fp8 = False\nf16_tensors saved ✅"]
    H --> J["ctx.fp8 = True\nfp8_tensors saved ✅"]
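The flowchart's decision paths can also be enumerated as a small truth table to confirm that every (fp8, NVTE_FP8_DPA_BWD) combination lands on a consistent leaf. This is another hypothetical sketch: cp_forward_flags is not part of TransformerEngine. It encodes only what the diagram shows (out_part precision, ctx.fp8, and which tensors are saved), under the simplifying assumption that each follows from is_bwd_fp8.

```python
from itertools import product


def cp_forward_flags(fp8: bool, nvte_fp8_dpa_bwd: int) -> dict:
    """Hypothetical encoding of the post-fix flowchart (nodes C to J)."""
    is_bwd_fp8 = bool(fp8 and nvte_fp8_dpa_bwd)  # nodes C, E, F
    return {
        "is_bwd_fp8": is_bwd_fp8,
        "out_part": "FP8" if is_bwd_fp8 else "F16",               # nodes G, H
        "ctx.fp8": is_bwd_fp8,                                     # nodes I, J
        "saved": "fp8_tensors" if is_bwd_fp8 else "f16_tensors",  # nodes I, J
    }


# Every (fp8, NVTE_FP8_DPA_BWD) combination in the diagram.
table = {
    (fp8, env): cp_forward_flags(fp8, env)
    for fp8, env in product((False, True), (0, 1))
}

for (fp8, env), flags in table.items():
    print(f"fp8={fp8}, NVTE_FP8_DPA_BWD={env}: "
          f"out_part={flags['out_part']}, saved={flags['saved']}")

# Invariant restored by the fix: a non-FP8 forward never saves FP8 tensors.
assert all(not flags["ctx.fp8"] for (fp8, _), flags in table.items() if not fp8)
```

Only the fp8=True, NVTE_FP8_DPA_BWD=1 row takes the FP8 path (node F); the other three rows converge on node G, matching the diagram.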

Reviews (1): Last reviewed commit: "fix fp8 and is_bwd_fp8 relationship"

fix fp8 and is_bwd_fp8 relationship

3affede

Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

cyanguwa force-pushed the fix_2719 branch from 59fda71 to 3affede April 22, 2026 19:08

cyanguwa added the 2.15.0 label Apr 22, 2026

cyanguwa marked this pull request as ready for review April 22, 2026 19:13

cyanguwa requested a review from ksivaman April 22, 2026 19:13

cyanguwa requested a review from ptrendx April 22, 2026 22:43

ksivaman approved these changes Apr 22, 2026

cyanguwa merged commit 424b031 into NVIDIA:main Apr 23, 2026
46 of 53 checks passed

YigongQin pushed a commit to YigongQin/TransformerEngine that referenced this pull request Apr 23, 2026

[PyTorch] Fix CP A2A F16 when NVTE_FP8_DPA_BWD=1 (NVIDIA#2917)

da92845

fix fp8 and is_bwd_fp8 relationship Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

ptrendx added this to the 2.15 milestone Apr 23, 2026

KshitijLakhani pushed a commit that referenced this pull request Apr 27, 2026

[PyTorch] Fix CP A2A F16 when NVTE_FP8_DPA_BWD=1 (#2917)

45fb909

fix fp8 and is_bwd_fp8 relationship Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

faradawn pushed a commit to faradawn/TransformerEngine that referenced this pull request May 14, 2026

[PyTorch] Fix CP A2A F16 when NVTE_FP8_DPA_BWD=1 (NVIDIA#2917)

b265980

fix fp8 and is_bwd_fp8 relationship Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

hungryGeek16 mentioned this pull request May 31, 2026

fix unfused padding causal sdpa #3063

Open

cyanguwa merged 1 commit into NVIDIA:main from cyanguwa:fix_2719


3 participants