Update cudnn-frontend to 1.0.3 to fix cuDNN v9 SDPA NaNs #650

Merged
ksivaman merged 8 commits into NVIDIA:main from cyanguwa:upgrade-to-cudnn-fe-1.0.3 on Feb 3, 2024

Conversation

@cyanguwa
Collaborator

This fixes the NaNs in the SDPA backward pass when running with cuDNN v9.0.0.306. The problem was not in cuDNN itself but in cuDNN Frontend; version 1.0.3 of the frontend fixes it.
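
For anyone who wants to guard against the older frontend, here is a minimal sketch of a version check. The helper names `parse_version` and `has_sdpa_nan_fix` are hypothetical and are not part of Transformer Engine or cudnn-frontend. The only fact taken from this PR is that cudnn-frontend 1.0.3 contains the fix; treating every earlier version as possibly affected is an assumption.

```python
def parse_version(version: str) -> tuple:
    """Turn a dotted version string such as "1.0.3" into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


# From this PR: cudnn-frontend 1.0.3 is the release that fixes the NaNs.
FIXED_FRONTEND_VERSION = (1, 0, 3)


def has_sdpa_nan_fix(frontend_version: str) -> bool:
    """Hypothetical helper: True if the frontend version is 1.0.3 or newer.

    Assumption: any version older than 1.0.3 may still show the problem.
    """
    return parse_version(frontend_version) >= FIXED_FRONTEND_VERSION
```

Tuples compare element by element, so `(1, 0, 2)` sorts below `(1, 0, 3)` without any string handling.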

Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
@cyanguwa cyanguwa requested a review from ptrendx January 31, 2024 20:45
@cyanguwa cyanguwa changed the title from "Update cuDNN Frontend to 1.0.3 to fix cuDNN v9 SDPA NaNs" to "Update cudnn-frontend to 1.0.3 to fix cuDNN v9 SDPA NaNs" Jan 31, 2024
@cyanguwa
Collaborator Author

/te-ci

Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
@cyanguwa
Collaborator Author

cyanguwa commented Feb 2, 2024

/te-ci

Collaborator

@timmoon10 timmoon10 left a comment

LGTM, with stylistic comments.

cyanguwa and others added 4 commits February 2, 2024 20:30
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Signed-off-by: cyanguwa <8636796+cyanguwa@users.noreply.github.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Signed-off-by: cyanguwa <8636796+cyanguwa@users.noreply.github.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Signed-off-by: cyanguwa <8636796+cyanguwa@users.noreply.github.com>
@cyanguwa
Collaborator Author

cyanguwa commented Feb 3, 2024

Pipeline 12530327

@ksivaman ksivaman merged commit 2aee059 into NVIDIA:main Feb 3, 2024
ptrendx pushed a commit that referenced this pull request Feb 3, 2024
* Update cudnn frontend to 1.0.3 to fix cudnn v9 Nans

Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* make d_out contiguous for bwd

Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* remove cudnnDestroy to let torch handle it

Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

* Update transformer_engine/pytorch/attention.py

Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Signed-off-by: cyanguwa <8636796+cyanguwa@users.noreply.github.com>

* Update transformer_engine/pytorch/attention.py

Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Signed-off-by: cyanguwa <8636796+cyanguwa@users.noreply.github.com>

* Update transformer_engine/pytorch/attention.py

Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Signed-off-by: cyanguwa <8636796+cyanguwa@users.noreply.github.com>

---------

Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
Signed-off-by: cyanguwa <8636796+cyanguwa@users.noreply.github.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
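
The second commit above makes `d_out` contiguous before the backward pass. As a rough illustration of what "contiguous" means for a tensor layout, the sketch below checks whether a shape and stride pair describes a densely packed row-major buffer. Both helpers are hypothetical pure-Python stand-ins written for this note, not code from this PR; in PyTorch the corresponding calls are `Tensor.is_contiguous()` and `Tensor.contiguous()`.

```python
def row_major_strides(shape):
    """Strides (in elements) of a densely packed row-major array of this shape."""
    strides = []
    running = 1
    for dim in reversed(shape):
        strides.append(running)
        running *= dim
    return tuple(reversed(strides))


def is_contiguous(shape, strides):
    """Simplified check: True if the strides match the dense row-major layout.

    Assumption: unlike PyTorch, this sketch does not special-case
    dimensions of size 1, whose stride does not affect the layout.
    """
    return tuple(strides) == row_major_strides(shape)


# A (3, 2) array stored row-major has strides (2, 1).
dense = is_contiguous((3, 2), (2, 1))

# Transposing it swaps shape and strides without moving any data,
# so the (2, 3) view has strides (1, 2) and is no longer contiguous.
transposed = is_contiguous((2, 3), (1, 2))
```

A transposed or sliced view is the usual way a gradient tensor ends up non-contiguous; copying it into a fresh dense buffer restores the expected strides.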
@cyanguwa cyanguwa deleted the upgrade-to-cudnn-fe-1.0.3 branch February 22, 2024 00:36