NVIDIA / TransformerEngine

Notifications You must be signed in to change notification settings
Fork 702
Star 3.3k

Code
Issues 230
Pull requests 124
Discussions
Actions
Security and quality
Insights

Additional navigation options

Code
Issues
Pull requests
Discussions
Actions
Security and quality
Insights

Add support for head_dim > 128 #1797

Merged

cyanguwa merged 27 commits into NVIDIA:main from cyanguwa:d_256 on Jun 13, 2025

cyanguwa (Collaborator) commented May 18, 2025 (edited)

Description

This PR adds support for:
- head_dim > 128 in the forward pass (fprop) on all architectures, starting with cuDNN 9.10 and cudnn-frontend 1.12;
- head_dim_qk = 192 with head_dim_v = 128 in the backward pass (bprop) on sm100, starting with cuDNN 9.11 and cudnn-frontend 1.12.1.
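As a reading aid, the support matrix above can be sketched as a small predicate. This is a hypothetical helper, not Transformer Engine's API: the function name and parameters are assumptions, versions are passed as plain tuples, and `is_training=True` is taken to mean that bprop is needed as well as fprop. The PR does add an `is_training` argument to `nvte_get_fused_attn_backend`, but the real backend selection checks far more than this. The compute-capability test uses `>= (10, 0)` following the later commit "bump sm100 to sm100+".

```python
from typing import Tuple

Version = Tuple[int, ...]


def large_head_dim_supported(
    head_dim_qk: int,
    head_dim_v: int,
    is_training: bool,
    cudnn_version: Version,
    frontend_version: Version,
    compute_capability: Tuple[int, int],
) -> bool:
    """Hypothetical check of the head_dim > 128 cases described in this PR.

    Only models configurations where head_dim_qk or head_dim_v exceeds 128.
    """
    if max(head_dim_qk, head_dim_v) <= 128:
        raise ValueError("this sketch only models head_dim > 128")

    # fprop: any head_dim > 128, all architectures, cuDNN 9.10 + cudnn-frontend 1.12.
    fprop_ok = cudnn_version >= (9, 10) and frontend_version >= (1, 12)
    if not is_training:
        return fprop_ok

    # bprop: only head_dim_qk = 192 with head_dim_v = 128 on sm100(+),
    # cuDNN 9.11 + cudnn-frontend 1.12.1.
    bprop_ok = (
        head_dim_qk == 192
        and head_dim_v == 128
        and compute_capability >= (10, 0)
        and cudnn_version >= (9, 11)
        and frontend_version >= (1, 12, 1)
    )
    return fprop_ok and bprop_ok


# Inference with head_dim = 256 on sm90 and cuDNN 9.10: fprop only, supported.
print(large_head_dim_supported(256, 256, False, (9, 10, 0), (1, 12, 0), (9, 0)))  # True
# Training the same shape: this PR adds no bprop path for head_dim = 256.
print(large_head_dim_supported(256, 256, True, (9, 11, 0), (1, 12, 1), (10, 0)))  # False
```

Tuple comparison keeps the version checks readable: `(1, 12, 0) >= (1, 12)` holds, while `(1, 12, 0) >= (1, 12, 1)` does not.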

Type of change

- Documentation change (change only to the documentation, either a fix or a new content)
- Bug fix (non-breaking change which fixes an issue)
- New feature (non-breaking change which adds functionality)
- Breaking change (fix or feature that would cause existing functionality to not work as expected)
- Infra/Build change
- Code refactoring

Changes

Please list the changes introduced in this PR:

- Add `is_training` to `nvte_get_fused_attn_backend` (breaking change)
- Add head_dim > 128 support and unit tests
- Improve head_dim selection logic
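The head_dim_qk = 192, head_dim_v = 128 case in the description decouples the Q/K head dimension from the V head dimension, as in MLA-style models (the commits below add an sq=1 test for MLA). The minimal NumPy reference here is for illustration only and is not the "Unfused" backend referenced in the commits; it assumes a single head, no mask, and the usual 1/sqrt(head_dim_qk) softmax scale. It shows the shape contract: scores are formed over head_dim_qk, while the output inherits head_dim_v.

```python
import numpy as np


def reference_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Single-head, unmasked scaled dot-product attention.

    q: [s_q, head_dim_qk], k: [s_kv, head_dim_qk], v: [s_kv, head_dim_v]
    returns: [s_q, head_dim_v]
    """
    head_dim_qk = q.shape[-1]
    if k.shape[-1] != head_dim_qk:
        raise ValueError("Q and K must share head_dim_qk")
    if k.shape[0] != v.shape[0]:
        raise ValueError("K and V must share the KV sequence length")
    scores = (q @ k.T) / np.sqrt(head_dim_qk)  # [s_q, s_kv]
    scores = scores - scores.max(axis=-1, keepdims=True)  # stabilize the softmax
    probs = np.exp(scores)
    probs = probs / probs.sum(axis=-1, keepdims=True)
    return probs @ v  # [s_q, head_dim_v]


rng = np.random.default_rng(0)
s_q, s_kv = 1, 16  # a single query token, like the sq=1 case
q = rng.standard_normal((s_q, 192))
k = rng.standard_normal((s_kv, 192))
v = rng.standard_normal((s_kv, 128))
out = reference_attention(q, k, v)
print(out.shape)  # (1, 128)
```

Because the output width follows V, a kernel supporting this case has to size its output and its dV gradient by head_dim_v, and its dQ and dK gradients by head_dim_qk.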

Checklist:

- I have read and followed the contributing guidelines
- The functionality is complete
- I have commented my code, particularly in hard-to-understand areas
- I have made corresponding changes to the documentation
- My changes generate no new warnings
- I have added tests that prove my fix is effective or that my feature works
- New and existing unit tests pass locally with my changes



494944e: add support for head dim > 128
Signed-off-by: Charlene Yang <charleney@nvidia.com>

cyanguwa force-pushed the d_256 branch from 4d9b33a to 494944e (May 18, 2025 20:44)


750dd91: remove debugging
Signed-off-by: Charlene Yang <charleney@nvidia.com>

cyanguwa force-pushed the d_256 branch from 960b4d0 to 750dd91 (May 18, 2025 20:50)


517030d: [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci

cyanguwa commented May 18, 2025

/te-ci


cyanguwa added the 2.5.0 label


049b19f: raise tols slightly to tolerate 1/2048 mismatches
Signed-off-by: Charlene Yang <charleney@nvidia.com>

cyanguwa commented May 24, 2025

/te-ci



d363bc3: Merge branch 'main' into d_256

cyanguwa requested review from KshitijLakhani and sudhakarsingh27 (May 24, 2025 00:24)


66e912b: fix is_training for test_te_layer
Signed-off-by: Charlene Yang <charleney@nvidia.com>

cyanguwa commented May 25, 2025

/te-ci


cyanguwa and others added 5 commits (June 4, 2025 04:15)

a070c8c: Merge branch 'main' into d_256
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

e323399: add bprop support for blackwell
Signed-off-by: Charlene Yang <charleney@nvidia.com>

2dbdafa: [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci

39e54eb: minor tweak for format
Signed-off-by: Charlene Yang <charleney@nvidia.com>

0ce126c: fix backend selection results
Signed-off-by: Charlene Yang <charleney@nvidia.com>

cyanguwa commented Jun 3, 2025

/te-ci L1



0fdbb2c: bump sm100 to sm100+
Signed-off-by: Charlene Yang <charleney@nvidia.com>

cyanguwa commented Jun 3, 2025

/te-ci


cyanguwa and others added 4 commits (June 3, 2025 16:27)

3596a36: add sq=1 test for MLA
Signed-off-by: Charlene Yang <charleney@nvidia.com>

78e2e93: enable sq=1 for bprop
Signed-off-by: Charlene Yang <charleney@nvidia.com>

bc75c84: minor tweak in comments
Signed-off-by: Charlene Yang <charleney@nvidia.com>

bb0e83a: Merge branch 'NVIDIA:main' into d_256

cyanguwa commented Jun 5, 2025

/te-ci


sudhakarsingh27 previously approved these changes

sudhakarsingh27 left a comment

Just a question but otherwise LGTM


Review comment thread: tests/pytorch/fused_attn/test_kv_cache.py (outdated)


39248e8: fix head_dim logic and remove pytest skip
Signed-off-by: Charlene Yang <charleney@nvidia.com>

cyanguwa dismissed sudhakarsingh27’s stale review via 39248e8 (June 10, 2025 21:16)

cyanguwa and others added 3 commits (June 12, 2025 03:57)

6f6ce66: Merge branch 'main' into d_256

408e110: add FE fix for d>128
Signed-off-by: Charlene Yang <charleney@nvidia.com>

3d2b1fb: Merge branch 'main' into d_256

cyanguwa commented Jun 12, 2025

/te-ci


cyanguwa requested a review from sudhakarsingh27 (June 12, 2025 20:19)

cyanguwa and others added 5 commits (June 12, 2025 14:25)

bbe9ef9: update FE again to take in small fixes
Signed-off-by: Charlene Yang <charleney@nvidia.com>

3e1b426: add cuDNN version info in L0 tests
Signed-off-by: Charlene Yang <charleney@nvidia.com>

0e36eeb: increase tols for Unfused + large dim
Signed-off-by: Charlene Yang <charleney@nvidia.com>

70647e3: Merge branch 'main' into d_256

e795580: Revert "add cuDNN version info in L0 tests"
This reverts commit 3e1b426.
Signed-off-by: Charlene Yang <charleney@nvidia.com>

cyanguwa force-pushed the d_256 branch from b025412 to e795580 (June 12, 2025 22:16)

cyanguwa commented Jun 12, 2025

/te-ci


cyanguwa requested review from sudhakarsingh27 and removed request for sudhakarsingh27 (June 12, 2025 22:16)

cyanguwa and others added 2 commits (June 12, 2025 17:32)

d8bba8e: fix tols for Unfused
Signed-off-by: Charlene Yang <charleney@nvidia.com>

57ff6a6: Merge branch 'main' into d_256

cyanguwa commented Jun 13, 2025

/te-ci


sudhakarsingh27 approved these changes

Review comment thread: tests/pytorch/fused_attn/test_kv_cache.py

cyanguwa merged commit 71c76b6 into NVIDIA:main

38 checks passed



Reviewers: sudhakarsingh27 (approved these changes); KshitijLakhani (awaiting requested review)

Assignees: No one assigned

Projects: None yet

Milestone: No milestone

2 participants
