Add support for head_dim > 128 #1797

Merged: cyanguwa merged 27 commits into NVIDIA:main from cyanguwa:d_256 on Jun 13, 2025

@cyanguwa (Collaborator) commented May 18, 2025

Description

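This PR adds support for two new fused attention cases: forward propagation (fprop) with head_dim > 128 on all architectures, starting with cuDNN 9.10 and cudnn-frontend 1.12; and backward propagation (bprop) with head_dim_qk = 192 and head_dim_v = 128 on sm100, starting with cuDNN 9.11 and cudnn-frontend 1.12.1.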

Type of change

  • Documentation change (change only to the documentation, either a fix or a new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

Please list the changes introduced in this PR:

  • Add is_training to nvte_get_fused_attn_backend (breaking change)
  • Add head_dim > 128 support and unit tests
  • Improve head_dim selection logic
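
The support conditions in the description suggest why backend selection needs to know whether a backward pass will run. A minimal Python sketch of that check follows, encoding only the two cases stated in the description. The function name `large_head_dim_supported`, its parameters, and the tuple and integer encodings are hypothetical. This is not Transformer Engine's actual selection logic, and it ignores cudnn-frontend versions and all other constraints (dtype, layout, mask type, and so on).

```python
from typing import Tuple


def large_head_dim_supported(
    head_dim_qk: int,
    head_dim_v: int,
    is_training: bool,
    sm_arch: int,
    cudnn_version: Tuple[int, int],
) -> bool:
    """Hypothetical check mirroring the two cases in the PR description."""
    # Assumption: head_dim <= 128 was already supported before this PR,
    # so it is not governed by the new rules sketched here.
    if max(head_dim_qk, head_dim_v) <= 128:
        return True

    if not is_training:
        # Inference only needs fprop: head_dim > 128 on all
        # architectures, starting with cuDNN 9.10.
        return cudnn_version >= (9, 10)

    # Training also needs bprop: only head_dim_qk = 192, head_dim_v = 128
    # on sm100, starting with cuDNN 9.11.
    return (
        head_dim_qk == 192
        and head_dim_v == 128
        and sm_arch == 100
        and cudnn_version >= (9, 11)
    )


if __name__ == "__main__":
    # Same shape and hardware, different answer depending on is_training.
    print(large_head_dim_supported(256, 256, False, 90, (9, 10)))
    print(large_head_dim_supported(256, 256, True, 90, (9, 10)))
```

The two calls at the bottom differ only in `is_training`, which is the distinction the new argument to `nvte_get_fused_attn_backend` lets a caller express.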

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Signed-off-by: Charlene Yang <charleney@nvidia.com>
Signed-off-by: Charlene Yang <charleney@nvidia.com>
@cyanguwa (Collaborator, Author) commented

/te-ci

cyanguwa added the 2.5.0 label May 23, 2025
Signed-off-by: Charlene Yang <charleney@nvidia.com>
@cyanguwa (Collaborator, Author) commented

/te-ci

Signed-off-by: Charlene Yang <charleney@nvidia.com>
@cyanguwa (Collaborator, Author) commented

/te-ci

cyanguwa and others added 5 commits June 4, 2025 04:15
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
Signed-off-by: Charlene Yang <charleney@nvidia.com>
Signed-off-by: Charlene Yang <charleney@nvidia.com>
Signed-off-by: Charlene Yang <charleney@nvidia.com>
@cyanguwa (Collaborator, Author) commented Jun 3, 2025

/te-ci L1

Signed-off-by: Charlene Yang <charleney@nvidia.com>
@cyanguwa (Collaborator, Author) commented Jun 3, 2025

/te-ci

cyanguwa and others added 4 commits June 3, 2025 16:27
Signed-off-by: Charlene Yang <charleney@nvidia.com>
Signed-off-by: Charlene Yang <charleney@nvidia.com>
Signed-off-by: Charlene Yang <charleney@nvidia.com>
@cyanguwa (Collaborator, Author) commented Jun 5, 2025

/te-ci

sudhakarsingh27 (Collaborator) previously approved these changes Jun 5, 2025 and left a comment:

Just a question but otherwise LGTM

Comment thread tests/pytorch/fused_attn/test_kv_cache.py Outdated
Signed-off-by: Charlene Yang <charleney@nvidia.com>
cyanguwa and others added 3 commits June 12, 2025 03:57
@cyanguwa (Collaborator, Author) commented

/te-ci

cyanguwa requested a review from sudhakarsingh27 June 12, 2025 20:19
cyanguwa and others added 5 commits June 12, 2025 14:25
Signed-off-by: Charlene Yang <charleney@nvidia.com>
Signed-off-by: Charlene Yang <charleney@nvidia.com>
Signed-off-by: Charlene Yang <charleney@nvidia.com>
This reverts commit 3e1b426.

Signed-off-by: Charlene Yang <charleney@nvidia.com>
@cyanguwa (Collaborator, Author) commented

/te-ci

cyanguwa requested review from sudhakarsingh27 and removed request for sudhakarsingh27 June 12, 2025 22:16
cyanguwa and others added 2 commits June 12, 2025 17:32
Signed-off-by: Charlene Yang <charleney@nvidia.com>
@cyanguwa (Collaborator, Author) commented

/te-ci

Comment thread tests/pytorch/fused_attn/test_kv_cache.py
cyanguwa merged commit 71c76b6 into NVIDIA:main Jun 13, 2025
38 checks passed