v1.11.0 release #136
Conversation
d97adcd to 3ef994b
## cudnn frontend v1.11 release notes

cudnn frontend v1.11 is the preferred cudnn frontend version for cudnn version 9.8.0 and above. With cuDNN frontend v1.11, the minimum supported cudnn version is 9.0.0.

## New API

- cudnn frontend v1.11 adds a flexible score modifier to the Python SDPA API. Samples showcasing a soft cap of the attention scores and an arrow mask are available in [cudnn_frontend/test/python/test_flexible_sdpa.py](https://github.com/NVIDIA/cuDNN-frontend/blob/main/cudnn_frontend/test/python/test_flexible_sdpa.py). A sample usage of the score modifier is shown below:

```python
score_mod=partial(
    custom_mask,
    mod_tensor=mod_tensor,
    neg_inf=neg_inf_tensor,
    seq_len_q=seq_len_q,
    seq_len_kv=seq_len_kv,
)
```

- The Concatenate operation merges two or more tensors into one along the specified axis. The user may also specify an in-place merge.

```cpp
std::shared_ptr<Tensor_attributes> concatenate(std::vector<std::shared_ptr<Tensor_attributes>>, Concatenate_attributes);
```

- pip wheels compatible with the Windows x86_64 architecture are now available on [PyPI](https://pypi.org/project/nvidia-cudnn-frontend/).

- The SDPA paged attention API now supports a ragged Q tensor when used with cudnn version 9.7.0 and above.

## Improvements

- Users can now pass the CMake flag `-DCMAKE_CXX_FLAGS="-DNV_CUDNN_FRONTEND_DISABLE_LOGGING"` to disable logging in the cuDNN frontend.

- Added a new sample showcasing native CUDA graph creation from cudnn for the SDPA bprop operation, and fixed a bug when using the `update_cuda_graph` API to update the CUDA graph for the SDPA bprop operation.

- Updated the `create_container_and_page_table` example function to use the layout desired for the more performant kernel.

## Bug Fixes

- Fixed a memory leak in the test harness for some legacy tests that use ragged tensors.

- Fixed a bug introduced in the benchmarking script that prevented the SDPA cudnn operation from being executed; the `use_padding_mask` attribute had been made mandatory for the SDPA operation.
- Updated the paged attention sample so it no longer causes illegal memory access when the dimensions of the tensors in the sample are changed.

- Updated the DgradDReluBNBwdWeight sample to perform the correct operation for the dgrad + drelu fusion.
Whoops, forgot to actually finish off my review.
```cpp
INode::concatenate(std::vector<std::shared_ptr<Tensor_attributes>> x,
                   Concatenate_attributes attributes,
                   std::shared_ptr<Tensor_attributes> y) {
    for (auto& element : x) {
```
Nit: you could add a `reserve` here to pre-allocate space for the concatenation.
```cpp
auto [O, Stats] = graph.sdpa(q, k, v, attributes);
if (fn.has_value()) {
    attributes.set_score_mod(wrapper_function);
    callback_fn = fn;
```
Can `fn` be moved into `callback_fn` here?