v1.16.0-release

@Anerudhan Anerudhan released this 07 Nov 04:50

cuDNN Frontend v1.16.0 Release Notes

cuDNN Frontend v1.16.0 is the recommended version for cuDNN 9.15.0 and later releases.

New Features 🚀

Open-Source Kernels

This release introduces open-source implementations of commonly requested fused kernels for select architectures (Blackwell). These experimental kernels may require additional dependencies such as CuteDSL. The initial release includes:

The optional dependencies can be installed with pip install nvidia-cudnn-frontend[cutedsl]. Usage examples and detailed documentation are available in the test/python/fe_api directory.

Please open an issue to request additional kernels or to report bugs.

Enhancements ✨

Scaled Dot-Product Attention (SDPA)

  • Block Mask Support: Starting with cuDNN 9.14.0, SDPA attributes now support block masks to exclude tiles that do not require computation. Refer to the sample implementation for usage details.

  • Bug Fix: Resolved an invalid memory access (IMA) in SDPA backward propagation that occurred when all of the following held: s_kv is not a multiple of 128, the padding mask is disabled, and the operation runs in CUDA graph replay mode. The fix is in cuDNN backend version 9.15.1 and later.
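
To make the block mask idea concrete, here is a minimal pure-Python sketch of how an element-level attention mask can be reduced to a per-tile mask, so that tiles with nothing to compute can be skipped. The function name build_block_mask, the tile sizes, and the boolean grid layout are assumptions chosen for illustration; they do not describe the layout or API that cuDNN actually uses.

```python
def build_block_mask(mask, tile_q, tile_kv):
    """Reduce an element-level mask to a per-tile mask.

    mask[i][j] is True when query i may attend to key j.
    The result has one entry per (tile_q x tile_kv) tile and is True
    when the tile contains at least one attended element.
    Illustrative only: not the cuDNN block mask format.
    """
    s_q = len(mask)
    s_kv = len(mask[0]) if s_q else 0
    n_q = -(-s_q // tile_q)     # ceiling division
    n_kv = -(-s_kv // tile_kv)
    block = [[False] * n_kv for _ in range(n_q)]
    for i in range(s_q):
        for j in range(s_kv):
            if mask[i][j]:
                block[i // tile_q][j // tile_kv] = True
    return block


# Causal mask over 8 positions, reduced to 4x4 tiles.
s = 8
causal = [[j <= i for j in range(s)] for i in range(s)]
block = build_block_mask(causal, tile_q=4, tile_kv=4)

# Tiles whose entry is False need no computation at all.
skipped = sum(not keep for row in block for keep in row)
```

In this causal example the upper-right tile contains no attended element, so one of the four tiles can be skipped entirely.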

Matrix Multiplication

  • CUDA Graph Compatibility: Added BehaviorNote_t::CUDNN_BEHAVIOR_NOTE_CUBLASLT_DEPENDENCY as a behavior note, available starting with cuDNN version 9.15.0. It enables filtering of engine configurations (execution plans) that use cuBLAS as a backend.
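
Filtering by behavior note amounts to dropping every candidate that carries an unwanted note. The sketch below shows that logic on plain Python records. EngineConfig, deselect, and the string labels are hypothetical stand-ins; this is not the cuDNN Frontend API, and it assumes the filter is used to exclude cuBLAS-backed configurations.

```python
from dataclasses import dataclass

# Hypothetical label standing in for the new behavior note.
CUBLASLT_DEPENDENCY = "CUBLASLT_DEPENDENCY"


@dataclass(frozen=True)
class EngineConfig:
    """Hypothetical record: a candidate plan and its behavior notes."""
    name: str
    behavior_notes: frozenset = frozenset()


def deselect(configs, unwanted_notes):
    """Keep only the configs that carry none of the unwanted notes."""
    unwanted = set(unwanted_notes)
    return [c for c in configs if not (c.behavior_notes & unwanted)]


configs = [
    EngineConfig("engine_a"),
    EngineConfig("engine_b", frozenset({CUBLASLT_DEPENDENCY})),
    EngineConfig("engine_c", frozenset({"SOME_OTHER_NOTE"})),
]

# Drop every candidate that depends on the cuBLAS backend.
graph_safe = deselect(configs, [CUBLASLT_DEPENDENCY])
```

Here engine_b is removed and the other two candidates remain available for plan selection.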

Additional Improvements

  • Block Scale Quantization: Added Python bindings for block scale quantize operations (#173). Refer to the sample implementation for usage details.

  • Dependency Optimization: PyTorch is no longer a required dependency for cuDNN Frontend (#177).

  • Tensor Alignment: Enhanced tensor descriptor API to accept alignment as an attribute (#153).

  • Plan Generation Control: Updated cudnnGetPlan API to accept an optional maximum plan count parameter, enabling users to limit the number of plans built and autotuned.
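
The effect of a maximum plan count can be sketched as a cap applied before candidates are timed. In the example below, autotune, measure, and the cost table are hypothetical, and the timings are made-up numbers rather than measurements. The sketch also assumes candidates arrive in a preference order, which is what makes truncating the list reasonable.

```python
def autotune(candidates, measure, max_plan_count=None):
    """Time at most max_plan_count candidates and return (best, evaluated).

    candidates: plans in preference order (assumed).
    measure: callable returning a cost for one plan, lower is better.
    """
    if max_plan_count is not None:
        candidates = candidates[:max_plan_count]
    if not candidates:
        raise ValueError("no candidate plans to evaluate")
    timings = [(measure(plan), plan) for plan in candidates]
    _, best = min(timings, key=lambda pair: pair[0])
    return best, len(timings)


# Made-up costs, for illustration only.
fake_cost = {"plan0": 3.0, "plan1": 1.5, "plan2": 0.9, "plan3": 2.2}
names = list(fake_cost)

best_all, n_all = autotune(names, fake_cost.get)
best_capped, n_capped = autotune(names, fake_cost.get, max_plan_count=2)
```

The capped run evaluates only two plans and settles on plan1, while the uncapped run finds the faster plan2. That is the trade-off a cap introduces: less build and autotune time in exchange for possibly missing a better plan.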

Benchmarking 📊

Resolved Issues 🔧

  • #153 - Tensor descriptor alignment support
  • #173 - Block scale quantize Python bindings
  • #177 - PyTorch dependency removal