v1.17.0-release
cuDNN Frontend v1.17.0 Release Notes
cuDNN Frontend v1.17.0 is the recommended version for cuDNN 9.17.0 and later releases.
New Features 🚀
Open-Source Kernels
- Native Sparse Attention: The Native Sparse Attention (NSA) module implements native sparse attention as described in the paper "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention". Usage samples for the Blackwell architecture are in test/python/fe_api/nsa.
- Gemm/Swiglu: Gemm_Swiglu now supports block-scaled FP8/FP4 datatypes.
  API changes:
  - Output tensors have been renamed from "C" and "Glu" to "AB12" and "C", respectively.
  - The "use_2cta_intrs" option has been removed. It is now inferred automatically from the tile shape.
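Because "C" is both an old output name and a new one, call sites that look up outputs by name need to be migrated carefully. The sketch below shows one way to do that, assuming the caller keeps outputs and options in plain dicts. The dict layout and the helper names `migrate_outputs` and `migrate_options` are hypothetical and are not part of the cuDNN Frontend API; only the renamed and removed names come from the notes above.

```python
# Hypothetical migration helpers; the dict-based call shape is an assumption,
# not the cuDNN Frontend API.
OUTPUT_RENAMES = {"C": "AB12", "Glu": "C"}  # pre-1.17.0 name -> 1.17.0 name
REMOVED_OPTIONS = {"use_2cta_intrs"}        # now inferred from the tile shape


def migrate_outputs(old_outputs):
    """Return a new dict keyed by the v1.17.0 output names.

    The mapping is applied in one pass into a fresh dict, because "C" is
    both an old name and a new name; renaming keys in place one at a time
    could overwrite an entry.
    """
    return {OUTPUT_RENAMES.get(name, name): value
            for name, value in old_outputs.items()}


def migrate_options(old_options):
    """Drop options that v1.17.0 no longer accepts."""
    return {key: value for key, value in old_options.items()
            if key not in REMOVED_OPTIONS}
```

For example, `migrate_outputs({"C": x, "Glu": y})` yields `{"AB12": x, "C": y}`, so code that previously read "C" must be updated to read "AB12" rather than silently picking up the tensor that used to be called "Glu".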
Enhancements ✨
Scaled Dot-Product Attention (SDPA)
- More samples: Open-sourced our SDPA test harness and FP8 samples in test/python/test_sdpa_fp8.py.
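For readers who want an independent baseline when running the samples, the standard SDPA definition, softmax(QK^T / sqrt(d)) V, can be sketched in plain Python. This is a minimal single-head reference under stated assumptions: it uses nested lists of floats, applies no mask, and does not model FP8 quantization or scaling factors. The function name `sdpa_reference` is hypothetical and is not taken from the cuDNN test harness.

```python
import math


def sdpa_reference(q, k, v):
    """Compute softmax(q @ k^T / sqrt(d)) @ v for one head.

    q: [s_q][d], k: [s_kv][d], v: [s_kv][d_v], all nested lists of floats.
    """
    d = len(q[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q_row in q:
        # Scaled dot product of this query against every key.
        scores = [scale * sum(a * b for a, b in zip(q_row, k_row))
                  for k_row in k]
        # Subtract the row maximum before exponentiating for stability.
        row_max = max(scores)
        weights = [math.exp(s - row_max) for s in scores]
        total = sum(weights)
        # Weighted average of the value rows.
        out.append([
            sum(w * v_row[j] for w, v_row in zip(weights, v)) / total
            for j in range(len(v[0]))
        ])
    return out
```

A comparison against a low-precision kernel would normally use a tolerance suited to the datatype instead of exact equality.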
Additional Improvements
- Tensor properties: Added vector Dim and vectorization count to the tensor properties.
- Graph wrapper: Fixed an issue in the native graph wrapper that caused a BufferError with non-PyTorch tensors.
Benchmarking 📊
- Updated the benchmark results for the SDPA improvements added in cuDNN 9.17.0. The updated results include GB200 and GB300 data.
Samples
- **cuDNN Llama model**: Added a reference implementation of the Llama model written entirely in cuDNN.