v1.17.0-release

Anerudhan released this 20 Dec 00:10


b372d39

cuDNN Frontend v1.17.0 Release Notes

cuDNN Frontend v1.17.0 is the recommended version for cuDNN 9.17.0 and later releases.

New Features 🚀

Open-Source Kernels

Native Sparse Attention: The Native Sparse Attention (NSA) module implements native sparse attention as described in the paper "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention". Usage samples for the Blackwell architecture are in test/python/fe_api/nsa.
Gemm/Swiglu: Gemm_Swiglu now supports block-scaled FP8/FP4 datatypes.
API changes:
- Output tensors have been renamed from "C" and "Glu" to "AB12" and "C", respectively.
- The "use_2cta_intrs" option has been removed. It is now inferred automatically from the tile shape.
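
To make the output renaming concrete, here is a minimal pure-Python reference sketch of a GEMM followed by SwiGLU. It is not the cuDNN Frontend API: the function names (`silu`, `matmul`, `gemm_swiglu_reference`) are hypothetical, and the way the GEMM result is split into a gate half and a linear half is an assumption made for illustration; the actual kernel may lay the two halves out differently. The sketch only shows which quantity each new output name is taken to refer to: "AB12" for the GEMM result (previously "C") and "C" for the gated result (previously "Glu").

```python
import math


def silu(x):
    # SiLU (swish) activation: x * sigmoid(x)
    return x / (1.0 + math.exp(-x))


def matmul(a, b):
    # Plain row-by-column matrix product on nested lists
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gemm_swiglu_reference(a, b):
    # GEMM result: named "AB12" as of v1.17.0 (previously "C")
    ab12 = matmul(a, b)
    # Assumption: the first half of the columns is the gate,
    # the second half is the linear branch.
    half = len(ab12[0]) // 2
    # SwiGLU result: named "C" as of v1.17.0 (previously "Glu")
    c = [[silu(row[j]) * row[half + j] for j in range(half)] for row in ab12]
    return {"AB12": ab12, "C": c}


a = [[1.0, 2.0], [0.0, -1.0]]
b = [[1.0, 0.0, 2.0, 1.0], [0.0, 1.0, 1.0, 3.0]]
out = gemm_swiglu_reference(a, b)
print(out["AB12"])  # [[1.0, 2.0, 4.0, 7.0], [0.0, -1.0, -1.0, -3.0]]
```

Code written against earlier releases that reads the output named "C" expecting the GEMM result would, under this reading, now receive the gated result instead, so both names need updating together.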

Enhancements ✨

Scaled Dot-Product Attention (SDPA)

More samples: Open-sourced the SDPA test harness and FP8 samples in test/python/test_sdpa_fp8.py.

Additional Improvements

Tensor properties: Added vector Dim and vectorization count to the tensor properties.
Graph wrapper: Fixed an issue in the native graph wrapper that caused a BufferError with non-PyTorch tensors.

Benchmarking 📊

Updated the benchmark results with GB200 and GB300 data to reflect the SDPA improvements added in cuDNN 9.17.0.

Samples

cuDNN Llama model: Added a reference implementation of the Llama model written entirely in cuDNN.
