v1.19.0-release

@Anerudhan Anerudhan released this 09 Mar 17:35
· 20 commits to main since this release
df73764

cuDNN Frontend v1.19.0 Release Notes

cuDNN Frontend v1.19.0 is the recommended version for cuDNN 9.19.1 and later releases.

Open-Source Kernels 🚀 🚀

  • Blackwell and Hopper SDPA Fprop Kernels: cuDNN's SDPA Fprop implementation is now open source. This kernel supports causal masking and outputs stats for use in bprop. Additional kernels will be added in future releases.
  • Grouped GEMM + dSwiGLU Fusion: A contiguous grouped block-scaled GEMM fused with a dSwiGLU backward epilogue on NVIDIA Blackwell GPUs (SM100+), designed for MoE (Mixture of Experts) workloads.

General Improvements 🚀

  • Replaced the multiple device queries for the SM version during graph validation with a single query, which can be skipped by setting sm_version on the cuDNN graph.
  • Fixed an issue where enabling logging with CUDA graphs in certain scenarios would cause a crash.
  • Significantly reduced the CPU overhead of the cuDNN OSS API by using tvm-ffi.
  • Added a new cudnn-repro tool that produces a standalone reproducer from the cuDNN Frontend logs.

Enhancements ✨

Scaled Dot-Product Attention (SDPA)

  • Support Checks: Improved support checks for cleaner support surface queries.
  • New API: Added Python bindings for the score-mod bprop function to enable the score bprop API.
  • Stats: Added support for independently generating SDPA stats (LSE, SE, Max) in SDPA fprop (requires cuDNN 9.20.0 or later).
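
For readers unfamiliar with these stats, the sketch below shows how the three quantities usually relate for a single row of attention scores. It assumes the standard numerically stable softmax formulation, where Max is the row maximum, SE is the sum of exponentials after subtracting Max, and LSE is the log-sum-exp. These notes do not state the exact definitions cuDNN uses, so treat the definitions as an assumption. The function names are hypothetical and are not part of the cuDNN Frontend API.

```python
import math

def softmax_stats(scores):
    """Return (row_max, sum_exp, lse) for one row of attention scores.

    Assumed definitions (not taken from the cuDNN documentation):
      row_max = max_j s_j
      sum_exp = sum_j exp(s_j - row_max)
      lse     = row_max + log(sum_exp)
    """
    row_max = max(scores)
    sum_exp = sum(math.exp(s - row_max) for s in scores)
    lse = row_max + math.log(sum_exp)
    return row_max, sum_exp, lse

def softmax_from_lse(scores, lse):
    """Recompute softmax probabilities from the scores and a saved LSE."""
    return [math.exp(s - lse) for s in scores]

row = [1.0, 2.0, 3.0]
row_max, sum_exp, lse = softmax_stats(row)
probs = softmax_from_lse(row, lse)
```

Saving LSE in the forward pass lets a backward pass rebuild the softmax probabilities from the scores without repeating the reduction, which is the usual reason a forward kernel exposes these stats.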

Normalization

  • More Benchmarks: New normalization benchmark results posted for GB200, GB300, and H200.

Benchmarking 📊

  • Updated the benchmark results for the SDPA improvements added in cuDNN 9.19.1.